CN116490610A

CN116490610A - Methods and compositions for modulating genome

Info

Publication number: CN116490610A
Application number: CN202180033116.3A
Authority: CN
Inventors: B.E.斯坦伯格; A.H.博特默; W.E.萨洛蒙; I.什彻巴科娃; C.G.S.科塔-拉穆西诺; J.R.鲁宾斯; R.J.西托里克; Z.J.王
Original assignee: Flagship Pioneering Innovations VI Inc
Current assignee: Flagship Pioneering Innovations VI Inc
Priority date: 2020-03-04
Filing date: 2021-03-04
Publication date: 2023-07-25

Abstract

Methods and compositions for modulating a target genome are disclosed.

Description

Methods and compositions for modulating genome

RELATED APPLICATIONS

The present application claims priority from U.S. Ser. No. 62/985,291, filed March 4, 2020, and U.S. Ser. No. 63/035,638, filed June 5, 2020, each of which is incorporated herein by reference in its entirety.

Background

Without specialized proteins to facilitate insertion events, integration of a nucleic acid of interest into the genome occurs at low frequency and with little site specificity. Some existing methods, such as CRISPR/Cas9, are better suited to small edits and are less efficient at integrating longer sequences. Other existing methods, such as Cre/loxP, require a first step of inserting a loxP site into the genome and then a second step of inserting the sequence of interest into the loxP site. There is a need in the art for improved proteins for inserting sequences of interest into the genome.

Disclosure of Invention

The present disclosure relates to novel compositions, systems, and methods for altering the genome at one or more locations in a host cell, tissue, or subject in vivo or in vitro. In particular, the invention features compositions, systems, and methods for introducing exogenous genetic elements into a host genome.

Features of the composition or method may include one or more of the examples listed below.

1. A system for modifying DNA, the system comprising:

(a) A polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and

(b) A template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds to the polypeptide and (ii) a heterologous subject sequence,

wherein the polypeptide comprises a mutation that inactivates and/or deletes the nucleolar localisation signal.

2. A system for modifying DNA, the system comprising:

(a) A polypeptide or nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a first target DNA binding domain, e.g., comprising a first Zn-finger domain, (ii) a reverse transcriptase domain, (iii) an endonuclease domain, and (iv) a second target DNA binding domain, e.g., comprising a second Zn-finger domain that is heterologous to the first target DNA binding domain; and

optionally (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds to the polypeptide and (ii) a heterologous subject sequence,

wherein (a) in a target cell binds to a smaller number of target DNA sequences than an analogous polypeptide comprising only the first target DNA binding domain, e.g., wherein the presence of the second target DNA binding domain in a polypeptide having the first target DNA binding domain improves the target sequence specificity of the polypeptide relative to the target sequence specificity of a polypeptide comprising only the first target DNA binding domain.

3. The system of embodiment 2, wherein (iii) comprises (iv).

4. A system for modifying DNA, the system comprising:

wherein the system is capable of cleaving the first strand of the target DNA at least twice (e.g., twice), and

optionally, wherein the cuts are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or 200 nucleotides away from each other (and optionally no more than 500, 400, 300, 200, or 100 nucleotides away from each other).

5. A system for modifying DNA, the system comprising:

wherein the system is capable of cleaving the first strand and the second strand of the target DNA, and

wherein the distance between these cuts is the same as the distance between cuts made by the endonuclease domain (e.g., the endonuclease domain of a naturally occurring retrotransposase).

6. A system for modifying DNA, the system comprising:

wherein (a), (b) or (a) and (b) further comprise a 5' UTR and/or a 3' UTR operably linked to a sequence encoding the polypeptide, the heterologous subject sequence (e.g., a coding sequence comprised in the heterologous subject sequence), or both.

7. The system of embodiment 6, wherein the 5' UTR and/or 3' UTR increases expression of the operably linked one or more sequences by at least 10%, 20%, 30%, 40%, 50%, 70%, 80%, 90%, or 100% relative to an otherwise similar nucleic acid comprising one or more endogenous UTRs or a minimal 5' UTR and minimal 3' UTR associated with the heterologous subject sequence.

8. A system for modifying DNA, the system comprising:

(a) A polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a Reverse Transcriptase (RT) domain, (ii) a DNA Binding Domain (DBD); and (iii) an endonuclease domain, such as a nicking enzyme domain; and

(b) A template RNA (or DNA encoding the template RNA) comprising (e.g., from 5' to 3 ') (i) optionally, a sequence that binds to a target site (e.g., the second strand of a site in a target genome), (ii) optionally, a sequence that binds to the polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' target homology domain;

wherein:

(i) The polypeptide comprises a heterologous targeting domain that specifically binds to a sequence comprised in the target site (e.g., in the DBD or the endonuclease domain); and/or

(ii) The template RNA comprises a heterologous homologous sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to the sequence comprised in the target site.
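The percentage thresholds in item (ii) can be illustrated with a small calculation. The following is only a sketch: it assumes that "homology" is measured as ungapped, position-by-position identity between two equal-length sequences, which the text above does not specify. The function names `percent_identity` and `meets_threshold` are hypothetical and are not taken from the disclosure.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length nucleotide strings.

    Assumption: RNA 'U' is treated as equivalent to DNA 'T' so that a
    template RNA segment can be compared with a DNA target site.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for ungapped comparison")
    if not a:
        raise ValueError("sequences must be non-empty")
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold: float = 85.0) -> bool:
    """True if the two sequences reach the given percent identity."""
    return percent_identity(a, b) >= threshold
```

With this definition, two 10-nucleotide sequences that differ at one position are 90% identical, so they would satisfy the 85% and 90% thresholds but not the 95% threshold. A gapped alignment would be needed for sequences of unequal length.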

9. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds to the polypeptide, (ii) a heterologous subject sequence, and (iii) a ribozyme heterologous to (a) (i), (a) (ii), (b) (i), or a combination thereof.

10. The system of embodiment 9, wherein the ribozyme is heterologous to (b) (i).

11. The system of embodiment 9 or 10, wherein the template RNA comprises (iv) a second ribozyme, e.g., which is endogenous to (a) (i), (a) (ii), (b) (i), or a combination thereof, e.g., wherein the second ribozyme is endogenous to (b) (i).

12. The system of embodiment 9 or 10, wherein the heterologous ribozyme replaces a ribozyme that is endogenous to (a) (i), (a) (ii), (b) (i), or a combination thereof, e.g., wherein the second ribozyme is endogenous to (b) (i).

13. A system for modifying DNA, the system comprising:

optionally (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and

(b) A template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide, (ii) a heterologous subject sequence, (iii) a 5' UTR capable of being cleaved into fragments and a cleaved template RNA, wherein the 5' UTR is optionally a sequence that binds the polypeptide,

wherein the 5' UTR comprises one or more mutations (e.g., relative to a wild-type 5' UTR, e.g., as described herein) that increase the affinity of the fragment for the cleaved template RNA, e.g., such that the fragment hybridizes to the cleaved template RNA (e.g., the 5' UTR of the cleaved template RNA), e.g., under stringent conditions, e.g., wherein the stringent conditions comprise hybridization in 4x sodium chloride/sodium citrate (SSC) at about 65 ℃ followed by washing in 1x SSC at about 65 ℃.

14. The system of embodiment 13, wherein the template RNA, e.g., the 5' UTR, comprises a ribozyme that cleaves the template RNA (e.g., in the 5' UTR).

15. A system for modifying DNA, the system comprising:

wherein (a), (b) or (a) and (b) comprise introns that increase expression of the polypeptide, the heterologous subject sequence (e.g., a coding sequence located in the heterologous subject sequence), or both.

16. A method of modifying a target DNA strand in a cell, tissue or subject, the method comprising administering to the cell a system, wherein the system comprises:

wherein the system reverse transcribes the template RNA sequence into the target DNA strand, thereby modifying the target DNA strand, and

wherein the cell has reduced activity of a Rad51 repair pathway, reduced expression of Rad51 or a component of a Rad51 repair pathway, or does not comprise a functional Rad51 repair pathway, e.g., does not comprise a functional Rad51 gene, e.g., comprises a mutation (e.g., a deletion) that inactivates one or both copies of a Rad51 gene or another gene in a Rad51 repair pathway.

17. A system for modifying DNA, the system comprising:

wherein the heterologous subject sequence comprises a sequence of any of tables 10A-10D or 11A-11G, e.g., a gene or fragment thereof.

18. A system for modifying DNA, the system comprising:

(a) A polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain, wherein the polypeptide is modified to enhance activity or alter specificity; and

(b) A template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds to the polypeptide and (ii) a heterologous subject sequence.

19. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds to the polypeptide and (ii) a heterologous subject sequence, wherein the template RNA comprises one or more chemical modifications selected from the group consisting of: dihydrouridine, inosine, 7-methylguanosine, 5-methylcytidine (5mC), 5'-ribonucleoside phosphate, 2'-O-ribothymidine, 2'-riboribothymidine, C-5 propynyl-deoxycytidine (pdC), C-5 propynyl-deoxyuridine (pdU), C-5 propynyl cytidine (pC), C-5 propynyl uridine (pU), 5-methylcytidine, 5-methyluridine, 5-methyldeoxycytidine, 5-methyldeoxyuridine methoxy, 2,6-diaminopurine, 5'-dimethoxytrityl-N4-ethyl-2'-deoxycytidine, C-5 propynyl-f-cytidine (pfU), 5-methylf-cytidine, 5-methylf-uridine, C-5 propynyl-m-cytidine (pmC), C-5 propynyl-uridine (pmU), 5-methylcytidine (Me-1-N), pseudouridine (Me-1-N-13), pseudouridine (Me-1-N-methylcytidine) or a small binding agent.

20. A system for modifying DNA, the system comprising:

(a) A polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a target DNA binding domain, (ii) a reverse transcriptase domain, optionally (iii) an endonuclease domain, wherein the polypeptide comprises a heterologous linker in place of a portion of (i), (ii) or (iii), or in place of an endogenous linker linking two of (i), (ii) or (iii); and

optionally (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds to the polypeptide and (ii) a heterologous subject sequence.

21. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds to the polypeptide, (ii) a heterologous subject sequence, (iii) a first homologous domain of at least 5 or at least 10 bases having 100% identity to a target DNA strand, at the 5' end of the template RNA, and (iv) a second homologous domain of at least 5 or at least 10 bases having 100% identity to the target DNA strand, at the 3' end of the template RNA.
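The two terminal homologous domains described in (iii) and (iv) can be checked with a short routine. This is a hedged sketch, not a method from the disclosure: it assumes that "100% identity to the target DNA strand" means the first and last `min_len` bases of the template RNA (with U read as T) each occur as an exact substring of the supplied target strand in the same orientation. The name `terminal_homology_ok` and its parameters are hypothetical.

```python
def terminal_homology_ok(template_rna: str, target_dna: str, min_len: int = 5) -> bool:
    """Check that both ends of a template RNA exactly match the target strand.

    Assumption: identity is tested as an exact substring match of the first
    and last `min_len` bases, after converting RNA 'U' to DNA 'T'.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")
    template = template_rna.upper().replace("U", "T")
    target = target_dna.upper()
    if len(template) < 2 * min_len:
        # Too short to hold two non-overlapping homologous domains.
        return False
    five_prime = template[:min_len]
    three_prime = template[-min_len:]
    return five_prime in target and three_prime in target
```

Calling the function with `min_len=5` and again with `min_len=10` corresponds to the "at least 5 or at least 10 bases" alternatives. If the intended comparison is against the complementary strand, the target would need to be reverse-complemented first.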

22. A system as in any preceding embodiment, wherein the polypeptide comprises a mutation that inactivates and/or deletes a nucleolar localisation signal.

23. A system according to embodiment 22, wherein the activity of the nucleolar localisation signal is reduced by at least 50%, 60%, 70%, 80%, 90%, 95%, or 99%.

24. The system of any one of embodiments 22 or 23, wherein the polypeptide comprises a Nuclear Localization Signal (NLS), such as an endogenous NLS or an exogenous NLS.

25. The system of any preceding embodiment, wherein the polypeptide of (a) comprises a target DNA binding domain (e.g., the endonuclease domain comprises a target DNA binding domain), such as a first target DNA binding domain, or (a) further comprises a target DNA binding domain, such as a first target binding domain.

26. The system of embodiment 25, wherein:

(a) further comprises a second target DNA binding domain, e.g., a Zn-finger domain, which is heterologous to, e.g., the first target DNA binding domain or the endonuclease domain.

27. The system of embodiment 26, wherein the endonuclease domain comprises the second target DNA binding domain.

28. The system of embodiment 26 or 27, wherein the second target DNA binding domain affects endonuclease activity of the polypeptide.

29. The system of any preceding embodiment, wherein the second target DNA binding domain affects DNA nicking activity of the polypeptide.

30. The system of any preceding embodiment, wherein the second target DNA binding domain binds to a locus provided in table E3.

31. The system of any preceding embodiment, wherein the loci in table E3 have a genomic score of at least 6.

32. The system of any preceding embodiment, wherein the polypeptide of (a) binds a smaller number of target DNA sequences than a similar polypeptide comprising only the first target DNA binding domain or the second target DNA binding domain, e.g., wherein the presence of the second target DNA binding domain in a polypeptide having the first target DNA binding domain improves the target sequence specificity of the polypeptide relative to the target sequence specificity of a polypeptide comprising only the first target DNA binding domain.

33. The system of any preceding embodiment, wherein the second target DNA binding domain binds to a genomic DNA sequence less than 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotide from the genomic DNA sequence to which the first target DNA binding domain binds.

34. The system of any preceding embodiment, wherein the second target DNA binding domain binds to a genomic DNA sequence 1-100, 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 1-5, 5-100, 5-90, 5-80, 5-70, 5-60, 5-50, 5-40, 5-30, 5-20, 5-10, 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides from the genomic DNA sequence to which the first target DNA binding domain binds.

35. The system of any preceding embodiment, wherein the first or second target DNA binding domain comprises a CRISPR/Cas protein, TAL effector domain, Zn-finger domain, or meganuclease domain.

36. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a TAL effector domain.

37. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a Zn-finger domain.

38. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a CRISPR/Cas protein.

39. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a meganuclease domain.

40. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a TAL effector domain and the second target DNA binding domain comprises a Zn-finger domain.

41. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a TAL effector domain and the second target DNA binding domain comprises a TAL effector domain.

42. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a TAL effector domain and the second target DNA binding domain comprises a meganuclease domain.

43. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a Zn-finger domain and the second target DNA binding domain comprises a Zn-finger domain.

44. The system of any preceding embodiment, wherein the first target DNA binding domain comprises a Zn-finger domain and the second target DNA binding domain comprises a meganuclease domain.

45. The system of any preceding embodiment, wherein the second DNA binding domain binds to a sequence in a Genomic Safe Harbor (GSH) site or a Natural Harbor™ site.

46. The system of any preceding embodiment, wherein the system is capable of cleaving a first strand of the target DNA and a second strand of the target DNA, e.g., wherein the cuts are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or 200 nucleotides away from each other (and optionally no more than 500, 400, 300, 200, or 100 nucleotides away from each other).

47. The system of any preceding embodiment, wherein the system is capable of cleaving a first strand of the target DNA at least twice (e.g., twice), e.g., wherein the cuts are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or 200 nucleotides away from each other (and optionally no more than 500, 400, 300, 200, or 100 nucleotides away from each other).

48. The system of any preceding embodiment, wherein the cuts are 1-500, 1-400, 1-300, 1-200, 1-100, 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 1-5, 5-500, 5-400, 5-300, 5-200, 5-100, 5-90, 5-80, 5-70, 5-60, 5-50, 5-40, 5-30, 5-20, 5-10, 10-500, 10-400, 10-300, 10-200, 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 20-500, 20-400, 20-300, 20-200, 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 30-500, 30-400, 30-300, 30-200, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-500, 40-400, 40-300, 40-200, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-500, 50-400, 50-300, 50-200, 50-100, 50-90, 50-80, 50-70, 50-60, 60-500, 60-400, 60-300, 60-200, 60-100, 60-90, 60-80, 60-70, 70-500, 70-400, 70-300, 70-200, 70-100, 70-90, 70-80, 80-500, 80-400, 80-300, 80-200, 80-100, 80-90, 90-500, 90-400, 90-300, 90-200, 90-100, 100-500, 100-400, 100-300, 100-200, 200-500, 200-400, 200-300, 300-500, 300-400, or 400-500 nucleotides away from each other.
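The spacing limits recited for the cuts amount to a lower and an upper bound on the distance between cut positions. The sketch below illustrates that check under stated assumptions: cut positions are given as integer coordinates on a common reference, distance is the absolute difference of those coordinates, and the defaults of 2 and 500 nucleotides are just one pair of the recited alternatives. The function names are hypothetical.

```python
from itertools import combinations


def cut_distances(cut_positions):
    """Absolute distances between every pair of cut coordinates."""
    return [abs(a - b) for a, b in combinations(sorted(cut_positions), 2)]


def spacing_ok(cut_positions, min_nt=2, max_nt=500):
    """True if there are at least two cuts and every pairwise distance
    lies within [min_nt, max_nt] nucleotides (assumed inclusive bounds)."""
    distances = cut_distances(cut_positions)
    return bool(distances) and all(min_nt <= d <= max_nt for d in distances)
```

For example, cuts at coordinates 100 and 130 are 30 nucleotides apart and pass the default bounds, whereas cuts at 100 and 700 exceed the 500-nucleotide upper limit. Whether the bounds are inclusive is an assumption of this sketch.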

49. The system of any preceding embodiment, wherein the distance between the cuts is the same as the distance between cuts produced by the endonuclease domain (e.g., an endonuclease domain of a naturally occurring retrotransposase).

50. The system of any preceding embodiment, wherein both of the cuts are generated by the same endonuclease domain (e.g., a CRISPR/Cas protein, e.g., directed by multiple gRNAs, e.g., located in the template RNA).

51. The system of any preceding embodiment, wherein the polypeptide further comprises a second endonuclease domain.

52. The system of any preceding embodiment, wherein:

i) The first endonuclease domain (e.g., a nicking enzyme) cleaves the strand of target DNA to be edited, and the second endonuclease domain (e.g., a nicking enzyme) cleaves the unedited strand of target DNA, or

ii) the first endonuclease domain (e.g., a nicking enzyme) cleaves the strand of target DNA to be edited twice, and the second endonuclease domain (e.g., a nicking enzyme) cleaves the strand of target DNA to be edited another time.

53. The system of any preceding embodiment, wherein (a), (b) or (a) and (b) further comprise a 5' UTR and/or a 3' UTR operably linked to a sequence encoding the polypeptide, the heterologous subject sequence (e.g., a coding sequence comprised in the heterologous subject sequence), or both, wherein the 5' UTR and/or 3' UTR increases expression of the one or more operably linked sequences.

54. The system of the preceding embodiment, wherein the 5' UTR and/or 3' UTR:

increases the stability, e.g., half-life, of the template RNA, the RNA transcribed from (a), or both; and/or

increases the translational efficiency of the heterologous subject sequence, the polypeptide, or both.

55. The system of the preceding embodiment, wherein the 5' UTR comprises a 5' UTR from complement factor 3 (C3) or a functional fragment or variant thereof.

56. The system of any preceding embodiment, wherein the 3' UTR comprises a 3' UTR from orosomucoid 1 (ORM1) or a functional fragment or variant thereof.

57. The system of any preceding embodiment, wherein

i) the 5' UTR increases the rate of translation, e.g., relative to an otherwise similar nucleic acid comprising one or more endogenous UTRs or a minimal 5' UTR and a minimal 3' UTR associated with the heterologous subject sequence,

ii) the 3' UTR increases nucleic acid half-life, e.g., relative to an otherwise similar nucleic acid comprising one or more endogenous UTRs or minimal 5' UTR and minimal 3' UTR associated with the heterologous subject sequence, or

iii) Both i) and ii).

58. The system of any preceding embodiment, wherein the template RNA comprises a ribozyme that is heterologous to (a) (i), (a) (ii), (b) (i), or a combination thereof.

59. The system of any preceding embodiment, wherein the heterologous ribozyme replaces a ribozyme that is endogenous to (a) (i), (a) (ii), (b) (i), or a combination thereof.

60. The system of any preceding embodiment, wherein the template RNA comprises a second ribozyme, e.g., that is endogenous to (a) (i), (a) (ii), (b) (i), or a combination thereof.

61. The system of any preceding embodiment, wherein the heterologous ribozyme is located in the 5' UTR or 3' UTR of the template RNA.

62. The system of any preceding embodiment, wherein the heterologous ribozyme is 5' of the heterologous subject sequence or 3' of the heterologous subject sequence.

63. The system of any preceding embodiment, wherein the heterologous ribozyme is capable of cleaving RNA comprising the ribozyme, e.g., 5' of the ribozyme, 3' of the ribozyme, or within the ribozyme.

64. The system of any preceding embodiment, wherein the heterologous ribozyme is 5' of the heterologous subject sequence and cleaves 3' of the heterologous ribozyme, e.g., wherein the heterologous ribozyme is a synthetic or naturally occurring hammerhead ribozyme.

65. The system of any preceding embodiment, wherein the heterologous ribozyme is 3' of the heterologous subject sequence and cleaves 5' of the heterologous ribozyme, e.g., wherein the heterologous ribozyme is selected from an HDV family ribozyme and a hatchet ribozyme.

66. The system of any preceding embodiment, wherein the template RNA further comprises a ribozyme hybridization region, e.g., a template with altered targeting (e.g., by a homology arm) comprises a modified 5' UTR comprising the ribozyme hybridization region.

67. The system of any preceding embodiment, wherein a portion of the ribozyme hybridizes to a sequence 5' or 3' of the ribozyme (e.g., by Watson-Crick base pairing).

68. The system of any preceding embodiment, wherein the ribozyme sequence is altered from its native sequence by at least 1, 2, 3, 4, 5, 6, 8, 9, 10, 15, 20, 25 or more base pairs.

69. The system of any preceding embodiment, wherein the ribozyme sequence is altered from its native sequence to hybridize with the 5' or 3' homology arm of the target ribozyme.

70. The system of any preceding embodiment, wherein the system integrates a heterologous subject sequence into a target genome more efficiently than an otherwise similar system lacking the heterologous ribozyme, e.g., wherein at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% more of the cells exhibit integration in the presence of the system comprising the heterologous ribozyme as compared to the system lacking the heterologous ribozyme.

71. The system of any preceding embodiment, wherein the template RNA comprises a 5' UTR capable of being cleaved into fragments and a cleaved template RNA.

72. The system of any preceding embodiment, wherein the template RNA comprises a ribozyme that cleaves the template RNA, e.g., in the 5' UTR.

73. The system of any preceding embodiment, wherein the 5' UTR comprises one or more mutations (e.g., relative to a wild-type 5' UTR described herein, e.g., as described in table 1 or 3, or from a protein domain listed in table 2).

74. The system of any preceding embodiment, wherein the one or more mutations increase the affinity of the fragment for the cleaved template RNA, e.g., such that the fragment hybridizes to the cleaved template RNA (e.g., the 5' UTR of the cleaved template RNA) under stringent conditions, e.g., wherein stringent conditions of hybridization comprise hybridization in 4x sodium chloride/sodium citrate (SSC) at about 65 ℃, followed by washing in 1x SSC at about 65 ℃.

76. The system of any preceding embodiment, wherein (a), (b) or (a) and (b) comprise introns that increase expression of the polypeptide, the heterologous subject sequence (e.g., a coding sequence located in the heterologous subject sequence), or both.

77. The system of any preceding embodiment, wherein the intron is operably linked (e.g., to be recognized by a cellular splice protein) to a sequence encoding the polypeptide, to the heterologous subject sequence (e.g., a coding sequence located in the heterologous subject sequence), or to both.

78. The system of any preceding embodiment, wherein the intron is located in the 5' UTR (e.g., 5' of the heterologous subject sequence).

79. The system of any preceding embodiment, wherein the intron is located in the coding sequence of the heterologous subject sequence.

80. The system of any preceding embodiment, wherein the intron is positioned in a forward direction relative to the coding sequence of the heterologous subject sequence.

81. The system of any preceding embodiment, wherein the intron is positioned in reverse orientation relative to the coding sequence of the heterologous subject sequence.

82. The system of any preceding embodiment, wherein the intron is spliced after transcription of the template RNA and prior to target-initiated reverse transcription into a target, e.g., genomic DNA.

83. The system of any preceding embodiment, wherein the intron is spliced post-transcriptionally of the heterologous subject sequence after integration of the heterologous subject sequence into the target, e.g., genomic DNA.

84. The system of any preceding embodiment, wherein the intron comprises a microrna binding site.

85. The system of any one of the preceding embodiments, wherein the endonuclease domain (e.g., an endonuclease domain of R2Tg or R2-1_za) recognizes a motif (e.g., GG, AAGG, TAAGGT, or TTAAGGTAGC) and the heterologous DNA binding domain recognizes a genomic DNA sequence, wherein the motif and the genomic DNA sequence are within 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-100, 100-150, 150-200, or 200-250 nucleotides of each other, optionally wherein the motif recognized by the endonuclease domain comprises 4, 5, 6, 7, 8, 9, or 10 consecutive nucleotides of TTAAGGTAGC, AAGGTAGCCAAA, or TAAGGTAGCCAAA, or wherein the motif recognized by the endonuclease domain comprises 2 or 3 consecutive nucleotides of AAGG.

86. The system of any preceding embodiment, wherein the motif is upstream of the genomic DNA sequence, e.g., the motif is about 30-80, 40-70, 50-60, or 55nt upstream of the genomic DNA sequence.

87. The system of any preceding embodiment, wherein the motif is downstream of the genomic DNA sequence, e.g., the motif is about 10-30, 15-25, or 20nt downstream of the genomic DNA sequence.

88. The system of any preceding embodiment, wherein the sequence is in the same orientation as the genomic DNA sequence or in a reverse complementary orientation to the genomic DNA sequence.
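The spacing between the endonuclease motif and the sequence bound by the heterologous DNA binding domain, as discussed in embodiments 85 to 87, can be illustrated by locating both on one strand and subtracting their start coordinates. This is a sketch under stated assumptions: distance is measured start to start on the supplied strand (the text does not say whether it is start to start or edge to edge), only exact matches on that strand are searched, and the function name `motif_site_offsets` and the example binding-site sequence are hypothetical.

```python
def find_all(sequence: str, query: str):
    """Start indices of every (possibly overlapping) exact match."""
    starts = []
    i = sequence.find(query)
    while i != -1:
        starts.append(i)
        i = sequence.find(query, i + 1)
    return starts


def motif_site_offsets(genome: str, motif: str, site: str):
    """Signed offsets (motif start minus site start) for each motif/site pair.

    A negative offset means the motif lies upstream (5') of the site on the
    supplied strand; a positive offset means it lies downstream.
    """
    g = genome.upper()
    motif_starts = find_all(g, motif.upper())
    site_starts = find_all(g, site.upper())
    return [m - s for m in motif_starts for s in site_starts]
```

For instance, placing TTAAGGTAGC at coordinate 10 and a hypothetical binding-site sequence at coordinate 65 gives an offset of -55, that is, the motif is 55 nt upstream of the binding site. Searching the reverse complement as well would be needed to cover the reverse-complementary orientation mentioned in embodiment 88.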

89. The system of any preceding embodiment, wherein the heterologous DNA binding domain (e.g., zinc finger domain) is N-terminal or C-terminal to the endonuclease domain.

90. The system of any preceding embodiment, wherein a linker (e.g., a linker of table 38) is disposed between the heterologous DNA binding domain and the endonuclease domain.

91. The system of any one of the preceding embodiments, wherein the system comprises one or more circular RNA molecules (circRNAs).

92. The system of any preceding embodiment, wherein the circRNA encodes a Gene Writer polypeptide.

93. The system of any preceding embodiment, wherein the circRNA comprises a template RNA.

94. The system of any preceding embodiment, wherein the circRNA is delivered to a host cell.

95. The system of any one of the preceding embodiments, wherein the circRNA is capable of being linearized, e.g., in a host cell, e.g., in a nucleus of the host cell.

96. The system of any one of the preceding embodiments, wherein the circRNA comprises a cleavage site.

97. The system of any preceding embodiment, wherein the circRNA further comprises a second cleavage site.

98. The system of any preceding embodiment, wherein the cleavage site is cleavable by a ribozyme, e.g., a ribozyme contained in the circRNA (e.g., by self-cleavage).

99. The system of any one of the preceding embodiments, wherein the circRNA comprises a ribozyme sequence.

100. The system of any preceding embodiment, wherein the ribozyme sequence is capable of self-cleaving, e.g., in a host cell, e.g., in the nucleus of the host cell.

101. The system of any preceding embodiment, wherein the ribozyme is an inducible ribozyme.

102. The system of any preceding embodiment, wherein the ribozyme is a protein-responsive ribozyme, e.g., a ribozyme that is responsive to a nuclear protein, e.g., a genome-interacting protein, e.g., an epigenetic modifier, e.g., EZH2.

103. The system of any preceding embodiment, wherein the ribozyme is a nucleic acid-responsive ribozyme.

104. The system of any preceding embodiment, wherein the catalytic activity (e.g., autocatalytic activity) of the ribozyme is activated in the presence of a target nucleic acid molecule (e.g., an RNA molecule, e.g., mRNA, miRNA, ncRNA, lncRNA, tRNA, snRNA, or mtRNA).

105. The system of any preceding embodiment, wherein the ribozyme is responsive to a target protein (e.g., MS2 coat protein).

106. The system of any preceding embodiment, wherein the target protein is localized to the cytoplasm or to the nucleus (e.g., an epigenetic modifier or a transcription factor).

107. The system of any preceding embodiment, wherein the ribozyme comprises a ribozyme sequence of a B2 or ALU retrotransposon, or a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

108. The system of any preceding embodiment, wherein the ribozyme comprises the sequence of a tobacco ringspot virus hammerhead ribozyme, or a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

109. The system of any preceding embodiment, wherein the ribozyme comprises the sequence of a Hepatitis Delta Virus (HDV) ribozyme, or a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

110. The system of any preceding embodiment, wherein the ribozyme is activated by a moiety expressed in a target cell or target tissue.

111. The system of any preceding embodiment, wherein the ribozyme is activated by a moiety expressed in a target subcellular compartment (e.g., nucleus, nucleolus, cytoplasm, or mitochondria).

112. The system of any one of the preceding embodiments, wherein the ribozyme is comprised in a circular RNA or a linear RNA.

113. A system comprising a first circular RNA encoding a polypeptide of the Gene Writing system; and

a second circular RNA comprising a template RNA of the Gene Writing system.

114. The system of any one of the preceding embodiments, wherein the template RNA, e.g., the 5' UTR, comprises a ribozyme that cleaves the template RNA (e.g., in the 5' UTR).

115. The system of any one of the preceding embodiments, wherein the template RNA comprises a ribozyme heterologous to (a)(i) (the reverse transcriptase domain), (a)(ii) (the endonuclease domain), (b)(i) (the template RNA sequence that binds the polypeptide), or a combination thereof.

116. The system of any one of the preceding embodiments, wherein the heterologous ribozyme is capable of cleaving RNA comprising the ribozyme, e.g., 5' to the ribozyme, 3' to the ribozyme, or within the ribozyme.

117. A Lipid Nanoparticle (LNP) comprising a system, polypeptide (or RNA encoding the same), nucleic acid molecule, or DNA encoding the system or polypeptide as described in any preceding embodiment.

118. A system comprising a first lipid nanoparticle comprising a polypeptide (or DNA or RNA encoding the same) of a Gene Writing system (e.g., as described herein); and

a second lipid nanoparticle comprising a nucleic acid molecule of a Gene Writing system (e.g., as described herein).

119. The system, kit, polypeptide, or reaction mixture of any preceding embodiment, wherein the system, nucleic acid molecule, polypeptide, and/or DNA encoding the same is formulated as a Lipid Nanoparticle (LNP).

120. The LNP of any preceding embodiment, comprising a cationic lipid.

121. The LNP of any preceding embodiment, wherein the cationic lipid has the following structure:

122. The LNP of any preceding embodiment, further comprising one or more neutral lipids (e.g., DSPC, DPPC, DMPC, DOPC, POPC, DOPE, or SM), a steroid (e.g., cholesterol), and/or one or more polymer-conjugated lipids (e.g., a pegylated lipid, e.g., PEG-DAG, PEG-PE, PEG-S-DAG, PEG-cer, or a PEG dialkoxypropyl carbamate).

123. The system, kit, or polypeptide of any one of the preceding embodiments, wherein the system, polypeptide, and/or DNA encoding the same is formulated as a Lipid Nanoparticle (LNP).

124. The system, kit, or polypeptide of embodiment M1, wherein the lipid nanoparticle (or formulation comprising a plurality of lipid nanoparticles) lacks reactive impurities (e.g., aldehydes), or comprises reactive impurities (e.g., aldehydes) at less than a preselected level.

125. The system, kit, or polypeptide of embodiment M1, wherein the lipid nanoparticle (or formulation comprising a plurality of lipid nanoparticles) lacks an aldehyde, or comprises an aldehyde below a preselected level.

126. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle is comprised in a formulation comprising a plurality of the lipid nanoparticles.

127. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content.

128. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 3% total reactive impurity (e.g., aldehyde) content.

128. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.

129. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 0.3% of any single reactive impurity (e.g., aldehyde) species.

130. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 0.1% of any single reactive impurity (e.g., aldehyde) species.

131. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content.

132. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation comprises less than 3% total reactive impurity (e.g., aldehyde) content.

133. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.

134. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation comprises less than 0.3% of any single reactive impurity (e.g., aldehyde) species.

135. The system, kit, or polypeptide of any preceding embodiment, wherein the lipid nanoparticle formulation comprises less than 0.1% of any single reactive impurity (e.g., aldehyde) species.

136. The system, kit, or polypeptide of any preceding embodiment, wherein one or more, or optionally all, of the lipid reagents used in a lipid nanoparticle or formulation thereof as described herein comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content.

137. The system, kit, or polypeptide of any preceding embodiment, wherein one or more, or optionally all, of the lipid reagents used in a lipid nanoparticle or formulation thereof as described herein comprise less than 3% total reactive impurity (e.g., aldehyde) content.

138. The system, kit, or polypeptide of any preceding embodiment, wherein one or more, or optionally all, of the lipid reagents used in a lipid nanoparticle or formulation thereof as described herein comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.

139. The system, kit, or polypeptide of any preceding embodiment, wherein one or more, or optionally all, of the lipid reagents used in a lipid nanoparticle or formulation thereof as described herein comprise less than 0.3% of any single reactive impurity (e.g., aldehyde) species.

140. The system, kit, or polypeptide of any preceding embodiment, wherein one or more, or optionally all, of the lipid reagents used in a lipid nanoparticle or formulation thereof as described herein comprise less than 0.1% of any single reactive impurity (e.g., aldehyde) species.

141. The system, kit or polypeptide of any preceding embodiment, wherein the total aldehyde content and/or the amount of any single reactive impurity (e.g., aldehyde) species is determined by Liquid Chromatography (LC), e.g., in combination with tandem mass spectrometry (MS/MS), e.g., according to the method described in example 26.

142. The system, kit, or polypeptide of any preceding embodiment, wherein the total aldehyde content and/or the amount of any reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications of a nucleic acid molecule (e.g., as described herein) associated with, e.g., the presence of a reactive impurity (e.g., an aldehyde) in a lipid reagent.

143. The system, kit, or polypeptide of any preceding embodiment, wherein the total aldehyde content and/or amount of aldehyde species is determined by detecting one or more chemical modifications of a nucleotide or nucleoside (e.g., ribonucleotide or ribonucleoside, e.g., comprised in or isolated from a nucleic acid molecule, e.g., as described herein) associated with, e.g., the presence of a reactive impurity (e.g., an aldehyde) in such lipid reagent, e.g., as described in example 27.

144. The system, kit, or polypeptide of embodiment M21, wherein the chemical modification of a nucleic acid molecule, nucleotide, or nucleoside is detected by determining the presence of one or more modified nucleotides or nucleosides, e.g., using LC-MS/MS analysis, e.g., as described in example 27.

145. A method of modifying a target DNA strand in a cell, tissue, or subject, the method comprising administering to the cell, tissue, or subject any of the foregoing numbered systems, wherein the system reverse transcribes the template RNA sequence into the target DNA strand, thereby modifying the target DNA strand, and wherein the cell has reduced activity of a Rad51 repair pathway, reduced expression of Rad51, or a component of a Rad51 repair pathway, or does not comprise a functional Rad51 repair pathway, e.g., does not comprise a functional Rad51 gene, e.g., comprises a mutation (e.g., a deletion) that inactivates one or both copies of the Rad51 gene or another gene in a Rad51 repair pathway.

146. A host cell (e.g., a mammalian cell, such as a human cell) comprising any of the foregoing numbered systems, wherein the host cell has reduced activity of the Rad51 repair pathway, reduced expression of Rad51 or a component of the Rad51 repair pathway, or does not comprise a functional Rad51 repair pathway, e.g., does not comprise a functional Rad51 gene, e.g., comprises a mutation (e.g., a deletion) that inactivates one or both copies of the Rad51 gene or another gene in the Rad51 repair pathway.

147. The system of any preceding embodiment, wherein the polypeptide binds to a promoter region, 5' UTR region, exon, intron, or 3' UTR region of a sequence (e.g., a gene or fragment thereof) of any of tables 10A-10D or 11A-11G.

148. The system of any preceding embodiment, wherein the polypeptide further comprises a heterologous linker that replaces (i) a target DNA binding domain, (ii) a reverse transcriptase domain, optionally (iii) a portion of an endonuclease domain, or replaces an endogenous linker that connects two of (i), (ii), or (iii), wherein optionally the linker is a linker of table 38.

149. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (i).

150. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (ii).

151. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (iii).

152. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (i) and (ii).

153. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (i) and (iii).

154. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (ii) and (iii).

155. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, an endogenous linker linking (i) and (ii).

156. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, an endogenous linker linking (i) and (iii).

157. The system of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, an endogenous linker linking (ii) and (iii).

158. The system of any preceding embodiment, wherein the heterologous linker comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).

159. The system of any preceding embodiment, wherein the heterologous linker comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, or 500 amino acids.

160. The method of any one of the preceding embodiments, wherein the tissue is liver, lung, skin, muscle tissue (e.g., skeletal muscle), eye or eye tissue, blood cells, immune cells, or the central nervous system.

161. The method of any of the preceding embodiments, wherein the cell is a Hematopoietic Stem Cell (HSC), T cell, B cell, or Natural Killer (NK) cell.

162. The method of any one of the preceding embodiments, wherein the cell is a fibroblast.

163. The method of any one of the preceding embodiments, wherein the cell is a primary cell.

164. The method of any one of the preceding embodiments, wherein the cell is not immortalized.

165. The system of any one of the preceding embodiments, wherein (a) comprises RNA and (b) comprises RNA.

166. The system of any one of the preceding embodiments, wherein (a) and (b) are part of the same nucleic acid.

167. The system of any preceding embodiment, wherein (a) and (b) are separate nucleic acids.

168. The system of any one of the preceding embodiments, comprising only RNA, or more RNA than DNA, with a RNA to DNA ratio of at least 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, or 100:1.

169. The system of any preceding embodiment, wherein the heterologous subject sequence comprises an open reading frame in a 5' to 3' orientation on the template RNA.

170. The system of any preceding embodiment, wherein the heterologous subject sequence comprises an open reading frame in a 3' to 5' orientation on the template RNA.

171. The system of any one of the preceding embodiments, wherein the sequence that binds the polypeptide is a 3' untranslated sequence.

172. The system of any preceding embodiment, wherein the template RNA further comprises a 5' untranslated sequence.

173. The system of any one of the preceding embodiments, wherein the template RNA further comprises a promoter operably linked to the heterologous subject sequence, e.g., in some embodiments, the heterologous subject sequence may comprise a promoter operably linked to a sequence, e.g., a protein coding sequence.

174. The system of any preceding embodiment, wherein the promoter is located between the 5' untranslated sequence and the heterologous subject sequence.

175. The system of any preceding embodiment, wherein the promoter is located between the 3' untranslated sequence that binds to the polypeptide and the heterologous subject sequence.

176. The system of any preceding embodiment, wherein the 5' untranslated sequence is a sequence of column 5 of table 3, or a sequence having at least 80% identity thereto.

177. The system of any preceding embodiment, wherein the 3' untranslated sequence is a sequence of column 6 of table 3, or a sequence having at least 80% identity thereto.

178. The system of any one of the preceding embodiments, wherein the heterologous subject sequence comprises an enzyme, a membrane protein, a blood factor, an intracellular protein, an extracellular protein, a structural protein, a signaling protein, a regulatory protein, a transport protein, a sensory protein, a motor protein, a defensive protein, a storage protein, an immune receptor protein (e.g., a synthetic immune receptor protein, such as a chimeric antigen receptor protein (CAR), a T cell receptor, a B cell receptor), or an antibody.

179. The system of any one of the preceding embodiments, wherein the template RNA comprises at least 5 bases or at least 10 bases at the 5' end of the template RNA that have 100% identity to a target DNA strand.

180. The system of any one of the preceding embodiments, wherein the template RNA comprises at least 5 bases or at least 10 bases at the 3' end of the template RNA that have 100% identity to a target DNA strand.

181. A method of modifying a target DNA strand in a cell, tissue or subject, the method comprising administering a system as described in any preceding embodiment to the cell, tissue or subject, thereby modifying the target DNA strand.

182. The method of any preceding embodiment, which results in the addition of an exogenous DNA sequence of at least 5 base pairs to the genome of the cell.

183. The method of any preceding embodiment, which results in the addition of an exogenous DNA sequence of at least 100 base pairs to the genome of the cell.

184. The method of any preceding embodiment, which results in insertion of the heterologous subject sequence into the target DNA at an average copy number of at least 0.01, 0.05, or 0.5 copies per genome.

185. The method of any preceding embodiment, wherein about 50-100% of the insertions of the heterologous subject sequence into the target DNA are non-truncated.

186. The method of any preceding embodiment, wherein the nucleic acid of (a) is not integrated into the genome of the cell.

187. The method of any preceding embodiment, wherein the template RNA comprises at least 5 or at least 10 bases at the 5' end of the template RNA that are 100% identical to the target DNA strand.

188. The method of any preceding embodiment, wherein the template RNA comprises at least 5 or at least 10 bases at the 3' end of the template RNA that are 100% identical to the target DNA strand.

189. The system or method of any preceding embodiment, wherein the heterologous subject sequence encodes a therapeutic polypeptide or encodes a mammalian (e.g., human) polypeptide or fragment or variant thereof.

190. The system or method of any preceding embodiment, wherein one or more of the following:

i. the heterologous subject sequence encodes a protein, such as an enzyme (e.g., a lysosomal enzyme) or a blood factor (e.g., factor I, II, V, VII, X, XI, XII, or XIII);

ii. the heterologous subject sequence comprises a tissue-specific promoter or enhancer;

iii. the heterologous subject sequence encodes a polypeptide of greater than 250, 300, 400, 500, or 1,000 amino acids, and optionally up to 7,500 amino acids;

iv. the heterologous subject sequence encodes a fragment of a mammalian gene, but not the complete mammalian gene, e.g., encodes one or more exons, but not the full-length protein;

v. the heterologous subject sequence encodes one or more introns;

vi. the heterologous subject sequence is not GFP, e.g., is not a fluorescent protein or is not a reporter protein; or

vii. the heterologous subject sequence is not a T cell chimeric antigen receptor.

191. The system or method of any preceding embodiment, wherein one or both of the reverse transcriptase domain or endonuclease domain is derived from an avian reverse transcriptase, e.g., has a sequence of table 1 or 3 or is at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical thereto.

192. The system or method of any preceding embodiment, wherein the activity of the polypeptide at 37 ℃ is not less than 70%, 75%, 80%, 85%, 90%, or 95% of the activity under otherwise similar conditions at 25 ℃.

193. The system or method of any preceding embodiment, wherein the polypeptide is derived from an avian retrotransposase, e.g., an avian retrotransposase of column 8 of table 3, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

194. The system or method of any preceding embodiment, wherein the avian reverse transcriptase is from a zebra finch (Taeniopygia guttata), a medium ground finch (Geospiza fortis), a white-throated sparrow (Zonotrichia albicollis), or a white-throated tinamou (Tinamus guttatus), or has a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

195. The system or method of any preceding embodiment, wherein the polypeptide is derived from a retrotransposase of column 8 of table 3, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

196. The system of any one of the preceding embodiments, wherein the template RNA comprises a sequence of table 3 (e.g., one or both of the 5' untranslated region of column 6 of table 3 and the 3' untranslated region of column 7 of table 3), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

197. The system or method of any preceding embodiment, wherein one or more of the following:

i. the nucleic acid encoding the polypeptide and the template RNA or the nucleic acid encoding the template RNA are separate nucleic acids;

ii. the template RNA does not encode an active reverse transcriptase, e.g., comprises an inactivated mutant reverse transcriptase, e.g., as described in examples 1-2, or does not comprise a reverse transcriptase sequence;

iii. the template RNA does not encode an active endonuclease, e.g., comprises an inactivated endonuclease or does not comprise an endonuclease; or

iv. the template RNA comprises one or more chemical modifications.

198. The system or method of any preceding embodiment, wherein the template RNA (or DNA encoding the template RNA) further comprises a promoter operably linked to the heterologous subject sequence,

wherein the promoter is located between the 5' untranslated sequence that binds to the polypeptide and the heterologous sequence, or wherein the promoter is located between the 3' untranslated sequence that binds to the polypeptide and the heterologous sequence.

199. The system or method of any preceding embodiment, wherein the template RNA (or DNA encoding the template RNA) further comprises a 5' untranslated sequence that binds to the polypeptide and a 3' untranslated sequence that binds to the polypeptide, and

wherein the heterologous subject sequence comprises an open reading frame (or reverse complement thereof) in a 5' to 3' orientation on the template RNA; or

wherein the heterologous subject sequence comprises an open reading frame (or reverse complement thereof) in a 3' to 5' orientation on the template RNA.

200. The system or method of any preceding embodiment, wherein at least one of the reverse transcriptase domain, endonuclease domain, or target DNA binding domain is heterologous.

201. The system or method of any preceding embodiment, wherein the polypeptide comprises a sequence having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a reverse transcriptase domain of an apurinic/apyrimidinic endonuclease (APE) type non-LTR retrotransposon and a sequence having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to an endonuclease domain of an APE type non-LTR retrotransposon.

202. The system or method of any preceding embodiment, wherein the polypeptide comprises a sequence having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a reverse transcriptase domain of a restriction enzyme-like endonuclease (RLE) type non-LTR retrotransposon and a sequence having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to an endonuclease domain of the RLE type non-LTR retrotransposon.

203. The system or method of any preceding embodiment, wherein the RT domain comprises a sequence selected from the group consisting of the sequences of table 1 or 3 or the sequence of the reverse transcriptase domain of table 2, wherein the RT domain further comprises a plurality of substitutions relative to the native sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions.

204. The system or method of any preceding embodiment, wherein the RT domain comprises a sequence selected from the group consisting of the sequences of table 1 or 3 or the sequence of a reverse transcriptase domain of table 2 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

205. The system or method of any preceding embodiment, wherein the template RNA comprises a promoter operably linked to the heterologous subject sequence.

206. The system or method of any one of the preceding embodiments, wherein the polypeptide further comprises (iii) a DNA binding domain.

207. The system or method of any of embodiments 140-144, wherein the polypeptide comprises a sequence having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to the sequence of SEQ ID No. 1016.

208. The system or method of any of the preceding embodiments, wherein the polypeptide comprises a sequence having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a sequence in column 8 of table 3.

209. The system or method of any one of the preceding embodiments, wherein the nucleic acid encoding the polypeptide and the template RNA or the nucleic acid encoding the template RNA is covalently linked, e.g., part of a fusion nucleic acid.

210. The system or method of any preceding embodiment, wherein the fusion nucleic acid comprises RNA.

211. The system or method of any preceding embodiment, wherein the fusion nucleic acid comprises DNA.

212. The system or method of any one of the preceding embodiments, wherein (b) comprises a template RNA.

213. The system or method of any preceding embodiment, wherein the template RNA further comprises a nuclear localization signal.

214. The system or method of any preceding embodiment, wherein the RNA of (a) does not comprise a nuclear localization signal.

215. The system or method of any preceding embodiment, wherein the polypeptide further comprises a nuclear localization signal and/or a nucleolar localization signal.

216. The system or method of any one of the preceding embodiments, wherein (a) comprises an RNA encoding: (i) the polypeptide and (ii) a nuclear localization signal and/or a nucleolar localization signal.

217. The system or method of any one of the preceding embodiments, wherein the RNA comprises a pseudoknot sequence, e.g., 5' of the heterologous subject sequence.

218. The system or method of any preceding embodiment, wherein the RNA comprises a stem loop sequence or helix 5' to a pseudoknot sequence.

219. The system or method of any preceding embodiment, wherein the RNA comprises one or more (e.g., 2, 3, or more) stem-loop sequences or helices 3' of a pseudoknot sequence, e.g., 3' of the pseudoknot sequence and 5' of a heterologous subject sequence.

220. The system or method of any preceding embodiment, wherein the template RNA comprising the pseudoknot has catalytic activity, e.g., RNA cleavage activity, e.g., cis-RNA cleavage activity.

221. The system or method of any one of the preceding embodiments, wherein the RNA comprises at least one stem-loop sequence or helix, e.g., 1, 2, 3, 4, 5, or more stem-loop, hairpin, or helix sequences, e.g., 3' of the heterologous subject sequence.

222. Any of the foregoing numbered systems or methods, wherein the polypeptide comprises a sequence of at least 50 amino acids (e.g., at least 100, 150, 200, 300, 500 amino acids) that is at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to a sequence of a polypeptide listed in tables 1-3 or a reverse transcriptase domain or endonuclease domain thereof.

223. Any of the foregoing numbered systems or methods, wherein the polypeptide comprises a sequence of at least 50 amino acids (e.g., at least 100, 150, 200, 300, 500 amino acids) that is at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to a sequence of a polypeptide listed in any of tables 1-3, or a reverse transcriptase domain, endonuclease domain, or DNA binding domain thereof.

224. Any of the foregoing numbered systems or methods, wherein the polypeptide comprises a sequence of at least 50 amino acids (e.g., at least 100, 150, 200, 300, 500 amino acids) that is at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to an amino acid sequence of column 8 of table 3 or a reverse transcriptase domain, endonuclease domain, or DNA binding domain thereof.

225. Any of the foregoing numbered systems or methods, wherein the template RNA comprises the sequence of table 3 (e.g., one or both of the 5' untranslated region of column 6 of table 3 and the 3' untranslated region of column 7 of table 3), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

226. The system or method of any preceding embodiment, wherein the template RNA comprises a sequence of about 100-125 bp from the 3' untranslated region of column 7 of table 3, e.g., wherein the sequence comprises, or has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to, nucleotides 1-100, 101-200, or 201-325 of the 3' untranslated region of column 7 of table 3.

227. Any of the foregoing numbered systems or methods, wherein (a) comprises RNA and (b) comprises RNA.

228. Any of the foregoing numbered systems or methods, wherein (a), (b) or (a) and (b) do not comprise DNA, or do not comprise more than 10%, 5%, 4%, 3%, 2%, or 1% DNA by mass or molar basis.

229. Any of the foregoing numbered systems, which are capable of modifying DNA by insertion of the heterologous subject sequence without an intervening DNA-dependent RNA polymerization of (b).

230. Any of the foregoing numbered systems, which are capable of modifying DNA by target-primed reverse transcription.

231. Any of the foregoing numbered systems, which are capable of modifying DNA by insertion of a heterologous subject sequence in the presence of a DNA repair pathway inhibitor (e.g., SCR7, a PARP inhibitor), or in a cell line lacking a DNA repair pathway (e.g., a cell line lacking a nucleotide excision repair pathway or homology-directed repair pathway).

232. Any of the above numbered systems, which do not cause the formation of detectable levels of double strand breaks in the target cells.

233. Any of the above numbered systems, which are capable of modifying DNA using reverse transcriptase activity, optionally in the absence of homologous recombination activity.

234. Any of the foregoing numbered systems, wherein the template RNA has been treated to reduce secondary structure, e.g., heated to a temperature that reduces secondary structure, e.g., to at least 70, 75, 80, 85, 90, or 95 ℃.

235. The system of any preceding embodiment, wherein the template RNA is subsequently cooled, e.g., to a temperature that allows secondary structure to form, e.g., to less than or equal to 30 ℃, 25 ℃, or 20 ℃.

236. A host cell (e.g., a mammalian cell, such as a human cell) comprising any of the foregoing numbered systems.

237. The method of any preceding embodiment, wherein the cell, tissue, or subject is a mammalian (e.g., human) cell, tissue, or subject.

238. The method of any one of the preceding embodiments, wherein the cell is a fibroblast.

239. The method of any one of the preceding embodiments, wherein the cell is a primary cell.

240. The method of any one of the preceding embodiments, wherein the cell is not immortalized.

241. A method of modifying the genome of a mammalian cell, the method comprising contacting the cell with a system as described in any preceding embodiment.

242. A method of inserting DNA into the genome of a mammalian cell, the method comprising contacting the cell with a system as described in any preceding embodiment.

243. A method of adding at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, or 1000 bp of exogenous DNA to the genome of a mammalian cell without delivering the DNA to the cell, the method comprising contacting the cell with a system as described in any preceding embodiment.

244. A method of adding at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, or 1000 bp of exogenous DNA to the genome of a mammalian cell, the method comprising contacting the cell with a system as described in any preceding embodiment,

wherein the method does not comprise contacting the mammalian cell with DNA, or wherein the method comprises contacting the mammalian cell with a composition comprising less than 1%, 0.5%, 0.2%, 0.1%, 0.05%, 0.02%, or 0.01% DNA by mass or molar basis.

245. A method of adding at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, or 1000 bp of exogenous DNA to the genome of a mammalian cell, the method comprising contacting the cell with a system as described in any preceding embodiment, wherein the method delivers only RNA to the mammalian cell.

246. A method of adding at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, or 1000 bp of exogenous DNA to the genome of a mammalian cell, the method comprising contacting the cell with a system as described in any preceding embodiment, wherein the method delivers RNA and protein to the mammalian cell.

247. The method of any preceding embodiment, wherein the template RNA is used as a template for insertion of the exogenous DNA.

248. The method of any preceding embodiment, which does not comprise DNA-dependent RNA polymerization of exogenous DNA.

249. The method of any preceding embodiment, which results in the addition of at least 5, 10, 20, 50, 100, 200, 500, 1,000, 2,000, or 5,000 base pairs of DNA to the genome of a cell, e.g., a mammalian cell.

250. A method of modifying the genome of a human cell, the method comprising contacting the cell with a system as described in any preceding embodiment,

wherein the method results in insertion of the heterologous subject sequence into the genome of the human cell,

wherein the human cell does not exhibit up-regulation of any DNA repair gene and/or tumor suppressor gene, or wherein no DNA repair gene and/or tumor suppressor gene is up-regulated by more than 10%, 5%, 2%, or 1%, e.g., wherein up-regulation is measured by RNA-seq, e.g., as described in Example 14 of PCT/US2019/048607 (incorporated herein by reference).

251. A method of adding an exogenous coding region to the genome of a cell (e.g., a mammalian cell), the method comprising contacting the cell with a system as described in any preceding embodiment, wherein the template RNA comprises a non-coding strand of the exogenous coding region, wherein optionally the template RNA does not comprise a coding strand of the exogenous coding region, wherein optionally the delivering comprises non-viral delivering.

252. A method of expressing a polypeptide in a cell (e.g., a mammalian cell), the method comprising contacting the cell with a system as described in any preceding embodiment, wherein the template RNA comprises a non-coding strand that is the reverse complement of a sequence that will encode the polypeptide, wherein optionally the template RNA does not comprise a coding strand that encodes the polypeptide, wherein optionally the delivering comprises non-viral delivering.
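The strand relationship in embodiments 251 and 252 can be illustrated with a minimal Python sketch. It assumes the coding strand is supplied as a plain DNA string; the function names `reverse_complement_dna` and `noncoding_template_rna` are hypothetical and do not come from this disclosure. The sketch only shows how a template RNA carrying the non-coding strand relates to the sequence that will encode the polypeptide.

```python
# Illustrative only: helper names are hypothetical, not part of the disclosure.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_dna(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def noncoding_template_rna(coding_dna: str) -> str:
    """Return an RNA string that is the reverse complement of the coding strand.

    Reverse transcription of this RNA yields DNA matching the coding strand.
    """
    return reverse_complement_dna(coding_dna).upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCCTAA"  # toy open reading frame: Met-Ala-stop
    print(noncoding_template_rna(coding))  # UUAGGCCAU
```

Because the RNA carries only the non-coding strand, it cannot itself be translated into the polypeptide; the coding strand exists only after reverse transcription into the genome.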

253. The method of any preceding embodiment, wherein the sequence inserted into the mammalian genome is a sequence exogenous to the mammalian genome.

254. The method of any preceding embodiment, wherein the system operates independently of a DNA template.

255. The method of any preceding embodiment, wherein the cell is part of a tissue.

256. The method of any preceding embodiment, wherein the mammalian cell is euploid, is not immortalized, is part of an organism, is a primary cell, is non-dividing, is a hepatocyte, or is from a subject with a genetic disorder.

257. The method of any preceding embodiment, wherein the contacting comprises contacting the cell with a plasmid, virus-like particle, virosome, liposome, vesicle, exosome, fusosome, or lipid nanoparticle.

258. The method of any preceding embodiment, wherein the contacting comprises using non-viral delivery.

259. The method of any preceding embodiment, comprising contacting the cell with the template RNA (or DNA encoding the template RNA), wherein the template RNA comprises a non-coding strand of an exogenous coding region, wherein optionally the template RNA does not comprise a coding strand of the exogenous coding region, wherein optionally delivering comprises non-viral delivery, thereby adding the exogenous coding region to the genome of the cell.

260. The method of any preceding embodiment, comprising contacting the cell with the template RNA (or DNA encoding the template RNA), wherein the template RNA comprises a non-coding strand that is the reverse complement of a sequence that will encode the polypeptide, wherein optionally the template RNA does not comprise a coding strand that encodes the polypeptide, wherein optionally delivering comprises non-viral delivering, thereby expressing the polypeptide in the cell.

261. The method of any preceding embodiment, wherein the contacting comprises administering (a) and (b) to the subject, e.g., intravenously.

262. The method of any preceding embodiment, wherein the contacting comprises administering doses of (a) and (b) to the subject at least twice.

263. The method of any preceding embodiment, wherein the polypeptide reverse transcribes the template RNA sequence into the target DNA strand, thereby modifying the target DNA strand.

264. The method of any preceding embodiment, wherein (a) and (b) are administered separately.

265. The method of any preceding embodiment, wherein (a) and (b) are administered together.

266. The method of any preceding embodiment, wherein the nucleic acid of (a) is not integrated into the genome of the host cell.

267. The method of any of the preceding numbered embodiments, wherein the sequence that binds the polypeptide has one or more of the following features:

(a) Is located at the 3' end of the template RNA;

(b) Is located at the 5' end of the template RNA;

(c) Is a non-coding sequence;

(d) Is a structured RNA; or

(e) Forms at least 1 hairpin loop structure.

268. The method of any of the preceding numbered embodiments, wherein the template RNA further comprises a sequence comprising at least 20 nucleotides having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, or 100% identity) to the target DNA strand.
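As a rough illustration of the identity thresholds recited here, the following Python sketch computes percent identity over an ungapped, equal-length comparison and scans for a window that meets a threshold. The ungapped comparison, the treatment of U as equivalent to T, and the function names `percent_identity` and `has_homology_window` are assumptions made for illustration; the disclosure does not prescribe this particular calculation.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length sequences.

    U is treated as T so that an RNA can be compared with a DNA strand.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def has_homology_window(template_rna: str, target_dna: str,
                        window: int = 20, min_identity: float = 80.0) -> bool:
    """True if any `window`-nt stretch of the template matches any same-length
    stretch of the target at or above `min_identity` percent (ungapped)."""
    for i in range(len(template_rna) - window + 1):
        for j in range(len(target_dna) - window + 1):
            if percent_identity(template_rna[i:i + window],
                                target_dna[j:j + window]) >= min_identity:
                return True
    return False


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGT"
    template = "ACGUACGUACGUACGUAAAA"  # 17 of 20 positions match the target
    print(percent_identity(template, target))                        # 85.0
    print(has_homology_window(template, target))                     # True
    print(has_homology_window(template, target, min_identity=90.0))  # False
```

A gapped alignment tool could be substituted where insertions or deletions matter; the brute-force scan is kept only because it is easy to follow.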

269. The method of any of the preceding numbered embodiments, wherein the template RNA further comprises a sequence comprising at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 nucleotides having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, or 100% identity) to the target DNA strand.

270. The method of any of the preceding numbered embodiments, wherein the template RNA comprises, at its 3' end, a sequence of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 nucleotides (or of about 2-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 10-100, or 2-100 nucleotides) having at least 80% identity to the target DNA strand.

271. The method of any of the preceding numbered embodiments, wherein the template RNA further comprises a sequence comprising at least 100 nucleotides having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, or 100% identity) to the target DNA strand, e.g., at the 3' end of the template RNA.

272. The method of any preceding embodiment, wherein a site in the target DNA strand that has at least 80% identity to the sequence is near a target site (e.g., within about: 0-10, 10-20, 20-30, 30-50, or 50-100 nucleotides of the target site) on the target DNA strand that is recognized (e.g., bound and/or cleaved) by a polypeptide comprising the endonuclease.

273. The method of any of the preceding numbered embodiments, wherein the template RNA comprises, at its 3' end, a sequence of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 nucleotides (or of about 2-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 10-100, or 2-100 nucleotides) having at least 80% identity to the target DNA strand;

optionally, wherein the sequence has at least 80% identity to a site in the target DNA strand that is near (e.g., within about: 0-10, 10-20, or 20-30 nucleotides of) a target site on the target DNA strand that is recognized (e.g., bound and/or cleaved) by a polypeptide comprising the endonuclease.

274. The method of any preceding embodiment, wherein the target site is a site in the human genome that has closest identity to a native target site of a polypeptide comprising the endonuclease, e.g., wherein the target site in the human genome has at least about 16, 17, 18, 19, or 20 nucleotides identical to the native target site.

275. The method of any of the preceding numbered embodiments, wherein the template RNA has at least 3, 4, 5, 6, 7, 8, 9, or 10 bases of 100% identity to the target DNA strand.

276. The method of any of the preceding numbered embodiments, wherein the at least 3, 4, 5, 6, 7, 8, 9, or 10 bases with 100% identity to the target DNA strand are located at the 3' end of the template RNA.

277. The method of any of the preceding numbered embodiments, wherein the at least 3, 4, 5, 6, 7, 8, 9, or 10 bases with 100% identity to the target DNA strand are located at the 5' end of the template RNA.

278. The method of any of the preceding numbered embodiments, wherein the template RNA comprises at least 3, 4, 5, 6, 7, 8, 9, or 10 bases at the 5' end of the template RNA that are 100% identical to the target DNA and at least 3, 4, 5, 6, 7, 8, 9, or 10 bases at the 3' end of the template RNA that are 100% identical to the target DNA.

279. The method of any of the preceding numbered embodiments, wherein the heterologous subject sequence is between 50 and 50,000 base pairs (e.g., between 50 and 40,000 bp, between 500 and 30,000 bp, between 500 and 20,000 bp, between 100 and 15,000 bp, between 500 and 10,000 bp, or between 50 and 5,000 bp).

280. The method of any of the preceding numbered embodiments, wherein the heterologous subject sequence is at least 10, 25, 50, 100, 150, 200, 250, 300, 400, 500, 600, or 700 bp.

281. The method of any of the preceding numbered embodiments, wherein the heterologous subject sequence is at least 715, 750, 800, 950, 1,000, 2,000, 3,000, or 4,000 bp.

282. The method of any of the preceding numbered embodiments, wherein the heterologous subject sequence is less than 5,000, 10,000, 15,000, 20,000, 30,000, or 40,000 bp.

283. The method of any of the preceding numbered embodiments, wherein the heterologous subject sequence is less than 700, 600, 500, 400, 300, 200, 150, or 100 bp.

284. The method of any of the preceding numbered embodiments, wherein the heterologous subject sequence comprises:

(a) Open reading frames, e.g., sequences encoding polypeptides, e.g., enzymes (e.g., lysosomal enzymes), membrane proteins, blood factors, exons, intracellular proteins (e.g., cytoplasmic proteins, nuclear proteins, organelle proteins, e.g., mitochondrial proteins or lysosomal proteins), extracellular proteins, structural proteins, signaling proteins, regulatory proteins, transport proteins, sensory proteins, motor proteins, defensin proteins, or storage proteins;

(b) Non-coding and/or regulatory sequences, such as sequences that bind transcriptional regulators, e.g., promoters, enhancers, insulators;

(c) Splice acceptor sites;

(d) A poly(A) site;

(e) An epigenetic modification site; or

(f) A gene expression unit.

285. The method of any of the preceding numbered embodiments, wherein the target DNA is a Genomic Safe Harbor (GSH) site.

286. The method of any of the preceding numbered embodiments, wherein the target DNA is a genomic Natural Harbor™ site.

287. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence into a target site in the genome at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4, or 5 copies per genome.

288. The method of any of the preceding numbered embodiments, which results in about 25-100%, 50-100%, 60-100%, 70-100%, 75-95%, or 80-90% of the integrants into the target site in the genome being non-truncated, as measured by an assay described herein, e.g., the assay of Example 6.

289. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence at only one target site in the genome of the cell.

290. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence into a target site of a cell, the inserted heterologous sequence comprising less than 10%, 5%, 2%, 1%, 0.5%, 0.2%, or 0.1% mutation (e.g., SNP or one or more deletions (e.g., truncations or internal deletions)) relative to the heterologous sequence prior to insertion, e.g., as measured by the assay of Example 12 of PCT/US2019/048607 (incorporated herein by reference).

291. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence into a target site of a plurality of cells, wherein less than 10%, 5%, 2%, or 1% of the copies of the inserted heterologous sequence comprise a mutation (e.g., a SNP or deletion, e.g., a truncation or internal deletion), e.g., as measured by the assay of Example 12 of PCT/US2019/048607 (incorporated herein by reference).

292. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence into the genome of the target cell, and wherein the target cell does not exhibit up-regulation of p53, or exhibits less than 10%, 5%, 2%, or 1% up-regulation of p53, e.g., wherein up-regulation of p53 is measured by p53 protein levels, e.g., according to the method described in Example 30 of PCT/US2019/048607, or by the levels of p53 phosphorylated at Ser15 and Ser20.

293. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence into the genome of the target cell, and wherein the target cell does not exhibit up-regulation of any DNA repair gene and/or tumor suppressor gene, or wherein no DNA repair gene and/or tumor suppressor gene is up-regulated by more than 10%, 5%, 2%, or 1%, e.g., wherein up-regulation is measured by RNA-seq, e.g., as described in Example 14 of PCT/US2019/048607 (incorporated herein by reference).

294. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence into a target site (e.g., at a copy number of 1 insertion or more than one insertion) in about 1%-80% of the cells (e.g., about 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70%, or 70%-80% of the cells) in the population of cells contacted with the system, e.g., as measured using single cell ddPCR, e.g., as described in Example 17 of PCT/US2019/048607 (incorporated herein by reference).

295. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence, at a copy number of 1 insertion, into a target site in about 1%-80% of the cells (e.g., about 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70%, or 70%-80% of the cells) in the population of cells contacted with the system, e.g., as measured using colony isolation and ddPCR, e.g., as described in Example 18 of PCT/US2019/048607 (incorporated herein by reference).

296. The method of any of the preceding numbered embodiments, which results in insertion of the heterologous subject sequence into the target site (on-target insertion) at a higher rate in the cell population than into non-target sites (off-target insertion), wherein the ratio of on-target insertion to off-target insertion is greater than 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, 200:1, 500:1, or 1,000:1, e.g., using the assay of Example 11 of PCT/US2019/048607 (incorporated herein by reference).
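The ratio recited in embodiment 296 is simple arithmetic, sketched below in Python under the assumption that target-site and off-target insertion events have already been counted by some assay. The counts in the usage example are made up for illustration, and the names `on_to_off_target_ratio` and `highest_threshold_exceeded` are hypothetical.

```python
from typing import Optional

# Ratio thresholds recited in embodiment 296 (e.g., 10 means "greater than 10:1").
THRESHOLDS = (10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 500, 1000)


def on_to_off_target_ratio(on_target: int, off_target: int) -> float:
    """Ratio of target-site insertions to off-target insertions."""
    if on_target < 0 or off_target < 0:
        raise ValueError("counts must be non-negative")
    if off_target == 0:
        return float("inf")  # no off-target events detected
    return on_target / off_target


def highest_threshold_exceeded(ratio: float) -> Optional[int]:
    """Largest recited threshold that the ratio is strictly greater than."""
    met = [t for t in THRESHOLDS if ratio > t]
    return max(met) if met else None


if __name__ == "__main__":
    ratio = on_to_off_target_ratio(950, 5)    # illustrative counts only
    print(ratio)                              # 190.0
    print(highest_threshold_exceeded(ratio))  # 100
```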

297. Any of the preceding numbered systems, which results in insertion of a heterologous subject sequence in the presence of a DNA repair pathway inhibitor (e.g., SCR7, a PARP inhibitor), or in a cell line lacking a DNA repair pathway (e.g., a cell line lacking a nucleotide excision repair pathway or homology-directed repair pathway).

298. Any of the preceding numbered systems, which can be formulated as a pharmaceutical composition.

299. Any of the preceding numbered systems, in a pharmaceutically acceptable carrier (e.g., vesicles, liposomes, natural or synthetic lipid bilayers, lipid nanoparticles, exosomes).

300. A method of preparing a system for modifying DNA (e.g., as described herein), the method comprising:

(a) Providing a template nucleic acid (e.g., template RNA or DNA) comprising a heterologous homologous sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence comprised in a target DNA molecule, and/or

(b) Providing a polypeptide of the system (e.g., comprising a DNA Binding Domain (DBD) and/or an endonuclease domain) that comprises a heterologous targeting domain that specifically binds to a sequence comprised in the target DNA molecule.

301. The method of any preceding embodiment, wherein:

(a) Comprises introducing into the template nucleic acid (e.g., template RNA or DNA) a heterologous homologous sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence comprised in the target DNA molecule, and/or

(b) Comprises introducing into the polypeptide of the system (e.g., comprising a DNA Binding Domain (DBD) and/or an endonuclease domain) a heterologous targeting domain that specifically binds to a sequence comprised in the target DNA molecule.

302. The method of any preceding embodiment, wherein the introducing of (a) comprises inserting the homologous sequence into the template nucleic acid.

303. The method of any preceding embodiment, wherein the introducing of (a) comprises replacing a segment of the template nucleic acid with the homologous sequence.

304. The method of any preceding embodiment, wherein the introducing of (a) comprises mutating one or more nucleotides (e.g., at least 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, or 100 nucleotides) of the template nucleic acid, thereby producing a segment of the sequence of the template nucleic acid having the homologous sequence.

305. The method of any preceding embodiment, wherein the introducing of (b) comprises inserting the amino acid sequence of the targeting domain into the amino acid sequence of the polypeptide.

306. The method of any preceding embodiment, wherein the introducing of (b) comprises inserting a nucleic acid sequence encoding the targeting domain into a coding sequence contained in a nucleic acid molecule of the polypeptide.

307. The method of any preceding embodiment, wherein the introducing of (b) comprises replacing at least a portion of the polypeptide with the targeting domain.

308. The method of any preceding embodiment, wherein the introducing of (b) comprises mutating one or more amino acids (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, or more amino acids) of the polypeptide.

309. The method of any preceding embodiment, wherein a motif recognized by the endonuclease domain of a Gene Writer polypeptide (e.g., a motif of at least 2, 4, 6, 8, 10, 20, 30, 40, or 50 nt, or of no more than 50, 40, 30, 20, 10, 8, 6, 4, or 2 nt, or of fewer than 3 nt) is used as a seed for retargeting a Gene Writing system, wherein the DNA binding domain is modified such that binding of the Gene Writer polypeptide to the new target site results in proper positioning of the endonuclease domain at the core motif to achieve endonuclease activity, optionally wherein the motif recognized by the endonuclease domain comprises 4, 5, 6, 7, 8, 9, or 10 consecutive nucleotides of TTAAGGTAGC, AAGGTAGCCAAA, or TAAGGTAGCCAAA, or wherein the motif recognized by the endonuclease domain comprises 2, 3, or 4 consecutive nucleotides of AAGG.

310. The method of any preceding embodiment, wherein an AAGG sequence in the genome is used as a seed for retargeting the Gene Writing system, wherein the DNA binding domain is modified such that binding of the Gene Writer polypeptide to the new target site results in proper positioning of the endonuclease domain at the AAGG motif to achieve endonuclease activity.
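The use of a short motif such as AAGG as a retargeting seed presupposes locating its occurrences in a genomic sequence. A minimal Python sketch of that search follows, assuming the genome is available as a plain string and scanning both strands. The function name `find_motif_sites` and the toy sequence are hypothetical; the sketch does not model DNA binding domain design or endonuclease positioning.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motif_sites(genome: str, motif: str = "AAGG"):
    """List (position, strand) for every occurrence of `motif`.

    Positions are 0-based start coordinates on the forward strand; a '-'
    strand hit means the reverse complement of the motif occurs there.
    """
    genome = genome.upper()
    motif = motif.upper()
    rev_comp = motif.translate(_COMPLEMENT)[::-1]
    patterns = [("+", motif)]
    if rev_comp != motif:  # avoid double-counting palindromic motifs
        patterns.append(("-", rev_comp))
    sites = []
    for strand, pattern in patterns:
        start = genome.find(pattern)
        while start != -1:
            sites.append((start, strand))
            start = genome.find(pattern, start + 1)  # allow overlapping hits
    return sorted(sites)


if __name__ == "__main__":
    toy_genome = "TTAAGGTAGCCTTA"  # made-up sequence for illustration
    print(find_motif_sites(toy_genome))  # [(2, '+'), (9, '-')]
```

Each hit is only a candidate seed: whether a given site is usable depends on whether a DNA binding domain can be modified to bind nearby so that the endonuclease domain is positioned at the motif.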

311. A method for modifying a target site in genomic DNA in a cell, the method comprising contacting the cell with:

(b) A template RNA (or DNA encoding the template RNA) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to the target site (e.g., the second strand of the site in the target genome), (ii) optionally, a sequence that binds to the polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' target homologous domain,

wherein:

(i) The polypeptide comprises a heterologous targeting domain (e.g., in the DBD or the endonuclease domain) that specifically binds a sequence contained in or near a target site of the genomic DNA; and/or

(ii) The template RNA comprises a heterologous homologous sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence comprised in or adjacent to a target site of the genomic DNA;

thereby modifying a target site in genomic DNA in the cell.

312. A method of making a system for modifying the genome of a mammalian cell, the method comprising:

a) Providing a template RNA according to any one of the preceding embodiments, e.g., wherein the template RNA comprises (i) a sequence that binds to a polypeptide comprising a reverse transcriptase domain and an endonuclease domain, and (ii) a heterologous subject sequence; and

b) Treating the template RNA to reduce secondary structure, e.g., heating the template RNA to at least 70 ℃, 75 ℃, 80 ℃, 85 ℃, 90 ℃, or 95 ℃, and

c) Subsequently cooling the template RNA, e.g., to a temperature that allows for secondary structure, e.g., to less than or equal to 30 ℃, 25 ℃, or 20 ℃.

313. The method of any preceding embodiment, further comprising contacting the template RNA with a polypeptide comprising (i) a reverse transcriptase domain and (ii) an endonuclease domain, or with a nucleic acid (e.g., RNA) encoding the polypeptide.

314. The method of any preceding embodiment, further comprising contacting the template RNA with a cell.

315. The system or method of any one of the preceding embodiments, wherein the heterologous subject sequence encodes a therapeutic polypeptide.

316. The system or method of any of the preceding embodiments, wherein the heterologous subject sequence encodes a mammalian (e.g., human) polypeptide or fragment or variant thereof.

317. The system or method of any of the preceding embodiments, wherein the heterologous subject sequence encodes an enzyme (e.g., a lysosomal enzyme), a blood factor (e.g., factor I, II, V, VII, X, XI, XII or XIII), a membrane protein, an exon, an intracellular protein (e.g., a cytoplasmic protein, a nuclear protein, an organelle protein, such as a mitochondrial protein or lysosomal protein), an extracellular protein, a structural protein, a signaling protein, a regulatory protein, a transporter protein, a sensory protein, a motor protein, a defensin protein, or a storage protein.

318. The system or method of any one of the preceding embodiments, wherein the heterologous subject sequence comprises a tissue specific promoter or enhancer.

319. The system or method of any one of the preceding embodiments, wherein the heterologous subject sequence encodes a polypeptide of greater than 250, 300, 400, 500, or 1,000 amino acids, and optionally up to 1300 amino acids.

320. The system or method of any of the preceding embodiments, wherein the heterologous subject sequence encodes a fragment of a mammalian gene, but not a complete mammalian gene, e.g., encodes one or more exons, but not a full-length protein.

321. The system or method of any one of the preceding embodiments, wherein the heterologous subject sequence encodes one or more introns.

322. The system or method of any of the preceding embodiments, wherein the heterologous subject sequence is different from GFP, e.g., different from a fluorescent protein or different from a reporter protein.

323. The system or method of any of the preceding embodiments, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain, wherein one or both of (i) or (ii) is derived from an avian retrotransposase, e.g., has a sequence of Table 1 or 3, or a protein domain listed in Table 2, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

324. The system or method of any preceding embodiment, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain, wherein one or both of (i) or (ii) is derived from an avian retrotransposase, and wherein one or both of (i) or (ii) further comprises a plurality of substitutions relative to the native sequence, for example at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions.

325. The system or method of any one of the preceding embodiments, wherein the activity of the polypeptide at 37 ℃ is not less than 70%, 75%, 80%, 85%, 90% or 95% of the activity at 25 ℃ under otherwise similar conditions.

326. The system or method of any one of the preceding embodiments, wherein the nucleic acid encoding the polypeptide and the template RNA (or the nucleic acid encoding the template RNA) are separate nucleic acids.

327. The system or method of any of the preceding embodiments, wherein the template RNA does not encode an active reverse transcriptase, e.g., comprises an inactivated mutant reverse transcriptase as described in Example 1 or 2 of PCT/US2019/048607 (incorporated herein by reference), or does not comprise a reverse transcriptase sequence.

328. The system or method of any one of the preceding embodiments, wherein the template RNA comprises one or more chemical modifications.

329. The system or method of any one of the preceding embodiments, wherein the heterologous subject sequence is located between the promoter and the sequence that binds the polypeptide.

330. The system or method of any one of the preceding embodiments, wherein the promoter is located between the heterologous subject sequence and the sequence that binds the polypeptide.

331. The system or method of any one of the preceding embodiments, wherein the heterologous subject sequence comprises a 5' to 3' oriented open reading frame (or reverse complement thereof) on the template RNA.

332. The system or method of any one of the preceding embodiments, wherein the heterologous subject sequence comprises a 3' to 5' oriented open reading frame (or reverse complement thereof) on the template RNA.

333. The system or method of any one of the preceding embodiments, wherein the polypeptide comprises (a) a reverse transcriptase domain and (b) an endonuclease domain, wherein at least one of (a) or (b) is heterologous.

334. The system or method of any one of the preceding embodiments, wherein the polypeptide comprises (a) a target DNA binding domain, (b) a reverse transcriptase domain, and (c) an endonuclease domain, wherein at least one of (a), (b), or (c) is heterologous.

335. A polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a Reverse Transcriptase (RT) domain, (ii) a DNA Binding Domain (DBD); and (iii) an endonuclease domain; wherein the DBD and/or the endonuclease domain comprises a heterologous targeting domain that specifically binds to a sequence comprised in a target DNA molecule (e.g., genomic DNA).

336. A polypeptide or nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a first target DNA binding domain, e.g., comprising a first Zn-finger domain, (ii) a reverse transcriptase domain, (iii) an endonuclease domain, and (iv) a second target DNA binding domain, e.g., comprising a second Zn-finger domain that is heterologous to the first target DNA binding domain.

337. The polypeptide of any preceding embodiment or nucleic acid encoding the polypeptide, wherein (iii) comprises (iv).

338. A polypeptide or nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a target DNA binding domain, (ii) a reverse transcriptase domain, optionally (iii) an endonuclease domain, wherein the polypeptide comprises a heterologous linker in place of a portion of (i), (ii) or (iii), or in place of an endogenous linker linking two of (i), (ii) or (iii).

339. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (i).

340. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (ii).

341. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (iii).

342. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (i) and (ii).

343. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (i) and (iii).

344. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, a portion of (ii) and (iii).

345. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, an endogenous linker linking (i) and (ii).

346. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, an endogenous linker linking (i) and (iii).

347. The polypeptide of any preceding embodiment, wherein the heterologous linker replaces, e.g., lacks, an endogenous linker linking (ii) and (iii).

348. The polypeptide of any preceding embodiment, wherein the heterologous linker comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).

349. The polypeptide of any preceding embodiment, wherein the heterologous linker comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, or 500 amino acids.

350. A nucleic acid encoding a polypeptide according to any one of the preceding numbered embodiments.

351. A vector comprising a nucleic acid as in any preceding embodiment.

352. A host cell comprising a nucleic acid as described in any preceding embodiment.

353. A host cell comprising a polypeptide according to any one of the preceding numbered embodiments.

354. A host cell comprising a vector as set forth in any preceding embodiment.

355. A pharmaceutical composition comprising any of the foregoing numbered systems, nucleic acids, polypeptides, or vectors; and a pharmaceutically acceptable excipient or carrier.

356. The pharmaceutical composition of any preceding embodiment, wherein the pharmaceutically acceptable excipient or carrier is selected from a vector (e.g., a viral or plasmid vector), a vesicle (e.g., a liposome, an exosome, or a natural or synthetic lipid bilayer), and a lipid nanoparticle.

357. The polypeptide of any one of the preceding embodiments, wherein the polypeptide further comprises a nuclear localization sequence.

358. Any of the preceding numbered embodiments, wherein the polypeptide comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).

359. Any of the preceding numbered embodiments, wherein the reverse transcriptase domain comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).

360. Any of the preceding numbered embodiments, wherein the retrotransposase comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).

361. Any of the preceding numbered embodiments, wherein the polypeptide, reverse transcriptase domain, or retrotransposase comprises a linker comprising an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).

362. Any of the foregoing numbered embodiments, wherein the polypeptide comprises a DNA binding domain covalently linked to the remainder of the polypeptide by a linker, e.g., comprising at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, or 500 amino acids.

363. Any of the preceding embodiments, wherein the linker is attached to the remainder of the polypeptide at a position in the DNA binding domain, RNA binding domain, reverse transcriptase domain, or endonuclease domain.

364. Any of the preceding embodiments, wherein the linker is attached to the remainder of the polypeptide at a position on the N-terminal side of the alpha helical region of the polypeptide, e.g., at a position corresponding to version v1 as described in Example 26 of PCT/US2019/048607 (incorporated herein by reference).

365. Any of the foregoing embodiments, wherein the linker is attached to the remainder of the polypeptide at a position on the C-terminal side of the alpha helical region of the polypeptide, e.g., prior to the RNA binding motif (e.g., the -1 RNA binding motif), e.g., at a position corresponding to version v2 as described in Example 26 of PCT/US2019/048607 (incorporated herein by reference).

366. Any of the preceding embodiments, wherein the linker is attached to the remainder of the polypeptide at a position on the C-terminal side of the random coil region of the polypeptide, e.g., N-terminal relative to a DNA binding motif (e.g., a c-myb DNA binding motif), e.g., at a position corresponding to version v3 as described in Example 26 of PCT/US2019/048607 (incorporated herein by reference).

367. Any of the preceding embodiments, wherein the linker comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).
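The percent sequence identity recited in the linker embodiments above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned and of equal length (no gaps); in practice identity is normally computed from an alignment produced by a dedicated tool. The function name `percent_identity` and the one-substitution variant are hypothetical and are not taken from the disclosure.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: no gaps and no alignment step; positions are compared one-to-one.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


LINKER_1023 = "SGSETPGTSESATPES"  # SEQ ID NO: 1023, as given in the text
variant = "SGSETPGTSESATPEA"      # hypothetical variant with one substitution

print(percent_identity(LINKER_1023, LINKER_1023))  # identical sequences
print(percent_identity(variant, LINKER_1023))      # 15 of 16 positions match
```

Under these assumptions, a single substitution in the 16-residue linker leaves 15 of 16 positions matching, which would satisfy the 70% through 90% thresholds but not the 95% threshold.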

368. Any of the foregoing numbered embodiments, wherein a polynucleotide sequence comprising at least about 500, 1000, 2000, 3000, 3500, 3600, 3700, 3800, 3900, or 4000 consecutive nucleotides from the 5' end of the template RNA sequence is integrated into the target cell genome.

369. Any of the foregoing numbered embodiments, wherein a polynucleotide sequence comprising at least about 500, 1000, 2000, 2500, 2600, 2700, 2800, 2900, or 3000 consecutive nucleotides from the 3' end of the template RNA sequence is integrated into the target cell genome.

370. Any of the foregoing numbered embodiments, wherein the nucleic acid sequence of the template RNA, or a portion thereof (e.g., a portion comprising at least about 100, 200, 300, 400, 500, 1000, 2000, 2500, 3000, 3500, or 4000 nucleotides) is integrated into the genome of the target cell population at a copy number of at least about 0.21, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0 integrants per genome.

371. Any of the foregoing numbered embodiments, wherein the nucleic acid sequence of the template RNA, or a portion thereof (e.g., a portion comprising at least about 100, 200, 300, 400, 500, 1000, 2000, 2500, 3000, 3500, or 4000 nucleotides) is integrated into the genome of the target cell population at a copy number of at least about 0.085, 0.09, 0.1, 0.15, or 0.2 integrants per genome.

372. Any of the foregoing numbered embodiments, wherein the nucleic acid sequence of the template RNA, or a portion thereof (e.g., a portion comprising at least about 100, 200, 300, 400, 500, 1000, 2000, 2500, 3000, 3500, or 4000 nucleotides) is integrated into the genome of the target cell population at a copy number of at least about 0.036, 0.04, 0.05, 0.06, 0.07, or 0.08 integrants per genome.
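The copy-number thresholds in the three embodiments above are expressed as integrants per genome. A minimal sketch of that arithmetic follows, assuming the number of integrated template copies and the number of genomes in the sample have already been measured by some quantitative assay; the function names and the example counts are hypothetical.

```python
# Lowest threshold recited in each of the three copy-number embodiments above.
THRESHOLDS = (0.036, 0.085, 0.21)


def integrants_per_genome(integrant_copies: float, genome_copies: float) -> float:
    """Average number of integrated copies per genome in a cell population."""
    if genome_copies <= 0:
        raise ValueError("genome_copies must be positive")
    return integrant_copies / genome_copies


def thresholds_met(copy_number: float) -> list[float]:
    """Return the recited thresholds that the measured copy number reaches."""
    return [t for t in THRESHOLDS if copy_number >= t]


# Hypothetical measurement: 120 integrated copies detected across 1000 genomes.
copy_number = integrants_per_genome(120, 1000)
print(copy_number, thresholds_met(copy_number))
```

With these illustrative counts the population averages 0.12 integrants per genome, which reaches the 0.036 and 0.085 thresholds but not 0.21.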

373. Any of the foregoing numbered embodiments, wherein the polypeptide comprises a functional endonuclease domain (e.g., wherein the endonuclease domain does not comprise a mutation that eliminates endonuclease activity, e.g., as described herein).

374. Any of the foregoing numbered embodiments, wherein the polypeptide comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to an R2 polypeptide (e.g., as described herein) from a medium ground finch, e.g., Geospiza fortis, or a functional fragment thereof.

375. Any of the foregoing numbered embodiments, wherein the polypeptide comprises an amino acid sequence of an R2 polypeptide (e.g., as described herein) from a medium ground finch, e.g., Geospiza fortis, or a functional fragment thereof, and further comprises a plurality of substitutions, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions relative to the native sequence.

376. Any of the foregoing numbered embodiments, wherein the reverse transcriptase domain comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to an R2 polypeptide (e.g., as described herein) from a medium ground finch, e.g., Geospiza fortis, or a functional fragment thereof.

377. Any of the foregoing numbered embodiments, wherein the reverse transcriptase domain comprises an amino acid sequence of an R2 polypeptide (e.g., as described herein) from a medium ground finch, e.g., Geospiza fortis, or a functional fragment thereof, and further comprises a plurality of substitutions, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions relative to the native sequence.

378. Any of the foregoing numbered embodiments, wherein the retrotransposase comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to an R2 polypeptide (e.g., as described herein) from a medium ground finch, e.g., Geospiza fortis, or a functional fragment thereof.

379. Any of the foregoing numbered embodiments, wherein the retrotransposase comprises an amino acid sequence of an R2 polypeptide (e.g., as described herein) from a medium ground finch, e.g., Geospiza fortis, or a functional fragment thereof, and further comprises a plurality of substitutions, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions relative to the native sequence.

380. Any of the preceding embodiments, wherein the nucleic acid sequence of the template RNA or a portion thereof (e.g., a portion comprising at least about 100, 200, 300, 400, 500, 1000, 2000, 2500, 3000, 3500, or 4000 nucleotides) is integrated into the genome of the target cell population at a copy number of at least about 0.21 integrants per genome.

381. Any of the foregoing numbered embodiments, wherein the polypeptide comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to an R4 polypeptide (e.g., as described herein) from a large roundworm, e.g., Ascaris lumbricoides, or a functional fragment thereof.

382. Any of the foregoing numbered embodiments, wherein the polypeptide comprises an amino acid sequence of an R4 polypeptide (e.g., as described herein) from a large roundworm, e.g., Ascaris lumbricoides, or a functional fragment thereof, and further comprises a plurality of substitutions, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions relative to the native sequence.

383. Any of the foregoing numbered embodiments, wherein the reverse transcriptase domain comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to an R4 polypeptide (e.g., as described herein) from a large roundworm, e.g., Ascaris lumbricoides.

384. Any of the foregoing numbered embodiments, wherein the reverse transcriptase domain comprises an amino acid sequence of an R4 polypeptide (e.g., as described herein) from a large roundworm, e.g., Ascaris lumbricoides, or a functional fragment thereof, and further comprises a plurality of substitutions, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions relative to the native sequence.

385. Any of the foregoing numbered embodiments, wherein the retrotransposase comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to an R4 polypeptide (e.g., as described herein) from a large roundworm, e.g., Ascaris lumbricoides, or a functional fragment thereof.

386. Any of the foregoing numbered embodiments, wherein the retrotransposase comprises an amino acid sequence of an R4 polypeptide (e.g., as described herein) from a large roundworm, e.g., Ascaris lumbricoides, or a functional fragment thereof, and further comprises a plurality of substitutions, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions relative to the native sequence.

387. Any of the preceding embodiments, wherein the nucleic acid sequence of the template RNA or a portion thereof (e.g., a portion comprising at least about 100, 200, 300, 400, 500, 1000, 2000, 2500, 3000, 3500, or 4000 nucleotides) is integrated into the genome of the target cell population at a copy number of at least about 0.085 integrants per genome.

388. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell does not result in a change (e.g., upregulation) in p53 and/or p21 protein levels, H2AX phosphorylation (e.g., γH2AX), ATM phosphorylation, ATR phosphorylation, Chk1 phosphorylation, Chk2 phosphorylation, and/or p53 phosphorylation.

389. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of p53 protein level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80%, or 90% of the p53 protein level induced by the introduction of a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

390. Any of the preceding embodiments, wherein the p53 protein level is determined according to the method described in Example 30 of PCT/US2019/048607 (incorporated herein by reference).

391. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of the p53 phosphorylation level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80% or 90% of the p53 phosphorylation level induced by the introduction of a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

392. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of p21 protein level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80%, or 90% of the p21 protein level induced by the introduction of a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

393. Any of the preceding embodiments, wherein the p21 protein level is determined according to the method described in Example 30 of PCT/US2019/048607 (incorporated herein by reference).

394. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of H2AX phosphorylation level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80% or 90% of the level of H2AX phosphorylation induced by introducing a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

395. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of ATM phosphorylation level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80% or 90% of the level of ATM phosphorylation induced by introducing a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

396. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of ATR phosphorylation level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80% or 90% of the ATR phosphorylation level induced by introducing a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

397. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of the level of Chk1 phosphorylation in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80% or 90% of the level of Chk1 phosphorylation induced by the introduction of a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

398. Any of the foregoing numbered embodiments, wherein introducing the system into the target cell results in an up-regulation of the level of Chk2 phosphorylation in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80% or 90% of the level of Chk2 phosphorylation induced by the introduction of a site-specific nuclease (e.g., Cas9) that targets the same genomic locus as the system.

399. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the same) comprising (e.g., from 5' to 3') (i) optionally, a sequence (e.g., a CRISPR spacer) that binds to a target site (e.g., an unedited strand of a site in a target genome), (ii) optionally, a sequence that binds to the polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain.

400. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the template RNA) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to a target site (e.g., an unedited strand of a site in the target genome), (ii) optionally, a sequence that binds to the polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain,

wherein the RT domain has the sequence of Table 1 or 3, or the sequence of a protein domain listed in Table 2, or a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical thereto.

411. A system for modifying DNA, the system comprising:

(b) A template RNA (etRNA) (or DNA encoding the same) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to a target site (e.g., an unedited strand of a site in a target genome), (ii) optionally, a sequence that binds to the polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain,

wherein the system is capable of producing an insertion of at least 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides into the target site.

412. A system for modifying DNA, the system comprising:

wherein the heterologous subject sequence is at least 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 120, 140, 160, 180, 200, 500, or 1,000 nt in length.

413. The system of any preceding embodiment, wherein one or more of the following: the RT domain is heterologous to the DBD; the DBD is heterologous to the endonuclease domain; or the RT domain is heterologous to the endonuclease domain.

414. A system for modifying DNA, the system comprising:

wherein the system is capable of producing a deletion of at least 81, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides in the target site.

415. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the template RNA) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to a target site (e.g., an unedited strand of a site in the target genome), (ii) optionally, a sequence that binds to the polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain,

wherein (a)(ii) and/or (a)(iii) comprises a TALE molecule, a zinc finger molecule, or a CRISPR/Cas molecule, or a functional variant (e.g., mutant) thereof.

416. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the template RNA) comprising (e.g., from 5' to 3') (i) optionally, a sequence (e.g., a CRISPR spacer) that binds to a target site (e.g., an unedited strand of a site in the target genome), (ii) optionally, a sequence that binds to the polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain,

wherein the endonuclease domain, e.g., a nicking enzyme domain, cuts both strands of the target site DNA, and wherein the cuts are separated from each other by at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or 30 nucleotides.

417. A system for modifying DNA, the system comprising:

(b) A template RNA (or DNA encoding the same) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to a target site (e.g., an unedited strand of a site in a target genome), (ii) a sequence that specifically binds to the RT domain, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain.

418. The system of any preceding embodiment, wherein the template RNA further comprises a sequence that binds to (a)(ii) and/or (a)(iii).

419. A system for modifying DNA, the system comprising:

(a) A first polypeptide or a nucleic acid encoding the first polypeptide, wherein the first polypeptide comprises (i) a Reverse Transcriptase (RT) domain and (ii) optionally, a DNA binding domain,

(b) A second polypeptide or a nucleic acid encoding the second polypeptide, wherein the second polypeptide comprises (i) a DNA Binding Domain (DBD) and (ii) an endonuclease domain, e.g., a nicking enzyme domain; and

(c) A template RNA (or DNA encoding the template RNA) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to the second polypeptide (e.g., binds to (b)(i) and/or (b)(ii)), (ii) optionally, a sequence that binds to the first polypeptide (e.g., specifically binds to the RT domain), (iii) a heterologous subject sequence, and (iv) a 3' homologous domain.

420. A system for modifying DNA, the system comprising:

(a) A polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a Reverse Transcriptase (RT) domain, (ii) a DNA Binding Domain (DBD), and (iii) an endonuclease domain, e.g., a nicking enzyme domain;

(b) A first template RNA (or DNA encoding the RNA) comprising (e.g., from 5' to 3') (i) a sequence that binds to the polypeptide (e.g., binds to (a)(ii) and/or (a)(iii)) and (ii) a sequence that binds to a target site (e.g., an unedited strand of a site in a target genome) (e.g., wherein the first RNA comprises a gRNA);

(c) A second template RNA (or DNA encoding the RNA) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to the polypeptide (e.g., specifically binds to the RT domain), (ii) a heterologous subject sequence, and (iii) a 3' homologous domain.

421. The system of any preceding embodiment, wherein the second template RNA comprises (i).

422. The system of any preceding embodiment, wherein the first template RNA comprises a first conjugation domain and the second template RNA comprises a second conjugation domain.

423. The system of any preceding embodiment, wherein the first and second conjugation domains are capable of hybridizing to each other, e.g., under stringent conditions.

424. The system of any preceding embodiment, wherein the association of the first conjugation domain and the second conjugation domain co-localizes the first template RNA and the second template RNA.

425. The system of any preceding embodiment, wherein the template RNA comprises (i).

426. The system of any preceding embodiment, wherein the template RNA comprises (ii).

427. The system of any preceding embodiment, wherein the template RNA comprises (i) and (ii).

428. A template RNA (or DNA encoding a template RNA) comprising a targeting domain (e.g., a heterologous targeting domain) that specifically binds to a sequence comprised in a target DNA molecule (e.g., genomic DNA), a sequence that specifically binds to an RT domain of a polypeptide, and a heterologous subject sequence.

429. The system, method or template RNA of any preceding embodiment, wherein the polypeptide comprises a heterologous targeting domain that specifically binds to a sequence comprised in a target DNA molecule (e.g., genomic DNA).

430. The system, method, or template RNA of any preceding embodiment, wherein the heterologous targeting domain binds to a nucleic acid sequence that is different from the unmodified polypeptide.

431. The system, method, or template RNA of any preceding embodiment, wherein the polypeptide does not comprise a functional endogenous targeting domain (e.g., wherein the polypeptide does not comprise an endogenous targeting domain).

432. The system, method, or template RNA of any preceding embodiment, wherein the heterologous targeting domain comprises a zinc finger (e.g., a zinc finger that specifically binds a sequence comprised in the target DNA molecule).

433. The system, method, or template RNA of any preceding embodiment, wherein the heterologous targeting domain comprises a Cas domain (e.g., a Cas9 domain, or a mutant or variant thereof, e.g., a Cas9 domain that specifically binds to a sequence comprised in the target DNA molecule).

434. The system, method, or template RNA of any preceding embodiment, wherein the Cas domain is associated with a guide RNA (gRNA).

435. The system, method, or template RNA of any preceding embodiment, wherein the heterologous targeting domain comprises an endonuclease domain (e.g., a heterologous endonuclease domain).

436. The system, method, or template RNA of any preceding embodiment, wherein the endonuclease domain comprises a Cas domain (e.g., Cas9 or a mutant or variant thereof).

437. The system, method, or template RNA of any preceding embodiment, wherein the Cas domain is associated with a guide RNA (gRNA).

438. The system, method, or template RNA of any preceding embodiment, wherein the endonuclease domain comprises a FokI domain.

439. The system, method, or template RNA of any preceding embodiment, wherein the template nucleic acid molecule comprises at least one (e.g., one or two) heterologous homologous sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence comprised in a target DNA molecule (e.g., genomic DNA).

440. The system, method, or template RNA of any preceding embodiment, wherein one of the at least one heterologous homologous sequence is located at or within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of the 5' end of the template nucleic acid molecule.

441. The system, method, or template RNA of any preceding embodiment, wherein one of the at least one heterologous homologous sequence is located at or within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of the 3' end of the template nucleic acid molecule.

442. The system, method, or template RNA of any preceding embodiment, wherein the heterologous homologous sequence binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nicking site (e.g., produced by a nicking enzyme, e.g., an endonuclease domain as described herein) in the target DNA molecule.

443. The system, method, or template RNA of any preceding embodiment, wherein the heterologous homologous sequence and a nucleic acid sequence complementary to an endogenous homologous sequence of an unmodified form of the template RNA have less than 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, or 1% sequence identity.

444. The system, method, or template RNA of any preceding embodiment, wherein the heterologous homologous sequence has at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to the sequence of the target DNA molecule, which is different from (e.g., replaced by) the sequence bound by the endogenous homologous sequence.

445. The system, method, or template RNA of any preceding embodiment, wherein the heterologous homologous sequence comprises a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology (e.g., at the 3' end thereof) to a sequence located 5' to a nicking site (e.g., a site nicked by a nicking enzyme, e.g., an endonuclease domain described herein) of the target DNA molecule.

446. The system, method, or template RNA of any preceding embodiment, wherein the heterologous homologous sequence comprises a sequence suitable for priming target-primed reverse transcription (TPRT) initiation (e.g., at its 5' end).

447. The system, method, or template RNA of any preceding embodiment, wherein the heterologous homologous sequence has at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence located within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of (e.g., 3' relative to) a target insertion site (e.g., for a heterologous subject sequence, e.g., as described herein) in the target DNA molecule.

448. The system, method, or template RNA of any preceding embodiment, wherein the template nucleic acid molecule comprises a guide RNA (gRNA), e.g., as described herein.

449. The system, method, or template RNA of any preceding embodiment, wherein the template nucleic acid molecule comprises a gRNA spacer sequence (e.g., at or within 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides of its 5' end).

450. A template RNA (or DNA encoding a template RNA) comprising (e.g., from 5' to 3') (i) a sequence that binds to a target site (e.g., the second strand of a site in a target genome), (ii) a sequence that specifically binds to an RT domain of a polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' target homology domain.

451. The template RNA of any preceding embodiment, further comprising (v) a sequence that binds an endonuclease and/or a DNA binding domain of a polypeptide (e.g., the same polypeptide comprising the RT domain).

452. The template RNA of any preceding embodiment, wherein the RT domain comprises a sequence selected from the group consisting of the sequences of Table 1 or 3 or the sequence of the reverse transcriptase domain of Table 2 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

453. The template RNA of any preceding embodiment, wherein the RT domain comprises a sequence selected from the group consisting of the sequences of Table 1 or 3 or the sequence of the reverse transcriptase domain of Table 2, wherein the RT domain further comprises a plurality of substitutions relative to the native sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions.

454. The template RNA of any preceding embodiment, wherein the sequence of (ii) specifically binds to the RT domain.

455. The template RNA of any preceding embodiment, wherein the sequence that specifically binds to the RT domain is a sequence that binds to the RT domain in a wild-type environment, e.g., a UTR sequence, or a sequence that is at least 70%, 75%, 80%, 85%, 90%, 95% or 99% identical thereto.

456. A template RNA (or DNA encoding the same) comprising, from 5' to 3': (ii) a sequence that binds an endonuclease and/or a DNA binding domain of a polypeptide, (i) a sequence that binds a target site (e.g., the second strand of a site in a target genome), (iii) a heterologous subject sequence, and (iv) a 3' target homology domain.

457. A template RNA (or DNA encoding the same) comprising, from 5' to 3': (iii) a heterologous subject sequence, (iv) a 3' target homology domain, (i) a sequence that binds to a target site (e.g., the second strand of a site in a target genome), and (ii) a sequence that binds an endonuclease and/or a DNA binding domain of a polypeptide.

458. A template RNA (or DNA encoding a template RNA) comprising (e.g., from 5' to 3') (i) optionally, a sequence that binds to a target site (e.g., an unedited strand of a site in a target genome), (ii) optionally, a sequence that binds to an endonuclease and/or a DNA binding domain of a polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain.

459. The template RNA of any preceding embodiment, wherein the template RNA comprises (i).

460. The template RNA of any preceding embodiment, wherein the template RNA comprises (ii).

461. A template RNA (or DNA encoding a template RNA) comprising (e.g., from 5' to 3') (i) a sequence that binds to a target site (e.g., an unedited strand of a site in a target genome), (ii) a sequence that specifically binds to an RT domain of a polypeptide, (iii) a heterologous subject sequence, and (iv) a 3' homologous domain.

462. The template RNA of any preceding embodiment, wherein the RT domain comprises a sequence selected from the group consisting of the sequences of Table 1 or 3, or the sequences of the protein domains listed in Table 2, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

463. The template RNA of any preceding embodiment, further comprising (v) a sequence that binds an endonuclease and/or a DNA binding domain of a polypeptide (e.g., the same polypeptide comprising the RT domain).

464. The template RNA of any preceding embodiment, wherein the sequence of (ii) specifically binds to an RT domain listed in Table 1 or 3 or Table 2, or an RT domain sequence having at least 70%, 75%, 80%, 85%, 90%, 95% or 99% identity thereto.

465. The template RNA of any preceding embodiment, wherein the sequence that specifically binds to the RT domain is the sequence of Table 1 or 3, or the sequence of a protein domain listed in Table 2, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95% or 99% identity thereto.

466. A template RNA (or DNA encoding the same) comprising, from 5' to 3': (ii) a sequence that binds an endonuclease and/or a DNA binding domain of a polypeptide, (i) a sequence that binds a target site (e.g., an unedited strand of a site in a target genome), (iii) a heterologous subject sequence, and (iv) a 3' homologous domain.

467. A template RNA (or DNA encoding the same) comprising, from 5' to 3': (iii) a heterologous subject sequence, (iv) a 3' homologous domain, (i) a sequence that binds to a target site (e.g., an unedited strand of a site in the target genome), and (ii) a sequence that binds an endonuclease and/or a DNA binding domain of a polypeptide.
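The alternative 5' to 3' arrangements recited in embodiments 466 and 467 can be made concrete with a short sketch. It treats each template RNA element as a plain string and concatenates the elements in the recited order. The class name, field names, and four-nucleotide placeholder sequences are hypothetical and chosen only for illustration; real elements would be far longer and sequence-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateRNAParts:
    """Hypothetical container for the four recited template RNA elements."""
    target_binding: str       # (i) sequence that binds the target site
    polypeptide_binding: str  # (ii) sequence that binds the endonuclease and/or DBD
    subject_sequence: str     # (iii) heterologous subject sequence
    homology_3prime: str      # (iv) 3' homologous domain


# Element order, 5' to 3', as recited in embodiments 466 and 467.
ORDERS = {
    "466": ("polypeptide_binding", "target_binding", "subject_sequence", "homology_3prime"),
    "467": ("subject_sequence", "homology_3prime", "target_binding", "polypeptide_binding"),
}


def assemble(parts: TemplateRNAParts, embodiment: str) -> str:
    """Concatenate the elements 5' to 3' in the order of the given embodiment."""
    return "".join(getattr(parts, name) for name in ORDERS[embodiment])


# Placeholder sequences only; they carry no biological meaning.
parts = TemplateRNAParts("AAAA", "CCCC", "GGGG", "UUUU")
print(assemble(parts, "466"))
print(assemble(parts, "467"))
```

Keeping the element order in a lookup table rather than in the container makes it straightforward to add further arrangements without changing how the elements themselves are stored.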

468. The system or template RNA of any preceding embodiment, wherein the template RNA, first template RNA, or second template RNA comprises a sequence that specifically binds to the RT domain.

469. The system or template RNA of any preceding embodiment, wherein a sequence that specifically binds to the RT domain is located between (i) and (ii).

470. The system or template RNA of any preceding embodiment, wherein a sequence that specifically binds to the RT domain is located between (ii) and (iii).

471. The system or template RNA of any preceding embodiment, wherein a sequence that specifically binds to the RT domain is located between (iii) and (iv).

472. The system or template RNA of any preceding embodiment, wherein a sequence that specifically binds to the RT domain is located between (iv) and (i).

473. The system or template RNA of any preceding embodiment, wherein a sequence that specifically binds to the RT domain is located between (i) and (iii).

474. A system for modifying DNA, comprising:

(a) A first template RNA (or DNA encoding the first template RNA) comprising (i) a sequence that binds to an endonuclease domain of a polypeptide, e.g., a nicking enzyme domain and/or a DNA Binding Domain (DBD), and (ii) a sequence that binds to a target site (e.g., an unedited strand of a site in a target genome) (e.g., wherein the first RNA comprises a gRNA);

(b) A second template RNA (or DNA encoding the second template RNA) comprising (i) a sequence that specifically binds to a Reverse Transcriptase (RT) domain of a polypeptide (e.g., a polypeptide of (a)), and (ii) a Target Site Binding Sequence (TSBS), and (iii) an RT template sequence.

475. The system of any preceding embodiment, wherein the nucleic acid encoding the first template RNA and the nucleic acid encoding the second template RNA are two separate nucleic acids.

476. The system of any preceding embodiment, wherein the nucleic acid encoding the first template RNA and the nucleic acid encoding the second template RNA are part of the same nucleic acid molecule, e.g., are present on the same vector.

477. A polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a Reverse Transcriptase (RT) domain, (ii) a DNA Binding Domain (DBD); and (iii) an endonuclease domain, such as a nicking enzyme domain, wherein the RT domain has the sequence of table 1 or 3, or the sequence of a protein domain listed in table 2, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity thereto.

478. A system for modifying DNA, comprising:

(a) A first polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises a Reverse Transcriptase (RT) domain, wherein the RT domain has the sequence of table 1 or 3, or the sequence of a protein domain listed in table 2, or a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical thereto; and optionally, a DNA Binding Domain (DBD) (e.g., a first DBD); and

(b) A second polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a DBD (e.g., a second DBD); and (ii) an endonuclease domain, such as a nicking enzyme domain.

479. The system of any preceding embodiment, wherein the nucleic acid encoding the first polypeptide and the nucleic acid encoding the second polypeptide are two separate nucleic acids.

480. The system of any preceding embodiment, wherein the nucleic acid encoding the first polypeptide and the nucleic acid encoding the second polypeptide are part of the same nucleic acid molecule, e.g., are present on the same vector.

481. The system, method, kit, template RNA, or reaction mixture of any of the preceding embodiments, wherein an RNA of the system (e.g., the template RNA, an RNA encoding the polypeptide of (a), or an RNA expressed from a heterologous subject sequence integrated into the target DNA) comprises a microRNA binding site, e.g., in the 3' UTR.

482. The system, method, kit, template RNA, or reaction mixture of embodiment 481, wherein the microRNA binding site is recognized by a miRNA that is present in a non-target cell type but absent from a target cell type (or present at a reduced level relative to the non-target cell).

483. The system, method, kit, template RNA, or reaction mixture of embodiment 481 or 482, wherein the miRNA is miR-142, and/or wherein the non-target cell is a Kupffer cell or a blood cell, e.g., an immune cell.

484. The system, method, kit, template RNA, or reaction mixture of embodiment 481 or 482, wherein the miRNA is miR-182 or miR-183, and/or wherein the non-target cell is a dorsal root ganglion neuron.

485. The system, method, kit, template RNA, or reaction mixture of any one of embodiments 481-484, wherein the system comprises a first miRNA binding site recognized by a first miRNA (e.g., miR-142), and the system further comprises a second miRNA binding site recognized by a second miRNA (e.g., miR-182 or miR-183), wherein the first miRNA binding site and the second miRNA binding site are located on the same RNA or different RNAs of the system.

486. The system, method, kit, template RNA, or reaction mixture of any one of embodiments 481-485, wherein the template RNA comprises at least 2, 3, or 4 miRNA binding sites, e.g., wherein the miRNA binding sites are recognized by the same or different miRNAs.

487. The system, method, kit, template RNA, or reaction mixture of any one of embodiments 481-486, wherein the RNA encoding the polypeptide of (a) comprises at least 2, 3, or 4 miRNA binding sites, e.g., wherein the miRNA binding sites are recognized by the same or different miRNAs.

488. The system, method, kit, template RNA, or reaction mixture of any one of embodiments 481-487, wherein the RNA expressed from a heterologous subject sequence integrated into the target DNA comprises at least 2, 3, or 4 miRNA binding sites, e.g., wherein the miRNA binding sites are recognized by the same or different miRNAs.
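As an illustration of embodiments 486-488, the following minimal sketch counts candidate miRNA binding sites in an RNA. It assumes a binding site is an exact match to the reverse complement of miRNA nucleotides 2-8 (a common seed-match convention; the present disclosure does not define how sites are identified). The miRNA and transcript sequences, and all function names, are hypothetical placeholders and are not sequences of miR-142, miR-182, or miR-183.

```python
# Illustrative sketch only: sequences and names below are hypothetical.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_site(mirna: str) -> str:
    """Target-site sequence pairing with miRNA nucleotides 2-8 (assumed convention)."""
    seed = mirna[1:8]  # nucleotides 2 through 8, using 1-based numbering
    return reverse_complement(seed)


def count_binding_sites(transcript: str, mirna: str) -> int:
    """Count (possibly overlapping) exact seed-match sites in the transcript."""
    site = seed_match_site(mirna)
    count, start = 0, 0
    while True:
        index = transcript.find(site, start)
        if index == -1:
            return count
        count += 1
        start = index + 1


def has_at_least(transcript: str, mirna: str, minimum: int) -> bool:
    """True if the transcript carries at least `minimum` candidate sites."""
    return count_binding_sites(transcript, mirna) >= minimum


if __name__ == "__main__":
    hypothetical_mirna = "UACGGAUCCAAGUUCGAUCGAA"
    hypothetical_utr = "AAAGAUCCGUAAAGAUCCGUAAA"
    print(count_binding_sites(hypothetical_utr, hypothetical_mirna))
```

Sites with full complementarity to the miRNA, or sites tolerating mismatches, would require a different matching rule than the exact seed match assumed here.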

Definitions

Domain: as used herein, the term "domain" refers to a structure of a biomolecule that contributes to a specified function of the biomolecule. A domain may comprise a contiguous region (e.g., a contiguous sequence) or distinct, non-contiguous regions (e.g., non-contiguous sequences) of a biomolecule. Examples of protein domains include, but are not limited to, endonuclease domains, DNA binding domains, and reverse transcriptase domains; examples of nucleic acid domains include regulatory domains, such as transcription factor binding domains.

Exogenous: as used herein, the term "exogenous," when used with respect to a biomolecule (e.g., a nucleic acid sequence or polypeptide), means that the biomolecule was introduced into the host genome, cell, or organism by the hand of man. For example, a nucleic acid added to an existing genome, cell, tissue, or subject using recombinant DNA technology or other methods is exogenous to the existing nucleic acid sequence, cell, tissue, or subject.

First strand/second strand: as used herein to describe the individual DNA strands of a target DNA, "first strand" and "second strand" distinguish the two DNA strands based on the strand on which the reverse transcriptase domain initiates polymerization, e.g., based on where target-primed synthesis is initiated. The first strand refers to the strand of the target DNA on which the reverse transcriptase domain initiates polymerization, e.g., where target-primed synthesis is initiated. The second strand refers to the other strand of the target DNA. The first and second strand designations do not otherwise describe the target site DNA strands; for example, in some embodiments, both the first strand and the second strand are nicked by a polypeptide described herein, but the designations "first" and "second" strand are independent of the order in which such nicks occur.

Genomic safe harbor site (GSH site): a genomic safe harbor site is a site in the host genome that is able to accommodate the integration of new genetic material, e.g., such that the inserted genetic element does not cause significant alterations of the host genome that pose a risk to the host cell or organism. GSH sites typically meet 1, 2, 3, 4, 5, 6, 7, 8, or 9 of the following criteria: (i) is located >300 kb from a cancer-related gene; (ii) is located >300 kb from a miRNA/other functional small RNA; (iii) is located >50 kb from a 5' gene end; (iv) is located >50 kb from a replication origin; (v) is located >50 kb from any ultraconserved element; (vi) has low transcriptional activity (i.e., no mRNA within +/-25 kb); (vii) is not in a copy number variable region; (viii) is in open chromatin; and/or (ix) is unique, with 1 copy in the human genome. Examples of GSH sites in the human genome that meet some or all of these criteria include: (i) the adeno-associated virus site 1 (AAVS1), a naturally occurring site of integration of AAV virus on chromosome 19; (ii) the chemokine (C-C motif) receptor 5 (CCR5) gene, a chemokine receptor gene known as an HIV-1 co-receptor; (iii) the human ortholog of the mouse Rosa26 locus; (iv) the rDNA locus; (v) the albumin locus, e.g., for hepatocyte applications; and (vi) the T cell receptor alpha constant (TRAC) locus, e.g., for T cell applications. Additional GSH sites are known and described, for example, in Pellenz et al., electronically published August 20, 2018 (https://doi.org/10.1101/396390).
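The nine criteria above can be expressed as a simple checklist. The following is a minimal sketch, assuming that the distances and annotations for a candidate site have already been obtained from genome annotation; the class, field, and function names are hypothetical, and the example values are invented for illustration rather than taken from any real locus.

```python
# Illustrative sketch only: field names and example values are hypothetical.
from dataclasses import dataclass


@dataclass
class CandidateSite:
    kb_to_cancer_gene: float
    kb_to_functional_rna: float
    kb_to_gene_5prime_end: float
    kb_to_replication_origin: float
    kb_to_conserved_element: float
    mrna_within_25kb: bool
    in_copy_number_variable_region: bool
    in_open_chromatin: bool
    copies_in_genome: int


def gsh_criteria(site: CandidateSite) -> dict:
    """Evaluate criteria (i)-(ix) for a candidate site; True means the criterion is met."""
    return {
        "i": site.kb_to_cancer_gene > 300,
        "ii": site.kb_to_functional_rna > 300,
        "iii": site.kb_to_gene_5prime_end > 50,
        "iv": site.kb_to_replication_origin > 50,
        "v": site.kb_to_conserved_element > 50,
        "vi": not site.mrna_within_25kb,
        "vii": not site.in_copy_number_variable_region,
        "viii": site.in_open_chromatin,
        "ix": site.copies_in_genome == 1,
    }


def criteria_met(site: CandidateSite) -> int:
    """Number of the nine criteria satisfied by the site."""
    return sum(gsh_criteria(site).values())


if __name__ == "__main__":
    example = CandidateSite(450.0, 320.0, 80.0, 60.0, 75.0, False, False, True, 1)
    print(criteria_met(example), gsh_criteria(example))
```

Because a GSH site is described as typically meeting 1 to 9 of the criteria, the count is reported rather than a single pass/fail result.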

Heterologous: when used in reference to a second element to describe the first element, the term heterologous means that the first element and the second element do not exist in nature in the arrangement as described. For example, a heterologous polypeptide, nucleic acid molecule, construct or sequence refers to (a) a polypeptide, nucleic acid molecule or a portion of a polypeptide or nucleic acid molecule sequence that is not native to the cell in which it is expressed, (b) a polypeptide or nucleic acid molecule or portion of a polypeptide or nucleic acid molecule that has been altered or mutated relative to its native state, or (c) a polypeptide or nucleic acid molecule that has altered expression compared to the native expression level under similar conditions. For example, heterologous regulatory sequences (e.g., promoters, enhancers) may be used to regulate the expression of a gene or nucleic acid molecule in a manner different from that in which the gene or nucleic acid molecule is normally expressed in nature. In another example, a heterologous domain of a polypeptide or nucleic acid sequence (e.g., a DNA binding domain of a polypeptide or a nucleic acid encoding a DNA binding domain of a polypeptide) may be disposed relative to other domains, or may be of different sequence or from different sources relative to other domains or portions of a polypeptide or nucleic acid encoding the same. In certain embodiments, the heterologous nucleic acid molecule may be present in the native host cell genome, but may have altered expression levels or have different sequences or both. 
In other embodiments, the heterologous nucleic acid molecule may not be endogenous to the host cell or host genome, but rather is introduced into the host cell by transformation (e.g., transfection, electroporation), wherein the added molecule may be integrated into the host genome, or may exist transiently (e.g., mRNA) or semi-stably as extrachromosomal genetic material for more than one generation (e.g., episomal viral vectors, plasmids, or other self-replicating vectors).

Mutant or mutated: the term "mutated" when applied to a nucleic acid sequence means that the nucleotides in the nucleic acid sequence can be inserted, deleted or altered as compared to a reference (e.g., native) nucleic acid sequence. A single change (point mutation) may be made at a locus, or multiple nucleotides may be inserted, deleted or altered at a single locus. In addition, one or more changes may be made at any number of loci within the nucleic acid sequence. The nucleic acid sequence may be mutated by any method known in the art.

Nucleic acid molecule: "nucleic acid molecule" refers to both RNA and DNA molecules, including but not limited to cDNA, genomic DNA, and mRNA, and also includes synthetic nucleic acid molecules, such as chemically synthesized or recombinantly produced nucleic acid molecules, e.g., RNA templates as described herein. The nucleic acid molecule may be double-stranded or single-stranded, circular or linear. If single-stranded, the nucleic acid molecule may be the sense strand or the antisense strand. Unless otherwise indicated, and as an example for all sequences described herein under the general format "SEQ ID NO:," "a nucleic acid comprising SEQ ID NO:1" refers to a nucleic acid, at least a portion of which has either (i) the sequence of SEQ ID NO:1 or (ii) a sequence complementary to SEQ ID NO:1. The choice between the two is dictated by the context in which SEQ ID NO:1 is used. For example, if the nucleic acid is used as a probe, the choice between the two is dictated by the requirement that the probe be complementary to the desired target. As will be readily appreciated by those skilled in the art, the nucleic acid sequences of the present disclosure may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases. Such modifications include, for example, labels, methylation, substitution of one or more naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g., methylphosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendant moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylating agents, and modified linkages (e.g., alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic the ability of a polynucleotide to bind to a designated sequence via hydrogen bonding and other chemical interactions. 
Such molecules are known in the art and include, for example, those in which peptide bonds replace phosphate bonds in the backbone of the molecule. Other modifications may include, for example, analogs in which the ribose ring contains a bridging moiety or other structure (e.g., modifications found in "locked" nucleic acids).

Gene expression unit: a gene expression unit is a nucleic acid sequence comprising at least one regulatory nucleic acid sequence operably linked to at least one effector sequence. A first nucleic acid sequence is operably linked to a second nucleic acid sequence when the first nucleic acid sequence is placed into a functional relationship with the second nucleic acid sequence. For example, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription or expression of the coding sequence. Operably linked DNA sequences may be contiguous or non-contiguous. Where it is desired to join two protein coding regions, the operably linked sequences may be in the same reading frame.

Host: as used herein, the terms "host genome" and "host cell" refer to a cell and/or its genome into which a protein and/or genetic material has been introduced. It will be understood that such terms are intended to refer not only to the particular subject cell and/or genome, but also to the progeny of such a cell and/or the genome of the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term "host cell" as used herein. The host genome or host cell may be an isolated cell or cell line grown in culture, or genomic material isolated from such a cell or cell line, or may be a host cell or host genome that makes up a living tissue or organism. In some cases, the host cell may be an animal cell or a plant cell, e.g., as described herein. In some cases, the host cell may be a bovine cell, equine cell, porcine cell, caprine cell, ovine cell, chicken cell, or turkey cell. In some cases, the host cell may be a maize cell, a soybean cell, a wheat cell, or a rice cell.

Operative association: as used herein, "operative association" describes a functional relationship between two nucleic acid sequences, such as (1) a promoter and (2) a heterologous subject sequence, and in such an example means that the promoter and the heterologous subject sequence (e.g., a gene of interest) are oriented such that, under suitable conditions, the promoter drives expression of the heterologous subject sequence. For example, the template nucleic acid may be single-stranded, e.g., in the (+) or (-) orientation, but an operative association between the promoter and the heterologous subject sequence means that, whether or not the template nucleic acid is transcribed in a particular state, it is transcribed accurately when it is in the suitable state (e.g., in the (+) orientation, in the presence of the required catalytic factors, NTPs, etc.). Operative association applies analogously to other pairs of nucleic acids, including other tissue-specific expression control sequences (e.g., enhancers, repressors, and microRNA recognition sequences), IR/DR, ITRs, UTRs, or homology regions and heterologous subject sequences, or sequences encoding transposases.

Pseudoknot: as used herein, a "pseudoknot sequence" refers to a nucleic acid (e.g., RNA) having a sequence with suitable self-complementarity to form a pseudoknot structure, e.g., having: a first segment; a second segment located between the first segment and a third segment, wherein the third segment is complementary to the first segment; and a fourth segment, wherein the fourth segment is complementary to the second segment. The pseudoknot may optionally have additional secondary structure, e.g., a stem-loop disposed in the second segment, a stem-loop disposed between the second segment and the third segment, a sequence before the first segment, or a sequence after the fourth segment. The pseudoknot may have additional sequence between the first segment and the second segment, between the second segment and the third segment, or between the third segment and the fourth segment. In some embodiments, the segments are arranged, from 5' to 3': first, second, third, and fourth. In some embodiments, the first and third segments comprise five perfectly complementary base pairs. In some embodiments, the second and fourth segments comprise 10 base pairs, optionally with one or more (e.g., two) bulges. In some embodiments, the second segment comprises one or more unpaired nucleotides, e.g., forming a loop. In some embodiments, the third segment comprises one or more unpaired nucleotides, e.g., forming a loop.
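The complementarity relationships among the four segments can be checked as in the following minimal sketch. It assumes strict Watson-Crick pairing with no bulges, unpaired nucleotides, or G-U wobble pairs, which is narrower than the definition above; the segment sequences and function names are hypothetical.

```python
# Illustrative sketch only: strict Watson-Crick pairing, hypothetical sequences.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def segments_form_pseudoknot(first: str, second: str, third: str, fourth: str) -> bool:
    """True if the third segment pairs with the first and the fourth with the second."""
    return third == reverse_complement(first) and fourth == reverse_complement(second)


if __name__ == "__main__":
    first, second = "GGCAC", "AUCGGAUCCA"   # 5-nt and 10-nt segments
    third, fourth = "GUGCC", "UGGAUCCGAU"   # their reverse complements
    print(segments_form_pseudoknot(first, second, third, fourth))
```

Because the segments are ordered first, second, third, fourth from 5' to 3', the first/third pairing and the second/fourth pairing interleave, which is what distinguishes a pseudoknot from nested stem-loops.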

Stem-loop sequence: as used herein, a "stem-loop sequence" refers to a nucleic acid sequence (e.g., an RNA sequence) that has sufficient self-complementarity to form a stem-loop, e.g., having a stem comprising at least two (e.g., 3, 4, 5, 6, 7, 8, 9, or 10) base pairs, and having a loop with at least three (e.g., four) bases. The stem may contain a mismatch or a bulge.
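A minimal sketch of the stem and loop size requirements follows. It assumes a hairpin whose stem is formed by the two ends of the given sequence, with Watson-Crick pairs only and no mismatches or bulges (the definition above permits them); the example sequence and function name are hypothetical.

```python
# Illustrative sketch only: perfect stems, Watson-Crick pairs, hypothetical input.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_loop_dimensions(rna: str, min_stem: int = 2, min_loop: int = 3):
    """Return (stem_base_pairs, loop_bases) if the ends of `rna` form a stem-loop, else None."""
    i, j = 0, len(rna) - 1
    stem = 0
    # Pair the ends inward while the enclosed loop would still have min_loop bases.
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in PAIRS:
        stem += 1
        i += 1
        j -= 1
    loop = j - i + 1
    if stem >= min_stem:
        return stem, loop
    return None


if __name__ == "__main__":
    print(stem_loop_dimensions("GGGCGAAAGCCC"))
```

The default thresholds mirror the "at least two" base pairs and "at least three" loop bases recited above and can be raised to model the longer stems listed as examples.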

One or more tissue-specific expression control sequences: as used herein, a "tissue-specific expression control sequence" refers to a nucleic acid element that, in a tissue-specific manner, preferentially increases or decreases the level of a transcript comprising a heterologous subject sequence in a target tissue, e.g., relative to one or more off-target tissues. In some embodiments, the tissue-specific expression control sequence preferentially drives or represses the transcription, activity, or half-life of a transcript comprising the heterologous subject sequence in a tissue-specific manner, e.g., preferentially in one or more target tissues relative to one or more off-target tissues. Exemplary tissue-specific expression control sequences include tissue-specific promoters, repressors, enhancers, or combinations thereof, and tissue-specific microRNA recognition sequences. Tissue specificity refers to on-target tissue (one or more tissues in which expression or activity of the template nucleic acid is desired or tolerable) and off-target tissue (one or more tissues in which expression or activity of the template nucleic acid is not desired or not tolerable). For example, a tissue-specific promoter (e.g., a promoter in a template nucleic acid or controlling expression of a transposase) preferentially drives expression in on-target tissue relative to off-target tissue. In contrast, a microRNA that binds to a tissue-specific microRNA recognition sequence (on the transposase-encoding nucleic acid, on the template nucleic acid, or both) is preferentially expressed in off-target tissue relative to on-target tissue, thereby reducing expression of the template nucleic acid (or transposase) in off-target tissue. 
Thus, with respect to the transcription, activity, or half-life of the associated sequence in a tissue, a promoter and a microRNA recognition sequence that are specific for the same tissue (e.g., the target tissue) have opposing functions (promoting and inhibiting, respectively, with concordant expression levels, i.e., the microRNA is present at high levels in off-target tissue and low levels in on-target tissue, whereas the promoter drives high expression in on-target tissue and low expression in off-target tissue).

Drawings

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the office upon request and payment of the necessary fee.

FIG. 1. The C-terminal linker region of the DNA binding domain of R2Tg can be truncated and modified. Deletions from the myb domain at A or B to position 1 or 2 in the natural linker were constructed and replaced with a 3GS or XTEN synthetic linker (A). Integration efficiency was measured by ddPCR in HEK293T cells (B).

FIG. 2. Landing pads designed to test for mutations at the R2Tg Gene Writer target site.

FIG. 3a. ddPCR assay measuring the percent integration per cell across all lentivirally integrated landing pads.

FIG. 3b. Amplicon sequencing and NGS analysis of indels present at the landing pad site.

FIG. 4. AAVS1 ZFP replaces the DNA binding domain of the retrotransposase Gene Writer.

FIG. 5. Substitution of Cas9 or Cas9 nickase for the DNA binding domain of the retrotransposase Gene Writer, with or without an active EN domain (= mutant).

FIG. 6. Fusion of AAVS1 ZFP to a retrotransposase Gene Writer with or without a functional DNA binding domain.

FIG. 7. Schematic of second-strand nicking. (A) Cas9 nickase fused to a Gene Writer protein. The Gene Writer protein introduces a nick in a DNA strand through its EN domain (shown as X), and the fused Cas9 nickase introduces a nick on the top or bottom DNA strand (shown as X). (B) The Gene Writer targets DNA through its DNA binding domain and introduces a DNA nick with its EN domain. A Cas9 nickase is then used to create a second nick (X) on the top or bottom strand, upstream or downstream of the EN-introduced nick.

FIG. 8. Schematic representation of a nickase Cas9-Gene Writer fusion. (A) Schematic representation of the fusion of nickase Cas9 with the Gene Writer protein. (B) Schematic representation of the 3' extended gRNA.

FIG. 9. Schematic representation of a nickase Cas9-Gene Writer fusion. (A) Schematic representation of the fusion of nickase Cas9 with the Gene Writer protein. (B) Schematic representation of a donor transgene flanked by UTRs and by homology to the cleavage site.

FIG. 10. Schematic of the constructs. (A) Schematic representation of the Gene Writer protein. (B) Schematic representation of a donor transgene flanked by UTRs and by homology to the cleavage site. (C) Schematic of the Cas9 construct used.

FIG. 11. Schematic representation of the mRNA encoding the Gene Writer (A). The natural untranslated regions (UTRs) are replaced with 5' and 3' UTRs optimized for protein expression (shown as 5' UTRexp and 3' UTRexp). Gene Writer protein expression (B) was detected by HiBit assay, which detects expression of the HiBit tag.

FIG. 12. Genome integration induced by Gene Writer protein expressed with its native UTRs or with UTRs optimized for protein expression. The presence of an RNA template with the native UTRs of the retrotransposon stimulates Gene Writing activity with non-native UTRs.

FIG. 13. Plasmid DNA delivery Gene Writer system using mRNA encoding the polypeptide and an RNA template for retrotransposition.

FIG. 14. Diagram of 5' UTR engineering strategies. HA = homology arm; K = Kozak sequence; pA = poly(A) signal; Ama = A. maritima; Rx = retrotransposons of other species.

FIG. 15. Possible positions of an intron (or introns) in an RNA template. Introns are represented by curves. 5' HA: 5' homology arm; 3' HA: 3' homology arm; 5' UTR: retrotransposon-specific 5' UTR; 3' UTR: retrotransposon-specific 3' UTR; GOI: gene of interest. The orange block corresponds to the sequence designed to be expressed from the genomic location, which contains its own cell-specific promoter, poly(A) signal, and UTRs for protein expression (5' and 3' UTR_exp). The sequence may be in a sense (as shown above) or antisense orientation relative to the retrotransposon UTRs and homology arms. Introns may be located within the GOI or within the UTR_exp.

FIG. 16 genomic integration in HEK293T cells as reported by the 3' ddPCR assay. Gene Writer mRNA at 0.5 μg/well was co-transfected with RNA template (with or without enzymatically added cap 1 and poly (A) tail). Gene Writer mRNA to RNA transgene ratio was 1:1.

FIG. 17. Genome integration, detected by 3' ddPCR, induced by expression of Gene Writer mRNA produced with either unmodified (G0) or modified nucleotides (pseudouridine (ψ), 1-N-methylpseudouridine (1-Me-ψ), 5-methoxyuridine (5-MO-U), or 5-methylcytidine (5mC)). Gene Writer mRNA was used at 1 μg per well. Unmodified RNA templates were used. Gene Writer mRNA was co-transfected with RNA template at a molar ratio of 1:8.

FIG. 18. Modules that make up a typical Gene Writer RNA template, wherein individual modules may be combined, rearranged, and/or omitted to produce a Gene Writer template. A = 5' homology arm; B = ribozyme; C = 5' UTR; D = heterologous subject sequence; E = 3' UTR; F = 3' homology arm.

FIG. 19. Modules that make up a typical Gene Writer RNA template, wherein individual modules may be combined, rearranged, and/or omitted to produce a Gene Writer template. A = 5' homology arm; B = ribozyme; C = 5' UTR; D = heterologous subject sequence; E = 3' UTR; F = 3' homology arm.

FIG. 20. Construction of the driver and transgene plasmids. In this set of experiments, the homology arm (HA) and stuffer sequences were varied.

FIG. 21 integration efficiency at the 3 'or 5' end of the transgene as measured by digital droplet PCR across the tested constructs. Each point represents a duplicate experiment. Bars represent the average of two replicates. (A, B) integration efficiency measured across the 3' junction between transgene and host rDNA. (C, D) integration efficiency measured across 5' junctions.

FIG. 22. Example of the homology shift design for the +/-3 bp test. Red indicates homology to the sequence 5' of the wild-type (WT) nick site, and blue indicates homology to the sequence 3' of the nick. The 3' shift construct (+) begins its 3' homology farther downstream of the nick. The 5' shift construct (-) incorporates homology from 5' of the nick into the 3' homology arm.

FIG. 23. 3' integration resulting from shifting the 3' homology arm of the transgene. Each data point represents one replicate, and the bar represents the average of two replicates.

FIG. 24. (A) Experimental timeline. (B) Schematic representation of the R2Tg and transgene construct configurations. (C) Western blot against Rad51 showing reduced Rad51 protein expression on day 3.

FIG. 25. U2OS cells were treated with a non-targeting control siRNA (control) or an siRNA against Rad51, together with R2Tg WT or the control RT and EN mutants. ddPCR at the 3' (A) or 5' (B) junction was used to assess integration efficiency on day 3.

FIG. 26. (A) Sequence diagram of the ribozyme of the zebra finch (Taeniopygia guttata) R2 element (R2Tg) in the context of the RNA modules of a Gene Writer transgene molecule. Ribozyme features: P, base-paired region; P', complementary strand of a base-paired region; L, loop at the end of a P region; J, nucleotides joining base-paired regions. The figure discloses SEQ ID NO: 1592. (B) Predicted secondary structure of the R2Tg ribozyme. The shaded boxes indicate predicted catalytic positions that can be used to inactivate the ribozyme. The figure discloses SEQ ID NO: 1592.

FIG. 27. Sequence diagram of the ribozyme of the zebra finch (Taeniopygia guttata) R2 element (R2Tg) in the context of the RNA modules of a Gene Writer transgene molecule. Ribozyme features: P, base-paired region; P', complementary strand of a base-paired region; L, loop at the end of a P region; J, nucleotides joining base-paired regions. The figure discloses SEQ ID NO: 1592.

FIG. 28. Predicted secondary structure of the ribozyme of the zebra finch R2 element. The figure discloses SEQ ID NO: 1592.

FIGS. 29A and 29B are a series of diagrams showing exemplary configurations of Gene Writers using domains derived from various sources. A Gene Writer as described herein may or may not include all of the domains depicted. For example, in some cases a Gene Writer may lack an RNA binding domain, or may have a single domain that fulfills the functions of multiple domains, such as a Cas9 domain for DNA binding and endonuclease activity. Exemplary domains that can be included in a Gene Writer polypeptide include a DNA binding domain (e.g., a DNA binding domain comprising an element of a sequence listed in any of Tables 1 or 3, or a domain listed in Table 2; a zinc finger; a TAL domain; Cas9; dCas9; nickase Cas9; a transcription factor; or a meganuclease), an RNA binding domain (e.g., an RNA binding domain comprising a B-box protein, MS2 coat protein, dCas, or an element of a sequence listed in any of Tables 1 or 3, or a domain listed in Table 2), a reverse transcriptase domain (e.g., a reverse transcriptase domain comprising an element of a sequence listed in any of Tables 1 or 3, or a domain listed in Table 2), and/or an endonuclease domain (e.g., an endonuclease domain comprising an element of a sequence listed in any of Tables 1 or 3, or a domain listed in Table 2; Cas9; nickase Cas9; a restriction enzyme (e.g., a type II restriction enzyme, such as FokI); a meganuclease; a Holliday junction resolvase; or an RLE retrotransposase, an APE retrotransposase, or a GIY-YIG retrotransposase). Exemplary Gene Writer polypeptides comprising exemplary combinations of such domains are shown in the bottom panel.

FIGS. 30A and 30B illustrate that mutations in the DNA binding motifs of a Gene Writer polypeptide impair integration at the natural site. FIG. 30A shows the general domain structure of the R2Tg retrotransposase (upper), including the DNA binding domain, which contains multiple predicted DNA binding elements (lower). The two zinc finger motifs and the c-myb motif shown in the protein were mutated as described in Example 30. FIG. 30B shows the evaluation of the integration activity of mutants of the ZF1, ZF2, and c-myb domains in HEK293T cells by analysis of the integration frequency at the natural rDNA site using ddPCR. Each individual mutant, as well as the triple mutant, was compared to the wild type (positive control) and to an endonuclease-inactivated enzyme (negative control). Data represent the average of two replicates. Legend: ZF = zinc finger; myb = c-myb-like DNA binding motif; RBD = RNA binding domain; RT = reverse transcriptase domain; EN = endonuclease domain; * = mutant domain; CNV/genome = average copy number of integrated DNA per genome copy.
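For orientation, a copies-per-genome value of the kind reported as CNV/genome can be derived from ddPCR droplet counts as in the following minimal sketch. It applies the standard Poisson correction for droplet partitioning and assumes a hypothetical reference assay measured in the same number of droplets and present at a known copy number per genome copy; the present disclosure does not specify the reference assay or the exact calculation, and the droplet counts are invented.

```python
# Illustrative sketch only: reference assay and droplet counts are hypothetical.
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target copies per droplet."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1.0 - positive / total)


def copies_per_genome(target_positive: int, reference_positive: int, total: int,
                      reference_copies_per_genome: float = 1.0) -> float:
    """Integrated-DNA copies per genome copy, normalized to a reference assay."""
    reference = copies_per_droplet(reference_positive, total)
    if reference == 0:
        raise ValueError("reference assay has no positive droplets")
    target = copies_per_droplet(target_positive, total)
    return target / reference * reference_copies_per_genome


if __name__ == "__main__":
    print(copies_per_genome(target_positive=500, reference_positive=1000, total=20000))
```

The Poisson correction matters because a positive droplet may contain more than one copy, so the raw ratio of positive droplets slightly misestimates the copy-number ratio.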

FIGS. 31A and 31B illustrate that the endonuclease cleavage site of a retrotransposase can be detected by its indel signature. FIG. 31A shows the predicted binding and cleavage sites in the R2Tg retrotransposase target site. FIG. 31B shows that the cleavage site of the R2Tg retrotransposase was verified by analysis of genomic changes caused by endonuclease activity. Plasmid DNA encoding the R2Tg retrotransposase was nucleofected into U2OS cells, and genomic DNA was harvested three days later. Target site amplicons were generated using site-specific primers and sequenced to determine the location of genomic alterations indicative of endonuclease activity. Shown is a graph depicting the frequency of insertions (circles) and deletions (triangles) at each nucleotide of the sequence (x-axis). The peak of the insertion signal (horizontal line under the graph) is located at the predicted GG dinucleotide. Legend: ZF = zinc finger; myb = c-myb-like DNA binding motif.

FIGS. 32A and 32B show a schematic of the landing pad screen used to identify sequence determinants of the endonuclease activity of a retrotransposase. FIG. 32A shows that a lentiviral expression vector was used to clone landing pads containing the native R2 retrotransposase target site or a site comprising mutations relative to the native site. Lentiviral constructs were packaged and used to transduce U2OS cells to generate cell lines with landing pads integrated into the genome. The landing pad also includes a green fluorescent protein (GFP) reporter cassette for titer determination. FIG. 32B shows landing pad sequences for the wild-type or mutant variants of the R2 site. The landing pad containing the unmodified native rDNA sequence (wt_r2tg) was used as a positive control. A series of 16 landing pads is shown, with the mutated regions indicated in dark grey and the GG cleavage site indicated in light grey (left). The graph (right) visualizes the magnitude of the effect of each target site variant on the endonuclease activity of the enzyme. Mutation of the AA dinucleotide adjacent to the GG dinucleotide cleavage site was found to severely impair endonuclease activity, indicating that the motif AAGG is important for R2Tg endonuclease activity.

FIG. 33 shows an overview of the landing pad screen for retargeting Gene Writer polypeptides. Shown is a schematic of the landing pad library constructed to analyze the sequences recognized in R2Tg retargeting. An AAVS1 ZF binding site (dark grey, labeled AAVS1) was used as the DNA binding motif for retargeting, and all landing pads were constructed in the context of human AAVS1 genomic sequence. The rDNA sequence (black) was added to the AAVS1 sequence in various ways: (category 1) different lengths of rDNA sequence, (category 2) different distances between the AAVS1 ZF binding site and the rDNA sequence, and (category 3) different orientations of the rDNA sequence relative to the AAVS1 site. Combining categories 1, 2, and 3 resulted in landing pads with various rDNA sequence lengths and various distances and orientations relative to the AAVS1 ZF binding site. The minimal AAGG sequence for R2Tg cleavage was left unchanged in all landing pads (black boxes with white fill). Each landing pad was designed with a unique barcode at the 3' end of the sequence so that individual landing pad sequences can be computationally extracted from the pool and analyzed.

FIG. 34 shows a sequencing-based determination of landing pad representation in a U2OS pool. The landing pad pool of U2OS cells was sequenced and analyzed to determine barcode representation. Approximately 94% of the landing pads are represented by at least 10,000 reads (horizontal black bar). The x-axis represents landing pad identity and the y-axis shows the total number of reads for each barcode.
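The barcode representation analysis described for FIG. 34 can be illustrated with a minimal sketch. The function name, the input format (one already-extracted barcode string per read), and the toy barcodes are assumptions made for illustration only; they are not taken from the disclosure, and barcode extraction from raw reads is not shown.

```python
from collections import Counter


def barcode_representation(read_barcodes, known_barcodes, min_reads=10_000):
    """Count reads per landing pad barcode and report pool coverage.

    read_barcodes  -- iterable of barcode strings, one per sequencing read
                      (hypothetical input; extraction from reads is not shown)
    known_barcodes -- the designed landing pad barcodes
    min_reads      -- read threshold for calling a landing pad "represented"

    Returns (per-barcode read counts, fraction of designed barcodes that
    have at least min_reads reads).
    """
    known = list(dict.fromkeys(known_barcodes))  # de-duplicate, keep order
    if not known:
        raise ValueError("known_barcodes must not be empty")
    known_set = set(known)
    counts = Counter(b for b in read_barcodes if b in known_set)
    per_pad = {bc: counts.get(bc, 0) for bc in known}
    represented = sum(1 for n in per_pad.values() if n >= min_reads)
    return per_pad, represented / len(known)


# Toy usage with a lowered threshold so the example stays small.
per_pad, fraction = barcode_representation(
    ["AAA", "AAA", "CCC", "GGG", "TTT"], ["AAA", "CCC", "ACG"], min_reads=2
)
```

Reads whose barcode is not in the designed set are ignored here; a real pipeline would likely also tolerate sequencing errors in the barcode, which this sketch does not attempt.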

FIGS. 35A and 35B disclose that generating indel signatures in landing pad libraries enables screening for chimeric Gene Writer polypeptides. FIG. 35A shows landing pad libraries of various compositions comprising AAVS1 and R2 rDNA target sequences, treated with full-length R2Tg retrotransposase fused to zinc fingers for AAVS1 sequence recognition. Amplicon sequencing was performed and the frequency of insertion at the GG target site (y-axis) was plotted for each landing pad (x-axis). A representative set of 230 landing pads is shown on the x-axis. A positive control containing a 200 nt rDNA sequence is indicated and shows the expected insertion profile at the GG cleavage site. Negative controls lacking any rDNA sequence did not show any insertions. The lengths of the rDNA sequences contained in the landing pads in which an insertion signature was found are indicated and correspond to 44, 64, and 84 nt. FIG. 35B is an illustrative representation of the landing pad configurations found to contain signatures of endonuclease activity.

FIGS. 36A and 36B disclose that generating indel signatures in landing pad libraries enables screening for chimeric Gene Writer polypeptides. FIG. 36A shows landing pad libraries of various compositions comprising AAVS1 and R2 rDNA target sequences, treated with full-length R2Tg retrotransposase fused to zinc fingers for AAVS1 sequence recognition. Amplicon sequencing was performed and the frequency of insertion at the GG target site (y-axis) was plotted for each landing pad (x-axis). A representative set of 230 landing pads is shown on the x-axis. Negative controls lacking any rDNA sequence did not show any insertions. The lengths of the rDNA sequences contained in the landing pads in which an insertion signature was found are indicated and correspond to 44, 64, and 84 nt. FIG. 36B is an illustrative representation of the landing pad configurations found to contain signatures of endonuclease activity.

FIGS. 37A and 37B depict luciferase activity assays in primary cells. LNPs formulated according to Example 38 were analyzed for delivery of cargo to primary human (A) and mouse (B) hepatocytes according to Example 39. Luciferase assays showed dose-responsive luciferase activity in cell lysates, indicating successful delivery of RNA to the cells and expression of firefly luciferase from the mRNA cargo.

FIG. 38 shows LNP-mediated delivery of RNA cargo to the liver of mice. LNPs containing firefly luciferase mRNA were formulated and delivered to mice intravenously, and liver samples were collected 6, 24, and 48 hours after administration and assayed for luciferase activity. The reporter activity of the formulations ranked LIPIDV005 > LIPIDV004 > LIPIDV003. RNA expression was transient, and enzyme levels returned to near vehicle background by 48 hours after administration.

FIG. 39 shows improved expression of Cas-RT fusions by selection of linker sequences. To assess how the linker alters expression of the novel Gene Writer polypeptide in human cells, U2OS cells were transfected with Cas-RT expression plasmids (Cas9 (N863A) nickase fused to the RT domain of the RNA-binding-domain-mutated R2Bm retrotransposase) containing the various linkers from Table 42. Cell lysates were collected and analyzed by western blot using a primary antibody against Cas9. Primary antibodies to focal adhesion protein (left) or GAPDH (right) were included as loading controls. The Cas9 control on the left represents a titration of Cas9 expression plasmid. The open arrow indicates the original linker tested, while the closed arrow indicates the linker found to significantly improve expression of the fusion polypeptide (linker 10; SEQ ID NO: 468). The sample numbers correspond to the linker sequence identifiers in Table 42.

Detailed Description

The present disclosure relates to compositions, systems, and methods for, e.g., targeting, editing, modifying, or manipulating DNA sequences at one or more locations in a DNA sequence in a cell, tissue, or subject (e.g., inserting a heterologous subject DNA sequence into a target site of a mammalian genome), in vivo or in vitro. The subject DNA sequences may include, for example, coding sequences, regulatory sequences, gene expression units.

More specifically, the present disclosure provides retrotransposon-based systems for inserting a sequence of interest into a genome. The present disclosure is based in part on bioinformatic analysis to identify retrotransposase sequences and the associated 5' UTRs and 3' UTRs from a variety of organisms (see Table 3).

Gene Writer™ genome editors

Non-long terminal repeat (non-LTR) retrotransposons are a type of mobile genetic element that is widely distributed in eukaryotic genomes. They include two classes: the apurinic/apyrimidinic endonuclease (APE) type and the restriction enzyme-like endonuclease (RLE) type. APE-type retrotransposons comprise two functional domains: an endonuclease/DNA binding domain and a reverse transcriptase domain. RLE-type retrotransposons comprise three functional domains: a DNA binding domain, a reverse transcriptase domain, and an endonuclease domain. The reverse transcriptase domain of a non-LTR retrotransposon functions by binding an RNA sequence template and reverse transcribing it into the target DNA of the host genome. The RNA sequence template has a 3' untranslated region that is specifically bound by the retrotransposase and a variable 5' region that typically has one or more open reading frames ("ORFs") encoding the retrotransposase protein. The RNA sequence template may also contain a 5' untranslated region that is specifically bound by the retrotransposase.

Reverse transcription of non-LTR retrotransposons occurs through a unique process described as target-primed reverse transcription (TPRT) (Luan et al. Cell 72, 595-605 (1993)). To initiate integration, a first single-stranded nick is created by the endonuclease domain of the retrotransposase, releasing a free 3'-OH. The retrotransposon RNA (which is bound by the retrotransposase using structural features of its 3' end) is then primed by the target site, and polymerization proceeds from the free 3'-OH using the RNA as the template for reverse transcription. In some systems, a second nick is made in the second DNA strand and the new free 3'-OH is used to prime second-strand synthesis. For this second nick, some non-LTR retrotransposons (e.g., R2) are thought to also require a second retrotransposase unit bound at the 5' end of the retrotransposon RNA, which is activated upon release of the 5' end (Craig, Mobile DNA III, ASM, 3rd edition (2015)).

As described herein, elements of such non-LTR retrotransposons may be functionally modularized and/or modified to target, edit, modify, or manipulate a target DNA sequence, e.g., to insert a subject (e.g., heterologous) nucleic acid sequence into a target genome, e.g., a mammalian genome, by reverse transcription. Such modular and modified nucleic acids, polypeptide compositions, and systems are described herein and are referred to as Gene Writer™ gene editors. A Gene Writer™ gene editor system comprises: (A) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) either (x) an endonuclease domain comprising a DNA binding function or (y) an endonuclease domain and a separate DNA binding domain; and (B) a template RNA comprising (i) a sequence that binds the polypeptide and (ii) a heterologous insert sequence. For example, the Gene Writer genome editor protein may comprise a DNA binding domain, a reverse transcriptase domain, and an endonuclease domain. In other embodiments, the Gene Writer genome editor protein may comprise a reverse transcriptase domain and an endonuclease domain. In certain embodiments, elements of the Gene Writer™ gene editor polypeptide may be derived from sequences of non-LTR retrotransposons, e.g., APE-type or RLE-type retrotransposons or portions or domains thereof. In some embodiments, the RLE-type non-LTR retrotransposon is from the R2, NeSL, HERO, R4, or CRE clade. In some embodiments, the Gene Writer genome editor is derived from the R4 element X4_LINE, which is found in the human genome. In some embodiments, the APE-type non-LTR retrotransposon is from the R1 or Tx1 clade. In some embodiments, the Gene Writer genome editor is derived from the Tx1 element Mare6, which is found in the human genome.
The RNA template element of a Gene Writer™ gene editor system is typically heterologous to the polypeptide element and provides the subject sequence to be inserted (reverse transcribed) into the host genome. In some embodiments, the Gene Writer genome editor protein is capable of target-primed reverse transcription. In some embodiments, the Gene Writer genome editor protein is capable of second strand synthesis.

In some embodiments, the Gene Writer genome editor is combined with a second polypeptide. In some embodiments, the second polypeptide is derived from an APE-type non-LTR retrotransposon. In some embodiments, the second polypeptide has a zinc knuckle-like motif. In some embodiments, the second polypeptide is a homolog of the Gag protein. In some embodiments, the second polypeptide has specific binding activity for an RNA template. In some embodiments, the second polypeptide aids in localization of the RNA template to the nucleus.

In embodiments, the disclosure provides nucleic acid molecules or systems for retargeting, e.g., of a Gene Writer polypeptide or nucleic acid molecule, or of a system as described herein. Retargeting (e.g., of a Gene Writer polypeptide or nucleic acid molecule, or of a system as described herein) generally comprises: (i) directing the polypeptide to bind and cleave at the target site; and/or (ii) designing the template RNA to have complementarity to the target sequence. In some embodiments, the template RNA has complementarity to the target sequence 5' of the first-strand nick, e.g., such that the 3' end of the template RNA anneals and the 5' end of the target site serves as a primer, e.g., for target-primed reverse transcription (TPRT). In some embodiments, the endonuclease domain of the polypeptide and the 5' end of the RNA template are also modified as described.

Polypeptide component of Gene Writer Gene editor System

RT domain:

In certain aspects of the invention, the reverse transcriptase domain of the Gene Writer system is based on the reverse transcriptase domain of an APE-type or RLE-type non-LTR retrotransposon. The wild-type reverse transcriptase domain of an APE-type or RLE-type non-LTR retrotransposon can be used in a Gene Writer™ system or can be modified (e.g., by insertion, deletion, or substitution of one or more residues) to alter the reverse transcriptase activity for target DNA sequences. In some embodiments, the reverse transcriptase is altered from its native sequence to have altered codon usage, e.g., improved for human cells. In some embodiments, the reverse transcriptase domain is a heterologous reverse transcriptase from a different retrovirus, LTR retrotransposon, or non-LTR retrotransposon. In certain embodiments, a Gene Writer™ system includes a polypeptide comprising a reverse transcriptase domain of an RLE-type non-LTR retrotransposon from the R2, NeSL, HERO, R4, or CRE clade, or of an APE-type non-LTR retrotransposon from the R1 or Tx1 clade. In certain embodiments, a Gene Writer™ system includes a polypeptide comprising a reverse transcriptase domain of a non-LTR retrotransposon, LTR retrotransposon, group II intron, diversity-generating element, retron, telomerase, retroplasmid, retrovirus, or engineered polymerase listed in Table 1 or Table 3. In some embodiments, a Gene Writer™ system includes a polypeptide comprising a reverse transcriptase domain listed in Table 2.
In embodiments, the amino acid sequence of the reverse transcriptase domain of a Gene Writer™ system is at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to the amino acid sequence of a reverse transcriptase domain of a non-LTR retrotransposon, LTR retrotransposon, group II intron, diversity-generating element, retron, telomerase, retroplasmid, retrovirus, or engineered polymerase whose sequence is referenced in Table 1 or Table 3, or of a peptide comprising an RT domain listed in Table 2. In some embodiments, the RT domain has a sequence selected from Table 1 or 3, or a sequence of a peptide comprising an RT domain selected from Table 2, or a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical thereto. In some embodiments, the RT domain is derived from the RT of a retrovirus, e.g., HIV-1 RT, Moloney murine leukemia virus (MMLV) RT, avian myeloblastosis virus (AMV) RT, or Rous sarcoma virus (RSV) RT. In some embodiments, the RT domain is derived from the RT of a group II intron, e.g., the group II intron maturase RT from Eubacterium rectale (MarathonRT) (Zhao et al. RNA 24:2 2018), the RT domain from LtrA, or the RT TGIRT. In some embodiments, the RT domain is derived from the RT of a retron, e.g., the reverse transcriptase from Ec86 (RT86). In some embodiments, the RT domain is derived from a diversity-generating retroelement, e.g., the RT from Brt. In some embodiments, the RT domain is derived from the RT of a retroplasmid, e.g., the RT from the Mauriceville plasmid.
In some embodiments, the RT domain is derived from a non-LTR retrotransposon, e.g., the RT from R2Bm, the RT from R2Tg, the RT from LINE-1, or the RT from Penelope or a Penelope-like element (PLE). In some embodiments, the RT domain is derived from an LTR retrotransposon, e.g., the reverse transcriptase from Ty1. In some embodiments, the RT domain is derived from a telomerase, e.g., TERT. One of ordinary skill in the art can identify reverse transcriptase domains based on homology to other known reverse transcriptase domains using routine tools such as the Basic Local Alignment Search Tool (BLAST). In some embodiments, the reverse transcriptase comprises InterPro domain IPR000477. In some embodiments, the reverse transcriptase comprises Pfam domain PF00078. In some embodiments, the reverse transcriptase comprises InterPro domain IPR013103. In some embodiments, the RT comprises Pfam domain PF07727. In some embodiments, the reverse transcriptase comprises a conserved protein domain of the cd00304 RT_like family, such as cd01644 (RT_pepA17), cd01645 (RT_Rtv), cd01646 (RT_Bac_retron_I), cd01647 (RT_LTR), cd01648 (TERT), cd01650 (RT_nLTR_like), cd01651 (RT_G2_intron), cd01699 (RNA_dep_RNAP), cd01709 (RT_like_1), cd03487 (RT_Bac_retron_II), cd03714 (RT_DIRS1), or cd03715 (RT_ZFREV_like). Proteins containing these domains can also be found by searching protein databases (e.g., InterPro (Mitchell et al. Nucleic Acids Res 47, D351-360 (2019)), UniProt (The UniProt Consortium, Nucleic Acids Res 47, D506-515 (2019)), or the Conserved Domain Database (Lu et al. Nucleic Acids Res 48, D265-268 (2020))) or by scanning open reading frames for reverse transcriptase domains using prediction tools (e.g., InterProScan).
The diversity of reverse transcriptases (e.g., comprising RT domains) has been described in, but is not limited to, the following: reverse transcriptases used by prokaryotes (Zimmerly et al. Microbiol Spectr 3(2):MDNA3-0058-2014 (2015); Lampson B.C. (2007) Prokaryotic Reverse Transcriptases. In: Polaina J., MacCabe A.P. (eds) Industrial Enzymes. Springer, Dordrecht), reverse transcriptases used by viruses (Herschhorn et al. Cell Mol Life Sci 67(16):2717-2747 (2010); Menendez-Arias et al. Virus Res 234:153-176 (2017)), and reverse transcriptases used by mobile elements (Eickbush et al. Virus Res 134(1-2):221-234 (2008); Craig et al. Mobile DNA III, 3rd edition, DOI 10.1128/9781555819217 (2015)), each of which is incorporated herein by reference.
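The percent identity thresholds recited above for RT domains (e.g., at least 70% or at least 95% identical) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by some external tool, with `-` marking gaps, and it counts identity over all alignment columns; other conventions (for example, dividing by the shorter sequence length) give different values, and the disclosure does not specify this one. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = identical residue pairs / alignment columns,
    ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy peptides (not real RT domains): 4 identical columns out of 6.
identity = percent_identity("YADD-K", "YVDDQK")
```

In practice the alignment itself would come from a tool such as BLAST, as the text notes; this sketch only shows how a threshold is applied once an alignment exists.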

In some embodiments, the RT domain exhibits enhanced stringency of target-primed reverse transcription (TPRT) initiation, e.g., relative to an endogenous RT domain. In some embodiments, the RT domain initiates TPRT when the 3 nt in the target site immediately upstream of the first-strand nick, e.g., the genomic DNA that primes the RNA template, have at least 66% or 100% complementarity to the corresponding 3 nt in the RNA template. In some embodiments, the RT domain initiates TPRT when there are fewer than 5 nt of mismatch (e.g., fewer than 1, 2, 3, 4, or 5 nt of mismatch) between the template RNA homology region and the target DNA that primes reverse transcription. In some embodiments, the RT domain is modified such that the stringency for mismatches in priming the TPRT reaction is increased, e.g., wherein the RT domain does not tolerate any mismatches or tolerates fewer mismatches in the priming region relative to a wild-type (e.g., unmodified) RT domain. In some embodiments, the RT domain comprises an HIV-1 RT domain. In embodiments, the HIV-1 RT domain initiates lower levels of synthesis, even with three nucleotide mismatches, relative to an alternative RT domain (e.g., as described in Jamburuthugoda and Eickbush, J Mol Biol 407(5):661-672 (2011); incorporated herein by reference in its entirety).
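The priming criterion above (complementarity between the 3 nt of target DNA immediately upstream of the first-strand nick and the 3' end of the template RNA) can be sketched as follows. This is an illustrative string model only, under stated assumptions: both sequences are written 5' to 3', the nick lies immediately after position `nick_index` of the nicked strand, pairing is antiparallel Watson-Crick (A:U, T:A, G:C, C:G), and the function names and toy sequences are hypothetical.

```python
# DNA base -> RNA base that pairs with it (Watson-Crick only).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def priming_complementarity(target_strand, nick_index, template_rna, n=3):
    """Fraction of the n primer nucleotides that pair with the RNA 3' end.

    target_strand -- nicked DNA strand, 5'->3'
    nick_index    -- the nick lies between target_strand[nick_index - 1]
                     and target_strand[nick_index]
    template_rna  -- template RNA, 5'->3'; its last n nt anneal to the primer
    """
    if nick_index < n or nick_index > len(target_strand):
        raise ValueError("not enough target sequence upstream of the nick")
    if len(template_rna) < n:
        raise ValueError("template RNA shorter than priming window")
    primer = target_strand[nick_index - n:nick_index].upper()
    rna_3prime = template_rna[-n:].upper()
    # Antiparallel pairing: primer read 5'->3' pairs with RNA read 3'->5'.
    paired = sum(
        1 for d, r in zip(primer, reversed(rna_3prime))
        if DNA_TO_RNA_PAIR.get(d) == r
    )
    return paired / n


def permits_priming(fraction, min_fraction=0.66):
    """Apply a complementarity cutoff such as 66% (2 of 3) or 100% (3 of 3)."""
    return fraction >= min_fraction


# Toy example: primer is 'AAG'; an RNA ending in 'CUU' pairs at all 3 positions.
full = priming_complementarity("TTAAGGTAGC", 5, "GGGACUU")
partial = priming_complementarity("TTAAGGTAGC", 5, "GGGAGUU")
```

With a 3 nt window, the 66% cutoff corresponds to two of three positions pairing, which is why `min_fraction` defaults to 0.66 rather than 2/3 exactly.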

In some embodiments, the RT domain forms a dimer (e.g., a heterodimer or homodimer). In some embodiments, the RT domain is monomeric. In some embodiments, the RT domain, e.g., a retroviral RT domain, naturally functions as a monomer or as a dimer (e.g., a heterodimer or homodimer). In some embodiments, the RT domain naturally functions as a monomer, e.g., is derived from a virus in which it functions as a monomer. Exemplary monomeric RT domains, their viral sources, and the RT signatures associated with them can be found in Table 30, with descriptions of the domain signatures in Table 32. In some embodiments, the RT domain of a system described herein comprises an amino acid sequence of Table 30, or a functional fragment or variant thereof, or a sequence having at least 70%, 80%, 90%, 95%, or 99% sequence identity thereto. In embodiments, the RT domain is selected from an RT domain from murine leukemia virus (MLV; sometimes referred to as MoMLV) (e.g., UniProt P03355), porcine endogenous retrovirus (PERV) (e.g., UniProt Q4VFZ2), mouse mammary tumor virus (MMTV) (e.g., UniProt P03365), Mason-Pfizer monkey virus (MPMV) (e.g., UniProt P07572), bovine leukemia virus (BLV) (e.g., UniProt P03361), human T-cell leukemia virus-1 (HTLV-1) (e.g., UniProt P03362), human foamy virus (HFV) (e.g., UniProt P14350), simian foamy virus (SFV) (e.g., UniProt P23074), or bovine foamy/syncytial virus (BFV/BSV) (e.g., UniProt O41894), or functional fragments or variants thereof (e.g., an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto). In some embodiments, the RT domain naturally functions as a dimer. Exemplary dimeric RT domains, their viral sources, and the RT signatures associated with them can be found in Table 31, with descriptions of the domain signatures in Table 32. In some embodiments, the RT domain of a system described herein comprises an amino acid sequence of Table 31, or a functional fragment or variant thereof, or a sequence having at least 70%, 80%, 90%, 95%, or 99% sequence identity thereto.
In some embodiments, the RT domain is derived from a virus in which it functions as a dimer. In embodiments, the RT domain is selected from an RT domain from avian sarcoma/leukemia virus (ASLV) (e.g., UniProt A0A142BKH1), Rous sarcoma virus (RSV) (e.g., UniProt P03354), avian myeloblastosis virus (AMV) (e.g., UniProt Q83133), human immunodeficiency virus type I (HIV-1) (e.g., UniProt P03369), human immunodeficiency virus type II (HIV-2) (e.g., UniProt P15833), simian immunodeficiency virus (SIV) (e.g., UniProt P05896), bovine immunodeficiency virus (BIV) (e.g., UniProt P19560), equine infectious anemia virus (EIAV) (e.g., UniProt P03371), or feline immunodeficiency virus (FIV) (e.g., UniProt P16088) (Herschhorn and Hizi, Cell Mol Life Sci 67(16):2717-47 (2010)), or functional fragments or variants thereof (e.g., an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto). In some embodiments, a naturally heterodimeric RT domain may also function as a homodimer. In some embodiments, the dimeric RT domain is expressed as a fusion protein, e.g., a homodimeric fusion protein or a heterodimeric fusion protein. In some embodiments, the RT function of the system is provided by multiple RT domains (e.g., as described herein). In further embodiments, the multiple RT domains are fused or separate, e.g., may be on the same polypeptide or on different polypeptides.

In some embodiments, a Gene Writer described herein comprises an integrase domain, e.g., wherein the integrase domain may be part of the RT domain. In some embodiments, the RT domain (e.g., as described herein) comprises an integrase domain. In some embodiments, the RT domain (e.g., as described herein) lacks an integrase domain, or comprises an integrase domain that has been inactivated by mutation or deletion. In some embodiments, a Gene Writer described herein comprises an RNase H domain, e.g., wherein the RNase H domain may be part of the RT domain. In some embodiments, the RT domain (e.g., as described herein) comprises an RNase H domain, e.g., an endogenous RNase H domain or a heterologous RNase H domain. In some embodiments, the RT domain (e.g., as described herein) lacks an RNase H domain. In some embodiments, the RT domain (e.g., as described herein) comprises an RNase H domain that has been added, deleted, mutated, or swapped for a heterologous RNase H domain. In some embodiments, mutation of the RNase H domain yields a polypeptide exhibiting lower RNase activity, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% lower than an otherwise similar domain without the mutation, as determined by the method described in Kotewicz et al. Nucleic Acids Res 16(1):265-277 (1988), which is incorporated herein by reference in its entirety. In some embodiments, RNase H activity is eliminated.

In some embodiments, the RT domain is mutated to increase fidelity compared to an otherwise similar domain without the mutation. For example, in some embodiments, a YADD (SEQ ID NO: 1547) or YMDD (SEQ ID NO: 1548) motif in the RT domain (e.g., in a reverse transcriptase) is replaced with YVDD (SEQ ID NO: 1549). In embodiments, replacement of YADD (SEQ ID NO: 1547) or YMDD (SEQ ID NO: 1548) with YVDD (SEQ ID NO: 1549) results in higher fidelity of retroviral reverse transcriptase activity (e.g., as described in Jamburuthugoda and Eickbush, J Mol Biol 2011; incorporated herein by reference in its entirety).
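The motif replacement described above can be expressed as a simple sequence operation. The sketch below is illustrative only: the function name and the toy peptide are hypothetical, and the code replaces every YADD or YMDD occurrence it finds, whereas in a real RT domain one would first confirm that the occurrence being edited is the intended active-site motif.

```python
import re

# Matches either YADD or YMDD in a one-letter amino acid sequence.
YXDD_PATTERN = re.compile(r"Y[AM]DD")


def substitute_with_yvdd(protein_seq):
    """Replace each YADD or YMDD motif with YVDD.

    Returns (modified sequence, 0-based start positions of replaced motifs).
    """
    seq = protein_seq.upper()
    positions = [m.start() for m in YXDD_PATTERN.finditer(seq)]
    return YXDD_PATTERN.sub("YVDD", seq), positions


# Toy peptide (not a real RT sequence) carrying both motif variants.
modified, sites = substitute_with_yvdd("MKLYADDGSTYMDDQ")
```

Because the replacement preserves length, the reported positions are valid in both the original and the modified sequence.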

In some embodiments, the reverse transcriptase domain is modified, e.g., by site-specific mutagenesis. In some embodiments, the reverse transcriptase domain comprises a plurality of amino acid substitutions relative to the native sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. In embodiments, the reverse transcriptase domain is engineered to bind to heterologous template RNA.

Table 1: exemplary reverse transcriptase domains from different types of sources.

Sources include group II introns, non-LTR retrotransposons, retroviruses, LTR retrotransposons, diversity-generating retroelements, retrons, telomerases, retroplasmids, and evolved DNA polymerases. Associated RT signatures from the InterPro, Pfam, and cd databases are also included. Although the evolved polymerase RTX can perform RNA-dependent DNA polymerization, InterProScan does not identify RT signatures in it, and polymerase signatures are therefore included instead.

Table 2: InterPro descriptions of the signatures present in the reverse transcriptases in Table 1.

Table 30: Exemplary monomeric retroviral reverse transcriptases and their RT domain signatures

Table 31: Exemplary dimeric retroviral reverse transcriptases and their RT domain signatures

Table 32: InterPro descriptions of the signatures present in the reverse transcriptases in Table 30 (monomeric viral RTs) and Table 31 (dimeric viral RTs).

Endonuclease domain:

In certain embodiments, an endonuclease/DNA binding domain of an APE-type retrotransposon or an endonuclease domain of an RLE-type retrotransposon may be used in a Gene Writer system described herein, either unmodified or modified (e.g., by insertion, deletion, or substitution of one or more residues). In some embodiments, the endonuclease domain or endonuclease/DNA binding domain is altered from its native sequence to have altered codon usage, e.g., improved for human cells. In some embodiments, the endonuclease element is a heterologous endonuclease element, such as a Fok1 nuclease, a type II restriction-like endonuclease (RLE-type nuclease), or another RLE-type endonuclease (also known as REL). In some embodiments, the heterologous endonuclease has nickase activity and does not form double-strand breaks. In some embodiments, the heterologous endonuclease is a CRISPR-associated nuclease, such as Cas9, or a CRISPR-associated nuclease having nickase activity, such as a Cas9 nickase. The amino acid sequence of the endonuclease domain of a Gene Writer system described herein may be at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to the amino acid sequence of the endonuclease domain of a retrotransposon whose DNA sequence is referenced in Table 1, 2, or 3. One of ordinary skill in the art can identify endonuclease domains based on homology to other known endonuclease domains using tools such as the Basic Local Alignment Search Tool (BLAST). In certain embodiments, the heterologous endonuclease is Fok1 or a functional fragment thereof.
In certain embodiments, the heterologous endonuclease is a Holliday junction resolvase or a homolog thereof, such as the Holliday junction resolvase from Sulfolobus solfataricus, Ssol Hje (Govindaraju et al. Nucleic Acids Research 44:7, 2016). In certain embodiments, the heterologous endonuclease is the endonuclease of the large fragment of a spliceosomal protein, such as Prp8 (Mahbub et al. Mobile DNA 8:16, 2017). For example, a Gene Writer polypeptide described herein may comprise a reverse transcriptase domain from an APE-type or RLE-type retrotransposon and an endonuclease domain comprising Fok1 or a functional fragment thereof. In other embodiments, the cognate endonuclease domain is modified, e.g., by site-specific mutagenesis, to alter DNA endonuclease activity. In other embodiments, the endonuclease domain is modified to remove any potential DNA sequence specificity.

In addition to the target site nick required to initiate target-primed reverse transcription, supplemental endonuclease activity may help improve the resolution of integration events (Anzalone et al. Nature 576, 149-157 (2019)). In some embodiments, the endonuclease element of the polypeptide provides the nick for initiating target-primed reverse transcription, and an additional heterologous domain of the polypeptide provides additional endonuclease activity. In some embodiments, the additional endonuclease activity is provided by a nickase. In some embodiments, the additional endonuclease activity may be provided by a heterologous DNA binding element that also has endonuclease activity, such as a Cas9 nickase. In some embodiments, the additional endonuclease activity may be included in the first Gene Writer polypeptide. In some embodiments, the additional endonuclease activity may be provided by a separate polypeptide.

In some embodiments, the Gene Writer polypeptides described herein comprise an endonuclease domain that cleaves at a predetermined position in a target DNA sequence, e.g., as measured using the assay of example 32 herein. In some embodiments, the endonuclease domain cleaves at a GG site in the target DNA sequence. In some embodiments, the endonuclease domain cleaves at an AAGG site in the target DNA sequence. In some embodiments, a target DNA sequence described herein comprises a GG or AAGG motif, e.g., a motif naturally occurring in the human genome.

DNA binding domain:

in certain aspects, the DNA binding domain of the Gene Writer polypeptides described herein is selected, designed, or constructed to bind to a desired host DNA target sequence. In certain embodiments, the DNA-binding domain of the engineered RLE is a heterologous DNA-binding protein or domain relative to the native retrotransposon sequence. In some embodiments, the heterologous DNA binding element is a zinc finger element or TAL effector element, such as a zinc finger or TAL polypeptide or functional fragment thereof. In some embodiments, the heterologous DNA binding element is a sequence-directed DNA binding element, such as Cas9, cpf1, or other CRISPR-associated protein that has been altered to have no endonuclease activity. In some embodiments, the heterologous DNA binding element retains endonuclease activity. In some embodiments, the heterologous DNA binding element retains only single stranded DNA cleavage activity, e.g., is a DNA nickase, e.g., is a Cas9 nickase. In some embodiments, a heterologous DNA binding element having endonuclease activity replaces the endonuclease element of the polypeptide. In some embodiments, the heterologous DNA binding element having endonuclease activity supplements the endonuclease element of the polypeptide, e.g., causes additional nicks at the target site. In particular embodiments, the heterologous DNA binding domain may be any one or more of Cas9, TAL domain, ZF domain, myb domain, combinations thereof, or multiples thereof. In certain embodiments, the heterologous DNA binding domain is a DNA binding domain of a retrotransposon described in the tables herein. One of ordinary skill in the art can identify DNA binding domains based on homology to other known DNA binding domains using tools as Basic Local Alignment Search Tools (BLAST). 
In other embodiments, the DNA binding domain is modified, e.g., by site-specific mutation, by increasing or decreasing the number of DNA binding elements (e.g., the number and/or specificity of zinc fingers), etc., to alter DNA binding specificity and affinity. In some embodiments, the DNA binding domain is altered from its native sequence to have altered codon usage, e.g., improved for human cells.

In some embodiments, the polypeptides described herein comprise a mutation in the DNA binding domain. In some embodiments, the mutation reduces or eliminates the DNA binding activity of the DNA binding domain, e.g., in the assay of example 30, e.g., to less than 50%, 40%, 30%, 20%, 10%, 5%, 2% or 1% of that of the corresponding wild-type sequence. The mutation may be, for example, in the ZF1 domain, the ZF2 domain, or the c-myb domain. The mutation may be a point mutation. The mutation may be in a C residue (e.g., C to S), e.g., in a C residue of the ZF1 or ZF2 domain; in an R residue (e.g., R to A), e.g., in an R residue of the c-myb domain; in a W residue (e.g., W to A), e.g., in a W residue of the c-myb domain; or any combination thereof. In some embodiments, the polypeptide comprising a mutation in the DNA binding domain further comprises a heterologous DNA binding domain.

In some embodiments, naturally occurring AAGG sequences in the genome are used as seeds for retargeting an R2 retrotransposase-based Gene Writing system, wherein the DNA binding domain is mutated or replaced by a heterologous DNA binding domain, such that binding of the Gene Writing polypeptide to the new target site correctly positions the endonuclease domain at the AAGG motif for endonuclease activity. In some embodiments, a target DNA sequence described herein comprises a motif (e.g., a GG or AAGG motif) that is recognized by an endonuclease domain, e.g., a motif naturally occurring in the human genome. In some embodiments, the Gene Writer comprises a DNA binding domain (e.g., a heterologous DNA binding domain) that binds near a motif recognized by the endonuclease domain, e.g., in a manner that positions the endonuclease domain of the Gene Writer to cleave the motif. In some embodiments, the DNA binding domain binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 nucleotides of a motif (e.g., a GG or AAGG motif) that is recognized by the endonuclease domain. The DNA binding domain may bind to a site upstream or downstream of the GG or AAGG motif. The DNA binding domain may bind to a site that is in the same orientation as, or in the reverse-complement orientation relative to, the motif recognized by the endonuclease domain (e.g., a GG or AAGG motif). In some embodiments, the retargeted Gene Writer polypeptide comprises (i) an endonuclease domain that recognizes a motif, and (ii) a heterologous DNA binding domain that recognizes a genomic DNA sequence. In some embodiments, the motif is about 30-80, 40-70, 50-60, or 55 nt upstream of the genomic DNA sequence, wherein optionally the motif and the genomic DNA sequence are in the same orientation.
In some embodiments, the motif is about 10-30, 15-25, or 20 nt downstream of the genomic DNA sequence, wherein optionally the motif is in the opposite orientation to the genomic DNA sequence. In some embodiments, the DNA binding domain comprises a meganuclease domain (e.g., as described herein, e.g., in the endonuclease domain section), or a functional fragment thereof. In some embodiments, the meganuclease domain has endonuclease activity, e.g., double-strand cleavage and/or nickase activity. In other embodiments, the meganuclease domain has reduced activity, e.g., lacks endonuclease activity, e.g., the meganuclease is catalytically inactive. In some embodiments, a catalytically inactive meganuclease is used as the DNA binding domain, e.g., as described in Fonfara et al., Nucleic Acids Res. 40(2): 847-860 (2012), which is incorporated herein by reference in its entirety. In embodiments, the DNA binding domain comprises one or more modifications relative to the wild-type DNA binding domain, e.g., introduced via directed evolution (e.g., phage-assisted continuous evolution (PACE)).
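
By way of illustration only, the spacing relationship described above can be sketched in Python. The sketch scans a DNA sequence for occurrences of a motif (AAGG by default) on either strand and keeps those lying within a chosen number of nucleotides of a DNA binding site. The function name, the half-open coordinate convention, the definition of distance as the number of nucleotides between the nearest edges, and the example sequence are all assumptions made for this sketch and are not part of the disclosure.

```python
import re

# Complement table for reverse-complementing an upper-case DNA motif.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motifs_near_site(seq, site_start, site_end, motif="AAGG", max_dist=60):
    """Return sorted (position, strand, distance) tuples for motif hits.

    seq        : DNA sequence, read 5'->3' on the top strand
    site_start : first index of the DNA binding site (0-based, inclusive)
    site_end   : end index of the DNA binding site (exclusive)
    distance   : nucleotides between the motif and the nearest site edge
                 (0 when the motif overlaps the site); an assumed convention
    """
    seq = seq.upper()
    rc_motif = motif.translate(_COMPLEMENT)[::-1]
    hits = []
    # "+" hits read as the motif on the top strand; "-" hits read as the
    # motif on the bottom strand (its reverse complement on the top strand).
    # Note: a palindromic motif would be reported once per strand.
    for strand, pattern in (("+", motif), ("-", rc_motif)):
        # A zero-width lookahead lets overlapping occurrences be found.
        for match in re.finditer(f"(?={re.escape(pattern)})", seq):
            start = match.start()
            end = start + len(pattern)
            if end <= site_start:      # motif lies upstream of the site
                dist = site_start - end
            elif start >= site_end:    # motif lies downstream of the site
                dist = start - site_end
            else:                      # motif overlaps the site
                dist = 0
            if dist <= max_dist:
                hits.append((start, strand, dist))
    return sorted(hits)


# Hypothetical example: a 14-nt binding site flanked by one motif per strand.
example = "AAGG" + "T" * 10 + "GATTACAGATTACA" + "T" * 5 + "CCTT"
nearby = find_motifs_near_site(example, 14, 28, max_dist=60)
```

In this sketch, tightening `max_dist` corresponds to choosing a smaller value from the list of distances recited above.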

In certain aspects of the invention, the host DNA binding site into which the Gene Writer system integrates may be in a gene, in an intron, in an exon, in an ORF, outside the coding region of any gene, in a regulatory region of a gene, or outside a regulatory region of a gene. In other aspects, the engineered RLE may bind to one or more host DNA sequences.

In some embodiments, the Gene Writing system is used to edit a target locus in multiple alleles. In some embodiments, the Gene Writing system is designed to edit a particular allele. For example, a Gene Writing polypeptide may be directed to a specific sequence that is present on only one allele, e.g., the system comprises a template RNA having homology (e.g., in a gRNA or annealing domain) to the target allele but not to a second, homologous allele. In some embodiments, the Gene Writing system can alter haplotype-specific alleles. In some embodiments, a Gene Writing system that targets a particular allele preferentially targets that allele, e.g., has a bias of at least 2-, 4-, 6-, 8-, or 10-fold for the target allele.

In certain embodiments, the Gene Writer™ gene editor system RNA further comprises an intracellular localization sequence, e.g., a nuclear localization sequence. The nuclear localization sequence may be an RNA sequence that promotes import of the RNA into the nucleus. In certain embodiments, the nuclear localization signal is located on the template RNA. In certain embodiments, the reverse transcriptase polypeptide is encoded on a first RNA and the template RNA is a second, separate RNA, and the nuclear localization signal is located on the template RNA but not on the RNA encoding the reverse transcriptase polypeptide. While not wishing to be bound by theory, in some embodiments, the RNA encoding the retrotransposase is targeted primarily to the cytoplasm to facilitate its translation, while the template RNA is targeted primarily to the nucleus to facilitate its retrotransposition into the genome. In some embodiments, the nuclear localization signal is at the 3' end, at the 5' end, or in an internal region of the template RNA. In some embodiments, the nuclear localization signal is 3' of the heterologous sequence (e.g., directly 3' of the heterologous sequence) or 5' of the heterologous sequence (e.g., directly 5' of the heterologous sequence). In some embodiments, the nuclear localization signal is placed outside the 5' UTR or outside the 3' UTR of the template RNA. In some embodiments, the nuclear localization signal is placed between the 5' UTR and the 3' UTR, wherein optionally the nuclear localization signal is not transcribed with the transgene (e.g., the nuclear localization signal is in an antisense orientation or is downstream of a transcription termination signal or polyadenylation signal). In some embodiments, the nuclear localization sequence is located inside an intron. In some embodiments, a plurality of identical or different nuclear localization signals are present in the RNA, e.g., in the template RNA.
In some embodiments, the nuclear localization signal is less than 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1000 bp in length. Various RNA nuclear localization sequences may be used. For example, Lubelsky and Ulitsky, Nature 555 (107-111), 2018 describes RNA sequences that drive the localization of RNA into the nucleus. In some embodiments, the nuclear localization signal is a SINE-derived nuclear RNA localization (SIRLOIN) signal. In some embodiments, the nuclear localization signal binds to a nuclear-enriched protein. In some embodiments, the nuclear localization signal binds to the HNRNPK protein. In some embodiments, the nuclear localization signal is pyrimidine-rich, e.g., is a C/T-rich, C/U-rich, C-rich, T-rich, or U-rich region. In some embodiments, the nuclear localization signal is derived from a long non-coding RNA. In some embodiments, the nuclear localization signal is derived from the MALAT1 long non-coding RNA or is the 600-nucleotide M region of MALAT1 (described in Miyagawa et al., RNA 18, (738-751), 2012). In some embodiments, the nuclear localization signal is derived from the BORG long non-coding RNA or is an AGCCC motif (described in Zhang et al., Molecular and Cellular Biology 34, 2318-2329 (2014)). In some embodiments, the nuclear localization sequence is described in Shukla et al., The EMBO Journal e98452 (2018). In some embodiments, the nuclear localization signal is derived from a non-LTR retrotransposon, an LTR retrotransposon, a retrovirus, or an endogenous retrovirus.

In some embodiments, the polypeptides described herein comprise one or more (e.g., 2, 3, 4, 5) nuclear targeting sequences, such as nuclear localization sequences (NLS), e.g., as described above. In some embodiments, the NLS is a bipartite NLS. In some embodiments, the NLS facilitates import of a protein comprising the NLS into the nucleus. In some embodiments, the NLS is fused to the N-terminus of a Gene Writer described herein. In some embodiments, the NLS is fused to the C-terminus of the Gene Writer. In some embodiments, the NLS is fused to the N-terminus or C-terminus of the Cas domain. In some embodiments, a linker sequence is disposed between the NLS and an adjacent domain of the Gene Writer.

In some embodiments, the NLS comprises the amino acid sequence MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 1585), PKKRKVEGADKRTADGSEFESPKKKRKV (SEQ ID NO: 1586), RKSGKIAAIWKRPRKPKKKRKV (SEQ ID NO: 1587), KRTADGSEFESPKKKRKV (SEQ ID NO: 1588), KKTELQTTNAENKTKKL (SEQ ID NO: 1589), KRGINDRNFWRGENGRKTR (SEQ ID NO: 1590), or KRPAATKKAGQAKKKK (SEQ ID NO: 1591), or a functional fragment or variant thereof. Exemplary NLS sequences are also described in PCT/EP 2000/01690, which is incorporated herein by reference for its disclosure of exemplary nuclear localization sequences. In some embodiments, the NLS comprises an amino acid sequence as disclosed in table 39. An NLS of this table may be used in one or more copies at one or more positions in a polypeptide, e.g., 1, 2, 3 or more copies of the NLS in the N-terminal domain, within a peptide domain, between peptide domains, in the C-terminal domain, or in a combination of positions, to improve subcellular localization to the nucleus. Multiple unique sequences can be used in a single polypeptide. The sequences may be naturally monopartite or bipartite, e.g., having one or two stretches of basic amino acids, or may be used as chimeric bipartite sequences. Sequence references correspond to UniProt accession numbers unless indicated as SeqNLS, which denotes sequences mined using a subcellular localization prediction algorithm (Lin et al., BMC Bioinformatics 13:157 (2012), incorporated herein by reference in its entirety).

Table 39: exemplary Nuclear localization Signal for Gene Writing System


In some embodiments, the NLS is a bipartite NLS. A bipartite NLS typically comprises two clusters of basic amino acids separated by a spacer sequence (which may be, for example, about 10 amino acids in length). A monopartite NLS typically lacks a spacer. An example of a bipartite NLS is the nucleoplasmin NLS, which has the sequence KR[PAATKKAGQA]KKKK (SEQ ID NO: 1591), with the spacer shown in brackets. Another exemplary bipartite NLS has the sequence PKKKRKVEGADKRTADGSEFESPKKKRKV (SEQ ID NO: 1593). Exemplary NLSs are described in international application WO 2020051561, which is incorporated herein by reference in its entirety, including for its disclosure of nuclear localization sequences.
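
Purely as an illustration of the two-cluster architecture described above, the following Python sketch splits a candidate sequence into a first basic cluster, a spacer, and a second basic cluster. The regular expression (two K/R residues, a spacer of 9 to 12 residues, then three or more K/R residues) is an assumption chosen for this sketch to reflect "about 10 amino acids"; it is not a validated NLS predictor, and the function name is hypothetical.

```python
import re

# Assumed pattern: [KR]{2}, a lazily matched 9-12 residue spacer, then [KR]{3,}.
# The lazy quantifier keeps the spacer as short as possible so that basic
# residues are assigned to the second cluster rather than to the spacer.
_BIPARTITE = re.compile(r"([KR]{2})(.{9,12}?)([KR]{3,})")


def split_bipartite_nls(peptide):
    """Return (cluster1, spacer, cluster2) for the first match, else None."""
    match = _BIPARTITE.search(peptide.upper())
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


# The nucleoplasmin NLS recited above (SEQ ID NO: 1591).
parts = split_bipartite_nls("KRPAATKKAGQAKKKK")
```

Applied to the nucleoplasmin sequence, the sketch recovers the bracketed spacer PAATKKAGQA; a short single-cluster sequence such as PKKKRKV returns no match because it contains no spacer.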

In certain embodiments, the Gene Writer™ gene editor system polypeptide further comprises an intracellular localization sequence, e.g., a nuclear localization sequence and/or a nucleolar localization sequence. The nuclear localization sequence and/or nucleolar localization sequence may be an amino acid sequence that promotes import of the protein into the nucleus and/or nucleolus, where it can promote integration of the heterologous sequence into the genome. In certain embodiments, the Gene Writer gene editor system polypeptide (e.g., a retrotransposase, e.g., a polypeptide according to any one of tables 1, 2, or 3 herein) further comprises a nucleolar localization sequence. In certain embodiments, the reverse transcriptase polypeptide is encoded on a first RNA, the template RNA is a second, separate RNA, and the nucleolar localization signal is encoded on the RNA encoding the reverse transcriptase polypeptide but not on the template RNA. In some embodiments, the nucleolar localization signal is located at the N-terminus, at the C-terminus, or in an internal region of the polypeptide. In some embodiments, a plurality of identical or different nucleolar localization signals are used. In some embodiments, the nuclear localization signal is less than 5, 10, 25, 50, 75, or 100 amino acids in length. Various polypeptide nucleolar localization signals may be used. For example, Yang et al., Journal of Biomedical Science 22, 33 (2015) describes a nuclear localization signal that also functions as a nucleolar localization signal. In some embodiments, the nucleolar localization signal may also be a nuclear localization signal. In some embodiments, the nucleolar localization signal may overlap with a nuclear localization signal. In some embodiments, the nucleolar localization signal may comprise a stretch of basic residues. In some embodiments, the nucleolar localization signal may be rich in arginine and lysine residues.
In some embodiments, the nucleolar localization signal may be derived from a protein that is enriched in the nucleolus. In some embodiments, the nucleolar localization signal may be derived from a protein enriched at ribosomal RNA loci. In some embodiments, the nucleolar localization signal may be derived from a protein that binds rRNA. In some embodiments, the nucleolar localization signal may be derived from MSP58. In some embodiments, the nucleolar localization signal may be a monopartite motif. In some embodiments, the nucleolar localization signal may be a bipartite motif. In some embodiments, the nucleolar localization signal may consist of a plurality of monopartite or bipartite motifs. In some embodiments, the nucleolar localization signal may consist of a mix of monopartite and bipartite motifs. In some embodiments, the nucleolar localization signal may be a dual bipartite motif. In some embodiments, the nucleolar localization signal may be KRASSQALGTIPKRRSSSRFIKRKK (SEQ ID NO: 1530). In some embodiments, the nucleolar localization signal may be derived from nuclear factor-κB-inducing kinase. In some embodiments, the nucleolar localization signal may be the RKKKKKKK motif (SEQ ID NO: 1531) (Birbach et al., Journal of Cell Science 117 (3615-3624), 2004).

Because endogenous nucleolar localization signals may help drive a Gene Writer polypeptide into the nucleolus when the polypeptide is derived from a retrotransposon that naturally targets rDNA (e.g., R1, R2, R4, R8, R9), it may be beneficial to inactivate such signals when retargeting to a site outside rDNA. Endogenous nucleolar localization signals (NoLS) can be predicted computationally using a published algorithm trained on validated proteins that localize to the nucleolus (Scott, M.S., et al., Nucleic Acids Research, 38(21), 7388-7399 (2010)). NoLS sequences are predicted on the basis of the amino acid sequence, the amino acid sequence context, and the predicted secondary structure of the retrotransposase. The identified sequences are typically rich in basic amino acids (Scott, M.S., et al., Nucleic Acids Research, 38(21), 7388-7399 (2010)), and mutating these residues to non-basic amino acids with simple side chains, or removing them from the polypeptide chain, prevents localization to the nucleolus (Yang, C.P., et al., Journal of Biomedical Science, 22(1), 1-15 (2015); Martin, R.M., et al., Nucleus, 6(4), 314-325 (2015)). In some embodiments, the NoLS sequence is located in the amino acid region of the retrotransposase between the reverse transcriptase domain and the restriction-like endonuclease domain. In some embodiments, the predicted NoLS region comprises lysine, arginine, histidine, and/or glutamine residues, and nucleolar localization is inactivated by mutating one or more of these residues to alanine and/or removing them from the polypeptide.
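
The inactivation strategy described above (substituting alanine for, or deleting, lysine, arginine, histidine, and/or glutamine residues within a predicted NoLS region) can be sketched in Python as follows. This is a minimal illustration only: the function name, the 0-based half-open coordinates, and the toy sequence are hypothetical, and the sketch assumes the NoLS region has already been predicted by a separate tool.

```python
def inactivate_nols(protein, start, end, residues="KRHQ", mode="alanine"):
    """Mutate or remove selected residues inside protein[start:end].

    protein  : amino acid sequence (one-letter codes)
    start    : first index of the predicted NoLS region (0-based, inclusive)
    end      : end index of the predicted NoLS region (exclusive)
    residues : residue types to inactivate (K, R, H, Q by default)
    mode     : "alanine" substitutes A for each selected residue;
               "delete" removes each selected residue from the chain
    """
    region = protein[start:end]
    if mode == "alanine":
        edited = "".join("A" if aa in residues else aa for aa in region)
    elif mode == "delete":
        edited = "".join(aa for aa in region if aa not in residues)
    else:
        raise ValueError("mode must be 'alanine' or 'delete'")
    # Residues outside the predicted region are left unchanged.
    return protein[:start] + edited + protein[end:]


# Hypothetical toy sequence with a basic stretch at positions 4-9.
toy = "MSTGKRKRHQLLE"
alanine_variant = inactivate_nols(toy, 4, 10)
deletion_variant = inactivate_nols(toy, 4, 10, mode="delete")
```

Restricting the edit to the predicted region leaves basic residues elsewhere in the polypeptide, such as those in a separate nuclear localization sequence, untouched.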

In some embodiments, a nucleic acid described herein (e.g., an RNA encoding a Gene Writer polypeptide, or a DNA encoding the RNA) comprises a microRNA binding site. In some embodiments, the microRNA binding site is used to increase the target cell specificity of the Gene Writer system. For example, the microRNA binding site can be chosen on the basis that it is recognized by a miRNA that is present in a non-target cell type but absent from the target cell type (or present there at a reduced level relative to the non-target cell). Thus, when the RNA encoding the Gene Writer polypeptide is present in a non-target cell, it is bound by the miRNA, whereas when the RNA encoding the Gene Writer polypeptide is present in a target cell, it is not bound by the miRNA (or is bound, but at a reduced level relative to the non-target cell). While not wishing to be bound by theory, binding of the miRNA to the RNA encoding the Gene Writer polypeptide may reduce production of the Gene Writer polypeptide, e.g., by degrading the mRNA encoding the polypeptide or by interfering with translation. Accordingly, the heterologous subject sequence would be inserted into the genome of target cells more efficiently than into the genome of non-target cells. A system having a microRNA binding site in the RNA encoding the Gene Writer polypeptide (or encoded in the DNA encoding the RNA) may also be used in combination with a template RNA that is regulated by a second microRNA binding site, e.g., as described herein in the section entitled "Template RNA component of the Gene Writer™ gene editor system". In some embodiments, e.g., for liver indications, the miRNA is selected from table 4 of WO 2020014209 (which is incorporated herein by reference).
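
As a simple illustration of how a microRNA binding site might be derived, the Python sketch below computes the reverse complement of a miRNA sequence, i.e., an RNA site that is fully complementary to that miRNA. Full complementarity is an assumption made for this sketch, the function name is hypothetical, and the miRNA sequence shown is an arbitrary placeholder rather than any miRNA named in the disclosure.

```python
# Complement table for RNA (A<->U, G<->C).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def mirna_binding_site(mirna):
    """Return the RNA sequence fully complementary to the miRNA (5'->3')."""
    mirna = mirna.upper().replace("T", "U")  # accept DNA-style input as well
    return mirna.translate(_RNA_COMPLEMENT)[::-1]


# Placeholder miRNA sequence, for illustration only.
placeholder_mirna = "AUGCAUGCAUGCAUGCAUGCAU"
site = mirna_binding_site(placeholder_mirna)
```

One or more copies of such a site could then be placed in the RNA encoding the polypeptide, e.g., in an untranslated region, although the placement is likewise an assumption of this sketch.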

In some embodiments, the DNA encoding the Gene Writer polypeptide comprises a promoter sequence, e.g., a tissue-specific promoter sequence. In some embodiments, the tissue-specific promoter is used to increase the target cell specificity of the Gene Writer™ system. For example, the promoter can be chosen on the basis that it is active in the target cell type but not active (or active at a lower level) in a non-target cell type. A system having a tissue-specific promoter sequence in the DNA encoding the polypeptide may also be used in combination with a microRNA binding site, e.g., in the template RNA or in a nucleic acid encoding the Gene Writer™ protein, e.g., as described herein. A system having a tissue-specific promoter sequence in the DNA encoding the Gene Writer polypeptide may also be used in combination with a DNA encoding the RNA template that is driven by a tissue-specific promoter, e.g., to achieve higher levels of RNA template in target cells than in non-target cells. In some embodiments, e.g., for liver indications, the tissue-specific promoter is selected from table 3 of WO 2020014209 (which is incorporated herein by reference).

The skilled artisan can determine the nucleic acid and corresponding polypeptide sequences of each retrotransposon and its domains based on the accession numbers provided in tables 1-3, e.g., by using routine sequence analysis tools such as the Basic Local Alignment Search Tool (BLAST) or CD-Search (for conserved domain analysis). Other sequence analysis tools are known and can be found, e.g., at https://molbiol-tools.ca/motifs.htm. SEQ ID NOs 1-112 correspond to the rows of Table 1, and SEQ ID NOs 113-1015 correspond to the first 903 rows of Table 2.

Tables 1-3 herein provide sequences of exemplary transposons, including the amino acid sequence of the retrotransposase, the sequences of the 5' and 3' untranslated regions that allow the retrotransposase to bind the template RNA, and the complete transposon nucleic acid sequence. In some embodiments, the 5' UTR of any of tables 1-3 allows the retrotransposase to bind to the template RNA. In some embodiments, the 3' UTR of any of tables 1-3 allows the retrotransposase to bind to the template RNA. Thus, in some embodiments, a polypeptide for use in any of the systems described herein can be a polypeptide of any of tables 1-3 herein, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, the system further comprises one or both of the 5' or 3' untranslated regions of any one of tables 1-3 herein (or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), e.g., from the same transposon as the polypeptide referred to in the preceding sentence, as indicated in the same row of the same table. In some embodiments, the system comprises one or both of the 5' or 3' untranslated regions of any of tables 1-3 herein, e.g., a segment of the complete transposon sequence that encodes an RNA capable of binding a retrotransposase, and/or the subsequence provided in the column entitled Predicted 5' UTR or Predicted 3' UTR.
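
For illustration, the percent identity thresholds recited above can be evaluated on a pair of already aligned sequences with the following Python sketch. Several conventions for percent identity exist; the one used here (matching columns divided by all alignment columns that are not gaps in both sequences) is an assumption of this sketch, as are the function names. Producing the alignment itself (e.g., with BLAST) is outside the scope of the sketch.

```python
# Thresholds recited in the text, in ascending order.
THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two equal-length aligned sequences.

    Columns in which both sequences carry a gap are ignored; a gap opposite
    a residue counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / columns


def highest_threshold_met(identity, thresholds=THRESHOLDS):
    """Return the largest recited threshold not exceeding identity, or None."""
    met = [t for t in thresholds if identity >= t]
    return max(met) if met else None


# Hypothetical aligned fragments: 4 matching columns out of 6.
identity = percent_identity("ACGT-A", "ACCTTA")
```

A candidate sequence would fall within the "at least 90%" embodiment, for example, when `highest_threshold_met` returns 90 or more.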

In some embodiments, a polypeptide for use in any of the systems described herein can be a molecular reconstruction or ancestral reconstruction based on the aligned polypeptide sequences of multiple retrotransposons. In some embodiments, a 5' or 3' untranslated region for use in any of the systems described herein can be a molecular reconstruction based on the aligned 5' or 3' untranslated regions of multiple retrotransposons. Based on the accession numbers provided herein, the skilled artisan can align polypeptide or nucleic acid sequences, e.g., by using routine sequence analysis tools such as the Basic Local Alignment Search Tool (BLAST) or CD-Search (for conserved domain analysis). A molecular reconstruction can be created based on a consensus sequence, e.g., as described in Ivics et al., Cell 1997, 501-510; Wagstaff et al., Molecular Biology and Evolution 2013, 88-99. In some embodiments, the retrotransposon from which the 5' or 3' untranslated region or polypeptide is derived is a young or recently active mobile element, as assessed by phylogenetic methods such as those described in Boissinot et al., Molecular Biology and Evolution 2000, 915-928.
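
A consensus-based reconstruction of the kind mentioned above can be illustrated with a simple majority-rule consensus over a multiple sequence alignment. The Python sketch below is a deliberately minimal stand-in and is not the method of the cited references: the function name, the treatment of ties (reported as X), and the skipping of all-gap columns are assumptions made for this sketch.

```python
from collections import Counter


def majority_consensus(aligned_seqs, gap="-", ambiguous="X"):
    """Majority-rule consensus of equal-length aligned sequences.

    For each alignment column, the most frequent non-gap residue is chosen.
    Ties are reported with the ambiguous symbol, and columns containing only
    gaps are skipped (both are assumed conventions).
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            continue  # column contains only gaps
        top = max(counts.values())
        winners = [residue for residue, n in counts.items() if n == top]
        consensus.append(winners[0] if len(winners) == 1 else ambiguous)
    return "".join(consensus)


# Hypothetical aligned fragments from three related elements.
fragments = ["MKTA", "MKSA", "MRTA"]
reconstructed = majority_consensus(fragments)
```

In practice the input would be an alignment of retrotransposase or untranslated region sequences retrieved by accession number, and phylogeny-aware methods may be preferred over a plain majority rule.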

Table 3 (below) shows exemplary Gene Writer proteins and related sequences from a variety of retrotransposases, identified using data mining. Column 1 indicates the family to which the retrotransposon belongs. Column 2 lists the element name. Column 3 indicates the accession number, if any. Column 4 lists the organism in which the retrotransposase is found. Column 5 lists the DNA sequence of the retrotransposon. Column 6 lists the predicted 5' untranslated region and column 7 lists the predicted 3' untranslated region; both are segments of the sequence of column 5 that are predicted to allow the template RNA to bind the retrotransposase of column 8. (It will be understood that columns 5-7 show DNA sequences, and that an RNA sequence according to any of columns 5-7 would typically include uracil rather than thymidine.) Column 8 lists the predicted retrotransposase amino acid sequence encoded in the retrotransposon of column 5.


Exemplary cis Gene Writer example

In some embodiments, the writing domain (e.g., RT domain) comprises an RNA binding domain, e.g., that specifically binds to an RNA sequence. In some embodiments, the template RNA comprises an RNA sequence that is specifically bound by an RNA binding domain of the writing domain.

Template nucleic acid binding domain:

Gene Writer polypeptides typically comprise a region capable of associating with the Gene Writer template nucleic acid (e.g., template RNA). In some embodiments, the template nucleic acid binding domain is an RNA binding domain. In some embodiments, the RNA binding domain is a modular domain that can associate with an RNA molecule containing a particular feature (e.g., a structural motif, e.g., a secondary structure present in the 3' UTR of a non-LTR retrotransposon). In other embodiments, the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the reverse transcription domain, e.g., the reverse transcriptase-derived component has a known RNA preference signature, e.g., for a secondary structure present in the 3' UTR of a non-LTR retrotransposon. In other embodiments, the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the DNA binding domain. For example, in some embodiments, the DNA binding domain is a CRISPR-associated protein that recognizes the structure of a template nucleic acid (e.g., template RNA) comprising a gRNA. In some embodiments, the gRNA is a short synthetic RNA composed of a scaffold sequence involved in CRISPR-associated protein binding and a user-defined targeting sequence of about 20 nucleotides directed to a genomic target. The structure of a complete gRNA is described in Nishimasu et al., Cell 156, pp. 935-949 (2014). The gRNA (also referred to as an sgRNA, for single guide RNA) consists of crRNA-derived and tracrRNA-derived sequences connected by an artificial tetraloop. The crRNA sequence can be divided into a guide region (20 nt) and a repeat region (12 nt), while the tracrRNA sequence can be divided into an anti-repeat region (14 nt) and three tracrRNA stem loops (Nishimasu et al., Cell 156, pp. 935-949 (2014)).
In practice, the guide RNA sequence is typically designed to have a length of 17-24 nucleotides (e.g., 19, 20, or 21 nucleotides) and to be complementary to the target nucleic acid sequence. Custom gRNA generators and algorithms for designing effective guide RNAs are commercially available. In some embodiments, the gRNA comprises the two RNA components of a native CRISPR system, e.g., a crRNA and a tracrRNA. As is well known in the art, the gRNA may also comprise a chimeric single guide RNA (sgRNA) containing sequence from the tracrRNA (to bind the nuclease) and at least one crRNA (to direct the nuclease to the sequence targeted for editing/binding). Chemically modified sgRNAs have also been demonstrated to be effective for use with CRISPR-associated proteins; see, e.g., Hendel et al. (2015) Nature Biotechnol., 985-991. In some embodiments, the gRNA comprises a nucleic acid sequence that is complementary to a DNA sequence associated with a target gene. In some embodiments, the polypeptide comprises a DNA binding domain comprising a CRISPR-associated protein that associates with a gRNA, which allows the DNA binding domain to bind a target genomic DNA sequence. In some embodiments, the gRNA is contained within the template nucleic acid (e.g., template RNA), and thus the DNA binding domain is also the template nucleic acid binding domain. In some embodiments, the polypeptide has RNA binding function in multiple domains, e.g., can bind a gRNA structure in a CRISPR-associated DNA binding domain and a 3' UTR structure in a non-LTR retrotransposon-derived reverse transcription domain.
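
As a hedged illustration of guide selection, the Python sketch below lists candidate guide sequences of a chosen length (20 nucleotides by default, within the 17-24 nucleotide range noted above) that lie immediately 5' of a protospacer adjacent motif on the supplied strand. The NGG motif is the one commonly associated with Streptococcus pyogenes Cas9 and is used here as an assumption; other CRISPR-associated proteins recognize other motifs. The function name and the example sequence are hypothetical, only the supplied strand is scanned, and no scoring of on-target or off-target activity is attempted.

```python
import re


def candidate_guides(dna, guide_len=20, pam="NGG"):
    """List (start, guide_rna, pam_seq) for each guide_len-mer followed by a PAM.

    dna       : target DNA, read 5'->3'; only this strand is scanned
    guide_len : length of the guide region (the text notes 17-24 nt is typical)
    pam       : PAM with N as a wildcard (NGG is an assumed default)
    """
    pam_pattern = pam.upper().replace("N", "[ACGT]")
    # A lookahead with capture groups reports overlapping candidates.
    pattern = re.compile(f"(?=([ACGT]{{{guide_len}}})({pam_pattern}))")
    results = []
    for match in pattern.finditer(dna.upper()):
        protospacer, pam_seq = match.group(1), match.group(2)
        # The guide is written as RNA, so thymine is replaced by uracil.
        results.append((match.start(), protospacer.replace("T", "U"), pam_seq))
    return results


# Hypothetical target: one 20-nt protospacer followed by a TGG PAM.
target = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
guides = candidate_guides(target)
```

To cover both strands, the same function could be applied to the reverse complement of the target sequence.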

Endonuclease domain:

In some embodiments, the Gene Writer polypeptide has the function of cleaving a DNA target site by means of an endonuclease domain. In some embodiments, the endonuclease domain is also a DNA binding domain. In some embodiments, the endonuclease domain is also a template nucleic acid (e.g., template RNA) binding domain. For example, in some embodiments, the polypeptide comprises a CRISPR-associated endonuclease domain that binds a template RNA comprising a gRNA, binds a target DNA sequence (e.g., one complementary to a portion of the gRNA), and cleaves the target DNA sequence. In certain embodiments, an endonuclease/DNA binding domain of an APE-type retrotransposon or an endonuclease domain of an RLE-type retrotransposon may be used, or may be modified (e.g., by insertion, deletion, or substitution of one or more residues), in the Gene Writer systems described herein. In some embodiments, the endonuclease domain or endonuclease/DNA binding domain is altered from its native sequence to have altered codon usage, e.g., improved for human cells. In some embodiments, the endonuclease element is a heterologous endonuclease element, such as Fok1 nuclease, a type II restriction-like endonuclease (RLE-type nuclease), or another RLE-type endonuclease (also known as REL). In some embodiments, the heterologous endonuclease has nickase activity and does not form double-strand breaks. The amino acid sequence of the endonuclease domain of a Gene Writer system described herein may be at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to the amino acid sequence of the endonuclease domain of a retrotransposon whose DNA sequence is recited in table 1, 2 or 3.
One of ordinary skill in the art can identify endonuclease domains based on homology to other known endonuclease domains using tools such as the Basic Local Alignment Search Tool (BLAST). In certain embodiments, the heterologous endonuclease is Fok1 or a functional fragment thereof. In certain embodiments, the heterologous endonuclease is a Holliday junction resolvase or a homolog thereof, such as the Holliday junction resolving enzyme from Sulfolobus solfataricus, Ssol Hje (Govindaraju et al., Nucleic Acids Research 44:7, 2016). In certain embodiments, the heterologous endonuclease is the endonuclease of the large fragment of a spliceosomal protein, such as Prp8 (Mahbub et al., Mobile DNA 8:16, 2017). In certain embodiments, the heterologous endonuclease is derived from a CRISPR-associated protein, e.g., Cas9. In certain embodiments, the heterologous endonuclease is engineered to have only ssDNA cleavage activity, e.g., only nickase activity, e.g., is a Cas9 nickase. For example, a Gene Writer polypeptide described herein may comprise a reverse transcriptase domain from an APE-type or RLE-type retrotransposon and an endonuclease domain comprising Fok1 or a functional fragment thereof. In other embodiments, a homologous endonuclease domain is modified, e.g., by site-specific mutagenesis, to alter DNA endonuclease activity. In other embodiments, the endonuclease domain is modified to remove any latent DNA sequence specificity.

In some embodiments, the endonuclease domain has nickase activity and does not form double-strand breaks. In some embodiments, the endonuclease domain forms single-strand breaks at a higher frequency than double-strand breaks, e.g., at least 90%, 95%, 96%, 97%, 98%, or 99% of the breaks are single-strand breaks, or less than 10%, 5%, 4%, 3%, 2%, or 1% of the breaks are double-strand breaks. In some embodiments, the endonuclease forms substantially no double-strand breaks. In some embodiments, the endonuclease does not form detectable levels of double-strand breaks.

In some embodiments, the endonuclease domain has nickase activity that nicks the target site DNA of the strand to be edited; e.g., in some embodiments, the endonuclease domain cleaves the genomic DNA of the target site near the site of alteration on the strand that will be extended by the writing domain. In some embodiments, the endonuclease domain has nickase activity that nicks the target site DNA of the strand to be edited and does not nick the target site DNA of the unedited strand. For example, when a polypeptide comprises a CRISPR-associated endonuclease domain that has nickase activity and does not form double-strand breaks, in some embodiments, the CRISPR-associated endonuclease domain nicks the target site DNA strand that contains the PAM site (e.g., and does not nick the target site DNA strand that does not contain the PAM site).

In some other embodiments, the endonuclease domain has nicking enzyme activity that nicks the target site DNA of both the strand to be edited and the unedited strand. Without wishing to be bound by theory, after the writing domain (e.g., RT domain) of a polypeptide described herein polymerizes (e.g., reverse transcribes) from the heterologous subject sequence of a template nucleic acid (e.g., template RNA), the cellular DNA repair machinery must repair the nick on the strand to be edited. The target site DNA now contains two different sequences for the strand to be edited: one corresponding to the original genomic DNA and a second corresponding to the sequence polymerized from the heterologous subject sequence. It is thought that the two sequences equilibrate with one another, first one hybridizing to the unedited strand and then the other, and that which of the two the cellular DNA repair machinery incorporates into the repaired target site is random. Without wishing to be bound by theory, it is thought that introducing an additional nick into the unedited strand may bias the cellular DNA repair machinery toward adopting the sequence based on the heterologous subject sequence more frequently than the original genomic sequence. In some embodiments, the additional nick is positioned at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, or 150 nucleotides 5' or 3' of the target site modification (e.g., the insertion, deletion, or substitution) or of the nick on the strand to be edited.

Alternatively or additionally, without wishing to be bound by theory, it is thought that an additional nick on the second strand may promote synthesis of the unedited strand. In some embodiments, when the Gene Writer has inserted or replaced a portion of the edited strand, a new sequence corresponding to the insertion or replacement must be synthesized in the unedited strand.

In some embodiments, the polypeptide comprises a single domain (e.g., a single endonuclease domain) having endonuclease activity and the domain nicks the strand to be edited and the unedited strand. For example, in such embodiments, the endonuclease domain can be a CRISPR-associated endonuclease domain, and the template nucleic acid (e.g., template RNA) comprises a gRNA that directs nicking of a strand to be edited and an additional gRNA that directs nicking of an unedited strand. In some embodiments, the polypeptide comprises a plurality of domains having endonuclease activity, and the first endonuclease domain nicks the strand to be edited and the second endonuclease domain nicks the unedited strand (optionally, the first endonuclease domain does not (e.g., cannot) nick the unedited strand and the second endonuclease domain does not (e.g., cannot) nick the strand to be edited).

In some embodiments, the endonuclease domain is capable of nicking the first strand and the second strand. In some embodiments, the first and second strand nicks occur at the same position in the target site but on opposite strands. In some embodiments, the second strand nick occurs at a staggered position, e.g., upstream or downstream, relative to the first nick. In some embodiments, the endonuclease domain generates a target site deletion if the second strand nick is upstream of the first strand nick. In some embodiments, the endonuclease domain generates a target site duplication if the second strand nick is downstream of the first strand nick. In some embodiments, the endonuclease domain generates no duplication and/or deletion if the first and second strand nicks occur at the same position in the target site (e.g., as described in Gladyshev and Arkhipova, Gene 2009, incorporated herein by reference in its entirety). In some embodiments, the endonuclease domain has altered activity depending on protein conformation or RNA binding status, e.g., which promotes nicking of the first or the second strand (e.g., as described in Christensen et al., PNAS 2006, incorporated herein by reference in its entirety).
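As a rough illustration of the three cases just described, the sketch below maps the relative position of the second strand nick to the predicted target-site outcome. The function name and the coordinate convention (integer positions that increase in the downstream direction) are assumptions made for this example and are not part of the disclosure.

```python
def predicted_outcome(first_nick: int, second_nick: int) -> str:
    """Classify the target-site outcome from two nick positions.

    Positions are hypothetical integer coordinates that increase in the
    downstream direction; only their relative order matters here.
    """
    if second_nick < first_nick:
        return "target site deletion"      # second nick upstream of the first
    if second_nick > first_nick:
        return "target site duplication"   # second nick downstream of the first
    return "no duplication or deletion"    # both nicks at the same position


# Toy positions, chosen only to exercise each branch.
for first, second in [(100, 95), (100, 104), (100, 100)]:
    print(first, second, predicted_outcome(first, second))
```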

In some embodiments, the endonuclease domain comprises a meganuclease or a functional fragment thereof. In some embodiments, the endonuclease domain comprises a homing endonuclease or a functional fragment thereof. In some embodiments, the endonuclease domain comprises a meganuclease from the LAGLIDADG (SEQ ID NO: 1594), GIY-YIG, HNH, His-Cys box, or PD-(D/E)XK family, or a functional fragment or variant thereof, e.g., having the conserved amino acid motif indicated by the family name. In some embodiments, the endonuclease domain comprises a meganuclease or fragment thereof selected from, e.g., I-SmaMI (Uniprot F7WD42), I-SceI (Uniprot P03882), I-AniI (Uniprot P03880), I-DmoI (Uniprot P21505), I-CreI (Uniprot P05725), I-TevI (Uniprot P13299), I-OnuI (Uniprot Q4VWW5), or I-BmoI (Uniprot Q9ANR6). In some embodiments, the meganuclease is naturally monomeric in its functional form, e.g., I-SceI or I-TevI, or dimeric, e.g., I-CreI. For example, LAGLIDADG (SEQ ID NO: 1594) meganucleases with a single copy of the LAGLIDADG (SEQ ID NO: 1594) motif typically form homodimers, whereas members with two copies of the LAGLIDADG (SEQ ID NO: 1594) motif are typically found as monomers. In some embodiments, a meganuclease that normally forms a dimer is expressed as a fusion, e.g., the two subunits are expressed as a single ORF and, optionally, connected by a linker, e.g., an I-CreI dimer fusion (Rodriguez-Fornes et al., Gene Therapy 2020; incorporated herein by reference in its entirety). In some embodiments, a meganuclease or functional fragment thereof is altered to favor nicking enzyme activity on one strand of a double-stranded DNA molecule, e.g., I-SceI (K122I and/or K223I) (Niu et al., J Mol Biol 2008), I-AniI (K227M) (McConnell Smith et al., PNAS 2009), or I-DmoI (Q42A and/or K120M) (Molina et al., J Biol Chem 2015). 
In some embodiments, a meganuclease or functional fragment thereof with such a preference for single-strand cleavage is used as the endonuclease domain, e.g., with nicking enzyme activity. In some embodiments, the endonuclease domain comprises a meganuclease or functional fragment thereof that naturally targets, or is engineered to target, a safe harbor site, e.g., an I-CreI targeting the SH6 site (Rodriguez-Fornes et al., supra). In some embodiments, the endonuclease domain comprises a meganuclease or functional fragment thereof having a sequence-tolerant catalytic domain, e.g., I-TevI, which recognizes the minimal motif CNNNG (Kleinstiver et al., PNAS 2012). In some embodiments, the target sequence-tolerant catalytic domain is fused to a DNA binding domain, e.g., to direct its activity, e.g., by fusing I-TevI to: (i) a zinc finger to produce Tev-ZFE (Kleinstiver et al., PNAS 2012), (ii) another meganuclease to produce MegaTevs (Wolfs et al., Nucleic Acids Res 2014), and/or (iii) Cas9 to produce TevCas9 (Wolfs et al., PNAS 2016).

In some embodiments, the endonuclease domain comprises a restriction enzyme, e.g., a type IIS or type IIP restriction enzyme. In some embodiments, the endonuclease domain comprises a type IIS restriction enzyme, e.g., FokI, or a fragment or variant thereof. In some embodiments, the endonuclease domain comprises a type IIP restriction enzyme, e.g., PvuII, or a fragment or variant thereof. In some embodiments, a dimeric restriction enzyme is expressed as a fusion such that it functions as a single chain, e.g., a FokI dimer fusion (Minczuk et al., Nucleic Acids Res 36(12):3926-3938 (2008)).

The use of additional endonuclease domains is described, for example, in Guha and Edgell, Int J Mol Sci 18(22):2565 (2017), which is incorporated herein by reference in its entirety.

In some embodiments, the endonuclease domain or DNA-binding domain (e.g., as described herein) comprises a Cas protein, e.g., Streptococcus pyogenes Cas9 (SpCas9) or a functional fragment or variant thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises a modified SpCas9. In embodiments, the modified SpCas9 comprises a modification that alters protospacer adjacent motif (PAM) specificity. In embodiments, the PAM specificity is for the nucleic acid sequence 5'-NGT-3'. In embodiments, the modified SpCas9 comprises one or more amino acid substitutions, e.g., at one or more of positions L1111, D1135, G1218, E1219, A1322, or R1335, e.g., selected from L1111R, D1135V, G1218R, E1219F, A1322R, and R1335V. In embodiments, the modified SpCas9 comprises the amino acid substitution T1337R and one or more additional amino acid substitutions, e.g., selected from L1111, D1135L, S1136R, G1218S, E1219V, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, R1335Q, T1337, T1337L, T1337Q, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337H, T1337Q, and T1337M, or corresponding amino acid substitutions thereto. In embodiments, the modified SpCas9 comprises: (i) one or more amino acid substitutions selected from D1135L, S1136R, G1218S, E1219V, A1322R, R1335Q, and T1337; and (ii) one or more amino acid substitutions selected from L1111R, G1218R, E1219F, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, T1337L, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337R, T1337H, T1337Q, and T1337M, or amino acid substitutions corresponding to those listed.

In some embodiments, the Gene Writer can comprise a Cas protein as listed in Table 40. The predicted or validated nickase mutations for installing nickase activity in the Cas proteins shown in Table 40 are based on the characteristics of the SpCas9 (N863A) mutation. In some embodiments, a system described herein comprises a Gene Writer protein of Table 3 and a Cas protein of Table 40A. In some embodiments, a Gene Writer protein of Table 3 is fused to a Cas protein of Table 40A.

Table 40A: CRISPR/Cas proteins, species and mutations

Table 40B provides parameters defining the components needed to design a gRNA and/or template RNA for applying the Cas variants listed in Table 40A to Gene Writing. The tier indicates the preferred Cas variants, where they are available for use at a given locus. The table also indicates the validated or predicted protospacer adjacent motif (PAM) requirement and the validated or predicted position of the cut site (relative to the most upstream base of the PAM site). The gRNA for a given enzyme can be assembled by joining the crRNA, tetraloop, and tracrRNA sequences and further adding a 5' spacer, of a length between spacer (min) and spacer (max), that matches the protospacer at the target site. In addition, the predicted position of the ssDNA nick at the target is important for designing the 3' region of the template RNA, which needs to anneal to the sequence immediately 5' of the nick in order to initiate target-primed reverse transcription.
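The gRNA assembly described above can be sketched as a simple concatenation with a spacer length check. Everything specific in this example is an assumption made for illustration: the function names, the toy crRNA, tetraloop, and tracrRNA fragments, the spacer bounds, and the sign convention for the cut offset are placeholders and are not values taken from Table 40B.

```python
def assemble_grna(spacer: str, crrna: str, tetraloop: str, tracrrna: str,
                  spacer_min: int, spacer_max: int) -> str:
    """Join 5' spacer + crRNA + tetraloop + tracrRNA into one gRNA (5'->3')."""
    if not spacer_min <= len(spacer) <= spacer_max:
        raise ValueError(
            f"spacer length {len(spacer)} is outside {spacer_min}-{spacer_max} nt"
        )
    return spacer + crrna + tetraloop + tracrrna


def nick_position(pam_start: int, cut_offset: int) -> int:
    """Locate the nick from the PAM position.

    pam_start is the 0-based index of the most upstream PAM base. cut_offset
    is the cut position relative to that base (assumed negative when the cut
    lies upstream of the PAM). The nick falls between the returned index and
    the base before it.
    """
    return pam_start + cut_offset


# Hypothetical 20-nt spacer and toy scaffold pieces (placeholders only).
grna = assemble_grna(
    spacer="GACUGACUGACUGACUGACU",
    crrna="GUUUUAGAGCUA",
    tetraloop="GAAA",
    tracrrna="UAGCAAGUUAAAAU",
    spacer_min=18,
    spacer_max=22,
)
print(grna)
print(nick_position(pam_start=23, cut_offset=-3))
```

Keeping the spacer bounds and cut offset as parameters lets the same helper be reused for any enzyme once its table values are known.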

Table 40B defines parameters for designing the necessary components of gRNA and/or template RNA to apply the Cas variants listed in Table 40A to Gene Writing

In some embodiments, the endonuclease domain or DNA-binding domain (e.g., as described herein) comprises a Cas domain, e.g., a Cas9 domain. In embodiments, the endonuclease domain or DNA-binding domain comprises a nuclease-active Cas domain, a Cas nickase (nCas) domain, or a nuclease-inactive Cas (dCas) domain. In embodiments, the endonuclease domain or DNA-binding domain comprises a nuclease-active Cas9 domain, a Cas9 nickase (nCas9) domain, or a nuclease-inactive Cas9 (dCas9) domain. In some embodiments, the endonuclease domain or DNA-binding domain comprises a Cas9 domain of Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA-binding domain comprises Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA-binding domain comprises a Streptococcus pyogenes or Streptococcus thermophilus Cas9, or a functional fragment thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas9 sequence, e.g., as described in Chylinski, Rhun, and Charpentier (2013) RNA Biology 10:5, 726-737, which is incorporated herein by reference. In some embodiments, the endonuclease domain or DNA-binding domain comprises the HNH nuclease subdomain and/or the RuvC1 subdomain of a Cas, e.g., Cas9, e.g., as described herein, or a variant thereof. In some embodiments, the endonuclease domain or DNA-binding domain comprises Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA-binding domain comprises a Cas polypeptide (e.g., an enzyme) or a functional fragment thereof. 
In embodiments, the Cas polypeptide (e.g., enzyme) is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8a, Cas8b, Cas8c, Cas9 (e.g., Csn1 or Csx12), Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, Cas12i, Csy1, Csy2, Csy3, Csy4, Cse1, Cse2, Cse3, Cse4, Cse5e, Csc1, Csc2, Csa5, Csn1, Csn2, Csm1, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx1S, Csf1, Csf2, CsO, Csf4, Csd1, Csd2, Cst1, Cst2, Csh1, Csh2, Csa1, Csa2, Csa3, Csa4, Csa5, Type II Cas effector proteins, Type V Cas effector proteins, Type VI Cas effector proteins, CARF, DinG, Cpf1, Cas12b/C2c1, Cas12c/C2c3, SpCas9(K855A), eSpCas9(1.1), SpCas9-HF1, hyper-accurate Cas9 variant (HypaCas9), homologs thereof, modified or engineered versions thereof, and/or functional fragments thereof. In embodiments, the Cas9 comprises one or more substitutions, e.g., selected from H840A, D10A, P475A, W476A, N477A, D1125A, W1126A, and D1127A. In embodiments, the Cas9 comprises one or more mutations at positions selected from D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987, e.g., one or more substitutions selected from D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A. 
In some embodiments, the endonuclease domain or DNA-binding domain comprises a Cas (e.g., Cas9) sequence, or a fragment or variant thereof, from Corynebacterium ulcerans, Corynebacterium diphtheria, Spiroplasma syrphidicola, Prevotella intermedia, Spiroplasma taiwanense, Streptococcus iniae, Belliella baltica, Psychroflexus torquis, Streptococcus thermophilus, Listeria innocua, Campylobacter jejuni, Neisseria meningitidis, Streptococcus pyogenes, or Staphylococcus aureus.

In some embodiments, the endonuclease domain or DNA binding domain (e.g., as described herein) comprises a Cpf1 domain, e.g., comprising one or more substitutions, e.g., at positions D917, E1006, D1255, or any combination thereof, e.g., selected from D917A, E1006A, D1255A, D917A/E1006A, D917A/D1255A, E1006A/D1255A, and D917A/E1006A/D1255A.

In some embodiments, the endonuclease domain or DNA binding domain (e.g., as described herein) comprises SpCas9, SpCas9-VRQR (SEQ ID NO: 1696), SpCas9-VRER (SEQ ID NO: 1697), xCas9 (sp), SaCas9-KKH, SpCas9-MQKSER (SEQ ID NO: 1698), SpCas9-LRKIQK (SEQ ID NO: 1699), or SpCas9-LRVSQL (SEQ ID NO: 1700).

In some embodiments, the endonuclease domain or DNA binding domain (e.g., as described herein) comprises an amino acid sequence as listed in Table 37 below, or an amino acid sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto. In some embodiments, an endonuclease domain or DNA binding domain comprises an amino acid sequence that has no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 differences (e.g., mutations) relative to any of the amino acid sequences described herein.

Table 37. Each of the reference sequences is incorporated by reference in its entirety.

In some embodiments, the Gene Writing polypeptide has an endonuclease domain comprising a Cas9 nickase, e.g., Cas9 H840A. In embodiments, the Cas9 H840A has the following amino acid sequence:

Cas9 nickase (H840A):

DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ ID NO:1597)

In some embodiments, the Gene Writing polypeptide comprises an RT domain from a retroviral reverse transcriptase, e.g., a wild-type M-MLV RT, e.g., comprising the following sequence:

M-MLV (WT):

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLI(SEQ ID NO:1598)

In some embodiments, the Gene Writing polypeptide comprises an RT domain from a retroviral reverse transcriptase, e.g., an M-MLV RT, e.g., comprising the following sequence:

TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL(SEQ ID NO:1556)

In some embodiments, the Gene Writing polypeptide comprises an RT domain from a retroviral reverse transcriptase comprising the sequence of amino acids 659-1329 of NP_057933. In embodiments, the Gene Writing polypeptide further comprises one additional amino acid at the N-terminus of the sequence of amino acids 659-1329 of NP_057933, e.g., as shown below:

core RT (bold), annotated as above

RNase H(underlined), as noted above

In embodiments, the Gene Writing polypeptide further comprises one additional amino acid at the C-terminus of the sequence of amino acids 659-1329 of NP_057933. In embodiments, the Gene Writing polypeptide comprises an RNase H1 domain (e.g., amino acids 1178-1318 of NP_057933).
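The residue ranges above can be pulled out of a longer polyprotein sequence with a small helper. This is a minimal sketch assuming the usual convention that residue numbers are 1-based and inclusive; the function name and the toy sequence are hypothetical, and the actual NP_057933 sequence is not reproduced here.

```python
def extract_residues(protein: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("residue range is outside the sequence")
    # Convert the 1-based inclusive range to a 0-based half-open Python slice.
    return protein[start - 1:end]


toy = "ACDEFGHIKL"                    # placeholder sequence, not NP_057933
print(extract_residues(toy, 3, 5))    # residues 3-5 of the toy sequence

# A range such as 659-1329 spans 1329 - 659 + 1 residues; extending it by one
# residue at the N-terminus corresponds to start = 658.
placeholder = "A" * 1400
print(len(extract_residues(placeholder, 659, 1329)))
print(len(extract_residues(placeholder, 658, 1329)))
```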

In some embodiments, a retroviral reverse transcriptase domain, e.g., M-MLV RT, may comprise one or more mutations relative to the wild-type sequence that may improve characteristics of the RT, e.g., thermostability, processivity, and/or template binding. In some embodiments, the M-MLV RT domain comprises one or more mutations relative to the M-MLV (WT) sequence above, e.g., selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, and K103L, e.g., a combination of mutations, e.g., D200N, L603W, and T330P, optionally further comprising T306K and W313F. In some embodiments, an M-MLV RT used herein comprises the mutations D200N, L603W, T330P, T306K, and W313F. In embodiments, the mutant M-MLV RT comprises the following amino acid sequence:

M-MLV (PE2):

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLI(SEQ ID NO:1600)
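Substitutions such as those listed above can be read off by comparing a variant sequence with its reference position by position. The sketch below assumes two ungapped sequences of equal length; the function names and the toy five-residue sequences are hypothetical, and the `offset` parameter is an assumption for cases where a fragment does not start at residue 1. Sequences of different lengths would need a proper alignment first, which this sketch does not attempt.

```python
def list_substitutions(reference: str, variant: str, offset: int = 0) -> list[str]:
    """List substitutions in the form <ref residue><position><variant residue>."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (no gaps handled)")
    return [
        f"{ref}{position + offset}{var}"
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


def percent_identity(reference: str, variant: str) -> float:
    """Percent of positions that are identical in two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (no gaps handled)")
    matches = sum(ref == var for ref, var in zip(reference, variant))
    return 100.0 * matches / len(reference)


# Toy fragments that differ at one position (placeholders, not real RT sequence).
print(list_substitutions("MKTDL", "MKTNL"))
print(list_substitutions("MKTDL", "MKTNL", offset=196))
print(percent_identity("MKTDL", "MKTNL"))
```

Applied to the full-length M-MLV (WT) and M-MLV (PE2) sequences, the same comparison would be expected to report the substitutions that distinguish the two, provided the numbering starts at the first residue shown.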

In some embodiments, the Gene Writer polypeptide comprises a flexible linker between the endonuclease and the RT domain, e.g., a linker comprising the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 1601). In some embodiments, the RT domain of the Gene Writer polypeptide may be located C-terminal to the endonuclease domain. In some embodiments, the RT domain of the Gene Writer polypeptide may be located N-terminal to the endonuclease domain.

In some embodiments, the Gene Writer polypeptide comprises a dCas9 sequence that contains D10A and/or H840A mutations, e.g., the following sequences:

SMDKKYSIGLAIGTNSVGWAVITDDYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ ID NO:1602)

In some embodiments, the template RNA molecule used in the system comprises, from 5' to 3': (1) a gRNA spacer; (2) a gRNA scaffold; (3) a heterologous subject sequence; and (4) a 3' homology domain. In some embodiments:

(1) Is a Cas9 spacer of about 18-22nt (e.g., 20 nt).

(2) Is a gRNA scaffold comprising one or more hairpin loops, e.g., 1, 2, or 3 loops, for associating the template with the nickase Cas9 domain. In some embodiments, the gRNA scaffold carries, from 5' to 3', the sequence GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC (SEQ ID NO: 1603).

(3) In some embodiments, the heterologous subject sequence is, e.g., 7-74 nt in length, e.g., 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, or 80-90 nt. In some embodiments, the first (most 5') base of the sequence is not C.

(4) In some embodiments, the 3' homology domain, which binds the target priming sequence after nicking occurs, is, e.g., 3-20 nt, e.g., 7-15 nt, e.g., 12-14 nt. In some embodiments, the 3' homology domain has a GC content of 40%-60%.
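The four-part layout above can be sketched as a concatenation plus a few checks on the stated preferences. This is a minimal illustration under stated assumptions: the function names and the toy spacer, heterologous sequence, and 3' homology domain are hypothetical, the sequences are written in the DNA alphabet to match the scaffold as printed, and the checks return warnings rather than errors because the ranges are described only as applying in some embodiments.

```python
# gRNA scaffold copied from the text above (SEQ ID NO: 1603), stray space removed.
SCAFFOLD = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAG"
    "TCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC"
)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_parts(spacer: str, heterologous: str, homology_3p: str) -> list[str]:
    """Return warnings for parts that fall outside the preferences listed above."""
    warnings = []
    if not 18 <= len(spacer) <= 22:
        warnings.append("spacer is not 18-22 nt")
    if heterologous.startswith("C"):
        warnings.append("first base of heterologous sequence is C")
    if not 3 <= len(homology_3p) <= 20:
        warnings.append("3' homology domain is not 3-20 nt")
    if not 0.40 <= gc_fraction(homology_3p) <= 0.60:
        warnings.append("3' homology domain GC content is outside 40%-60%")
    return warnings


def assemble_template(spacer: str, heterologous: str, homology_3p: str,
                      scaffold: str = SCAFFOLD) -> str:
    """Concatenate spacer, scaffold, heterologous sequence, 3' homology (5'->3')."""
    return spacer + scaffold + heterologous + homology_3p


# Toy inputs (hypothetical, for illustration only).
spacer = "GACTGACTGACTGACTGACT"
heterologous = "ATGGCCTTAA"
homology_3p = "GCATGCATGCATA"
print(check_parts(spacer, heterologous, homology_3p))
print(assemble_template(spacer, heterologous, homology_3p))
```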

A second gRNA associated with the system may help drive complete integration. In some embodiments, the second gRNA can target a position 0-200 nt from the first-strand nick, e.g., 0-50, 50-100, or 100-200 nt from the first-strand nick. In some embodiments, the second gRNA can bind its target sequence only after the edit has been made, e.g., the gRNA binds a sequence that is present in the heterologous subject sequence but not in the original target sequence.
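Whether a candidate second gRNA could bind only after editing can be screened by checking that its protospacer occurs in the edited site but not in the original site. The sketch below is a simplification under stated assumptions: it uses exact string matching on either strand, ignores PAM requirements and mismatch tolerance, and all names and toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def occurs_on_either_strand(protospacer: str, site: str) -> bool:
    """True if the protospacer matches the site exactly on either strand."""
    return protospacer in site or reverse_complement(protospacer) in site


def binds_only_after_edit(protospacer: str, original_site: str,
                          edited_site: str) -> bool:
    """True if the protospacer is present after editing but absent before."""
    return (occurs_on_either_strand(protospacer, edited_site)
            and not occurs_on_either_strand(protospacer, original_site))


# Toy site with a six-base insertion (TTAGGA) after the eighth base.
original = "ACGTTGCAAGGCTA"
edited = "ACGTTGCATTAGGAAGGCTA"
print(binds_only_after_edit("GCATTAGG", original, edited))  # spans the insertion
print(binds_only_after_edit("ACGTTG", original, edited))    # present in both
```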

In some embodiments, the Gene Writing system described herein is used to edit in HEK293, K562, U2OS, or HeLa cells. In some embodiments, the Gene Writing system is used to edit in primary cells (e.g., primary cortical neurons from E18.5 mice).

In some embodiments, the reverse transcriptase or RT domain (e.g., as described herein) comprises a MoMLV RT sequence or a variant thereof. In embodiments, the MoMLV RT sequence comprises one or more mutations selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, and K103L. In embodiments, the MoMLV RT sequence comprises a combination of mutations (e.g., D200N, L603W, and T330P), optionally further comprising T306K and/or W313F.

In some embodiments, the endonuclease domain (e.g., as described herein) comprises nCas9, e.g., comprising an H840A mutation.

In some embodiments, the heterologous subject sequence (e.g., of a system as described herein) is about 1-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000 or more nucleotides in length.

In some embodiments, the RT and endonuclease domains are linked by a flexible linker, e.g., comprising amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 1601).

In some embodiments, the endonuclease domain is N-terminal to the RT domain. In some embodiments, the endonuclease domain is C-terminal with respect to the RT domain.

In some embodiments, the system incorporates a heterologous subject sequence into the target site by TPRT, e.g., as described herein.

In some embodiments, the systems or methods described herein involve a CRISPR DNA-targeting enzyme or system described in U.S. patent application publication nos. 20200063126, 20190002889, or 20190002875 (each of which is incorporated herein by reference in its entirety), or a functional fragment or variant thereof. For example, in some embodiments, a Gene Writer polypeptide or Cas endonuclease described herein comprises a polypeptide sequence of any of the applications mentioned in this paragraph, and in some embodiments, a template RNA or guide RNA comprises a nucleic acid sequence of any of the applications mentioned in this paragraph.

The template nucleic acid (e.g., template RNA) component of the Gene Writer genome editing system described herein is generally capable of binding to the Gene Writer genome editing protein of the system. In some embodiments, a template nucleic acid (e.g., a template RNA) has a 3' region that is capable of binding a Gene Writer genome editing protein. The binding region, e.g., the 3' region, may be a structured RNA region, e.g., having at least 1, 2, or 3 hairpin loops, that is capable of binding to the Gene Writer genome editing protein of the system. The binding region can associate a template nucleic acid (e.g., a template RNA) with any polypeptide module. In some embodiments, the binding region of a template nucleic acid (e.g., a template RNA) can be associated with an RNA binding domain in a polypeptide. In some embodiments, the binding region of a template nucleic acid (e.g., a template RNA) can be associated with a reverse transcription domain of a polypeptide (e.g., specifically binds to an RT domain). For example, when the reverse transcription domain is derived from a non-LTR retrotransposon, the template nucleic acid (e.g., template RNA) may comprise a binding region derived from a non-LTR retrotransposon, e.g., a 3' UTR from a non-LTR retrotransposon. In some embodiments, a template nucleic acid (e.g., a template RNA) can be associated with a DNA binding domain of a polypeptide, e.g., a gRNA is associated with a Cas9-derived DNA binding domain. In some embodiments, the binding region can also provide DNA target recognition, e.g., the gRNA hybridizes to a target DNA sequence and binds to a polypeptide, e.g., a Cas9 domain. In some embodiments, a template nucleic acid (e.g., a template RNA) can be associated with multiple components of a polypeptide (e.g., a DNA binding domain and a reverse transcription domain). 
For example, a template nucleic acid (e.g., a template RNA) can comprise a gRNA region associated with a Cas9-derived DNA binding domain and a 3' UTR from a non-LTR retrotransposon associated with a reverse transcription domain derived from a non-LTR retrotransposon.

In some embodiments, the systems or methods described herein comprise a single template nucleic acid (e.g., template RNA). In some embodiments, the systems or methods described herein comprise a plurality of template nucleic acids (e.g., template RNAs). For example, a system described herein comprises a first RNA comprising (e.g., from 5' to 3') a sequence that binds the Gene Writer polypeptide (e.g., the DNA binding domain and/or the endonuclease domain, e.g., a gRNA) and a sequence that binds a target site (e.g., the unedited strand of a site in the target genome), and a second RNA (e.g., a template RNA) comprising (e.g., from 5' to 3') optionally a sequence that binds the Gene Writer polypeptide (e.g., that specifically binds the RT domain), a heterologous subject sequence, and a 3' homology domain. In some embodiments, when the system comprises a plurality of nucleic acids, each nucleic acid comprises a conjugation domain. In some embodiments, the conjugation domain enables association of the nucleic acid molecules, e.g., by hybridization of complementary sequences.

In some embodiments, the template nucleic acid molecules described herein comprise 5 'and/or 3' homology regions. In some embodiments, the 5' homology region comprises a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to a nucleic acid sequence comprised in a target nucleic acid molecule. In embodiments, the nucleic acid sequence in the target nucleic acid molecule is within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of the target insertion site (e.g., at 5' relative to the target insertion site), e.g., for a heterologous subject sequence, e.g., contained in a template nucleic acid molecule.

In some embodiments, the 3' homology region comprises a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to a nucleic acid sequence comprised in a target nucleic acid molecule. In embodiments, the nucleic acid sequence in the target nucleic acid molecule is within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of (e.g., 3' relative to) a target insertion site, e.g., for a heterologous subject sequence, e.g., comprised in the template nucleic acid molecule. In some embodiments, the 5' homology region is heterologous to the remainder of the template nucleic acid molecule. In some embodiments, the 3' homology region is heterologous to the remainder of the template nucleic acid molecule.
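As an illustration of the identity thresholds recited for the 5' and 3' homology regions, the following is a minimal sketch. It assumes the homology region and the corresponding target sequence have already been aligned without gaps and have equal length; the function names `percent_identity` and `meets_threshold` are hypothetical and are not part of the disclosure.

```python
def percent_identity(homology_region: str, target: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: no gaps; position i of one sequence corresponds to
    position i of the other.
    """
    if len(homology_region) != len(target):
        raise ValueError("sequences must be aligned to equal length")
    if not target:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(homology_region.upper(), target.upper()))
    return 100.0 * matches / len(target)


def meets_threshold(homology_region: str, target: str, threshold: float = 85.0) -> bool:
    """True if identity is at least `threshold` percent (e.g., 85, 90, 95...)."""
    return percent_identity(homology_region, target) >= threshold
```

For example, a 10-nucleotide homology region with a single mismatch has 90% identity and would satisfy an 85% or 90% threshold but not a 95% threshold.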

In some embodiments, a template nucleic acid (e.g., a template RNA) comprises a 3' target homology domain. In some embodiments, the 3' target homology domain is located 3' of the heterologous subject sequence and is complementary to a sequence adjacent to the site to be modified by a system described herein, or comprises no more than 1, 2, 3, 4, or 5 mismatches relative to a sequence complementary to the sequence adjacent to the site to be modified by the system/Gene Writer™. In some embodiments, the 3' homology region binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of the nick site in the target nucleic acid molecule. In some embodiments, binding of the 3' homology region to the target nucleic acid molecule allows initiation of target-primed reverse transcription (TPRT), e.g., with the 3' homology region serving as a primer for TPRT. In some embodiments, the 3' target homology domain anneals to the target site, which provides a binding site and a 3' hydroxyl for the Gene Writer polypeptide to initiate TPRT. In some embodiments, the 3' target homology domain is 3-5, 5-10, 10-30, 10-25, 10-20, 10-19, 10-18, 10-17, 10-16, 10-15, 10-14, 10-13, 10-12, 10-11, 11-30, 11-25, 11-20, 11-19, 11-18, 11-17, 11-16, 11-15, 11-14, 11-13, 11-12, 12-30, 12-25, 12-20, 12-19, 12-18, 12-17, 12-16, 12-15, 12-14, 12-13, 13-30, 13-25, 13-20, 13-19, 13-18, 13-17, 13-16, 13-15, 13-14, 14-30, 14-25, 14-20, 14-19, 14-18, 14-17, 14-16, 14-15, 15-30, 15-25, 15-20, 15-19, 15-18, 15-17, 15-16, 16-30, 16-25, 16-20, 16-19, 16-18, 16-17, 17-30, 17-25, 17-20, 17-19, 17-18, 18-30, 18-25, 18-20, 18-19, 19-30, 19-25, 19-20, 20-30, 20-25, or 25-30 nt in length, e.g., 10-17, 12-16, or 12-14 nt in length.

In some embodiments, a template nucleic acid, e.g., a template RNA, may comprise a gRNA (e.g., a pegRNA). In some embodiments, a template nucleic acid, e.g., a template RNA, can be bound by a Gene Writer™ polypeptide through interaction of a gRNA portion of the template nucleic acid with a template nucleic acid binding domain, e.g., an RNA binding domain (e.g., a heterologous RNA binding domain). In some embodiments, the heterologous RNA binding domain is a CRISPR/Cas protein, such as Cas9.

In some embodiments, the region of the template nucleic acid (e.g., template RNA) comprising the gRNA adopts an underwound, ribbon-like structure of gRNA bound to target DNA (e.g., as described in Mulepati et al. Science 345(6203):1479-1484 (2014)). Without wishing to be bound by theory, this non-canonical structure is thought to be facilitated by rotation of every sixth nucleotide out of the RNA-DNA hybrid. Thus, in some embodiments, a region of a template nucleic acid (e.g., template RNA) comprising a gRNA can tolerate increased mismatches with a target site at some interval (e.g., every six bases). In some embodiments, a region of a template nucleic acid (e.g., template RNA) comprising a gRNA that is homologous to a target site can have wobble positions at regular intervals (e.g., every six bases) that do not require base pairing with the target site.
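The regular wobble interval described above can be illustrated with a short sketch. It is an assumption-laden illustration, not a method taken from the disclosure: positions are numbered from 1 at the 5' end, every position divisible by the period is treated as a wobble position, and the spacer is compared against the same-sense protospacer so that a match means identical bases. The function name `count_mismatches` is hypothetical.

```python
from typing import Optional


def count_mismatches(spacer: str, protospacer: str,
                     wobble_period: Optional[int] = 6) -> int:
    """Count mismatches, skipping positions treated as wobble positions.

    Assumption: 1-based positions divisible by `wobble_period` (6, 12, 18...)
    do not need to pair with the target. Pass None to count every position.
    """
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have equal length")
    total = 0
    for pos, (s, t) in enumerate(zip(spacer.upper(), protospacer.upper()), start=1):
        if wobble_period and pos % wobble_period == 0:
            continue  # wobble position: a mismatch here is not counted
        if s != t:
            total += 1
    return total
```

Under these assumptions, a 20 nt spacer that differs from its protospacer only at positions 6 and 12 counts as having no mismatches, whereas a conventional position-by-position comparison would report two.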

gRNAs with inducible activity

In some embodiments, the template nucleic acid, e.g., template RNA, comprises a guide RNA (gRNA) with inducible activity. Inducible activity may be achieved by a template nucleic acid, e.g., a template RNA, that (in addition to the gRNA) further comprises a blocking domain, wherein the sequence of part or all of the blocking domain is at least partially complementary to part or all of the gRNA. Thus, the blocking domain is capable of hybridizing or substantially hybridizing to a portion or all of the gRNA. In some embodiments, the blocking domain and the gRNA with inducible activity are disposed on a template nucleic acid, e.g., a template RNA, such that the gRNA can adopt a first conformation (in which the blocking domain hybridizes or substantially hybridizes to the gRNA) and a second conformation (in which the blocking domain does not hybridize or substantially does not hybridize to the gRNA). In some embodiments, in the first conformation, the gRNA is unable to bind to a Gene Writer polypeptide (e.g., a template nucleic acid binding domain, DNA binding domain, or endonuclease domain (e.g., CRISPR/Cas protein)), or binds with significantly reduced affinity compared to a similar template RNA lacking the blocking domain. In some embodiments, in the second conformation, the gRNA is capable of binding to a Gene Writer polypeptide (e.g., a template nucleic acid binding domain, a DNA binding domain, or an endonuclease domain (e.g., CRISPR/Cas protein)). In some embodiments, whether the gRNA is in the first or second conformation can affect whether the DNA binding or endonuclease activity of a Gene Writer polypeptide (e.g., of a CRISPR/Cas protein comprised by the Gene Writer polypeptide) is active. In some embodiments, hybridization of the gRNA to the blocking domain can be disrupted using an open molecule. 
In some embodiments, the open molecule comprises an agent that binds to part or all of the gRNA or the blocking domain and inhibits hybridization of the gRNA to the blocking domain. In some embodiments, the open molecule comprises a nucleic acid, e.g., comprising a sequence that is partially or fully complementary to a gRNA, a blocking domain, or both. By selecting or designing appropriate open molecules, the provided open molecules can facilitate a change in the conformation of the gRNA such that it can associate with and provide related functions (e.g., DNA binding and/or endonuclease activity) of the CRISPR/Cas protein. Without wishing to be bound by theory, providing open molecules at selected times and/or positions may allow spatial and temporal control of the activity of the gRNA, CRISPR/Cas proteins, or the Gene Writer system comprising them. In some embodiments, the open molecule is exogenous to a cell comprising a Gene Writer polypeptide and/or a template nucleic acid. In some embodiments, the open molecule comprises an endogenous agent (e.g., is endogenous to a cell comprising a Gene Writer polypeptide and/or a template nucleic acid comprising a gRNA and a blocking domain). For example, the inducible gRNA, blocking domain, and open molecule can be selected such that the open molecule is an endogenous agent expressed in the target cell or tissue, e.g., to ensure the activity of the Gene Writer system in the target cell or tissue. As another example, the inducible gRNA, the blocking domain, and the open molecule can be selected such that the open molecule is absent or substantially absent from one or more non-target cells or tissues, e.g., to ensure that the activity of the Gene Writer system does not occur or does not substantially occur in one or more non-target cells or tissues, or occurs at a reduced level compared to the target cells or tissues. 
Exemplary blocking domains, open molecules, and uses thereof are described in PCT application publication WO 2020044039 A1 (which is incorporated herein by reference in its entirety). In some embodiments, a template nucleic acid, e.g., a template RNA, may comprise one or more UTRs (e.g., from an R2-type retrotransposon) and a gRNA. In some embodiments, the UTR facilitates interaction of the template nucleic acid (e.g., template RNA) with a writing domain of the Gene Writer polypeptide, such as a reverse transcriptase domain. In some embodiments, the gRNA facilitates interaction with a template nucleic acid binding domain (e.g., an RNA binding domain) of the polypeptide. In some embodiments, the gRNA directs the polypeptide to a matching target sequence, e.g., in the target cell genome. In some embodiments, the template nucleic acid may comprise only the reverse transcriptase binding motif (e.g., a 3' UTR from R2), and the gRNA may be provided as a second nucleic acid molecule (e.g., a second RNA molecule) for target site recognition. In some embodiments, the template nucleic acid containing the RT binding motif may be present on the same molecule as the gRNA, but be processed into two RNA molecules by cleavage activity (e.g., a ribozyme).

In some embodiments, the template RNA can be customized to correct a given mutation in the genomic DNA of a target cell (e.g., ex vivo or in vivo, e.g., in a target tissue or organ, e.g., in a subject). For example, the mutation may be a disease-associated mutation relative to the wild-type sequence. Without wishing to be bound by theory, a set of empirical parameters helps ensure an optimal initial in silico design of the template RNA or a portion thereof. As a non-limiting illustrative example, for a selected mutation, the following design parameters may be employed. In some embodiments, design is initiated by obtaining approximately 500 bp (e.g., up to 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, or 700 bp, and optionally at least 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, or 650 bp) of flanking sequence on either side of the mutation to serve as the target region. In some embodiments, the template nucleic acid comprises a gRNA. Methods for designing gRNAs are known to those skilled in the art. In some embodiments, the gRNA comprises a sequence that binds to a target site (e.g., a CRISPR spacer). In some embodiments, the sequence that binds to the target site (e.g., the CRISPR spacer) is selected for targeting the template nucleic acid to the target region by considering the particular Gene Writer polypeptide (e.g., endonuclease domain or writing domain, e.g., comprising a CRISPR/Cas domain) being used (e.g., for Cas9, a protospacer adjacent motif (PAM) of NGG immediately 3' of a 20 nt gRNA binding region). In some embodiments, CRISPR spacers are ranked first by whether the PAM would be disrupted by the Gene Writing-induced edit. In some embodiments, disruption of the PAM may increase editing efficiency. 
In some embodiments, the PAM can be disrupted by also introducing a silent mutation (e.g., a mutation that does not alter the amino acid residue encoded by the target nucleic acid sequence, if any) into the target site during Gene Writing (e.g., as part of, or in addition to, another modification of the target site in the genomic DNA). In some embodiments, the CRISPR spacer is selected by ranking candidate sequences by the proximity of their corresponding genomic sites to the desired edit location. In some embodiments, the gRNA comprises a gRNA scaffold. In some embodiments, the gRNA scaffold used may be a standard scaffold (e.g., for Cas9, 5'-GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC-3' (SEQ ID NO: 1603)), or may contain one or more nucleotide substitutions. In some embodiments, the heterologous subject sequence has at least 90% identity, e.g., at least 90%, 95%, 98%, 99%, or 100% identity, or comprises no more than 1, 2, 3, 4, or 5 positions of non-identity, to the target site 3' of the first strand nick (e.g., immediately adjacent to the first strand nick, or up to 1, 2, 3, 4, or 5 nucleotides 3' of the first strand nick), except for any insertion, substitution, or deletion that may be written into the target site by the Gene Writer. In some embodiments, the 3' target homology domain has at least 90% identity, e.g., at least 90%, 95%, 98%, 99%, or 100% identity, or comprises no more than 1, 2, 3, 4, or 5 positions of non-identity, to the target site 5' of the first strand nick (e.g., immediately 5' of the first strand nick, or up to 1, 2, 3, 4, or 5 nucleotides 3' of the first strand nick).
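The spacer selection parameters in this paragraph (an NGG PAM immediately 3' of a 20 nt binding region for Cas9, ranking first by PAM disruption and then by proximity to the edit) can be sketched as follows. This is only an illustrative sketch under stated assumptions: it scans a single strand, treats an edit as disrupting the PAM when it overlaps either G of NGG, and measures proximity as the distance from the start of the PAM to the start of the edit. The names `SpacerCandidate` and `rank_spacers` are hypothetical.

```python
import re
from typing import List, NamedTuple


class SpacerCandidate(NamedTuple):
    spacer: str          # 20 nt immediately 5' of the PAM
    pam_start: int       # 0-based index of the N in NGG
    pam_disrupted: bool  # does the edit overlap either G of the PAM?
    distance: int        # assumed proximity measure: |PAM start - edit start|


def rank_spacers(region: str, edit_start: int, edit_end: int,
                 spacer_len: int = 20) -> List[SpacerCandidate]:
    """Rank top-strand NGG spacers for an edit spanning [edit_start, edit_end)."""
    region = region.upper()
    candidates = []
    # A lookahead is used so that overlapping NGG motifs are all found.
    for match in re.finditer(r"(?=[ACGT]GG)", region):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough sequence 5' of the PAM for a full spacer
        spacer = region[pam_start - spacer_len:pam_start]
        disrupted = edit_start < pam_start + 3 and edit_end > pam_start + 1
        distance = abs(pam_start - edit_start)
        candidates.append(SpacerCandidate(spacer, pam_start, disrupted, distance))
    # PAM-disrupting candidates first, then nearest to the edit.
    candidates.sort(key=lambda c: (not c.pam_disrupted, c.distance))
    return candidates
```

A fuller design tool would also scan the reverse strand and support PAMs of other Cas proteins; those steps are omitted here for brevity.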

In some embodiments, the template has one or more sequences that facilitate binding of the template to the Gene Writer polypeptide. In some embodiments, these sequences may be derived from a retrotransposon UTR. In some embodiments, the UTRs may flank the desired insertion sequence. In some embodiments, sequences with target site homology may be located outside of one or both UTRs. In some embodiments, a sequence having homology to the target site may anneal to the target sequence to prime reverse transcription. In some embodiments, the 5' and/or 3' UTR may be located terminal to the target site homology sequence, e.g., such that target-primed reverse transcription excludes reverse transcription of the 5' and/or 3' UTR regions. In some embodiments, the Gene Writer system can result in insertion of the desired payload without any additional sequence (e.g., a gene expression unit lacking the UTRs used to bind the Gene Writer protein).

Template nucleic acids (e.g., template RNAs) may be designed to create insertions, mutations, or deletions at a target DNA locus. In some embodiments, a template nucleic acid (e.g., a template RNA) may be designed to result in an insertion in the target DNA. For example, a template nucleic acid (e.g., a template RNA) may contain a heterologous sequence, wherein reverse transcription will result in insertion of the heterologous sequence into the target DNA. In other embodiments, the RNA template may be designed to write a deletion into the target DNA. For example, a template nucleic acid (e.g., a template RNA) may match the target DNA upstream and downstream of the desired deletion, wherein reverse transcription will copy the upstream and downstream sequences from the template nucleic acid (e.g., the template RNA) without the intervening sequence, e.g., resulting in deletion of the intervening sequence. In other embodiments, a template nucleic acid (e.g., template RNA) may be designed to write an edit into the target DNA. For example, the template RNA may match the target DNA sequence except for one or more nucleotides, wherein reverse transcription will copy these edits into the target DNA, e.g., resulting in mutations, such as transition or transversion mutations.
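The three template designs described above differ only in what replaces a span of the target sequence. The sketch below models the expected edited sequence for each case; it is an illustration only, using hypothetical function names, and it deliberately ignores strand orientation, homology arms, and the reverse transcription mechanism itself.

```python
def write_edit(target: str, start: int, end: int, replacement: str) -> str:
    """Return the target with target[start:end] replaced by `replacement`."""
    if not 0 <= start <= end <= len(target):
        raise ValueError("edit span must lie within the target")
    return target[:start] + replacement + target[end:]


def write_insertion(target: str, position: int, heterologous: str) -> str:
    # Insertion: nothing is removed; the heterologous sequence is added.
    return write_edit(target, position, position, heterologous)


def write_deletion(target: str, start: int, end: int) -> str:
    # Deletion: upstream and downstream sequence are joined directly.
    return write_edit(target, start, end, "")


def write_substitution(target: str, position: int, new_base: str) -> str:
    # Substitution: a single nucleotide differs from the target.
    if len(new_base) != 1:
        raise ValueError("new_base must be a single nucleotide")
    return write_edit(target, position, position + 1, new_base)
```

In each case, a template matching the returned sequence around the edit site would carry the corresponding insertion, deletion, or substitution relative to the original target.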

In some embodiments, the Gene Writer system is capable of producing an insertion of at least 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides) in the target site. In some embodiments, the Gene Writer system is capable of producing an insertion of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides) in the target site. In some embodiments, the Gene Writer system is capable of producing an insertion of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases) in the target site. In some embodiments, the Gene Writer system is capable of producing a deletion of at least 81, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides). In some embodiments, the Gene Writer system is capable of producing a deletion of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides). In some embodiments, the Gene Writer system is capable of producing a deletion of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases). 
In some embodiments, the Gene Writer system is capable of producing at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 or more nucleotide substitutions in the target site. In some embodiments, the substitution is a transition mutation. In some embodiments, the substitution is a transversion mutation. In some embodiments, the substitution converts adenine to thymine, adenine to guanine, adenine to cytosine, guanine to thymine, guanine to cytosine, guanine to adenine, thymine to cytosine, thymine to adenine, thymine to guanine, cytosine to adenine, cytosine to guanine, or cytosine to thymine.
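The twelve single-nucleotide substitutions listed above fall into transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse). A minimal sketch of that classification follows; the function name `classify_substitution` is hypothetical.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a DNA single-nucleotide substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G, or T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# All ordered pairs of distinct bases: the 12 possible substitutions.
ALL_SUBSTITUTIONS = {
    (ref, alt): classify_substitution(ref, alt)
    for ref, alt in permutations("ACGT", 2)
}
```

Of the twelve possible substitutions, four are transitions (A to G, G to A, C to T, T to C) and the remaining eight are transversions.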

Methods and compositions for modified RNAs (e.g., gRNAs or template RNAs)

In some embodiments, the RNA component of the system (e.g., template RNA or gRNA, e.g., as described herein) comprises one or more nucleotide modifications. In some embodiments, the modification pattern of a gRNA can significantly affect in vivo activity compared to an unmodified or end-modified guide (e.g., as shown in FIG. 1D of Finn et al. Cell Rep 22(9):2227-2235 (2018); incorporated herein by reference in its entirety). Without wishing to be bound by theory, this effect may be due, at least in part, to the RNA stability conferred by the modifications. Non-limiting examples of such modifications may include 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), 2'-fluoro (2'-F), phosphorothioate (PS) linkages between nucleotides, G-C substitutions, and inverted abasic linkages between nucleotides, and equivalents thereof.

In some embodiments, the template RNA (e.g., at the portion thereof that binds to the target site) or guide RNA comprises a 5' terminal region. In some embodiments, the template RNA or guide RNA does not comprise a 5' terminal region. In some embodiments, the 5' terminal region comprises a CRISPR spacer region, e.g., as described for sgRNA in Briner AE et al. Molecular Cell 56:333-339 (2014) (incorporated herein by reference in its entirety; applicable herein, e.g., to all guide RNAs). In some embodiments, the 5' terminal region comprises a 5' terminal modification. In some embodiments, the 5' terminal region, with or without a spacer, may be associated with a crRNA, trRNA, sgRNA, and/or dgRNA. In some cases, a CRISPR spacer can comprise a guide region, guide domain, or targeting domain. In some embodiments, the target domain or target sequence may comprise a nucleic acid sequence to which the guide region/domain directs a nuclease for cleavage. In some embodiments, a SpyCas9 protein can be directed by the guide region/domain to the target sequence of the target nucleic acid molecule through the nucleotides present in the CRISPR spacer.

In some embodiments, the template RNA (e.g., at a portion thereof that binds to a target site) or guide RNA, e.g., as described herein, comprises any of the sequences shown in Table 4 of WO 2018107028 A1 (incorporated herein by reference in its entirety). In some embodiments, where a sequence shows a guide region and/or a spacer region, the composition may or may not include that region. In some embodiments, the guide RNA comprises one or more modifications of any of the sequences shown in Table 4 of WO 2018107028 A1 (e.g., as identified by SEQ ID NO therein). In embodiments, the nucleotides may be the same or different, and/or the modification pattern shown may be the same as or similar to the modification pattern of a guide sequence shown in Table 4 of WO 2018107028 A1. In some embodiments, the modification pattern includes the relative position and identity of the modifications of the gRNA or of a region of the gRNA (e.g., 5' terminal region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, 3' terminal region). In some embodiments, the modification pattern comprises at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the modifications of any one of the sequences shown in the sequence column of Table 4 of WO 2018107028 A1, and/or of the modifications over one or more regions of the sequence. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of any one of the sequences shown in the sequence column of Table 4 of WO 2018107028 A1. 
In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over one or more regions (e.g., over the 5' terminal region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, and/or 3' terminal region) of the sequences shown in Table 4 of WO 2018107028 A1. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of the sequence over the 5' terminal region. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the lower stem. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the bulge. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the upper stem. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the nexus. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over hairpin 1. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over hairpin 2. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the 3' terminus. In some embodiments, the modification pattern differs from the modification pattern of a sequence of Table 4 of WO 2018107028 A1, or of a region of such a sequence (e.g., 5' terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, 3' terminus), e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides. 
In some embodiments, the gRNA comprises modifications that differ from the modifications of a sequence of Table 4 of WO 2018107028 A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides. In some embodiments, the gRNA comprises modifications that differ from the modifications of a region (e.g., 5' terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, 3' terminus) of a sequence of Table 4 of WO 2018107028 A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides.

In some embodiments, the template RNA (e.g., at the portion thereof that binds to the target site) or the gRNA comprises 2'-O-methyl (2'-O-Me) modified nucleotides. In some embodiments, the gRNA comprises 2'-O-(2-methoxyethyl) (2'-O-MOE) modified nucleotides. In some embodiments, the gRNA comprises 2'-fluoro (2'-F) modified nucleotides. In some embodiments, the gRNA comprises phosphorothioate (PS) linkages between nucleotides. In some embodiments, the gRNA comprises a 5' terminal modification, a 3' terminal modification, or both 5' and 3' terminal modifications. In some embodiments, the 5' terminal modification comprises phosphorothioate (PS) linkages between nucleotides. In some embodiments, the 5' terminal modification comprises a 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), and/or 2'-fluoro (2'-F) modified nucleotide. In some embodiments, the 5' terminal modification comprises at least one phosphorothioate (PS) linkage and one or more of a 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), and/or 2'-fluoro (2'-F) modified nucleotide. The terminal modifications may comprise phosphorothioate (PS), 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), and/or 2'-fluoro (2'-F) modifications. Equivalent terminal modifications are also encompassed by the embodiments described herein. In some embodiments, the template RNA or gRNA comprises a terminal modification in combination with a modification of one or more regions of the template RNA or gRNA. Other exemplary modifications and methods for protecting RNA (e.g., gRNA), and formulas thereof, are described in WO 2018126176 A1 (which is incorporated herein by reference in its entirety).

In some embodiments, a structure-guided and systematic approach is used to introduce modifications (e.g., 2'-OMe-RNA, 2'-F-RNA, and PS modifications) into a template RNA or guide RNA, e.g., as described in Mir et al. Nat Commun 9:2641 (2018) (incorporated herein by reference in its entirety). In some embodiments, the incorporation of 2'-F-RNA increases the thermal and nuclease stability of RNA:RNA or RNA:DNA duplexes, e.g., while minimally interfering with C3'-endo sugar puckering. In some embodiments, 2'-F may be better tolerated than 2'-OMe at positions where the 2'-OH is important for RNA:DNA duplex stability. In some embodiments, the crRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., C10, C20, or C21 (fully modified), e.g., as described in Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018) (incorporated herein by reference in its entirety). In some embodiments, the tracrRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., T2, T6, T7, or T8 (fully modified) as described in Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018). In some embodiments, a crRNA comprising one or more modifications (e.g., as described herein) can be paired with a tracrRNA comprising one or more modifications (e.g., C20 and T2). In some embodiments, the gRNA comprises a chimera, e.g., of a crRNA and a tracrRNA (e.g., Jinek et al. Science 337(6096):816-821 (2012)). In embodiments, modifications from the crRNA and tracrRNA are mapped onto the single-guide chimera, e.g., to produce a modified gRNA with enhanced stability.

In some embodiments, the gRNA molecule can be modified by adding or removing naturally occurring structural components, such as hairpins. In some embodiments, the gRNA may comprise a gRNA lacking one or more 3' hairpin elements, e.g., as described in WO 2018106727 (incorporated herein by reference in its entirety). In some embodiments, the gRNA can comprise an added hairpin structure, e.g., a hairpin structure added in the spacer region, which has been shown to increase the specificity of CRISPR-Cas systems in the teachings of Kocak et al. Nat Biotechnol 37(6):657-666 (2019). Examples of other modifications, including shortened gRNAs and specific modifications that increase in vivo activity, can be found in US 20190316121 (incorporated herein by reference in its entirety).

In some embodiments, a structure-guided and systematic approach (e.g., as described in Mir et al. Nat Commun 9:2641 (2018); incorporated herein by reference in its entirety) is used to find modifications for the template RNA. In embodiments, the modifications are identified with the inclusion or exclusion of a guide region of the template RNA. In some embodiments, the structure of the polypeptide bound to the template RNA is used to determine the non-protein-contacting nucleotides of the RNA, which may then be selected for modification, e.g., with a lower risk of disrupting binding of the RNA to the polypeptide. Secondary structure in the template RNA can also be predicted in silico by software tools, e.g., the RNAstructure tool available at rna.urmc.rochester.edu/RNAstructureWeb (Bellaousov et al. Nucleic Acids Res 41:W471-W474 (2013); incorporated herein by reference in its entirety), e.g., to determine secondary structures, such as hairpins, stems, and/or bulges, for selecting modifications.

Also included herein are compositions and methods for assembling complete or partial template RNA molecules (e.g., Gene Writing template RNA molecules, optionally comprising a gRNA, or gRNA molecules alone). In some embodiments, an RNA molecule can be assembled by ligating two or more (e.g., two, three, four, five, six, seven, eight, nine, ten, or more) RNA segments to each other. In one aspect, the disclosure provides a method for producing a nucleic acid molecule comprising contacting two or more linear RNA segments with each other under conditions that allow covalent linkage of the 5' end of a first RNA segment to the 3' end of a second RNA segment. In some embodiments, the joined molecule may be contacted with a third RNA segment under conditions that allow covalent linkage of the 5' end of the joined molecule to the 3' end of the third RNA segment. In embodiments, the method further comprises ligating a fourth, fifth, or additional RNA segment to the elongated molecule. In some cases, this form of assembly may allow for rapid and efficient assembly of RNA molecules.

The disclosure also provides compositions and methods for the ligation (e.g., covalent ligation) of crRNA molecules and tracrRNA molecules. In some embodiments, a single tracrRNA molecule/segment linked to a target site specific crRNA molecule/segment may be used to generate guide RNA molecules specific for different target sites (e.g., as shown in fig. 10 of US 2016102322 A1; incorporated herein by reference in its entirety). For example, fig. 10 of US 2016102322 A1 shows four tubes with different crRNA molecules, wherein crRNA molecule 3 is linked to a tracrRNA molecule to form a guide RNA molecule, thereby depicting exemplary linking of two RNA segments to form a product RNA molecule.

The disclosure also provides compositions and methods for producing template RNA molecules specific for a Gene Writer polypeptide and/or a genomic target site. In one aspect, the method comprises: (1) identifying a target site and the desired modification thereto, (2) producing RNA segments, including an upstream homology segment, a heterologous subject sequence segment, a Gene Writer polypeptide binding motif, and a gRNA segment, and/or (3) ligating the four or more segments into at least one molecule, e.g., into a single RNA molecule. In some embodiments, some or all of the template RNA segments listed in (2) are assembled into a template RNA molecule, e.g., one, two, three, or four of the listed components. In some embodiments, the segments listed in (2) may themselves be produced as further segmented molecules, e.g., divided into at least 2, at least 3, at least 4, or at least 5 or more sub-segments, which are, e.g., subsequently assembled, e.g., by one or more methods described herein.

In some embodiments, the RNA segments can be produced by chemical synthesis. In some embodiments, the RNA segments can be produced by in vitro transcription of a nucleic acid template, e.g., by providing an RNA polymerase that acts on a cognate promoter of a DNA template to produce an RNA transcript. In some embodiments, in vitro transcription is performed using, e.g., a T7, T3, or SP6 RNA polymerase, or a derivative thereof, acting on DNA (e.g., dsDNA, ssDNA, linear DNA, plasmid DNA, a linear DNA amplicon, or linearized plasmid DNA), e.g., DNA encoding the RNA segment, e.g., under transcriptional control of a cognate promoter (e.g., a T7, T3, or SP6 promoter). In some embodiments, a combination of chemical synthesis and in vitro transcription is used to generate the RNA segments for assembly. In embodiments, the gRNA, upstream target homology, and Gene Writer polypeptide binding segments are produced by chemical synthesis, and the heterologous subject sequence segment is produced by in vitro transcription. Without wishing to be bound by theory, in vitro transcription may be better suited to producing longer RNA molecules. In some embodiments, the reaction temperature for in vitro transcription may be lowered, e.g., to below 37 °C (e.g., between 0-10 °C, 10-20 °C, or 20-30 °C), to result in a higher proportion of full-length transcripts (Krieg Nucleic Acids Res 18:6463 (1990)). In some embodiments, a long template RNA, e.g., a template RNA greater than 5 kb, is synthesized using a protocol for improved synthesis of long transcripts, e.g., using T7 RiboMAX Express, which can produce a 27 kb transcript in vitro (Thiel et al. J Gen Virol 82(6):1273-1281 (2001)). 
In some embodiments, modifications to the RNA molecules as described herein can be incorporated during RNA segment synthesis (e.g., by inclusion of modified nucleotides or alternative binding chemicals), after synthesis of the RNA segments by chemical or enzymatic processes, after assembly of one or more RNA segments, or a combination thereof.

In some embodiments, mRNA of the system (e.g., mRNA encoding a Gene Writer polypeptide) is synthesized in vitro from a linearized DNA template using T7 polymerase-mediated DNA-dependent RNA transcription, wherein UTP is optionally substituted with 1-methylpseudo-UTP. In some embodiments, the transcript incorporates 5' and 3' UTRs, such as GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 1604) and UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA (SEQ ID NO: 1605), or functional fragments or variants thereof, and optionally includes a poly-A tail, which may be encoded in the DNA template or added enzymatically after transcription. In some embodiments, a methyl donor, such as S-adenosylmethionine, is added to capped RNA having a cap 0 structure to create a cap 1 structure, which increases mRNA translation efficiency (Richner et al. Cell 168(6):1114-1125 (2017)).

In some embodiments, transcripts from the T7 promoter start with a GGG motif. In some embodiments, transcripts from the T7 promoter do not start with a GGG motif. It has been shown that a GGG motif at the start of the transcript, while providing higher yields, may lead T7 RNAP to synthesize a ladder of poly(G) products because the transcript slips on the three C residues of the template strand at positions +1 to +3 (Imburgio et al. Biochemistry 39(34):10419-10430 (2000)). For tuning transcription levels and altering the transcription start site nucleotides to accommodate alternative 5' UTRs, the teachings of Davidson et al. Pac Symp Biocomput 433-443 (2010) describe T7 promoter variants, and methods for discovering them, that meet both of these requirements.

In some embodiments, RNA segments can be linked to each other by covalent coupling. In some embodiments, an RNA ligase (e.g., T4 RNA ligase) may be used to join two or more RNA segments to one another. When a reagent such as an RNA ligase is used, a 5' end is typically ligated to a 3' end. In some embodiments, if two segments are joined, two possible linear constructs may be formed (i.e., (1) 5'-segment 1-segment 2-3' and (2) 5'-segment 2-segment 1-3'). In some embodiments, intramolecular cyclization may also occur. Both problems can be addressed, for example, by blocking one 5' end or one 3' end so that the RNA ligase cannot ligate that end to another end. In an embodiment, if the construct 5'-segment 1-segment 2-3' is desired, placing a blocking group at the 5' end of segment 1 or the 3' end of segment 2 may result in the formation of only the correct linear ligation product and/or prevent intramolecular cyclization. Compositions and methods for covalently linking two nucleic acid (e.g., RNA) segments are disclosed in, for example, US 20160102322A1 (incorporated herein by reference in its entirety), along with methods that use an RNA ligase to directionally ligate two single-stranded RNA segments to each other.

One example of a terminal blocker that can be used in combination with, for example, T4 RNA ligase is a dideoxy terminator. T4 RNA ligase typically catalyzes the ATP-dependent formation of a phosphodiester bond between a 5'-phosphate terminus and a 3'-hydroxyl terminus. In some embodiments, when T4 RNA ligase is used, suitable termini must be present on the ends being ligated. One means of blocking T4 RNA ligase at an end is to provide that end in a form that is not a substrate. Typically, an end of an RNA segment bearing a 5'-hydroxyl or a 3'-phosphate will not serve as a substrate for T4 RNA ligase.
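The effect of end blocking on the possible ligation products can be illustrated with a short enumeration. This is a minimal sketch that considers single ligation events only and treats a join as possible whenever the donor 3' end and the acceptor 5' end are both unblocked; the function name `ligation_products` and its arguments are hypothetical and are not part of the disclosure.

```python
from itertools import permutations


def ligation_products(segments, blocked):
    """List products a 5'-phosphate/3'-hydroxyl ligase could form in one ligation event.

    segments: list of segment names.
    blocked:  set of (name, end) pairs, with end "5'" or "3'", marking ends
              that cannot serve as ligase substrates.
    """
    def can_join(donor_3, acceptor_5):
        # A join needs an unblocked 3' end on the donor and an unblocked 5' end on the acceptor.
        return (donor_3, "3'") not in blocked and (acceptor_5, "5'") not in blocked

    products = []
    # Intermolecular: 3' end of a joined to 5' end of b.
    for a, b in permutations(segments, 2):
        if can_join(a, b):
            products.append(f"5'-{a}-{b}-3'")
    # Intramolecular: a segment's own 3' end joined to its own 5' end.
    for s in segments:
        if can_join(s, s):
            products.append(f"circular {s}")
    return products


segs = ["segment 1", "segment 2"]
unblocked = ligation_products(segs, set())
one_block = ligation_products(segs, {("segment 1", "5'")})
two_blocks = ligation_products(segs, {("segment 1", "5'"), ("segment 2", "3'")})
```

Under these assumptions, blocking only the 5' end of segment 1 removes the reversed linear product but still allows segment 2 to cyclize, whereas blocking both the 5' end of segment 1 and the 3' end of segment 2 leaves only 5'-segment 1-segment 2-3'.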

Additional exemplary methods that can be used to ligate RNA segments employ click chemistry (e.g., as described in U.S. patent nos. 7,375,234 and 7,070,941 and U.S. patent publication No. 2013/0046084, the entire disclosures of which are incorporated herein by reference). For example, an exemplary click chemistry reaction is performed between an alkyne group and an azide group (see FIG. 11 of US 20160102322A1, which is incorporated herein by reference in its entirety). Any click reaction may be used to join RNA segments (e.g., Cu-azide-alkyne, strain-promoted azide-alkyne, Staudinger ligation, tetrazine ligation, photoinduced tetrazole-alkene, thiol-ene, NHS ester, epoxide, isocyanate, and aldehyde-aminooxy). In some embodiments, the use of click chemistry to link RNA molecules is advantageous because click chemistry is rapid, modular, efficient, generally does not produce toxic waste products, can be performed with water as a solvent, and/or can be configured to be stereospecific.

In some embodiments, the RNA segments can be linked using an azide-alkyne Huisgen cycloaddition reaction, which is typically a 1,3-dipolar cycloaddition between an azide and a terminal or internal alkyne that yields a 1,2,3-triazole linking the RNA segments. Without wishing to be bound by theory, one advantage of this ligation method may be that the reaction can be initiated by adding the required Cu(I) ions. Other exemplary mechanisms by which RNA segments can be linked include, but are not limited to, halogen (F-, Br-, I-)/alkyne addition reactions, carbonyl/thiol/maleimide linkages, and carboxyl/amine linkages. For example, one RNA molecule may be modified at its 3' end with a thiol (using a disulfide amidite and a universal support, or a disulfide-modified support), and the other RNA molecule may be modified at its 5' end with acrydite (using acrylic phosphoramidite), and the two RNA molecules may then be joined by a Michael addition reaction. This strategy can also be applied to ligating multiple RNA molecules stepwise. Methods for ligating more than two (e.g., three, four, five, six, etc.) RNA molecules to one another are also provided. Without wishing to be bound by theory, this may be useful when the desired RNA molecule is longer than about 40 nucleotides, e.g., such that the efficiency of chemical synthesis is reduced, e.g., as indicated in US 20160102322A1 (which is incorporated herein by reference in its entirety).

For example, tracrRNA is typically about 80 nucleotides in length. Such RNA molecules may be produced, for example, by processes such as in vitro transcription or chemical synthesis. In some embodiments, when chemical synthesis is used to produce such RNA molecules, they may be produced as a single synthesis product or by ligating two or more synthesized RNA segments to each other. In embodiments, when three or more RNA segments are linked to each other, different methods may be used to link the individual segments together. Furthermore, RNA segments can be connected to each other in one pot (e.g., a container, vessel, well, tube, plate, or other receptacle) all at the same time, in one pot at different times, or in different pots at different times. In a non-limiting example, to assemble RNA segments 1, 2, and 3 in numerical order, RNA segments 1 and 2 may first be joined to each other from 5' to 3'. The reaction product may then be purified away from the reaction mixture components (e.g., by chromatography) and placed in a second pot, where its 3' end is ligated to the 5' end of RNA segment 3. The final reaction product may then be ligated to the 5' end of RNA segment 3.

In another non-limiting example, RNA segment 1 (about 30 nucleotides) is the target locus recognition sequence of the crRNA and a portion of hairpin region 1. RNA segment 2 (about 35 nucleotides) contains the remainder of hairpin region 1 and some of the linear tracrRNA between hairpin region 1 and hairpin region 2. RNA segment 3 (about 35 nucleotides) contains the remainder of the linear tracrRNA between hairpin region 1 and hairpin region 2, as well as all of hairpin region 2. In this example, RNA segments 2 and 3 are linked from 5' to 3' using click chemistry. In addition, both the 5' and 3' ends of the reaction product are phosphorylated. The reaction product is then contacted with RNA segment 1, which has a 3' terminal hydroxyl group, and T4 RNA ligase to produce a guide RNA molecule.

Many additional ligation chemistries can be used to ligate RNA segments according to the methods of the invention. Some of these chemistries are set forth in Table 6 of US 20160102322A1, which is incorporated herein by reference in its entirety.

Gene Writers, e.g., thermostable Gene Writers

While not wishing to be bound by theory, in certain embodiments, retrotransposases that evolved in a cold environment may not function well at human body temperature. This application provides a number of thermostable Gene Writers, including proteins derived from avian retrotransposases. Exemplary avian transposase sequences in Table 3 include those of Taeniopygia guttata (zebra finch; transposon name R2-1_TG), Geospiza fortis (medium ground finch; transposon name R2-1_Gfo), Zonotrichia albicollis (white-throated sparrow; transposon name R2-1_ZA), and Tinamus guttatus (white-throated tinamou; transposon name R2-1_TGut).

Thermostability can be measured, for example, by testing the ability of the Gene Writer to polymerize DNA in vitro at a high temperature (e.g., 37 ℃) and a low temperature (e.g., 25 ℃). Suitable conditions for determining in vitro DNA polymerization activity (e.g., processivity) are described, for example, in Bibillo and Eickbush, "High Processivity of the Reverse Transcriptase from a Non-long Terminal Repeat Retrotransposon" (2002) JBC 277, 34836-34845. In some embodiments, the thermostable Gene Writer polypeptide has an activity at 37 ℃, such as DNA polymerization activity, that is no less than 70%, 75%, 80%, 85%, 90%, or 95% of its activity at 25 ℃ under otherwise similar conditions.
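As a non-limiting illustration, the thermostability criterion above reduces to a ratio of two activity measurements. The function names, argument names, and example activity values below are hypothetical; the activities are assumed to be in the same arbitrary units and measured under otherwise similar conditions.

```python
def relative_activity(activity_37c, activity_25c):
    """Return activity at 37 C as a fraction of activity at 25 C."""
    if activity_25c <= 0:
        raise ValueError("activity at 25 C must be positive")
    return activity_37c / activity_25c


def is_thermostable(activity_37c, activity_25c, min_fraction=0.70):
    """True if activity at 37 C is no less than min_fraction of activity at 25 C."""
    return relative_activity(activity_37c, activity_25c) >= min_fraction


# Hypothetical measurements in arbitrary units.
stable = is_thermostable(85.0, 100.0)                      # 85% retained, passes the 70% cutoff
unstable = is_thermostable(60.0, 100.0)                    # 60% retained, fails the 70% cutoff
strict = is_thermostable(85.0, 100.0, min_fraction=0.90)   # fails a 90% cutoff
```

The `min_fraction` argument corresponds to whichever of the recited cutoffs (70% through 95%) is being applied.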

In some embodiments, the Gene Writer polypeptide (e.g., a sequence of Table 1, 2, or 3 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto) is stable in a subject selected from a mammal (e.g., a human) or a bird. In some embodiments, the Gene Writer polypeptides described herein function at 37 ℃. In some embodiments, the Gene Writer polypeptides described herein have higher activity at 37 ℃ than at lower temperatures, e.g., at 30 ℃, 25 ℃, or 20 ℃. In some embodiments, the Gene Writer polypeptides described herein have higher activity in human cells than in zebrafish cells.

In some embodiments, the Gene Writer polypeptide is active in human cells cultured at 37 ℃, e.g., as determined using the assay of Example 6 or Example 7 herein.

In some embodiments, the assay comprises the steps of: (1) introducing HEK293T cells into one or more wells of 6.4 mm diameter at 10,000 cells/well, (2) incubating the cells at 37 ℃ for 24 hours, (3) providing a transfection mixture comprising 0.5 µl of HD transfection reagent, 80 ng of DNA (wherein the DNA is a plasmid comprising, in order, (a) a CMV promoter, (b) a 100 bp sequence homologous to the 100 bp upstream of the target site, (c) a sequence encoding a 5' untranslated region that binds the Gene Writer protein, (d) a sequence encoding the Gene Writer protein, (e) a sequence encoding a 3' untranslated region that binds the Gene Writer protein, (f) a 100 bp sequence homologous to the 100 bp downstream of the target site, and (g) a BGH polyadenylation sequence), and 10 µl of Opti-MEM, and incubating the mixture for 15 minutes at room temperature, (4) adding the transfection mixture to the cells, (5) incubating the cells for 3 days, and (6) assaying integration of the exogenous sequence into a target locus (e.g., rDNA) in the genome of the cells, e.g., wherein one or more of the foregoing steps are performed as described in Example 6 herein.

In some embodiments, the Gene Writer polypeptide results in insertion of a heterologous subject sequence (e.g., a GFP gene) into a target locus (e.g., rDNA) at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4, or 5 copies per genome. In some embodiments, a cell described herein (e.g., a cell comprising a heterologous sequence at a target insertion site) comprises the heterologous subject sequence at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4, or 5 copies per genome.

In some embodiments, a Gene Writer causes integration of sequences in target RNA with relatively few truncation events at the ends. For example, in some embodiments, the Gene Writer protein (e.g., the protein of SEQ ID NO: 1016) results in about 25%-100%, 50%-100%, 60%-100%, 70%-100%, 75%-95%, 80%-90%, or 86.17% of the integrants in the target site being untruncated, as measured by the assays described herein (e.g., the assays of Examples 6 and 8). In some embodiments, the Gene Writer protein (e.g., the protein of SEQ ID NO: 1016) results in at least about 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the integrants in the target site being untruncated, as measured by the assays described herein. In some embodiments, the integrants are classified as truncated or untruncated using an assay that comprises amplification with a forward primer located 565 bp from the end of the element (e.g., wild-type transposon sequence, such as plagiobrachria clathrata) and a reverse primer in the genomic DNA (e.g., rDNA) at the target insertion site. In some embodiments, the number of full-length integrants in the target insertion site is greater than the number of integrants truncated by 300-565 nucleotides in the target insertion site, e.g., the number of full-length integrants is at least 1.1x, 1.2x, 1.5x, 2x, 3x, 4x, 5x, 6x, 7x, 8x, 9x, or 10x the number of truncated integrants, or the number of full-length integrants is at least 1.1x-10x, 2x-10x, 3x-10x, or 5x-10x the number of truncated integrants.
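For illustration only, the two truncation metrics recited above (the fraction of untruncated integrants and the ratio of full-length to truncated integrants) can be computed from integrant counts as follows. The function name and the counts are hypothetical and do not represent data from the Examples.

```python
def truncation_summary(full_length, truncated):
    """Summarize integrant counts from a truncation assay.

    Returns the fraction of untruncated integrants and the ratio of
    full-length to truncated integrants.
    """
    if full_length < 0 or truncated < 0:
        raise ValueError("counts must be non-negative")
    total = full_length + truncated
    if total == 0:
        raise ValueError("no integrants were counted")
    ratio = full_length / truncated if truncated else float("inf")
    return {
        "fraction_untruncated": full_length / total,
        "full_to_truncated_ratio": ratio,
    }


# Hypothetical counts: 86 full-length and 14 truncated integrants.
summary = truncation_summary(86, 14)
```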

In some embodiments, the systems or methods described herein result in insertion of a heterologous subject sequence at only one target site in the genome of a target cell. Insertion may be measured, for example, using thresholds above 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, e.g., as described in example 8. In some embodiments, the systems or methods described herein result in insertion of a heterologous subject sequence, wherein less than 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 20%, 30%, 40%, or 50% of the insertion is at a site other than the target site, e.g., using an assay described herein, e.g., an assay of example 8.
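The site-calling threshold and the off-target fraction described above can be illustrated with a short calculation. This is a minimal sketch that assumes per-site insertion counts are already available; the function names, site labels, and counts are hypothetical and are not taken from Example 8.

```python
def sites_above_threshold(counts, threshold):
    """Return the sites whose share of all insertions exceeds the threshold."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no insertions were counted")
    return [site for site, n in counts.items() if n / total > threshold]


def off_target_fraction(counts, target_site):
    """Return the fraction of insertions found at sites other than the target site."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no insertions were counted")
    return (total - counts.get(target_site, 0)) / total


# Hypothetical insertion counts per site.
counts = {"rDNA": 975, "site_A": 20, "site_B": 5}
called_at_1pct = sites_above_threshold(counts, 0.01)
called_at_2_5pct = sites_above_threshold(counts, 0.025)
off_target = off_target_fraction(counts, "rDNA")
```

With these counts, a 1% threshold calls two sites while a 2.5% threshold calls only the target site, which shows how the chosen threshold determines whether insertion is scored as occurring at a single site.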

In some embodiments, the systems or methods described herein result in a "traceless" insertion of a heterologous subject sequence, while in other embodiments, the target site may exhibit a deletion or duplication of endogenous DNA as a result of the insertion of the heterologous sequence. The mechanisms of different retrotransposons may lead to different patterns of duplication or deletion in the host genome arising at the target site during retrotransposition. In some embodiments, the system results in a traceless insertion with no duplication or deletion in the surrounding genomic DNA. In some embodiments, the system results in less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA being deleted upstream of the insertion. In some embodiments, the system results in less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA being deleted downstream of the insertion. In some embodiments, the system results in duplication of less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA upstream of the insertion. In some embodiments, the system results in duplication of less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA downstream of the insertion.

In some embodiments, a Gene Writer or DNA binding domain thereof described herein specifically binds to its target site, e.g., as measured using the assay of Example 21. In some embodiments, the Gene Writer or its DNA binding domain binds to its target site more strongly than to any other binding site in the human genome. For example, in some embodiments, in the assay of Example 21, the target site represents more than 50%, 60%, 70%, 80%, 90%, or 95% of the binding events of the Gene Writer or its DNA binding domain to human genomic DNA.

Genetically engineered, e.g., dimerized, Gene Writers

Some non-LTR retrotransposons use two subunits to accomplish retrotransposition (Christensen et al. PNAS 2006). In some embodiments, a retrotransposase as described herein comprises two linked subunits as a single polypeptide. For example, two wild-type retrotransposases may be joined by a linker to form a covalently "dimerized" protein (see FIG. 17). In some embodiments, the nucleic acid encoding the retrotransposase encodes two retrotransposase subunits expressed as a single polypeptide. In some embodiments, the subunits are joined by a peptide linker, as described herein in the section entitled "Linkers" and, for example, in Chen et al. Adv Drug Deliv Rev 2013. In some embodiments, the two subunits in the polypeptide are joined by a rigid linker. In some embodiments, the rigid linker consists of the motif (EAAAK)n (SEQ ID NO: 1534). In other embodiments, the two subunits in the polypeptide are joined by a flexible linker. In some embodiments, the flexible linker consists of the motif (Gly)n. In some embodiments, the flexible linker consists of the motif (GGGGS)n (SEQ ID NO: 1535). In some embodiments, the rigid or flexible linker is 1, 2, 3, 4, 5, 10, 15, or more amino acids in length to enable retrotransposition. In some embodiments, the linker consists of a combination of rigid and flexible linker motifs. In some embodiments, the Gene Writer polypeptide may comprise a linker, such as a peptide linker, e.g., as described in Table 38. Table 38 provides linker sequences for increasing the expression, stability, and function of Gene Writer polypeptides comprising multiple functional domains.
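As a non-limiting illustration, the repeat-motif linkers described above can be written out and placed between two subunit sequences as follows. The helper names `make_linker` and `fuse` and the short subunit sequences are hypothetical placeholders; only the EAAAK and GGGGS motifs come from the text.

```python
RIGID_MOTIF = "EAAAK"     # rigid motif, repeated n times
FLEXIBLE_MOTIF = "GGGGS"  # flexible motif, repeated n times


def make_linker(motif, n):
    """Return the motif repeated n times, i.e. (motif)n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return motif * n


def fuse(subunit_a, linker, subunit_b):
    """Join two subunit amino acid sequences through a linker (N- to C-terminus)."""
    return subunit_a + linker + subunit_b


rigid = make_linker(RIGID_MOTIF, 3)
flexible = make_linker(FLEXIBLE_MOTIF, 2)
combined = rigid + flexible  # a linker combining rigid and flexible motifs

# Placeholder subunit sequences, for illustration only.
dimer = fuse("MKT", flexible, "SAV")
```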

TABLE 38 exemplary linker sequences


Based on the mechanism, not all functions are required of both retrotransposase subunits. In some embodiments, the fusion protein may consist of a fully functional subunit and a second subunit lacking one or more functional domains. In some embodiments, one subunit may lack reverse transcriptase function. In some embodiments, one subunit may lack a reverse transcriptase domain. In some embodiments, one subunit may have only endonuclease activity.

In some embodiments, the Gene Writers described herein have a covalently dimerized configuration, e.g., as shown in any of FIGS. 17A-17F of PCT/US 2019/048607 (incorporated herein by reference). The proteins depicted are: FIG. 17A: a wild-type full-length enzyme. FIG. 17B: two full-length enzymes (each comprising a DNA binding domain, an RNA binding domain, a reverse transcriptase domain, and an endonuclease domain) joined by a linker. FIG. 17C: a DNA binding domain and an RNA binding domain joined by a linker to a full-length enzyme. FIG. 17D: a DNA binding domain and an RNA binding domain joined by a linker to an RNA binding domain, a reverse transcriptase domain, and an endonuclease domain. FIG. 17E: a DNA binding domain joined by a first linker to an RNA binding domain, which is joined by a second linker to a second RNA binding domain, a reverse transcriptase domain, and an endonuclease domain. FIG. 17F: a DNA binding domain joined by a first linker to an RNA binding domain, which is joined by a second linker to a plurality of RNA binding domains (in the figure, the molecule contains three RNA binding domains), the plurality of RNA binding domains being joined by a linker to a reverse transcriptase domain and an endonuclease domain. In some embodiments, each R2 binds to a UTR in the template RNA. In some embodiments, at least one module comprises a reverse transcriptase domain and an endonuclease domain. In some embodiments, the protein comprises a plurality of RNA binding domains.

In some embodiments, the modular system is split and is active only when bound to DNA, where the system uses two different DNA binding modules, e.g., a first protein (which comprises a first DNA binding module fused to an RNA binding module that recruits the RNA template for target-primed reverse transcription) and a second protein (which comprises a second DNA binding module that binds at the integration site and is fused to a reverse transcriptase and endonuclease module). In some embodiments, the nucleic acid encoding the Gene Writer comprises an intein, such that the Gene Writer protein is expressed from two separate genes and fused post-translationally by protein splicing. In some embodiments, the Gene Writer is derived from a non-LTR protein, e.g., an R2 protein.

In some embodiments, one subunit may have only endonuclease domains. In some embodiments, the two subunits comprising a single polypeptide may provide complementary functions.

In some embodiments, one subunit may lack endonuclease functionality. In some embodiments, one subunit may lack an endonuclease domain. In some embodiments, one subunit may have only reverse transcriptase activity. In some embodiments, one subunit may have only a reverse transcriptase domain. In some embodiments, one subunit may have only DNA-dependent DNA synthesis functions.

Linkers:

In some embodiments, the domains of the compositions and systems described herein (e.g., the endonuclease and reverse transcriptase domains of a polypeptide, or the DNA binding domain and reverse transcriptase domain of a polypeptide) can be joined by a linker. The compositions comprising a linker element described herein have the general form S1-L-S2, wherein S1 and S2 may be the same or different and represent two domain portions (e.g., each a polypeptide or nucleic acid domain) that are associated with each other by the linker. In some embodiments, a linker may connect two polypeptides. In some embodiments, a linker can connect two nucleic acid molecules. In some embodiments, a linker can connect a polypeptide and a nucleic acid molecule. The linker may be a chemical bond, such as one or more covalent or non-covalent bonds. The linker may be flexible, rigid, and/or cleavable. In some embodiments, the linker is a peptide linker. Typically, the peptide linker is at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more amino acids in length, e.g., 2-50 amino acids in length, e.g., 2-30 amino acids in length.

The most commonly used flexible linkers have sequences consisting mainly of Gly and Ser residues ("GS" linkers). Flexible linkers may be useful for joining domains that require a certain degree of movement or interaction, and may include small, non-polar (e.g., Gly) or polar (e.g., Ser or Thr) amino acids. The incorporation of Ser or Thr can also maintain the stability of the linker in aqueous solution by forming hydrogen bonds with water molecules, and thus reduce adverse interactions between the linker and other moieties. Examples of such linkers include those having the structure [GGS]≥1 or [GGGS]≥1 (SEQ ID NO: 1536). Rigid linkers are useful for maintaining a fixed distance between the domains and maintaining their independent function. Rigid linkers may also be useful when spatial separation of the domains is critical to maintaining the stability or biological activity of one or more components in the agent. Rigid linkers may have an alpha-helical structure or a proline-rich (Pro-rich) sequence, (XP)n, where X represents any amino acid, preferably Ala, Lys, or Glu. Cleavable linkers may release free functional domains in vivo. In some embodiments, the linker may be cleaved under specific conditions (e.g., in the presence of a reducing agent or a protease). In vivo cleavable linkers may take advantage of the reversible nature of the disulfide bond. One example includes a thrombin-sensitive sequence (e.g., PRS) between two Cys residues. In vitro thrombin treatment of CPRSC (SEQ ID NO: 1537) results in cleavage of the thrombin-sensitive sequence, while the reversible disulfide bond remains intact. Such linkers are known and are described, for example, in Chen et al., Fusion Protein Linkers: Property, Design and Functionality. Adv Drug Deliv Rev 65(10):1357-1369 (2013).

In vivo cleavage of the linker in the compositions described herein may also be performed by proteases that are expressed in vivo under pathological conditions (e.g., cancer or inflammation), in specific cells or tissues, or that are confined within certain cellular compartments. The specificity of many proteases provides for slower cleavage of the linker in such confined compartments.

In some embodiments, the amino acid linker is an endogenous amino acid (or is homologous thereto) that exists between such domains of the native polypeptide. In some embodiments, endogenous amino acids present between such domains are substituted, but the length is unchanged from the native length. In some embodiments, additional amino acid residues are added to the naturally occurring amino acid residues between domains.

In some embodiments, the amino acid linkers are computationally designed or screened to maximize protein function (Anad et al, FEBS Letters [ FEBS communication ],587:19, 2013).

In addition to being fully encoded on a single transcript, a polypeptide may also be produced by separately expressing two or more polypeptide fragments that reconstitute the holoenzyme. In some embodiments, the Gene Writer polypeptide is produced by expressing individual subunits that reassemble into the holoenzyme through engineered protein-protein interactions. In some embodiments, the reconstitution of the holoenzyme does not involve covalent binding between subunits. Peptides may also be fused together by trans-splicing of inteins (Tornabene et al. Sci Transl Med 11, eaav4523 (2019)). In some embodiments, the Gene Writer holoenzyme is expressed as separate subunits designed to produce a fusion protein through the presence of split inteins in the subunits. In some embodiments, the Gene Writer holoenzyme is reconstituted by forming covalent bonds between subunits. In some embodiments, the protein subunits are reassembled by engineered protein-protein binding partners, such as SpyTag and SpyCatcher (Zakeri et al. PNAS 109, E690-E697 (2012)). In some embodiments, the additional domains described herein, e.g., a Cas9 nickase, are expressed as separate polypeptides that associate with the Gene Writer polypeptide through covalent or non-covalent interactions as described above. In some embodiments, splitting the Gene Writer polypeptide into subunits may facilitate protein delivery by keeping the nucleic acid encoding each portion within the optimal packaging limits of a viral delivery vector (e.g., AAV) (Tornabene et al. Sci Transl Med 11, eaav4523 (2019)). In some embodiments, the Gene Writer polypeptide is designed to dimerize using covalent or non-covalent interactions as described above.

In contrast to other types of reverse transcription machinery (e.g., retroviral RTs and LTR retrotransposons), reverse transcription by non-LTR retrotransposons (e.g., R2) is performed only on RNA templates that contain specific recognition sequences. The R2 retrotransposase requires its template to contain a minimal 3' UTR region in order to initiate TPRT (Luan and Eickbush Mol Cell Biol 15, 3882-91 (1995)). In some embodiments, the Gene Writer polypeptide is derived from a retrotransposase having the desired binding motif, and the template RNA is designed to contain the binding motif, such that only the desired template is specifically retrotransposed (see, e.g., Example 22). In some embodiments, the Gene Writer polypeptide is derived from a retrotransposon selected from Table 3, and the 3' UTR on the RNA template comprises a 3' UTR from the same retrotransposon in Table 3.

It is well known that some mobile elements are capable of mobilizing non-self elements, e.g., the L1 retrotransposase promotes the mobilization of non-autonomous Alu and SVA elements in the human genome (Craig, Mobile DNA III, ASM, 3rd edition (2015)). Recent studies have mapped the various transposable elements present in the human genome, including non-LTR retrotransposons (Kojima Mobile DNA 9 (2018)). Given that active transposition in the human genome is associated with disease, such as the role of LINE-1 retrotransposition in tumorigenesis (Rodriguez-Martin et al. Nat Genet (2020)), it is desirable that the Gene Writer not recognize and mobilize transposons or pseudo elements. In some embodiments, the Gene Writer polypeptide does not cause any mobilization of endogenous human DNA. In some embodiments, the Gene Writer is derived from a retrotransposase that is not present in the human genome. In some embodiments, a Gene Writer derived from a retrotransposase present in the human genome (see, e.g., Kojima Mobile DNA 9 (2018)) is engineered to recognize a heterologous sequence in the template RNA and to no longer recognize the native UTR of the parent retrotransposon, e.g., to have a heterologous RNA binding domain that does not bind any 3' UTR present in the human genome. In some embodiments, the Gene Writer comprises an RNA binding domain that does not recognize any sequence present in the human genome.

To optimize protein expression, it may be helpful to provide tunable control that can be used to regulate protein activity. In some embodiments, the tunable system may include at least one effector module responsive to at least one stimulus. The system may be, but is not limited to, a destabilizing domain (DD) system. This system is further taught in PCT/US2018/020704 and U.S. provisional patent application No. 62/320,864 filed on 11 of 2016, 62/466,596 filed on 3 of 2017, and international publication WO 2017/180587 (each of which is incorporated herein by reference in its entirety). In some embodiments, the tunable system may include a first effector module. In some embodiments, the effector module may include a first stimulus response element (SRE) operably linked to at least one payload. In one aspect, the payload may be an immunotherapeutic agent. In one aspect, the first SRE of the composition may be responsive to or interact with at least one stimulus. In some embodiments, the first SRE may comprise a destabilizing domain (DD). The DD can be derived from a parent protein or from a mutant protein having one, two, three, or more amino acid mutations compared to the parent protein.

In some embodiments, the parent protein may be selected from, but is not limited to, human FKBP, comprising the amino acid sequence of SEQ ID NO: 3 of PCT/US2018/020704 (incorporated herein by reference in its entirety); human DHFR (hDHFR), comprising the amino acid sequence of SEQ ID NO: 2 of PCT/US2018/020704 (incorporated herein by reference in its entirety); E. coli DHFR, comprising the amino acid sequence of SEQ ID NO: 1 of PCT/US2018/020704 (incorporated herein by reference in its entirety); PDE5, comprising the amino acid sequence of SEQ ID NO: 4 of PCT/US2018/020704 (incorporated herein by reference in its entirety); PPARγ, comprising the amino acid sequence of SEQ ID NO: 5 of PCT/US2018/020704 (incorporated herein by reference in its entirety); CA2, comprising the amino acid sequence of SEQ ID NO: 6 of PCT/US2018/020704 (incorporated herein by reference in its entirety); or NQO2, comprising the amino acid sequence of SEQ ID NO: 7 of PCT/US2018/020704 (incorporated herein by reference in its entirety). In some embodiments, the tunable control is applied to the Gene Writer polypeptide, such that the DD and stimulus can be used, for example, to modulate template integration efficiency. In some embodiments, the tunable control is applied to one or more peptides encoded within the heterologous subject sequence of the template, such that the DD and stimulus can be used, for example, to modulate the activity of the genomically integrated payload. In certain embodiments, the payload comprising the DD may be a therapeutic protein, such as a functional copy of an endogenous mutated gene. In certain embodiments, the payload comprising the DD can be a heterologous protein, such as a CAR.

Gene Writers™ as used in the systems and methods provided herein may be provided as polypeptides or as nucleic acids encoding them.

Nucleic acid features

The elements of the systems provided herein may be provided as nucleic acids, e.g., as template nucleic acids (e.g., template RNAs), particularly as described in the claims and examples, and, in certain embodiments, as nucleic acids encoding Gene Writer™ polypeptides (e.g., retrotransposases). In various embodiments, the nucleic acid is operably associated with additional genetic elements, e.g., one or more tissue-specific expression control sequences (e.g., tissue-specific promoters and tissue-specific microRNA recognition sequences), as well as additional elements such as inverted repeats (e.g., inverted terminal repeats, such as elements from or derived from viruses, e.g., AAV ITRs), tandem repeats, homology regions (segments having varying degrees of homology to the target DNA), UTRs (5', 3', or both 5' and 3' UTRs), and various combinations of the foregoing. The nucleic acid elements of the systems provided herein can be provided in a variety of topologies, including single-stranded, double-stranded, circular, linear, linear with open ends, linear with closed ends, and particular versions of these, such as doggybone DNA (dbDNA) and closed-ended DNA (ceDNA).

In certain particular embodiments, the one or more tissue-specific expression control sequences are one or more of the sequences in: Table 3 of WO 2020014209 (incorporated herein by reference), with the last column thereof (the SEQ ID NO references) omitted; or Table 4 of WO 2020014209 (incorporated herein by reference), with the last column thereof (the SEQ ID NO references) omitted.

In some embodiments, a nucleic acid described herein (e.g., a template nucleic acid or a template encoding a reverse transcriptase) comprises a promoter sequence, e.g., a tissue-specific promoter. In some embodiments, a tissue-specific promoter is used to increase the target cell specificity of the Gene Writer™ system. For example, a promoter may be selected on the basis that it is active in a target cell type but not active (or active at a lower level) in a non-target cell type. Thus, even if the nucleic acid encoding the polypeptide is delivered into a non-target cell, the promoter will not drive expression of the reverse transcriptase (or will drive only low-level expression), thereby limiting integration of the RNA template. A system having a tissue-specific promoter sequence in the reverse transcriptase DNA may also be used in combination with a microRNA binding site, e.g., one encoded in the reverse transcriptase DNA, e.g., as described herein. A system having a tissue-specific promoter sequence in the reverse transcriptase DNA may also be used in combination with an RNA template containing a heterologous subject sequence driven by a tissue-specific promoter, e.g., to achieve higher levels of integration and heterologous subject sequence expression in target cells than in non-target cells.

In some embodiments, a nucleic acid described herein (e.g., an RNA encoding a Gene Writer™ polypeptide, or a DNA encoding the RNA, or a template nucleic acid) comprises a microRNA binding site. In some embodiments, the microRNA binding site is used to increase the target cell specificity of the Gene Writer™ system. For example, the microRNA binding site can be chosen on the basis that it is recognized by a miRNA that is present in a non-target cell type but not present (or present at a reduced level relative to non-target cells) in a target cell type. Thus, when the RNA encoding the Gene Writer™ polypeptide is present in a non-target cell, it will be bound by the miRNA, and when the RNA encoding the Gene Writer™ polypeptide is present in a target cell, it will not be bound by the miRNA (or will be bound, but at a reduced level relative to non-target cells). While not wishing to be bound by theory, binding of the miRNA to the RNA encoding the Gene Writer™ polypeptide may reduce production of the Gene Writer™ polypeptide, e.g., by degrading the mRNA encoding the polypeptide or by interfering with translation. Accordingly, the heterologous subject sequence will be inserted into the genome of target cells more efficiently than into the genome of non-target cells. A system having a microRNA binding site in the RNA encoding the Gene Writer™ polypeptide (or encoded in the DNA encoding the RNA) may also be used in combination with a template RNA that is regulated by a second microRNA binding site, e.g., as described herein in the section entitled "Template component of the Gene Writer™ gene editor system".
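As a minimal sketch of the design logic described above, a binding site that is fully complementary to a chosen miRNA can be derived as the reverse complement of the miRNA and appended to the 3' end of an RNA. The function names and the miRNA sequence below are hypothetical placeholders (the sequence is not taken from this disclosure or from any miRNA database), and full complementarity is an assumption made for simplicity.

```python
# Watson-Crick complements for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hypothetical placeholder miRNA sequence (5'->3'); NOT a real, curated miRNA.
PLACEHOLDER_MIRNA = "AUGGCUAACCGGUUAAGCUAGC"


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def add_mirna_sites(rna: str, mirna: str, copies: int = 1) -> str:
    """Append `copies` fully complementary binding sites for `mirna`
    to the 3' end of `rna` (assumption: no spacer between sites)."""
    site = reverse_complement_rna(mirna)
    return rna + site * copies


# Example: one binding site appended to a toy 3' UTR fragment.
tagged = add_mirna_sites("AAUAAAGUCUGA", PLACEHOLDER_MIRNA, copies=1)
```

In practice, the number of sites, any spacers between them, and the degree of complementarity are design choices that this sketch does not address.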

In some embodiments, the nucleic acid component sequences (e.g., retrotransposases or heterologous subject sequences) of the systems provided herein are flanked by untranslated regions (UTRs) that modify the level of protein expression (sometimes referred to as UTRs_exp) (FIGS. 11 and 15, Example 6). The effects of various 5' and 3' UTRs on protein expression are known in the art. For example, in some embodiments, the coding sequence may be preceded by a 5' UTR that modifies RNA stability or protein translation. In some embodiments, the sequence may be followed by a 3' UTR that modifies RNA stability or translation. In some embodiments, the sequence may be preceded by a 5' UTR and followed by a 3' UTR that modify RNA stability or translation. In some embodiments, the 5' and/or 3' UTR may be selected from the 5' and 3' UTRs of complement factor 3 (C3) (cactcctccccatcctctccctctgtccctctgtccctctgaccctgcactgtcccagcacc (SEQ ID NO: 1606)) or orosomucoid 1 (ORM1) (caggacacagccttggatcaggacagagacttgggggccatcctgcccctccaacccgacatgtgtacctcagctttttccctcacttgcatcaataaagcttctgtgtttggaacagctaa (SEQ ID NO: 1607)) (Asrani et al. RNA Biology 2018). In certain embodiments, the 5' UTR is the 5' UTR from C3 and the 3' UTR is the 3' UTR from ORM1.

In certain embodiments, the 5' UTR and 3' UTR for protein expression (e.g., of an mRNA (or a DNA encoding the RNA) for a Gene Writer polypeptide or heterologous subject sequence) comprise optimized expression sequences. In some embodiments, the 5' UTR comprises GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 1608) and/or the 3' UTR comprises UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA (SEQ ID NO: 1609), e.g., as described in Richner et al. Cell 168(6): P1114-1125 (2017), the sequences of which are incorporated herein by reference.

In some embodiments, the 5' and/or 3' UTRs may be selected to enhance protein expression. In some embodiments, the 5' and/or 3' UTRs may be selected to modify protein expression such that overproduction inhibition is minimized. In some embodiments, the UTRs are around the coding sequence, e.g., outside the coding sequence, and in other embodiments, proximal to the coding sequence. In some embodiments, additional regulatory elements (e.g., miRNA binding sites, cis-regulatory sites) are included in the UTRs.

In some embodiments, an open reading frame (ORF) of the Gene Writer system, e.g., the ORF of an mRNA (or a DNA encoding the mRNA) encoding a Gene Writer polypeptide, or one or more ORFs of an mRNA (or a DNA encoding the mRNA) of a heterologous subject sequence, is flanked by a 5' and/or 3' untranslated region (UTR) that enhances its expression. In some embodiments, the 5' UTR of an mRNA component of the system (or of a transcript produced from a DNA component) comprises the sequence 5'-GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC-3' (SEQ ID NO: 1610). In some embodiments, the 3' UTR of an mRNA component of the system (or of a transcript produced from a DNA component) comprises the sequence 5'-UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA-3' (SEQ ID NO: 1611). This combination of 5' UTR and 3' UTR has been shown to result in desirable expression of an operably linked ORF by Richner et al. Cell 168(6): P1114-1125 (2017), the teachings and sequences of which are incorporated herein by reference. In some embodiments, the systems described herein comprise a DNA encoding a transcript, wherein the DNA comprises the corresponding 5' UTR and 3' UTR sequences, with T substituted for U in the sequences listed above. In some embodiments, a DNA vector used to produce an RNA component of the system further comprises a promoter, e.g., a T7, T3, or SP6 promoter, upstream of the 5' UTR for initiating in vitro transcription. The 5' UTR above begins with GGG, which is a suitable start for optimizing transcription using T7 RNA polymerase. For tuning transcription levels and altering the transcription start site nucleotides to fit alternative 5' UTRs, the teachings of Davidson et al. Pac Symp Biocomput 433-443 (2010) describe T7 promoter variants, and methods for their discovery, that fulfill both of these traits.
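The relationship described above between the RNA UTR sequences and a DNA template for in vitro transcription can be sketched as follows. This is an illustrative sketch only: the UTR sequences are those recited above (SEQ ID NOs: 1610 and 1611), whereas the function names, the toy ORF, and the T7 promoter string (the commonly cited T7 promoter sequence up to, but not including, the +1 G) are assumptions introduced here and are not taken from this disclosure.

```python
# 5' and 3' UTR RNA sequences as recited above (SEQ ID NOs: 1610 and 1611).
UTR5_RNA = "GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC"
UTR3_RNA = (
    "UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUG"
    "CCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUAC"
    "CCCCGUGGUCUUUGAAUAAAGUCUGA"
)

# Assumption: T7 promoter sequence upstream of the +1 transcription start G.
T7_PROMOTER_UPSTREAM = "TAATACGACTCACTATA"


def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence as the corresponding DNA (T substituted for U)."""
    return rna.upper().replace("U", "T")


def build_ivt_template(orf_dna: str, promoter: str = T7_PROMOTER_UPSTREAM) -> str:
    """Assemble promoter + 5' UTR + ORF + 3' UTR as a DNA sense strand."""
    utr5_dna = rna_to_dna(UTR5_RNA)
    # The text notes that a GGG start suits transcription by T7 RNA polymerase.
    if not utr5_dna.startswith("GGG"):
        raise ValueError("5' UTR does not start with GGG")
    return promoter + utr5_dna + orf_dna.upper() + rna_to_dna(UTR3_RNA)


# Hypothetical toy ORF (start codon, one codon, stop codon) for illustration.
template = build_ivt_template("ATGGCCTAA")
```

A real template would carry the full coding sequence and typically further elements (for example, a polyadenylation signal or encoded poly(A) tail), which are outside the scope of this sketch.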

Circular RNAs in Gene Writing systems

Circular RNAs (circRNAs) have been found to occur naturally in cells and to have diverse functions, including both non-coding and protein-coding roles in human cells. It has been shown that a circRNA can be engineered by incorporating a self-splicing intron into an RNA molecule (or a DNA encoding the RNA molecule), resulting in circularization of the RNA, and that an engineered circRNA can have enhanced protein production and stability (Wesselhoeft et al. Nature Communications 2018). It is contemplated that it may be useful to employ circular and/or linear RNA states during the formulation, delivery, or Gene Writing reaction within the target cell. Thus, in some embodiments of any aspect described herein, the Gene Writing system comprises one or more circular RNAs (circRNAs). In some embodiments of any aspect described herein, the Gene Writing system comprises one or more linear RNAs. In some embodiments, a nucleic acid described herein (e.g., a template nucleic acid, a nucleic acid molecule encoding a Gene Writer polypeptide, or both) is a circRNA. In some embodiments, the circular RNA molecule encodes a Gene Writer™ polypeptide. In some embodiments, the circRNA molecule encoding the Gene Writer™ polypeptide is delivered to a host cell. In some embodiments, the circular RNA molecule encodes a recombinase, e.g., as described herein. In some embodiments, the circRNA molecule encoding the recombinase is delivered to a host cell. In some embodiments, the circRNA molecule encoding the Gene Writer polypeptide is linearized (e.g., in the host cell) prior to translation.

In some embodiments, the nucleic acid (e.g., encoding a Gene Writer polypeptide, or a template RNA, or both) is provided as a circRNA. In some embodiments, the circRNA comprises one or more ribozyme sequences. In some embodiments, the ribozyme sequence is activated for self-cleavage, e.g., in a host cell, e.g., resulting in linearization of the circRNA. In some embodiments, the ribozyme is activated when the concentration of magnesium reaches a level sufficient for cleavage, e.g., in a host cell. In some embodiments, the circRNA is maintained in a low-magnesium environment prior to delivery to the host cell. In some embodiments, the ribozyme is a protein-responsive ribozyme. In some embodiments, the ribozyme is a nucleic acid-responsive ribozyme.

In some embodiments, the circRNA is linearized in the nucleus of the target cell. In some embodiments, linearization of the circRNA in the nucleus involves components present in the nucleus, e.g., to activate a cleavage event. For example, the B2 and ALU retrotransposons contain self-cleaving ribozymes whose activity is enhanced by interaction with the Polycomb protein EZH2 (Hernandez et al. PNAS 117(1): 415-425 (2020)). Thus, in some embodiments, a ribozyme (e.g., a ribozyme from a B2 or ALU element) that is responsive to a nuclear element (e.g., a nuclear protein, e.g., a genome-interacting protein, e.g., an epigenetic modifier, e.g., EZH2) is incorporated into a circRNA, e.g., of a Gene Writing system. In some embodiments, nuclear localization of the circRNA results in increased autocatalytic activity of the ribozyme and linearization of the circRNA.

In some embodiments, the inducible ribozyme (e.g., in a circRNA described herein) is created synthetically, e.g., by utilizing a protein ligand-responsive aptamer design. A system has been described that utilizes the satellite RNA of tobacco ringspot virus hammerhead ribozyme with an MS2 coat protein aptamer (Kennedy et al. Nucleic Acids Res 42(19): 12306-12321 (2014), which is incorporated herein by reference in its entirety), which results in activation of ribozyme activity in the presence of the MS2 coat protein. In embodiments, such a system responds to a protein ligand that is localized to the cytoplasm or to the nucleus. In some embodiments, the protein ligand is not MS2. Methods for producing RNA aptamers to target ligands have been described, e.g., based on systematic evolution of ligands by exponential enrichment (SELEX) (Tuerk and Gold, Science 249(4968): 505-510 (1990); Ellington and Szostak, Nature 346(6287): 818-822 (1990); the methods of each of which are incorporated herein by reference), in some cases aided by in silico design (Bell et al. PNAS 117(15): 8486-8493 (2020), the methods of which are incorporated herein by reference). Thus, in some embodiments, an aptamer for a target ligand is generated and incorporated into a synthetic ribozyme system, e.g., to trigger ribozyme-mediated cleavage and circRNA linearization, e.g., in the presence of the protein ligand. In some embodiments, circRNA linearization is triggered in the cytoplasm, e.g., using an aptamer that associates with a ligand in the cytoplasm. In some embodiments, circRNA linearization is triggered in the nucleus, e.g., using an aptamer that associates with a ligand in the nucleus. In embodiments, the ligand in the nucleus comprises an epigenetic modifier or a transcription factor. In some embodiments, the ligand that triggers linearization is present at a higher level in on-target cells than in off-target cells.

It is also contemplated that a nucleic acid-responsive ribozyme system may be used for circRNA linearization. For example, biosensors that sense a defined target nucleic acid molecule to trigger ribozyme activation are described, e.g., in Penchovsky (Biotechnology Advances 32(5): 1015-1027 (2014), incorporated herein by reference). By these methods, a ribozyme naturally folds into an inactive state and is activated only in the presence of a defined target nucleic acid molecule (e.g., an RNA molecule). In some embodiments, the circRNA of the Gene Writing system comprises a nucleic acid-responsive ribozyme that is activated in the presence of a defined target nucleic acid (e.g., an RNA, such as an mRNA, miRNA, guide RNA, gRNA, sgRNA, ncRNA, lncRNA, tRNA, snRNA, or mtRNA). In some embodiments, the nucleic acid that triggers linearization is present at a higher level in on-target cells than in off-target cells.

In some embodiments of any aspect herein, the Gene Writing system incorporates one or more ribozymes with inducible specificity for a target tissue or target cell of interest, e.g., a ribozyme that is activated by a ligand or nucleic acid present at a higher level in the target tissue or target cell of interest. In some embodiments, the Gene Writing system incorporates a ribozyme with inducible specificity for a subcellular compartment (e.g., the nucleus, nucleolus, cytoplasm, or mitochondria). In some embodiments, the ribozyme is activated by a ligand or nucleic acid present at a higher level in the target subcellular compartment. In some embodiments, an RNA component of the Gene Writing system is provided as a circRNA, e.g., one that is activated by linearization. In some embodiments, linearization of a circRNA encoding a Gene Writing polypeptide activates the molecule for translation. In some embodiments, the signal that activates the circRNA component of the Gene Writing system is present at a higher level in on-target cells or tissues, e.g., such that the system is specifically activated in these cells.

In some embodiments, the RNA component of the Gene Writing system is provided as circRNA that is inactivated by linearization. In some embodiments, the circRNA encoding the Gene Writing polypeptide is inactivated by cleavage and degradation. In some embodiments, the circRNA encoding the Gene Writing polypeptide is inactivated by cleavage of the translation signal from the coding sequence of the polypeptide. In some embodiments, the signal that inactivates the circRNA component of the Gene Writing system is present at a higher level in the off-target cells or tissues, such that the system is specifically inactivated in these cells.

In some embodiments, the nucleic acid delivered to the cell (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) is covalently closed linear DNA, or so-called "doggybone" DNA. During its life cycle, phage N15 converts its genome from circular plasmid DNA to linear plasmid DNA using a protelomerase (prokaryotic telomerase) (Ravin et al. J Mol Biol 2001). This process has been adapted for the in vitro production of covalently closed linear DNA (see, e.g., WO 2010086626 A1). In some embodiments, a protelomerase is contacted with DNA containing one or more protelomerase recognition sites, wherein the protelomerase causes cleavage at the one or more sites and subsequent ligation of the complementary strands of the DNA, resulting in covalent linkage between the complementary strands. In some embodiments, the nucleic acid (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) is first generated as circular plasmid DNA containing a single protelomerase recognition site, which is then contacted with a protelomerase to produce covalently closed linear DNA. In some embodiments, a nucleic acid (e.g., encoding a transposase, or encoding a template RNA, or both) flanked by protelomerase recognition sites on a plasmid or linear DNA is contacted with a protelomerase to produce covalently closed linear DNA containing only the DNA located between the protelomerase recognition sites. In some embodiments, the method of flanking the desired nucleic acid sequence by protelomerase recognition sites results in covalently closed circular DNA lacking the plasmid elements used for bacterial cloning and maintenance. 
In some embodiments, the plasmid or linear DNA comprising the nucleic acid and one or more prokaryotic telomerase recognition sites is optionally amplified prior to the prokaryotic telomerase reaction, for example by rolling circle amplification or PCR.

In some embodiments, the nucleic acid delivered to the cell (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) is a closed-ended linear duplex DNA (CELiD DNA or ceDNA). In some embodiments, the ceDNA is derived from the replicative form of the AAV genome (Li et al. PLoS One 2013). In some embodiments, the nucleic acid (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) is flanked by ITRs, e.g., AAV ITRs, wherein at least one ITR comprises a terminal resolution site and a replication protein binding site (sometimes referred to as a replicative protein binding site). In some embodiments, the ITRs are derived from an adeno-associated virus, e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, or a combination thereof.

In some embodiments, the ceDNA vector consists of two self-complementary sequences, e.g., asymmetric or symmetric or substantially symmetric ITRs as defined herein, flanking the expression cassette, wherein the ceDNA vector is not associated with a capsid protein. In some embodiments, the ceDNA vector comprises two self-complementary sequences found in the AAV genome, wherein at least one ITR comprises an operable Rep-binding element (RBE) (also sometimes referred to herein as "RBS") and a terminal resolution site (trs) of AAV, or a functional variant of the RBE. See, for example, WO 2019113310.

In some embodiments, the nucleic acid delivered to the cell (e.g., encoding the Gene Writer polypeptide, or encoding the template RNA, or both) is designed as a minicircle, wherein plasmid backbone sequences that do not pertain to Gene Writing™ are removed prior to administration to the cell. Minicircles have been shown to achieve higher transfection efficiency and gene expression than plasmids whose backbones contain bacterial parts (e.g., a bacterial origin of replication, an antibiotic selection cassette), and have been used to increase the efficiency of transposition (Sharma et al. Mol Ther Nucleic Acids 2013). In some embodiments, the DNA vector encoding the Gene Writer™ polypeptide is delivered as a minicircle. In some embodiments, the DNA vector encoding the Gene Writer™ template is delivered as a minicircle. In some embodiments, the bacterial parts are flanked by recombination sites, e.g., attP/attB, loxP, or FRT sites. In some embodiments, addition of the cognate recombinase results in intramolecular recombination and excision of the bacterial parts. In some embodiments, the recombinase sites are recognized by phiC31 recombinase. In some embodiments, the recombinase sites are recognized by Cre recombinase. In some embodiments, the recombinase sites are recognized by FLP recombinase. In addition to plasmid DNA, minicircles can also be generated by excising the desired construct (e.g., a Gene Writer polypeptide expression cassette or a template RNA expression cassette) from a viral backbone. It has previously been shown that excision and circularization of the donor sequence from a viral backbone may be important for transposase-mediated integration efficiency (Yant et al. Nat Biotechnol 2002). In some embodiments, the minicircle is first formulated and then delivered to the target cell. 
In other embodiments, the minicircle is formed intracellularly by co-delivery of a recombinase with a DNA vector (e.g., plasmid DNA, rAAV, scAAV, ceDNA, doggybone DNA), resulting in excision and circularization of the nucleic acid flanked by recombinase recognition sites (e.g., a nucleic acid encoding a Gene Writer™ polypeptide, a nucleic acid encoding an RNA template, or both).

Viral vectors and components thereof

Viruses are a useful source of delivery vehicles for the systems described herein, in addition to being a source of relevant enzymes or domains as described herein, e.g., as a source of the polymerases and polymerase functions used herein (e.g., DNA-dependent DNA polymerase, RNA-dependent RNA polymerase, RNA-dependent DNA polymerase, DNA-dependent RNA polymerase, reverse transcriptase). Some enzymes, e.g., reverse transcriptases, may have multiple activities, e.g., may be capable of both RNA-dependent DNA polymerization and DNA-dependent DNA polymerization, e.g., first and second strand synthesis. In some embodiments, the virus used as a source of the Gene Writer delivery system or components thereof may be selected from a group as described by Baltimore, Bacteriol Rev 35(3): 235-241 (1971).

In some embodiments, the virus is selected from group I viruses, e.g., the virus is a DNA virus and the dsDNA is packaged into virions. In some embodiments, the group I virus is selected from, for example, adenovirus, herpes virus, poxvirus.

In some embodiments, the virus is selected from group II viruses, e.g., the virus is a DNA virus and ssDNA is packaged into virions. In some embodiments, the group II virus is selected from, for example, parvoviruses. In some embodiments, the parvovirus is a dependent parvovirus, such as an adeno-associated virus (AAV).

In some embodiments, the virus is selected from group III viruses, e.g., the virus is an RNA virus and the dsRNA is packaged into virions. In some embodiments, the group III virus is selected from, for example, reovirus. In some embodiments, one or both strands of the dsRNA contained in such virions are coding molecules that can be used directly as mRNA after transduction into a host cell, e.g., can be directly translated into protein after transduction into a host cell without any intervening nucleic acid replication or polymerization steps.

In some embodiments, the virus is selected from group IV viruses, e.g., the virus is an RNA virus and ssRNA(+) is packaged into virions. In some embodiments, the group IV virus is selected from, for example, coronaviruses, picornaviruses, and togaviruses. In some embodiments, the ssRNA(+) contained in such virions is a coding molecule that can be used directly as mRNA after transduction into a host cell, e.g., can be translated directly into protein after transduction into a host cell without any intervening nucleic acid replication or polymerization steps.

In some embodiments, the virus is selected from group V viruses, e.g., the virus is an RNA virus and ssRNA(-) is packaged into virions. In some embodiments, the group V virus is selected from, for example, orthomyxoviruses and rhabdoviruses. In some embodiments, an RNA virus having an ssRNA(-) genome also carries within the virion an enzyme that is transduced into the host cell together with the viral genome, e.g., an RNA-dependent RNA polymerase, capable of copying the ssRNA(-) into ssRNA(+) that can be directly translated by the host.

In some embodiments, the virus is selected from group VI viruses, e.g., the virus is a retrovirus and ssRNA(+) is packaged into virions. In some embodiments, the group VI virus is selected from, for example, retroviruses. In some embodiments, the retrovirus is a lentivirus, e.g., HIV-1, HIV-2, SIV, or BIV. In some embodiments, the retrovirus is a spumavirus, e.g., a foamy virus, e.g., HFV, SFV, or BFV. In some embodiments, the ssRNA(+) contained in such virions is a coding molecule that can be used directly as mRNA after transduction into a host cell, e.g., can be translated directly into protein after transduction into a host cell without any intervening nucleic acid replication or polymerization steps. In some embodiments, the ssRNA(+) is first reverse transcribed and copied to produce a dsDNA genomic intermediate from which mRNA can be transcribed in the host cell. In some embodiments, an RNA virus having an ssRNA(+) genome also carries within the virion an enzyme that is transduced into the host cell together with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the ssRNA(+) into dsDNA that can be transcribed into mRNA and translated by the host. In some embodiments, a reverse transcriptase from a group VI retrovirus is incorporated as the reverse transcriptase domain of the Gene Writer polypeptide.

In some embodiments, the virus is selected from group VII viruses, e.g., the virus is a retrovirus and dsRNA is packaged into virions. In some embodiments, the group VII virus is selected from, for example, hepadnaviruses. In some embodiments, one or both strands of the dsRNA contained in such virions are coding molecules that can be used directly as mRNA after transduction into a host cell, e.g., can be translated directly into protein after transduction into a host cell without any intervening nucleic acid replication or polymerization steps. In some embodiments, one or both strands of the dsRNA contained in such virions are first reverse transcribed and copied to produce a dsDNA genomic intermediate from which mRNA can be transcribed in the host cell. In some embodiments, an RNA virus having a dsRNA genome also carries within the virion an enzyme that is transduced into the host cell together with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the dsRNA into dsDNA that can be transcribed into mRNA and translated by the host. In some embodiments, a reverse transcriptase from a group VII retrovirus is incorporated as the reverse transcriptase domain of the Gene Writer polypeptide.

In some embodiments, the virions used in the present invention for delivering nucleic acids may also carry enzymes involved in the Gene Writing process. For example, a retroviral virion can comprise a reverse transcriptase domain that is delivered to the host cell together with the nucleic acid. In some embodiments, the RNA template may be associated with a Gene Writer polypeptide within the virion, such that both are co-delivered to the target cell upon transduction of the nucleic acid from the viral particle. In some embodiments, the nucleic acid in the virion can comprise DNA, e.g., linear ssDNA, linear dsDNA, circular ssDNA, circular dsDNA, minicircle DNA, dbDNA, or ceDNA. In some embodiments, the nucleic acid in the virion can comprise RNA, e.g., linear ssRNA, linear dsRNA, circular ssRNA, or circular dsRNA. In some embodiments, the viral genome may be circularized after transduction into a host cell, e.g., a linear ssRNA molecule may undergo covalent linkage to form a circular ssRNA, and a linear dsRNA molecule may undergo covalent linkage to form a circular dsRNA or one or more circular ssRNAs. In some embodiments, the viral genome may be replicated by rolling circle replication in the host cell. In some embodiments, the viral genome may comprise a single nucleic acid molecule, e.g., comprise a non-segmented genome. In some embodiments, the viral genome may comprise two or more nucleic acid molecules, e.g., comprise a segmented genome. In some embodiments, the nucleic acid in the virion can be associated with one or more proteins. In some embodiments, one or more proteins in the virion can be delivered to the host cell upon transduction. In some embodiments, a natural virus can be adapted for nucleic acid delivery by adding a virion packaging signal to the target nucleic acid, wherein a host cell is used to package the target nucleic acid containing the packaging signal.

In some embodiments, the virion used as a delivery vehicle may comprise a commensal human virus. In some embodiments, the virion used as a delivery vehicle may comprise an anellovirus, the use of which is described in WO 2018232017 A1, which is incorporated herein by reference in its entirety.

Adeno-associated virus

In some embodiments, the virus is an adeno-associated virus (AAV). In some embodiments, the AAV genome comprises two genes that encode four replication proteins and three capsid proteins, respectively. In some embodiments, the genes are flanked on either side by 145-bp inverted terminal repeats (ITRs). In some embodiments, the virion comprises up to three capsid proteins (VP1, VP2, and/or VP3), produced, e.g., at a 1:1:10 ratio. In some embodiments, the capsid proteins are produced from the same open reading frame and/or from differential splicing (VP1) and alternative translational start sites (VP2 and VP3, respectively). Typically, VP3 is the most abundant subunit in the virion and participates in receptor recognition at the cell surface, defining the tropism of the virus. In some embodiments, VP1 comprises a phospholipase domain at its N-terminus that functions, e.g., in viral infectivity.

In some embodiments, the packaging capacity of the viral vector limits the size of the base editor that can be packaged into the vector. For example, the packaging capacity of an AAV may be about 4.5 kb (e.g., about 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, or 6.0 kb), e.g., including one or two inverted terminal repeats (ITRs), e.g., 145-base ITRs.

In some embodiments, the recombinant AAV (rAAV) comprises cis-acting 145-bp ITRs flanking the vector transgene cassette, e.g., providing up to 4.5 kb for packaging of exogenous DNA. After infection, in some cases, the rAAV may express the fusion proteins of the invention and persist without integrating into the host genome by existing episomally as circular head-to-tail concatemers. rAAV can be used, for example, in vitro and in vivo. In some embodiments, AAV-mediated gene delivery requires that the length of the coding sequence of the gene be equal to or smaller in size than the wild-type AAV genome.

AAV delivery of genes exceeding this size and/or use of large physiological regulatory elements may be accomplished, for example, by dividing one or more proteins to be delivered into two or more fragments. In some embodiments, the N-terminal fragment is fused to a split intein-N. In some embodiments, the C-terminal fragment is fused to a split intein-C. In embodiments, the fragments are packaged into two or more AAV vectors.

In some embodiments, dual AAV vectors are generated by dividing a large transgenic expression cassette into two separate halves (5′ and 3′ ends, or head and tail), e.g., where each half of the cassette is packaged in a single AAV vector (of <5 kb). In some embodiments, the reassembly of the full-length transgene expression cassette can then be achieved after co-infection of the same cell by both dual AAV vectors. In some embodiments, co-infection is followed by one or more of the following: (1) homologous recombination (HR) between the 5′ and 3′ genomes (dual AAV overlapping vectors); (2) ITR-mediated tail-to-head concatemerization of the 5′ and 3′ genomes (dual AAV trans-splicing vectors); and/or (3) a combination of both mechanisms (dual AAV hybrid vectors). In some embodiments, in vivo use of dual AAV vectors results in expression of the full-length protein. In some embodiments, the use of a dual AAV vector platform represents an efficient and viable gene transfer strategy for transgenes greater than about 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, or 5.0 kb. In some embodiments, AAV vectors can also be used to transduce cells with target nucleic acids, for example, in the in vitro production of nucleic acids and peptides. In some embodiments, AAV vectors can be used in in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994); each of which is incorporated herein by reference in its entirety). Construction of recombinant AAV vectors is described in many publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat and Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989) (each of which is incorporated herein by reference in its entirety).
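The head/tail division of a large cassette described above can be sketched as follows. This is a minimal illustration only: the function name, the 400-bp overlap, and the 8000-bp cassette are hypothetical, and ITRs, splice signals, and recombinogenic sequences are not modelled.

```python
def split_for_dual_aav(cassette_len_bp, overlap_bp=0, max_half_bp=5000):
    """Return half-open (start, end) coordinates of the 5' and 3' halves.

    overlap_bp > 0 models an overlapping design in which both halves share a
    central region; overlap_bp == 0 models a plain head/tail split.
    """
    mid = cassette_len_bp // 2
    left_extra = overlap_bp // 2
    right_extra = overlap_bp - left_extra
    five_prime = (0, mid + left_extra)
    three_prime = (mid - right_extra, cassette_len_bp)
    # Each half must stay below the single-vector size (<5 kb in the text above).
    for start, end in (five_prime, three_prime):
        if end - start >= max_half_bp:
            raise ValueError(f"half of {end - start} bp is not < {max_half_bp} bp")
    return five_prime, three_prime


# Hypothetical 8 kb cassette split with a 400 bp shared region.
five, three = split_for_dual_aav(8000, overlap_bp=400)
print(five, three)
```

A cassette whose halves would each reach 5 kb or more raises an error in this sketch, which corresponds to a design that would need more than two vectors.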

In some embodiments, the Gene Writers described herein (e.g., with or without one or more guide nucleic acids) can be delivered using AAV, lentivirus, adenovirus, or other plasmid or viral vector types, in particular using formulations and dosages from, for example, U.S. patent No. 8,454,972 (formulation, dose for adenovirus), U.S. patent No. 8,404,658 (formulation, dose for AAV), and U.S. patent No. 5,846,946 (formulation, dose for DNA plasmid), and from clinical trials and publications regarding clinical trials involving lentivirus, AAV, and adenovirus. For example, for AAV, the route of administration, formulation, and dosage can be as described in U.S. patent No. 8,454,972 and in clinical trials involving AAV. For adenoviruses, the route of administration, formulation, and dosage can be as described in U.S. patent No. 8,404,658 and in clinical trials involving adenoviruses. For plasmid delivery, the route of administration, formulation, and dosage can be as described in U.S. patent No. 5,846,946 and in clinical studies involving plasmids. Dosages may be based on or extrapolated to an average 70kg individual (e.g., a male adult) and may be adjusted for patients, subjects, and mammals of different weights and species. The frequency of administration is within the purview of a medical or veterinary practitioner (e.g., physician, veterinarian) and depends on conventional factors including the age, sex, general health, and other conditions of the patient or subject and the particular disorder or symptom being addressed. In some embodiments, the viral vector may be injected into the tissue of interest. For cell type-specific Gene Writing, in some embodiments, expression of the Gene Writer and, optionally, the guide nucleic acid may be driven by a cell type-specific promoter.

In some embodiments, AAV allows for low toxicity, for example, because the purification method does not require ultracentrifugation of cellular particles that can activate an immune response. In some embodiments, AAV allows for a low likelihood of causing insertional mutagenesis because, for example, it does not substantially integrate into the host genome.

In some embodiments, the AAV has a packaging limit of about 4.4, 4.5, 4.6, 4.7, or 4.75 kb. In some embodiments, the Gene Writer, promoter, and transcription terminator can fit into a single viral vector. In some cases, SpCas9 (4.1 kb) may be difficult to package into AAV. Thus, in some embodiments, a Gene Writer that is shorter in length than other Gene Writers or base editors is used. In some embodiments, the Gene Writer is less than about 4.5kb, 4.4kb, 4.3kb, 4.2kb, 4.1kb, 4kb, 3.9kb, 3.8kb, 3.7kb, 3.6kb, 3.5kb, 3.4kb, 3.3kb, 3.2kb, 3.1kb, 3kb, 2.9kb, 2.8kb, 2.7kb, 2.6kb, 2.5kb, 2kb, or 1.5kb.
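Whether a Gene Writer, promoter, and transcription terminator fit together under a given packaging limit is simple arithmetic, sketched below. The helper name and every element size are hypothetical placeholders; the 4.7 kb default is merely one of the limits listed above, and whether ITRs count against the limit is left to the user.

```python
AAV_PACKAGING_LIMIT_KB = 4.7  # one of the listed limits (about 4.4 to 4.75 kb)


def fits_in_single_aav(element_sizes_kb, limit_kb=AAV_PACKAGING_LIMIT_KB):
    """Return (fits, total_kb) for a cassette given as {element name: size in kb}."""
    total_kb = sum(element_sizes_kb.values())
    return total_kb <= limit_kb, total_kb


# Illustrative sizes only; none of these values comes from the disclosure.
cassette = {"promoter": 0.5, "gene_writer": 3.6, "terminator": 0.2}
fits, total_kb = fits_in_single_aav(cassette)
print(f"total = {total_kb:.1f} kb, fits = {fits}")
```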

AAV may be AAV1, AAV2, AAV5, or any combination thereof. In some embodiments, the type of AAV is selected according to the cell to be targeted; for example, AAV serotypes 1, 2, 5 or a hybrid capsid AAV1, AAV2, AAV5, or any combination thereof, can be selected for targeting brain or neuronal cells; alternatively, AAV4 may be selected for targeting heart tissue. In some embodiments, AAV8 is selected for delivery to the liver. Exemplary AAV serotypes for these cells are described, for example, in Grimm, D. et al., J. Virol. 82:5887-5911 (2008), which is incorporated herein by reference in its entirety. In some embodiments, AAV refers to all serotypes, subtypes, and naturally occurring AAV as well as recombinant AAV. AAV may be used to refer to the virus itself or derivatives thereof. In some embodiments, the AAV comprises AAV1, AAV2, AAV3B, AAV4, AAV5, AAV6, AAV6.2, AAV7, AAVrh.64R1, AAVhu.37, AAVrh.8, AAVrh.32.33, AAV8, AAV9, AAV-DJ, AAV2/8, AAVrh10, AAVLK03, AV10, AAV11, AAV12, rh10, and hybrids thereof, avian AAV, bovine AAV, canine AAV, equine AAV, primate AAV, non-primate AAV, and ovine AAV. Genomic sequences of various AAV serotypes, as well as the sequences of the native terminal repeats (TRs), Rep proteins, and capsid subunits, are known in the art. Such sequences can be found in the literature or in public databases such as GenBank. Other exemplary AAV serotypes are listed in Table 36.

Table 36. Exemplary AAV serotypes.

In some embodiments, a pharmaceutical composition (e.g., comprising an AAV as described herein) has less than 10% empty capsids, less than 8% empty capsids, less than 7% empty capsids, less than 5% empty capsids, less than 3% empty capsids, or less than 1% empty capsids. In some embodiments, the pharmaceutical composition has less than about 5% empty capsids. In some embodiments, the number of empty capsids is below the detection limit. In some embodiments, it is advantageous for the pharmaceutical composition to have a low amount of empty capsids because, for example, empty capsids may produce an adverse response (e.g., an immune response, inflammatory response, liver response, and/or cardiac response) with little or no substantial therapeutic benefit.

In some embodiments, the residual host cell protein (rHCP) in the pharmaceutical composition is less than or equal to 100ng/ml rHCP per 1x10¹³ vg/ml, e.g., less than or equal to 40ng/ml rHCP per 1x10¹³ vg/ml or 1-50ng/ml rHCP per 1x10¹³ vg/ml. In some embodiments, the pharmaceutical composition comprises less than 10ng rHCP per 1.0x10¹³ vg, or less than 5ng rHCP per 1.0x10¹³ vg, less than 4ng rHCP per 1.0x10¹³ vg, or less than 3ng rHCP per 1.0x10¹³ vg, or any concentration in between. In some embodiments, the residual host cell DNA (hcDNA) in the pharmaceutical composition is less than or equal to 5x10⁶ pg/ml hcDNA per 1x10¹³ vg/ml, less than or equal to 1.2x10⁶ pg/ml hcDNA per 1x10¹³ vg/ml, or 1x10⁵ pg/ml hcDNA per 1x10¹³ vg/ml. In some embodiments, the residual host cell DNA in the pharmaceutical composition is less than 5.0x10⁵ pg per 1x10¹³ vg, less than 2.0x10⁵ pg per 1.0x10¹³ vg, less than 1.1x10⁵ pg per 1.0x10¹³ vg, less than 1.0x10⁵ pg hcDNA per 1.0x10¹³ vg, less than 0.9x10⁵ pg hcDNA per 1.0x10¹³ vg, less than 0.8x10⁵ pg hcDNA per 1.0x10¹³ vg, or any concentration in between.

In some embodiments, the residual plasmid DNA in the pharmaceutical composition is less than or equal to 1.7x10⁵ pg/ml per 1.0x10¹³ vg/ml, or 1x10⁵ pg/ml per 1.0x10¹³ vg/ml, or 1.7x10⁶ pg/ml per 1.0x10¹³ vg/ml. In some embodiments, the residual plasmid DNA in the pharmaceutical composition is less than 10.0x10⁵ pg per 1.0x10¹³ vg, less than 8.0x10⁵ pg per 1.0x10¹³ vg, or less than 6.8x10⁵ pg per 1.0x10¹³ vg. In embodiments, the pharmaceutical composition comprises less than 0.5ng per 1.0x10¹³ vg, less than 0.3ng per 1.0x10¹³ vg, less than 0.22ng per 1.0x10¹³ vg, or less than 0.2ng per 1.0x10¹³ vg, or any intermediate concentration, of bovine serum albumin (BSA). In embodiments, the benzonase in the pharmaceutical composition is less than 0.2ng per 1.0x10¹³ vg, less than 0.1ng per 1.0x10¹³ vg, less than 0.09ng per 1.0x10¹³ vg, less than 0.08ng per 1.0x10¹³ vg, or any intermediate concentration. In embodiments, poloxamer 188 in the pharmaceutical composition is about 10 to 150ppm, about 15 to 100ppm, or about 20 to 80ppm. In embodiments, cesium in the pharmaceutical composition is less than 50μg/g (ppm), less than 30μg/g (ppm), or less than 20μg/g (ppm), or any intermediate concentration.
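Because the residual limits above are expressed per 1.0x10¹³ vg, a measured concentration must be normalized by the genome titer of the lot before it can be compared with them. A minimal sketch of that conversion, with a hypothetical function name and illustrative input values:

```python
def per_1e13_vg(impurity_per_ml, genome_titer_vg_per_ml):
    """Convert an impurity concentration (amount/mL) to amount per 1.0e13 vg."""
    if genome_titer_vg_per_ml <= 0:
        raise ValueError("genome titer must be positive")
    return impurity_per_ml * 1e13 / genome_titer_vg_per_ml


# Illustrative lot: 6 ng/mL rHCP measured at a genome titer of 2.0e13 vg/mL.
rhcp_ng_per_1e13_vg = per_1e13_vg(6.0, 2.0e13)
print(rhcp_ng_per_1e13_vg)
```

The result carries the unit of the input amount (ng or pg), so the same helper applies to rHCP, hcDNA, residual plasmid DNA, BSA, and benzonase.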

In embodiments, the pharmaceutical composition comprises less than 10%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or any percentage in between, of total impurities, e.g., as determined by SDS-PAGE. In embodiments, the total purity is greater than 90%, greater than 92%, greater than 93%, greater than 94%, greater than 95%, greater than 96%, greater than 97%, greater than 98%, or any percentage in between, e.g., as determined by SDS-PAGE. In embodiments, no single unnamed related impurity is present at more than 5%, more than 4%, more than 3%, or more than 2%, or any percentage in between, e.g., as measured by SDS-PAGE. In embodiments, the pharmaceutical composition comprises a percentage of filled capsids relative to total capsids (e.g., peak 1 + peak 2 as measured by analytical ultracentrifugation) of greater than 85%, greater than 86%, greater than 87%, greater than 88%, greater than 89%, greater than 90%, greater than 91%, greater than 91.9%, greater than 92%, greater than 93%, or any percentage in between. In an embodiment of the pharmaceutical composition, the percentage of filled capsids measured in peak 1 by analytical ultracentrifugation is 20-80%, 25-75%, 30-75%, 35-75%, or 37.4-70.3%. In an embodiment of the pharmaceutical composition, the percentage of filled capsids measured in peak 2 by analytical ultracentrifugation is 20-80%, 20-70%, 22-65%, 24-62%, or 24.9-60.1%.

In one embodiment, the pharmaceutical composition comprises a genomic titer of 1.0 to 5.0x10¹³ vg/mL, 1.2 to 3.0x10¹³ vg/mL, or 1.7 to 2.3x10¹³ vg/mL. In one embodiment, the pharmaceutical composition exhibits a bioburden of less than 5CFU/mL, less than 4CFU/mL, less than 3CFU/mL, less than 2CFU/mL, or less than 1CFU/mL, or any intermediate concentration. In embodiments, the amount of endotoxin according to USP, e.g., USP <85> (incorporated by reference in its entirety), is less than 1.0EU/mL, less than 0.8EU/mL, or less than 0.75EU/mL. In embodiments, the osmolality of the pharmaceutical composition according to USP, e.g., USP <785> (incorporated by reference in its entirety), is 350 to 450mOsm/kg, 370 to 440mOsm/kg, or 390 to 430mOsm/kg. In embodiments, the pharmaceutical composition contains less than 1200 particles greater than 25 μm per container, less than 1000 particles greater than 25 μm per container, less than 500 particles greater than 25 μm per container, or any intermediate value. In embodiments, the pharmaceutical composition contains less than 10,000 particles greater than 10 μm per container, less than 8000 particles greater than 10 μm per container, or less than 600 particles greater than 10 μm per container.

In one embodiment, the pharmaceutical composition has a genomic titer of 0.5 to 5.0x10¹³ vg/mL, 1.0 to 4.0x10¹³ vg/mL, 1.5 to 3.0x10¹³ vg/mL, or 1.7 to 2.3x10¹³ vg/mL. In one embodiment, the pharmaceutical composition described herein exhibits one or more of the following: less than about 0.09ng benzonase per 1.0x10¹³ vg, less than about 30μg/g (ppm) cesium, about 20 to 80ppm poloxamer 188, less than about 0.22ng BSA per 1.0x10¹³ vg, less than about 6.8x10⁵ pg residual plasmid DNA per 1.0x10¹³ vg, less than about 1.1x10⁵ pg residual hcDNA per 1.0x10¹³ vg, less than about 4ng rHCP per 1.0x10¹³ vg, a pH of 7.7 to 8.3, an osmolality of about 390 to 430mOsm/kg, less than about 600 particles >25 μm in size per container, less than about 6000 particles >10 μm in size per container, a genomic titer of about 1.7x10¹³ to 2.3x10¹³ vg/mL, an infectious titer of about 3.9x10⁸ to 8.4x10¹⁰ IU per 1.0x10¹³ vg, total protein of about 100-300μg per 1.0x10¹³ vg, a mean survival of >24 days in Δ7SMA mice given a viral vector dose of about 7.5x10¹³ vg/kg, a relative potency of about 70% to 130% based on an in vitro cell-based assay, and/or less than about 5% empty capsids. In various embodiments, the pharmaceutical compositions described herein comprise any of the viral particles discussed herein that retain potency within ±20%, within ±15%, within ±10%, or within ±5% of a reference standard. In some embodiments, potency is measured using a suitable in vitro cell assay or in vivo animal model.
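A few of the limits listed above can be collected into a small specification table and checked against lot measurements, as in the sketch below. The dictionary keys, the function name, and the example measurements are hypothetical; "about" is treated here as a hard bound, "less than" limits are exclusive, and ranges are inclusive, which is a simplifying assumption.

```python
# ("max", limit) means value must be < limit; ("range", low, high) is inclusive.
SPEC = {
    "benzonase_ng_per_1e13vg": ("max", 0.09),
    "poloxamer188_ppm": ("range", 20, 80),
    "pH": ("range", 7.7, 8.3),
    "osmolality_mOsm_per_kg": ("range", 390, 430),
    "genomic_titer_vg_per_mL": ("range", 1.7e13, 2.3e13),
    "empty_capsids_percent": ("max", 5),
}


def out_of_spec(measurements, spec=SPEC):
    """Return the names of measured attributes that violate the specification."""
    failures = []
    for name, value in measurements.items():
        kind, *bounds = spec[name]
        if kind == "max":
            ok = value < bounds[0]
        else:
            ok = bounds[0] <= value <= bounds[1]
        if not ok:
            failures.append(name)
    return failures


# Illustrative measurements only.
lot = {"benzonase_ng_per_1e13vg": 0.05, "poloxamer188_ppm": 95, "pH": 8.0}
print(out_of_spec(lot))
```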

Additional methods of preparing, characterizing, and administering AAV particles are taught in WO 2019094253, which is incorporated herein by reference in its entirety.

Other rAAV constructs usable with the present invention include those described in Wang et al 2019, available at the following web sites: org/10.1038/s41573-019-0012-9, including Table 1, which is incorporated herein by reference in its entirety.

Intein peptides

In some embodiments, as described in more detail below, intein-N can be fused to the N-terminal portion of a first domain described herein, and intein-C can be fused to the C-terminal portion of a second domain described herein for linking the N-terminal portion to the C-terminal portion, thereby linking the first and second domains. In some embodiments, the first and second domains are each independently selected from the group consisting of a DNA binding domain, an RNA binding domain, an RT domain, and an endonuclease domain.

As used herein, "intein" refers to a self-splicing protein intron (e.g., a peptide), e.g., that ligates flanking N-terminal and C-terminal exteins (e.g., fragments to be joined). In some cases, an intein may comprise a fragment of a protein that is able to excise itself and join the remaining fragments (the exteins) with a peptide bond in a process known as protein splicing. Inteins are also known as "protein introns". The process of an intein excising itself and joining the remaining portions of the protein is referred to herein as "protein splicing" or "intein-mediated protein splicing". In some embodiments, the intein of a precursor protein (an intein-containing protein prior to intein-mediated protein splicing) comes from two genes. Such inteins are referred to herein as split inteins (e.g., split intein-N and split intein-C). For example, in cyanobacteria, DnaE, the catalytic subunit α of DNA polymerase III, is encoded by two separate genes, dnaE-n and dnaE-c. The intein encoded by the dnaE-n gene may be referred to herein as "intein-N". The intein encoded by the dnaE-c gene may be referred to herein as "intein-C".

The use of inteins to ligate heterologous protein fragments is described, for example, in Wood et al., J. Biol. Chem. 289(21):14512-9 (2014) (which is incorporated herein by reference in its entirety). For example, when fused to separate protein fragments, the inteins IntN and IntC can recognize each other, splice themselves out, and/or simultaneously ligate the flanking N-terminal and C-terminal exteins of the protein fragments to which they are fused, thereby reconstituting a full-length protein from the two protein fragments.

In some embodiments, a synthetic intein based on the dnaE intein, the Cfa-N (e.g., split intein-N) and Cfa-C (e.g., split intein-C) intein pair, is used. Examples of such inteins have been described, for example, in Stevens et al., J Am Chem Soc. 2016 Feb. 24; 138(7):2162-5 (incorporated herein by reference in its entirety). Non-limiting examples of intein pairs that may be used according to the present disclosure include: Cfa DnaE intein, Ssp GyrB intein, Ssp DnaX intein, Ter DnaE3 intein, Ter ThyX intein, Rma DnaB intein, and Cne Prp8 intein (e.g., as described in U.S. Pat. No. 8,394,604, incorporated herein by reference).

In some embodiments, intein-N and intein-C can be fused to the N-terminal portion of a split Cas9 and the C-terminal portion of the split Cas9, respectively, so as to join the N-terminal portion of the split Cas9 and the C-terminal portion of the split Cas9. For example, in some embodiments, intein-N is fused to the C-terminus of the N-terminal portion of the split Cas9, i.e., to form a structure of N-[N-terminal portion of the split Cas9]-[intein-N]-C. In some embodiments, intein-C is fused to the N-terminus of the C-terminal portion of the split Cas9, i.e., to form a structure of N-[intein-C]-[C-terminal portion of the split Cas9]-C. The mechanism of intein-mediated protein splicing for joining the proteins to which the inteins are fused (e.g., split Cas9) is described in Shah et al., Chem Sci. 2014; 5(1):446-461, which is incorporated herein by reference. Methods for designing and using inteins are known in the art and are described, for example, in WO 2020051561, WO 2014004336, WO 2017132580, US 20150344549, and US 20180127780, each of which is incorporated herein by reference in its entirety.
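The two fusion layouts above, and the product expected after splicing, can be illustrated with a purely string-based model. All function names and the short sequences below are hypothetical placeholders, not sequences from this disclosure, and the model ignores the chemistry and junction-residue requirements of real intein splicing.

```python
def build_split_constructs(n_fragment, c_fragment, intein_n, intein_c):
    """Return N-[N-terminal fragment]-[intein-N]-C and N-[intein-C]-[C-terminal fragment]-C."""
    return n_fragment + intein_n, intein_c + c_fragment


def splice(n_construct, c_construct, intein_n, intein_c):
    """Remove both intein halves and join the flanking fragments (string model only)."""
    if not (n_construct.endswith(intein_n) and c_construct.startswith(intein_c)):
        raise ValueError("constructs do not carry the expected intein halves")
    n_part = n_construct[: len(n_construct) - len(intein_n)]
    c_part = c_construct[len(intein_c):]
    return n_part + c_part


# Toy placeholder sequences for illustration.
n_half, c_half = build_split_constructs("MDKKYS", "TLIHQS", "CLSY", "ASN")
print(n_half, c_half, splice(n_half, c_half, "CLSY", "ASN"))
```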

In some embodiments, a split refers to a division into two or more fragments. In some embodiments, a split Cas9 protein or split Cas9 comprises a Cas9 protein provided as an N-terminal fragment and a C-terminal fragment encoded by two separate nucleotide sequences. Polypeptides corresponding to the N-terminal and C-terminal portions of the Cas9 protein may be spliced to form a reconstituted Cas9 protein. In embodiments, the Cas9 protein is split into two fragments within a disordered region of the protein, e.g., as described in Nishimasu et al., Cell, Volume 156, Issue 5, pp. 935-949, 2014, or in Jiang et al. (2016) Science 351:867-871 and PDB file 5F9R (each of which is incorporated herein by reference in its entirety). Disordered regions may be determined by one or more protein structure determination techniques known in the art, including, but not limited to, X-ray crystallography, NMR spectroscopy, electron microscopy (e.g., cryoEM), and/or in silico protein modeling. In some embodiments, the protein is split into two fragments at any C, T, A, or S within a region of SpCas9, e.g., between amino acids A292-G364, F445-K483, or E565-T637, or at corresponding positions in any other Cas9, Cas9 variant (e.g., nCas9, dCas9), or other napDNAbp. In other embodiments, the protein is split into two fragments at SpCas9 T310, T313, A456, S469, or C574. In some embodiments, the process of dividing a protein into two fragments is referred to as splitting the protein.
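Enumerating candidate split positions (any C, T, A, or S within a stated region) is a simple scan over the protein sequence. The sketch below uses a hypothetical function name and a ten-residue toy sequence; the SpCas9 regions are copied from the text above, but the SpCas9 sequence itself is not included here and would have to be supplied by the user.

```python
# Regions named in the text (1-based, inclusive): A292-G364, F445-K483, E565-T637.
SPCAS9_REGIONS = [(292, 364), (445, 483), (565, 637)]


def candidate_split_sites(protein_seq, regions, residues="CTAS"):
    """List (position, residue) pairs, 1-based, that fall inside the given regions."""
    sites = []
    for start, end in regions:
        for pos in range(start, min(end, len(protein_seq)) + 1):
            aa = protein_seq[pos - 1]
            if aa in residues:
                sites.append((pos, aa))
    return sites


# Toy sequence and toy region, for illustration only.
toy_seq = "MKTAYCSLGA"
print(candidate_split_sites(toy_seq, [(3, 7)]))
```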

In some embodiments, the length of the protein fragment ranges from about 2-1000 amino acids (e.g., between 2-10, 10-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, or 900-1000 amino acids). In some embodiments, the protein fragment ranges in length from about 5-500 amino acids (e.g., between 5-10, 10-50, 50-100, 100-200, 200-300, 300-400, or 400-500 amino acids). In some embodiments, the protein fragment ranges in length from about 20-200 amino acids (e.g., between 20-30, 30-40, 40-50, 50-100, or 100-200 amino acids).

In some embodiments, a portion or fragment of the Gene Writer (e.g., Cas9-R2Tg) is fused to an intein. The nuclease may be fused to the N-terminus or the C-terminus of the intein. In some embodiments, a portion or fragment of the fusion protein is fused to an intein and to an AAV capsid protein. Inteins, nucleases, and capsid proteins can be fused together in any arrangement (e.g., nuclease-intein-capsid, intein-nuclease-capsid, capsid-intein-nuclease, etc.). In some embodiments, the N-terminus of the intein is fused to the C-terminus of the fusion protein, and the C-terminus of the intein is fused to the N-terminus of the AAV capsid protein.

In some embodiments, an endonuclease domain (e.g., a nickase Cas9 domain) is fused to an intein-N, and a polypeptide comprising an RT domain is fused to an intein-C.

Exemplary nucleotide and amino acid sequences for inteins are provided below:

DnaE intein-N DNA:

TGCCTGTCATACGAAACCGAGATACTGACAGTAGAATATGGCCTTCTGCCAATCGGGAAGATTGTGGAGAAACGGATAGAATGCACAGTTTACTCTGTCGATAACAATGGTAACATTTATACTCAGCCAGTTGCCCAGTGGCACGACCGGGGAGAGCAGGAAGTATTCGAATACTGTCTGGAGGATGGAAGTCTCATTAGGGCCACTAAGGACCACAAATTTATGACAGTCGATGGCCAGATGCTGCCTATAGACGAAATCTTTGAGCGAGAGTTGGACCTCATGCGAGTTGACAACCTTCCTAAT(SEQ ID NO:1612)

DnaE intein-N protein:

CLSYETEILTVEYGLLPIGKIVEKRIECTVYSVDNNGNIYTQPVAQWHDRGEQEVFEYCLEDGSLIRATKDHKFMTVDGQMLPIDEIFERELDLMRVDNLPN (SEQ ID NO: 1613)

DnaE intein-C DNA:

ATGATCAAGATAGCTACAAGGAAGTATCTTGGCAAACAAAACGTTTATGATATTGGAGTCGAAAGAGATCACAACTTTGCTCTGAAGAACGGATTCATAGCTTCTAAT(SEQ ID NO:1614)

DnaE intein-C protein:

MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN(SEQ ID NO:1615)
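As a consistency check, the DnaE intein-C DNA sequence (SEQ ID NO: 1614) can be translated with the standard genetic code and compared with the intein-C protein sequence (SEQ ID NO: 1615). The helper below is a minimal sketch written for this purpose; the function and variable names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, then third base over T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) into protein."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


INTEIN_C_DNA = (
    "ATGATCAAGATAGCTACAAGGAAGTATCTTGGCAAACAAAACGTTTATGATATTGGAGTC"
    "GAAAGAGATCACAACTTTGCTCTGAAGAACGGATTCATAGCTTCTAAT"
)  # SEQ ID NO: 1614
INTEIN_C_PROTEIN = "MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN"  # SEQ ID NO: 1615

print(translate(INTEIN_C_DNA) == INTEIN_C_PROTEIN)
```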

Cfa-N DNA:

TGCCTGTCTTATGATACCGAGATACTTACCGTTGAATATGGCTTCTTGCCTATTGGAAAGATTGTCGAAGAGAGAATTGAATGCACAGTATATACTGTAGACAAGAATGGTTTCGTTTACACACAGCCCATTGCTCAATGGCACAATCGCGGCGAACAAGAAGTATTTGAGTACTGTCTCGAGGATGGAAGCATCATACGAGCAACTAAAGATCATAAATTCATGACCACTGACGGGCAGATGTTGCCAATAGATGAGATATTCGAGCGGGGCTTGGATCTCAAACAAGTGGATGGATTGCCA(SEQ ID NO:1616)

Cfa-N protein:

CLSYDTEILTVEYGFLPIGKIVEERIECTVYTVDKNGFVYTQPIAQWHNRGEQEVFEYCLEDGSIIRATKDHKFMTTDGQMLPIDEIFERGLDLKQVDGLP(SEQ ID NO:1617)

Cfa-C DNA:

ATGAAGAGGACTGCCGATGGATCAGAGTTTGAATCTCCCAAGAAGAAGAGGAAAGTAAAGATAATATCTCGAAAAAGTCTTGGTACCCAAAATGTCTATGATATTGGAGTGGAGAAAGATCACAACTTCCTTCTCAAGAACGGTCTCGTAGCCAGCAAC(SEQ ID NO:1618)

Cfa-C protein:

MKRTADGSEFESPKKKRKVKIISRKSLGTQNVYDIGVEKDHNFLLKNGLVASN(SEQ ID NO:1619)

Lipid nanoparticles

The methods and systems provided herein may employ any suitable carrier or delivery modality, including, in certain embodiments, lipid nanoparticles (LNPs). In some embodiments, the lipid nanoparticle comprises one or more ionic lipids, such as non-cationic lipids (e.g., neutral, anionic, or zwitterionic lipids); one or more conjugated lipids (such as PEG-conjugated lipids or lipids conjugated to polymers described in Table 5 of WO 2019217941, which is incorporated herein by reference in its entirety); one or more sterols (e.g., cholesterol); and, optionally, one or more targeting molecules (e.g., conjugated receptors, receptor ligands, antibodies); or a combination of the foregoing.

Lipids that may be used to form the nanoparticles (e.g., lipid nanoparticles) include, for example, those described in Table 4 of WO 2019217941, which is incorporated by reference; e.g., a lipid-containing nanoparticle may comprise one or more of the lipids in Table 4 of WO 2019217941. The lipid nanoparticle may comprise further elements, such as polymers, for example the polymers described in Table 5 of WO 2019217941 (incorporated by reference).

In some embodiments, conjugated lipids, when present, may include one or more of the following: PEG-diacylglycerol (DAG) (such as 1-(monomethoxy-polyethyleneglycol)-2,3-dimyristoylglycerol (PEG-DMG)), PEG-dialkyloxypropyl (DAA), PEG-phospholipid, PEG-ceramide (Cer), a pegylated phosphatidylethanolamine (PEG-PE), PEG succinate diacylglycerol (PEGS-DAG) (such as 4-O-(2',3'-di(tetradecanoyloxy)propyl-1-O-(ω-methoxy(polyethoxy)ethyl) succinate (PEG-S-DMG)), PEG dialkoxypropylcarbamate, N-(carbonyl-methoxypolyethylene glycol 2000)-1,2-distearoyl-sn-glycero-3-phosphoethanolamine sodium salt, those described in Table 2 of WO 2019051289 (incorporated by reference), and combinations of the foregoing.

In some embodiments, sterols that may be incorporated into the lipid nanoparticle include one or more of cholesterol or cholesterol derivatives, such as those in WO 2009/127060 or US 2010/013088, which are incorporated by reference. Additional exemplary sterols include phytosterols, including those described in Eygeris et al. (2020), dx.doi.org/10.1021/acs.nanolett.0c01386, which is incorporated herein by reference.

In some embodiments, the lipid particles comprise an ionizable lipid, a non-cationic lipid, a conjugated lipid that inhibits aggregation of the particles, and a sterol. The amounts of these components may be varied independently to achieve the desired characteristics. For example, in some embodiments, the lipid nanoparticle comprises: an ionizable lipid in an amount of about 20mol% to about 90mol% of the total lipid present in the lipid nanoparticle (in other embodiments, 20-70mol%, 30-60mol%, 40-50mol%, or about 50mol% to about 90mol%); a non-cationic lipid in an amount of about 5mol% to about 30mol% of the total lipid; a conjugated lipid in an amount of about 0.5mol% to about 20mol% of the total lipid; and a sterol in an amount of about 20mol% to about 50mol% of the total lipid. The ratio of total lipid to nucleic acid (e.g., a nucleic acid encoding a Gene Writer or a template nucleic acid) can be varied as desired. For example, the ratio of total lipid to nucleic acid (mass or weight) may be about 10:1 to about 30:1.
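A proposed four-component formulation can be checked against the mol% ranges above with a few lines of code. This is a sketch under stated assumptions: the component keys and function name are hypothetical, the "about" ranges are treated as hard inclusive bounds, and the example composition is illustrative rather than taken from this disclosure.

```python
# mol% ranges for each component class, as stated in the paragraph above.
RANGES_MOL_PERCENT = {
    "ionizable_lipid": (20.0, 90.0),
    "non_cationic_lipid": (5.0, 30.0),
    "conjugated_lipid": (0.5, 20.0),
    "sterol": (20.0, 50.0),
}


def check_composition(mol_percent, ranges=RANGES_MOL_PERCENT, tol=1e-6):
    """Return a list of problems; an empty list means the composition passes."""
    problems = []
    total = sum(mol_percent.values())
    if abs(total - 100.0) > tol:
        problems.append(f"components sum to {total} mol%, not 100")
    for name, (low, high) in ranges.items():
        value = mol_percent.get(name, 0.0)
        if not low <= value <= high:
            problems.append(f"{name} = {value} mol% is outside {low}-{high}")
    return problems


# Illustrative composition only.
example = {"ionizable_lipid": 50.0, "non_cationic_lipid": 10.0,
           "conjugated_lipid": 1.5, "sterol": 38.5}
print(check_composition(example))
```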

In some embodiments, the ionizable lipid may be a cationic lipid, an ionizable cationic lipid, such as a cationic lipid that may exist in a positively charged form or a neutral form depending on pH, or an amine-containing lipid that may be readily protonated. In some embodiments, the cationic lipid is a lipid that is capable of being positively charged, for example, under physiological conditions. Exemplary cationic lipids include one or more positively charged amine groups. In some embodiments, the lipid particles comprise cationic lipids formulated with one or more of neutral lipids, ionizable amine-containing lipids, biodegradable alkyne lipids, steroids, phospholipids including polyunsaturated lipids, structural lipids (e.g., sterols), PEG, cholesterol, and polymer conjugated lipids. In some embodiments, the cationic lipid may be an ionizable cationic lipid. Exemplary cationic lipids as disclosed herein may have an effective pKa of greater than 6.0. In embodiments, the lipid nanoparticle may comprise a second cationic lipid having an effective pKa different from (e.g., greater than) that of the first cationic lipid. The lipid nanoparticle may comprise 40mol% to 60mol% of a cationic lipid, a neutral lipid, a steroid, a polymer conjugated lipid, and a therapeutic agent, such as a nucleic acid (e.g., RNA) as described herein (e.g., a template nucleic acid or a nucleic acid encoding a Gene Writer), encapsulated within or associated with the lipid nanoparticle. In some embodiments, the nucleic acid is co-formulated with a cationic lipid. The nucleic acid may be adsorbed to the surface of an LNP (e.g., an LNP comprising a cationic lipid). In some embodiments, the nucleic acid can be encapsulated in an LNP (e.g., an LNP comprising a cationic lipid). In some embodiments, the lipid nanoparticle may comprise a targeting moiety, e.g., may be coated with a targeting agent. In an embodiment, the LNP formulation is biodegradable.
In some embodiments, lipid nanoparticles comprising one or more lipids described herein (e.g., formulas (i), (ii), (vii), and/or (ix)) encapsulate at least 1%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 92%, at least 95%, at least 97%, at least 98%, or 100% of an RNA molecule, e.g., a template RNA and/or mRNA encoding a Gene Writer polypeptide.

In some embodiments, the ratio of lipid to nucleic acid (mass/mass ratio; w/w ratio) may be in the following range: about 1:1 to about 25:1, about 10:1 to about 14:1, about 3:1 to about 15:1, about 4:1 to about 10:1, about 5:1 to about 9:1, or about 6:1 to about 9:1. The amounts of lipid and nucleic acid can be adjusted to provide a desired N/P ratio, such as an N/P ratio of 3, 4, 5, 6, 7, 8, 9, 10 or higher. Typically, the total lipid content of the lipid nanoparticle formulation may range from about 5mg/mL to about 30 mg/mL.
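The N/P ratio is conventionally the molar ratio of ionizable amine nitrogens in the lipid to phosphate groups in the nucleic acid. The sketch below computes it from masses; the function name, the lipid molecular weight of 650 g/mol, and the average nucleotide mass of 330 g/mol are illustrative assumptions rather than values from this disclosure, and only the ionizable lipid mass (not total lipid) enters the calculation.

```python
def n_to_p_ratio(ionizable_lipid_mg, nucleic_acid_mg, lipid_mw_g_per_mol,
                 amines_per_lipid=1, nt_mass_g_per_mol=330.0):
    """Molar ratio of ionizable amines (N) to nucleic acid phosphates (P).

    mg divided by g/mol gives mmol; one phosphate is assumed per nucleotide.
    """
    amine_mmol = ionizable_lipid_mg / lipid_mw_g_per_mol * amines_per_lipid
    phosphate_mmol = nucleic_acid_mg / nt_mass_g_per_mol
    return amine_mmol / phosphate_mmol


# Illustrative: 10 mg of a hypothetical 650 g/mol ionizable lipid per 1 mg RNA.
print(round(n_to_p_ratio(10.0, 1.0, 650.0), 2))
```

Under these assumptions, raising the ionizable lipid mass or lowering the nucleic acid mass raises N/P proportionally, which is how the amounts can be adjusted toward a target value.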

Exemplary ionizable lipids that can be used in the lipid nanoparticle formulation include, but are not limited to, those listed in Table 1 of WO 2019051289, which is incorporated by reference herein. Additional exemplary lipids include, but are not limited to, one or more of the following formulas: X of US 2016/0311759; I of US 20150376115 or US 2016/0376224; I, II or III of US 20160151284; I, IA, II or IIA of US 20170210967; I-c of US 20150140070; A of US 2013/0178541; US 2013/0303587 or US 2013/01233338; I of US 2015/0141678; II, III, IV or V of US 2015/0239218; I of US 2017/019904; I or II of WO 2017/117528; A of US 2012/0149894; A of US 2015/0057373; A of WO 2013/116126; A of US 2013/0090372; A of US 2013/0274523; A of US 2013/0274504; A of US 2013/0053572; A of WO 2013/016058; A of WO 2012/162210; I of US 2008/042973; I, II, III or IV of US 2012/01287870; I or II of US 2014/0200257; I, II or III of US 2015/0203446; I or III of US 2015/0005363; I, IA, IB, IC, ID, II, IIA, IIB, IIC, IID or III-XXIV of US 2014/0308304; US 2013/0338210; I, II, III or IV of WO 2009/132131; A of US 2012/01011478; I or XXXV of US 2012/0027796; XIV or XVII of US 2012/0058144; US 2013/0323369; I of US 2011/017125; I, II or III of US 2011/0256175; I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII of US 2012/0202871; I, II, III, IV, V, VI, VII, VIII, X, XII, XIII, XIV, XV or XVI of US 2011/0076335; I or II of US 2006/008378; I of US 2013/012338; I or X-A-Y-Z of US 2015/0064242; XVI, XVII or XVIII of US 2013/0022649; I, II or III of US 2013/016307; I or II of US 2010/0062967; I-X of US 2013/0189351; I of US 2014/0039032; V of US 2018/0028664; I of US 2016/0317458; I of US 2013/0195920; 5, 6 or 10 of US 10,221,127; III-3 of WO 2018/081480; I-5 or I-8 of WO 2020/081938; 18 or 25 of US 9,867,888; A of US 2019/0136131; II of WO 2020/219876; 1 of US 2012/0027803; OF-02 of US 2019/0240049; 23 of US 10,086,013; cKK-E12/A6 of Miao et al (2020); C12-200 of WO 2010/053572; 7C1 of Dahlman et al (2017); 304-O13 or 503-O13 of Whitehead et al; TS-P4C2 of US 9,708,628; I of WO 2020/106946; WO 2020/106946.

In some embodiments, the ionizable lipid is MC3 ((6Z,9Z,28Z,31Z)-heptatriaconta-6,9,28,31-tetraen-19-yl 4-(dimethylamino)butanoate; DLin-MC3-DMA or MC3), e.g., as described in Example 9 of WO 2019051289A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is lipid ATX-002, e.g., as described in Example 10 of WO 2019051289A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is (13Z,16Z)-N,N-dimethyl-3-nonyldocosa-13,16-dien-1-amine (Compound 32), e.g., as described in Example 11 of WO 2019051289A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is Compound 6 or Compound 22, e.g., as described in Example 12 of WO 2019051289A9 (incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is heptadecan-9-yl 8-((2-hydroxyethyl)(6-oxo-6-(undecyloxy)hexyl)amino)octanoate (SM-102), e.g., as described in Example 1 of US 9,867,888 (which is incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate (LP01), e.g., as synthesized in Example 13 of WO 2015/095340 (which is incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is di((Z)-non-2-en-1-yl) 9-((4-(dimethylamino)butanoyl)oxy)heptadecanedioate (L319), e.g., as synthesized in Example 7, Example 8, or Example 9 of US 2012/0027803 (which is incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is 1,1'-((2-(4-(2-((2-(bis(2-hydroxydodecyl)amino)ethyl)(2-hydroxydodecyl)amino)ethyl)piperazin-1-yl)ethyl)azanediyl)bis(dodecan-2-ol) (C12-200), e.g., as synthesized in Examples 14 and 16 of WO 2010/053572 (which is incorporated herein by reference in its entirety). In some embodiments, the ionizable lipid is imidazole cholesterol ester (ICE) lipid, (3S,10R,13R,17R)-10,13-dimethyl-17-((R)-6-methylheptan-2-yl)-2,3,4,7,8,9,10,11,12,13,14,15,16,17-tetradecahydro-1H-cyclopenta[a]phenanthren-3-yl 3-(1H-imidazol-4-yl)propanoate, e.g., structure (I) of WO 2020/106946 (which is incorporated herein by reference in its entirety).

Some non-limiting examples of lipid compounds that can be used (e.g., in combination with other lipid components) to form lipid nanoparticles for delivery of compositions described herein, such as nucleic acids (e.g., RNAs) described herein (e.g., template nucleic acids or nucleic acids encoding GeneWriters), include:

In some embodiments, LNP comprising formula (i) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

In some embodiments, LNP comprising formula (ii) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.


In some embodiments, LNP comprising formula (iii) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

In some embodiments, LNP comprising formula (v) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

In some embodiments, LNP comprising formula (vi) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

In some embodiments, LNP comprising formula (viii) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

In some embodiments, LNP comprising formula (ix) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

wherein

X¹ is O, NR¹, or a direct bond, X² is C2-5 alkylene, X³ is C(=O) or a direct bond, R¹ is H or Me, R³ is C1-3 alkyl, R² is C1-3 alkyl, or R², taken together with the nitrogen atom to which it is attached and 1-3 carbon atoms of X², forms a 4-, 5-, or 6-membered ring, or X¹ is NR¹, and R¹ and R², taken together with the nitrogen atoms to which they are attached, form a 5- or 6-membered ring, or R² and R³, taken together with the nitrogen atom to which they are attached, form a 5-, 6-, or 7-membered ring, Y¹ is C2-12 alkylene, Y² is selected from the group consisting of

n is 0 to 3, R⁴ is C1-15 alkyl, Z¹ is C1-6 alkylene or a direct bond,

Z² is

(in either orientation) or absent, provided that if Z¹ is a direct bond, then Z² is absent;

R⁵ is C5-9 alkyl or C6-10 alkoxy, R⁶ is C5-9 alkyl or C6-10 alkoxy, W is methylene or a direct bond, and R⁷ is H or Me, or a salt thereof, provided that if R³ and R² are C2 alkyl, X¹ is O, X² is linear C3 alkylene, X³ is C(=O), Y¹ is linear Ce alkylene, (Y²)n-R⁴ is

, R⁴ is linear C5 alkyl, Z¹ is C2 alkylene, Z² is absent, W is methylene, and R⁷ is H, then R⁵ and R⁶ are not Cx alkoxy.

In some embodiments, LNP comprising formula (xii) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

In some embodiments, LNP comprising formula (xi) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.


In some embodiments, the LNP comprises a compound of formula (xiii) and a compound of formula (xiv).

In some embodiments, LNP comprising formula (xv) is used to deliver the GeneWriter compositions described herein to the liver and/or hepatocytes.

PEI ₆₀₀ Core(s)


In some embodiments, LNP comprising a formulation of formula (xvi) is used to deliver the GeneWriter compositions described herein to lung endothelial cells.


In some embodiments, the lipid compound used to form the lipid nanoparticle for delivery of the compositions described herein, e.g., a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a GeneWriter), is prepared by one of the following reactions:

Exemplary non-cationic lipids include, but are not limited to, distearoyl-sn-glycero-phosphoethanolamine, distearoylphosphatidylcholine (DSPC), dioleoylphosphatidylcholine (DOPC), dipalmitoylphosphatidylcholine (DPPC), dioleoylphosphatidylglycerol (DOPG), dipalmitoylphosphatidylglycerol (DPPG), dioleoyl-phosphatidylethanolamine (DOPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), palmitoyloleoylphosphatidylcholine (POPC), palmitoyloleoylphosphatidylethanolamine (POPE), dioleoyl-phosphatidylethanolamine 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate (DOPE-mal), dipalmitoyl phosphatidylethanolamine (DPPE), dimyristoyl phosphatidylethanolamine (DMPE), distearoyl-phosphatidylethanolamine (DSPE), monomethyl-phosphatidylethanolamine (e.g., 16-O-monomethyl PE), dimethyl-phosphatidylethanolamine (e.g., 16-O-dimethyl PE), 18-1-trans PE, 1-stearoyl-2-oleoyl-phosphatidylethanolamine (SOPE), hydrogenated soy phosphatidylcholine (HSPC), egg phosphatidylcholine (EPC), dioleoylphosphatidylserine (DOPS), sphingomyelin (SM), dimyristoyl phosphatidylcholine (DMPC), dimyristoyl phosphatidylglycerol (DMPG), distearoyl phosphatidylglycerol (DSPG), dierucoyl phosphatidylcholine (DEPC), palmitoyloleoylphosphatidylglycerol (POPG), dielaidoyl phosphatidylethanolamine (DEPE), lecithin, phosphatidylethanolamine, lysolecithin, lysophosphatidylethanolamine, phosphatidylserine, phosphatidylinositol, sphingomyelin, egg sphingomyelin (ESM), cephalin, cardiolipin, phosphatidic acid, cerebrosides, dicetyl phosphate, lysophosphatidylcholine, dilinoleoylphosphatidylcholine, or mixtures thereof. It should be understood that other diacylphosphatidylcholine and diacylphosphatidylethanolamine phospholipids may also be used. The acyl groups in these lipids are preferably acyl groups derived from fatty acids having a C10-C24 carbon chain, such as lauroyl, myristoyl, palmitoyl, stearoyl, or oleoyl.
In certain embodiments, additional exemplary lipids include, but are not limited to, those described in Kim et al (2020) dx.doi.org/10.1021/acs.nanolet.0c01386, which is incorporated herein by reference. In some embodiments, such lipids include plant lipids (e.g., DGTS) that were found to improve liver transfection with mRNA.

In some embodiments, the non-cationic lipid can have the following structure

Other examples of non-cationic lipids suitable for use in the lipid nanoparticle include, but are not limited to, non-phospholipids such as stearylamine, dodecylamine, hexadecylamine, acetyl palmitate, glyceryl ricinoleate, cetyl stearate, isopropyl myristate, amphoteric acrylic polymers, triethanolamine-lauryl sulfate, alkyl-aryl sulfate, polyethoxylated fatty acid amides, dioctadecyl dimethyl ammonium bromide, ceramides, sphingomyelin, and the like. Other non-cationic lipids are described in WO 2017/099823 or U.S. patent publication US 2018/0028664, the contents of which are incorporated herein by reference in their entirety.

In some embodiments, the non-cationic lipid is oleic acid or a compound of formula I, II or IV of US 2018/0028664, incorporated by reference in its entirety. The non-cationic lipids may comprise, for example, 0-30% (mole) of the total lipids present in the lipid nanoparticle. In some embodiments, the non-cationic lipid content is 5% -20% (mole) or 10% -15% (mole) of the total lipid present in the lipid nanoparticle. In embodiments, the molar ratio of ionizable lipid to neutral lipid is about 2:1 to about 8:1 (e.g., about 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, or 8:1).

In some embodiments, the lipid nanoparticle does not comprise any phospholipids.

In some aspects, the lipid nanoparticle may further comprise a component such as a sterol to provide membrane integrity. One exemplary sterol that can be used in the lipid nanoparticle is cholesterol and its derivatives. Non-limiting examples of cholesterol derivatives include polar analogs such as 5α-cholestanol, 5β-coprostanol, cholesteryl-(2'-hydroxy)-ethyl ether, cholesteryl-(4'-hydroxy)-butyl ether, and 6-ketocholestanol; nonpolar analogs such as 5α-cholestane, cholestenone, 5α-cholestanone, 5β-cholestanone, and cholesteryl decanoate; and mixtures thereof. In some embodiments, the cholesterol derivative is a polar analog, e.g., cholesteryl-(4'-hydroxy)-butyl ether. Exemplary cholesterol derivatives are described in PCT publication WO 2009/127060 and U.S. patent publication US 2010/0130588, each of which is incorporated herein by reference in its entirety.

In some embodiments, the component that provides membrane integrity, such as a sterol, may comprise 0-50% (mole) (e.g., 0-10%, 10%-20%, 20%-30%, 30%-40%, or 40%-50%) of the total lipids present in the lipid nanoparticle. In some embodiments, such a component is 20%-50% (mole) or 30%-40% (mole) of the total lipid content of the lipid nanoparticle.

In some embodiments, the lipid nanoparticle may comprise polyethylene glycol (PEG) or conjugated lipid molecules. Typically, these are used to inhibit aggregation of lipid nanoparticles and/or to provide steric stabilization. Exemplary conjugated lipids include, but are not limited to, PEG-lipid conjugates, polyoxazoline (POZ) -lipid conjugates, polyamide-lipid conjugates (such as ATTA-lipid conjugates), cationic Polymer Lipid (CPL) conjugates, and mixtures thereof. In some embodiments, the conjugated lipid molecule is a PEG-lipid conjugate, such as a (methoxypolyethylene glycol) conjugated lipid.

Exemplary PEG-lipid conjugates include, but are not limited to, PEG-diacylglycerol (DAG) (such as 1-(monomethoxy-polyethylene glycol)-2,3-dimyristoylglycerol (PEG-DMG)), PEG-dialkyloxypropyl (DAA), PEG-phospholipid, PEG-ceramide (Cer), pegylated phosphatidylethanolamine (PEG-PE), PEG succinate diacylglycerol (PEGS-DAG) (such as 4-O-(2',3'-di(tetradecanoyloxy)propyl-1-O-(ω-methoxy(polyethoxy)ethyl) succinate (PEG-S-DMG)), PEG dialkoxypropyl carbamate, and N-(carbonyl-methoxypolyethylene glycol 2000)-1,…. Additional exemplary PEG-lipid conjugates are described, for example, in US 5,885,613, US 6,287,59, US 2003/007789, US 2003/007829, US 2005/0175682, US 2008/0020058, US 2011/017125, US 2010/0130588, US 2016/0376224, US 2017/0119904, and US 3/099823, the contents of all of which are incorporated herein by reference in their entirety. In some embodiments, the PEG-lipid is a compound of formula III, III-a-I, III-a-2, III-b-1, III-b-2, or V of US 2018/0028664, the contents of which are incorporated herein by reference in their entirety. In some embodiments, the PEG-lipid has formula II of US 20150376115 or US 2016/0376224, the contents of both of which are incorporated herein by reference in their entirety. In some embodiments, the PEG-DAA conjugate can be, for example, PEG-dilauryloxypropyl, PEG-dimyristyloxypropyl, PEG-dipalmityloxypropyl, or PEG-distearyloxypropyl.
The PEG-lipid may be one or more of the following: PEG-DMG, PEG-dilaurylglycerol, PEG-dipalmitoylglycerol, PEG-distearylglycerol, PEG-dilaurylglycamide, PEG-dimyristylglycamide, PEG-dipalmitoylglycamide, PEG-distearylglycamide, PEG-cholesterol (1-[8'-(cholest-5-en-3[beta]-oxy)carboxamido-3',6'-dioxaoctanyl]carbamoyl-[omega]-methyl-poly(ethylene glycol)), PEG-DMB (3,4-ditetradecoxybenzyl-[omega]-methyl-poly(ethylene glycol) ether), and 1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]. In some embodiments, the PEG-lipid comprises PEG-DMG, 1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]. In some embodiments, the PEG-lipid comprises a structure selected from:

In some embodiments, lipids conjugated to molecules other than PEG may also be used in place of PEG-lipids. For example, polyoxazoline (POZ)-lipid conjugates, polyamide-lipid conjugates (such as ATTA-lipid conjugates), and cationic polymer lipid (CPL) conjugates may be used in place of or in addition to PEG-lipids.

Exemplary conjugated lipids, namely PEG-lipids, (POZ)-lipid conjugates, ATTA-lipid conjugates, and cationic polymer-lipids, are described in the PCT and US patent applications listed in table 2 of WO 2019051289 A9 and WO 2020106946 A1, the contents of all of which are incorporated herein by reference in their entirety.

In some embodiments, the LNP comprises a compound of formula (xix), a compound of formula (xxi), and a compound of formula (xxv). In some embodiments, LNPs comprising formulations of formula (xix), formula (xxi), and formula (xxv) are used to deliver the GeneWriter compositions described herein to the lung or lung cells.

In some embodiments, the PEG or conjugated lipid may comprise 0-20% (mole) of the total lipid present in the lipid nanoparticle. In some embodiments, the PEG or conjugated lipid is present in an amount of 0.5% -10% or 2% -5% (mole) of the total lipid present in the lipid nanoparticle. The molar ratios of ionizable lipids, non-cationic lipids, sterols, and PEG/conjugated lipids can be varied as desired. For example, the lipid particle may comprise from 30% to 70% of the ionizable lipid by mole or total weight of the composition, from 0% to 60% cholesterol by mole or total weight of the composition, from 0% to 30% of the non-cationic lipid by mole or total weight of the composition, and from 1% to 10% conjugated lipid by mole or total weight of the composition. Preferably, the composition comprises 30% to 40% of ionizable lipids based on the moles or total weight of the composition, 40% to 50% of cholesterol based on the moles or total weight of the composition, and 10% to 20% of non-cationic lipids based on the moles or total weight of the composition. In some other embodiments, the composition is 50% -75% ionizable lipid by mole or total weight of the composition, 20% -40% cholesterol by mole or total weight of the composition, and 5% -10% non-cationic lipid by mole or total weight of the composition, and 1% -10% conjugated lipid by mole or total weight of the composition. The composition may contain 60% to 70% of ionizable lipids based on the moles or total weight of the composition, 25% to 35% of cholesterol based on the moles or total weight of the composition, and 5% to 10% of non-cationic lipids based on the moles or total weight of the composition. The composition may also contain up to 90% by mole or total weight of the composition of an ionizable lipid and from 2% to 15% by mole or total weight of the composition of a non-cationic lipid. 
The formulation may also be a lipid nanoparticle formulation, for example comprising 8% -30% of ionizable lipids, based on the moles or total weight of the composition, 5% -30% of non-cationic lipids, based on the moles or total weight of the composition, and 0-20% cholesterol, based on the moles or total weight of the composition; 4% -25% by mole or total weight of the composition of ionizable lipids, 4% -25% by mole or total weight of the composition of non-cationic lipids, 2% -25% by mole or total weight of the composition of cholesterol, 10% -35% by mole or total weight of the composition of conjugated lipids, and 5% by mole or total weight of the composition of cholesterol; or 2% -30% of ionizable lipids based on moles or total weight of the composition, 2% -30% of non-cationic lipids based on moles or total weight of the composition, 1% -15% of cholesterol based on moles or total weight of the composition, 2% -35% of conjugated lipids based on moles or total weight of the composition, and 1% -20% of cholesterol based on moles or total weight of the composition; or even up to 90% by moles or total weight of the composition of ionizable lipids and from 2% to 10% by moles or total weight of the composition of non-cationic lipids, or even 100% by moles or total weight of the composition of cationic lipids. In some embodiments, the lipid particle formulation comprises ionizable lipids, phospholipids, cholesterol, and pegylated lipids in a molar ratio of 50:10:38.5:1.5. In some other embodiments, the lipid particle formulation comprises ionizable lipids, cholesterol, and pegylated lipids in a molar ratio of 60:38.5:1.5.

In some embodiments, the lipid particles comprise an ionizable lipid, a non-cationic lipid (e.g., a phospholipid), a sterol (e.g., cholesterol), and a pegylated lipid, wherein the mole percent of the ionizable lipid is in the range of 20 to 70, with a target of 40 to 60; the mole percent of the non-cationic lipid is in the range of 0 to 30, with a target of 0 to 15; the mole percent of the sterol is in the range of 20 to 70, with a target of 30 to 50; and the mole percent of the pegylated lipid is in the range of 1 to 6, with a target of 2 to 5.

In some embodiments, the lipid particle comprises ionizable lipid/non-cationic lipid/sterol/conjugated lipid in a molar ratio of 50:10:38.5:1.5.
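The arithmetic behind such molar ratios can be illustrated with a minimal Python sketch, which normalizes a molar ratio to mole percent and compares the result with the broad ranges recited above for a four-component lipid particle. The function names and dictionary keys are hypothetical and were chosen only for this illustration; they are not part of the disclosure.

```python
# Illustrative sketch only; all names here are hypothetical.

def mole_percent(ratio):
    """Normalize a molar ratio (component -> parts) to mole percent."""
    total = sum(ratio.values())
    if total <= 0:
        raise ValueError("molar ratio must have a positive total")
    return {name: 100.0 * parts / total for name, parts in ratio.items()}


# Broad mole-percent ranges recited above for the four lipid components.
BROAD_RANGES = {
    "ionizable": (20, 70),
    "non_cationic": (0, 30),
    "sterol": (20, 70),
    "pegylated": (1, 6),
}


def within_ranges(percent, ranges=BROAD_RANGES):
    """Return True if every component lies inside its (low, high) range."""
    return all(low <= percent[name] <= high
               for name, (low, high) in ranges.items())


# The 50:10:38.5:1.5 ratio of ionizable/non-cationic/sterol/conjugated lipid.
formulation = mole_percent(
    {"ionizable": 50, "non_cationic": 10, "sterol": 38.5, "pegylated": 1.5}
)
print(formulation)
print(within_ranges(formulation))
```

Because the example ratio already sums to 100 parts, normalization leaves the values unchanged; a ratio stated on another basis (e.g., 100:20:77:3) would be rescaled to the same mole percentages.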

In one aspect, the present disclosure provides lipid nanoparticle formulations comprising phospholipids, lecithins, phosphatidylcholines, and phosphatidylethanolamines.

In some embodiments, one or more additional compounds may also be included. Those compounds may be administered separately, or the additional compounds may be included in the lipid nanoparticles of the present invention. In other words, the lipid nanoparticle may contain, in addition to the nucleic acid, other compounds or at least a second nucleic acid different from the first. Other additional compounds may be selected from the group consisting of, without limitation: small or large organic or inorganic molecules, monosaccharides, disaccharides, trisaccharides, oligosaccharides, polysaccharides, peptides, proteins, peptide analogs and derivatives thereof, peptidomimetics, nucleic acids, nucleic acid analogs and derivatives, extracts made from biological materials, or any combination thereof.

In some embodiments, the lipid nanoparticle (or formulation comprising the lipid nanoparticle) lacks reactive impurities (e.g., aldehydes or ketones), or comprises less than a preselected level of reactive impurities (e.g., aldehydes or ketones). While not wanting to be bound by theory, in some embodiments, lipid reagents are used to prepare lipid nanoparticle formulations, and the lipid reagents may comprise contaminating reactive impurities (e.g., aldehydes or ketones). The lipid reagent used for manufacture may be selected based on having less than a preselected level of reactive impurities (e.g., aldehydes or ketones). Without wishing to be bound by theory, in some embodiments, aldehydes may cause modification and damage to the RNA, e.g., cross-linking between bases and/or covalent conjugation of lipids to the RNA (e.g., formation of lipid-RNA adducts). In some cases, this may result in failure of the reverse transcriptase reaction and/or incorporation of inappropriate bases, e.g., mutations in the newly synthesized target DNA, at one or more sites of one or more lesions.

In some embodiments, the lipid nanoparticle formulation is produced using a lipid reagent comprising a total reactive impurity (e.g., aldehyde) content of less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%. In some embodiments, the lipid nanoparticle formulation is produced using a lipid reagent comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, the lipid nanoparticle formulation is produced using a lipid reagent comprising: (i) A total reactive impurity (e.g., aldehyde) content of less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, the lipid nanoparticle formulation is produced using a plurality of lipid agents, and each of the plurality of lipid agents independently meets one or more criteria described in this paragraph. In some embodiments, each of the plurality of lipid agents meets the same criteria, such as the criteria of this paragraph.
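The selection criteria of this paragraph can be expressed as a simple check. The following Python sketch is a hypothetical illustration: the data layout (a mapping from impurity species to percent content), the function names, the default limits, and the example values are all assumptions made for the example, and any of the preselected levels listed above could be substituted.

```python
# Illustrative sketch only; names, limits, and example values are assumptions.

def meets_impurity_criteria(impurities, max_total=1.0, max_single=0.5):
    """Check one lipid reagent against preselected reactive-impurity limits.

    impurities: mapping of impurity species -> content in percent.
    max_total:  limit on the total reactive impurity content (percent).
    max_single: limit on any single reactive impurity species (percent).
    """
    total = sum(impurities.values())
    worst = max(impurities.values(), default=0.0)
    return total < max_total and worst < max_single


def all_reagents_pass(reagents, **limits):
    """Require each of a plurality of lipid reagents to meet the criteria."""
    return all(meets_impurity_criteria(profile, **limits)
               for profile in reagents.values())


# Hypothetical impurity profiles for two lipid reagents.
reagents = {
    "ionizable_lipid": {"aldehyde_A": 0.2, "aldehyde_B": 0.1},
    "peg_lipid": {"aldehyde_A": 0.6},
}
print(all_reagents_pass(reagents))
```

Applying the check to each reagent independently mirrors the embodiment in which each of the plurality of lipid reagents meets the same criteria.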

In some embodiments, the lipid nanoparticle formulation comprises a total reactive impurity (e.g., aldehyde) content of less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%. In some embodiments, the lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, the lipid nanoparticle formulation comprises: (i) A total reactive impurity (e.g., aldehyde) content of less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.

In some embodiments, one or more, or optionally all, of the lipid agents used in a lipid nanoparticle or formulation thereof as described herein comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content. In some embodiments, one or more, or optionally all, of the lipid reagents for a lipid nanoparticle or formulation thereof as described herein comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, one or more, or optionally all, of the lipid agents used in the lipid nanoparticles or formulations thereof described herein comprise: (i) A total reactive impurity (e.g., aldehyde) content of less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.

In some embodiments, the total aldehyde content and/or the amount of any single reactive impurity (e.g., aldehyde) species is determined by Liquid Chromatography (LC), e.g., in combination with tandem mass spectrometry (MS/MS), e.g., according to the method described in example 34. In some embodiments, the reactive impurity (e.g., aldehyde) content and/or the amount of reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications of a nucleic acid molecule (e.g., RNA molecule, e.g., as described herein) associated with, for example, the presence of the reactive impurity (e.g., aldehyde) in a lipid reagent. In some embodiments, the reactive impurity (e.g., aldehyde) content and/or the amount of reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications, e.g., of a nucleotide or nucleoside (e.g., ribonucleotide or ribonucleoside, e.g., comprised in or isolated from a template nucleic acid as described herein) associated with the presence of a reactive impurity (e.g., aldehyde) in, e.g., a lipid reagent, e.g., as described in example 35. In embodiments, chemical modification of a nucleic acid molecule, nucleotide or nucleoside is detected by determining the presence of one or more modified nucleotides or nucleosides, e.g., using LC-MS/MS analysis, e.g., as described in example 35.

In some embodiments, a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a GeneWriter) comprises no aldehyde modification, or comprises less than a preselected amount of aldehyde modification. In some embodiments, the nucleic acid has fewer than 50, 20, 10, 5, 2, or 1 aldehyde modifications per 1000 nucleotides on average, e.g., where the single cross-linking of two nucleotides is a single aldehyde modification. In some embodiments, the aldehyde modification is an RNA adduct (e.g., a lipid-RNA adduct). In some embodiments, aldehyde modified nucleotides are crosslinked between bases. In some embodiments, a nucleic acid (e.g., RNA) described herein comprises fewer than 50, 20, 10, 5, 2, or 1 crosslinks between nucleotides.
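The per-1000-nucleotide figure above is a simple normalization, sketched below in Python. The function names and example numbers are hypothetical; how the modifications are counted (e.g., by the LC-MS/MS analysis referred to above) is outside the scope of this sketch.

```python
# Illustrative sketch only; names and example numbers are hypothetical.

# Preselected limits recited above (aldehyde modifications per 1000 nucleotides).
LIMITS = (50, 20, 10, 5, 2, 1)


def modifications_per_1000_nt(n_modifications, length_nt):
    """Average number of aldehyde modifications per 1000 nucleotides."""
    if length_nt <= 0:
        raise ValueError("length_nt must be positive")
    return 1000.0 * n_modifications / length_nt


def strictest_limit_met(rate, limits=LIMITS):
    """Return the smallest limit that the rate is below, or None if none."""
    met = [limit for limit in limits if rate < limit]
    return min(met) if met else None


# Example: 3 modifications counted in a 4500-nucleotide RNA.
rate = modifications_per_1000_nt(3, 4500)
print(rate, strictest_limit_met(rate))
```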

In some embodiments, LNPs are directed to a specific tissue by adding a targeting domain. For example, biological ligands can be displayed on the surface of LNPs to enhance interaction with cells displaying cognate receptors, thereby facilitating association with and delivery of cargo to tissues in which the cells express the receptor. In some embodiments, the biological ligand may be a ligand that drives delivery to the liver, e.g., LNPs displaying GalNAc facilitate delivery of the nucleic acid cargo to hepatocytes displaying asialoglycoprotein receptors (ASGPRs). The work of Akinc et al., Mol Ther 18(7):1357-1364 (2010), teaches conjugation of a trivalent GalNAc ligand to a PEG-lipid (GalNAc-PEG-DSG) to yield LNPs that depend on ASGPR for an observable LNP cargo effect (see, e.g., FIG. 6). Other LNP formulations displaying ligands, such as formulations incorporating folate, transferrin, or antibodies, are discussed in WO 2017223135, which is incorporated herein by reference in its entirety, as are the references used therein, namely: Kolhatkar et al., Curr Drug Discov Technol. 2011 8:197-206; Musacchio and Torchilin, Front Biosci. 2011 16:1388-1412; Yu et al., Mol Membr Biol. 2010 27:286-298; Patil et al., Crit Rev Ther Drug Carrier Syst. 2008 25:1-61; Benoit et al., Biomacromolecules. 2011 12:2708-2714; Zhao et al., Expert Opin Drug Deliv. 2008 5:309-319; Akinc et al., Mol Ther. 2010 18:1357-1364; Srinivasan et al., Methods Mol Biol. 2012 820:105-116; Ben-Arie et al., Methods Mol Biol. 2012 757:497-507; Peer 2010 J Control Release. 20:63-68; Peer et al., Proc Natl Acad Sci U S A. 2007:4095-4100; Kim et al., Methods Mol Biol. 2011 721:339-353; Subramanya et al., Mol Ther. 2010 18:2028-2037; Song et al., Nat Biotechnol. 2005 23:709-717; Peer et al., Science. 2008 319:627-630; and Peer and Lieberman, Gene Ther. 2011 18:1127-1133.

In some embodiments, LNP is selected for tissue specific activity by adding a selective organ targeting (Selective ORgan Targeting, SORT) molecule to a formulation comprising traditional components, such as ionizable cationic lipids, amphiphilic phospholipids, cholesterol, and poly (ethylene glycol) (PEG). The teachings of Cheng et al Nat Nanotechnol 15 (4): 313-320 (2020) demonstrate that the addition of a complementary "SORT" component can precisely alter the in vivo RNA delivery profile and mediate tissue-specific (e.g., lung, liver, spleen) gene delivery and editing, depending on the percentage and biophysical properties of the SORT molecule.

In some embodiments, the LNP comprises biodegradable ionizable lipids. In some embodiments, the LNP comprises (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate, also known as 3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl (9Z,12Z)-octadeca-9,12-dienoate, or another ionizable lipid. See, e.g., WO 2019/067992, WO/2017/173054, WO 2015/095340, and WO 2014/136086, and the lipids of the references provided therein. In some embodiments, the terms cationic and ionizable are interchangeable in the context of LNP lipids, e.g., wherein the ionizable lipid is cationic depending on pH.

In some embodiments, multiple components of the Gene Writer system can be prepared as a single LNP formulation, e.g., an LNP formulation comprising mRNA and RNA templates encoding the Gene Writer polypeptide. The ratio of nucleic acid components can be varied to maximize the characteristics of the therapeutic agent. In some embodiments, the ratio of RNA template to mRNA encoding the Gene Writer polypeptide is about 1:1 to 100:1 on a molar ratio, e.g., about 1:1 to 20:1, about 20:1 to 40:1, about 40:1 to 60:1, about 60:1 to 80:1, or about 80:1 to 100:1. In other embodiments, a system of multiple nucleic acids may be prepared from separate formulations, e.g., one LNP formulation comprising a template RNA and a second LNP formulation comprising mRNA encoding a Gene Writer polypeptide. In some embodiments, the system may comprise more than two nucleic acid components formulated into the LNP. In some embodiments, the system can comprise a protein (e.g., a Gene Writer polypeptide) and a template RNA formulated into at least one LNP formulation.
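Because the ratio above is stated on a molar basis, RNA amounts measured by mass must be converted using molecular weight. The Python sketch below shows one way to do this. It assumes the common approximation that a single-stranded RNA has a molecular weight of about 320.5 g/mol per nucleotide plus 159 g/mol; that approximation, the function names, and the example lengths and masses are assumptions for illustration, not values from this disclosure.

```python
# Illustrative sketch only; constants are a common approximation (assumption)
# and all names and example values are hypothetical.

AVG_NT_MASS = 320.5   # g/mol per ribonucleotide, approximate
END_MASS = 159.0      # g/mol, approximate end-group correction


def rna_mol_weight(length_nt):
    """Approximate molecular weight (g/mol) of a single-stranded RNA."""
    if length_nt <= 0:
        raise ValueError("length_nt must be positive")
    return length_nt * AVG_NT_MASS + END_MASS


def picomoles(mass_ug, length_nt):
    """Convert an RNA mass in micrograms to picomoles."""
    return mass_ug * 1e6 / rna_mol_weight(length_nt)


def template_to_mrna_molar_ratio(template_ug, template_nt, mrna_ug, mrna_nt):
    """Molar ratio of template RNA to mRNA encoding the polypeptide."""
    return picomoles(template_ug, template_nt) / picomoles(mrna_ug, mrna_nt)


# Example: equal masses of a 500-nt template and a 4000-nt mRNA.
ratio = template_to_mrna_molar_ratio(1.0, 500, 1.0, 4000)
print(round(ratio, 2))
```

The example illustrates that equal masses do not give a 1:1 molar ratio when the two RNAs differ in length; the shorter template is present in molar excess.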

In some embodiments, the mean LNP diameter of the LNP formulation may be between tens and hundreds of nm, as measured by Dynamic Light Scattering (DLS). In some embodiments, the mean LNP diameter of the LNP formulation can be about 40 nm to about 150 nm, such as about 40 nm, 45 nm, 50 nm, 55 nm, 60 nm, 65 nm, 70 nm, 75 nm, 80 nm, 85 nm, 90 nm, 95 nm, 100 nm, 105 nm, 110 nm, 115 nm, 120 nm, 125 nm, 130 nm, 135 nm, 140 nm, 145 nm, or 150 nm. In some embodiments, the mean LNP diameter of the LNP formulation can be about 50 nm to about 100 nm, about 50 nm to about 90 nm, about 50 nm to about 80 nm, about 50 nm to about 70 nm, about 50 nm to about 60 nm, about 60 nm to about 100 nm, about 60 nm to about 90 nm, about 60 nm to about 80 nm, about 60 nm to about 70 nm, about 70 nm to about 100 nm, about 70 nm to about 90 nm, about 70 nm to about 80 nm, about 80 nm to about 100 nm, about 80 nm to about 90 nm, or about 90 nm to about 100 nm. In some embodiments, the mean LNP diameter of the LNP formulation may be about 70 nm to about 100 nm. In particular embodiments, the mean LNP diameter of the LNP formulation may be about 80 nm. In some embodiments, the mean LNP diameter of the LNP formulation may be about 100 nm. In some embodiments, the LNP formulation has a mean LNP diameter ranging from about 1 nm to about 500 nm, from about 5 nm to about 200 nm, from about 10 nm to about 100 nm, from about 20 nm to about 80 nm, from about 25 nm to about 60 nm, from about 30 nm to about 55 nm, from about 35 nm to about 50 nm, or from about 38 nm to about 42 nm.

In some cases, the LNP may be relatively homogeneous. The polydispersity index may be used to indicate the homogeneity of the LNP, e.g., the particle size distribution of the lipid nanoparticles. A small (e.g., less than 0.3) polydispersity index generally indicates a narrow particle size distribution. The polydispersity index of the LNP may be from about 0 to about 0.25, such as 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, or 0.25. In some embodiments, the polydispersity index of the LNP may be from about 0.10 to about 0.20.
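As a rough illustration of how a polydispersity index relates to the width of a size distribution, the Python sketch below estimates it as the squared ratio of the standard deviation to the mean diameter. This width-over-mean estimate is an assumption that is reasonable only for a narrow, monomodal distribution; it is not the cumulants analysis performed by DLS instruments, and the function names and example diameters are hypothetical.

```python
# Illustrative sketch only; a simplified estimate, not instrument DLS analysis.
from statistics import mean, pstdev


def polydispersity_index(diameters_nm):
    """Estimate PDI as (standard deviation / mean diameter) squared."""
    if not diameters_nm:
        raise ValueError("at least one diameter is required")
    avg = mean(diameters_nm)
    if avg <= 0:
        raise ValueError("mean diameter must be positive")
    return (pstdev(diameters_nm) / avg) ** 2


def is_narrow(pdi, limit=0.3):
    """A small PDI (e.g., less than 0.3) suggests a narrow size distribution."""
    return pdi < limit


# Hypothetical particle diameters in nm.
pdi = polydispersity_index([70.0, 80.0, 90.0])
print(pdi, is_narrow(pdi))
```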

The zeta potential of the LNP can be used to indicate the electrokinetic potential of the composition. In some embodiments, the zeta potential may describe the surface charge of the LNP. Lipid nanoparticles having a relatively low charge (positive or negative) are generally desirable because more highly charged species may undesirably interact with cells, tissues, and other elements in the body. In some embodiments, the zeta potential of the LNP may be from about -10 mV to about +20 mV, from about -10 mV to about +15 mV, from about -10 mV to about +10 mV, from about -10 mV to about +5 mV, from about -10 mV to about 0 mV, from about -10 mV to about -5 mV, from about -5 mV to about +20 mV, from about -5 mV to about +15 mV, from about -5 mV to about +10 mV, from about -5 mV to about +5 mV, from about -5 mV to about 0 mV, from about 0 mV to about +20 mV, from about 0 mV to about +15 mV, from about 0 mV to about +10 mV, from about 0 mV to about +5 mV, from about +5 mV to about +20 mV, from about +5 mV to about +15 mV, or from about +5 mV to about +10 mV.

The encapsulation efficiency of a protein and/or nucleic acid (e.g., a Gene Writer polypeptide or mRNA encoding the polypeptide) describes the amount of protein and/or nucleic acid that is encapsulated or otherwise associated with an LNP after preparation relative to the initial amount provided. Encapsulation efficiency is desirably high (e.g., near 100%). Encapsulation efficiency may be measured, for example, by comparing the amount of protein or nucleic acid in a solution containing lipid nanoparticles before and after disruption of the lipid nanoparticles with one or more organic solvents or detergents. Anion exchange resins can be used to measure the amount of free protein or nucleic acid (e.g., RNA) in a solution. Fluorescence can be used to measure the amount of free protein and/or nucleic acid (e.g., RNA) in a solution. For the lipid nanoparticles described herein, the encapsulation efficiency of the protein and/or nucleic acid may be at least 50%, e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. In some embodiments, the encapsulation efficiency may be at least 80%. In some embodiments, the encapsulation efficiency may be at least 90%. In some embodiments, the encapsulation efficiency may be at least 95%.
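By way of illustration only, the before-and-after-disruption comparison described above can be sketched in Python as follows. The function name, argument names, and example values are assumptions made for this sketch and are not taken from the present disclosure; the two inputs stand for the amount of free cargo detected with intact particles and the total cargo detected after disruption, in the same arbitrary units.

```python
def encapsulation_efficiency(free_amount: float, total_amount: float) -> float:
    """Return the percentage of cargo associated with the LNP.

    free_amount:  cargo detected outside intact particles (hypothetical units)
    total_amount: cargo detected after particle disruption (same units)
    """
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    if not 0 <= free_amount <= total_amount:
        raise ValueError("free_amount must lie between 0 and total_amount")
    return 100.0 * (total_amount - free_amount) / total_amount


# Illustrative values only: 8 units free, 100 units total after disruption.
print(encapsulation_efficiency(8.0, 100.0))  # 92.0
```

Under these assumed values the formulation would fall within the "at least 90%" embodiment and outside the "at least 95%" embodiment.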

The LNP may optionally comprise one or more coatings. In some embodiments, the LNP may be formulated in a capsule, film, or tablet with a coating. Capsules, films or tablets comprising the compositions described herein may have any useful size, tensile strength, hardness or density.

Additional exemplary lipids, formulations, methods, and LNP characterizations are taught by WO 2020061457, which is incorporated herein by reference in its entirety.

In some embodiments, in vitro or ex vivo cell lipofection is performed using Lipofectamine MessengerMax (Thermo Fisher) or a TransIT-mRNA transfection reagent (Mirus Bio). In certain embodiments, LNPs are formulated using a GenVoy ILM ionizable lipid mixture (Precision NanoSystems). In certain embodiments, LNPs are formulated using 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-KC2-DMA) or dilinoleylmethyl-4-dimethylaminobutyrate (DLin-MC3-DMA or MC3), the formulation and in vivo use of which are taught in Jayaraman et al., Angew Chem Int Ed Engl 51(34): 8529-8533 (2012), which is incorporated herein by reference in its entirety.

LNP formulations optimized for delivery of CRISPR-Cas systems (e.g., Cas9-gRNA RNP, gRNA, Cas9 mRNA) are described in WO 2019067992 and WO 2019067910, both incorporated by reference.

Additional specific LNP formulations useful for delivering nucleic acids are described in US 8158601 and US 8168775, both incorporated by reference, including the formulation used in patisiran, sold under the name ONPATTRO.

Exemplary administrations of the Gene Writer LNP can include about 0.1, 0.25, 0.3, 0.5, 1, 2, 3, 4, 5, 6, 8, 10, or 100mg/kg (RNA). Exemplary administrations of an AAV comprising a nucleic acid encoding one or more components of the system can include an MOI of about 10¹¹, 10¹², 10¹³, or 10¹⁴ vg/kg.

Template RNA component of the Gene Writer™ gene editor system

The Gene Writer system described herein can transcribe RNA sequence templates into host target DNA sites by target-primed reverse transcription. By writing one or more DNA sequences through reverse transcription of the RNA sequence template directly into the host genome, the Gene Writer system can insert the subject sequence into the target genome without introducing exogenous DNA sequences into the host cell (unlike, for example, CRISPR systems), thereby eliminating the exogenous DNA insertion step. Thus, the Gene Writer system provides a platform that uses a custom RNA sequence template containing the subject sequence, e.g., a sequence comprising heterologous gene coding and/or functional information.

To describe the template RNA component of the Gene Writer system, the template RNA can be thought of as comprising modular parts (FIGS. 18-19). The Gene Writer template may include all or some of the illustrated sections, and modules may be combined, rearranged, and/or omitted. The modules shown are not intended to limit the potential elements contained in the templates, and additional components, such as 5 'and 3' end domains, can be easily envisioned to improve template stability.

Table 4. Modules of a typical Gene Writer RNA template.

A = 5' homology arm; B = ribozyme; C = 5' UTR; D = heterologous subject sequence; E = 3' UTR; F = 3' homology arm

In some embodiments, the template RNA encodes a Gene Writer protein in cis with the heterologous subject sequence. Various cis constructs are described, for example, in Kuroki-Kami et al. (2019) Mobile DNA 10:23 (incorporated herein by reference in its entirety), and may be used in combination with any of the embodiments described herein. For example, in some embodiments, the template RNA comprises a heterologous subject sequence, a sequence encoding a Gene Writer protein (e.g., a protein comprising (i) a reverse transcriptase domain and (ii) an endonuclease domain, e.g., as described herein), a 5' untranslated region, and a 3' untranslated region. The components may be included in a variety of orders. In some embodiments, the Gene Writer protein and the heterologous subject sequence are encoded in different orientations (sense versus antisense), for example using the arrangement shown in FIG. 3A of Kuroki-Kami et al., supra. In some embodiments, the Gene Writer protein and the heterologous subject sequence are encoded in the same orientation. In some embodiments, the nucleic acid encoding the polypeptide and the template RNA (or the nucleic acid encoding the template RNA) are covalently linked, e.g., are part of a fusion nucleic acid and/or part of the same transcript. In some embodiments, the fusion nucleic acid comprises RNA or DNA.

In some cases, the nucleic acid encoding the Gene Writer polypeptide may be 5' to the heterologous subject sequence. For example, in some embodiments, the template RNA comprises, from 5 'to 3', a 5 'untranslated region, a sense encoded Gene Writer polypeptide, a sense encoded heterologous subject sequence, and a 3' untranslated region. In some embodiments, the template RNA comprises, from 5 'to 3', a 5 'untranslated region, a sense encoded Gene Writer polypeptide, an antisense encoded heterologous subject sequence, and a 3' untranslated region.
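The two 5'-to-3' arrangements described above can be sketched as a simple concatenation of modules, with an antisense-encoded subject sequence represented by its reverse complement. The following Python sketch is illustrative only: the function names and the short placeholder sequences are assumptions introduced here, not sequences or methods of the present disclosure.

```python
# Complement table for RNA written 5' to 3'.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def assemble_template(utr5: str, polypeptide_orf: str, subject: str,
                      utr3: str, subject_antisense: bool = False) -> str:
    """Concatenate modules 5' to 3'.

    If subject_antisense is True, the subject sequence is placed in the
    antisense orientation, i.e., as its reverse complement.
    """
    if subject_antisense:
        subject = reverse_complement_rna(subject)
    return utr5 + polypeptide_orf + subject + utr3


# Hypothetical placeholder modules, far shorter than real elements.
sense = assemble_template("GGA", "AUGGCCUAA", "AUGAAAUGA", "CCU")
antisense = assemble_template("GGA", "AUGGCCUAA", "AUGAAAUGA", "CCU",
                              subject_antisense=True)
print(sense)      # GGAAUGGCCUAAAUGAAAUGACCU
print(antisense)  # GGAAUGGCCUAAUCAUUUCAUCCU
```

In the antisense arrangement the subject open reading frame is not readable from the template RNA itself and only becomes available for transcription after conversion to double-stranded DNA.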

In some embodiments, the RNA further comprises homology to a DNA target site.

It will be appreciated that when the template RNA is described as comprising an open reading frame or its reverse complement, in some embodiments, the template RNA must first be converted to double stranded DNA (e.g., by reverse transcription), and then the open reading frame can be transcribed and translated.

In certain embodiments, custom RNA sequence templates can be identified, designed, engineered, and constructed to contain sequences that alter or specify host genome function, for example, by introducing heterologous coding regions into the genome; affecting or causing exon structure/alternative splicing; causing disruption of an endogenous gene; causing transcriptional activation of an endogenous gene; causing epigenetic regulation of endogenous DNA; causing up- or down-regulation of an operably linked gene; and so forth. In certain embodiments, the customized RNA sequence templates may be engineered to contain sequences encoding exons and/or transgenes, or sequences providing binding sites for transcription factor activators, repressors, enhancers, and the like, as well as combinations thereof. In other embodiments, the coding sequence may be further customized with, e.g., a splice acceptor site or a poly A tail. In certain embodiments, the RNA sequence may comprise a sequence encoding an RNA sequence template homologous to an RLE transposase, engineered to comprise a heterologous coding sequence, or a combination thereof.

Template RNA may have some homology to the target DNA. In some embodiments, the template RNA has at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 175, 200, or more bases at the 3' end of the RNA that are fully homologous to the target DNA. In some embodiments, the template RNA has at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 175, 180, or 200 or more bases at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% homologous to the target DNA, for example, at the 5' end of the template RNA. In some embodiments, the template RNA has a 3' untranslated region derived from a non-LTR retrotransposon (e.g., a non-LTR retrotransposon described herein). In some embodiments, the template RNA has a 3 'region of at least 10, 15, 20, 25, 30, 40, 50, 60, 80, 100, 120, 140, 160, 180, 200, or more bases that is at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% homologous to a 3' sequence of a non-LTR retrotransposon (e.g., a non-LTR retrotransposon as described herein) (e.g., a non-LTR retrotransposon in table 1, 2, or 3). In some embodiments, the template RNA has a 5' untranslated region derived from a non-LTR retrotransposon (e.g., a non-LTR retrotransposon described herein). In some embodiments, the template RNA has a 5 'region of at least 10, 15, 20, 25, 30, 40, 50, 60, 80, 100, 120, 140, 160, 180, or 200 or more bases that is at least 40%, 50%, 60%, 70%, 80%, 90%, 95% or more homologous to a 5' sequence of a non-LTR retrotransposon (e.g., a non-LTR retrotransposon described herein) (e.g., a non-LTR retrotransposon described in table 2 or 3).

The template RNA component of the Gene Writer genome editing system described herein is generally capable of binding to the Gene Writer genome editing protein of the system. In some embodiments, the template RNA has a 3' region that is capable of binding to the Gene Writer genome editing protein. The binding region, e.g., the 3' region, may be a structured RNA region, e.g., having at least 1, 2, or 3 hairpin loops, which are capable of binding to the Gene Writer genome editing protein of the system.

The template RNA component of the Gene Writer genome editing system described herein is generally capable of binding to the Gene Writer genome editing protein of the system. In some embodiments, the template RNA has a 5' region that is capable of binding to the Gene Writer genome editing protein. The binding region, e.g., the 5' region, may be a structured RNA region, e.g., having at least 1, 2, or 3 hairpin loops, which are capable of binding to the Gene Writer genome editing protein of the system. In some embodiments, the 5' untranslated region comprises a pseudoknot, e.g., a pseudoknot that can bind to the Gene Writer protein.

In some embodiments, the template RNA (e.g., an untranslated region of a hairpin RNA, e.g., a 5' untranslated region) comprises a stem loop sequence. In some embodiments, the template RNA (e.g., an untranslated region of a hairpin RNA, e.g., a 5' untranslated region) comprises a hairpin. In some embodiments, the template RNA (e.g., an untranslated region of a hairpin RNA, e.g., a 5' untranslated region) comprises a helix. In some embodiments, the template RNA (e.g., an untranslated region of a hairpin RNA, e.g., a 5' untranslated region) comprises a pseudoknot. In some embodiments, the template RNA comprises a ribozyme. In some embodiments, the ribozyme is similar to a hepatitis delta virus (HDV) ribozyme, e.g., has a secondary structure similar to that of an HDV ribozyme and/or has one or more activities of an HDV ribozyme, e.g., self-cleaving activity. See, e.g., Eickbush et al., Molecular and Cellular Biology, 2010, 3142-3150.

In some embodiments, the template RNA (e.g., an untranslated region of a hairpin RNA, e.g., a 3' untranslated region) comprises one or more stem loops or helices. Exemplary structures of R2 3' UTRs are shown, for example, in Ruschak et al., "Secondary structure models of the 3' untranslated regions of diverse R2 RNAs," RNA. 2004 June; 10(6): 978-987, e.g., at FIG. 3 therein, and in Eickbush and Eickbush, "R2 and R2/R1 hybrid non-autonomous retrotransposons derived by internal deletions of full-length elements," Mobile DNA (2012) 3:10, e.g., at FIG. 3 therein; these documents are incorporated herein by reference in their entirety.

In some embodiments, a template RNA described herein comprises a sequence capable of binding to a Gene Writer protein described herein. For example, in some embodiments, the template RNA comprises an MS2 RNA sequence capable of binding to an MS2 coat protein sequence in a Gene Writer protein. In some embodiments, the template RNA comprises an RNA sequence capable of binding to a B-box sequence. In some embodiments, the template RNA comprises an RNA sequence (e.g., a crRNA sequence and/or a tracrRNA sequence) capable of binding to a dCas sequence in a Gene Writer protein. In some embodiments, the template RNA is linked (e.g., covalently) to a non-RNA UTR, such as a protein or small molecule, in addition to or instead of the UTR.

In some embodiments, the template RNA has a poly A tail at the 3' end. In some embodiments, the template RNA does not have a poly A tail at the 3' end.

In some embodiments, the template RNA has a 5' region of at least 10, 15, 20, 25, 30, 40, 50, 60, 80, 100, 120, 140, 160, 180, 200, or more bases that is at least 40%, 50%, 60%, 70%, 80%, 90%, 95% or more homologous to a 5' sequence of a non-LTR retrotransposon (e.g., a non-LTR retrotransposon described herein).

The template RNA of the system typically comprises a subject sequence for insertion into the target DNA. The subject sequence may be coding or non-coding.

In some embodiments, the systems or methods described herein comprise a single template RNA. In some embodiments, the systems or methods described herein comprise a plurality of template RNAs. In some embodiments, the DNA encoding the template is circularized by the activity of an enzyme, such as a recombinase, to increase activity, as described in Yant et al., Nature Biotechnology 20:990-1005, 2002.

In some embodiments, the heterologous subject sequence may comprise an open reading frame. In some embodiments, the template RNA has a Kozak sequence. In some embodiments, the template RNA has an internal ribosome entry site. In some embodiments, the template RNA has a self-cleaving peptide, e.g., a T2A or P2A site. In some embodiments, the template RNA has an initiation codon. In some embodiments, the template RNA has a splice acceptor site. In some embodiments, the template RNA has a splice donor site. Exemplary splice acceptor and splice donor sites are described in WO 2016044416, which is incorporated herein by reference in its entirety. Exemplary splice acceptor site sequences are known to those skilled in the art and include, by way of example only, CTGACCCTTCTCTCTCTCCCCCAGAG (SEQ ID NO: 1620) (from the human HBB gene) and TTTCTCTCCCACAAG (SEQ ID NO: 1621) (from the human immunoglobulin-gamma gene). In some embodiments, the template RNA has a microRNA binding site downstream of the stop codon. In some embodiments, the template RNA has a poly A tail downstream of the stop codon of the open reading frame. In some embodiments, the template RNA comprises one or more exons. In some embodiments, the template RNA comprises one or more introns. In some embodiments, the template RNA comprises a eukaryotic transcription terminator. In some embodiments, the template RNA comprises a translation enhancing element. In some embodiments, the RNA comprises the R region of human T cell leukemia virus (HTLV-1). In some embodiments, the RNA comprises a post-transcriptional regulatory element that enhances nuclear export, such as a post-transcriptional regulatory element of hepatitis B virus (HPRE) or woodchuck hepatitis virus (WPRE). In some embodiments, in the template RNA, the heterologous subject sequence encodes a polypeptide and is encoded in an antisense orientation relative to the 5' and 3' UTRs. In some embodiments, in the template RNA, the heterologous subject sequence encodes a polypeptide and is encoded in a sense orientation relative to the 5' and 3' UTRs.

In some embodiments, a nucleic acid described herein (e.g., a template RNA or DNA encoding a template RNA) comprises a microRNA binding site. In some embodiments, the microRNA binding site is used to increase target cell specificity of the Gene Writer system. For example, the microRNA binding site can be selected on the basis that it is recognized by a miRNA that is present in a non-target cell type but is absent from the target cell type (or present at a reduced level relative to the non-target cell). Thus, when the template RNA is present in a non-target cell, it will be bound by the miRNA, whereas when the template RNA is present in a target cell, it will not be bound by the miRNA (or will be bound at a reduced level relative to the non-target cell). While not wishing to be bound by theory, binding of the miRNA to the template RNA may interfere with insertion of the heterologous subject sequence into the genome. Thus, the heterologous subject sequence will insert into the genome of the target cell more efficiently than into the genome of the non-target cell. A system having a microRNA binding site in the template RNA (or DNA encoding it) can also be used in combination with a nucleic acid encoding a Gene Writer polypeptide, wherein expression of the Gene Writer polypeptide is modulated by a second microRNA binding site, e.g., as described herein, e.g., in the section entitled "Polypeptide component of the Gene Writer gene editor system". In some embodiments, for example for liver indications, the miRNA is selected from Table 4 of WO 2020014209.
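As a minimal sketch of how a candidate microRNA binding site might be located in a template RNA, the following Python example searches for the motif complementary to the miRNA seed (nucleotides 2-8 of the miRNA). Seed complementarity alone is a simplification of miRNA target recognition; the function names and both example sequences are assumptions introduced for illustration and are not taken from the present disclosure or from WO 2020014209.

```python
# Complement table for RNA written 5' to 3'.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_site(mirna: str) -> str:
    """Return the template motif complementary to miRNA positions 2-8."""
    seed = mirna.upper()[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(template: str, mirna: str) -> list[int]:
    """Return 0-based start positions of seed-match motifs in the template."""
    site = seed_match_site(mirna)
    template = template.upper()
    return [i for i in range(len(template) - len(site) + 1)
            if template.startswith(site, i)]


# Illustrative sequences only.
mirna = "UGGAGUGUGACAAUGGUGUUUG"
template_rna = "GGCAACACUCCAUUU"
print(seed_match_site(mirna))                  # ACACUCC
print(find_seed_matches(template_rna, mirna))  # [4]
```

A template carrying such a motif would be expected, under the assumptions above, to be bound in cells expressing the miRNA and left unbound in cells lacking it.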

In some embodiments, the subject sequence may contain non-coding sequences. For example, the template RNA may comprise regulatory elements, e.g., promoter or enhancer sequences or miRNA binding sites. In some embodiments, integration of the subject sequence at the target site will result in up-regulation of an endogenous gene. In some embodiments, integration of the subject sequence at the target site will result in down-regulation of an endogenous gene. In some embodiments, the template RNA comprises a tissue-specific promoter or enhancer, each of which may be unidirectional or bidirectional. In some embodiments, the promoter is an RNA polymerase I promoter, an RNA polymerase II promoter, or an RNA polymerase III promoter. In some embodiments, the promoter comprises a TATA element. In some embodiments, the promoter comprises a B recognition element. In some embodiments, the promoter has one or more binding sites for a transcription factor. In some embodiments, the non-coding sequence is transcribed in an antisense orientation relative to the 5' and 3' UTRs. In some embodiments, the non-coding sequence is transcribed in the sense orientation relative to the 5' and 3' UTRs.

In some embodiments, a nucleic acid described herein (e.g., a template RNA or DNA encoding a template RNA) comprises a promoter sequence, such as a tissue-specific promoter sequence. In some embodiments, a tissue-specific promoter is used to increase target cell specificity of the Gene Writer system. For example, a promoter may be selected because it is active in a target cell type but inactive (or active at a lower level) in a non-target cell type. Thus, even if the promoter is integrated into the genome of a non-target cell, it does not drive expression of the integrated gene (or drives only low levels of expression). As described herein, systems having tissue-specific promoter sequences in the template RNA can also be used in combination with microRNA binding sites (e.g., in the template RNA or in the nucleic acid encoding the Gene Writer protein). Systems having a tissue-specific promoter sequence in the template RNA may also be used in combination with DNA encoding a Gene Writer polypeptide driven by a tissue-specific promoter, e.g., to obtain higher levels of Gene Writer protein in target cells than in non-target cells. In some embodiments, for example for liver indications, the tissue-specific promoter is selected from Table 3 of WO 2020014209 (which is incorporated herein by reference).

In some embodiments, the Gene Writer system, e.g., DNA encoding a Gene Writer polypeptide, DNA encoding a template RNA, or DNA or RNA encoding a heterologous subject sequence, is designed such that one or more elements are operably linked to a tissue-specific promoter, e.g., a promoter active in T cells. In further embodiments, the T cell active promoter is inactive in other cell types, e.g., B cells or NK cells. In some embodiments, the T cell active promoter is derived from a promoter of a gene encoding a T cell receptor component (e.g., TRAC, TRBC, TRGC, TRDC). In some embodiments, the T cell active promoter is derived from a promoter of a gene encoding a component of a T cell specific cluster of differentiation protein (e.g., CD3, such as CD3D, CD3E, CD3G, CD3Z). In some embodiments, the T cell specific promoter in the Gene Writer system is found by comparing publicly available gene expression data across cell types and selecting a promoter from genes with enhanced expression in T cells. In some embodiments, promoters may be selected according to the desired breadth of expression, such as promoters active only in T cells, promoters active only in NK cells, or promoters active in both T cells and NK cells.
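The comparison of expression data across cell types described above could, for example, be sketched as a simple fold-enrichment filter. In the Python sketch below, the gene names, expression values, and the five-fold threshold are all hypothetical and chosen only for illustration; no particular public data set or selection criterion is implied by the present disclosure.

```python
def enriched_genes(expression: dict, target: str, min_fold: float = 5.0) -> list:
    """Return genes whose expression in `target` is at least `min_fold`
    times the highest expression seen in any other cell type."""
    selected = []
    for gene, by_cell in expression.items():
        others = [v for cell, v in by_cell.items() if cell != target]
        if target not in by_cell or not others:
            continue  # cannot compare without both target and background
        background = max(max(others), 1e-9)  # avoid a zero threshold
        if by_cell[target] >= min_fold * background:
            selected.append(gene)
    return sorted(selected)


# Hypothetical expression values (arbitrary units) per cell type.
data = {
    "GENE_A": {"T": 120.0, "B": 2.0, "NK": 10.0},
    "GENE_B": {"T": 50.0, "B": 45.0, "NK": 40.0},
    "GENE_C": {"T": 80.0, "B": 1.0, "NK": 70.0},
}
print(enriched_genes(data, "T"))  # ['GENE_A']
```

To select promoters active in both T cells and NK cells, the same filter could be applied with both cell types excluded from the background set; that variant is not shown here.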

In some embodiments, the template RNA comprises a microRNA sequence, an siRNA sequence, a guide RNA sequence, or a piwi RNA sequence.

In some embodiments, the template RNA comprises sites that coordinate epigenetic modifications. In some embodiments, the template RNA comprises elements that inhibit, e.g., prevent, epigenetic silencing. In some embodiments, the template RNA comprises a chromatin insulator. For example, the template RNA contains CTCF sites or sites targeted for DNA methylation.

To facilitate higher levels or more stable gene expression, the template RNA may include features that prevent or inhibit gene silencing. In some embodiments, these features prevent or inhibit DNA methylation. In some embodiments, these features promote DNA demethylation. In some embodiments, these features prevent or inhibit histone deacetylation. In some embodiments, these features prevent or inhibit histone methylation. In some embodiments, these features promote histone acetylation. In some embodiments, these features promote histone demethylation. In some embodiments, various features may be incorporated into the template RNA to facilitate one or more of these modifications. CpG dinucleotides are methylated by the host methyltransferase. In some embodiments, the template RNA lacks CpG dinucleotides, e.g., it does not contain CpG nucleotides or contains a reduced number of CpG dinucleotides compared to the corresponding unaltered sequence. In some embodiments, the promoter that drives transgene expression from the integrated DNA lacks CpG dinucleotides.
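As an illustration of comparing CpG content between an altered sequence and the corresponding unaltered sequence, the following Python sketch counts CG dinucleotides read 5' to 3'. The function names and the two short example sequences are assumptions introduced here; the example relies only on the standard genetic code, in which GCG and GCC both encode alanine and CGA and AGA both encode arginine.

```python
def count_cpg(seq: str) -> int:
    """Count CG dinucleotides (read 5' to 3') in a DNA or RNA sequence."""
    return seq.upper().count("CG")


def has_reduced_cpg(unaltered: str, altered: str) -> bool:
    """Return True if the altered sequence has fewer CpG dinucleotides."""
    return count_cpg(altered) < count_cpg(unaltered)


# Hypothetical example: ATG GCG CGA TAA recoded as ATG GCC AGA TAA.
unaltered = "ATGGCGCGATAA"
altered = "ATGGCCAGATAA"
print(count_cpg(unaltered))                 # 2
print(count_cpg(altered))                   # 0
print(has_reduced_cpg(unaltered, altered))  # True
```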

In some embodiments, the template RNA comprises a gene expression unit consisting of at least one regulatory region operably linked to an effector sequence. An effector sequence may be a sequence transcribed into RNA (e.g., a coding sequence or a non-coding sequence, e.g., a sequence encoding a microRNA).

In some embodiments, the subject sequence of the template RNA is inserted into an endogenous intron of the target genome. In some embodiments, the subject sequence of the template RNA is inserted into the target genome, thereby acting as a new exon. In some embodiments, inserting the subject sequence into the target genome results in replacement of the native exon or skipping of the native exon.

In some embodiments, the subject sequence of the template RNA is inserted into a genomic safe harbor site of the target genome, e.g., the AAVS1, CCR5, ROSA26, or albumin locus. In some embodiments, a CAR is integrated into the T cell receptor alpha constant (TRAC) locus using a Gene Writer (Eyquem et al., Nature 543, 113-117 (2017)). In some embodiments, a CAR is integrated into the T cell receptor beta constant (TRBC) locus using a Gene Writer. Many other safe harbors have been identified by computational methods (Pellenz et al., Hum Gen Ther 30, 814-828 (2019)) and can be used for Gene Writer-mediated integration. In some embodiments, the subject sequence of the template RNA is added to an intergenic or intragenic region of the genome. In some embodiments, the subject sequence of the template RNA is added within 0.1kb, 0.25kb, 0.5kb, 0.75kb, 1kb, 2kb, 3kb, 4kb, 5kb, 7.5kb, 10kb, 15kb, 20kb, 25kb, 50kb, 75kb, or 100kb 5' or 3' of an endogenous active gene of the genome. In some embodiments, the subject sequence of the template RNA is added within 0.1kb, 0.25kb, 0.5kb, 0.75kb, 1kb, 2kb, 3kb, 4kb, 5kb, 7.5kb, 10kb, 15kb, 20kb, 25kb, 50kb, 75kb, or 100kb 5' or 3' of an endogenous promoter or enhancer of the genome. In some embodiments, the subject sequence of the template RNA may be, for example, between 50 and 50,000 base pairs (e.g., between 50 and 40,000bp, between 500 and 30,000bp, between 500 and 20,000bp, between 100 and 15,000bp, between 500 and 10,000bp, or between 50 and 5,000bp). In some embodiments, the heterologous subject sequence is less than 1,000, 1,300, 1,500, 2,000, 3,000, 4,000, 5,000, or 7,500 nucleotides in length.
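The "within N kb" relationships recited above amount to a distance check between an insertion position and a genomic interval. A minimal Python sketch follows, assuming 1-based inclusive coordinates on a single chromosome; the function names and the coordinates used are hypothetical and do not correspond to any locus named in the present disclosure.

```python
def distance_to_feature(pos: int, start: int, end: int) -> int:
    """Distance in bp from `pos` to the interval [start, end]; 0 if inside."""
    if start > end:
        raise ValueError("start must not exceed end")
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def within_kb(pos: int, start: int, end: int, kb: float) -> bool:
    """Return True if `pos` lies within `kb` kilobases of the interval."""
    return distance_to_feature(pos, start, end) <= kb * 1000


# Hypothetical gene spanning positions 2000-3000; insertion at position 1500.
print(distance_to_feature(1500, 2000, 3000))  # 500
print(within_kb(1500, 2000, 3000, 0.5))       # True
print(within_kb(1500, 2000, 3000, 0.25))      # False
```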

In some embodiments, the genomic safe harbor site is a Natural Harbor™ site. In some embodiments, the Natural Harbor™ site is ribosomal DNA (rDNA). In some embodiments, the Natural Harbor™ site is 5S rDNA, 18S rDNA, 5.8S rDNA, or 28S rDNA. In some embodiments, the Natural Harbor™ site is the Mutsu site in 5S rDNA. In some embodiments, the Natural Harbor™ site is an R2 site, an R5 site, an R6 site, an R4 site, an R1 site, an R9 site, or an RT site in 28S rDNA. In some embodiments, the Natural Harbor™ site is the R8 site or the R7 site in 18S rDNA. In some embodiments, the Natural Harbor™ site is DNA encoding transfer RNA (tRNA). In some embodiments, the Natural Harbor™ site is DNA encoding tRNA-Asp or tRNA-Glu. In some embodiments, the Natural Harbor™ site is DNA encoding a spliceosomal RNA. In some embodiments, the Natural Harbor™ site is DNA encoding a small nuclear RNA (snRNA), such as U2 snRNA.

Thus, in some aspects, the present disclosure provides methods of inserting a heterologous subject sequence into a Natural Harbor™ site. In some embodiments, the method comprises using the Gene Writer system described herein, e.g., using a polypeptide of any of Tables 1-3 or a polypeptide having sequence similarity thereto, e.g., at least 80%, 85%, 90%, or 95% identity thereto. In some embodiments, the method comprises using an enzyme, such as a retrotransposase, to insert the heterologous subject sequence into the Natural Harbor™ site. In some aspects, the disclosure provides a host human cell comprising a heterologous subject sequence (e.g., a sequence encoding a therapeutic polypeptide) located at a Natural Harbor™ site in the genome of the cell. In some embodiments, the Natural Harbor™ site is a site described in Table 5 below. In some embodiments, the heterologous subject sequence is inserted within 20, 50, 100, 150, 200, 250, 500, or 1000 base pairs of a sequence shown in Table 5. In some embodiments, the heterologous subject sequence is inserted within 0.1kb, 0.25kb, 0.5kb, 0.75kb, 1kb, 2kb, 3kb, 4kb, 5kb, 7.5kb, 10kb, 15kb, 20kb, 25kb, 50kb, 75kb, or 100kb of a sequence shown in Table 5. In some embodiments, the heterologous subject sequence is inserted into a site having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to a sequence shown in Table 5. In some embodiments, the heterologous subject sequence is inserted within 20, 50, 100, 150, 200, 250, 500, or 1000 base pairs, or within 0.1kb, 0.25kb, 0.5kb, 0.75kb, 1kb, 2kb, 3kb, 4kb, 5kb, 7.5kb, 10kb, 15kb, 20kb, 25kb, 50kb, 75kb, or 100kb, of a site having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to a sequence shown in Table 5. In some embodiments, the heterologous subject sequence is inserted within a gene shown in column 5 of Table 5, or within 20, 50, 100, 150, 200, 250, 500, or 1000 base pairs, or within 0.1kb, 0.25kb, 0.5kb, 0.75kb, 1kb, 2kb, 3kb, 4kb, 5kb, 7.5kb, 10kb, 15kb, 20kb, 25kb, 50kb, 75kb, or 100kb, of the gene.

Table 5. Natural Harbor™ sites. Column 1 indicates a retrotransposon that inserts into the Natural Harbor™ site. Column 2 indicates the gene at the Natural Harbor™ site. Columns 3 and 4 show exemplary human genomic sequences (e.g., 250 bp) 5' and 3' of the insertion site. Columns 5 and 6 list example gene symbols and corresponding gene IDs.

Table 5: Exemplary Natural Harbor™ sites

In some embodiments, the systems or methods described herein result in insertion of a heterologous sequence into a target site in the human genome. In some embodiments, the target site in the human genome has sequence similarity to the corresponding target site of the corresponding wild-type retrotransposase (e.g., the retrotransposase from which the Gene Writer is derived) in the genome of the organism to which it is native. For example, in some embodiments, the identity between 40 nucleotides of the human genomic sequence centered at the insertion site and 40 nucleotides of the genomic sequence of the native organism centered at the insertion site is less than 99.5%, 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75%, 70%, 60%, or 50%, or is 50%-60%, 60%-70%, 70%-80%, 80%-90%, or 90%-100%. In some embodiments, the identity between 100 nucleotides of the human genomic sequence centered at the insertion site and 100 nucleotides of the genomic sequence of the native organism centered at the insertion site is less than 99.5%, 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75%, 70%, 60%, or 50%, or is 50%-60%, 60%-70%, 70%-80%, 80%-90%, or 90%-100%. In some embodiments, the identity between 500 nucleotides of the human genomic sequence centered at the insertion site and 500 nucleotides of the genomic sequence of the native organism centered at the insertion site is less than 99.5%, 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75%, 70%, 60%, or 50%, or is 50%-60%, 60%-70%, 70%-80%, 80%-90%, or 90%-100%.
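The window-based identity comparison described above can be illustrated with the following Python sketch, which extracts equal-width windows centered on an insertion site in two sequences and reports their ungapped percent identity. The ungapped comparison is an assumption of this sketch (an alignment-based measure could be used instead), and the function names, short sequences, and window width are hypothetical, chosen only to keep the example small.

```python
def centered_window(seq: str, site: int, width: int) -> str:
    """Return `width` nucleotides centered on an insertion site.

    `site` is the 0-based index of the first base 3' of the insertion
    point, so the window spans width // 2 bases on each side.
    """
    half = width // 2
    if site - half < 0 or site + half > len(seq):
        raise ValueError("window extends past the end of the sequence")
    return seq[site - half: site + half].upper()


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical 10-nt sequences and a 4-nt window, for illustration only.
human = "ACGTACGTAC"
native = "ACGTTCGTAC"
w_human = centered_window(human, 5, 4)    # "TACG"
w_native = centered_window(native, 5, 4)  # "TTCG"
print(percent_identity(w_human, w_native))  # 75.0
```

For the embodiments above, the same calculation would be applied with window widths of 40, 100, or 500 nucleotides.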

Additional template functionality

In some embodiments, the template (e.g., template RNA) comprises certain structural features, e.g., structural features determined in silico. In embodiments, the template RNA is predicted to have a minimum energy structure of -280 to -480 kcal/mol (e.g., -280 to -300, -300 to -350, -350 to -400, -400 to -450, or -450 to -480 kcal/mol), e.g., as measured by RNAstructure, e.g., as described in Turner and Mathews, Nucleic Acids Res 38:D280-282 (2009) (incorporated herein by reference in its entirety).

In some embodiments, the template (e.g., template RNA) comprises certain structural features, e.g., structural features determined in vitro. In embodiments, the template RNA is sequence optimized, e.g., to reduce secondary structure, as determined in vitro, e.g., by SHAPE-MaP (e.g., as described in Siegfried et al., Nat Methods 11:959-965 (2014), which is incorporated herein by reference in its entirety). In some embodiments, the template (e.g., template RNA) comprises certain structural features, e.g., structural features determined in cells. In embodiments, the template RNA is sequence optimized, e.g., to reduce secondary structure as measured in cells, e.g., by DMS-MaPseq (e.g., as described in Zubradt et al., Nat Methods 14:75-82 (2017), which is incorporated herein by reference in its entirety).

Additional functional features of Gene Writers™

In some cases, the Gene Writer as described herein may be characterized by one or more functional measurements or features. In some embodiments, the DNA binding domain has one or more of the following functional characteristics. In some embodiments, the RNA binding domain has one or more of the following functional characteristics. In some embodiments, the endonuclease domain has one or more of the following functional features. In some embodiments, the reverse transcriptase domain has one or more of the following functional characteristics. In some embodiments, the template (e.g., template RNA) has one or more of the following functional features. In some embodiments, the target site bound by the Gene Writer has one or more of the following functional characteristics.

Gene Writer polypeptides

DNA binding domain

In some embodiments, the DNA binding domain is capable of binding a target sequence (e.g., a dsDNA target sequence) with greater affinity than a reference DNA binding domain. In some embodiments, the reference DNA binding domain is the DNA binding domain of R2_BM from Bombyx mori. In some embodiments, the DNA binding domain is capable of binding a target sequence (e.g., a dsDNA target sequence) with an affinity between 100 pM-10 nM (e.g., between 100 pM-1 nM or 1 nM-10 nM).

In some embodiments, the affinity of a DNA binding domain for its target sequence (e.g., dsDNA target sequence) is measured in vitro, e.g., by thermophoresis, e.g., as described in Asmari et al. Methods 146:107-119 (2018) (incorporated herein by reference in its entirety).

In embodiments, the DNA binding domain is capable of binding its target sequence (e.g., dsDNA target sequence), e.g., with an affinity between 100 pM-10 nM (e.g., between 100 pM-1 nM or 1 nM-10 nM), in the presence of a molar excess (e.g., about 100-fold molar excess) of scrambled-sequence competitor dsDNA.

In some embodiments, the DNA binding domain is found associated with its target sequence (e.g., dsDNA target sequence) more frequently than with any other sequence in the genome of a target cell (e.g., a human target cell), e.g., as measured by ChIP-seq (e.g., in HEK293T cells), e.g., as described in He and Pu (2010) Curr. Protoc. Mol. Biol., Chapter 21 (incorporated herein by reference in its entirety). In some embodiments, the DNA binding domain is found associated with its target sequence (e.g., dsDNA target sequence) at least about 5-fold or 10-fold more frequently than with any other sequence in the genome of the target cell, e.g., as measured by ChIP-seq (e.g., in HEK293T cells), e.g., as described in He and Pu (2010), supra.

In some embodiments, the Gene Writer polypeptide comprises a modification to a DNA binding domain, e.g., relative to a wild-type polypeptide. In some embodiments, the DNA binding domain comprises an addition, deletion, substitution, or modification to the amino acid sequence of the original DNA binding domain. In some embodiments, the DNA binding domain is modified to include a heterologous functional domain that specifically binds to a target nucleic acid (e.g., DNA) sequence of interest. In some embodiments, the functional domain replaces at least a portion (e.g., all) of the prior DNA binding domain of the polypeptide. In some embodiments, the functional domain comprises a zinc finger (e.g., a zinc finger that specifically binds to a target nucleic acid (e.g., DNA) sequence of interest). In some embodiments, the functional domain comprises a Cas domain (e.g., a Cas domain that specifically binds to a target nucleic acid (e.g., DNA) sequence of interest). In embodiments, the Cas domain comprises Cas9 or a mutant or variant thereof (e.g., as described herein). In embodiments, the Cas domain is associated with a guide RNA (gRNA), e.g., as described herein. In embodiments, the Cas domain is directed to a target nucleic acid (e.g., DNA) sequence by the gRNA. In embodiments, the Cas domain is encoded in the same nucleic acid (e.g., RNA) molecule as the gRNA. In embodiments, the Cas domain is encoded in a different nucleic acid (e.g., RNA) molecule than the gRNA.

RNA binding domains

In some embodiments, the RNA binding domain is capable of binding to the template RNA with greater affinity than a reference RNA binding domain. In some embodiments, the reference RNA binding domain is the RNA binding domain of R2_BM from Bombyx mori. In some embodiments, the RNA binding domain is capable of binding to the template RNA with an affinity between 100 pM-10 nM (e.g., between 100 pM-1 nM or 1 nM-10 nM). In some embodiments, the affinity of an RNA binding domain for its template RNA is measured in vitro, e.g., by thermophoresis, e.g., as described in Asmari et al. Methods 146:107-119 (2018) (incorporated herein by reference in its entirety). In some embodiments, the affinity of an RNA binding domain for its template RNA is measured in cells (e.g., by FRET or CLIP-Seq).

In some embodiments, the RNA binding domain is associated with the template RNA in vitro at a frequency at least about 5-fold or 10-fold higher than with a scrambled RNA. In some embodiments, the frequency of association between the RNA binding domain and the template RNA or the scrambled RNA is measured by CLIP-seq, e.g., as described in Lin and Miles (2019) Nucleic Acids Res 47(11):5490-5501 (incorporated herein by reference in its entirety). In some embodiments, the RNA binding domain is associated with the template RNA in cells (e.g., in HEK293T cells) at a frequency at least about 5-fold or 10-fold higher than with a scrambled RNA. In some embodiments, the frequency of association between the RNA binding domain and the template RNA or the scrambled RNA is measured by CLIP-seq, e.g., as described in Lin and Miles (2019), supra.
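The 5-fold or 10-fold comparison between the template RNA and a control RNA can be illustrated with a simple read-count ratio. The sketch below is an assumption-laden simplification: it takes CLIP-seq read counts that have already been assigned to each RNA, normalizes each by a matched input (pre-enrichment) count, and adds a pseudocount to avoid division by zero. The function name `fold_enrichment`, its parameters, and the example counts are hypothetical and do not come from the cited CLIP-seq protocol.

```python
def fold_enrichment(target_clip: int, control_clip: int,
                    target_input: int, control_input: int,
                    pseudocount: float = 1.0) -> float:
    """Input-normalized enrichment of a target RNA over a control RNA.

    Each RNA's CLIP read count is divided by its input read count, and the
    two ratios are then compared. The pseudocount is an illustrative choice.
    """
    if min(target_clip, control_clip, target_input, control_input) < 0:
        raise ValueError("read counts must be non-negative")
    target_ratio = (target_clip + pseudocount) / (target_input + pseudocount)
    control_ratio = (control_clip + pseudocount) / (control_input + pseudocount)
    return target_ratio / control_ratio


# Hypothetical counts: both RNAs are equally abundant in the input library.
fold = fold_enrichment(target_clip=1000, control_clip=50,
                       target_input=100, control_input=100)
meets_5_fold = fold >= 5.0
meets_10_fold = fold >= 10.0
```

The same ratio could be applied to ChIP-seq counts for a DNA target versus a control sequence; real analyses typically also account for library size and replicate variability, which this sketch omits.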

Endonuclease domains

In some embodiments, the endonuclease domain is associated with a target dsDNA in vitro at a frequency at least about 5-fold or 10-fold higher than with a scrambled dsDNA. In some embodiments, the endonuclease domain is associated with the target dsDNA in cells (e.g., in HEK293T cells) at a frequency at least about 5-fold or 10-fold higher than with a scrambled dsDNA. In some embodiments, the frequency of association between the endonuclease domain and the target DNA or the scrambled DNA is measured by ChIP-seq, e.g., as described in He and Pu (2010) Curr. Protoc. Mol. Biol., Chapter 21 (incorporated herein by reference in its entirety).

In some embodiments, the endonuclease domain can catalyze the formation of a nick at a target sequence, e.g., at a level at least about 5-fold or 10-fold higher than at a non-target sequence (e.g., relative to any other genomic sequence in the genome of the target cell). In some embodiments, the level of nick formation is determined using NickSeq, e.g., as described in Elacqua et al. (2019) bioRxiv doi.org/10.1101/867937 (incorporated herein by reference in its entirety).

In some embodiments, the endonuclease domain is capable of nicking DNA in vitro. In embodiments, the nick results in an exposed base. In embodiments, the exposed base can be detected using a nuclease sensitivity assay, e.g., as described in Chaudhry and Weinfeld (1995) Nucleic Acids Res 23(19):3805-3809 (incorporated herein by reference in its entirety). In embodiments, the level of exposed bases (e.g., as detected by the nuclease sensitivity assay) is increased by at least 10%, 50%, or more relative to a reference endonuclease domain. In some embodiments, the reference endonuclease domain is the endonuclease domain of R2_BM from Bombyx mori.

In some embodiments, the endonuclease domain is capable of nicking DNA in a cell. In embodiments, the endonuclease domain is capable of nicking DNA in a HEK293T cell. In embodiments, an unrepaired nick that undergoes replication in the absence of Rad51 results in an increased NHEJ rate at the nick site, which can be detected, e.g., by using a Rad51 inhibition assay, e.g., as described in Bothmer et al. (2017) Nat Commun 8:13905 (incorporated herein by reference in its entirety). In embodiments, the NHEJ rate is increased above 0-5%. In embodiments, the NHEJ rate is increased to 20%-70% (e.g., 30%-60% or 40%-50%), e.g., upon Rad51 inhibition.

In some embodiments, the endonuclease domain releases the target after cleavage. In some embodiments, release of the target is indicated indirectly by assessing multiple turnovers of the enzyme, e.g., as described in Yourik et al. RNA 25(1):35-44 (2019) (incorporated herein by reference in its entirety) and as shown in FIG. 2. In some embodiments, the k_exp of the endonuclease domain is 1×10⁻³-1×10⁻⁵ min⁻¹, as measured by such methods.

In some embodiments, the endonuclease domain has a catalytic efficiency (k_cat/K_m) greater than about 1×10⁸ s⁻¹ M⁻¹ in vitro. In embodiments, the endonuclease domain has a catalytic efficiency greater than about 1×10⁵, 1×10⁶, 1×10⁷, or 1×10⁸ s⁻¹ M⁻¹ in vitro. In embodiments, catalytic efficiency is determined as described in Chen et al. (2018) Science 360(6387):436-439 (incorporated herein by reference in its entirety). In some embodiments, the endonuclease domain has a catalytic efficiency (k_cat/K_m) greater than about 1×10⁸ s⁻¹ M⁻¹ in cells. In embodiments, the endonuclease domain has a catalytic efficiency greater than about 1×10⁵, 1×10⁶, 1×10⁷, or 1×10⁸ s⁻¹ M⁻¹ in cells.
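As a worked illustration of the catalytic efficiency thresholds above, the following sketch divides a turnover number (k_cat, in s⁻¹) by a Michaelis constant (K_m, in M) and reports which of the recited thresholds are exceeded. The function names and the example kinetic constants are hypothetical; the sketch does not reproduce the measurement method of the cited reference.

```python
# Thresholds recited in the text, in units of s^-1 M^-1.
THRESHOLDS = (1e5, 1e6, 1e7, 1e8)


def catalytic_efficiency(k_cat_per_s: float, k_m_molar: float) -> float:
    """Return k_cat / K_m in s^-1 M^-1."""
    if k_cat_per_s < 0 or k_m_molar <= 0:
        raise ValueError("k_cat must be non-negative and K_m must be positive")
    return k_cat_per_s / k_m_molar


def thresholds_exceeded(efficiency: float) -> list:
    """List the recited thresholds that `efficiency` is greater than."""
    return [t for t in THRESHOLDS if efficiency > t]


# Hypothetical enzyme: k_cat = 2 s^-1, K_m = 10 nM.
efficiency = catalytic_efficiency(k_cat_per_s=2.0, k_m_molar=10e-9)
exceeded = thresholds_exceeded(efficiency)
```

With these illustrative constants the efficiency is about 2×10⁸ s⁻¹ M⁻¹, which exceeds all four recited thresholds.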

In some embodiments, the Gene Writer polypeptide comprises a modification to an endonuclease domain, e.g., relative to a wild-type polypeptide. In some embodiments, the endonuclease domain comprises an addition, deletion, substitution, or modification to the amino acid sequence of the original endonuclease domain. In some embodiments, the endonuclease domain is modified to include a heterologous functional domain that specifically binds to and/or induces endonuclease cleavage of a target nucleic acid (e.g., DNA) sequence of interest. In some embodiments, the endonuclease domain comprises a zinc finger. In some embodiments, the endonuclease domain comprises a Cas domain (e.g., Cas9 or a mutant or variant thereof). In embodiments, the endonuclease domain comprising the Cas domain is associated with a guide RNA (gRNA), e.g., as described herein. In some embodiments, the endonuclease domain is modified to include a functional domain that does not target a specific target nucleic acid (e.g., DNA) sequence. In embodiments, the endonuclease domain comprises a FokI domain.

Reverse transcriptase domain

In some embodiments, the reverse transcriptase domain has a lower probability of premature termination (P_off) relative to a reference reverse transcriptase domain. In some embodiments, the reference reverse transcriptase domain is the reverse transcriptase domain of R2_BM from Bombyx mori or a viral reverse transcriptase domain, e.g., the RT domain from M-MLV.

In some embodiments, the reverse transcriptase domain has an in vitro premature termination rate (P_off) of less than about 5×10⁻³/nt, 5×10⁻⁴/nt, or 5×10⁻⁶/nt, e.g., as measured on a 1094 nt RNA. In embodiments, the in vitro premature termination rate is determined as described in Bibillo and Eickbush (2002) J Biol Chem 277(38):34836-34845 (incorporated herein by reference in its entirety).
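To give a sense of scale for the per-nucleotide termination rates above, the following sketch estimates the fraction of reverse transcription events that would reach the end of a 1094 nt RNA. It rests on a simplifying assumption that is not stated in the source: termination is treated as an independent event with the same probability P_off at every nucleotide, so the full-length fraction is (1 - P_off)^L. The function name is hypothetical.

```python
def full_length_fraction(p_off: float, length_nt: int) -> float:
    """Fraction of products reaching full length under a constant per-nt P_off.

    Assumption (illustrative only): each nucleotide carries the same,
    independent probability of premature termination.
    """
    if not 0.0 <= p_off <= 1.0:
        raise ValueError("p_off must be a probability between 0 and 1")
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    return (1.0 - p_off) ** length_nt


# Rates recited in the text, evaluated on the 1094 nt RNA mentioned there.
fractions = {p: full_length_fraction(p, 1094) for p in (5e-3, 5e-4, 5e-6)}
```

Under this model, the three recited rates correspond to roughly 0.4%, 58%, and 99.5% full-length product on a 1094 nt template, which shows how strongly processivity on long templates depends on P_off.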

In some embodiments, the reverse transcriptase domain is capable of completing at least about 30% or 50% of integrations in cells. The percentage of complete integrations can be measured by dividing the number of substantially full-length integration events (e.g., genomic sites that comprise at least 98% of the expected integrated sequence) by the total number of integration events (including substantially full-length and partial events) in a population of cells. In embodiments, integrations in cells are determined (e.g., across the integration site) using long-read amplicon sequencing, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated herein by reference in its entirety).

In embodiments, quantifying integrations in cells comprises counting the fraction of integrations that comprise at least about 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the DNA sequence corresponding to the template RNA (e.g., a template RNA at least 0.05, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 3, 4, or 5 kb in length, e.g., 0.5-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9, 1.0-1.2, 1.2-1.4, 1.4-1.6, 1.6-1.8, 1.8-2.0, 2-3, 3-4, or 4-5 kb in length).
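The percentage of complete integrations described above is a simple ratio, sketched below. The sketch assumes that each integration event has already been reduced to the number of template-derived nucleotides it contains (for example, from long-read amplicon alignments), and it uses inserted length as a stand-in for "comprising at least 98% of the expected sequence". The function name, the default cutoff parameter, and the example lengths are hypothetical.

```python
def percent_complete_integration(insert_lengths, template_length: int,
                                 min_fraction: float = 0.98) -> float:
    """Percent of integration events that are substantially full length.

    An event counts as complete when its template-derived length is at least
    `min_fraction` of the expected integrated sequence length.
    """
    if template_length <= 0:
        raise ValueError("template_length must be positive")
    if not insert_lengths:
        raise ValueError("at least one integration event is required")
    complete = sum(1 for n in insert_lengths
                   if n / template_length >= min_fraction)
    return 100.0 * complete / len(insert_lengths)


# Hypothetical per-event insert lengths (nt) for a 1000-nt expected insert.
events = [1000, 990, 985, 600, 250, 1000, 979, 120, 1000, 995]
pct = percent_complete_integration(events, template_length=1000)
```

Changing `min_fraction` to 0.75, 0.90, and so on gives the other cutoffs recited in the text.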

In some embodiments, the reverse transcriptase domain is capable of polymerizing dNTPs in vitro. In embodiments, the reverse transcriptase domain is capable of polymerizing dNTPs in vitro at a rate of 0.1-50 nt/sec (e.g., 0.1-1, 1-10, or 10-50 nt/sec). In embodiments, polymerization of dNTPs by the reverse transcriptase domain is measured by a single-molecule assay, e.g., as described in Schwartz and Quake (2009) PNAS 106(48):20294-20299 (incorporated by reference in its entirety).

In some embodiments, the reverse transcriptase domain has an in vitro error rate (e.g., misincorporation of nucleotides) of 1×10⁻³-1×10⁻⁴ or 1×10⁻⁴-1×10⁻⁵ substitutions/nt, e.g., as described in Yasukawa et al. (2017) Biochem Biophys Res Commun 492(2):147-153 (incorporated herein by reference in its entirety). In some embodiments, the reverse transcriptase domain has an error rate (e.g., misincorporation of nucleotides) in cells (e.g., HEK293T cells) of 1×10⁻³-1×10⁻⁴ or 1×10⁻⁴-1×10⁻⁵ substitutions/nt, e.g., as determined by long-read amplicon sequencing, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated herein by reference in its entirety).
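The error rate in substitutions per nucleotide can be illustrated as a count ratio. The sketch below assumes that substitutions have already been called against the expected sequence and that sequencing errors have been excluded upstream; the function names and the example numbers are hypothetical.

```python
def substitution_rate(n_substitutions: int, n_nucleotides: int) -> float:
    """Substitutions per nucleotide over all sequenced cDNA positions."""
    if n_nucleotides <= 0 or n_substitutions < 0:
        raise ValueError("counts must be non-negative, with at least one nucleotide")
    return n_substitutions / n_nucleotides


def in_range(rate: float, low: float, high: float) -> bool:
    """True when `rate` lies within [low, high]."""
    return low <= rate <= high


# Hypothetical data: 42 substitutions observed across 120,000 sequenced nt.
rate = substitution_rate(42, 120_000)
in_higher_band = in_range(rate, 1e-4, 1e-3)  # the 1e-3 to 1e-4 band
in_lower_band = in_range(rate, 1e-5, 1e-4)   # the 1e-4 to 1e-5 band
```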

In some embodiments, the reverse transcriptase domain is capable of reverse transcribing a target RNA in vitro. In some embodiments, the reverse transcriptase requires a primer of at least 3 nt to initiate reverse transcription of a template. In some embodiments, reverse transcription of the target RNA is determined by detecting cDNA from the target RNA (e.g., when provided with an ssDNA primer, e.g., one that anneals to the target with at least 3, 4, 5, 6, 7, 8, 9, or 10 nt at its 3' end), e.g., as described in Bibillo and Eickbush (2002) J Biol Chem 277(38):34836-34845 (incorporated herein by reference in its entirety).

In some embodiments, the reverse transcriptase domain performs reverse transcription at least 5-fold or 10-fold more efficiently (e.g., as measured by cDNA production), e.g., when converting its RNA template to cDNA, as compared to an RNA template lacking a protein binding motif (e.g., a 3' UTR). In embodiments, reverse transcription efficiency is measured as described in Yasukawa et al. (2017) Biochem Biophys Res Commun 492(2):147-153 (incorporated herein by reference in its entirety).

In some embodiments, the reverse transcriptase domain specifically binds a particular RNA template at a higher frequency (e.g., about 5-fold or 10-fold higher frequency) than any endogenous cellular RNA, e.g., when expressed in a cell (e.g., a HEK293T cell). In embodiments, the frequency of specific binding between the reverse transcriptase domain and the template RNA is measured by CLIP-seq, e.g., as described in Lin and Miles (2019) Nucleic Acids Res 47(11):5490-5501 (incorporated herein by reference in its entirety).

In some embodiments, a Gene Writer as described herein comprises a polypeptide associated with a guide RNA (gRNA). In certain embodiments, the gRNA is comprised in the template nucleic acid molecule. In other embodiments, the gRNA is separate from the template nucleic acid molecule. In some embodiments in which the gRNA is comprised in the template nucleic acid molecule, the template nucleic acid molecule further comprises a gRNA spacer sequence (e.g., at or within 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides of its 5' end). In embodiments, the gRNA spacer comprises a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to a nucleic acid sequence comprised in a target nucleic acid molecule. In embodiments, the gRNA spacer directs Cas domain (e.g., Cas9) activity at the nucleic acid sequence comprised in the target nucleic acid molecule. In some embodiments in which the gRNA is comprised in the template nucleic acid molecule, the template nucleic acid molecule further comprises a primer binding site (e.g., at or within 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides of its 3' end). In embodiments, the primer binding site comprises a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to a nucleic acid sequence positioned 5' (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, or 50 nucleotides) of a nick site on the target nucleic acid molecule. In embodiments, binding of the primer binding site to the target nucleic acid molecule primes TPRT.

In some embodiments, the reverse transcriptase (RT) domain exhibits enhanced stringency of target-primed reverse transcription (TPRT) initiation, e.g., relative to an endogenous RT domain. In some embodiments, the RT domain initiates TPRT when the 3 nt in the target site immediately upstream of the first-strand nick, e.g., the genomic DNA that primes the RNA template, have at least 66% or 100% complementarity to the 3 nt of homology in the RNA template. In some embodiments, the RT domain initiates TPRT when there are fewer than 5 nt of mismatch (e.g., fewer than 1, 2, 3, 4, or 5 nt of mismatch) between the template RNA homology and the target DNA priming reverse transcription. In some embodiments, the RT domain is modified such that the stringency for mismatches in priming the TPRT reaction is increased, e.g., wherein the RT domain does not tolerate any mismatches or tolerates fewer mismatches in the priming region relative to a wild-type (e.g., unmodified) RT domain. In some embodiments, the RT domain comprises an HIV-1 RT domain. In embodiments, the HIV-1 RT domain initiates lower levels of synthesis even with three nucleotide mismatches relative to an alternative RT domain (e.g., as described in Jamburuthugoda and Eickbush, J Mol Biol 407(5):661-672 (2011); incorporated herein by reference in its entirety).
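The 66% and 100% complementarity figures for a 3 nt priming region correspond to two of three or three of three base pairs. The following sketch counts Watson-Crick pairs between a DNA primer and the RNA homology region in antiparallel orientation. It is a simplified illustration: wobble pairs are not counted, both strands are written 5' to 3', and the function name and example sequences are hypothetical.

```python
# Allowed (DNA base, RNA base) Watson-Crick pairs; wobble pairs are ignored here.
PAIRS = {("A", "U"), ("T", "A"), ("G", "C"), ("C", "G")}


def priming_complementarity(dna_5to3: str, rna_5to3: str) -> float:
    """Fraction of positions at which the DNA pairs with the RNA, antiparallel."""
    if len(dna_5to3) != len(rna_5to3) or not dna_5to3:
        raise ValueError("sequences must be non-empty and of equal length")
    rna_3to5 = rna_5to3.upper()[::-1]  # reverse so position i faces DNA position i
    paired = sum((d, r) in PAIRS for d, r in zip(dna_5to3.upper(), rna_3to5))
    return paired / len(dna_5to3)


# Hypothetical 3-nt priming region upstream of the nick: 5'-GAT-3'.
perfect = priming_complementarity("GAT", "AUC")       # all three positions pair
one_mismatch = priming_complementarity("GAT", "AUG")  # two of three positions pair
```

A value of 2/3 (about 66.7%) satisfies the "at least 66%" condition, while a stricter RT variant as described above would initiate only at 1.0.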

Target site

In some embodiments, after Gene Writing, the target site surrounding the integrated sequence comprises a limited number of insertions or deletions, e.g., in less than about 50% or 10% of integration events, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated herein by reference in its entirety). In some embodiments, the target site does not show multiple insertion events, e.g., head-to-tail or head-to-head duplications, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated herein by reference in its entirety). In some embodiments, the target site comprises an integrated sequence corresponding to the template RNA. In some embodiments, the target site does not comprise insertions resulting from endogenous RNA in more than about 1% or 10% of events, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated herein by reference in its entirety).

In some embodiments, the target site comprises an integrated sequence corresponding to the template RNA. In embodiments, the target site does not comprise sequences outside of the RT template (e.g., the gRNA scaffold, the vector backbone, and/or ITRs), e.g., as determined by long-read amplicon sequencing of the target site (e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903, which is incorporated herein by reference in its entirety).

Evolved variants of Gene Writers

In some embodiments, the invention provides evolved variants of Gene Writers. In some embodiments, an evolved variant can be produced by mutagenizing a reference Gene Writer or one of the fragments or domains comprised therein. In some embodiments, one or more of the domains (e.g., the reverse transcriptase, DNA binding (including, e.g., sequence-guided DNA binding elements), RNA binding, or endonuclease domain) is evolved. In some embodiments, one or more such evolved variant domains can be evolved alone or together with other domains. In some embodiments, one or more evolved variant domains can be combined with one or more unevolved cognate components or with evolved variants of one or more cognate components, e.g., which may have been evolved in either a parallel or a serial manner.

In some embodiments, the process of mutagenizing a reference Gene Writer, or fragment or domain thereof, comprises mutagenizing the reference Gene Writer or fragment or domain thereof. In embodiments, the mutagenesis comprises a continuous evolution method (e.g., PACE) or a non-continuous evolution method (e.g., PANCE), e.g., as described herein. In some embodiments, the evolved Gene Writer, or fragment or domain thereof, comprises one or more amino acid variations introduced into its amino acid sequence relative to the amino acid sequence of the reference Gene Writer, or fragment or domain thereof. In embodiments, amino acid sequence variations may include one or more mutated residues (e.g., conservative substitutions, non-conservative substitutions, or a combination thereof) within the amino acid sequence of a reference Gene Writer, e.g., as a result of a change in the nucleotide sequence encoding the Gene Writer that results in, e.g., a change in the codon at any particular position in the coding sequence, the deletion of one or more amino acids (e.g., a truncated protein), the insertion of one or more amino acids, or any combination of the foregoing. The evolved variant Gene Writer may include variants in one or more components or domains of the Gene Writer (e.g., variants introduced into a reverse transcriptase domain, an endonuclease domain, a DNA binding domain, an RNA binding domain, or combinations thereof).

In some aspects, provided herein are Gene Writers, systems, kits, and methods that use or comprise an evolved variant of a Gene Writer, e.g., that employ an evolved variant of a Gene Writer or a Gene Writer produced or producible by PACE or PANCE. In embodiments, the unevolved reference Gene Writer is a Gene Writer as disclosed herein.

As used herein, the term "phage-assisted continuous evolution (PACE)" generally refers to continuous evolution that employs phage as viral vectors. Examples of PACE technology have been described, for example, in: International PCT Application No. PCT/US 2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application No. PCT/US 2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; U.S. Patent No. 9,771,574, issued September 26, 2017; U.S. Patent No. 9,394,537, issued July 19, 2016; International PCT Application No. PCT/US 2015/01022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; U.S. Patent No. 10,179,911, issued January 15, 2019; and International PCT Application No. PCT/US 2016/027795, filed April 15, 2016, published as WO 2016/168831 on October 20, 2016, each of which is incorporated herein by reference in its entirety.

As used herein, the term "phage-assisted non-continuous evolution (PANCE)" generally refers to non-continuous evolution that employs phage as viral vectors. Examples of PANCE technology have been described, for example, in Suzuki T. et al., Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase, Nat Chem Biol 13(12):1261-1266 (2017), which is incorporated herein by reference in its entirety. Briefly, PANCE is a technique for rapid in vivo directed evolution using serial flask transfers of evolving selection phage (SP), which contain a gene of interest to be evolved, across fresh host cells (e.g., E. coli cells). Genes inside the host cell may be held constant while genes contained in the SP continuously evolve. Following phage growth, an aliquot of infected cells may be used to transfect a subsequent flask containing host E. coli. This process can be repeated and/or continued until the desired phenotype is evolved, e.g., for as many transfers as desired.

Methods of applying PACE and PANCE to Gene Writers may be readily appreciated by the skilled artisan by reference to, inter alia, the foregoing references. Additional exemplary methods for directing continuous evolution of genome-modifying proteins or systems, e.g., in a population of host cells, e.g., using phage particles, may be applied to generate evolved variants of Gene Writers, or fragments or subdomains thereof. Non-limiting examples of such methods are described in: International PCT Application No. PCT/US 2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application No. PCT/US 2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; U.S. Patent No. 9,771,574, issued September 26, 2017; U.S. Patent No. 9,394,537, issued July 19, 2016; International PCT Application No. PCT/US 2015/01022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; U.S. Patent No. 10,179,911, issued January 15, 2019; International Application No. PCT/US 2019/37216, filed June 14, 2019; International Patent Publication WO 2019/023680, published January 31, 2019; International PCT Application No. PCT/US 2016/027795, filed April 15, 2016, published as WO 2016/168831 on October 20, 2016; and International Application No. PCT/US 2019/47996, filed August 23, 2019; each of which is incorporated herein by reference in its entirety.

In some non-limiting illustrative embodiments, a method of evolving an evolved variant Gene Writer, or a fragment or domain thereof, comprises: (a) contacting a population of host cells with a population of viral vectors comprising the gene of interest (the starting Gene Writer or fragment or domain thereof), wherein: (1) the host cells are amenable to infection by the viral vector; (2) the host cells express viral genes required for the generation of viral particles; (3) the expression of at least one viral gene required for the production of an infectious viral particle is dependent on a function of the gene of interest; and/or (4) the viral vector allows for expression of the protein in the host cell and can be replicated and packaged into a viral particle by the host cell. In some embodiments, the method comprises (b) contacting the host cells with a mutagen, using host cells with mutations that elevate the mutation rate (e.g., by carrying a mutagenesis plasmid or some genome modification, e.g., a proofreading-impaired DNA polymerase or SOS genes, such as UmuC, UmuD', and/or RecA, which mutations, if plasmid-borne, may be under the control of an inducible promoter), or a combination thereof. In some embodiments, the method comprises (c) incubating the population of host cells under conditions allowing for viral replication and the production of viral particles, wherein host cells are removed from the host cell population and fresh, uninfected host cells are introduced into the population, thus replenishing the population of host cells and creating a flow of host cells. In some embodiments, the cells are incubated under conditions allowing for the gene of interest to acquire a mutation. In some embodiments, the method further comprises (d) isolating from the population of host cells a mutated version of the viral vector encoding an evolved gene product (e.g., an evolved variant Gene Writer, or fragment or domain thereof).

The skilled artisan will appreciate a variety of features that may be employed within the above-described framework. For example, in some embodiments, the viral vector or phage is a filamentous phage, for example, an M13 phage, e.g., an M13 selection phage. In certain embodiments, the gene required for the production of infectious viral particles is the M13 gene III (gIII). In embodiments, the phage may lack a functional gIII but otherwise comprise gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and gX. In some embodiments, the generation of infectious VSV particles involves the envelope protein VSV-G. Various embodiments can use different retroviral vectors, for example, murine leukemia virus vectors or lentiviral vectors. In embodiments, the retroviral vectors can be efficiently packaged with the VSV-G envelope protein, e.g., as a substitute for the native envelope protein of the virus.

In some embodiments, the host cells are incubated for a suitable number of viral life cycles, e.g., at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles. In illustrative and non-limiting examples of M13 phage, each viral life cycle is 10-20 minutes. Similarly, conditions can be modulated to adjust the time a host cell remains in the population of host cells, e.g., about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 70, about 80, about 90, about 100, about 120, about 150, or about 180 minutes. The host cell population can be controlled in part by the density of the host cells or, in some embodiments, the host cell density in an inflow, e.g., 10³ cells/ml, about 10⁴ cells/ml, about 10⁵ cells/ml, about 5·10⁵ cells/ml, about 10⁶ cells/ml, about 5·10⁶ cells/ml, about 10⁷ cells/ml, about 5·10⁷ cells/ml, about 10⁸ cells/ml, about 5·10⁸ cells/ml, about 10⁹ cells/ml, about 5·10⁹ cells/ml, about 10¹⁰ cells/ml, or about 5·10¹⁰ cells/ml.
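As a rough sense of scale for the life-cycle counts above, the following sketch converts a number of consecutive viral life cycles into an elapsed-time range, using the 10-20 minute M13 life cycle mentioned in the text. It assumes back-to-back cycles with no lag between them, which is an idealization; the function name is hypothetical.

```python
def evolution_time_hours(n_cycles: int, minutes_per_cycle=(10.0, 20.0)):
    """Return (shortest, longest) elapsed time in hours for `n_cycles` life cycles.

    Assumption (illustrative only): cycles run consecutively without pauses,
    each lasting between the two values in `minutes_per_cycle`.
    """
    if n_cycles < 0:
        raise ValueError("n_cycles must be non-negative")
    low, high = minutes_per_cycle
    return (n_cycles * low / 60.0, n_cycles * high / 60.0)


# Time ranges for a few of the cycle counts recited in the text.
ranges = {n: evolution_time_hours(n) for n in (10, 100, 1000)}
```

For example, 100 consecutive life cycles would take on the order of 17 to 33 hours under these assumptions.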

Promoters

In some embodiments, one or more promoter or enhancer elements are operably linked to a nucleic acid encoding a Gene Writer protein or a template nucleic acid, e.g., that controls expression of a heterologous subject sequence. In certain embodiments, the one or more promoter or enhancer elements comprise a cell type or tissue specific element. In some embodiments, the promoter or enhancer is the same or derived from a promoter or enhancer that naturally controls expression of a heterologous subject sequence. For example, ornithine transcarbamylase promoters and enhancers may be used to control the expression of an ornithine transcarbamylase gene in the systems or methods provided herein in order to correct ornithine transcarbamylase defects. In some embodiments, the promoters used in the present invention are directed to genes described in any of tables 9-22, e.g., which may be used with alleles of a reference gene, or in other embodiments, with heterologous genes. In some embodiments, the promoter is a promoter in table 33 or a functional fragment or variant thereof.

Exemplary tissue-specific promoters that are commercially available can be found, for example, at a uniform resource locator (e.g., www.invivogen.com/tissue-specific-promoters). In some embodiments, the promoter is a native promoter or a minimal promoter, e.g., consisting of a single fragment from the 5' region of a gene. In some embodiments, the native promoter comprises a core promoter and its native 5' UTR. In some embodiments, the 5' UTR comprises an intron. Other embodiments include composite promoters, which combine promoter elements of different origins or are generated by assembling a distal enhancer with a minimal promoter of the same origin.

Exemplary cell or tissue specific promoters are provided in the following table, and exemplary nucleic acid sequences encoding them are known in the art and are readily available using a variety of sources, such as NCBI databases, including RefSeq, and eukaryotic promoter databases (//epd.

TABLE 33 exemplary cell or tissue specific promoters

TABLE 34 additional exemplary cell or tissue specific promoters

Any of a number of suitable transcriptional and translational control elements, including constitutive and inducible promoters, transcriptional enhancer elements, transcriptional terminators, and the like, may be used in the expression vector, depending on the host/vector system utilized (see, e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544; incorporated herein by reference in its entirety).

In some embodiments, a nucleic acid encoding a Gene Writer or a template nucleic acid is operably linked to a control element (e.g., a transcriptional control element, such as a promoter). In some embodiments, the transcriptional control element may function in: eukaryotic cells such as mammalian cells; or prokaryotic cells (e.g., bacterial or archaeal cells). In some embodiments, the nucleotide sequence encoding the polypeptide is operably linked to a plurality of control elements that allow, for example, expression of the nucleotide sequence encoding the polypeptide in prokaryotic and eukaryotic cells.

For purposes of illustration, examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor-specific promoters, and the like. Neuron-specific spatially restricted promoters include, but are not limited to, the neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956); the aromatic amino acid decarboxylase (AADC) promoter; the neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); the synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); the thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn et al. (2010) Nat. Med. 16(10):1161-1166); the 5-hydroxytryptamine receptor promoter (see, e.g., GenBank S62283); the tyrosine hydroxylase (TH) promoter (see, e.g., Oh et al. (2009) Gene Ther. 16:437; Sasaoka et al. (1992) Mol. Brain Res. 16:274; Boundy et al. (1998) J. Neurosci. 18:9989; and Kaneda et al. (1991) Neuron 6:583-594); the GnRH promoter (see, e.g., Radovick et al. (1991) Proc. Natl. Acad. Sci. USA 88:3402-3406); the L7 promoter (see, e.g., Oberdick et al. (1990) Science 248:223-226); the DNMT promoter (see, e.g., Bartge et al. (1988) Proc. Natl. Acad. Sci. USA 85:3648-3652); the enkephalin promoter (see, e.g., Comb et al. (1988) EMBO J. 17:3793-3805); the myelin basic protein (MBP) promoter; the Ca2+-calmodulin-dependent protein kinase II-alpha (CamKIIalpha) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250; and Casanova et al. (2001) Genesis 31:37); the CMV enhancer/platelet-derived growth factor-beta promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); and the like.

Adipocyte-specific spatially restricted promoters include, but are not limited to: the aP2 gene promoter/enhancer, e.g., the -5.4 kb to +21 bp region of the human aP2 gene (see, e.g., Tozzo et al. (1997) Endocrinol. 138:1604; Ross et al. (1990) Proc. Natl. Acad. Sci. USA 87:9590; and Pavjani et al. (2005) Nat. Med. 11:797); the glucose transporter-4 (GLUT4) promoter (see, e.g., Knight et al. (2003) Proc. Natl. Acad. Sci. USA 100:14725); the fatty acid translocase (FAT/CD36) promoter (see, e.g., Kuriki et al. (2002) Biol. Pharm. Bull. 25:1476; and Sato et al. (2002) J. Biol. Chem. 277:15703); the stearoyl-CoA desaturase-1 (SCD1) promoter (Tabor et al. (1999) J. Biol. Chem. 274:20603); the leptin promoter (see, e.g., Mason et al. (1998) Endocrinol. 139:1013; and Chen et al. (1999) Biochem. Biophys. Res. Comm. 262:187); the adiponectin promoter (see, e.g., Kita et al. (2005) Biochem. Biophys. Res. Comm. 331:484; and Chakrabarti (2010) Endocrinol. 151:2408); the adipsin promoter (see, e.g., Platt et al. (1989) Proc. Natl. Acad. Sci. USA 86:7490); and the like.

Cardiomyocyte-specific spatially restricted promoters include, but are not limited to, control sequences derived from the following genes: myosin light chain-2, alpha-myosin heavy chain, AE3, cardiac troponin C, cardiac actin, and the like. See, e.g., Franz et al. (1997) Cardiovasc. Res. 35:560-566; Robbins et al. (1995) Ann. N.Y. Acad. Sci. 752:492-505; Linn et al. (1995) Circ. Res. 76:584-591; Parmacek et al. (1994) Mol. Cell. Biol. 14:1870-1885; Hunter et al. (1993) Hypertension 22:608-617; and Sartorelli et al. (1992) Proc. Natl. Acad. Sci. USA 89:4047-4051.

Smooth muscle-specific spatially restricted promoters include, but are not limited to, the SM22α promoter (see, e.g., Akyürek et al. (2000) Mol. Med. 6:983; and U.S. Patent No. 7,169,874); the smoothelin promoter (see, e.g., WO 2001/018048); the alpha-smooth muscle actin promoter; and the like. For example, a 0.4 kb region of the SM22α promoter, which contains two CArG elements, has been shown to mediate vascular smooth muscle cell-specific expression (see, e.g., Kim et al. (1997) Mol. Cell. Biol. 17:2266-2278; Li et al. (1996) J. Cell Biol. 132:849-859; and Moessler et al. (1996) Development 122:2415-2425).

Photoreceptor-specific spatially restricted promoters include, but are not limited to, the rhodopsin promoter; the rhodopsin kinase promoter (Young et al. (2003) Ophthalmol. Vis. Sci. 44:4076); the beta phosphodiesterase gene promoter (Nicoud et al. (2007) J. Gene Med. 9:1015); the retinitis pigmentosa gene promoter (Nicoud et al. (2007) supra); the interphotoreceptor retinoid-binding protein (IRBP) gene enhancer (Nicoud et al. (2007) supra); the IRBP gene promoter (Yokoyama et al. (1992) Exp. Eye Res. 55:225); and the like.

Non-limiting exemplary cell-specific promoters

Cell-specific promoters known in the art may be used to direct expression of the Gene Writer protein, e.g., as described herein. Non-limiting exemplary mammalian cell-specific promoters have been characterized and used in mice that express Cre recombinase in a cell-specific manner. Certain non-limiting exemplary mammalian cell-specific promoters are listed in table 1 of US 9845481, which is incorporated herein by reference.

In some embodiments, the cell-specific promoter is a promoter active in plants. Many exemplary cell-specific plant promoters are known in the art. See, for example, U.S. Patent Nos. 5,097,025; 5,783,393; 5,880,330; 5,981,727; 7,557,264; 6,291,666; 7,132,526; and 7,323,622; and U.S. Publication Nos. 2010/0269226; 2007/0180580; 2005/0034192; and 2005/0086712, which are incorporated herein by reference in their entirety for any purpose.

In some embodiments, a vector as described herein comprises an expression cassette. As used herein, the term "expression cassette" refers to a nucleic acid construct comprising nucleic acid elements sufficient for the expression of a nucleic acid molecule of the invention. Typically, an expression cassette comprises a nucleic acid molecule of the invention operably linked to a promoter sequence. The term "operably linked" refers to the association of two or more nucleic acid fragments on a single nucleic acid fragment such that the function of one is affected by the other. For example, a promoter is operably linked to a coding sequence when the promoter is capable of affecting the expression of the coding sequence (e.g., the coding sequence is under the transcriptional control of the promoter). A coding sequence may be operably linked to a regulatory sequence in a sense or antisense orientation. In certain embodiments, the promoter is a heterologous promoter. As used herein, the term "heterologous promoter" refers to a promoter that is not found in nature operably linked to a given coding sequence. In certain embodiments, the expression cassette may comprise additional elements, such as introns, enhancers, polyadenylation sites, woodchuck response elements (WREs), and/or other elements known to affect the expression level of the coding sequence. A "promoter" typically controls the expression of a coding sequence or functional RNA. In certain embodiments, the promoter sequence comprises proximal and more distal upstream elements, and may further comprise enhancer elements. An "enhancer" can typically stimulate the activity of a promoter, and can be an inherent element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of the promoter. In certain embodiments, the promoter is derived in its entirety from a native gene. 
In certain embodiments, the promoter is composed of different elements derived from different naturally occurring promoters. In certain embodiments, the promoter comprises a synthetic nucleotide sequence. Those skilled in the art will appreciate that different promoters will direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions, or in response to the presence or absence of a drug or transcriptional cofactor. Ubiquitous, cell type-specific, tissue-specific, developmental stage-specific, and conditional promoters, for example, drug-responsive promoters (e.g., tetracycline-responsive promoters), are well known to those skilled in the art. Examples of promoters include, but are not limited to: the phosphoglycerate kinase (PKG) promoter; CAG (a composite of the CMV enhancer, the chicken beta actin promoter (CBA), and the rabbit beta globin intron); the NSE (neuron-specific enolase), synapsin, or NeuN promoters; the SV40 early promoter; the mouse mammary tumor virus LTR promoter; the adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter, such as the CMV immediate early promoter region (CMVIE); the SFFV promoter; the Rous sarcoma virus (RSV) promoter; synthetic promoters; hybrid promoters; and the like. Other promoters may be of human origin or from other species (including from mice). 
Common promoters include, for example: the human cytomegalovirus (CMV) immediate early gene promoter, the SV40 early promoter, the Rous sarcoma virus long terminal repeat, the β-actin promoter, the rat insulin promoter, the phosphoglycerate kinase promoter, the human alpha-1 antitrypsin (hAAT) promoter, the transthyretin promoter, the TBG promoter and other liver-specific promoters, the desmin promoter and similar muscle-specific promoters, the EF1-alpha promoter, the CAG promoter and other constitutive promoters, hybrid promoters with tissue specificity, promoters specific for neurons (such as synapsin), and the glyceraldehyde-3-phosphate dehydrogenase promoter, all of which are well known and readily available to those skilled in the art and may be used to obtain high-level expression of the coding sequence of interest. In addition, sequences derived from non-viral genes (e.g., the murine metallothionein gene) will also find use herein. Such promoter sequences are commercially available from, for example, Stratagene (San Diego, Calif.). Additional exemplary promoter sequences are described, for example, in WO 2018213786A1 (which is incorporated herein by reference in its entirety).

In some embodiments, the apolipoprotein E enhancer (ApoE) or functional fragment thereof is used, for example, to promote expression in the liver. In some embodiments, two copies of an ApoE enhancer or functional fragment thereof are used. In some embodiments, the ApoE enhancer or functional fragment thereof is used in combination with a promoter (e.g., a human alpha-1 antitrypsin (hAAT) promoter).

In some embodiments, the regulatory sequences confer tissue-specific gene expression capability. In some cases, the tissue-specific regulatory sequences bind tissue-specific transcription factors that induce transcription in a tissue-specific manner. Various tissue-specific regulatory sequences (e.g., promoters, enhancers, etc.) are known in the art. Exemplary tissue-specific regulatory sequences include, but are not limited to, the following tissue-specific promoters: a liver-specific thyroxine-binding globulin (TBG) promoter, an insulin promoter, a glucagon promoter, a somatostatin promoter, a pancreatic polypeptide (PPY) promoter, a synapsin-1 (Syn) promoter, a creatine kinase (MCK) promoter, a mammalian desmin (DES) promoter, an alpha-myosin heavy chain (a-MHC) promoter, or a cardiac troponin T (cTnT) promoter. Other exemplary promoters include: the beta-actin promoter; the hepatitis B virus core promoter (Sandig et al., Gene Ther., 3:1002-9 (1996)); the alpha-fetoprotein (AFP) promoter (Arbuthnot et al., Hum. Gene Ther., 7:1503-14 (1996)); the osteocalcin promoter (Stein et al., Mol. Biol. Rep., 24:185-96 (1997)); the bone sialoprotein promoter (Chen et al., J. Bone Miner. Res., 11:654-64 (1996)); the CD2 promoter (Hansal et al., J. Immunol., 161:1063-8 (1998)); the immunoglobulin heavy chain promoter; the T cell receptor alpha chain promoter; and neuronal promoters such as the neuron-specific enolase (NSE) promoter (Andersen et al., Cell. Mol. Neurobiol., 13:503-15 (1993)), the neurofilament light chain gene promoter (Piccioli et al., Proc. Natl. Acad. Sci. USA, 88:5611-5 (1991)), and the neuron-specific vgf gene promoter (Piccioli et al., Neuron, 15:373-84 (1995)), among others. Additional exemplary promoter sequences are described, for example, in U.S. Patent No. 10300146 (which is incorporated herein by reference in its entirety).

In some embodiments, the vectors described herein are polycistronic expression constructs. Polycistronic expression constructs include, for example, constructs carrying a first expression cassette, e.g., comprising a first promoter and a first encoding nucleic acid sequence, and a second expression cassette, e.g., comprising a second promoter and a second encoding nucleic acid sequence. In some cases, such polycistronic expression constructs may be particularly useful for delivering non-translated gene products (e.g., hairpin RNAs) together with polypeptides (e.g., a Gene Writer and a Gene Writer template). In some embodiments, a polycistronic expression construct may exhibit reduced expression levels of one or more of the included transgenes, e.g., due to promoter interference or the presence of incompatible nucleic acid elements in close proximity. If the polycistronic expression construct is part of a viral vector, the presence of a self-complementary nucleic acid sequence may, in some cases, interfere with the formation of structures required for viral propagation or packaging.

In some embodiments, the sequence encodes a hairpin-containing RNA. In some embodiments, the hairpin RNA is a guide RNA, a template RNA, an shRNA, or a microRNA. In some embodiments, the first promoter is an RNA polymerase I promoter. In some embodiments, the first promoter is an RNA polymerase II promoter. In some embodiments, the second promoter is an RNA polymerase III promoter. In some embodiments, the second promoter is a U6 or H1 promoter. In some embodiments, the nucleic acid construct comprises the structure of AAV construct B1 or B2.

Without wishing to be bound by theory, polycistronic expression constructs may not achieve optimal expression levels compared to expression systems containing only one cistron. One of the suggested reasons for the lower expression levels achieved using polycistronic expression constructs comprising two or more promoter elements is the phenomenon of promoter interference (see, e.g., Curtin JA, Dane AP, Swanson A, Alexander IE, Ginn SL. Bidirectional promoter interference between two widely used internal heterologous promoters in a late-generation lentiviral construct. Gene Ther. 2008 Mar;15(5):384-90; and Martin-Duque P, Jezzard S, Kaftansis L, Vassaux G. Direct comparison of the insulating properties of two genetic elements in an adenoviral vector containing two different expression cassettes. Hum Gene Ther. 2004;15(10):995-1002; both references are incorporated herein by reference for their disclosure of the promoter interference phenomenon). In some embodiments, the problem of promoter interference can be overcome, for example, by generating a polycistronic construct comprising only one promoter that drives transcription of multiple encoding nucleic acid sequences separated by internal ribosome entry sites, or by separating cistrons that each comprise their own promoter with transcriptional insulator elements. In some embodiments, single-promoter driven expression of multiple cistrons may result in uneven expression levels of the cistrons. In some embodiments, a promoter cannot be efficiently insulated, and insulator elements may be incompatible with some gene transfer vectors (e.g., some retroviral vectors).

MicroRNA

miRNAs and other small interfering nucleic acids typically regulate gene expression via target RNA transcript cleavage/degradation or translational repression of the target messenger RNA (mRNA). In some cases, miRNAs are natively expressed, typically as final 19-25 nucleotide non-translated RNA products. miRNAs generally exhibit their activity through sequence-specific interactions with the 3' untranslated regions (UTRs) of target mRNAs. These endogenously expressed miRNAs can form hairpin precursors that are subsequently processed into a miRNA duplex and further into a mature single-stranded miRNA molecule. This mature miRNA typically guides a multiprotein complex, miRISC, which identifies target sites in the 3' UTR regions of target mRNAs based on their complementarity to the mature miRNA. Useful transgene products may include, for example, miRNAs or miRNA binding sites that regulate expression of a linked polypeptide. The products of miRNA genes and their homologs can be used as transgenes or as targets for small interfering nucleic acids (e.g., miRNA sponges, antisense oligonucleotides), for example, in methods such as those listed in US 10300146, 22:25-25:48 (which is incorporated by reference), which also provides a non-limiting list of miRNA genes. In some embodiments, one or more binding sites for one or more of the foregoing miRNAs are incorporated into a transgene (e.g., a transgene delivered by a rAAV vector), e.g., to inhibit expression of the transgene in one or more tissues of an animal carrying the transgene. In some embodiments, the binding site may be selected to control expression of the transgene in a tissue-specific manner. For example, binding sites for the liver-specific miR-122 can be incorporated into a transgene to inhibit expression of the transgene in the liver. Additional exemplary miRNA sequences are described, for example, in U.S. Patent No. 10300146 (which is incorporated herein by reference in its entirety).

For liver-specific Gene Writing, however, overexpression of miR-122 can be utilized instead of using binding sites to effect miR-122-specific degradation. This miRNA is positively associated with hepatic differentiation and maturation, as well as with enhanced expression of liver-specific genes. Thus, in some embodiments, the coding sequence of miR-122 can be added to a component of the Gene Writer system to enhance a liver-directed therapy.

A miR inhibitor or miRNA inhibitor is typically an agent that blocks miRNA expression and/or processing. Examples of such agents include, but are not limited to: microRNA antagonists, microRNA-specific antisense molecules, microRNA sponges, and microRNA oligonucleotides (double-stranded, hairpin, short oligonucleotides) that inhibit the interaction of a miRNA with the Drosha complex. MicroRNA inhibitors, such as miRNA sponges, may be expressed in cells from transgenes (e.g., as described in Ebert, M.S., Nature Methods, Epub August 12, 2007; incorporated herein by reference in its entirety). In some embodiments, a microRNA sponge or other miR inhibitor is used with AAV. MicroRNA sponges typically inhibit miRNAs specifically through a complementary heptameric seed sequence. In some embodiments, a single sponge sequence may be used to silence an entire miRNA family. Other methods for silencing miRNA function (de-repression of miRNA targets) in a cell will be apparent to one of ordinary skill in the art.
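The seed-based recognition described above can be illustrated with a minimal sketch that derives the site complementary to a miRNA's heptameric seed and counts how often that site occurs in a candidate sponge transcript. Several points are assumptions made only for illustration: the seed is taken as nucleotides 2-8 from the 5' end (a common convention not stated in the text above), only Watson-Crick pairing is considered, and the miRNA sequence is a made-up placeholder rather than any real miRNA. All function names are hypothetical.

```python
# Watson-Crick complement table for RNA.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna: str) -> str:
    """Heptameric seed, assumed here to be nucleotides 2-8 from the 5' end."""
    return mirna.upper()[1:8]


def seed_match(mirna: str) -> str:
    """RNA site (5'->3') that pairs with the seed: its reverse complement."""
    return seed(mirna).translate(RNA_COMPLEMENT)[::-1]


def count_seed_sites(sponge: str, mirna: str) -> int:
    """Count seed-match sites (overlaps allowed) in a sponge transcript."""
    site = seed_match(mirna)
    rna = sponge.upper().replace("T", "U")  # accept DNA or RNA input
    return sum(
        1 for i in range(len(rna) - len(site) + 1) if rna[i:i + len(site)] == site
    )


# Placeholder sequence: NOT a real miRNA, used only to exercise the functions.
placeholder_mirna = "UACGGAUCUAGCUAGGCAUUCG"
site = seed_match(placeholder_mirna)
toy_sponge = "AAAA" + site + "CCCC" + site
print(site, count_seed_sites(toy_sponge, placeholder_mirna))
```

Because miRNAs of one family share a seed, a count based on the seed alone would, under these assumptions, treat all family members alike, which is consistent with a single sponge sequence silencing an entire family.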

In some embodiments, a miRNA as described herein comprises a sequence listed in Table 4 of PCT Publication No. WO 2020014209, which is incorporated herein by reference. Also incorporated herein by reference is the list of exemplary miRNA sequences from WO 2020014209.

In some embodiments, it is advantageous to silence one or more components of the Gene Writer system (e.g., mRNA encoding a Gene Writer polypeptide, a Gene Writer template RNA, or a heterologous subject sequence expressed from the genome after successful Gene Writing) in a subset of cells. In some embodiments, it may be advantageous to limit expression of components of the Gene Writing system to select cell types within the tissue of interest.

For example, it is known that macrophages and immune cells (e.g., Kupffer cells in the liver) in a given tissue (e.g., liver) can take up a delivery vehicle carrying one or more components of the Gene Writing system. In some embodiments, at least one binding site for at least one miRNA that is highly expressed in macrophages and immune cells, e.g., Kupffer cells, is included in at least one component of the Gene Writing system, e.g., a nucleic acid encoding a Gene Writer polypeptide or a transgene. In some embodiments, the miRNA that targets the one or more binding sites is listed in a table referenced herein, e.g., miR-142, e.g., mature miRNA hsa-miR-142-5p or hsa-miR-142-3p.

In some embodiments, it may be beneficial to reduce Gene Writer levels and/or Gene Writer activity in cells in which Gene Writer expression or overexpression of a transgene may have a toxic effect. For example, delivery of a transgene overexpression cassette to dorsal root ganglion neurons has been shown to potentially result in toxicity of a gene therapy (see Hordeaux et al., Sci Transl Med 12(569):eaba9188 (2020), incorporated herein by reference in its entirety). In some embodiments, at least one miRNA binding site may be incorporated into a nucleic acid component of the Gene Writing system to reduce expression of a system component in neurons, e.g., dorsal root ganglion neurons. In some embodiments, the at least one miRNA binding site incorporated into a nucleic acid component of the Gene Writing system to reduce expression of a system component in neurons is a binding site for miR-182, e.g., mature miRNA hsa-miR-182-5p or hsa-miR-182-3p. In some embodiments, the at least one miRNA binding site incorporated into a nucleic acid component of the Gene Writing system to reduce expression of a system component in neurons is a binding site for miR-183, e.g., mature miRNA hsa-miR-183-5p or hsa-miR-183-3p. In some embodiments, a combination of miRNA binding sites may be used to enhance the restriction of expression of one or more components of the Gene Writing system to a tissue or cell type of interest.
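The idea of combining binding sites for more than one miRNA can be sketched as follows: each binding site is modeled as the DNA sequence fully complementary to a mature miRNA, and several sites are joined in tandem for placement in a nucleic acid component. This is a minimal sketch under stated assumptions, not a design taken from the disclosure: perfect complementarity, the number of copies, and the spacer sequence are all hypothetical choices, and the miRNA sequences below are made-up placeholders, not the sequences of miR-142, miR-182, or miR-183.

```python
from typing import Sequence

# Maps each miRNA (or DNA) base to the DNA base that pairs with it.
_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def binding_site_dna(mature_mirna: str) -> str:
    """DNA (5'->3') whose transcript is fully complementary to the miRNA."""
    return "".join(_TO_DNA_COMPLEMENT[nt] for nt in reversed(mature_mirna.upper()))


def binding_site_array(
    mirnas: Sequence[str], copies: int = 2, spacer: str = "ACGT"
) -> str:
    """Join `copies` sites per miRNA in tandem, separated by `spacer`.

    `copies` and `spacer` are hypothetical defaults chosen for illustration.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    sites = [binding_site_dna(m) for m in mirnas for _ in range(copies)]
    return spacer.join(sites)


# Placeholder sequences: NOT real miRNAs.
placeholder_a = "UACGGAUCUAGCUAGGCAUUCG"
placeholder_b = "AGGCUUACCGAUAGCUAACGGU"
array = binding_site_array([placeholder_a, placeholder_b], copies=2)
print(len(array), array)
```

In practice, the real mature miRNA sequences would be taken from a curated source, and the number and spacing of sites would be chosen empirically; the sketch only shows how the complementary sequences relate to the miRNAs.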

Table A5 below provides exemplary miRNAs and the cells in which they are expressed, e.g., miRNAs for which binding sites (complementary sequences) may be incorporated, in some embodiments, in the transgene or polypeptide nucleic acid, e.g., to reduce expression in the corresponding off-target cells.

Table A5: exemplary miRNAs from off-target cells and tissues

Anti-CRISPR systems for modulating GeneWriter activity

Various methods for modulating Cas molecule activity may be used in conjunction with the systems and methods described herein. For example, in some embodiments, a polypeptide described herein (e.g., a Cas molecule or a GeneWriter comprising a Cas domain) can be modulated using an anti-CRISPR agent (e.g., an anti-CRISPR protein or an anti-CRISPR small molecule). In some embodiments, the Cas molecule or Cas domain comprises a responsive intein, e.g., a 4-hydroxytamoxifen (4-HT)-responsive intein, e.g., is an iCas molecule (e.g., iCas9), or is a 4-HT-responsive Cas (e.g., allosterically regulated Cas9 (arC9) or dead Cas9 (dC9)). The systems and methods described herein may also utilize a chemically induced dimerization system of split protein fragments (e.g., rapamycin-mediated dimerization of FK506 binding protein 12 (FKBP) and the FKBP rapamycin binding domain (FRB), or the abscisic acid-inducible ABI-PYL1 and gibberellin-inducible GID1-GAI heterodimerization domains); a dimer of a BCL-xL peptide and BH3 peptides; an A385358 (A3) small molecule; a degron system (e.g., an FKBP-Cas9 destabilization system, an auxin-inducible degron (AID), or an E. coli DHFR degron system); an aptamer or aptazyme fused to a gRNA (e.g., tetracycline- and theophylline-responsive biological switches); AcrIIA2 and AcrIIA4 proteins; and BRD0539.

In some embodiments, a small molecule-responsive intein (e.g., a 4-hydroxytamoxifen (4-HT)-responsive intein) is inserted at a specific site within a Cas molecule (e.g., Cas9). In some embodiments, insertion of the 4-HT-responsive intein disrupts Cas9 enzymatic activity. In some embodiments, the Cas molecule (e.g., iCas9) is fused to the hormone-binding domain of the estrogen receptor (ERT2). In some embodiments, the ligand-binding domain of human estrogen receptor-α can be inserted into a Cas molecule (e.g., Cas9 or dead Cas9 (dC9)), e.g., at position 231, resulting in a 4-HT-responsive anti-CRISPR Cas9 (e.g., arC9 or dC9). In some embodiments, dCas9 can provide 4-HT dose-dependent inhibition of Cas9 function. In some embodiments, arC9 can provide 4-HT dose-dependent control of Cas9 function. In some embodiments, a Cas molecule (e.g., Cas9) is fused to split protein fragments. In some embodiments, chemically induced dimerization of split protein fragments (e.g., rapamycin-mediated dimerization of FK506 binding protein 12 (FKBP) and the FKBP rapamycin binding domain (FRB)) may induce low levels of Cas9 molecule activity. In some embodiments, a chemically induced dimerization system (e.g., the abscisic acid-inducible ABI-PYL1 and gibberellin-inducible GID1-GAI heterodimerization domains) may induce dose-dependent and reversible transcriptional activation/repression of Cas9. In some embodiments, a chemically inducible Cas9 system (ciCas9) comprises replacing the REC2 domain of the Cas molecule (e.g., Cas9) with a BCL-xL peptide and attaching a BH3 peptide to the N-terminus and C-terminus of the modified Cas9 (Cas9.BCL). In some embodiments, the interaction between the BCL-xL and BH3 peptides can maintain Cas9 in an inactive state. In some embodiments, a small molecule (e.g., A-385358 (A3)) can disrupt the interaction between the BCL-xL and BH3 peptides to activate Cas9. In some embodiments, the inducible Cas9 system may exhibit dose-dependent control of nuclease activity. 
In some embodiments, a degron system can induce degradation of a Cas molecule (e.g., Cas9) upon activation or inactivation by an external factor (e.g., a small molecule ligand, light, temperature, or a protein). In some embodiments, the small molecule BRD0539 reversibly inhibits a Cas molecule (e.g., Cas9). Additional information about anti-CRISPR proteins or anti-CRISPR small molecules can be found, for example, in: Gangopadhyay, S.A. et al., Precision control of CRISPR-Cas9 using small molecules and light, Biochemistry, 2019; Maji, B. et al., A high-throughput platform to identify small molecule inhibitors of CRISPR-Cas9; and Pawluk, Anti-CRISPR: discovery, mechanism and function, Nature Reviews Microbiology, volume 16, pages 12-17 (2018), each of which is incorporated herein by reference in its entirety.

Self-inactivating module for modulating GeneWriter Activity

In some embodiments, the Gene Writer system described herein includes a self-inactivating module. The self-inactivating module results in reduced expression of the Gene Writer polypeptide, the Gene Writer template, or both. Without wishing to be bound by theory, the self-inactivating module provides short-term Gene Writer expression prior to inactivation. Without wishing to be bound by theory, the activity of the Gene Writer polypeptide at the target site introduces a mutation (e.g., a substitution, insertion, or deletion) into the DNA encoding the Gene Writer polypeptide or Gene Writer template, which results in reduced expression of the Gene Writer polypeptide or template. In some embodiments of the self-inactivating module, a target site for the Gene Writer polypeptide is included in the DNA encoding the Gene Writer polypeptide or the Gene Writer template. In some embodiments, one, two, three, four, five, or more copies of the target site are included in the DNA encoding the Gene Writer polypeptide or Gene Writer template. In some embodiments, the target site in the DNA encoding the Gene Writer polypeptide or Gene Writer template is the same as the target site on the genome. In some embodiments, the target site is different from the target site on the genome. In some embodiments, the self-inactivating module target site uses the same template RNA or guide RNA as the genomic target site, or a different one. In some embodiments, the target site is modified by target-primed reverse transcription based on a template RNA. In some embodiments, the target site is nicked. The target site may be incorporated into an enhancer, a promoter, an untranslated region, an exon, an intron, an open reading frame, or a stuffer sequence.

In some embodiments, after inactivation, expression is reduced by 25%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or more relative to a Gene Writer system that does not include a self-inactivating module. In some embodiments, a Gene Writer system comprising a self-inactivating module has a 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, or higher rate of integration at target sites relative to off-target sites, as compared to a Gene Writer system that does not comprise the self-inactivating module. In some embodiments, a Gene Writer system comprising a self-inactivating module has a 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, or higher efficiency of target site modification, as compared to a Gene Writer system that does not comprise the self-inactivating module. In some embodiments, a self-inactivating module is included when the Gene Writer polypeptide is delivered as DNA (e.g., by a viral vector).

Self-inactivating modules for nucleases have been described. See, e.g., Li et al., A Self-Deleting AAV-CRISPR System for In Vivo Genome Editing, Mol Ther Methods Clin Dev 2019 Mar 15; 12:111-122; P. Singhal, Self-Inactivating Cas9: a method for reducing exposure while maintaining efficacy in virally delivered Cas9 applications (available at www.editasmedicine.com/wp-content/uploads/2019/10/aef _asgct_post_2017_final_ presentation_5-11-17_515pm1_149537387_1495959595_149767403. Pdf); Epstein and Schaffer, Engineering a Self-Inactivating CRISPR System for AAV Vectors, Targeted Genome Editing I, volume 24, supplement 1, S50, 2016; and WO 2018106693 A1.

Small molecules

In some embodiments, the polypeptides described herein (e.g., Gene Writer polypeptides) are controllable by small molecules. In some embodiments, the polypeptide is dimerized via a small molecule.

In some embodiments, the polypeptide may be controlled by chemically induced dimerization (CID) using a small molecule. CID is commonly used to create switches for protein function to alter cellular physiology. An exemplary high-specificity, high-potency dimerizer is rimiducid (AP1903), which has two identical protein-binding surfaces arranged tail to tail, each with high affinity and specificity for a mutant of FKBP12: FKBP12(F36V) (FKBP12v36, F_V36, or F_V). Attachment of one or more F_V domains to one or more cell signaling molecules that typically rely on homodimerization can place the protein under rimiducid control. Homodimerization with rimiducid is used in the context of an inducible caspase safety switch. A molecular switch may also be controlled by a different dimerizer ligand, based on the heterodimerizing small molecule rapamycin or rapamycin analogs ("rapalogs"). Rapamycin binds to FKBP12 and variants thereof, and can induce heterodimerization of signaling domains fused to FKBP12 by binding both to FKBP12 and to polypeptides comprising the FKBP-rapamycin binding (FRB) domain of mTOR. In some embodiments of the present application, molecular switches are provided that greatly expand the use of rapamycin, rapamycin analogs, and rimiducid as agents for therapeutic applications.

In some embodiments of the dual-switch technology, a homodimerizer, such as AP1903 (rimiducid), directly induces dimerization or multimerization of polypeptides comprising an FKBP12 multimerizing region. In other embodiments, a polypeptide comprising an FKBP12 multimerizing region is multimerized, or aggregated, by binding to a heterodimerizer (e.g., rapamycin or a rapamycin analog) that also binds to an FRB or FRB variant multimerizing region on a chimeric polypeptide, e.g., a chimeric antigen receptor, that is also expressed in the modified cell. Rapamycin is a natural product macrolide that binds to FKBP12 with high affinity (< 1 nM); together, the two initiate a high-affinity, inhibitory interaction with the FKBP-rapamycin-binding (FRB) domain of mTOR. FRB is small (89 amino acids) and therefore can be used as a protein "tag" or "handle" when attached to many proteins. Co-expression of an FRB fusion protein with an FKBP12 fusion protein makes their approximation rapamycin-inducible. This can serve as the basis of a cell safety switch regulated by the orally available ligand rapamycin or by rapamycin derivatives (rapalogs), which do not inhibit mTOR at low therapeutic doses but instead bind to selected mutant FRB domains fused to caspase-9. (See Sabatini D M, et al., Cell. 1994; 78(1):35-43; Brown E J, et al., Nature. 1994; 369(6483):756-8; Chen J, et al., Proc Natl Acad Sci USA. 1995; 92(11):4947-51; and Choi J, et al., Science. 1996; 273(5272):239-42.)

In some embodiments, two levels of control are provided in the therapeutic cells. In an embodiment, the first level of control may be tunable, i.e., the level of removal of the therapeutic cells may be controlled, resulting in partial removal of the therapeutic cells. In some embodiments, the chimeric antigen receptor polypeptide comprises a binding site for rapamycin or a rapamycin analog. In embodiments, a suicide gene, such as a gene encoding a caspase polypeptide, is also present in the therapeutic cells. With such a controllable first level, in some embodiments, the need for continued treatment may be balanced against the need to eliminate or reduce negative side effects. In some embodiments, a rapamycin analog (a rapalog) is administered to a patient and then binds to both the caspase polypeptide and the chimeric antigen receptor (CAR), thereby recruiting the caspase polypeptide to the location of the CAR and aggregating the caspase polypeptide. After aggregation, the caspase polypeptide induces apoptosis. The amount of rapamycin or rapamycin analog administered to a patient may vary; if it is desired to remove a lower level of cells by apoptosis in order to reduce side effects while continuing CAR treatment, a lower level of rapamycin or rapamycin analog may be administered to the patient. In some embodiments, the second level of control may be designed to achieve maximum cell elimination. This second level may be based on, for example, the use of rimiducid (AP1903). AP1903 may be administered to the patient if rapid elimination of up to 100% of the therapeutic cells is desired. The multimeric AP1903 binds to the caspase polypeptide, resulting in multimerization of the caspase polypeptide and apoptosis. In certain examples, the second level may also be tuned or controlled by the level of AP1903 administered to the subject.

In certain embodiments, small molecules may be used to control genes, e.g., as described in US 10584351 at 47:53-56:47 (which is incorporated herein by reference in its entirety), together with suitable ligands for the control features, e.g., as described in US 10584351 at 56:48 and thereafter; see also US 10046049 at 43:27-52:20 (which is incorporated herein by reference), and at 52:21 and thereafter for ligands of such control systems.

Chemically modified nucleic acids and nucleic acid end features

The nucleic acids described herein (e.g., a template nucleic acid, e.g., a template RNA; or a nucleic acid (e.g., an mRNA) encoding a Gene Writer) can comprise unmodified or modified nucleobases. Naturally occurring RNAs are synthesized from four basic ribonucleotides: ATP, CTP, UTP and GTP, but may contain post-transcriptionally modified nucleotides. In addition, about one hundred different nucleoside modifications have been identified in RNA (Rozenski, J., Crain, P., and McCloskey, J. (1999). The RNA Modification Database: 1999 update. Nucleic Acids Res 27:196-197). RNA may also comprise fully synthetic nucleotides that are not found in nature.

In some embodiments, the chemical modification is a chemical modification provided in: PCT/US 2016/032654, U.S. patent publication 20090286852, or international application nos. WO/2012/019168, WO/2012/045075, WO/2012/135805, WO/2012/158736, WO/2013/039857, WO/2013/039861, WO/2013/052523, WO/2013/090648, WO/2013/096709, WO/2013/101690, WO/2013/106496, WO/2013/130161, WO/2013/151669, WO/2013/151736, WO/2013/151672, WO/2013/151664, WO/2013/151665, WO/2013/151668, WO/2013/151671, WO/2013/151667, WO/2013/151670, WO/2013/151666, WO/2013/151663, WO/2014/028429, WO/2014/081507, WO/2014/093924, WO/2014/093574, WO/2014/113089, WO/2014/144711, WO/2014/144767, WO/2014/144039, WO/2014/152540, WO/2014/152030, WO/2014/152031, WO/2014/152027, WO/2014/152211, WO/2014/158795, WO/2014/15813, WO/2014/164253, WO/2015/006747, WO/2015/034928, WO/2015/034925, WO/2015/038892, WO/2015/048744, WO/2015/051214, WO/2015/051173, WO/2015/051051, WO/2015/058069, WO/2015/0818, WO/2015/089511, WO/2015/105926, WO/2015/164674, WO/2015/196130, WO/2015/196128, WO/2015/196118, WO/2016/01226, WO/2016/011082, WO/2016/011086, WO/2016/022914, WO/2016/036902, WO/2016/077125, or WO/2016/077123, each of which is incorporated herein by reference in its entirety. It will be appreciated that the incorporation of a chemically modified nucleotide into a polynucleotide may result in the incorporation of the modification into the nucleobase, the backbone, or both, depending on the location of the modification in the nucleotide. In some embodiments, the backbone modification is one provided in EP 2813570, which is incorporated herein by reference in its entirety. In some embodiments, the modified cap is a cap provided in U.S. patent publication 20050287539 (which is incorporated herein by reference in its entirety).

In some embodiments, the chemically modified nucleic acid (e.g., RNA, e.g., mRNA) comprises one or more of: ARCA, i.e., anti-reverse cap analog (m2^(7,3'-O)GP3G); GP3G (unmethylated cap analog); m7GP3G (monomethylated cap analog); m3^(2,2,7)GP3G (trimethylated cap analog); m5CTP (5'-methyl-cytidine triphosphate); m6ATP (N6-methyl-adenosine-5'-triphosphate); s2UTP (2-thio-uridine triphosphate); and Ψ (pseudouridine triphosphate).

In some embodiments, the chemically modified nucleic acid comprises a 5' cap, for example: a 7-methylguanosine cap (e.g., an O-Me-m7G cap); a hypermethylated cap analog; an NAD+-derived cap analog (e.g., as described in Kiledjian, Trends in Cell Biology 28, 454-464 (2018)); or a modified, e.g., biotinylated, cap analog (e.g., as described in Bednarek et al., Phil Trans R Soc B 373, 20180167 (2018)).

In some embodiments, the chemically modified nucleic acid comprises a 3' feature selected from one or more of the following: a poly-A tail; a 16-nucleotide-long stem-loop structure flanked by 5 unpaired nucleotides (e.g., as described in Mannironi et al., Nucleic Acids Research 17, 9113-9126 (1989)); a triple helix structure (e.g., as described by Brown et al., PNAS 109, 19202-19207 (2012)); a tRNA, Y RNA, or vault RNA structure (e.g., as described by Labno et al., Biochimica et Biophysica Acta 1863, 3125-3147 (2016)); incorporation of one or more deoxyribonucleotide triphosphates (dNTPs), 2'-O-methylated NTPs, or phosphorothioate-NTPs; a single-nucleotide chemical modification (e.g., oxidation of the 3'-terminal ribose to a reactive aldehyde, followed by conjugation of an aldehyde-reactive modified nucleotide); or chemical ligation to another nucleic acid molecule.

In some embodiments, the nucleic acid (e.g., template nucleic acid) comprises one or more modified nucleotides, e.g., a nucleotide selected from the group consisting of dihydrouridine, inosine, 7-methylguanosine, 5-methylcytidine (5mC), 5'-phosphate ribonucleoside, 2'-O-methyl ribonucleoside, 2'-O-ethyl ribonucleoside, 2'-fluoro ribonucleoside, C-5 propynyl-deoxycytidine (pdC), C-5 propynyl-deoxyuridine (pdU), C-5 propynyl-cytidine (pC), C-5 propynyl-uridine (pU), 5-methylcytidine, 5-methyluridine, 5-methyldeoxycytidine, 5-methyldeoxyuridine methoxy, 2,6-diaminopurine, 5'-dimethoxytrityl-N4-ethyl-2'-deoxycytidine, C-5 propynyl-f-cytidine (pfC), C-5 propynyl-f-uridine (pfU), 5-methyl-f-cytidine, 5-methyl-f-uridine, C-5 propynyl-m-cytidine (pmC), C-5 propynyl-m-uridine (pmU), 5-methyl-m-cytidine, 5-methyl-m-uridine, LNA (locked nucleic acid), MGB (minor groove binder), pseudouridine (Ψ), 1-N-methylpseudouridine (1-Me-Ψ), or 5-methoxyuridine (5-MO-U).

In some embodiments, the nucleic acid comprises a backbone modification, such as a modification to a sugar or phosphate group in the backbone. In some embodiments, the nucleic acid comprises a nucleobase modification.

In some embodiments, the nucleic acid comprises one or more chemically modified nucleotides of Table 6, one or more chemical backbone modifications of Table 7, and/or one or more chemically modified caps of Table 8. For example, in some embodiments, the nucleic acid comprises two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of chemical modifications. For example, the nucleic acid (e.g., an exogenous mRNA) can comprise two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of modified nucleobases, e.g., as described herein, e.g., in Table 6. Alternatively or in combination, the nucleic acid may comprise two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of backbone modifications, e.g., as described herein, e.g., in Table 7. Alternatively or in combination, the nucleic acid may comprise one or more modified caps, e.g., as described herein, e.g., in Table 8. For example, in some embodiments, the nucleic acid comprises one or more types of modified nucleobases and one or more types of backbone modifications; one or more types of modified nucleobases and one or more modified caps; one or more types of modified caps and one or more types of backbone modifications; or one or more types of modified nucleobases, one or more types of backbone modifications, and one or more types of modified caps.

In some embodiments, the nucleic acid comprises one or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, or more) modified nucleobases. In some embodiments, all nucleobases of a nucleic acid are modified. In some embodiments, the nucleic acid is modified at one or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, or more) positions in the backbone. In some embodiments, all backbone positions of the nucleic acid are modified.

TABLE 6 modified nucleotides


TABLE 7 backbone modification

TABLE 8 modified caps

Production of compositions and systems

As will be appreciated by those of skill in the art, methods of designing and constructing nucleic acid constructs and proteins or polypeptides (e.g., the systems, constructs and polypeptides described herein) are routine in the art. In general, recombinant methods may be used. See, generally, Smales and James (eds.), Therapeutic Proteins: Methods and Protocols (Methods in Molecular Biology), Humana Press (2005); and Crommelin, Sindelar and Meibohm (eds.), Pharmaceutical Biotechnology: Fundamentals and Applications, Springer (2013). Methods for designing, preparing, evaluating, purifying, and manipulating nucleic acid compositions are described in Green and Sambrook (eds.), Molecular Cloning: A Laboratory Manual (fourth edition), Cold Spring Harbor Laboratory Press (2012).

The present disclosure provides, in part, nucleic acids (e.g., vectors) encoding the Gene Writer polypeptides described herein, the template nucleic acids described herein, or both. In some embodiments, the vector comprises a selectable marker, e.g., an antibiotic resistance marker. In some embodiments, the antibiotic resistance marker is a kanamycin resistance marker. In some embodiments, the antibiotic resistance marker does not confer resistance to a β-lactam antibiotic. In some embodiments, the vector does not comprise an ampicillin resistance marker. In some embodiments, the vector comprises a kanamycin resistance marker and does not comprise an ampicillin resistance marker. In some embodiments, the vector encoding the Gene Writer polypeptide is integrated into the target cell genome (e.g., after administration to a target cell, tissue, organ, or subject). In some embodiments, the vector encoding the Gene Writer polypeptide is not integrated into the target cell genome (e.g., after administration to a target cell, tissue, organ, or subject). In some embodiments, the vector encoding the template nucleic acid (e.g., template RNA) is not integrated into the target cell genome (e.g., after administration to a target cell, tissue, organ, or subject). In some embodiments, the selectable marker is not integrated into the genome if the vector is integrated into a target site in the genome of the target cell. In some embodiments, if the vector is integrated into a target site in the genome of the target cell, no genes or sequences involved in vector maintenance (e.g., plasmid maintenance genes) are integrated into the genome. In some embodiments, if the vector is integrated into a target site in the genome of the target cell, the transfer regulatory sequence (e.g., an inverted terminal repeat sequence, e.g., from an AAV) is not integrated into the genome.
In some embodiments, administration of a vector (e.g., a vector encoding a Gene Writer polypeptide described herein, a template nucleic acid described herein, or both) to a target cell, tissue, organ, or subject can cause portions of the vector to integrate into one or more target sites in one or more genomes of the target cell, tissue, organ, or subject. In some embodiments, less than 99%, 95%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, or 1% of the target sites comprising integrated material (e.g., no target sites) comprise a selectable marker (e.g., an antibiotic resistance gene), a transfer regulatory sequence (e.g., an inverted terminal repeat sequence, e.g., from an AAV), or both from the vector.

Exemplary methods for producing the therapeutic pharmaceutical proteins or polypeptides described herein involve expression in mammalian cells, although recombinant proteins can also be produced in insect cells, yeast, bacteria, or other cells under the control of appropriate promoters. Mammalian expression vectors may contain non-transcribed elements such as origins of replication, suitable promoters, and other 5' or 3' flanking non-transcribed sequences; and 5' or 3' untranslated sequences, such as necessary ribosome binding sites, polyadenylation sites, splice donor and acceptor sites, and termination sequences. DNA sequences derived from the SV40 viral genome, such as the SV40 origin, early promoter, splice sites, and polyadenylation sites, may be used to provide other genetic elements necessary for expression of heterologous DNA sequences. Suitable cloning and expression vectors for use with bacterial, fungal, yeast, and mammalian cell hosts are described in Green and Sambrook, Molecular Cloning: A Laboratory Manual (fourth edition), Cold Spring Harbor Laboratory Press (2012).

Various mammalian cell culture systems can be used to express and produce recombinant proteins. Examples of mammalian expression systems include CHO, COS, HEK293, HeLa and BHK cell lines. Processes of host cell culture for the production of protein therapeutics are described in Zhou and Kantardjieff (eds.), Mammalian Cell Cultures for Biologics Manufacturing (Advances in Biochemical Engineering/Biotechnology), Springer (2014). The compositions described herein may include a vector, such as a viral vector, e.g., a lentiviral vector, encoding a recombinant protein. In some embodiments, a vector, such as a viral vector, may comprise a nucleic acid encoding a recombinant protein.

Purification of protein therapeutics is described in Franks, Protein Biotechnology: Isolation, Characterization, and Stabilization, Humana Press (2013); and Cutler, Protein Purification Protocols (Methods in Molecular Biology), Humana Press (2010).

In some embodiments, quality criteria include, but are not limited to:

(i) The length of the mRNA encoding the Gene Writer polypeptide, e.g., whether the length of the mRNA is greater than a reference length or within a reference length range, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% of the mRNA present is greater than 3000, 4000 or 5000 nucleotides in length;

(ii) The presence, absence and/or length of a poly-A tail on the mRNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the mRNA present contains a poly-A tail (e.g., a poly-A tail of at least 5, 10, 20, 30, 50, 70, or 100 nucleotides in length);

(iii) The presence, absence and/or type of 5' cap on the mRNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the mRNA present contains a 5' cap, e.g., whether the cap is a 7-methylguanosine cap, e.g., an O-Me-m7G cap;

(iv) The presence, absence and/or type of one or more modified nucleotides in the mRNA (e.g., selected from pseudouridine, dihydrouridine, inosine, 7-methylguanosine, 1-N-methylpseudouridine (1-Me-Ψ), 5-methoxyuridine (5-MO-U), 5-methylcytidine (5mC), or locked nucleotides), e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the mRNA present contains one or more modified nucleotides;

(v) Stability of the mRNA (e.g., over time and/or under preselected conditions), e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the mRNA remains intact (e.g., greater than 100, 125, 150, 175, or 200 nucleotides in length) after stability testing; or

(vi) Efficacy of an mRNA in a system for modifying DNA, e.g., whether at least 1% of target sites are modified after determining the efficacy of a system comprising the mRNA.
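By way of illustration only, criteria of this kind can be evaluated as the fraction of mRNA molecules in a batch that satisfy a property, compared against a chosen minimum fraction. The following is a minimal sketch covering criteria (i) to (iii); the record fields, function names, default thresholds (90% of the batch, more than 3000 nucleotides, a poly-A tail of at least 30 nucleotides), and the example batch are all hypothetical choices made for this sketch and are not prescribed by the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MrnaRecord:
    """One measured mRNA molecule (hypothetical record layout)."""
    length_nt: int   # measured transcript length in nucleotides
    poly_a_nt: int   # measured poly-A tail length (0 if absent)
    capped: bool     # whether a 5' cap was detected


def fraction(records: List[MrnaRecord],
             predicate: Callable[[MrnaRecord], bool]) -> float:
    """Fraction of records satisfying the predicate; 0.0 for an empty batch."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)


def passes_quality(records: List[MrnaRecord],
                   min_fraction: float = 0.90,
                   min_length_nt: int = 3000,
                   min_poly_a_nt: int = 30) -> Dict[str, bool]:
    """Evaluate length, poly-A, and cap criteria against one minimum fraction."""
    observed = {
        "length": fraction(records, lambda r: r.length_nt > min_length_nt),
        "poly_a": fraction(records, lambda r: r.poly_a_nt >= min_poly_a_nt),
        "cap": fraction(records, lambda r: r.capped),
    }
    return {name: value >= min_fraction for name, value in observed.items()}


# Hypothetical batch of ten molecules; not data from the disclosure.
batch = (
    [MrnaRecord(length_nt=4200, poly_a_nt=100, capped=True) for _ in range(8)]
    + [MrnaRecord(length_nt=4200, poly_a_nt=100, capped=False)]
    + [MrnaRecord(length_nt=1500, poly_a_nt=100, capped=False)]
)
result = passes_quality(batch)
```

In this example batch, nine of ten molecules exceed the length threshold and all carry a sufficient poly-A tail, but only eight of ten are capped, so the cap criterion is not met at a 90% minimum fraction.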

Kit, article of manufacture and pharmaceutical composition

In one aspect, the disclosure provides a kit comprising a Gene Writer or a Gene Writing system, e.g., as described herein. In some embodiments, the kit comprises a Gene Writer polypeptide (or a nucleic acid encoding the polypeptide) and a template RNA (or DNA encoding the template RNA). In some embodiments, the kit further comprises reagents for introducing the system into a cell, such as transfection reagents, LNPs, and the like. In some embodiments, the kit is suitable for use in any of the methods described herein. In some embodiments, the kit comprises one or more elements, compositions (e.g., pharmaceutical compositions), Gene Writers, and/or Gene Writer systems, or functional fragments or components thereof, e.g., disposed in an article of manufacture. In some embodiments, the kit comprises instructions for its use.

In one aspect, the present disclosure provides an article of manufacture, e.g., having disposed therein a kit or component thereof as described herein.

In one aspect, the present disclosure provides a pharmaceutical composition comprising a Gene Writer or a Gene Writing system, e.g., as described herein. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier or excipient. In some embodiments, the pharmaceutical composition comprises a template RNA and/or an RNA encoding a polypeptide. In embodiments, the pharmaceutical composition has one or more (e.g., 1, 2, 3, or 4) of the following features:

(a) Less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) of the DNA template relative to the template RNA and/or RNA encoding the polypeptide, e.g., on a molar basis;

(b) Less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) of uncapped RNA relative to the template RNA and/or RNA encoding the polypeptide, e.g., on a molar basis;

(c) Less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) of the partial length RNA relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;

(d) Essentially devoid of unreacted cap dinucleotides.
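As an illustration of features (a) to (c), each impurity can be expressed as a molar percentage of the template RNA and/or polypeptide-encoding RNA and compared against a limit. The following is a minimal sketch; the function names, the dictionary keys, the use of picomoles, and the example amounts are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Dict


def molar_percent(impurity_pmol: float, rna_pmol: float) -> float:
    """Impurity amount as a percentage of the RNA amount, on a molar basis."""
    if rna_pmol <= 0:
        raise ValueError("RNA amount must be positive")
    return 100.0 * impurity_pmol / rna_pmol


def meets_impurity_limits(impurities_pmol: Dict[str, float],
                          rna_pmol: float,
                          limit_percent: float = 1.0) -> Dict[str, bool]:
    """True for each impurity whose molar percentage is below the limit."""
    return {
        name: molar_percent(amount, rna_pmol) < limit_percent
        for name, amount in impurities_pmol.items()
    }


# Hypothetical measurements in picomoles; not data from the disclosure.
rna_amount_pmol = 1000.0
measured = {
    "dna_template": 2.0,         # 0.2% of the RNA amount
    "uncapped_rna": 15.0,        # 1.5% of the RNA amount
    "partial_length_rna": 4.0,   # 0.4% of the RNA amount
}
status = meets_impurity_limits(measured, rna_amount_pmol)
```

Passing a lower `limit_percent` (e.g., 0.5 or 0.1) checks the tighter limits recited in the parentheticals above.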

Chemistry, manufacture and control (CMC)

Purification of protein therapeutics is described, for example, in Franks, Protein Biotechnology: Isolation, Characterization, and Stabilization, Humana Press (2013); and Cutler, Protein Purification Protocols (Methods in Molecular Biology), Humana Press (2010).

In some embodiments, the Gene Writer™ system, polypeptide, and/or template nucleic acid (e.g., template RNA) meets certain quality criteria. In some embodiments, the Gene Writer™ system, polypeptide, and/or template nucleic acid (e.g., template RNA) produced by the methods described herein meets certain quality criteria. Thus, in some aspects, the disclosure relates to methods of manufacturing a Gene Writer™ system, polypeptide, and/or template nucleic acid (e.g., template RNA) that meets certain quality criteria, for example, wherein the quality criteria have been determined. In some aspects, the disclosure also relates to methods of determining the quality criteria in a Gene Writer™ system, polypeptide, and/or template nucleic acid (e.g., template RNA). In some embodiments, the quality criteria include, but are not limited to, one or more of the following (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12):

(i) The length of the template RNA, e.g., whether the length of the template RNA is greater than a reference length or within a reference length range, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% of the template RNA present is greater than 100, 125, 150, 175 or 200 nucleotides in length;

(ii) The presence, absence and/or length of a poly-A tail on the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% of the template RNA present contains a poly-A tail (e.g., a poly-A tail of at least 5, 10, 20, 30, 50, 70, or 100 nucleotides in length);

(iii) The presence, absence and/or type of a 5 'cap on the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% of the template RNA present contains a 5' cap, e.g., whether the cap is a 7-methylguanosine cap, e.g., an O-Me-m7G cap;

(iv) The presence, absence and/or type of one or more modified nucleotides in the template RNA (e.g., selected from pseudouridine, dihydrouridine, inosine, 7-methylguanosine, 1-N-methylpseudouridine (1-Me- ψ), 5-methoxyuridine (5-MO-U), 5-methylcytidine (5 mC), or locked nucleotides), e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% of the template RNA present contains one or more modified nucleotides;

(v) Stability of the template RNA (e.g., over time and/or under preselected conditions), such as whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA remains intact (e.g., greater than 100, 125, 150, 175, or 200 nucleotides in length) after stability testing;

(vi) Efficacy of a template RNA in a system for modifying DNA, e.g., whether at least 1% of target sites are modified after determining the efficacy of a system comprising the template RNA;

(vii) The length of the polypeptide, first polypeptide, or second polypeptide, e.g., whether the length of the polypeptide, first polypeptide, or second polypeptide exceeds a reference length or is within a reference length range, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide present is greater than 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1600, 1700, 1800, 1900, or 2000 amino acids in length (and optionally, no more than 2500, 2000, 1500, 1400, 1300, 1200, 1100, 1000, 900, 800, 700, or 600 amino acids in length);

(viii) The presence, absence, and/or type of post-translational modification on the polypeptide, the first polypeptide, or the second polypeptide, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, the first polypeptide, or the second polypeptide contains phosphorylation, methylation, acetylation, myristoylation, palmitoylation, prenylation, glypiation, or lipidation, or any combination thereof;

(ix) The presence, absence and/or type of one or more artificial, synthetic or non-canonical amino acids (e.g., selected from ornithine, β-alanine, GABA, δ-aminolevulinic acid, PABA, a D-amino acid (e.g., D-alanine or D-glutamic acid), aminoisobutyric acid, dehydroalanine, cystathionine, lanthionine, methylcystine, diaminopimelic acid, homoalanine, norvaline, norleucine, homonorleucine, homoserine, O-methyl-homoserine and O-ethyl-homoserine, ethionine, selenocysteine, selenohomocysteine, selenomethionine, selenoethionine, tellurocysteine or telluromethionine) in the polypeptide, the first polypeptide, or the second polypeptide, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, the first polypeptide, or the second polypeptide contains one or more artificial, synthetic or non-canonical amino acids;

(x) Stability of the polypeptide, first polypeptide, or second polypeptide (e.g., over time and/or under preselected conditions), such as whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide remains intact after stability testing (e.g., greater than 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1600, 1700, 1800, 1900, or 2000 amino acids (and optionally, no more than 2500, 2000, 1500, 1400, 1300, 1200, 1100, 1000, 900, 800, 700, or 600 amino acids) in length);

(xi) Efficacy of the polypeptide, the first polypeptide or the second polypeptide in a system for modifying DNA, e.g., whether at least 1% of the target sites are modified after the efficacy of the system comprising the polypeptide, the first polypeptide or the second polypeptide is determined; or

(xii) The presence, absence, and/or level of one or more of a pyrogen, virus, fungus, bacterial pathogen, or host cell protein, e.g., whether the system is free or substantially free of pyrogen, virus, fungus, bacterial pathogen, or host cell protein contamination.

In some embodiments, the systems or pharmaceutical compositions described herein are free of endotoxin.

In some embodiments, the presence, absence and/or level of one or more of a pyrogen, a virus, a fungus, a bacterial pathogen and/or a host cell protein is determined. In embodiments, a determination is made as to whether the system is free or substantially free of pyrogens, viruses, fungi, bacterial pathogens, and/or host cell protein contamination.

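In some embodiments, a pharmaceutical composition or system as described herein has one or more of the following features (e.g., 1, 2, 3, or 4):

(a) Less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) of the DNA template relative to the template RNA and/or RNA encoding the polypeptide, e.g., on a molar basis;

(b) Less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) of uncapped RNA relative to the template RNA and/or RNA encoding the polypeptide, e.g., on a molar basis;

(c) Less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) of the partial length RNA relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;

(d) Essentially devoid of unreacted cap dinucleotides.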

Applications

The invention also provides uses (methods) for modifying a DNA molecule (e.g., nuclear DNA), whether in vitro, ex vivo, in situ, or in vivo, e.g., in tissue of an organism (e.g., a subject, including mammalian subjects, e.g., a human), using the systems described herein, optionally using any of the delivery modes described herein (including viral delivery modes, e.g., AAV). By integrating coding genes into the RNA sequence template, the Gene Writer system can address therapeutic needs, for example, by providing expression of a therapeutic transgene in individuals with loss-of-function mutations, by replacing gain-of-function mutations with normal transgenes, by providing regulatory sequences to eliminate expression of gain-of-function mutations, and/or by controlling expression of operably linked genes, transgenes, and systems thereof. In certain embodiments, the RNA sequence template encodes a promoter region, such as a tissue-specific promoter or enhancer, specific to the therapeutic needs of the host cell. In certain embodiments, the template nucleic acid encodes a promoter region, e.g., a tissue-specific promoter or enhancer, specific to the therapeutic needs of the host cell. In yet other embodiments, the promoter may be operably linked to a coding sequence, e.g., for therapeutic intervention.

In certain aspects, the invention provides methods of modifying a target DNA strand in a cell, tissue, or subject, comprising administering to the cell, tissue, or subject a system as described herein (optionally by means described herein), wherein the system inserts a heterologous subject sequence into the target DNA strand, thereby modifying the target DNA strand. In certain embodiments, the heterologous subject sequence is thus expressed in the cell, tissue, or subject. In some embodiments, the cell, tissue, or subject is a mammalian (e.g., human) cell, tissue, or subject. Exemplary cells so modified include hepatocytes, lung epithelial cells, and fibroblasts. Such cells may be primary cells or non-immortalized cells. In a related aspect, the invention also provides methods of treating a tissue of a mammal, the methods comprising administering to the mammal a system described herein, thereby treating the tissue, wherein the tissue lacks the heterologous subject sequence. In certain embodiments of any of the preceding aspects and embodiments, the Gene Writer polypeptide is provided as a nucleic acid that is transiently present.

In some embodiments, the systems of the present invention are capable of producing insertions, substitutions, deletions, or combinations thereof in a target DNA. In some embodiments, the insertion, deletion, substitution, or combination thereof increases or decreases expression (e.g., transcription or translation) of a gene. In some embodiments, the insertion, deletion, substitution, or combination thereof increases or decreases expression (e.g., transcription or translation) of a gene by altering, adding, or deleting sequences in a promoter or enhancer (e.g., sequences that bind transcription factors). In some embodiments, the insertion, deletion, substitution, or combination thereof alters translation of the gene (e.g., alters an amino acid sequence), inserts or deletes a start or stop codon, or alters or fixes the translational reading frame of the gene. In some embodiments, the insertion, deletion, substitution, or combination thereof alters splicing of a gene, e.g., by inserting, deleting, or altering splice acceptor or donor sites. In some embodiments, the insertion, deletion, substitution, or combination thereof alters the transcript or protein half-life. In some embodiments, the insertion, deletion, substitution, or combination thereof alters protein localization in the cell (e.g., from the cytoplasm to the mitochondria, or from the cytoplasm to the extracellular space (e.g., by adding a secretion tag)). In some embodiments, the insertion, deletion, substitution, or combination thereof alters (e.g., improves) protein folding (e.g., to prevent accumulation of misfolded proteins). In some embodiments, the insertion, deletion, substitution, or combination thereof alters, increases, or decreases the activity of a gene, e.g., the activity of a protein encoded by the gene.

In embodiments, the Gene Writer™ gene editor system may provide therapeutic transgenes that express, for example, replacement blood factors or replacement enzymes, e.g., lysosomal enzymes. For example, the compositions, systems, and methods described herein can be used to express, in a target human genome, agalsidase alpha or beta to treat Fabry disease; imiglucerase, taliglucerase alfa, velaglucerase alfa, or alglucerase for Gaucher disease; sebelipase alpha for lysosomal acid lipase deficiency (Wolman disease/CESD); laronidase, idursulfase, elosulfase alpha, or galsulfase for mucopolysaccharidoses; or alglucosidase alpha for Pompe disease. For example, the compositions, systems, and methods described herein may be used to express factor I, II, V, VII, X, XI, XII, or XIII in a target human genome to ameliorate a blood factor deficiency.

In some embodiments, the heterologous subject sequence encodes an intracellular protein (e.g., a cytoplasmic protein, a nuclear protein, an organelle protein such as a mitochondrial protein or a lysosomal protein, or a membrane protein). In some embodiments, the heterologous subject sequence encodes a membrane protein, e.g., a membrane protein other than a CAR and/or an endogenous human membrane protein. In some embodiments, the heterologous subject sequence encodes an extracellular protein. In some embodiments, the heterologous subject sequence encodes an enzyme, a structural protein, a signaling protein, a regulatory protein, a transport protein, a sensory protein, a motor protein, a defense protein, or a storage protein. Other proteins include immunoreceptor proteins, for example synthetic immunoreceptor proteins such as chimeric antigen receptor proteins (CARs), T cell receptors, B cell receptors, or antibodies.

The Gene Writing™ system can be used to modify immune cells. In some embodiments, the Gene Writing™ system can be used to modify T cells. In some embodiments, the T cells may include any T cell subpopulation, such as CD4+, CD8+, gamma-delta, naive T cells, stem cell memory T cells, central memory T cells, or a mixture of subpopulations. In some embodiments, the Gene Writing™ system may be used to deliver or modify T cell receptors (TCRs) in T cells. In some embodiments, the Gene Writing™ system can be used to deliver at least one chimeric antigen receptor (CAR) to T cells. In some embodiments, the Gene Writing™ system can be used to deliver at least one CAR to natural killer (NK) cells. In some embodiments, the Gene Writing™ system can be used to deliver at least one CAR to natural killer T (NKT) cells. In some embodiments, the Gene Writing™ system can be used to deliver at least one CAR to a progenitor cell, such as a progenitor cell of T, NK, or NKT cells. In some embodiments, cells modified with at least one CAR (e.g., CAR-T cells, CAR-NK cells, CAR-NKT cells) or a combination of cells modified with at least one CAR (e.g., a mixture of CAR-NK/T cells) are used to treat a disorder as described in MacKay, et al. Nat Biotechnol 38, 233-244 (2020), which is incorporated herein by reference in its entirety. 
In some embodiments, the immune cell comprises a CAR specific for a tumor or pathogen antigen selected from the group consisting of: AChR (fetal acetylcholine receptor), ADGRE2, AFP (alpha fetoprotein), BAFF-R, BCMA, CAIX (carbonic anhydrase IX), CCR1, CCR4, CEA (carcinoembryonic antigen), CD3, CD5, CD8, CD7, CD10, CD13, CD14, CD15, CD19, CD20, CD22, CD30, CD33, CLL1, CD34, CD38, CD41, CD44, CD49f, CD56, CD61, CD64, CD68, CD70, CD74, CD99, CD117, CD123, CD133, CD138, CD44v6, CD267, CD269, CDs, CLEC12A, CS1, EGP-2 (epithelial glycoprotein-2), EGP-40, EGFR (HER1), EGFR-vIII, EpCAM (epithelial cell adhesion molecule), EphA2, ERBB2 (HER2; human epidermal growth factor receptor 2), ERBB3, ERBB4, FBP (folate binding protein), Flt3 receptor, folate receptor-α, GD2 (ganglioside G2), GD3 (ganglioside G3), GPC3 (glypican 3), GP100, hTERT (human telomerase reverse transcriptase), ICAM-1, integrin B7, interleukin 6 receptor, IL13Ra2 (interleukin 13 receptor subunit α-2), kappa-light chain, KDR (kinase insert domain receptor), LeY (Lewis Y), L1CAM (L1 cell adhesion molecule), LILRB2 (leukocyte immunoglobulin-like receptor B2), MART1, MAGE-A1 (melanoma-associated antigen A1), MAGE-A3, MSLN (mesothelin), MUC16 (mucin 16), MUC1 (mucin 1), NKG2D ligand, NY-ESO-1 (cancer-testis antigen), PR1 (proteinase 3), TRBC1, TRBC2, TIM-3, TACI, tyrosinase, survivin, hTERT, oncofetal antigen (h5T4), p53, PSCA (prostate stem cell antigen), PSMA (prostate specific membrane antigen), hROR1, TAG-72 (tumor-associated glycoprotein 72), VEGF-R2 (vascular endothelial growth factor R2), WT-1 (Wilms tumor protein), and antigens of HIV (human immunodeficiency virus), hepatitis B, hepatitis C, CMV (cytomegalovirus), EBV (Epstein-Barr virus), and HPV (human papillomavirus).

In some embodiments, immune cells, such as T cells, NK cells, NKT cells, or progenitor cells, are modified ex vivo and then delivered to the patient. In some embodiments, the Gene Writer™ system is delivered by one of the methods mentioned herein, and immune cells, such as T cells, NK cells, NKT cells, or progenitor cells, are modified in vivo in the patient.

In some embodiments, the Gene Writer™ system described herein is delivered to tissue or cells from the brain, cerebellum, adrenal gland, ovary, pancreas, parathyroid, pituitary, testis, thyroid, breast, spleen, tonsils, thymus, lymph nodes, bone marrow, lung, myocardium, esophagus, stomach, small intestine, colon, liver, salivary gland, kidney, prostate, blood, or other cell or tissue types. In some embodiments, the Gene Writer™ system described herein is used to treat diseases such as cancer, inflammatory diseases, infectious diseases, genetic defects, or other diseases. The cancer may be a cancer of the brain, cerebellum, adrenal gland, ovary, pancreas, parathyroid, pituitary, testis, thyroid, breast, spleen, tonsils, thymus, lymph nodes, bone marrow, lung, myocardium, esophagus, stomach, small intestine, colon, liver, salivary gland, kidney, prostate, blood, or other cell or tissue types, and may include a variety of cancers.

In some embodiments, the Gene Writer™ system described herein is administered by enteral administration (e.g., oral, rectal, gastrointestinal, sublingual, sublabial, or buccal administration). In some embodiments, the Gene Writer™ system described herein is administered parenterally (e.g., by intravenous, intramuscular, subcutaneous, intradermal, epidural, intracerebral, intracerebroventricular, epidermal, nasal, intraarterial, intra-articular, intracavernosal, intraocular, intraosseous infusion, intraperitoneal, intrathecal, intrauterine, intravaginal, intravesical, perivascular, or transmucosal administration). In some embodiments, the Gene Writer™ system described herein is administered by topical administration (e.g., transdermal administration).

In some embodiments, the Gene Writing system can be used to make multiple modifications to a target cell, simultaneously or sequentially. In some embodiments, the Gene Writing system can be used to further modify an already modified cell. In some embodiments, the Gene Writing system can be used to modify cells edited by complementary techniques, e.g., gene-edited cells, e.g., cells with one or more CRISPR knockouts. In some embodiments, the previously edited cell is a T cell. In some embodiments, the previous modification includes gene knockout of, for example, endogenous TCR (e.g., TRAC, TRBC), HLA class I (B2M), PD1, CD52, CTLA-4, TIM-3, LAG-3, or DGK in T cells. In some embodiments, the Gene Writing system is used to insert a TCR or CAR into a previously modified T cell.

In some embodiments, a Gene Writer™ system as described herein can be used to modify animal cells, plant cells, or fungal cells. In some embodiments, a Gene Writer™ system as described herein can be used to modify mammalian cells (e.g., human cells). In some embodiments, a Gene Writer™ system as described herein can be used to modify cells from livestock animals (e.g., cattle, horses, sheep, goats, pigs, llamas, alpacas, camels, yaks, chickens, ducks, geese, or ostriches). In some embodiments, a Gene Writer™ system as described herein can be used as a laboratory or research tool, or in a laboratory or research method, for example, to modify animal cells, such as mammalian cells (e.g., human cells), plant cells, or fungal cells.

In some embodiments, a Gene Writer™ system as described herein can be used to express a protein, template, or heterologous subject sequence (e.g., in an animal cell, e.g., a mammalian cell (e.g., a human cell), a plant cell, or a fungal cell). In some embodiments, a Gene Writer™ system as described herein can be used to express a protein, template, or heterologous subject sequence under the control of an inducible promoter (e.g., a small molecule inducible promoter). In some embodiments, the Gene Writing system or its payload is designed for tunable control, for example, by using an inducible promoter. For example, a promoter (e.g., Tet) driving a gene of interest may be silent upon integration, but in some cases may be activated upon exposure to a small molecule inducer (e.g., doxycycline). In some embodiments, the tunable expression allows post-treatment control of a gene (e.g., a therapeutic gene), e.g., allowing for a small molecule-dependent dosing effect. In embodiments, the small molecule-dependent dosing effect includes altering the level of the gene product temporally and/or spatially, e.g., by local administration. In some embodiments, the promoters used in the systems described herein may be inducible, e.g., responsive to endogenous molecules of the host and/or exogenous small molecules administered thereto.

In some embodiments, the Gene Writer system is used to alter a non-coding region and/or a regulatory control region, e.g., to regulate expression of an endogenous gene. In some embodiments, the Gene Writer system is used to induce up- or down-regulation of gene expression. In some embodiments, the regulatory control region comprises one or more of a promoter, enhancer, UTR, CTCF site, and/or gene expression control region.

In some embodiments, the Gene Writer system can be used to treat or prevent, or reduce the severity or symptoms of, a repeat expansion disease (e.g., a disease of table 26). In some embodiments, the repeat expansion disease comprises expansion of trinucleotide repeats. In some embodiments, the subject has at least 10, 20, 30, 40, or 50 copies of the repeat sequence. In embodiments, the repeat expansion disease is a genetic disease. Non-limiting examples of repeat expansion diseases include Huntington's disease (HD) and myotonic dystrophy. For example, healthy individuals may possess 10 to 35 tandem copies of the CAG trinucleotide repeat sequence, whereas Huntington's disease patients typically possess >40 copies, which may lead to, for example, an elongated and dysfunctional huntingtin protein. In some embodiments, the Gene Writer corrects a repeat expansion, for example, by recognizing DNA at the end of the repeat region and nicking one strand (fig. 30). In some embodiments, the template RNA component of the Gene Writer comprises a region having a number of repeats characteristic of a healthy subject, for example about 20 repeats (e.g., 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40 repeats). In some embodiments, the template RNA component of the Gene Writer is copied into the target site by TPRT. In some embodiments, second strand nicking and second strand synthesis then result in the integration of newly copied DNA comprising the correct number of repeat sequences (e.g., as described herein). In some embodiments, the system recognizes DNA at the end of the repeat region and the template carries the information for the new number of repeats. In embodiments, the Gene Writer may be used in this manner regardless of the number of repeats present in an individual and/or in individual cells. 
Because of the presence of multiple repeat sequences, in some embodiments, alternative non-Gene Writer therapeutics (e.g., CRISPR-based homologous recombination therapeutics) may result in unpredictable repair behavior. Other non-limiting examples of repeat expansion diseases and pathogenic repeat sequences can be found, for example, in La Spada and Taylor, Nat Rev Genet 11(4): 247-258 (2010), which is incorporated herein by reference in its entirety.
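
To make the repeat counts quoted above concrete, the following minimal Python sketch counts the longest uninterrupted run of CAG units in a DNA string and compares it with the ranges given in the text (10 to 35 copies in healthy individuals, more than 40 copies in Huntington's disease patients). The function names, constant names, labels, and the handling of counts that fall outside both quoted ranges are illustrative assumptions; the sketch is not part of the disclosed system and is not a diagnostic method.

```python
import re

# Ranges quoted in the text; the constant names are hypothetical.
HEALTHY_MIN, HEALTHY_MAX = 10, 35
EXPANDED_ABOVE = 40


def longest_cag_run(seq: str) -> int:
    """Return the number of CAG units in the longest uninterrupted tandem run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(n: int) -> str:
    """Label a repeat count using only the ranges quoted in the text."""
    if n > EXPANDED_ABOVE:
        return "expanded"
    if HEALTHY_MIN <= n <= HEALTHY_MAX:
        return "healthy range"
    # The text gives no label for other counts; this label is an assumption.
    return "outside quoted ranges"


if __name__ == "__main__":
    # Hypothetical allele: 45 CAG units between arbitrary flanking bases.
    allele = "GCC" + "CAG" * 45 + "CCGCCA"
    n = longest_cag_run(allele)
    print(n, classify_repeat_count(n))
```

In this sketch, a template carrying about 20 repeats would fall in the "healthy range" label, consistent with the example template length mentioned above.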

In some embodiments, the Gene Writing system can be used to treat healthy individuals, for example as a prophylactic therapy. In some embodiments, the Gene Writer system may target mutations, e.g., mutations that have been shown to have protective effects against a disease of interest. An exemplary list of such diseases and protective mutant targets can be found in table 22.

The Gene Writing™ system may be used for treating liver indications. In some embodiments, preferred indications for Gene Writing™ include, but are not limited to, a disease selected from any of tables 10A-10D or 11A-11G or table 5 of WO 2020014209 (which is incorporated herein by reference). In exemplary embodiments, OTC deficiency is addressed by delivering all or a fragment of an OTC gene, e.g., an OTC gene contained in table 5 of WO 2020014209. In some embodiments, OTC deficiency is addressed by delivering to the genome a complete OTC gene expression cassette that complements the function of the mutant gene. In some embodiments, fragments of the OTC gene are used to replace pathogenic mutations at their endogenous loci. In other embodiments, the Gene Writing™ system is used to address a condition selected from column 6 of table 4, the following "lung disease" paragraphs or any of tables 10A-10D or 11A-11G of WO 2020014209 by delivering all or a fragment of the gene expression cassette encoding the corresponding gene shown in column 1 of table 4, the following "lung disease" paragraphs or any of tables 10A-10D or 11A-11G of WO 2020014209. In some embodiments, all or a fragment of the gene expression cassette is delivered to the endogenous locus of a pathogenic mutation. In some embodiments, all or a fragment of the gene expression cassette is integrated at a separate locus in the genome and complements the function of the mutated gene.

In certain embodiments, the Gene Writer™ system provides heterologous subject sequences comprising the genes in table 4 of WO 2020014209 or the following "lung disease" paragraph.

Table 10A: Indications and genetic targets, e.g., in the liver

Table 10B: Indications and genetic targets for HSCs

Table 10C: Indications and genetic targets for the CNS

Table 10D: Indications and genetic targets for the eye

In some embodiments, the Gene Writer system described herein is used to treat an indication of any of tables 11A-11G. For example, in some embodiments, the Gene Writer system modifies a target site in genomic DNA in a cell, wherein the cell comprises a mutation in a gene of any of tables 11A-11G, e.g., in a subject having a corresponding indication listed in any of tables 11A-11G. In some embodiments, the target site is in a random site in the genome. In some embodiments, the target site is in a GSH sequence. In some embodiments, the cell is a hepatocyte comprising a mutation in a gene of table 11A, e.g., in a subject having the corresponding indication listed in table 11A. In some embodiments, the cell is an HSC comprising a mutation in a gene of table 11B, e.g., in a subject having the corresponding indication listed in table 11B. In some embodiments, the cell is a CNS cell comprising a mutation in a gene of table 11C, e.g., in a subject having the corresponding indication listed in table 11C. In some embodiments, the cell is an eye cell comprising a mutation in a gene of table 11D, e.g., in a subject having the corresponding indication listed in table 11D. In some embodiments, the cell is a lung cell comprising a mutation in a gene of table 11E, e.g., in a subject having the corresponding indication listed in table 11E. In some embodiments, the cell is a muscle cell (e.g., a skeletal muscle cell) comprising a mutation in a gene of table 11F, e.g., in a subject having the corresponding indication listed in table 11F. In some embodiments, the cell is a skin cell comprising a mutation in a gene of table 11G, e.g., in a subject having the corresponding indication listed in table 11G.

Table 11A: Indications and genetic targets for the liver

Table 11B: Indications and genetic targets for HSCs

Table 11C: Indications and genetic targets for the CNS

Disease	Affected gene
Cornelia de Lange syndrome (SMC1A)	SMC1A
Neurofibromatosis type 1	NF1
Neurofibromatosis type 2	NF2
Rett syndrome (MECP2)	MECP2

Table 11D: Indications and genetic targets for the eye

Disease	Affected gene
Cone-rod dystrophy (CRX)	CRX
Cone-rod dystrophy (GUCY2D)	GUCY2D
Lattice corneal dystrophy type I	TGFBI
Retinitis pigmentosa (AD)	RHO
Vitelliform macular dystrophy	BEST1; PRPH2

Table 11E: Indications and genetic targets for the lung

Table 11F: Indications and genetic targets for skeletal muscle

Table 11G: Indications and genetic targets for the skin

Other suitable indications

Exemplary suitable diseases and disorders that can be treated by the systems or methods provided herein (e.g., those comprising Gene Writers) include, but are not limited to: Baraitser-Winter syndromes 1 and 2; diabetes mellitus and insipidus with optic atrophy and deafness; alpha-1-antitrypsin deficiency; heparin cofactor II deficiency; adrenoleukodystrophy; Keppen-Lubinsky syndrome; Treacher Collins syndrome 1; mitochondrial complex I, II, III, III (nuclear type 2, 4, or 8) deficiency; hypermanganesemia with dystonia, polycythemia, and cirrhosis; intestinal cancer; rhabdoid tumor predisposition syndrome 2; Wilson disease; hyperphenylalaninemia, BH4-deficient, A, due to partial PTS deficiency, BH4-deficient, D, and non-PKU; hyperinsulinemic hypoglycemia, familial, 3, 4, and 5; keratosis follicularis; oral-facial-digital syndrome; SeSAME syndrome; deafness, nonsyndromic sensorineural, mitochondrial; proteinuria; insulin-dependent diabetes mellitus secretory diarrhea syndrome; moyamoya disease 5; Diamond-Blackfan anemia 1, 5, 8, and 10; pseudoachondroplastic spondyloepiphyseal dysplasia syndrome; brittle cornea syndrome 2; methylmalonic acidemia with homocystinuria; Adams-Oliver syndrome 5 and 6; autosomal recessive agammaglobulinemia 2; cortical malformations, occipital; febrile seizures, familial, 11; mucopolysaccharidosis type VI (severe) and type VII; Marden-Walker-like syndrome; pseudoneonatal adrenoleukodystrophy; spheroid body myopathy; cleidocranial dysostosis; multiple cutaneous and mucosal venous malformations; acute infantile liver failure; neonatal intrahepatic cholestasis caused by citrin deficiency; ventricular septal defect 1; oculodentodigital dysplasia; Wilms tumor 1; Weill-Marchesani-like syndrome; renal adysplasia; cataract 1, 4, autosomal dominant, multiple types, with microcornea, Coppock-like, juvenile, with microcornea and glucosuria, and nuclear diffuse nonprogressive; 
odontohypophosphatasia; cerebro-oculo-facio-skeletal syndrome; schizophrenia 15; cerebral amyloid angiopathy, APP-related; hemophagocytic lymphohistiocytosis, familial, 3; porphobilinogen synthase deficiency; episodic ataxia type 2; trichorhinophalangeal syndrome type 3; progressive familial heart block type IB; glioma susceptibility 1; Lichtenstein-Knorr syndrome; X-linked hypohidrotic ectodermal dysplasia; Bartter syndrome types 3, 3 with hypocalciuria, and 4; carbonic anhydrase VA deficiency, hyperammonemia due to; cardiomyopathy; poikiloderma, hereditary fibrosing, with tendon contractures, myopathy, and pulmonary fibrosis; combined D-2- and L-2-hydroxyglutaric aciduria; arginase deficiency; cone-rod dystrophy 2 and 6; Smith-Lemli-Opitz syndrome; mucolipidosis III gamma; Blau syndrome; Werner syndrome; meningioma; iodotyrosyl coupling defect; Dubin-Johnson syndrome; 3-oxo-5α-steroid delta 4-dehydrogenase deficiency; Boucher-Neuhauser syndrome; iron accumulation in the brain; mental retardation, X-linked 102 and syndromic 13; familial pituitary adenoma predisposition; hypoplasia of the corpus callosum; hyperalphalipoproteinemia 2; deficiency of ferroxidase; growth hormone insensitivity with immunodeficiency; Marinesco-Sjögren syndrome; Martsolf syndrome; gaze palsy, familial horizontal, with progressive scoliosis; Mitchell-Riley syndrome; hypocalciuric hypercalcemia, familial, types 1 and 3; Rubinstein-Taybi syndrome; nephritis and deafness syndrome; juvenile retinoschisis; Becker muscular dystrophy; Loeys-Dietz syndrome 1, 2, 3; congenital muscular hypertrophy-cerebral syndrome; familial juvenile gout; spermatogenic failure 11, 3, and 8; orofacial cleft 11 and 7, cleft lip/palate-ectodermal dysplasia syndrome; mental retardation, X-linked, nonspecific, syndromic, Hedera type, and syndromic, Wu type; combined oxidative 
phosphorylation deficiencies 1, 3, 4, 12, 15, and 25; frontotemporal dementia; Kniest dysplasia; familial cardiomyopathy; benign familial hematuria; pheochromocytoma; aminoglycoside-induced deafness; gamma-aminobutyric acid transaminase deficiency; oculocutaneous albinism type IB, type 3, and type 4; renal coloboma syndrome; central nervous system hypomyelination; Hennekam lymphangiectasia-lymphedema syndrome 2; migraine, familial basilar; X-linked distal spinal muscular atrophy; X-linked periventricular heterotopia; microcephaly; mucopolysaccharidoses, MPS-I-H/S, MPS-II, MPS-III-A, MPS-III-B, MPS-III-C, MPS-IV-A, MPS-IV-B; infantile parkinsonism-dystonia; frontotemporal dementia with TDP43 inclusions, TARDBP-related; hereditary diffuse gastric cancer; sialidosis types I and II; microcephaly-capillary malformation syndrome; hereditary breast and ovarian cancer syndrome; brain small vessel disease with hemorrhage; non-ketotic hyperglycinemia; Navajo neurohepatopathy; auriculocondylar syndrome 2; spastic paraplegia 15, 2, 3, 35, 39, 4, autosomal dominant, 55, autosomal recessive, and 5A; autosomal recessive cutis laxa types IA and IB; hemolytic anemia, nonspherocytic, due to glucose phosphate isomerase deficiency; progeria syndrome; familial amyloid nephropathy with urticaria and deafness; supravalvar aortic stenosis; diffuse palmoplantar keratoderma, Bothnian type; heart-hand syndrome; Coffin-Siris/intellectual disability; left-right axis malformations; Rapadilino syndrome; nanophthalmos 2; craniosynostosis and dental anomalies; paragangliomas 1; Snyder-Robinson syndrome; ventricular fibrillation; activated PI3K-delta syndrome; Howel-Evans syndrome; Larsen syndrome, dominant type; Van Maldergem syndrome 2; MYH-associated polyposis; 6-pyruvoyl-tetrahydropterin synthase deficiency; 
Alagille syndromes 1 and 2; lymphangiomyomatosis; muscle-eye-brain disease; WFS1-related disorders; primary hypertrophic osteoarthropathy, autosomal recessive 2; infertility; Nestor-Guillermo progeria syndrome; mitochondrial trifunctional protein deficiency; hypoplastic left heart syndrome 2; primary dilated cardiomyopathy; retinitis pigmentosa; Hirschsprung disease 3; hereditary thrombotic thrombocytopenic purpura; Desbuquois dysplasia 2; diarrhea 3 (secretory sodium, congenital, syndromic) and 5 (with tufting enteropathy, congenital); pachyonychia congenita 4 and type 2; cerebral autosomal dominant and recessive arteriopathy with subcortical infarcts and leukoencephalopathy; vitelliform dystrophy; type II, type IV (combined hepatic and myopathic), type V, and type VI; atypical Rett syndrome; atrioventricular septal defect 4; Papillon-Lefèvre syndrome; Leber congenital amaurosis; X-linked hereditary motor and sensory neuropathy; progressive sclerosing poliodystrophy; Goldmann-Favre syndrome; renal-hepatic-pancreatic dysplasia; Pallister-Hall syndrome; amyloidogenic transthyretin amyloidosis; Melnick-Needles syndrome; hyperimmunoglobulin E syndrome; posterior column ataxia with retinitis pigmentosa; chondrodysplasia punctata 1, X-linked recessive, and 2, X-linked dominant; ectopia lentis, isolated, autosomal recessive and dominant; familial cold urticaria; familial adenomatous polyposis 1 and 3; porokeratosis 8, disseminated superficial actinic type; PIK3CA-related overgrowth spectrum; cerebral cavernous malformations 2; exudative vitreoretinopathy 6; megalencephaly cutis marmorata telangiectatica congenita; TARP syndrome; diabetes mellitus, permanent neonatal, with neurologic features; short-rib thoracic dysplasia 11 or 3 with or without polydactyly; hypertrichotic osteochondrodysplasia; beta thalassemia; Niemann-Pick 
disease types C1, C2, A, and C1, adult form; Charcot-Marie-Tooth disease types IB, 2B2, 2C, 2F, 2I, 2U (axonal), 1C (demyelinating), dominant intermediate C, recessive intermediate A, 2A2, 4C, 4D, 4H, IF, IVF, and X; tyrosinemia type I; paroxysmal atrial fibrillation; UV-sensitive syndrome; tooth agenesis, selective, 3 and 4; merosin-deficient congenital muscular dystrophy; long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency; congenital aniridia; left ventricular noncompaction 5; aromatic-L-amino-acid decarboxylase deficiency; coronary heart disease; leukonychia totalis; distal arthrogryposis type 2B; retinitis pigmentosa 10, 11, 12, 14, 15, 17, and 19; Robinow-Sorauf syndrome; Tenorio syndrome; prolactinoma; neurofibromatosis, type 1 and type 2; congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, types A2, A7, A8, A11, and A14; heterotaxy, visceral, 2, 4, and 6, autosomal; Jankovic-Rivera syndrome; lipodystrophy, familial partial, types 2 and 3; hemoglobin H disease, nondeletional; multicentric osteolysis, nodulosis, and arthropathy; hypoplasia of thyroid gland; deficiency of acyl-CoA dehydrogenase family member 9; Alexander disease; phytanic acid storage disease; breast-ovarian cancer, familial 1, 2, and 4; proline dehydrogenase deficiency; childhood hypophosphatasia; pancreatic hypoplasia and congenital heart disease; vitamin D-dependent rickets, types 1 and 2; iridogoniodysgenesis, dominant type and type 1; autosomal recessive hypohidrotic ectodermal dysplasia syndrome; mental retardation, X-linked, 3, 21, 30, and 72; hereditary hemorrhagic telangiectasia type 2; blepharophimosis, ptosis, and epicanthus inversus; adenine phosphoribosyltransferase deficiency; seizures, benign familial infantile, 2; acrodysostosis 2, with or without hormone resistance; tetralogy of Fallot; retinitis 
pigmentosa 2, 20, 25, 35, 36, 38, 39, 4, 40, 43, 45, 48, 66, 7, 70, 72; lysosomal acid lipase deficiency; Eichsfeld type congenital muscular dystrophy; Walker-Warburg congenital muscular dystrophy; TNF receptor-associated periodic fever syndrome (TRAPS); progressive myoclonus epilepsy with ataxia; epilepsy, childhood absence 2, 12 (idiopathic generalized, susceptibility to), 5 (nocturnal frontal lobe), nocturnal frontal lobe type 1, partial, with variable foci, progressive myoclonic 3, and X-linked, with variable learning disabilities and behavior disorders; long QT syndrome; dicarboxylic aminoaciduria; brachydactyly types A1 and A2; pseudoxanthoma elasticum-like disorder with multiple coagulation factor deficiency; multisystemic smooth muscle dysfunction syndrome; syndactyly, Cenani-Lenz type; Joubert syndrome 1, 6, 7, 9/15 (digenic), 14, 16, and 17, and orofaciodigital syndrome XIV; digitorenocerebral syndrome; retinoblastoma; dyskinesia, familial, with facial myokymia; hereditary sensory and autonomic neuropathy types IIB and IIA; familial hyperinsulinism; megalencephalic leukoencephalopathy with subcortical cysts 1 and 2a; Aase syndrome; Wiedemann-Steiner syndrome; exfoliative ichthyosis; myotonia congenita; granulomatous disease, chronic, X-linked, variant; 2-methylbutyryl-CoA dehydrogenase deficiency; sarcoidosis, early onset; glaucoma, congenital, and glaucoma, congenital, coloboma; breast cancer, susceptibility to; neuronal ceroid lipofuscinosis 2, 6, 7, and 10; congenital generalized lipodystrophy type 2; fructose-bisphosphatase deficiency; congenital contractural arachnodactyly; Lynch syndromes I and II; phosphoglycerate dehydrogenase deficiency; Burn-McKeown syndrome; myocardial infarction 1; achromatopsia 2 and 7; retinitis pigmentosa 73; protan defect; polymicrogyria, asymmetric, bilateral frontoparietal; spinal muscular atrophy, distal, autosomal recessive, 
5; methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency; familial porencephaly; Hurler syndrome; oto-palato-digital syndrome, type I and type II; Sotos syndrome 1 or 2; cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency; parastremmatic dwarfism; thyrotropin-releasing hormone resistance, generalized; diabetes mellitus, type 2, and insulin-dependent, 20; thoracic aortic aneurysms and aortic dissection; estrogen resistance; maple syrup urine disease types 1A and 3; hypospadias 1 and 2, X-linked; metachromatic leukodystrophy, juvenile, late infantile and adult types; early T cell progenitor acute lymphoblastic leukemia; hereditary sensory neuropathy, type IC; mental retardation, autosomal dominant inheritance 31; retinitis pigmentosa 39; breast cancer, early onset; May-Hegglin anomaly; Gaucher disease type 1 and subacute neuronopathic; Temtamy syndrome; spinal muscular atrophy, lower extremity predominant 2, autosomal dominant; Fanconi anemia, complementation groups E, I, N and O; alkaptonuria; Hirschsprung disease; combined malonic and methylmalonic aciduria; arrhythmogenic right ventricular cardiomyopathy types 5, 8 and 10; congenital lipomatous overgrowth, vascular malformations and epidermal nevi; Timothy syndrome; guanidinoacetate methyltransferase deficiency; myoclonic dystonia; Kanzaki disease; neutral 1 amino acid transport defect; neurohypophyseal diabetes insipidus; abnormal thyroid hormone metabolism; benign scapuloperoneal muscular dystrophy with cardiomyopathy; hypoglycemia with deficiency of glycogen synthetase in the liver; hypertrophic cardiomyopathy; congenital myasthenic syndrome 11, associated with acetylcholine receptor deficiency; mental retardation, X-linked syndromic 5; Stormorken syndrome; aplastic anemia; mental retardation; normokalemic periodic paralysis, potassium-sensitive; darnon 
disease (Danon disease); nephronophthisis 13, 15 and 4; thyrotoxic periodic paralysis and thyrotoxic periodic paralysis 2; infertility associated with multi-tailed spermatozoa and excessive DNA; glaucoma, primary open angle, juvenile onset; afibrinogenemia and congenital afibrinogenemia; polycystic kidney disease 2, adult type, and infantile type; familial porphyria cutanea tarda; cerebello-oculo-renal syndrome (nephronophthisis, oculomotor apraxia and cerebellar abnormalities); frontotemporal dementia, chromosome 3-linked, and frontotemporal dementia, ubiquitin-positive; abnormal proliferation; immunodeficiency-centromeric instability-facial anomalies syndrome 2; anemia, nonspherocytic hemolytic, due to G6PD deficiency; bronchiectasis with or without elevated sweat chloride 3; congenital myopathy with fiber type disproportion; Carney complex, type 1; cryptorchidism, unilateral or bilateral; ichthyosis bullosa of Siemens; isolated luteinizing hormone deficiency; DFNA2 nonsyndromic hearing loss; Klein-Waardenburg syndrome; gray platelet syndrome; bile acid synthesis defect, congenital, 2; 46,XY sex reversal, types 1, 3 and 5; acute intermittent porphyria; Cornelia de Lange syndrome 1 and 5; hyperglycinuria; cone dystrophy 3; dysfibrinogenemia; Karak syndrome; congenital muscular dystrophy-mental disorder-amyotrophic lateral sclerosis-associated glycoprotein disease, type B5; infantile nystagmus, X-linked; dyskeratosis congenita, autosomal recessive inheritance, 1, 3, 4, and 5; microcephaly with or without chorioretinopathy, lymphedema or mental retardation; hyperlysinemia; Bardet-Biedl syndrome 1, 11, 16 and 19; autosomal recessive centronuclear myopathy; Frasier syndrome; caudal regression syndrome; congenital fibrosis of the extraocular muscles, 1, 2, 3a (with or without extraocular involvement), 3b; Prader-Willi-like syndrome; malignant 
melanoma; bloom syndrome; keratosis folliculitis, segmental; multi-center osteolytic kidney disease; type 1, type 2B and type 3 hemochromatosis; infant cerebellar ataxia and cerebellar ataxia with progressive extraocular muscle paralysis, mental retardation and imbalance syndrome 2; left heart dysplasia syndrome; epilepsy, hearing loss and mental retardation syndrome; quantitative trait locus 2 for serum levels of transferrin; albino eye, type I; marfan syndrome; congenital muscular dystrophy-associated glycoprotein diseases with brain and eye abnormalities, types a14 and B14; hyperammonemia, type III; cryptoeye syndrome; congenital alopecia universalis; adult hypophosphatasia; mannose binding protein deficiency; bullseye macular dystrophy; autosomal dominant torsionally dystonia 4; nephrotic syndrome, type 3, type 5, with or without ocular abnormalities, type 7 and type 9; epileptic seizure, early infant epileptic encephalopathy 7; persistent infant-type insulin hypersecretion hypoglycemia; thrombocytopenia, X linked; neonatal hypotonia; ostwald-vickers-berg (Orstavik Lindemann Solberg) syndrome; pulmonary hypertension, primary, 1, hereditary hemorrhagic telangiectasia; pituitary-dependent hypercortisolism; wart-like epidermodysplasia; focal variant junctional epidermolysis bullosa; cytochrome c oxidase i deficiency; gold de le (Kindler) syndrome; myosclerosis, autosomal recessive inheritance; arterial trunk; retrobulbar syndrome type 2; ADULT syndrome; ji Weige (Zellweger) syndrome spectrum; leukoencephalopathy is accompanied by ataxia, brain stem and spinal cord involvement, elevated lactic acid, white matter ablation and progressive, ovarian failure; antithrombin III deficiency; full forebrain deformity 7; roberts (Roberts) -SC-photophobia syndrome; mitochondrial DNA depletion syndromes 3 and 7, hepatic encephalopathy and 13 (encephalomyopathy); brain punch-through deformity 2; small head deformity, normal intelligence and immunodeficiency; megaaxonal 
neuropathy; steckel-Weber syndrome, capillary malformation, congenital, 1; fabry disease and fabry disease heart variant; deficiency of glutamate iminomethyltransferase; fanconi Bi Keer (Fanconi-Bickel) syndrome; tip-on-tip fine dysplasia; epilepsy, idiopathic systemic, susceptibility, 12; basal ganglia calcification, idiopathic, 4; polysaccharide myopathy 1 with or without immunodeficiency; prostate malignancy; congenital ectodermal dysplasia of the face; congenital heart disease; age-related macular degeneration 3, 6, 11 and 12; congenital myotonia, autosomal dominant and recessive forms; hypomagnesemia 1, intestinal tract; sulfite oxidase deficiency, solitary; pick's disease; type I plasminogen-deficiency; and refers to type 3; cone rod malnutrition enamel hypoplasia; pseudoprimary aldosteronism; end bone dysplasia; neonatal type barter syndrome type 2; congenital muscular dystrophy-associated glycoprotein diseases with mental retardation, type B2, type B3, type B5 and type B15; familial infantile muscular inoperability; lymphoproliferative syndromes 1, 1 (X linked) and 2; hypercholesterolemia and autosomal recessive hypercholesterolemia; ovarian malignancy; infant GM1 ganglioside deposition; syndrome X-linked mental retardation 16; deficiency of 5-phosphoribosyl isomerase; alzheimer's disease, type 1, type 3 and type 4; an anhigan-taylor (Andersen Tawil) syndrome; multiple arthritis syndrome 3; chilblain-like lupus 1; hemophagocytic lymphocytosis, familial, 2; ackersen Fisher-Rieger (Axenfeld-Rieger) syndrome type 3; myopathy, congenital core; osteoarthritis with mild cartilage dysplasia; peroxisome biogenesis disorders; severe congenital neutrophil deficiency; hereditary neuralgia muscular atrophy; focal non-epidermolytic palmoplantar keratosis; abnormal plasminogen disease; familial colorectal cancer; spasticity ataxia 5, autosomal recessive inheritance, charlevoid-Saguenay type, 1, 10 or 11, autosomal recessive inheritance; frontal metaphyseal dysplasia Lan 
De (land) 3; deficiency of genetic factor II, IX, VIII; spinal joint dysplasia, ehlers-Danlos syndrome, immune disorder, proteoglycan type, congenital joint dislocation, short limb hand type, sedaghatian type, cone bar dystrophy, kozlowski type; ichthyosis premature labor syndrome; still (Stickler) syndrome type 1; focal segmental glomerulosclerosis 5; 5-hydroxyproline enzyme deficiency; syndrome small eye deformities 5, 7, and 9; juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome; butyryl-coa dehydrogenase deficiency; adult-onset diabetes mellitus, type 2, in young people; mental retardation, syndrome, claes-Jensen type, X linkage; deafness, cochlea, myopia and intellectual disability, no vestibular involvement, autosomal dominant inheritance, X linkage 2; spinal wrist fusion syndrome; sting-related juvenile onset vascular lesions; neutral fat deposition with myopathy; immune dysfunction 2 in which calcium entry deficiency leads to T cell inactivation; heart-face skin syndrome; corticosterone methyl oxidase type 2 deficiency; hereditary myopathy is accompanied by early respiratory failure; interstitial nephritis, megakaryosis; trimethylaminuria; hyperimmune protein D is associated with periodic fever; malignant Gao Reyi susceptible to type 1; multiple disorders accompanied by mental retardation, dwarfism and retinitis pigmentosa; breast adenocarcinoma; complement factor B deficiency; wu Erli-h (Ullrich) congenital muscular dystrophy; left ventricular densification incomplete cardiomyopathy; fish eye disease; fischer (Finnish) congenital nephrotic syndrome; limb banding muscular dystrophy, type IB, type 2A, type 2B, type 2D, cl, type C5, type C9, type C14; idiopathic fibrosing alveolitis, chronic form; primary familial hypertrophic cardiomyopathy; angiotensin converting enzyme, benign serum elevation; cd8 deficiency, familiarity; pranetti (protein) syndrome; glucose-6-phosphate transport defects; primary-foci-favium (Borjeson-forsman-Lehmann) 
trisomy; Zellweger syndrome; spinal muscular atrophy, type II; prostate cancer, hereditary, 2; thrombocytopenia, platelet dysfunction, hemolysis and imbalance in globin synthesis; congenital disorder of glycosylation types IB, ID, 1G, 1H, 1J, 1K, 1N, 1P, 2C, 2J, 2K, IIm; junctional epidermolysis bullosa, Herlitz type; generalized epilepsy with febrile seizures plus 3, type 1 and type 2; schizophrenia 4; coronary artery disease, autosomal dominant inheritance 2; dyskeratosis congenita, autosomal dominant inheritance, 2 and 5; subcortical laminar heterotopia, X-linked; adenylate kinase deficiency; X-linked severe combined immunodeficiency; coproporphyria; transthyretin-associated amyloid cardiomyopathy; hypocalcemia, autosomal dominant inheritance 1; Brugada syndrome; congenital myasthenic syndrome, acetazolamide-responsive; primary hypomagnesemia; sclerosteosis; frontotemporal dementia and/or amyotrophic lateral sclerosis 3 and 4; mevalonic aciduria; schwannomatosis 2; hereditary motor and sensory neuropathy with optic atrophy; porphyria cutanea tarda; osteochondritis dissecans; seizures, benign familial neonatal, 1, and/or myokymia; long QT syndrome, LQT1 subtype; mental retardation, anterior maxillary protrusion, and strabismus; idiopathic hypercalcemia of infancy; hypogonadotropic hypogonadism 11 with or without anosmia; polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy; primary autosomal recessive microcephaly 10, 2, 3, and 5; interrupted aortic arch; congenital amegakaryocytic thrombocytopenia; Hermansky-Pudlak syndrome 1, 3, 4 and 6; long QT syndrome 1, 2/9, 2/5 (digenic), 3, 5 and 5, acquired, susceptibility to; Andermann syndrome; cone cell dystrophy 3B; erythropoietic protoporphyria; sepiapterin reductase deficiency; very long chain acyl-CoA dehydrogenase deficiency; hyperferritinemia cataract syndrome; Silver spastic paraplegia syndrome; 
charcot-Marie-Tooth disease; atrial septal defect 2; karnaval syndrome; hereditary pain insensitivity with anhidrosis; catecholamine sensitive ventricular rate; low potassium periodic paralysis 1 and 2; sudden infant death syndrome; hypopigmented microcytic anemia with iron overload; GLUT1 lacks syndrome 2; leukodystrophy with myelin dysplasia, 11 and 6; cone total color blindness; autosomal dominant type 1 and type 2, recessive type 4, recessive type 1, recessive type 6; severe congenital neutrophil deficiency 3, autosomal recessive or dominant; methionine adenosyltransferase deficiency, autosomal dominant inheritance; paroxysmal familial ventricular fibrillation; erythrocyte pyruvate kinase deficiency; fatal cartilage dysplasia of newborn; torsade de pointes ventricular tachycardia; the type Markesberg-Griggs of remote myopathy; lack of UDP glucose-hexose-1-phosphate uridyltransferase; sudden cardiac death; neu-Laxova syndrome 1; no transferrin disease; hyperparathyroidism 1 and 2; malignant melanoma of skin 1; sympathetic, proximal, lb; progressive pseudo-rheumatoid dysplasia; wei Deni his-hofmann syndrome; poor cartilage formation type 2; all forebrain deformities 2, 3, 7, and 9; sindre disease type 1; brain retinal microvasculopathy with calcification and cysts; heterogeneity, visceral properties, X-linkage; tuberous sclerosis syndrome; catagen syndrome; thyroid hormone resistance, systemic, autosomal dominant inheritance; autosomal recessive genetic wilting; nail disorder, non-syndromic congenital, type 8; mohr-Tranebjaerg syndrome; cone body malnutrition 12; hearing impairment; ovarian leukodystrophy; proximal tubular acidosis with ocular abnormalities and mental retardation; dihydropteridine reductase deficiency; focal epilepsy with or without language disorder of mental retardation; ataxia-telangiectasia syndrome; brown-viletto-VanLaere syndrome and Brown-viletto-VanLaere syndrome 2; cardiomyopathy; peripheral demyelinating neuropathy, central myelin 
dysplasia; corneal dystrophy, fuchs endothelium, 4; cowden (Cowden) syndrome 3; dystonia 2 (torsions, autosomal recessions), 3 (torsions, X-linked), 5 (dopa-responsive), 10, 12, 16, 25, 26 (myoclonus); epiphyseal dysplasia, multiple, with myopia and conductive deafness; heart conduction defects, nonspecific; gill ear syndromes 2 and 3; peroxisome biogenesis disorders 14B, 2A, 4A, 5B, 6A, 7A and 7B; familial renal diabetes; candidiasis, familial, 2, 5, 6, and 8; autoimmune diseases, multisystems, infant onset; early-stage epileptic encephalopathy 2, 4, 7, 9, 10, 11, 13, and 14; segawa syndrome, autosomal recessive inheritance; deafness, autosomal dominant inheritance 3a, 4, 12, 13, 15, autosomal dominant non-syndromic sensory neurones 17,20 and 65; congenital erythropoiesis anoxia, type I and type II; enhanced s-cone syndrome; adult neuronal ceroid lipofuscinosis; atrial fibrillation, familial, 11, 12, 13, and 16; norum (Norum) disease; osteosarcoma; partial albinism; a biotin enzyme deficiency; cellular and humoral immune complex defects with granuloma; alpers (Alpers) encephalopathy; deficiency of the full carboxylase synthase; juvenile, type 1, type 2, type 11, type 3 and type 9 maturity-onset diabetes; variant porphyrias; infantile cortical bone hyperplasia; testosterone 17-beta-dehydrogenase deficiency; l-2-hydroxyglutarate urine disorder; tyrosinase negative eyelid albinism; primary ciliated dyskinesia 24; 4-pontic cerebellar hypoplasia; ciliary body movement disorders, primary, 7, 11, 15, 20, and 22; basal nuclear calcification 5; brain atrophy; craniosynostosis 1 and 4; keratoconus 1; skin diseases; congenital adrenal hyperplasia and congenital adrenal dysplasia, linked by X; mitochondrial DNA depletion syndromes 11, 12 (cardiomyopathy type), 2, 4B (MNGIE type), 8B (MNGIE type); short fingers with hypertension; applanation cornea 2; askog (Aarskog) syndrome; multiple epiphyseal dysplasia 5 or dominant; corneal endothelial dystrophy type 2; deficiency of 
aminoacylase 1; speech and language retardation; nicolaides-Baraitser syndrome; enterokinase deficiency; congenital hypodactyla, ectodermal dysplasia and cleft lip/palate syndrome 3; congenital multiple joint contracture, distal, X-linked; perrault (Perrault) syndrome 4; yervell and Lange-Nielsen syndrome 2; hereditary non-polyposis colorectal tumor; fetal face syndrome, autosomal inheritance, with short-range congenital (multi) digits; neurofibrosarcoma; cytochrome-c oxidase deficiency; vesicoureteral reflux 8; deficiency of dopamine beta hydroxylase; type I and type II carbohydrate deficiency glycoprotein syndromes; progressive familial intrahepatic cholestasis 3; benign familial neonatal-infant seizures; pancreatitis, chronic, susceptibility; proximal extremity punctate dysplasia type 2 and type 3; a steroid production disorder due to a deficiency of cytochrome p450 oxidoreductase; deafness accompanied by membranous labyrinthine hypoplasia and small tooth malformed deafness (FAMM); rossmand-thomson syndrome; cortical dysplasia, complex, with other brain deformities 5 and 6; muscle weakness, familial infantile form, 1; i-type hair with poor development of the nasal phalanges; worth disease; hypoplasia of the spleen; molybdenum cofactor deficiency, complement group a; sebastin (Sebastin) syndrome; progressive familial intrahepatic cholestasis 2 and 3; well-Ma Qiesa Ni (Weill-Marchesani) syndrome 1 and 3; original dwarfism of type 2 small skull dysplasia; pulmonary surfactant metabolic dysfunctions 2 and 3; heavy X-linked myotubular myopathy; pancreatic cancer 3; platelet hemorrhagic disorders 15 and 8; tyrosinase positive eyelid albinism; borrone Di Rocco Crovato syndrome; ATR-X syndrome; sucrase-isomaltase deficiency; complement component 4, partially absent, due to cl inhibitor dysfunction; congenital central hypoventilation; infant hypophosphatasia; plasminogen activator inhibitor type 1 deficiency; non-hodgkin's malignant lymphoma; 
hyperornithinemia-hyperammonemia-homocitrullinuria syndrome; Schwartz-Jampel syndrome type 1; fetal hemoglobin quantitative trait locus 1; distal myopathy with anterior tibial onset; Noonan syndrome 1 and 4, LEOPARD syndrome 1; glaucoma 1, open angle, e, F and G; Kenny-Caffey syndrome type 2; PTEN hamartoma syndrome; progressive pseudohypertrophic muscular dystrophy; insulin-resistant diabetes mellitus and acanthosis nigricans; microphthalmia, isolated 3, 5, 6, 8 and with coloboma 6; Raine syndrome; premature ovarian failure 4, 5, 7 and 9; Allan-Herndon-Dudley syndrome; citrullinemia type I; Alzheimer's disease, familial, 3, with spastic paraparesis and apraxia; familial hemiplegic migraine type 1 and type 2; ventriculomegaly with cystic kidney disease; pseudoxanthoma elasticum; homocysteinemia due to MTHFR deficiency, CBS deficiency and homocystinuria, pyridoxine-responsive; dilated cardiomyopathy 1A, 1AA, 1C, 1G, 1BB, 1DD, 1FF, 1HH, 1I, 1KK, 1N, 1S, 1Y, and 3B; muscle AMP deaminase deficiency; familial breast cancer; hereditary sideroblastic anemia; myoglobinuria, acute recurrent, autosomal recessive inheritance; neuroferritinopathy; cardiac arrhythmia; glucose transporter type 1 deficiency syndrome; holoprosencephaly sequence; angiopathy, hereditary, with nephropathy, aneurysms and muscle cramps; isovaleryl-CoA dehydrogenase deficiency; Kallmann syndrome 1, 2 and 6; permanent neonatal diabetes mellitus; acrocallosal syndrome, Schinzel type; Gordon syndrome; MYH9-related disorders; Donnai-Barrow syndrome; severe congenital neutropenia 6, autosomal recessive; Charcot-Marie-Tooth disease, type 1D and type 4F; Coffin-Lowry syndrome; mitochondrial 3-hydroxy-3-methylglutaryl-CoA synthase deficiency; hypomagnesemia, seizures and mental retardation; ischiopatellar dysplasia; multiple 
congenital anomaly-hypotonia-seizure syndrome 3; spastic paraplegia 50, autosomal recessive inheritance; short stature with non-specific skeletal abnormalities; severe myoclonic epilepsy in infancy; propionic acidemia; adolescent nephronophthisis; macrocephaly, macrosomia, facial dysmorphism syndrome; Stargardt disease 4; Ehlers-Danlos syndrome type 7 (autosomal recessive inheritance), classic type, type 2 (progeroid), hydroxylysine-deficient, type 4 variant, and due to tenascin-X deficiency; myopia 6; coxa plana; familial cold autoinflammatory syndrome 2; malformations of the heart and great vessels; von Willebrand disease type 2M and type 3; galactokinase deficiency; Brugada syndrome 1; X-linked ichthyosis with steryl-sulfatase deficiency; congenital ocular coloboma; histiocytosis-lymphadenopathy syndrome; aniridia, cerebellar ataxia and mental retardation; left ventricular noncompaction 3; amyotrophic lateral sclerosis types 1, 6, 15 (with or without frontotemporal dementia), type 22 (with or without frontotemporal dementia), and type 10; osteogenesis imperfecta type 12, 5, 7, 8, I, III, with normal sclerae, dominant form, recessive perinatal lethal; hematologic neoplasm; favism, susceptibility to; pulmonary fibrosis and/or bone marrow failure, telomere-related, 1 and 3; dominant hereditary optic atrophy; dominant dystrophic epidermolysis bullosa with absence of skin; muscular dystrophy, congenital, megaconial type; multiple gastrointestinal atresias; McCune-Albright syndrome; nail-patella syndrome; McLeod neuroacanthocytosis syndrome; common variable immunodeficiency 9; partial hypoxanthine-guanine phosphoribosyltransferase deficiency; pseudohypoaldosteronism type 1, autosomal dominant and recessive inheritance, and type 2; uridylic acid synthase deficiency; ectopic; Meckel syndrome type 7; congenital leucocyte granule abnormality syndrome (Chédiak-Higashi syndrome), 
chediak-Higashi syndrome, adult type; severe combined immunodeficiency caused by ADA deficiency with small head malformation, retarded growth, sensitivity to ionizing radiation, atypical, autosomal recessive inheritance, T cell negative, B cell positive, NK cell negative or NK positive; insulin resistance; steroid 11-beta-monooxygenase deficiency; popliteal pterygium syndrome; pulmonary hypertension associated with hereditary hemorrhagic telangiectasia; deafness, autosomal recessive inheritance 1A, 2, 3, 6, 8, 9, 12, 15, 16, 18b, 22, 28, 31, 44, 49, 63, 77, 86, and 89; primary hyperoxaluria, type I, type III; fengyou Lunberg (von Eulenburg) congenital paramyotonia; desbuquois syndrome Kou Si; carnitine lipid acyltransferases I, II (tardive) and II (infant) deficiency; secondary hypothyroidism; mandibular facial bone hypoplasia, treacher kolin (Treacher Collins), autosomal recessive inheritance; cowden (Cowden) syndrome 1; li-flumini (Li-Fraomeni) syndrome 1; asparagine synthetase deficiency; malattialeventines; optic atrophy 9; infantile convulsion and paroxysmal chorea athetosis, familial; ataxia devoid of vitamin E; islet cell proliferation; sanhao (Miyoshi) muscular dystrophy 1; thrombophilia, hereditary, autosomal dominant and recessive due to protein C deficiency; fei Xite sodium (Fechtner) syndrome; preparation of a lysin deficiency, X linkage; mental retardation, notch movement, epilepsy and/or brain deformity; creatine deficiency, X linkage; hair matrix tumor; cyanosis, transient neonates, and atypical kidney disease; ataxia syndrome of adult onset motor eye; hemangiomas, capillary hemangiomas; PC-K6a; systemic dominant dystrophic epidermolysis bullosa; pecies disease (Pelizaeus-Merzbacher); myopathy, central nucleus, 1, congenital, excessive muscle spindle, distal, 1, lactic acidosis and iron granule young cell anemia 1, progressive mitochondrial congenital cataract, hearing loss and developmental retardation, tubular aggregation, 2; benign familial 
neonatal convulsions 1 and 2; primary pulmonary arterial hypertension; primary lymphedema is accompanied by myelodysplasia; congenital long QT syndrome; familial exudative vitreoretinopathy, X-linked; autosomal dominant hypohidrosis ectodermal dysplasia; original dwarfism; familial pulmonary capillary hemangiomatosis; carnitine acyl carnitine transaminase deficiency; visceral myopathy; familial mediterranean fever and familial mediterranean fever, autosomal dominant inheritance; combining the partial and complete 17-alpha-hydroxylase/17, 20-lyase; ear-palatophageal finger syndrome, type I; kidney stones/osteoporosis, hypophosphatemia, 2; familial type 1 and type 3 hyperlipoproteinemia; a phenotype; CHARGE joint deformity; fulmann syndrome; dilute syndrome-lymphedema-telangiectasia syndrome; blomstrand for cartilage dysplasia; acroerythraokaratosderma; nerve conduction velocity is slowed down, and autosomal dominant inheritance is achieved; hereditary cancer susceptibility syndrome; the skeletal development of the cranium is maldeveloped, and the autosomal dominant inheritance is carried out; spinocerebellar ataxia autosomal recessive inheritance 1 and 16; a proprotein convertase 1/3 deficiency; d-2-hydroxyglutarate 2; convulsion syndrome 2 and hereditary convulsion syndrome; central axonopathy; opitz G/BBB syndrome; cystic fibrosis; cellular corneal dystrophy; deficiency of phosphoglycerate mutase; mitochondrial short-chain enoyl-coa hydratase 1 deficiency; ectodermal dysplasia skin fragility syndrome; wolfram (Wolfram) like syndrome, autosomal dominant inheritance; anemia of small cell type; pyruvate carboxylase deficiency; leukocyte adhesion deficiency type I and type III; multiple endocrine adenomas, type 4; transient bullous skin relief in neonates; primrose syndrome; non-small cell lung cancer; congenital muscular dystrophy; mixed esterase deficiency; col-CARPENTER syndrome 2; the junction of the atrioventricular septal defect and the common atrioventricular 
junction; xanthine oxidase deficiency; waldenburg syndrome types 1, 4C and 2E (with involvement of the nervous system); still (stonkler) syndrome, type i (non-synopsis eye) and type 4; keratoconus, blue sclera, and overactive joints; a small spherical lens; chudley-McCullough syndrome; simple epidermolysis bullosa and limb-girdle muscular dystrophy, simple mottle pigmentation, simple pylorus locking, simple autosomal recessive inheritance, and pylorus locking; rett (Rett) disorder; abnormal neuron migration; growth hormone deficiency with abnormal pituitary gland; subacute necrotizing cerebrospinal disease; palmoplantar keratosis striatum 1; wei Senba Hertz-Sissenbacher-Zweymuller syndrome; medium chain acyl-coa dehydrogenase deficiency; UDP glucose-4-epimerase deficiency; susceptibility to autism, X linkage 3; the aperture-derived retinal detachment, autosomal dominant inheritance; familial febrile convulsion 8; ulna and fibula lack severe limb defects; left ventricular cardiac insufficiency 6; centromere instability and immunodeficiency of chromosomes 1,9 and 16; hereditary diffuse leukoencephalopathy with spheroids; cushing's syndrome; dopamine receptor d2, decreased brain density; syndrome C; kidney dysplasia, retinal pigment dystrophy, cerebellar ataxia and skeletal dysplasia; ovarian hypoplasia 1; pearson (Pierson) syndrome; multiple neuropathy, hearing loss, ataxia, retinitis pigmentosa, and cataracts; progressive intrahepatic cholestasis; autosomal dominant inheritance, autosomal recessive inheritance, and X-linked recessive Alport syndrome; angelman (Angelman) syndrome; a Mi Shen (Amish) infant epileptic syndrome; autoimmune lymphoproliferative syndrome, type la; hydrocephalus; equisima-like syndrome (Marfanoid habitus); bare (Bare) lymphocyte syndrome type 2, complementation group E; recessive dystrophic epidermolysis bullosa; factor H, VII, X, v and the combined deficiency of factors viii,2, xiii, subunit; lamellar powdery cataract 3; warts, 
hypogammaglobulinemia, infections, and myelodysplastic abnormalities; benign hereditary chorea; a hyaluronan-deficient enzyme; small-headed malformations, hiatal hernias and nephrotic syndromes; growth and development, mental retardation, mandibular facial dysplasia, small head deformity and cleft palate; lymphedema, hereditary, id; pubertal delay; characterization of mineralocorticoid hyperactivity; systemic arterial calcification 2 in infancy; methylmalonic urine disorder, mut (0) type; congenital heart disease, multiple types, 2; familial hypoplasia, glomerular cystic kidney disease; brain-eye-face-bone syndrome 2; stergatt (Stargardt) disease 1; mental retardation, autosomal recessive inheritance 15, 44, 46 and 5; prolyl peptidase deficiency; methylmalonic urine disease cblB type; small mouth disease; endocrine-brain bone dysplasia; brain deformity 1, 2 (X linkage), 3, 6 (microcephaly), X linkage; growth hormone cell adenoma; gamstor-wohlfar syndrome; lipid protein deposition; inclusion body myopathies 2 and 3; vestibular aqueduct enlargement syndrome; osteoporosis-pseudoglioma syndrome; acquired long QT syndrome; phenylketonuria; CHOPS syndrome; overall developmental retardation; crystalline-like retinal degeneration; noonan (Noonan) syndrome-like diseases with or without childhood myelomonocytic leukemia; congenital hematogenic porphyria; hereditary atrophy of the eyeball; paraganglioma 3; cleft lip syndrome; an aromatase deficiency; birk Barel dysnoesia syndrome; amyotrophic lateral sclerosis 5; methemoglobinemia type I1 and 2; congenital stationary night blindness, type 1A, type IB, type 1C, type IE, type IF and type 2A; seizures; thyroid cancer, follicular character; a fatal congenital contracture syndrome 6; distal hereditary motor neuron disease type 2B; sex cord-interstitial tumor; epileptic encephalopathy, childhood onset, early childhood, 1, 19, 23, 25, 30, and 32; myofibrillar myopathy 1 and ZASP related; the infant cerebellar ataxia is accompanied 
by progressive extraocular muscle paralysis; purine nucleoside phosphorylase deficiency; forebrain defects; age-dependent epileptic encephalopathy; obesity; 4. left ventricular densification insufficiency 10; verheij syndrome; mowat-Wilson syndrome; odontotrichomelic syndrome; macular dystrophy of retinal pigment epithelium; lig4 syndrome; balacart (Barakat) syndrome; IRAK4 deficiency; growth hormone cell adenoma; branched-chain ketoacid dehydrogenase kinase deficiency; cystiuria; familial earthworm dysplasia; succinyl-coa acetoacetate transferase deficiency; shoulder fibular spinal muscular atrophy; pigmentary retinal degeneration; glanzmann (Glanzmann) thrombocytopenia; teenager primary open angle glaucoma 1; aicardi Goutieres syndromes 1, 4 and 5; renal dysplasia; intrauterine hypoevolutism, metaphyseal dysplasia, congenital adrenal hypoplasia, and genital abnormality; beaded hair; short stature, toenail dysplasia, facial deformity and thin complications; metachromatic leukodystrophy; cholestanol storage disease; three M syndrome 2; leber (Leber) congenital amaurosis 11, 12, 13, 16, 4, 7, 9; mandibular dysplasia with type a or type B lipodystrophy, atypical; melgolin (Meier-Gorlin) syndrome 4; rare complications 8 and 12; short QT syndrome 3; ectodermal dysplasia iib; nail-free disease; pseudo-hypoparathyroidism of type 1A, pseudo-hypoparathyroidism; leber (Leber) optic atrophy; banbriqi-Luo Posi (Bainbridge-Ropers) syndrome; wever (Weaver) syndrome; short stature, closed auditory canal, poor mandibular development, and abnormal bones; lack of alpha-mannosidase; macular dystrophy, yolk-like, adult onset; glutaluria, type 1; gangliosidosis GM type 1 (heart affected) 3; mandibular dysplasia; type I hereditary lymphedema; atrial rest 2; geisha singing facial makeup syndrome; bethlem myopathy and Bethlem myopathy 2; myeloperoxidase deficiency; spot corneal dystrophy; hereditary enteropathy acrodermatitis; familial hypobetalipoproteinemia associated with apob 32; 
Cockayne syndrome type A; hyperparathyroidism, neonatal severe; ataxia-telangiectasia-like disorder; Pendred syndrome; I blood group system; familial benign pemphigus; visceral heterotaxy 5, autosomal; nephrogenic diabetes insipidus, X-linked; minicore myopathy with external ophthalmoplegia; Perry syndrome; hypohidrosis/hair/teeth, autosomal recessive inheritance; hereditary pancreatitis; mental retardation and microcephaly with pontine and cerebellar hypoplasia; glycogen storage disease 0 (muscle), II (adult), IXa2, IXc, type 1A; osteopathia striata with cranial sclerosis; glutathione synthetase deficiency; Brugada syndrome and Brugada syndrome 4; endometrial cancer; hypohidrotic ectodermal dysplasia with immunodeficiency; cholestasis, intrahepatic, of pregnancy 3; Bernard-Soulier syndrome, types A1 and A2 (autosomal dominant inheritance); sialic acid storage disease; ornithine aminotransferase deficiency; PTEN hamartoma syndrome; distichiasis-lymphedema syndrome; corticosteroid-binding globulin deficiency; adult neuronal ceroid lipofuscinosis; Dejerine-Sottas disease; tetra-amelia, autosomal recessive inheritance; Senior-Loken syndrome 4 and 5; glutaric acidemia IIA and IIB; aortic aneurysm, familial thoracic 4, 6 and 9; hyperphosphatasia with mental retardation syndrome 2, 3 and 4; dyskeratosis congenita, X-linked; arthrogryposis, renal dysfunction and cholestasis 2; Bannayan-Riley-Ruvalcaba syndrome; 3-methylglutaconic aciduria; isolated 17,20-lyase deficiency; Gorlin syndrome; hand-foot-uterus syndrome; Tay-Sachs disease, B1 variant, GM2-gangliosidosis (adult), GM2-gangliosidosis (adult onset); Dowling-Degos disease 4; Parkinson's disease 14, 15, 19 (juvenile onset), 2, 20 (early onset), 6 (autosomal recessive early onset), and 9; sensory ataxia, autosomal 
dominant inheritance; congenital microvilli atrophy; myoclonus-tension loss epilepsy; dangil disease (Tangier disease); 2-methyl-3-hydroxybutyric acid urine; familial renal hypouricemia; cerebral infarction deformity; mitochondrial DNA depletion syndrome 4b, mngie type; fengold (Feingold) syndrome 1; a deficiency in renal carnitine transport; familial hypercholesterolemia; townes-Brocks-brandroootoenal-like syndrome; griesceli (Griscelli) syndrome type 3; merkel-Gruber syndrome; bullous ichthyoid erythroderma; neutrophil immunodeficiency syndrome; myasthenia syndrome, congenital, 17, 2A (slow channel), 4B (fast channel), without tubular aggregates; microvascular complications of diabetes 7; mcKusick Kaufman syndrome; chronic granulomatosis, autosomal recessive inherited cytochrome b positive, type 1 and type 2; argininosuccinate lyase deficiency; mitochondrial phosphate and pyruvate carrier deficiency; lattice-like corneal dystrophies type III; ectodermal dysplasia-and indication syndrome 1; low myelogenous leukodystrophy 7; mental retardation, autosomal dominant inheritance 12, 13, 15, 24, 3, 30, 4, 5, 6 and 9; general epilepsy with febrile convulsions, type 1 and type 2; psoriasis susceptibility 2; frank-terhaar (Frank Ter Haar) syndrome; thoracic aortic aneurysms and aortic dissection; kluzon (Crouzon) syndrome; ovarian granulosa layer cell tumor; epidermolytic palmoplantar keratosis; leri-Weill (Leri Weill) cartilage dysplasia; 3 beta-hydroxysteroid dehydrogenase deficiency; familial restrictive cardiomyopathy 1; autosomal dominant progressive extraocular muscle paralysis with mitochondrial DNA deletions 1 and 3; a Bixler syndrome (Antley-Bixler) with genital abnormalities and steroid-generating disorders; hereditary bone dysplasia and acroosteolysis; pigmentary nodular adrenocortical disease, primary, 1; narcotic pain syndrome, familial, 3; dejerine-Sottas syndrome, autosomal dominant; FG syndrome and FG syndrome 4; dendritic cells, monocytes, B lymphocytes 
and natural killer lymphocyte deficiency; hypothyroidism, congenital, non-goiter, 1; miller (Miller) syndrome; linear body myopathies 3 and 9; oligodendrocyte-colorectal cancer syndrome; cold sweating syndrome 1; van Buchem disease type 2; glaucoma 3, primary congenital, d; citrullinemia type I and type II; nosaka myopathy; congenital muscular dystrophy due to partial LAMA2 deficiency; neuropathic gastrointestinal encephalopathy syndrome; subacute necrotizing cerebrospinal disorders due to mitochondrial complex I deficiency; medulloblastoma; pyruvate dehydrogenase El-alpha deficiency; colon cancer; south-holan (Nance-Horan) syndrome; sang Huofu (Sandhoff) diseases, adult and infant; joint contracture renal insufficiency cholestasis syndrome; autosomal recessive inheritance of low phosphorus bone disease; multiple-inch cellular retinal dystrophy; spinocerebellar ataxia 14, 21, 35, 40 and 6; dementia with lewy bodies; RRM 2B-related mitochondrial disease; bromodi (Brody) disease; megabrain-polycephalum syndrome 2; wu Xieer (Usher) syndrome, type 1, IB, ID, 1G, 2A, 2C, and 2D; hypocalcemia and under-ripening, IIA1 enamel hypoplasia; pituitary hormone deficiency, combination 1, 2, 3 and 4; cushing) symbiosis; tubular acidosis, distal, autosomal recessive inheritance, delayed sensorineural hearing loss, or hemolytic anemia; the kidney of the infant is tuberculosis; juvenile polyposis syndrome; sensory ataxia neuropathy, dysarthria, and paralysis of the ocular muscles; 3-hydroxyacyl-coa dehydrogenase deficiency; parathyroid cancer; x-linked agaropectinemia; megaloblastic anemia, thiamine responsiveness, diabetes and sensorineural hearing loss; multiple sulfatase deficiency; neurodegeneration with iron deposition 4 and 6 in brain; cholesterol monooxygenase (side chain cleavage) deficiency; hemolytic anemia caused by a deficiency of adenylyl succinic acid lyase; epileptic myoclonus with jagged red fibers; pitt-Hopkins syndrome; escobar type of multiple pterygium syndrome; 
homocystinuria-megaloblastic anemia due to deficiency in cobalamin metabolism, cblE complementation; cholecystitis; polycythemia globosa of type 4 and type 5; a variety of congenital anomalies; pigment xeroderma, complementation group b, group D, group E and group G; lener syndrome; groenuw corneal dystrophy type I; coenzyme Q10 deficiency, primary 1, 4 and 7; distal spinal muscular atrophy, congenital non-progressive; huabao (Warburg) microcomprises 2 and 4; bile acid synthesis deficiency, congenital, 3; acth independent adrenal macrotuberous hyperplasia 2; dysplasia of the top femur; familial Paget (page) bone disease; severe neonatal encephalopathy is accompanied by small head deformity; zimmermann-Laband syndrome and Zimmermann-Laband syndrome 2; a riflestein (reifinstein) syndrome; familial hypokalemia-hypomagnesemia; photosensitive hair sulfur malnutrition; epidermolysis bullosa in the adult joint; lung cancer; frieman-sieldon syndrome; hyperinsulinemia-hyperammonemia syndrome; posterior polarity cataract of 2 type; scleral cornea, autosomal recessive inheritance; juvenile GM >1< ganglioside deposition; coren (Cohen) syndrome; hereditary paraganglioma-pheochromocytoma syndrome; neonatal insulin dependent diabetes mellitus; dysplasia of cartilage; floating-Harbor syndrome; skin laxity with bone malnutrition and severe pulmonary, gastrointestinal and urinary system abnormalities; congenital contractures of the extremities and faces, hypotonia and hypoevolutism; congenital hyperkeratosis autosomal dominant inheritance and autosomal dominant inheritance, 3; tissue cell type marrow reticulate tissue hyperplasia; kesteriolo (Costello) elastin deficiency; immunodeficiency 15, 16, 19, 30, 31C, 38, 40, 8, high IgM of type 1 and type 2 due to cd3-zeta deficiency, and X-linked, magnesium deficiency, epstein-Barr (Epstein-Barr) virus infection and neoplasia; atrial septal defects 2, 4, and 7 (with or without atrioventricular conduction defects); GTP cyclohydrolase I 
deficiency; talipes equinovarus (clubfoot); phosphoglycerate kinase 1 deficiency; tuberous sclerosis 1 and 2; autosomal recessive congenital ichthyosis 1, 2, 3, 4A and 4B; and familial hypertrophic cardiomyopathy 1, 2, 3, 4, 7, 10, 23, and 24.

Tissue indications

Other suitable diseases and disorders that may be treated by the systems and methods provided herein include, but are not limited to, central nervous system (CNS) diseases (see exemplary diseases and affected genes in table 13), ocular diseases (see exemplary diseases and affected genes in table 14), cardiac diseases (see exemplary diseases and affected genes in table 15), hematopoietic stem cell (HSC) diseases (see exemplary diseases and affected genes in table 16), renal diseases (see exemplary diseases and affected genes in table 17), liver diseases (see exemplary diseases and affected genes in table 18), pulmonary diseases (see exemplary diseases and affected genes in table 19), skeletal muscle diseases (see exemplary diseases and affected genes in table 20), and skin diseases (see exemplary diseases and affected genes in table 21). Table 22 provides exemplary protective mutations that reduce the risk of a given disease. In some embodiments, the Gene Writer system described herein is used to treat an indication of any of tables 13-21. In some embodiments, the Gene Writer system modifies a target site in genomic DNA in a cell, wherein the target site is in a gene of any one of tables 13-21, e.g., in a subject having the corresponding indication listed in any one of tables 13-21. In some embodiments, the Gene Writer corrects a mutation in the gene. In some embodiments, the Gene Writer inserts a sequence that has been deleted from the gene (e.g., by a disease-causing mutation). In some embodiments, the Gene Writer deletes a sequence that has been duplicated in the gene (e.g., by a disease-causing mutation). In some embodiments, the Gene Writer replaces a mutation (e.g., a disease-causing mutation) with the corresponding wild-type sequence. In some embodiments, the mutation is a substitution, insertion, deletion, or inversion.

Table 13. CNS diseases and affected genes.

Table 14. Ocular diseases and affected genes.

Disease	Affected genes
Achromatopsia	CNGB3
Leber congenital amaurosis (LCA1)	GUCY2D
Leber congenital amaurosis (LCA10)	CEP290
Leber congenital amaurosis (LCA2)	RPE65
Leber congenital amaurosis (LCA8)	CRB1
Choroideremia	CHM
Cone-rod dystrophy (ABCA4)	ABCA4
Cone-rod dystrophy (CRX)	CRX
Cone-rod dystrophy (GUCY2D)	GUCY2D
Cystinosis, ocular nonnephropathic	CTNS
Lattice corneal dystrophy type I	TGFBI
Macular corneal dystrophy (MCD)	CHST6
Optic atrophy	OPA1
Retinitis pigmentosa (AR)	USH2A
Retinitis pigmentosa (AD)	RHO
Stargardt disease	ABCA4
Vitelliform macular dystrophy	BEST1; PRPH2

Table 15. Heart diseases and affected genes.

Table 16. HSC diseases and affected genes.

Table 17. Kidney diseases and affected genes.

Disease	Affected genes
Alport syndrome	COL4A5
Autosomal dominant polycystic kidney disease (PKD1)	PKD1
Autosomal dominant polycystic kidney disease (PKD2)	PKD2
Autosomal dominant tubulointerstitial kidney disease (MUC1)	MUC1
Autosomal dominant tubulointerstitial kidney disease (UMOD)	UMOD
Autosomal recessive polycystic kidney disease	PKHD1
Congenital nephrotic syndrome	NPHS2
Cystinosis	CTNS

Table 18. Liver diseases and affected genes.

Table 19. Pulmonary diseases and affected genes.

Table 20. Skeletal muscle diseases and affected genes.

Table 21. Skin diseases and affected genes.

Disease	Affected genes
Dominant dystrophic epidermolysis bullosa	COL7A1
Recessive dystrophic epidermolysis bullosa (Hallopeau-Siemens type)	COL7A1
Junctional epidermolysis bullosa	LAMB3
Epidermolysis bullosa simplex	KRT5; KRT14
Epidermolytic ichthyosis	KRT1; KRT10
Hailey-Hailey disease	ATP2C1
Lamellar ichthyosis/nonbullous congenital ichthyosiform erythroderma (ARCI)	TGM1
Netherton syndrome	SPINK5

Table 22. Exemplary protective mutations that reduce disease risk.

Pathogenic mutation

In some embodiments, the systems or methods provided herein can be used to correct pathogenic mutations. A pathogenic mutation may be a genetic mutation that increases an individual's susceptibility or predisposition to a disease or disorder. In some embodiments, the pathogenic mutation is a disease-causing mutation in a gene associated with a disease or disorder. In some embodiments, the systems or methods provided herein can be used to revert a pathogenic mutation to its wild-type counterpart. In some embodiments, the systems or methods provided herein can be used to change a pathogenic mutation to a sequence that does not cause a disease or disorder.

Table 23 provides exemplary indications (column 1), potential genes (column 2) and pathogenic mutations (column 3) that can be corrected using the systems or methods described herein.

Table 23. Indications, genes and pathogenic mutations.

^# See J. T. den Dunnen and S. E. Antonarakis, Hum Mutat [Human Mutation] 2000; 15: 7-12, which is incorporated herein by reference in its entirety, for details of the nomenclature of gene mutations. * indicates a stop codon.

Compensatory editing

In some embodiments, the systems or methods provided herein may be used to introduce compensatory edits. In some embodiments, the compensatory edit is located at a position in the gene associated with the disease or disorder that is different from the position of the disease-causing mutation. In some embodiments, the compensatory edit is not in the gene comprising the pathogenic mutation. In some embodiments, the compensatory edit may cancel or compensate for the disease-causing mutation. In some embodiments, compensatory edits can be introduced by the systems or methods provided herein to suppress or reverse the effects of the disease-causing mutation.

Table 24 provides exemplary indications (column 1), genes (column 2) and compensatory edits (column 3) that may be introduced using the systems or methods described herein. In some embodiments, the compensatory edits provided in table 24 may be introduced to suppress or reverse the mutational effects of the mutation causing the disease.

Table 24. Indications, genes, compensatory edits and exemplary design features.

Disease	Gene	Nucleotide changes ^#
Alpha-1 antitrypsin deficiency	SERPINA1	F51L
Alpha-1 antitrypsin deficiency	SERPINA1	M374I
Alpha-1 antitrypsin deficiency	SERPINA1	A348V/A347V
Alpha-1 antitrypsin deficiency	SERPINA1	K387R
Alpha-1 antitrypsin deficiency	SERPINA1	T59A
Alpha-1 antitrypsin deficiency	SERPINA1	T68A
ATTR amyloidosis	TTR	A108V
ATTR amyloidosis	TTR	R104H
ATTR amyloidosis	TTR	T119M
Cystic fibrosis	CFTR	R555K
Cystic fibrosis	CFTR	F409L
Cystic fibrosis	CFTR	F433L
Cystic fibrosis	CFTR	H667R
Cystic fibrosis	CFTR	R1070W
Cystic fibrosis	CFTR	R29K
Cystic fibrosis	CFTR	R553Q
Cystic fibrosis	CFTR	I539T
Cystic fibrosis	CFTR	G550E
Cystic fibrosis	CFTR	F429S
Cystic fibrosis	CFTR	Q637R
Sickle cell disease	HBB	A70T
Sickle cell disease	HBB	A70V
Sickle cell disease	HBB	L88P
Sickle cell disease	HBB	F85L and/or F85P
Sickle cell disease	HBB	E22G
Sickle cell disease	HBB	G16D and/or G16N

^# See J. T. den Dunnen and S. E. Antonarakis, Hum Mutat [Human Mutation] 2000; 15: 7-12, which is incorporated herein by reference in its entirety, for details of the nomenclature of gene mutations.
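
The entries in the third column of table 24 appear to use one-letter amino acid shorthand (reference residue, residue number, new residue), with * denoting a stop codon as noted under table 23. As a minimal sketch under that assumed reading, the following Python helper splits such a label into its parts and rejects malformed labels. The names `ProteinSubstitution` and `parse_substitution` are hypothetical and are not part of the disclosed systems.

```python
import re
from typing import NamedTuple


class ProteinSubstitution(NamedTuple):
    reference: str  # one-letter code of the original residue
    position: int   # 1-based residue number
    variant: str    # one-letter code of the new residue, or "*" for a stop codon


# The 20 standard amino acids in one-letter code (assumption: no ambiguity codes).
_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_PATTERN = re.compile(rf"^([{_AMINO_ACIDS}])(\d+)([{_AMINO_ACIDS}*])$")


def parse_substitution(label: str) -> ProteinSubstitution:
    """Parse a label such as 'F51L' into (reference, position, variant)."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a one-letter substitution label: {label!r}")
    reference, position, variant = match.groups()
    return ProteinSubstitution(reference, int(position), variant)


if __name__ == "__main__":
    for label in ("F51L", "R1070W", "T119M"):
        print(label, parse_substitution(label))
```

Combined entries such as "A348V/A347V" or "F85L and/or F85P" would need to be split into individual labels before parsing.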

Regulatory editing

In some embodiments, the systems or methods provided herein may be used to introduce regulatory edits. In some embodiments, regulatory edits are introduced into a regulatory sequence of a gene, such as a gene promoter, a gene enhancer, a gene repressor, or a sequence that regulates splicing of the gene. In some embodiments, the regulatory edits increase or decrease the expression level of a target gene. In some embodiments, the target gene is the same gene that contains the disease-causing mutation. In some embodiments, the target gene is different from the gene containing the disease-causing mutation. For example, the systems or methods provided herein can be used to up-regulate fetal hemoglobin expression by introducing regulatory edits at the promoter of BCL11A, thereby treating sickle cell disease.

Table 25 provides exemplary indications (column 1), genes (column 2) and regulatory edits (column 3) that may be introduced using the systems or methods described herein.

Table 25. Indications, genes, and compensatory regulatory edits.

Repeat expansion diseases

In some embodiments, the systems or methods provided herein can be used for repeat expansion diseases, such as the repeat expansion diseases provided in table 26. Table 26 provides the indication (column 1), the gene (column 2), the minimal repeat unit of the repeat that is expanded in the condition (column 3), and the position of the repeat relative to the listed gene in each indication (column 4). In some embodiments, the systems or methods provided herein, e.g., those comprising Gene Writers, can be used to treat a repeat expansion disease by resetting the number of repeats at a locus according to a customized RNA template (see, e.g., example 24).
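
The idea of resetting the number of repeats at a locus can be illustrated at the level of plain sequence strings. The sketch below is only a string-level illustration, not the biological mechanism: it finds the longest uninterrupted run of a repeat unit and rewrites it with a chosen number of copies. The function name, the CAG unit, the flanking sequences, and the repeat counts are all hypothetical.

```python
import re


def reset_repeat_count(sequence: str, unit: str, target_count: int) -> str:
    """Replace the longest uninterrupted run of `unit` with `target_count` copies.

    Returns the sequence unchanged if `unit` does not occur in it.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    if target_count < 0:
        raise ValueError("target_count must be non-negative")
    # Every maximal tandem run of the unit, scanning left to right.
    runs = list(re.finditer(f"(?:{re.escape(unit)})+", sequence))
    if not runs:
        return sequence
    longest = max(runs, key=lambda m: m.end() - m.start())
    return sequence[:longest.start()] + unit * target_count + sequence[longest.end():]


if __name__ == "__main__":
    # Hypothetical locus: 40 CAG repeats between two short flanks.
    expanded = "GATC" + "CAG" * 40 + "TTGA"
    corrected = reset_repeat_count(expanded, "CAG", 20)
    print(len(expanded), len(corrected))
```

In this picture, the customized template determines `target_count`; the flanking sequence on each side is left untouched.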

Table 26. Exemplary repeat expansion diseases, genes, causal repeats, and repeat positions.

Exemplary templates

In some embodiments, the systems or methods provided herein use the template sequences listed in table 27. Table 27 provides exemplary template RNA sequences (column 5) and optional second-nick gRNA sequences (column 6) designed to be paired with a Gene Writing polypeptide to correct the specified pathogenic mutations (column 4). All templates in table 27 are intended to illustrate the following overall sequence: (1) a gRNA for first-strand nicking, (2) a polypeptide binding domain, (3) a heterologous object sequence, and (4) a target homology domain for setting up TPRT at the first-strand nick.
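
The four-part layout described above can be sketched as a simple data structure. This is a minimal illustration that assumes the parts are concatenated in the order listed; the class name, field names, and all sequences below are hypothetical placeholders and are not taken from table 27.

```python
from dataclasses import dataclass

RNA_BASES = frozenset("ACGU")


@dataclass(frozen=True)
class TemplateRNA:
    """Four-part template layout, in the order listed in the text (assumed)."""

    grna_spacer: str                 # (1) gRNA that directs the first-strand nick
    polypeptide_binding_domain: str  # (2) region bound by the polypeptide
    object_sequence: str             # (3) sequence carrying the desired edit
    target_homology_domain: str      # (4) region that anneals at the nick

    def parts(self) -> tuple:
        return (
            self.grna_spacer,
            self.polypeptide_binding_domain,
            self.object_sequence,
            self.target_homology_domain,
        )

    def assemble(self) -> str:
        """Concatenate the four parts after checking they are non-empty RNA."""
        for part in self.parts():
            if not part or not set(part) <= RNA_BASES:
                raise ValueError(f"not a non-empty RNA sequence: {part!r}")
        return "".join(self.parts())


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    template = TemplateRNA("GGCACUGCGGCUGGAGGUGG", "GUUUUAGAGCUA", "ACGUAC", "CCUCCAGCC")
    print(template.assemble())
```

Keeping the parts as separate fields makes it straightforward to swap the object sequence while reusing the same spacer and binding domain.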

Table 27. Exemplary diseases, tissues, genes, pathogenic mutations, template RNA sequences, and second-nick gRNA sequences.

In some embodiments, the systems or methods provided herein use the template sequences listed in table 35. Table 35 provides exemplary template RNA sequences (column 5) and optional second-nick gRNA sequences (column 6) designed to be paired with a Gene Writing polypeptide to correct the specified pathogenic mutations (column 4). All templates in table 35 are intended to illustrate the following overall sequence: (1) a gRNA for first-strand nicking, (2) a polypeptide binding domain, (3) a heterologous object sequence, and (4) a target homology domain for setting up TPRT at the first-strand nick.

Table 35. Exemplary Gene Writing templates and second-nick gRNA sequences for correction of exemplary repeat expansion diseases. Template regions spanning one or more repeats are shown in lowercase letters.

Exemplary heterologous object sequence

In some embodiments, the systems or methods provided herein comprise a heterologous object sequence, wherein the heterologous object sequence or the reverse complement thereof encodes a protein (e.g., an antibody) or peptide. In some embodiments, the protein or peptide is a therapy approved by a regulatory agency, such as the FDA.

In some embodiments, the protein or peptide is a protein or peptide from the THPdb database (Usmani et al., PLoS One 12(7): e0181748 (2017), which is incorporated herein by reference in its entirety).

In some embodiments, the protein or peptide is an antibody disclosed in Table 1 of Lu et al., J Biomed Sci [Journal of Biomedical Science] 27(1): 1 (2020), which is incorporated herein by reference in its entirety. In some embodiments, the protein or peptide is an antibody disclosed in table 29. In some embodiments, the systems or methods disclosed herein, e.g., those comprising Gene Writers, can be used to integrate an expression cassette for an antibody of table 29 into a host cell to enable expression of the antibody in the host. In some embodiments, the systems or methods described herein are used to express an agent that binds a target of column 2 of table 29 (e.g., a monoclonal antibody of column 1 of table 29) in a subject having an indication of column 3 of table 29.

Table 28. Exemplary protein and peptide therapeutics.

Table 29. Exemplary monoclonal antibody therapies.

Plant modification method

The Gene Writer systems described herein can be used to modify plants or plant parts (e.g., leaves, roots, flowers, fruits, or seeds) for example to increase fitness of plants.

A. Delivery to plants

Provided herein are methods of delivering the Gene Writer systems described herein to plants, including methods of delivering a Gene Writer system to a plant by contacting the plant or a portion thereof with the Gene Writer system. These methods can be used to modify plants, for example, to increase the fitness of the plants.

More particularly, in some embodiments, a nucleic acid described herein (e.g., a nucleic acid encoding a Gene Writer) can be encoded in a vector, e.g., inserted adjacent to a plant promoter (e.g., the maize ubiquitin promoter (ZmUBI)) in a plant vector (e.g., pHUC411). In some embodiments, a nucleic acid described herein is introduced into a plant (e.g., japonica rice) or a portion of a plant (e.g., callus of a plant) via Agrobacterium. In some embodiments, the systems and methods described herein can be used in plants by replacing a plant gene (e.g., hygromycin phosphotransferase (HPT)) with a null allele (e.g., one containing a base substitution at the start codon). Systems and methods for modifying plant genomes are described in: Xu et al., Development of plant prime-editing systems for precise genome editing, 2020, Plant Communications.
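
The null-allele example (a base substitution at the start codon) can be illustrated with a short string-level sketch. The function name, the default substitution, and the example sequence are hypothetical. Whether a given substitution actually abolishes expression would have to be confirmed experimentally, since some non-ATG codons can still initiate translation at low levels.

```python
def substitute_start_codon_base(cds: str, codon_offset: int = 2, new_base: str = "C") -> str:
    """Return `cds` with one base of its ATG start codon replaced.

    `codon_offset` is the 0-based position within the start codon (0, 1, or 2).
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must begin with an ATG start codon")
    if codon_offset not in (0, 1, 2):
        raise ValueError("codon_offset must be 0, 1, or 2")
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if new_base == cds[codon_offset]:
        raise ValueError("substitution would leave the start codon unchanged")
    return cds[:codon_offset] + new_base + cds[codon_offset + 1:]


if __name__ == "__main__":
    # Hypothetical coding sequence fragment, for illustration only.
    original = "ATGAAACCCGGG"
    edited = substitute_start_codon_base(original)
    print(original[:3], "->", edited[:3])
```

Only the start codon changes; the remainder of the coding sequence is carried over unmodified.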

In one aspect, provided herein is a method of increasing the fitness of a plant, the method comprising delivering to the plant a Gene Writer system described herein (e.g., in an effective amount and for an effective duration) to increase the fitness of the plant relative to an untreated plant (e.g., a plant to which the Gene Writer system has not been delivered).

The increase in plant fitness resulting from delivery of the Gene Writer system can manifest itself in a variety of ways, for example, in better plant production, such as improved yield, improved plant vigor or quality of the product harvested from the plant, improvement in pre- or post-harvest traits (e.g., taste, appearance, shelf life) desired by the agricultural or horticultural industry, or improvement in traits that otherwise benefit humans (e.g., reduced allergen production). Improved plant yield refers to an increase, by a measurable amount, in the yield of a product of the plant (e.g., as measured by plant biomass, grain, seed or fruit yield, protein content, carbohydrate or oil content, or leaf area) relative to the yield of the same product of a plant produced under the same conditions but without application of the composition of the invention, or as compared with application of conventional plant modifiers. For example, the yield may be increased by at least about 0.5%, about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 100%, or greater than 100%. In some cases, the method is effective to increase the yield by about 2-fold, 5-fold, 10-fold, 25-fold, 50-fold, 75-fold, 100-fold, or greater than 100-fold relative to untreated plants. Yield may be expressed as the weight or volume of a plant or plant product on a certain basis. The basis may be expressed in terms of time, growing area, weight of plants produced, or amount of raw material used. For example, such methods may increase the yield of plant tissues including, but not limited to, seeds, fruits, kernels, pods, tubers, roots, and leaves.
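
Because the paragraph above expresses yield gains both as percentages and as fold values, a short worked calculation may help relate the two. The sketch below assumes that "N-fold" means the treated yield is N times the untreated yield; the function names and the yield figures are hypothetical.

```python
def percent_increase(treated: float, untreated: float) -> float:
    """Yield increase of treated over untreated plants, in percent."""
    if untreated <= 0:
        raise ValueError("untreated yield must be positive")
    return (treated - untreated) / untreated * 100.0


def fold_change(treated: float, untreated: float) -> float:
    """Ratio of treated to untreated yield (assumed meaning of 'N-fold')."""
    if untreated <= 0:
        raise ValueError("untreated yield must be positive")
    return treated / untreated


if __name__ == "__main__":
    # Hypothetical yields on the same basis, e.g., tonnes per hectare per season.
    untreated, treated = 4.0, 6.0
    print(percent_increase(treated, untreated), fold_change(treated, untreated))
```

Under this reading, a 100% increase corresponds to a 2-fold yield, so the two scales in the text overlap only at their upper and lower ends, respectively. Both values are meaningful only when treated and untreated yields are expressed on the same basis (time, growing area, and so on).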

The increase in plant fitness resulting from delivery of the Gene Writer system can also be measured by other means, such as vigor rating, stand (number of plants per unit area), plant height, stalk circumference, stalk length, leaf number, leaf size, plant canopy, visual appearance (such as greener leaf color), root rating, emergence, protein content, increased tillering, larger leaves, more leaves, fewer dead basal leaves, stronger tillers, less fertilizer required, fewer seeds required, more productive tillers, earlier flowering, earlier grain or seed maturity, less plant verse (lodging), increased shoot growth, earlier germination, or any combination of these factors, with the factor being increased or improved by a measurable or perceptible amount relative to plants produced under the same conditions but without application of the compositions of the invention, or with application of conventional plant modifiers (e.g., plant modifiers delivered without PMP).

Accordingly, provided herein is a method of modifying a plant, the method comprising delivering to the plant an effective amount of any of the Gene Writer systems provided herein, wherein the method modifies the plant and thereby introduces or increases a beneficial trait (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%) in the plant relative to an untreated plant. In particular, the method can increase plant fitness (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%) relative to an untreated plant.

In some cases, the increase in plant fitness is an increase (e.g., an increase of about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%) in disease resistance, drought tolerance, heat tolerance, cold tolerance, salt tolerance, metal tolerance, herbicide tolerance, chemical tolerance, water use efficiency, nitrogen utilization, resistance to nitrogen stress, nitrogen fixation, pest resistance, herbivore resistance, pathogen resistance, yield, yield under water-limited conditions, vigor, growth, photosynthetic capacity, nutrition, protein content, carbohydrate content, oil content, biomass, shoot length, root structure, seed weight, or amount of harvestable product.

In some cases, the increase in fitness is an increase in development, growth, yield, resistance to an abiotic stressor, or resistance to a biotic stressor (e.g., an increase of about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%). Abiotic stress refers to an environmental stress condition to which a plant or plant part is subjected, including, for example, drought stress, salt stress, heat stress, cold stress, and low nutrient stress. Biotic stress refers to an environmental stress condition to which a plant or plant part is subjected, including, for example, nematode stress, herbivore stress, fungal pathogen stress, bacterial pathogen stress, or viral pathogen stress. Stress may be temporary, e.g., lasting hours, days, or months, or permanent, e.g., lasting for the lifetime of the plant.

In some cases, the increase in plant fitness is an increase in the quality of the product harvested from the plant (e.g., an increase of about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%). For example, the increase in plant fitness may be an improvement in a commercially advantageous characteristic (e.g., taste or appearance) of the product harvested from the plant. In other cases, the increase in plant fitness is an increase in the shelf life of a product harvested from the plant (e.g., an increase of about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%).

Alternatively, the increase in fitness may be a change in a trait beneficial to human or animal health, such as a decrease in allergen production. For example, the increase in fitness can be a decrease (e.g., about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%) in the production of an allergen (e.g., pollen) that stimulates an immune response in an animal (e.g., a human).

Modifications of the plant (e.g., increases in fitness) may result from modification of one or more plant parts. For example, a plant may be modified by contacting the leaf, seed, pollen, root, fruit, bud, flower, cell, protoplast, or tissue (e.g., meristem) of the plant. Thus, in another aspect, provided herein is a method of increasing fitness of a plant, the method comprising contacting pollen of the plant with an effective amount of any one of the plant-modifying compositions herein, wherein the method increases fitness of the plant relative to an untreated plant (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%).

In yet another aspect, provided herein is a method of increasing fitness of a plant, the method comprising contacting a seed of the plant with an effective amount of any of the Gene Writer systems disclosed herein, wherein the method increases fitness of the plant relative to an untreated plant (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%).

In another aspect, provided herein is a method comprising contacting a protoplast of a plant with an effective amount of any of the Gene Writer systems described herein, wherein the method increases fitness of the plant relative to an untreated plant (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%).

In a further aspect, provided herein is a method of increasing fitness of a plant, the method comprising contacting plant cells of the plant with an effective amount of any of the Gene Writer systems described herein, wherein the method increases fitness of the plant (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%) relative to an untreated plant.

In another aspect, provided herein is a method of increasing fitness of a plant, the method comprising contacting a meristem of the plant with an effective amount of any of the plant-modifying compositions herein, wherein the method increases fitness of the plant relative to an untreated plant (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%).

In another aspect, provided herein is a method of increasing fitness of a plant, the method comprising contacting an embryo of the plant with an effective amount of any of the plant-modifying compositions herein, wherein the method increases fitness of the plant relative to an untreated plant (e.g., by about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater than 100%).

B. Application method

The plants described herein may be exposed to any of the Gene Writer system compositions described herein in any suitable manner that allows for delivery or application of the composition to the plants. The Gene Writer system may be delivered alone or in combination with other active (e.g., fertiliser) or inactive substances and may be applied by, for example, spraying, injection (e.g., microinjection), by plant, pouring, dipping, in the form of concentrated liquids, gels, solutions, suspensions, sprays, powders, pills, blocks, bricks, etc. (formulated to deliver an effective concentration of the plant modification composition). The amount and location of application of the compositions described herein generally depends on the habit of the plant, the lifecycle stage at which the plant can be targeted by the plant modification composition, the location at which the plant modification composition will be applied, and the physical and functional characteristics of the plant modification composition.

In some cases, the composition is sprayed directly onto the plants (e.g., crops) by, for example, knapsack spray, aerial spray, crop spray/dust, etc. In the case of delivering the Gene Writer system to a plant, the plant receiving the Gene Writer system may be at any stage of plant growth. For example, the formulated plant modifying composition may be applied in the form of a seed coating or root treatment at an early stage of plant growth or as a total plant treatment at a later stage of the crop cycle. In some cases, the plant modifying composition may be applied to the plant as a topical agent.

In addition, the Gene Writer system can be applied (e.g., in the soil in which plants are grown, or in the water used to water the plants) as a systemic agent that is absorbed and distributed through the tissues of the plant. In some cases, a plant or food organism may be genetically transformed to express the Gene Writer system.

Delayed or sustained release may also be accomplished by coating the Gene Writer system, or a composition having one or more plant modifying compositions, with a dissolvable or bioerodible coating layer (such as gelatin) that dissolves or erodes in the environment of use, thereby making the Gene Writer system available, or by dispersing the agent in a dissolvable or erodible matrix. Such sustained release and/or dispensing means may advantageously be used to maintain an effective concentration of one or more plant modifying compositions described herein at all times.

In some cases, the Gene Writer system is delivered to a portion of a plant, such as a leaf, seed, pollen, root, fruit, shoot, or flower, or a tissue, cell, or protoplast thereof. In some cases, the Gene Writer system is delivered to cells of the plant. In some cases, the Gene Writer system is delivered to protoplasts of the plant. In some cases, the Gene Writer system is delivered to a tissue of the plant. For example, the composition can be delivered to a meristem of the plant (e.g., an apical meristem, a lateral meristem, or an intercalary meristem). In some cases, the composition is delivered to permanent tissue of the plant (e.g., simple permanent tissue (e.g., parenchyma, collenchyma, or sclerenchyma) or complex permanent tissue (e.g., xylem or phloem)). In some cases, the Gene Writer system is delivered to a plant embryo.

C. Plants and methods of making the same

The Gene Writer system described herein can be delivered to, or used to treat, a variety of plants. Plants to which the Gene Writer system can be delivered (i.e., "treated") according to the methods of the invention include whole plants and parts thereof, including but not limited to shoot vegetative organs/structures (e.g., leaves, stems, and tubers), roots, flowers and floral organs/structures (e.g., bracts, sepals, petals, stamens, carpels, anthers, and ovules), seeds (including embryos, endosperm, cotyledons, and seed coats) and fruits (mature ovaries), plant tissue (e.g., vascular tissue, ground tissue, etc.), and cells (e.g., guard cells, egg cells, etc.), and progeny thereof. Plant parts may further refer to parts such as the following: shoot, root, stem, seed, leaf, petal, flower, ovule, bract, branch, petiole, internode, bark, pubescence, tiller, rhizome, frond, blade, pollen, stamen, etc.

Classes of plants that can be treated in the methods disclosed herein include higher and lower plant classes, including angiosperms (monocots and dicots), gymnosperms, ferns, horsetails, psilophytes, lycophytes, bryophytes, and algae (e.g., multicellular algae or unicellular algae). Plants that may be treated according to the methods of the invention further include any vascular plant, such as monocots or dicots or gymnosperms, including, but not limited to, alfalfa, apple, Arabidopsis, banana, barley, canola, castor bean, chrysanthemum, clover, cocoa, coffee, cotton, cottonseed, corn, cranberry, cucumber, dendrobium, yam, eucalyptus, fescue, flax, gladiolus, Liliaceae, flaxseed, millet, melon, mustard, oat, oil palm, rape, papaya, peanut, pineapple, ornamental plants, beans, potato, rapeseed, rice, rye, ryegrass, safflower, sesame, sorghum, soybean, beet, sugarcane, sunflower, strawberry, tobacco, tomato, turf grass, wheat, and vegetable crops (e.g., lettuce, celery, broccoli, cauliflower, cucurbits); fruit trees and nut trees, such as apples, pears, peaches, oranges, grapefruits, lemons, limes, almonds, hickory nuts, walnuts, hazelnuts; vines, such as grapes (e.g., vineyards), kiwi fruits, hops; fruit shrubs and brambles, such as raspberries, blackberries, currants; and forest trees, such as ash, pine, fir, maple, oak, chestnut, poplar; in particular alfalfa, canola, castor bean, corn, cotton, cranberry, flax, linseed, mustard, oil palm, rape, peanut, potato, rice, safflower, sesame, soybean, beet, sunflower, tobacco, tomato, and wheat. Plants that can be treated according to the methods of the present invention include any crop plant, for example, forage crops, oilseed crops, cereal crops, fruit crops, vegetable crops, fiber crops, spice crops, nut crops, turf crops, sugar crops, beverage crops, and forest crops. In some cases, the crop plant treated in the method is a soybean plant. In certain other cases, the crop plant is wheat. In some cases, the crop plant is corn. In some cases, the crop plant is cotton. In some cases, the crop plant is alfalfa. In some cases, the crop plant is sugar beet. In some cases, the crop plant is rice. In some cases, the crop plant is potato. In some cases, the crop plant is tomato.

In some cases, the plant is a crop. Examples of such crop plants include, but are not limited to, monocots and dicots, including, but not limited to, fodder or forage legumes, ornamental plants, food crops, trees, or shrubs, selected from maple species (Acer spp.), Allium species (Allium spp.), Amaranthus species (Amaranthus spp.), pineapple (Ananas comosus), celery (Apium graveolens), Arachis species (Arachis spp.), asparagus (Asparagus officinalis), beet (Beta vulgaris), Brassica species (Brassica spp.) (e.g., Brassica napus or Brassica rapa ssp. (canola, oilseed rape, turnip rape)), tea (Camellia sinensis), canna (Canna indica), hemp (Cannabis sativa), Capsicum species (Capsicum spp.), chestnut species (Castanea spp.), endive (Cichorium endivia), watermelon (Citrullus lanatus), citrus species (Citrus spp.), coconut species (Cocos spp.), coffee species (Coffea spp.), coriander (Coriandrum sativum), hazelnut species (Corylus spp.), Crataegus species (Crataegus spp.), Cucurbita species (Cucurbita spp.), carrot (Daucus carota), strawberry species (Fragaria spp.), fig species (Ficus spp.), soybean species (Glycine spp.) (e.g., soybean (Glycine max, Soja hispida, or Soja max)), upland cotton (Gossypium hirsutum), sunflower species (Helianthus spp.) (e.g., sunflower (Helianthus annuus)), hibiscus species (Hibiscus spp.), barley species (Hordeum spp.) (e.g., barley (Hordeum vulgare)), sweet potato (Ipomoea batatas), walnut species (Juglans spp.), lettuce (Lactuca sativa), flax (Linum usitatissimum), litchi (Litchi chinensis), Lotus species (Lotus spp.), angled luffa (Luffa acutangula), lupin species (Lupinus spp.), tomato species (Lycopersicon spp.) (e.g., tomato (Lycopersicon esculentum), cherry tomato (Lycopersicon lycopersicum), or pear-shaped tomato (Lycopersicon pyriforme)), Malus species (Malus spp.), alfalfa (Medicago sativa), mint species (Mentha spp.), Chinese silver grass (Miscanthus sinensis), black mulberry (Morus nigra), Musa species (Musa spp.), Nicotiana species (Nicotiana spp.), olive species (Olea spp.), Oryza species (Oryza spp.) (e.g., rice (Oryza sativa) or wild rice (Oryza latifolia)), millet (Panicum miliaceum), switchgrass (Panicum virgatum), passion fruit (Passiflora edulis), parsley (Petroselinum crispum), Phaseolus species (Phaseolus spp.), Pinus species (Pinus spp.), pistachio (Pistacia vera), Pisum species (Pisum spp.), Poa species (Poa spp.), Populus species (Populus spp.), Prunus species (Prunus spp.), pear (Pyrus communis), Quercus species (Quercus spp.), radish (Raphanus sativus), rhubarb (Rheum rhabarbarum), Ribes species (Ribes spp.), castor bean (Ricinus communis), Rubus species (Rubus spp.), Saccharum species (Saccharum spp.), Salix species (Salix sp.), Sambucus species (Sambucus spp.), Secale species (Secale spp.), sesame species (Sesamum spp.), Sinapis species (Sinapis spp.), Solanum species (Solanum spp.) (e.g., potato (Solanum tuberosum), scarlet eggplant (Solanum integrifolium), or tomato (Solanum lycopersicum)), sorghum (Sorghum bicolor), Johnson grass (Sorghum halepense), spinach species (Spinacia spp.), tamarind (Tamarindus indica), cocoa (Theobroma cacao), clover species (Trifolium spp.), triticale (Triticosecale rimpaui), wheat species (Triticum spp.) (e.g., common wheat (Triticum aestivum), durum wheat (Triticum durum), cone wheat (Triticum turgidum), Triticum hybernum, Macha wheat (Triticum macha), Triticum sativum, or Triticum vulgare), Vaccinium species (Vaccinium spp.), Vicia species (Vicia spp.), Vigna species (Vigna spp.), sweet violet (Viola odorata), Vitis species (Vitis spp.), and corn (Zea mays). In certain embodiments, the crop plant is rice, canola, soybean, corn (maize), cotton, sugarcane, alfalfa, sorghum, or wheat.

Plants or plant parts useful in the present invention include plants at any stage of plant development. In some cases, delivery may occur during the stages of germination, seedling growth, vegetative growth, and reproductive growth. In some cases, delivery to the plant occurs during the vegetative and reproductive growth stages. In some cases, the composition is delivered to pollen of a plant. In some cases, the composition is delivered to seeds of a plant. In some cases, the composition is delivered to protoplasts of the plant. In some cases, the composition is delivered to tissue of a plant. For example, the composition can be delivered to a meristem of a plant (e.g., an apical meristem, a lateral meristem, or an intercalary meristem). In some cases, the composition is delivered to permanent tissue of the plant (e.g., simple tissue (e.g., parenchyma, collenchyma, or sclerenchyma) or complex permanent tissue (e.g., xylem or phloem)). In some cases, the composition is delivered to a plant embryo. In some cases, the composition is delivered to a plant cell. Plants at the vegetative and reproductive growth stages are also referred to herein as "adult" or "mature" plants.

In the case of delivering the Gene Writer system to a plant part, the plant part may be modified by a plant modifier. Alternatively, the Gene Writer system may be distributed to other parts of the plant (e.g., through the circulatory system of the plant), which are subsequently modified by the plant modifier.

Administration and delivery modes

The nucleic acid elements of the systems provided herein for use in the methods provided herein can be delivered in a variety of ways. In embodiments where the system comprises two separate nucleic acid molecules (e.g., the retrotransposase and the template nucleic acid are separate molecules), the two molecules can be delivered in the same manner, while in other embodiments the two molecules are delivered in different manners. The compositions and systems described herein may be used in vitro or in vivo. In some embodiments, the system or components of the system are delivered to a cell (e.g., a mammalian cell, such as a human cell), e.g., in vitro, ex vivo, or in vivo. In some embodiments, the cell is a eukaryotic cell, such as a cell of a multicellular organism, such as an animal, such as a mammal (e.g., human, pig, cow), bird (e.g., poultry, such as chicken, turkey, or duck), or fish. In some embodiments, the cell is a non-human animal cell (e.g., from a laboratory animal, livestock animal, or companion animal). In some embodiments, the cell is a stem cell (e.g., a hematopoietic stem cell), a fibroblast, or a T cell. In some embodiments, the cell is a non-dividing cell, such as a non-dividing fibroblast or a non-dividing T cell. Those of skill in the art will appreciate that components of the Gene Writer system can be delivered in the form of polypeptides, nucleic acids (e.g., DNA, RNA), and combinations thereof.

For example, delivery may use any of the following combinations of a retrotransposase (e.g., as DNA encoding the retrotransposase protein, as RNA encoding the retrotransposase protein, or as the protein itself) and a template RNA (e.g., as DNA encoding the RNA, or as RNA):

1. Retrotransposase DNA + template DNA

2. Retrotransposase RNA + template DNA

3. Retrotransposase DNA + template RNA

4. Retrotransposase RNA + template RNA

5. Retrotransposase protein + template DNA

6. Retrotransposase protein + template RNA

7. Retrotransposase virus + template virus

8. Retrotransposase virus + template DNA

9. Retrotransposase virus + template RNA

10. Retrotransposase DNA + template virus

11. Retrotransposase RNA + template virus

12. Retrotransposase protein + template virus

As described above, in some embodiments, DNA or RNA encoding a retrotransposase protein is delivered using a virus, and in some embodiments, template RNA (or DNA encoding template RNA) is delivered using a virus.
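The twelve options listed above amount to every pairing of four retrotransposase forms (DNA, RNA, protein, virus) with three template forms (DNA, RNA, virus). A minimal Python sketch, offered for illustration only, enumerates them; the names `RETROTRANSPOSASE_FORMS`, `TEMPLATE_FORMS`, and `delivery_combinations` are hypothetical and not part of the disclosure, and the output order differs from the numbering above.

```python
from itertools import product

# Delivery forms taken from the list above; names are illustrative assumptions.
RETROTRANSPOSASE_FORMS = ["DNA", "RNA", "protein", "virus"]
TEMPLATE_FORMS = ["DNA", "RNA", "virus"]


def delivery_combinations():
    """Return every retrotransposase-form / template-form pairing as a label."""
    return [
        f"Retrotransposase {r_form} + template {t_form}"
        for r_form, t_form in product(RETROTRANSPOSASE_FORMS, TEMPLATE_FORMS)
    ]


combos = delivery_combinations()
for number, label in enumerate(combos, start=1):
    print(f"{number}. {label}")
```

Writing the options as a product of two short lists makes it easy to see that the enumeration is exhaustive for the forms named.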

In one embodiment, the system and/or components of the system are delivered in the form of nucleic acids. For example, a Gene Writer polypeptide may be delivered in the form of DNA or RNA encoding the polypeptide, and a template RNA may be delivered in the form of RNA or its complementary DNA to be transcribed into RNA. In some embodiments, the system or components of the system are delivered on 1, 2, 3, 4, or more different nucleic acid molecules. In some embodiments, the system or components of the system are delivered as a combination of DNA and RNA. In some embodiments, the system or components of the system are delivered as a combination of DNA and protein. In some embodiments, the system or components of the system are delivered as a combination of RNA and protein. In some embodiments, the Gene Writer genome editor polypeptide is delivered as a protein.

In some embodiments, the system or components of the system are delivered to a cell, such as a mammalian cell or a human cell, using a vector. The vector may be, for example, a plasmid or a virus. In some embodiments, the delivery is in vivo, in vitro, ex vivo, or in situ. In some embodiments, the virus is an adeno-associated virus (AAV), a lentivirus, or an adenovirus. In some embodiments, the system or components of the system are delivered to the cell with a virus-like particle or a virosome. In some embodiments, the delivery uses more than one virus, virus-like particle, or virosome.

In one embodiment, the compositions and systems described herein may be formulated in liposomes or other similar vesicles. Liposomes are spherical vesicle structures composed of a uni- or multilamellar lipid bilayer surrounding internal aqueous compartments and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes can be anionic, neutral, or cationic. Liposomes are biocompatible and non-toxic, can deliver both hydrophilic and lipophilic drug molecules, protect their cargo from degradation by plasma enzymes, and transport their load across biological membranes and the blood brain barrier (BBB) (for a review see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679).

Vesicles can be made from several different types of lipids; however, phospholipids are most commonly used to form liposomes as drug carriers. Methods for preparing multilamellar vesicle lipids are known in the art (see, e.g., U.S. Patent No. 6,693,086, the teachings of which are incorporated herein by reference for multilamellar vesicle lipid preparation). Although vesicle formation is spontaneous when a lipid film is mixed with an aqueous solution, it can also be accelerated by applying force in the form of shaking, using a homogenizer, sonicator, or extrusion apparatus (for a review see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679). Extruded lipids may be prepared by extrusion through filters of decreasing size, as described in Templeton et al., Nature Biotech, 15:647-652, 1997, the teachings of which are incorporated herein by reference for the preparation of extruded lipids.

Lipid nanoparticles are another example of a carrier that provides a biocompatible and biodegradable delivery system for the pharmaceutical compositions described herein. Nanostructured lipid carriers (NLCs) are modified solid lipid nanoparticles (SLNs) that retain the characteristics of SLNs, improve drug stability and loading capacity, and prevent drug leakage. Polymeric nanoparticles (PNPs) are an important component of drug delivery. These nanoparticles can effectively direct drug delivery to specific targets and improve drug stability and controlled drug release. Lipid-polymer nanoparticles (PLNs), a newer type of carrier that combines liposomes and polymers, can also be used. These nanoparticles have the complementary advantages of PNPs and liposomes. A PLN has a core-shell structure; the polymer core provides a stable structure and the phospholipid shell provides good biocompatibility. Together, the two components increase the drug encapsulation efficiency, facilitate surface modification, and prevent leakage of water-soluble drugs. For a review, see, e.g., Li et al. 2017, Nanomaterials 7, 122; doi:10.3390/nano7060122.

Exosomes may also be used as drug delivery vehicles for the compositions and systems described herein. For a review, see Ha et al., July 2016, Acta Pharmaceutica Sinica B, Volume 6, Issue 4, pages 287-296; https://doi.org/10.1016/j.apsb.2016.02.001.

The Gene Writer system can be incorporated into cells, tissues and multicellular organisms. In some embodiments, the system or components of the system are delivered to the cells via mechanical means or physical means.

Formulations of protein therapeutics are described in Meyer (Ed.), Therapeutic Protein Drug Products: Practical Approaches to Formulation in the Laboratory, Manufacturing, and the Clinic, Woodhead Publishing Series (2012).

All publications, patent applications, patents, and other publications and references cited herein (e.g., sequence database reference numbers) are incorporated herein by reference in their entirety. For example, all GenBank, Unigene, and Entrez sequences mentioned herein (e.g., in any of the tables herein) are incorporated by reference. Unless otherwise indicated, the sequence accession numbers specified herein (including in any of the tables herein) refer to the database entries current as of March 4, 2020. When a gene or protein references multiple sequence accession numbers, all sequence variants are encompassed.

Examples

The invention is further illustrated by the following examples. These examples are provided for illustrative purposes only and should not be construed in any way as limiting the scope or content of the present invention.

Example 1: Internal Gene Writer deletions showing protein domain modularity

This example describes deletions in the Gene Writer polypeptide that retain function and further demonstrate modularity of the DNA binding domain.

In this example, a series of experiments was performed to test the activity of various mutant retrotransposases and to gain structural knowledge about these proteins. The experiments tested removal of a polypeptide fragment at the c-myb motif of the DNA binding domain (DBD) and its replacement with a flexible linker (FIG. 1a). The removed polypeptide fragment is referred to as the "natural linker" because it is the intervening region between the DNA binding motif and the RNA binding domain. The removed region begins on its N-terminal side at either position A (a predicted random coil following the c-myb motif) or position B (the end of an alpha helix predicted to be part of the c-myb motif) and ends at either position v1 (the N-terminal side of the alpha-helical region of R2Tg preceding the predicted -1 RNA binding motif) or position v2 (the C-terminal side of the alpha-helical region of R2Tg preceding the predicted -1 RNA binding motif). Two linkers were used in place of the removed "natural linker": linker A, XTEN: SGSETPGTSESATPES (SEQ ID NO: 1023), and linker B, 3GS: (GGGS)n (SEQ ID NO: 1024). For each mutant retrotransposase containing a different removed region (position A-v1, position A-v2, position B-v1, or position B-v2), the region was replaced with either linker A or linker B by PCR of the DNA plasmid expressing R2Tg, yielding c-mybA-v1, c-mybA-v2, c-mybB-v1, and c-mybB-v2, each replaced with either the 3GS linker or the XTEN linker, as shown in Table E1 below. The resulting DNA plasmids were sequence-verified and purified for testing.

Table E1. Amino acid sequences of R2Tg mutants in which a linker replaces the "natural linker" region between the DNA binding domain (DBD) and the RNA binding domain.

The N-terminal DNA binding domain is shown in italics and the linker attached to the rest of the protein is shown in bold and underlined.


HEK293T cells were plated in 96-well plates and grown overnight at 37 °C and 5% CO2. HEK293T cells were transfected with plasmids expressing R2Tg (wild type), R2 endonuclease mutants, and linker mutants. Transfection was performed using FuGENE HD transfection reagent according to the manufacturer's recommendations, with each well receiving 80 ng plasmid DNA and 0.5 μL transfection reagent. All transfections were performed in duplicate, and cells were incubated for 72 hours prior to genomic DNA extraction.

The activity of the mutants was measured by a ddPCR assay that quantifies the copy number of R2Tg integration by counting 3' junction amplicons (FIG. 1b).

Deletions starting after the random coil that follows the c-myb DNA binding motif (position A, c-mybA) were well tolerated, with integration activity close to that of wild-type R2Tg. Ending the natural linker deletion at position v1 (the N-terminal side of the alpha helix preceding the -1 RNA binding motif) or v2 (the C-terminal side of the alpha helix preceding the -1 RNA binding motif) gave nearly identical results. For deletions starting at position A and ending at position v1 or v2, replacement of the polypeptide fragment with the XTEN linker (SEQ ID NO: 1023) appeared to retain the most activity, whereas integration activity was reduced by about 50% when the fragment was replaced with the 3GS linker (SEQ ID NO: 1024). Deletions of the natural linker starting at position B (c-mybB) showed a more pronounced decrease in integration activity compared to wild type or to deletions starting at position A (c-mybA). The difference in activity may relate to protein structure: depending on the site of the deletion, the position of the linker, the length of the linker, or the amino acid composition of the linker may produce a non-optimal three-dimensional structure of the retrotransposase and may not be the optimal choice for joining position B to position v1 or v2. Even though starting the natural linker deletion at position B is suboptimal, ending the deletion at v2 was best tolerated with either the 3GS or XTEN linker, suggesting a preferred position for the polypeptide preceding the -1 RBD region.

Example 2: determination of Gene Writer endonuclease Domain target specificity

This example describes the use of custom genomic landing pads in human cells to determine if sequence requirements are present for target cleavage and subsequent Gene Writer system integration.

In this example, cell lines were created with a "landing pad", a stable integration that mimics the rDNA region containing the R2 site that the R2 retrotransposase targets for retrotransposition (see FIG. 2). The integrants, or landing pads, were designed to have the wild-type sequence in and around the R2 site found in rDNA, a 12-bp sequence mutation at and around the R2 cleavage site, or a 75-bp sequence mutation at and around the R2 cleavage site (Table E2). The DNA of these different landing pads was chemically synthesized and cloned into the pLenti-N-tGFP vector. The landing pads cloned into the lentiviral expression vector were confirmed and sequence-verified by Sanger sequencing. The sequence-verified plasmid (9 μg) and lentiviral packaging mix (9 μg, obtained from Biosettia) were transfected into the packaging cell line LentiX-293T (Takara Bio) using Lipofectamine 2000™ according to the manufacturer's instructions. The transfected cells were incubated at 37 °C with 10% CO2 for 48 hours (including one medium change at 24 hours), and the medium containing the virus particles was then collected from the cell culture dish. The collected medium was filtered through a 0.2 μm filter to remove cell debris and prepared for transduction of U2OS cells. The virus-containing medium was diluted in DMEM and mixed with polybrene to prepare a dilution series for transduction of U2OS cells, with a final polybrene concentration of 8 μg/ml. U2OS cells were grown in virus-containing medium for 48 hours and then split with fresh medium. The split cells were grown to confluence, and transduction efficiency at the different virus dilutions was measured by flow cytometry and ddPCR, which detected GFP expression and the genome-integrated lentivirus containing GFP and the different rDNA landing pads (WT, 12-bp mutation, or 75-bp mutation). GFP-positive cell lines from the 1:10 dilution of virus medium (>99% GFP+) were selected, used for subsequent experiments, and cryopreserved.

To test whether mutations in and around the R2 cleavage site would affect Gene Writer system activity, the R2Tg Gene Writer driver was electroporated into the different landing pad cell lines along with a plasmid expressing the Gene Writer transgene molecule. To test whether sequences within and around the cleavage site affect Gene Writer integration activity, the homology arms of the Gene Writer template molecule were designed, for each landing pad, to have 100 bp with 100% homology to the left of the cleavage site (Gene Writer molecule module A) and 100 bp with 100% homology to the right of the cleavage site (Gene Writer molecule module F). The changes to the homology arms of the Gene Writer template molecule expression plasmid were introduced by PCR and confirmed by Sanger sequencing. 73 ng of WT R2Tg Gene Writer driver or endonuclease domain mutant R2Tg Gene Writer driver expression plasmid was co-nucleofected with 177 ng of plasmid expressing a Gene Writer template molecule with 100% homology to the WT landing pad, the 12-bp mutant landing pad, or the 75-bp mutant landing pad into each of the different U2OS landing pad cell lines (WT, 12-bp mutant, or 75-bp mutant) using nucleofection program DN100. After nucleofection, cells were cultured at 37 °C with 10% CO2 for 3 days, followed by cell lysis and genomic DNA extraction. Integration of the Gene Writer template molecule at the landing pad site was measured in the extracted gDNA by ddPCR. DNA nicking activity was measured by next-generation sequencing analysis of amplicons generated from the landing pads in the gDNA, detecting insertions, deletions, and/or combinations of insertions and deletions at the landing pad.
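As an illustration of the homology arm design described above (100 bp matching the landing pad on each side of the cleavage site), the following minimal Python sketch extracts left and right arms from a landing pad sequence. The function name `homology_arms`, the 0-based `cut_index` convention, and the toy sequence are assumptions made for this sketch; the actual landing pad sequences and cleavage coordinates are not reproduced here.

```python
def homology_arms(landing_pad, cut_index, arm_length=100):
    """Return (left_arm, right_arm) flanking a cleavage position.

    cut_index is assumed to be the 0-based index of the first base to the
    right of the cleavage site (a convention chosen for this sketch).
    """
    if cut_index < arm_length or cut_index + arm_length > len(landing_pad):
        raise ValueError("landing pad too short for the requested arm length")
    left_arm = landing_pad[cut_index - arm_length:cut_index]
    right_arm = landing_pad[cut_index:cut_index + arm_length]
    return left_arm, right_arm


# Toy demonstration with 4-nt arms on a made-up sequence (not a real landing pad).
toy_pad = "AAAACCCCGGGGTTTT"
left, right = homology_arms(toy_pad, cut_index=8, arm_length=4)
print(left, right)
```

Because each arm is copied from the landing pad it is meant to pair with, a template built this way has 100% homology to that landing pad, including any 12-bp or 75-bp mutation that falls within the arms.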

The integration activity of the R2Tg Gene Writer was greatly reduced when the cleavage region was mutated, with no integration of the Gene Writer template molecule in the 12-bp or 75-bp landing pad cell lines (FIG. 3a). Furthermore, no integration was detected for the Gene Writer template molecules with homology arms corresponding to the 12-bp or 75-bp mutant landing pads. To rule out that the loss of integration activity was due to incompatible homology arms, DNA nicking activity was measured by NGS analysis of the landing pads. The nicking activity was independent of the Gene Writer template molecule, as the WT R2Tg Gene Writer driver produced comparable indel levels at the WT landing pad with the WT, 12-bp mutant, or 75-bp mutant Gene Writer template molecule (FIG. 3b). Neither the 12-bp nor the 75-bp landing pad showed any indels above background, regardless of which Gene Writer template molecule was co-nucleofected with the WT R2Tg Gene Writer driver. The indel levels were similar to those obtained with the Gene Writer driver containing the endonuclease mutation.

Table E2: landing pad information

In some embodiments, the Gene Writer is derived from a retrotransposase having a certain level of target sequence specificity in the endonuclease domain. Thus, it may be desirable to re-target the Gene Writer to a location in the genome that has homology to the native target sequence recognized by the endonuclease domain, known as the endonuclease recognition motif (ERM). In some embodiments, the ERM may be contained in a region surrounding the nick site. In a specific embodiment, a 13-nt sequence (TAAGGTAGCCAAA (SEQ ID NO: 1661)) based on the nick site of the R2 element (e.g., R2Tg) is used to search for suitable locations in the human genome for re-targeting the Gene Writer, wherein a heterologous DNA binding domain is designed to localize the Gene Writer to the endogenous ERM, thereby directing endonuclease activity and subsequent retrotransposition of the template RNA. In some embodiments, the human genomic locus has 100% identity to at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 nucleotides of the 13-nt motif. In further embodiments, the human genomic locus comprising the ERM is selected from Table E3, and a fused DNA binding domain (e.g., ZF, TAL, or dCas9 with a customized gRNA) is designed to localize the polypeptide to the locus (e.g., see Example 3). In a preferred embodiment, the genomic locus has a safe harbor score of at least 5, 6, 7, or 8, as defined in Pellenz et al., Hum Gene Ther 30, 814-828 (2019) and shown in Table E3. In some embodiments, the template RNA (or DNA encoding the template RNA) is designed such that the homology arms match the flanking genomic sequences around the intended nick site at the new target.

Table E3: Human genomic loci matching the 13-nt segment around the nick site of the R2 element.

The human genome was searched for sites with 100% identity ("match") to the complete 13-nt motif or to 12 consecutive nucleotides of it. Chromosome position and start and end coordinates are provided for each match. The score ("score") rates each site against eight ideal safe harbor features.
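The search criterion described for Table E3 (100% identity to the complete 13-nt motif, or to 12 consecutive nucleotides of it) can be sketched in Python as a simple exact-match scan. This is only an illustrative sketch: the disclosure does not state which search tool was used, and the function names, the 0-based end-exclusive coordinates, and the choice to scan both strands are assumptions made here.

```python
MOTIF = "TAAGGTAGCCAAA"  # 13-nt segment around the R2 nick site (SEQ ID NO: 1661)


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_erm_matches(sequence, motif=MOTIF):
    """Return sorted (start, end, strand, kind) exact-match hits.

    kind is "13/13" for the full motif and "12/13" for either run of 12
    consecutive motif nucleotides that is not part of a full match.
    Coordinates are 0-based, end-exclusive (an assumption of this sketch).
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        n = len(pattern)
        full_starts = set()
        for i in range(len(sequence) - n + 1):
            if sequence[i:i + n] == pattern:
                hits.append((i, i + n, strand, "13/13"))
                full_starts.add(i)
        # The two 12-nt windows: motif without its last or its first base.
        for offset, sub in ((0, pattern[:-1]), (1, pattern[1:])):
            for i in range(len(sequence) - len(sub) + 1):
                if sequence[i:i + len(sub)] == sub and (i - offset) not in full_starts:
                    hits.append((i, i + len(sub), strand, "12/13"))
    return sorted(hits)


# Toy sequence (not genomic data): one full match, one 12-nt match, one
# full match on the opposite strand.
toy = "GG" + MOTIF + "CC" + MOTIF[1:] + "TT" + reverse_complement(MOTIF) + "A"
for hit in find_erm_matches(toy):
    print(hit)
```

A genome-scale search would normally use an indexed aligner rather than a linear scan; the sketch only makes the matching rule explicit.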


Example 3: Re-targeting Gene Writers to genome safe harbor sites

This example describes the Gene Writer comprising a heterologous DNA binding domain that redirects Gene Writer activity to a genomic safe harbor site.

In this example, the Gene Writer polypeptide sequence is altered to have its native DNA binding domain replaced, mutated/inactivated, and/or linked to another polypeptide sequence that can redirect the Gene Writer system to another genomic location that is not its endogenous or native binding site. In some cases, polypeptide sequences that redirect the Gene Writer system to a non-native genomic location may also be ligated and/or inserted into any module of the Gene Writer polypeptide sequence.

In some embodiments, the polypeptide sequence used to redirect the Gene Writer system to a non-native genomic target encodes: a zinc finger; a series of adjacent, regularly or irregularly spaced zinc fingers; a transcription activator-like effector (TALE); a series of adjacent, regularly or irregularly spaced transcription activator-like effectors (TALEs); Cas9; Cas9 with mutations in its catalytic residues that inactivate double-stranded DNA endonuclease activity (referred to as catalytically dead Cas9 (dCas9)); or Cas9 with one or more point mutations in a single catalytic domain such that the Cas9 endonuclease cleaves only one strand of double-stranded DNA (referred to as Cas9 nickase) (see FIG. 5).

In some embodiments, the polypeptide sequences used to redirect the Gene Writer system target genomic safe harbor locations (e.g., the AAVS1 site on human chromosome 19) (Pellenz, S., et al., Human Gene Therapy, 30(7), 814-828, 2019); see FIGS. 4 and 6. In further embodiments, the polypeptide sequence for redirecting the Gene Writer polypeptide sequence is used with a nucleic acid that targets a safe harbor location of the genome (e.g., the polypeptide sequence of catalytically inactive Cas9 is used with a single guide RNA that targets the AAVS1 site on chromosome 19).

Table E4: retargeted Gene Writer construct. The example shown is the re-targeting of the R2Tg Gene Writer polypeptide sequence to the AAVS1 site using ZF or Cas9 domains.


Example 4: inactivation of endogenous nucleolar localization signals in Gene Writer

This example describes the Gene Writer in which endogenous nucleolar localization signals have been inactivated to reduce intracellular targeting of proteins to nucleoli.

In this example, the nucleolar localization signal (NoLS) of retrotransposases was predicted computationally using a published algorithm trained on validated proteins that localize to nucleoli (Scott, M.S., et al., Nucleic Acids Research, 38(21), 7388-7399 (2010)). The predicted NoLS sequence is based on the amino acid sequence, the amino acid sequence context, and the predicted secondary structure of the retrotransposase. The identified sequences are typically rich in basic amino acids (Scott, M.S., et al., Nucleic Acids Research, 38(21), 7388-7399 (2010)), which, when mutated to non-basic amino acids with simple side chains or removed from the retrotransposase polypeptide chain, prevent localization to nucleoli (Yang, C.P., et al., Journal of Biomedical Science, 22(1), 1-15 (2015); Martin, R.M., et al., Nucleus, 6(4), 314-325 (2015)). In some embodiments, the NoLS sequence is located in an amino acid region of the retrotransposase between the reverse transcriptase polymerase motif and the restriction enzyme-like endonuclease (RLE) motif. The predicted NoLS region contains lysine, arginine, histidine, and/or glutamine residues, and nucleolar localization is inactivated by mutating one or more of these residues to alanine and/or by removing one or more of these residues from the polypeptide chain of the retrotransposase. In some embodiments, the amino acid sequence of the R2Tg Gene Writer driver found upstream of the RLE is mutated such that alanine (A) replaces lysine (K); e.g., the predicted NoLS of R2Tg (amino acids 1,128-1,154 of the polypeptide sequence) (APTQKDKFPKPCNWRKNEFKKWTKLAS (SEQ ID NO: 1685)) is mutated at 1, 2, 3, 4, 5, 6, or 7 residues, with mutation of all seven lysines resulting in an inactivated NoLS (APTQADAFPAPCNWRANEFAAWTALAS (SEQ ID NO: 1686)).
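The relationship between the two NoLS sequences given above can be checked with a short Python sketch: substituting alanine for every lysine in SEQ ID NO: 1685 reproduces SEQ ID NO: 1686. The function names are hypothetical, and the residue numbering assumes, as stated above, that the first residue of the 27-amino-acid segment is amino acid 1,128 of the polypeptide.

```python
NOLS_WT = "APTQKDKFPKPCNWRKNEFKKWTKLAS"        # SEQ ID NO: 1685
NOLS_INACTIVE = "APTQADAFPAPCNWRANEFAAWTALAS"  # SEQ ID NO: 1686
NOLS_START = 1128  # residue number of the first amino acid (from the text above)


def lysine_positions(seq, start=NOLS_START):
    """Residue numbers of all lysines (K) in the segment."""
    return [start + i for i, aa in enumerate(seq) if aa == "K"]


def mutate_to_alanine(seq, positions, start=NOLS_START):
    """Replace the lysines at the given residue numbers with alanine (A)."""
    residues = list(seq)
    for pos in positions:
        index = pos - start
        if residues[index] != "K":
            raise ValueError(f"residue {pos} is {residues[index]}, not K")
        residues[index] = "A"
    return "".join(residues)


k_sites = lysine_positions(NOLS_WT)
print(len(k_sites), "lysines at residues", k_sites)
print(mutate_to_alanine(NOLS_WT, k_sites) == NOLS_INACTIVE)
```

Passing a subset of `k_sites` to `mutate_to_alanine` gives the partial mutants (1 to 6 substitutions) contemplated in the same passage.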

Example 5: application of second strand nicks in Gene Writer System

This example describes a Gene Writer system in which retrotransposition is paired with targeted second-strand nicking to increase the efficiency of integration events. The second-strand nick can be achieved by: (1) fusing a Cas9 nickase to the Gene Writer, where the Gene Writer introduces one nick through its endonuclease (EN) domain and the fused Cas9 nickase places a second nick on either the top or bottom DNA strand (fig. 7A), or (2) using a Gene Writer whose active EN domain introduces a nick together with a separate Cas9 nickase that introduces a second nick on either the top or bottom DNA strand, upstream or downstream of the Gene Writer-induced nick (fig. 7B).

In the first part of this example, a Cas9 nickase was fused to the Gene Writer protein (fig. 7A). Cas9 is targeted to a DNA sequence by a gRNA. The Gene Writer protein introduces a DNA nick through its EN domain, and the Cas9 nickase creates an additional nick. This additional nick may be targeted to the top or bottom strand of the DNA around the nick introduced by the Gene Writer (fig. 7A). Constructs designed and tested included (see schematic in fig. 8A):

·Cas9-N863A-R2Tg(RBD*, RT, EN)

·Cas9-H840A-R2Tg(RBD*, RT, EN)

·Cas9-D10A-R2Tg(RBD*, RT, EN)

·dCas9-R2Tg(RBD*, RT, EN)

The DNA binding domain is a Cas9 nickase, which directs the Gene Writer molecule to the DNA target through the gRNA. The RNA binding domain (RBD) in this set of Gene Writer constructs was inactivated by point mutations (RBD*). Because the R2Tg RNA binding domain was inactivated, these constructs used, as the insertion donor, a gRNA extended at its 3' end to include a donor sequence for genomic modification (fig. 8B). These modifications include nucleotide substitutions, nucleotide deletions and nucleotide insertions. In this first set of experiments, the nickase Cas9-R2Tg(RBD*, RT, EN) fusion constructs and dCas9-R2Tg(RBD*, RT, EN), each with a 3'-extended gRNA template targeting the AAVS1 locus, were delivered to U2OS cells by nucleofection in SE buffer using program DN100. For each construct, the gRNAs used included gRNAs targeting the bottom or the top strand of the DNA. Following nucleofection, cells were grown in complete medium for 3 days. gDNA was harvested on day 3, subjected to amplicon sequencing, and analyzed computationally using CRISPResso (an indel analysis tool). Insertions, deletions or nucleotide substitutions mediated by the 3'-extended gRNA were observed following delivery of dCas9-R2Tg(RBD*, RT, EN), and an increased frequency was observed upon delivery of the nickase Cas9-R2Tg(RBD*, RT, EN) constructs.

In the second part of this example, a Cas9 nickase was fused to the Gene Writer protein (fig. 7A). Cas9 is targeted to a DNA sequence by a gRNA. The Gene Writer protein introduces a DNA nick through its EN domain, and the Cas9 nickase creates an additional nick. This additional nick may be targeted to the top or bottom strand of the DNA around the nick introduced by the Gene Writer (fig. 7A). Unlike in the constructs listed above, the R2Tg RNA binding domain is active (fig. 9A), and the template for genomic modification is a transgene flanked by UTRs (fig. 9B). Constructs included (see schematic in fig. 9A):

·Cas9-N863A-R2Tg(RBD, RT, EN)

·Cas9-H840A-R2Tg(RBD, RT, EN)

·Cas9-D10A-R2Tg(RBD, RT, EN)

·dCas9-R2Tg(RBD, RT, EN)

UTR-flanked transgenes require homology arms matched to the nick site. To determine the nick site so that the homology arms of the donor transgene could be designed accurately, the above constructs were nucleofected into 200,000 U2OS cells using pulse code DN100, with a gRNA targeting the AAVS1 locus. Following nucleofection, cells were grown in complete medium for 3 days. gDNA was harvested on day 3, subjected to amplicon sequencing, and analyzed computationally using CRISPResso as an indel analysis tool. The cleavage site of the EN domain was identified from the indels it generated at the AAVS1 site. 100 bp homology arms flanking the EN cleavage site were designed and included in the transgene (see figs. 9 and 10 for the positions of the homology arms in the transgene). To achieve genomic modification, the Cas9-R2Tg fusion constructs listed above were nucleofected into U2OS cells together with a gRNA targeting the top or bottom strand of the AAVS1 locus and an appropriate transgene with homology to the previously determined nick site. Following nucleofection, cells were grown in complete medium for 3 days. gDNA was harvested on day 3, and ddPCR was performed to detect transgene integration at the AAVS1 site. Integration was observed after delivery of dCas9-R2Tg(RBD, RT, EN), and an increased frequency was observed upon delivery of the nickase Cas9-R2Tg(RBD, RT, EN) constructs.
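Once the cleavage site has been located from the indel data, extracting the 100 bp homology arms is a simple slicing operation on the reference sequence. The following is a minimal sketch under stated assumptions: the function name, the 0-based coordinate convention (the nick lies immediately before `cut_index`), and the toy reference sequence are hypothetical and do not represent the AAVS1 locus.

```python
def homology_arms(reference, cut_index, arm_length=100):
    """Return (left_arm, right_arm) flanking a nick.

    The nick is assumed to lie between reference[cut_index - 1] and
    reference[cut_index] (0-based coordinates, an assumed convention).
    """
    if cut_index < arm_length or cut_index + arm_length > len(reference):
        raise ValueError("reference is too short for the requested arm length")
    left_arm = reference[cut_index - arm_length:cut_index]
    right_arm = reference[cut_index:cut_index + arm_length]
    return left_arm, right_arm


# Toy 300 nt reference, used only to show the slicing; not a real locus.
toy_reference = "ACGT" * 75
left, right = homology_arms(toy_reference, cut_index=150)
print(len(left), len(right))
```

Raising an error when the reference is too short avoids silently returning truncated arms.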

In another example, the Gene Writer protein is targeted to DNA through its own DNA binding domain (fig. 7B). The Gene Writer protein introduces a nick on one DNA strand. In addition, a Cas9 nickase is used to create a second nick on the top or bottom strand of the DNA, upstream or downstream of the first nick. In this example, a Gene Writer plasmid targeting the AAVS1 site (fig. 10A) and a UTR-flanked transgene with homology to the AAVS1 site (fig. 10B) were nucleofected into 200,000 U2OS cells using pulse code DN100. The following Cas9 constructs were co-transfected with the Gene Writer plasmid (fig. 10C):

·Cas9-N863A

·Cas9-H840A

·Cas9-D10A

·dCas9

All Cas9 constructs were co-nucleofected with gRNAs targeting the AAVS1 locus on the top or bottom strand, upstream or downstream of the nick introduced by the Gene Writer. Following nucleofection, cells were grown in complete medium for 3 days. gDNA was harvested on day 3, and ddPCR was performed to detect transgene integration at the AAVS1 site. Integration was observed after dCas9 delivery, and an increased frequency was observed upon delivery of the nickase Cas9 constructs.

Example 6: improved expression of Gene Writer polypeptides by heterologous UTRs

This example describes the use of heterologous UTRs to enhance intracellular expression of Gene Writer polypeptides.

In this example, the Gene Writer polypeptide is expressed from mRNA (fig. 11). In the plasmid template used for mRNA production, the native retrotransposon UTRs are replaced with UTRs optimized for protein expression: either the C3 5' UTR and ORM 3' UTR (from Asrani et al., RNA Biology 15, 756-762 (2018)) or the 5' and 3' UTRs from Richner et al., Cell 168, 1114-1125 (2017). The plasmid includes a T7 promoter followed by a 5' UTR, the retrotransposon coding sequence, a 3' UTR, a 3GS linker (SEQ ID NO: 1024), an SV40 nuclear localization signal (NLS), an XTEN linker, a HiBit sequence and a 96-100 nucleotide poly(A) tail (SEQ ID NO: 1687). The plasmid was linearized by restriction digestion to produce blunt ends or 5' overhangs downstream of the poly(A) tail and used for in vitro transcription (IVT) with T7 polymerase (NEB). Following IVT, the RNA was treated with DNase I (NEB). After buffer exchange, enzymatic capping was performed using vaccinia capping enzyme (NEB) and 2'-O-methyltransferase (NEB) in the presence of GTP and SAM (NEB). The capped RNA was concentrated and buffer exchanged. 50,000 HEK293T cells were transfected with 0.5 μg of Gene Writer mRNA, in the presence or absence of RNA template at a 1:1 molar ratio, using a Neon transfection system (2 pulses, 20 msec per pulse, in a 10 μL tip, 96-well format). RNA templates were transcribed in vitro from plasmids as described in Example 8 (improved Gene Writer component for RNA-based delivery).

HEK293T cells were grown for 5 hours after transfection, and Gene Writer expression was then determined by detecting its HiBit tag using the standard protocol (https://www.promega.com/-/media/files/resources/protocols/technical-manuals/500/nano-glo-hibit-lytic-detection-system-technical-manual.pdf?la=en). Using the 5' and 3' UTRs from C3-ORM was found to greatly improve protein expression compared with using the native UTRs from R2Tg (fig. 11). Genomic integration was analyzed 3 days after transfection using 3' ddPCR (fig. 12).

Example 7: improved Gene Writer component for mixed RNA and DNA delivery

This example describes improvements to RNA molecules encoding Gene Writer polypeptides that, when used with Gene Writer templates encoded on plasmid DNA, enhance expression and increase retrotransposition efficiency.

In this example, the polypeptide component of the Gene Writer™ system is expressed from the mRNA described in Example 6 (improved expression of Gene Writer polypeptides by heterologous UTRs). The plasmid template was synthesized such that the reporter gene (eGFP) is flanked by the R2Tg untranslated regions (UTRs) and 100 bp of homology to its rDNA target. Expression of the element is driven by the mammalian CMV promoter. Each plasmid was introduced into HEK293T cells using HD transfection reagent. 24 hours prior to transfection, HEK293T cells were seeded at 10,000 cells/well in 96-well plates. On the day of transfection, 0.5 μl of transfection reagent and 80 ng of DNA were mixed in 10 μl of Opti-MEM and incubated for 15 min at room temperature. The transfection mixture was then added to the medium of the seeded cells. For mRNA delivery, cells were detached and electroporated with 0.5 μg of mRNA per well using a Neon transfection system (1150 V per pulse, 20 msec per pulse, 2 pulses in a 10 μL tip, 96-well format).

HEK293T cells were transfected with the following test agents:

1. mRNA encoding the above polypeptide

2. Plasmid encoding the above template RNA

3. A combination of 1 and 2 above, in which the plasmid was lipofected 24 hours before mRNA transfection.

Following transfection, HEK293T cells were cultured for 1-3 days and then assayed for site-specific genome editing. Genomic DNA was isolated from each group of HEK293 cells.

ddPCR was performed to confirm integration and assess integration efficiency. Taqman probes and primers were designed as described in PCT/US2019/048607 to amplify the expected product spanning the 5' and 3' junctions of the integration. The results of the ddPCR copy number analysis (relative to the reference gene RPP30) are shown in fig. 13. In the presence of both mRNA and template plasmid, genomic integration reached an average copy number of 0.683 integrants per genome when assaying the 3' junction and 0.249 integrants per genome when assaying the 5' junction. By comparison, transfection of mRNA alone gave an average copy number of 0.002 integrants per genome, and transfection of plasmid alone gave 0.0004 integrants per genome.
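For readers unfamiliar with ddPCR copy number analysis, the calculation behind an "integrants per genome" value can be sketched as follows. This is a generic illustration of the standard Poisson correction and target-to-reference ratio, not the analysis pipeline actually used here. The function names, the droplet counts, and the `ref_copies_per_genome` parameter (how many RPP30 copies are taken to represent one genome) are assumptions, since the text does not state the normalization convention.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of target copies per droplet."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1.0 - positive / total)


def integrants_per_genome(target_pos, target_total, ref_pos, ref_total,
                          ref_copies_per_genome=1.0):
    """Ratio of target to reference copies, scaled to copies per genome.

    ref_copies_per_genome is an assumed normalization (reference-gene
    copies counted as one genome); adjust it to the convention in use.
    """
    target = copies_per_droplet(target_pos, target_total)
    reference = copies_per_droplet(ref_pos, ref_total)
    return target / reference * ref_copies_per_genome


# Hypothetical droplet counts, for illustration only.
print(integrants_per_genome(target_pos=900, target_total=18000,
                            ref_pos=6000, ref_total=18000))
```

Because both concentrations are measured per droplet, the droplet volume cancels in the ratio and does not need to be known.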

Example 8: improved Gene Writer component for RNA-based delivery

This example describes improvements to RNA molecules encoding Gene Writer polypeptides that, when co-delivered with a Gene Writer RNA template, can enhance expression and increase retrotransposition efficiency.

In this example, the polypeptide component of the Gene Writer™ system is expressed from the mRNA described in Example 6 (improved expression of Gene Writer polypeptides by heterologous UTRs). The plasmid template for RNA template production includes a T7 promoter followed by an IRES driving expression of a reporter gene (eGFP), flanked by the R2Tg untranslated regions (UTRs) and 100 bp of homology to its rDNA target. The plasmid template was linearized by restriction digestion, producing blunt ends or 5' overhangs downstream of the RNA template sequence, and used for in vitro transcription (IVT) with T7 RNA polymerase (NEB). After the IVT step, the RNA was treated with DNase I (NEB) and either enzymatically polyadenylated with poly(A) polymerase (NEB) or left without a poly(A) tail. After buffer exchange, enzymatic capping was performed using vaccinia capping enzyme (NEB) and 2'-O-methyltransferase (NEB) in the presence of GTP and SAM (NEB). The capped RNA was concentrated and buffer exchanged. 50,000 HEK293T cells were co-transfected with 0.5 to 1 μg of Gene Writer mRNA and with RNA template at an mRNA:template molar ratio of 1:4 to 1:12. A Neon transfection system was used for RNA transfection (1150 V per pulse, 20 msec per pulse, 2 pulses in a 10 μl tip, 96-well format).
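Setting up a molar ratio between two RNAs of different lengths requires converting mass to moles. The sketch below uses the common approximation for single-stranded RNA molecular weight (number of nucleotides × 320.5 + 159.0 g/mol). The function names and the RNA lengths in the usage example are hypothetical, because the lengths of the Gene Writer mRNA and RNA template are not given in this passage.

```python
def ssrna_molecular_weight(length_nt):
    """Approximate molecular weight (g/mol) of a single-stranded RNA."""
    return length_nt * 320.5 + 159.0


def template_mass_ug(mrna_mass_ug, mrna_length_nt, template_length_nt,
                     template_per_mrna):
    """Mass of RNA template (ug) giving the requested molar excess.

    template_per_mrna is the template:mRNA molar ratio, e.g. 8 for an
    mRNA:template ratio of 1:8.
    """
    # ug divided by g/mol gives umol; multiply by 1e6 to get pmol.
    mrna_pmol = mrna_mass_ug / ssrna_molecular_weight(mrna_length_nt) * 1e6
    template_pmol = mrna_pmol * template_per_mrna
    return template_pmol * ssrna_molecular_weight(template_length_nt) / 1e6


# Hypothetical lengths: a 4,000 nt mRNA and a 2,000 nt template at 1:8.
print(round(template_mass_ug(0.5, 4000, 2000, 8), 3))
```

When the two RNAs have the same length, the required template mass is simply the mRNA mass multiplied by the molar excess; shorter templates need proportionally less mass.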

Following transfection, HEK293T cells were cultured for at least 1 day and then assayed for site-specific genome editing. Genomic DNA was isolated from each group of HEK293 cells. ddPCR was performed to confirm integration and assess integration efficiency. Taqman probes and primers were designed as described in PCT/US2019/048607 to amplify the expected product spanning the 5' and 3' junctions of the integration. With 0.5 μg of mRNA and a 1:8 molar ratio of Gene Writer mRNA to RNA template, an average copy number of 0.498 integrants per genome was achieved when the RNA template was enzymatically polyadenylated, compared with 0.031 integrants per genome when the RNA template was not polyadenylated.

Example 9: Gene Writers for delivery of intron-containing gene cargo

This example describes the use of the Gene Writer system with RNA-based delivery to integrate gene cargo comprising introns, in order to modulate expression of a gene of interest from its newly introduced genomic locus.

In this example, the Gene Writing technology uses an RNA template to encode a protein of interest, including its natural or non-natural introns. For example, intron 6 of the triose phosphate isomerase (TPI) gene (Nott et al., 2003) will be used as one of the non-natural introns in these experiments.

The presence of introns in genomic copies of genes, and their removal by splicing, has been reported to affect almost every aspect of gene expression, including transcription rate, mRNA processing, export, cellular localization, translation and decay (reviewed in Shaul, International Journal of Biochemistry and Cell Biology 91B, 145-155 (2017)). Introns may be inserted into different parts of the RNA template (fig. 15), and their effect on gene expression may differ depending on the location of the intron.

Introns in the 5' UTR, near the transcription start site, introduce activating chromatin modifications (Bieberstein et al., Cell Reports 2, 62-68 (2012)), improve the accuracy of transcription start site recognition and facilitate the recruitment of Pol II (Laxa et al., Plant Physiology 172, 313-327 (2016)), increase the rates of transcription initiation (Kwek et al., Nature Structural Biology 9, 800-805 (2002)) and elongation (Lin et al., Nature Structural and Molecular Biology 15, 819-826 (2008)), and improve productive elongation in the sense orientation relative to the antisense orientation (Almada et al., Nature 499, 360-363 (2013)).

An intron in the 3' UTR limits mRNA expression to one protein molecule per mRNA: the exon junction complex (EJC) deposited by the spliceosome downstream of the stop codon is recognized by the nonsense-mediated decay (NMD) machinery, so that the mRNA is marked for degradation at the end of the first round of translation (Zhang et al., RNA 4, 801-815 (1998)).

However, the ability to use introns in therapeutic genes may be limited by splicing that occurs before the template is integrated. For example, when an RNA template is encoded on and delivered as a DNA plasmid, an intron in the forward orientation will be spliced out of the transcribed template RNA before integration and therefore will not be incorporated into the genome. Similarly, lentiviral constructs designed for transgene delivery must encode intron-containing sequences in the reverse orientation, because the viral packaging process can otherwise splice out the intron and yield packaged viral particles that lack it (Miller et al., J. Virol. 62, 4337-45 (1988)). However, the reverse orientation is also thought to reduce viral titer and transduction (Uchida et al., Nat Commun 10, 4479 (2019)). Notably, because the Gene Writer template can be generated by in vitro transcription and delivered directly as RNA, pre-integration splicing of the desired intron can be avoided. In some embodiments, the Gene Writer template may thus comprise one or more introns in the same orientation as the transcript, which is produced by IVT and delivered to the target cells as RNA.

An intron at any of the positions depicted in FIG. 15 will recruit U1 snRNP, which protects the mRNA from premature cleavage and polyadenylation (Kaida et al., Nature 468, 664-681 (2010); Berg et al., Cell 150, 53-64 (2012)). In addition, the EJC interacts with components of the TREX (transcription-export) complex and increases the rate of mRNA export from the nucleus to the cytoplasm 6- to 10-fold compared with constructs lacking introns (Valencia et al., PNAS 105, 3386-3391 (2008)). Binding of polypyrimidine tract-binding protein, a splicing regulator, has also been shown to mediate a significant increase in the half-life of spliced transcripts (Lu & Cullen, RNA 9, 618-630 (2003); Millevoi et al., Nucleic Acids Research 37, 4672-4683 (2009)). The efficiency of mRNA translation has been shown to be increased by the presence of SR proteins (serine/arginine-rich proteins involved in RNA splicing) (Sanford et al., Genes & Development 18, 755-768 (2004); Sato et al., Molecular Cell 29, 255-262 (2008)) and of EJC proteins and their peripheral factors (Nott et al., Genes & Development 18, 210-222 (2004)).

In this example, a template RNA containing one or more introns and a Gene Writer polypeptide are delivered to cells as in vitro transcribed capped RNAs, as described in example 8 (improved Gene Writer component for RNA-based delivery). GOI expression and genomic integration were determined 1 to 3 days post transfection. In some embodiments, genomic integration and/or protein expression will be higher for an intron-containing RNA template.

Example 10: engineering of retrotransposon 5' UTRs to improve integration efficiency

This example describes deletion, substitution or mutation of the 5' UTR of retrotransposons to increase integration efficiency.

The 5' UTR of non-LTR retrotransposons has a variety of functions, including self-cleaving ribozyme activity, which has been demonstrated in some elements and predicted in others (see modules B and C of FIGS. 26-27) (Ruminski et al., J Biol Chem 286, 41286-41295 (2011)). The ribozyme activity is expected to cleave the RNA within or upstream of the 5' UTR. Enhancing or limiting such activities and structural components of the 5' UTR may benefit retrotransposition efficiency. FIG. 28 provides a predicted ribozyme structure for R2Tg.

To assess engineering of the 5' UTR, constructs were designed to enhance or reduce these activities (fig. 14). In case (A), the native 5' UTR of R2Tg is used for integration in trans, as in the previous experiments. Case (B) illustrates deletion of the 5' UTR. Cases (C) and (D) represent replacement of the 5' UTR of the original species (in this case R2Tg, from T. guttata) with the 5' UTR of a retrotransposon from a different species. Case (C) provides an example in which the 5' UTR of A. maritima R2 replaces the 5' UTR of R2Tg. Case (D) represents the general case, in which a UTR ("Rx") from another species may be substituted, for example from B. mori, D. ananassae, F. auricularia, L. polyphemus, N. giraulti or O. latipes, or from a retrotransposon selected from any of Tables 1-3 herein or of PCT/US2019/048607 (incorporated herein by reference in its entirety). Case (E) represents substitution with a ribozyme, e.g., a hammerhead ribozyme, e.g., RiboJ (Lou et al., Nat Biotechnol 30, 1137-1142 (2012)). Case (F) represents inactivation of the 5' UTR of R2Tg by a point mutation, e.g., 75C>T in the 5' UTR (fig. 14B, position indicated by the hatched box). The 5' UTR sequence is expected to be modular for any retrotransposon-mediated insertion.

Each case was evaluated as in the previous example by: gene Writer polypeptide plasmids were transfected with template plasmids and the frequency of integration was assessed by ddPCR. In some embodiments, substitution or mutation of the 5' utr results in increased integration efficiency.

Example 11: modification of the 5 'and 3' ends of Gene Writer RNA components to improve RNA stability

This example describes the addition of non-coding sequences at the 5 'and 3' ends of RNA to improve stability in mammalian cells.

Decay of eukaryotic RNA in cells is carried out mainly by exoribonucleases. In this example, the half-life of the RNA is extended by introducing protective sequences and/or modifications at its 5' and 3' ends. The most common natural way of protecting the ends of an RNA is a 5' cap structure and a 3' poly(A) tail. In this example, the polypeptide component of the Gene Writer™ system is expressed from the mRNA described in Example 6 (improved expression of Gene Writer polypeptides by heterologous UTRs). The plasmid template for RNA template production includes a T7 promoter followed by an IRES driving expression of a reporter gene (eGFP), flanked by the R2Tg untranslated regions (UTRs) and 100 bp of homology to its rDNA target. The plasmid template was linearized by restriction digestion, producing blunt ends or 5' overhangs downstream of the RNA template sequence, and used for in vitro transcription (IVT) with T7 polymerase (NEB). After the IVT step, the RNA was treated with DNase I (NEB) and either enzymatically polyadenylated with poly(A) polymerase (NEB) or left without a poly(A) tail. After the buffer exchange step, the enzymatic capping reaction that generates the cap 1 structure was either performed as described in Example 8 (improved Gene Writer component for RNA-based delivery) or omitted. The template RNA was concentrated and buffer exchanged. 50,000 HEK293T cells were co-transfected with 0.5 μg of Gene Writer mRNA and with RNA template at a molar ratio of 1:1 to 1:8 using a Neon transfection system (2 pulses, 20 msec per pulse, in a 10 μL tip, 96-well format).

Following transfection, HEK293T cells were cultured for 1-3 days and then assayed for site-specific genome editing. Genomic DNA was isolated from each group of HEK293 cells. ddPCR was performed to confirm integration and assess integration efficiency. Taqman probes and primers were designed as described in PCT/US2019/048607 to amplify the expected product spanning the 3' junction of the integration. Genomic integration was improved when enzymatically capped and poly(A)-tailed templates were used (fig. 16).

With 0.5 μg of mRNA and a 1:8 molar ratio of mRNA to RNA template, an average copy number of 0.498 integrants/genome was achieved when the RNA template was enzymatically polyadenylated, compared with 0.031 integrants/genome when it was not.

Modification of the 3' end of RNA

Interactions between poly(A) tails shorter than 15-20 nt and poly(A)-binding protein (PABP) are reported to be unstable, resulting in rapid RNA degradation (Chang et al., Molecular Cell 53, 1044-1052 (2014); Subtelny et al., Nature 508, 66-71 (2014)). To determine the appropriate length of the poly(A) tail for the template RNA, we will test tails of 30, 40, 50, 60, 70, 80, 90 and 100 nucleotides. The IVT templates will be generated by PCR using reverse primers that encode poly(A) tails of these lengths. IVT, DNase I treatment and capping of the Gene Writer mRNA and RNA templates will be performed as described in Example 8 (improved Gene Writer component for RNA-based delivery). Genomic integration will be determined one to three days after transfection. In some embodiments, genomic integration will be higher for an RNA template with a poly(A) tail of suitable length.
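A reverse primer that encodes a poly(A) tail carries a 5' run of T residues followed by the reverse complement of the 3' end of the template, so that the PCR product, and hence the transcript, ends in the desired number of A residues. The sketch below generates such primers for the eight tail lengths listed above. The function names and the 20 nt template 3' end are hypothetical placeholders, not sequences from this disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
TAIL_LENGTHS = [30, 40, 50, 60, 70, 80, 90, 100]  # lengths named in the text

# Hypothetical sense-strand sequence at the 3' end of the template.
TEMPLATE_3PRIME_END = "GATTACAGATTACAGATTAC"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def poly_a_reverse_primer(template_3prime_end, tail_length):
    """Build a reverse primer (5' to 3') encoding a poly(A) tail.

    The 5' poly(T) run templates the poly(A) tail on the sense strand;
    the 3' portion anneals to the end of the template.
    """
    return "T" * tail_length + reverse_complement(template_3prime_end)


primers = {n: poly_a_reverse_primer(TEMPLATE_3PRIME_END, n)
           for n in TAIL_LENGTHS}
for n, primer in primers.items():
    print(n, len(primer), primer[-20:])
```

Long homopolymer primers can be difficult to synthesize and may slip during amplification, so the tail length of the resulting transcripts would normally be confirmed experimentally.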

In cells, RNA degradation is initiated by shortening of the poly(A) tail by deadenylases. Because deadenylases are 3'-5' exoribonucleases that prefer poly(A) stretches, the terminal uridine, cytidine and, most commonly, guanosine residues detected in the natural poly(A) tails of many mRNAs have been proposed to protect the poly(A) tail from shortening (Chang et al., Molecular Cell 53, 1044-1052 (2014)). We will assay Gene Writer mRNA and template RNA carrying encoded poly(A) tails with a terminal G or C, or with intermittent G or C (similar to those used by Lim et al., Science 361, 701-704 (2018)), as described previously.

Some RNAs have evolved alternative ways of protecting their 3' ends. A specific 16-nucleotide stem-loop structure flanked by 5 unpaired nucleotides has been reported to protect the 3' end of the mRNA encoding histone H2A.X (Mannironi et al., Nucleic Acids Research 17, 9113-9126 (1989)). Heterologous mRNAs ending in the histone stem-loop structure have been shown to be cell cycle regulated (Harris et al., Molecular and Cellular Biology 11, 2416-2424 (1991); Stauber et al., EMBO Journal 5, 3297-3303 (1986)). The stem-loop structure is recognized and protected by stem-loop binding protein (SLBP). SLBP accumulates shortly before cells enter S phase and is rapidly degraded at the end of S phase (Whitfield et al., Molecular and Cellular Biology 20, 4188-4198 (2000)). The stem-loop element will be inserted at the 3' end of the Gene Writer mRNA and RNA template and tested as described above to induce cell cycle-specific genomic integration events.

Some viruses and long non-coding RNAs have evolved triple-helix structures that protect their 3' ends (Brown et al., PNAS 109, 19202-19207 (2012)). Furthermore, structural elements of tRNA, Y RNA and vault RNA (reviewed in Labno et al., Biochimica et Biophysica Acta 1863, 3125-3147 (2016)) have been reported to extend the half-life of these non-coding RNAs. We will insert such structures to protect the 3' end of the RNA template and test their efficiency in the Gene Writing system as described above.

Finally, we will incorporate dNTPs, 2'-O-methylated NTPs or phosphorothioate NTPs at the 3' end of the RNA transgene to extend the half-life of these molecules by protecting the 3' end of the RNA from exoribonucleases. We will incorporate a single modified nucleotide, or a stretch of modified nucleotides, by extending the 3' end of the RNA with a DNA polymerase (e.g., Klenow fragment) that is capable of extending an RNA sequence by adding modified nucleotides (Shcherbakova & Brenowitz, Nature Protocols 3, 288-302 (2008)).

Chemical modification of the single nucleotide at the 3' end of the RNA can be accomplished by first oxidizing the 3'-terminal ribose with sodium periodate to form a reactive aldehyde, and then conjugating an aldehyde-reactive modified nucleotide.

Alternatively, T4 DNA ligase or T4 RNA ligase may be used for splint ligation of a fragment of modified nucleotides to the 3' end of the RNA (Moore & Query, Methods in Enzymology 317, 109-123 (2000)).

Chemical ligation of the two fragments is also possible. The phosphodiester linkage between two RNA substrates can be formed by activating a phosphomonoester group with a reactive imidazole or by using a condensing reagent such as cyanogen bromide. A disadvantage of chemical ligation is that it may produce 2'-5' phosphodiester linkages in addition to the desired 3'-5' phosphodiester linkages.

Modification of the 5' end of RNA

In addition to the cap 1 structure described in Example 8 (improved Gene Writer component for RNA-based delivery), other 5' terminal protecting groups will be explored. In particular, we will use hypermethylated (Wurth et al., Nucleic Acids Res 42, 8663-8677 (2014)), phosphorothioate (Kuhn et al., Gene Therapy 17, 961-971 (2010)), NAD⁺-derived (Kiledjian, Trends in Cell Biology 28, 454-464 (2018)) and modified (e.g., biotinylated: Bednarek et al., Phil Trans R Soc B 373, 20180167 (2018)) cap analogues for co-transcriptional capping.

We will also label the 5' end of the RNA with 5'-[γ-thio]triphosphate to produce a reactive thiol group, and then chemically attach a protective modification to the 5' end using a haloacetamide derivative of the modifying group.

The proposed modifications that protect the 3 'and 5' ends of the RNA will be introduced into the RNA template and/or Gene Writer mRNA (if compatible with translation). The genomic integration efficiency of RNA will be tested as described in example 8 (improved Gene Writer component for RNA-based delivery).

Example 12: use of modified RNA bases in Gene Writer System

This example describes a Gene Writer system comprising modified RNA bases that may improve the properties of the system, e.g., increase integration efficiency or decrease the cellular response to exogenous nucleic acids. For the Gene Writer polypeptide mRNA, the proposed modifications within the coding region are compatible with translation. For RNA templates, the proposed modifications are compatible with reverse transcription.

In this example, mRNA encoding the Gene Writer polypeptide was transcribed in vitro with 100% of the corresponding rNTP replaced by one of the following modified rNTPs: pseudouridine (ψ), 1-N-methylpseudouridine (1-Me-ψ), 5-methoxyuridine (5-MO-U) or 5-methylcytidine (5mC). In other respects, RNA preparation, purification and cell transfection were performed as described in Example 8 (improved Gene Writer component for RNA-based delivery). The genomic integration capacity of each modified mRNA was compared with that of the unmodified mRNA (G0) using ddPCR; all polypeptide mRNAs were paired with unmodified template RNA (fig. 17). Integration was detected with each modified rNTP used in the mRNA encoding the polypeptide, with the highest signal from 5-MO-U and the lowest from 5mC. This indicates that the Gene Writer polypeptide component is functional when expressed from mRNA containing modified bases.

In addition, this example describes the modularization of the Gene Writer template molecule, where it consists of all or a subset of the exemplary modules listed in FIG. 19 and illustrated in FIG. 18. The individual modules may be produced as a continuous nucleic acid molecule by chemical or in vitro synthesis or as individual fragments which are subsequently combined together. The individual modules of the Gene Writer template molecule can be chemically modified nucleic acids, partially or fully composed of non-nucleic acids, rearranged in sequence, and/or omitted to form the Gene Writer template molecule.

In some embodiments, the Gene Writer template molecule (all modules, A-F) is synthesized by in vitro transcription, wherein 0-100% of the corresponding rNTPs (adenosine, cytidine, guanosine, and/or uridine) are replaced with one or more modified rNTPs (base or ribose modifications), for example 5' hydroxy, 5' phosphate, 2'-O-methyl, 2'-O-ethyl, 2'-fluoro, ribothymidine, C-5 propynyl-dC (pdC), C-5 propynyl-dU (pdU), C-5 propynyl-C (pC), C-5 propynyl-U (pU), 5-methyl C, 5-methyl U, 5-methyl dC, 5-methyl dU methoxy, (2,6-diaminopurine), 5'-dimethoxytrityl-N4-ethyl-2'-deoxycytidine, C-5 propynyl-fC (pfC), C-5 propynyl-fU (pfU), 5-methyl fC, C-5 propynyl-mC (pmC), C-5 propynyl-mU (pmU), 5-methyl mC, 5-methyl mU, LNA (locked nucleic acid), MGB (minor groove binder), pseudouridine (ψ), 1-N-methylpseudouridine (1-Me-ψ), or 5-methoxyuridine (5-MO-U). The modified nucleotides in this example are incorporated by a transcription reaction that uses a native or mutant RNA polymerase capable of readily incorporating the modified nucleotide into an RNA transcript prepared in vitro (Padilla, R., Nucleic Acids Research, 30(24), e138, 2002; Ibach, J., et al., Journal of Biotechnology, 167(3), 287-295, 2013; Meyer, A.J., et al., Nucleic Acids Research, 43(15), 7480-7488, 2015). The modified Gene Writer template molecule is generally fully or partially compatible with the reverse transcriptase activity of the Gene Writer polypeptide; for the module, or part of a module, of the Gene Writer template molecule that serves as the reverse transcription template, modifications compatible with reverse transcription are preferred (Motorin et al., Methods in Enzymology 425, 21-53, 2007; Mauger et al., PNAS 116, 24075-24083, 2019).
The Gene Writer system with a template molecule comprising the modified rNTP was tested as described above and in Example 8 (improved Gene Writer component for RNA-based delivery).

In some embodiments, each module is chemically synthesized and contains modified nucleotides, such as 5'-hydroxy, 5'-phosphate, 2'-O-methyl, 2'-O-ethyl, 2'-fluoro, ribothymidine, C-5 propynyl-dC (pdC), C-5 propynyl-dU (pdU), C-5 propynyl-C (pC), C-5 propynyl-U (pU), 5-methyl C, 5-methyl U, 5-methyl dC, 5-methyl dU methoxy, (2,6-diaminopurine), 5'-dimethoxytrityl-N4-ethyl-2'-deoxycytidine, C-5 propynyl-fC (pfC), C-5 propynyl-fU (pfU), 5-methyl fC, 5-methyl fU, C-5 propynyl-mC (pmC), C-5 propynyl-mU (pmU), 5-methyl mC, 5-methyl mU, LNA (locked nucleic acid), MGB (minor groove binder), pseudouridine (Ψ), 1-N-methylpseudouridine (1-Me-Ψ), or 5-methoxyuridine (5-MO-U), and the modules are joined together by an enzymatic process (e.g., splint ligation using T4 DNA ligase; Moore, M.J., & Query, C.C., Methods in Enzymology, 317, 109-123, 2000) or a chemical process (e.g., Fedorova, O.A., et al., Nucleosides and Nucleotides, 15(6), 1137-1147, 1996) to form the complete Gene Writer template molecule.

An example of a modified Gene Writer template molecule is one in which modules A and F, 100 nt each, are chemically synthesized RNA with cytidine and uridine nucleotides containing 2'-O-methyl ribose modifications, module A contains (3) phosphorothioate linkages between the first 3 nucleotides at the 5' end, and module F contains (3) phosphorothioate linkages between the last 3 nucleotides at the 3' end of the module. Modules B-E are synthesized by in vitro transcription using an RNA polymerase (RNAP) (e.g., T7 RNAP, T3 RNAP, or SP6 RNAP (NEB)) or derivatives thereof with enhanced properties (e.g., increased fidelity, increased processivity, or increased efficiency of incorporation of modified nucleotides). Module A was attached to the 5' end of the in vitro transcribed module B-E molecule and module F was attached to the 3' end of the in vitro transcribed module B-E molecule by splint ligation (described in Moore, M.J., & Query, C.C., Methods in Enzymology, 317, 109-123, 2000). This fully assembled template RNA (all modules, A-F) was then used with the Gene Writer polypeptide (or a nucleic acid encoding the polypeptide) in target cells to assess genomic integration as in the previous examples. In some embodiments, the RNA modification does not reduce integration efficiency by more than 50%, e.g., as measured by ddPCR. In some embodiments, the RNA modification improves integration efficiency, e.g., as measured by ddPCR. In some embodiments, the RNA modification improves the reverse transcription reaction, e.g., improves processivity or fidelity as measured by sequencing of integration events.

Example 13: Gene Writer template without UTR incorporation

This example describes configurations of the Gene Writer template molecule that result in UTR exclusion, so that these regions, which are used for retrotransposition, are not integrated into the host cell genome.

In this example, we describe the positioning, omission, and/or substitution of the UTR modules of the Gene Writer template molecule (FIGS. 18 and 19) so that the Gene Writer driver does not incorporate the UTR modules into the genome as part of retrotransposition. In some embodiments, the Gene Writer template molecule modules for the 5' and 3' UTRs (modules B+C and E of the Gene Writer template molecule) are moved to the ends of the molecule so that their function of interacting with the Gene Writer driver is not altered, but the homology arms are now located adjacent to the heterologous subject sequence (module D), where the complementarity of the homology arms acts as a primer for reverse transcription. In some cases, modules B and/or C are omitted from the Gene Writer template molecule, and module E follows module F.

Other examples of not incorporating the UTRs into the genome are removal of modules B and C from the Gene Writer template molecule, repositioning of module F (3' homology arm) next to module D (heterologous subject sequence), and substitution of module E with a binding ligand, e.g., biotin. The Gene Writer template molecule will now consist of module A (5' homology arm) - module D (heterologous subject sequence) - module F (3' homology arm) - module E consisting of biotin. The Gene Writer driver polypeptide sequence will be modified to incorporate the amino acid sequence of monomeric streptavidin. This example illustrates the utility of mediating the association of the Gene Writer template molecule with the Gene Writer driver polypeptide sequence by non-nucleic-acid means.

Example 14: Homology arm length affects retrotransposition efficiency

This example describes the modulation of the homology arms (HA) flanking a transgene to increase the frequency of the associated retrotransposition events.

Retrotransposition is believed to be mediated by a priming event at the 3' end of the integrated transgene. Priming of the transgene RNA from nicked host genomic DNA requires homology between the 3' end of the transgene RNA and the genomic DNA 3' of the host nick. Although the mechanism of 5' resolution of retrotransposition is unknown, such resolution may also benefit from homology through host-mediated repair pathways. In addition, processing of the 5' end of the transgene RNA, i.e., cleavage by the ribozyme upstream of the 5' UTR, may affect retrotransposition. Thus, the flanking homology of the cargo transgene was modulated to optimize retrotransposition.

Plasmid transfection was performed to test the effect of transgene homology on trans integration efficiency. Plasmids expressing R2Tg, or a control R2Tg with endonuclease-inactivating mutations, were co-delivered with transgene plasmids containing target homology of different lengths and, optionally, random DNA stuffer sequences (Table E5 below, FIG. 20). The stuffer sequence is used to control for the effect of transcript length on retrotransposition. A total of 500 ng of plasmid, at a 1:4 molar ratio of driver to transgene plasmid, was nucleofected into 2e5 293T cells using a Lonza 4D Shuttle with program DS-150. After 3 days, genomic DNA was isolated from the cells. Droplet digital PCR (ddPCR) was performed across each junction using a transgene-specific primer and TaqMan probe in combination with a primer at the intended rDNA locus. The number of copies detected represents the integration efficiency at the 3' or 5' end of the transgene.
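The 1:4 molar ratio at a fixed total mass can be converted into per-plasmid masses once the plasmid sizes are known. The following is a minimal sketch only: the function name and the plasmid sizes (8000 bp and 5000 bp) are hypothetical placeholders, since the plasmid sizes are not stated in this example.

```python
def split_mass_by_molar_ratio(total_ng, size_a_bp, size_b_bp,
                              ratio_a=1.0, ratio_b=4.0):
    """Return (ng_a, ng_b) such that moles_a : moles_b == ratio_a : ratio_b.

    For double-stranded DNA, mass is proportional to moles x length (bp),
    so the average mass per base pair cancels out of the split.
    """
    weight_a = ratio_a * size_a_bp
    weight_b = ratio_b * size_b_bp
    ng_a = total_ng * weight_a / (weight_a + weight_b)
    return ng_a, total_ng - ng_a


# Hypothetical sizes: 8000 bp driver plasmid, 5000 bp transgene plasmid.
driver_ng, transgene_ng = split_mass_by_molar_ratio(500, 8000, 5000)
print(f"driver: {driver_ng:.1f} ng, transgene: {transgene_ng:.1f} ng")
```

Because only the ratio of plasmid lengths matters, the same helper applies to the 250 ng transfections described in the later examples.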

The results are shown in FIG. 21. The constructs with the longest homology tested (100 bp on either side) showed the highest integration efficiency.

Table E5: Template constructs tested with 5' and 3' homology arms of different lengths, with or without stuffer sequence to keep the total flanking sequence at 100 bp on each end.

5' homology arm length (bp)	3' homology arm length (bp)	Random stuffer sequence (to 100 bp) present
100	100	n/a
0	0	N
0	0	Y
5	5	N
5	5	Y
10	10	N
10	10	Y
20	20	N
20	20	Y
50	50	N
50	50	Y
90	90	N
90	90	Y

Example 15: Homology arm positioning and specificity affect retrotransposition efficiency

This example describes the need for precise alignment of the homology arm design with the nick site during retrotransposition.

The data from Example 14 above highlight the importance of homology arms for efficient retrotransposition. In many cases, the position of the nick for a given retrotransposon may be difficult to characterize. In these cases, the designed homology arm or arms may not begin immediately adjacent to the nick site in the genomic DNA. To assess the dependence of retrotransposition efficiency on homology arm positioning, we designed constructs with the 3' homology arm shifted by several bases relative to the known cleavage site of R2Tg (Table E6 below, FIG. 22). When the 3' homology arm is shifted 3' of the cleavage site (+), the homology arm is effectively shortened. When the 3' homology arm is shifted 5' of the cleavage site (-), homologous bases from the 5' arm are added.

The driver and transgene plasmids were co-transfected into 293T cells as described in the example above. On day 3, genomic DNA was extracted. Integration frequency was measured by ddPCR as described above. A significant loss of integration frequency was observed when the 3' homology arm was shifted in either direction relative to the WT cleavage site (FIG. 23).

Table E6: Template constructs tested with different homology positions at the 3' end, including the relative shift in position of the 3' end sequence.

5' homology arm length (bp)	3' homology arm length (bp)	3' (right) arm shift (relative to the intended nick site)
100	100	0
100	99	+1
100	97	+3
100	95	+5
100	90	+10
100	101	-1
100	103	-3
100	105	-5
100	110	-10
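The relationship between the shift and the resulting 3' homology arm length in Table E6 follows a simple rule: a positive (3') shift removes bases from the 100 bp arm, and a negative (5') shift adds bases. A minimal sketch that regenerates the table rows is given below; the function and variable names are illustrative and are not part of the original disclosure.

```python
BASE_ARM_BP = 100  # unshifted homology arm length used in Table E6


def shifted_3prime_arm_length(shift_bp, base_bp=BASE_ARM_BP):
    """Length of the 3' homology arm after shifting its start by shift_bp.

    shift_bp > 0: arm starts 3' of the nick site, so the arm is shortened.
    shift_bp < 0: arm starts 5' of the nick site, so bases are added.
    """
    length = base_bp - shift_bp
    if length < 0:
        raise ValueError("shift exceeds the available homology arm length")
    return length


shifts = [0, 1, 3, 5, 10, -1, -3, -5, -10]
rows = [(BASE_ARM_BP, shifted_3prime_arm_length(s), f"{s:+d}" if s else "0")
        for s in shifts]
for five_prime, three_prime, shift in rows:
    print(f"{five_prime}\t{three_prime}\t{shift}")
```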

Example 16: Gene Writer can integrate gene cargo independently of the homology-directed repair pathway

This example describes the use of the Gene Writer system in human cells in which the homologous recombination repair pathway is inhibited.

In this example, U2OS cells were treated with 30 pmol (1.5 μM) of non-targeting control siRNA (control) or siRNA against Rad51 (a core component of the homologous recombination repair pathway). The siRNA was co-delivered with the R2Tg driver and transgene plasmids in the trans configuration (a schematic of the driver and transgene configuration is shown in FIG. 24). Specifically, a plasmid expressing R2Tg, a control R2Tg with a mutation in the RT domain, or a control R2Tg with an endonuclease-inactivating mutation was used in combination with the transgene (FIG. 25A, B). Using pulse code DN100, a total of 250 ng of plasmid DNA, at a 1:4 molar ratio of driver to transgene, and 30 pmol of siRNA were nucleofected into 200k U2OS cells resuspended in 20 μL of nucleofection buffer SE. Protein lysates collected on day 3 showed no Rad51 present under siRad51 treatment conditions (FIG. 24C). gDNA was extracted on day 3 and ddPCR analysis was performed to detect transgene integration at the rDNA locus. The results of the ddPCR copy number analysis (relative to the reference gene RPP30) are shown in FIG. 25. The absence of Rad51 resulted in a reduction of about 20% in R2Tg-mediated transgene integration at the 3' and 5' junctions at the rDNA locus (FIG. 25), suggesting that R2Tg-mediated transgene insertion is not entirely dependent on the homologous recombination pathway and can occur without the endogenous HR pathway. In some embodiments, HR independence enables Gene Writing to work in cells and tissues with low endogenous levels of HR, e.g., liver, brain, retina, muscle, bone, nerve, G₀ or G₁ phase cells, non-dividing cells, senescent cells, or terminally differentiated cells. In some embodiments, HR independence enables Gene Writing to function in cells, patients, or tissues that have mutations in genes involved in HR pathways (e.g., BRCA1, BRCA2, P53, RAD51).
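The quantities in this example can be checked with a few lines of arithmetic: 30 pmol of siRNA in 20 μL corresponds to 1.5 μM, and ddPCR integration counts are expressed relative to the RPP30 reference. The sketch below is illustrative only. The helper names are hypothetical, and the default of two RPP30 copies per genome is an assumption that may not hold for aneuploid lines such as U2OS; the numeric inputs in the usage lines are placeholders, not measured data.

```python
def sirna_concentration_um(amount_pmol, volume_ul):
    """Concentration in micromolar (pmol per microliter equals umol per liter)."""
    return amount_pmol / volume_ul


def copies_per_genome(target_copies_per_ul, reference_copies_per_ul,
                      reference_copies_per_genome=2):
    """Normalize a ddPCR target concentration to a reference gene.

    reference_copies_per_genome=2 is an ASSUMPTION (diploid reference locus).
    """
    genomes_per_ul = reference_copies_per_ul / reference_copies_per_genome
    return target_copies_per_ul / genomes_per_ul


def relative_reduction(control_value, treated_value):
    """Fractional loss of signal in the treated sample versus the control."""
    return (control_value - treated_value) / control_value


print(sirna_concentration_um(30, 20))  # 1.5

# Placeholder values for illustration only (not data from this example).
control = copies_per_genome(50.0, 1000.0)
knockdown = copies_per_genome(40.0, 1000.0)
print(f"reduction: {relative_reduction(control, knockdown):.0%}")
```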

Example 17: Gene Writer can integrate gene cargo independently of the single-stranded template repair pathway

This example describes the use of the Gene Writer system in human cells in which the single-stranded template repair (SSTR) pathway is inhibited.

In this example, siRNAs against core components of the SSTR pathway (FANCA, FANCD2, FANCE, and USP1) will be used to inhibit the pathway. A non-targeting control siRNA will also be included. 200k U2OS cells will be nucleofected with 30 pmol (1.5 μM) of siRNA and the R2Tg driver and transgene plasmids (trans configuration). Specifically, 250 ng of plasmid expressing R2Tg, a control R2Tg with a mutation in the RT domain, or a control R2Tg with an endonuclease-inactivating mutation will be used in combination with the transgene at a molar ratio (driver to transgene) of 1:4. Nucleofection of U2OS cells will be performed in SE buffer using program DN100. Following nucleofection, cells will be grown in complete medium for 3 days. gDNA will be harvested on day 3 and ddPCR will be performed to assess integration at the rDNA site. Transgene integration at the rDNA is detected in the absence of core SSTR pathway components.

Example 18: Gene Writer system with enhanced activity in target cells compared to non-target cells

This example describes the incorporation of regulatory sequences into the Gene Writer system to reduce integration activity in non-target cells.

In this example, genetic regulation is accomplished by: (i) use of a tissue-specific promoter to up-regulate expression of the components and integration in target cells and (ii) use of a miRNA binding site to reduce integration in non-target cells that have elevated endogenous levels of the corresponding miRNA. The target cells used are human hepatocytes and the non-target cells are hematopoietic stem cells (HSCs). The driver used here is a plasmid encoding a Gene Writer polypeptide (e.g., R2Tg reverse transcriptase) driven by one of several promoters and having a scrambled or specific miRNA binding site following the coding sequence. The template for integration is encoded on plasmid DNA, such that transcription produces a heterologous subject sequence flanked by homology arms and UTRs. The heterologous subject sequence may comprise a reporter gene driven by one of several promoters and having a scrambled or specific miRNA binding site following the coding sequence. The control promoter used here is CMV, and the control for the miRNA binding site is a randomly scrambled version of the miR-142 binding site. The target-tissue-specific promoter used here is apoE.HCR.hAAT, which is expressed in hepatocytes, and the off-target-tissue-specific miRNA binding site is complementary to miR-142 (uguaguguuuccuacuuuaugga (SEQ ID NO: 1688)), which is expressed in HSCs.
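A binding site that is fully complementary to miR-142 can be derived as the reverse complement of the miRNA sequence given above, and a scrambled control can be generated by shuffling the same bases. The sketch below is one possible way to do this and is not the procedure used in this example; the function names and the fixed random seed are illustrative assumptions, and a scrambled control used in practice would also need to be checked against other miRNA seed matches.

```python
import random

MIR142 = "uguaguguuuccuacuuuaugga"  # SEQ ID NO: 1688, as given in the text


def binding_site_dna(mirna):
    """DNA sequence fully complementary to the miRNA (its reverse complement)."""
    complement = {"a": "T", "c": "G", "g": "C", "u": "A"}
    return "".join(complement[base] for base in reversed(mirna.lower()))


def scrambled_control(site, seed=0):
    """Shuffle the bases of the site; composition is kept, order is not."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    bases = list(site)
    rng.shuffle(bases)
    while "".join(bases) == site:  # avoid returning the unshuffled site
        rng.shuffle(bases)
    return "".join(bases)


site = binding_site_dna(MIR142)
print(site)  # TCCATAAAGTAGGAAACACTACA
print(scrambled_control(site))
```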

The target cells and non-target cells were nuclear transfected with a combination of Gene Writer polypeptide (1) and template (2) selected from the group consisting of:

(HEK293T compared to HEK293T with miRNA)

Gene Writer polypeptide construct (1):

a. Nonspecific driver: CMV-R2Tg

b. Nonspecific inactivated driver: CMV-R2Tg(EN*)

c. Tissue-specific driver: apoE.HCR.hAAT-R2Tg-miR142

d. Tissue-specific inactivated driver: apoE.HCR.hAAT-R2Tg(EN*)-miR142

Gene Writer template construct (2):

a. Nonspecific transgene: CMV-gfp

b. Tissue-specific transgene: apoE.HCR.hAAT-gfp-miR142

Cells were cultured for at least three days and then assessed for integration efficiency and reporter gene expression. For integration efficiency, ddPCR was performed to quantify the average number of integrations per genome for each sample. In some embodiments, the ratio of integration efficiency in target cells to that in non-target cells is higher when a template is paired with the tissue-specific driver (1c) than with the nonspecific driver (1a). To assess expression of the reporter gene, cells were analyzed by flow cytometry to detect GFP fluorescence and by RT-qPCR to detect transcription. In some embodiments, the ratio of fluorescence in target cells to that in non-target cells is higher when a driver is paired with the tissue-specific transgene cassette (2b) than with the nonspecific transgene cassette (2a). In some embodiments, the ratio of transcript levels in target cells to those in non-target cells is higher when a driver is paired with the tissue-specific transgene cassette (2b) than with the nonspecific transgene cassette (2a). In some embodiments, the combination of the tissue-specific driver (1c) and the tissue-specific transgene cassette (2b) results in the highest ratio of transcription or expression between target cells and non-target cells.
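The comparisons above all reduce to a target-to-non-target ratio computed per construct pairing. A minimal sketch of that calculation follows; the function name is hypothetical and all numeric values are placeholders chosen for illustration, not experimental results.

```python
def specificity_ratio(target_value, non_target_value):
    """Ratio of a readout (integrations per genome, fluorescence, or
    transcript level) in target cells to the same readout in non-target cells."""
    if non_target_value <= 0:
        return float("inf")  # no detectable signal in non-target cells
    return target_value / non_target_value


# Placeholder readouts (target cells, non-target cells); NOT measured data.
readouts = {
    ("1a", "2a"): (8.0, 8.0),  # nonspecific driver, nonspecific transgene
    ("1c", "2b"): (8.0, 2.0),  # tissue-specific driver and transgene
}
for pairing, (target, non_target) in readouts.items():
    print(pairing, specificity_ratio(target, non_target))
```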

Example 19: Application of the Gene Writer™ system to deliver a therapeutic gene to the liver in a human chimeric liver mouse model

This example describes in vivo delivery of the Gene Writer™ genome editing system to hepatocytes for integration and stable expression of a genetic payload. The promoters and miRNA recognition sequences for expression control and the therapeutic gene are intended to illustrate this approach and are selected from table 2 of WO 2020014209 (incorporated herein by reference), and tables 3 and 4 of WO 2020014209, respectively.

In this example, human hepatocytes from an OTC-deficient patient were transplanted into a mouse model (Ginn et al., JHEP Reports 2019), and the Gene Writer™ system was used to deliver an OTC expression cassette for integration into the hepatocytes. The Gene Writer™ polypeptide component comprises an expression cassette for R2Tg reverse transcriptase (table 3), and the template component comprises an expression cassette for the human OTC gene (table 5 of WO 2020014209) flanked by the UTR sequences required for R2Tg binding and retrotransposition (table 3) and further flanked by 100 nt of homology to the target site in the ribosomal DNA. In this example, both the transposase and the template expression cassettes additionally comprise the hAAT promoter for hepatocyte-specific expression (table 3 of WO 2020014209) and a miRNA recognition sequence complementary to the seed sequence of miR-142 (table 4 of WO 2020014209) for down-regulating expression in hematopoietic cells.

1. Gene Writer™ polypeptide component: rAAV2/NP59.hAAT.R2Tg

2. Gene Writer™ polypeptide with endonuclease mutation: rAAV2/NP59.hAAT.R2TgEN

3. Gene Writer™ template component: rAAV2/NP59.hAAT.OTC

4. Reporter Gene Writer™ template component: rAAV2/NP59.hAAT.GFP

Human hepatocytes (isolated from pediatric donors or purchased from Lonza (Basel, Switzerland)) were engrafted into 8- to 12-week-old female Fah-/- Rag2-/- Il2rg-/- (FRG) mice, as described previously (Azuma et al., Nat Biotechnol 2007). Engrafted mice were cycled on and off 2-(2-nitro-4-trifluoromethylbenzoyl)-1,3-cyclohexanedione (NTBC) in drinking water to promote liver regeneration. Blood was collected every two weeks and at the end of the experiment, and human albumin levels in serum were measured by enzyme-linked immunosorbent assay (ELISA; Bethyl Laboratories, Inc., Montgomery, TX) and used as a marker for estimating the level of engraftment. Eleven weeks after engraftment, mice were treated with Gene Writer™ components packaged in NP59, a highly human-hepatotropic AAV capsid. The following vectors were administered by i.p. injection:

Active therapeutic Gene Writing™: (1) and (3)

Active reporter Gene Writing™: (1) and (4)

Integration-inactive therapeutic control: (2) and (3)

Integration-inactive reporter control: (2) and (4)

Following vector injection, mice were cycled on NTBC for 5 weeks and then euthanized. DNA and RNA were then extracted from liver lysates by standard methods. OTC expression was then determined by RT-qPCR of the isolated RNA samples using sequence-specific primers. To confirm integration of the construct and analyze its genomic location, genomic DNA samples were subjected to unidirectional sequencing on a MiSeq, using specific primers that anneal to the inserted gene and read outward into the surrounding genomic sequence.

Example 20: Application of the Gene Writer™ system to deliver a therapeutic gene to the liver in neonatal or adult mouse models of disease

This example describes in vivo delivery of the Gene Writer™ genome editing system to hepatocytes for integration and stable expression of a genetic payload. The promoters and miRNA recognition sequences for expression control and the therapeutic gene are intended to illustrate this approach and are selected from tables 2, 3 and 4 of WO 2020014209, respectively.

In this example, an OTC-deficient mouse model was used to evaluate a Gene Writer™ system designed to deliver an OTC expression cassette for integration into hepatocytes. The Gene Writer™ polypeptide component comprises an expression cassette for R2Tg reverse transcriptase (table 3), and the template component comprises an expression cassette for the human OTC gene (table 5 of WO 2020014209) flanked by the UTR sequences required for R2Tg binding and retrotransposition (table 3) and further flanked by 100 nt of homology to the target site in the ribosomal DNA. In this example, both the transposase and the template expression cassettes additionally comprise the hAAT promoter for hepatocyte-specific expression (table 3 of WO 2020014209) and a miRNA recognition sequence complementary to the seed sequence of miR-142 (table 4 of WO 2020014209) for down-regulating expression in hematopoietic cells.

1. Gene Writer™ polypeptide component: rAAV2/8.hAAT.R2Tg

2. Gene Writer™ polypeptide with endonuclease mutation: rAAV2/8.hAAT.R2TgEN

3. Gene Writer™ template component: rAAV2/8.hAAT.OTC

4. Reporter Gene Writer™ template component: rAAV2/8.hAAT.GFP

Female OTC-deficient Spf^ash mice (C57BL/6/C3H-F1 background), 1 to 2 days or 8 to 12 weeks old, were treated with Gene Writer™ components packaged in AAV8, a hepatotropic AAV capsid. The following vectors were administered by i.p. injection:

Active therapeutic Gene Writing™: (1) and (3)

Active reporter Gene Writing™: (1) and (4)

Integration-inactive therapeutic control: (2) and (3)

Integration-inactive reporter control: (2) and (4)

After 5 weeks, DNA and RNA were extracted from liver lysates by standard methods. OTC expression was then determined by RT-qPCR of the isolated RNA samples using sequence-specific primers. To confirm integration of the construct and analyze its genomic location, genomic DNA samples were subjected to unidirectional sequencing on a MiSeq, using specific primers that anneal to the inserted gene and read outward into the surrounding genomic sequence.

Example 21: Ribozyme and homology arm sequence compatibility at retargeted sites

This example describes a Gene Writer template molecule used in conjunction with a mutant Gene Writer polypeptide sequence that targets a genomic location other than its native genomic target site. The endogenous retrotransposon RNA contains a ribozyme at its 5' end, which is active only when the RNA folds into the correct secondary structure (Eickbush, D.G., et al., Molecular and Cellular Biology, 30(13), 3142-3150, 2010; Eickbush, D.G., et al., PLoS ONE, 8(9), 1-16, 2013; Ruminski, D.J., et al., Journal of Biological Chemistry, 286(48), 41286-41295, 2011). For the retrotransposon ribozyme to adopt its active structure, the retrotransposon RNA must include a portion of the 28S ribosomal RNA, which is needed to form the proper secondary structure of the ribozyme P1 stem (Eickbush, D.G., et al., PLoS ONE, 8(9), 1-16, 2013).

The portion of the endogenous retrotransposon RNA that is 28S ribosomal RNA and interacts with the 5' UTR of the retrotransposon RNA is analogous to the Gene Writer template molecule (modularized in FIG. 18), in which a portion of the sequence in module A interacts with a portion of the sequence in module B. For module B to be an active ribozyme, it must fold into the proper secondary structure, in which the P1' portion of the ribozyme found in module B interacts, through some degree of complementarity, with the P1 sequence of the ribozyme found in module A. In some embodiments in which the 5' homology arm of the Gene Writer template molecule (module A) is necessary for integration activity and an active ribozyme (module B) is present, the P1' sequence of the ribozyme found in module B is altered to have some complementarity to the sequence in module A where the P1 sequence resides. The length of the complementary nucleotides between the P1 sequence found in module A and the P1' sequence of module B may vary between 0 and 100 nucleotides. In some embodiments, if the retargeted Gene Writer polypeptide sequence is based on the R2 element from zebra finch (Taeniopygia guttata), nucleotides 49-54 of module B of the Gene Writer template molecule interact, by full or partial complementarity, with the last 28 nucleotides of module A, which is a homology arm with complementarity to a genomic location compatible with the mutant Gene Writer polypeptide sequence (which targets a genomic region with complementarity to module A) (predicted ribozyme shown in FIG. 28).

Example 22: Gene Writing integrates the specific template sequence

This example describes analysis of the insertions made by the Gene Writing system to check for unintended insertion of non-template RNAs (e.g., endogenous cellular RNAs) into the target site.

In this example, the Gene Writing system was used as described in the previous examples to integrate the template RNA into the target site in HEK293 cells. HEK293 was transfected with the following reagents:

1. mRNA encoding Gene Writer polypeptide

2. mRNA encoding an inactivated Gene Writer polypeptide (e.g., an R2Tg reverse transcriptase mutant)

3. Gene Writer RNA template (e.g., comprising, 5' to 3': 5' HA - 5' UTR - GFP cassette - 3' UTR - 3' HA)

4. Gene Writer RNA template without binding motif (e.g., comprising, 5' to 3': 5' HA - GFP cassette - 3' HA)

After 3 days of incubation, genomic DNA was extracted and the insertion frequency was analyzed by ddPCR, as described elsewhere herein. In some embodiments, the combination of (1) and (3) will result in integration of the template. In some embodiments, the combination of (1) and (4) will not result in detectable integration of the template because template (4) lacks a polypeptide binding motif, e.g., the 3' UTR from the R2Tg retrotransposon. In some embodiments, (1) and (4) will result in an integration frequency that is lower than that of (1) and (3), e.g., less than 10%, 5%, 4%, 3%, 2%, 1%, 0.05%, or 0.01% of it.

To further analyze all incorporated sequences, cells from the transfection with (1) and (3) were subjected to amplicon-seq, in which PCR is used to amplify across the junctions at the target site. Optionally, negative selection against unedited target sites can be performed by specific digestion at the junction of unedited target sites (the nick site of the Gene Writer) to improve the signal. Amplicons were processed on an Illumina MiSeq for next-generation sequencing as described previously. Unintended incorporation is identified by looking for reads that contain an insertion (e.g., of at least 100 nt) of new DNA that does not map to the template RNA. Optionally, the insertions are compared to the human transcriptome to determine the source of any transcripts that were unintentionally incorporated into the target site. In some embodiments, the Gene Writing system does not incorporate any template other than the Gene Writing template RNA into the target site. In some embodiments, the Gene Writing system does not incorporate any template other than the Gene Writing template RNA into the target site at a level greater than 1% of total insertions.
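The read-screening step above can be sketched as follows. This is a simplified illustration, not the analysis pipeline of this example: exact substring matching stands in for real read alignment, the function names are hypothetical, the denominator is restricted to insertions of at least the minimum length, and the sequences in the usage lines are synthetic placeholders.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_template_derived(insert, template):
    """True if the insert matches the template on either strand.

    ASSUMPTION: exact substring matching replaces a real aligner here.
    """
    insert, template = insert.upper(), template.upper()
    return insert in template or revcomp(insert) in template


def unintended_fraction(inserts, template, min_len=100):
    """Fraction of insertions (>= min_len nt) that do not map to the template."""
    long_inserts = [s for s in inserts if len(s) >= min_len]
    if not long_inserts:
        return 0.0
    unintended = sum(1 for s in long_inserts
                     if not is_template_derived(s, template))
    return unintended / len(long_inserts)


# Synthetic placeholder sequences, for illustration only.
template = "AC" * 150
inserts = [template[10:120], "GT" * 55, "G" * 120, "G" * 20]
print(f"{unintended_fraction(inserts, template):.1%} of insertions unintended")
```

A result above the 1% level named in the text would flag the sample for comparison of the unmapped insertions against the transcriptome.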

Example 23: Gene Writer™ enables a nucleotide substitution in genomic DNA to correct an alpha-1 antitrypsin deficiency mutation in human cells

This example describes the use of the Gene Writer™ gene editing system to alter a genomic sequence at a single nucleotide.

In this example, the Gene Writer™ polypeptide and the writing template were provided as DNA transfected into HEK293T cells with the PiZ genotype (E342K), a common allele associated with alpha-1 antitrypsin deficiency. The Gene Writer™ polypeptide uses a Cas9 nickase for both DNA binding and endonuclease function. The writing template is designed to have homology to the target sequence while incorporating a substitute nucleotide at the desired position, such that reverse transcription of the template RNA produces a new DNA strand comprising the substitution.

To generate a transition in the affected human SERPINA1 gene that restores the GAG triplet encoding glutamic acid in healthy patients, the Gene Writer™ polypeptide is used with a specific template nucleic acid encoding a gRNA scaffold for polypeptide binding, a spacer for polypeptide homing, a target homology domain for priming TPRT, and a template sequence for reverse transcription (which includes the required substitution). An exemplary template RNA carries the sequence (1) TCCCCTCCAGGCCGTGCATA (2) GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC (3) TcGTCGATGGTCAGCACAGCCTTAT (4) GCACGGCCTGGA (SEQ ID NO: 1689), wherein the numerals delineate, in order (5'-3'), the modules of the template: (1) gRNA spacer, (2) gRNA scaffold, (3) heterologous subject sequence, and (4) 3' homologous priming domain, and the lowercase letter "c" indicates the position in the template carrying the nucleotide substitution to be written into the target site to correct the E342K mutation. An exemplary gRNA for providing a second nick as described in embodiments of the present system comprises the spacer sequence TTTGTTGAACTTGACCTCGG (SEQ ID NO: 1625) and directs the Cas9 nickase to nick the second strand of the target site within the region of homology. In some embodiments, the second nick improves editing efficiency.
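The modular layout of the exemplary template can be checked programmatically by concatenating the four modules as written above (in DNA letters) and locating the lowercase substitution. The sketch below also confirms that the 3' homologous priming domain is the reverse complement of spacer positions 6-17, so that it ends 3 nt short of the 3' end of the spacer, which is consistent with the usual Cas9 nick position. Variable and function names are illustrative only.

```python
SPACER = "TCCCCTCCAGGCCGTGCATA"
SCAFFOLD = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTG"
            "AAAAAGTGGGACCGAGTCGGTCC")
HETEROLOGOUS = "TcGTCGATGGTCAGCACAGCCTTAT"  # lowercase c marks the substitution
PRIMING = "GCACGGCCTGGA"


def revcomp(seq):
    """Reverse complement, preserving letter case."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


# Modules are joined 5' to 3' in the order (1)-(2)-(3)-(4).
template = SPACER + SCAFFOLD + HETEROLOGOUS + PRIMING

# 0-based index of the substitution within the full template.
edit_index = next(i for i, base in enumerate(template) if base.islower())

# Where the priming domain anneals, expressed in spacer coordinates.
primer_site = revcomp(PRIMING)
site_start = SPACER.find(primer_site)
bases_after_site = len(SPACER) - (site_start + len(primer_site))

print(len(template), edit_index, primer_site, site_start, bases_after_site)
```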

Following transfection, cells were incubated for three days to allow the Gene Writing™ system to act, and genomic DNA was then extracted from the cells. The genomic DNA was then amplified by PCR using site-specific primers, and the amplicons were sequenced on an Illumina MiSeq according to the manufacturer's protocol. Sequence analysis was then performed to determine the frequency of reads containing the desired edit.

Example 24: Use of lipid nanoparticles comprising a Gene Writer to correct alpha-1 antitrypsin deficiency

This example describes the use of the Gene Writer™ gene editing system to alter a genomic sequence at a single nucleotide in vivo. More specifically, the Gene Writer™ polypeptide and writing template were delivered to mouse hepatocytes via lipid nanoparticles to correct the SERPINA1 PiZ mutation that causes alpha-1 antitrypsin deficiency.

Finn et al., Cell Reports 22:2227-2235 (2018), the methods of which are incorporated herein by reference, teaches the formulation of LNPs carrying Cas9 and gRNA (the LNP-INT01 system) and the treatment of mouse models therewith.

Capped and polyadenylated Gene Writer polypeptide mRNA containing N1-methyl pseudo-U was produced by in vitro transcription using a linearized plasmid DNA template and T7 RNA polymerase. The polypeptide mRNA was purified from enzymes and nucleotides using the MegaClear Transcription Clean-up kit according to the manufacturer's protocol (ThermoFisher). Transcript concentration was determined by measuring absorbance at 260 nm (Nanodrop), and transcripts were analyzed by capillary electrophoresis on a TapeStation (Agilent). Template RNAs containing the mutation-correcting sequence were also prepared by in vitro transcription using similar methods. In this example, the template RNA comprises the sequence illustrated in example 1.

LNPs were formulated at an amine-to-RNA-phosphate (N:P) ratio of 4.5. The lipid nanoparticle components were dissolved in 100% ethanol at the following molar ratios: 45 mol% LP01 lipid, 44 mol% cholesterol, 9 mol% DSPC, and 2 mol% PEG2k-DMG. The RNA cargo (1:40 molar ratio of polypeptide mRNA:template RNA) was dissolved in 50 mM acetate buffer (pH 4.5), resulting in an RNA cargo concentration of about 0.45 mg/mL. LNPs were formed by microfluidic mixing of the lipid and RNA solutions using a Precision Nanosystems NanoAssemblr Benchtop Instrument according to the manufacturer's protocol. After mixing, the LNPs were collected and diluted in PBS (approximately 1:1), and the remaining buffer was then exchanged into PBS (100-fold excess over sample volume) overnight at 4°C with gentle stirring using a 10 kDa Slide-A-Lyzer G2 dialysis cassette (ThermoFisher Scientific). The resulting mixture was then filtered using a 0.2 μm sterile filter. The filtrate was stored at 2°C-8°C. Multi-dose formulations can be prepared using 25 mM citrate, 100 mM NaCl cargo buffer (pH 5), with buffer exchange by TFF into Tris-saline-sucrose buffer (TSS; 5% sucrose, 45 mM NaCl, and 50 mM Tris). The average size of the formulated LNPs was 105 nm. Encapsulation efficiency was determined by RiboGreen assay (Leung et al. 2012). Particle size and polydispersity were measured by dynamic light scattering (DLS) using a Malvern Zetasizer DLS instrument.
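The N:P ratio and the lipid molar percentages together determine how much of each lipid is needed for a given mass of RNA. The sketch below works through that arithmetic under two labeled assumptions that are not stated in this example: an average nucleotide mass of about 330 g/mol (one phosphate per nucleotide) and one ionizable amine per LP01 molecule. The function name is illustrative.

```python
LIPID_MOL_PERCENT = {"LP01": 45.0, "cholesterol": 44.0,
                     "DSPC": 9.0, "PEG2k-DMG": 2.0}
NP_RATIO = 4.5
AVG_NT_MW = 330.0          # g/mol per nucleotide; ASSUMPTION (approximate)
AMINES_PER_IONIZABLE = 1   # ASSUMPTION: one ionizable amine per LP01 molecule


def lipid_nmol_for_rna(rna_ug, np_ratio=NP_RATIO, nt_mw=AVG_NT_MW):
    """Return nmol of each lipid needed for rna_ug micrograms of RNA."""
    # ug -> nmol of phosphate: (ug * 1e-6 g) / (g/mol) * 1e9 nmol/mol
    phosphate_nmol = rna_ug * 1000.0 / nt_mw
    ionizable_nmol = np_ratio * phosphate_nmol / AMINES_PER_IONIZABLE
    total_lipid_nmol = ionizable_nmol / (LIPID_MOL_PERCENT["LP01"] / 100.0)
    return {name: total_lipid_nmol * pct / 100.0
            for name, pct in LIPID_MOL_PERCENT.items()}


for name, nmol in lipid_nmol_for_rna(100.0).items():
    print(f"{name}: {nmol:.1f} nmol per 100 ug RNA")
```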

NSG-PiZ mice harboring the human SERPINA1 PiZ allele (E342K) were purchased from The Jackson Laboratory. To assess the ability of Gene Writing to edit the mutant allele in vivo, LNPs were administered at 3 mg/kg via the tail vein in a volume of 0.2 mL per animal. Vehicle-treated animals served as negative controls for all studies. Animals were sacrificed at different time points by cardiac puncture exsanguination under isoflurane anesthesia. In some embodiments, animals are euthanized one week after treatment to analyze Gene Writing. Liver tissue was collected from the middle or left lobe of each animal for DNA extraction and analysis.

For NGS analysis of editing efficiency, PCR primers were designed flanking the target site and the region of interest was amplified from the extracted genomic DNA. An additional PCR was performed according to the manufacturer's protocol (Illumina) to add the chemistry required for sequencing, and the amplicons were then sequenced on an Illumina MiSeq. After reads with low quality scores were removed, the sequencing reads were aligned to the mouse reference genome. From the resulting file of reads mapped to the reference genome (BAM file), reads overlapping the target region of interest were selected, and the number of wild-type reads was compared with the number of reads containing the SERPINA1 reversion mutation encoded in the template RNA. The percentage of editing (e.g., "editing efficiency" or "editing percentage") is defined as the ratio of the total number of sequence reads containing the reversion to the total number of sequence reads.
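
The editing percentage defined above reduces to a ratio of read counts. A minimal sketch follows, assuming the base observed at the target position has already been extracted from each read that overlaps the target region; the function name and the example base calls are hypothetical and are not part of the described pipeline.

```python
from collections import Counter


def editing_percentage(base_calls, corrected_base):
    """Percent of reads whose base at the target position is the corrected base.

    base_calls: iterable of single-letter base calls, one per overlapping read.
    corrected_base: the base encoded by the template RNA at that position.
    """
    counts = Counter(base.upper() for base in base_calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads overlap the target position")
    return 100.0 * counts[corrected_base.upper()] / total


if __name__ == "__main__":
    # Illustrative only: three unedited reads and one corrected read.
    print(editing_percentage(["A", "A", "A", "G"], corrected_base="G"))
```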

In some embodiments, this example is repeated with additional groups of mice, and a re-dosing regimen is used to analyze the dose-response characteristics of the system. In these experiments, mice are assigned to groups dosed weekly for up to 4 weeks, with euthanasia and tissue analysis performed weekly as described herein. In some embodiments, mice receiving more doses of the LNP formulation show higher Gene Writing efficiency by sequencing; e.g., mice receiving 2 doses, one every other week, and analyzed at week three show a higher fraction of gene-corrected reads by NGS of liver tissue samples than mice receiving a single dose and analyzed at week three. In application, dosing in this manner may allow the therapeutic intervention to be adjusted after assessing the patient's response to one or more doses.

Example 25: Gene Writing to address repeat expansion diseases

This example describes the use of the Gene Writer™ gene editing system to treat a repeat expansion disease by rewriting a normal number of repeats into the locus. More specifically, the Gene Writer™ polypeptide and writing template are delivered to the mouse CNS via AAV to reset the CAG repeats in HTT according to a custom template RNA, thereby curing Huntington's disease. Healthy people tend to carry 10 to 35 CAG repeats within the huntingtin gene (HTT), whereas people with Huntington's disease may carry 36 to more than 120 repeats.

In this example, the template RNA is designed to correct the CAG repeat region of the HTT gene by encoding a sequence having 10 such repeats, together with sequences homologous to the flanking target sequence so that the edit is written completely across the target locus. Multiple examples of such template RNAs can be designed; an exemplary template RNA, as encoded in DNA, comprises the sequence (1) GGCGGCTGAGGAAGCTGAGG (2) GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC (3) AGTCCCTCAAGTCCTTCcagcagcagcagcagcagcagcagcagcagccgccaccgccgccgccgccgccgccgcctcct (4) CAGCTTCCTCAG (SEQ ID NO: 1690), wherein the modules of the template are denoted in order (5'-3') by number: (1) a gRNA spacer, (2) a gRNA scaffold, (3) a heterologous object sequence, and (4) a 3' homology priming domain, and wherein the repeat correction is encoded in (3). The CAG repeat region is followed by a short repeat region encoding 11 proline residues (8 of which are encoded by the CCG triplet). Without wishing to be bound by theory, this region is included in (3) so that (4) falls in a more unique region, to prevent mispriming. An exemplary gRNA for providing a second nick as described in embodiments of the present system comprises the spacer sequence CGCTGCACCGACCGTGAGTT (SEQ ID NO: 1647) and directs the Cas9 nickase to nick the second strand of the target site within the homologous region. In some embodiments, the second nick improves editing efficiency.
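
The four-module layout of the exemplary template can be checked programmatically. The sketch below concatenates modules (1) to (4) exactly as printed above and counts the longest uninterrupted run of CAG units in module (3). The helper name `longest_repeat_run` is hypothetical; if the sequence is reproduced faithfully, the count should agree with the 10 repeats stated in the text.

```python
import re

# Modules of the exemplary template (DNA-encoded), copied from the text above.
SPACER = "GGCGGCTGAGGAAGCTGAGG"
SCAFFOLD = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC"
)
OBJECT_SEQ = (
    "AGTCCCTCAAGTCCTTCcagcagcagcagcagcagcagcagcagcag"
    "ccgccaccgccgccgccgccgccgccgcctcct"
)
PRIMING_DOMAIN = "CAGCTTCCTCAG"


def longest_repeat_run(seq, unit="CAG"):
    """Return the number of units in the longest uninterrupted tandem run."""
    runs = re.findall(f"(?:{re.escape(unit.upper())})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# 5'-3' order: spacer, scaffold, heterologous object sequence, priming domain.
template_dna = SPACER + SCAFFOLD + OBJECT_SEQ.upper() + PRIMING_DOMAIN

if __name__ == "__main__":
    print("template length (nt):", len(template_dna))
    print("CAG repeats encoded in module (3):", longest_repeat_run(OBJECT_SEQ))
```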

To deliver the complete Gene Writing system to the CNS, in this example the Gene Writer is split across two AAV genomes: the first encodes a nickase Cas9 domain fused to intein-N of a split intein pair (DnaE intein-N: CLSYETEILTVEYGLLPIGKIVEKRIECTVYSVDNNGNIYTQPVAQWHDRGEQEVFEYCLEDGSLIRATKDHKFMTVDGQMLPIDEIFERELDLMRVDNLPN (SEQ ID NO: 1613)), and the second encodes the RT domain fused to intein-C of the split intein pair (DnaE intein-C: MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN (SEQ ID NO: 1615)) together with the template RNA. Both polypeptide components are expressed from a polymerase II promoter (e.g., a neuronal cell-specific promoter as described herein), and the template RNA and the gRNA used to provide the second nick are expressed from a polymerase III promoter (e.g., a U6 promoter). When a cell is co-infected, the two polypeptide components reconstitute the complete Gene Writer polypeptide with an N-terminal Cas9 and a C-terminal RT, and the template RNA is expressed and reverse transcribed into the target locus. To achieve delivery to CNS cells (particularly the caudate nucleus and the putamen of the basal ganglia), pseudotyped rAAV2/1 is used herein, wherein AAV2 ITRs are used to package the nucleic acid into particles with AAV1 capsids. The AAV preparation and the mouse injection and harvesting protocols used herein follow the teachings of Monteys et al Mol Ther [Molecular Therapy] 25(1):12-23 (2017).

FVB-Tg(YAC128)53Hay/J mice were purchased from The Jackson Laboratory. These transgenic mice express full-length human huntingtin with about 118 glutamine repeats (CAG trinucleotide repeats) and develop hyperkinesia by three months of age. At 8 weeks of age, mice were treated with a 1:1 combination of rAAV2/1-Cas9 virus and rAAV-MMLV_RT/hU6 template RNA virus. For rAAV injection, mice were anesthetized with isoflurane and 5 µL of the rAAV mixture was injected unilaterally into the right striatum at 0.2 µL/min. Three weeks later, mice were sacrificed and brain tissue was collected for genomic DNA extraction and NGS analysis.

For NGS analysis of editing efficiency, PCR primers were designed flanking the target site and the region of interest was amplified from the extracted genomic DNA. An additional PCR was performed according to the manufacturer's protocol (Illumina) to add the chemistry required for sequencing, and the amplicons were then sequenced on an Illumina MiSeq. After reads with low quality scores were removed, the sequencing reads were aligned to the mouse reference genome. From the resulting file of reads mapped to the reference genome (BAM file), reads overlapping the target region of interest were selected, and the number of reads of the pathogenic allele (> 35 CAG repeats) was compared with the number of reads of the repaired allele (10-35 CAG repeats). The percentage of editing (e.g., "editing efficiency" or "editing percentage") is defined as the ratio of the total number of repaired reads (as defined above) to the total number of sequence reads.
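
The allele classification above (pathogenic: more than 35 CAG repeats; repaired: 10 to 35) can be sketched as follows. The code assumes a per-read CAG repeat count has already been obtained from the aligned reads; the function names and the "other" category for reads with fewer than 10 repeats are illustrative assumptions.

```python
def classify_allele(cag_repeats):
    """Classify one read by its CAG repeat count using the thresholds in the text."""
    if cag_repeats > 35:
        return "pathogenic"
    if 10 <= cag_repeats <= 35:
        return "repaired"
    return "other"  # assumption: reads below 10 repeats are tracked separately


def percent_repaired(repeat_counts):
    """Percent of all reads classified as repaired."""
    counts = list(repeat_counts)
    if not counts:
        raise ValueError("no reads supplied")
    repaired = sum(1 for n in counts if classify_allele(n) == "repaired")
    return 100.0 * repaired / len(counts)


if __name__ == "__main__":
    # Illustrative repeat counts only, not experimental data.
    print(percent_repaired([118, 118, 10, 117]))
```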

Example 26: Gene Writing system delivered by LNP and AAV vehicles

This example describes the use of the Gene Writer™ gene editing system to alter a genomic sequence at a single nucleotide in vivo. More specifically, the Gene Writer™ polypeptide and the writing template are delivered to mouse liver cells by a combination of lipid nanoparticles (mRNA encoding the polypeptide) and AAV (DNA encoding the RNA template) to correct the SERPINA1 PiZ mutation that causes alpha-1 antitrypsin deficiency.

Capped and tailed mRNA encoding Gene Writer polypeptide was prepared by in vitro transcription and formulated into LNP-INT01 as described in example 23, but not co-formulated with template RNA.

In this example, the template RNA is encoded in DNA and delivered by AAV. Cunningham et al Mol Ther [Molecular Therapy] 16(6):1081-1088 (2008) describe the use of rAAV2/8 carrying the human alpha-1 antitrypsin (hAAT) promoter with two copies of the apolipoprotein E (ApoE) enhancer hepatic control region to efficiently transduce the liver of young mice and drive expression of cargo there. Accordingly, the rAAV2/8.ApoE-hAAT.PiZ (rAAV2/8.PiZ) described herein comprises the AAV and promoter system described above driving expression of the RNA template for correction of the PiZ mutation, and a gRNA directing a second nick driven by the U6 promoter (RNA sequences previously described in example 1).

NSG-PiZ mice carrying the human SERPINA1 PiZ allele (E342K) were purchased from The Jackson Laboratory. To evaluate the activity of Gene Writing in editing the mutant allele in vivo, about 10¹¹ vg of rAAV2/8.PiZ was administered intraperitoneally to 8-week-old mice to express the template RNA, and 3 mg/kg of formulated LNP was administered via the lateral tail vein in a volume of 0.2 mL per animal to express the Gene Writer polypeptide. Animals were sacrificed at different time points by cardiac puncture exsanguination under isoflurane anesthesia. In some embodiments, animals are euthanized one week after treatment to analyze Gene Writing. Liver tissue was collected from the middle or left lobe of each animal for DNA extraction and analysis.

For NGS analysis of editing efficiency, PCR primers were designed flanking the target site and the region of interest was amplified from the extracted genomic DNA. An additional PCR was performed according to the manufacturer's protocol (Illumina) to add the chemistry required for sequencing, and the amplicons were then sequenced on an Illumina MiSeq. After reads with low quality scores were removed, the sequencing reads were aligned to the mouse reference genome. From the resulting file of reads mapped to the reference genome (BAM file), reads overlapping the target region of interest were selected, and the number of wild-type reads was compared with the number of reads containing the SERPINA1 reversion mutation encoded in the template RNA. The percent editing is defined as the ratio of the total number of sequence reads containing the reversion to the total number of sequence reads.

Example 27: Use of the Gene Writer™ system to deliver a therapeutic gene to the liver in a human chimeric liver mouse model

This example describes delivery of the Gene Writer™ genome editing system to hepatocytes in vivo for integration and stable expression of a genetic payload. In particular, LNPs are used to deliver a Gene Writing system capable of integrating a complete OTC expression cassette to treat a humanized mouse model of OTC deficiency.

In this example, the Gene Writing system is used to treat a humanized mouse model of OTC deficiency, in which human hepatocytes derived from a patient with OTC deficiency are transplanted into the mouse (Ginn et al JHEP Reports 2019). An exemplary Gene Writing system for payload integration comprises a Cas9-directed reverse transcriptase system that uses a highly processive reverse transcriptase (MarathonRT). An exemplary template RNA comprises (from 5' to 3') (1) a gRNA spacer homologous to the AAVS1 safe harbor site, (2) a gRNA scaffold, (3) a heterologous object sequence, and (4) a 3' target homology region (for annealing to genomic DNA immediately upstream of the first-strand nick to prime TPRT of the heterologous object sequence). An exemplary sequence of (1) is GGGGCCACTAGGGACAGGAT (SEQ ID NO: 1691). Region (2) carries a gRNA scaffold as described herein, typically comprising the sequence GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC (SEQ ID NO: 1603). In this example, (3) comprises a complete OTC expression cassette in which a liver codon-optimized sequence encoding human OTC (UniProt P00480) is operably linked to the ApoE.hAAT promoter system described in example 25. An exemplary sequence of (4) is CTGTCCCTAGTG (SEQ ID NO: 1692). An exemplary spacer sequence of an additional gRNA for creating a second-strand nick to increase integration efficiency is AGAGAGATGGCTCCAGGAAA (SEQ ID NO: 1693).
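
The relationship between the spacer (1) and the 3' target homology region (4) can be checked with a short script. The sketch assumes the nickase cuts the spacer-matching strand 3 nt from the PAM-proximal (3') end of the protospacer, which is the usual position for SpCas9 but is not stated in this example. Under that assumption, region (4) should be the reverse complement of the bases immediately 5' of the nick. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primes_at_nick(spacer, homology_region, nick_offset_from_3prime=3):
    """True if homology_region anneals immediately upstream of the nick.

    nick_offset_from_3prime=3 is an assumption about the nickase used.
    """
    upstream_of_nick = spacer.upper()[: len(spacer) - nick_offset_from_3prime]
    return upstream_of_nick.endswith(reverse_complement(homology_region))


# Sequences copied from the text above (AAVS1 template).
AAVS1_SPACER = "GGGGCCACTAGGGACAGGAT"
AAVS1_HOMOLOGY = "CTGTCCCTAGTG"

if __name__ == "__main__":
    print(reverse_complement(AAVS1_HOMOLOGY))
    print(primes_at_nick(AAVS1_SPACER, AAVS1_HOMOLOGY))
```

The same check can be run on any spacer and priming region pair written in this four-module format.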

Human hepatocytes (isolated from pediatric donors or purchased from Lonza (Basel, Switzerland)) were engrafted into 8- to 12-week-old female Fah-/- Rag2-/- Il2rg-/- (FRG) mice, as described previously (Azuma et al Nat Biotechnol [Nature Biotechnology] 2007). Engrafted mice were cycled on 2-(2-nitro-4-trifluoromethylbenzoyl)-1,3-cyclohexanedione (NTBC) in drinking water to promote liver regeneration. Blood was collected every two weeks and at the end of the experiment, and human albumin levels in serum were measured by enzyme-linked immunosorbent assay (ELISA; Bethyl Laboratories, Inc.) and used as a marker to estimate the level of engraftment. Ten weeks after engraftment, mice were treated with the Gene Writer™ system formulated as in example 23. For treatment, LNPs were delivered through the lateral tail vein at 3 mg/kg in a volume of 0.2 mL per animal.

Following vector injection, mice were cycled on NTBC for 5 weeks and then euthanized. DNA and RNA were then extracted from liver lysates by standard methods. OTC expression was determined by RT-qPCR of the isolated RNA samples using sequence-specific primers. Human OTC levels were also measured in serum throughout the course of the experiment, on days -7, 0, 2, 4, 7, 14, 21, 28, and 35 relative to injection, using a human OTC ELISA kit (e.g., Aviva Systems Biology OTC ELISA Kit (Human) (OKCD07437)) following the manufacturer's recommended protocol.

To analyze editing efficiency, ddPCR analysis was performed using primer pairs that anneal across the 5' or 3' integration junction, with one primer of each pair annealing to the heterologous object sequence and the other annealing to the appropriate region of the AAVS1 site in the genome. The assay is normalized to a reference gene to quantify the number of target-site integrations per genome.
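
Normalizing a junction ddPCR signal to a reference gene can be expressed as a simple ratio. The sketch below assumes the reference assay detects a locus present at two copies per diploid genome; that copy number, the function name, and the example concentrations are assumptions for illustration rather than values from this example.

```python
def integrations_per_genome(
    junction_copies_per_ul,
    reference_copies_per_ul,
    reference_copies_per_genome=2.0,
):
    """Average number of target-site integrations per genome from ddPCR data.

    reference_copies_per_genome=2.0 assumes a two-copy autosomal reference locus.
    """
    if reference_copies_per_ul <= 0:
        raise ValueError("reference signal must be positive")
    genomes_per_ul = reference_copies_per_ul / reference_copies_per_genome
    return junction_copies_per_ul / genomes_per_ul


if __name__ == "__main__":
    # Illustrative concentrations only (copies per microliter of reaction).
    print(integrations_per_genome(50.0, 1000.0))
```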

To analyze integration at the target site, long-read sequencing is performed across the integration site. PCR primers are designed flanking the target site and the region of interest is amplified from the extracted genomic DNA. An additional PCR is performed according to the manufacturer's protocol (PacBio) to add the chemistry required for sequencing, and the amplicons are then sequenced on a PacBio instrument. After reads with low quality scores are removed, the sequencing reads are aligned to the mouse reference genome. From the resulting file of reads mapped to the reference genome (BAM file), reads containing an inserted sequence relative to the reference genome are selected for further analysis to determine the integrity of the integration, which in this example is defined as containing the complete promoter and OTC coding sequence.

Example 28: Gene Writer for integrating a CAR into T cells ex vivo

This example describes delivery of the Gene Writer™ genome editing system to T cells ex vivo for integration and stable expression of a genetic payload. In particular, LNPs are used to deliver a Gene Writing system capable of integrating a chimeric antigen receptor (CAR) expression cassette into the TRAC locus to generate CAR-T cells for the treatment of B cell lymphoma.

In this example, the Gene Writing system comprises a Gene Writer polypeptide (e.g., a nickase Cas9 and an R2Tg reverse transcriptase domain as described herein), a gRNA for directing nickase activity to the target locus, and a template RNA comprising, from 5' to 3':

(1) 100 nt of homology to the target site 3' of the first-strand nick

(2) 5' UTR from R2Tg

(3) Heterologous object sequence

(4) 3' UTR from R2Tg

(5) 100 nt of homology to the target site 5' of the first-strand nick

Wherein (3) comprises the coding sequence of the CD19-specific Hu19-CD828Z CAR molecule (Genbank MN698642; Brudno et al Nat Med [Nature Medicine] 26:270-280 (2020)). The Gene Writer in this example is directed to the 5' end of the first exon of TRAC by a targeting gRNA (e.g., TCAGGGTTCTGGATATCTGT (SEQ ID NO: 1694)) in order to place the cargo under the control of endogenous expression from that locus while disrupting the endogenous TCR, as taught by Eyquem et al Nature 543:113-117 (2017). All three components (polypeptide, gRNA, and template) are provided as RNA, synthesized by in vitro transcription (e.g., polypeptide mRNA, template RNA) or chemical synthesis (gRNA).

The LNP formulation used in this example has been screened and validated for ex vivo delivery to T cells as taught by Billingsley et al Nano Lett [Nano Letters] 20(3):1578-1589 (2020), which is incorporated herein by reference in its entirety. Specifically, LNP formulation C14-4 (comprising cholesterol, phospholipid, lipid-anchored PEG, and ionizable lipid C14-4 (Billingsley et al Nano Lett [Nano Letters] 20(3):1578-1589 (2020), FIG. 2C)) was used to encapsulate all three RNA components at a molar ratio of mRNA:gRNA:template RNA of about 1:40:40.
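
A molar ratio such as 1:40:40 translates into very different mass fractions depending on the length of each RNA. The sketch below converts a molar ratio into mass fractions under the simplifying assumption that RNA mass is proportional to length (equal average mass per nucleotide). The RNA lengths in the usage example are hypothetical placeholders, since the lengths are not given in this example.

```python
def mass_fractions(molar_ratio, lengths_nt):
    """Convert a molar ratio of RNA species into mass fractions.

    Assumes mass is proportional to length, i.e. the same average
    nucleotide mass for every species.
    """
    weights = {name: molar_ratio[name] * lengths_nt[name] for name in molar_ratio}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    ratio = {"mRNA": 1, "gRNA": 40, "template RNA": 40}
    # Hypothetical lengths in nucleotides, for illustration only.
    lengths = {"mRNA": 5000, "gRNA": 100, "template RNA": 2000}
    for name, fraction in mass_fractions(ratio, lengths).items():
        print(f"{name}: {100 * fraction:.1f}% of total RNA mass")
```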

Additional edits can be made to the T cells to increase the activity of the CAR-T cells against their cognate targets. In some embodiments, a second LNP formulation of C14-4 as described comprises a preformed Cas9/gRNA RNP complex, wherein the gRNA targets exon 1 of Pdcd1 for PD-1 inactivation, which may enhance the anti-tumor activity of the CAR-T cells by disrupting this inhibitory checkpoint (which would otherwise trigger suppression of the cells) (see Rupp et al Sci Rep [Scientific Reports] 7:737 (2017)). Thus, the use of two nanoparticle formulations achieves lymphoma targeting by providing the anti-CD19 cargo while improving efficacy by knocking out the PD-1 inhibitory checkpoint. In some embodiments, the cells may be treated with both nanoparticles simultaneously. In some embodiments, the cells may be treated with the nanoparticles in separate steps, e.g., first delivering the RNP to generate the PD-1 knockout, and then treating the cells with the nanoparticles bearing the anti-CD19 CAR. In some embodiments, the second component of the system that increases T cell efficacy can result in the knockout of PD-1, TCR, CTLA-4, HLA-I, HLA-II, CS1, CD52, B2M, MHC-I, MHC-II, CD3, FAS, PDC1, CISH, TRAC, or a combination thereof. In some embodiments, knockdown of PD-1, TCR, CTLA-4, HLA-I, HLA-II, CS1, CD52, B2M, MHC-I, MHC-II, CD3, FAS, PDC1, CISH, or TRAC may be preferred, e.g., using siRNA targeting PD-1. In some embodiments, PD-1 targeting with siRNA can be achieved by using self-delivering RNAi (as described in Ligtenberg et al Mol Ther [Molecular Therapy] 26(6):1482-1493 (2018) and WO 2010033247, each of which is incorporated herein by reference in its entirety), wherein the siRNA carries extensive chemical modifications that give the resulting hydrophobically modified siRNA molecule the ability to penetrate all types of cells ex vivo and in vivo and to achieve durable, specific target gene knockdown without any additional delivery formulation or technique.
In some embodiments, one or more components of the system may be delivered by other methods, such as electroporation. In some embodiments, additional modulators are knocked into the cells for overexpression to control T cell- and NK cell-mediated immune responses and macrophage phagocytosis, e.g., PD-L1, HLA-G, and CD47 (Han et al PNAS [Proc. Natl. Acad. Sci. USA] 116(21):10441-10446 (2019)). The knock-in may be accomplished by applying an additional Gene Writing system having a template that carries, as (3), an expression cassette for one or more such factors, and that targets a safe harbor locus, e.g., AAVS1, e.g., using the gRNA GGGGCCACTAGGGACAGGAT to target the Gene Writer polypeptide to AAVS1.

LNPs are used to treat Dynabead-activated primary T cells (1:1 CD4⁺:CD8⁺ ratio) at a total mRNA concentration of 450 ng/µL. The resulting T cell populations are analyzed for integration, expression, and effector function. To assess integration, ddPCR is used with primers that produce an amplicon extending from within the integrated CAR into the flanking genomic TRAC sequence. Comparing the signal to that of a reference gene (e.g., RPP30) allows the average copy number per genome and the integration efficiency to be quantified. To analyze expression, flow cytometry with an immunoprobe is used to evaluate the level of surface CAR expression and the percentage of cells exhibiting it. To analyze the activity of the CAR-T cells, the treated cells are evaluated in a co-plated cancer cell killing assay. By engineering Nalm6 ALL cells to express luciferase, cancer cell killing can be assessed by the change in luminescence after co-culture with CAR-T cells as compared to the signal from Nalm6 cells alone (Billingsley et al Nano Lett [Nano Letters] 20(3):1578-1589 (2020)). Thus, the Gene Writing system can be used to generate CAR-T cells ex vivo with the desired cytotoxic activity.
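
One common way to express the luminescence readout of such a co-culture assay is as percent killing relative to target cells alone. The formula below is an illustrative assumption, since the text only states that the co-culture signal is compared with the signal from Nalm6 cells alone; the function name, the optional background term, and the example readings are hypothetical.

```python
def percent_killing(coculture_signal, target_only_signal, background=0.0):
    """Percent loss of luciferase signal relative to target cells cultured alone.

    background is an optional blank reading subtracted from both signals.
    """
    target = target_only_signal - background
    if target <= 0:
        raise ValueError("target-only signal must exceed background")
    remaining = (coculture_signal - background) / target
    return 100.0 * (1.0 - remaining)


if __name__ == "__main__":
    # Illustrative luminescence readings only (arbitrary units).
    print(percent_killing(coculture_signal=2500.0, target_only_signal=10000.0))
```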

Example 29: Gene Writer for in vivo integration of a CAR into T cells

This example describes a Gene Writer™ genome editing system that is delivered to T cells in vivo for integration and stable expression of a genetic payload. In particular, targeted nanoparticles are used to deliver a Gene Writing system capable of integrating a chimeric antigen receptor (CAR) expression cassette into the murine Rosa26 locus to generate CAR-T cells in a murine model.

In this example, the template RNA comprises, from 5' to 3':

(1) 100 nt of homology to the target site 3' of the first-strand nick

(2) 5' UTR from R2Tg

(3) Heterologous object sequence

(4) 3' UTR from R2Tg

(5) 100 nt of homology to the target site 5' of the first-strand nick

Wherein (3) comprises the coding sequence of the CD19-specific m194-1BBz CAR driven by the EF1a promoter (Smith et al Nat Nanotechnol [Nature Nanotechnology] 12(8):813-820 (2017)). The Gene Writer in this example is directed to the murine Rosa26 locus using a gRNA (e.g., ACTCCAGTCTTTCTAGAAGA (SEQ ID NO: 1695); Chu et al Nat Biotechnol [Nature Biotechnology] 33(5):543-548 (2015)). The RNA molecules are produced in accordance with the examples provided herein, e.g., by in vitro transcription (e.g., Gene Writer polypeptide mRNA, template RNA) and by chemical synthesis (e.g., gRNA). Modifications to the RNA components of the system are described elsewhere. For the Gene Writer mRNA, the sequence additionally includes a 5' UTR (e.g., GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 1604)) and a 3' UTR (e.g., UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA (SEQ ID NO: 1605)) flanking the coding sequence. This combination of 5' UTR and 3' UTR has been shown to give good expression of an operably linked ORF (Richner et al Cell 168(6):1114-1125 (2017)).
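
Flanking a coding sequence with the UTRs given above is a plain concatenation, sketched here with minimal sanity checks. The ORF in the usage example is a toy placeholder, not the Gene Writer coding sequence, and the function name and the optional poly(A) length parameter are hypothetical. The checks only require an AUG start and a whole number of codons; no stop codon check is made, because the 3' UTR as written begins with the triplets UGA UAA UAG.

```python
FIVE_PRIME_UTR = "GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC"
THREE_PRIME_UTR = (
    "UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCU"
    "CCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA"
)


def assemble_mrna(orf, five_utr=FIVE_PRIME_UTR, three_utr=THREE_PRIME_UTR, poly_a=0):
    """Return 5' UTR + ORF + 3' UTR (+ optional poly(A) tail) as an RNA string."""
    rna_orf = orf.upper().replace("T", "U")
    if not rna_orf.startswith("AUG"):
        raise ValueError("ORF must begin with a start codon (AUG)")
    if len(rna_orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return five_utr + rna_orf + three_utr + "A" * poly_a


if __name__ == "__main__":
    toy_orf = "ATGGCCGGC"  # placeholder ORF for illustration only
    print(assemble_mrna(toy_orf, poly_a=10))
```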

To achieve specific delivery to T cells, targeted LNPs (tLNPs) carrying a conjugated anti-CD4 mAb are generated. See, e.g., Ramishetti et al ACS Nano 9(7):6706-6716 (2015). Alternatively, a conjugated anti-CD3 mAb may be used to target both CD4⁺ and CD8⁺ T cells (Smith et al Nat Nanotechnol [Nature Nanotechnology] 12(8):813-820 (2017)). In other embodiments, the nanoparticle for in vivo delivery to T cells is a constrained nanoparticle lacking a targeting ligand, as taught by Lokugamage et al Adv Mater [Advanced Materials] 31(41):e1902251 (2019).

tLNPs are prepared by first preparing LNPs from a mixture of nucleic acids (e.g., a polypeptide mRNA:gRNA:template RNA molar ratio of 1:40:40) and a lipid mixture (cholesterol, DSPC, PEG-DMG, Dlin-MC3-DMA, and DSPE-PEG-maleimide), and then chemically conjugating the desired DTT-reduced mAb (e.g., anti-CD4, e.g., clone YTS.177) to the maleimide functional groups on the LNPs. See Ramishetti et al ACS Nano 9(7):6706-6716 (2015).

Formulated LNPs were injected intravenously into 6- to 8-week-old C57BL/6J mice at a dose of 1 mg RNA/kg body weight. On the first and third days after administration, blood was collected in heparin-coated collection tubes and leukocytes were isolated by density centrifugation using Ficoll-Paque PLUS (GE Healthcare). Five days after administration, animals were euthanized, and blood and organs (spleen, lymph nodes, bone marrow cells) were harvested for T cell analysis. Anti-CD19 CAR expression was detected by FACS using specific immunosorting. Cells positive for integration were confirmed by ddPCR on the sorted populations using primers flanking the integration junction, e.g., with one primer of a pair annealing to the integrated cargo and the other annealing to genomic DNA at the Rosa26 target site.

Example 30: mutations in the DNA binding motif of reverse transcriptase prevent integration at its natural site

In this example, the inherent DNA binding properties of the DNA binding domain of a retrotransposon-based Gene Writer polypeptide are analyzed by mutagenesis or truncation of that region. Generating these mutants and detecting their reduced activity aids in understanding the determinants of DNA targeting and in creating mutant derivatives with reduced or eliminated function, which can then be used as scaffolds for fusion to heterologous DNA binding domains. Here, it is shown that mutations in the C-terminal zinc finger domains and the c-myb domain have profound effects on the enzymatic activity of the R2Tg retrotransposase at its natural recognition sequence.

To test whether R2Tg DNA binding mutants lose DNA binding activity, exemplary mutants were constructed and evaluated for their integration activity at the native rDNA target site (as a downstream readout of target DNA binding activity). In this example, the naming and design of the parent domains and the mutant (*) domains are as follows:

· ZF1: CPCCGTRVNSVLNLIEHLKVSH

· ZF1*: SPSSGTRVNSVLNLIEHLKVSH

· ZF2: CEVCNRDFTTKIGLGQHKRLAH

· ZF2*: SEVSNRDFTTKIGLGQHKRLAH

· c-myb: RCWTKEEEELLIRLEAQFEGNKNINKLIAEHITTKTAKQISD

· c-myb*: ACATKEEEELLIRLEAQFEGNKNINKLIAEHITTKTAKQISD

The general structure of these domains is shown in FIG. 30A. Here, the mutant Gene Writer polypeptide is encoded within the Gene Writer template, meaning that the polypeptide coding sequence is further flanked by the R2Tg-derived 5' UTR and 3' UTR and by 100 nt of homology to the native target site, as described herein.
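
The substitutions that distinguish each mutant (*) domain from its parent can be read off by comparing the listed sequences position by position. The sketch below does this for the three domains; the position numbers are relative to the start of each listed motif, not to the full-length R2Tg polypeptide, and the function name is hypothetical.

```python
# Parent and mutant (*) motif sequences as listed above.
DOMAINS = {
    "ZF1": ("CPCCGTRVNSVLNLIEHLKVSH", "SPSSGTRVNSVLNLIEHLKVSH"),
    "ZF2": ("CEVCNRDFTTKIGLGQHKRLAH", "SEVSNRDFTTKIGLGQHKRLAH"),
    "c-myb": (
        "RCWTKEEEELLIRLEAQFEGNKNINKLIAEHITTKTAKQISD",
        "ACATKEEEELLIRLEAQFEGNKNINKLIAEHITTKTAKQISD",
    ),
}


def substitutions(parent, mutant):
    """List substitutions as e.g. 'C1S' (parent residue, motif position, mutant residue)."""
    if len(parent) != len(mutant):
        raise ValueError("parent and mutant must have equal length")
    return [
        f"{p}{position}{m}"
        for position, (p, m) in enumerate(zip(parent, mutant), start=1)
        if p != m
    ]


if __name__ == "__main__":
    for name, (parent, mutant) in DOMAINS.items():
        print(name, substitutions(parent, mutant))
```

For the zinc fingers the differences are cysteine-to-serine changes, and for the c-myb motif they are changes to alanine, as the listed sequences show.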

To test whether mutations in the R2Tg DNA binding domains affect R2Tg Gene Writer function, 250 ng of plasmid (containing the parent R2Tg polypeptide, an endonuclease-inactivating mutant (negative control for Gene Writer activity), each single DBD mutation, or all three DBD mutations) was nucleofected into HEK293T cells using the Lonza Amaxa Nucleofector well shuttle system with program DS150 according to the manufacturer's protocol. After nucleofection, cells were cultured at 37°C with 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. The extracted gDNA was assayed by ddPCR for Gene Writer template integration at the native rDNA site. Here, mutations in the ZF1, ZF2, or c-myb domains were observed to inhibit integration, with the degree of inhibition ordered ZF2 > c-myb > ZF1 (FIG. 30B).

Example 31: determination of the cleavage site of the endonuclease Domain of Gene Writer by indel characterization

This example describes a cell-based assay for determining the site of cleavage activity of a Gene Writer polypeptide comprising an endonuclease (EN) domain on a genomic DNA target. In particular, it is shown that retrotransposase activity can produce low levels of target site modification, possibly due to host DNA repair of the target nick. This signature can be assessed to locate cleavage sites by using a highly sensitive targeted amplicon sequencing assay. The ability to rapidly evaluate an early step of a Gene Writing system of interest makes this an enabling assay for understanding and engineering the DNA specificity of the enzyme (e.g., sequence-specific DNA binding or sequence-specific endonuclease activity) without relying on complete integration, which may be affected by other attributes of the system.

To generate a cleavage site profile, an assay was created to analyze genomic sequence modifications at the predicted cleavage site of the retrotransposase R2Tg. A schematic of the target sequence for R2Tg is shown in FIG. 31A, depicting the predicted DNA binding regions and the predicted endonuclease cleavage site. To determine whether R2Tg endonuclease activity could be detected around the predicted target, 73 ng of R2Tg expression plasmid was nucleofected into 200,000 U2OS cells using program DN100 and buffer SE on a Lonza Amaxa Nucleofector according to the manufacturer's instructions. Cells were cultured in DMEM containing 10% FBS for three days after nucleofection before genomic DNA was extracted. Amplicons were generated using primers flanking the rDNA target site (5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTaggggaatccgactgtttaatta-3' and 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTcacctctcatgtctcttcaccg-3'). Amplicon sequencing was performed on an Illumina MiSeq, and the results were analyzed using the CRISPResso2 pipeline (Clement et al Nat Biotechnol 37(3):224-226 (2019)) to determine the mutation profile at the target site. Insertions and deletions were found at and around the predicted GG cleavage site, with the peak of the insertion signature occurring directly at the predicted R2Tg cleavage site (FIG. 31B). These data demonstrate the use of indel signatures for detecting and localizing endonuclease activity in a Gene Writing system.
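
The idea of locating a cleavage site from indel positions can be illustrated independently of any particular analysis tool. The sketch below is not the CRISPResso2 pipeline used in this example. It walks the CIGAR string of each aligned read (SAM format operations), records the reference coordinate of every insertion and deletion, and tallies them per position; the function names and the example alignments are hypothetical.

```python
import re
from collections import Counter

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_events(ref_start, cigar):
    """Return (type, reference_position, length) for each insertion or deletion.

    ref_start is the 0-based reference coordinate of the first aligned base.
    An insertion is reported at the reference position that follows it.
    """
    events = []
    ref_pos = ref_start
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            events.append(("I", ref_pos, n))
        elif op == "D":
            events.append(("D", ref_pos, n))
            ref_pos += n
        elif op in "MN=X":
            ref_pos += n
        # S, H and P do not consume reference bases.
    return events


def indel_profile(alignments):
    """Count indel events per (type, reference position) over many reads."""
    profile = Counter()
    for ref_start, cigar in alignments:
        for kind, position, _ in indel_events(ref_start, cigar):
            profile[(kind, position)] += 1
    return profile


if __name__ == "__main__":
    # Illustrative alignments only: (0-based start, CIGAR).
    reads = [(100, "10M2I5M3D4M"), (105, "5M1I20M"), (90, "40M")]
    print(indel_profile(reads).most_common())
```

A peak in such a profile marks the position where repair outcomes cluster, which is how the insertion signature above is used to localize the cleavage site.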

Example 32: improving sequence specificity of reverse transcription transposase endonuclease domains

This example describes an experiment to elucidate the sequence specificity of the endonuclease activity associated with the Gene Writer system. In particular, the DNA target sequence information important for endonuclease activity is elucidated by targeted mutagenesis of regions of various sizes at or around the cleavage site. Here, the sequence specificity of the native endonuclease domain of the R2Tg retrotransposase is analyzed by applying the indel characterization assay (see example 31) to cells comprising a genomic landing pad with a native or altered R2Tg target sequence, to determine cleavage activity across the target library.

In this example, cell lines are generated with a stably integrated landing pad, and the landing pad sequence comprises an rDNA-derived sequence corresponding to the native target site (where the R2 class of retrotransposon systems promotes retrotransposition). These landing pads are designed to have (1) the wild-type target sequence, including rDNA-derived sequence from the R2 region of the rDNA; (2) the landing pad from (1) with a 12-bp sequence mutation at and around the R2 cleavage site; or (3) a series of mutations within the 12-bp sequence from (2) to further define the minimum sequence requirement within the 12-bp range described in example 2. The complete list of landing pads can be found in the landing pad table and is shown in FIG. 32B. To create these cell lines, DNA for the different landing pads was synthesized and cloned downstream of a GFP reporter in a lentiviral gene expression vector (FIG. 32A). The landing pad lentiviral vectors were verified by Sanger sequencing of the landing pad. To generate lentiviral particles for transduction, 9 µg of sequence-verified plasmid and 9 µg of lentiviral packaging mixture (Biosettia) were transfected into the lentiviral packaging cell line LentiX-293T (Takara Bio Inc.). Transfected cells were incubated at 37°C and 5% CO₂ for 48 hours (with one medium change at 24 hours), and the medium containing the viral particles was collected. The collected medium was filtered through a 0.2 µm filter to remove cell debris in preparation for transduction of U2OS cells. The virus-containing filtrate was diluted in DMEM and mixed with polybrene to prepare a dilution series for cell transduction, with a final polybrene concentration of 8 µg/mL. Recipient U2OS cells were grown in the virus-containing medium for 48 hours, then split and grown to confluence in fresh medium. 
Transduction efficiencies of different dilutions of virus were measured by GFP expression via flow cytometry and ddPCR to determine the average copy number of the integrated lentiviral landing pad.

The ability of the R2Tg retrotransposase to exhibit endonuclease activity at target sites with various mutations in and around the R2 cleavage site was determined using an indel characterization assay (example 31). Specifically, 73ng of Gene Writer polypeptide expression vector (which contains R2Tg reverse transcriptase) was nucleofected into the different U2OS landing pad cell lines using the Lonza Amaxa Nucleofector well shuttle system with nucleofection program DN100 according to the manufacturer's instructions. Following nucleofection, cells were cultured at 37℃ and 5% CO2 for 3 days, followed by cell lysis and genomic DNA extraction. Landing pad sequence-specific primers (5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTgctcacacaggaaacagctatg-3' and 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTggatgtgctgcaaggcgatt-3') were used to amplify this region, and the purified amplicons were sequenced on an Illumina MiSeq. Signatures of endonuclease activity at the target site were analyzed by detecting insertions and deletions at the landing pad using the CRISPResso2 pipeline (Clement et al., Nat Biotechnol 37(3):224-226 (2019)). As shown in example 31, R2Tg endonuclease activity occurred at the GG cleavage site (fig. 31A and B). Using this series of landing pads, it was found that the minimal sequence important for the endonuclease activity of the tested polypeptide included the GG dinucleotide and the AA dinucleotide immediately upstream, which defines a 5'-AAGG-3' motif within the native target sequence that is important for endonuclease activity (FIG. 32B). In some embodiments, it may be desirable to find a native endonuclease-specific motif at a site in the human genome in order to re-target the retrotransposase-based Gene Writer system. 
In some embodiments, naturally occurring AAGG sequences in the genome are used as seeds for retargeting the R2 retrotransposase-based Gene Writing system, wherein the DNA binding domain is mutated or replaced by a heterologous DNA binding domain, such that binding of the Gene Writing polypeptide to the new target site correctly positions the endonuclease domain at the AAGG motif to achieve endonuclease activity.

Table 41: landing pads comprising wild-type target sequences or mutant target sequences are described in example 32. The sequences were integrated into the genome using a lentiviral vector system to achieve stable integration. The sequences in the table are provided in a 5 'to 3' orientation, where underlined text indicates the native rDNA sequence and bold text indicates mutations from the native rDNA sequence.

Table 41: landing pad comprising wild-type target sequence or mutant target sequence


Example 33: determination of sequence specificity for retargeting Gene Writer polypeptides

This example describes the redirection of the polypeptide component of the Gene Writer system from its natural recognition sequence to a new location in the human genome. As described in the present disclosure, a retrotransposase-based system can be directed to recognize new DNA sequences by adding a heterologous DNA binding domain, either alone or in combination with mutagenesis of the endogenous DNA binding domain. Here, zinc fingers capable of targeting the native human AAVS1 site are fused to the Gene Writer polypeptide to recognize the AAVS1 sequence.

To direct the Gene Writer polypeptide comprising the R2Tg reverse transcriptase to a DNA sequence that is different from its native rDNA target sequence, a Zinc Finger (ZF) domain targeting the human AAVS1 site is fused to the N-terminus of the polypeptide. Using a method similar to example 32, the resulting cells included various genomic landing pad compositions, representing different combinations of ZF-recognized AAVS1 sequences and R2 reverse transcriptase-recognized native rDNA target sequences. In general, the library consisted of 460 different landing pads containing AAVS1 sequences and rDNA sequences of different lengths, further diversified by varying the distance and orientation between the two sequences (fig. 33). In this example, all landing pads were designed to include a human AAVS1 genomic sequence comprising an AAVS1 ZF binding site. Furthermore, an rDNA sequence comprising a minimal AAGG sequence important for R2Tg endonuclease activity (see example 32) was added to the AAVS1 sequence according to the following parameters: (1) The rDNA sequence includes 12, 22, 32, 42, 52, 62, or 72nt of the rDNA sequence immediately 3' to the AAGG tetranucleotide; or (2) the rDNA sequence comprises 12, 22, 32, or 42nt immediately 3 'and 5' to the AAGG cleavage site, resulting in a total rDNA length of 24, 44, 64, or 84nt. Different rDNA sequence compositions were further placed at different distances from the AAVS1 ZF binding site, including distances of 5, 10, 15, 20, 25, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, or 85 nt. Design considerations include keeping the overall length of the landing pad sites between the assay primers constant to prevent bias during PCR amplification, e.g., longer rDNA sequences cannot be placed at the same distance from AAVS1 sites as shorter rDNA sequences. 
As a final variation, the orientation between the two sites was altered, so that the rDNA sequence compositions described above were placed upstream or downstream of the AAVS1 site in either a forward or reverse orientation. Using these parameters, a library containing 454 AAVS1-rDNA hybrid landing pads was designed, with controls that included various combinations of full-length rDNA sequences and AAVS1 sequences (positive) or no rDNA sequence (negative). An illustrative representation of the landing pad design strategy is shown in fig. 33, and a short list of exemplary landing pad sequences is provided in table 42 to demonstrate particular compositions.
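The design space described above can be sketched as a simple enumeration. This is only an illustration: the rule used to keep the overall landing pad length constant is not given in the text, so the `max_span` filter below is a hypothetical stand-in, and the function and field names are assumptions. Without any filter, the grid contains 11 rDNA compositions × 16 distances × 4 placements = 704 combinations, which is consistent with a length constraint removing some of them to reach the 454 hybrid designs reported.

```python
from itertools import product

# Parameters taken from the text of this example.
THREE_PRIME_FLANKS = [12, 22, 32, 42, 52, 62, 72]   # nt 3' of AAGG only
SYMMETRIC_FLANKS = [12, 22, 32, 42]                 # nt on each side of AAGG
DISTANCES = [5, 10, 15, 20, 25, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
POSITIONS = ["upstream", "downstream"]              # rDNA relative to AAVS1
ORIENTATIONS = ["forward", "reverse"]


def rdna_compositions():
    """Yield (kind, total rDNA length in nt) for each rDNA composition."""
    for flank in THREE_PRIME_FLANKS:
        yield ("3prime_only", flank)
    for flank in SYMMETRIC_FLANKS:
        yield ("symmetric", 2 * flank)  # 24, 44, 64, 84 nt as in the text


def enumerate_designs(max_span=None):
    """List all parameter combinations.

    max_span is a HYPOTHETICAL limit (nt) on distance + rDNA length that
    stands in for the unspecified constant-length constraint.
    """
    designs = []
    for (kind, rdna_nt), distance, position, orientation in product(
        rdna_compositions(), DISTANCES, POSITIONS, ORIENTATIONS
    ):
        if max_span is not None and distance + rdna_nt > max_span:
            continue  # longer rDNA cannot sit as far from the AAVS1 site
        designs.append(
            {
                "kind": kind,
                "rdna_nt": rdna_nt,
                "distance_nt": distance,
                "position": position,
                "orientation": orientation,
            }
        )
    return designs
```

Calling `enumerate_designs()` returns the full grid, while passing a `max_span` value shows how a length budget prunes combinations of long rDNA sequences and large distances.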

The landing pad library described above was synthesized with 3' barcodes for sequence analysis and cloned into a lentiviral gene expression vector. The lentiviral system was then used to generate U2OS cell lines with integrated landing pads, as described in example 32. To verify successful library generation, the pool of U2OS landing pad cell lines was analyzed for landing pad representation. Primers specific for conserved landing pad sequences (see example 32) were used to amplify across the target region, including the construct-specific barcodes. The barcodes in each landing pad were computationally demultiplexed and categorized, with approximately 94% of the landing pads represented by at least 10,000 reads (fig. 34).
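The representation check described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes the barcode sits at the very 3' end of each read and is matched exactly, and the function names and the barcode-to-landing-pad mapping are hypothetical.

```python
from collections import Counter


def count_barcodes(reads, barcode_to_pad, barcode_len):
    """Count reads per landing pad by exact match of the 3' barcode.

    Assumes (for illustration) that the last `barcode_len` bases of each
    read are the barcode. Reads with unknown barcodes are ignored.
    """
    # Start every known landing pad at zero so dropouts stay visible.
    counts = Counter({pad: 0 for pad in barcode_to_pad.values()})
    for read in reads:
        pad = barcode_to_pad.get(read[-barcode_len:])
        if pad is not None:
            counts[pad] += 1
    return counts


def fraction_represented(counts, min_reads):
    """Fraction of landing pads with at least `min_reads` reads."""
    if not counts:
        return 0.0
    covered = sum(1 for n in counts.values() if n >= min_reads)
    return covered / len(counts)
```

With real data, `min_reads` would be set to the read threshold of interest, and the returned fraction corresponds to the share of landing pads considered represented in the pool.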

After validation of the library, the U2OS landing pad cells were used to determine the minimal sequence determinants for re-targeting the R2Tg reverse transcriptase-based Gene Writer polypeptide. Two re-targeting constructs were generated by fusing the coding sequence of ZF-AAVS1 to full-length R2Tg (SEQ ID NO: 1672) (ZF-R2Tg, FIG. 35A) or to R2Tg with a truncated DNA binding domain (SEQ ID NO: 1663) (ZF-R2Tg (no DBD), FIG. 36A), and the corresponding expression plasmids were electroporated into the pooled U2OS landing pad cells. Specifically, 400ng of ZF-R2Tg or ZF-R2Tg (no DBD) was delivered to cells using the Lonza Amaxa Nucleofector well shuttle system with nucleofection program DN100 according to the manufacturer's instructions. After nucleofection, cells were cultured at 37℃ and 5% CO₂ for 3 days, followed by cell lysis and genomic DNA extraction. As in the library validation above, landing pad sequence-specific primers were used to amplify the target region for amplicon sequencing on an Illumina MiSeq. Sequencing reads of the landing pad variants were first demultiplexed using the associated barcodes, and signatures of endonuclease activity at the target site were then analyzed using the CRISPResso2 pipeline (Clement et al., Nat Biotechnol 37(3):224-226 (2019)).

Given that the ZF-R2Tg construct comprises the full-length R2Tg protein, the polypeptide was expected to retain the ability to recognize its native target sequence. The indel frequency at the GG target position was calculated for each landing pad and plotted (fig. 35A). The positive control landing pad containing the 200nt rDNA sequence was found to contain an indel signature at the GG cleavage site, as shown in example 31. None of the negative control landing pads lacking rDNA sequence contained any indels. Here, indel signatures resulting from ZF-R2Tg endonuclease activity at the target cleavage site were detected in landing pads containing 44, 64, and 84nt of native rDNA target sequence, but not in landing pads containing only 24nt of native rDNA target sequence (FIGS. 35A and B). The rDNA sequence lengths that tested positive did so in landing pads at various distances from the AAVS1 sequence.

Next, the ZF-R2Tg (no DBD) construct (which lacks the endogenous DNA binding domain and is predicted to rely on the ZF for target binding and activity) was similarly evaluated. The indel frequency at the GG target position was calculated for each landing pad and plotted (fig. 36A). None of the negative control landing pads lacking rDNA sequence contained any indels. Because the R2Tg protein in this construct contained a significant deletion, there was no positive control landing pad configuration for validating activity. In this experiment, two different landing pad configurations showed indel signatures at the GG target position. Both hits contained the same 44nt rDNA sequence but were positioned differently relative to the AAVS1 site, with one rDNA target located 55nt upstream of the AAVS1 site and the other located 20nt downstream of the AAVS1 site in reverse complement orientation (fig. 36A and B). Despite their different compositions, these two hits indicate that the activity of the R2 retrotransposase DNA binding domain mutant is restored by compensating for the deleted endogenous domain with a heterologous DNA binding domain fusion known to target a native locus in the human genome. In addition, this example establishes a method for further refining the requirements for re-targeting a Gene Writer polypeptide to alternative sequences in the human genome.

Table 42 provides a selection of exemplary landing pad target sequences designed in this example to test Gene Writer polypeptides comprising AAVS1 zinc fingers fused to the R2Tg retrotransposase. Included are 1) a positive control comprising the AAVS1 zinc finger recognition sequence and a complete 200nt region of rDNA centered at the R2 cleavage site; 2) a negative control lacking rDNA sequence; 3) a 44nt rDNA sequence located upstream of the AAVS1 sequence; and 4) a 44nt rDNA sequence located downstream of the AAVS1 sequence and in the opposite orientation. The two experimental landing pads included here show that the AAGG core sequence can be cleaved using a zinc finger fused to a mutant R2Tg lacking its endogenous N-terminal DNA binding domain, as described in this example. The ribosomal DNA sequence is indicated in underlined text, the AAGG core is indicated in brackets, and the binding sequence of the AAVS1 zinc finger is indicated in lowercase text.

Table 42: Selection of exemplary landing pad target sequences designed for testing Gene Writer polypeptides comprising AAVS1 zinc fingers fused to R2Tg retrotransposase

Example 34: selection of lipid reagents with reduced aldehyde content

In this example, lipids are selected for downstream use of a lipid nanoparticle formulation containing one or more Gene Writing component nucleic acids, and the lipids are selected based at least in part on the absence or low level of contaminating aldehydes. Reactive aldehyde groups in the lipid reagent can cause chemical modification of one or more component nucleic acids (e.g., RNA, e.g., template RNA) during LNP formulation. Thus, in some embodiments, the aldehyde content of the lipid reagent is minimized.

Liquid chromatography (LC) in combination with tandem mass spectrometry (MS/MS) can be used to separate, characterize, and quantify the aldehyde content of a reagent, for example, as described in Zurek et al., The Analyst 124(9):1291-1295 (1999), which is incorporated herein by reference. Here, each lipid reagent was subjected to LC-MS/MS analysis. The LC-MS/MS method first separates the lipid and one or more impurities using a C8 HPLC column, and then detects and structurally characterizes these molecules using a mass spectrometer. If an aldehyde is present in the lipid reagent, it is quantified using a stable isotope labeled (SIL) standard that is structurally identical to the aldehyde but heavier due to C13 and N15 labeling. An appropriate amount of SIL standard was spiked into the lipid reagent. The mixture was then subjected to LC-MS/MS analysis. The amount of contaminating aldehyde is determined by multiplying the amount of SIL standard by the peak ratio (unknown/SIL). Any aldehydes identified in one or more of the lipid reagents are quantified as described. In some embodiments, the lipid raw materials selected for the LNP formulation are found not to contain any contaminating aldehyde content above the selected level. In some embodiments, one or more, and optionally all, of the lipid reagents used in the formulation comprise less than 3% total aldehyde content. In some embodiments, one or more, and optionally all, of the lipid reagents used in the formulation comprise less than 0.3% of any single aldehyde species. In some embodiments, one or more, and optionally all, of the lipid reagents used in the formulation comprise less than 0.3% of any single aldehyde species and less than 3% of total aldehyde content.
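The SIL-based quantification and the aldehyde limits stated above can be written out as a short calculation. This is a sketch under assumptions: the text does not state the basis for the percentages, so the code assumes aldehyde and lipid amounts are expressed in the same units, and the function names are illustrative only.

```python
def quantify_by_sil(sil_amount, analyte_peak_area, sil_peak_area):
    """Amount of analyte = amount of SIL standard x (unknown / SIL) peak ratio."""
    if sil_peak_area <= 0:
        raise ValueError("SIL peak area must be positive")
    return sil_amount * (analyte_peak_area / sil_peak_area)


def passes_aldehyde_limits(aldehyde_amounts, lipid_amount,
                           max_single_pct=0.3, max_total_pct=3.0):
    """Check a lipid reagent against the single-species and total limits.

    aldehyde_amounts maps aldehyde name -> quantified amount, in the same
    units as lipid_amount (an assumption; the basis is not given in the text).
    """
    if lipid_amount <= 0:
        raise ValueError("lipid amount must be positive")
    percents = [100.0 * amount / lipid_amount
                for amount in aldehyde_amounts.values()]
    single_ok = all(p < max_single_pct for p in percents)
    total_ok = sum(percents) < max_total_pct
    return single_ok and total_ok
```

For example, a reagent with two aldehyde species at 0.2% and 0.1% would pass both limits, whereas a single species at 0.5% would fail the 0.3% single-species limit.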

Example 35: quantification of RNA modification by aldehyde during formulation

In this example, the RNA molecule is analyzed after formulation to determine the extent of any modification that may occur during formulation, e.g., to detect chemical modification caused by aldehyde contamination of the lipid reagent (see, e.g., example 34).

RNA modification can be detected by analysis of ribonucleosides, for example, according to the method of Su et al., Nature Protocols 9:828-841 (2014), which is incorporated herein by reference in its entirety. In this method, RNA is digested into a mixture of ribonucleosides, which is then subjected to LC-MS/MS analysis. After formulation, the RNA was contained in the LNP and first had to be separated from the lipids by co-precipitation with GlycoBlue in 80% isopropanol. After centrifugation, the pellet containing the RNA was carefully transferred to a new Eppendorf tube, to which an enzyme mixture (benzonase, phosphodiesterase type 1, phosphatase) was added to digest the RNA into nucleosides. The Eppendorf tube was placed on a ThermoMixer preheated to 37℃ for 1 hour. The resulting nucleoside mixture was analyzed directly by an LC-MS/MS method, which first separated the nucleosides and modified nucleosides using a C18 column and then detected them by mass spectrometry.

If one or more aldehydes in the lipid reagent cause chemical modification, the data analysis will associate the one or more modified nucleosides with the one or more aldehydes. Modified nucleosides can be quantified using a SIL standard, which is structurally identical to the natural nucleoside except that it is heavier due to C13 and N15 labeling. An appropriate amount of SIL standard was spiked into the nucleoside digest, which was then subjected to LC-MS/MS analysis. The amount of modified nucleoside was obtained by multiplying the amount of SIL standard by the peak ratio (unknown/SIL). LC-MS/MS is capable of quantifying all target molecules simultaneously.

In some embodiments, the use of a lipid reagent with a higher impurity aldehyde content results in a higher level of RNA modification than the use of a higher purity lipid reagent as a material in the lipid nanoparticle formulation process. Thus, in preferred embodiments, higher purity lipid reagents are used, which result in RNA modification below acceptable levels.

Example 36: Gene Writer™ enables large insertions in genomic DNA

This example describes the use of the Gene Writer™ gene editing system to alter a genomic sequence by inserting a large string of nucleotides.

In this example, the Gene Writer™ polypeptide, gRNA, and writing template are provided as DNA transfected into HEK293T cells. The Gene Writer™ polypeptide uses a Cas9 nickase for DNA binding and endonuclease function. Reverse transcriptase function is derived from the highly processive RT domain of an R2 retrotransposase. The writing template is designed to have homology to the target sequence while incorporating a genetic payload at the desired location, such that reverse transcription of the template RNA results in the production of a new DNA strand containing the desired insertion.

To generate large insertions in human HEK293T cell DNA, the Gene Writer™ polypeptide is used in conjunction with a specific gRNA (which targets the Cas9-containing Gene Writer™ to the target locus) and a template RNA for reverse transcription comprising an RT binding motif (3' UTR from an R2 element) for association with the reverse transcriptase, a target site homology region for priming reverse transcription, and a genetic payload (GFP expression unit). The complex nicks the target site and then performs TPRT on the template, priming the reaction by using a priming region on the template that is complementary to the sequence immediately adjacent to the nick site, and copying the GFP payload into the genomic DNA.

Following transfection, cells were incubated for three days to allow expression of the Gene Writing™ system and conversion of the genomic DNA target. After the incubation period, genomic DNA was extracted from the cells. The genomic DNA was then subjected to PCR-based amplification using site-specific primers, and the amplicons were sequenced on an Illumina MiSeq according to the manufacturer's protocol. Sequence analysis was then performed to determine the frequency of reads containing the desired edit.

Example 37: Gene Writer™ enables large insertions in genomic DNA

Example 38: preparation of lipid nanoparticles encapsulating firefly luciferase mRNA

In this example, reporter mRNA encoding firefly luciferase is formulated into lipid nanoparticles comprising different ionizable lipids. The lipid nanoparticle (LNP) components (ionizable lipid, helper lipid, sterol, PEG) were dissolved in 100% ethanol. LNPs containing the ionizable lipids LIPIDV004 or LIPIDV005 (Table A1) were prepared using LIPIDV004 or LIPIDV005, DSPC, cholesterol, and DMG-PEG 2000 in a molar ratio of 50:10:38.5:1.5, respectively. Firefly luciferase mRNA-LNPs containing the ionizable lipid LIPIDV003 (Table A1) were prepared using LIPIDV003, DSPC, cholesterol, and DMG-PEG 2000 at a molar ratio of 45:9:44:2, respectively. The firefly luciferase mRNA used in these formulations was produced by in vitro transcription and encoded the firefly luciferase protein, further comprising a 5' cap, 5' and 3' UTRs, and a poly A tail. The mRNA was synthesized under standard conditions for in vitro transcription by T7 RNA polymerase with co-transcriptional capping, except that the nucleotide UTP was 100% replaced by N1-methyl-pseudouridine triphosphate in the reaction. The purified mRNA was dissolved in 25mM sodium citrate, pH 4, at a concentration of 0.1mg/mL.

Firefly luciferase mRNA was formulated into LNPs at a molar ratio of lipid amine to RNA phosphate (N:P) of 6. The LNPs were formed by microfluidic mixing of the lipid and RNA solutions using a Precision Nanosystems NanoAssemblr™ Benchtop instrument with the manufacturer's recommended settings. A 3:1 ratio of aqueous to organic solvent was maintained during mixing by using different flow rates. After mixing, the LNPs were collected and dialyzed overnight at 4℃ in 15mM Tris, 5% sucrose buffer. The firefly luciferase mRNA-LNP formulations were concentrated by centrifugation using Amicon 10kDa centrifugal filters (Millipore). The resulting mixture was then filtered using a 0.2 μm sterile filter. The final LNPs were stored at -80℃ until further use.
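The N:P ratio, lipid molar ratios, and 3:1 solvent ratio stated above imply some simple arithmetic, sketched here for illustration only. The average nucleotide mass of about 330 g/mol, the assumption of one ionizable amine per lipid molecule, and all function and key names are assumptions that do not come from the text.

```python
AVG_NT_MASS_G_PER_MOL = 330.0  # rough average mass per nucleotide (assumption)


def lipid_nmol_for_np(mrna_ug, np_ratio, mol_percents, amines_per_lipid=1):
    """Estimate nmol of each lipid for a given mRNA mass and N:P ratio.

    mol_percents maps component name -> mol %, and must contain the key
    "ionizable". One phosphate per nucleotide is assumed.
    """
    phosphate_nmol = mrna_ug * 1000.0 / AVG_NT_MASS_G_PER_MOL  # ug -> nmol
    ionizable_nmol = np_ratio * phosphate_nmol / amines_per_lipid
    scale = ionizable_nmol / mol_percents["ionizable"]
    return {name: pct * scale for name, pct in mol_percents.items()}


def split_flow(total_flow, aqueous_parts=3.0, organic_parts=1.0):
    """Split a total flow rate into aqueous and organic streams (3:1 default)."""
    parts = aqueous_parts + organic_parts
    return (total_flow * aqueous_parts / parts,
            total_flow * organic_parts / parts)
```

Under these assumptions, 33 μg of mRNA corresponds to about 100 nmol of phosphate, so an N:P of 6 with the 50:10:38.5:1.5 composition would call for roughly 600 nmol of ionizable lipid, 120 nmol of DSPC, 462 nmol of cholesterol, and 18 nmol of DMG-PEG 2000.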

Table A1: the ionizable lipids used in example 37

The prepared LNPs were analyzed for size, uniformity, and % RNA encapsulation. Size and uniformity measurements were made by dynamic light scattering (DLS) using a Malvern Zetasizer DLS instrument (Malvern Panalytical). LNPs were diluted in PBS prior to DLS measurement to determine the mean particle size (nanometers, nm) and polydispersity index (pdi). The particle sizes of the firefly luciferase mRNA-LNPs are shown in Table A2.

Table A2: LNP particle size and uniformity

LNP ID	Ionizable lipid	Particle size (nm)	pdi
LNPV019-002	LIPIDV005	77	0.04
LNPV006-006	LIPIDV004	71	0.08
LNPV011-003	LIPIDV003	87	0.08

The percent encapsulation of luciferase mRNA was measured by fluorescence-based RNA quantification with the RiboGreen assay (Thermo Fisher Scientific). LNP samples were diluted in 1× TE buffer, mixed with RiboGreen reagent as recommended by the manufacturer, and measured on an i3 SpectraMax spectrophotometer (Molecular Devices) using 644nm excitation and 673nm emission wavelengths. To determine the percent encapsulation, LNPs were measured with the RiboGreen assay both as intact LNPs and as disrupted LNPs, where the particles were incubated with 1× TE buffer containing 0.2% (w/w) Triton-X100 to disrupt them and allow the encapsulated RNA to interact with the RiboGreen reagent. The disrupted samples were again measured on the i3 SpectraMax spectrophotometer to determine the total amount of RNA. The amount of RNA detected when the LNPs are intact is subtracted from the total amount of RNA, and the difference is divided by the total amount of RNA to determine the fraction encapsulated. This value is multiplied by 100 to determine the percent encapsulation. The firefly luciferase mRNA-LNPs and their percent RNA encapsulation measured by RiboGreen are reported in Table A3.
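The encapsulation calculation can be stated compactly. The sketch below reads the intact-LNP signal as unencapsulated (accessible) RNA and the Triton-disrupted signal as total RNA, which is the usual interpretation of this type of assay; the function name and the units are illustrative assumptions.

```python
def percent_encapsulation(intact_rna, total_rna):
    """Percent of RNA that is encapsulated.

    intact_rna: RNA detected with intact LNPs (taken as unencapsulated RNA)
    total_rna:  RNA detected after disruption with Triton X-100
    Both values must be in the same units (e.g. ng/mL from a standard curve).
    """
    if total_rna <= 0:
        raise ValueError("total RNA must be positive")
    fraction = (total_rna - intact_rna) / total_rna
    return 100.0 * fraction
```

For instance, if 2 units of RNA are detected with intact particles and 100 units after disruption, the function returns 98.0.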

Table A3: RNA encapsulation after LNP formulation

LNP ID	Ionizable lipid	% mRNA encapsulation
LNPV019-002	LIPIDV005	98
LNPV006-006	LIPIDV004	92
LNPV011-003	LIPIDV003	97

Example 39: in vitro Activity test of mRNA-LNP in Primary hepatocytes

In this example, LNPs comprising luciferase reporter mRNA are used to deliver RNA cargo into cultured cells. Primary mouse or primary human hepatocytes were thawed and seeded into collagen-coated 96-well tissue culture plates at a density of 30,000 or 50,000 cells per well, respectively. Cells were plated in 1× William's Medium E without phenol red and incubated at 37℃ and 5% CO₂. After 4 hours, the medium was changed to maintenance medium (1× William's Medium E without phenol red, hepatocyte maintenance supplement pack (Thermo Fisher Scientific)), and the cells were incubated overnight at 37℃ and 5% CO₂. Firefly luciferase mRNA-LNPs were thawed at 4℃ and gently mixed. The LNPs were diluted to the appropriate concentration in maintenance medium containing 7.5% fetal bovine serum. The LNPs were incubated at 37℃ for 5 minutes and then added to the plated primary hepatocytes. To assess delivery of RNA cargo to cells, LNPs were incubated with the primary hepatocytes for 24 hours, and the cells were then harvested and lysed for luciferase activity assays. Briefly, the medium was aspirated from each well, and the cells were washed with 1× PBS. The PBS was aspirated from each well, 200 μl of Passive Lysis Buffer (PLB) (Promega) was added to each well, and the plate was placed on a plate shaker for 10 minutes. Lysed cells in PLB were frozen and stored at -80℃ until the luciferase activity assay was performed.

For luciferase activity assays, cell lysates in passive lysis buffer were thawed, transferred to round-bottom 96-well microtiter plates, and centrifuged at 15,000g for 3 min at 4℃ to remove cell debris. The protein concentration of each sample was measured using the Pierce™ BCA Protein Assay Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. Protein concentration was used to normalize for cell number and to determine the appropriate dilution of lysates for the luciferase assay. Luciferase activity assays were performed in white-walled 96-well microtiter plates using luciferase assay reagent (Promega) according to the manufacturer's instructions, and luminescence was measured using an i3X SpectraMax reader (Molecular Devices). The dose response of firefly luciferase activity mediated by the firefly luciferase mRNA-LNPs is shown in fig. 37 and demonstrates successful LNP-mediated delivery of RNA into primary cells in culture. As shown in fig. 37, LNPs formulated according to example 38 were analyzed for delivery of cargo to primary human (A) and mouse (B) hepatocytes according to example 39. The luciferase assay showed dose-responsive luciferase activity in cell lysates, indicating successful delivery of RNA to the cells and expression of firefly luciferase from the mRNA cargo.
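Normalizing luminescence to protein content, as described above, amounts to dividing each reading by the protein mass assayed. The sketch below is one simple way to do this; the units chosen (relative light units, μg/mL, μL) and the function name are assumptions for illustration and are not specified in the text.

```python
def normalized_luciferase(rlu, protein_ug_per_ml, lysate_volume_ul):
    """Return luminescence per microgram of protein in the assayed lysate.

    rlu:               raw luminescence reading (relative light units)
    protein_ug_per_ml: protein concentration from the BCA assay
    lysate_volume_ul:  volume of lysate loaded into the luciferase assay
    """
    protein_ug = protein_ug_per_ml * lysate_volume_ul / 1000.0  # uL -> mL
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return rlu / protein_ug
```

Comparing the normalized values across LNP doses, rather than raw readings, avoids attributing differences in cell number per well to differences in delivery.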

Example 40: LNP-mediated delivery of RNA to mouse liver.

To measure the effectiveness of LNP-mediated delivery of particles containing firefly luciferase mRNA to the liver, LNPs were formulated and characterized as described in example 38 and tested in vitro prior to administration to mice (example 39). C57BL/6 male mice (Charles River Labs), approximately 8 weeks old, were given LNPs at a dose of 1mg/kg by the intravenous (i.v.) route. Vehicle control animals were given 300 μl of phosphate buffered saline intravenously. Mice were injected with 5mg/kg dexamethasone by the intraperitoneal route 30 minutes prior to LNP injection. Tissues were collected at necropsy 6, 24, or 48 hours after LNP administration, with 5 mice per group at each time point. Liver and other tissue samples were collected, snap frozen in liquid nitrogen, and stored at -80℃ until analysis.

Frozen liver samples were pulverized on dry ice and transferred to homogenization tubes containing Lysing Matrix D beads (MP Biomedicals). Ice-cold 1× Luciferase Cell Culture Lysis Reagent (CCLR) (Promega) was added to each tube, and the samples were homogenized in a FastPrep-24 5G homogenizer (MP Biomedicals) at 6m/s for 40 seconds. The samples were transferred to clean microcentrifuge tubes and clarified by centrifugation. Prior to the luciferase activity assay, the protein concentration of each sample was determined using the Pierce™ BCA Protein Assay Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. Luciferase activity was measured with 200 μg (total protein) of liver homogenate using luciferase assay reagent (Promega) according to the manufacturer's instructions and an i3X SpectraMax reader (Molecular Devices). Liver samples showed successful delivery of mRNA for all lipid formulations, with reporter activity ranked LIPIDV005 > LIPIDV004 > LIPIDV003 (fig. 38). As shown in fig. 38, LNPs containing firefly luciferase mRNA were formulated and delivered to mice i.v., and liver samples were collected and assayed for luciferase activity 6, 24, and 48 hours after administration. The reporter activity of the formulations was ranked LIPIDV005 > LIPIDV004 > LIPIDV003. RNA expression was transient, and enzyme levels returned to near vehicle background by 48 hours after administration. The assay validates the use of these ionizable lipids and their respective formulations for delivery of RNA systems to the liver.

Without wishing to be limited by the examples, the lipids and formulations described in the examples support the efficacy of in vivo delivery of RNA molecules other than reporter mRNA. An all-RNA Gene Writing system can be delivered by the formulations described herein. For example, all-RNA systems using a Gene Writer polypeptide mRNA, a template RNA, and optionally a second-nick gRNA are described herein for editing a genome in vitro by nucleofection, using modified nucleotides, by lipofection, and for editing primary T cells. As described herein, these all-RNA systems have many unique advantages in terms of cellular immunogenicity and toxicity, which is important when working with more sensitive primary cells, especially immune cells (e.g., T cells), as opposed to immortalized cell lines. Furthermore, it is contemplated that these all-RNA systems can be targeted to alternative tissues and cell types using the novel lipid delivery systems mentioned herein, e.g., for delivery to liver, lung, muscle, immune cells, etc., given that the function of the Gene Writing system has been validated in a variety of cell types in vitro and that the function of other RNA systems delivered with targeted LNPs is known in the art. In vivo delivery of the Gene Writing system could have a tremendous impact in many therapeutic areas, such as correction of pathogenic mutations, installation of protective variants, and enhancement of endogenous cells of the body, such as T cells. Given an appropriate formulation, all-RNA Gene Writing is contemplated to enable the manufacture of cell-based therapies in situ in the patient.

Example 41: improving expression of Cas-RT fusion proteins by linker selection

This example demonstrates optimization of Cas-RT fusions to improve protein expression in mammalian cells. Constructing novel Cas-RT fusions by simply swapping in new functional domains may result in low or moderate expression of the Gene Writer polypeptide. Thus, it is contemplated herein that a modified configuration of the fusion may be advantageous in the context of different domains. Without wishing to be limited to this example, one approach for improving the expression and stability of a new fusion is the use of linker libraries. Here, the peptide linker sequence between the Cas and RT domains of the Cas-RT fusion is varied using a library of linker sequences. More specifically, the linkers in table 38B are used to generate new variants of a Cas9-RT fusion construct that previously displayed low protein expression, and the variants are delivered to human cells to screen for improved Cas-RT protein expression.

A set of 22 peptide linkers (Table 38B) with varying degrees of length, flexibility, hydrophobicity, and secondary structure was first used to generate variants of the Cas-RT fusion protein by replacing the original linker (SEQ ID NO: 480). HEK293T cells were transfected by electroporation of 250,000 cells/well using approximately 800ng of each Cas9-RT fusion plasmid and 200ng of single guide RNA plasmid. To assess the expression level of the Cas9-RT fusions, cell lysates were collected on day 2 post-transfection and analyzed by western blot using a primary antibody against Cas9. Linker 10 (SEQ ID NO: 468) listed in Table 38B significantly improved Cas-RT fusion expression (FIG. 39), indicating the potentially profound effect of the peptide linker sequence on Cas-RT expression.

TABLE 38B peptide sequences used as a linker between Cas and RT domains in Gene Writer Polypeptides comprising Cas-RT fusions

It should be understood that for all numerical limits describing a certain parameter in this application, such as "about," "at least," "less than," and "greater than," the description also necessarily encompasses any range bounded by the recited values. Thus, for example, the expression "at least 1, 2, 3, 4 or 5" also describes, inter alia, the ranges 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5 and 4-5, etc.

For all patents, applications, or other references cited herein, such as non-patent documents and reference sequence information, it should be understood that they are incorporated by reference in their entirety for all purposes as well as for the proposition recited. If any conflict exists between a document incorporated by reference and the present application, the present application will control. All information related to the reference gene sequences disclosed in the present application, such as GeneID or accession numbers (typically referring to NCBI accession numbers), including, for example, genomic loci, genomic sequences, functional annotations, allelic variants, and reference mRNA (including, for example, exon boundaries or response elements) and protein sequences (such as conserved domain structures), as well as chemical references (such as PubChem compound, PubChem substance, or PubChem bioassay entries, including annotations therein, such as structures and assays), are incorporated herein by reference in their entirety.

The headings used in this application are for convenience only and do not affect the interpretation of this application.

Claims

1. A system for modifying DNA, the system comprising:

2. A system for modifying DNA, the system comprising:

3. A system for modifying DNA, the system comprising:

4. A method of modifying a target DNA strand in a cell, tissue or subject, the method comprising administering to the cell a system, wherein the system comprises: