CN116685682A - Guided editing guide RNAs, compositions thereof, and methods of using the same - Google Patents

Guided editing guide RNAs, compositions thereof, and methods of using the same

Info

Publication number
CN116685682A
Authority
CN
China
Prior art keywords
pegrna
nucleotides
napdnabp
nucleic acid
editing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180078921.8A
Other languages
Chinese (zh)
Inventor
D·R·刘
J·W·纳尔逊
P·B·伦道夫
A·V·安扎隆
S·沈
K·伊夫雷特
P·J·陈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harvard College
Broad Institute Inc
Original Assignee
Harvard College
Broad Institute Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harvard College, Broad Institute Inc
Priority claimed from PCT/US2021/052097 (WO2022067130A2)
Publication of CN116685682A

Abstract

The present disclosure provides modified pegRNAs comprising one or more additional nucleotide structural motifs that increase editing efficiency during guided editing, increase in vivo half-life, and increase lifespan in cells. Modifications include, but are not limited to, an aptamer (e.g., the prequeosine1-1 riboswitch aptamer or "evopreQ1-1") or a variant thereof, a pseudoknot (the MMLV viral genome pseudoknot or "Mpknot-1") or a variant thereof, a tRNA (e.g., the modified tRNA used by MMLV as a primer for reverse transcription) or a variant thereof, or a G-quadruplex or a variant thereof. The present disclosure further provides guided editor complexes comprising the modified pegRNAs and having improved properties and/or performance, including stability, improved cellular lifespan, and improved editing efficiency. The present disclosure also provides methods of editing a genome using a guided editor complex with a modified pegRNA, as well as nucleotide sequences and expression vectors encoding the guided editor and the modified pegRNA, and cells, kits, and pharmaceutical compositions comprising the improved guided editor complex.

Description

Guided editing guide RNAs, compositions thereof, and methods of using the same
RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application U.S.S.N. 63/231,231, filed August 9, 2021, U.S. Provisional Application U.S.S.N. 63/182,633, and U.S. Provisional Application U.S.S.N. 63/083,067. In addition, this application claims the benefit of U.S. Provisional Application U.S.S.N. 63/091,272, filed October 13, 2020. Each of these applications is incorporated herein by reference.
Government support
This invention was made with government support under grant numbers AI142756, HG009490, EB022376, and GM118062 awarded by the National Institutes of Health. The government has certain rights in this invention.
Background
Guided editing (PE) is a nucleic acid editing platform that enables the targeted, programmable installation of defined changes to a nucleotide sequence at a desired locus. It involves targeting a guided editor to a target site in the genome, where the guided editor comprises a nucleic acid programmable DNA binding protein (napDNAbp) fused to a polymerase (e.g., a reverse transcriptase (RT)) and is associated with a guided editing guide RNA (pegRNA). The pegRNA comprises a scaffold (which binds the napDNAbp), a spacer sequence (complementary to the genomic site), and an extension arm at the 3' or 5' end of the pegRNA. The extension arm includes a DNA synthesis template that encodes the desired edit. During guided editing, once the guided editor complexed with the pegRNA is located at the genomic site, the polymerase (e.g., reverse transcriptase) uses the DNA synthesis template to synthesize a new DNA strand containing the desired edit. The new DNA strand then replaces the corresponding endogenous DNA strand at the genomic site, thereby installing the desired, edited nucleotide sequence into the genome at the editing site.
While guided editing has many advantages over other genome editing modes, such as the ease of programming a DNA synthesis template to specify the desired edit, it remains desirable to further enhance the characteristics and performance of guided editing, including, for example, the efficiency of installing desired edits and/or reducing the formation of insertions and deletions (indels).
Modifications to guided editing and/or its components that result in increased editing efficiency and/or increased specificity would significantly advance the field of genome editing.
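The pegRNA layout described above can be sketched as a small data structure. This is a minimal illustration, assuming a 3'-extended arrangement in which the spacer, scaffold, DNA synthesis template, and primer binding site appear in that order from 5' to 3'; the class name `PegRNA`, its field names, and every sequence shown are hypothetical placeholders rather than sequences from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    """Toy model of a 3'-extended pegRNA; all fields are RNA strings written 5'->3'."""

    spacer: str        # complementary to the genomic target site
    scaffold: str      # binds the napDNAbp
    dna_template: str  # DNA synthesis template encoding the desired edit
    pbs: str           # primer binding site

    @property
    def extension_arm(self) -> str:
        # Assumed order for a 3' extension arm: template first, PBS at the 3' end.
        return self.dna_template + self.pbs

    @property
    def sequence(self) -> str:
        # Full-length molecule: spacer, then scaffold, then the extension arm.
        return self.spacer + self.scaffold + self.extension_arm


# Placeholder sequences for illustration only (the scaffold is truncated).
peg = PegRNA(
    spacer="GACUACGGAUCCUAGCAUGC",
    scaffold="GUUUUAGAGCUA",
    dna_template="AUGGCAUCCA",
    pbs="UGCUAGGAUCCGU",
)
print(peg.extension_arm)
print(len(peg.sequence))
```

Keeping the extension arm as a derived property makes it straightforward to model an additional 3' motif or linker later without changing the core fields.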
Summary of The Invention
The present disclosure provides next-generation pegRNAs with improved properties including, but not limited to, increased stability, increased in vivo half-life, and/or improved binding affinity to the napDNAbp and/or target DNA sequences. These improved properties can be achieved in a variety of ways, including but not limited to attaching a three-dimensional RNA structure (such as a stem loop) to the pegRNA to increase its stability, or modifying the pegRNA to reduce the binding affinity of the primer binding site (PBS) of the pegRNA extension arm for the spacer sequence of the pegRNA (e.g., by blocking the PBS with a toehold that dissociates upon napDNAbp binding, providing the 3' extension arm in trans, or introducing chemical and/or genetic modifications to the pegRNA, as described further herein). These modified pegRNAs, when used in conjunction with a guided editor such as a fusion protein comprising a napDNAbp domain (e.g., a Cas9 domain) and a polymerase domain (e.g., a reverse transcriptase domain), result in improved activity and/or efficiency of guided editing. In particular, the inventors have found that pegRNAs may have various drawbacks compared to typical single guide RNAs (sgRNAs), including reduced affinity for the nucleic acid programmable DNA binding protein (e.g., Cas9 nickase), increased sensitivity to degradation (in particular, degradation of the extension arm), and a tendency to be inactivated by unwanted duplex formation between the extension arm (in particular, the primer binding site of the extension arm) and the spacer sequence of the pegRNA, which competes with binding of the pegRNA to the target DNA. Without wishing to be bound by any particular theory, these problems arise because of the extension arm, which is an integral part of the pegRNA but is not present in typical sgRNAs.
To overcome these drawbacks, the inventors have found that pegRNAs can be modified in one or more ways to improve their overall stability and/or performance in guided editing.
First, the inventors have found that the addition of one or more RNA structural motifs to the pegRNA can prevent degradation of the pegRNA. Such RNA structural motifs may include, but are not limited to, the prequeosine1-1 riboswitch aptamer (evopreQ1) and variants thereof, the frameshifting pseudoknot (hereinafter "mpknot") from Moloney Murine Leukemia Virus (MMLV) 22 and variants thereof, G-quadruplexes, hairpin structures (e.g., a 15-bp hairpin), and the P4-P6 domain of the group I intron.
Second, the inventors have discovered various ways to reduce duplex formation between the primer binding site (PBS) of the extension arm and the spacer sequence of the pegRNA (i.e., to reduce PBS/spacer binding interactions). In one embodiment, PBS/spacer binding interactions are avoided by stabilizing the 3' extension arm, including, but not limited to, by (i) blocking the PBS with a toehold that dissociates upon napDNAbp (e.g., Cas9 nickase) binding, (ii) providing the 3' extension arm in trans, i.e., moving the 3' extension arm or a portion thereof (e.g., the PBS, or the PBS and the DNA synthesis template) from the pegRNA to another molecule, e.g., a nick-generating gRNA, and (iii) introducing chemical and/or genetic modifications to the pegRNA that favor RNA/DNA duplex formation but disfavor RNA/RNA duplex formation, thereby promoting the desired interaction between the PBS of the pegRNA and the target DNA.
In general, the modified pegRNAs disclosed herein that result from implementing these strategies are referred to herein as "engineered" pegRNAs or "epegRNAs."
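The PBS/spacer interaction described above follows from how a PBS is typically chosen: it is complementary to the target strand segment that shares its sequence with the spacer, so the PBS can also pair with the spacer itself. The sketch below illustrates this under stated assumptions: an SpCas9-style nick after protospacer position 17, Watson-Crick pairing only (no G-U wobble), and a placeholder spacer. The function names `design_pbs` and `longest_duplex` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def design_pbs(spacer: str, pbs_length: int, nick_index: int = 17) -> str:
    """Return a PBS complementary to the spacer-matching bases just 5' of the nick.

    nick_index=17 is an assumption reflecting an SpCas9-style nick position.
    """
    if not 0 < pbs_length <= nick_index <= len(spacer):
        raise ValueError("pbs_length and nick_index must fit within the spacer")
    return reverse_complement(spacer[nick_index - pbs_length:nick_index])


def longest_duplex(pbs: str, spacer: str) -> int:
    """Length of the longest contiguous PBS stretch that can pair with the spacer."""
    pairing_partner = reverse_complement(spacer)
    best = 0
    for start in range(len(pbs)):
        for end in range(start + 1, len(pbs) + 1):
            if pbs[start:end] in pairing_partner:
                best = max(best, end - start)
    return best


spacer = "GACUACGGAUCCUAGCAUGC"  # placeholder sequence
pbs = design_pbs(spacer, pbs_length=13)
print(pbs, longest_duplex(pbs, spacer))
```

In this toy case the entire PBS can pair with the spacer, which is the intramolecular duplex that the toehold, trans-arm, and modified-nucleotide strategies aim to disfavor.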
In another aspect of the disclosure, the inventors have developed a novel computational algorithm, which may be embodied in software, for identifying one or more nucleotide linkers for coupling a guided editing guide RNA (pegRNA) to a nucleic acid portion, such as, but not limited to, an aptamer (e.g., the prequeosine1-1 riboswitch aptamer or "evopreQ1-1") or a variant thereof, a pseudoknot (the MMLV viral genome pseudoknot or "Mpknot-1") or a variant thereof, a tRNA (e.g., the modified tRNA used by MMLV as a reverse transcription primer) or a variant thereof, or a G-quadruplex or a variant thereof, to form or produce an engineered pegRNA. This computational technique, which may be referred to herein as the pegRNA linker identification tool ("pegLIT"), involves efficiently assessing nucleic acid linker candidates to identify those with a lower propensity for base pairing with other regions of the pegRNA (e.g., regions comprising the primer binding site, the spacer, the DNA synthesis template, and/or the gRNA core).
Furthermore, the present disclosure provides nucleic acid molecules encoding and/or expressing epegRNAs, as well as expression vectors and constructs for expressing the epegRNAs described herein, host cells comprising the nucleic acid molecules and expression vectors, and compositions for delivering and/or administering the epegRNAs in conjunction with the guided editing systems described herein. Furthermore, the present disclosure provides isolated epegRNAs, as well as compositions comprising the epegRNAs described herein. Still further, the present disclosure provides a guided editor system comprising (a) a guided editor (e.g., a complex or fusion protein comprising a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase or other RNA-dependent DNA polymerase) and (b) an epegRNA as disclosed herein. Still further, the present disclosure provides methods of making the epegRNAs disclosed herein, and methods of using the epegRNAs in guided editing to introduce one or more changes into a target nucleic acid molecule (e.g., a genome) with improved efficiency compared to the use of a guided editor with pegRNAs that are not modified in the manner described herein. The present specification also provides methods of efficiently editing a target nucleic acid molecule (e.g., a single nucleobase of a genome) using the guided editing system described herein (e.g., in the form of a guided editor, or a vector or construct encoding the same, and an epegRNA described herein) or any of the previously described guided editing systems. Still further, the present specification provides therapeutic methods for treating a genetic disease and/or for altering or changing a genetic trait or condition by contacting a target nucleic acid molecule (e.g., a genome) with a guided editing system that uses an epegRNA as described herein.
In particular embodiments, it has surprisingly been found that attaching a nucleotide structural motif to the end of the extension arm of a pegRNA, including but not limited to an aptamer (e.g., the prequeosine1-1 riboswitch aptamer or "evopreQ1-1") or a variant thereof, a pseudoknot (the MMLV viral genome pseudoknot or "Mpknot-1") or a variant thereof, a tRNA (e.g., the modified tRNA used by MMLV as a reverse transcription primer) or a variant thereof, or a G-quadruplex or a variant thereof, achieves a consistent increase in editing efficiency. Thus, the present disclosure provides modified pegRNAs comprising one or more additional nucleotide structural motifs that increase the editing efficiency of a guided editor when complexed with the guided editor. Furthermore, the present disclosure provides a guided editing complex comprising a guided editor complexed with an engineered pegRNA disclosed herein, as well as nucleotide sequences and expression vectors encoding the modified pegRNA, which optionally may also encode a guided editor on the same or a different vector molecule. Still further, the present disclosure provides a guided editing-based genome editing method that involves using a guided editor associated with a modified pegRNA disclosed herein to install a desired nucleotide sequence change at a desired site in a nucleic acid, characterized by higher editing efficiency than guided editing using unmodified pegRNAs (i.e., those not modified in the manner described herein). The present disclosure also provides cells and kits comprising the disclosed modified pegRNAs, or guided editing complexes comprising the modified pegRNAs. The present disclosure also provides methods of making the disclosed modified pegRNAs comprising coupling one or more nucleotide structural motifs (e.g., aptamers, G-quadruplexes, tRNAs, or pseudoknots) to the end of the extension arm of the pegRNA, optionally through a nucleotide linker.
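The idea of screening linker candidates for a low propensity to base-pair with the rest of the pegRNA can be illustrated with a deliberately crude heuristic. The sketch below is not pegLIT and does not reproduce its scoring; it swaps in a simple count of linker k-mers whose reverse complement occurs in another pegRNA region, ignores G-U wobble pairs and thermodynamics, and uses hypothetical function names (`pairing_score`, `rank_linkers`) and placeholder sequences.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def pairing_score(linker: str, region: str, k: int = 3) -> int:
    """Count linker k-mers whose reverse complement appears in the region.

    A higher count is taken as a rough proxy for a higher chance of
    unwanted linker/region base pairing.
    """
    return sum(
        reverse_complement(linker[i:i + k]) in region
        for i in range(len(linker) - k + 1)
    )


def rank_linkers(candidates, regions, k: int = 3):
    """Return (score, linker) pairs sorted from least to most predicted pairing."""
    scored = [
        (sum(pairing_score(linker, seq, k) for seq in regions.values()), linker)
        for linker in candidates
    ]
    return sorted(scored)


# Placeholder pegRNA regions and candidate 8-nt linkers, for illustration only.
regions = {
    "spacer": "GACUACGGAUCCUAGCAUGC",
    "dna_template": "AUGGCAUCCA",
    "pbs": "UGCUAGGAUCCGU",
}
candidates = ["GCAUGCUA", "AAAAAAAA", "CUAGGAUC"]
for score, linker in rank_linkers(candidates, regions):
    print(score, linker)
```

A production tool would more plausibly score candidates with a secondary-structure prediction model; the k-mer count is used here only because it is short, dependency-free, and easy to verify by hand.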
The present disclosure also provides methods of delivering modified pegRNA and optionally a guided editor to a target cell for genome editing at a desired target site, and methods of treating a genetic disorder using the modified pegRNA disclosed in conjunction with the guided editing.
The guided editing process may introduce into the nucleic acid (e.g., genome) at least one or more of the following genetic changes: transversions, transitions, deletions and insertions. Furthermore, guided editing may also be implemented for a particular application. For example, guided editing can be used to (a) install mutation correction changes to nucleotide sequences, (b) install protein and RNA tags, (c) install immune epitopes on a protein of interest, (d) install dimerization domains in a protein, (e) install or remove sequences that alter the activity of biomolecules, (f) install recombinase target sites to direct specific genetic changes, and (g) mutagenize target sequences by using error-prone RT, among other purposes. Also, using the modified pegRNA described herein, these guided editing applications can be performed with high efficiency and/or reduced indels.
In a first aspect, the present disclosure provides a pegRNA for use in guided editing comprising a guide RNA and at least one nucleic acid extension arm comprising a DNA synthesis template and a primer binding site, wherein the extension arm comprises a nucleic acid moiety attached thereto, the nucleic acid moiety selected from the group consisting of a toeloop, a hairpin, a stem loop, a pseudoknot, an aptamer, a G-quadruplex, a tRNA, a riboswitch, and a ribozyme. In certain embodiments, the nucleic acid moiety is attached to the 3' end of the pegRNA extension arm. In other embodiments, the nucleic acid moiety is attached to the 5' end of the pegRNA extension arm.
In various embodiments, the nucleic acid portion is an Mpknot1 portion having a nucleotide sequence selected from the group consisting of: SEQ ID NO:195 (Mpknot1), SEQ ID NO:196 (Mpknot1 3'), SEQ ID NO:197 (Mpknot1 with 5' extra), SEQ ID NO:198 (Mpknot1 U38A), SEQ ID NO:199 (Mpknot1 U38A A29C), SEQ ID NO:200 (MMLV A29C), SEQ ID NO:201 (Mpknot1 with 5' extra and U38A), SEQ ID NO:202 (Mpknot1 with 5' extra and U38A A29C), and SEQ ID NO:203 (Mpknot1 with 5' extra and A29C), or a nucleotide sequence having at least 80% sequence identity thereto.
In other embodiments, the nucleic acid portion is a G-quadruplex having a nucleotide sequence selected from the group consisting of: SEQ ID NO:204 (tnsl), SEQ ID NO:205 (stk 40), SEQ ID NO:206 (apc 2), SEQ ID NO:207 (ceacam 4), SEQ ID NO:208 (pitpnm 3), SEQ ID NO:209 (rlf), SEQ ID NO:210 (erc 1), SEQ ID NO:211 (ube 3 c), SEQ ID NO:212 (taf 15), SEQ ID NO:213 (stard 3) and SEQ ID NO:214 (g 2), or a nucleotide sequence having at least 80% sequence identity thereto.
In other embodiments, the nucleic acid portion of the modified pegRNA is an evopreQ1 aptamer having a nucleotide sequence selected from the group consisting of: SEQ ID NO:215 (evopreQ1), SEQ ID NO:216 (evopreQ1 motif 1), SEQ ID NO:217 (evopreQ1 motif 2), SEQ ID NO:218 (evopreQ1 motif 3), SEQ ID NO:219 (shorter preQ1-1), SEQ ID NO:220 (preQ1-1 G5C (mut1)), and SEQ ID NO:221 (preQ1-1 G15C (mut2)), or a nucleotide sequence having at least 80% sequence identity thereto.
In other embodiments, the nucleic acid portion is a tRNA portion having the nucleotide sequence of SEQ ID NO:222, or a nucleotide sequence having at least 80% sequence identity thereto.
In other embodiments, the nucleic acid portion has the nucleotide sequence of SEQ ID NO:223 (xrnl), or a nucleotide sequence having at least 80% sequence identity thereto.
In other embodiments, the nucleic acid portion has the nucleotide sequence of SEQ ID NO:224 (grp1 intron P4P6), or a nucleotide sequence having at least 80% sequence identity thereto.
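The "at least 80% sequence identity" criterion used in the preceding embodiments can be made concrete with a small calculation. The sketch below is one possible reading, assuming a simple global alignment (match +1, mismatch -1, gap -1) and defining identity as matching positions divided by alignment length; other identity definitions and scoring schemes exist, the function name `percent_identity` is hypothetical, and the sequences are toy examples rather than any of the listed SEQ ID NOs.

```python
def percent_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of two sequences over a simple global alignment."""
    n, m = len(a), len(b)
    # score[i][j] holds the best alignment score of a[:i] against b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting matches and alignment columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


reference = "ACGUACGUAC"  # toy sequence
variant = "ACGUACGUAA"    # toy sequence with one substitution
identity = percent_identity(reference, variant)
print(identity, identity >= 80.0)
```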
Any of the nucleic acid portions described herein can be attached to the pegRNA by a linker (e.g., a nucleotide linker), e.g., to the 3' end of the pegRNA. The linker may have a nucleotide sequence selected from SEQ ID NOS 225-236. The linker may be any suitable sequence. Optionally, the linker sequence may be empirically determined for each pegRNA.
The linker may have any suitable length. In certain embodiments, the linker is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides in length.
In a preferred embodiment, the linker is at least 8 nucleotides in length.
In various embodiments, the extension arm of the pegRNA is placed at the 3 'or 5' end of the guide RNA, or at an intramolecular position of the guide RNA, and wherein the nucleic acid extension arm is DNA or RNA.
In various embodiments, the pegRNA is capable of binding to a napDNAbp and directing the napDNAbp to a target DNA sequence. The target DNA sequence may comprise a target strand and a complementary non-target strand. The guide RNA can hybridize to the target strand to form an RNA-DNA hybrid and an R-loop.
In various embodiments, the length of the extension arm may vary and depends on the length of the DNA synthesis template. In certain embodiments, the nucleic acid extension arm is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, at least 41 nucleotides, at least 42 nucleotides, at least 45 nucleotides, at least 46 nucleotides, or at least 48 nucleotides in length.
The DNA synthesis template may also vary depending on the desired editing and may be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length.
In various embodiments, the desired edit is a single nucleotide substitution, a single nucleotide deletion, or an insertion. The desired edit may also be of any length that can be installed by guided editing, and may include a deletion, insertion, or inversion.
The length of the primer binding site may also vary and may be, for example, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length.
In another aspect, the present disclosure provides a pegRNA for guided editing comprising (i) a guide RNA comprising a spacer region and (ii) at least one nucleic acid extension arm comprising a DNA synthesis template, a primer binding site, a toehold motif, and an additional nucleic acid portion, wherein the toehold motif blocks the interaction of the primer binding site with the spacer region when the pegRNA is not bound by a guided editor, but does not block the interaction of the primer binding site with a protospacer sequence on a target DNA molecule when the pegRNA is bound by a guided editor. In some embodiments, the toehold motif and the additional nucleic acid portion are attached to the 3' end of the extension arm. In some embodiments, the toehold motif is attached to the 3' end of the extension arm, and the additional nucleic acid portion is attached to the 3' end of the toehold motif. In some embodiments, the toehold motif is attached to the pegRNA by a linker.
In another aspect, the present disclosure provides pegRNA pairs for guided editing comprising (i) a first pegRNA comprising a guide RNA, wherein the guide RNA comprises a spacer region; and (ii) a second pegRNA comprising a second, nick-generating guide RNA, wherein the second nick-generating guide RNA comprises at least one nucleic acid extension arm comprising a DNA synthesis template and a primer binding site. In some embodiments, the first pegRNA and the second pegRNA are each capable of binding to a nucleic acid programmable DNA binding protein (napDNAbp) of a guided editor and directing the napDNAbp to a target DNA sequence.
In another aspect, the present disclosure provides a pegRNA comprising (i) a guide RNA comprising a spacer region and (ii) at least one nucleic acid extension arm comprising a DNA synthesis template and a primer binding site, wherein the primer binding site comprises one or more modified nucleotides, and wherein the one or more modified nucleotides decrease the binding affinity of the primer binding site for the spacer region more than they decrease the binding affinity of the primer binding site for a protospacer sequence on a target DNA molecule. In some embodiments, the one or more modified nucleotides comprise a genetic mutation. In some embodiments, the one or more modified nucleotides comprise chemically modified nucleotides.
In another aspect, the present disclosure provides a composition for guided editing comprising:
(a) A fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) and a domain comprising RNA-dependent DNA polymerase activity; and
(b) Any of the pegRNAs described above comprising a nucleic acid moiety attached to the end of an extension arm.
In some embodiments, the napDNAbp of the guided editing complex comprises an endonuclease with nucleic acid programmable DNA binding capability. In some embodiments, the napDNAbp comprises an active endonuclease capable of cleaving both strands of a double-stranded target DNA. In some embodiments, the napDNAbp is a nuclease-active endonuclease, such as a nuclease-active Cas protein, that can cleave both strands of a double-stranded target DNA by creating a nick on each strand. In some embodiments, the two nicks on the two strands are staggered, e.g., as generated by a napDNAbp comprising Cas12a or Cas12b1. In some embodiments, the two nicks on the two strands are located at the same genomic position, e.g., as generated by a napDNAbp comprising a nuclease-active Cas9. In some embodiments, the napDNAbp comprises an endonuclease that is a nickase. For example, in some embodiments, the napDNAbp comprises an endonuclease that contains one or more mutations that reduce its nuclease activity, making it a nickase. In some embodiments, the napDNAbp comprises an inactive endonuclease; e.g., in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that eliminate nuclease activity. In various embodiments, the napDNAbp is a Cas9 protein or a variant thereof. The napDNAbp can also be a nuclease-active Cas9, a nuclease-inactive Cas9 (dCas9), or a Cas9 nickase (nCas9). In a preferred embodiment, the napDNAbp is a Cas9 nickase (nCas9) that nicks only a single strand. In other embodiments, the napDNAbp may be selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity such that only one strand is nicked.
In some embodiments, the napDNAbp is selected from Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity such that one DNA strand is nicked in preference to the other DNA strand. In various embodiments, the domain comprising RNA-dependent DNA polymerase activity is a reverse transcriptase comprising any one of the amino acid sequences of SEQ ID NOs: 32, 34, 36, 102-128, and 132.
In some embodiments, the domain comprising RNA dependent DNA polymerase activity is a reverse transcriptase comprising an amino acid sequence having at least 80%, 85%, 90%, 95%, 98% or 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs 32, 34, 36, 102-128 and 132. In other embodiments, the domain comprising RNA-dependent DNA polymerase activity is a naturally occurring reverse transcriptase from a retrovirus or retrotransposon.
In another aspect, the present disclosure provides nucleic acid molecules encoding the modified pegRNAs described above and provided in the present disclosure.
In yet another aspect, the present disclosure provides an expression vector comprising the above nucleic acid molecule. The nucleic acid molecule may be under the control of a promoter. The promoter may be a polIII promoter. The promoter may also be a U6, U6v4, U6v7 or U6v9 promoter or fragment thereof, including promoters having the nucleotide sequence of any one of SEQ ID NOs 3915-3918.
In another aspect, the disclosure provides a cell (e.g., a transformed cell line) comprising a modified pegRNA as described above. The cell may also comprise a guided editing complex as described above (e.g., wherein the cell comprises both a modified pegRNA and a guided editor). The cell may also comprise any of the nucleic acid molecules described above that express the modified pegRNA and, optionally, a guided editor. Furthermore, the cell may comprise any of the expression vectors described above that express the modified pegRNA and, optionally, a guided editor.
In another aspect, the present disclosure provides a pharmaceutical composition comprising: (i) the modified pegRNA described above, the guided editing complex described above, the nucleic acid molecule described above, the expression vector described above, or any of the cells described above, and (ii) a pharmaceutically acceptable excipient.
In another aspect, the present disclosure provides a kit comprising: (i) the modified pegRNA, the guided editing complex, the nucleic acid molecule, the expression vector, or any of the cells described above, and (ii) a set of instructions for performing guided editing.
In another aspect, the present disclosure provides a system comprising (i) any pegRNA or epegRNA disclosed herein, and (ii) at least one guided editor comprising a napDNAbp and a DNA polymerase.
In another aspect, the present disclosure provides a method of guided editing comprising contacting a target DNA sequence with a modified pegRNA as described above and a guided editor comprising a napDNAbp and a domain having RNA-dependent DNA polymerase activity, wherein editing efficiency is increased compared to the same method using a pegRNA that does not comprise the modification. In certain embodiments, the editing efficiency is increased by at least a factor of 1.5. In other embodiments, the editing efficiency is increased by at least a factor of 2.0. In other embodiments, the editing efficiency is increased by at least a factor of 3.0. In other embodiments, the editing efficiency is increased by at least a factor of 4, 5, 6, 7, 8, 9, or 10.
In another aspect, the present disclosure uses a guided editor (e.g., PE1, PE2, or PE3) in conjunction with a guided editing guide RNA (pegRNA) to perform guided editing that directly installs or corrects mutations in the CDKL5 gene that lead to CDKL5 deficiency. In various embodiments, the disclosure provides complexes comprising a guided editor (e.g., PE1, PE2, or PE3) and a pegRNA that are capable of directly installing or correcting one or more mutations in the CDKL5 gene in multiple subjects.
In the guided editing methods disclosed herein, the napDNAbp can have nickase activity. The napDNAbp can be a Cas9 protein or a variant thereof. The napDNAbp can also be a nuclease-active Cas9, a nuclease-inactive Cas9 (dCas9), or a Cas9 nickase (nCas9). The napDNAbp can also be Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), or Argonaute, and optionally has nickase activity.
In the guided editing methods, the domain having RNA-dependent DNA polymerase activity may be a reverse transcriptase comprising any one of the amino acid sequences of SEQ ID NOs: 32, 34, 36, 102-128, and 132. In some embodiments, the domain having RNA-dependent DNA polymerase activity may be a reverse transcriptase comprising an amino acid sequence having at least 80%, 85%, 90%, 95%, 98%, or 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 32, 34, 36, 102-128, and 132.
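The fold-increase thresholds recited above amount to a simple ratio of editing efficiencies measured with and without the modification. The following sketch shows that arithmetic; the function names `fold_change` and `thresholds_met` and the example percentages are hypothetical and are not data from this disclosure.

```python
# Fold-increase thresholds taken from the embodiments above.
THRESHOLDS = (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)


def fold_change(modified_pct: float, unmodified_pct: float) -> float:
    """Ratio of editing efficiency with the modified pegRNA to that without it."""
    if unmodified_pct <= 0:
        raise ValueError("unmodified efficiency must be positive")
    return modified_pct / unmodified_pct


def thresholds_met(fold: float) -> list:
    """Return every recited threshold that the observed fold change reaches."""
    return [t for t in THRESHOLDS if fold >= t]


# Hypothetical measurements: 30% editing with the modified pegRNA, 12% without.
fold = fold_change(30.0, 12.0)
print(fold, thresholds_met(fold))
```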
The present specification further refers to and incorporates by reference the following applications related to guided editing: U.S. provisional application No. 62/820,813 (attorney docket No. B1195.70074US00), filed on March 19; U.S. provisional application No. 62/858,958 (attorney docket No. B1195.70074US01), filed on June 7; U.S. provisional application No. 62/889,996 (attorney docket No. B1195.70074US02), filed on August 21; U.S. provisional application No. 62/922,654 (attorney docket No. B1195.70083US00), filed on August 21; U.S. provisional application No. 62/913,553 (attorney docket No. B1195.70074US03), filed on October 10; U.S. provisional application No. 62/973,558 (attorney docket No. B1195.70083US01), filed on October 10; U.S. provisional application No. 62/931,195 (attorney docket No. B1195.70074US04), filed on November 5; and further U.S. provisional applications. Further, the following international PCT applications are incorporated herein by reference: PCT/US20/23721; PCT/US20/23730; PCT/US20/23713; PCT/US20/23712; PCT/US20/23727; PCT/US20/23724; PCT/US20/23725; PCT/US20/23728; PCT/US20/23732; PCT/US20/23723; PCT/US20/23553; and PCT/US20/23583, each filed on March 19, 2020.
It should be appreciated that the foregoing concepts and other concepts discussed below may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further advantages and novel features of the present disclosure will become apparent from the following detailed description of different non-limiting embodiments when considered in conjunction with the drawings.
Drawings
The following drawings form a part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which may be better understood by reference to one or more of these drawings in combination with the detailed description of the specific embodiments presented herein.
Fig. 1A provides a schematic diagram of an exemplary process for introducing nucleotide changes, insertions, and/or deletions into a DNA molecule (e.g., a genome) using a fusion protein (i.e., a guide editor) that contains a reverse transcriptase fused to a Cas9 protein and is complexed with a pegRNA (i.e., forming a guide editor complex). In this embodiment, the guide RNA is extended at the 3' end to include a DNA synthesis template sequence. The schematic shows how a complex of a polymerase (e.g., reverse transcriptase (RT)) fused to a Cas9 nickase and a pegRNA binds to a DNA target site and nicks the PAM-containing DNA strand adjacent to the target nucleotide. The RT uses the nicked DNA as a primer for DNA synthesis from the gRNA, which serves as a template for synthesis of a new DNA strand encoding the desired edit (e.g., mutations, insertions, and/or deletions). The editing process shown may be referred to as "guided editing".
FIG. 1B provides the same illustration as FIG. 1A, except that the guide editor complex is represented more generally as [napDNAbp]-[P]:pegRNA or [P]-[napDNAbp]:pegRNA, where "P" refers to any polymerase (e.g., reverse transcriptase), "napDNAbp" refers to a nucleic acid programmable DNA binding protein (e.g., SpCas9), "pegRNA" refers to the guided editing guide RNA, and "]-[" refers to an optional linker. As described elsewhere, e.g., as shown in fig. 3A-3G, the pegRNA comprises a 5' extension arm that contains a primer binding site and a DNA synthesis template. Although not shown, it is contemplated that the extension arm of the pegRNA (i.e., the portion comprising the primer binding site and the DNA synthesis template) may be DNA or RNA. The particular polymerase contemplated in this configuration depends on the nature of the DNA synthesis template. For example, if the DNA synthesis template is RNA, the polymerase may be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). If the DNA synthesis template is DNA, the polymerase may be a DNA-dependent DNA polymerase.
Fig. 1C provides a schematic diagram of an exemplary process for introducing single nucleotide changes, insertions, and/or deletions into a DNA molecule (e.g., a genome) using a fusion protein, comprising a reverse transcriptase fused to a Cas9 protein, in complex with a pegRNA. In this embodiment, the guide RNA is extended at the 5' end to comprise a reverse transcriptase template sequence. The schematic shows how a complex of a reverse transcriptase (RT) fused to a Cas9 nickase and a pegRNA binds to a DNA target site and nicks the PAM-containing DNA strand adjacent to the target nucleotide. The RT uses the nicked DNA as a primer for DNA synthesis from the gRNA, which serves as a template for synthesis of a new DNA strand encoding the desired edit. The editing process shown may be referred to as "guided editing".
FIG. 1D provides the same illustration as FIG. 1C, except that the guide editor complex is represented more generally as [napDNAbp]-[P]:pegRNA or [P]-[napDNAbp]:pegRNA, where "P" refers to any polymerase (e.g., reverse transcriptase), "napDNAbp" refers to a nucleic acid programmable DNA binding protein (e.g., SpCas9), "pegRNA" refers to the guided editing guide RNA, and "]-[" refers to an optional linker. As described elsewhere, e.g., as shown in fig. 3A-3G, the pegRNA comprises a 3' extension arm that contains a primer binding site and a DNA synthesis template. Although not shown, it is contemplated that the extension arm of the pegRNA (i.e., the portion comprising the primer binding site and the DNA synthesis template) may be DNA or RNA. The particular polymerase contemplated in this configuration depends on the nature of the DNA synthesis template. For example, if the DNA synthesis template is RNA, the polymerase may be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). If the DNA synthesis template is DNA, the polymerase may be a DNA-dependent DNA polymerase. In various embodiments, the pegRNA can be engineered or synthesized to incorporate a DNA-based DNA synthesis template.
FIG. 1E is a schematic diagram depicting an exemplary process by which the synthesized single strand of DNA (which contains the desired nucleotide change) is resolved such that the desired nucleotide change is incorporated into the DNA. As shown, following synthesis of the editing strand (or "mutagenic strand"), equilibration with the endogenous strand, flap cleavage of the endogenous strand, and ligation lead to incorporation of the DNA edit once the mismatched DNA duplex has been resolved by the action of endogenous DNA repair and/or replication processes.
FIG. 1F is a schematic diagram showing that "reverse strand nick generation" (i.e., nicking of the opposite strand) can be incorporated into the resolution process of FIG. 1E to help drive formation of the desired product over the reversion product. In reverse strand nick generation, a second Cas9/gRNA complex is used to introduce a second nick on the strand opposite the originally nicked strand. This induces endogenous cellular DNA repair and/or replication processes to preferentially replace the non-editing strand (i.e., the strand containing the second nick site).
FIG. 1G provides another schematic illustration of an exemplary process for introducing single nucleotide changes and/or insertions and/or deletions into a DNA molecule (e.g., a genome) at a target locus using a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a pegRNA. This process may be considered an embodiment of guided editing. The pegRNA comprises an extension at the 3' or 5' end of the guide RNA or at an intramolecular position within the guide RNA. In step (a), the napDNAbp/gRNA complex is contacted with the DNA molecule and the gRNA directs the napDNAbp to bind to the target locus. In step (b), a nick is introduced (e.g., by a nuclease or a chemical agent) into one of the DNA strands at the target locus (the R-loop strand, or PAM-containing strand, or non-target DNA strand, or pre-spacer strand), thereby creating an available 3' end in one of the strands at the target locus. In certain embodiments, the nick is created in the DNA strand corresponding to the R-loop strand, i.e., the strand that does not hybridize to the guide RNA sequence. In step (c), the 3' end of the nicked DNA strand interacts with the extension of the guide RNA to prime reverse transcription. In certain embodiments, the 3' end of the DNA strand hybridizes to a specific RT priming sequence on the extension of the guide RNA. In step (d), a reverse transcriptase is introduced which synthesizes single-stranded DNA from the 3' end of the priming site to the 3' end of the guide RNA. This forms a single-stranded DNA flap comprising the desired nucleotide change (e.g., a single base change, insertion, or deletion, or a combination thereof). In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to resolution of the single-stranded DNA flap such that the desired nucleotide change is incorporated into the target locus.
This process may be driven toward formation of the desired product by removal of the corresponding 5' endogenous DNA flap, which forms once the 3' single-stranded DNA flap invades and hybridizes to the complementary sequence on the other strand. The process can also be driven toward formation of the desired product by second-strand nicking, as shown in FIG. 1F. The process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions.
FIG. 1H is a schematic diagram depicting the types of gene changes possible using the guided editing process described herein. Types of nucleotide changes that can be achieved by guided editing include deletions (including short and long deletions), single nucleotide changes (including transitions and transversions), inversions, and insertions (including short and long insertions).
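To make the distinction between the two kinds of single nucleotide change concrete, the following is a minimal Python sketch. The function name `classify_substitution` is hypothetical and not part of the disclosure; the sketch relies only on the standard definition that a transition exchanges a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), whereas a transversion exchanges a purine for a pyrimidine or vice versa.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and edited base are identical")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("G", "T"))
```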
Fig. 1I is a schematic diagram depicting temporal second-strand nick generation, exemplified by PE3b (PE3b = PE2 guide editor + pegRNA + second-strand nicking guide RNA). Temporal second-strand nick generation is a variant of second-strand nick generation that facilitates formation of the desired edited product. The term "temporal" refers to the fact that the second-strand nick in the non-editing strand occurs only after the desired edit has been installed in the editing strand. This avoids the double-stranded DNA breaks that concurrent nicking of both strands would cause.
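One way to picture the temporal behavior described for PE3b is that the second nicking guide RNA is chosen so that its spacer matches the target site only after the edit has been installed. The following Python sketch checks that property for a candidate spacer. It is a simplified illustration under stated assumptions: the function names and toy sequences are hypothetical, only exact matches are considered, and PAM requirements and mismatch tolerance of the nickase are ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def occurs_on_either_strand(spacer: str, site: str) -> bool:
    """True if the spacer sequence appears exactly on either strand of the site."""
    spacer, site = spacer.upper(), site.upper()
    return spacer in site or spacer in reverse_complement(site)


def is_edit_specific(spacer: str, original_site: str, edited_site: str) -> bool:
    """True if the spacer matches the edited site but not the unedited site."""
    return (occurs_on_either_strand(spacer, edited_site)
            and not occurs_on_either_strand(spacer, original_site))


# Toy example (hypothetical sequences): a single G-to-T change at index 10.
original = "ACGTTGCAAGGCTTACGATCGGA"
edited = "ACGTTGCAAGTCTTACGATCGGA"
candidate = edited[2:18]  # spans the edited base
print(is_edit_specific(candidate, original, edited))
```

A spacer that also matches the unedited sequence would allow the second nick to occur before the edit is installed, which is the situation the temporal design is meant to avoid.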
Fig. 1J depicts a variant of guided editing contemplated herein in which the napDNAbp (e.g., SpCas9 nickase) is replaced with any programmable nuclease domain, such as a zinc finger nuclease (ZFN) or a transcription activator-like effector nuclease (TALEN). Thus, it is contemplated that suitable nucleases do not necessarily need to be "programmed" by a nucleic acid targeting molecule (e.g., a guide RNA), but can instead be programmed by defining the specificity of a DNA-binding domain, in particular that of a nuclease. Just as in guided editing with a napDNAbp moiety, such alternative programmable nucleases are preferably modified so that only one strand of the target DNA is cut. In other words, the programmable nuclease should preferably function as a nicking enzyme. Once a programmable nuclease (e.g., a ZFN or a TALEN) is selected, additional functionality can be engineered into the system so that it operates by a guided-editing-like mechanism. For example, the programmable nuclease can be modified by coupling to it (e.g., via a chemical linker) an RNA or DNA extension arm, wherein the extension arm comprises a primer binding site (PBS) and a DNA synthesis template. The programmable nuclease can also be coupled (e.g., via a chemical or amino acid linker) to a polymerase, the nature of which depends on whether the extension arm is DNA or RNA. In the case of an RNA extension arm, the polymerase may be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). In the case of a DNA extension arm, the polymerase can be a DNA-dependent DNA polymerase (e.g., a prokaryotic polymerase, including Pol I, Pol II, or Pol III, or a eukaryotic polymerase, including Pol α, Pol β, Pol γ, Pol δ, Pol ε, or Pol ζ).
The system may also include other functions, added as fusions to the programmable nuclease or provided in trans, to promote the overall reaction (e.g., (a) a helicase to unwind the DNA at the cleavage site so that the cleaved strand with the 3' end is available as a primer, (b) a flap endonuclease (e.g., FEN1) to help remove the endogenous strand on the cleaved strand and thereby drive the reaction toward replacement of the endogenous strand with the synthesized strand, or (c) an nCas9:gRNA complex to form a second-site nick on the opposite strand, which may help drive integration of the synthesized repair through favored cellular repair of the non-editing strand). In a manner analogous to guided editing with a napDNAbp, such a complex with an alternative programmable nuclease can be used to synthesize, and then permanently install, a newly synthesized replacement DNA strand carrying the edit of interest at the target site of the DNA.
FIG. 1K depicts the anatomical features of a target DNA that is editable by guided editing in one embodiment. The target DNA comprises a "non-target strand" and a "target strand". The target strand is the strand that anneals to the spacer of the pegRNA of a guide editor complex that recognizes the PAM site (in this case NGG, which is recognized by the canonical SpCas9-based guide editor). The target strand may also be referred to as the "non-PAM strand" or the "non-editing strand". In contrast, the non-target strand (i.e., the strand containing the pre-spacer and the NGG PAM sequence) may be referred to as the "PAM strand" or the "editing strand". In various embodiments, the nick site of the PE complex is located in the pre-spacer of the PAM strand (e.g., with a SpCas9-based PE). The position of the nick is characteristic of the particular Cas9 that forms the PE. For example, with a SpCas9-based PE, the nick site is in the phosphodiester bond between base three (position "-3" relative to position 1 of the PAM sequence) and base four (position "-4" relative to position 1 of the PAM sequence). The nick in the pre-spacer forms a free 3' hydroxyl group which, as shown in the following figures, complexes with the primer binding site of the pegRNA extension arm and provides a substrate to initiate polymerization of the single-stranded DNA encoded by the DNA synthesis template of the pegRNA extension arm. The polymerization reaction is catalyzed in the 5' to 3' direction by the polymerase (e.g., reverse transcriptase) of the guide editor. Polymerization terminates before reaching the gRNA core (e.g., by inclusion of a polymerization termination signal or a secondary structure that serves to terminate the polymerization activity of the PE), resulting in a single-stranded DNA flap extending from the original 3' hydroxyl group of the nicked PAM strand.
The DNA synthesis template encodes a single-stranded DNA that is homologous to the endogenous 5' end single-stranded DNA immediately adjacent to the nick site on the PAM strand and that incorporates the desired nucleotide change (e.g., single base substitution, insertion, deletion, inversion). The desired edit can be located anywhere downstream of the nick site on the PAM strand, which can include positions +1, +2, +3, +4 (the start of the PAM site), +5 (position 2 of the PAM site), +6 (position 3 of the PAM site), +7, +8, +9, +10, +11, +12, +13, +14, +15, +16, +17, +18, +19, +20, +21, +22, +23, +24, +25, +26, +27, +28, +29, +30, +31, +32, +33, +34, +35, +36, +37, +38, +39, +40, +41, +42, +43, +44, +45, +46, +47, +48, +49, +50, +51, +52, +53, +54, +55, +56, +57, +58, +59, +60, +61, +62, +63, +64, +65, +66, +67, +68, +69, +70, +71, +72, +73, +74, +75, +76, +77, +78, +79, +80, +81, +82, +83, +84, +85, +86, +87, +88, +89, +90, +91, +92, +93, +94, +95, +96, +97, +98, +99, +100, +101, +102, +103, +104, +105, +106, +107, +108, +109, +110, +111, +112, +113, +114, +115, +116, +117, +118, +119, +120, +121, +122, +123, +124, +125, +126, +127, +128, +129, +130, +131, +132, +133, +134, +135, +136, +137, +138, +139, +140, +141, +142, +143, +144, +145, +146, +147, +148, +149, or +150 or more (downstream positions relative to the nick site). Once the 3' end single-stranded DNA (containing the edit of interest) replaces the endogenous 5' end single-stranded DNA, the DNA repair and replication processes result in permanent installation of the edit on the PAM strand, followed by correction of the mismatch present on the non-PAM strand at the target site. Thus, the edit extends to both DNA strands at the target DNA site. It should be understood that references to the "editing strand" and the "non-editing strand" are intended only to delineate the DNA strands involved in the PE mechanism.
An "editing strand" is the strand that is edited first, by replacement of the 5' end single-stranded DNA immediately downstream of the nick site with the synthesized 3' end single-stranded DNA containing the desired edit. A "non-editing strand" is the strand that pairs with the editing strand, but which itself also becomes edited through repair and/or replication so that it is complementary to the editing strand and, in particular, to the edit of interest.
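The coordinate conventions described for FIG. 1K can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input is the PAM strand written 5' to 3', only an NGG PAM with a 20-nucleotide pre-spacer is considered, and the function names and the toy sequence are hypothetical. The nick is placed three nucleotides upstream of PAM position 1 (between positions -4 and -3), so that position +1 is the first base downstream of the nick and +4 is the first base of the PAM.

```python
import re
from typing import List, Tuple


def find_nick_sites(pam_strand: str, prespacer_len: int = 20) -> List[Tuple[int, int]]:
    """Return (pam_start, nick_index) pairs for each NGG PAM on the given strand.

    nick_index is the 0-based index of the first base downstream of the nick,
    i.e., the nick lies between pam_strand[nick_index - 1] and pam_strand[nick_index].
    """
    seq = pam_strand.upper()
    sites = []
    # The lookahead allows overlapping PAM candidates to be reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start >= prespacer_len:  # require a full-length pre-spacer upstream
            sites.append((pam_start, pam_start - 3))
    return sites


def edit_position_to_index(nick_index: int, position: int) -> int:
    """Map a downstream edit position (+1, +2, ...) to a 0-based strand index."""
    if position < 1:
        raise ValueError("edit positions are numbered from +1 downstream of the nick")
    return nick_index + position - 1


# Toy PAM strand (hypothetical): 20-nt pre-spacer, then a TGG PAM, then flanking bases.
strand = "GATTACAGATTACAGATTAC" + "TGG" + "CATCAT"
for pam_start, nick_index in find_nick_sites(strand):
    print(pam_start, nick_index, strand[edit_position_to_index(nick_index, 4)])
```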
FIG. 1L depicts the guided editing mechanism, showing the anatomical features of the target DNA and of the guide editor complex, and the interaction between the pegRNA and the target DNA. First, a guide editor comprising a fusion protein having a polymerase (e.g., reverse transcriptase) and a napDNAbp (e.g., a SpCas9 nickase, e.g., SpCas9 with an inactivating mutation in the HNH nuclease domain (e.g., H840A) or an inactivating mutation in the RuvC nuclease domain (D10A)) is complexed with a pegRNA and with a DNA having the target DNA to be edited. The pegRNA comprises a spacer, a gRNA core (also known as a gRNA scaffold or gRNA backbone) that binds to the napDNAbp, and an extension arm. The extension arm may be located at the 3' end or the 5' end of the pegRNA molecule, or somewhere within the pegRNA molecule. As shown, the extension arm is located at the 3' end of the pegRNA. The extension arm comprises, in the 3' to 5' direction, a primer binding site and a DNA synthesis template (which comprises the edit of interest and a region of homology (i.e., a homology arm) to the 5' end single-stranded DNA immediately adjacent to the nick site on the PAM strand). As shown, once the nick is introduced, thereby creating a free 3' hydroxyl group immediately upstream of the nick site, the region of the PAM strand immediately upstream of the nick site anneals to a complementary sequence at the 3' end of the extension arm, referred to as the "primer binding site", creating a short double-stranded region with an available 3' hydroxyl end, which forms a substrate for the polymerase of the guide editor complex. The polymerase (e.g., reverse transcriptase) then polymerizes a DNA strand from the 3' hydroxyl end to the end of the extension arm. The sequence of the single-stranded DNA is encoded by the DNA synthesis template, which is the portion of the extension arm that is "read" by the polymerase to synthesize new DNA (i.e., excluding the primer binding site).
This polymerization effectively extends the sequence of the original 3' hydroxyl terminus of the original cleavage site. The DNA synthesis template encodes single-stranded DNA that contains not only the desired editing, but also regions of homology to the endogenous single-stranded DNA immediately downstream of the nicking site of the PAM strand. Next, the encoded 3 'terminal single-stranded DNA (i.e., the 3' single-stranded DNA flap) replaces the corresponding homologous endogenous 5 'terminal single-stranded DNA immediately downstream of the nicking site of the PAM strand, forming a DNA intermediate with a 5' terminal single-stranded DNA flap that is removed by the cell (e.g., by a flap endonuclease). The 3' terminal single-stranded DNA flap (which anneals to the complement of the endogenous 5' terminal single-stranded DNA flap) is attached to the endogenous strand after removal of the 5' DNA flap. The desired editing in the 3' terminal single stranded DNA flap, now annealed and ligated, forms a mismatch with the complementary strand, which undergoes DNA repair and/or replication rounds, permanently installing the desired editing on both strands.
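The relationship between the nicked PAM strand, the primer binding site, and the DNA synthesis template described for FIG. 1L can be sketched in Python as follows. This is a simplified illustration rather than a design tool: the function names, the 13-nucleotide primer binding site length, and the toy sequences are assumptions made for the example. The primer binding site is taken as the RNA complement of the PAM-strand bases immediately upstream of the nick, and the DNA synthesis template as the RNA complement of the desired 3' flap, so that "reading" the template regenerates the flap.

```python
from typing import Tuple

# Complement table that accepts DNA or RNA letters and returns DNA letters.
_TO_DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement, returned as DNA (accepts DNA or RNA input)."""
    return seq.upper().translate(_TO_DNA_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    return dna.upper().replace("T", "U")


def design_3prime_extension(pam_strand: str, nick_index: int, edited_flap: str,
                            pbs_len: int = 13) -> Tuple[str, str, str]:
    """Return (extension_arm, dna_synthesis_template, primer_binding_site), all 5' to 3' RNA.

    nick_index: 0-based index of the first base downstream of the nick.
    edited_flap: desired DNA sequence (5' to 3') to be synthesized from the nick.
    pbs_len: assumed primer binding site length (illustrative default).
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence upstream of the nick for the PBS")
    primer = pam_strand[nick_index - pbs_len:nick_index]
    pbs = to_rna(reverse_complement_dna(primer))
    template = to_rna(reverse_complement_dna(edited_flap))
    # In the 5' to 3' direction the template precedes the PBS on a 3' extension arm.
    return template + pbs, template, pbs


def reverse_transcribe(template_rna: str) -> str:
    """DNA strand (5' to 3') that a polymerase would synthesize from the RNA template."""
    return reverse_complement_dna(template_rna)


# Toy example (hypothetical): nick at index 17, T-to-G change at position +1.
strand = "GATTACAGATTACAGATTAC" + "TGG" + "CATCAT"
arm, template, pbs = design_3prime_extension(strand, 17, "GACTGGCAT")
print(arm)
print(reverse_transcribe(template))
```

Reverse transcribing only the template portion reproduces the edited flap, which mirrors the statement above that the primer binding site is not part of the sequence read by the polymerase.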
Fig. 2 shows three Cas complexes (SpCas9, SaCas9, and LbCas12a) that can be used in the guide editors described herein, together with their PAM, gRNA, and DNA cleavage features. The figure shows the design of complexes involving SpCas9, SaCas9, and LbCas12a.
Figures 3A to 3F show designs of an engineered 5' guided editor gRNA (figure 3A), a 3' guided editor gRNA (figure 3B), and an intramolecular extension (figure 3C). The extended guide RNA is also referred to herein as a pegRNA or "guided editing guide RNA". Fig. 3D and 3E provide other embodiments of 3' and 5' guided editor gRNAs (pegRNAs), respectively. FIG. 3F shows the interaction between a 3' end guided editor guide RNA and a target DNA sequence. The embodiments of fig. 3A-3C depict exemplary arrangements of the reverse transcription template sequence (or, more broadly, the DNA synthesis template, as shown, because RT is only one type of polymerase that can be used in the context of a guide editor), the primer binding site, and an optional linker sequence in the 3', 5', and intramolecular forms of the extension, as well as the general arrangement of the spacer and core regions. The disclosed guided editing process is not limited to these pegRNA configurations. The embodiment of fig. 3D provides the structure of an exemplary pegRNA contemplated herein. The pegRNA comprises three major constituent elements ordered in the 5' to 3' direction, namely: a spacer, a gRNA core, and an extension arm at the 3' end. The extension arm can be further divided, in the 5' to 3' direction, into the following structural elements: an optional homology arm, a DNA synthesis template, and a primer binding site (PBS). In addition, the pegRNA may comprise an optional 3' terminal modification region (e1) and an optional 5' terminal modification region (e2). Still further, the pegRNA may comprise a transcription termination signal (not depicted) at the 3' end of the pegRNA. These structural elements are further defined herein. The depiction of the pegRNA structure is not meant to be limiting, and encompasses variations in the arrangement of the elements.
For example, the optional sequence modification regions (e1) and (e2) may be located within or between any of the other regions shown, and are not limited to being located at the 3' and 5' ends. In certain embodiments, the pegRNA may comprise secondary RNA structures such as, but not limited to, a hairpin, a stem/loop, a toe loop, or an RNA-binding protein recruitment domain (e.g., an MS2 aptamer, which recruits and binds to the MS2cp protein). For example, such secondary structures may be located within the spacer, the gRNA core, or the extension arm, and in particular within the e1 and/or e2 modification regions. In addition to secondary RNA structures, the pegRNA may also comprise (e.g., within the e1 and/or e2 modification regions) a chemical linker or a poly(N) linker or tail, where "N" may be any nucleobase. In some embodiments (e.g., as shown in fig. 72 (c)), the chemical linker can function to prevent reverse transcription of the sgRNA scaffold or core. Furthermore, in certain embodiments (see, e.g., fig. 72 (c)), the extension arm (3) can be composed of RNA or DNA, and/or can include one or more nucleobase analogs (e.g., which can add functionality, such as temperature resilience). Still further, the orientation of the extension arm (3) may be the natural 5' to 3' direction, or the arm may be synthesized in the opposite, 3' to 5', orientation (relative to the orientation of the pegRNA molecule as a whole). It should also be noted that one of ordinary skill in the art will be able to select, depending on the nature of the nucleic acid material of the extension arm (i.e., DNA or RNA), an appropriate DNA polymerase for use in guided editing, which can be implemented either as a fusion with the napDNAbp or as a separate part provided in trans, to synthesize the 3' single-stranded DNA flap containing the desired, template-encoded edit.
For example, if the extension arm is RNA, the DNA polymerase may be a reverse transcriptase or any other suitable RNA-dependent DNA polymerase. However, if the extension arm is DNA, the DNA polymerase may be a DNA-dependent DNA polymerase. In various embodiments, the DNA polymerase may be provided in trans, e.g., by using an RNA-protein recruitment domain (e.g., an MS2 hairpin installed on the pegRNA (e.g., in the e1 or e2 region, or elsewhere) and an MS2cp protein fused to the DNA polymerase, thereby co-localizing the DNA polymerase to the pegRNA). It should also be noted that the primer binding site typically does not form part of the template that the DNA polymerase (e.g., a reverse transcriptase) uses to encode the resulting 3' single-stranded DNA flap comprising the desired edit. Thus, the name "DNA synthesis template" refers to the region or portion of the extension arm (3) that the DNA polymerase uses as a template to encode the desired 3' single-stranded DNA flap, which contains the edit and a region homologous to the endogenous 5' single-stranded DNA flap that is replaced by the 3' single-stranded DNA product of guided editing DNA synthesis. The function of the DNA polymerase may be terminated. It is also possible that part or even all of the e2 region is encoded into DNA. How much of e2 is actually used as a template will depend on its composition and on whether that composition interrupts DNA polymerase function.
The embodiment of fig. 3E provides another pegRNA structure contemplated herein. The pegRNA comprises three major constituent elements ordered in the 5' to 3' direction, namely: a spacer, a gRNA core, and an extension arm at the 3' end. The extension arm can be further divided, in the 5' to 3' direction, into the following structural elements: an optional homology arm, a DNA synthesis template, and a primer binding site (PBS). In addition, the pegRNA may comprise an optional 3' terminal modification region (e1) and an optional 5' terminal modification region (e2). Still further, the pegRNA may comprise a transcription termination signal (not depicted) at the 3' end of the pegRNA. These structural elements are further defined herein. The depiction of the pegRNA structure is not meant to be limiting, and encompasses variations in the arrangement of the elements. For example, the optional sequence modification regions (e1) and (e2) may be located within or between any of the other regions shown, and are not limited to being located at the 3' and 5' ends. In certain embodiments, the pegRNA may comprise secondary RNA structures such as, but not limited to, a hairpin, a stem/loop, a toe loop, or an RNA-binding protein recruitment domain (e.g., an MS2 aptamer, which recruits and binds to the MS2cp protein). These secondary structures may be located anywhere in the pegRNA molecule. For example, such secondary structures may be located within the spacer, the gRNA core, or the extension arm, and in particular within the e1 and/or e2 modification regions. In addition to secondary RNA structures, the pegRNA may also comprise (e.g., within the e1 and/or e2 modification regions) a chemical linker or a poly(N) linker or tail, where "N" may be any nucleobase. In some embodiments (e.g., as shown in fig. 72 (c)), the chemical linker can function to prevent reverse transcription of the sgRNA scaffold or core.
Furthermore, in certain embodiments (see, e.g., fig. 72 (c)), the extension arm (3) can be composed of RNA or DNA, and/or can include one or more nucleobase analogs (e.g., which can add functionality, such as temperature resilience). Still further, the orientation of the extension arm (3) may be the natural 5' to 3' direction, or the arm may be synthesized in the opposite, 3' to 5', orientation (relative to the orientation of the pegRNA molecule as a whole). It should also be noted that one of ordinary skill in the art will be able to select, depending on the nature of the nucleic acid material of the extension arm (i.e., DNA or RNA), an appropriate DNA polymerase for use in guided editing, which can be implemented either as a fusion with the napDNAbp or as a separate part provided in trans, to synthesize the 3' single-stranded DNA flap containing the desired, template-encoded edit. For example, if the extension arm is RNA, the DNA polymerase may be a reverse transcriptase or any other suitable RNA-dependent DNA polymerase. However, if the extension arm is DNA, the DNA polymerase may be a DNA-dependent DNA polymerase. In various embodiments, the DNA polymerase may be provided in trans, e.g., by using an RNA-protein recruitment domain (e.g., an MS2 hairpin installed on the pegRNA (e.g., in the e1 or e2 region, or elsewhere) and an MS2cp protein fused to the DNA polymerase, thereby co-localizing the DNA polymerase to the pegRNA). It should also be noted that the primer binding site typically does not form part of the template that the DNA polymerase (e.g., a reverse transcriptase) uses to encode the resulting 3' single-stranded DNA flap comprising the desired edit. Thus, the name "DNA synthesis template" refers to the region or portion of the extension arm (3) that the DNA polymerase uses as a template to encode the desired 3' single-stranded DNA flap, which contains the edit and a region homologous to the endogenous 5' single-stranded DNA flap that is replaced by the 3' single-stranded DNA product of guided editing DNA synthesis.
The function of the DNA polymerase may be terminated. It is possible that part or even all of the e2 region is encoded into DNA. How much e2 is actually used as a template will depend on its composition and whether this composition interrupts the DNA polymerase function.
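The ordering of pegRNA elements described for fig. 3D and 3E can be summarized with a small data structure. The Python sketch below is illustrative only: the class and field names and the placeholder sequences are hypothetical, and the optional e1/e2 modification regions are deliberately omitted because, as stated above, their placement can vary. It models a 3'-extended pegRNA as spacer, gRNA core, and extension arm, with the extension arm itself ordered as optional homology arm, DNA synthesis template, and primer binding site.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExtensionArm:
    dna_synthesis_template: str
    primer_binding_site: str
    homology_arm: str = ""  # optional element, empty when absent

    def sequence(self) -> str:
        # 5' to 3': optional homology arm, DNA synthesis template, PBS
        return self.homology_arm + self.dna_synthesis_template + self.primer_binding_site


@dataclass(frozen=True)
class PegRNA:
    spacer: str
    grna_core: str
    extension_arm: ExtensionArm

    def sequence(self) -> str:
        # 5' to 3': spacer, gRNA core, extension arm at the 3' end
        return self.spacer + self.grna_core + self.extension_arm.sequence()

    def layout(self) -> List[Tuple[str, int, int]]:
        """Return (element name, start, end) with 0-based, end-exclusive coordinates."""
        parts = [
            ("spacer", self.spacer),
            ("gRNA core", self.grna_core),
            ("homology arm", self.extension_arm.homology_arm),
            ("DNA synthesis template", self.extension_arm.dna_synthesis_template),
            ("primer binding site", self.extension_arm.primer_binding_site),
        ]
        layout, cursor = [], 0
        for name, seq in parts:
            if seq:  # skip optional elements that are absent
                layout.append((name, cursor, cursor + len(seq)))
                cursor += len(seq)
        return layout


# Placeholder sequences (hypothetical, far shorter than real elements).
peg = PegRNA(spacer="GAUUACAG", grna_core="GUUUUAGA",
             extension_arm=ExtensionArm("AUGCCAGUC", "AUCUGUAAUCUGU"))
for element in peg.layout():
    print(element)
```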
FIG. 3F is a schematic drawing depicting the interaction of a typical pegRNA with a target site of double-stranded DNA and the concomitant generation of a 3' single-stranded DNA flap containing the genetic change of interest. The double-stranded DNA is shown with the top strand in the 3' to 5' direction (i.e., the target strand) and the bottom strand in the 5' to 3' direction (i.e., the PAM strand or non-target strand). The top strand contains the complement of the "pre-spacer" and the complement of the PAM sequence, and is referred to as the "target strand" because it is the strand that is targeted by, and anneals to, the spacer of the pegRNA. The complementary bottom strand is referred to as the "non-target strand", "PAM strand", or "pre-spacer strand" because it contains the PAM sequence (e.g., NGG) and the pre-spacer. Although not shown, the depicted pegRNA would be complexed with the Cas9 or equivalent domain of the guide editor. As shown in the schematic (fig. 3F), the spacer sequence of the pegRNA anneals to the complementary region of the pre-spacer on the target strand. This interaction forms a DNA/RNA hybrid between the spacer RNA and the complement of the pre-spacer DNA, and induces the formation of an R loop in the pre-spacer. As taught elsewhere herein, the Cas9 protein (not shown) then induces a nick in the non-target strand, as shown. This leads to the formation of a 3' ssDNA flap region immediately upstream of the nick site, which interacts with the 3' end of the pegRNA at the primer binding site. The 3' end of the ssDNA flap (i.e., the reverse transcriptase primer sequence) anneals to the primer binding site (A) on the pegRNA, thereby priming the reverse transcriptase.
Next, the reverse transcriptase (e.g., provided in trans, or provided in cis as a fusion protein attached to the Cas9 construct) polymerizes the single-stranded DNA encoded by the DNA synthesis template, which includes the editing template (B) and the homology arm (C). Polymerization continues toward the 5' end of the extension arm. The polymerized strand of ssDNA forms the 3' ssDNA flap which, as described elsewhere (e.g., as shown in fig. 1G), invades the endogenous DNA, displaces the corresponding endogenous strand (which is removed as a 5' end DNA flap of endogenous DNA), and installs the desired nucleotide edit (single-nucleotide base-pair changes, deletions, or insertions (including entire genes)) through DNA repair/replication cycles.
Fig. 3G depicts yet another embodiment of guided editing contemplated herein. In particular, the top schematic depicts one embodiment of a guide editor (PE) comprising a fusion protein of a napDNAbp (e.g., SpCas9) and a polymerase (e.g., reverse transcriptase) joined by a linker. The PE forms a complex with the pegRNA by binding to the gRNA core of the pegRNA. In the embodiment shown, the pegRNA is equipped with a 3' extension arm which, starting from the 3' end, comprises a primer binding site (PBS) followed by a DNA synthesis template. The bottom schematic depicts a variant of the guide editor, referred to as the "trans guide editor (tPE)". In this embodiment, the DNA synthesis template and the PBS are separated from the pegRNA and presented on a separate molecule, referred to as the trans guide editor RNA template ("tPERT"), which comprises an RNA-protein recruitment domain (e.g., an MS2 hairpin). The PE itself is further modified to comprise a fusion with a tPERT recruiting protein ("RP"), a protein that specifically recognizes and binds to the RNA-protein recruitment domain. In the example where the RNA-protein recruitment domain is an MS2 hairpin, the corresponding tPERT recruiting protein may be the MS2cp of the MS2 tagging system. The MS2 tagging system is based on the natural interaction of the MS2 phage coat protein ("MCP" or "MS2cp") with a stem-loop or hairpin structure present in the phage genome, i.e., the "MS2 hairpin" or "MS2 aptamer". In the case of trans guided editing, the RP-PE:gRNA complex "recruits" a tPERT having the appropriate RNA-protein recruitment domain to co-localize with the PE:gRNA complex, thereby providing in trans the PBS and DNA synthesis template used in guided editing, as illustrated, for example, in FIG. 3H.
Fig. 3H depicts the process of trans-guided editing. In this embodiment, the trans-guided editor comprises a "PE2" guided editor (i.e., a fusion of Cas9(H840A) and a variant MMLV RT) fused to the MS2cp protein (i.e., a type of recruitment protein that recognizes and binds the MS2 aptamer) and complexed with an sgRNA (i.e., a standard guide RNA, as opposed to a pegRNA). The trans-guided editor binds the target DNA and nicks the non-target strand. The MS2cp protein recruits the tPERT in trans through specific interaction with the RNA-protein recruitment domain on the tPERT molecule. The tPERT co-localizes with the trans-guided editor, thereby providing the PBS and DNA synthesis template in trans for use by the reverse transcriptase to synthesize a single-stranded DNA flap that has a 3' end and contains the desired genetic information encoded by the DNA synthesis template.
Figures 4A to 4E show in vitro guided editing assays. FIG. 4A is a schematic of gRNA-templated extension of a fluorescently labeled DNA substrate by an RT enzyme, analyzed by PAGE. FIG. 4B shows guided editing with pre-nicked substrates, dCas9, and 5'-extended pegRNAs with synthesis templates of different lengths. Fig. 4C shows the RT reaction with pre-nicked DNA substrates in the absence of Cas9. Fig. 4D shows guided editing on an intact dsDNA substrate with Cas9(H840A) and 5'-extended pegRNA. FIG. 4E shows a 3'-extended pegRNA template with pre-nicked and intact dsDNA substrates. All reactions used M-MLV RT.
Figure 5 shows in vitro validation using 5'-extended pegRNAs with synthesis templates of different lengths. Fluorescently labeled (Cy5) DNA targets were used as substrates and were pre-nicked in this set of experiments. The Cas9 used in these experiments was catalytically dead Cas9 (dCas9), and the RT used was Superscript III, a commercial RT derived from Moloney murine leukemia virus (M-MLV). dCas9:gRNA complexes were formed from purified components. The fluorescently labeled DNA substrate was then added along with dNTPs and the RT enzyme. After incubation at 37 °C for 1 hour, the reaction products were analyzed by denaturing urea-polyacrylamide gel electrophoresis (PAGE). The gel image shows that the original DNA strand was extended to a length consistent with the length of the reverse transcription template.
Figure 6 shows in vitro validation using 5'-extended pegRNAs with synthesis templates of different lengths, in experiments closely paralleling those shown in Figure 5. However, in this set of experiments the DNA substrate was not pre-nicked. The Cas9 used in these experiments was a Cas9 nickase (SpyCas9 H840A mutant), and the RT used was Superscript III, a commercial RT derived from Moloney murine leukemia virus (M-MLV). The reaction products were analyzed by denaturing urea-polyacrylamide gel electrophoresis (PAGE). As shown in the gel, the nickase cleaves the DNA strand efficiently when a standard gRNA is used (gRNA-0, lane 3).
Figure 7 shows that 3' extensions support DNA synthesis and do not significantly affect Cas9 nickase activity. When either dCas9 or Cas9 nickase is used, the pre-nicked substrate (black arrow) is almost quantitatively converted to RT product (lanes 4 and 5). More than 50% conversion to RT product was observed with the intact substrate (red arrow) (lane 3). Cas9 nickase (SpyCas9 H840A mutant), catalytically dead Cas9 (dCas9), and Superscript III, a commercial RT derived from Moloney murine leukemia virus (M-MLV), were used.
FIG. 8 shows a two-color experiment to determine whether the RT reaction preferentially occurs in cis with the gRNA (i.e., bound in the same complex). Two separate experiments were performed, one for 5'-extended and one for 3'-extended pegRNAs. The products were analyzed by PAGE. The product ratio was calculated as (Cy3cis/Cy3trans)/(Cy5trans/Cy5cis).
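As a worked illustration of the product ratio defined for FIG. 8, the following Python sketch computes it from four band intensities. The function name, argument names, and example intensities are hypothetical and are not taken from the disclosure; the sketch only restates the stated formula.

```python
def product_ratio(cy3_cis, cy3_trans, cy5_trans, cy5_cis):
    """Return (Cy3cis/Cy3trans) / (Cy5trans/Cy5cis) from band intensities.

    All four arguments are illustrative band intensities in arbitrary units.
    """
    # Every quantity that ends up in a denominator must be strictly positive.
    for name, value in (("cy3_trans", cy3_trans),
                        ("cy5_trans", cy5_trans),
                        ("cy5_cis", cy5_cis)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return (cy3_cis / cy3_trans) / (cy5_trans / cy5_cis)


if __name__ == "__main__":
    # Made-up intensities, used only to exercise the formula.
    print(product_ratio(cy3_cis=80.0, cy3_trans=20.0, cy5_trans=10.0, cy5_cis=40.0))
```

Under this definition, a ratio above 1 would indicate that the cis products dominate in both color channels.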
Fig. 9A to 9D show flap model substrates. Figure 9A shows a dual-FP reporter for flap-directed mutagenesis. Fig. 9B shows stop codon repair in HEK cells. FIG. 9C shows sequenced yeast colonies after flap repair. Fig. 9D shows testing of different flap features in human cells.
FIG. 10 shows guided editing of plasmid substrates. A dual-fluorescence reporter plasmid was constructed for expression in yeast (S. cerevisiae). Expression of this construct in yeast produces only GFP. The in vitro guided editing reaction introduces a point mutation, and the parental plasmid or a plasmid nicked in vitro by Cas9(H840A) is transformed into yeast as a control. Colonies were visualized by fluorescence imaging. Yeast dual-FP plasmid transformants are shown. Transformation of the parental plasmid or of the in vitro Cas9(H840A)-nicked plasmid produced only green, GFP-expressing colonies. Guided editing reactions using 5'-extended or 3'-extended pegRNAs produced a mixture of green and yellow colonies. The latter express both GFP and mCherry. More yellow colonies were observed with the 3'-extended pegRNA. A positive control without the stop codon is also shown.
FIG. 11 shows guided editing of plasmid substrates in an experiment similar to that of FIG. 10, except that, instead of installing a point mutation in a stop codon, guided editing installs a single-nucleotide insertion (left) or deletion (right) that repairs a frameshift mutation and allows synthesis of the downstream mCherry. Both experiments used 3'-extended pegRNAs.
Fig. 12 shows the editing product of guided editing of a plasmid substrate characterized by Sanger sequencing. Individual colonies from TRT transformation were selected and analyzed by Sanger sequencing. Precise editing was observed by sequencing selected colonies. The green colonies contained plasmids with the original DNA sequence, while the yellow colonies contained the exact mutations designed by the guide editing gRNA. No other point mutations or insertions/deletions were observed.
FIG. 13 shows the potential scope of the novel guided editing technique and is compared to the deaminase mediated base editor technique.
Fig. 14 shows a schematic diagram of editing in a human cell.
FIG. 15 shows the extension of primer binding sites in gRNA.
Figure 16 shows truncated gRNA for proximity targeting.
Figures 17A to 17C are graphs showing the % T-to-A conversion at the target nucleotide following transfection of the components into human embryonic kidney (HEK) cells. Fig. 17A shows results obtained with an N-terminal fusion (32-amino-acid linker) of wild-type MLV reverse transcriptase to Cas9(H840A) nickase. FIG. 17B is similar to FIG. 17A, but with a C-terminal fusion of the RT enzyme. Fig. 17C is similar to Fig. 17A, but the linker between the MLV RT and Cas9 is 60 amino acids long instead of 32 amino acids.
Figure 18 shows high-purity T-to-A editing at the HEK3 site, as determined by high-throughput amplicon sequencing. The output of the sequencing analysis shows the most abundant genotypes of the edited cells.
FIG. 19 shows editing efficiency at target nucleotide (blue bar) and indel rate (orange bar). WT refers to wild-type MLV RT enzyme. The mutant enzymes (M1 to M4) contained the mutations listed on the right. The rate of editing was quantified by high throughput sequencing of genomic DNA amplicons.
FIG. 20 shows the editing efficiency of a target nucleotide when a single-stranded nick is introduced adjacent to the target nucleotide in a complementary DNA strand. Nick generation (triangles) at different distances from the target nucleotide was tested. The editing efficiency of the target base pairs (blue bars) is shown along with the indel formation rate (orange bars). The "none" example does not contain a complementary strand-incision producing guide RNA. The rate of editing was quantified by high throughput sequencing of genomic DNA amplicons.
Figure 21 shows processed high-throughput sequencing data showing the desired T-to-A transversion mutation and the general absence of other major genome editing byproducts.
FIG. 22 provides a schematic of an exemplary process for conducting targeted mutagenesis with an error-prone reverse transcriptase at a target locus (i.e., guided editing with an error-prone RT), using a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a pegRNA. This process may be referred to as an embodiment of guided editing for targeted mutagenesis. The pegRNA comprises an extension at the 3' or 5' end of the guide RNA, or at an intramolecular position within the guide RNA. In step (a), the napDNAbp/gRNA complex contacts the DNA molecule and the gRNA guides the napDNAbp to bind the target locus to be mutagenized. In step (b), a nick is introduced (e.g., by a nuclease or a chemical agent) into one of the DNA strands at the target locus, thereby producing an available 3' end in one of the strands at the target locus. In certain embodiments, the nick is created in the DNA strand corresponding to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence. In step (c), the 3'-end DNA strand interacts with the extended portion of the guide RNA to prime reverse transcription. In certain embodiments, the 3'-end DNA strand hybridizes to a specific RT priming sequence on the extended portion of the guide RNA. In step (d), an error-prone reverse transcriptase is introduced, which synthesizes a mutagenized single strand of DNA from the 3' end of the primed site toward the 3' end of the guide RNA. Exemplary mutations are indicated by asterisks. This forms a single-stranded DNA flap containing the desired mutagenized region. In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) involve resolution of the single-stranded DNA flap (comprising the mutagenized region) such that the desired mutagenized region is incorporated into the target locus.
This process can be driven toward formation of the desired product by removing the corresponding 5' endogenous DNA flap, which forms once the 3' single-stranded DNA flap invades and hybridizes to the complementary sequence on the other strand. Nicking of the second strand can also be used to drive the process toward product formation, as illustrated in FIG. 1F. After endogenous DNA repair and/or replication processes, the mutagenized region is incorporated into both DNA strands at the DNA locus.
FIG. 23 is a schematic of gRNA design for contracting trinucleotide repeat sequences, and of trinucleotide repeat contraction using guided editing. Trinucleotide repeat expansion is associated with a number of human diseases, including Huntington's disease, fragile X syndrome, and Friedreich's ataxia. The most common trinucleotide repeat contains CAG triplets, but GAA triplets (Friedreich's ataxia) and CGG triplets (fragile X syndrome) also occur. Inheriting a predisposition to expansion, or acquiring an already expanded parental allele, increases the likelihood of developing the disease. Pathogenic expansions of trinucleotide repeats could hypothetically be corrected using guided editing. A region upstream of the repeat region can be nicked by an RNA-guided nuclease and then used to prime synthesis of a new DNA strand containing a healthy number of repeats (which depends on the particular gene and disease). After the repeat sequence, a short stretch of homology is added that matches the identity of the sequence (red strand) adjacent to the other end of the repeat. Invasion of the newly synthesized strand, and subsequent replacement of the endogenous DNA with the newly synthesized flap, results in a contracted repeat allele.
FIG. 24 is a schematic diagram showing the deletion of exactly 10 nucleotides using guided editing. Guide RNAs targeting the HEK3 locus were designed with reverse transcription templates encoding a 10 nucleotide deletion after the nicking site. Amplicon sequencing was used to assess the editing efficiency of transfected HEK cells.
FIG. 25 is a schematic showing gRNA design for peptide tagging of genes at endogenous genomic loci, and peptide tagging with guided editing. The FlAsH and ReAsH tagging systems comprise two parts: (1) a fluorophore-biarsenical probe, and (2) a genetically encoded peptide comprising a tetracysteine motif, e.g., the sequence FLNCCPGCCMEP (SEQ ID NO: 1). When expressed in cells, proteins containing the tetracysteine motif can be fluorescently labeled with fluorophore-biarsenical probes (see reference: J. Am. Chem. Soc., 2002, 124 (21), pp 6063-6076. DOI: 10.1021/ja017687n). The "sortagging" system employs a bacterial sortase that covalently conjugates a labeled peptide probe to a protein containing a suitable peptide substrate (see reference: Nat. Chem. Biol. 2007 Nov; 3(11): 707-8. DOI: 10.1038/nchembio.2007.31). The FLAG tag (DYKDDDDK (SEQ ID NO: 2)), V5 tag (GKPIPNPLLGLDST (SEQ ID NO: 3)), GCN4 tag (EELLSKNYHLENEVARLKK (SEQ ID NO: 4)), HA tag (YPYDVPDYA (SEQ ID NO: 5)), and Myc tag (EQKLISEEDL (SEQ ID NO: 6)) are commonly used as epitope tags for immunoassays. The peptide sequence encoded by the pi-clamp (π-clamp) (FCPF (SEQ ID NO: 7)) can be labeled with pentafluoro-aromatic substrates (see reference: Nat. Chem. 2016 Feb; 8(2): 120-8. DOI: 10.1038/nchem.2413).
FIG. 26A shows precise installation of a His6 tag and a FLAG tag into genomic DNA. A guide RNA targeting the HEK3 locus was designed with a reverse transcription template encoding either an 18-nt His tag insertion or a 24-nt FLAG tag insertion. Amplicon sequencing was used to assess editing efficiency in transfected HEK cells. Note that the complete 24-nt sequence of the FLAG tag lies outside the viewing frame (sequencing confirmed complete and precise insertion). FIG. 26B shows a schematic outlining various applications involving protein/peptide tagging, including (a) rendering a protein soluble or insoluble, (b) altering or tracking the cellular localization of a protein, (c) extending the half-life of a protein, (d) facilitating protein purification, and (e) facilitating protein detection.
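The 18-nt and 24-nt insertion lengths stated for FIG. 26A follow from three nucleotides per encoded residue. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical, the His6 peptide is written out as six histidines, and the FLAG sequence is the one recited for FIG. 25. The sketch counts sense codons only and assumes no start or stop codon is added.

```python
def insert_length_nt(peptide):
    """Nucleotides needed to encode `peptide` in frame (3 nt per residue)."""
    valid = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
    unknown = set(peptide) - valid
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return 3 * len(peptide)


if __name__ == "__main__":
    for name, seq in (("His6", "HHHHHH"), ("FLAG", "DYKDDDDK")):
        print(name, insert_length_nt(seq), "nt")
```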
Fig. 27 shows an overview of guided editing to install a protective mutation in PRNP that prevents or halts the progression of prion disease. The pegRNA sequences correspond to residues 1-20 of SEQ ID NO: 810 on the left (i.e., 5' of the sgRNA scaffold) and residues 21-43 of SEQ ID NO: 810 on the right (i.e., 3' of the sgRNA scaffold).
FIG. 28A is a schematic of PE-based insertion of sequences encoding RNA motifs. Fig. 28B is a list (non-exhaustive) of some example motifs and their functions that may be inserted.
Fig. 29A is a depiction of a boot editor. FIG. 29B shows possible modifications to PE-directed genomic, plasmid or viral DNA. Fig. 29C shows an example scheme of inserting a peptide loop library into a defined protein (GFP in this case) via a pegRNA library. FIG. 29D shows examples of possible programmable deletions of codons or N-or C-terminal truncations of proteins using different pegRNAs. Deletions are expected to occur with minimal generation of frameshift mutations.
FIG. 30 shows a possible scheme for repeated insertion of codons in a continuous evolution system such as PACE.
FIG. 31 is a diagram showing an engineered gRNA core, an approximately 20nt spacer matched to a targeted gene sequence, a reverse transcription template with an immunogenic epitope nucleotide sequence, and a primer binding site matched to the targeted gene sequence.
FIG. 32 is a schematic diagram showing the use of guided editing as a means of inserting known immunogenic epitopes into endogenous or exogenous genomic DNA, resulting in modification of the corresponding protein.
FIG. 33 is a schematic showing pegRNA design for primer binding sequence insertion, and the insertion of primer binding sequences into genomic DNA using guided editing, for determining off-target editing. In this embodiment, guided editing is performed in living cells, tissues, or animal models. In the first step, an appropriate pegRNA is designed. The upper schematic shows an exemplary pegRNA that can be used in this regard. The spacer in the pegRNA (labeled "protospacer") is complementary to one of the strands of the genomic target. The PE-pegRNA complex (i.e., the PE complex) installs, at the nick site, a single-stranded 3' end flap containing the encoded primer binding sequence and a region of homology (encoded by the homology arm of the pegRNA) that is complementary to the region immediately downstream of the nick site (red). The synthesized strand is incorporated into the DNA through flap invasion and DNA repair/replication processes, thereby installing the primer binding site. This process may occur at the intended genomic target, but also at other genomic sites that may interact with the pegRNA in an off-target manner (i.e., the pegRNA directs the PE complex to off-target sites because the spacer region is complementary to genomic sites other than the intended genomic site). Thus, the primer binding sequence may be installed not only at the intended genomic target but also at off-target genomic sites elsewhere in the genome. To detect insertion of these primer binding sites at the intended genomic target site and at off-target genomic sites, the genomic DNA (post-PE) can be isolated, fragmented, and ligated to adapter nucleotides (shown in red). Next, PCR can be performed using PCR oligonucleotides that anneal to the adapters and to the inserted primer binding sequence, to amplify the on-target and off-target genomic DNA regions into which the primer binding site was inserted by PE.
High-throughput sequencing and sequence alignment can then be performed to identify the insertion points of the PE-inserted primer binding sequence at on-target or off-target sites.
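The final alignment step described for FIG. 33 can be illustrated with a toy Python sketch, assuming reads are plain strings and the reference is a single short sequence. The function name, the tag sequence, and the exact-match lookup are all hypothetical simplifications; an actual analysis would presumably use a dedicated read aligner and handle both strands and sequencing errors.

```python
def find_insertion_sites(reads, tag, reference, flank_len=12):
    """Return sorted reference coordinates where `tag` appears to be inserted.

    For each read containing the inserted sequence `tag`, the `flank_len`
    bases immediately 3' of the tag are looked up in `reference` by exact
    match. Only flanks that occur exactly once in the reference are kept.
    """
    sites = set()
    for read in reads:
        pos = read.find(tag)
        if pos == -1:
            continue  # read does not contain the inserted sequence
        start = pos + len(tag)
        flank = read[start:start + flank_len]
        if len(flank) < flank_len:
            continue  # too little genomic sequence after the tag to place it
        hit = reference.find(flank)
        if hit != -1 and reference.find(flank, hit + 1) == -1:
            sites.add(hit)  # coordinate of the first base 3' of the insertion
    return sorted(sites)


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGATCGATTACG"  # toy reference, not a real locus
    tag = "GATTACAG"                              # hypothetical inserted sequence
    read = reference[5:14] + tag + reference[14:26]
    print(find_insertion_sites([read, "ACGTACGT"], tag, reference))
```

Coordinates that do not correspond to the intended target would be candidate off-target insertion sites.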
FIG. 34 is a schematic diagram showing precise insertion of a gene using PE.
Fig. 35A is a schematic diagram showing the natural insulin signaling pathway. Fig. 35B is a schematic diagram showing FKBP12 tagged insulin receptor activation controlled by FK 1012.
Fig. 36 shows a small molecule monomer. Reference is made to: bumped FK506 mimic (2), ref. 107.
Fig. 37 shows small molecule dimers. Reference is made to: FK1012 (4), refs. 95, 96; FK1012 (5), ref. 108; FK1012 (6), ref. 107; AP1903 (7), ref. 107; cyclosporin A dimer (8), ref. 98; FK506-cyclosporin A dimer (FkCsA) (9), ref. 100.
Figures 38A to 38F provide an overview of guided editing and feasibility studies in vitro and in yeast cells. Figure 38A shows the 75,122 known pathogenic human genetic variants in ClinVar (accessed July 2019), classified by type. Fig. 38B shows a guided editing complex consisting of a guided editor (PE) protein, containing an RNA-guided DNA-nicking domain such as Cas9 nickase fused to an engineered reverse transcriptase domain, complexed with a guided editing guide RNA (pegRNA). The PE-pegRNA complex binds the target DNA site and enables a wide variety of precise DNA edits at a range of DNA positions before or after the protospacer adjacent motif (PAM) of the target site. FIG. 38C shows that, upon binding the DNA target, the PE-pegRNA complex nicks the PAM-containing DNA strand. The resulting free 3' end hybridizes to the primer binding site of the pegRNA. The reverse transcriptase domain catalyzes primer extension using the RT template of the pegRNA, yielding a newly synthesized DNA strand (3' flap) containing the desired edit. Equilibration between the edited 3' flap and the unedited 5' flap containing the original DNA, followed by cellular 5' flap cleavage and ligation and by DNA repair or replication to resolve the heteroduplex DNA, results in stably edited DNA. FIG. 38D shows in vitro 5'-extended pegRNA primer extension assays using a pre-nicked dsDNA substrate containing a 5'-Cy5-labeled PAM strand, dCas9, and a commercial M-MLV RT variant (RT, Superscript III). dCas9 was complexed with pegRNAs containing RT templates of different lengths and then added to the DNA substrate along with the indicated components. Reactions were incubated at 37 °C for 1 hour, then analyzed by denaturing urea PAGE and visualized for Cy5 fluorescence. FIG. 38E shows primer extension assays performed as in FIG. 38D using 3'-extended pegRNAs pre-complexed with dCas9 or Cas9 H840A nickase, and pre-nicked or non-nicked 5'-Cy5-labeled dsDNA substrates. FIG. 38F shows yeast colonies transformed with GFP-mCherry fusion reporter plasmids edited in vitro with pegRNA, Cas9 nickase, and RT. Plasmids containing nonsense or frameshift mutations between GFP and mCherry were edited using 5'-extended or 3'-extended pegRNAs that restore mCherry translation via a transversion mutation, a 1-bp insertion, or a 1-bp deletion. GFP and mCherry double-positive cells (yellow) reflect successful editing.
Fig. 39A to 39D show guided editing of genomic DNA by PE1 and PE2 in human cells. FIG. 39A shows that the pegRNA contains a spacer sequence, an sgRNA scaffold, and a 3' extension containing a primer binding site (green) and a reverse transcription (RT) template (purple), which contains the edited base(s) (red). The primer binding site hybridizes to the PAM-containing DNA strand immediately upstream of the nick site. Apart from the encoded edit, the RT template is homologous to the DNA sequence downstream of the nick. FIG. 39B shows installation of a T.A to A.T transversion edit at the HEK3 site in HEK293T cells using Cas9 H840A nickase fused to wild-type M-MLV reverse transcriptase (PE1) and pegRNAs with different primer binding site lengths. FIG. 39C shows that use of an engineered pentamutant M-MLV reverse transcriptase (D200N, L603W, T306K, W313F, T330P) in PE2 substantially improves guided editing transversion efficiency at five genomic loci in HEK293T cells, as well as small insertion and small deletion edits at HEK3. FIG. 39D is a comparison of PE2 editing efficiency with different RT template lengths at five genomic loci in HEK293T cells. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
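The pegRNA layout described for FIG. 39A can be sketched in Python as follows, assuming the PAM-containing strand is given 5' to 3' and the nick position is supplied by the caller. The function names, the toy sequence, and the 4-nt PBS are hypothetical and chosen only for illustration; sequences are written in the DNA alphabet (T in place of U), and this is not the design procedure of the disclosure.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def design_3prime_extension(pam_strand, nick_index, edited_downstream, pbs_len):
    """Return the 3' extension (5'->3') as RT template followed by PBS.

    pam_strand        : PAM-containing strand, 5'->3'
    nick_index        : index of the first base 3' of the nick
    edited_downstream : desired sequence 3' of the nick, including the edit
    pbs_len           : number of bases 5' of the nick that the PBS pairs with
    """
    if not 0 < pbs_len <= nick_index <= len(pam_strand):
        raise ValueError("PBS must fit between the strand start and the nick")
    # The PBS pairs with the PAM strand immediately upstream of the nick.
    pbs = revcomp(pam_strand[nick_index - pbs_len:nick_index])
    # The RT template is complementary to the edited downstream sequence.
    rt_template = revcomp(edited_downstream)
    # The PBS sits at the 3' end of the extension, after the RT template.
    return rt_template + pbs


if __name__ == "__main__":
    strand = "AAACCCGGGTTTACGATCGA"   # toy sequence, not a real locus
    # Downstream of the nick the toy strand reads TTTACG; encode TATACG instead.
    print(design_3prime_extension(strand, 10, "TATACG", pbs_len=4))
```

In this toy case the encoded flap matches the original downstream sequence except at the second position after the nick, which mirrors the statement that the RT template is homologous to the downstream DNA apart from the edit.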
Fig. 40A to 40C illustrate that the PE3 and PE3b systems nick the non-edited strand to increase guided editing efficiency. Fig. 40A is an overview of guided editing by PE3. After initial synthesis of the edited strand, DNA repair removes either the newly synthesized strand containing the edit (3' flap excision) or the original genomic DNA strand (5' flap excision). 5' flap excision leaves a DNA heteroduplex containing one edited strand and one non-edited strand. Mismatch repair machinery or DNA replication can resolve the heteroduplex to give either edited or non-edited products. Nicking the non-edited strand favors repair of that strand, preferentially generating stable duplex DNA containing the desired edit. FIG. 40B shows the effect of complementary-strand nicking on PE3-mediated guided editing efficiency and indel formation. "None" refers to a PE2 control that does not nick the complementary strand. FIG. 40C is a comparison of editing efficiency using PE2 (no complementary-strand nick), PE3 (general complementary-strand nick), and PE3b (edit-specific complementary-strand nick). All editing yields reflect the percentage of total sequencing reads that contain the intended edit and no indels, among all treated cells, without sorting. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
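The editing yield definition used for FIG. 40 (reads containing the intended edit and no indels, as a percentage of total reads) can be illustrated with a toy Python sketch. The function name and the classification rule are assumptions made for illustration: the sketch applies only to substitution edits, treats any change in read length as an indel, and requires an exact match to the edited amplicon.

```python
from collections import Counter


def summarize_reads(reads, reference, edited):
    """Return (percent_edited_without_indel, percent_indel) over all reads.

    `reference` and `edited` are full amplicon sequences of equal length,
    i.e. the intended edit is a substitution (toy assumption).
    """
    if len(reference) != len(edited):
        raise ValueError("this sketch handles substitution edits only")
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter()
    for read in reads:
        if len(read) != len(reference):
            counts["indel"] += 1   # toy criterion: length change means indel
        elif read == edited:
            counts["edit"] += 1    # intended edit present, no indel
        else:
            counts["other"] += 1   # unedited or other substitutions
    total = len(reads)
    return 100.0 * counts["edit"] / total, 100.0 * counts["indel"] / total


if __name__ == "__main__":
    ref, edit = "ACGTACGT", "ACGAACGT"          # made-up amplicons
    reads = [edit] * 3 + [ref] * 5 + ["ACGACGT"] * 2
    print(summarize_reads(reads, ref, edit))
```

The denominator is all reads rather than only reads without indels, which corresponds to the unsorted, all-treated-cells convention stated for these figures.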
Figures 41A to 41K show targeted insertions, deletions, and all 12 types of point mutation with PE3 at seven endogenous human genomic loci in HEK293T cells. FIG. 41A is a graph of all 12 types of single-nucleotide transition and transversion edits from position +1 to position +8 of the HEK3 site (counting the pegRNA-induced nick as lying between position +1 and position -1) using a 10-nt RT template. FIG. 41B is a graph of long-range PE3 transversion edits at the HEK3 site using a 34-nt RT template. FIGS. 41C to 41H are graphs of all 12 types of transition and transversion edits at different positions in the guided editing window of RNF2 (FIG. 41C), FANCF (FIG. 41D), EMX1 (FIG. 41E), RUNX1 (FIG. 41F), VEGFA (FIG. 41G), and DNMT1 (FIG. 41H). FIG. 41I is a graph showing targeted 1- and 3-bp insertions and 1- and 3-bp deletions with PE3 at seven endogenous genomic loci. FIG. 41J is a graph showing targeted precise deletions of 5 to 80 bp at the HEK3 target site. FIG. 41K is a graph showing combination edits of insertions and deletions, insertions and point mutations, deletions and point mutations, and double point mutations at three endogenous genomic loci. All editing yields reflect the percentage of total sequencing reads that contain the intended edit and no indels, among all treated cells, without sorting. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
Fig. 42A to 42H show a comparison of guided editing and base editing, and off-target editing by Cas9 and PE3 at known Cas9 off-target sites. FIG. 42A shows total C.G to T.A editing efficiency at the same target nucleotides for PE2, PE3, BE2max, and BE4max at the endogenous HEK3, FANCF, and EMX1 sites in HEK293T cells. Fig. 42B shows the indel frequency from the treatments in Fig. 42A. Fig. 42C shows the editing efficiency of precise C.G to T.A edits (without bystander edits or indels) for PE2, PE3, BE2max, and BE4max at HEK3, FANCF, and EMX1. For EMX1, precise PE combination edits of all possible combinations of C.G to T.A conversions at the three targeted nucleotides are also shown. Fig. 42D shows the total A.T to G.C editing efficiency for PE2, PE3, ABEdmax, and ABEmax at HEK3 and FANCF. Fig. 42E shows the precise A.T to G.C editing efficiency, without bystander edits or indels, at HEK3 and FANCF. Fig. 42F shows the indel frequency from the treatments in Fig. 42D. Figure 42G shows the average triplicate editing efficiencies (percentage of sequencing reads with indels) in HEK293T cells for Cas9 nuclease at four on-target sites and 16 known off-target sites. The 16 off-target sites examined were the top four previously reported off-target sites (refs. 118, 159) for each of the four on-target sites. For each on-target site, Cas9 was paired with an sgRNA or with each of four pegRNAs that recognize the same protospacer. Fig. 42H shows the average triplicate on-target and off-target editing efficiencies and indel efficiencies (below, in parentheses) in HEK293T cells for PE2 or PE3 paired with each pegRNA in Fig. 42G. On-target editing yields reflect the percentage of total sequencing reads that contain the intended edit and no indels, among all treated cells, without sorting. Off-target editing yields reflect off-target locus modifications consistent with guided editing. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
Fig. 43A to 43I show guided editing in different human cell lines and primary mouse cortical neurons, installation and correction of pathogenic transversion, insertion, or deletion mutations, and a comparison of guided editing and HDR. FIG. 43A is a graph showing installation (via a T.A to A.T transversion) and correction (via an A.T to T.A transversion) of the pathogenic E6V mutation in HBB in HEK293T cells. Correction either to wild-type HBB or to HBB containing a silent mutation that disrupts the pegRNA PAM is shown. FIG. 43B is a graph showing installation (via a 4-bp insertion) and correction (via a 4-bp deletion) of the pathogenic HEXA 1278+TATC allele in HEK293T cells. Correction either to wild-type HEXA or to HEXA containing a silent mutation that disrupts the pegRNA PAM is shown. FIG. 43C is a graph showing installation of the protective G127V variant in PRNP in HEK293T cells via a G.C to T.A transversion. Fig. 43D is a graph showing guided editing in other human cell lines, including K562 (leukemic bone marrow cells), U2OS (osteosarcoma cells), and HeLa (cervical cancer cells). FIG. 43E is a graph showing installation of a G.C to T.A transversion mutation in DNMT1 of mouse primary cortical neurons using a dual split-intein PE3 lentiviral system, in which the N-terminal half is Cas9 (1-573) fused to the N-intein and, through a P2A self-cleaving peptide, to GFP-KASH, and the C-terminal half is the C-intein fused to the remainder of PE2. The PE2 halves are expressed from a human synapsin promoter that is highly specific for mature neurons. Sorted values reflect editing or indels from GFP-positive nuclei, while unsorted values are from all nuclei. Fig. 43F is a comparison of PE3 and Cas9-mediated HDR editing efficiency at endogenous genomic loci in HEK293T cells. Fig. 43G is a comparison of PE3 and Cas9-mediated HDR editing efficiency at endogenous genomic loci in K562, U2OS, and HeLa cells. Fig. 43H is a comparison of indel byproduct generation by PE3 and Cas9-mediated HDR in HEK293T, K562, U2OS, and HeLa cells. FIG. 43I shows targeted insertion of a His6 tag (18 bp), a FLAG epitope tag (24 bp), or an extended LoxP site (44 bp) by PE3 in HEK293T cells. All editing yields reflect the percentage of total sequencing reads that contain the intended edit and no indels, among all treated cells. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
Fig. 44A to 44G show in vitro guided editing validation studies using fluorescently labeled DNA substrates. FIG. 44A shows electrophoretic mobility shift assays using dCas9, 5'-extended pegRNAs, and 5'-Cy5-labeled DNA substrates. pegRNAs 1 to 5 contain a 15-nt linker sequence between the spacer and the PBS (linker A for pegRNA 1 and linker B for pegRNAs 2 to 5), a 5-nt PBS sequence, and RT templates of 7 nt (pegRNAs 1 and 2), 8 nt (pegRNA 3), 15 nt (pegRNA 4), and 22 nt (pegRNA 5). The pegRNAs are those used in FIGS. 44E and 44F; the complete sequences are listed in Tables 2A to 2C. Figure 44B shows in vitro nicking assays of Cas9 H840A using 5'-extended and 3'-extended pegRNAs. Fig. 44C shows Cas9-mediated indel formation at HEK3 in HEK293T cells using 5'-extended and 3'-extended pegRNAs. Fig. 44D shows an overview of the in vitro biochemical assays of guided editing. 5'-Cy5-labeled pre-nicked and non-nicked dsDNA substrates were tested. sgRNAs, 5'-extended pegRNAs, or 3'-extended pegRNAs were pre-complexed with dCas9 or Cas9 H840A nickase and then combined with dsDNA substrate, M-MLV RT, and dNTPs. Reactions were allowed to proceed at 37 °C for 1 hour, then separated by denaturing urea PAGE and visualized by Cy5 fluorescence. Figure 44E shows that primer extension reactions using 5'-extended pegRNAs, pre-nicked DNA substrates, and dCas9 lead to significant conversion to RT products. FIG. 44F shows primer extension reactions using 5'-extended pegRNAs as in FIG. 44B, with non-nicked DNA substrate and Cas9 H840A nickase. Product yields are greatly reduced compared with pre-nicked substrate. FIG. 44G shows that an in vitro primer extension reaction using a 3'-extended pegRNA generates a single apparent product by denaturing urea PAGE. The RT product band was excised, eluted from the gel, and then subjected to homopolymer tailing with terminal transferase (TdT) using either dGTP or dATP.
The tailed products were extended with poly-T or poly-C primers, and the resulting DNA was sequenced. Sanger traces indicate that three nucleotides derived from the gRNA scaffold were reverse transcribed (added as the final 3' nucleotides of the DNA product). Note that in mammalian cell guided editing experiments, pegRNA scaffold insertion is much rarer than in vitro (FIGS. 56A to 56D), potentially due to the inability of the tethered reverse transcriptase to access the Cas9-bound guide RNA scaffold, and/or to cellular excision of mismatched 3' ends of 3' flaps containing pegRNA scaffold sequence.
FIGS. 45A to 45G show cellular repair in yeast of 3' DNA flaps from in vitro guided editing reactions. FIG. 45A shows that the dual fluorescent protein reporter plasmid contains GFP and mCherry open reading frames separated by a target site encoding an in-frame stop codon, a +1 frameshift, or a -1 frameshift. Guided editing reactions were carried out in vitro with Cas9 H840A nickase, pegRNA, dNTPs, and M-MLV reverse transcriptase, and then transformed into yeast. Colonies containing unedited plasmid produce GFP but not mCherry. Yeast colonies containing edited plasmid produce both GFP and mCherry as a fusion protein. FIG. 45B shows an overlay of GFP and mCherry fluorescence for yeast colonies transformed with reporter plasmids containing a stop codon between GFP and mCherry (unedited negative control, top), or containing no stop codon or frameshift between GFP and mCherry (pre-edited positive control, bottom). FIGS. 45C to 45F show visualization of mCherry and GFP fluorescence from yeast colonies transformed with in vitro guided editing reaction products. FIG. 45C shows stop codon correction via a T.A to A.T transversion using a 3'-extended pegRNA, or using a 5'-extended pegRNA as shown in FIG. 45D. FIG. 45E shows +1 frameshift correction via a 1-bp deletion using a 3'-extended pegRNA. FIG. 45F shows -1 frameshift correction via a 1-bp insertion using a 3'-extended pegRNA. FIG. 45G shows Sanger DNA sequencing traces from plasmids isolated from GFP-only colonies in FIG. 45B and from GFP and mCherry double-positive colonies in FIG. 45C.
FIGS. 46A to 46F show correct editing versus indel generation with PE1. FIG. 46A shows T·A to A·T transversion editing efficiency and indel generation by PE1 at position +1 of HEK3 using pegRNAs containing a 10-nt RT template and PBS sequences ranging from 8 to 17 nt. FIG. 46B shows G·C to T·A transversion editing efficiency and indel generation by PE1 at position +5 of EMX1 using pegRNAs containing a 13-nt RT template and PBS sequences ranging from 9 to 17 nt. FIG. 46C shows G·C to T·A transversion editing efficiency and indel generation by PE1 at position +5 of FANCF using pegRNAs containing a 17-nt RT template and PBS sequences ranging from 8 to 17 nt. FIG. 46D shows C·G to A·T transversion editing efficiency and indel generation by PE1 at position +1 of RNF2 using pegRNAs containing an 11-nt RT template and PBS sequences ranging from 9 to 17 nt. FIG. 46E shows G·C to T·A transversion editing efficiency and indel generation by PE1 at position +2 of HEK4 using pegRNAs containing a 13-nt RT template and PBS sequences ranging from 7 to 15 nt. FIG. 46F shows PE1-mediated +1 T deletion, +1 A insertion, and +1 CTT insertion at the HEK3 site using a 13-nt PBS and a 10-nt RT template. The pegRNA sequences are those used in FIG. 39C (see Tables 3A-3R). Values and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 47A through 47S show the evaluation of M-MLV RT variants for guided editing. FIG. 47A shows the abbreviations for the guided editor variants used in this figure. FIG. 47B shows targeted insertion and deletion edits with PE1 at the HEK3 locus. FIGS. 47C through 47H compare the ability of 18 guided editor constructs containing M-MLV RT variants to install a +2 G·C to C·G transversion edit at HEK3 (FIG. 47C), a 24-bp FLAG insertion at HEK3 (FIG. 47D), a +1 C·G to A·T transversion edit at RNF2 (FIG. 47E), a +1 G·C to C·G transversion edit at EMX1 (FIG. 47F), a +2 T·A to A·T transversion edit at HBB (FIG. 47G), and a +1 G·C to C·G transversion edit at FANCF (FIG. 47H). FIGS. 47I through 47N compare the ability of four guided editor constructs containing M-MLV variants to install the edits shown in FIGS. 47C through 47H in a second round of independent experiments. FIGS. 47O to 47S show PE2 editing efficiency at five genomic loci with different PBS lengths. FIG. 47O shows a +1 T·A to A·T change at HEK3. FIG. 47P shows a +5 G·C to T·A change at EMX1. FIG. 47Q shows a +5 G·C to T·A change at FANCF. FIG. 47R shows a +1 C·G to A·T change at RNF2. FIG. 47S shows a +2 G·C to T·A change at HEK4. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 48A to 48C show design features of pegRNA PBS and RT template sequences. FIG. 48A shows PE2-mediated +5 G·C to T·A transversion editing efficiency (blue line) at VEGFA in HEK293T cells as a function of RT template length. Indels (grey line) are plotted for comparison. The sequences below the graph show the last nucleotide templated for synthesis by the pegRNA. G nucleotides (templated by a C in the pegRNA) are highlighted; RT templates that end in C should be avoided during pegRNA design to maximize guided editing efficiency. FIG. 48B shows +5 G·C to T·A transversion editing and indels for DNMT, as in FIG. 48A. FIG. 48C shows +5 G·C to T·A transversion editing and indels for RUNX1, as in FIG. 48A. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 49A to 49B show the effect of PE2, PE2 R110S K103L, Cas9 H840A nickase, and dCas9 on cell viability. HEK293T cells were transfected with plasmids encoding PE2, PE2 R110S K103L, Cas9 H840A nickase, or dCas9, together with a HEK3-targeting pegRNA plasmid. Cell viability was measured every 24 hours after transfection for 3 days using the CellTiter-Glo 2.0 assay (Promega). FIG. 49A shows viability, as measured by luminescence, at 1, 2, or 3 days after transfection. Values and error bars reflect the mean and s.e.m. of three independent biological replicates, each performed in technical triplicate. FIG. 49B shows percent editing and indels for PE2, PE2 R110S K103L, Cas9 H840A nickase, or dCas9 together with a HEK3-targeting pegRNA plasmid encoding a +5 G to A edit. Editing efficiency was measured on day 3 after transfection in cells treated alongside those used to determine viability in FIG. 49A. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 50A to 50B show PE3-mediated HBB E6V correction and HEXA 1278+TATC correction by different pegRNAs. FIG. 50A shows a screen of 14 pegRNAs for correction of the HBB E6V allele with PE3 in HEK293T cells. All pegRNAs evaluated convert the HBB E6V allele back to wild-type HBB without introducing any silent PAM mutation. FIG. 50B shows a screen of 41 pegRNAs for correction of the HEXA 1278+TATC allele with PE3 or PE3b in HEK293T cells. The pegRNAs labeled HEXAs correct the pathogenic allele by a shifted 4-bp deletion that disrupts the PAM and leaves a silent mutation. The pegRNAs labeled HEXA correct the pathogenic allele back to wild type. Entries ending in "b" use an edit-specific nicking sgRNA in combination with the pegRNA (the PE3b system). Values and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 51A to 51F show PE3 activity in human cell lines and a comparison of PE3 with Cas9-initiated HDR. The efficiency of generating the correct edit (without indels) and the indel frequency are shown for PE3 and Cas9-initiated HDR in HEK293T cells (FIG. 51A), K562 cells (FIG. 51B), U2OS cells (FIG. 51C), and HeLa cells (FIG. 51D). Each bracketed editing comparison installs the same edit with PE3 and with Cas9-initiated HDR. Non-targeting controls are PE3 and a pegRNA that targets a non-target locus. FIG. 51E shows control experiments with non-targeting pegRNA + PE3 and with dCas9 + sgRNA, compared with wild-type Cas9 HDR experiments, confirming that the ssDNA donor HDR template (a common contaminant that artificially elevates apparent HDR efficiency) does not contribute to the HDR measurements in FIGS. 51A to 51D. FIG. 51F shows example HEK3 locus allele tables for genomic DNA samples isolated from K562 cells after editing with PE3 or with Cas9-initiated HDR. Alleles were sequenced on an Illumina MiSeq and analyzed using CRISPResso2 (ref. 178). The reference HEK3 sequence from this region is at the top. Allele tables are shown for a non-targeting pegRNA negative control, a +1 CTT insertion at HEK3 using PE3, and a +1 CTT insertion at HEK3 using Cas9-initiated HDR. Allele frequencies and the corresponding Illumina sequencing read counts are shown for each allele. All alleles observed at a frequency of ≥0.20% are shown. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 52A to 52D show the length distributions of pathogenic insertions, duplications, deletions, and indels in the ClinVar database. The ClinVar variant summary was downloaded from NCBI on July 15, 2019. The lengths of reported insertions, deletions, and duplications were calculated using the reference and alternate alleles, the variant start and stop positions, or the appropriate identifying information in the variant name. Variants that did not report any of the above information were excluded from the analysis. The lengths of reported indels (single variants that include both insertions and deletions relative to the reference genome) were calculated by determining the number of mismatches or gaps in the best alignment between the reference and alternate alleles.
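The two length calculations described for FIGS. 52A to 52D can be sketched as follows. This is a minimal illustration only, not the analysis code that was used: the function names are hypothetical, alleles are assumed to be plain nucleotide strings, and the "best alignment" is assumed to use unit costs for mismatches and gap positions (the scoring scheme is not specified in the text).

```python
def pure_indel_length(ref: str, alt: str) -> int:
    """Length of a simple insertion, deletion, or duplication,
    taken as the difference in allele lengths (illustrative assumption)."""
    return abs(len(alt) - len(ref))


def complex_indel_length(ref: str, alt: str) -> int:
    """Minimum number of mismatches plus gap positions over all global
    alignments of ref and alt (unit-cost edit distance; the scoring
    scheme is an assumption, not taken from the source)."""
    prev = list(range(len(alt) + 1))  # aligning an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] against an empty alt prefix
        for j, a in enumerate(alt, start=1):
            curr.append(min(
                prev[j] + 1,             # gap in alt
                curr[j - 1] + 1,         # gap in ref
                prev[j - 1] + (r != a),  # match or mismatch
            ))
        prev = curr
    return prev[-1]
```

For example, `pure_indel_length("A", "ATATC")` counts a 4-nt insertion, while `complex_indel_length` handles variants in which the reference and alternate alleles differ by both substitutions and length changes.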
FIGS. 53A to 53B show FACS gating examples for GFP-positive cell sorting. Below are examples of original batch analysis files outlining the sorting strategy used to generate the HEXA 1278+TATC and HBB E6V HEK293T cell lines. Image data were generated on a Sony LE-MA900 cytometer using Cell Sorter software v.3.0.5. Graphic 1 shows gating plots for cells that do not express GFP. Graphic 2 shows an example sort of P2A-GFP-expressing cells used to isolate the HBB E6V HEK293T cell line. HEK293T cells were initially gated on population using FSC-A/BSC-A (Gate A), then sorted for singlets using FSC-A/FSC-H (Gate B). Live cells were selected by gating on DAPI-negative cells (Gate C). Cells with GFP fluorescence levels above those of negative control cells were sorted using EGFP as the fluorochrome (Gate D). FIG. 53A shows HEK293T cells (GFP-negative). FIG. 53B shows a representative plot of FACS gating for cells expressing PE2-P2A-GFP. FIG. 53C shows the genotype of HEK293T cells homozygous for HEXA 1278+TATC. FIG. 53D shows the allele table for the HEK293T cell line homozygous for HBB E6V.
FIG. 54 is a schematic diagram summarizing the pegRNA cloning procedure.
FIGS. 55A to 55G are schematic illustrations of pegRNA designs. Fig. 55A shows a simplified diagram of pegRNA, where the domain is labeled (left) and binds to nCas9 at the genomic site (right). FIG. 55B shows various types of modifications to pegRNA that can increase activity. FIG. 55C shows modification of pegRNA to increase transcription of longer RNA via promoter selection and 5', 3' processing and termination. Fig. 55D shows an extension of the P1 system, which is an example of scaffold modification. Fig. 55E shows that incorporation of synthetic modifications within the template region or elsewhere within the pegRNA can increase activity. FIG. 55F shows that the incorporation of a minimal secondary structure within the design template can prevent the formation of longer, more inhibitory secondary structures. FIG. 55G shows a fragmented pegRNA with a second template sequence anchored by the 3' terminal (left) RNA element of the pegRNA. Incorporation of elements at the 5 'or 3' end of the pegRNA can enhance RT binding.
Fig. 56A to 56D show the incorporation of the pegRNA scaffold sequence into the target locus. HTS data with pegRNA scaffold sequence insertion were analyzed as described in fig. 60A-60B. Fig. 56A shows analysis of EMX1 loci. The total sequencing read percentage containing one or more pegRNA scaffold sequence nucleotides in the insertion adjacent to the RT template is shown (left); percent total sequencing reads (middle) containing specified length pegRNA scaffold sequence insertions; and the cumulative total percent of pegRNA insertions up to and including the specified length on the X axis. Fig. 56B shows the same as fig. 56A except for FANCF. Fig. 56C shows the same as fig. 56A except HEK 3. FIG. 56D shows the same as FIG. 56A except for RNF 2. Values and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 57A through 57I show the effects of PE2, PE2-dRT, and Cas9 H840A nickase on transcriptome-wide RNA abundance. Total cellular RNA, depleted of ribosomal RNA, was analyzed from HEK293T cells expressing PE2, PE2-dRT, or Cas9 H840A nickase and a PRNP-targeting or HEXA-targeting pegRNA. RNAs corresponding to 14,410 genes and 14,368 genes were detected in the PRNP and HEXA samples, respectively. FIGS. 57A to 57F show volcano plots of the -log10 FDR-adjusted p-value versus the log2 fold change in abundance for each RNA transcript, comparing (FIG. 57A) PE2 versus PE2-dRT with PRNP-targeting pegRNA, (FIG. 57B) PE2 versus Cas9 H840A with PRNP-targeting pegRNA, (FIG. 57C) PE2-dRT versus Cas9 H840A with PRNP-targeting pegRNA, (FIG. 57D) PE2 versus PE2-dRT with HEXA-targeting pegRNA, (FIG. 57E) PE2 versus Cas9 H840A with HEXA-targeting pegRNA, and (FIG. 57F) PE2-dRT versus Cas9 H840A with HEXA-targeting pegRNA. Red dots represent genes that show a ≥2-fold change in relative abundance that is statistically significant (FDR-adjusted p < 0.05). FIGS. 57G through 57I are Venn diagrams of upregulated and downregulated transcripts (≥2-fold change), comparing PRNP and HEXA samples for (FIG. 57G) PE2 versus PE2-dRT, (FIG. 57H) PE2 versus Cas9 H840A, and (FIG. 57I) PE2-dRT versus Cas9 H840A.
FIG. 58 shows representative FACS gating for sorting neuronal nuclei. Nuclei were gated sequentially on the DyeCycle Ruby signal, the FSC/SSC ratio, the SSC-width/SSC-height ratio, and the GFP/DyeCycle ratio.
FIGS. 59A to 59F show a protocol for cloning 3'-extended pegRNAs into a mammalian U6 expression vector by Golden Gate assembly. FIG. 59A shows an overview of the cloning. FIG. 59B shows "Step 1: pU6-pegRNA-GG-vector plasmid (module 1)". FIG. 59C shows "Steps 2 and 3: order and anneal the oligonucleotide parts (modules 2, 3 and 4)". FIG. 59D shows "Step 2.b.ii.: sgRNA scaffold phosphorylation (not required if phosphorylated oligonucleotides were purchased)". FIG. 59E shows "Step 4: pegRNA assembly". FIG. 59F shows "Steps 5 and 6: transformation of the assembled plasmid". FIG. 59G shows a diagram summarizing the pegRNA cloning protocol.
FIGS. 60A to 60B show a Python script for quantifying pegRNA scaffold integration. A custom Python script was generated to characterize and quantify pegRNA insertions at target genomic loci. The script iteratively matches text strings of increasing length, taken from a reference sequence (the guide RNA scaffold sequence), against the sequencing reads in a fastq file and counts the number of sequencing reads that match the search query. Each successive text string corresponds to one additional nucleotide of the guide RNA scaffold sequence. Integration of an exact length and cumulative integration up to a specified length are calculated in this manner. The beginning of the reference sequence includes 5 to 6 bases of the 3' end of the new DNA strand synthesized by the reverse transcriptase, to ensure alignment and accurate counting of short fragments of the sgRNA.
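The counting strategy described for FIGS. 60A to 60B can be sketched roughly as below. This is not the script shown in the figures: the function names, the way exact-length and cumulative counts are derived from the "at least k nucleotides" counts, and the assumption that reads, anchor, and scaffold-derived sequence are all given in the same orientation are illustrative choices only.

```python
from itertools import accumulate
from typing import Iterable, Iterator, List, Tuple


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line fastq record."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip().upper()


def scaffold_insertion_counts(
    reads: Iterable[str],
    anchor: str,        # 5-6 nt from the 3' end of the RT-templated DNA
    scaffold_dna: str,  # scaffold-derived sequence, in order of incorporation
    max_len: int,
) -> Tuple[List[int], List[int]]:
    """Return (exact, cumulative) read counts for insertion lengths 1..max_len."""
    max_len = min(max_len, len(scaffold_dna))
    # at_least[k] = reads containing anchor + first k scaffold nucleotides;
    # the extra trailing slot stays 0 so the last bucket needs no special case.
    at_least = [0] * (max_len + 2)
    for read in reads:
        for k in range(1, max_len + 1):
            if anchor + scaffold_dna[:k] in read:
                at_least[k] += 1
            else:
                break  # longer queries cannot match if a shorter one failed
    # Reads with exactly k scaffold nucleotides match query k but not k + 1.
    # Note: the final bucket (k == max_len) also absorbs any longer insertions.
    exact = [at_least[k] - at_least[k + 1] for k in range(1, max_len + 1)]
    cumulative = list(accumulate(exact))
    return exact, cumulative
```

In use, `scaffold_insertion_counts(read_fastq_sequences("sample.fastq"), anchor, scaffold_dna, 20)` would return per-length and cumulative counts that can be divided by the total read count to give percentages; the file name here is a placeholder.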
FIG. 61 is a graph showing the percentage of total sequencing reads with the specified edit for SaCas9(N580A)-MMLV RT at HEK3 +6 C>A. Values for the correct edit and for indels are shown.
FIGS. 62A to 62B show the importance of the pre-spacer (protospacer) for efficiently installing a desired edit at a precise position with guided editing. FIG. 62A is a graph showing the percentage of total sequencing reads with the target T·A base pair converted to A·T for each HEK3 locus. FIG. 62B shows a sequence analysis of the same.
FIG. 63 is a graph showing PAM editing by SpCas9 PAM variants (n=3). The percentage of total sequencing reads with the targeted PAM edit is shown for SpCas9(H840A)-VRQR-MMLV RT, where NGA > NTA, and for SpCas9(H840A)-VRER-MMLV RT, where NGCG > NTCG. The pegRNA primer binding site (PBS) length, RT template (RT) length, and PE system used are listed.
FIG. 64 is a schematic diagram showing the introduction of individual site-specific recombinase (SSR) targets into the genome using PE. (a) A general schematic of insertion of a recombinase target sequence through a guided editor is provided. (b) Showing how a single SSR target of PE insertion can be used as a site for genomic integration of a DNA donor template. (c) It is shown how tandem insertion of SSR target sites can be used to delete portions of the genome. (d) It is shown how tandem insertion of SSR target sites can be used to invert portions of the genome. (e) It is shown how insertion of two SSR target sites in two distal chromosomal regions can lead to chromosomal translocation. (f) It is shown how the insertion of two different SSR target sites in the genome can be used to exchange cassettes from DNA donor templates.
Figure 65 shows 1) PE-mediated synthesis of SSR target sites in the human cell genome and 2) integration of DNA donor templates comprising GFP expression markers using the SSR target sites. Once successfully integrated, GFP causes cells to fluoresce.
FIG. 66 depicts one embodiment of a guided editor provided as two PE half proteins that reassemble into a complete guided editor through self-splicing of the split-intein halves located at the end or beginning of each guided editor half protein.
FIG. 67 depicts the mechanism by which an intein is removed from a polypeptide sequence and a peptide bond is re-formed between the N-terminal and C-terminal extein sequences. (a) depicts the general mechanism for two half proteins, each containing half of an intein sequence, which upon contact within a cell form a fully functional intein that then undergoes self-splicing and excision. The excision process results in the formation of a peptide bond between the N-terminal protein half (or "N-extein") and the C-terminal protein half (or "C-extein") to form a complete single polypeptide comprising both the N-extein and C-extein portions. In various embodiments, the N-extein may correspond to the N-terminal half of a split guided editor, and the C-extein may correspond to the C-terminal half of the split guided editor. (b) shows the chemical mechanism of intein excision and re-formation of the peptide bond joining the N-extein half (red half) and the C-extein half (blue half). Excision of split inteins (i.e., the N-intein and the C-intein in a split-intein construct) may also be referred to as "trans-splicing" because it involves splicing of two separate components provided in trans.
FIG. 68A shows that delivering both split-intein halves of SpPE (SEQ ID NO: 383), split at the linker, maintains activity at three test loci when co-transfected into HEK293T cells.
FIG. 68B shows that delivering both split-intein halves of SaPE2 (e.g., SEQ ID NO: 394 and SEQ ID NO: 395) recapitulates the activity of full-length SaPE2 (SEQ ID NO: 33) when co-transfected into HEK293T cells. The residues shown in quotation marks are the sequence of amino acids 741-743 in SaCas9 (the first residues of the C-terminal extein), which are important for the intein trans-splicing reaction. "SMP" is the native sequence, which we also mutated to the "CFN" consensus splicing sequence. The consensus sequence is shown to give the highest reconstitution, as measured by percent guided editing.
Fig. 68C provides data showing that various disclosed PE ribonucleoprotein complexes (high concentration PE2, high concentration PE3, and low concentration PE 3) can be delivered in this manner.
FIG. 69 shows phage plaque assay to determine PE availability in PANCE. Plaques (black circles) indicate that phage can successfully infect E.coli. An increase in L-rhamnose concentration results in an increase in PE expression and an increase in plaque formation. Sequencing of plaques revealed the presence of PE-installed genome editing.
FIGS. 70A to 70I provide an example of editing a target sequence to illustrate step-by-step instructions for designing pegRNAs and nicking sgRNAs for guided editing. FIG. 70A: Step 1, define the target sequence and the edit. Retrieve the sequence of the target DNA region (about 200 bp) centered on the position of the desired edit (point mutation, insertion, deletion, or a combination thereof). FIG. 70B: Step 2, locate target PAMs. Identify PAMs near the edit position. Be sure to look for PAMs on both strands. Although PAMs close to the edit position are preferred, edits can also be installed using pre-spacers and PAMs that place the pegRNA-induced nick ≥30 nt from the edit position. FIG. 70C: Step 3, locate the nick sites. For each PAM under consideration, identify the corresponding nick site. For SpCas9 H840A nickase, cleavage occurs in the PAM-containing strand between the 3rd and 4th bases 5' of the NGG PAM. All edited nucleotides must lie 3' of the nick site, so a suitable PAM must place the nick 5' of the target edit on the PAM-containing strand. In the example shown, there are two possible PAMs. For simplicity, the remaining steps show the design of a pegRNA using PAM 1 only. FIG. 70D: Step 4, design the spacer sequence. The pre-spacer of SpCas9 corresponds to the 20 nucleotides 5' of the NGG PAM on the PAM-containing strand. Efficient Pol III transcription initiation requires G as the first transcribed nucleotide. If the first nucleotide of the pre-spacer is G, the spacer sequence of the pegRNA is simply the pre-spacer sequence. If the first nucleotide of the pre-spacer is not G, the spacer sequence of the pegRNA is G followed by the pre-spacer sequence. FIG. 70E: Step 5, design the primer binding site (PBS). Using the starting allele sequence, identify the DNA primer on the PAM-containing strand. The 3' end of the DNA primer is the nucleotide just upstream of the nick site (i.e., the 4th base 5' of the NGG PAM for SpCas9). 
As a general design principle for use with PE2 and PE3, a pegRNA primer binding site (PBS) containing 12 to 13 nucleotides of complementarity to the DNA primer can be used for sequences with about 40-60% GC content. For sequences with lower GC content, longer (14 to 15 nt) PBSs should be tested. For sequences with higher GC content, shorter (8 to 11 nt) PBSs should be tested. Regardless of GC content, the optimal PBS sequence should be determined empirically. To design a PBS sequence of length p, use the starting allele sequence and take the reverse complement of the first p nucleotides 5' of the nick site in the PAM-containing strand. FIG. 70F: Step 6, design the RT template. The RT template encodes the designed edit and homology to the sequence adjacent to the edit. The optimal RT template length varies with the target site. For short-range edits (positions +1 to +6), it is recommended to test short (9 to 12 nt), medium (13 to 16 nt), and long (17 to 20 nt) RT templates. For long-range edits (positions +7 and beyond), it is recommended to use RT templates that extend at least 5 nt (e.g., 10 nt or more) past the position of the edit to allow for sufficient 3' DNA flap homology. For long-range edits, several RT templates should be screened to identify functional designs. For larger insertions and deletions (≥5 nt), incorporating greater 3' homology (about 20 nt or more) into the RT template is recommended. Editing efficiency is typically impaired when the RT template encodes the synthesis of a G as the last nucleotide in the reverse-transcribed DNA product (corresponding to a C in the RT template of the pegRNA). Because many RT templates support efficient guided editing, avoiding G as the final synthesized nucleotide is recommended when designing RT templates. To design an RT template sequence of length r, use the desired allele sequence and take the reverse complement of the first r nucleotides 3' of the nick site in the original PAM-containing strand. 
Note that, compared with SNP edits, insertion or deletion edits made with RT templates of the same length will not contain identical homology. FIG. 70G: Step 7, assemble the full pegRNA sequence. Concatenate the pegRNA components in the following order (5' to 3'): spacer, scaffold, RT template, and PBS. FIG. 70H: Step 8, design nicking sgRNAs for PE3. Identify PAMs on the non-edited strand upstream and downstream of the edit. Optimal nicking positions are highly locus-dependent and should be determined empirically. In general, nicks positioned 40 to 90 nucleotides 5' of the position across from the pegRNA-induced nick lead to higher editing yields and fewer indels. A nicking sgRNA has a spacer sequence that matches the 20-nt pre-spacer in the starting allele, with a 5' G added if the pre-spacer does not begin with G. FIG. 70I: Step 9, design PE3b nicking sgRNAs. If a PAM exists in the complementary strand and its corresponding pre-spacer overlaps with the sequence targeted for editing, the edit may be a candidate for the PE3b system. In the PE3b system, the spacer sequence of the nicking sgRNA matches the sequence of the desired edited allele but not the starting allele. The PE3b system operates efficiently when the edited nucleotides fall within the seed region (approximately the 10 nt adjacent to the PAM) of the nicking sgRNA pre-spacer. This prevents nicking of the complementary strand until after the edited strand has been installed, which prevents the pegRNA and the sgRNA from competing to bind the target DNA. PE3b also avoids generating simultaneous nicks on both strands, significantly reducing indel formation while maintaining high editing efficiency. PE3b sgRNAs should have a spacer sequence that matches the 20-nt pre-spacer in the desired allele, with a 5' G added if needed.
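Steps 3 to 7 above can be sketched in a few lines of Python. This is an illustrative sketch only, under stated assumptions: all function names are hypothetical, sequences are uppercase DNA strings of the PAM-containing strand written 5' to 3', `pam_start` is the 0-based index of the N in an NGG PAM, the scaffold is supplied by the caller, and the PBS length ranges simply restate the GC-content guidance given in the text. Optimal PBS and RT template lengths still need to be determined empirically.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def nick_index(pam_start: int) -> int:
    """Nick lies between the 3rd and 4th bases 5' of the NGG PAM:
    bases [..nick-1] are 5' of the nick, bases [nick..] are 3' of it."""
    return pam_start - 3


def design_spacer(strand: str, pam_start: int) -> str:
    """Step 4: 20-nt pre-spacer 5' of the PAM, with a 5' G prepended if needed."""
    if pam_start < 20 or strand[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("pam_start must point at an NGG PAM with 20 nt upstream")
    pre_spacer = strand[pam_start - 20:pam_start]
    return pre_spacer if pre_spacer.startswith("G") else "G" + pre_spacer


def suggest_pbs_lengths(primer_region: str) -> range:
    """Step 5 heuristic from the text: PBS lengths to test, by GC content."""
    gc = sum(base in "GC" for base in primer_region) / len(primer_region)
    if gc < 0.40:
        return range(14, 16)  # lower GC: test longer PBSs (14-15 nt)
    if gc > 0.60:
        return range(8, 12)   # higher GC: test shorter PBSs (8-11 nt)
    return range(12, 14)      # about 40-60% GC: 12-13 nt


def design_pbs(start_allele: str, pam_start: int, p: int) -> str:
    """Step 5: reverse complement of the p nucleotides 5' of the nick."""
    nick = nick_index(pam_start)
    return reverse_complement(start_allele[nick - p:nick])


def design_rt_template(desired_allele: str, pam_start: int, r: int) -> str:
    """Step 6: reverse complement of the r nucleotides 3' of the nick,
    taken from the desired (edited) allele."""
    nick = nick_index(pam_start)
    return reverse_complement(desired_allele[nick:nick + r])


def last_synthesized_is_g(rt_template: str) -> bool:
    """A 5'-terminal C in the RT template templates a final G, which the
    text recommends avoiding."""
    return rt_template.startswith("C")


def assemble_pegrna(spacer: str, scaffold: str, rt_template: str, pbs: str) -> str:
    """Step 7: 5'-spacer, scaffold, RT template, PBS-3' (DNA alphabet)."""
    return spacer + scaffold + rt_template + pbs
```

Because all edited nucleotides lie 3' of the nick, the starting and desired alleles share the same sequence 5' of the nick, so the same `pam_start` can be used for both `design_pbs` and `design_rt_template` in this sketch. A nicking sgRNA spacer for PE3 could be obtained by applying `design_spacer` to the complementary strand.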
FIG. 71A shows the nucleotide sequence of a SpCas9 pegRNA molecule (top) that terminates at the 3' end in "UUU" and does not contain a toeloop element. The lower part of the figure depicts the same SpCas9 pegRNA molecule, further modified to contain a toeloop element having the sequence 5'-"GAAANNNNN"-3' inserted immediately before the "UUU" 3' end. "N" can be any nucleobase.
FIG. 71B shows the results of Example 3, which demonstrate that using a pegRNA containing a toeloop element increases the efficiency of guided editing in HEK cells or EMX cells, while the percentage of indel formation is largely unchanged.
FIG. 72 depicts an alternative pegRNA configuration that can be used in guided editing. (a) depicts a PE2:pegRNA embodiment that directs editing. This embodiment relates to PE2 (fusion protein comprising Cas9 and reverse transcriptase) complexed with pegRNA (also as described in FIGS. 1A-1I and/or FIGS. 3A-3E). In this embodiment, the template for reverse transcription is incorporated into the 3' extension arm on the sgRNA to make the pegRNA, and the DNA polymerase is a Reverse Transcriptase (RT) fused directly to Cas 9. (b) depicts an MS2cp-PE2: sgRNA+tPERT embodiment. This embodiment includes a PE2 fusion (Cas 9+ reverse transcriptase) that further fuses with MS2 phage coat protein (MS 2 cp) to form a MS2cp-PE2 fusion protein. To achieve guided editing, the MS2cp-PE2 fusion protein is complexed with an sgRNA that targets the complex to a specific target site in DNA. This embodiment then involves introducing a trans-guide editing RNA template ("tPERT") that replaces pegRNA operation by providing a Primer Binding Site (PBS) and a DNA synthesis template on separate molecules (i.e., tPERT) that are also equipped with an MS2 aptamer (stem loop). The MS2cp protein recruits tPERT by binding to the molecular MS2 aptamer. (c) Alternative designs of pegRNA are depicted which can be achieved by known methods of nucleic acid molecule chemical synthesis. For example, chemical synthesis may be used to synthesize hybrid RNA/DNA pegRNA molecules for use in guided editing, wherein the extension arm of the hybrid pegRNA is DNA rather than RNA. In such embodiments, a DNA-dependent DNA polymerase can be used in place of reverse transcriptase to synthesize a 3' DNA flap comprising the desired genetic change formed by guided editing. In another embodiment, an extension arm can be synthesized to include a chemical linker that prevents DNA polymerase (e.g., reverse transcriptase) from using the sgRNA scaffold or backbone as a template. 
In yet another embodiment, the extension arm may comprise a DNA synthesis template having an opposite orientation relative to the overall orientation of the pegRNA molecule. For example, as shown for pegRNA in the 5' to 3' orientation and with extension attached to the 3' end of the sgRNA scaffold, the DNA synthesis template is in the opposite reverse orientation, i.e. 3' to 5' orientation. This embodiment may be advantageous for pegRNA embodiments having an extension arm located at the 3' end of the gRNA. By reversing the orientation of the extension arm, DNA synthesis by the polymerase (e.g., reverse transcriptase) will terminate upon reaching the newly oriented 5' end of the extension arm, thus there is no risk of using the gRNA core as a template.
FIG. 73 shows guided editing with tPERTs and the MS2 recruitment system (also known as the MS2 tagging technique). An sgRNA that targets the guided editor protein (PE2) to the target locus is expressed in combination with a tPERT containing a primer binding site (13-nt or 17-nt PBS), an RT template encoding a His6 tag insertion and a homology arm, and an MS2 aptamer (located at the 5' or 3' end of the tPERT molecule). Either the guided editor protein (PE2) or a fusion of MS2cp to the N-terminus of PE2 was used. Editing was carried out with or without a complementary-strand nicking sgRNA, as in the previously developed PE3 system (designated on the x-axis as "PE2 + nick" or "PE2", respectively). This is also referred to and defined herein as "second-strand nicking".
FIG. 74 shows that reverse transcriptase can be expressed in trans and recruited through the MS2 aptamer system. The pegRNA contains an MS2 RNA aptamer inserted into one of the two sgRNA scaffold hairpins. Wild-type M-MLV reverse transcriptase is expressed as an N-terminal or C-terminal fusion with the MS2 coat protein (MCP). Editing was performed at the HEK3 site in HEK293T cells.
FIG. 75 provides a bar graph comparing the efficiency (i.e., "percentage of total sequencing reads with the specified edit or indels") of PE2, PE2-truncation, PE3, and PE3-truncation at different target sites in different cell lines. The data indicate that guided editors comprising truncated RT variants are about as efficient as guided editors comprising non-truncated RT proteins.
FIG. 76 shows the editing efficiency of intein-split guided editors. HEK293T cells were transfected with plasmids encoding full-length PE2 or intein-split PE2, pegRNA, and nicking guide RNA. The consensus sequence (the most amino-terminal residues of the C-terminal extein) is indicated. Percent editing at two sites is shown: HEK3 +1 CTT insertion and PRNP +6 G to T. Replicates are n=3 independent transfections.
FIG. 77 shows the editing efficiency of intein-split guided editors. Editing was assessed by targeted deep sequencing in bulk cortex and in the GFP+ subpopulation after delivery, by ICV injection into P0 mice, of 5E10 vg per SpPE3 half and a small amount (1E10) of nuclear-localized GFP:KASH. The editors and GFP were packaged in AAV9 with the EFS promoter. Mice were harvested three weeks after injection, and GFP+ nuclei were isolated by flow cytometry. Individual data points are shown, with 1-2 mice analyzed per condition.
FIG. 78 shows the editing efficiency of intein-split guided editors. Specifically, the figure depicts the AAV split-SpPE3 constructs. Co-transduction with AAV particles separately expressing SpPE3-N and SpPE3-C recapitulates PE3 activity. Note that the N-terminal genome contains a U6-sgRNA cassette expressing the nicking sgRNA, and the C-terminal genome contains a U6-pegRNA cassette expressing the pegRNA.
FIG. 79 shows the editing efficiency of certain optimized linkers. In particular, the data show the editing efficiency of the PE2 construct with the current linker (labeled PE2, white) compared with various versions in which the linker is replaced with the sequences indicated, at the HEK3, EMX1, FANCF, and RNF2 loci, for representative pegRNAs for transition, transversion, insertion, and deletion edits. The replacement linkers are referred to as "1xSGGS" (SEQ ID NO: 8), "2xSGGS" (SEQ ID NO: 9), "3xSGGS" (SEQ ID NO: 10), "1xXTEN" (SEQ ID NO: 11), "no linker", "1xGly", "1xPro", "1xEAAAK" (SEQ ID NO: 12), "2xEAAAK" (SEQ ID NO: 13), and "3xEAAAK" (SEQ ID NO: 14). Editing efficiency is shown as a bar graph relative to the "control" editing efficiency of PE2. The linker for PE2 is SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 11). All edits were made in the context of the PE3 system, i.e., the PE2 editing construct plus the optimal secondary nicking sgRNA.
FIG. 80 plots the average fold efficacy relative to PE2, indicating that the 1xXTEN linker sequence increases editing efficiency by 1.14-fold on average (n=15).
FIG. 81 depicts transcript levels of pegRNA from different promoters as described in example 2.
FIG. 82 shows the effect of different types of modifications to the pegRNA structure on editing efficiency relative to unmodified pegRNA, as shown in example 2.
FIG. 83 depicts a PE experiment targeting editing of the HEK3 gene, in particular targeting a 10-nt insertion at position +1 relative to the nick site and using PE3. See example 2.
FIG. 84A depicts the structure of a tRNA that can be used to modify the pegRNA structure. See example 2. The length of P1 may be variable. P1 can be extended to help prevent RNase P processing of the pegRNA-tRNA fusion.
FIG. 84B depicts an exemplary pegRNA having a spacer, a gRNA core, and an extension arm (RT template + primer binding site), modified with a tRNA molecule at the 3' end of the pegRNA and coupled via a UCU linker. The tRNA includes various post-transcriptional modifications. However, these modifications are not required. See example 2.
Figure 85 depicts a PE experiment targeting editing of the FANCF gene, in particular targeting G-to-T conversion at position +5 relative to the nick site and using a PE3 construct. See example 2.
FIG. 86 depicts a PE experiment targeting editing of the HEK3 gene, in particular targeting insertion of a 71-nt FLAG tag at position +1 relative to the nick site and using PE3 constructs. See example 2.
FIG. 87 shows results of a screen in N2A cells in which pegRNAs install the 1412Adel edit, with details of primer binding site (PBS) length and reverse transcriptase (RT) template length (shown with and without indels).
FIG. 88 shows results of a screen in N2A cells in which pegRNAs install the 1412Adel edit, with details of primer binding site (PBS) length and reverse transcriptase (RT) template length (shown with and without indels).
FIG. 89 depicts the results of editing at a proxy locus in the β-globin gene and at HEK3 in healthy HSCs, with varying concentrations of editor relative to pegRNA and nicking gRNA.
FIG. 90A shows RT-qPCR data indicating that PCR amplicons 3 and 6 amplify with the same efficiency as amplicons consisting of spacer and scaffold regions of pegRNA using in vitro transcribed undegraded and full length pegRNA. Amplicon 3 contained the template region of the pegRNA, while amplicon 6 contained PBS of the pegRNA. Bars are averages of 3 technical replicates.
FIG. 90B shows RT-qPCR data indicating a decrease in the abundance of the pegRNA template and PBS, particularly the PBS, after extraction from cells, compared to in vitro transcribed pegRNA subjected to the same extraction process.
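The comparison described for FIGS. 90A-90B can be illustrated with a minimal sketch of a relative-quantification calculation. This is not the analysis used in the disclosure: the function names, the Ct values, and the assumption that every amplicon amplifies with an efficiency of 2 per cycle are all hypothetical and serve only to show how the abundance of a 3' amplicon (template or PBS) might be expressed relative to the scaffold amplicon and then compared against an in vitro transcribed standard.

```python
def relative_abundance(ct_target, ct_reference, efficiency=2.0):
    """Abundance of a target amplicon relative to a reference amplicon
    in the same sample (assumes equal amplification efficiency)."""
    return efficiency ** (ct_reference - ct_target)


def fold_change_vs_standard(ct_target_cell, ct_ref_cell,
                            ct_target_ivt, ct_ref_ivt, efficiency=2.0):
    """Relative abundance in the cell-extracted sample divided by the
    relative abundance in the in vitro transcribed (IVT) standard."""
    cell = relative_abundance(ct_target_cell, ct_ref_cell, efficiency)
    ivt = relative_abundance(ct_target_ivt, ct_ref_ivt, efficiency)
    return cell / ivt


def mean(values):
    """Arithmetic mean of technical replicates."""
    return sum(values) / len(values)


# Hypothetical Ct values (three technical replicates each), not real data.
scaffold_cells = [20.0, 20.1, 19.9]   # spacer + scaffold amplicon, cells
pbs_cells = [23.0, 23.1, 22.9]        # PBS amplicon, cells
scaffold_ivt = [15.0, 15.0, 15.0]     # spacer + scaffold amplicon, IVT
pbs_ivt = [15.0, 15.0, 15.0]          # PBS amplicon, IVT

fc = fold_change_vs_standard(mean(pbs_cells), mean(scaffold_cells),
                             mean(pbs_ivt), mean(scaffold_ivt))
print(f"PBS amplicon abundance relative to IVT standard: {fc:.3f}")
```

With these made-up values the PBS amplicon sits three cycles behind the scaffold amplicon in cells but not in the standard, so the sketch reports roughly one eighth of the expected abundance, the kind of 3' loss the figure describes.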
FIG. 90C provides template amplicon and PBS amplicon sequences corresponding to amplicon 3 and 6, respectively, of FIG. 90B.
FIGS. 91A-91D provide results of the effect of scaffold modifications on pegRNA activity for the edits +1 FLAG at HEK3 (FIG. 91A), +5 G-to-T at RNF2 (FIG. 91B), +5 G-to-T at DNMT1 (FIG. 91C) and +5 G-to-T at EMX1 (FIG. 91D). Modifications to P1, P2 and P3 of the scaffold broadly abolish activity. Modification of the direct repeat may increase activity.
FIGS. 92A-92C provide +1 FLAG insertion editing at the HEK3 (FIG. 92A), RNF2 (FIG. 92B) and RUNX1 (FIG. 92C) loci. As noted, the pegRNAs include structural motifs and linkers. If no linker length is given, the length is 8.
FIGS. 93A-93H show PE at the indicated sites with the indicated edits (a deletion, "del", or a point mutation, as given in each title) using pegRNAs of various template lengths (15 nucleotides ("template 15"), 25 nucleotides ("template 25") and 35 nucleotides ("template 35")) that have a 3' structural motif (linker=8; motif = linker+evopreQ1-1 or linker+MMLV-pknot), a linker alone (linker=8; motif="-"), or no 3' addition (linker=0; motif="-"). Blue bars represent the percentage of total sequencing reads with the correct edit. Grey bars track the occurrence of indels as a percentage of total sequencing reads.
FIG. 94 summarizes the effect of a linker alone, linker+evopreQ1-1, or linker+Mpknot-1 on guided editing activity, based on the data in FIGS. 93A-93H. The line represents the median fold increase.
FIGS. 95A-95B show the efficiency of editing in HeLa, U2OS and K562 cell lines for insertion of a nucleotide sequence corresponding to the FLAG tag at the HEK3 locus after plasmid nucleofection. The results are the average of three biological replicates. The results indicate that the efficiency gain from 3' stabilizing modifications may be higher in other cell types in which delivery of the editing agents is less efficient.
FIG. 96 shows the effect of mutations mut1 and mut2 on guided editing activity. The mutations are expected to disrupt evopreQ1-1.
FIG. 97 shows RT-qPCR data indicating that the 3' structural motif preserves the 3' end of the pegRNA compared to the unmodified species, especially the PBS, which is critical for guided editing. Bars are averages of three biological replicates, each of which is an average of three technical replicates. The template amplicon and the PBS amplicon correspond to amplicons 3 and 6, respectively.
FIG. 98 provides a schematic representation of a pegRNA with a nucleic acid moiety attached at the 3' end, which may include, but is not limited to, a duplex moiety, a toeloop moiety, a hairpin moiety, a stem-loop moiety, a pseudoknot moiety, an aptamer moiety, a G-quadruplex moiety, or a tRNA moiety. The nucleic acid moiety may be attached to the 3' end of the pegRNA by an optional nucleotide linker (e.g., 3-18 nucleotides).
Fig. 99 is a schematic diagram of an expression vector comprising a U6 promoter, which surprisingly has been found to result in improved editing efficiency.
FIGS. 100A-100E show that the use of the U6 promoter (including U6 wild-type, U6 v4, U6 v7, and U6 v 9) to express pegRNA results in improved editing.
FIG. 101 shows the folding of an evopreQ1 nucleic acid moiety that can be used to modify pegRNA.
FIG. 102 shows the folding of the Mpknot1 nucleic acid portion that can be used to modify pegRNA.
FIG. 103 shows the folding of tRNA nucleic acid parts that can be used to modify pegRNA.
FIGS. 104A-104C show that truncated pegRNAs limit guided editing efficiency. FIG. 104A (left) provides a schematic of a guided editing complex consisting of a guide editor (PE) protein, in which a Cas9 nickase (nCas9) is fused to a modified reverse transcriptase via a flexible linker, and a guided editing guide RNA (pegRNA). FIG. 104A (right) shows that degradation of the 3' extension of the pegRNA by exonucleases may impede editing efficiency through loss of the PBS. FIG. 104B shows PE3-mediated editing efficiency upon addition of plasmids expressing sgRNAs, truncated pegRNAs targeting the same genomic locus (HEK3), non-targeting pegRNAs, or SaCas9 pegRNAs. All pegRNAs were expressed from the U6 promoter. Data and error bars reflect the mean and standard deviation of three independent biological replicates. FIG. 104C shows the design of an engineered pegRNA (epegRNA) containing a structured RNA pseudoknot that protects the 3' extension from degradation by exonucleases.
FIGS. 105A-105D show that PE editing efficiency is improved by adding a structured RNA motif to the 3' end of the pegRNA. FIG. 105A shows the efficiency of PE3-mediated insertion of a FLAG epitope tag at the +1 editing position (directly at the pegRNA-induced nick site) across multiple genomic loci in HEK293T cells using typical pegRNAs ("unmodified"), pegRNAs with the evopreQ1 or mpknot motif attached to the 3' end of the PBS via an 8-nt linker sequence, or pegRNAs with only the 8-nt linker sequence attached to the 3' end. FIG. 105B provides a summary of the fold change in PE editing efficiency relative to typical pegRNAs for the indicated edits at different genomic loci after addition of the indicated 3' motif via an 8-nt linker, or of the linker alone. "Transversion" means mutation of +5 G.C to T.A at RUNX1, EMX1, VEGFA and DNMT1, of +1 C.G to T.A at RNF2, and of +1 T.A to A.T at HEK3, wherein positive integers indicate distance from the Cas9 nick site. "Deletion" means a 15-bp deletion at the Cas9 cleavage site. The data summarized here are presented in FIG. 105C and FIGS. 109A-109K. The horizontal bars show the median value. FIG. 105C shows representative improvements in PE editing efficiency when evopreQ1 (p) or mpknot (m) is attached via an 8-nt linker to pegRNAs with different template lengths (indicated in nucleotides). FIG. 105D shows the editing activity of typical and modified pegRNAs at three genomic loci in HeLa cells, U2OS cells and K562 cells. Data and error bars represent the mean and standard deviation of three independent biological replicates (FIGS. 105A, 105C and 105D).
FIGS. 106A-106H show that structural motifs increase RNA stability and the efficiency of reverse transcription. FIG. 106A shows the resistance of unmodified pegRNA, or of epegRNA containing evopreQ1 or mpknot, to degradation upon exposure to HEK293T nuclear lysate. The agarose gel shown is representative of three experiments. Untreated in vitro transcribed pegRNA or epegRNA was used as a standard. The percentage of RNA remaining was calculated by densitometry. Significance was analyzed using a two-tailed unpaired Student's t-test (p=0.0028 for mpknot, p=0.0022 for evopreQ1). FIG. 106B shows the fold change in abundance of the pegRNA scaffold relative to unmodified pegRNA following exposure to HEK293T nuclear lysate in the absence and presence of nCas9, as determined by RT-qPCR of the sgRNA scaffold. FIG. 106C shows a comparison of guided editing intermediates generated by PE2 with pegRNA or epegRNA at RNF2. The dashed line represents the full-length reverse transcriptase product templated by the pegRNA or epegRNA tested at the indicated locus. The x-axis shows position relative to the PE2-induced nick, with the first base 3' downstream denoted position +1. Histograms and pie charts were generated from the average of three independent biological replicates. FIG. 106D shows PE3 editing efficiency in HEK293T cells using unmodified pegRNA, pegRNA containing the evopreQ1 motif, or pegRNA containing a G15C point mutant of evopreQ1 (M1) that disrupts the pseudoknot structure. FIG. 106E shows the fraction of Cas9 RNP, consisting of dCas9 and unmodified pegRNA or epegRNA containing evopreQ1 or mpknot (templating a +1 FLAG tag insertion at HEK3), that is bound to dsDNA, as determined by MST. FIG. 106F shows CRISPRa transcriptional activation by pegRNAs, epegRNAs and sgRNAs. Reported GFP fluorescence was normalized to the iRFP fluorescence expressed from a co-transfected plasmid. AU, arbitrary units. FIG. 106G shows the fraction of unmodified pegRNA, or of epegRNA containing evopreQ1 or mpknot (templating a +1 FLAG tag insertion at HEK3), bound to H840A nCas9, as determined by microscale thermophoresis (MST). Data and error bars reflect the mean and standard deviation of three independent biological replicates. FIG. 106H shows the abundance in HEK293T cells of the epegRNAs and typical pegRNAs used in FIG. 106A, as determined by RT-qPCR amplification and quantification of the sgRNA scaffold. Primers can be found in Table E5.
FIGS. 107A-107E show that the efficiency of guided editing for therapeutically relevant genome edits is improved by using epegRNAs. FIG. 107A shows PE3-mediated installation of the G127V mutation in PRNP, which protects against prion disease in humans. FIGS. 107B-107C show correction of the pathogenic C1278TATC insertion in HEXA, which causes Tay-Sachs disease, in HEK293T cells (FIG. 107B) and primary patient-derived fibroblasts (FIG. 107C). FIG. 107D shows a comparison of PE2-mediated installation of pathogenic and protective alleles at nine genomic sites using non-optimized epegRNAs or non-optimized pegRNAs. The reference SNP (rs) names for all mutations can be found in Table E6. FIG. 107E shows PE2-mediated editing efficiency of FLAG epitope tag insertion at 15 genomic loci in HEK293T cells using non-optimized epegRNAs compared to non-optimized typical pegRNAs. Data and error bars represent the mean and standard deviation of three independent biological replicates.
FIG. 108 shows the sequences and secondary structures of the RNA structural motifs examined in this study. The structures are based on previously published structural or bioinformatic analyses. For simplicity, only two of the 11 G-quadruplexes tested are shown. The sequences of all motifs are provided in Table E2.
FIGS. 109A-109C show PE3-mediated edit:indel ratios for the pegRNAs and epegRNAs shown in FIGS. 105A-105D. Fold change in the observed guided editing edit:indel ratio, relative to unmodified pegRNA (dashed line), for epegRNAs with evopreQ1 (p) or mpknot (m) installing a FLAG epitope tag in HEK293T cells (FIG. 109A), installing the indicated transversion or deletion (FIG. 109B), or editing in HeLa, U2OS or K562 cells (FIG. 109C). The values are calculated from the data presented in FIGS. 105A, 105C and 105D, respectively. Data and error bars reflect the mean and standard deviation of three independent biological replicates.
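The fold change in edit:indel ratio referred to above can be illustrated with a minimal sketch. The function names, the input percentages, and the small floor applied to the indel percentage (to avoid division by zero when no indels are detected) are illustrative assumptions and are not taken from the disclosure.

```python
def edit_indel_ratio(pct_edit, pct_indel, floor=0.01):
    """Ratio of correctly edited reads to indel-containing reads.

    Both inputs are percentages of total sequencing reads. The floor on
    the indel percentage is an assumption made for numerical safety.
    """
    return pct_edit / max(pct_indel, floor)


def fold_change_in_ratio(epeg, peg, floor=0.01):
    """Edit:indel ratio of an epegRNA divided by that of the matching
    unmodified pegRNA. Each argument is a (pct_edit, pct_indel) pair."""
    return edit_indel_ratio(*epeg, floor=floor) / edit_indel_ratio(*peg, floor=floor)


# Hypothetical values, not data from the figures.
unmodified = (10.0, 2.0)   # 10% edited reads, 2% indel reads -> ratio 5
with_motif = (30.0, 3.0)   # 30% edited reads, 3% indel reads -> ratio 10

print(fold_change_in_ratio(with_motif, unmodified))
```

A value above 1 (the dashed line for unmodified pegRNA in the figures) means the motif raised correct editing proportionally more than it raised indels.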
FIG. 110 shows the linker length dependence of epegRNA activity, specifically the effect on PE3 editing efficiency of removing the 8-nt linker used in FIGS. 105A-105D and 111A-111K. evopreQ1 (p) or mpknot (m) was attached to the PBS either without a linker or via an 8-nt linker. The distance in nucleotides from the Cas9 cleavage site to the installed mutation is shown in the legend. Dots represent the average of three biological replicates. Bars represent the overall median. Significance was calculated using a two-tailed paired Student's t-test (p=0.022).
FIGS. 111A-111K show the improvement in the efficiency of PE3-mediated editing at different genomic loci upon adding 3' RNA structural motifs to pegRNAs. FIGS. 111A-111K show PE3-mediated installation of the indicated edits at DNMT1 (FIGS. 111A-111B), RUNX1 (FIG. 111C), RNF2 (FIGS. 111D-111E), FANCF (FIGS. 111F-111G), EMX1 (FIGS. 111H-111I), VEGFA (FIG. 111J) and HEK3 (FIG. 111K). An 8-nt linker alone, or a linker joined to evopreQ1 (p) or mpknot (m), was added to pegRNAs of increasing template length and compared to the typical pegRNA. The distance from the Cas9 cleavage site to the installed mutation is indicated. Error bars represent the standard deviation of three replicates.
FIGS. 112A-112C show PE3-mediated edit:indel ratios for the pegRNAs and epegRNAs shown in FIG. 110. Fold change in the observed edit:indel ratio, relative to unmodified pegRNA (dashed line), for epegRNAs with evopreQ1 (p) or mpknot (m) templating the indicated transversion or deletion at HEK3, RUNX1 or DNMT1 (FIG. 112A), RNF2 or FANCF (FIG. 112B), or EMX1 or VEGFA (FIG. 112C). The values are calculated from the data presented in FIGS. 109A-109C. Data and error bars reflect the mean and standard deviation of three independent biological replicates.
FIG. 113 shows that engineered pegRNAs do not exhibit increased detectable off-target activity compared to typical pegRNAs. On-target and off-target PE3 editing is shown for pegRNAs and epegRNAs targeting HEK3, EMX1 or FANCF and templating a nucleotide transversion (T.A to A.T at HEK3, or G.C to T.A at EMX1 and FANCF; pt mtn) or a 15-nt deletion (del). -, typical pegRNA; m, epegRNA containing mpknot; p, epegRNA containing evopreQ1. Indel frequencies are shown in brackets. For EMX1 off-target 1, indel frequencies were obtained by subtracting the percentage of sequencing reads containing indels in cells transfected with non-targeting pegRNA. Off-target loci are listed in Table E4. The data are the average of three biological replicates.
FIGS. 114A-114C show site-dependent differences in the expression of pegRNAs and epegRNAs. Northern blots of lysates from HEK293T cells containing pegRNA or epegRNA targeting (FIG. 114A) HEK3 or (FIG. 114B) EMX1, after hybridization with a DIG-labeled RNA probe complementary to the sgRNA scaffold. The PAGE gels shown are representative of multiple independent biological replicates. Normalized fold changes in abundance relative to unmodified pegRNA, as determined by densitometry, are shown (right). For samples in which full-length pegRNA was present, abundance was calculated by including full-length pegRNA and epegRNA. Band identity was confirmed using untreated in vitro transcribed pegRNA and epegRNA as standards, a DIG-labeled ssRNA ladder, and purified RNA from HEK293T cells transfected with sgRNA as markers. FIG. 114C shows the abundance in HEK293T cells of epegRNAs and typical pegRNAs targeting HEK3, DNMT1, RNF2 or EMX1, as determined by RT-qPCR amplification and quantification of the sgRNA scaffold. Primers for qPCR amplification can be found in Table E5. Data and error bars reflect the mean and standard deviation of three independent biological replicates.
FIGS. 115A-115C show high-throughput sequencing analysis of PE2-mediated genomic reverse transcriptase products. Comparison of guided editing intermediates generated by PE2 with pegRNA or epegRNA at (FIG. 115A) HEK3, (FIG. 115B) DNMT1 or (FIG. 115C) EMX1, as shown. The dashed line represents the full-length reverse transcriptase product templated by the pegRNA or epegRNA tested at the indicated locus. Positions are given relative to the PE2-induced nick, with the first base 3' downstream denoted position +1. Histograms and pie charts were generated from the average of three independent biological replicates.
FIGS. 116A-116D show PE3-mediated editing efficiency of pegRNAs containing other RNA structural motifs. FIGS. 116A-116B show a comparison of PE3-mediated editing efficiency for installation of a FLAG epitope tag, a 15-nt deletion or a point mutation at HEK3 (FIG. 116A) and RNF2 (FIG. 116B) using epegRNAs to which different G-quadruplexes have been attached via an 8-nt linker. The G-quadruplexes are ordered by melting temperature, ranging from 60 °C to >90 °C, as previously determined. FIG. 116C shows the efficiency of PE3-mediated installation of point mutations at the indicated genomic loci using pegRNAs containing an evopreQ1 motif or a 15-bp (34-nt) hairpin. FIG. 116D shows that adding a pseudoknot known to inhibit the 5' exonuclease Xrn1 (xrRNA), or a large tertiary RNA structure (the P4-P6 domain of the group I intron from Tetrahymena thermophila), to the 3' end of the pegRNA via an 8-nt linker does not result in more efficient editing than adding evopreQ1 or mpknot via the same linker. The distance from the Cas9 cleavage site to the installed mutation is indicated. Data and error bars represent the mean and standard deviation of three independent biological replicates.
FIGS. 117A-117C show PE3-mediated editing efficiency of epegRNAs containing evopreQ1 or mpknot variants. A comparison of PE3-mediated editing efficiency is shown for installation of a FLAG epitope tag, a 15-nt deletion or a point mutation at HEK3 and RNF2 using epegRNAs containing different RNA motifs, where the distance between the Cas9 nick and the edit is indicated by a positive integer (e.g., +1). FIGS. 117A-117B show the PE3 editing efficiency of additional evolved prequeosine1-1 riboswitch aptamer variants (FIG. 117A) or mpknot variants (FIG. 117B) compared to evopreQ1 or mpknot. FIG. 117C shows the PE3 editing efficiency of epegRNAs in which evopreQ1 and mpknot have been trimmed to remove 5' and 3' nucleotides (tevopreQ1 and tmpknot, respectively) compared to the parental epegRNAs. Data and error bars represent the mean and standard deviation of three independent biological replicates.
FIG. 118 shows the effect of (F+E) scaffolds on the PE2 editing efficiency of lentivirally transduced epegRNAs. PE2 editing efficiency is shown in HEK293T cells for a lentivirally transduced guide editor and pegRNAs or epegRNAs that contain tevopreQ1 and a typical or (F+E) sgRNA scaffold and template the indicated edits at HEK3 or DNMT1. Data and error bars reflect the mean and standard deviation of three independent biological replicates.
FIG. 119 shows the effect of (F+E) scaffold modification on the guided editing efficiency of epegRNAs. Comparison of the PE3-mediated editing efficiency of epegRNAs with the indicated scaffolds versus epegRNAs with the standard SpCas9 sgRNA scaffold. In these experiments, one tenth of the normal amount of PE2-encoding plasmid and of pegRNA or epegRNA was transfected into HEK293T cells. The templated edits are transversions at PRNP, RUNX1 or EMX1 or a 15-nt deletion at HEK3. The modified scaffold sequences all contain the "flip and extend" (F+E) modifications. The scaffolds designated cr also contain mutations to the (F+E) scaffold that were previously identified as possibly enhancing Cas9 nuclease activity at some sites. The sequences of all scaffolds can be found in Table E1. The lines represent the overall median.
FIGS. 120A-120G show computational prediction of effective linker sequences between the PBS and the epegRNA structural motif. FIG. 120A provides a schematic of the pegLIT workflow; pegLIT is a computational script for selecting suitable linker sequences for epegRNAs. Potential linker sequences are filtered by sequence identity and by propensity to base pair with other regions of the epegRNA. The sequences that pass the filters are then optionally clustered by identity, and individual sequences are selected from different clusters to promote diversity in the final output. FIGS. 120B-120C show that epegRNAs containing evopreQ1 linked via pegLIT-recommended linker sequences yield modestly improved PE editing efficiency compared to epegRNAs containing evopreQ1 linked via manually designed linkers or via linkers predicted by pegLIT to interact with the PBS. FIG. 120D shows rescued activity at sites where epegRNAs did not initially yield an improvement (FIGS. 111A-111K). FIG. 120E shows a comparison of the PE3-mediated editing efficiency of epegRNAs having evopreQ1 and 8-nt or 18-nt linkers, showing that no significant improvement is achieved by increasing the linker length. FIG. 120F shows a comparison of PE3-mediated editing efficiency for epegRNAs with evopreQ1 (p) or mpknot (m) and with an 8-nt pegLIT linker (8) or no linker (0). Significance was calculated using Student's t-test (p=0.0061). FIG. 120G shows the fold increase in PE3-mediated editing efficiency for epegRNAs containing tevopreQ1 with an 8-nt pegLIT linker compared to no linker. Data are shown as the mean; error bars represent (FIG. 120B) the standard deviation of the mean of five pegLIT-designed linkers (each in triplicate) or the standard deviation of three replicates of a manually designed linker sequence, (FIGS. 120C, 120D and 120G) or (FIGS. 120E-120F) the overall mean of the mean fold change in editing efficiency for each indicated locus and edit.
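The filter, cluster and select steps described for FIG. 120A can be sketched as follows. This is a minimal illustration and not the pegLIT implementation: the homopolymer rule, the k-mer reverse-complement count used as a crude stand-in for a base-pairing propensity prediction (G-U wobble pairs are ignored), the greedy Hamming-distance selection used in place of clustering, the function names, and the example region sequences are all assumptions made for illustration.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def passes_sequence_filter(linker, max_run=3):
    """Reject linkers with a homopolymer run longer than max_run
    (an illustrative sequence-identity rule, not the disclosed one)."""
    run = 1
    for prev, cur in zip(linker, linker[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    return True


def pairing_score(linker, region, k=4):
    """Number of linker k-mers whose reverse complement occurs in region.
    A crude proxy for propensity to base pair with that region."""
    return sum(
        reverse_complement(linker[i:i + k]) in region
        for i in range(len(linker) - k + 1)
    )


def hamming(a, b):
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def select_linkers(regions, length=8, n_candidates=2000, n_out=5,
                   min_dist=3, seed=0):
    """Sample candidate linkers, filter them, then greedily pick a
    diverse subset (each pick differs from prior picks at >= min_dist
    positions), which stands in for cluster-based selection."""
    rng = random.Random(seed)
    candidates = {
        "".join(rng.choice("ACGU") for _ in range(length))
        for _ in range(n_candidates)
    }
    kept = [
        c for c in sorted(candidates)
        if passes_sequence_filter(c)
        and all(pairing_score(c, r) == 0 for r in regions)
    ]
    chosen = []
    for c in kept:
        if all(hamming(c, d) >= min_dist for d in chosen):
            chosen.append(c)
        if len(chosen) == n_out:
            break
    return chosen


# Hypothetical epegRNA regions (made-up sequences, RNA alphabet).
regions = [
    "GGCCCAGACUGAGCACGUGA",   # spacer (hypothetical)
    "UCUGCCAUCAAAGCGUGCUC",   # RT template (hypothetical)
    "AGUCUGGGCCAGU",          # PBS (hypothetical)
]
linkers = select_linkers(regions)
print(linkers)
```

A fuller version would score candidates with a thermodynamic folding model and would also check the linker against the scaffold and the 3' motif; the k-mer test here only demonstrates the shape of the filter.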
FIGS. 121A-121B show the improvement in editing efficiency after nucleofection of chemically synthesized epegRNAs. FIG. 121A shows the efficiency of PE3-mediated installation of the indicated edits after nucleofection of mRNA encoding PE2, chemically synthesized nicking sgRNA, and chemically synthesized pegRNA or epegRNA containing evopreQ1 attached via an 8-nt linker. FIG. 121B shows the observed fold change in edit:indel ratio for epegRNAs compared to pegRNAs at the indicated sites and edits, based on the data in FIG. 121A. Data and error bars represent the mean and standard deviation of two or more independent biological replicates.
FIGS. 122A-122B show PE2 mediated efficiency of FLAG tag installation at indicated genomic sites. FIG. 122A shows PE2 mediated editing efficiency of inserting FLAG epitope tags at 15 genomic loci in HEK293T cells using non-optimized epegRNA as compared to non-optimized typical pegRNA. Fig. 122B shows the data from fig. 122A displayed in bar graph form. Sites with less than 1% editing efficiency for pegRNA and epegRNA are not shown, but are listed in Table E1. Data and error bars reflect the mean and standard deviation of three independent biological replicates.
FIG. 123 provides the uncropped image of the agarose gel from FIG. 106A. The areas of the uncropped image selected for FIG. 106A are outlined in black. Untreated in vitro transcribed pegRNA or epegRNA was used as a molecular weight standard.
FIGS. 124A-124C show the uncropped northern blots for FIGS. 114A-114C. FIG. 124A shows the uncropped image of the northern blot of FIG. 114A, with the selected area outlined in black. Species lengths were confirmed on a separate blot with a molecular weight ladder, using untreated in vitro transcribed pegRNA and epegRNA as molecular weight standards (as shown in FIG. 124B). FIG. 124B shows the uncropped image of the northern blot used to confirm the identity and molecular weight of the bands of the standards in FIG. 124A. FIG. 124C shows the uncropped image of the northern blot of FIG. 114B, with the selected area outlined in black.
Figures 125A-125E show the effect of different sgRNA scaffolds on editing efficiency in HEK293T cells.
FIGS. 126A-126B show that the flip and extend modifications may improve guided editing efficiency in some cases.
FIGS. 127A-127B illustrate that different sgRNA scaffolds can improve guided editing efficiency in some cases.
FIG. 128 is a flow chart of an illustrative process 11800 for identifying one or more nucleic acid linkers for coupling a guided editing guide RNA to a nucleic acid portion, in accordance with some embodiments of the technology described herein. Process 11800 may be implemented using any suitable computing device, as aspects of the techniques described herein are not limited in this respect.
FIG. 129 is a flowchart of an illustrative process 11900 for iteratively identifying one or more nucleic acid linkers for coupling a guided editing guide RNA to a nucleic acid portion, in accordance with some embodiments of the techniques described herein. Process 11900 may be implemented using any suitable computing device, as aspects of the techniques described herein are not limited in this respect.
Fig. 130 shows an illustrative implementation of a computer system 12000 in which embodiments of the techniques described herein may be implemented. For example, any of the computing devices described herein may be implemented as computing system 12000. The computing system 12000 can include one or more computer hardware processors 12002 and one or more articles of manufacture comprising non-transitory computer-readable storage media (e.g., memory 12004 and one or more non-volatile storage devices 12006). The processor 12002 may control writing data to and reading data from the memory 12004 and the nonvolatile storage device 12006 in any suitable manner. To perform any of the functions described herein, the processor 12002 may execute one or more processor-executable instructions stored in one or more non-transitory computer-readable storage media (e.g., memory 12004), which may act as a non-transitory computer-readable storage medium storing processor-executable instructions for execution by the processor 12002.
Fig. 131 illustrates three broad areas in which guided editing may be improved. These include recognition of target nucleic acids, installation of edits, and resolution of edited DNA heteroduplexes.
FIG. 132 shows that PBS:spacer interactions limit PE efficiency by decreasing Cas9 affinity, but that the PBS is necessary for PBS:protospacer binding to occur. A shorter PBS has been shown to result in increased binding affinity for Cas9.
FIG. 133 shows that a toehold can inhibit both PBS:spacer and PBS:protospacer interactions if it is independent of Cas9 binding.
FIG. 134 shows that the toehold is outcompeted upon PE2 binding owing to competing RNA-protein interactions. Design considerations include 1) the interdependence of the lengths of the Cas9-RT and RT-MS2 linkers, the pegRNA extension and PBS, the toehold, and the linker between the MS2 aptamer and the toehold; 2) the dependence of toehold length on PBS melting temperature and site accessibility; 3) optimization for each site; and 4) tolerance of a non-interacting 17-nucleotide PBS.
FIG. 135 shows that the C-terminal MS2 fusion yields better editing efficiency at HEK3 than the N-terminal fusion.
Fig. 136 shows that MS2 labeling of PE2 is advantageous over untagged PE 2. PE2-MS2 fusions comprising either an xten-16aa linker or an xten-33aa linker are shown.
FIG. 137 shows that MS2 and toeloop tagging rescue long primer binding sites. PE2-MS2 fusions comprising either an xten-16aa linker or an xten-33aa linker are shown.
FIG. 138 shows the pegRNA extension moved onto the nicking guide to avoid PBS:spacer interactions entirely. Design considerations include: 1) use of the extension template as a linker, which affects flap resolution; 2) optimization of the nicking spacer; and 3) the need for two PE complexes to be present on the genome at the same time.
FIG. 139 shows that the strategy shown in FIG. 138 (moving the pegRNA extension onto the nicking guide) enables guided editing.
FIG. 140 shows a model based on the identity and location of mismatches within the PBS relative to the nick.
Figure 141 shows that mutations in PBS are tolerable, or in some cases enhance PE activity, and fit into the initial model, with mutation location and identity determining PE efficiency.
FIG. 142 shows that a longer PBS (RNF2, 15 nt) cannot tolerate mutations, probably because the mutations excessively inhibit PBS:protospacer interactions.
FIG. 143 shows that PBS mutations can increase the PE efficiency of pegRNAs with shorter optimal PBSs. The mutPBS of the mutPBS epegRNAs was 17 nt with four consecutive mutations (HEK3, DNMT1, PRNP) or 15 nt with four consecutive mutations (RNF2), followed by an 8-nt linker and tevopreQ1.
FIG. 144 shows that the improvement in mutPBS can provide additional enhancement in editing efficiency when used in combination with epegRNA.
FIG. 145 shows that guided editing (e.g., using PE 3) can be used to install or correct pathogenic alleles and sequence tags.
FIG. 146 shows an embodiment of a guided editing strategy to install and correct the CDKL5 c.1412delA mutation.
FIG. 147 shows that guided editing using the pegRNA of FIG. 146 can be used to edit the CDKL5 c.1412delA mutation in human cells.
FIG. 148 shows that a single guide editor (e.g., PE 2) complexed with a single pegRNA is capable of correcting multiple pathogenic variations of the CDKL5 gene in exon 8, including correcting V172I, A173D, R175S, W176G, W176R, Y177C, R178P, P180L, E A and L182P mutations.
FIG. 149 shows that a single guide editor (e.g., PE 2) complexed with a single pegRNA is able to correct a large number of pathogenic mutations at positions +4, +8, +12, +17, +21, and +25 relative to position 1 (i.e., the nucleotide at the 5' -most position) of the PAM sequence.
FIG. 150 shows guided editing of CDKL5 c.1412delA by transfection in N2A cells.
FIG. 151 shows the editing efficiency of 1412delA installation in N2A cells using epegRNA 072 without seed edits.
FIG. 152 shows the editing efficiency of 1412delA installation in N2A cells using PE5 and different pegRNAs with added seed edits.
Fig. 153 shows the editing efficiency of installing multiple pathogenic CDKL5 alleles in HEK293T cells via plasmid transfection.
FIG. 154 shows a schematic diagram of the PE2 and PEmax editor architectures. bpNLSSV40, bipartite SV40 nuclear localization signal (NLS). MMLV RT, Moloney murine leukemia virus reverse transcriptase pentamutant; codon optimization, human codon optimization.
Fig. 155 compares the structures of PE2, PE3, PE4, and PE 5. In particular, the PE4 editing system consists of a guide editor enzyme (nickase Cas9-RT fusion), MLH1dn and pegRNA. The PE5 editing system consists of a guide editor enzyme, MLH1dn, pegRNA and a second strand nick producing sgRNA.
FIG. 156 shows guided editing of CDKL5 in wild-type HeLa and HEK293T cells. The CDKL5 edit is located at the site of the c.1412delA mutation, which causes CDKL5 deficiency. epegRNAs were used to edit the CDKL5 locus. Bars represent the average of n=3 independent biological replicates.
FIG. 157 shows correction of CDKL5 c.1412delA via an A.T insertion and a silent G.C-to-A.T edit in iPSCs derived from a patient heterozygous for the allele. Editing efficiency represents the percentage of sequencing reads with c.1412delA correction among editable alleles carrying the mutation. The indel frequency reflects all sequencing reads that contain any indels. Bars represent the average of n=3 independent biological replicates.
FIG. 158 shows correction of CDKL5 c.1412delA via an A.T insertion and a G.C-to-A.T edit in iPSCs derived from a patient heterozygous for the disease allele. Editing efficiency represents the percentage of sequencing reads with c.1412delA correction among editable alleles carrying the mutation. The indel frequency reflects all sequencing reads containing any indels that do not map to the c.1412delA allele or the wild-type sequence. Under all conditions shown, 1 μg of PE2 mRNA was used. Bars represent the average of n=3 independent biological replicates.
FIG. 159 shows a combination of MLH1dn and epegRNA for CDKL5 editing. The editing efficiency of CDKL5 c.1412a to G mutations in HEK293T cells is shown.
Fig. 160 shows the optimization of kerf-generating sgrnas for pilot editing at CDKL 5. The editing efficiency of installing CDKL5 silencing +1c to T mutation (c.1412dela site) in HEK293T cells is shown.
FIG. 161 shows that SpCas9-PE may generate indel byproducts when editing wild-type CDKL5.
FIG. 162 shows that the NRCH SpCas9 variant guided editor does not generate indel byproducts when editing wild-type CDKL5.
FIG. 163 shows that the NRTH SpCas9 variant guided editor does not generate indel byproducts when editing wild-type CDKL5.
FIG. 164 shows the optimization of pegRNAs for installing a nucleotide conversion at c.1412 in the CDKL5 gene of HEK293T cells using PE2.
FIG. 165 shows the screening of nick-generating guides for use in PE3-mediated editing at c.1412. All guides contained the optimal PBS and template lengths identified in FIG. 164 and encoded a +1 G-to-A conversion. CDKL5h37 is a pegRNA; the remaining guides are epegRNAs containing different RNA structural motifs 3' of the PBS, attached via an 8-nucleotide linker. CDKL5h37 and JNpeg953 showed the highest editing efficiency.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The following references provide the skilled artisan with a general definition of many of the terms used in the present invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings given to them unless otherwise indicated.
Antisense strand
In genetics, the "antisense" strand of a segment within double-stranded DNA is the template strand and is considered to run in the 3' to 5' direction. In contrast, the "sense" strand is the segment of double-stranded DNA that runs from 5' to 3' and is complementary to the antisense, or template, strand of DNA (which runs from 3' to 5'). In the case of a DNA segment encoding a protein, the sense strand is the DNA strand that has the same sequence as the mRNA, which takes the antisense strand as its template during transcription and eventually (usually, but not always) undergoes translation into a protein. Thus, the antisense strand is responsible for the RNA that is later translated into protein, while the sense strand has nearly the same composition as the mRNA. Note that for each segment of dsDNA there may be two sets of sense and antisense strands, depending on the direction of reading (since sense and antisense are relative to perspective). Ultimately, it is the gene product, or mRNA, that dictates which strand of a dsDNA segment is referred to as sense and which as antisense.
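The strand relationships described above can be illustrated with a minimal Python sketch. The toy sequence and the helper names (`reverse_complement`, `transcribe`) are hypothetical and are used only for illustration; they do not come from the disclosure.

```python
# Illustrative sketch only: the sequence and function names are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]

def transcribe(antisense_5to3: str) -> str:
    """Return the mRNA (5'->3') templated by an antisense strand given 5'->3'."""
    return reverse_complement(antisense_5to3).replace("T", "U")

sense = "ATGGCCTTAG"                   # sense (coding) strand, 5'->3'
antisense = reverse_complement(sense)  # antisense (template) strand, 5'->3'
mrna = transcribe(antisense)

# The mRNA has the same sequence as the sense strand, with U in place of T.
print(antisense, mrna)
```

The check `mrna == sense.replace("T", "U")` holds for any input, which is the sense in which the sense strand "has nearly the same composition as the mRNA".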
Aptamer
An "aptamer" refers to an oligonucleotide or peptide molecule that binds to a particular target molecule. Aptamers include DNA or RNA aptamers, which are short oligonucleotides based on single-stranded DNA or RNA that, when folded into their unique three-dimensional structure, can selectively bind small-molecule ligands or protein targets with high affinity and specificity. At the molecular level, an aptamer binds to its cognate target through various non-covalent interactions, including electrostatic interactions, hydrophobic interactions, and induced fit. Further reference is made to Ku et al., "Nucleic Acid Aptamers: An Emerging Tool for Biotechnology and Biomedical Sensing," Sensors, 2015, 15(7): 16281-16313. The present disclosure contemplates the use of any aptamer, including aptamers obtained from commercial sources. For example, many aptamers are available from APTAGEN (www.aptagen.com), including but not limited to thrombin (15mer), HIV-1 TAR RNA hairpin loop (B22-19), human immunoglobulin G (IgG) (Apt 8), reactive green 19 (GR-30), abrin (TA6), malachite green (MG-4), PSMA aptamer (A10-3), tenascin-C (GBI-10), and methylenedianiline (M1). Another example is the prequeosine1-1 riboswitch aptamer (also known as "evopreQ1-1"), one of the smallest natural tertiary RNA structures.
Cas9
The term "Cas9" or "Cas9 nuclease" refers to a protein comprising a Cas9 domain or fragment thereof (e.g., comprising an active or inactive DNA cleavage domain of Cas9, and/or a gRNA binding domain of Cas9). As used herein, a "Cas9 domain" is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or a gRNA binding domain of Cas9. A "Cas9 protein" is a full-length Cas9 protein. A Cas9 nuclease is sometimes also referred to as a casn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, which are sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves the linear or circular dsDNA target complementary to the spacer. The target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") can be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM, or protospacer adjacent motif) to help distinguish self from non-self.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti J.J., McShan W.M., Ajdic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, Streptococcus pyogenes (S. pyogenes) and Streptococcus thermophilus (S. thermophilus). Other suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on the present disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737 (the entire contents of which are incorporated herein by reference). In some embodiments, the Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.
Nuclease-inactivated Cas9 domains are interchangeably referred to as "dCas9" proteins (for nuclease-"dead" Cas9). Methods for generating Cas9 domains (or fragments thereof) with inactive DNA cleavage domains are known (see, e.g., Jinek et al., Science. 337:816-821 (2012); Qi et al., "Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression" (2013) Cell. 28;152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, an HNH nuclease subdomain and a RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of Streptococcus pyogenes Cas9 (Jinek et al., Science. 337:816-821 (2012); Qi et al., Cell. 28;152(5):1173-83 (2013)). In some embodiments, proteins comprising Cas9 fragments are provided. For example, in some embodiments, the protein comprises one of two Cas9 domains: (1) a gRNA binding domain of Cas9; or (2) a DNA cleavage domain of Cas9. In some embodiments, a protein comprising Cas9 or a fragment thereof is referred to as a "Cas9 variant". Cas9 variants have homology to Cas9 or fragments thereof. For example, the Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild-type Cas9 (e.g., SpCas9 of SEQ ID NO: 37).
In some embodiments, Cas9 variants can have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild-type Cas9 (e.g., SpCas9 of SEQ ID NO: 37). In some embodiments, the Cas9 variant comprises a fragment of the Cas9 of SEQ ID NO: 37 (e.g., a gRNA binding domain or a DNA cleavage domain) such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type Cas9 (e.g., SpCas9 of SEQ ID NO: 37). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of the corresponding wild-type Cas9 (e.g., SpCas9 of SEQ ID NO: 37).
cDNA
The term "cDNA" refers to a strand of DNA that is replicated from an RNA template. The cDNA is complementary to the RNA template.
Circular permutant
As used herein, the term "circular permutant" refers to a protein or polypeptide (e.g., Cas9) comprising a circular permutation, which is a change in the structural configuration of the protein involving a change in the order of the amino acids in the protein's amino acid sequence. In other words, a circular permutant is a protein having altered N- and C-termini compared to its wild-type counterpart, e.g., the wild-type C-terminal half of the protein becomes the new N-terminal half. Circular permutation (or CP) is essentially a topological rearrangement of a protein's primary sequence, typically joining its N- and C-termini with a peptide linker while splitting its sequence at a different position to create new, adjacent N- and C-termini. The result is a protein structure with different connectivity that often has the same overall three-dimensional (3D) shape and may have improved or altered characteristics, including reduced proteolytic susceptibility, improved catalytic activity, altered substrate or ligand binding, and/or improved thermostability. Circularly permuted proteins can occur in nature (e.g., concanavalin A and lectin). In addition, circular permutation can occur as a result of post-translational modification or can be engineered using recombinant techniques.
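The rearrangement described above (joining the original N- and C-termini with a peptide linker and opening the chain at a new position) can be sketched on a primary sequence as follows. This is a minimal illustration under stated assumptions: the function name `circular_permutant`, the toy 10-residue sequence, and the "GGS" linker are hypothetical and are not taken from the disclosure.

```python
# Illustrative sketch only: names, sequence, and linker are assumptions.
def circular_permutant(protein: str, new_n_terminus: int, linker: str = "GGS") -> str:
    """Rearrange a primary sequence as a circular permutant.

    new_n_terminus is the 0-based index of the residue that becomes the new
    N-terminus. The original C-terminus is joined to the original N-terminus
    through the linker.
    """
    if not 0 < new_n_terminus < len(protein):
        raise ValueError("cut position must fall inside the sequence")
    return protein[new_n_terminus:] + linker + protein[:new_n_terminus]

wild_type = "MKTAYIAKQR"               # toy sequence, N- to C-terminus
cp = circular_permutant(wild_type, 5)  # C-terminal half becomes the new N-terminal half
print(cp)
```

The permutant contains every residue of the original sequence plus the linker, so its length is `len(wild_type) + len(linker)`; only the connectivity changes.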
Circularly permuted Cas9
The term "circularly permuted Cas9" refers to any Cas9 protein or variant thereof that exists as a circular permutant, whereby its N- and C-termini have been rearranged. Such circularly permuted Cas9 proteins ("CP-Cas9"), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., "Protein Engineering of Cas9 for enhanced function," Methods Enzymol, 2014, 546:491-511 and Oakes et al., "CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification," Cell, January 10, 2019, 176:254-267, each of which is incorporated herein by reference. The present disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA). Exemplary CP-Cas9 proteins are SEQ ID NOs: 88-97.
CRISPR
CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by viruses that have invaded the prokaryote. The prokaryotic cell uses these snippets of DNA to detect and destroy DNA from subsequent attacks by similar viruses; together with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, they effectively constitute a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves the linear or circular dsDNA target complementary to the RNA. Specifically, the target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") can be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM, or protospacer adjacent motif) to help distinguish self from non-self.
CRISPR biology and Cas9 nuclease sequences and structures are well known to those skilled in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti J.J., McShan W.M., Ajdic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Other suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on the present disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737 (the entire contents of which are incorporated herein by reference).
In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves the linear or circular nucleic acid target complementary to the RNA. Specifically, the target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") can be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species, the guide RNA.
In general, a "CRISPR system" refers collectively to transcripts and other elements involved in the expression of, or in directing the activity of, CRISPR-associated ("Cas") genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., a tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a "direct repeat" and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a "spacer" in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.
DNA synthesis template
As used herein, the term "DNA synthesis template" refers to the region or portion of the extension arm of a pegRNA that is used as a template strand by the polymerase of the guided editor to encode a 3' single-stranded DNA flap that contains the desired edit and that then, through the guided editing mechanism, replaces the corresponding endogenous DNA strand at the target site. In various embodiments, the DNA synthesis template is shown in FIG. 3A (in the context of a pegRNA comprising a 5' extension arm), FIG. 3B (in the context of a pegRNA comprising a 3' extension arm), FIG. 3C (in the context of an internal extension arm), FIG. 3D (in the context of a 3' extension arm), and FIG. 3E (in the context of a 5' extension arm). The extension arm (including the DNA synthesis template) may be composed of DNA or RNA. In the case of RNA, the polymerase of the guided editor may be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the guided editor may be a DNA-dependent DNA polymerase. In various embodiments (e.g., as shown in FIGS. 3D-3E), the DNA synthesis template comprises an "editing template" and a "homology arm". In various embodiments (e.g., as shown in FIGS. 3D-3E), the DNA synthesis template (4) may comprise the "editing template" and the "homology arm", and all or part of the optional 5' end modification region, e2. That is, depending on the nature of the e2 region (e.g., whether it includes a hairpin, toeloop, or stem/loop secondary structure), the polymerase may encode none, some, or all of the e2 region as well. In other words, in the case of a 3' extension arm, the DNA synthesis template (3) may comprise the portion of the extension arm (3) that spans from the 5' end of the primer binding site (PBS) to the 3' end of the gRNA core and that can serve as a template for the synthesis of a single strand of DNA by a polymerase (e.g., a reverse transcriptase).
In the case of a 5' extension arm, the DNA synthesis template (3) may comprise the portion of the extension arm (3) that spans from the 5' end of the pegRNA molecule to the 3' end of the editing template. In some embodiments, the DNA synthesis template excludes the primer binding site (PBS) of a pegRNA having either a 3' extension arm or a 5' extension arm. Certain embodiments described herein (e.g., FIG. 71A) refer to an "RT template", which includes the editing template and the homology arm, i.e., the sequence of the pegRNA extension arm that actually serves as a template during DNA synthesis. The term "RT template" is equivalent to the term "DNA synthesis template". In certain embodiments, "RT template" may be used to refer to a template polynucleotide for reverse transcription, e.g., in a guided editing system, complex, or method that uses a guided editor whose polymerase is a reverse transcriptase. In some embodiments, "DNA synthesis template" may be used to refer to a template polynucleotide for DNA polymerization, such as RNA-dependent DNA polymerization or DNA-dependent DNA polymerization, e.g., in a guided editing system, complex, or method that uses a guided editor whose polymerase is an RNA-dependent DNA polymerase or a DNA-dependent DNA polymerase.
In the case of trans-directed editing (e.g., fig. 3G and 3H), the Primer Binding Site (PBS) and DNA synthesis template may be engineered into separate molecules, referred to as trans-directed editor RNA template (tPERT).
In some embodiments, the DNA synthesis template is a single-stranded portion of the PEgRNA that is 5' of the PBS, comprises a region complementary to the PAM strand (i.e., the non-target strand or the edit strand), and comprises one or more nucleotide edits compared to the endogenous sequence of the double-stranded target DNA. In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is downstream of the nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is immediately downstream (i.e., directly downstream) of the nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, one or more of the non-complementary nucleotides at the intended nucleotide edit positions are immediately downstream of the nick site. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the double-stranded target DNA sequence. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the non-target strand of the double-stranded target DNA sequence. For each PEgRNA described herein, the nick site is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates and is characteristic of the particular PAM required for recognition and function of that napDNAbp. For example, for a PEgRNA comprising a gRNA core that associates with SpCas9, the nick site is the phosphodiester bond between base three (position "-3" relative to position 1 of the PAM sequence) and base four (position "-4" relative to position 1 of the PAM sequence). In some embodiments, the DNA synthesis template and the primer binding site are immediately adjacent to each other.
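The SpCas9 nick position described above can be located with simple index arithmetic. The following is a minimal sketch, assuming the PAM strand is given 5'->3' as a string and that PAM position 1 is known as a 0-based index; the function name `spcas9_nick_index` and the toy protospacer are hypothetical.

```python
# Illustrative sketch only: names and the toy sequence are assumptions.
def spcas9_nick_index(pam_strand: str, pam_start: int) -> int:
    """Return i such that the nick lies between pam_strand[i - 1] (position -4)
    and pam_strand[i] (position -3).

    pam_start is the 0-based index of PAM position 1. There is no position 0,
    so position -1 is the base directly 5' of the PAM.
    """
    if pam_start < 4 or pam_start >= len(pam_strand):
        raise ValueError("need at least four bases 5' of the PAM")
    return pam_start - 3

protospacer = "ACGT" * 5            # toy 20-nt protospacer on the PAM strand
pam = "TGG"                         # an NGG PAM
pam_strand = protospacer + pam
i = spcas9_nick_index(pam_strand, pam_start=len(protospacer))

primer_side = pam_strand[:i]        # its 3' end is exposed by the nick
downstream_side = pam_strand[i:]    # sequence downstream of the nick site
print(i, primer_side, downstream_side)
```

With a 20-nt protospacer, the nick leaves 17 nt on the 5' side and three protospacer bases plus the PAM on the 3' side.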
The terms "nucleotide edit", "nucleotide change", "desired nucleotide change" and "desired nucleotide edit" are used interchangeably to refer to a particular nucleotide edit at a particular position in a DNA synthesis template of PEgRNA to be incorporated into a target DNA sequence, e.g., a particular deletion of one or more nucleotides, a particular insertion of one or more nucleotides, a particular substitution of one or more nucleotides, or a combination thereof. In some embodiments, the DNA synthesis template comprises more than one nucleotide edit relative to the double stranded target DNA sequence. In such embodiments, each nucleotide edit is a specific nucleotide edit at a specific location in the DNA synthesis template, each nucleotide edit is at a different specific location relative to any other nucleotide edits in the DNA synthesis template, and each nucleotide edit is independently selected from a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution of one or more nucleotides, or a combination thereof. Nucleotide edits may refer to edits on the DNA synthesis template as compared to the sequence on the target strand of the target gene, or may refer to edits on the newly synthesized single stranded DNA encoded by the DNA synthesis template that replace endogenous target DNA sequences on non-target strands, in either case the nucleotide edits may be referred to as nucleotide edits compared to the target DNA sequences.
Downstream
As used herein, the terms "upstream" and "downstream" are relative terms that define the linear position of at least two elements located in a nucleic acid molecule (whether single-stranded or double-stranded) that is oriented in the 5' to 3' direction. In particular, a first element is upstream of a second element in a nucleic acid molecule when the first element is positioned somewhere 5' of the second element. For example, a SNP is upstream of a Cas9-induced nick site if the SNP is on the 5' side of the nick site. Conversely, a first element is downstream of a second element in a nucleic acid molecule when the first element is positioned somewhere 3' of the second element. For example, a SNP is downstream of a Cas9-induced nick site if the SNP is on the 3' side of the nick site. The nucleic acid molecule may be DNA (double-stranded or single-stranded), RNA (double-stranded or single-stranded), or a hybrid of DNA and RNA. The analysis is the same for single-stranded and double-stranded nucleic acid molecules, since the terms upstream and downstream refer to only a single strand of the nucleic acid molecule, except that one must choose which strand of a double-stranded molecule is being considered. Generally, the strand of a double-stranded DNA that is used to determine the positional relationship of at least two elements is the "sense" or "coding" strand. In genetics, the "sense" strand is the segment of double-stranded DNA that runs from 5' to 3' and is complementary to the antisense, or template, strand of DNA, which runs from 3' to 5'. Thus, for example, a SNP nucleobase is "downstream" of a promoter sequence in genomic DNA (which is double-stranded) if the SNP nucleobase is on the 3' side of the promoter on the sense or coding strand.
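As a small illustration of the convention above, positions along a strand written 5'->3' can be compared directly. The function name `relative_position` and the example coordinates are hypothetical.

```python
# Illustrative sketch only: names and coordinates are assumptions.
def relative_position(first: int, second: int) -> str:
    """Describe where `first` lies relative to `second`.

    Both arguments are 0-based indices along the same strand written 5'->3'
    (for double-stranded DNA, generally the sense or coding strand).
    """
    if first < second:
        return "upstream"      # first element is 5' of the second
    if first > second:
        return "downstream"    # first element is 3' of the second
    return "same position"

nick_site = 17
print(relative_position(12, nick_site))  # a SNP 5' of the nick site
print(relative_position(21, nick_site))  # a SNP 3' of the nick site
```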
Editing template
The term "editing template" refers to the portion of the extension arm that encodes the desired edit in the single-stranded 3' DNA flap synthesized by a polymerase, e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). Certain embodiments described herein (e.g., FIG. 71A) refer to an "RT template", which refers to both the editing template and the homology arm together, i.e., the sequence of the pegRNA extension arm that actually serves as a template during DNA synthesis. The term "RT editing template" is also equivalent to the term "DNA synthesis template", except that "RT editing template" reflects the use of a guided editor whose polymerase is a reverse transcriptase, whereas "DNA synthesis template" more broadly reflects the use of a guided editor having any polymerase.
Effective amount
As used herein, the term "effective amount" refers to an amount of a bioactive agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a guided editor (PE) may refer to an amount of the editor that is sufficient to edit a target site nucleotide sequence (e.g., a genome). In some embodiments, an effective amount of a guided editor (PE) provided herein, e.g., of a fusion protein comprising a nickase Cas9 domain and a reverse transcriptase, may refer to an amount of the fusion protein that is sufficient to induce editing of a target site that is specifically bound and edited by the fusion protein. Those skilled in the art will appreciate that the effective amount of an agent, e.g., a fusion protein, a nuclease, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, such as the desired biological response, e.g., the specific allele, genome, or target site to be edited, the cell or tissue being targeted, and the agent being used.
Error-prone reverse transcriptase
As used herein, the term "error-prone" reverse transcriptase (or more broadly, any polymerase) refers to a reverse transcriptase (or more broadly, any polymerase) that occurs naturally or that has been derived from another reverse transcriptase (e.g., a wild-type M-MLV reverse transcriptase) and that has an error rate higher (i.e., a fidelity lower) than that of wild-type M-MLV reverse transcriptase. The error rate of wild-type M-MLV reverse transcriptase is reported to be in the range of one error in 15,000 nucleobases (higher) to one error in 27,000 nucleobases (lower). An error rate of 1 in 15,000 corresponds to 6.7×10⁻⁵. An error rate of 1 in 27,000 corresponds to 3.7×10⁻⁵. See Boutabout et al. (2001) "DNA synthesis fidelity by the reverse transcriptase of the yeast retrotransposon Ty1," Nucleic Acids Res 29(11): 2217-2222, incorporated herein by reference. Thus, for the purposes of the present application, the term "error-prone" refers to reverse transcriptases having an error rate greater than one error in 15,000 nucleobase incorporations (6.7×10⁻⁵ or higher), for example, 1 error in 14,000 nucleobases (7.14×10⁻⁵ or higher), 1 error in 13,000 or fewer nucleobases (7.7×10⁻⁵ or higher), 1 error in 12,000 or fewer nucleobases (8.3×10⁻⁵ or higher), 1 error in 11,000 or fewer nucleobases (9.1×10⁻⁵ or higher), 1 error in 10,000 or fewer nucleobases (1×10⁻⁴, or 0.0001, or higher), 1 error in 9,000 or fewer nucleobases (0.00011 or higher), 1 error in 8,000 or fewer nucleobases (0.00013 or higher), 1 error in 7,000 or fewer nucleobases (0.00014 or higher), 1 error in 6,000 or fewer nucleobases (0.00017 or higher), 1 error in 5,000 or fewer nucleobases (0.0002 or higher), 1 error in 4,000 or fewer nucleobases (0.00025 or higher), 1 error in 3,000 or fewer nucleobases (0.00033 or higher), 1 error in 2,000 or fewer nucleobases (0.00050 or higher), 1 error in 1,000 or fewer nucleobases (0.001 or higher), or 1 error in 500 or fewer nucleobases (0.002 or higher).
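The conversion between "1 error in N nucleobases" and a per-nucleobase error rate is simply 1/N. The short sketch below recomputes several of the values above; the names `error_rate`, `ERROR_PRONE_THRESHOLD`, and `is_error_prone` are hypothetical, and the threshold of 1 error in 15,000 nucleobases is taken from the definition in this section.

```python
# Illustrative sketch only: function and constant names are assumptions.
def error_rate(bases_per_error: int) -> float:
    """Per-nucleobase error rate for a polymerase making 1 error per N bases."""
    if bases_per_error <= 0:
        raise ValueError("bases_per_error must be positive")
    return 1.0 / bases_per_error

# Upper end of the reported wild-type M-MLV RT range: 1 error in 15,000.
ERROR_PRONE_THRESHOLD = error_rate(15_000)

def is_error_prone(bases_per_error: int) -> bool:
    """True if the error rate is at or above the 1-in-15,000 threshold."""
    return error_rate(bases_per_error) >= ERROR_PRONE_THRESHOLD

for n in (27_000, 15_000, 12_000, 10_000, 1_000, 500):
    print(f"1 error in {n:>6,} nucleobases -> {error_rate(n):.2e}, "
          f"error-prone: {is_error_prone(n)}")
```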
Extein
As used herein, the term "extein" refers to a polypeptide sequence that is flanked by an intein and is ligated to another extein during protein splicing to form a mature, spliced protein. Typically, an intein is flanked by two extein sequences that are ligated together when the intein catalyzes its own excision. Exteins are thus the protein analog of the exons found in mRNA. For example, a polypeptide comprising an intein may have the structure extein(N)-intein-extein(C). After excision of the intein and splicing of the two exteins, the resulting structures are extein(N)-extein(C) and a free intein. In other configurations, the exteins may be separate proteins (e.g., halves of a Cas9 or of a guided editor), each fused to a split intein, wherein excision of the split inteins causes the extein sequences to be spliced together.
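The extein(N)-intein-extein(C) arrangement and its splicing outcome can be mimicked on strings. This is a toy sketch only: the function name `splice` and the short sequences are hypothetical, and real protein splicing is an autocatalytic chemical process rather than a text operation.

```python
# Illustrative sketch only: names and sequences are assumptions.
def splice(precursor: str, intein: str) -> tuple:
    """Excise `intein` from an extein(N)-intein-extein(C) precursor.

    Returns (mature spliced protein, free intein).
    """
    start = precursor.find(intein)
    if start == -1:
        raise ValueError("intein not found in precursor")
    extein_n = precursor[:start]
    extein_c = precursor[start + len(intein):]
    return extein_n + extein_c, intein

extein_n, intein, extein_c = "MKV", "CLSYDT", "SGE"  # toy sequences
precursor = extein_n + intein + extein_c
mature, free_intein = splice(precursor, intein)
print(mature, free_intein)
```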
Extension arm
The term "extension arm" refers to a nucleotide sequence component of a pegRNA that comprises a primer binding site and a DNA synthesis template (e.g., an editing template and a homology arm) for a polymerase (e.g., a reverse transcriptase). In some embodiments, such as FIG. 3D, the extension arm is located at the 3' end of the guide RNA. In other embodiments, such as FIG. 3E, the extension arm is located at the 5' end of the guide RNA. In some embodiments, the extension arm comprises a DNA synthesis template and a primer binding site. In some embodiments, the extension arm comprises the following components in the 5' to 3' direction: a DNA synthesis template and a primer binding site. In some embodiments, the extension arm further comprises a homology arm. In various embodiments, the extension arm comprises the following components in the 5' to 3' direction: a homology arm, an editing template, and a primer binding site. Because the polymerization activity of reverse transcriptase proceeds in the 5' to 3' direction, the preferred arrangement of the homology arm, editing template, and primer binding site is in the 5' to 3' direction, so that the reverse transcriptase, once primed by the annealed primer sequence, polymerizes a single strand of DNA using the editing template as the complementary template strand.
Further details, such as the length of the extension arm, are described elsewhere herein.
The extension arm can also be described as generally comprising two regions: a primer binding site (PBS) and a DNA synthesis template, for example, as shown in FIG. 3G (top panel). The primer binding site binds to the primer sequence that is formed from the endogenous DNA strand at the target site when that strand is nicked by the guided editor complex, thereby exposing a 3' end on the endogenous nicked strand. As explained herein, binding of the primer sequence to the primer binding site on the extension arm of the pegRNA creates a duplex region with an exposed 3' end (i.e., the 3' end of the primer sequence), which provides a substrate for the polymerase to begin polymerizing a single strand of DNA from the exposed 3' end along the length of the DNA synthesis template. The sequence of the single-stranded DNA product is the complement of the DNA synthesis template. Polymerization proceeds toward the 5' end of the DNA synthesis template (or extension arm) until it terminates. Thus, the DNA synthesis template represents the portion of the extension arm that is encoded by the polymerase of the guided editor complex into a single-stranded DNA product (i.e., the 3' single-stranded DNA flap containing the desired genetic editing information), which ultimately replaces the corresponding endogenous DNA strand at the target site immediately downstream of the PE-induced nick site. Without being bound by theory, polymerization of the DNA synthesis template continues toward the 5' end of the extension arm until a termination event occurs.
Polymerization may terminate in a variety of ways, including, but not limited to: (a) reaching the 5' end of the pegRNA (e.g., in the case of a 5' extension arm, where the DNA polymerase simply runs out of template), (b) reaching an impassable RNA secondary structure (e.g., a hairpin or stem/loop), or (c) reaching a replication termination signal, e.g., a specific nucleotide sequence that blocks or inhibits the polymerase, or a nucleic acid topological signal, such as supercoiled DNA or RNA.
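The templating relationship described above can be sketched in a few lines of Python. The sketch assumes an extension arm written 5' to 3' as a DNA synthesis template followed by a primer binding site; the sequences and function names are hypothetical placeholders, not sequences from this disclosure. It simply stops copying at the 5' end of the template and does not model how termination actually occurs.

```python
# Hypothetical illustration of flap synthesis; all sequences are made up.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement_as_dna(rna: str) -> str:
    """Return the DNA strand (5'->3') complementary to an RNA sequence (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def synthesize_flap(dna_synthesis_template: str, primer_binding_site: str) -> dict:
    """Model polymerization from the annealed primer toward the 5' end of the template.

    The extension arm is read 5'->3' as: DNA synthesis template, then PBS.
    The nicked genomic strand anneals to the PBS, so the primer is the reverse
    complement of the PBS. The polymerase extends the primer's exposed 3' end,
    producing DNA that is the reverse complement of the template.
    """
    primer = reverse_complement_as_dna(primer_binding_site)
    flap = reverse_complement_as_dna(dna_synthesis_template)
    return {
        "extension_arm": dna_synthesis_template + primer_binding_site,
        "primer": primer,
        "flap": flap,
        "extended_strand": primer + flap,  # 5'->3': primer followed by newly made DNA
    }


if __name__ == "__main__":
    result = synthesize_flap("GACUUCAGCA", "CGAUGCUA")
    for name, seq in result.items():
        print(f"{name:>16}: 5'-{seq}-3'")
```

In this model the extended strand is the reverse complement of the whole extension arm, which is one way to check that the primer and the flap have been oriented consistently.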
Flap endonuclease (e.g., FEN1)
As used herein, the term "flap endonuclease" refers to an enzyme that catalyzes the removal of 5' single-stranded DNA flaps. These enzymes process the removal of the 5' flaps formed during DNA replication. The guided editing methods described herein can utilize endogenously supplied flap endonucleases, or those provided in trans, to remove the 5' flap of endogenous DNA formed at the target site during guided editing. Flap endonucleases are known in the art and are described in Patel et al., "Flap endonucleases pass 5'-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5'-ends," Nucleic Acids Research, 2012, 40(10): 4507-4519; Tsutakawa et al., "Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily," Cell, 2011, 145(2): 198-211; and Balakrishnan et al., "Flap Endonuclease 1," Annu Rev Biochem, 2013, Vol. 82: 119-138 (each of which is incorporated herein by reference). An exemplary flap endonuclease is FEN1, which can be represented by the following amino acid sequence:
Functional equivalents
The term "functional equivalent" refers to a second biomolecule that is functionally equivalent to, but not necessarily structurally equivalent to, the first biomolecule. For example, a "Cas9 equivalent" refers to a protein that has the same or substantially the same function as Cas9, but not necessarily the same amino acid sequence. In the context of the present disclosure, the present specification refers throughout to "protein X or functional equivalent thereof". In this case, a "functional equivalent" of protein X includes any homolog, paralog, fragment, naturally occurring, engineered, mutated or synthetic form of protein X having an equivalent function.
Fusion proteins
As used herein, the term "fusion protein" refers to a hybrid polypeptide comprising protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thereby forming an "amino-terminal fusion protein" or a "carboxy-terminal fusion protein", respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic acid editing protein. Another example includes Cas9 or an equivalent thereof fused to a reverse transcriptase. Any of the proteins provided herein can be produced by any method known in the art. For example, the proteins provided herein can be produced via recombinant protein expression and purification, which is particularly suited to fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known and include those described in Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), which is incorporated herein by reference in its entirety.
Gene of interest (GOI)
The term "gene of interest" or "GOI" refers to a gene encoding a biomolecule of interest (e.g., a protein or an RNA molecule). The protein of interest may be any intracellular, membrane, or extracellular protein, such as a nuclear protein, transcription factor, nuclear membrane transport protein, intracellular organelle-associated protein, membrane receptor, catalytic protein or enzyme, therapeutic protein, membrane transport protein, signal transduction protein, or immune protein (e.g., IgG or another antibody protein), and the like. A gene of interest may also encode an RNA molecule, including, but not limited to, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), antisense RNA, guide RNA, microRNA (miRNA), small interfering RNA (siRNA), and cell-free RNA (cfRNA).
Guide RNA ("gRNA")
As used herein, the term "guide RNA" refers to a particular type of guide nucleic acid that is most commonly associated with the Cas protein of a CRISPR-Cas9 system and that associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence of the guide RNA. However, the term also encompasses equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and that otherwise program the Cas9 equivalent to localize to a specific target nucleotide sequence. Cas9 equivalents may include other napDNAbps from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system). Other Cas equivalents are described in Makarova et al., "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector," Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing suitable guide RNA sequences are provided herein. As used herein, a "guide RNA" may also be referred to as a "traditional guide RNA" to distinguish it from the modified form of guide RNA, referred to as a "guided editing guide RNA" (or "pegRNA"), which has been invented for use in the guided editing methods and compositions disclosed herein.
The guide RNA or pegRNA may comprise various structural elements including, but not limited to:
Spacer sequence - the sequence in the guide RNA or pegRNA (about 20 nt in length) that binds to the protospacer in the target DNA.
gRNA core (or gRNA scaffold or backbone sequence) - the sequence within the gRNA that is responsible for Cas9 binding; it does not include the 20 bp spacer/targeting sequence that guides Cas9 to the target DNA.
Extension arm - a single-stranded extension at the 3' end or 5' end of the pegRNA that comprises a primer binding site and a DNA synthesis template sequence. The template encodes, via a polymerase (e.g., a reverse transcriptase), a single-stranded DNA flap containing the genetic change of interest, which is then integrated into the endogenous DNA by replacing the corresponding endogenous strand, thereby installing the desired genetic change.
Transcription terminator - the guide RNA or pegRNA may comprise a transcription termination sequence at the 3' end of the molecule.
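The ordering of the structural elements listed above can be illustrated with a small Python data structure. This is a sketch under stated assumptions: the class name, field names, and the short placeholder strings are hypothetical and do not represent real spacer, scaffold, or terminator sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    """Hypothetical container for pegRNA elements; every field is written 5'->3'."""

    spacer: str                        # about 20 nt in a real pegRNA
    grna_core: str                     # scaffold bound by the napDNAbp
    extension_arm: str                 # DNA synthesis template plus primer binding site
    terminator: str = ""               # optional transcription termination sequence
    extension_at_3prime: bool = True   # False places the extension arm at the 5' end

    def sequence(self) -> str:
        """Concatenate the elements in 5'->3' order; the terminator is always at the 3' end."""
        guide = self.spacer + self.grna_core
        if self.extension_at_3prime:
            body = guide + self.extension_arm
        else:
            body = self.extension_arm + guide
        return body + self.terminator


if __name__ == "__main__":
    three_prime = PegRNA("GGCC", "AAAA", "UUUU", terminator="UU")
    five_prime = PegRNA("GGCC", "AAAA", "UUUU", extension_at_3prime=False)
    print(three_prime.sequence())
    print(five_prime.sequence())
```

Keeping the elements as separate fields, rather than as one string, makes it straightforward to swap a single component (for example, the extension arm) while leaving the spacer and gRNA core unchanged.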
G-quadruplex
The term "G-quadruplex" has its ordinary and customary meaning. G-quadruplexes are complex three-dimensional nucleic acid structures formed in guanine (G)-rich nucleic acid sequences. They are helical and are formed by stacked guanine tetrads (or "G-tetrads"), each of which is a planar ring structure formed by four guanines; they can be stabilized by the presence of a cation (e.g., potassium) located in the central channel between the G-tetrads. G-quadruplexes are a family of diverse structures rather than a single structure. Further information on G-quadruplexes can be found in (1) Kwok et al., "G-Quadruplexes: Prediction, Characterization, and Biological Application," Trends in Biotechnology, 2017, Vol. 35(10), pp. 997-1013; (2) Hansel-Hertsch R. et al., "DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential," Nat. Rev. Mol. Cell Biol., 2017; 18:279-284; and (3) Millevoi S. et al., "G-quadruplexes in RNA biology," Wiley Interdiscip. Rev. RNA, 2012; 3:495-507, each of which is incorporated herein by reference.
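Candidate G-quadruplex-forming sequences are often located computationally by searching for four runs of guanines separated by short loops. The sketch below uses one commonly cited heuristic (runs of at least three G separated by loops of 1 to 7 nucleotides); this pattern is an assumption brought in for illustration and is not taken from this disclosure, and a match indicates only the potential to fold, not that a G-quadruplex actually forms.

```python
import re

# Assumed heuristic: four G-runs (>= 3 G each) separated by loops of 1-7 nt.
# Accepts DNA (T) or RNA (U) letters.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGTU]{1,7}?G{3,}){3}")


def find_g4_candidates(sequence: str) -> list:
    """Return (start_index, matched_sequence) for each non-overlapping candidate motif."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Four copies of the human telomeric repeat unit (TTAGGG) flanked by A's.
    print(find_g4_candidates("AAGGGTTAGGGTTAGGGTTAGGGAA"))
    # A sequence with a single G-run has no candidates.
    print(find_g4_candidates("ATATATATGGGATAT"))
```

Because `finditer` reports non-overlapping matches only, overlapping candidates in long G-rich stretches would need a different scanning strategy.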
Homology arm
The term "homology arm" refers to the portion of the extension arm that encodes the part of the resulting reverse transcriptase-encoded single-stranded DNA flap that is to be integrated into the target DNA site by replacing the endogenous strand. The portion of the single-stranded DNA flap encoded by the homology arm is complementary to the non-edited strand of the target DNA sequence, which facilitates displacement of the endogenous strand and annealing of the single-stranded DNA flap in its place, thereby installing the edit. This component is further defined elsewhere herein. The homology arm is part of the DNA synthesis template, since it is by definition encoded by the polymerase of the guided editors described herein.
Host cells
As used herein, the term "host cell" refers to a cell that can contain, replicate and express a vector described herein, e.g., a vector comprising a nucleic acid molecule encoding a fusion protein comprising Cas9 or a Cas9 equivalent and a reverse transcriptase.
Intein
As used herein, the term "intein" refers to an auto-processing polypeptide domain found in organisms from all domains of life. An intein (intervening protein) carries out a unique auto-processing event known as protein splicing, in which it excises itself from a larger precursor polypeptide through the cleavage of two peptide bonds and, in the process, ligates the flanking extein (external protein) sequences through the formation of a new peptide bond. This rearrangement occurs post-translationally (or possibly co-translationally), because intein genes are found embedded in-frame within other protein-coding genes. Furthermore, intein-mediated protein splicing is spontaneous; it requires no external factor or energy source, only the folding of the intein domain. This process is also known as cis-protein splicing, as opposed to the natural process of trans-protein splicing with "split inteins". Inteins are the protein equivalent of self-splicing RNA introns (see Perler et al., Nucleic Acids Res. 22:1125-1127 (1994)); they catalyze their own excision from a precursor protein with the concomitant fusion of the flanking protein sequences, known as exteins (reviewed in Perler et al., Curr. Opin. Chem. Biol. 1:292-299 (1997); Perler, F.B., Cell 92(1):1-4 (1998); Xu et al., EMBO J. 15(19):5146-5153 (1996)).
As used herein, the term "protein splicing" refers to a process in which an interior region of a precursor protein (an intein) is excised and the flanking regions of the protein (exteins) are ligated to form the mature protein. This natural process has been observed in numerous proteins from both prokaryotes and eukaryotes (Perler, F.B., Xu, M.Q., Paulus, H., Current Opinion in Chemical Biology 1997, 1, 292-299; Perler, F.B., Nucleic Acids Research 1999, 27, 346-347). The intein unit contains the components needed to catalyze protein splicing and often contains an endonuclease domain that participates in intein mobility (Perler, F.B., Davis, E.O., Dean, G.E., Gimble, F.S., Jack, W.E., Neff, N., Noren, C.J., Thorner, J., Belfort, M., Nucleic Acids Research 1994, 22, 1125-1127). The resulting proteins are linked, however, and are not expressed as separate proteins. Protein splicing can also be carried out in trans, with split inteins expressed on separate polypeptides spontaneously combining to form a single intein, which then undergoes the protein splicing process to join the separate proteins.
Elucidation of the mechanism of protein splicing has led to a number of intein-based applications (Comb, et al., U.S. Patent No. 5,496,714; Comb, et al., U.S. Patent No. 5,834,247; Camarero and Muir, J. Amer. Chem. Soc., 121:5597-5598 (1999); Chong, et al., Gene, 192:271-281 (1997); Chong, et al., Nucleic Acids Res., 26:5109-5115 (1998); Chong, et al., J. Biol. Chem., 273:10567-10577 (1998); Cotton, et al., J. Am. Chem. Soc., 121:1100-1101 (1999); Evans, et al., J. Biol. Chem., 274:18359-18363 (1999); Evans, et al., J. Biol. Chem., 274:3923-3926 (1999); Evans, et al., Protein Sci., 7:2256-2264 (1998); Evans, et al., J. Biol. Chem., 275:9091-9094 (2000); Iwai and Pluckthun, FEBS Lett., 459:166-172 (1999); Mathys, et al., Gene, 231:1-13 (1999); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Muir, et al., Proc. Natl. Acad. Sci. USA, 95:6705-6710 (1998); [...]; Wu, et al., Biochim. Biophys. Acta, 1387:422-432 (1998b); Xu, et al., Proc. Natl. Acad. Sci. USA, 96:388-393 (1999); Yamazaki, et al., J. Am. Chem. Soc., 120:5591-5592 (1998)), each of which is incorporated herein by reference.
Ligand-dependent inteins
As used herein, the term "ligand-dependent intein" refers to an intein that comprises a ligand-binding domain. Typically, the ligand-binding domain is inserted into the amino acid sequence of the intein, resulting in the structure intein(N)-ligand-binding domain-intein(C). In general, ligand-dependent inteins exhibit no or only minimal protein splicing activity in the absence of a suitable ligand, and a marked increase in protein splicing activity in the presence of the ligand. In some embodiments, the ligand-dependent intein does not exhibit observable splicing activity in the absence of the ligand, but does exhibit splicing activity in the presence of the ligand. In some embodiments, the ligand-dependent intein exhibits observable protein splicing activity in the absence of the ligand, and the protein splicing activity in the presence of a suitable ligand is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 150-fold, at least 200-fold, at least 250-fold, at least 500-fold, at least 1000-fold, at least 1500-fold, at least 2000-fold, at least 2500-fold, at least 5000-fold, at least 10000-fold, at least 20000-fold, at least 25000-fold, at least 50000-fold, at least 100000-fold, at least 500000-fold, or at least 1000000-fold greater than the activity observed in the absence of the ligand. In some embodiments, the increase in activity is dose dependent over at least 1 order of magnitude, at least 2 orders of magnitude, at least 3 orders of magnitude, at least 4 orders of magnitude, or at least 5 orders of magnitude, allowing intein activity to be fine-tuned by adjusting the ligand concentration. Suitable ligand-dependent inteins are known in the art and include those provided below and those described in published U.S. Patent Application No. 2014/0065711 A1; Mootz et al., "Protein splicing triggered by a small molecule," J. Am. Chem. Soc. 2002; 124, 9044-9045; Mootz et al., "Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo," J. Am. Chem. Soc. 2003; 125, 10561-10569; Buskirk et al., Proc. Natl. Acad. Sci. USA. 2004; 101, 10505-10510; Skretas & Wood, "Regulation of protein activity with small-molecule-controlled inteins," Protein Sci. 2005; 14, 523-532; Schwartz, et al., "Post-translational enzyme activation in an animal via optimized conditional protein splicing," Nat. Chem. Biol. 2007; 3, 50-54; Peck et al., Chem. Biol. 2011; 18(5), 619-630; the entire contents of each of which are incorporated herein by reference. An exemplary sequence is as follows:
Linker
As used herein, the term "linker" refers to a molecule that joins two other molecules or moieties. Where a linker joins two fusion proteins, the linker may be an amino acid sequence. For example, Cas9 can be fused to a polymerase (e.g., a reverse transcriptase) via an amino acid linker sequence. Where two nucleotide sequences are joined together, the linker may also be a nucleotide sequence. For example, in the present case, a traditional guide RNA is linked via a spacer or linker nucleotide sequence to the RNA extension of a guided editing guide RNA, which may comprise an RT template sequence and an RT primer binding site. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5 to 100 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
Isolated
"Isolated" means altered or removed from the natural state. For example, a nucleic acid or a peptide naturally present in a living animal is not "isolated," but the same nucleic acid or peptide, partially or completely separated from the materials with which it coexists in its natural state, is "isolated." An isolated nucleic acid or protein can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.
In some embodiments, the gene of interest is encoded by an isolated nucleic acid. As used herein, the term "isolated" refers to the characteristic of a material as provided herein being removed from its original or native environment (e.g., the natural environment if it is naturally occurring). Thus, a naturally occurring polynucleotide, protein, or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated by human intervention from some or all of the coexisting materials in the natural system, is isolated. Accordingly, artificial or engineered materials, such as the non-naturally occurring nucleic acid constructs, expression constructs, and vectors described herein, are also referred to as isolated. A material does not have to be purified in order to be isolated. Thus, a material may be part of a vector and/or part of a composition and still be isolated, in that such a vector or composition is not part of the environment in which the material is found in nature.
MS2 tagging technology
In various embodiments (e.g., as depicted in the embodiments of FIGs. 72-73 and in Example 19), the term "MS2 tagging technique" refers to the combination of an "RNA-protein interaction domain" (also known as an "RNA-protein recruitment domain or protein") paired with an RNA-binding protein that specifically recognizes and binds to the RNA-protein interaction domain (e.g., a specific hairpin structure). These types of systems can be used to recruit a variety of functionalities to a guided editor complex that is bound to a target site. The MS2 tagging technique is based on the natural interaction of the MS2 bacteriophage coat protein ("MCP" or "MS2cp") with a stem-loop or hairpin structure present in the genome of the phage (i.e., the "MS2 hairpin"). In the case of guided editing, the MS2 tagging technique comprises introducing the MS2 hairpin into a desired RNA molecule involved in guided editing (e.g., a pegRNA or a tPERT), thereby creating a specific binding target for an RNA-binding protein that recognizes and binds to that structure. In the case of the MS2 hairpin, it is recognized and bound by the MS2 bacteriophage coat protein (MCP). Accordingly, if MCP is fused to another protein (e.g., a reverse transcriptase or another DNA polymerase), the MS2 hairpin can be used to "recruit" that other protein in trans to the target site occupied by the guided editing complex.
The guided editors described herein may, in one aspect, incorporate any known RNA-protein interaction domain to recruit or "co-localize" specific functionalities of interest to a guided editor complex. Reviews of other modular RNA-protein interaction domains are available in the art, for example, Johansson et al., "RNA recognition by the MS2 phage coat protein," Sem Virol., 1997, Vol. 8(3): 176-185; Delebecque et al., "Organization of intracellular reactions with rationally designed RNA assemblies," Science, 2011, Vol. 333: 470-474; Mali et al., "Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering," Nat. Biotechnol., 2013, Vol. 31: 833-838; and Zalatan et al., "Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds," Cell, 2015, Vol. 160: 339-350, each of which is incorporated herein by reference in its entirety. Other systems include the PP7 hairpin (which specifically recruits the PCP protein) and the "Com" hairpin (which specifically recruits the Com protein). See Zalatan et al.
The nucleotide sequence of the MS2 hairpin (or equivalently "MS2 aptamer") is:
GCCAACATGAGGATCACCCATGTCTGCAGGGCC (SEQ ID NO: 24).
The amino acid sequence of MCP or MS2cp is:
GSASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKVATQTVGGEELPVAGWRSYLNMELTIPFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 25).
The MS2 hairpin (or "MS2 aptamer") may also be referred to as a type of "RNA effector recruitment domain" (or equivalently, "RNA-binding protein recruitment domain" or simply "recruitment domain"), since it is a physical structure (e.g., a hairpin) that is installed into a pegRNA or tPERT and that effectively recruits other effector functions (e.g., RNA-binding proteins having various functions, such as DNA polymerases or other DNA-modifying enzymes) to the pegRNA or tPERT so modified, thereby co-localizing the effector functions in trans to the guided editing machinery. The present application is not intended to be limited in any way to any particular RNA effector recruitment domain, and may include any available such domain, including the MS2 hairpin. Example 19 and FIG. 72(b) depict the use of an MS2 aptamer joined to a DNA synthesis domain (i.e., a tPERT molecule) together with a guided editor comprising the MS2cp protein fused to PE2, which causes co-localization of the DNA synthesis domain of the tPERT molecule with the guided editor complex (MS2cp-PE2:sgRNA complex) bound to the target DNA site.
napDNAbp
As used herein, the term "nucleic acid programmable DNA binding protein" or "napDNAbp" (of which Cas9 is an example) refers to a protein that uses RNA:DNA hybridization to target and bind to a specific sequence in a DNA molecule. Each napDNAbp is associated with at least one guide nucleic acid (e.g., a guide RNA), which localizes the napDNAbp to a DNA sequence comprising a DNA strand (i.e., the target strand) that is complementary to the guide nucleic acid or a portion thereof (e.g., the protospacer of the guide RNA). In other words, the guide nucleic acid "programs" the napDNAbp (e.g., Cas9 or an equivalent) to localize and bind to the complementary sequence.
Without being bound by theory, the binding mechanism of a napDNAbp-guide RNA complex generally includes the step of forming an R-loop, whereby the napDNAbp induces unwinding of the double-stranded DNA target and thereby separates the strands in the region bound by the napDNAbp. The guide RNA protospacer then hybridizes to the "target strand". This displaces the "non-target strand", which is complementary to the target strand, and the displaced strand forms the single-stranded region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA, leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a "double-strand break", whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is "nicked" on one strand. Exemplary napDNAbps with different nuclease activities include "Cas9 nickase" ("nCas9") and a deactivated Cas9 having no nuclease activity ("dead Cas9" or "dCas9"). Exemplary sequences of these and other napDNAbps are provided herein.
Nickase
As used herein, "nickase" refers to a napDNAbp (e.g., a Cas protein) that is capable of cleaving only one of the two complementary strands of a double-stranded target DNA sequence, thereby generating a nick in that strand. In some embodiments, the nickase cleaves the non-target strand of the double-stranded target DNA sequence. In some embodiments, the nickase comprises an amino acid sequence with one or more mutations in a catalytic domain of a canonical napDNAbp (e.g., a Cas protein), wherein the one or more mutations reduce or abolish the nuclease activity of that catalytic domain. In some embodiments, the nickase is a Cas9 that comprises one or more mutations in the RuvC-like domain relative to a wild-type Cas9 sequence or relative to the equivalent amino acid positions in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises one or more mutations in the HNH-like domain relative to a wild-type Cas9 sequence or relative to the equivalent amino acid positions in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises an aspartic acid-to-alanine substitution (D10A) in the RuvC I catalytic domain relative to a canonical Cas9 sequence or relative to the equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises an H840A, N854A, and/or N863A mutation relative to a canonical Cas9 sequence or relative to the equivalent amino acid positions in other Cas9 variants or Cas9 equivalents. In some embodiments, the term "Cas9 nickase" refers to a Cas9 in which one of the two nuclease domains is inactivated. Such an enzyme is capable of cleaving only one strand of the target DNA. In some embodiments, the nickase is a Cas protein that is not a Cas9 nickase.
Nuclear Localization Sequence (NLS)
The term "nuclear localization sequence" or "NLS" refers to an amino acid sequence that promotes the import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and will be apparent to the skilled artisan. For example, NLS sequences are described in International PCT Application No. PCT/EP 2000/0110290, filed November 23, 2000, and published as WO/2001/038547 on 31/2001, the disclosure of which is incorporated herein by reference. In some embodiments, the NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 26) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 27).
Nucleic acid molecules
As used herein, the term "nucleic acid" refers to a polymer of nucleotides. The polymer can include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyluridine, C5-propynylcytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanosine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methylguanosine, N6-methyladenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).
Nucleotide structural motifs (or nucleic acid portions)
As used herein, the term "nucleotide structural motif" or, equivalently, "nucleic acid moiety" refers to a nucleic acid molecule, or a portion thereof, that forms a secondary or tertiary structure as a result of base-pairing interactions within a single nucleic acid polymer or between two or more nucleic acid polymers. Such nucleotide structural motifs may be formed from DNA, RNA, or a mixture of DNA and RNA. The term does not refer to a standard DNA duplex. Examples of nucleic acid moieties include, but are not limited to, toeloops, hairpins, stem loops, pseudoknots, aptamers, G-quadruplexes, tRNAs, ribozymes, riboswitches, A-form DNA, B-form DNA, or Z-form DNA.
pegRNA
As used herein, the term "guided editing guide RNA" (i.e., prime editing guide RNA) or "pegRNA" or "extended guide RNA" refers to a specialized form of guide RNA that has been modified to include one or more additional sequences for carrying out the guided editing methods and compositions described herein. As described herein, the guided editing guide RNA comprises one or more "extension regions" of nucleic acid sequence. The extension region may comprise, but is not limited to, single-stranded RNA or DNA. The extension region may be present at the 3' end of a traditional guide RNA. In other configurations, the extension region may be present at the 5' end of a traditional guide RNA. In still other configurations, the extension region may be present in an intramolecular region of the traditional guide RNA, for example, in the gRNA core region that associates with and/or binds to the napDNAbp. The extension region comprises a "DNA synthesis template" that encodes (via the polymerase of the guided editor) a single-stranded DNA which, in turn, has been designed (a) to be homologous to the endogenous target DNA to be edited, and (b) to comprise at least one desired nucleotide change (e.g., a transition, a transversion, a deletion, or an insertion) to be introduced or integrated into the endogenous target DNA. The extension region may also comprise other functional sequence elements, such as, but not limited to, a "primer binding site" and a "spacer or linker" sequence, or other structural elements, such as, but not limited to, an aptamer, a stem loop, a hairpin, a toeloop (e.g., a 3' toeloop), or an RNA-protein recruitment domain (e.g., an MS2 hairpin). As used herein, the "primer binding site" comprises a sequence that hybridizes to a single-stranded DNA sequence having a 3' end that is generated from the nicked DNA of the R-loop.
In certain embodiments, the pegRNA is shown by FIG. 3A, which shows a pegRNA with a 5' extension arm, a spacer region, and a gRNA core. The 5' extension also comprises a reverse transcriptase template, a primer binding site, and a linker in the 5' to 3' direction. As shown, the reverse transcriptase template may also be more broadly referred to as a "DNA synthesis template," for cases in which the polymerase of the guided editor described herein is not an RT but another type of polymerase.
In certain other embodiments, the pegRNA is shown by FIG. 3B, which shows a pegRNA with a 5' extension arm, a spacer region, and a gRNA core. The 5' extension also includes a reverse transcriptase template, a primer binding site, and a linker in the 5' to 3' direction. As shown, the reverse transcriptase template may also be more broadly referred to as a "DNA synthesis template," for cases in which the polymerase of the guided editor described herein is not an RT but another type of polymerase.
In other embodiments, the pegRNA is shown in FIG. 3D, which shows a pegRNA with a spacer (1), a gRNA core (2), and an extension arm (3) in the 5' to 3' direction. The extension arm (3) is located at the 3' end of the pegRNA. The extension arm (3) also includes a "primer binding site" (A), an "editing template" (B), and a "homology arm" (C) in the 5' to 3' direction. The extension arm (3) may also comprise optional modification regions at the 3' and 5' ends, which may be the same sequence or different sequences. In addition, the 3' end of the pegRNA may comprise a transcription terminator sequence. These sequence elements of the pegRNA are further described and defined herein.
In other embodiments, the pegRNA is shown by FIG. 3E, which shows a pegRNA with an extension arm (3), a spacer (1), and a gRNA core (2) in the 5' to 3' direction. The extension arm (3) is located at the 5' end of the pegRNA. The extension arm (3) further comprises a "primer binding site" (A), an "editing template" (B), and a "homology arm" (C) in the 5' to 3' direction. The extension arm (3) may also comprise optional modification regions at the 3' and 5' ends, which may be the same sequence or different sequences. The pegRNA may further comprise a transcription terminator sequence at the 3' end. These sequence elements of the pegRNA are further described and defined herein.
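The two layouts described for FIG. 3D and FIG. 3E can be illustrated with a minimal Python sketch. The class and field names below are hypothetical, the bracketed labels are placeholders rather than real sequences, and the optional modification regions and transcription terminator are left out for brevity. The extension arm elements are simply concatenated in the 5' to 3' order in which the text lists them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionArm:
    """Extension arm elements, held in the 5'-to-3' order listed in the text."""
    primer_binding_site: str  # element (A)
    editing_template: str     # element (B)
    homology_arm: str         # element (C)

    def sequence(self) -> str:
        return self.primer_binding_site + self.editing_template + self.homology_arm


@dataclass(frozen=True)
class PegRNA:
    """Hypothetical container for the three pegRNA regions."""
    spacer: str                  # element (1)
    grna_core: str               # element (2)
    extension_arm: ExtensionArm  # element (3)
    arm_at_3_prime: bool = True  # True: FIG. 3D layout; False: FIG. 3E layout

    def sequence(self) -> str:
        guide = self.spacer + self.grna_core
        arm = self.extension_arm.sequence()
        # The arm follows the guide for a 3' extension and precedes it for a 5' extension.
        return guide + arm if self.arm_at_3_prime else arm + guide


# Placeholder labels stand in for real nucleotide sequences.
arm = ExtensionArm("[PBS]", "[EDIT]", "[HOMOLOGY]")
print(PegRNA("[SPACER]", "[CORE]", arm).sequence())
print(PegRNA("[SPACER]", "[CORE]", arm, arm_at_3_prime=False).sequence())
```

Keeping the arm as its own object means the same extension can be attached at either end of the guide without being rebuilt.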
PE1
As used herein, "PE1" refers to a PE complex comprising a fusion protein comprising Cas9(H840A) and wild-type MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)] + the desired pegRNA, wherein the PE fusion has the amino acid sequence of SEQ ID NO: 28.
PE2
As used herein, "PE2" refers to a PE complex comprising a fusion protein comprising Cas9(H840A) and a variant MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)] + the desired pegRNA, wherein the PE fusion has the amino acid sequence of SEQ ID NO: 33.
PE3
As used herein, "PE3" refers to PE2 plus a second strand nick-generating guide RNA that complexes with PE2 and introduces a nick in the non-edited DNA strand to cause preferential displacement of the edited strand.
PE3b
As used herein, "PE3b" refers to PE3 in which the second-strand nick-generating guide RNA is designed for temporal control, such that the second strand nick is not introduced until after the desired edit has been installed. This is achieved by designing the gRNA with a spacer sequence that matches only the edited strand and not the original allele. With this strategy (hereinafter referred to as PE3b), the mismatch between the pre-spacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand has occurred.
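The PE3b selection rule, a spacer that matches the edited sequence but not the original allele, can be sketched as a simple string test. This is a toy illustration under stated assumptions: the function names are hypothetical, the sequences are short made-up examples rather than real 20-nt spacers, and PAM requirements and mismatch tolerance are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def occurs_on_either_strand(site: str, locus: str) -> bool:
    """True if the site is found on the given strand or on its complement."""
    return site in locus or reverse_complement(site) in locus


def is_pe3b_spacer(spacer_dna: str, original_locus: str, edited_locus: str) -> bool:
    """A PE3b-style spacer matches the edited locus but not the original allele."""
    return (occurs_on_either_strand(spacer_dna, edited_locus)
            and not occurs_on_either_strand(spacer_dna, original_locus))


original = "GATTACAGATTACA"
edited = "GATTACCGATTACA"  # single A-to-C change at index 6

print(is_pe3b_spacer("TACCGAT", original, edited))  # spans the edit
print(is_pe3b_spacer("GATTAC", original, edited))   # present in both alleles
```

A spacer that spans the edit passes the test, while one that lies entirely in unchanged sequence does not, because it would also direct nicking of the original allele.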
PE4
As used herein, "PE4" refers to a system comprising PE2 plus an MLH1 dominant negative protein expressed in trans (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to herein as "MLH1 Δ754-756" or "MLH1dn"). In some embodiments, PE4 refers to a fusion protein comprising PE2 and an MLH1 dominant negative protein linked via an optional linker.
PE5
As used herein, "PE5" refers to a system comprising PE3 plus an MLH1 dominant negative protein expressed in trans (i.e., wild-type MLH1 with amino acids 754-756 truncated, as further described herein, which may be referred to as "MLH1 Δ754-756" or "MLH1dn"). In some embodiments, PE5 refers to a fusion protein comprising PE3 and an MLH1 dominant negative protein linked via an optional linker.
PE-short
As used herein, "PE-short" refers to a PE construct fused to a C-terminally truncated reverse transcriptase and having the following amino acid sequence:
MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDNSRLINSGGSKRTADGSEFEPKKKRKV(SEQ ID NO:35)
Key:
Nuclear localization sequence (NLS): beginning (SEQ ID NO: 29), end (SEQ ID NO: 30)
Cas9(H840A) (SEQ ID NO: 31)
33-amino acid linker 1 (SEQ ID NO: 11)
M-MLV truncated reverse transcriptase (SEQ ID NO: 36)
Peptide tag
The term "peptide tag" refers to a peptide sequence that is genetically fused to a protein sequence to impart one or more functions to the protein that facilitate its manipulation for various purposes, such as visualization, purification, solubilization, and isolation. Peptide tags may be classified by purpose or function, and include "affinity tags" (to facilitate protein purification), "solubilization tags" (to assist proper folding of the protein), "chromatography tags" (to alter the chromatographic properties of the protein), "epitope tags" (to bind high-affinity antibodies), and "fluorescent tags" (to facilitate visualization of the protein in cells or in vitro).
Polymerase enzyme
As used herein, the term "polymerase" refers to an enzyme that synthesizes a nucleotide strand and that can be used in conjunction with the guided editor system described herein. The polymerase may be a "template-dependent" polymerase (i.e., a polymerase that synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). The polymerase may also be a "template-independent" polymerase (i.e., a polymerase that synthesizes a nucleotide strand without requiring a template strand). A polymerase may be further classified as a "DNA polymerase" or an "RNA polymerase." In various embodiments, the guided editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase may be a "DNA-dependent DNA polymerase" (i.e., whereby the template molecule is a DNA strand). In this case, the DNA template molecule may be a pegRNA in which the extension arm comprises a DNA strand. Such a pegRNA may be referred to as a chimeric or hybrid pegRNA, which comprises an RNA portion (i.e., the guide RNA components, including the spacer region and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase may be an "RNA-dependent DNA polymerase" (i.e., whereby the template molecule is an RNA strand). In this case, the pegRNA is RNA, i.e., it includes an RNA extension. The term "polymerase" may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., the polymerase activity). Typically, the enzyme will initiate synthesis at the 3' end of a primer annealed to a polynucleotide template sequence (e.g., a primer sequence annealed to the primer binding site of a pegRNA) and will proceed toward the 5' end of the template strand. A "DNA polymerase" catalyzes the polymerization of deoxynucleotides. As used herein with respect to a DNA polymerase, the term DNA polymerase includes a "functional fragment thereof."
A "functional fragment thereof" refers to any portion of a wild-type or mutant DNA polymerase that encompasses less than the complete amino acid sequence of the polymerase and retains the ability to catalyze the polymerization of a polynucleotide under at least one set of conditions. Such a functional fragment may exist as a separate entity, or it may be a component of a larger polypeptide (e.g., a fusion protein).
Guided editing
As used herein, the term "guided editing" refers to a novel method of gene editing using a napDNAbp, a polymerase (e.g., a reverse transcriptase), and a specialized guide RNA that includes a DNA synthesis template for encoding desired new genetic information (or deleting genetic information) that is then incorporated into a target DNA sequence. Certain embodiments of guided editing are specifically described in the embodiments of FIGS. 1A-1H and 72(a) to 72(c).
Guided editing represents an entirely new platform for genome editing. It is a versatile and precise genome editing method that uses a nucleic acid programmable DNA binding protein ("napDNAbp") working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp) to write new genetic information directly into a specified DNA site, wherein the guided editing system is programmed with a guided editing (PE) guide RNA ("pegRNA") that both specifies the target site and, by way of an extension (either DNA or RNA) engineered onto the guide RNA (e.g., at the 5' or 3' end, or at an internal portion of the guide RNA), templates the synthesis of the desired edit in the form of a replacement DNA strand. The replacement strand containing the desired edit (e.g., a single nucleobase substitution) shares the same (or a homologous) sequence as the endogenous strand of the target site to be edited, except that it includes the desired edit. Through DNA repair and/or replication machinery, the endogenous strand downstream of the nick site is replaced by the newly synthesized replacement strand containing the desired edit. In some cases, guided editing may be thought of as a "search-and-replace" genome editing technology, since the guided editors described herein not only search for and locate the desired target site to be edited, but at the same time encode a replacement strand containing the desired edit, which is installed in place of the corresponding endogenous DNA strand of the target site. The guided editors of the present disclosure relate, in part, to the discovery that the mechanism of target-primed reverse transcription (TPRT), or "guided editing," can be leveraged or adapted to conduct precise CRISPR/Cas-based genome editing with high efficiency and genetic flexibility (e.g., as depicted in the various embodiments of FIGS. 1A-1F). TPRT is naturally used by mobile DNA elements, such as mammalian non-LTR retrotransposons and bacterial group II introns (refs. 28, 29).
The inventors herein use Cas protein-reverse transcriptase fusions or related systems to target a specific DNA sequence with a guide RNA, generate a single-strand nick at the target site, and use the nicked DNA as a primer for reverse transcription of an engineered reverse transcriptase template that is integrated with the guide RNA. However, while the concept begins with guide editors that use reverse transcriptase as the DNA polymerase component, the guide editors described herein are not limited to reverse transcriptases and may include the use of virtually any DNA polymerase. Indeed, although the present application may refer throughout to guided editors having a "reverse transcriptase," it is set forth herein that reverse transcriptase is only one type of DNA polymerase that may work with guided editing. Thus, wherever the specification refers to a "reverse transcriptase," one of ordinary skill in the art will appreciate that any suitable DNA polymerase may be used in place of the reverse transcriptase. Thus, in one aspect, the guide editor may comprise Cas9 (or an equivalent napDNAbp) that is programmed to target a DNA sequence by associating it with a specialized guide RNA (i.e., a pegRNA) containing a spacer sequence that anneals to a complementary pre-spacer in the target DNA. The specialized guide RNA also contains new genetic information in the form of an extension that encodes a replacement strand of DNA containing the desired genetic change, which is used to replace the corresponding endogenous DNA strand at the target site. To transfer information from the pegRNA to the target DNA, the mechanism of guided editing involves nicking the target site in one strand of the DNA to expose a 3'-hydroxyl group. The exposed 3'-hydroxyl group can then be used to prime DNA polymerization of the edit-encoding extension on the pegRNA directly into the target site.
In various embodiments, the extension (which provides the template for polymerization of the replacement strand containing the edit) may be formed from RNA or DNA. In the case of an RNA extension, the polymerase of the guided editor may be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of a DNA extension, the polymerase of the guided editor may be a DNA-dependent DNA polymerase. The newly synthesized strand formed by the guided editors disclosed herein (i.e., the replacement DNA strand containing the desired edit) will be homologous to the genomic target sequence (i.e., have the same sequence) except for the inclusion of the desired nucleotide change (e.g., a single nucleotide change, a deletion, an insertion, or a combination thereof). The newly synthesized (or replacement) strand of DNA, also referred to as a single-stranded DNA flap, competes for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. In certain embodiments, the system can be combined with the use of an error-prone reverse transcriptase (e.g., provided as a fusion protein with the Cas9 domain, or provided in trans with the Cas9 domain). The error-prone reverse transcriptase can introduce changes during the synthesis of the single-stranded DNA flap. Thus, in certain embodiments, an error-prone reverse transcriptase may be used to introduce nucleotide changes into the target DNA. Depending on the error-prone reverse transcriptase used with the system, the changes may be random or non-random.
Resolution of the hybridized intermediate (comprising the single-stranded DNA flap synthesized by the reverse transcriptase hybridized to the endogenous DNA strand) may include removal of the resulting displaced flap of endogenous DNA (e.g., with a 5' flap endonuclease such as FEN1), ligation of the synthesized single-stranded DNA flap to the target DNA, and assimilation of the desired nucleotide change as a result of cellular DNA repair and/or replication processes. Because templated DNA synthesis offers single-nucleotide precision for the modification of any nucleotide, including insertions and deletions, the scope of this approach is very broad and it could foreseeably be used for myriad applications in basic science and therapeutics.
In various embodiments, guided editing is performed by contacting a target DNA molecule (into which a change in the nucleotide sequence is desired to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a guided editing guide RNA (pegRNA). Referring to FIG. 1G, the guided editing guide RNA (pegRNA) comprises an extension at the 3' or 5' end of the guide RNA, or at an intramolecular location in the guide RNA, and encodes the desired nucleotide change (e.g., a single nucleotide change, an insertion, or a deletion). In step (a), the napDNAbp/pegRNA complex contacts the DNA molecule, and the extended pegRNA guides the napDNAbp to bind to the target locus. In step (b), a nick is introduced (e.g., by a nuclease or chemical agent) in one of the DNA strands of the target locus, thereby creating an available 3' end in one of the strands of the target locus. In certain embodiments, the nick is created in the DNA strand that corresponds to the R-loop strand (i.e., the strand that is not hybridized to the guide RNA sequence, i.e., the "non-target strand"). However, the nick may be introduced in either strand. In other words, the nick may be introduced into the R-loop "target strand" (i.e., the strand hybridized to the pegRNA) or the "non-target strand" (i.e., the strand forming the single-stranded portion of the R-loop, which is complementary to the target strand). In step (c), the 3' end of the DNA strand (formed by the nick) interacts with the extension of the guide RNA in order to prime reverse transcription (i.e., "target-primed RT"). In certain embodiments, the 3' end of the DNA strand hybridizes to a specific RT priming sequence on the extension of the guide RNA, i.e., the "reverse transcriptase priming sequence" or "primer binding site" on the pegRNA.
In step (d), a reverse transcriptase (or other suitable DNA polymerase) is introduced, which synthesizes a single strand of DNA from the 3' end of the primed site toward the 5' end of the guided editing guide RNA. The DNA polymerase (e.g., reverse transcriptase) may be fused to the napDNAbp or, alternatively, may be provided in trans with the napDNAbp. This forms a single-stranded DNA flap that comprises the desired nucleotide change (e.g., a single base change, an insertion, a deletion, or a combination thereof) and is otherwise homologous to the endogenous DNA at or adjacent to the nick site. In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to the resolution of the single-stranded DNA flap such that the desired nucleotide change becomes incorporated into the target locus. This process can be driven toward formation of the desired product by removing the corresponding 5' endogenous DNA flap, which forms once the 3' single-stranded DNA flap invades and hybridizes to the endogenous DNA sequence. Without being bound by theory, the cell's endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change and form the desired altered product. The process can also be driven toward product formation by "second strand nick generation," as shown in FIG. 1F. The process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions.
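Steps (c) and (d) can be pictured with a toy Python sketch. It assumes an RNA extension arm whose primer binding site sits at its 3' end, with the DNA synthesis template immediately 5' of it. The function names and all sequences are made up for illustration, and real constraints (flap length, homology, secondary structure) are ignored.

```python
# Watson-Crick partner of each template base, written as DNA.
TEMPLATE_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def copy_template_to_dna(template: str) -> str:
    """Reverse-complement an RNA (or DNA) template into the DNA strand copied from it."""
    return "".join(TEMPLATE_TO_DNA[base] for base in reversed(template))


def synthesize_flap(nicked_strand_3_end: str, extension_arm: str, pbs_length: int) -> str:
    """Return the 3' DNA flap polymerized from the nicked strand's free 3' end.

    Step (c): the last `pbs_length` nucleotides of the nicked DNA strand must
    anneal to the primer binding site at the 3' end of the extension arm.
    Step (d): the polymerase then copies the rest of the arm toward its 5' end.
    """
    pbs = extension_arm[-pbs_length:]
    template = extension_arm[:-pbs_length]
    primer = nicked_strand_3_end[-pbs_length:]
    if copy_template_to_dna(pbs) != primer:
        raise ValueError("primer does not anneal to the primer binding site")
    return copy_template_to_dna(template)


# Hypothetical example: template "GUCAA" followed by the PBS "UGCAGGUC".
flap = synthesize_flap("AAGACCTGCA", "GUCAAUGCAGGUC", pbs_length=8)
print(flap)
```

The returned flap is the stretch of new DNA appended to the nicked strand. In the process described above it would carry the desired nucleotide change and otherwise match the endogenous sequence next to the nick.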
The term "guided editor (PE) system" or "guided editor (PE)" or "PE system" or "PE editing system" refers to compositions involved in genome editing methods using guided editing as described herein, including, but not limited to, napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising a napDNAbp and a reverse transcriptase), guided editing guide RNAs, and complexes comprising fusion proteins and guided editing guide RNAs, as well as auxiliary elements, such as a second strand nick generating component (e.g., a second strand sgRNA) and a 5' endogenous DNA flap removal endonuclease (e.g., FEN1), that help drive the guided editing process toward formation of the edited product.
Although in the embodiments described so far the pegRNA constitutes a single molecule comprising a guide RNA (which itself comprises a spacer sequence and a gRNA core or scaffold) and a 5' or 3' extension arm comprising a primer binding site and a DNA synthesis template (see, e.g., FIG. 3D), the pegRNA may also take the form of two separate molecules consisting of a guide RNA and a trans guided editor RNA template (tPERT). The tPERT essentially houses the extension arm (including, in particular, the primer binding site and the DNA synthesis domain) and an RNA-protein recruitment domain (e.g., an MS2 aptamer or hairpin) in the same molecule, which becomes co-localized or recruited to a modified guided editor complex that comprises a tPERT recruiting protein (e.g., an MS2cp protein, which binds to the MS2 aptamer), as an example of a tPERT that may be used with guided editing.
Guide editor
The term "guide editor" refers to a fusion construct described herein that comprises napDNAbp (e.g., cas9 nickase) and a reverse transcriptase, and is capable of guided editing of a target nucleotide sequence in the presence of a pegRNA (or "extended guide RNA"). The term "guide editor" may refer to a fusion protein or a fusion protein complexed with a pegRNA and/or further complexed with a second strand-incision generating sgRNA. In some embodiments, a guide editor may also refer to a complex comprising a fusion protein (reverse transcriptase fused to napDNAbp), a pegRNA, and a conventional guide RNA capable of guiding a second site nick generation step of a non-editing strand, as described herein. In other embodiments, the reverse transcriptase component of the "guide editor" may be provided in trans.
Primer binding sites
The term "primer binding site" or "PBS" refers to a nucleotide sequence located on a pegRNA as a component of the extension arm (typically, e.g., at the 3' end of the extension arm). The primer binding site is a single-stranded portion of the pegRNA that comprises a region complementary to a sequence on the non-target strand. In some embodiments, the primer binding site is complementary to a region upstream of the nick site in the non-target strand. In some embodiments, the primer binding site is complementary to a region immediately upstream of the nick site in the non-target strand. In some embodiments, the primer binding site is capable of binding to the primer sequence that is formed after nicking of the target sequence by the guide editor. When the guide editor nicks one strand of the target DNA sequence (e.g., by the Cas nickase component of the guide editor), a 3'-ended ssDNA flap is formed, which serves as a primer that anneals to the primer binding site on the pegRNA to prime reverse transcription. FIGS. 27 and 28 show embodiments of primer binding sites located on 3' and 5' extension arms, respectively. In some embodiments, the PBS is complementary or substantially complementary to, and can anneal to, the free 3' end on the non-target strand of the double-stranded target DNA at the nick site. In some embodiments, the PBS annealed to the free 3' end on the non-target strand can initiate target-primed DNA synthesis.
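The complementarity between the primer binding site and the region immediately upstream of the nick can be shown with a short Python sketch. It assumes an RNA primer binding site and a non-target strand written 5' to 3'; the function name, the coordinate convention for the nick, and the example sequence are hypothetical.

```python
# RNA base that pairs with each DNA base.
DNA_TO_RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}


def design_pbs(non_target_strand: str, nick_index: int, pbs_length: int) -> str:
    """Return an RNA primer binding site for the nicked non-target strand.

    `non_target_strand` is read 5' to 3', and the nick is taken to fall
    between positions nick_index - 1 and nick_index (0-based). The PBS is the
    reverse complement, as RNA, of the `pbs_length` nucleotides immediately
    upstream of the nick.
    """
    if pbs_length < 1 or nick_index - pbs_length < 0 or nick_index > len(non_target_strand):
        raise ValueError("PBS region falls outside the supplied sequence")
    primer_region = non_target_strand[nick_index - pbs_length:nick_index]
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(primer_region))


# Hypothetical non-target strand nicked after its tenth nucleotide.
print(design_pbs("AAGACCTGCATTT", nick_index=10, pbs_length=8))
```

Varying `pbs_length` changes how much of the free 3' end is available to anneal to the pegRNA. The sketch makes no claim about which length works best.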
Promoters
The term "promoter" is art-recognized and refers to a nucleic acid molecule having a sequence that is recognized by the cellular transcription machinery and is capable of initiating transcription of a downstream gene. A promoter may be constitutively active, meaning that the promoter is always active in a given cellular environment, or conditionally active, meaning that the promoter is active only in the presence of certain conditions. For example, a conditional promoter may be active only in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcription machinery, or only in the absence of an inhibitory molecule. One subclass of conditionally active promoters is inducible promoters, which require the presence of a small molecule "inducer" for activity. Examples of inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters. A variety of constitutive, conditional, and inducible promoters are well known to the skilled artisan, who will be able to identify a variety of such promoters useful in practicing the present invention, which is not limited in this respect.
Pre-spacer
As used herein, the term "pre-spacer" refers to the sequence (about 20 bp) in DNA that is adjacent to the PAM (pre-spacer adjacent motif) sequence. The pre-spacer shares the same sequence as the spacer sequence of the guide RNA. The guide RNA anneals to the complement of the pre-spacer on the target DNA (specifically, on one strand thereof, i.e., the "target strand" as opposed to the "non-target strand" of the target DNA sequence). In some embodiments, in order for the Cas nickase component of a guide editor to function, it also requires a specific pre-spacer adjacent motif (PAM) that varies depending on the Cas protein component itself (e.g., the type of Cas protein and the bacterial species from which it derives). For example, the most commonly used Cas9 nuclease, derived from Streptococcus pyogenes, recognizes a PAM sequence of NGG on the non-target strand that lies directly downstream of the genomic DNA target sequence. The skilled artisan will appreciate that the literature in the art sometimes refers to the approximately 20-nt target-specific guide sequence on the guide RNA itself as the "pre-spacer," rather than as the "spacer." Thus, in some cases, the term "pre-spacer" as used herein may be used interchangeably with the term "spacer." The context of the description surrounding "pre-spacer" or "spacer" will help inform the reader whether the term refers to the gRNA or to the DNA target.
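The relationship between a pre-spacer, its PAM, and the guide RNA spacer can be sketched in Python. The example assumes the SpCas9 NGG PAM mentioned above and a 20-nt pre-spacer read 5' to 3' on the non-target strand; the function name and the input sequence are invented for illustration.

```python
import re


def find_prespacers(non_target_strand: str, length: int = 20):
    """List (start, pre-spacer, PAM) for every site directly followed by NGG.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    sequence = non_target_strand.upper()
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % length
    return [(m.start(), m.group(1), m.group(2)) for m in re.finditer(pattern, sequence)]


def spacer_rna(prespacer: str) -> str:
    """The guide RNA spacer shares the pre-spacer sequence, written as RNA."""
    return prespacer.replace("T", "U")


locus = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"  # made-up sequence
for start, prespacer, pam in find_prespacers(locus):
    print(start, prespacer, pam, spacer_rna(prespacer))
```

Only one strand is scanned here. A fuller search would repeat the scan on the reverse complement to find sites on the opposite strand.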
Pre-spacer adjacent motif (PAM)
As used herein, the term "pre-spacer adjacent motif" or "PAM" refers to a DNA sequence of about 2-6 base pairs that is an important targeting component of a Cas9 nuclease. Typically, the PAM sequence is located on either strand and downstream of the Cas9 cleavage site in the 5' to 3' direction. The classical PAM sequence (i.e., the PAM sequence associated with the Cas9 nuclease of Streptococcus pyogenes, or SpCas9) is 5'-NGG-3', where "N" is any nucleobase followed by two guanine ("G") nucleobases. Different PAM sequences may be associated with different Cas9 nucleases or equivalent proteins from different organisms. Furthermore, any given Cas9 nuclease, such as SpCas9, can be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes an alternative PAM sequence.
For example, with reference to the classical SpCas9 amino acid sequence of SEQ ID NO: 37, the PAM specificity may be modified by introducing one or more mutations, including (a) D1135V, R1335Q, and T1337R (the "VQR variant"), which alter the PAM specificity to NGAN or NGNG, (b) D1135E, R1335Q, and T1337R (the "EQR variant"), which alter the PAM specificity to NGAG, and (c) D1135V, G1218R, R1335E, and T1337R (the "VRER variant"), which alter the PAM specificity to NGCG. In addition, the D1135E variant of classical SpCas9 still recognizes NGG, but is more selective than the wild-type SpCas9 protein.
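Substitution notation such as D1135V (reference residue, 1-based position, replacement residue) can be applied programmatically. The Python sketch below is a hypothetical helper demonstrated on a four-residue toy peptide rather than on the actual sequence of SEQ ID NO: 37. It rejects a substitution when the stated reference residue is not found at the stated position, which guards against numbering mistakes.

```python
import re


def apply_substitutions(protein: str, substitutions: list) -> str:
    """Apply substitutions written as <old residue><1-based position><new residue>."""
    residues = list(protein)
    for substitution in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
        if match is None:
            raise ValueError("unrecognized substitution: " + substitution)
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError("position out of range: " + substitution)
        if residues[position - 1] != old:
            raise ValueError("reference residue mismatch: " + substitution)
        residues[position - 1] = new
    return "".join(residues)


# Toy peptide; positions 2-4 are replaced in the style of a multi-site variant.
print(apply_substitutions("MDRT", ["D2V", "R3Q", "T4R"]))
```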
It is also understood that Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) may have different PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. In addition, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In yet another example, Cas9 from Treponema denticola (TdCas) recognizes NAAAAC. These are examples and are not meant to be limiting. It is further understood that non-SpCas9 enzymes bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cleavage site. Furthermore, non-SpCas9 enzymes may have other characteristics that make them more useful than SpCas9. For example, Cas9 from Staphylococcus aureus (SaCas9) is about 1 kilobase smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV). Further reference may be made to Shah et al., "Protospacer recognition motifs: mixed identities and functional diversity," RNA Biology, 10(5): 891-899 (incorporated herein by reference).
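The PAM specificities listed above use standard IUPAC nucleotide codes (N is any base, R is A or G, W is A or T). The Python sketch below checks a candidate sequence against those PAMs. The dictionary keys follow the abbreviations used in the text, only the NGRRT form is included for SaCas9, and the function names are hypothetical.

```python
import re

# IUPAC codes needed for the PAMs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

PAMS = {
    "SpCas9": "NGG",
    "SaCas9": "NGRRT",
    "NmCas": "NNNNGATT",
    "StCas9": "NNAGAAW",
    "TdCas": "NAAAAC",
}


def pam_regex(pam: str) -> str:
    """Translate a PAM written with IUPAC codes into a regular expression."""
    return "".join(IUPAC[code] for code in pam)


def matches_pam(cas_name: str, candidate: str) -> bool:
    """True if the candidate DNA sequence satisfies the named enzyme's PAM."""
    return re.fullmatch(pam_regex(PAMS[cas_name]), candidate.upper()) is not None


print(matches_pam("SpCas9", "TGG"))
print(matches_pam("SaCas9", "AGAGT"))
print(matches_pam("SpCas9", "TGA"))
```

Because the PAMs differ in length as well as in content, a candidate must be cut to the length of the PAM being tested before it is compared.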
Reverse transcriptase
The term "reverse transcriptase" describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA, which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5' to 3' RNA-directed DNA polymerase activity, 5' to 3' DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5' and 3' ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3' to 5' exonuclease activity necessary for proofreading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity is presented by Berger et al., Biochemistry 22:2365-2372 (1983). Another reverse transcriptase widely used in molecular biology is the one derived from Moloney murine leukemia virus (M-MLV). See, e.g., Gerard, G.R., DNA 5:271-279 (1986) and Kotewicz, M.L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking RNase H activity has also been described. See, e.g., U.S. Patent No. 5,244,797. The present invention contemplates the use of any such reverse transcriptase, or variants or mutants thereof.
Furthermore, the present invention contemplates the use of error-prone reverse transcriptases, i.e., reverse transcriptases that do not support high-fidelity incorporation of nucleotides during polymerization. During synthesis of the single-stranded DNA flap based on the RT template integrated with the guide RNA, an error-prone reverse transcriptase may introduce one or more nucleotides that are mismatched with the RT template sequence, thereby introducing changes to the nucleotide sequence through erroneous polymerization of the single-stranded DNA flap. These errors introduced during single-stranded DNA flap synthesis are then integrated into the double-stranded molecule through hybridization to the corresponding endogenous target strand, removal of the displaced endogenous strand, ligation, and then a round of endogenous DNA repair and/or replication.
Reverse transcription
As used herein, the term "reverse transcription" refers to the ability of an enzyme to synthesize a DNA strand (i.e., complementary DNA or cDNA) using RNA as a template. In some embodiments, reverse transcription may be "error-prone reverse transcription," which refers to the property of certain reverse transcriptases to be error-prone in their DNA polymerization activity.
Proteins, peptides and polypeptides
The terms "protein," "peptide," and "polypeptide" are used interchangeably herein to refer to a polymer of amino acid residues joined together by peptide (amide) bonds. These terms refer to proteins, peptides, or polypeptides of any size, structure, or function. Typically, a protein, peptide, or polypeptide is at least three amino acids in length. A protein, peptide, or polypeptide may refer to a single protein or a collection of proteins. One or more amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of chemical entities such as carbohydrate groups, hydroxyl groups, phosphate groups, farnesyl groups, isofarnesyl groups, fatty acid groups, linkers for conjugation, functionalization, or other modification, and the like. The protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. The protein, peptide, or polypeptide may be merely a fragment of a naturally occurring protein or peptide. The protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods of recombinant protein expression and purification are well known, including those described in Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
Protein splicing
As used herein, the term "protein splicing" refers to a process in which a sequence, an intein (or split inteins, as the case may be), is excised from an amino acid sequence, and the remaining fragments of the amino acid sequence, the exteins, are joined via an amide bond to form a contiguous amino acid sequence. The term "trans" protein splicing refers to the specific case in which the inteins are split inteins located on different proteins.
Second strand nicking
The resolution of heteroduplex DNA (i.e., DNA containing one edited strand and one unedited strand) formed as a result of guided editing determines the long-term editing outcome. In other words, the goal of guided editing is to resolve the heteroduplex DNA formed as a PE intermediate (the edited strand paired with the endogenous unedited strand) by permanently integrating the edit into the complementary endogenous strand. The method of "second strand nicking" may be used herein to help drive resolution of the heteroduplex DNA in favor of permanent integration of the edited strand into the DNA molecule. As used herein, the concept of "second strand nicking" refers to the introduction of a second nick at a location downstream of the first nick (i.e., the initial nick site that provides the free 3' end for priming the reverse transcriptase on the extension of the guide RNA), preferably on the unedited strand. In certain embodiments, the first nick and the second nick are located on opposite strands. In yet another embodiment, the first nick is located on the non-target strand (i.e., the strand that forms the single-stranded portion of the R-loop), and the second nick is located on the target strand. In other embodiments, the first nick is located on the edited strand and the second nick is located on the unedited strand. The second nick can be located at least 5 nucleotides downstream of the first nick, or at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 or more nucleotides downstream of the first nick.
In certain embodiments, the second nick may be introduced between about 5 and 150 nucleotides, or between about 5 and 140, or between about 5 and 130, or between about 5 and 120, or between about 5 and 110, or between about 5 and 100, or between about 5 and 90, or between about 5 and 80, or between about 5 and 70, or between about 5 and 60, or between about 5 and 50, or between about 5 and 40, or between about 5 and 30, or between about 5 and 20, or between about 5 and 10 nucleotides, away from the pegRNA-induced nick site, on the unedited strand. In one embodiment, the second nick is introduced between 14 and 116 nucleotides away from the pegRNA-induced nick. Without being bound by theory, the second nick induces the cell's endogenous DNA repair and replication processes to replace or edit the unedited strand, thereby permanently installing the edited sequence on both strands and resolving the heteroduplex formed as a result of PE. In some embodiments, the edited strand is the non-target strand and the unedited strand is the target strand. In other embodiments, the edited strand is the target strand and the unedited strand is the non-target strand.
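The distance ranges recited above can be checked with a few lines of code. The following is a minimal sketch, assuming that both nick positions are expressed as 0-based indices in one shared 5'-to-3' coordinate system; the function name and the default bounds (taken from the "about 5 to 150" and "14 to 116" embodiments) are illustrative only.

```python
def second_nick_distance_ok(first_nick, second_nick, min_dist=5, max_dist=150):
    """Return True if the second nick lies within [min_dist, max_dist]
    nucleotides of the first (pegRNA-induced) nick.

    Positions are hypothetical 0-based indices on a shared coordinate
    system; which strand carries each nick is not modeled here.
    """
    distance = abs(second_nick - first_nick)
    return min_dist <= distance <= max_dist


# Broad range recited above (about 5 to 150 nucleotides).
broad_ok = second_nick_distance_ok(100, 150)
# Narrower embodiment (14 to 116 nucleotides).
narrow_ok = second_nick_distance_ok(100, 110, min_dist=14, max_dist=116)
```

Here `broad_ok` reflects a 50-nucleotide separation inside the broad range, while `narrow_ok` reflects a 10-nucleotide separation that falls short of the 14-nucleotide lower bound.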
Sense strand
In genetics, the "sense" strand is the segment of double-stranded DNA that runs from 5' to 3' and is complementary to the antisense, or template, strand of DNA, which runs from 3' to 5'. In the case of a DNA segment that encodes a protein, the sense strand is the DNA strand that has the same sequence as the mRNA, which uses the antisense strand as its template during transcription and eventually undergoes (typically, but not always) translation into a protein. Thus, the antisense strand is responsible for the production of the RNA that is later translated into protein, while the sense strand has nearly the same composition as the mRNA. Note that for each segment of dsDNA there may be two sets of sense and antisense strands, depending on the direction of reading (since sense and antisense are relative to the point of view). Ultimately, it is the gene product, or mRNA, that dictates which strand of a dsDNA segment is referred to as sense or antisense.
In the case of a pegRNA, the first step is the synthesis of a single-stranded complementary DNA in the 5' to 3' direction (i.e., the 3' ssDNA flap to be incorporated), templated off the pegRNA extension arm. Whether the 3' ssDNA flap should be regarded as a sense or an antisense strand depends on the direction of transcription, since it is well accepted that both strands of DNA can serve as templates for transcription (but not at the same time). Thus, in some embodiments, the 3' ssDNA flap (which overall runs in the 5' to 3' direction) will serve as the sense strand because it is the coding strand. In other embodiments, the 3' ssDNA flap (which overall runs in the 5' to 3' direction) will serve as the antisense strand and thus as the template for transcription.
Spacer sequences
As used herein, the term "spacer sequence" in reference to a guide RNA or pegRNA refers to a portion of about 20 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, or 24 nucleotides) of the guide RNA or pegRNA that contains a nucleotide sequence that is complementary to a target strand. In some embodiments, the spacer sequence hybridizes to a region on the target strand that is complementary to a pre-spacer on the non-target strand to form a ssRNA/ssDNA hybrid structure at the target site and a corresponding R-loop ssDNA structure of a complementary endogenous DNA strand on the non-target strand.
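The relationship between the spacer, the pre-spacer on the non-target strand, and the complementary region on the target strand can be illustrated as follows. This is a minimal sketch under stated assumptions: the 20-nucleotide pre-spacer sequence is invented for illustration, and the helper names are hypothetical.

```python
# Watson-Crick complements for DNA bases.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# RNA:DNA base pairs formed in the spacer/target-strand hybrid.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(dna.upper()))


def spacer_from_protospacer(protospacer):
    """The RNA spacer reads the same as the pre-spacer, with U in place of T."""
    return protospacer.upper().replace("T", "U")


# Hypothetical 20-nt pre-spacer on the non-target strand, written 5'->3'.
protospacer = "GACCTGAAGTCCATGGCTAA"
spacer = spacer_from_protospacer(protospacer)

# Region on the target strand bound by the spacer, written 5'->3'.
target_site = reverse_complement(protospacer)

# Antiparallel pairing: spacer base i pairs with the target-site base
# read from the opposite end.
fully_paired = all(
    (r, d) in RNA_DNA_PAIRS for r, d in zip(spacer, reversed(target_site))
)
```

Because the spacer is complementary to the target strand, it has the same sequence as the pre-spacer on the non-target strand, which is why the sketch derives it by a simple T-to-U substitution.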
Subject
As used herein, the term "subject" refers to an individual organism, such as an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, goat, cow, cat or dog. In some embodiments, the subject is a vertebrate, amphibian, reptile, fish, insect, fly, or nematode. In some embodiments, the subject is a study animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of any sex, or may be at any stage of development.
Split inteins
Although inteins are most commonly found as a contiguous domain, some exist in a naturally split form. In this case, the two fragments are expressed as separate polypeptides and must associate before splicing takes place, a process known as protein trans-splicing.
An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely DnaE-N and DnaE-C. The two subunits are encoded by separate genes, namely dnaE-N and dnaE-C, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.
Other naturally occurring or engineered split intein sequences are known in the art or can be made from the whole intein sequences described herein or those available in the art. Examples of split intein sequences can be found in Stevens et al., "A promiscuous split intein with expanded protein engineering applications," PNAS, 2017, Vol. 114: 8538-8543; and Iwai et al., "Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme," FEBS Lett, 580: 1853-1858, each of which is incorporated herein by reference. Other split intein sequences can be found, for example, in WO2013/045632, WO2014/055782, WO2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference.
In addition, protein splicing in trans has been described in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998); Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b); Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998); Evans, et al., J. Biol. Chem. 275:9091 (2000); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999)) and provides the opportunity to express a protein as two separately expressed fragments that subsequently undergo ligation to form a functional product, e.g., as shown in the figures.
Target site
The term "target site" refers to a sequence within a nucleic acid molecule that is edited by a guided editor (PE) as disclosed herein. Target site also refers to the sequence within the nucleic acid molecule to which a complex of the guided editor (PE) and the gRNA binds.
tPERT
See definition of "trans-guide editor RNA template (tPERT)".
Temporal second strand nicking
As used herein, the term "temporal second strand nicking" refers to a variant of second strand nicking in which the second nick is installed in the unedited strand only after the desired edit has been installed in the edited strand. This avoids concurrent nicks on both strands, which could lead to double-stranded DNA breaks. The second-strand nicking guide RNA is designed for temporal control so that the second nick is not introduced until after the desired edit is installed. This is achieved by designing the gRNA with a spacer sequence that matches only the edited strand, not the original allele. Using this strategy, mismatches between the pre-spacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand has taken place.
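The design rule above (a nicking spacer that matches the edited sequence but not the original allele) can be illustrated by counting mismatches. This is a minimal sketch with invented sequences; it compares sequences position by position and does not model how many mismatches are actually needed to disfavor nicking.

```python
def count_mismatches(spacer_dna, site_dna):
    """Count positions at which a spacer (written as DNA) differs from a site."""
    if len(spacer_dna) != len(site_dna):
        raise ValueError("spacer and site must have the same length")
    return sum(a != b for a, b in zip(spacer_dna, site_dna))


# Hypothetical 20-nt site before and after the desired edit (one substitution).
original_allele = "GACCTGAAGTCCATGGCTAA"
edited_allele = "GACCTGAAGTACATGGCTAA"

# The nicking guide is designed against the edited sequence.
nicking_spacer = edited_allele

mismatches_before_edit = count_mismatches(nicking_spacer, original_allele)
mismatches_after_edit = count_mismatches(nicking_spacer, edited_allele)
```

In this illustration the nicking spacer carries one mismatch against the original allele and none against the edited allele, which is the asymmetry that the temporal strategy relies on.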
Trans-guide editing
As used herein, the term "trans-guide editing" refers to a modified form of guided editing that uses a split pegRNA, i.e., wherein the pegRNA is separated into two distinct molecules: an sgRNA and a trans-guide editing RNA template (tPERT). The sgRNA serves to target the guided editor (or more generally, the napDNAbp component of the guided editor) to the desired genomic target site, whereas the tPERT is used by the polymerase (e.g., a reverse transcriptase) to write a new DNA sequence into the target locus once the tPERT has been recruited in trans to the guided editor through the interaction of binding domains located on the guided editor and on the tPERT. In one embodiment, the binding domains may include an RNA-protein recruitment moiety, such as an MS2 aptamer located on the tPERT and an MS2cp protein fused to the guided editor. An advantage of trans-guide editing is that, by separating the DNA synthesis template from the guide RNA, templates of longer length can potentially be used.
Embodiments of trans-guide editing are shown in FIGS. 3G and 3H. The left side of FIG. 3G shows the composition of a trans-guide editor complex ("RP-PE:gRNA complex") comprising a napDNAbp fused to each of a polymerase (e.g., a reverse transcriptase) and a tPERT recruitment protein (e.g., MS2cp) and complexed with a guide RNA. FIG. 3G also shows a separate tPERT molecule comprising the extension arm features of a pegRNA, including the DNA synthesis template and the primer binding sequence. The tPERT molecule also includes an RNA-protein recruitment domain (in this case, a stem-loop structure, which may be, for example, an MS2 aptamer). As shown in the process depicted in FIG. 3H, the RP-PE:gRNA complex binds to and nicks the target DNA sequence. The recruitment protein (RP) then recruits the tPERT to co-localize with the guided editing complex bound to the DNA target site, thereby allowing the primer binding site to bind to the primer sequence on the nicked strand and, subsequently, allowing the polymerase (e.g., RT) to synthesize a single strand of DNA against the DNA synthesis template up to the 5' end of the tPERT.
While tPERT is shown in fig. 3G and 3H to contain PBS and DNA synthesis templates at the 5 'end of the RNA-protein recruitment domain, tPERT in other configurations can be designed with PBS and DNA synthesis templates located at the 3' end of the RNA-protein recruitment domain. However, an advantage of having a 5 'extended tPERT is that the synthesis of the DNA single strand will terminate naturally at the 5' end of tPERT, thus there is no risk of using any part of the RNA-protein recruitment domain as a template during the DNA synthesis phase of the guided editing.
Transition
As used herein, "transition" refers to the exchange of purine nucleobases (A ↔ G) or the exchange of pyrimidine nucleobases (C ↔ T). Such exchanges involve nucleobases of similar shape. The compositions and methods disclosed herein are capable of inducing one or more transitions in a target DNA molecule. These changes involve A ↔ G, G ↔ A, C ↔ T, or T ↔ C. In the case of double-stranded DNA with Watson-Crick paired nucleobases, transitions refer to the following base pair exchanges: A:T ↔ G:C, G:C ↔ A:T, C:G ↔ T:A, or T:A ↔ C:G. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions, as well as other nucleotide changes, including deletions and insertions, in the same target DNA molecule.
Transversion
As used herein, "transversion" refers to the exchange of a purine nucleobase for a pyrimidine nucleobase, or vice versa, and thus involves the exchange of nucleobases of dissimilar shape. These changes involve T ↔ A, T ↔ G, C ↔ G, C ↔ A, A ↔ T, A ↔ C, G ↔ C, and G ↔ T. In the case of double-stranded DNA with Watson-Crick paired nucleobases, transversions refer to the following base pair exchanges: T:A ↔ A:T, T:A ↔ G:C, C:G ↔ G:C, C:G ↔ A:T, A:T ↔ T:A, A:T ↔ C:G, G:C ↔ C:G, and G:C ↔ T:A. The compositions and methods disclosed herein are capable of inducing one or more transversions in a target DNA molecule. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions, as well as other nucleotide changes, including deletions and insertions, in the same target DNA molecule.
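The distinction between the two classes of single-nucleobase substitution can be expressed compactly in code. The following is a minimal sketch; the function name is illustrative, and only the four standard DNA nucleobases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original, replacement):
    """Classify a single-nucleobase DNA substitution.

    Returns "transition" for purine<->purine or pyrimidine<->pyrimidine
    exchanges and "transversion" for purine<->pyrimidine exchanges.
    """
    original, replacement = original.upper(), replacement.upper()
    valid = PURINES | PYRIMIDINES
    if original not in valid or replacement not in valid:
        raise ValueError("expected one of A, C, G, T")
    if original == replacement:
        raise ValueError("no substitution: bases are identical")
    same_class = (original in PURINES) == (replacement in PURINES)
    return "transition" if same_class else "transversion"


a_to_g = classify_substitution("A", "G")
c_to_t = classify_substitution("C", "T")
a_to_t = classify_substitution("A", "T")
g_to_c = classify_substitution("G", "C")
```

Of the twelve possible single-base substitutions, four are transitions and eight are transversions, which matches the lists given in the two definitions above.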
Treatment of
As used herein, the term "treatment" refers to a clinical intervention intended to reverse, alleviate, delay the onset of, or inhibit the progression of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, for example, to prevent or delay the onset of a symptom or to inhibit the onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, e.g., to prevent or delay their recurrence.
Upstream
As used herein, the terms "upstream" and "downstream" are relative terms that define the linear position of at least two elements located in a nucleic acid molecule (whether single-stranded or double-stranded) that is oriented in the 5' to 3' direction. In particular, in a nucleic acid molecule in which a first element is located somewhere 5' of a second element, the first element is upstream of the second element. For example, if a SNP is located 5' of the nick site, the SNP is upstream of the Cas9-induced nick site. Conversely, in a nucleic acid molecule in which a first element is located somewhere 3' of a second element, the first element is downstream of the second element. For example, if a SNP is located 3' of the nick site, the SNP is downstream of the Cas9-induced nick site. The nucleic acid molecule may be DNA (double-stranded or single-stranded), RNA (double-stranded or single-stranded), or a hybrid of DNA and RNA. The analysis is the same for single-stranded and double-stranded nucleic acid molecules, because the terms upstream and downstream refer to only a single strand of the nucleic acid molecule; for a double-stranded molecule, it is only necessary to select which strand to consider. Generally, the strand of a double-stranded DNA that is used to determine the relative position of at least two elements is the "sense" or "coding" strand. In genetics, the "sense" strand is the segment of double-stranded DNA that runs from 5' to 3' and is complementary to the antisense strand, or template strand, of DNA (which runs from 3' to 5'). Thus, for example, if a SNP nucleobase is 3' of a promoter on the sense or coding strand, the SNP nucleobase is "downstream" of the promoter sequence in the genomic DNA (which is double-stranded).
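The convention can be reduced to a comparison of positions on a single strand read 5' to 3'. The following is a minimal sketch, assuming that both elements are annotated as 0-based indices on the same strand; the function name and the example coordinates are hypothetical.

```python
def relative_position(first_index, second_index):
    """Describe where the first element lies relative to the second.

    Both indices are 0-based positions on one strand read 5' to 3',
    so a smaller index is closer to the 5' end.
    """
    if first_index < second_index:
        return "upstream"
    if first_index > second_index:
        return "downstream"
    return "same position"


# Hypothetical coordinates on the sense strand.
nick_site = 120
snp_5prime_of_nick = relative_position(95, nick_site)
snp_3prime_of_nick = relative_position(134, nick_site)
```

A SNP at index 95 is 5' of the nick site and therefore upstream of it, while a SNP at index 134 is 3' of the nick site and therefore downstream of it.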
Variants
As used herein, the term "variant" should be understood to refer to a Cas9 that exhibits characteristics that deviate from those occurring in nature; for example, a variant Cas9 is a Cas9 comprising one or more amino acid residue changes as compared to the wild-type Cas9 amino acid sequence. The term "variant" encompasses homologous proteins having at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99% identity to a reference sequence and having the same or substantially the same functional activity as the reference sequence. The term also encompasses mutants, truncations, or domains of a reference sequence that exhibit one or more functional activities that are the same or substantially the same as those of the reference sequence.
Vector
As used herein, the term "vector" refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter a host cell, mutate and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phages, as well as conjugative plasmids. Other suitable vectors will be apparent to those skilled in the art based on this disclosure.
Wild type
As used herein, the term "wild-type" is a term of art understood by the skilled artisan and means an organism, strain, gene or characteristic of a typical form as it exists in nature, as opposed to a mutant or variant form.
5' endogenous DNA flap
As used herein, the term "5' endogenous DNA flap" refers to the strand of DNA located in the target DNA immediately downstream of the PE-induced nick site. Nicking of the target DNA strand by PE exposes a 3' hydroxyl group on the upstream side of the nick site and a 5' hydroxyl group on the downstream side of the nick site. The endogenous strand ending in the 3' hydroxyl group is used to prime the DNA polymerase of the guided editor (e.g., where the DNA polymerase is a reverse transcriptase). The endogenous strand downstream of the nick site and beginning with the exposed 5' hydroxyl group is referred to as the "5' endogenous DNA flap"; it is ultimately removed and replaced by the newly synthesized replacement strand encoded by the extension of the pegRNA (i.e., the "3' replacement DNA flap").
5' endogenous DNA flap removal
As used herein, the term "5' endogenous DNA flap removal" or "5' flap removal" refers to the removal of the 5' endogenous DNA flap that forms when the single-stranded DNA flap synthesized by the reverse transcriptase (RT) competitively invades and hybridizes to the endogenous DNA, displacing the endogenous strand in the process. Removal of this displaced endogenous strand may drive the reaction toward formation of the desired product comprising the desired nucleotide change. The cell's own DNA repair enzymes (e.g., a flap endonuclease such as EXO1 or FEN1) may catalyze the removal or excision of the 5' endogenous flap. The host cell may also be transformed to express one or more enzymes (e.g., flap endonucleases) that catalyze the removal of the 5' endogenous flap, thereby driving the process toward product formation. Flap endonucleases are known in the art and are described in Patel et al., "Flap endonucleases pass 5'-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5'-ends," Nucleic Acids Research, 2012, 40(10): 4507-4519, and Tsutakawa et al., "Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily," Cell, 2011, 145(2): 198-211, each of which is incorporated herein by reference.
3' substitution DNA flap
As used herein, the term "3' replacement DNA flap" or simply "replacement DNA flap" refers to the strand of DNA that is synthesized by the guided editor and encoded by the extension arm of the guided editor pegRNA. More specifically, the 3' replacement DNA flap is encoded by the polymerase template of the pegRNA. The 3' replacement DNA flap comprises the same sequence as the 5' endogenous DNA flap except that it also contains the edited sequence (e.g., a single nucleotide change). The 3' replacement DNA flap anneals to the target DNA, displacing or replacing the 5' endogenous DNA flap (which can be excised, for example, by a 5' flap endonuclease such as FEN1 or EXO1), and is then ligated to join the 3' end of the 3' replacement DNA flap to the exposed 5' hydroxyl end of the endogenous DNA (exposed after excision of the 5' endogenous DNA flap), thereby reforming a phosphodiester bond and installing the 3' replacement DNA flap to form a heteroduplex DNA containing one edited strand and one unedited strand.
The terms "nick site," "cleavage site," and "cut site" are used interchangeably in the context of guided editing herein to refer to a specific position between two nucleotides or two base pairs in a double-stranded target DNA sequence. In some embodiments, the position of the nick site is determined relative to the position of a specific PAM sequence. In some embodiments, the nick site is the specific position at which a nick will occur when the double-stranded target DNA is contacted with a napDNAbp (e.g., a nickase such as a Cas nickase) that recognizes a specific PAM sequence. For each PEgRNA described herein, the nick site is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and of the particular PAM required for recognition and function of that napDNAbp. For example, for a PEgRNA comprising a gRNA core that associates with SpCas9, the nick site is in the phosphodiester bond between base three (position "-3" relative to position 1 of the PAM sequence) and base four (position "-4" relative to position 1 of the PAM sequence).
In some embodiments, the nick site is in the target strand of the double-stranded target DNA sequence. In some embodiments, the nick site is in the non-target strand of the double-stranded target DNA sequence. In some embodiments, the nick site is in the pre-spacer sequence. In some embodiments, the nick site is adjacent to the pre-spacer sequence. In some embodiments, the nick site is downstream of a region (e.g., on the non-target strand) that is complementary to the primer binding site of the PEgRNA. In some embodiments, the nick site is downstream of a region (e.g., on the non-target strand) that binds to the primer binding site of the PEgRNA. In some embodiments, the nick site is immediately downstream of a region (e.g., on the non-target strand) that is complementary to the primer binding site of the PEgRNA. In some embodiments, the nick site is upstream of a specific PAM sequence on the non-target strand of the double-stranded target DNA, wherein the PAM sequence is specific for recognition by the napDNAbp that associates with the gRNA core of the PEgRNA. In some embodiments, the nick site is downstream of a specific PAM sequence on the non-target strand of the double-stranded target DNA, wherein the PAM sequence is specific for recognition by the napDNAbp that associates with the gRNA core of the PEgRNA. In some embodiments, the nick site is 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Streptococcus pyogenes Cas9 nickase, a Parvibaculum lavamentivorans (P. lavamentivorans) Cas9 nickase, a Corynebacterium diphtheriae Cas9 nickase, a Neisseria cinerea Cas9, a Staphylococcus aureus Cas9, or an N. lari Cas9 nickase. In some embodiments, the nick site is 3 nucleotides upstream of the PAM sequence and the PAM sequence is recognized by a Cas9 nickase, wherein the Cas9 nickase comprises a nuclease-active HNH domain and a nuclease-inactive RuvC domain.
In some embodiments, the nick site is 2 base pairs upstream of the PAM sequence and the PAM sequence is recognized by a Streptococcus thermophilus Cas9 nickase.
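For the SpCas9 case described above, the nick position can be located relative to the PAM by simple indexing. The following is a minimal sketch under stated assumptions: it scans one strand (the PAM-containing, non-target strand, written 5' to 3') for the NGG motif commonly associated with SpCas9, requires room for a 20-nucleotide pre-spacer, and places the nick 3 nucleotides upstream of the PAM. The sequence and function name are invented for illustration.

```python
import re


def spcas9_nick_sites(pam_strand, protospacer_length=20):
    """Return (pam_start, nick_index) pairs for NGG PAMs on one strand.

    nick_index is a 0-based cut position: the nick falls between
    pam_strand[nick_index - 1] (position -4) and pam_strand[nick_index]
    (position -3), counting back from position 1 of the PAM.
    """
    sites = []
    # Zero-width lookahead so that overlapping NGG motifs are all found.
    for match in re.finditer(r"(?=[ACGT]GG)", pam_strand.upper()):
        pam_start = match.start()
        if pam_start >= protospacer_length:
            sites.append((pam_start, pam_start - 3))
    return sites


# Hypothetical 20-nt pre-spacer followed by a TGG PAM and 3 flanking bases.
pam_strand = "GACCTGAAGTCCATGGCTAA" + "TGG" + "ACT"
sites = spcas9_nick_sites(pam_strand)
pam_start, nick_index = sites[0]

# Strand ending in the free 3' end that primes the polymerase.
primer_side = pam_strand[:nick_index]
# Strand downstream of the nick (the 5' endogenous flap region).
flap_side = pam_strand[nick_index:]
```

The internal TGG at index 13 is skipped because it leaves no room for a full-length pre-spacer upstream; other napDNAbps would need a different PAM pattern and offset.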
Detailed description of certain embodiments
The present disclosure provides next-generation modified pegRNAs with improved properties, including but not limited to increased stability, increased in vivo lifetime, and/or improved binding affinity for the napDNAbp. These modified pegRNAs, when used in combination with a guided editor (e.g., a fusion protein comprising a Cas9 nickase domain and a reverse transcriptase domain), increase the activity and/or efficiency of guided editing. In particular, the inventors have found that pegRNAs may have various drawbacks, including reduced affinity for the nucleic acid programmable DNA binding protein (e.g., a Cas9 nickase), increased susceptibility to degradation (in particular, degradation of the extension arm) relative to typical single guide RNAs (sgRNAs), and a propensity for inactivation due to unwanted duplex formation between the extension arm (in particular, the primer binding site of the extension arm) and the spacer sequence of the pegRNA, which competes with binding of the pegRNA spacer and primer binding site to the target DNA strands. Without being bound by theory, these problems arise because of the presence of the extension arm, a feature of the pegRNA that is not present in typical sgRNAs. To overcome these drawbacks, the inventors have found that pegRNAs can be modified in one or more ways to increase their overall stability and/or performance in guided editing. First, the inventors found that adding one or more RNA structural motifs to the pegRNA can prevent degradation of the pegRNA. Such RNA structural motifs may include, but are not limited to, (i) the prequeosine1-1 riboswitch aptamer (evopreQ1) and variants thereof, (ii) the frameshifting pseudoknot from Moloney murine leukemia virus (MMLV), hereinafter referred to as "mpknot," and variants thereof, (iii) G-quadruplexes, (iv) hairpin structures (e.g., a 15-bp hairpin), (v) xrRNA, and (vi) the P4-P6 domain of the group I intron.
Second, the inventors have discovered various ways to reduce duplex formation between the primer binding site (PBS) of the extension arm and the spacer sequence of the pegRNA (i.e., to reduce the PBS/spacer binding interaction). In one embodiment, the PBS/spacer binding interaction is avoided by stabilizing the 3' extension arm, including, but not limited to, by (i) occluding the PBS with a toehold that dissociates upon binding of the napDNAbp (e.g., a Cas9 nickase), (ii) providing the 3' extension arm in trans, i.e., moving the 3' extension arm or a portion thereof (e.g., the PBS, and/or the PBS and the DNA template portion) from the pegRNA to another molecule, e.g., a nicking gRNA, and (iii) introducing chemical modifications into the pegRNA that disfavor RNA/RNA duplex formation but favor RNA/DNA duplex formation, thereby promoting the desired interaction between the PBS of the pegRNA and the target DNA. Collectively, the modified pegRNAs disclosed herein that result from implementing these strategies are referred to herein as "engineered" pegRNAs or "epegRNAs," or equivalently as "modified pegRNAs."
Furthermore, the present disclosure provides guided editing complexes comprising a guided editor complexed with an engineered pegRNA disclosed herein, as well as nucleotide sequences and expression vectors encoding the engineered pegRNAs and the guided editing complexes comprising them. Still further, the present disclosure provides guided editing-based genome editing methods that involve using the guided editing fusion proteins disclosed herein in complex with an engineered pegRNA to install a desired nucleotide sequence change at a desired site in a genome, characterized by higher editing efficiency compared to guided editing using typical pegRNAs (i.e., those not modified in the manner described herein). The present disclosure also provides cells and kits comprising the modified pegRNAs disclosed herein, or guided editing complexes comprising the modified pegRNAs. The present disclosure also provides methods of making the disclosed modified pegRNAs, comprising coupling one or more structural nucleotide motifs (e.g., evopreQ1-1 or a modified MMLV tRNA) to the end of the extension arm of the pegRNA, optionally through a nucleotide linker. The present disclosure also provides methods of delivering the modified pegRNAs and guided editor components to target cells for genome editing at a desired editing site, and methods of treating genetic disorders using guided editing in combination with the modified pegRNAs disclosed herein.
[1] Guided editing
The present invention relates to improved forms of "guided editing" that utilize modified, or equivalently engineered, pegRNAs that contain one or more structural modifications improving one or more of their characteristics, including their stability, cellular lifetime, affinity for Cas9 (or more broadly, for a napDNAbp), or interaction with the target DNA (e.g., improved interaction between the primer binding site and the target DNA), so as to increase the editing efficiency of guided editing. The inventors developed guided editing as a "search and replace" genome editing tool, further described in Anzalone et al., "Search-and-replace genome editing without double-strand breaks or donor DNA," Nature, October 21, 2019, 576, pp. 149-157, the contents of which are incorporated herein by reference in their entirety.
Guided editing is a versatile and precise genome editing method that writes new genetic information directly into a specified DNA site using a nucleic acid programmable DNA binding protein ("napDNAbp") working in association with a polymerase (i.e., provided as a fusion protein with, or otherwise in trans with, the napDNAbp). The guided editing system is programmed with a guided editing (PE) guide RNA ("pegRNA") (or, as in the present disclosure, an epegRNA) that both specifies the target site and templates the synthesis of the desired edit, in the form of a replacement DNA strand, by way of an extension (either DNA or RNA) engineered onto the guide RNA (e.g., at the 5' or 3' end, or at an internal portion, of the guide RNA). The replacement strand containing the desired edit (e.g., a single nucleobase substitution, deletion, or insertion) shares the same sequence as the endogenous strand of the target site to be edited (except that it contains the desired edit). Through DNA repair and/or replication machinery, the endogenous strand of the target site is replaced by the newly synthesized replacement strand containing the desired edit. In some cases, guided editing may be regarded as a "search and replace" genome editing technology because, as described herein, the guided editor not only searches for and locates the desired target site to be edited, but also encodes a replacement strand containing the desired edit, which is installed in place of the corresponding endogenous DNA strand at the target site.
In various embodiments, guided editing is performed by contacting a target DNA molecule (into which a change in the nucleotide sequence is desired) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a pegRNA (or, as in the present disclosure, an engineered epegRNA). Referring to FIG. 1G, the pegRNA (or epegRNA) comprises an extension at the 3' or 5' end of the guide RNA, or at an intramolecular position in the guide RNA, that encodes the desired nucleotide change (e.g., a single nucleotide change, insertion, or deletion). In step (a), the napDNAbp/pegRNA complex (or, in the present disclosure, the napDNAbp/epegRNA complex) contacts the DNA molecule, and the e/pegRNA guides the napDNAbp to bind the target locus. In step (b), a nick is introduced (e.g., by a nuclease or chemical agent) in one strand of the target locus, thereby creating an available 3' end in that strand. In certain embodiments, the nick is created in the DNA strand that corresponds to the R-loop strand, i.e., the strand that does not hybridize to the guide RNA sequence, i.e., the "non-target strand." However, the nick may be introduced in either strand. That is, the nick may be introduced into the R-loop "target strand" (i.e., the strand that hybridizes to the spacer sequence of the pegRNA) or the "non-target strand" (i.e., the strand that forms the single-stranded portion of the R-loop and is complementary to the target strand). In step (c), the 3' end of the DNA strand (formed by the nick) interacts with the extension of the guide RNA to prime reverse transcription (i.e., "target-primed RT"). In certain embodiments, the 3' end DNA strand hybridizes to a specific RT priming sequence on the extension of the guide RNA, i.e., the "reverse transcriptase priming sequence." In step (d), a reverse transcriptase is introduced (as a fusion protein with the napDNAbp or in trans), which synthesizes a single strand of DNA from the 3' end of the primed site toward the 5' end of the e/pegRNA.
This forms a single-stranded DNA flap containing the desired nucleotide changes (e.g., single base changes, insertions or deletions, or a combination thereof) and is otherwise homologous to endogenous DNA at or near the nick site. In step (e), the napDNAbp and e/pegRNA are released. Steps (f) and (g) involve the breakdown of single stranded DNA flaps so that the desired nucleotide changes are incorporated into the target locus. This process can be driven to the desired product formation by removal of the corresponding 5 'endogenous DNA flap (once the 3' single stranded DNA flap invades and hybridizes to the endogenous DNA sequence to form) (e.g., by trans-supplied FEN1 or similar enzyme, as a fusion with a guide editor, or endogenously supplied). Without being bound by theory, the endogenous DNA repair and replication process of the cell addresses mismatched DNA to incorporate nucleotide changes to form the desired altered product. This process may also be driven to product formation by "second strand-break generation", as illustrated in fig. 1G, or by "time-series second strand-break generation", as illustrated in fig. 1I and discussed herein.
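Steps (c) and (d) can be illustrated with a minimal Python sketch. It assumes a 3' extension laid out as 5'-[synthesis template][primer binding site]-3'; the function names and all sequences are hypothetical and serve only to show how the flap sequence follows from the extension.

```python
# Toy model of target-primed synthesis of the 3' DNA flap (steps (c)-(d)).
# All sequences and names here are hypothetical illustrations.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_transcribe(rna_template: str) -> str:
    """Return the DNA strand (5'->3') copied from an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_template))


def synthesize_flap(nicked_strand: str, pbs: str, synthesis_template: str) -> str:
    """Extend the nicked DNA 3' end along a 3' pegRNA extension.

    nicked_strand: DNA (5'->3') ending at the nick.
    pbs: primer binding site RNA (5'->3'), at the 3' end of the extension.
    synthesis_template: RNA (5'->3') located 5' of the primer binding site.
    """
    primer = reverse_transcribe(pbs)  # DNA sequence able to anneal to the PBS
    if not nicked_strand.endswith(primer):
        raise ValueError("primer binding site does not anneal to the nicked 3' end")
    # Synthesis proceeds from the primed 3' end toward the 5' end of the extension.
    return nicked_strand + reverse_transcribe(synthesis_template)


if __name__ == "__main__":
    print(synthesize_flap("GATTACAGGT", pbs="ACCUG", synthesis_template="GAACGU"))
```

In this toy model the bases appended to the nicked strand are the reverse complement of the synthesis template, which is why the template must encode the desired edit in complementary form.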
In another embodiment of guided editing, FIG. 3F depicts the interaction of a typical pegRNA (which may be substituted with an epegRNA disclosed herein) with a target site of double-stranded DNA and the concomitant production of a 3' single-stranded DNA flap containing the genetic change of interest. The double-stranded DNA is shown with the upper strand in the 3' to 5' orientation and the lower strand in the 5' to 3' orientation. The upper strand comprises the "protospacer" and the PAM sequence and is referred to as the "target strand." The complementary lower strand is referred to as the "non-target strand." Although not shown, the depicted pegRNA would be complexed with Cas9 or an equivalent. As shown in the schematic, the spacer sequence of the pegRNA anneals to a complementary region on the target strand, referred to as the protospacer, which is located downstream of the PAM sequence and is about 20 nucleotides in length. This interaction forms a DNA/RNA hybrid between the spacer RNA and the protospacer DNA and induces the formation of an R-loop in the region opposite the protospacer. As taught elsewhere herein, the Cas9 protein (not shown) then induces a nick in the non-target strand, as shown. This leads to the formation of the 3' ssDNA flap region, which interacts with the 3' end of the pegRNA at the primer binding site. The 3' end of the ssDNA flap (i.e., the reverse transcriptase primer sequence) anneals to the primer binding site (A) on the pegRNA, thereby priming reverse transcription. Next, the reverse transcriptase (e.g., provided in trans, or provided in cis as a fusion protein attached to the Cas9 construct) polymerizes a single strand of DNA encoded by the editing template (B) and the homology arm (C), which together constitute the DNA synthesis template. The polymerization continues toward the 5' end of the extension arm. The polymerized strand of ssDNA forms a 3' ssDNA flap which, as described elsewhere (e.g., as shown in FIG. 1G), invades the endogenous DNA, displaces the corresponding endogenous strand (which is removed as a 5' DNA flap of endogenous DNA), and installs the desired nucleotide edit (a single nucleotide base pair change, deletion, or insertion, including of an entire gene) through DNA repair/replication cycles.
In various embodiments, the guided editor relies on a guided editing mechanism (e.g., as shown in the various embodiments of FIGS. 1A-1F). In various embodiments, the guided editor comprises a Cas protein-reverse transcriptase fusion or related system that targets a specific DNA sequence with a guide RNA, generates a single-stranded nick at the target site, and uses the nicked DNA as a primer for reverse transcription of an engineered reverse transcriptase template that is integrated with the guide RNA. The guided editors described herein are not limited to reverse transcriptases but may use virtually any DNA polymerase. Indeed, while the application may refer throughout to guided editors having a "reverse transcriptase," a reverse transcriptase is only one type of DNA polymerase that can be used in a guided editor. Thus, wherever the specification refers to a "reverse transcriptase," one of ordinary skill in the art will appreciate that any suitable DNA polymerase may be used in its place. Thus, in one aspect, the guided editor may comprise Cas9 (or an equivalent napDNAbp) that is programmed to target a DNA sequence by associating it with a specialized guide RNA (i.e., a pegRNA) containing a spacer sequence that anneals to the complementary protospacer in the target DNA. The specialized guide RNA also contains new genetic information in the form of an extension that encodes a replacement DNA strand containing the desired genetic change, which is used to replace the corresponding endogenous DNA strand at the target site. To transfer information from the pegRNA to the target DNA, the guided editing mechanism involves nicking the target site in one DNA strand to expose a 3'-hydroxyl group. The exposed 3'-hydroxyl group can then be used to prime polymerization of the DNA encoded by the extension on the pegRNA directly into the target site.
In various embodiments, the extension (which provides the template for polymerization of the replacement strand containing the edit) may be formed from RNA or DNA. In the case of an RNA extension, the polymerase of the guided editor may be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of a DNA extension, the polymerase of the guided editor may be a DNA-dependent DNA polymerase.
The newly synthesized strand formed by the guided editors disclosed herein (i.e., the replacement DNA strand containing the desired edit) is homologous to (i.e., has the same sequence as) the genomic target sequence except for the desired nucleotide change (e.g., a single nucleotide change, a deletion, or an insertion, or a combination thereof). The newly synthesized (or replacement) DNA strand, which may also be referred to as a single-stranded DNA flap, competes for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. In certain embodiments, the system can be combined with the use of an error-prone reverse transcriptase (e.g., provided as a fusion protein with the Cas9 domain, or provided in trans to the Cas9 domain). The error-prone reverse transcriptase can introduce changes during the synthesis of the single-stranded DNA flap. Thus, in certain embodiments, an error-prone reverse transcriptase may be used to introduce nucleotide changes into the target DNA. The changes may be random or non-random, depending on the error-prone reverse transcriptase used with the system.
Resolution of the hybridized intermediate (comprising the single-stranded DNA flap synthesized by the reverse transcriptase hybridized to the endogenous DNA strand) may include removal of the resulting displaced flap of endogenous DNA (e.g., with the 5' DNA flap endonuclease FEN1), ligation of the synthesized single-stranded DNA flap to the target DNA, and assimilation of the desired nucleotide change as a result of cellular DNA repair and/or replication processes. Because templated DNA synthesis offers single-nucleotide precision for the modification of any nucleotide (including insertions and deletions), the scope of this approach is very broad, and it can foreseeably be used for numerous applications in basic science and therapeutics.
In each of these embodiments of guided editing, the modified or engineered pegRNAs described herein can be used in place of typical pegRNAs to increase the editing efficiency of guided editing. Without being bound by theory, the increased editing efficiency is believed to result from improved pegRNA stability, improved pegRNA cellular lifetime, increased binding affinity of Cas9 for the pegRNA, or reduced binding interactions between the primer binding site and the spacer of the epegRNA (and thus better interactions between the primer binding site and the target DNA).
The various components of the guided editors contemplated herein, which can be used with the modified or engineered pegRNAs described herein to increase the editing efficiency of guided editing, are now described in detail.
[2] napDNAbp
The guided editors described herein may include a nucleic acid programmable DNA binding protein (napDNAbp).
In one aspect, the napDNAbp can be bound to or complexed with at least one guide nucleic acid (e.g., a guide RNA or pegRNA) that localizes the napDNAbp to a DNA sequence comprising a DNA strand (i.e., the target strand) that is complementary to the guide nucleic acid or a portion thereof (e.g., the spacer of the guide RNA, which anneals to the protospacer of the DNA target). In other words, the guide nucleic acid "programs" the napDNAbp (e.g., Cas9 or an equivalent) to locate and bind to the complementary sequence of the protospacer in the DNA.
Any suitable napDNAbp may be used in the guided editors described herein. In various embodiments, the napDNAbp can be any class 2 CRISPR-Cas system, including any type II, type V, or type VI CRISPR-Cas enzyme. Given the rapid development of CRISPR-Cas as a genome editing tool, the nomenclature used to describe and/or identify CRISPR-Cas enzymes, such as Cas9 and Cas9 orthologs, has continued to evolve. The present application refers to CRISPR-Cas enzymes by nomenclature that may be old and/or new. Those skilled in the art will be able to determine the particular CRISPR-Cas enzyme referred to in the present application based on the nomenclature used, whether it is old (i.e., "legacy") or new. CRISPR-Cas nomenclature is discussed extensively in Makarova et al., "Classification and Nomenclature of CRISPR-Cas Systems: Where from Here?," The CRISPR Journal, Vol. 1, No. 5, 2018, the entire contents of which are incorporated herein by reference. The particular CRISPR-Cas nomenclature used in any given instance in the application is not limiting in any way, and one of skill in the art will be able to determine which CRISPR-Cas enzyme is being referenced.
For example, the following type II, type V, and type VI class 2 CRISPR-Cas enzymes have the following art-recognized old (i.e., legacy) and new names. Each of these enzymes and/or variants thereof may be used with the guided editors described herein:
* See Makarova et al., The CRISPR Journal, Vol. 1, No. 5, 2018.
Without being bound by theory, the mechanism of action of certain napDNAbps contemplated herein includes a step of forming an R-loop, whereby the napDNAbp induces unwinding of the double-stranded DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA spacer then hybridizes to the "target strand" in a region complementary to the protospacer sequence. This displaces the "non-target strand," which is complementary to the target strand and forms the single-stranded region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities that cut the DNA, leaving behind different types of lesions. For example, the napDNAbp can comprise a nuclease activity that cuts the non-target strand at a first location and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA may be cut to form a "double-strand break," in which both strands are cut. In other embodiments, the target DNA may be cut at only a single site, i.e., the DNA is "nicked" on one strand. Exemplary napDNAbps with different nuclease activities include "Cas9 nickase" ("nCas9") and a deactivated Cas9 without nuclease activity ("dead Cas9" or "dCas9").
The following description of various napDNAbps that may be used in conjunction with the presently disclosed guided editors is not meant to be limiting in any way. The guided editor may comprise a typical SpCas9, or any orthologous Cas9 protein, or any variant Cas9 protein, including any naturally occurring Cas9 variant, mutant, or other engineered form that is known or that may be prepared or evolved by directed evolution or other mutagenesis processes. In various embodiments, the Cas9 or Cas9 variant has nickase activity, i.e., it cleaves only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variant has inactive nucleases, i.e., it is a "dead" Cas9 protein. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than typical SpCas9 (e.g., for easier delivery) or having a modified or rearranged primary amino acid structure (e.g., circularly permuted forms).
The guided editors described herein may also include Cas9 equivalents, including Cas12a (Cpf1) and Cas12b1 proteins, which are the result of convergent evolution. The napDNAbp used herein (e.g., SpCas9, a Cas9 variant, or a Cas9 equivalent) may also contain various modifications that alter or enhance its PAM specificity. Finally, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, e.g., a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a (Cpf1)).
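The sequence identity thresholds recited above presuppose a way of computing percent identity, which the text does not specify. The following Python sketch shows one common convention (identical residues divided by the number of aligned columns) applied to two sequences that have already been aligned; the function names, the gap character, and the example fragments are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch. This is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if the pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical 10-residue fragments differing at one position.
    print(percent_identity("MDKKYSIGLD", "MDKKYSIGLA"))
    print(meets_identity_threshold("MDKKYSIGLD", "MDKKYSIGLA", threshold=95.0))
```

Different alignment tools and denominators (alignment length versus shorter-sequence length) can give different values for the same pair, so a threshold is only meaningful once the convention is fixed.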
The napDNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease. As described above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of the pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the spacer. The target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, single guide RNAs ("sgRNA", or simply "gRNA") can be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species. See, e.g., Jinek M. et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.
In some embodiments, napDNAbp directs cleavage of one or both strands at a target sequence position (e.g., within the target sequence and/or within the complement of the target sequence). In some embodiments, napDNAbp directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500 or more base pairs from the first or last nucleotide of the target sequence. In some embodiments, the vector encodes a napDNAbp that is mutated with respect to the corresponding wild-type enzyme such that the mutated napDNAbp lacks the ability to cleave one or both strands of a target polynucleotide comprising a target sequence. For example, aspartic acid to alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from streptococcus pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves single strand). Other examples of mutations that make Cas9 a nickase include, but are not limited to, H840A, N854A and N863A that reference an equivalent amino acid site in a typical SpCas9 sequence or other Cas9 variants or Cas9 equivalents.
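Substitutions such as D10A or H840A follow the pattern of wild-type residue, 1-based position, replacement residue. The Python sketch below parses that notation and applies it to a sequence after checking that the stated wild-type residue is actually present at the stated position. The function name and the short example fragment are hypothetical, and the sketch does not handle deletions, insertions, or the "X" placeholder used elsewhere in the text.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution written as <wild type><position><new>, e.g. "D10A".

    Positions are 1-based. Raises ValueError if the notation is malformed,
    the position is out of range, or the wild-type residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, replacement = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to 0-based index
    if index < 0 or index >= len(protein):
        raise ValueError(f"position out of range for {mutation!r}")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


if __name__ == "__main__":
    fragment = "MDKKYSIGLDIGTNS"  # short illustrative fragment, not a full protein
    print(apply_substitution(fragment, "D10A"))
```

Checking the wild-type residue before substituting guards against numbering mismatches, which matter here because the text applies the same mutation names to "equivalent amino acid sites" in other Cas9 variants whose numbering may differ.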
As used herein, the term "Cas protein" refers to a full-length Cas protein obtained from nature, a recombinant Cas protein having a sequence that differs from a naturally occurring Cas protein, or any fragment of a Cas protein that nevertheless retains all or a substantial amount of the requisite basic functions needed for the disclosed methods, i.e., (i) the ability to programmably bind a nucleic acid of the target DNA, and (ii) the ability to nick the target DNA sequence on one strand. The Cas proteins contemplated herein include CRISPR Cas9 proteins, as well as Cas9 equivalents, variants (e.g., Cas9 nickase (nCas9) or nuclease-inactive Cas9 (dCas9)), homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and can include Cas9 equivalents from any class 2 CRISPR system (e.g., type II, V, or VI), including Cas12a (Cpf1), Cas12e (CasX), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), C2c4, C2c8, C2c5, C2c10, C2c9, Cas13a (C2c2), Cas13d, Cas13c (C2c7), Cas13b (C2c6), and Cas13b. Other Cas equivalents are described in Makarova et al., "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector," Science 2016; 353(6299), and Makarova et al., "Classification and Nomenclature of CRISPR-Cas Systems: Where from Here?," The CRISPR Journal, Vol. 1, No. 5, 2018, the contents of which are incorporated herein by reference.
The term "Cas9" or "Cas9 nuclease" or "Cas9 portion" or "Cas9 domain" includes any naturally occurring Cas9 from any organism, any naturally occurring Cas9 equivalent or functional fragment thereof, any Cas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of Cas9, whether naturally occurring or engineered. The term Cas9 is not meant to be particularly limiting and may be referred to as "Cas9 or equivalent." Exemplary Cas9 proteins are further described herein and/or are described in the art and incorporated herein by reference. The present disclosure is not limited with respect to the particular Cas9 used in the guided editor (PE) of the present invention.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference).
Examples of Cas9 and Cas9 equivalents are provided below; however, these specific examples are not meant to be limiting. The guide editor of the present disclosure may use any suitable napDNAbp, including any suitable Cas9 or Cas9 equivalent.
A. Wild-type classical SpCas9
In one embodiment, the guided editor constructs described herein may comprise the "classical SpCas9" nuclease from Streptococcus pyogenes, which has been widely used as a tool for genome engineering and is categorized as a type II enzyme of the class 2 CRISPR-Cas systems. This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains. Point mutations can be introduced into Cas9 to abolish one or both nuclease activities, resulting in a nickase Cas9 (nCas9) or a dead Cas9 (dCas9), respectively, that still retains its ability to bind DNA in an sgRNA-programmed manner. In principle, when fused to another protein or domain, Cas9 or a variant thereof (e.g., nCas9) can target that protein to virtually any DNA sequence simply by co-expression with an appropriate sgRNA. As used herein, the typical SpCas9 protein refers to the wild-type protein from Streptococcus pyogenes having the amino acid sequence:
The guided editors described herein can include the typical SpCas9, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type Cas9 sequence provided above. These variants may include SpCas9 variants containing one or more mutations, including any known mutation reported for SwissProt Accession No. Q99ZW2 (SEQ ID NO: 37), including:
Other wild-type SpCas9 sequences useful in the present disclosure include:
The guided editors described herein can include any of the above SpCas9 sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
B. Wild-type Cas9 orthologs
In other embodiments, the Cas9 protein may be a wild-type Cas9 ortholog from another bacterial species that is different from a typical Cas9 from streptococcus pyogenes. For example, the following Cas9 orthologs may be used in conjunction with the guided editor constructs described herein. Furthermore, any variant Cas9 ortholog having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the following orthologs may also be used with the present guide editor.
The guide editors described herein can include any of the Cas9 orthologous sequences described above, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
The napDNAbp can include any suitable homologs and/or orthologs of naturally occurring enzymes such as Cas9. Cas9 homologs and/or orthologs have been described in various species, including, but not limited to, Streptococcus pyogenes and Streptococcus thermophilus. Preferably, the Cas moiety is configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737 (the entire contents of which are incorporated herein by reference). In some embodiments, the Cas9 nuclease has an inactive (e.g., inactivated) DNA cleavage domain, i.e., the Cas9 is a nickase. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein provided by any one of the variants of Table 3. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein provided by any one of the Cas9 orthologs in the table above.
C. Dead napDNAbp variants
In some embodiments, the disclosed guided editors may comprise a catalytically inactive, or "dead," napDNAbp domain. In certain embodiments, the guided editors described herein can include a dead Cas9, e.g., dead SpCas9, which has no nuclease activity due to one or more mutations that inactivate both nuclease domains of Cas9, namely the RuvC domain (which cleaves the non-protospacer DNA strand) and the HNH domain (which cleaves the protospacer DNA strand). The nuclease inactivation may be due to one or more mutations that result in one or more substitutions and/or deletions in the amino acid sequence of the encoded protein, or of any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
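The relationship between nuclease domain activity and the resulting napDNAbp type (nuclease, nickase, or dead) can be summarized in a small Python sketch. The class and function names are hypothetical, and the model deliberately reduces each domain to an active/inactive flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cas9Variant:
    """Hypothetical record of which nuclease domains remain catalytically active."""
    ruvc_active: bool
    hnh_active: bool


def classify_nuclease_activity(variant: Cas9Variant) -> str:
    """Classify a variant by how many of its two nuclease domains are active."""
    active_domains = int(variant.ruvc_active) + int(variant.hnh_active)
    if active_domains == 2:
        return "nuclease (cuts both strands)"
    if active_domains == 1:
        return "nickase (nCas9, cuts one strand)"
    return "dead (dCas9, no cleavage)"


if __name__ == "__main__":
    for variant in (
        Cas9Variant(ruvc_active=True, hnh_active=True),
        Cas9Variant(ruvc_active=False, hnh_active=True),
        Cas9Variant(ruvc_active=False, hnh_active=False),
    ):
        print(variant, "->", classify_nuclease_activity(variant))
```

Partial loss of activity, which the text also contemplates, would need a graded measure rather than the boolean flags assumed here.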
As used herein, the term "dCas9" refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a functional fragment thereof, and includes any naturally occurring dCas9 from any organism, any naturally occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of dCas9, whether naturally occurring or engineered. The term dCas9 is not meant to be particularly limiting and may be referred to as "dCas9 or equivalent." Exemplary dCas9 proteins and methods for making dCas9 proteins are further described herein and/or are described in the art and incorporated herein by reference.
In other embodiments, dCas9 corresponds to, or comprises in part or in whole, a Cas9 amino acid sequence having one or more mutations that inactivate Cas9 nuclease activity. In other embodiments, Cas9 variants are provided that have mutations other than D10A and H840A and that result in complete or partial inactivation of endogenous Cas9 nuclease activity (e.g., dCas9 or nCas9, respectively). For example, with reference to a wild-type sequence such as Cas9 from Streptococcus pyogenes (NCBI reference sequence: NC_017053.1), such mutations include other amino acid substitutions at D10 and H820 of Cas9, or other substitutions within the nuclease domains (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain). In some embodiments, variants or homologs of Cas9 (e.g., variants of Cas9 from Streptococcus pyogenes (NCBI reference sequence: NC_017053.1 (SEQ ID NO: 39))) are provided that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to NCBI reference sequence NC_017053.1. In some embodiments, variants of Cas9 (e.g., variants of NCBI reference sequence: NC_017053.1 (SEQ ID NO: 39)) are provided having amino acid sequences that are shorter or longer than NC_017053.1 (SEQ ID NO: 39) by about 5 amino acids, about 10 amino acids, about 15 amino acids, about 20 amino acids, about 25 amino acids, about 30 amino acids, about 40 amino acids, about 50 amino acids, about 75 amino acids, about 100 amino acids, or more.
In one embodiment, the dead Cas9 may be based on the typical SpCas9 sequence of Q99ZW2 and may have a sequence comprising D10X and H810X substitutions (underlined and bold), where X may be any amino acid, or may be a variant having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 57.
In one embodiment, the dead Cas9 may be based on the typical SpCas9 sequence of Q99ZW2 and may have a sequence comprising D10A and H810A substitutions (underlined and bold), or may be a variant having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 58.
D. napDNAbp nickase variants
In some embodiments, the disclosed guided editors can comprise a napDNAbp domain that comprises a nickase. In one embodiment, the guided editors described herein comprise a Cas9 nickase. The term "Cas9 nickase" or "nCas9" refers to a Cas9 variant capable of introducing a single-strand break in a double-stranded DNA molecule target. In some embodiments, the Cas9 nickase comprises only a single functional nuclease domain. Wild-type Cas9 (e.g., typical SpCas9) comprises two separate nuclease domains, namely the RuvC domain (which cleaves the non-protospacer DNA strand) and the HNH domain (which cleaves the protospacer DNA strand). In one embodiment, the Cas9 nickase comprises a mutation in the RuvC domain that inactivates RuvC nuclease activity. For example, mutations in aspartic acid (D) 10, histidine (H) 983, aspartic acid (D) 986, or glutamic acid (E) 762 have been reported as loss-of-function mutations of the RuvC nuclease domain that create a functional Cas9 nickase (e.g., Nishimasu et al., "Crystal structure of Cas9 in complex with guide RNA and target DNA," Cell 156(5), 935-949, incorporated herein by reference). Thus, nickase mutations in the RuvC domain may include D10X, H983X, D986X, or E762X, where X is any amino acid other than the wild-type amino acid. In certain embodiments, the nickase mutation may be D10A, H983A, D986A, or E762A, or a combination thereof.
In various embodiments, the Cas9 nickase may have a mutation in the RuvC nuclease domain and have one of the following amino acid sequences or a variant of an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
In another embodiment, the Cas9 nickase comprises a mutation in the HNH domain that inactivates HNH nuclease activity. For example, mutations in histidine (H) 840 or asparagine (N) 863 have been reported as loss-of-function mutations of the HNH nuclease domain that create a functional Cas9 nickase (e.g., Nishimasu et al., "Crystal structure of Cas9 in complex with guide RNA and target DNA," Cell 156(5), 935-949, which is incorporated herein by reference). Thus, nickase mutations in the HNH domain may include H840X and N863X, where X is any amino acid other than the wild-type amino acid. In certain embodiments, the nickase mutation may be H840A or N863A, or a combination thereof.
In various embodiments, the Cas9 nickase may have a mutation in the HNH nuclease domain and have one of the following amino acid sequences or a variant of an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
In some embodiments, the N-terminal methionine is removed from the Cas9 nickase or from any Cas9 variant, ortholog, or equivalent disclosed or contemplated herein. For example, a methionine-minus Cas9 nickase includes the following sequence, or a variant amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
E. Other Cas9 variants
In addition to dead Cas9 and Cas9 nickase variants, the Cas9 proteins used herein may also include other "Cas9 variants" that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference Cas9 protein, including any wild-type Cas9, mutant Cas9 (e.g., a dead Cas9 or Cas9 nickase), Cas9 fragment, circularly permuted Cas9, or other variant of Cas9 disclosed herein or known in the art. In some embodiments, a Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to a reference Cas9. In some embodiments, the Cas9 variant comprises a fragment of a reference Cas9 (e.g., a gRNA binding domain or a DNA cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of the corresponding wild-type Cas9 (e.g., SEQ ID NO: 37).
In some embodiments, the present disclosure can also utilize Cas9 fragments that retain functionality and are fragments of any Cas9 protein disclosed herein. In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length.
In various embodiments, the guide editors disclosed herein can comprise one of the Cas9 variants described below or a Cas9 variant that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the reference Cas9 variants.
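The identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identity over columns in which neither sequence has a gap. The function names, the toy sequences, and this choice of denominator are illustrative assumptions; the disclosure does not prescribe a particular alignment tool or identity formula here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: inputs are pre-aligned, equal-length strings using '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped aligned positions to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 10-residue strings (not real Cas9 sequences) differing at one position.
reference = "ACDEFGHIKL"
variant = "ACDEFGHIKV"
print(percent_identity(reference, variant))              # 90.0
print(meets_identity_threshold(reference, variant, 95))  # False
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give different numbers, so the convention would need to be fixed before comparing a candidate against any of the thresholds above.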
F.Small Cas9 variants
In some embodiments, the guide editors contemplated herein may include a Cas9 protein having a smaller molecular weight than the canonical SpCas9 sequence. In some embodiments, the small Cas9 variants may facilitate delivery to cells, e.g., by expression vectors, nanoparticles, or other means of delivery. In certain embodiments, the small Cas9 variants may include enzymes classified as type II enzymes of a Class 2 CRISPR-Cas system. In some embodiments, the small Cas9 variants may include enzymes classified as type V enzymes of a Class 2 CRISPR-Cas system. In other embodiments, the small Cas9 variants may include enzymes classified as type VI enzymes of a Class 2 CRISPR-Cas system.
The canonical SpCas9 protein is 1368 amino acids in length and has a predicted molecular weight of 158 kilodaltons. As used herein, the term "small Cas9 variant" refers to any Cas9 variant (naturally occurring, engineered, or otherwise) that is less than 1300 amino acids, or less than 1290 amino acids, or less than 1280 amino acids, or less than 1270 amino acids, or less than 1260 amino acids, or less than 1250 amino acids, or less than 1240 amino acids, or less than 1230 amino acids, or less than 1220 amino acids, or less than 1210 amino acids, or less than 1200 amino acids, or less than 1190 amino acids, or less than 1180 amino acids, or less than 1170 amino acids, or less than 1160 amino acids, or less than 1150 amino acids, or less than 1140 amino acids, or less than 1130 amino acids, or less than 1120 amino acids, or less than 1110 amino acids, or less than 1100 amino acids, or less than 1050 amino acids, or less than 1000 amino acids, or less than 900 amino acids, or less than 850 amino acids, or less than 800 amino acids, or less than about 500 amino acids, and that retains the function of a Cas9 protein. The Cas9 variants may include those classified as type II, type V, or type VI enzymes of a Class 2 CRISPR-Cas system.
In various embodiments, the guide editors disclosed herein can comprise small Cas9 variants as described below or Cas9 variants that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference small Cas9 protein.
G.Cas9 equivalents
In some embodiments, the guide editors described herein can include any Cas9 equivalent. As used herein, the term "Cas9 equivalent" is a broad term encompassing any napDNAbp protein that performs the same function as Cas9 in the present guide editors, even though its primary amino acid sequence and/or its three-dimensional structure may be different and/or unrelated from an evolutionary standpoint. Thus, while Cas9 equivalents include any evolutionarily related Cas9 ortholog, homolog, mutant, or variant described or encompassed herein, Cas9 equivalents also include proteins that may have evolved through convergent evolution to have the same or similar function as Cas9 but that do not necessarily have any similarity in amino acid sequence and/or three-dimensional structure. The guide editors described herein include any Cas9 equivalent that provides the same or similar function as Cas9, even if the Cas9 equivalent is based on a protein that arose through convergent evolution. For example, if Cas9 refers to a type II enzyme of the CRISPR-Cas system, then a Cas9 equivalent may refer to a type V or type VI enzyme of the CRISPR-Cas system.
For example, Cas12e (CasX) is a Cas9 equivalent that is reported to have the same function as Cas9 but to have arisen through convergent evolution. Thus, the Cas12e (CasX) protein described in Liu et al., "CasX enzymes comprise a distinct family of RNA-guided genome editors," Nature, 2019, Vol. 566: 218-223, is contemplated for use with the guide editors described herein. In addition, any variant or modification of Cas12e (CasX) is contemplated and within the scope of the present disclosure.
Cas9 is a bacterial enzyme that has evolved in a wide variety of species. However, the Cas9 equivalents contemplated herein may also be obtained from archaea, which constitute a domain and kingdom of single-celled prokaryotic microorganisms distinct from bacteria.
In some embodiments, a Cas9 equivalent may refer to Cas12e (CasX) or Cas12d (CasY), which have been described in, for example, Burstein et al., "New CRISPR-Cas systems from uncultivated microbes," Cell Res. 2017 Feb 21. doi: 10.1038/cr.2017.21, the entire contents of which are incorporated herein by reference. Using genome-resolved metagenomics, a number of CRISPR-Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in very few nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems were discovered, CRISPR-Cas12e and CRISPR-Cas12d, which are among the most compact systems found so far. In some embodiments, Cas9 refers to Cas12e or a variant of Cas12e. In some embodiments, Cas9 refers to Cas12d or a variant of Cas12d. It should be understood that other RNA-guided DNA binding proteins may be used as the nucleic acid programmable DNA binding protein (napDNAbp) and are within the scope of the present disclosure. See also Liu et al., "CasX enzymes comprise a distinct family of RNA-guided genome editors," Nature, 2019, Vol. 566: 218-223. Any such Cas9 equivalents are contemplated.
In some embodiments, the Cas9 equivalent comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp is a naturally occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas portion or any Cas portion provided herein.
In various embodiments, the nucleic acid programmable DNA binding proteins include, but are not limited to, Cas9 (e.g., dCas9 and nCas9), Cas12e (CasX), Cas12d (CasY), Cas12a (Cpf1), Cas12b1 (C2c1), Cas13a (C2c2), Cas12c (C2c3), and Argonaute proteins. One example of a nucleic acid programmable DNA binding protein that has a PAM specificity different from that of Cas9 is Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 (i.e., Cas12a (Cpf1)). Like Cas9, Cas12a (Cpf1) is a Class 2 CRISPR effector, but it is a type V rather than a type II enzyme. Cas12a (Cpf1) has been shown to mediate robust DNA interference with features distinct from those of Cas9. Cas12a (Cpf1) is a single-RNA-guided endonuclease that lacks a tracrRNA and utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN). Moreover, Cpf1 cleaves DNA via a staggered DNA double-strand break. Of the 16 Cpf1 family proteins, two enzymes, from Acidaminococcus and Lachnospiraceae, have been shown to have efficient genome editing activity in human cells. Cpf1 proteins are known in the art and have been described previously, for example, in Yamano et al., "Crystal structure of Cpf1 in complex with guide RNA and target DNA." Cell (165) 2016, p. 949-962; the entire contents of which are incorporated herein by reference.
In other embodiments, the Cas protein may include any CRISPR-associated protein, including, but not limited to, Cas12a, Cas12b1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also referred to as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof, and preferably comprises a mutation corresponding to a mutation of the wild-type Cas9 polypeptide of SEQ ID NO: 37.
In various other embodiments, the napDNAbp may be any one of the following proteins: Cas9, Cas12a (Cpf1), Cas12e (CasX), Cas12d (CasY), Cas12b1 (C2c1), Cas13a (C2c2), Cas12c (C2c3), GeoCas9, CjCas9, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, circularly permuted Cas9, or an Argonaute (Ago) domain, or a variant thereof.
Exemplary Cas9 equivalent protein sequences may include the following:
The guide editors described herein may also include Cas12a (Cpf1) (dCpf1) variants, which may be used as guide nucleotide sequence-programmable DNA binding protein domains. The Cas12a (Cpf1) protein has a RuvC-like endonuclease domain that is similar to the RuvC domain of Cas9 but does not have an HNH endonuclease domain, and the N-terminus of Cas12a (Cpf1) does not have the alpha-helical recognition lobe of Cas9. Zetsche et al., Cell, 163, 759-771, 2015 (incorporated herein by reference) showed that the RuvC-like domain of Cas12a (Cpf1) is responsible for cleaving both DNA strands and that inactivation of the RuvC-like domain inactivates Cas12a (Cpf1) nuclease activity.
In some embodiments, the napDNAbp is a single effector of a microbial CRISPR-Cas system. Single effectors of microbial CRISPR-Cas systems include, but are not limited to, Cas9, Cas12a (Cpf1), Cas12b1 (C2c1), Cas13a (C2c2), and Cas12c (C2c3). Generally, microbial CRISPR-Cas systems are divided into Class 1 and Class 2 systems. Class 1 systems have multi-subunit effector complexes, while Class 2 systems have a single protein effector. For example, Cas9 and Cas12a (Cpf1) are Class 2 effectors. In addition to Cas9 and Cas12a (Cpf1), three distinct Class 2 CRISPR-Cas systems (Cas12b1, Cas13a, and Cas12c) are described in Shmakov et al., "Discovery and Functional Characterization of Diverse Class 2 CRISPR Cas Systems", Mol. Cell, 2015 Nov 5; 60(3): 385-397, the entire contents of which are incorporated herein by reference.
Effectors of two of the systems (Cas12b1 and Cas12c) contain RuvC-like endonuclease domains related to that of Cas12a. The third system, Cas13a, contains an effector with two predicted HEPN RNase domains. Production of mature CRISPR RNA is tracrRNA-independent, unlike production of CRISPR RNA by Cas12b1. Cas12b1 depends on both CRISPR RNA and tracrRNA for DNA cleavage. Bacterial Cas13a has been shown to possess a unique RNase activity for CRISPR RNA maturation that is distinct from its RNA-activated single-stranded RNA degradation activity. These RNase functions are different from each other and from the CRISPR RNA-processing behavior of Cas12a. See, e.g., East-Seletsky et al., "Two distinct RNase activities of CRISPR-Cas13a enable guide-RNA processing and RNA detection", Nature, 2016 Oct 13; 538(7624): 270-273, the entire contents of which are incorporated herein by reference. In vitro biochemical analysis of Cas13a from Leptotrichia shahii has shown that Cas13a is guided by a single CRISPR RNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. Catalytic residues in the two conserved HEPN domains mediate cleavage. Mutations in the catalytic residues generate catalytically inactive RNA-binding proteins. See, e.g., Abudayyeh et al., "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector", Science, 2016 Aug 5; 353(6299), the entire contents of which are incorporated herein by reference.
The crystal structure of Alicyclobacillus acidoterrestris Cas12b1 (AacC2c1) in complex with a chimeric single-molecule guide RNA (sgRNA) has been reported. See, e.g., Liu et al., "C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism", Mol. Cell, 2017 Jan 19; 65(2): 310-322, the entire contents of which are incorporated herein by reference. The crystal structure of Alicyclobacillus acidoterrestris C2c1 bound to target DNA as a ternary complex has also been reported. See, e.g., Yang et al., "PAM-dependent Target DNA Recognition and Cleavage by C2C1 CRISPR-Cas endonuclease", Cell, 2016 Dec 15; 167(7): 1814-1828, the entire contents of which are incorporated herein by reference. Catalytically competent conformations of AacC2c1, with both target and non-target DNA strands, have been captured independently positioned within a single RuvC catalytic pocket, with C2c1-mediated cleavage resulting in a staggered seven-nucleotide break of the target DNA. Structural comparisons between the C2c1 ternary complexes and the previously determined Cas9 and Cpf1 counterparts demonstrate the diversity of mechanisms used by CRISPR-Cas9 systems.
In some embodiments, the napDNAbp can be a C2c1, C2c2, or C2c3 protein. In some embodiments, the napDNAbp is a C2c1 protein. In some embodiments, the napDNAbp is a Cas13a protein. In some embodiments, the napDNAbp is a Cas12c protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring Cas12b1 (C2c1), Cas13a (C2c2), or Cas12c (C2c3) protein. In some embodiments, the napDNAbp is a naturally occurring Cas12b1 (C2c1), Cas13a (C2c2), or Cas12c (C2c3) protein.
H.Cas9 circular permutants
In various embodiments, the guide editors disclosed herein can comprise a circular permutant of Cas9.
The term "circularly permuted Cas9" or "circular permutant of Cas9" or "CP-Cas9" refers to any Cas9 protein, or variant thereof, that occurs as or has been engineered to be a circular permutant variant, meaning that the N-terminus and C-terminus of the Cas9 protein (e.g., a wild-type Cas9 protein) have been rearranged. Such circularly permuted Cas9 proteins, or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., "Protein Engineering of Cas9 for enhanced function," Methods Enzymol, 2014, 546: 491-511, and Oakes et al., "CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification," Cell, January 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The present disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).
Any Cas9 protein described herein, including any variant, ortholog, or naturally occurring Cas9 or equivalent thereof, may be reconfigured as a circular permutant variant.
In various embodiments, the circular permutant of Cas9 can have the following structure:
N-terminus-[original C-terminus]-[optional linker]-[original N-terminus]-C-terminus.
By way of example, the present disclosure contemplates the following circular permutants of the canonical Streptococcus pyogenes Cas9 (1368 amino acids; UniProtKB - Q99ZW2 (CAS9_STRP1); numbering based on the amino acid positions in SEQ ID NO: 37):
N-terminus-[1268-1368]-[optional linker]-[1-1267]-C-terminus;
N-terminus-[1168-1368]-[optional linker]-[1-1167]-C-terminus;
N-terminus-[1068-1368]-[optional linker]-[1-1067]-C-terminus;
N-terminus-[968-1368]-[optional linker]-[1-967]-C-terminus;
N-terminus-[868-1368]-[optional linker]-[1-867]-C-terminus;
N-terminus-[768-1368]-[optional linker]-[1-767]-C-terminus;
N-terminus-[668-1368]-[optional linker]-[1-667]-C-terminus;
N-terminus-[568-1368]-[optional linker]-[1-567]-C-terminus;
N-terminus-[468-1368]-[optional linker]-[1-467]-C-terminus;
N-terminus-[368-1368]-[optional linker]-[1-367]-C-terminus;
N-terminus-[268-1368]-[optional linker]-[1-267]-C-terminus;
N-terminus-[168-1368]-[optional linker]-[1-167]-C-terminus;
N-terminus-[68-1368]-[optional linker]-[1-67]-C-terminus; or
N-terminus-[10-1368]-[optional linker]-[1-9]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
In a particular embodiment, the circularly permuted Cas9 has the following structure (based on the 1368 amino acids of Streptococcus pyogenes Cas9, UniProtKB - Q99ZW2 (CAS9_STRP1); numbering based on the amino acid positions in SEQ ID NO: 37):
N-terminus-[102-1368]-[optional linker]-[1-101]-C-terminus;
N-terminus-[1028-1368]-[optional linker]-[1-1027]-C-terminus;
N-terminus-[1041-1368]-[optional linker]-[1-1040]-C-terminus;
N-terminus-[1249-1368]-[optional linker]-[1-1248]-C-terminus; or
N-terminus-[1300-1368]-[optional linker]-[1-1299]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
In other embodiments, the circularly permuted Cas9 has the following structure (based on the 1368 amino acids of Streptococcus pyogenes Cas9, UniProtKB - Q99ZW2 (CAS9_STRP1); numbering based on the amino acid positions in SEQ ID NO: 37):
N-terminus-[103-1368]-[optional linker]-[1-102]-C-terminus;
N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus;
N-terminus-[1042-1368]-[optional linker]-[1-1041]-C-terminus;
N-terminus-[1250-1368]-[optional linker]-[1-1249]-C-terminus; or
N-terminus-[1301-1368]-[optional linker]-[1-1300]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker (e.g., an amino acid linker). In some embodiments, the C-terminal fragment can correspond to the C-terminal 95% or more of the amino acids of a Cas9 (e.g., about amino acids 1300-1368), or the C-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., any one of SEQ ID NOs: 88-97). The N-terminal portion can correspond to the N-terminal 95% or more of the amino acids of a Cas9 (e.g., about amino acids 1-1300), or the N-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., SEQ ID NO: 37).
In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker (e.g., an amino acid linker). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus comprises or corresponds to the C-terminal 30% or less of the amino acids of a Cas9 (e.g., amino acids 1012-1368 of SEQ ID NO: 37). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus comprises or corresponds to the C-terminal 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% of the amino acids of a Cas9 (e.g., the Cas9 of SEQ ID NO: 37). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus comprises or corresponds to the C-terminal 410 residues or fewer of a Cas9 (e.g., the Cas9 of SEQ ID NO: 37). In some embodiments, the C-terminal portion that is rearranged to the N-terminus comprises or corresponds to the C-terminal 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 37). In some embodiments, the C-terminal portion that is rearranged to the N-terminus comprises or corresponds to the C-terminal 357, 341, 328, 120, or 69 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 37).
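The residue counts and percentages recited above follow from simple arithmetic on the 1368-residue reference. The sketch below computes the length of the C-terminal fragment that begins at a given position and its share of the full protein. The function names are hypothetical, and the pairing of the counts 357, 341, 328, 120, and 69 with starting positions 1012, 1028, 1041, 1249, and 1300 is an inference from the numbers in this section rather than an explicit statement of the disclosure.

```python
CAS9_LENGTH = 1368  # length of the reference S. pyogenes Cas9 given in the text


def c_terminal_fragment_length(start: int, total_length: int = CAS9_LENGTH) -> int:
    """Number of residues in the fragment [start, total_length], 1-indexed inclusive."""
    if not 1 <= start <= total_length:
        raise ValueError("start must lie within the protein")
    return total_length - start + 1


def fraction_of_protein(n_residues: int, total_length: int = CAS9_LENGTH) -> float:
    """Fragment length as a percentage of the full-length protein."""
    return 100.0 * n_residues / total_length


for start in (1012, 1028, 1041, 1249, 1300):
    n = c_terminal_fragment_length(start)
    print(f"start {start}: {n} residues, {fraction_of_protein(n):.1f}% of Cas9")
```

Under the same arithmetic, 410 residues is just under 30% of 1368, which is consistent with the "30% or less" and "410 residues or fewer" limits appearing side by side.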
In other embodiments, the circular permutant Cas9 variant may be defined as a topological rearrangement of the Cas9 primary structure according to the following method, which is based on the Streptococcus pyogenes Cas9 of SEQ ID NO: 37: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which divides the original protein into two halves: an N-terminal region and a C-terminal region; and (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to precede the original N-terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue. The CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain. For example, the CP site can be located (relative to the Streptococcus pyogenes Cas9 of SEQ ID NO: 37) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282. Thus, once relocated to the N-terminus, original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N-terminal amino acid. These CP-Cas9 proteins may be referred to as Cas9-CP181, Cas9-CP199, Cas9-CP230, Cas9-CP270, Cas9-CP310, Cas9-CP1010, Cas9-CP1016, Cas9-CP1023, Cas9-CP1029, Cas9-CP1041, Cas9-CP1247, Cas9-CP1249, and Cas9-CP1282, respectively. This description is not meant to be limited to making CP variants from SEQ ID NO: 37, but may be implemented to make CP variants of any Cas9 sequence, either at CP sites that correspond to these positions or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
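Steps (a) and (b) above amount to a string rearrangement, which can be sketched as follows. The function name, the toy sequence, and the "GGS" linker are illustrative assumptions only; the disclosure's actual linker sequences and the optional leading methionine are not modeled here.

```python
def circularly_permute(sequence: str, cp_site: int, linker: str = "") -> str:
    """Return the circular permutant of `sequence` at 1-indexed `cp_site`.

    The original C-terminal region (starting at, and including, the CP-site
    residue) is moved in front of the original N-terminal region, optionally
    joined by a linker, so the CP-site residue becomes the new first residue.
    """
    if not 2 <= cp_site <= len(sequence):
        raise ValueError("cp_site must be an internal position (2..len(sequence))")
    c_terminal_region = sequence[cp_site - 1:]   # residues cp_site..end
    n_terminal_region = sequence[:cp_site - 1]   # residues 1..cp_site-1
    return c_terminal_region + linker + n_terminal_region


# Toy 10-residue string standing in for a protein sequence.
toy = "ABCDEFGHIJ"
print(circularly_permute(toy, 4))                # DEFGHIJABC
print(circularly_permute(toy, 4, linker="GGS"))  # DEFGHIJGGSABC
```

Applied to a 1368-residue sequence with `cp_site=1029`, this would yield the layout N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus.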
Exemplary CP-Cas9 amino acid sequences for Cas9 based on SEQ ID No. 37 are provided below, with the linker sequence underlined and the optional methionine (M) residues shown in bold. It is to be understood that the present disclosure provides CP-Cas9 sequences that do not include a linker sequence or include a different linker sequence. It should be understood that the CP-Cas9 sequence may be based on Cas9 sequences other than SEQ ID NO:37, and any examples provided herein are not meant to be limiting. Exemplary CP-Cas9 sequences are as follows:
Described herein are Cas9 circular permutants useful in guided editing constructs. Exemplary C-terminal fragments of Cas9 (based on the Cas9 of SEQ ID NO: 37), which can be rearranged to the N-terminus of Cas9, are provided below. It should be understood that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting. These exemplary CP-Cas9 fragments have the following sequences:
I.Cas9 variants with modified PAM specificity
The guide editors of the present disclosure may also comprise Cas9 variants with modified PAM specificities. Some aspects of the disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3', where N is A, C, G, or T) at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGG-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNG-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNT-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGT-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAT-3' PAM sequence at its 3'-end. In other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAG-3' PAM sequence at its 3'-end.
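The PAM patterns listed above use degenerate nucleotide codes, and a short sketch can show how such a pattern is compared against a concrete trinucleotide. The code table covers only N (A, C, G, or T, as defined in the text), plus Y and R under their standard IUPAC meanings; the function names are hypothetical, and the sketch checks pattern matching only, not whether any particular Cas9 variant is active at a site.

```python
from itertools import product

# Degenerate nucleotide codes: N is defined in the text; Y and R follow IUPAC.
DEGENERATE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "Y": "CT", "R": "AG",
}


def pam_matches(pattern: str, site: str) -> bool:
    """True if the concrete sequence `site` is an instance of PAM `pattern`."""
    pattern, site = pattern.upper(), site.upper()
    if len(pattern) != len(site):
        return False
    return all(base in DEGENERATE_CODES[code] for code, base in zip(pattern, site))


def expand_pam(pattern: str) -> list[str]:
    """List every concrete sequence covered by a degenerate PAM pattern."""
    choices = (DEGENERATE_CODES[code] for code in pattern.upper())
    return ["".join(bases) for bases in product(*choices)]


print(pam_matches("NGG", "TGG"))  # True
print(pam_matches("NGG", "TGA"))  # False
print(expand_pam("NAA"))          # ['AAA', 'CAA', 'GAA', 'TAA']
```

Expanding a pattern this way makes it easy to see which concrete trinucleotides a given PAM designation covers.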
It is to be understood that any amino acid mutation described herein (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) can also include a mutation from the first amino acid residue to an amino acid residue that is similar to (e.g., conserved with respect to) the second amino acid residue. For example, a mutation of an amino acid having a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) can be a mutation to a second amino acid having a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A262T mutation) can also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, a serine. As another example, a mutation of an amino acid having a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid having a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, a mutation of an amino acid having a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid having a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Other similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan will recognize that such conservative amino acid substitutions are likely to have little effect on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any amino acid mutation provided herein from one amino acid to a threonine can be an amino acid mutation to a serine.
In some embodiments, any amino acid mutation provided herein from one amino acid to an arginine can be an amino acid mutation to a lysine. In some embodiments, any amino acid mutation provided herein from one amino acid to an isoleucine can be an amino acid mutation to an alanine, valine, methionine, or leucine. In some embodiments, any amino acid mutation provided herein from one amino acid to a lysine can be an amino acid mutation to an arginine. In some embodiments, any amino acid mutation provided herein from one amino acid to an aspartic acid can be an amino acid mutation to a glutamic acid or asparagine. In some embodiments, any amino acid mutation provided herein from one amino acid to a valine can be an amino acid mutation to an alanine, isoleucine, methionine, or leucine. In some embodiments, any amino acid mutation provided herein from one amino acid to a glycine can be an amino acid mutation to an alanine. It is to be understood, however, that other conserved amino acid residues will be recognized by those skilled in the art, and that any amino acid mutation to such other conserved amino acid residues is also within the scope of the present disclosure.
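The side-chain groupings named above can be encoded as a small lookup, sketched below. The groups and pairs are transcribed from this passage using one-letter amino acid codes; the function names are hypothetical, and this is a deliberately simplified classification for illustration, not a substitution matrix and not a complete statement of what the disclosure treats as conservative.

```python
# Groups named in the passage, written with one-letter amino acid codes.
SIMILARITY_GROUPS = {
    "hydrophobic": set("AVILMFYW"),
    "positively charged": set("RHK"),
    "polar": set("STNQ"),
}
# Additional similar pairs named in the passage.
SIMILAR_PAIRS = {
    frozenset("FY"), frozenset("NQ"), frozenset("MC"),
    frozenset("DE"), frozenset("RK"),
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_conservative(first: str, second: str) -> bool:
    """True if the two residues share a group or form a named similar pair."""
    first, second = first.upper(), second.upper()
    if first == second:
        return False  # identical residues are not a substitution
    if frozenset((first, second)) in SIMILAR_PAIRS:
        return True
    return any(first in g and second in g for g in SIMILARITY_GROUPS.values())


def conservative_alternatives(residue: str) -> list[str]:
    """Residues that this simplified scheme treats as similar to `residue`."""
    return sorted(aa for aa in AMINO_ACIDS if is_conservative(residue, aa))


print(is_conservative("T", "S"))       # True: both in the polar group
print(is_conservative("A", "T"))       # False: hydrophobic vs. polar
print(conservative_alternatives("T"))  # ['N', 'Q', 'S']
```

For a mutation such as A262T, `conservative_alternatives("T")` lists the residues that this scheme would accept in place of threonine, with serine being the example the passage itself gives.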
In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3'-end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 1. In some embodiments, the combination of mutations is a conservative mutation of the clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1.
Table 1: NAA PAM clones
In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of table 1. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of table 1.
In some embodiments, the Cas9 protein exhibits increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3'-end as compared to the Streptococcus pyogenes Cas9 provided by SEQ ID NO: 37. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3'-end that is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 5-fold increased as compared to the activity of the Streptococcus pyogenes Cas9 provided by SEQ ID NO: 37 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of the Streptococcus pyogenes Cas9 provided by SEQ ID NO: 37 on the same target sequence. In some embodiments, the 3'-end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.
In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAC-3' PAM sequence at its 3'-end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 2. In some embodiments, the combination of mutations is a conservative mutation of the clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2.
Table 2: NAC PAM clones
In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of table 2. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of table 2.
In some embodiments, the Cas9 protein exhibits increased activity against a target sequence that does not comprise classical PAM (5 ' -NGG-3 ') at its 3' end as compared to streptococcus pyogenes Cas9 provided by SEQ ID No. 37. In some embodiments, cas9 protein has at least a 5-fold increase in activity to a target sequence having a 3' terminus that is not immediately adjacent to a typical PAM sequence (5 ' -NGG-3 ') as compared to the activity of streptococcus pyogenes Cas9 provided by SEQ ID No. 37 to the same target sequence. In some embodiments, the Cas9 protein exhibits at least a 10-fold, at least a 50-fold, at least a 100-fold, at least a 500-fold, at least a 1,000-fold, at least a 5,000-fold, at least a 10,000-fold, at least a 50,000-fold, at least a 100,000-fold, at least a 500,000-fold, or at least a 1,000,000-fold increase in activity against a target sequence that is not immediately adjacent to a typical PAM sequence (5 '-NGG-3') as compared to the activity of the streptococcus pyogenes provided in SEQ ID No. 37 against the same target sequence. In some embodiments, the 3' end of the target sequence is directly adjacent to AAC, GAC, CAC or TAC sequences.
In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAT-3' PAM sequence at its 3' end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 3. In some embodiments, the combination of mutations consists of conservative mutations of the clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3.
Table 3: NAT PAM clones
The above description of various napDNAbps that may be used in connection with the presently disclosed prime editors is not meant to be limiting in any way. The prime editor may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein, including any naturally occurring variant, mutant, or otherwise engineered version of Cas9, that is known or that can be made or evolved through directed evolution or another mutagenesis process. In various embodiments, the Cas9 or Cas9 variant has nickase activity, i.e., it cleaves only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variant has an inactive nuclease, i.e., it is a "dead" Cas9 protein. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having a modified or rearranged primary amino acid structure (e.g., a circular permutant form). The prime editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins, which are the result of convergent evolution. The napDNAbps used herein (e.g., SpCas9, Cas9 variants, or Cas9 equivalents) may also contain various modifications that alter or enhance their PAM specificities. Finally, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
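The sequence identity thresholds recited above can be made concrete with a small calculation. The sketch below assumes that the two sequences have already been aligned (gaps written as "-") by some external tool, and it uses one common convention for the denominator (all aligned columns); other conventions exist, and the disclosure does not prescribe one. The function names are hypothetical.

```python
# Minimal sketch: percent identity over two pre-aligned sequences.
# Assumes an alignment has already been computed; "-" marks a gap.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MKRTA"  # toy placeholder, not a real Cas9 sequence
    variant = "MKQTA"
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant, 80.0))
```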
In a particular embodiment, the Cas9 variant with expanded PAM capability is SpCas9 (H840A) VRQR (SEQ ID NO: 87), which has the following amino acid sequence (wherein the V, R, Q, R substitutions relative to SpCas9 (H840A) of SEQ ID NO: 68 are shown in bold underline):
In another particular embodiment, the Cas9 variant with expanded PAM capability is SpCas9 (H840A) VRER, which has the following amino acid sequence (wherein the V, R, E, R substitutions relative to SpCas9 (H840A) of SEQ ID NO: 51 are shown in bold underline):
In some embodiments, the napDNAbp that functions with a non-canonical PAM sequence is an Argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds a 5'-phosphorylated ssDNA of about 24 nucleotides (gDNA) that guides it to its target site, and it makes DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM). Using a nuclease-inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol. 2016 Jul;34(7):768-73. PubMed PMID: 27136078; Swarts et al., Nature 507(7491) (2014): 258-61; and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference.
In some embodiments, the napDNAbp is a prokaryotic homolog of an Argonaute protein. Prokaryotic homologs of Argonaute proteins are known and have been described, for example, in Makarova K., et al., "Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements", Biol Direct. 2009 Aug 25; doi:10.1186/1745-6150-4-29, the entire contents of which are incorporated herein by reference. In some embodiments, the napDNAbp is a Marinitoga piezophila Argonaute (MpAgo) protein. The CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5'-phosphorylated guides. The 5' guides are used by all known Argonaute proteins. The crystal structure of an MpAgo-RNA complex shows a guide strand binding site comprising residues that block 5' phosphate interactions. These data suggest the evolution of an Argonaute subclass with non-canonical specificity for a 5'-hydroxylated guide. See, e.g., Kaya et al., "A bacterial Argonaute with noncanonical guide RNA specificity", Proc Natl Acad Sci U S A. 2016 Apr 12;113:4057-62, the entire contents of which are incorporated herein by reference. It should be appreciated that other Argonaute proteins may be used and are within the scope of this disclosure.
Some aspects of the disclosure provide Cas9 domains that have different PAM specificities. Typically, Cas9 proteins, such as Cas9 from Streptococcus pyogenes (SpCas9), require a canonical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability to edit desired bases within a genome. In some embodiments, the base editing fusion proteins provided herein may need to be placed at a precise location, for example, where a target base is placed within a 4-base region (e.g., an "editing window") that is approximately 15 bases upstream of the PAM. See Komor, A.C., et al., "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage" Nature 533, 420-424 (2016), the entire contents of which are incorporated herein by reference. Accordingly, in some embodiments, any of the fusion proteins provided herein may contain a Cas9 domain that is capable of binding a nucleotide sequence that does not contain a canonical (e.g., NGG) PAM sequence. Cas9 domains that bind to non-canonical PAM sequences have been described in the art and would be apparent to the skilled artisan. For example, Cas9 domains that bind non-canonical PAM sequences have been described in Kleinstiver, B.P., et al., "Engineered CRISPR-Cas9 nucleases with altered PAM specificities" Nature 523, 481-485 (2015); and Kleinstiver, B.P., et al., "Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition" Nature Biotechnology, 1293-1298 (2015); the entire contents of each are hereby incorporated by reference.
For example, a napDNAbp domain with altered PAM specificity may be a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to wild-type Francisella novicida Cpf1 (D917, E1006, and D1255) (SEQ ID NO: 100), which has the following amino acid sequence:
Other napDNAbp domains with altered PAM specificity include, for example, domains having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 55), which has the following amino acid sequence. In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an Argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds a 5'-phosphorylated ssDNA of about 24 nucleotides (gDNA) that guides it to its target site, and it makes DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM). Using a nuclease-inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., 34(7): 768-73 (2016), PubMed PMID: 27136078; Swarts et al., Nature, 507(7491): 258-61 (2014); and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference. The sequence of the Natronobacterium gregoryi Argonaute protein is provided in SEQ ID NO: 101.
The disclosed fusion proteins may comprise a napDNAbp domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type Natronobacterium gregoryi Argonaute protein (SEQ ID NO: 101), which has the following amino acid sequence:
Furthermore, the variant or mutant Cas9 proteins may be obtained or constructed using any available method. As used herein, the term "mutation" refers to the substitution of a residue within a sequence (e.g., a nucleic acid or amino acid sequence) with another residue, or to the deletion or insertion of one or more residues within a sequence. Mutations are typically described by identifying the original residue, followed by the position of that residue within the sequence, followed by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are provided, for example, in Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single-base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include "loss-of-function" mutations, which are the typical result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene encoding a fully functional protein, whose presence compensates for the effect of the mutation. Mutations also embrace "gain-of-function" mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in the normal condition. Many gain-of-function mutations are located in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, with these tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
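The substitution notation described above (original residue, position, new residue, as in the H840A label used elsewhere in this description) can be parsed and applied mechanically. The following is a minimal sketch under that reading; the function names are hypothetical, positions are taken to be 1-based, and the toy sequence is a placeholder rather than a real Cas9 sequence.

```python
# Minimal sketch: parsing and applying single-residue substitution labels
# such as "H840A" (original residue, 1-based position, replacement residue).
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label like 'H840A' into ('H', 840, 'A')."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_substitution(sequence: str, label: str) -> str:
    """Return a copy of `sequence` carrying the substitution named by `label`."""
    original, position, replacement = parse_substitution(label)
    index = position - 1  # convert the 1-based position to a 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


if __name__ == "__main__":
    print(parse_substitution("H840A"))
    print(apply_substitution("MKHLT", "H3A"))  # toy placeholder sequence
```

Checking the original residue before substituting guards against applying a label to the wrong reference sequence or numbering scheme.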
Mutations can be introduced into a reference Cas9 protein using site-directed mutagenesis. Older methods of site-directed mutagenesis known in the art rely on subcloning the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of a single-stranded DNA template. In these methods, a mutagenic primer (i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at that site) is annealed to the single-stranded template, and the complement of the template is then polymerized starting from the 3' end of the mutagenic primer. The resulting duplexes are then transformed into host bacteria, and plaques are screened for the desired mutation. More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require subcloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. Fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases, it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
Mutations can also be introduced by directed evolution processes, such as phage-assisted continuous evolution (PACE) or phage-assisted non-continuous evolution (PANCE). The term "phage-assisted continuous evolution (PACE)," as used herein, refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described, for example, in international PCT application PCT/US2009/056194, filed on 8/9/2009, published as WO2010/028347 on 11/3/2010; international PCT application PCT/US2011/066747, filed on 12/22/2011, published as WO2012/088381 on 6/28/2012; U.S. patent No. 9,023,594 issued 5/2015; international PCT application PCT/US 2015/01022, filed on 1/20/2015, published as WO2015/134121 on 11/9/2015; and international PCT application PCT/US2016/027795, filed on 4/15/2016, published as WO2016/168831 on 10/20/2016, each of which is incorporated herein by reference in its entirety. Variant Cas9 proteins may also be obtained by phage-assisted non-continuous evolution (PANCE), which as used herein refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution that uses serial flask transfers of evolving "selection phage" (SP), which contain the gene of interest to be evolved, across fresh E. coli host cells, thereby allowing the genes inside the host E. coli to be held constant while the genes contained in the SP continuously evolve. Serial flask transfers have long served as a widely used approach for laboratory evolution of microbes, and analogous approaches have more recently been developed for bacteriophage evolution. The PANCE system features lower stringency than the PACE system.
Any of the references noted above that relate to Cas9 or Cas9 equivalents are hereby incorporated by reference in their entireties, if not already so stated.
J. Split napDNAbp domains for split PE delivery
In various embodiments, the prime editors described herein may be delivered to cells as two or more fragments that become assembled inside the cell (either by passive assembly or by active assembly, such as by using split intein sequences) into a reconstituted prime editor. In some cases, the self-assembly may be passive, whereby the two or more prime editor fragments associate inside the cell, covalently or non-covalently, to reconstitute the prime editor. In other cases, the self-assembly may be catalyzed by dimerization domains installed on each of the fragments. Examples of dimerization domains are described herein. In still other cases, the self-assembly may be catalyzed by split intein sequences installed on each of the prime editor fragments.
Split PE delivery may be advantageous for addressing the various size constraints of different delivery approaches. For example, delivery approaches may include virus-based delivery methods, messenger RNA-based delivery methods, or RNP-based delivery (ribonucleoprotein-based delivery). Each of these delivery methods may be more efficient and/or effective when the prime editor is divided into smaller pieces. Once inside the cell, the smaller pieces can assemble into a functional prime editor. Depending on the means of splitting, the divided prime editor fragments can be reassembled in a non-covalent or a covalent manner to reform the prime editor. In one embodiment, the prime editor can be split at one or more split sites into two or more fragments. The fragments can be unmodified (other than being split). Once the fragments are delivered to the cell (e.g., by direct delivery of a ribonucleoprotein complex or by nucleic acid delivery, such as mRNA delivery or virus vector-based delivery), the fragments can reassociate covalently or non-covalently to reconstitute the prime editor. In another embodiment, the prime editor can be split at one or more split sites into two or more fragments. Each of the fragments can be modified to comprise a dimerization domain, whereby each fragment that is formed is coupled to a dimerization domain. Once delivered or expressed within a cell, the dimerization domains of the different fragments associate and bind to one another, bringing the different prime editor fragments together to reform a functional prime editor. In yet another embodiment, the prime editor fragments may be modified to comprise a split intein. Once delivered or expressed within a cell, the split intein domains of the different fragments associate and bind to one another and then undergo trans-splicing, which results in the excision of the split intein domains from each of the fragments and the concomitant formation of a peptide bond between the fragments, thereby restoring the prime editor.
In one embodiment, the prime editor is delivered using a split-intein approach.
The split site may be located between any one or more pairs of residues in the prime editor and in any domain therein, including within the napDNAbp domain, within the polymerase domain (e.g., RT domain), or within the linker domain that joins the napDNAbp domain and the polymerase domain.
In one embodiment, as shown in FIG. 66, the prime editor (PE) is split at a split site within the napDNAbp.
In certain embodiments, the napDNAbp is the canonical SpCas9 polypeptide of SEQ ID NO: 37. In certain embodiments, the SpCas9 is split into two fragments at a split site located between any pair of residues anywhere between residues 1 and 2, or 2 and 3, or 3 and 4, or 4 and 5, or 5 and 6, or 6 and 7, or 7 and 8, or 8 and 9, or 9 and 10 of the canonical SpCas9 of SEQ ID NO: 37, or between any pair of residues located between residues 1 to 10, 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, 90 to 100, 100 to 200, 200 to 300, 300 to 400, 400 to 500, 500 to 600, 600 to 700, 700 to 800, 800 to 900, 1000 to 1100, 1100 to 1200, 1200 to 1300, or 1300 to 1368.
In certain embodiments, the napDNAbp is split into two fragments at a split site located at a pair of residues that corresponds to any pair of residues located anywhere between positions 1 to 10, 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, 90 to 100, 100 to 200, 200 to 300, 300 to 400, 400 to 500, 500 to 600, 600 to 700, 700 to 800, 800 to 900, 1000 to 1100, 1100 to 1200, 1200 to 1300, or 1300 to 1368 of the canonical SpCas9 of SEQ ID NO: 37.
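To make the residue-pair language above concrete, the sketch below treats a split site as the peptide bond between 1-based residues k and k+1 and shows how a sequence divides into two fragments there, and which sites fall inside a given residue range. The function names and the toy sequence are hypothetical; the sequence is a placeholder and not SEQ ID NO: 37.

```python
# Minimal sketch: a split site "between residues k and k+1" (1-based numbering).


def split_at(sequence: str, k: int) -> tuple[str, str]:
    """Split `sequence` between residue k and residue k+1 (1-based)."""
    if not 1 <= k < len(sequence):
        raise ValueError("split site must lie between two existing residues")
    # Residues 1..k form the N-terminal fragment; k+1..end the C-terminal one.
    return sequence[:k], sequence[k:]


def split_sites_in_range(start: int, end: int) -> list[int]:
    """List sites k such that both residue k and residue k+1 lie in [start, end]."""
    return list(range(start, end))


if __name__ == "__main__":
    toy = "ACDEFGHIKL"  # placeholder 10-residue sequence
    n_fragment, c_fragment = split_at(toy, 4)
    print(n_fragment, c_fragment)
    print(split_sites_in_range(1, 10))
```

Joining the two fragments in order always regenerates the input, which is the property the reassembly strategies described in this section rely on.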
In certain embodiments, the SpCas9 is split into two fragments at a split site located between any pair of residues anywhere between residues 1 and 2, or 2 and 3, or 3 and 4, or 4 and 5, or 5 and 6, or 6 and 7 of the canonical SpCas9 of SEQ ID NO: 37, or between any pair of residues located between residues 1 to 10, 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, 90 to 100, 100 to 200, 200 to 300, 300 to 400, 400 to 500, 500 to 600, 600 to 700, 700 to 800, 800 to 900, 1000 to 1100, 1100 to 1200, 1200 to 1300, or 1300 to 1368. In certain embodiments, the split site is located at one or more polypeptide bond sites (i.e., a "split site" or "split-intein split site"), each resulting half is fused to a split intein, and the halves are delivered to cells as separately encoded fusion proteins. Once the split-intein fusion proteins (i.e., protein halves) are expressed within a cell, the proteins undergo trans-splicing to form the complete or whole PE, with concomitant removal of the joined split-intein sequences.
For example, as shown in FIG. 66, the N-terminal extein can be fused to a first split intein (e.g., an N-intein) and the C-terminal extein can be fused to a second split intein (e.g., a C-intein). Upon self-association of the N-intein and the C-intein inside the cell, followed by their self-excision and the concomitant formation of a peptide bond between the N-terminal and C-terminal extein portions, the N-terminal extein becomes fused to the C-terminal extein to reform the whole prime editor (PE) comprising the napDNAbp domain and the polymerase domain (e.g., RT domain).
To take advantage of a split-PE delivery strategy using split inteins, the prime editor needs to be divided at one or more split sites to create at least two separate halves of the prime editor, each of which can be rejoined inside the cell if it is fused to a split-intein sequence.
In certain embodiments, the prime editor is split at a single split site. In certain other embodiments, the prime editor is split at two split sites, or three split sites, or four split sites, or more.
In a preferred embodiment, the prime editor is split at a single split site to create two separate halves of the prime editor, each of which can be fused to a split-intein sequence.
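The single-split-site strategy can be pictured as simple string bookkeeping: each half carries an intein tag, and trans-splicing removes both tags while joining the exteins. The sketch below is only a conceptual model under that assumption. The `N_INTEIN` and `C_INTEIN` strings are hypothetical lowercase placeholders, not real intein sequences, and no chemistry or sequence-context requirement is modeled.

```python
# Minimal conceptual sketch of split-intein reconstitution.
# The intein "sequences" below are hypothetical placeholders, written in
# lowercase so they are easy to tell apart from the extein residues.
N_INTEIN = "nnnn"
C_INTEIN = "cccc"


def make_split_halves(protein: str, k: int) -> tuple[str, str]:
    """Split `protein` between residues k and k+1 and tag each half.

    Returns (N-extein + N-intein, C-intein + C-extein).
    """
    if not 1 <= k < len(protein):
        raise ValueError("split site must lie between two existing residues")
    n_extein, c_extein = protein[:k], protein[k:]
    return n_extein + N_INTEIN, C_INTEIN + c_extein


def trans_splice(n_half: str, c_half: str) -> str:
    """Excise both intein tags and join the exteins with a peptide bond."""
    if not n_half.endswith(N_INTEIN):
        raise ValueError("N-terminal half lacks the N-intein tag")
    if not c_half.startswith(C_INTEIN):
        raise ValueError("C-terminal half lacks the C-intein tag")
    return n_half[:-len(N_INTEIN)] + c_half[len(C_INTEIN):]


if __name__ == "__main__":
    toy_editor = "ACDEFGHIKL"  # placeholder, not a real prime editor sequence
    n_half, c_half = make_split_halves(toy_editor, 4)
    print(n_half, c_half)
    print(trans_splice(n_half, c_half) == toy_editor)
```

In this model each half is shorter than the full-length protein plus tags, which mirrors the size argument made above for splitting the prime editor before delivery.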
An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely DnaE-N and DnaE-C. The two subunits are encoded by separate genes, namely dnaE-N and dnaE-C, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing the trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.
Additional naturally occurring or engineered split-intein sequences are known or can be made from the whole-intein sequences described herein or those available in the art. Examples of split-intein sequences can be found in Stevens et al., "A promiscuous split intein with expanded protein engineering applications," PNAS, 2017, Vol. 114: 8538-8543; and Iwai et al., "Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme," FEBS Lett, 580: 1853-1858, each of which is incorporated herein by reference. Additional split-intein sequences can be found, for example, in WO2013/045632, WO2014/055782, WO2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference.
In addition, protein trans-splicing has been described in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998); Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b); Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998); Evans, et al., J. Biol. Chem. 275:9091 (2000); Otomo, et al., Biochemistry 38:16040 (1999); Otomo, et al., J. Biomol. NMR 14:105 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638 (1999)) and provides the opportunity to express a protein as two inactive fragments that subsequently undergo ligation to form a functional product, e.g., as shown in FIGS. 66 and 67 with regard to the formation of a complete PE fusion protein from two separately expressed halves.
In various embodiments described herein, the continuous evolution methods (e.g., PACE) may be used to evolve a first portion of a base editor. The first portion could comprise a single component or domain, e.g., a Cas9 domain, a deaminase domain, or a UGI domain. The separately evolved component or domain can then be fused to the remaining portions of the base editor within a cell by separately expressing both the evolved portion and the remaining non-evolved portions with split-intein polypeptide domains. The first portion could more broadly include any first amino acid portion of a base editor that is desired to be evolved using the continuous evolution methods described herein. The second portion would, in this embodiment, refer to the remaining amino acid portion of the base editor that is not evolved using the methods herein. The evolved first portion and the second portion of the base editor could each be expressed with split-intein polypeptide domains in a cell. The natural protein splicing mechanisms of the cell would then reassemble the evolved first portion and the non-evolved second portion to form a single fusion protein, the evolved base editor. The evolved first portion may comprise either the N-terminal or the C-terminal part of the single fusion protein. Analogously, use of a second orthogonal trans-splicing intein pair could allow the evolved first portion to comprise an internal part of the single fusion protein.
Thus, any of the evolved and non-evolved components of the base editors described herein may be expressed with split-intein tags in order to facilitate the formation of a complete base editor comprising the evolved and non-evolved components within a cell.
The mechanism of the protein splicing process has been studied in great detail (Chong, et al., J. Biol. Chem. 1996, 271, 22159-22168; Xu, M-Q & Perler, F. B. EMBO Journal, 1996, 15, 5146-5153), and conserved amino acids have been found at the intein and extein splicing points (Xu, et al., EMBO Journal, 1994, 13 5517-522). The constructs described herein contain an intein sequence fused to the 5' terminus of the first gene (e.g., the evolved portion of the base editor). Suitable intein sequences can be selected from any of the proteins known to contain protein splicing elements. A database containing all known inteins can be found on the World Wide Web (Perler, F. B. Nucleic Acids Research, 1999, 27, 346-347). At its 3' end, the intein sequence is fused to the 5' end of the second gene. To target this gene to a certain organelle, a peptide signal can be fused to the coding sequence of the gene. After the second gene, the intein-gene sequence can be repeated as often as desired to express multiple proteins in the same cell. For constructs containing multiple inteins, it may be useful to use intein elements from different sources. A transcription termination sequence must be inserted after the sequence of the last gene to be expressed. In one embodiment, the intein splicing unit is designed so that it can both catalyze excision of the exteins from the intein and prevent ligation of the exteins. Mutagenesis of the C-terminal extein junction in the Pyrococcus species GB-D DNA polymerase was found to produce an altered splicing element that induces cleavage of the exteins and inteins but prevents subsequent ligation of the exteins (Xu, M-Q & Perler, F. B. EMBO Journal, 1996, 15, 5146-5153). Mutation of serine 538 to either an alanine or a glycine induced cleavage but prevented ligation. Because the amino acids at the C-terminal extein junction to the intein are conserved, mutation of equivalent residues in other intein splicing units should also prevent extein ligation.
A preferred intein that does not contain an endonuclease domain is the Mycobacterium xenopi GyrA protein (Telenti, et al., J. Bacteriol. 1997, 179, 6378-6382). Others have been found in nature or have been created artificially by removing the endonuclease domains from endonuclease-containing inteins (Chong, et al., J. Biol. Chem. 1997, 272, 15587-15590). In a preferred embodiment, the intein is selected so that it consists of the minimal number of amino acids needed to perform the splicing function, such as the intein from the Mycobacterium xenopi GyrA protein (Telenti, A., et al., J. Bacteriol. 1997, 179, 6378-6382). In an alternative embodiment, an intein without endonuclease activity is selected, such as the intein from the Mycobacterium xenopi GyrA protein or the Saccharomyces cerevisiae VMA intein that has been modified to remove the endonuclease domain (Chong, 1997). Further modification of the intein splicing unit may allow the reaction rate of the cleavage reaction to be altered, so that protein dosage can be controlled simply by modifying the gene sequence of the splicing unit.
Inteins can also exist as two fragments encoded by two separately transcribed and translated genes. These so-called split inteins self-associate and catalyze protein-splicing activity in trans. Split inteins have been identified in diverse cyanobacteria and archaea (Caspi et al., Mol Microbiol. 50:1569-1577 (2003); Choi J. et al., J Mol Biol. 556:1093-1106 (2006); Dassa B. et al., Biochemistry. 46:322-330 (2007); Liu X. and Yang J., J Biol Chem. 275:26315-26318 (2003); Wu H. et al., Proc Natl Acad Sci USA. 5:9226-9231 (1998); and Zettler J. et al., FEBS Letters. 553:909-914 (2009)), but have not been found in eukaryotes thus far. Recently, a bioinformatic analysis of environmental metagenomic data revealed 26 different loci with a novel genomic arrangement. At each locus, a conserved enzyme coding region is interrupted by a split intein, with a freestanding endonuclease gene inserted between the sections coding for the intein subdomains. Among them, five loci were completely assembled: DNA helicases (gp41-1, gp41-8); inosine-5'-monophosphate dehydrogenase (IMPDH-1); and ribonucleotide reductase catalytic subunits (NrdA-2 and NrdJ-1). This fractured gene organization appears to be present mainly in phages (Dassa et al., Nucleic Acids Research. 57:2560-2573 (2009)).
The split intein Npu DnaE has been characterized as having the highest rate reported for the protein trans-splicing reaction. In addition, the Npu DnaE protein splicing reaction is considered robust and high-yielding with respect to different extein sequences, temperatures from 6°C to 37°C, and the presence of up to 6 M urea (Zettler J. et al., FEBS Letters 553:909-914 (2009); Iwai I. et al., FEBS Letters 550:1853-1858 (2006)). As expected, when the Cys1 Ala mutation was introduced into the N-domain of these inteins, the initial N to S-acyl shift, and therefore protein splicing, was blocked. Unfortunately, the C-terminal cleavage reaction was also almost completely inhibited. The dependence of the asparagine cyclization at the C-terminal splice junction on the acyl shift at the N-terminal scissile peptide bond seems to be a unique property common to the naturally split DnaE intein alleles (Zettler J. et al., FEBS Letters 555:909-914 (2009)).
The mechanism of protein splicing typically has four steps [29-30]: 1) an N-S or N-O acyl shift at the intein N-terminus, which breaks the upstream peptide bond and forms an ester bond between the N-extein and the side chain of the intein's first amino acid (Cys or Ser); 2) a transesterification that relocates the N-extein to the intein C-terminus, forming a new ester bond linking the N-extein to the side chain of the C-extein's first amino acid (Cys, Ser, or Thr); 3) Asn cyclization, which breaks the peptide bond between the intein and the C-extein; and 4) an S-N or O-N acyl shift that replaces the ester bond with a peptide bond between the N-extein and the C-extein.
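The four steps above imply simple residue requirements at the splice junctions: the first residue of the intein is Cys or Ser, the last residue of the intein is Asn, and the first residue of the C-extein is Cys, Ser, or Thr. The following is a minimal sketch that checks only these three conditions; it is an illustrative screen, not a predictor of splicing activity, and the function name and toy sequences are hypothetical.

```python
# Minimal sketch: junction residues implied by the four-step splicing mechanism.
INTEIN_FIRST = {"C", "S"}         # step 1: Cys or Ser side chain accepts the N-extein
INTEIN_LAST = {"N"}               # step 3: Asn cyclization releases the C-extein
C_EXTEIN_FIRST = {"C", "S", "T"}  # step 2: Cys, Ser, or Thr accepts the N-extein


def junction_residues_ok(intein: str, c_extein: str) -> bool:
    """Return True if the junction residues fit the canonical mechanism."""
    if not intein or not c_extein:
        return False
    return (
        intein[0] in INTEIN_FIRST
        and intein[-1] in INTEIN_LAST
        and c_extein[0] in C_EXTEIN_FIRST
    )


if __name__ == "__main__":
    # Toy placeholder sequences, not real intein or extein sequences.
    print(junction_residues_ok("CLSN", "SGK"))
    print(junction_residues_ok("ALSN", "SGK"))
```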
Protein trans-splicing, catalyzed by split inteins, provides an entirely enzymatic method for protein ligation [31]. A split intein is essentially a contiguous intein (e.g., a mini-intein) split into two pieces, named the N-intein and the C-intein, respectively. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction in essentially the same way as a contiguous intein does. Split inteins have been found in nature and have also been engineered in laboratories [31-35]. As used herein, the term "split intein" refers to any intein in which one or more peptide bond breaks exist between the N-terminal and C-terminal amino acid sequences, such that the N-terminal and C-terminal sequences become separate molecules that can non-covalently reassociate, or reconstitute, into an intein that is functional for trans-splicing reactions. Any catalytically active intein, or fragment thereof, may be used to derive a split intein for use in the methods of the invention. For example, in one aspect the split intein may be derived from a eukaryotic intein. In another aspect, the split intein may be derived from a bacterial intein. In another aspect, the split intein may be derived from an archaeal intein. Preferably, the split intein so derived will possess only the amino acid sequences essential for catalyzing trans-splicing reactions.
As used herein, "N-terminal cleavage intein (In)" refers to any intein sequence comprising an N-terminal amino acid sequence that is responsible for trans-splicing reactions. Thus, in also contains sequences that are cut out when trans-splicing occurs. In may comprise a modified sequence of the N-terminal portion of a naturally occurring intein sequence. For example, in may comprise additional amino acid residues and/or mutated residues, provided that the inclusion of such additional and/or mutated residues does not render In nonfunctional In trans-splicing. Preferably, the inclusion of additional and/or mutated residues increases or enhances the trans-splicing activity of In.
As used herein, the "C-terminal split intein (Ic)" refers to any intein sequence comprising a C-terminal amino acid sequence that is functional for trans-splicing reactions. In one aspect, the Ic comprises 4 to 7 contiguous amino acid residues, at least 4 of which are from the last β-strand of the intein from which it was derived. An Ic thus also comprises a sequence that is spliced out when trans-splicing occurs. An Ic may comprise a sequence that is a modification of the C-terminal portion of a naturally occurring intein sequence. For example, an Ic may comprise additional amino acid residues and/or mutated residues, provided that the inclusion of such additional and/or mutated residues does not render the In nonfunctional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the Ic.
In some embodiments of the invention, a peptide linked to an Ic or an In may comprise an additional chemical moiety, including fluorophores, biotin, polyethylene glycol (PEG), amino acid analogs, unnatural amino acids, phosphate groups, glycosyl groups, radioisotope labels, and drug molecules. In other embodiments, a peptide linked to an Ic may comprise one or more chemically reactive groups, including ketones, aldehydes, Cys residues, and Lys residues. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction when an "intein-splicing polypeptide (ISP)" is present. As used herein, "intein-splicing polypeptide (ISP)" refers to the portion of the amino acid sequence of a split intein that remains when the Ic, the In, or both are removed from the split intein. In certain embodiments, the In comprises the ISP. In another embodiment, the Ic comprises the ISP. In yet another embodiment, the ISP is a separate peptide that is covalently linked to neither the In nor the Ic.
Split inteins can be created from contiguous inteins by engineering one or more split sites in the unstructured loops or intervening amino acid sequences between the approximately 12 conserved β-strands found in the structure of mini-inteins [25-28]. Some flexibility in the position of the split site within the regions between the β-strands may exist, provided that creating the split does not disrupt the structure of the intein, the structured β-strands in particular, to a degree sufficient to cause loss of protein splicing activity.
In protein trans-splicing, one precursor protein consists of an N-extein part followed by the N-intein, and the other precursor protein consists of the C-intein followed by a C-extein part. The trans-splicing reaction (catalyzed by the N- and C-inteins together) excises the two intein sequences and links the two extein sequences with a peptide bond. Protein trans-splicing is an enzymatic reaction that can work with very low (e.g., micromolar) concentrations of protein and can be carried out under physiological conditions.
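The outcome of protein trans-splicing can be illustrated with a minimal sketch that treats each precursor as a flanking sequence to be retained (an extein) plus a split-intein half to be excised. The class name, function name, and one-letter strings below are hypothetical placeholders chosen for illustration; they are not sequences or identifiers from this disclosure, and the sketch models only the bookkeeping of the reaction, not its chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    """One half of a split-intein pair (hypothetical illustration only)."""
    extein: str  # flanking sequence that is retained in the product
    intein: str  # split-intein half that is excised


def trans_splice(n_precursor: Precursor, c_precursor: Precursor) -> dict:
    """Return the ligated product and the excised intein halves.

    n_precursor is laid out as N-extein followed by N-intein;
    c_precursor is laid out as C-intein followed by C-extein.
    """
    return {
        # The two retained flanking sequences are joined by a peptide bond.
        "spliced_product": n_precursor.extein + c_precursor.extein,
        # Both intein halves are removed from the product.
        "excised": (n_precursor.intein, c_precursor.intein),
    }


# Placeholder one-letter strings, not real protein sequences.
n_half = Precursor(extein="MKTAY", intein="CLSYD")
c_half = Precursor(extein="SGGLE", intein="VHNCF")
result = trans_splice(n_half, c_half)
print(result["spliced_product"])
```

In this toy model the product contains only the two flanking sequences, which mirrors how a large fusion protein might be reassembled from two separately expressed halves.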
K. Other programmable nucleases
In various embodiments described herein, the guided editor comprises a napDNAbp, such as a Cas9 protein. These proteins are "programmable" by becoming complexed with a guide RNA (or a pegRNA, as the case may be), which guides the Cas9 protein to a target site on the DNA that has a sequence complementary to the spacer portion of the gRNA (or pegRNA) and that also has the required PAM sequence. However, in certain embodiments contemplated herein, the napDNAbp may be substituted with a different type of programmable protein, such as a zinc finger nuclease or a transcription activator-like effector nuclease (TALEN).
Fig. 1J depicts this variation of guided editing contemplated herein, in which the napDNAbp (e.g., SpCas9 nickase) is replaced with any programmable nuclease domain, such as a zinc finger nuclease (ZFN) or a transcription activator-like effector nuclease (TALEN). As such, it is contemplated that suitable nucleases do not necessarily need to be "programmed" by a nucleic acid targeting molecule (such as a guide RNA), but may instead be programmed by defining the specificity of a DNA-binding domain, such as, in particular, that of a nuclease. As with guided editing using a napDNAbp moiety, such alternative programmable nucleases are preferably modified so that only one strand of the target DNA is cut. In other words, the programmable nuclease should preferably function as a nickase. Once a programmable nuclease (e.g., a ZFN or a TALEN) is selected, additional functionality can be engineered into the system to allow it to operate by a guided-editing-like mechanism. For example, the programmable nuclease can be modified by coupling to it (e.g., via a chemical linker) an RNA or DNA extension arm, wherein the extension arm comprises a primer binding site (PBS) and a DNA synthesis template. The programmable nuclease may also be coupled (e.g., via a chemical or amino acid linker) to a polymerase, the nature of which depends on whether the extension arm is DNA or RNA. In the case of an RNA extension arm, the polymerase may be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of a DNA extension arm, the polymerase may be a DNA-dependent DNA polymerase (e.g., a prokaryotic polymerase, including Pol I, Pol II, or Pol III, or a eukaryotic polymerase, including Pol α, Pol β, Pol γ, Pol δ, Pol ε, or Pol ζ).
The system may also include other functionalities, added as fusions to the programmable nuclease or provided in trans, to facilitate the reaction as a whole (e.g., (a) a helicase to unwind the DNA at the cut site so that the cut strand with the 3' end is available as a primer, (b) FEN1 to help remove the endogenous strand on the cut strand and drive the reaction towards replacement of the endogenous strand with the synthesized strand, or (c) an nCas9:gRNA complex to create a second-site nick on the opposite strand, which may help drive integration of the synthesized repair through favored cellular repair of the non-edited strand). In a manner analogous to guided editing with a napDNAbp, such a complex with an otherwise programmable nuclease can be used to synthesize, and then permanently install, a newly synthesized replacement strand of DNA carrying an edit of interest at a target site of DNA.
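The dependence of the polymerase on the chemistry of the extension arm, as described above, can be written as a small lookup. This is only an illustrative sketch: the enum, function names, and dictionary keys are hypothetical and do not correspond to any software or nomenclature in this disclosure.

```python
from enum import Enum


class ArmType(Enum):
    """Chemistry of the extension arm coupled to the programmable nuclease."""
    RNA = "RNA"
    DNA = "DNA"


def polymerase_class_for(arm: ArmType) -> str:
    """Pick the polymerase class that can copy the given extension arm."""
    if arm is ArmType.RNA:
        return "RNA-dependent DNA polymerase (e.g., reverse transcriptase)"
    return "DNA-dependent DNA polymerase"


def describe_system(nuclease: str, arm: ArmType) -> dict:
    """Summarize a hypothetical nickase-based editing configuration."""
    return {
        "programmable_nickase": nuclease,
        "extension_arm": arm.value,
        # Both arm types carry a primer binding site and a synthesis template.
        "arm_components": ("primer binding site", "DNA synthesis template"),
        "polymerase": polymerase_class_for(arm),
    }


config = describe_system("TALEN nickase", ArmType.DNA)
print(config["polymerase"])
```

The same lookup applies whether the nickase is a ZFN, a TALEN, or a napDNAbp; only the arm chemistry determines the polymerase class.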
Suitable alternative programmable nucleases are well known in the art and can be used in place of the napDNAbp:gRNA complex to construct alternative guided editor systems that can be programmed to selectively bind a target site of DNA, and that can be further modified in the manner described above to co-localize a polymerase and an RNA or DNA extension arm, comprising a primer binding site and a DNA synthesis template, to a specific nick site. For example, as shown in Fig. 1J, a transcription activator-like effector nuclease (TALEN) can be used as the programmable nuclease in the guided editing methods and compositions described herein. TALENs are artificial restriction enzymes produced by fusing a TAL effector DNA-binding domain to a DNA cleavage domain. These reagents enable efficient, programmable, and specific DNA cleavage and are powerful tools for genome editing in situ. Transcription activator-like effectors (TALEs) can be quickly engineered to bind practically any DNA sequence. As used herein, the term TALEN is broad and includes a monomeric TALEN that can cleave double-stranded DNA without assistance from another TALEN. The term TALEN is also used to refer to one or both members of a pair of TALENs that are engineered to work together to cleave DNA at the same site. TALENs that work together may be referred to as a left TALEN and a right TALEN, which references the handedness of DNA. See U.S. Ser. No. 12/965,590; U.S. Ser. No. 13/426,991 (U.S. Pat. No. 8,450,471); U.S. Ser. No. 13/427,040 (U.S. Pat. No. 8,440,431); U.S. Ser. No. 13/427,137 (U.S. Pat. No. 8,440,432); and U.S. Ser. No. 13/738,381, each of which is incorporated herein by reference in its entirety.
Furthermore, TALENs are described in WO 2015/027134; US 9,181,535; Boch et al., "Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors," Science, vol. 326, pp. 1509-1512 (2009); Bogdanove et al., "TAL Effectors: Customizable Proteins for DNA Targeting," Science, vol. 333, pp. 1843-1846 (2011); Cade et al., "Highly efficient generation of heritable zebrafish gene mutations using homo- and heterodimeric TALENs," Nucleic Acids Research, vol. 40, pp. 8001-8010 (2012); and Cermak et al., "Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting," Nucleic Acids Research, vol. 39, no. 17, e82 (2011), each of which is incorporated herein by reference.
As shown in Fig. 1J, zinc finger nucleases can also be used as alternative programmable nucleases for guided editing in place of the napDNAbp (e.g., Cas9 nickase). As with TALENs, the ZFN protein may be modified to function as a nickase, i.e., the ZFN is engineered to cleave only one strand of the target DNA, in a manner similar to the napDNAbp used with the guided editors described herein. ZFN proteins have been extensively described in the art, for example, in Carroll et al., "Genome Engineering with Zinc-Finger Nucleases," Genetics, Aug 2011, vol. 188:773-782; Durai et al., "Zinc finger nucleases: custom-designed molecular scissors for genome engineering of plant and mammalian cells," Nucleic Acids Res, 2005, vol. 33:5978-90; and Gaj et al., "ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering," Trends Biotechnol. 2013, vol. 31:397-405, each of which is incorporated herein by reference in its entirety.
[3] Polymerase (e.g., reverse transcriptase)
In various embodiments, the guided editor (PE) systems disclosed herein include a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase), or a variant thereof, which can be provided as a fusion protein with a napDNAbp or other programmable nuclease, or in trans.
Any polymerase can be used in the guided editors disclosed herein. The polymerase may be a wild-type polymerase, a functional fragment, a mutant, a variant, a truncated variant, or the like. The polymerase may include a wild-type polymerase from a eukaryotic, prokaryotic, archaeal, or viral organism, and/or the polymerase may be modified by genetic engineering, mutagenesis, or directed evolution-based processes. The polymerase may include T7 DNA polymerase, T5 DNA polymerase, T4 DNA polymerase, Klenow fragment DNA polymerase, DNA polymerase III, and the like. The polymerase may also be thermostable and may include Taq, Tne, Tma, Pfu, Tfl, Tth, Stoffel fragment, VENT® and DEEPVENT® DNA polymerases, KOD, Tgo, JDF3, and mutants, variants, and derivatives thereof (see U.S. Pat. No. 5,436,149; U.S. Pat. No. 4,889,818; U.S. Pat. No. 4,965,185; U.S. Pat. No. 5,079,352; U.S. Pat. No. 5,614,365; U.S. Pat. No. 5,374,553; U.S. Pat. No. 5,270,179; U.S. Pat. No. 5,047,342; U.S. Pat. No. 5,512,462; WO 92/06188; WO 92/06200; WO 96/10640; Barnes, W.M., Gene 112:29-35 (1992); Lawyer, F.C., et al., PCR Meth. Appl. 2:275-287 (1993); Flaman, J.-M., et al., Nucl. Acids Res. 22(15):3259-3260 (1994), each of which is incorporated by reference). To synthesize longer nucleic acid molecules (e.g., nucleic acid molecules longer than about 3-5 kb), at least two DNA polymerases can be used. In certain embodiments, one of the polymerases may be substantially lacking 3' exonuclease activity, while the other may have 3' exonuclease activity. Such pairings may include polymerases that are the same or different. Examples of DNA polymerases substantially lacking 3' exonuclease activity include, but are not limited to, Taq, Tne (exo-), Tma (exo-), Pfu (exo-), Pwo (exo-), exo-KOD, and Tth DNA polymerases, and mutants, variants, and derivatives thereof.
Preferably, the polymerase used in the guided editors disclosed herein is a "template-dependent" polymerase, since the polymerase is intended to rely on the DNA synthesis template to specify the sequence of the DNA strand synthesized during guided editing. As used herein, the term "template DNA molecule" refers to the strand of a nucleic acid from which a complementary nucleic acid strand is synthesized by a DNA polymerase, for example, in a primer extension reaction on the DNA synthesis template of a pegRNA.
As used herein, the term "template-dependent manner" is intended to refer to a process that involves the template-dependent extension of a primer molecule (e.g., DNA synthesis by a DNA polymerase). The term "template-dependent manner" refers to polynucleotide synthesis of RNA or DNA in which the sequence of the newly synthesized polynucleotide strand is dictated by the well-known rules of complementary base pairing (see, e.g., Watson, J.D. et al., in: Molecular Biology of the Gene, 4th Ed., W.A. Benjamin, Inc., Menlo Park, Calif. (1987)). The term "complementary" refers to the broad concept of sequence complementarity between regions of two polynucleotide strands, or between two nucleotides, through base pairing. It is well known that an adenine nucleotide is capable of forming specific hydrogen bonds ("base pairing") with a thymine or uracil nucleotide. Similarly, it is known that a cytosine nucleotide is capable of base pairing with a guanine nucleotide. Thus, in the case of guided editing, the single strand of DNA synthesized by the polymerase of the guided editor against the DNA synthesis template can be said to be "complementary" to the sequence of the DNA synthesis template.
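The base pairing rules just described can be illustrated with a minimal sketch of template-dependent DNA synthesis. The function name and the short example templates are hypothetical and chosen only for illustration; the sketch assumes an error-free polymerase and a template written 5' to 3'.

```python
# Watson-Crick pairing used when a DNA strand is made against a template.
RNA_TEMPLATE_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TEMPLATE_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_dna(template_5to3: str, template_is_rna: bool = True) -> str:
    """Return the complementary DNA strand, written 5' to 3'.

    The polymerase reads the template from its 3' end toward its 5' end,
    so the new strand is the reverse complement of the template.
    """
    pairing = RNA_TEMPLATE_TO_DNA if template_is_rna else DNA_TEMPLATE_TO_DNA
    return "".join(pairing[base] for base in reversed(template_5to3.upper()))


# Hypothetical six-nucleotide templates, not sequences from this disclosure.
from_rna = synthesize_dna("AUGGCU", template_is_rna=True)
from_dna = synthesize_dna("ATGGCT", template_is_rna=False)
print(from_rna, from_dna)
```

An RNA template and the equivalent DNA template specify the same DNA product, which is why either an RNA-dependent or a DNA-dependent DNA polymerase can be used depending on the chemistry of the template.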
A. Exemplary polymerases
In various embodiments, the guided editors described herein comprise a polymerase. The present disclosure encompasses any wild-type polymerase obtained from any naturally occurring organism or virus, or obtained from a commercial or non-commercial source. In addition, polymerases useful in the guided editors of the present disclosure can include any naturally occurring mutant polymerase, engineered mutant polymerase, or other variant polymerase, including truncated variants that retain function. The polymerases useful herein can also be engineered to contain specific amino acid substitutions, such as those specifically disclosed herein. In certain preferred embodiments, the polymerases useful in the guided editors of the present disclosure are template-based polymerases, i.e., they synthesize nucleotide sequences in a template-dependent manner.
A polymerase is an enzyme that synthesizes a nucleotide strand and can be used in connection with the guided editor systems described herein. The polymerase is preferably a "template-dependent" polymerase (i.e., a polymerase that synthesizes a nucleotide strand based on the order of the nucleotide bases of a template strand). In certain configurations, the polymerase may also be "template-independent" (i.e., a polymerase that synthesizes a nucleotide strand without requiring a template strand). A polymerase may be further classified as a "DNA polymerase" or an "RNA polymerase". In various embodiments, the guided editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase may be a "DNA-dependent DNA polymerase" (i.e., the template molecule is a strand of DNA). In this case, the DNA template molecule may be a pegRNA in which the extension arm comprises a strand of DNA. Such a pegRNA may be referred to as a chimeric or hybrid pegRNA, comprising an RNA portion (i.e., the guide RNA components, including the spacer and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase may be an "RNA-dependent DNA polymerase" (i.e., the template molecule is a strand of RNA). In this case, the pegRNA is RNA, i.e., it includes an RNA extension arm. The term "polymerase" may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., polymerase activity). Generally, the enzyme will initiate synthesis at the 3' end of a primer annealed to a polynucleotide template sequence (e.g., a primer sequence annealed to the primer binding site of a pegRNA) and will proceed toward the 5' end of the template strand. A "DNA polymerase" catalyzes the polymerization of deoxynucleotides. As used herein in reference to a DNA polymerase, the term DNA polymerase includes a "functional fragment thereof".
A "functional fragment thereof" refers to any portion of a wild-type or mutant DNA polymerase that encompasses less than the entire amino acid sequence of the polymerase and that retains the ability, under at least one set of conditions, to catalyze the polymerization of nucleotides. Such a functional fragment may exist as a separate entity, or it may be a constituent of a larger polypeptide, such as a fusion protein.
In some embodiments, the polymerase may be from a bacteriophage. Bacteriophage DNA polymerases generally lack 5' to 3' exonuclease activity, because this activity is encoded by a separate polypeptide. Examples of suitable DNA polymerases are T4, T7, and phi29 DNA polymerases. Commercially available enzymes include T4 (available from many sources, e.g., Epicentre) and T7 (available from many sources, e.g., Epicentre for the unmodified enzyme and USB for the 3' to 5' exo- T7 "Sequenase" DNA polymerase).
In other embodiments, the polymerase is an archaeal polymerase. Two different classes of DNA polymerases have been identified in archaea: (1) the family B/pol I type (homologs of Pfu from Pyrococcus furiosus) and (2) the pol II type (homologs of the P. furiosus DP1/DP2 two-subunit polymerase). DNA polymerases from both classes have been shown to naturally lack an associated 5' to 3' exonuclease activity and to possess 3' to 5' exonuclease (proofreading) activity. Suitable DNA polymerases (pol I or pol II) can be derived from archaea with optimal growth temperatures that are similar to the desired assay temperatures.
Thermostable archaeal DNA polymerases have been isolated from Pyrococcus species (furiosus, species GB-D, woesii, abyssi, horikoshii), Thermococcus species (kodakaraensis KOD1, litoralis, species 9 degrees North-7, species JDF-3, gorgonarius), Pyrodictium occultum, and Archaeoglobus fulgidus.
The polymerase may also be from a eubacterial species. There are three classes of eubacterial DNA polymerases: pol I, pol II, and pol III. Enzymes in the Pol I DNA polymerase family possess 5' to 3' exonuclease activity, and certain members also exhibit 3' to 5' exonuclease activity. Pol II DNA polymerases naturally lack 5' to 3' exonuclease activity, but do exhibit 3' to 5' exonuclease activity. Pol III DNA polymerases represent the major replicative DNA polymerase of the cell and are composed of multiple subunits. The Pol III catalytic subunit lacks 5' to 3' exonuclease activity, but in some cases 3' to 5' exonuclease activity is located in the same polypeptide.
There are a variety of commercially available Pol I DNA polymerases, some of which have been modified to reduce or abolish 5' to 3' exonuclease activity.
Suitable thermostable pol I DNA polymerases can be isolated from a variety of thermophilic eubacteria, including Thermus species and Thermotoga maritima, such as Thermus aquaticus (Taq), Thermus thermophilus (Tth), and Thermotoga maritima (Tma, UlTma).
Additional eubacteria related to those listed above are described in Thermophilic Bacteria (Kristjansson, J.K., ed.) CRC Press, Inc., Boca Raton, Fla., 1992.
The present invention further provides chimeric or non-chimeric DNA polymerases that are chemically modified according to the methods disclosed in U.S. Pat. Nos. 5,677,152, 6,479,264, and 6,183,998, the contents of which are incorporated herein by reference in their entirety.
Additional archaeal DNA polymerases related to those listed above are described in the following references: Archaea: A Laboratory Manual (Robb, F.T. and Place, A.R., eds.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1995, and Thermophilic Bacteria (Kristjansson, J.K., ed.) CRC Press, Inc., Boca Raton, Fla., 1992.
B. Exemplary reverse transcriptase
In various embodiments, the guided editors described herein comprise reverse transcriptase as a polymerase. The present disclosure encompasses any wild-type reverse transcriptase obtained from any naturally occurring organism or virus or obtained from commercial or non-commercial sources. In addition, reverse transcriptase useful in the guided editor of the present disclosure may include any naturally occurring mutant RT, engineered mutant RT, or other variant RT, including truncated variants that retain function. RT may also be designed to contain specific amino acid substitutions, such as those specifically disclosed herein.
Reverse transcriptases are multifunctional enzymes, typically with three enzymatic activities: RNA-dependent DNA polymerization activity, DNA-dependent DNA polymerization activity, and an RNase H activity that catalyzes the cleavage of RNA in RNA-DNA hybrids. Some reverse transcriptase mutants have the RNase H moiety disabled to prevent unintended damage to the mRNA. These enzymes, which synthesize complementary DNA (cDNA) using mRNA as a template, were first identified in RNA viruses. Reverse transcriptases have subsequently been isolated and purified directly from virus particles, cells, or tissues (see, e.g., Kacian et al., 1971, Biochim. Biophys. Acta 46:365-83; Yang et al., 1972, Biochem. Biophys. Res. Comm. 47:505-11; Gerard et al., 1975, J. Virol. 15:785-97; Liu et al., 1977, Arch. Virol. 55:187-200; Kato et al., 1984, J. Virol. Methods 9:325-39; Luke et al., 1990, Biochem. 29:1764-69; and Le Grice et al., 1991, J. Virol. 65:7004-07, each of which is incorporated by reference). More recently, mutants and fusion proteins have been created in the quest for improved properties such as thermostability, fidelity, and activity. Any wild-type, variant, and/or mutant form of reverse transcriptase that is known in the art, or that can be prepared using methods known in the art, is contemplated herein.
The reverse transcriptase (RT) gene (or the genetic information contained therein) can be obtained from a number of different sources. For example, the gene may be obtained from eukaryotic cells infected with a retrovirus, or from a number of plasmids that contain either a portion of or the entire retroviral genome. In addition, messenger RNA-like RNA that contains the RT gene can be obtained from retroviruses. Examples of sources of RT include, but are not limited to, Moloney murine leukemia virus (M-MLV or MLVRT); human T-cell leukemia virus type 1 (HTLV-1); bovine leukemia virus (BLV); Rous sarcoma virus (RSV); human immunodeficiency virus (HIV); yeast, including Saccharomyces, Neurospora, Drosophila; primates; and rodents. See, for example, Weiss, et al., U.S. Pat. No. 4,663,290 (1987); Gerard, G.R., DNA:271-79 (1986); Kotewicz, M.L., et al., Gene 35:249-58 (1985); Tanese, N., et al., Proc. Natl. Acad. Sci. (USA): 4944-48 (1985); Roth, M.J., et al., J. Biol. Chem. 260:9326-35 (1985); Michel, F., et al., Nature 316:641-43 (1985); Akins, R.A., et al., Cell 47:505-16 (1986), EMBO J. 4:1267-75 (1985); and Fawcett, D.F., Cell 47:1007-15 (1986) (each of which is incorporated herein by reference in its entirety).
Wild type RT
Exemplary enzymes for use with the guided editors disclosed herein can include, but are not limited to, M-MLV reverse transcriptase and RSV reverse transcriptase. Enzymes having reverse transcriptase activity are commercially available. In certain embodiments, the reverse transcriptase is provided in trans to the other components of the guided editor (PE) system. That is, the reverse transcriptase is expressed or otherwise provided as an individual component, i.e., not as a fusion protein with a napDNAbp.
One of ordinary skill in the art will recognize that wild-type reverse transcriptases, including but not limited to Moloney murine leukemia virus (M-MLV) reverse transcriptase; human immunodeficiency virus (HIV) reverse transcriptase; and avian sarcoma-leukosis virus (ASLV) reverse transcriptase, which includes but is not limited to Rous sarcoma virus (RSV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, avian erythroblastosis virus (AEV) helper virus MCAV reverse transcriptase, avian myelocytomatosis virus MC29 helper virus MCAV reverse transcriptase, avian reticuloendotheliosis virus (REV-T) helper virus REV-A reverse transcriptase, avian sarcoma virus UR2 helper virus UR2AV reverse transcriptase, avian sarcoma virus Y73AV helper virus reverse transcriptase, Rous-associated virus (RAV) reverse transcriptase, and myeloblastosis-associated virus (MAV) reverse transcriptase, may be suitably used in the subject methods and compositions described herein.
Exemplary wild-type RT enzymes are as follows:
Variant and error-prone RT
Reverse transcriptases are essential for synthesizing complementary DNA (cDNA) strands from RNA templates. Reverse transcriptases are enzymes composed of distinct domains that exhibit different biochemical activities. The enzymes catalyze the synthesis of DNA from an RNA template as follows: in the presence of an annealed primer, reverse transcriptase binds to the RNA template and initiates the polymerization reaction. The RNA-dependent DNA polymerase activity synthesizes the complementary DNA (cDNA) strand, incorporating dNTPs. The RNase H activity degrades the RNA template of the DNA:RNA complex. Thus, reverse transcriptases comprise (a) a binding activity that recognizes and binds to an RNA/DNA hybrid, (b) an RNA-dependent DNA polymerase activity, and (c) an RNase H activity. In addition, reverse transcriptases are generally regarded as having various attributes, including their thermostability, processivity (rate of dNTP incorporation), and fidelity (or error rate). The reverse transcriptase variants contemplated herein can include any mutation to the reverse transcriptase that affects or changes any one or more of these enzymatic activities (e.g., RNA-dependent DNA polymerase activity, RNase H activity, or DNA/RNA hybrid-binding activity) or enzyme properties (e.g., thermostability, processivity, or fidelity). Such variants may be available in the art in the public domain, may be available commercially, or may be made using known methods of mutagenesis, including directed evolution processes (e.g., PACE or PANCE).
In various embodiments, the reverse transcriptase may be a variant reverse transcriptase. As used herein, a "variant reverse transcriptase" includes any naturally occurring or genetically engineered variant comprising one or more mutations (including single mutations, inversions, deletions, insertions, and rearrangements) relative to a reference sequence (e.g., a reference wild-type sequence). RT naturally has several activities, including an RNA-dependent DNA polymerase activity, a ribonuclease H activity, and a DNA-dependent DNA polymerase activity. Collectively, these activities enable the enzyme to convert single-stranded RNA into double-stranded cDNA. In retroviruses and retrotransposons, this cDNA can then integrate into the host genome, from which new RNA copies can be made via host-cell transcription. Variant RTs may comprise a mutation that impacts one or more of these activities (either reducing or increasing these activities, or eliminating these activities altogether). In addition, variant RTs may comprise one or more mutations that render the RT more or less stable, less prone to aggregation, or easier to purify and/or detect, and/or that modify other properties or characteristics.
One of ordinary skill in the art will recognize that variant reverse transcriptases derived from other reverse transcriptases, including but not limited to Moloney murine leukemia virus (M-MLV) reverse transcriptase; human immunodeficiency virus (HIV) reverse transcriptase; and avian sarcoma-leukosis virus (ASLV) reverse transcriptase, which includes but is not limited to Rous sarcoma virus (RSV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, avian erythroblastosis virus (AEV) helper virus MCAV reverse transcriptase, avian myelocytomatosis virus MC29 helper virus MCAV reverse transcriptase, avian reticuloendotheliosis virus (REV-T) helper virus REV-A reverse transcriptase, avian sarcoma virus UR2 helper virus UR2AV reverse transcriptase, avian sarcoma virus Y73AV helper virus reverse transcriptase, Rous-associated virus (RAV) reverse transcriptase, and myeloblastosis-associated virus (MAV) reverse transcriptase, may be suitably used in the subject methods and compositions described herein.
One way to prepare variant RTs is by genetic modification (e.g., by modifying the DNA sequence of a wild-type reverse transcriptase). Numerous methods are known in the art that permit the random as well as the targeted mutation of DNA sequences (see, e.g., Ausubel et al., Short Protocols in Molecular Biology (1995), 3rd Ed., John Wiley & Sons, Inc.). In addition, there are a number of commercially available kits for site-directed mutagenesis, including both conventional and PCR-based methods. Examples include the QuikChange Site-Directed Mutagenesis Kit (Agilent), the Q5 Site-Directed Mutagenesis Kit (New England Biolabs), and the GeneArt™ Site-Directed Mutagenesis System (Thermo Fisher Scientific).
In addition, mutant reverse transcriptases may be generated by insertional mutation or truncation (N-terminal, internal, or C-terminal insertions or truncations) according to methods known to those skilled in the art. As used herein, the term "mutation" refers to a substitution of a residue within a sequence (e.g., a nucleic acid or amino acid sequence) with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue, followed by the position of that residue within the sequence, and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include "loss-of-function" mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene encoding a fully functional protein, whose presence compensates for the effect of the mutation. Mutations also embrace "gain-of-function" mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, so that these tissues gain functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
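The naming convention for substitutions described above (original residue, then position, then new residue) can be sketched as a small parser. The label "T3N" and the five-residue sequence below are hypothetical examples, not mutations or sequences taken from this disclosure, and the sketch handles only single-residue substitutions with 1-based positions.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str     # residue present in the reference sequence
    position: int     # 1-based position within the sequence
    replacement: str  # newly substituted residue


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'T3N' into its three components."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Apply the substitution after checking it against the reference."""
    index = sub.position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[index] != sub.original:
        raise ValueError("original residue does not match the sequence")
    return sequence[:index] + sub.replacement + sequence[index + 1:]


mutant = apply_substitution("MKTAY", parse_substitution("T3N"))
print(mutant)
```

Checking the stated original residue against the reference sequence before substituting catches numbering mismatches, which are a common source of error when mutations are transferred between homologous enzymes.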
Earlier site-directed mutagenesis methods known in the art have relied on subcloning the sequence to be mutated into a vector, such as an M13 phage vector, which allows isolation of single-stranded DNA templates. In these methods, a mutagenic primer (i.e., a primer that is capable of annealing to the site to be mutated but carries one or more mismatched nucleotides at the site to be mutated) is annealed to a single-stranded template, and then the complementary sequence of the template is polymerized starting from the 3' end of the mutagenic primer. The resulting duplex is then transformed into host bacteria and plaques are screened for the desired mutation.
More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require subcloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. Fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases, it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
Methods of random mutagenesis exist in the art that result in a panel of mutants bearing one or more randomly situated mutations. Such a panel of mutants may then be screened for those exhibiting the desired properties, for example, increased stability relative to a wild-type reverse transcriptase.
An example of a random mutagenesis method is the so-called "error-prone PCR method". As the name suggests, this method amplifies a given sequence under conditions where the DNA polymerase does not support high fidelity incorporation. Although the conditions that promote misincorporation of different DNA polymerases vary, one skilled in the art can determine such conditions for a given enzyme. A key variable in amplification fidelity for many DNA polymerases is, for example, the type and concentration of divalent metal ions in the buffer. The use of manganese ions and/or changes in magnesium or manganese ion concentration can therefore be applied to affect the error rate of the polymerase.
In various aspects, the RT of the prime editor may be an "error-prone" reverse transcriptase variant. Error-prone reverse transcriptases that are known and/or available in the art may be used. It will be appreciated that reverse transcriptases do not naturally have any proofreading function; reverse transcriptases therefore generally have a higher error rate than DNA polymerases that comprise a proofreading activity. The error rate of any particular reverse transcriptase is a property of the enzyme's "fidelity," which represents the accuracy of template-directed polymerization of DNA against its RNA template. An RT having high fidelity has a low error rate. Conversely, an RT having low fidelity has a high error rate. The fidelity of M-MLV-based reverse transcriptases is reported to correspond to an error rate in the range of one error in 15,000 to 27,000 nucleotides synthesized. See Boutabout et al., "DNA synthesis fidelity by the reverse transcriptase of the yeast retrotransposon Ty1," Nucleic Acids Res, 2001, 29:2217-2222, which is incorporated by reference. Thus, for purposes of the present application, those reverse transcriptases that are considered "error-prone" or that are considered to have "error-prone fidelity" are those reverse transcriptases that have an error rate of less than one error in 15,000 nucleotides synthesized.
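As a purely illustrative calculation, the reported range of one error in 15,000 to 27,000 nucleotides can be converted into an expected number of errors for a product of a given length. The sketch below assumes a uniform and independent per-nucleotide error rate, which is a simplification, and the 1,000-nucleotide product length is a hypothetical value chosen only for the example.

```python
def expected_errors(product_length_nt: int, nt_per_error: float) -> float:
    """Expected misincorporations for a product of the given length.

    Assumes one error per `nt_per_error` nucleotides, applied uniformly.
    """
    if product_length_nt < 0 or nt_per_error <= 0:
        raise ValueError("length must be >= 0 and nt_per_error must be > 0")
    return product_length_nt / nt_per_error


def prob_at_least_one_error(product_length_nt: int, nt_per_error: float) -> float:
    """Probability of one or more errors, assuming independent positions."""
    if product_length_nt < 0 or nt_per_error <= 0:
        raise ValueError("length must be >= 0 and nt_per_error must be > 0")
    per_nt = 1.0 / nt_per_error
    return 1.0 - (1.0 - per_nt) ** product_length_nt


if __name__ == "__main__":
    length = 1_000  # hypothetical product length in nucleotides
    for nt_per_error in (15_000, 27_000):  # bounds of the reported range
        print(
            nt_per_error,
            expected_errors(length, nt_per_error),
            prob_at_least_one_error(length, nt_per_error),
        )
```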
Error-prone reverse transcriptases may also be generated by mutagenesis of a starting RT enzyme (e.g., wild-type M-MLV RT). The method of mutagenesis is not limited and may include directed evolution processes, such as phage-assisted continuous evolution (PACE) or phage-assisted non-continuous evolution (PANCE). As used herein, the term "phage-assisted continuous evolution (PACE)" refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described, for example, in International PCT Application No. PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application No. PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; International PCT Application No. PCT/US2015/012022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; and International PCT Application No. PCT/US2016/027795, filed April 15, 2016, published as WO 2016/168831 on October 20, 2016, each of which is incorporated herein by reference in its entirety.
Error-prone reverse transcriptases may also be obtained by phage-assisted non-continuous evolution (PANCE), which as used herein refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution that uses serial flask transfers of evolving "selection phage" (SP), which contain a gene of interest to be evolved, across fresh E. coli host cells, thereby allowing the genes in the host E. coli to be held constant while the genes contained in the SP continuously evolve. Serial flask transfer has long served as a widely used approach for the laboratory evolution of microbes, and analogous approaches have more recently been developed for phage evolution. The PANCE system features lower stringency than the PACE system.
Other error-prone reverse transcriptases have been described in the literature, each of which is contemplated for use in the methods and compositions herein. For example, error-prone reverse transcriptases have been described in Bebenek et al., "Error-prone Polymerization by HIV-1 Reverse Transcriptase," J Biol Chem, 1993, vol. 268:10324-10334, and Sebastián-Martín et al., "Transcriptional inaccuracy threshold attenuates differences in RNA-dependent DNA synthesis fidelity between retroviral reverse transcriptases," Scientific Reports, 2018, vol. 8:627, each of which is incorporated by reference. Further, reverse transcriptases, including error-prone reverse transcriptases, may be obtained from commercial suppliers, including ProtoScript (II) reverse transcriptase, AMV reverse transcriptase, WarmStart reverse transcriptase, and M-MuLV reverse transcriptase, all from NEW ENGLAND BIOLABS, or AMV reverse transcriptase XL, SMARTScribe reverse transcriptase, and GPR ultra-pure MMLV reverse transcriptase, all from TAKARA BIO USA, INC. (formerly CLONTECH).
The present disclosure also contemplates reverse transcriptases having mutations in the RNaseH domain. As described above, one of the intrinsic properties of reverse transcriptases is RNaseH activity, which cleaves the RNA template of the RNA:cDNA hybrid concurrently with polymerization. RNaseH activity can be undesirable for the synthesis of long cDNAs because the RNA template may be degraded before full-length reverse transcription is complete. RNaseH activity may also lower reverse transcription efficiency, presumably because it competes with the polymerase activity of the enzyme. Thus, the present disclosure contemplates any reverse transcriptase variant that comprises a modified RNaseH activity.
The present disclosure also contemplates reverse transcriptases having mutations in the RNA-dependent DNA polymerase domain. As described above, one of the intrinsic properties of reverse transcriptases is RNA-dependent DNA polymerase activity, which incorporates nucleobases into the nascent cDNA strand as encoded by the template RNA strand of the RNA:cDNA hybrid. The RNA-dependent DNA polymerase activity (i.e., in terms of its rate of incorporation) can be increased or decreased to increase or decrease the processivity of the enzyme. Thus, the present disclosure contemplates any reverse transcriptase variant that comprises a modified RNA-dependent DNA polymerase activity such that the processivity of the enzyme is increased or decreased relative to the unmodified form.
Reverse transcriptase variants having altered thermostability characteristics are also contemplated herein. The ability of a reverse transcriptase to withstand high temperatures is an important aspect of cDNA synthesis. Elevated reaction temperatures help denature RNA with strong secondary structures and/or high GC content, allowing the reverse transcriptase to read through the sequence. Reverse transcription at higher temperatures can thus enable full-length cDNA synthesis and higher yields, which may lead to improved generation of the 3' flap ssDNA produced during the prime editing process. The optimal temperature range for wild-type M-MLV reverse transcriptase is typically 37 °C to 48 °C; however, mutations may be introduced that allow reverse transcription activity at temperatures higher than 48 °C, including 49 °C, 50 °C, 51 °C, 52 °C, 53 °C, 54 °C, 55 °C, 56 °C, 57 °C, 58 °C, 59 °C, 60 °C, 61 °C, 62 °C, 63 °C, 64 °C, 65 °C, 66 °C, and higher.
Variant reverse transcriptases contemplated herein, including error-prone RTs, thermostable RTs, and RTs with increased processivity, may be engineered by a variety of conventional strategies, including mutagenesis or evolutionary processes. In some cases, a variant may be produced by introducing a single mutation. In other cases, a variant may require more than one mutation. For those mutants comprising more than one mutation, the effect of a given mutation may be evaluated by introducing the identified mutation into the wild-type gene by site-directed mutagenesis, in isolation from the other mutations borne by the particular mutant. Screening assays of the single mutants so produced then allow the effect of that mutation alone to be determined.
Variant RT enzymes as used herein may also include other "RT variants" that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference RT protein (including any wild-type RT), or mutant RT, or fragment RT, or other variants of RT, disclosed or contemplated herein or known in the art.
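For illustration only, one simple way to compute a percent identity between a reference RT and a variant is sketched below. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it counts identical residues over all aligned columns. Other conventions exist (for example, dividing by the length of the shorter sequence), so this is one possible definition and not the definition used by the present disclosure; the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Columns where both sequences carry a gap are ignored; a residue aligned
    to a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real RT sequences.
    print(percent_identity("MKDLT", "MKNLT"))
    print(percent_identity("MKDLT-A", "MK-LTGA"))
```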
In some embodiments, an RT variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or up to 100, or up to 200, or up to 300, or up to 400, or up to 500, or more amino acid changes compared to a reference RT. In some embodiments, the RT variant comprises a fragment of a reference RT, such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of the reference RT. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild-type RT (M-MLV reverse transcriptase) (e.g., SEQ ID NO: 32) or of any of the reverse transcriptases of SEQ ID NOs: 102-112.
In some embodiments, the present disclosure may also utilize RT fragments that retain their functionality and are fragments of any of the RT proteins disclosed herein. In some embodiments, the RT fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, or up to 600 or more amino acids in length.
In other embodiments, the present disclosure may also utilize RT variants that are truncated by a certain number of amino acids at the N-terminus, the C-terminus, or both, such that the truncated variant retains sufficient polymerase function. In some embodiments, the RT truncated variant has a truncation of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 amino acids at the N-terminal end of the protein. In other embodiments, the RT truncated variant has a truncation of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 amino acids at the C-terminal end of the protein. In still other embodiments, the RT truncated variant has truncations at both the N-terminal and C-terminal ends, which may be of the same or different lengths.
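The terminal truncations described above can be illustrated with a minimal sketch that removes a chosen number of residues from either or both ends of a sequence. The function name and the toy sequence are hypothetical; whether any particular truncated variant retains polymerase function is an experimental question that this sketch does not address.

```python
def truncate(sequence: str, n_term: int = 0, c_term: int = 0) -> str:
    """Remove `n_term` residues from the N-terminus and `c_term` from the C-terminus."""
    if n_term < 0 or c_term < 0:
        raise ValueError("truncation lengths must be non-negative")
    if n_term + c_term >= len(sequence):
        raise ValueError("truncation would remove the entire sequence")
    return sequence[n_term:len(sequence) - c_term]


if __name__ == "__main__":
    toy = "MKDLTAAG"  # hypothetical toy sequence, not a real RT sequence
    print(truncate(toy, n_term=2))            # N-terminal truncation only
    print(truncate(toy, c_term=3))            # C-terminal truncation only
    print(truncate(toy, n_term=2, c_term=3))  # truncations of different lengths
```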
For example, the prime editors disclosed herein may include a truncated version of M-MLV reverse transcriptase. In this embodiment, the reverse transcriptase contains 4 mutations (D200N, T306K, W313F, T330P; note that the L603W mutation present in PE2 is no longer present due to the truncation). The DNA sequence encoding this truncated editor is 522 bp smaller than that of PE2, and it may therefore be useful for applications in which delivery of the DNA sequence is challenging because of its size (i.e., adeno-associated virus and lentiviral delivery). This embodiment is referred to as MMLV-RT (truncated) and has the following amino acid sequence:
In various embodiments, the prime editors disclosed herein can comprise one of the RT variants described herein, or an RT variant that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the reference RT variants.
In other embodiments, the present methods and compositions may utilize a DNA polymerase that has been evolved into a reverse transcriptase, as described in Ellefson et al., "Synthetic evolutionary origin of a proofreading reverse transcriptase," Science, June 24, 2016, vol. 352:1590-1593, the contents of which are incorporated herein by reference.
In certain other embodiments, the reverse transcriptase is provided as a component of a fusion protein that also comprises a napDNAbp. In other words, in some embodiments, the reverse transcriptase is fused to a napDNAbp as a fusion protein.
In various embodiments, the variant reverse transcriptase may be engineered from a wild-type M-MLV reverse transcriptase as set forth in SEQ ID NO: 32.
In various embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising one or more of the following mutations: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K, or D653N at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence.
Some exemplary reverse transcriptases are provided below, which may be fused to a napDNAbp protein or provided as separate proteins according to various embodiments of the present disclosure. Exemplary reverse transcriptases include variants having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to a wild-type enzyme or a portion of the following:
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising one or more of the following mutations: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X, or D653X at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a P51X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is L.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an S67X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an E69X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an L139X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is P.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a T197X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is A.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a D200X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an H204X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is R.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an F209X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an E302X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an E302X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is R.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a T306X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an F309X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a W313X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is F.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a T330X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is P.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an L345X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is G.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an L435X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is G.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an N454X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a D524X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is G.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an E562X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is Q.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a D583X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an H594X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is Q.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an L603X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is W.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising an E607X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the prime editors described herein (wherein the RT is provided as a fusion partner or in trans) may include a variant RT comprising a D653X mutation at the corresponding amino acid position of the wild-type M-MLV RT of SEQ ID NO: 32 or of another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
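The per-position embodiments above can be summarized, for illustration, as a lookup from each listed site to the substituted residue or residues named in "certain embodiments." The sketch below encodes only the site and residue pairs recited in the preceding paragraphs; the function name, the returned labels, and the treatment of unlisted sites are hypothetical conveniences and form no part of the disclosure.

```python
import re

# Site -> residue(s) recited for "X" in certain embodiments (M-MLV RT numbering).
EXEMPLIFIED = {
    "P51": {"L"}, "S67": {"K"}, "E69": {"K"}, "L139": {"P"}, "T197": {"A"},
    "D200": {"N"}, "H204": {"R"}, "F209": {"N"}, "E302": {"K", "R"},
    "T306": {"K"}, "F309": {"N"}, "W313": {"F"}, "T330": {"P"},
    "L345": {"G"}, "L435": {"G"}, "N454": {"K"}, "D524": {"G"},
    "E562": {"Q"}, "D583": {"N"}, "H594": {"Q"}, "L603": {"W"},
    "E607": {"K"}, "D653": {"N"},
}


def classify(label: str) -> str:
    """Classify a substitution label such as 'D200N' against the listed sites.

    Returns 'exemplified' when the new residue is one named for that site,
    'any-amino-acid' when the site is listed but the residue is not named,
    and 'not listed' when the site does not appear in the list.
    """
    match = re.fullmatch(r"([A-Z]\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    site, new_residue = match.groups()
    if site not in EXEMPLIFIED:
        return "not listed"
    return "exemplified" if new_residue in EXEMPLIFIED[site] else "any-amino-acid"


if __name__ == "__main__":
    for label in ("D200N", "E302R", "D200A", "A1C"):
        print(label, classify(label))
```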
Some exemplary reverse transcriptases are provided below, which may be fused to a napDNAbp protein or provided as separate proteins according to various embodiments of the present disclosure. Exemplary reverse transcriptases include those having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type enzymes or partial enzymes represented by SEQ ID NOs: 32, 34, and 113-128. The prime editor (PE) systems described herein contemplate any publicly available reverse transcriptase described or disclosed in any of the following U.S. patents, each of which is incorporated by reference in its entirety: U.S. Patent Nos. 10,202,658; 10,189,831; 10,150,955; 9,932,567; 9,783,791; 9,580,698; 9,534,201; and 9,458,484, and any variant thereof that can be made using known methods for installing mutations or known methods for evolving proteins. The following references describe reverse transcriptases in the art, and each is incorporated herein by reference in its entirety.
Herzig, E., Voronin, N., Kucherenko, N. & Hizi, A. A Novel Leu92 Mutant of HIV-1 Reverse Transcriptase with a Selective Deficiency in Strand Transfer Causes a Loss of Viral Replication. J. Virol. 89, 8119–8129 (2015).
Mohr, G. et al. A Reverse Transcriptase-Cas1 Fusion Protein Contains a Cas6 Domain Required for Both CRISPR RNA Biogenesis and RNA Spacer Acquisition. Mol. Cell 72, 700–714.e8 (2018).
Zhao, C., Liu, F. & Pyle, A. M. An ultraprocessive, accurate reverse transcriptase encoded by a metazoan group II intron. RNA 24, 183–195 (2018).
Zimmerly, S. & Wu, L. An Unexplored Diversity of Reverse Transcriptases in Bacteria. Microbiol Spectr 3, MDNA3-0058–2014 (2015).
Ostertag, E. M. & Kazazian Jr, H. H. Biology of Mammalian L1 Retrotransposons. Annual Review of Genetics 35, 501–538 (2001).
Perach, M. & Hizi, A. Catalytic Features of the Recombinant Reverse Transcriptase of Bovine Leukemia Virus Expressed in Bacteria. Virology 259, 176–189 (1999).
Lim, D. et al. Crystal structure of the moloney murine leukemia virus RNase H domain. J. Virol. 80, 8379–8389 (2006).
Zhao, C. & Pyle, A. M. Crystal structures of a group II intron maturase reveal a missing link in spliceosome evolution. Nature Structural & Molecular Biology 23, 558–565 (2016).
Griffiths, D. J. Endogenous retroviruses in the human genome sequence. Genome Biol. 2, REVIEWS1017 (2001).
Baranauskas, A. et al. Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants. Protein Eng Des Sel 25, 657–668 (2012).
Zimmerly, S., Guo, H., Perlman, P. S. & Lambowitz, A. M. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82, 545–554 (1995).
Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916 (1996).
Berkhout, B., Jebbink, M. & Zsíros, J. Identification of an Active Reverse Transcriptase Enzyme Encoded by a Human Endogenous HERV-K Retrovirus. Journal of Virology 73, 2365–2375 (1999).
Kotewicz, M. L., Sampson, C. M., D’Alessio, J. M. & Gerard, G. F. Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity. Nucleic Acids Res 16, 265–277 (1988).
Arezi, B. & Hogrefe, H. Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer. Nucleic Acids Res 37, 473–481 (2009).
Blain, S. W. & Goff, S. P. Nuclease activities of Moloney murine leukemia virus reverse transcriptase. Mutants with altered substrate specificities. J. Biol. Chem. 268, 23585–23592 (1993).
Xiong, Y. & Eickbush, T. H. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9, 3353–3362 (1990).
Herschhorn, A. & Hizi, A. Retroviral reverse transcriptases. Cell. Mol. Life Sci. 67, 2717–2747 (2010).
Taube, R., Loya, S., Avidan, O., Perach, M. & Hizi, A. Reverse transcriptase of mouse mammary tumour virus: expression in bacteria, purification and biochemical characterization. Biochem. J. 329 (Pt 3), 579–587 (1998).
Liu, M. et al. Reverse Transcriptase-Mediated Tropism Switching in Bordetella Bacteriophage. Science 295, 2091–2094 (2002).
Luan, D. D., Korman, M. H., Jakubczak, J. L. & Eickbush, T. H. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595–605 (1993).
Nottingham, R. M. et al. RNA-seq of human reference RNA samples using a thermostable group II intron reverse transcriptase. RNA 22, 597–613 (2016).
Telesnitsky, A. & Goff, S. P. RNase H domain mutations affect the interaction between Moloney murine leukemia virus reverse transcriptase and its primer-template. Proc. Natl. Acad. Sci. U.S.A. 90, 1276–1280 (1993).
Halvas, E. K., Svarovskaia, E. S. & Pathak, V. K. Role of Murine Leukemia Virus Reverse Transcriptase Deoxyribonucleoside Triphosphate-Binding Site in Retroviral Replication and In Vivo Fidelity. Journal of Virology 74, 10349–10358 (2000).
Nowak, E. et al. Structural analysis of monomeric retroviral reverse transcriptase in complex with an RNA/DNA hybrid. Nucleic Acids Res 41, 3874–3887 (2013).
Stamos, J. L., Lentzsch, A. M. & Lambowitz, A. M. Structure of a Thermostable Group II Intron Reverse Transcriptase with Template-Primer and Its Functional and Evolutionary Implications. Molecular Cell 68, 926–939.e4 (2017).
Das, D. & Georgiadis, M. M. The Crystal Structure of the Monomeric Reverse Transcriptase from Moloney Murine Leukemia Virus. Structure 12, 819–829 (2004).
Avidan, O., Meer, M. E., Oz, I. & Hizi, A. The processivity and fidelity of DNA synthesis exhibited by the reverse transcriptase of bovine leukemia virus. European Journal of Biochemistry 269, 859–867 (2002).
Gerard, G. F. et al. The role of template-primer in protection of reverse transcriptase from thermal inactivation. Nucleic Acids Res 30, 3118–3129 (2002).
Monot, C. et al. The Specificity and Flexibility of L1 Reverse Transcription Priming at Imperfect T-Tracts. PLOS Genetics 9, e1003499 (2013).
Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958–970 (2013).
Any of the references mentioned above relating to reverse transcriptase are incorporated herein by reference in their entirety if not already stated.
[4] Prime editor
The prime editor (PE) system described herein refers to a system comprising (A) at least two proteins: (1) a napDNAbp (e.g., a Cas9 nickase) and (2) a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase), and (B) an engineered pegRNA comprising at least one performance-enhancing modification relative to a typical pegRNA. The napDNAbp and polymerase components may be provided separately, i.e., in trans to one another, or may be provided as a fusion protein in which the napDNAbp and polymerase components are coupled, for example, through a polypeptide linker.
The present application contemplates combining any suitable napDNAbp and polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase) in a single fusion protein for use with the engineered pegRNAs disclosed herein. Examples of napDNAbps and of polymerases (e.g., DNA-dependent DNA polymerases or RNA-dependent DNA polymerases, such as reverse transcriptases) are each defined herein. Since polymerases are well known in the art and their amino acid sequences are readily available, this disclosure is not meant to be limited in any way to the specific polymerases identified herein. In various embodiments, the fusion protein may comprise any suitable structural configuration. For example, the fusion protein may comprise, in the N-terminal to C-terminal direction, a napDNAbp fused to a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase). In other embodiments, the fusion protein may comprise, in the N-terminal to C-terminal direction, a polymerase (e.g., a reverse transcriptase) fused to a napDNAbp. The fused domains may optionally be joined by a linker, e.g., an amino acid sequence. In other embodiments, the fusion protein may comprise the structure NH2-[napDNAbp]-[polymerase]-COOH; or NH2-[polymerase]-[napDNAbp]-COOH, wherein each instance of "]-[" indicates the presence of an optional linker sequence. In embodiments in which the polymerase is a reverse transcriptase, the fusion protein may comprise the structure NH2-[napDNAbp]-[RT]-COOH; or NH2-[RT]-[napDNAbp]-COOH, wherein each instance of "]-[" indicates the presence of an optional linker sequence.
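As a non-limiting illustration of the bracketed structural notation used above, the following sketch assembles an N-terminal to C-terminal architecture string from an ordered list of domains, with an optional linker at each junction. The function name and its interface are hypothetical; the domain labels used in the examples are those that appear in the structural formulas of this section.

```python
from typing import Optional, Sequence


def fusion_architecture(
    domains: Sequence[str],
    linkers: Optional[Sequence[Optional[str]]] = None,
) -> str:
    """Render domains, N-terminus first, as NH2-[A]-[B]-...-COOH.

    `linkers[i]` names the optional linker between domains[i] and
    domains[i + 1]; None means no explicit linker at that junction.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    if linkers is None:
        linkers = [None] * (len(domains) - 1)
    if len(linkers) != len(domains) - 1:
        raise ValueError("need exactly one linker slot per junction")
    parts = [f"[{domains[0]}]"]
    for linker, domain in zip(linkers, domains[1:]):
        if linker is not None:
            parts.append(f"[{linker}]")
        parts.append(f"[{domain}]")
    return "NH2-" + "-".join(parts) + "-COOH"


if __name__ == "__main__":
    # The two orientations recited for a reverse transcriptase fusion.
    print(fusion_architecture(["napDNAbp", "RT"]))
    print(fusion_architecture(["RT", "napDNAbp"]))
    # An arrangement with an NLS and an explicit linker before the RT domain.
    print(
        fusion_architecture(
            ["NLS", "Cas9(H840A)", "MMLV-RT(wt)"], linkers=[None, "linker"]
        )
    )
```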
An exemplary fusion protein is depicted in FIG. 14, which shows a fusion protein comprising an MLV reverse transcriptase ("MLV-RT") fused via a linker sequence to a Cas9 nickase ("Cas9(H840A)"). This example is not intended to limit the scope of fusion proteins that may be used in the guided editor (PE) system described herein.
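The two domain orders described above can be illustrated with a minimal Python sketch. The `Domain` class, the `assemble_fusion` function, and the placeholder sequences are assumptions introduced for illustration only; they are not sequences or tools from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    sequence: str  # amino acid sequence, written N-terminus to C-terminus


def assemble_fusion(domains: list[Domain], linker: str = "") -> tuple[str, str]:
    """Return (layout, sequence) for domains joined in N->C order.

    The same optional linker is placed at every "]-[" junction; an empty
    string means the domains are fused directly.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    layout = "NH2-" + "-".join(f"[{d.name}]" for d in domains) + "-COOH"
    sequence = linker.join(d.sequence for d in domains)
    return layout, sequence


# Placeholder sequences (hypothetical, for illustration only).
NAPDNABP = Domain("napDNAbp", "AAAA")
RT = Domain("RT", "CCCC")

layout_a, seq_a = assemble_fusion([NAPDNABP, RT], linker="SGGS")
layout_b, seq_b = assemble_fusion([RT, NAPDNABP])  # reversed order, no linker
print(layout_a, seq_a)
print(layout_b, seq_b)
```

Swapping the order of the list is all that distinguishes the two orientations, which mirrors how the text treats them as interchangeable structural configurations.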
In various embodiments, the guided editor can have an amino acid sequence (referred to herein as "PE1") that includes a Cas9 variant comprising an H840A mutation (i.e., a Cas9 nickase) and a wild-type M-MLV RT, as well as an N-terminal NLS sequence (19 amino acids) and an amino acid linker (32 amino acids) that connects the C-terminus of the Cas9 nickase domain to the N-terminus of the RT domain. The PE1 fusion protein has the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV-RT(wt)]. The amino acid sequence of PE1 and its individual components is as follows:
In another embodiment, the guided editor can have an amino acid sequence (referred to herein as "PE2") that includes a Cas9 variant comprising the H840A mutation (i.e., a Cas9 nickase) and an M-MLV RT comprising the mutations D200N, T330P, L603W, T306K, and W313F, as well as an N-terminal NLS sequence (19 amino acids) and an amino acid linker (33 amino acids) that connects the C-terminus of the Cas9 nickase domain to the N-terminus of the RT domain. The PE2 fusion protein has the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV-RT(D200N)(T330P)(L603W)(T306K)(W313F)]. The amino acid sequence of PE2 is as follows:
In still other embodiments, the guidance editor may have the following amino acid sequence:
In other embodiments, the guided editor may be based on SaCas9 or SpCas9 nickases with altered PAM specificity, such as the sequences exemplified below:
In other embodiments, the guided editors contemplated herein may include a Cas9 nickase (e.g., Cas9(H840A)) fused to a truncated form of M-MLV reverse transcriptase. In this embodiment, the reverse transcriptase also contains four mutations (D200N, T306K, W313F, and T330P; note that the L603W mutation present in PE2 is absent because of the truncation). The DNA sequence encoding this truncated editor is 522 bp smaller than that of PE2, so it may be useful for applications in which delivery of the DNA sequence is challenging because of its size (i.e., adeno-associated virus and lentiviral delivery). This embodiment is referred to as Cas9(H840A)-MMLV-RT(truncated), "PE2-short", or "PE2-truncated", and has the following amino acid sequence:
FIG. 75 provides a bar graph comparing the efficiency (i.e., "% of total sequencing reads with the specified edit or indels") of PE2, PE2-truncated, PE3, and PE3-truncated at different target sites in different cell lines. The data show that guided editors comprising the truncated RT variant are nearly as efficient as guided editors comprising the non-truncated RT protein.
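The substitution notation used for the reverse transcriptase variants (e.g., D200N means the aspartate at position 200 is replaced by asparagine), and the reason a substitution at position 603 cannot be present in a shortened protein, can be illustrated with a minimal Python sketch. Everything below is an illustrative assumption: the toy wild-type sequence is a poly-alanine placeholder rather than the M-MLV RT sequence, the truncation is assumed to be C-terminal, and the truncated length of 500 is hypothetical because the text does not state where the protein is truncated.

```python
import re

# RT substitutions listed in the text for PE2; the truncated editor keeps the first four.
PE2_RT_MUTATIONS = ["D200N", "T306K", "W313F", "T330P", "L603W"]
_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(mutation: str) -> tuple[str, int, str]:
    """Split 'D200N' into (wild-type residue, 1-based position, new residue)."""
    match = _PATTERN.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(sequence: str, mutations: list[str]) -> str:
    """Apply substitutions after checking the expected wild-type residue."""
    residues = list(sequence)
    for mutation in mutations:
        wild_type, position, new = parse_mutation(mutation)
        if position > len(residues):
            raise ValueError(f"{mutation}: position lies beyond the sequence end")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: found {residues[position - 1]} instead")
        residues[position - 1] = new
    return "".join(residues)


def retained_after_truncation(mutations: list[str], truncated_length: int) -> list[str]:
    """Keep substitutions whose position survives a C-terminal truncation (assumption)."""
    return [m for m in mutations if parse_mutation(m)[1] <= truncated_length]


def toy_wild_type(length: int = 700) -> str:
    """Hypothetical placeholder: poly-alanine carrying the expected wild-type residues."""
    residues = ["A"] * length
    for mutation in PE2_RT_MUTATIONS:
        wild_type, position, _ = parse_mutation(mutation)
        residues[position - 1] = wild_type
    return "".join(residues)


full_length = toy_wild_type()
pe2_like = apply_mutations(full_length, PE2_RT_MUTATIONS)

HYPOTHETICAL_TRUNCATED_LENGTH = 500  # ASSUMPTION: not stated in the text
kept = retained_after_truncation(PE2_RT_MUTATIONS, HYPOTHETICAL_TRUNCATED_LENGTH)
truncated = apply_mutations(full_length[:HYPOTHETICAL_TRUNCATED_LENGTH], kept)

# If the whole 522 bp difference is in-frame coding sequence, it spans this many codons.
codons_removed = 522 // 3
print(kept, codons_removed)
```

Under these assumptions the four retained substitutions are exactly those named for the truncated editor, and a 522 bp reduction corresponds to 174 codons.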
In various embodiments, the guidance editors contemplated herein may also include any variant of the sequences disclosed above having an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to PE1, PE2, or any of the guidance editor fusion sequences shown above.
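The identity thresholds listed above can be made concrete with a minimal Python sketch of a percent-identity calculation. The text does not specify an alignment method or a denominator convention, so this sketch assumes the two sequences have already been aligned (equal length, "-" for gaps) and divides matches by the number of alignment columns; the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must already be aligned: same length, '-' marks a gap.
    Columns in which both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("SGGS", "SGGA"))
print(meets_threshold("SGGS", "SGGS", 99.9))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for gapped alignments, so any threshold comparison should state which convention is used.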
In certain embodiments, linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., a napDNAbp linked or fused to a polymerase such as a reverse transcriptase).
[5] Linkers and other domains
In addition to the napDNAbp (e.g., a Cas9 domain) and the polymerase domain (e.g., an RT domain), the guided editor may contain various other domains. For example, where the napDNAbp is Cas9 and the polymerase is an RT, the guided editor may comprise one or more linkers that join the Cas9 domain to the RT domain. Linkers may also join other functional domains, such as nuclear localization sequences (NLS) or FEN1 (or another flap endonuclease), to the guided editor or a domain thereof.
Furthermore, in embodiments involving trans guided editing, a linker may be used to link the tprt recruitment protein to the guided editor, e.g., between the tprt recruitment protein and the napDNAbp. See, e.g., FIG. 3G for an exemplary schematic of a trans guided editor (tPE) that includes linkers that separately fuse the polymerase domain and the recruitment protein domain to the napDNAbp.
A. Linkers
As defined above, the term "linker" as used herein refers to a chemical group or molecule that connects two molecules or moieties (e.g., a binding domain and a cleavage domain of a nuclease). In some embodiments, the linker connects the gRNA binding domain of the RNA-programmable nuclease and the catalytic domain of a polymerase (e.g., reverse transcriptase). In some embodiments, the linker connects dCas9 and reverse transcriptase. Typically, a linker is located between or on both sides of two groups, molecules or other moieties and connects them to each other via a covalent bond, thereby linking the two. In some embodiments, the linker is an amino acid or multiple amino acids (e.g., peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
In certain embodiments, the linker is a nucleotide linker, which may refer to a linker that joins the pegRNA to an additional nucleotide moiety as described herein, such as, but not limited to, an aptamer (e.g., the prequeosine1-1 riboswitch aptamer or "evopreQ1-1") or a variant thereof, a pseudoknot (e.g., the MMLV viral genome pseudoknot or "Mpknot-1") or a variant thereof, a tRNA (e.g., the modified tRNA used by MMLV as a primer for reverse transcription) or a variant thereof, or a G-quadruplex or a variant thereof. Exemplary nucleotide sequences for such linkers are provided throughout this disclosure and include, but are not limited to, SEQ ID NOs: 225-236.
The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide or is based on amino acids. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of an aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, acetic acid, alanine, β-alanine, 3-aminopropionic acid, 4-aminobutyric acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminocaproic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol (PEG) moiety. In other embodiments, the linker comprises an amino acid. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a benzene ring. The linker may include a functionalized moiety to facilitate attachment of a nucleophile (e.g., a thiol or an amino group) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, haloalkanes, aryl halides, acyl halides, and isothiocyanates.
In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 138), (G)n (SEQ ID NO: 139), (EAAAK)n (SEQ ID NO: 12), (GGS)n (SEQ ID NO: 140), (SGGS)n (SEQ ID NO: 8), (XP)n (SEQ ID NO: 141), or any combination thereof, wherein n is independently an integer from 1 to 30, and X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 140), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 142). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 143). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 144). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 8). In other embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 131).
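The repeat notation above, in which a motif such as GGS is repeated n times with n from 1 to 30 and motifs may be combined, can be expanded mechanically. The following Python sketch is illustrative only; the function names `repeat_linker`, `xp_linker`, and `combine` are hypothetical helpers, not part of the disclosure.

```python
def repeat_linker(motif: str, n: int) -> str:
    """Expand (motif)n, e.g. repeat_linker("GGS", 3) for (GGS)3."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer from 1 to 30")
    return motif * n


def xp_linker(x_residues: str) -> str:
    """Expand (XP)n, where X may be any amino acid.

    One X is supplied per repeat, so n equals len(x_residues).
    """
    if not 1 <= len(x_residues) <= 30:
        raise ValueError("n must be an integer from 1 to 30")
    return "".join(x + "P" for x in x_residues)


def combine(*parts: str) -> str:
    """Concatenate linker segments in N->C order ("any combination thereof")."""
    return "".join(parts)


# The 16-residue and 32-residue linkers quoted in the text.
LINKER_16 = "SGSETPGTSESATPES"                   # SEQ ID NO: 142
LINKER_32 = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"   # SEQ ID NO: 143

# The 32-residue linker can be read as (SGGS)2 + SEQ ID NO: 142 + (SGGS)2.
rebuilt = combine(repeat_linker("SGGS", 2), LINKER_16, repeat_linker("SGGS", 2))
print(rebuilt == LINKER_32, len(rebuilt))
print(repeat_linker("GGS", 3), xp_linker("AE"))
```

Reading the longer linker as a combination of shorter motifs makes it straightforward to check a sequence against the recited lengths.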
As defined above, the term "linker" as used herein refers to a chemical group or molecule that connects two molecules or moieties (e.g., a binding domain and a cleavage domain of a nuclease). In some embodiments, the linker connects the gRNA binding domain of the RNA-programmable nuclease and the catalytic domain of the recombinase. In some embodiments, the linker connects dCas9 and reverse transcriptase. Typically, a linker is located between or on both sides of two groups, molecules, or other moieties and connects them to each other via a covalent bond, thereby linking the two. In some embodiments, the linker is an amino acid or multiple amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
In particular, the following linkers can be used in various embodiments to join guided editor domains to one another:
GGS (SEQ ID NO: 140);
GGSGGS (SEQ ID NO: 145);
GGSGGSGGS (SEQ ID NO: 146);
SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 11);
SGSETPGTSESATPES (SEQ ID NO: 142);
SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 131).
B. Nuclear localization sequences (NLS)
In various embodiments, the guidance editor may include one or more Nuclear Localization Sequences (NLS) that help facilitate translocation of the protein into the nucleus. Such sequences are well known in the art and may include the following examples:
The above NLS examples are non-limiting. The guided editor may comprise any known NLS sequence, including any of those described in Cokol et al., "Finding nuclear localization signals," EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., "Mechanisms and Signals for the Nuclear Import of Proteins," Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference.
In various embodiments, the guided editors and the constructs encoding the guided editors disclosed herein further comprise one or more, preferably at least two, nuclear localization signals. In some embodiments, the guided editor comprises at least two NLSs. In embodiments with at least two NLSs, the NLSs may be the same NLS or different NLSs. In addition, the NLSs may be expressed as part of a fusion protein with the remaining portions of the guided editor. In some embodiments, one or more of the NLSs are bipartite NLSs ("bpNLSs"). In certain embodiments, the disclosed fusion proteins comprise bipartite NLSs. In some embodiments, the disclosed fusion proteins comprise two or more bipartite NLSs.
The location of the NLS fusion can be at the N-terminus, at the C-terminus, or within the sequence of the guided editor (e.g., inserted between the encoded napDNAbp module (e.g., Cas9) and the polymerase domain (e.g., a reverse transcriptase domain)).
The NLS may be any NLS sequence known in the art. The NLS may also be any NLS discovered in the future for nuclear localization. The NLS may also be any naturally occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).
The term "nuclear localization sequence" or "NLS" refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and will be apparent to the skilled artisan. For example, NLS sequences are described in International PCT application PCT/EP 2000/0110290, filed November 23, 2000, and published in 2001 as WO/2001/038547, the contents of which are incorporated herein by reference. In some embodiments, the NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 26), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 27), KRTADGSEFESPKKKRKV (SEQ ID NO: 154), or KRTADGSEFEPKKKRKV (SEQ ID NO: 155). In other embodiments, the NLS comprises the amino acid sequence NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 156), PAAKRVKLD (SEQ ID NO: 149), RQRRNELKRSF (SEQ ID NO: 157), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 158).
In one aspect of the disclosure, the guided editor may be modified with one or more nuclear localization signals (NLS), preferably at least two NLSs. In certain embodiments, the guided editor is modified with two or more NLSs. The present disclosure contemplates the use of any nuclear localization signal known in the art at the time of the disclosure, or any nuclear localization signal that is identified or otherwise becomes available in the art after the filing of the present application. A representative nuclear localization signal is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization signals typically comprise a proline residue. A variety of nuclear localization signals have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89: 7442-46; Moede et al., (1999) FEBS Lett. 461: 229-34, which are incorporated by reference. Translocation is currently thought to involve nucleoporins.
Most NLS can be divided into three categories: (i) Single component NLS, such as the SV40 larger T antigen NLS (PKKKRKV (SEQ ID NO: 26)); (ii) Two-component motifs consisting of two basic domains separated by a different number of spacer amino acids, exemplified by Xenopus nucleoplasmic protein NLS (KRRXXXXXXXXXKKL (SEQ ID NO: 159)); (iii) Atypical sequences such as M9 of hnRNP A1 protein, influenza virus nucleoprotein NLS and yeast Gal4 protein NLS (Dingwall and Laskey 1991).
Nuclear localization signals appear at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, at the C-terminus, and in the central region of proteins. Thus, the present disclosure provides guided editors that may be modified with one or more NLSs at the C-terminus, at the N-terminus, or in an internal region of the guided editor. Residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example tonically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice such a sequence can be functionally limited in length and composition.
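The statement that an NLS may sit at the N-terminus, at the C-terminus, or in an internal region can be illustrated with a minimal Python sketch that scans a protein sequence for exact copies of two of the NLS sequences recited in this section. The function name `locate_nls`, the strict definition of "terminal" (the match touches the first or last residue), and the toy protein are assumptions made for illustration; exact string matching would miss variant or bipartite signals.

```python
import re

# Two NLS sequences recited in this section, keyed by their sequence identifiers.
NLS_SEQUENCES = {
    "SEQ ID NO: 26": "PKKKRKV",
    "SEQ ID NO: 149": "PAAKRVKLD",
}


def locate_nls(protein: str) -> list[tuple[str, int, str]]:
    """Return (label, 0-based start, location) for each exact NLS match."""
    hits = []
    for label, nls in NLS_SEQUENCES.items():
        for match in re.finditer(re.escape(nls), protein):
            if match.start() == 0:
                where = "N-terminal"
            elif match.end() == len(protein):
                where = "C-terminal"
            else:
                where = "internal"
            hits.append((label, match.start(), where))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy protein: NLS + filler + NLS + filler + NLS.
TOY_PROTEIN = "PKKKRKV" + "ACDEFGH" + "PKKKRKV" + "LMNQ" + "PAAKRVKLD"

for hit in locate_nls(TOY_PROTEIN):
    print(hit)
```

A practical scan would relax the terminal definition (for example, allowing a leading methionine or an affinity tag) and would use pattern-based searches rather than exact matches.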
The present disclosure contemplates any suitable means by which a guided editor is modified to include one or more NLSs. In one aspect, the guided editor may be designed to express a guided editor protein that translationally fuses one or more NLS at its N-terminus or C-terminus (or both), i.e., to form a guided editor-NLS fusion construct. In other embodiments, the nucleotide sequence encoding the guided editor may be genetically modified to incorporate a reading frame encoding one or more NLS in the interior region of the encoded guided editor. In addition, the NLS may include various amino acid linkers or spacers encoded between the guide editor and the N-terminal, C-terminal, or internally attached NLS amino acid sequence (e.g., in the central region of the protein). Thus, the present disclosure also provides nucleotide constructs, vectors, and host cells for expressing a fusion protein comprising a guide editor and one or more NLS.
The guidance editors described herein may also include a nuclear localization signal that is linked to the guidance editors by one or more linkers, such as a polymer, amino acid, nucleic acid, polysaccharide, chemical, or nucleic acid linker element. The linkers within the contemplation of the present disclosure are not intended to be limiting in any way, may be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain), and are attached to the guided editor by any suitable strategy that enables the formation of bonds (e.g., covalent bonds, hydrogen bonds) between the guided editor and one or more NLSs.
C. Flap endonucleases (e.g., FEN1)
In various embodiments, the guided editor may comprise one or more flap endonucleases (e.g., FEN1), which are enzymes that catalyze the removal of 5' single-stranded DNA flaps. These enzymes process the 5' flaps formed during cellular processes, including DNA replication. The guided editing methods described herein may utilize endogenously supplied flap endonucleases or those provided in trans to remove the 5' flap of endogenous DNA formed at the target site during guided editing. Flap endonucleases are known in the art and are described in Patel et al., "Flap endonucleases pass 5'-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5'-ends," Nucleic Acids Research, 2012, 40(10): 4507-4519 and Tsutakawa et al., "Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily," Cell, 2011, 145(2): 198-211 (each of which is incorporated herein by reference). An exemplary flap endonuclease is FEN1, which may be represented by the amino acid sequence of SEQ ID NO: 15.
The flap endonuclease may also include any FEN1 variant, mutant or other flap endonuclease ortholog, homolog or variant. Non-limiting FEN1 variants are exemplified as follows:
In various embodiments, the guided editors contemplated herein may include any flap endonuclease variant of the sequences disclosed above having an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the sequences described above.
Other endonucleases that can be utilized by the present methods to facilitate removal of a 5' end single-stranded DNA flap include, but are not limited to, (1) TREX2 and (2) EXO1 (e.g., Keijzers et al., Biosci Rep. 2015, 35(3): e00206).
TREX2
Three prime repair exonuclease 2 (TREX2) - human
Accession number NM_080701
MSEAPRAETFVFLDLEATGLPSVEPEIAELSLFAVHRSSLENPEHDESGALVLPRVLDKLTLCMCPERPFTAKASEITGLSSEGLARCRKAGFDGAVVRTLQAFLSRQAGPICLVAHNGFDYDFPLLCAELRRLGARLPRDTVCLDTLPALRGLDRAHSHGTRARGRQGYSLGSLFHRYFRAEPSAAHSAEGDVHTLLLIFLHRAAELLAWADEQARGWAHIEPMYLPPDDPSLEA(SEQ ID NO:165)
Three prime repair exonuclease 2 (TREX2) - mouse
Accession number NM_011907
MSEPPRAETFVFLDLEATGLPNMDPEIAEISLFAVHRSSLENPERDDSGSLVLPRVLDKLTLCMCPERPFTAKASEITGLSSESLMHCGKAGFNGAVVRTLQGFLSRQEGPICLVAHNGFDYDFPLLCTELQRLGAHLPQDTVCLDTLPALRGLDRAHSHGTRAQGRKSYSLASLFHRYFQAEPSAAHSAEGDVHTLLLIFLHRAPELLAWADEQARSWAHIEPMYVPPDGPSLEA(SEQ ID NO:166)
Three prime repair exonuclease 2 (TREX2) - rat
Accession number NM_001107580
MSEPLRAETFVFLDLEATGLPNMDPEIAEISLFAVHRSSLENPERDDSGSLVLPRVLDKLTLCMCPERPFTAKASEITGLSSEGLMNCRKAAFNDAVVRTLQGFLSRQEGPICLVAHNGFDYDFPLLCTELQRLGAHLPRDTVCLDTLPALRGLDRVHSHGTRAQGRKSYSLASLFHRYFQAEPSAAHSAEGDVNTLLLIFLHRAPELLAWADEQARSWAHIEPMYVPPDGPSLEA(SEQ ID NO:167)
EXO1
Human exonuclease 1 (EXO1) has been implicated in many different DNA metabolic processes, including DNA mismatch repair (MMR), microhomology-mediated end joining, homologous recombination (HR), and replication. Human EXO1 belongs to a family of eukaryotic nucleases, Rad2/XPG, which also includes FEN1 and GEN1. The Rad2/XPG family is conserved in the nuclease domain across species from phage to human. The EXO1 gene product exhibits both 5' exonuclease and 5' flap activity. In addition, EXO1 has an intrinsic 5' RNase H activity. Human EXO1 has a high affinity for processing double-stranded DNA (dsDNA), nicks, gaps, and pseudo-Y structures, and can resolve Holliday junctions using its inherent flap activity. Human EXO1 is implicated in MMR and contains conserved binding domains that interact directly with MLH1 and MSH2. EXO1 nucleolytic activity is positively stimulated by PCNA, MutSα (MSH2/MSH6 complex), 14-3-3, MRN, and the 9-1-1 complex.
Exonuclease 1 (EXO1), accession number NM_003686 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3) - isoform A
MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDRYVGFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVSEARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITEDSDLLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDYLSSLRGIGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLYQLVFDPIKRKLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYNPDTAMPAHSRSHSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVGVERVISTKGLNLPRKSSIVKRPRSAELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEVFVPDLVNGPTNKKSVSTPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDCVSNKVSIQPLDETAVTDKENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDHIPDKATVFTDEESYSFESSKFTRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQFRRKSDSPTSLPENNMSDVSQLKSEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLSQCSSKDSDSEESDCNIKLLDSQSDQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLSTTKIKPLGPARASGLSKKPASIQKRKHHNAENKPGLQIKLNELWKNFGFKKF(SEQ ID NO:168)
Exonuclease 1 (EXO1), accession number NM_006027 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3) - isoform B
MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDRYVGFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVSEARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITEDSDLLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDYLSSLRGIGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLYQLVFDPIKRKLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYNPDTAMPAHSRSHSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVGVERVISTKGLNLPRKSSIVKRPRSAELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEVFVPDLVNGPTNKKSVSTPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDCVSNKVSIQPLDETAVTDKENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDHIPDKATVFTDEESYSFESSKFTRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQFRRKSDSPTSLPENNMSDVSQLKSEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLSQCSSKDSDSEESDCNIKLLDSQSDQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLSTTKIKPLGPARASGLSKKPASIQKRKHHNAENKPGLQIKLNELWKNFGFKKDSEKLPPCKKPLSPVRDNIQLTPEAEEDIFNKPECGRVQRAIFQ(SEQ ID NO:169)
Exonuclease 1 (EXO1), accession number NM_001319224 (Homo sapiens exonuclease 1 (EXO1), transcript variant 4) - isoform C
MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDRYVGFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVSEARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITEDSDLLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDYLSSLRGIGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLYQLVFDPIKRKLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYNPDTAMPAHSRSHSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVGVERVISTKGLNLPRKSSIVKRPRSELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEVFVPDLVNGPTNKKSVSTPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDCVSNKVSIQPLDETAVTDKENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDHIPDKATVFTDEESYSFESSKFTRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQFRRKSDSPTSLPENNMSDVSQLKSEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLSQCSSKDSDSEESDCNIKLLDSQSDQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLSTTKIKPLGPARASGLSKKPASIQKRKHHNAENKPGLQIKLNELWKNFGFKKDSEKLPPCKKPLSPVRDNIQLTPEAEEDIFNKPECGRVQRAIFQ(SEQ ID NO:170)
D. Inteins and split inteins
It will be appreciated that in some embodiments (e.g., delivery of a guided editor in vivo using AAV particles), it may be advantageous to split a polypeptide (e.g., a deaminase or a napDNAbp) or a fusion protein (e.g., a guided editor) into an N-terminal half and a C-terminal half, deliver them separately, and then allow them to colocalize and reform the complete protein (or fusion protein, as the case may be) within the cell. The separate halves of the protein or fusion protein may each comprise a split-intein tag to facilitate reformation of the complete protein or fusion protein by the mechanism of protein trans-splicing.
Protein trans-splicing, catalyzed by split inteins, provides an entirely enzymatic method for protein ligation. A split intein is essentially a contiguous intein (e.g., a mini-intein) split into two pieces, named the N-intein and the C-intein, respectively. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction in essentially the same way as a contiguous intein does. Split inteins have been found in nature and have also been engineered in the laboratory. As used herein, the term "split intein" refers to any intein in which one or more peptide bond breaks exist between the N-terminal and C-terminal amino acid sequences, such that the N-terminal and C-terminal sequences become separate molecules that can non-covalently reassociate, or reconstitute, into an intein that is functional for trans-splicing reactions. Any catalytically active intein, or fragment thereof, may be used to derive a split intein for use in the methods of the invention. For example, in one aspect, the split intein may be derived from a eukaryotic intein. In another aspect, the split intein may be derived from a bacterial intein. In another aspect, the split intein may be derived from an archaeal intein. Preferably, the split intein so derived will possess only the amino acid sequences essential for catalyzing trans-splicing reactions.
As used herein, the "N-terminal split intein (In)" refers to any intein sequence comprising an N-terminal amino acid sequence that is functional for trans-splicing reactions. An In thus also comprises a sequence that is spliced out when trans-splicing occurs. An In may comprise a sequence that is a modification of the N-terminal portion of a naturally occurring intein sequence. For example, an In may comprise additional amino acid residues and/or mutated residues, provided that the inclusion of such additional and/or mutated residues does not render the In non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the In.
As used herein, the "C-terminal split intein (Ic)" refers to any intein sequence comprising a C-terminal amino acid sequence that is functional for trans-splicing reactions. In one aspect, the Ic comprises 4 to 7 contiguous amino acid residues, at least 4 of which are from the last β-strand of the intein from which it was derived. An Ic thus also comprises a sequence that is spliced out when trans-splicing occurs. An Ic may comprise a sequence that is a modification of the C-terminal portion of a naturally occurring intein sequence. For example, an Ic may comprise additional amino acid residues and/or mutated residues, provided that the inclusion of such additional and/or mutated residues does not render the Ic non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the Ic.
In some embodiments of the invention, a peptide linked to an Ic or an In may comprise an additional chemical moiety, including, among others, fluorophores, biotin, polyethylene glycol (PEG), amino acid analogs, unnatural amino acids, phosphate groups, glycosyl groups, radioisotope labels, and drug molecules. In other embodiments, a peptide linked to an Ic may comprise one or more chemically reactive groups, including, among others, ketone, aldehyde, Cys residues, and Lys residues. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction when an "intein-splicing polypeptide (ISP)" is present. As used herein, "intein-splicing polypeptide (ISP)" refers to the portion of the amino acid sequence of a split intein that remains when the Ic, the In, or both are removed from the split intein. In certain embodiments, the In comprises the ISP. In another embodiment, the Ic comprises the ISP. In yet another embodiment, the ISP is a separate peptide that is not covalently linked to the In or to the Ic.
Split inteins may be generated from contiguous inteins by engineering one or more split sites in the unstructured loop or intervening amino acid sequence between the ~12 conserved β-strands present in the mini-intein structure. There may be some flexibility in the position of the split site within the regions between the β-strands, provided that the split does not disrupt the structure of the intein, particularly the structured β-strands, to an extent sufficient to result in loss of protein splicing activity.
In protein trans-splicing, one precursor protein consists of an N-extein moiety followed by the N-intein, the other precursor protein consists of the C-intein followed by a C-extein moiety, and the trans-splicing reaction (catalyzed jointly by the N- and C-inteins) excises the two intein sequences and joins the two extein sequences with a peptide bond. Protein trans-splicing is an enzymatic reaction that can proceed at very low (e.g., micromolar) protein concentrations and under physiological conditions.
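The bookkeeping of protein trans-splicing described above can be illustrated with a toy string model. This is only a sketch under stated assumptions: the function name `trans_splice` and the placeholder sequences are hypothetical, they do not correspond to SEQ ID NOs: 16-23, and no chemistry or folding is modeled.

```python
def trans_splice(n_precursor: str, c_precursor: str,
                 int_n: str, int_c: str) -> str:
    """Toy model of protein trans-splicing (hypothetical helper).

    n_precursor is assumed to be N-extein + N-intein, and
    c_precursor is assumed to be C-intein + C-extein.
    Returns the ligated extein product.
    """
    if not int_n or not int_c:
        raise ValueError("both intein fragments must be non-empty")
    if not n_precursor.endswith(int_n):
        raise ValueError("N-precursor must end with the N-intein")
    if not c_precursor.startswith(int_c):
        raise ValueError("C-precursor must start with the C-intein")
    n_extein = n_precursor[: len(n_precursor) - len(int_n)]
    c_extein = c_precursor[len(int_c):]
    # Both intein fragments are excised; the two exteins are joined
    # (in the cell, by a new peptide bond).
    return n_extein + c_extein


# Placeholder sequences: lowercase marks the (hypothetical) intein parts.
product = trans_splice("MKTAAAxxxx", "yyyCFNGG", "xxxx", "yyy")
print(product)  # MKTAAACFNGG
```

The same bookkeeping applies when the two precursors are two separately expressed halves of a larger fusion protein.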
Exemplary sequences are represented by SEQ ID NOS.16-23.
Although inteins are most often found as a contiguous domain, some exist in a naturally split form. In this case, the two fragments are expressed as separate polypeptides and must associate before splicing takes place, a process referred to as protein trans-splicing.
An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely DnaE-N and DnaE-C. The two different subunits are encoded by separate genes, namely dnaE-n and dnaE-c, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.
Other naturally occurring or engineered split intein sequences are known or can be prepared from the complete intein sequences described herein or those available in the art. Examples of split intein sequences can be found in Stevens et al., "A promiscuous split intein with expanded protein engineering applications," PNAS, 2017, Vol. 114: 8538-8543; and Iwai et al., "Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme," FEBS Lett, 580: 1853-1858, each of which is incorporated herein by reference. Additional split intein sequences can be found, for example, in WO2013/045632, WO2014/055782, WO2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference.
Furthermore, protein trans-splicing has been described in vivo and in vitro (Shingledecker et al., Gene 207:187 (1998); Southworth et al., EMBO J. 17:918 (1998); Mills et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew et al., J. Biol. Chem., 273:15887-15890 (1998); Wu et al., Biochim. Biophys. Acta 35732:1 (1998b); Yamazaki et al., J. Am. Chem. Soc. 120:5591 (1998); Evans et al., J. Biol. Chem. 275:9091 (2000); Otomo et al., Biochemistry 38:16040-16044 (1999); Otomo et al., J. Biomol. NMR 14:105-114 (1999)) and provides the opportunity to express a protein as two separately expressed halves that subsequently undergo ligation to form a functional product, for example as shown in FIG. 96.
E. RNA-protein interaction domain
In various embodiments, two separate protein domains (e.g., a Cas9 domain and a polymerase domain) can be co-localized with one another to form a functional complex (similar in function to a fusion protein comprising the two separate protein domains) by using an "RNA-protein recruitment system," such as the "MS2 tagging technique." Such systems typically tag one protein domain with an "RNA-protein interaction domain" (also known as an "RNA-protein recruitment domain"), e.g., a specific hairpin structure, and tag the other protein domain with an "RNA-binding protein" that specifically recognizes and binds to the RNA-protein interaction domain. These types of systems can be used to co-localize the domains of a guided editor and to recruit additional functionalities, such as a UGI domain, to a guided editor. In one example, the MS2 tagging technique is based on the natural interaction of the MS2 bacteriophage coat protein ("MCP" or "MS2cp") with a stem-loop or hairpin structure present in the phage genome, i.e., the "MS2 hairpin." The MS2 hairpin is recognized and bound by the MCP. Thus, in one exemplary scenario, a deaminase-MS2 fusion can recruit a Cas9-MCP fusion.
Reviews of other modular RNA-protein interaction domains are described in, for example, Johansson et al., "RNA recognition by the MS2 phage coat protein," Sem Virol., 1997, Vol. 8(3): 176-185; Delebecque et al., "Organization of intracellular reactions with rationally designed RNA assemblies," Science, 2011, Vol. 333: 470-474; Mali et al., "Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering," Nat. Biotechnol., 2013, Vol. 31: 833-838; and Zalatan et al., "Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds," Cell, 2015, Vol. 160: 339-350, each of which is incorporated herein by reference in its entirety. Other systems include the PP7 hairpin, which specifically recruits the PCP protein, and the "Com" hairpin, which specifically recruits the Com protein. See Zalatan et al.
The nucleotide sequence of the MS2 hairpin is represented by SEQ ID NO: 24.
The amino acid sequence of MCP or MS2cp is represented by SEQ ID NO. 25.
F. UGI domain
In other embodiments, the guided editors described herein may comprise one or more uracil glycosylase inhibitor domains. As used herein, the term "uracil glycosylase inhibitor (UGI)" or "UGI domain" refers to a protein capable of inhibiting a uracil-DNA glycosylase base excision repair enzyme. In some embodiments, the UGI domain comprises a wild-type UGI or the UGI shown in SEQ ID NO: 171. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, the UGI domain comprises a fragment of the amino acid sequence shown in SEQ ID NO: 171. In some embodiments, the UGI fragment comprises an amino acid sequence comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence shown in SEQ ID NO: 171. In some embodiments, the UGI comprises an amino acid sequence that is homologous to the amino acid sequence shown in SEQ ID NO: 171, or an amino acid sequence that is homologous to a fragment of the amino acid sequence shown in SEQ ID NO: 171. In some embodiments, a protein comprising a UGI, a fragment of a UGI, or a homolog of a UGI or UGI fragment is referred to as a "UGI variant." A UGI variant shares homology with UGI or a fragment thereof. For example, the UGI variant is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild-type UGI or the UGI shown in SEQ ID NO: 171.
In some embodiments, the UGI variant comprises a fragment of UGI such that the fragment is at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of a wild-type UGI or the UGI shown in SEQ ID NO: 171. In some embodiments, the UGI comprises the following amino acid sequence:
Uracil-DNA glycosylase inhibitor:
>sp|P14739|UNGI_BPPB2
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 171).
The guided editors described herein may comprise more than one UGI domain, which may be separated by one or more linkers as described herein.
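The percent-identity thresholds recited above for UGI variants can be checked with a short script. The following is a minimal sketch that assumes the candidate and the reference have equal length and are compared position by position without gaps; a real comparison would normally use a sequence alignment algorithm. The function names and the example variant are hypothetical; the reference string is the sequence labeled SEQ ID NO: 171 above.

```python
# Reference UGI sequence as given above (SEQ ID NO: 171).
UGI_REF = ("MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL"
           "TSDAPEYKPWALVIQDSNGENKIKML")


def percent_identity(candidate: str, reference: str = UGI_REF) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences")
    if not reference:
        raise ValueError("reference must be non-empty")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(candidate: str, threshold: float,
                             reference: str = UGI_REF) -> bool:
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical variant with a single substitution at the first residue.
variant = "A" + UGI_REF[1:]
print(round(percent_identity(variant), 1))
print(meets_identity_threshold(variant, 95.0))
```

Because the comparison is ungapped, insertions or deletions would be scored as long runs of mismatches, which is why an alignment step is assumed for anything beyond substitutions.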
G. Other PE elements
In certain embodiments, the guided editors described herein may comprise an inhibitor of base repair. The term "inhibitor of base repair" or "IBR" refers to a protein capable of inhibiting the activity of a nucleic acid repair enzyme (e.g., a base excision repair enzyme). In some embodiments, the IBR is an inhibitor of OGG base excision repair. In some embodiments, the IBR is an inhibitor of base excision repair ("iBER"). Exemplary inhibitors of base excision repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 EndoI, T4PDG, UDG, hSMUG1, and hAAG. In some embodiments, the IBR is an inhibitor of Endo V or hAAG. In some embodiments, the IBR is an iBER that may be a catalytically inactive glycosylase, a catalytically inactive dioxygenase, or a small molecule or peptide inhibitor of an oxidase, or a variant thereof. In some embodiments, the IBR is an iBER that may be a TDG inhibitor, an MBD4 inhibitor, or an inhibitor of an AlkBH enzyme. In some embodiments, the IBR is an iBER comprising a catalytically inactive TDG or a catalytically inactive MBD4. An exemplary catalytically inactive TDG is the N140A mutant of SEQ ID NO: 175 (human TDG).
Some exemplary glycosylases are provided below. Catalytically inactive variants of any of these glycosylase domains are iBERs that can be fused to the napDNAbp or the polymerase domain of the guided editors provided by the present disclosure.
OGG (human)
MPARALLPRRMGHRTLASTPALWASIPCPRSELRLDLVLPSGQSFRWREQSPAHWSGVLADQVWTLTQTEEQLHCTVYRGDKSQASRPTPDELEAVRKYFQLDVTLAQLYHHWGSVDSHFQEVAQKFQGVRLLRQDPIECLFSFICSSNNNIARITGMVERLCQAFGPRLIQLDDVTYHGFPSLQALAGPEVEAHLRKLGLGYRARYVSASARAILEEQGGLAWLQQLRESSYEEAHKALCILPGVGTKVADCICLMALDKPQAVPVDVHMWHIAQRDYSWHPTTSQAKGPSPQTNKELGNFFRSLWGPYAGWAQAVLFSADLRQSRHAQEPPAKRRKGSKGPEG(SEQ ID NO:172)
MPG (human)
MVTPALQMKKPKQFCRRMGQKKQRPARAGQPHSSSDAAQAPAEQPHSSSDAAQAPCPRERCLGPPTTPGPYRSIYFSSPKGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRIVETEAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLETMRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVGVGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQA(SEQ ID NO:173)
MBD4 (human)
MGTTGLESLSLGDRGAAPTVTSSERLVPDPPNDLRKEDVAMELERVGEDEEQMMIKRSSECNPLLQEPIASAQFGATAGTECRKSVPCGWERVVKQRLFGKTAGRFDVYFISPQGLKFRSKSSLANYLHKNGETSLKPEDFDFTVLSKRGIKSRYKDCSMAALTSHLQNQSNNSNWNLRTRSKCKKDVFMPPSSSSELQESRGLSNFTSTHLLLKEDEGVDDVNFRKVRKPKGKVTILKGIPIKKTKKGCRKSCSGFVQSDSKRESVCNKADAESEPVAQKSQLDRTVCISDAGACGETLSVTSEENSLVKKKERSLSSGSNFCSEQKTSGIINKFCSAKDSEHNEKYEDTFLESEEIGTKVEVVERKEHLHTDILKRGSEMDNNCSPTRKDFTGEKIFQEDTIPRTQIERRKTSLYFSSKYNKEALSPPRRKAFKKWTPPRSPFNLVQETLFHDPWKLLIATIFLNRTSGKMAIPVLWKFLEKYPSAEVARTADWRDVSELLKPLGLYDLRAKTIVKFSDEYLTKQWKYPIELHGIGKYGNDSYRIFCVNEWKQVHPEDHKLNKYHDWLWENHEKLSLS(SEQ ID NO:174)
TDG (human)
MEAENAGSYSLQQAQAFYTFPFQQLMAEAPNMAVVNEQQMPEEVPAPAPAQEPVQEAPKGRKRKPRTTEPKQPVEPKKPVESKKSGKSAKSKEKQEKITDTFKVKRKVDRFNGVSEAELLTKTLPDILTFNLDIVIIGINPGLMAAYKGHHYPGPGNHFWKCLFMSGLSEVQLNHMDDHTLPGKYGIGFTNMVERTTPGSKDLSSKEFREGGRILVQKLQKYQPRIAVFNGKCIYEIFSKEVFGVKVKNLEFGLQPHKIPDTETLCYVMPSSSARCAQFPRAQDKVHYYIKLKDLRDQLKGIERNMDVQEVQYTFDLQLAQEDAKKMAVKEEKYDPGYEAAYGGAYGENPCSSEPCGFSSNGLIESVELRGESAFSGIPNGQWMTQSFTDQIPSFSNHCGTQEQEEESHA(SEQ ID NO:175)
In some embodiments, the fusion proteins described herein can comprise one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the guided editor components). The fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Other exemplary features that may be present are localization sequences, such as cytoplasmic localization sequences, export sequences (e.g., nuclear export sequences), or other localization sequences, as well as sequence tags that are useful for solubilization, purification, or detection of the fusion proteins.
Examples of protein domains that can be fused to a guided editor or a component thereof (e.g., the napDNAbp domain, the polymerase domain, or the NLS domain) include, but are not limited to, epitope tags and reporter gene sequences. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins, including blue fluorescent protein (BFP). The guided editor may be fused to a gene sequence encoding a protein or protein fragment that binds to DNA molecules or to other cellular molecules, including but not limited to maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Other domains that may form part of a guided editor are described in U.S. Patent Publication No. 2011/0059502, published March 10, 2011, which is incorporated herein by reference in its entirety.
In one aspect of the disclosure, a reporter gene, which includes but is not limited to glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins, including blue fluorescent protein (BFP), can be introduced into a cell to encode a gene product that serves as a marker for measuring the alteration or modification of expression of the gene product. In certain embodiments of the present disclosure, the gene product is luciferase. In another embodiment of the present disclosure, expression of the gene product is reduced.
Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags (also known as histidine tags or His-tags), maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Other suitable sequences will be apparent to those skilled in the art. In some embodiments, the fusion protein comprises one or more His-tags.
In some embodiments of the present disclosure, the activity of the guided editing system may be temporally regulated by adjusting the residence time, the amount, and/or the activity of the expressed components of the PE system. For example, as described herein, the PE may be fused to a protein domain capable of altering the intracellular half-life of the PE. In certain embodiments involving two or more vectors (e.g., a vector system in which the components described herein are encoded on two or more separate vectors), the activity of the PE system can be temporally regulated by controlling the timing of vector delivery. For example, in some embodiments, the vector encoding the nuclease system may deliver the PE before the vector encoding the template. In other embodiments, the vector encoding the pegRNA may deliver the guide before the vector encoding the PE system. In some embodiments, the vectors encoding the PE system and the pegRNA are delivered simultaneously. In certain embodiments, the simultaneously delivered vectors deliver, e.g., the PE, pegRNA, and/or second strand guide RNA components in a temporal sequence. In further embodiments, the RNA transcribed from the coding sequence on the vector (e.g., a nuclease transcript) may further comprise at least one element capable of altering the intracellular half-life of the RNA and/or modulating translational control. In some embodiments, the half-life of the RNA can be increased. In some embodiments, the half-life of the RNA can be reduced. In some embodiments, the element may be capable of increasing the stability of the RNA. In some embodiments, the element may be capable of reducing the stability of the RNA. In some embodiments, the element may be within the 3' UTR of the RNA. In some embodiments, the element may comprise a polyadenylation signal (PA). In some embodiments, the element may comprise a cap, e.g., an upstream mRNA or pegRNA end.
In some embodiments, the RNA may not comprise a PA, such that it is degraded more rapidly in the cell after transcription. In some embodiments, the element may comprise at least one AU-rich element (ARE). AREs can be bound by ARE binding proteins (ARE-BPs) in a manner that depends on tissue type, cell type, timing, cellular localization, and environment. In some embodiments, the destabilizing element may promote RNA decay, affect RNA stability, or activate translation. In some embodiments, an ARE may be 50 to 150 nucleotides in length. In some embodiments, an ARE may comprise at least one copy of the sequence AUUUA. In some embodiments, at least one ARE may be added to the 3' UTR of the RNA. In some embodiments, the element may be a woodchuck hepatitis virus (WHP) post-transcriptional regulatory element (WPRE), which creates a tertiary structure to enhance expression from the transcript. In further embodiments, the element is a modified and/or truncated WPRE sequence capable of enhancing expression from the transcript, as described, for example, in Zufferey et al., J Virol, 73(4): 2886-92 (1999) and Flajolet et al., J Virol, 72(7): 6175-80 (1998). In some embodiments, the WPRE or an equivalent may be added to the 3' UTR of the RNA. In some embodiments, the element may be selected from other RNA sequence motifs that are enriched in either fast-decaying or slow-decaying transcripts.
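The AU-rich element criterion mentioned above (at least one copy of the sequence AUUUA, with an ARE of 50 to 150 nucleotides in some embodiments) lends itself to a simple sequence scan. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be a 3' UTR given as an RNA or DNA string, and a motif match says nothing about whether ARE binding proteins actually bind.

```python
def find_are_motifs(utr: str, motif: str = "AUUUA") -> list[int]:
    """Return 0-based start positions of all (possibly overlapping)
    copies of `motif` in a 3' UTR; DNA input is converted to RNA."""
    rna = utr.upper().replace("T", "U")
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one so that overlapping copies are also reported.
        start = rna.find(motif, start + 1)
    return positions


def looks_like_are(utr: str, min_len: int = 50, max_len: int = 150) -> bool:
    """Crude check: length within the recited range and >= 1 AUUUA copy."""
    return min_len <= len(utr) <= max_len and bool(find_are_motifs(utr))


print(find_are_motifs("GGAUUUAUUUACC"))  # [2, 6]
print(find_are_motifs("ggatttacc"))      # [2]
```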
In some embodiments, the vector encoding the PE or pegRNA may self-destruct through cleavage, by the PE system, of a target sequence present on the vector. The cleavage may prevent continued transcription of the PE or pegRNA from the vector. Although transcription may continue on the linearized vector for some period of time, the expressed transcripts or proteins, which are subject to intracellular degradation, will have less time to produce off-target effects in the absence of continued supply from expression of the encoding vector.
[6] Modified pegRNA
The guided editing systems described herein contemplate the use of any suitable pegRNA, particularly a pegRNA modified to include one or more of the structural motifs disclosed herein that confer improved properties, such as increased stability and/or increased affinity for Cas9. The inventors have surprisingly found that attaching certain nucleotide structural motifs to the pegRNA, for example to the end of the extension arm of the pegRNA, achieves a consistent increase in editing activity. Such motifs include, but are not limited to, the prequeosine1-1 riboswitch aptamer ("evopreQ1-1"), a pseudoknot from the MMLV viral genome ("Mpknot-1"), and a modified tRNA used by MMLV RT as a primer for reverse transcription.
Typical pegRNA Structure
FIG. 3A shows one embodiment of a typical pegRNA that can be modified and then used in the guided editing system disclosed herein. A typical pegRNA (i.e., one lacking any of the modifications described herein) comprises a traditional guide RNA (green portion) that includes a spacer sequence of about 20 nt and a gRNA core region that binds to the napDNAbp. A typical pegRNA also includes an extended RNA segment at the 5' end, i.e., a 5' extension, or an extended RNA segment at the 3' end, i.e., a 3' extension. The 5' extension includes a reverse transcription template sequence, a reverse transcription primer binding site, and an optional linker sequence of 5-20 nucleotides. As shown in FIGS. 1A-1B, the RT primer binding site hybridizes to the free 3' end formed after a nick is introduced in the non-target strand of the R-loop, thereby priming the reverse transcriptase for DNA polymerization in the 5' to 3' direction.
FIG. 3B shows another embodiment of a pegRNA useful in the guided editing system disclosed herein, wherein a traditional guide RNA (green portion) comprises a protospacer of about 20 nt and a gRNA core that binds to the napDNAbp. In this embodiment, the guide RNA comprises an extended RNA segment at the 3' end, i.e., a 3' extension. In this embodiment, the 3' extension includes a reverse transcription template sequence and a reverse transcription primer binding site. As shown in FIGS. 1C-1D, the RT primer binding site hybridizes to the free 3' end formed after a nick is introduced in the non-target strand of the R-loop, thereby priming the reverse transcriptase for DNA polymerization in the 5' to 3' direction.
FIG. 3C shows another embodiment of a pegRNA useful in the guided editing system disclosed herein, wherein a traditional guide RNA (green portion) comprises a protospacer of about 20 nt and a gRNA core that binds to the napDNAbp. In this embodiment, the guide RNA comprises an extended RNA segment at an intramolecular position within the gRNA core, i.e., an intramolecular extension. In this embodiment, the intramolecular extension includes a reverse transcription template sequence and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3' end formed after a nick is introduced in the non-target strand of the R-loop, thereby priming the reverse transcriptase for DNA polymerization in the 5' to 3' direction.
Any of these exemplary pegRNAs may be further modified to include one or more modifications described herein to increase the efficiency of guided editing.
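The 5'-extended and 3'-extended layouts described for FIGS. 3A and 3B can be sketched as simple string concatenation. This is an illustration under assumptions, not a design tool: the class name `PegRNA`, the component order within each extension (RT template 5' of the primer binding site, with the optional linker adjacent to the guide in the 5' extension), and all placeholder sequences are hypothetical, and the minimum lengths merely echo the "at least 3 nucleotides" embodiments recited in this section.

```python
from dataclasses import dataclass

RNA_ALPHABET = set("ACGU")


@dataclass(frozen=True)
class PegRNA:
    """Hypothetical container for pegRNA components (5'->3' RNA strings)."""
    spacer: str        # ~20 nt spacer
    scaffold: str      # gRNA core that binds the napDNAbp
    rt_template: str   # reverse transcription template
    pbs: str           # reverse transcription primer binding site
    linker: str = ""   # optional linker (used here only in the 5' extension)

    def __post_init__(self):
        for name in ("spacer", "scaffold", "rt_template", "pbs", "linker"):
            seq = getattr(self, name)
            if not set(seq) <= RNA_ALPHABET:
                raise ValueError(f"{name} must contain only A, C, G, U")
        if len(self.rt_template) < 3 or len(self.pbs) < 3:
            raise ValueError("RT template and PBS must each be >= 3 nt")

    def three_prime_extended(self) -> str:
        # Assumed order: 5'-spacer-scaffold-RT template-PBS-3'
        return self.spacer + self.scaffold + self.rt_template + self.pbs

    def five_prime_extended(self) -> str:
        # Assumed order: 5'-RT template-PBS-linker-spacer-scaffold-3'
        return (self.rt_template + self.pbs + self.linker
                + self.spacer + self.scaffold)


# Placeholder sequences only; the scaffold here is not a real gRNA core.
peg = PegRNA(spacer="G" * 20, scaffold="GUUUUAGA",
             rt_template="ACUC", pbs="GGGU", linker="AAAAA")
print(len(peg.three_prime_extended()))  # 36
```

A structural motif of the kind described in this section would, under the same assumptions, be appended to the end of the extension arm as one more concatenated segment.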
In one embodiment, the location of the intramolecular RNA extension is not in the protospacer of the guide RNA. In another embodiment, the location of the intramolecular RNA extension is in the gRNA core. In yet another embodiment, the location of the intramolecular RNA extension is any location in the guide RNA molecule other than within the protospacer or at a location that disrupts the protospacer.
In one embodiment, the intramolecular RNA extension is inserted downstream of the 3' end of the protospacer. In another embodiment, the intramolecular RNA extension is inserted at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, or at least 25 nucleotides downstream of the 3' end of the protospacer.
In other embodiments, the intramolecular RNA extension is inserted into the gRNA, which refers to the portion of the guide RNA that corresponds to or comprises the tracrRNA, which binds to and/or interacts with the Cas9 protein or an equivalent thereof (i.e., a different napDNAbp). Preferably, insertion of the intramolecular RNA extension does not disrupt, or only minimally disrupts, the interaction between the tracrRNA portion and the napDNAbp.
The length of the RNA extension (which includes at least the RT template and the primer binding site) may be any useful length. In various embodiments, the RNA extension is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
The RT template sequence may also be of any suitable length. For example, the RT template sequence may be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
In other embodiments, the reverse transcription primer binding site sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
In other embodiments, the optional linker or spacer sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
In certain embodiments, the RT template sequence encodes a single stranded DNA molecule that is homologous to the non-target strand (and thus complementary to the corresponding site of the target strand) but includes one or more nucleotide changes. The at least one nucleotide change may include one or more single base nucleotide changes, one or more deletions, and one or more insertions.
As shown in FIG. 1G, the single-stranded DNA product synthesized from the RT template sequence is homologous to the non-target strand and contains one or more nucleotide changes. The single-stranded DNA product of the RT template sequence hybridizes, in equilibrium, with the complementary target strand sequence, thereby displacing the homologous endogenous sequence. In some embodiments, the displaced endogenous strand may be referred to as a 5' endogenous DNA flap species (see, e.g., FIG. 1E). This 5' endogenous DNA flap species can be removed by a 5' flap endonuclease (e.g., FEN1), and the single-stranded DNA product, now hybridized to the endogenous target strand, can be ligated, thereby creating a mismatch between the endogenous sequence and the newly synthesized strand. The mismatch can be resolved by the cell's innate DNA repair and/or replication processes.
In various embodiments, the nucleotide sequence of the RT template sequence corresponds to the nucleotide sequence of a non-target strand that is displaced as a 5' flap species and overlaps with the site to be edited.
In various embodiments of the pegRNA, the reverse transcription template sequence can encode a single-stranded DNA flap complementary to the endogenous DNA sequence adjacent to the nicking site, wherein the single-stranded DNA flap comprises the desired nucleotide change. The single-stranded DNA flap can displace endogenous single-stranded DNA at the nicking site. The endogenous single stranded DNA displaced at the nicking site may have a 5' end and form an endogenous flap, which may be excised by the cell. In various embodiments, excision of the 5 'endogenous flap can aid in driving product formation, as removal of the 5' endogenous flap facilitates hybridization of the single-stranded 3'DNA flap to the corresponding complementary DNA strand, as well as incorporation or assimilation of the desired nucleotide change carried by the single-stranded 3' DNA flap into the target DNA.
In various embodiments of the pegRNA, cellular repair of the single stranded DNA flap results in the installation of the desired nucleotide change, thereby forming the desired product.
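The relationship between the nicked non-target strand, the primer binding site, and the RT template can be made concrete with reverse complements. The following is a minimal sketch, assuming the non-target strand is given 5' to 3' as DNA and the nick lies immediately 5' of `nick_index`; the function names, the default primer binding site length of 13 nt, and the toy sequences are all hypothetical choices for illustration.

```python
# DNA base -> complementary RNA base.
_DNA_TO_COMPLEMENTARY_RNA = str.maketrans("ACGT", "UGCA")


def rna_reverse_complement(dna: str) -> str:
    """RNA strand (5'->3') that base-pairs with the given DNA (5'->3')."""
    return dna.upper().translate(_DNA_TO_COMPLEMENTARY_RNA)[::-1]


def design_extension(nontarget_strand: str, nick_index: int,
                     edited_flap: str, pbs_len: int = 13) -> tuple[str, str]:
    """Return (rt_template, pbs) for a pegRNA extension arm.

    nontarget_strand: 5'->3' DNA of the strand that is nicked.
    nick_index: the nick lies between nick_index - 1 and nick_index.
    edited_flap: desired DNA (5'->3') to be synthesized 3' of the nick,
                 i.e. the non-target sequence carrying the intended changes.
    pbs_len: primer binding site length (13 is an assumed default).
    """
    if not 0 < pbs_len <= nick_index <= len(nontarget_strand):
        raise ValueError("nick_index must leave room for the PBS")
    # The free 3' end created by the nick acts as the primer.
    primer = nontarget_strand[nick_index - pbs_len:nick_index]
    pbs = rna_reverse_complement(primer)
    # The RT template is copied into DNA, so it is the reverse complement
    # of the desired flap.
    rt_template = rna_reverse_complement(edited_flap)
    return rt_template, pbs


# Toy example: the endogenous sequence 3' of the nick is GGGT; the edit is G->A.
rt_template, pbs = design_extension("AAACCCGGGTTT", nick_index=6,
                                    edited_flap="GAGT", pbs_len=4)
print(rt_template, pbs)  # ACUC GGGU
```

Read 5' to 3', the extension arm in this sketch is the RT template followed by the primer binding site, which is the reverse complement of the primer region joined to the edited flap.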
In other embodiments, the desired nucleotide changes are installed in the following edit window: between about -5 and +5 of the incision site, or between about -10 and +10 of the incision site, or between about -20 and +20 of the incision site, or between about -30 and +30 of the incision site, or between about -40 and +40 of the incision site, or between about -50 and +50 of the incision site, or between about -60 and +60 of the incision site, or between about -70 and +70 of the incision site, or between about -80 and +80 of the incision site, or between about -90 and +90 of the incision site, or between about -100 and +100 of the incision site, or between about -200 and +200 of the incision site.
In other embodiments, the desired nucleotide changes are installed in the following edit window: between about +1 and +2 from the incision site, or about +1 to +3, +1 to +4, +1 to +5, and so on in single-nucleotide increments through +1 to +124 or +1 to +125 from the incision site.
In other embodiments, the desired nucleotide changes are installed in the following edit window: about +1 to +2 from the incision site, or about +1 to +5, +1 to +10, +1 to +15, +1 to +20, +1 to +25, +1 to +30, +1 to +35, +1 to +40, +1 to +45, +1 to +50, +1 to +55, +1 to +100, +1 to +105, +1 to +110, +1 to +115, +1 to +120, +1 to +125, +1 to +130, +1 to +135, +1 to +140, +1 to +145, +1 to +150, +1 to +155, +1 to +160, +1 to +165, +1 to +170, +1 to +175, +1 to +180, +1 to +185, +1 to +190, +1 to +195, or +1 to +200 from the incision site.
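Whether a planned edit falls inside one of the edit windows recited above can be checked with a few lines of code. The sketch below assumes a particular coordinate convention that the text does not spell out: the incision (nick) lies immediately 5' of `nick_index`, the first nucleotide 3' of it is +1, the first nucleotide 5' of it is -1, and there is no position 0. All names are hypothetical.

```python
def relative_position(index: int, nick_index: int) -> int:
    """Map a 0-based strand index to a position relative to the nick.

    Assumed convention: index == nick_index is +1,
    index == nick_index - 1 is -1, and position 0 does not exist.
    """
    offset = index - nick_index
    return offset + 1 if offset >= 0 else offset


def in_edit_window(position: int, window: tuple[int, int]) -> bool:
    """True if a nick-relative position lies within (low, high), inclusive."""
    low, high = window
    return position != 0 and low <= position <= high


def all_edits_in_window(indices, nick_index: int,
                        window: tuple[int, int]) -> bool:
    """True if every edited strand index falls inside the window."""
    return all(in_edit_window(relative_position(i, nick_index), window)
               for i in indices)


print(relative_position(6, nick_index=6))        # 1
print(relative_position(5, nick_index=6))        # -1
print(all_edits_in_window([6, 10], 6, (1, 5)))   # True
print(all_edits_in_window([6, 11], 6, (1, 5)))   # False
```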
In a different aspect, the pegRNA is a modified form of a guide RNA. Guide RNAs may be expressed from an encoding nucleic acid or synthesized chemically. Methods for obtaining or otherwise synthesizing guide RNAs, and for determining the appropriate sequence of a guide RNA, including the protospacer sequence that interacts and hybridizes with the target strand of the genomic target site of interest, are well known in the art.
In various embodiments, the particular design aspects of a guide RNA sequence depend on the nucleotide sequence of the genomic target site of interest (i.e., the desired site to be edited) and on the type of napDNAbp (e.g., Cas9 protein) present in the guided editing system described herein, among other factors such as PAM sequence position, percent G/C content of the target sequence, degree of microhomology, secondary structure, etc.
In general, a guide sequence is any polynucleotide sequence that has sufficient complementarity to a target polynucleotide sequence to hybridize to the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is about or greater than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99% or more when optimally aligned using a suitable alignment algorithm. The optimal alignment may be determined using any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler transform (e.g., the Burrows-Wheeler Aligner), ClustalW, ClustalX, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, the guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
In some embodiments, the guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12 or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a guide editor (PE) to a target sequence may be assessed by any suitable assay. For example, the components of a guide editor (PE), including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, e.g., by transfection with vectors encoding the guide editor (PE) components disclosed herein, followed by an assessment of preferential cleavage within the target sequence, e.g., by the Surveyor assay described herein. Similarly, cleavage of a target polynucleotide sequence may be assessed in vitro by providing the target sequence and the components of a guide editor (PE), including the guide sequence to be tested and a control guide sequence that differs from the test guide sequence, and comparing the binding or the rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible and will occur to those of skill in the art.
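By way of non-limiting illustration, the degree of complementarity between a guide sequence and its target, discussed above, may be computed as in the following Python sketch. The sketch assumes an ungapped, equal-length comparison rather than any of the alignment algorithms named above, and the function name `percent_complementarity` is hypothetical and not part of the disclosure.

```python
# Watson-Crick partner (DNA base) for each RNA base of the guide sequence.
PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def percent_complementarity(guide_rna, target_dna):
    """Percent of guide positions that base-pair with the target strand.

    guide_rna:  guide sequence written 5'->3' (A/C/G/U).
    target_dna: target strand written 5'->3' (A/C/G/T), same length.
    Assumption: ungapped antiparallel pairing; no optimal alignment is done.
    """
    if not guide_rna or len(guide_rna) != len(target_dna):
        raise ValueError("sequences must be non-empty and of equal length")
    # The strands pair antiparallel, so read the target 3'->5'.
    target_3to5 = target_dna.upper()[::-1]
    matches = sum(PAIR.get(g) == t for g, t in zip(guide_rna.upper(), target_3to5))
    return 100.0 * matches / len(guide_rna)
```

A guide that scores at or above a chosen threshold (e.g., 90%) under this simplified measure would then be a candidate for the experimental assays described above.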
The guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within the genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for Streptococcus pyogenes Cas9, a unique target sequence in the genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 176), where NNNNNNNNNNNNXGG (SEQ ID NO: 177) (N is A, G, T, or C; X may be any base) occurs only once in the genome. A unique target sequence in the genome may include a Streptococcus pyogenes Cas9 target site of the form MMMMMMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 178), where NNNNNNNNNNXGG (SEQ ID NO: 179) (N is A, G, T, or C; X may be any base) occurs only once in the genome. For Streptococcus thermophilus CRISPR1 Cas9, a unique target sequence in the genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNXXAGAAW (SEQ ID NO: 180), where NNNNNNNNNNXXAGAAW (SEQ ID NO: 181) (N is A, G, T, or C; X may be any base; W is A or T) occurs only once in the genome. A unique target sequence in the genome may include a Streptococcus thermophilus CRISPR1 Cas9 target site of the form MMMMMMMMMMNNNNNNNNNNNNXXAGAAAW (SEQ ID NO: 182), where NNNNNNNNNNXXAGAAAW (SEQ ID NO: 183) (N is A, G, T, or C; X may be any base; W is A or T) occurs only once in the genome. For Streptococcus pyogenes Cas9, a unique target sequence in the genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG (SEQ ID NO: 184), where NNNNNNNNNNNNXGGXG (SEQ ID NO: 185) (N is A, G, T, or C; X may be any base) occurs only once in the genome. A unique target sequence in the genome may include a Streptococcus pyogenes Cas9 target site of the form MMMMMMMMMMMMNNNNNNNNNNNNXGGXG (SEQ ID NO: 186), where NNNNNNNNNNXGGXG (SEQ ID NO: 187) (N is A, G, T, or C; X may be any base) occurs only once in the genome. In each of these sequences, "M" may be A, G, T, or C and need not be considered when determining whether the sequence is unique.
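By way of non-limiting illustration, the search for unique Streptococcus pyogenes Cas9 target sites of the form MMMMMMMMNNNNNNNNNNNNXGG may be sketched in Python as follows. The sketch is an assumption-laden simplification: it scans only the strand supplied, treats the 15-nucleotide portion (twelve N positions followed by XGG) as the sequence that must occur once, and uses the hypothetical function name `unique_spcas9_sites`.

```python
import re
from collections import Counter

# Lookahead so that overlapping 15-mers (12 N + X + GG) are all reported.
CORE = re.compile(r"(?=([ACGT]{13}GG))")


def unique_spcas9_sites(genome):
    """Return (start, site) pairs for 23-nt sites whose 15-nt core is unique.

    start is the 0-based index of the first "M" position on the given strand.
    Assumption: only the supplied strand is scanned; a real search would
    also scan the reverse complement.
    """
    genome = genome.upper()
    cores = [(m.start(), m.group(1)) for m in CORE.finditer(genome)]
    counts = Counter(core for _, core in cores)
    sites = []
    for pos, core in cores:
        # The eight "M" bases are ignored for uniqueness but must be present.
        if counts[core] == 1 and pos >= 8:
            sites.append((pos - 8, genome[pos - 8:pos + 15]))
    return sites
```

The eight "M" positions are returned as part of each site but play no role in the uniqueness count, consistent with the statement above that "M" need not be considered when determining uniqueness.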
In some embodiments, the guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example of a folding algorithm is the online web server RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, which uses the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and P. A. Carr and G. M. Church, 2009, Nature Biotechnology 27(12): 1151-62). Further algorithms can be found in U.S. application Ser. No. 61/836,080 (Broad Reference BI-2013/004A), incorporated herein by reference.
Generally, a tracr mate sequence includes any sequence that has sufficient complementarity to a tracr sequence to promote one or more of the following: (1) excision of the guide sequences flanking the tracr mate sequence in a cell containing the corresponding tracr sequence; and (2) formation of a complex at the target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, the degree of complementarity refers to the optimal alignment of the tracr mate sequence and the tracr sequence along the length of the shorter of the two sequences. The optimal alignment may be determined by any suitable alignment algorithm and may further account for secondary structure, such as self-complementarity within either the tracr sequence or the tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and the tracr mate sequence along the length of the shorter of the two, when optimally aligned, is about or greater than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99% or more. In some embodiments, the tracr sequence is about or greater than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50 or more nucleotides in length. In some embodiments, the tracr sequence and the tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure (e.g., a hairpin). The preferred loop-forming sequence for the hairpin structure is four nucleotides in length, and most preferably has the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequence preferably includes a nucleotide triplet (e.g., AAA) and an additional nucleotide (e.g., C or G). Examples of loop-forming sequences include CAAA and AAAG. In embodiments of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins.
In preferred embodiments, the transcript has 2, 3, 4 or 5 hairpins. In another embodiment of the invention, the transcript has at most 5 hairpins. In some embodiments, the single transcript further comprises a transcription termination sequence; preferably, this is a poly-T sequence, for example six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence and a tracr sequence are as follows (listed 5' to 3'), wherein "N" represents a base of the guide sequence, the first block of lowercase letters represents the tracr mate sequence, the second block of lowercase letters represents the tracr sequence, and the final poly-T sequence represents the transcription terminator:
(1)NNNNNNNNgtttttgtactctcaagatttaGAAAtaaatcttgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT(SEQ ID NO:188);
(2)NNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT(SEQ ID NO:189);
(3)NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgtTTTTT(SEQ ID NO:190);
(4)NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgcTTTTTT(SEQ ID NO:191);
(5) TagetTagetTagetTagetTagetTTT (SEQ ID NO: 192); and
(6)NNNNNNNNNNNNNNNNNNNNgttttagagctagAAATAGcaagttaaaataaggctagtccgttatcaTTTTTTTT(SEQ ID NO:193).
In some embodiments, sequences (1) to (3) are used in combination with Cas9 from Streptococcus thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from Streptococcus pyogenes. In some embodiments, the tracr sequence is a transcript separate from the transcript comprising the tracr mate sequence.
It will be apparent to those of skill in the art that, in order to target any fusion protein comprising a Cas9 domain and a single-stranded DNA binding protein, as disclosed herein, to a target site, such as a site comprising a point mutation to be edited, it is often desirable to co-express the fusion protein with a guide RNA (e.g., an sgRNA). As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework that allows Cas9 binding and a guide sequence that confers sequence specificity on the Cas9:nucleic acid editing enzyme/domain fusion protein.
In some embodiments, the guide RNA comprises the structure 5'-[guide sequence]-GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU-3' (SEQ ID NO: 194), wherein the guide sequence comprises a sequence that is complementary to the target sequence. The guide sequence is typically 20 nucleotides long. The sequences of suitable guide RNAs for targeting a Cas9:nucleic acid editing enzyme/domain fusion protein to a specific genomic target site will be apparent to those skilled in the art based on the present disclosure. Such suitable guide RNA sequences typically comprise a guide sequence that is complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Provided herein are some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to a specific target sequence. Other guide sequences are known in the art and may be used with the guide editor (PE) described herein.
In other embodiments, the pegRNA includes those depicted in FIG. 3D.
In other embodiments, the pegRNA may include those depicted in FIG. 3E.
FIG. 3D provides the structure of an embodiment of a pegRNA contemplated herein, which may be designed according to the method defined in Example 2. The pegRNA contains three main component elements arranged in the 5' to 3' direction, namely: a spacer, a gRNA core, and an extension arm at the 3' end. The extension arm may be further divided, in the 5' to 3' direction, into the following structural elements: an optional homology arm, a DNA synthesis template, and a primer binding site (PBS). In addition, the pegRNA may comprise an optional 3' end modification region (e1) and an optional 5' end modification region (e2). Still further, the pegRNA may comprise a transcription termination signal (not depicted) at the 3' end of the pegRNA. These structural elements are further defined herein. The depiction of the pegRNA structure is not meant to be limiting and encompasses variations in the arrangement of the elements. For example, the optional sequence modification regions (e1) and (e2) may be located within or between any of the other regions shown, and are not limited to being located at the 3' and 5' ends.
FIG. 3E provides the structure of another embodiment of a pegRNA contemplated herein, which may be designed according to the method defined in Example 2. The pegRNA contains three main component elements arranged in the 5' to 3' direction, namely: a spacer, a gRNA core, and an extension arm at the 3' end. The extension arm may be further divided, in the 5' to 3' direction, into the following structural elements: an optional homology arm, a DNA synthesis template, and a primer binding site (PBS). In addition, the pegRNA may comprise an optional 3' end modification region (e1) and an optional 5' end modification region (e2). Still further, the pegRNA may comprise a transcription termination signal (not depicted) at the 3' end of the pegRNA. These structural elements are further defined herein. The depiction of the pegRNA structure is not meant to be limiting and encompasses variations in the arrangement of the elements. For example, the optional sequence modification regions (e1) and (e2) may be located within or between any of the other regions shown, and are not limited to being located at the 3' and 5' ends.
In some embodiments, the PEgRNA or nick-generating guide RNA described herein comprises chemically modified nucleobases or nucleobase analogs. In some embodiments, the PEgRNA or nick-generating guide RNA comprises modified bases (e.g., methylated bases), inserted bases, modified sugars (e.g., 2'-fluororibose, 2'-deoxyribose, 2'-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioate and 5'-N-phosphoramidite linkages). In some embodiments, the PEgRNA comprises a 2'-O-methyl modification. In some embodiments, the PEgRNA comprises phosphorothioate linkages between the first three nucleotides and between the last three nucleotides of the RNA.
In some embodiments, the PEgRNA or nick-generating guide RNA described herein comprises a chemical modification comprising nebularine or deoxynebularine. In some embodiments, the PEgRNA or nick-generating guide RNA comprises a chemical modification comprising a phosphorothioate linkage. In some embodiments, the PEgRNA or nick-generating guide RNA comprises a phosphorothioate linkage at the 5' end or the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises two, and no more than two, consecutive phosphorothioate linkages at the 5' end or the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises three consecutive phosphorothioate linkages at the 5' end or the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises the sequence 5'-UsUsU-3' at the 3' end or the 5' end, wherein U represents uridine and s represents a phosphorothioate linkage. In some embodiments, the nucleobases may be chemically modified. Examples of nucleobase chemical modifications include, but are not limited to, 2-thiouridine, 4-thiouridine, N6-methyladenosine, pseudouridine, 2,6-diaminopurine, inosine, thymidine, 5-methylcytosine, 5-substituted pyrimidine, isoguanine, isocytosine, or halogenated aromatic groups.
Non-limiting examples of modifications can include 2 '-O-methyl (2' -O-Me), 2'-O- (2-methoxyethyl) (2' -O-MOE), 2 '-fluoro (2' -F), phosphorothioate (PS) linkages between nucleotides, G-C substitutions, and reverse abasic linkages between nucleotides and equivalents thereof. In some embodiments, the PEgRNA comprises a chemical modification selected from the group consisting of: dihydrouridine, inosine, 7-methylguanosine, 5-methylcytidine (5 mC), 5' -phosphonucleoside, 2' -O-methylnucleoside, 2' -O-ethylnucleoside, 2' -fluororiboside, C-5-propynyldeoxycytidine (pdC), C-5-propynyldeoxyuridine (pdU), C-5-propynyluridine (pU), 5-methylcytidine, 5-methyluridine, 5-methyldeoxycytidine, 5-methyldeoxyuridine methoxy, 2, 6-diaminopurine, 5' -dimethoxytrityl-N4-ethyl-2 ' -deoxycytidine, C-5-propynyl-f-cytidine (pfC), C-5-propynyl-f-uridine (pfU), 5-methylf-cytidine, 5-methylf-uridine, C-5-propynyl-meta-cytidine (pmC), 5-methylmeta-uridine, B (minor groove binder) pseudouridine (1-methyl-N4-ethyl-2 ' -deoxycytidine, C-5-propynyl-f-cytidine (pfC), 5-methylcytidine (Me) and 2' -fluororiboside modifications (Me), 2' -fluororiboside modifications (e.g., 2' -fluororiboside modifications (e.g., 2' -O-riboside modifications), locked Nucleic Acid (LNA), C-ethylene bridged nucleic acid (ENA), bridged Nucleic Acid (BNA), unlocked Nucleic Acid (UNA)), base or nucleobase modification, internucleoside linkage modification, ribonucleic acid, 2 '-O-methyl-aequorin or 2' -deoxyaequorin. 
Other examples of modifications include, but are not limited to, 5 'adenylate, 5' guanosine-triphosphate cap, 5 'N7-methylguanosine-triphosphate cap, 5' triphosphate cap, 3 'phosphate, 3' phosphorothioate, 5 'phosphate, 5' phosphorothioate, cis-Syn thymidine dimer, trimer, abasic, acridine, azobenzene, biotin BB, biotin TEG, cholesterol TEG, desulphate, TEG, DNP-X, DOTA, dT-biotin, bisbiotin, PC biotin, psoralen C2, psoralen C6, TINA, 3'DABCYL, black hole quencher 1, black hole quencher 2, DABCYL SE, dT-DABCYL, IRDye QC-1, QSY-21, QSY-35, QSY-7, QSY-9, carboxy-linker, thiol linker, 2' deoxyribonucleoside analog purine, 2 'deoxyribonucleoside analog, pyrimidine, nucleoside analog, 2' -methyl ribonucleoside, 2 '-phosphorodithionate, 3' -phospho-methyl nucleoside, 3 '-2, 3' -phosphorodithionate, 3 '-phospho-methyl-2, 3' -phosphorodithioate, fluorescent 2 '-methyl-ribonucleoside, 3' -phospho-2, fluorescent phosphate, 3 '-phospho-2' -methyl-phospho-2, or the like.
In some embodiments, the PEgRNA and/or nick-generating guide RNA provided in the present disclosure may carry a modification, such as a chemical modification or a biological modification. Modifications may be made at any position within the PEgRNA or nick-generating guide RNA, and may include one or more modifications to a nucleobase, a ribose component, the phosphate backbone, or any combination thereof. In some embodiments, the modification may be a structure-guided modification. In some embodiments, the modification is located at the 5' end and/or the 3' end of the PEgRNA. In some embodiments, the chemical modification is located at the 5' end and/or the 3' end of the nick-generating guide RNA. In some embodiments, the modification may be within the spacer, the extension arm, the DNA synthesis template, and/or the primer binding site of the PEgRNA. In some embodiments, the modification may be within the spacer sequence or the gRNA core of the PEgRNA or nick-generating guide RNA. In some embodiments, the modification may be within the 3'-most end of the PEgRNA or nick-generating guide RNA. In some embodiments, the modification may be within the 5'-most end of the PEgRNA or nick-generating guide RNA. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more modified nucleotides at the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more modified nucleotides at the 5' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5, or more modified nucleotides at the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5, or more modified nucleotides at the 5' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, or more modified nucleotides at the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, or more modified nucleotides at the 5' end.
In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more consecutive modified nucleotides at the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more consecutive modified nucleotides at the 5' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, or 5 consecutive modified nucleotides at the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, or 5 consecutive modified nucleotides at the 5' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, or 3 consecutive modified nucleotides at the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, or 3 consecutive modified nucleotides at the 5' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 3 consecutive modified nucleotides at the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 3 consecutive modified nucleotides at the 5' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5 or more modified nucleotides near the 3' end. In some embodiments, the PEgRNA or nick-generating guide RNA comprises 1, 2, 3, 4, 5, or more consecutive modified nucleotides near the 3' end.
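By way of non-limiting illustration, the terminal phosphorothioate patterns described above may be written out in the 5'-UsUsU-3' notation, in which s denotes a phosphorothioate linkage, using a short Python sketch such as the following. The function name `mark_terminal_ps` and its parameters are hypothetical and are introduced here only for illustration.

```python
def mark_terminal_ps(seq, n5=0, n3=0):
    """Return seq with 's' inserted at terminal phosphorothioate linkages.

    n5 / n3: number of consecutive phosphorothioate linkages at the
    5' end / 3' end. Linkage i joins seq[i] and seq[i + 1].
    """
    n_links = len(seq) - 1
    if n5 < 0 or n3 < 0 or n5 + n3 > n_links:
        raise ValueError("too many linkages requested for this sequence")
    # Indices of the linkages that carry the modification.
    ps = set(range(n5)) | set(range(n_links - n3, n_links))
    out = [seq[0]]
    for i in range(n_links):
        out.append(("s" if i in ps else "") + seq[i + 1])
    return "".join(out)
```

Under this notation, calling the function with n5=2 and n3=2 would correspond to phosphorothioate linkages between the first three and between the last three nucleotides of the RNA.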
PegRNA design method
The disclosure also relates to methods for designing pegRNAs.
In one aspect of the design, the design method may take into account the particular application for which guided editing is to be used. For example, as illustrated and discussed herein, guided editing may be used for, but is not limited to: (a) installing mutation-correcting changes in a nucleotide sequence, (b) installing protein and RNA tags, (c) installing immune epitopes on a protein of interest, (d) installing inducible dimerization domains in proteins, (e) installing or removing sequences to alter the activity of a biomolecule, (f) installing recombinase target sites to direct specific genetic changes, and (g) mutagenesis of a target sequence using an error-prone RT. In addition to these methods, which generally insert, alter, or delete nucleotide sequences at a target site of interest, the guide editor may also be used to construct highly programmable libraries and to conduct cell data recording and lineage tracing studies. In these various applications, there may be specific design aspects of pegRNA preparation that are particularly useful for any given one of these applications, as described herein.
Many considerations may be taken into account when designing a pegRNA for any particular application or use of guided editing, including but not limited to:
(a) The target sequence, i.e., the nucleotide sequence in which the guide editor is desired to install one or more nucleobase modifications;
(b) The position of the cleavage site within the target sequence, i.e., the specific nucleobase position at which the guide editor will induce a single-stranded nick to create a 3' RT primer sequence on one side of the nick and a 5' endogenous flap on the other side of the nick (which is eventually removed by FEN1 or its equivalent and replaced by the 3' ssDNA flap). The cleavage site is analogous to the "editing position" in that it creates the 3' end RT primer sequence that is extended by the RT during RNA-dependent DNA polymerization to produce the 3' ssDNA flap containing the desired edit, which then replaces the 5' endogenous DNA flap in the target sequence;
(c) The available PAM sequences (including canonical SpCas9 PAM sites, as well as non-canonical PAM sites recognized by Cas9 variants and equivalents with expanded or different PAM specificities);
(d) The spacing between the available PAM sequences and the position of the cleavage site in the target sequence;
(e) The particular Cas9, Cas9 variant, or Cas9 equivalent of the guide editor being used;
(f) The sequence and length of the primer binding site;
(g) The sequence and length of the edit template;
(h) Sequence and length of homology arms;
(i) Spacer sequence and length; and
(j) Core sequence.
The present disclosure discusses these aspects above.
In one embodiment, provided herein are methods of designing suitable pegRNAs and, optionally, nick-generating sgRNAs for use in second-site nicking. This embodiment provides a step-by-step set of instructions for designing pegRNAs and nick-generating sgRNAs for guided editing, which takes into account one or more of the above considerations. The steps refer to the examples shown in FIGs. 70A to 70I.
1. Define the target sequence and the edit. Retrieve the sequence (about 200 bp) of the target DNA region centered on the position of the desired edit (point mutation, insertion, deletion, or a combination thereof). See FIG. 70A.
2. Locate target PAMs. Identify PAMs in proximity to the desired edit position. PAMs may be identified on either DNA strand adjacent to the desired edit position. While PAMs close to the edit site are preferred (i.e., where the nick site is less than 30 nt from the edit site, or less than 29 nt, 28 nt, 27 nt, 26 nt, 25 nt, 24 nt, 23 nt, 22 nt, 21 nt, 20 nt, 19 nt, 18 nt, 17 nt, 16 nt, 15 nt, 14 nt, 13 nt, 12 nt, 11 nt, 10 nt, 9 nt, 8 nt, 7 nt, 6 nt, 5 nt, 4 nt, 3 nt, or 2 nt from the edit site), it is possible to install edits using protospacers and PAMs that place the nick ≥30 nt from the edit site. See FIG. 70B.
3. Locate the nick sites. For each PAM under consideration, identify the corresponding nick site and the strand on which it lies. For the SpCas9 H840A nickase, cleavage occurs in the PAM-containing strand between the 3rd and 4th bases 5' of the NGG PAM. All edited nucleotides must lie 3' of the nick site, so a suitable PAM must place the nick 5' of the target edit on the PAM-containing strand. In the example shown below, there are two possible PAMs. For simplicity, the remaining steps show the design of a pegRNA using PAM1 only. See FIG. 70C.
4. Design the spacer sequence. The protospacer of SpCas9 corresponds to the 20 nucleotides 5' of the NGG PAM in the PAM-containing strand. Efficient Pol III transcription initiation requires a G as the first transcribed nucleotide. If the first nucleotide of the protospacer is a G, the spacer sequence of the pegRNA is simply the protospacer sequence. If the first nucleotide of the protospacer is not a G, the spacer sequence of the pegRNA is a G followed by the protospacer sequence. See FIG. 70D.
5. Design the primer binding site (PBS). Using the starting allele sequence, identify the DNA primer on the PAM-containing strand. The 3' end of the DNA primer is the nucleotide immediately upstream of the nick site (i.e., the 4th base 5' of the NGG PAM for SpCas9). As a general design principle for use with PE2 and PE3, a pegRNA primer binding site (PBS) containing 12 to 13 nucleotides complementary to the DNA primer can be used for sequences with a GC content of about 40-60%. For sequences with lower GC content, longer (14 to 15 nt) PBSs should be tested. For sequences with higher GC content, shorter (8 to 11 nt) PBSs should be tested. The optimal PBS sequence should be determined empirically, regardless of GC content. To design a PBS sequence of length p, take the reverse complement of the first p nucleotides 5' of the nick site in the PAM-containing strand, using the starting allele sequence. See FIG. 70E.
6. Design the RT template (or DNA synthesis template). The RT template (or DNA synthesis template, where the polymerase is not a reverse transcriptase) encodes the designed edit and homology to the sequence adjacent to the edit. In one embodiment, these regions correspond to the DNA synthesis template of FIGs. 3D and 3E, wherein the DNA synthesis template comprises an "editing template" and a "homology arm". The optimal RT template length varies depending on the target site. For short-range edits (positions +1 to +6), it is recommended to test short (9 to 12 nt), medium (13 to 16 nt) and long (17 to 20 nt) RT templates. For long-range edits (position +7 and beyond), it is recommended to use an RT template that extends at least 5 nt (preferably 10 nt or more) past the edit position to allow for sufficient 3' DNA flap homology. For long-range edits, several RT templates should be screened to identify functional designs. For larger insertions and deletions (≥5 nt), it is recommended to incorporate greater 3' homology (about 20 nt or more) into the RT template. Editing efficiency is often impaired when the RT template encodes the synthesis of a G as the last nucleotide of the reverse-transcribed DNA product (corresponding to a C in the RT template of the pegRNA). Since many RT templates support efficient guided editing, it is suggested to avoid a G as the final synthesized nucleotide when designing RT templates. To design an RT template sequence of length r, use the desired allele sequence and take the reverse complement of the first r nucleotides 3' of the nick site in the original PAM-containing strand. Note that, for an RT template of a given length, insertion or deletion edits will not contain the same amount of homology as SNP edits. See FIG. 70F.
7. Assemble the complete pegRNA sequence. Concatenate the pegRNA components in the following order (5' to 3'): spacer, scaffold, RT template, and PBS. See FIG. 70G.
8. Design nick-generating sgRNAs for PE3. Identify PAMs on the non-edited strand upstream and downstream of the edit. The optimal nicking position is highly locus dependent and should be determined empirically. In general, nicks placed 40 to 90 nucleotides 5' of the position opposite the pegRNA-induced nick lead to higher editing yields and fewer indels. A nick-generating sgRNA has a spacer sequence that matches the 20-nt protospacer in the starting allele, with a 5'-G added if the protospacer does not begin with a G. See FIG. 70H.
9. Design PE3b nick-generating sgRNAs. If a PAM is present in the complementary strand and its corresponding protospacer overlaps the sequence targeted for editing, the edit may be a candidate for the PE3b system. In the PE3b system, the spacer sequence of the nick-generating sgRNA matches the sequence of the desired edited allele, but not the sequence of the starting allele. The PE3b system operates efficiently when the edited nucleotides fall within the seed region of the nick-generating sgRNA protospacer (about 10 nt adjacent to the PAM). This prevents nicking of the complementary strand until after the edited strand has been installed, thereby preventing competition between the pegRNA and the sgRNA for binding to the target DNA. PE3b also avoids simultaneous nicks on both strands, thereby significantly reducing indel formation while maintaining high editing efficiency. A PE3b sgRNA should have a spacer sequence that matches the 20-nt protospacer in the desired allele, with a 5' G added as needed. See FIG. 70I.
The above-described stepwise method for designing suitable pegRNAs and second-site nick-generating sgRNAs is not meant to be limiting in any way. The present disclosure contemplates variations of the above-described stepwise method that one of ordinary skill in the art could derive therefrom.
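By way of non-limiting illustration, steps 2 through 7 of the stepwise method above may be sketched in Python as follows for an SpCas9-based editor and an NGG PAM. The sketch is a simplified example under stated assumptions and not a definitive implementation: the function names, parameters, and default lengths are hypothetical; sequences are written as DNA letters for the PAM-containing strand (T would be read as U in the transcribed pegRNA); the desired allele is assumed to be supplied by the user; and the scaffold shown is the gRNA core portion of example sequence (4) above, used only as a placeholder.

```python
# Placeholder gRNA core: the tracr mate/loop/tracr portion of example
# sequence (4), written as DNA without the poly-T terminator.
SCAFFOLD = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC")

_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA string containing only A/C/G/T."""
    return "".join(_COMP[b] for b in reversed(seq))


def find_ngg_pams(strand):
    """Step 2: 0-based start indices of NGG PAMs preceded by a full 20-nt protospacer."""
    return [i for i in range(20, len(strand) - 2) if strand[i + 1:i + 3] == "GG"]


def design_pegrna(start, desired, pam_start, pbs_len=13, rt_len=13,
                  scaffold=SCAFFOLD):
    """Steps 3 to 7 for one PAM on the PAM-containing strand (5'->3').

    start / desired: starting and desired allele sequences of that strand.
    pam_start: index of the N of the NGG PAM in `start`.
    """
    if pam_start < 20:
        raise ValueError("need a full 20-nt protospacer 5' of the PAM")
    # Step 3: the nick lies between the 4th and 3rd bases 5' of the PAM.
    nick = pam_start - 3
    if start[:nick] != desired[:nick]:
        raise ValueError("all edited nucleotides must lie 3' of the nick site")
    if nick - pbs_len < 0 or nick + rt_len > len(desired):
        raise ValueError("sequence too short for the requested lengths")
    # Step 4: spacer = protospacer, with a 5' G prepended if it lacks one.
    protospacer = start[pam_start - 20:pam_start]
    spacer = protospacer if protospacer.startswith("G") else "G" + protospacer
    # Step 5: PBS = reverse complement of p nt 5' of the nick (starting allele).
    pbs = revcomp(start[nick - pbs_len:nick])
    # Step 6: RT template = reverse complement of r nt 3' of the nick (desired allele).
    rt_template = revcomp(desired[nick:nick + rt_len])
    return {
        "nick_index": nick,
        "spacer": spacer,
        "pbs": pbs,
        "rt_template": rt_template,
        # A 5' C in the RT template means the last synthesized nucleotide is G.
        "last_synthesized_is_G": rt_template.startswith("C"),
        # Step 7: spacer, scaffold, RT template, PBS (5' to 3').
        "pegrna": spacer + scaffold + rt_template + pbs,
    }
```

In practice, such a function would be called with several values of `pbs_len` and `rt_len`, since the optimal PBS and RT template lengths are to be determined empirically as described in steps 5 and 6; designs flagged by `last_synthesized_is_G` might be deprioritized. Steps 8 and 9 (nick-generating sgRNAs) are not covered by this sketch.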
The present disclosure provides next-generation modified pegRNAs with improved properties including, but not limited to, increased stability and cell lifetime, and improved binding affinity for the napDNAbp. These modified pegRNAs can improve genome editing, as demonstrated by increased editing efficiency at various genomic sites. The inventors have surprisingly found that a consistent increase in editing activity is achieved by attaching certain nucleic acid structural motifs to the end of the extension arm of the pegRNA, including but not limited to a prequeosine1-1 riboswitch aptamer ("evopreQ1-1") or a variant thereof, a pseudoknot from the MMLV viral genome ("Mpknot-1") or a variant thereof, a modified tRNA used by MMLV RT as a primer for reverse transcription or a variant thereof, and a G-quadruplex or a variant thereof.
In one embodiment, the modified pegRNA comprises a nucleic acid moiety at the 3' end of the pegRNA, as shown in FIG. 98. Optionally, the 3' end of the pegRNA is fused to the nucleic acid moiety by a nucleotide linker. In various embodiments, it will be appreciated that a variety of nucleotide sequences will function reasonably well for each genomic target site. The linker length may also vary. In some cases, a linker ranging from 3 to 18 nucleotides in length will work. In other cases, the linker can be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides.
In general, the nucleic acid portion that can be used to modify the pegRNA, for example, by attaching it to the 3' end of the pegRNA, can include any nucleic acid portion, including, for example, a nucleic acid molecule that comprises or forms a duplex portion, a toe loop portion, a hairpin portion, a stem loop portion, a pseudoknot portion, an aptamer portion, a G quadruplex portion, a tRNA portion, or a ribozyme portion. The nucleic acid portion may be characterized as forming a secondary nucleic acid structure, a tertiary nucleic acid structure, or a quaternary nucleic acid structure. In other words, the nucleic acid portion may form any two-dimensional or three-dimensional structure known to be formed from such structures. The nucleic acid portion may be DNA or RNA.
Without limitation, the following are specific examples of nucleotide motifs that can be appended to the end of the extension arm of a pegRNA. Thus, in the case of a 3' extension arm, the nucleotide motif is coupled, attached, or otherwise linked to the 3' end of the pegRNA, optionally via a linker. In the case of a 5' extension arm, the nucleotide motif is coupled, attached, or otherwise linked to the 5' end of the pegRNA, optionally via a linker.
As described above, these motifs may be coupled, attached, or otherwise linked to canonical pegRNAs via a linker. Exemplary linkers include, but are not limited to:
In some embodiments, the linker will be designed and/or selected based on the genomic site targeted by the guided editing and the modified pegRNA.
In various embodiments, it will be appreciated that a wide variety of nucleotide sequences will function reasonably well for each genomic target site. The linker length may also vary. In some cases, a linker ranging from 3 to 18 nucleotides in length will work. In other cases, the linker can be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides.
In one embodiment, the linker is 8 nucleotides in length.
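The 5'-to-3' order of pegRNA, linker, and motif can be illustrated with a short Python sketch. The function name is hypothetical. The linker and motif strings are not quoted from a motif or linker table; they are read off the shared 3' end of the HEK3 epegRNA listed later in this document as SEQ ID NO: 256, and treating them as "the" 8-nt linker and evopreQ1 motif is an assumption.

```python
# Assumed values, read off the 3' end of SEQ ID NO: 256 (terminator Ts excluded).
LINKER = "TCTCTCTC"
EVOPREQ1 = "TTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAA"


def append_3p_motif(pegrna: str, motif: str, linker: str = "") -> str:
    """Join pegRNA, optional linker, and motif in 5'->3' order."""
    parts = [pegrna.upper(), linker.upper(), motif.upper()]
    for part in parts:
        if set(part) - set("ACGTU"):
            raise ValueError("sequences may contain only A, C, G, T, or U")
    return "".join(parts)


# Only the last 13 nt of the HEK3 pegRNA (its 3' end) are used here for brevity.
pbs_end = "CGTGCTCAGTCTG"
epeg_3p = append_3p_motif(pbs_end, EVOPREQ1, LINKER)
print(len(LINKER), epeg_3p)
```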
The present disclosure also contemplates variants of the above nucleotide motifs and linkers that have at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to any of the above motifs and linker sequences.
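As a rough illustration of the identity thresholds, the following Python sketch computes percent identity for two sequences of equal length compared position by position. This is a simplification and an assumption of the sketch: identity between sequences of different lengths is normally computed from an alignment, which is not shown here, and the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    a, b = a.upper(), b.upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy 8-nt linkers differing at one position.
identity = percent_identity("TCTCTCTC", "TCTCTCTA")
print(identity, identity >= 70.0)
```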
The pegRNA may also include additional design improvements that modify the properties and/or characteristics of the pegRNA, thereby improving the efficacy of guided editing. In various embodiments, these improvements may fall into one or more of several categories, including, but not limited to: (1) designs that enable efficient expression of functional pegRNAs from non-polymerase III (pol III) promoters, which would allow longer pegRNAs to be expressed without burdensome sequence requirements; (2) improvements to the core, Cas9-binding pegRNA scaffold, which could improve efficacy; (3) modifications to the pegRNA that improve RT processivity, enabling the insertion of longer sequences at targeted genomic loci; and (4) addition of RNA motifs to the 5' or 3' end of the pegRNA that improve pegRNA stability, enhance RT processivity, prevent misfolding of the pegRNA, or recruit additional factors important for genome editing.
In one embodiment, the pegRNA may be designed with a pol III promoter to increase expression of longer pegRNAs with larger extension arms. sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expressing short RNAs that are retained in the nucleus. However, pol III is not highly processive and cannot express RNAs longer than a few hundred nucleotides at the levels required for efficient genome editing. In addition, pol III can stall or terminate at stretches of U's, potentially limiting the sequence diversity that can be inserted using a pegRNA. Other promoters that recruit polymerase II (e.g., pCMV) or polymerase I (e.g., the U1 snRNA promoter) have been examined for their ability to express longer sgRNAs. However, these promoters are typically partially transcribed, which results in extra sequence 5' of the spacer in the expressed pegRNA; this has been shown to markedly reduce Cas9:sgRNA activity in a site-dependent manner. Furthermore, while pol III-transcribed pegRNAs can simply terminate in a run of 6-7 U's, pegRNAs transcribed from pol II or pol I require a different termination signal. Often such signals also result in polyadenylation, which would result in undesired transport of the pegRNA out of the nucleus. Similarly, RNAs expressed from pol II promoters (e.g., pCMV) are typically 5'-capped, which also results in their nuclear export.
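Because pol III can stall or terminate at U stretches, it can be useful to scan the DNA encoding a pegRNA for runs of T before choosing a promoter. The Python sketch below does this; the function name and the default threshold of four consecutive Ts are illustrative assumptions, not values stated in this disclosure (which mentions only a terminating run of 6-7 U's).

```python
import re


def find_t_runs(dna: str, min_run: int = 4):
    """Return (start, length) for each run of at least min_run consecutive Ts.

    Coordinates are 0-based. min_run is a hypothetical screening threshold.
    """
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(dna.upper())]


# Start of the standard scaffold as it appears in the sequences of this document.
print(find_t_runs("GTTTTAGAGCTAGAAATAGC"))
# End of a pegRNA followed by a terminator-like T run.
print(find_t_runs("CAGTCTGTTTTTTT", min_run=6))
```

A run reported inside the spacer, template, or scaffold (rather than at the intended 3' terminator) flags a position where a pol III transcript might end prematurely.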
Exemplary U6 promoters include, but are not limited to:
U6 promoter:
GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTAGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCG(SEQ ID NO:237)
U6v9 promoter:
GCCTGAGGCGTGGGGCCGCCTCCCAAAGACTTCTGGGAGGGCGGTGCGGCTCAGGCTCTGCCCCGCCTCCGGGGCTATTTGCATACGACCATTTCCAGTAATTCCCAGCAGCCACCGTAGCTATATTTGGTAGAACAACGAGCACTTTCTCAACTCCAGTCAATAACTACGTTAGTTGCATTACACATTGGGCTAATATAAATAGAGGTTAAATCTCTAGGTCATTTAAGAGAAGTCGGCCTATGTGTACAGACATTTGTTCCAGGGGCTTTAAATAGCTGGTGGTGGAACTCAATATTCG(SEQ ID NO:238)
U6v7 promoter:
AAGTCCGCGGCACGAGAAATCAAAGCCCCGGGGCCTGGGTCCCACGCGGGGTCCCTTACCCAGGGTGCCCCGGGCGCTCATTTGCATGTCCCACCCAACAGGTAAACCTGACAGATCGGTCGCGGCCAGGTACGGCCTGGCGGTCAGAGCACCAAACTTACGAGCCTTGTGATGAGTTCCGTTACATGAAATTCTCCTAAAGGCTCCAAGATGGACAGGAAAGCGCTCGATTAGGTTACCGTAAGGAAAACAAATGAGAAACTCCCGTGCCTTATAAGACCTGGGGACGGACTTATTTGCG(SEQ ID NO:239)
U6v4 promoter:
AAATTGAGTCATCTGACAGAAATTATCTTTGGCAAGGTTTTAGTCCTAGGGTTACCAGATGGAATACAGGACATCCATTTAAATTTGAATTTCAGATAAACAGTTAACACTTCTCAAGGATAAATATGCCTCAAATATTGCACGGGACATATTTATACTAAAAAAAAAGTGTTTTTTTTTTTCCTGCGATTCAAACTTAACTGGTGTCCTGCATTTGTATTTGTTAAATCTGTCAATCCTATCTCAGTTTCCTTTGATGGAATGTACCTCTGTGCTAATATTTAAAAATAGGTTACATTTG(SEQ ID NO:240)
Those of ordinary skill in the art will appreciate that these promoter sequences may be trimmed at the 5' end and still function at the same or nearly the same level. For example, any U6 promoter may be trimmed by removing up to 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides from the 5' end, i.e., about 30% of the promoter length. In other embodiments, up to 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, or up to 30% of the promoter length may be removed from the 5' end.
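The 5' trimming described above can be sketched as a small Python helper. The function name and its parameters are hypothetical; the limits of 100 nucleotides and 30% simply mirror the upper bounds recited in the preceding paragraph, and integer percentages are used to avoid floating-point rounding.

```python
def trim_promoter_5p(promoter: str, *, nucleotides: int = 0, percent: int = 0,
                     max_nucleotides: int = 100, max_percent: int = 30) -> str:
    """Remove sequence from the 5' end, by nucleotide count or by percent of length."""
    if nucleotides and percent:
        raise ValueError("specify nucleotides or percent, not both")
    if nucleotides < 0 or percent < 0:
        raise ValueError("trim amounts must be non-negative")
    if nucleotides > max_nucleotides or percent > max_percent:
        raise ValueError("requested trim exceeds the configured limit")
    n = nucleotides if nucleotides else len(promoter) * percent // 100
    if n > len(promoter):
        raise ValueError("cannot remove more nucleotides than the promoter contains")
    return promoter[n:]


# Toy 300-nt "promoter", not one of the U6 sequences listed above.
toy = "A" * 30 + "C" * 270
print(len(trim_promoter_5p(toy, percent=10)), len(trim_promoter_5p(toy, nucleotides=20)))
```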
One of ordinary skill in the art will also appreciate that other promoters may be used to improve expression of a longer length pegRNA with a larger extension arm. For example, in different cell types, other promoters may be preferred and result in higher expression of the longer length of pegRNA.
Previously, Rinn and colleagues screened a variety of expression platforms for the production of long non-coding RNA (lncRNA)-tagged sgRNAs 183. These platforms include RNAs that are expressed from pCMV and terminate in the ENE element from the human MALAT1 ncRNA 184, the PAN ENE element from KSHV 185, or the 3' box from U1 snRNA 186. Notably, the MALAT1 ncRNA and PAN ENEs form triple helices that protect the polyA tail 184,187. These constructs may also enhance RNA stability. It is expected that these expression systems will also enable the expression of longer pegRNAs.
In addition, a series of methods have been devised for cleaving off the portion of the pol II promoter that would be transcribed as part of the pegRNA, by adding either a self-cleaving ribozyme, such as the hammerhead 188, pistol 189, hatchet 189, hairpin 190, VS 191, twister 192, or twister sister 192 ribozyme, or another self-cleaving element to process the transcribed guide, or a hairpin that is recognized by Csy4 and likewise results in processing of the guide 193. Furthermore, incorporation of multiple ENE motifs may improve pegRNA expression and stability, as previously shown for the KSHV PAN RNA and element 185. Circularization of the pegRNA in the form of a circular intronic RNA (ciRNA) is also expected to lead to enhanced RNA expression and stability, as well as nuclear localization.
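The effect of Csy4 processing on a pol II transcript can be illustrated with a short Python sketch. Two points are assumptions of the sketch rather than statements of this disclosure: that the 20-nt motif found immediately 5' of the spacer in the example expression platforms of this document is the Csy4 recognition hairpin, and that cleavage occurs directly 3' of that motif. The function name is hypothetical.

```python
# Assumed Csy4 hairpin: the 20 nt located just 5' of the spacer in the
# non-limiting example expression platforms of this document.
CSY4_SITE = "GTTCACTGCCGTATAGGCAG"


def csy4_process(transcript: str) -> str:
    """Return the RNA remaining 3' of the first Csy4 site (cleavage assumed right after it)."""
    seq = transcript.upper()
    idx = seq.find(CSY4_SITE)
    if idx == -1:
        return seq  # no site found: transcript is left unprocessed
    return seq[idx + len(CSY4_SITE):]


# Fragment spanning the promoter-derived leader, the hairpin, and the start of the pegRNA.
fragment = "GAACCGTCAGATC" + CSY4_SITE + "GGCCCAGACTGAGCACGTGA" + "GTTTTAGAGC"
print(csy4_process(fragment))
```

Under these assumptions the processed RNA begins exactly at the first nucleotide of the spacer, which removes the extra 5' sequence contributed by the partially transcribed promoter.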
In various embodiments, the pegRNA may include various of the above elements, as exemplified by the following sequences.
Non-limiting example 1: pegRNA expression platform consisting of pCMV, Csy4 hairpin, pegRNA, and MALAT1 ENE
TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTAGGGTCATGAAGGTTTTTCTTTTCCTGAGAAAACAACACGTATTGTTTTCTCAGGTTTTGCTTTTTGGCCTTTTTCTAGCTTAAAAAAAAAAAAAGCAAAAGATGCTGGTGGTTGGCACTCCTGGTTTCCAGGACGGGGTTCAAATCCCTGCGGCGTCTTTGCTTTGACT(SEQ ID NO:241)
Non-limiting example 2: pegRNA expression platform consisting of pCMV, Csy4 hairpin, pegRNA, and PAN ENE
TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAA(SEQ ID NO:242)
Non-limiting example 3: pegRNA expression platform consisting of pCMV, Csy4 hairpin, pegRNA, and 3xPAN ENE
TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAAACACACTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAATCTCTCTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAA(SEQ ID NO:243)
Non-limiting example 4: pegRNA expression platform consisting of pCMV, Csy4 hairpin, pegRNA, and 3' box
TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTGTTTCAAAAGTAGACTGTACGCTAAGGGTCATATCTTTTTTTGTTTGGTTTGTGTCTTGGTTGGCGTCTTAAA(SEQ ID NO:244)
Non-limiting example 5: pegRNA expression platform consisting of pU1, Csy4 hairpin, pegRNA, and 3' box
CTAAGGACCAGCTTCTTTGGGAGAGAACAGACGCAGGGGCGGGAGGGAAAAAGGGAGAGGCAGACGTCACTTCCCCTTGGCGGCTCTGGCAGCAGATTGGTCGGTTGAGTGGCAGAAAGGCAGACGGGGACTGGGCAAGGCACTGTCGGTGACATCACGGACAGGGCGACTTCTATGTAGATGAGGCAGCGCAGAGGCTGCTGCTTCGCCACTTGCTGCTTCACCACGAAGGAGTTCCCGTGCCCTGGGAGCGGGTTCAGGACCGCTGATCGGAAGTGAGAATCCCAGCTGTGTGTCAGGGCTGGAAAGGGCTCGGGAGTGCGCGGGGCAAGTGACCGTGTGTGTAAAGAGTGAGGCGTATGAGGCTGTGTCGGGGCAGAGGCCCAAGATCTCAGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTCAGCAAGTTCAGAGAAATCTGAACTTGCTGGATTTTTGGAGCAGGGAGATGGAATAGGAGCTTGCTCCGTCCACTCCACGCATCGACCTGGTATTGCAGTACCTCCAGGAACGGTGCACCCACTTTCTGGAGTTTCAAAAGTAGACTGTACGCTAAGGGTCATATCTTTTTTTGTTTGGTTTGTGTCTTGGTTGGCGTCTTAAA(SEQ ID NO:245)
In various other embodiments, the pegRNA may be modified by introducing modifications to the scaffold or core sequence.
The core, Cas9-binding pegRNA scaffold can be modified to enhance PE activity. In an exemplary approach, the first pairing element of the scaffold (P1) contains a GTTTT-AAAAAAC (SEQ ID NO: 246) pairing element. Such runs of Ts can cause pol III pausing and premature termination of the RNA transcript. Rational mutation of one of the T-A pairs in this portion of P1 to a G-C pair can enhance sgRNA activity, and the same approach can be used to improve pegRNAs. Furthermore, increasing the length of P1 may enhance sgRNA folding and result in improved pegRNA activity. Example refinements of the core include:
pegRNA containing a 6-nt extension to P1
GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGCTCATGAAAATGAGCTAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTTTT(SEQ ID NO:247)
pegRNA containing a T-A to G-C mutation in P1
GGCCCAGACTGAGCACGTGAGTTTGAGAGCTAGAAATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTTTT(SEQ ID NO:248)
pegRNA split into crRNA and tracrRNA components:
GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGA(SEQ ID NO:249)
AATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTG(SEQ ID NO:250)
In various other embodiments, the pegRNA can be improved by introducing modifications to the editing template region. As the size of the insertion templated by the pegRNA increases, the template is more likely to be degraded by endonucleases, to undergo spontaneous hydrolysis, or to fold into secondary structures that cannot be reverse transcribed by the RT or that disrupt folding of the pegRNA scaffold and subsequent Cas9-RT binding. Thus, modification of the pegRNA template may be required to effect large insertions, such as the insertion of an entire gene. Some strategies for doing so include incorporating modified nucleotides into a synthetic or semi-synthetic pegRNA to make the RNA more resistant to degradation or hydrolysis, or less likely to adopt inhibitory secondary structures 196. Such modifications may include 8-aza-7-deazaguanosine, which would reduce RNA secondary structure in G-rich sequences; locked nucleic acids (LNA), which reduce degradation and enhance certain kinds of RNA secondary structure; and 2'-O-methyl, 2'-fluoro, or 2'-O-methoxyethoxy modifications, which enhance RNA stability. Such modifications may also be included elsewhere in the pegRNA to enhance stability and activity. Alternatively or additionally, the template of the pegRNA may be designed so that it both encodes the desired protein product and is more likely to adopt simple secondary structures that can be unfolded by the RT. Such simple structures would act as a thermodynamic sink, making it less likely that more complicated structures that would prevent reverse transcription will form. Finally, the template can also be split across two separate pegRNAs. In such a design, a PE would be used to initiate transcription and also to recruit a separate template RNA to the target site via an RNA-binding protein fused to Cas9 or an RNA recognition element (e.g., an MS2 aptamer) on the pegRNA itself. The RT may either bind directly to this separate template RNA or initiate reverse transcription on the original pegRNA before switching to the second template.
Such an approach could enable long insertions both by preventing misfolding of the pegRNA upon addition of the long template and by not requiring dissociation of Cas9 from the genome for long insertions to occur, which may otherwise inhibit PE-based long insertions.
In other embodiments, the pegRNA may be improved by introducing additional RNA motifs at the 5' and 3' ends of the pegRNA, or even at positions in between (e.g., in the core region or the spacer of the gRNA). Several such motifs, e.g., the PAN ENE from KSHV and the ENE from MALAT1, are discussed above as possible means of terminating expression of longer pegRNAs from non-pol III promoters. These elements form RNA triple helices that engulf the polyA tail, resulting in their retention within the nucleus 184,187. However, by forming complex structures at the 3' end of the pegRNA that occlude the terminal nucleotide, these structures may also help prevent exonuclease-mediated degradation of the pegRNA.
Other structural elements inserted at the 3' end may also enhance RNA stability, albeit without enabling termination from non-pol III promoters. Such motifs may include hairpins or RNA quadruplexes, which would occlude the 3' end 197, or self-cleaving ribozymes (e.g., HDV), which would result in the formation of a 2'-3'-cyclic phosphate at the 3' end and may also make the pegRNA less likely to be degraded by exonucleases 198. Inducing the pegRNA to cyclize via incomplete splicing, forming a ciRNA, may also increase pegRNA stability and result in retention of the pegRNA in the nucleus 194.
Additional RNA motifs can also improve RT processivity or enhance pegRNA activity by enhancing the binding of RT to the DNA-RNA duplex. Adding the native sequence bound by the RT in its cognate retroviral genome can enhance RT activity 199. This may include the native primer binding site (PBS), the polypurine tract (PPT), or the kissing loops involved in retroviral genome dimerization and initiation of transcription 199.
Adding dimerization motifs, such as kissing loops or a GNRA tetraloop/tetraloop receptor pair 200, to the 5' and 3' ends of the pegRNA can also result in effective circularization of the pegRNA, improving stability. Furthermore, it is expected that the addition of these motifs may allow physical separation of the pegRNA spacer and primer, preventing occlusion of the spacer that would hinder PE activity. Short 5' or 3' extensions of the pegRNA that form small toehold hairpins in the spacer region or along the primer binding site may also compete favorably against the annealing of intracomplementary regions along the length of the pegRNA, e.g., the interaction between the spacer and the primer binding site. Finally, kissing loops can also be used to recruit other template RNAs to the genomic site and enable swapping of RT activity from one RNA to another. As exemplary embodiments of the various secondary structures, the pegRNAs depicted in FIG. 3D and FIG. 3E list a number of secondary RNA structures that may be engineered into any region of the pegRNA, including the terminal portions of the extension arm (i.e., E1 and E2), as shown.
Example improvements include, but are not limited to:
pegRNA-HDV fusion
GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGGGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATGGCGAATGGGACTTTTTTT(SEQ ID NO:251)
pegRNA-MMLV kissing loop
GGTGGGAGACGTCCCACCGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTGGTGGGAGACGTCCCACCTTTTTTT(SEQ ID NO:252)
pegRNA-VS ribozyme kissing loop
GAGCAGCATGGCGTCGCTGCTCACGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTCCATCAGTTGACACCCTGAGGTTTTTTT(SEQ ID NO:253)
pegRNA-GNRA tetraloop/tetraloop receptor
GCAGACCTAAGTGGUGACATATGGTCTGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAUACGTAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTUACGAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTGCATGCGATTAGAAATAATCGCATGTTTTTTT(SEQ ID NO:254)
pegRNA template-switching secondary RNA-HDV fusion
TCTGCCATCAAAGCTGCGACCGTGCTCAGTCTGGTGGGAGACGTCCCACCGGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATGGCGAATGGGACTTTTTTT(SEQ ID NO:255)
The pegRNA scaffold can be further improved via directed evolution, in a manner analogous to how SpCas9 and the guided editor (PE) have been improved. Directed evolution can enhance recognition of the pegRNA by Cas9 or by evolved Cas9 variants. Furthermore, different pegRNA scaffold sequences are likely to be optimal at different genomic loci, either enhancing PE activity at the site in question, reducing off-target activity, or both. Finally, evolution of pegRNA scaffolds to which other RNA motifs have been added would almost certainly improve the activity of the fused pegRNA relative to the unevolved fusion RNA. For example, evolution of an allosteric ribozyme composed of a c-di-GMP-I aptamer and a hammerhead ribozyme led to a significant increase in activity 202, suggesting that evolution would also improve the activity of hammerhead-pegRNA fusions. In addition, while Cas9 currently does not generally tolerate 5' extension of the sgRNA, directed evolution may generate enabling mutations that mitigate this intolerance, allowing additional RNA motifs to be used.
In various embodiments, other scaffolds that have been shown to increase activity relative to the canonical sgRNA scaffold may be used in the pegRNAs and epegRNAs described herein. Such improvements may include, for example, those disclosed in Chen, B. et al., Dynamic Imaging of Genomic Loci in Living Human Cells by an Optimized CRISPR/Cas System, Cell, 2013, 155(7), 1479-1491, and Jost, M. et al., Titrating expression using libraries of systematically attenuated CRISPR guide RNAs, Nat. Biotechnol., 2020, 38, 355-364, the entire contents of which are incorporated herein by reference. These improvements may enhance epegRNA activity through improved binding to the guided editor and/or improved expression. Stabilization of the sgRNA scaffold can also reduce PBS/spacer interactions that inhibit pegRNA and epegRNA activity.
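The PBS/spacer interaction mentioned above arises because the primer binding site is complementary to part of the spacer. The Python sketch below derives the expected PBS from a spacer and checks it against the 3' end of the HEK3 epegRNA sequences listed in this document. It is a sketch under stated assumptions: the nick is taken to lie 3 nt 5' of the PAM end of the spacer (as for SpCas9), the 13-nt PBS length is inferred from the listed HEK3 sequences rather than stated in the text, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def expected_pbs(spacer: str, pbs_len: int, nick_offset: int = 3) -> str:
    """PBS = reverse complement of the pbs_len nt of spacer immediately 5' of the nick."""
    upstream = spacer.upper()[: len(spacer) - nick_offset]
    if pbs_len > len(upstream):
        raise ValueError("PBS cannot be longer than the spacer region 5' of the nick")
    return reverse_complement(upstream[-pbs_len:])


hek3_spacer = "GGCCCAGACTGAGCACGTGA"        # first 20 nt of the HEK3 sequences
hek3_3p_fragment = "CTTCCCGTGCTCAGTCTGTCTCTCTC"  # from the extension arm of the HEK3 epegRNAs
pbs = expected_pbs(hek3_spacer, pbs_len=13)
print(pbs, pbs in hek3_3p_fragment)
```

Because the PBS can base-pair with the spacer within the same RNA, anything that keeps the scaffold stably folded between them may reduce this self-annealing.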
Exemplary epegRNAs incorporating improved sgRNA scaffolds include, but are not limited to:
HEK3 1-15del standard scaffold evopreQ1
GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:256)
HEK3 1-15del cr748 evopreQ1
GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:257)
HEK3 1-15del cr289 evopreQ1
GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:258)
HEK3 1-15del cr622 evopreQ1
GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:259)
HEK3 1-15del cr772 evopreQ1
GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:260)
HEK3 1-15del cr532 evopreQ1
GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:261)
HEK3 1-15del cr961 evopreQ1
GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:262)
HEK3 1-15del flip and extension scaffold evopreQ1
GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGCCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:263)
RNF2 1-15del cr748 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTATGGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:264)
RNF2 1-15del cr289 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTATGGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:265)
RNF2 1-15del cr622 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTATGGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:266)
RNF2 1-15del cr772 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTATGGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:267)
RNF2 1-15del cr532 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTATGGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:268)
RNF2 1-15del cr961 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTATGGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:269)
RNF2 1-15del flip and extension scaffold evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTATGGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:270)
RUNX1 1-15del standard scaffold evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:271)
RUNX1 1-15del cr748 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:272)
RUNX1 1-15del cr289 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:273)
RUNX1 1-15del cr622 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:274)
RUNX1 1-15del cr772 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:275)
RUNX1 1-15del cr532 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:276)
RUNX1 1-15del cr961 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:277)
RUNX1 1-15del flip and extension scaffold evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:278)
RUNX1+5G-T standard scaffold evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:279)
RUNX1+5G-T cr748 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:280)
RUNX1+5G-T cr289 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:281)
RUNX1+5G-T cr622 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:282)
RUNX1+5G-T cr772 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:283)
RUNX1+5G-T cr532 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:284)
RUNX1+5G-T cr961 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:285)
RUNX1+5G-T flip and extension scaffold evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTGTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:286)
DNMT1 1-15del standard scaffold evopreQ1
GATTCCTGGTGCCAGAAACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:287)
DNMT1 1-15del cr748 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:288)
DNMT1 1-15del cr289 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:289)
DNMT1 1-15del cr622 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:290)
DNMT1 1-15del cr772 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:291)
DNMT1 1-15del cr532 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:292)
DNMT1 1-15del cr961 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:293)
DNMT1 1-15del flip and extension scaffold evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTGCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:294)
DNMT1+5G-T standard scaffold evopreQ1
GATTCCTGGTGCCAGAAACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:295)
DNMT1+5G--T cr748 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:296)
DNMT1+5G--T cr289 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:297)
DNMT1+5G--T cr622 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:298)
DNMT1+5G--T cr772 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:299)
DNMT1+5G--T cr532 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:300)
DNMT1+5G--T cr961 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:301)
DNMT1+5G-T flip and extension scaffold evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGTCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:302)
FANCF 1-15del standard scaffold evopreQ1
GGAATCCCTTCTGCAGCACCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:303)
FANCF 1-15del cr748 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:304)
FANCF 1-15del cr289 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:305)
FANCF 1-15del cr622 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:306)
FANCF 1-15del cr772 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:307)
FANCF 1-15del cr532 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:308)
FANCF 1-15del cr961 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:309)
FANCF 1-15del flip and extension scaffold evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTAGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:310)
FANCF+5G--T cr748 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTGGAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:311)
FANCF+5G--T cr289 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTGGAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:312)
FANCF+5G--T cr622 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTGGAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:313)
FANCF+5G--T cr772 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTGGAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:314)
FANCF+5G--T cr532 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGGAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:315)
FANCF+5G--T cr961 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGGAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:316)
FANCF+5G-T flip and extension scaffold evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGGAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:317)
EMX1 1-15del standard scaffold evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:318)
EMX1 1-15del cr748 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:319)
EMX1 1-15del cr289 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:320)
EMX1 1-15del cr622 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:321)
EMX1 1-15del cr772 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:322)
EMX1 1-15del cr532 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:323)
EMX1 1-15del cr961 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:324)
EMX1 1-15del flip and extension scaffold evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTCGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:325)
EMX1+5G-T standard scaffold evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:326)
EMX1+5G--T cr748 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:327)
EMX1+5G--T cr289 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:328)
EMX1+5G--T cr622 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:329)
EMX1+5G--T cr772 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:330)
EMX1+5G--T cr532 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:331)
EMX1+5G--T cr961 evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:332)
EMX1+5G-T flip and extension scaffold evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:333)
RNF2+1FLAG standard scaffold evopreQ1
GTCATCTTAGTCATTACCTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:334)
RNF2+1FLAG cr748 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:335)
RNF2+1FLAG cr289 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:336)
RNF2+1FLAG cr622 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:337)
RNF2+1FLAG cr772 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:338)
RNF2+1FLAG cr532 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:339)
RNF2+1FLAG cr961 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:340)
RNF2+1FLAG flip and extension scaffold evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTGAGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:341)
VEGFA+5 G--T cr748 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:342)
VEGFA+5 G--T cr289 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:343)
VEGFA+5 G--T cr622 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:344)
VEGFA+5 G--T cr772 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:345)
VEGFA+5 G--T cr532 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTAATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:346)
VEGFA+5 G--T cr961 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTAATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:347)
VEGFA+5G-T flip and extension scaffold evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:348)
VEGFA+1FLAG standard scaffold evopreQ1
GATGTCTGCAGGCCAGATGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:349)
VEGFA+1FLAG cr748 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:350)
VEGFA+1FLAG cr289 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:351)
VEGFA+1FLAG cr622 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:352)
VEGFA+1FLAG cr772 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:353)
VEGFA+1FLAG cr532 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:354)
VEGFA+1FLAG cr961 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:355)
VEGFA+1FLAG flip and extension scaffold evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTAATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:356)
VEGFA 1-15del standard scaffold evopreQ1
GATGTCTGCAGGCCAGATGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:357)
VEGFA 1-15 del cr748 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:358)
VEGFA 1-15 del cr289 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:359)
VEGFA 1-15 del cr622 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:360)
VEGFA 1-15 del cr772 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:361)
VEGFA 1-15 del cr532 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:362)
VEGFA 1-15 del cr961 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:363)
VEGFA 1-15del flip and extension scaffold evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGTGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:364)
RUNX1+1FLAG standard scaffold evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:365)
RUNX1+1FLAG cr748 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:366)
RUNX1+1FLAG cr289 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:367)
RUNX1+1FLAG cr622 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:368)
RUNX1+1FLAG cr772 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:369)
RUNX1+1FLAG cr532 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:370)
RUNX1+1FLAG cr961 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:371)
RUNX1+1FLAG flip and extension scaffold evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTGTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:372)
DNMT1+1FLAG standard scaffold evopreQ1
GATTCCTGGTGCCAGAAACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:373)
DNMT1+1FLAG cr748 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:374)
DNMT1+1FLAG cr289 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:375)
DNMT1+1FLAG cr622 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:376)
DNMT1+1FLAG cr772 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:377)
DNMT1+1FLAG cr532 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:378)
DNMT1+1FLAG cr961 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:379)
DNMT1+1FLAG flip and extension scaffold evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTCTGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT(SEQ ID NO:380)
The present disclosure contemplates any such approach to further improving the efficiency of the guided editing systems disclosed herein.
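The sequences listed above share a common 5'-to-3' layout: a spacer, a scaffold, an extension arm, and a 3' motif followed by a run of Ts. A minimal Python sketch of splitting one entry into those parts is given below. It assumes a 20-nt spacer and treats the 3' suffix shared by the listed sequences as the motif; the exact motif boundary, the function name `split_pegrna`, and its parameters are illustrative assumptions rather than anything defined in this disclosure. The scaffold and extension arm are returned together as `body`, because separating them requires knowing which scaffold variant is present.

```python
# Hypothetical helper for illustration only; names and defaults are assumptions.

# 3' suffix shared by the sequences listed above (boundary chosen by inspection).
SHARED_3P_MOTIF = "TTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAA"
# Terminal run of Ts that closes each listed sequence.
TERMINATOR = "TTTTTT"


def split_pegrna(seq, spacer_len=20, motif=SHARED_3P_MOTIF, terminator=TERMINATOR):
    """Split a pegRNA-encoding DNA sequence into spacer, body, motif, terminator.

    'body' is everything between the spacer and the 3' motif, i.e. the
    scaffold plus the extension arm (and any linker).
    """
    seq = seq.strip().upper()
    if not seq.endswith(terminator):
        raise ValueError("sequence does not end with the expected terminator")
    core = seq[: -len(terminator)]
    if not core.endswith(motif):
        raise ValueError("sequence does not carry the expected 3' motif")
    core = core[: -len(motif)]
    if len(core) < spacer_len:
        raise ValueError("sequence is shorter than the assumed spacer length")
    return {
        "spacer": core[:spacer_len],
        "body": core[spacer_len:],
        "motif": motif,
        "terminator": terminator,
    }


if __name__ == "__main__":
    # Toy input: a 20-nt spacer, a short placeholder body, then the shared 3' end.
    toy = "GCATTTTCAGGAGGAAGCGA" + "ACGTACGT" + SHARED_3P_MOTIF + TERMINATOR
    for name, part in split_pegrna(toy).items():
        print(name, part)
```

Passing a full entry from the listing in place of the toy input returns its spacer and the combined scaffold and extension arm, which can then be compared across scaffold variants.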
In various embodiments, it may be advantageous to limit the occurrence of runs of consecutive Ts in the extension arm, because consecutive Ts can limit the ability of the pegRNA to be transcribed. For example, runs of at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, or at least fifteen consecutive Ts should be avoided when designing the pegRNA, or should at least be removed from the final designed sequence. In one embodiment, unwanted runs of consecutive Ts in the pegRNA extension arm can be avoided by avoiding target sites that are rich in consecutive A:T nucleobase pairs.
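The check described above can be sketched in a few lines of Python. The function below reports every run of consecutive Ts (or Us) at or above a chosen length in a candidate extension arm. The function name `t_runs` and the default threshold of four are assumptions made for illustration; the text above contemplates thresholds from three to fifteen.

```python
import re


def t_runs(seq, min_len=4):
    """Return (start, length) for each run of at least min_len consecutive T/U.

    Positions are 0-based. The input may be written as DNA (T) or RNA (U).
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    seq = seq.upper().replace("U", "T")
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical extension-arm fragment used only to exercise the function.
    candidate = "ACGTTTTTGCATTT"
    print(t_runs(candidate, min_len=4))
    print(t_runs(candidate, min_len=3))
```

A design pipeline could reject or flag any extension arm for which this function returns a non-empty list at the chosen threshold.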
In other embodiments, the guided editing system may include the use of pegRNA designs and strategies that can improve guided editing efficiency. These strategies seek to overcome some of the issues that arise from the multi-step process required for guided editing. For example, unfavorable RNA structures that may form within the pegRNA can inhibit the copying of DNA edits from the pegRNA into the genomic locus. These limitations can be overcome by redesigning and engineering the pegRNA components. These redesigns may increase guided editor efficiency and allow longer insertion sequences to be installed into the genome.
Thus, in various embodiments, the pegRNA design can allow for longer pegRNAs by enabling efficient expression of functional pegRNAs from non-polymerase III (pol III) promoters, which would avoid the need for burdensome sequence requirements. In other embodiments, the core, Cas9-binding pegRNA scaffold can be modified to increase the efficiency of the system. In other embodiments, the pegRNA may be modified to increase the processivity of the reverse transcriptase (RT), which would enable insertion of longer sequences at the target genomic locus. In other embodiments, RNA motifs may be added to the 5' and/or 3' ends of the pegRNA to improve stability, enhance RT processivity, prevent misfolding of the pegRNA, and/or recruit other factors important for genome editing. In yet another embodiment, a platform is provided for evolving pegRNAs for a given sequence target, which could improve the pegRNA scaffold and enhance guided editor efficiency. These designs can be used to improve any pegRNA recognized by any Cas9 or evolved variant thereof.
This application of guided editing is described further in Example 2.
The pegRNA may include additional design improvements that modify its properties and/or characteristics, thereby increasing the efficiency of guided editing. In various embodiments, these improvements may fall into one or more of a number of different categories, including, but not limited to: (1) designs that enable efficient expression of functional pegRNAs from non-polymerase III (pol III) promoters, which would allow longer pegRNAs to be expressed without burdensome sequence requirements; (2) improvements to the core, Cas9-binding pegRNA scaffold, which could improve efficiency; (3) modifications to the pegRNA that increase RT processivity, enabling insertion of longer sequences at the target genomic locus; and (4) addition of RNA motifs to the 5' or 3' end of the pegRNA to increase pegRNA stability, enhance RT processivity, prevent misfolding of the pegRNA, or recruit other factors important for genome editing.
In one embodiment, the pegRNA may be designed with pol III promoters to improve expression of longer pegRNAs with larger extension arms. sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expressing short RNAs that are retained in the nucleus. However, pol III is not highly processive and cannot express RNAs longer than a few hundred nucleotides at the levels required for efficient genome editing. In addition, pol III can stall or terminate at stretches of Us, potentially limiting the sequence diversity that can be inserted using a pegRNA. Other promoters that recruit polymerase II (e.g., pCMV) or polymerase I (e.g., the U1 snRNA promoter) have been tested for their ability to express longer sgRNAs. However, these promoters are typically partially transcribed, which would result in additional sequence 5' of the spacer in the expressed pegRNA; such sequence has been shown to significantly reduce Cas9:sgRNA activity in a site-dependent manner. Furthermore, while pol III-transcribed pegRNAs can simply terminate in a run of 6-7 Us, pegRNAs transcribed from pol II or pol I require a different termination signal. Such signals often also result in polyadenylation, which can lead to undesired transport of the pegRNA out of the nucleus. Similarly, RNAs expressed from pol II promoters (e.g., pCMV) are typically 5'-capped, which also results in their nuclear export.
Previously, Rinn and coworkers screened a variety of expression platforms for the production of long non-coding RNA (lncRNA)-tagged sgRNAs 183. These platforms include RNAs expressed from pCMV that terminate in the ENE element of the human MALAT1 ncRNA 184, the PAN ENE element of KSHV 185, or the 3' box of U1 snRNA 186. Notably, the MALAT1 ncRNA and PAN ENEs form triple helices that protect the polyA tail 184,187. These constructs may also enhance RNA stability. It is expected that these expression systems will also enable expression of longer pegRNAs.
In addition, a series of methods have been designed for cleaving off the portion of the pol II promoter that would be transcribed as part of the pegRNA, by adding either a self-cleaving ribozyme such as the hammerhead 188, pistol 189, hatchet 189, hairpin 190, VS 191, twister 192, or twister sister 192 ribozymes, or other self-cleaving elements to process the transcribed guide, or a hairpin that is recognized by Csy4 and also leads to processing of the guide 193. Furthermore, incorporating multiple ENE motifs could improve pegRNA expression and stability. Circularization of the pegRNA in the form of a circular intronic RNA (ciRNA) is also expected to enhance RNA expression and stability, as well as nuclear localization.
In various embodiments, the pegRNA may include various of the above elements, as exemplified by SEQ ID NOs: 241-245.
In various other embodiments, the pegRNA may be modified by introducing modifications to the scaffold or core sequence. This can be accomplished by introducing known means
The core, Cas9-binding pegRNA scaffold can be modified to enhance PE activity. In an exemplary approach, the first pairing element (P1) of the scaffold contains a GTTTT-AAAAAAC (SEQ ID NO: 246) pairing element. Such runs of Ts can result in pol III pausing and premature termination of the RNA transcript. Rationally mutating one of the T-A pairs in this portion of P1 to a G-C pair can enhance sgRNA activity, and the same approach can be used to improve pegRNAs. Furthermore, increasing the length of P1 may enhance sgRNA folding and result in improved pegRNA activity. Example improvements to the core include:
pegRNA containing a 6 nt extension to P1
GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGCTCATGAAAATGAGCTAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTTTT(SEQ ID NO:228)
pegRNA containing a T-A to G-C mutation within P1
GGCCCAGACTGAGCACGTGAGTTTGAGAGCTAGAAATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTTTT (SEQ ID NOS: 247 and 248).
In various other embodiments, the pegRNA can be improved by introducing modifications to the edit template region. As the size of the insertion templated by the pegRNA increases, the template is more likely to be degraded by endonucleases, to undergo spontaneous hydrolysis, to fold into secondary structures that cannot be reverse transcribed by the RT, or to disrupt folding of the pegRNA scaffold and subsequent Cas9-RT binding. Thus, modification of the pegRNA template may be required to effect large insertions, such as the insertion of whole genes. Some strategies for doing so include incorporating modified nucleotides into a synthetic or semi-synthetic pegRNA to make the RNA more resistant to degradation or hydrolysis, or less likely to adopt inhibitory secondary structures 196. Such modifications may include 8-aza-7-deazaguanosine, which would reduce RNA secondary structure in G-rich sequences; locked nucleic acids (LNAs), which reduce degradation and enhance certain kinds of RNA secondary structure; and 2'-O-methyl, 2'-fluoro, or 2'-O-methoxyethoxy modifications, which enhance RNA stability. Such modifications may also be included elsewhere in the pegRNA to enhance stability and activity. Alternatively or additionally, the pegRNA template may be designed so that it both encodes the desired protein product and is more likely to adopt simple secondary structures that can be unfolded by the RT. Such simple structures would act as a thermodynamic sink, making it less likely that more complex structures that prevent reverse transcription would form. Finally, the template can also be split across two separate pegRNAs. In such a design, a PE would be used to initiate transcription and also to recruit a separate template RNA to the target site, either via an RNA-binding protein fused to Cas9 or via an RNA recognition element (e.g., an MS2 aptamer) on the pegRNA itself. The RT may bind directly to this separate template RNA, or it may initiate reverse transcription on the original pegRNA before switching to the second template.
This approach could enable long insertions both by preventing misfolding of the pegRNA upon addition of the long template and by not requiring dissociation of Cas9 from the genome for long insertions to occur, a requirement that may be inhibiting PE-based long insertions.
In other embodiments, the pegRNA can be improved by introducing additional RNA motifs at the 5' and 3' ends of the pegRNA. Several such motifs, e.g., the PAN ENE from KSHV and the ENE from MALAT1, are discussed above as possible means of terminating expression of longer pegRNAs from non-pol III promoters. These elements form RNA triple helices that engulf the polyA tail, resulting in their retention in the nucleus 184,187. In addition, by forming complex structures at the 3' end of the pegRNA that occlude the terminal nucleotides, these structures may also help prevent exonuclease-mediated degradation of the pegRNA.
Other structural elements inserted at the 3' end may also enhance RNA stability, although they do not enable termination from non-pol III promoters. Such motifs may include hairpins or RNA quadruplexes that occlude the 3' end 197, or self-cleaving ribozymes (e.g., HDV) that result in the formation of a 2'-3'-cyclic phosphate at the 3' end and may also make the pegRNA less likely to be degraded by exonucleases 198. Inducing the pegRNA to cyclize via incomplete splicing, forming a ciRNA, may also increase pegRNA stability and result in retention of the pegRNA in the nucleus 194.
Additional RNA motifs may also improve RT processivity or enhance pegRNA activity by enhancing RT binding to the DNA-RNA duplex. Adding the native sequence bound by the RT in its cognate retroviral genome may enhance RT activity 199. This may include the native primer binding site (PBS), the polypurine tract (PPT), or kissing loops involved in retroviral genome dimerization and initiation of transcription 199.
Adding dimerization motifs, such as kissing loops or a GNRA tetraloop/tetraloop receptor pair 200, at the 5' and 3' ends of the pegRNA may also result in effective circularization of the pegRNA, improving stability. Furthermore, it is expected that adding these motifs may allow physical separation of the pegRNA spacer and primer, preventing occlusion of the spacer, which would hinder PE activity. Short 5' or 3' extensions of the pegRNA that form small toehold hairpins in the spacer region or along the primer binding site may also compete favorably against the annealing of complementary regions within the pegRNA, such as the interaction between the spacer and the primer binding site. Finally, kissing loops can also be used to recruit other template RNAs to the genomic site and enable swapping of RT activity from one RNA to another. Exemplary modifications include, but are not limited to, SEQ ID NOs: 251-255.
The pegRNA scaffold can be further improved via directed evolution, in a manner analogous to how SpCas9 and the guided editor (PE) have been improved. Directed evolution may enhance recognition of the pegRNA by Cas9 or evolved Cas9 variants. Furthermore, different pegRNA scaffold sequences may be optimal at different genomic loci, either enhancing PE activity at the site in question, reducing off-target activity, or both. Finally, evolving pegRNA scaffolds to which other RNA motifs have been added would almost certainly improve the activity of the fused pegRNA relative to the unevolved fusion RNA. For example, evolution of an allosteric ribozyme composed of a c-di-GMP-I aptamer and a hammerhead ribozyme resulted in a significant increase in activity 202, suggesting that evolution would also improve the activity of hammerhead-pegRNA fusions. Furthermore, although Cas9 currently does not generally tolerate 5' extension of the sgRNA, directed evolution may generate enabling mutations that mitigate this intolerance, allowing additional RNA motifs to be used.
The present disclosure contemplates any such approach for further increasing the efficiency of the guided editing systems disclosed herein.
[7] Computational method for nucleotide linker design
In one aspect of the disclosure, the inventors have developed new computational techniques, which may be embodied in software, for identifying one or more nucleotide linkers for coupling a guided editing guide RNA to a nucleic acid moiety, such as, but not limited to, an aptamer (e.g., the prequeosine1-1 riboswitch aptamer or "evopreQ1-1") or a variant thereof, a pseudoknot (e.g., the MMLV viral genome pseudoknot or "Mpknot-1") or a variant thereof, a tRNA (e.g., a modified tRNA used by MMLV as a primer for reverse transcription) or a variant thereof, or a G-quadruplex or a variant thereof. Exemplary nucleotide sequences of such linkers are provided throughout this disclosure and include, but are not limited to, SEQ ID NOs: 225-236.
The computational technique, which may be referred to herein as the pegRNA Linker Identification Tool ("pegLIT"), involves efficiently evaluating nucleic acid linker candidates to identify those that have a lower propensity for base pairing with other regions of the pegRNA (e.g., regions comprising the primer binding site, spacer, DNA synthesis template, and/or gRNA core). In some embodiments, the propensity of a particular linker candidate to base pair with one or more regions of the pegRNA can be determined using a computational tool that models RNA-RNA interactions while taking RNA secondary structure into account. An illustrative example of such a computational tool is ViennaRNA, aspects of which are described in Lorenz, R. et al., ViennaRNA Package 2.0, Algorithms Mol Biol 6, 26 (2011), the entire contents of which are incorporated herein by reference.
The inventors have recognized that evaluating the fitness of every possible nucleic acid linker candidate is computationally impractical because: (1) assessing the fitness of an individual linker candidate is computationally expensive (e.g., because it involves physics-based RNA secondary structure modeling); and (2) the number of linker candidates to be considered increases exponentially with linker length. Furthermore, in a screening context, all linker candidates have to be re-evaluated whenever the pegRNA changes (e.g., in the PBS, spacer, template, and/or core region).
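To make the exponential growth concrete: a linker of length n built from the four RNA nucleotides has 4^n possible sequences. The sketch below simply tabulates that count for a few lengths; the function name is illustrative and does not come from the disclosure.

```python
def count_linker_candidates(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct linker sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (4, 8, 16):
        # Each additional nucleotide multiplies the search space by four.
        print(n, count_linker_candidates(n))
```

An 8-nucleotide linker already has 65,536 candidates and a 16-nucleotide linker more than four billion, each of which would need its own secondary structure calculation under exhaustive search.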
Accordingly, the inventors developed optimization techniques to efficiently explore the space of nucleic acid linker candidates to identify linkers suitable for coupling pegRNA to nucleic acid moieties. In some embodiments, the optimization technique involves identifying a plurality of linker candidates using an iterative optimization method (e.g., simulated annealing). In some embodiments, the linker candidates identified using the optimization technique may be clustered to obtain linker clusters and one or more representative linkers in each cluster may be returned, which may help to promote diversity among the identified linkers.
In some embodiments, the optimization technique involves calculating a plurality of scores for each linker candidate under consideration. Each of the plurality of scores may indicate the degree to which the linker candidate is predicted to interact with a respective region of the pegRNA. Thus, the multiple scores for a single linker candidate represent the extent to which the linker candidate interacts with multiple regions of the pegRNA. Considering the interactions of the linker with the pegRNA on a region-by-region basis helps to determine the fitness of each linker candidate more accurately than other methods.
In fact, the computational techniques developed by the inventors not only provide computational improvements over brute-force search (e.g., reduced use of processor and memory resources), but also identify linkers that improve overall PE editing efficiency compared to manually designed linkers or linkers predicted by computational tools to interact with the primer binding site. The improvement in editing efficiency is shown in, and described with reference to, FIGS. 113A-113E.
Accordingly, some embodiments provide a method for identifying at least one nucleic acid linker for coupling a guided editing guide RNA (pegRNA) to a nucleic acid moiety, the method comprising: (1) generating a plurality of nucleic acid linker candidates including a first nucleic acid linker candidate; (2) identifying at least one nucleic acid linker from the plurality of nucleic acid linker candidates at least in part by: (a) calculating a plurality of scores for at least some of the plurality of nucleic acid linker candidates, the calculating comprising calculating a first set of scores for the first nucleic acid linker candidate, the first set of scores comprising a first score indicative of the degree of interaction between the first nucleic acid linker candidate and a first region of the pegRNA and a second score indicative of the degree of interaction between the first nucleic acid linker candidate and a second region of the pegRNA (e.g., wherein the first and second regions are different regions); and (b) identifying the at least one nucleic acid linker from the at least some of the plurality of nucleic acid linker candidates using the calculated plurality of scores; and (3) outputting information indicative of the at least one nucleic acid linker.
In some embodiments, the first score indicates the extent to which the first nucleic acid linker candidate is predicted to avoid interacting with the first region of the pegRNA, and the second score indicates the extent to which the first nucleic acid linker candidate is predicted to avoid interacting with the second region of the pegRNA. For example, if the first score has a value of 0.8, this may indicate that the predicted probability of the pegRNA folding state lacking base pairing between any nucleotide of the linker candidate and the first region is 80% on average. As another example, if the value of the second score is 0.9, this may indicate that the predicted probability of the pegRNA folding status lacking base pairing between any nucleotide of the linker candidate and the second region is on average 90%.
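One way to read the "on average" wording is as a mean over linker nucleotides of the probability that each nucleotide is not paired with any position in the region. The sketch below computes such a score from a base-pair probability matrix of the kind an RNA folding tool can estimate. The aggregation formula, the function name, and the toy matrix are assumptions for illustration; the disclosure does not spell out the exact formula.

```python
from typing import Sequence


def avoidance_score(
    pair_prob: Sequence[Sequence[float]],
    linker_idx: Sequence[int],
    region_idx: Sequence[int],
) -> float:
    """Mean probability that a linker nucleotide is NOT paired with the region.

    pair_prob[i][j] is the (symmetric) probability that nucleotides i and j
    are base-paired. Indices refer to positions in the full pegRNA sequence.
    """
    if not linker_idx:
        raise ValueError("linker_idx must not be empty")
    total = 0.0
    for i in linker_idx:
        # A nucleotide has at most one partner in any structure, so summing
        # over the region gives the probability of pairing with any of it.
        paired = sum(pair_prob[i][j] for j in region_idx)
        total += 1.0 - min(paired, 1.0)
    return total / len(linker_idx)


# Toy 4-nucleotide example (hypothetical numbers): positions 0-1 are the
# region, positions 2-3 are the linker.
toy = [
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.1],
    [0.3, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]
toy_score = avoidance_score(toy, linker_idx=[2, 3], region_idx=[0, 1])
```

In the toy matrix the two linker nucleotides pair with the region with probabilities 0.3 and 0.1, so the score comes out at approximately 0.8, matching the first example value in the text.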
In some embodiments, the first region may comprise the primer binding site of the pegRNA, the spacer of the pegRNA, the DNA synthesis template of the pegRNA, or the gRNA core of the pegRNA. The second region may likewise comprise the primer binding site, the spacer, the DNA synthesis template, or the gRNA core of the pegRNA.
In some embodiments, the first set of scores further comprises a third score indicating the degree to which the first nucleic acid linker candidate is predicted to avoid interacting with a third region of the pegRNA and a fourth score indicating the degree to which the first nucleic acid linker candidate is predicted to avoid interacting with a fourth region of the pegRNA. In some embodiments, the first, second, third, and fourth regions may comprise the PBS of the pegRNA, the spacer of the pegRNA, the DNA synthesis template of the pegRNA, and the gRNA core of the pegRNA, respectively.
In some embodiments, the pegRNA is used to install a nucleotide edit in a double-stranded target DNA sequence. The pegRNA may comprise: a spacer that hybridizes to a first strand of the double-stranded target DNA sequence; an extension arm that hybridizes to a second strand of the double-stranded target DNA sequence, the extension arm comprising a primer binding site (PBS) and a DNA synthesis template comprising the nucleotide edit; and a gRNA core that interacts with a nucleic acid programmable DNA binding protein (napDNAbp). In some such embodiments, the first region comprises the PBS, the second region comprises the spacer, the third region comprises the DNA synthesis template, and the fourth region comprises the gRNA core.
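The regions named above can be tracked as index ranges over the full pegRNA sequence so that region-specific scores can be computed later. The sketch below is one possible bookkeeping structure. The class name, the 5'-to-3' ordering (spacer, core, template, PBS, linker, motif), and the toy sequences are assumptions for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PegRNA:
    """Illustrative container for pegRNA components, listed 5' to 3'."""

    spacer: str
    core: str
    template: str
    pbs: str
    linker: str = ""
    motif: str = ""

    def parts(self) -> List[Tuple[str, str]]:
        # Assumed ordering: the linker and motif follow the extension arm.
        return [
            ("spacer", self.spacer),
            ("core", self.core),
            ("template", self.template),
            ("pbs", self.pbs),
            ("linker", self.linker),
            ("motif", self.motif),
        ]

    @property
    def sequence(self) -> str:
        return "".join(seq for _, seq in self.parts())

    def region_ranges(self) -> Dict[str, range]:
        """Map each component name to its index range in `sequence`."""
        ranges: Dict[str, range] = {}
        start = 0
        for name, seq in self.parts():
            ranges[name] = range(start, start + len(seq))
            start += len(seq)
        return ranges


# Toy sequences, far shorter than real pegRNA components.
example = PegRNA("GGCC", "GUUU", "AC", "UGA", linker="AACC", motif="GG")
example_ranges = example.region_ranges()
```

Keeping the ranges alongside the sequence means a new linker candidate only changes the `linker` field, while the indices for the other regions are recomputed consistently.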
In some embodiments, the fitness of various linker candidates may be assessed relative to one another based on their scores. In some embodiments, the plurality of nucleic acid linker candidates comprises a second nucleic acid linker candidate, and identifying the at least one nucleic acid linker from the at least some of the plurality of nucleic acid linker candidates using the calculated plurality of scores comprises comparing the first set of scores of the first nucleic acid linker candidate with a second set of scores of the second nucleic acid linker candidate.
There are a number of ways in which the score sets of two different linker candidates can be compared. For example, in some embodiments, each score set may contain component scores for certain regions (e.g., the region containing the PBS, the region containing the spacer), and candidates may first be compared based on their respective scores for a particular region (e.g., the region containing the PBS). If the scores for that region are equal or within a threshold of each other (e.g., such that they may be considered close to each other), the scores for another region (e.g., the region containing the spacer) may be compared. If the scores for that other region are equal or within a threshold, the scores for a third region (e.g., the region containing the DNA synthesis template) may be compared. If the scores for the third region are equal or within a threshold, the scores for a fourth region (e.g., the region containing the gRNA core) may be compared, and so on.
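The region-by-region comparison described above can be sketched as a prioritized comparison with a tolerance. The function name, the default threshold of 0.01, and the convention that higher scores are better are assumptions for illustration.

```python
from typing import Sequence


def compare_score_sets(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.01
) -> int:
    """Return 1 if `a` is better, -1 if `b` is better, 0 if tied everywhere.

    Scores are ordered by region priority (e.g., PBS, spacer, template, core).
    A higher score means the linker is predicted to avoid that region more.
    """
    if len(a) != len(b):
        raise ValueError("score sets must cover the same regions")
    for score_a, score_b in zip(a, b):
        if abs(score_a - score_b) <= threshold:
            continue  # too close to call; defer to the next region
        return 1 if score_a > score_b else -1
    return 0
```

Note that a tolerance-based comparison of this kind is not transitive, so it suits pairwise decisions better than general-purpose sorting; rounding each score to a fixed precision before an ordinary tuple comparison is one alternative that yields a consistent ordering.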
Thus, in some embodiments, the first region comprises a primer binding site (PBS), the first score in the first set of scores indicates the degree to which the first nucleic acid linker candidate is predicted to avoid interacting with the first region of the pegRNA, a third score in the second set of scores indicates the degree to which the second nucleic acid linker candidate is predicted to avoid interacting with the first region of the pegRNA, and comparing the first set of scores to the second set of scores comprises comparing the first score to the third score. In some embodiments, when the first score is equal to or within a threshold distance of the third score, comparing the first set of scores to the second set of scores further comprises comparing a score of the first set of scores other than the first score to a score of the second set of scores other than the third score.
In some embodiments, the techniques for identifying linker candidates may be performed iteratively. In some embodiments, generating the plurality of nucleic acid linker candidates and identifying the at least one nucleic acid linker from the plurality of nucleic acid linker candidates may be performed according to an iterative optimization algorithm (e.g., an algorithm involving simulated annealing).
In some embodiments, the plurality of nucleic acid linker candidates comprises a second nucleic acid linker candidate, and the generating and identifying according to the iterative optimization algorithm comprise: generating the first nucleic acid linker candidate; determining a first score for the first nucleic acid linker candidate, wherein the identifying comprises determining, based on the first score, whether to include the first nucleic acid linker candidate in the at least one nucleic acid linker; after determining the first score, generating the second nucleic acid linker candidate; and determining a second score for the second nucleic acid linker candidate, wherein the identifying comprises determining, based on the second score, whether to include the second nucleic acid linker candidate in the at least one nucleic acid linker. Aspects of such iterative embodiments are described herein, including with reference to FIG. 119.
In some embodiments, calculating the scores in the first set of scores is performed using software for modeling RNA-RNA interactions (e.g., ViennaRNA).
In some embodiments, the technique further comprises filtering the plurality of nucleic acid linker candidates using one or more filtering rules. For example, according to one filtering rule, linker candidates having at least a threshold number of consecutive occurrences of the same nucleotide (e.g., four uridines in a row) may be removed from further consideration. As another example, according to another filtering rule, linker candidates having an AC content below a threshold percentage (e.g., below 50%) may be removed from further consideration. One or more other filtering rules may be used in addition to or instead of these two example filtering rules, as aspects of the techniques described herein are not limited in this respect.
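The two example filtering rules can be sketched as a simple predicate. The function names and default thresholds are illustrative: a run of four identical nucleotides fails (so the longest allowed run is three), and an AC fraction below 0.5 fails, following the examples in the text.

```python
def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated nucleotide."""
    longest = current = 0
    previous = ""
    for base in seq:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def passes_filters(
    linker: str, max_run: int = 3, min_ac_fraction: float = 0.5
) -> bool:
    """Apply the homopolymer-run rule and the AC-content rule."""
    linker = linker.upper()
    if not linker:
        return False
    if longest_run(linker) > max_run:
        return False
    ac_fraction = sum(base in "AC" for base in linker) / len(linker)
    return ac_fraction >= min_ac_fraction
```

Because both checks are cheap string operations, running them before any structure prediction avoids spending folding time on candidates that would be rejected anyway.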
In some embodiments, the technique further comprises clustering the identified linker candidates and identifying linkers that represent the different clusters, so as to obtain a diverse set of linker candidates. Thus, in some embodiments, identifying the at least one nucleic acid linker comprises: identifying a subset of the plurality of nucleic acid linker candidates based on their respective scores; clustering the subset of nucleic acid linker candidates (e.g., using hierarchical agglomerative clustering or any other suitable clustering technique) to obtain a plurality of clusters; and including at least one representative member of each of the plurality of clusters in the at least one nucleic acid linker.
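As a minimal sketch of the clustering step, the code below performs complete-linkage agglomerative clustering on equal-length linkers using Hamming distance, then takes the first member of each cluster as its representative. The distance measure, the linkage rule, and the assumption that the input list is already ordered from best to worst score are illustrative choices, not details taken from the disclosure.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_linkers(linkers: List[str], n_clusters: int) -> List[List[int]]:
    """Complete-linkage agglomerative clustering; returns index clusters."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(linkers))]
    while len(clusters) > n_clusters:
        best = None  # (distance, first cluster position, second position)
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Complete linkage: distance of the farthest member pair.
                dist = max(
                    hamming(linkers[i], linkers[j])
                    for i in clusters[p]
                    for j in clusters[q]
                )
                if best is None or dist < best[0]:
                    best = (dist, p, q)
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


def representatives(linkers: List[str], n_clusters: int) -> List[str]:
    """Pick the best-ranked (lowest-index) member of each cluster."""
    return [linkers[c[0]] for c in cluster_linkers(linkers, n_clusters)]
```

This quadratic-per-merge loop is adequate for the roughly one hundred retained candidates mentioned in the text; a library implementation would be preferable for larger sets.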
In some embodiments, one or more of the identified nucleic acid linkers can be prepared and used in the various applications described herein.
FIG. 118 is a flowchart of an illustrative process 11800 for identifying one or more nucleic acid linkers for coupling a guided editing guide RNA to a nucleic acid portion, in accordance with some embodiments of the technology described herein. Process 11800 may be implemented using any suitable computing device, as aspects of the techniques described herein are not limited in this respect.
Process 11800 begins with operation 11802, where one or more nucleic acid linker candidates may be generated. Any suitable number of linker candidates may be generated in any suitable manner at operation 11802. In some embodiments, a plurality of linker candidates may be generated at operation 11802, and some or all of these candidates may be further evaluated at operation 11810 and its sub-operations. In other embodiments, one or a small number of linker candidates may be generated at operation 11802, and when it is determined that additional linker candidates are needed (e.g., at operation 11818), process 11800 may return to operation 11802 to generate additional linker candidates.
Each generated linker candidate may have any suitable length. For example, a linker candidate may consist of 4 nucleotides, 8 nucleotides, or 15 nucleotides, or of 4-32 nucleotides, 8-16 nucleotides, or any suitable number of nucleotides in any other suitable range within these ranges.
In some embodiments, the linker candidates may be generated by randomly selecting each of one or more (or all) nucleotides. Each nucleotide may be selected uniformly at random or according to a specified distribution (e.g., uniform or any other discrete distribution). In some embodiments, each nucleotide may be selected independently of other linker nucleotides. In some embodiments, two or more nucleotides may be selected in a related manner, for example, by sampling from a joint distribution defined over a sequence of two or more nucleotides (whether contiguous or not).
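Independent per-nucleotide sampling, either uniform or weighted, can be sketched with the standard library. The function name and the use of an explicit `random.Random` instance (for reproducibility) are illustrative choices; sampling from a joint distribution over several positions is not shown.

```python
import random
from typing import Optional, Sequence

NUCLEOTIDES = "ACGU"


def random_linker(
    length: int,
    rng: random.Random,
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Draw each nucleotide independently.

    With weights=None the draw is uniform over A, C, G, U; otherwise
    `weights` gives relative probabilities in that order.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return "".join(rng.choices(NUCLEOTIDES, weights=weights, k=length))


# Example: an 8-nucleotide candidate biased toward A and C.
candidate = random_linker(8, random.Random(0), weights=[3, 3, 1, 1])
```

Biasing the weights toward A and C is one way to make randomly generated candidates more likely to pass an AC-content filter, at the cost of exploring the sequence space less evenly.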
Next, process 11800 proceeds to operation 11810, where at least one nucleic acid linker is identified from the nucleic acid linker candidates generated during operation 11802. The identification involves: (1) at operation 11812, calculating a plurality of scores for each of at least some of the linker candidates generated during operation 11802; and (2) at operation 11818, identifying at least one nucleic acid linker candidate using the plurality of scores calculated at operation 11812. Each of these operations is described in turn. In some embodiments, one or more filtering rules (examples of which are described herein) may be used to filter the linker candidates so that computing resources need not be expended on calculating scores for otherwise unsuitable linker candidates.
As described herein, calculating the plurality of scores for a particular linker candidate involves determining a score for each of a corresponding plurality of regions of the pegRNA. In some embodiments, each of the plurality of scores may indicate the degree of interaction between the linker candidate and a respective region of the plurality of regions. For example, a particular score may indicate the degree to which the linker is predicted to interact with, or to avoid interacting with, a particular region. In an illustrative example, operation 11812 involves calculating at least two scores for each of at least some linker candidates. Specifically, at operation 11814, a first score indicative of the degree of interaction between a first linker candidate and a first region of the pegRNA (e.g., a region comprising the PBS or any other suitable region, examples of which are provided herein) may be calculated, and at operation 11816, a second score indicative of the degree of interaction between the first linker candidate and a second region of the pegRNA (e.g., a region comprising the spacer or any other suitable region, examples of which are provided herein) may be calculated. Each score may be calculated using RNA-RNA interaction modeling software (e.g., ViennaRNA) or any other suitable software, as aspects of the techniques described herein are not limited in this respect.
Although in the illustrative example operation 11812 involves calculating two scores, in some embodiments, 3, 4, 5, 6, or any other suitable number of scores may be calculated for each linker candidate to obtain a measure of the degree of interaction with 3, 4, 5, 6, or any other suitable number of pegRNA regions, as aspects of the techniques described herein are not limited in this respect. For example, in one illustrative embodiment, four scores for the linker candidates may be calculated and the extent of interaction between the linker candidates and PBS, spacer, DNA synthesis template, and gRNA core region of the pegRNA may be indicated.
At operation 11818, the calculated scores may be used to identify the "best" nucleic acid linker candidates. For example, the scores may be used to identify a subset of linker candidates that are predicted to have minimal interactions with one or more regions of the pegRNA. Interactions with certain regions of the pegRNA may be considered worse than interactions with other regions of the pegRNA. Thus, in some embodiments, the scores may be examined on a per-region basis to identify linker candidates for subsequent use. For example, in some embodiments, the linker candidates may be compared based on their PBS scores, each of which represents the degree to which a linker is predicted to avoid interacting with the pegRNA region comprising the PBS. A threshold number of linker candidates having the least interaction with that region may be retained (e.g., the 100 linker candidates predicted to have the least interaction with the PBS may be retained). If multiple linker candidates have the same score for the same region, these candidates may be compared and ranked using their scores for other regions, as described herein.
In some embodiments, operations 11812 and 11818 may be performed according to an iterative optimization algorithm (as indicated by the arrows from 11818 to 11812 in fig. 118). Operations 11812 and 11818 may be performed in accordance with, for example, a simulated annealing technique. This aspect is described herein, including with reference to fig. 119.
After a subset of nucleic acid linker candidates has been identified based on their respective scores (any suitable number of such candidates may be identified; for example, the number may be controlled by a parameter specifying how many candidates are desired), in some embodiments, the identified nucleic acid linker candidates may be further screened to identify a diverse subset of linker candidates. To this end, in some embodiments, the identified nucleic acid linker candidates may be clustered to obtain a plurality of clusters, and one or more representative linkers from each cluster may be output at operation 11818, which promotes sequence diversity among the output linker candidates. Any suitable clustering technique (e.g., agglomerative hierarchical clustering) may be used for this purpose, as aspects of the techniques described herein are not limited in this respect.
Information about the linker candidates identified during operation 11810 may be output at operation 11820. The information may include the sequence of the linker candidates, their scores, and/or any other relevant information. The information may be transmitted over a communications network to one or more other computing devices, stored in at least one non-transitory computer-readable storage medium (e.g., in memory, on a hard disk drive, in a file, etc.) for subsequent access, presented in a graphical user interface, and/or output in any other suitable manner.
As described above, aspects of process 11800 may be performed iteratively. An illustrative example is shown in FIG. 119. FIG. 119 is a flowchart of an illustrative process 11900 for iteratively identifying one or more nucleic acid linkers for coupling a guided editing guide RNA to a nucleic acid portion, in accordance with some embodiments of the technology described herein. Process 11900 may be implemented using any suitable computing device, as aspects of the techniques described herein are not limited in this respect.
Process 11900 begins with operation 11902, where a nucleic acid linker candidate is generated. The linker candidate may have any suitable length, examples of which are provided herein. The linker candidate may be generated in any suitable manner, including any of the manners described with reference to operation 11802.
Next, process 11900 proceeds to decision block 11904, where it is determined whether the linker candidate generated at operation 11902 passes one or more filtering rules. Any suitable filtering rules may be used to eliminate unwanted linker candidates. For example, linker candidates having at least a threshold number of consecutive occurrences of the same nucleotide (e.g., four consecutive uridines) may be removed from further consideration (by failing the filtering rule). As another example, linker candidates having an AC content below a threshold percentage (e.g., below 50%) may be removed from further consideration (by failing the filtering rule).
When it is determined at decision block 11904 that the linker candidate does not pass the one or more filtering rules, process 11900 returns to operation 11902, where another linker candidate is generated. On the other hand, when it is determined at decision block 11904 that the linker candidate passes the one or more filtering rules, process 11900 proceeds to operation 11906, where a plurality of scores for the linker candidate are calculated. As described herein, the plurality of scores indicates, for each of a corresponding plurality of pegRNA regions, the degree of interaction between the linker candidate and that region. Aspects of how to calculate a plurality of scores for a linker candidate are described herein.
Next, process 11900 proceeds to operation 11908, where the scores of the linker candidate determined at operation 11906 are compared to the scores of previously retained linker candidates. For example, a set of linker candidates (e.g., 100 candidates) may have been retained from among the linker candidates examined so far, and the scores of the new linker candidate (the candidate generated at operation 11902) may be compared to the previously determined scores of the retained candidates. Aspects of how the scores of a linker candidate are compared to the respective scores of other linker candidates are described herein.
Based on the comparison, at decision block 11910, it is determined whether to retain the new linker candidate. When the comparison of operation 11908 indicates that the new linker candidate is better than at least one of the retained candidates, the new linker candidate may be retained at operation 11912. Optionally, one of the previously retained linker candidates may be removed from the list (e.g., if only a fixed number of linker candidates can be retained and adding the new linker to the list would cause the total number of retained linker candidates to exceed that fixed number).
On the other hand, when the comparison of operation 11908 indicates that the new linker candidate is not better than any of the retained candidates, the new linker candidate is retained at operation 11912 only with a certain probability; otherwise it is discarded and process 11900 returns to operation 11902, where a new linker candidate may be generated. In some embodiments, the probability may be selected according to a simulated annealing procedure. In this sense, the iterative optimization scheme of process 11900 may be considered to involve simulated annealing.
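The retention decision at blocks 11908 through 11912 can be sketched as follows. For brevity the sketch collapses each candidate's scores into a single scalar, uses a Metropolis-style acceptance probability exp((new - worst) / T), and evicts the worst retained candidate when the list is full. All three choices, along with every name in the code, are assumptions for illustration; the disclosure does not specify the acceptance formula or cooling schedule.

```python
import math
import random
from typing import List, Tuple

Scored = Tuple[float, str]  # (score, linker); a higher score is better


def acceptance_probability(new: float, worst: float, temperature: float) -> float:
    """Probability of keeping a candidate, given the worst retained score."""
    if new > worst:
        return 1.0  # better than at least one retained candidate
    if temperature <= 0:
        return 0.0  # fully cooled: never keep a non-improving candidate
    return math.exp((new - worst) / temperature)


def consider(
    retained: List[Scored],
    candidate: Scored,
    capacity: int,
    temperature: float,
    rng: random.Random,
) -> bool:
    """Update `retained` in place; return True if the candidate was kept."""
    if len(retained) < capacity:
        retained.append(candidate)
        return True
    worst = min(retained)
    prob = acceptance_probability(candidate[0], worst[0], temperature)
    if rng.random() < prob:
        # Keep the list at its fixed size by evicting the worst entry.
        retained.remove(worst)
        retained.append(candidate)
        return True
    return False
```

In a full loop the temperature would typically be lowered over successive iterations, so that non-improving candidates are accepted often at the start and rarely near the end.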
After the linker candidate is retained at operation 11912, process 11900 proceeds to decision block 11914, where it is determined whether additional linker candidates should be generated. Such a determination may be made in any suitable manner, for example, based on the number of iterations performed or the time spent, how many linker candidates have been retained, an estimate of the quality and/or diversity of the retained candidates, and/or any other suitable indicator. When it is determined that additional linker candidates are to be generated, process 11900 returns to operation 11902, where a new linker candidate may be generated. Otherwise, process 11900 proceeds to operation 11916, where information indicating at least some of the retained linker candidates (e.g., all of them, or at least one representative member of each cluster) is output. Examples of outputting information about retained linker candidates are described herein.
FIG. 120 shows an illustrative implementation of a computer system 12000 in which embodiments of the techniques described herein may be implemented. For example, any of the computing devices described herein may be implemented as computer system 12000. The computer system 12000 may include one or more computer hardware processors 12002 and one or more articles of manufacture comprising non-transitory computer-readable storage media (e.g., memory 12004 and one or more non-volatile storage devices 12006). The one or more processors 12002 may control the writing of data to, and the reading of data from, the memory 12004 and the one or more non-volatile storage devices 12006 in any suitable manner. To perform any of the functions described herein, the one or more processors 12002 may execute one or more processor-executable instructions stored in one or more non-transitory computer-readable storage media (e.g., memory 12004), which may serve as non-transitory computer-readable storage media storing processor-executable instructions for execution by the one or more processors 12002.
The terms "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that can be used to program a computer or other processor to implement the various aspects of the embodiments described above. Furthermore, according to one aspect, one or more computer programs that, when executed, perform the methods of the disclosure provided herein need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various aspects of the disclosure provided herein.
The processor-executable instructions may be in a variety of forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed.
Furthermore, the data structures may be stored in any suitable form in one or more non-transitory computer-readable storage media. For simplicity of illustration, the data structure may be shown with fields related by location in the data structure. Such relationships may also be implemented by allocating storage for fields having locations in a non-transitory computer-readable medium that conveys relationships between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish a relationship between data elements.
As used herein in the specification and claims, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements, and not excluding any combination of elements in the list of elements. This definition also allows that elements other than those specifically identified within the list of elements to which the phrase "at least one" refers may optionally be present, whether related or unrelated to those elements specifically identified. Thus, for example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently, "at least one of A and/or B") may refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
The phrase "and/or," as used herein in the specification and claims, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are present together in some cases and present separately in other cases. Multiple elements listed with "and/or" should be construed in the same manner, i.e., "one or more" of the elements so conjoined. Elements other than those specifically identified by the "and/or" clause may optionally be present, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to "A and/or B," when used in conjunction with open-ended language such as "comprising," may refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed. These terms are only used as labels to distinguish one claim element having a particular name from another element having the same name (but using ordinal terms). The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," "having," "containing," "involving," and variations thereof herein, is meant to encompass the items listed thereafter and additional items.
Having described in detail several embodiments of the technology described herein, various modifications and improvements will readily occur to those skilled in the art. Such modifications and improvements are intended to be within the spirit and scope of this disclosure. Accordingly, the foregoing description is by way of example only and is not intended as limiting. These techniques are defined only by the following claims and their equivalents.
[9] Split pegRNA design for trans-guided editing
The present disclosure also contemplates trans-guided editing, a modified form of guided editing that operates by separating the pegRNA into two distinct molecules: a guide RNA and a tPERT molecule. The tPERT molecule is programmed to co-localize with the guide editor complex at the target DNA site, bringing the primer binding site (PBS) and the DNA synthesis template to the guide editor in trans. For example, see FIG. 3G for an embodiment of a trans guide editor (tPE), which shows a two-component system comprising (1) a recruiting protein (RP)-PE:gRNA complex and (2) a tPERT that includes a primer binding site and a DNA synthesis template linked to an RNA-protein recruitment domain (e.g., a stem loop or hairpin), wherein the recruiting protein component of the RP-PE:gRNA complex recruits the tPERT to the target site to be edited, thereby associating the PBS and DNA synthesis template with the guide editor in trans. In other words, the tPERT is engineered to contain all or part of the extension arm of a pegRNA, which includes the primer binding site and the DNA synthesis template. One advantage of this approach is that it separates the extension arm of the pegRNA from the guide RNA, thereby minimizing the annealing interactions that tend to occur between the PBS of the extension arm and the spacer sequence of the guide RNA.
Trans-guided editing can be performed with any of the pegRNAs described herein, including the modified pegRNAs described herein that increase the efficiency of guided editing.
A key feature of trans-guided editing is the ability of the trans guide editor to recruit the tPERT to the DNA editing site, thereby effectively co-localizing all of the functions of a pegRNA at the editing site. Recruitment can be achieved by installing an RNA-protein recruitment domain (e.g., an MS2 aptamer) into the tPERT and fusing a corresponding recruiting protein to the guide editor (e.g., via a linker to the napDNAbp or via a linker to the polymerase) that is capable of specifically binding the RNA-protein recruitment domain, thereby recruiting the tPERT molecule to the guide editor complex. As depicted in the process shown in FIG. 3H, the RP-PE:gRNA complex binds and nicks the target DNA sequence. The recruiting protein (RP) then recruits the tPERT to co-localize with the guide editor complex bound to the DNA target site, allowing the primer binding site on the tPERT to bind the primer sequence on the nicked strand, and subsequently allowing the polymerase (e.g., RT) to synthesize a single strand of DNA against the DNA synthesis template located on the tPERT, up to the 5' end of the tPERT.
While the tPERT is shown in FIGS. 3G and 3H as comprising the PBS and DNA synthesis template on the 5' side of the RNA-protein recruitment domain, the tPERT in other configurations may be designed with the PBS and DNA synthesis template located on the 3' side of the RNA-protein recruitment domain. However, a tPERT with the 5' extension has the advantage that synthesis of the single strand of DNA will naturally terminate at the 5' end of the tPERT, and thus there is no risk of using any portion of the RNA-protein recruitment domain as a template during the DNA synthesis stage of guided editing.
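The advantage of the 5'-extended tPERT can be illustrated with a toy model. The sketch below is an assumption-laden simplification rather than anything taken from the disclosure: it treats a tPERT as an ordered list of named domains written 5' to 3', models DNA synthesis as starting at the PBS and running toward the 5' end of the RNA, and uses arbitrary placeholder sequences and a hypothetical function name.

```python
def templated_domains(domains, pbs_name="PBS"):
    """Return the names of the domains a polymerase would copy, in copying order.

    domains: list of (name, sequence) tuples written 5'->3'.
    Synthesis is modeled as starting at the PBS and running toward the
    5' end of the RNA, so every domain 5' of the PBS is treated as template.
    """
    names = [name for name, _ in domains]
    pbs_index = names.index(pbs_name)   # raises ValueError if there is no PBS
    return list(reversed(names[:pbs_index]))


# Arbitrary placeholder sequences; they are NOT sequences from the disclosure.
five_prime_extended = [
    ("synthesis_template", "ACGUACGU"),
    ("PBS", "GGCAUGC"),
    ("recruitment_domain", "CCUUAAGG"),
]
three_prime_extended = [
    ("recruitment_domain", "CCUUAAGG"),
    ("synthesis_template", "ACGUACGU"),
    ("PBS", "GGCAUGC"),
]

print(templated_domains(five_prime_extended))
print(templated_domains(three_prime_extended))
```

In this model only the 3' configuration places the recruitment domain in the polymerase's path, which is the risk the 5'-extended design avoids.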
[8] Delivery of guide editors
In another aspect, the present disclosure provides methods of delivering guide editors in vitro and in vivo using a variety of strategies, including the use of split inteins on separate vectors, and the direct delivery of ribonucleoprotein complexes (i.e., guide editors complexed with a pegRNA and/or a second-site gRNA) using techniques such as electroporation, cationic lipid-mediated formulations, and induced endocytosis using receptor ligands fused to the ribonucleoprotein complexes. Any such method is contemplated herein.
Overview of delivery options
In some aspects, the invention provides methods comprising delivering to a host cell one or more polynucleotides encoding a guide editor, such as one or more vectors described herein that encode one or more components of the guided editing system described herein, one or more transcripts thereof, and/or one or more proteins expressed therefrom. In some aspects, the invention further provides cells produced by such methods, as well as organisms (e.g., animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a guide editor described herein is delivered to the cell in combination with (and optionally complexed with) a guide sequence. Conventional viral and non-viral gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a guide editor to cells in culture or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle (e.g., a liposome). Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin (1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Böhm (eds.) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).
Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration).
The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to those skilled in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183; 4,217,344; 4,235,871; 4,261,975; 4,485,054; 4,501,728; 4,774,085; 4,837,028; and 4,946,787).
The use of RNA or DNA virus-based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo), or they can be used to treat cells in vitro, with the modified cells optionally being administered to patients (ex vivo). Conventional virus-based systems can include retroviral, lentiviral, adenoviral, adeno-associated viral, and herpes simplex viral vectors for gene transfer. Integration into the host genome is possible with the retroviral, lentiviral, and adeno-associated viral gene transfer methods, often resulting in long-term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
The tropism of a virus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. The choice of a retroviral gene transfer system therefore depends on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats (LTRs) with packaging capacity for up to 6-10 kb of foreign sequence. The minimal cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based on murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700). In applications where transient expression is preferred, adenovirus-based systems may be used. Adenovirus-based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titers and high levels of expression have been obtained. These vectors can be produced in large quantities in a relatively simple system. Adeno-associated virus ("AAV") vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:03822-3828 (1989).
Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, with the other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically possess only the ITR sequences from the AAV genome, which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line that contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts because it lacks ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment, to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, e.g., US20030087817, incorporated herein by reference.
In various embodiments, the PE constructs (including split constructs) may be engineered for delivery in one or more rAAV vectors. An rAAV as related to any of the methods and compositions provided herein may be of any serotype, including any derivative or pseudotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 2/1, 2/5, 2/8, 2/9, 3/1, 3/5, 3/8, or 3/9). An rAAV may comprise a genetic cargo (i.e., a recombinant nucleic acid vector that expresses a gene of interest, such as a whole or split guide editor that is carried by the rAAV into a cell) to be delivered to a cell. An rAAV may be chimeric.
As used herein, the serotype of an rAAV refers to the serotype of the capsid proteins of the recombinant virus. Non-limiting examples of derivatives and pseudotypes include rAAV2/1, rAAV2/5, rAAV2/8, rAAV2/9, AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6 (Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAV 3.45. A non-limiting example of a derivative or pseudotype that has a chimeric VP1 protein is rAAV2/5-1VP1u, which has the genome of AAV2, the capsid backbone of AAV5, and the VP1u of AAV1. Other non-limiting examples of derivatives and pseudotypes that have chimeric VP1 proteins are rAAV2/5-8VP1u, rAAV2/9-1VP1u, and rAAV2/9-8VP1u.
AAV derivatives/pseudotypes, and methods of producing such derivatives/pseudotypes, are known in the art (see, e.g., Asokan A, Schaffer DV, Samulski RJ. The AAV vector toolkit: poised at the clinical crossroads. Mol Ther. 2012 Apr;20(4):699-708. doi:10.1038/mt.2011.287. Epub 2012 Jan 24). Methods for producing and using pseudotyped rAAV vectors are known in the art (see, e.g., Duan et al., J. Virol., 75:7662-7671, 2001; Halbert et al., J. Virol., 74:1524-1532, 2000; Zolotukhin et al., Methods, 28:158-167, 2002; and Auricchio et al., Hum. Molec. Genet., 10:3075-3081, 2001).
Methods of making or packaging rAAV particles are known in the art, and reagents are commercially available (see, e.g., Zolotukhin et al., Production and purification of serotype 1, 2, and 5 recombinant adeno-associated viral vectors, Methods 28 (2002) 158-167; U.S. Patent Publication Nos. US20070015238 and US20120322861, which are incorporated herein by reference; and plasmids and kits available from ATCC and Cell Biolabs, Inc.). For example, a plasmid comprising a gene of interest may be combined with one or more helper plasmids, e.g., plasmids that contain a rep gene (e.g., encoding Rep78, Rep68, Rep52, and Rep40) and a cap gene (encoding VP1, VP2, and VP3, including a modified VP2 region as described herein), and transfected into a recombinant cell such that the rAAV particle can be packaged and subsequently purified.
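The naming pattern in these examples can be read mechanically. The sketch below is an illustrative assumption about the `rAAVx/y` and `rAAVx/y-zVP1u` forms only (genome serotype, capsid backbone serotype, and optional VP1u serotype, following the rAAV2/5-1VP1u example above); the function name and returned keys are hypothetical, and names such as AAVrh.10 that do not follow this pattern are deliberately rejected.

```python
import re

# Matches e.g. "rAAV2/5" or "rAAV2/5-1VP1u"; other naming schemes are out of scope.
_PSEUDOTYPE = re.compile(r"^rAAV(\d+)/(\d+)(?:-(\d+)VP1u)?$")


def parse_pseudotype(name):
    """Split a pseudotype name into genome, capsid backbone, and VP1u serotypes."""
    match = _PSEUDOTYPE.match(name)
    if match is None:
        raise ValueError(f"unsupported pseudotype name: {name!r}")
    genome, capsid, vp1u = match.groups()
    return {
        "genome": f"AAV{genome}",
        "capsid_backbone": f"AAV{capsid}",
        # None means the name does not specify a separate VP1u serotype.
        "vp1u": f"AAV{vp1u}" if vp1u else None,
    }


print(parse_pseudotype("rAAV2/5-1VP1u"))
```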
Recombinant AAV may comprise a nucleic acid vector, which may minimally comprise: (a) One or more heterologous nucleic acid regions comprising a sequence encoding a protein or polypeptide of interest or an RNA of interest (e.g., siRNA or microRNA), and (b) one or more regions comprising an Inverted Terminal Repeat (ITR) sequence (e.g., a wild-type ITR sequence or an engineered ITR sequence) flanking the one or more nucleic acid regions (e.g., the heterologous nucleic acid region). The heterologous nucleic acid region comprising a sequence encoding a protein or RNA of interest is referred to herein as a gene of interest.
Any of the rAAV particles provided herein can have a capsid protein having amino acids of different serotypes outside the VP1u region. In some embodiments, the serotype of the VP1 protein backbone is different from the serotype of the ITR and/or Rep genes. In some embodiments, the serotype of the VP1 capsid protein backbone of the particle is the same as the serotype of the ITR. In some embodiments, the serotype of the VP1 capsid protein backbone of the particle is the same as the serotype of the Rep gene. In some embodiments, the capsid protein of the rAAV particle comprises an amino acid mutation that results in increased transduction efficiency.
In some embodiments, a nucleic acid vector comprises one or more regions comprising sequences that promote expression of a nucleic acid (e.g., a heterologous nucleic acid) (e.g., an expression control sequence operably linked to the nucleic acid). Many such sequences are known in the art. Non-limiting examples of expression control sequences include promoters, insulators, silencers, response elements, introns, enhancers, initiation sites, termination signals, and poly(A) tails. Any combination of such control sequences (e.g., promoters and enhancers) is contemplated herein.
The final AAV construct may incorporate sequences encoding pegRNA. In other embodiments, the AAV construct may incorporate a sequence encoding a second site-nick generating guide RNA. In other embodiments, the AAV construct may incorporate a sequence encoding a second site nick generating guide RNA and a sequence encoding a pegRNA.
In various embodiments, the pegRNA and the second site nick generating guide RNA may be expressed from a suitable promoter, such as a human U6 (hU6) promoter, a mouse U6 (mU6) promoter, or other suitable promoter. The pegRNA and the second site nick generating guide RNA may be driven by the same promoter or by different promoters.
In some embodiments, the rAAV construct or composition herein is administered enterally to the subject. In some embodiments, the rAAV construct or composition herein is administered parenterally to the subject. In some embodiments, the rAAV particles or compositions herein are administered subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intraventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, via inhalation, topically, or by direct injection into one or more cells, tissues, or organs. In some embodiments, the rAAV particles or compositions herein are administered to the subject by injection into a hepatic artery or portal vein.
Split PE vector-based strategies
In this aspect, the guide editor can be divided at a cleavage site and provided as two halves of a whole/complete guide editor. The two halves can be delivered to cells (e.g., as expressed proteins or on separate expression vectors) and, once in contact inside the cell, form the complete guide editor through the self-splicing action of the inteins on each guide editor half. Split intein sequences can be engineered into each of the halves of the encoded guide editor to facilitate their trans-splicing inside the cell and the concomitant restoration of the complete, functional PE.
These split intein-based methods overcome several barriers to in vivo delivery. For example, the DNA encoding a guide editor is larger than the rAAV packaging limit and so requires special solutions. One such solution is to formulate the editor fused to split intein pairs that are packaged into two separate rAAV particles and that, when co-delivered to a cell, reconstitute the functional editor protein. Several other special considerations that account for the unique features of guided editing are described, including the optimization of second-site nicking targets and the proper packaging of guide editors into viral vectors, including lentiviruses and rAAV.
FIG. 66 depicts one embodiment of a guide editor provided as two PE half proteins that regenerate a whole guide editor through the self-splicing action of the split intein halves located at the end or beginning of each guide editor half protein. As used herein, the term "PE N-terminal half" refers to the N-terminal half of a complete guide editor (i.e., the N-terminal extein), which comprises the "N-intein" at its C-terminus. The "N-intein" refers to the N-terminal half of a complete, fully formed split intein moiety. As used herein, the term "PE C-terminal half" refers to the C-terminal half of a complete guide editor (i.e., the C-terminal extein), which comprises the "C-intein" at its N-terminus. When the two half proteins, i.e., the PE N-terminal half and the PE C-terminal half, come into contact with one another, e.g., within a cell, the N-intein and the C-intein undergo the simultaneous processes of self-excision and formation of a peptide bond between the C-terminus of the PE N-terminal half and the N-terminus of the PE C-terminal half, thereby reforming the complete guide editor protein comprising the complete napDNAbp domain (e.g., Cas9 nickase) and the RT domain. Although not shown in the drawing, the guide editor may also comprise additional sequences, including an NLS at the N-terminus and/or C-terminus, as well as amino acid linker sequences joining the domains.
In various embodiments, the guide editor may be engineered as two half proteins (i.e., a PE N-terminal half and a PE C-terminal half) by "splitting" the whole guide editor at a "cleavage site." The "cleavage site" refers to the location in the guide editor at which the split intein sequences (i.e., the N-intein and the C-intein) are inserted between two adjacent amino acid residues. More specifically, the "cleavage site" refers to the location at which the whole guide editor is divided into two separate halves, wherein each half is fused at the cleavage site to either the N-intein or the C-intein motif. The cleavage site can be at any suitable location in the guide editor, but preferably it is located at a position that allows for the formation of two half proteins that are appropriately sized for delivery (e.g., by expression vector) and wherein the inteins, which are fused to each half protein at the cleavage-site termini, are available to interact sufficiently with one another when one half protein contacts the other half protein inside the cell.
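The reconstitution described for FIG. 66 can be modeled at the level of sequence strings. This is a deliberately simplified sketch, assuming that each half is the concatenation of an extein and an intein half and that splicing simply removes both intein halves and joins the exteins; the function name and the toy sequences are hypothetical and do not correspond to any SEQ ID NO in the disclosure.

```python
def trans_splice(pe_n_half, pe_c_half, n_intein, c_intein):
    """Join two half proteins by excising their intein halves (toy model).

    pe_n_half is expected to be N-extein + N-intein (intein at its C-terminus).
    pe_c_half is expected to be C-intein + C-extein (intein at its N-terminus).
    """
    if not pe_n_half.endswith(n_intein):
        raise ValueError("PE N-terminal half does not end with the N-intein")
    if not pe_c_half.startswith(c_intein):
        raise ValueError("PE C-terminal half does not start with the C-intein")
    n_extein = pe_n_half[:len(pe_n_half) - len(n_intein)]
    c_extein = pe_c_half[len(c_intein):]
    # New peptide bond: C-terminus of the N-extein to N-terminus of the C-extein.
    return n_extein + c_extein


# Toy sequences; lowercase marks the (made-up) intein halves.
full = trans_splice("MKRTADnnnn", "ccGSETPG", n_intein="nnnn", c_intein="cc")
print(full)
```

The model ignores the junction residues that trans-splicing may require, so it only illustrates the bookkeeping of which segments survive.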
In some embodiments, the cleavage site is located in the napDNAbp domain. In other embodiments, the cleavage site is located in the RT domain. In other embodiments, the cleavage site is located in a linker that connects the napDNAbp domain and the RT domain.
In various embodiments, cleavage site design requires finding sites at which to split the protein and insert the N-terminal and C-terminal inteins that are both structurally permissive and compatible with packaging the two half guide editor domains into two different AAV genomes. Additionally, intein residues necessary for trans-splicing can be incorporated by mutating residues at the N-terminus of the C-terminal extein or by inserting residues that will leave an intein "scar."
Exemplary split configurations of split guide editors comprising either an SpCas9 nickase or an SaCas9 nickase are as follows.
In various embodiments, using SpCas9 nickase (SEQ ID NO: 37, 1368 amino acids) as an example, the split can be between any two amino acids between 1 and 1368. Preferred splits, however, will be located in the central region of the protein, e.g., from amino acid 50 to 1250, or from 100 to 1200, or from 150 to 1150, or from 200 to 1100, or from 250 to 1050, or from 300 to 1000, or from 350 to 950, or from 400 to 900, or from 450 to 850, or from 500 to 800, or from 550 to 750, or from 600 to 700 of SEQ ID NO: 37. In particular exemplary embodiments, the cleavage site may be located at 740/741, or 801/802, or 1010/1011, or 1041/1042. In other embodiments, the cleavage site may be located, relative to SpCas9 of SEQ ID NO: 37, at 1/2, 2/3, 3/4, 4/5, 5/6, 6/7, 7/8, 8/9, 9/10, 10/11, 12/13, 14/15, 15/16, 17/18, 19/20, 20/21, 21/22, 22/23, 23/24, 24/25, 25/26, 26/27, 27/28, 28/29, 29/30, 30/31, 31/32, 32/33, 33/34, 34/35, 35/36, 36/37, 38/39, 39/40, 41/42, 42/43, 43/44, 44/45, 45/46, 46/47, 47/48, 48/49, 49/50, 51/52, 52/53, 53/54, 54/55, 55/56, 56/57, 57/58, 58/59, 59/60, 61/62, 62/63, 63/64, 64/65, 65/66, 66/67, 67/68, 68/69, 69/70, 71/72, 72/73, 73/74, 74/75, 75/76, 76/77, 77/78, 78/79, 79/80, 81/82, 82/83, 83/84, 84/85, 85/86, 86/87, 87/88, 88/89, 89/90, or within residues 90 to 100, 100 to 150, 150 to 200, 200 to 250, 250 to 300, 300 to 350, 350 to 400, 450 to 500, 500 to 550, 550 to 600, 600 to 650, 650 to 700, 700 to 750, 750 to 800, 800 to 850, 850 to 900, 900 to 950, 950 to 1000, 1000 to 1050, 1050 to 1100, 1100 to 1150, 1150 to 1200, 1200 to 1250, 1250 to 1300, 1300 to 1350, or 1350 to 1368, or between any two corresponding residues of an amino acid sequence having at least 80%, 85%, 90%, 95%, 98%, 99%, or 99.9% sequence identity to SEQ ID NO: 37, or between any two corresponding residues of any of the amino acid sequences of SEQ ID NOs: 31, 37-38, 40, 42, 44-99, or between any two corresponding residues of an amino acid sequence having at least 80%, 85%, 90%, 95%, 98%, 99%, or 99.9% sequence identity to any of SEQ ID NOs: 31, 37-38, 40, 42, 44-99.
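How a choice of cleavage site affects the size of each half can be estimated with simple arithmetic. The sketch below is illustrative only: the intein lengths (100 and 40 residues), the 2068-residue fusion length (1368 residues of SpCas9 plus an arbitrary 700-residue stand-in for linker and polymerase), and the packaging budget are assumed placeholder values, not figures from the disclosure, and the function names are hypothetical. Only the four candidate positions come from the text above.

```python
def half_sizes(total_aa, split_after, n_intein_aa, c_intein_aa):
    """Lengths (in residues) of the two halves for a split after `split_after`."""
    if not 1 <= split_after < total_aa:
        raise ValueError("split must fall between two residues of the protein")
    n_half = split_after + n_intein_aa                 # N-extein + N-intein
    c_half = c_intein_aa + (total_aa - split_after)    # C-intein + C-extein
    return n_half, c_half


def coding_bp(length_aa):
    """Coding length in base pairs (3 per residue; stop codon, promoter etc. ignored)."""
    return 3 * length_aa


def fits_budget(length_aa, budget_bp):
    """True if the coding sequence alone fits a caller-supplied size budget."""
    return coding_bp(length_aa) <= budget_bp


def most_balanced(total_aa, candidates, n_intein_aa, c_intein_aa):
    """Candidate split that minimizes the size difference between the halves."""
    def imbalance(pos):
        n_half, c_half = half_sizes(total_aa, pos, n_intein_aa, c_intein_aa)
        return abs(n_half - c_half)
    return min(candidates, key=imbalance)


TOTAL_AA = 1368 + 700   # assumed fusion length; the 700 is a placeholder
best = most_balanced(TOTAL_AA, [740, 801, 1010, 1041], n_intein_aa=100, c_intein_aa=40)
print(best, half_sizes(TOTAL_AA, best, 100, 40))
```

Balance is only one possible criterion; structural permissiveness at the chosen position, which this arithmetic cannot assess, is equally part of the design problem described above.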
In various embodiments, the split intein sequences can be engineered from the intein sequences represented by SEQ ID NOs: 16-23.
In various embodiments, the split intein sequence can be used as follows:
In various embodiments, the split inteins can be used to separately deliver separate portions of a complete guide editor to a cell, which, upon expression in the cell, are reconstituted as the complete guide editor through trans-splicing.
In some embodiments, the present disclosure provides a method of delivering a guide editor to a cell, comprising:
(a) constructing a first expression vector encoding an N-terminal fragment of the guide editor fused to a first split intein sequence;
(b) constructing a second expression vector encoding a C-terminal fragment of the guide editor fused to a second split intein sequence; and
(c) delivering the first and second expression vectors to a cell,
wherein the N-terminal and C-terminal fragments are reconstituted as the guide editor in the cell as a result of trans-splicing activity that causes self-excision of the first and second split intein sequences.
In some embodiments, the split site may be anywhere in the guided editor fusion, including the napDNAbp domain, the linker, or the reverse transcriptase domain.
In other embodiments, the split site is in the napDNAbp domain.
In other embodiments, the split site is in the reverse transcriptase or polymerase domain.
In other embodiments, the split site is in the linker.
In various embodiments, the disclosure provides a guided editor comprising a napDNAbp (e.g., a Cas9 domain) and a reverse transcriptase, wherein one or both of the napDNAbp and the reverse transcriptase comprise an intein, such as a ligand-dependent intein. Typically, the intein is a ligand-dependent intein that exhibits no or minimal protein splicing activity in the absence of a ligand (e.g., a small molecule such as 4-hydroxytamoxifen, a peptide, a protein, a polynucleotide, an amino acid, or a nucleotide). Ligand-dependent inteins are known and include those described in U.S. patent application U.S.S.N. 14/004,280, published as U.S. 2014/0065711 A1, the entire contents of which are incorporated herein by reference. In addition, a split Cas9 architecture may be used. In some embodiments, the intein comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 16-23, 382, 385, and 388.
In various embodiments, the napDNAbp domain is a smaller size napDNAbp domain compared to the classical SpCas9 domain of SEQ ID No. 37.
The classical SpCas9 protein is 1368 amino acids in length and has a predicted molecular weight of 158 kilodaltons. As used herein, the term "small-sized Cas9 variant" refers to any Cas9 variant, whether naturally occurring, engineered, or otherwise, that is less than 1300 amino acids in length, or less than 1290 amino acids, or less than 1280 amino acids, or less than 1270 amino acids, or less than 1260 amino acids, or less than 1250 amino acids, or less than 1240 amino acids, or less than 1230 amino acids, or less than 1220 amino acids, or less than 1210 amino acids, or less than 1200 amino acids, or less than 1190 amino acids, or less than 1180 amino acids, or less than 1170 amino acids, or less than 1160 amino acids, or less than 1150 amino acids, or less than 1140 amino acids, or less than 1130 amino acids, or less than 1120 amino acids, or less than 1110 amino acids, or less than 1100 amino acids, or less than 1050 amino acids, or less than 850 amino acids, or less than 800 amino acids, or less than 500 amino acids, or less than about 500 amino acids, or more than 500 amino acids.
In one embodiment, the specification includes a split-intein PE construct that is split between residues 1024 and 1025 of classical SpCas9 (SEQ ID NO: 37) (which may be referred to as residues 1023 and 1024, respectively, when numbered relative to a Met-minus SEQ ID NO: 37), as described in Example 20.
First, the amino acid sequence of SEQ ID NO: 37 is shown below, indicating the position of the split site between residues 1024 ("K") and 1025 ("S"):
In this configuration, the amino acid sequence of the N-terminal half (amino acids 1-1024) is as follows:
In this configuration, the amino acid sequence of the N-terminal half (amino acids 1-1023), where the protein lacks the Met at position 1, is as follows:
In this configuration, the amino acid sequence of the C-terminal half (amino acids 1024-1368, or counted as amino acids 1023-1367 in a Met-minus Cas9) is as follows:
As shown in Example 20, the PE2 construct (based on SpCas9 of SEQ ID NO: 37) is split into two separate constructs at position 1023/1024 (relative to a Met-minus SEQ ID NO: 37), as follows:
SpPE2 split at 1023/1024, N-terminal half
Key: NLS, NpuC intein
SpPE2 split at 1023/1024, C-terminal half
Key: NLS, NpuC intein
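By way of illustration only, the residue bookkeeping for a split site can be sketched as follows. The function names and the toy sequence are hypothetical and are not taken from the sequence listing; the sketch assumes 1-based residue numbering and shows how a site such as 1024/1025 in Met-inclusive numbering corresponds to 1023/1024 in Met-minus numbering.

```python
from typing import Tuple


def split_protein(seq: str, n_term_last: int) -> Tuple[str, str]:
    """Split a protein between residue n_term_last and n_term_last + 1 (1-based)."""
    if not 1 <= n_term_last < len(seq):
        raise ValueError("split site must fall between two residues")
    # Python slices are 0-based, so seq[:n] holds residues 1..n.
    return seq[:n_term_last], seq[n_term_last:]


def to_met_minus(position: int) -> int:
    """Convert Met-inclusive numbering to numbering without the initial Met."""
    if position < 2:
        raise ValueError("the initial Met has no Met-minus position")
    return position - 1


# Toy example (hypothetical 5-residue sequence, not SEQ ID NO: 37).
n_half, c_half = split_protein("MKSAG", 2)
print(n_half, c_half)                          # fragments on either side of site 2/3
print(to_met_minus(1024), to_met_minus(1025))  # site 1024/1025 in Met-minus numbering
```

The same helper applies to any of the split sites listed above, provided the position is given in the numbering of the sequence actually supplied.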
The present disclosure also contemplates methods of delivering split-intein guided editors to cells and/or treating cells with the split-intein guided editors.
In some embodiments, the present disclosure provides a method of delivering a guided editor to a cell, comprising:
(a) constructing a first expression vector encoding an N-terminal fragment of the guided editor fused to a first split intein sequence;
(b) constructing a second expression vector encoding a C-terminal fragment of the guided editor fused to a second split intein sequence; and
(c) delivering the first and second expression vectors to a cell,
wherein the N-terminal and C-terminal fragments are reconstituted into the guided editor in the cell as a result of trans-splicing activity, which causes self-excision of the first and second split intein sequences.
In certain embodiments, the N-terminal fragment of the guided editor fused to the first split intein sequence is SEQ ID NO: 394, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99.9% sequence identity to SEQ ID NO: 394.
In other embodiments, the C-terminal fragment of the guided editor fused to the second split intein sequence is SEQ ID NO: 395, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99.9% sequence identity to SEQ ID NO: 395.
In other embodiments, the present disclosure provides a method of editing a target DNA sequence in a cell, comprising:
(a) constructing a first expression vector encoding an N-terminal fragment of the guided editor fused to a first split intein sequence;
(b) constructing a second expression vector encoding a C-terminal fragment of the guided editor fused to a second split intein sequence; and
(c) delivering the first and second expression vectors to a cell,
wherein the N-terminal and C-terminal fragments are reconstituted into the guided editor in the cell as a result of trans-splicing activity, which causes self-excision of the first and second split intein sequences.
In certain embodiments, the N-terminal fragment of the guided editor fused to the first split intein sequence is SEQ ID NO: 394, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99.9% sequence identity to SEQ ID NO: 394.
In other embodiments, the C-terminal fragment of the guided editor fused to the second split intein sequence is SEQ ID NO: 395, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99.9% sequence identity to SEQ ID NO: 395.
Delivery of PE ribonucleoprotein complexes
In this regard, the guided editor may be delivered by non-viral delivery strategies, including delivery of the guided editor complexed with pegRNA (i.e., a PE ribonucleoprotein complex) by various methods, including electroporation and lipid nanoparticles. Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are commercially available (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery may be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration).
The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to those skilled in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
Reference may be made to the following publications, which discuss methods of non-viral delivery of ribonucleoprotein complexes and each of which is incorporated herein by reference.
Chen, Sean, et al. "Highly efficient mouse genome editing by CRISPR ribonucleoprotein electroporation of zygotes." Journal of Biological Chemistry (2016): jbc-M116.
Zuris, John A., et al. "Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo." Nature Biotechnology 33.1 (2015): 73.
Rouet, Romain, et al. "Receptor-Mediated Delivery of CRISPR-Cas9 Endonuclease for Cell-Type-Specific Gene Editing." Journal of the American Chemical Society 140.21 (2018): 6596-6603.
The data provided in fig. 68C shows that various disclosed PE ribonucleoprotein complexes can be delivered in this manner (PE 2 at high concentration, PE3 at high concentration, and PE3 at low concentration).
Delivery of PE through mRNA
Another method that may be used to deliver the guided editor and/or pegRNA to cells in which guided editing-based genome editing is desired is the use of messenger RNA (mRNA) delivery methods and techniques. Examples of mRNA delivery methods and compositions that may be utilized in the present disclosure include, for example, PCT/US2014/028330, US8822663B2, NZ700688A, ES2740248T3, EP2755693A4, EP2755986A4, WO2014152940A1, EP3450553B1, BR112016030852A2, and EP3362461A1, each of which is incorporated herein by reference in its entirety. Additional disclosure incorporated herein by reference is found in Kowalski et al., "Delivering the Messenger: Advances in Technologies for Therapeutic mRNA Delivery," Mol Ther. 2019; 27(4):710-728.
In contrast to DNA vectors encoding a guided editor, the use of RNA as a delivery agent for the guided editor has the advantage that the genetic material does not have to enter the nucleus to perform its function. The delivered mRNA can be translated directly in the cytoplasm into the desired protein (e.g., the guided editor) and nucleic acid product (e.g., pegRNA). However, for the mRNA to resist degradation (e.g., by RNA-degrading enzymes in the cytoplasm), it is necessary in some embodiments to stabilize the mRNA to improve delivery efficiency. Certain delivery vehicles (e.g., cationic lipid or polymeric delivery vehicles) may also help protect the transfected mRNA from endogenous RNase enzymes that might otherwise degrade the therapeutic mRNA encoding the desired guided editor. Furthermore, despite the increased stability of modified mRNA, delivering mRNA (particularly mRNA encoding a full-length protein) to cells in vivo in a manner that allows therapeutic levels of protein production remains a challenge.
With some exceptions, intracellular delivery of mRNA is generally more challenging than intracellular delivery of small oligonucleotides, and it requires encapsulation into delivery nanoparticles, in part because mRNA molecules are significantly larger in size (300-5,000 kDa, about 1-15 kb) compared to other types of RNAs (small interfering RNAs [ siRNA ], about 14kDa; antisense oligonucleotides [ ASO ],4-10 kDa).
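The size range quoted above can be checked with a short calculation. The sketch below assumes the commonly used approximation for single-stranded RNA, mass in Da ≈ (number of nucleotides × 320.5) + 159; this approximation and the function name are assumptions introduced here for illustration and are not part of the disclosure.

```python
def ssrna_mass_kda(length_nt: int) -> float:
    """Approximate molecular mass of single-stranded RNA in kDa.

    Assumes an average of 320.5 Da per nucleotide plus 159 Da for the
    5' triphosphate (a widely used rule of thumb, not an exact value).
    """
    if length_nt <= 0:
        raise ValueError("length must be positive")
    return (length_nt * 320.5 + 159.0) / 1000.0


# Compare a 1 kb and a 15 kb transcript with the 300-5,000 kDa range above.
for length in (1_000, 15_000):
    print(f"{length} nt: about {ssrna_mass_kda(length):.0f} kDa")
```

Under this approximation, transcripts of about 1 to 15 kb fall within the stated range of 300-5,000 kDa, well above the masses cited for siRNA and antisense oligonucleotides.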
mRNA must cross the cell membrane to reach the cytoplasm. The cell membrane is a dynamic and formidable barrier to intracellular delivery. It consists mainly of a lipid bilayer of zwitterionic and negatively charged phospholipids, in which the polar heads of the phospholipids point toward the aqueous environment and the hydrophobic tails form a hydrophobic core.
In some embodiments, the mRNA compositions of the present disclosure comprise mRNA (encoding a guide editor and/or pegRNA), a transport vector, and optionally an agent that facilitates contact with a target cell and subsequent transfection.
In some embodiments, the mRNA may include one or more modifications that confer stability on the mRNA (e.g., as compared to a wild-type or native version of the mRNA), and may also include one or more modifications relative to the wild type that correct a defect implicated in aberrant expression of the associated protein. For example, a nucleic acid of the invention may include modifications to one or both of the 5' untranslated region and the 3' untranslated region. Such modifications may include the inclusion of a partial sequence of the cytomegalovirus (CMV) immediate-early 1 (IE1) gene, a poly A tail, a Cap1 structure, or a sequence encoding human growth hormone (hGH). In some embodiments, the mRNA is modified to reduce mRNA immunogenicity.
In one embodiment, the guided editor mRNA in the compositions of the invention may be formulated in a liposomal transfer carrier to facilitate delivery to target cells. Contemplated transfer carriers may include one or more cationic lipids, non-cationic lipids, and/or PEG-modified lipids. For example, the transfer carrier may comprise at least one of the following cationic lipids: C12-200, DLin-KC2-DMA, DODAP, HGT4003, ICE, HGT5000, or HGT5001. In some embodiments, the transfer carrier comprises cholesterol (chol) and/or a PEG-modified lipid. In some embodiments, the transfer carrier comprises DMG-PEG2K. In certain embodiments, the transfer carrier has one of the following lipid formulations: C12-200, DOPE, chol, DMG-PEG2K; DODAP, DOPE, cholesterol, DMG-PEG2K; HGT5000, DOPE, chol, DMG-PEG2K; or HGT5001, DOPE, chol, DMG-PEG2K.
The present disclosure also provides compositions and methods for facilitating transfection of target cells with one or more mRNA molecules encoding PE. For example, the compositions and methods of the invention contemplate the use of targeting ligands that increase the affinity of the composition for one or more target cells. In one embodiment, the targeting ligand is apolipoprotein B or apolipoprotein E, and the corresponding target cell expresses a low-density lipoprotein receptor, thereby facilitating recognition of the targeting ligand. A large number of target cells can be preferentially targeted using the methods and compositions of the present disclosure. For example, contemplated target cells include, but are not limited to, hepatocytes, epithelial cells, hematopoietic cells, endothelial cells, lung cells, bone cells, stem cells, mesenchymal cells, neural cells, cardiac cells, adipocytes, vascular smooth muscle cells, cardiomyocytes, skeletal muscle cells, beta cells, pituitary cells, synovial lining cells, ovarian cells, testicular cells, fibroblasts, B cells, T cells, reticulocytes, leukocytes, granulocytes, and tumor cells.
In some embodiments, the mRNA encoding PE may optionally have chemical or biological modifications, e.g., to improve the stability and/or half-life of such mRNA, or to improve or otherwise facilitate protein production. Following transfection, the native mRNA in the compositions of the invention may decay with a half-life of between 30 minutes and several days. The mRNA in the compositions of the present disclosure may retain at least some ability to be translated, thereby producing a functional protein or enzyme. Accordingly, the present invention provides compositions comprising stabilized mRNA and methods of administering them. In some embodiments, the activity of the mRNA is prolonged over an extended period of time. For example, the activity of the mRNA may be prolonged such that the compositions of the present disclosure are administered to a subject on a semi-weekly or bi-weekly basis, or more preferably on a monthly, bi-monthly, quarterly, or annual basis. The extended or prolonged activity of the mRNA of the present invention is directly related to the amount of protein or enzyme produced from such mRNA. Similarly, the activity of the compositions of the present disclosure may be further extended or prolonged by modifications made to improve or enhance translation of the mRNA. Furthermore, the amount of functional protein or enzyme produced by a target cell is a function of the amount of mRNA delivered to the target cell and the stability of such mRNA. To the extent that the stability of the mRNA of the present invention is improved or enhanced, the half-life, the activity of the protein or enzyme produced, and the dosing interval of the composition may be further extended.
Thus, in some embodiments, the mRNA in the compositions of the present disclosure comprises at least one modification that confers increased or enhanced stability on the nucleic acid, including, for example, improved resistance to nuclease digestion in vivo. As used herein, the terms "modification" and "modified," as such terms relate to the nucleic acids provided herein, include at least one alteration that preferably enhances stability and renders the mRNA more stable (e.g., resistant to nuclease digestion) than the wild-type or naturally occurring version of the mRNA. As used herein, the terms "stable" and "stability," as such terms relate to the nucleic acids of the invention, and particularly to mRNA, refer to increased or enhanced resistance to degradation by, for example, nucleases (i.e., endonucleases or exonucleases) that are normally capable of degrading such mRNA. Increased stability may include, for example, reduced sensitivity to hydrolysis or other destruction by endogenous enzymes (e.g., endonucleases or exonucleases) or by conditions within the target cell or tissue, thereby increasing or enhancing the residence of such mRNA in the target cell, tissue, subject, and/or cytoplasm. The stabilized mRNA molecules provided herein exhibit longer half-lives relative to their naturally occurring, unmodified counterparts (e.g., the wild-type version of the mRNA). The terms "modification" and "modified," as such terms relate to the mRNA of the invention, also encompass alterations that improve or enhance translation of mRNA nucleic acids, including, for example, the inclusion of sequences that function in the initiation of protein translation (e.g., the Kozak consensus sequence) (Kozak, M., Nucleic Acids Res. 15(20):8125-48 (1987)).
In some embodiments, the mRNA used in the compositions of the present disclosure has undergone a chemical or biological modification to render it more stable. Exemplary modifications to an mRNA include the depletion of a base (e.g., by deletion or by substitution of one nucleotide for another) or modification of a base, for example, chemical modification of a base. The phrase "chemical modification" as used herein includes modifications that introduce chemistries that differ from those seen in naturally occurring mRNA, for example, covalent modifications such as the introduction of modified nucleotides (e.g., nucleotide analogs, or the inclusion of pendant groups that are not naturally found in such mRNA molecules).
Other suitable polynucleotide modifications that may be incorporated into the PE-encoding mRNA used in the compositions of the present disclosure include, but are not limited to, 4'-thio-modified bases: 4'-thio-adenosine, 4'-thio-guanosine, 4'-thio-cytidine, 4'-thio-uridine, 4'-thio-5-methyl-cytidine, 4'-thio-pseudouridine, and 4'-thio-2-thiouridine, pyridin-4-one ribonucleoside, 5-aza-uridine, 2-thio-5-aza-uridine, 2-thiouridine, 4-thio-pseudouridine, 2-thio-pseudouridine, 5-hydroxyuridine, 3-methyluridine, 5-carboxymethyl-uridine, 1-carboxymethyl-pseudouridine, 5-propynyl-uridine, 1-propynyl-pseudouridine, 5-taurinomethyluridine, 1-taurinomethyl-pseudouridine, 5-taurinomethyl-2-thiouridine, 1-taurinomethyl-4-thiouridine, 5-methyl-uridine, 1-methyl-pseudouridine, 4-thio-1-methyl-pseudouridine, 2-thio-1-methyl-pseudouridine, 1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-1-deaza-pseudouridine, dihydropseudouridine, 2-thiodihydropseudouridine, 2-methoxy-4-thiouridine, 4-methoxy-pseudouridine, 4-methoxy-2-thio-pseudouridine, 5-aza-cytidine, pseudoisocytidine, 3-methyl-cytidine, N4-acetylcytidine, 5-formylcytidine, N4-methylcytidine, 5-hydroxymethylcytidine, 1-methyl-pseudoisocytidine, pyrrolo-cytidine, pyrrolo-pseudoisocytidine, 2-thio-cytidine, 2-thio-5-methyl-cytidine, 4-thio-pseudoisocytidine, 4-thio-1-methyl-1-deaza-pseudoisocytidine, 1-methyl-1-deaza-pseudoisocytidine, zebularine, 5-aza-zebularine, 5-methyl-zebularine, 5-aza-2-thio-zebularine, 2-methoxy-cytidine, 2-methoxy-5-methyl-cytidine, 4-methoxy-pseudoisocytidine, 4-methoxy-1-methyl-pseudoisocytidine, 2-aminopurine, 2,6-diaminopurine, 7-deaza-adenine, 7-deaza-8-aza-adenine, 7-deaza-2-aminopurine, 7-deaza-8-aza-2-aminopurine, 7-deaza-2,6-diaminopurine, 7-deaza-8-aza-2,6-diaminopurine, 1-methyladenosine, N6-isopentenyladenosine, N6-(cis-hydroxyisopentenyl)adenosine, 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine, N6-glycinylcarbamoyladenosine, N6-threonylcarbamoyladenosine, 
2-methylthio-N6-threonylcarbamoyladenosine, N6-dimethyladenosine, 7-methyladenosine, 2-methylthioadenine, inosine, 1-methyl-inosine, wyosine, wybutosine, 7-deaza-guanosine, 7-deaza-8-aza-guanosine, 6-thio-7-deaza-8-aza-guanosine, 7-methyl-guanosine, 6-thio-7-methyl-guanosine, 7-methyl-inosine, 6-methoxy-guanosine, 1-methylguanosine, N2-methylguanosine, 8-oxo-guanosine, 7-methyl-8-oxo-guanosine, 1-methyl-6-thio-guanosine, N2-methyl-6-thio-guanosine, and N2,N2-dimethyl-6-thioguanosine, and combinations thereof. The term modification also includes, for example, the incorporation of non-nucleotide linkages or modified nucleotides into the mRNA sequences of the invention (e.g., modifications to one or both of the 3' and 5' ends of an mRNA molecule encoding a functional protein or enzyme). Such modifications include the addition of bases to an mRNA sequence (e.g., the inclusion of a poly A tail or a longer poly A tail), the alteration of the 3' UTR or the 5' UTR, complexing the mRNA with an agent (e.g., a protein or a complementary nucleic acid molecule), and the inclusion of elements that change the structure of an mRNA molecule (e.g., elements that form secondary structures).
In some embodiments, the mRNA encoding PE includes a 5' cap structure. The 5' cap is typically added as follows: first, an RNA terminal phosphatase removes one of the terminal phosphate groups from the 5' nucleotide, leaving two terminal phosphates; guanosine triphosphate (GTP) is then added to the terminal phosphates by a guanylyl transferase, producing a 5'-5' triphosphate linkage; and the 7-nitrogen of guanine is then methylated by a methyltransferase. Examples of cap structures include, but are not limited to, m7G(5')ppp(5')A, G(5')ppp(5')A, and G(5')ppp(5')G. Naturally occurring cap structures comprise a 7-methylguanosine that is linked via a triphosphate bridge to the 5' end of the first transcribed nucleotide, resulting in a dinucleotide cap of m7G(5')ppp(5')N, where N is any nucleoside. In vivo, the cap is added enzymatically in the nucleus, in a reaction catalyzed by the enzyme guanylyl transferase.
Other cap analogs include, but are not limited to, chemical structures selected from the group consisting of: m7GpppG, m7GpppA, m7GpppC; unmethylated cap analogs (e.g., GpppG); dimethylated cap analogs (e.g., m2,7GpppG); trimethylated cap analogs (e.g., m2,2,7GpppG); dimethylated symmetrical cap analogs (e.g., m7Gpppm7G); and anti-reverse cap analogs (e.g., ARCA; m7,2'OmeGpppG, m7,2'dGpppG, m7,3'OmeGpppG, m7,3'dGpppG, and their tetraphosphate derivatives) (see, e.g., Jemielity, J. et al., "Novel 'anti-reverse' cap analogs with superior translational properties", RNA, 9:1108-1122 (2003)).
Typically, the presence of a "tail" serves to protect the mRNA from exonuclease degradation. A poly A or poly U tail is thought to stabilize natural messengers and synthetic sense RNA. Thus, in certain embodiments, a long poly A or poly U tail may be added to an mRNA molecule, thereby rendering the RNA more stable. Poly A or poly U tails may be added using a variety of art-recognized techniques. For example, long poly A tails can be added to synthetic or in vitro transcribed RNA using poly A polymerase (Yokoe, et al. Nature Biotechnology 1996; 14:1252-1256). A transcription vector can also encode long poly A tails. In addition, poly A tails can be added by transcription directly from PCR products. Poly A may also be ligated to the 3' end of a sense RNA with RNA ligase (see, e.g., Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch and Maniatis (Cold Spring Harbor Laboratory Press: 1991)).
Typically, the poly A or poly U tail can be at least about 10, 50, 100, 200, 300, 400, or at least about 500 nucleotides in length. In some embodiments, a poly A tail at the 3' end of the mRNA typically comprises about 10 to 300 adenosine nucleotides (e.g., about 10 to 200 adenosine nucleotides, about 10 to 150 adenosine nucleotides, about 10 to 100 adenosine nucleotides, about 20 to 70 adenosine nucleotides, or about 20 to 60 adenosine nucleotides). In some embodiments, the mRNA includes a 3' poly C tail structure. A suitable poly C tail at the 3' end of an mRNA typically includes about 10 to 200 cytosine nucleotides (e.g., about 10 to 150 cytosine nucleotides, about 10 to 100 cytosine nucleotides, about 20 to 70 cytosine nucleotides, about 20 to 60 cytosine nucleotides, or about 10 to 40 cytosine nucleotides). The poly C tail may be added to the poly A or poly U tail or may substitute for the poly A or poly U tail.
mRNA encoding PE according to the present disclosure can be synthesized according to any of a variety of known methods. For example, the mRNA of the present invention may be synthesized by in vitro transcription (IVT). Briefly, IVT is typically performed with a linear or circular DNA template containing a promoter, a pool of ribonucleotide triphosphates, a buffer system that may include DTT and magnesium ions, and an appropriate RNA polymerase (e.g., T3, T7, or SP6 RNA polymerase), DNase I, pyrophosphatase, and/or an RNase inhibitor. The exact conditions will vary according to the specific application.
In embodiments involving mRNA delivery, the ratio of the mRNA encoding the guided editor to the pegRNA may be important for efficient editing. In certain embodiments, the weight ratio of mRNA (encoding the guided editor) to pegRNA is 1:1. In certain other embodiments, the weight ratio of mRNA (encoding the guided editor) to pegRNA is 2:1. In other embodiments, the weight ratio of mRNA (encoding the guided editor) to pegRNA is 1:2. In further embodiments, the weight ratio of mRNA (encoding the guided editor) to pegRNA is selected from: about 1:1000, 1:900, 1:800, 1:700, 1:600, 1:500, 1:400, 1:300, 1:200, 1:100, 1:90, 1:80, 1:70, 1:60, 1:50, 1:40, 1:30, 1:20, 1:10, and 1:1. In other embodiments, the weight ratio of mRNA (encoding the guided editor) to pegRNA is selected from: about 1000:1, 900:1, 800:1, 700:1, 600:1, 500:1, 400:1, 300:1, 200:1, 100:1, 90:1, 80:1, 70:1, 60:1, 50:1, 40:1, 30:1, 20:1, 10:1, and 1:1.
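As a simple worked illustration of the weight ratios above, the following sketch divides a chosen total RNA mass between the editor-encoding mRNA and the pegRNA. The function name and the example masses are hypothetical; the sketch makes no statement about which ratio or total mass is suitable for a given experiment.

```python
from typing import Tuple


def split_mass_by_ratio(total_ug: float, mrna_parts: int, pegrna_parts: int) -> Tuple[float, float]:
    """Return (mRNA mass, pegRNA mass) in micrograms for a weight ratio mRNA:pegRNA."""
    if total_ug <= 0 or mrna_parts <= 0 or pegrna_parts <= 0:
        raise ValueError("total mass and ratio parts must be positive")
    mrna_ug = total_ug * mrna_parts / (mrna_parts + pegrna_parts)
    return mrna_ug, total_ug - mrna_ug


# Example: 3 micrograms of total RNA at a 2:1 weight ratio of mRNA to pegRNA.
mrna_ug, pegrna_ug = split_mass_by_ratio(3.0, 2, 1)
print(f"mRNA: {mrna_ug} ug, pegRNA: {pegrna_ug} ug")
```

Note that these are weight ratios; because the mRNA is much longer than the pegRNA, the corresponding molar ratio differs substantially from the weight ratio.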
[9] Therapeutic methods
The present disclosure provides methods for treating a subject diagnosed with a disease associated with or caused by a point mutation or other mutation (e.g., a deletion, insertion, inversion, duplication, etc.) that can be corrected by the guided editing systems provided herein, such as, but not limited to, prion diseases, trinucleotide repeat expansion disorders, or CDKL5 deficiency disorder (CDD) (e.g., Example 6 herein).
Virtually any disease-causing genetic defect may be repaired by guided editing, which includes selecting an appropriate guided editor (comprising a napDNAbp and a polymerase, e.g., a reverse transcriptase) and designing an appropriate pegRNA to (a) target the appropriate target DNA comprising an editing site, and (b) provide a template for the synthesis of a single strand of DNA from the 3' end of the nick site that includes the desired edit and that displaces and replaces the endogenous strand immediately downstream of the nick site.
Methods of treating a disorder may include, as a preliminary step, designing a suitable pegRNA and guided editor according to the methods described herein, which involves a number of considerations, such as:
(a) the target sequence, i.e., the nucleotide sequence in which one or more nucleobase modifications are to be installed by the guided editor;
(b) the position of the nick site in the target sequence, i.e., the position at which the guided editor will induce a single-stranded nick to form a 3' end RT primer sequence on one side of the nick and a 5' end endogenous flap on the other side of the nick (which is eventually removed by FEN1 or its equivalent and replaced by the 3' ssDNA flap). The nick site creates the 3' end primer sequence, which is extended by the polymerase of the guided editor (e.g., an RT enzyme) during RNA-dependent DNA polymerization to form the 3' ssDNA flap containing the desired edit, which then replaces the 5' endogenous DNA flap in the target sequence;
(c) the available PAM sequences (including classical SpCas9 PAM sites, as well as non-classical PAM sites recognized by Cas9 variants and equivalents with expanded or different PAM specificities);
(d) the spacing between the available PAM sequences and the position of the nick site in the PAM strand;
(e) the specific Cas9, Cas9 variant, or Cas9 equivalent of the guided editor available to be used (which is dictated in part by the available PAMs);
(f) the sequence and length of the primer binding site;
(g) the sequence and length of the editing template;
(h) Sequence and length of homology arms;
(i) Spacer sequence and length; and
(j) gRNA core sequence.
A suitable pegRNA, and optionally a nicking sgRNA for second-site nicking, can be designed by way of the following exemplary step-by-step set of instructions, which takes into account one or more of the above considerations. The steps refer to the examples shown in FIGS. 70A to 70I.
1. Define the target sequence and the edit. Retrieve the sequence of the target DNA region (about 200 bp) centered on the location of the desired edit (point mutation, insertion, deletion, or a combination thereof). See FIG. 70A.
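Step 1 can be sketched in code as follows, under the assumption that the locus is available as a plain string of bases and that the edit position is given as a 0-based index. The function name `target_window` is hypothetical, and clamping at the sequence ends is a design choice of this sketch rather than part of the procedure.

```python
from typing import Tuple


def target_window(locus: str, edit_pos: int, width: int = 200) -> Tuple[str, int]:
    """Return (window, window_start) for a region of about `width` bp centered on edit_pos.

    edit_pos is a 0-based index into `locus`. The window is clamped at the
    ends of the sequence, so it may be shorter than `width` near an edge.
    """
    if not 0 <= edit_pos < len(locus):
        raise ValueError("edit position lies outside the locus")
    half = width // 2
    start = max(0, edit_pos - half)
    end = min(len(locus), edit_pos + half)
    return locus[start:end], start


# Toy example: a 500-nt placeholder locus with the edit at index 250.
window, start = target_window("A" * 500, 250)
print(len(window), start)
```

The returned start offset allows positions found in the window (PAMs, nick sites) to be mapped back to coordinates in the full locus.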
2. Locate target PAMs. Identify PAMs in proximity to the desired edit position. PAMs can be identified on either strand of DNA proximal to the desired edit position. Although PAMs close to the edit position are preferred (i.e., where the nick site is less than 30 nt from the edit position, or less than 29 nt, 28 nt, 27 nt, 26 nt, 25 nt, 24 nt, 23 nt, 22 nt, 21 nt, 20 nt, 19 nt, 18 nt, 17 nt, 16 nt, 15 nt, 14 nt, 13 nt, 12 nt, 11 nt, 10 nt, 9 nt, 8 nt, 7 nt, 6 nt, 5 nt, 4 nt, 3 nt, or 2 nt from the edit position to the nick site), it is also possible to install edits using protospacers and PAMs that place the nick ≥30 nt from the edit position. See FIG. 70B.
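A minimal sketch of step 2 for NGG PAMs is given below. It assumes the target region is supplied as the top strand written 5' to 3'; an NGG PAM on the bottom strand then appears as CCN on the top strand. The function name and the toy sequence are hypothetical, and PAMs of other Cas9 variants would need different patterns.

```python
import re
from typing import List, Tuple


def find_ngg_pams(top_strand: str) -> List[Tuple[str, int]]:
    """List NGG PAMs on both strands as (strand, start index on the top strand).

    "+" hits are NGG motifs read on the top strand; "-" hits are CCN motifs
    on the top strand, i.e., NGG motifs read on the bottom strand.
    Lookahead patterns are used so that overlapping PAMs are all reported.
    """
    seq = top_strand.upper()
    hits = [("+", m.start()) for m in re.finditer(r"(?=[ACGT]GG)", seq)]
    hits += [("-", m.start()) for m in re.finditer(r"(?=CC[ACGT])", seq)]
    return hits


# Toy example with one PAM on each strand.
print(find_ngg_pams("ATCCGTTAGGA"))
```

Candidate PAMs found this way can then be ranked by the distance between the resulting nick site and the edit position.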
3. Locate the nick sites. For each PAM being considered, identify the corresponding nick site and the strand on which it lies. For the SpCas9 H840A nickase, cleavage occurs in the PAM-containing strand between the 3rd and 4th bases 5' of the NGG PAM. All edited nucleotides must lie 3' of the nick site, so an appropriate PAM must place the nick 5' of the target edit on the PAM-containing strand. In the example shown in the figure, there are two possible PAMs. For simplicity, the remaining steps show the design of a pegRNA using PAM 1 only. See FIG. 70C.
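The nick-site rule in step 3 can be expressed as index arithmetic. The sketch below assumes the PAM-containing strand is written 5' to 3' with 0-based indices and that `pam_start` is the index of the N of the NGG PAM; the function names are hypothetical.

```python
def nick_index(pam_start: int) -> int:
    """Slice index of the nick on the PAM-containing strand.

    The nick falls between the 4th and 3rd bases 5' of the PAM, so
    strand[:nick] ends with the 4th base and strand[nick:] starts with the 3rd.
    """
    if pam_start < 4:
        raise ValueError("PAM is too close to the 5' end of the sequence")
    return pam_start - 3


def edit_offset(pam_start: int, edit_pos: int) -> int:
    """Position of the edited base relative to the nick (+1 is the first base 3' of the nick)."""
    return edit_pos - nick_index(pam_start) + 1


def pam_is_usable(pam_start: int, edit_pos: int) -> bool:
    """True if the edited base lies 3' of the nick on the PAM-containing strand."""
    return edit_offset(pam_start, edit_pos) >= 1


# Example: PAM starting at index 23, edit at index 22 on the same strand.
print(nick_index(23), edit_offset(23, 22), pam_is_usable(23, 22))
```

For a PAM on the opposite strand, the same functions apply once the sequence and the edit position are expressed in that strand's 5' to 3' coordinates.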
4. Design the spacer sequence. The protospacer of SpCas9 corresponds to the 20 nucleotides 5' of the NGG PAM on the PAM-containing strand. Efficient Pol III transcription initiation requires a G as the first transcribed nucleotide. If the first nucleotide of the protospacer is a G, the spacer sequence of the pegRNA is simply the protospacer sequence. If the first nucleotide of the protospacer is not a G, the spacer sequence of the pegRNA is a G followed by the protospacer sequence. See FIG. 70D.
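Step 4 can be sketched as follows, assuming the PAM-containing strand is written 5' to 3' in DNA letters and `pam_start` is the 0-based index of the N of the NGG PAM. The function name and the toy sequences are hypothetical.

```python
def design_spacer(pam_strand: str, pam_start: int) -> str:
    """Return the pegRNA spacer (DNA letters) for an NGG PAM at pam_start.

    The protospacer is the 20 nt immediately 5' of the PAM. A 5' G is
    prepended when the protospacer does not already start with G.
    """
    seq = pam_strand.upper()
    if pam_start < 20:
        raise ValueError("fewer than 20 nt are available 5' of the PAM")
    if seq[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("no NGG PAM at the given position")
    protospacer = seq[pam_start - 20:pam_start]
    return protospacer if protospacer.startswith("G") else "G" + protospacer


# Toy examples: one protospacer that starts with A, one that starts with G.
print(design_spacer("ACGTACGTACGTACGTACGT" + "TGG", 20))
print(design_spacer("GCGTACGTACGTACGTACGT" + "AGG", 20))
```

The first call yields a 21-nt spacer with an added 5' G, and the second yields the unchanged 20-nt protospacer.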
5. Design the primer binding site (PBS). Using the starting allele sequence, identify the DNA primer on the PAM-containing strand. The 3' end of the DNA primer is the nucleotide just upstream of the nick site (i.e., the 4th base 5' of the NGG PAM for SpCas9). As a general design principle for use with PE2 and PE3, a pegRNA primer binding site (PBS) containing 12 to 13 nucleotides of complementarity to the DNA primer can be used for sequences with a GC content of about 40-60%. For sequences with low GC content, longer PBSs (14 to 15 nt) should be tested. For sequences with higher GC content, shorter PBSs (8 to 11 nt) should be tested. The optimal PBS sequence should be determined empirically, regardless of GC content. To design a PBS sequence of length p, take the reverse complement of the first p nucleotides 5' of the nick site in the PAM-containing strand, using the starting allele sequence. See FIG. 70E.
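A minimal sketch of step 5 follows. It assumes the PAM-containing strand of the starting allele is written 5' to 3' and that `nick` is a slice index such that `strand[:nick]` ends with the last base 5' of the nick. The function names are hypothetical, and the length suggestions merely restate the GC-content guidance above as starting points for empirical testing.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_pbs(pam_strand: str, nick: int, p: int = 13) -> str:
    """Reverse complement of the p nucleotides immediately 5' of the nick."""
    if not 0 < p <= nick <= len(pam_strand):
        raise ValueError("not enough sequence 5' of the nick for this PBS length")
    return reverse_complement(pam_strand[nick - p:nick])


def suggested_pbs_lengths(gc_fraction: float) -> List[int]:
    """PBS lengths to test first, following the GC-content guidance."""
    if gc_fraction < 0.40:
        return [14, 15]          # low GC: test longer PBSs
    if gc_fraction > 0.60:
        return [8, 9, 10, 11]    # high GC: test shorter PBSs
    return [12, 13]              # about 40-60% GC


# Toy example: nick after index 11, PBS length 4.
print(design_pbs("AAACCCGGGTTTACGT", 12, 4))
print(suggested_pbs_lengths(0.5))
```

The PBS is returned in DNA letters; in the pegRNA itself each T is read as U.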
6. Design the RT template (or DNA synthesis template). The RT template (or DNA synthesis template, where the polymerase is not a reverse transcriptase) encodes the designed edit and homology to the sequence adjacent to the edit. In one embodiment, these regions correspond to the DNA synthesis templates of fig. 3D and 3E, wherein the DNA synthesis template comprises an "editing template" and a "homology arm". The optimal RT template length varies depending on the target site. For short-range edits (positions +1 to +6), it is recommended to test short (9 to 12 nt), medium (13 to 16 nt), and long (17 to 20 nt) RT templates. For long-range edits (positions +7 and beyond), it is recommended to use RT templates that extend at least 5 nt (preferably 10 nt or more) past the edit position to allow for sufficient 3' DNA flap homology. For long-range edits, several RT templates should be screened to identify functional designs. For larger insertions and deletions (≥5 nt), it is recommended to incorporate greater 3' homology (about 20 nt or more) into the RT template. Editing efficiency is typically reduced when the RT template encodes the synthesis of a G as the last nucleotide in the reverse-transcribed DNA product (corresponding to a C in the RT template of the pegRNA). Since many RT templates support efficient guided editing, it is recommended to avoid G as the final synthesized nucleotide when designing RT templates. To design an RT template sequence of length r, use the desired allele sequence and take the reverse complement of the first r nucleotides 3' of the nick site in the original PAM-containing strand. Note that, compared with SNP edits, insertion or deletion edits made with RT templates of the same length will not contain identical homology. See fig. 70F.
7. Assemble the full pegRNA sequence. Concatenate the pegRNA components in the following order (5' to 3'): spacer, scaffold, RT template, and PBS. See fig. 70G.
8. Design nicking sgRNAs for PE3. Identify PAMs on the non-edited strand upstream and downstream of the edit. The optimal nicking position is highly locus-dependent and should be determined empirically. In general, nicks placed 40 to 90 nucleotides 5' of the position opposite the pegRNA-induced nick lead to higher editing yields and fewer indels. A nicking sgRNA has a spacer sequence that matches the 20-nt protospacer in the starting allele, with a 5'-G added if the protospacer does not begin with a G. See fig. 70H.
9. Design PE3b nicking sgRNAs. If a PAM exists in the complementary strand and its corresponding protospacer overlaps with the sequence targeted for editing, the edit may be a candidate for the PE3b system. In the PE3b system, the spacer sequence of the nicking sgRNA matches the sequence of the desired edited allele, but not the starting allele. The PE3b system operates efficiently when the edited nucleotide(s) fall within the seed region of the nicking sgRNA protospacer (about 10 nt adjacent to the PAM). This prevents nicking of the complementary strand until after the edit has been installed on the edited strand, thereby preventing competition between the pegRNA and the sgRNA for binding to the target DNA. PE3b also avoids nicking both strands simultaneously, thereby maintaining high editing efficiency while significantly reducing indel formation. A PE3b sgRNA should have a spacer sequence that matches the 20-nt protospacer in the desired allele, with a 5' G added if necessary. See fig. 70I.
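Steps 8 and 9 can be illustrated with a minimal Python sketch. It is not part of the disclosed method, and it rests on several assumptions: both alleles are given as the PAM-containing (edited) strand written 5' to 3', the edit is a substitution (so both alleles have equal length), and the PAM is NGG. The function names (`ngg_sites`, `pe3_spacers`, `pe3b_candidates`) and the toy sequences are hypothetical. The sketch does not apply the 40 to 90 nt spacing guidance of step 8, which must still be checked against the pegRNA-induced nick.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def ngg_sites(strand: str):
    """Yield (protospacer, pam_index) for each NGG PAM preceded by 20 nt."""
    # The zero-width lookahead lets overlapping sites be reported.
    for match in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", strand):
        yield match.group(1), match.start() + 20


def add_5prime_g(protospacer: str) -> str:
    """Prepend a G when the protospacer does not already begin with G."""
    return protospacer if protospacer.startswith("G") else "G" + protospacer


def pe3_spacers(start_top: str) -> list:
    """Step 8: spacers matching protospacers on the non-edited strand
    of the starting allele."""
    bottom = reverse_complement(start_top)
    return [add_5prime_g(p) for p, _ in ngg_sites(bottom)]


def pe3b_candidates(start_top: str, desired_top: str) -> list:
    """Step 9: spacers that match the desired allele but not the starting
    allele, with the number of mismatches in the PAM-proximal 10 nt (seed)."""
    if len(start_top) != len(desired_top):
        raise ValueError("this sketch handles substitution edits only")
    start_bottom = reverse_complement(start_top)
    desired_bottom = reverse_complement(desired_top)
    candidates = []
    for protospacer, pam_index in ngg_sites(desired_bottom):
        in_start = start_bottom[pam_index - 20:pam_index]
        if in_start == protospacer:
            continue  # spacer would also match the starting allele
        seed_mismatches = sum(
            a != b for a, b in zip(protospacer[10:], in_start[10:])
        )
        candidates.append((add_5prime_g(protospacer), seed_mismatches))
    return candidates


# Toy non-edited-strand sequences (hypothetical), differing at one seed base.
DESIRED_BOTTOM = "TTTT" + "GACTGACTGACTGACTGACT" + "AGG" + "TTTT"
START_BOTTOM = "TTTT" + "GACTGACTGACTGACAGACT" + "AGG" + "TTTT"
candidates = pe3b_candidates(
    reverse_complement(START_BOTTOM), reverse_complement(DESIRED_BOTTOM)
)
```

A candidate with at least one seed mismatch against the starting allele corresponds to the case step 9 describes as operating efficiently. Candidates whose only mismatches lie outside the seed are still returned, with a count of zero, so that they can be deprioritized.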
The above-described stepwise method for designing suitable pegRNAs and second-site nicking sgRNAs is not meant to be limiting in any way. The present disclosure contemplates variations of the above-described stepwise method that could be derived therefrom by one of ordinary skill in the art.
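The pegRNA portion of the stepwise method (steps 3 to 7) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: both alleles are supplied as the PAM-containing strand written 5' to 3', `pam_index` is the 0-based index of the N in the chosen NGG PAM, and sequences are kept in the DNA alphabet (the transcribed pegRNA would carry U in place of T). The names `design_pegrna`, `suggested_pbs_lengths`, and `last_synthesized_is_g` are hypothetical, and `SCAFFOLD` is a placeholder rather than a real scaffold sequence.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def suggested_pbs_lengths(primer_region: str) -> range:
    """PBS lengths to test first, following the GC guidance of step 5."""
    gc = sum(base in "GC" for base in primer_region) / len(primer_region)
    if gc < 0.40:
        return range(14, 16)  # low GC: longer PBS (14 to 15 nt)
    if gc > 0.60:
        return range(8, 12)   # high GC: shorter PBS (8 to 11 nt)
    return range(12, 14)      # about 40-60% GC: 12 to 13 nt


@dataclass(frozen=True)
class PegRNADesign:
    spacer: str
    rt_template: str
    pbs: str
    full_sequence: str  # spacer + scaffold + RT template + PBS


def design_pegrna(start, desired, pam_index, pbs_len, rtt_len, scaffold):
    """Build the pegRNA components for an SpCas9 NGG PAM at pam_index."""
    if pam_index < 20 or start[pam_index + 1:pam_index + 3] != "GG":
        raise ValueError("pam_index must mark an NGG PAM with a 20-nt protospacer")
    nick = pam_index - 3  # nick lies between the 4th and 3rd bases 5' of the PAM
    if desired[:nick] != start[:nick]:
        raise ValueError("all edited nucleotides must lie 3' of the nick site")
    if pbs_len > nick or nick + rtt_len > len(desired):
        raise ValueError("PBS or RT template extends past the supplied sequence")
    protospacer = start[pam_index - 20:pam_index]
    # Step 4: prepend G if the protospacer does not start with G.
    spacer = protospacer if protospacer.startswith("G") else "G" + protospacer
    # Step 5: reverse complement of the p nucleotides 5' of the nick (starting allele).
    pbs = reverse_complement(start[nick - pbs_len:nick])
    # Step 6: reverse complement of the r nucleotides 3' of the nick (desired allele).
    rt_template = reverse_complement(desired[nick:nick + rtt_len])
    # Step 7: concatenate 5' to 3'.
    return PegRNADesign(spacer, rt_template, pbs, spacer + scaffold + rt_template + pbs)


def last_synthesized_is_g(design: PegRNADesign) -> bool:
    """True when the final synthesized DNA base is G (RT template begins with C)."""
    return design.rt_template.startswith("C")


# Toy example (hypothetical sequences): G-to-C substitution at position +5.
START = "ACGTACGTAC" + "GATTACAGATTACAGATTAC" + "TGG" + "CATCATCATCATCAT"
DESIRED = START[:31] + "C" + START[32:]
SCAFFOLD = "NNNN"  # placeholder only; substitute the actual scaffold sequence
design = design_pegrna(START, DESIRED, pam_index=30, pbs_len=13, rtt_len=10,
                       scaffold=SCAFFOLD)
```

In practice, several PBS and RT template lengths would be generated by looping over `pbs_len` and `rtt_len`, since the steps above state that the optimal lengths must be determined empirically.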
Once suitable pegRNAs and guided editors have been selected or designed, they can be administered by any suitable method, such as vector-based transfection (in which one or more vectors comprising DNA encoding the pegRNA and the PE fusion protein are transfected into cells and expressed intracellularly), direct delivery of the guided editor complexed with the pegRNA (e.g., RNP delivery) in a delivery format (e.g., lipid particles or nanoparticles), or an mRNA-based delivery system. Such methods are described herein, and any known method may be utilized.
The pegRNA and the guided editor (together, the PE complex) can be delivered to cells in a therapeutically effective amount such that, upon contact with the target DNA of interest, the desired edit is installed therein.
It is envisioned that any disease can be treated by such methods, provided delivery to the appropriate cells is feasible. One of ordinary skill in the art will be able to select and/or choose a method of PE delivery that is suitable for the intended purpose and intended target cell.
For example, in some embodiments, methods are provided that comprise administering to a subject having such a disease (e.g., a cancer associated with a point mutation as described above) an effective amount of a guided editing system described herein that corrects the point mutation or introduces an inactivating mutation into the disease-associated gene, as mediated by homology-directed repair in the presence of a donor DNA molecule comprising the desired genetic change. In some embodiments, provided methods comprise administering to a subject having such a disease (e.g., a cancer associated with a point mutation as described above) an effective amount of a guided editing system described herein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases that may be treated by correcting point mutations in disease-related genes or introducing deactivating mutations therein are known to those of skill in the art, and the disclosure is not limited in this respect.
In another aspect, a method is provided that uses a guided editor (e.g., PE1, PE2, or PE3) in combination with a guided editing guide RNA (pegRNA) to perform guided editing that directly installs or corrects mutations in the CDKL5 gene that lead to CDKL5 deficiency. In various embodiments, the disclosure provides a complex comprising a guided editor (e.g., PE1, PE2, or PE3) and a pegRNA capable of directly installing or correcting one or more mutations in a CDKL5 gene in a plurality of subjects.
The present disclosure provides methods for treating other diseases or conditions, for example, diseases or conditions associated with or caused by point mutations that can be corrected by guided editing. Some such diseases are described herein, and other suitable diseases that can be treated with the strategies and fusion proteins provided herein will be apparent to those of skill in the art based on the present disclosure. Exemplary suitable diseases and conditions are listed below. It will be appreciated that the numbering of specific positions or residues in each sequence depends on the particular protein and numbering scheme used. For example, the numbering may differ between the precursor of a mature protein and the mature protein itself, and sequence differences between species may affect numbering. Those skilled in the art will be able to identify the corresponding residues in any homologous protein and in the corresponding encoding nucleic acid by methods well known in the art, for example by sequence alignment and determination of homologous residues.
Exemplary suitable diseases and conditions include, but are not limited to: 2-methyl-3-hydroxybutyric acid urea (2-methyl-3-hydroxybutyric aciduria); 3beta-hydroxysteroid dehydrogenase deficiency (3 beta-Hydroxysteroid dehydrogenase deficiency); 3-methylglutarate urine disorder (3-Methylglutaconic aciduria); 3-Oxo-5α -steroid delta 4-dehydrogenase deficiency (3-Oxo-5 alpha-steroid delta 4-dehydrogenase deficiency); 46, xy inversion type1, 3,5 (46,XY sex reversal,type 1,3,and 5); 5-hydroxyproline enzyme deficiency (5-Oxoprolinase deficiency); 6-pyruvoyl-tetrahydropterin synthase deficiency (6-pyruvoyl-tetrahydropterin synthase deficiency); aarskog syndrome (Aarskog syndrome); aase syndrome (Aase syndrome); type 2 of achondroplasia (Achondrogenesis type 2); achromatopsia 2and 7 (achrombotopsia 2and 7); acquired long QT syndrome (Acquired long QT syndrome); acrocalllosal syndrome Schinzel type (Acrocallosal syndrome, schinzel type); acroapitofemorol dysplasia (Acrocapitofemoral dysplasia); acroosseous hypoplasia 2, with or without hormonal resistance (acrodysostomis 2,with or without hormone resistance); acroerythema keratosis (acroerythema); dysplasia of the shoulder (Acromicric dysplasia); acth independent adrenal macrotuberous hyperplasia 2 (Acth-independent macronodular adrenal hyperplasia 2); activating PI3K-delta syndrome (Activated PI3K-delta syndrome); acute intermittent porphyria (Acute intermittent porphyria); acyl-coa dehydrogenase family member 9 deficiency (deficiency of Acyl-CoA dehydrogenase family, membrane 9); adams-Oliver syndromes 5and 6; deficiency of adenine phosphoribosyl transferase (Adenine phosphoribosyltransferase deficiency); adenylate kinase deficiency (Adenylate kinase deficiency); hemolytic anemia, caused by a deficiency of adenylyl succinate lyase (hemolytic anemia due to Adenylosuccinate lyase deficiency); adolescent kidney wasting disease (Adolescent nephronophthisis); kidney-liver-pancreatic dysplasia (Renal-hepatic-pancreatic 
dysplasia); meckel (Meckel) syndrome type 7; adrenoleukodystrachy (adrenohaleukodystrachy); epidermolysis bullosa, an adult interface (Adult junctional epidermolysis bullosa); junctional epidermolysis bullosa, localized variation (Epidermolysis bullosa, junction, localisata variant); adult neuronal ceroid lipofuscinosis (Adult neuronal ceroid lipofuscinosis); adult neuronal ceroid lipofuscinosis (Adult neuronal ceroid lipofuscinosis); adult onset ataxia with eye movement loss (Adult onset ataxia with oculomotor apraxia); ADULT syndrome; fibrinogen-free and congenital fibrinogen-free (Afibrinogenemia and congenital Afibrinogenemia); autosomal recessive agaropectinemia 2 (autosomal recessive Agammaglobulinemia 2); age-related macular degeneration 3,6,11 and 12 (Age-related macular degeneration 3,6,11 and 12); aicardi Goutieres syndromes 1, 4and 5; chilbain lupus 1; alagille syndromes 1and 2; alexander (Alexander) disorder; uricemia (Alkaptonia); allan-Herndon-Dudley syndrome; alopecia congenital (Alopecia universalis congenital); alper's encephalopathy (Alpers encephalopathy); alpha-1-antitrypsin deficiency (Alpha-1-antitrypsin deficiency); autosomal dominant, autosomal recessive, and X-linked recessive Alport syndrome (autosomal dominant, autosomal recessive, and X-linked recessive Alport syndromes); familial alzheimer's disease 3with spastic lower limb paresis and disuse (Alzheimer disease, family, 3,with spastic paraparesis and apraxia); alzheimer's disease types 1, 3and 4 (Alzheimer disease, types,1,3, and 4); hypocalcific and hypomature forms, IIA1 enamel hypoplasia (hypocalcification type and hypomaturation type, IIA1 Amelogenesis imperfecta); aminoacylase 1deficiency (Aminoacylase 1 deficiency); a Mi Shen infant epileptic syndrome (Amish infantile epilepsy syndrome); amyloid transthyretin amyloidosis (Amyloidogenic transthyretin amyloidosis); amyloid cardiomyopathy, transthyretin-related (Amyloid Cardiomyopathy); cardiomyopathy (cardiomycoplathy); amyotrophic 
lateral sclerosis type1, type 6, type 15 (with or without frontotemporal dementia), type 22 (with or without frontotemporal dementia), and type 10 (Amyotrophic lateral sclerosis types 1,6,15 (with or without frontotemporal dementia), 22 (with or without frontotemporal dementia), and 10); frontotemporal dementia with TDP43 inclusions, TARDBP-related (Frontotemporal dementia with TDP43 incorporating, TARDBP-related); andermann syndrome; andersenTawil syndrome; congenital long QT syndrome (Congenital long QT syndrome); non-hemocyte type hemolytic Anemia, G6PD deficiency (Anemia, nonspherocytic hemolytic, due to G6PD deference); angelman syndrome; severe neonatal encephalopathy is accompanied by a small head deformity (Severe neonatal-onset encephalopathy with microcephaly); susceptible autism, X-linked 3 (susceptibility to Autism, X-linked 3); hereditary vascular disease is accompanied by renal disease, aneurysms, and muscle spasms (Angioplathy, heredity, with nephotath, aneuroses, and muscle cramps); angiotensin i-converting enzyme, benign serum elevation (angiotenin i-converting enzyme, benign serum increase); aneroid, cerebellar ataxia and mental retardation (Aniridia, cerebellar ataxia, and mental retardation); nail-free (Anonychia); antithrombin III deficiency (Antithrombin III deficiency); antley-Bixler syndrome (Antley-Bixler syndrome with genital anomalies and disordered steroidogenesis) with genital abnormalities and steroid-generating disorders; familial thoracic Aortic aneurysms 4, 6and 9 (Aortic aneurysms, familial thoracic, 4,6, and 9); thoracic aortic aneurysm and aortic dissection (Thoracic aortic aneurysms and aortic dissections); multisystem smooth muscle dysfunction syndrome (Multisystemic smooth muscle dysfunction syndrome); smog disease 5 (Moyamoya disease 5); aplastic anemia (Aplastic anemia); apparent mineralocorticoid excess (Apparent mineralocorticoid excess); arginase deficiency (Arginase deficiency); argininosuccinate lyase deficiency 
(Argininosuccinate lyase deficiency); aromatase deficiency (Aromatase deficiency); arrhythmogenic right ventricular cardiomyopathy type 5, type 8and type 10 (Arrhythmogenic right ventricular cardiomyopathy types, 5, 8and 10); primary familial hypertrophic cardiomyopathy (Primary familial hypertrophic cardiomyopathy); congenital distal multiple joint bending disorder, X-linked (Arthrogryposis multiplex congenita, distal, X-linked); arthrodesis renal insufficiency cholestasis syndrome (Arthrogryposis renal dysfunction cholestasis syndrome); joint bending, renal dysfunction and cholestasis 2 (Arthrogryposis, renal dysfunction, and cholestasis 2); asparagine synthetase deficiency (Asparagine synthetase deficiency); abnormal neuronal migration (Abnormality of neuronal migration); ataxia is accompanied by vitamin E deficiency (Ataxia with vitamin E deficiency); autosomal dominant sensory Ataxia (Ataxia, sensor, autosomal dominant); ataxia-telangiectasia syndrome (Ataxia-telangiectasia syndrome); hereditary cancer susceptibility syndrome (Hereditary cancer-predisposing syndrome); transferrin deficiency (ataransferrineia); familial atrial fibrillation 11,12,13, and 16 (Atrial fibrillation, family, 11,12,13, and 16); atrial septal defects 2,4, and7 (with or without atrioventricular conduction defects) (Atrial septal defects, 2,4, and7 (with or without atrioventricular conduction defects)); atrial block 2 (Atrial standstill 2); atrial septal defect 4 (Atrioventricular septal defect 4); hereditary atrophy of the eyeball (Atrophia bulborum hereditaria); ATR-X syndrome (ATR-X syndrome); ear condylar syndrome2 (Auriculocondylar syndrome 2); multisystem autoimmune diseases, infancy onset (Autoimmune disease, multisystem, infantile-onset); autoimmune lymphoproliferative syndrome type 1a (Autoimmune lymphoproliferative syndrome, type1 a); autosomal dominant inherited hypohidrosis ectodermal dysplasia (Autosomal dominant hypohidrotic ectodermal dysplasia); autosomal dominant 
progressive exooculopathy is accompanied by mitochondrial DNA deletions 1and 3 (Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions 1and 3); autosomal dominant torsokinesis 4 (Autosomal dominant torsion dystonia 4); autosomal recessive central nuclear myopathy (Autosomal recessive centronuclear myopathy); autosomal recessive congenital ichthyosis 1,2,3,4A and 4B (Autosomal recessive congenital ichthyosis 1,2,3,4A, and 4B); autosomal recessive skin relaxant type IA and type 1B (Autosomal recessive cutis laxa type IA and 1B); autosomal recessive hypohidrosis ectodermal dysplasia syndrome (Autosomal recessive hypohidrotic ectodermal dysplasia syndrome); ectodermal dysplasia11b (Ectodermal dysplasia11 b); hypohidrosis/hair/tooth type, autosomal recessive) autosomal recessive; autosomal recessive hypophosphatemia bone disease (Autosomal recessive hypophosphatemic bone disease); axenfeld-Rieger syndrome type 3; bainbridge-Ropers syndrome; bannayan-Riley-Ruvalcaba syndrome; PTEN hamartoma tumor syndrome (PTEN hamartoma tumor syndrome); baraitser-Winter syndromes 1and 2; barakat syndrome; barset-Biedl syndromes 1, 11, 16and 19; naked lymphocyte syndrome type 2, complementation group E (Bare lymphocyte syndrome type 2,complementation group E); prenatal Bartter syndrome type 2 (Bartter syndrome antenatal type); bartter syndrome type 3, type 3with low calcium urine and type 4 (Bartter syndrome types 3,3with hypocalciuria,and 4); idiopathic basal nuclear calcification 4 (Basal ganglia calcification, idiopathic, 4); beaded hair (Beaded hair); benign familial hematuria (Benign familial hematuria); benign familial neonatal epilepsy 1and2 (Benign familial neonatal seizures 1and 2); benign familial neonatal epilepsy 1 and/or myofiber tics (Seizures, benign familial neonatal,1, and/or myokymia); epileptic, early infant epileptic encephalopathy 7 (Seizures, early infantile epileptic encephalopathy 7); benign familial neonatal-infant epilepsy 
(Benign familial neonatal-infantile seizures); benign hereditary chorea (Benign hereditary chorea); benign scapular fibular muscular dystrophy with cardiomyopathy (Benign scapuloperoneal muscular dystrophy with cardiomyopathy); bernard-Soulier syndrome types A1 and A2 (autosomal dominant); bestrophinopath, autosomal recessive; beta Thalassemia (beta thalasssemi); bethlem myopathy and Bethlem myopathy 2 (Bethlem myopathy and Bethlem myopathy 2); bietti crystalline-like corneal retinal dystrophy (Bietti crystalline corneoretinal dystrophy); congenital bile acid synthesis disorder 2 (Bile acid synthesis defect, connetial, 2); biotin enzyme deficiency (Biotinidase deficiency); birk Barel mental retardation deformity syndrome (Birk Barel mental retardation dysmorphism syndrome); narrow, drooping and inverted inner canthus skin neoplasms (blepharospheimia, ptosis, and epicanthus inversus); bloom syndrome; borjeson-forsman-Lehmann syndrome; boucher Neuhauser syndrome; short fingers type A1 and A2 (Brachydactyly types A and A2); short-finger with hypertension (Brachydactyly with hypertension); cerebral small vessel disease is accompanied by bleeding (Brain small vessel disease with hemorrhage); branched-chain ketoacid dehydrogenase kinase deficiency (Branched-chain-chain ketoacid dehydrogenase kinase deficiency); branchiostic syndromes 2and 3; early-onset Breast cancer (early-set); familial Breast cancer-ovarian cancers 1,2, and 4 (Breast-ovarian cancer, family 1,2, and 4); keratolytic syndrome2 (Brittle cornea syndrome 2); brown myopathy; bronchodilation with or without elevated sweat chloride 3 (Bronchiectasis with or without elevated sweat chloride 3); brown-Vialetto-Van Laere syndrome and Brown-Vialetto-Van Laere syndrome 2; brugada syndrome; brugada syndrome 1; ventricular fibrillation (Ventricular fibrillation); paroxysmal familial ventricular fibrillation (Paroxysmal familial ventricular fibrillation); brugada syndrome and Brugada syndrome 4; long QT syndrome; 
sudden cardiac death (Sudden cardiac death); bovine eye-like macular dystrophy (Bull eye macular dystrophy); stargardt disease 4; cone rod dystrophy 12 (Cone-rod dynasty 12); bullous ichthyosis-like erythroderma (Bullous ichthyosiform erythroderma); burn-Mckeown syndrome; familial Candidiasis 2,5,6, and 8 (candidasis, family, 2,5,6, and 8); carbohydrate deficiency glycoprotein syndrome type I and II (Carbohydrate-deficient glycoprotein syndrome type I and II); carbonic anhydrase VA deficiency, hyperamidemia (Carbonic anhydrase VA deficiency, hyperammonemia due to); colon cancer (Carcinoma of colon); arrhythmia (Cardiac arrhythmia); long QT syndrome LQT1 subtype (Long QT syndrome, LQT1 subtype); infant fatal cardiomyopathy, cytochrome c oxidase deficiency (cardioencephalomyopathic, fatalinfartile, due to cytochrome c oxidase deficiency); cardiofectiocutaneous syndrome; cardiomyopathy (cardiomycoplathy); danon disease; hypertrophic cardiomyopathy (Hypertrophic cardiomyopathy); left ventricular densification incomplete cardiomyopathy (Left ventricular noncompaction cardiomyopathy); carnevale syndrome; carney syndrome type 1; carnitine acyl carnitine translocase deficiency (Carnitine acylcarnitine translocase deficiency); carnitine palmitoyl transferase I, II (tardive) and II (infant) deficiency (Carnitine palmitoyltransferase I, II (late set), and II (infantile) deficiency); cataracts 1,4, autosomal dominant, multiple types, with microkeratoses, copdock-like, juvenile, with microkeratoses and diabetes, and diffuse non-progressive nuclei (Cataract 1,4,autosomal dominant,autosomal dominant,multiple types,with microcornea,coppock-like, juvenile, with microcornea and glucosuria, and nuclear diffuse nonprogressive); catecholamine-sensitive polymorphic ventricular tachycardia (Catecholaminergic polymorphic ventricular tachycardia); tail degeneration syndrome (Caudal regression syndrome); familial Cd8 deficiency (family); central axons (Central core disease); 1. 
chromosome 9 and 16 centromeric instability and immunodeficiency (Centromeric instability of chromosomes 1, 9 and 16 and immunodeficiency); infantile cerebellar ataxia with progressive external ophthalmoplegia, and cerebellar ataxia, mental retardation, and disequilibrium syndrome 2 (Cerebellar ataxia infantile with progressive external ophthalmoplegia and Cerebellar ataxia, mental retardation, and dysequilibrium syndrome 2); APP-related cerebral amyloid angiopathy (Cerebral amyloid angiopathy, APP-related); cerebral autosomal dominant and recessive arteriopathy with subcortical infarcts and leukoencephalopathy (Cerebral autosomal dominant and recessive arteriopathy with subcortical infarcts and leukoencephalopathy); cerebral cavernous malformations 2 (Cerebral cavernous malformations 2); cerebrooculofacioskeletal syndrome 2 (Cerebrooculofacioskeletal syndrome 2); cerebro-oculo-facio-skeletal syndrome (Cerebro-oculo-facio-skeletal syndrome); cerebroretinal microangiopathy with calcifications and cysts (Cerebroretinal microangiopathy with calcifications and cysts); neuronal ceroid lipofuscinosis 2, 6, 7, and 10 (Ceroid lipofuscinosis neuronal 2, 6, 7, and 10); Chédiak-Higashi syndrome; Chediak-Higashi syndrome, adult type; Charcot-Marie-Tooth disease types 1B, 2B2, 2C, 2F, 2I, 2U (axonal), 1C (demyelinating), dominant intermediate C, recessive intermediate A, 2A2, 4C, 4D, 4H, IF, IVF, and X; scapuloperoneal spinal muscular atrophy (Scapuloperoneal spinal muscular atrophy); congenital non-progressive distal spinal muscular atrophy (Distal spinal muscular atrophy, congenital nonprogressive); autosomal recessive distal spinal muscular atrophy 5 (Spinal muscular atrophy, distal, autosomal recessive, 5); CHARGE syndrome; childhood hypophosphatasia (Childhood hypophosphatasia); adult hypophosphatasia (Adult hypophosphatasia); cholecystitis (Cholecystitis); progressive familial intrahepatic 
cholestasis 3 (Progressive familial intrahepatic cholestasis 3); intrahepatic cholestasis of pregnancy 3 (Cholestasis, intrahepatic, of pregnancy 3); cholestanol storage disease (Cholestanol storage disease); cholesterol monooxygenase (side-chain cleaving) deficiency (Cholesterol monooxygenase (side-chain cleaving) deficiency); chondrodysplasia Blomstrand type (Chondrodysplasia Blomstrand type); chondrodysplasia punctata 1, X-linked recessive and 2, X-linked dominant (Chondrodysplasia punctata 1, X-linked recessive and 2 X-linked dominant); CHOPS syndrome; chronic granulomatous disease, autosomal recessive cytochrome b-positive, types 1 and 2 (Chronic granulomatous disease, autosomal recessive cytochrome b-positive, types 1 and 2); Chudley-McCullough syndrome; primary ciliary dyskinesia 7, 11, 15, 20, and 22 (Ciliary dyskinesia, primary, 7, 11, 15, 20 and 22); citrullinemia type I (Citrullinemia type I); citrullinemia type I and II (Citrullinemia type I and II); cleidocranial dysostosis (Cleidocranial dysostosis); C-like syndrome (C-like syndrome); Cockayne syndrome type A; primary coenzyme Q10 deficiency 1, 4, and 7 (Coenzyme Q10 deficiency, primary 1, 4, and 7); Coffin-Siris/intellectual disability (Coffin Siris/Intellectual Disability); Coffin-Lowry syndrome; Cohen syndrome; cold-induced sweating syndrome 1 (Cold-induced sweating syndrome 1); COLE-CARPENTER syndrome 2; combined cellular and humoral immune defects with granulomas (Combined cellular and humoral immune defects with granulomas); combined D-2- and L-2-hydroxyglutaric aciduria (Combined d-2- and l-2-hydroxyglutaric aciduria); combined malonic and methylmalonic aciduria (Combined malonic and methylmalonic aciduria); combined oxidative phosphorylation deficiencies 1, 3, 4, 12, 15, and 25 (Combined oxidative phosphorylation deficiencies 1, 3, 4, 12, 15, and 25); combined partial and complete 17-alpha-hydroxylase/17,20-lyase deficiency (Combined partial and complete 17-alpha-hydroxylase/17,20-lyase deficiency); common variable immunodeficiency 9 (Common 
variable immunodeficiency 9); c1 inhibitor complement component 4 partial deficiency, dysfunction (Complement component 4,partial deficiency of,due to dysfunctional c1 inhibitor); complement factor B deficiency (Complement factor B deficiency); cone cell monochromatic color vision (Cone monochromatism); cone dystrophies 2and 6 (Cone-rod dynasty 2and 6); cone rod dystrophic enamel hypoplasia (Cone-rod dystrophy amelogenesis imperfecta); x-linked congenital adrenal hyperplasia and congenital adrenal hypoplasia (Congenital adrenal hyperplasia and Congenital adrenal hypoplasia, X-linked); congenital megakaryocytopenia-free (Congenital amegakaryocytic thrombocytopenia); congenital aneroid (Congenital aniridia); congenital central hypoventilation (Congenital central hypoventilation); congenital megacolon disease 3 (Hirschsprung disease 3); congenital contracture spider-like syndrome (Congenital contractural arachnodactyly); congenital limb and facial contractures, hypotonia, and developmental retardation (Congenital contractures of the limbs and face, hypotonia, and developmental delay); congenital glycosylation disorders 1B,1D,1G,1H,1J,1K,1N,1P,2C,2J, type 2K, IIm (Congenital disorder of glycosylation types B,1D,1G,1H,1J,1K,1N,1P,2C,2J,2K, iim); congenital erythropoiesis anoxia type I and II (Congenital dyserythropoietic anemia, type I and II); congenital facial ectodermal dysplasia (Congenital ectodermal dysplasia of face); congenital erythropoietic porphyria (Congenital erythropoietic porphyria); congenital generalized lipodystrophy type 2 (Congenital generalized lipodystrophy type 2); congenital heart disease type 2 (Congenital heart disease, multiple types, 2); congenital heart disease (Congenital heart disease); aortic arch dissection (Interrupted aortic arch); congenital lipoma overgrowth, vascular malformations, and epidermal nevi (Congenital lipomatous overgrowth, vascular malformations, and epidermal nevi); non-small cell lung cancer (Non-small cell lung 
cancer); ovarian tumor (Neoplasm of ovary); nonspecific cardiac conduction defects (Cardiac conduction defect, non-specific); congenital microvilli atrophy (Congenital microvillous atrophy); congenital muscular dystrophy (Congenital muscular dystrophy); congenital muscular dystrophy, caused in part by LAMA2deficiency (Congenital muscular dystrophy due to partial LAMA2 deficiency); congenital muscular dystrophy-associated glycoprotein diseases with brain and eye abnormalities A2, A7, A8, a11 and a14 (Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, types A2, A7, A8, a11, and a 14); congenital muscular dystrophy-amyotrophic-associated glycoprotein diseases with mental retardation type B2, B3, B5, and B15 (Congenital muscular dystrophy-dystroglycanopathy with mental retardation, types B2, B3, B5, and B15); congenital muscular dystrophy-amyotrophic lateral sclerosis-associated glycoprotein disease, mental retardation free type B5 (Congenital muscular dystrophy-dystroglycanopathy without mental retardation, type B5); congenital myomegaly-brain syndrome (Congenital muscular hypertrophy-cerebral syndrome); congenital muscle weakness syndrome, acetazolamide reactivity (Congenital myasthenic syndrome); congenital myopathy with fiber imbalance (Congenital myopathy with fiber type disproportion); congenital eye defects (Congenital ocular coloboma); congenital stationary night blindness type 1A,1B,1C,1E,1F, type 2A (Congenital stationary night blindness, type 1A,1B,1C,1E,1F,and 2A); coproporphyria (Coproporphyria); applanation Cornea 2 (Cornea plana 2); fuchs corneal endothelial dystrophy 4 (Corneal dystrophy, fuchs endothelial, 4); corneal endothelial dystrophy type 2 (Corneal endothelial dystrophy type 2); corneal friable spherical cornea, blue sclera, and hyperkinesia (Corneal fragility keratoglobus, blue sclerae and joint hypermobility); cornelia de Lange syndromes 1and 5; autosomal dominant coronary lesions 2 (Coronary artery disease, 
autosomal dominant 2); coronary heart disease (Coronary heart disease); hyperalpha lipoproteinemia 2 (hyperalpha lipoproteoemia 2); complex cortical dysplasia is accompanied by other brain deformities 5and 6 (Cortical dysplasia, complex, with other brain malformations and 6); occipital cortical deformity (Cortical malformations, occipital); cortisol binding globulin deficiency (coricosteroid-binding globulin deficiency); corticosterone methyl oxidase type 2deficiency (Corticosterone methyloxidase type deficiency); costello syndrome; cowden syndrome 1; flat hip (Coxa plana); autosomal dominant diaphyseal dysplasia (Craniodiaphyseal dysplasia, autosomal dominant); craniosynostosis 1and 4 (crariosyrosis 1and 4); craniocerebral suture premature closure and dental abnormality (Craniosynostosis and dental anomalies); creatine deficiency, X-linked (Creatine deficiency, X-linked); crouzon syndrome; cryptoeye syndrome (Cryptophthalmos syndrome); unilateral or bilateral Cryptorchidism (unilateral or bilateral); cushing refers to joint adhesions (Cushing symphalangism); cutaneous malignant melanoma 1 (Cutaneous malignant melanoma 1); cutaneous laxity is accompanied by osteodystrophy and severe pulmonary, gastrointestinal and urinary system abnormalities (Cutis laxa with osteodystrophy and with severe pulmonary, gastrointestinal, and urinary abnormalities); transient Cyanosis and atypical kidney disease in neonates (cyansis, transient neonatal and atypical nephropathic); cystic fibrosis (Cystic fibrosis); cystiuria (Cystinuria); cytochrome c oxidase i deficiency (Cytochrome c oxidase i deficiency;); cytochrome-c oxidase deficiency (Cytochrome-c oxidase deficiency); d-2-hydroxyglutarate 2 (D-2-hydroxyglutaric aciduria 2); segmental Darier disease (segment); deafness is accompanied by complex hypoplastic small ear disease (LAMM) (Deafness with Labyrinthine Aplasia Microtia and Microdontia (LAMM)); deafness, autosomal dominant 3a,4,12,13,15, autosomal dominant non-syndromic 
sensory nerves 17,20 and 65 (Deafness, autosomal dominant 3a,4,12,13,15,autosomal dominant nonsyndromic sensorineural 17,20,and 65); deafness, autosomal recessive 1A,2,3,6,8,9,12,15,16,18b,22,28,31,44,49,63,77,86 and 89 (deaess, autosomal recessive 1A,2,3,6,8,9,12,15,16,18b,22,28,31,44,49,63,77,86,and 89); cochlear Deafness with myopia and mental retardation, no vestibular involvement, autosomal dominant, X-linkage 2 (Deafness, cochlear, with myopia and intellectual impairment, without vestibular involvement, autosomal dominant, X-linked 2); 2-methylbutyryl-CoA dehydrogenase Deficiency (Deficelectronic of 2-methyllbutyryl-CoA dehydrogenase); 3-hydroxyacyl-coa dehydrogenase Deficiency (Defiiciency of 3-hydroxy ycyl-CoA dehydrogenase); alpha-mannosidase deficiency (Deficiency of alpha-mannosidase); aromatic-L-amino acid decarboxylase deficiency (Deficiency of aromatic-L-amino-acid decarboxylase); phosphoglyceromutase deficiency (Deficiency of bisphosphoglycerate mutase); butyryl-CoA dehydrogenase deficiency (Deficiency of butyryl-CoA dehydrogenase); iron oxidase deficiency (Deficiency of ferroxidase); galactokinase deficiency (Deficiency of ferroxidase; deficiency of galactokinase); guanidinoacetic acid methyltransferase deficiency (Deficiency of guanidinoacetate methyltransferase); hyaluronidase deficiency (Deficiency of hyaluronoglucosaminidase); deficiency of 5-phosphoribosyl isomerase (Deficiency of ribose-5-phosphate isomerase); steroid 11-beta-monooxygenase deficiency (Deficiency of steroid 11-beta-monooxygenase); UDP glucose-hexose-1-phosphate uridyltransferase deficiency (Deficiency of UDPglucose-hexose-1-phosphate uridylyltransferase); xanthine oxidase deficiency (Deficiency of xanthine oxidase); dejerine-Sottas disease; summer-horse-figure three disease type ID and IVF; autosomal dominant Dejerine-sotmas syndrome; dendritic cells, monocytes, B lymphocytes and natural killer lymphocyte deficiency (Dendritic cell, monocyte, B lymphocyte, and natural killer 
lymphocyte deficiency); Desbuquois dysplasia 2; Desbuquois syndrome; DFNA2 nonsyndromic hearing loss (DFNA2 Nonsyndromic Hearing Loss); diabetes mellitus and insipidus with optic atrophy and deafness (Diabetes mellitus and insipidus with optic atrophy and deafness); diabetes mellitus, type 2, and insulin-dependent, 20 (Diabetes mellitus, type 2, and insulin-dependent, 20); Diamond-Blackfan anemia 1, 5, 8, and 10; diarrhea 3 (secretory sodium, congenital, syndromic) and 5 (with congenital tufting enteropathy) (Diarrhea 3 (secretory sodium, congenital, syndromic) and 5 (with tufting enteropathy, congenital)); dicarboxylic aminoaciduria (Dicarboxylic aminoaciduria); diffuse palmoplantar keratoderma, Bothnian type (Diffuse palmoplantar keratoderma, Bothnian type); digitorenocerebral syndrome; dihydropteridine reductase deficiency (Dihydropteridine reductase deficiency); dilated cardiomyopathy 1A, 1AA, 1C, 1G, 1BB, 1DD, 1FF, 1HH, 1I, 1KK, 1N, 1S, 1Y, and 3B (Dilated cardiomyopathy 1A, 1AA, 1C, 1G, 1BB, 1DD, 1FF, 1HH, 1I, 1KK, 1N, 1S, 1Y, and 3B); left ventricular noncompaction 3 (Left ventricular noncompaction 3); disordered steroidogenesis due to cytochrome P450 oxidoreductase deficiency (Disordered steroidogenesis due to cytochrome P450 oxidoreductase deficiency); distal arthrogryposis type 2B (Distal arthrogryposis type 2B); distal hereditary motor neuronopathy type 2B (Distal hereditary motor neuronopathy type 2B); distal myopathy, Markesbery-Griggs type (Distal myopathy Markesbery-Griggs type); distal spinal muscular atrophy, X-linked 3 (Distal spinal muscular atrophy, X-linked 3); distichiasis-lymphedema syndrome (Distichiasis-lymphedema syndrome); dominant dystrophic epidermolysis bullosa with absence of skin (Dominant dystrophic epidermolysis bullosa with absence of skin); dominant hereditary optic atrophy (Dominant hereditary optic atrophy); Donnai-Barrow syndrome; dopamine beta-hydroxylase deficiency (Dopamine beta hydroxylase deficiency); dopamine receptor D2, reduced brain density of (Dopamine
receptor D2, reduced brain density of); Dowling-Degos disease 4; Doyne honeycomb retinal dystrophy (Doyne honeycomb retinal dystrophy); Malattia leventinese; Duane syndrome type 2; Dubin-Johnson syndrome; Duchenne muscular dystrophy (Duchenne muscular dystrophy); Becker muscular dystrophy (Becker muscular dystrophy); dysfibrinogenemia (Dysfibrinogenemia); dyskeratosis congenita, autosomal dominant and autosomal dominant 3 (Dyskeratosis congenita autosomal dominant and autosomal dominant, 3); dyskeratosis congenita, autosomal recessive 1, 3, 4, and 5 (Dyskeratosis congenita, autosomal recessive, 1, 3, 4, and 5); dyskeratosis congenita, X-linked (Dyskeratosis congenita X-linked); familial dyskinesia with facial myokymia (Dyskinesia, familial, with facial myokymia); dysplasminogenemia (Dysplasminogenemia); dystonia 2 (torsion, autosomal recessive), 3 (torsion, X-linked), 5 (dopa-responsive), 10, 12, 16, 25, 26 (myoclonic) (Dystonia 2 (torsion, autosomal recessive), 3 (torsion, X-linked), 5 (Dopa-responsive type), 10, 12, 16, 25, 26 (myoclonic)); benign familial infantile seizures 2 (Seizures, benign familial infantile, 2); early infantile epileptic encephalopathy 2, 4, 7, 9, 10, 11, 13, and 14 (Early infantile epileptic encephalopathy 2, 4, 7, 9, 10, 11, 13, and 14); atypical Rett syndrome; early T cell progenitor acute lymphoblastic leukemia (Early T cell progenitor acute lymphoblastic leukemia); ectodermal dysplasia-skin fragility syndrome (Ectodermal dysplasia skin fragility syndrome); ectodermal dysplasia-syndactyly syndrome 1 (Ectodermal dysplasia-syndactyly syndrome 1); ectopia lentis, isolated, autosomal recessive and dominant (Ectopia lentis, isolated autosomal recessive and dominant); ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome 3 (Ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome 3); Ehlers-Danlos syndrome type 7 (autosomal recessive), classic type, type 2 (progeroid), hydroxylysine-deficient,
type 4, type 4 variant, and due to tenascin-X deficiency (Ehlers-Danlos syndrome type 7 (autosomal recessive), classic type, type 2 (progeroid), hydroxylysine-deficient, type 4, type 4 variant, and due to tenascin-X deficiency); Eichsfeld type congenital muscular dystrophy (Eichsfeld type congenital muscular dystrophy); endocrine-cerebroosteodysplasia (Endocrine-cerebroosteodysplasia); enhanced S-cone syndrome (Enhanced s-cone syndrome); enlarged vestibular aqueduct syndrome (Enlarged vestibular aqueduct syndrome); enterokinase deficiency (Enterokinase deficiency); epidermodysplasia verruciformis (Epidermodysplasia verruciformis); epidermolysis bullosa simplex and limb-girdle muscular dystrophy, simplex with mottled pigmentation, simplex with pyloric atresia, simplex, autosomal recessive, and with pyloric atresia (Epidermolysis bullosa simplex and limb girdle muscular dystrophy, simplex with mottled pigmentation, simplex with pyloric atresia, simplex, autosomal recessive, and with pyloric atresia); epidermolytic palmoplantar keratoderma (Epidermolytic palmoplantar keratoderma); familial febrile seizures 8 (Familial febrile seizures 8); epilepsy, childhood absence 2, 12 (idiopathic generalized, susceptibility to), 5 (nocturnal frontal lobe), nocturnal frontal lobe type 1, partial, with variable foci, progressive myoclonic 3, and X-linked, with variable learning disabilities and behavior disorders (Epilepsy, childhood absence 2, 12 (idiopathic generalized, susceptibility to) 5 (nocturnal frontal lobe), nocturnal frontal lobe type 1, partial, with variable foci, progressive myoclonic 3, and X-linked, with variable learning disabilities and behavior disorders); epileptic encephalopathy, childhood-onset, early infantile, 1, 19, 23, 25, 30, and 32 (Epileptic encephalopathy, childhood-onset, early infantile, 1, 19, 23, 25, 30, and 32); multiple epiphyseal dysplasia with myopia and conductive deafness (Epiphyseal dysplasia, multiple, with myopia and conductive deafness); episodic ataxia type 2 (Episodic ataxia type 2); familial episodic pain
syndrome 3 (Episodic pain syndrome, familial, 3); Epstein syndrome; Fechtner syndrome; erythropoietic protoporphyria (Erythropoietic protoporphyria); estrogen resistance (Estrogen resistance); exudative vitreoretinopathy 6 (Exudative vitreoretinopathy 6); Fabry disease and Fabry disease, cardiac variant (Fabry disease and Fabry disease, cardiac variant); factor H, VII, X, V and factor VIII, combined deficiency of 2, XIII, A subunit, deficiency (Factor H, VII, X, V and Factor VIII, combined deficiency of 2, XIII, A subunit, deficiency); familial adenomatous polyposis 1 and 3 (Familial adenomatous polyposis 1 and 3); familial amyloid nephropathy with urticaria and deafness (Familial amyloid nephropathy with urticaria and deafness); familial cold urticaria (Familial cold urticaria); familial aplasia of the vermis (Familial aplasia of the vermis); familial benign pemphigus (Familial benign pemphigus); familial cancer of breast (Familial cancer of breast); breast cancer, susceptibility to (Breast cancer, susceptibility to); osteosarcoma (Osteosarcoma); pancreatic cancer 3 (Pancreatic cancer 3); familial cardiomyopathy (Familial cardiomyopathy); familial cold autoinflammatory syndrome 2 (Familial cold autoinflammatory syndrome 2); familial colorectal cancer (Familial colorectal cancer); familial exudative vitreoretinopathy, X-linked (Familial exudative vitreoretinopathy, X-linked); familial hemiplegic migraine types 1 and 2 (Familial hemiplegic migraine types 1 and 2); familial hypercholesterolemia (Familial hypercholesterolemia); familial hypertrophic cardiomyopathy 1, 2, 3, 4, 7, 10, 23, and 24 (Familial hypertrophic cardiomyopathy 1, 2, 3, 4, 7, 10, 23, and 24); familial hypokalemia-hypomagnesemia (Familial hypokalemia-hypomagnesemia); familial hypoplastic, glomerulocystic kidney (Familial hypoplastic, glomerulocystic kidney); familial infantile myasthenia (Familial infantile myasthenia); familial juvenile gout (Familial juvenile gout); familial Mediterranean fever and familial Mediterranean
fever, autosomal dominant (Familial Mediterranean fever and Familial Mediterranean fever, autosomal dominant); familial porencephaly (Familial porencephaly); familial porphyria cutanea tarda (Familial porphyria cutanea tarda); familial pulmonary capillary hemangiomatosis (Familial pulmonary capillary hemangiomatosis); familial renal glucosuria (Familial renal glucosuria); familial renal hypouricemia (Familial renal hypouricemia); familial restrictive cardiomyopathy 1 (Familial restrictive cardiomyopathy 1); familial type 1 and type 3 hyperlipoproteinemia (Familial type 1 and 3 hyperlipoproteinemia); Fanconi anemia, complementation group E, I, N, and O (Fanconi anemia, complementation group E, I, N, and O); Fanconi-Bickel syndrome; favism, susceptibility to (Favism, susceptibility to); familial febrile seizures 11 (Febrile seizures, familial, 11); Feingold syndrome 1; fetal hemoglobin quantitative trait locus 1 (Fetal hemoglobin quantitative trait locus 1); FG syndrome and FG syndrome 4; congenital fibrosis of extraocular muscles 1, 2, 3a (with or without extraocular involvement), 3b (Fibrosis of extraocular muscles, congenital, 1, 2, 3a (with or without extraocular involvement), 3b); fish-eye disease (Fish-eye disease); fleck corneal dystrophy (Fleck corneal dystrophy); Floating-Harbor syndrome; focal epilepsy with speech disorder with or without mental retardation (Focal epilepsy with speech disorder with or without mental retardation); focal segmental glomerulosclerosis 5 (Focal segmental glomerulosclerosis 5); forebrain defects (Forebrain defects); Frank Ter Haar syndrome; Borrone Di Rocco Crovato syndrome; Frasier syndrome; Wilms tumor 1 (Wilms tumor 1); Freeman-Sheldon syndrome; frontometaphyseal dysplasia 1 and 3 (Frontometaphyseal dysplasia 1 and 3); frontotemporal dementia (Frontotemporal dementia); frontotemporal dementia and/or amyotrophic lateral sclerosis 3 and 4 (Frontotemporal dementia and/or amyotrophic lateral sclerosis 3 and 4); frontotemporal dementia, chromosome 3-linked, and
frontotemporal dementia, ubiquitin-positive (Frontotemporal Dementia Chromosome 3-Linked and Frontotemporal dementia ubiquitin-positive); fructose-biphosphatase deficiency (Fructose-biphosphatase deficiency); Fuhrmann syndrome (Fuhrmann syndrome); gamma-aminobutyric acid transaminase deficiency (Gamma-aminobutyric acid transaminase deficiency); Gamstorp-Wohlfart syndrome; Gaucher disease type 1 and subacute neuronopathic (Gaucher disease type 1 and Subacute neuronopathic); familial horizontal gaze palsy with progressive scoliosis (Gaze palsy, familial horizontal, with progressive scoliosis); generalized dominant dystrophic epidermolysis bullosa (Generalized dominant dystrophic epidermolysis bullosa); generalized epilepsy with febrile seizures plus 3, type 1, type 2 (Generalized epilepsy with febrile seizures plus 3, type 1, type 2); epileptic encephalopathy, Lennox-Gastaut type (Epileptic encephalopathy Lennox-Gastaut type); giant axonal neuropathy (Giant axonal neuropathy); Glanzmann thrombasthenia (Glanzmann thrombasthenia); glaucoma 1, open angle, E, F, and G (Glaucoma 1, open angle, E, F, and G); glaucoma 3, primary congenital, D (Glaucoma 3, primary congenital, D); congenital glaucoma and congenital glaucoma with coloboma (Glaucoma, congenital and Glaucoma, congenital, Coloboma); juvenile-onset primary open angle glaucoma (Glaucoma, primary open angle, juvenile-onset); glioma susceptibility 1 (Glioma susceptibility 1); glucose transporter type 1 deficiency syndrome (Glucose transporter type 1 deficiency syndrome); glucose-6-phosphate transport defect (Glucose-6-phosphate transport defect); GLUT1 deficiency syndrome 2 (GLUT1 deficiency syndrome 2); epilepsy, idiopathic generalized, susceptibility to, 12 (Epilepsy, idiopathic generalized, susceptibility to, 12); glutamate formiminotransferase deficiency (Glutamate formiminotransferase deficiency); glutaric acidemia IIA and IIB (Glutaric acidemia IIA and IIB); glutaric aciduria, type 1 (Glutaric aciduria, type 1); glutathione synthetase deficiency (Glutathione
synthetase deficiency); glycogen storage disease 0 (muscle), II (adult form), IXa2, IXc, type 1A; type II, type IV, IV (combined hepatic and myopathic), type V, and type VI (Glycogen storage disease 0 (muscle), II (adult form), IXa2, IXc, type 1A; type II, type IV, IV (combined hepatic and myopathic), type V, and type VI); Goldmann-Favre syndrome; Gordon syndrome (Gordon syndrome); Gorlin syndrome; holoprosencephaly sequence (Holoprosencephaly sequence); holoprosencephaly 7 (Holoprosencephaly 7); chronic granulomatous disease, X-linked, variant (Granulomatous disease, chronic, X-linked, variant); granulosa cell tumor of the ovary (Granulosa cell tumor of the ovary); gray platelet syndrome (Gray platelet syndrome); Griscelli syndrome type 3; Groenouw corneal dystrophy type I (Groenouw corneal dystrophy type I); growth and mental retardation, mandibulofacial dysostosis, microcephaly, and cleft palate (Growth and mental retardation, mandibulofacial dysostosis, microcephaly, and cleft palate); growth hormone deficiency with pituitary anomalies (Growth hormone deficiency with pituitary anomalies); growth hormone insensitivity with immunodeficiency (Growth hormone insensitivity with immunodeficiency); GTP cyclohydrolase I deficiency (GTP cyclohydrolase I deficiency); Hajdu-Cheney syndrome; hand-foot-uterus syndrome (Hand foot uterus syndrome); hearing impairment (Hearing impairment); infantile capillary hemangioma (Hemangioma, capillary infantile); hematologic neoplasm (Hematologic neoplasm); hemochromatosis type 1, 2B, and 3 (Hemochromatosis type 1, 2B, and 3); microvascular complications of diabetes 7 (Microvascular complications of diabetes 7); transferrin serum level quantitative trait locus 2 (Transferrin serum level quantitative trait locus 2); hemoglobin H disease, nondeletional (Hemoglobin H disease, nondeletional); nonspherocytic hemolytic anemia due to glucose phosphate isomerase deficiency (Hemolytic anemia, nonspherocytic, due to glucose
phosphate isomerase deficiency); familial hemophagocytic lymphoproliferative disorder 2 (Hemophagocytic lymphohistiocytosis, family, 2); familial hemophagocytic lymphoproliferative disorder 3 (Hemophagocytic lymphohistiocytosis, family, 3); heparin cofactor II deficiency (Heparin cofactor II deficiency); hereditary enteropathy acrodermatitis (Hereditary acrodermatitis enteropathica); hereditary breast cancer and ovarian cancer syndrome (Hereditary breast and ovarian cancer syndrome); ataxia-telangiectasia-like disorder (Ataxia-telangiectasia-like disorder); hereditary diffuse gastric cancer (Hereditary diffuse gastric cancer); hereditary diffuse spheroid leukoencephalopathy () Hereditary diffuse leukoencephalopathy with spheroids; deficiency of genetic factor II, IX, VIII (Hereditary factors II, IX, VIII deficiency disease); hereditary hemorrhagic telangiectasia type 2 (Hereditary hemorrhagic telangiectasia type 2); hereditary anhidrosis pain insensitivity (Hereditary insensitivity to pain with anhidrosis); hereditary lymphedema type I (Hereditary lymphedema type I); hereditary motor and sensory neuropathy with optic atrophy (Hereditary motor and sensory neuropathy with optic atrophy); hereditary myopathy is accompanied by early respiratory failure (Hereditary myopathy with early respiratory failure); hereditary neuralgia muscular dystrophy (Hereditary neuralgic amyotrophy); hereditary non-polyposis colorectal tumours (Hereditary Nonpolyposis Colorectal Neoplasms); lynch syndromes I and II (); hereditary pancreatitis (Hereditary pancreatitis); chronic susceptibility Pancreatitis (chronic, susceptibility to); type IIB and type IIA hereditary sensory and autonomic neuropathy (Hereditary sensory and autonomic neuropathy type IIB amd IIA); hereditary iron granule young cell anemia (Hereditary sideroblastic anemia); hermansky-Pudlak syndrome 1,3, 4and 6; visceral ectopic 2, 4and 6, autosomal (hetrotaxy, viscosal, 2,4,and 6,autosomal); visceral ectopic, X-linked 
(X-linked); ectopic (heteotopia); tissue-cell myeloproliferative reticulocyte hyperplasia (Histiocytic medullary reticulosis); histiocytohyperplasia-lymphadenopathy plus syndrome (histiosporis-lymphadenopathy plus syndrome); holocarboxylase synthase deficiency (Holocarboxylase synthetase deficiency); forebrain crack-free deformities 2,3,7, and9 (Holoprosencephaly 2,3,7, and 9); holt-Oram syndrome; homocysteinemia, pyridoxine reactivity, MTHFR deficiency, CBS deficiency and homocystinuria (Homocysteinemia due to MTHFR deficiency, CBS deficiency, and Homocystinuria, pyridoxine-responsive); homocystinuria-megaloblastic anemia, a deficiency in cobalamin metabolism, cblE complementation (Homocystinuria-Megaloblastic anemia due to defect in cobalamin metabolism, cblE complementation type); howell-Evans syndrome; hurler syndrome; hutchinson-Gilford syndrome; hydrocephalus (Hydrocephalus); hyperammonemia type III (type III); hypercholesterolemia and hypercholesterolemia, autosomal recessive (Hypercholesterolaemia and Hypercholesterolemia, autosomal recessive); excessive fright 2and hereditary excessive fright (Hyperekplexia 2and Hyperekplexia hereditary); hyperferripinemia cataract syndrome (Hyperferritinemia cataract syndrome); hyperglyciuria (hyperglycuria); hyperimmune protein D is associated with periodic fever (Hyperimmunoglobulin D with periodic fever); mevalonic aciduria (Mevalonic aciduria); high immunoglobulin E syndrome (Hyperimmunoglobulin E syndrome); familial hyperinsulinemic hypoglycemia 3,4, and 5 (Hyperinsulinemic hypoglycemia familial 3,4, and 5); hyperinsulinemia-hyperammonemia syndrome (hyperinsulinenism-hyperammonemia syndrome); hyperlysinemia (Hyperlysinemia); hypermanganemia with dystonia, polycythemia and cirrhosis (Hypermanganesemia with dystonia, polycythemia and cirrhosis); ornithine-hyperammonemia-homocystinuria syndrome (hyperorthininemia-hyperammonemia-homocitrullinuria syndrome); hyperparathyroidism 1and2 (Hyperparathyroidism 1and 2); neonatal 
severity parathyroid hyperfunction (hyperparametric); hyperphenylalaninemia, BH4 deficiency, a, partial pts deficiency, BH4 deficiency, D and non-pku (Hyperphenylalaninemia, BH 4-deficits, a, due to partial pts deficiency, BH 4-deficits, D, and non-pku); hyperphosphatemia is associated with mental retardation syndrome2, 3and 4 (Hyperphosphatasia with mental retardation syndrome2, 3and 4); multiple osteomalacia (Hypertrichotic osteochondrodysplasia); familial Hypobetalipoproteinemia associated with apob32 (hypobetaisoprotemia, family, associated with apob 32); hypocalcemia, autosomal dominant 1 (Hypocalcemia, autosomal dominant 1); familial hypocalcuria hypercalcemia type 1and type 3 (Hypocalciuric hypercalcemia, family, types 1and 3); cartilage disease (Hypochondrogenesis); iron overload hypopigmented microcytic anemia (Hypochromic microcytic anemia with iron overload); hypoglycemia, due to liver glycogen synthase deficiency (Hypoglycemia with deficiency of glycogen synthetase in the liver); hypogonadotropic hypogonadism 11with or without dysolfaction (Hypogonadotropic hypogonadism 11with or without anosmia); hypohidrosis ectodermal dysplasia with immunodeficiency (Hypohidrotic ectodermal dysplasia with immune deficiency); hypohidrosis X-linked ectodermal dysplasia (Hypohidrotic X-linked ectodermal dysplasia); hypokalemia iodic paralysis 1and2 (Hypokalemic periodic paralysis 1and 2); intestinal Hypomagnesemia 1 (Hypomagnesemia 1, intestinal); hypomagnesemia, seizures, and mental retardation (Hypomagnesemia, seizures, and mental retardation); low myelinated leukodystrophy 7 (Hypomyelinating leukodystrophy); left heart dysplasia syndrome (Hypoplastic left heart syndrome); atrioventricular septal defects and common atrioventricular junctions (Atrioventricular septal defect and common atrioventricular junction); hypourethral cleavages 1and2, x-linked (hypspasadias 1and2, x-linked); congenital Hypothyroidism has no goiter 1 (hypotyroidosm, confect 1); less hair 
disorders 8and 12 (hypotrichia 8and 12); less hair disorder-lymphedema-telangiectasia syndrome (hypotrichia-lymphedema-telangiectasia syndrome); a blood group I system (I blood group system); siemens bullous ichthyosis (Ichthyosis bullosa of Siemens); ichthyosis exfoliative (Ichthyosis exfoliativa); ichthyosis syndrome of premature infants (Ichthyosis prematurity syndrome); idiopathic basal ganglia calcification 5 (Idiopathic basal ganglia calcification 5); idiopathic fibroalveolar inflammation, chronic form (Idiopathic fibrosing alveolitis, chronic form); congenital dysplastic keratosis, autosomal dominant,2and 5 (Dyskeratosis congenita, autosomal dominant,2and 5); infant idiopathic hypercalcemia (Idiopathic hypercalcemia of infancy); immune dysfunction is accompanied by T cell inactivation, and calcium entry deficiency, 2 (Immune dysfunction with T-cell inactivation due to calcium entry defect 2); immunodeficiency 15,16,19,30,31C,38,40,8, cd3- ζ defects, with high IgM types 1and2, and X-Linked, with magnesium defects, epstein-Barr virus infection and neoplasia (Immunodeficiency 15,16,19,30,31C,38,40,8,due to defect in cd3-zeta, with hyper IgM type and2, and X-Linked, with magnesium defect, epstein-Barr virus infection, and neoplasia); immunodeficiency-centromere instability-facial abnormality syndrome2 (Immunodeficiency-centromeric instability-facial anomalies syndrome 2); inclusion body myopathies 2and 3 (Inclusion body myopathy 2and 3); nosaka myopathy; familial infantile convulsions and paroxysmal chorea athetosis (Infantile convulsions and paroxysmal choreoathetosis, family); infant cortical hyperostosis (Infantile cortical hyperostosis); infant GM1 gangliosidosis (Infantile GM1 gangliosis); infant hypophosphatasia (Infantile hypophosphatasia); nephritis in infants (Infantile nephronophthisis); infant nystagmus, X-linked (Infantile nystagmus); infant parkinson's disease-dystonia (Infantile Parkinsonism-dystonia); infertility associated with multi-tailed sperm 
and DNA excess (Infertility associated with multi-tailed spermatozoa and excessive DNA); insulin resistance (Insulin resistance); insulin resistance diabetes mellitus and acanthosis nigricans (instrin-resistant diabetes mellitus and acanthosis nigricans); insulin dependent diabetes secretory diarrhea syndrome (ins-dependent diabetes mellitus secretory diarrhea syndrome); megakaryoplasmic nephritis (Interstitial nephritis, karyomegalice); intrauterine dysplasia, metaphyseal dysplasia, congenital adrenal dysplasia, genital abnormalities (Intrauterine growth retardation, metaphyseal dysplasia, adrenal hypoplasia congenita, and genital anomalies); iodotyrosyl coupling defect (Iodotyrosyl coupling defect); IRAK4 deficiency; iris dysplasia type dominant and type1 (Iridogoniodysgenesis dominant type and type 1); brain tissue iron deposition (Iron accumulation in brain); ischial patellar dysplasia (Ischiopatellar dysplasia); islet cell hyperplasia (Islet cell hyperplasia); isolated17,20-lyase deficiency (Isolated 17,20-lyase deficiency); isolated luteinizing hormone deficiency (Isolated lutropin deficiency); isovaleryl-coa dehydrogenase deficiency (Isovaleryl-CoA dehydrogenase deficiency); jankovic river syndrome; jervell-Lange Nielsen syndrome 2; joubert syndrome 1,6,7,9/15 (double gene), 14,16 and 17, orofaciiodigital syndrome xiv (Jankovic Rivera syndrome; jervell and Lange-Nielsen syndrome 2;Joubert syndrome 1,6,7,9/15 (digenic), 14,16,and 17,and Orofaciodigital syndrome xiv); herlitz interface epidermolysis bullosa (Junctional epidermolysis bullosa gravis of Herlitz); juvenile GM >1< ganglioside deposition (juvenole GM >1< gangliosidosis); juvenile polyposis syndrome (Juvenile polyposis syndrome); juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome (Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome); juvenile retinal cleavage (Juvenile retinoschisis); kabuki make-up syndrome; kalman syndromes 1, 2and 6 (Kallmann syndrome 1, 2and 6); 
pubertal delay (Delayed puberty); kanzaki disease; karak syndrome (Karak syndrome); kartagner syndrome; kenny-cafrey syndrome type 2; keppen-Lubinsky syndrome; keratoconus 1 (keratectopus 1); follicular keratosis (Keratosis follicularis); palmoplantar keratosis 1 (Keratosis palmoplantaris striata 1); kindler syndrome; l-2-hydroxyglutarate (L-2-hydroxyglutaric aciduria); larsen syndrome, dominant form; type III lattice corneal dystrophies (Lattice corneal dystrophy Type III); leber's black mask (Leber amaurosis); zellweger syndrome; peroxisome biogenesis disorders (Peroxisome biogenesis disorders); zellweger syndrome spectrum (Zellweger syndrome spectrum); leber congenital amaurosis 11,12,13,16,4,7 and9 (Leber congenital amaurosis 11,12,13,16,4,7,and 9); leber's optic atrophy (Leber optic atrophy); aminoglycoside-induced deafness (Aminoglycoside); non-syndromic sensorineural hearing loss, mitochondria (Deafness, nonsyndromic sensorineural, mitochondral); left ventricular densification insufficiency 5 (Left ventricular noncompaction 5); left-right axis deformity (Left-right axis malformations); leigh disease; mitochondrial short alkenoyl-CoA Hydratase 1deficiency (Mitochondrial short-chain end-CoA hydroatase 1 prescribing); leigh syndrome, mitochondrial complex I deficiency; leiner disease; leri Weill cartilage dysfunction (Leri Weill dyschondrosteosis); lethal congenital contracture syndrome 6 (Lethal congenital contracture syndrome 6); leukocyte adhesion deficiency type I and type III (Leukocyte adhesion deficiency type I and III); leukodystrophy, myelination deficiencies 11and 6 (leukodystrachy, hypmyelining, 11and 6); leukoencephalopathy with ataxia, brain stem and spinal cord involvement and elevated lactic acid with white matter disappearance and progression with ovarian failure (Leukoencephalopathy with ataxia, with Brainstem and Spinal Cord Involvement and Lactate Elevation, with vanishing white matter, and progressive, with ovarian failure); panonychomycosis 
(Leukonychia totalis); dementia with lewy bodies (Lewy body dementia); lichtenstein-Knorr syndrome; li-Fraomeni syndrome 1; lig4syndrome; limb banding muscular dystrophies 1B,2A,2B,2D, C, C5, C9, C14 (Lig 4syndrome; limb-girdle muscular dystrophy, type 1B,2A,2B,2d, C1, C5, C9, C14); congenital muscular dystrophy-muscular dystrophy with brain and eye abnormalities a14 and B14 (Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type a14 and B14); combined lipase deficiency (Lipase deficiency combined); lipid protein deposition (Lipid proteinosis); familial partial Lipodystrophy type 2and type 3 (Lipodystrophy, family partial, type 2and 3); no brain return deformity 1,2 (X linkage), 3,6 (microcephaly), X linkage (Lissencephaly 1,2 (X-linked), 3,6 (with microcephaly), X-linked); hypocortical lamellar ectopic, X-linked (Subcortical laminar heterotopia, X-linked); acute infant liver failure (Liver failure acute infantile); loys-Dietz syndrome 1,2, 3; long QT syndrome 1, 2/9, 2/5, (di-genotype), 3, 5and 5, availability, susceptibility; lung cancer (Lung cancer); hereditary Lymphedema, id (Lymphedema, hereadenylary, id); primary Lymphedema is accompanied by myelodysplasia (Lymphedema, primary, with myelodysplasia); lymphoproliferative syndromes 1,1 (X linked) and2 (Lymphoproliferative syndrome 1,1 (X-linked), and 2); lysosomal acid lipase deficiency (Lysosomal acid lipase deficiency); giant head deformity, giant child, facial deformity syndrome (Macrocephaly, macromonomia, facial dysmorphism syndrome); adult vitelliform macular dystrophy (Macular dystrophy, vitelliform, add-on;); malignant Gao Reyi susceptibility type1 (Malignant hyperthermia susceptibility type 1); non-Hodgkin's malignant lymphoma (Malignant lymphoma, non-Hodgkin); malignant melanoma (Malignant melanoma); prostate malignancy (Malignant tumor of prostate); mandibular bone dysplasia (Mandibuloacral dysostosis); mandibular dysplasia with type a or type B lipodystrophy, atypical 
(Mandibuloacral dysplasia with type A or B lipodystrophy); mandibular dysplasia, treacher Collins type, autosomal recession (Mandibulofacial dysostosis, treacher Collins type, autosomal recessive); mannose-binding protein deficiency (Mannose-binding protein deficiency); maple diabetes type 1A and 3 (Maple syrup urine disease type a and type 3); marden Walker-like syndrome; ma Fanzeng syndrome (Marfan syndrome); marinesco-Sj\xc3\xb6gren syndrome; martsolf syndrome; juvenile onset adult-onset diabetes type1, type 2, type 11, type 3, and type 9 (quality-onset diabetes of the young, type1,type 2,type 11,type 3,and type 9); may-Hegglin abnormalities; MYH 9-related diseases (MYH 9 related disorders); sebastin syndrome; mcCune-alignment syndrome; growth hormone cell adenoma (Somatotroph adenoma); sex cord interstitial tumor (Sex cord-structural tumor); cushing syndrome; mcKusick Kaufman syndrome; mcLeod neurosacanthocytosis syndrome (McLeod neuroacanthocytosis syndrome); meckel-Gruber syndrome; medium-chain acyl-coa dehydrogenase deficiency (Medium-chain acyl-coenzyme A dehydrogenase deficiency); medulloblastoma (Medulloblastoma); megabrain leukoencephalopathy is accompanied by subcortical cysts 1and 2a (Megalencephalic leukoencephalopathy with subcortical cysts, 1and2 a); congenital giant brain telangiectasia marble-like skin (Megalencephaly cutis marmorata telangiectatica congenital); PIK3 CA-associated overgrowth spectrum (PIK 3CA Related Overgrowth Spectrum); megabrain-polycephalum-multi-finger (toe) deformity-hydrocephalus syndrome2 (megalenceply-polymicrogyria-polydactyl-hydrocephalus syndrome); thiamine-responsive megaloblastic anemia with diabetes mellitus and sensorineural hearing loss (Megaloblastic anemia, with diabetes mellitus and sensorineural deafness); meier-Gorlin syndromes 1and 4; melnick-Needles syndrome; meningioma (menigioma); mental retardation, X-linked,3,21,30 and 72 (Mental retardation, X-linked,3,21,30 and 72); mental retardation and small head 
deformity with brain bridge and cerebellar hypoplasia (Mental retardation and microcephaly with pontine and cerebellar hypoplasia); mental retardation X-linked syndrome 5 (Mental retardation X-linked syndrome 5); mental retardation, anterior maxillary processes and strabismus (Mental retardation, anterior maxillary protrusion, and strabidemus); mental retardation, autosomal dominant 12,13,15,24,3,30,4,5, 6and 9 (Mental retardation, autosomal dominant 12,13,15,24,3,30,4,5,6,and 9); mental retardation, autosomal recessions 15,44,46 and 5 (Mental retardation, autosomal recessive, 15,44,46, and 5); mental retardation, notch movements, epilepsy and/or brain malformations (Mental retardation, stereotypic movements, epiepsy, and/or cerebral malformations); mental retardation, syndrome, claes-Jensen type, X linkage (Mental retardation, synromic, claes-Jensen type, X-linked); x-linked nonspecific mental retardation, syndrome Hedera type and syndrome wu type (Mental retardation, X-linked, non-nspecific, syndromic, hedera type, and syndromic, wu type); merosin lacks congenital muscular dystrophy (Merosin deficient congenital muscular dystrophy); metachromatic leukodystrophy, juvenile, infant advanced and adult (Metachromatic leukodystrophy juvenile, late infartile, and adult types); metachromatic leukodystrophy (Metachromatic leukodystrophy); anagen vegetative dysplasia (Metatrophic dysplasia); methemoglobinemia type I and type 2 (Methemoglobinemia types I and 2); methionine adenosine transferase deficiency, autosomal dominant (Methionine adenosyltransferase deficiency, autosomal dominant); methylmalonic acid blood with homocystinuria (Methylmalonic acidemia with homocystinuria); methylmalonic urine cblB (Methylmalonic aciduria cblB type); methylmalonic acid urea, caused by deficiency of methylmalonyl-CoA mutase (Methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency); methylmalonic acid urine disorder, mut (0) TYPE (METHYLMALONIC ACIDURIA, mut (0) TYPE); original 
dwarfism type 2 of microcephaly bone dysplasia (Microcephalic osteodysplastic primordial dwarfism type 2); small head deformity with or without chorioretinopathy, lymphedema or mental retardation (Microcephaly with or without chorioretinopathy, lymphedema, or mental retardation); microcephaly, hiatal hernia, and nephrotic syndrome (Microcephaly, hiatal hernia and nephrotic syndrome); microcephaly (Microcephaly); callus hypoplasia (Hypoplasia of the corpus callosum); spastic paraplegia 50, autosomal recession (Spastic paraplegia 50,autosomal recessive); overall developmental retardation (Global developmental delay); insufficient CNS myelination (CNS hypomyelination); brain atrophy (Brain atrophy); microcephaly, normal mental and immunodeficiency (Microcephaly, normal intelligence and immunodeficiency); microcephaly-capillary malformation syndrome (Microcephaly-capillary malformation syndrome); small cell anemia (Microcytic anemia); small eye syndrome 5,7 and9 (Microphthalmia syndromic 5,7 and 9); small eyeballs, isolated 3,5,6, 8and companion eye defect 6 (microphtalmia, isolated 3,5,6,8,and with coloboma 6); spherical lens (microspheria;); familial basal Migraine (migrain); miller syndrome; micronuclear myopathy is accompanied by extraocular muscle paralysis (Minicore myopathy with external ophthalmoplegia); congenital central Myopathy (Myopathy, congenital with cores); mitchell-Riley syndrome; mitochondrial 3-hydroxy-3-methylglutaryl-coa synthase deficiency (mitochondral 3-hydroxy-3-methylglutaryl-CoA synthase deficiency); mitochondrial complex I, II, III, III (karyotype 2,4 or 8) deficiency (Mitochondrial complex I, II, III (nuclear type 2,4, or 8) deficiency); mitochondrial DNA depletion syndromes 11,12 (cardiomyopathy type), 2,4B (MNGIE type), 8B (MNGIE type) (Mitochondrial DNA depletion syndrome, 12 (cardiomyopathic type), 2,4B (MNGIE type), 8B (MNGIE type)); mitochondrial DNA depletion syndromes 3and 7, hepatic encephalopathy and 13 (encephalomyopathy) 
(Mitochondrial DNA-depletion syndrome 3and 7,hepatocerebral types,and 13 (encephalomyopathic type)); mitochondrial phosphate carrier and pyruvate carrier deficiency (Mitochondrial phosphate carrier and pyruvate carrier deficiency); mitochondrial trifunctional protein deficiency (Mitochondrial trifunctional protein deficiency); long-chain 3-hydroxyalkyl coenzyme a dehydrogenase deficiency (Long-chain 3-hydroxyycyl-CoA dehydrogenase deficiency); miyoshi muscular dystrophy 1 (Miyoshi muscular dystrophy 1); distal Myopathy, accompanied by Myopathy (Myopathy, distal, with anterior tibial onset); mohr-Tranebjaerg syndrome; molybdenum cofactor deficiency, complement group a (Molybdenum cofactor deficiency, complementation group A); mowat-Wilson syndrome; mucopolysaccharidosis III gamma (Mucolipidosis III Gamma); mucopolysaccharidoses type VI, type VI (severe) and type VII (Mucopolysaccharidosis type VI, type VI (reverse), and type VII); mucopolysaccharidoses, MPS-I-H/S, MPS-II, MPS-III-A, MPS-III-B, MPS-III-C, MPS-IV-A, MPS-IV-B (mucosaccharidosis, MPS-I-H/S, MPS-II, MPS-III-Sup>A, MPS-III-B, MPS-III-C, MPS-IV-Sup>A, MPS-IV-B); retinitis pigmentosa73 (Retinitis Pigmentosa 73); ganglioside deposition GM1 type1 (heart concomitant) 3 (Gangliosidosis GM type1 (with cardiac involvement) 3); multi-centered osteolytic kidney disease (Multicentric osteolysis nephropathy); multi-central osteolysis, sarcoidosis and arthropathy (Multicentric osteolysis, nodulosis and arthropathy); multiple congenital anomalies (Multiple congenital anomalies); atrial septal defect 2 (Atrial septal defect 2); multiple congenital malformation-hypotonia-epileptic syndrome 3 (Multiple congenital anomalies-hypotonia-seizures syndrome 3); multiple skin and mucosal venous malformations (Multiple Cutaneous and Mucosal Venous Malformations); multiple endocrine tumor types 1and 4 (Multiple endocrine neoplasia, types 1and 4); multiple epiphyseal dysplasia 5or Dominant (Multiple epiphyseal dysplasia or 
dominint); multiple gastrointestinal block (Multiple gastrointestinal atresias); escobar type of multiple pterygium syndrome (Multiple pterygium syndrome Escobar type); multiple sulfatase deficiency (Multiple sulfatase deficiency); multiple osteosynthesis syndrome 3 (Multiple synostoses syndrome); muscle AMP guanine oxidase deficiency (Muscle AMP guanine oxidase deficiency); myoocular encephalopathy (Muscle eye brain disease); congenital muscular dystrophy, large cone granule (Muscular dystrophy, congent, megaConial type); familial infantile muscle weakness 1 (myasshenia, familial infantile, 1); congenital myasthenia syndrome 11, associated with acetylcholine receptor deficiency (Myasthenic Syndrome, congenital,11,associated with acetylcholine receptor deficiency); congenital myasthenia syndrome 17,2A (slow channel), 4B (fast channel), and no tubular aggregates (Myasthenic Syndrome, congenital,17,2A (slow-channel), 4B (fast-channel), and without tubular aggregates); myeloperoxidase deficiency (Myeloperoxidase deficiency); MYH-related polyposis (MYH-associated polyposis); endometrial cancer (Endometrial carcinoma); myocardial infarction 1 (Myocardial infarction 1); myoclonus dystonia (Myoclonic dystonia); myoclonus-tension loss Epilepsy (myolonic-Atonic epiepsy); myoclonus epilepsy with broken red fibers (Myoclonus with epilepsy with ragged red fibers); myofibrillar myopathy 1and ZASP-related (Myofibrillar myopathy and ZASP-related); acute recurrent myoglobin urine, autosomal recessive (myoglobinaria, acute recurrent, autosomal recessive); myoneuropathic gastrointestinal encephalopathy syndrome (Myoneural gastrointestinal encephalopathy syndrome); infant cerebellar ataxia is accompanied by progressive external oculopathy (Cerebellar ataxia infantile with progressive external ophthalmoplegia); mitochondrial DNA depletion syndrome 4b, mngie type (Mitochondrial DNA depletion syndrome b, mngie type); congenital central Myopathy 1, with distal polymyositis 1, lactic 
acidosis and iron-granule young erythrocyte anemia 1, mitochondrial progressive congenital cataracts, hearing loss and developmental retardation, and tubular aggregates 2 (1,congenital,with excess of muscle spindles,distal,1,lactic acidosis,and sideroblastic anemia 1,mitochondrial progressive with congenital cataract,hearing loss,and developmental delay,and tubular aggregate,2); myopia 6 (myopic 6); myosclerosis, autosomal recession (myoclosis, autosomal recessive); congenital myotonia (Myotonia congenital); congenital myotonia, autosomal dominant and recessive forms (Congenital myotonia, autosomal dominant and recessive forms); nail-patella syndrome (tail-patella syndrome); nance-Horan syndrome; true small eyeballs 2 (nanophtalmos 2); navajo nerve liver disease (Navajo neurohepatopathy); linear body myopathies 3and9 (Nemaline myopathy 3and 9); neonatal hypotonia (Neonatal hypotonia); a smart barrier (Intellectual disability); epilepsy (Seizures); speech and language developmental delay (Delayed speech and language development); mental retardation, autosomal dominant 31 (Mental retardation, autosomal dominant); neonatal intrahepatic cholestasis, caused by deficiency of Hiterlin (Neonatal intrahepatic cholestasis caused by citrin deficiency); nephrogenic diabetes insipidus, X-linked nephrogenic diabetes insipidus (Nephrogenic diabetes insipidus, nephrogenic diabetes insipidus, X-linked); kidney stones/osteoporosis, hypophosphatemia 2 (nephroithiasis/osteoporosis, 2); kidney wasting diseases 13,15and 4 (nephrophophisis 13,15and 4); infertility (Infertility); cerebellum-eye-kidney syndrome (kidney wasting disease, motor nerve loss and cerebellum abnormality) (Cerebello-oculo-renal syndrome, oculomotor apraxia and cerebellar abnormalities); nephrotic syndrome type 3, type 5, with or without ocular abnormalities, type 7 and type 9 (Nephrotic syndrome, type 3,type 5,with or without ocular abnormalities,type 7,and type 9); nestor-Guillermo premature senility syndrome 
(Nestor-Guillermo progeria syndrome); neu-Laxova syndrome 1; neurodegenerative brain-associated iron deposits 4and 6 (Neurodegeneration with brain iron accumulation and 6); a neuroferritin pathology (neurofritinopahy); neurofibromatosis type 1and type 2 (Neurofibromatosis, type 1and type 2); neurofibrosarcoma (neuroofibrosacea); diabetes insipidus of the pituitary (Neurohypophyseal diabetes insipidus); hereditary sensory Neuropathy IC (neuronathy, hereditary Sensory, type IC); neutral 1amino acid transport deficiency (Neutral 1amino acid transport defect); neutral lipid storage disease is accompanied by myopathy (Neutral lipid storage disease with myopathy); neutrophil immunodeficiency syndrome (Neutrophil immunodeficiency syndrome); nicolaides-Baraitser syndrome; cheng Renxing Niemann disease type C1, C2, A and C1 (Niemann-Pick disease type C, C2, type A, and type C1, add form); non-ketotic hyperglycinemia (Non-ketotic hyperglycinemia); noonan syndrome 1and 4, leopard syndrome 1 (Noonan syndrome 1and 4,LEOPARD syndrome 1); noonan syndrome-like disease with or without juvenile myelomonocytic leukemia (Noonan syndrome-like disorder with or without juvenile myelomonocytic leukemia); normal blood potassium-type periodic paralysis, potassium-sensitive (Normokalemic periodic paralysis); norum disease (Norum disease); epilepsy, hearing Loss and mental retardation syndrome (epiepsy, heart Loss, and Mental Retardation Syndrome); mental retardation, X linkage 102and syndrome 13 (Mental Retardation, X-Linked 102and syndromic 13); obesity (Obesity); albino eye type I (type I); eyelid albinism type 1B, type 3and type 4 (Oculocutaneous albinism type 1B,type 3,and type 4); eye and tooth finger dysplasia (Oculodentodigital dysplasia); dentition hypoalkaline phosphatase (odohypophosphatasia); odontotrichomelic syndrome; kohlrabi (Oguchi disease); oligodendrocyte-colorectal cancer syndrome (Oligodontia-colorectal cancer syndrome); opitzG/BBB syndrome; optic atrophy 9 (Optic atrophy 
9); oral-facial-digital syndrome (Oral-facial-digital syndrome); ornithine aminotransferase deficiency (Ornithine aminotransferase deficiency); orofacial cleft 11 and 7, cleft lip/palate-ectodermal dysplasia syndrome (Orofacial cleft 11 and 7, Cleft lip/palate-ectodermal dysplasia syndrome); Orstavik Lindemann Solberg syndrome; osteoarthritis with mild chondrodysplasia (Osteoarthritis with mild chondrodysplasia); osteochondritis dissecans (Osteochondritis dissecans); osteogenesis imperfecta type 12, type 5, type 7, type 8, type I, type III, with normal sclerae, dominant form, recessive perinatal lethal (Osteogenesis imperfecta type 12, type 5, type 7, type 8, type I, type III, with normal sclerae, dominant form, recessive perinatal lethal); osteopathia striata with cranial sclerosis (Osteopathia striata with cranial sclerosis); osteopetrosis, autosomal dominant type 1 and type 2, recessive 4, recessive 1, recessive 6 (Osteopetrosis autosomal dominant type 1 and 2, recessive 4, recessive 1, recessive 6); osteoporosis with pseudoglioma (Osteoporosis with pseudoglioma); oto-palato-digital syndrome, types I and II (Oto-palato-digital syndrome, types I and II); ovarian dysgenesis 1 (Ovarian dysgenesis 1); ovarioleukodystrophy (Ovarioleukodystrophy); pachyonychia congenita 4 and type 2 (Pachyonychia congenita 4 and type 2); familial Paget disease of bone (Paget disease of bone, familial); Pallister-Hall syndrome; palmoplantar keratoderma, nonepidermolytic, focal or diffuse (Palmoplantar keratoderma); pancreatic agenesis and congenital heart disease (Pancreatic agenesis and congenital heart disease); Papillon-Lefèvre syndrome; paragangliomas 3 (Paragangliomas 3); paramyotonia congenita of von Eulenburg (Paramyotonia congenita of von Eulenburg); parathyroid carcinoma (Parathyroid carcinoma); Parkinson disease 14, 15, 19 (juvenile-onset), 2, 20 (early-onset), 6, autosomal recessive early-onset, and 9 (Parkinson disease 14, 15, 19 (juvenile-onset), 2, 20 (early-onset), 6, 
(autosomal recessive early-onset, and 9); partial albinism (Partial albinism); partial hypoxanthine-guanine phosphoribosyltransferase deficiency (Partial hypoxanthine-guanine phosphoribosyltransferase deficiency); patterned dystrophy of retinal pigment epithelium (Patterned dystrophy of retinal pigment epithelium); PC-K6A; Pelizaeus-Merzbacher disease; Pendred syndrome; peripheral demyelinating neuropathy, central dysmyelination (Peripheral demyelinating neuropathy, central dysmyelination); Hirschsprung disease; permanent neonatal diabetes mellitus (Permanent neonatal diabetes mellitus); permanent neonatal diabetes mellitus with neurologic features (Diabetes mellitus, permanent neonatal, with neurologic features); neonatal insulin-dependent diabetes mellitus (Neonatal insulin-dependent diabetes mellitus); maturity-onset diabetes of the young, type 2 (Maturity-onset diabetes of the young, type 2); peroxisome biogenesis disorder 14B, 2A, 4A, 5B, 6A, 7A, and 7B (Peroxisome biogenesis disorder 14B, 2A, 4A, 5B, 6A, 7A, and 7B); Perrault syndrome 4; Perry syndrome; persistent hyperinsulinemic hypoglycemia of infancy (Persistent hyperinsulinemic hypoglycemia of infancy); familial hyperinsulinism (Familial hyperinsulinism); phenotypes (Phenotypes); phenylketonuria (Phenylketonuria); pheochromocytoma (Pheochromocytoma); hereditary paraganglioma-pheochromocytoma syndromes (Hereditary Paraganglioma-Pheochromocytoma Syndromes); paragangliomas 1 (Paragangliomas 1); carcinoid tumor of intestine (Carcinoid tumor of intestine); Cowden syndrome 3; phosphoglycerate dehydrogenase deficiency (Phosphoglycerate dehydrogenase deficiency); phosphoglycerate kinase 1 deficiency (Phosphoglycerate kinase 1 deficiency); photosensitive trichothiodystrophy (Photosensitive trichothiodystrophy); phytanic acid storage disease (Phytanic acid storage disease); Pick disease (Pick disease); Pierson syndrome (Pierson syndrome); pigmentary retinal dystrophy (Pigmentary retinal 
dystrophy); primary pigmentary nodular adrenocortical disease 1 (Pigmented nodular adrenocortical disease, primary, 1); hair matrix tumor (pilotarixoma); pitt-Hopkins syndrome; pituitary-dependent hypercortisolism (Pituitary dependent hypercortisolism); combined pituitary hormone deficiency symptoms 1,2, 3and 4 (Pituitary hormone deficiency, combined 1,2, 3and 4); plasminogen activator inhibitor type 1deficiency (Plasminogen activator inhibitor type1 deficiency); plasminogen-deficiency type I (Plasminogen deficiency, type I); platelet bleeding disorders 15and 8 (Platelet-type bleeding disorder 15and 8); hereditary fibrotic skin heterochromosis is accompanied by tendon contracture, myopathy and pulmonary fibrosis (poikilloderm, hereditary fibrosing, with tendon contractures, myopathy, and pulmonary fibrosis); polycystic kidney disease 2, adult, infant (Polycystic kidney disease 2,adult type,and infantile type); polycystic lipid membranous bone dysplasia is accompanied by sclerotic leukoencephalopathy (Polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy); polyglucosan myopathy 1with or without immunodeficiency (Polyglucosan body myopathy 1with or without immunodeficiency); asymmetric bilateral frontal multiple cerebellar gyria (Polymicrogyria, asymmetric, bilateral frontoparietal); polyneuropathy, hearing loss, ataxia, retinitis pigmentosa, and cataracts; brain bridge cerebellar hypoplasia type 4 (Pontocerebellar hypoplasia type 4); pterygium of the popliteal fossa (Popliteal pterygium syndrome); porencephora 2 (Porencephaly 2); diffuse superficial photosensitive Porokeratosis 8 (poroseasis 8,disseminated superficial actinic type); porphobilinogen synthase deficiency (Porphobilinogen synthase deficiency); delayed porphyria cutanea (Porphyria cutanea tarda); posterior column ataxia is accompanied by retinitis pigmentosa (Posterior column ataxia with retinitis pigmentosa); posterior pole cataract type 2 (Posterior polar cataract type 2); 
prader-Willi like syndrome; premature ovarian failure 4,5,7 and9 (Premature ovarian failure 4,5,7 and 9); primary autosomal recessive microcephaly 10,2,3, and 5 (Primary autosomal recessive microcephaly 10,2,3, and 5); primary ciliated dyskinesia 24 (Primary ciliary dyskinesia 24); primary dilated cardiomyopathy (Primary dilated cardiomyopathy); left ventricular densification insufficiency 6 (Left ventricular noncompaction 6); 4, left ventricular densification insufficiency 10 (Left ventricular noncompaction); paroxysmal atrial fibrillation (Paroxysmal atrial fibrillation); primary hyperoxaluria type I, III, and III (Primary hyperoxaluria, type I, type, and type III); primary hypertrophic osteoarthropathy, autosomal recessive 2 (Primary hypertrophic osteoarthropathy, autosomal recessive 2); primary hypomagnesemia (Primary hypomagnesemia); primary open-angle juvenile glaucoma 1 (Primary open angle glaucoma juvenile onset 1); primary pulmonary hypertension (Primary pulmonary hypertension); primrose syndrome; progressive familial heart block type 1B (Progressive familial heart block type 1B); progressive familial intrahepatic cholestasis 2and 3 (Progressive familial intrahepatic cholestasis 2and 3); progressive intrahepatic cholestasis (Progressive intrahepatic cholestasis); progressive myoclonus epilepsy with ataxia (Progressive myoclonus epilepsy with ataxia); progressive pseudo-rheumatoid dysplasia (Progressive pseudorheumatoid dysplasia); progressive sclerosant malnutrition (Progressive sclerosing poliodystrophy); prolyl peptidase deficiency (Prolidase deficiency); proline dehydrogenase deficiency (Proline dehydrogenase deficiency); schizophrenia 4 (Schizophrenia 4); properdin deficiency, X-linked (Properdin deficiency, X-linked); acrylic acidemia (Propionic academia); proprotein convertase 1/3deficiency (Proprotein convertase 1/3 deficiency); hereditary Prostate cancer 2 (2); red pigment defect (Protan defect); proteinuria (proteouria); finnish congenital 
nephrotic syndrome (Finnish congenital nephrotic syndrome); protein syndrome; breast cancer (Breast adenocarcinoma); pseudo-achondroplasia spondyloeepiphyseal dysplasia syndrome (Pseudoachondroplastic spondyloepiphyseal dysplasia syndrome); pseudoaldosteronism type 1autosomal dominant and recessive and type 2 (Pseudohypoaldosteronism type 1autosomal dominant and recessive and type 2); pseudohypoparathyroidism type 1A, pseudohypoparathyroidism (Pseudohypoparathyroidism type a, pseudopseudohypothyroidism); pseudo-neonatal adrenoleukodystrophy (Pseudoneonatal adrenoleukodystrophy); pseudo-primary aldosteronism (Pseudoprimary hyperaldosteronism); -elastic pseudoxanthoma (Pseudoxanthoma elasticum); infant-onset arterial calcification 2 (Generalized arterial calcification of infancy 2); pseudoxanthomatoid disorders with multiple clotting factor deficiency (Pseudoxanthoma elasticum-like disorder with multiple coagulation factor deficiency); psoriasis susceptibility 2 (Psoriasis susceptibility 2); PTEN hamartoma tumor syndrome (PTEN hamartoma tumor syndrome); pulmonary hypertension, associated with hereditary hemorrhagic telangiectasia (Pulmonary arterial hypertension related to hereditary hemorrhagic telangiectasia); telomere-Related pulmonary fibrosis and/or bone marrow failure 1and 3 (Pulmonary Fibrosis And/Or Bone Marrow Failure, telomere-Related,1and 3); primary pulmonary hypertension 1 is accompanied by hereditary hemorrhagic telangiectasia (Pulmonary hypertension, primary,1,with hereditary hemorrhagic telangiectasia); purine nucleoside phosphorylase deficiency (Purine-nucleoside phosphorylase deficiency); pyruvate carboxylase deficiency (Pyruvate carboxylase deficiency); pyruvate dehydrogenase E1-alpha deficiency (Pyruvate dehydrogenase E1-alpha deficiency); erythrocyte pyruvate kinase deficiency (Pyruvate kinase deficiency of red cells); ryan syndrome (rain syndrome); rasopathy; recessive dystrophic epidermolysis bullosa (Recessive dystrophic epidermolysis 
bullosa); non-syndromic congenital Nail disease 8 (soil dispenser, nonsyndromic congenital, 8); lei Fansi tam syndrome (Reifenstein syndrome); renal dysplasia (ready dysplasia); deficiency in renal carnitine transport (Renal carnitine transport defect); kidney defect syndrome (Renal coloboma syndrome); renal dysplasia (Renal dysplasia); kidney dysplasia, retinal pigment dystrophy, cerebellar ataxia and skeletal dysplasia (Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia); distal tubular acidosis, autosomal recession, with delayed sensorineural hearing loss or with hemolytic anemia (Renal tubular acidosis, distal, autosomal recessive, with late-onset sensorineural hearing loss, or with hemolytic anemia); proximal tubular acidosis is accompanied by ocular abnormalities and mental retardation (Renal tubular acidosis, proximal, with ocular abnormalities and mental retardation); cone dystrophy 3B (Retinal cone dystrophy 3B); retinal pigment degeneration (Retinitis pigmentosa); retinitis pigmentosa 10,11,12,14,15,17, and 19 (Retinitis pigmentosa 10,11,12,14,15,17,and 19); retinitis pigmentosa 2,20,25,35,36,38,39,4,40,43,45,48,66,7,70,72 (Retinitis pigmentosa 2,20,25,35,36,38,39,4,40,43,45,48,66,7,70,72); retinoblastoma (retinobastoma); rett disorder (Rett disorder); rhabdoid tumor susceptibility syndrome2 (Rhabdoid tumor predisposition syndrome 2); hole-derived retinal detachment, autosomal dominant (Rhegmatogenous retinal detachment, autosomal dominant); limb root punctate dysplasia type 2and type 3 (Rhizomelic chondrodysplasia punctata type and type 3); roberts-SC short limb deformity syndrome (Roberts-SC phocomelia syndrome); robinow Sorauf syndrome; robinow syndrome, autosomal recessive, short with simultaneous and multi-fingered (toe) with brachyth syndrome, autosomal recessive, autosomal recessive; rothmund-Thomson syndrome; radiline syndrome; RRM2B-related mitochondrial disease (RRM 2B-related mitochondrial disease); 
rubinstein-Taybi syndrome; salla disease; sandhoff's disease, adult and infant; early Sarcoidosis (early-set); blau syndrome; sindler (Schindler) disease type1 (Schindler disease, type 1); cerebral laceration (schizoncephaly); schizophrenia 15 (Schizophrenia 15); schneckenbecken dysplasia; schwannomosis 2; schwartz Jampel syndrome type 1; sclerotic cornea, autosomal recessive (sclecrocornea, autosomal recessive); sclerosteosis (Sclerosteosis); secondary hypothyroidism (Secondary hypothyroidism); segawa syndrome, autosomal recessive; senier-Loken syndrome 4and 5; sensory ataxia neuropathy, dysarthria, and oculoparalysis (Sensory ataxic neuropathy, dysarthria, and ophthalmoparesis); sepiapterin reductase deficiency; seSAME syndrome; severe combined immunodeficiency, ADA deficiency, with small head malformation, retarded growth, sensitivity to ionizing radiation, atypical, autosomal recessive inheritance, T cell negative, B cell positive, NK cell negative or NK positive (Severe combined immunodeficiency due to ADA deficiency, with microcephaly, growth retardation, and sensitivity to ionizing radiation, atypical, autosomal recessive, T cell-negative, B cell-positive, NK cell-negative of NK-positive); severe congenital neutropenia (Severe congenital neutropenia); severe congenital neutropenia 3, autosomal recessive or dominant (Severe congenital neutropenia 3,autosomal recessive or dominant); severe congenital neutropenia 6, autosomal recession (Severe congenital neutropenia and 6,autosomal recessive); severe myoclonus epilepsy in infants (Severe myoclonic epilepsy in infancy); generalized epilepsy with febrile convulsions plus types 1and2 (Generalized epilepsy with febrile seizures plus, types 1and 2); severe X-linked myopathy (Severe X-linked myotubular myopathy); short QT syndrome 3 (Short QT syndrome 3); short stature accompanied by nonspecific skeletal abnormalities (Short stature with nonspecific skeletal abnormalities); short stature, closed ear canal, underjaw 
hypoplasia, skeletal abnormalities (Short status, auditory canal atresia, mandibular hypoplasia, skeletal abnormalities); short stature, nail dysplasia, facial deformity, and oligospermia (facial dysmorphism, and hypotrichosis); original dwarfism (Primordial dwarfism); chest Short rib dysplasia11 or 3with or without multi-fingered (toe) deformity (Short-rib thoracic dysplasia 11or 3with or without polydactyly); sialidosis type I and type II (Sialidosis type I and II); silvery spastic paraplegia syndrome (Silver spastic paraplegia syndrome); nerve conduction velocity is slowed, autosomal dominant (Slowed nerve conduction velocity, autosomal dominant); smith-Lemli-Opitz syndrome; snyder Robinson syndrome; growth hormone cell adenoma (Somatotroph adenoma); prolactinoma (Prolactinoma); familial pituitary adenoma susceptibility (family, pituitary adenoma predisposition); sotos syndrome 1or 2 (Sotos syndrome 1or 2); spasticity ataxia 5, autosomal recession, charlevoix-Saguenay type,1,10, or 11, autosomal recession (sports ataxia 5,autosomal recessive,Charlevoix-Saguenay type,1,10,or 11,autosomal recessive); amyotrophic lateral sclerosis type 5 (Amyotrophic lateral sclerosis type 5); spastic paraplegia 15,2,3,35,39,4, autosomal dominant,55, autosomal recessive, and 5A (Spastic paraplegia 15,2,3,35,39,4,autosomal dominant,55,autosomal recessive,and 5A); congenital bile acid synthesis deficiency 3 (Bile acid synthesis defect, connetial, 3); spermatogenic disorders 11, 3and 8 (Spermatogenic failure 11,3, and 8); globoid erythrosis types 4and 5 (Spherocytosis types and 5); globoid myopathy (Spheroid body myopathy); spinal muscular atrophy, lower limb dominance 2, autosomal dominance (Spinal muscular atrophy, lower extremity predominant 2,autosomal dominant); spinal muscular atrophy type II (Spinal muscular atrophy, type II); spinocerebellar ataxia 14,21,35,40 and 6 (Spinocerebellar ataxia 14,21,35,40,and 6); spinocerebellar ataxia autosomal recessive 1and 16 (Spinocerebellar 
ataxia autosomal recessive, 1and 16); splenic hypoplasia (Splenic hypoplasia); a spinal carpal tarsal fusion syndrome (Spondylocarpotarsal synostosis syndrome); spinal hand dysplasia, ehlers-Danlos syndrome, immune disorder, aggrecan, congenital joint dislocation, short limb hand, sedaghatian, cone rod dystrophy, kozlowski (Spondylocheirotyplassia, ehlers-Danlos syndrome-like, with immune dysregulation, aggrecan type, with congenital joint dislocations, short limb-hand type, sedaghatian type, with con-rod dynstroph, and Kozlowski type); jurassic dwarfism (Parastremmatic dwarfism); stargardt disease 1; cone bar dystrophy 3 (Cone-rod dynasty 3); stickler syndrome type 1; kniest dysplasia (Kniest dysplasia); stickler syndrome types 1 (non-syndromic eye disease) and 4 (Stickler syndrome, types 1 (nonsyndromic ocular) and 4); sting-related vascular lesions, infancy onset (Sting-associated vasculopathy, infantile-onset); stormerken syndrome; sturge-Weber syndrome, congenital capillary deformity 1 (Sturge-Weber syndrome, capillary malformations, connetial, 1); succinyl-coa acetoacetate transferase deficiency (Succinyl-CoA acetoacetate transferase deficiency); sucrase-isomaltase deficiency (Sucrase-isomaltase deficiency); sudden infant death syndrome (Sudden infant death syndrome); isolated sulfite oxidase deficiency (Sulfite oxidase deficiency, isolated); aortic stenosis on the valve (Supravalvar aortic stenosis); pulmonary surfactant metabolic dysfunctions 2and 3 (Surfactant metabolism dysfunction, pulmoniy, 2and 3); proximal phalangeal adhesion 1b (symphalalangism, proximal,1 b); and finger (toe) Cenani Lenz type (Syndactyly Cenani Lenz type); and finger (toe) type 3 (syncyl type 3); syndrome X-linked mental retardation 16 (Syndromic X-linked mental retardation 16); a valgus varus deformity (Talipes equinovarus); tangier disease; TARP syndrome; tay-Sachs disease, B1 variation, gm 2-ganglioside deposition (adult), gm 2-ganglioside deposition (adult onset) (Tay-Sachs 
disease, B1 variant, gm2-gangliosidosis (adult), gm 2-ganliaosidis (adult-onset)); tertamy syndrome; tenorio syndrome; end bone dysplasia (Terminal osseous dysplasia); testosterone 17-beta-dehydrogenase deficiency (Testosterone 17-beta-dehydrogenase deficiency); congenital limb amputation, autosomal recession (tetramelia, autosomal recessive); fallotetraemia (Tetralogy of Fallot); left heart dysplasia syndrome2 (Hypoplastic left heart syndrome 2); arterial trunk (Truncus arteriosus); cardiac and macrovascular malformations (Malformation of the heart and great vessels); ventricular septal defect 1 (Ventricular septal defect 1); thiel-Behnke corneal dystrophy (Thiel-Behnke corneal dystrophy); thoracic aortic aneurysm and aortic dissection (Thoracic aortic aneurysms and aortic dissections); marfan-like morphology (Marfanoid habitus); three M syndrome2 (Three M syndrome 2); thrombocytopenia, platelet dysfunction, hemolysis and imbalances in globulin synthesis (Thrombocytopenia, platelet dysfunction, hemolysis, and imbalanced globin synthesis); thrombocytopenia, X-linked (X-linked); hereditary Thrombophilia, protein C deficiency, autosomal dominant and recessive (thrombia, herediness, due to protein Cdeficiency, autosomal dominant and recessive); hypoplasia of the Thyroid gland (Thyroid agensis); follicular Thyroid cancer (follicular); abnormal thyroid hormone metabolism (Thyroid hormone metabolism, abnormal); pan-thyroid hormone resistance, autosomal dominant (Thyroid hormone resistance, genetically, autosomal dominant); hyperthyroidism periodic paralysis and hyperthyroidism periodic paralysis 2 (Thyrotoxic periodic paralysis and Thyrotoxic periodic paralysis 2); pan-Thyrotropin releasing hormone resistance (Thyrotropin-releasing hormone resistance, genetically modified); timothy syndrome; TNF receptor-related periodic fever syndrome (TNF receptor-associated periodic fever syndrome (trap)); selective dental dysplasia 3and 4 (dental agent, selective,3and 4); torsion 
ventricular tachycardia (Torsades de pointes); townes-Brocks-brandroootoenal-like syndrome; neonatal temporary bullous skin dissolution (Transient bullous dermolysis of the newborn); treacher collins syndrome 1; hair enlargement is accompanied by mental retardation, dwarfism and retinal pigment degeneration (Trichomegaly with mental retardation, dwarfism and pigmentary degeneration of retina); hair nasopharyngeal dysplasia type I (Trichorhinophalangeal dysplasia type I); hair turbinate syndrome type 3 (Trichorhinophalangeal syndrome type 3); trimethylaminuria (trimethyllaminria); tuberous sclerosis syndrome (Tuberous sclerosis syndrome); lymphangiomyomatosis (lymphangiomyomosis); tuberous sclerosis 1and2 (Tuberous sclerosis 1and 2); tyrosinase negative ocular skin albinism (Tyrosinase-negative oculocutaneous albinism); tyrosinase-positive ocular skin albinism (Tyrosinase-positive oculocutaneous albinism); tyrosinemia type I (Tyrosinemia type I); UDP glucose-4-epimerase deficiency (UDPglucose-4-epimerase deficiency); ullrich congenital muscular dystrophy (Ullrich congenital muscular dystrophy); ulna and fibular defects such as those associated with critical limb defects (Ulna and fibula absence of with severe limb deficiency); upshaw-Schulman syndrome; uridylic acid synthase deficiency (Urocanate hydratase deficiency); usher syndrome types 1, 1B,1D,1G, 2A, 2C, and 2D; retinal pigment degeneration 39 (Retinitis pigmentosa 39); UV-sensitive syndrome (UV-sensitive syndrome); van der Woude syndrome; van Maldergem syndrome 2; hennekam lymphangiogenesis-lymphedema syndrome2 (Hennekam lymphangiectasia-lymphedema syndrome 2); variegated porphyria (Variegate porphyria); ventricular enlargement with cystic kidney disease (Ventriculomegaly with cystic kidney disease); verheij syndrome; very long chain acyl-coa dehydrogenase deficiency (Very long chain acyl-CoA dehydrogenase deficiency); vesicoureteral reflux 8 (Vesicoureteral reflux 8); visceral ectopic 5, autosomal (Visceral 
heterotaxy 5, autosomal); visceral myopathy (Visceral myopathy); vitamin D-dependent rickets, types 1 and 2 (Vitamin D-dependent rickets, types 1 and 2); vitelliform dystrophy (Vitelliform dystrophy); von Willebrand disease types 2M and 3; Waardenburg syndrome types 1, 4C, and 2E (with neurologic involvement); Klein-Waardenberg syndrome; Walker-Warburg congenital muscular dystrophy (Walker-Warburg congenital muscular dystrophy); Warburg micro syndrome 2 and 4; warts, hypogammaglobulinemia, infections, and myelokathexis (Warts, hypogammaglobulinemia, infections, and myelokathexis); Weaver syndrome; Weill-Marchesani syndrome 1 and 3; Weill-Marchesani-like syndrome; Weissenbacher-Zweymuller syndrome; Werdnig-Hoffmann disease; Charcot-Marie-Tooth disease; Werner syndrome; WFS1-related disorders; Wiedemann-Steiner syndrome; Wilson disease (Wilson disease); Wolfram-like syndrome, autosomal dominant; Worth disease; van Buchem disease type 2; xeroderma pigmentosum, complementation group b, group D, group E, and group G (Xeroderma pigmentosum, complementation group b, group D, group E, and group G); X-linked agammaglobulinemia (X-linked agammaglobulinemia); X-linked hereditary motor and sensory neuropathy (X-linked hereditary motor and sensory neuropathy); X-linked ichthyosis with steryl-sulfatase deficiency (X-linked ichthyosis with steryl-sulfatase deficiency); X-linked periventricular heterotopia (X-linked periventricular heterotopia); oto-palato-digital syndrome, type I (Oto-palato-digital syndrome, type I); X-linked severe combined immunodeficiency (X-linked severe combined immunodeficiency); Zimmermann-Laband syndrome 2; zonular pulverulent cataract 3 (Zonular pulverulent cataract 3).
The target nucleotide sequence may comprise a target sequence associated with a disease, disorder, or condition (e.g., a point mutation). The target sequence may comprise a T to C (or A to G) point mutation associated with a disease, disorder, or condition, wherein deamination of the mutant C base results in mismatch repair-mediated correction to a sequence that is not associated with the disease, disorder, or condition. The target sequence may comprise a G to A (or C to T) point mutation associated with a disease, disorder, or condition, wherein deamination of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with the disease, disorder, or condition. The target sequence may encode a protein, wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. The target sequence may also be at a splice site, wherein the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript. In addition, the target may be in a non-coding sequence of a gene, such as a promoter, wherein the point mutation results in increased or decreased expression of the gene.
Thus, in some aspects, deamination of the mutant C results in a change in the amino acid encoded by the mutant codon, which in some cases can result in expression of the wild-type amino acid. In other aspects, deamination of the mutant A results in a change in the amino acid encoded by the mutant codon, which in some cases can result in expression of the wild-type amino acid.
The methods described herein involving contacting a cell with a composition or rAAV particle can occur in vitro, ex vivo, or in vivo. In certain embodiments, the contacting step occurs in the subject. In certain embodiments, the subject has been diagnosed with a disease, disorder, or condition.
In some embodiments, the methods disclosed herein comprise contacting a mammalian cell with a composition or rAAV particle. In particular embodiments, the method involves contacting retinal cells, cortical cells, or cerebellar cells.
The split Cas9 protein or split guided editor delivered using the methods described herein preferably has activity comparable to that of the original Cas9 protein or guided editor (i.e., the unsplit protein delivered to a cell or expressed in a cell as a whole). For example, the split Cas9 protein or split guided editor retains at least 50% (e.g., at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) of the activity of the original Cas9 protein or guided editor. In some embodiments, the split Cas9 protein or split guided editor is more active (e.g., 2-fold, 5-fold, 10-fold, 100-fold, 1000-fold, or more) than the original Cas9 protein or guided editor.
The compositions described herein may be administered to a subject in need thereof in a therapeutically effective amount to treat and/or prevent a disease or disorder from which the subject suffers. Any disease or disorder that can be treated and/or prevented using CRISPR/Cas9-based genome editing technology may be treated by the split Cas9 protein or split guided editor described herein. It is to be understood that, if the nucleotide sequence encoding the split Cas9 protein or guided editor does not further encode a gRNA, a separate nucleic acid vector encoding the gRNA may be administered together with the compositions described herein.
Exemplary suitable diseases, disorders, or conditions include, without limitation, a disease or disorder selected from the group consisting of: cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), chronic obstructive pulmonary disease (COPD), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy, hereditary lymphedema, familial Alzheimer's disease, prion disease, chronic infantile neurologic cutaneous articular syndrome (CINCA), congenital deafness, Niemann-Pick disease type C (NPC), and desmin-related myopathy (DRM). In particular embodiments, the disease or disorder is Niemann-Pick disease type C (NPC).
In some embodiments, the disease, disorder, or condition is associated with a point mutation in an NPC gene, a DNMT1 gene, a PCSK9 gene, or a TMC1 gene. In certain embodiments, the point mutation is a T3182C mutation in NPC that results in an I1061T amino acid substitution.
In certain embodiments, the point mutation is an A545G mutation in TMC1 that results in a Y182C amino acid substitution. TMC1 encodes a protein that forms mechanosensitive ion channels in the sensory hair cells of the inner ear and is required for normal auditory function. The Y182C amino acid substitution is associated with congenital deafness.
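The relationship between the nucleotide labels and the amino acid labels above (T3182C with I1061T, and A545G with Y182C) can be checked with simple codon arithmetic. The following is a minimal sketch, assuming that the nucleotide positions are 1-based positions in the coding sequence and that the standard genetic code applies. The function names `locate` and `candidate_codons` are hypothetical, and because the text does not state which codon each gene actually uses, the sketch enumerates every codon consistent with the stated change.

```python
from itertools import product

# Standard genetic code, enumerated in T, C, A, G order (assumption: standard code applies).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def locate(cds_pos):
    """Map a 1-based coding-sequence position to (codon number, position within codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def candidate_codons(ref_aa, pos_in_codon, old_base, new_base, alt_aa):
    """List (reference codon, mutated codon) pairs consistent with a substitution."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        # Keep only codons for the reference amino acid carrying the old base.
        if aa != ref_aa or codon[pos_in_codon - 1] != old_base:
            continue
        mutated = codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append((codon, mutated))
    return hits


if __name__ == "__main__":
    print(locate(3182), candidate_codons("I", 2, "T", "C", "T"))
    print(locate(545), candidate_codons("Y", 2, "A", "G", "C"))
```

Under these assumptions, both nucleotide positions fall on the second base of the codon named by the amino acid label, and every isoleucine or tyrosine codon yields the stated substitution.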
In some embodiments, the disease, disorder, or condition is associated with a point mutation that produces a stop codon, e.g., a premature stop codon within the coding region of the gene.
Other exemplary diseases, disorders, or conditions include cystic fibrosis (see, e.g., Schwank et al., Functional repair of CFTR by CRISPR/Cas9 in intestinal stem cell organoids of cystic fibrosis patients. Cell Stem Cell. 2013; 13: 653-658; and Wu et al., Correction of a genetic disease in mouse via use of CRISPR-Cas9. Cell Stem Cell. 2013; 13: 659-662, neither of which uses a deaminase fusion protein to correct the genetic defect); phenylketonuria, e.g., a phenylalanine to serine mutation at position 835 (mouse) or 240 (human) or a homologous residue in the phenylalanine hydroxylase gene (T>C mutation), see, e.g., McDonald et al., Genomics. 1997; 39: 402-405; Bernard-Soulier syndrome (BSS), e.g., a phenylalanine to serine mutation at position 55 or a homologous residue, or a cysteine to arginine mutation at residue 24 or a homologous residue, in platelet membrane glycoprotein IX (T>C mutation), see, e.g., Noris et al., British Journal of Haematology. 1997; 97: 312-320, and Ali et al., Hematol. 2014; 93: 381-384; epidermolytic hyperkeratosis (EHK), e.g., a leucine to proline mutation at position 160 or 161 (if counting the initiator methionine) or a homologous residue in keratin 1 (T>C mutation), see, e.g., Chipev et al., Cell. 1992; 70: 821-828, see also accession number P04264 in the UNIPROT database (www.uniprot.org); chronic obstructive pulmonary disease (COPD), e.g., a leucine to proline mutation at position 54 or 55 (if counting the initiator methionine) or a homologous residue in the processed form of α1-antitrypsin, or at residue 78 or a homologous residue in the unprocessed form (T>C mutation), see, e.g., Poller et al., Genomics. 1993; 17: 740-743, see also accession number P01011 in the UNIPROT database; Charcot-Marie-Tooth disease type 4J, e.g., an isoleucine to threonine mutation at position 41 or a homologous residue in FIG4 (T>C mutation), see, e.g., Lenk et al., PLoS Genetics. 2011; 7: e1002104; neuroblastoma (NB), e.g., a leucine to proline mutation at position 197 or a homologous residue in Caspase-9 (T>C mutation), see, e.g., Kundu et al., 3 Biotech. 2013, 3: 225-234; von Willebrand disease (vWD), e.g., a cysteine to arginine mutation at position 509 or a homologous residue in the processed form of von Willebrand factor, or at position 1272 or a homologous residue in the unprocessed form of von Willebrand factor (T>C mutation), see, e.g., Lavergne et al., Br. J. Haematol. 1992, 82: 66-72, see also accession number P04275 in the UNIPROT database; myotonia congenita, e.g., a cysteine to arginine mutation at position 277 or a homologous residue in the muscle chloride channel gene CLCN1 (T>C mutation), see, e.g., Weinberger et al., The J. of Physiology. 2012, 590: 3449-3464; hereditary renal amyloidosis, e.g., a stop codon to arginine mutation at position 78 or a homologous residue in the processed form of apolipoprotein AII, or at position 101 or a homologous residue in the unprocessed form (T>C mutation), see, e.g., Yazaki et al., Kidney Int. 2003; 64: 11-16; dilated cardiomyopathy (DCM), e.g., a tryptophan to arginine mutation at position 148 or a homologous residue in the FOXD4 gene (T>C mutation), see, e.g., Minoretti et al., Int. J. of Mol. Med. 2007, 19: 369-372; hereditary lymphedema, e.g., a histidine to arginine mutation at position 1035 or a homologous residue in VEGFR3 tyrosine kinase (A>G mutation), see, e.g., Irrthum et al., Am. J. Hum. Genet. 2000, 67: 295-301; familial Alzheimer's disease, e.g., an isoleucine to valine mutation at position 143 or a homologous residue in presenilin 1 (A>G mutation), see, e.g., Gallo et al., J. Alzheimer's Disease. 2011, 25: 425-431; prion disease, e.g., a methionine to valine mutation at position 129 or a homologous residue in prion protein (A>G mutation), see, e.g., Lewis et al., J. of General Virology. 2006, 87: 2443-2449; chronic infantile neurologic cutaneous articular syndrome (CINCA), e.g., a tyrosine to cysteine mutation at position 570 or a homologous residue in cryopyrin (A>G mutation), see, e.g., Fujisawa et al., Blood. 2007, 109: 2903-2911; and desmin-related myopathy (DRM), e.g., an arginine to glycine mutation at position 120 or a homologous residue in αB crystallin (A>G mutation), see, e.g., Kumar et al., J. Biol. Chem. 1999; 274: 24137-24141. The entire contents of all references and database entries are incorporated herein by reference.
Trinucleotide repeat expansion diseases
Trinucleotide repeat expansion is associated with a number of human diseases, including Huntington's disease, fragile X syndrome, and Friedreich's ataxia. The most common trinucleotide repeat contains CAG triplets, though GAA triplets (Friedreich's ataxia) and CGG triplets (fragile X syndrome) also occur. Inheriting a predisposition to expansion, or acquiring an already expanded parental allele, increases the likelihood of acquiring the disease. Pathogenic expansions of trinucleotide repeats could hypothetically be corrected using guided editing.
A region upstream of the repeat region can be nicked by an RNA-guided nuclease and then used to prime synthesis of a new DNA strand containing a healthy number of repeats (which depends on the particular gene and disease), in accordance with the general mechanism outlined in FIG. 1G or FIG. 22. After the repeat sequence, a short stretch of homology is added that matches the identity of the sequence adjacent to the other end of the repeat (red strand). Invasion of the newly synthesized strand, and subsequent replacement of the endogenous DNA with the newly synthesized flap, leads to a contracted repeat allele. The term "contracted" refers to a shortening of the length of the nucleotide repeat region, which results in repair of the trinucleotide repeat region.
The guided editing (PE) systems described herein may be used to contract trinucleotide repeat mutations (or "triplet expansion diseases") in order to treat conditions such as Huntington's disease and other trinucleotide repeat disorders. Trinucleotide repeat expansion disorders are complex, progressive disorders that involve developmental neurobiology and often affect cognition as well as sensorimotor functions. These disorders show genetic anticipation (i.e., increased severity with each generation). The DNA expansions or contractions usually happen meiotically (i.e., during gametogenesis or early in embryonic development) and often have sex bias, meaning that some genes expand only when inherited through the female and others only when inherited through the male. In humans, trinucleotide repeat expansion disorders can cause gene silencing at either the transcriptional or the translational level, which essentially knocks out gene function. Alternatively, trinucleotide repeat expansion disorders can cause altered proteins to be generated with large repetitive amino acid sequences that often abrogate or alter protein function in a dominant-negative manner (e.g., polyglutamine diseases).
Without wishing to be bound by theory, triplet expansion is caused by slippage during DNA replication or during DNA repair synthesis. Because the tandem repeats have identical sequence to one another, base pairing between the two DNA strands can take place at multiple points along the sequence. This may lead to the formation of "loop out" structures during DNA replication or DNA repair synthesis, which may in turn lead to repeated copying of the repeated sequence and expand the number of repeats. Additional mechanisms involving hybrid RNA:DNA intermediates have been proposed. Guided editing may be used to reduce or eliminate these triplet expansion regions by deleting one or more of the problematic repeat codon triplets. In an embodiment of this use, FIG. 23 provides a schematic of a pegRNA design for contracting or reducing trinucleotide repeat sequences with guided editing.
Guided editing may be implemented to contract triplet expansion regions by nicking a region upstream of the triplet repeat region with a guided editor comprising a pegRNA appropriately targeted to the cut site. The guided editor then synthesizes a new DNA strand (the ssDNA flap) based on the pegRNA as a template (i.e., its editing template) that encodes a healthy number of triplet repeats (which depends on the particular gene and disease). The newly synthesized ssDNA strand comprising the healthy triplet repeat sequence is also synthesized to include a short stretch of homology (i.e., a homology arm) that matches the sequence adjacent to the other end of the repeat (red strand). Invasion of the newly synthesized strand, and subsequent replacement of the endogenous DNA with the newly synthesized ssDNA flap, leads to a contracted repeat allele.
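The flap composition described above (sequence from the nick to the repeat, a healthy number of repeats, then a short homology arm) can be sketched as a simple string model. This is an illustrative sketch only, under the assumptions that the sequence is written as one strand in the 5' to 3' direction, that strand orientation and the reverse-complement relationship to the pegRNA template are ignored, and that the toy locus, the repeat counts, and the names `longest_repeat_run` and `design_flap` are all hypothetical.

```python
import re


def longest_repeat_run(seq, unit):
    """Return (start, end, number of units) for the longest tandem run of `unit`."""
    best = (0, 0, 0)
    for match in re.finditer(f"(?:{re.escape(unit)})+", seq):
        n_units = (match.end() - match.start()) // len(unit)
        if n_units > best[2]:
            best = (match.start(), match.end(), n_units)
    return best


def design_flap(seq, unit, healthy_n, nick_offset, homology_len):
    """Build the new flap and the resulting contracted allele.

    The flap starts at a nick `nick_offset` bases upstream of the repeat,
    carries `healthy_n` repeat units, and ends with a homology arm copied
    from the `homology_len` bases just past the far end of the repeat.
    """
    start, end, n_units = longest_repeat_run(seq, unit)
    if n_units == 0:
        raise ValueError("repeat unit not found")
    nick = start - nick_offset
    if nick < 0 or end + homology_len > len(seq):
        raise ValueError("not enough flanking sequence")
    flap = seq[nick:start] + unit * healthy_n + seq[end:end + homology_len]
    # Replacing the endogenous stretch with the flap yields the contracted allele.
    edited = seq[:nick] + flap + seq[end + homology_len:]
    return flap, edited


if __name__ == "__main__":
    # Toy locus: 40 CAG units between arbitrary flanks (illustrative values only).
    locus = "GGATCCAAGT" + "CAG" * 40 + "CCGCCACCGATT"
    flap, edited = design_flap(locus, "CAG", healthy_n=20, nick_offset=4, homology_len=6)
    print(longest_repeat_run(locus, "CAG")[2], "->", longest_repeat_run(edited, "CAG")[2])
```

The repeat counts used here are placeholders; as the text notes, the healthy number of repeats depends on the particular gene and disease.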
Depending on the particular trinucleotide expansion disorder, the defect-inducing triplet expansions may occur in "trinucleotide repeat expansion proteins." Trinucleotide repeat expansion proteins are a diverse set of proteins associated with susceptibility for developing a trinucleotide repeat expansion disorder, the presence of a trinucleotide repeat expansion disorder, the severity of a trinucleotide repeat expansion disorder, or any combination thereof. Trinucleotide repeat expansion disorders are divided into two categories determined by the type of repeat. The most common repeat is the triplet CAG, which, when present in the coding region of a gene, codes for the amino acid glutamine (Q). Therefore, these disorders are referred to as polyglutamine (polyQ) disorders and comprise the following diseases: Huntington's disease (HD); spinobulbar muscular atrophy (SBMA); spinocerebellar ataxias (SCA types 1, 2, 3, 6, 7, and 17); and dentatorubral-pallidoluysian atrophy (DRPLA). The remaining trinucleotide repeat expansion disorders either do not involve the CAG triplet or the CAG triplet is not in the coding region of the gene, and they are therefore referred to as non-polyglutamine disorders. Non-polyglutamine disorders comprise fragile X syndrome (FRAXA); fragile XE mental retardation (FRAXE); Friedreich's ataxia (FRDA); myotonic dystrophy (DM); and spinocerebellar ataxias (SCA types 8 and 12).
Proteins associated with trinucleotide repeat expansion disorders can be selected based on an experimental association of the protein with a trinucleotide repeat expansion disorder. For example, the production rate or circulating concentration of a protein associated with a trinucleotide repeat expansion disorder may be elevated or depressed in a population having the disorder relative to a population lacking the disorder. Differences in protein levels may be assessed using proteomic techniques including, but not limited to, Western blot, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA), and mass spectrometry. Alternatively, proteins associated with trinucleotide repeat expansion disorders may be identified by obtaining gene expression profiles of the genes encoding the proteins using genomic techniques including, but not limited to, DNA microarray analysis, serial analysis of gene expression (SAGE), and quantitative real-time polymerase chain reaction (Q-PCR).
Non-limiting examples of proteins associated with trinucleotide repeat expansion disorders that can be corrected by guided editing include AR (androgen receptor), FMR1 (fragile X mental retardation 1), HTT (huntingtin), DMPK (dystrophia myotonica-protein kinase), FXN (frataxin), ATXN2 (ataxin 2), ATN1 (atrophin 1), FEN1 (flap structure-specific endonuclease 1), TNRC6A (trinucleotide repeat containing 6A), PABPN1 (poly(A) binding protein, nuclear 1), JPH3 (junctophilin 3), MED15 (mediator complex subunit 15), ATXN1 (ataxin 1), ATXN3 (ataxin 3), TBP (TATA box binding protein), CACNA1A (calcium channel, voltage-dependent, P/Q type, alpha 1A subunit), ATXN8OS (ATXN8 opposite strand (non-protein coding)), PPP2R2B (protein phosphatase 2, regulatory subunit B, beta), ATXN7 (ataxin 7), TNRC6B (trinucleotide repeat containing 6B), TNRC6C (trinucleotide repeat containing 6C), CELF3 (CUGBP, Elav-like family member 3), MAB21L1 (mab-21-like 1 (C. elegans)), MSH2 (mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli)), TMEM185A (transmembrane protein 185A), SIX5 (SIX homeobox 5), CNPY3 (canopy 3 homolog (zebrafish)), FRAXE (fragile site, folic acid type, rare, fra(X)(q28) E), GNB2 (guanine nucleotide binding protein (G protein), beta polypeptide 2), RPL14 (ribosomal protein L14), ATXN8 (ataxin 8), INSR (insulin receptor), TTR (transthyretin), EP400 (E1A binding protein p400), GIGYF2 (GRB10 interacting GYF protein 2), OGG1 (8-oxoguanine DNA glycosylase), STC1 (stanniocalcin 1), CNDP1 (carnosine dipeptidase 1 (metallopeptidase M20 family)), C10orf2 (chromosome 10 open reading frame 2), MAML3 (mastermind-like 3 (Drosophila)), DKC1 (dyskeratosis congenita 1, dyskerin), PAXIP1 (PAX interacting (with transcription-activation domain) protein 1), CASK (calcium/calmodulin-dependent serine protein kinase (MAGUK family)), MAPT (microtubule-associated protein tau), SP1 (Sp1 transcription factor), POLG (polymerase (DNA directed), gamma), AFF2 (AF4/FMR2 family, member 2), THBS1 (thrombospondin 1), TP53 (tumor protein p53), ESR1 (estrogen receptor 1), CGGBP1 (CGG triplet repeat binding protein 1), ABT1 (activator of basal transcription 1), KLK3 (kallikrein-related peptidase 3), PRNP (prion protein), JUN (jun oncogene), KCNN3 (potassium intermediate/small conductance calcium-activated channel, subfamily N, member 3), BAX (BCL2-associated X protein), FRAXA (fragile site, folic acid type, rare, fra(X)(q27.3) A (macroorchidism, mental retardation)), KBTBD10 (kelch repeat and BTB (POZ) domain containing 10), MBNL1 (muscleblind-like (Drosophila)), RAD51 (RAD51 homolog (RecA homolog, E. coli) (S. cerevisiae)), NCOA3 (nuclear receptor coactivator 3), ERDA1 (expanded repeat domain, CAG/CTG 1), TSC1 (tuberous sclerosis 1), COMP (cartilage oligomeric matrix protein), GCLC (glutamate-cysteine ligase, catalytic subunit), RRAD (Ras-related associated with diabetes), MSH3 (mutS homolog 3 (E. coli)), DRD2 (dopamine receptor D2), CD44 (CD44 molecule (Indian blood group)), CTCF (CCCTC-binding factor (zinc finger protein)), CCND1 (cyclin D1), CLSPN (claspin homolog (Xenopus laevis)), MEF2A (myocyte enhancer factor 2A), PTPRU (protein tyrosine phosphatase, receptor type, U), GAPDH (glyceraldehyde-3-phosphate dehydrogenase), TRIM22 (tripartite motif-containing 22), WT1 (Wilms tumor 1), AHR (aryl hydrocarbon receptor), GPX1 (glutathione peroxidase 1), TPMT (thiopurine S-methyltransferase), NDP (Norrie disease (pseudoglioma)), ARX (aristaless related homeobox), MUS81 (MUS81 endonuclease homolog (S. cerevisiae)), TYR (tyrosinase (oculocutaneous albinism IA)), EGR1 (early growth response 1), UNG (uracil-DNA glycosylase), NUMBL (numb homolog (Drosophila)-like), FABP2 (fatty acid binding protein 2, intestinal), EN2 (engrailed homeobox 2), CRYGC (crystallin, gamma C), SRP14 (signal recognition particle 14 kDa (homologous Alu RNA binding protein)), CRYGB (crystallin, gamma B), PDCD1 (programmed cell death 1), HOXA1 (homeobox A1), ATXN2L (ataxin 2-like), PMS2 (PMS2 postmeiotic segregation increased 2 (S. cerevisiae)), GLA (galactosidase, alpha), CBL (Cas-Br-M (murine) ecotropic retroviral transforming sequence), FTH1 (ferritin, heavy polypeptide 1), IL12RB2 (interleukin 12 receptor, beta 2), OTX2 (orthodenticle homeobox 2), HOXA5 (homeobox A5), POLG2 (polymerase (DNA directed), gamma 2, accessory subunit), DLX2 (distal-less homeobox 2), SIRPA (signal-regulatory protein alpha), OTX1 (orthodenticle homeobox 1), AHRR (aryl-hydrocarbon receptor repressor), MANF (mesencephalic astrocyte-derived neurotrophic factor), TMEM158 (transmembrane protein 158 (gene/pseudogene)), and ENSG00000078687.
In a particular aspect, the present disclosure provides a guided editor for treating a subject diagnosed with an expansion repeat disorder (also known as a repeat expansion disorder or a trinucleotide repeat disorder). Expansion repeat disorders occur when microsatellite repeats expand beyond a threshold length. Currently, at least 30 genetic diseases are believed to be caused by repeat expansions. In the early 1990s, trinucleotide repeats were discovered to underlie several major inherited conditions, including fragile X syndrome, spinal and bulbar muscular atrophy, myotonic dystrophy, and Huntington's disease (Nelson et al., "The unstable repeats-three evolving faces of neurological disease," Neuron, March 6, 2013, Vol. 77; 825-843, which is incorporated herein by reference), as well as Haw River syndrome, Jacobsen syndrome, dentatorubral-pallidoluysian atrophy (DRPLA), Machado-Joseph disease, synpolydactyly (SPD II), hand-foot-genital syndrome (HFGS), cleidocranial dysplasia (CCD), holoprosencephaly (HPE), congenital central hypoventilation syndrome (CCHS), ARX-nonsyndromic X-linked mental retardation (XLMR), and oculopharyngeal muscular dystrophy (OPMD). Repeat expansion, which can occur with each successive generation, is believed to cause disease by several different mechanisms.
In one embodiment, a method of treating a trinucleotide repeat disorder is depicted in FIG. 23. In general, the method involves using guided editing in combination with a gRNA comprising a region that encodes a desired, healthy replacement trinucleotide repeat sequence, which is intended to replace the endogenous diseased trinucleotide repeat sequence through the mechanism of the guided editing process. FIG. 23 shows a schematic of an exemplary gRNA design for contracting trinucleotide repeat sequences and of trinucleotide repeat contraction with guided editing.
Prion diseases
Guided editing may also be used to prevent or halt the progression of prion disease by installing one or more protective mutations into prion proteins (PRNP) that become misfolded during the course of disease. Prion diseases, or transmissible spongiform encephalopathies (TSEs), are a family of rare, progressive neurodegenerative disorders that affect both humans and animals. They are distinguished by long incubation periods, characteristic spongiform changes associated with neuronal loss, and a failure to induce an inflammatory response.
In humans, prion diseases include Creutzfeldt-Jakob disease (CJD), variant Creutzfeldt-Jakob disease (vCJD), Gerstmann-Sträussler-Scheinker syndrome, fatal familial insomnia, and kuru. In animals, prion diseases include bovine spongiform encephalopathy (BSE or "mad cow disease"), chronic wasting disease (CWD), scrapie, transmissible mink encephalopathy, feline spongiform encephalopathy, and ungulate spongiform encephalopathy. Guided editing may be used to install protective point mutations into a prion protein in order to prevent or halt the progression of any one of these prion diseases.
Classic CJD is a human prion disease. It is a rapidly progressive, invariably fatal neurodegenerative disorder with characteristic clinical and diagnostic features, believed to be caused by an abnormal isoform of a cellular glycoprotein known as the prion protein. Infection with this disease usually leads to death within 1 year of onset of illness. CJD occurs worldwide, and the estimated annual incidence in many countries, including the United States, has been reported to be about one case per million population. CJD is classified as a transmissible spongiform encephalopathy (TSE) along with other prion diseases that occur in humans and animals. In about 85% of patients, CJD occurs as a sporadic disease with no recognizable pattern of transmission. A smaller proportion of patients (5% to 15%) develop CJD because of inherited mutations of the prion protein gene. These inherited forms include Gerstmann-Sträussler-Scheinker syndrome and fatal familial insomnia. No treatment is currently known for CJD.
Variant Creutzfeldt-Jakob disease (vCJD) is a prion disease that was first described in 1996 in the United Kingdom. There is now strong scientific evidence that the agent responsible for the outbreak of prion disease in cows, bovine spongiform encephalopathy (BSE or "mad cow disease"), is the same agent responsible for the outbreak of vCJD in humans. Variant CJD (vCJD) is not the same disease as classic CJD (often simply called CJD). It has clinical and pathologic characteristics that differ from those of classic CJD. Each disease also has a particular genetic profile of the prion protein gene. Both disorders are invariably fatal brain diseases with unusually long incubation periods measured in years, and both are caused by an unconventional transmissible agent called a prion. No treatment is currently known for vCJD.
BSE (bovine spongiform encephalopathy or "mad cow disease") is a progressive neurological disorder of cattle that results from infection by an unusual transmissible agent called a prion. The nature of the transmissible agent is not well understood. Currently, the most accepted theory is that the agent is a modified form of a normal protein known as prion protein. For reasons that are not yet understood, the normal prion protein changes into a pathogenic (harmful) form that then damages the central nervous system of cattle. There is increasing evidence that there are different strains of BSE: the typical or classic BSE strain, which is responsible for the outbreak in the United Kingdom, and two atypical strains (H and L strains). No treatment is currently known for BSE.
Chronic wasting disease (CWD) is a prion disease that affects deer, elk, reindeer, sika deer, and moose. It has been found in some areas of North America, including Canada and the United States, as well as in Norway and Korea. It may take over a year before an infected animal develops symptoms, which can include drastic weight loss (wasting), stumbling, listlessness, and other neurologic symptoms. CWD can affect animals of all ages, and some infected animals may die without ever developing the disease. CWD is fatal to animals, and there are no treatments or vaccines.
The causative agents of TSEs are believed to be prions. The term "prions" refers to abnormal, pathogenic agents that are transmissible and are able to induce abnormal folding of specific normal cellular proteins called prion proteins, which are found most abundantly in the brain. The functions of these normal prion proteins are still not completely understood. The abnormal folding of the prion proteins leads to brain damage and to the characteristic signs and symptoms of the disease. Prion diseases are usually rapidly progressive and always fatal.
As used herein, the term "prion" means an infectious particle known to cause diseases (spongiform encephalopathies) in humans and animals. The term "prion" is a contraction of the words "protein" and "infection," and the particles are composed largely, if not exclusively, of PRNP^Sc molecules encoded by a PRNP gene that expresses PRNP^C, which changes conformation to become PRNP^Sc. Prions are distinct from bacteria, viruses, and viroids. Known prions include those that infect animals to cause scrapie (a transmissible, degenerative disease of the nervous system of sheep and goats), bovine spongiform encephalopathy (BSE, or mad cow disease), and feline spongiform encephalopathy of cats. As noted above, four prion diseases known to affect humans are (1) kuru, (2) Creutzfeldt-Jakob disease (CJD), (3) Gerstmann-Sträussler-Scheinker disease (GSS), and (4) fatal familial insomnia (FFI). As used herein, "prion" includes all forms of prions causing all or any of these diseases or others in any animals used, and in particular in humans and domesticated farm animals.
In general, and without wishing to be bound by theory, prion diseases are caused by misfolding of prion proteins. Such diseases, often called deposition diseases, can be explained as follows. If A is the normally synthesized gene product that carries out an intended physiologic role in a monomeric or oligomeric state, A* is a conformationally activated form of A that is competent to undergo a dramatic conformational change, B is the conformationally altered state that prefers multimeric assemblies (i.e., the misfolded form which forms depositions), and Bn is the multimeric material that is pathogenic and relatively difficult to recycle. For the prion diseases, PRNP^C and PRNP^Sc correspond to states A and Bn, where A is largely helical and monomeric and Bn is β-rich and multimeric.
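The state model in the preceding paragraph can be written compactly as a reaction scheme. The following is only a schematic restatement, writing the conformationally activated form as $A^{*}$ (a label assumed here to keep it distinct from $A$). The text does not specify whether the individual steps are reversible or give a value for $n$, so the arrows show only the direction of progression described.

```latex
\[
A \;\longrightarrow\; A^{*} \;\longrightarrow\; B,
\qquad
n\,B \;\longrightarrow\; B_{n},
\qquad
A \equiv \mathrm{PRNP}^{C},
\quad
B_{n} \equiv \mathrm{PRNP}^{Sc}
\]
```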
It is known that certain mutations in prion proteins can be associated with an increased risk of prion disease. Conversely, certain mutations in prion proteins can be protective in nature. See Bagynszky et al., "Characterization of mutations in PRNP (prion) gene and their possible roles in neurodegenerative diseases," Neuropsychiatr Dis Treat., 2018; 14: 2067-2085, the contents of which are incorporated herein by reference.
PRNP (NCBI Reference Sequence No. NP_000302.1 (SEQ ID NO: 396)), the human prion protein, is encoded by a 16 kb long gene located on chromosome 20 (4686151-4701588). It contains two exons, and exon 2 carries the open reading frame that encodes the 253 amino acid (AA) long PrP protein. Exon 1 is a noncoding exon, which may serve as the transcription initiation site. Post-translational modifications result in removal of the first 22 AA N-terminal fragment (NTF) and the last 23 AA C-terminal fragment (CTF). The NTF is cleaved after PrP is transported to the endoplasmic reticulum (ER), whereas the CTF (the glycosylphosphatidylinositol [GPI] signal peptide [GPI-SP]) is cleaved off by the GPI anchor. The GPI anchor may be involved in PrP protein transport. It may also function to attach the prion protein to the outer surface of the cell membrane. Normal PrP is composed of a long N-terminal loop (which contains the octapeptide repeat region), two short β-sheets, three α-helices, and a C-terminal region (which contains the GPI anchor). Cleavage of PrP results in a 208 AA long glycoprotein anchored in the cell membrane.
The 253 amino acid sequence of PRNP (NP_000302.1) is as follows:
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV(SEQ ID NO:396)。
The 253 amino acid sequence of PRNP (NP_000302.1) is encoded by the following nucleotide sequence (NCBI Reference Sequence No. NM_000311.5, "Homo sapiens prion protein (PRNP), transcript variant 1, mRNA"):
GCGAACCTTGGCTGCTGGATGCTGGTTCTCTTTGTGGCCACATGGAGTGACCTGGGCCTCTGCAAGAAGCGCCCGAAGCCTGGAGGATGGAACACTGGGGGCAGCCGATACCCGGGGCAGGGCAGCCCTGGAGGCAACCGCTACCCACCTCAGGGCGGTGGTGGCTGGGGGCAGCCTCATGGTGGTGGCTGGGGGCAGCCTCATGGTGGTGGCTGGGGGCAGCCCCATGGTGGTGGCTGGGGACAGCCTCATGGTGGTGGCTGGGGTCAAGGAGGTGGCACCCACAGTCAGTGGAACAAGCCGAGTAAGCCAAAAACCAACATGAAGCACATGGCTGGTGCTGCAGCAGCTGGGGCAGTGGTGGGGGGCCTTGGCGGCTACATGCTGGGAAGTGCCATGAGCAGGCCCATCATACATTTCGGCAGTGACTATGAGGACCGTTACTATCGTGAAAACATGCACCGTTACCCCAACCAAGTGTACTACAGGCCCATGGATGAGTACAGCAACCAGAACAACTTTGTGCACGACTGCGTCAATATCACAATCAAGCAGCACACGGTCACCACAACCACCAAGGGGGAGAACTTCACCGAGACCGACGTTAAGATGATGGAGCGCGTGGTTGAGCAGATGTGTATCACCCAGTACGAGAGGGAATCTCAGGCCTATTACCAGAGAGGATCGAGCATGGTCCTCTTCTCCTCTCCACCTGTGATCCTCCTGATCTCTTTCCTCATCTTCCTGATAGTGGGATGAGGAAGGTCTTCCTGTTTTCACCATCTTTCTAATCTTTTTCCAGCTTGAGGGAGGCGGTATCCACCTGCAGCCCTTTTAGTGGTGGTGTCTCACTCTTTCTTCTCTCTTTGTCCCGGATAGGCTAATCAATACCCTTGGCACTGATGGGCACTGGAAAACATAGAGTAGACCTGAGATGCTGGTCAAGCCCCCTTTGATTGAGTTCATCATGAGCCGTTGCTAATGCCAGGCCAGTAAAAGTATAACAGCAAATAACCATTGGTTAATCTGGACTTATTTTTGGACTTAGTGCAACAGGTTGAGGCTAAAACAAATCTCAGAACAGTCTGAAATACCTTTGCCTGGATACCTCTGGCTCCTTCAGCAGCTAGAGCTCAGTATACTAATGCCCTATCTTAGTAGAGATTTCATAGCTATTTAGAGATATTTTCCATTTTAAGAAAACCCGACAACATTTCTGCCAGGTTTGTTAGGAGGCCACATGATACTTATTCAAAAAAATCCTAGAGATTCTTAGCTCTTGGGATGCAGGCTCAGCCCGCTGGAGCATGAGCTCTGTGTGTACCGAGAACTGGGGTGATGTTTTACTTTTCACAGTATGGGCTACACAGCAGCTGTTCAACAAGAGTAAATATTGTCACAACACTGAACCTCTGGCTAGAGGACATATTCACAGTGAACATAACTGTAACATATATGAAAGGCTTCTGGGACTTGAAATCAAATGTTTGGGAATGGTGCCCTTGGAGGCAACCTCCCATTTTAGATGTTTAAAGGACCCTATATGTGGCATTCCTTTCTTTAAACTATAGGTAATTAAGGCAGCTGAAAAGTAAATTGCCTTCTAGACACTGAAGGCAAATCTCCTTTGTCCATTTACCTGGAAACCAGAATGATTTTGACATACAGGAGAGCTGCAGTTGTGAAAGCACCATCATCATAGAGGATGATGTAATTAAAAAATGGTCAGTGTGCAAAGAAAAGAACTGCTTGCATTTCTTTATTTCTGTCTCATAATTGTCAAAAACCAGAATTAGGTCAAGTTCATAGTTTCTGTAATTGGCTTTTGAATCAAAGAATAGGGAGACAATCTAAAAAATATCTTAGGTTGGAGATGACAGAAATATGATTGATTTGAAGTGGAAAAAGAAATTCTGTTAATGTTAATTAAAGTAAAATTATTCCCTGAATTGTTTGATATTGTCACCTAGCAGATATGTATTACTTTTCTGCAATGTTAT
TATTGGCTTGCACTTTGTGAGTATTCTATGTAAAAATATATATGTATATAAAATATATATTGCATAGGACAGACTTAGGAGTTTTGTTTAGAGCAGTTAACATCTGAAGTGTCTAATGCATTAACTTTTGTAAGGTACTGAATACTTAATATGTGGGAAACCCTTTTGCGTGGTCCTTAGGCTTACAATGTGCACTGAATCGTTTCATGTAAGAATCCAAAGTGGACACCATTAACAGGTCTTTGAAATATGCATGTACTTTATATTTTCTATATTTGTAACTTTGCATGTTCTTGTTTTGTTATATAAAAAAATTGTAAATGTTTAATATCTGACTGAAATTAAACGAGCGAAGATGAGCACCA(SEQ ID NO:397)
The mutation sites in PRNP (NP_000302.1) reported to date in connection with CJD and FFI are as follows. These mutations can be removed or installed using the guide editors disclosed herein.
The mutation sites in PRNP (NP_000302.1) (SEQ ID NO: 396) reported to be associated with GSS are as follows:
The mutation sites in PRNP (NP_000302.1) (SEQ ID NO: 396) associated with possible protective properties against prion diseases are as follows:
Thus, in various embodiments, guided editing may be used to remove mutations in PRNP that are associated with prion diseases, or to install mutations in PRNP that are believed to be protective against prion diseases. For example, guided editing may be used to remove or restore a D178N, V180I, T188K, E196K, E196A, E200K, E200G, V203I, R208H, V210I, E Q, I V or M232R mutation in the PRNP protein (relative to the PRNP of NP_000302.1) (SEQ ID NO: 396). In other embodiments, guided editing may be used to remove or restore a P102L, P105L, A117V, G131V, V176G, H187R, F198S, D202N, Q212P, Q217R or M232T mutation in the PRNP protein (relative to the PRNP of NP_000302.1) (SEQ ID NO: 396). By using guided editing to remove or correct such mutations in PRNP, the risk of prion diseases can be reduced or eliminated.
In other embodiments, guided editing may be used to install protective mutations in PRNP that are associated with protection against one or more prion diseases. For example, guided editing may be used to install a G127S, G127V, M129V, D167G, D167N, N171S, E219K or P238S protective mutation in PRNP (relative to the PRNP of NP_000302.1) (SEQ ID NO: 396). In other embodiments, the protective mutation may be any alternative amino acid installed at G127, M129, D167, N171, E219 or P238 in PRNP (relative to the PRNP of NP_000302.1) (SEQ ID NO: 396).
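The substitution labels used above follow the usual convention of reference residue, 1-based position, and replacement residue. By way of illustration only, the following Python sketch parses such labels and applies them to a protein string. The function names and the four-residue toy sequence are hypothetical and are not taken from the disclosure; the actual PRNP sequence (SEQ ID NO: 396) is not reproduced here.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    ref: str  # reference amino acid (one-letter code)
    pos: int  # 1-based position in the protein
    alt: str  # replacement amino acid (one-letter code)


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'G127V' into (ref, pos, alt)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(protein: str, sub: Substitution) -> str:
    """Return the protein with the substitution applied, checking the reference residue."""
    idx = sub.pos - 1
    if not 0 <= idx < len(protein):
        raise ValueError(f"position {sub.pos} is outside the protein")
    if protein[idx] != sub.ref:
        raise ValueError(f"expected {sub.ref} at {sub.pos}, found {protein[idx]}")
    return protein[:idx] + sub.alt + protein[idx + 1:]


# Protective substitutions as listed in the text above.
PROTECTIVE = ["G127S", "G127V", "M129V", "D167G", "D167N", "N171S", "E219K", "P238S"]
protective_positions = sorted({parse_substitution(s).pos for s in PROTECTIVE})

# Toy example on a made-up 4-residue peptide (not PRNP).
toy_result = apply_substitution("MAGV", Substitution("G", 3, "V"))
```

The reference-residue check is a deliberate design choice: it catches a label applied against the wrong numbering or the wrong reference sequence before any edit is designed.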
In a particular embodiment, guided editing may be used to install a G127V protective mutation in PRNP, as shown in FIG. 27 and discussed in Example 5.
In another embodiment, guided editing may be used to install the E219K protective mutation in PRNP.
The PRNP protein and the protective mutation sites are conserved in mammals. Thus, in addition to treating human disease, guided editing could be used to generate cattle and sheep that are immune to prion diseases, or even to help cure wild animal populations suffering from prion diseases. Guided editing can be used to install a naturally occurring protective allele in human cells at an efficiency of about 25%, a level that mouse experiments indicate is sufficient to confer immunity to prion diseases. This approach is the first, and perhaps the only, way to install such an allele in most cell types at such high efficiency. Another possible therapeutic strategy is to use guided editing to reduce or eliminate expression of PRNP by installing an early stop codon in the gene.
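The early stop codon strategy mentioned above can be illustrated with a short sketch that scans a coding sequence for codons lying one nucleotide substitution away from a stop codon. This is a minimal illustration using the standard stop codons TAA, TAG and TGA; the function name and the toy sequence are hypothetical, and the sketch does not address pegRNA design or edit efficiency.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def single_edit_stop_sites(cds: str):
    """List (codon_number, original_codon, stop_codon) for every single-base
    substitution that converts a sense codon into a stop codon.

    codon_number is 1-based; cds must be an in-frame DNA coding sequence.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for offset in range(3):
            for base in BASES:
                if base == codon[offset]:
                    continue
                mutant = codon[:offset] + base + codon[offset + 1:]
                if mutant in STOP_CODONS:
                    hits.append((start // 3 + 1, codon, mutant))
    return hits


# Toy 4-codon sequence (ATG CAG TGG AAA); not taken from PRNP.
toy_hits = single_edit_stop_sites("ATGCAGTGGAAA")
```

In practice, candidate sites near the start of the coding sequence would generally be preferred, since a truncation there removes more of the protein; that ranking step is omitted from this sketch.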
Using the pegRNA design principles described herein, suitable pegRNAs can be designed to install the desired protective mutations, or to remove prion disease related mutations from the PRNP. For example, the following list of pegRNAs can be used to install the G127V protective allele and the E219K protective allele in human PRNPs, as well as the G127V protective alleles in various animal PRNPs.
[10]Pharmaceutical composition
Other aspects of the disclosure relate to pharmaceutical compositions comprising any of the various components of the guided editing systems described herein (e.g., including, but not limited to, napDNAbp, reverse transcriptase, fusion proteins (e.g., comprising napDNAbp and reverse transcriptase), pegRNA, and complexes comprising fusion proteins and pegRNA, as well as auxiliary elements (e.g., a second nick-generating component and a 5' endogenous DNA flap removal endonuclease to help drive the guided editing process toward editing product formation)).
As used herein, the term "pharmaceutical composition" refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises other agents (e.g., for specific delivery, to increase half-life, or other therapeutic compounds).
As used herein, the term "pharmaceutically acceptable carrier" refers to a pharmaceutically acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, that is involved in carrying or transporting a compound from one site of the body (e.g., a delivery site) to another site (e.g., an organ, tissue or part of the body). A pharmaceutically acceptable carrier is "acceptable" in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, at physiological pH, etc.). Some examples of materials that may be used as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, methyl cellulose, ethyl cellulose, microcrystalline cellulose, cellulose acetate, etc.; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricants, such as magnesium stearate, sodium lauryl sulfate, talc, and the like; (8) excipients, such as cocoa butter, suppository waxes, etc.; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, soybean oil, and the like; (10) glycols, such as propylene glycol; (11) polyols, such as glycerol, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide, aluminum hydroxide, etc.; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethanol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, high density lipoprotein and low density lipoprotein; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances used in pharmaceutical formulations. Wetting agents, colorants, release agents, coating agents, sweeteners, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. Terms such as "excipient," "carrier," "pharmaceutically acceptable carrier," and the like are used interchangeably herein.
In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administration of the pharmaceutical compositions described herein include, but are not limited to: topical, subcutaneous, transdermal, intradermal, intralesional, intra-articular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intra-organ, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intraventricular administration.
In some embodiments, the pharmaceutical compositions described herein are administered locally to a diseased site (e.g., a tumor site). In some embodiments, the pharmaceutical compositions described herein are administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being a porous, non-porous, or gelatinous material, including membranes, such as silastic membranes, or fibers.
In other embodiments, the pharmaceutical compositions described herein are delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials may be used (see, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Ranger and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61; see also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer (supra).
In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, the pharmaceutical composition for administration by injection is a solution in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical composition may also include a solubilizing agent and a local anesthetic (e.g., lidocaine) to ease pain at the injection site. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, e.g., as a dry lyophilized powder or water-free concentrate in a hermetically sealed container, such as an ampoule or sachet, indicating the quantity of active agent. Where the pharmaceutical composition is to be administered by infusion, it may be dispensed from an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline may be provided so that the ingredients can be mixed prior to administration.
The pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer's solution or Hank's solution. In addition, the pharmaceutical composition may be in solid form and be re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
The pharmaceutical composition may be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles may have any suitable structure, such as unilamellar or multilamellar, so long as the composition is contained therein. The compounds may be encapsulated in "stabilized plasmid-lipid particles" (SPLPs) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE) and low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethylene glycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethyl-ammonium methylsulfate, or "DOTAP," are particularly preferred for use in such particles and vesicles. The preparation of these lipid particles is well known. See, for example, U.S. Pat. Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
The pharmaceutical compositions described herein may, for example, be administered or packaged as a unit dose. The term "unit dose," when used in reference to a pharmaceutical composition of the present disclosure, refers to physically discrete units suitable as unitary dosages for the subject, each unit containing a predetermined quantity of active agent calculated to produce the desired therapeutic effect in association with the required diluent, i.e., a carrier or vehicle.
Furthermore, the pharmaceutical compositions may be provided as a pharmaceutical kit comprising (a) a container containing the compound of the invention in lyophilized form, and (b) a second container containing a pharmaceutically acceptable diluent for injection (e.g., sterile water). The pharmaceutically acceptable diluent may be used to reconstitute or dilute the lyophilized compound of the invention. Optionally associated with such containers may be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
In another aspect, articles of manufacture containing materials useful for the treatment of the diseases described above are included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials, such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
[11]Kit, cell, vector and delivery
Kits
The compositions of the present disclosure may be assembled into kits. In some embodiments, the kit comprises a nucleic acid vector for expressing a guide editor described herein. In other embodiments, the kit further comprises appropriate guide nucleotide sequences (e.g., a pegRNA and a second-site gRNA), or nucleic acid vectors for expressing such guide nucleotide sequences, to target the Cas9 protein or guide editor to the desired target sequence.
The kits described herein can include one or more containers holding components for performing the methods described herein and, optionally, instructions for use. Any of the kits described herein may also include the components needed to perform the assay methods. Where applicable, each component of the kit may be provided in liquid form (e.g., in solution) or in solid form (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other substance (e.g., water), which may or may not be provided with the kit.
In some embodiments, the kit may optionally include instructions and/or promotion for use of the components provided. As used herein, "instructions" can define a component of instruction and/or promotion, and typically involve written instructions on or associated with the packaging of the present disclosure. Instructions may also include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, e.g., audiovisual (e.g., videotape, DVD, etc.), internet and/or web-based communications, etc. The written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which may also reflect approval by the agency of manufacture, use or sale for animal administration. As used herein, "promoted" includes all methods of doing business, including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity (including pharmaceutical sales), and any advertising or other promotional activity, including written, oral and electronic communication of any form, associated with the present disclosure. In addition, the kit may include other components depending on the specific application, as described herein.
The kit may contain any one or more of the components described herein in one or more containers. The components may be prepared sterilely, packaged in a syringe and shipped refrigerated. Alternatively, they may be housed in a vial or other container for storage. A second container may hold other components that are prepared sterilely. Alternatively, the kit may include the active agents premixed and shipped in a vial, tube or other container.
The kit may take a variety of forms, such as a blister pouch, a shrink-wrapped pouch, a vacuum-sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box or a bag. The kit may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kit may be sterilized using any appropriate sterilization technique, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art. The kit may also include other components, depending on the specific application, for example, containers, cell culture media, salts, buffers, reagents, syringes, needles, a fabric (e.g., gauze) for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, and the like.
Some aspects of the disclosure provide kits comprising nucleic acid constructs comprising nucleotide sequences encoding various components of the guided editing systems described herein (e.g., including, but not limited to, napDNAbp, reverse transcriptase, polymerase, fusion proteins (e.g., comprising napDNAbp and reverse transcriptase (or more broadly, polymerase)), pegRNA, and complexes comprising fusion proteins and pegRNA, as well as auxiliary elements, such as a second nick-generating component (e.g., a second nick-generating gRNA) and a 5' endogenous DNA flap removal endonuclease to help drive the guided editing process toward editing product formation).
Other aspects of the disclosure provide kits comprising one or more nucleic acid constructs encoding various components of the guided editing systems described herein, e.g., comprising nucleotide sequences encoding components of a guided editing system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the components of the guided editing system.
Some aspects of the disclosure provide kits comprising a nucleic acid construct comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a polymerase, such as a reverse transcriptase, and (b) a heterologous promoter that drives expression of the sequence of (a).
Cells
Cells that may comprise any of the compositions described herein include prokaryotic cells and eukaryotic cells. The methods described herein are useful for delivering Cas9 proteins or guide editors to eukaryotic cells (e.g., mammalian cells such as human cells). In some embodiments, the cells are in vitro (e.g., cultured cells). In some embodiments, the cell is in vivo (e.g., in a subject such as a human subject). In some embodiments, the cells are ex vivo (e.g., isolated from a subject and can be administered back to the same or a different subject).
Mammalian cells of the present disclosure include human cells, primate cells (e.g., Vero cells), rat cells (e.g., GH3 cells, OC23 cells), and mouse cells (e.g., MC3T3 cells). A variety of human cell lines exist, including, but not limited to, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells. In some embodiments, the rAAV vector is delivered into human embryonic kidney (HEK) cells (e.g., HEK 293 or HEK 293T cells). In some embodiments, the rAAV vector is delivered into a stem cell (e.g., a human stem cell), such as a pluripotent stem cell (e.g., a human pluripotent stem cell, including a human induced pluripotent stem cell (hiPSC)). A stem cell refers to a cell that is capable of dividing indefinitely in culture and giving rise to specialized cells. A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but is not capable on its own of sustaining the development of a complete organism. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by forced expression of genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126(4):663-76, 2006, incorporated herein by reference). Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).
Other non-limiting examples of cell lines that may be used in accordance with the present disclosure include 293-T, 3T3, 4T1, 721, 9L, A-549, A172, A20, A253, A2780ADR, A2780cis, A431, ALC, B16, B35, BCP-1, BEAS-2B, bEnd.3, BHK-21, BR 293, bxPC3, C2C12, C3H-10T1/2, C6/36, cal-27, CGR8, CHO, CML T1, CMT, COR-L23/5010, COR-L23/CPR, COR-L23, COS-7, COV-434, CT26, D17, DH82, DU145, duCaP, E14Tg2a, EL4, EM2, EM3, EMT6/AR1, T6/AR10, HB 0.9, HB 3H 9, HCL 1, CML 2, HCP 7, HCP 54, HCP 1, HCP 2; high Five cells, HL-60, HMEC, HT-29, HUVEC, JUVEC, jurkat, JY cells, K562 cells, KCL22, KG1, ku812, KYO1, LNCap, ma-Mel 1, 2, 3..48, MC-38, MCF-10A, MCF-7, MDA-MB-231, MDA-MB-435, MDA-MB-468, MDCK II, MG63, MONO-MAC 6, MOR/0.2R, MRC, MTD-1A, myEnd, NALM-1, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NW-145, OPCN/OPCT Peer, PNT-1A/PNT 2, PTK2, raji, RBL cells, renCa, RIN-5F, RMA/RMAS, S2, saos-2, sf21, sf9, SU 7, SKU 3, SKU 7, SKU 3, 35, SWU 3, SWU 7, and so forth X63, YAC-1 and YAR cells.
Some aspects of the disclosure provide cells comprising any of the constructs disclosed herein. In some embodiments, the host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, the cell is transfected naturally in the subject. In some embodiments, the transfected cells are taken from a subject. In some embodiments, the cells are derived from cells, such as cell lines, taken from the subject. A variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, heLa-S3, huh1, huh4, huh7, HUVEC, HASMC, HEKn, HEKa, miaPaCell, panc 1.1, PC-3, TF1, CTLL-2, C1R, rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, calu1, SW480, SW620, SKOV3, SK-UT, caCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, jurkat, J45.01, LRMB, bcl-1, BC-3, IC21, DLD2, raw264.7, NRK-52E, MRC, MEF, hepG 2, heLa B, heLa T4, COS-1, COS-6, COS-M6-A, BS-C1 monkey kidney epithelium, BAHI-231, HB/3T 3 mouse fibroblast, T3, swiss 3, sword 3, 3-3, and human embryo cells; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-12B, BCP-12B, 3, BHK-21, BR 293.BxPC 3. 
C3H-10T1/2, C6/36, cal-27, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr-/-, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, duCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, heLa, hepa1C1C7, HL-60, HMEC, HT-29, jurkat, JY cells, K562 cells, ku812, KCL22, KG1, KYO1, ma-Mel 1-48 MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MDCK 11, MOR/0.2R, MONO-MAC 6, MTD-1A, myEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, peer, PNT-1A/PNT 2, renCa, RIN-5F, RMA/AS, saos-2 cells, sf-9, SKBr3, T2, T-47D, T, THP1 cell lines, U373, U87, U937, VCaP, vero cells, WM39, WT-49, X63, YAC-1, YAR and transgenic variants thereof.
Cell lines can be obtained from a variety of sources known to those skilled in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, cells transfected with one or more vectors described herein are used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, cells transiently transfected (e.g., transiently transfected with one or more vectors, or transfected with RNA) with components of a CRISPR system as described herein, and modified through the activity of a CRISPR complex, are used to establish a new cell line comprising cells that contain the modification but lack any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more of the vectors described herein, or cell lines derived from such cells, are used to evaluate one or more test compounds.
Vectors
Some aspects of the disclosure relate to delivering a guide editor described herein, or a component thereof (e.g., a split Cas9 protein or a split nucleobase guide editor), to a cell using a recombinant viral vector (e.g., an adeno-associated viral vector, an adenoviral vector, or a herpes simplex viral vector). In the split-PE approach, the N-terminal portion and the C-terminal portion of the guide editor (PE) fusion protein are delivered to the same cell by separate recombinant viral vectors (e.g., adeno-associated viral vectors, adenoviral vectors, or herpes simplex viral vectors), because the full-length Cas9 protein or guide editor exceeds the packaging limit of various viral vectors, e.g., rAAV (about 4.9 kb).
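The packaging constraint described above amounts to simple arithmetic, sketched below for illustration only. The sketch assumes that each vector carries some non-coding overhead (promoter, intein, terminator, ITRs and the like) and uses the approximately 4.9 kb rAAV limit stated above. The function names and all numeric inputs in the example are hypothetical placeholders, not measured values for any particular guide editor.

```python
RAAV_CAPACITY_BP = 4900  # "about 4.9 kb", as stated in the text


def fits_in_one_vector(protein_len_aa: int, overhead_bp: int,
                       capacity_bp: int = RAAV_CAPACITY_BP) -> bool:
    """True if the full coding sequence plus overhead fits in a single vector."""
    return 3 * protein_len_aa + overhead_bp <= capacity_bp


def valid_split_points(protein_len_aa: int, n_overhead_bp: int, c_overhead_bp: int,
                       capacity_bp: int = RAAV_CAPACITY_BP) -> range:
    """Return the range of k such that an N-terminal half of k residues and a
    C-terminal half of (protein_len_aa - k) residues each fit in one vector."""
    max_n = (capacity_bp - n_overhead_bp) // 3  # most residues the N-half vector can hold
    max_c = (capacity_bp - c_overhead_bp) // 3  # most residues the C-half vector can hold
    lo = max(1, protein_len_aa - max_c)
    hi = min(protein_len_aa - 1, max_n)
    return range(lo, hi + 1) if lo <= hi else range(0)


# Hypothetical numbers: a 2000-residue fusion protein, 1000 bp of overhead per vector.
too_big = not fits_in_one_vector(2000, 1000)
splits = valid_split_points(2000, 1000, 1000)
```

Size is only one consideration; the split position would also need to be compatible with intein-mediated reconstitution of the protein, which this sketch does not model.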
Thus, in one embodiment, the present disclosure contemplates vectors capable of delivering a split guide editor or split components thereof. In some embodiments, compositions for delivering a split Cas9 protein or a split guide editor to a cell (e.g., a mammalian cell, a human cell) are provided. In some embodiments, the compositions of the present disclosure comprise: (i) a first recombinant adeno-associated virus (rAAV) particle comprising a first nucleotide sequence encoding an N-terminal portion of a Cas9 protein or guide editor fused at its C-terminus to an intein-N; and (ii) a second recombinant adeno-associated virus (rAAV) particle comprising a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein or guide editor. The rAAV particles of the present disclosure comprise a rAAV vector (i.e., a recombinant genome of the rAAV) encapsidated in the viral capsid proteins.
In some embodiments, the rAAV vector comprises: (1) A heterologous nucleic acid region comprising a first or second nucleotide sequence encoding an N-terminal portion or a C-terminal portion of any of the forms of a split Cas9 protein or a split guide editor described herein; (2) One or more nucleotide sequences comprising a sequence that facilitates expression of a heterologous nucleic acid region (e.g., a promoter), and (3) one or more nucleic acid regions comprising a sequence that facilitates integration of the heterologous nucleic acid region (optionally with one or more nucleic acid regions comprising a sequence that facilitates expression) into the genome of the cell. In some embodiments, the viral sequences that promote integration include Inverted Terminal Repeat (ITR) sequences. In some embodiments, the first or second nucleotide sequence encoding the N-terminal portion or the C-terminal portion of the split Cas9 protein or the split guide editor is flanked on each side by ITR sequences. In some embodiments, the nucleic acid vector further comprises a region encoding an AAV Rep protein as described herein, which is contained within or outside the region flanked by ITRs. The ITR sequences can be derived from any AAV serotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) or can be derived from more than one serotype. In some embodiments, the ITR sequence is derived from AAV2 or AAV6.
Thus, in some embodiments, the rAAV particles disclosed herein comprise at least one rAAV2 particle, rAAV6 particle, rAAV8 particle, rPHP.B particle, rPHP.eB particle, or rAAV9 particle, or a variant thereof. In particular embodiments, the disclosed rAAV particles are rPHP.B particles, rPHP.eB particles, or rAAV9 particles.
ITR sequences and plasmids containing ITR sequences are known in the art and are commercially available (see, e.g., products and services available from Vector Biolabs, Philadelphia, Pa.; Cellbiolabs, San Diego, Calif.; Agilent Technologies, Santa Clara, Calif.; and Addgene, Cambridge, Mass.; and Gene delivery to skeletal muscle results in sustained expression and systemic delivery of a therapeutic protein. Kessler PD, Podsakoff GM, Chen X, McQuiston SA, Colosi PC, Matelis LA, Kurtzman GJ, Byrne BJ. Proc Natl Acad Sci USA. 1996 Nov 26; 93(24):14082-7; and Curtis A. Machida, Methods in Molecular Medicine™, Viral Vectors for Gene Therapy: Methods and Protocols, 10.1385/1-59259-304-6:201, Humana Press Inc., 2003, Chapter 10, Targeted Integration by Adeno-Associated Virus, Matthew D. Weitzman, Samuel M. Young Jr., Toni Cathomen and Richard Jude Samulski; U.S. Pat. Nos. 5,139,941 and 5,962,313, all of which are incorporated herein by reference).
In some embodiments, a rAAV vector of the disclosure comprises one or more regulatory elements to control expression of the heterologous nucleic acid region (e.g., promoters, transcription terminators, and/or other regulatory elements). In some embodiments, the first and/or second nucleotide sequence is operably linked to one or more (e.g., 1, 2, 3, 4, 5, or more) transcription terminators. Non-limiting examples of transcription terminators that may be used in accordance with the present disclosure include the transcription terminators of the bovine growth hormone gene (bGH), the human growth hormone gene (hGH), SV40, CW3, or a combination thereof. The efficiencies of several transcription terminators have been tested to determine their respective effects on the expression level of the split Cas9 protein or the split guide editor. In some embodiments, the transcription terminator used in the present disclosure is a bGH transcription terminator. In some embodiments, the rAAV vector further comprises a woodchuck hepatitis virus posttranscriptional regulatory element (WPRE). In certain embodiments, the WPRE is a truncated WPRE sequence, such as "W3". In some embodiments, the WPRE is inserted 5' of the transcription terminator. Such sequences, when transcribed, form a tertiary structure that enhances expression, in particular from viral vectors.
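The element ordering described in the preceding paragraphs (a payload flanked on each side by ITRs, with the WPRE placed 5' of the transcription terminator) can be expressed as a simple consistency check. The following is a minimal sketch only: the element labels and the function name are hypothetical, and the promoter-before-coding-sequence rule is a general assumption about expression cassettes rather than a requirement recited above.

```python
from typing import List


def layout_problems(elements: List[str]) -> List[str]:
    """Check a 5'-to-3' ordered list of element labels and return any problems found."""
    problems = []

    # The payload is described as flanked on each side by ITR sequences.
    if len(elements) < 3 or elements[0] != "ITR" or elements[-1] != "ITR":
        problems.append("payload is not flanked by ITRs")

    def misplaced(upstream: str, downstream: str) -> bool:
        # True only if both elements are present and in the wrong order.
        return (upstream in elements and downstream in elements
                and elements.index(upstream) > elements.index(downstream))

    # Assumption: the promoter sits upstream of the sequence it drives.
    if misplaced("promoter", "coding_sequence"):
        problems.append("promoter is not 5' of the coding sequence")

    # From the text: the WPRE is inserted 5' of the transcription terminator.
    if misplaced("WPRE", "terminator"):
        problems.append("WPRE is not 5' of the transcription terminator")

    return problems


good = ["ITR", "promoter", "coding_sequence", "WPRE", "terminator", "ITR"]
bad = ["ITR", "promoter", "coding_sequence", "terminator", "WPRE"]
good_report = layout_problems(good)
bad_report = layout_problems(bad)
```

A check of this kind is useful mainly as a guard when cassettes are assembled programmatically from parts lists; it says nothing about whether the assembled construct expresses well.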
In some embodiments, the vector used herein may encode a guide editor or any component thereof (e.g., napDNAbp, linker, or polymerase). In addition, the vectors used herein may encode a pegRNA and/or a helper gRNA for second strand nick generation. The vector may be capable of driving expression of one or more coding sequences in a cell. In some embodiments, the cell may be a prokaryotic cell, such as a bacterial cell. In some embodiments, the cell may be a eukaryotic cell, such as a yeast, plant, insect, or mammalian cell. In some embodiments, the eukaryotic cell may be a mammalian cell. In some embodiments, the eukaryotic cell may be a rodent cell. In some embodiments, the eukaryotic cell may be a human cell. Suitable promoters for driving expression in different types of cells are known in the art. In some embodiments, the promoter may be wild-type. In other embodiments, the promoter may be modified for more efficient expression. In other embodiments, the promoter may be truncated but retain its function. For example, the promoter may have a normal size or reduced size suitable for proper packaging of the vector into a virus.
In some embodiments, promoters usable in the guide editor vectors may be constitutive, inducible, or tissue-specific. In some embodiments, the promoter may be a constitutive promoter. Non-limiting exemplary constitutive promoters include the cytomegalovirus immediate early promoter (CMV), the simian virus 40 (SV40) promoter, the adenovirus major late (MLP) promoter, the Rous sarcoma virus (RSV) promoter, the mouse mammary tumor virus (MMTV) promoter, the phosphoglycerate kinase (PGK) promoter, the elongation factor-α (EF1a) promoter, the ubiquitin promoter, the actin promoter, the tubulin promoter, the immunoglobulin promoter, functional fragments thereof, or a combination of any of the foregoing. In some embodiments, the promoter may be a CMV promoter. In some embodiments, the promoter may be a truncated CMV promoter. In other embodiments, the promoter may be an EF1a promoter. In some embodiments, the promoter may be an inducible promoter. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be a promoter with low basal (non-induced) expression levels, such as Tet- Promoter (Clontech). In some embodiments, the promoter may be a tissue-specific promoter. In some embodiments, the tissue-specific promoter is expressed exclusively or predominantly in liver tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-beta promoter, Mb promoter, Nphsl promoter, OG-2 promoter, SP-B promoter, SYN1 promoter and WASP promoter.
In some embodiments, the guide editor vector (e.g., including any vector encoding a guide editor and/or a pegRNA and/or an auxiliary second strand-nicking gRNA) may comprise an inducible promoter so that expression begins only after the vector is delivered to the target cell. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be a promoter with a low basal (non-induced) expression level, such as the Tet-On® promoter (Clontech).
In other embodiments, the guide editor vector (e.g., including any vector encoding the guide editor and/or the pegRNA and/or the auxiliary second strand-nicking gRNA) may comprise a tissue-specific promoter so that expression begins only after the vector is delivered to a particular tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.
In some embodiments, the nucleotide sequence encoding the pegRNA (or any guide RNA used in connection with guided editing) may be operably linked to at least one transcriptional or translational control sequence. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to at least one promoter. In some embodiments, the promoter is recognized by RNA polymerase III (Pol III). Non-limiting examples of Pol III promoters include U6, H1, and tRNA promoters. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human U6 promoter. In other embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human H1 promoter. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human tRNA promoter. In embodiments with more than one guide RNA, the promoters used to drive expression may be the same or different. In some embodiments, the nucleotides encoding the crRNA of the guide RNA and the nucleotides encoding the tracrRNA of the guide RNA may be provided on the same vector. In some embodiments, the nucleotides encoding the crRNA and the nucleotides encoding the tracrRNA may be driven by the same promoter. In some embodiments, the crRNA and tracrRNA may be transcribed as a single transcript. For example, the crRNA and tracrRNA may be processed from the single transcript to form a dual-molecule guide RNA. Alternatively, the crRNA and tracrRNA may be transcribed as a single-molecule guide RNA.
In some embodiments, the nucleotide sequence encoding the guide RNA may be located on the same vector that includes the nucleotide sequence encoding the guide editor. In some embodiments, the expression of the guide RNA and the guide editor may be driven by their respective promoters. In some embodiments, the expression of the guide RNA may be driven by the same promoter that drives the expression of the guide editor. In some embodiments, the guide RNA and guide editor transcripts may be contained within a single transcript. For example, the guide RNA can be within an untranslated region (UTR) of the Cas9 protein transcript. In some embodiments, the guide RNA may be within the 5' UTR of the guide editor transcript. In other embodiments, the guide RNA may be within the 3' UTR of the PE fusion protein transcript. In some embodiments, the intracellular half-life of the guide editor transcript may be reduced by including a guide RNA within its 3' UTR and thereby shortening the length of its 3' UTR. In other embodiments, the guide RNA may be within an intron of the PE fusion protein transcript. In some embodiments, suitable splice sites may be added at the intron in which the guide RNA is located, such that the guide RNA is correctly spliced out of the transcript. In some embodiments, expression of the Cas9 protein and the guide RNA in close proximity on the same vector may promote more efficient formation of the CRISPR complex.
The guide editor vector system may include one vector, or two vectors, or three vectors, or four vectors, or five vectors, or more. In some embodiments, the vector system may comprise one single vector encoding both the guide editor and the pegRNA. In other embodiments, the vector system may comprise two vectors, one of which encodes the guide editor and the other encodes the pegRNA. In other embodiments, the vector system may comprise three vectors, wherein the third vector encodes the second strand-nicking gRNA used in the methods herein.
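As a purely illustrative sketch of the one-, two-, and three-vector arrangements described above, the following Python fragment records which components each vector encodes and reports whether a given vector system covers both the guide editor and the pegRNA. The class name, function name, component labels, and vector names are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass

# Hypothetical component labels used only for this illustration.
REQUIRED_COMPONENTS = {"guide_editor", "pegRNA"}
OPTIONAL_COMPONENTS = {"nicking_gRNA"}  # second strand-nicking gRNA


@dataclass(frozen=True)
class Vector:
    """One vector and the set of components it encodes."""
    name: str
    encodes: frozenset


def describe_vector_system(vectors):
    """Summarize a vector system: vector count and component coverage."""
    # Pool every component encoded by any vector in the system.
    encoded = set().union(*(v.encodes for v in vectors))
    unknown = encoded - REQUIRED_COMPONENTS - OPTIONAL_COMPONENTS
    if unknown:
        raise ValueError(f"unrecognized components: {sorted(unknown)}")
    return {
        "vector_count": len(vectors),
        "complete": REQUIRED_COMPONENTS <= encoded,
        "has_nicking_gRNA": "nicking_gRNA" in encoded,
    }


# Three-vector arrangement: editor, pegRNA, and nicking gRNA on separate vectors.
three_vector_system = [
    Vector("vector_1", frozenset({"guide_editor"})),
    Vector("vector_2", frozenset({"pegRNA"})),
    Vector("vector_3", frozenset({"nicking_gRNA"})),
]
summary = describe_vector_system(three_vector_system)
```

A single-vector system would be represented by one `Vector` whose `encodes` set contains both required components.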
In some embodiments, the composition comprising rAAV particles (in any form contemplated herein) further comprises a pharmaceutically acceptable carrier. In some embodiments, the compositions are formulated in a suitable pharmaceutical carrier for administration to a human or animal subject.
Some examples of materials that may be used as pharmaceutically acceptable carriers include: (1) saccharides such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, methyl cellulose, ethyl cellulose, microcrystalline cellulose, cellulose acetate, etc.; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricants such as magnesium stearate, sodium lauryl sulfate, talc, and the like; (8) excipients such as cocoa butter, suppository waxes, etc.; (9) oils such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, soybean oil, and the like; (10) glycols, such as propylene glycol; (11) polyols such as glycerol, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents such as magnesium hydroxide, aluminum hydroxide, etc.; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethanol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components such as serum albumin, high density lipoprotein and low density lipoprotein; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances used in pharmaceutical formulations. Wetting agents, colorants, release agents, coating agents, sweeteners, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. Terms such as "excipient," "carrier," "pharmaceutically acceptable carrier," and the like are used interchangeably herein.
Delivery methods
In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention also provides cells produced by such methods, as well as organisms (e.g., animals, plants, or fungi) comprising or produced by such cells. In some embodiments, the base editor is delivered to the cell in combination with (and optionally in complex with) a guide sequence as described herein.
Exemplary delivery strategies are described elsewhere herein, including vector-based strategies, PE ribonucleoprotein complex delivery, and delivery of PE by mRNA methods.
In some embodiments, delivery methods are provided that include nucleofection, microinjection, gene gun (biolistics), virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
Exemplary nucleic acid delivery methods include lipofection, nucleofection, electroporation, stable genomic integration (e.g., piggyBac), microinjection, gene gun (biolistics), virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described, for example, in U.S. Patent Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are commercially available (e.g., Transfectam™, Lipofectin™, and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)). Cationic and neutral lipids suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery may be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration). Delivery may be achieved through the use of RNP complexes.
The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to those skilled in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
In other embodiments, the delivery methods and vectors provided herein are RNP complexes. RNP delivery of fusion proteins markedly increases the DNA specificity of base editing. RNP delivery of fusion proteins leads to decoupling of on-target and off-target DNA editing. RNP delivery eliminates off-target editing at non-repetitive sites while maintaining on-target editing comparable to plasmid delivery, and greatly reduces off-target DNA editing even at the highly repetitive VEGFA site 2. See Rees, H.A. et al., Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery, Nat. Commun. 8, 15790 (2017), U.S. Patent No. 9,526,784, issued December 27, 2016, and U.S. Patent No. 9,737,604, issued August 22, 2017, each of which is incorporated herein by reference.
Other methods of delivering nucleic acids to cells are known to those of skill in the art. See, for example, US2003/0087817, incorporated herein by reference.
Other aspects of the disclosure provide methods of delivering a guided editor construct to a cell to form a complete functional guided editor within the cell. For example, in some embodiments, the cells are contacted with a composition described herein (e.g., a composition comprising a nucleotide sequence encoding a split Cas9 or a split guide editor or an AAV particle comprising a nucleic acid vector comprising such a nucleotide sequence). In some embodiments, the contacting results in the delivery of such nucleotide sequences into the cell, wherein the N-terminal portion of the Cas9 protein or guide editor and the C-terminal portion of the Cas9 protein or guide editor are expressed and linked within the cell to form the complete Cas9 protein or complete guide editor.
It is to be understood that any of the rAAV particles, nucleic acid molecules, or compositions provided herein can be stably or transiently introduced into a cell in any suitable manner. In some embodiments, the disclosed proteins can be transfected into cells. In some embodiments, the cell may be transduced or transfected with a nucleic acid molecule. For example, a nucleic acid molecule encoding a split protein, or an rAAV particle containing a viral genome encoding one or more nucleic acid molecules, can be introduced by transduction (e.g., with a virus encoding the split protein) or by transfection (e.g., with a plasmid encoding the split protein). Such transduction may be stable or transient. In some embodiments, a cell expressing or containing the split protein can be transduced or transfected with one or more guide RNA sequences, for example, when delivering a split Cas9 (e.g., nCas9) protein. In some embodiments, plasmids expressing the split proteins can be introduced into cells by electroporation, transient transfection (e.g., lipofection), stable genomic integration (e.g., piggyBac), viral transduction, or other methods known to those of skill in the art.
In certain embodiments, the compositions provided herein comprise lipids and/or polymers. In certain embodiments, the lipid and/or polymer is cationic. The preparation of such lipid particles is well known. See, for example, U.S. Pat. nos. 4,880,635;4,906,477;4,911,928;4,917,951;4,920,016;4,921,757; and 9,737,604, each of which is incorporated herein by reference.
The guide RNA sequence can be 15-100 nucleotides in length and comprise a sequence of at least 10, at least 15, or at least 20 consecutive nucleotides that is complementary to the target nucleotide sequence. The guide RNA can comprise 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 consecutive nucleotides complementary to the target nucleotide sequence. The guide RNA may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
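The length and complementarity criteria stated above can be illustrated with a minimal Python sketch. It assumes that the target is supplied as a single DNA strand written 5' to 3', that complementarity means Watson-Crick pairing in antiparallel orientation, and that the thresholds are those recited in the preceding paragraph; the function names are hypothetical.

```python
# Watson-Crick partner (as DNA) of each RNA base in the guide.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement_as_dna(rna):
    """Return the DNA strand (5'->3') that pairs with the given RNA (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def longest_complementary_run(guide_rna, target_dna):
    """Length of the longest stretch of consecutive guide nucleotides that is
    complementary to a stretch of the target strand (longest common substring
    between the guide's reverse complement and the target)."""
    probe = reverse_complement_as_dna(guide_rna)
    target = target_dna.upper()
    best = 0
    previous = [0] * (len(target) + 1)
    for i in range(1, len(probe) + 1):
        current = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if probe[i - 1] == target[j - 1]:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def guide_meets_criteria(guide_rna, target_dna, min_run=15):
    """Check the 15-100 nt length range and a minimum complementary run
    (min_run may be set to 10, 15, or 20 per the paragraph above)."""
    if not 15 <= len(guide_rna) <= 100:
        return False
    return longest_complementary_run(guide_rna, target_dna) >= min_run
```

For a guide whose full 20-nucleotide sequence pairs with the target, `longest_complementary_run` would return 20 and the check passes at any of the three thresholds.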
In some embodiments, the target nucleotide sequence is a DNA sequence in a genome, such as a eukaryotic genome. In certain embodiments, the target nucleotide sequence is in a mammalian (e.g., human) genome.
For example, the compositions of the present disclosure may be administered in unit doses or packaged. The term "unit dose" as used in the pharmaceutical compositions of the present disclosure refers to physically discrete units suitable as unitary dosages for subjects, each unit containing a predetermined quantity of active material in association with a desired diluent (i.e., carrier or vehicle), calculated to produce the desired therapeutic effect.
Treatment of a disease or disorder includes delaying the development or progression of the disease, or reducing the severity of the disease. Treating the disease does not necessarily require a cure.
As used herein, "delaying" the development of a disease means deferring, hindering, slowing, retarding, stabilizing, and/or postponing progression of the disease. This delay may be of varying lengths of time, depending on the history of the disease and/or the individual being treated. A method that "delays" or alleviates the development of a disease, or delays the onset of a disease, is a method that reduces the likelihood of developing one or more symptoms of the disease within a given time frame and/or reduces the extent of the symptoms within a given time frame, as compared to not using the method. Such comparisons are typically based on clinical studies using a number of subjects sufficient to yield statistically significant results.
"Development" or "progression" of a disease refers to the initial manifestations and/or subsequent progression of the disease. Development of the disease can be detected and assessed using standard clinical techniques well known in the art. However, development also refers to progression that may not be detectable. For the purposes of this disclosure, development or progression refers to the biological course of the symptoms. "Development" includes occurrence, recurrence, and onset.
As used herein, "onset" or "occurrence" of a disease includes initial onset and/or recurrence. Depending on the type of disease or the site of the disease to be treated, the isolated polypeptide or pharmaceutical composition may be administered to the subject using conventional methods known to one of ordinary skill in the medical arts.
Without further elaboration, it is believed that one skilled in the art can, based on the preceding description, utilize the present disclosure to its fullest extent. Accordingly, the following specific embodiments are to be construed as merely illustrative, and not as limiting the remainder of the disclosure in any way. All publications cited herein are incorporated by reference for the purposes or subject matter referenced herein.
Description of the embodiments
The present disclosure also relates to the following non-limiting numbered paragraphs.
1. A guided editing guide RNA (PEgRNA), comprising:
a spacer sequence comprising a region complementary to a target strand of a double-stranded target DNA sequence;
a gRNA core that associates with a nucleic acid programmable DNA binding protein (napDNAbp); and
a nucleic acid extension arm comprising a DNA synthesis template and a primer binding site,
wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence;
wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double-stranded target DNA sequence and comprises one or more nucleotide edits compared to the double-stranded target DNA sequence;
and wherein the extension arm further comprises a nucleic acid portion selected from the group consisting of: a toeloop, hairpin, stem loop, pseudoknot, aptamer, G-quadruplex, tRNA, riboswitch, or ribozyme.
2. The PEgRNA of paragraph 1, wherein the nucleic acid portion is at the 3' end of the extension arm.
3. The PEgRNA of paragraph 1, wherein the nucleic acid portion is at the 3' end of the extension arm.
4. The PEgRNA of paragraph 1, wherein the nucleic acid portion comprises a frameshift pseudoknot (Mpknot) from the Moloney murine leukemia virus (M-MLV) genome, optionally wherein the Mpknot is an Mpknot1 portion having a nucleotide sequence selected from the group consisting of: SEQ ID NO:3930 (Mpknot1), SEQ ID NO:3931 (Mpknot1 3' trimmed), SEQ ID NO:3932 (Mpknot1 with 5' extra), SEQ ID NO:3933 (Mpknot1 U38A), SEQ ID NO:3934 (Mpknot1 U38A A C), SEQ ID NO:3935 (MMLC A29C), SEQ ID NO:3936 (Mpknot1 with 5' extra and U38A), SEQ ID NO:3937 (Mpknot1 with 5' extra and U38A A C), and SEQ ID NO:3938 (Mpknot1 with 5' extra and A29C), or a nucleotide sequence having at least 80% sequence identity thereto.
5. The PEgRNA of paragraph 1, wherein the nucleic acid portion comprises a G-quadruplex, optionally wherein the G-quadruplex has a nucleotide sequence selected from the group consisting of: SEQ ID NO. 3939 (tns 1), SEQ ID NO. 3940 (stk 40), SEQ ID NO. 3941 (apc 2), SEQ ID NO. 3942 (ceacam 4), SEQ ID NO. 3943 (pitpnm 3), SEQ ID NO. 3944 (rlf), SEQ ID NO. 3945 (erc 1), SEQ ID NO. 3946 (ube c), SEQ ID NO. 3947 (taf 15), SEQ ID NO. 3948 (stard 3) and SEQ ID NO. 3949 (g 2), or a nucleotide sequence having at least 80% sequence identity thereto.
6. The PEgRNA of paragraph 1, wherein the nucleic acid portion comprises a prequeosine1 riboswitch aptamer.
7. The PEgRNA of paragraph 6, wherein the nucleic acid portion comprises an evolved prequeosine1-1 riboswitch aptamer (evopreQ1), optionally wherein the evopreQ1 has a nucleotide sequence selected from the group consisting of: SEQ ID NO:3950 (evopreQ1), SEQ ID NO:3951 (evopreQ1 motif 1), SEQ ID NO:3952 (evopreQ1 motif 2), SEQ ID NO:3953 (evopreQ1 motif 3), SEQ ID NO:3954 (shorter preQ1-1), SEQ ID NO:3955 (preQ1-1 G5C (mut1)) and SEQ ID NO:3956 (preQ1-1 G15C (mut2)), or a nucleotide sequence having at least 80% sequence identity thereto.
8. The PEgRNA of paragraph 1, wherein the nucleic acid portion comprises a tRNA portion having the nucleotide sequence of SEQ ID NO:3957 or a nucleotide sequence having at least 80% sequence identity thereto.
9. The PEgRNA of paragraph 1, wherein the nucleic acid portion has the nucleotide sequence of SEQ ID No. 3958 (xrn 1), or a nucleotide sequence having at least 80% sequence identity thereto.
10. The PEgRNA of paragraph 1, wherein the nucleic acid portion comprises the P4-P6 domain of the group I intron, optionally wherein the P4-P6 domain has the nucleotide sequence of SEQ ID NO:3959, or a nucleotide sequence having at least 80% sequence identity thereto.
11. The PEgRNA of any one of paragraphs 1-10, wherein the PEgRNA further comprises a linker.
12. The PEgRNA of paragraph 11, wherein the linker is between the nucleic acid portion and another component of the PEgRNA.
13. The PEgRNA of paragraph 11, wherein the linker is between the nucleic acid portion and the primer binding site or between the gRNA core and the nucleic acid portion.
14. The PEgRNA of paragraph 11, wherein the linker comprises a nucleotide sequence selected from the group consisting of: SEQ ID NOs: 3960, 3961, 3962, 3963, 3964, 3965, 3966, 3967, 3968, 3969, 3970 and 3971.
15. The PEgRNA of paragraph 11, wherein the linker is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides in length, wherein the linker is no longer than 50 nucleotides.
16. The PEgRNA of paragraph 11, wherein the linker is 1 to 5 nucleotides, 5 to 10 nucleotides, 10 to 20 nucleotides, 15 to 25 nucleotides, 20 to 30 nucleotides, 25 to 35 nucleotides, 30 to 40 nucleotides, 35 to 45 nucleotides, or 40 to 50 nucleotides in length; or wherein the linker is 1 to 50, 3 to 50, 5 to 50, or 8 to 50 nucleotides in length.
17. The PEgRNA of paragraph 11, wherein the linker is 8 nucleotides in length.
18. The PEgRNA of any one of paragraphs 4-17, wherein the extension arm is located at the 3 'or 5' end of a guide RNA, and wherein the nucleic acid extension arm comprises DNA or RNA.
19. The PEgRNA of paragraph 18, wherein the primer binding site comprises a region complementary to a region upstream of a nick site in the non-target strand of the target DNA sequence, wherein the nick site is characteristic of napDNAbp.
20. The PEgRNA of paragraph 19, wherein the DNA synthesis template comprises a region complementary to a region downstream of the nick site in the non-target strand of the target DNA sequence.
21. The PEgRNA of paragraph 18, wherein the primer binding site comprises a region complementary to a region immediately upstream of a nick site in a non-target strand of the target DNA sequence.
22. The PEgRNA of paragraph 18, wherein the nucleic acid extension arm is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, at least 41 nucleotides, at least 42 nucleotides, at least 46 nucleotides, at least 48 nucleotides, at least 46 nucleotides; or wherein the nucleic acid extension arm is 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, 90 to 100, 100 to 110, 110 to 120, 20 to 120, 40 to 120, 60 to 120, 80 to 120, 100 to 120, 40 to 100, 60 to 100, 80 to 100, or 60 to 80 nucleotides in length; or wherein the nucleic acid extension arm is 15 to 300, 20 to 250, 20 to 200, 20 to 150, 25 to 150, 15 to 100, 20 to 100, or 25 to 100 nucleotides in length; or wherein the nucleic acid extension arm has a length of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides.
23. The PEgRNA of paragraph 18, wherein the DNA synthesis template is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length; or wherein the DNA synthesis template is 1 to 10, 5 to 15, 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, 90 to 100, 10 to 30, 10 to 40, 10 to 50, 10 to 60, 20 to 40, 20 to 60, 30 to 100, 40 to 100, 50 to 100, 60 to 100, or 70 to 100 nucleotides in length; wherein the DNA synthesis template is 5 to 300, 5 to 250, 15 to 200, 15 to 150, 5 to 100, 10 to 100, or 15 to 100 nucleotides in length; or wherein the length of the DNA synthesis template is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, or 80 nucleotides.
24. The PEgRNA of paragraph 23, wherein the DNA synthesis template is 15 to 35 nucleotides in length.
25. The PEgRNA of paragraph 18, wherein the primer binding site is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length, or wherein the primer binding site is 1 to 10 nucleotides, 5 to 10 nucleotides, 10 to 15 nucleotides, 10 to 20 nucleotides, 8 to 20 nucleotides, 15 to 25 nucleotides, 20 to 30 nucleotides, or 25 to 30 nucleotides in length; wherein the primer binding site is 3 to 60, 5 to 60, 8 to 50, or 12 to 50 nucleotides in length, or wherein the primer binding site is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides in length.
26. The PEgRNA of any one of paragraphs 1-25, wherein the gRNA core comprises a direct repeat, wherein the direct repeat does not contain four or more consecutive A-U base pairs.
27. The PEgRNA of paragraph 26, wherein the direct repeat comprises the nucleotide sequence UUUA.
28. The PEgRNA of any one of paragraphs 1-27, wherein the PEgRNA comprises chemically or biologically modified nucleotides or nucleotide analogs.
29. The PEgRNA of paragraph 28, wherein the three consecutive nucleotides at the 5 'end of the PEgRNA comprise one or more chemically modified nucleotides, and/or wherein the three consecutive nucleotides at the 3' end of the PEgRNA comprise one or more chemically modified nucleotides.
30. A guided editor system comprising:
(a) a nucleic acid programmable DNA binding protein (napDNAbp);
(b) a domain comprising DNA polymerase activity; and
(c) the PEgRNA of any one of paragraphs 1-29.
31. The guided editor system of paragraph 30, wherein the PEgRNA and the napDNAbp and/or the domain comprising DNA polymerase activity form a complex.
32. The guided editor system of paragraphs 30 or 31, wherein the domain having DNA polymerase activity and the napDNAbp are fused to form a fusion protein.
33. The guided editor system of any of paragraphs 30-32, wherein the napDNAbp has nickase activity.
34. The guided editor system of any of paragraphs 30-32, wherein the napDNAbp is a Cas9 protein or a variant thereof.
35. The guide editor system of paragraph 34, wherein the napDNAbp is nuclease active Cas9, nuclease inactive Cas9 (dCas 9), or Cas9 nickase (nCas 9).
36. The guided editor system of paragraph 35, wherein the napDNAbp is Cas9 nickase (nCas 9).
37. The guided editor system of any of paragraphs 30-32, wherein the napDNAbp is selected from the group consisting of: cas9, cas12e, cas12d, cas12a, cas12b1, cas13a, cas12c, and Argonaute, and optionally has nickase activity.
38. The guided editor system of any of paragraphs 30-37, wherein the domain comprising RNA-dependent DNA polymerase activity is a reverse transcriptase comprising an amino acid sequence having at least 80%, 85%, 90%, 95%, 98% or 99% sequence identity to an amino acid sequence of any of SEQ ID NOs 89-100, 105-122, 128-129, 132, 139, 143, 149, 154, 159, 235, 454, 471, 516, 662, 700, 701-716, 739-741, and 766.
39. The guided editor system of any of paragraphs 30-37, wherein the domain comprising RNA dependent DNA polymerase activity is a reverse transcriptase comprising any one of the amino acid sequences of SEQ ID NOs 89-100, 105-122, 128-129, 132, 139, 143, 149, 154, 159, 235, 454, 471, 516, 662, 700, 701-716, 739-741, and 766.
40. The guided editor system of paragraph 38, wherein the reverse transcriptase is Moloney murine leukemia virus reverse transcriptase (M-MLV RT).
41. The guided editor system of paragraph 40, wherein the RNA-dependent DNA polymerase domain comprises a variant Moloney murine leukemia virus reverse transcriptase (M-MLV RT) domain, wherein the variant M-MLV RT domain comprises one or more of the following mutations relative to the amino acid sequence of SEQ ID NO: 89: P51X, S67X, E69X, L139X, T197X, D200X, H X, F209X, E302X, T209X, F X, W313X, T X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E X or D653X, and wherein X is any amino acid.
42. The guided editor system of paragraph 41, wherein the variant M-MLV RT domain comprises one or more of the following mutations relative to the amino acid sequence of SEQ ID NO: 89: P51L, S67K, E K, L139P, T197A, D200N, H R, F209N, E K, E4815 306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E35562Q, D583N, H594Q, L603W, E K or D653N.
43. The guided editor system of paragraph 41, wherein the variant M-MLV RT domain comprises the amino acid substitutions D200N, T330P and L603W relative to the amino acid sequence of SEQ ID NO:89, optionally wherein the M-MLV RT domain comprises the amino acid substitutions D200N, T306K, W313F, T330P and L603W relative to the amino acid sequence of SEQ ID NO: 89.
44. The guided editor system of any of paragraphs 30-37, wherein the domain comprising RNA-dependent DNA polymerase activity is a naturally occurring reverse transcriptase from a retrovirus or retrotransposon.
45. A nucleic acid molecule encoding the PEgRNA of any one of paragraphs 1-29.
46. A nucleic acid molecule encoding a napDNAbp and/or a domain with DNA polymerase activity of any one of paragraphs 30-45.
47. An expression vector comprising the nucleic acid molecule of paragraph 45 and/or the nucleic acid molecule of paragraph 46, optionally wherein the nucleic acid molecule is under the control of a promoter.
48. The expression vector of paragraph 47, wherein said promoter is a Pol III promoter.
49. The expression vector of paragraph 47, wherein the promoter is a U6 promoter.
50. The expression vector of paragraph 47, wherein the promoter is a U6, U6v4, U6v7 or U6v9 promoter or fragment thereof.
51. A cell comprising the PEgRNA of any one of paragraphs 1-29.
52. A cell comprising the guided editor system of any of paragraphs 30-44, the nucleic acid molecule of paragraph 45 or 46, or the expression vector of any of paragraphs 47-50.
53. A Lipid Nanoparticle (LNP) comprising the PEgRNA of any one of paragraphs 1-29, the guided editor system of any one of paragraphs 30-44, or the nucleic acid molecule of paragraphs 45 or 46.
54. A ribonucleoprotein complex (RNP) comprising the PEgRNA of any one of paragraphs 1-29, the guided editor system of any one of paragraphs 30-44, or the nucleic acid molecule of paragraphs 45 or 46.
55. A pharmaceutical composition comprising: (i) The PEgRNA of any one of paragraphs 1-29, the guidance editor system of any one of paragraphs 30-44, or the nucleic acid molecule PEgRNA of paragraph 45 or 46, the expression vector of any one of paragraphs 47-50, the cell of paragraph 51 or 52, the LNP of paragraph 53 or the RNP of paragraph 54, and (ii) a pharmaceutically acceptable excipient.
56. A kit composition comprising: (i) The PEgRNA of any one of paragraphs 1-29, the guidance editor system of any one of paragraphs 30-44, or the nucleic acid molecule of paragraph 45 or 46, the expression vector of any one of paragraphs 47-50, the cell of paragraph 51 or 52, the LNP of paragraph 53, or the RNP PEgRNA (ii) of paragraph 54, a set of instructions for performing guidance editing.
57. A method of guided editing comprising contacting a target DNA sequence with the PEgRNA of any one of paragraphs 1-29 and a guided editor comprising napDNAbp and a domain having DNA polymerase activity, wherein the contacting installs one or more nucleotide edits in the target DNA sequence.
58. The method of paragraph 57, wherein the editing efficiency at the target DNA sequence is increased as compared to the editing efficiency obtained when the target DNA sequence is contacted with the guided editor and a control PEgRNA that does not contain the nucleic acid portion.
59. The method of paragraph 58, wherein the editing efficiency is increased by at least a factor of 1.5.
60. The method of paragraph 58, wherein the editing efficiency is increased by at least a factor of 2.
61. The method of paragraph 58, wherein the editing efficiency is increased by at least 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold.
62. The method of any one of paragraphs 57-61, wherein the napDNAbp has nickase activity.
63. The method of any of paragraphs 57-62, wherein the napDNAbp is a Cas9 protein or variant thereof.
64. The method of paragraph 63, wherein the napDNAbp is nuclease active Cas9, nuclease inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
65. The method of paragraph 64, wherein the napDNAbp is Cas9 nickase (nCas9).
66. The method of any one of paragraphs 57-62, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, and Argonaute, and optionally has nickase activity.
67. The method of any one of paragraphs 57-66, wherein the domain comprising RNA-dependent DNA polymerase activity is a reverse transcriptase comprising any one of the amino acid sequences of SEQ ID NOs 89-100, 105-122, 128-129, 132, 139, 143, 149, 154, 159, 235, 454, 471, 516, 662, 700, 701-716, 739-741 and 766.
68. The method of any one of paragraphs 57-66, wherein the domain comprising RNA dependent DNA polymerase activity is a reverse transcriptase comprising an amino acid sequence having at least 80%, 85%, 90%, 95%, 98%, 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs 89-100, 105-122, 128-129, 132, 139, 143, 149, 154, 159, 235, 454, 471, 516, 662, 700, 701-716, 739-741 and 766.
69. The method of paragraph 68, wherein the reverse transcriptase is Moloney murine leukemia virus reverse transcriptase (M-MLV RT).
70. The method of paragraph 69, wherein the RNA-dependent DNA polymerase domain comprises a variant Moloney murine leukemia virus reverse transcriptase (M-MLV RT) domain, wherein the variant M-MLV RT domain comprises one or more of the following mutations relative to the amino acid sequence of SEQ ID NO: 89: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X or D653X, wherein X is any amino acid.
71. The method of paragraph 70, wherein the variant M-MLV RT domain comprises one or more of the following mutations relative to the amino acid sequence of SEQ ID NO: 89: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K or D653N.
72. The method of paragraph 70, wherein the variant M-MLV RT domain comprises the amino acid substitutions D200N, T330P and L603W relative to the amino acid sequence of SEQ ID NO: 89, optionally wherein the M-MLV RT domain comprises the amino acid substitutions D200N, T306K, W313F, T330P and L603W relative to the amino acid sequence of SEQ ID NO: 89.
73. The method of any of paragraphs 57-66, wherein the domain comprising RNA-dependent DNA polymerase activity is a naturally occurring reverse transcriptase from a retrovirus or retrotransposon.
74. A method of installing nucleotide edits in a double-stranded target DNA sequence, the method comprising: contacting the double-stranded target DNA sequence with a guided editor comprising a nucleic acid programmable DNA binding protein (napDNAbp), a DNA polymerase, and a guided editing guide RNA (PEgRNA), wherein the PEgRNA comprises:
(a) A spacer sequence comprising a region of complementarity that hybridizes to a target strand of a double-stranded target DNA sequence;
(b) A nucleic acid extension arm comprising a DNA synthesis template and a primer binding site,
(c) A gRNA core associated with a nucleic acid programmable DNA binding protein (napDNAbp),
(d) A nucleic acid moiety selected from the group consisting of: toe loop, hairpin, stem loop, pseudoknot, aptamer, G-quadruplex, tRNA, riboswitch or ribozyme; and
(e) A linker connecting the nucleic acid portion to another component of the PEgRNA,
wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence;
wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double-stranded target DNA sequence and comprises one or more nucleotide edits compared to the double-stranded target DNA sequence, and wherein the linker is designed by a computational model.
75. The PEgRNA of paragraph 75, wherein the linker comprises a nucleotide sequence selected from the group consisting of: SEQ ID NOs: 3960, 3961, 3962, 3963, 3964, 3965, 3966, 3967, 3968, 3969, 3970 and 3971.
76. A method of identifying at least one nucleic acid linker for connecting a component of a guided editing guide RNA (PEgRNA) to a nucleic acid portion, the method comprising:
using at least one computer hardware processor to perform:
generating a plurality of nucleic acid linker candidates including a first nucleic acid linker candidate;
identifying the at least one nucleic acid linker from the plurality of nucleic acid linker candidates at least in part by:
calculating a plurality of scores for each of at least some of the plurality of nucleic acid linker candidates, the calculating comprising calculating a first set of scores for the first nucleic acid linker candidate, the first set of scores comprising:
a first score indicative of a degree of interaction between the first nucleic acid linker candidate and a first region of the PEgRNA;
a second score indicative of a degree of interaction between the first nucleic acid linker candidate and a second region of the PEgRNA; and
identifying the at least one nucleic acid linker from among at least some of the plurality of nucleic acid linker candidates using the calculated plurality of scores; and
outputting information indicative of the at least one nucleic acid linker.
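By way of illustration only, the generating, scoring, and identifying steps recited above can be sketched in Python as follows. This is not the computational model of the disclosure. The interaction measure used here (the longest stretch of the candidate that could form Watson-Crick pairs with a region), the exhaustive candidate generator, the candidate length, and every function name and example sequence are assumptions introduced for this sketch; a structure-prediction tool could be substituted for the scoring function.

```python
import itertools

# Watson-Crick pairs only; G-U wobble pairing is ignored in this simplified sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_shared_run(a, b):
    """Length of the longest contiguous substring common to a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for base_a in a:
        cur = [0] * (len(b) + 1)
        for j, base_b in enumerate(b, start=1):
            if base_a == base_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def avoidance_score(candidate, region):
    """Hypothetical score in [0, 1]: 1.0 means no stretch of the candidate
    can pair with the region; 0.0 means the whole candidate can pair."""
    run = longest_shared_run(candidate, reverse_complement(region))
    return 1.0 - run / len(candidate)


def generate_candidates(length, alphabet="ACGU"):
    """Enumerate every sequence of the given length (assumed generator)."""
    return ("".join(p) for p in itertools.product(alphabet, repeat=length))


def identify_linker(first_region, second_region, length=4):
    """Score each candidate against two PEgRNA regions and return the
    best candidate together with its (first score, second score)."""
    scored = []
    for candidate in generate_candidates(length):
        scores = (
            avoidance_score(candidate, first_region),
            avoidance_score(candidate, second_region),
        )
        scored.append((scores, candidate))
    best_scores, best_candidate = max(scored)  # ties broken alphabetically
    return best_candidate, best_scores
```

Calling identify_linker with, for example, a primer binding site as the first region and a spacer as the second region returns one candidate and its two scores; the returned pair corresponds to the "information indicative of the at least one nucleic acid linker" in the final step.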
77. The method of paragraph 77, wherein the first score indicates a degree to which the first nucleic acid linker candidate is predicted to avoid interacting with a first region of the PEgRNA, and wherein the second score indicates a degree to which the first nucleic acid linker candidate is predicted to avoid interacting with a second region of the PEgRNA.
78. The method of paragraph 78, wherein the first region comprises a Primer Binding Site (PBS) of the PEgRNA.
79. The method of paragraph 79, wherein the second region comprises a spacer of the PEgRNA.
80. The method of paragraph 78, wherein the first set of scores further comprises a third score indicating the extent to which the first nucleic acid linker candidate is predicted to avoid interacting with a third region of the PEgRNA and a fourth score indicating the extent to which the first nucleic acid linker candidate is predicted to avoid interacting with a fourth region of the PEgRNA.
81. The method of paragraph 81, wherein the third region comprises a DNA synthesis template.
82. The method of paragraph 82, wherein the fourth region comprises a gRNA core that interacts with a nucleic acid programmable DNA binding protein (napDNAbp).
83. The method of paragraph 81,
wherein the PEgRNA is used to install nucleotide edits in a double stranded target DNA sequence,
wherein the PEgRNA comprises:
a spacer sequence comprising a region of complementarity to a target strand of a double-stranded target DNA sequence, a nucleic acid extension arm comprising a DNA synthesis template and a primer binding site, and
a gRNA core that interacts with a nucleic acid programmable DNA binding protein (napDNAbp),
wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence;
wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double stranded target DNA sequence and comprises one or more nucleotide edits compared to the double stranded target DNA sequence, and wherein the first region comprises PBS, the second region comprises a spacer, the third region comprises a DNA synthesis template, and the fourth region comprises a gRNA core.
84. The method of paragraph 77, wherein the plurality of nucleic acid linker candidates comprises a second nucleic acid linker candidate, and wherein identifying the at least one nucleic acid linker from among at least some of the plurality of nucleic acid linker candidates using the calculated plurality of scores comprises:
comparing the first set of scores of the first nucleic acid linker candidate with the second set of scores of the second nucleic acid linker candidate.
85. The method of paragraph 85, wherein:
the first region comprises a Primer Binding Site (PBS), the first score in the first set of scores is indicative of a degree to which the first nucleic acid linker candidate is predicted to avoid interacting with the first region of the PEgRNA, the third score in the second set of scores is indicative of a degree to which the second nucleic acid linker candidate is predicted to avoid interacting with the first region of the PEgRNA, and comparing the first set of scores to the second set of scores comprises: comparing the first score to the third score.
86. The method of paragraph 86, wherein when the first score is equal to or within a threshold distance of the third score, comparing the first set of scores to the second set of scores further comprises:
comparing a score of the first set of scores other than the first score with another score of the second set of scores other than the third score.
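The comparison described in the preceding paragraphs, in which the primer binding site scores are compared first and a further pair of scores is consulted only when those are equal or within a threshold distance, can be sketched as below. The sketch assumes that each set of scores is ordered by region with the primer binding site score first and that a higher score means better predicted avoidance; the function name and the threshold value of 0.05 are hypothetical.

```python
def prefer_first(first_scores, second_scores, threshold=0.05):
    """Compare two score sets position by position.

    Returns True if the first candidate is preferred, False if the second
    is preferred, and None if every pair of scores lies within the threshold.
    The threshold value is an assumption made for this sketch.
    """
    for first, second in zip(first_scores, second_scores):
        if abs(first - second) > threshold:
            return first > second  # decided by the earliest decisive region
    return None  # no score pair differs by more than the threshold


# Scores are listed as (primer binding site, spacer) for each candidate.
decided_on_pbs = prefer_first((0.90, 0.50), (0.70, 0.90))
decided_on_spacer = prefer_first((0.90, 0.50), (0.88, 0.90))
```

In the first call the primer binding site scores differ by more than the threshold, so they alone decide the comparison. In the second call they are within the threshold, so the next pair of scores decides it.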
The present disclosure also provides the following numbered embodiments.
1. A method of editing two or more copies of a disease-related gene, wherein each copy of the disease-related gene comprises a double-stranded target DNA sequence, the method comprising contacting each of the two or more copies of the disease-related gene with a guided editor system comprising:
(a) A nucleic acid programmable DNA binding protein (napDNAbp) domain or a polynucleotide encoding the napDNAbp domain;
(b) A polymerase domain or a polynucleotide encoding the polymerase domain; and
(c) A guided editing guide RNA (PEgRNA), wherein the PEgRNA comprises:
a spacer region comprising a region complementary to a target strand of the double-stranded DNA sequence;
a gRNA core associated with the napDNAbp domain; and
a nucleic acid extension arm comprising a primer binding site and a DNA synthesis template, wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence, and wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double-stranded target DNA sequence and comprises one or more nucleotide edits as compared to the double-stranded target DNA, wherein the non-target strand is complementary to the target strand;
wherein each copy of the disease-related gene comprises a pathogenic variant and two or more copies of the disease-related gene comprise two or more different pathogenic variants, wherein the contacting installs one or more nucleotide edits in each of the two or more copies of the disease-related gene, wherein the installing corrects pathogenic variations in each disease-related gene, thereby editing each of the two or more copies of the disease-related gene.
2. The method of embodiment 1, wherein two or more copies of the disease-associated gene are in one subject.
3. The method of embodiment 1, wherein the two or more copies of the disease-associated gene are present in two or more different subjects.
4. A method of treating a disease in two or more subjects, each of the subjects comprising a disease-related gene, wherein the disease-related genes of the two or more subjects comprise a double-stranded target DNA sequence, the method comprising administering to the two or more subjects a guided editor system comprising:
(a) A nucleic acid programmable DNA binding protein (napDNAbp) domain or a polynucleotide encoding the napDNAbp domain;
(b) A polymerase domain or a polynucleotide encoding a polymerase domain; and
(c) A guided editing guide RNA (PEgRNA), wherein the PEgRNA comprises:
(i) A spacer region comprising a region complementary to a target strand of the double-stranded DNA sequence;
(ii) A gRNA core associated with the napDNAbp domain; and
(iii) A nucleic acid extension arm comprising a primer binding site and a DNA synthesis template, wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence, and wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double-stranded target DNA sequence and comprises one or more nucleotide edits as compared to the double-stranded target DNA, wherein the non-target strand is complementary to the target strand;
wherein the two or more subjects comprise two or more different pathogenic variations in disease-related genes, wherein the administering installs the one or more nucleotide edits in the disease-related genes of each of the two or more subjects, wherein the installing corrects pathogenic variations in the disease-related genes of each of the two or more subjects, thereby treating the disease of the two or more subjects.
5. The method of any of embodiments 1-4, wherein the polynucleotide encoding the napDNAbp domain and/or the polynucleotide encoding the polymerase domain comprises RNA, optionally wherein the polynucleotide encoding the napDNAbp domain and/or the polynucleotide encoding the polymerase domain is mRNA.
6. The method of any one of embodiments 1-5, wherein the polymerase domain is an RNA-dependent DNA polymerase domain.
7. The method of embodiment 6, wherein the polymerase domain is a reverse transcriptase, optionally wherein the reverse transcriptase is a reverse transcriptase from a retrovirus or retrotransposon.
8. The method of embodiment 6, wherein the reverse transcriptase comprises an amino acid sequence having at least 80%, 85%, 90%, 95%, 98% or 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 89-100, 105-122, 128-129, 132, 139, 143, 149, 154, 159, 235, 454, 471, 516, 662, 700, 701-716, 739-741 and 766.
9. The method of embodiment 6, wherein the reverse transcriptase is Moloney murine leukemia virus reverse transcriptase (M-MLV RT).
10. The method of embodiment 9, wherein the RNA-dependent DNA polymerase domain comprises a variant Moloney murine leukemia virus reverse transcriptase (M-MLV RT) domain, wherein the variant M-MLV RT domain comprises one or more of the following mutations relative to the amino acid sequence of SEQ ID NO: 89: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X or D653X, wherein X is any amino acid.
11. The method of embodiment 9, wherein the variant M-MLV RT domain comprises one or more of the following mutations relative to the amino acid sequence of SEQ ID NO: 89: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K or D653N.
12. The method of embodiment 11, wherein said variant M-MLV RT domain comprises the amino acid substitutions D200N, T330P and L603W relative to the amino acid sequence of SEQ ID NO: 89.
13. The method of embodiment 11, wherein said M-MLV RT domain comprises the amino acid substitutions D200N, T306K, W313F, T330P and L603W relative to the amino acid sequence of SEQ ID NO: 89.
14. The method of embodiment 11, wherein said variant M-MLV RT domain comprises any of the amino acid sequences of SEQ ID NOS: 106-122, 143, 701-716 or 740-741.
15. The method of embodiment 11, wherein said M-MLV RT domain has the sequence of SEQ ID NO: 741.
16. The method of embodiment 9, wherein the variant M-MLV RT domain is a truncated variant of M-MLV RT comprising the D200N, T306K, W313F and T330P mutations.
17. The method of embodiment 16, wherein said variant M-MLV RT domain has the sequence of SEQ ID NO. 766.
18. The method of any one of embodiments 1-4, wherein the napDNAbp domain has nickase activity.
19. The method of any one of embodiments 1-4, wherein the napDNAbp domain is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, and Argonaute, and optionally has nickase activity.
20. The method of any of embodiments 1-4, wherein the napDNAbp domain is a Cas9 protein or variant thereof.
21. The method of embodiment 20, wherein the napDNAbp domain is nuclease active Cas9, nuclease inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
22. The method of embodiment 21, wherein the napDNAbp domain is Cas9 nickase (nCas9).
23. The method of any of embodiments 1-4, wherein the napDNAbp domain comprises an amino acid sequence having at least 80%, 85%, 90%, 95%, 98% or 99% sequence identity to the amino acid sequence of any of SEQ ID NOs 18, 19, 21, 25, 26, 126, 137, 141, 147, 153, 157, 445, 460, 467 and 482-487.
24. The method of embodiment 23, wherein the napDNAbp domain comprises an amino acid sequence having at least 80%, 85%, 90%, 95%, 98% or 99% sequence identity to the amino acid sequence of SEQ ID No. 18.
25. The method of any one of embodiments 1-24, wherein the napDNAbp domain and the RNA-dependent DNA polymerase domain are linked to form a fusion protein.
26. The method of embodiment 25, wherein said napDNAbp domain and said RNA-dependent DNA polymerase domain are linked via a peptide linker to form a fusion protein.
27. The method of embodiment 25 or 26, wherein the fusion protein comprises the structure NH2- [ napDNAbp domain ] - [ RNA-dependent DNA polymerase domain ] -COOH, or NH2- [ RNA-dependent DNA polymerase domain ] - [ napDNAbp domain ] -COOH, wherein each instance of "] - [" indicates the presence of an optional linker sequence.
28. The method of embodiment 26 or 27, wherein the peptide linker comprises an amino acid sequence selected from SGGS, (2x SGGS), (3x SGGS), XTEN, EAAAK, (2x EAAAK), and (3x EAAAK).
29. The method of embodiment 28, wherein the peptide linker consists of the amino acid sequence of 1x XTEN.
30. The method of embodiment 25, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO. 134, or an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO. 134.
31. The method of any of embodiments 1-30, wherein the nicking site is within a protospacer on a non-target strand of the double-stranded target DNA, wherein the protospacer is immediately adjacent to a protospacer adjacent motif (PAM).
32. The method of any one of embodiments 1-31, wherein the spacer, the nucleic acid extension arm, and the gRNA core are in a single molecule.
33. The method of embodiment 32, wherein the nucleic acid extension arm is located at the 3 'or 5' end of the gRNA core, or at an intramolecular position of the gRNA core, and optionally wherein the nucleic acid extension arm comprises DNA or RNA.
34. The method of embodiment 32, wherein the nucleic acid extension arm is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, at least 41 nucleotides, at least 42 nucleotides, at least 43 nucleotides, at least 46 nucleotides, at least 48 nucleotides, at least 46 nucleotides, optionally wherein the nucleic acid extension arm is 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, 90 to 100, 100 to 110, 110 to 120, 20 to 120, 40 to 120, 60 to 120, 80 to 120, 100 to 120, 40 to 100, 60 to 100, 80 to 100, or 60 to 80 nucleotides in length.
35. The method of embodiment 32, wherein the primer binding site is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length, optionally wherein the primer binding site is 1 to 10 nucleotides, 5 to 10 nucleotides, 10 to 15 nucleotides, 10 to 20 nucleotides, 8 to 20 nucleotides, 15 to 25 nucleotides, 20 to 30 nucleotides, or 25 to 30 nucleotides in length.
36. The method of embodiment 35, wherein the primer binding site is 8 nucleotides to 15 nucleotides in length.
37. The method of embodiment 32, wherein the primer binding site is (a) 8 nucleotides to 11 nucleotides in length and contains greater than about 60% GC content, (b) 12 nucleotides to 13 nucleotides in length and comprises about 40% -60% GC content, or (c) 14 nucleotides to 15 nucleotides in length and contains less than about 40% GC content.
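As a worked illustration of the length and GC-content combinations recited in embodiment 37, the following Python sketch checks whether a given primer binding site falls into option (a), (b), or (c). The function names and the example sequences are hypothetical, and the word "about" in the embodiment is treated here as an exact cutoff, which is a simplifying assumption.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def pbs_option(pbs):
    """Return 'a', 'b', or 'c' if the primer binding site matches one of the
    length/GC combinations of embodiment 37, otherwise None.
    Cutoffs are applied exactly; "about" is not modeled in this sketch."""
    length = len(pbs)
    gc = gc_fraction(pbs)
    if 8 <= length <= 11 and gc > 0.60:
        return "a"  # short primer binding site with high GC content
    if 12 <= length <= 13 and 0.40 <= gc <= 0.60:
        return "b"  # mid-length primer binding site with moderate GC content
    if 14 <= length <= 15 and gc < 0.40:
        return "c"  # long primer binding site with low GC content
    return None


# Hypothetical sequences chosen only to exercise each branch.
short_high_gc = pbs_option("GCGCGCAU")        # 8 nt, 75% GC
mid_balanced = pbs_option("GCAUGCAUGCAU")     # 12 nt, 50% GC
long_low_gc = pbs_option("AUAUAUAUAUAUAU")    # 14 nt, 0% GC
```

The three options pair shorter primer binding sites with higher GC content and longer ones with lower GC content.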
38. The method of embodiment 32, wherein the DNA synthesis template is a reverse transcription template sequence.
39. The method of any one of embodiments 1-38, wherein the DNA synthesis template has a wild-type sequence of a disease-associated gene.
40. The method of any of embodiments 1-39, wherein the DNA synthesis template is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length.
41. The method of any one of embodiments 1-39, wherein the DNA synthesis template is 5 to 10, 5 to 15, 10 to 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, 90 to 100, 10 to 30, 10 to 40, 10 to 50, 10 to 60, 20 to 40, 20 to 60, 30 to 100, 40 to 100, 50 to 100, 60 to 100, or 70 to 100 nucleotides in length, optionally wherein the DNA synthesis template is 10 to 35 nucleotides in length.
42. The method of any one of embodiments 1-39, wherein the DNA synthesis template is at least 3 to 58 nucleotides in length.
43. The method of any one of embodiments 1-39, wherein the DNA synthesis template is 8 nucleotides to 31 nucleotides in length.
44. The method of any one of embodiments 1-39, wherein the DNA synthesis template is (a) 10 nucleotides to 16 nucleotides or (b) 12 nucleotides to 17 nucleotides in length.
45. The method of any of embodiments 1-39, wherein the DNA synthesis template comprises a nucleotide sequence 80%, or 85%, or 90%, or 95%, or 99% identical to the double stranded target DNA sequence.
46. The method of any one of embodiments 1-45, wherein the PEgRNA further comprises at least one nucleic acid moiety selected from the group consisting of a toe loop, a hairpin, a stem loop, a pseudoknot, an aptamer, a G-quadruplex, a tRNA, a riboswitch, or a ribozyme.
47. The method of embodiment 46, wherein said nucleic acid moiety is at the 3 'or 5' end of said PEgRNA.
48. The method of embodiment 46, wherein the extension arm comprises the nucleic acid portion.
49. The method of embodiment 48, wherein the nucleic acid moiety is located at the 3 'or 5' end of the extension arm.
50. The method of embodiment 46, wherein the nucleic acid portion comprises a frameshift pseudoknot (Mpknot) from the Moloney murine leukemia virus (M-MLV) genome, optionally wherein the Mpknot is an Mpknot1 portion having a nucleotide sequence selected from the group consisting of: SEQ ID NO 3930 (Mpknot1), SEQ ID NO 3931 (Mpknot1 3' trimmed), SEQ ID NO 3932 (Mpknot1 with 5' extra), SEQ ID NO 3933 (Mpknot1 U38A), SEQ ID NO 3934 (Mpknot1 U38A A29C), SEQ ID NO 3935 (MMLV A29C), SEQ ID NO 3936 (Mpknot1 with 5' extra and U38A), SEQ ID NO 3937 (Mpknot1 with 5' extra and U38A A29C), and SEQ ID NO 3938 (Mpknot1 with 5' extra and A29C), or a nucleotide sequence having at least 80% sequence identity thereto.
51. The method of embodiment 46, wherein the nucleic acid portion comprises a G-quadruplex, optionally wherein the G-quadruplex has a nucleotide sequence selected from the group consisting of: SEQ ID NO. 3939 (tns1), SEQ ID NO. 3940 (stk40), SEQ ID NO. 3941 (apc2), SEQ ID NO. 3942 (ceacam4), SEQ ID NO. 3943 (pitpnm3), SEQ ID NO. 3944 (rlf), SEQ ID NO. 3945 (erc1), SEQ ID NO. 3946 (ube c), SEQ ID NO. 3947 (taf15), SEQ ID NO. 3948 (stard3) and SEQ ID NO. 3949 (g2), or a nucleotide sequence having at least 80% sequence identity thereto.
52. The method of embodiment 46, wherein the nucleic acid portion comprises a prequeosine1 riboswitch aptamer, optionally wherein the nucleic acid portion comprises an evolved prequeosine1-1 riboswitch aptamer (evopreQ1), the evopreQ1 having a nucleotide sequence selected from the group consisting of: SEQ ID NO:3950 (evopreQ1), SEQ ID NO:3951 (evopreQ1 motif 1), SEQ ID NO:3952 (evopreQ1 motif 2), SEQ ID NO:3953 (evopreQ1 motif 3), SEQ ID NO:3954 (shorter preQ1-1), SEQ ID NO:3955 (preQ1-1 G5C (mut1)) and SEQ ID NO:3956 (preQ1-1 G15C (mut2)), or a nucleotide sequence having at least 80% sequence identity thereto.
53. The method of embodiment 46, wherein the nucleic acid portion comprises a tRNA portion that has the nucleotide sequence of SEQ ID NO:3957 or a nucleotide sequence that has at least 80% sequence identity thereto.
54. The method of embodiment 46, wherein the nucleic acid portion has the nucleotide sequence of SEQ ID NO: 3958 (xrn1), or a nucleotide sequence having at least 80% sequence identity thereto.
55. The method of embodiment 46, wherein said nucleic acid portion comprises the P4-P6 domain of a group I intron, optionally wherein said P4-P6 domain has the nucleotide sequence of SEQ ID NO:3959, or a nucleotide sequence having at least 80% sequence identity thereto.
56. The method of any one of embodiments 46-55, wherein the PEgRNA further comprises a linker.
57. The method of embodiment 56, wherein said linker is located between said nucleic acid portion and another component of said PEgRNA.
58. The method of embodiment 57, wherein the linker is between the nucleic acid moiety and the primer binding site or between the gRNA core and the nucleic acid moiety.
59. The method of embodiment 58, wherein the linker comprises a nucleotide sequence selected from the group consisting of: SEQ ID NOs: 3960, 3961, 3962, 3963, 3964, 3965, 3966, 3967, 3968, 3969, 3970 and 3971.
60. The method of any one of embodiments 1-59, wherein the one or more nucleotide edits comprise an insertion of one or more nucleotides as compared to the double stranded DNA sequence.
61. The method of any one of embodiments 1-59, wherein the one or more nucleotide edits comprise a deletion of one or more nucleotides as compared to the double stranded DNA sequence.
62. The method of any one of embodiments 1-59, wherein the one or more nucleotide edits comprise nucleotide substitutions as compared to the double stranded DNA sequence.
63. The method of any one of embodiments 1-59, wherein the one or more nucleotide edits comprise one or more insertions of one or more nucleotides, nucleotide substitutions, deletions of one or more nucleotides, or a combination of any such nucleotide edits, as compared to the double-stranded target DNA sequence.
64. The method of any one of embodiments 62-63, wherein the one or more nucleotide substitutions is a single base nucleotide substitution.
65. The method of any one of embodiments 2-65, wherein the administering corrects 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 pathogenic variations of the disease-associated gene in the two or more subjects; or wherein said administering corrects 2 to 5, 2 to 7, 3 to 10, 3 to 12, 4 to 15, or 4 to 20 pathogenic variations of said disease-associated genes in said two or more subjects.
66. The method of any of embodiments 1-63, wherein the PEgRNA comprises a modified nucleobase, a modified sugar, a modified phosphate group, or a nucleoside analog.
67. The method of any one of embodiments 4-64, comprising administering to the two or more subjects a pharmaceutical composition comprising the guided editor system and a pharmaceutically acceptable excipient.
68. The method of any one of embodiments 1-65, wherein the disease-associated gene is CDKL5.
69. The method of embodiment 66, wherein each of the different pathogenic variations encodes a mutation selected from the group consisting of V172I, A173D, R175S, W176G, W176R, Y177C, R178P, P180L, E181A and L182P as compared to the wild-type CDKL5 protein.
70. The method of embodiment 66, wherein the PEgRNA comprises the sequence of the PEgRNA sequence in fig. 2.
71. The method of embodiment 66, wherein the PEgRNA comprises the sequence of the PEgRNA sequence in fig. 4.
Examples
Example 1: Guided editing: highly versatile and precise search-and-replace genome editing in human cells without double-stranded DNA breaks
Background
Current genome editing methods can use programmable nucleases to disrupt, delete or insert target genes, by-products that accompany double-stranded DNA breaks, and base editors to install four switch point mutations at the target locus. However, small insertions, small deletions and eight transversion point mutations together represent most pathogenic genetic variants, but are not effective in most cell types and corrected without excessive byproducts. Described herein are guided editing, a highly versatile and accurate genome editing method that uses catalytically impaired Cas9 fused to an engineered reverse transcriptase (programmed with engineered guided editing guide RNAs (pegrnas) that both specify target sites and encode desired edits) to write new genetic information directly to the specified DNA sites. More than 175 different edits were made in human cells to confirm that guided edits can be effective (typically 20-60%, up to 77% in unsorted cells) and targeted insertions, deletions, all 12 possible types of point mutations and combinations thereof with low byproducts (typically 1-10%), without double strand breaks or donor DNA templates. Leading edits are applied in human cells to correct the major genetic causes of sickle cell disease (requiring a.t to t.a transversions in HBB) and tay-sajohne disease (requiring a 4-base deletion in hex a), in both cases effectively reverting the pathogenic genome allele to wild type with minimal by-products. Guidance editing can also be used to create human cell lines with these pathogenic HBB and hex a insertion mutations, install in PRNP a G127V mutation that confers resistance to prion diseases (g.c to t.a transversions are required), and effectively insert His6 tag, FLAG epitope tag and extended LoxP site into the target locus of human cells. Guided editing provides advantages over HDR in terms of efficiency and product purity, and complementary advantages and disadvantages compared to base editing. 
Consistent with its search-and-replace mechanism, which requires three distinct base-pairing events, guided editing is much less prone than Cas9 to off-target DNA modification at known Cas9 off-target sites. Guided editing greatly expands the scope and capabilities of genome editing and, in principle, can correct about 89% of known pathogenic human genetic variants.
The ability to make virtually any targeted change in the genome of any living cell or organism is a long-standing aspiration of the life sciences. Despite rapid advances in genome editing technologies, most of the >75,000 known disease-associated human genetic variants 111 cannot be corrected or installed in most therapeutically relevant cells (fig. 38A). Programmable nucleases such as CRISPR-Cas9 make double-stranded DNA breaks (DSBs) that can disrupt genes by inducing mixtures of insertions and deletions (indels) at target sites 112-114. Nucleases can also be used to delete target genes 115,116 or to insert exogenous genes 117-119 through homology-independent processes. However, double-stranded DNA breaks are also associated with undesired outcomes, including complex mixtures of products, translocations 120, and p53 activation 121,122. Moreover, most pathogenic alleles differ from their non-pathogenic counterparts by small insertions, deletions, or base substitutions whose correction requires far more precise editing technologies (fig. 38A). Homology-directed repair (HDR) stimulated by nuclease-induced DSBs 123 has been widely used to install a variety of precise DNA changes. However, HDR relies on exogenous donor DNA repair templates, typically generates excess indel byproducts from end-joining repair of the DSBs, and is inefficient in most therapeutically relevant cell types (T cells and some stem cells being important exceptions) 124,125. While improving the efficiency and precision of DSB-mediated genome editing remains the focus of promising efforts 126-130, these challenges call for the exploration of alternative precise genome editing strategies.
Base editing can efficiently install or correct the four types of transition mutations (C to T, G to A, A to G, and T to C) in a wide variety of cell types and organisms, including mammals, without requiring DSBs 128-131. However, it currently cannot achieve any of the eight transversion mutations (C to A, C to G, G to C, G to T, A to C, A to T, T to A, and T to G), such as the T·A to A·T mutation needed to directly correct the most common cause of sickle cell disease (HBB E6V) 132. In addition, no DSB-free method has been reported for targeted deletions, such as removal of the 4-base duplication that causes Tay-Sachs disease (HEXA 1278+TATC) 133, or for targeted insertions, such as the precise 3-base insertion needed to directly correct the most common cause of cystic fibrosis (CFTR ΔF508) 134. Targeted transversion point mutations, insertions, and deletions are therefore difficult to install or correct efficiently and without excess byproducts in most cell types, even though they collectively account for most known pathogenic alleles (fig. 38A).
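The distinction between the four transitions and the eight transversions listed above can be made concrete with a short sketch. It is only an illustration and not part of the disclosed method; the function names are hypothetical. A substitution is a transition when both bases are purines or both are pyrimidines, and a transversion otherwise.

```python
# Illustrative only: classify single-base substitutions as transitions or transversions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    # Same chemical class (purine->purine or pyrimidine->pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def all_point_mutations() -> dict:
    """Map each of the 12 possible (ref, alt) pairs to its class."""
    return {
        (ref, alt): classify_substitution(ref, alt)
        for ref in "ACGT"
        for alt in "ACGT"
        if ref != alt
    }
```

Enumerating all pairs with `all_point_mutations()` yields 12 substitutions, of which 4 are transitions and 8 are transversions, matching the counts given in the text.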
Described herein is the development of guided editing, a new "search-and-replace" genome editing technology that mediates targeted insertions, deletions, and all 12 possible base-to-base conversions at targeted loci in human cells without requiring double-stranded DNA breaks or donor DNA templates. Guided editors, initially exemplified by PE1, use a reverse transcriptase fused to a programmable nickase, together with a guided editing guide RNA (pegRNA), to copy genetic information directly from an extension on the pegRNA into the target genomic locus. A second-generation guided editor (PE2) uses an engineered reverse transcriptase to substantially increase editing efficiency with minimal indel formation (typically <2%), while the third-generation PE3 system adds a second guide RNA that nicks the non-edited strand, favoring replacement of the non-edited strand and further increasing editing efficiency, typically to about 20-50% in human cells with about 1-10% indel formation. PE3 offers far fewer byproducts and higher or similar efficiency compared with optimized Cas9 nuclease-initiated HDR, and offers complementary strengths and weaknesses compared with current-generation base editors.
PE3 was applied at genomic loci in human HEK293T cells to achieve efficient conversion of HBB E6V to wild-type HBB, deletion of the inserted TATC to restore HEXA 1278+TATC to wild-type HEXA, installation in PRNP of the G127V mutation that confers resistance to prion disease 135 (requiring a G·C to T·A transversion), and targeted insertion of a His6 tag (18 bp), a FLAG epitope tag (24 bp), and an extended LoxP site for Cre-mediated recombination (44 bp). Guided editing was also successful in three other human cell lines and in post-mitotic primary mouse cortical neurons, with varying efficiencies. Because of the high degree of flexibility in the distance between the initial nick and the position of the edit, guided editing is not substantially constrained by the PAM requirement of Cas9 and in principle can target most genomic loci. Off-target guided editing at known Cas9 off-target loci is much rarer than off-target Cas9 editing, likely because productive guided editing requires three distinct DNA base-pairing events. By enabling precise targeted insertions, deletions, and all 12 possible point mutations at a wide variety of genomic loci without requiring DSBs or donor DNA templates, guided editing has the potential to advance the study and correction of many genetic variants.
Results
Strategies for transferring information from pegRNA to target DNA sites
Cas9 targets DNA using a guide RNA containing a spacer sequence that hybridizes to the target DNA site 112-114,136,137. The goal was to engineer guide RNAs that both specify the DNA target, as in natural CRISPR systems 138,139, and contain new genetic information that replaces the corresponding DNA nucleotides at the target locus. Directly transferring genetic information from the pegRNA to a specified DNA site, followed by replacement of the original unedited DNA, could in principle provide a general means of installing targeted DNA sequence changes in living cells without dependence on DSBs or donor DNA templates. To achieve this direct information transfer, the aim was to use genomic DNA, nicked at the target site to expose a 3'-hydroxyl group, to prime the reverse transcription of genetic information from an extension on the engineered guide RNA (hereinafter referred to as the guided editing guide RNA, or pegRNA) directly into the target site (fig. 38A).
These initial steps of nicking and reverse transcription, which resemble mechanisms used by some natural mobile genetic elements 140, result in a branched intermediate with two redundant single-stranded DNA flaps on one strand: a 5' flap that contains the unedited DNA sequence, and a 3' flap that contains the edited sequence copied from the pegRNA (fig. 38B). To achieve a successful edit, this branched intermediate must be resolved so that the edited 3' flap replaces the unedited 5' flap. Although hybridization of the 5' flap with the unedited strand is likely to be thermodynamically favored, because the edited 3' flap can form fewer base pairs with the unedited strand, 5' flaps are the preferred substrates for structure-specific endonucleases such as FEN1 141, which excises the 5' flaps generated during lagging-strand DNA synthesis and long-patch base excision repair. It was reasoned that preferential 5' flap excision and 3' flap ligation could drive incorporation of the edited DNA strand, creating heteroduplex DNA containing one edited strand and one unedited strand (fig. 38B).
Permanent installation of the edit could arise from subsequent DNA repair that resolves the mismatch between the two DNA strands in a manner that copies the information in the edited strand to the complementary DNA strand (fig. 38C). Based on a similar strategy developed to maximize the efficiency of DNA base editing 131-133, it was envisioned that nicking the non-edited DNA strand, far enough from the site of the initial nick to minimize double-strand break formation, might bias DNA repair toward preferential replacement of the non-edited strand.
Validation of guided editing steps in vitro and in yeast cells
After cleavage of the PAM-containing DNA strand by the RuvC nuclease domain of Cas9, the PAM-distal fragment of this strand can dissociate from the otherwise stable Cas9:sgRNA:DNA complex 143. The 3' end of this liberated strand may be sufficiently accessible to prime DNA polymerization. Guide RNA engineering efforts 144-146 and crystal structures of Cas9:sgRNA:DNA complexes 147-149 suggest that the 5' and 3' ends of the sgRNA can be extended without abolishing Cas9:sgRNA activity. pegRNAs were designed by extending sgRNAs to include two critical components: a primer binding site (PBS) that allows the 3' end of the nicked DNA strand to hybridize to the pegRNA, and a reverse transcriptase (RT) template containing the desired edit, which is copied directly into the genomic DNA site as the 3' end of the nicked DNA strand is extended across the RNA template by a polymerase (fig. 38C).
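The two extension components described above can be illustrated with a minimal sketch for the 3'-extended case. It rests on several assumptions that are not stated in the text: the target is supplied as the PAM-containing strand written 5' to 3', `nick_index` is the number of bases lying 5' of the nick, `edited_downstream` is the desired sequence immediately 3' of the nick, and sequences are written with DNA letters rather than as RNA. All function and parameter names are hypothetical.

```python
# Illustrative only: derive the PBS and RT template of a 3' extension from the
# PAM-containing strand. Names and conventions here are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_3prime_extension(pam_strand: str, nick_index: int,
                           edited_downstream: str, pbs_len: int = 13):
    """Return (rt_template, pbs), each written 5'->3'.

    The 3' extension appended to the sgRNA would read rt_template + pbs.
    """
    if not 0 < pbs_len <= nick_index <= len(pam_strand):
        raise ValueError("PBS must fit within the sequence 5' of the nick")
    # The 3' end of the nicked strand acts as the primer; the PBS pairs with it.
    primer = pam_strand[nick_index - pbs_len:nick_index]
    pbs = reverse_complement(primer)
    # The RT template is complementary to the desired (edited) new DNA.
    rt_template = reverse_complement(edited_downstream)
    return rt_template, pbs
```

With this layout, the reverse complement of `rt_template + pbs` reads as the primer followed by the edited sequence, which is the strand that would result from extending the nicked 3' end along the template.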
These hypotheses were tested in vitro using purified Streptococcus pyogenes Cas9 protein. A series of pegRNA candidates was constructed by extending sgRNAs at either end with a PBS sequence (5 to 6 nucleotides, nt) and an RT template (7 to 22 nt). It was confirmed that 5'-extended pegRNAs direct Cas9 binding to target DNA, and that both 5'-extended and 3'-extended pegRNAs support Cas9-mediated target nicking in vitro and DNA cleavage activity in mammalian cells (fig. 44A-44C). These candidate pegRNA designs were tested using pre-nicked 5'-Cy5-labeled dsDNA substrates, catalytically dead Cas9 (dCas9), and a commercial variant of Moloney murine leukemia virus (M-MLV) reverse transcriptase (fig. 44D). When all components were present, efficient conversion of the fluorescently labeled DNA strand into longer DNA products was observed, with gel mobilities consistent with reverse transcription along the RT template (fig. 38D, fig. 44D-44E). Products of the desired length were formed with either 5'-extended or 3'-extended pegRNAs (figs. 38D to 38E). Omission of dCas9 led to nick translation products derived from reverse transcriptase-mediated DNA polymerization on the DNA template, with no pegRNA information transfer (fig. 38D). No DNA polymerization products were observed when the pegRNA was replaced by a conventional sgRNA, confirming the necessity of the PBS and RT template components of the pegRNA (fig. 38D). These results indicate that Cas9-mediated DNA melting exposes a single-stranded R-loop that, if nicked, is competent to prime reverse transcription from either a 5'-extended or a 3'-extended pegRNA.
Next, non-nicked dsDNA substrates were tested with a Cas9 nickase (H840A mutant) that exclusively nicks the PAM-containing strand 112. In these reactions, 5'-extended pegRNAs generated reverse transcription products inefficiently, possibly because of impaired Cas9 nickase activity (fig. 44F). However, 3'-extended pegRNAs enabled robust Cas9 nicking and efficient reverse transcription (fig. 38E). Although reverse transcription could in principle terminate anywhere within the remainder of the pegRNA, the use of 3'-extended pegRNAs generated only a single apparent product. DNA sequencing of the products of reactions with Cas9 nickase, RT, and 3'-extended pegRNAs revealed that the complete RT template sequence was reverse transcribed into the DNA substrate (fig. 44G). These experiments establish that 3'-extended pegRNAs can template the reverse transcription of new DNA strands while retaining the ability to direct Cas9 nickase activity.
To evaluate the eukaryotic cell DNA repair outcomes of 3' flaps produced by pegRNA-programmed reverse transcription in vitro, DNA nicking and reverse transcription were performed in vitro on reporter plasmid substrates using pegRNA, Cas9 nickase, and RT, and the reaction products were then transformed into yeast (Saccharomyces cerevisiae) cells (fig. 45A). Encouragingly, 37% of yeast transformants expressed both GFP and mCherry proteins when the plasmid was edited in vitro with a 3'-extended pegRNA encoding a T·A to A·T transversion that corrects a premature stop codon (fig. 38F, fig. 45C). Consistent with the results in figs. 38E and 44F, in vitro editing reactions with 5'-extended pegRNAs yielded fewer GFP and mCherry double-positive colonies (9%) than those with 3'-extended pegRNAs (figs. 38F and 45D). Efficient editing was also observed with 3'-extended pegRNAs that correct frameshift mutations by a single-nucleotide insertion (15% double-positive transformants) or a single-nucleotide deletion (29% double-positive transformants) (fig. 38F and figs. 45E to 45F). DNA sequencing of edited plasmids recovered from dual-fluorescent yeast colonies confirmed that the encoded transversion edit occurred at the desired sequence position (fig. 45G). These results demonstrate that DNA repair in eukaryotic cells can resolve the 3' DNA flaps arising from guided editing to incorporate precise DNA edits, including transversions, insertions, and deletions.
Design of guided editor 1 (PE1)
Encouraged by the results in vitro and in yeast, a guided editing system with a minimal number of components capable of editing genomic DNA in mammalian cells was sought. It was hypothesized that 3'-extended pegRNAs (hereinafter referred to simply as pegRNAs, fig. 39A) and direct fusions of Cas9 H840A to reverse transcriptase via a flexible linker could constitute a functional two-component guided editing system. HEK293T (immortalized human embryonic kidney) cells were transfected with one plasmid encoding a fusion of wild-type M-MLV reverse transcriptase to either terminus of Cas9 H840A nickase and a second plasmid encoding a pegRNA. Initial attempts led to no detectable T·A to A·T conversion at the HEK3 target locus.
However, extending the PBS in the pegRNA to 8-15 bases (fig. 39A) led to detectable T·A to A·T editing at the HEK3 target site (fig. 39B). Guided editor constructs in which the RT was fused to the C-terminus of Cas9 nickase were more efficient (3.7% maximal T·A to A·T conversion with PBS lengths ranging from 8-15 nt) than N-terminal RT-Cas9 nickase fusions (1.3% maximal T·A to A·T conversion) (fig. 39B; unless otherwise specified, all mammalian cell data reported herein are values for the entire treated cell population, without selection or sorting). These results suggest that wild-type M-MLV RT fused to Cas9 requires longer PBS sequences for genome editing in human cells than are required in vitro using the commercial variant of M-MLV RT supplied in trans. This first-generation fusion of wild-type M-MLV reverse transcriptase to the C-terminus of Cas9 H840A nickase was designated PE1.
PE1 was tested for its ability to precisely introduce transversion point mutations at four additional genomic target sites specified by the pegRNA (fig. 39C). As with editing at the HEK3 locus, efficiency at these genomic loci depended on PBS length, with maximal editing efficiencies ranging from 0.7% to 5.5% (fig. 39C). Indels from PE1 were low, averaging 0.2±0.1% across the five sites under the conditions that maximized editing efficiency at each site (fig. 46A). PE1 was also able to install targeted insertions and deletions, exemplified by a single-nucleotide deletion (4.0% efficiency), a single-nucleotide insertion (9.7%), and a three-nucleotide insertion (17%) at the HEK3 locus (fig. 39C). These results establish that PE1 can directly install targeted transversions, insertions, and deletions without requiring double-stranded DNA breaks or DNA templates.
Design of guided editor 2 (PE2)
While PE1 can install a variety of edits at several loci in HEK293T cells, editing efficiencies were generally low (typically ≤5%) (fig. 39C). It was hypothesized that engineering the reverse transcriptase in PE1 could improve the efficiency of DNA synthesis within the unique conformational constraints of the guided editing complex, resulting in higher genome editing yields. M-MLV RT mutations have previously been reported to increase the thermostability 150,151 and processivity 150 of the enzyme and its affinity for DNA:RNA heteroduplex substrates 152, and to inactivate its RNase H activity 153. Nineteen PE1 variants containing a variety of reverse transcriptase mutations were constructed to evaluate their guided editing efficiency in human cells.
First, a series of M-MLV RT variants that previously emerged from laboratory evolution for their ability to support reverse transcription at elevated temperatures 150 was studied. Successive introduction of three of these amino acid substitutions (D200N, L603W, and T330P) into M-MLV RT, hereinafter referred to as M3, led to an average 6.8-fold increase in transversion and insertion editing efficiency across five genomic loci in HEK293T cells compared with PE1 (figs. 47A to 47S).
Next, additional reverse transcriptase mutations previously shown to enhance binding to the template:PBS complex, enzyme processivity, and thermostability 152 were tested in combination with M3. Among the 14 additional mutants analyzed, a variant with T306K and W313F substitutions in addition to the M3 mutations improved editing efficiency by a further 1.3- to 3.0-fold compared with M3 for six transversion or insertion edits across five genomic sites in human cells (figs. 47A to 47S). This pentamutant of M-MLV reverse transcriptase incorporated into the PE1 architecture (Cas9 H840A-M-MLV RT (D200N L603W T330P T306K W313F)) is hereinafter referred to as PE2.
PE2 installed single-nucleotide transversion, insertion, and deletion mutations with substantially higher efficiency than PE1 (fig. 39C), and was compatible with shorter PBS sequences in the pegRNA (fig. 39C), consistent with an enhanced ability to productively engage transient genomic DNA:PBS complexes. On average, PE2 led to a 1.6- to 5.1-fold improvement in guided editing point mutation efficiency over PE1 (fig. 39C), and in some cases improved editing yields by up to 46-fold (figs. 47F and 47I). PE2 also effected targeted insertions and deletions more efficiently than PE1, achieving targeted insertion of the 24-bp FLAG epitope tag at the HEK3 locus with 4.5% efficiency, a 15-fold improvement over installing this insertion with PE1 (fig. 47D), and mediating a 1-bp deletion in HEK3 with 8.6% efficiency, 2.1-fold higher than that of PE1 (fig. 39C). These results establish PE2 as a more efficient guided editor than PE1.
Optimization of pegRNA characteristics
The relationship between pegRNA architecture and guided editing efficiency was systematically explored at five genomic sites in HEK293T cells using PE2 (fig. 39C). In general, priming sites with lower GC content required longer PBS sequences (EMX1 and RNF2, with 40% and 30% GC content, respectively, in the first 10 nt upstream of the nick), whereas those with higher GC content supported guided editing with shorter PBS sequences (HEK4 and FANCF, with 80% and 60% GC content, respectively, in the first 10 nt upstream of the nick) (fig. 39C), consistent with the energetic requirements for hybridization of the nicked DNA strand to the pegRNA PBS. No PBS length or GC content level was strictly predictive of guided editing efficiency, and other factors such as secondary structure in the DNA primer or the pegRNA extension may also influence editing activity. For a typical target sequence, it is recommended to start with a PBS length of about 13 nt and to explore different PBS lengths if the sequence deviates from about 40-60% GC content. When necessary, the optimal PBS sequence should be determined empirically.
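The PBS starting-point recommendation above can be expressed as a small helper. This is a hedged sketch: the 10-nt window, the 13-nt starting length, and the 40-60% GC range come from the text, whereas the specific alternative lengths returned (up to 15 nt for low GC, down to 8 nt for high GC) and all names are illustrative assumptions, and the optimum still has to be determined empirically.

```python
# Illustrative only: suggest PBS lengths to test from the GC content upstream of the nick.
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def suggest_pbs_lengths(pam_strand: str, nick_index: int, window: int = 10):
    """Return (gc, lengths): GC fraction of the bases just 5' of the nick and
    candidate PBS lengths, with the suggested 13-nt starting point listed first."""
    upstream = pam_strand[max(0, nick_index - window):nick_index]
    gc = gc_fraction(upstream)
    if gc < 0.40:
        lengths = [13, 14, 15]             # assumption: also try longer PBS for low GC
    elif gc > 0.60:
        lengths = [13, 12, 11, 10, 9, 8]   # assumption: also try shorter PBS for high GC
    else:
        lengths = [13]
    return gc, lengths
```

The returned list is only a screening order; as the text notes, neither PBS length nor GC content strictly predicts editing efficiency.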
Next, the performance determinants of the RT template portion of the pegRNA were studied. RT templates ranging from 10-20 nt in length were systematically evaluated at five genomic target sites using PE2 (fig. 39D), and pegRNAs with longer RT templates of up to 31 nt were evaluated at three genomic sites (figs. 48A-48C). As with PBS length, RT template length could also be varied to maximize guided editing efficiency, although in general many RT template lengths ≥10 nt support efficient guided editing (fig. 39D). Because some target sites preferred longer RT templates (>15 nt) for higher editing efficiency (FANCF, EMX1) while other loci preferred short RT templates (HEK3, HEK4) (fig. 39D), it is recommended to test both short and long RT templates when optimizing a pegRNA, starting with about 10-16 nt.
Importantly, RT templates that place a C as the nucleotide adjacent to the terminal hairpin of the sgRNA scaffold generally resulted in lower editing efficiency compared with other pegRNAs with RT templates of similar length (figs. 48A-48C). Based on the structure of sgRNAs bound to Cas9 148,149, it is speculated that the presence of a C as the first nucleotide of the 3' extension of a canonical sgRNA can disrupt the sgRNA scaffold fold by pairing with G81, a nucleotide that natively forms a pi stack with Tyr1356 in Cas9 and a non-canonical base pair with sgRNA A68. Because many RT template lengths support guided editing, it is recommended to choose pegRNAs in which the first base of the 3' extension (the last reverse-transcribed base of the RT template) is not C.
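The recommendation to avoid a C as the first base of the 3' extension can be applied as a simple filter over candidate RT template lengths. The sketch below assumes the desired edited sequence is supplied as the PAM-strand sequence starting immediately 3' of the nick; the 10-16 nt default range follows the starting range suggested above, and the function name is hypothetical. Because the RT template is the reverse complement of the newly synthesized DNA, the first base of the extension is C exactly when the last reverse-transcribed DNA base is G.

```python
# Illustrative only: keep RT template lengths whose 3' extension does not start with C.
def rt_template_lengths_avoiding_c(edited_downstream: str,
                                   min_len: int = 10, max_len: int = 16):
    """Return RT template lengths L (in nt) for which the first base of the
    3' extension is not C.

    edited_downstream: desired sequence on the PAM-containing strand, 5'->3',
    starting at the first base 3' of the nick.
    """
    seq = edited_downstream.upper()
    upper = min(max_len, len(seq))
    # The extension's first base is the complement of the last DNA base
    # synthesized, seq[L - 1]; it is C when that DNA base is G.
    return [length for length in range(min_len, upper + 1)
            if seq[length - 1] != "G"]
```

This check does not verify that the intended edit falls within the chosen template length; that remains the designer's responsibility.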
Design of the guided editor 3 systems (PE3 and PE3b)
While PE2 can transfer genetic information from the pegRNA to the target locus more efficiently than PE1, the manner in which the cell resolves the resulting heteroduplex DNA, formed by one edited strand and one non-edited strand, determines whether the edit is durable. The earlier development of base editing faced a similar challenge, because the initial product of cytosine or adenine deamination is heteroduplex DNA containing one edited strand and one non-edited strand. To increase the efficiency of base editing, a Cas9 D10A nickase was used to introduce a nick into the non-edited strand and direct DNA repair to that strand, using the edited strand as a template 129,130,142. To exploit this principle to increase guided editing efficiency, a similar strategy was tested in which the Cas9 H840A nickase already present in PE2 and a simple sgRNA are used to nick the non-edited strand, inducing the cell to preferentially replace the non-edited strand (fig. 40A). Because the edited DNA strand is also nicked to initiate guided editing, a variety of sgRNA-programmed nick locations on the non-edited strand were tested to minimize the production of double-stranded DNA breaks that lead to indels.
This PE3 strategy was first tested at five genomic sites in HEK293T cells by screening sgRNAs that induce nicks located 14 to 116 bases from the site of the pegRNA-induced nick, either 5' or 3' of the PAM. At four of the five sites tested, nicking the non-edited strand increased the amount of indel-free guided editing product by 1.5- to 4.2-fold compared with the PE2 system, to as high as 55% (fig. 40B). Although the optimal nicking position varied by genomic site, nicks positioned 3' of the PAM (positive distances in fig. 40B) about 40 to 90 bp from the pegRNA-induced nick generally produced favorable increases in guided editing efficiency (averaging 41%) without excess indel formation (6.8% average indels for the sgRNA resulting in the highest editing efficiency at each of the five sites tested) (fig. 40B). As expected, at some sites, placing the non-edited strand nick within 40 bp of the pegRNA-induced nick led to large increases in indel formation of up to 22% (fig. 40B), presumably because nicking both strands in close proximity creates a double-strand break. At other sites, however, nicking as close as 14 bp from the pegRNA-induced nick produced only 5% indels (fig. 40B), suggesting that locus-dependent factors control the conversion of proximal dual nicks into double-stranded DNA breaks. At one tested site (HEK4), complementary strand nicks either provided no benefit or led to indel levels that exceeded editing efficiency (up to 26%), even when placed >70 bp from the pegRNA-induced nick, consistent with an unusual propensity of the edited strand at that site to be nicked by the cell or to be ligated inefficiently. It is recommended to start with non-edited strand nicks about 50 bp from the pegRNA-mediated nick and to test alternative nick locations if indel frequencies exceed acceptable levels.
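The nick-placement guidance above can be summarized as a screening heuristic. In this sketch, distances are in bp relative to the pegRNA-induced nick, with positive values 3' of the PAM as in fig. 40B; the 40-90 bp range and the roughly 50-bp starting point are taken from the text, while the category labels and function names are hypothetical. Because outcomes are locus dependent (the HEK4 site is a counterexample), this only orders candidates for testing.

```python
# Illustrative only: triage candidate non-edited-strand nick positions for PE3.
def classify_nick(distance_bp: int) -> str:
    """Label a nick by its distance from the pegRNA-induced nick
    (positive = 3' of the PAM, negative = 5' of the PAM)."""
    if abs(distance_bp) < 40:
        return "close"        # may raise indels via a double-strand break
    if 40 <= distance_bp <= 90:
        return "preferred"    # range that generally gave favorable results
    return "other"


def pick_starting_nick(candidates, target: int = 50):
    """Return the preferred-range candidate closest to ~50 bp, or None."""
    preferred = [d for d in candidates if classify_nick(d) == "preferred"]
    if not preferred:
        return None
    return min(preferred, key=lambda d: abs(d - target))
```

For candidate distances of 14, -30, 52, 116, 74, and 41 bp, the helper would suggest starting with the nick at 52 bp and testing the others if indel frequencies are too high.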
The model for how complementary strand nicking improves guided editing efficiency (fig. 40A) predicts that nicking the non-edited strand only after the edited strand flap has been resolved should minimize the presence of concurrent nicks, thereby decreasing the frequency of double-strand breaks that go on to form indels. To achieve this temporal control over non-edited strand nicking, sgRNAs were designed with spacers that match the edited strand but not the original allele. With this strategy (hereinafter referred to as PE3b), mismatches between the spacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand has taken place. This PE3b approach was tested with five different edits at three genomic sites in HEK293T cells, and the outcomes were compared with those achieved with the PE2 and PE3 systems. In all cases, PE3b was associated with substantially lower levels of indels than PE3 (3.5- to 30-fold fewer indels, averaging 12-fold lower, or 0.85%), without any evident decrease in overall editing efficiency compared with PE3 (fig. 40C). Thus, when the edit lies within a second protospacer, the PE3b system can decrease indels while still improving editing efficiency compared with PE2, often to levels similar to those of PE3 (fig. 40C).
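The PE3b design criterion, a spacer that matches the edited sequence but not the original allele, reduces to a string comparison once all sequences are written in the same orientation and at the same length. The sketch below makes exactly that simplifying assumption; the names are hypothetical, and PAM availability on the non-edited strand is not checked.

```python
# Illustrative only: test whether a nicking-sgRNA spacer satisfies the PE3b criterion.
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def is_pe3b_spacer(spacer: str, original_site: str, edited_site: str) -> bool:
    """True if the spacer matches the edited site perfectly and mismatches
    the original (unedited) site at one or more positions."""
    return (count_mismatches(spacer, edited_site) == 0
            and count_mismatches(spacer, original_site) > 0)
```

A spacer that passes this check would be expected to disfavor nicking of the unedited allele, which is the temporal control the PE3b strategy relies on.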
Taken together, these findings establish that the PE3 systems (Cas9 nickase-optimized reverse transcriptase + pegRNA + sgRNA) improve editing efficiency by about 3-fold compared with PE2 (figs. 40B to 40C). As expected given its additional nicking activity, PE3 is accompanied by a wider range of indels than PE2. PE3 is recommended when guided editing efficiency is the priority. When minimizing indels is critical, PE2 offers about 10-fold lower indel frequencies. When an sgRNA that recognizes the installed edit can be used to nick the non-edited strand, the PE3b system can achieve PE3-like editing levels with greatly reduced indel formation.
To demonstrate the targeting scope and versatility of guided editing with PE3, all possible single-nucleotide substitutions across positions +1 to +8 of the HEK3 target site (counting the first base 3' of the pegRNA-induced nick as position +1) were explored using PE3 and pegRNAs with 10-nt RT templates (fig. 41A). Collectively, these 24 distinct edits cover all four transition mutations and all eight transversion mutations, and proceeded with an average indel-free editing efficiency of 33±7.9% (range 14% to 48%), with an average of 7.5±1.8% indels.
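The count of 24 edits follows from the numbering convention above: eight positions (+1 to +8), each with three possible alternative bases. The following sketch enumerates them for an arbitrary sequence; the example sequence in the usage note is a placeholder and is not the HEK3 sequence, and the function name is hypothetical.

```python
# Illustrative only: enumerate single-nucleotide substitutions at positions +1..+8.
def single_nucleotide_substitutions(downstream: str, first: int = 1, last: int = 8):
    """Return (position, ref, alt) tuples, where position +1 is the first base
    3' of the pegRNA-induced nick and downstream starts at that base."""
    edits = []
    for pos in range(first, last + 1):
        ref = downstream[pos - 1].upper()
        for alt in "ACGT":
            if alt != ref:
                edits.append((pos, ref, alt))
    return edits
```

For a placeholder sequence such as `"ACGTACGT"`, `single_nucleotide_substitutions` returns 8 x 3 = 24 edits, the same number examined at the HEK3 site.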
Importantly, long-range edits with long RT templates also led to efficient guided editing with PE3. For example, using PE3 with a 34-nt RT template, point mutations were installed at positions +12, +14, +17, +20, +23, +24, +26, +30, and +33 (12 to 33 bases from the pegRNA-induced nick) in the HEK3 locus with an average efficiency of 36±8.7% and 8.6±2.0% indels (fig. 41B). Although edits beyond position +10 were not attempted at other loci, other RT templates ≥30 nt at three alternative sites also supported efficient editing (figs. 48A-C). The viability of long RT templates enables efficient guided editing dozens of nucleotides from the initial nick site. Because an NGG PAM occurs on either DNA strand on average about once every 8 bp, far less than the maximum distance between the PAM and the edit that supports efficient guided editing, guided editing, unlike other precise genome editing methods 125,142,154, is essentially unconstrained by the availability of a nearby PAM sequence. Given the presumed relationship between RNA secondary structure and guided editing efficiency, when designing pegRNAs for long-range edits it is prudent to test RT templates of different lengths and, if necessary, of different sequence composition (e.g., synonymous codons) to optimize editing efficiency.
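The "about once every 8 bp" figure for NGG PAMs can be reproduced with a back-of-the-envelope estimate. The sketch assumes independent, equally frequent bases, which is a simplification not stated in the text: a GG dinucleotide then occurs with probability 1/16 at a given position on one strand, so counting both strands gives one PAM per 8 bp on average. The scanning helper and all names are hypothetical.

```python
# Illustrative only: NGG PAM density on both strands of a DNA sequence.
def expected_pam_spacing(p_g: float = 0.25) -> float:
    """Expected bp between NGG PAMs counting both strands, assuming
    independent bases with G frequency p_g (and equal C frequency)."""
    per_strand = p_g ** 2            # probability of GG at a given position
    return 1 / (2 * per_strand)      # two strands double the PAM density


def find_ngg_pams(seq: str):
    """Return (top, bottom): 0-based start indexes of 3-nt windows that are
    an NGG PAM on the given strand (top) or on the reverse strand (bottom,
    where the window reads CCN on the given strand)."""
    seq = seq.upper()
    top = [i for i in range(len(seq) - 2) if seq[i + 1:i + 3] == "GG"]
    bottom = [i for i in range(len(seq) - 2) if seq[i:i + 2] == "CC"]
    return top, bottom
```

Real genomes deviate from the uniform-base assumption, so the helper `find_ngg_pams` is the more reliable way to check PAM availability near a specific target.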
To further test the scope and limitations of the PE3 system for introducing transition and transversion point mutations, 72 additional edits covering all 12 possible types of point mutations were tested across six additional genomic target sites (figs. 41C to 41H). Overall, indel-free editing efficiency averaged 25±14%, while indel formation averaged 8.3±7.5%. Because the pegRNA RT template includes the PAM sequence, guided editing can induce changes in the PAM sequence. In these cases, higher editing efficiency (averaging 39±9.7%) and lower indel formation (averaging 5.0±2.9%) were observed (figs. 41A to 41K, point mutations at positions +5 or +6). This increase in efficiency and decrease in indel formation for PAM edits may arise from the inability of the Cas9 nickase to re-bind and nick the edited strand before repair of the complementary strand. Because guided editing supports combination edits with no apparent loss of editing efficiency, it is recommended to edit the PAM, in addition to making the other desired changes, whenever possible.
Next, 14 targeted small insertions and 14 targeted small deletions were made at seven genomic sites using PE3 (fig. 41I). Targeted 1-bp insertions proceeded with an average efficiency of 32±9.8%, and 3-bp insertions with an average efficiency of 39±16%. Targeted 1-bp and 3-bp deletions were also efficient, with average yields of 29±14% and 32±11%, respectively. Indel formation (insertions or deletions other than the targeted one) averaged 6.8±5.4%. Because insertions and deletions introduced between positions +1 and +6 alter the position or structure of the PAM, it is speculated that insertion and deletion edits in this range are typically more efficient because the Cas9 nickase cannot re-bind and nick the edited DNA strand before repair of the complementary strand, similar to point mutations that edit the PAM.
PE3 was also tested for its ability to mediate larger precise deletions of 5 bp to 80 bp at the HEK3 site (fig. 41J). Very high editing efficiencies (52% to 78%) were observed for 5-, 10-, and 15-bp deletions using a 13-nt PBS and RT templates containing 29, 24, or 19 bp of homology to the target locus, respectively. A 26-nt RT template supported a larger 25-bp deletion with 72±4.2% efficiency, and a 20-nt RT template enabled an 80-bp deletion with 52±3.8% efficiency. These targeted deletions were accompanied by an average indel frequency of 11±4.8% (fig. 41J).
Finally, PE3 was tested for its ability to mediate 12 combinations of multiple edits at the same target locus, consisting of an insertion and a deletion, an insertion and a point mutation, a deletion and a point mutation, or two point mutations, across three genomic sites. These combination edits were very efficient, averaging 55% of the targeted edit with 6.4% indels (fig. 41K), and demonstrate that guided editing can combine precise insertions, deletions, and point mutations at a single target site with high efficiency and low indel frequency.
In summary, the examples of figures 41A through 41K represent 156 different transitions, transversions, insertions, deletions and combinatorial edits spanning 7 human genomic loci. These findings confirm the versatility, accuracy and targeting flexibility of guided editing.
Comparison of guided editing with base editing
Current-generation cytidine base editors (CBEs) and adenine base editors (ABEs) can install C·G to T·A and A·T to G·C transition mutations with high efficiency and low indels 129,130,142. The application of base editing can be limited by the presence of multiple cytidine or adenine bases within the base editing activity window (typically about 5 bp wide), which gives rise to unwanted bystander edits 129,130,142,155, or by the absence of a PAM positioned about 15±2 nt from the target nucleotide 142,156. Guided editing was expected to be particularly useful for precisely installing transition mutations without bystander edits, or when the lack of a suitably positioned PAM precludes favorable positioning of the target nucleotide within the CBE or ABE activity window.
Optimized CBE by using nicking enzyme free Activity (BE 2 max) or having nicking enzyme Activity (BE 4 max) 157 Or using PE2 and PE 3-like guided editing systems, edit 3 genes containing multiple target cytidine in the classical base editing window (protospacer sites 4-8, PAM counted as sites 21-23)The genomic loci compare the guided editing with cytosine base editing. Of the 9 total target cytosines within the base editing window of 3 sites, the average total C.G to T.A conversion produced by BE4max was 2.2 times higher than PE3 for the base in the center of the base editing window (original spacer sites 5-7, FIG. 42A). Also, at these well-located bases, non-nicking producing BE2max was on average 1.4 times better than PE2 (fig. 42A). However, for cytosines beyond the center of the base editing window, PE3 performed 2.7 times better than BE4max and PE2 performed 2.0 times better than BE2max (average editing of PE3 40.+ -. 17% vs. BE4max 15.+ -. 18% and PE2 22.+ -. 11% vs. BE2max 11.+ -. 13%). In summary, the indel frequency of PE2 is very low (0.86.+ -. 0.47% on average), while the indel frequency of PE3 is similar to or slightly higher than BE4max (BE 4max range: 2.5% to 14%; PE3 range: 2.5% to 21%) (FIG. 42B).
Comparing the efficiency of base editing with that of pilot editing to install precise C.G to T.A edits (without any bystander editing), the efficiency of pilot editing at the above sites greatly exceeded that of base editing, similar to most genomic DNA sites, which contained multiple cytosines within an approximately 5-bp base editing window (FIG. 42C). At these sites (e.g., EMX1, which contains cytosines at the original spacer sites C5, C6, and C7), BE4max produced little product, which contained only a single target base pair conversion, without bystander editing. Conversely, guided editing at this site may be used to selectively install C.G to T.A edits at any site or combination of sites (C5, C6, C7, C5+C6, C6+C7, C5+C7, or C5+C6+C7) (FIG. 42C). All exact single-base or double-base edits (i.e., edits that do not modify any other nearby bases) are much more effective with PE3 or PE2 than with BE4max or BE2, respectively, while the three base c.g to t.a edits are more effective with BE4max (fig. 42C), reflecting the propensity of the base editor to edit all target bases within the activity window. Taken together, these results indicate that a cytosine base editor can make higher level edits than PE2 or PE3 at optimally located target bases, but that guided editing can be better than base editing at non-optimally located target bases, and can edit with much higher accuracy using multiple editable bases.
ABEmax by optimized non-nicking generating ABE (ABEmax with dCS 9 instead of Cas9 nicking enzyme 152 Hereinafter referred to as ABEdmax) compared to PE2 and the adenine base editor ABEmax generated by optimized nicks compared to PE3, comparing the a.t to g.c edits at the two genomic loci. At the position containing two target adenine's in the base edit window (HEK 3), ABE is more efficient than PE2 or PE3 for A5 transition, but PE3 is more efficient for A8 transition at the edge of the ABEmax edit window (fig. 42D). When comparing the efficiency of the exact editing in which only a single adenine was converted, PE3 was better than ABEmax at both A5 and A8 (fig. 42E). In summary, ABE produced significantly fewer indels at HEK3 than the leader editor (ABEdmax 0.19±0.02% versus 1.5±0.46% for PE2, ABEmax 0.53±0.16% versus 11±2.3% for PE3, fig. 42F). Where only a single a's FANCF is present within the base edit window, ABE2 and ABEmax are 1.8 to 2.9 times better than their guided editing counterparts in terms of total target base pair conversion, almost all editing products come from both base edits, with the guided edits containing only precise edits (fig. 42D to 42E). As with HEK3 site, ABE produced much fewer indels at the FANCF site (FIG. 42F).
Taken together, these results demonstrate that base editing and guided editing provide complementary advantages and disadvantages for performing targeted transition mutations. For the case where there is a single target nucleotide in the base editing window, or when bystander editing is acceptable, current base editors are generally more efficient and produce fewer indels than lead editors. Guidance editors offer great advantages when multiple cytosines or adenine are present and bystander editing is not desired, or when the target base positioning for base editing relative to available PAM is not good.
Off-target guided editing
To result in productive editing, guided editing requires target locus:pegRNA spacer complementarity for the Cas9 domain to bind, target locus:pegRNA PBS complementarity to initiate pegRNA-templated reverse transcription, and target locus:reverse transcriptase product complementarity for flap resolution. These three distinct DNA hybridization requirements may minimize off-target guided editing compared with other genome editing methods. To test this, HEK293T cells were treated with PE3 or PE2 and a total of 16 pegRNAs designed to target four on-target genomic loci, with Cas9 and the 4 corresponding sgRNAs targeting the same protospacers, or with Cas9 and the same 16 pegRNAs. These 4 target loci were selected because each has at least 4 well-characterized off-target sites at which Cas9 and the corresponding on-target sgRNA are known to cause substantial off-target DNA modification in HEK293T cells 118,159. After treatment, the 4 on-target loci and the top 4 known Cas9 off-target sites for each on-target spacer were sequenced, for a total of 16 off-target sites (Table 1).
Consistent with previous studies 118, Cas9 and the 4 on-target sgRNAs modified all 16 previously reported off-target loci (FIG. 42G). The Cas9 off-target modification efficiency among the 4 off-target sites for the HEK3 target locus averaged 16%. Cas9 and the sgRNA targeting HEK4 resulted in an average of 60% modification of the 4 tested known off-target sites. Similarly, off-target sites for EMX1 and FANCF were modified by Cas9:sgRNA at average frequencies of 48% and 4.3%, respectively (FIG. 42G). Notably, pegRNAs with Cas9 nuclease modified on-target sites at similar efficiency on average (1- to 1.5-fold lower) compared with sgRNAs, whereas pegRNAs with Cas9 nuclease modified off-target sites at about 4-fold lower average efficiency than sgRNAs.
Strikingly, PE3 or PE2 with the same 16 tested pegRNAs containing these four target spacers resulted in much lower off-target editing (FIG. 42H). Of the 16 sites known to undergo off-target editing by Cas9 + sgRNA, PE3 + pegRNA or PE2 + pegRNA resulted in detectable off-target guided editing at only 3 of 16 off-target sites, and only 1 of 16 showed off-target editing efficiency ≥1% (FIG. 42H). At these 16 known Cas9 off-target sites, average off-target guided editing for pegRNAs targeting HEK3, HEK4, EMX1, and FANCF was <0.1%, <2.2±5.2%, <0.1%, and <0.13±0.11%, respectively (FIG. 42H). Notably, at the HEK4 off-target 3 site, which Cas9 + pegRNA1 edits with 97% efficiency, PE2 + pegRNA1 resulted in only 0.7% off-target editing despite sharing the same spacer sequence, demonstrating how the two additional DNA hybridization events required for guided editing, compared with Cas9 editing, can greatly reduce off-target editing. Taken together, these results indicate that PE3 and pegRNAs induce much lower off-target DNA editing in human cells than Cas9 and sgRNAs targeting the same protospacers.
In principle, reverse transcription of 3'-extended pegRNAs can proceed into the guide RNA scaffold. If the resulting 3' flap, despite its 3' end lacking complementarity to the unedited DNA strand, is incorporated into the target locus, the result is insertion of pegRNA scaffold nucleotides that contributes to indel frequency. We analyzed sequencing data from 66 PE3-mediated editing experiments at 4 loci in HEK293T cells and observed pegRNA scaffold insertion at low frequency, averaging 1.7±1.5% total insertion of any number of pegRNA scaffold nucleotides (FIGS. 56A to 56D). Products that incorporate pegRNA scaffold nucleotides may be minimized by the inaccessibility of the guide RNA scaffold to reverse transcriptase due to Cas9 domain binding, and by cellular excision, during flap resolution, of the mismatched 3' end of 3' flaps that result from reverse transcription of the pegRNA scaffold. While such events are rare, future engineering of pegRNAs or guided editor proteins aimed at minimizing pegRNA scaffold incorporation may further reduce indel frequency.
Deaminases in some base editors can act in a Cas9-independent manner, resulting in low-level but widespread off-target DNA editing in first-generation CBEs (but not ABEs) 160-162 and off-target RNA editing in first-generation CBEs and ABEs 163-165, although newer CBE and ABE variants with engineered deaminases greatly reduce Cas9-independent off-target DNA and RNA editing 163-165. Guided editors lack base-modifying enzymes such as deaminases and therefore have no inherent ability to modify DNA or RNA bases in a Cas9-independent manner.
Although the reverse transcriptase domain in guided editors could in principle process properly primed RNA or DNA templates in cells, it is noted that retrotransposons such as those in the LINE-1 family 166, endogenous retroviruses 167,168, and human telomerase all provide active endogenous human reverse transcriptases. Their natural presence in human cells suggests that reverse transcriptase activity itself is not substantially toxic. Indeed, no PE3-dependent differences in HEK293T cell viability were observed compared with controls expressing dCas9, Cas9 H840A nickase, or PE2 with R110S+K103L mutations that inactivate reverse transcriptase (PE2-dRT) 169 (FIGS. 49A to 49B).
Despite the above data and analysis, additional studies are needed to evaluate off-target guided editing in an unbiased, genome-wide manner, and to characterize the extent to which reverse transcriptase variants in guided editors, or guided editing intermediates, may affect cells.
Guiding editing of pathogenic transversions, insertions and deletion mutations in human cells
PE3 was tested in human cells for its ability to directly install or correct transversion, small insertion, and small deletion mutations that cause genetic diseases. Sickle cell disease is most commonly caused by an A•T-to-T•A transversion mutation in HBB that results in a Glu6→Val mutation in β-globin. Ex vivo treatment of hematopoietic stem cells by HDR using Cas9 nuclease and a donor DNA template, followed by enrichment of edited cells, transplantation, and engraftment, is a promising potential strategy for treating sickle cell disease 170. However, this approach still generates many indel-containing byproducts in addition to the correctly edited HBB allele 170-171. While base editors generally produce far fewer indels, they currently cannot make the T•A-to-A•T transversion mutation needed to directly restore the normal sequence of HBB.
PE3 was used to install the HBB E6V mutation in HEK293T cells with 44% efficiency and 4.8% indels (FIG. 43A). From the mixture of PE3-treated cells, we isolated 6 HEK293T cell lines that are homozygous (triploid) for the HBB E6V allele (FIGS. 53A to 53D), demonstrating the ability of guided editing to generate human cell lines with pathogenic mutations. To correct the HBB E6V allele to wild-type HBB, we treated homozygous HBB E6V HEK293T cells with PE3 and pegRNAs programmed to directly revert the HBB E6V mutation to wild-type HBB. A total of 14 pegRNA designs were tested. After 3 days, DNA sequencing showed that all 14 pegRNAs, when combined with PE3, efficiently corrected HBB E6V to wild-type HBB (26% wild-type HBB without indels), with indel levels averaging 2.8±0.70% (FIG. 50A). The best pegRNA resulted in 52% correction of HBB E6V to wild type with 2.4% indels (FIG. 43A). Introducing a silent mutation that modifies the PAM recognized by the pegRNA modestly improved editing efficiency and product purity, to 58% correction with 1.4% indels (FIG. 43A). These results demonstrate that guided editing can install and correct pathogenic point mutations in human cell lines with high efficiency and minimal byproducts.
Tay-Sachs disease is most commonly caused by a 4-bp insertion in the HEXA gene (HEXA 1278+TATC) 136. PE3 was used to install this 4-bp insertion in HEK293T cells with 31% efficiency and 0.8% indels (FIG. 43B), and 2 HEK293T cell lines homozygous for the HEXA 1278+TATC allele were isolated (FIGS. 53A to 53D). These cells were used to test 43 pegRNAs and 3 nicking sgRNAs with the PE3 or PE3B system for correction of the pathogenic insertion in HEXA (FIG. 50B), either by perfect reversion to the wild-type allele or by a shifted 4-bp deletion that disrupts the PAM and installs a silent mutation. Of the 43 pegRNAs tested, 19 resulted in >20% editing. Perfect correction to wild-type HEXA with PE3 or PE3B and the best pegRNA proceeded with similar average efficiencies (30% for PE3 versus 33% for PE3B), but the PE3B system was accompanied by 5.3-fold fewer indels (1.7% for PE3 versus 0.32% for PE3B) (FIGS. 43B and 50B). These findings demonstrate that guided editing can make precise small insertions and deletions that install or correct pathogenic alleles in mammalian cells efficiently and with minimal byproducts.
Finally, installation of a protective SNP in PRNP, the gene encoding human prion protein (PrP), was tested. PrP misfolding causes progressive and fatal neurodegenerative prion disease, which can arise spontaneously, through dominant mutations in the PRNP gene, or through exposure to misfolded PrP 172. The PRNP G127V mutant allele confers resistance to prion disease in humans 138 and mice 173. PE3 was used to install G127V in the human PRNP allele in HEK293T cells, which requires a G•C-to-T•A transversion. Four pegRNAs and 3 nicking sgRNAs were evaluated with the PE3 system. DNA sequencing after 3 days of exposure to the most effective PE3 and pegRNA showed 53±11% efficiency of installing the G127V mutation, with indel levels of 1.7±0.7% (FIG. 43C). Taken together, these results establish the ability of guided editing in human cells to install or correct transversion, insertion, or deletion mutations that cause or confer resistance to disease, efficiently and with minimal byproducts.
Guided editing in different human cell lines and primary mouse neurons
Next, the ability of guided editing to edit endogenous sites in 3 additional human cell lines was tested. In K562 (leukemic bone marrow) cells, PE3 was used to perform transversion edits at the HEK3, EMX1, and FANCF sites, as well as an 18-bp insertion of a 6xHis tag in HEK3. Average editing efficiencies of 15-30% were observed for each of these four PE3-mediated edits, with indels averaging 0.85-2.2% (FIG. 43A). In U2OS (osteosarcoma) cells, transversion mutations were installed in HEK3 and FANCF, along with a 3-bp insertion and a 6xHis tag insertion in HEK3, with editing efficiencies of 7.9-22% that were 10- to 76-fold higher than indel formation (FIG. 43A). Finally, in HeLa (cervical cancer) cells, a 3-bp insertion in HEK3 was made with 12% average efficiency and 1.3% indels (FIG. 43A). Taken together, these data indicate that multiple cell lines other than HEK293T cells support guided editing, although editing efficiencies vary by cell type and are generally lower than in HEK293T cells. Edit:indel ratios remained high in all human cell lines tested.
To determine whether guided editing is possible in post-mitotic, terminally differentiated primary cells, primary cortical neurons harvested from E18.5 mice were transduced with a dual split-PE3 lentiviral delivery system in which a split intein 203 splices and reconstitutes PE2 protein from N-terminal and C-terminal halves, each delivered by a separate virus. To restrict editing to post-mitotic neurons, the human synapsin promoter, which is highly specific for mature neurons, was used to drive expression of both PE2 protein components. GFP was fused to the N-terminal half of PE2 through a self-cleaving P2A peptide 205. Nuclei were isolated from neurons two weeks after dual viral transduction and either sequenced directly or sorted for GFP expression before sequencing. In sorted nuclei, guided editing installed a transversion at the DNMT1 locus with an average efficiency of 7.1±1.2%, accompanied by an average of 0.58±0.14% indels (FIG. 43D). Cas9 nuclease in the same split-intein dual lentiviral system resulted in 31±5.5% indels in sorted cortical neuron nuclei (FIG. 43D). These data indicate that post-mitotic, terminally differentiated primary cells can support guided editing, and thus establish that guided editing does not require cell replication.
Comparing guided editing to Cas9 initiated HDR
The performance of PE3 was compared with that of optimized Cas9-initiated HDR 128,125 in mitotically active cell lines that support HDR 128. HEK293T, HeLa, K562, and U2OS cells were treated with Cas9 nuclease, sgRNAs, and ssDNA donor oligonucleotide templates designed to install a variety of transversion and insertion edits (FIGS. 43E to 43G and 51A to 51F). In all cases, Cas9-initiated HDR successfully installed the desired edit, but with much higher levels of byproducts (predominantly indels), as expected for treatments that cause double-strand breaks. Using PE3, HBB E6V installation and correction in HEK293T cells proceeded with average editing efficiencies of 42% and 58%, with average indels of 2.6% and 1.4%, respectively (FIGS. 43E and 43G). In contrast, the same edits with Cas9 nuclease and an HDR template resulted in average editing efficiencies of 5.2% and 6.7%, with average indel frequencies of 79% and 51% (FIGS. 43E and 43G). Similarly, PE3 installed PRNP G127V with 53% efficiency and 1.7% indels, whereas Cas9-initiated HDR installed this mutation with 6.9% efficiency and 53% indels (FIGS. 43E and 43G). Thus, for HBB E6V installation, HBB E6V correction, and PRNP G127V installation, the edit:indel ratio was on average 270-fold higher for PE3 than for Cas9-initiated HDR.
Comparisons between PE3 and HDR in human cell lines other than HEK293T showed similar results, although with lower PE3 editing efficiencies. For example, in K562 cells, PE3-mediated 3-bp insertion in HEK3 proceeded with 25% efficiency and 2.8% indels, compared with 17% editing and 72% indels for Cas9-initiated HDR, a 40-fold edit:indel ratio advantage favoring PE3 (FIGS. 43F to 43G). In U2OS cells, PE3 made this 3-bp insertion with 22% efficiency and 2.2% indels, while Cas9-initiated HDR resulted in 15% editing with 74% indels, a 49-fold lower edit:indel ratio (FIGS. 43F to 43G). In HeLa cells, PE3 made this insertion with 12% efficiency and 1.3% indels, compared with 3.0% editing and 69% indels for Cas9-initiated HDR, a 210-fold difference in edit:indel ratio (FIGS. 43F to 43G). Taken together, these data indicate that HDR typically results in similar or lower editing efficiency and far higher indels than PE3 in the four cell lines tested (FIGS. 51A to 51F).
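The edit:indel comparisons in the two preceding paragraphs reduce to simple arithmetic on the reported percentages. The following Python sketch recomputes them from the values quoted in the text. The function and variable names are illustrative only and are not part of the disclosure, and small differences from the stated fold changes reflect rounding of the reported percentages.

```python
from statistics import mean


def edit_indel_ratio(edit_pct: float, indel_pct: float) -> float:
    """Ratio of desired edits to indel byproducts (both given in percent)."""
    if indel_pct <= 0:
        raise ValueError("indel percentage must be positive")
    return edit_pct / indel_pct


def fold_advantage(pe3: tuple, hdr: tuple) -> float:
    """How many times higher the PE3 edit:indel ratio is than the HDR ratio."""
    return edit_indel_ratio(*pe3) / edit_indel_ratio(*hdr)


# (edit %, indel %) pairs as quoted in the text: (PE3, Cas9-initiated HDR).
HEK293T_EDITS = {
    "HBB E6V installation": ((42.0, 2.6), (5.2, 79.0)),
    "HBB E6V correction": ((58.0, 1.4), (6.7, 51.0)),
    "PRNP G127V installation": ((53.0, 1.7), (6.9, 53.0)),
}
OTHER_CELL_LINES = {
    "K562": ((25.0, 2.8), (17.0, 72.0)),
    "U2OS": ((22.0, 2.2), (15.0, 74.0)),
    "HeLa": ((12.0, 1.3), (3.0, 69.0)),
}

hek293t_folds = {k: fold_advantage(pe3, hdr) for k, (pe3, hdr) in HEK293T_EDITS.items()}
other_folds = {k: fold_advantage(pe3, hdr) for k, (pe3, hdr) in OTHER_CELL_LINES.items()}
mean_hek293t_fold = mean(hek293t_folds.values())

if __name__ == "__main__":
    for name, fold in {**hek293t_folds, **other_folds}.items():
        print(f"{name}: {fold:.1f}-fold")
    print(f"HEK293T mean: {mean_hek293t_fold:.1f}-fold")
```

Dividing one edit:indel ratio by the other, rather than comparing editing efficiencies alone, is what makes the HDR byproduct burden visible: HDR and PE3 editing efficiencies differ by roughly an order of magnitude at most, while the ratios differ by one to two orders of magnitude.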
Discussion and future directions
The ability to insert DNA sequences with single-nucleotide precision is a particularly enabling guided editing capability. For example, PE3 was used to precisely insert a His6 tag (18 bp, 65% average efficiency), a FLAG epitope tag (24 bp, 18% average efficiency), and an extended LoxP site (the natural substrate of Cre recombinase; 44 bp, 23% average efficiency) into the HEK3 locus of HEK293T cells. Average indels for these examples ranged from 3.0% to 5.9% (FIG. 43H). Many biotechnology, synthetic biology, and therapeutic applications can be envisioned that arise from the ability to efficiently and precisely introduce new DNA sequences into target sites of interest in living cells.
In total, the guided editing experiments described herein installed 18 insertions of up to 44 bp, 22 deletions of up to 80 bp, 113 point mutations (including 77 transversions), and 18 combination edits across 12 endogenous loci in the human and mouse genomes, at positions ranging from 3 bp upstream to 29 bp downstream of the start of the PAM, without causing significant double-stranded DNA breaks. These results establish guided editing as a remarkably versatile genome editing method. Because the vast majority (85-99%) of insertions, deletions, indels, and duplications in ClinVar are ≤30 bp (FIGS. 52A to 52D), guided editing can in principle correct up to about 89% of the 75,122 pathogenic human genetic variants currently known in ClinVar (the transitions, transversions, insertions, deletions, indels, and duplications in FIG. 38A), with additional potential to ameliorate diseases caused by copy number gain or loss.
Importantly, for any desired edit, the flexibility of guided editing offers many possible choices of pegRNA-induced nick location, sgRNA-induced second nick location, PBS length, RT template length, and which strand to edit first, as demonstrated extensively herein. This flexibility, which contrasts with the more limited options typically available for other precise genome editing methods 125,142,154, allows editing efficiency, product purity, DNA specificity, or other parameters to be optimized to suit the needs of a given application, as shown in FIGS. 50A to 50B, in which 14 and 43 pegRNAs covering a range of guided editing strategies were tested to optimize correction of the pathogenic HBB and HEXA alleles, respectively.
By enabling high-precision targeted transitions, transversions, small insertions, and small deletions in the genomes of mammalian cells without requiring double-strand breaks or HDR, guided editing provides a new "search-and-replace" capability that substantially expands the scope of genome editing.
Example 2: pegRNA modification
Described herein are a series of pegRNA designs and strategies that can improve the efficiency of guided editing (PE).
Guided editing (PE) is a genome editing technology that can replace, insert, or remove defined DNA sequences within a targeted genetic locus using information encoded in a guided editing guide RNA (pegRNA). A guided editor (PE) consists of a sequence-programmable DNA-binding protein with nuclease activity (Cas9) fused to a polymerase such as a reverse transcriptase (RT). The PE forms a complex with a pegRNA, which contains the information for targeting a specific DNA locus within its spacer sequence, as well as the information specifying the desired edit in an engineered extension built into a standard sgRNA scaffold. The PE:pegRNA complex binds and nicks the programmed target DNA locus, allowing the nicked DNA strand to hybridize to the engineered primer binding sequence (PBS) of the pegRNA. The reverse transcriptase domain then copies the edit-encoding information in the RT template portion of the pegRNA, using the nicked genomic DNA as a primer for DNA polymerization. Subsequent DNA repair processes incorporate the newly synthesized edited DNA strand into the genomic locus. Improving the design of these pegRNAs could increase PE efficiency and enable installation of longer inserted sequences into the genome.
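The relationship between the spacer, PBS, and RT template described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design procedure of the disclosure: it assumes an SpCas9-type nick on the PAM-containing strand between protospacer positions 17 and 18 (an assumption not stated in this section), a 3' extension laid out as RT template followed by PBS, and a caller-supplied scaffold. The class, function names, the placeholder scaffold string, and the example sequences are all hypothetical.

```python
from dataclasses import dataclass

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string containing only A/C/G/T."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Write a DNA sequence in RNA letters (T -> U)."""
    return dna.upper().replace("T", "U")


@dataclass(frozen=True)
class PegRNA:
    spacer: str       # targets the DNA locus
    scaffold: str     # Cas9-binding sgRNA scaffold (supplied by the caller)
    rt_template: str  # encodes the desired edit
    pbs: str          # anneals to the 3' end of the nicked DNA strand

    @property
    def extension(self) -> str:
        """3' extension, written 5'->3': RT template followed by PBS."""
        return self.rt_template + self.pbs

    @property
    def sequence(self) -> str:
        return self.spacer + self.scaffold + self.extension


def design_pegrna(protospacer: str, edited_flap: str, scaffold: str,
                  pbs_len: int = 13, nick_index: int = 17) -> PegRNA:
    """Assemble a pegRNA from PAM-strand DNA sequences written 5'->3'.

    edited_flap is the desired DNA sequence immediately 3' of the nick.
    nick_index=17 is the ASSUMED nick position (3 nt upstream of the PAM).
    """
    if not 0 < pbs_len <= nick_index <= len(protospacer):
        raise ValueError("pbs_len and nick_index must fit within the protospacer")
    primer = protospacer[nick_index - pbs_len:nick_index]  # 3' end of nicked strand
    return PegRNA(
        spacer=to_rna(protospacer),
        scaffold=scaffold,
        rt_template=to_rna(reverse_complement(edited_flap)),
        pbs=to_rna(reverse_complement(primer)),
    )


# Hypothetical 20-nt protospacer and 10-nt edited flap, for illustration only.
example = design_pegrna(
    protospacer="GGCCCAGACTGAGCACGTGA",
    edited_flap="TGATGGCAGA",
    scaffold="[SCAFFOLD]",
)
```

Because the PBS and RT template are both reverse complements of PAM-strand DNA, reverse-complementing the whole extension recovers the primer sequence followed by the edited flap, which is a convenient sanity check on any design.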
Described herein are a series of pegRNA designs that are expected to increase PE efficacy. These designs utilize many previously published methods to improve the efficacy and/or stability of sgrnas, and utilize many new strategies. These improvements may fall into one or more of many different categories:
(1) Longer pegRNAs. This class relates to improved designs that enable efficient expression of functional pegRNAs from non-polymerase III (pol III) promoters, which would enable the expression of longer pegRNAs without burdensome sequence requirements;
(2) Core improvement. This class relates to improvements to core, cas 9-binding pegRNA scaffolds, which can improve efficacy;
(3) RT processivity. This class relates to modifications to the pegRNA that improve RT processivity, enabling the insertion of longer sequences at targeted genomic loci; and
(4) Terminal motifs. This class involves the addition of RNA motifs at the 5 'and/or 3' end of the pegRNA that increase pegRNA stability, enhance RT processivity, prevent misfolding of the pegRNA, or recruit other factors important for genome editing.
Many potential such pegRNA designs in each category are described herein. Several of these designs have been previously described for improving sgRNA activity with Cas9 and are indicated as such. Also described herein is a platform for evolving pegRNAs for a given sequence target, which would enable improvement of the pegRNA scaffold and enhancement of PE activity (5). Notably, these designs could also be readily applied to improve pegRNAs recognized by any Cas9 or evolved variant thereof.
(1) Longer pegRNA
sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expression of short RNAs that are retained in the nucleus. However, pol III is not highly processive and cannot express RNAs longer than a few hundred nucleotides at the levels required for efficient genome editing 183. In addition, pol III can stall or terminate at stretches of U's, which may limit the sequence diversity that can be inserted using a pegRNA. Other promoters that recruit polymerase II (e.g., pCMV) or polymerase I (e.g., the U1 snRNA promoter) have been examined for their ability to express longer sgRNAs 183. However, these promoters are typically partially transcribed, which results in extra sequence 5' of the spacer in the expressed pegRNA, and this has been shown to markedly reduce Cas9:sgRNA activity in a site-dependent manner. Moreover, while pol III-transcribed pegRNAs can simply terminate in a run of 6-7 U's, pegRNAs transcribed from pol II or pol I require a different termination signal. Often such signals also result in polyadenylation, which would lead to undesired transport of the pegRNA out of the nucleus. Similarly, RNAs expressed from pol II promoters such as pCMV are typically 5'-capped, which also results in their nuclear export.
Previously, Rinn and colleagues screened a variety of expression platforms for the production of long non-coding RNA (lncRNA)-tagged sgRNAs 183. These platforms include RNAs expressed from pCMV that terminate in the ENE element from the human MALAT1 ncRNA 184, the PAN ENE element from KSHV 185, or the 3' box from U1 snRNA 186. Notably, the MALAT1 ncRNA and PAN ENEs form triple helices that protect the polyA tail 184,187. In addition to enabling expression of RNAs, these constructs may also enhance RNA stability (see section (4)). Use of the promoter from U1 snRNA was also explored to enable expression of these longer sgRNAs 183. These expression systems would also enable expression of longer pegRNAs. In addition, a series of methods has been designed for cleaving off the portion of the pol II promoter that would be transcribed as part of the pegRNA: adding a self-cleaving ribozyme such as the hammerhead 188, pistol 189, hatchet 189, hairpin 190, VS 191, twister 192, or twister sister 192 ribozymes, or other self-cleaving elements, to process the transcribed guide, or adding a hairpin that is recognized by Csy4 and also leads to processing of the guide 193. Furthermore, incorporation of multiple ENE motifs may improve pegRNA expression and stability. Circularization of the pegRNA in the form of a circular intronic RNA (cRNA) may also lead to enhanced RNA expression and stability, as well as nuclear localization 194. Exemplary pegRNA expression platforms are represented by SEQ ID NOS: 241-245.
(2) Core/scaffold improvements
The core, Cas9-binding pegRNA scaffold can be modified to enhance PE activity. In an exemplary approach, the first pairing element of the scaffold (P1) contains a GTTTT-AAAAAAC (SEQ ID NO: 246) pairing element. Such runs of T's can result in pol III pausing and premature termination of the RNA transcript. Rational mutation of one of the T-A pairs to a G-C pair in this portion of P1 has been shown to enhance sgRNA activity, and this approach can be used to improve pegRNAs. In addition, increasing the length of P1 can enhance sgRNA folding and lead to improved activity. Finally, refining the pegRNA scaffold through directed evolution of pegRNAs on a given DNA target may also result in improved activity. This is described in section (5). Exemplary modified pegRNAs are represented by SEQ ID NOS: 247 and 248.
A number of structural modifications to the gRNA scaffold were also tested, none of which showed a significant increase in editing activity (see FIG. 82, 3.30.13 to 3.30.19 on the X-axis, compared with 3.30). However, two caveats apply to these data. First, the guide used here already works well, so testing with a less efficient guide would be more informative. Second, transfection of HEK cells is very efficient, and the amount of guide RNA transfected appears to be well in excess of what is required (reducing the amount by about 4- to 8-fold had no effect on editing). These improvements may only become apparent in other cell types where transfection is less efficient, or when a less efficient guide is used. Many of these changes have precedent for increasing sgRNA activity in other cell lines.
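Because pol III can stall or terminate at runs of U's, a candidate pegRNA intended for U6-driven expression can be screened for internal U tracts before use. The following Python sketch shows one way to do this. The function name and the default threshold of four consecutive U's are assumptions chosen for illustration (the text specifies only that termination occurs in a run of 6-7 U's), so the threshold is exposed as a parameter.

```python
import re
from typing import List, Tuple


def find_u_tracts(seq: str, min_run: int = 4) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of at least min_run U residues.

    DNA input is accepted and treated as its RNA equivalent (T -> U).
    min_run=4 is an illustrative default, not a value from the disclosure.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    normalized = seq.upper().replace("T", "U")
    pattern = re.compile("U{%d,}" % min_run)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(normalized)]


def has_internal_terminator(seq: str, min_run: int = 6) -> bool:
    """True if any U tract of at least min_run ends before the 3' end of seq."""
    return any(start + length < len(seq) for start, length in find_u_tracts(seq, min_run))
```

A tract that sits at the very 3' end is the intended terminator, so `has_internal_terminator` flags only tracts that end before the last nucleotide; sequences that fail this check are candidates for expression from a non-pol III promoter or for recoding of the template.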
The sequence of the construct of fig. 82 is as follows:
[Sequence table for the constructs of FIG. 82 not reproduced here.]
Note that where there is no terminal motif, or the terminal motif does not end in a U tract, the following HDV ribozyme was used to terminate the transcript:
GGCCGGCAUGGUCCCAGCCUCCUCGCUGGCGCCGGCUGGGCAACAUGCUUCGGCAUGGCGAAUGGGAC(SEQ ID NO:457)
(3) Enhancing RT processivity via modification of the template region of the pegRNA
As the size of the insertion templated by the pegRNA increases, the template becomes more likely to be degraded by endonucleases, to undergo spontaneous hydrolysis, or to fold into secondary structures that either cannot be reverse transcribed by the RT or disrupt folding of the pegRNA scaffold and subsequent Cas9-RT binding. Accordingly, modifications to the pegRNA template may be necessary to effect large insertions, such as the insertion of whole genes. Some strategies for doing so include incorporating modified nucleotides into a synthetic or semi-synthetic pegRNA that render the RNA more resistant to degradation or hydrolysis, or less likely to adopt inhibitory secondary structures 196. Such modifications could include 8-aza-7-deazaguanosine, which would reduce RNA secondary structure in G-rich sequences; locked nucleic acids (LNA), which reduce degradation and enhance certain kinds of RNA secondary structure; and 2'-O-methyl, 2'-fluoro, or 2'-O-methoxyethoxy modifications, which enhance RNA stability. Such modifications could also be included elsewhere in the pegRNA to enhance stability and activity. Alternatively or additionally, the template of the pegRNA could be designed such that it both encodes the desired protein product and is more likely to adopt simple secondary structures that can be unfolded by the RT. Such simple structures would act as a thermodynamic sink, making it less likely that more complicated structures that would prevent reverse transcription would occur. Finally, the template could also be split into two separate pegRNAs. In such a design, the PE would be used to initiate transcription and also to recruit a separate template RNA to the targeted site via an RNA-binding protein fused to Cas9 or an RNA recognition element (e.g., an MS2 aptamer) on the pegRNA itself. The RT could either bind directly to this separate template RNA, or initiate reverse transcription on the original pegRNA before swapping to the second template.
Such methods can achieve long insertions by preventing misfolding of the pegRNA upon addition of the long template and also by eliminating the need to dissociate Cas9 from the genome for long insertions to occur (which may inhibit PE-based long insertions).
(4) Mounting additional RNA motifs at the 5 'or 3' end
pegRNA can also be improved via the installation of additional motifs at either end of the RNA end. Previously in section (i) 184,185 Several such motifs are discussed-e.g. PAN ENE from KSHV and ENE from MALAT1 as possible means of terminating expression of longer pegRNA from non-pol III promoters. These elements form RNA triplexes that engulf the polyA tail, resulting in their retention in the nucleus 184,187 . However, by forming complex structures at the 3' end of the pegRNA to block the terminal nucleotides, these structures may also help prevent exonuclease-mediated degradation of the pegRNA. Although termination from a non-pol III promoter cannot be achieved, other structural elements inserted at the 3' end may also enhance RNA stability. Such motifs may include hairpin or RNA quadruplexes that block the 3' end 197 Or a self-cleaving ribozyme (e.g., HDV) that results in the formation of a 2' -3' -cyclophosphate at the 3' end and also potentially makes the pegRNA less likely to be degraded by exonucleases 198 . Inducing cyclization of the pegRNA via incomplete splicing-to form a cRNA-can also increase pegRNA stability and result in retention of the pegRNA in the nucleus 194
Additional RNA motifs may also improve RT processivity or enhance pegRNA activity by enhancing binding of the RT to the DNA-RNA duplex. Adding the native sequence bound by the RT in its cognate retroviral genome may enhance RT activity 199. This may include the native primer binding site (PBS), the polypurine tract (PPT), or the kissing loops involved in retroviral genome dimerization and initiation of transcription 199. Adding dimerization motifs, such as kissing loops or a GNRA tetraloop/tetraloop receptor pair 200, at the 5' and 3' ends of the pegRNA may also result in effective circularization of the pegRNA, improving stability. Furthermore, the addition of these motifs is expected to enable physical separation of the pegRNA spacer and primer, preventing occlusion of the spacer that would hinder PE activity. Short 5' or 3' extensions of the pegRNA that form a small toehold hairpin in the spacer region may also compete favorably against annealing of other regions of the pegRNA to the spacer. Finally, kissing loops may also be used to recruit other template RNAs to the genomic site and enable swapping of RT activity from one RNA to the other (section iii). Exemplary pegRNA constructs are represented by SEQ ID NOs: 251-255.
(5) Evolution of pegRNA
The pegRNA scaffold may be further improved via directed evolution, in a manner analogous to how SpCas9 and base editors have been improved 201. Directed evolution may enhance recognition of the pegRNA by Cas9 or by evolved Cas9 variants. Furthermore, different pegRNA scaffold sequences may be optimal at different genomic loci, either enhancing PE activity at the site in question, reducing off-target activity, or both. Finally, evolution of pegRNA scaffolds to which other RNA motifs have been added would almost certainly improve the activity of the fused pegRNA relative to the unevolved fusion RNA. For example, evolution of an allosteric ribozyme composed of a c-di-GMP-I aptamer and a hammerhead ribozyme led to dramatically improved activity 202, suggesting that evolution would also improve the activity of hammerhead-pegRNA fusions. In addition, while Cas9 currently does not generally tolerate 5' extension of the sgRNA, directed evolution may generate enabling mutations that mitigate this intolerance, allowing additional RNA motifs to be used.
As described herein, many such methods have been described for use with Cas9:sgRNA complexes, but no designs for improving pegRNA activity have been reported. Other strategies for installing programmable mutations into the genome include base editing, homology-directed recombination (HDR), precise microhomology-mediated end joining (MMEJ), and transposase-mediated editing. However, all of these methods have significant drawbacks compared to PE. Current base editors, while more efficient than existing PEs, can install only certain classes of genomic mutations and may result in additional, undesired nucleotide conversions at the site of interest. HDR is feasible in only a small minority of cell types and results in comparatively high rates of random insertion and deletion mutations (indels). Precise MMEJ can lead to predictable repair of double-strand breaks, but it is largely limited to installing deletions, is highly site-dependent, and can also have comparatively high rates of undesired indels. To date, transposase-mediated editing has been shown to function only in bacteria. Thus, such improvements to PE may represent the best route to the therapeutic correction of a broad range of genomic mutations.
(5) PBS toe loop
To further increase PE activity, the inventors contemplated adding a toe loop sequence at the 3' end of a pegRNA having a 3' extension arm. FIG. 71A provides an example (top molecule) of a generic SpCas9 pegRNA with a 3' extension arm. The 3' extension arm, in turn, comprises an RT template (which includes the desired edit) and a primer binding site (PBS) at the 3' end of the molecule. The molecule terminates with a poly(U) sequence comprising three U nucleobases (i.e., 5'-UUU-3').
In contrast, the bottom portion of FIG. 71A shows the same pegRNA molecule as the top portion, except that a 9-nucleobase sequence, 5'-GAAANNNNN-3', has been inserted between the 3' end of the primer binding site and the 5' end of the terminal poly(U) sequence. This structure folds back on itself by 180° to form a "toe loop" RNA structure, in which the 5'-NNNNN-3' portion of the 9-nucleobase insertion anneals to a complementary sequence in the primer binding site and the 5'-GAAA-3' portion forms the 180° turn. The features of the toe loop sequence depicted in FIG. 71A are not intended to limit or narrow the range of toe loops that may be used in its place. Furthermore, the sequence of the toe loop will depend on the complementary sequence of the primer binding site. In essence, however, in various embodiments the toe loop sequence may have a first sequence portion that forms a 180° turn and a second sequence portion that is complementary to a portion of the primer binding site.
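The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the example PBS is arbitrary, and the choice to pair the 5'-NNNNN-3' portion with the 3'-most five nucleotides of the PBS is an assumption (the text says only that it anneals to a complementary sequence in the primer binding site).

```python
# Minimal sketch (hypothetical helper names): build a 5'-GAAANNNNN-3' toe loop
# insert for a given PBS and append it, with the poly(U) tail, to an extension arm.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def toe_loop_insert(pbs: str, stem_len: int = 5, turn: str = "GAAA") -> str:
    """Return the turn sequence followed by a stem complementary to the PBS.

    ASSUMPTION: the stem pairs with the 3'-most `stem_len` nucleotides of the
    PBS, so that the GAAA turn sits directly between the two paired strands.
    """
    if not 0 < stem_len <= len(pbs):
        raise ValueError("stem_len must be between 1 and len(pbs)")
    return turn + reverse_complement(pbs[-stem_len:])


def with_toe_loop(extension_arm: str, pbs: str, poly_u: str = "UUU") -> str:
    """Append the toe loop insert and the terminal poly(U) to an extension arm.

    `extension_arm` is expected to end with `pbs` (RT template + PBS).
    """
    if not extension_arm.endswith(pbs):
        raise ValueError("extension arm must end with the PBS")
    return extension_arm + toe_loop_insert(pbs) + poly_u


if __name__ == "__main__":
    example_pbs = "ACGUGCUCAGUCU"  # arbitrary illustrative PBS
    print(toe_loop_insert(example_pbs))
```

A different stem length or turn sequence can be passed in, consistent with the statement that the depicted toe loop is not limiting.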
Without being bound by theory, it is believed that the toe loop sequence enables the use of pegRNAs with longer primer binding sites than would otherwise be possible. Longer PBS sequences, in turn, are thought to increase PE activity. More specifically, a likely function of the toe loop is to prevent, or at least minimize, interaction of the PBS with the spacer. Formation of a stable hairpin between the PBS and the spacer can result in an inactive pegRNA. Without the toe loop, this interaction may make it necessary to limit the length of the PBS. Using a 3'-terminal toe loop to block or minimize interaction between the spacer and the PBS may therefore result in improved PE activity.
(6) Expression of pegRNA from non-pol III promoters
Using insertion of a 102-nucleotide sequence from FKBP as a readout, various pegRNA expression systems were tested for their ability to produce pegRNA.
Transcription of pegRNA can be directed by a typical constitutive promoter, such as the U6 promoter. While the U6 promoter is effective in directing transcription of pegRNA in most cases, it is not very effective in directing transcription of longer pegRNAs or of U-rich RNAs. U-rich stretches of RNA can lead to premature termination of transcription. This example compares editing by guides expressed from the CMV promoter or the U1 promoter with editing by guides expressed from the U6 promoter. These promoters require different terminator sequences, such as MASC ENE or PAN ENE, as provided below. Increased editing was observed with the pCMV/MASC-ENE system, but these guides resulted in incomplete insertion of the sequence, whereas with the U6 promoter complete insertion was observed at lower levels of editing. See FIG. 81. The data indicate that alternative expression systems may be useful for long insertions.
The nucleotide sequence of the pCMV/MASC-ENE expression system is as follows (5 '-to-3' -direction) (where the motif names are shown in bold before the regions to which they refer):
[Sequence of the pCMV/MASC-ENE expression system: presented as an image in the source.]
Legend:
[pCMV promoter] - binds pol II RNA polymerase
[Csy4 loop] - bound by the Csy4 protein, resulting in cleavage 3' of the loop. This is required because a portion of the [pCMV promoter] is transcribed, and this sequence, if attached 5' of the gRNA, reduces or eliminates activity (as previously known).
pegRNA [spacer sequence]
[pegRNA scaffold]
[template for DNA synthesis]
[inserted edit (108 nt from FKBP)]
[primer binding site]
[linker] (highly variable) - connects the PBS and the terminator element
[MASC ENE transcription terminator] - transcription of this element results in termination of transcription; a polyA tail is encoded and is then sequestered by the ENE element
[unimportant sequence]
[Ubc promoter] - necessary for expression of the Csy4 protein
[Csy4 protein and NLS] - required for processing the 5' end of the guide. Other strategies that do not require expression of a larger protein (e.g., ribozyme-mediated cleavage of the spacer) may also be used, but these require more individual tailoring to different spacer sequences, so this approach was used.
[SV40 terminator] - used to terminate expression of the Csy4 protein.
(7) Additional RNA motifs
For details on certain motifs that can be introduced into a pegRNA to enhance its performance, see FIG. 82, e.g., an HDV ribozyme 3' of the pegRNA, a G-quadruplex insertion, a P1 extension, a template hairpin, and a tetraloop-circularized pegRNA.
Specifically, this example tested the effect of installing tRNA motifs 3' to the primer binding site. This element is chosen because of a variety of potential functions:
(1) tRNA motifs are very stable RNA motifs, thus potentially reducing degradation of pegRNA;
(2) MMLV RT uses a prolyl-tRNA as a primer when converting the viral genome into DNA during reverse transcription, so it is hypothesized that the same tRNA cap could be bound by the RT, improving binding of the PE to the pegRNA, improving RNA stability, and bringing the PBS closer to the genomic locus, which may also improve activity.
In these constructs, the P1 of the tRNA was extended (see FIG. 84). P1 refers to the first stem/base-pairing element of the tRNA (see FIG. 84). This extension is believed to be necessary to prevent RNase P-mediated cleavage 5' of the tRNA P1, which would result in removal of the tRNA from the pegRNA.
In this design, a prolyl-tRNA (codon CGG) with an extended P1 and a short 3-nt linker between the tRNA and the PBS was used. Various tRNA designs were tested, and their editing efficiency was compared with that of pegRNAs lacking the tRNA cap; see the comparison data in FIG. 83 (a PE experiment targeting the HEK3 gene, specifically a 10-nt insertion at the +1 position relative to the nick site, using PE3), FIG. 85 (a PE experiment targeting the FANCF gene, specifically a G-to-T conversion at the +5 position relative to the nick site, using a PE3 construct), and FIG. 86 (a PE experiment targeting the HEK3 gene, specifically a 71-nt FLAG tag insertion at the +1 position relative to the nick site, using a PE3 construct). The tRNA-modified pegRNAs were tested against unmodified pegRNA controls.
In the figure labels, UGG/CGG refers to the codon used, the number refers to the length of the added P1 extension, "long" indicates an 8-nt linker, and no designation indicates a 3-nt linker.
The data indicate that installation of the tRNA enables the use of a shorter PBS, which may lead to additional improvements in activity. In the case of RNF2, the linker used may have led to increased binding of the PBS to the spacer and thus to reduced activity.
Some of the sequences used:
HEK3 +1 FLAG-tag insertion, prolyl-tRNA {UGG}, P1 ext 5 nt, linker 3 nt
GGCCCAGACUGAGCACGUGAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUGGAGGAAGCAGGGCUUCCUUUCCUCUGCCAUCACUUAUCGUCGUCAUCCUUGUAAUCCGUGCUCAGUCUGUCUGGCGGGGCUCGUUGGUCUAGGGGUAUGAUUCUCGCUUCGGGUGCGAGAGGUCCCGGGUUCAAAUCCCGGACGAGCCCCGCCUUUU(SEQ ID NO:459)
FANCF +5 G to T, prolyl-tRNA {CGG}, P1 ext 5 nt, linker 3 nt
GGAAUCCCUUCUGCAGCACCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAAAGCGAUCAAGGUGCUGCAGAAGGGAUCUGGCGGGGCUCGUUGGUCUAGGGGUAUGAUUCUCGCUUCGGGUGCGAGAGGUCCCGGGUUCAAAUCCCGGACGAGCCCCGCCUUUU(SEQ ID NO:460)
HEK3 +1 10-nt insertion, prolyl-tRNA {UGG}, P1 ext 5 nt, linker 3 nt
GGCCCAGACUGAGCACGUGAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGGACCGAGUCGGUCCUCUGCCAUCAAAGCUUCGACCGUGCUCAGUCUUCUGCUCGAGGCGGGGCUCGUUGGUCUAGGGGUAUGAUUCUCGCUUCGGGUGCGAGAGGUCCCGGGUUCAAAUCCCGGACGAGCCCCGCCUCGAGCUUUU(SEQ ID NO:461)
The sequences reported in the data of FIGS. 85 and 86 are as follows:
[Sequence tables: presented as images in the source.]
Example 3: Next-generation pegRNA modifications for improved guided editing efficiency
Background
The guided editor complex consists of two components. First, in one embodiment, the guided editor (PE) itself is a programmable nuclease, such as Streptococcus pyogenes Cas9 (SpCas9), that carries mutations inactivating the HNH nuclease domain and is fused to a polymerase, such as a reverse transcriptase. The second component is the pegRNA, which both targets the editor to the programmed genomic site and contains the template used by the reverse transcriptase to install the programmed edit. While PEs are capable of making nearly any programmable edit, PEs typically have lower activity than base editors (BEs) for comparable edits. It is believed that rational engineering of the pegRNA can improve editing outcomes and enable broader guided editing of cellular genomes, for example, in non-HEK293T cell lines.
It is believed that both the Cas9 affinity and the stability of pegRNAs are reduced relative to canonical single guide RNAs (sgRNAs). This reduction in Cas9 affinity may be due not only to the 3' extension but also to formation of an RNA duplex between the spacer and the primer binding site (PBS) that inhibits Cas9 binding. Indeed, longer PBS lengths completely abolished PE activity at all tested sites, presumably through this mechanism. Furthermore, transfection of pegRNAs with SpCas9 nuclease resulted in fewer indels than sgRNAs targeting the same site, further suggesting that the 3' extension reduces Cas9 binding and possibly catalytic activity. The 3' extension is also likely unprotected by Cas9 and may be degraded by exonucleases or bound by other cellular factors that compete with Cas9 or RT binding or otherwise inhibit guided editing. Indeed, when the cellular lifetime of pegRNAs was examined via RT-qPCR, a significant reduction in the stability of the 3' extension relative to the scaffold region was observed (FIGS. 90A-C).
Existing sgRNA improvement strategies
Many sgRNA modifications have been reported to improve editing in human cells. The most common such modifications are the "flip" and "extension" mutations of the sgRNA scaffold. It was previously noted that a run of four uridine (U) nucleotides in the direct repeat (DR) of the scaffold could act as a polymerase III (pol III) termination sequence, and that flipping the terminal uridine-adenosine (U-A) base pair to an A-U base pair increased sgRNA expression and activity. Likewise, extension of the DR has been shown to increase sgRNA activity, possibly by stabilizing the Cas9-binding-competent structure of the sgRNA. Such modifications were found to increase pegRNA activity just as they increase sgRNA activity (FIGS. 91A-D). This is probably due to the efficient transfection of pegRNA-encoding plasmids in HEK293T cells, and these modifications are expected to increase activity broadly in other cell types. Another modification explored was reduction of the interaction between the spacer and the PBS via incorporation of a toehold stem.
Next generation sgRNA improvement strategy
In view of the above findings, it was decided to focus on strategies for improving the stability of the 3' extension. Degradation of the PBS is particularly detrimental to a pegRNA, because any such degradation yields a pegRNA that can no longer be bound by the RT but can still be bound by Cas9. Thus, degraded pegRNAs can compete for Cas9 and for binding to the target site and can still nick that site, potentially reducing editing and increasing indel formation.
Thus, incorporation of structural motifs 3' of the PBS was found to improve stability, as has been reported for RNA G-quadruplexes added to sgRNAs. However, it was also decided to screen additional structural motifs, because purine-rich sequences may lead to misfolding of the pegRNA. Accordingly, several other structural motifs, each attached to the PBS through a short unstructured nucleotide linker, were screened.
First, the prequeosine1-1 riboswitch aptamer (one of the smallest natural tertiary RNA structures), which has been evolved for greater stability and is hereinafter referred to as evopreQ1-1, was selected. Second, two structural motifs that potentially interact with MMLV RT, and could thereby improve both stability and affinity for the PE, were selected: a pseudoknot from the MMLV viral genome (referred to herein as Mpknot-1) and a modified tRNA used by MMLV RT as a primer for reverse transcription.
The test assay involved screening pegRNAs configured to encode insertion of a FLAG tag sequence (a challenging edit) at various genomic loci (FIGS. 92A-C). Interestingly, at all sites tested in HEK293T cells, a short (G2) quadruplex and both evopreQ1-1 and Mpknot-1 significantly increased editing activity, suggesting that these motifs may improve activity at a variety of genomic loci.
It was considered that adding a structural motif to the 3' end of the pegRNA might increase activity only for pegRNAs with longer extensions. To determine whether this is the case, a small library of pegRNAs encoding point mutations or deletions and containing templates of increasing length was screened at 6 additional genomic sites. Broadly improved editing was observed for nearly all guides tested (FIGS. 93A-H), with improvements ranging from 1.5- to 6-fold regardless of site, edit type, or template length, indicating the generality of these motifs. Interestingly, although incorporating a structural motif improved editing compared with adding a linker alone, adding a linker alone generally improved editing activity relative to the parent pegRNA (FIG. 94). To determine whether these structurally tagged pegRNAs improve editing in other cell lines, the ability of the modified pegRNAs to install a FLAG tag at the HEK3 locus in K562, U2OS, and HeLa cells was tested. A dramatic improvement in editing efficacy was observed in these cells when either evopreQ1-1 or the Mpknot-1 pseudoknot was added 3' of the PBS (FIGS. 95A-B).
To improve on the original designs, it was sought to understand how these motifs improve activity. Although they appear to function by extending cellular lifetime, the addition of a short unstructured linker was sometimes sufficient to increase pegRNA activity relative to the parent pegRNA (FIG. 94). At the same time, mutations expected to disrupt the motif structure, evoPreQ1 (mut 1) and evoPreQ1 (mut 2), resulted in reduced editing (FIG. 96), indicating that the structure of the motif is important for activity. Together, these observations suggest that these motifs may increase PE efficiency by more than one mechanism.
As a first step, it was sought to confirm that the modifications improve the cellular lifetime of the pegRNA. To this end, the relative amounts of pegRNA scaffold and template were measured using RT-qPCR, and attaching a structural motif to the 3' tail of the pegRNA was found to result in a significant increase in the amount of template (FIG. 97). It is thought that the PBS length of these pegRNAs may therefore be increased, further enhancing editing activity.
It is believed that the design of these next-generation pegRNAs can be further improved. To this end, several additional 3' motifs will be screened. These include additional evolved preQ1 aptamers, modifications of Mpknot-1, additional natural G-quadruplexes with improved stability, the P4-P6 domain of a group I intron, and the self-cleaving HDV ribozyme. This ribozyme causes the RNA to be processed immediately 5' of itself, leaving a 2'-3'-cyclic phosphate at the 3' end of the RNA that is resistant to exonucleases. In addition, mutations of the canonical sgRNA scaffold that have been reported to increase the editing efficiency of Cas9 nuclease cleavage will be tested to see whether they increase activity in HEK cells and other cell types.
These studies used a linker length of 8 nucleotides (nt); however, other linker lengths may be used, including, for example, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 or more nucleotides. In some cases, the linker length and sequence must be determined empirically for each site. In other cases, a single linker that is independent of the pegRNA sequence may be used. To aid this process, a computational script can be used to design linker sequences that do not interfere with the pegRNA structure.
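As a rough illustration of what such a script might look like, the following Python sketch ranks candidate linkers by how long a contiguous base-paired stretch they could form with the PBS or the spacer. It is a crude standard-library proxy rather than a folding-energy calculation, and the function names, the scoring rule, and the inclusion of G-U wobble pairs are all assumptions made for this example rather than the method used in the studies described here.

```python
# Minimal sketch (hypothetical names and scoring): prefer linkers that cannot
# form long contiguous duplexes with the PBS or the spacer.
import random

# Watson-Crick pairs plus the G-U wobble pair (ASSUMPTION: wobble pairs count).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_pairing_run(a: str, b: str) -> int:
    """Length of the longest contiguous antiparallel duplex between a and b.

    Both sequences are written 5'->3'. Reversing b turns the antiparallel
    pairing problem into a longest-common-substring problem under PAIRS.
    """
    reversed_b = b[::-1]
    best = 0
    prev = [0] * (len(reversed_b) + 1)
    for base_a in a:
        cur = [0] * (len(reversed_b) + 1)
        for j, base_b in enumerate(reversed_b, start=1):
            if (base_a, base_b) in PAIRS:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def linker_score(linker: str, pbs: str, spacer: str) -> int:
    """Lower is better: the worst-case duplex length with the PBS or spacer."""
    return max(longest_pairing_run(linker, pbs), longest_pairing_run(linker, spacer))


def random_linkers(n: int, length: int = 8, seed: int = 0) -> list:
    """Generate n random candidate linkers of the given length (reproducible)."""
    rng = random.Random(seed)
    return ["".join(rng.choices("ACGU", k=length)) for _ in range(n)]


def rank_linkers(candidates, pbs: str, spacer: str) -> list:
    """Return candidates sorted from least to most potential interference."""
    return sorted(candidates, key=lambda linker: linker_score(linker, pbs, spacer))


if __name__ == "__main__":
    pbs = "CGUGCUCAGUCUG"            # illustrative PBS
    spacer = "GGCCCAGACUGAGCACGUGA"  # illustrative spacer
    print(rank_linkers(random_linkers(200), pbs, spacer)[:5])
```

A contiguous-pairing score ignores loops, bulges, and thermodynamics, so a shortlist produced this way would still need to be checked with a proper RNA folding tool and, as noted above, empirically.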
Additional design
As a final step, additional aspects of pegRNA design were explored. Several aspects were considered, including the pol III promoter used to express the pegRNA, the pegRNA scaffold, and the nicking guide used in PE3 to enhance editing efficiency by nicking the opposite strand. A variety of pol III promoters have been used to express small RNAs in human cells. Historically, two (U6 and H1) have been used for expression of pre-microRNAs. Of these two, U6 was found to be superior for sgRNA expression. However, other promoters may improve pegRNA expression. To determine whether this is the case, a number of pol III promoters, including other homologs of H1 and U6, were screened for editing efficacy in HEK293T cells. Several promoters, including a U6 homolog known as U6-9, were found to significantly increase editing efficiency (see FIGS. 100A-100E). These promoters were identified as:
Non-limiting examples of U6 promoters include the promoters represented by SEQ ID NOS 237-240.
Conclusion(s)
In addition to generally increasing PE activity, modified pegRNAs may also simplify the pegRNA design process. Currently, designing an optimal pegRNA generally requires screening tens to hundreds of pegRNA constructs. Such testing is time-consuming and expensive, and it is infeasible when constructing pegRNA libraries. One potential application of such libraries is the systematic tagging of all proteins in a given set. A significant benefit of the modified pegRNAs described herein is that they simplify pegRNA design by limiting the negative impact of a poor choice of template, as seen for HEK3 edits (FIG. 93B; FIG. 93E). Furthermore, if the 3' motif allows the PBS to be extended to its maximum possible length (17 nt), this would greatly simplify pegRNA design.
In summary, designs for modified pegRNAs with improved editing activity have been validated. These pegRNAs contain a structured RNA 3' of the PBS, and their improved activity derives from improved cellular lifetime and Cas9 binding activity. These modifications broadly increase PE activity across a variety of genomic loci, encoded edits, and cell types.
Example 4: engineered pegRNA to improve guided editing efficiency
The ability to make targeted changes to the genomes of living systems continues to drive advances in the life sciences and medicine. Double-strand break (DSB)-mediated DNA editing strategies that use programmable nucleases (such as ZFNs, TALENs, or CRISPR-Cas nucleases) can efficiently disrupt genes by inducing insertions or deletions (indels) at the target site, but DSBs can also lead to generally undesired outcomes, including uncontrolled mixtures of editing outcomes 1,2, larger DNA rearrangements 3-5, activation of p53 6-8, and chromothripsis 9,10. Although targeted DSBs can stimulate precise gene correction through homology-directed repair, this process is inefficient in most therapeutically relevant cell types 11. In contrast, base editors 12,13 and guided editors 14 can efficiently install precise changes in therapeutically relevant cells without requiring DSBs. Cytosine and adenosine base editors can convert C•G to T•A and A•T to G•C, respectively, while guided editors can install almost any local mutation, including substitutions, insertions, and/or deletions of up to tens of base pairs, at a target DNA site.
A guided editing (PE) system consists of at least two components: a protein containing a programmable DNA nickase fused to an engineered reverse transcriptase (RT), and a guided editing guide RNA, or pegRNA (FIG. 104A) 14. The pegRNA contains a spacer that specifies the target site, an sgRNA scaffold, and a 3' extension that encodes the desired edit. The extension contains a primer binding site (PBS) that is complementary to part of the DNA protospacer, and an RT template that encodes the desired edit and downstream genomic sequence. After the PE ribonucleoprotein (RNP) binds the target site and nicks the PAM-containing DNA strand, the resulting nicked DNA strand base pairs with the PBS in the pegRNA, priming reverse transcription of the RT template directly into the target DNA site 14. The newly synthesized 3' flap of edited DNA is then resolved by cellular DNA repair pathways, resulting in installation of the desired edit at the target site.
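The relationship between the spacer and the PBS described above can be made concrete with a short Python sketch. The function names are hypothetical, and two points are assumptions not stated in this passage: that the spacer is written as the RNA copy of the PAM-containing strand, and that SpCas9 nicks that strand 3 nt 5' of the PAM (that is, after position 17 of a 20-nt spacer). Under those assumptions, the PBS is the reverse complement of the spacer nucleotides immediately 5' of the nick. The two example spacers are the first 20 nt of SEQ ID NOs: 460 and 459 as listed in this document.

```python
# Minimal sketch (hypothetical names): derive the PBS from the spacer and
# assemble a pegRNA in the order spacer - scaffold - RT template - PBS.

COMPLEMENT = str.maketrans("ACGU", "UGCA")

# ASSUMPTION: the SpCas9 nick lies 3 nt 5' of the PAM on the PAM-containing strand.
NICK_OFFSET_FROM_PAM = 3


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def primer_binding_site(spacer: str, pbs_len: int) -> str:
    """Return the PBS that base pairs with the nicked strand 5' of the nick."""
    nick = len(spacer) - NICK_OFFSET_FROM_PAM
    if not 0 < pbs_len <= nick:
        raise ValueError(f"pbs_len must be between 1 and {nick}")
    return reverse_complement(spacer[nick - pbs_len:nick])


def assemble_pegrna(spacer: str, scaffold: str, rt_template: str, pbs: str) -> str:
    """Concatenate pegRNA components 5'->3'; the 3' extension is template + PBS."""
    return spacer + scaffold + rt_template + pbs


if __name__ == "__main__":
    fancf_spacer = "GGAAUCCCUUCUGCAGCACC"
    hek3_spacer = "GGCCCAGACUGAGCACGUGA"
    print(primer_binding_site(fancf_spacer, 14))
    print(primer_binding_site(hek3_spacer, 13))
```

With a 20-nt spacer the longest PBS this function will return is 17 nt, since only 17 spacer positions lie 5' of the assumed nick.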
The versatility of guided editing stems from the ability of the 3' extension of the pegRNA to encode a wide variety of edits. Despite this versatility, the efficiency of current guided editors varies widely between target sites and cell types 14. This example describes how degradation of the 3' extension of the pegRNA can impair guided editing efficiency. The resulting truncated pegRNAs compete for target-site engagement but cannot mediate guided editing. To address this vulnerability, RNA motifs were identified that protect pegRNA integrity and broadly enhance guided editing efficiency at multiple target sites, in multiple cell lines, and via multiple delivery modes. The resulting engineered pegRNAs (epegRNAs) substantially improve the effectiveness and scope of guided editing.
Results
RNA stability limits the efficacy of pegRNA
Unprotected nuclear RNAs are susceptible to degradation by exonucleases from both the 5' and 3' ends 15. In contrast to the remainder of the guide RNA, which is protected by the associated Cas9 protein 16, the 3' extension of the pegRNA is likely exposed in cells and is therefore more susceptible to exonucleolytic degradation. While partially degraded pegRNAs may retain the ability to bind Cas9 and to engage the target DNA site, loss or truncation of the PBS would prevent them from installing the desired edit, thereby occupying PE proteins and target sites with guide RNAs that cannot mediate guided editing.
To test this possibility, HEK293T cells were transfected with mixtures, in varying ratios, of two plasmids that produce either a full-length pegRNA containing an RT template encoding a T•A-to-A•T transversion, or a truncated pegRNA containing an RT template encoding a T•A-to-G•C transversion but lacking a PBS at its 3' end. The two pegRNAs targeted either the same or different genomic loci in human cells. The effect of adding a plasmid producing a non-interacting SaCas9 pegRNA, which should compete for transcription with the SpCas9 pegRNA-encoding plasmid but should not interact with the guided editing protein, was also tested. When the full-length and truncated pegRNAs targeted the same site, increasing production of the truncated pegRNA inhibited PE activity (FIG. 104B). In contrast, neither a truncated pegRNA targeting a different genomic site nor a non-targeting SpCas9 sgRNA impeded PE activity more than the SaCas9 pegRNA did (FIG. 104B). These data indicate that degraded pegRNAs with truncated 3' extensions inhibit PE activity by forming editing-incompetent guided editor ribonucleoproteins (RNPs) that compete for the target genomic locus.
Design of engineered pegRNAs (epegRNAs) to improve guided editing efficiency
Truncated pegRNAs having been identified as potent inhibitors of guided editing, it was next sought to minimize pegRNA degradation. It was reasoned that a structured RNA motif at the 3' end of the pegRNA might increase pegRNA stability, consistent with the ability of 5'- or 3'-terminal RNA structures to enhance mRNA stability in human cells and yeast 17,18. For example, the long non-coding RNA MALAT1 is stabilized by a triple helix that sequesters its poly(A) tail, thereby limiting degradation and nuclear export 19.
Whether guided editing efficiency could be improved by incorporating additional RNA structure was tested by placing one of two stable pseudoknots at the 3' end of the pegRNA: a modified prequeosine1-1 riboswitch aptamer 20,21 (evopreQ1), or the frameshifting pseudoknot from Moloney murine leukemia virus (MMLV) 22, hereinafter "mpknot" (FIG. 108). evopreQ1 was chosen because it is one of the smallest naturally derived RNA structural motifs with a defined tertiary structure (42 nucleotides (nt) in length) 20,21. It was reasoned that a smaller motif would minimize the formation of secondary structures that might interfere with pegRNA function. Furthermore, shorter pegRNAs can be produced more easily by chemical synthesis. Mpknot was chosen because of its tertiary structure and because it is an endogenous template of the MMLV RT from which the RT in canonical guided editors was engineered, raising the possibility that mpknot may help recruit the RT.
It was tested whether these epegRNAs could insert a FLAG epitope tag sequence at five genomic loci in HEK293T cells using PE3 (FIG. 105A). To reduce the likelihood that the motifs would interfere with pegRNA function during guided editing, an 8-nt linker was included to connect evopreQ1 or mpknot to the 3' end of the epegRNA PBS. The linker sequences were designed using ViennaRNA 23 to avoid potential base-pairing interactions between the linker and the PBS or between the linker and the pegRNA spacer 14. At all five genomic sites tested, epegRNAs gave an average 2.1-fold improvement in FLAG tag insertion efficiency compared with canonical pegRNAs, with no significant change in the edit:indel ratio (FIGS. 109A-109C), indicating that a 3'-terminal pseudoknot motif can increase PE efficacy.
The role of the linker sequence in editing efficiency was characterized by comparing the ability of epegRNAs with or without the 8-nt linker to mediate a transversion or a FLAG tag insertion. Removing the linker from mpknot-containing epegRNAs decreased PE3 editing efficiency (p=0.022), whereas no significant difference was observed for evopreQ1-containing epegRNAs (FIG. 110), possibly because evopreQ1 is smaller than mpknot and less prone to steric clashes with the RT. Although the overall average editing efficiency of evopreQ1-containing epegRNAs was similar with or without the linker, the linker-free epegRNAs occasionally performed worse (FIG. 110). Thus, unless otherwise noted, an 8-nt linker was used in all subsequent epegRNA designs.
To ensure that this improvement in PE efficacy was not limited to epegRNAs with longer extensions, 148 additional epegRNAs, encoding various point mutations or deletions with RT templates of different lengths, were tested at seven different genomic sites in HEK293T cells using PE3. Across all tested sites and pegRNAs in HEK293T cells, use of either motif improved guided editing efficiency by an average of 1.5-fold relative to canonical pegRNAs, with no significant change in the edit:indel ratio (FIGS. 105B-105C, FIGS. 111A-111K, and FIGS. 112A-112C). Taken together, these results indicate that epegRNAs broadly increase PE efficacy in HEK293T cells.
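The two summary quantities used in these comparisons, the fold improvement in editing and the edit:indel ratio, are simple ratios; the Python sketch below shows one way they might be computed. The function names are hypothetical, the numbers in the usage example are placeholders rather than data from these experiments, and averaging per-pair fold changes arithmetically is an assumption, since the text does not say how the averages were calculated.

```python
# Minimal sketch (hypothetical names): fold improvement and edit:indel ratio.


def fold_change(epegrna_edit_pct: float, pegrna_edit_pct: float) -> float:
    """Editing efficiency with the epegRNA divided by that with the pegRNA."""
    if pegrna_edit_pct <= 0:
        raise ValueError("reference editing efficiency must be positive")
    return epegrna_edit_pct / pegrna_edit_pct


def edit_to_indel_ratio(edit_pct: float, indel_pct: float) -> float:
    """Ratio of intended edits to indels at the target site."""
    if indel_pct <= 0:
        raise ValueError("indel frequency must be positive")
    return edit_pct / indel_pct


def mean_fold_change(pairs) -> float:
    """Arithmetic mean of per-pair fold changes.

    `pairs` is an iterable of (epegRNA %, pegRNA %) tuples, one per matched
    site/edit. ASSUMPTION: arithmetic rather than geometric averaging.
    """
    folds = [fold_change(epeg, peg) for epeg, peg in pairs]
    if not folds:
        raise ValueError("at least one pair is required")
    return sum(folds) / len(folds)


if __name__ == "__main__":
    # Placeholder values for illustration only; not measurements.
    placeholder_pairs = [(30.0, 20.0), (10.0, 4.0)]
    print(mean_fold_change(placeholder_pairs))
    print(edit_to_indel_ratio(30.0, 3.0))
```

Reporting the edit:indel ratio alongside the fold change makes it possible to check that higher editing did not come with a proportionally larger increase in indels.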
Engineered pegRNAs improve guided editing in a variety of mammalian cell lines
Significant differences in PE efficiency between mammalian cell types have been observed previously 14, highlighting the need to test improved PE systems in a variety of cells. epegRNAs containing a 3' evopreQ1 or mpknot motif were tested for their ability to insert a 24-bp FLAG epitope tag at HEK3, delete 15 bp at DNMT1, or install a C•G-to-A•T transversion at RNF2 via PE in K562, U2OS, and HeLa cells. In all of these cell lines, epegRNAs substantially improved editing efficiency compared with pegRNAs, with average improvements across all tested edits of 2.4-fold in K562 cells, 3.1-fold in HeLa cells, and 5.6-fold in U2OS cells (FIG. 105D), without reducing the edit:indel ratio (FIGS. 109A-109C). These results indicate that epegRNAs can enhance guided editing in a variety of mammalian cell lines. Furthermore, the greater improvement in editing efficiency from epegRNAs in non-HEK293T cells than in HEK293T cells (compare FIG. 105D with FIG. 105A and FIGS. 111A-111K) suggests that epegRNAs are particularly beneficial in cell lines in which transfection or editing with the original PE system is less efficient.
Influence of engineered pegRNA on off-target guided editing
Guided editing has previously been shown to produce far less off-target editing than other CRISPR gene editing strategies 14,24-27. To determine whether the addition of evopreQ1 or mpknot changes the extent of off-target editing, HEK293T cells were treated with PE3 and pegRNAs or epegRNAs targeting HEK3, EMX1 or FANCF that template either a transversion (T.A-to-A.T at HEK3, or G.C-to-T.A at EMX1 and FANCF) or a 15-bp deletion. For each targeted locus, the extent of indel generation was measured, along with any nucleotide changes that could reasonably arise from guided editing, at the top four experimentally confirmed off-target sites 28, and the extent of off-target editing was compared between epegRNAs and unmodified pegRNAs after treatment with PE3. In all cases, epegRNAs and pegRNAs exhibited 0.1% or less off-target guided editing and/or indels at the sites examined (FIG. 113), indicating that epegRNAs and pegRNAs exhibit similar levels of off-target editing.
Basis of enhanced guided editing with engineered pegRNAs
epegRNAs could enhance guided editing outcomes through a variety of mechanisms, including resistance to degradation, higher expression levels, more efficient Cas9 binding, and/or improved target DNA engagement once complexed with Cas9. Each of these possibilities was examined.
To determine whether evopreQ1 or mpknot prevents degradation of the pegRNA 3' extension, the stability of epegRNAs and pegRNAs was compared after in vitro incubation with HEK293T nuclear lysate containing endogenous exonucleases. pegRNAs were degraded to a greater extent than epegRNAs in this treatment (1.9-fold compared with evopreQ1 and 1.8-fold compared with mpknot, p<0.005, FIG. 106A). In contrast, addition of Cas9, which binds the guide RNA scaffold and presumably protects the core sgRNA from degradation, rescued pegRNA abundance to a level comparable to that of either epegRNA, as determined by RT-qPCR quantification of the guide RNA scaffold (FIG. 106B).
The ability of the 3' structural motif to increase the abundance of the upstream scaffold region (FIG. 106B) suggests that pegRNA degradation in the nucleus is dominated by 3'-directed degradation. This model is consistent with the characteristic behavior of the nuclear exosome, the main source of RNA turnover in the nucleus 29. Partially degraded pegRNAs would, however, generate editing-incompetent RNPs, which were shown above to inhibit guided editing (FIG. 104C). To detect partially degraded RNA in cells, lysates of HEK293T cells transfected with plasmids encoding PE2 and pegRNAs or epegRNAs that template either a +1 FLAG tag insertion at HEK3 or a transversion at EMX1 were analyzed by northern blotting. RNA species containing the sgRNA scaffold and comparable in size to an sgRNA were observed (FIGS. 114A-114C), consistent with the earlier finding (FIG. 106B) that Cas9 binding protects the scaffold from 3'-directed degradation. However, lysates with different total levels of pegRNA or epegRNA had similar levels of these sgRNA-like truncated species, which represented only a small fraction of the guide RNA content of the lysate (FIGS. 114A-114C). Because strong degradation of pegRNAs exposed to nuclear lysate was observed in vitro (FIGS. 106A-106B), and because pegRNAs are expressed in excess of PE2 in HEK293T cells (FIG. 104B), partially degraded pegRNA species may not accumulate to levels detectable by northern blotting.
Next, genomic guided editing intermediates were examined to better understand how epegRNAs mediate increased editing efficiency. In the current model, the 3' flap intermediate generated by RT extension of the nicked target site is converted to a 5' flap intermediate, in which the newly synthesized sequence replaces the original genomic sequence 14. This 5' flap is then removed by a 5'-3' exonuclease, and the resulting genomic nick is ligated to install the edit 14. While a full-length pegRNA is expected to template RT extension of the nicked genomic strand efficiently, a truncated pegRNA lacking the PBS should not do so, and should instead result in nicking of the target strand followed by chew-back or extension of the strand by DNA repair enzymes (in either case, without a templated edit). Observing more RT-extended guided editing intermediates with epegRNAs than with pegRNAs would therefore suggest that addition of the 3' RNA motif improves the integrity of the PBS.
To capture these intermediates, HEK293T cells were transfected with plasmids encoding PE2 and either unmodified pegRNAs or epegRNAs containing evopreQ1 or mpknot that template transversions at HEK3, DNMT1, EMX1 or RNF2. Terminal transferase was then used to label the 3' ends of genomic DNA with oligo-dG; the labeled ends should include guided editing intermediates that have not yet been ligated. In each case, epegRNAs reduced the extent of editing-incompetent intermediates at the targeted site, by an average of 2.2-fold across the four sites (FIGS. 106C and 115A-115C). The main reverse transcription product consisted of the complete sequence templated by the 3' extension plus two nucleotides templated by the last two nucleotides of the pegRNA scaffold, consistent with previous in vitro characterization of PE intermediates. The scaffold-templated nucleotides are presumably removed during DNA repair at the target locus to produce the cleanly edited allele that is the main product of PE. These data are consistent with a model in which epegRNAs improve reverse transcription of the pegRNA extension into the target site by reducing the frequency of non-productive target site nicks generated by guided editors bound to truncated pegRNAs.
Because a single-stranded 3' end is a common feature of 3' exonuclease substrates 30, it was next tested whether the degradation resistance conferred by these motifs could be explained by the addition of any secondary structure, or whether it requires the more mechanically stable tertiary structure of a pseudoknot. Notably, adding a 15-bp (34-nt) hairpin to the 3' end resulted in less consistent improvement in PE efficiency than adding a pseudoknot (FIGS. 116A-116D), indicating that tertiary structure is indeed an important feature of epegRNAs.
To test whether the tertiary pseudoknot structure is required for the epegRNA-mediated enhancement of PE efficiency, the editing efficiency of evopreQ1 epegRNAs containing a G15C point mutation, which is known to disrupt pseudoknot formation (M1 in FIG. 108) 23, was tested. These epegRNAs were used to install a 24-bp FLAG epitope tag insertion, a 15-bp deletion, or a transversion at HEK3 or RNF2 in HEK293T cells using PE3. Indeed, incorporating the G15C mutation into evopreQ1 eliminated the increase in editing efficiency (FIG. 106D). These results indicate that the secondary or tertiary structure of the motif is critical for epegRNA-mediated PE improvement, possibly by stabilizing the 3' extension.
Next, it was tested whether the structured 3' motifs increase the expression level of epegRNAs compared with pegRNAs. RT-qPCR quantification of the pegRNA scaffold revealed target-dependent differences in the expression level of epegRNAs relative to unmodified pegRNAs (FIGS. 114A-114C). For pegRNAs templating a +1 FLAG tag insertion at HEK3, addition of evopreQ1 or mpknot reduced pegRNA expression by 9.2- to 9.6-fold, even though the efficiency of FLAG epitope tag insertion at HEK3 increased 1.9-fold (FIG. 105A). Similarly, epegRNAs templating a transversion at DNMT1 also showed reduced expression (1.6- to 2.1-fold). However, epegRNAs templating a transversion at RNF2 or EMX1 were expressed at higher levels than the unmodified pegRNAs (2.2- to 2.4-fold and 1.4- to 3.7-fold, respectively, FIGS. 114A-114C). These data indicate that the 3' motifs do not affect pegRNA expression consistently, in agreement with the earlier finding (FIG. 104B) that PE efficiency under these transfection conditions is not limited by pegRNA expression in HEK293T cells. However, increasing epegRNA expression may further increase editing efficiency when expression is more limiting.
Next, it was tested whether addition of the 3' RNA structural motif impairs engagement of the target DNA site, by comparing the ability of epegRNAs and pegRNAs to support transcriptional activation by a dCas9-VP64-p65-Rta (dCas9-VPR) fusion 32,33. HEK293T cells were transfected with plasmids encoding dCas9-VPR, GFP downstream of the HEK3, DNMT1, RNF2 or EMX1 target protospacer, and a pegRNA, epegRNA or sgRNA targeting the corresponding site. Transcriptional activation was measured three days later via cellular GFP fluorescence. In contrast to their ability to enhance PE activity (FIG. 105A), epegRNAs supported Cas9-dependent transcriptional activation at levels similar to pegRNAs in HEK293T cells (FIG. 106F). Both epegRNAs and canonical pegRNAs resulted in lower transcriptional activation than sgRNAs targeting the same site (on average across the four sites, 3.0-fold lower for pegRNAs, 2.3-fold lower for evopreQ1 epegRNAs, and 1.9-fold lower for mpknot epegRNAs), indicating that the 3' extension moderately impedes target site engagement in both pegRNAs and epegRNAs.
To deconvolute potential changes in target site engagement from differences in pegRNA and epegRNA expression, microscale thermophoresis (MST) was performed to measure the affinity of pre-incubated RNP complexes of catalytically inactive Cas9 (dCas9) and pegRNA or epegRNA for dsDNA substrates. Addition of evopreQ1 or mpknot resulted in a comparable or moderately reduced binding affinity for dsDNA, respectively, relative to unmodified pegRNA (Kd=10 nM for evopreQ1 epegRNA and 21 nM for mpknot epegRNA, versus 8.1 nM for unmodified pegRNA, FIG. 106E). Either motif also moderately reduced the affinity of the pegRNA for Cas9 H840A nickase (Kd=18 nM for evopreQ1 epegRNA and 11 nM for mpknot epegRNA, versus 5 nM for unmodified pegRNA; FIG. 106G). These findings indicate that the enhanced PE efficiency of epegRNAs does not result from improved binding of the pegRNA to Cas9 or of the PE RNP complex to the target site.
Taken together, these results indicate that epegRNAs are more resistant to cellular degradation than pegRNAs and thus generate fewer truncated pegRNA species that impair guided editing efficiency. Other mechanisms contributing to the epegRNA improvement cannot be excluded.
Optimization of engineered pegRNA 3' motifs
Having established that epegRNAs increase editing efficiency by resisting exonucleolytic degradation, it was hypothesized that more stable RNA motifs would further increase PE activity. Twenty-five additional structured RNA motifs were screened for their ability to increase editing efficiency in epegRNAs encoding a 24-bp FLAG epitope tag insertion, a 15-bp deletion, or a transversion at HEK3 or RNF2 (FIGS. 116A-116D and 117A-117C). These motifs included additional evolved prequeosine1-1 riboswitch aptamers 21, mpknot variants with improved pseudoknot stability 22, G-quadruplexes of increasing stability 34, a 15-bp hairpin, xrRNA 35, and the group I intron P4-P6 domain 36. While 123 of the 137 epegRNAs tested exhibited improved overall guided editing compared with the corresponding pegRNAs, none showed a consistent improvement over evopreQ1 or mpknot across most of the tested edits (FIGS. 116A-116D and 117A-117C).
Next, it was asked whether trimming unnecessary sequence from the added evopreQ1 and mpknot motifs could further improve epegRNA design, since removing extraneous sequence from structured RNAs can reduce their propensity to misfold 37. Trimming 5 nt of excess sequence from evopreQ1 or mpknot resulted in marginal gains in average PE3 editing efficiency relative to the full-length epegRNAs (FIGS. 117A-117C). Because trimming these RNA motifs does not adversely affect editing efficiency, and because shorter epegRNAs are easier to prepare by chemical synthesis, trimmed evopreQ1 (tevopreQ1) was used in epegRNAs applied to the installation of treatment-related mutations (see below).
It was also tested whether the "flip and extension" (F+E) sgRNA scaffold 38 would further improve epegRNA editing efficiency. This guide RNA scaffold mutates the fourth base pair of the direct repeat from U.A to A.U to remove a potential pol III terminator, and extends the direct repeat by five base pairs to improve Cas9 binding 38. HEK293T cells were transduced with lentiviruses encoding an unmodified (F+E) pegRNA, a tevopreQ1-containing (F+E) epegRNA, or a tevopreQ1 epegRNA with the standard scaffold, each templating a transversion at HEK3 or DNMT1 or a 3-nt insertion at HEK3. The use of tevopreQ1 significantly improved editing efficiency (3.8-fold for the nucleotide transversion and 2.6-fold for the 3-nt insertion at HEK3, and 6.8-fold at DNMT1) (FIG. 118). Using the (F+E) scaffold in tevopreQ1 epegRNAs improved editing efficiency further (1.1-fold for the nucleotide transversion and 1.5-fold for the 3-nt insertion at HEK3, and 2.5-fold at DNMT1). sgRNA scaffold variants previously shown to increase Cas9 nuclease activity 39 were also characterized under transfection conditions with reduced plasmid amounts, and similar overall benefits were observed, albeit with greater variability (FIG. 119). These findings further indicate that epegRNAs mediate a greater increase in PE efficiency when expression is limiting. Furthermore, these data highlight the potential of combining modified scaffolds with epegRNAs to increase PE efficiency.
Computational tool for designing epegRNA linkers
RNA linkers are more likely than protein linkers to be sequence dependent, so that the same linker may work well in one epegRNA but interfere in another. To minimize the likelihood of interference from the epegRNA linker, pegLIT (pegRNA linker identification tool) was developed (FIGS. 120A-120F), a computational tool that identifies linker sequences predicted to base pair minimally with the rest of the epegRNA. For initial validation, two sets of 15 evopreQ1 epegRNAs, templating either a C.G-to-A.T transversion at RNF2 or a 15-bp deletion at DNMT1, were tested with different linkers. In each set, five linkers were recommended by pegLIT, five were predicted to base pair with the spacer, and five were predicted to base pair with the PBS. The pegLIT-designed linkers resulted in a modest increase in PE3 editing efficiency over the manually designed linker (1.2-fold higher at RNF2 and 1.1-fold higher at DNMT1) (FIGS. 120A-120F). Although linker-spacer interactions did not significantly affect editing efficiency, linker-PBS interactions were associated with reduced PE3 editing efficiency, with 1.3-fold and 1.1-fold lower editing than the pegLIT linkers at RNF2 and DNMT1, respectively. The two worst-performing linkers resulted in 1.9-fold and 3.4-fold lower PE3 editing efficiency at RNF2 relative to the optimal linker sequence, and pegLIT correctly identified both as having poor PBS interaction scores (FIGS. 120A-120F). Because the linker is closer to the PBS than to the spacer, linker:PBS interactions may have an entropic advantage over linker:spacer pairing.
pegLIT-designed linker sequences were then studied to determine whether they could increase the efficacy of two epegRNAs (templating a G.C-to-T.A transversion at EMX1 and a 15-bp deletion at VEGFA) that initially failed to exhibit improved editing (FIGS. 111A-111K). Indeed, the pegLIT-designed linkers increased PE3 editing efficiency by 1.3-fold and 1.4-fold, respectively, relative to the pegRNAs for these two edits (FIGS. 120A-120G). Overall, these findings indicate that pegLIT helps epegRNAs improve guided editing outcomes consistently.
pegLIT-designed linkers were also studied to determine whether they increase the activity of epegRNAs compared with epegRNAs without a linker. Adding a pegLIT-designed linker resulted in a slight improvement in editing efficiency compared with mpknot epegRNAs without a linker, similar to that seen with the manually designed linker (FIGS. 110 and 120A-120F). In contrast, pegLIT linkers did not significantly increase editing with either evopreQ1 or tevopreQ1 epegRNAs relative to the corresponding epegRNAs without linkers (FIGS. 120A-120G).
Chemically modified epegRNA to improve editing efficiency
Chemically synthesized gRNAs are typically used when cells are transfected with mRNA or RNP 40. Although synthetic gRNAs can incorporate chemical modifications that promote resistance to exonucleolytic degradation 16,40, it was hypothesized that the structured motifs might still mediate additional improvement in combination with such modifications.
To test this possibility, the guided editing efficiency of synthetic tevopreQ1 epegRNAs and synthetic pegRNAs that install point mutations or 15-bp deletions at five genomic sites (HEK3, RNF2, DNMT1, RUNX1 and EMX1) in HEK293T cells was compared. Both epegRNAs and pegRNAs contained 2'-O-methyl modifications and phosphorothioate linkages at the first three and last three nucleotides of the RNA. For six of the seven pegRNAs tested, the corresponding epegRNA exhibited 1.1- to 3.1-fold higher editing, with no change in the edit:indel ratio (FIGS. 121A-121B). These data indicate that epegRNAs can also enhance PE outcomes relative to pegRNAs in applications that use chemically synthesized and modified pegRNAs.
Engineered pegRNAs improve guided editing of treatment-related mutations
Having validated epegRNAs as a strategy for broadly enhancing PE activity, the activity of tevopreQ1-containing epegRNAs was next compared with that of pegRNAs for installing various protective or therapeutic genetic mutations. epegRNAs were successfully used to install the PRNP G127V allele, which protects against human prion disease 41,42, in HEK293T cells, with 1.4-fold higher efficiency than the canonical pegRNA (FIG. 107A). Furthermore, epegRNAs were used to correct the most common cause of Tay-Sachs disease (HEXA 1278+TATC), both via plasmid lipofection in a previously constructed HEXA 1278+TATC HEK293T cell line and via nuclear transfection of in vitro transcribed mRNA and synthetic pegRNA in primary patient-derived fibroblasts (FIGS. 107B-107C). In both cases, tevopreQ1 epegRNAs containing a pegLIT-designed 8-nt linker showed higher editing efficiency than canonical pegRNAs (2.8-fold higher in HEK293T cells and 2.3-fold higher in patient-derived fibroblasts).
Installation of treatment-related edits using non-optimized epegRNAs
Designing and screening many pegRNAs with different PBS and RT templates is an important first step in the successful use of guided editing 14. Although general rules guiding PBS and RT template length and composition have been described 14,43, identifying the best pegRNA generally requires extensive screening of pegRNA constructs. It was hypothesized that epegRNAs could support more efficient installation of treatment-related edits even without extensive pegRNA optimization. The ability of non-optimized pegRNAs and epegRNAs to template the installation of nine protective or pathogenic point mutations using PE2 was therefore compared. In all cases, the pegRNAs and epegRNAs used in this experiment contained a 13-nt PBS and an RT template with 10 nt of homology to the target site after the last edited nucleotide, unless the 3' extension would start with a cytosine 14, in which case the template was extended to the nearest non-C nucleotide.
pegRNAs were tested that install treatment-related mutations associated with Alzheimer's disease 44, coronary heart disease 45,46, type 2 diabetes 47, innate immunity 48, CDKL5 deficiency 49, lamin A deficiency 50, and Rett syndrome 51,52. These nine mutations include protective alleles in APP, PCSK9, SLC30A8, CD209 and CETP, and pathogenic mutations in CDKL5, LMNA and MECP2 54. The guided editing results obtained in HEK293T cells with pegRNAs and with the corresponding tevopreQ1 epegRNAs containing an 8-nt pegLIT linker were compared (FIG. 111D). Only one pegRNA or epegRNA design was tested per target. In every case, the editing efficiency of the epegRNA exceeded that of the pegRNA. For five of the nine treatment-related edits tested, epegRNA editing efficiency was ≥20%, which is generally sufficient to generate model cell lines. In contrast, only three of the nine pegRNAs achieved this level of editing efficiency. The higher editing efficiency mediated by epegRNAs (2.8-fold higher on average than pegRNAs) should simplify the generation of homozygous cell lines, an important consideration for modeling recessive mutations. Likewise, non-optimized epegRNAs mediated 24-bp FLAG tag insertion at > 10% efficiency at 5 of 15 test sites, whereas the corresponding pegRNAs did not reach > 10% efficiency at any of the test sites (FIGS. 122A-122B). Taken together, these findings indicate that epegRNAs simplify the generation of model cell lines using PE.
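The non-optimized design rule described above (a 13-nt PBS, 10 nt of homology after the last edited nucleotide, and extension of the template when the 3' extension would begin with cytosine) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' software: the function name, argument names and example sequences are hypothetical, sequences are written in the DNA alphabet, and the inputs are assumed to be given on the nicked (protospacer) strand in the 5'-to-3' direction.

```python
# Sketch of the "non-optimized" 3' extension rule (hypothetical helper names).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def design_extension(upstream_of_nick: str, edited_downstream: str,
                     last_edit_index: int, pbs_len: int = 13,
                     homology: int = 10) -> tuple[str, str]:
    """Return (rt_template, pbs), both written 5'->3' as they appear in the pegRNA.

    upstream_of_nick  : nicked-strand sequence ending at the nick
    edited_downstream : desired edited sequence starting right after the nick
    last_edit_index   : 0-based index of the last edited nucleotide in edited_downstream
    """
    pbs = revcomp(upstream_of_nick[-pbs_len:])
    end = last_edit_index + 1 + homology  # edit plus downstream homology
    if end > len(edited_downstream):
        raise ValueError("not enough downstream sequence for the requested homology")
    # The first template nucleotide is the complement of the last copied
    # nucleotide, so a terminal G here would make the 3' extension start with C.
    while edited_downstream[end - 1].upper() == "G":
        if end == len(edited_downstream):
            raise ValueError("ran out of sequence while avoiding a leading C")
        end += 1
    return revcomp(edited_downstream[:end]), pbs

# Hypothetical example sequences, for illustration only.
rt_template, pbs = design_extension("TTTTGGCCCAGACTGAGCACG",
                                    "TGATGGCAGAGGAAAGG", last_edit_index=0)
print(rt_template + pbs)
```

Setting `homology=25` gives the variant of the rule that the text applies to larger insertions such as the FLAG tag.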
Discussion
The design, characterization and validation of engineered pegRNAs that address a key bottleneck in guided editing are presented herein. These epegRNAs contain a structured RNA motif 3' of the PBS that prevents degradation of the pegRNA extension and the subsequent formation of editing-incompetent PE complexes that compete for access to the targeted genomic site. As a result, epegRNAs broadly increased guided editing efficiency in all five cell lines and primary cell types tested, with greater improvement observed in cell lines that are more difficult to transfect. Furthermore, epegRNAs improved guided editing performance when chemically modified pegRNAs were used, when treatment-related edits were installed in human cells, and when non-optimized pegRNA designs were used. Finally, a computational program is described that accelerates epegRNA design by identifying linkers that minimize the risk of adverse secondary structure. Taken together, these findings indicate that epegRNAs broadly improve guided editing outcomes across genomic loci, edit types (substitutions, insertions and deletions) and cell types.
The improvement in guided editing provided by epegRNAs may depend on the delivery strategy. Delivery methods that give lower expression (e.g., some viral vectors), in which pegRNA concentration is limiting, may benefit more strongly from the use of epegRNAs (FIG. 119). Likewise, further improvements in chemically modified RNA synthesis may reduce the benefit of epegRNAs by mitigating pegRNA 3' degradation. Furthermore, given the current challenges of chemically synthesizing longer RNAs, the greater length of epegRNAs (an increase of 37 nt when using tevopreQ1) is an important consideration when synthetic epegRNAs are used.
epegRNAs are recommended for guided editing experiments that can accommodate moderately longer pegRNAs. When maximizing editing efficiency is not a priority, extensive screening may not be required. In such cases, an epegRNA with a 13-nt PBS, a trimmed evopreQ1 motif, an 8-nt pegLIT-designed linker, and a template containing 10 nt of homology to the target after the edit (for small insertions, deletions and point mutations) or 25 nt of homology (for larger insertions or deletions) provides a promising starting point for epegRNA design. If editing efficiency is insufficient, the PBS, the RT template length and the nicking sgRNA can be optimized.
PegLIT strategy for identifying optimal linker sequences
pegLIT uses simulated annealing to sample the space of candidate linkers efficiently 1. pegLIT favors adenosine- or cytosine-rich linkers, as these nucleotides are reported to perform better as flexible RNA linkers 2. Furthermore, pegLIT filters out linkers containing runs of four or more uridines, as such sequences may lead to premature termination of transcription 3.
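The search described here, simulated annealing over linker sequences combined with a filter against uridine runs, can be illustrated with a small self-contained sketch. It is not pegLIT itself: the objective below (the fraction of A or C nucleotides) is only a stand-in for pegLIT's structure-based scoring, and the function names, cooling schedule and step count are assumptions chosen for illustration.

```python
import math
import random

NUCLEOTIDES = "ACGU"

def passes_filters(linker: str) -> bool:
    """Reject linkers that contain a run of four or more uridines."""
    return "UUUU" not in linker

def ac_fraction(linker: str) -> float:
    """Toy objective (assumption): fraction of A or C nucleotides in the linker."""
    return sum(nt in "AC" for nt in linker) / len(linker)

def anneal_linker(score, length=8, steps=2000, t_start=1.0, seed=0):
    """Maximize score(linker) by simulated annealing with single-nucleotide moves."""
    rng = random.Random(seed)
    current = "".join(rng.choice(NUCLEOTIDES) for _ in range(length))
    while not passes_filters(current):
        current = "".join(rng.choice(NUCLEOTIDES) for _ in range(length))
    best = current
    for step in range(steps):
        # Linear cooling schedule; the small offset avoids division by zero.
        temperature = t_start * (1.0 - step / steps) + 1e-6
        pos = rng.randrange(length)
        new_nt = rng.choice([nt for nt in NUCLEOTIDES if nt != current[pos]])
        candidate = current[:pos] + new_nt + current[pos + 1:]
        if not passes_filters(candidate):
            continue  # never move to a linker that fails the uridine filter
        delta = score(candidate) - score(current)
        # Metropolis criterion: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
            if score(current) > score(best):
                best = current
    return best

print(anneal_linker(ac_fraction, seed=1))
```

In a real tool the `score` argument would be replaced by a function derived from predicted base pairing between the linker and the rest of the epegRNA.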
pegLIT then uses ViennaRNA 4 to analyze the linkers that pass these requirements and predict potential interactions between the linker sequence and the pegRNA spacer, PBS, template or scaffold. The base-pair probabilities of these predicted interactions are used to generate a subscore for each region of the pegRNA, with each subscore representing the extent to which the linker is predicted to avoid interacting with that region. For example, a PBS subscore of 0.95 essentially means that, averaged over the linker nucleotides, the predicted probability of a pegRNA folding state that lacks base pairing between the linker nucleotide and the PBS is 95%.
The use of pegLIT for linker design was validated, and the effect on editing efficiency of the interactions that pegLIT identifies was examined. Thirty linker sequences (10 recommended by pegLIT, 10 predicted to interact with the spacer, and 10 predicted to interact with the PBS) were generated and tested in evopreQ1 epegRNAs templating either a C.G-to-A.T transversion at RNF2 or a 15-bp deletion at DNMT1. The average spacer and PBS subscores were 0.94 and 0.97 for the recommended linkers, 0.66 and 0.95 for the spacer-interacting linkers, and 0.86 and 0.21 for the PBS-interacting linkers. PBS-interacting linkers were associated with 1.3-fold and 1.1-fold lower editing efficiency at RNF2 and DNMT1, respectively, relative to the recommended designs (FIGS. 120A-120G), while spacer-interacting linkers had a negligible effect on editing efficiency. This difference may arise because the linker is closer to the PBS than to the spacer, which may give linker:PBS interactions an entropic advantage over linker:spacer pairing.
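One way to read the subscore definition above is as an average, over linker nucleotides, of the probability that the nucleotide is not paired with any nucleotide of the region in question. The sketch below implements that reading on a precomputed base-pair probability matrix; it does not call ViennaRNA, and the function name, the matrix layout and the toy numbers are assumptions made for illustration rather than pegLIT's actual implementation.

```python
def region_subscore(bpp, linker_idx, region_idx):
    """Average probability that a linker nucleotide is NOT paired with the region.

    bpp        : square matrix, bpp[i][j] = predicted probability that
                 nucleotide i pairs with nucleotide j (assumed symmetric)
    linker_idx : indices of the linker nucleotides
    region_idx : indices of the region of interest (e.g. the PBS)
    """
    unpaired = []
    for i in linker_idx:
        # A nucleotide pairs with at most one partner, so the probabilities
        # of pairing with different region nucleotides can be summed.
        p_paired_with_region = sum(bpp[i][j] for j in region_idx)
        unpaired.append(1.0 - p_paired_with_region)
    return sum(unpaired) / len(unpaired)

# Toy 4-nucleotide example: indices 0-1 form the "linker", 2-3 the "PBS".
toy_bpp = [
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.3],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0],
]
print(round(region_subscore(toy_bpp, [0, 1], [2, 3]), 3))
```

A subscore near 1 then corresponds to a linker that is predicted to leave the region essentially undisturbed.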
Delivery of epegRNA via plasmid transfection with optimized guide RNA scaffolds in HEK293T cells
To simulate lower expression conditions when assessing the suitability of "flip and extension" (F+E) sgRNA scaffold variants for PE, HEK293T cells were transfected with 20 ng of PE2 plasmid and 4 ng of pegRNA or epegRNA plasmid. The editing efficiency of epegRNAs targeting PRNP, HEK3, RUNX1 and EMX1 that contained the canonical sgRNA scaffold, the (F+E) scaffold 5, or one of six (F+E) scaffolds with mutations previously shown to increase Cas9 nuclease activity 6 was compared. These alternative scaffolds generally maintained or increased PE efficiency relative to the standard scaffold, with cr772 showing the best improvement (FIG. 119). The efficiency improvements under these conditions were less consistent than those seen with lentiviral transduction (FIG. 118), which may result from differences in expression: epegRNA expression after plasmid transfection may be several-fold higher than after single-copy lentiviral transduction, which may partially mask the benefits of more efficient transcription and improved Cas9 binding affinity. Testing cr772 or the original (F+E) scaffold is suggested to further increase PE efficiency with epegRNAs, especially for applications in which expression is lower than with plasmid transfection.
Installing FLAG tags using non-optimized epegRNAs
epegRNAs and pegRNAs were also compared for the installation of more challenging edits, such as insertion of a 24-bp FLAG epitope tag (FIG. 105A). The ability of non-optimized pegRNAs and tevopreQ1 epegRNAs, each epegRNA containing one of two locus-specific pegLIT-designed 8-nt linkers, to template the installation of a FLAG epitope tag at 15 loci in HEK293T cells using PE2 was evaluated (FIGS. 122A-122B). The non-optimized epegRNAs and pegRNAs were designed with a 13-nt PBS and an RT template containing 25 nt of homology downstream of the inserted FLAG epitope tag, unless the 3' extension would start with a cytosine 7, in which case the template was extended to the nearest non-C nucleotide. Without any PBS or RT template optimization, epegRNAs enabled FLAG tag installation with PE2 at > 10% efficiency at 5 of 15 sites, whereas > 10% efficiency was not observed with any pegRNA (FIGS. 122A-122B). These observations further indicate that epegRNAs can enhance guided editing performance for a variety of edits at many different endogenous human genomic loci.
Methods
General procedure. Plasmids expressing pegRNAs and epegRNAs were assembled by Gibson or Golden Gate assembly using either the previously described custom acceptor plasmid 14 or a newly designed custom acceptor plasmid containing trimmed evopreQ1 or mpknot (the use of which is described below), or they were synthesized and cloned by Twist Biosciences. Plasmids expressing sgRNAs were cloned via Gibson or USER assembly. DNA amplification was performed by PCR using Phusion U or Phusion Green Hot Start II High-Fidelity polymerase (New England Biolabs). Plasmids expressing pegRNAs were purified using the PureYield plasmid miniprep kit (Promega) for transfection of HEK293T cells or the Plasmid Plus Midiprep kit (Qiagen) for transfection of other cell types, whereas plasmids expressing the guided editor were purified using only the Plasmid Plus Midiprep kit. Plasmids ordered from Twist Biosciences were resuspended in nuclease-free water and used directly. Primers and dsDNA fragments were purchased from Integrated DNA Technologies (IDT).
Guidelines for epegRNA cloning via Golden Gate DNA assembly 61. When the Golden Gate method is used to clone epegRNAs, the previously described protocol 14 is appropriate, with one important caveat: for tevopreQ1 and trimmed mpknot (tmpknot) epegRNAs, the junction sequences between the 3' extension oligonucleotides and the plasmid backbone are different, as shown below. For more details on pegRNA design and cloning, please access liukroup. Plasmid backbones for Golden Gate cloning have been deposited in Addgene. SEQ ID NOs: 486-489 (top to bottom):
Forward oligonucleotide for the 3' extension of pegRNAs and epegRNAs: 5'-GTGCNNNNNNNNNNNNNNNNNNNNNNNN-3'
Reverse oligonucleotide for the 3' extension of pegRNAs: 3'-NNNNNNNNNNNNNNNNNNNNNNNNAAAA-5'
Reverse oligonucleotide for the 3' extension of epegRNAs with tevopreQ1-1: 3'-NNNNNNNNNNNNNNNNNNNNNNNNGCGC-5'
Reverse oligonucleotide for the 3' extension of epegRNAs with tmpknot: 3'-NNNNNNNNNNNNNNNNNNNNNNNNGGGAGTC-5'
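The oligonucleotide layouts listed above can be turned into a small helper that builds the ordered oligonucleotide pair for a given 3' extension insert. This is a sketch under stated assumptions: the N stretch in each layout is taken to be the user's 3' extension insert written as DNA, the reverse oligonucleotides are returned 5'-to-3' (so the junction sequences shown 3'-to-5' in the text appear reversed), and the function and dictionary names are hypothetical.

```python
# Sketch: build Golden Gate 3' extension oligos from the layouts in the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

FORWARD_PREFIX = "GTGC"  # 5' end of the forward oligo in every layout

# 5' ends of the reverse oligos, i.e. the text's 3'->5' junctions read backwards.
REVERSE_PREFIX = {
    "pegRNA": "AAAA",        # shown in the text as ...AAAA-5'
    "tevopreQ1": "CGCG",     # shown in the text as ...GCGC-5'
    "tmpknot": "CTGAGGG",    # shown in the text as ...GGGAGTC-5'
}

def extension_oligos(insert_dna: str, construct: str) -> tuple[str, str]:
    """Return (forward, reverse) oligos, both written 5'->3'."""
    if construct not in REVERSE_PREFIX:
        raise ValueError(f"unknown construct: {construct}")
    forward = FORWARD_PREFIX + insert_dna.upper()
    reverse = REVERSE_PREFIX[construct] + revcomp(insert_dna)
    return forward, reverse

# Hypothetical 5-nt insert, for illustration only.
for kind in REVERSE_PREFIX:
    print(kind, *extension_oligos("AAGGC", kind))
```

When the two oligonucleotides are annealed, the prefixes remain single-stranded and form the overhangs that distinguish the pegRNA, tevopreQ1 and tmpknot backbones.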
Production of synthetic pegRNAs and in vitro transcribed mRNA. Synthetic pegRNAs were ordered from IDT, contained 2'-O-methyl modifications at the first three and last three nucleotides and phosphorothioate linkages between the first three and last three nucleotides, and were used directly. Synthetic nicking sgRNAs were ordered from Synthego and contained 2'-O-methyl modifications at the first three and last three nucleotides and phosphorothioate linkages between the first three and last two nucleotides. PE-encoding mRNA was transcribed in vitro using the protocol of Gaudelli et al. (2020). Briefly, the PE2 cassette (consisting of a 5' UTR, Kozak sequence, PE2 ORF and 3' UTR) was cloned into a plasmid containing an inactive T7 (dT7) promoter. The mRNA transcription template was generated by PCR using a forward primer that installs the correct T7 promoter sequence and a reverse primer that installs the poly-A tail. mRNA was generated using the HiScribe T7 high-yield RNA kit (New England Biolabs) according to the manufacturer's instructions, except that N1-methylpseudouridine triphosphate (TriLink) was used instead of uridine triphosphate and CleanCap AG (TriLink) was added for co-transcriptional capping. The resulting mRNA was purified by lithium chloride precipitation and reconstituted in TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0 at 25 °C). The sequences of the pegRNAs and sgRNAs used in this example can be found in Table E1. The sequences of the structured RNA motifs tested in this example can be found in Table E2.
General mammalian cell culture conditions. HEK293T (ATCC CRL-3216), U2OS (ATCC HTB-96), K562 (CCL-243) and HeLa (CCL-2) cells were purchased from ATCC and cultured and passaged in Dulbecco's Modified Eagle's Medium (DMEM) supplemented with GlutaMAX (Thermo Fisher Scientific), McCoy's 5A medium (Gibco), RPMI medium 1640 supplemented with GlutaMAX (Gibco), or Eagle's minimal essential medium (EMEM, ATCC), respectively, each supplemented with 10% (v/v) fetal bovine serum (Gibco, qualified). Primary Tay-Sachs patient fibroblasts were obtained from the Coriell Institute (cat. ID GM00221) and grown in low-glucose DMEM (Sigma Aldrich) with 10% (v/v) FBS, supplemented with an additional 2 mM L-glutamine (Thermo Fisher Scientific). All cell types were incubated, maintained and cultured at 37 °C with 5% CO2. Each cell line was authenticated by its respective supplier and tested negative for mycoplasma.
Tissue culture transfection and nucleofection protocols and genomic DNA preparation. For transfection, 10,000 HEK293T cells were seeded in each well of a 96-well plate (Corning). 16-24 hours after seeding, cells at approximately 60% confluence were transfected with 0.5 μL Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer's protocol, together with 200 ng PE plasmid, 40 ng pegRNA plasmid and 13 ng sgRNA plasmid (for PE3).
For nucleofection, HEK293T cells were electroporated with in vitro transcribed mRNA and synthetic pegRNA using a Lonza 4D-Nucleofector with the SF Cell Line Kit (Lonza). For each electroporation, 200,000 cells were centrifuged at 120 × g for 8 min and then washed in 1 mL PBS (Thermo Fisher Scientific). After a second centrifugation, the cells were resuspended in 5 μL of reconstituted SF buffer per sample and added to the nucleocuvette.
For each cuvette, 17 μL of cargo mixture (1 μg PE2 mRNA in 0.5 μL, 90 pmol pegRNA in 0.9 μL, 60 pmol nick-generating sgRNA in 0.6 μL, and 15 μL reconstituted SF buffer) was added and mixed by pipetting up and down three times. Cells were electroporated using program CM-130, then 80 μL of warmed medium was added and the cells were incubated at room temperature for 10 minutes. The cells were then mixed by pipetting and 25 μL was added to each well of a 48-well plate, for a final culture volume of 250 μL per well. For experiments with HeLa, U2OS and K562 cells, 800 ng of PE2 expression plasmid, 200 ng of pegRNA expression plasmid and 83 ng of nick-generating sgRNA expression plasmid were nucleofected in a 16-well Nucleocuvette strip (Lonza) in a final volume of 20 μL. HeLa cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit (Lonza) according to the manufacturer's protocol, with 2 × 10^5 cells per sample (program CN-114). U2OS cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit (Lonza) according to the manufacturer's protocol, with 2 × 10^5 cells per sample (program DN-100). K562 cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit (Lonza) according to the manufacturer's protocol, with 2 × 10^5 cells per sample (program FF-120).
Patient-derived fibroblasts were electroporated with PE2-encoding mRNA, synthetic pegRNA and nick-generating sgRNA as described above for HEK293T cells, except that the SE Cell Line Kit and 100,000 cells were used, and the cells were centrifuged at 100 × g for 10 min. In addition, 40 μL of recovered cells, rather than 25 μL, was added to the 48-well plate. In all cases, cells were cultured for 3 days after transfection, after which the medium was removed, the cells were washed with PBS, and 50 μL (for 96-well plates) or 150 μL (for 48-well plates) of freshly prepared lysis buffer (10 mM Tris-HCl, pH 8 at 25 °C; 0.05% SDS; 25 μg mL^-1 proteinase K (Qiagen)) was added. The lysate was incubated at 37 °C for at least 1 hour, and proteinase K was then inactivated at 80 °C for at least 30 minutes. The resulting gDNA was stored at -20 °C until use.
High-throughput DNA sequencing of genomic DNA samples. Genomic loci of interest were amplified from genomic DNA samples and sequenced on an Illumina MiSeq as described previously (ref. 14). Cas9 off-target sites for HEK3, EMX1 and FANCF were previously determined by GUIDE-seq (ref. 29). Primers used for amplification of mammalian cell genomic DNA are listed in Table E3 and amplicons are listed in Table E4. Sequencing reads were demultiplexed using MiSeq Reporter (Illumina). Amplicon sequences were aligned to a reference sequence using CRISPResso2 (ref. 59). For all guided editing yield quantification, editing efficiency was calculated as the percentage of reads with the desired edit and no indels out of the total number of reads with an average Phred score of at least 30. For quantification of point mutation edits, CRISPResso2 was run in standard mode with "discard_indel_reads" turned on. Editing yield was calculated as the number of non-discarded reads containing the edit divided by the total number of reads, expressed as a percentage. For insertion or deletion edits, CRISPResso2 was run in HDR mode, using the desired allele as the expected allele, with "discard_indel_reads" turned on. Editing yield was calculated as the number of HDR-aligned reads divided by the total number of reads, expressed as a percentage. For all experiments, indel frequency was calculated as the number of discarded reads divided by the total number of reads. For experiments involving PE2, indels within 10 nucleotides (inclusive) upstream and downstream of the pegRNA nick site were analyzed. For experiments involving PE3, indels between 10 nucleotides (inclusive) upstream of the sgRNA nick site and downstream of the sgRNA nick site were analyzed. Off-target edits were quantified as previously described (ref. 14).
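The yield and indel calculations described above reduce to simple ratios of read counts. The following Python sketch illustrates them under stated assumptions: the `ReadCounts` container, its field names and the helper functions are hypothetical, and the counts are assumed to come from an upstream alignment step that has already discarded indel-containing reads. It is not the analysis pipeline used in this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReadCounts:
    """Hypothetical per-sample tallies produced by an upstream alignment step."""
    total: int            # reads passing the quality filter (mean Phred >= 30)
    discarded_indel: int  # reads discarded because they contain indels
    edited_no_indel: int  # non-discarded reads that carry the desired edit


def passes_quality(phred_scores: Sequence[int], threshold: float = 30.0) -> bool:
    """Keep a read only if its average Phred score is at least the threshold."""
    return sum(phred_scores) / len(phred_scores) >= threshold


def editing_yield_pct(counts: ReadCounts) -> float:
    """Percentage of quality-filtered reads with the desired edit and no indels."""
    if counts.total <= 0:
        raise ValueError("total read count must be positive")
    return 100.0 * counts.edited_no_indel / counts.total


def indel_frequency_pct(counts: ReadCounts) -> float:
    """Percentage of quality-filtered reads that were discarded for indels."""
    if counts.total <= 0:
        raise ValueError("total read count must be positive")
    return 100.0 * counts.discarded_indel / counts.total


# Illustrative numbers only; these are not data from this disclosure.
example = ReadCounts(total=10_000, discarded_indel=150, edited_no_indel=2_400)
print(editing_yield_pct(example), indel_frequency_pct(example))
```

Both quantities share the same denominator (all quality-filtered reads), so the editing yield and indel frequency for a sample can be compared directly.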
In vitro exonuclease susceptibility assay. RNA substrates containing mpknot or evopreQ1 were prepared from PCR-amplified templates containing a T7 promoter sequence using the HiScribe T7 Quick High Yield RNA Synthesis Kit (New England Biolabs) according to the manufacturer's protocol. Nuclear extracts were prepared from 3 million HEK293T cells grown to 70%-80% confluence using the EpiQuik Nuclear Extraction Kit (EpiGentek) according to the manufacturer's protocol. Assays were performed in 10 μL reactions containing 20 mM Tris-HCl (pH 7.5), 5 mM MgCl2, 50 mM NaCl, 2 mM DTT, 1 mM NTP and 0.8 U/μL RNaseOUT Recombinant Ribonuclease Inhibitor (40 U/μL stock; Thermo Fisher Scientific), which inhibits endonuclease activity. Each reaction contained 0.5 μg of RNA substrate and 3 μL of fresh nuclear lysate. The reaction mixtures were incubated at 37 °C for 20 minutes, and the degradation products were separated on a 2.0% agarose gel stained with SYBR Gold. Degradation was quantified using ImageJ software (NIH).
RT-qPCR of total RNA. 10,000 HEK293T cells per well were seeded in 96-well plates. 16-24 hours after seeding, cells at approximately 60% confluence were transfected with 0.5 μL Lipofectamine 2000, 200 ng PE2 plasmid and 40 ng pegRNA or epegRNA plasmid according to the manufacturer's protocol. Three days later, total RNA was extracted using the Power SYBR Green Cells-to-CT Kit (Thermo Fisher Scientific), cDNA was reverse transcribed with random hexamers, and qPCR was performed with forward and reverse primers that amplify the sgRNA scaffold, all according to the manufacturer's protocol. Primer sequences are provided in Table E5.
Cas9-based transcriptional activation assay. 10,000 HEK293T cells per well were seeded in black-walled 96-well plates (Corning). 16-24 hours after seeding, cells at approximately 60% confluence were transfected with 0.5 μL Lipofectamine 2000, 100 ng dCas9-VPR plasmid, 30 ng GFP reporter plasmid, 15 ng iRFP plasmid and 20 ng sgRNA, pegRNA or epegRNA plasmid according to the manufacturer's protocol. After three days, GFP and iRFP fluorescence of the cells was measured using an Infinite M1000 Pro microplate reader (Tecan). After subtraction of the background fluorescence signal from untreated cells, GFP fluorescence was normalized to iRFP fluorescence.
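The normalization in the last step can be sketched as follows. This is a minimal illustration, assuming that background from untreated cells is subtracted in both the GFP and iRFP channels before the ratio is taken; the function name and arguments are hypothetical and the numbers are not data from this disclosure.

```python
def normalized_gfp(gfp: float, irfp: float,
                   gfp_background: float, irfp_background: float) -> float:
    """Background-subtract both channels, then normalize GFP to iRFP.

    Assumption: the background values are the signals measured from untreated
    cells in the corresponding channel of the same plate.
    """
    gfp_corrected = gfp - gfp_background
    irfp_corrected = irfp - irfp_background
    if irfp_corrected <= 0:
        # No usable transfection-control signal; the ratio would be meaningless.
        raise ValueError("iRFP signal does not exceed background")
    return gfp_corrected / irfp_corrected


# Illustrative well readings (arbitrary fluorescence units).
print(normalized_gfp(gfp=1200.0, irfp=600.0,
                     gfp_background=200.0, irfp_background=100.0))  # 2.0
```

Dividing by the iRFP signal corrects for well-to-well differences in transfection efficiency and cell number, so wells expressing different guide RNAs can be compared.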
Linker design via pegLIT. To design epegRNA linker sequences, a custom algorithm, the pegRNA Linker Identification Tool (pegLIT), was written to search for linker sequences of a specified length that minimize base pairing with the rest of the pegRNA. The search uses simulated annealing to maximize a set of sub-scores, each corresponding to one subsequence of the pegRNA: the spacer, PBS, template or scaffold. During optimization, the higher-scoring linker in any pair is determined by comparing their discretized sub-scores in the following order of priority: spacer, PBS, template, then scaffold. Each sub-score is computed from base-pair probabilities calculated with ViennaRNA 2.0 (ref. 25) under standard parameters (37 °C, 1 M NaCl, 0.05 M MgCl2), as the complement of the mean probability that a nucleotide in the linker forms a base pair with any nucleotide in the pegRNA subsequence under consideration, where the mean is taken over all bases in the linker. Linker sequences with AC content below 50%, and those that would result in a pegRNA containing four identical consecutive nucleotides, are excluded from consideration (refs. 39, 40). Optionally, the algorithm performs hierarchical agglomerative clustering on the 100 highest-scoring linkers and outputs one linker per cluster to promote sequence diversity in the final output. The code for pegLIT is as follows:
The sequences shown are: seq_spacer (SEQ ID NO: 490); seq_scaffold (top) (SEQ ID NO: 491); seq_scaffold (SEQ ID NO: 492); seq_template (top) (SEQ ID NO: 493); seq_template (bottom) (SEQ ID NO: 494); seq_pbs (SEQ ID NO: 495); seq_motif (SEQ ID NO: 219).
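As a rough illustration of the linker filtering and scoring described for pegLIT (and not the pegLIT source code itself), the following Python sketch implements the two exclusion filters, the sub-score definition and the priority-ordered comparison. It assumes that a base-pair probability matrix `bpp` has already been computed elsewhere, for example with ViennaRNA; all function names, the index-based region layout and the rounding used to discretize sub-scores are hypothetical choices made for this sketch. The simulated annealing search and the clustering step are omitted.

```python
from typing import Dict, Sequence, Tuple

# Order in which sub-scores are compared, highest priority first.
PRIORITY = ("spacer", "pbs", "template", "scaffold")


def ac_content(linker: str) -> float:
    """Fraction of linker nucleotides that are A or C."""
    return sum(nt in "AC" for nt in linker) / len(linker)


def has_run_of_four(seq: str) -> bool:
    """True if the sequence contains four identical consecutive nucleotides."""
    return any(seq[i] == seq[i + 1] == seq[i + 2] == seq[i + 3]
               for i in range(len(seq) - 3))


def passes_filters(linker: str, full_pegrna: str) -> bool:
    """Exclude linkers with AC content below 50% or pegRNAs with 4-nt runs."""
    return ac_content(linker) >= 0.5 and not has_run_of_four(full_pegrna)


def subscore(bpp: Sequence[Sequence[float]],
             linker_idx: Sequence[int],
             region_idx: Sequence[int]) -> float:
    """Complement of the mean probability that a linker base pairs with the region.

    A base pairs with at most one partner, so the probability of pairing with
    any base in the region is the sum over the region.
    """
    per_base = [sum(bpp[i][j] for j in region_idx) for i in linker_idx]
    return 1.0 - sum(per_base) / len(per_base)


def score_tuple(bpp: Sequence[Sequence[float]],
                linker_idx: Sequence[int],
                regions: Dict[str, Sequence[int]],
                digits: int = 2) -> Tuple[float, ...]:
    """Discretized sub-scores in priority order (rounding is an assumption).

    Python compares tuples element by element, so `a > b` first compares the
    spacer sub-score, then PBS, then template, then scaffold.
    """
    return tuple(round(subscore(bpp, linker_idx, regions[name]), digits)
                 for name in PRIORITY)
```

With this representation, choosing the better of two candidate linkers is a plain tuple comparison, for example `max(candidates, key=lambda c: score_tuple(c.bpp, c.linker_idx, c.regions))`, where `candidates` is a hypothetical list of scored linkers.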
Example 5: other strategies for improving guided editing
Other strategies for improving guided editing have also been developed. These address three broad areas in which guided editing can be improved, as shown in FIG. 131: 1) recognition of the target nucleic acid; 2) installation of the edit; and 3) resolution of the edited DNA heteroduplex. This example focuses on increasing editing efficiency by improving recognition of the target nucleic acid, in particular by reducing interactions between the PBS and the spacer sequence of the pegRNA. Relative to sgRNAs, pegRNAs and epegRNAs can show reduced target-site engagement and reduced binding to Cas9. PBS:spacer interactions can limit guided editing efficiency by decreasing Cas9 affinity (FIG. 132). However, the same complementarity is also necessary for PBS:protospacer binding to occur. As shown in FIG. 132, a shorter PBS can increase binding affinity to Cas9. Strategies to reduce PBS:spacer interactions were therefore explored, including 1) blocking the PBS with a toehold that dissociates upon Cas9 binding; 2) delivering the pegRNA template in trans via the nick-generating sgRNA; and 3) introducing chemical and/or genetic modifications that differentially affect PBS:spacer and PBS:protospacer interactions.
First, strategies were explored to block the PBS with a toehold that dissociates upon Cas9 binding. It was observed that a toehold that does not depend on Cas9 binding can inhibit both PBS:spacer and PBS:protospacer interactions (FIG. 133). An MS2 hairpin was fused to the 3' end of the pegRNA, and the MS2 phage coat protein was fused to the reverse transcriptase of the guided editor. As shown in FIG. 134, the toehold can be displaced upon PE2 binding through competing RNA-protein interactions. Several design considerations apply when using this strategy, including 1) the interdependence of the Cas9-RT and RT-MS2 linkers, the pegRNA extension and PBS linkers, the toehold linker, and the length of the linker between the MS2 aptamer and the toehold; 2) the dependence of toehold length on PBS melting temperature and site accessibility; 3) optimization for each site; and 4) tolerance of a non-interacting 17-nucleotide PBS. N- and C-terminal fusions of MS2 to PE2 were tested. The C-terminal MS2 fusion was found to give higher editing efficiency than the N-terminal fusion at HEK3 (FIG. 135). MS2 tagging of PE2 was observed to improve editing efficiency compared to untagged PE2 with various pegRNAs (FIG. 136). PE2-MS2 fusions containing either an XTEN-16aa linker or an XTEN-33aa linker were compared with PE2-XTEN without an MS2 fusion. MS2 tagging combined with a toehold was also observed to rescue long primer binding sites (FIG. 137). In summary, blocking the PBS with a toehold that dissociates upon Cas9 binding showed some benefit in editing efficiency at different genomic sites, especially sites that are typically edited at lower efficiency due to low PBS:protospacer stability. It was also shown that the epegRNA motif may be bifunctional, additionally increasing the stability of the pegRNA.
Next, strategies for delivering the pegRNA template in trans via the nick-generating sgRNA were explored. It was found that the pegRNA extension can be moved onto the nick-generating guide to avoid PBS:spacer interactions entirely (FIG. 138). Several design considerations also apply when using this strategy, including: 1) the extension template acts as a linker, which affects flap resolution; 2) optimization of the nick-generating spacer; and 3) the need for two PE complexes to be present on the genome at the same time. It was observed that this strategy enabled guided editing at DNMT1, HEK3, PRNP, RUNX1 and VEGFA (FIG. 139).
FIG. 140 shows a model based on the identity of mismatches in the PBS and their position relative to the nick.
FIG. 141 shows that mutations in the PBS are tolerated, or in some cases enhance PE activity, consistent with the initial model in which the position and identity of a mutation determine PE efficiency.
FIG. 142 shows that a longer PBS (RNF2, 15 nt) does not tolerate mutations, probably because they excessively inhibit PBS:protospacer interactions.
FIG. 143 shows that PBS mutations can increase the PE efficiency of pegRNAs with a shorter optimal PBS. The mutPBS of each mutPBS epegRNA was 17 nt with four consecutive mutations (HEK3, DNMT1, PRNP) or 15 nt with four consecutive mutations (RNF2), followed by an 8-nt linker and tevopreQ1.
FIG. 144 shows that the improvement from mutPBS can further increase editing efficiency when combined with epegRNAs.
FIG. 145 provides a schematic of dual guided editing. Dual guided editing is particularly useful for large edits because the flaps are exogenous and can only base pair with each other.
FIG. 146 shows that nicking the intervening region in dual guided editing reduces competing homology and thereby improves editing efficiency. The additional nick (or nicks) leads to degradation of the genomic region between the two flaps, reducing the complexity of the intermediate and increasing yield.
FIG. 147 shows MECP2 dual guided editing with auxiliary nicks.
Example 6: treatment of CDKL5 deficiency by guided editing
CDKL5 deficiency is a genetic disorder characterized by seizures that begin soon after birth, followed by developmental delays in many areas. Seizures associated with CDKL5 deficiency typically change with age. The most common seizure type in affected individuals is the generalized tonic-clonic seizure (also known as a grand mal seizure), which involves loss of consciousness, muscle rigidity and convulsions. Tonic seizures are another major seizure type associated with CDKL5 deficiency and are characterized by abnormal muscle contraction. Another common seizure type is the epileptic spasm, which involves brief involuntary muscle jerks. Most patients with CDKL5 deficiency have seizures daily, but may also experience seizure-free periods. Seizures in CDKL5 deficiency are generally resistant to treatment.
CDKL5 deficiency is also associated with impaired development in children. Affected children have severe intellectual disability and markedly limited speech. In addition, the development of gross motor skills (e.g., sitting, standing and walking) is delayed or not achieved in some individuals. In fact, only about one third of affected individuals are able to walk unassisted. Fine motor skills are also impaired, and only about half of affected individuals can use their hands purposefully. Many individuals affected by CDKL5 deficiency also have impaired vision.
CDKL5 deficiency is caused by mutations in the CDKL5 gene. The gene provides instructions for producing a protein essential for normal brain development and function. In particular, CDKL5 gene mutations may reduce the amount of functional CDKL5 protein or alter its activity in neurons. A shortage of CDKL5 protein or impairment of its function may disrupt brain development, but it is not clear how these changes lead to the specific features of CDKL5 deficiency.
Current treatment of CDKL5 mutations/defects is primarily focused on controlling symptoms. However, no treatment methods currently exist that can improve neurological outcome in subjects with CDKL5 mutations or defects, or can correct CDKL5 gene mutations that lead to disease. Thus, there is a need for gene therapy methods for treating CDKL5 deficiency.
In the present disclosure, a guide editor (e.g., PE 2) is used in combination with pegRNA, as shown in FIG. 148, to correct multiple pathogenic mutations in the CDKL5 gene simultaneously (including correcting V172I, A173D, R175S, W176G, W176R, Y177C, R178P, P180L, E181A and L182P mutations). A single guide editor (e.g., PE 2) complexed with a single pegRNA was also shown to be able to correct a large number of pathogenic mutations at positions +4, +8, +12, +17, +21 and +25 relative to position 1 (i.e., the most 5' nucleotide; FIG. 149) of the PAM sequence.
Example 7: method of correcting multiple mutations in mouse CDKL5 using a single guide RNA
This example describes optimization of the installation of the pathogenic 1412delA mutation in mouse cells. N2A cells were used for this work because they are derived from a neuroblastoma and CDKL5 deficiency disorder (CDD) is primarily a neurological disease. Extensive work was done to optimize the pegRNA and nick-generating guide for installing this mutation. One example of such optimization using DNA plasmid transfection is provided in FIG. 150. The PE system, nick-generating guide and pegRNA parameters are detailed on the x-axis. The "13_20" pegRNA was used for subsequent synthesis of pegRNA for electroporation.
N2A cells were then electroporated with in vitro transcribed PE mRNA, synthetic epegRNA and synthetic nick-generating guide RNA (PE3), or with these components plus mMLH1neg mRNA (PE5) (FIG. 151). The nick-generating guide position (NG1 or NG3) was also varied. It was concluded that the PE5 system with NG1 provided the highest percentage of installation, while the PE5 system with NG3 provided the most favorable edit:indel ratio.
Similar experiments were then performed with a seed edit encoded by the epegRNA in addition to the desired 1412delA mutation (FIG. 152). The seed edit is silent and is therefore not expected to be pathogenic, because the amino acid sequence is not altered. Two standard nick-generating guides (NG1, NG3) were used for PE5, in addition to a nick-generating guide for testing the PE5b strategy. The reverse transcriptase templates of pegRNAs 081 and 082 differ in length. It was concluded that pegRNA 081 was the most efficient, installing the desired mutation with about 70% efficiency.
Because CDKL5 deficiency is caused by dominant mutations on the X chromosome, female patients typically have one healthy allele. The effect of indels on this healthy allele is unknown, and the healthy allele can be targeted by a typical PE (SpCas9 PE). A new pegRNA was therefore designed (Table E7) that requires SpCas9-NRCH PE and that 1) does not target the healthy allele and 2) after the first editing event, leaves an allele that is a poor substrate for further editing because the PAM is destroyed.
Finally, a single pegRNA was used to correct multiple pathogenic alleles. Because the mutations arise de novo, it is extremely rare for the same mutation to occur in any two patients. However, certain loci in the CDKL5 gene are more likely than others to carry pathogenic mutations. One such locus is exon 8. Multiple pathogenic CDKL5 alleles were installed in HEK293T cells via plasmid transfection (FIG. 153). The two pegRNA parameters are shown on the x-axis.
Example 8: directed editing of CDKL5 loci with PE4 and PE5
PE4max and PE5max (figures 154 and 155) were used to introduce silent c.g-to-t.a mutations at the CDKL5 site, which is known to contain causative mutations of CDKL5 deficiency, a severe neurodevelopmental disorder (Olson et al, 2019). It was observed that PE4max increased the average guided editing efficiency 29-fold in HeLa cells and 2.1-fold in HEK293T cells compared to PE2 (fig. 156). Notably, the efficiency of PE4max editing (8.6% editing with 0.19% indels in HeLa cells and 20% editing with 0.26% indels in HEK293T cells) was similar to or greater than PE3 (4.5% editing with 1.5% indels in HeLa cells and 24% editing with 5.4% indels in HEK293T cells), but much less indels. Furthermore, PE5max increased disease-associated allele conversion by an average of 6.1-fold in HeLa cells, 1.5-fold in HEK293T cells, compared to PE3, and would edit: indel purity was increased 6.4-fold in HeLa cells and 3.5-fold in HEK293T cells (fig. 156).
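The edit:indel purity referred to above is the ratio of the editing percentage to the indel percentage. The short Python sketch below applies that ratio to the PE4max and PE3 percentages quoted in the preceding paragraph; the function and variable names are hypothetical, and the fold changes reported for PE5max are not reproduced because the underlying PE5max percentages are not given in the text.

```python
def edit_indel_ratio(edit_pct: float, indel_pct: float) -> float:
    """Edit:indel purity, i.e. percent desired edit divided by percent indels."""
    if indel_pct <= 0:
        raise ValueError("indel percentage must be positive to form a ratio")
    return edit_pct / indel_pct


# (editing %, indel %) values quoted in the text above.
reported = {
    ("PE4max", "HeLa"): (8.6, 0.19),
    ("PE4max", "HEK293T"): (20.0, 0.26),
    ("PE3", "HeLa"): (4.5, 1.5),
    ("PE3", "HEK293T"): (24.0, 5.4),
}

for (system, cell_line), (edit_pct, indel_pct) in reported.items():
    ratio = edit_indel_ratio(edit_pct, indel_pct)
    print(f"{system} in {cell_line}: edit:indel = {ratio:.1f}")
```

A higher ratio means more desired edits per indel by-product, which is why a system with a somewhat lower editing percentage can still be preferable when its indel rate is much lower.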
Next, the PE4 and PE5 editing systems were evaluated in cell models of genetic disease and in primary human cells. The pathogenic CDKL5 c.1412delA mutation was corrected in human induced pluripotent stem cells (iPSCs) derived from a heterozygous patient (Chen et al., 2021). Electroporation of these iPSCs with PE3 components (in vitro transcribed PE2 mRNA, synthetic pegRNA and nick-generating sgRNA) produced 17% correction of editable pathogenic alleles and 20% total indel products (FIGs. 157 and 158). Co-electroporation of these components with MLH1dn mRNA for PE5 editing increased correction efficiency to 34% and reduced the indel frequency to 6.1%. To further reduce indels, editing was tested without complementary-strand nicking: MLH1dn was observed to increase allele correction from 4.0% (PE2) to 10% (PE4) with few indels (<0.34%) (FIGs. 157 and 158). Likewise, PE3b produced 13% editing and 4.8% indels at the mutant allele, while PE5b increased editing to 27% with 3.8% indels.
MLH1dn and epegRNAs were also combined for CDKL5 editing (FIG. 159). Using MLH1dn (PE4 and PE5) together with epegRNAs improved the efficiency of installing the CDKL5 c.1412 A-to-G mutation in HEK293T cells. Finally, nick-generating sgRNAs were also optimized for guided editing of CDKL5 (FIG. 160). This optimization improved the efficiency of installing a silent +1 C-to-T mutation in CDKL5 (at the c.1412delA site) in HEK293T cells. The sequences of the guide RNAs used in this example are provided in Table E8.
Example 9: PAM variant guided editing for editing CDKL5 loci
When PE4 and PE5 were used to correct the CDKL5 c.1412delA mutation in heterozygous human patient-derived induced pluripotent stem cells, high levels of insertion and deletion by-products (indels) were observed in addition to the expected guided edits. Many of these indels were presumed to result from attempted guided editing of the wild-type allele, which lacks the c.1412delA mutation. Specifically, because the c.1412delA mutation lies far from the protospacer targeted by SpCas9-PE, this protospacer is nicked even on the wild-type allele, generating indel by-products (FIG. 161).
To reduce indels, guided editors were developed that target and nick DNA only in the presence of the c.1412delA mutation. To this end, guided editors using NRCH and NRTH SpCas9 variants (as described in international patent application publication WO 2020/04751) were generated. NRCH SpCas9-PE and NRTH SpCas9-PE specifically target the c.1412delA mutation, and therefore cannot bind and nick the wild-type CDKL5 allele (FIGs. 162 and 163).
Thus, NRCH SpCas9-PE and NRTH SpCas9-PE can correct the CDKL5 c.1412delA mutation only where it is present, which should minimize indel by-products. The pegRNA and nick-generating sgRNA sequences used in this strategy are provided in Table E9.
Example 10: boot editor for CDKL5 mutation installation
Guide editing guide RNAs (pegrnas) with different Primer Binding Sites (PBS) and template lengths were screened to identify those that were able to use the PE2 guide editor for the most efficient installation of the transition point mutation at c.1412 in the CDKL5 gene of HEK293T cells (fig. 164). Next, the selection of the notch-producing guide used in the PE3 guided editor system was optimized, further improving the efficiency of editing at c.1412 (fig. 165). Coding silencing in the seed region of the pre-spacer targeted by pegRNA was also incorporated to further increase editing efficiency (figure 165). The guide RNA sequences used in this strategy are provided in table E10. Overall, pegRNA CDKL5h37 and epegRNA JNpeg0953 showed the highest editing efficiency.
Sequence(s)
The following sequences in tables E1-E6 are mentioned throughout example 4 and the associated figures.
Table E2.
RNA structural motif sequences tested in this study. This table contains a separate list of the RNA structural motifs appended to the epegRNAs. epegRNAs containing tevopreQ1 are shown in italics.
Table E3.
Primer sequences for genomic DNA amplification. This table lists all primers used for genomic DNA amplification prior to high-throughput sequencing. For most forward primers, staggered forward primers with 4 or 5 Ns were used, as indicated. Primers were used to amplify genomic loci and for qPCR. NNNN denotes an equal mixture of oligonucleotides with N = 4 and N = 5.
Table E4.
Sequences of amplicons analyzed by high-throughput sequencing. This table lists all genomic regions analyzed by high-throughput sequencing, including known Cas9 off-target sites for HEK3, EMX1 and FANCF.
Table E5.
Primer sequences used in RT-qPCR experiments. This table lists all primers used for RT-qPCR analysis of pegRNA expression levels.
Scaffold-fp CCAGACTGAGCACGTGAGTTT (SEQ ID NO: 1376)
Scaffold-rp CGACTCGGTGCCACTTTTTC (SEQ ID NO: 1377)
Table E6.
Reference Single Nucleotide Polymorphism (SNP) number of pathogenic mutations installed with pegRNA or epegRNA.
The table lists the NCBI reference SNP names (rs) for the mutations installed in fig. 107D.
Table E7.
The sequence of the guide RNA used in example 7
Table E8.
The sequence of the guide RNA used in example 8.
Table E9.
The sequence of the guide RNA used in example 9.
Table E10.
The sequence of the guide RNA used in example 10.
References for Example 4
1.Komor,A.C.,Badran,A.H.&Liu,D.R.CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes.Cell 168,20-36(2017).
2.Anzalone,A.V.,Koblan,L.W.&Liu,D.R.Genome editing with CRISPR-Cas nucleases,base editors,transposases and prime editors.Nat Biotechnol 38,824-844(2020).
3.Cullot,G.et al.CRISPR-Cas9 genome editing induces megabase-scale chromosomal truncations.Nat Commun 10,1136(2019).
4.Kosicki,M.,Tomberg,K.&Bradley,A.Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements.Nat Biotechnol 36,765-771(2018).
5.Boroviak,K.,Fu,B.,Yang,F.,Doe,B.&Bradley,A.Revealing hidden complexities of genomic rearrangements generated with Cas9.Sci Rep 7,12867(2017).
6.Enache,O.M.et al.Cas9 activates the p53 pathway and selects for p53-inactivating mutations.Nat Genet 52,662-668(2020).
7.Haapaniemi,E.,Botla,S.,Persson,J.,Schmierer,B.&Taipale,J.CRISPR-Cas9 genome editing induces a p53-mediated DNA damage response.Nat Med 24,927-930(2018).
8.Ihry,R.J.et al.p53 inhibits CRISPR-Cas9 engineering in human pluripotent stem cells.Nat Med 24,939-946(2018).
9.Leibowitz,M.L.et al.Chromothripsis as an on-target consequence of CRISPR-Cas9 genome editing.Preprint at https://www.biorxiv.org/content/10.1101/2020.07.13.200998v1(2020).
10.Burgio,G.&Teboul,L.Anticipating and Identifying Collateral Damage in Genome Editing.Trends Genet 36,905-914(2020).
11.Cox,D.B.,Platt,R.J.&Zhang,F.Therapeutic genome editing:prospects and challenges.Nat Med 21,121-131(2015).
12.Komor,A.C.,Kim,Y.B.,Packer,M.S.,Zuris,J.A.&Liu,D.R.Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.Nature 533,420-424(2016).
13.Gaudelli,N.M.et al.Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage.Nature 551,464-471(2017).
14.Anzalone,A.V.et al.Search-and-replace genome editing without double-strand breaks or donor DNA.Nature 576,149-157(2019).
15.Houseley,J.&Tollervey,D.The many pathways of RNA degradation.Cell 136,763-776(2009).
16.Hendel,A.et al.Chemically modified guide RNAs enhance CRISPR-Cas genome editing in human primary cells.Nat Biotechnol 33,985-989(2015).
17.Geisberg,J.V.,Moqtaderi,Z.,Fan,X.,Ozsolak,F.&Struhl,K.Global analysis of mRNA isoform half-lives reveals stabilizing and destabilizing elements in yeast.Cell 156,812-824(2014).
18.Wu,X.&Bartel,D.P.Widespread Influence of 3′-End Structures on Mammalian mRNA Processing and Stability.Cell 169,905-917 e911(2017).
19.Brown,J.A.et al.Structural insights into the stabilization of MALAT1 noncoding RNA by a bipartite triple helix.Nat Struct Mol Biol 21,633-640(2014).
20.MacFadden,A.et al.Mechanism and structural diversity of exoribonuclease-resistant RNA structures in flaviviral RNAs.Nat Commun 9,119(2018).
21.Pijlman,G.P.et al.A highly structured,nuclease-resistant,noncoding RNA produced by flaviviruses is required for pathogenicity.Cell Host Microbe 4,579-591(2008).
22.Roth,A.et al.A riboswitch selective for the queuosine precursor preQ1 contains an unusually small aptamer domain.Nat Struct Mol Biol 14,308-317(2007).
23.Anzalone,A.V.,Lin,A.J.,Zairis,S.,Rabadan,R.&Cornish,V.W.Reprogramming eukaryotic translation with ligand-responsive synthetic RNA switches.Nat Methods 13,453-458(2016).
24.Houck-Loomis,B.et al.An equilibrium-dependent retroviral mRNA switch regulates translational recoding.Nature 480,561-564(2011).
25.Lorenz,R.et al.ViennaRNA package 2.0.Algorithms Mol Biol 6,26(2011).
26.Schene,I.F.et al.Prime editing for functional repair in patient-derived disease models.Nat Commun 11,5352(2020).
27.Kim,D.Y.,Moon,S.B.,Ko,J.H.,Kim,Y.S.&Kim,D.Unbiased investigation of specificities of prime editing systems in human cells.Nucleic Acids Res 48,10576-10589(2020).
28.Gao,P.et al.Prime editing in mice reveals the essentiality of a single base in driving tissue specific gene expression.Preprint at www.biorxiv.org/content/10.1101/2020.11.07.372748v3.full.pdf(2020).
29.Tsai,S.Q.et al.GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases.Nat Biotechnol 33,187-197(2015).
30.Ibrahim,H.,Wilusz,J.&Wilusz,C.J.RNA recognition by 3′-to-5′exonucleases:the substrate perspective.Biochim Biophys Acta 1779,256-265(2008).
31.Green,L.,Kim,C.H.,Bustamante,C.&Tinoco,I.,Jr.Characterization of the mechanical unfolding of RNA pseudoknots.J Mol Biol 375,511-528(2008).
32.Chavez,A.et al.Highly efficient Cas9-mediated transcriptional programming.Nat Methods 12,326-328(2015).
33.Hu,J.H.et al.Evolved Cas9 variants with broad PAM compatibility and high DNA specificity.Nature 556,57-63(2018).
34.Nahar,S.et al.A G-quadruplex motif at the 3′end of sgRNAs improves CRISPR-Cas9 based genome editing efficiency.Chem Commun 54,2377-2380(2018).
35.Steckelberg,A.L.et al.A folded viral noncoding RNA blocks host cell exoribonucleases through a conformationally dynamic RNA structure.Proc Natl Acad Sci USA 115,6404-6409(2018).
36.Cate,J.H.et al.Crystal structure of a group I ribozyme domain:principles of RNA packing.Science 273,1678-1685(1996).
37.Fedor,M.J.&Westhof,E.Ribozymes:the first 20 years.Mol Cell 10,703-704(2002).
38.Chen,X.,Zaro,J.L.&Shen,W.C.Fusion protein linkers:property,design and functionality.Adv Drug Deliv Rev 65,1357-1369(2013).
39.Win,M.N.&Smolke,C.D.A modular and extensible RNA-based gene-regulatory platform for engineering cellular function.Proc Natl Acad Sci USA 104,14283-14288(2007).
40.Nielsen,S.,Yuzenkova,Y.&Zenkin,N.Mechanism of eukaryotic RNA polymerase III transcription termination.Science 340,1577-1580(2013).
41.Huang,T.P.,Newby,G.A.&Liu,D.R.Precision genome editing using cytosine and adenine base editors in mammalian cells.Nat Protoc(2021).
42.Basila,M.,Kelley,M.L.&Smith,A.V.B.Minimal 2′-O-methyl phosphorothioate linkage modification pattern of synthetic guide RNAs for increased stability and efficient CRISPR-Cas9 gene editing avoiding cellular toxicity.PLoS One 12,e0188593(2017).
43.Mead,S.et al.A novel protective prion protein variant that colocalizes with Kuru exposure.N Engl J Med 361,2056-2065(2009).
44.Asante,E.A.et al.A naturally occurring variant of the human prion protein completely prevents prion disease.Nature 522,478-481(2015).
45.Kim,H.K.et al.Predicting the efficiency of prime editing guide RNAs in human cells.Nat Biotechnol(2020).
46.Jonsson,T.et al.A mutation in APP protects against Alzheimer's disease and age-related cognitive decline.Nature 488,96-99(2012).
47.Abifadel,M.et al.Mutations in PCSK9 cause autosomal dominant hypercholesterolemia.Nat Genet 34,154-156(2003).
48.Bustami,J.et al.Cholesteryl ester transfer protein(CETP)I405V polymorphism and cardiovascular disease in eastern European Caucasians-a cross-sectional study.BMC Geriatr 16,144(2016).
49.Flannick,J.et al.Loss-of-function mutations in SLC30A8 protect against type 2 diabetes.Nat Genet 46,357-363(2014).
50.Sakuntabhai,A.et al.A variant in the CD209 promoter is associated with severity of dengue disease.Nat Genet 37,507-513(2005).
51.Olson,H.E.et al.Cyclin-Dependent Kinase-Like 5 Deficiency Disorder:Clinical Review.Pediatr Neurol 97,18-25(2019).
52.Al-Saaidi,R.et al.The LMNA mutation p.Arg321Ter associated with dilated cardiomyopathy leads to reduced expression and a skewed ratio of lamin A and lamin C proteins.Exp Cell Res 319,3010-3019(2013).
53.Ip,J.P.K.,Mellios,N.&Sur,M.Rett syndrome:insights into genetic,molecular and circuit mechanisms.Nat Rev Neurosci 19,368-382(2018).
54.Christodoulou,J.,Grimm,A.,Maher,T.&Bennetts,B.RettBASE:The IRSA MECP2 variation database-a new mutation database in evolution.Hum Mutat 21,466-472(2003).
55.Dwivedi,O.P.et al.Loss of ZnT8 function protects against diabetes by enhanced insulin secretion.Nat Genet 51,1596-1606(2019).
56.Thyme,S.B.,Akhmetova,L.,Montague,T.G.,Valen,E.&Schier,A.F.Internal guide RNA interactions interfere with Cas9-mediated cleavage.Nat Commun 7,11750(2016).
57.Boyle,E.A.et al.Quantification of Cas9 binding and cleavage across diverse guide sequences maps landscapes of target engagement.Science Advances,in press(2021).
58.Gaudelli,N.M.et al.Directed evolution of adenine base editors with increased activity and therapeutic application.Nat Biotechnol 38,892-900(2020).
59.Clement,K.et al.CRISPResso2 provides accurate and rapid genome editing sequence analysis.Nat Biotechnol 37,224-226(2019).
60.Pandey,S.,Agarwala,P.&Maiti,S.Effect of loops and G-quartets on the stability of RNA G-quadruplexes.J Phys Chem B 117,6896-6905(2013).
61.Engler,C.,Gruetzner,R.,Kandzia,R.&Marillonnet,S.Golden gate shuffling:a one-pot DNA shuffling method based on type IIs restriction enzymes.PLoS One 4,e5553(2009).
References
All of the following references are each incorporated by reference herein in their entirety.
1.Jinek,M.et al.A Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity.Science 337,816–821(2012).
2.Cong,L.et al.Multiplex Genome Engineering Using CRISPR/Cas Systems.Science 339,819–823(2013).
3.Komor,A.C.,Badran,A.H.&Liu,D.R.CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes.Cell 168,20–36(2017).
4.Komor,A.C.,Kim,Y.B.,Packer,M.S.,Zuris,J.A.&Liu,D.R.Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.Nature 533,420–424(2016).
5.Nishida,K.et al.Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems.Science 353,aaf8729(2016).
6.Gaudelli,N.M.et al.Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage.Nature 551,464–471(2017).
7.ClinVar,July 2019.
8.Dunbar,C.E.et al.Gene therapy comes of age.Science 359,eaan4672(2018).
9.Cox,D.B.T.,Platt,R.J.&Zhang,F.Therapeutic genome editing:prospects and challenges.Nat.Med.21,121–131(2015).
10.Adli,M.The CRISPR tool kit for genome editing and beyond.Nat.Commun.9,1911(2018).
11.Kleinstiver,B.P.et al.Engineered CRISPR-Cas9 nucleases with altered PAM specificities.Nature 523,481–485(2015).
12.Kleinstiver,B.P.et al.High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects.Nature 529,490–495(2016).
13.Hu,J.H.et al.Evolved Cas9 variants with broad PAM compatibility and high DNA specificity.Nature 556,57–63(2018).
14.Nishimasu,H.et al.Engineered CRISPR-Cas9 nuclease with expanded targeting space.Science 361,1259–1262(2018).
15.Jasin,M.&Rothstein,R.Repair of strand breaks by homologous recombination.Cold Spring Harb.Perspect.Biol.5,a012740(2013).
16.Paquet,D.et al.Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9.Nature 533,125–129(2016).
17.Kosicki,M.,Tomberg,K.&Bradley,A.Repair of double-strand breaks induced by CRISPR–Cas9 leads to large deletions and complex rearrangements.Nat.Biotechnol.36,765–771(2018).
18.Haapaniemi,E.,Botla,S.,Persson,J.,Schmierer,B.&Taipale,J.CRISPR–Cas9 genome editing induces a p53-mediated DNA damage response.Nat.Med.24,927–930(2018).
19.Ihry,R.J.et al.p53 inhibits CRISPR–Cas9 engineering in human pluripotent stem cells.Nat.Med.24,939–946(2018).
20.Richardson,C.D.,Ray,G.J.,DeWitt,M.A.,Curie,G.L.&Corn,J.E.Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA.Nat.Biotechnol.34,339–344(2016).
21.Srivastava,M.et al.An Inhibitor of Nonhomologous End-Joining Abrogates Double-Strand Break Repair and Impedes Cancer Progression.Cell 151,1474–1487(2012).
22.Chu,V.T.et al.Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells.Nat.Biotechnol.33,543–548(2015).
23.Maruyama,T.et al.Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining.Nat.Biotechnol.33,538–542(2015).
24.Kim,Y.B.et al.Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions.Nat.Biotechnol.35,371–376(2017).
25.Li,X.et al.Base editing with a Cpf1–cytidine deaminase fusion.Nat.Biotechnol.36,324–327(2018).
26.Gehrke,J.M.et al.An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities.Nat.Biotechnol.(2018).doi:10.1038/nbt.4199
27.Rees,H.A.&Liu,D.R.Base editing:precision chemistry on the genome and transcriptome of living cells.Nat.Rev.Genet.1(2018).doi:10.1038/s41576-018-0059-1.
28.Ostertag,E.M.&Kazazian Jr,H.H.Biology of Mammalian L1 Retrotransposons.Annu.Rev.Genet.35,501–538(2001).
29.Zimmerly,S.,Guo,H.,Perlman,P.S.&Lambowitz,A.M.Group II intron mobility occurs by target DNA-primed reverse transcription.Cell 82,545–554(1995).
30.Luan,D.D.,Korman,M.H.,Jakubczak,J.L.&Eickbush,T.H.Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site:a mechanism for non-LTR retrotransposition.Cell 72,595–605(1993).
31.Feng,Q.,Moran,J.V.,Kazazian,H.H.&Boeke,J.D.Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition.Cell 87,905–916(1996).
32.Jinek,M.et al.Structures of Cas9 Endonucleases Reveal RNA-Mediated Conformational Activation.Science 343,1247997(2014).
33.Jiang,F.et al.Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage.Science aad8282(2016).doi:10.1126/science.aad8282
34.Qi,L.S.et al.Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression.Cell 152,1173–1183(2013).
35.Tang,W.,Hu,J.H.&Liu,D.R.Aptazyme-embedded guide RNAs enable ligand-responsive genome editing and transcriptional activation.Nat.Commun.8,15939(2017).
36.Shechner,D.M.,Hacisuleyman,E.,Younger,S.T.&Rinn,J.L.Multiplexable,locus-specific targeting of long RNAs with CRISPR-Display.Nat.Methods 12,664–670(2015).
37.Anders,C.&Jinek,M.Chapter One-In vitro Enzymology of Cas9.in Methods in Enzymology(eds.Doudna,J.A.&Sontheimer,E.J.)546,1–20(Academic Press,2014).
38.Briner,A.E.et al.Guide RNA Functional Modules Direct Cas9 Activity and Orthogonality.Mol.Cell 56,333–339(2014).
39.Nowak,C.M.,Lawson,S.,Zerez,M.&Bleris,L.Guide RNA engineering for versatile Cas9 functionality.Nucleic Acids Res.44,9555–9564(2016).
40.Sternberg,S.H.,Redding,S.,Jinek,M.,Greene,E.C.&Doudna,J.A.DNA interrogation by the CRISPR RNA-guided endonuclease Cas9.Nature 507,62–67(2014).
41.Mohr,S.et al.Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing.RNA 19,958–970(2013).
42.Stamos,J.L.,Lentzsch,A.M.&Lambowitz,A.M.Structure of a Thermostable Group II Intron Reverse Transcriptase with Template-Primer and Its Functional and Evolutionary Implications.Mol.Cell 68,926-939.e4(2017).
43.Zhao,C.&Pyle,A.M.Crystal structures of a group II intron maturase reveal a missing link in spliceosome evolution.Nat.Struct.Mol.Biol.23,558–565(2016).
44.Zhao,C.,Liu,F.&Pyle,A.M.An ultraprocessive,accurate reverse transcriptase encoded by a metazoan group II intron.RNA 24,183–195(2018).
45.Ran,F.A.et al.Genome engineering using the CRISPR-Cas9 system.Nat.Protoc.8,2281–2308(2013).
46.Liu,Y.,Kao,H.-I.&Bambara,R.A.Flap endonuclease 1:a central component of DNA metabolism.Annu.Rev.Biochem.73,589–615(2004).
47.Krokan,H.E.&Bjørås,M.Base Excision Repair.Cold Spring Harb.Perspect.Biol.5,(2013).
48.Kelman,Z.PCNA:structure,functions and interactions.Oncogene 14,629–640(1997).
49.Choe,K.N.&Moldovan,G.-L.Forging Ahead through Darkness:PCNA,Still the Principal Conductor at the Replication Fork.Mol.Cell 65,380–392(2017).
50.Li,X.,Li,J.,Harrington,J.,Lieber,M.R.&Burgers,P.M.Lagging strand DNA synthesis at the eukaryotic replication fork involves binding and stimulation of FEN-1 by proliferating cell nuclear antigen.J.Biol.Chem.270,22109–22112(1995).
51.Tom,S.,Henricksen,L.A.&Bambara,R.A.Mechanism whereby proliferating cell nuclear antigen stimulates flap endonuclease 1.J.Biol.Chem.275,10498–10505(2000).
52.Tanenbaum,M.E.,Gilbert,L.A.,Qi,L.S.,Weissman,J.S.&Vale,R.D.A protein-tagging system for signal amplification in gene expression and fluorescence imaging.Cell 159,635–646(2014).
53.Bertrand,E.et al.Localization of ASH1 mRNA particles in living yeast.Mol.Cell 2,437–445(1998).
54.Dahlman,J.E.et al.Orthogonal gene knockout and activation with a catalytically active Cas9 nuclease.Nat.Biotechnol.33,1159–1161(2015).
55.Tsai,S.Q.et al.GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases.Nat.Biotechnol.33,187–197(2015).
56.Tsai,S.Q.et al.CIRCLE-seq:a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets.Nat.Methods 14,607–614(2017).
57.Schek N,Cooke C,Alwine JC.Molecular and Cellular Biology.(1992).
58.Gil A,Proudfoot NJ.Cell.(1987).
59.Zhao,B.S.,Roundtree,I.A.,He,C.Nat Rev Mol Cell Biol.(2017).
60.Rubio,M.A.T.,Hopper,A.K.Wiley Interdiscip Rev RNA(2011).
61.Shechner,D.M.,Hacisuleyman E.,Younger,S.T.,Rinn,J.L.Nat Methods.(2015).
62.Paige,J.S.,Wu,K.Y.,Jaffrey,S.R.Science(2011).
63.Ray D.,…Hughes TR.Nature(2013).
64.Chadalavada,D.M.,Cerrone-Szakal,A.L.,Bevilacqua,P.C.RNA(2007).
65.Forster AC,Symons RH.Cell.(1987).
66.Weinberg Z,Kim PB,Chen TH,Li S,Harris KA,Lünse CE,Breaker RR.Nat.Chem.Biol.(2015).
67.Feldstein PA,Buzayan JM,Bruening G.Gene(1989).
68.Saville BJ,Collins RA.Cell.(1990).
69.Winkler WC,Nahvi A,Roth A,Collins JA,Breaker RR.Nature(2004).
70.Roth A,Weinberg Z,Chen AG,Kim PG,Ames TD,Breaker RR.Nat Chem Biol.(2013).
71.Choudhury R,Tsai YS,Dominguez D,Wang Y,Wang Z.Nat Commun.(2012).
72.MacRae IJ,Doudna JA.Curr Opin Struct Biol.(2007).
73.Bernstein E,Caudy AA,Hammond SM,Hannon GJ Nature(2001).
74.Filippov V,Solovyev V,Filippova M,Gill SS.Gene(2000).
75.Cadwell RC and Joyce GF.PCR Methods Appl.(1992).
76.McInerney P,Adams P,and Hadi MZ.Mol Biol Int.(2014).
77.Esvelt KM,Carlson JC,and Liu DR.Nature.(2011).
78.Naorem SS,Hin J,Wang S,Lee WR,Heng X,Miller JF,Guo H.Proc Natl Acad Sci USA(2017).
79.Martinez MA,Vartanian JP,Wain-Hobson S.Proc Natl Acad Sci USA(1994).
80.Meyer AJ,Ellefson JW,Ellington AD.Curr Protoc Mol Biol.(2014).
81.Wang HH,Isaacs FJ,Carr PA,Sun ZZ,Xu G,Forest CR,Church GM.Nature.(2009).
82.Nyerges et al.Proc Natl Acad Sci USA.(2016).
83.Mascola JR,Haynes BF.Immunol Rev.(2013).
84.X.Wen,K.Wen,D.Cao,G.Li,R.W.Jones,J.Li,S.Szu,Y.Hoshino,L.Yuan,Inclusion of a universal tetanus toxoid CD4(+)T cell epitope P2 significantly enhanced the immunogenicity of recombinant rotavirus ΔVP8* subunit parenteral vaccines.Vaccine 32,4420-4427(2014).
85.G.Ada,D.Isaacs,Carbohydrate-protein conjugate vaccines.Clin Microbiol Infect 9,79-85(2003).
86.E.Malito,B.Bursulaya,C.Chen,P.L.Surdo,M.Picchianti,E.Balducci,M.Biancucci,A.Brock,F.Berti,M.J.Bottomley,M.Nissum,P.Costantino,R.Rappuoli,G.Spraggon,Structural basis for lack of toxicity of the diphtheria toxin mutant CRM197.Proceedings of the National Academy of Sciences 109,5229(2012).
87.J.de Wit,M.E.Emmelot,M.C.M.Poelen,J.Lanfermeijer,W.G.H.Han,C.van Els,P.Kaaijk,The Human CD4(+)T Cell Response against Mumps Virus Targets a Broadly Recognized Nucleoprotein Epitope.J Virol 93,(2019).
88.M.May,C.A.Rieder,R.J.Rowe,Emergent lineages of mumps virus suggest the need for a polyvalent vaccine.Int J Infect Dis 66,1-4(2018).
89.M.Ramamurthy,P.Rajendiran,N.Saravanan,S.Sankar,S.Gopalan,B.Nandagopal,Identification of immunogenic B-cell epitope peptides of rubella virus E1 glycoprotein towards development of highly specific immunoassays and/or vaccine.Conference Abstract,(2019).
90.U.S.F.Tambunan,F.R.P.Sipahutar,A.A.Parikesit,D.Kerami,Vaccine Design for H5N1 Based on B-and T-cell Epitope Predictions.Bioinform Biol Insights 10,27-35(2016).
91.Asante,E.A.et al.A naturally occurring variant of the human prion protein completely prevents prion disease.Nature.(2015).
92.Crabtree,G.R.&Schreiber,S.L.Three-part inventions:intracellular signaling and induced proximity.Trends Biochem.Sci.21,418–22(1996).
93.Liu,J.et al.Calcineurin Is a Common Target of Cyclophilin-Cyclosporin A and FKBP-FK506 Complexes.Cell 66,807–815(1991).
94.Keith,C.T.et al.A mammalian protein targeted by G1-arresting rapamycin–receptor complex.Nature 369,756–758(2003).
95.Spencer,D.M.,Wandless,T.J.,Schreiber,S.L.S.&Crabtree,G.R.Controlling signal transduction with synthetic ligands.Science 262,1019–24(1993).
96.Pruschy,M.N.et al.Mechanistic studies of a signaling pathway activated by the organic dimerizer FK1012.Chem.Biol.1,163–172(1994).
97.Spencer,D.M.et al.Functional analysis of Fas signaling in vivo using synthetic inducers of dimerization.Curr.Biol.6,839–847(1996).
98.Belshaw,P.J.,Spencer,D.M.,Crabtree,G.R.&Schreiber,S.L.Controlling programmed cell death with a cyclophilin-cyclosporin-based chemical inducer of dimerization.Chem.Biol.3,731–738(1996).
99.Yang,J.X.,Symes,K.,Mercola,M.&Schreiber,S.L.Small-molecule control of insulin and PDGF receptor signaling and the role of membrane attachment.Curr.Biol.8,11–18(1998).
100.Belshaw,P.J.,Ho,S.N.,Crabtree,G.R.&Schreiber,S.L.Controlling protein association and subcellular localization with a synthetic ligand that induces heterodimerization of proteins.Proc.Natl.Acad.Sci.93,4604–4607(2002).
101.Stockwell,B.R.&Schreiber,S.L.Probing the role of homomeric and heteromeric receptor interactions in TGF-β signaling using small molecule dimerizers.Curr.Biol.8,761–773(2004).
102.Spencer,D.M.,Graef,I.,Austin,D.J.,Schreiber,S.L.&Crabtree,G.R.A general strategy for producing conditional alleles of Src-like tyrosine kinases.Proc.Natl.Acad.Sci.92,9805–9809(2006).
103.Holsinger,L.J.,Spencer,D.M.,Austin,D.J.,Schreiber,S.L.&Crabtree,G.R.Signal transduction in T lymphocytes using a conditional allele of Sos.Proc.Natl.Acad.Sci.92,9810–9814(2006).
104.Myers,M.G.Insulin Signal Transduction and the IRS Proteins.Annu.Rev.Pharmacol.Toxicol.36,615–658(1996).
105.Watowich,S.S.The erythropoietin receptor:Molecular structure and hematopoietic signaling pathways.J.Investig.Med.59,1067–1072(2011).
106.Blau,C.A.,Peterson,K.R.,Drachman,J.G.&Spencer,D.M.A proliferation switch for genetically modified cells.Proc.Natl.Acad.Sci.94,3076–3081(2002).
107.Clackson,T.et al.Redesigning an FKBP-ligand interface to generate chemical dimerizers with novel specificity.Proc.Natl.Acad.Sci.95,10437–10442(1998).
108.Diver,S.T.&Schreiber,S.L.Single-step synthesis of cell-permeable protein dimerizers that activate signal transduction and gene expression.J.Am.Chem.Soc.119,5106–5109(1997).
109.Guo,Z.F.,Zhang,R.&Liang,F.Sen.Facile functionalization of FK506 for biological studies by the thiol-ene‘click’reaction.RSC Adv.4,11400–11403(2014).
110.Robinson,D.R.,Wu,Y.-M.&Lin,S.-F.The protein tyrosine kinase family of the human genome.Oncogene 19,5548–5557(2000).
111.Landrum,M.J.et al.ClinVar:public archive of interpretations of clinically relevant variants.Nucleic Acids Res.44,D862–D868(2016).
112.Jinek,M.et al.A Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity.Science 337,816–821(2012).
113.Cong,L.et al.Multiplex Genome Engineering Using CRISPR/Cas Systems.Science 339,819–823(2013).
114.Mali,P.et al.RNA-Guided Human Genome Engineering via Cas9.Science 339,823–826(2013).
115.Yang,H.et al.One-Step Generation of Mice Carrying Reporter and Conditional Alleles by CRISPR/Cas-Mediated Genome Engineering.Cell 154,1370–1379(2013).
116.Kim,S.,Kim,D.,Cho,S.W.,Kim,J.&Kim,J.-S.Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins.Genome Res.24,1012–1019(2014).
117.Orlando,S.J.et al.Zinc-finger nuclease-driven targeted integration into mammalian genomes using donors with limited chromosomal homology.Nucleic Acids Res.38,e152–e152(2010).
118.Tsai,S.Q.et al.GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases.Nat.Biotechnol.33,187–197(2015).
119.Suzuki,K.et al.In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration.Nature 540,144–149(2016).
120.Kosicki,M.,Tomberg,K.&Bradley,A.Repair of double-strand breaks induced by CRISPR–Cas9 leads to large deletions and complex rearrangements.Nat.Biotechnol.36,765–771(2018).
121.Haapaniemi,E.,Botla,S.,Persson,J.,Schmierer,B.&Taipale,J.CRISPR–Cas9 genome editing induces a p53-mediated DNA damage response.Nat.Med.24,927–930(2018).
122.Ihry,R.J.et al.p53 inhibits CRISPR–Cas9 engineering in human pluripotent stem cells.Nat.Med.24,939–946(2018).
123.Chapman,J.R.,Taylor,M.R.G.&Boulton,S.J.Playing the end game:DNA double-strand break repair pathway choice.Mol.Cell 47,497–510(2012).
124.Cox,D.B.T.,Platt,R.J.&Zhang,F.Therapeutic genome editing:prospects and challenges.Nat.Med.21,121–131(2015).
125.Paquet,D.et al.Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9.Nature 533,125–129(2016).
126.Chu,V.T.et al.Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells.Nat.Biotechnol.33,543–548(2015).
127.Maruyama,T.et al.Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining.Nat.Biotechnol.33,538–542(2015).
128.Rees,H.A.,Yeh,W.-H.&Liu,D.R.Development of hRad51–Cas9 nickase fusions that mediate HDR without double-stranded breaks.Nat.Commun.10,1–12(2019).
129.Komor,A.C.,Kim,Y.B.,Packer,M.S.,Zuris,J.A.&Liu,D.R.Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.Nature 533,420–424(2016).
130.Gaudelli,N.M.et al.Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage.Nature 551,464–471(2017).
131.Gao,X.et al.Treatment of autosomal dominant hearing loss by in vivo delivery of genome editing agents.Nature 553,217–221(2018).
132.Ingram,V.M.A specific chemical difference between the globins of normal human and sickle-cell anaemia haemoglobin.Nature 178,792–794(1956).
133.Myerowitz,R.&Costigan,F.C.The major defect in Ashkenazi Jews with Tay-Sachs disease is an insertion in the gene for the alpha-chain of beta-hexosaminidase.J.Biol.Chem.263,18587–18589(1988).
134.Zielenski,J.Genotype and Phenotype in Cystic Fibrosis.Respiration 67,117–133(2000).
135.Mead,S.et al.A Novel Protective Prion Protein Variant that Colocalizes with Kuru Exposure.N.Engl.J.Med.361,2056–2065(2009).
136.Marraffini,L.A.&Sontheimer,E.J.CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA.Science 322,1843–1845(2008).
137.Barrangou,R.et al.CRISPR provides acquired resistance against viruses in prokaryotes.Science 315,1709–1712(2007).
138.Jiang,F.&Doudna,J.A.CRISPR–Cas9 Structures and Mechanisms.Annu.Rev.Biophys.46,505–529(2017).
139.Hille,F.et al.The Biology of CRISPR-Cas:Backward and Forward.Cell 172,1239–1259(2018).
140.Luan,D.D.,Korman,M.H.,Jakubczak,J.L.&Eickbush,T.H.Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site:a mechanism for non-LTR retrotransposition.Cell 72,595–605(1993).
141.Liu,Y.,Kao,H.-I.&Bambara,R.A.Flap endonuclease 1:a central component of DNA metabolism.Annu.Rev.Biochem.73,589–615(2004).
142.Rees,H.A.&Liu,D.R.Base editing:precision chemistry on the genome and transcriptome of living cells.Nat.Rev.Genet.19,770(2018).
143.Richardson,C.D.,Ray,G.J.,DeWitt,M.A.,Curie,G.L.&Corn,J.E.Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA.Nat.Biotechnol.34,339–344(2016).
144.Qi,L.S.et al.Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression.Cell 152,1173–1183(2013).
145.Shechner,D.M.,Hacisuleyman,E.,Younger,S.T.&Rinn,J.L.Multiplexable,locus-specific targeting of long RNAs with CRISPR-Display.Nat.Methods 12,664–670(2015).
146.Tang,W.,Hu,J.H.&Liu,D.R.Aptazyme-embedded guide RNAs enable ligand-responsive genome editing and transcriptional activation.Nat.Commun.8,15939(2017).
147.Jinek,M.et al.Structures of Cas9 Endonucleases Reveal RNA-Mediated Conformational Activation.Science 343,1247997(2014).
148.Nishimasu,H.et al.Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA.Cell 156,935–949(2014).
149.Jiang,F.,Zhou,K.,Ma,L.,Gressel,S.&Doudna,J.A.A Cas9–guide RNA complex preorganized for target DNA recognition.Science 348,1477–1481(2015).
150.Baranauskas,A.et al.Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants.Protein Eng.Des.Sel.25,657–668(2012).
151.Gerard,G.F.et al.The role of template-primer in protection of reverse transcriptase from thermal inactivation.Nucleic Acids Res.30,3118–3129(2002).
152.Arezi,B.&Hogrefe,H.Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer.Nucleic Acids Res.37,473–481(2009).
153.Kotewicz,M.L.,Sampson,C.M.,D’Alessio,J.M.&Gerard,G.F.Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity.Nucleic Acids Res.16,265–277(1988).
154.Shen,M.W.et al.Predictable and precise template-free CRISPR editing of pathogenic variants.Nature 563,646–651(2018).
155.Thuronyi,B.W.et al.Continuous evolution of base editors with expanded target compatibility and improved activity.Nat.Biotechnol.(2019).doi:10.1038/s41587-019-0193-0
156.Kim,Y.B.et al.Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions.Nat.Biotechnol.35,371–376(2017).
157.Koblan,L.W.et al.Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction.Nat.Biotechnol.(2018).doi:10.1038/nbt.4172
158.Komor,A.C.et al.Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity.Sci.Adv.3,eaao4774(2017).
159.Kleinstiver,B.P.et al.High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects.Nature 529,490–495(2016).
160.Zuo,E.et al.Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos.Science 364,289–292(2019).
161.Jin,S.et al.Cytosine,but not adenine,base editors induce genome-wide off-target mutations in rice.Science 364,292–295(2019).
162.Kim,D.,Kim,D.,Lee,G.,Cho,S.-I.&Kim,J.-S.Genome-wide target specificity of CRISPR RNA-guided adenine base editors.Nat.Biotechnol.37,430–435(2019).
163.Grünewald,J.et al.Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors.Nature 569,433–437(2019).
164.Zhou,C.et al.Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis.Nature 571,275–278(2019).
165.Rees,H.A.,Wilson,C.,Doman,J.L.&Liu,D.R.Analysis and minimization of cellular RNA editing by DNA adenine base editors.Sci.Adv.5,eaax5717(2019).
166.Ostertag,E.M.&Kazazian Jr,H.H.Biology of Mammalian L1 Retrotransposons.Annu.Rev.Genet.35,501–538(2001).
167.Griffiths,D.J.Endogenous retroviruses in the human genome sequence.Genome Biol.2,REVIEWS1017(2001).
168.Berkhout,B.,Jebbink,M.&Zsíros,J.Identification of an Active Reverse Transcriptase Enzyme Encoded by a Human Endogenous HERV-K Retrovirus.J.Virol.73,2365–2375(1999).
169.Halvas,E.K.,Svarovskaia,E.S.&Pathak,V.K.Role of Murine Leukemia Virus Reverse Transcriptase Deoxyribonucleoside Triphosphate-Binding Site in Retroviral Replication and In Vivo Fidelity.J.Virol.74,10349–10358(2000).
170.Dever,D.P.et al.CRISPR/Cas9 Beta-globin Gene Targeting in Human Hematopoietic Stem Cells.Nature 539,384–389(2016).
171.Park,S.H.et al.Highly efficient editing of the β-globin gene in patient-derived hematopoietic stem and progenitor cells to treat sickle cell disease.Nucleic Acids Res.doi:10.1093/nar/gkz475
172.Collinge,J.Prion diseases of humans and animals:their causes and molecular basis.Annu.Rev.Neurosci.24,519–550(2001).
173.Asante,E.A.et al.A naturally occurring variant of the human prion protein completely prevents prion disease.Nature 522,478–481(2015).
174.Anzalone,A.V.,Lin,A.J.,Zairis,S.,Rabadan,R.&Cornish,V.W.Reprogramming eukaryotic translation with ligand-responsive synthetic RNA switches.Nat.Methods 13,453–458(2016).
175.Badran,A.H.et al.Continuous evolution of Bacillus thuringiensis toxins overcomes insect resistance.Nature 533,58–63(2016).
176.Anders,C.&Jinek,M.Chapter One-In Vitro Enzymology of Cas9.in Methods in Enzymology(eds.Doudna,J.A.&Sontheimer,E.J.)546,1–20(Academic Press,2014).
177.Pirakitikulr,N.,Ostrov,N.,Peralta-Yahya,P.&Cornish,V.W.PCRless library mutagenesis via oligonucleotide recombination in yeast.Protein Sci.Publ.Protein Soc.19,2336–2346(2010).
178.Clement,K.et al.CRISPResso2 provides accurate and rapid genome editing sequence analysis.Nat.Biotechnol.37,224–226(2019).
179.Tsai,S.Q.et al.GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases.Nat.Biotechnol.33,187–197(2015).
180.Kleinstiver,B.P.et al.High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects.Nature 529,490–495(2016).
181.Koblan,L.W.et al.Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction.Nat.Biotechnol.(2018).doi:10.1038/nbt.4172
182.Baranauskas,A.et al.Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants.Protein Eng.Des.Sel.25,657–668(2012).
183.Shechner,DM,Hacisuleyman E.,Younger ST,Rinn JL.Nat Methods 664-70(2015).
184.Brown JA,et al.Nat Struct Mol Biol 633-40(2014).
185.Conrad NA and Steitz JA.EMBO J 1831-41(2005).
186.Bartlett JS,et al.Proc Natl Acad Sci USA 8852-7(1996).
187.Mitton-Fry RM,DeGregorio SJ,Wang J,Steitz TA,Steitz JA.Science 1244-7(2010).
188.Forster AC,Symons RH.Cell.1987.
189.Weinberg Z,Kim PB,Chen TH,Li S,Harris KA,Lünse CE,Breaker RR.Nat.Chem.Biol.2015.
190.Feldstein PA,Buzayan JM,Bruening G.Gene 1989.
191.Saville BJ,Collins RA.Cell.1990.
192.Roth A,Weinberg Z,Chen AG,Kim PG,Ames TD,Breaker RR.Nat Chem Biol.2013.
193.Borchardt EK,et al.RNA 1921-30(2015).
194.Zhang Y,et al.Mol Cell 792-806(2013).
195.Dang Y,et al.Genome Biol 280(2015).
196.Schaefer M,Kapoor U,and Jantsch MF.Open Biol 170077(2017).
197.Nahar S,et al.Chem Comm 2377-80(2018).
198.Gao Y and Zhao Y.J Integr Plant Biol 343-9(2014).
199.Dubois N,Marquet R,Paillart J,Bernacchi S.Front Microbiol 527(2018).
200.Costa M and Michel F.EMBO J 1276-85(1995).
201.Hu JH,et al.Nature 57-63(2018).
202.Furukawa K,Gu H,Breaker RR.Methods Mol Biol 209-20(2014).
203.Zettler,J.,Schütz,V.&Mootz,H.D.The naturally split Npu DnaE intein exhibits an extraordinarily high rate in the protein trans-splicing reaction.FEBS Lett.583,909–914(2009).
204.Kügler,S.,Kilic,E.&Bähr,M.Human synapsin 1 gene promoter confers highly neuron-specific long-term transgene expression from an adenoviral vector in the adult rat brain depending on the transduced area.Gene Ther.10,337–347(2003).
205.de Felipe,P.,Hughes,L.E.,Ryan,M.D.&Brown,J.D.Co-translational,intraribosomal cleavage of polypeptides by the foot-and-mouth disease virus 2A peptide.J.Biol.Chem.278,11441–11448(2003).
206.Levy,J.M.&Nicoll,R.A.Membrane-associated guanylate kinase dynamics reveal regional and developmental specificity of synapse stability.J.Physiol.595,1699–1709(2017).
207.Li,B.&Dewey,C.N.RSEM:accurate transcript quantification from RNA-Seq data with or without a reference genome.BMC Bioinformatics 12,323(2011).
208.Ritchie,M.E.et al.limma powers differential expression analyses for RNA-sequencing and microarray studies.Nucleic Acids Res.43,e47–e47(2015).
Equivalents and scope
Articles such as "a," "an," and "the" may mean one or more than one, unless the context clearly indicates otherwise. Unless indicated to the contrary or otherwise evident from the context, embodiments or descriptions that include "or" between one or more members of a group are deemed satisfied if one, more than one, or all of the group members are present in, used in, or otherwise relevant to a given product or process. The present invention includes embodiments in which exactly one member of the group is present in, used in, or otherwise relevant to a given product or process. The present invention also includes embodiments in which more than one, or all, of the group members are present in, used in, or otherwise relevant to a given product or process.
Furthermore, this disclosure encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are introduced into another claim. For example, any claim that depends from another claim may be modified to include one or more limitations found in any other claim that depends from the same base claim. Where elements are presented as a list, for example, in Markush group format, each subgroup of the elements is also disclosed, and any element may be removed from the group. It should be understood that, in general, where the invention or aspects of the invention are referred to as comprising particular elements and/or features, certain embodiments of the disclosure or aspects of the disclosure consist of, or consist essentially of, such elements and/or features. For simplicity, these embodiments are not specifically set forth herein. It should also be noted that the terms "comprising" and "including" are intended to be open-ended and to permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and the understanding of one of ordinary skill in the art, values expressed as ranges can assume any specific value or subrange within the stated ranges in the various embodiments of the invention, to the tenth of the unit of the lower limit of the range.
The present application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If a conflict exists between any of the incorporated references and this specification, the specification shall control. In addition, any particular embodiment of the application that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the application may be excluded from any claim for any reason, whether or not related to the existence of prior art.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. The scope of the embodiments described herein is not intended to be limited to the foregoing description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications may be made to this description without departing from the spirit or scope of the application, as defined in the following claims.

Claims (145)

1. A pegRNA for guided editing comprising a guide RNA and at least one nucleic acid extension arm comprising a DNA synthesis template and a primer binding site, wherein the extension arm comprises a nucleic acid portion attached thereto, the nucleic acid portion selected from the group consisting of: toeloop, hairpin, stem loop, pseudoknot, aptamer, G-quadruplex, tRNA, riboswitch, or ribozyme.
2. The pegRNA of claim 1, wherein the nucleic acid portion is attached to the 3' end of the extension arm.
3. The pegRNA of claim 1, wherein the nucleic acid portion is attached to the 5' end of the extension arm.
4. The pegRNA of claim 1, wherein the pseudoknot is an Mpknot1 portion having a nucleotide sequence selected from the group consisting of: SEQ ID NO: 195 (Mpknot1), SEQ ID NO: 196 (Mpknot1 3' trimmed), SEQ ID NO: 197 (Mpknot1 with 5' extra), SEQ ID NO: 198 (Mpknot1 U38A), SEQ ID NO: 199 (Mpknot1 U38A A29C), SEQ ID NO: 200 (MMLC A29C), SEQ ID NO: 201 (Mpknot1 with 5' extra and U38A), SEQ ID NO: 202 (Mpknot1 with 5' extra and U38A A29C), and SEQ ID NO: 203 (Mpknot1 with 5' extra and A29C), or a nucleotide sequence having at least 80% sequence identity thereto.
5. The pegRNA of claim 1, wherein the G-quadruplex has a nucleotide sequence selected from the group consisting of: SEQ ID NO: 204 (tns1), SEQ ID NO: 205 (stk40), SEQ ID NO: 206 (apc2), SEQ ID NO: 207 (ceacam4), SEQ ID NO: 208 (pitpnm3), SEQ ID NO: 209 (rlf), SEQ ID NO: 210 (erc1), SEQ ID NO: 211 (ube3c), SEQ ID NO: 212 (taf15), SEQ ID NO: 213 (stard3), and SEQ ID NO: 214 (g2), or a nucleotide sequence having at least 80% sequence identity thereto.
6. The pegRNA of claim 1, wherein the evopreQ1 has a nucleotide sequence selected from the group consisting of: SEQ ID NO: 215 (evopreQ1), SEQ ID NO: 216 (evopreQ1 motif 1), SEQ ID NO: 217 (evopreQ1 motif 2), SEQ ID NO: 218 (evopreQ1 motif 3), SEQ ID NO: 219 (shorter preQ1-1), SEQ ID NO: 220 (preQ1-1 G5C (mut1)), and SEQ ID NO: 221 (preQ1-1 G15C (mut2)), or a nucleotide sequence having at least 80% sequence identity thereto.
7. The pegRNA of claim 1, wherein said tRNA part has the nucleotide sequence of SEQ ID NO. 222, or a nucleotide sequence having at least 80% sequence identity thereto.
8. The pegRNA of claim 1, wherein the nucleic acid portion has the nucleotide sequence of SEQ ID NO: 223 (xrn1), or a nucleotide sequence having at least 80% sequence identity thereto.
9. The pegRNA of claim 1, wherein the nucleic acid portion has the nucleotide sequence of SEQ ID NO: 224 (grp1 intron P4P6), or a nucleotide sequence having at least 80% sequence identity thereto.
10. The pegRNA of any one of claims 1-9, wherein the nucleic acid portion is attached to the pegRNA by a linker.
11. The pegRNA of claim 10, wherein the linker has a nucleotide sequence selected from the group consisting of SEQ ID NOs 225-236.
12. The pegRNA of claim 10, wherein the linker is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides in length, wherein the linker is no longer than 50 nucleotides.
13. The pegRNA of claim 10, wherein the linker is 8 nucleotides in length.
14. The pegRNA of claim 1, wherein the extension arm is located at the 3 'or 5' end of the guide RNA, and wherein the nucleic acid extension arm is DNA or RNA.
15. The pegRNA of claim 1, wherein the pegRNA is capable of binding to a napDNAbp and directing the napDNAbp to a target DNA sequence.
16. The pegRNA of claim 15, wherein the target DNA sequence comprises a target strand and a complementary non-target strand.
17. The pegRNA of claim 16, wherein the guide RNA hybridizes to the target strand to form an RNA-DNA hybrid and an R loop.
18. The pegRNA of claim 1, wherein the nucleic acid extension arm is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, at least 41 nucleotides, at least 43 nucleotides, at least 46 nucleotides, or at least 48 nucleotides in length.
19. The pegRNA of claim 1, wherein the DNA synthesis template is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length.
20. The pegRNA of claim 1, wherein the DNA synthesis template encodes a desired edit.
21. The pegRNA of claim 1, wherein the primer binding site is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, or at least 15 nucleotides in length.
22. A complex for guided editing, comprising:
(a) A fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) and a domain comprising RNA-dependent DNA polymerase activity; and
(b) The pegRNA of any one of claims 1 to 21.
23. The complex of claim 22, wherein the napDNAbp has nickase activity.
24. The complex of claim 22, wherein the napDNAbp is a Cas9 protein or variant thereof.
25. The complex of claim 22, wherein the napDNAbp is nuclease-active Cas9, nuclease-inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
26. The complex of claim 22, wherein the napDNAbp is Cas9 nickase (nCas9).
27. The complex of claim 22, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
28. The complex of claim 22, wherein the domain comprising RNA-dependent DNA polymerase activity is a reverse transcriptase comprising any one of the amino acid sequences of SEQ ID NOs 32, 34, 36, 102-128 and 132.
29. The complex of claim 22, wherein the domain comprising RNA-dependent DNA polymerase activity is a reverse transcriptase comprising an amino acid sequence having at least 80%, 85%, 90%, 95%, 98% or 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs 32, 34, 36, 102-128 and 132.
30. The complex of claim 22, wherein the domain comprising RNA-dependent DNA polymerase activity is a naturally occurring reverse transcriptase from a retrovirus or retrotransposon.
31. A nucleic acid molecule encoding the pegRNA of any one of claims 1 to 19.
32. An expression vector comprising the nucleic acid molecule of claim 31, wherein the nucleic acid molecule is under the control of a promoter.
33. The expression vector of claim 32, wherein the promoter is a pol III promoter.
34. The expression vector of claim 32, wherein the promoter is a U6 promoter.
35. The expression vector of claim 32, wherein the promoter is a U6, U6v4, U6v7, or U6v9 promoter or fragment thereof.
36. A cell comprising the pegRNA of any one of claims 1-21.
37. A cell comprising the complex of any one of claims 22-30.
38. A cell comprising the nucleic acid molecule of claim 31.
39. A cell comprising the expression vector of any one of claims 32-35.
40. A pharmaceutical composition comprising: (i) The pegRNA of any one of claims 1-21, the complex of any one of claims 22-30, the nucleic acid molecule of claim 31, the expression vector of any one of claims 32-35, or the cell of any one of claims 36-39, and (ii) a pharmaceutically acceptable excipient.
41. A kit composition comprising: (i) The pegRNA of any one of claims 1-21, the complex of any one of claims 22-30, the nucleic acid molecule of claim 31, the expression vector of any one of claims 32-35, or the cell of any one of claims 36-39, and (ii) a set of instructions for performing guided editing.
42. A method of guided editing comprising contacting a target DNA sequence with the pegRNA of any one of claims 1-21 and a guided editor comprising a napDNAbp and a domain having RNA-dependent DNA polymerase activity, wherein editing efficiency is increased compared to the same method using a pegRNA that does not comprise the modification.
43. The method of claim 42, wherein the editing efficiency is increased by at least a factor of 1.5.
44. The method of claim 42, wherein the editing efficiency is increased by at least a factor of 2.
45. The method of claim 42, wherein the editing efficiency is increased by at least a factor of 3.
46. The method of claim 42, wherein the napDNAbp has nickase activity.
47. The method of claim 42, wherein the napDNAbp is a Cas9 protein or variant thereof.
48. The method of claim 47, wherein the napDNAbp is nuclease-active Cas9, nuclease-inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
49. The method of claim 48, wherein the napDNAbp is Cas9 nickase (nCas9).
50. The method of claim 42, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
51. The method of claim 42, wherein the domain comprising RNA dependent DNA polymerase activity is a reverse transcriptase comprising any one of the amino acid sequences of SEQ ID NOs 32, 34, 36, 102-128 and 132.
52. The method of claim 42, wherein the domain comprising RNA dependent DNA polymerase activity is a reverse transcriptase comprising an amino acid sequence having at least 80%, 85%, 90%, 95%, 98% or 99% sequence identity to the amino acid sequence of any one of SEQ ID NOs 32, 34, 36, 102-128 and 132.
53. The method of claim 42, wherein the domain comprising RNA dependent DNA polymerase activity is a naturally occurring reverse transcriptase from a retrovirus or retrotransposon.
54. The pegRNA of claim 10, wherein the linker is designed by the computational method of claim 56.
55. A method for precisely installing nucleotide edits in a double-stranded target DNA sequence, the method comprising: contacting the double-stranded target DNA sequence with a guide editor comprising a nucleic acid programmable DNA binding protein (napDNAbp) and a DNA polymerase, and with a guided editing guide RNA (pegRNA), wherein the pegRNA comprises:
(a) A spacer region that hybridizes to a first strand of the double-stranded target DNA sequence;
(b) An extension arm that hybridizes to a second strand of the double-stranded target DNA sequence;
(c) A DNA synthesis template comprising the nucleotide edits;
(d) A gRNA core that interacts with the napDNAbp;
(e) A nucleic acid moiety attached to the pegRNA, the nucleic acid moiety selected from the group consisting of: toeloop, hairpin, stem loop, pseudoknot, aptamer, G-quadruplex, tRNA, riboswitch, or ribozyme; and
(f) A linker coupling the nucleic acid moiety to the pegRNA,
wherein the linker is designed using a computational model; and
wherein the pegRNA directs the guide editor to install the nucleotide edits in the double-stranded target DNA sequence.
56. A method of identifying at least one nucleic acid linker for coupling a guided editing guide RNA (pegRNA) to a nucleic acid portion, the method comprising:
using at least one computer hardware processor to perform:
generating a plurality of nucleic acid linker candidates including a first nucleic acid linker candidate;
identifying the at least one nucleic acid linker from the plurality of nucleic acid linker candidates at least in part by:
calculating a plurality of scores for each of at least some of the plurality of nucleic acid linker candidates, the calculating comprising calculating a first set of scores for the first nucleic acid linker candidate, the first set of scores comprising:
a first score indicative of a degree of interaction between the first nucleic acid linker candidate and a first region of the pegRNA; and
a second score indicative of a degree of interaction between the first nucleic acid linker candidate and a second region of the pegRNA; and
identifying the at least one nucleic acid linker from the at least some of the plurality of nucleic acid linker candidates using the calculated plurality of scores; and
outputting information indicative of the at least one nucleic acid linker.
57. The method of claim 56, wherein the first score indicates a degree to which the first nucleic acid linker candidate is predicted to avoid interacting with the first region of the pegRNA, and wherein the second score indicates a degree to which the first nucleic acid linker candidate is predicted to avoid interacting with the second region of the pegRNA.
58. The method of claim 57, wherein said first region comprises a Primer Binding Site (PBS) of said pegRNA.
59. The method of claim 58, wherein said second region comprises a spacer of said pegRNA.
60. The method of claim 57, wherein the first set of scores further comprises a third score indicative of a degree to which the first nucleic acid linker candidate is predicted to avoid interacting with a third region of the pegRNA and a fourth score indicative of a degree to which the first nucleic acid linker candidate is predicted to avoid interacting with a fourth region of the pegRNA.
61. The method of claim 60, wherein the third region comprises a DNA synthesis template.
62. The method of claim 61, wherein the fourth region comprises a gRNA core that interacts with a nucleic acid programmable DNA binding protein (napDNAbp).
63. The method of claim 60,
wherein the pegRNA is used to install nucleotide edits in a double-stranded target DNA sequence,
wherein the pegRNA comprises:
a spacer that hybridizes to a first strand of the double-stranded target DNA sequence,
an extension arm that hybridizes to a second strand of the double-stranded target DNA sequence, the extension arm comprising a primer binding site (PBS) and a DNA synthesis template comprising the nucleotide edits, and
a gRNA core that interacts with a nucleic acid programmable DNA binding protein (napDNAbp), and
wherein the first region comprises the PBS, the second region comprises the spacer,
the third region comprises the DNA synthesis template, and the fourth region comprises the gRNA core.
64. The method of claim 56, wherein the plurality of nucleic acid linker candidates comprises a second nucleic acid linker candidate, and wherein identifying the at least one nucleic acid linker from the at least some of the plurality of nucleic acid linker candidates using the calculated plurality of scores comprises:
comparing the first set of scores of the first nucleic acid linker candidate with a second set of scores of the second nucleic acid linker candidate.
65. The method of claim 64, wherein:
the first region comprises a primer binding site (PBS),
the first score in the first set of scores is indicative of a degree to which the first nucleic acid linker candidate is predicted to avoid interaction with the first region of the pegRNA,
a third score in the second set of scores is indicative of a degree to which the second nucleic acid linker candidate is predicted to avoid interaction with the first region of the pegRNA, and
comparing the first set of scores to the second set of scores comprises:
comparing the first score with the third score.
66. The method of claim 65, wherein when the first score is equal to or within a threshold distance of the third score, comparing the first set of scores to the second set of scores further comprises:
comparing a score of the first set of scores other than the first score with another score of the second set of scores other than the third score.
67. A PEgRNA for use in guided editing, comprising (i) a guide RNA comprising a spacer region and (ii) at least one nucleic acid extension arm comprising a DNA synthesis template, a primer binding site, a fulcrum motif (toehold motif), and an additional nucleic acid portion.
68. The PEgRNA of claim 67, wherein the fulcrum motif and the additional nucleic acid portion are attached to the 3' end of the extension arm.
69. The PEgRNA of claim 67 or 68, wherein the fulcrum motif is attached to the 3' end of the extension arm and the additional nucleic acid portion is attached to the 3' end of the fulcrum motif.
70. The PEgRNA of any one of claims 67-69, wherein the fulcrum motif is attached to the PEgRNA by a linker.
71. The PEgRNA of claim 70, wherein the linker is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 nucleotides in length.
72. The PEgRNA of any one of claims 67-71, wherein the PEgRNA is capable of binding to a nucleic acid programmable DNA binding protein (napDNAbp) of a guide editor and guiding the napDNAbp to a target DNA sequence.
73. A guided editing system for site-specific genomic modification comprising (a) the PEgRNA of any one of claims 67-72, and (b) a guided editor comprising (i) napDNAbp, (ii) a DNA polymerase, and (iii) a moiety that binds to a fulcrum motif of the PEgRNA.
74. The system of claim 73, wherein the portion of the guide editor that binds to the fulcrum motif of the PEgRNA is fused to the N-terminal end of the guide editor.
75. The system of claim 73, wherein the portion of the guide editor that binds to the fulcrum motif of the PEgRNA is fused to the C-terminal end of the guide editor.
76. The system of any one of claims 73-75, wherein the portion of the guide editor that binds to the fulcrum motif of the PEgRNA is fused to the guide editor by a linker.
77. The system of claim 76, wherein the linker is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, or more than 30 amino acids in length.
78. The system of claim 76 or 77, wherein the linker comprises an XTEN linker.
79. The system of any one of claims 73-78, wherein the portion of the guide editor that binds to the fulcrum motif of the PEgRNA comprises an MS2 phage coat protein.
80. The system of any one of claims 73-79, wherein the napDNAbp has nickase activity.
81. The system of any one of claims 73-80, wherein the napDNAbp is a Cas9 protein or variant thereof.
82. The system of any one of claims 73-79, wherein the napDNAbp is nuclease-active Cas9, nuclease-inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
83. The system of any one of claims 73-79, wherein the napDNAbp is Cas9 nickase (nCas9).
84. The system of any one of claims 73-79, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
85. A polynucleotide comprising the PEgRNA of any one of claims 67-72.
86. A vector comprising the polynucleotide of claim 85.
87. A cell comprising the PEgRNA of any one of claims 67-72, the system of any one of claims 73-84, the polynucleotide of claim 85, or the vector of claim 86.
88. A pharmaceutical composition comprising (i) PEgRNA of any one of claims 67-72, the system of any one of claims 73-84, the polynucleotide of claim 85, or the vector of claim 86, and (ii) a pharmaceutically acceptable excipient.
89. A kit comprising the PEgRNA of any one of claims 67-72, the system of any one of claims 73-84, the polynucleotide of claim 85, the vector of claim 86, or the cell of claim 87.
90. A method of guided editing comprising providing a target DNA sequence to the system of any one of claims 73-84, wherein the target DNA sequence is contacted with the PEgRNA and the guided editor of the system.
91. A PEgRNA pair for guided editing, comprising:
(i) a first PEgRNA comprising a guide RNA and at least one nucleic acid extension arm comprising a DNA synthesis template and a primer binding site, wherein the extension arm comprises a nucleic acid portion attached thereto, the nucleic acid portion selected from the group consisting of: toeloop, hairpin, stem loop, pseudoknot, aptamer, G-quadruplex, tRNA, riboswitch, or ribozyme; and
(ii) a second PEgRNA comprising a second-strand nicking guide RNA, wherein the second-strand nicking guide RNA comprises at least one nucleic acid extension arm comprising a DNA synthesis template and a primer binding site.
92. The PEgRNA pair of claim 91, wherein the first PEgRNA and the second PEgRNA are each capable of binding to a nucleic acid programmable DNA binding protein (napDNAbp) of a guide editor and guide the napDNAbp to a target DNA sequence.
93. A guided editing system for site-specific genomic modification comprising (a) the PEgRNA pair of claim 91 or 92, and (b) at least one guided editor comprising napDNAbp and a DNA polymerase.
94. The system of claim 93, wherein the system comprises a first guide editor and a second guide editor, each comprising a napDNAbp and a DNA polymerase.
95. The system of claim 94, wherein the napDNAbp of the first guide editor binds to a first PEgRNA of the PEgRNA pair, and wherein the napDNAbp of the second guide editor binds to a second PEgRNA of the PEgRNA pair.
96. The system of any one of claims 93-95, wherein the napDNAbp has nickase activity.
97. The system of any one of claims 93-95, wherein the napDNAbp is a Cas9 protein or variant thereof.
98. The system of any one of claims 93-95, wherein the napDNAbp is nuclease-active Cas9, nuclease-inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
99. The system of any one of claims 93-95, wherein the napDNAbp is Cas9 nickase (nCas9).
100. The system of any one of claims 93-95, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
101. A polynucleotide comprising the PEgRNA of claim 91 or 92.
102. A vector comprising the polynucleotide of claim 101.
103. A cell comprising the PEgRNA of claim 91 or 92, the system of any one of claims 93-100, the polynucleotide of claim 101, or the vector of claim 102.
104. A pharmaceutical composition comprising (i) the PEgRNA of claim 91 or 92, the system of any one of claims 93-100, the polynucleotide of claim 101, or the vector of claim 102, and (ii) a pharmaceutically acceptable excipient.
105. A kit comprising PEgRNA of claim 91 or 92, the system of any one of claims 93-100, the polynucleotide of claim 101, the vector of claim 102, or the cell of claim 103.
106. A method of guided editing comprising providing a target DNA sequence to the system of any one of claims 93-100, wherein the target DNA sequence is contacted with the PEgRNA pair and the one or more guided editors of the system.
107. A PEgRNA comprising (i) a guide RNA comprising a spacer region and (ii) at least one nucleic acid extension arm comprising a DNA synthesis template and a primer binding site, wherein the primer binding site comprises one or more modified nucleotides that result in a greater reduction in the binding affinity of the primer binding site to a protospacer sequence on a target DNA molecule than to the spacer region.
108. The PEgRNA of claim 107, wherein the one or more modified nucleotides comprise a genetic mutation.
109. The PEgRNA of claim 107, wherein the one or more modified nucleotides comprise a chemically modified nucleotide.
110. A guided editing system for site-specific genome modification comprising (a) the PEgRNA pair of any one of claims 107-109, and (b) at least one guided editor comprising napDNAbp and a DNA polymerase.
111. The system of claim 110, wherein the system comprises a first guide editor and a second guide editor, each comprising a napDNAbp and a DNA polymerase.
112. The system of claim 111, wherein the napDNAbp of the first guide editor binds to a first PEgRNA of the PEgRNA pair, and wherein the napDNAbp of the second guide editor binds to a second PEgRNA of the PEgRNA pair.
113. The system of any one of claims 110-112, wherein the napDNAbp has nickase activity.
114. The system of any one of claims 110-112, wherein the napDNAbp is a Cas9 protein or variant thereof.
115. The system of any one of claims 110-112, wherein the napDNAbp is nuclease-active Cas9, nuclease-inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
116. The system of any one of claims 110-112, wherein the napDNAbp is Cas9 nickase (nCas9).
117. The system of any one of claims 110-112, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
118. A polynucleotide comprising the PEgRNA of any one of claims 107-109.
119. A vector comprising the polynucleotide of claim 118.
120. A cell comprising the PEgRNA of any one of claims 107-109, the system of any one of claims 110-117, the polynucleotide of claim 118, or the vector of claim 119.
121. A pharmaceutical composition comprising (i) the PEgRNA of any one of claims 107-109, the system of any one of claims 110-117, the polynucleotide of claim 118, or the vector of claim 119, and (ii) a pharmaceutically acceptable excipient.
122. A kit comprising the PEgRNA of any one of claims 107-109, the system of any one of claims 110-117, the polynucleotide of claim 118, the vector of claim 119, or the cell of claim 120.
123. A method of guided editing comprising providing a target DNA sequence to the system of any one of claims 110-117, wherein the target DNA sequence is contacted with the PEgRNA pair and the one or more guided editors of the system.
124. A method of correcting one or more mutations in a CDKL5 gene by guide editing using a single pegRNA, the method comprising contacting a target DNA sequence with a guide editor comprising (i) napDNAbp and (ii) a domain having RNA-dependent DNA polymerase activity, and a pegRNA, wherein the pegRNA targets the guide editor to the CDKL5 gene comprising the one or more mutations.
125. The method of claim 124, wherein the pegRNA is provided in figure 146.
126. The method of claim 124, wherein the pegRNA is provided in figure 148.
127. The method of claim 124, wherein the mutation in the CDKL5 gene comprises a 1412delA mutation.
128. The method of claim 124, wherein the one or more mutations encodes a V172I, A173D, R175S, W176G, W176R, Y177C, R178P, P180L, E a or L182P substitution.
129. The method of claim 124, wherein the napDNAbp has nickase activity.
130. The method of claim 124, wherein the napDNAbp is a Cas9 protein or variant thereof.
131. The method of claim 124, wherein the napDNAbp is nuclease-active Cas9, nuclease-inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
132. The method of claim 124, wherein the napDNAbp is Cas9 nickase (nCas9).
133. The method of claim 124, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
134. The method of claim 124, wherein the domain comprising RNA-dependent DNA polymerase activity is a reverse transcriptase.
135. A method of treating a plurality of subjects suffering from a CDKL5 deficiency caused by different mutations in a CDKL5 gene, the method comprising contacting a target DNA sequence with a guide editor comprising (i) napDNAbp and (ii) a domain having RNA-dependent DNA polymerase activity, and a single pegRNA, wherein the single pegRNA is capable of targeting the guide editor to the CDKL5 gene in any of the plurality of subjects, thereby producing a repaired CDKL5 gene in a mutation-agnostic manner.
136. The method of claim 135, wherein the pegRNA is provided in figure 148.
137. The method of claim 135, wherein the pegRNA is provided in figure 150.
138. The method of claim 135, wherein the mutation in the CDKL5 gene comprises a 1412delA mutation.
139. The method of claim 135, wherein the one or more mutations encodes a V172I, A173D, R175S, W176G, W176R, Y177C, R178P, P180L, E a or L182P substitution.
140. The method of claim 135, wherein the napDNAbp has nickase activity.
141. The method of claim 135, wherein the napDNAbp is a Cas9 protein or variant thereof.
142. The method of claim 135, wherein the napDNAbp is nuclease-active Cas9, nuclease-inactive Cas9 (dCas9), or Cas9 nickase (nCas9).
143. The method of claim 135, wherein the napDNAbp is Cas9 nickase (nCas9).
144. The method of claim 135, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
145. The method of claim 135, wherein the domain comprising RNA-dependent DNA polymerase activity is a reverse transcriptase.
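The computational linker-identification method recited in claims 56 to 66 (generate linker candidates, compute one avoidance score per pegRNA region, then compare score sets region by region and fall through to the next score when two scores lie within a threshold) can be illustrated with a minimal sketch. This is not the applicants' implementation. The scoring function is a crude stand-in that measures the longest contiguous antiparallel base-pairing run between a candidate and a region; an actual implementation would presumably rely on an RNA secondary-structure model. All function names, the threshold value, the random candidate sampling, and the toy sequences are assumptions made for illustration and do not come from the sequence listing.

```python
import random

# Watson-Crick and G-U wobble pairs (assumed pairing rule for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def generate_candidates(n, length=8, seed=0):
    """Sample n random RNA linker candidates of the given length (hypothetical generator)."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGU") for _ in range(length)) for _ in range(n)]


def longest_pairing_run(linker, region):
    """Longest contiguous stretch of the linker that can pair antiparallel with the region."""
    rev = region[::-1]  # reverse the region so index-wise alignment is antiparallel
    best = 0
    for offset in range(-len(linker) + 1, len(rev)):
        run = 0
        for i, base in enumerate(linker):
            j = i + offset
            if 0 <= j < len(rev) and (base, rev[j]) in PAIRS:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def avoidance_score(linker, region):
    """1.0 means no predicted pairing with the region; 0.0 means the whole linker can pair."""
    return 1.0 - longest_pairing_run(linker, region) / len(linker)


def score_set(linker, regions):
    """One score per region, in priority order (e.g. PBS, spacer, template, gRNA core)."""
    return tuple(avoidance_score(linker, region) for region in regions)


def better(a, b, threshold=0.05):
    """Tiered comparison: consult the next score only when the current pair is within threshold."""
    for x, y in zip(a, b):
        if abs(x - y) > threshold:
            return x > y
    return False


def pick_linker(candidates, regions, threshold=0.05):
    """Return the candidate whose score set wins the tiered comparison, with its scores."""
    best, best_scores = None, None
    for cand in candidates:
        scores = score_set(cand, regions)
        if best is None or better(scores, best_scores, threshold):
            best, best_scores = cand, scores
    return best, best_scores


if __name__ == "__main__":
    # Toy regions in priority order; these sequences are made up for the demo.
    regions = ["GUCAUCUUAGUCA", "GGCCCAGACUGAGCACGUGA", "UCUGCCAUCA", "GUUUUAGAGCUAGAAAUAGC"]
    linker, scores = pick_linker(generate_candidates(200), regions)
    print(linker, scores)
```

Ordering the regions as PBS, spacer, DNA synthesis template, and gRNA core mirrors the region assignments in claims 58 to 63, and the `threshold` argument corresponds to the "threshold distance" of claim 66. Because the thresholded comparison is not transitive, the result of a sequential scan can depend on candidate order; a production tool would likely need a more careful ranking scheme.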
CN202180078921.8A 2020-09-24 2021-09-24 Guided editing guide RNAs, compositions thereof, and methods of using the same Pending CN116685682A (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US63/083,067 2020-09-24
US63/091,272 2020-10-13
US63/182,633 2021-04-30
US202163231231P 2021-08-09 2021-08-09
US63/231,231 2021-08-09
PCT/US2021/052097 WO2022067130A2 (en) 2020-09-24 2021-09-24 Prime editing guide rnas, compositions thereof, and methods of using the same

Publications (1)

Publication Number Publication Date
CN116685682A true CN116685682A (en) 2023-09-01

Family

ID=87784135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180078921.8A Pending CN116685682A (en) 2020-09-24 2021-09-24 Guided editing guide RNAs, compositions thereof, and methods of using the same

Country Status (1)

Country Link
CN (1) CN116685682A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination