CN117321201A - Prime editor variants, constructs and methods for enhancing prime editing efficiency and accuracy - Google Patents


Info

Publication number
CN117321201A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280020781.3A
Other languages
Chinese (zh)
Inventor
D. R. Liu
P. J. Chen
B. Adamson
J. Hussmann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harvard College
University of California
Princeton University
Broad Institute Inc
Original Assignee
Harvard College
University of California
Princeton University
Broad Institute Inc
Application filed by Harvard College, University of California, Princeton University, Broad Institute Inc
Priority claimed from PCT/US2022/012054 (WO2022150790A2)
Publication of CN117321201A

Abstract

The present disclosure provides compositions and methods for prime editing with increased editing efficiency and/or reduced indel formation by inhibiting the DNA mismatch repair pathway while conducting prime editing of a target site. Accordingly, the present disclosure provides methods of editing a nucleic acid molecule by prime editing that involve contacting the nucleic acid molecule with a prime editor, a pegRNA, and an inhibitor of the DNA mismatch repair pathway, thereby installing one or more modifications to the nucleic acid molecule at a target site with increased editing efficiency and/or reduced indel formation. The present disclosure further provides polynucleotides for editing a DNA target site by prime editing, comprising a nucleic acid sequence encoding a napDNAbp, a polymerase, and an inhibitor of the DNA mismatch repair pathway, wherein the napDNAbp and the polymerase are capable, in the presence of a pegRNA, of installing one or more modifications at the DNA target site with increased editing efficiency and/or reduced indel formation. The present disclosure further provides vectors, cells, and kits comprising the compositions and polynucleotides of the present disclosure. The present disclosure also provides compositions and methods for prime editing with modified prime editor fusion proteins with improved editing efficiency and/or reduced indel formation. The present disclosure further provides vectors, cells, and kits comprising the compositions and polynucleotides of the present disclosure.

Description

Prime editor variants, constructs and methods for enhancing prime editing efficiency and accuracy
Government support
This invention was made with government support under grant numbers AI142756, AI150951, HG009490, EB022376, EB031172, GM118062, CA072720, GM138167, U01AI142756, RM1HG009490, R01EB022376, and R35GM118062 awarded by the National Institutes of Health, and grant number 18062HR0011-17-2-0049 awarded by the Department of Defense. The government has certain rights in this invention.
RELATED APPLICATIONS
The present application claims priority under 35 U.S.C. § 119(e) to the following U.S. provisional applications, each filed in 2021 and each incorporated herein by reference: U.S.S.N. 63/255,897; U.S.S.N. 63/231,230; U.S.S.N. 63/194,913; U.S.S.N. 63/194,865; U.S.S.N. 63/176,202; U.S.S.N. 63/176,180; and U.S.S.N. 63/136,194.
Incorporated by reference
Further, the present application refers to and incorporates by reference the entire contents of each of the following patent applications on prime editing previously filed by one or more of the present inventors: U.S. Provisional Application U.S.S.N. 62/820,813, filed March 19, 2019; U.S. Provisional Application U.S.S.N. 62/858,958, filed June 7, 2019; U.S. Provisional Application U.S.S.N. 62/889,996, filed August 21, 2019; U.S. Provisional Application U.S.S.N. 62/922,654, filed August 21, 2019; U.S. Provisional Application U.S.S.N. 62/913,553, filed October 10, 2019; U.S. Provisional Application U.S.S.N. 62/973,558, filed October 10, 2019; U.S. Provisional Application U.S.S.N. 62/931,195, filed November 5, 2019; U.S. Provisional Application U.S.S.N. 62/944,231, filed December 5, 2019; U.S. Provisional Application U.S.S.N. 62/974,537, filed December 5, 2019; U.S. Provisional Application U.S.S.N. 62/991,069, filed March 17, 2020; U.S. Provisional Application U.S.S.N. 63/100,548, filed March 17, 2020; International PCT Application No. PCT/US2020/023721, filed March 19, 2020; International PCT Application No. PCT/US2020/023553, filed March 19, 2020; International PCT Application No. PCT/US2020/023583, filed March 19, 2020; International PCT Application No. PCT/US2020/023730, filed March 19, 2020; International PCT Application No. PCT/US2020/023713, filed March 19, 2020; International PCT Application No. PCT/US2020/023972, filed March 19, 2020; International PCT Application No. PCT/US2020/023727, filed March 19, 2020; International PCT Application No. PCT/US2020/023724, filed March 19, 2020; International PCT Application No. PCT/US2020/023725, filed March 19, 2020; International PCT Application No. PCT/US2020/023228, filed March 19, 2020; International PCT Application No. PCT/US2020/023732, filed March 19, 2020; and International PCT Application No. PCT/US2020/023723, filed March 19, 2020.
Background
Recent advances in prime editing have enabled the insertion, deletion, and/or substitution of genomic DNA sequences without requiring error-prone double-stranded DNA breaks. See Anzalone et al., "Search-and-replace genome editing without double-strand breaks or donor DNA," Nature, 2019, vol. 576, pp. 149-157, the contents of which are incorporated herein by reference. Prime editing uses an engineered Cas9 nickase-reverse transcriptase fusion protein (e.g., PE1 or PE2) paired with an engineered prime editing guide RNA (pegRNA) that both directs Cas9 to the target genomic site and encodes the information for installing the desired edit. Without wishing to be bound by any particular theory, prime editing proceeds through a presumed multi-step editing process: 1) the Cas9 domain binds and nicks the target genomic DNA site, which is specified by the spacer sequence of the pegRNA; 2) the reverse transcriptase domain uses the nicked genomic DNA as a primer and an engineered extension on the pegRNA as a reverse transcription template to initiate synthesis of the edited DNA strand, generating a single-stranded 3' flap containing the edited DNA sequence; 3) cellular DNA repair resolves the 3' flap intermediate by displacing the 5' flap species, which occurs through invasion by the edited 3' flap, excision of the 5' flap containing the original DNA sequence, and ligation of the new 3' flap to incorporate the edited DNA strand, forming a heteroduplex of one edited strand and one unedited strand; and 4) cellular DNA repair replaces the unedited strand in the heteroduplex using the edited strand as a repair template, completing the editing process.
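The first three steps above can be sketched as a toy string model. The following Python sketch is illustrative only: the target sequence, the 13-nucleotide primer binding site length, and all function names are assumptions made for this example and are not taken from the disclosure. The pegRNA 3' extension is modeled as a reverse transcription template followed by a primer binding site (PBS), written with DNA letters for simplicity.

```python
# Toy model of steps 1-3 of prime editing; sequences and names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_extension(pam_strand: str, nick: int, edited_downstream: str,
                     pbs_len: int = 13) -> str:
    """Build a pegRNA 3' extension (5'->3', DNA letters): RT template + PBS.

    pam_strand: PAM-containing strand, 5'->3'.
    nick: index of the first base 3' of the nick.
    edited_downstream: desired sequence starting at the nick.
    """
    primer = pam_strand[nick - pbs_len:nick]    # 3' end of the nicked strand
    pbs = reverse_complement(primer)            # anneals to that primer
    rt_template = reverse_complement(edited_downstream)
    return rt_template + pbs


def synthesize_flap(extension: str, pbs_len: int = 13) -> str:
    """Step 2: copying the RT template yields the edited 3' flap."""
    return reverse_complement(extension[:-pbs_len])


def install_flap(pam_strand: str, nick: int, flap: str) -> str:
    """Step 3: the 3' flap replaces the original bases (same-length edits only)."""
    return pam_strand[:nick] + flap + pam_strand[nick + len(flap):]


target = "GGCCCAGACTGAGCACGTGATGGCAGAGGA"  # 20-nt protospacer, TGG PAM, flank
nick = 17                                    # 3 nt 5' of the PAM at index 20
edited_downstream = "TGATCGCAGA"             # original bases are TGATGGCAGA
extension = design_extension(target, nick, edited_downstream)
flap = synthesize_flap(extension)
edited_strand = install_flap(target, nick, flap)
# Positions where the edited strand differs from the original strand
mismatches = [i for i, (a, b) in enumerate(zip(target, edited_strand)) if a != b]
print(extension, flap, edited_strand, mismatches)
```

The single differing position corresponds to the heteroduplex of step 3, in which one strand carries the edit and the complementary strand does not. Step 4, in which repair copies the edit onto the unedited strand, is not modeled here.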
Since 2019, prime editing has been applied to introduce genetic changes in a variety of cells and/or organisms. Given its rapid adoption, prime editing represents a powerful tool for genome editing. Despite its versatility and widespread use, prime editing efficiency can vary greatly across different edit classes, target loci, and cell types (Anzalone et al., 2019). Thus, modifications to the prime editing system that increase the specificity and/or efficiency of the prime editing process would significantly advance the art. In particular, modifications are desired that facilitate more efficient incorporation of the edited DNA strand synthesized by the prime editor into the target genomic site. It is also desirable to reduce the frequency of indel byproducts formed by prime editing. Such further modifications to prime editing will drive the development of this technology.
Summary of The Invention
In one aspect, the disclosure relates to the observation that the efficiency and/or specificity of prime editing is affected by the cell's own DNA mismatch repair (MMR) pathway. MMR is a multifactorial pathway involved in correcting base-pair mismatches and insertion/deletion mismatches generated during DNA replication and recombination. As described herein, the inventors developed a novel genetic screening method (referred to in one embodiment as a "pooled CRISPRi screen for prime editing outcomes") that led to the identification of various genetic determinants affecting prime editing efficiency and/or specificity, including MMR. Accordingly, in one aspect, the present disclosure provides a novel prime editing system that includes means for inhibiting and/or evading the effects of MMR, thereby improving the efficiency and/or specificity of prime editing. In one embodiment, the present disclosure provides a prime editing system comprising an MMR-inhibiting protein, such as, but not limited to, a dominant negative variant of an MMR protein, e.g., a dominant negative MLH1 protein (i.e., "MLH1dn"). In another embodiment, the prime editing system includes installing one or more silent mutations near the intended edit, thereby allowing the intended edit to evade MMR recognition even in the absence of an MMR-inhibiting protein such as MLH1dn. In another aspect, the present disclosure provides novel genetic screens for identifying genetic determinants, such as MMR, that affect the efficiency and/or specificity of prime editing. In yet a further aspect, the present disclosure provides nucleic acid constructs encoding the improved prime editing systems described herein. The present disclosure also provides, in other aspects, vectors (e.g., AAV or lentiviral vectors) comprising nucleic acids encoding the improved prime editing systems described herein.
In other aspects, the present disclosure provides cells comprising the improved prime editing systems described herein. In other aspects, the disclosure also provides components of the genetic screens, including nucleic acid and/or vector constructs, guide RNAs, pegRNAs, cells (e.g., CRISPRi cells), and other reagents and/or materials for performing the genetic screens disclosed herein. In other aspects, the present disclosure provides compositions and kits, e.g., pharmaceutical compositions, that comprise the improved prime editing systems described herein and can be administered to a cell, tissue, or organism by any suitable means, such as by gene therapy, mRNA delivery, virus-like particle delivery, or ribonucleoprotein (RNP) delivery. In yet another aspect, the present disclosure provides methods of installing one or more edits in a target nucleic acid molecule, e.g., a genomic locus, using the improved prime editing systems. In another aspect, the present disclosure provides methods of treating a disease or disorder using the improved prime editing systems to correct or otherwise repair one or more genetic alterations (e.g., single nucleotide polymorphisms) in a target nucleic acid molecule (e.g., a genomic locus comprising one or more pathogenic mutations).
Thus, in various aspects, the present disclosure describes improved and modified prime editing methods that include inhibiting the DNA mismatch repair (MMR) system during prime editing. The inventors have surprisingly found that when one or more functions of the MMR system are inhibited, blocked, or otherwise inactivated during prime editing (e.g., using the MLH1dn inhibitor of MMR), the editing efficiency of prime editing can be significantly increased (e.g., at least a 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold increase, or more). Furthermore, the inventors have surprisingly found that when one or more functions of the MMR system are inhibited, blocked, or otherwise inactivated during prime editing, the frequency of indels formed by prime editing can be significantly reduced (e.g., about a 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold reduction, or more).
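The fold changes quoted above are ratios of measured frequencies with and without MMR inhibition. As a minimal sketch of that arithmetic, the Python below computes a fold increase in editing efficiency and a fold reduction in indel frequency. The function names and the input percentages are hypothetical values chosen for illustration; they are not data from the disclosure.

```python
def fold_increase(with_inhibitor: float, without_inhibitor: float) -> float:
    """Fold increase of a frequency relative to the uninhibited control."""
    if without_inhibitor <= 0:
        raise ValueError("control frequency must be positive")
    return with_inhibitor / without_inhibitor


def fold_reduction(with_inhibitor: float, without_inhibitor: float) -> float:
    """Fold reduction of a frequency relative to the uninhibited control."""
    if with_inhibitor <= 0:
        raise ValueError("treated frequency must be positive")
    return without_inhibitor / with_inhibitor


# Hypothetical percentages of sequencing reads, for illustration only.
edit_without, edit_with = 10.0, 35.0     # intended edit
indel_without, indel_with = 4.0, 1.0     # indel byproducts

edit_fold = fold_increase(edit_with, edit_without)
indel_fold = fold_reduction(indel_with, indel_without)
print(f"editing efficiency: {edit_fold:.1f}-fold increase")
print(f"indel frequency: {indel_fold:.1f}-fold reduction")
```

With these made-up inputs the sketch reports a 3.5-fold increase in editing and a 4.0-fold reduction in indels.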
The present disclosure also describes, in other embodiments, improved and modified prime editing methods that include evading the DNA mismatch repair (MMR) system during prime editing. The inventors have surprisingly found that, in the presence or absence of an MMR inhibitor, the editing efficiency of prime editing can be significantly increased when one or more silent mutations are installed near the site at which the desired genetic change is installed by prime editing (e.g., increased by at least 1.5-fold, at least 2.0-fold, and so on in 0.5-fold steps up to at least 10.0-fold, or by at least 11-fold, at least 12-fold, and so on in 1-fold steps up to at least 75-fold). Furthermore, the inventors have surprisingly found that, in the presence or absence of an MMR inhibitor, the frequency of indels resulting from prime editing can be significantly reduced when one or more silent mutations are installed near the site at which the desired genetic change is installed by prime editing (e.g., reduced by at least 1.5-fold, at least 2.0-fold, and so on in 0.5-fold steps up to at least 10.0-fold, or by at least 11-fold, at least 12-fold, and so on in 1-fold steps up to at least 75-fold).
In some embodiments, the present disclosure describes an improved prime editing system, referred to herein as "PE4", that includes PE2 plus an MLH1 dominant negative protein (e.g., wild-type MLH1 with amino acids 754-756 truncated, as further described herein). In certain embodiments, MLH1dn is expressed in trans in cells comprising the PE2 fusion protein. MLH1dn and PE2 can be provided together or separately, for example, by delivery on separate plasmids, separate vectors (e.g., AAV or lentiviral vectors), separate virus-like particles, or separate ribonucleoprotein complexes (RNPs), or by delivery on the same plasmid, the same vector (e.g., an AAV or lentiviral vector), the same virus-like particle, or the same RNP. In other embodiments, MLH1dn may be fused to PE2 or otherwise associated, coupled, or joined with PE2 such that they are co-delivered.
In other embodiments, the present disclosure describes an improved prime editing system, referred to herein as "PE5", that includes PE3 (which is PE2 plus a second-strand nicking guide RNA) plus an MLH1 dominant negative protein (e.g., wild-type MLH1 with amino acids 754-756 truncated, as further described herein). In certain embodiments, MLH1dn is expressed in trans in cells comprising the PE3 prime editor. MLH1dn and PE3 can be provided together or separately, for example, by delivery on separate plasmids, separate vectors (e.g., AAV or lentiviral vectors), separate virus-like particles, or separate ribonucleoprotein complexes (RNPs), or by delivery on the same plasmid, the same vector (e.g., an AAV or lentiviral vector), the same virus-like particle, or the same RNP. In other embodiments, MLH1dn may be fused to PE3 or otherwise associated, coupled, or joined with PE3 such that they are co-delivered.
In other aspects, the present disclosure describes an optimized PE2 prime editor architecture referred to herein as "PEmax". PEmax is a modified form of PE2 that comprises modified reverse transcriptase codon usage, SpCas9 mutations, and NLS sequences, and is depicted in FIG. 54B. Specifically, PEmax refers to a PE complex comprising a fusion protein comprising Cas9 (R221K N394K H840A) and a variant MMLV RT pentamutant (D200N T306K W313F T330P L603W), having the following structure: [bipartite NLS]-[Cas9(R221K)(N394K)(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)]-[bipartite NLS]-[NLS] + a desired pegRNA, wherein the PE fusion has the sequence of SEQ ID NO: 99, as follows:
Key:
Bipartite SV40 nuclear localization sequence (NLS): start (SEQ ID NO: 101), end (SEQ ID NO: 140)
Cas9 (R221K N394K H840A) (SEQ ID NO: 104)
SGGSx2-bipartite SV40 NLS-SGGSx2 linker (SEQ ID NO: 105)
M-MLV reverse transcriptase (D200N T306K W313F T330P L603W) (SEQ ID NO: 98)
Other linker sequences (SEQ ID NO: 122)
Other linker sequences (SEQ ID NO: 106)
c-Myc NLS PAAKRVKLD (SEQ ID NO: 135)
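Labels such as R221K follow the standard point-mutation notation: the wild-type residue, its 1-based position, and the replacement residue. The Python sketch below parses such labels and applies them to a protein string while checking that the stated wild-type residue is actually present. The helper names and the five-residue toy protein are hypothetical; the real Cas9 and M-MLV RT sequences (SEQ ID NOs: 104 and 98) are not reproduced here.

```python
import re

# Wild-type residue, 1-based position, replacement residue (e.g. "R221K").
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")

# Mutation sets named in the text for PEmax.
PEMAX_CAS9 = ["R221K", "N394K", "H840A"]
PEMAX_RT = ["D200N", "T306K", "W313F", "T330P", "L603W"]


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label like 'R221K' into ('R', 221, 'K')."""
    match = MUTATION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(protein: str, labels: list[str]) -> str:
    """Apply point mutations, verifying each wild-type residue first."""
    residues = list(protein)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: found {residues[position - 1]} instead")
        residues[position - 1] = replacement
    return "".join(residues)


toy_protein = "MARND"                                  # hypothetical sequence
mutant = apply_mutations(toy_protein, ["R3K", "D5N"])
parsed_cas9 = [parse_mutation(label) for label in PEMAX_CAS9]
print(mutant, parsed_cas9)
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when a construct carries N-terminal additions such as an NLS.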
In some embodiments, PE4 may be modified to replace the PE2 fusion protein with PEmax. In this case, the modified prime editing system may be referred to as "PE4max".
In some embodiments, PE5 may be modified to replace the PE3 prime editor with PEmax. In this case, the modified prime editing system may be referred to as "PE5max" and includes the second-strand nicking guide RNA.
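The system names introduced above differ only in which editor protein is used, whether a second-strand nicking guide RNA is included, and whether MLH1dn is included. The following Python sketch tabulates those combinations as described in the text. The class and field names are assumptions made for this example, not terms from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimeEditingSystem:
    editor: str          # editor fusion protein: "PE2" or "PEmax"
    nicking_guide: bool  # second-strand nicking guide RNA included
    mlh1dn: bool         # dominant negative MLH1 included


SYSTEMS = {
    "PE2": PrimeEditingSystem("PE2", nicking_guide=False, mlh1dn=False),
    "PE3": PrimeEditingSystem("PE2", nicking_guide=True, mlh1dn=False),
    "PE4": PrimeEditingSystem("PE2", nicking_guide=False, mlh1dn=True),
    "PE5": PrimeEditingSystem("PE2", nicking_guide=True, mlh1dn=True),
    "PE4max": PrimeEditingSystem("PEmax", nicking_guide=False, mlh1dn=True),
    "PE5max": PrimeEditingSystem("PEmax", nicking_guide=True, mlh1dn=True),
}


def components(name: str) -> list[str]:
    """List the components of a named system; a pegRNA is always required."""
    system = SYSTEMS[name]
    parts = [f"{system.editor} fusion protein", "pegRNA"]
    if system.nicking_guide:
        parts.append("second-strand nicking guide RNA")
    if system.mlh1dn:
        parts.append("MLH1dn")
    return parts


for name in SYSTEMS:
    print(name, "=", " + ".join(components(name)))
```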
The inventors developed prime editing, which enables the insertion, deletion, and/or substitution of genomic DNA sequences without requiring error-prone double-stranded DNA breaks. The present disclosure now provides improved prime editing methods that involve blocking, inhibiting, evading, or inactivating the MMR pathway (e.g., by inhibiting, blocking, or inactivating MMR pathway proteins, including MLH1) during prime editing, thereby surprisingly resulting in increased editing efficiency and reduced indel formation. As used herein, "during" prime editing may encompass any suitable sequence of events, such that the prime editing step may be applied before, simultaneously with, or after the step of blocking, inhibiting, evading, or inactivating the MMR pathway (e.g., by targeted inhibition of MLH1).
In various aspects, and without wishing to be bound by any particular theory, prime editing uses an engineered Cas9 nickase-reverse transcriptase fusion protein (e.g., PE1 or PE2) paired with an engineered prime editing guide RNA (pegRNA) that both directs Cas9 to the target genomic site and encodes the information for installing the desired edit. Prime editing proceeds through a multi-step editing process: 1) the Cas9 domain binds and nicks the target genomic DNA site, which is specified by the spacer sequence of the pegRNA; 2) the reverse transcriptase domain uses the nicked genomic DNA as a primer and an engineered extension on the pegRNA as a reverse transcription template to initiate synthesis of the edited DNA strand, generating a single-stranded 3' flap containing the edited DNA sequence; 3) cellular DNA repair resolves the 3' flap intermediate by displacing the 5' flap species, which occurs through invasion by the edited 3' flap, excision of the 5' flap containing the original DNA sequence, and ligation of the new 3' flap to incorporate the edited DNA strand, forming a heteroduplex of one edited strand and one unedited strand; and 4) cellular DNA repair replaces the unedited strand in the heteroduplex using the edited strand as a repair template, completing the editing process.
Efficient incorporation of the edit is expected to require that the newly synthesized 3' flap contain a portion of sequence homologous to the genomic DNA site. This homology enables the edited 3' flap to compete with the endogenous DNA strand (the corresponding 5' flap) for incorporation into the DNA duplex. Because the edited 3' flap contains less sequence homology than the endogenous 5' flap, the competition is expected to favor the 5' flap strand. Thus, a potential limiting factor in prime editing efficiency may be the failure of the edit-containing 3' flap to effectively invade and displace the 5' flap strand. Furthermore, successful 3' flap invasion and removal of the 5' flap incorporate the edit on only one strand of the double-stranded DNA genome. Permanent installation of the edit requires cellular DNA repair to replace the unedited complementary DNA strand using the edited strand as a template. Although the cell can be biased toward replacing the unedited strand rather than the edited strand (step 4 above) by using a secondary sgRNA to introduce a nick in the unedited strand adjacent to the edit (i.e., the PE3 system), this process still relies on a second stage of DNA repair.
The present disclosure describes a modified prime editing method that further includes inhibiting, blocking, or otherwise inactivating the DNA mismatch repair (MMR) system. In certain embodiments, the MMR system may be inhibited by inhibiting, blocking, or otherwise inactivating one or more proteins of the MMR system, including but not limited to MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ. The present disclosure contemplates any suitable method of inhibiting, blocking, or otherwise inactivating the MMR system, including, but not limited to, inactivating one or more key proteins of the MMR system at the genetic level, such as by introducing one or more mutations in a gene encoding an MMR system protein, e.g., MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, or POLδ.
Thus, in one aspect, the present disclosure provides methods for editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating the DNA mismatch repair (MMR) system.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating proteins of the MMR system, such as MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating MLH1 or variants thereof.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating PMS2 (or MutLα) or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating PMS1 (or MutLβ) or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating MLH3 (or MutLγ) or variants thereof.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating MutSα (MSH2-MSH6) or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating MSH2 or variants thereof.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating MSH6 or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating PCNA or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating RFC or variants thereof.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating EXO1 or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating POLδ or variants thereof.
Thus, in one aspect, the present disclosure provides methods for editing a nucleotide molecule (e.g., a genome) using prime editing while blocking, inhibiting, evading, or otherwise inactivating the DNA mismatch repair (MMR) system.
In another aspect, the present disclosure provides methods of evading MMR by installing one or more silent mutations near an intended edit, thereby improving the editing efficiency of prime editing. In various embodiments, the number of installed silent mutations can be one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty or more. The one or more silent mutations may be located upstream or downstream of the intended edit site (or both, if multiple silent mutations are involved), and on the same DNA strand as the intended edit site or on the opposite strand (or both, if multiple silent mutations are involved). A silent mutation may be located 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, or more nucleotide positions upstream or downstream of the intended edit.
In various embodiments, in the presence or absence of an MMR inhibitor, when one or more silent mutations are installed near the site at which the desired genetic change is installed by prime editing, the method of evading MMR by installing silent mutations results in a significant increase in the editing efficiency of prime editing (e.g., an increase of at least 1.5-fold, at least 2.0-fold, and so on in 0.5-fold steps up to at least 10.0-fold, or of at least 11-fold, at least 12-fold, and so on in 1-fold steps up to at least 75-fold). In various embodiments, in the presence or absence of an MMR inhibitor, when one or more silent mutations are installed near the site at which the desired genetic change is installed by prime editing, the method of evading MMR by installing silent mutations results in a significant reduction in the frequency of indel formation caused by prime editing (e.g., a reduction of at least 1.5-fold, at least 2.0-fold, and so on in 0.5-fold steps up to at least 10.0-fold, or of at least 11-fold, at least 12-fold, and so on in 1-fold steps up to at least 75-fold).
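Candidate silent mutations near an intended edit can be enumerated computationally. The Python sketch below is one simple way to do so, under stated assumptions: the input is a coding sequence that already contains the intended edit, it is in frame from its first base, only single-base substitutions on the coding strand are considered, and "silent" is taken to mean that the translated protein is unchanged under the standard genetic code. The function names and the nine-base example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def silent_candidates(edited_cds: str, edit_index: int, window: int = 6):
    """List (position, old base, new base) changes within `window` bases of
    the edit that leave the protein encoded by `edited_cds` unchanged."""
    protein = translate(edited_cds)
    start = max(0, edit_index - window)
    stop = min(len(edited_cds), edit_index + window + 1)
    hits = []
    for pos in range(start, stop):
        if pos == edit_index:
            continue  # the intended edit itself is left untouched
        for alt in "ACGT":
            if alt == edited_cds[pos]:
                continue
            variant = edited_cds[:pos] + alt + edited_cds[pos + 1:]
            if translate(variant) == protein:
                hits.append((pos, edited_cds[pos], alt))
    return hits


edited_cds = "ATGGCTAAA"   # hypothetical: Met-Ala-Lys, intended edit at index 6
candidates = silent_candidates(edited_cds, edit_index=6, window=3)
print(candidates)
```

Because each candidate is checked against the protein encoded by the already-edited sequence, a substitution in the same codon as the intended edit is accepted only if the combined change still encodes the intended amino acid. Whether a given candidate actually helps the edit evade MMR is an experimental question that this sketch does not address.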
In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an inhibitor of the MMR system, e.g., an inhibitor of one or more of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, or PCNA. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an antibody, e.g., a neutralizing antibody. In other embodiments, the inhibitor may be a variant of an MMR protein (e.g., a variant encoded by a dominant negative mutant of a gene encoding the MMR protein, which adversely affects the function or expression of the normal wild-type MMR protein, also referred to herein as a "dominant negative mutant," "dominant negative variant," or "dominant negative protein," e.g., a "dominant negative MMR protein"). In some embodiments, the inhibitor is a dominant negative variant of an MMR protein that inhibits the activity of the wild-type MMR protein. For example, the inhibitor may be a variant (e.g., a dominant negative mutant) of one or more of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, or PCNA, e.g., a dominant negative mutant of MLH1. In other embodiments, the inhibitor may act at the transcript level, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, or PCNA.
In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating MLH1 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an MLH1 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MLH1 antibody, e.g., a neutralizing antibody that inactivates MLH1. In other embodiments, the inhibitor may be a dominant negative mutant of MLH1. In other embodiments, the inhibitor may act at the transcript level of MLH1, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MLH1. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating PMS2 (or MutLα) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and a PMS2 (or MutLα) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-PMS2 (or anti-MutLα) antibody, e.g., a neutralizing antibody that inactivates PMS2 (or MutLα). In other embodiments, the inhibitor may be a dominant negative mutant of PMS2 (or MutLα). In other embodiments, the inhibitor may act at the transcript level of PMS2 (or MutLα), e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding PMS2 (or MutLα). In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating PMS1 (or MutLβ) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and a PMS1 (or MutLβ) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-PMS1 (or anti-MutLβ) antibody, e.g., a neutralizing antibody that inactivates PMS1 (or MutLβ). In other embodiments, the inhibitor may be a dominant negative mutant of PMS1 (or MutLβ). In other embodiments, the inhibitor may act at the transcript level of PMS1 (or MutLβ), e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding PMS1 (or MutLβ). In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating MLH3 (or MutLγ) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an MLH3 (or MutLγ) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MLH3 (or anti-MutLγ) antibody, e.g., a neutralizing antibody that inactivates MLH3 (or MutLγ). In other embodiments, the inhibitor may be a dominant negative mutant of MLH3 (or MutLγ). In other embodiments, the inhibitor may act at the transcript level of MLH3 (or MutLγ), e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MLH3 (or MutLγ). In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating MutSα (MSH2-MSH6) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and a MutSα (MSH2-MSH6) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MutSα (MSH2-MSH6) antibody, e.g., a neutralizing antibody that inactivates MutSα (MSH2-MSH6). In other embodiments, the inhibitor may be a dominant negative mutant of MutSα (MSH2-MSH6). In other embodiments, the inhibitor may act at the transcript level of MutSα (MSH2-MSH6), e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MutSα (MSH2-MSH6). In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating MSH2 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an MSH2 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MSH2 antibody, e.g., a neutralizing antibody that inactivates MSH2. In other embodiments, the inhibitor may be a dominant negative mutant of MSH2. In other embodiments, the inhibitor may act at the transcript level of MSH2, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MSH2. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating MSH6 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an MSH6 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MSH6 antibody, e.g., a neutralizing antibody that inactivates MSH6. In other embodiments, the inhibitor may be a dominant negative mutant of MSH6. In other embodiments, the inhibitor may act at the transcript level of MSH6, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MSH6. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating PCNA or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and a PCNA inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-PCNA antibody, e.g., a neutralizing antibody that inactivates PCNA. In other embodiments, the inhibitor may be a dominant negative mutant of PCNA. In other embodiments, the inhibitor may act at the transcript level of PCNA, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding PCNA. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating RFC or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an RFC inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-RFC antibody, e.g., a neutralizing antibody that inactivates RFC. In other embodiments, the inhibitor may be a dominant negative mutant of RFC. In other embodiments, the inhibitor may act at the transcript level of RFC, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding RFC. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating EXO1 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an EXO1 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-EXO1 antibody, e.g., a neutralizing antibody that inactivates EXO1. In other embodiments, the inhibitor may be a dominant negative mutant of EXO1. In other embodiments, the inhibitor may act at the transcript level of EXO1, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding EXO1. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating POLδ or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and a POLδ inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-POLδ antibody, e.g., a neutralizing antibody that inactivates POLδ. In other embodiments, the inhibitor may be a dominant negative mutant of POLδ. In other embodiments, the inhibitor may act at the transcript level of POLδ, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding POLδ. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guide editor fusion protein and/or a suitable pegRNA to the cell; or (iii) delivering one or more DNA vectors (e.g., an AAV or lentiviral vector, a plasmid, or other nucleic acid delivery vector) encoding the guide editor fusion protein and/or a suitable pegRNA to the cell.
In one aspect, the present disclosure provides a method of editing a nucleic acid molecule by guided editing. In some embodiments, the method comprises contacting the nucleic acid molecule with a guide editor, a pegRNA, and an inhibitor of the DNA mismatch repair pathway, thereby installing one or more modifications to the nucleic acid molecule at the target site.
The method may increase the efficiency of guided editing and/or decrease the frequency of indel formation. In some embodiments, the guided editing efficiency is increased by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold in the presence of an inhibitor of the DNA mismatch repair pathway. In some embodiments, the frequency of indel formation is reduced by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold in the presence of an inhibitor of the DNA mismatch repair pathway.
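As a minimal sketch of how the fold thresholds above could be checked, the following Python computes a fold increase in editing efficiency and a fold reduction in indel frequency relative to a control without the inhibitor. The function names and the numeric inputs are hypothetical and purely illustrative; they are not data from the disclosure, and the ratio-based definition of "fold" is an assumption.

```python
def fold_increase(treated: float, control: float) -> float:
    """Fold increase of a value (e.g., % edited reads) over a control."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return treated / control


def fold_reduction(treated: float, control: float) -> float:
    """Fold reduction of a value (e.g., % indel reads) relative to a control."""
    if treated <= 0:
        raise ValueError("treated value must be positive")
    return control / treated


def meets_threshold(fold: float, threshold: float) -> bool:
    """True when a fold change satisfies an 'at least N-fold' criterion."""
    return fold >= threshold


# Hypothetical illustrative numbers only (not measured data).
efficiency_fold = fold_increase(treated=30.0, control=10.0)
indel_fold = fold_reduction(treated=1.0, control=4.0)

print(efficiency_fold, meets_threshold(efficiency_fold, 2.5))
print(indel_fold, meets_threshold(indel_fold, 5.0))
```

Expressing both quantities as ratios greater than one keeps the "at least N-fold" comparison the same for increases and reductions.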
In some embodiments, the inhibitor of the DNA mismatch repair pathway inhibits one or more proteins of the DNA mismatch repair pathway. In some embodiments, the one or more proteins are selected from the group consisting of: MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA. In certain embodiments, the one or more proteins is MLH1. In some embodiments, MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
The inhibitor used in the method may be an antibody, a small molecule, a small interfering RNA (siRNA), a small non-coding microRNA, or a dominant negative variant of an MMR protein that inhibits the activity of the wild-type MMR protein (e.g., a dominant negative variant of MLH1). In certain embodiments, the inhibitor is an antibody that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a small molecule that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In certain embodiments, the inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
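The percent sequence identity thresholds recited above can be illustrated with a short Python sketch. It assumes the two amino acid sequences have already been aligned to equal length (with "-" marking gaps) and defines identity as identical, non-gap columns divided by the alignment length. That is only one of several conventions in use, and the disclosure does not specify one here, so the function name, the convention, and the toy sequences are all assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    # Count columns where both residues are identical and neither is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy sequences for illustration only (not SEQ ID NO: 204).
identity = percent_identity("MKTAYIAK", "MKTAYLAK")
print(identity, identity >= 70.0)
```

A candidate sequence would fall within, for example, the "at least 85%" embodiment when the value returned for its alignment to the reference is 85.0 or higher.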
In some embodiments, the dominant negative variant is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
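The variant names listed above (a point substitution such as E34A, a C-terminal deletion of residues 754-756, and fragments such as 1-335 or 501-753) can be read as simple operations on a full-length sequence. The Python sketch below illustrates that reading under stated assumptions: residue numbering is 1-based, the full-length protein is taken to be 756 residues (inferred from the "756" in the variant names, not stated in this passage), and the sequence used is a synthetic placeholder, not SEQ ID NO: 204. All function names are hypothetical.

```python
def fragment(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive)."""
    return seq[start - 1:end]


def delete_range(seq: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive)."""
    return seq[:start - 1] + seq[end:]


def substitute(seq: str, position: int, expected: str, replacement: str) -> str:
    """Replace one residue, checking that the original residue is as expected."""
    if seq[position - 1] != expected:
        raise ValueError(f"residue {position} is not {expected}")
    return seq[:position - 1] + replacement + seq[position:]


# Synthetic placeholder: 756 residues with a glutamate (E) at position 34.
full_length = "G" * 33 + "E" + "G" * 722

e34a = substitute(full_length, 34, "E", "A")            # point substitution
delta_754_756 = delete_range(full_length, 754, 756)     # C-terminal deletion
n_terminal = fragment(full_length, 1, 335)              # N-terminal fragment
c_terminal = fragment(full_length, 501, 753)            # C-terminal fragment
combined = delete_range(e34a, 754, 756)                 # E34A plus deletion

print(len(e34a), len(delta_754_756), len(n_terminal), len(c_terminal))
```

The check inside `substitute` guards against applying a named substitution to a sequence whose numbering does not match the reference.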
The guide editor used in the methods of the present disclosure may comprise a plurality of components. In some embodiments, the guide editor comprises a napDNAbp and a polymerase. In some embodiments, the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactive Cas9 domain, or a Cas9 nickase domain, or a variant thereof. In certain embodiments, the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity. In certain embodiments, the napDNAbp comprises the amino acid sequence of any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax). In certain embodiments, the napDNAbp comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 37 (e.g., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2. In some embodiments, the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase. In some embodiments, the polymerase is a reverse transcriptase. In certain embodiments, the reverse transcriptase comprises the amino acid sequence of any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98.
The napDNAbp and the polymerase of the guide editor can be linked together to form a fusion protein. In some embodiments, the napDNAbp and the polymerase of the guide editor are linked by a linker to form a fusion protein. In certain embodiments, the linker comprises the amino acid sequence of any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131. In some embodiments, the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
The components used in the method (e.g., the guide editor, the pegRNA, and/or the inhibitor of the DNA mismatch repair pathway) may be encoded on a DNA vector. In some embodiments, the guide editor, the pegRNA, and the inhibitor of the DNA mismatch repair pathway are encoded on one or more DNA vectors. In certain embodiments, the one or more DNA vectors comprise an AAV or lentiviral DNA vector. In some embodiments, the AAV vector is serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
The guide editor used in the presently disclosed methods may be further linked to additional components. In some embodiments, the guide editor, as a fusion protein, is further linked to an inhibitor of the DNA mismatch repair pathway through a second linker. In certain embodiments, the second linker is a self-hydrolyzing linker. In certain embodiments, the second linker comprises the amino acid sequence of any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236. In some embodiments, the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
In some embodiments, the one or more modifications to the nucleic acid molecule installed at the target site include one or more transitions, one or more transversions, one or more insertions, one or more deletions, or one or more inversions. In certain embodiments, the one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A. In certain embodiments, the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T. In certain embodiments, the one or more modifications comprise changing (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair. In some embodiments, the one or more modifications comprise insertions or deletions of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
The methods of the present disclosure may be used to correct one or more disease-associated genes. In some embodiments, the one or more modifications comprise modifications to a disease-associated gene. In certain embodiments, the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes; cancer; and obesity. In certain embodiments, the disease-associated gene is associated with a monogenic disorder selected from the group consisting of: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell anemia; Smith-Lemli-Opitz syndrome; trinucleotide repeat disorders; prion diseases; and Tay-Sachs disease.
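The distinction between the four listed purine-to-purine or pyrimidine-to-pyrimidine changes and the eight listed purine/pyrimidine swaps follows from base chemistry, and can be sketched in Python as below. The function name is hypothetical; the purine (A, G) and pyrimidine (C, T) groupings are standard.

```python
from itertools import permutations

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# Enumerate all 12 possible single-base substitutions.
table = {
    (ref, alt): classify_substitution(ref, alt)
    for ref, alt in permutations("ACGT", 2)
}
transitions = sorted(k for k, v in table.items() if v == "transition")
transversions = sorted(k for k, v in table.items() if v == "transversion")
print(transitions)
print(len(transversions))
```

Enumerating every ordered pair of distinct bases gives four transitions and eight transversions, matching the twelve base-pair changes enumerated in the paragraph above.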
In another aspect, the present disclosure provides a composition for editing a nucleic acid molecule by guided editing. In some embodiments, the composition comprises a guide editor, a pegRNA, and an inhibitor of the DNA mismatch repair pathway, wherein the composition is capable of installing one or more modifications to a nucleic acid molecule at a target site.
The composition may increase the efficiency of guided editing and/or decrease the frequency of indel formation. In some embodiments, the guided editing efficiency is increased by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold in the presence of an inhibitor of the DNA mismatch repair pathway. In some embodiments, the frequency of indel formation is reduced by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold in the presence of an inhibitor of the DNA mismatch repair pathway.
In some embodiments, the inhibitor of the DNA mismatch repair pathway inhibits one or more proteins of the DNA mismatch repair pathway. In some embodiments, the one or more proteins are selected from the group consisting of: MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA. In certain embodiments, the one or more proteins is MLH1. In some embodiments, MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
The inhibitor used in the composition may be an antibody, a small molecule, a small interfering RNA (siRNA), a small non-coding microRNA, or a dominant negative variant of an MMR protein that inhibits the activity of the wild-type MMR protein (e.g., a dominant negative variant of MLH1). In certain embodiments, the inhibitor is an antibody that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a small molecule that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In certain embodiments, the inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
In some embodiments, the dominant negative variant is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
The guide editor used in the compositions of the present disclosure may comprise a plurality of components. In some embodiments, the guide editor comprises a napDNAbp and a polymerase. In some embodiments, the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactive Cas9 domain, or a Cas9 nickase domain, or a variant thereof. In certain embodiments, the napDNAbp is selected from the group consisting of: Cas9, Cas12a, Cas12b1, Cas12b2, Cas12c, Cas12d, Cas12e, Cas12f (Cas14), Cas12f1, Cas12g, Cas12h, Cas12i, Cas12j (CasΦ), Cas13a, and Argonaute, and optionally has nickase activity. In certain embodiments, the napDNAbp comprises any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax). In certain embodiments, the napDNAbp comprises SEQ ID NO: 2 or SEQ ID NO: 37 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2 or SEQ ID NO: 37. In some embodiments, the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase. In some embodiments, the polymerase is a reverse transcriptase. In certain embodiments, the reverse transcriptase comprises any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98.
The napDNAbp and the polymerase of the guide editor can be linked together to form a fusion protein. In some embodiments, the napDNAbp and the polymerase of the guide editor are linked by a linker to form a fusion protein. In certain embodiments, the linker comprises any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131. In some embodiments, the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
The components used in the compositions disclosed herein (e.g., the guide editor, the pegRNA, and/or the inhibitor of the DNA mismatch repair pathway) may be encoded on a DNA vector. In some embodiments, the guide editor, the pegRNA, and the inhibitor of the DNA mismatch repair pathway are encoded on one or more DNA vectors. In certain embodiments, the one or more DNA vectors comprise an AAV or lentiviral DNA vector. In some embodiments, the AAV vector is serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
The guide editor used in the presently disclosed compositions may be further linked to additional components. In some embodiments, the guide editor, as a fusion protein, is further linked to the inhibitor of the DNA mismatch repair pathway through a second linker. In certain embodiments, the second linker is a self-hydrolyzing linker. In certain embodiments, the second linker comprises any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236. In some embodiments, the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
In some embodiments, the one or more modifications to the nucleic acid molecule installed at the target site include one or more transitions, one or more transversions, one or more insertions, one or more deletions, or one or more inversions. In certain embodiments, the one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A. In certain embodiments, the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T. In certain embodiments, the one or more modifications comprise changing (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair. In some embodiments, the one or more modifications comprise insertions or deletions of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
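The substitution classes listed in the preceding paragraph follow from standard base chemistry: a transition swaps a purine for a purine or a pyrimidine for a pyrimidine, and a transversion swaps between the two classes. The following is a minimal sketch of that classification and of the corresponding base-pair notation; the function names are hypothetical and are not part of the disclosure.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base DNA change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and edited base are identical")
    pair = {ref, alt}
    # Same chemical class on both sides means a transition.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def base_pair_change(ref: str, alt: str) -> str:
    """Express a single-base change as a base-pair change, e.g. 'G:C to T:A'."""
    ref, alt = ref.upper(), alt.upper()
    return f"{ref}:{COMPLEMENT[ref]} to {alt}:{COMPLEMENT[alt]}"


# Enumerate all twelve possible single-base substitutions.
ALL_SUBSTITUTIONS = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]

for ref, alt in ALL_SUBSTITUTIONS:
    print(f"{ref} to {alt}: {classify_substitution(ref, alt)} "
          f"({base_pair_change(ref, alt)})")
```

Enumerating every ordered pair of distinct bases yields four transitions and eight transversions, matching the twelve base-pair changes listed in the text.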
The compositions of the present disclosure may be used to correct one or more disease-associated genes. In some embodiments, the one or more modifications comprise modifications to a disease-associated gene. In certain embodiments, the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes; cancer; and obesity. In certain embodiments, the disease-associated gene is associated with a monogenic disorder selected from the group consisting of: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell anemia; Smith-Lemli-Opitz syndrome; trinucleotide repeat disorders; prion diseases; and Tay-Sachs disease.
In another aspect, the present disclosure provides polynucleotides for editing a DNA target site by guided editing. In some embodiments, the polynucleotide comprises a nucleic acid sequence encoding a napDNAbp, a polymerase, and an inhibitor of the DNA mismatch repair pathway, wherein the napDNAbp and the polymerase are capable of installing one or more modifications in a DNA target site in the presence of the pegRNA.
Polynucleotides may increase the efficiency of guided editing and/or decrease the frequency of indels formation. In some embodiments, the guided editing efficiency is increased by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold in the presence of an inhibitor of the DNA mismatch repair pathway. In some embodiments, the frequency of indel formation is reduced by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold in the presence of an inhibitor of the DNA mismatch repair pathway.
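The fold changes recited above are simple ratios between a measurement made with the inhibitor and the same measurement made without it. The sketch below shows the arithmetic under that assumption; the function names and the example percentages are hypothetical and are not data from the disclosure.

```python
# Thresholds recited in the text: 1.5-fold to 10.0-fold in 0.5-fold steps.
FOLD_THRESHOLDS = [1.5 + 0.5 * i for i in range(18)]


def fold_increase(with_inhibitor: float, without_inhibitor: float) -> float:
    """Fold increase in editing efficiency (with inhibitor / without)."""
    if without_inhibitor <= 0:
        raise ValueError("control efficiency must be positive")
    return with_inhibitor / without_inhibitor


def fold_reduction(with_inhibitor: float, without_inhibitor: float) -> float:
    """Fold reduction in indel frequency (without inhibitor / with)."""
    if with_inhibitor <= 0:
        raise ValueError("indel frequency with inhibitor must be positive")
    return without_inhibitor / with_inhibitor


def highest_threshold_met(fold: float, thresholds=FOLD_THRESHOLDS):
    """Largest recited 'at least N-fold' threshold satisfied, or None."""
    met = [t for t in thresholds if fold >= t]
    return max(met) if met else None


# Hypothetical measurements, in percent of sequencing reads.
editing_fold = fold_increase(with_inhibitor=30.0, without_inhibitor=10.0)
indel_fold = fold_reduction(with_inhibitor=1.0, without_inhibitor=4.0)
print(editing_fold, highest_threshold_met(editing_fold))  # 3.0 3.0
print(indel_fold, highest_threshold_met(indel_fold))      # 4.0 4.0
```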
In some embodiments, the inhibitor of the DNA mismatch repair pathway inhibits one or more proteins of the DNA mismatch repair pathway. In some embodiments, the one or more proteins are selected from the group consisting of: MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ. In certain embodiments, the one or more proteins is MLH1. In some embodiments, MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
The inhibitor used in the polynucleotide may be an antibody, a small molecule, a small interfering RNA (siRNA), a small non-coding microRNA, or a dominant negative variant of an MMR protein that inhibits the activity of the wild-type MMR protein (e.g., a dominant negative variant of MLH1). In certain embodiments, the inhibitor is an antibody that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a small molecule that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In certain embodiments, the inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
In some embodiments, the dominant negative variant is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
The guide editor used in the polynucleotides of the present disclosure may comprise a plurality of components. In some embodiments, the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactive Cas9 domain, or a Cas9 nickase domain, or a variant thereof. In certain embodiments, the napDNAbp is selected from the group consisting of: Cas9, Cas12a, Cas12b1, Cas12b2, Cas12c, Cas12d, Cas12e, Cas12f (Cas14), Cas12f1, Cas12g, Cas12h, Cas12i, Cas12j (CasΦ), Cas13a, and Argonaute, and optionally has nickase activity. In certain embodiments, the napDNAbp comprises any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax). In certain embodiments, the napDNAbp comprises SEQ ID NO: 2 or SEQ ID NO: 37 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2 or SEQ ID NO: 37. In some embodiments, the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase. In some embodiments, the polymerase is a reverse transcriptase. In certain embodiments, the reverse transcriptase comprises any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98.
The napDNAbp and the polymerase of the guide editor can be linked together to form a fusion protein. In some embodiments, the napDNAbp and the polymerase of the guide editor are linked by a linker to form a fusion protein. In certain embodiments, the linker comprises any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131. In some embodiments, the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
The polynucleotides disclosed herein may comprise a vector. In some embodiments, the polynucleotide is a DNA vector. In certain embodiments, the DNA vector is an AAV or lentiviral DNA vector. In some embodiments, the AAV vector is serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
The guide editor used in the presently disclosed polynucleotides may be further linked to additional components. In some embodiments, the guide editor, as a fusion protein, is further linked to the inhibitor of the DNA mismatch repair pathway through a second linker. In certain embodiments, the second linker is a self-hydrolyzing linker. In certain embodiments, the second linker comprises any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236. In some embodiments, the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
In some embodiments, the one or more modifications to the nucleic acid molecule installed at the target site include one or more transitions, one or more transversions, one or more insertions, one or more deletions, or one or more inversions. In certain embodiments, the one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A. In certain embodiments, the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T. In certain embodiments, the one or more modifications comprise changing (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair. In some embodiments, the one or more modifications comprise insertions or deletions of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
The polynucleotides of the present disclosure may be used to correct one or more disease-associated genes. In some embodiments, the one or more modifications comprise modifications to a disease-associated gene. In certain embodiments, the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes; cancer; and obesity. In certain embodiments, the disease-associated gene is associated with a monogenic disorder selected from the group consisting of: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell anemia; Smith-Lemli-Opitz syndrome; trinucleotide repeat disorders; prion diseases; and Tay-Sachs disease.
In another aspect, the present disclosure provides a cell. In some embodiments, the cell comprises any of the polynucleotides described herein.
In another aspect, the present disclosure provides a pharmaceutical composition. In some embodiments, the pharmaceutical composition comprises any of the compositions disclosed herein. In some embodiments, the pharmaceutical composition comprises any of the compositions disclosed herein and a pharmaceutically acceptable excipient. In some embodiments, the pharmaceutical composition comprises any of the polynucleotides disclosed herein. In some embodiments, the pharmaceutical composition comprises any of the polynucleotides disclosed herein and a pharmaceutically acceptable excipient.
In another aspect, the present disclosure provides a kit. In some embodiments, the kit comprises any of the compositions disclosed herein, a pharmaceutical excipient, and instructions for editing the DNA target site by directed editing. In some embodiments, the kit comprises any of the polynucleotides disclosed herein, pharmaceutical excipients, and instructions for editing the DNA target site by directed editing.
The present disclosure also provides methods and pegRNAs for guided editing that avoid correction by the MMR pathway of changes introduced into a target nucleic acid molecule, without the need to provide an inhibitor of the MMR pathway. Surprisingly, a pegRNA designed with consecutive nucleotide mismatches (e.g., three or more consecutive mismatched nucleotides) relative to the endogenous sequence of the target site on the target nucleic acid can evade correction by the MMR pathway, resulting in an increase in guided editing efficiency and a decrease in the frequency of indel formation compared to the introduction of a single nucleotide mismatch by guided editing. Furthermore, an insertion or deletion of consecutive nucleotides introduced at the target site of a target nucleic acid by guided editing, for example an insertion or deletion longer than 10 nucleotides in length, will also evade correction by the MMR pathway, resulting in an increase in guided editing efficiency and a decrease in the frequency of indel formation compared to an insertion or deletion shorter than 10 nucleotides in length introduced by guided editing.
Thus, in another aspect, the present disclosure provides a method of editing a nucleic acid molecule by guided editing, comprising contacting the nucleic acid molecule with a guide editor (e.g., PE2, PE3, or any of the other guide editors described herein) and a pegRNA having a DNA synthesis template on its extension arm, the template comprising three or more consecutive nucleotide mismatches relative to the endogenous sequence of a target site on the nucleic acid molecule. In some embodiments, at least one of the consecutive nucleotide mismatches results in a change in the amino acid sequence of a protein expressed from the nucleic acid molecule, while at least one of the remaining nucleotide mismatches is a silent mutation. The silent mutation may be in a coding region of the target nucleic acid molecule (i.e., in a portion of a gene encoding a protein), or the silent mutation may be in a non-coding region of the target nucleic acid molecule. In some embodiments, when the silent mutation is located in the coding region, the silent mutation introduces into the nucleic acid molecule one or more alternative codons that encode the same amino acid as the unedited nucleic acid molecule. In some embodiments, when the silent mutation is located in a non-coding region, the silent mutation is present in a region of the nucleic acid molecule that does not affect splicing, gene regulation, RNA lifetime, or other biological properties of the target site on the nucleic acid molecule.
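Whether a given mismatch in a coding region is silent can be checked by translating the affected codons before and after the edit. The sketch below assumes the standard genetic code, an in-frame coding-strand DNA sequence, and equal-length original and edited sequences (substitutions only); the function name and the six-nucleotide example are hypothetical and are not sequences from the disclosure.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_edits(original_cds: str, edited_cds: str):
    """List (codon_index, original_aa, edited_aa, label) for each changed codon.

    The label is 'silent' when the edited codon encodes the same amino acid
    and 'amino acid change' otherwise.
    """
    original_cds, edited_cds = original_cds.upper(), edited_cds.upper()
    if len(original_cds) != len(edited_cds) or len(original_cds) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    results = []
    for i in range(0, len(original_cds), 3):
        before, after = original_cds[i:i + 3], edited_cds[i:i + 3]
        if before == after:
            continue  # codon untouched by the edit
        aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
        label = "silent" if aa_before == aa_after else "amino acid change"
        results.append((i // 3, aa_before, aa_after, label))
    return results


# Hypothetical example: three consecutive mismatches at positions 2-4.
# GAA CTG (Glu-Leu) becomes GCG TTG (Ala-Leu).
print(classify_codon_edits("GAACTG", "GCGTTG"))
# [(0, 'E', 'A', 'amino acid change'), (1, 'L', 'L', 'silent')]
```

In this toy case the first two mismatches install the intended amino acid change and the third mismatch, in the adjacent codon, is silent, which is the arrangement the paragraph above describes.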
Any number of consecutive nucleotide mismatches can be designed in the DNA synthesis template of the pegRNA compared to the target site sequence to achieve the benefit of evading MMR pathway correction, thereby increasing guided editing efficiency and/or reducing indel formation. In some embodiments, the DNA synthesis template comprises at least three consecutive nucleotide mismatches compared to the sequence of the target site. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 consecutive nucleotide mismatches relative to the endogenous sequence of the target site in the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule. In certain embodiments, the use of three or more consecutive nucleotide mismatches results in at least a 1.5-fold, at least a 2.0-fold, at least a 2.5-fold, at least a 3.0-fold, at least a 3.5-fold, at least a 4.0-fold, at least a 4.5-fold, at least a 5.0-fold, at least a 5.5-fold, at least a 6.0-fold, at least a 6.5-fold, at least a 7.0-fold, at least a 7.5-fold, at least a 8.0-fold, at least a 8.5-fold, at least a 9.0-fold, at least a 9.5-fold, or at least a 10.0-fold increase in guided editing efficiency relative to methods that use a template for DNA synthesis that comprises only one consecutive nucleotide mismatch relative to an endogenous sequence of a target site on a nucleic acid molecule. 
In certain embodiments, the use of three or more consecutive nucleotide mismatches results in a reduction in the frequency of indel formation by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, or at least 10.0-fold relative to a method using a pegRNA that comprises a DNA synthesis template comprising only one consecutive nucleotide mismatch relative to the endogenous sequence of the target site on the nucleic acid molecule.
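The number of consecutive mismatches that a planned edit introduces can be counted by comparing the endogenous target-site sequence with the intended edited sequence, position by position. The sketch below assumes both sequences are given as DNA on the same strand and have equal length (substitutions only, no insertions or deletions); the function names and example sequences are hypothetical.

```python
def mismatch_runs(endogenous: str, edited: str):
    """Return (start_index, length) for each run of consecutive mismatches."""
    endogenous, edited = endogenous.upper(), edited.upper()
    if len(endogenous) != len(edited):
        raise ValueError("sequences must have equal length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(endogenous, edited)):
        if a != b:
            if start is None:
                start = i  # a new run of mismatches begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # run extends to the end of the sequence
        runs.append((start, len(endogenous) - start))
    return runs


def longest_mismatch_run(endogenous: str, edited: str) -> int:
    """Length of the longest run of consecutive mismatches (0 if none)."""
    return max((length for _, length in mismatch_runs(endogenous, edited)),
               default=0)


# Hypothetical example: one run of three consecutive mismatches.
print(mismatch_runs("GAACTG", "GCGTTG"))         # [(1, 3)]
print(longest_mismatch_run("GAACTG", "GCGTTG"))  # 3
```

A design check along these lines could flag templates whose longest run falls below three, the minimum run length recited in the text.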
In another aspect, the present disclosure provides a method of editing a nucleic acid molecule by guided editing comprising contacting the nucleic acid molecule with a guided editor (e.g., PE2, PE3, or any of the other guided editors described herein) and a pegRNA having on its extension arm a DNA synthesis template comprising an insertion or deletion of 10 or more consecutive nucleotides relative to an endogenous sequence of a target site on the nucleic acid molecule. In some embodiments, the DNA synthesis template of pegRNA may be designed to introduce insertions or deletions of greater than 3 nucleotides to avoid or reduce the effects of mismatch correction of the cellular MMR pathway, thereby increasing guided editing efficiency. In some embodiments, the DNA synthesis template of the pegRNA is designed to introduce one or more insertions and/or deletions of 3, 4, 5, 6, 7, 8, 9, 10 or more consecutive nucleotides to avoid or reduce the effects of mismatch correction of the cellular MMR pathway, thereby improving guided editing efficiency. In some embodiments, any length of insertion or deletion of greater than 10 consecutive nucleotides may be used to achieve the benefit of escaping correction of the MMR pathway. In some embodiments, the DNA synthesis template comprises an insertion of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 consecutive nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule edited by directed editing. In some embodiments, the DNA synthesis template comprises a deletion of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 consecutive nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule edited by directed editing. 
In some embodiments, the DNA synthesis template comprises an insertion of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 consecutive nucleotides relative to an endogenous sequence of a target site on a nucleic acid molecule edited by directed editing. In some embodiments, the DNA synthesis template comprises a deletion of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 consecutive nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template comprises insertions and deletions of 11 or more consecutive nucleotides, 12 or more consecutive nucleotides, 13 or more consecutive nucleotides, 14 or more consecutive nucleotides, 15 or more consecutive nucleotides, 16 or more consecutive nucleotides, 17 or more consecutive nucleotides, 18 or more consecutive nucleotides, 19 or more consecutive nucleotides, 20 or more consecutive nucleotides, 21 or more consecutive nucleotides, 22 or more consecutive nucleotides, 23 or more consecutive nucleotides, 24 or more consecutive nucleotides, or 25 or more consecutive nucleotides relative to the target site on the nucleic acid molecule. In certain embodiments, the DNA synthesis template comprises insertions or deletions of 15 or more consecutive nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule.
In some embodiments, guided editing using a pegRNA designed to introduce insertions and/or deletions of multiple consecutive nucleotides (e.g., three or more consecutive nucleotides) relative to an endogenous sequence of a target site results in a guided editing efficiency that is increased by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, or at least 10.0-fold relative to a guided editing using a corresponding control pegRNA (e.g., a control pegRNA that does not introduce insertions or deletions of three or more consecutive nucleotides). In some embodiments, guided editing using a pegRNA designed to introduce insertions and/or deletions of 3, 4, 5, 6, 7, 8, 9, 10, or more consecutive nucleotides relative to an endogenous sequence of a target site results in a guided editing efficiency that is increased by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, or at least 10.0-fold relative to a corresponding control pegRNA (e.g., a control pegRNA that does not introduce insertions or deletions of three or more consecutive nucleotides). 
In some embodiments, performing an insertion or deletion of 10 or more consecutive nucleotides results in at least a 1.5-fold, at least a 2.0-fold, at least a 2.5-fold, at least a 3.0-fold, at least a 3.5-fold, at least a 4.0-fold, at least a 4.5-fold, at least a 5.0-fold, at least a 5.5-fold, at least a 6.0-fold, at least a 6.5-fold, at least a 7.0-fold, at least a 7.5-fold, at least an 8.0-fold, at least an 8.5-fold, at least a 9.0-fold, at least a 9.5-fold, or at least a 10.0-fold increase in guided editing efficiency relative to a method using a pegRNA that comprises a DNA synthesis template comprising an insertion or deletion of less than 10 nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule. In some embodiments, performing an insertion or deletion of 10 or more nucleotides results in a reduction in the frequency of indel formation by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, or at least 10.0-fold relative to a method using a pegRNA that comprises a DNA synthesis template comprising an insertion or deletion of less than 10 nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule.
In another aspect, the present disclosure also provides pegRNA useful for editing nucleic acid molecules by guided editing while avoiding correction of changes introduced into the nucleic acid molecule by the MMR pathway, thereby increasing guided editing efficiency and/or reducing indel formation. In some embodiments, the extension arm of a pegRNA provided by the present disclosure comprises three or more consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule. In some embodiments, at least one of the three consecutive nucleotide mismatches relative to the endogenous sequence of the target site is a silent mutation. In some embodiments, at least one consecutive nucleotide mismatch results in a change in the amino acid sequence of a protein expressed from the target nucleic acid molecule, while at least one remaining nucleotide mismatch is a silent mutation. The silent mutation may be in a coding region of the target nucleic acid molecule (i.e., in a portion of a gene encoding a protein), or the silent mutation may be in a non-coding region of the target nucleic acid molecule. In some embodiments, when the silent mutation is located in the coding region, the silent mutation introduces one or more alternative codons that encode the same amino acid as the unedited nucleic acid molecule into the nucleic acid molecule. In some embodiments, when the silent mutation is located in a non-coding region, the silent mutation is present in a region of the nucleic acid molecule that does not affect splicing, gene regulation, RNA lifetime, or other biological property of a target site on the nucleic acid molecule.
Any number of three or more consecutive nucleotide mismatches may be incorporated into the extension arm of the pegRNA described herein to achieve the benefit of escaping correction of the MMR pathway. In some embodiments, the DNA synthesis template of the extension arm of the pegRNA comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm of the pegRNA comprises at least three consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm of the pegRNA comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm of the pegRNA comprises 3, 4, 5, 6, 7, 8, 9, or 10 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. 
In certain embodiments, the presence of three or more consecutive nucleotide mismatches on the extension arm of the pegRNA results in at least a 1.5-fold, at least a 2.0-fold, at least a 2.5-fold, at least a 3.0-fold, at least a 3.5-fold, at least a 4.0-fold, at least a 4.5-fold, at least a 5.0-fold, at least a 5.5-fold, at least a 6.0-fold, at least a 6.5-fold, at least a 7.0-fold, at least a 7.5-fold, at least an 8.0-fold, at least an 8.5-fold, at least a 9.0-fold, at least a 9.5-fold, or at least a 10.0-fold increase in guided editing efficiency relative to a pegRNA comprising a DNA synthesis template comprising only one consecutive nucleotide mismatch relative to the endogenous sequence of the target site on the nucleic acid molecule. In certain embodiments, the use of three or more consecutive nucleotide mismatches results in a reduction in the frequency of indel formation by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, or at least 10.0-fold relative to a pegRNA comprising a DNA synthesis template comprising only one consecutive nucleotide mismatch relative to the endogenous sequence of the target site on the nucleic acid molecule.
In another aspect, the present disclosure provides a guided editor system for site-specific genomic modification comprising (a) a guided editor comprising (i) a nucleic acid programmable DNA binding protein (napDNAbp) and (ii) a DNA polymerase, and (b) an inhibitor of a DNA mismatch repair pathway. In some embodiments, the inhibitor of the DNA mismatch repair pathway inhibits one or more proteins of the DNA mismatch repair pathway (e.g., MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and/or POLδ). In some embodiments, the one or more proteins is MLH1. In certain embodiments, MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
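For illustration, a percent sequence identity of the kind recited above can be computed from a pairwise alignment. The sketch below is a simplified assumption, not the method of the disclosure: it takes two sequences that have already been aligned (gaps written as "-") and divides the number of identical aligned residues by the number of alignment columns. Other conventions for the denominator exist, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a precomputed pairwise alignment.

    Assumptions: gaps are '-', both strings have the same length, and
    identity = identical residues / alignment columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")

    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical five-residue fragments with a single substitution.
identity = percent_identity("MKVLA", "MKILA")
print(identity, identity >= 70.0)
```

A variant would satisfy, for example, the "at least 70%" threshold when the value computed against the reference sequence is 70.0 or greater under the chosen convention.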
Any inhibitor of the DNA mismatch repair pathway may be used in the systems described herein. In some embodiments, the inhibitor is an antibody that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a small molecule that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits the activity of one or more proteins of the DNA mismatch repair pathway. In some embodiments, the inhibitor is a dominant negative variant of an MMR protein that inhibits the activity of the wild-type MMR protein (e.g., a dominant negative variant of MLH1 that inhibits MLH1).
In certain embodiments, the dominant negative variant used in the systems of the present disclosure is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223. The present disclosure also contemplates methods for guided editing of nucleic acid molecules in cells in which MMR activity is completely knocked out (e.g., by knocking out one or more genes in the genome of the cell that are involved in the MMR pathway). Such methods provide the benefits of inhibiting MMR (e.g., increased editing efficiency and reduced indel formation) without the need to provide an MMR inhibitor. Thus, in another aspect, the present disclosure provides a method of editing a nucleic acid molecule by guided editing, comprising: contacting the nucleic acid molecule with a guided editor and a pegRNA, thereby installing one or more modifications to the nucleic acid molecule at the target site, wherein the nucleic acid molecule is located in a cell comprising a knockout of one or more genes involved in the DNA mismatch repair (MMR) pathway. In some embodiments, the method further comprises contacting the nucleic acid molecule with a second-strand nicking gRNA.
In certain embodiments, the efficiency of the guided editing is increased by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold relative to a method performed in a cell that does not comprise a knockout of one or more genes involved in MMR. In certain embodiments, the frequency of indel formation is reduced by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold relative to a method performed in a cell that does not comprise a knockout of one or more genes involved in MMR. In some embodiments, the one or more genes involved in MMR are selected from genes encoding: MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ. In certain embodiments, the one or more genes are genes encoding MLH1 (e.g., comprising the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204).
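By way of a worked example, the fold values recited above can be read as simple ratios between a test condition and a reference condition. The sketch below assumes that a fold increase in editing efficiency is the test frequency divided by the reference frequency, and that a fold reduction in indel frequency is the reference frequency divided by the test frequency. The function names and all numbers are hypothetical illustrations, not data from the disclosure.

```python
def fold_increase(test: float, reference: float) -> float:
    """Fold increase of a test value over a reference value."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return test / reference


def fold_reduction(test: float, reference: float) -> float:
    """Fold reduction of a test value relative to a reference value."""
    if test <= 0:
        raise ValueError("test must be positive")
    return reference / test


# Hypothetical percentages of sequencing reads:
# intended edit: 10% in reference cells, 30% in MMR-knockout cells
# indels:         8% in reference cells,  2% in MMR-knockout cells
editing_fold = fold_increase(test=30.0, reference=10.0)
indel_fold = fold_reduction(test=2.0, reference=8.0)

print(editing_fold, editing_fold >= 2.5)
print(indel_fold, indel_fold >= 4.0)
```

With these invented numbers, the example would correspond to an "at least 2.5 fold" increase in editing efficiency and an "at least 4.0 fold" reduction in indel frequency.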
In another aspect, the present disclosure provides a method of editing a nucleic acid molecule by guided editing, comprising: contacting the nucleic acid molecule with a guided editor, a pegRNA, and a p53 inhibitor, thereby installing one or more modifications to the nucleic acid molecule at the target site. In some embodiments, the method further comprises contacting the nucleic acid molecule with a second-strand nicking gRNA.
In some embodiments, in the presence of a p53 inhibitor, the guided editing efficiency is increased by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
In some embodiments, in the presence of a p53 inhibitor, the frequency of indel formation is reduced by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, at least 10.0 fold, at least 11 fold, at least 12 fold, at least 13 fold, at least 14 fold, at least 15 fold, at least 16 fold, at least 17 fold, at least 18 fold, at least 19 fold, at least 20 fold, at least 21 fold, at least 22 fold, at least 23 fold, at least 24 fold, at least 25 fold, at least 26 fold, at least 27 fold, at least 28 fold, at least 29 fold, at least 30 fold, at least 31 fold, at least 32 fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
In some embodiments, the p53 inhibitor is a protein. In certain embodiments, the p53 inhibitor is the protein i53. In some embodiments, the p53 inhibitor is an antibody that inhibits p53 activity. In some embodiments, the p53 inhibitor is a small molecule that inhibits p53 activity. In some embodiments, the p53 inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits p53 activity.
In another aspect, the present disclosure describes an improved guided editor fusion protein comprising PEmax of SEQ ID NO: 99. The present disclosure also contemplates amino acid sequences having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100% sequence identity to SEQ ID NO: 99.
The inventors have surprisingly found that when one or more components of a typical guided editor fusion protein (i.e., PE2) are modified, the editing efficiency of the guided editing can be significantly increased (e.g., a 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold or greater increase). Modifications may include modified amino acid sequences of one or more components (e.g., the Cas9 component, the reverse transcriptase component, or the linker).
In other aspects, the disclosure also provides compositions and pharmaceutical compositions comprising PEmax, guided editing methods using PEmax, polynucleotides and vectors encoding PEmax, and kits and cells comprising PEmax.
It should be appreciated that the foregoing concepts and additional concepts discussed below may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the drawings.
Drawings
The following drawings form a part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which may be better understood by reference to one or more of these drawings in combination with the detailed description of the specific embodiments presented herein.
FIG. 1 provides a schematic diagram showing that guided editing enables RNA-templated genome manipulation. DNA guided editing intermediates that can be repaired by cellular factors are shown in boxes.
FIG. 2 provides a schematic diagram of a DNA repair CRISPRi screen for guided editing outcomes.
Figures 3A-3C show the optimization of guided editing efficiency at the target site. Fig. 3A provides a schematic illustration of the optimization process. FIG. 3B shows the percentage of reads with the indicated modifications at the target sites in HeLa cells. Figure 3C shows the percentage of reads with the indicated modifications at the blasticidin selected target sites in HeLa cells.
FIGS. 4A-4B show guided editing CRISPRi screens with DNA repair libraries. Fig. 4A provides a schematic illustration of the screening process. Fig. 4B shows the percentage of reads with the indicated modifications in bulk editing of HeLa cells after screening.
FIGS. 5A-5B show that CRISPRi screening reveals that DNA mismatch repair limits guided editing efficiency. Knocking down mismatch repair proteins (MSH2, MSH6, PMS2 and MLH1) can increase the efficiency of PE2 by 3-fold and of PE3 by 2-fold.
Figures 6A-6C show that siRNA knockdown of MMR improves guided editing in HEK293T cells. Editing results for multiple endogenous loci validated the results of the CRISPRi screen.
Figures 7A-7B show that full MMR knockout significantly enhances guided editing. Without MMR, the PE2 editing efficiency is shown to match the PE3 editing efficiency.
FIG. 8A provides a schematic representation of the mismatch repair (MMR) mechanism. In the first step, MSH2:MSH6 (MutSα) binds to the mismatch and recruits MLH1:PMS2 (MutLα). The DNA nick signals to MMR which strand to repair. In the second step, MutLα indiscriminately incises the nicked strand 5' and 3' of the mismatch. In the third step, EXO1 excises the mismatch from the nick created by MutLα. In the fourth step, POLδ resynthesizes the excised strand and LIG1 ligates the nick.
FIG. 8B provides another schematic of the MMR mechanism in eukaryotic cells. The left side of the schematic depicts 5' MMR. (A) MutS homolog proteins (MSH, purple), MutSα (MSH2-MSH6) or MutSβ (MSH2-MSH3), recognize and bind the mismatch. Binding of RPA to single-stranded DNA prevents EXO1 from accessing and degrading the DNA. (B) In the sliding clamp model, mismatch-bound MutSα/β binds ATP and undergoes nucleotide switch activation to become a sliding clamp that diffuses along the DNA. Multiple MSH clamps are loaded at a single mismatch. Interaction of EXO1 with the MSH sliding clamp overcomes the RPA barrier and activates EXO1 for 5' to 3' excision from the 5' nick. MutL homolog proteins (MLH) (MutLα is ScMlh1-Pms1 or HsMLH1-PMS2) bind ATP and may interact with the MSH sliding clamp, but MLH is not absolutely required for 5' MMR in vitro. In other models, MSH remains at the mismatch to allow excision, or multiple MLH clamps can be loaded onto the DNA near the mismatch (not shown). (C) In the sliding clamp model, the EXO1/MSH complex dissociates after excision of hundreds of nucleotides. Iterative rounds of MSH-EXO1 excision produce an RPA-coated excision tract extending from the 5' nick to beyond the mismatch. MLH may limit excision by modulating the number of MSH clamps on the DNA. (D) RFC (not shown) loads PCNA clamps in a specific orientation at the 3' end of the strand break or gap, and PCNA promotes high-fidelity DNA synthesis by Pol δ or ε. (E) DNA ligase I seals the nick. The right side of the schematic depicts 3' MMR. (A) MSH recognizes the mismatch. (B) In the sliding clamp model, ATP-dependent binding and nucleotide switching produce an MSH sliding clamp that diffuses away from the mismatch. Interaction of the ATP-bound MLH heterodimer with the MSH sliding clamp and with PCNA oriented relative to the 3' terminus activates strand-specific nicking by MLH. Alternatively, ATP-activated MSH may remain at the mismatch to load MLH and activate nicking (not shown). (C) Excision is EXO1-dependent or EXO1-independent and results in an RPA-coated excision tract. The EXO1-independent Pol δ strand displacement pathway is not shown. (D) Gap filling by Pol δ or ε is carried out with the aid of PCNA. (E) DNA ligase I seals the nick.
FIGS. 9A-9C provide schematic diagrams of mismatch repair of PE2 intermediates. MMR inhibition provides additional time for flap ligation, which eliminates the strand discrimination signal for heteroduplex repair.
FIG. 10 shows that expression of dominant negative MLH1 mutants increases PE2 efficiency. The dominant negative MLH1 mutants increase PE2 efficiency by 2- to 4-fold. The RNF2 +3 G to C edit did not respond to MMR inhibition.
FIGS. 11A-11B show the effect of MLH1 mutants on PE 3. The MLH1 mutant reduced PE3 indels by half.
FIGS. 12A-12B show that the improvement from MLH1 mutants extends to other sites. FIG. 12A shows that PE2 editing efficiency increases with the MLH1 mutants, and that only the RNF2 +3 G to C edit is resistant to MMR inhibition. FIG. 12B shows that the MLH1 mutants reduce the incidence of indels by half.
FIG. 13 provides a schematic showing mismatch repair of PE3 intermediates.
FIG. 14 provides a schematic showing differential resolution of mismatch repair PE3 intermediates. Mismatch repair is required for an intermediate that facilitates editing.
FIGS. 15A-15H show screening of MLH1 mutants for smaller size and improved activity. FIG. 15A shows that MLH1 Δ754-756 (hereinafter MLH1dn) most strongly promotes PE2 editing. The MLH1 N-terminal domain approaches the effectiveness of MLH1dn (hereinafter referred to as MLH1dn NTD). The dominant negative MLH1 mutants may act through saturating binding of MutS. FIG. 15B shows that the activity of the MLH1 N-terminal domain + NLS approaches that of MLH1neg. FIG. 15C shows that MLH1dn can improve guided editing efficiency when fused to PE through a self-cleaving P2A linker (PE-2A-MLH1dn). FIGS. 15D-15F show that the MMR knockdown (KD) phenotype mirrors that of MLH1neg expression. FIGS. 15G-15H show that the efficiencies of PE2 and PE3 are equal in the absence of MMR, indicating that the complementary nick serves only to bias MMR.
FIG. 16 shows that MLH1dn reduces PE3 indels. A silent pegRNA is one that does not encode an edit or produce a mismatch. MLH1dn reduces PE3 indels only if a mismatch is generated.
FIG. 17 shows that mismatch repair of the PE heteroduplex results in a diffuse indel pattern. For these edits, the indel distribution of PE3 is broad, but inhibiting MMR with MLH1dn narrows the distribution. This suggests that MMR creates a nick after mismatch recognition, which helps PE3 create indels.
FIG. 18 shows mismatch repair of PE3 intermediates.
FIGS. 19A-19B show that MMR excision of the target locus generates an indel in PE 3.
FIGS. 20A-20B show that MMR knockdown or knockout has no effect on the RNF2 +3 G to C edit. This suggests either that the RNF2 site is not repaired by MMR or that the resulting C:C mismatch is not repaired by MMR.
FIGS. 21A-21C show that other substitution edits at RNF2 can be modified with MLH1 dn.
FIGS. 22A-22B show that MLH1dn improves substitution editing at other sites, including HEK 3. MLH1dn strongly enhanced PE2 editing and reduced PE3 indels.
FIGS. 23A-23D show that MLH1dn improves substitution editing at other sites, including FANCF. MLH1dn strongly enhanced PE2 editing and reduced PE3 indels.
FIGS. 24A-24B show that the improvement of PE by MLH1dn is mismatch dependent. MLH1dn increased PE2 editing 2-fold on average in HEK293T cells. FIG. 24A shows that G to C editing (C:C mismatch) in HEK293T cells is not affected by MMR. This suggests that G to C editing has a higher baseline efficiency than other substitutions. FIG. 24B shows that the edit:indel purity ratio increases significantly when MLH1dn is used with PE3, and that this increase is also mismatch dependent.
FIGS. 25A-25D show that MLH1dn also improves the efficiency of small insertion and deletion editing. MMR is known to repair insertions and deletions of < 15 nucleotides in length.
FIGS. 26A-26B show that MLH1dn reduces pegRNA scaffold integration. The scaffold integration events at these sites occur through double-strand break (DSB) intermediates.
FIG. 27 shows that MLH1dn does not promote substantial PE off-target editing. A small increase in off-target (OT) editing was observed at HEK4 off-target site 3.
FIGS. 28A-28B show that MLH1dn does not induce detectable microsatellite instability at biomarker loci. MMR inhibition is known to result in shortened homopolymer microsatellite regions.
FIG. 29 shows that MLH1dn provides a method to increase guided editing efficiency at sites where there is no good ngRNA (e.g., HEK 4).
FIG. 30 shows that MLH1dn improves PE at disease sites.
FIG. 31 shows that MLH1dn enhances installation of the protective APOE Christchurch allele in mouse astrocytes. A 50% improvement in editing efficiency and a substantial reduction in indels are shown.
FIG. 32 shows that HEK293T cells are MMR-compromised. The MLH1 promoter in HEK293T cells is highly methylated, resulting in reduced MLH1 expression.
FIGS. 33A-33B show that MLH1dn enhances guided editing in HeLa cells. FIG. 33A shows guided editing with PE2. FIG. 33B shows guided editing with PE3.
FIGS. 34A-34B show that MLH1dn enhances guided editing in HeLa cells. FIG. 34A shows the PRNP +6 G to T edit. FIG. 34B shows the APOE +6 G to T and +10 C to A edits.
FIGS. 35A-35B show that MLH1dn has a greater effect in MMR competent cell lines such as HeLa.
FIGS. 36A-36D show that the improvement from MLH1dn synergizes with stabilized pegRNAs.
FIGS. 37A-37B show that successive substitutions can be used as another strategy to evade MMR.
FIG. 38 shows that MMR is not able to efficiently repair 3 or more consecutive substitutions. Thus, continuous substitution provides a way to circumvent MMR and improve PE efficiency.
FIGS. 39A-39C show that MLH1neg improves PE in HeLa cells.
FIGS. 40A-40G show that pooled Repair-seq CRISPRi screening reveals substituted genetic determinants that guide editing results. Fig. 40A shows that guided editing with the PE2 system is mediated by the PE2 enzyme (streptococcus pyogenes Cas9 (SpCas 9) H840A nickase fused with reverse transcriptase) and the guided editing guide RNA (pegRNA). The PE3 system uses additional single guide RNA (sgRNA) to nick the non-edited strand and results in higher editing efficiency. PBS, primer binding sites. RT template, reverse transcription template. FIG. 40B provides an overview of the guided editing Repair-seq CRISPRi screening. The library of CRISPRi sgrnas and pre-validated guide editing sites were transduced into a CRISPRi cell line and transfected with a guide editor targeting the editing sites. The CRISPRi sgRNA identity and the guide editing site are amplified together from genomic DNA and paired-end sequencing is performed, correlating each genetic perturbation to the editing result. SaCas9, staphylococcus aureus Cas9. FIG. 40C shows the effect of each CRISPRi sgRNA on the percentage of sequencing reads reporting the expected G.C-to-C.G. guided editing at the targeted editing sites in the pooled CRISPRi screens. Each value describes all sequencing reads carrying the same CRISPRi sgrnas. Figure 40D shows the effect of CRISPRi sgrnas on editing efficiency under all screening conditions. The black dots represent individual non-targeted sgrnas, the black lines show the average of all non-targeted sgrnas, and the grey shading represents the nuclear density estimate of all sgrnas distribution. FIGS. 40E-40G show a comparison of the gene level effects of CRISPRi targeting on predicted G.C-to-C.G-directed editing under different screening conditions. (FIG. 40E) K562 PE2 and HeLa PE2. (FIG. 40F) K562 PE3+50 and HeLa PE3+50. (FIG. 40G) K562 PE2 and K562 PE3+50. 
The effect of each gene was calculated as the average log2 fold change in frequency, relative to the non-targeting sgRNAs, of the two most extreme sgRNAs targeting that gene. The values plotted are the mean of n=2 independent biological replicates per cell type, and the bars show the range of values spanned by the replicates. Black dots represent 20 random groups of three non-targeting sgRNAs.
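The gene-level calculation described for FIGS. 40E-40G can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the figures: the reference is taken as the mean outcome frequency across the non-targeting sgRNAs, "most extreme" is interpreted as the largest absolute log2 fold change, and the function name and all frequencies are hypothetical.

```python
import math


def gene_level_effect(targeting_freqs, nontargeting_freqs, n_extreme=2):
    """Average log2 fold change of the most extreme sgRNAs for one gene.

    Assumptions: frequencies are positive, the baseline is the mean of
    the non-targeting sgRNA frequencies, and 'extreme' means largest
    absolute log2 fold change.
    """
    baseline = sum(nontargeting_freqs) / len(nontargeting_freqs)

    # Log2 fold change of each targeting sgRNA relative to the baseline.
    log2_fc = [math.log2(f / baseline) for f in targeting_freqs]

    # Keep the n_extreme sgRNAs with the largest absolute effect.
    extreme = sorted(log2_fc, key=abs, reverse=True)[:n_extreme]
    return sum(extreme) / len(extreme)


# Hypothetical intended-edit frequencies (percent of reads):
# three sgRNAs targeting one gene versus two non-targeting sgRNAs.
effect = gene_level_effect([8.0, 16.0, 4.0], [4.0, 4.0])
print(effect)
```

In this invented example the three sgRNAs have log2 fold changes of 1, 2 and 0, so the two most extreme values average to 1.5; a positive value indicates that knockdown increased the frequency of the outcome.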
FIGS. 41A-41J show genetic modulators of unintended guided editing outcomes. FIGS. 41A-41D show representative examples of four types of unintended guided editing outcomes observed in the CRISPRi screens. In each figure, black bars represent the sequence of the editing outcome, blue bars represent the genomic sequence around the targeted editing site, and orange bars represent the pegRNA sequence. The blue and orange lines between the editing outcome and the genome or pegRNA depict the local alignment between the outcome sequence and the relevant reference sequence. Mismatches in the alignment are marked with an X and insertions are marked with a downward wavy line. The location of the programmed edit is marked with a grey box. Red and cyan rectangles mark the SaCas9 protospacer and PAM on the genome, and black vertical lines mark the position of the SaCas9 nick site. Orange, beige, grey and red rectangles on the pegRNA mark the primer binding site (PBS), reverse transcription template (RTT), scaffold and spacer, respectively. FIGS. 41E-41F provide a summary of the editing outcome categories observed in the PE2 screen (FIG. 41E) and the PE3+50 screen (FIG. 41F) in K562 cells. The values plotted are the mean ± SD of all sgRNAs for each indicated gene (60 non-targeting sgRNAs, three sgRNAs for each targeted gene), averaged over n=2 independent biological replicates. FIGS. 41G-41H show a comparison of the effect of knockdown of all genes targeted in the CRISPRi screens on the frequency of joining of reverse-transcribed sequence at unintended positions (FIG. 41G) or on the deletion frequency from PE3+50 (FIG. 41H). The effect of each gene was calculated as the average log2 fold change in frequency, relative to the non-targeting sgRNAs, of the two most extreme sgRNAs targeting that gene. The values plotted are the mean of n=2 independent biological replicates per cell type, and the bars show the range of values spanned by the replicates. 
Black dots represent 20 random groups of three non-targeted sgrnas. FIG. 41I shows the deletion frequency as a function of genomic position relative to the programmed PE3+50 cut (vertical dashed line) in K562 screen repeat 1 among all reads of a designated set of CRISPRi sgRNAs (black line: 60 non-targeted sgRNAs; orange and green lines: three sgRNAs targeting MSH2, MSH6, MLH1 and PMS2, respectively (upper). As compared to non-targeted sgRNAs, the fold change of Log2 as a function of genomic position from MSH2, MSH6, MLH1 and PMS2 sgRNAs (lower)). FIG. 41J shows the effect of gene knockdown on the proportion of all observed deletions that removed sequences at least 25-nt outside the programmed PE3+50 cut in the K562 screen. Each dot represents all reads of all sgrnas targeting each gene. Black dots represent 20 groups of three random non-targeted sgrnas.
FIGS. 42A-42D show models of mismatch repair of guided editing intermediates. FIG. 42A shows a DNA mismatch repair (MMR) model for PE2 intermediates. MMR excises and replaces the nicked strand during repair of the heteroduplex substrate generated by the guided editor. Infrequent ligation of the nick prior to MMR recognition deprives MMR of the strand discrimination signal, resulting in unbiased resolution of the heteroduplex. FIG. 42B shows the MMR model for PE3 intermediates. PE3 installs an additional nick on the non-edited strand that can direct MMR to replace the non-edited strand. Ligation of the nick on the edited strand leaves only the complementary-strand nick to signal repair by MMR, thereby producing the desired guided editing outcome. FIG. 42C shows the efficiency of PE2 and PE3 guided editing at endogenous sites (HEK3, EMX1 and RUNX1) in HEK293T cells pre-treated with siRNAs that knock down MSH2, MSH6, MLH1 or PMS2 transcripts. Cells were pre-transfected with siRNA 3 days prior to transfection with the guided editor components and siRNA. Genomic DNA was harvested 3 days after transfection with the guided editor and additional siRNA and then sequenced. Bars represent the mean of n=3 independent biological replicates. FIG. 42D shows the efficiency of guided editing (mean of n=3 independent biological replicates) in HAP1 ΔMSH2 and HAP1 ΔMLH1 cells. Δ, gene knockout.
FIGS. 43A-43G show the enhancement of guided editing by engineered dominant negative MMR proteins (dominant negative variants of MSH2, MSH6, PMS2 and MLH1). FIG. 43A shows the editing improvements at HEK2, EMX1 and RUNX1 sites from co-expression in trans of PE2 with human MMR proteins or dominant negative variants in HEK293T cells. MMR proteins include MSH2, MSH6, PMS2 and MLH1. The dominant negative variants are MSH2 K675R, MSH6 K1140R, PMS2 E41A, PMS2 E705K, MLH1 E34A and MLH1 Δ756. All values of n=3 independent biological replicates are shown. FIG. 43B shows the functional annotation of the 756-amino acid human MLH1 protein, including the ATPase domain, MSH2 interaction domain, NLS domain, PMS2 dimerization domain and endonuclease domain. FIG. 43C shows the editing enhancement from MLH1 variants co-expressed with PE2 at HEK3, EMX1 and RUNX1 sites in HEK293T cells. The red boxes indicate mutations that inactivate the ATPase or endonuclease function of MLH1. MLH1dn, MLH1 Δ754-756. MLH1NTD-NLS, codon-optimized MLH1 1-335-NLS SV40. All values of n=3 independent biological replicates are shown. FIG. 43D shows a comparison of the top three dominant negative MLH1 variants for additional guided edits. All values of n=3 independent biological replicates are shown. FIG. 43E shows guided editing in HEK293T cells with PE2 and MLH1dn in trans, PE2 and MLH1NTD-NLS in trans, and PE2-P2A-MLH1dn (human codon optimized). Bars represent the mean of n=3 independent biological replicates. FIG. 43F compares the structures of PE2, PE3, PE4, and PE5. In particular, the PE4 editing system consists of a guided editing enzyme (nickase Cas9-RT fusion), MLH1dn and a pegRNA. The PE5 editing system consists of a guided editing enzyme, MLH1dn, a pegRNA and a second-strand nicking sgRNA. FIG. 43G shows the editing efficiency of the PE2, PE3, PE4 and PE5 systems in HEK293T cells. Bars represent the mean of n=3 independent biological replicates.
FIGS. 44A-44G show characterization of PE4 and PE5 in different guided editing categories and cell types. FIG. 44A provides a summary of the pilot editing enhancement of 84 single base substitution edits (seven for each substitution type) between seven endogenous sites in HEK293T cells by PE4 and PE5 as compared to PE2 and PE 3. The total mean ± SD of all individual values for n=3 independent biological replicates is shown. FIG. 44B shows the installation of single base mutations at the FANCF locus with PE2, PE3, PE4 and PE5 in HEK293T cells. Bars represent the average of n=3 independent biological replicates. FIG. 44C shows that PE4 improved 1-and 3-bp insertion and deletion guide editing in HEK293T cells as compared to PE 2. Bars represent the average of n=3 independent biological replicates. FIG. 44D shows enhancement of PE4 edits relative to PE2 in 33 different insertion and deletion guide edits. Bars represent the average of all individual values of n=3 independent biological replicates. FIGS. 44E-44F provide a summary of the editing efficiency of 35 different substitutions of PE2 and PE4 of 1 to 5 consecutive bases at five endogenous sites in HEK293T cells. 7 pegRNAs were tested for each altered number of consecutive bases. Mean ± SD of all individual values of n=3 independent biological replicates are shown. FIGS. 44G-44H show that installing additional silent or benign mutations near the intended edit can increase edit efficiency by generating heteroduplex substrates that escape MMR. PAM sequences (NGG) for each target are underlined. The amino acid sequence of the targeted gene is centered above each DNA codon. Values represent the mean ± SD of n=3 independent biological replicates. FIG. 44I shows a comparison of guided editing enhancements in different cell types. PE4 and PE5 systems enhance guided editing to a greater extent in MMR deficient cells (MMR-) than in MMR sufficient cells (MMR+). 
The same set of 30 coding single base substitution edited pegRNAs was tested in HEK293T and HeLa cells. K562 and U2OS cells were edited with 10 pegrnas, which are a direct subset of 30 pegrnas tested in HEK293T and HeLa cells. Mean ± SD of all individual values of n=3 independent biological replicates are shown. The P value was calculated using the Mann-Whitney U test. FIG. 44J shows pilot editing with PE2, PE3, PE4 and PE5 in HeLa, K562 and U2OS cells. Bars represent the average of n=3 independent biological replicates.
FIGS. 45A-45H show the effect of dominant negative MLH1 on guided editing product purity and off-targeting. FIG. 45A shows that edit-encoding pegRNAs program base changes within the nascent 3' DNA flap and form a heteroduplex after flap interconversion. Non-editing pegRNAs template a 3' DNA flap that is perfectly complementary to the genomic target site. FIG. 45B shows the frequency of indels from PE3 or PE5 with four edit-encoding pegRNAs that program single-base mutations or four non-editing pegRNAs. The short horizontal bars represent the average of all individual values of n=3 independent biological replicates. FIG. 45C shows the ratio of the indel frequency of PE5 relative to PE3 for four edit-encoding pegRNAs that program single-base mutations or four non-editing pegRNAs. The short horizontal bars represent the average of all individual values of n=3 independent biological replicates. FIG. 45D shows the deletion profile at genomic target DNA formed from PE3 and PE5 using 12 substitution-encoding pegRNAs at the endogenous DNMT1 and RNF2 loci in HEK293T cells. The dashed lines indicate the positions of the pegRNA- and sgRNA-directed nicks. Data represent mean ± SD of n=3 independent biological replicates. FIG. 45E shows the PE5/PE3 ratio of the frequency of deletions that remove sequence more than 25 nt outside the pegRNA- and sgRNA-directed nicks in HEK293T cells. Each dot represents one of 84 total pegRNAs that program substitution edits at seven loci combined (average of n=3 independent biological replicates). FIG. 45F shows the PE5/PE3 ratio of the frequency of unintended pegRNA scaffold sequence incorporation or unintended flap rejoining in HEK293T cells. Each dot represents one of 84 total pegRNAs that program substitution edits at seven loci combined (average of n=3 independent biological replicates). FIG. 45G shows off-target guided editing by PE2 and PE4 in HEK293T cells.
Bars represent the average of n=3 independent biological replicates. FIG. 45H shows high-throughput sequencing analysis of 17 sensitive microsatellite repeat loci used for clinical diagnosis of MMR deficiency. HAP1 and HeLa cells are MMR-proficient, while HCT116 cells have impaired MMR. After MSH2 knockout, HAP1 ΔMSH2 cells underwent 60 cell divisions. HeLa cells were transiently transfected with PE2 or PE4 components and incubated for 3 days prior to sequencing. wt, wild type. All values of n=2 independent biological replicates are shown.
FIGS. 46A-46F show that the PEmax architecture with the PE4 and PE5 editing systems enhances editing of disease-associated gene targets and cell types. FIG. 46A shows a schematic diagram of the PE2 and PEmax editor architectures. bpNLSSV40, bipartite SV40 nuclear localization signal (NLS). MMLV RT, Moloney murine leukemia virus reverse transcriptase pentamutant; codon opt, human codon optimization. FIG. 46B shows that engineered pegRNAs (epegRNAs) contain 3' RNA structural motifs that improve guided editing performance. FIG. 46C shows the guided editing efficiency of PE4 and PE5 combined with the PEmax architecture and epegRNAs. Seven single-base substitution edits targeting different loci were tested in HeLa and HEK293T cells. Fold change represents the average of the fold increases for each tested edit. Mean ± SD of all individual values of n=3 independent biological replicates are shown. FIG. 46D shows guided editing at therapeutically relevant sites in wild-type HeLa and HEK293T cells. The HBB locus was edited at the E6 codon that is commonly mutated in sickle cell disease (E6V) patients. The CDKL5 edit was located at the site where the c.1412delA mutation causes CDKL5 deficiency. epegRNAs were used to edit the HBB, PRNP, and CDKL5 loci. Bars represent the average of n=3 independent biological replicates. FIG. 46E shows correction of CDKL5 c.1412delA via an A.T insertion and a silent G.C-to-A.T edit in iPSCs derived from a patient heterozygous for the allele. Editing efficiency represents the percentage of sequencing reads with c.1412delA correction among reads from editable alleles carrying the mutation. Indel frequency reflects all sequencing reads containing any indels. Bars represent the average of n=3 independent biological replicates. FIG. 46F shows guided editing in primary human T cells. Bars represent the average of n=3 independent biological replicates from different healthy T cell donors.
FIGS. 47A-47J show the design and results of Repair-seq screens for substitution guided editing outcomes. FIG. 47A shows the optimization of Staphylococcus aureus (Sa)-pegRNAs for installing a G.C-to-C.G edit within a lentivirally integrated HBB sequence using SaPE2 in HEK293T cells. PBS, primer binding site. Data represent the average of n=3 independent biological replicates. FIG. 47B shows the design of the guided editing Repair-seq lentiviral vector (pPC1000). In the Repair-seq screens, a 453-bp region containing the CRISPRi sgRNA sequence and the guided editing outcome was amplified from genomic DNA for paired-end Illumina sequencing. The CRISPRi sgRNA was sequenced using a 44-nt Illumina forward read (R1), and a 263-nt Illumina reverse read (R2) was used to sequence the guided editing site, including the +50 and -50 nick sites. The black triangles indicate the positions of the SaPE2-induced nicks programmed by the Sa-pegRNA and Sa-sgRNAs. The sizes of all vector components are drawn to scale. FIG. 47C shows a schematic of the PE2, PE3+50 and PE3-50 guided editing configurations using the SaPE2 protein (SaCas9 N580A fused to an engineered MMLV RT). FIG. 47D shows validation of the intended G.C-to-C.G edit at the lentivirally integrated Repair-seq editing site in HeLa cells expressing dCas9-BFP-KRAB. Bars represent the average of n=2 independent biological replicates. FIG. 47E shows guided editing at the Repair-seq editing site with and without blasticidin selection in HeLa cells expressing dCas9-BFP-KRAB. The SaPE2-P2A-BlastR guided editor was used in all conditions. Bars represent the average of n=2. FIG. 47F shows the functional annotation classes of the genes targeted by the pooled CRISPRi sgRNA library used in the Repair-seq screens. FIGS. 47G-47J show that knockdown of MSH2, MSH6, MLH1 and PMS2 increases the frequency of the intended +6 G.C-to-C.G guided edit in all Repair-seq screens. Each dot represents reads from a single CRISPRi sgRNA.
FIGS. 48A-48I show genetic modulators of unintended guided editing outcomes. FIG. 48A shows a summary of the PE3-50 outcomes in the HeLa CRISPRi screen. TP53BP1 knockdown substantially reduces the formation of all unintended editing outcomes. FIG. 48B shows additional details of the PE2 outcomes in the K562 CRISPRi screen, supplementing FIG. 41H. FIG. 48C shows additional details of the PE3+50 outcomes in the K562 CRISPRi screen, supplementing the information in FIG. 41G. FIGS. 48D-48I show a comparison of the effect of gene knockdown on the frequency of the indicated outcome categories under the indicated screening conditions. The quantity plotted is the mean log2 fold change relative to non-targeting sgRNAs for the two most extreme sgRNAs per gene, averaged over n=2 independent biological replicates per condition. Error bars mark the range of values spanned by the replicates. Black dots represent 20 random sets of three non-targeting sgRNAs. FIG. 48D shows that MSH2, MLH1, and PMS2 knockdown produced a larger fold change in the installation of additional edits than in the intended edit in the K562 PE2 screen. FIG. 48E shows that unintended joining of reverse-transcribed sequence in the PE2 screens in K562 and HeLa cells was increased most by knockdown of the Fanconi anemia genes (red) and a set of RAD51 homologs and other genes involved in homologous recombination (blue). FIG. 48F shows that deletions in the PE2 screens in K562 and HeLa cells were increased most by knockdown of a set of RAD51 homologs and other genes (blue) involved in homologous recombination. FIG. 48G shows that, in addition to MSH2, MLH1, and PMS2 knockdown, HLTF knockdown produces a larger fold change in the installation of additional edits than in the intended edit in the K562 PE3+50 screen. FIG. 48H shows that tandem repeats in the HeLa and K562 PE3+50 screens were reduced most by knockdown of POLD and RFC subunits. FIG. 48I shows that deletions in the HeLa PE3+50 and PE3-50 screens have markedly different genetic modulators, highlighting differences in the processing of different overhang configurations.
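The per-gene summary used in FIGS. 48D-48I (a log2 fold change relative to non-targeting sgRNAs, taken over the two most extreme sgRNAs per gene and averaged across replicates) can be illustrated with a minimal sketch. The order of averaging, the function names, and the example frequencies below are assumptions made for illustration only and are not taken from the disclosure.

```python
import math
from statistics import mean


def log2_fold_change(frequency, control_frequency):
    """log2 ratio of an outcome frequency to the non-targeting control frequency."""
    return math.log2(frequency / control_frequency)


def gene_level_score(sgrna_frequencies, control_frequencies, n_extreme=2):
    """Summarize one gene from per-sgRNA outcome frequencies (hypothetical helper).

    sgrna_frequencies: dict mapping sgRNA name -> list of frequencies, one per replicate.
    control_frequencies: list of non-targeting control frequencies, one per replicate.
    """
    per_sgrna = {}
    for name, freqs in sgrna_frequencies.items():
        # Assumption: average the log2 fold change across replicates for each sgRNA.
        per_sgrna[name] = mean(
            log2_fold_change(f, c) for f, c in zip(freqs, control_frequencies)
        )
    # Keep the sgRNAs with the largest absolute effect and average them.
    extreme = sorted(per_sgrna.values(), key=abs, reverse=True)[:n_extreme]
    return mean(extreme)


# Made-up frequencies for three sgRNAs against one gene, two replicates each.
example = {
    "sgRNA_1": [0.40, 0.40],
    "sgRNA_2": [0.20, 0.20],
    "sgRNA_3": [0.10, 0.10],
}
score = gene_level_score(example, control_frequencies=[0.10, 0.10])
print(round(score, 3))
```

Restricting the summary to the most extreme sgRNAs keeps a gene's score from being diluted by guides that knock down poorly, at the cost of some sensitivity to outliers.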
FIGS. 49A-49F show validation of the guided editing Repair-seq screening results. FIGS. 49A-49B show the alignment of the Sa-pegRNA, the 3' DNA flap it templates after SaPE2 reverse transcription, and the genomic target sequence (top). Compared to the Sa-pegRNA used in the Repair-seq screens (FIG. 49A), the Sa-pegRNA with a recoded scaffold sequence (FIG. 49B) templates an extended 3' DNA flap with reduced homology to the genomic target sequence. The recoded Sa-pegRNA contains 2 base pair changes that preserve the base-pairing interactions within the scaffold. Reverse transcription into the Sa-pegRNA scaffold can generate a mis-extended 3' flap that is incorporated into the genome. Vertical lines depict base pairing. X marks mismatches between the mis-extended reverse-transcribed 3' flap and the genomic sequence. FIGS. 49A-49B also show the frequency of editing outcome categories observed at the screening editing site from the aligned PE2 and PE3+50 experiments in HeLa CRISPRi cells (bottom). Guided editing using either the Sa-pegRNA used in the Repair-seq screens (FIG. 49A) or the recoded Sa-pegRNA (FIG. 49B) resulted in different frequencies of unintended edit installation from near-matched scaffold sequence. For each cell line containing MSH2-targeting or non-targeting CRISPRi sgRNAs, the quantity plotted is the mean ± SD of n=4 independent biological replicates. FIG. 49C shows the mechanism of human DNA mismatch repair. FIG. 49D shows that mismatch repair of guided editing heteroduplex intermediates can install additional non-programmed nicks through MutLα endonuclease activity. Excision from these non-programmed nicks and subsequent repair of the resulting intermediates may account for the larger and more frequent indel byproducts observed with MMR activity. FIG. 49E shows the knockdown efficiency of siRNA treatment in HEK293T cells relative to a non-targeting siRNA control.
Cells were transfected with siRNA, incubated for 3 days, transfected again with PE2, pegRNA and the same siRNA, and incubated for 3 more days before relative RNA abundance was determined by RT-qPCR. NT, non-targeting. Data represent the average of n=3 independent biological replicates. Each point represents the average of n=3 technical replicates. FIG. 49F shows editing in HEK293T cells co-transfected with the guided editor components and siRNA. Cells were not pre-treated with siRNA prior to transfection with the guided editor. Bars represent the average of n=3 independent biological replicates.
FIGS. 50A-50H show the development and characterization of dominant negative MMR proteins that enhance guided editing. FIG. 50A shows the guided editing efficiency of PE2 with MMR proteins or dominant negative variants expressed in trans or fused directly to PE2 in HEK293T cells. 32aa linker, (SGGS)×2-XTEN-(SGGS)×2 (SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 125)), or structurally, [SGGS]-[SGGS]-[SGSETPGTSESATPES]-[SGGS]-[SGGS] (SEQ ID NO: 125). Codon opt, human codon optimization. The data in the same graph are from experiments performed simultaneously. Data represent mean ± SD of n=3 independent biological replicates. FIG. 50B shows titration of the transfection doses of MLH1dn plasmid and PE2 plasmid in HEK293T cells. The maximum plasmid amounts tested were 200 ng PE2 and 100 ng MLH1dn. Data represent mean ± SD of n=3 independent biological replicates. FIG. 50C shows guided editing with MLH1dn co-expression in MMR-deficient HCT116 cells, which contain a biallelic deletion in MLH1. Bars represent the average of 3 replicates. FIG. 50D shows a comparison of guided editing using human MLH1dn (human codon optimized) or mouse MLH1dn (mouse codon optimized) in human HEK293T cells. Bars represent the average of n=3 independent biological replicates. FIG. 50E shows a comparison of guided editing using human MLH1dn (human codon optimized) or mouse MLH1dn (mouse codon optimized) in mouse N2A cells. Bars represent the average of n=3 replicates. FIG. 50F shows that MLH1 knockout in clonal HeLa cell lines enhances guided editing efficiency to a greater extent than MLH1dn co-expression in clonal wild-type HeLa cells. Δ, knockout. Bars represent the average of n=3 or 4 independent biological replicates. FIG. 50G shows editing at the FANCF locus using PE3b and PE5b (complementary-strand nicks specific for the edited sequence) in HEK293T cells. PE5b, the PE3b editing system co-expressed with MLH1dn.
Bars represent the average of n=3 independent biological replicates. FIG. 50H shows editing at the HEK2 locus with complementary-strand nicks in HEK293T cells. "None" indicates no complementary-strand nick, which corresponds to a PE2 or PE4 editing strategy. Bars represent the average of n=3 independent biological replicates.
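As a quick check on the 32-aa linker described for FIG. 50A, the following minimal sketch assembles the linker from its stated parts, two SGGS repeats on each side of the 16-residue XTEN segment, and confirms the total length. The function and variable names are hypothetical and chosen only for illustration.

```python
SGGS = "SGGS"
XTEN = "SGSETPGTSESATPES"  # XTEN segment as written in the structural description


def build_linker(flank_repeats=2):
    """Assemble (SGGS)xN-XTEN-(SGGS)xN; N=2 gives the 32-aa linker."""
    flank = SGGS * flank_repeats
    return flank + XTEN + flank


linker = build_linker()
print(linker, len(linker))
```

Writing the linker as repeated units makes it easy to see where the 32 residues come from: 8 from each flank and 16 from the XTEN segment.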
FIGS. 51A-51J show characterization of PE4 and PE5 across different guided edit classes and cell types. FIG. 51A shows a comparison of PE2, PE3, PE4 and PE5 for 84 single-base substitution guided edits at seven endogenous sites in HEK293T cells. Bars represent the average of n=3 independent biological replicates. FIG. 51B provides a summary of the enhancement in editing efficiency of PE4 relative to PE2 for 84 single-base substitution edits at seven endogenous sites in HEK293T cells. The PE4/PE2 fold improvement for PAM edits may be lower because of the higher basal editing efficiency of PAM edits or the overrepresentation of G.C-to-C.G edits (five out of 15 in this category). Data represent mean ± SD of n=3 independent biological replicates. FIG. 51C shows the efficiency of single-base substitution guided edits that alter the PAM (+5 G and +6 G bases) of the targeted protospacer in HEK293T cells. Four G.C-to-A.T, five G.C-to-C.G and six G.C-to-T.A PAM edits across seven endogenous sites combined are shown. The average of all individual values for n=3 independent biological replicates is shown. FIG. 51D shows the effect of siRNA knockdown of MMR genes on G.C-to-C.G editing at the RNF2 locus in HEK293T cells. Bars represent the average of n=3 independent biological replicates. FIG. 51E shows the effect of MMR gene knockout on G.C-to-C.G editing at the RNF2 locus in HAP1 cells. Δ, gene knockout. Bars represent the average of n=3 independent biological replicates. FIG. 51F shows guided editing in HeLa CRISPRi cells with CRISPRi knockdown at the integrated screening editing site. PE2 represents editing with the SaPE2 protein and Sa-pegRNA. PE3+50 represents editing with the SaPE2 protein, Sa-pegRNA, and an Sa-sgRNA that programs a +50 complementary-strand nick. Bars represent the average of n=5 independent biological replicates. FIG. 51G provides a summary of the enhancement in editing efficiency of PE5 relative to PE3 for 84 single-base substitution edits in HEK293T cells. Mean ± SD of all individual values for n=3 independent biological replicates is shown. FIG. 51H shows the enhancement in editing efficiency of PE4 over PE2 across a range of insertion and deletion guided edit lengths in HEK293T cells. A total of 33 different guided edits at three endogenous loci combined are shown. The average of all individual values for n=3 independent biological replicates is shown. FIG. 51I shows that PE5 increases editing efficiency and reduces indel byproducts relative to PE3 for small insertion and deletion guided edits in HEK293T cells. FIG. 51J shows PE2 and PE4 editing efficiency for 33 different insertion and deletion guided edits across three endogenous loci combined. Bars represent the average of all individual values of n=3 independent biological replicates.
FIGS. 52A-52C show characterization of the PE4 and PE5 systems and improved guided editing efficiency using additional silent mutations. FIG. 52A shows contiguous base substitutions using PE2 and PE4 in HEK293T cells. The top sequence represents the original, unedited genomic sequence. The numbers indicate the positions of the edited nucleotides relative to the PE2 nick site. Nucleotides within the SpCas9 PAM sequence (NGG) are underlined. The sequence of the intended editing product is shown below, with the edited nucleotides marked in red. Bars represent the average of n=3 independent biological replicates. FIG. 52B shows that installing additional silent mutations can increase guided editing efficiency by evading MMR. The PE4/PE2 fold change in editing frequency reflects the extent to which MMR activity impedes the indicated guided edit. Edited nucleotides that produce the indicated coding mutation are marked in red, and edited nucleotides that produce silent mutations are marked in green. Data represent mean ± SD of n=3 independent biological replicates. FIG. 52C shows the installation of 22 single-base substitutions across seven endogenous sites in HeLa cells using PE2, PE3, PE4 and PE5. Bars represent the average of n=3 independent biological replicates.
FIGS. 53A-53G show the effect of dominant negative MLH1 on guided editing product purity and off-targeting. FIG. 53A shows indel frequencies in HEK293T cells treated with pegRNA, nicking sgRNA, and PE2 enzyme, RT-impaired PE2 (PE2-dRT), or Cas9 nickase (SpCas9 H840A), with or without MLH1dn. The non-editing pegRNA encodes a 3' DNA flap with perfect homology to the genomic target. Bars represent the average of n=3 independent biological replicates. FIG. 53B shows the distribution of deletion outcomes of PE3 and PE5 at endogenous loci in HEK293T cells. 12 pegRNAs that program different single-base substitutions were tested at each indicated locus. The dashed lines indicate the positions of the pegRNA- and sgRNA-directed nicks. Data represent mean ± SD of n=3 independent biological replicates. FIG. 53C shows the distribution of deletion outcomes for PE3 and PE5 using edit-encoding and non-editing pegRNAs in HEK293T cells. Non-editing pegRNAs template a 3' DNA flap that is perfectly complementary to the genomic target sequence. Data represent mean ± SD of n=3 independent biological replicates. FIG. 53D shows the frequency, among all guided editing outcomes, of unintended pegRNA scaffold sequence incorporation or unintended flap rejoining in HEK293T cells. 12 pegRNAs, each programming a different single-base substitution, were tested at each of the seven indicated loci. Each dot represents a single pegRNA at a given locus (average of n=3 independent biological replicates). FIG. 53E shows off-target guided editing by PE2 and PE4 in HEK293T cells (average of n=3 independent biological replicates). FIG. 53F shows the distribution and cumulative distribution of microsatellite repeat lengths in the indicated cell types and treatments. HAP1 and HeLa cells are MMR-proficient, while HCT116 cells have impaired MMR. After MSH2 knockout, HAP1 ΔMSH2 cells underwent 60 cell divisions.
HeLa cells were transiently transfected with PE2 or PE4 components and grown for 3 days prior to sequencing. wt, wild type. All values of n=2 independent biological replicates are shown. FIG. 53G shows guided editing at a target locus in HeLa cells transfected with PE2 or PE4 components. Bars represent the average of n=2 independent biological replicates. Microsatellite length measurements were performed on genomic DNA from these PE2- and PE4-treated HeLa cells.
FIGS. 54A-54F show that the use of the PEmax architecture with the PE4 and PE5 editing systems enhances editing of disease-associated gene targets and cell types. FIG. 54A shows a schematic diagram of the PE2 and PEmax editor architectures. bpNLSSV40, bipartite SV40 NLS. MMLV RT, Moloney murine leukemia virus reverse transcriptase pentamutant. GS codon, GenScript human codon optimization. FIG. 54B shows engineered pegRNAs (epegRNAs), which contain a 3' RNA structural motif that improves guided editing performance. FIG. 54C shows the guided editing efficiency of PE4 and PE5 combined with the PEmax architecture and epegRNAs. Seven single-base substitution edits targeting different loci were tested in HeLa and HEK293T cells. Fold change represents the average of the fold increases for each tested edit. Mean ± SD of all individual values of n=3 independent biological replicates are shown. FIG. 54D shows guided editing at therapeutically relevant sites in wild-type HeLa and HEK293T cells. The HBB locus was edited at the E6 codon that is commonly mutated in sickle cell disease (E6V) patients. The CDKL5 edit was at the site where the c.1412delA mutation causes CDKL5 deficiency. epegRNAs were used to edit the HBB, PRNP, and CDKL5 loci. Bars represent the average of n=3 independent biological replicates. FIG. 54E shows correction of CDKL5 c.1412delA via an A.T insertion and a silent G.C-to-A.T edit in iPSCs derived from a patient heterozygous for the allele. Editing efficiency represents the percentage of sequencing reads with c.1412delA correction among reads from editable alleles carrying the mutation. Indel frequency reflects all sequencing reads containing any indels. Bars represent the average of n=3 independent biological replicates. FIG. 54F shows guided editing in primary T cells. Bars represent the average of n=3 independent biological replicates from different healthy T cell donors.
FIGS. 55A-55B show the development of PEmax and the use of PE4 and PE5 in primary cell types. FIGS. 55A-55B show the screening of guided editor variants to maximize editing efficiency in HeLa cells. All guided editor constructs carry the Cas9 H840A mutation to prevent nicking of the complementary DNA strand at the target protospacer. * NLSSV40 contains a 1-aa deletion outside the PKKKRKV (SEQ ID NO: 132) NLSSV40 consensus sequence. All individual values of n=3 independent biological replicates are shown.
FIGS. 56A-56G show the development of PEmax and the use of PE4 and PE5 in primary cell types. FIG. 56A shows the screening of guided editor variants for improved editing efficiency with the PE3 system in HeLa cells. All guided editor constructs carried the SpCas9 H840A mutation to prevent nicking of the complementary DNA strand at the target protospacer. bpNLSSV40 denotes a bipartite SV40 NLS. * NLSSV40 contains a 1-aa deletion outside the PKKKRKV (SEQ ID NO: 132) NLSSV40 consensus sequence. All individual values of n=3 independent biological replicates are shown. FIG. 56B shows the architectures of the original PE2 editor (Anzalone et al., 2019), PE2 (Liu et al., 2021), CMP-PE-V1 (Park et al., 2021), and the guided editor variants developed in the present work (PEmax, CMP-PEmax). HN1, HMGN1; H1G, histone H1 central globular domain; codon opt, human codon optimization. FIG. 56C shows that PEmax outperforms other guided editor architectures using the PE3 system in HeLa cells. Bars represent the average of n=3 independent biological replicates. FIG. 56D shows the fold change in editing efficiency of the guided editor architectures compared to PE2 with the PE3 system in HeLa cells. Mean ± SD of all individual values of n=3 independent biological replicates are shown. FIG. 56E shows the intended edit and indel frequencies of PE4, PE4max (the PE4 editing system with the PEmax architecture), PE5, and PE5max (the PE5 editing system with the PEmax architecture) in HeLa and HEK293T cells. Seven substitution guided edits targeting different endogenous loci were tested for each condition. Mean ± SD of all individual values of n=3 independent biological replicates is shown. FIG. 56F shows correction of CDKL5 c.1412delA via an A.T insertion and a G.C-to-A.T edit in iPSCs derived from a patient heterozygous for the disease allele. Editing efficiency represents the percentage of sequencing reads with c.1412delA correction among reads from editable alleles carrying the mutation.
Indel frequency reflects all sequencing reads containing any indels that do not map to the c.1412delA allele or the wild-type sequence. 1 μg of PE2 mRNA was used in all indicated conditions. Bars represent the average of n=3 independent biological replicates. FIG. 56G shows guided editing in primary T cells. Bars represent the average of n=3 independent replicates from different T cell donors.
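The two read-based metrics described for the heterozygous CDKL5 c.1412delA correction can be illustrated with a minimal sketch. It assumes that sequencing reads have already been classified into four categories and that the editable allele is represented by corrected plus uncorrected mutant reads; the category names, the function name, and the read counts are hypothetical, and the classification pipeline actually used is not specified here.

```python
def editing_metrics(corrected, uncorrected_mutant, wild_type, indel):
    """Return (editing efficiency %, indel frequency %) from classified read counts.

    Assumptions for illustration:
      - editing efficiency uses only reads from the editable (mutant) allele,
        i.e. corrected + uncorrected_mutant, as the denominator;
      - indel frequency uses all reads as the denominator.
    """
    editable = corrected + uncorrected_mutant
    total = corrected + uncorrected_mutant + wild_type + indel
    if editable == 0 or total == 0:
        raise ValueError("no reads available for the requested denominator")
    efficiency = 100.0 * corrected / editable
    indel_frequency = 100.0 * indel / total
    return efficiency, indel_frequency


# Made-up counts from a heterozygous sample.
efficiency, indel_frequency = editing_metrics(
    corrected=300, uncorrected_mutant=700, wild_type=950, indel=50
)
print(efficiency, indel_frequency)
```

Using the editable allele as the denominator avoids understating correction in a heterozygous sample, where roughly half of the reads come from an allele that cannot be edited.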
FIGS. 57A-57B show that recoding the pegRNA scaffold reduces unintended incorporation of scaffold sequence. FIG. 57A shows an alignment of the guided editing Repair-seq target site and the 3' DNA flap generated by SaPE2, templated by either the Sa-pegRNA used in the Repair-seq screens (top) or the Sa-pegRNA with a recoded scaffold sequence (bottom). The 3' flap sequence is aligned with the templating region of the Sa-pegRNA shown above it (RT template or scaffold). Red indicates the position of the intended +6 G.C-to-C.G edit programmed by the two Sa-pegRNAs. Blue indicates positions where the genomic target sequence does not match the 3' flap sequence templated by the Sa-pegRNA scaffold. Incorporation of a 3' flap containing reverse-transcribed Sa-pegRNA scaffold sequence may result in unintended edits at the nucleotides indicated in blue. FIG. 57B shows a summary of the editing outcome categories observed in the PE2 and PE3+50 experiments in HeLa CRISPRi cells. Screening pegRNA denotes the Sa-pegRNA used for guided editing in the Repair-seq screens. The Sa-pegRNA with the recoded scaffold (sequence shown in FIG. 54A) avoids sequence homology with the Repair-seq editing site. The quantity plotted is the mean ± SD for one CRISPRi sgRNA per indicated target (MSH2 and non-targeting) over n=4 independent biological replicates.
FIG. 58 shows a comparison of PE3max (the PE3 editing system with PEmax protein) and PE3 (the PE3 editing system with PE2 protein) in HeLa cells (average of n=3 independent biological replicates).
FIG. 59 shows that the improvement in guided editing from MLH1dn depends on the size of the guided edit. MMR most efficiently repairs substitution, insertion and deletion errors of about 13 bp or less in length.
FIG. 60 shows that PE4 and epegRNAs enable guided editing with a single integrated pegRNA copy.
FIG. 61 shows PE5 improves the installation of protective Christchurch alleles in the APOE4 mouse astrocyte model.
FIGS. 62A-62C show that inhibiting 53BP1 enhances the efficiency and accuracy of PE3 guided editing. This is especially true when the nicking sgRNA forms a nick upstream (on the "-" side) of the pegRNA-directed nick. Each dot on the graph represents a single CRISPRi gene knockdown in the Repair-seq screen. The axes depict log2 fold change compared to the control. Knocking down TP53BP1 (the gene encoding 53BP1, p53-binding protein 1) increased the intended edit (x-axis) and reduced three types of unintended edits (y-axis): joining of reverse-transcribed sequence at unintended positions (FIG. 62A), unintended deletions (FIG. 62B), and unintended tandem repeats (FIG. 62C).
FIG. 63 shows that an inhibitor of 53BP1 (i53) can enhance the efficiency and accuracy of PE3 guided editing. This is especially true when the nicking sgRNA forms a nick upstream (on the "-" side) of the pegRNA-directed nick. Only the EMX1 site was nicked on the "-" side.
FIG. 64 illustrates various aspects of the present disclosure, including the use of CRISPRi screening to reveal cellular genes (including mismatch repair genes) that affect guided editing outcomes, the use of an engineered MLH1 of the mismatch repair (MMR) pathway to improve the efficiency and accuracy of guided editing, and the demonstration that the improved guided editing systems described herein (e.g., the PE4 and PE5 systems and PEmax editors) exhibit the same beneficial effects in many cell types.
FIG. 64 shows that CRISPRi screening reveals cellular determinants of guided genome editing, that engineered MLH1 proteins enhance guided editing efficiency and accuracy, and that the improved guided editing systems were characterized across edit types and cell types.
FIG. 65 provides a schematic showing the optimization of PE2 proteins.
FIG. 66 shows fold changes in the intended edit frequency using PE2 and various other PE constructs on a range of gene targets (HEK3, EMX1, RNF2, FANCF, RUNX1, DNMT1, VEGFA, HEK4, PRNP, APOE, CXCR, HEK3) in HEK293T cells (low plasmid dose).
FIG. 67 shows fold-changes in expected editing frequency using PE3 and various guide editor constructs on a range of gene targets (HEK 3, FANCF, RUNX1, VEGFA) in HeLa cells.
FIG. 68 shows a comparison of guided editing in HEK293T cells with guided editing in HeLa cells using various PE constructs.
Fig. 69 shows NLS architecture optimization of PE3 in HeLa cells.
FIG. 70 provides a schematic of the final PEmax construct, corresponding to SEQ ID NO: 99.
Fig. 71 shows that PEmax adds indels in addition to the intended edits.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The following references provide the skilled artisan with a general definition of many of the terms used in the present invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings given to them unless otherwise indicated.
Cas9
The term "Cas9" or "Cas9 nuclease" refers to a protein comprising a Cas9 domain or fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or a gRNA binding domain of Cas9). As used herein, a "Cas9 domain" is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or a gRNA binding domain of Cas9. A "Cas9 protein" is a full-length Cas9 protein. A Cas9 nuclease is sometimes also referred to as a Casn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, which are sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves the linear or circular dsDNA target complementary to the spacer. The target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") can be engineered to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM, or protospacer adjacent motif) to help distinguish self from non-self.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti J.J., McShan W.M., Ajdic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), each of which is incorporated herein by reference in its entirety). Cas9 orthologs have been described in various species, including, but not limited to, Streptococcus pyogenes (S. pyogenes) and Streptococcus thermophilus (S. thermophilus). Other suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on the present disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference. In some embodiments, the Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.
The nuclease-inactivated Cas9 domain is interchangeably referred to as the "dCas9" protein (representing the nuclease- "dead" Cas 9). Methods for producing Cas9 domains (or fragments thereof) with inactive DNA cleavage domains are known (see, e.g., jink et al, science.337:816-821 (2012); qi et al, "Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression" (2013) cell.28;152 (5): 1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, an HNH nuclease subdomain and a RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas 9. For example, mutations D10A and H840A completely inactivate nuclease activity of Streptococcus pyogenes Cas9 (Jinek et al, science.337:816-821 (2012); qi et al, cell.28;152 (5): 1173-83 (2013)). In some embodiments, proteins comprising Cas9 fragments are provided. For example, in some embodiments, the protein comprises one of two Cas9 domains: (1) a gRNA binding domain of Cas 9; or (2) a DNA cleavage domain of Cas 9. In some embodiments, a protein comprising Cas9 or a fragment thereof is referred to as a "Cas9 variant. Cas9 variants have homology to Cas9 or fragments thereof. For example, the Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild-type Cas9 (e.g., spCas9 of SEQ ID NO: 2). 
In some embodiments, Cas9 variants can have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild-type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the Cas9 variant comprises a fragment of the Cas9 of SEQ ID NO: 2 (e.g., a gRNA binding domain or a DNA cleavage domain) such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of the corresponding wild-type Cas9 (e.g., SpCas9 of SEQ ID NO: 2).
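The substitution notation used above (e.g., D10A, H840A) and the count of amino acid changes relative to a reference can be illustrated with a minimal Python sketch. The helper names and the short toy sequence are hypothetical and chosen only for illustration; the toy sequence is not SEQ ID NO: 2 or any real Cas9 sequence.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution written as <wild-type residue><1-based position><new residue>."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, replacement = parsed.group(1), parsed.group(3)
    index = int(parsed.group(2)) - 1  # convert to a 0-based index
    if index < 0 or index >= len(protein):
        raise ValueError(f"position out of range for {mutation!r}")
    if protein[index] != wild_type:
        raise ValueError(f"expected {wild_type} at position {index + 1}, found {protein[index]}")
    return protein[:index] + replacement + protein[index + 1:]


def count_changes(reference: str, variant: str) -> int:
    """Count substituted positions between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length for this simple count")
    return sum(a != b for a, b in zip(reference, variant))


# Hypothetical toy sequence with D at position 10 (illustration only).
toy_reference = "MAAAAAAAADKKH"
toy_variant = apply_substitution(toy_reference, "D10A")
print(toy_variant, count_changes(toy_reference, toy_variant))
```

The check on the wild-type residue guards against applying a mutation to a sequence whose numbering differs from that of the reference.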
Circular permutant
As used herein, the term "circular permutant" refers to a protein or polypeptide (e.g., Cas9) comprising a circular permutation, which is a change in the structural configuration of the protein involving a change in the order of amino acids appearing in the protein's amino acid sequence. In other words, a circular permutant is a protein having altered N- and C-termini as compared to its wild-type counterpart, e.g., the wild-type C-terminal half of the protein becomes the new N-terminal half. Circular permutation (or CP) is essentially a topological rearrangement of the primary sequence of a protein, typically using a peptide linker to join its N- and C-termini while splitting its sequence at a different position to form new, adjacent N- and C-termini. The result is a protein structure with different connectivity that may nevertheless have the same overall three-dimensional (3D) shape, and that may have improved or altered characteristics, including reduced proteolytic susceptibility, increased catalytic activity, altered substrate or ligand binding, and/or increased thermostability. Circular permutant proteins can occur in nature (e.g., concanavalin A and lectin). In addition, circular permutation can occur as a result of post-translational modification or may be engineered using recombinant techniques.
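The rearrangement described above (joining the original termini with a linker and opening the sequence at a new position) can be sketched as a simple string operation. This is a minimal illustration only; the function name, the toy sequence, and the GGS linker are assumptions and do not correspond to any CP-Cas9 sequence of the disclosure.

```python
def circularly_permute(protein: str, new_start: int, linker: str) -> str:
    """Return a circular permutant of `protein`.

    new_start is the 1-based position of the residue that becomes the new
    N-terminus; the original C- and N-termini are joined through `linker`.
    """
    if not 1 < new_start <= len(protein):
        raise ValueError("new_start must lie inside the sequence and differ from 1")
    split = new_start - 1
    # Original C-terminal part first, then the linker, then the original N-terminal part.
    return protein[split:] + linker + protein[:split]


# Hypothetical 8-residue toy protein and linker (illustration only).
print(circularly_permute("MKTAYIAK", 5, "GGS"))
```

The permutant contains every residue of the original protein once, plus the linker, so its length is the original length plus the linker length.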
Circularly permuted Cas9
The term "circularly permuted Cas9" refers to any Cas9 protein, or variant thereof, that occurs as a circular permutant, whereby its N- and C-termini have been rearranged. Such circularly permuted Cas9 proteins ("CP-Cas9"), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., "Protein Engineering of Cas9 for enhanced function," Methods Enzymol, 2014, 546:491-511 and Oakes et al., "CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification," Cell, January 10, 2019, 176:254-267, each of which is incorporated herein by reference. The present disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA). Exemplary CP-Cas9 proteins are SEQ ID NOs: 54-63.
CRISPR
CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by viruses that have invaded the prokaryote. The prokaryotic cell uses these snippets of DNA to detect and destroy DNA from subsequent attacks by similar viruses; together with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, they effectively constitute a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves the linear or circular dsDNA target complementary to the RNA. Specifically, the target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") can be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM, or protospacer adjacent motif) to help distinguish self from non-self.
CRISPR biology and Cas9 nuclease sequences and structures are well known to those skilled in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti J.J., McShan W.M., Ajdic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, Streptococcus pyogenes and Streptococcus thermophilus. Other suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on the present disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference.
In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves the linear or circular nucleic acid target complementary to the RNA. Specifically, the target strand that is not complementary to the crRNA is first cut endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") may be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species, the guide RNA.
In general, a "CRISPR system" refers collectively to transcripts and other elements involved in the expression of, or directing the activity of, CRISPR-associated ("Cas") genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a "direct repeat" and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a "spacer" in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.
DNA synthesis template
As used herein, the term "DNA synthesis template" refers to the region or portion of the extension arm of a PEgRNA that is used as a template strand by the polymerase of a prime editor to encode a 3' single-stranded DNA flap that contains the desired edit and that then, through the mechanism of prime editing, replaces the corresponding endogenous DNA strand at the target site. The extension arm (including the DNA synthesis template) may be composed of DNA or RNA. In the case of RNA, the polymerase of the prime editor may be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the prime editor may be a DNA-dependent DNA polymerase. In various embodiments, the DNA synthesis template may comprise the "editing template" and the "homology arm," and optionally all or a portion of the 5' end modifier region, e2. That is, depending on the nature of the e2 region (e.g., whether it includes a hairpin, toeloop, or stem/loop secondary structure), the polymerase may encode none, some, or all of the e2 region as well. In other words, in the case of a 3' extension arm, the DNA synthesis template may comprise the portion of the extension arm that spans from the 5' end of the primer binding site (PBS) to the 3' end of the gRNA core and that can function as a template for the synthesis of a single strand of DNA by a polymerase (e.g., a reverse transcriptase). In the case of a 5' extension arm, the DNA synthesis template may comprise the portion of the extension arm that spans from the 5' end of the PEgRNA molecule to the 3' end of the editing template. In some embodiments, the DNA synthesis template excludes the primer binding site (PBS), whether the PEgRNA has a 3' extension arm or a 5' extension arm. Certain embodiments described herein refer to an "RT template," which includes the editing template and the homology arm, i.e., the sequence of the PEgRNA extension arm that is actually used as a template during DNA synthesis. The term "RT template" is equivalent to the term "DNA synthesis template."
In certain embodiments, the term "RT template" may be used to refer to a template polynucleotide for reverse transcription, for example in a prime editing system, complex, or method that uses a prime editor having a polymerase that is a reverse transcriptase. In some embodiments, the term "DNA synthesis template" may be used to refer to a template polynucleotide for DNA polymerization (e.g., RNA-dependent DNA polymerization or DNA-dependent DNA polymerization), for example in a prime editing system, complex, or method that uses a prime editor having a polymerase that is an RNA-dependent DNA polymerase or a DNA-dependent DNA polymerase.
In some embodiments, the DNA synthesis template is a single-stranded portion of the PEgRNA that is 5' of the PBS, comprises a region of complementarity to the PAM strand (i.e., the non-target strand or the edit strand), and comprises one or more nucleotide edits compared to the endogenous sequence of the double-stranded target DNA. In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is downstream of the nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is immediately downstream (i.e., directly downstream) of the nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, one or more of the non-complementary nucleotides at the intended nucleotide edit positions are immediately downstream of the nick site. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the double-stranded target DNA sequence. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the non-target strand of the double-stranded target DNA sequence. For each PEgRNA described herein, the nick site is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and is characteristic of the particular PAM required for recognition and function of that napDNAbp. For example, for a PEgRNA that comprises a gRNA core that associates with SpCas9, the nick site is at the phosphodiester bond between base three (the "-3" position relative to position 1 of the PAM sequence) and base four (the "-4" position relative to position 1 of the PAM sequence). In some embodiments, the DNA synthesis template and the primer binding site are immediately adjacent to each other.
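As an illustration of the SpCas9 nick position described above, the following minimal Python sketch locates the nick between the "-4" and "-3" positions on the PAM strand. It assumes that the input is the PAM (non-target) strand written 5' to 3', that the protospacer is immediately followed by an NGG PAM, and that the function name and the sequences are hypothetical.

```python
def spcas9_nick_index(pam_strand: str, protospacer: str) -> int:
    """Return the 0-based index of the first base 3' of the nick on the PAM strand.

    The nick lies between positions -4 and -3 relative to PAM position 1, so the
    returned index is three bases upstream of the first PAM nucleotide.
    """
    start = pam_strand.find(protospacer)
    if start == -1:
        raise ValueError("protospacer not found on the supplied strand")
    pam_start = start + len(protospacer)
    pam = pam_strand[pam_start:pam_start + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("protospacer is not followed by an NGG PAM")
    return pam_start - 3


# Hypothetical 20-nt protospacer and flanking sequence (illustration only).
protospacer = "GACCTTGCATCGGATCAGTC"
pam_strand = "TTAA" + protospacer + "TGG" + "ACGT"
nick = spcas9_nick_index(pam_strand, protospacer)
upstream, downstream = pam_strand[:nick], pam_strand[nick:]
print(nick, upstream[-4:], downstream[:6])
```

In this sketch the 3' end of `upstream` corresponds to the exposed 3' end that anneals to the primer binding site, and `downstream` begins with the sequence that the newly synthesized flap replaces.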
The terms "nucleotide edit," "nucleotide change," "desired nucleotide change," and "desired nucleotide edit" are used interchangeably to refer to a particular nucleotide edit, e.g., a particular deletion of one or more nucleotides, a particular insertion of one or more nucleotides, a particular substitution(s) of one or more nucleotides, or a combination thereof, at a particular position in a DNA synthesis template of PEgRNA to be incorporated into a target DNA sequence. In some embodiments, the DNA synthesis template comprises more than one nucleotide edit relative to the double stranded target DNA sequence. In such embodiments, each nucleotide edit is a specific nucleotide edit at a specific location in the DNA synthesis template, each nucleotide edit is at a different specific location relative to any other nucleotide edits in the DNA synthesis template, and each nucleotide edit is independently selected from a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution (or substitutions) of one or more nucleotides, or a combination thereof. Nucleotide editing may refer to editing of a DNA synthesis template compared to a sequence on a target strand of a target gene, or nucleotide editing may refer to editing encoded by a DNA synthesis template on newly synthesized single-stranded DNA that replaces an endogenous target DNA sequence on a non-target strand.
Dominant negative variants
The terms "dominant negative variant" and "dominant negative mutant" refer to a gene or gene product (e.g., a protein) that comprises a mutation resulting in a gene product that acts antagonistically to (i.e., inhibits the activity of) the wild-type gene product. Dominant negative mutations generally result in an altered molecular function (often an inactive one). For example, the present disclosure provides dominant negative variants of MMR proteins that inhibit the activity of wild-type MMR proteins (e.g., the dominant negative MLH1 proteins described herein).
Editing template
The term "editing template" refers to the portion of the extension arm that encodes the desired edit in the single-stranded 3' DNA flap synthesized by the polymerase, e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). Certain embodiments described herein refer to an "RT template," which refers to the editing template and the homology arm together, i.e., the sequence of the PEgRNA extension arm that is actually used as a template during DNA synthesis. The term "RT editing template" is also equivalent to the term "DNA synthesis template," except that the RT editing template reflects the use of a prime editor having a polymerase that is a reverse transcriptase, whereas the DNA synthesis template reflects more broadly the use of a prime editor having any polymerase.
Extension arm
The term "extension arm" refers to a nucleotide sequence component of PEgRNA that comprises a primer binding site and a DNA synthesis template (e.g., editing template and homology arm) of a polymerase (e.g., reverse transcriptase). In some embodiments, the extension arm is located at the 3' end of the guide RNA. In other embodiments, the extension arm is located at the 5' end of the guide RNA. In some embodiments, the extension arm comprises a DNA synthesis template and a primer binding site. In some embodiments, the extension arm comprises the following components in the 5 'to 3' direction: a DNA synthesis template and a primer binding site. In some embodiments, the extension arm further comprises a homology arm. In various embodiments, the extension arm comprises the following components in the 5 'to 3' direction: homology arms, editing templates, and primer binding sites. Since the polymerization activity of reverse transcriptase is in the 5 'to 3' direction, the preferred arrangement of homology arms, editing templates and primer binding sites is in the 5 'to 3' direction, so that once reverse transcriptase is primed by annealed primer sequences, DNA single strands are polymerized using editing templates as complementary template strands.
The extension arm can also be described as generally comprising two regions: a primer binding site (PBS) and a DNA synthesis template. The primer binding site binds to a primer sequence that is formed from the endogenous DNA strand of the target site when that strand is nicked by the prime editor complex, thereby exposing a 3' end on the endogenous nicked strand. As explained herein, binding of the primer sequence to the primer binding site on the extension arm of the PEgRNA creates a duplex region with an exposed 3' end (i.e., the 3' end of the primer sequence), which then provides a substrate for the polymerase to begin polymerizing a single strand of DNA from the exposed 3' end along the length of the DNA synthesis template. The sequence of the single-stranded DNA product is the complement of the DNA synthesis template. Polymerization continues toward the 5' end of the DNA synthesis template (or extension arm) until polymerization terminates. Thus, the DNA synthesis template represents the portion of the extension arm that is encoded by the polymerase of the prime editor complex into a single-stranded DNA product (i.e., the 3' single-stranded DNA flap containing the desired genetic edit information) and that ultimately replaces the corresponding endogenous DNA strand of the target site immediately downstream of the PE-induced nick site. Without being bound by theory, polymerization of the DNA synthesis template continues toward the 5' end of the extension arm until a termination event.
Polymerization may terminate in a variety of ways, including, but not limited to: (a) reaching the 5' end of the PEgRNA (e.g., in the case of a 5' extension arm, where the DNA polymerase simply runs out of template); (b) reaching an impassable RNA secondary structure (e.g., a hairpin or stem/loop); or (c) reaching a replication termination signal, e.g., a specific nucleotide sequence that blocks or inhibits the polymerase, or a nucleic acid topological signal, such as supercoiled DNA or RNA.
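The relationship between the primer binding site, the DNA synthesis template, and the synthesized flap can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: sequences are written 5' to 3', the only termination event modeled is running off the 5' end of the template, and the function names and toy sequences are hypothetical.

```python
# Complement of each template base, expressed as a DNA base.
COMPLEMENT_TO_DNA = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def synthesize_flap(dna_synthesis_template: str) -> str:
    """Return the single-stranded DNA product (5'->3') templated by an RNA or DNA template (5'->3')."""
    # The polymerase reads the template 3'->5' and synthesizes 5'->3',
    # so the product is the reverse complement of the template.
    return "".join(COMPLEMENT_TO_DNA[base] for base in reversed(dna_synthesis_template))


def primer_anneals(primer_dna: str, primer_binding_site: str) -> bool:
    """True when the primer (5'->3') is fully complementary to the PBS (5'->3')."""
    return synthesize_flap(primer_binding_site) == primer_dna


# Hypothetical toy extension arm, 5' -> 3': DNA synthesis template followed by the PBS.
template = "AUGCCA"
pbs = "GGAUC"
primer = "GATCC"  # 3' end of the nicked strand in this toy example

if primer_anneals(primer, pbs):
    extended_strand = primer + synthesize_flap(template)
    print(extended_strand)
```

Because the PBS lies 3' of the template on the extension arm, the extended strand as a whole is the reverse complement of the template followed by the PBS.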
Fusion proteins
As used herein, the term "fusion protein" refers to a hybrid polypeptide comprising protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an "amino-terminal fusion protein" or a "carboxy-terminal fusion protein," respectively. A protein can comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs binding of the protein to a target site) and a nucleic acid cleavage domain or catalytic domain of a nucleic acid editing protein. Another example includes Cas9, or an equivalent thereof, fused to a reverse transcriptase. Any of the proteins provided herein can be produced by any method known in the art. For example, the proteins provided herein can be produced by recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), which is incorporated herein by reference in its entirety.
Guide RNA ("gRNA")
As used herein, the term "guide RNA" refers to a particular type of guide nucleic acid that is most commonly associated with a Cas protein of a CRISPR-Cas9 system and that associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that is complementary to the spacer sequence of the guide RNA. However, the term also embraces equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and that otherwise program the Cas9 equivalent to localize to a specific target nucleotide sequence. Cas9 equivalents may include other napDNAbps from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system). Further Cas equivalents are described in Makarova et al., "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector," Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing appropriate guide RNA sequences are provided herein. As used herein, a "guide RNA" may also be referred to as a "traditional guide RNA" to contrast it with the modified form of guide RNA termed a "prime editing guide RNA" (or "PEgRNA").
The guide RNA or PEgRNA may comprise various structural elements including, but not limited to:
Spacer sequence - the sequence in the guide RNA or PEgRNA (about 20 nt in length) that binds to the protospacer in the target DNA.
gRNA core (or gRNA scaffold or backbone sequence) - the sequence within the gRNA that is responsible for Cas9 binding; it does not include the 20 bp spacer/targeting sequence that is used to guide Cas9 to the target DNA. In some embodiments, the gRNA core or scaffold comprises a sequence that contains one or more nucleotide changes compared to a naturally occurring CRISPR-Cas guide RNA scaffold, such as a Cas9 guide RNA scaffold. In some embodiments, the sequence of the gRNA core is designed to have minimal or no sequence homology to the endogenous sequence of the target nucleic acid at the target site, thereby reducing unintended edits. For example, in some embodiments, one or more base pairs in the second stem loop of a Cas9 gRNA core can be "flipped" (e.g., the G-U base pair and the U-A base pair illustrated in FIG. 49A) to reduce unintended edits. In some embodiments, the gRNA core comprises no more than 1%, 5%, 10%, 15%, 20%, 25%, or 30% sequence homology to the double-stranded target DNA within 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides flanking the one or more nucleotide edit positions.
Extension arm - a single-stranded extension at the 3' or 5' end of the PEgRNA that comprises a primer binding site and a DNA synthesis template sequence. A polymerase (e.g., a reverse transcriptase) uses the template to encode a single-stranded DNA flap containing the genetic change of interest, which then integrates into the endogenous DNA by replacing the corresponding endogenous strand, thereby installing the desired genetic change.
Transcription terminator - the guide RNA or PEgRNA may comprise a transcription termination sequence at the 3' end of the molecule.
In some embodiments, the PEgRNA comprises a transcription termination sequence between the DNA synthesis template and the gRNA core.
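The ordering of the structural elements listed above, for a PEgRNA with a 3' extension arm, can be summarized with a small data structure. This is a minimal sketch only: the class name, field names, and placeholder sequences are hypothetical, and the variant in which a transcription termination sequence lies between the DNA synthesis template and the gRNA core is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEgRNA:
    """Components of a PEgRNA with a 3' extension arm, each written 5' -> 3'."""
    spacer: str
    grna_core: str
    dna_synthesis_template: str
    primer_binding_site: str
    terminator: str = ""

    def extension_arm(self) -> str:
        # 5' -> 3': DNA synthesis template, then primer binding site.
        return self.dna_synthesis_template + self.primer_binding_site

    def sequence(self) -> str:
        # 5' -> 3': spacer, gRNA core, extension arm, optional 3' terminator.
        return self.spacer + self.grna_core + self.extension_arm() + self.terminator


# Placeholder sequences (illustration only; not real scaffold or target sequences).
example = PEgRNA(
    spacer="AAAA",
    grna_core="CCCC",
    dna_synthesis_template="GG",
    primer_binding_site="UU",
    terminator="UUUU",
)
print(example.sequence())
```

Keeping the components separate makes it straightforward to vary the length of the primer binding site or the DNA synthesis template independently when comparing designs.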
Homology
As used herein, the terms "homology," "homologous," or "percent homology" refer to the degree of sequence identity between an amino acid or polynucleotide sequence and a corresponding reference sequence. "Homology" can refer to polymeric sequences, e.g., polypeptide or DNA sequences that are similar. Homology can mean, for example, nucleic acid sequences with at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity. In other embodiments, a "homologous sequence" of a nucleic acid sequence may exhibit 93%, 95%, or 98% sequence identity to the reference nucleic acid sequence. For example, a "region of homology to a genomic region" can be a region of DNA that has a similar sequence to a given genomic region in the genome. A region of homology can be of any length that is sufficient to promote binding of a spacer or protospacer sequence to the genomic region. For example, the region of homology can comprise at least 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, or more bases in length, such that the region of homology has sufficient homology to bind to the corresponding genomic region.
When specifying a percentage of sequence homology or identity, in the case of two nucleic acid sequences or two polypeptide sequences, the percentage of homology or identity generally refers to an alignment of two or more sequences over a portion of their length when compared and aligned for maximum correspondence. When a position in the comparison sequence can be occupied by the same base or amino acid, then the molecules can be homologous at that position. Unless otherwise indicated, sequence homology or identity is assessed over a specified length of a nucleic acid, polypeptide, or portion thereof. In some embodiments, homology or identity is assessed over a functional portion or a designated portion of length.
Sequence alignment for assessing sequence homology can be performed by algorithms known in the art, such as the Basic Local Alignment Search Tool (BLAST) algorithm, described in Altschul et al., J. Mol. Biol. 215:403-410, 1990. A publicly available internet interface for performing BLAST analyses is accessible through the National Center for Biotechnology Information. Additional known algorithms include those published in: Smith & Waterman, "Comparison of Biosequences," Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," J. Mol. Biol. 48:443, 1970; Pearson & Lipman, "Improved tools for biological sequence comparison," Proc. Natl. Acad. Sci. USA 85:2444, 1988; or by automated implementation of these or similar algorithms. Global alignment programs may also be used to align similar sequences of roughly equal size. Examples of global alignment programs include NEEDLE (available at www.ebi.ac.uk/Tools/psa/emboss_needle), which is part of the EMBOSS package (Rice P. et al., Trends Genet., 2000; 16:276-277), and the GGSEARCH program, which is part of the FASTA package. Both of these programs are based on the Needleman-Wunsch algorithm, which is used to find the optimum alignment (including gaps) of two sequences along their entire length. A detailed discussion of sequence analysis can also be found in Unit 19.3 of Ausubel et al. ("Current Protocols in Molecular Biology," John Wiley & Sons Inc., 1994-1998, Chapter 15, 1998).
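By way of illustration only, a percent identity over a global alignment can be computed with a compact Needleman-Wunsch implementation such as the Python sketch below. The scoring values (match +1, mismatch -1, linear gap -1) and the choice to divide by the alignment length are assumptions made for this example; programs such as NEEDLE use substitution matrices and affine gap penalties and may report different values.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two sequences and return the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        raise ValueError("cannot compute identity for two empty sequences")
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGT", "ACGA"))
```

Other denominators (for example, the length of the shorter sequence or of a designated portion) are also in common use, which is one reason the length over which identity is assessed should be stated.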
The skilled artisan will appreciate that amino acid (or nucleotide) positions in homologous sequences can be determined based on alignment. For example, when aligned against a reference Cas9 sequence, "H840" in the reference Cas9 sequence may correspond to H839, or to another corresponding position, in a Cas9 homolog.
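The position mapping described above can be sketched as follows, assuming that a pairwise alignment has already been computed and is supplied as two equal-length gapped strings. The function name and the short toy alignment are hypothetical.

```python
from typing import Optional


def map_position(aligned_reference: str, aligned_homolog: str, reference_position: int) -> Optional[int]:
    """Map a 1-based reference position to the 1-based homolog position in the same alignment column.

    Returns None when the reference residue is aligned to a gap in the homolog.
    """
    if len(aligned_reference) != len(aligned_homolog):
        raise ValueError("aligned sequences must have the same length")
    reference_count = homolog_count = 0
    for ref_char, hom_char in zip(aligned_reference, aligned_homolog):
        if ref_char != "-":
            reference_count += 1
        if hom_char != "-":
            homolog_count += 1
        if ref_char != "-" and reference_count == reference_position:
            return homolog_count if hom_char != "-" else None
    raise ValueError("reference position lies outside the aligned reference sequence")


# Toy alignment in which the homolog lacks one residue upstream of the histidine,
# so reference position 4 (H) corresponds to homolog position 3 (H).
print(map_position("MKDHL", "M-DHL", 4))
```

The same shift by one position is what relates H840 in a reference sequence to H839 in a homolog that is one residue shorter upstream of that histidine.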
Host cells
As used herein, the term "host cell" refers to a cell that can contain, replicate and express a vector described herein, e.g., a vector comprising a nucleic acid molecule encoding an MLH1 variant and a fusion protein comprising Cas9 or a Cas9 equivalent and a reverse transcriptase.
Inhibit
As used herein, the terms "inhibit," "inhibiting," or "inhibition" in the context of proteins and enzymes, for example in the context of enzymes involved in the DNA mismatch repair pathway, refer to a reduction in the activity of the protein or enzyme. In some embodiments, the term refers to a reduction in the level of activity of an enzyme (e.g., the activity of one or more enzymes in the DNA mismatch repair pathway) to a level that is statistically significantly lower than an initial level, which may be, for example, a baseline level of enzyme activity. In some embodiments, the term refers to a reduction in the level of activity of an enzyme (e.g., the activity of one or more enzymes in the DNA mismatch repair pathway) to less than 75%, less than 50%, less than 40%, less than 30%, less than 25%, less than 20%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, less than 1%, less than 0.5%, less than 0.1%, less than 0.01%, less than 0.001%, or less than 0.0001% of an initial level, which may be, for example, a baseline level of enzyme activity.
Linker
As used herein, the term "linker" refers to a molecule that connects two other molecules or moieties. In the case of a linker joining two proteins of a fusion protein, the linker may be an amino acid sequence. For example, Cas9 may be fused to a reverse transcriptase by an amino acid linker sequence. In the case of joining two nucleotide sequences together, the linker may also be a nucleotide sequence. For example, in the present case, a traditional guide RNA is linked by a spacer or linker nucleotide sequence to the RNA extension of a prime editing guide RNA, which may comprise an RT template sequence and an RT primer binding site. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5 to 100 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. In certain embodiments, the linker is a self-hydrolyzing linker (e.g., a 2A self-cleaving peptide as further described herein). Self-hydrolyzing linkers (e.g., 2A self-cleaving peptides) are capable of inducing ribosomal skipping during protein translation, such that the ribosome fails to form a peptide bond between two genes or gene fragments.
MLH1
The term "MLH1" refers to a gene encoding the DNA mismatch repair enzyme MLH1 (or MutL homolog 1). The protein encoded by this gene can heterodimerize with the mismatch repair endonuclease PMS2 to form MutL alpha (MutLα), which is part of the DNA mismatch repair system. MLH1 mediates protein-protein interactions during mismatch recognition, strand discrimination, and strand removal. In mismatch repair, the heterodimer MSH2:MSH6 (MutSα) forms and binds the mismatch. MLH1 then forms a heterodimer with PMS2 (MutLα) and binds the MSH2:MSH6 heterodimer. The MutLα heterodimer then incises the nicked strand 5' and 3' of the mismatch, and EXO1 excises the mismatch from the MutLα-generated nicks. Finally, POLδ resynthesizes the excised strand, followed by ligation by LIG1.
An exemplary amino acid sequence for MLH1 is human isoform 1, P40692-1: sp|P40692|MLH1_HUMAN DNA mismatch repair protein Mlh1 OS=Homo sapiens OX=9606 GN=MLH1 PE=1 SV=1:
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 204.
Another exemplary amino acid sequence for MLH1 is human isoform 2, P40692-2 (where amino acids 1-241 of isoform 1 are deleted): sp|P40692-2|MLH1_HUMAN DNA mismatch repair protein Mlh1 isoform 2 OS=Homo sapiens OX=9606 GN=MLH1:
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 205.
Another exemplary amino acid sequence of MLH1 is human isoform 3, P40692-3 (where amino acids 1-101 (MSFVAGVIRR … ASISTYGFRG (SEQ ID NO: 206) are replaced by MAF): sp|P40692-2|MLH1_HUMAN DNA mismatch repair protein Mlh1 isoform 2 OS = homo sapiens OX = 9606 GN = MLH1:
or with SEQ ID NO:207 has an amino acid sequence of at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity.
The present disclosure contemplates targeting MLH1 and/or MMR pathway components that interact with MLH1, including any wild-type or naturally occurring variant of MLH1, including any amino acid sequence having at least 70%, 75%, 80%, 90%, 95%, 99% or more sequence identity with any of SEQ ID NOs: 204, 208-213, 215, 216, 218, 222, or 223, or a nucleic acid molecule encoding any MLH1 or variant of MLH1 (e.g., a dominant negative mutant of MLH1 as described herein), for inhibiting, blocking, or otherwise inactivating wild-type MLH1 function in the MMR pathway, and thereby inhibiting, blocking, or otherwise inactivating the MMR pathway, e.g., during genome editing using a guided editor.
In some embodiments, inactivation of the MMR pathway involves an inhibitor that disrupts, blocks, interferes with, or otherwise inactivates the wild-type function of the MLH1 protein. In some embodiments, inactivation of the MMR pathway involves a mutant of the MLH1 protein, e.g., contacting a target cell with the MLH1 mutant protein or expressing in the target cell a nucleic acid encoding the MLH1 mutant protein. In some embodiments, the MLH1 mutant protein interferes with, and thereby inactivates, the function of the wild-type MLH1 protein in the MMR pathway. In some embodiments, the MLH1 mutant is a dominant negative mutant. In some embodiments, the MLH1 mutant protein is capable of binding an MLH1-interacting protein, e.g., MutS.
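The sequence-identity thresholds recited for these sequences (e.g., at least 70% identity with SEQ ID NO: 204) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it counts identity as matching columns divided by aligned columns, which is only one of several conventions in use. The function name and the candidate peptide are hypothetical; the reference peptide is the ten-residue N-terminal stretch quoted above for isoform 1.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap; columns where both sequences are gapped are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    # A column counts as a match only if both residues are present and equal.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


reference = "MSFVAGVIRR"   # N-terminal residues quoted above for isoform 1
candidate = "MSFVAGVIKR"   # made-up variant with one substitution
identity = percent_identity(reference, candidate)
meets_70 = identity >= 70.0
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only shows how a threshold such as "at least 70%" is applied once an alignment exists.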
Without being limited by theory, the dominant negative mutant of MLH1 acts by saturating binding of MutS, thereby blocking MutS-wild-type MLH1 binding and interfering with the function of the wild-type MLH1 protein in the MMR pathway.
In various embodiments, dominant negative MLH1 may include, for example, MLH1 E34A, which is based on the amino acid sequence of SEQ ID NO: 222 and has the following amino acid sequence (the E34A mutation is underlined and in bold):
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 222.
In various other embodiments, dominant negative MLH1 may include, for example, MLH1 Δ756, which is based on the amino acid sequence of SEQ ID NO: 208 and has the following amino acid sequence (the Δ756 mutation at the C-terminus of the sequence is underlined and in bold):
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 208 (wherein [ - ] represents an amino acid residue deleted relative to the parent or wild-type sequence).
In other embodiments, dominant negative MLH1 may include, for example, MLH1 Δ754-Δ756, which is based on SEQ ID NO: 209 and has the following amino acid sequence (the Δ754-Δ756 mutations at the C-terminus of the sequence are underlined and in bold):
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 209 (wherein [ - - ] represents amino acid residues deleted relative to the parent or wild-type sequence).
In other embodiments, dominant negative MLH1 may include, for example, MLH1 E34A Δ754-Δ756, which is based on SEQ ID NO: 210 and has the following amino acid sequence (the E34A and Δ754-Δ756 mutations are underlined and in bold):
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 210.
In certain embodiments, dominant negative MLH1 may comprise, for example, MLH1 1-335, which is based on the amino acid sequence of SEQ ID NO: 211 and has the following amino acid sequence (containing amino acids 1 to 335 of SEQ ID NO: 204):
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 211.
In other embodiments, dominant negative MLH1 may comprise, for example, MLH1 1-335 E34A, which is based on the amino acid sequence of SEQ ID NO: 212 and has the following amino acid sequence (containing amino acids 1 to 335 of SEQ ID NO: 204 and the E34A mutation relative to SEQ ID NO: 204):
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 212.
In other embodiments, dominant negative MLH1 may include, for example, MLH1 1-335 NLS SV40 (or MLH1dn NTD), which is based on SEQ ID NO: 204 and has the following amino acid sequence (containing amino acids 1-335 of SEQ ID NO: 204 and the NLS sequence of SV40):
(wherein underlined and bold text indicates the NLS of SV40), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 213.
In other embodiments, dominant negative MLH1 may include, for example, MLH1 1-335 with a replacement NLS, which is based on SEQ ID NO: 204 and has the following amino acid sequence (containing amino acids 1-335 of SEQ ID NO: 204 and the replacement NLS sequence):
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 214. The replacement NLS sequence may be any suitable NLS sequence including, but not limited to:
In some embodiments, the NLS sequence is appended to the N-terminus of the protein and begins with methionine ("M"). In other embodiments, the NLS sequence may be appended to the C-terminus of the protein, or placed between domains of the fusion protein, and does not start with methionine (e.g., when the NLS is appended to the C-terminus or placed between two domains of the fusion protein, the M in SEQ ID NOs: 101, 1 and 134 is not included in the NLS).
In other embodiments, dominant negative MLH1 may comprise, for example, MLH1 501-756, which is a fragment of SEQ ID NO: 204 corresponding to amino acids 501-756 of SEQ ID NO: 204:
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 215.
In other embodiments, dominant negative MLH1 may comprise, for example, MLH1 501-753, which is a fragment of SEQ ID NO: 204 corresponding to amino acids 501-753 of SEQ ID NO: 204:
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 216.
In other embodiments, dominant negative MLH1 may include, for example, MLH1 461-756, which is a C-terminal fragment of SEQ ID NO: 204 corresponding to amino acids 461-756 of SEQ ID NO: 204:
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 217.
In various embodiments, dominant negative MLH1 may include, for example, MLH1 461-753, which is a C-terminal fragment of SEQ ID NO: 204 corresponding to amino acids 461-753 of SEQ ID NO: 204:
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 218.
In various other embodiments, dominant negative MLH1 may include, for example, MLH1 461-753, which is a C-terminal fragment of SEQ ID NO: 204 corresponding to amino acids 461-753 of SEQ ID NO: 204 and which further comprises an N-terminal NLS, e.g., NLS SV40:
or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 218. The NLS sequence may be any suitable NLS sequence, including but not limited to SEQ ID NOs: 1, 101, 103, and 133-139.
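The variant names used in this section (a point substitution such as E34A, a C-terminal deletion such as Δ754-756, and fragments such as 1-335 or 501-756) all describe simple operations on a parent sequence with 1-based residue numbering. The following is a minimal sketch of those operations. The helper names are hypothetical, and the parent sequence is a synthetic 756-residue placeholder with a glutamate at position 34; it is not the MLH1 sequence of SEQ ID NO: 204.

```python
import re


def substitute(seq: str, mutation: str) -> str:
    """Apply a point substitution written like 'E34A' (1-based position)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    old, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if pos < 1 or pos > len(seq) or seq[pos - 1] != old:
        raise ValueError(f"residue {pos} of the parent is not {old}")
    return seq[:pos - 1] + new + seq[pos:]


def fragment(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive), e.g. 1-335 or 501-756."""
    return seq[start - 1:end]


def delete(seq: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive), e.g. 754-756."""
    return seq[:start - 1] + seq[end:]


# Synthetic placeholder parent: 756 residues, 'E' at position 34 (NOT real MLH1).
parent = "M" + "G" * 32 + "E" + "G" * 722

e34a = substitute(parent, "E34A")
del_754_756 = delete(parent, 754, 756)
e34a_del = delete(e34a, 754, 756)
ntd_1_335 = fragment(parent, 1, 335)
ctd_501_756 = fragment(parent, 501, 756)
ctd_461_753 = fragment(parent, 461, 753)
```

Checking the parent residue before substituting guards against applying a mutation name to a sequence with different numbering, such as one of the shorter isoforms.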
napDNAbp
As used herein, the term "nucleic acid-programmable DNA-binding protein" or "napDNAbp" (of which Cas9 is an example) refers to a protein that uses RNA:DNA hybridization to target and bind to specific sequences in a DNA molecule. Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence comprising a DNA strand (i.e., target strand) complementary to the guide nucleic acid or a portion thereof (e.g., the pre-spacer of the guide RNA). In other words, the guide nucleic acid "programs" the napDNAbp (e.g., Cas9 or equivalent) to locate and bind the complementary sequence.
Without being bound by theory, the binding mechanism of the napDNAbp-guide RNA complex generally includes a step of forming an R loop, whereby napDNAbp induces unwinding of the double stranded DNA target, thereby separating the strands in the region bound by napDNAbp. The guide RNA pre-spacer sequence is then hybridized to the "target strand". This displaces the "non-target strand" that is complementary to the target strand, forming the single-stranded region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cleave DNA, leaving behind various types of lesions. For example, napDNAbp can comprise nuclease activity that cleaves a non-target strand at a first location and/or cleaves a target strand at a second location. Depending on the nuclease activity, the target DNA may be cleaved to form a "double strand break," thereby cleaving both strands. In other embodiments, the target DNA may be cleaved at only a single location, i.e., the DNA is "nicked" on one strand. Exemplary napDNAbp with different nuclease activities include "Cas9 nickase" ("nCas 9") and inactive Cas9 without nuclease activity ("dead Cas9" or "dCas 9"). Exemplary sequences of these and other napDNAbp are provided herein.
Nickase
As used herein, the term "nickase" may refer to Cas9 with one of the two nuclease domains inactivated. This enzyme cleaves only one strand of the target DNA. As used herein, "nickase" can refer to a napDNAbp (e.g., a Cas protein) that is capable of cleaving only one of the two complementary strands of a double-stranded target DNA sequence, thereby creating a nick in that strand. In some embodiments, the nickase cleaves a non-target strand of a double-stranded target DNA sequence. In some embodiments, the nickase comprises an amino acid sequence having one or more mutations in the catalytic domain of a canonical napDNAbp (e.g., a Cas protein), wherein the one or more mutations reduce or eliminate nuclease activity of the catalytic domain. In certain embodiments, the napDNAbp is a Cas9 nickase, a Cas12a nickase, or a Cas12b1 nickase. In some embodiments, the nickase is Cas9 comprising one or more mutations in the RuvC-like domain relative to the wild-type Cas9 sequence or relative to equivalent amino acid positions in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is Cas9 comprising one or more mutations in the HNH-like domain relative to the wild-type Cas9 sequence or relative to equivalent amino acid positions in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is Cas9 comprising an aspartic acid-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 relative to a canonical Cas9 sequence or relative to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is Cas9 comprising an H840A, N854A, and/or N863A mutation relative to a canonical Cas9 sequence or relative to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the term "Cas9 nickase" refers to Cas9 with one of the two nuclease domains inactivated. This enzyme cleaves only one strand of the target DNA.
In some embodiments, the nickase is a Cas protein that is not a Cas9 nickase.
In some embodiments, the napDNAbp of the guided editing complex comprises an endonuclease with nucleic acid programmable DNA binding capability. In some embodiments, the napDNAbp comprises an active endonuclease capable of cleaving both strands of double-stranded target DNA. In some embodiments, the napDNAbp is a nuclease-active endonuclease, such as a nuclease-active Cas protein, that can cleave both strands of double-stranded target DNA by creating a nick on each strand. For example, a nuclease-active Cas protein can generate a cut (nick) on each strand of a double-stranded target DNA. In some embodiments, the two nicks on the two strands are staggered nicks, e.g., generated by a napDNAbp comprising Cas12a or Cas12b1. In some embodiments, the two nicks on the two strands are located at the same genomic position, e.g., generated by a napDNAbp comprising a nuclease-active Cas9. In some embodiments, the napDNAbp comprises an endonuclease that is a nickase. For example, in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that reduce the nuclease activity of the endonuclease, thereby making it a nickase. In some embodiments, the napDNAbp comprises an inactive endonuclease. For example, in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that eliminate nuclease activity. In various embodiments, the napDNAbp is a Cas9 protein or variant thereof. The napDNAbp can also be a nuclease-active Cas9, a nuclease-inactive Cas9 (dCas9), or a Cas9 nickase (nCas9). In a preferred embodiment, the napDNAbp is a Cas9 nickase (nCas9) that nicks only a single strand. In certain embodiments, the napDNAbp is a Cas9 nickase, a Cas12a nickase, or a Cas12b1 nickase.
In some embodiments, the napDNAbp may be selected from the following: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity such that only one strand is cleaved. In some embodiments, the napDNAbp is selected from Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity such that one DNA strand is cleaved in preference to the other DNA strand.
Incision site
The terms "cleavage site", "nick site" and "nicking site" are used interchangeably herein in the context of guided editing to refer to a specific position between two nucleotides or two base pairs in a double-stranded target DNA sequence. In some embodiments, the position of the nick site is determined relative to the position of a particular PAM sequence. In some embodiments, the nick site is the specific position at which a nick will occur when the double-stranded target DNA is contacted with a napDNAbp (e.g., a nickase such as a Cas nickase) that recognizes a specific PAM sequence. For each PEgRNA described herein, the nick site is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and is characteristic of the particular PAM required for recognition and function of the napDNAbp. For example, for a PEgRNA comprising a gRNA core that associates with SpCas9, the nick site is located at the phosphodiester bond between base three (position "-3" relative to position 1 of the PAM sequence) and base four (position "-4" relative to position 1 of the PAM sequence).
In some embodiments, the nick site is located in the target strand of the double-stranded target DNA sequence. In some embodiments, the nick site is located in the non-target strand of the double-stranded target DNA sequence. In some embodiments, the nick site is located in the pre-spacer sequence. In some embodiments, the nick site is adjacent to the pre-spacer sequence. In some embodiments, the nick site is located downstream of a region complementary to the primer binding site of the PEgRNA, e.g., on the non-target strand. In some embodiments, the nick site is located downstream of a region that binds to the primer binding site of the PEgRNA, e.g., on the non-target strand. In some embodiments, the nick site is immediately downstream of the region complementary to the primer binding site of the PEgRNA, e.g., on the non-target strand. In some embodiments, the nick site is located upstream of a specific PAM sequence on the non-target strand of the double-stranded target DNA, wherein the PAM sequence is specific for recognition by the napDNAbp that associates with the gRNA core of the PEgRNA. In some embodiments, the nick site is located downstream of a specific PAM sequence on the non-target strand of the double-stranded target DNA, wherein the PAM sequence is specific for recognition by the napDNAbp that associates with the gRNA core of the PEgRNA. In some embodiments, the nick site is located 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Streptococcus pyogenes Cas9 nickase, a P. lavamentivorans Cas9 nickase, a C. diphtheriae Cas9 nickase, an N. cinerea Cas9, an S. aureus Cas9, or an N. lari Cas9 nickase. In some embodiments, the nick site is located 3 nucleotides upstream of the PAM sequence and the PAM sequence is recognized by a Cas9 nickase, wherein the Cas9 nickase comprises a nuclease-active HNH domain and a nuclease-inactive RuvC domain.
In some embodiments, the nick site is located 2 base pairs upstream of the PAM sequence and the PAM sequence is recognized by a Streptococcus thermophilus Cas9 nickase.
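The nick-site positions described above can be expressed as a simple index calculation on the PAM-containing (non-target) strand, written 5' to 3'. The sketch below is illustrative only: the function name, the `offset` parameter, and the example sequence are hypothetical. An offset of 3 corresponds to the SpCas9 case (nick between positions -4 and -3 relative to the first PAM nucleotide), and an offset of 2 corresponds to the Streptococcus thermophilus case mentioned above.

```python
def split_at_nick(pam_strand: str, pam_start: int, offset: int = 3) -> tuple[str, str]:
    """Split the PAM-containing strand at the nick site.

    pam_strand: non-target strand, 5'->3'.
    pam_start:  0-based index of the first PAM nucleotide (PAM position 1).
    offset:     number of nucleotides between the nick and the PAM
                (3 for the SpCas9 case, 2 for the S. thermophilus case above).
    Returns (upstream, downstream): the upstream part ends in the free 3' end
    left by the nick; the downstream part starts with the nucleotide at
    position -offset relative to the PAM.
    """
    if offset < 0 or pam_start < offset or pam_start > len(pam_strand):
        raise ValueError("PAM position and offset are inconsistent with the strand")
    nick = pam_start - offset
    return pam_strand[:nick], pam_strand[nick:]


# Hypothetical 20-nt pre-spacer followed by a TGG PAM and two flanking bases.
pre_spacer = "GACCTGAAGTCATCGGATCA"
strand = pre_spacer + "TGG" + "AC"
upstream, downstream = split_at_nick(strand, pam_start=len(pre_spacer))
```

With these inputs the nick falls after the seventeenth pre-spacer nucleotide, so the last three pre-spacer nucleotides and the PAM lie on the downstream side.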
Nucleic acid molecules
As used herein, the term "nucleic acid" refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolopyrimidine, 3-methyladenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyluridine, C5-propynylcytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methylguanosine, N6-methyladenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, 2'-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).
p53
As used herein, "p53" refers to tumor protein 53. Among other functions, p53 plays a role in DNA damage and repair, particularly in regulating the cell cycle, apoptosis, and genomic stability. When DNA is damaged, p53 can activate DNA repair proteins. p53 can also arrest cell growth by holding the cell cycle at the G1/S checkpoint upon recognition of DNA damage, giving DNA repair proteins more time to repair the damage before the cell is allowed to continue through the cell cycle. Thus, in some embodiments of the methods described herein, p53 is inhibited (e.g., by the p53 inhibitor protein "i53" or another p53 inhibitor) to increase the efficiency of guided editing.
PEgRNA
As used herein, the term "guided editing guide RNA" or "PEgRNA" or "extended guide RNA" refers to a particular form of guide RNA that has been modified to include one or more additional sequences for use in practicing the guided editing methods and compositions described herein. As described herein, the guided editing guide RNA comprises one or more "extension regions" of a nucleic acid sequence. The extension region may comprise, but is not limited to, single-stranded RNA or DNA. In addition, the extension region may be present at the 3' end of a traditional guide RNA. In other configurations, the extension region may be present at the 5' end of a traditional guide RNA. In still other configurations, the extension region may be present in an intramolecular region of a traditional guide RNA, e.g., in the gRNA core region that associates with and/or binds to the napDNAbp. The extension region comprises a "DNA synthesis template" that encodes (via the polymerase of the guided editor) a single-stranded DNA that is in turn designed to (a) be homologous to the endogenous target DNA to be edited, and (b) comprise at least one desired nucleotide change (e.g., a transition, a transversion, a deletion, or an insertion) to be introduced or integrated into the endogenous target DNA. The extension region may also comprise other functional sequence elements such as, but not limited to, a "primer binding site" and a "spacer or linker" sequence, or additional structural elements such as, but not limited to, an aptamer, a stem loop, a hairpin, a toeloop (e.g., a 3' toeloop), or an RNA-protein recruitment domain (e.g., an MS2 hairpin). As used herein, a "primer binding site" comprises a sequence that hybridizes to a single-stranded DNA sequence having a 3' end generated from the nicked DNA of the R-loop.
In certain embodiments, the PEgRNA has a 5' extension arm, a spacer region, and a gRNA core. The 5' extension arm further comprises, in the 5' to 3' direction, a reverse transcriptase template, a primer binding site, and a linker. The reverse transcriptase template may also be referred to more broadly as a "DNA synthesis template" where the polymerase of the guided editor described herein is not an RT but another type of polymerase.
In certain other embodiments, the PEgRNA has a 5' extension arm, a spacer region, and a gRNA core. The 5' extension arm further comprises, in the 5' to 3' direction, a reverse transcriptase template, a primer binding site, and a linker. The reverse transcriptase template may also be referred to more broadly as a "DNA synthesis template" where the polymerase of the guided editor described herein is not an RT but another type of polymerase.
In other embodiments, the PEgRNA has, in the 5' to 3' direction, a spacer (1), a gRNA core (2), and an extension arm (3). The extension arm (3) is located at the 3' end of the PEgRNA. The extension arm (3) further comprises, in the 5' to 3' direction, a "homology arm", an "editing template", and a "primer binding site". In certain embodiments, the PEgRNA comprises, from 5' to 3', a spacer region, a DNA synthesis template, and a primer binding site. The extension arm (3) may also comprise optional modification regions at its 3' and 5' ends, which may be the same sequence or different sequences. In addition, the 3' end of the PEgRNA may comprise a transcription terminator sequence. These sequence elements of the PEgRNA are further described and defined herein.
In other embodiments, the PEgRNA has, in the 5' to 3' direction, an extension arm (3), a spacer (1), and a gRNA core (2). The extension arm (3) is located at the 5' end of the PEgRNA. The extension arm (3) further comprises, in the 5' to 3' direction, a "homology arm", an "editing template", and a "primer binding site". The extension arm (3) may also comprise optional modification regions at its 3' and 5' ends, which may be the same sequence or different sequences. The PEgRNA may also comprise a transcription terminator sequence at its 3' end. These sequence elements of the PEgRNA are further described and defined herein.
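The 3'-extension layout described above (spacer, gRNA core, then an extension arm consisting of a DNA synthesis template followed by a primer binding site) can be sketched as a small data structure. The sketch assumes an SpCas9-style nick 3 nucleotides upstream of the PAM, takes the primer binding site as the reverse complement of the nicked-strand sequence immediately upstream of the nick, and takes the DNA synthesis template as the reverse complement of the desired edited sequence downstream of the nick. All names, lengths, and sequences are hypothetical, and the gRNA core is a placeholder string rather than a real scaffold.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    return dna.replace("T", "U")


@dataclass(frozen=True)
class PegRNA:
    spacer: str
    grna_core: str
    dna_synthesis_template: str
    primer_binding_site: str

    def sequence(self) -> str:
        """5'->3' sequence for the 3'-extension layout."""
        return (self.spacer + self.grna_core
                + self.dna_synthesis_template + self.primer_binding_site)


def design_pegrna(pre_spacer: str, edited_downstream: str, grna_core: str,
                  pbs_len: int = 13, nick_offset: int = 3) -> PegRNA:
    """pre_spacer: PAM-strand sequence (5'->3') ending just before the PAM.
    edited_downstream: desired PAM-strand sequence starting at the nick."""
    nick = len(pre_spacer) - nick_offset
    if pbs_len > nick:
        raise ValueError("primer binding site longer than sequence upstream of nick")
    primer = pre_spacer[nick - pbs_len:nick]  # 3' end of the nicked strand
    return PegRNA(
        spacer=to_rna(pre_spacer),
        grna_core=grna_core,
        dna_synthesis_template=to_rna(reverse_complement(edited_downstream)),
        primer_binding_site=to_rna(reverse_complement(primer)),
    )


# Hypothetical target: the original sequence downstream of the nick is TCATGGAC;
# the desired edit changes the first PAM base (TGG -> AGG).
peg = design_pegrna(
    pre_spacer="GACCTGAAGTCATCGGATCA",
    edited_downstream="TCAAGGAC",
    grna_core="N" * 10,  # placeholder, not a real gRNA core
)
```

Because a polymerase extends the primer 3' end while reading the template toward its 5' end, placing the template 5' of the primer binding site means the newly synthesized DNA is the reverse complement of the template, i.e., the edited sequence supplied by the caller.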
PE1
As used herein, "PE1" refers to a guided editor system comprising a fusion protein comprising Cas9 (H840A) and a wild-type MMLV RT having the structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)] + a desired PEgRNA, wherein the PE fusion has the sequence of SEQ ID NO: 100, as shown below;
Key:
Nuclear localization sequence (NLS): start (SEQ ID NO: 101), end (SEQ ID NO: 103)
Cas9 (H840A) (SEQ ID NO: 37, which is identical to SEQ ID NO: 2 except for the H840A substitution)
33-amino acid linker (SEQ ID NO: 102)
M-MLV reverse transcriptase (SEQ ID NO: 81).
Alternatively, PE1 may also refer to the fusion protein of SEQ ID NO: 100 alone, i.e., without a PEgRNA complexed to it. PE1 can be complexed with a PEgRNA during operation and/or use in guided editing.
PE2
As used herein, "PE2" refers to a guided editor system comprising a fusion protein comprising Cas9 (H840A) and a variant MMLV RT having the structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)] + a desired PEgRNA, wherein the PE fusion has the amino acid sequence of SEQ ID NO: 107, as shown below;
Key:
Nuclear localization sequence (NLS): start (SEQ ID NO: 101), end (SEQ ID NO: 103)
Cas9 (H840A) (SEQ ID NO: 37)
33-amino acid linker (SEQ ID NO: 102)
M-MLV reverse transcriptase (SEQ ID NO: 98).
Alternatively, PE2 may also refer to the fusion protein of SEQ ID NO: 107 alone, i.e., without a PEgRNA complexed to it. PE2 can be complexed with a PEgRNA during operation and/or use in guided editing.
PE3
As used herein, "PE3" refers to PE2 plus a second-strand nicking guide RNA that complexes with PE2 and introduces a nick in the non-edited DNA strand in order to induce preferential replacement of the edited strand.
PE3b
As used herein, "PE3b" refers to PE3 in which the second-strand nicking guide RNA is designed for temporal control, such that the second-strand nick is not introduced until after the desired edit has been installed. This is achieved by designing the gRNA with a spacer sequence that matches only the edited strand, not the original allele. With this strategy (referred to hereinafter as PE3b), mismatches between the pre-spacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand has taken place.
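The PE3b design rule described above (the nicking spacer matches the edited sequence but not the original allele) can be checked with a minimal sketch. It treats a "match" as an exact occurrence of the spacer on either strand of the supplied region and ignores PAM requirements and mismatch tolerance, both of which a real design would need to consider. The function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def occurs_on_either_strand(spacer: str, region: str) -> bool:
    """True if the spacer appears exactly on the given strand or its complement."""
    return spacer in region or spacer in reverse_complement(region)


def is_pe3b_spacer(spacer: str, original: str, edited: str) -> bool:
    """PE3b-style spacer: matches the edited sequence but not the original allele."""
    return (occurs_on_either_strand(spacer, edited)
            and not occurs_on_either_strand(spacer, original))


# Hypothetical alleles differing at one position (T -> A at index 20).
original = "GACCTGAAGTCATCGGATCATGGAC"
edited = "GACCTGAAGTCATCGGATCAAGGAC"

# A spacer on the opposite strand that spans the edited position.
spanning_spacer = reverse_complement(edited)[:20]
# A spacer on the opposite strand that does not overlap the edit.
flanking_spacer = reverse_complement(original[:20])

spanning_ok = is_pe3b_spacer(spanning_spacer, original, edited)
flanking_ok = is_pe3b_spacer(flanking_spacer, original, edited)
```

A spacer that does not overlap the edit matches both alleles and therefore behaves like an ordinary PE3 nicking guide rather than a PE3b guide.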
PE4
As used herein, "PE4" refers to a system comprising PE2 plus a trans-expressed MLH1 dominant-negative protein (i.e., wild-type MLH1 with amino acids 754-756 truncated, as further described herein). In some embodiments, PE4 refers to a fusion protein comprising PE2 and MLH1 dominant negative proteins linked via an optional linker.
PE5
As used herein, "PE5" refers to a system comprising PE3 plus a trans-expressed MLH1 dominant-negative protein (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to as "MLH1 Δ754-756" or "MLH1dn"), as further described herein. In some embodiments, PE5 refers to a fusion protein comprising PE3 and an MLH1 dominant-negative protein joined via an optional linker.
PEmax
As used herein, "PEmax" (see FIG. 54B) refers to a PE complex comprising a fusion protein comprising Cas9 (R221K N394K H840A) and a variant MMLV RT pentamutant (D200N T306K W313F T330P L603W) having the following structure: [bipartite NLS]-[Cas9(R221K)(N394K)(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)]-[bipartite NLS]-[NLS] + a desired PEgRNA, wherein the PE fusion has the sequence of SEQ ID NO: 99.
PE4max
As used herein, "PE4max" refers to PE4, but wherein the PE2 component is replaced by PEmax.
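The relationships among the systems defined above can be summarized as a small lookup table. This is only an organizational sketch: the class and field names are hypothetical, and each entry simply restates the definitions given in this section (PE3 adds a second-strand nicking guide RNA to PE2; PE4 and PE5 add the MLH1 dominant-negative protein to PE2 and PE3, respectively; PE4max replaces the PE2 component of PE4 with PEmax).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingSystem:
    editor: str          # editor protein component: "PE2" or "PEmax"
    nicking_grna: bool   # second-strand nicking guide RNA present
    mlh1dn: bool         # MLH1 dominant-negative protein present


SYSTEMS = {
    "PE2": EditingSystem(editor="PE2", nicking_grna=False, mlh1dn=False),
    "PE3": EditingSystem(editor="PE2", nicking_grna=True, mlh1dn=False),
    "PE4": EditingSystem(editor="PE2", nicking_grna=False, mlh1dn=True),
    "PE5": EditingSystem(editor="PE2", nicking_grna=True, mlh1dn=True),
    "PE4max": EditingSystem(editor="PEmax", nicking_grna=False, mlh1dn=True),
}

# Systems that include MMR inhibition via the MLH1 dominant-negative protein.
mmr_inhibiting = sorted(name for name, s in SYSTEMS.items() if s.mlh1dn)
```

PE3b is not listed separately because, as defined above, it differs from PE3 only in how the nicking guide RNA spacer is chosen.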
PE-short
As used herein, "PE-short" refers to a PE construct fused to a C-terminally truncated reverse transcriptase and having the following amino acid sequence:
Key:
Nuclear localization sequence (NLS): start (SEQ ID NO: 101), end (SEQ ID NO: 103)
Cas9 (H840A) (SEQ ID NO: 37)
33-amino acid linker 1 (SEQ ID NO: 102)
M-MLV truncated reverse transcriptase (SEQ ID NO: 80)
Polymerase enzyme
As used herein, the term "polymerase" refers to an enzyme that synthesizes a nucleotide strand and that can be used in conjunction with the guided editor systems described herein. The polymerase may be a "template-dependent" polymerase (i.e., a polymerase that synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). The polymerase may also be a "template-independent" polymerase (i.e., a polymerase that synthesizes a nucleotide strand without requiring a template strand). A polymerase may be further classified as a "DNA polymerase" or an "RNA polymerase". In various embodiments, the guided editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase may be a "DNA-dependent DNA polymerase" (i.e., whereby the template molecule is a DNA strand). In this case, the DNA template molecule may be a PEgRNA in which the extension arm comprises a DNA strand. Such a PEgRNA may be referred to as a chimeric or hybrid PEgRNA, which comprises an RNA portion (i.e., the guide RNA components, including the spacer region and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase may be an "RNA-dependent DNA polymerase" (i.e., whereby the template molecule is an RNA strand). In this case, the PEgRNA is RNA, i.e., it comprises an RNA extension. The term "polymerase" may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., polymerase activity). Typically, the enzyme will begin synthesis at the 3' end of a primer annealed to a polynucleotide template sequence (e.g., a primer sequence annealed to the primer binding site of the PEgRNA) and will proceed toward the 5' end of the template strand. A "DNA polymerase" catalyzes the polymerization of deoxynucleotides. As used herein with respect to a DNA polymerase, the term DNA polymerase includes a "functional fragment thereof".
A "functional fragment thereof" refers to any portion of a wild-type or mutant DNA polymerase that comprises less than the complete amino acid sequence of the polymerase and that retains the ability to catalyze the polymerization of a polynucleotide under at least one set of conditions. Such a functional fragment may exist as a separate entity, or it may be a component of a larger polypeptide (e.g., a fusion protein).
Guided editing
As used herein, the term "guided editing" refers to a method of gene editing using napDNAbps, a polymerase (e.g., a reverse transcriptase), and a specialized guide RNA that includes a DNA synthesis template for encoding desired new genetic information (or deleting genetic information), which is then incorporated into a target DNA sequence. Some embodiments of guided editing are described in the embodiment of FIG. 1. Classical guided editing is described in the inventors' publication Anzalone, A. V. et al., "Search-and-replace genome editing without double-strand breaks or donor DNA," Nature 576, 149-157 (2019), which is incorporated herein by reference in its entirety.
Guided editing represents a versatile and precise genome editing platform that uses a nucleic acid programmable DNA binding protein ("napDNAbp") working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp) to write new genetic information directly into a specified DNA site, wherein the guided editing system is programmed with a guided editing (PE) guide RNA ("PEgRNA") that both specifies the target site and templates the synthesis of the desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto the guide RNA (e.g., at the 5' or 3' end, or at an internal portion, of the guide RNA). The replacement strand containing the desired edit (e.g., a single nucleobase substitution) shares the same (or homologous) sequence as the endogenous strand of the target site to be edited (immediately downstream of the nick site), except that it includes the desired edit. Through DNA repair and/or replication machinery, the endogenous strand downstream of the nick site is replaced by the newly synthesized replacement strand containing the desired edit. In some cases, guided editing may be thought of as a "search-and-replace" genome editing technology, since the guided editors described herein not only search for and locate the desired target site to be edited but also, at the same time, encode a replacement strand containing the desired edit, which is installed in place of the corresponding endogenous DNA strand of the target site. The guided editors of the present disclosure relate, in part, to the mechanism of target-primed reverse transcription (TPRT), which can be engineered for CRISPR/Cas-based precision genome editing with high efficiency and genetic flexibility. TPRT is used by mobile DNA elements such as mammalian non-LTR retrotransposons and bacterial group II introns.
The inventors herein use Cas protein-reverse transcriptase fusions or related systems to target a specific DNA sequence with a guide RNA, generate a single-stranded nick at the target site, and use the nicked DNA as a primer for reverse transcription of an engineered reverse transcriptase template that is integrated with the guide RNA. However, while the concept begins with guide editors that use reverse transcriptase as the DNA polymerase component, the guide editors described herein are not limited to reverse transcriptases, but can include the use of nearly any DNA polymerase. Indeed, although the present application may refer throughout to a guided editor having a "reverse transcriptase," it is set forth here that reverse transcriptase is but one type of DNA polymerase that can function in guided editing. Thus, wherever the specification refers to "reverse transcriptase," one of ordinary skill in the art will appreciate that any suitable DNA polymerase may be used in place of reverse transcriptase. Thus, in one aspect, the guide editor can comprise Cas9 (or an equivalent napDNAbp) that is programmed to target a DNA sequence by associating it with a specialized guide RNA (i.e., PEgRNA) comprising a spacer sequence that anneals to a complementary pre-spacer in the target DNA. The specialized guide RNA also contains new genetic information in the form of an extension that encodes a replacement DNA strand containing the desired genetic change, which is used to replace the corresponding endogenous DNA strand at the target site. To transfer information from the PEgRNA to the target DNA, the guided editing mechanism involves nicking the target site on one strand of the DNA to expose a 3'-hydroxyl. The exposed 3'-hydroxyl can then be used to prime DNA polymerization of the edit-encoding extension on the PEgRNA directly at the target site. In various embodiments, the extension (which provides the template for polymerization of the replacement strand containing the edit) may be formed from RNA or DNA.
In the case of an RNA extension, the polymerase of the guide editor may be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). In the case of a DNA extension, the polymerase of the guide editor may be a DNA-dependent DNA polymerase. The newly synthesized strand formed by the guided editor disclosed herein (i.e., the replacement DNA strand containing the desired edit) will be homologous to (i.e., have the same sequence as) the genomic target sequence, except that it contains the desired nucleotide change (e.g., a single nucleotide change, a deletion or an insertion, or a combination thereof). The newly synthesized (or replacement) DNA strand, also referred to as a single-stranded DNA flap, will compete for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. In certain embodiments, the system can be combined with the use of an error-prone reverse transcriptase (e.g., provided as a fusion protein with a Cas9 domain, or provided in trans with a Cas9 domain). An error-prone reverse transcriptase can introduce changes during the synthesis of the single-stranded DNA flap. Thus, in certain embodiments, an error-prone reverse transcriptase may be utilized to introduce nucleotide changes into the target DNA. The changes may be random or non-random, depending on the error-prone reverse transcriptase used with the system. Resolution of the hybridization intermediate (which includes the single-stranded DNA flap synthesized by the reverse transcriptase hybridized to the endogenous DNA strand) may include removal of the resulting displaced flap of endogenous DNA (e.g., using the 5' DNA flap endonuclease FEN1), ligation of the synthesized single-stranded DNA flap to the target DNA, and assimilation of the desired nucleotide changes as a result of cellular DNA repair and/or replication processes.
Because templated DNA synthesis offers single-nucleotide precision for the modification of any nucleotide (including insertions and deletions), the scope of this approach is very broad, and it can foreseeably be used for numerous applications in basic science and therapeutics.
In various embodiments, guided editing is performed by contacting the target DNA molecule (into which a change in nucleotide sequence is to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) that is complexed with a guided editing guide RNA (PEgRNA). In various embodiments, the guided editing guide RNA (PEgRNA) comprises an extension at the 3' or 5' end of the guide RNA or at an intramolecular position of the guide RNA, and the extension encodes the desired nucleotide change (e.g., a single nucleotide change, an insertion, or a deletion). In step (a), the napDNAbp/extended gRNA complex is contacted with the DNA molecule, and the extended gRNA directs the napDNAbp to bind to the target locus. In step (b), a nick is introduced (e.g., by a nuclease or chemical agent) into one of the DNA strands of the target locus, thereby producing an available 3' end in one of the strands of the target locus. In certain embodiments, the nick is created in the DNA strand corresponding to the R-loop strand (i.e., the strand that does not hybridize to the guide RNA sequence, i.e., the "non-target strand"). However, the nick may be introduced in either strand. In other words, the nick may be introduced into the R-loop "target strand" (i.e., the strand that hybridizes to the pre-spacer of the extended gRNA) or the "non-target strand" (i.e., the strand that forms the single-stranded portion of the R-loop and is complementary to the target strand). In step (c), the 3' end of the DNA strand (formed by the nick) interacts with the extension of the guide RNA to initiate reverse transcription (i.e., "target-initiated RT"). In certain embodiments, the 3' end of the DNA strand hybridizes to a specific RT priming sequence on the extension of the guide RNA, i.e., a "reverse transcriptase priming sequence" or "primer binding site" on the PEgRNA.
In step (d), a reverse transcriptase (or other suitable DNA polymerase) is introduced, which synthesizes single-stranded DNA from the 3' end of the priming site toward the 5' end of the guided editing guide RNA. The DNA polymerase (e.g., reverse transcriptase) can be fused to the napDNAbp, or alternatively can be provided in trans with the napDNAbp. This forms a single-stranded DNA flap that contains the desired nucleotide change (e.g., a single base change, an insertion or a deletion, or a combination thereof) and is otherwise homologous to the endogenous DNA at or near the nick site. In step (e), the napDNAbp and the guide RNA are released. Steps (f) and (g) involve resolution of the single-stranded DNA flap so that the desired nucleotide change is incorporated into the target locus. This process can be driven toward formation of the desired product by removing the corresponding 5' endogenous DNA flap, which forms once the 3' single-stranded DNA flap invades and hybridizes to the endogenous DNA sequence. Without being bound by theory, the cell's endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change and form the desired altered product. The process can also be driven toward product formation by "second strand nicking." The process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions.
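Steps (b) through (g) above can be pictured with a small string-level sketch. It is only an illustration under stated assumptions: the function names, the toy sequences, and the restriction to substitution-type edits (where the new flap replaces an equal length of endogenous sequence) are hypothetical choices made here and are not taken from the disclosure. The extension is assumed to be an RNA written 5' to 3', with the DNA synthesis template followed by the primer binding site at its 3' end.

```python
# Illustrative sketch only: names and toy sequences are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def install_edit(nicked_strand: str, nick_index: int,
                 pbs_len: int, extension_rna: str) -> str:
    """Model flap synthesis and flap resolution for a substitution edit.

    nicked_strand : the strand that receives the nick, written 5'->3'
    nick_index    : the nick lies between nick_index - 1 and nick_index
    pbs_len       : length of the primer binding site (must be > 0)
    extension_rna : 5'-[DNA synthesis template][primer binding site]-3'
    """
    if pbs_len <= 0 or pbs_len > nick_index:
        raise ValueError("pbs_len must be positive and fit upstream of the nick")
    ext_dna = extension_rna.replace("U", "T")
    template, pbs = ext_dna[:-pbs_len], ext_dna[-pbs_len:]

    # Step (c): the free 3' end upstream of the nick must anneal to the PBS.
    primer = nicked_strand[nick_index - pbs_len:nick_index]
    if reverse_complement(pbs) != primer:
        raise ValueError("3' end of the nicked strand does not anneal to the PBS")

    # Step (d): the polymerase copies the template, extending the 3' end.
    flap = reverse_complement(template)

    # Steps (f)-(g), simplified: the new 3' flap replaces an equal length of
    # endogenous sequence downstream of the nick (substitution edits only).
    return nicked_strand[:nick_index] + flap + nicked_strand[nick_index + len(flap):]


if __name__ == "__main__":
    original = "ACGTACGTTGCAAGCTTGGA"
    edited = install_edit(original, nick_index=10, pbs_len=6,
                          extension_rna="GCAUGCAACGU")
    print(original)
    print(edited)
```

In this toy case the two printed strands differ at a single position, which corresponds to the one mismatch encoded in the template. Insertions and deletions would need a different resolution rule than the equal-length replacement assumed here.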
The term "guided editor (PE) system" or "guided editor (PE)" or "PE system" or "PE editing system" refers to compositions involved in genomic editing methods using target-initiated reverse transcription (TPRT) as described herein, including, but not limited to, napDNAbp, a reverse transcriptase (or another DNA polymerase), fusion proteins (e.g., comprising napDNAbp and reverse transcriptase or comprising napDNAbp and DNA polymerase), guided editing guide RNAs, and complexes comprising fusion proteins and guided editing guide RNAs, as well as auxiliary elements, such as a second nick-producing component (e.g., second strand sgRNA) and a 5' endogenous DNA flap-removing endonuclease (e.g., FEN 1), for helping to drive the guided editing process toward the formation of an editing product.
Although in the embodiments described so far the PEgRNA constitutes a single molecule comprising a guide RNA (which itself comprises a spacer sequence and a gRNA core or scaffold) and a 5' or 3' extension arm comprising a primer binding site and a DNA synthesis template, the PEgRNA may also take the form of two separate molecules consisting of a guide RNA and a trans-guide editor RNA template (tPERT), which essentially houses the extension arm (including, in particular, the primer binding site and the DNA synthesis domain) and an RNA-protein recruitment domain (e.g., an MS2 aptamer or hairpin) in the same molecule, and which is co-localized or recruited to a modified guide editor complex comprising a tPERT recruitment protein (e.g., an MS2cp protein that binds to the MS2 aptamer).
Guide editor
The term "guide editor" refers to a fusion construct that comprises a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase, and is capable of guided editing of a target nucleotide sequence in the presence of a PEgRNA (or "extended guide RNA"). The term "guide editor" may refer to the fusion protein or to the fusion protein complexed with a PEgRNA and/or further complexed with a second-strand nicking sgRNA. In some embodiments, a guide editor may also refer to a complex comprising a fusion protein (reverse transcriptase fused to napDNAbp), a PEgRNA, and a conventional guide RNA capable of directing the second-site nicking step on the non-edited strand, as described herein. In certain embodiments, a guide editor (e.g., PE1, PE2, or PE3) can be provided as a system together with an inhibitor of the DNA mismatch repair pathway (e.g., a dominant negative MLH1 protein). In various embodiments, the inhibitor of the DNA mismatch repair pathway, such as a dominant negative MLH1 protein, may be provided in trans to the guide editor. In other embodiments, the inhibitor of the DNA mismatch repair pathway, such as a dominant negative MLH1 protein, may be complexed with the guide editor, e.g., coupled to the guide editor fusion protein via a linker.
Primer binding sites
The term "primer binding site" or "PBS" refers to the single-stranded portion of the PEgRNA that is a component of the extension arm (e.g., at the 3' end of the extension arm) and that comprises a region complementary to a sequence on the non-target strand. In some embodiments, the primer binding site is complementary to a region in the non-target strand upstream of the nick site. In some embodiments, the primer binding site is complementary to a region of the non-target strand immediately upstream of the nick site. In some embodiments, the primer binding site is capable of binding to the primer sequence that is formed after the guide editor nicks the target sequence (e.g., by a nickase component of the guide editor, such as Cas9 nickase). When the guide editor nicks one strand of the target DNA sequence (e.g., by the Cas nickase component of the guide editor), a 3'-terminal ssDNA flap is formed that acts as a primer sequence and anneals to the primer binding site on the PEgRNA to initiate reverse transcription. In some embodiments, the PBS is complementary or substantially complementary to the free 3' end on the non-target strand of the double-stranded target DNA at the nick site, and can anneal to that free 3' end. In some embodiments, the PBS annealed to the free 3' end on the non-target strand may initiate target-initiated DNA synthesis.
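The complementarity relationship in this definition can be written out as a short sketch. It assumes the embodiment in which the PBS pairs with the region of the non-target strand immediately upstream of the nick site; the function name `design_pbs`, the toy sequence, and the choice of a 6-nt PBS are hypothetical and used only for illustration.

```python
# Illustrative sketch only: names and the toy sequence are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def design_pbs(non_target_strand: str, nick_index: int, pbs_len: int) -> str:
    """Return an RNA primer binding site (5'->3') for a given nick position.

    The nick is taken to lie between positions nick_index - 1 and nick_index
    of the non-target strand (written 5'->3'). The PBS is the reverse
    complement of the pbs_len nucleotides immediately upstream of the nick,
    written as RNA (U in place of T).
    """
    if not 0 < pbs_len <= nick_index:
        raise ValueError("pbs_len must be positive and fit upstream of the nick")
    primer = non_target_strand[nick_index - pbs_len:nick_index]
    pbs_dna = "".join(COMPLEMENT[base] for base in reversed(primer))
    return pbs_dna.replace("T", "U")


if __name__ == "__main__":
    print(design_pbs("ACGTACGTTGCAAGCTTGGA", nick_index=10, pbs_len=6))
```

The sketch covers only exact complementarity. A PBS that is "substantially complementary," as the definition also allows, would tolerate some mismatches and is not modeled here.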
Pre-spacer
As used herein, the term "pre-spacer" refers to a sequence (about 20 bp) in DNA that is adjacent to a PAM (pre-spacer adjacent motif) sequence. The pre-spacer shares the same sequence as the spacer sequence of the guide RNA. The guide RNA anneals to the complement of the pre-spacer on the target DNA (specifically, on one strand thereof, i.e., the "target strand" of the target DNA sequence, as opposed to the "non-target strand"). In some embodiments, in order for the Cas nickase component of the guide editor to function, a specific pre-spacer adjacent motif (PAM) is also required, which varies with the Cas protein component itself (e.g., the type of Cas protein). For example, the most commonly used Cas9 nuclease, derived from Streptococcus pyogenes, recognizes a PAM sequence of NGG on the non-target strand, located directly downstream of the genomic DNA target sequence. The skilled artisan will appreciate that the literature in the art sometimes uses "pre-spacer" for the approximately 20-nt target-specific guide sequence on the guide RNA itself, rather than calling it a "spacer." Thus, in some cases, the term "pre-spacer" as used herein may be used interchangeably with the term "spacer." The context of the specification surrounding "pre-spacer" or "spacer" will help inform the reader whether the term refers to the gRNA or to the DNA target.
Pre-spacer adjacent motifs (PAM)
As used herein, the term "pre-spacer adjacent motif" or "PAM" refers to a DNA sequence of about 2-6 base pairs as an important targeting component for Cas9 nucleases. Typically, the PAM sequence is located on either strand and downstream of the Cas9 cleavage site in the 5 'to 3' direction. The classical PAM sequence (i.e., the PAM sequence associated with Cas9 nuclease or SpCas9 of streptococcus pyogenes) is 5'-NGG-3', where "N" is any nucleobase followed by two guanine ("G") nucleobases. Different PAM sequences may be associated with different Cas9 nucleases or equivalent proteins from different organisms. Furthermore, any given Cas9 nuclease, such as SpCas9, can be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes the alternative PAM sequence.
For example, with reference to the classical SpCas9 amino acid sequence of SEQ ID NO: , the PAM specificity may be modified by introducing one or more mutations, including (a) D1135V, R1335Q, and T1337R (the "VQR variant"), which alter the PAM specificity to NGAN or NGNG; (b) D1135E, R1335Q, and T1337R (the "EQR variant"), which alter the PAM specificity to NGAG; and (c) D1135V, G1218R, R1335E, and T1337R (the "VRR variant"), which alter the PAM specificity to NGCG. Furthermore, the D1135E variant of classical SpCas9 still recognizes NGG, but is more selective than the wild-type SpCas9 protein.
It is also understood that Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) may have different PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. Furthermore, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In another example, Cas9 from Treponema denticola (TdCas) recognizes NAAAAC. These are examples and are not meant to be limiting. It is further understood that non-SpCas9 enzymes bind to a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cleavage site. Furthermore, non-SpCas9 enzymes may have other features that make them more useful than SpCas9. For example, Cas9 from Staphylococcus aureus (SaCas9) is about 1 kb smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV). Reference may further be made to Shah et al., "Protospacer recognition motifs: mixed identities and functional diversity," RNA Biology, 10(5): 891-899 (incorporated herein by reference).
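The PAM specificities listed above are written with IUPAC degenerate codes (N for any base, R for A or G, W for A or T), so checking a candidate sequence against them reduces to simple pattern matching. The following is a minimal sketch under stated assumptions: the table holds only a subset of the enzymes and variants named in the text, and the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
import re

# IUPAC degenerate codes needed for the PAMs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

# Subset of the PAM specificities named in the text (keys are illustrative).
PAM_BY_ENZYME = {
    "SpCas9": ["NGG"],
    "SpCas9-VQR": ["NGAN", "NGNG"],
    "SpCas9-EQR": ["NGAG"],
    "SpCas9-VRR": ["NGCG"],
    "NmCas": ["NNNNGATT"],
    "StCas9": ["NNAGAAW"],
    "TdCas": ["NAAAAC"],
}


def pam_matches(pam_pattern: str, dna: str) -> bool:
    """True if `dna` (5'->3') is an exact-length match to the IUPAC pattern."""
    regex = "".join(IUPAC[code] for code in pam_pattern)
    return re.fullmatch(regex, dna) is not None


def compatible_enzymes(downstream: str) -> list:
    """List enzymes whose PAM matches the start of `downstream`.

    `downstream` is the sequence on the non-target strand immediately 3' of
    a candidate pre-spacer, written 5'->3'.
    """
    return [
        name
        for name, pams in PAM_BY_ENZYME.items()
        if any(pam_matches(pam, downstream[:len(pam)]) for pam in pams)
    ]


if __name__ == "__main__":
    print(compatible_enzymes("TGG"))
    print(compatible_enzymes("TGAG"))
```

A lookup of this kind only says whether a PAM is present. It does not model binding strength or the higher selectivity that the text attributes to the D1135E variant.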
Reverse transcriptase
The term "reverse transcriptase" describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA, which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5' to 3' RNA-directed DNA polymerase activity, 5' to 3' DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5' and 3' ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Reverse transcriptase cannot correct errors in transcription because known viral reverse transcriptases lack the 3' to 5' exonuclease activity necessary for proofreading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity was presented by Berger et al., Biochemistry 22:2365-2372 (1983). Another reverse transcriptase widely used in molecular biology is the reverse transcriptase derived from Moloney murine leukemia virus (M-MLV). See, e.g., Gerard, G.R., DNA 5:271-279 (1986) and Kotewicz, M.L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking RNase H activity has also been described. See, e.g., U.S. patent No. 5,244,797. The present invention contemplates the use of any such reverse transcriptase, or variants or mutants thereof.
Furthermore, the present invention contemplates the use of error-prone reverse transcriptases, i.e., reverse transcriptases that do not support high-fidelity nucleotide incorporation during polymerization. During the synthesis of a single-stranded DNA flap based on the RT template integrated with the guide RNA, an error-prone reverse transcriptase may incorporate one or more nucleotides that are mismatched with the RT template sequence, thereby introducing changes to the nucleotide sequence through erroneous polymerization of the single-stranded DNA flap. The errors introduced during single-stranded DNA flap synthesis are then integrated into the double-stranded molecule through hybridization with the corresponding endogenous target strand, removal of the displaced endogenous strand, ligation, and then a further round of endogenous DNA repair and/or replication.
Reverse transcription
As used herein, the term "reverse transcription" refers to the ability of an enzyme to synthesize a DNA strand (i.e., complementary DNA or cDNA) using RNA as a template. In some embodiments, reverse transcription may be "error-prone reverse transcription," which refers to the property of certain reverse transcriptases of being error-prone in their DNA polymerization activity.
Proteins, peptides and polypeptides
The terms "protein," "peptide" and "polypeptide" are used interchangeably herein and refer to a polymer of amino acid residues joined together by peptide (amide) bonds. These terms refer to proteins, peptides or polypeptides of any size, structure or function. Typically, a protein, peptide or polypeptide is at least 3 amino acids in length. A protein, peptide or polypeptide may refer to a single protein or collection of proteins. One or more amino acids in a protein, peptide or polypeptide may be modified, for example, by the addition of chemical entities such as carbohydrate groups, hydroxyl groups, phosphate groups, farnesyl groups, isofarnesyl groups, fatty acid groups, linkers for conjugation, functionalization or other modification, and the like. The protein, peptide or polypeptide may also be a single molecule or may be a multi-molecular complex. The protein, peptide or polypeptide may be simply a fragment of a naturally occurring protein or peptide. The protein, peptide or polypeptide may be naturally occurring, recombinant or synthetic, or any combination thereof. Any of the proteins provided herein can be produced by any method known in the art. For example, the proteins provided herein can be produced by recombinant protein expression and purification, which is particularly useful for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known and include Green and Sambrook, molecular Cloning: a Laboratory Manual (4 th ed., cold Spring Harbor Laboratory Press, cold Spring Harbor, n.y. (2012)) which is incorporated herein by reference in its entirety.
Silent mutations
As used herein, the term "silent mutation" refers to a mutation in a nucleic acid molecule that has no effect on the phenotype of the nucleic acid molecule or of the protein it produces (if it encodes a protein). Silent mutations may be present in the coding region of a nucleic acid (i.e., the gene segment encoding a protein), or they may be present in non-coding regions of the nucleic acid. Silent mutations in a nucleic acid sequence, such as in a target DNA sequence or in a DNA synthesis template sequence to be installed into a target sequence, may be nucleotide changes that do not alter the expression or function of the amino acid sequence encoded by the nucleic acid sequence, or other functional characteristics of the target nucleic acid sequence. When silent mutations are present in coding regions, they may be synonymous mutations. A synonymous mutation refers to the substitution of one base in a gene for another base such that the corresponding amino acid residue of the protein produced by the gene is not modified. This is because the redundancy of the genetic code allows multiple different codons to encode the same amino acid in a particular organism. When a silent mutation is located in a non-coding region or at a junction of a coding region with a non-coding region (e.g., an intron/exon junction), it may be located in a region that does not affect any biological property of the nucleic acid molecule (e.g., splicing, gene regulation, RNA lifetime, etc.). Silent mutations can be used, for example, to increase the length of an edit made to a target nucleotide sequence by guided editing so that the edit evades correction by the MMR pathway, as described herein. In certain embodiments, the number of installed silent mutations can be one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or more.
In certain other embodiments involving at least two silent mutations, the silent mutations can be installed within one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the intended editing site.
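For the coding-region case, whether a candidate change is synonymous can be checked by translating the sequence before and after the change. The sketch below assumes the standard genetic code and an in-frame coding sequence; the function names and the example sequences are hypothetical. It addresses only synonymous substitutions and says nothing about silent mutations in non-coding regions, which the definition above also covers.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence; trailing partial codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def changed_positions(original: str, edited: str) -> list:
    """0-based positions at which two equal-length sequences differ."""
    if len(original) != len(edited):
        raise ValueError("sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(original, edited)) if a != b]


def is_synonymous(original: str, edited: str) -> bool:
    """True if the edit changes the DNA but leaves the encoded protein unchanged."""
    return bool(changed_positions(original, edited)) and \
        translate(original) == translate(edited)


if __name__ == "__main__":
    wild_type = "ATGCTGAAA"              # Met-Leu-Lys
    candidate = "ATGCTCAAA"              # CTG -> CTC, both encode Leu
    print(translate(wild_type), is_synonymous(wild_type, candidate))
    print(changed_positions(wild_type, candidate))
```

The list of changed positions can also be compared with the position of the intended edit to see whether a silent mutation falls within a chosen distance of it, in the sense of the distances enumerated above.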
Spacer sequences
As used herein, the term "spacer sequence" in relation to a guide RNA or PEgRNA refers to a portion of about 20 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, or 24 nucleotides) of the guide RNA or PEgRNA that comprises a nucleotide sequence that shares the same sequence as a pre-spacer in a target DNA sequence. The spacer sequence anneals to the complement of the pre-spacer sequence to form a ssRNA/ssDNA hybrid structure and a corresponding R-loop ssDNA structure of the endogenous DNA strand at the target site.
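The relationship between pre-spacer, PAM, and spacer can be sketched in a few lines. The sketch assumes the SpCas9 case described earlier (an NGG PAM on the non-target strand directly downstream of the pre-spacer) and a 20-nt pre-spacer; it scans one strand only, and the function names and test sequence are hypothetical.

```python
import re


def find_pre_spacers(non_target_strand: str, length: int = 20) -> list:
    """Return (start, pre_spacer) pairs followed directly by an NGG PAM.

    The strand is read 5'->3'. A lookahead is used so that overlapping
    candidates are all reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return [(m.start(), m.group(1)) for m in pattern.finditer(non_target_strand)]


def spacer_rna(pre_spacer: str) -> str:
    """The spacer shares the pre-spacer sequence, written as RNA (U for T)."""
    return pre_spacer.replace("T", "U")


if __name__ == "__main__":
    strand = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
    for start, pre_spacer in find_pre_spacers(strand):
        print(start, pre_spacer, spacer_rna(pre_spacer))
```

Because the spacer has the same sequence as the pre-spacer, it pairs with the complement of the pre-spacer on the target strand, which leaves the non-target strand as the single-stranded part of the R-loop.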
Target site
The term "target site" refers to a sequence within a nucleic acid molecule that is to be edited by a guide editor (PE) as disclosed herein. The target site may refer to an endogenous sequence within the nucleic acid molecule to be edited, such as an endogenous genomic sequence of the target genome, which is identical to the sequence of the DNA synthesis template except for one or more nucleotide edits present on the DNA synthesis template to be installed (and except that the DNA synthesis template contains uracil instead of thymine), or may refer to a corresponding endogenous sequence on a non-target strand complementary to the DNA synthesis template except for one or more mismatches present on the DNA synthesis template at one or more nucleotide edits to be installed. Target site may also refer to a sequence within a nucleic acid molecule that binds to a complex of a guide editor (PE) and a gRNA.
Variants
As used herein, the term "variant" shall be understood to refer to Cas9 that exhibits characteristics that have a pattern that deviates from the pattern that occurs in nature, e.g., variant Cas9 is Cas9 that comprises one or more amino acid residue changes as compared to the wild-type Cas9 amino acid sequence. The term "variant" includes homologous proteins having at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99% percent identity to a reference sequence and having the same or substantially the same functional activity as the reference sequence. The term also includes mutants, truncations, or domains of the reference sequence and exhibits one or more functional activities that are identical or substantially identical to the reference sequence.
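The percent-identity thresholds in this definition can be made concrete with a small calculation. The sketch below assumes the two sequences have already been aligned to equal length, with gaps written as "-", and counts identity over all alignment columns; other conventions for the denominator exist, so this is one possible reading and not the definition used by the disclosure. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_ref) != len(aligned_query) or not aligned_ref:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(
        1 for a, b in zip(aligned_ref, aligned_query) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_ref)


def meets_identity_threshold(aligned_ref: str, aligned_query: str,
                             threshold: float) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_ref, aligned_query) >= threshold


if __name__ == "__main__":
    reference = "MKTAYIAK"
    variant = "MKTAYLAK"     # one substitution in eight aligned positions
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant, 85.0))
    print(meets_identity_threshold(reference, variant, 90.0))
```

Sequence identity is only half of the definition. A variant in the sense above must also retain the same or substantially the same functional activity as the reference sequence, which a sequence comparison cannot establish.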
Vector
As used herein, the term "vector" refers to a nucleic acid that can be modified to encode a gene of interest and that is capable of entering a host cell, mutating and replicating within the host cell, and then transferring a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or phages and filamentous phages, and conjugative plasmids. Other suitable vectors will be apparent to those skilled in the art based on this disclosure.
Detailed description of specific embodiments
The present disclosure provides compositions and methods for guided editing with increased editing efficiency and/or reduced indel formation by inhibiting the DNA mismatch repair pathway during guided editing of a target site. The inventors have surprisingly found that when one or more functions of the DNA mismatch repair (MMR) system are inhibited, blocked, or otherwise inactivated during guided editing, the editing efficiency of guided editing can be significantly increased (e.g., 2-fold increase, 3-fold increase, 4-fold increase, 5-fold increase, 6-fold increase, 7-fold increase, 8-fold increase, 9-fold increase, 10-fold increase, 11-fold increase, 12-fold increase, 13-fold increase, 14-fold increase, 15-fold increase, 16-fold increase, 17-fold increase, 18-fold increase, 19-fold increase, 20-fold increase, 21-fold increase, 22-fold increase, 23-fold increase, 24-fold increase, 25-fold increase, 26-fold increase, 27-fold increase, 28-fold increase, 29-fold increase, 30-fold increase, 31-fold increase, 32-fold increase, 33-fold increase, 34-fold increase, 35-fold increase, 36-fold increase, 37-fold increase, 38-fold increase, 39-fold increase, 40-fold increase, 41-fold increase, 42-fold increase, 43-fold increase, 44-fold increase, 45-fold increase, 46-fold increase, 47-fold increase, 48-fold increase, 49-fold increase, 50-fold increase, 51-fold increase, 52-fold increase, 53-fold increase, 54-fold increase, 55-fold increase, 56-fold increase, 57-fold increase, 58-fold increase, 59-fold increase, 60-fold increase, 61-fold increase, 62-fold increase, 63-fold increase, 64-fold increase, 65-fold increase, 66-fold increase, 67-fold increase, 68-fold increase, 69-fold increase, 70-fold increase, 71-fold increase, 72-fold increase, 73-fold increase, 74-fold increase, 75-fold increase, 76-fold increase, 77-fold increase, 78-fold increase, 79-fold increase, 80-fold increase, 81-fold increase, 82-fold increase, 83-fold increase, 84-fold increase, 85-fold increase, 86-fold increase, 87-fold increase, 88-fold increase, 89-fold increase, 90-fold increase, 91-fold increase, 92-fold increase, 93-fold increase, 94-fold increase, 95-fold increase, 96-fold increase, 97-fold increase, 98-fold increase, 99-fold increase, 100-fold increase, or more). Furthermore, the inventors have surprisingly found that when one or more functions of the DNA mismatch repair (MMR) system are inhibited, blocked, or otherwise inactivated during guided editing, the frequency of indel formation resulting from guided editing can be significantly reduced (e.g., 2-fold reduction, 3-fold reduction, 4-fold reduction, 5-fold reduction, 6-fold reduction, 7-fold reduction, 8-fold reduction, 9-fold reduction, 10-fold reduction, or more).
The present disclosure relates to the surprising discovery that the efficiency and/or specificity of guided editing is affected by the cell's own DNA mismatch repair (MMR) pathway. As described herein (e.g., in example 1), the inventors developed a novel genetic screening method (referred to in one embodiment as "pooled CRISPRi screening for guided editing results"), which led to the identification of various genetic determinants, including MMR, that affect the efficiency and/or specificity of guided editing. Accordingly, in one aspect, the present disclosure provides a novel guided editing system that includes means for suppressing and/or circumventing the effects of MMR, thereby increasing the efficiency and/or specificity of guided editing. In one embodiment, the present disclosure provides a guided editing system comprising an MMR-inhibiting protein, such as, but not limited to, a dominant negative MMR protein, such as a dominant negative MLH1 protein (i.e., "MLH1dn"). In another embodiment, the guided editing system involves installing one or more silent mutations near the intended edit, thereby allowing the intended edit to escape MMR recognition even in the absence of an MMR-inhibiting protein such as MLH1dn. In another aspect, the present disclosure provides novel genetic screens for identifying genetic determinants, such as MMR, that affect the efficiency and/or specificity of guided editing. In yet a further aspect, the present disclosure provides nucleic acid constructs encoding the improved guided editing systems described herein. The present disclosure also provides, in other aspects, vectors (e.g., AAV or lentiviral vectors) comprising nucleic acids encoding the improved guided editing systems described herein. In other aspects, the present disclosure provides cells comprising the improved guided editing systems described herein.
In other aspects, the disclosure also provides components of the genetic screens, including nucleic acid and/or vector constructs, guide RNAs, pegRNAs, cells (e.g., CRISPRi cells), and other reagents and/or materials for performing the genetic screens disclosed herein. In other aspects, the present disclosure provides compositions and kits, e.g., pharmaceutical compositions, that comprise the improved guided editing systems described herein and can be administered to a cell, tissue, or organism by any suitable means, e.g., by gene therapy, mRNA delivery, virus-like particle delivery, or ribonucleoprotein (RNP) delivery. In yet another aspect, the present disclosure provides methods of installing one or more edits in a target nucleic acid molecule, e.g., a genomic locus, using the improved guided editing systems. In yet another aspect, the present disclosure provides methods of treating a disease or disorder using the improved guided editing systems to correct or otherwise repair one or more genetic alterations (e.g., a single nucleotide polymorphism) in a target nucleic acid molecule (e.g., a genomic locus comprising one or more pathogenic mutations).
In one embodiment, the MLH1 protein is inhibited, blocked, or otherwise inactivated. In other embodiments, other proteins of the MMR system are inhibited, blocked, or otherwise inactivated, including but not limited to PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ. Inhibition may involve inhibiting the protein with an inhibitor (e.g., an antibody, a small molecule inhibitor, or a dominant negative variant of the protein that disrupts, blocks, or otherwise inactivates the function of the protein, e.g., a dominant negative form of MLH1). Inhibition may also involve any other suitable means, such as protein degradation (e.g., PROTAC-based degradation of MLH1), transcript-level inhibition (e.g., siRNA-mediated transcript degradation/gene silencing or microRNA-based translational inhibition of MLH1 transcripts), or mutations installed at the gene level (i.e., mutations in the MLH1 gene (or its regulatory regions) that inactivate or reduce expression of the MLH1 gene, or mutations that inactivate, block, or minimize the activity of the encoded MLH1 product). In addition, the present disclosure contemplates that the guide editor (e.g., delivered as a fusion protein comprising a napDNAbp and a polymerase, such as a Cas9 nickase fused to a reverse transcriptase) can be administered together with any inhibitor of the DNA mismatch repair pathway.
Thus, the present disclosure provides methods of editing a nucleic acid molecule by guided editing, which methods involve contacting the nucleic acid molecule with inhibitors of guided editors, pegRNA and DNA mismatch repair pathways, thereby installing one or more modifications to the nucleic acid molecule at a target site with increased editing efficiency and/or reduced indels formation. The present disclosure also provides polynucleotides for editing a DNA target site by guided editing comprising a nucleic acid sequence encoding napDNAbp, a polymerase, and an inhibitor of the DNA mismatch repair pathway, wherein napDNAbp and polymerase are capable of installing one or more modifications in the DNA target site with increased editing efficiency and/or reduced indels formation in the presence of peprna. The present disclosure also provides vectors, cells and kits comprising the compositions and polynucleotides of the present disclosure, as well as methods of making such vectors, cells and kits, as well as methods for delivering such compositions, polynucleotides, vectors, cells and kits to cells in vitro, ex vivo (e.g., during cell-based therapies that modify cells in vitro), and in vivo.
MMR pathway
As described above, the present disclosure relates to the observation that the efficiency and/or specificity of guided editing is affected by the cell's own DNA mismatch repair (MMR) pathway. DNA mismatch repair (MMR) is a highly conserved biological pathway that plays a key role in maintaining genomic stability (see, e.g., FIGS. 8A and 8B). E. coli MutS and MutL and their eukaryotic homologs MutSα and MutLα, respectively, are key participants in MMR-related genome maintenance. In various aspects, the present disclosure contemplates any suitable method of inhibiting, blocking, or otherwise inactivating the DNA mismatch repair (MMR) system, including but not limited to inactivating one or more key proteins of the MMR system at the gene level, e.g., by introducing one or more mutations in the gene encoding the MMR system protein. Such proteins include, but are not limited to, MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ. The nucleotide and amino acid sequences of such naturally occurring proteins and variants thereof are known in the art. Exemplary sequences are provided herein. The present disclosure includes the use of any inhibitor, blocker, knockdown strategy, or other method to inactivate any known protein involved in MMR ("MMR protein"), including any wild-type or naturally occurring variant of such MMR protein, as well as any engineered variant (including single or multiple amino acid substitutions, deletions, insertions, rearrangements, or fusions) of such MMR protein, so long as inhibiting, blocking, or otherwise inactivating one or more of the MMR proteins, or variants thereof, results in the inhibition, blocking, or inactivation of the MMR pathway.
Inhibition, blocking, or inactivation of any one or more MMR proteins or variants can be achieved by applying any suitable method at the genetic level (e.g., in a gene encoding one or more MMR proteins, such as by introducing a mutation that inactivates the MMR protein or variant thereof), at the transcriptional level (e.g., by transcript knockdown), at the translational level (e.g., by blocking translation of one or more MMR proteins from its cognate transcript), or at the protein level (e.g., by administration of an inhibitor such as a small molecule, an antibody, or a dominant negative protein variant, or by targeted protein degradation such as PROTAC-based degradation).
In one aspect, the present disclosure provides improved guided editing methods, including additionally inhibiting the DNA mismatch repair (MMR) system during guided editing by inhibiting, blocking, or otherwise inactivating MLH1 or variants thereof.
Without being limited by theory, MLH1 is a critical MMR protein that heterodimerizes with PMS2 to form MutLα, a component of the post-replicative DNA mismatch repair (MMR) system. DNA repair is initiated by binding of MutSα (MSH2-MSH6) or MutSβ (MSH2-MSH3) to a dsDNA mismatch, followed by recruitment of MutLα to the heteroduplex. In the presence of RFC and PCNA, assembly of the MutL-MutS-heteroduplex ternary complex is sufficient to activate the endonuclease activity of PMS2. PMS2 introduces a single-strand break near the mismatch, creating a new entry point for the exonuclease EXO1 to degrade the strand containing the mismatch. DNA methylation prevents cleavage, thereby ensuring that only the newly mutated DNA strand is corrected. MutLα (MLH1-PMS2) physically interacts with the clamp loader subunits of DNA polymerase III, suggesting that it may play a role in recruiting DNA polymerase III to the MMR site. MLH1 is also implicated in DNA damage signaling, a process that induces cell cycle arrest and can lead to apoptosis when DNA is severely damaged. MLH1 also heterodimerizes with MLH3 to form MutLγ, which plays a role in meiosis.
The "canonical" human MLH1 amino acid sequence is the amino acid sequence of SEQ ID NO: 204.
MLH1 may also include other isoforms, including P40692-2 (SEQ ID NO: 205), which differs from the canonical sequence by a deletion of residues 1-241 of the canonical sequence.
MLH1 may also include a third known isoform called P40692-3 (SEQ ID NO: 207), which differs from the canonical sequence in that residues 1-101 (MSFVAGVIRR … AISTYGFRG (SEQ ID NO: 206)) are replaced with MAF.
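The relationship between the canonical sequence and the two isoforms described above can be illustrated with a minimal Python sketch. The helper names are hypothetical, and the 756-residue placeholder string is an assumption used only to show the residue arithmetic; it is not the sequence of SEQ ID NO: 204, which is not reproduced here.

```python
def delete_residues(seq: str, start: int, end: int) -> str:
    """Delete residues start..end (1-based, inclusive) from a protein sequence."""
    return seq[:start - 1] + seq[end:]


def replace_residues(seq: str, start: int, end: int, replacement: str) -> str:
    """Replace residues start..end (1-based, inclusive) with another peptide."""
    return seq[:start - 1] + replacement + seq[end:]


# Placeholder standing in for the canonical sequence (assumption: 756 residues).
# It is NOT the real MLH1 sequence; only its length matters for this sketch.
CANONICAL_LENGTH = 756
canonical = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(CANONICAL_LENGTH))

# Isoform P40692-2: residues 1-241 of the canonical sequence are deleted.
isoform_2 = delete_residues(canonical, 1, 241)

# Isoform P40692-3: residues 1-101 of the canonical sequence are replaced with MAF.
isoform_3 = replace_residues(canonical, 1, 101, "MAF")

print(len(isoform_2), len(isoform_3))
```

With a real canonical sequence substituted for the placeholder, the same two calls would yield the isoform sequences as they are described in the text.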
MMR inhibitors and methods of MMR inhibition
The present disclosure provides methods of editing a nucleic acid molecule by guided editing, the methods comprising contacting the nucleic acid molecule with a guided editor, a pegRNA, and an inhibitor of the DNA mismatch repair pathway, thereby installing one or more modifications to the nucleic acid molecule at a target site with increased editing efficiency and/or reduced indel formation. Accordingly, the present disclosure contemplates any suitable method of inhibiting MMR. In one embodiment, the disclosure includes administering an effective amount of an MMR pathway inhibitor. In various embodiments, the MMR pathway can be inhibited, blocked, or inactivated by inhibiting, blocking, or inactivating any one or more MMR proteins or variants at the genetic level (e.g., in a gene encoding one or more MMR proteins, such as by introducing a mutation that inactivates the MMR protein or variant thereof), at the transcriptional level (e.g., by transcript knockdown), at the translational level (e.g., by blocking translation of one or more MMR proteins from its cognate transcript), or at the protein level (e.g., by administration of an inhibitor such as a small molecule, an antibody, or a dominant negative protein partner, or by targeted protein degradation such as PROTAC-based degradation). The present disclosure also contemplates guided editing methods designed to install modifications to nucleic acid molecules that evade correction by the MMR pathway without the need to provide an MMR inhibitor.
The inventors developed guided editing, which enables insertion, deletion, or substitution of genomic DNA sequences without requiring error-prone double-stranded DNA breaks. The present disclosure now provides improved guided editing methods that involve blocking, inhibiting, or inactivating the MMR pathway (e.g., by inhibiting, blocking, or inactivating MMR pathway proteins, including MLH1) during guided editing, thereby surprisingly resulting in increased editing efficiency and reduced indel formation. As used herein, "during" guided editing may encompass any suitable sequence of events, such that the guided editing step may be applied before, simultaneously with, or after the step of blocking, inhibiting, evading, or inactivating the MMR pathway (e.g., by targeted inhibition of MLH1).
Guided editing uses an engineered Cas9 nickase-reverse transcriptase fusion protein (e.g., PE1 or PE2) paired with an engineered guided editing guide RNA (pegRNA) that both directs Cas9 to the target genomic site and encodes the information for installing the desired edit. Guided editing proceeds through a multi-step editing process: 1) the Cas9 domain binds and nicks the target genomic DNA site, which is specified by the spacer sequence of the pegRNA; 2) the reverse transcriptase domain uses the nicked genomic DNA as a primer to initiate synthesis of the edited DNA strand, using an engineered extension on the pegRNA as the reverse transcription template, which generates a single-stranded 3' flap containing the edited DNA sequence; 3) cellular DNA repair resolves the 3' flap intermediate by displacement of the 5' flap species, which occurs through invasion by the edited 3' flap, excision of the 5' flap containing the original DNA sequence, and ligation of the new 3' flap to incorporate the edited DNA strand, forming a heteroduplex of one edited strand and one unedited strand; and 4) cellular DNA repair replaces the unedited strand in the heteroduplex using the edited strand as a repair template, completing the editing process.
Efficient incorporation of the edit is expected to require that the newly synthesized 3' flap contain a portion of sequence homologous to the genomic DNA site. This homology enables the edited 3' flap to compete with the endogenous DNA strand (the corresponding 5' flap) for incorporation into the DNA duplex. Since the edited 3' flap contains less sequence homology than the endogenous 5' flap, the competition is expected to favor the 5' flap strand. Thus, a potential limiting factor in guided editing efficiency may be the failure of the 3' flap containing the edit to effectively invade and displace the 5' flap strand. Furthermore, successful 3' flap invasion and removal of the 5' flap incorporates the edit on only one strand of the double-stranded DNA genome. Permanent installation of the edit requires cellular DNA repair to replace the unedited complementary DNA strand using the edited strand as a template (step 4 above). Although the cell can be biased toward replacing the unedited strand rather than the edited strand by using a secondary sgRNA to introduce a nick in the unedited strand adjacent to the edit (the PE3 system), this process still relies on a second stage of DNA repair.
The present disclosure describes a modified guided editing method that further includes inhibiting, blocking, or otherwise inactivating the DNA mismatch repair (MMR) system. In some embodiments, the MMR inhibitor is provided to the target nucleic acid along with other components of the guided editing system, e.g., an exogenous MMR inhibitor such as an siRNA may be provided to a cell comprising the target nucleic acid. In some embodiments, a guided editing system component, e.g., the pegRNA, is designed to install modifications in the target nucleic acid that evade the MMR system without providing an inhibitor. In certain embodiments, the DNA mismatch repair (MMR) system may be inhibited by inhibiting, blocking, or otherwise inactivating one or more proteins of the MMR system, including but not limited to MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ. The present disclosure contemplates any suitable method of inhibiting, blocking, or otherwise inactivating the DNA mismatch repair (MMR) system, including but not limited to inactivating one or more key proteins of the MMR system at the gene level, such as by introducing one or more mutations in the gene encoding the MMR system protein, e.g., MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, or POLδ.
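The four steps above can be traced on plain strings with a minimal Python sketch. It is a toy model under stated assumptions, not the inventors' method: the pegRNA is written in the DNA alphabet, the nick is assumed to fall 17 nucleotides into a 20-nucleotide protospacer, only same-length substitution edits are handled, and the target sequence, the `PegRNA` class, and `simulate_guided_edit` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


@dataclass
class PegRNA:
    """Toy pegRNA, written in the DNA alphabet for simplicity (assumption)."""
    spacer: str       # matches the protospacer on the strand that gets nicked
    pbs: str          # primer binding site; anneals to the nicked 3' end
    rt_template: str  # reverse transcription template for the edited 3' flap


def simulate_guided_edit(target: str, peg: PegRNA, nick_offset: int = 17) -> dict:
    """Trace steps 1-4 for a same-length substitution edit on one strand."""
    # Step 1: the spacer specifies the site; the nick position is an assumption.
    start = target.find(peg.spacer)
    if start < 0:
        raise ValueError("spacer not found in target strand")
    nick = start + nick_offset

    # Step 2: the nicked 3' end acts as primer only if it pairs with the PBS.
    primer = target[nick - len(peg.pbs):nick]
    if primer != reverse_complement(peg.pbs):
        raise ValueError("PBS does not anneal to the nicked strand")
    flap_3p = reverse_complement(peg.rt_template)  # newly synthesized 3' flap

    # Step 3: the 3' flap displaces the equally long 5' flap, which is excised,
    # leaving one edited strand paired with the original, unedited strand.
    flap_5p = target[nick:nick + len(flap_3p)]
    edited = target[:nick] + flap_3p + target[nick + len(flap_3p):]
    heteroduplex = (edited, reverse_complement(target))

    # Step 4: repair copies the edit onto the previously unedited strand.
    resolved = (edited, reverse_complement(edited))
    return {"excised_5p_flap": flap_5p, "heteroduplex": heteroduplex, "resolved": resolved}


# Hypothetical target: 6-nt prefix + 20-nt protospacer + TGG PAM + 7-nt suffix.
spacer = "GACCTGAATCGTACGGATCA"
target = "TTCAAC" + spacer + "TGG" + "ACTTGCA"
nick = target.find(spacer) + 17
edited_flap = "G" + target[nick + 1:nick + 10]  # T-to-G change at the first flap base
peg = PegRNA(
    spacer=spacer,
    pbs=reverse_complement(target[nick - 8:nick]),
    rt_template=reverse_complement(edited_flap),
)
result = simulate_guided_edit(target, peg)
print(result["resolved"][0])
```

In this model the heteroduplex returned for step 3 contains a single mismatched position. That is the intermediate on which, according to the disclosure, the MMR pathway can act before step 4 completes.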
Thus, in one aspect, the present disclosure provides methods for editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating a DNA mismatch repair (MMR) system.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating proteins of the MMR system (e.g., MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ).
The present disclosure contemplates that any of the following MLH1 proteins may be inhibited by an inhibitor, or otherwise blocked or inactivated, to inhibit the MMR pathway during guided editing. In addition, such exemplary proteins may also be used to engineer or otherwise prepare dominant negative variants, which are useful as a type of inhibitor when administered in an amount effective to block, inactivate, or inhibit MMR. Without being limited by theory, it is believed that a dominant negative mutant of MLH1 may saturate the binding of MutS. Exemplary MLH1 proteins include the following amino acid sequences, or amino acid sequences having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100% sequence identity to any of the following:
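As an illustration of how a candidate sequence might be checked against the identity thresholds recited above, the following is a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and defines identity as matching columns divided by all alignment columns; other definitions of percent identity exist, and the disclosure does not specify one here. The function names and the candidate sequence are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a precomputed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Thresholds recited in the text ("at least 70%" ... "at least 99%").
THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99)


def highest_threshold_met(identity: float):
    """Return the largest recited threshold that the identity satisfies, if any."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


# The first ten residues quoted in the text, with one alignment gap inserted,
# compared against a made-up candidate (one insertion, one substitution).
reference = "MSFVAG-VIRR"
candidate = "MSFVAGAVIKR"
identity = percent_identity(reference, candidate)
print(round(identity, 1), highest_threshold_met(identity))
```

In practice the alignment itself would come from a dedicated alignment tool, and the choice of denominator (alignment length, shorter sequence, or reference length) would need to be fixed before comparing against any threshold.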
The methods and compositions described herein use MLH1 mutants or truncated variants. In some embodiments, mutants and truncated variants of human MLH1 wild-type proteins are used.
In one aspect, the present disclosure provides truncated variants of human MLH1. In some embodiments, amino acids 754-756 of the wild-type human MLH1 protein are deleted (Δ754-756, hereinafter MLH1dn). In some embodiments, truncated variants of human MLH1 (hereinafter MLH1dnNTD) are provided that comprise only the N-terminal domain (amino acids 1-335). In various embodiments, the following MLH1 variants are provided in the present disclosure:
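The two truncations named above can be expressed as residue ranges retained from the wild-type protein. The sketch below is illustrative only: `keep_ranges` and `VARIANT_RANGES` are hypothetical names, and the 756-residue placeholder is an assumption standing in for the wild-type human MLH1 sequence, which is not reproduced here.

```python
def keep_ranges(seq: str, ranges) -> str:
    """Concatenate the 1-based, inclusive residue ranges that a variant retains."""
    return "".join(seq[start - 1:end] for start, end in ranges)


# Placeholder for wild-type human MLH1 (assumption: 756 residues; not the real sequence).
WT_LENGTH = 756
wild_type = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(WT_LENGTH))

# Residue ranges retained by each variant described in the text.
VARIANT_RANGES = {
    "MLH1dn": [(1, 753)],     # deletion of amino acids 754-756
    "MLH1dnNTD": [(1, 335)],  # N-terminal domain only
}

variants = {name: keep_ranges(wild_type, ranges)
            for name, ranges in VARIANT_RANGES.items()}

for name, seq in variants.items():
    print(name, len(seq))
```

Describing each variant as a list of retained ranges keeps the definitions declarative, so an internal deletion could be written as two ranges without changing the helper.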
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating PMS2 (or MutLα) or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating PMS1 (or MutLβ) or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MLH3 (or MutLγ) or variants thereof.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MutSα (MSH2-MSH6) or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MSH2 or variants thereof.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MSH6 or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating PCNA or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating RFC or variants thereof.
In another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating EXO1 or variants thereof.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating POLδ or variants thereof.
Exemplary amino acid sequences of these MMR proteins (PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ) are as follows:
Thus, in one aspect, the present disclosure provides methods for editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating a DNA mismatch repair (MMR) system.
In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and an inhibitor of the MMR system, e.g., an inhibitor of one or more of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, or POLδ. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an antibody, e.g., a neutralizing antibody. In other embodiments, the inhibitor is a dominant negative mutant of one or more of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, or POLδ, e.g., a dominant negative mutant of MLH1. In other embodiments, the inhibitor may target transcript levels, such as an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, or POLδ. In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; and (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MLH1 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and an MLH1 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MLH1 antibody, e.g., a neutralizing antibody that inactivates MLH1. In other embodiments, the inhibitor may be a dominant negative mutant of MLH1. In other embodiments, the inhibitor may target the transcript level of MLH1, e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding MLH1. In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating PMS2 (or MutLα) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and a PMS2 (or MutLα) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-PMS2 (or anti-MutLα) antibody, e.g., a neutralizing antibody that inactivates PMS2 (or MutLα). In other embodiments, the inhibitor may be a dominant negative mutant of PMS2 (or MutLα). In other embodiments, the inhibitor may target the transcript level of PMS2 (or MutLα), e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding PMS2 (or MutLα). In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating PMS1 (or MutLβ) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and a PMS1 (or MutLβ) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-PMS1 (or anti-MutLβ) antibody, e.g., a neutralizing antibody that inactivates PMS1 (or MutLβ). In other embodiments, the inhibitor may be a dominant negative mutant of PMS1 (or MutLβ). In other embodiments, the inhibitor may target the transcript level of PMS1 (or MutLβ), e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding PMS1 (or MutLβ). In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MLH3 (or MutLγ) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and an MLH3 (or MutLγ) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MLH3 (or anti-MutLγ) antibody, e.g., a neutralizing antibody that inactivates MLH3 (or MutLγ). In other embodiments, the inhibitor may be a dominant negative mutant of MLH3 (or MutLγ). In other embodiments, the inhibitor may target the transcript level of MLH3 (or MutLγ), e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding MLH3 (or MutLγ). In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MutSα (MSH2-MSH6) or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and a MutSα (MSH2-MSH6) inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MutSα (MSH2-MSH6) antibody, e.g., a neutralizing antibody that inactivates MutSα (MSH2-MSH6). In other embodiments, the inhibitor may be a dominant negative mutant of MutSα (MSH2-MSH6). In other embodiments, the inhibitor may target the transcript level of MutSα (MSH2-MSH6), e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MutSα (MSH2-MSH6). In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MSH2 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and an MSH2 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MSH2 antibody, e.g., a neutralizing antibody that inactivates MSH2. In other embodiments, the inhibitor may be a dominant negative mutant of MSH2. In other embodiments, the inhibitor may target the transcript level of MSH2, e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding MSH2. In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating MSH6 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and an MSH6 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-MSH6 antibody, e.g., a neutralizing antibody that inactivates MSH6. In other embodiments, the inhibitor may be a dominant negative mutant of MSH6. In other embodiments, the inhibitor may target the transcript level of MSH6, e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding MSH6. In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating PCNA or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and a PCNA inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-PCNA antibody, e.g., a neutralizing antibody that inactivates PCNA. In other embodiments, the inhibitor may be a dominant negative mutant of PCNA. In other embodiments, the inhibitor may target the transcript level of PCNA, e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding PCNA. In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., genome) using guided editing while blocking, inhibiting, or otherwise inactivating RFC or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guided editor and an RFC inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-RFC antibody, e.g., a neutralizing antibody that inactivates RFC. In other embodiments, the inhibitor may be a dominant negative mutant of RFC. In other embodiments, the inhibitor may target the transcript level of RFC, e.g., an siRNA or other nucleic acid agent that knocks down the level of the transcript encoding RFC. In other embodiments, the step of "contacting the target nucleotide molecule with a guided editor" can comprise (i) delivering an effective amount of a guided editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA, or a delivery complex comprising mRNA, encoding a guided editor fusion protein and/or a suitable pegRNA to a cell; (iii) delivering one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guided editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating EXO1 or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and an EXO1 inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-EXO1 antibody, e.g., a neutralizing antibody that inactivates EXO1. In other embodiments, the inhibitor may be a dominant negative mutant of EXO1. In other embodiments, the inhibitor may target the transcript level of EXO1, e.g., a siRNA or other nucleic acid agent that knocks down the level of transcripts encoding EXO1. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA or a delivery complex comprising mRNA encoding a guide editor fusion protein and/or a suitable pegRNA to a cell; or (iii) delivering to a cell one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guide editor fusion protein and/or a suitable pegRNA.
In yet another aspect, the present disclosure provides methods of editing a nucleotide molecule (e.g., a genome) using guided editing while blocking, inhibiting, or otherwise inactivating POL delta or variants thereof. In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) comprising contacting a target nucleotide molecule with a guide editor and a POL delta inhibitor. In various embodiments, the inhibitor may be a small molecule inhibitor. In other embodiments, the inhibitor may be an anti-POL delta antibody, e.g., a neutralizing antibody that inactivates POL delta. In other embodiments, the inhibitor may be a dominant negative mutant of POL delta. In other embodiments, the inhibitor may target the transcript level of POL delta, e.g., a siRNA or other nucleic acid agent that knocks down the level of transcripts encoding POL delta. In other embodiments, the step of "contacting the target nucleotide molecule with a guide editor" can comprise (i) delivering an effective amount of a guide editor fusion protein (e.g., PE1 or PE2) complexed with a lipid delivery system directly to the cell; (ii) delivering mRNA or a delivery complex comprising mRNA encoding a guide editor fusion protein and/or a suitable pegRNA to a cell; or (iii) delivering to a cell one or more DNA vectors (e.g., AAV or lentiviral vectors, plasmids, or other nucleic acid delivery vectors) encoding the guide editor fusion protein and/or a suitable pegRNA.
In other aspects, the disclosure provides methods of guided editing in which the changes to be introduced into a target nucleic acid molecule evade correction by the MMR pathway without an inhibitor of the MMR pathway being provided. Surprisingly, a pegRNA designed with consecutive nucleotide mismatches relative to a target site on a target nucleic acid, e.g., a pegRNA with three or more consecutive mismatched nucleotides, can evade correction by the MMR pathway, resulting in an increase in guided editing efficiency and/or a decrease in the frequency of indel formation compared to introducing a single nucleotide mismatch using guided editing. Furthermore, correction by the MMR pathway can also be avoided by using guided editing to introduce insertions or deletions of multiple consecutive nucleotides, such as insertions or deletions of three or more consecutive nucleotides, or of 10 or more consecutive nucleotides, resulting in an increase in guided editing efficiency and/or a decrease in the frequency of indel formation as compared to guided editing with a corresponding control pegRNA (e.g., a control pegRNA that does not introduce insertions or deletions of three or more consecutive nucleotides). In some embodiments, guided editing that introduces insertions or deletions of 10 or more consecutive nucleotides results in an increase in guided editing efficiency and/or a decrease in the frequency of indel formation compared to guided editing that introduces insertions or deletions of less than 10 nucleotides in length.
Thus, in one aspect, the present disclosure provides a method of editing a nucleic acid molecule by guided editing comprising contacting the nucleic acid molecule with a guided editor and a pegRNA comprising on its extension arm a DNA synthesis template comprising three or more consecutive nucleotide mismatches relative to a target site on the nucleic acid molecule. In some embodiments, the pegRNA comprises a DNA synthesis template comprising one or more nucleotide edits relative to an endogenous sequence of a nucleic acid molecule (e.g., double stranded target DNA) to be edited, wherein the one or more nucleotide edits comprise (i) an insertion, deletion, or substitution of x consecutive nucleotides that corrects a mutation (e.g., a disease-related mutation) in the nucleic acid molecule, and (ii) an insertion, deletion, or substitution of y consecutive nucleotides immediately adjacent to the x nucleotides, wherein (x+y) is an integer no less than 3. In some embodiments, the insertion, deletion, or substitution of y consecutive nucleotides is a silent mutation. In some embodiments, the insertion, deletion, or substitution of y consecutive nucleotides is a benign mutation. The silent mutation may be present in the coding region of the target nucleic acid molecule or in a non-coding region of the target nucleic acid molecule. When silent mutations are present in the coding region, they introduce one or more substitution codons that encode the same amino acid as the unedited nucleic acid molecule into the nucleic acid molecule. Alternatively, when the silent mutation is located in a non-coding region or at a junction of a coding region and a non-coding region (e.g., an intron/exon junction), the silent mutation may be present in a region of the nucleic acid molecule that does not affect splicing, gene regulation, RNA longevity, or other biological properties of a target site on the nucleic acid molecule. 
Benign mutations can refer to nucleotide or amino acid changes that alter the amino acid sequence of a protein or polypeptide encoded by a target nucleic acid sequence, but do not impair or substantially impair the expression and/or function of the protein or polypeptide. In some embodiments, x is an integer between 1 and 50. In some embodiments, y is an integer between 1 and 50. In some embodiments, y is an integer not less than 1. In some embodiments, the inclusion of one or more silent mutations increases the efficiency of guided editing, reduces the frequency of unintended indels, and/or increases the purity of the guided editing result. As used herein, the term "guided editing result purity" may refer to the ratio of intended edits to unintended indels produced by guided editing. In some embodiments, compared to guided editing using a control pegRNA that does not include one or more silent mutations (e.g., a control pegRNA that includes only the x consecutive nucleotide insertions, deletions, or substitutions and does not include the y consecutive nucleotide insertions, deletions, or substitutions), the inclusion of one or more silent mutations increases the efficiency of guided editing, reduces the frequency of unintended indels, and/or increases the purity of the guided editing result by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
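The purity ratio and the fold-improvement comparison described above can be illustrated with a minimal sketch. The function names and the example frequencies below are hypothetical and chosen only for illustration; they are not data from the present disclosure.

```python
def editing_purity(intended_edit_freq: float, indel_freq: float) -> float:
    """Ratio of intended edits to unintended indels (both as % of reads)."""
    if indel_freq <= 0:
        raise ValueError("indel frequency must be positive to form a ratio")
    return intended_edit_freq / indel_freq


def fold_change(test_value: float, control_value: float) -> float:
    """Fold change of a test measurement relative to a control measurement."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    return test_value / control_value


# Hypothetical frequencies (percent of sequencing reads), for illustration only.
control_purity = editing_purity(intended_edit_freq=20.0, indel_freq=4.0)
test_purity = editing_purity(intended_edit_freq=30.0, indel_freq=2.0)
purity_fold = fold_change(test_purity, control_purity)
```

With these assumed values, the pegRNA carrying additional silent mutations would satisfy an "at least 1.5-fold" or "at least 3.0-fold" purity threshold but not an "at least 3.5-fold" threshold.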
In some embodiments, at least one of the three or more consecutive nucleotide mismatches results in a change in the amino acid sequence of a protein expressed from the nucleic acid molecule. In some embodiments, more than one of the consecutive nucleotide mismatches results in a change in the amino acid sequence of the protein expressed from the nucleic acid molecule. In some embodiments, at least one nucleotide mismatch is a silent mutation that does not result in a change in the amino acid sequence of a protein expressed by the nucleic acid molecule. The silent mutation may be present in the coding region of the target nucleic acid molecule or in a non-coding region of the target nucleic acid molecule. When silent mutations are present in the coding region, they introduce one or more substitution codons that encode the same amino acid as the unedited nucleic acid molecule into the nucleic acid molecule. Alternatively, when the silent mutation is located in a non-coding region, the silent mutation may be present in a region of the nucleic acid molecule that does not affect splicing, gene regulation, RNA lifetime, or other biological properties of a target site on the nucleic acid molecule.
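Whether a set of substitutions in a coding region is silent, i.e., introduces substitution codons that encode the same amino acids, can be checked with a short sketch such as the following. It assumes the standard genetic code, an in-frame coding sequence, and substitution-only edits; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_silent(original_cds: str, edited_cds: str) -> bool:
    """True if the edit changes the DNA but not the encoded amino acids."""
    if len(original_cds) != len(edited_cds):
        return False  # insertions/deletions are outside this sketch
    if original_cds.upper() == edited_cds.upper():
        return False  # nothing was edited
    return translate(original_cds) == translate(edited_cds)
```

For example, changing the codon CTG to TTA alters two consecutive nucleotides while still encoding leucine, so `is_silent("CTGAAA", "TTAAAA")` evaluates to `True`, whereas changing GAG to GTG replaces glutamic acid with valine and is not silent.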
Any number of three or more consecutive nucleotide mismatches may be used to achieve the benefit of escaping correction of the MMR pathway. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 consecutive nucleotide mismatches relative to the endogenous sequence of the target site in the nucleic acid molecule edited by directed editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 3, 4, or 5 consecutive nucleotide mismatches relative to the endogenous sequence of the target site in the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive nucleotide mismatches relative to the endogenous sequence of the target site in the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to the target site on the nucleic acid molecule.
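As an illustration of counting consecutive nucleotide mismatches, the following minimal sketch compares the endogenous target-site sequence with the sequence that the DNA synthesis template would install. It assumes substitution-only edits, so both sequences have the same length and are written on the same strand; the function name is hypothetical.

```python
def longest_mismatch_run(endogenous: str, edited: str) -> int:
    """Length of the longest run of consecutive mismatched positions."""
    if len(endogenous) != len(edited):
        raise ValueError("sequences must have equal length (substitutions only)")
    longest = current = 0
    for ref_base, new_base in zip(endogenous.upper(), edited.upper()):
        # Extend the current run on a mismatch, reset it on a match.
        current = current + 1 if ref_base != new_base else 0
        longest = max(longest, current)
    return longest


def has_consecutive_mismatches(endogenous: str, edited: str, minimum: int = 3) -> bool:
    """True if the edit contains at least `minimum` consecutive mismatches."""
    return longest_mismatch_run(endogenous, edited) >= minimum
```

For the toy sequences `"ACGTACGT"` and `"ACCATCGT"`, positions 3 to 5 differ, giving a run of three consecutive mismatches, whereas a single substitution such as `"ACGAACGT"` gives a run of one.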
In another aspect, the present disclosure provides a method of editing a nucleic acid molecule by guided editing comprising contacting the nucleic acid molecule with a guided editor and a pegRNA comprising on an extension arm thereof a DNA synthesis template comprising an insertion or deletion of 10 or more nucleotides relative to a target site on the nucleic acid molecule. Insertions and deletions of 10 or more nucleotides in length can evade correction of the MMR pathway when introduced by guided editing and thus can benefit from inhibition of the MMR pathway without the need to provide MMR inhibitors. Any length of insertions and deletions greater than 10 nucleotides may be used to achieve the benefit of evading correction of the MMR pathway. In some embodiments, the DNA synthesis template comprises an insertion or deletion of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides relative to the endogenous sequence of the target site of the nucleic acid molecule edited by directed editing. In some embodiments, the DNA synthesis template comprises an insertion or deletion of 11 or more nucleotides, 12 or more nucleotides, 13 or more nucleotides, 14 or more nucleotides, 15 or more nucleotides, 16 or more nucleotides, 17 or more nucleotides, 18 or more nucleotides, 19 or more nucleotides, 20 or more nucleotides, 21 or more nucleotides, 22 or more nucleotides, 23 or more nucleotides, 24 or more nucleotides, or 25 or more nucleotides relative to a target site on the nucleic acid molecule. In certain embodiments, the DNA synthesis template comprises insertions or deletions of 15 or more nucleotides relative to a target site on the nucleic acid molecule.
The present disclosure provides compositions and methods for guided editing with increased editing efficiency and/or reduced indel formation, achieved by inhibiting the DNA mismatch repair pathway while performing guided editing at a target site. Thus, the present disclosure provides methods of editing a nucleic acid molecule by guided editing, which methods involve contacting the nucleic acid molecule with a guide editor, a pegRNA, and an inhibitor of the DNA mismatch repair pathway, thereby installing one or more modifications to the nucleic acid molecule at a target site with increased editing efficiency and/or reduced indel formation. The present disclosure also provides polynucleotides for editing a DNA target site by guided editing comprising a nucleic acid sequence encoding a napDNAbp, a polymerase, and an inhibitor of the DNA mismatch repair pathway, wherein the napDNAbp and polymerase are capable of installing one or more modifications at the DNA target site with increased editing efficiency and/or reduced indel formation in the presence of a pegRNA. Thus, the methods and compositions described herein utilize a guide editor that may comprise a nucleic acid programmable DNA binding protein (napDNAbp).
A guide editor: napDNAbp domain
In one aspect, a napDNAbp of a guide editor described herein can bind to or complex with at least one guide nucleic acid (e.g., guide RNA or pegRNA) that localizes the napDNAbp to a DNA sequence comprising a DNA strand (i.e., target strand) complementary to the guide nucleic acid or a portion thereof (e.g., the spacer of the guide RNA that anneals to the pre-spacer of the DNA target). In other words, the guide nucleic acid "programs" the napDNAbp (e.g., Cas9 or equivalent) to locate and bind to the complement of the pre-spacer in the DNA.
Any suitable napDNAbp can be used in the guide editors used in the methods and compositions described herein. In various embodiments, the napDNAbp can be from any class 2 CRISPR-Cas system, including any type II, type V, or type VI CRISPR-Cas enzyme. In view of the rapid development of CRISPR-Cas as a genome editing tool, the nomenclature used to describe and/or identify CRISPR-Cas enzymes, such as Cas9 and Cas9 orthologs, has continually evolved. The present application refers to CRISPR-Cas enzymes using nomenclature that may be old and/or new. Those skilled in the art will be able to determine the particular CRISPR-Cas enzyme referred to in this application based on the nomenclature used, whether it is an old (i.e., "legacy") nomenclature or a new nomenclature. CRISPR-Cas nomenclature is discussed extensively in Makarova et al., "Classification and Nomenclature of CRISPR-Cas Systems: Where from Here?" The CRISPR Journal, Vol. 1, No. 5, 2018, the entire contents of which are incorporated herein by reference. The particular CRISPR-Cas nomenclature used in any given instance of the present application is not limiting in any way, and one of skill in the art will be able to determine which CRISPR-Cas enzyme is being referenced.
For example, the following type II, type V, and type VI class 2 CRISPR-Cas enzymes have the following old (i.e., legacy) and new names recognized in the art. Each of these enzymes and/or variants thereof may be used with the guide editors used in the methods and compositions described herein:
* See Makarova et al The CRISPR Journal, vol.1, no.5, 2018
Without being bound by theory, the mechanism of action of certain napDNAbps contemplated herein includes a step of forming an R-loop, whereby the napDNAbp induces unwinding of the double-stranded DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA spacer then hybridizes to the "target strand" at a region complementary to the pre-spacer sequence. This displaces the "non-target strand" that is complementary to the target strand, which forms the single-stranded region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities that cleave DNA, leaving behind different types of lesions. For example, the napDNAbp can comprise nuclease activity that cleaves the non-target strand at a first location and/or cleaves the target strand at a second location. Depending on the nuclease activity, the target DNA may be cleaved on both strands to form a "double strand break." In other embodiments, the target DNA may be cleaved at only a single site, i.e., the DNA is "nicked" on one strand. Exemplary napDNAbps with different nuclease activities include "Cas9 nickase" ("nCas9") and inactive Cas9 without nuclease activity ("dead Cas9" or "dCas9").
The following description of various napDNAbps that may be used in connection with the guide editors used in the presently disclosed methods and compositions is not meant to be limiting in any way. The guide editor may include classical SpCas9, or any orthologous Cas9 protein, or any variant Cas9 protein, including any naturally occurring Cas9 variant, mutant, or other engineered version, which is known or may be prepared or evolved by directed evolution or other mutagenesis processes. In various embodiments, the Cas9 or Cas9 variant has nickase activity, i.e., cleaves only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variant has inactive nucleases, i.e., is a "dead" Cas9 protein. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than classical SpCas9 (e.g., for easier delivery) or having a modified or rearranged primary amino acid structure (e.g., a circularly permuted form).
The guide editors used in the methods and compositions described herein may also include Cas9 equivalents, including Cas12a (Cpf1) and Cas12b1 proteins, which are the result of convergent evolution. As used herein, a napDNAbp (e.g., SpCas9, a Cas9 variant, or a Cas9 equivalent) may also contain various modifications that alter/enhance its PAM specificity. Finally, the present application contemplates any Cas9, Cas9 variant, or Cas9 equivalent having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence (e.g., a reference classical SpCas9 sequence or a reference Cas9 equivalent (e.g., Cas12a (Cpf1))).
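A sequence identity threshold of this kind can be evaluated with a minimal sketch such as the following. It assumes the two sequences have already been aligned (gaps written as "-") and defines percent identity as identical residues divided by the number of alignment columns, which is only one of several conventions in use; the function name and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1 for res_a, res_b in zip(aligned_a, aligned_b)
        if res_a == res_b and res_a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy 10-residue peptides differing at one position (illustration only).
reference = "MKRNYILGLD"
variant = "MKRNYILGLA"
identity = percent_identity(reference, variant)
meets_80_percent = identity >= 80.0
```

In practice the alignment itself would come from a dedicated alignment tool, and the choice of denominator (alignment length, shorter sequence, or reference length) should be stated alongside any reported identity value.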
The napDNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease. As described above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters comprise spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In a type II CRISPR system, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-assisted processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves the linear or circular dsDNA target complementary to the spacer. The target strand that is not complementary to the crRNA is first cleaved endonucleolytically and then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA ("sgRNA", or simply "gRNA") may be engineered to incorporate aspects of both crRNA and tracrRNA into a single RNA species. See, for example, Jinek M. et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.
In some embodiments, napDNAbp directs cleavage of one or both strands at a target sequence position (e.g., within the target sequence and/or within the complement of the target sequence). In some embodiments, napDNAbp directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500 or more base pairs from the first or last nucleotide of the target sequence. In some embodiments, the vector encodes a napDNAbp that is mutated with respect to the corresponding wild-type enzyme such that the mutated napDNAbp lacks the ability to cleave one or both strands of a target polynucleotide comprising a target sequence. For example, aspartic acid to alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from streptococcus pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves single strand). Other examples of mutations that make Cas9 a nickase include, but are not limited to, H840A, N854A and N863A that reference equivalent amino acid sites in classical SpCas9 sequences or other Cas9 variants or Cas9 equivalents.
As used herein, the term "Cas protein" refers to a full-length Cas protein obtained from nature, a recombinant Cas protein having a different sequence than a naturally occurring Cas protein, or any fragment of a Cas protein that retains all or a substantial amount of the essential functions required for the disclosed methods, i.e., (i) the ability of the Cas protein to bind a target DNA in a nucleic acid-programmable manner, and (ii) the ability to nick a target DNA sequence on one strand. Cas proteins contemplated herein include CRISPR Cas9 proteins, as well as Cas9 equivalents, variants (e.g., Cas9 nickase (nCas9) or nuclease-inactive Cas9 (dCas9)), homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and can include Cas9 equivalents from any class 2 CRISPR system (e.g., type II, V, VI), including Cas12a (Cpf1), Cas12e (CasX), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), C2c4, C2c8, C2c5, C2c10, C2c9, Cas13a (C2c2), Cas13d, Cas13c (C2c7), Cas13b (C2c6), and Cas13b. Other Cas equivalents are described in Makarova et al., "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector," Science 2016; 353 (6299), and Makarova et al., "Classification and Nomenclature of CRISPR-Cas Systems: Where from Here?" The CRISPR Journal, Vol. 1, No. 5, 2018, the contents of which are incorporated herein by reference.
The term "Cas9" or "Cas9 nuclease" or "Cas9 portion" or "Cas9 domain" includes any naturally occurring Cas9 from any organism, any naturally occurring Cas9 equivalent or functional fragment thereof, any Cas9 homolog, ortholog, or paralog from any organism, as well as any mutant or variant of Cas9 that is naturally occurring or engineered. The term Cas9 is not meant to be particularly limiting and may be referred to as "Cas9 or equivalent." Exemplary Cas9 proteins are further described herein and/or described in the art and are incorporated herein by reference. The present disclosure is not limited to the particular Cas9 used in the guide editor used in the methods and compositions described herein.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti J.J., McShan W.M., Ajdic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference).
Examples of Cas9 and Cas9 equivalents are provided below; however, these specific examples are not meant to be limiting. The guide editor used in the methods and compositions of the present disclosure may use any suitable napDNAbp, including any suitable Cas9 or Cas9 equivalent.
A. Wild type classical SpCas9
In one embodiment, the guide editor constructs used in the methods and compositions described herein may comprise a "classical SpCas9" nuclease from Streptococcus pyogenes, which has been widely used as a tool for genome engineering and is classified as a type II enzyme of the class 2 CRISPR-Cas systems. Such Cas9 proteins are large multi-domain proteins comprising two distinct nuclease domains. Point mutations can be introduced into Cas9 to eliminate one or both nuclease activities, resulting in nickase Cas9 (nCas9) or dead Cas9 (dCas9), respectively, that still retain the ability to bind DNA in an sgRNA-programmed manner. In principle, when fused to another protein or domain, Cas9 or a variant thereof (e.g., nCas9) can target that protein to almost any DNA sequence by co-expression with an appropriate sgRNA. As used herein, classical SpCas9 protein refers to a wild-type protein from Streptococcus pyogenes having the amino acid sequence:
The guide editors used in the methods and compositions described herein can include classical SpCas9, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type Cas9 sequence provided above. These variants may include SpCas9 variants that contain one or more mutations, including any known mutations reported by SwissProt accession No. Q99ZW2 (SEQ ID NO: 2), including:
Other wild-type SpCas9 sequences useful in the present disclosure include:
The guide editors used in the methods and compositions described herein can include any of the SpCas9 sequences described above, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
B. Wild type Cas9 orthologs
In other embodiments, the Cas9 protein may be a wild-type Cas9 ortholog from another bacterial species that is different from classical Cas9 from streptococcus pyogenes. For example, the following Cas9 orthologs may be used in conjunction with the guided editor constructs used in the methods and compositions described herein. Furthermore, any variant Cas9 ortholog having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the following orthologs may also be used with the guide editor.
The guide editors used in the methods and compositions described herein can include any of the Cas9 orthologous sequences described above, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
The napDNAbp may include any suitable homologs and/or orthologs of naturally occurring enzymes, such as Cas9. Cas9 homologs and/or orthologs have been described in different species including, but not limited to, Streptococcus pyogenes and Streptococcus thermophilus. Preferably, the Cas moiety is configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Other suitable Cas9 nucleases and sequences will be apparent to the skilled artisan based on the present disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference. In some embodiments, the Cas9 nuclease has an inactive (e.g., inactivated) DNA cleavage domain, i.e., the Cas9 is a nickase. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of Table 3. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the Cas9 protein provided by any one of the Cas9 orthologs in the table above.
C. Dead Cas9 variants
In certain embodiments, the guide editors used in the methods and compositions described herein can include a dead Cas9, e.g., dead SpCas9, that has no nuclease activity due to one or more mutations that inactivate both nuclease domains of Cas9, i.e., the RuvC domain (which cleaves the non-pre-spacer DNA strand) and the HNH domain (which cleaves the pre-spacer DNA strand). Nuclease inactivation may be due to one or more mutations resulting in one or more substitutions and/or deletions in the amino acid sequence of the encoded protein or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
As used herein, the term "dCas9" refers to Cas9 without nuclease activity or nuclease-dead Cas9, or a functional fragment thereof, and includes any naturally occurring dCas9 from any organism, any naturally occurring dCas9 equivalent or functional fragment thereof, any engineered dCas9 variant or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of naturally occurring or engineered dCas9. The term dCas9 is not meant to be particularly limiting and may be referred to as "dCas9 or equivalent." Exemplary dCas9 proteins and methods for preparing dCas9 proteins are further described herein and/or described in the art and incorporated herein by reference.
In other embodiments, dCas9 corresponds to or comprises part or all of a Cas9 amino acid sequence having one or more mutations that inactivate Cas9 nuclease activity. In other embodiments, Cas9 variants are provided having mutations other than D10A and H840A that result in partial or complete inactivation of the endogenous Cas9 nuclease activity (e.g., nCas9 or dCas9, respectively). For example, with reference to a wild-type sequence such as Cas9 from Streptococcus pyogenes (NCBI reference sequence: NC_017053.1), such mutations include other amino acid substitutions at D10 and H840 of Cas9, or other substitutions within the nuclease domains (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain). In some embodiments, variants or homologs of Cas9 are provided (e.g., variants of Cas9 from Streptococcus pyogenes (NCBI reference sequence: NC_017053.1 (SEQ ID NO: 4))) that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to NCBI reference sequence NC_017053.1. In some embodiments, variants of Cas9 (e.g., variants of NCBI reference sequence: NC_017053.1 (SEQ ID NO: 4)) are provided having amino acid sequences that are shorter or longer than NC_017053.1 (SEQ ID NO: 4) by about 5 amino acids, about 10 amino acids, about 15 amino acids, about 20 amino acids, about 25 amino acids, about 30 amino acids, about 40 amino acids, about 50 amino acids, about 75 amino acids, about 100 amino acids, or more.
In one embodiment, the dead Cas9 may be based on the canonical SpCas9 sequence of Q99ZW2 and may have the following sequence, which comprises D10X and H810X substitutions (underlined and bold), where X may be any amino acid, or may be a variant having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 40.
In one embodiment, the dead Cas9 may be based on the canonical SpCas9 sequence of Q99ZW2 and may have the following sequence, which comprises D10A and H810A substitutions (underlined and bold), or may be a variant having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 23.
D.Cas9 nickase variants
In one embodiment, the guide editor used in the methods and compositions described herein comprises a Cas9 nickase. The term "Cas9 nickase" or "nCas9" refers to a Cas9 variant that is capable of introducing a single-strand break in a double-stranded DNA target molecule. In some embodiments, the Cas9 nickase comprises only a single functional nuclease domain. Wild-type Cas9 (e.g., canonical SpCas9) comprises two separate nuclease domains, namely, a RuvC domain (which cleaves the non-protospacer DNA strand) and an HNH domain (which cleaves the protospacer DNA strand). In one embodiment, the Cas9 nickase comprises a mutation in the RuvC domain that inactivates RuvC nuclease activity. For example, mutations at aspartic acid (D) 10, histidine (H) 983, aspartic acid (D) 986, or glutamic acid (E) 762 have been reported as loss-of-function mutations of the RuvC nuclease domain that create a functional Cas9 nickase (e.g., Nishimasu et al., "Crystal structure of Cas9 in complex with guide RNA and target DNA," Cell 156(5), 935-949, which is incorporated herein by reference). Thus, nickase mutations in the RuvC domain can include D10X, H983X, D986X, or E762X, where X is any amino acid other than the wild-type amino acid. In certain embodiments, the nickase may be D10A, H983A, D986A, or E762A, or a combination thereof.
In various embodiments, the Cas9 nickase may have a mutation in the RuvC nuclease domain and have one of the following amino acid sequences or a variant of an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
In another embodiment, the Cas9 nickase comprises a mutation in the HNH domain that inactivates HNH nuclease activity. For example, mutations at histidine (H) 840 or asparagine (R) 863 have been reported as loss-of-function mutations of the HNH nuclease domain that create a functional Cas9 nickase (e.g., Nishimasu et al., "Crystal structure of Cas9 in complex with guide RNA and target DNA," Cell 156(5), 935-949, which is incorporated herein by reference). Thus, nickase mutations in the HNH domain may include H840X and R863X, where X is any amino acid other than the wild-type amino acid. In certain embodiments, the nickase may be H840A or R863A, or a combination thereof.
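The substitution notation used above (e.g., D10A, or D10X where X is any non-wild-type amino acid) can be illustrated with a minimal Python sketch. The function names, the regular expression, and the ten-residue toy peptide are assumptions made purely for illustration; the sketch is not taken from the disclosure and does not use any actual Cas9 sequence.

```python
import re

# Standard one-letter codes for the 20 canonical amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Notation assumed here: wild-type residue, 1-based position, new residue (e.g. "D10A").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation: str):
    """Split a label such as 'D10A' into (wild_type, position, replacement)."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def expand_any(mutation: str):
    """Expand 'D10X' into the 19 concrete substitutions other than wild type."""
    wild_type, position, replacement = parse_mutation(mutation)
    if replacement != "X":
        return [mutation]
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]


def apply_substitution(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` carrying the substitution, after checking
    that the stated wild-type residue is actually present at that position."""
    wild_type, position, replacement = parse_mutation(mutation)
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    toy_peptide = "MKTAYIAKQD"  # hypothetical peptide; residue 10 is D
    print(apply_substitution(toy_peptide, "D10A"))
    print(expand_any("D10X"))
```

Checking the wild-type residue before substituting guards against applying a label to a sequence whose numbering differs from the reference.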
In various embodiments, the Cas9 nickase may have a mutation in the HNH nuclease domain and have one of the following amino acid sequences or a variant of an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
In some embodiments, the N-terminal methionine is removed from a Cas9 nickase, or from any Cas9 variant, ortholog, or equivalent disclosed or contemplated herein. For example, methionine-minus Cas9 nickases include the following sequences, or a variant thereof having an amino acid sequence with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
E.Other Cas9 variants
In addition to dead Cas9 and Cas9 nickase variants, the Cas9 proteins used herein may also include other "Cas9 variants" that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference Cas9 protein, including any wild-type Cas9, mutant Cas9 (e.g., a dead Cas9 or Cas9 nickase), Cas9 fragment, circularly permuted Cas9, or other variant of Cas9 disclosed herein or known in the art. In some embodiments, a Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to a reference Cas9. In some embodiments, the Cas9 variant comprises a fragment of a reference Cas9 (e.g., a gRNA binding domain or a DNA cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of the corresponding wild-type Cas9 (e.g., SEQ ID NO: 2).
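As a rough illustration of how an identity threshold such as "at least about 90% identical" might be checked, the following Python sketch computes percent identity over two sequences that have already been aligned to equal length. The function names, the use of "-" for gaps, and the choice of denominator (all aligned columns except those in which both sequences carry a gap) are assumptions made for illustration only; this passage does not prescribe a particular alignment tool or identity formula, and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap, and columns in which both sequences
    have a gap are ignored. A gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" and residue_b == "-":
            continue
        columns += 1
        if residue_a == residue_b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MKTAYIAKQD"  # hypothetical reference peptide
    variant = "MKTAYIAKQA"    # differs at one of ten positions
    print(percent_identity(reference, variant))
    print(meets_threshold(reference, variant, 95.0))
```

Different denominators (for example, the length of the shorter sequence) give different percentages, so any real comparison should state which convention it uses.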
In some embodiments, the present disclosure can also utilize Cas9 fragments that retain functionality and are fragments of any Cas9 protein disclosed herein. In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length.
In various embodiments, the guide editors used in the methods and compositions disclosed herein can comprise one of the Cas9 variants described below or a Cas9 variant that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the reference Cas9 variants.
F.Small Cas9 variants
In some embodiments, the guide editors used in the methods and compositions contemplated herein may include a Cas9 protein having a smaller molecular weight than the canonical SpCas9 sequence. In some embodiments, small Cas9 variants may facilitate delivery to cells, e.g., by an expression vector, nanoparticle, or other means of delivery. In certain embodiments, the small Cas9 variants may include enzymes classified as type II enzymes of the Class 2 CRISPR-Cas systems. In some embodiments, the small Cas9 variants may include enzymes classified as type V enzymes of the Class 2 CRISPR-Cas systems. In other embodiments, the small Cas9 variants may include enzymes classified as type VI enzymes of the Class 2 CRISPR-Cas systems.
The canonical SpCas9 protein is 1368 amino acids in length and has a predicted molecular weight of 158 kilodaltons. As used herein, the term "small Cas9 variant" refers to any Cas9 variant (naturally occurring, engineered, or otherwise) that is less than 1300 amino acids, or less than 1290 amino acids, or less than 1280 amino acids, or less than 1270 amino acids, or less than 1260 amino acids, or less than 1250 amino acids, or less than 1240 amino acids, or less than 1230 amino acids, or less than 1220 amino acids, or less than 1210 amino acids, or less than 1200 amino acids, or less than 1190 amino acids, or less than 1180 amino acids, or less than 1170 amino acids, or less than 1160 amino acids, or less than 1150 amino acids, or less than 1140 amino acids, or less than 1130 amino acids, or less than 1120 amino acids, or less than 1110 amino acids, or less than 1100 amino acids, or less than 1050 amino acids, or less than 1000 amino acids, or less than 900 amino acids, or less than 850 amino acids, or less than 800 amino acids, or less than about 500 amino acids in length, while retaining the required function of the Cas9 protein. The Cas9 variants may include those classified as type II, type V, or type VI enzymes of the Class 2 CRISPR-Cas systems.
In various embodiments, the guide editors used in the methods and compositions disclosed herein can comprise small Cas9 variants described below or Cas9 variants that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference small Cas9 protein.
G.Cas9 equivalents
In some embodiments, the guide editor used in the methods and compositions described herein can include any Cas9 equivalent. As used herein, the term "Cas9 equivalent" is a broad term that encompasses any napDNAbp protein that serves the same function as Cas9 in a guide editor, even though its primary amino acid sequence and/or its three-dimensional structure may be different and/or unrelated from an evolutionary standpoint. Thus, while Cas9 equivalents include any Cas9 ortholog, homolog, mutant, or variant described or embraced herein that is evolutionarily related, Cas9 equivalents also embrace proteins that may have evolved through convergent evolution to have the same or similar function as Cas9 but that do not necessarily have any similarity in amino acid sequence and/or three-dimensional structure. The guide editors used in the methods and compositions described herein embrace any Cas9 equivalent that provides the same or similar function as Cas9, even where the Cas9 equivalent is based on a protein that arose through convergent evolution. For example, if Cas9 refers to a type II enzyme of the CRISPR-Cas system, a Cas9 equivalent can refer to a type V or type VI enzyme of the CRISPR-Cas system.
For example, Cas12e (CasX) is a Cas9 equivalent that reportedly has the same function as Cas9 but that evolved through convergent evolution. Thus, the Cas12e (CasX) protein described in Liu et al., "CasX enzymes comprises a distinct family of RNA-guided genome editors," Nature, 2019, Vol. 566: 218-223, is contemplated for use with the guide editors used in the methods and compositions described herein. In addition, any variant or modification of Cas12e (CasX) is contemplated and within the scope of the present disclosure.
Cas9 is a bacterial enzyme that evolved in a wide variety of species. However, the Cas9 equivalents contemplated herein may also be obtained from archaea, which constitute a domain and kingdom of single-celled prokaryotic microbes distinct from bacteria.
In some embodiments, Cas9 equivalents may refer to Cas12e (CasX) or Cas12d (CasY), which have been described, for example, in Burstein et al., "New CRISPR-Cas systems from uncultivated microbes," Cell Res. 2017 Feb 21. doi: 10.1038/cr.2017.21, the entire contents of which are incorporated herein by reference. Using genome-resolved metagenomics, a number of CRISPR-Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems were discovered, CRISPR-Cas12e and CRISPR-Cas12d, which are among the most compact systems discovered so far. In some embodiments, Cas9 refers to Cas12e, or a variant of Cas12e. In some embodiments, Cas9 refers to Cas12d, or a variant of Cas12d. It should be appreciated that other RNA-guided DNA-binding proteins may be used as the nucleic acid programmable DNA binding protein (napDNAbp) and are within the scope of the present disclosure. See also Liu et al., "CasX enzymes comprises a distinct family of RNA-guided genome editors," Nature, 2019, Vol. 566: 218-223. Any such Cas9 equivalents are contemplated.
In some embodiments, the Cas9 equivalent comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp is a naturally occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas portion or any Cas portion provided herein.
In various embodiments, the nucleic acid-programmable DNA-binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), Cas12e (CasX), Cas12d (CasY), Cas12a (Cpf1), Cas12b1 (C2c1), Cas13a (C2c2), Cas12c (C2c3), Argonaute proteins, and Cas12b1. One example of a nucleic acid-programmable DNA-binding protein that has a different PAM specificity than Cas9 is Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 (i.e., Cas12a (Cpf1)). Like Cas9, Cas12a (Cpf1) is a Class 2 CRISPR effector, but it is a type V enzyme rather than a type II enzyme. Cas12a (Cpf1) has been shown to mediate robust DNA interference with features distinct from Cas9. Cas12a (Cpf1) is a single RNA-guided endonuclease that lacks tracrRNA and utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN). Moreover, Cpf1 cleaves DNA via a staggered DNA double-strand break. Of 16 Cpf1-family proteins, two enzymes from Acidaminococcus and Lachnospiraceae have been shown to have efficient genome-editing activity in human cells. Cpf1 proteins are known in the art and have been described previously, for example, in Yamano et al., "Crystal structure of Cpf1 in complex with guide RNA and target DNA," Cell (165) 2016, p. 949-962, the entire contents of which are incorporated herein by reference.
In other embodiments, Cas proteins can include any CRISPR-associated protein, including, but not limited to, Cas12a, Cas12b1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also referred to as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr, Csb, Csb, Csm, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX 3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof, and preferably comprise a nickase mutation (e.g., a mutation corresponding to the D10A mutation of the wild-type Cas9 polypeptide of SEQ ID NO: 2).
In various other embodiments, the napDNAbp may be any one of the following proteins: Cas9, Cas12a (Cpf1), Cas12e (CasX), Cas12d (CasY), Cas12b1 (C2c1), Cas13a (C2c2), Cas12c (C2c3), GeoCas9, CjCas9, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, a circularly permuted Cas9, or an Argonaute (Ago) domain, or a variant thereof.
Exemplary Cas9 equivalent protein sequences may include the following:
The guide editors used in the methods and compositions described herein may also include Cas12a (Cpf1) (dCpf1) variants, which may be used as a guide nucleotide sequence-programmable DNA-binding protein domain. The Cas12a (Cpf1) protein has a RuvC-like endonuclease domain that is similar to the RuvC domain of Cas9 but does not have an HNH endonuclease domain, and the N-terminus of Cas12a (Cpf1) does not have the alpha-helical recognition lobe of Cas9. Zetsche et al., Cell, 163, 759-771, 2015 (which is incorporated herein by reference) showed that the RuvC-like domain of Cas12a (Cpf1) is responsible for cleaving both DNA strands and that inactivation of the RuvC-like domain inactivates Cas12a (Cpf1) nuclease activity.
In some embodiments, the napDNAbp is a single effector of a microbial CRISPR-Cas system. Single effectors of microbial CRISPR-Cas systems include, without limitation, Cas9, Cas12a (Cpf1), Cas12b1 (C2c1), Cas13a (C2c2), and Cas12c (C2c3). Typically, microbial CRISPR-Cas systems are divided into Class 1 and Class 2 systems. Class 1 systems have multi-subunit effector complexes, while Class 2 systems have a single protein effector. For example, Cas9 and Cas12a (Cpf1) are Class 2 effectors. In addition to Cas9 and Cas12a (Cpf1), three distinct Class 2 CRISPR-Cas systems (Cas12b1, Cas13a, and Cas12c) have been described in Shmakov et al., "Discovery and Functional Characterization of Diverse Class 2 CRISPR Cas Systems", Mol. Cell, 2015 Nov 5; 60(3): 385-397, the entire contents of which are incorporated herein by reference.
Effectors of two of the systems (Cas12b1 and Cas12c) contain RuvC-like endonuclease domains related to Cas12a. The third system, Cas13a, contains an effector with two predicted HEPN RNase domains. Production of mature CRISPR RNA is tracrRNA-independent, unlike production of CRISPR RNA by Cas12b1. Cas12b1 depends on both CRISPR RNA and tracrRNA for DNA cleavage. Bacterial Cas13a has been shown to possess a unique RNase activity for CRISPR RNA maturation that is distinct from its RNA-activated single-stranded RNA degradation activity. These RNase functions are different from each other and from the CRISPR RNA-processing behavior of Cas12a. See, e.g., East-Seletsky et al., "Two distinct RNase activities of CRISPR-Cas13a enable guide-RNA processing and RNA detection", Nature, 2016 Oct 13; 538(7624): 270-273, the entire contents of which are incorporated herein by reference. In vitro biochemical analysis of Cas13a in Leptotrichia shahii has shown that Cas13a is guided by a single CRISPR RNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. Catalytic residues in the two conserved HEPN domains mediate cleavage. Mutations in the catalytic residues generate catalytically inactive RNA-binding proteins. See, e.g., Abudayyeh et al., "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector", Science, 2016 Aug 5; 353(6299), the entire contents of which are incorporated herein by reference.
The crystal structure of Alicyclobacillus acidoterrestris Cas12b1 (AacC2c1) has been reported in complex with a chimeric single-molecule guide RNA (sgRNA). See, e.g., Liu et al., "C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism", Mol. Cell, 2017 Jan 19; 65(2): 310-322, the entire contents of which are incorporated herein by reference. The crystal structure of Alicyclobacillus acidoterrestris C2c1 bound to target DNAs as ternary complexes has also been reported. See, e.g., Yang et al., "PAM-dependent Target DNA Recognition and Cleavage by C2C1 CRISPR-Cas endonuclease", Cell, 2016 Dec 15; 167(7): 1814-1828, the entire contents of which are incorporated herein by reference. Catalytically competent conformations of AacC2c1, both with target and non-target DNA strands, have been captured independently positioned within a single RuvC catalytic pocket, with C2c1-mediated cleavage resulting in a staggered seven-nucleotide break of the target DNA. Structural comparisons between C2c1 ternary complexes and previously identified Cas9 and Cpf1 counterparts demonstrate the diversity of mechanisms used by CRISPR-Cas9 systems.
In some embodiments, the napDNAbp can be a C2c1, a C2c2, or a C2c3 protein. In some embodiments, the napDNAbp is a C2c1 protein. In some embodiments, the napDNAbp is a Cas13a protein. In some embodiments, the napDNAbp is a Cas12c protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring Cas12b1 (C2c1), Cas13a (C2c2), or Cas12c (C2c3) protein. In some embodiments, the napDNAbp is a naturally occurring Cas12b1 (C2c1), Cas13a (C2c2), or Cas12c (C2c3) protein.
H.Cas9 circular permutants
In various embodiments, the guide editors used in the methods and compositions disclosed herein can comprise a circular permutant of Cas9.
The term "circularly permuted Cas9" or "circular permutant of Cas9" or "CP-Cas9" refers to any Cas9 protein, or variant thereof, that occurs as or has been modified or engineered into a circular permutant variant, meaning that the N-terminus and C-terminus of a Cas9 protein (e.g., a wild-type Cas9 protein) have been topically rearranged. Such circularly permuted Cas9 proteins, or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., "Protein Engineering of Cas9 for enhanced function," Methods Enzymol, 2014, 546: 491-511, and Oakes et al., "CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification," Cell, January 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The present disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).
Any Cas9 protein described herein, including any variant, ortholog, or any engineered or naturally occurring Cas9 or equivalent thereof, can be reconfigured as a circular permutant variant.
In various embodiments, the circular permutant of Cas9 can have the following structure:
N-terminus-[original C-terminus]-[optional linker]-[original N-terminus]-C-terminus.
As an example, the present disclosure contemplates the following circular permutants of canonical Streptococcus pyogenes Cas9 (1368 amino acids; UniProtKB - Q99ZW2 (CAS9_STRP1); numbering based on the amino acid positions in SEQ ID NO: 2):
N-terminus-[1268-1368]-[optional linker]-[1-1267]-C-terminus;
N-terminus-[1168-1368]-[optional linker]-[1-1167]-C-terminus;
N-terminus-[1068-1368]-[optional linker]-[1-1067]-C-terminus;
N-terminus-[968-1368]-[optional linker]-[1-967]-C-terminus;
N-terminus-[868-1368]-[optional linker]-[1-867]-C-terminus;
N-terminus-[768-1368]-[optional linker]-[1-767]-C-terminus;
N-terminus-[668-1368]-[optional linker]-[1-667]-C-terminus;
N-terminus-[568-1368]-[optional linker]-[1-567]-C-terminus;
N-terminus-[468-1368]-[optional linker]-[1-467]-C-terminus;
N-terminus-[368-1368]-[optional linker]-[1-367]-C-terminus;
N-terminus-[268-1368]-[optional linker]-[1-267]-C-terminus;
N-terminus-[168-1368]-[optional linker]-[1-167]-C-terminus;
N-terminus-[68-1368]-[optional linker]-[1-67]-C-terminus; or
N-terminus-[10-1368]-[optional linker]-[1-9]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
In a particular embodiment, the circularly permuted Cas9 has the following structure (based on Streptococcus pyogenes Cas9 (1368 amino acids; UniProtKB - Q99ZW2 (CAS9_STRP1); numbering based on the amino acid positions in SEQ ID NO: 2)):
N-terminus-[102-1368]-[optional linker]-[1-101]-C-terminus;
N-terminus-[1028-1368]-[optional linker]-[1-1027]-C-terminus;
N-terminus-[1041-1368]-[optional linker]-[1-1040]-C-terminus;
N-terminus-[1249-1368]-[optional linker]-[1-1248]-C-terminus; or
N-terminus-[1300-1368]-[optional linker]-[1-1299]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
In other embodiments, the circularly permuted Cas9 has the following structure (based on Streptococcus pyogenes Cas9 (1368 amino acids; UniProtKB - Q99ZW2 (CAS9_STRP1); numbering based on the amino acid positions in SEQ ID NO: 2)):
N-terminus-[103-1368]-[optional linker]-[1-102]-C-terminus;
N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus;
N-terminus-[1042-1368]-[optional linker]-[1-1041]-C-terminus;
N-terminus-[1250-1368]-[optional linker]-[1-1249]-C-terminus; or
N-terminus-[1301-1368]-[optional linker]-[1-1300]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of the Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment may correspond to the C-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1300-1368), or the C-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., any one of the sequences of SEQ ID NOs: 54-63). The N-terminal portion may correspond to the N-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1-1300), or the N-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., of SEQ ID NO: 2).
In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30% or less of the amino acids of a Cas9 (e.g., amino acids 1012-1368 of SEQ ID NO: 2). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% of the amino acids of a Cas9 (e.g., the Cas9 of SEQ ID NO: 2). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 410 residues or fewer of a Cas9 (e.g., the Cas9 of SEQ ID NO: 2). In some embodiments, the C-terminal portion that is rearranged to the N-terminus includes or corresponds to the C-terminal 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 2). In some embodiments, the C-terminal portion that is rearranged to the N-terminus includes or corresponds to the C-terminal 357, 341, 328, 120, or 69 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 2).
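The residue counts recited above follow directly from the first residue of the relocated C-terminal fragment: a fragment spanning residues s through 1368 contains 1368 - s + 1 residues. The short Python sketch below works through this arithmetic for start positions 1012, 1028, 1041, 1249, and 1300, which yield the recited 357, 341, 328, 120, and 69 residues. The function names are hypothetical and chosen only for illustration.

```python
CAS9_LENGTH = 1368  # length of canonical SpCas9 as stated in the text


def c_terminal_fragment_length(first_residue: int, total: int = CAS9_LENGTH) -> int:
    """Number of residues in the fragment [first_residue .. total], 1-based inclusive."""
    if not 1 <= first_residue <= total:
        raise ValueError("first_residue must lie within the protein")
    return total - first_residue + 1


def percent_of_protein(n_residues: int, total: int = CAS9_LENGTH) -> float:
    """Share of the full-length protein represented by n_residues."""
    return 100.0 * n_residues / total


if __name__ == "__main__":
    for start in (1012, 1028, 1041, 1249, 1300):
        n = c_terminal_fragment_length(start)
        print(f"{start}-{CAS9_LENGTH}: {n} residues "
              f"({percent_of_protein(n):.1f}% of the protein)")
```

Each of these fragments is below the 410-residue and 30% limits given in the same paragraph.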
In other embodiments, circular permutant Cas9 variants may be defined as a topological rearrangement of a Cas9 primary structure based on the following method, which is based on the Streptococcus pyogenes Cas9 of SEQ ID NO: 2: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which divides the original protein into two halves, an N-terminal region and a C-terminal region; and (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to precede the original N-terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue. The CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain. For example, the CP site may be located (relative to the Streptococcus pyogenes Cas9 of SEQ ID NO: 2) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282. Thus, once relocated to the N-terminus, original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N-terminal amino acid. These CP-Cas9 proteins may be referred to as Cas9-CP181, Cas9-CP199, Cas9-CP230, Cas9-CP270, Cas9-CP310, Cas9-CP1010, Cas9-CP1016, Cas9-CP1023, Cas9-CP1029, Cas9-CP1041, Cas9-CP1247, Cas9-CP1249, and Cas9-CP1282, respectively. This description is not meant to be limited to making CP variants from SEQ ID NO: 2, but may be implemented to make CP variants of any Cas9 sequence, either at CP sites that correspond to these positions or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
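Steps (a) and (b) of the method above amount to a simple string rearrangement, which the following minimal Python sketch illustrates. The function name, the 1-based site convention, the "GGS" linker, and the ten-residue toy peptide are assumptions made for illustration only; the sketch does not use an actual Cas9 sequence, does not handle the optional methionine, and says nothing about whether a given permutant retains DNA-binding activity.

```python
def circularly_permute(sequence: str, cp_site: int, linker: str = "") -> str:
    """Return the circular permutant of `sequence` at 1-based `cp_site`.

    Step (a): the CP site splits the protein into an N-terminal region
    (residues 1 .. cp_site - 1) and a C-terminal region (cp_site .. end).
    Step (b): the C-terminal region is moved in front of the N-terminal
    region, optionally joined through a linker, so that the new protein
    begins with the CP-site residue.
    """
    if not 2 <= cp_site <= len(sequence):
        raise ValueError("cp_site must be an internal residue position")
    n_terminal_region = sequence[: cp_site - 1]
    c_terminal_region = sequence[cp_site - 1:]
    return c_terminal_region + linker + n_terminal_region


if __name__ == "__main__":
    toy_peptide = "MKTAYIAKQD"  # hypothetical 10-residue peptide
    print(circularly_permute(toy_peptide, 8))         # no linker
    print(circularly_permute(toy_peptide, 8, "GGS"))  # hypothetical linker
```

Without a linker the permutant has the same length and composition as the original sequence; only the order of the two regions changes.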
Exemplary CP-Cas9 amino acid sequences based on the Cas9 of SEQ ID NO: 2 are provided below, in which linker sequences are underlined and optional methionine (M) residues are shown in bold. It should be appreciated that the present disclosure provides CP-Cas9 sequences that do not include a linker sequence or that include a different linker sequence. It should also be appreciated that CP-Cas9 sequences may be based on Cas9 sequences other than that of SEQ ID NO: 2, and any examples provided herein are not meant to be limiting. Exemplary CP-Cas9 sequences are as follows:
Cas9 circular permutants may be useful in the guide editing constructs used in the methods and compositions described herein. Exemplary C-terminal fragments of Cas9, based on the Cas9 of SEQ ID NO: 2, which may be rearranged to the N-terminus of Cas9, are provided below. It should be appreciated that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting. These exemplary CP-Cas9 fragments have the following sequences:
I.Cas9 variants with modified PAM specificity
The guide editor used in the methods and compositions of the present disclosure may also comprise Cas9 variants with modified PAM specificity. Some aspects of the disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3', where N is A, C, G, or T) at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGG-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNG-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNT-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGT-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAT-3' PAM sequence at its 3'-end. In other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAG-3' PAM sequence at its 3'-end.
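To make the PAM patterns above concrete, the following Python sketch tests whether a three-nucleotide site matches a degenerate pattern such as NGG or NAA, and scans one strand of a DNA string for candidate targets whose 3'-end is immediately followed by a matching PAM. The function names, the 20-nucleotide target length, and the toy DNA strings are assumptions made for illustration; the sketch scans only the strand it is given and does not predict whether any Cas9 variant is actually active at a site.

```python
# Degenerate nucleotide codes used in the text (N = any base, Y = C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "Y": "CT", "R": "AG"}


def matches_pam(site: str, pattern: str) -> bool:
    """True if `site` (e.g. 'AGG') is compatible with `pattern` (e.g. 'NGG')."""
    if len(site) != len(pattern):
        return False
    return all(
        base in IUPAC.get(code, "")
        for base, code in zip(site.upper(), pattern.upper())
    )


def find_targets(dna: str, pattern: str = "NGG", target_len: int = 20):
    """List (start, target, pam) for every position on the given strand where
    a target of `target_len` nucleotides is directly followed by a matching PAM.

    Assumption: `target_len` = 20 is an illustrative default, and `start`
    is a 0-based index into `dna`.
    """
    hits = []
    last_start = len(dna) - target_len - len(pattern)
    for start in range(last_start + 1):
        pam = dna[start + target_len: start + target_len + len(pattern)]
        if matches_pam(pam, pattern):
            hits.append((start, dna[start: start + target_len], pam))
    return hits


if __name__ == "__main__":
    toy_dna = "ACGTACGTACGTACGTACGT" + "TGG" + "CCCC"  # hypothetical sequence
    print(find_targets(toy_dna, "NGG"))
    print(matches_pam("TAA", "NAA"))
```

Swapping the pattern argument (for example "NGA" or "NAA") is all that is needed to scan for the non-canonical PAMs listed above.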
It should be appreciated that any of the amino acid mutations described herein (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include mutations from the first amino acid residue to an amino acid residue that is similar to (e.g., conserved with respect to) the second amino acid residue. For example, a mutation of an amino acid with a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A262T mutation) may also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, serine. As another example, a mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, a mutation of an amino acid with a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan will recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine.
In some embodiments, any amino acid mutation provided herein from one amino acid to arginine may be an amino acid mutation to lysine. In some embodiments, any amino acid mutation provided herein from one amino acid to isoleucine can be an amino acid mutation to alanine, valine, methionine, or leucine. In some embodiments, any amino acid mutation provided herein from one amino acid to lysine can be an amino acid mutation to arginine. In some embodiments, any amino acid mutation provided herein from one amino acid to aspartic acid can be an amino acid mutation to glutamic acid or asparagine. In some embodiments, any amino acid mutation provided herein from one amino acid to valine can be an amino acid mutation to alanine, isoleucine, methionine, or leucine. In some embodiments, any amino acid mutation provided herein from one amino acid to glycine can be an amino acid mutation to alanine. However, it is understood that other conserved amino acid residues will be recognized by those skilled in the art and that any amino acid mutation to other conserved amino acid residues is also within the scope of the present disclosure.
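The per-residue alternatives enumerated above can be collected into a lookup table. The following is an illustrative sketch only: the table encodes just the alternatives explicitly listed in the preceding embodiments (threonine, arginine, isoleucine, lysine, aspartic acid, valine, glycine), the broader side-chain classes are not encoded, and the names `LISTED_ALTERNATIVES` and `expand_mutation` are hypothetical.

```python
import re

# Alternatives named in the embodiments above, keyed by the residue that a
# mutation is written "to" (one-letter amino acid codes).
LISTED_ALTERNATIVES = {
    "T": {"S"},                 # to threonine -> may be to serine
    "R": {"K"},                 # to arginine  -> may be to lysine
    "I": {"A", "V", "M", "L"},  # to isoleucine
    "K": {"R"},                 # to lysine    -> may be to arginine
    "D": {"E", "N"},            # to aspartic acid
    "V": {"A", "I", "M", "L"},  # to valine
    "G": {"A"},                 # to glycine   -> may be to alanine
}


def expand_mutation(mutation: str) -> list[str]:
    """Return the mutation followed by its listed conservative alternatives.

    `mutation` uses the notation of the text, e.g. 'A262T'.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not in <original><position><new> form: {mutation!r}")
    original, position, new = match.groups()
    # Drop an alternative equal to the original residue (it would be no change).
    alternatives = sorted(LISTED_ALTERNATIVES.get(new, set()) - {original})
    return [mutation] + [f"{original}{position}{alt}" for alt in alternatives]


if __name__ == "__main__":
    print(expand_mutation("A262T"))
```

For the A262T example used in the text, the sketch yields A262T and A262S, matching the alanine-to-serine alternative described above.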
In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3' end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 1. In some embodiments, the combination of mutations is a conservative mutation of the clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1.
Table 1: NAA PAM clones
In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of Table 1. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of Table 1.
In some embodiments, the Cas9 protein exhibits increased activity on a target sequence that does not comprise the classical PAM (5'-NGG-3') at its 3' end, as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits activity on a target sequence having a 3' end that is not directly adjacent to the classical PAM sequence (5'-NGG-3') that is increased at least 5-fold as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits activity on a target sequence that is not directly adjacent to the classical PAM sequence (5'-NGG-3') that is increased at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3' end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.
In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity against a target sequence comprising a 5'-NAC-3' PAM sequence at its 3' end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 2. In some embodiments, the combination of mutations is a conservative mutation of the clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2.
Table 2: NAC PAM clones
In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of Table 2. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the Cas9 protein provided by any one of the variants of Table 2.
In some embodiments, the Cas9 protein exhibits increased activity on a target sequence that does not comprise the classical PAM (5'-NGG-3') at its 3' end, as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits activity on a target sequence having a 3' end that is not directly adjacent to the classical PAM sequence (5'-NGG-3') that is increased at least 5-fold as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits activity on a target sequence that is not directly adjacent to the classical PAM sequence (5'-NGG-3') that is increased at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3' end of the target sequence is directly adjacent to an AAC, GAC, CAC, or TAC sequence.
In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity against a target sequence comprising a 5'-NAT-3' PAM sequence at its 3' end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 3. In some embodiments, the combination of mutations is a conservative mutation of the clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3.
Table 3: NAT PAM clones
The above description of various napDNAbps that may be used in connection with a guide editor is not meant to be limiting in any way. The guide editor may include classical SpCas9, or any orthologous Cas9 protein, or any variant Cas9 protein, including any naturally occurring Cas9 variant, mutant, or otherwise engineered version, that is known or that may be prepared or evolved by directed evolution or other mutagenesis processes. In various embodiments, the Cas9 or Cas9 variant has nickase activity, i.e., it cleaves only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variant has an inactive nuclease, i.e., it is a "dead" Cas9 protein. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than classical SpCas9 (e.g., for easier delivery) or having a modified or rearranged primary amino acid structure (e.g., in the form of a circular permutant). The guide editors used in the methods and compositions described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins, which are the result of convergent evolution. The napDNAbp (e.g., SpCas9, a Cas9 variant, or a Cas9 equivalent) as used herein may also comprise various modifications that alter or enhance its PAM specificity. Finally, the present application contemplates any Cas9, Cas9 variant, or Cas9 equivalent that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence (e.g., a reference classical SpCas9 sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1)).
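The sequence identity thresholds recited above presuppose some way of computing percent identity, which the text does not specify. The following is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with "-" marking gaps, and identity is taken as identical columns divided by all aligned columns. Other conventions (for example, dividing by the length of the shorter sequence) give different values, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical, non-gap columns divided by the number of
    columns in which at least one sequence has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy five-residue alignment with one substitution.
    print(percent_identity("MKRTA", "MKRSA"))
    print(meets_threshold("MKRTA", "MKRSA", 80))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch covers only the final counting step.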
In particular embodiments, the Cas9 variant with extended PAM capability is SpCas9 (H840A) VRQR (SEQ ID NO: 64), which has the following amino acid sequence (wherein the V, R, Q, R substitutions relative to the SpCas9 (H840A) of SEQ ID NO: 33 are shown in bold and underlined):
In another particular embodiment, the Cas9 variant with extended PAM capability is SpCas9 (H840A) VRER, which has the following amino acid sequence (wherein the V, R, E, R substitutions relative to the SpCas9 (H840A) of SEQ ID NO: 51 are shown in bold and underlined):
In some embodiments, the napDNAbp that functions without a classical PAM sequence is an Argonaute protein. An example of such a nucleic acid programmable DNA binding protein is the Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5'-phosphorylated ssDNA of about 24 nucleotides (gDNA) that guides it to a target site, where it creates a DNA double-strand break at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer adjacent motif (PAM). The use of nuclease-inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. Features and applications of NgAgo are described in Gao et al., Nat Biotechnol., 2016 Jul; 34(7):768-73, PubMed PMID: 27136078; Swarts et al., Nature, 507(7491) (2014):258-61; and Swarts et al., Nucleic Acids Res. 43(10) (2015):5120-9, each of which is incorporated herein by reference.
In some embodiments, the napDNAbp is a prokaryotic homolog of an Argonaute protein. Prokaryotic homologs of Argonaute proteins are known and have been described, for example, in Makarova K., et al., "Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements", Biol Direct. 2009 Aug 25; 4:29. doi: 10.1186/1745-6150-4-29, which is incorporated herein by reference in its entirety. In some embodiments, the napDNAbp is a Marinitoga piezophila Argonaute (MpAgo) protein. The CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5'-hydroxylated guide RNAs rather than the 5'-phosphorylated guides used by all other known Argonaute proteins. The crystal structure of the MpAgo-RNA complex shows a guide-strand binding site comprising residues that block 5' phosphate interactions. These data suggest the evolution of an Argonaute subclass with non-classical specificity for a 5'-hydroxylated guide. See, for example, Kaya et al., "A bacterial Argonaute with noncanonical guide RNA specificity", Proc Natl Acad Sci USA. 2016 Apr 12; 113(15):4057-62, the entire contents of which are incorporated herein by reference. It is understood that other Argonaute proteins may be used and are within the scope of the present disclosure.
Some aspects of the disclosure provide Cas9 domains with different PAM specificities. Typically, Cas9 proteins, such as Cas9 from Streptococcus pyogenes (spCas9), require a classical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability to edit desired bases within a genome. In some embodiments, the base editing fusion proteins provided herein may need to be placed at a precise location, for example, so that the target base lies within a 4-base region (e.g., an "editing window") that is about 15 bases upstream of the PAM. See Komor, A.C., et al., "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage", Nature 533, 420-424 (2016), the entire contents of which are incorporated herein by reference. Thus, in some embodiments, any fusion protein provided herein can comprise a Cas9 domain capable of binding a nucleotide sequence that does not comprise a classical (e.g., NGG) PAM sequence. Cas9 domains that bind non-classical PAM sequences have been described in the art and would be apparent to the skilled artisan. For example, Cas9 domains that bind non-classical PAM sequences have been described in Kleinstiver, B.P., et al., "Engineered CRISPR-Cas9 nucleases with altered PAM specificities", Nature 523, 481-485 (2015); and Kleinstiver, B.P., et al., "Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition", Nature Biotechnology 33, 1293-1298 (2015); the entire contents of each are hereby incorporated by reference.
For example, the napDNAbp domain with altered PAM specificity may be a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to wild-type Francisella novicida Cpf1 (D917, E1006, and D1255) (SEQ ID NO: 66), which has the following amino acid sequence:
Other napDNAbp domains with altered PAM specificity include, for example, domains having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 20), which has the following amino acid sequence:
In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) is a nucleic acid programmable DNA binding protein that does not require a classical (NGG) PAM sequence. In some embodiments, the napDNAbp is an Argonaute protein. An example of such a nucleic acid programmable DNA binding protein is the Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5'-phosphorylated ssDNA of about 24 nucleotides (gDNA) that guides it to a target site, where it creates a DNA double-strand break at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer adjacent motif (PAM). The use of nuclease-inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and application of NgAgo are described in Gao et al., Nat Biotechnol., 34(7):768-73 (2016), PubMed PMID: 27136078; Swarts et al., Nature, 507(7491):258-61 (2014); and Swarts et al., Nucleic Acids Res. 43(10) (2015):5120-9, each of which is incorporated herein by reference. The sequence of the Natronobacterium gregoryi Argonaute protein is shown in SEQ ID NO: 67.
The disclosed fusion proteins may comprise a napDNAbp domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type Natronobacterium gregoryi Argonaute protein (SEQ ID NO: 67), which has the following amino acid sequence:
Furthermore, variant or mutant Cas9 proteins may be obtained or constructed using any available method. As used herein, the term "mutation" refers to the substitution of a residue within a sequence (e.g., a nucleic acid or amino acid sequence) with another residue, or the deletion or insertion of one or more residues within the sequence. Mutations are typically described by identifying the original residue, followed by the position of that residue within the sequence, and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are provided, for example, in Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations may include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and the term is not meant to be limiting in any way. Mutations may include "loss-of-function" mutations, which are the usual result of a mutation that reduces or abolishes protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosomal copy carries an unmutated version of the gene encoding a fully functional protein, whose presence compensates for the effect of the mutation. Mutations also include "gain-of-function" mutations, which confer on a protein or cell an abnormal activity that is not otherwise present under normal conditions. Many gain-of-function mutations lie in regulatory sequences rather than in coding regions and can therefore have a number of consequences. For example, a mutation may lead to one or more genes being expressed in the wrong tissues, with those tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
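The naming convention described above (original residue, then position, then new residue, as in H840A) can be applied mechanically to a sequence. The following is an illustrative sketch, assuming 1-based positions and one-letter amino acid codes; the function name and the toy sequence are hypothetical and do not correspond to any sequence in the disclosure.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a substitution such as 'H4A' to `sequence`.

    The position is 1-based. The original residue named in the mutation
    must match the residue actually found at that position.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not in <original><position><new> form: {mutation!r}")
    original, position_text, new = match.groups()
    index = int(position_text) - 1  # convert to a 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position_text} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position_text}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]


if __name__ == "__main__":
    toy = "MDKHA"  # hypothetical five-residue sequence
    print(apply_substitution(toy, "H4A"))
```

Checking the original residue before substituting catches numbering mismatches between a mutation list and the reference sequence it is applied to.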
Mutations can be introduced into a reference Cas9 protein using site-directed mutagenesis. Older site-directed mutagenesis methods known in the art rely on subcloning the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of a single-stranded DNA template. In these methods, a mutagenic primer (i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at that site) is annealed to the single-stranded template, and the complement of the template is then polymerized starting from the 3' end of the mutagenic primer. The resulting duplexes are then transformed into host bacteria, and plaques are screened for the desired mutation. More recently, site-directed mutagenesis has employed PCR methods, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require subcloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is necessary to reduce the number of PCR cycles to prevent the amplification of unwanted mutations introduced by the polymerase. Second, a selection must be employed to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. Fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases, it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
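The idea of a mutagenic primer, one that matches the template everywhere except at the site to be mutated, can be sketched as follows. This is an illustration only, assuming a single-base change, a fixed number of matching flanking bases on each side, and a primer written with the same polarity as the given strand; practical primer design would also weigh melting temperature, GC content, and secondary structure, none of which is modeled here. All names are hypothetical.

```python
def design_mutagenic_primer(template: str, position: int, new_base: str,
                            flank: int = 10) -> str:
    """Return a primer carrying `new_base` at `position` (0-based),
    flanked on each side by `flank` bases copied from `template`."""
    template, new_base = template.upper(), new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if position - flank < 0 or position + flank >= len(template):
        raise ValueError("not enough flanking sequence around the position")
    if template[position] == new_base:
        raise ValueError("new_base is identical to the template base")
    left = template[position - flank:position]
    right = template[position + 1:position + 1 + flank]
    return left + new_base + right


def count_mismatches(primer: str, template: str, start: int) -> int:
    """Count positions at which `primer` differs from the template window
    beginning at `start`."""
    window = template.upper()[start:start + len(primer)]
    return sum(1 for p, t in zip(primer, window) if p != t)


if __name__ == "__main__":
    template = "ATGGCCAAGCTTGGATCCGA"  # hypothetical 20-nt template
    primer = design_mutagenic_primer(template, 9, "A", flank=5)
    print(primer, count_mismatches(primer, template, 4))
```

The single mismatch sits in the middle of the primer, so the matching flanks on both sides can still anneal to the site to be mutated.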
Mutations can also be introduced by directed evolution processes, such as phage-assisted continuous evolution (PACE) or phage-assisted non-continuous evolution (PANCE). As used herein, the term "phage-assisted continuous evolution (PACE)" refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described, for example, in International PCT Application PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued in May 2015; International PCT Application PCT/US 2015/01022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; and International PCT Application PCT/US 2015/01022, filed January 20, 2015, published as WO 2016/168831 on October 20, 2016, each of which is incorporated herein by reference in its entirety. Variant Cas9 proteins may also be obtained by phage-assisted non-continuous evolution (PANCE), which as used herein refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution that uses serial flask transfers of evolving "selection phage" (SP), which contain the gene of interest to be evolved, across fresh E. coli host cells, thereby allowing the genes of the host E. coli to be held constant while the genes contained in the SP continue to evolve. Serial flask transfers have long served as a widely used approach for the laboratory evolution of microbes, and analogous approaches have more recently been developed for phage evolution. The PANCE system features lower stringency than the PACE system.
Any of the references cited above that relate to Cas9 or Cas9 equivalents are incorporated herein by reference in their entireties, if not already so stated.
Other programmable nucleases
In various embodiments described herein, the guide editor comprises a napDNAbp, such as a Cas9 protein. These proteins are "programmable" by complexing with a guide RNA (or PEgRNA, as the case may be), which directs the Cas9 protein to the target site of DNA, which has a sequence that is partially complementary to the spacer of the gRNA (or PEgRNA), and also has the desired PAM sequence. However, in certain embodiments contemplated herein, napDNAbp may be substituted with a different type of programmable protein, such as a zinc finger nuclease or a transcription activator-like effector nuclease (TALEN).
It is contemplated that suitable nucleases do not necessarily need to be "programmed" by a nucleic acid targeting molecule (e.g., a guide RNA), but can instead be programmed by defining the specificity of a DNA-binding domain, in particular that of a nuclease. As with the napDNAbp portion used for guided editing, such alternative programmable nucleases are preferably modified to cleave only one strand of the target DNA. In other words, the programmable nuclease should preferably function as a nickase. Once a programmable nuclease (e.g., a ZFN or a TALEN) is selected, additional functionality can be engineered into the system so that it operates by a mechanism similar to guided editing. For example, a programmable nuclease can be modified by coupling to it (e.g., via a chemical linker) an RNA or DNA extension arm, wherein the extension arm comprises a primer binding site (PBS) and a DNA synthesis template. The programmable nuclease can also be coupled (e.g., via a chemical or amino acid linker) to a polymerase, the nature of which depends on whether the extension arm is DNA or RNA. In the case of an RNA extension arm, the polymerase may be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). In the case of a DNA extension arm, the polymerase can be a DNA-dependent DNA polymerase (e.g., a prokaryotic polymerase, including Pol I, Pol II, or Pol III, or a eukaryotic polymerase, including Pol α, Pol β, Pol γ, Pol δ, Pol ε, or Pol ζ). 
The system may also include other functionalities, added as fusions to the programmable nuclease or provided in trans, to facilitate the overall reaction (e.g., (a) a helicase to unwind the DNA at the cleavage site so that the cleaved strand with the 3' end is available as a primer, (b) FEN1 to help remove the endogenous strand on the cleaved strand and drive the reaction toward replacement of the endogenous strand with the synthesized strand, or (c) an nCas9:gRNA complex to create a second-site nick on the opposite strand, which may help drive integration of the synthesized repair through favored cellular repair of the non-edited strand). In a manner analogous to guided editing with a napDNAbp, such a complex with an otherwise programmable nuclease can be used to synthesize, and then permanently install into the target site of the DNA, a newly synthesized replacement strand of DNA carrying the edit of interest.
Suitable alternative programmable nucleases are well known in the art and can be used in place of the napDNAbp:gRNA complex to construct an alternative guided editor system that can be programmed to selectively bind a target site of DNA, and that can be further modified in the manner described above to co-localize a polymerase and an RNA or DNA extension arm, comprising a primer binding site and a DNA synthesis template, to a specific nick site. For example, transcription activator-like effector nucleases (TALENs) can be used as the programmable nuclease in the guided editing methods and compositions described herein. TALENs are artificial restriction enzymes generated by fusing a TAL effector DNA-binding domain to a DNA cleavage domain. These reagents enable efficient, programmable, and specific DNA cleavage and are powerful tools for genome editing in situ. Transcription activator-like effectors (TALEs) can be quickly engineered to bind practically any DNA sequence. As used herein, the term TALEN is broad and includes a monomeric TALEN that can cleave double-stranded DNA without assistance from another TALEN. The term TALEN is also used to refer to one or both members of a pair of TALENs that are engineered to work together to cleave DNA at the same site. TALENs that work together may be referred to as a left TALEN and a right TALEN, which refers to the handedness of DNA. See U.S. Ser. No. 12/965,590; U.S. Ser. No. 13/426,991 (U.S. Pat. No. 8,450,471); U.S. Ser. No. 13/427,040 (U.S. Pat. No. 8,440,431); U.S. Ser. No. 13/427,137 (U.S. Pat. No. 8,440,432); and U.S. Ser. No. 13/738,381, all of which are incorporated herein by reference in their entirety. In addition, TALENs are described in WO 2015/027134; U.S. Pat. No. 9,181,535; Boch et al., "Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors", Science, vol. 326, pp. 1509-1512 (2009); Bogdanove et al., "TAL Effectors: Customizable Proteins for DNA Targeting", Science, vol. 333, pp. 1843-1846 (2011); Cade et al., "Highly efficient generation of heritable zebrafish gene mutations using homo- and heterodimeric TALENs", Nucleic Acids Research, vol. 40, pp. 8001-8010 (2012); and Cermak et al., "Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting", Nucleic Acids Research, vol. 39, no. 17, e82 (2011), each of which is incorporated herein by reference.
Zinc finger nucleases (ZFNs) can also be used as alternative programmable nucleases for guided editing in place of the napDNAbp (e.g., a Cas9 nickase) in the methods and compositions disclosed herein. As with TALENs, the ZFN protein may be modified to function as a nickase, i.e., the ZFN is engineered to cleave only one strand of the target DNA, in a manner similar to the napDNAbp used with the guide editors in the methods and compositions described herein. ZFN proteins have been widely described in the art, for example, in Carroll et al., "Genome Engineering with Zinc-Finger Nucleases", Genetics, Aug 2011, vol. 188:773-782; Durai et al., "Zinc finger nucleases: custom-designed molecular scissors for genome engineering of plant and mammalian cells", Nucleic Acids Res, 2005, vol. 33:5978-90; and Gaj et al., "ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering", Trends Biotechnol. 2013, vol. 31:397-405, each of which is incorporated herein by reference in its entirety.
Guide editor: polymerase domain (e.g., reverse transcriptase)
The present disclosure provides compositions and methods for guided editing with increased editing efficiency and/or reduced indel formation, achieved by inhibiting the DNA mismatch repair pathway while performing guided editing at a target site. Thus, the present disclosure provides methods of editing a nucleic acid molecule by guided editing, which methods involve contacting the nucleic acid molecule with a guide editor, a PEgRNA, and an inhibitor of the DNA mismatch repair pathway, thereby installing one or more modifications in the nucleic acid molecule at a target site with increased editing efficiency and/or reduced indel formation. The present disclosure also provides polynucleotides for editing a DNA target site by guided editing, comprising a nucleic acid sequence encoding a napDNAbp, a polymerase, and an inhibitor of the DNA mismatch repair pathway, wherein the napDNAbp and the polymerase are capable, in the presence of a PEgRNA, of installing one or more modifications at the DNA target site with increased editing efficiency and/or reduced indel formation. Thus, the methods and compositions described herein use a guide editor that may include a polymerase (e.g., a reverse transcriptase).
In various embodiments, the guide editors used in the methods and compositions disclosed herein include a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase), or a variant thereof, which may be provided as a fusion protein with a napDNAbp or other programmable nuclease, or provided in trans.
Any polymerase can be used in the guide editors and in the methods and compositions disclosed herein. The polymerase may be a wild-type polymerase, a functional fragment, a mutant, a variant, a truncated variant, or the like. The polymerase may include a wild-type polymerase from a eukaryotic, prokaryotic, archaeal, or viral organism, and/or the polymerase may be modified by genetic engineering, mutagenesis, or directed evolution-based processes. The polymerase may include T7 DNA polymerase, T5 DNA polymerase, T4 DNA polymerase, Klenow fragment DNA polymerase, DNA polymerase III, and the like. The polymerase may also be thermostable and may include Taq, Tne, Tma, Pfu, Tfl, Tth, Stoffel fragment, VENT and DEEPVENT polymerases, KOD, Tgo, JDF3, and mutants, variants, and derivatives thereof (see U.S. Pat. No. 5,436,149; U.S. Pat. No. 4,889,818; U.S. Pat. No. 4,965,185; U.S. Pat. No. 5,079,352; U.S. Pat. No. 5,614,365; U.S. Pat. No. 5,374,553; U.S. Pat. No. 5,270,179; U.S. Pat. No. 5,047,342; U.S. Pat. No. 5,512,462; WO 92/06188; WO 92/06200; WO 96/10640; Barnes, W.M., Gene 112:29-35 (1992); Lawyer, F.C., et al., PCR Meth. Appl. 2:275-287 (1993); Flaman, J.-M., et al., Nucl. Acids Res. 22(15):3259-3260 (1994), each of which is incorporated by reference). To synthesize longer nucleic acid molecules (e.g., nucleic acid molecules longer than about 3-5 kb in length), at least two DNA polymerases can be used. In certain embodiments, one polymerase may substantially lack 3' exonuclease activity, while another polymerase may have 3' exonuclease activity. Such pairings may include polymerases that are the same or different. Examples of DNA polymerases that substantially lack 3' exonuclease activity include, but are not limited to, Taq, Tne (exo-), Tma (exo-), Pfu (exo-), Pwo (exo-), exo-KOD, and Tth DNA polymerases, and mutants, variants, and derivatives thereof.
The term "template DNA molecule" as used herein refers to a strand of nucleic acid from which a complementary nucleic acid strand is synthesized by a DNA polymerase, for example, in a primer extension reaction of the DNA synthesis template of a PEgRNA.
As used herein, the term "template-dependent manner" is intended to refer to a process that involves the template-dependent extension of a primer molecule (e.g., DNA synthesis by a DNA polymerase). The term "template-dependent manner" refers to polynucleotide synthesis of RNA or DNA wherein the sequence of the newly synthesized polynucleotide strand is dictated by the well-known rules of complementary base pairing (see, e.g., Watson, J.D. et al., in: Molecular Biology of the Gene, 4th Ed., W.A. Benjamin, Inc., Menlo Park, Calif. (1987)). The term "complementary" refers to the broad concept of sequence complementarity between regions of two polynucleotide strands, or between two nucleotides, through base pairing. It is known that an adenine nucleotide is capable of forming specific hydrogen bonds ("base pairing") with a thymine or uracil nucleotide. Similarly, it is known that a cytosine nucleotide is capable of base pairing with a guanine nucleotide. Thus, in the case of guided editing, the single strand of DNA synthesized by the polymerase of the guide editor against the DNA synthesis template can be said to be "complementary" to the sequence of the DNA synthesis template.
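The base pairing rules described above (adenine with thymine or uracil, cytosine with guanine) fully determine the strand that a template-dependent DNA polymerase would produce. The following is a minimal sketch, assuming an RNA or DNA template written 5' to 3' and reporting the newly synthesized DNA strand 5' to 3'; the function name and the example templates are hypothetical.

```python
# DNA base incorporated opposite each template base (RNA or DNA template).
DNA_PARTNER = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def synthesize_complementary_dna(template_5to3: str) -> str:
    """Return the DNA strand (written 5'->3') complementary to a template.

    The template is written 5'->3'. It is read from its 3' end, so the
    complement is built over the reversed template.
    """
    template = template_5to3.upper()
    try:
        return "".join(DNA_PARTNER[base] for base in reversed(template))
    except KeyError as exc:
        raise ValueError(f"unexpected template base: {exc.args[0]!r}") from exc


if __name__ == "__main__":
    # An RNA template (as with a reverse transcriptase) and a DNA template
    # (as with a DNA-dependent DNA polymerase) with equivalent sequence.
    print(synthesize_complementary_dna("AUGC"))
    print(synthesize_complementary_dna("ATGC"))
```

Both templates yield the same DNA product, which reflects the point in the text that adenine pairs with either thymine or uracil.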
A. Exemplary polymerases
In various embodiments, the guide editors used in the methods and compositions described herein comprise a polymerase. The present disclosure contemplates any wild-type polymerase obtained from any naturally occurring organism or virus, or obtained from a commercial or non-commercial source. In addition, the polymerases that can be used in the guide editors can include any naturally occurring mutant polymerase, engineered mutant polymerase, or other variant polymerase, including truncated variants that retain function. The polymerases useful herein can also be engineered to contain specific amino acid substitutions, such as those specifically disclosed herein. In certain preferred embodiments, the polymerases useful in the guide editors used in the methods and compositions of the present disclosure are template-based polymerases, i.e., they synthesize nucleotide sequences in a template-dependent manner.
A polymerase is an enzyme that synthesizes a nucleotide strand and that may be used in connection with the guide editor systems used in the methods and compositions described herein. The polymerase is preferably a "template-dependent" polymerase (i.e., a polymerase that synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). In some configurations, the polymerase may also be "template-independent" (i.e., a polymerase that synthesizes a nucleotide strand without the requirement of a template strand). A polymerase may be further classified as a "DNA polymerase" or an "RNA polymerase". In various embodiments, the guide editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase may be a "DNA-dependent DNA polymerase" (i.e., the template molecule is a strand of DNA). In such cases, the DNA template molecule may be a PEgRNA in which the extension arm comprises a strand of DNA. In such cases, the PEgRNA may be referred to as a chimeric or hybrid PEgRNA, which comprises an RNA portion (i.e., the guide RNA components, including the spacer and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase may be an "RNA-dependent DNA polymerase" (i.e., the template molecule is a strand of RNA). In such cases, the PEgRNA is RNA, i.e., it comprises an RNA extension. The term "polymerase" may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., polymerase activity). Generally, the enzyme initiates synthesis at the 3' end of a primer annealed to a polynucleotide template sequence (e.g., a primer sequence annealed to the primer binding site of a PEgRNA) and proceeds toward the 5' end of the template strand. A "DNA polymerase" catalyzes the polymerization of deoxynucleotides. As used herein in reference to a DNA polymerase, the term DNA polymerase includes a "functional fragment thereof". A "functional fragment thereof" refers to any portion of a wild-type or mutant DNA polymerase that comprises less than the complete amino acid sequence of the polymerase and that retains the ability to catalyze the polymerization of a polynucleotide under at least one set of conditions. Such a functional fragment may exist as a separate entity, or it may be a component of a larger polypeptide, such as a fusion protein.
In some embodiments, the polymerase may be from a bacteriophage. Bacteriophage DNA polymerases generally lack 5' to 3' exonuclease activity, because this activity is encoded by a separate polypeptide. Examples of suitable DNA polymerases are T4, T7, and phi29 DNA polymerases. The commercially available enzymes are T4 (available from many sources, e.g., Epicentre) and T7 (available from many sources, e.g., Epicentre for the unmodified enzyme and USB for the 3' to 5' exonuclease-deficient T7 "Sequenase" DNA polymerase).
In other embodiments, the polymerase is an archaeal polymerase. Two different classes of DNA polymerases have been identified in archaea: (1) family B/pol I type (homologs of Pfu from Pyrococcus furiosus) and (2) pol II type (homologs of the P. furiosus DP1/DP2 2-subunit polymerase). DNA polymerases from both classes have been shown to naturally lack an associated 5' to 3' exonuclease activity and to possess 3' to 5' exonuclease (proofreading) activity. Suitable DNA polymerases (pol I or pol II) can be derived from archaea with optimal growth temperatures similar to the desired assay temperatures. Thermostable archaeal DNA polymerases have been isolated from Pyrococcus species (furiosus, species GB-D, woesii, abyssi, horikoshii), Thermococcus species (kodakaraensis KOD1, litoralis, species 9 degrees North-7, species JDF-3, gorgonarius), Pyrodictium occultum, and Archaeoglobus fulgidus.
The polymerase may also be from a eubacterial species. There are 3 classes of eubacterial DNA polymerases: pol I, II, and III. Enzymes in the Pol I DNA polymerase family possess 5' to 3' exonuclease activity, and certain members also exhibit 3' to 5' exonuclease activity. Pol II DNA polymerases naturally lack 5' to 3' exonuclease activity, but do exhibit 3' to 5' exonuclease activity. Pol III DNA polymerases represent the major replicative DNA polymerase of the cell and are composed of multiple subunits. The Pol III catalytic subunit lacks 5' to 3' exonuclease activity, but in some cases a 3' to 5' exonuclease activity is located in the same polypeptide. A variety of Pol I DNA polymerases are commercially available, some of which have been modified to reduce or abolish 5' to 3' exonuclease activity.
Suitable thermostable pol I DNA polymerases can be isolated from a variety of thermophilic eubacteria, including Thermus species and Thermotoga maritima, such as Thermus aquaticus (Taq), Thermus thermophilus (Tth), and Thermotoga maritima (Tma UlTma). Additional eubacteria related to those listed above are described in Thermophilic Bacteria (Kristjansson, J.K., ed.), CRC Press, Inc., Boca Raton, Fla., 1992.
The present invention further provides chimeric or non-chimeric DNA polymerases that are chemically modified according to the methods disclosed in U.S. Pat. Nos. 5,677,152, 6,479,264, and 6,183,998, the contents of which are incorporated herein by reference in their entirety. Additional archaeal DNA polymerases related to those listed above are described in the following references: Archaea: A Laboratory Manual (Robb, F.T. and Place, A.R., eds.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1995, and Thermophilic Bacteria (Kristjansson, J.K., ed.), CRC Press, Inc., Boca Raton, Fla., 1992.
B. Exemplary reverse transcriptase
In various embodiments, the guided editor used in the methods and compositions described herein comprises a reverse transcriptase as a polymerase. The present disclosure encompasses any wild-type reverse transcriptase obtained from any naturally occurring organism or virus or obtained from commercial or non-commercial sources. In addition, reverse transcriptase useful in the guided editor used in the methods and compositions of the present disclosure can include any naturally occurring mutant RT, engineered mutant RT, or other variant RT, including truncated variants that retain function. RT may also be designed to contain specific amino acid substitutions, such as those specifically disclosed herein.
Reverse transcriptases are multifunctional enzymes that typically have three enzymatic activities: RNA-dependent DNA polymerization activity, DNA-dependent DNA polymerization activity, and an RNaseH activity that catalyzes the cleavage of RNA in RNA-DNA hybrids. Some mutants of reverse transcriptases have a disabled RNaseH moiety to prevent unintended damage to the mRNA. These enzymes, which synthesize complementary DNA (cDNA) using mRNA as a template, were first identified in RNA viruses. Reverse transcriptases were subsequently isolated and purified directly from virus particles, cells, or tissues (see, e.g., Kacian et al., 1971, Biochim. Biophys. Acta 46:365-83; Yang et al., 1972, Biochem. Biophys. Res. Comm. 47:505-11; Gerard et al., 1975, J. Virol. 15:785-97; Liu et al., 1977, Arch. Virol. 55:187-200; Kato et al., 1984, J. Virol. Methods 9:325-39; Luke et al., 1990, Biochem. 29:1764-69; and Le Grice et al., 1991, J. Virol. 65:7004-07, each of which is incorporated by reference). More recently, mutants and fusion proteins have been created in the pursuit of improved properties such as thermostability, fidelity, and activity. Any wild-type, variant, and/or mutant form of reverse transcriptase that is known in the art, or that can be made using methods known in the art, is contemplated herein.
The reverse transcriptase (RT) gene (or the genetic information contained therein) can be obtained from a number of different sources. For example, the gene may be obtained from eukaryotic cells infected with a retrovirus, or from a number of plasmids that contain either a portion of or the entire retroviral genome. In addition, messenger RNA-like RNA containing the RT gene can be obtained from retroviruses. Examples of sources of RT include, but are not limited to, Moloney murine leukemia virus (M-MLV or MLV RT); human T-cell leukemia virus type 1 (HTLV-1); bovine leukemia virus (BLV); Rous sarcoma virus (RSV); human immunodeficiency virus (HIV); yeast, including Saccharomyces, Neurospora, Drosophila; primates; and rodents. See, for example, Weiss, et al., U.S. Pat. No. 4,663,290 (1987); Gerard, G.R., DNA:271-79 (1986); Kotewicz, M.L., et al., Gene 35:249-58 (1985); Tanese, N., et al., Proc. Natl. Acad. Sci. (USA):4944-48 (1985); Roth, M.J., et al., J. Biol. Chem. 260:9326-35 (1985); Michel, F., et al., Nature 316:641-43 (1985); Akins, R.A., et al., Cell 47:505-16 (1986), EMBO J. 4:1267-75 (1985); and Fawcett, D.F., Cell 47:1007-15 (1986) (each of which is incorporated herein by reference in its entirety).
Wild-type RT
Exemplary enzymes for use with the guided editors described herein may include, but are not limited to, M-MLV reverse transcriptase and RSV reverse transcriptase. Enzymes having reverse transcriptase activity are commercially available. In certain embodiments, the reverse transcriptase is provided in trans to the other components of the guided editor system. That is, the reverse transcriptase is expressed or otherwise provided as a separate component, i.e., not as a fusion protein with the napDNAbp.
One of ordinary skill in the art will recognize that wild-type reverse transcriptases, including but not limited to Moloney murine leukemia virus (M-MLV) reverse transcriptase; human immunodeficiency virus (HIV) reverse transcriptase; and avian sarcoma-leukosis virus (ASLV) reverse transcriptases, which include but are not limited to Rous sarcoma virus (RSV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, avian erythroblastosis virus (AEV) helper virus MCAV reverse transcriptase, avian myelocytomatosis virus MC29 helper virus MCAV reverse transcriptase, avian reticuloendotheliosis virus (REV-T) helper virus REV-A reverse transcriptase, avian sarcoma virus UR2 helper virus UR2AV reverse transcriptase, avian sarcoma virus Y73 helper virus YAV reverse transcriptase, Rous associated virus (RAV) reverse transcriptase, and myeloblastosis associated virus (MAV) reverse transcriptase, may be suitably used in the subject methods and compositions described herein.
Exemplary wild-type RT enzymes are as follows:
Variant and error-prone RT
Reverse transcriptases are essential for synthesizing complementary DNA (cDNA) strands from RNA templates. Reverse transcriptases are enzymes composed of distinct domains that exhibit different biochemical activities. The enzymes catalyze the synthesis of DNA from an RNA template as follows: in the presence of an annealed primer, the reverse transcriptase binds to the RNA template and initiates the polymerization reaction. The RNA-dependent DNA polymerase activity synthesizes the complementary DNA (cDNA) strand, incorporating dNTPs. The RNaseH activity degrades the RNA template of the DNA:RNA complex. Thus, reverse transcriptases comprise (a) a binding activity that recognizes and binds to an RNA/DNA hybrid, (b) an RNA-dependent DNA polymerase activity, and (c) an RNaseH activity. In addition, reverse transcriptases are generally regarded as having various attributes, including their thermostability, processivity (rate of dNTP incorporation), and fidelity (or error rate). The reverse transcriptase variants contemplated herein may include any mutation to a reverse transcriptase that affects or changes any one or more of these enzymatic activities (e.g., RNA-dependent DNA polymerase activity, RNaseH activity, or DNA/RNA hybrid-binding activity) or enzyme properties (e.g., thermostability, processivity, or fidelity). Such variants may be available in the art in the public domain, may be commercially available, or may be made using known methods of mutagenesis, including directed evolution processes (e.g., PACE or PANCE).
In various embodiments, the reverse transcriptase may be a variant reverse transcriptase. As used herein, a "variant reverse transcriptase" includes any naturally occurring or genetically engineered variant comprising one or more mutations (including single mutations, inversions, deletions, insertions, and rearrangements) relative to a reference sequence (e.g., a reference wild-type sequence). RT may have several activities, including an RNA-dependent DNA polymerase activity, a ribonuclease H activity, and a DNA-dependent DNA polymerase activity. Collectively, these activities enable the enzyme to convert single-stranded RNA into double-stranded cDNA. In retroviruses and retrotransposons, this cDNA can then integrate into the host genome, from which new RNA copies can be made via host-cell transcription. Variant RTs may comprise a mutation that affects one or more of these activities (reducing or increasing the activity, or eliminating it altogether). In addition, variant RTs may comprise one or more mutations that render the RT more or less stable, less prone to aggregation, or easier to purify and/or detect, and/or that modify other properties or characteristics.
One of ordinary skill in the art will recognize that variant reverse transcriptases derived from other reverse transcriptases, including but not limited to Moloney murine leukemia virus (M-MLV) reverse transcriptase; human immunodeficiency virus (HIV) reverse transcriptase; and avian sarcoma-leukosis virus (ASLV) reverse transcriptases, which include but are not limited to Rous sarcoma virus (RSV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, avian erythroblastosis virus (AEV) helper virus MCAV reverse transcriptase, avian myelocytomatosis virus MC29 helper virus MCAV reverse transcriptase, avian reticuloendotheliosis virus (REV-T) helper virus REV-A reverse transcriptase, avian sarcoma virus UR2 helper virus UR2AV reverse transcriptase, avian sarcoma virus Y73 helper virus YAV reverse transcriptase, Rous associated virus (RAV) reverse transcriptase, and myeloblastosis associated virus (MAV) reverse transcriptase, may be suitably used in the subject methods and compositions described herein.
One method of preparing variant RTs is by genetic modification (e.g., by modifying the DNA sequence of a wild-type reverse transcriptase). A number of methods are known in the art that permit random as well as targeted mutation of DNA sequences (see, e.g., Ausubel et al., Short Protocols in Molecular Biology (1995) 3rd Ed., John Wiley & Sons, Inc.). In addition, there are a number of commercially available kits for site-directed mutagenesis, including both conventional and PCR-based methods. Examples include the QuikChange site-directed mutagenesis kit, a site-directed mutagenesis kit from NEW ENGLAND BIOLABS, and the GeneArt™ Site-Directed Mutagenesis System (THERMO FISHER SCIENTIFIC).
In addition, mutant reverse transcriptases may be generated by insertional mutation or truncation (N-terminal, internal, or C-terminal insertions or truncations) according to methods known to those skilled in the art. As used herein, the term "mutation" refers to a substitution of a residue within a sequence (e.g., a nucleic acid or amino acid sequence) with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue, followed by the position of that residue within the sequence, and then the identity of the newly substituted residue (e.g., D200N denotes replacement of the aspartate at position 200 with asparagine). Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single-base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include "loss-of-function" mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein, whose presence compensates for the effect of the mutation. Mutations also embrace "gain-of-function" mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, with these tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
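The residue-position-residue notation used for substitutions (for example D200N) can be parsed and applied mechanically. The sketch below is illustrative only: the helper names `parse_mutation` and `apply_mutation` and the five-residue toy sequence are hypothetical, and positions are assumed to be 1-based within the reference sequence.

```python
import re

# One-letter original residue, 1-based position, one-letter new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str):
    """Split a label such as 'D200N' into (original, position, replacement)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(sequence: str, label: str) -> str:
    """Return a copy of `sequence` carrying the substitution named by `label`."""
    original, position, replacement = parse_mutation(label)
    index = position - 1  # convert the 1-based position to a 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


if __name__ == "__main__":
    toy_sequence = "MKDLT"  # hypothetical sequence, not an RT sequence
    print(apply_mutation(toy_sequence, "D3N"))  # MKNLT
```

Checking the original residue before substituting guards against applying a label to the wrong reference sequence or numbering scheme.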
Earlier site-directed mutagenesis methods known in the art have relied on subcloning the sequence to be mutated into a vector, such as an M13 phage vector, which allows isolation of single-stranded DNA templates. In these methods, a mutagenic primer (i.e., a primer that is capable of annealing to the site to be mutated but carries one or more mismatched nucleotides at the site to be mutated) is annealed to a single-stranded template, and then the complementary sequence of the template is polymerized starting from the 3' end of the mutagenic primer. The resulting duplex is then transformed into host bacteria and plaques are screened for the desired mutation.
More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require subcloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. Fourth, because of the template-independent terminal extension activity of some thermostable polymerases, it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
Methods of random mutagenesis exist in the art that result in a panel of mutants bearing one or more randomly situated mutations. Such a panel of mutants may then be screened for those exhibiting a desired property, for example, increased stability relative to a wild-type reverse transcriptase.
An example of a method for random mutagenesis is the so-called "error-prone PCR method." As the name implies, the method amplifies a given sequence under conditions in which the DNA polymerase does not support high-fidelity incorporation. Although the conditions encouraging error-prone incorporation vary for different DNA polymerases, one skilled in the art can determine such conditions for a given enzyme. A key variable for many DNA polymerases in the fidelity of amplification is, for example, the type and concentration of divalent metal ion in the buffer. The use of manganese ion and/or variation of the magnesium or manganese ion concentration may therefore be applied to influence the error rate of the polymerase.
In various aspects, the RT of the guided editor may be an "error-prone" reverse transcriptase variant. Error-prone reverse transcriptases that are known and/or available in the art may be used. The error rate of any particular reverse transcriptase is a property of the enzyme's "fidelity," which represents the accuracy of template-directed polymerization of DNA against its RNA template. An RT with high fidelity has a low error rate. Conversely, an RT with low fidelity has a high error rate. M-MLV-based reverse transcriptases are reported to have an error rate in the range of one error per 15,000 to 27,000 nucleotides synthesized. See Boutabout et al., "DNA synthesis fidelity by the reverse transcriptase of the yeast retrotransposon Ty1," Nucleic Acids Res, 2001, 29:2217-2222, which is incorporated by reference. Thus, for purposes of this application, reverse transcriptases considered to be "error-prone," or to have "error-prone fidelity," are those that make errors more frequently than one error per 15,000 nucleotides synthesized (i.e., one error in fewer than 15,000 nucleotides).
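The threshold above can be expressed as a simple comparison of error frequencies. The sketch below reads "error-prone" as more than one error per 15,000 nucleotides synthesized, which is an interpretive assumption about the intended direction of the threshold; the function name `is_error_prone` and the example counts are hypothetical.

```python
from fractions import Fraction

# Reference frequency: one error per 15,000 nucleotides synthesized.
ERROR_PRONE_THRESHOLD = Fraction(1, 15000)


def is_error_prone(errors: int, nucleotides_synthesized: int) -> bool:
    """Return True when the observed error frequency exceeds 1 in 15,000.

    Exact rational arithmetic avoids floating-point ties at the boundary.
    """
    if nucleotides_synthesized <= 0:
        raise ValueError("nucleotides_synthesized must be positive")
    if errors < 0:
        raise ValueError("errors must be non-negative")
    return Fraction(errors, nucleotides_synthesized) > ERROR_PRONE_THRESHOLD


if __name__ == "__main__":
    print(is_error_prone(1, 27000))  # False: within the reported M-MLV range
    print(is_error_prone(1, 15000))  # False: exactly at the boundary
    print(is_error_prone(1, 5000))   # True: errors occur more frequently
```

A strict inequality is used so that an enzyme at exactly one error per 15,000 nucleotides, the upper end of the reported M-MLV range, is not classed as error-prone.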
Error-prone reverse transcriptases may also be created through mutagenesis of a starting RT enzyme (e.g., wild-type M-MLV RT). The method of mutagenesis is not limited and may include directed evolution processes, such as phage-assisted continuous evolution (PACE) or phage-assisted non-continuous evolution (PANCE). As used herein, the term "phage-assisted continuous evolution (PACE)" refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described in, for example, International PCT Application PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued in May 2015; International PCT Application No. PCT/US 2015/01022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; and International PCT Application No. PCT/US2016/027795, filed April 15, 2016, published as WO 2016/168831 on October 20, each of which is incorporated herein by reference in its entirety.
Error-prone reverse transcriptases may also be obtained by phage-assisted non-continuous evolution (PANCE), which, as used herein, refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution that uses serial flask transfers of evolving "selection phage" (SP), which contain the gene of interest to be evolved, across fresh E. coli host cells, thereby allowing the genes inside the host E. coli to be held constant while the genes contained in the SP continuously evolve. Serial flask transfers have long been a widely used approach for laboratory evolution of microbes, and analogous approaches have more recently been developed for bacteriophage evolution. The PANCE system features lower stringency than the PACE system.
Other error-prone reverse transcriptases have been described in the literature, each of which is contemplated for use in the methods and compositions herein. For example, error-prone reverse transcriptases have been described in Bebenek et al., "Error-prone Polymerization by HIV-1 Reverse Transcriptase," J Biol Chem, 1993, Vol. 268:10324-10334, and Sebastian-Martin et al., "Transcriptional inaccuracy threshold attenuates differences in RNA-dependent DNA synthesis fidelity between retroviral reverse transcriptases," Scientific Reports, 2018, Vol. 8:627, each of which is incorporated by reference. Further, reverse transcriptases, including error-prone reverse transcriptases, can be obtained from commercial suppliers, including reverse transcriptases from NEW ENGLAND BIOLABS (e.g., AMV reverse transcriptase and M-MuLV reverse transcriptase), or AMV reverse transcriptase XL, SMARTScribe reverse transcriptase, and GPR ultra-pure MMLV reverse transcriptase, all from TAKARA BIO USA, INC. (formerly CLONTECH).
The present disclosure also contemplates reverse transcriptases having mutations in the RNaseH domain. As mentioned above, one of the intrinsic properties of reverse transcriptases is the RNaseH activity, which cleaves the RNA template of the RNA:cDNA hybrid concurrently with polymerization. The RNaseH activity can be undesirable for the synthesis of long cDNAs, because the RNA template may be degraded before full-length reverse transcription is complete. The RNaseH activity may also lower reverse transcription efficiency, presumably because it competes with the polymerase activity of the enzyme. Thus, the present disclosure contemplates any reverse transcriptase variant that comprises a modified RNaseH activity.
The present disclosure also contemplates reverse transcriptases having mutations in the RNA-dependent DNA polymerase domain. As mentioned above, one of the intrinsic properties of reverse transcriptases is the RNA-dependent DNA polymerase activity, which incorporates nucleobases into the nascent cDNA strand as encoded by the template RNA strand of the RNA:cDNA hybrid. The RNA-dependent DNA polymerase activity can be increased or decreased (i.e., in terms of its rate of incorporation) to increase or decrease the processivity of the enzyme. Thus, the present disclosure contemplates any reverse transcriptase variant that comprises a modified RNA-dependent DNA polymerase activity such that the processivity of the enzyme is increased or decreased relative to an unmodified version.
Also contemplated herein are reverse transcriptase variants that have altered thermostability characteristics. The ability of a reverse transcriptase to withstand high temperatures is an important aspect of cDNA synthesis. Elevated reaction temperatures help denature RNA with strong secondary structures and/or high GC content, allowing the reverse transcriptase to read through the sequence. As a result, reverse transcription at higher temperatures enables full-length cDNA synthesis and higher yields, which can lead to improved generation of the 3' flap ssDNA as a result of the guided editing process. Wild-type M-MLV reverse transcriptase typically has an optimal temperature in the range of 37-48°C; however, mutations may be introduced that allow reverse transcription activity at temperatures higher than 48°C, including 49°C, 50°C, 51°C, 52°C, 53°C, 54°C, 55°C, 56°C, 57°C, 58°C, 59°C, 60°C, 61°C, 62°C, 63°C, 64°C, 65°C, 66°C, and higher.
The variant reverse transcriptases contemplated herein, including error-prone RTs, thermostable RTs, and RTs with increased processivity, can be engineered by various routine strategies, including mutagenesis or evolutionary processes. In some instances, a variant can be produced by introducing a single mutation. In other cases, a variant may require more than one mutation. For mutants comprising more than one mutation, the effect of a given mutation can be evaluated by introducing the identified mutation into the wild-type gene by site-directed mutagenesis, in isolation from the other mutations borne by the particular mutant. Screening assays of the single mutant thus produced will then allow the effect of that mutation alone to be determined.
Variant RT enzymes as used herein may also include other "RT variants" that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference RT protein (including any wild-type RT), or mutant RT, or fragment RT, or other variants of RT, disclosed or contemplated herein or known in the art.
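As a rough illustration of how a percent-identity figure of this kind might be computed, the sketch below compares two sequences that have already been aligned to equal length (gaps written as "-"). It is only one of several common definitions of identity, since the disclosure does not specify an alignment method or denominator; the function names and toy sequences are hypothetical.

```python
# Identity levels recited in the text, in percent.
IDENTITY_THRESHOLDS = (70, 80, 90, 95, 96, 97, 98, 99, 99.5, 99.9)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns with identical residues.

    Assumes the inputs are pre-aligned; columns where both are gaps are ignored,
    and a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def thresholds_met(identity: float):
    """List the recited identity levels that a given identity satisfies."""
    return [t for t in IDENTITY_THRESHOLDS if identity >= t]


if __name__ == "__main__":
    identity = percent_identity("MKDLTAAAAA", "MKNLTAAAAA")
    print(identity, thresholds_met(identity))
```

In practice the alignment itself would come from a dedicated alignment tool, and the choice of denominator (alignment length, shorter sequence, or reference length) changes the reported value.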
In some embodiments, an RT variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or up to 100, or up to 200, or up to 300, or up to 400, or up to 500, or more amino acid changes compared to a reference RT. In some embodiments, the RT variant comprises a fragment of a reference RT, such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of the reference RT. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild-type RT (M-MLV reverse transcriptase) (e.g., SEQ ID NO: 81) or of any of the reverse transcriptases of SEQ ID NOs: 69-79.
In some embodiments, the present disclosure may also utilize RT fragments that retain their functionality and are fragments of any of the RT proteins disclosed herein. In some embodiments, the RT fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, or up to 600 or more amino acids in length.
In other embodiments, the present disclosure may also utilize RT variants that are truncated at the N-terminus, the C-terminus, or both by a certain number of amino acids, such that the truncated variant still retains sufficient polymerase function. In some embodiments, the RT truncated variant has a truncation of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 amino acids at the N-terminal end of the protein. In other embodiments, the RT truncated variant has a truncation of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 amino acids at the C-terminal end of the protein. In still other embodiments, the RT truncated variant has truncations at both the N-terminal and the C-terminal ends, which are of the same or different lengths.
For example, the guided editors used in the methods and compositions disclosed herein can include a truncated version of M-MLV reverse transcriptase. In this embodiment, the reverse transcriptase contains 4 mutations (D200N, T306K, W313F, T330P; note that the L603W mutation present in PE2 is no longer present due to the truncation). The DNA sequence encoding this truncated editor is 522 bp smaller than PE2 and may therefore be useful in applications where delivery of the DNA sequence is challenging due to its size (e.g., adeno-associated virus and lentivirus delivery). This embodiment is referred to as MMLV-RT (truncated) and has the following amino acid sequence:
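The size arithmetic behind the truncated embodiment can be sketched as follows. The sketch assumes an in-frame, C-terminal truncation; the cut position used here (`HYPOTHETICAL_LAST_POSITION = 500`) is a placeholder chosen only for illustration and is not the actual truncation point, which is not stated in this passage. The helper names are likewise hypothetical.

```python
import re

CODON_LENGTH_BP = 3

# The five RT mutations implied for PE2 by the passage above.
PE2_RT_MUTATIONS = ["D200N", "T306K", "W313F", "T330P", "L603W"]

# Placeholder cut position for illustration only (an assumption, not from the source).
HYPOTHETICAL_LAST_POSITION = 500


def residues_removed(bp_removed: int) -> int:
    """Convert a coding-sequence size reduction in bp into amino acid residues."""
    if bp_removed % CODON_LENGTH_BP != 0:
        raise ValueError("an in-frame truncation removes a multiple of 3 bp")
    return bp_removed // CODON_LENGTH_BP


def retained_mutations(mutations, last_retained_position: int):
    """Keep the substitutions whose positions survive a C-terminal truncation."""
    kept = []
    for label in mutations:
        match = re.fullmatch(r"[A-Z](\d+)[A-Z]", label)
        if match is None:
            raise ValueError(f"not a substitution label: {label!r}")
        if int(match.group(1)) <= last_retained_position:
            kept.append(label)
    return kept


if __name__ == "__main__":
    print(residues_removed(522))  # 174
    print(retained_mutations(PE2_RT_MUTATIONS, HYPOTHETICAL_LAST_POSITION))
```

Any cut point between positions 330 and 602 would give the same set of four retained mutations, which is consistent with the loss of L603W noted in the text.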
In various embodiments, the guided editors used in the methods and compositions disclosed herein can comprise one of the RT variants described herein, or an RT variant that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the reference RT variants.
In other embodiments, the present methods and compositions can utilize a DNA polymerase that has been evolved into a reverse transcriptase, as described in Ellefson et al., "Synthetic evolutionary origin of a proofreading reverse transcriptase," Science, June 24, 2016, Vol. 352:1590-1593, the contents of which are incorporated herein by reference.
In certain other embodiments, the reverse transcriptase is provided as a component of a fusion protein that also comprises a napDNAbp. In other words, in some embodiments, the reverse transcriptase is fused to the napDNAbp as a fusion protein.
In various embodiments, the variant reverse transcriptase may be engineered from the wild-type M-MLV reverse transcriptase set forth in SEQ ID NO: 81.
In various embodiments, the guided editors used in the methods and compositions described herein (with the RT either as a fusion partner or provided in trans) may include a variant RT comprising one or more of the following mutations: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K, or D653N in the wild-type M-MLV RT of SEQ ID NO: 81, or at a corresponding amino acid position in another wild-type RT polypeptide sequence.
Some exemplary reverse transcriptases that may be fused to a napDNAbp protein, or provided as individual proteins, according to various embodiments of this disclosure are provided below. Exemplary reverse transcriptases include variants having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following wild-type enzymes or partial enzymes:
In various other embodiments, the guided editors used in the methods and compositions described herein (with the RT either as a fusion partner or provided in trans) may include a variant RT comprising one or more of the following mutations: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X, or D653X in the wild-type M-MLV RT of SEQ ID NO: 81, or at a corresponding amino acid position in another wild-type RT polypeptide sequence, where "X" may be any amino acid.
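The "X" wildcard in the list above can be read as a pattern that any concrete substitution at a listed position satisfies. The sketch below checks a concrete label against the listed positions and wild-type residues; the function name `matches_listed_position` is hypothetical, and the requirement that the new residue differ from the wild-type residue is an added assumption rather than something stated in the text.

```python
import re

# Wild-type residue at each listed M-MLV RT position, taken from the list above.
LISTED_POSITIONS = {
    51: "P", 67: "S", 69: "E", 139: "L", 197: "T", 200: "D", 204: "H",
    209: "F", 302: "E", 306: "T", 309: "F", 313: "W", 330: "T", 345: "L",
    435: "L", 454: "N", 524: "D", 562: "E", 583: "D", 594: "H", 603: "L",
    607: "E", 653: "D",
}

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def matches_listed_position(label: str) -> bool:
    """Return True if a concrete substitution (e.g. 'D200N') fits a listed 'X' pattern."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        return False
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if LISTED_POSITIONS.get(position) != original:
        return False
    # Assumption: "any amino acid" is read as any standard residue other than wild type.
    return replacement in STANDARD_AMINO_ACIDS and replacement != original


if __name__ == "__main__":
    for candidate in ("D200N", "T306K", "D200D", "K100R"):
        print(candidate, matches_listed_position(candidate))
```

For a different wild-type RT, the position numbers would first need to be mapped to corresponding positions, for example through a sequence alignment.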
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a P51X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is L.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an S67X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an E69X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an L139X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is P.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a T197X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is A.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a D200X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an H204X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is R.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an F209X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an E302X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an E302X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is R.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a T306X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an F309X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a W313X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is F.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a T330X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is P.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an L345X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is G.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an L435X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is G.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an N454X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a D524X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is G.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an E562X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is Q.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a D583X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an H594X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is Q.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an L603X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is W.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising an E607X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is K.
In various other embodiments, the guided editors (with the RT as a fusion partner or provided in trans) used in the methods and compositions described herein may include a variant RT comprising a D653X mutation relative to the wild-type M-MLV RT of SEQ ID NO: 81, or at the corresponding amino acid position in another wild-type RT polypeptide sequence, wherein "X" may be any amino acid. In certain embodiments, X is N.
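The substitution labels used above follow the pattern of a wild-type residue, a 1-based position, and a replacement residue (for example, D200N), with "X" standing for any amino acid. A minimal Python sketch of how such labels could be parsed and applied to a reference sequence is given below. The function names and the ten-residue toy reference are hypothetical illustrations and are not taken from the disclosure; in particular, the toy reference is not SEQ ID NO: 81.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # residue expected in the reference sequence
    position: int   # 1-based position in the reference sequence
    variant: str    # replacement residue; "X" is a placeholder for any residue


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'D200N' into its three parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, variant = match.groups()
    return Substitution(wild_type, int(position), variant)


def apply_substitutions(reference: str, labels: list[str]) -> str:
    """Return a copy of `reference` carrying every substitution in `labels`."""
    residues = list(reference)
    for label in labels:
        sub = parse_substitution(label)
        if sub.variant == "X":
            raise ValueError(f"{label} is a placeholder; choose a concrete residue")
        found = residues[sub.position - 1]
        if found != sub.wild_type:
            # Guards against numbering that does not match the reference.
            raise ValueError(f"{label}: reference has {found} at position {sub.position}")
        residues[sub.position - 1] = sub.variant
    return "".join(residues)


# Hypothetical toy reference used only to exercise the functions.
toy_reference = "MTDLKEPWSA"
toy_variant = apply_substitutions(toy_reference, ["D3N", "W8F"])
print(toy_variant)
```

The wild-type check matters when a label is transferred to "the corresponding amino acid position" of a different RT: the position number generally changes, so the label has to be remapped through an alignment before it can be applied.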
Some exemplary reverse transcriptases are provided below, which may be fused to a napDNAbp protein or provided as separate proteins according to various embodiments of the present disclosure. Exemplary reverse transcriptases include variants having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the wild-type enzymes of SEQ ID NOs: 81-98 or to a portion thereof.
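The percent-identity thresholds recited above can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and it counts identical residues over all aligned columns; alignment tools differ in how they choose that denominator, so this is one convention among several and is not the definition used by the disclosure. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float, thresholds=(80, 85, 90, 95, 99)):
    """Return the largest recited threshold that `identity` reaches, or None."""
    met = [t for t in thresholds if identity >= t]
    return max(met) if met else None


# Hypothetical ten-residue sequences differing at two positions.
identity = percent_identity("MTDLKEPWSA", "MTNLKEPFSA")
print(identity, highest_threshold_met(identity))
```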
The guided editors used in the methods and compositions described herein contemplate any publicly available reverse transcriptase described or disclosed in any of the following U.S. patents (each of which is incorporated by reference in its entirety): U.S. Patent Nos. 10,202,658; 10,189,831; 10,150,955; 9,932,567; 9,783,791; 9,580,698; 9,534,201; and 9,458,484, and any variant thereof that can be made using known methods for installing mutations or known methods for evolving proteins. The following references describe reverse transcriptases in the art, each of which is incorporated herein by reference in its entirety.
Herzig,E.,Voronin,N.,Kucherenko,N.&Hizi,A.A Novel Leu92 Mutant of HIV-1 Reverse Transcriptase with a Selective Deficiency in Strand Transfer Causes a Loss of Viral Replication.J.Virol.89,8119-8129(2015).
Mohr,G.et al.A Reverse Transcriptase-Cas1 Fusion Protein Contains a Cas6 Domain Required for Both CRISPR RNA Biogenesis and RNA Spacer Acquisition.Mol.Cell 72,700-714.e8(2018).
Zhao,C.,Liu,F.&Pyle,A.M.An ultraprocessive,accurate reverse transcriptase encoded by a metazoan group II intron.RNA 24,183-195(2018).
Zimmerly,S.&Wu,L.An Unexplored Diversity of Reverse Transcriptases in Bacteria.Microbiol Spectr 3,MDNA3-0058-2014(2015).
Ostertag,E.M.&Kazazian Jr,H.H.Biology of Mammalian L1 Retrotransposons.Annual Review of Genetics 35,501-538(2001).
Perach,M.&Hizi,A.Catalytic Features of the Recombinant Reverse Transcriptase of Bovine Leukemia Virus Expressed in Bacteria.Virology 259,176-189(1999).
Lim,D.et al.Crystal structure of the moloney murine leukemia virus RNase H domain.J.Virol.80,8379-8389(2006).
Zhao,C.&Pyle,A.M.Crystal structures of a group II intron maturase reveal a missing link in spliceosome evolution.Nature Structural&Molecular Biology 23,558-565(2016).
Griffiths,D.J.Endogenous retroviruses in the human genome sequence.Genome Biol.2,REVIEWS1017(2001).
Baranauskas,A.et al.Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants.Protein Eng Des Sel 25,657-668(2012).
Zimmerly,S.,Guo,H.,Perlman,P.S.&Lambowitz,A.M.Group II intron mobility occurs by target DNA-primed reverse transcription.Cell 82,545-554(1995).
Feng,Q.,Moran,J.V.,Kazazian,H.H.&Boeke,J.D.Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition.Cell 87,905-916(1996).
Berkhout,B.,Jebbink,M.&Zsíros,J.Identification of an Active Reverse Transcriptase Enzyme Encoded by a Human Endogenous HERV-K Retrovirus.Journal of Virology 73,2365-2375(1999).
Kotewicz,M.L.,Sampson,C.M.,D’Alessio,J.M.&Gerard,G.F.Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity.Nucleic Acids Res 16,265-277(1988).
Arezi,B.&Hogrefe,H.Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer.Nucleic Acids Res 37,473-481(2009).
Blain,S.W.&Goff,S.P.Nuclease activities of Moloney murine leukemia virus reverse transcriptase.Mutants with altered substrate specificities.J.Biol.Chem.268,23585-23592(1993).
Xiong,Y.&Eickbush,T.H.Origin and evolution of retroelements based upon their reverse transcriptase sequences.EMBO J 9,3353-3362(1990).
Herschhorn,A.&Hizi,A.Retroviral reverse transcriptases.Cell.Mol.Life Sci.67,2717-2747(2010).
Taube,R.,Loya,S.,Avidan,O.,Perach,M.&Hizi,A.Reverse transcriptase of mouse mammary tumour virus:expression in bacteria,purification and biochemical characterization.Biochem.J.329(Pt 3),579-587(1998).
Liu,M.et al.Reverse Transcriptase-Mediated Tropism Switching in Bordetella Bacteriophage.Science 295,2091-2094(2002).
Luan,D.D.,Korman,M.H.,Jakubczak,J.L.&Eickbush,T.H.Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site:a mechanism for non-LTR retrotransposition.Cell 72,595-605(1993).
Nottingham,R.M.et al.RNA-seq of human reference RNA samples using a thermostable group II intron reverse transcriptase.RNA 22,597-613(2016).
Telesnitsky,A.&Goff,S.P.RNase H domain mutations affect the interaction between Moloney murine leukemia virus reverse transcriptase and its primer-template.Proc.Natl.Acad.Sci.U.S.A.90,1276-1280(1993).
Halvas,E.K.,Svarovskaia,E.S.&Pathak,V.K.Role of Murine Leukemia Virus Reverse Transcriptase Deoxyribonucleoside Triphosphate-Binding Site in Retroviral Replication and In Vivo Fidelity.Journal of Virology 74,10349-10358(2000).
Nowak,E.et al.Structural analysis of monomeric retroviral reverse transcriptase in complex with an RNA/DNA hybrid.Nucleic Acids Res 41,3874-3887(2013).
Stamos,J.L.,Lentzsch,A.M.&Lambowitz,A.M.Structure of a Thermostable Group II Intron Reverse Transcriptase with Template-Primer and Its Functional and Evolutionary Implications.Molecular Cell 68,926-939.e4(2017).
Das,D.&Georgiadis,M.M.The Crystal Structure of the Monomeric Reverse Transcriptase from Moloney Murine Leukemia Virus.Structure 12,819-829(2004).
Avidan,O.,Meer,M.E.,Oz,I.&Hizi,A.The processivity and fidelity of DNA synthesis exhibited by the reverse transcriptase of bovine leukemia virus.European Journal of Biochemistry 269,859-867(2002).
Gerard,G.F.et al.The role of template-primer in protection of reverse transcriptase from thermal inactivation.Nucleic Acids Res 30,3118-3129(2002).
Monot,C.et al.The Specificity and Flexibility of L1 Reverse Transcription Priming at Imperfect T-Tracts.PLOS Genetics 9,e1003499(2013).
Mohr,S.et al.Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing.RNA 19,958-970(2013).
Any of the references mentioned above relating to reverse transcriptase are incorporated herein by reference in their entirety if not already stated.
Guided editors: fusion proteins
The napDNAbp and polymerase (e.g., reverse transcriptase) may be provided in the form of a fusion protein. That is, the present disclosure contemplates the use of a guided editor comprising a fusion protein, wherein the fusion protein comprises a napDNAbp domain and a polymerase (e.g., reverse transcriptase) domain.
The guided editor (PE) systems used in the methods and compositions described herein contemplate fusion proteins comprising a napDNAbp and a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase), optionally joined by a linker. The present application contemplates combining any suitable napDNAbp and polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase) in a single fusion protein. Examples of napDNAbps and polymerases (e.g., DNA-dependent DNA polymerases or RNA-dependent DNA polymerases, such as reverse transcriptases) are each defined herein. Since polymerases are well known in the art and their amino acid sequences are readily available, the disclosure is not meant to be limited in any way to the specific polymerases identified herein.
In various embodiments, the fusion protein may comprise any suitable structural configuration. For example, the fusion protein may comprise, in the N-terminal to C-terminal direction, a napDNAbp fused to a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase). In other embodiments, the fusion protein may comprise, in the N-terminal to C-terminal direction, a polymerase (e.g., a reverse transcriptase) fused to a napDNAbp. The fused domains may optionally be joined by a linker, e.g., an amino acid sequence. In other embodiments, the fusion protein may comprise the structure NH2-[napDNAbp]-[polymerase]-COOH or NH2-[polymerase]-[napDNAbp]-COOH, wherein each instance of "]-[" indicates the presence of an optional linker sequence. In embodiments where the polymerase is a reverse transcriptase, the fusion protein may comprise the structure NH2-[napDNAbp]-[RT]-COOH or NH2-[RT]-[napDNAbp]-COOH, wherein each instance of "]-[" indicates the presence of an optional linker sequence.
In various embodiments, the guided editor fusion protein can have an amino acid sequence (referred to herein as "PE1") that includes a Cas9 variant comprising an H840A mutation (i.e., a Cas9 nickase) and a wild-type M-MLV RT, as well as an N-terminal NLS sequence (19 amino acids) and an amino acid linker (32 amino acids) that connects the C-terminus of the Cas9 nickase domain to the N-terminus of the RT domain. The PE1 fusion protein has the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV-RT(wt)]. The amino acid sequences of PE1 and its individual components are as follows:
In another embodiment, the guided editor fusion protein can have an amino acid sequence (referred to herein as "PE2") that includes a Cas9 variant comprising an H840A mutation (i.e., a Cas9 nickase) and an M-MLV RT comprising the mutations D200N, T330P, L603W, T306K, and W313F, as well as an N-terminal NLS sequence (19 amino acids) and an amino acid linker (33 amino acids) that connects the C-terminus of the Cas9 nickase domain to the N-terminus of the RT domain. The PE2 fusion protein has the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV-RT(D200N)(T330P)(L603W)(T306K)(W313F)]. The amino acid sequence of PE2 is as follows:
In still other embodiments, the guided editor fusion protein may have the following amino acid sequence:
In other embodiments, the guided editor fusion protein may be based on SaCas9 or SpCas9 nickases with altered PAM specificity, such as the sequences exemplified below:
In other embodiments, the guided editor fusion proteins used in the methods and compositions contemplated herein can include a Cas9 nickase (e.g., Cas9 (H840A)) fused to a truncated version of an M-MLV reverse transcriptase. In this embodiment, the reverse transcriptase also comprises four mutations (D200N, T306K, W313F, and T330P; note that the L603W mutation present in PE2 is absent because of the truncation). The DNA sequence encoding this truncated editor is 522 bp smaller than that of PE2, so it may be suitable for applications in which delivery of the DNA sequence is challenging because of its size (i.e., adeno-associated virus and lentiviral delivery). This embodiment is referred to as Cas9 (H840A)-MMLV-RT (truncated), "PE2-short", or "PE2-truncated" and has the following amino acid sequence:
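As a quick check on the size figure quoted for the truncated editor, the 522 bp reduction can be converted into a number of codons. The Python sketch below assumes that the whole difference lies within the in-frame coding sequence, which the text implies but does not state; the function name is a hypothetical illustration.

```python
BASES_PER_CODON = 3


def codons_removed(size_reduction_bp: int) -> int:
    """Convert an in-frame coding-sequence reduction (in bp) into codons."""
    codons, remainder = divmod(size_reduction_bp, BASES_PER_CODON)
    if remainder:
        raise ValueError("reduction is not a whole number of codons")
    return codons


# 522 bp is the size difference stated for the truncated editor relative to PE2.
print(codons_removed(522))
```

Under that assumption, the truncation corresponds to 174 fewer encoded amino acids.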
In various embodiments, the guided editor fusion proteins used in the methods and compositions contemplated herein may also include any variant of the sequences disclosed above having an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to PE1, PE2, or any of the guided editor fusion sequences shown above.
In certain embodiments, linkers can be used to attach any peptide or peptide domain or moiety of the invention (e.g., napDNAbp linked or fused to a reverse transcriptase).
Guided editors: modified PE fusion proteins (e.g., PEmax)
In one aspect, the present disclosure provides modified guided editor proteins. In one embodiment, the modified guided editor fusion protein is PEmax (SEQ ID NO: 99), or has an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100% identical to the sequence of SEQ ID NO: 99. The sequence of PEmax is as follows:
Key:
Bipartite SV40 NLS
SpCas9 R221K N394K H840A
Linker = (SGGSx2-bipartite SV40 NLS-SGGSx2) (SGGSSGGS KRTADGSEFESPKKKRKV SGGSSGGS) (SEQ ID NO: 105)
GenScript codon-optimized MMLV RT pentamutant (D200N T306K W313F T330P L603W)
Other linker sequences
c-Myc NLS
PEmax protein sequence (SEQ ID NO: 99):
PEmax DNA sequence (SEQ ID NO: 5660):
Component sequences of PEmax (SEQ ID NO: 99):
Bipartite SV40 NLS:
SpCas9 R221K N394K H840A:
linker= (SGGSx 2-bipartite SV40NLS-SGGSx 2):
genscript codon optimized MMLV RT five mutant (D200N T306K W313F T330P L603W):
other linker sequences:
binary SV40 NLS:
other linker sequences:
c-Myc NLS:
The modified fusion protein may comprise any suitable structural configuration. For example, the fusion protein can comprise, in the N-terminal to C-terminal direction, a napDNAbp fused to a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase). In other embodiments, the fusion protein can comprise, in the N-terminal to C-terminal direction, a polymerase (e.g., a reverse transcriptase) fused to a napDNAbp. The fused domains may optionally be joined by a linker, e.g., an amino acid sequence. In other embodiments, the fusion protein may comprise the structure NH2-[napDNAbp]-[polymerase]-COOH or NH2-[polymerase]-[napDNAbp]-COOH, wherein each instance of "]-[" indicates the presence of an optional linker sequence. In embodiments where the polymerase is a reverse transcriptase, the fusion protein may comprise the structure NH2-[napDNAbp]-[RT]-COOH or NH2-[RT]-[napDNAbp]-COOH, wherein each instance of "]-[" indicates the presence of an optional linker sequence.
In various embodiments, the guided editor fusion proteins used in the methods and compositions contemplated herein can also include any variant of the disclosed sequences described above having an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to PEmax.
In certain embodiments, a linker may be used to attach any peptide or peptide domain or portion of the invention (e.g., napDNAbp linked or fused to a reverse transcriptase).
The present disclosure contemplates any modified Cas9 protein known in the art that carries one or more of the PEmax mutations described herein (i.e., R221K, N394K, and/or H840A), as well as any combination of a modified Cas9 protein with one or more of the PEmax architecture features described herein (e.g., the optimized MMLV RT pentamutant, NLSs, linkers, etc.).
In some embodiments, the PEmax proteins described herein include any of the other Cas9 sequences disclosed herein or variants thereof, which may be further modified at the corresponding amino acid positions with one or more of the mutations described herein. The napDNAbp used in the PEmax constructs described herein may include any suitable homolog and/or ortholog or naturally occurring enzyme, such as Cas9. Cas9 homologs and/or orthologs have been described in a variety of species including, but not limited to, Streptococcus pyogenes and Streptococcus thermophilus. The Cas moiety may be configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Other suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on the present disclosure, and such Cas9 nucleases and sequences include those from Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference. In some embodiments, the Cas9 nuclease has an inactive (e.g., inactivated) DNA cleavage domain; that is, the Cas9 is a nickase. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of any one of the Cas9 orthologs provided herein.
Guided editor-MMR inhibitor fusion proteins
The present disclosure contemplates that, in some embodiments, an MMR inhibitor (e.g., an antibody against an MMR protein, or a polypeptide inhibitor such as an MLH1 dominant negative variant that inhibits MMR) may be linked to the guided editor fusion protein or to at least one component thereof.
In certain embodiments, an inhibitor domain described herein, e.g., an anti-MLH1 antibody or a dominant negative variant of MLH1, may also be provided in cis by fusing the domain to a guided editor domain. Any of the following structures is contemplated, wherein each "]-[" represents an optional linker:
[napDNAbp]-[reverse transcriptase]-[MLH1 inhibitor];
[reverse transcriptase]-[napDNAbp]-[MLH1 inhibitor];
[MLH1 inhibitor]-[reverse transcriptase]-[napDNAbp];
[MLH1 inhibitor]-[napDNAbp]-[reverse transcriptase];
[napDNAbp]-[MLH1 inhibitor]-[reverse transcriptase]; or
[reverse transcriptase]-[MLH1 inhibitor]-[napDNAbp].
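The six structures listed above are exactly the 3! = 6 possible N-terminal to C-terminal orderings of three domains. A minimal Python sketch that enumerates them is shown below; the function name is a hypothetical illustration, and the domain labels are simply the ones used in the list.

```python
from itertools import permutations


def fusion_orders(domains):
    """Return every N-to-C ordering of `domains` in bracket notation."""
    return ["-".join(f"[{name}]" for name in order) for order in permutations(domains)]


orders = fusion_orders(["napDNAbp", "reverse transcriptase", "MLH1 inhibitor"])
for structure in orders:
    print(structure)
```

Each "]-[" junction in the output is a position where an optional linker may be placed, so every ordering of three domains has two such junctions.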
In certain embodiments, the inhibitor domain is a dominant negative variant of MLH1. Any of the following structures is contemplated, wherein each "]-[" represents an optional linker:
[napDNAbp]-[reverse transcriptase]-[MLH1 dominant negative variant];
[reverse transcriptase]-[napDNAbp]-[MLH1 dominant negative variant];
[MLH1 dominant negative variant]-[reverse transcriptase]-[napDNAbp];
[MLH1 dominant negative variant]-[napDNAbp]-[reverse transcriptase];
[napDNAbp]-[MLH1 dominant negative variant]-[reverse transcriptase]; or
[reverse transcriptase]-[MLH1 dominant negative variant]-[napDNAbp].
In certain other embodiments, an inhibitor domain described herein, e.g., an anti-MMR protein antibody or a dominant negative variant of an MMR protein, may also be provided in cis by fusing the domain to a guided editor domain. Any of the following structures is contemplated, wherein each "]-[" represents an optional linker:
[napDNAbp]-[reverse transcriptase]-[anti-MMR protein inhibitor];
[reverse transcriptase]-[napDNAbp]-[anti-MMR protein inhibitor];
[anti-MMR protein inhibitor]-[reverse transcriptase]-[napDNAbp];
[anti-MMR protein inhibitor]-[napDNAbp]-[reverse transcriptase];
[napDNAbp]-[anti-MMR protein inhibitor]-[reverse transcriptase]; or
[reverse transcriptase]-[anti-MMR protein inhibitor]-[napDNAbp], wherein the MMR protein is any one of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ.
In certain embodiments, the inhibitor domain is a dominant negative variant of any MMR protein, such as MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ. Any of the following structures is contemplated, wherein each "]-[" represents an optional linker:
[napDNAbp]-[reverse transcriptase]-[dominant negative variant of any MMR protein];
[reverse transcriptase]-[napDNAbp]-[dominant negative variant of any MMR protein];
[dominant negative variant of any MMR protein]-[reverse transcriptase]-[napDNAbp];
[dominant negative variant of any MMR protein]-[napDNAbp]-[reverse transcriptase];
[napDNAbp]-[dominant negative variant of any MMR protein]-[reverse transcriptase]; or
[reverse transcriptase]-[dominant negative variant of any MMR protein]-[napDNAbp].
Furthermore, the MMR inhibitor may be fused to only one domain of the guided editor and administered separately from the other guided editor domain. For example, the MMR inhibitor can be fused to the napDNAbp domain, with the polymerase domain provided separately in trans. Alternatively, the MMR inhibitor can be fused to the polymerase domain, with the napDNAbp domain provided separately in trans.
A. Linkers
As defined above, the term "linker" as used herein refers to a chemical group or molecule that connects two molecules or moieties (e.g., a binding domain and a cleavage domain of a nuclease). In some embodiments, the linker connects the gRNA binding domain of the RNA-programmable nuclease and the catalytic domain of a polymerase (e.g., reverse transcriptase). In some embodiments, the linker connects dCas9 and reverse transcriptase. Typically, a linker is located between or on both sides of two groups, molecules or other moieties and connects them to each other by covalent bonds, thereby linking the two. In some embodiments, the linker is an amino acid or multiple amino acids (e.g., peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide or is based on amino acids. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of an aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, acetic acid, alanine, beta-alanine, 3-aminopropionic acid, 4-aminobutyric acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminocaproic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol (PEG) moiety. In other embodiments, the linker comprises an amino acid. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a benzene ring. The linker may include a functionalized moiety to facilitate attachment of nucleophiles (e.g., thiols, amino groups) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, haloalkanes, aryl halides, acyl halides, and isothiocyanates.
In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 118), (G)n (SEQ ID NO: 119), (EAAAK)n (SEQ ID NO: 120), (GGS)n (SEQ ID NO: 121), (SGGS)n (SEQ ID NO: 122), (XP)n (SEQ ID NO: 123), or any combination thereof, wherein n is independently an integer from 1 to 30, and wherein X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 121), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 124), also known as XTEN. In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 125). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 126). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 127). In other embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 128, 60 AA).
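The repeat notation used for the linkers above, such as (SGGS)n, can be expanded mechanically. The Python sketch below builds a few of the recited sequences from their repeat units; the function name is a hypothetical illustration. By direct string comparison, the 32-residue sequence of SEQ ID NO: 125 coincides with two SGGS repeats, the XTEN sequence, and two further SGGS repeats, although the text itself does not describe it in those terms.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Expand (unit)n for n in the recited range of 1 to 30."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer from 1 to 30")
    return unit * n


XTEN = "SGSETPGTSESATPES"  # SEQ ID NO: 124 as recited in the text

# (GGS)n with n = 3, one of the values named in the text.
ggs_3 = repeat_linker("GGS", 3)

# Two SGGS repeats on each side of XTEN.
composite = repeat_linker("SGGS", 2) + XTEN + repeat_linker("SGGS", 2)

print(ggs_3)
print(composite, len(composite))
```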
In certain embodiments, linkers can be used to attach any peptide or peptide domain or moiety of the invention (e.g., napDNAbp linked or fused to a reverse transcriptase).
As defined above, the term "linker" as used herein refers to a chemical group or molecule that connects two molecules or moieties (e.g., a binding domain and a cleavage domain of a nuclease). In some embodiments, the linker connects the gRNA binding domain of an RNA-programmable nuclease and the catalytic domain of a recombinase. In some embodiments, the linker connects dCas9 and a reverse transcriptase. Typically, a linker is positioned between, or flanked by, two groups, molecules, or other moieties and is connected to each one by a covalent bond, thereby linking the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
In particular, the following linkers may be used in different embodiments to connect the guide editor domains to each other:
GGS (SEQ ID NO: 129);
GGSGGS (SEQ ID NO: 130);
GGSGGSGGS (SEQ ID NO: 131);
SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 102);
SGSETPGTSESATPES (SEQ ID NO: 124), also known as XTEN;
SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 128).
The PE fusion protein can also comprise various other domains in addition to the napDNAbp (e.g., Cas9 domain) and the polymerase domain (e.g., RT domain). For example, where the napDNAbp is Cas9 and the polymerase is an RT, the PE fusion protein may comprise one or more linkers connecting the Cas9 domain to the RT domain. Linkers may also connect other functional domains, such as nuclear localization sequences (NLSs) or FEN1 (or another flap endonuclease), to the PE fusion protein or a domain thereof.
In some embodiments, the PE fusion protein may comprise an inhibitor of the DNA mismatch repair pathway (e.g., MLH1dn as described herein). In certain embodiments, the PE fusion protein and the inhibitor of the DNA mismatch repair pathway are fused via a linker. In some embodiments, the linker is a self-hydrolyzing linker. Suitable self-hydrolyzing linkers include, but are not limited to, amino acid sequences comprising 2A self-cleaving peptides. A 2A self-cleaving peptide induces ribosome skipping during translation, so that the ribosome fails to form a peptide bond between the two genes or gene fragments that flank it. Exemplary 2A self-cleaving peptides useful as linkers in the fusion proteins described herein include the following amino acid sequences:
T2A: EGRGSLLTCGDVENPGP (SEQ ID NO: 233)
P2A: ATNFSLLKQAGDVENPGP (SEQ ID NO: 234)
E2A: QCTNYALLKLAGDVESNPGP (SEQ ID NO: 235)
F2A: VKQTLNFDLLKLAGDVESNPGP (SEQ ID NO: 236)
In certain embodiments, the PE fusion proteins described herein are fused to MLH1dn via a linker comprising SEQ ID NO: 234.
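The effect of a 2A linker on the translated product can be sketched in a few lines of Python. This is an illustrative model only: it assumes the commonly described behavior in which the peptide bond between the final glycine and proline of the 2A peptide is not formed, so the upstream product keeps the 2A residues through the glycine and the downstream product begins with the proline. The function name `split_at_2a` and the flanking placeholder sequences are hypothetical.

```python
from typing import Tuple

P2A = "ATNFSLLKQAGDVENPGP"  # SEQ ID NO: 234 as given in the text


def split_at_2a(polyprotein: str, peptide_2a: str) -> Tuple[str, str]:
    """Return (upstream, downstream) products of ribosome skipping.

    Assumption: skipping occurs between the last G and the final P
    of the 2A peptide.
    """
    if not peptide_2a.endswith("GP"):
        raise ValueError("expected a 2A peptide ending in ...GP")
    start = polyprotein.find(peptide_2a)
    if start < 0:
        raise ValueError("2A peptide not found in polyprotein")
    cut = start + len(peptide_2a) - 1  # index of the final proline
    return polyprotein[:cut], polyprotein[cut:]


# Placeholder domains, not real PE or MLH1dn sequences
upstream_domain = "MAAAA"
downstream_domain = "MKKKK"
upstream, downstream = split_at_2a(
    upstream_domain + P2A + downstream_domain, P2A
)
print(upstream)    # MAAAAATNFSLLKQAGDVENPG
print(downstream)  # PMKKKK
```

Under this model a single open reading frame yields two separate polypeptides, which is why a 2A sequence can be used to co-express a prime editor and a mismatch repair inhibitor from one construct.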
B. Nuclear Localization Sequence (NLS)
In various embodiments, the PE fusion protein may comprise one or more nuclear localization sequences (NLSs), which help promote translocation of the protein into the cell nucleus. Such sequences are well known in the art and may include the sequences of SEQ ID NOs: 1, 101, 103, and 133-139.
The above NLS examples are non-limiting. The PE fusion protein may comprise any known NLS sequence, including those described in Cokol et al., "Finding nuclear localization signals," EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., "Mechanisms and Signals for the Nuclear Import of Proteins," Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference.
In various embodiments, the prime editors, and the constructs encoding them, used in the methods and compositions disclosed herein further comprise one or more, preferably at least two, nuclear localization signals. In some embodiments, the prime editor comprises at least two NLSs. In embodiments with at least two NLSs, the NLSs may be the same or different. In addition, the NLSs may be expressed as part of a fusion protein with the remainder of the prime editor. In some embodiments, one or more of the NLSs are bipartite NLSs ("bpNLSs"). In certain embodiments, the disclosed fusion proteins comprise a bipartite NLS. In some embodiments, the disclosed fusion proteins comprise two or more bipartite NLSs.
The NLS may be fused at the N-terminus, at the C-terminus, or within the sequence of the prime editor (e.g., inserted between the encoded napDNAbp component (e.g., Cas9) and the polymerase domain (e.g., a reverse transcriptase domain)).
The NLS may be any NLS sequence known in the art. The NLS may also be any NLS discovered in the future for nuclear localization. The NLS may also be any naturally occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).
The term "nuclear localization sequence" or "NLS" refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in International PCT application PCT/EP 2000/0110290, filed November 23, 2000, and published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference. In some embodiments, the NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 132), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 1), KRTADGSEFESPKKKRKV (SEQ ID NO: 140), or KRTADGSEFEPKKKRKV (SEQ ID NO: 141). In other embodiments, the NLS comprises the amino acid sequence NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 142), PAAKRVKLD (SEQ ID NO: 135), RQRRNELKRSF (SEQ ID NO: 143), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 144).
In one aspect of the disclosure, the prime editor may be modified with one or more nuclear localization signals (NLSs), preferably at least two NLSs. In certain embodiments, the prime editor is modified with two or more NLSs. The present disclosure contemplates the use of any nuclear localization signal known in the art at the time of the disclosure, or any nuclear localization signal that is identified or otherwise made available in the state of the art after the time of the instant filing. A representative nuclear localization signal is a peptide sequence that directs a protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization signals often comprise proline residues. A variety of nuclear localization signals have been identified and used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89: 7442-46; Moede et al., (1999) FEBS Lett. 461: 229-34, which are incorporated by reference. Translocation is currently thought to involve nuclear pore proteins.
Most NLS can be divided into three categories: (i) Single component NLS, such as the SV40 larger T antigen NLS (PKKKRKV (SEQ ID NO: 132)); (ii) Two-component motifs consisting of two basic domains separated by a different number of spacer amino acids, exemplified by Xenopus nucleoplasmic protein NLS (KRRXXXXXXXXKKK (SEQ ID NO: 145)); (iii) Non-classical sequences such as M9 of hnRNPA1 protein, influenza virus nucleoprotein NLS and yeast Gal4 protein NLS (Dingwall and Laskey 1991).
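The first two categories above can be told apart, very roughly, with pattern matching. The Python sketch below is an assumption-laden illustration, not a validated NLS predictor: the monopartite pattern `K[KR].[KR]` and the bipartite pattern of two basic residues, a 10 to 12 residue spacer, and a run of at least three basic residues are simplified consensus patterns chosen for this example, and the function name `classify_nls` is hypothetical.

```python
import re

# Simplified, illustrative consensus patterns (assumptions, not from the text)
MONOPARTITE = re.compile(r"K[KR].[KR]")
BIPARTITE = re.compile(r"[KR]{2}.{10,12}[KR]{3}")


def classify_nls(seq: str) -> str:
    """Classify a candidate NLS as bipartite, monopartite, or unclassified."""
    seq = seq.upper()
    # Check the longer bipartite pattern first, because a bipartite NLS
    # usually also contains a short basic cluster.
    if BIPARTITE.search(seq):
        return "bipartite"
    if MONOPARTITE.search(seq):
        return "monopartite"
    return "unclassified"


sv40 = classify_nls("PKKKRKV")                    # SEQ ID NO: 132
nucleoplasmin_like = classify_nls("NLSKRPAAIKKAGQAKKKK")  # SEQ ID NO: 142
linker_only = classify_nls("GGSGGS")
print(sv40, nucleoplasmin_like, linker_only)
```

Non-classical signals such as the M9 sequence do not follow either pattern and would fall into the "unclassified" bucket here, which is one reason pattern matching alone is not sufficient to identify every NLS.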
Nuclear localization signals appear at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, at the C-terminus, and in the central region of proteins. Thus, the present disclosure provides prime editors that may be modified with one or more NLSs at the C-terminus, the N-terminus, or an internal region of the prime editor. The residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example tonically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice such a sequence is functionally limited in length and composition.
The present disclosure contemplates any suitable means by which a guided editor is modified to include one or more NLSs. In one aspect, the guided editor may be designed to express a guided editor protein that translationally fuses one or more NLSs at its N-terminus or C-terminus (or both), i.e., to form a guided editor-NLS fusion construct. In other embodiments, the nucleotide sequence encoding the guided editor may be genetically modified to incorporate a reading frame encoding one or more NLS within the interior region of the encoded guided editor. In addition, the NLS may include various amino acid linkers or spacers encoded between the guide editor and the N-terminal, C-terminal, or internally linked NLS amino acid sequences (e.g., in the central region of the protein). Thus, the present disclosure also provides nucleotide constructs, vectors, and host cells for expressing a fusion protein comprising a guide editor and one or more NLS.
The prime editors used in the methods and compositions described herein may also comprise nuclear localization signals that are linked to the prime editor through one or more linkers, e.g., a polymeric, amino acid, polysaccharide, chemical, or nucleic acid linker element. The linkers within the contemplation of the present disclosure are not intended to be limited in any way; they may be any suitable type of molecule (e.g., a polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain) and may be joined to the prime editor by any suitable strategy that forms a bond (e.g., a covalent bond or hydrogen bond) between the prime editor and the one or more NLSs.
C. Flap endonuclease (e.g., FEN1)
In various embodiments, the PE fusion protein may comprise one or more flap endonucleases (e.g., FEN1), which are enzymes that catalyze the removal of 5' single-stranded DNA flaps. These are naturally occurring enzymes that remove 5' flaps formed during cellular processes, including DNA replication. Prime editing as used in the methods and compositions described herein may utilize endogenously supplied flap endonucleases, or those provided in trans, to remove the 5' flap of endogenous DNA formed at the target site during prime editing. Flap endonucleases are known in the art and are described in Patel et al., "Flap endonucleases pass 5'-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5'-ends," Nucleic Acids Research, 2012, 40(10): 4507-4519 and Tsutakawa et al., "Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily," Cell, 2011, 145(2): 198-211 (each of which is incorporated herein by reference). An exemplary flap endonuclease is FEN1, which can be represented by the following amino acid sequence:
The flap endonuclease may also include any FEN1 variant, mutant or other flap endonuclease ortholog, homolog or variant. Non-limiting FEN1 variants are exemplified as follows:
In various embodiments, the prime editor fusion proteins used in the methods and compositions contemplated herein can comprise amino acid sequences that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the sequences described above. Other endonucleases that may be utilized by the present methods to facilitate removal of the 5' single-stranded DNA flap include, but are not limited to, (1) TREX2 and (2) EXO1 (e.g., Keijzers et al., Biosci Rep. 2015, 35(3): e00206).
TREX2
3' three prime repair exonuclease 2 (TREX2) - human [accession No. NM_080701]
3' three prime repair exonuclease 2 (TREX2) - mouse [accession No. NM_011907]
3' three prime repair exonuclease 2 (TREX2) - rat [accession No. NM_001107580]
EXO1
Human exonuclease 1 (EXO1) has been implicated in many different DNA metabolic processes, including DNA mismatch repair (MMR), microhomology-mediated end joining, homologous recombination (HR), and replication. Human EXO1 belongs to the Rad2/XPG family of eukaryotic nucleases, which also includes FEN1 and GEN1. The nuclease domain of the Rad2/XPG family is conserved across species from phage to human. The EXO1 gene product exhibits both 5' exonuclease and 5' flap activity. In addition, EXO1 has an intrinsic 5' RNase H activity. Human EXO1 has high affinity for processing double-stranded DNA (dsDNA), nicks, gaps, and pseudo-Y structures and can resolve Holliday junctions using its inherent flap activity. Human EXO1 is implicated in MMR and contains conserved binding domains that interact directly with MLH1 and MSH2. EXO1 nucleolytic activity is positively stimulated by PCNA, MutSα (MSH2/MSH6 complex), 14-3-3, MRN, and the 9-1-1 complex.
Exonuclease 1 (EXO1) accession No. NM_003686 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3) - isoform A
Exonuclease 1 (EXO1) accession No. NM_006027 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3) - isoform B
Exonuclease 1 (EXO1) accession No. NM_001319224 (Homo sapiens exonuclease 1 (EXO1), transcript variant 4) - isoform C
D. Inteins and split inteins
It will be appreciated that in some embodiments (e.g., delivery of a prime editor in vivo using AAV particles), it may be advantageous to split a polypeptide (e.g., a deaminase or a napDNAbp) or a fusion protein (e.g., a prime editor) into an N-terminal half and a C-terminal half, deliver them separately, and then allow them to colocalize and re-form the complete protein (or fusion protein, as the case may be) within the cell. The separate halves of the protein or fusion protein may each comprise a split-intein tag to facilitate re-formation of the complete protein or fusion protein by the mechanism of protein trans-splicing.
Protein trans-splicing, catalyzed by split inteins, provides an entirely enzymatic method for protein ligation. A split intein is essentially a contiguous intein (e.g., a mini-intein) split into two pieces, named the N-intein and the C-intein, respectively. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction in essentially the same way as a contiguous intein. Split inteins have been found in nature and have also been engineered in the laboratory. As used herein, the term "split intein" refers to any intein in which one or more peptide bond breaks exist between the N-terminal and C-terminal amino acid sequences, such that the N-terminal and C-terminal sequences become separate molecules that can non-covalently reassociate, or reconstitute, into an intein that is functional for trans-splicing reactions. Any catalytically active intein, or fragment thereof, may be used to derive a split intein for use in the methods of the invention. For example, in one aspect, the split intein may be derived from a eukaryotic intein. In another aspect, the split intein may be derived from a bacterial intein. In another aspect, the split intein may be derived from an archaeal intein. Preferably, the split intein so derived will possess only the amino acid sequences essential for catalyzing trans-splicing reactions.
As used herein, "N-terminal split intein (In)" refers to any intein sequence comprising an N-terminal amino acid sequence that is functional for trans-splicing reactions. An In thus also comprises a sequence that is spliced out when trans-splicing occurs. An In may comprise a sequence that is a modification of the N-terminal portion of a naturally occurring intein sequence. For example, an In may comprise additional amino acid residues and/or mutated residues, provided that the inclusion of such additional and/or mutated residues does not render the In nonfunctional in trans-splicing. Preferably, the inclusion of additional and/or mutated residues improves or enhances the trans-splicing activity of the In.
As used herein, "C-terminal split intein (Ic)" refers to any intein sequence comprising a C-terminal amino acid sequence that is functional for trans-splicing reactions. In one aspect, the Ic comprises 4 to 7 contiguous amino acid residues, at least 4 of which are from the last β-strand of the intein from which it was derived. An Ic thus also comprises a sequence that is spliced out when trans-splicing occurs. An Ic may comprise a sequence that is a modification of the C-terminal portion of a naturally occurring intein sequence. For example, an Ic may comprise additional amino acid residues and/or mutated residues, provided that the inclusion of such additional and/or mutated residues does not render the Ic nonfunctional in trans-splicing. Preferably, the inclusion of additional and/or mutated residues improves or enhances the trans-splicing activity of the Ic.
In some embodiments of the invention, a peptide linked to an Ic or an In may comprise an additional chemical moiety, including, among others, fluorophores, biotin, polyethylene glycol (PEG), amino acid analogs, unnatural amino acids, phosphate groups, glycosyl groups, radioisotope labels, and drug molecules. In other embodiments, a peptide linked to an Ic may comprise one or more chemically reactive groups, including ketone, aldehyde, Cys residues, and Lys residues. In the presence of an "intein-splicing polypeptide (ISP)", the N-intein and the C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction. As used herein, "intein-splicing polypeptide (ISP)" refers to the portion of the amino acid sequence of a split intein that remains when the Ic, the In, or both are removed from the split intein. In certain embodiments, the In comprises the ISP. In another embodiment, the Ic comprises the ISP. In yet another embodiment, the ISP is a separate peptide that is covalently linked to neither the In nor the Ic.
Split inteins may be created from contiguous inteins by engineering one or more split sites in the unstructured loop or intervening amino acid sequence between the approximately 12 conserved β-strands found in the structure of mini-inteins. There may be some flexibility in the position of the split site within the regions between the β-strands, provided that creating the split does not disrupt the structure of the intein, particularly the structured β-strands, to a degree sufficient to cause loss of protein splicing activity.
In protein trans-splicing, one precursor protein consists of an N-extein part followed by the N-intein, the other precursor protein consists of the C-intein followed by a C-extein part, and the trans-splicing reaction (catalyzed by the N- and C-inteins together) excises the two intein sequences and links the two extein sequences with a peptide bond. Protein trans-splicing, being an enzymatic reaction, can work with very low (e.g., micromolar) concentrations of protein and can be carried out under physiological conditions.
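The bookkeeping of protein trans-splicing, in which two precursors enter and a single ligated product leaves with both intein pieces removed, can be sketched as simple string operations. This is a schematic model only; it does not represent the chemistry of the reaction, and the function name `trans_splice` and all four sequences are hypothetical placeholders rather than real intein or extein sequences.

```python
def trans_splice(n_precursor: str, c_precursor: str,
                 n_intein: str, c_intein: str) -> str:
    """Return the ligated extein product of a trans-splicing reaction.

    n_precursor is modeled as N-extein followed by the N-intein;
    c_precursor is modeled as the C-intein followed by C-extein.
    """
    if not n_intein or not c_intein:
        raise ValueError("intein fragments must be non-empty")
    if not n_precursor.endswith(n_intein):
        raise ValueError("N-precursor must end with the N-intein")
    if not c_precursor.startswith(c_intein):
        raise ValueError("C-precursor must start with the C-intein")
    n_extein = n_precursor[: -len(n_intein)]
    c_extein = c_precursor[len(c_intein):]
    # Both intein pieces are excised; the exteins are joined directly.
    return n_extein + c_extein


# Placeholder sequences for illustration only
N_EXTEIN, N_INTEIN = "MKTAY", "CLSYE"
C_INTEIN, C_EXTEIN = "VKIAT", "CFNGS"

product = trans_splice(N_EXTEIN + N_INTEIN, C_INTEIN + C_EXTEIN,
                       N_INTEIN, C_INTEIN)
print(product)  # MKTAYCFNGS
```

In the split prime editor setting described above, the two exteins would correspond to the N-terminal and C-terminal halves of the editor, each delivered separately with its intein tag.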
An exemplary sequence is as follows:
Although inteins are most frequently found as a contiguous domain, some exist in a naturally split form. In this case, the two fragments are expressed as separate polypeptides and must associate before splicing takes place, so-called protein trans-splicing.
An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely DnaE-N and DnaE-C. The two subunits are encoded by separate genes, namely dnaE-N and dnaE-C, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.
Other naturally occurring or engineered split intein sequences are known or can be prepared from the whole intein sequences described herein or those available in the art. Examples of split intein sequences can be found in Stevens et al., "A promiscuous split intein with expanded protein engineering applications," PNAS, 2017, Vol. 114: 8538-8543; and Iwai et al., "Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme," FEBS Lett, 580: 1853-1858, each of which is incorporated herein by reference. Additional split intein sequences can be found, for example, in WO2013/045632, WO2014/055782, WO2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference.
Furthermore, protein trans-splicing has been described both in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998); Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b); Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998); Evans, et al., J. Biol. Chem. 9091 (2000); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biol. Chem. 14:105-105 (1998)) and provides the opportunity to express a protein as two fragments that subsequently undergo ligation to form a functional product.
RNA-protein interaction domains
In various embodiments, two separate protein domains (e.g., a Cas9 domain and a polymerase domain) may be colocalized with one another to form a functional complex (akin to the function of a fusion protein comprising the two separate protein domains) by using an "RNA-protein recruitment system," such as the "MS2 tagging technique." Such systems generally tag one protein domain with an "RNA-protein interaction domain" (also known as an "RNA-protein recruitment domain"), e.g., a specific hairpin structure, and tag the other with an "RNA-binding protein" that specifically recognizes and binds to the RNA-protein interaction domain. These types of systems can be leveraged to colocalize the domains of a prime editor, as well as to recruit additional functionalities, such as a UGI domain, to a prime editor. In one example, the MS2 tagging technique is based on the natural interaction of the MS2 bacteriophage coat protein ("MCP" or "MS2cp") with a stem-loop or hairpin structure present in the genome of the phage, i.e., the "MS2 hairpin." The MS2 hairpin is recognized and bound by MCP. Thus, in one exemplary scenario, a deaminase-MS2 fusion can recruit a Cas9-MCP fusion.
Other modular RNA-protein interaction domains are reviewed in the art, for example, in Johansson et al., "RNA recognition by the MS2 phage coat protein," Sem Virol., 1997, Vol. 8(3): 176-185; Delebecque et al., "Organization of intracellular reactions with rationally designed RNA assemblies," Science, 2011, Vol. 333: 470-474; Mali et al., "Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering," Nat. Biotechnol., 2013, Vol. 31: 833-838; and Zalatan et al., "Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds," Cell, 2015, Vol. 160: 339-350, each of which is incorporated herein by reference in its entirety. Other systems include the PP7 hairpin, which specifically recruits the PCP protein, and the "com" hairpin, which specifically recruits the Com protein. See Zalatan et al.
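The hairpin and binding-protein pairings named above can be summarized as a small lookup table. The Python sketch below is purely illustrative bookkeeping; the names `RECRUITMENT_PAIRS` and `binding_protein_for` are hypothetical, and only the three pairings mentioned in the text (MS2 with MCP, PP7 with PCP, com with Com) are included.

```python
# Hairpin (RNA-protein interaction domain) -> cognate RNA-binding protein
RECRUITMENT_PAIRS = {
    "ms2": "MCP",  # MS2 hairpin is bound by the MS2 coat protein
    "pp7": "PCP",  # PP7 hairpin recruits the PCP protein
    "com": "Com",  # com hairpin recruits the Com protein
}


def binding_protein_for(hairpin: str) -> str:
    """Return the protein recruited by a given hairpin tag."""
    try:
        return RECRUITMENT_PAIRS[hairpin.lower()]
    except KeyError:
        raise ValueError(f"no recruitment pair known for {hairpin!r}") from None


ms2_partner = binding_protein_for("MS2")
pp7_partner = binding_protein_for("PP7")
print(ms2_partner, pp7_partner)  # MCP PCP
```

Because each hairpin recruits only its own protein, more than one pair could in principle be used in the same cell to bring different functional domains to different RNA scaffolds.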
The nucleotide sequence of the MS2 hairpin (or equivalently "MS2 aptamer") is:
The amino acid sequence of MCP or MS2cp is:
E. UGI domain
In other embodiments, the prime editors used in the methods and compositions described herein may comprise one or more uracil glycosylase inhibitor domains. As used herein, the term "uracil glycosylase inhibitor (UGI)" or "UGI domain" refers to a protein that is capable of inhibiting a uracil-DNA glycosylase base excision repair enzyme. In some embodiments, the UGI domain comprises a wild-type UGI or a UGI as set forth in SEQ ID NO: 168. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, the UGI domain comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 168. In some embodiments, the UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence set forth in SEQ ID NO: 168. In some embodiments, the UGI comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NO: 168, or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 168. In some embodiments, proteins comprising UGI, fragments of UGI, or homologs of UGI or UGI fragments are referred to as "UGI variants." A UGI variant shares homology with UGI or a fragment thereof. For example, a UGI variant is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild-type UGI or a UGI as set forth in SEQ ID NO: 168.
In some embodiments, the UGI variant comprises a fragment of UGI, such that the fragment is at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of a wild-type UGI or of a UGI as set forth in SEQ ID NO: 168. In some embodiments, the UGI comprises the following amino acid sequence:
Uracil-DNA glycosylase inhibitor:
>sp|P14739|UNGI_BPPB2
The prime editors used in the methods and compositions described herein may comprise more than one UGI domain, which may be separated by one or more linkers as described herein.
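The percent-identity thresholds recited for UGI variants (for example, at least 70% or at least 95% identical) can be made concrete with a short Python sketch. It assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and it counts gap-containing columns as mismatches; other definitions of percent identity use different denominators, so this is one convention among several. The function names and the two toy sequences are hypothetical and unrelated to SEQ ID NO: 168.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (gap vs. residue = mismatch)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float) -> bool:
    """True if the pair is at least `minimum` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Toy 10-residue sequences differing at one position
reference = "ACDEFGHIKL"
variant = "ACDEFGHIKV"
identity = percent_identity(reference, variant)
print(identity)                                # 90.0
print(meets_threshold(reference, variant, 95)) # False
```

In practice the alignment itself would come from a dedicated alignment tool, and the choice of tool and scoring parameters can change the reported identity.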
F. Other PE elements
In certain embodiments, the prime editors used in the methods and compositions described herein may comprise an inhibitor of base repair. The term "inhibitor of base repair" or "IBR" refers to a protein capable of inhibiting the activity of a nucleic acid repair enzyme (e.g., a base excision repair enzyme). In some embodiments, the IBR is an inhibitor of OGG base excision repair. In some embodiments, the IBR is an inhibitor of base excision repair ("iBER"). Exemplary inhibitors of base excision repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 Endo I, T4PDG, UDG, hSMUG1, and hAAG. In some embodiments, the IBR is an inhibitor of Endo V or hAAG. In some embodiments, the IBR is an iBER that may be a catalytically inactive glycosylase, a catalytically inactive dioxygenase, or a small-molecule or peptide inhibitor of an oxidase, or a variant thereof. In some embodiments, the IBR is an iBER that may be a TDG inhibitor, an MBD4 inhibitor, or an inhibitor of an AlkBH enzyme. In some embodiments, the IBR is an iBER comprising a catalytically inactive TDG or a catalytically inactive MBD4. An exemplary catalytically inactive TDG is the N140A mutant of SEQ ID NO: 172 (human TDG).
Some exemplary glycosylases are provided below. Catalytically inactivated variants of any of these glycosylase domains are iBERs that may be fused to the napDNAbp or polymerase domain of the prime editors used in the methods and compositions provided herein.
OGG (human)
MPG (human)
MBD4 (human)
TDG (human)
In some embodiments, the fusion proteins described herein can comprise one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the leader editor component). The fusion protein may comprise any additional protein sequence, and optionally comprises a linker sequence between any two domains. Other exemplary features that may be present are localization sequences, such as cytoplasmic localization sequences, export sequences (e.g., nuclear export sequences or other localization sequences), and sequence tags that may be used for the lysis, purification or detection of fusion proteins.
Examples of protein domains that can be fused to a leader editor or component thereof (e.g., a napDNAbp domain, a polymerase domain, or an NLS domain) include, but are not limited to, epitope tags and reporter sequences. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza Hemagglutinin (HA) tags, myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-5-transferase (GST), horseradish peroxidase (HRP), chloramphenicol Acetyl Transferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green Fluorescent Protein (GFP), hcRed, dsRed, cyan Fluorescent Protein (CFP), yellow Fluorescent Protein (YFP), and autofluorescent proteins, including Blue Fluorescent Protein (BFP). The guide editor may be fused to gene sequences encoding proteins or protein fragments that bind to DNA molecules or bind to other cellular molecules, including but not limited to Maltose Binding Protein (MBP), S-tag, lex a DNA Binding Domain (DBD) fusion, GAL4DNA binding domain fusion, and pure herpes virus (HSV) BP16 protein fusion. Other domains that may form part of the guidance editor are described in U.S. patent publication No. 2011/0059502 published at 3/10 in 2011, and incorporated herein by reference in its entirety.
In one aspect of the disclosure, a reporter gene, which includes but is not limited to glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins, including blue fluorescent protein (BFP), can be introduced into a cell to encode a gene product that serves as a marker for measuring the alteration or modification of expression of the gene product. In certain embodiments of the present disclosure, the gene product is a luciferase. In another embodiment of the present disclosure, expression of the gene product is reduced.
Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, Myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags (also referred to as histidine tags or His-tags), maltose binding protein (MBP)-tags, Nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), Strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Other suitable sequences will be apparent to those skilled in the art. In some embodiments, the fusion protein comprises one or more His-tags.
In some embodiments of the present disclosure, the activity of the guided editing system may be temporally regulated by adjusting the residence time, the amount, and/or the activity of the expressed components of the PE system. For example, as described herein, the PE may be fused to a protein domain capable of modifying the intracellular half-life of the PE. In certain embodiments involving two or more vectors (e.g., a vector system in which the components described herein are encoded on two or more separate vectors), the activity of the PE system may be temporally regulated by controlling the timing of vector delivery. For example, in some embodiments, a vector encoding the nuclease system may deliver the PE prior to the vector encoding the template. In other embodiments, the vector encoding the PEgRNA may deliver the guide prior to the vector encoding the PE system. In some embodiments, the vectors encoding the PE system and the PEgRNA are delivered simultaneously. In certain embodiments, the simultaneously delivered vectors temporally deliver, e.g., the PE, PEgRNA, and/or second-strand guide RNA components. In further embodiments, the RNA (e.g., the nuclease transcript) transcribed from the coding sequence on the vector may further comprise at least one element capable of modifying the intracellular half-life of the RNA and/or modulating translational control. In some embodiments, the half-life of the RNA may be increased. In some embodiments, the half-life of the RNA may be decreased. In some embodiments, the element may be capable of increasing the stability of the RNA. In some embodiments, the element may be capable of decreasing the stability of the RNA. In some embodiments, the element may be within the 3' UTR of the RNA. In some embodiments, the element may include a polyadenylation signal (PA). In some embodiments, the element may include a cap, e.g., an upstream mRNA or PEgRNA end.
In some embodiments, the RNA may comprise no PA, such that it is subject to quicker degradation in the cell after transcription. In some embodiments, the element may include at least one AU-rich element (ARE). AREs may be bound by ARE binding proteins (ARE-BPs) in a manner that depends on tissue type, cell type, timing, cellular localization, and environment. In some embodiments, the destabilizing element may promote RNA decay, affect RNA stability, or activate translation. In some embodiments, the ARE may be 50 to 150 nucleotides in length. In some embodiments, the ARE may comprise at least one copy of the sequence AUUUA. In some embodiments, at least one ARE may be added to the 3' UTR of the RNA. In some embodiments, the element may be a woodchuck hepatitis virus (WHP) post-transcriptional regulatory element (WPRE), which creates a tertiary structure that enhances expression from the transcript. In further embodiments, the element is a modified and/or truncated WPRE sequence capable of enhancing expression from the transcript, as described, for example, in Zufferey et al., J Virol, 73(4): 2886-92 (1999) and Flajolet et al., J Virol, 72(7): 6175-80 (1998). In some embodiments, the WPRE or an equivalent may be added to the 3' UTR of the RNA. In some embodiments, the element may be selected from other RNA sequence motifs that are enriched in either fast- or slow-decaying transcripts.
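By way of non-limiting illustration, the AUUUA pentamer mentioned above can be located in a candidate 3' UTR with a short script. The following is a minimal sketch only: the function name `find_are_pentamers` and the example UTR are hypothetical, and the presence of a pentamer alone is not asserted to predict transcript half-life.

```python
import re


def find_are_pentamers(utr: str) -> list:
    """Return 0-based start positions of every AUUUA pentamer.

    Overlapping copies (e.g. AUUUAUUUA) are reported separately.
    DNA input is accepted and read as RNA (T -> U).
    """
    rna = utr.upper().replace("T", "U")
    # A zero-width lookahead lets overlapping motifs match.
    return [m.start() for m in re.finditer(r"(?=AUUUA)", rna)]


# Hypothetical 3' UTR fragment used only for illustration.
example_utr = "GCCAUUUAUUUAGGCAUUUACC"
print(find_are_pentamers(example_utr))
```

The lookahead pattern is used so that tandem, overlapping AUUUA copies are each counted.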
In some embodiments, a vector encoding the PE or the PEgRNA may be self-destroying via cleavage, by the PE system, of a target sequence present on the vector. The cleavage may prevent continued transcription of the PE or PEgRNA from the vector. Although transcription may continue on the linearized vector for some time, the expressed transcripts or proteins, which are subject to intracellular degradation, will have less time to produce off-target effects once they are no longer replenished by expression from the encoding vector.
PEgRNA
The guided editing systems used in the methods and compositions described herein contemplate the use of any suitable PEgRNA.
PEgRNA architecture
In some embodiments, an extended guide RNA can be used in the guided editing systems used in the methods and compositions disclosed herein, wherein a traditional guide RNA comprises a protospacer sequence of about 20 nt and a gRNA core region that binds to the napDNAbp. In this embodiment, the guide RNA includes an extended RNA segment at the 5' end, i.e., a 5' extension. In this embodiment, the 5' extension comprises a reverse transcription template sequence, a reverse transcription primer binding site, and optionally a 5-20 nucleotide linker sequence. The RT primer binding site hybridizes to the free 3' end that is formed after a nick is made in the non-target strand of the R loop, thereby priming the reverse transcriptase for DNA polymerization in the 5' to 3' direction.
In another embodiment, an extended guide RNA can be used in the guided editing systems used in the methods and compositions disclosed herein, wherein a traditional guide RNA comprises a protospacer sequence of about 20 nt and a gRNA core that binds to the napDNAbp. In this embodiment, the guide RNA comprises an extended RNA segment at the 3' end, i.e., a 3' extension. In this embodiment, the 3' extension includes a reverse transcription template sequence and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3' end that is formed after a nick is made in the non-target strand of the R loop, thereby priming the reverse transcriptase for DNA polymerization in the 5' to 3' direction.
In another embodiment, an extended guide RNA can be used in the guided editing systems used in the methods and compositions disclosed herein, wherein a traditional guide RNA comprises a protospacer sequence of about 20 nt and a gRNA core that binds to the napDNAbp. In this embodiment, the guide RNA comprises an extended RNA segment at an internal position within the gRNA core, i.e., an intramolecular extension. In this embodiment, the intramolecular extension includes a reverse transcription template sequence and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3' end that is formed after a nick is made in the non-target strand of the R loop, thereby priming the reverse transcriptase for DNA polymerization in the 5' to 3' direction.
In one embodiment, the position of the intramolecular RNA extension is not in the protospacer sequence of the guide RNA. In another embodiment, the position of the intramolecular RNA extension is in the gRNA core. In yet another embodiment, the position of the intramolecular RNA extension is anywhere within the guide RNA molecule except within the protospacer sequence or at a position that would disrupt the protospacer sequence.
In one embodiment, the intramolecular RNA extension is inserted downstream of the 3' end of the protospacer sequence. In another embodiment, the intramolecular RNA extension is inserted at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, or at least 25 nucleotides downstream of the 3' end of the protospacer sequence.
In other embodiments, the intramolecular RNA extension is inserted into the gRNA core, which refers to the portion of the guide RNA that corresponds to or comprises the tracrRNA, which binds to and/or interacts with the Cas9 protein or an equivalent thereof (i.e., a different napDNAbp). Preferably, insertion of the intramolecular RNA extension does not disrupt, or only minimally disrupts, the interaction between the tracrRNA portion and the napDNAbp.
The length of the RNA extension (which includes at least the RT template and the primer binding site, see e.g. fig. 3) may be any useful length. In various embodiments, the RNA extension is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
The RT template sequence may also be of any suitable length. For example, the RT template sequence may be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
In other embodiments, the reverse transcription primer binding site sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
In other embodiments, the optional linker or spacer sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
In certain embodiments, the RT template sequence encodes a single stranded DNA molecule that is homologous to the non-target strand (and thus complementary to the corresponding site of the target strand) but includes one or more nucleotide changes. The at least one nucleotide change may include one or more single base nucleotide changes, one or more deletions, and one or more insertions.
The single-stranded DNA product synthesized from the RT template sequence is homologous to the non-target strand and contains one or more nucleotide changes. This single-stranded DNA product hybridizes in equilibrium with the complementary target strand sequence, displacing the homologous endogenous strand sequence. In some embodiments, the displaced endogenous strand may be referred to as a 5' endogenous DNA flap species. This 5' endogenous DNA flap species can be removed by a 5' flap endonuclease (e.g., FEN1), and the single-stranded DNA product, now hybridized to the endogenous target strand, can be ligated, thereby creating a mismatch between the endogenous sequence and the newly synthesized strand. The mismatch can be resolved by the cell's innate DNA repair and/or replication processes.
In various embodiments, the nucleotide sequence of the RT template sequence corresponds to the nucleotide sequence of a non-target strand that is displaced as a 5' flap species and overlaps with the site to be edited.
In various embodiments of the extended guide RNA, the reverse transcription template sequence may encode a single-stranded DNA flap that is complementary to the endogenous DNA sequence adjacent to the nick site, wherein the single-stranded DNA flap comprises the desired nucleotide change. The single-stranded DNA flap can displace the endogenous single-stranded DNA at the nick site. The endogenous single-stranded DNA displaced at the nick site may have a 5' end and form an endogenous flap, which may be excised by the cell. In various embodiments, excision of the 5' endogenous flap can help drive product formation, since removing the 5' endogenous flap facilitates hybridization of the single-stranded 3' DNA flap to the corresponding complementary DNA strand, as well as incorporation or assimilation into the target DNA of the desired nucleotide change carried by the single-stranded 3' DNA flap.
In various embodiments of the extended guide RNA, cellular repair of the single stranded DNA flap results in the installation of the desired nucleotide change, thereby forming the desired product.
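By way of non-limiting illustration, the relationship described above between the nicked strand, the primer binding site, and the RT template can be sketched in code. This is a sketch under stated assumptions, not the design method of Example 2: the function and parameter names are hypothetical, the nick position is supplied by the caller rather than derived from a particular napDNAbp, and the 13-nt default primer binding site length is an arbitrary illustrative value.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def design_3prime_extension(nicked_strand: str, nick_index: int,
                            edited_downstream: str, pbs_len: int = 13) -> dict:
    """Sketch of a 3' extension (RT template followed by primer binding site).

    nicked_strand     : non-target (nicked) strand, 5'->3'
    nick_index        : the nick lies between nicked_strand[nick_index - 1]
                        and nicked_strand[nick_index]
    edited_downstream : desired sequence of the new 3' flap (edit included),
                        5'->3', starting immediately after the nick
    pbs_len           : hypothetical primer binding site length
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence 5' of the nick for the PBS")
    # Free 3' end created by the nick; it anneals to the PBS and primes RT.
    primer = nicked_strand[nick_index - pbs_len:nick_index].upper()
    pbs = reverse_complement(primer)
    # RT copies the template into DNA, so the template is the reverse
    # complement of the flap that should be synthesized.
    rt_template = reverse_complement(edited_downstream)
    extension_dna = rt_template + pbs  # 5'->3', PBS at the 3' end
    to_rna = lambda s: s.replace("T", "U")
    return {"pbs": to_rna(pbs),
            "rt_template": to_rna(rt_template),
            "extension": to_rna(extension_dna)}


# Toy input chosen only so the result is easy to check by eye.
print(design_3prime_extension("AAAACCCCGGGGTTTT", 8, "GAGG", pbs_len=4))
```

A useful sanity check on any such design is that the reverse complement of the extension (read as DNA) equals the primer followed by the desired flap sequence.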
In other embodiments, the desired nucleotide change is installed in an editing window that is: between about -5 and +5 of the nick site, or between about -10 and +10 of the nick site, or between about -20 and +20 of the nick site, or between about -30 and +30 of the nick site, or between about -40 and +40 of the nick site, or between about -50 and +50 of the nick site, or between about -60 and +60 of the nick site, or between about -70 and +70 of the nick site, or between about -80 and +80 of the nick site, or between about -90 and +90 of the nick site, or between about -100 and +100 of the nick site, or between about -200 and +200 of the nick site.
In other embodiments, the desired nucleotide change is installed in an editing window that extends from about +1 to about +N from the nick site, where N is any integer from 2 to 125 (i.e., about +1 to +2, +1 to +3, +1 to +4, +1 to +5, and so on in single-nucleotide increments through +1 to +124 or +1 to +125).
In other embodiments, the desired nucleotide change is installed in an editing window that is: about +1 to +2 from the nick site, or about +1 to +5, +1 to +10, +1 to +15, +1 to +20, +1 to +25, +1 to +30, +1 to +35, +1 to +40, +1 to +45, +1 to +50, +1 to +55, +1 to +100, +1 to +105, +1 to +110, +1 to +115, +1 to +120, +1 to +125, +1 to +130, +1 to +135, +1 to +140, +1 to +145, +1 to +150, +1 to +155, +1 to +160, +1 to +165, +1 to +170, +1 to +175, +1 to +180, +1 to +185, +1 to +190, +1 to +195, or +1 to +200 from the nick site.
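By way of non-limiting illustration, whether a planned edit falls inside one of the editing windows recited above can be checked as follows. This is a minimal sketch: the function names are hypothetical, and the coordinate convention (positions counted from the nick, with +1 the first nucleotide on the 3' side and no position 0) is an assumption made for the example.

```python
from typing import Optional, Sequence


def in_edit_window(position: int, start: int, end: int) -> bool:
    """True if a nick-relative position lies within [start, end]."""
    if position == 0:
        raise ValueError("position 0 is not used in this convention")
    return start <= position <= end


def smallest_window_end(position: int,
                        ends: Sequence[int] = (2, 5, 10, 15, 20, 25, 30)
                        ) -> Optional[int]:
    """Smallest N from `ends` such that the +1 to +N window holds the edit."""
    for end in sorted(ends):
        if in_edit_window(position, 1, end):
            return end
    return None


print(in_edit_window(-3, -5, 5))   # symmetric window around the nick
print(smallest_window_end(7))      # +1 to +N series
```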
In various aspects, the extended guide RNA is a modified version of a guide RNA. Guide RNAs may be naturally occurring, expressed from an encoding nucleic acid, or chemically synthesized. Methods for obtaining or otherwise synthesizing guide RNAs, and for determining the appropriate sequence of the guide RNA, including the protospacer sequence that interacts and hybridizes with the target strand of the genomic target site of interest, are well known in the art.
In various embodiments, the particular design aspects of a guide RNA sequence depend upon the nucleotide sequence of the genomic target site of interest (i.e., the desired site to be edited) and the type of napDNAbp (e.g., Cas9 protein) present in the guided editing system used in the methods and compositions described herein, among other factors, such as PAM sequence location, percent G/C content in the target sequence, the degree of microhomology regions, secondary structure, and the like.
In general, a guide sequence is any polynucleotide sequence that has sufficient complementarity to a target polynucleotide sequence to hybridize to the target sequence and direct sequence-specific binding of a napDNAbp (e.g., Cas9, a Cas9 homolog, or a Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined using any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler transform (e.g., the Burrows-Wheeler Aligner), ClustalW, ClustalX, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, the guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
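By way of non-limiting illustration, a degree of complementarity of the kind described above can be computed for a guide and a candidate target strand of equal length. This is a simplified sketch rather than any of the alignment algorithms named above: it performs an ungapped, position-by-position comparison, counts only Watson-Crick pairs (no G:U wobble), and uses a hypothetical function name.

```python
# RNA base expected opposite each DNA base of the target strand.
_RNA_PARTNER_OF_DNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna: str, target_strand_dna: str) -> float:
    """Ungapped percent complementarity between a guide and a target strand.

    Both sequences are written 5'->3'. The target strand is read in
    reverse so that the two strands are compared antiparallel.
    """
    guide = guide_rna.upper().replace("T", "U")
    target = target_strand_dna.upper()[::-1]
    if len(guide) != len(target):
        raise ValueError("this sketch compares equal-length sequences only")
    if not guide:
        raise ValueError("empty sequence")
    paired = sum(1 for g, t in zip(guide, target)
                 if _RNA_PARTNER_OF_DNA.get(t) == g)
    return 100.0 * paired / len(guide)


print(percent_complementarity("GGAU", "ATCC"))  # fully complementary toy pair
print(percent_complementarity("GGAU", "ATCG"))  # one mismatched position
```

For sequences of unequal length or with insertions and deletions, a gapped aligner such as those listed above would be needed instead.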
In some embodiments, the guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a guide editor to a target sequence may be assessed by any suitable assay. For example, the components of a guide editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, e.g., by transfection with vectors encoding the guide editor components disclosed herein, followed by an assessment of preferential cleavage within the target sequence, e.g., by the Surveyor assay described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in vitro by providing the target sequence and the components of a guide editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or the rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible and will occur to those of skill in the art.
The guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within the genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for Streptococcus pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 173), where NNNNNNNNNNNNXGG (SEQ ID NO: 174) (N is A, G, T, or C; X can be any base) has a single occurrence in the genome. A unique target sequence in a genome may include a Streptococcus pyogenes Cas9 target site of the form mmmmmmmmmmmmnnnnnnnnnnnnxgg (SEQ ID NO: 175), where nnnnnnnnnnxgg (SEQ ID NO: 176) (N is A, G, T, or C; X can be any base) has a single occurrence in the genome. For Streptococcus thermophilus CRISPR1 Cas9, a unique target sequence in a genome may include a Cas9 target site of the form mmmmmmmmnnnnnnnnnnxxagaaw (SEQ ID NO: 177), where nnnnnnnnnnxxagaaw (SEQ ID NO: 178) (N is A, G, T, or C; X can be any base; W is A or T) has a single occurrence in the genome. A unique target sequence in a genome may include a Streptococcus thermophilus CRISPR1 Cas9 target site of the form mmmmmmmmmmnnnnnnnnnnnnxxagaaaw (SEQ ID NO: 179), where nnnnnnnnnnxxaaw (SEQ ID NO: 180) (N is A, G, T, or C; X can be any base; W is A or T) has a single occurrence in the genome. For Streptococcus pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG (SEQ ID NO: 181), where NNNNNNNNNNNNNNXGGXG (SEQ ID NO: 182) (N is A, G, T, or C; X can be any base) has a single occurrence in the genome. A unique target sequence in a genome may include a Streptococcus pyogenes Cas9 target site of the form mmmmmmmmmmmmnnnnnnnnnnnnxggxg (SEQ ID NO: 183), where nnnnnnnnnnxggxg (SEQ ID NO: 184) (N is A, G, T, or C; X can be any base) has a single occurrence in the genome. In each of these sequences, "M" may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
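By way of non-limiting illustration, the notion of a target site whose PAM-proximal portion occurs only once can be sketched as a simple scan. This is a sketch under stated assumptions: the function name is hypothetical, only the forward strand of the supplied sequence is scanned, the PAM-proximal portion is taken to be 12 nucleotides followed by any base and GG, and the toy "genome" is invented for the example.

```python
import re
from collections import Counter


def unique_pam_proximal_sites(genome: str, seed_len: int = 12) -> list:
    """Return (start, seed) for seeds followed by XGG that occur only once.

    Only the given strand is scanned; a fuller search would also scan
    the reverse complement.
    """
    genome = genome.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % seed_len)
    hits = [(m.start(), m.group(1)) for m in pattern.finditer(genome)]
    counts = Counter(seed for _, seed in hits)
    return [(pos, seed) for pos, seed in hits if counts[seed] == 1]


# Toy sequence: the first seed appears twice, the last one only once.
toy_genome = ("ACGTACGTACGTAGG" "TTTT"
              "ACGTACGTACGTCGG" "CCCC"
              "GATTACAGATTATGG")
print(unique_pam_proximal_sites(toy_genome))
```

Consistent with the passage above, the nucleotides further from the PAM (the "M" positions) are ignored when deciding uniqueness.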
In some embodiments, the guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online web server RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, which uses the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and P. A. Carr and G. M. Church, 2009, Nature Biotechnology 27(12): 1151-62). Further algorithms may be found in U.S. application Ser. No. 61/836,080 (Broad Reference BI-2013/004A), incorporated herein by reference.
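By way of non-limiting illustration, candidate guide sequences can be ranked by a crude measure of self-pairing. The sketch below is not mFold or RNAfold and does not compute free energies: it substitutes a much simpler Nussinov-style dynamic program that only maximizes the number of nested base pairs. The function name and the 3-nucleotide minimum loop are assumptions made for the example.

```python
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
          ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style recursion).

    min_loop is the minimum number of unpaired nucleotides required
    between two paired positions.
    """
    seq = rna.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in _PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # small hairpin-like toy sequence
print(max_base_pairs("AAAAAAAA"))   # cannot pair with itself
```

Under this heuristic a lower count suggests less potential secondary structure; a thermodynamic folding program would still be needed for a free-energy estimate.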
In general, a tracr mate sequence includes any sequence that has sufficient complementarity to a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, the degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and the tracr sequence along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or the tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and the tracr mate sequence along the length of the shorter of the two, when optimally aligned, is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or more. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and the tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure (e.g., a hairpin). Preferred loop-forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (e.g., AAA) and an additional nucleotide (e.g., C or G). Examples of loop-forming sequences include CAAA and AAAG. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two hairpins. In preferred embodiments, the transcript has 2, 3, 4, or 5 hairpins. In a further embodiment of the invention, the transcript has at most 5 hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably, this is a poly-T sequence, e.g., six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5' to 3'), where "N" represents a base of the guide sequence, the first block of lowercase letters represents the tracr mate sequence, the second block of lowercase letters represents the tracr sequence, and the final poly-T sequence represents the transcription terminator:
(1)NNNNNNNNGTTTTTGTACTCTCAAGATTTAGAAATAAATCTTGCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTCGTTATTTAATTTTTT(SEQ ID NO:185);
(2)NNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTCGTTATTTAATTTTTT(SEQ ID NO:186);
(3)NNNNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTTT(SEQ ID NO:187);
(4)NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTT(SEQ ID NO:188);
(5) NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGTTTTTTT (SEQ ID NO: 189); and
(6)NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCATTTTTTTT(SEQ ID NO:190).
In some embodiments, sequences (1) to (3) are used in combination with Cas9 from Streptococcus thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from Streptococcus pyogenes. In some embodiments, the tracr sequence is a separate transcript from the transcript comprising the tracr mate sequence.
It will be apparent to those of skill in the art that, in order to target any fusion protein comprising a Cas9 domain and a single-stranded DNA binding protein as disclosed herein to a target site, e.g., a site comprising a point mutation to be edited, it is typically necessary to co-express the fusion protein together with a guide RNA (e.g., an sgRNA). As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework that allows Cas9 binding, and a guide sequence that confers sequence specificity on the Cas9:nucleic acid editing enzyme/domain fusion protein.
In some embodiments, the guide RNA comprises the structure 5'-[guide sequence]-GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU-3' (SEQ ID NO: 191), wherein the guide sequence comprises a sequence that is complementary to the target sequence. The guide sequence is typically 20 nucleotides long. The sequences of suitable guide RNAs for targeting Cas9:nucleic acid editing enzyme/domain fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the present disclosure. Such suitable guide RNA sequences typically comprise a guide sequence that is complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein. Other guide sequences are well known in the art and may be used with the guide editors used in the methods and compositions described herein.
In some embodiments, the PEgRNA comprises three main component elements ordered in the 5' to 3' direction, namely: a spacer, a gRNA core, and an extension arm at the 3' end. The extension arm can be further divided, in the 5' to 3' direction, into the following structural elements: a homology arm, an editing template, and a primer binding site. In addition, the PEgRNA can comprise an optional 3' modification region (e1) and an optional 5' modification region (e2). Still further, the PEgRNA may include a transcription termination signal (not depicted) at the 3' end of the PEgRNA. These structural elements are further defined herein. This description of the PEgRNA structure is not meant to be limiting, and encompasses variations in the arrangement of the elements. For example, the optional sequence modification regions (e1) and (e2) may be located within or between any of the other regions shown, and are not limited to being located at the 3' and 5' ends.
In some embodiments, the PEgRNAs contemplated herein can be designed according to the methods defined in Example 2, and comprise the same structural elements, in the same 5' to 3' arrangement, as described above.
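By way of non-limiting illustration, the 5' to 3' arrangement of PEgRNA elements described above can be represented as a small data structure. This is a sketch only: the class and field names are hypothetical, the placement of the optional regions (e1) and (e2) at the two ends is just one of the arrangements the description allows, and the toy sequences are placeholders rather than any sequence of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PEgRNA:
    """Toy container for PEgRNA elements, all written 5'->3'."""
    spacer: str
    grna_core: str
    homology_arm: str
    edit_template: str
    primer_binding_site: str
    mod_5prime: str = ""   # optional 5' modification region (e2)
    mod_3prime: str = ""   # optional 3' modification region (e1)
    terminator: str = ""   # optional transcription termination signal

    def extension_arm(self) -> str:
        # Extension arm order, 5'->3': homology arm, edit template, PBS.
        return self.homology_arm + self.edit_template + self.primer_binding_site

    def sequence(self) -> str:
        # Main order, 5'->3': spacer, gRNA core, extension arm.
        return (self.mod_5prime + self.spacer + self.grna_core
                + self.extension_arm() + self.mod_3prime + self.terminator)


# Placeholder elements chosen only to make the ordering visible.
toy = PEgRNA(spacer="SSSS", grna_core="CCCC", homology_arm="HH",
             edit_template="E", primer_binding_site="PPP", terminator="UUUUUU")
print(toy.extension_arm())
print(toy.sequence())
```

Keeping the elements separate until the final concatenation makes it straightforward to vary one element, such as the primer binding site length, while holding the others fixed.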
PEgRNA improvements
The PEgRNA may also include additional design improvements that modify the properties and/or characteristics of the PEgRNA, thereby improving the efficacy of guided editing. In various embodiments, these improvements may belong to one or more of a number of different categories, including but not limited to: (1) designs that enable efficient expression of functional PEgRNAs from non-polymerase III (pol III) promoters, which would allow the expression of longer PEgRNAs without burdensome sequence requirements; (2) improvements to the core, Cas9-binding PEgRNA scaffold, which could improve efficacy; (3) modifications to the PEgRNA that improve RT processivity, enabling the insertion of longer sequences at targeted genomic loci; and (4) addition of RNA motifs to the 5' or 3' terminus of the PEgRNA that improve PEgRNA stability, enhance RT processivity, prevent misfolding of the PEgRNA, or recruit additional factors important for genome editing.
In one embodiment, the PEgRNA can be designed with pol III promoters to improve expression of longer PEgRNAs with larger extension arms. sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expressing short RNAs that are retained in the nucleus. However, pol III is not highly processive and cannot express RNAs longer than a few hundred nucleotides at the levels required for efficient genome editing. In addition, pol III can stall or terminate at stretches of U's, which may limit the sequence diversity that can be inserted using a PEgRNA. Other promoters that recruit polymerase II (e.g., pCMV) or polymerase I (e.g., the U1 snRNA promoter) have been examined for their ability to express longer sgRNAs. However, these promoters are typically partially transcribed, which results in extra sequence 5' of the spacer in the expressed PEgRNA; this has been shown to markedly reduce Cas9:sgRNA activity in a site-dependent manner. Furthermore, while pol III-transcribed PEgRNAs can simply terminate in a run of 6-7 U's, PEgRNAs transcribed from pol II or pol I require a different termination signal. Such signals often also result in polyadenylation, which leads to undesired transport of the PEgRNA out of the nucleus. Similarly, RNAs expressed from pol II promoters (e.g., pCMV) are typically 5'-capped, which also results in their nuclear export.
Previously, Rinn and colleagues screened a variety of expression platforms for the production of long non-coding RNA (lncRNA)-tagged sgRNAs 183. These platforms include RNAs expressed from pCMV that terminate in the ENE element from the human MALAT1 ncRNA 184, the PAN ENE element from KSHV 185, or the 3' box from U1 snRNA 186. Notably, the MALAT1 ncRNA and the PAN ENE form triple helices that protect the polyA tail 184,187. These constructs may also enhance RNA stability. Such expression systems, which are capable of expressing longer PEgRNAs, are also contemplated.
In addition, a series of methods has been designed to cleave off the portion of the pol II promoter that would be transcribed as part of the PEgRNA, by adding either a self-cleaving ribozyme (e.g., a hammerhead 188, pistol 189, hatchet 189, hairpin 190, VS 191, twister 192, or twister sister 192 ribozyme) or another self-cleaving element to process the transcribed guide, or a hairpin that is recognized by Csy4 193 and likewise leads to processing of the guide. Furthermore, it is hypothesized that incorporation of multiple ENE motifs could improve PEgRNA expression and stability, as previously demonstrated for the KSHV PAN RNA and element 185. Circularizing the PEgRNA in the form of a circular intronic RNA (ciRNA) is also anticipated to enhance RNA expression and stability, as well as nuclear localization 194.
In different embodiments, PEgRNA can include various of the above elements, as exemplified by the following sequences.
Non-limiting example 1: PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and MALAT1 ENE
Non-limiting example 2: PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and PAN ENE
Non-limiting example 3: PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3xPAN ENE
Non-limiting example 4: PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3' box
Non-limiting example 5: PEgRNA expression platform consisting of pU1, Csy4 hairpin, the PEgRNA, and 3' box
In various other embodiments, the PEgRNA can be improved by introducing modifications to the scaffold or core sequence. This can be done by known means.
The core, Cas9-binding PEgRNA scaffold can likely be improved to enhance PE activity, and several such approaches have already been demonstrated. For example, the first pairing element of the scaffold (P1) contains a GTTTT-AAAAAAC (SEQ ID NO: 68) pairing element. Such a run of T's has been shown to cause pol III pausing and premature termination of the RNA transcript. Rational mutation of one of the T-A pairs in this portion of P1 to a G-C pair has been shown to enhance sgRNA activity, suggesting that this approach would also be feasible for PEgRNAs 195. In addition, increasing the length of P1 has been shown to enhance sgRNA folding and increase activity 195, suggesting another means of improving PEgRNA activity. Example improvements to the core can include:
PEgRNA containing 6nt extension to P1
PEgRNA containing T-A to G-C mutation in P1
In various other embodiments, the PEgRNA can be improved by introducing modifications into the editing template region. As the size of the insertion templated by the PEgRNA increases, the PEgRNA becomes more likely to be degraded by endonucleases, to undergo spontaneous hydrolysis, or to fold into secondary structures that cannot be reverse transcribed by the RT or that disrupt folding of the PEgRNA scaffold and subsequent Cas9-RT binding. Accordingly, modification of the PEgRNA template may be necessary to effect large insertions, such as the insertion of whole genes. Some strategies for doing so include incorporating modified nucleotides into a synthetic or semi-synthetic PEgRNA, rendering the RNA more resistant to degradation or hydrolysis or less likely to adopt inhibitory secondary structures 196. Such modifications could include 8-aza-7-deazaguanosine, which would reduce RNA secondary structure in G-rich sequences; locked nucleic acids (LNA), which reduce degradation and enhance certain kinds of RNA secondary structure; and 2'-O-methyl, 2'-fluoro, or 2'-O-methoxyethoxy modifications, which enhance RNA stability. Such modifications could also be included elsewhere in the PEgRNA to enhance stability and activity. Alternatively or additionally, the template of the PEgRNA could be designed so that it both encodes the desired protein product and is more likely to adopt simple secondary structures that can be unfolded by the RT. Such simple structures would act as a thermodynamic sink, making it less likely that more complicated structures that would prevent reverse transcription would form. Finally, the template could also be split into two separate PEgRNAs. In such a design, a PE would be used to initiate transcription and also to recruit a separate template RNA to the targeted site, via an RNA-binding protein fused to Cas9 or an RNA recognition element (e.g., an MS2 aptamer) on the PEgRNA itself. The RT could either bind directly to this separate template RNA or initiate reverse transcription on the original PEgRNA before switching to the second template. Such an approach could enable long insertions both by preventing misfolding of the PEgRNA upon addition of the long template and by not requiring Cas9 to dissociate from the genome for long insertions to occur, a requirement that may be inhibiting PE-based long insertions.
In other embodiments, the PEgRNA can be improved by introducing additional RNA motifs at the 5' and 3' termini of the PEgRNA, or even at positions in between (e.g., in the gRNA core region or the spacer). Several such motifs, such as the PAN ENE from KSHV and the ENE from MALAT1, were discussed above as possible means of terminating expression of longer PEgRNAs from non-pol III promoters. These elements form RNA triple helices that engulf the polyA tail, resulting in their retention within the nucleus 184,187. In addition, by forming complex structures at the 3' terminus of the PEgRNA that occlude the terminal nucleotide, these structures would likely also help prevent exonuclease-mediated degradation of the PEgRNA.
Other structural elements inserted at the 3' terminus could also enhance RNA stability, albeit without enabling termination from non-pol III promoters. Such motifs could include hairpins or RNA quadruplexes that occlude the 3' terminus 197, or self-cleaving ribozymes (e.g., HDV) that result in the formation of a 2'-3'-cyclic phosphate at the 3' terminus and may also make the PEgRNA less likely to be degraded by exonucleases 198. Inducing the PEgRNA to cyclize via incomplete splicing, to form a ciRNA, could also increase PEgRNA stability and result in retention of the PEgRNA within the nucleus 194.
Additional RNA motifs could also improve RT processivity or enhance PEgRNA activity by enhancing binding of the RT to the DNA-RNA duplex. Addition of the native sequence bound by the RT in its cognate retroviral genome could enhance RT activity 199. This could include the native primer binding site (PBS), the polypurine tract (PPT), or kissing loops involved in retroviral genome dimerization and initiation of transcription 199.
Addition of dimerization motifs (e.g., kissing loops or a GNRA tetraloop/tetraloop receptor pair 200) at the 5' and 3' termini of the PEgRNA could also result in effective circularization of the PEgRNA and improve its stability. In addition, these motifs are expected to physically separate the PEgRNA spacer from the primer binding site, preventing occlusion of the spacer that would hinder PE activity. Short 5' or 3' extensions of the PEgRNA that form a small toehold hairpin in the spacer region or along the primer binding site could also compete favorably against the annealing of intracomplementary regions along the length of the PEgRNA, such as the interaction that can occur between the spacer and the primer binding site. Finally, kissing loops could also be used to recruit other template RNAs to the genomic site and enable swapping of RT activity from one RNA to the other. A number of secondary RNA structures that can be engineered into any region of the PEgRNA, including the terminal portions of the extension arm (i.e., e1 and e2), are shown.
Example improvements include, but are not limited to:
PEgRNA-HDV fusion
PEgRNA-MMLV kissing loop
PEgRNA-VS ribozyme kissing loop
PEgRNA-GNRA tetraloop/tetraloop receptor
PEgRNA template-switched secondary RNA-HDV fusion
PEgRNA scaffolds could be further improved via directed evolution, in a manner analogous to how SpCas9 and guide editors (PE) have been improved. Directed evolution could enhance recognition of the PEgRNA by Cas9 or evolved Cas9 variants. Furthermore, different PEgRNA scaffold sequences are likely to be optimal at different genomic loci, enhancing PE activity at the site in question, reducing off-target activity, or both. Finally, evolution of PEgRNA scaffolds to which other RNA motifs have been added would almost certainly improve the activity of the fused PEgRNA relative to the unevolved fusion RNA. For example, evolution of allosteric ribozymes composed of a c-di-GMP-I aptamer and a hammerhead ribozyme led to dramatically improved activity 202, suggesting that evolution would also improve the activity of hammerhead-PEgRNA fusions. In addition, while Cas9 currently does not generally tolerate 5' extension of the sgRNA, directed evolution would likely generate mutations that mitigate this intolerance, allowing additional RNA motifs to be used.
The present disclosure contemplates any such means to further enhance the efficacy of the guided editing systems used in the methods and compositions disclosed herein.
In various embodiments, it may be advantageous to limit the occurrence of runs of consecutive T's in the extension arm, because a consecutive series of T's may limit the ability of the PEgRNA to be transcribed. For example, strings of at least three consecutive T's, at least 4 consecutive T's, at least 5 consecutive T's, at least 6 consecutive T's, at least 7 consecutive T's, at least 8 consecutive T's, at least 9 consecutive T's, at least 10 consecutive T's, at least 11 consecutive T's, at least 12 consecutive T's, at least 13 consecutive T's, at least 14 consecutive T's, or at least 15 consecutive T's should be avoided or removed from the final designed sequence. In one embodiment, unwanted strings of consecutive T's in the PEgRNA extension arm can be avoided by avoiding target sites that are rich in consecutive A:T nucleobase pairs.
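As a minimal sketch of how such runs might be screened during design, the following Python function reports every run of consecutive T's at or above a chosen threshold. The function name, the default threshold of three, and the toy sequence are assumptions made for illustration; the input is assumed to be the DNA sequence encoding the extension arm, with any U's treated as T's.

```python
import re
from typing import List, Tuple


def find_t_runs(extension_arm_dna: str, min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start_index, run_length) for each run of >= min_run consecutive T's.

    Indices are 0-based. U is treated as T so that RNA-style input also works.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    seq = extension_arm_dna.upper().replace("U", "T")
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq)]


# Toy sequence (hypothetical): one run of four T's and one run of two T's.
runs = find_t_runs("ACGTTTTACGTTAC", min_run=3)
print(runs)
```

A design pipeline could reject or redesign any extension arm for which the returned list is non-empty, raising the threshold where shorter runs are considered acceptable.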
PEgRNA for evading MMR
The present disclosure also provides novel pegRNAs for guided editing. As described above, guided editing using a pegRNA having a DNA synthesis template that comprises three or more consecutive nucleotide mismatches relative to the target site sequence can evade correction by the MMR pathway, resulting in an increase in guided editing efficiency and/or a decrease in the frequency of indel formation compared to the introduction of a single nucleotide mismatch by guided editing. Thus, the present disclosure provides pegRNAs useful for introducing modifications into a target nucleic acid with increased guided editing efficiency and/or reduced indel frequency compared to a corresponding control pegRNA that does not contain three or more consecutive nucleotide mismatches relative to the target site sequence.
The pegRNAs provided by the present disclosure can be used to edit nucleic acid molecules by guided editing while improving guided editing efficiency and/or reducing indel formation. Without wishing to be bound by theory, the pegRNAs provided in the present disclosure can avoid or reduce the effect of cellular MMR correction on mismatches introduced at the target site by the nucleotide changes made through guided editing. In some embodiments, the extension arm of the pegRNA comprises three or more consecutive nucleotide mismatches relative to a target site on the nucleic acid molecule. In some embodiments, the DNA synthesis template of the pegRNA comprises three or more consecutive nucleotide mismatches relative to the target site on the nucleic acid molecule. In some embodiments, at least one of the three or more consecutive nucleotide mismatches introduces a silent mutation. In some embodiments, at least one of the consecutive nucleotide mismatches results in a change in the amino acid sequence of a protein expressed from the target nucleic acid molecule, while at least one of the remaining nucleotide mismatches is a silent mutation. The silent mutation may be introduced into a coding region or a non-coding region of the target nucleic acid molecule. When silent mutations are introduced into a coding region, they may introduce into the nucleic acid molecule one or more alternative codons that encode the same amino acid as the unedited nucleic acid molecule. When silent mutations are introduced into a non-coding region, they may be located in a region of the nucleic acid molecule that does not affect splicing, gene regulation, RNA lifetime, or other biological properties of the target site on the nucleic acid molecule.
Any number of three or more consecutive nucleotide mismatches may be incorporated into the extension arm of the pegRNA described herein to achieve the benefit of avoiding or reducing the effects of cellular MMR pathway correction. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to the target site on the nucleic acid molecule.
Any number of three or more consecutive nucleotide mismatches may be incorporated into the extension arm of the pegRNA described herein to achieve the benefit of escaping correction of the MMR pathway, thereby improving editing efficiency and/or reducing unintended indels. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises three consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 3-5 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 6-10 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 11-20 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 20-25 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing. 
In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule edited by the guided editing.
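The consecutive-mismatch criterion described above can be checked with a short Python sketch. It assumes a substitution-only edit, so that the endogenous target sequence and the intended edited sequence are the same length and are given on the same strand; the function names and toy sequences are hypothetical.

```python
def longest_mismatch_run(endogenous: str, edited: str) -> int:
    """Length of the longest run of consecutive mismatched positions.

    Both sequences must be the same length (substitution-only edits).
    """
    if len(endogenous) != len(edited):
        raise ValueError("sequences must be the same length")
    longest = current = 0
    for ref_base, new_base in zip(endogenous.upper(), edited.upper()):
        current = current + 1 if ref_base != new_base else 0
        longest = max(longest, current)
    return longest


def has_consecutive_mismatches(endogenous: str, edited: str, threshold: int = 3) -> bool:
    """True if the edit contains at least `threshold` consecutive mismatches."""
    return longest_mismatch_run(endogenous, edited) >= threshold


# Toy sequences (hypothetical): positions 2-4 differ, giving a run of three.
print(longest_mismatch_run("ACGTACGT", "ACCATCGT"))
print(has_consecutive_mismatches("ACGTACGT", "ACGAACGT"))
```

The default threshold of three mirrors the "three or more consecutive nucleotide mismatches" language; it can be raised to test the narrower ranges recited above.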
Kits
The compositions of the present disclosure may be assembled into kits. In some embodiments, the kit comprises nucleic acid vectors for expressing a guidance editor as described herein and an MMR inhibitor (e.g., without limitation, a dominant negative variant of MLH1). In other embodiments, the kit further comprises suitable guide nucleotide sequences (e.g., PEgRNAs and second-site gRNAs) or nucleic acid vectors for expressing such guide nucleotide sequences, to target the Cas9 protein or guidance editor to a desired target sequence.
The kits described herein can include one or more containers containing components for performing the methods described herein and optionally instructions for use. Any of the kits described herein may also include components necessary to perform the assay methods. Where applicable, the components of the kit may be provided in liquid form (e.g., solution) or in solid form (e.g., dry powder). In some cases, some components may be reconstituted or otherwise processable (e.g., into an active form), for example, by the addition of a suitable solvent or other substance (e.g., water), which may or may not be provided with the kit.
In some embodiments, the kit may optionally include instructions and/or promotion for use of the components provided. As used herein, "instructions" can define a component of instruction and/or promotion, and typically involve written instructions on or associated with the packaging of the present disclosure. Instructions can also include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), internet, and/or web-based communications. The written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which can also reflect approval by the agency of manufacture, use, or sale for animal administration. As used herein, "promoted" includes all methods of doing business, including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity (including pharmaceutical sales), and any advertising or other promotional activity, including written, oral, and electronic communication of any form, associated with the present disclosure. Additionally, the kit may include other components depending on the specific application, as described herein.
The kit may comprise any one or more of the components described herein in one or more containers. The components may be prepared aseptically, packaged in syringes and shipped refrigerated. Alternatively, it may be stored in a vial or other container. The second container may have other components prepared aseptically. Alternatively, the kit may include the active agent pre-mixed and transported in a vial, tube or other container.
The kit may take a variety of forms, such as a blister pack, a shrink-wrapped pouch, a vacuum-sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box, or a bag. The kit may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kit can be sterilized using any appropriate sterilization technique, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art. The kit may also include other components, depending on the specific application, for example, containers, cell culture media, salts, buffers, reagents, syringes, needles, a fabric (e.g., gauze) for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, and the like. Some aspects of the present disclosure provide kits comprising a nucleic acid construct comprising nucleotide sequences encoding the various components of the guided editing system used in the methods and compositions described herein (e.g., including, but not limited to, the napDNAbp, the reverse transcriptase, the polymerase, fusion proteins (e.g., comprising a napDNAbp and a reverse transcriptase (or, more broadly, a polymerase)), extended guide RNAs, complexes comprising fusion proteins and extended guide RNAs, and accessory elements, such as a second-strand nicking component (e.g., a second-strand nicking gRNA) and a 5' endogenous DNA flap removal endonuclease that helps drive the guided editing process toward formation of the edited product). In some embodiments, the nucleotide sequences comprise a heterologous promoter (or more than one promoter) that drives expression of the guided editing system components.
Other aspects of the disclosure provide kits comprising one or more nucleic acid constructs encoding the various components of the guided editing systems used in the methods and compositions described herein, e.g., comprising nucleotide sequences encoding components of a guided editing system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of a component of the guided editing system.
Some aspects of the disclosure provide kits comprising a nucleic acid construct comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a reverse transcriptase, and (b) a heterologous promoter that drives expression of the sequence of (a).
Cells
Cells that may comprise any of the compositions described herein include prokaryotic cells and eukaryotic cells. The methods described herein are useful for delivering Cas9 proteins or guide editors and MMR inhibitors (e.g., dominant negative variants of MLH 1) to eukaryotic cells (e.g., mammalian cells such as human cells). In some embodiments, the cells are in vitro (e.g., cultured cells). In some embodiments, the cell is in vivo (e.g., in a subject such as a human subject). In some embodiments, the cells are ex vivo (e.g., isolated from a subject and can be administered back to the same or a different subject).
Mammalian cells of the present disclosure include human cells, primate cells (e.g., Vero cells), rat cells (e.g., GH3 cells, OC23 cells), and mouse cells (e.g., MC3T3 cells). A variety of human cell lines exist, including, but not limited to, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma), and Saos-2 (bone cancer) cells. In some embodiments, the rAAV vector is delivered into human embryonic kidney (HEK) cells (e.g., HEK 293 or HEK 293T cells). In some embodiments, the rAAV vector is delivered into a stem cell (e.g., a human stem cell), such as a pluripotent stem cell (e.g., a human pluripotent stem cell, including a human induced pluripotent stem cell (hiPSC)). A stem cell refers to a cell that is capable of dividing indefinitely in culture and of giving rise to specialized cells. A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism but is not alone capable of sustaining the development of a complete organism. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by forced expression of genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663-76, 2006, incorporated herein by reference). Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).
Other non-limiting examples of cell lines that may be used in accordance with the present disclosure include 293-T, 3T3, 4T1, 721, 9L, A-549, A172, A20, A253, A2780ADR, A2780cis, A431, ALC, B16, B35, BCP-1, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C2C12, C3H-10T1/2, C6/36, Cal-27, CGR8, CHO, CML T1, CMT, COR-L23/5010, COR-L23/CPR, COR-L23, COS-7, COV-434, CT26, D17, DH82, DU145, DuCaP, E14Tg2a, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, Hepa1c1c7, High Five cells, HL-60, HMEC, HT-29, HUVEC, JUVEC, Jurkat, JY cells, K562 cells, KCL22, KG1, Ku812, KYO1, LNCap, Ma-Mel 1, 2, 3...48, MC-38, MCF-10A, MCF-7, MDA-MB-231, MDA-MB-435, MDA-MB-468, MDCK II, MG63, MONO-MAC 6, MOR/0.2R, MRC5, MTD-1A, MyEnd, NALM-1, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NW-145, OPCN/OPCT, Peer, PNT-1A/PNT2, PTK2, Raji, RBL cells, RenCa, RIN-5F, RMA/RMAS, S2, Saos-2 cells, Sf21, Sf9, SiHa, SKBR3, SKOV-3, T-47D, T2, T84, THP1, U373, U87, U937, VCaP, WM39, WT-49, X63, YAC-1, and YAR cells.
Some aspects of the disclosure provide cells comprising any of the constructs disclosed herein. In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, the cell is transfected ex vivo. In some embodiments, the cell is transfected in vivo. In some embodiments, the cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc 1.1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK-52E, MRC, MEF, HepG2, HeLa B, HeLa T4, COS-1, COS-6, COS-M6-A, BS-C1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr-/-, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SKBr3, T2, T-47D, T84, THP1 cell lines, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic variants thereof.
Cell lines are available from a variety of sources known to those skilled in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently transfected with the components of a CRISPR system as described herein (e.g., by transient transfection of one or more vectors, or by transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells that contain the modification but lack any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more of the vectors described herein, or cell lines derived from such cells, are used to evaluate one or more test compounds.
Vectors
Some aspects of the disclosure relate to the use of recombinant viral vectors (e.g., adeno-associated viral vectors, adenoviral vectors, or herpes simplex viral vectors) to deliver the guidance editors and dominant negative variants of MLH1 described herein to cells. In the case of the split PE approach, the N-terminal portion of the PE fusion protein and the C-terminal portion of the PE fusion protein are delivered to the same cell by separate recombinant viral vectors (e.g., adeno-associated viral vectors, adenovirus vectors, or herpes simplex viral vectors) because the full-length Cas9 protein or guide editor exceeds packaging limitations of the various viral vectors, such as rAAV (about 4.9 kb).
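The packaging constraint that motivates the split PE approach can be sketched as a simple size check in Python. The 4,900 bp limit reflects the "about 4.9 kb" figure stated above for rAAV; the function names and the construct lengths in the example are hypothetical values chosen for illustration, not measured sizes of any actual construct.

```python
RAAV_PACKAGING_LIMIT_BP = 4900  # "about 4.9 kb" for rAAV, per the text above


def fits_in_vector(insert_bp: int, limit_bp: int = RAAV_PACKAGING_LIMIT_BP) -> bool:
    """True if an insert of the given length fits within the packaging limit."""
    return insert_bp <= limit_bp


def plan_delivery(full_length_bp: int, n_terminal_bp: int, c_terminal_bp: int,
                  limit_bp: int = RAAV_PACKAGING_LIMIT_BP) -> str:
    """Choose between a single vector and a split (N-terminal + C-terminal) design."""
    if fits_in_vector(full_length_bp, limit_bp):
        return "single vector"
    if fits_in_vector(n_terminal_bp, limit_bp) and fits_in_vector(c_terminal_bp, limit_bp):
        return "split across two vectors"
    return "does not fit even when split"


# Hypothetical lengths: a 6,300 bp construct split into 3,400 bp and 3,100 bp halves.
print(plan_delivery(6300, 3400, 3100))
```

In practice each half would also carry its own promoter and any elements needed to reconstitute the full-length protein, so the per-half lengths passed in should include those elements.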
In some embodiments, the vectors used herein may encode a PE fusion protein or any component thereof (e.g., the napDNAbp, linker, or polymerase) or a dominant negative mutant of MLH1. In addition, the vectors used herein may encode pegRNAs and/or accessory gRNAs for second-strand nicking. The vector may be capable of driving expression of one or more coding sequences in a cell. In some embodiments, the cell may be a prokaryotic cell, such as a bacterial cell. In some embodiments, the cell may be a eukaryotic cell, such as a yeast, plant, insect, or mammalian cell. In some embodiments, the eukaryotic cell may be a mammalian cell. In some embodiments, the eukaryotic cell may be a rodent cell. In some embodiments, the eukaryotic cell may be a human cell. Suitable promoters for driving expression in different types of cells are known in the art. In some embodiments, the promoter may be wild-type. In other embodiments, the promoter may be modified for more efficient or effective expression. In other embodiments, the promoter may be truncated yet retain its function. For example, the promoter may have a normal size or a reduced size suitable for proper packaging of the vector into a virus.
In some embodiments, promoters that can be used in the guidance editor vectors may be constitutive, inducible, or tissue-specific. In some embodiments, the promoter may be a constitutive promoter. Non-limiting exemplary constitutive promoters include the cytomegalovirus immediate early promoter (CMV), the simian virus (SV40) promoter, the adenovirus major late (MLP) promoter, the Rous sarcoma virus (RSV) promoter, the mouse mammary tumor virus (MMTV) promoter, the phosphoglycerate kinase (PGK) promoter, the elongation factor-alpha (EF1a) promoter, ubiquitin promoters, actin promoters, tubulin promoters, immunoglobulin promoters, functional fragments thereof, or a combination of any of the foregoing. In some embodiments, the promoter may be a CMV promoter. In some embodiments, the promoter may be a truncated CMV promoter. In other embodiments, the promoter may be an EF1a promoter. In some embodiments, the promoter may be an inducible promoter. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be one that has a low basal (non-induced) expression level, such as Promoter (Clontech). In some embodiments, the promoter may be a tissue-specific promoter. In some embodiments, the tissue-specific promoter is expressed exclusively or predominantly in liver tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.
In some embodiments, the guided editor vector and the MLH1 dominant negative mutant vector (e.g., including any vector encoding a guided editor fusion protein and/or pegRNA and/or an auxiliary second-strand nicking gRNA) may comprise an inducible promoter so that expression begins only after the vector is delivered to the target cell. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be a promoter with a low basal (non-induced) expression level, e.g., Promoter (Clontech).
In other embodiments, the guided editor vector (e.g., including any vector encoding a guided editor fusion protein and/or pegRNA and/or an auxiliary second-strand nicking gRNA) and the MLH1 dominant negative mutant vector (e.g., any vector encoding an MLH1 dominant negative mutant as described herein) may comprise a tissue-specific promoter so that expression begins only after the vector has been delivered to a particular tissue. Non-limiting exemplary tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.
In some embodiments, the nucleotide sequence encoding the pegRNA (or any guide RNA used in connection with guided editing) may be operably linked to at least one transcriptional or translational control sequence. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to at least one promoter. In some embodiments, the promoter is recognized by RNA polymerase III (Pol III). Non-limiting examples of Pol III promoters include U6, H1, and tRNA promoters. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human U6 promoter. In other embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human H1 promoter. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human tRNA promoter. In embodiments with more than one guide RNA, the promoters used to drive expression may be the same or different. In some embodiments, the nucleotides encoding the crRNA of the guide RNA and the nucleotides encoding the tracrRNA of the guide RNA may be provided on the same vector. In some embodiments, the nucleotides encoding the crRNA and the nucleotides encoding the tracrRNA may be driven by the same promoter. In some embodiments, the crRNA and tracrRNA may be transcribed into a single transcript. For example, the crRNA and tracrRNA may be processed from the single transcript to form a two-molecule guide RNA. Alternatively, the crRNA and tracrRNA may be transcribed into a single-molecule guide RNA.
In some embodiments, the nucleotide sequence encoding the guide RNA may be located on the same vector that comprises the nucleotide sequence encoding the PE fusion protein. In some embodiments, expression of the guide RNA and of the PE fusion protein may be driven by their own respective promoters. In some embodiments, expression of the guide RNA may be driven by the same promoter that drives expression of the PE fusion protein. In some embodiments, the guide RNA and the PE fusion protein transcript may be contained within a single transcript. For example, the guide RNA may be within an untranslated region (UTR) of the Cas9 protein transcript. In some embodiments, the guide RNA may be within the 5' UTR of the PE fusion protein transcript. In other embodiments, the guide RNA may be within the 3' UTR of the PE fusion protein transcript. In some embodiments, the intracellular half-life of the PE fusion protein transcript may be reduced by including the guide RNA within its 3' UTR and thereby shortening the length of its 3' UTR. In other embodiments, the guide RNA may be within an intron of the PE fusion protein transcript. In some embodiments, suitable splice sites may be added at the intron in which the guide RNA is located, such that the guide RNA is correctly spliced out of the transcript. In some embodiments, expression of the Cas9 protein and the guide RNA in close proximity on the same vector may promote more efficient formation of the CRISPR complex.
The vector system may comprise one vector, or two vectors, or three vectors, or four vectors, or five vectors, or more. In some embodiments, the vector system may comprise a single vector encoding the PE fusion protein, the pegRNA, and the MLH1 dominant negative mutant. In other embodiments, the vector system may comprise two vectors, one of which encodes the PE fusion protein and the pegRNA and the other of which encodes the MLH1 dominant negative mutant.
Some examples of materials that may be used as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose, and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, methyl cellulose, ethyl cellulose, microcrystalline cellulose, and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricants, such as magnesium stearate, sodium lauryl sulfate, and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerol, sorbitol, mannitol, and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethanol; (20) pH-buffered solutions; (21) polyesters, polycarbonates, and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, high-density lipoprotein, and low-density lipoprotein; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances used in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweeteners, flavoring agents, perfuming agents, preservatives, and antioxidants can also be present in the formulation. Terms such as "excipient," "carrier," and "pharmaceutically acceptable carrier" are used interchangeably herein.
Delivery methods
In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention also provides cells produced by such methods, as well as organisms (e.g., animals, plants, or fungi) comprising or produced from such cells. In some embodiments, the guided editors described herein are delivered to the cell in combination with (and optionally complexed with) a guide sequence and an inhibitor of the DNA mismatch repair pathway. In any of the delivery methods described herein, an inhibitor of the DNA mismatch repair pathway may also be delivered with the guided editor. In some embodiments, the inhibitor is MLH1dn as further described herein. In some embodiments, the inhibitor is encoded on the same vector as the guided editor. In certain embodiments, the inhibitor is fused to the guided editor. In some embodiments, the inhibitor is encoded on a second vector that is delivered together with a vector encoding the guided editor. In some embodiments, the guided editor fusion protein and the inhibitor of the DNA mismatch repair pathway are delivered directly to the cell as proteins. In certain embodiments, the guided editor is fused to an inhibitor of the DNA mismatch repair pathway and the fusion protein is delivered directly into the cell.
Exemplary delivery strategies include vector-based strategies, PE ribonucleoprotein complex delivery, and delivery of PE by mRNA methods.
In some embodiments, the delivery methods provided include nucleofection, microinjection, biolistics (gene gun), virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
Exemplary nucleic acid delivery methods include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggyBac), microinjection, biolistics (gene gun), virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described, for example, in U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are commercially available (e.g., Transfectam™, Lipofectin™, and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)). Cationic and neutral lipids suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery may be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration). Delivery may also be achieved through the use of RNP complexes.
The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to those skilled in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
In other embodiments, the delivery methods and vectors provided herein use RNP complexes. RNP delivery of fusion proteins markedly increases the DNA specificity of base editing. RNP delivery of fusion proteins decouples on-target from off-target DNA editing. RNP delivery eliminates off-target editing at non-repetitive sites while maintaining on-target editing comparable to plasmid delivery, and greatly reduces off-target DNA editing even at the highly repetitive VEGFA site 2. See Rees, H.A. et al., Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery, Nat. Commun. 8, 15790 (2017); U.S. Pat. No. 9,526,784, issued December 27, 2016; and U.S. Pat. No. 9,737,604, issued August 22, 2017, each of which is incorporated herein by reference.
Other methods of delivering nucleic acids to cells are known to those of skill in the art. See, for example, US2003/0087817, incorporated herein by reference.
Other aspects of the disclosure provide methods of delivering guided editor constructs to a cell so that a complete and functional guided editor is formed within the cell. For example, in some embodiments, the cell is contacted with a composition described herein (e.g., a composition comprising a nucleotide sequence encoding a split Cas9 or a split guided editor, or an AAV particle comprising a nucleic acid vector comprising such a nucleotide sequence). In some embodiments, the contacting results in delivery of such nucleotide sequences into the cell, wherein the N-terminal portion and the C-terminal portion of the Cas9 protein or guided editor are expressed and joined within the cell to form the complete Cas9 protein or the complete guided editor.
It is to be understood that any of the rAAV particles, nucleic acid molecules, or compositions provided herein can be stably or transiently introduced into a cell in any suitable manner. In some embodiments, the disclosed proteins can be transfected into cells. In some embodiments, the cell may be transduced or transfected with a nucleic acid molecule. For example, a cell may be transduced (e.g., with a virus encoding a split protein) or transfected (e.g., with a plasmid encoding a split protein) with a nucleic acid molecule encoding a split protein or with rAAV particles containing a viral genome encoding one or more nucleic acid molecules. Such transduction may be stable or transient. In some embodiments, a cell expressing or containing the split protein can be transduced or transfected with one or more guide RNA sequences, for example, when delivering a split Cas9 (e.g., nCas9) protein. In some embodiments, plasmids expressing the split proteins can be introduced into cells by electroporation, transient transfection (e.g., lipofection), stable genome integration (e.g., piggyBac), viral transduction, or other methods known to those of skill in the art.
Examples
Example 1: Enhanced guided editing systems obtained by identifying and manipulating cellular determinants of editing results
Introduction
The ability to manipulate the genome in a programmable manner has illuminated biology and shown promise in the clinical treatment of genetic diseases. To expand the breadth of sequence changes that can be precisely installed in living cells, guided editing was developed, a gene editing method that can achieve all types of targeted DNA base pair substitutions, small insertions, small deletions, and combinations thereof, without requiring double-stranded DNA breaks or donor DNA templates (Anzalone et al., 2020; Anzalone et al., 2019). Guided editing has been widely used to introduce genetic changes in Drosophila (Bosch et al., 2021), rice and wheat (Lin et al., 2020), zebrafish (Petri et al., 2021), mouse embryos (Liu et al., 2020), postnatal mice (Liu et al., 2021), human stem cells (Sürün et al., 2020), and patient-derived organoids (Schene et al., 2020). Despite its versatility, guided editing efficiency can vary greatly across edit classes, target loci, and cell types (Anzalone et al., 2019). To maximize the utility of guided editing, cellular determinants of guided editing outcomes were identified, and the resulting insights were used to develop guided editing systems with improved efficiency and outcome purity.
The PE2 guided editing system consists of two components: an engineered reverse transcriptase (RT) fused to a Cas9 nickase (the PE2 protein) and a guided editing guide RNA (pegRNA) that contains a spacer sequence complementary to the target DNA and a 3' extension encoding the desired edit (Anzalone et al., 2020; Anzalone et al., 2019) (FIG. 40A). Together, these components form a PE2-pegRNA complex that binds one strand of the target DNA locus and nicks the opposite strand. The nick exposes a DNA 3' end that hybridizes to the primer binding site (PBS) in the pegRNA extension. Reverse transcription of the RT template within the pegRNA extension then generates a 3' DNA flap containing the edited sequence, which is ultimately installed into the genome.
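The arrangement of pegRNA parts described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the inventors' design procedure. It assumes the SpCas9 convention of a 20-nt protospacer nicked 17 nt from its 5' end (3 nt upstream of the PAM); the names `design_pegrna`, `nick_offset`, and `pbs_len`, the toy target sequence, and the shortened placeholder scaffold are all hypothetical.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class PegRNA:
    spacer: str       # same sequence as the protospacer on the PAM-containing strand
    scaffold: str     # Cas9-binding scaffold (placeholder in this sketch)
    rt_template: str  # template copied by the RT; encodes the edit
    pbs: str          # primer binding site; anneals to the nicked 3' end

    def rna(self) -> str:
        """Full pegRNA read 5'->3': spacer, scaffold, RT template, PBS."""
        dna = self.spacer + self.scaffold + self.rt_template + self.pbs
        return dna.replace("T", "U")


def design_pegrna(pam_strand: str, protospacer_start: int,
                  edited_after_nick: str, scaffold: str,
                  spacer_len: int = 20, nick_offset: int = 17,
                  pbs_len: int = 13) -> PegRNA:
    """Build a pegRNA for a nick `nick_offset` nt into the protospacer.

    `edited_after_nick` is the desired (edited) sequence of the PAM-containing
    strand immediately 3' of the nick, i.e. the sequence of the 3' DNA flap.
    """
    nick = protospacer_start + nick_offset  # index of first base 3' of the nick
    if nick - pbs_len < 0:
        raise ValueError("not enough sequence upstream of the nick for the PBS")
    spacer = pam_strand[protospacer_start:protospacer_start + spacer_len]
    # The PBS pairs with the DNA 3' end exposed by the nick.
    pbs = reverse_complement(pam_strand[nick - pbs_len:nick])
    # The RT template is the complement of the flap that the RT synthesizes.
    rt_template = reverse_complement(edited_after_nick)
    return PegRNA(spacer, scaffold, rt_template, pbs)


# Toy target (hypothetical); the edit changes the second base after the nick (G to C).
TARGET = "GGCCCAGACTGAGCACGTGATGGCAGAGG"
peg = design_pegrna(TARGET, 0, "TCATGGCAGA", scaffold="GTTTTAGAGC")
print(peg.rna())
```

Reverse-complementing the 3' extension (`rt_template + pbs`) recovers the nicked strand's 3' end followed by the edited flap, which is a quick sanity check on any design produced this way.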
The PE3 system differs from PE2 in that it uses an additional single guide RNA (sgRNA) to nick the non-edited strand at a position away from the pegRNA target, which improves editing efficiency by stimulating replacement of the non-edited strand (Anzalone et al., 2019) (FIG. 40A).
According to the current model, the newly synthesized 3' flap displaces the adjacent strand of genomic DNA through flap interconversion (FIG. 40A). The displaced 5' flap is then excised, and the edited sequence is ligated into the genome. Nicking the non-edited strand in the PE3 system is believed to induce cellular replacement of the non-edited strand during heteroduplex resolution, thereby promoting copying of the edited sequence to the complementary strand.
The inventors studied the role of DNA repair mechanisms in guided editing and developed improved guided editing systems by manipulating these processes. As described herein, pooled CRISPR interference (CRISPRi) screens were used to systematically investigate the effect of 476 genes involved in DNA repair and related processes on substitution guided editing outcomes. Specific DNA mismatch repair (MMR) genes were identified as strongly suppressing guided editing efficiency and promoting indel formation. Consistent with a model in which MMR reverts the heteroduplex DNA formed during guided editing, classes of guided edits that are less susceptible to MMR activity, and are therefore installed more efficiently, were identified, including G·C-to-C·G transversions and substitutions of three or more consecutive bases. Integrating these findings, new guided editing systems were developed that improve editing outcomes through transient expression of a dominant negative MMR protein (MLH1dn). In MMR-proficient cell types (including induced pluripotent stem cells and primary T cells), these PE4 (PE2+MLH1dn) and PE5 (PE3+MLH1dn) systems improved editing efficiency by an average of 7.7-fold and 2.0-fold over PE2 and PE3, respectively, and increased the edit:indel ratio (outcome purity) by 3.4-fold. Transient co-expression of MLH1dn did not result in detectable changes in microsatellite repeat length, a clinically used biomarker of MMR proficiency (Umar et al., 2004). By varying codon usage, nuclear localization signals, Cas9 domain mutations, and linker composition and length, an optimized guided editor architecture was engineered (PEmax; see FIG. 54B) that further increases editing efficiency in combination with PE4, PE5, and the recently developed engineered pegRNAs (epegRNAs) (Nelson et al., 2021). Finally, strategically installing additional silent mutations near the intended edit was shown to improve guided editing efficiency by weakening MMR recognition.
These findings deepen the understanding of guided editing and establish guided editing systems that significantly improve efficiency and outcome purity across 169 different edits at 19 loci in seven mammalian cell types.
Results
Design of pooled CRISPRi screens for guided editing results
A recently developed genetic screening method called Repair-seq was used to study guided editing efficiency and outcomes, including the frequencies of the original sequence, the desired edit, and indels. Repair-seq measures the effect of many loss-of-function perturbations on a genome editing experiment by linking the identity of each CRISPRi sgRNA to the edited site in a pooled screen (Hussmann et al., 2021) (FIG. 40B). Briefly, a library of sgRNAs is transduced by lentivirus into cells expressing the CRISPRi effector protein (dCas9-KRAB) such that most infected cells receive only one sgRNA, resulting in knockdown of one gene per cell. The target sequence for genome editing is delivered together with the sgRNA library in the same lentiviral cassette. After genome editing at the target site, paired-end sequencing measures the frequency of each editing outcome together with the identity of the linked CRISPRi perturbation.
To adapt Repair-seq to guided editing, guided editing was first made orthogonal to CRISPRi, which typically also relies on Streptococcus pyogenes Cas9 (SpCas9). A SaPE2 guided editor variant was constructed by replacing the SpCas9 nickase domain in PE2 with a Staphylococcus aureus Cas9 (SaCas9) N580A nickase (Ran et al., 2015). Guided editing activity was verified using SaPE2 and orthogonal S. aureus pegRNAs (Sa-pegRNAs) (FIG. 47A).
Next, a lentiviral Repair-seq vector was designed for SaPE2 screening by adding a composite SaPE2 editing site. This site consisted of a single target protospacer identified as highly edited in HEK293T cells (FIG. 47A) and two flanking protospacers that allow the complementary (non-edited) strand to be nicked 50 bp downstream (+50 nick) or 50 bp upstream (-50 nick) of the target (FIG. 40B, FIG. 47B). This design supports guided editing with SaPE2 in three configurations: PE2, PE3 with the +50 nick (PE3+50), or PE3 with the -50 nick (PE3-50) (FIG. 47C).
To enable screening, Sa-pegRNAs were optimized for editing this site in a validated HeLa CRISPRi cell line (Gilbert et al., 2013). Transfecting these cells with SaPE2, Sa-pegRNA, and Sa-sgRNA plasmids programming a G·C-to-C·G transversion at the editing site yielded up to 5.2% editing and 0.17% indels (PE2), 12% editing and 5.8% indels (PE3+50), and 2.6% editing and 19% indels (PE3-50) (FIG. 47D). The high proportion of indels from PE3-50 is consistent with previous reports that paired nicks leaving 3' overhangs result in higher insertion frequencies than those leaving 5' overhangs (Bothmer et al., 2017). Enriching successfully transfected cells using SaPE2 constructs carrying a blasticidin selection marker increased editing frequencies, as expected, to 9.4% with PE2, 15% with PE3+50, and 3.5% with PE3-50 (FIG. 47E). Together, these efforts established screening assays in which baseline editing outcomes fall within an efficiency range well suited to detecting increases or decreases in editing caused by CRISPRi perturbations.
Identification of DNA repair Components affecting guided editing results
Using this optimized assay, Repair-seq screens of guided editing outcomes were performed with PE2 and PE3+50 in K562 and HeLa cells and with PE3-50 in HeLa cells. A library of 1,513 sgRNAs targeting 476 genes enriched for roles in DNA repair and related processes (FIG. 48F), together with 60 non-targeting control sgRNAs, was cloned into the lentiviral screening vector (Hussmann et al., 2021) (Table 5). The library was transduced into human K562 and HeLa CRISPRi cell lines (Gilbert et al., 2014; Gilbert et al., 2013), and after 5 days the cells were transfected with SaPE2, Sa-pegRNA, and Sa-sgRNA plasmids programming a G·C-to-C·G transversion at the pre-validated editing site. Genomic DNA was extracted from the cells 3 days after transfection, a 453-bp region containing the CRISPRi sgRNA, the editing site, and the complementary-strand nick sites was amplified by PCR, and paired-end sequencing was performed to measure the distribution of editing outcomes for each genetic perturbation (FIG. 40B, FIG. 47B). To interpret the resulting data, the frequency of each editing outcome in cells containing a gene-targeting CRISPRi sgRNA was compared with the corresponding frequency in cells containing non-targeting control sgRNAs. A decrease in the frequency of an outcome after knockdown of a gene indicates that the activity of that gene promotes formation of the outcome, whereas an increase in frequency indicates that the activity of the gene suppresses the outcome.
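The comparison described above (outcome frequency under each gene-targeting sgRNA relative to non-targeting controls) can be illustrated with a minimal Python sketch. It is not the Repair-seq analysis pipeline; the function name `outcome_fold_changes`, the input layout, the sgRNA labels, and all read counts below are hypothetical, and the baseline is assumed to be the mean frequency across non-targeting sgRNAs.

```python
from statistics import mean


def outcome_fold_changes(read_counts, non_targeting, outcome):
    """Fold change in the frequency of `outcome` for each gene-targeting sgRNA.

    read_counts:   {sgRNA name: {outcome name: read count}}
    non_targeting: names of the non-targeting control sgRNAs
    Returns {sgRNA name: frequency / mean non-targeting frequency}.
    """
    def frequency(sgrna):
        counts = read_counts[sgrna]
        return counts.get(outcome, 0) / sum(counts.values())

    baseline = mean(frequency(sg) for sg in non_targeting)
    return {
        sg: frequency(sg) / baseline
        for sg in read_counts
        if sg not in non_targeting
    }


# Made-up counts for illustration only.
counts = {
    "NT_1": {"intended_edit": 50, "unedited": 950},
    "NT_2": {"intended_edit": 50, "unedited": 950},
    "MSH2_sg1": {"intended_edit": 250, "unedited": 750},
    "LIG1_sg1": {"intended_edit": 25, "unedited": 975},
}
fc = outcome_fold_changes(counts, {"NT_1", "NT_2"}, "intended_edit")
for sgrna, value in sorted(fc.items()):
    print(f"{sgrna}: {value:.2f}x")
```

A fold change above 1 means the knocked-down gene normally suppresses the outcome; a value below 1 means the gene normally promotes it, matching the interpretation given in the text.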
The effect of gene knockdown on the frequency of the intended G·C-to-C·G edit was examined. In cells with non-targeting CRISPRi sgRNAs, 4.3%-4.9% (K562) and 8.5%-8.7% (HeLa) of sequencing reads contained exactly the intended edit after PE2 editing (FIGS. 40C-40D). These levels increased to 14%-16% (K562) and 14%-16% (HeLa) for PE3+50 but decreased to 2.1%-2.2% (HeLa) for PE3-50. Under all guided editing conditions screened, CRISPRi targeting of MSH2, MSH6, MLH1, and PMS2, the components of the MutSα-MutLα MMR complex (Iyer et al., 2006; Kunkel and Erie, 2005; Li, 2008), substantially increased editing efficiency, by up to 5.8-fold for PE2, 2.5-fold for PE3+50, and 2.0-fold for PE3-50 (FIGS. 40E-40G and 47G-47J, Table 6). Knockdown of EXO1, an exonuclease that acts in MMR (Genschel et al., 2002), also increased the intended editing efficiency of PE2 in K562 cells by up to 2.3-fold. In contrast, knockdown of LIG1 (a nick-sealing DNA ligase; Pascal et al., 2004) and FEN1 (a 5' flap endonuclease; Liu et al., 2004) reduced the frequency of the intended edit, consistent with their previously proposed roles in nick ligation and 5' flap excision during guided editing (Anzalone et al., 2019). Taken together, these data indicate that MMR activity antagonizes the installation of point mutations by guided editing.
In addition to the intended edit, the Repair-seq screens also measured the effect of genetic perturbations on the formation of editing byproducts. These byproducts fall into four main categories: deletions (FIG. 41A), tandem repeats (FIG. 41B), and two types of outcomes containing unintended sequence from the pegRNA (FIGS. 41C-41D). The baseline frequency of all unintended edits was low for PE2 (0.31% in K562, 0.60% in HeLa; FIG. 41E), but unintended byproducts were more frequent and more diverse for PE3-50 (58% in HeLa; FIG. 48A) and PE3+50 (8.2% in K562, 9.5% in HeLa; FIG. 41F). The baseline frequencies and genetic modulators of these categories varied among the PE2, PE3+50, and PE3-50 screens, providing a rich set of observations on how different guided editing configurations are processed (FIGS. 48A-48I). Two of these observations inform a model of the role of MMR activity during guided editing.
First, one category of unintended outcomes contained the intended G·C-to-C·G edit together with additional base substitutions and 1-nucleotide (nt) insertions near the target site (FIG. 41C). These additional mutations perfectly match the 9 nt at the 3' end of the pegRNA scaffold sequence, consistent with reverse transcription into the pegRNA scaffold and incorporation of the resulting 3' DNA flap at partially homologous genomic sequence. Recoding the pegRNA scaffold to avoid sequence homology with the genomic target greatly reduced the frequency of these additional mutations (FIGS. 49A-49B), suggesting a general approach to eliminating this type of editing byproduct. Notably, knockdown of MMR genes increased the frequency of this editing byproduct from 0.08% to 2.0% in the PE2 K562 screen (FIG. 41E, FIG. 48D). Thus, MMR suppresses the formation of this outcome to a greater extent than it suppresses the intended edit, indicating that different guided editing intermediates may differ in the extent to which they are processed by MMR.
Second, MMR knockdown reduced the frequency of most categories of unintended outcomes from PE3+50 (FIGS. 41F-41H and 48H), suggesting that temporarily suppressing MMR activity may increase both the efficiency of guided editing and the purity of its outcomes. Although tandem repeats of the sequence between the nicks were common for PE3-50 (FIG. 48A), these outcomes were rare for PE3+50 (0.37% of non-targeting reads in K562, 2.3% in HeLa) and were reduced up to 3.7-fold (K562) and 1.5-fold (HeLa) by MMR knockdown (FIG. 41F, FIG. 48H). The most abundant category of unintended PE3+50 outcomes contained sequence from the reverse-transcribed 3' DNA flap that was not rejoined to genomic sequence at the intended flap annealing position (5.1% of non-targeting reads in K562, 3.8% in HeLa; FIG. 41D). In both cell types, the frequency of these unintended flap rejoining outcomes increased upon knockdown of HLTF, a DNA translocase (Poole and Cortez, 2017), but decreased up to 1.7-fold upon knockdown of MMR genes (FIG. 41H). Finally, MMR knockdown substantially reduced deletions from PE3+50, by up to 3.7-fold (K562) and 1.6-fold (HeLa; FIG. 41G). MMR knockdown in K562 cells also qualitatively changed the boundaries of the observed deletions. For both PE3 configurations, the genomic sequence between the two SaPE2-induced nicks was deleted most frequently, but deletions extending beyond this region were also observed (FIG. 41I, top). With PE3+50, MMR knockdown reduced the frequency of these longer deletions more than it reduced deletions between the programmed nicks (FIG. 41I, bottom, and FIG. 41J), suggesting that MMR activity may contribute to the formation of longer deletions during guided editing.
Model for mismatch repair of guided editing intermediates
The effects of MMR knockdown on both intended and unintended editing outcomes in these Repair-seq screens suggested a working model for how MMR acts during guided editing. In eukaryotes, MMR resolves DNA heteroduplexes containing single-base mismatches or small insertion-deletion loops (IDLs) by selectively replacing the DNA strand that contains a nearby nick (Iyer et al., 2006; Kunkel and Erie, 2005; Li, 2008). To initiate MMR, the heteroduplex is first bound either by MutSα (MSH2-MSH6), which recognizes base mismatches and 1-nt to 2-nt IDLs (Warren et al., 2007), or by MutSβ (MSH2-MSH3), which recognizes 2-nt to 13-nt IDLs (Gupta et al., 2012) (FIG. 42C). Next, MSH2 recruits the MutLα heterodimer (PMS2-MLH1), which incises only the nick-containing strand around the heteroduplex (Fang and Modrich, 1993; Kadyrov et al., 2006; Pluciennik et al., 2010; Thomas et al., 1991). From these nicks, EXO1 mediates 5'-to-3' excision of the heteroduplex (Genschel et al., 2002), polymerase δ resynthesizes the excised DNA strand, and ligase I (LIG1) seals the nascent strand to complete repair (Iyer et al., 2006; Kunkel and Erie, 2005; Zhang et al., 2005).
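The recognition ranges stated above can be restated as a small lookup, which makes the 2-nt overlap between the two complexes explicit. This is only a restatement of the ranges quoted in the text; the function name `muts_complexes` and the lesion labels are hypothetical, and real substrate recognition is not a strict cutoff.

```python
def muts_complexes(lesion: str, length: int = 1) -> set:
    """Return the MutS complexes expected to recognize a heteroduplex lesion.

    lesion: "mismatch" for a base-base mismatch, or "idl" for an
            insertion-deletion loop of `length` nucleotides.
    Ranges follow the text: MutSα binds mismatches and 1-2 nt IDLs,
    MutSβ binds 2-13 nt IDLs.
    """
    if lesion == "mismatch":
        return {"MutSα"}
    if lesion == "idl":
        complexes = set()
        if 1 <= length <= 2:
            complexes.add("MutSα")
        if 2 <= length <= 13:
            complexes.add("MutSβ")
        return complexes
    raise ValueError(f"unknown lesion type: {lesion!r}")


print(muts_complexes("mismatch"))        # base mismatch
print(sorted(muts_complexes("idl", 2)))  # 2-nt loop: both complexes
print(muts_complexes("idl", 20))         # outside both stated ranges
```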
Without wishing to be bound by theory, MMR may engage a specific guided editing intermediate: the DNA heteroduplex formed by hybridization of the reverse-transcribed 3' DNA flap with the adjacent genomic DNA (FIG. 42A). If MutSα-MutLα recognizes the heteroduplex within this structure, the 3' nick that remains after flap equilibration (but before ligation) can stimulate selective excision of the edited strand and subsequent repair that regenerates the original, unedited sequence. Alternatively, MMR may also prevent productive flap interconversion by rejecting annealing of the edited 3' flap to the genomic target (Sugawara et al., 2004). In either case, inhibiting MMR during guided editing may delay heteroduplex repair and increase the likelihood of nick ligation. Successful nick ligation would in turn remove the signal that biases MMR toward resolving the heteroduplex by removing the edit. This model is supported by the substantial increase (1.6-fold to 5.8-fold) in PE2 editing efficiency upon knockdown of MutSα-MutLα genes (FIGS. 40C-40D), which strongly suggests that 3'-nicked heteroduplex intermediates are normally repaired back to the original sequence by MMR before nick ligation (FIG. 42A). Thus, interfering with MMR reversion of these intermediates may improve guided editing efficiency.
The model also explains the benefit of complementary-strand nicking in the PE3 editing system. Nicking the non-edited strand of the heteroduplex intermediate introduces an additional strand discrimination signal that can direct MMR to replace the non-edited strand more frequently, resulting in higher guided editing efficiency and a smaller effect of MMR inhibition (FIG. 42B). Consistent with this model, knockdown of MutSα-MutLα components increased the intended G·C-to-C·G edit by up to 5.8-fold for PE2 but only up to 2.6-fold for PE3+50 (FIG. 40G). MMR activity can also promote successful guided editing by directing excision of the non-edited strand, but only for guided editing intermediates in which the edited strand has already been ligated and the non-edited strand is nicked. However, MMR knockdown substantially enhanced PE3+50 editing in both K562 and HeLa cells (FIG. 40F), suggesting that this intermediate is uncommon and that MMR usually reverts the heteroduplex before nick ligation.
The mechanism of MMR may also explain how knockdown of MutSα-MutLα genes reduces indel byproducts from PE3+50 (FIG. 41F). During repair of guided editing heteroduplex intermediates, MutLα may induce DSBs by indiscriminately nicking the target locus, especially when both DNA strands already contain nicks programmed by the pegRNA and the sgRNA (FIG. 49D). DSBs formed after excision from these additional, non-programmed nicks may widen the boundaries of indels at the guided editing locus. Consistent with this hypothesis, knockdown of MutSα-MutLα components disproportionately reduced PE3+50 deletion outcomes that extend outside the sequence between the pegRNA and sgRNA nicks (FIG. 41I). Taken together, these findings from the Repair-seq screens support a model in which MMR activity strongly suppresses the intended guided editing outcome and instead promotes indel byproducts.
MMR inhibition can improve guided editing of endogenous loci
The effect of MMR on guided editing was tested in endogenous genomic loci and other cell types using a typical SpCas 9-based guided editor. HEK293T cells were treated with siRNA targeting MSH2, MSH6, MLH1 or PMS2, the cells were cultured for 3 days to allow siRNA mediated knockdown (fig. 49E), then transfected with plasmids encoding PE2 and pegRNA programming point mutations at three endogenous genomic loci (EMX 1, RUNX1 and HEK293T cell site 3, hereinafter HEK 3). In these loci, mRNA knockdown was observed to strongly increase average PE2 editing from 7.7% to 25% while indel frequency was reduced from 0.39% to 0.28% (fig. 42C), but to a lesser extent, average PE3 editing efficiency was increased (from 25% to 37%). These results are consistent with the model presented herein, which indicates that complementary strand nick generation improves guided editing efficiency by guiding MMR of the unedited strand. Thus, the impact of MMR on PE3 editing efficiency is mitigated by its adverse effects on the reply 3' flap intermediate (which impedes guided editing) and mediating the replacement of the unedited chain (which facilitates guided editing). In addition, knockdown of MMR gene reduced the frequency of PE3 indels from 5.5% to 3.2% on average, resulting in a 2.9-fold increase in PE3 result purity (fig. 42C). These findings indicate that inhibition of MMR components can improve the guided editing results of human cell endogenous genomic sites.
Guided editing was also measured in MMR-deficient ΔMSH2 or ΔMLH1 haploid HAP1 cells. PE2 guided editing efficiency was much higher in MMR-deficient HAP1 cells (17% at HEK3, 5.0% at EMX1) than in wild-type control cells (0.44% at HEK3, 0.07% at EMX1; FIG. 42D). Furthermore, consistent with the hypothesis described herein, nicking the unedited strand at these loci did not affect editing efficiency in MMR-deficient HAP1 cells (FIG. 42D). Taken together, these results further support a model in which MMR impedes guided editing by facilitating excision of the edited DNA strand, although in the PE3 system this effect is partially offset by the role of MMR in replacing the unedited strand.
Engineering dominant negative MMR proteins can improve guided editing efficiency and accuracy
Encouraged by the finding that pretreating cells with MMR-targeting siRNAs increases guided editing efficiency, strategies for co-delivering a guided editor together with an MMR inhibitor were explored. Co-transfecting untreated cells with PE2 and MLH1 siRNA did not significantly increase editing efficiency after 3 days (FIG. 49F). It was hypothesized that dominant negative MMR protein variants might enhance guided editing when transiently co-expressed with PE2 or when expressed as fusion proteins with PE2. HEK293T cells were co-transfected with plasmids encoding PE2, pegRNA and either ATPase-deficient mutants of human MSH2, MSH6, PMS2 and MLH1 (Iaccarino et al, 1998; et al, 2002; Tomer et al, 2002) or endonuclease-deficient mutants of PMS2 and MLH1 (Gueneau et al, 2013; Kadyrov et al, 2006) (fig. 43A). Among these mutants, ATPase-impaired MLH1 E34A and endonuclease-impaired MLH1 Δ756 increased PE2 editing efficiency by 1.6- to 3.1-fold for three single base substitution edits at the HEK3, EMX1 and RUNX1 loci.
Next, additional dominant negative MLH1 variants were designed and tested to maximize guided editing efficiency. The MLH1 N-terminal domain (NTD) mediates recruitment of MutLα (PMS2-MLH1) to MSH2 during MMR (Plotz et al, 2003) and contains the ATPase necessary for MutLα function (Kadyrov et al, 2006) (FIG. 43B). In contrast, the MLH1 C-terminal domain (CTD) dimerizes with PMS2 and contributes to the MutLα endonuclease activity that is critical to MMR (Gueneau et al, 2013) (fig. 43B). While the dominant negative variant MLH1 Δ756 disrupts the endonuclease, a larger deletion of these residues (MLH1 Δ754-756) was found to further increase guided editing efficiency at the three sites tested (FIG. 43C, FIG. 50A). However, combining the ATPase and endonuclease mutations (MLH1 E34A Δ754-756) did not further improve guided editing (fig. 43C). When these dominant negative MLH1 variants were compared across ten guided edits, MLH1 Δ754-756 gave the largest average improvement in PE2 editing efficiency (3.2-fold; FIG. 50D). MLH1 Δ754-756 was therefore designated MLH1dn. MLH1dn improved guided editing in HEK293T cells in a dose-dependent manner (FIG. 50B) but did not increase editing in MMR-deficient HCT116 cells (Parsons et al, 1993) (FIG. 50C). Both human and mouse MLH1dn also improved the efficiency of substitution guided edits at three sites in human HEK293T cells and four sites in mouse N2A cells (FIGS. 50D-50E).
Next, shorter MLH1 truncations that could also inhibit MMR during guided editing were identified. The MLH1 NTD (residues 1-335) moderately enhanced PE2 editing by 1.5- to 1.8-fold, but introducing the E34A ATPase mutation attenuated this improvement at two of the three loci tested (FIG. 43C, FIG. 50A). Adding a nuclear localization signal (NLS) to the MLH1 NTD enhanced PE2 editing by 1.9- to 2.5-fold, similar to the extent achieved with full-length MLH1dn. In contrast, the MLH1 CTD (residues 501-756), its endonuclease mutant, and NLS-tagged variants thereof did not substantially enhance PE2 editing (fig. 43C, fig. 50A). These data indicate that dominant negative MLH1 variants can inhibit MMR either by forming a catalytically impaired MutLα complex with PMS2 or by saturating binding of MSH2, which would prevent productive recruitment of MutLα during MMR. Full-length MLH1dn can perform both functions and can therefore inhibit MMR by both mechanisms, whereas the NLS-tagged MLH1 NTD (hereinafter MLH1 NTD-NLS) would be expected to inhibit MMR only through MSH2 binding because it lacks the CTD required for dimerization with PMS2 (Guerrette et al, 1999). MLH1 NTD-NLS increased PE2 editing efficiency to nearly the same extent as MLH1dn (FIG. 43C, FIG. 50A), indicating that MSH2 sequestration is an effective mechanism for inhibiting MMR.
MLH1dn or MLH1 NTD-NLS was also fused directly to the PE2 protein (via a 32-aa (SGGS)x2-XTEN16-(SGGS)x2 linker), but these fusions did not show higher editing efficiency than PE2 alone (FIG. 50A). However, appending MLH1dn to PE2 through a self-cleaving P2A linker (PE2-P2A-MLH1dn) (Kim et al, 2011) increased guided editing efficiency at three sites by 2.0- to 2.7-fold (FIG. 50A). Among a total of 55 dominant negative MMR protein variants tested at three loci in HEK293T cells, PE2 with MLH1dn expressed in trans provided the greatest average enhancement (3.2-fold) in PE2 editing efficiency. Substantial average improvements in PE2 editing were also observed with PE2 plus MLH1 NTD-NLS expressed in trans (2.7-fold) and with PE2-P2A-MLH1dn (2.4-fold) (FIG. 43C). These three variants also increased PE3 editing efficiency by an average of 1.2-fold and reduced unwanted indels by 1.4- to 4.0-fold (fig. 43E). PE2 editing with MLH1dn is named the PE4 system, and PE3 editing with MLH1dn is named the PE5 system (FIG. 43F). Compared with MLH1dn (753 aa), MLH1 NTD-NLS provides enhanced guided editing efficiency with a smaller protein (355 aa). When a single construct is required, PE2-P2A-MLH1dn is recommended.
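The construct sizes quoted above can be checked with simple arithmetic, sketched here in Python. The XTEN16 sequence shown is an assumption (a commonly used 16-aa XTEN linker); the text gives only the name and length of that segment. The full-length MLH1 size of 756 aa is inferred from the CTD residue range 501-756 given above.

```python
# Linker composition stated in the text: (SGGS)x2-XTEN16-(SGGS)x2.
SGGS = "SGGS"
XTEN16 = "SGSETPGTSESATPES"  # ASSUMED sequence; only "XTEN16" is named in the text

linker = SGGS * 2 + XTEN16 + SGGS * 2
print(len(linker))  # expected to match the 32-aa linker described in the text

# MLH1dn is MLH1 with residues 754-756 deleted.
FULL_LENGTH_MLH1_AA = 756            # inferred from "CTD (residues 501-756)"
deleted_residues = range(754, 757)   # residues 754, 755 and 756
mlh1dn_aa = FULL_LENGTH_MLH1_AA - len(deleted_residues)

# Size difference between MLH1dn and the smaller MLH1 NTD-NLS (355 aa per the text).
MLH1_NTD_NLS_AA = 355
aa_saved = mlh1dn_aa - MLH1_NTD_NLS_AA
nt_saved = 3 * aa_saved  # three nucleotides per codon

print(mlh1dn_aa, aa_saved, nt_saved)
```

Under these assumptions the smaller variant shortens the coding sequence by roughly 1.2 kb, which may matter where delivery capacity is limited.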
The versatility of the PE4 and PE5 systems was assessed with seven additional single base substitution edits at different genomic loci in HEK293T cells. On average, PE4 increased editing efficiency 2.0-fold over PE2 with minimal indels (< 0.4% on average; FIG. 43G). PE5 improved editing 1.2-fold over PE3 and improved edit:indel purity 3.0-fold (fig. 43G). It was also tested whether MLH1dn could increase the efficiency of PE3b, a guided editing strategy that uses a complementary strand nick specific to the edited sequence to minimize coincident nicks on both strands that promote indel formation (Anzalone et al, 2019). PE3b co-expressed with MLH1dn (hereinafter PE5b) increased editing at the FANCF locus 1.7-fold over PE3b while maintaining a low indel frequency (< 0.6%; FIG. 50H). Furthermore, PE4 provided substantially better guided editing performance than PE2, which is relevant when indels must be minimized or at loci where complementary strand nicking gives unproductive editing outcomes (fig. 50H). The guided editing enhancement from MLH1dn was also compared with that from complete MLH1 knockout in clonal HeLa cell lines. At the four sites tested, MLH1 knockout enhanced PE2 and PE3 editing to a greater extent than MLH1dn co-expression (PE4 and PE5; FIG. 50F), indicating an opportunity for additional guided editing enhancement by further modulating this pathway. Overall, these data establish the PE4 and PE5 systems, which can significantly improve guided editing efficiency and outcome purity at diverse endogenous genomic loci in human cells.
Characterization of guided edit types enhanced by PE4 and PE5
Next, the extent to which MLH1dn improves guided editing across different edit types was studied. PE4 was compared with PE2 in HEK293T cells using 84 pegRNAs that together introduce all 12 possible single base substitutions at seven genomic loci. Across these edits, MLH1dn increased guided editing efficiency 2.0-fold and reduced the average indel frequency from 0.40% to 0.31% compared to PE2 (FIGS. 44A-B, 51A-51B).
MLH1dn improved the efficiency of A.T-to-G.C, T.A-to-G.C and C.G-to-A.T guided edits to a lesser extent (1.7-fold over PE2). Notably, G.C-to-C.G substitutions (which form C.C mismatches after 3' flap hybridization) were improved the least by MLH1dn (1.2-fold), consistent with previous studies showing that C.C mismatches are not efficiently repaired by MMR (Lahue et al, 1989; Su et al, 1988; Thomas et al, 1991). This finding suggests that G.C-to-C.G edits evade MMR more effectively and may therefore have higher basal editing efficiency. Consistent with this possibility, among guided edits that alter the PAM across seven endogenous loci, G.C-to-C.G editing using PE2 (27%) was substantially more efficient than G.C-to-A.T (18%) or G.C-to-T.A (20%) editing (FIG. 51C).
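The pegRNA count and the mismatch logic described above can be sketched in Python. This is an illustrative simplification, not part of the disclosed methods: it assumes that the first base of each pair in the notation (for example the G in G.C-to-C.G) lies on the edited strand, so that the heteroduplex formed after 3' flap hybridization pairs the new base with the complement of the original base. The function names are hypothetical.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def substitution_label(old: str, new: str) -> str:
    """Name a single base substitution in the X.Y-to-X'.Y' notation of the text."""
    return f"{old}.{COMPLEMENT[old]}-to-{new}.{COMPLEMENT[new]}"


def flap_mismatch(old: str, new: str) -> str:
    """Mismatch formed when the edited flap base pairs with the unedited strand.

    ASSUMPTION: `old` and `new` are the bases on the edited strand, so the
    unedited strand still carries the complement of `old`.
    """
    return f"{new}.{COMPLEMENT[old]}"


# All ordered pairs of distinct bases give the 12 possible substitutions.
substitutions = list(permutations("ACGT", 2))
n_loci = 7
n_pegrnas = len(substitutions) * n_loci

for old, new in substitutions:
    print(substitution_label(old, new), "->", flap_mismatch(old, new), "mismatch")
print(n_pegrnas)
```

Under this assumption a G.C-to-C.G edit yields a C.C mismatch, matching the description in the preceding paragraph, and 12 substitutions at seven loci account for the 84 pegRNAs.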
To further confirm that G.C-to-C.G substitutions are less sensitive to MMR, the effect of MMR knockdown or knockout was tested on the efficiency of a G.C-to-C.G guided edit at the RNF2 locus that was not affected by MLH1dn co-expression (FIG. 51A). Neither siRNA knockdown of MMR components in HEK293T cells (FIG. 51D) nor knockout of MSH2 or MLH1 in HAP1 cells (FIG. 51E) changed the efficiency of this G.C-to-C.G edit. G.C-to-A.T, G.C-to-C.G and G.C-to-T.A edits were also compared at the pre-validated screening site using SaPE2 in HeLa CRISPRi cells (fig. 40B). PE2 and PE3+50 installed the G.C-to-C.G edit more efficiently than the G.C-to-A.T or G.C-to-T.A edits, consistent with weaker MMR activity at C.C mismatches (FIG. 51F). In addition, CRISPRi knockdown of MSH2 increased G.C-to-A.T and G.C-to-T.A editing efficiency (16-fold for PE2 and 4.3-fold for PE3+50) to a greater extent than G.C-to-C.G editing (4.0-fold for PE2 and 1.9-fold for PE3+50). These data strongly indicate that G.C-to-C.G guided edits are not readily repaired by MMR and are therefore installed more efficiently.
PE5 and PE3 were also compared using the same set of 84 pegRNAs used in the above experiments. PE5 produced an average 1.2-fold increase in editing relative to PE3 (fig. 44A, 51G). MLH1dn reduced the frequency of unwanted indels by an average of 2.2-fold, increasing edit:indel purity 2.8-fold (fig. 44A). Overall, analysis of 84 different single nucleotide substitutions at seven genomic loci strongly supports a model in which the PE4 and PE5 systems improve guided editing efficiency and outcome purity by attenuating MMR-mediated reversion of guided editing intermediates.
To determine whether MLH1dn can also improve guided editing of small insertions and deletions, 1-bp and 3-bp insertions and deletions were installed in HEK293T cells using PE4 and PE5. Across 12 pegRNAs at three loci, PE4 increased average editing efficiency 2.2-fold over PE2 without increasing indel frequency, while PE5 increased editing efficiency 1.2-fold over PE3 and increased edit:indel purity 2.9-fold (FIGS. 44C and 51I). To assess the effect of MLH1dn on larger sequence changes, PE2 and PE4 were tested using 33 pegRNAs that collectively programmed 1-, 3-, 6-, 10-, 15- and 20-bp insertions and deletions at the HEK3 and FANCF loci. The enhancement of guided editing efficiency by MLH1dn gradually decreased with increasing insertion and deletion length (FIGS. 44D and 51H), such that PE4 installed 15-bp and 20-bp insertions and deletions on average only 1.1-fold more efficiently than PE2. These observations are consistent with previous reports of MMR repairing insertion-deletion loops (IDLs) up to 13 nt in length (Acharya et al, 1996; Genschel et al, 1998; Umar et al, 1994). Together, these results demonstrate that the PE4 and PE5 strategies can enhance small (< 15-bp) targeted insertions and deletions.
Installing additional silent mutations can increase guided editing efficiency by evading MMR
It was next investigated whether other categories of guided edits can bypass MMR. In addition to point mutations, short insertions and short deletions, guided editing can also install multiple adjacent or contiguous base substitutions (Anzalone et al, 2019). The MutSα and MutSβ heterodimers each recognize specific heteroduplex DNA structures (Gupta et al, 2012; Warren et al, 2007), suggesting that DNA bubbles of contiguous mismatches may impair recognition by these MMR components. To assess this possibility, PE2 and PE4 were tested using 35 different edits that generate 1 to 5 contiguous base substitutions at five genomic loci in HEK293T cells. Among seven edits that mutate two adjacent bases, PE4 gave 2.3-fold higher editing efficiency than PE2, comparable to the 2.4-fold improvement seen for single base substitutions at the same target nucleotides (fig. 44E, fig. 52A). In contrast, PE4 increased the editing frequency of longer contiguous substitutions of 3 to 5 bases by only 1.2- to 1.5-fold relative to PE2. The reduced effect of MMR on these larger edits was reflected in a higher average PE2 editing efficiency for 3- to 5-bp contiguous substitutions (13% across 21 edits) than for 1- to 2-bp contiguous substitutions (4.8% across 14 edits) (fig. 44F, fig. 52A). Inhibition of MMR with MLH1dn (PE4) increased the average editing frequency of these 3- to 5-bp and 1- to 2-bp contiguous substitutions to 16% and 10%, respectively, thereby narrowing the difference between these edit types.
Installing additional silent mutations near the intended edit can similarly increase guided editing efficiency by attenuating repair of the resulting heteroduplex, even in the absence of any MLH1 inhibition (FIG. 51G). PE2 was tested at six gene targets using pegRNAs that programmed either the coding mutation alone or the coding mutation together with additional silent mutations near the edit (mostly within 5 bp). Installing additional silent mutations increased the efficiency of the desired coding change at four of the six sites, by 1.8-fold on average for the best pegRNA at each site (FIGS. 44H and 52B). Inhibition of MMR with MLH1dn (PE4) increased the editing efficiency of the best silent-mutation pegRNAs to a lesser extent (1.2-fold on average) than that of pegRNAs installing only the coding mutation (1.7-fold on average), indicating that these additional silent mutations enhance editing by evading MMR. Consistent with this mechanism, at the two sites where silent-mutation pegRNAs did not improve editing, the silent mutations tested did not substantially attenuate the effect of MMR inhibition on editing efficiency (fig. 52B). Taken together, these findings support the conclusion that MMR repairs heteroduplexes containing three or more contiguous mismatched bases less efficiently, and surprisingly reveal that strategically installing additional silent or benign mutations near the desired edit can increase guided editing efficiency by avoiding MMR reversion of guided editing intermediates, even when using PE2- or PE3-based systems that do not manipulate MMR activity.
PE4 and PE5 greatly improve guided editing in cell types with fully active MMR
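The edit-type trends reported in this section can be summarized as a rough rule of thumb, sketched below in Python. This is a hypothetical heuristic written only to restate the observations above (G.C-to-C.G substitutions, three or more contiguous substitutions, and insertions or deletions of about 15 bp or more benefited least from MMR inhibition). It is not a validated predictor, the class and function names are assumptions, and other factors such as sequence context and cell type are ignored.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    kind: str        # "substitution", "insertion" or "deletion"
    length: int      # contiguous bases substituted, inserted or deleted
    label: str = ""  # e.g. "G.C-to-C.G" for a single base substitution


def likely_evades_mmr(edit: Edit) -> bool:
    """Heuristic restatement of the trends in the text; thresholds are approximate."""
    if edit.kind == "substitution":
        if edit.length >= 3:
            # Three or more contiguous mismatches were repaired less efficiently.
            return True
        # C.C mismatches from G.C-to-C.G edits were poorly repaired by MMR.
        return edit.length == 1 and edit.label == "G.C-to-C.G"
    if edit.kind in ("insertion", "deletion"):
        # 15-bp and 20-bp edits gained only about 1.1-fold from MLH1dn.
        return edit.length >= 15
    raise ValueError(f"unknown edit kind: {edit.kind!r}")


examples = [
    Edit("substitution", 1, "G.C-to-C.G"),
    Edit("substitution", 1, "G.C-to-A.T"),
    Edit("substitution", 2),
    Edit("substitution", 4),
    Edit("deletion", 3),
    Edit("insertion", 20),
]
for e in examples:
    print(e.kind, e.length, e.label or "-", likely_evades_mmr(e))
```

Edits flagged as likely to evade MMR would be expected to gain less from PE4 or PE5, whereas edits not flagged are candidates for MLH1dn co-expression or for adding nearby silent mutations.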
HEK293T cells are partially MMR-deficient due to hypermethylation of the MLH1 promoter (Trojan et al, 2002), which, in light of the findings above, may explain the higher guided editing efficiency observed in HEK293T cells compared to other mammalian cell types (Anzalone et al, 2019). To assess whether MLH1dn could improve guided editing to a greater extent in cells with fully active MMR, guided editing was compared in HEK293T cells and MMR-proficient HeLa cells using 30 pegRNAs collectively targeting seven loci. Consistent with higher MMR activity in HeLa cells (Holmes et al, 1990; Thomas et al, 1991), complementary strand nicking (PE3 versus PE2) increased guided editing efficiency far more in HeLa cells (22-fold) than in HEK293T cells (3.5-fold) (fig. 44I-44J and 52C). Likewise, across the same 30 edits, PE4 increased average editing efficiency over PE2 more in MMR-proficient HeLa cells (6.7-fold) than in partially MMR-deficient HEK293T cells (2.0-fold) while maintaining minimal indel frequency (0.67% on average in HeLa cells). Similarly, PE5 increased average editing efficiency over PE3 by 1.9-fold in HeLa cells but only 1.1-fold in HEK293T cells. MLH1dn also increased the edit:indel ratio by 2.8-fold on average in HeLa cells and by 3.2-fold in HEK293T cells (fig. 44I).
PE4 and PE5 were also evaluated in MMR-proficient K562 and U2OS cell lines (Matheson and Hall, 2003; Peng et al, 2014). Across six sites in K562 cells and four sites in U2OS cells, PE4 and PE5 increased average guided editing efficiency by 6.0-fold and 1.7-fold compared to PE2 and PE3, respectively (fig. 44I-44J), again providing much greater benefits than those observed in MMR-compromised HEK293T cells. PE5 also improved edit:indel purity in these cells by 2.6-fold compared to PE3. Interestingly, while PE4 increased the frequency of a G.C-to-C.G edit at DNMT1 by only 1.4-fold over PE2 in HEK293T cells (FIG. 51A), a greater improvement was observed in HeLa, K562 and U2OS cells (2.7-fold on average; FIG. 44J), indicating that PE4 and PE5 can enhance guided editing in MMR-proficient cell types even for edit categories that evade MMR more effectively. In summary, a comparison across 70 edits at seven endogenous sites in HEK293T, HeLa, K562 and U2OS cells suggests that MLH1dn significantly improves guided editing efficiency, especially in cell types that are not MMR-deficient.
Effect of MLH1dn on guided editing outcome purity
Next, how MLH1dn reduces unintended guided editing outcomes at endogenous genomic targets was characterized in depth. To separate the steps that lead to guided editing from those that lead to indel by-products, guided editors were tested using non-editing pegRNAs that template a 3' DNA flap perfectly complementary to the target locus and therefore produce no sequence change at the target locus (fig. 45A). Across four sites in HEK293T cells, these non-editing pegRNAs produced an average of 4.4% indels using PE3 and an average of 4.3% indels using PE5 (fig. 45B-45C, fig. 53A). In contrast, four pegRNAs programming single nucleotide mutations at these sites produced an average of 8.5% indels using PE3 and an average of 4.8% indels using PE5. These data indicate that MMR inhibition by MLH1dn does not alter indel frequency in the absence of a heteroduplex. It was also observed that MLH1dn did not alter the indel frequency when a pegRNA and a nicking sgRNA were co-transfected with PE2-dRT (PE2 containing an inactivated reverse transcriptase) or with SpCas9 H840A nickase (nCas9; FIG. 53), indicating that MMR does not affect repair of double-nicked intermediates lacking a reverse-transcribed 3' flap. These results strongly suggest that MMR engagement of guided editing heteroduplex intermediates stimulates indel products, which can be mitigated by PE5.
Next, evidence that MMR activity expands the size of guided editing indel by-products was sought at seven endogenous loci in HEK293T cells. The distribution of unintended on-target sequence deletions from PE3 and PE5 was compared across 84 pegRNAs encoding single base substitutions. Consistent with the results of the Repair-seq screens (fig. 41I), MLH1dn in the PE5 system reduced deletions outside the pegRNA- and sgRNA-programmed nicks to a greater extent than deletions between these nicks (fig. 45D-45E, 53B). In contrast, MLH1dn did not affect the distribution of deletion outcomes from non-editing pegRNAs that do not produce mismatches (FIG. 53C). In summary, this analysis supports a model in which non-programmed nicking by MutLα, followed by excision at the target locus, generates larger indel by-products during MMR of guided editing intermediates.
Finally, it was examined how MLH1dn affects pegRNA scaffold sequence incorporation and unintended flap reconnection, two categories of unintended outcome previously identified in the Repair-seq screens (FIGS. 41C-41D). Across the 84 pegRNAs tested in HEK293T cells described above, PE5 reduced the average frequency of these combined outcomes by 1.6-fold compared to PE3 (from 1.8% to 1.0% on average; FIGS. 45F, 53D). Consistent with the data from the PE2 and PE3+50 screens (FIGS. 41E-41F), these outcomes were rarer in the absence of complementary strand nicks (0.27% frequency for PE2 and 0.28% frequency for PE4; FIG. 53D), indicating that they typically arise from double-nicked intermediates. Thus, PE5 significantly reduced the size of unintended deletions compared to PE3 and reduced the frequency of pegRNA scaffold sequence incorporation and unintended flap reconnection.
Effect of MLH1dn on off-target genomic DNA changes
It was next evaluated whether manipulating MMR components affects off-target editing. PE2 and PE4 were tested in HEK293T cells using eight pegRNAs targeting the HEK3, EMX1, FANCF and HEK4 loci. The resulting genomic changes were measured at the top four Cas9 off-target sites identified by CIRCLE-seq for each targeted locus (Tsai et al, 2017). The average frequency of off-target guided editing remained very low with or without MLH1dn (0.094% using PE2 and 0.12% using PE4), while the average efficiency of on-target editing increased from 9.7% for PE2 to 20% for PE4 (FIGS. 45G and 53E). These data are consistent with previous reports indicating the high DNA specificity of guided editing (Anzalone et al, 2019; Jin et al, 2021; Kim et al, 2020) and show that MLH1dn does not significantly increase off-target guided editing.
Next, it was investigated whether transient inhibition of MMR with MLH1dn could induce genomic mutations independent of guided editor activity. In humans, MMR defects are risk factors for colorectal cancer and most commonly manifest as mutations that alter the length of repetitive microsatellite sequences within the genome (Fishel et al, 1993; Leach et al, 1993; Parsons et al, 1993). Microsatellite instability has been used clinically as a measure of MMR activity (Bacher et al, 2004; Umar et al, 2004) because errors in microsatellite replication are almost completely repaired by MMR (Strand et al, 1993; Tran et al, 1997).
Microsatellite instability was assessed by high-throughput sequencing of 17 mononucleotide repeats that have previously been demonstrated to be responsive to MMR inhibition and are used as clinical biomarkers of MMR deficiency (Hempelmann et al, 2015). These microsatellite loci were analyzed in HAP1 cells, HeLa cells and MMR-deficient HCT116 cells. HCT116 cells showed significantly shorter microsatellite lengths on average (13.9 nt) than HAP1 or HeLa cells (18.4 nt; FIGS. 45H, 53F). To determine whether this assay could also detect shorter-term MMR deficiency within the same cell type, microsatellite instability was compared between wild-type HAP1 cells and monoclonal HAP1 cells grown for 2 months (about 60 cell divisions) after MSH2 knockout. These MMR knockout cells exhibited an average decrease in microsatellite length of 0.24 nt (fig. 45H, fig. 53F), demonstrating that even recent MMR impairment can be detected through the accumulated erosion of microsatellite length.
To assess the effect of the transient MLH1dn expression used in the PE4 and PE5 systems, microsatellite instability was next measured in MMR-proficient HeLa cells 3 days after transfection with plasmids encoding either PE2 or PE4. Although MLH1dn increased guided editing efficiency at the target locus from 1.3% (PE2) to 7.6% (PE4) (FIG. 51G), there was no difference in average microsatellite length between PE2-treated and PE4-treated cells (difference < 0.01 nt; FIGS. 45H, 53F). These data indicate that transient MLH1dn expression can enhance guided editing without stimulating detectable instability at 17 microsatellites used for clinical diagnosis of MMR deficiency. Since microsatellite repeats are on average more susceptible to MMR inhibition than other parts of the genome (Strand et al, 1993; Tran et al, 1997), these findings indicate that PE4 and PE5 editing systems using transient MLH1dn expression (including the MLH1dn mRNA co-delivery described below) can enhance on-target editing without a substantial off-target mutation burden.
PEmax system with optimized architecture and synergy with engineered pegRNA
To further improve guided editing, the PE2 protein was optimized by varying RT codon usage, mutations within the SpCas9 domain, the length and composition of the peptide linker between nCas9 and RT, and the position, composition and number of NLS sequences (fig. 56A-56B). Among the 21 such variants tested, the greatest enhancement in editing efficiency was observed with a guided editor architecture that uses a human codon-optimized RT, a 34-aa linker containing a bipartite SV40 NLS (Wu et al, 2009), an additional C-terminal c-Myc NLS (Dang and Lee, 1988), and the R221K N394K mutations in SpCas9 previously shown to improve Cas9 nuclease activity (Spencer and Zhang, 2017) (fig. 46A, 56A). This optimized guided editor architecture is named PEmax (SEQ ID NO:99 as disclosed herein; see also the schematic diagram of FIG. 54A). PEmax outperformed other improved guided editor variants, including PE2* (Liu et al, 2021), which contains additional NLS sequences, and CMP-PE-V1 (Park et al, 2021), which contains high mobility peptides, at seven target sites tested in HeLa cells (fig. 56B-56D). Inserting high mobility peptides into PEmax (CMP-PEmax) did not further improve guided editing (fig. 56C-56D).
Across seven substitution edits targeting different loci, using the PEmax architecture with the PE2 system (PE2max) increased the average frequency of the intended edits 2.3-fold in HeLa cells and 1.2-fold in HEK293T cells compared to the original PE2 architecture (fig. 52B). Similarly, PE3 using the PEmax architecture (PE3max) increased average editing efficiency 2.8-fold in HeLa cells and 1.2-fold in HEK293T cells compared to PE3 (fig. 53B and 56C). PE3max also slightly reduced average edit:indel purity, by 1.2-fold in both cell types, which may reflect enhanced nickase activity from the Cas9 R221K and N394K mutations within the PEmax architecture (Spencer and Zhang, 2017).
It was also observed that PE4max (PE 4 using the PEmax architecture) increased the average editing frequency 1.9-fold in HeLa cells and 1.1-fold in HEK293T cells compared to PE4 (fig. 56E). Finally, PE5max (PE 5 using the PEmax architecture) also increased the average editing efficiency 2.2-fold in HeLa cells and 1.2-fold in HEK293T cells compared to PE 5.
In a separate improvement to guided editing, engineered pegRNAs (epegRNAs) were recently developed, which contain an additional 3' RNA structural motif that increases guided editing efficiency (Nelson et al, 2021) (fig. 46C). To assess whether the optimized PE4max and PE5max systems could act synergistically with epegRNAs, PE4max and PE5max were tested in combination with epegRNAs in HeLa and HEK293T cells. Across seven substitution edits targeting different loci, epegRNAs increased PE4max editing efficiency 2.5-fold on average in HeLa cells and 1.5-fold in HEK293T cells compared to normal pegRNAs (fig. 46B). Likewise, epegRNAs enhanced PE5max editing by 1.4-fold in HeLa cells and 1.1-fold in HEK293T cells compared to normal pegRNAs, without affecting edit:indel purity.
Combining all of the enhancements to the guided editing system described above (MLH1dn, PEmax and epegRNA) substantially improves guided editing performance. PE4max with epegRNA increased editing efficiency 72-fold on average in MMR-proficient HeLa cells and 3.5-fold in MMR-deficient HEK293T cells compared to PE2 with normal pegRNA (fig. 46B). Compared to PE3 with pegRNA, PE5max with epegRNA increased editing efficiency 12-fold on average in HeLa cells and 1.6-fold in HEK293T cells, and increased outcome purity 4.6-fold on average in HeLa cells and 3.3-fold in HEK293T cells. Taken together, these results demonstrate that combining the PE4/PE5, PEmax and epegRNA approaches can greatly improve guided editing efficiency and outcome purity.
Guided editing of disease-associated loci and cell types using PE4 and PE5
To determine the applicability of these improved editing systems, PE4max and PE5max were used to edit therapeutically relevant loci in wild-type HeLa and HEK293T cells. First, a silent G.C-to-A.T transition was installed at codon 6 of HBB, which is mutated in sickle cell disease patients (Ingram, 1956). Second, the G127V allele (G.C-to-T.A transversion) was installed in PRNP, which confers resistance to prion disease (Asante et al, 2015; Mead et al, 2009). Third, a silent C.G-to-T.A mutation was introduced at a CDKL5 site known to harbor pathogenic mutations that cause CDKL5 deficiency disorder, a severe neurodevelopmental disease (Olson et al, 2019). Fourth, the CXCR4 P191A allele (G.C-to-C.G edit), which inhibits HIV infection in human cells, was installed (Liu et al, 2018). Fifth, the IL2RB H134D Y F variant (non-adjacent T.A-to-A.T and G.C-to-C.G edits) was generated, which enables orthogonal IL-2 receptor responsiveness for adoptive T cell transfer therapy (Sockolosky et al, 2018). Finally, the BCL11A repressor binding site within the HBG1 and HBG2 fetal hemoglobin gene promoters was recoded into a GATA1 transcriptional activator motif (non-adjacent G.C-to-A.T and C.G-to-A.T edits), which in principle can induce fetal hemoglobin expression to treat hemoglobinopathies (Amato et al, 2014). The sequences are shown in table 4 below.
TABLE 4 sequence for directed editing of disease-related loci and cell types
Across these six disease-related edits, PE4max increased average guided editing efficiency 29-fold in HeLa cells and 2.1-fold in HEK293T cells compared to PE2 (fig. 46D). Notably, PE4max editing efficiency (8.6% editing and 0.19% indels in HeLa cells, 20% editing and 0.26% indels in HEK293T cells) was similar to or exceeded PE3 editing efficiency (4.5% editing and 1.5% indels in HeLa cells, 24% editing and 5.4% indels in HEK293T cells), with far fewer indels. Furthermore, compared to PE3, PE5max increased disease-related allele conversion by an average of 6.1-fold in HeLa cells and 1.5-fold in HEK293T cells, and increased edit:indel purity 6.4-fold in HeLa cells and 3.5-fold in HEK293T cells (fig. 46D). Taken together, these results demonstrate that PE4max and PE5max support higher guided editing performance than PE2 and PE3 at gene targets associated with human disease in common cell lines.
Next, the PE4 and PE5 editing systems were evaluated in cell models of genetic disease and in primary human cells. First, the pathogenic CDKL5 c.1412delA mutation was corrected in human induced pluripotent stem cells (iPSCs) derived from a heterozygous patient (Chen et al, 2021). Electroporation of these iPSCs with PE3 components (in vitro transcribed PE2 mRNA, synthetic pegRNA and nicking sgRNA) resulted in correction of 17% of editable pathogenic alleles and 20% total indel products (fig. 46E, fig. 56F). Co-electroporating these components with MLH1dn mRNA for PE5 editing increased the correction efficiency to 34% and reduced the indel frequency to 6.1%. To further reduce indels, the PE4 and PE5b strategies were also used. In the absence of complementary strand nicking, MLH1dn improved allele correction from 4.0% (PE2) to 10% (PE4) with few indels (< 0.34%) (FIGS. 46E, 56F). Likewise, PE3b resulted in 13% mutant allele editing and 4.8% indels, while PE5b increased editing to 27% with 3.8% indels. Across the PE4, PE5 and PE5b systems tested, MLH1dn increased the correction efficiency of the pathogenic CDKL5 c.1412delA mutation in patient-derived iPSCs by 2.2-fold and increased outcome purity 3.6-fold.
Next, primary T cells from healthy human donors were electroporated with PE2 mRNA, MLH1dn mRNA, synthetic pegRNA and nicking sgRNA to introduce the protective PRNP G127V mutation, a G.C-to-T.A transversion at FANCF, or a 1-bp insertion of T at RNF2. Across these three sites, PE5 achieved an average of 41% editing efficiency and 12% indels, a significant improvement over the 22% editing efficiency and 26% indels obtained using PE3 (fig. 46F, 56G). PE4 and PE5 were also used to install the protective CXCR4 P191A allele (Liu et al, 2018), which can prevent HIV infection, and the IL2RB H134D Y F variant (Sockolosky et al, 2018), which enables orthogonal IL-2 stimulation of T cells. For these edits, PE4 increased allele conversion frequency by an average of 3.6-fold compared to PE2, with few indel by-products (fig. 46F, fig. 56G). Furthermore, PE5 yielded an average of 52% editing and 11% indels, compared to an average of 44% editing and 17% indels using PE3, a 2.0-fold increase in the edit:indel ratio. In summary, these results across six sites in human iPSCs and primary T cells establish PE4 and PE5 as enhanced guided editing systems that can significantly improve editing efficiency and outcome purity in cell types relevant to the study and potential treatment of genetic diseases.
Discussion
Using pooled CRISPRi screens, MMR activity was found to strongly suppress the efficiency and outcome purity of substitution guided edits. Based on the results of the CRISPRi screens and an understanding of how MMR inhibits guided editing of nucleotide substitutions, the PE4 and PE5 systems were developed. In particular, the PE4 and PE5 systems co-express MLH1dn to transiently inhibit MMR, which enhances guided editing efficiency and reduces indels without causing substantial off-target genomic changes. Optimization of the guided editor protein resulted in the PEmax architecture, which works synergistically with the PE4 and PE5 systems and with epegRNAs (Nelson et al., 2021, incorporated herein by reference) to further enhance guided editing performance. Together, the model of guided editing DNA repair supported by these findings, the PE4 and PE5 strategies developed to circumvent this guided editing bottleneck, and the improved PEmax guided editor architecture described herein greatly enhance the usefulness of guided editing for precise manipulation of the genome.
This work shows that guided editing installs certain types of edits more efficiently, including G·C-to-C·G transversions and substitutions of three or more contiguous bases, because the corresponding guided editing intermediates are able to evade MMR. In addition to edit type, other properties may also affect the sensitivity of guided editing to MMR, such as the sequence context of the target site. The state of the edited locus may also be important, because MMR repairs early-replicating euchromatin (Supek and Lehner, 2015) and lagging-strand DNA during replication (Lujan et al., 2014) more efficiently.
While co-delivery of MLH1dn with a guided editor has been demonstrated to improve editing efficiency and precision, the findings presented herein on the role of MMR in determining guided editing outcomes, together with insight into the types of guided editing intermediates that MMR repairs, enable researchers to design guided editing experiments that evade MMR even without expressing MLH1dn. For example, the data provided herein demonstrate that strategically installing additional nearby silent mutations can improve guided editing outcomes by avoiding MMR reversal of guided editing intermediates, even with the PE2 or PE3 systems. Other approaches to MMR suppression may also benefit guided editing. Although small molecules that selectively target MMR have not been reported, chemical inhibitors could be used in applications where MLH1dn delivery is limiting and could provide temporal control of MMR inhibition. RNA interference offers another strategy for transient MMR knockdown in applications, such as viral delivery, in which expression of the guided editor may persist for long periods.
The PE4 and PE5 systems strongly enhanced guided editing in the seven mammalian cell types tested, including patient-derived iPSCs and primary T cells. In MMR-proficient cell types, PE4 improved editing efficiency over PE2 by an average of 7.7-fold, achieving substantial levels of gene editing without the DSBs that can arise from complementary-strand nicking. This improvement, in combination with the PEmax architecture and epegRNAs, can exceed the editing frequency of PE3 while maintaining few indel byproducts. Thus, PE4max with epegRNAs is particularly useful for gene editing applications that cannot tolerate indel formation or that are limited by delivery of a nicking sgRNA. By comparison, in cells with active MMR (including most cellular targets), PE5 improved editing efficiency by an average of 2.0-fold over PE3 and improved outcome purity by about 3-fold.
Experimental model and subject details
Culture conditions for immortalized cell lines
HEK293T, HeLa dCas9-BFP-KRAB, HeLa, HCT116 and N2A cells were cultured in Dulbecco's Modified Eagle's Medium (DMEM) plus GlutaMAX (Thermo Fisher Scientific) supplemented with 10% fetal bovine serum (FBS) (Thermo Fisher Scientific). HeLa dCas9-BFP-KRAB cells were cultured in DMEM plus GlutaMAX supplemented with 10% FBS, 100 U mL⁻¹ penicillin and 100 μg mL⁻¹ streptomycin (Thermo Fisher Scientific). K562 dCas9-BFP-KRAB and K562 cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium (Thermo Fisher Scientific) supplemented with 10% FBS and 1% penicillin-streptomycin-glutamine 100× (Thermo Fisher Scientific). All HAP1 cell types were cultured in Iscove's Modified Dulbecco's Medium (IMDM) plus GlutaMAX (Thermo Fisher Scientific) supplemented with 10% FBS. U2OS cells were cultured in McCoy's 5A medium (Gibco) supplemented with 10% FBS, 100 U mL⁻¹ penicillin and 100 μg mL⁻¹ streptomycin (Thermo Fisher Scientific). The HeLa dCas9-BFP-KRAB and K562 dCas9-BFP-KRAB cell lines were authenticated by short tandem repeat marker testing. All cell types were passaged every 2-3 days, maintained below 80% confluency, cultured at 37 °C with 5% CO₂, and tested negative for mycoplasma.
Isolation of primary human T cells
Peripheral blood mononuclear cells (PBMCs) were isolated from the buffy coats of healthy donors (Memorial Blood Centers, St. Paul, Minnesota) by density centrifugation using Lymphoprep density gradient medium (STEMCELL Technologies) and SepMate tubes (STEMCELL Technologies). T cells were isolated from PBMCs using the EasySep human T cell isolation kit (STEMCELL Technologies).
Culture conditions for induced pluripotent stem cells derived from human patients
All iPSC culture was performed by staff of the Human Neuron Core at Boston Children's Hospital according to institutional guidelines and with institutional approval (IRB #: P00016119). The clonal iPS cell line MAN0855-01#A (Coriell Institute #OR00007) was derived from a female patient with CDKL5 deficiency carrying a heterozygous CDKL5 c.1412delA (p.D471fs) mutation on the X chromosome (Chen et al., 2021). Expression of the mutant CDKL5 transcript in MAN0855-01#A was previously verified by Sanger sequencing of cDNA. The MAN0855-01#A iPS cell line was cultured in StemFlex medium (Thermo Fisher Scientific) on plates coated with Geltrex (Thermo Fisher Scientific) diluted 1:50 in DMEM/F12 (Thermo Fisher Scientific) according to the manufacturer's protocol. For routine maintenance, iPS cell colonies were passaged at 80% confluence every 5-7 days using Gentle Cell Dissociation Reagent (STEMCELL Technologies).
Method details
General methods and molecular cloning
Lentiviral transfer plasmids and plasmids for mammalian expression of guided editors and other proteins were cloned using uracil excision (USER) assembly (Cavaleiro et al., 2015). Briefly, DNA fragments were amplified with deoxyuracil-containing primers (Integrated DNA Technologies) using the uracil-tolerant Phusion U Green Multiplex PCR Master Mix (Thermo Fisher Scientific). The deoxyuracil-incorporated DNA fragments were assembled with USER enzyme (New England BioLabs) and DpnI (New England BioLabs) according to the manufacturer's protocol, using junctions with melting temperatures of 42 °C to 60 °C, and then transformed into cells. All guided editor constructs were cloned into the pCMV-PE2 vector backbone (Addgene #132775) for constitutive expression from the CMV promoter (Anzalone et al., 2019). All guided editor constructs also contained the following mutations within the MMLV RT: D200N, T306K, W313F, T330P and L603W. All DNA repair protein and RFP expression constructs were cloned into vectors for constitutive expression from the EF1α promoter. Human MSH2, MSH6, PMS2 and MLH1 sequences were subcloned from plasmids pFB1_hMSH2 (Addgene #129423), pFB1_hMSH6 (Addgene #129424), pFB1_PMS2 (Addgene #129425) and pFB1_MLH1 (Addgene #129426) (Geng et al., 2011). The human CDKN1A sequence was subcloned from plasmid Flag p21 WT (Addgene #16240) (Zhou et al., 2001). Codon-optimized MLH1 sequences for expression in human and mouse cells were designed using GenSmart codon optimization (GenScript) and ordered as gBlock gene fragments (Integrated DNA Technologies).
Plasmids for mammalian expression of pegRNAs or sgRNAs were cloned by Golden Gate assembly (Engler et al., 2008) as previously described (Anzalone et al., 2019). Briefly, the guide RNA vector backbone for expression from the human U6 promoter was digested overnight with BsaI-HFv2 (New England BioLabs) according to the manufacturer's protocol, and the linearized product was purified by 1% agarose gel electrophoresis using the QIAquick gel extraction kit (QIAGEN). Oligonucleotides (Integrated DNA Technologies) for the spacer sequence, guide RNA scaffold and 3′ extension were annealed, assembled with the linearized U6 backbone DNA using T4 DNA ligase (New England BioLabs) according to the manufacturer's protocol, and transformed into cells. Only the guide RNA scaffold oligonucleotides were purchased with 5′ phosphorylation modifications. Some plasmids encoding pegRNAs and epegRNAs were synthesized by Twist Bioscience. Table 7 provides a list of the pegRNAs and nicking sgRNAs used in this work.
Unless otherwise stated, assembled plasmids were transformed into One Shot Mach1 cells (Thermo Fisher Scientific) and plated on Luria-Bertani (LB) or 2×YT agar containing 50 μg mL⁻¹ carbenicillin (Gold Biotechnology). Plasmid sequences were fully verified by Sanger sequencing (Quintara Biosciences), and bacteria carrying verified plasmids were grown in 2×YT medium containing 100 μg mL⁻¹ carbenicillin (Gold Biotechnology). Plasmid DNA was isolated using the QIAGEN Plasmid Plus Midi kit or the QIAGEN Plasmid Plus Maxi kit with endotoxin removal, with twice the recommended amount of RNase A added to buffer P1. Some pegRNA and sgRNA plasmid DNA was isolated using the PureYield Plasmid Miniprep system (Promega Corporation) with endotoxin removal. Plasmid DNA purified using the PureYield Plasmid Miniprep system was used only for transfection of HEK293T and HeLa cells. All plasmids were eluted in nuclease-free water (QIAGEN) and quantified using a NanoDrop One UV-Vis spectrophotometer (Thermo Fisher Scientific).
Lentivirus production for generating cell lines
To package lentiviruses for generating stable cell lines, HEK293T cells were seeded at 7.5 × 10⁵ cells per well on 6-well plates (Corning) in DMEM supplemented with 10% FBS. At 16 hours after seeding, cells at 60% confluence were transfected with 12 μL Lipofectamine 2000 (Thermo Fisher Scientific) and 1.33 μg lentiviral transfer plasmid, 0.67 μg pMD2.G (Addgene #12259) and 1 μg psPAX2 (Addgene #12260) according to the manufacturer's protocol. At 6 hours after transfection, the medium was replaced with DMEM supplemented with 10% FBS. At 48 hours after transfection, the viral supernatant was centrifuged at 3,000 g for 15 minutes to remove cell debris, filtered through a 0.45 μm PVDF filter (Corning) and stored at −80 °C.
Design and construction of HEK293T line with integrated HBB coding region
A lentiviral transfer plasmid was designed and cloned to contain the human HBB coding sequence (CDS) and a PuroR-T2A-BFP marker expressed from the EF1α promoter (pEF1α). Lentivirus carrying this cassette was produced from HEK293T cells as described above. To stably integrate the HBB CDS, 6 × 10⁵ HEK293T cells were infected with the lentivirus on plates (Corning) in DMEM supplemented with 10% FBS and 10 μg mL⁻¹ polybrene (Sigma-Aldrich). BFP fluorescence was monitored daily using a CytoFLEX S flow cytometer (Beckman Coulter) to ensure an MOI of 0.1 and single-copy integration. Two days after infection, HEK293T cells were selected with 2 μg μL⁻¹ puromycin (Thermo Fisher Scientific) for 3 days, and stable transduction was confirmed by measuring BFP fluorescence. The resulting cell line was used to optimize pegRNAs for guided editing with SaPE2 at the integrated HBB site. To measure editing, a 214-bp amplicon of the integrated HBB CDS locus was amplified by PCR. Amplification of the endogenous genomic HBB locus with the same primers yields a 1,064-bp amplicon, which is distinguishable by size.
Design and construction of HeLa lines with CRISPRi sgRNAs and guided editing target sites
The lentiviral transfer plasmid backbone (pPC1000) for the guided editing Repair-seq screens was designed and cloned to contain a universal guided editing site and to express a control Streptococcus pyogenes sgRNA for CRISPRi targeting. The guided editing site consists of an HBB-targeting protospacer for the Sa-pegRNA, flanked by complementary-strand Sa-sgRNA protospacers derived from the Saccharomyces cerevisiae genome. The protospacers are positioned such that the SaPE2-sgRNA complex nicks 50 bp upstream or 50 bp downstream of the nick made by the SaPE2-pegRNA complex. The 234-bp editing site was placed near the EGFP-targeting control S. pyogenes sgRNA sequence expressed from the mouse U6 promoter, so that the sgRNA and the editing site could be amplified by PCR in the same 453-bp amplicon. pPC1000 also contains a pEF1α-PuroR-T2A-BFP selectable marker.
Lentivirus encoding the pPC1000 cassette was produced from HEK293T cells as described above. For stable integration, 2.5 × 10⁵ HeLa dCas9-BFP-KRAB cells were infected with pPC1000 lentivirus on 6-well plates (Corning) in DMEM supplemented with 10% FBS and 10 μg mL⁻¹ polybrene (Sigma-Aldrich). BFP fluorescence was monitored using a CytoFLEX S flow cytometer (Beckman Coulter) to ensure an MOI of 0.1 and single-copy integration. Two days after infection, cells were selected with 2 μg μL⁻¹ puromycin (Thermo Fisher Scientific) for 3 days, and stable transduction was confirmed by measuring BFP fluorescence. The resulting HeLa dCas9-BFP-KRAB cell line with the integrated pPC1000 sequence was used to test guided editing conditions, Sa-pegRNAs and Sa-sgRNAs for the Repair-seq screens.
Transfection of HEK293T, heLa, HCT116 and N2A cells
Unless otherwise indicated, HEK293T cells were seeded at 1.6-1.8 × 10⁴ cells per well on 96-well plates (Corning) in DMEM plus GlutaMAX supplemented with 10% FBS. At 16 to 24 hours after seeding, cells at 60%-80% confluency were transfected with 0.5 μL Lipofectamine 2000 (Thermo Fisher Scientific) and 200 ng guided editor plasmid, 66 ng pegRNA plasmid, 22 ng sgRNA plasmid (if needed) and 100 ng plasmid for RFP or MMR protein expression (if needed), according to the manufacturer's protocol.
For arrayed experiments, HeLa dCas9-BFP-KRAB and HeLa cells were seeded at 8 × 10³ cells per well on 96-well plates (Corning) in DMEM plus GlutaMAX supplemented with 10% FBS. At 16 to 24 hours after seeding, cells at 60%-80% confluency were transfected with 0.3 μL TransIT-HeLa reagent (Mirus Bio) and 56.25 ng guided editor plasmid with a P2A-BlastR selection marker, 18.75 ng pegRNA plasmid, 6.25 ng sgRNA plasmid (if needed) and 28.1 ng human codon-optimized MLH1dn plasmid (if needed), according to the manufacturer's protocol. At 24 hours after transfection, 10 ng μL⁻¹ blasticidin (Thermo Fisher Scientific) was added to each well to select for cells expressing the guided editor.
HCT116 cells were seeded at 1.6 × 10⁴ cells per well on 96-well plates (Corning) in DMEM plus GlutaMAX supplemented with 10% FBS. At 16 to 20 hours after seeding, cells at 60%-80% confluency were transfected with 0.5 μL Lipofectamine 3000 plus 0.8 μL P3000 reagent (Thermo Fisher Scientific) and 200 ng guided editor plasmid with a P2A-BlastR selectable marker, 66 ng pegRNA plasmid, 22 ng sgRNA plasmid (if needed) and 100 ng MLH1dn plasmid (if needed), according to the manufacturer's protocol. The day after transfection, the medium was replaced with fresh DMEM plus GlutaMAX supplemented with 10% FBS and 10 ng μL⁻¹ blasticidin (Thermo Fisher Scientific) to select for cells expressing the guided editor.
N2A cells were seeded at 1.6 × 10⁴ cells per well on 96-well plates (Corning) in DMEM plus GlutaMAX supplemented with 10% FBS. At 16 to 20 hours after seeding, cells at 60%-80% confluency were transfected with 0.5 μL Lipofectamine 2000 (Thermo Fisher Scientific) and 175 ng guided editor plasmid, 50 ng pegRNA plasmid, 20 ng sgRNA plasmid (if needed) and 87.5 ng plasmid encoding human or mouse codon-optimized hMLH1dn (if needed), according to the manufacturer's protocol. Genomic DNA was extracted 72 hours after transfection.
Electroporation of HAP1, K562 and U2OS cells
HAP1 cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit S (Lonza) according to the manufacturer's protocol, with 4 × 10⁵ cells (program DZ-113), 300 ng PE2-P2A-BSD plasmid, 100 ng pegRNA plasmid and 33 ng sgRNA plasmid (if needed). After nucleofection, cells were cultured in 48-well plates (Corning) in IMDM plus GlutaMAX supplemented with 10% FBS. The day after nucleofection, the medium was replaced with fresh IMDM plus GlutaMAX supplemented with 10% FBS and 10 ng μL⁻¹ blasticidin (Thermo Fisher Scientific) to select for cells expressing the guided editor.
K562 cells were nucleofected using the SF Cell Line 4D-Nucleofector X Kit S (Lonza) according to the manufacturer's protocol, with 5 × 10⁵ cells (program FF-120), 800 ng guided editor plasmid, 200 ng pegRNA plasmid, 83 ng sgRNA plasmid (if needed) and 400 ng MLH1dn plasmid (if needed). After nucleofection, cells were cultured in 6-well plates (Corning) in RPMI 1640 medium supplemented with 10% FBS and 292 μg mL⁻¹ L-glutamine (Thermo Fisher Scientific).
U2OS cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit S (Lonza) according to the manufacturer's protocol, with 2 × 10⁵ cells (program DZ-100), 160 ng PE2 or PE2-P2A-MLH1dn plasmid, 400 ng pegRNA plasmid and 166 ng sgRNA plasmid (if needed). After nucleofection, cells were cultured in 12-well or 24-well plates (Corning) in McCoy's 5A medium supplemented with 10% FBS.
Genomic DNA extraction
Unless otherwise indicated, HEK293T, HeLa dCas9-BFP-KRAB, HeLa, HCT116, N2A, HAP1, K562 and U2OS cells were cultured for 72 hours after transfection or nucleofection before genomic DNA was isolated. Cells were washed once with PBS (Thermo Fisher Scientific) and lysed in gDNA lysis buffer (10 mM Tris-HCl, pH 8.0; 0.05% SDS; 800 μL⁻¹ proteinase K (New England BioLabs)) at 37 °C for 1.5-2 hours, followed by enzyme inactivation at 80 °C for 30 minutes.
High throughput amplicon sequencing of genomic DNA samples
To assess gene editing, loci were amplified from genomic DNA samples by two rounds of PCR and deep sequenced. Briefly, the initial PCR step (PCR1) amplified the genomic sequence of interest using primers (Integrated DNA Technologies) containing Illumina forward and reverse adapters. Each 20 μL PCR1 reaction was performed on a CFX96 Touch real-time PCR detection system (Bio-Rad Laboratories) using 500 nM of each primer, 0.8 to 1.0 μL genomic DNA, 1× SYBR Green (Thermo Fisher Scientific) and 10 μL Q5 High-Fidelity 2× Master Mix (New England BioLabs), with the following thermal cycling conditions: 98 °C for 2 minutes; 29-31 cycles of [98 °C for 10 seconds, 61 °C for 20 seconds, 72 °C for 30 seconds]; then 72 °C for 2 minutes. PCR1 reactions were monitored by SYBR Green fluorescence to avoid over-amplification. Table 8 provides a list of the primers used in PCR1 reactions.
TABLE 8 primers for PCR1 reaction
The subsequent PCR step (PCR2) added unique combinations of i7 and i5 Illumina barcodes to both ends of the PCR1 DNA fragments to enable sample multiplexing. Each 12.5 μL PCR2 reaction was performed using 500 nM of each barcoding primer, 0.5 μL PCR1 product and 6.25 μL Phusion U Green Multiplex PCR Master Mix (Thermo Fisher Scientific), with the following thermal cycling conditions: 98 °C for 2 minutes; 9 cycles of [98 °C for 15 seconds, 61 °C for 20 seconds, 72 °C for 30 seconds]; then 72 °C for 2 minutes. PCR2 products were pooled by common amplicon, separated by electrophoresis on a 1% agarose gel, purified using the QIAquick gel extraction kit (QIAGEN) and eluted in nuclease-free water. DNA amplicon libraries were quantified using a Qubit 3.0 fluorometer (Thermo Fisher Scientific) and then sequenced using MiSeq Reagent Kit v or MiSeq Reagent Micro Kit v2 (Illumina) with 280-300 single-read cycles. Table 7 provides a list of the FASTQ sequencing files generated in this work.
Quantification of amplicon sequencing data
The guided editing experiments in FIGs. 42A-54B were analyzed as follows. Sequencing reads were demultiplexed using MiSeq Reporter (Illumina). Amplicon sequences were aligned to a reference sequence using CRISPResso2 in standard mode with the parameters "-q 30" and "-discard_indel_reads TRUE" (Clement et al., 2019). For each amplicon, the CRISPResso2 quantification window spanned the entire sequence between the pegRNA-directed and sgRNA-directed Cas9 cut sites, plus an additional ≥10 nt beyond both cut sites. For each amplicon, the same quantification window was used for the PE2, PE3, PE4 and PE5 conditions, whether or not a nicking sgRNA was used. All guided editing efficiencies describe the percentage (number of reads with the intended edit and no indels)/(number of reads aligned to the amplicon). For single-base substitutions, guided editing frequency was quantified as: (frequency of the intended base substitution among reference-aligned, non-discarded reads) × (number of reference-aligned, non-discarded reads)/(number of reference-aligned reads). For all other guided edits (insertions, deletions, contiguous substitutions and combination edits), CRISPResso2 was run in HDR mode with the same parameters described above, using the intended editing outcome as the expected allele (-e). The frequency of these edits was quantified as: (number of HDR-aligned reads)/(number of reference-aligned reads). All indel frequencies were quantified as: (number of reads containing indels)/(number of reference-aligned reads).
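The frequency definitions above can be illustrated with a short Python sketch. It assumes that the relevant read counts have already been extracted from the alignment output; the function and parameter names are hypothetical and are not part of CRISPResso2.

```python
def substitution_edit_frequency(sub_fraction, n_kept, n_ref_aligned):
    """Single-base substitution frequency.

    sub_fraction:  fraction of reference-aligned, non-discarded reads that
                   carry the intended base substitution
    n_kept:        number of reference-aligned, non-discarded reads
    n_ref_aligned: number of all reference-aligned reads
    """
    if n_ref_aligned <= 0:
        raise ValueError("no reference-aligned reads")
    return sub_fraction * n_kept / n_ref_aligned


def hdr_mode_edit_frequency(n_hdr_aligned, n_ref_aligned):
    """Frequency of other edits, written literally as stated in the text:
    (HDR-aligned reads) / (reference-aligned reads)."""
    if n_ref_aligned <= 0:
        raise ValueError("no reference-aligned reads")
    return n_hdr_aligned / n_ref_aligned


def indel_frequency(n_indel_reads, n_ref_aligned):
    """Indel frequency: (reads containing indels) / (reference-aligned reads)."""
    if n_ref_aligned <= 0:
        raise ValueError("no reference-aligned reads")
    return n_indel_reads / n_ref_aligned


# Illustrative numbers only (not data from this work):
example = substitution_edit_frequency(0.5, 9000, 10000)  # 0.45
```

Multiplying the substitution fraction by the share of non-discarded reads rescales the frequency so that indel-containing reads, which are discarded before substitution counting, remain in the denominator.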
CRISPRi library cloning and lentivirus production
The CRISPRi sgRNA oligonucleotide library oBA697 was designed to contain 60 non-targeting control sgRNAs and 1,513 sgRNAs targeting 477 DNA repair genes, with 3 sgRNAs per gene (Hussmann et al., 2021). Table 5 provides a list of the targeted genes and the sequences in the sgRNA library. The oBA697 oligonucleotide library was amplified by PCR using Phusion high-fidelity DNA polymerase (New England BioLabs) and purified using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel). The amplified CRISPRi library insert and the pPC1000 lentiviral screening vector containing the pre-validated guided editing site were digested with BstXI and BlpI restriction endonucleases (Thermo Fisher Scientific), ligated with T4 ligase (New England BioLabs) and then transformed into Stellar cells (Takara Bio). The pooled plasmid library was amplified in MegaX DH10B electrocompetent cells (Thermo Fisher Scientific). The pooled library sequences were further verified by sequencing with MiSeq Reagent Kit v (Illumina).
To produce lentivirus from the pPC1000 library, HEK293T cells were seeded in 15 cm dishes containing DMEM supplemented with 10% FBS. One day after seeding, cells were transfected with 60 μL TransIT-LT1 reagent (Mirus Bio), 15 μg of the pPC1000 plasmid library and 5 μg of packaging plasmids expressing HIV-1 gag/pol, rev, tat and the VSV-G envelope protein. At 24 hours after transfection, 40 μL ViralBoost reagent (ALSTEM) was added to each 15 cm dish. The viral supernatant was collected 48 hours after transfection, filtered through a 0.45 μm PVDF filter (Corning) and stored at −80 °C.
Repair-seq screening in HeLa cells
Biological replicates of the PE2, PE3+50 and PE3−50 Repair-seq screens were performed in HeLa cells with integrated dCas9-BFP-KRAB (Addgene #46911) (Gilbert et al., 2013). Lentivirus containing the CRISPRi library was produced from HEK293T cells as described above. HeLa CRISPRi cells expressing dCas9-KRAB were transduced with the lentiviral library at an MOI of 0.1 in DMEM supplemented with 10% FBS, 100 U mL⁻¹ penicillin, 100 μg mL⁻¹ streptomycin (Thermo Fisher Scientific) and 8 μg mL⁻¹ polybrene (Sigma-Aldrich). Two days after infection, HeLa CRISPRi cells were treated with 1 μg mL⁻¹ puromycin (Thermo Fisher Scientific) to select for cells with integrated library members. Three days after infection, 2 μg mL⁻¹ puromycin was added to the cells. Throughout the lentiviral transduction and selection steps, cells were analyzed for BFP fluorescence on a BD LSRII flow cytometer to ensure an MOI of 0.1 and complete selection. After 3 days of selection, the medium was replaced with DMEM supplemented with 10% FBS, and HeLa CRISPRi cells at 50% confluence were transfected in 150 mm dishes (Corning) with 30 μg SaPE2-P2A-BSD plasmid, 10 μg Sa-pegRNA plasmid for installing a +6 G·C-to-C·G edit at the pre-validated editing site, and 3.3 μg Sa-sgRNA plasmid for +50 or −50 complementary-strand nicking (if needed), using 140 μL TransIT-HeLa reagent (Mirus Bio) according to the manufacturer's protocol. For the unedited HeLa control condition, cells were transfected with only 30 μg SaPE2-P2A-BSD plasmid as described above. At 24 hours after transfection, cells were treated with 10 μg mL⁻¹ blasticidin (Thermo Fisher Scientific) to select for cells expressing the SaPE2 protein. At 72 hours after transfection, HeLa CRISPRi cells were washed with PBS (Thermo Fisher Scientific), resuspended with trypsin and DMEM, and pelleted at 1,000 g for 10 minutes. Finally, the cells were washed again with PBS, pelleted at 1,000 g for 10 minutes, and then stored at −80 °C.
Repair-seq screening in K562 cells
Biological replicates of the PE2 and PE3+50 Repair-seq screens were performed in K562 cells with integrated dCas9-BFP-KRAB (Addgene #46911) (Gilbert et al., 2014). Lentivirus containing the CRISPRi library was produced from HEK293T cells as described above. K562 CRISPRi cells expressing dCas9-KRAB were transduced with the lentiviral library at an MOI of 0.2 in RPMI supplemented with 10% FBS, 100 U mL⁻¹ penicillin, 100 μg mL⁻¹ streptomycin, 292 μg mL⁻¹ L-glutamine and 8 μg mL⁻¹ polybrene (Sigma-Aldrich) by centrifugation at 1,000 g for 2 hours at room temperature. Two days after infection, K562 CRISPRi cells were treated with 3 μg mL⁻¹ puromycin (Thermo Fisher Scientific) to select for cells with integrated library members. After infection, K562 CRISPRi cells were maintained at a density of about 5 × 10⁵ cells mL⁻¹, and at 3 and 5 days after infection the medium was replaced with fresh RPMI supplemented with 10% FBS, 100 U mL⁻¹ penicillin, 100 μg mL⁻¹ streptomycin, 292 μg mL⁻¹ L-glutamine and 3 μg mL⁻¹ puromycin. During medium exchanges, cells were pelleted, washed with DPBS and resuspended in fresh medium to remove dead cells. All of these centrifugation steps were performed at 150 g for 5 minutes in standard 50 mL tubes to avoid cell loss. Six days after infection, the medium was exchanged twice with fresh RPMI supplemented with 10% FBS and 292 μg mL⁻¹ L-glutamine (Thermo Fisher Scientific) to remove dead cells and antibiotics. Throughout the lentiviral transduction and selection steps, cells were analyzed for BFP fluorescence on an Attune NxT flow cytometer to ensure an MOI of 0.2 and complete selection.
Seven days after infection, cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit L (Lonza) with 1 × 10⁷ cells (program FF-120), 7.5 μg SaPE2 plasmid, 2.5 μg Sa-pegRNA plasmid for installing a +6 G·C-to-C·G edit at the pre-validated editing site, and 3.3 μg Sa-sgRNA plasmid for +50 complementary-strand nicking (for the PE3+50 condition). For the unedited K562 control condition, cells were mock electroporated as described above without any plasmid DNA. After nucleofection, cells were seeded at a density of 5 × 10⁵ cells mL⁻¹ in RPMI supplemented with 10% FBS and 292 μg mL⁻¹ L-glutamine. At 48 hours after nucleofection, the cultures were pipetted up and down 5 times to prevent cell clumping. At 84 hours after nucleofection, cells were pelleted at 1,000 g for 10 minutes, washed with DPBS, pelleted again at 1,000 g for 10 minutes, and then stored at −80 °C.
High throughput sequencing of Repair-seq library
Genomic DNA was extracted from edited HeLa CRISPRi and K562 CRISPRi cells using the NucleoSpin Blood XL Maxi kit (Macherey-Nagel). Table 9 lists the number of live cells from which genomic DNA was extracted for each Repair-seq condition.
TABLE 9 high throughput sequencing of Repair-seq library
Genomic DNA was used as the template for the first round of PCR (PCR1) to amplify the region of interest. For each sample, 100 μL PCR1 reactions were performed on a Bio-Rad C1000 thermal cycler using 1 μL of each 100 μM primer for amplifying the pPC1000 sgRNA and editing site (Table 8), 10 μg genomic DNA as template, 50 μL NEBNext Ultra II Q5 Master Mix (New England BioLabs) and molecular biology grade water (Corning), with the following thermal cycling conditions: 98 °C for 30 seconds; 22 cycles of [98 °C for 10 seconds, 65 °C for 75 seconds]; then 65 °C for 5 minutes. These amplification reactions were verified by TBE gel electrophoresis and ethidium bromide staining. The amplified product was purified using SPRIselect (Beckman Coulter) before the next round of PCR amplification (PCR2). Purified PCR1 amplicons were quantified using a high sensitivity DNA chip (Agilent Technologies) on an Agilent 2100 Bioanalyzer. PCR2 enabled sample indexing by adding i7 and i5 Illumina barcodes. Each 50 μL indexing PCR2 reaction used 10 ng PCR1 amplicon as template, 25 μL KAPA HiFi HotStart ReadyMix (Roche Molecular Systems) and 3 μL of each 10 μM barcoding primer, with the following thermal cycling conditions: 95 °C for 3 minutes; 8 cycles of [98 °C for 20 seconds, 65 °C for 15 seconds, 72 °C for 15 seconds]; then 72 °C for 1 minute. The reactions were verified by TBE gel electrophoresis and ethidium bromide staining. Products from four PCR2 reactions were purified using SPRIselect and quantified on an Agilent 2100 Bioanalyzer before pooling. Repair-seq libraries were sequenced using NovaSeq 6000 S1 Reagent Kit v1.5 (Illumina) with two 8-nt index reads, 44 cycles for the R1 read and 263 cycles for the R2 read. Table 9 lists the number of sequencing reads obtained for each screening condition and replicate.
Processing of Repair-seq screening data
Repair-seq screening data were processed using a modified version of the analysis approach described in Hussmann et al., 2021. The approach was adapted to accommodate the different library preparation strategy used in this example (direct amplification of genomic DNA, without UMI ligation before amplification) and the different nature of the repair outcome categories observed empirically in the guided editing data.
Briefly, sequencing data for a batch of screens consisted of 4 reads per cluster: two 8-nt index reads, a 44-nt R1 read of the CRISPRi sgRNA, and a 263-nt R2 read of the repair outcome. Reads from a batch of screens were demultiplexed into individual screens based on the index reads. Within each screen, reads were demultiplexed into groups representing outcomes from cells that received each individual CRISPRi sgRNA by comparing the R1 sequence with a table of expected CRISPRi sgRNAs, allowing at most one mismatch between the observed and expected sequences. Because direct amplification without UMIs does not permit consensus error correction across multiple reads of each repair outcome, the analysis must account for errors introduced into outcome sequences by PCR or sequencing, so that such errors are not interpreted as true editing outcomes. As an initial filter, reads in which fewer than 60% of base calls had a quality score of at least 30 were discarded.
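The quality filter and the one-mismatch sgRNA assignment described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this work: the function names are hypothetical, the R1 sequence is assumed to have been trimmed to the same length as the expected sgRNA sequences, and returning no assignment for ambiguous matches is an assumption of this sketch.

```python
def passes_quality_filter(quals, min_q=30, min_fraction=0.6):
    """Keep a read only if at least min_fraction of its base calls
    have a quality score of at least min_q."""
    if not quals:
        return False
    good = sum(q >= min_q for q in quals)
    return good / len(quals) >= min_fraction


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sgrna(r1, expected, max_mismatches=1):
    """Return the name of the single expected sgRNA within max_mismatches
    of r1, or None if there is no match or more than one match
    (the handling of ambiguous matches is an assumption of this sketch)."""
    hits = [name for name, seq in expected.items()
            if len(seq) == len(r1) and hamming(r1, seq) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Toy sequences for illustration only (not sequences from the library).
expected_table = {"sgA": "ACGTACGT", "sgB": "TTTTCCCC"}
assigned = assign_sgrna("ACGTACGA", expected_table)  # one mismatch to sgA
```

A production pipeline would typically precompute every one-mismatch neighbor of each expected sequence in a lookup table rather than scanning the whole table for each read.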
To classify the repair outcome sequencing reads that passed this quality filter, each outcome was first locally aligned to the screening vector, the pegRNA sequence, the human genome (hg19) and the bovine genome (bosTau7) to produce a comprehensive set of local alignments between portions of the outcome sequence and any of these reference sequences. The set of local alignments identified was then pruned to a reduced set using a greedy approach that explains as much of the read as possible with as few alignments as possible. The reduced alignments were then passed through a decision tree that examines their configuration to assign the outcome to a category.
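One simple way to picture the greedy pruning step is as an interval-cover problem on the read. The Python sketch below is an illustration under that assumption only: alignments are reduced to half-open (start, end) intervals on the read, and the function name and selection rule are hypothetical rather than the actual implementation used in this work.

```python
def greedy_reduce(alignments):
    """Greedily pick alignments that cover the most not-yet-covered read bases.

    alignments: list of (start, end) half-open intervals on the read.
    Returns (chosen alignments in selection order, number of bases covered).
    """
    covered = set()
    chosen = []
    remaining = list(alignments)
    while remaining:
        # Pick the alignment that adds the most new read positions.
        best = max(remaining, key=lambda iv: len(set(range(*iv)) - covered))
        gain = len(set(range(*best)) - covered)
        if gain == 0:
            break  # nothing left adds new coverage
        chosen.append(best)
        covered |= set(range(*best))
        remaining.remove(best)
    return chosen, len(covered)


# Toy example on a 263-nt read: two alignments explain the whole read,
# so the two redundant alignments are pruned.
kept, n_covered = greedy_reduce([(0, 120), (100, 263), (110, 130), (0, 50)])
```

Greedy selection does not guarantee the smallest possible set in every case, but it is a common and inexpensive heuristic for this kind of covering problem.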
If an outcome consists of a single alignment to the screening vector with neither the programmed SNV nor any indels, it is classified as unedited; the exception is that 1-nt deletions that do not fall within 5 nt of the programmed cut are considered possible sequencing or PCR errors and are discarded. An outcome is classified as a deletion if it consists of two alignments to the screening vector that together cover the entire read but omit a portion of the screening vector. An outcome is classified as a tandem duplication if it consists of two or more alignments to the screening vector that together cover the entire read, such that any two alignments that are consecutive on the read cover partially overlapping portions of the screening vector. If the reduced alignment set includes an alignment to the pegRNA in which the primer binding site (PBS) of the pegRNA aligns to the same portion of the read as the PBS in the screening vector alignment, but the reverse transcription template (RTT) of the pegRNA does not align to the same portion of the read as the RTT in the screening vector alignment, the outcome is classified as joining of pegRNA sequence at an unintended position. Note that in some such cases, the outcome sequence is also theoretically consistent with a multi-step editing event consisting of an initial deletion or duplication that does not disrupt the PAM or protospacer, followed by pegRNA-dependent editing of the resulting modified target sequence. If the reduced alignment set includes an alignment to the pegRNA in which the PBS and RTT align to the same portion of the read as the PBS and RTT in a single screening vector alignment covering the entire read, and the pegRNA alignment contains fewer edits relative to the outcome than the screening vector alignment does, the outcome is classified as installing additional edits from closely matching scaffold sequence.
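The deletion and tandem duplication rules above can be illustrated with a small classifier over screening vector alignments. This is a simplified sketch under stated assumptions: coordinates are half-open, the tuple layout and function name are hypothetical, and the pegRNA-related branches of the decision tree are omitted.

```python
def classify_vector_alignments(alignments, read_length):
    """Classify a read from its alignments to the screening vector.

    Each alignment is (read_start, read_end, ref_start, ref_end), half-open.
    Returns "deletion", "tandem duplication" or "other".
    """
    alns = sorted(alignments)  # order alignments along the read
    pairs = list(zip(alns, alns[1:]))
    covers_read = (
        alns[0][0] == 0
        and alns[-1][1] == read_length
        and all(nxt[0] <= cur[1] for cur, nxt in pairs)
    )
    if not covers_read or len(alns) < 2:
        return "other"
    # Two alignments that skip over part of the vector: a deletion.
    if len(alns) == 2 and alns[1][2] > alns[0][3]:
        return "deletion"
    # Consecutive alignments that re-use overlapping vector sequence.
    if all(nxt[2] < cur[3] for cur, nxt in pairs):
        return "tandem duplication"
    return "other"
```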
Quantification of CRISPRi-induced changes in outcome frequencies
After all outcomes for all CRISPRi sgRNAs were classified, the counts of each category for each sgRNA were collected into a matrix for downstream analysis, and the overall frequency of each category among all outcomes from cells receiving non-targeting sgRNAs was calculated to establish unperturbed baseline frequencies. Because not all CRISPRi sgRNAs achieve high levels of knockdown, calculating the gene-level effect of CRISPRi sgRNAs on an outcome category must balance the increased confidence that comes from a phenotype supported by multiple sgRNAs per gene against the possibility that not all sgRNAs targeting a gene are highly active. To do this, gene-level changes in outcome category frequency in a given screen replicate (used in FIGs. 40E-40G, 41E-41H and 47A-47I) were calculated by first computing, for each targeting sgRNA, the log2 fold change in the frequency of the outcome category relative to the combined frequency across all non-targeting sgRNAs. For each gene, the gene-level log2 fold change was then taken as the average of these values for the two sgRNAs targeting the gene with the most extreme absolute values. To qualitatively estimate the range of values that this extremum-selecting procedure produces in the absence of real signal, the sixty non-targeting sgRNAs were randomly divided into 20 quasi-genes of three sgRNAs each, and the same procedure was applied to these quasi-genes.
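The gene-level calculation can be written compactly. The sketch below follows the description above; the function names are illustrative, and it assumes non-zero frequencies (how zero counts were handled is not stated here).

```python
import math


def log2_fold_change(count, total, baseline_freq):
    """log2 of an sgRNA's outcome-category frequency over the non-targeting baseline."""
    return math.log2((count / total) / baseline_freq)


def gene_level_log2fc(sgrna_log2fcs):
    """Average the two per-sgRNA log2 fold changes with the largest absolute values."""
    top_two = sorted(sgrna_log2fcs, key=abs, reverse=True)[:2]
    return sum(top_two) / len(top_two)
```

For example, a gene whose three sgRNAs give log2 fold changes of 0.1, -2.0 and -1.0 receives a gene-level value of -1.5. Applying `gene_level_log2fc` to random triplets of non-targeting sgRNAs gives the null range described for the quasi-genes.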
Quantification of deletion boundaries
To maximize the signal-to-noise ratio when calculating position-specific profiles of deletion frequency and the relative fraction of deletions that remove sequence far from the programmed cut, outcomes from all sgRNAs targeting each gene were pooled. For each such set of outcomes, an array of counts, one per position in the screening vector, was initialized to 0. For each read classified as a deletion, the counts in the interval from the first deleted position to the last deleted position were incremented by 1. The final count array was then divided by the total number of outcomes. Deletions flanked by short homology produce outcome sequences that are consistent with two or more degenerate pairs of deletion boundaries. For these deletions, the pair with the smallest coordinates in the screening vector coordinate system was arbitrarily chosen. To prevent primer dimers or other non-specific amplification products from being erroneously identified as long deletions, apparent deletions whose deleted region overlapped a 10-nt window around either amplicon primer were excluded from the calculation of deletion boundary statistics.
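The position-specific deletion profile amounts to an interval-counting pass. The following sketch assumes 0-based inclusive coordinates in the screening vector; the function name and the representation of primer windows as (start, end) pairs are illustrative choices.

```python
def deletion_profile(vector_length, deletions, total_outcomes, primer_windows=()):
    """Fraction of outcomes in which each screening vector position is deleted.

    deletions: (first_deleted, last_deleted) pairs, 0-based and inclusive.
    primer_windows: (start, end) inclusive windows around amplicon primers;
    deletions overlapping any window are skipped as likely artifacts.
    """
    counts = [0] * vector_length
    for first, last in deletions:
        if any(first <= w_end and last >= w_start for w_start, w_end in primer_windows):
            continue
        for pos in range(first, last + 1):
            counts[pos] += 1
    return [c / total_outcomes for c in counts]
```

Note that the denominator is the total number of outcomes for the pooled sgRNAs, not only the number of deletion reads.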
Design of recoded Sa-pegRNA scaffolds
In the Repair-seq screens performed, unexpected editing outcomes were observed in which additional edits were installed from closely matching scaffold sequence. In addition to the intended +6 G·C-to-C·G transversion, these outcomes contained a +17 T·A-to-C·G edit and a +19 C·G insertion. These unintended edits are consistent with incorporation into the genome of an extended 3′ DNA flap generated by reverse transcription into the Sa-pegRNA scaffold sequence. Because the extended 3′ flap shares 5 nt of homology (5′-GCCAA-3′) with the genomic target sequence following the last edited nucleotide, it was hypothesized that disrupting this homology could reduce the frequency with which reverse-transcribed Sa-pegRNA scaffold sequence is incorporated as these unintended edits. A recoded Sa-pegRNA was therefore designed that alters two base pairs within the Sa-pegRNA scaffold while preserving the same base-pairing interactions. The extended 3′ flap templated by the recoded Sa-pegRNA has reduced homology with the genomic target sequence. The spacer, PBS and RT template sequences of the recoded Sa-pegRNA are identical to those of the Sa-pegRNA used in the Repair-seq screens. Compared with the original Sa-pegRNA used in the Repair-seq screens, guided editing with the recoded Sa-pegRNA was observed to mediate a similar frequency of intended editing but greatly reduced unintended incorporation of scaffold sequence.
HEK293T siRNA transfection
For the experiment in FIG. 42C, HEK293T cells were seeded at a density of 7.5 × 10⁵ cells per well in DMEM plus GlutaMAX supplemented with 10% FBS on 6-well plates (Corning). Sixteen hours after seeding, at approximately 60% confluence, cells were transfected with 9 μL Lipofectamine RNAiMAX (Thermo Fisher Scientific) and 90 pmol ON-TARGETplus SMARTpool siRNA (Horizon Discovery) according to the manufacturer's protocol. One day after transfection, the medium was replaced with fresh DMEM plus GlutaMAX supplemented with 10% FBS. Two days after transfection, cells were washed once with PBS and resuspended using TrypLE (Thermo Fisher Scientific) and DMEM plus GlutaMAX supplemented with 10% FBS. The HEK293T cells were then seeded on 96-well plates (Corning) at 2.5 × 10⁴ cells per well. Between 16 and 24 hours after seeding, cells at 60%-80% confluence were transfected with 0.5 μL Lipofectamine 2000 (Thermo Fisher Scientific), 200 ng guided editor plasmid, 66 ng pegRNA plasmid, 22 ng sgRNA plasmid (if needed) and 5 pmol of the same ON-TARGETplus SMARTpool siRNA used in the first transfection, according to the manufacturer's protocol. For control conditions, cells were treated with non-targeting siRNA in both transfections. For the experiment in FIG. 49F, only the second transfection, with PE components and siRNA, was performed. After the second transfection, cells were cultured for 72 hours before genomic DNA was extracted.
Real-time quantitative PCR
To measure RNAi knockdown (FIG. 49E), RNA was isolated from HEK293T cells 72 hours after the second siRNA transfection and converted to cDNA using the SYBR Green Fast Advanced Cells-to-CT kit (Thermo Fisher Scientific), with cell lysis performed for 10-15 minutes in lysis solution containing DNase I at 1:50 to completely digest genomic DNA. All other steps were performed according to the manufacturer's protocol. Each 20 μL qPCR reaction was performed in technical and biological replicates using 500 nM of each primer, 2 μL cDNA, 1× SYBR Green (Thermo Fisher Scientific) and 10 μL Q5 High-Fidelity 2× Master Mix (New England BioLabs) on a CFX96 Touch real-time PCR detection system (Bio-Rad Laboratories) with the following thermocycling conditions: 98 °C for 2 minutes, then 40 cycles of [98 °C for 15 seconds, 65 °C for 20 seconds, 72 °C for 30 seconds]. β-Actin (ACTB) served as the housekeeping gene for normalizing the amount of cDNA in each qPCR reaction. The relative RNA abundance of the knocked-down gene compared with a non-targeting siRNA control was calculated by the 2^(−ΔΔCT) method. Table 8 provides a list of primers used in the qPCR reactions.
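For reference, the standard 2^(−ΔΔCT) calculation is shown below with ACTB as the reference gene. The argument names are illustrative, and the CT values in the usage note are made up for the example rather than taken from this disclosure.

```python
def relative_abundance(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative RNA abundance by the 2^(-ddCT) method.

    ct_target_*: CT of the gene of interest; ct_ref_*: CT of the housekeeping
    gene (here ACTB); *_kd: knockdown sample; *_ctrl: non-targeting control.
    """
    delta_kd = ct_target_kd - ct_ref_kd        # dCT in the knockdown sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # dCT in the control sample
    return 2.0 ** (-(delta_kd - delta_ctrl))
```

With hypothetical CT values of 26 (target) and 18 (ACTB) in the knockdown sample and 24 and 18 in the control, ΔΔCT is 2 and the relative abundance is 0.25, corresponding to 75% knockdown.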
Titration of transfected plasmid amounts in HEK293T cells
For the experiment in FIG. 50B, HEK293T cells were seeded at a density of 1.6-1.8 × 10⁴ cells per well in DMEM plus GlutaMAX supplemented with 10% FBS on 96-well plates (Corning). Between 16 and 24 hours after seeding, cells at 60%-80% confluence were transfected with 0.5 μL Lipofectamine 2000 (Thermo Fisher Scientific), 66 ng pegRNA plasmid, 0-200 ng PE2 plasmid and 0-100 ng MLH1dn or RFP plasmid, according to the manufacturer's protocol. pUC19 stuffer plasmid was combined with the varying amounts of PE2, MLH1dn and RFP plasmids to keep the total amount of transfected plasmid constant (366 ng). In titrations that varied the total amount of editor and trans-acting protein, the PE2 plasmid was used with the MLH1dn or RFP plasmid at a 2:1 mass ratio. Genomic DNA was isolated from cells 72 hours after transfection.
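The stuffer amount needed to hold the total at 366 ng follows directly from the masses listed above. The helper below is an illustrative convenience, not part of the described protocol; its name and defaults are assumptions based on the stated 66 ng pegRNA plasmid and 366 ng total.

```python
def puc19_stuffer_ng(pe2_ng, trans_ng, pegrna_ng=66.0, total_ng=366.0):
    """Mass of pUC19 stuffer plasmid needed to keep total plasmid constant.

    pe2_ng: PE2 plasmid (0-200 ng); trans_ng: MLH1dn or RFP plasmid (0-100 ng).
    """
    stuffer = total_ng - pegrna_ng - pe2_ng - trans_ng
    if stuffer < 0:
        raise ValueError("plasmid amounts exceed the fixed total")
    return stuffer
```

At the maximum dose (200 ng PE2, 100 ng MLH1dn) no stuffer is needed; at half dose with the 2:1 ratio (100 ng PE2, 50 ng MLH1dn) the stuffer is 150 ng.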
Generation of MLH1 knockout HeLa cell clones
The enhancement in guided editing from MLH1dn expression was compared with that from MLH1 knockout using one clonal wild-type HeLa cell line and two clonal ΔMLH1 HeLa cell lines (FIG. 50F). To generate the clonal lines, HeLa cells were seeded at 2.5 × 10⁵ cells per well in DMEM plus GlutaMAX supplemented with 10% FBS on 6-well plates (Corning). Eighteen hours after seeding, at approximately 60% confluence, cells were transfected with 7.5 μL TransIT-HeLa reagent (Mirus Bio), 2 μg pLX_331-Cas9 (SpCas9 with a blasticidin marker, Addgene #96924) and 500 ng sgRNA plasmid (spacer, 5′-GACAGTGGTGAACCGCATCG-3′ (SEQ ID NO: 367)) according to the manufacturer's protocol. To prepare clonal wild-type HeLa cells as controls, 500 ng pUC19 plasmid was transfected in place of the sgRNA plasmid. Twenty-four hours after transfection, 10 ng μL⁻¹ blasticidin (Thermo Fisher Scientific) was added to each well to select for cells transfected with Cas9.
Three days after transfection, cells were plated on 96-well plates at a density of one cell per well in conditioned DMEM plus GlutaMAX supplemented with 10% FBS. Individual clones were grown and expanded for 18 days. To verify that the ΔMLH1 cells contain biallelic MLH1 frameshift mutations and that the control cells contain the wild-type genotype, the MLH1 locus was sequenced from clonal genomic DNA on a MiSeq (Illumina) as described above. FASTQ sequencing files for MLH1 in the HeLa clones are listed in Table 3. HeLa ΔMLH1 clone 1 contains the MLH1 c.55_56insA and c.41_58 dellinstAACTTCC alleles. HeLa ΔMLH1 clone 2 contains the MLH1 c.55_56insA and c.20_66del alleles. All guided editing experiments using these clonal HeLa lines were performed as described above for HeLa cells.
Guided editing of contiguous substitutions and additional silent mutations
Seven sets of guided edits (35 edits in total) that substitute 1-5 contiguous bases were tested at five loci in HEK293T cells. Within each set of contiguous substitutions, all five edits alter at least one base within the seed region (nucleotides +1 to +3) of the pegRNA protospacer, at least one base within the PAM sequence (+5 G or +6 G) of the pegRNA protospacer, or no bases within either the seed region or the PAM sequence. Because guided edits that alter the seed region or the PAM sequence can be more efficient, this design controls for these confounding effects on editing efficiency, enabling comparison of editing efficiencies within each set.
Six sets of guided edits (27 edits in total) were tested at six gene targets in HEK293T cells; these edits were programmed to make coding changes with or without additional silent mutations. Each of the six coding edits alters one of the PAM nucleotides (+5 G or +6 G) of the pegRNA protospacer. This design controls for confounding effects on editing efficiency (as described above), allowing editing efficiencies within each set to be compared. The silent mutations were designed to lie close to the intended coding edit (typically within 5 bp) in order to maximally interfere with MMR recognition of the intended coding edit. The frequency of reads containing the intended coding edit without indels, with or without any additional silent mutations, was quantified using CRISPResso2 as described above.
Analysis of guided editor activity at Cas9 off-target sites
Guided editor activity at known Cas9 off-target sites was determined by sequencing genomic DNA from HEK293T cells 3 days after transfection with plasmids encoding PE2, pegRNA and MLH1dn (if needed), as described above. The top four off-target sites previously detected by CIRCLE-seq (Tsai et al., 2017) for each of the HEK3, EMX1, FANCF and HEK4 spacers (16 sites in total) were deep-sequenced from the genomic DNA samples as described above. To analyze off-target editing, reads were aligned to the reference off-target amplicons using CRISPResso2 (Clement et al., 2019) in standard mode with the parameters "-q 30" and "-w 10". Off-target reads were called as leniently as possible in order to capture all potential reverse transcription products. For each off-target reference amplicon, the nucleotide sequence 3′ of the Cas9 nick site was compared with the sequence of the 3′ DNA flap encoded by reverse transcription of the pegRNA. Counting from the 5′ end, the minimal stretch of the 3′ DNA flap that deviates from the reference sequence, and that guided editing could therefore install at that site, was designated the off-target marker sequence. All reference-aligned reads containing the off-target marker sequence directly 3′ of the Cas9 nick site (including reads containing indels) were called off-target reads. Off-target editing efficiency was thus quantified as the percentage (number of off-target reads)/(number of reference-aligned reads). For some amplicons, the mismatch rate at the relevant editing positions was comparable to the mismatch rate at other positions in the amplicon, indicating that context-specific sequencing errors may account for the apparent off-target guided editing; this conservative approach may therefore overestimate the true rate of pegRNA-mediated editing at off-target sites.
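The marker-based calling can be illustrated as follows. This sketch assumes the marker is the shortest prefix of the 3′ flap that differs from the reference sequence immediately 3′ of the nick, and it locates the nick in each read by exact match to a short upstream anchor; both the anchor approach and the function names are assumptions made for the example.

```python
def off_target_marker(flap, reference_3prime):
    """Shortest 5' prefix of the flap that differs from the reference 3' of the nick."""
    for i, (f, g) in enumerate(zip(flap, reference_3prime)):
        if f != g:
            return flap[: i + 1]
    return None  # flap is identical to the reference over the compared length


def off_target_efficiency(aligned_reads, upstream_anchor, marker):
    """Percentage of reference-aligned reads carrying the marker directly 3' of the nick.

    upstream_anchor: reference sequence immediately 5' of the nick, used here
    to locate the nick position within each read (an illustrative shortcut).
    """
    hits = sum(1 for read in aligned_reads if upstream_anchor + marker in read)
    return 100.0 * hits / len(aligned_reads)
```

Because any read with the marker counts, including reads with indels elsewhere, this style of calling errs on the side of overcounting, in line with the conservative intent described above.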
Sequencing of microsatellite instability in genomic DNA
Microsatellite instability was assessed in genomic DNA from HCT116 cells, monoclonal wild-type HAP1 cells, monoclonal HAP1 cells grown for 2 months (about 60 cell divisions) after MSH2 knockout, and HeLa cells 3 days after transfection with plasmids encoding PE2-P2A-BSD, pegRNA and MLH1dn (if needed). HeLa cell transfection was performed as described above. Seventeen mononucleotide repeats were deep-sequenced from the genomic DNA samples; these repeats are highly sensitive to MMR activity and are widely used to diagnose MMR deficiency in tumors (Bacher et al., 2004; Hempelmann et al., 2015; Umar et al., 2004). The first PCR reaction (PCR1) used primers (Integrated DNA Technologies) containing Illumina forward and reverse adapters to amplify the microsatellite sequences of interest. Each 20 μL PCR1 reaction was performed on a CFX96 Touch real-time PCR detection system (Bio-Rad Laboratories) using 250 nM of each primer, 0.8 μL genomic DNA, 1× SYBR Green (Thermo Fisher Scientific) and 10 μL Q5 High-Fidelity 2× Master Mix (New England BioLabs), with the following thermocycling conditions: 98 °C for 3 minutes, 30 cycles of [98 °C for 15 seconds, 62 °C for 30 seconds, 72 °C for 30 seconds], then 72 °C for 3 minutes. All 17 PCR1 products amplified from the same genomic DNA sample were pooled, purified with 0.8× AMPure XP beads (Beckman Coulter) and eluted in nuclease-free water. Table 8 provides a list of primers used in the PCR1 reactions. The subsequent PCR step (PCR2) added unique combinations of i7 and i5 Illumina barcodes to both ends of the PCR1 DNA amplicons to enable sample multiplexing. Each 20 μL PCR2 reaction was performed on a CFX96 Touch real-time PCR detection system using 500 nM of each barcoding primer, 25 ng pooled PCR1 product, 1× SYBR Green and 10 μL Q5 High-Fidelity 2× Master Mix, with the following thermocycling conditions: 98 °C for 2 minutes, 8 cycles of [98 °C for 15 seconds, 61 °C for 20 seconds, 72 °C for 30 seconds], then 72 °C for 2 minutes.
All PCR2 products were pooled, purified with 0.8× AMPure XP beads and eluted in nuclease-free water. The DNA amplicon library was quantified using a Qubit 3.0 fluorometer (Thermo Fisher Scientific) and then sequenced using MiSeq Reagent Kit v (Illumina) with 300 single-read cycles. Table 7 provides a list of FASTQ sequencing files generated in these experiments.
Quantification of microsatellite instability
Each of the seventeen microsatellites analyzed consists of a long homopolymer. To quantify the observed lengths of these microsatellites in a way that is robust to the high rate of sequencing errors observed within homopolymers, each sequencing read was searched for the sequences expected to flank the homopolymer, and the length of the homopolymer was then taken to be the distance between these flanking sequences. Specifically, for each locus, the longest homopolymer within the amplicon was identified and the 20 nt of expected reference sequence on either side of it were recorded. Sequencing reads were demultiplexed to their locus of origin based on the first 20 nucleotides of each read. Within the reads for each locus, the first occurrence in each read of a sequence within Hamming distance 2 of each of the two flanking sequences was recorded. If both flanking sequences were found in the expected orientation relative to each other and within 50 nt of each other, the distance between them was recorded.
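The flank-based length measurement can be sketched as below. Function names are hypothetical, and interpreting "within 50 nt" as a cap on the gap between the end of the left flank and the start of the right flank is an assumption.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def first_match(read, query, max_dist=2):
    """Start of the first window in read within max_dist mismatches of query."""
    for i in range(len(read) - len(query) + 1):
        if hamming(read[i : i + len(query)], query) <= max_dist:
            return i
    return None


def homopolymer_length(read, left_flank, right_flank, max_gap=50):
    """Distance between the flanks, or None if they are missing or misplaced."""
    left = first_match(read, left_flank)
    right = first_match(read, right_flank)
    if left is None or right is None:
        return None
    gap = right - (left + len(left_flank))
    if gap < 0 or gap > max_gap:
        return None  # wrong relative order, or flanks too far apart
    return gap
```

Measuring the gap between flanks, rather than counting identical bases, means that a miscalled base inside the homopolymer does not split or shorten the measured repeat.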
In vitro transcription of guided editor and MLH1dn mRNA used in iPSC and T cell experiments
Plasmids were cloned to encode an inactivated T7 promoter followed by a 5′ untranslated region (UTR), a Kozak sequence, the coding sequence of PE2 or MLH1dn, and a 3′ UTR, as described previously (Nelson et al., 2021). Inactivating the T7 promoter prevents potential transcription from the circular plasmid template during mRNA production. These components were PCR-amplified together using Phusion U Green Multiplex Master Mix (Thermo Fisher Scientific) with primers that correct the T7 promoter inactivation and append a 119-nt poly(A) tail to the 3′ UTR. The resulting PCR product was purified using a QIAquick PCR purification kit (Thermo Fisher Scientific) and used as the template for subsequent in vitro transcription. PE2 and MLH1dn mRNA were transcribed from these templates using the HiScribe T7 High-Yield RNA Synthesis Kit (New England BioLabs), with co-transcriptional capping by CleanCap AG (TriLink Biotechnologies) and full replacement of UTP with N1-methylpseudouridine-5′-triphosphate (TriLink Biotechnologies). Transcribed mRNA was precipitated in 2.5 M lithium chloride (Thermo Fisher Scientific), washed twice in 70% ethanol and dissolved in nuclease-free water. The resulting PE2 and MLH1dn mRNA was quantified using a NanoDrop One UV-Vis spectrophotometer (Thermo Fisher Scientific) and stored at −80 °C.
Electroporation of human patient-derived induced pluripotent stem cells
Prior to electroporation, 24-well plates (Thermo Fisher Scientific) were coated with 250 μL per well of rhLaminin-521 (Thermo Fisher Scientific) diluted 1:40 in DPBS (Thermo Fisher Scientific) and incubated for 2 hours at 37 °C and 5% CO₂. For electroporation, iPS cell colonies at 70%-80% confluence were washed once with DPBS and dissociated in pre-warmed Accutase (Innovative Cell Technologies) for 10 minutes in an incubator at 37 °C and 5% CO₂. Next, the iPS cells were gently triturated, transferred into a sterile 15 mL conical tube and mixed with an equal volume of DMEM/F12 (Thermo Fisher Scientific) to quench the dissociation enzyme activity. Cells were pelleted at 300g for 3 minutes and resuspended in StemFlex medium (Thermo Fisher Scientific) supplemented with 10 μM Y-27632 (Cayman Chemical). Cell count and viability were determined using a Countess II FL automated cell counter (Thermo Fisher Scientific). For electroporation using the NEON transfection system 10 μL kit (Thermo Fisher Scientific), 2 × 10⁵ iPS cells were pelleted at 300g for 3 minutes and resuspended in 9 μL NEON buffer R. The cell suspension was combined with a 3 μL mixture of 1 μg PE2 mRNA, 90 pmol synthetic pegRNA (Integrated DNA Technologies), 60 pmol synthetic sgRNA (Synthego) (if needed) and 0-2 μg MLH1dn mRNA in NEON buffer R. The synthetic pegRNA and sgRNA were dissolved in TE buffer (10 mM Tris-HCl, pH 8.0; 0.1 mM EDTA). Mock control electroporations were performed using 3 μL NEON buffer R without any RNA added. Immediately before electroporation, the rhLaminin-521 was aspirated and replaced with 250 μL per well of pre-warmed StemFlex medium supplemented with 10 μM Y-27632. Next, 10 μL of the combined cell and RNA mixture was electroporated using the NEON transfection system (Thermo Fisher Scientific) with the following parameters: 1400 V, 20 ms, one pulse.
Cells were immediately seeded into the rhLaminin-521-coated 24-well plate, each well containing 250 μL of StemFlex medium supplemented with 10 μM Y-27632. The next day, the medium was replaced with 500 μL of StemFlex medium supplemented with 5 μM Y-27632. At 72 hours after electroporation, the medium was replaced with 500 μL of StemFlex medium per well. At 96 hours after electroporation, iPS cells were washed once with DPBS and lysed in gDNA lysis buffer (10 mM Tris-HCl, pH 8.0; 0.05% SDS; 800 units μL⁻¹ proteinase K (New England BioLabs)) at 37 °C for 2 hours to extract genomic DNA, followed by enzyme inactivation at 80 °C for 30 minutes. All iPSC electroporations were performed in technical duplicate and biological triplicate. After amplicon sequencing of the edited CDKL5 locus, the frequencies of intended editing and indels were quantified using CRISPResso2 in HDR mode, as described above. Because the c.1412delA allele is heterozygous in the patient-derived iPSCs, the frequency of editable alleles with the intended edit was quantified as: (frequency of intended edit − frequency of intended edit in mock control)/(100 − frequency of intended edit in mock control). The frequency of editable alleles with indels was quantified as described above: (number of indel-containing reads)/(number of reads aligned to the amplicon). The frequencies of editable alleles with the intended edit or with indels were averaged across technical replicates, and the values for the three biological replicates are shown.
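The correction for the heterozygous allele can be made concrete with a one-line function. The name is illustrative and both inputs are percentages of reads, as in the formula above.

```python
def editable_allele_edit_fraction(edit_pct, mock_pct):
    """Fraction of editable alleles carrying the intended edit.

    edit_pct: percentage of reads with the intended sequence in the edited sample.
    mock_pct: the same percentage in the mock control, which reflects the
    wild-type allele already present in the heterozygous cells.
    """
    return (edit_pct - mock_pct) / (100.0 - mock_pct)
```

For instance, if 50% of reads in the mock control already carry the intended sequence and 75% do after editing, half of the editable alleles have been corrected. Multiplying the result by 100 expresses it as a percentage.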
Electroporation of primary human T cells
T cells were activated for 2 days using Dynabeads Human T-Expander CD3/CD28 (Thermo Fisher Scientific) and cultured at 37 °C and 5% CO₂ prior to electroporation in T cell medium (X-VIVO 15 serum-free hematopoietic cell medium, Lonza) supplemented with 5% AB human serum (Valley Biomedical), 1× GlutaMAX (Thermo Fisher Scientific), 12 mM N-acetylcysteine (Sigma Aldrich), 50 U mL⁻¹ penicillin and 50 μg mL⁻¹ streptomycin (Thermo Fisher Scientific), 300 IU mL⁻¹ IL-2 (Peprotech), and 5 ng mL⁻¹ recombinant human IL-7 (Peprotech) and IL-15 (Peprotech). CD3/CD28 beads were removed from the cells 5-7 hours before electroporation. For electroporation using the NEON transfection system 10 μL kit (Thermo Fisher Scientific), 3.0-3.5 × 10⁵ cells per sample were pelleted by centrifugation at 300g for 5 minutes and resuspended in 11 μL NEON buffer T. The cell suspension was added to a mixture of 1 μg PE2 mRNA, 90 pmol synthetic pegRNA (Integrated DNA Technologies), 60 pmol synthetic sgRNA (Synthego) and 0-2 μg MLH1dn mRNA. The synthetic pegRNA and sgRNA were dissolved in TE buffer (10 mM Tris-HCl, pH 8.0; 0.1 mM EDTA). Mock control electroporations were performed using 3 μL NEON buffer T without any RNA added. Electroporation was performed on the NEON transfection system (Thermo Fisher Scientific) using a 10 μL NEON tip with the following parameters: 1,400 V, 10 ms, three pulses. Cells were plated in 600 μL fresh T cell medium in 24-well plates. Two days after electroporation, cell count and viability were determined using a Countess II automated cell counter (Thermo Fisher Scientific), and 1 mL fresh T cell medium was added to the cells. Four days after electroporation, genomic DNA was isolated by pelleting the cells by centrifugation at 300g for 5 minutes, following the "mammalian cell lysate" protocol of the PureLink Genomic DNA Mini kit (Thermo Fisher Scientific), and eluting in nuclease-free water.
Quantification and statistical analysis
The number of independent biological and technical replicates for each experiment is given in the brief description of the drawings or in the detailed methods section. In FIG. 44G, a nonparametric Mann-Whitney U test was used to compare guided editing data in HEK293T cells with guided editing data in HeLa, K562 and U2OS cells.
This example refers to Tables 5-7 provided below:
Example 2: Development of PEmax
To further improve guided editing, the PE2 protein was optimized by varying the codon usage of the reverse transcriptase (RT), the length and composition of the peptide linker between nCas9 and the reverse transcriptase, the position, composition and number of NLS sequences, and mutations within the SpCas9 domain (FIGs. 55A and 55B). Of the 20 such variants tested, the greatest enhancement in editing efficiency was observed with a guided editor architecture that uses a human codon-optimized RT (GenScript), a 34-aa linker containing a bipartite SV40 NLS (Wu et al., 2009), an additional C-terminal c-Myc NLS (Dang and Lee, 1988), and the R221K and N394K mutations in SpCas9 previously shown to improve Cas9 nuclease activity (Spencer and Zhang, 2017) (FIGs. 54A and 55A). This optimized guided editor architecture was named PEmax. Across seven substitution edits targeting different loci, using the PEmax architecture in the PE2 system (PE2max) increased the average frequency of intended editing 2.3-fold in HeLa cells and 1.2-fold in HEK293T cells compared with the original PE2 architecture (FIG. 55B). Similarly, PE3 using the PEmax architecture (PE3max) increased average editing efficiency 3.2-fold in HeLa cells and 1.2-fold in HEK293T cells compared with PE3, without substantially changing product purity (FIGs. 54A and 55A).
References
Acharya,S.,Wilson,T.,Gradia,S.,Kane,M.F.,Guerrette,S.,Marsischky,G.T.,Kolodner,R.,and Fishel,R.(1996).hMSH2 forms specific mispair-binding complexes with hMSH3 and hMSH6.Proceedings of the National Academy of Sciences 93,13629-13634.
Amato,A.,Cappabianca,M.P.,Perri,M.,Zaghis,I.,Grisanti,P.,Ponzini,D.,and Di Biagio,P.(2014).Interpreting elevated fetal hemoglobin in pathology and health at the basic laboratory level:new and known γ-gene mutations associated with hereditary persistence of fetal hemoglobin.International Journal of Laboratory Hematology 36,13-19.
Anzalone,A.V.,Koblan,L.W.,and Liu,D.R.(2020).Genome editing with CRISPR-Cas nucleases,base editors,transposases and prime editors.Nature Biotechnology 38,824-844.
Anzalone,A.V.,Randolph,P.B.,Davis,J.R.,Sousa,A.A.,Koblan,L.W.,Levy,J.M.,Chen,P.J.,Wilson,C.,Newby,G.A.,Raguram,A.,et al.(2019).Search-and-replace genome editing without double-strand breaks or donor DNA.Nature.
Asante,E.A.,Smidak,M.,Grimshaw,A.,Houghton,R.,Tomlinson,A.,Jeelani,A.,Jakubcova,T.,Hamdan,S.,Richard-Londt,A.,Linehan,J.M.,et al.(2015).A naturally occurring variant of the human prion protein completely prevents prion disease.Nature 522,478-481.
Bacher,J.W.,Flanagan,L.A.,Smalley,R.L.,Nassif,N.A.,Burgart,L.J.,Halberg,R.B.,Megid,W.M.A.,and Thibodeau,S.N.(2004).Development of a fluorescent multiplex assay for detection of MSI-High tumors.Disease Markers 20,237-250.
Bartlett,D.W.,and Davis,M.E.(2006).Insights into the kinetics of siRNA-mediated gene silencing from live-cell and live-animal bioluminescent imaging.Nucleic Acids Research 34,322-333.
Bosch,J.A.,Birchak,G.,and Perrimon,N.(2021).Precise genome engineering in Drosophila using prime editing.Proceedings of the National Academy of Sciences 118,e2021996118.
Bothmer,A.,Phadke,T.,Barrera,L.A.,Margulies,C.M.,Lee,C.S.,Buquicchio,F.,Moss,S.,Abdulkerim,H.S.,Selleck,W.,Jayaram,H.,et al.(2017).Characterization of the interplay between DNA repair and CRISPR/Cas9-induced DNA lesions at an endogenous locus.Nature Communications 8,13905.
Cavaleiro,A.M.,Kim,S.H.,Seppälä,S.,Nielsen,M.T.,and Nørholm,M.H.H.(2015).Accurate DNA Assembly and Genome Engineering with Optimized Uracil Excision Cloning.ACS Synthetic Biology 4,1042-1046.
Chen,P.-F.,Chen,T.,Forman,T.E.,Swanson,A.C.,O′Kelly,B.,Dwyer,S.A.,Buttermore,E.D.,Kleiman,R.,Js Carrington,S.,Lavery,D.J.,et al.(2021).Generation and characterization of human induced pluripotent stem cells(iPSCs)from three male and three female patients with CDKL5 Deficiency Disorder(CDD).Stem Cell Research 53,102276.
Clement,K.,Rees,H.,Canver,M.C.,Gehrke,J.M.,Farouni,R.,Hsu,J.Y.,Cole,M.A.,Liu,D.R.,Joung,J.K.,Bauer,D.E.,et al.(2019).CRISPResso2 provides accurate and rapid genome editing sequence analysis.Nature Biotechnology 37,224-226.
Dang,C.V.,and Lee,W.M.(1988).Identification of the human c-myc protein nuclear translocation signal.Molecular and Cellular Biology 8,4048-4054.
Engler,C.,Kandzia,R.,and Marillonnet,S.(2008).A One Pot,One Step,Precision Cloning Method with High Throughput Capability.PLoS ONE 3,e3647.
Fang,W.H.,and Modrich,P.(1993).Human strand-specific mismatch repair occurs by a bidirectional mechanism similar to that of the bacterial reaction.Journal of Biological Chemistry 268,11838-11844.
Fishel,R.,Lescoe,M.K.,Rao,M.R.S.,Copeland,N.G.,Jenkins,N.A.,Garber,J.,Kane,M.,and Kolodner,R.(1993).The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer.Cell 75,1027-1038.
Geng,H.,Du,C.,Chen,S.,Salerno,V.,Manfredi,C.,and Hsieh,P.(2011).In vitro studies of DNA mismatch repair proteins.Analytical Biochemistry 413,179-184.
Genschel,J.,Bazemore,L.R.,and Modrich,P.(2002).Human Exonuclease I Is Required for 5′ and 3′ Mismatch Repair.Journal of Biological Chemistry 277,13302-13311.
Genschel,J.,Littman,S.J.,Drummond,J.T.,and Modrich,P.(1998).Isolation of MutSβ from Human Cells and Comparison of the Mismatch Repair Specificities of MutSβ and MutSα.Journal of Biological Chemistry 273,19895-19901.
Gilbert,Luke A.,Horlbeck,Max A.,Adamson,B.,Villalta,Jacqueline E.,Chen,Y.,Whitehead,Evan H.,Guimaraes,C.,Panning,B.,Ploegh,Hidde L.,Bassik,Michael C.,et al.(2014).Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation.Cell 159,647-661.
Gilbert,Luke A.,Larson,Matthew H.,Morsut,L.,Liu,Z.,Brar,Gloria A.,Torres,Sandra E.,Stern-Ginossar,N.,Brandman,O.,Whitehead,Evan H.,Doudna,Jennifer A.,et al.(2013).CRISPR-Mediated Modular RNA-Guided Regulation of Transcription in Eukaryotes.Cell 154,442-451.
Gueneau,E.,Dherin,C.,Legrand,P.,Tellier-Lebegue,C.,Gilquin,B.,Bonnesoeur,P.,Londino,F.,Quemener,C.,Le Du,M.-H.,Márquez,J.A.,et al.(2013).Structure of the MutLα C-terminal domain reveals how Mlh1 contributes to Pms1 endonuclease site.Nature Structural & Molecular Biology 20,461-468.
Guerrette,S.,Acharya,S.,and Fishel,R.(1999).The Interaction of the Human MutL Homologues in Hereditary Nonpolyposis Colon Cancer.Journal of Biological Chemistry 274,6336-6341.
Gupta,S.,Gellert,M.,and Yang,W.(2012).Mechanism of mismatch recognition revealed by human MutSβ bound to unpaired DNA loops.Nature Structural & Molecular Biology 19,72-78.
Hempelmann,J.A.,Scroggins,S.M.,Pritchard,C.C.,and Salipante,S.J.(2015).MSIplus for Integrated Colorectal Cancer Molecular Testing by Next-Generation Sequencing.The Journal of Molecular Diagnostics 17,705-714.
Hussmann,J.A.,Ling,J.,Ravisankar,P.,Yan,J.,Cirincione,A.,Xu,A.,Simpson,D.,Yang,D.,Bothmer,A.,Cotta-Ramusino,C.,et al.(2021).Repair-seq enables systematic mapping of DNA repair processes in genome editing.Submitted.
Iaccarino,I.,Marra,G.,Palombo,F.,and Jiricny,J.(1998).hMSH2 and hMSH6 play distinct roles in mismatch binding and contribute differently to the ATPase activity of hMutSalpha.The EMBO Journal 17,2677-2686.
Ingram,V.M.(1956).A Specific Chemical Difference Between the Globins of Normal Human and Sickle-Cell Anaemia Haemoglobin.Nature 178,792-794.
Iyer,R.R.,Pluciennik,A.,Burdett,V.,and Modrich,P.L.(2006).DNA Mismatch Repair:Functions and Mechanisms.Chemical Reviews 106,302-323.
Jin,S.,Lin,Q.,Luo,Y.,Zhu,Z.,Liu,G.,Li,Y.,Chen,K.,Qiu,J.-L.,and Gao,C.(2021).Genome-wide specificity of prime editors in plants.Nature Biotechnology.
Kadyrov,F.A.,Dzantiev,L.,Constantin,N.,and Modrich,P.(2006).Endonucleolytic Function of MutLα in Human Mismatch Repair.Cell 126,297-308.
Kim,D.Y.,Moon,S.B.,Ko,J.-H.,Kim,Y.-S.,and Kim,D.(2020).Unbiased investigation of specificities of prime editing systems in human cells.Nucleic Acids Research 48,10576-10589.
Kim,J.H.,Lee,S.-R.,Li,L.-H.,Park,H.-J.,Park,J.-H.,Lee,K.Y.,Kim,M.-K.,Shin,B.A.,and Choi,S.-Y.(2011).High Cleavage Efficiency of a 2A Peptide Derived from Porcine Teschovirus-1 in Human Cell Lines,Zebrafish and Mice.PLoS ONE 6,e18556.
Kunkel,T.A.,and Erie,D.A.(2005).DNA mismatch repair.Annu Rev Biochem 74,681-710.
Lahue,R.S.,Au,K.G.,and Modrich,P.(1989).DNA mismatch correction in a defined system.Science 245,160.
Leach,F.S.,Nicolaides,N.C.,Papadopoulos,N.,Liu,B.,Jen,J.,Parsons,R.,Peltomäki,P.,Sistonen,P.,Aaltonen,L.A.,Nyström-Lahti,M.,et al.(1993).Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer.Cell 75,1215-1225.
Li,G.-M.(2008).Mechanisms and functions of DNA mismatch repair.Cell Research 18,85-98.
Lin,Q.,Zong,Y.,Xue,C.,Wang,S.,Jin,S.,Zhu,Z.,Wang,Y.,Anzalone,A.V.,Raguram,A.,Doman,J.L.,et al.(2020).Prime genome editing in rice and wheat.Nature Biotechnology 38,582-585.
Liu,P.,Liang,S.-Q.,Zheng,C.,Mintzer,E.,Zhao,Y.G.,Ponnienselvan,K.,Mir,A.,Sontheimer,E.J.,Gao,G.,Flotte,T.R.,et al.(2021).Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice.Nature Communications 12.
Liu,S.,Wang,Q.,Yu,X.,Li,Y.,Guo,Y.,Liu,Z.,Sun,F.,Hou,W.,Li,C.,Wu,L.,et al.(2018).HIV-1 inhibition in cells with CXCR4 mutant genome created by CRISPR-Cas9 and piggyBac recombinant technologies.Scientific Reports 8.
Liu,Y.,Li,X.,He,S.,Huang,S.,Li,C.,Chen,Y.,Liu,Z.,Huang,X.,and Wang,X.(2020).Efficient generation of mouse models with the prime editing system.Cell Discovery 6.
Lujan,S.A.,Clausen,A.R.,Clark,A.B.,MacAlpine,H.K.,MacAlpine,D.M.,Malc,E.P.,Mieczkowski,P.A.,Burkholder,A.B.,Fargo,D.C.,Gordenin,D.A.,et al.(2014).Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition.Genome Res 24,1751-1764.
Mead,S.,Whitfield,J.,Poulter,M.,Shah,P.,Uphill,J.,Campbell,T.,Al-Dujaily,H.,Hummerich,H.,Beck,J.,Mein,C.A.,et al.(2009).A Novel Protective Prion Protein Variant that Colocalizes with Kuru Exposure.New England Journal of Medicine 361,2056-2065.
Nelson,J.W.,Randolph,P.B.,Shen,S.P.,Everette,K.A.,Chen,P.J.,Anzalone,A.V.,Newby,G.A.,An,M.,Chen,J.C.,and Liu,D.R.(2021).Engineered pegRNAs that improve prime editing efficiency.Submitted.
Olson,H.E.,Demarest,S.T.,Pestana-Knight,E.M.,Swanson,L.C.,Iqbal,S.,Lal,D.,Leonard,H.,Cross,J.H.,Devinsky,O.,and Benke,T.A.(2019).Cyclin-Dependent Kinase-Like 5 Deficiency Disorder:Clinical Review.Pediatric Neurology 97,18-25.
Parsons,R.,Li,G.-M.,Longley,M.J.,Fang,W.-H.,Papadopoulos,N.,Jen,J.,De La Chapelle,A.,Kinzler,K.W.,Vogelstein,B.,and Modrich,P.(1993).Hypermutability and mismatch repair deficiency in RER+tumor cells.Cell 75,1227-1236.
Petri,K.,Zhang,W.,Ma,J.,Schmidts,A.,Lee,H.,Horng,J.E.,Kim,D.Y.,Kurt,I.C.,Clement,K.,Hsu,J.Y.,et al.(2021).CRISPR prime editing with ribonucleoprotein complexes in zebrafish and primary human cells.Nature Biotechnology.
Plotz,G.,Raedle,J.,Brieger,A.,Trojan,J.,and Zeuzem,S.(2003).N-terminus of hMLH1 confers interaction of hMutLα and hMutLβ with hMutSα.Nucleic Acids Research 31,3217-3226.
Pluciennik,A.,Dzantiev,L.,Iyer,R.R.,Constantin,N.,Kadyrov,F.A.,and Modrich,P.(2010).PCNA function in the activation and strand direction of MutL endonuclease in mismatch repair.Proceedings of the National Academy of Sciences 107,16066-16071.
Ran,F.A.,Cong,L.,Yan,W.X.,Scott,D.A.,Gootenberg,J.S.,Kriz,A.J.,Zetsche,B.,Shalem,O.,Wu,X.,Makarova,K.S.,et al.(2015).In vivo genome editing using Staphylococcus aureus Cas9.Nature 520,186-191.
Räschle,M.,Dufner,P.,Marra,G.,and Jiricny,J.(2002).Mutations within the hMLH1 and hPMS2 Subunits of the Human MutLα Mismatch Repair Factor Affect Its ATPase Activity,but Not Its Ability to Interact with hMutSα.Journal of Biological Chemistry 277,21810-21820.
Schene,I.F.,Joore,I.P.,Oka,R.,Mokry,M.,Van Vugt,A.H.M.,Van Boxtel,R.,Van Der Doef,H.P.J.,Van Der Laan,L.J.W.,Verstegen,M.M.A.,Van Hasselt,P.M.,et al.(2020).Prime editing for functional repair in patient-derived disease models.Nature Communications 11.
Shcherbakova,P.V.,and Kunkel,T.A.(1999).Mutator phenotypes conferred by MLH1 overexpression and by heterozygosity for mlh1 mutations.Molecular and Cellular Biology 19,3177-3183.
Sockolosky,J.T.,Trotta,E.,Parisi,G.,Picton,L.,Su,L.L.,Le,A.C.,Chhabra,A.,Silveria,S.L.,George,B.M.,King,I.C.,et al.(2018).Selective targeting of engineered T cells using orthogonal IL-2 cytokine-receptor complexes.Science 359,1037-1042.
Spencer,J.M.,and Zhang,X.(2017).Deep mutational scanning of S.pyogenes Cas9 reveals important functional domains.Scientific Reports 7.
Strand,M.,Prolla,T.A.,Liskay,R.M.,and Petes,T.D.(1993).Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair.Nature 365,274-276.
Su,S.S.,Lahue,R.S.,Au,K.G.,and Modrich,P.(1988).Mispair specificity of methyl-directed DNA mismatch correction in vitro.Journal of Biological Chemistry 263,6829-6835.
Sugawara,N.,Goldfarb,T.,Studamire,B.,Alani,E.,and Haber,J.E.(2004).Heteroduplex rejection during single-strand annealing requires Sgs1 helicase and mismatch repair proteins Msh2 and Msh6 but not Pms1.Proceedings of the National Academy of Sciences 101,9315-9320.
Supek,F.,and Lehner,B.(2015).Differential DNA mismatch repair underlies mutation rate variation across the human genome.Nature 521,81-84.
Sürün,D.,Schneider,A.,Mircetic,J.,Neumann,K.,Lansing,F.,Paszkowski-Rogacz,M.,Hänchen,V.,Lee-Kirsch,M.A.,and Buchholz,F.(2020).Efficient Generation and Correction of Mutations in Human iPS Cells Utilizing mRNAs of CRISPR Base Editors and Prime Editors.Genes 11,511.
Thomas,D.C.,Roberts,J.D.,and Kunkel,T.A.(1991).Heteroduplex repair in extracts of human HeLa cells.Journal of Biological Chemistry 266,3744-3751.
Tomer,G.,Buermeyer,A.B.,Nguyen,M.M.,and Liskay,R.M.(2002).Contribution of human mlh1 and pms2 ATPase activities to DNA mismatch repair.J Biol Chem 277,21801-21809.
Tran,H.T.,Keen,J.D.,Kricker,M.,Resnick,M.A.,and Gordenin,D.A.(1997).Hypermutability of homonucleotide runs in mismatch repair and DNA polymerase proofreading yeast mutants.Molecular and Cellular Biology 17,2859-2865.
Trojan,J.,Zeuzem,S.,Randolph,A.,Hemmerle,C.,Brieger,A.,Raedle,J.,Plotz,G.,Jiricny,J.,and Marra,G.(2002).Functional analysis of hMLH1 variants and HNPCC-related mutations using a human expression system.Gastroenterology 122,211-219.
Tsai,S.Q.,Nguyen,N.T.,Malagon-Lopez,J.,Topkar,V.V.,Aryee,M.J.,and Joung,J.K.(2017).CIRCLE-seq:a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets.Nature Methods 14,607-614.
Umar,A.,Boland,C.R.,Terdiman,J.P.,Syngal,S.,Chapelle,A.D.L.,Ruschoff,J.,Fishel,R.,Lindor,N.M.,Burgart,L.J.,Hamelin,R.,et al.(2004).Revised Bethesda Guidelines for Hereditary Nonpolyposis Colorectal Cancer(Lynch Syndrome)and Microsatellite Instability.JNCI Journal of the National Cancer Institute 96,261-268.
Umar,A.,Boyer,J.C.,and Kunkel,T.A.(1994).DNA loop repair by human cell extracts.Science 266,814.
Warren,J.J.,Pohlhaus,T.J.,Changela,A.,Iyer,R.R.,Modrich,P.L.,and Beese,Lorena S.(2007).Structure of the Human MutSα DNA Lesion Recognition Complex.Molecular Cell 26,579-592.
Wu,J.,Corbett,A.H.,and Berland,K.M.(2009).The Intracellular Mobility of Nuclear Import Receptors and NLS Cargoes.Biophysical Journal 96,3840-3849.
Zhang,Y.,Yuan,F.,Presnell,S.R.,Tian,K.,Gao,Y.,Tomkinson,A.E.,Gu,L.,and Li,G.-M.(2005).Reconstitution of 5′-Directed Human Mismatch Repair in a Purified System.Cell 122,693-705.
Zhou,B.P.,Liao,Y.,Xia,W.,Spohn,B.,Lee,M.-H.,and Hung,M.-C.(2001).Cytoplasmic localization of p21Cip1/WAF1 by Akt-induced phosphorylation in HER-2/neu-overexpressing cells.Nature Cell Biology 3,245-252.
Equivalents and scope
Articles such as "a," "an," and "the" may mean one or more than one, unless indicated to the contrary or otherwise apparent from the context. An embodiment or description that includes "or" between one or more members of a group is considered satisfied if one, more than one, or all of the group members are present in, used in, or otherwise associated with a given product or process, unless indicated to the contrary or otherwise apparent from the context. The present invention includes embodiments in which exactly one member of the group is present in, used in, or otherwise associated with a given product or process. The present invention also includes embodiments in which more than one, or all, of the group members are present in, used in, or otherwise associated with a given product or process.
Furthermore, this disclosure encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are introduced into another claim. For example, any claim that depends from another claim may be modified to include one or more limitations found in any other claim that depends from the same base claim. Where elements are presented as lists, such as in Markush group format, each subgroup of the elements is also disclosed, and any element may be removed from the group. It should be understood that, in general, where the invention or an aspect of the invention is referred to as comprising particular elements and/or features, certain embodiments of the disclosure or aspects of the disclosure consist of, or consist essentially of, such elements and/or features. For the sake of brevity, those embodiments are not specifically set forth herein. It should also be noted that the terms "comprising" and "including" are intended to be open-ended and allow for the inclusion of additional elements or steps. Where ranges are given, the endpoints are included. Furthermore, unless otherwise indicated or otherwise apparent from the context and the understanding of one of ordinary skill in the art, values expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to one tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.
The present application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If a conflict exists between any of the incorporated references and this specification, this specification shall control. Furthermore, any particular embodiment of the invention that falls within the prior art may be expressly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention may be excluded from any claim for any reason, whether or not related to the existence of prior art.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. The scope of the embodiments described herein is not intended to be limited to the above description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications may be made to the present description without departing from the spirit or scope of the invention as defined by the appended claims.

Claims (329)

1. A method for editing a nucleic acid molecule by guided editing, the method comprising: contacting a nucleic acid molecule with a guide editor, a pegRNA, and an inhibitor of the DNA mismatch repair pathway, thereby installing one or more modifications to the nucleic acid molecule at a target site.
2. A method for editing a double stranded target DNA sequence, the method comprising: contacting the double stranded target DNA with (i) a guide editor, (ii) a guide editing guide RNA (pegRNA) and (iii) an inhibitor of the DNA mismatch repair pathway,
wherein the guide editor comprises a nucleic acid programmable DNA binding protein (napDNAbp) and a DNA polymerase,
wherein the pegRNA comprises a spacer sequence, a gRNA core, and an extension arm comprising a DNA synthesis template and a Primer Binding Site (PBS),
wherein the spacer sequence comprises a region complementary to the target strand of the double-stranded target DNA sequence,
wherein the gRNA core is associated with the napDNAbp,
wherein the DNA synthesis template comprises a region complementary to the non-target strand of the double-stranded target DNA sequence and one or more nucleotide edits compared to the target strand of the double-stranded target DNA sequence,
wherein the primer binding site comprises a region complementary to the non-target strand of the double-stranded target DNA sequence, and
wherein the contacting installs the one or more nucleotide edits into the double-stranded target DNA, thereby editing the double-stranded target DNA.
3. The method of claim 2, wherein the PBS comprises a region complementary to a region upstream of a nick site in the non-target strand of the target DNA sequence, wherein the nick site is characteristic of the napDNAbp.
4. The method of claim 1, wherein the method further comprises contacting the nucleic acid molecule with a second-strand nicking gRNA.
5. The method of claim 1, wherein the guided editing efficiency of said guide editor and said pegRNA in the presence of an inhibitor of said DNA mismatch repair pathway is increased, compared to the guided editing efficiency of said guide editor and said pegRNA in the absence of an inhibitor of said DNA mismatch repair pathway, by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
6. The method of claim 1, wherein the frequency of indel formation using said guide editor and said pegRNA in the presence of an inhibitor of said DNA mismatch repair pathway is reduced, compared to the frequency of indel formation using said guide editor and said pegRNA in the absence of an inhibitor of said DNA mismatch repair pathway, by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
7. The method of claim 1, wherein the purity of the editing outcome using said guide editor and said pegRNA in the presence of an inhibitor of said DNA mismatch repair pathway is increased, compared to the purity of the editing outcome using said guide editor and said pegRNA in the absence of an inhibitor of said DNA mismatch repair pathway, by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold, wherein the purity of the editing outcome is measured as the ratio of intended edits to unintended indels.
8. The method of claim 1, wherein said inhibitor of the DNA mismatch repair pathway inhibits the expression or function of one or more proteins (MMR proteins) of said DNA mismatch repair pathway.
9. The method of claim 8, wherein the one or more MMR proteins are selected from the group consisting of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA.
10. The method of claim 8 wherein the one or more MMR proteins is MLH1.
11. The method of claim 10, wherein MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
12. The method of claim 8, wherein the inhibitor is an antibody that inhibits the activity of one or more proteins of the DNA mismatch repair pathway.
13. The method of claim 8, wherein said inhibitor is a small molecule that inhibits the activity of said one or more proteins of said DNA mismatch repair pathway.
14. The method of claim 8, wherein the inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits the activity of the one or more proteins of the DNA mismatch repair pathway.
15. The method of claim 8, wherein the inhibitor is a dominant negative variant of MMR protein that inhibits the activity of wild-type MMR protein.
16. The method of claim 8, wherein the inhibitor is a dominant negative variant of MLH1 that inhibits the activity of MLH1.
17. The method of claim 16, wherein the dominant negative variant of MLH1 comprises, relative to SEQ ID NO: 204, one or more amino acid substitutions, insertions and/or deletions in the ATPase domain, wherein the one or more amino acid changes impair or eliminate the ATPase activity of the dominant negative variant of MLH1.
18. The method of claim 16, wherein the dominant negative variant of MLH1 comprises, relative to SEQ ID NO: 204, one or more amino acid substitutions, insertions and/or deletions in the endonuclease domain, wherein the one or more amino acid changes impair or eliminate the endonuclease activity of the dominant negative variant of MLH1.
19. The method of claim 16, wherein the dominant negative variant of MLH1 is truncated at the C-terminus compared to the wild-type MLH1 protein of SEQ ID NO: 204.
20. The method of claim 16, wherein the dominant negative variant of MLH1 is truncated at the N-terminus compared to the wild-type MLH1 protein of SEQ ID NO: 204.
21. The method of any one of claims 16-20, wherein the dominant negative variant of MLH1 further comprises a Nuclear Localization Signal (NLS) at the N-terminus and/or an NLS at the C-terminus.
22. The method of claim 16, wherein the dominant negative variant is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
23. The method of claim 16, wherein the dominant negative variant comprises (a) an E34A amino acid substitution; (b) a deletion of amino acid 756; (c) a deletion of amino acids 754-756; (d) an E34A amino acid substitution and a deletion of amino acids 754-756; (e) a deletion of amino acids 336-756; (f) an E34A amino acid substitution and a deletion of amino acids 336-756; (g) a deletion of amino acids 1-500; (h) a deletion of amino acids 1-500 and a deletion of amino acids 754-756; or (i) a deletion of amino acids 1-460 and a deletion of amino acids 754-756, optionally wherein the dominant negative variant further comprises an NLS comprising the sequence KRTADGSEFESPKKKRKV at the C- and/or N-terminus.
24. The method of claim 8, wherein the inhibitor is a dominant negative variant of PMS2 that inhibits PMS2 activity, optionally wherein the dominant negative variant comprises one or more amino acid substitutions, insertions and/or deletions compared to the wild-type PMS2 protein, optionally wherein the dominant negative variant comprises (a) an E705K substitution, (b) a deletion of amino acids 2-607, (c) a deletion of amino acids 2-635, (d) a deletion of amino acids 1-635, (e) an E41A substitution and/or (f) a deletion after amino acid 134 compared to the wild-type PMS2 protein.
25. The method of claim 8, wherein the inhibitor is a dominant negative variant of MSH6 that inhibits MSH6 activity, optionally wherein the dominant negative variant comprises one or more amino acid substitutions, insertions, and/or deletions compared to the wild type MSH6 protein, optionally wherein the dominant negative variant comprises (a) a K1140R substitution and/or (b) a deletion of amino acids 2-361 compared to the wild type MSH6 protein.
26. The method of claim 8, wherein the inhibitor comprises CDKN1A.
27. The method of claim 1, wherein the guide editor comprises a napDNAbp and a polymerase.
28. The method of claim 27, wherein the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactivating Cas9 domain, or a Cas9 nickase domain or variant thereof.
29. The method of claim 27, wherein the napDNAbp is a Cas9 nickase comprising one or more amino acid substitutions in the HNH domain, optionally wherein the one or more amino acid substitutions comprises H840X, N854X and/or N863X, wherein X is any amino acid other than the original amino acid, optionally wherein the one or more amino acid substitutions comprises H840A, N854A and/or N863A.
30. The method of claim 27, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
31. The method of claim 27, wherein the napDNAbp comprises the amino acid sequence of any one of SEQ ID NOs: 2, 4-67, or 104, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 104.
32. The method of claim 27, wherein the napDNAbp comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 37 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2 or SEQ ID NO: 37.
33. The method of claim 27, wherein the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase.
34. The method of claim 27, wherein the polymerase is a reverse transcriptase.
35. The method of claim 34, wherein the reverse transcriptase is a retroviral reverse transcriptase, optionally wherein the reverse transcriptase is a Moloney murine leukemia virus reverse transcriptase (MMLV-RT), optionally wherein the MMLV-RT comprises one or more amino acid substitutions, as compared to wild-type MMLV-RT, selected from the group consisting of: D200N, T306K, W313F, T330P, and L603W.
36. The method of claim 34, wherein the reverse transcriptase comprises the amino acid sequence of any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98, optionally wherein the reverse transcriptase comprises the amino acid sequence of SEQ ID NO: 105.
37. The method of claim 27, wherein the napDNAbp and the polymerase of the guided editor are linked to form a fusion protein, optionally wherein the napDNAbp and the polymerase are linked by a linker.
38. The method of claim 37, wherein the linker comprises the amino acid sequence of any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131.
39. The method of claim 37, wherein the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
40. The method of claim 37, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 99 or SEQ ID NO: 107, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 99 or SEQ ID NO: 107.
41. The method of claim 1, wherein said guide editor, said pegRNA and said inhibitor of the DNA mismatch repair pathway are encoded on one or more DNA vectors.
42. The method of claim 41, wherein the one or more DNA vectors comprise an AAV or lentiviral DNA vector.
43. The method of claim 42, wherein the AAV vector is serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
44. The method of any one of claims 27-36, wherein said guide editor and said inhibitor of the DNA mismatch repair pathway are not covalently linked.
45. The method of any one of claims 27-36, wherein the napDNAbp, the polymerase or the guided editor as fusion protein is further linked to an inhibitor of the DNA mismatch repair pathway by a second linker.
46. The method of claim 45, wherein the second linker comprises a self-hydrolyzing linker, optionally wherein the second linker is a T2A linker or a P2A linker.
47. The method of claim 45, wherein the second linker comprises the amino acid sequence of any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236.
48. The method of claim 45, wherein the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
49. The method of claim 1, wherein the one or more modifications to the nucleic acid molecule installed at the target site comprise one or more transitions, one or more transversions, one or more insertions, one or more deletions, one or more inversions, or any combination thereof, and optionally less than 15bp.
50. The method of claim 49, wherein the one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A.
51. The method of claim 49, wherein the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T.
52. The method of claim 1, wherein the one or more modifications comprise changing (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair.
53. The method of claim 1, wherein the one or more modifications comprise an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides, optionally wherein the one or more edits comprise an insertion or deletion of 1-15 nucleotides.
54. The method of claim 1, wherein the one or more modifications comprise correction of a disease-associated mutation in a disease-associated gene.
55. The method of claim 54, wherein the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes mellitus; cancer; and obesity.
56. The method of claim 54, wherein the disease-associated gene is associated with a monogenic disorder selected from the group consisting of: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell anemia; Smith-Lemli-Opitz syndrome; trinucleotide repeat disorders; prion diseases; and Tay-Sachs disease.
57. The method of any one of claims 1-56, wherein the gRNA core comprises minimal sequence homology to the sequence of the target site, optionally wherein the gRNA core comprises no more than 1%, 5%, 10%, 15%, 20%, 25% or 30% sequence homology to the double stranded target DNA either 5, 10, 15, 20, 30, 35, 40, 45 or 50 nucleotides flanking the one or more nucleotide editing positions.
58. A composition for editing a nucleic acid molecule by guided editing, comprising a guided editor, a pegRNA, and an inhibitor of the DNA mismatch repair pathway, wherein the composition is capable of installing one or more modifications to the nucleic acid molecule at a target site.
59. The composition of claim 58, wherein said composition further comprises a second-strand nicking gRNA.
60. The composition of claim 58, wherein in the presence of an inhibitor of the DNA mismatch repair pathway, the guided editing efficiency increases by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
61. The composition of claim 58, wherein, in the presence of the inhibitor of the DNA mismatch repair pathway, the frequency of indel formation using said guided editor and said pegRNA is reduced by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
62. The composition of claim 58, wherein in the presence of an inhibitor of the DNA mismatch repair pathway, the purity of the editing outcome increases by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold, wherein the purity of the editing outcome is measured as the ratio of intended edits to unintended indels.
63. The composition of claim 58, wherein said inhibitor of the DNA mismatch repair pathway inhibits the expression of one or more proteins (MMR proteins) of said DNA mismatch repair pathway.
64. The composition of claim 63, wherein the one or more MMR proteins are selected from the group consisting of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA.
65. The composition of claim 63, wherein the one or more MMR proteins is MLH1.
66. The composition of claim 65, wherein MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
67. The composition of claim 63, wherein said inhibitor is an antibody that inhibits the activity of one or more proteins of said DNA mismatch repair pathway.
68. The composition of claim 63, wherein said inhibitor is a small molecule that inhibits the activity of said one or more proteins of said DNA mismatch repair pathway.
69. The composition of claim 63, wherein said inhibitor is a small interfering RNA (siRNA) or small non-coding microRNA that inhibits the activity of said one or more proteins of said DNA mismatch repair pathway.
70. The composition of claim 63, wherein the inhibitor is a dominant negative variant of an MMR protein that inhibits the activity of a wild-type MMR protein.
71. The composition of claim 63, wherein the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1 activity.
72. The composition of claim 71, wherein, compared to the amino acid sequence of SEQ ID NO: 204, the dominant negative variant of MLH1 comprises one or more amino acid substitutions, insertions, and/or deletions in the ATPase domain, wherein the one or more amino acid changes impair or eliminate the ATPase activity of the dominant negative variant of MLH1.
73. The composition of claim 71, wherein, compared to the amino acid sequence of SEQ ID NO: 204, the dominant negative variant of MLH1 comprises one or more amino acid substitutions, insertions, and/or deletions in the endonuclease domain, wherein said one or more amino acid changes impair or eliminate the endonuclease activity of the dominant negative variant of MLH1.
74. The composition of claim 71, wherein said dominant negative variant of MLH1 is truncated at the C-terminus compared to the wild-type MLH1 protein having the amino acid sequence of SEQ ID NO: 204.
75. The composition of claim 71, wherein said dominant negative variant of MLH1 is truncated at the N-terminus compared to the wild-type MLH1 protein having the amino acid sequence of SEQ ID NO: 204.
76. The composition of any one of claims 71-75, wherein said dominant negative variant of MLH1 further comprises a Nuclear Localization Signal (NLS) at the N-terminus and/or an NLS at the C-terminus.
77. The composition of claim 71, wherein said dominant negative variant is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
78. The composition of claim 71, wherein said dominant negative variant comprises (a) an E34A amino acid substitution; (b) a deletion of amino acid 756; (c) a deletion of amino acids 754-756; (d) an E34A amino acid substitution and a deletion of amino acids 754-756; (e) a deletion of amino acids 336-756; (f) an E34A amino acid substitution and a deletion of amino acids 336-756; (g) a deletion of amino acids 1-500; (h) a deletion of amino acids 1-500 and a deletion of amino acids 754-756; or (i) a deletion of amino acids 1-460 and a deletion of amino acids 754-756, optionally wherein the dominant negative variant further comprises an NLS comprising the sequence KRTADGSEFESPKKKRKV at the C- and/or N-terminus.
79. The composition of claim 73, wherein the inhibitor is a dominant negative variant of PMS2 that inhibits PMS2 activity, optionally wherein the dominant negative variant comprises one or more amino acid substitutions, insertions, and/or deletions compared to the wild-type PMS2 protein, optionally wherein the dominant negative variant comprises (a) an E705K substitution, (b) a deletion of amino acids 2-607, (c) a deletion of amino acids 2-635, (d) a deletion of amino acids 1-635, (e) an E41A substitution, and/or (f) a deletion after amino acid 134 compared to the wild-type PMS2 protein.
80. The composition of claim 73, wherein the inhibitor is a dominant negative variant of MSH6 that inhibits MSH6 activity, optionally wherein the dominant negative variant comprises one or more amino acid substitutions, insertions, and/or deletions compared to the wild-type MSH6 protein, optionally wherein the dominant negative variant comprises (a) a K1140R substitution and/or (b) a deletion of amino acids 2-361 compared to the wild-type MSH6 protein.
81. The composition of claim 73, wherein the inhibitor comprises CDKN1A.
82. The composition of claim 58, wherein said guided editor comprises a napDNAbp and a polymerase.
83. The composition of claim 82, wherein the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactive Cas9 domain, or a Cas9 nickase domain or variant thereof.
84. The composition of claim 82, wherein the napDNAbp is a Cas9 nickase comprising one or more amino acid substitutions in the HNH domain, optionally wherein the one or more amino acid substitutions comprises H840X, N854X and/or N863X, wherein X is any amino acid other than the original amino acid, optionally wherein the one or more amino acid substitutions comprises H840A, N854A and/or N863A.
85. The composition of claim 84, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, Argonaute, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
86. The composition of claim 84, wherein the napDNAbp comprises the amino acid sequence of any one of SEQ ID NOs: 2, 4-67, or PEmax, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 104.
87. The composition of claim 84, wherein the napDNAbp comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 37 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2 or SEQ ID NO: 37.
88. The composition of claim 84, wherein the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase.
89. The composition of claim 84, wherein the polymerase is a reverse transcriptase.
90. The composition of claim 89, wherein said reverse transcriptase is a retroviral reverse transcriptase, optionally wherein said reverse transcriptase is a Moloney murine leukemia virus reverse transcriptase (MMLV-RT), optionally wherein said MMLV-RT comprises one or more amino acid substitutions compared to wild-type MMLV-RT selected from the group consisting of: D200N, T306K, W313F, T330P, and L603W.
91. The composition of claim 89, wherein said reverse transcriptase comprises the amino acid sequence of any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98, optionally wherein the reverse transcriptase comprises the amino acid sequence of SEQ ID NO: 105 or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 105.
92. The composition of claim 84, wherein the napDNAbp and the polymerase of the guided editor are linked to form a fusion protein, optionally wherein the napDNAbp and the polymerase are linked by a linker.
93. The composition of claim 92, wherein the linker comprises the amino acid sequence of any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131.
94. The composition of claim 92, wherein the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
95. The composition of claim 92, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 99 or SEQ ID NO: 107, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 99 or SEQ ID NO: 107.
96. The composition of claim 58, wherein said guided editor, said pegRNA, and said inhibitor of the DNA mismatch repair pathway are encoded on one or more DNA vectors.
97. The composition of claim 96, wherein the one or more DNA vectors comprise an AAV or lentiviral DNA vector.
98. The composition of claim 97, wherein the AAV vector is serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
99. The composition of any one of claims 84-91, wherein said guided editor and said inhibitor of the DNA mismatch repair pathway are not covalently linked.
100. The composition of any one of claims 84-91, wherein said guided editor as a fusion protein is further linked to an inhibitor of said DNA mismatch repair pathway via a second linker.
101. The composition of claim 100, wherein the second linker comprises a self-hydrolyzing linker, optionally wherein the second linker is a T2A linker or a P2A linker.
102. The composition of claim 100, wherein the second linker comprises the amino acid sequence of any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236.
103. The composition of claim 100, wherein the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
104. The composition of claim 58, wherein the one or more modifications to the nucleic acid molecule installed at the target site comprise one or more transitions, one or more transversions, one or more insertions, one or more deletions, one or more inversions, or any combination thereof, optionally wherein the one or more modifications are less than 15 bp.
105. The composition of claim 104, wherein the one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A.
106. The composition of claim 104, wherein the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T.
107. The composition of claim 58, wherein said one or more modifications comprise altering (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair.
108. The composition of claim 58, wherein the one or more modifications comprise an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides, optionally wherein the one or more edits comprise an insertion or deletion of 1-15 nucleotides.
109. The composition of claim 58, wherein said one or more modifications comprise correction of a disease-associated mutation in a disease-associated gene.
110. The composition of claim 109, wherein the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes mellitus; cancer; and obesity.
111. The composition of claim 109, wherein the disease-associated gene is associated with a monogenic disorder selected from the group consisting of: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell anemia; Smith-Lemli-Opitz syndrome; trinucleotide repeat disorders; prion diseases; and Tay-Sachs disease.
112. The composition of any one of claims 58-111, wherein the gRNA core comprises minimal sequence homology to the sequence of the target site, optionally wherein the gRNA core comprises no more than 1%, 5%, 10%, 15%, 20%, 25% or 30% sequence homology to the double stranded target DNA either 5, 10, 15, 20, 30, 35, 40, 45 or 50 nucleotides upstream or downstream of the one or more nucleotide editing positions.
113. A polynucleotide for editing a DNA target site by guided editing, the polynucleotide comprising a nucleic acid sequence encoding a napDNAbp, a polymerase, and an inhibitor of a DNA mismatch repair pathway, wherein the napDNAbp and the polymerase are capable of installing one or more modifications in the DNA target site in the presence of a pegRNA.
114. The polynucleotide of claim 113, wherein the polynucleotide further comprises a nucleic acid sequence encoding a second-strand nicking gRNA.
115. The polynucleotide of claim 113, wherein in the presence of an inhibitor of the DNA mismatch repair pathway, the guided editing efficiency increases by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
116. The polynucleotide of claim 113, wherein in the presence of an inhibitor of the DNA mismatch repair pathway, the indel formation frequency is reduced by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
117. The polynucleotide of claim 113, wherein said inhibitor of the DNA mismatch repair pathway inhibits one or more proteins of said DNA mismatch repair pathway.
118. The polynucleotide of claim 117, wherein said one or more proteins are selected from the group consisting of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA.
119. The polynucleotide of claim 117, wherein the one or more proteins is MLH1.
120. The polynucleotide of claim 119, wherein MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
121. The polynucleotide of claim 113, wherein said inhibitor is an antibody that inhibits the activity of one or more proteins of said DNA mismatch repair pathway.
122. The polynucleotide of claim 113, wherein said inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits the activity of one or more proteins of said DNA mismatch repair pathway.
123. The polynucleotide of claim 113, wherein the inhibitor is a dominant negative variant of an MMR protein that inhibits the activity of a wild-type MMR protein.
124. The polynucleotide of claim 113, wherein the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
125. The polynucleotide of claim 124, wherein said dominant negative variant is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
126. The polynucleotide of claim 113, wherein the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactive Cas9 domain, or a Cas9 nickase domain or variant thereof.
127. The polynucleotide of claim 113, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, Argonaute, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
128. The polynucleotide of claim 113, wherein the napDNAbp comprises the amino acid sequence of any one of SEQ ID NOs: 2, 4-67, or 104, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 104.
129. The polynucleotide of claim 113, wherein the napDNAbp comprises the amino acid sequence of SEQ ID NO: 2 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2.
130. The polynucleotide of claim 113, wherein said polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase.
131. The polynucleotide of claim 113, wherein said polymerase is a reverse transcriptase.
132. The polynucleotide of claim 131, wherein the reverse transcriptase comprises the amino acid sequence of any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98.
133. The polynucleotide of claim 113, wherein the napDNAbp and the polymerase of the guided editor are linked by a linker to form a fusion protein.
134. The polynucleotide of claim 133, wherein the linker comprises the amino acid sequence of any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131.
135. The polynucleotide of claim 133, wherein the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
136. The polynucleotide of claim 113, wherein said polynucleotide is a DNA vector.
137. The polynucleotide of claim 136, wherein the DNA vector is an AAV or lentiviral DNA vector.
138. The polynucleotide of claim 137, wherein the AAV vector is serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
139. The polynucleotide of claim 133, wherein said guided editor as a fusion protein is further linked to an inhibitor of said DNA mismatch repair pathway by a second linker.
140. The polynucleotide of claim 139, wherein said second linker comprises a self-hydrolyzing linker.
141. The polynucleotide of claim 139, wherein said second linker comprises the amino acid sequence of any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236.
142. The polynucleotide of claim 139, wherein the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
143. The polynucleotide of claim 113, wherein the one or more modifications to the nucleic acid molecule installed at the target site comprise one or more transitions, one or more transversions, one or more insertions, one or more deletions, or one or more inversions.
144. The polynucleotide of claim 143, wherein said one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A.
145. The polynucleotide of claim 143, wherein the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T.
146. The polynucleotide of claim 113, wherein the one or more modifications comprise altering (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair.
147. The polynucleotide of claim 113, wherein the one or more modifications comprise an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
148. The polynucleotide of claim 113, wherein the one or more modifications comprise correction of a disease-associated gene.
149. The polynucleotide of claim 148, wherein said disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes mellitus; cancer; and obesity.
150. The polynucleotide of claim 148, wherein said disease-associated gene is associated with a monogenic disorder selected from the group consisting of: adenosine deaminase (ADA) deficiency; alpha-1 antitrypsin deficiency; cystic fibrosis; Duchenne muscular dystrophy; galactosemia; hemochromatosis; Huntington's disease; maple syrup urine disease; Marfan syndrome; neurofibromatosis type 1; pachyonychia congenita; phenylketonuria; severe combined immunodeficiency; sickle cell anemia; Smith-Lemli-Opitz syndrome; trinucleotide repeat disorders; prion diseases; and Tay-Sachs disease.
151. A cell comprising the polynucleotide of any one of claims 113-150, optionally wherein the cell is a mammalian cell, a non-human primate cell, or a human cell.
152. A pharmaceutical composition comprising the composition of any one of claims 41-80 or the polynucleotide of any one of claims 113-151, or the cell of claim 151, and a pharmaceutical excipient.
153. A kit comprising the composition of any one of claims 58-112 or the polynucleotide of any one of claims 113-150, a pharmaceutical excipient, and instructions for editing a DNA target site by guided editing.
154. A composition comprising a first nucleic acid sequence encoding a nucleic acid programmable DNA binding protein (napDNAbp), a second nucleic acid sequence encoding a polymerase, and a third nucleic acid sequence encoding an inhibitor of a DNA mismatch repair pathway.
155. The composition of claim 154, wherein the composition further comprises a guide editing guide RNA (pegRNA) or a nucleic acid sequence encoding the pegRNA, wherein the pegRNA comprises a spacer sequence, a gRNA core, and an extension arm comprising a DNA synthesis template and a Primer Binding Site (PBS), wherein the spacer sequence comprises a region complementary to a target strand of a double-stranded target DNA sequence, wherein the gRNA core is associated with the napDNAbp, wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double-stranded target DNA sequence and one or more nucleotide edits compared to a target strand of the double-stranded target DNA sequence, and wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence.
156. The composition of claim 154, wherein said first nucleic acid sequence and said second nucleic acid sequence are located on a single polynucleotide.
157. The composition of claim 154, wherein said first nucleic acid sequence, said second nucleic acid sequence, and said third nucleic acid sequence are located on a single polynucleotide.
158. The composition of claim 157, wherein said first nucleic acid sequence and said second nucleic acid sequence are linked to encode a napDNAbp-DNA polymerase fusion protein.
159. The composition of claim 158, wherein the first nucleic acid sequence and the second nucleic acid sequence are linked to encode a napDNAbp-DNA polymerase fusion protein, wherein the third nucleic acid sequence is linked to the first nucleic acid sequence or the second nucleic acid sequence via a linker nucleic acid sequence, optionally wherein the linker nucleic acid sequence encodes a peptide linker, optionally wherein the peptide linker is a self-cleaving linker, optionally wherein the self-cleaving linker is a T2A linker or a P2A linker, optionally wherein the self-cleaving linker comprises the amino acid sequence of any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236.
160. The composition of claim 156 or 157, wherein said single polynucleotide is part of a DNA vector.
161. The composition of claim 156 or 157, wherein said single polynucleotide is part of an mRNA sequence.
162. The composition of claim 160, wherein the DNA vector is an AAV or lentiviral DNA vector, optionally wherein the DNA vector further comprises a promoter.
163. The composition of claim 161, wherein the mRNA sequence further comprises a promoter.
164. A method for editing a nucleic acid molecule by guided editing, the method comprising: contacting a nucleic acid molecule with a guide editor and a guide editing guide RNA (pegRNA), wherein the guide editor comprises a nucleic acid programmable DNA binding protein and a DNA polymerase, wherein the pegRNA comprises a spacer sequence, a gRNA core, and an extension arm comprising a DNA synthesis template and a Primer Binding Site (PBS), wherein the DNA synthesis template comprises three or more consecutive nucleotide mismatches relative to an endogenous sequence of a target site on the nucleic acid molecule, wherein the three or more consecutive nucleotide mismatches comprise (i) insertions, deletions, or substitutions of x consecutive nucleotides that correct a mutation (e.g., a disease-related mutation), and (ii) insertions, deletions, or substitutions of y consecutive nucleotides immediately adjacent to the x consecutive nucleotides, wherein the insertions, deletions, or substitutions of y consecutive nucleotides are silent mutations, wherein (x+y) is an integer no less than 3, wherein y is an integer no less than 1, and wherein incorporation of the silent mutations increases the efficiency, decreases the frequency of unintended indels, and/or increases the purity of the editing results of guide editing.
165. The method of claim 164, wherein at least one of the three or more consecutive nucleotide mismatches results in a change in the amino acid sequence of the protein expressed from the nucleic acid molecule, and wherein at least one of the remaining mismatches among the three or more consecutive nucleotide mismatches is a silent mutation.
166. The method of claim 165, wherein the silent mutation is located in a coding region of the target nucleic acid molecule.
167. The method of claim 166, wherein the silent mutation introduces one or more alternative codons encoding the same amino acid as the unedited nucleic acid molecule into the nucleic acid molecule.
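Claim 167 describes a silent mutation in a coding region as one that introduces an alternative codon encoding the same amino acid. A minimal sketch of that check is shown below, assuming the standard genetic code (NCBI translation table 1) and DNA codons. The helper names `is_silent` and `synonymous_codons` are hypothetical and are not taken from the disclosure.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest. '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_silent(ref_codon: str, alt_codon: str) -> bool:
    """True if the codons differ but encode the same amino acid (or stop)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    return ref_codon != alt_codon and CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


def synonymous_codons(codon: str) -> list[str]:
    """Alternative codons that encode the same amino acid as `codon`."""
    codon = codon.upper()
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)


if __name__ == "__main__":
    print(is_silent("CTG", "CTC"))    # both encode leucine
    print(is_silent("GAG", "GTG"))    # glutamate vs valine
    print(synonymous_codons("GAG"))
```

This sketch checks only the encoded amino acid. It does not assess effects on splicing, regulation, or RNA lifetime, which the claims address separately for non-coding regions.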
168. The method of claim 165, wherein the silent mutation is located in a non-coding region of the target nucleic acid molecule.
169. The method of claim 168, wherein the silent mutation does not affect splicing, gene regulation, RNA lifetime, or other biological properties of a target site on the nucleic acid molecule.
170. The method of any one of claims 164-169, wherein the DNA synthesis template of the pegRNA comprises four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to the endogenous sequence of the target site on the nucleic acid molecule.
171. The method of any one of claims 164-170, wherein the three or more consecutive nucleotide mismatches evade correction of the DNA mismatch repair pathway.
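The "three or more consecutive nucleotide mismatches" condition of claims 164-171 can be illustrated for the substitution-only case by comparing the endogenous sequence with the intended edited sequence position by position. This is a simplified sketch under stated assumptions: both sequences are given on the same strand and have equal length, so insertions and deletions are not handled. The function names are hypothetical.

```python
def mismatch_runs(endogenous: str, edited: str) -> list[tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive mismatches.

    Assumes equal-length sequences on the same strand (substitutions only).
    """
    if len(endogenous) != len(edited):
        raise ValueError("sequences must have equal length in this sketch")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(endogenous.upper(), edited.upper())):
        if a != b:
            if start is None:
                start = i          # a new run of mismatches begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the sequence
        runs.append((start, len(endogenous) - start))
    return runs


def has_consecutive_mismatches(endogenous: str, edited: str, minimum: int = 3) -> bool:
    """True if any run of consecutive mismatches is at least `minimum` long."""
    return any(length >= minimum for _, length in mismatch_runs(endogenous, edited))


if __name__ == "__main__":
    print(mismatch_runs("ACGTACGT", "ACCCCCGT"))
    print(has_consecutive_mismatches("ACGTACGT", "ACCCCCGT"))
```

In the terms of claim 164, a run of length x + y would combine the x corrective nucleotides with the y adjacent silent changes; deciding which positions are silent requires the reading frame and is outside this sketch.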
172. The method of any one of claims 164-171, wherein, compared to a method using a pegRNA comprising a DNA synthesis template comprising only one consecutive nucleotide mismatch relative to the endogenous sequence of the target site on the nucleic acid molecule, the guide editing efficiency is increased by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
173. The method of any one of claims 164-171, wherein, compared to a method using a pegRNA comprising a DNA synthesis template comprising only one consecutive nucleotide mismatch relative to the endogenous sequence of the target site on the nucleic acid molecule, the indel formation frequency is reduced by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
174. A method for editing a nucleic acid molecule by guided editing, the method comprising: contacting a nucleic acid molecule with a guide editor and a pegRNA, wherein an extension arm of the pegRNA comprises a DNA synthesis template comprising an insertion or deletion of 10 or more nucleotides relative to an endogenous sequence of a target site on the nucleic acid molecule.
175. The method of claim 174, wherein the DNA synthesis template comprises an insertion or deletion of 11 nucleotides or more, 12 nucleotides or more, 13 nucleotides or more, 14 nucleotides or more, 15 nucleotides or more, 16 nucleotides or more, 17 nucleotides or more, 18 nucleotides or more, 19 nucleotides or more, 20 nucleotides or more, 21 nucleotides or more, 22 nucleotides or more, 23 nucleotides or more, 24 nucleotides or more, or 25 nucleotides or more, relative to the endogenous sequence of the target site on the nucleic acid molecule.
176. The method of claim 175, wherein the DNA synthesis template comprises an insertion or deletion of 15 or more nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule.
177. The method of any one of claims 174-176, wherein an insertion or deletion of 10 or more nucleotides relative to the endogenous sequence of the target site on the nucleic acid molecule evades correction by the DNA mismatch repair pathway.
178. The method of any one of claims 174-176, wherein the guided editing efficiency is increased by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, or at least 10.0-fold relative to a method using a DNA synthesis template comprising an insertion or deletion of less than 10 nucleotides relative to an endogenous sequence of a target site on the nucleic acid molecule.
179. The method of any one of claims 174-176, wherein, relative to a method using a pegRNA comprising a DNA synthesis template comprising an insertion or deletion of less than 10 nucleotides relative to an endogenous sequence of a target site on the nucleic acid molecule, the indel formation frequency is reduced by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
180. A guided editing guide RNA (pegRNA) for editing a nucleic acid molecule by guided editing, wherein the pegRNA comprises a spacer sequence, a gRNA core, and an extension arm comprising a DNA synthesis template and a Primer Binding Site (PBS), wherein the DNA synthesis template comprises three or more consecutive nucleotide mismatches relative to an endogenous sequence of a target site on the nucleic acid molecule, wherein the three or more consecutive nucleotide mismatches comprise (i) insertions, deletions, or substitutions of x nucleotides that correct a mutation (e.g., a disease-related mutation), and (ii) insertions, deletions, or substitutions of y nucleotides immediately adjacent to the x nucleotides, wherein the insertions, deletions, or substitutions of y nucleotides are silent mutations, wherein (x+y) is an integer no less than 3, wherein y is an integer no less than 1, and wherein incorporation of the silent mutations increases the efficiency, reduces the frequency of unintended indels, and/or improves the purity of the editing results of guided editing.
181. The pegRNA of claim 180, wherein at least one of the three or more consecutive nucleotide mismatches results in a change in an amino acid sequence of a protein expressed from the nucleic acid molecule, and wherein at least one of the remaining mismatches among the three or more consecutive nucleotide mismatches is a silent mutation.
182. The pegRNA of claim 181, wherein the silent mutation is located in a coding region of the target nucleic acid molecule.
183. The pegRNA of claim 182, wherein the silent mutation introduces one or more alternative codons encoding the same amino acid as an unedited nucleic acid molecule into the nucleic acid molecule.
184. The pegRNA of claim 181, wherein the silent mutation is located in a non-coding region of the target nucleic acid molecule.
185. The pegRNA of claim 183, wherein the silent mutation is in a region of the nucleic acid molecule that does not affect splicing, gene regulation, RNA lifetime, or other biological properties of a target site on the nucleic acid molecule.
186. The pegRNA of any one of claims 180-185, wherein the extension arm of the pegRNA comprises four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to an endogenous sequence of a target site on the nucleic acid molecule.
187. The pegRNA of any one of claims 180-186, wherein the three or more consecutive nucleotide mismatches escape the DNA mismatch repair pathway.
188. The pegRNA of any one of claims 180-186, wherein use of the pegRNA in guided editing results in an increase in guided editing efficiency of at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold relative to a pegRNA comprising a DNA synthesis template comprising only one consecutive nucleotide mismatch relative to an endogenous sequence of a target site on the nucleic acid molecule.
189. The pegRNA of any one of claims 180-186, wherein use of the pegRNA in guided editing results in a reduction in the indel formation frequency of at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, or at least 10.0-fold relative to a pegRNA comprising a DNA synthesis template that comprises only one consecutive nucleotide mismatch relative to an endogenous sequence of a target site on the nucleic acid molecule.
190. A guided editor system comprising the pegRNA of any one of claims 180-189 and a guided editor, wherein the guided editor comprises a napDNAbp and a polymerase.
191. The guided editor system of claim 190, further comprising an inhibitor of a DNA mismatch repair pathway.
192. A guide editor comprising a fusion protein comprising (i) a nucleic acid programmable DNA binding protein (napDNAbp) and (ii) a DNA polymerase, wherein the napDNAbp is a Cas9 nickase (nCas9), and wherein the nCas9 comprises, relative to the amino acid sequence set forth in SEQ ID NO: 2, an R221K amino acid substitution, an N39K amino acid substitution, and an amino acid substitution that inactivates HNH domain nuclease activity, or corresponding amino acid substitutions thereof.
193. The guide editor of claim 192, wherein the nCas9 comprises, relative to the amino acid sequence set forth in SEQ ID NO: 2, R221K, N39K, and H840A amino acid substitutions.
194. The guide editor of claim 193, wherein the nCas9 and the DNA polymerase are linked by a linker, optionally wherein the linker comprises the amino acid sequence of SEQ ID NO: x5, optionally wherein the guide editor further comprises an SV40 NLS at the N-terminus, optionally wherein the guide editor further comprises an SV40 NLS and/or a c-Myc NLS at the C-terminus.
195. The guide editor of claim 192, comprising the amino acid sequence of SEQ ID NO: 99, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 99.
196. A guided editor system comprising the guided editor of any one of claims 192-195 and an inhibitor of a DNA mismatch repair pathway.
197. A guided editor system comprising the guided editor of any one of claims 192-195 and a guided editing guide RNA (pegRNA).
198. A polynucleotide encoding the guided editor of any one of claims 192-195.
199. The polynucleotide of claim 198, wherein said polynucleotide is DNA.
200. The polynucleotide of claim 198, wherein the polynucleotide is mRNA.
201. A vector comprising the polynucleotide of claim 198, optionally wherein expression of the fusion protein is under the control of a promoter, optionally wherein the promoter is a U6 promoter.
202. A guided editor system for site-specific genomic modification comprising (a) a guided editor comprising (i) a nucleic acid programmable DNA binding protein (napDNAbp) and (ii) a DNA polymerase, and (b) an inhibitor of a DNA mismatch repair pathway.
203. The guided editor system of claim 202, wherein the inhibitor of a DNA mismatch repair pathway inhibits one or more proteins of the DNA mismatch repair pathway.
204. The guided editor system of claim 203, wherein the one or more proteins are selected from the group consisting of MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA.
205. The guided editor system of claim 203, wherein the one or more proteins is MLH1.
206. The guided editor system of claim 205, wherein MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
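Several claims recite a minimum percent sequence identity to a reference SEQ ID NO. As a rough illustration of how such a threshold could be checked, the sketch below computes percent identity over a pairwise alignment that has already been produced by some alignment tool. The choice of denominator (all alignment columns that are not gap-versus-gap) is an assumption of this example, not a definition taken from the claims, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Both strings must be the same length, with `gap` marking gaps.
    Assumption: identity = matching columns / columns that are not gap-vs-gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue               # ignore columns that are gaps in both
        columns += 1
        if a == b:                 # a == b and not both gaps means a real match
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if percent identity is at least `threshold` (e.g. 70, 80, 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned peptides, not sequences from the disclosure.
    print(round(percent_identity("MKT-AY", "MKTLAF"), 2))
    print(meets_identity_threshold("MKT-AY", "MKTLAF", 70))
```

Different tools and conventions (for example, dividing by the length of the shorter sequence) can give different values for the same pair, so the convention should be fixed before comparing against a claimed threshold.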
207. The guided editor system of claim 202, wherein the inhibitor is an antibody that inhibits an activity of one or more proteins of the DNA mismatch repair pathway.
208. The guided editor system of claim 202, wherein the inhibitor is a small molecule that inhibits an activity of one or more proteins of the DNA mismatch repair pathway.
209. The guided editor system of claim 202, wherein the inhibitor is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits activity of one or more proteins of the DNA mismatch repair pathway.
210. The guided editor system of claim 202, wherein the inhibitor is a dominant negative variant of an MMR protein that inhibits an activity of a wild-type MMR protein.
211. The guided editor system of claim 202, wherein the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
212. The guided editor system of claim 202, wherein the dominant negative variant is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
213. The guided editor system of claim 202, wherein the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactivating Cas9 domain, or a Cas9 nickase domain or variant thereof.
214. The guided editor system of claim 202, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, Argonaute, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
215. The guided editor system of claim 202, wherein the napDNAbp comprises the amino acid sequence of any one of SEQ ID NOs: 2, 4-67, or 104, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 104.
216. The guided editor system of claim 202, wherein the napDNAbp comprises the amino acid sequence of SEQ ID NO: 2 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2.
217. The guided editor system of claim 202, wherein the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase.
218. The guided editor system of claim 202, wherein the polymerase is a reverse transcriptase.
219. The guided editor system of claim 218, wherein the reverse transcriptase comprises the amino acid sequence of any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98.
220. The guided editor system of claim 202, wherein the napDNAbp and the polymerase of the guided editor are linked by a linker to form a fusion protein, and optionally wherein the inhibitor is linked to the napDNAbp or the polymerase by a second linker.
221. The guided editor system of claim 220, wherein the linker and/or the second linker comprises the amino acid sequence of any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131.
222. The guided editor system of claim 220, wherein the linker and/or the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
223. The guided editor system of claim 202, wherein the guided editor is PE1 of SEQ ID NO: 100, or has an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or up to 100% sequence identity to SEQ ID NO: 100.
224. The guided editor system of claim 202, wherein the guided editor is PE2 of SEQ ID NO: 107, or has an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or up to 100% sequence identity to SEQ ID NO: 107.
225. The guided editor system of claim 202, wherein the guided editor is PE1 of SEQ ID NO: 100, or has an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or up to 100% sequence identity to SEQ ID NO: 100, and the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
226. The guided editor system of claim 225, wherein the dominant negative variant of MLH1 is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
227. The guided editor system of claim 202, wherein the guided editor is PE2 of SEQ ID NO: 107, or has an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or up to 100% sequence identity to SEQ ID NO: 107, and the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
228. The guided editor system of claim 227, wherein the dominant negative variant of MLH1 is (a) MLH1 E34A (SEQ ID NO: 222), (b) MLH1 Δ756 (SEQ ID NO: 208), (c) MLH1 Δ754-756 (SEQ ID NO: 209), (d) MLH1 E34A Δ754-756 (SEQ ID NO: 210), (e) MLH1 1-335 (SEQ ID NO: 211), (f) MLH1 1-335 E34A (SEQ ID NO: 212), (g) MLH1 1-335 NLS SV40 (SEQ ID NO: 213), (h) MLH1 501-756 (SEQ ID NO: 215), (i) MLH1 501-753 (SEQ ID NO: 216), (j) MLH1 461-753 (SEQ ID NO: 218), or (k) NLS SV40 MLH1 501-753 (SEQ ID NO: 223), or comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to any one of SEQ ID NOs: 208-213, 215, 216, 218, 222, or 223.
229. The guided editor system of claim 202, wherein the guided editor is PE2 of SEQ ID NO: 107, or has an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or up to 100% sequence identity to SEQ ID NO: 107, and the inhibitor is a dominant negative variant of MLH1 that inhibits MLH1.
230. The guided editor system of claim 202, wherein the DNA polymerase is a reverse transcriptase.
231. The guided editor system of claim 230, wherein the reverse transcriptase is a retroviral reverse transcriptase.
232. The guided editor system of claim 230, wherein the reverse transcriptase lacks RNase activity.
233. The guided editor system of claim 230, wherein the reverse transcriptase is Moloney murine leukemia virus reverse transcriptase (MMLV-RT).
234. The guided editor system of claim 233, wherein the MMLV-RT comprises an amino acid sequence having at least 85% identity to a sequence selected from the group consisting of SEQ ID NOs: 89 and 701-716.
235. The guided editor system of claim 202, wherein the napDNAbp is a CRISPR-associated (Cas) nuclease.
236. The guided editor system of claim 235, wherein the napDNAbp comprises a Cas9 nuclease domain.
237. The guided editor system of claim 236, wherein the Cas9 nuclease domain is a nickase comprising, relative to the sequence set forth in SEQ ID NO: 18, an H840X substitution or a corresponding substitution, wherein X is any amino acid other than histidine.
238. The guided editor system of any one of claims 202-237, further comprising a pegRNA capable of complexing with the napDNAbp of the guided editor and programming the napDNAbp to bind a target DNA sequence.
239. A nucleic acid molecule encoding the guided editor system of any one of claims 202-238 or a component thereof.
240. A method for precisely installing nucleotide edits of at least 15bp in a double-stranded target DNA sequence under conditions sufficient to evade a DNA mismatch repair pathway, the method comprising: contacting the double stranded target DNA sequence with a guide editor comprising a nucleic acid programmable DNA binding protein (napDNAbp), a DNA polymerase, and a guide editing guide RNA (PEgRNA), wherein the PEgRNA comprises a spacer region that hybridizes to a first strand of the double stranded target DNA sequence, an extension arm that hybridizes to a second strand of the double stranded target DNA sequence, a DNA synthesis template comprising the nucleotide edits, and a gRNA core that interacts with the napDNAbp, and wherein the PEgRNA directs the guide editor to install the nucleotide edits in the double stranded target DNA sequence.
241. The method of claim 240, wherein the nucleotide edits are deletions of at least 15bp in length.
242. The method of claim 240, wherein the nucleotide edits are insertions of at least 15bp in length.
243. The method of claim 240, wherein the nucleotide edit is at least 16 bp, 17 bp, 18 bp, 19 bp, 20 bp, 21 bp, 22 bp, 23 bp, 24 bp, or 25 bp in length.
244. A method for editing a nucleic acid molecule by guided editing, the method comprising: contacting a nucleic acid molecule with a guide editor and a pegRNA, thereby installing one or more modifications to the nucleic acid molecule at a target site, wherein the nucleic acid molecule is in a cell comprising a knockout of one or more genes involved in the DNA mismatch repair (MMR) pathway.
245. The method of claim 244, wherein the method further comprises contacting the nucleic acid molecule with a second-strand nicking gRNA.
246. The method of claim 244, wherein the guide editing efficiency is increased by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold relative to the method performed in a cell that does not comprise a knockout of one or more genes involved in MMR.
247. The method of claim 244, wherein the indel formation frequency is reduced by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold relative to the method performed in a cell that does not comprise a knockout of one or more genes involved in MMR.
248. The method of claim 244, wherein the one or more genes involved in MMR are selected from genes encoding the proteins MLH1, PMS2 (or MutLα), PMS1 (or MutLβ), MLH3 (or MutLγ), MutSα (MSH2-MSH6), MutSβ (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA.
249. The method of claim 248, wherein the one or more genes is the gene encoding MLH1.
250. The method of claim 249, wherein MLH1 comprises the amino acid sequence of SEQ ID NO: 204, or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity to SEQ ID NO: 204.
251. The method of claim 244, wherein the guide editor comprises a napDNAbp and a polymerase.
252. The method of claim 251, wherein the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactive Cas9 domain, or a Cas9 nickase domain or variant thereof.
253. The method of claim 251, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
254. The method of claim 251, wherein the napDNAbp comprises the amino acid sequence of any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 99 (PEmax).
255. The method of claim 251, wherein the napDNAbp comprises the amino acid sequence of SEQ ID NO: 2 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2.
256. The method of claim 251, wherein the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase.
257. The method of claim 251, wherein the polymerase is a reverse transcriptase.
258. The method of claim 257, wherein the reverse transcriptase comprises the amino acid sequence of any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98.
259. The method of claim 251, wherein the napDNAbp and the polymerase of the guide editor are linked by a linker to form a fusion protein.
260. The method of claim 259, wherein the linker comprises the amino acid sequence of any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131.
261. The method of claim 259, wherein the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
262. The method of claim 244, wherein the one or more modifications to the nucleic acid molecule installed at the target site comprise one or more transitions, one or more transversions, one or more insertions, one or more deletions, or one or more inversions, and optionally wherein the one or more modifications are less than 15 bp.
263. The method of claim 262, wherein the one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A.
264. The method of claim 262, wherein the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T.
265. The method of claim 244, wherein the one or more modifications comprise changing (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair.
266. The method of claim 244, wherein the one or more modifications comprise an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
267. The method of claim 244, wherein the one or more modifications comprise correction of a disease-associated gene.
268. The method of claim 267, wherein the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; hypertension; Alzheimer's disease; arthritis; diabetes mellitus; cancer; and obesity.
269. A method of editing a nucleic acid molecule by guided editing, the method comprising: contacting a nucleic acid molecule with a guide editor, a pegRNA, and an inhibitor of p53, thereby installing one or more modifications to the nucleic acid molecule at a target site.
270. The method of claim 269, wherein the method further comprises contacting the nucleic acid molecule with a second-strand nicking gRNA.
271. The method of claim 269, wherein, in the presence of the inhibitor of p53, the guide editing efficiency is increased by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
272. The method of claim 269, wherein, in the presence of the inhibitor of p53, the indel formation frequency is reduced by at least 1.5-fold, at least 2.0-fold, at least 2.5-fold, at least 3.0-fold, at least 3.5-fold, at least 4.0-fold, at least 4.5-fold, at least 5.0-fold, at least 5.5-fold, at least 6.0-fold, at least 6.5-fold, at least 7.0-fold, at least 7.5-fold, at least 8.0-fold, at least 8.5-fold, at least 9.0-fold, at least 9.5-fold, at least 10.0-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 16-fold, at least 17-fold, at least 18-fold, at least 19-fold, at least 20-fold, at least 21-fold, at least 22-fold, at least 23-fold, at least 24-fold, at least 25-fold, at least 26-fold, at least 27-fold, at least 28-fold, at least 29-fold, at least 30-fold, at least 31-fold, at least 32-fold, at least 33-fold, at least 34-fold, at least 35-fold, at least 36-fold, at least 37-fold, at least 38-fold, at least 39-fold, at least 40-fold, at least 41-fold, at least 42-fold, at least 43-fold, at least 44-fold, at least 45-fold, at least 46-fold, at least 47-fold, at least 48-fold, at least 49-fold, at least 50-fold, at least 51-fold, at least 52-fold, at least 53-fold, at least 54-fold, at least 55-fold, at least 56-fold, at least 57-fold, at least 58-fold, at least 59-fold, at least 60-fold, at least 61-fold, at least 62-fold, at least 63-fold, at least 64-fold, at least 65-fold, at least 66-fold, at least 67-fold, at least 68-fold, at least 69-fold, at least 70-fold, at least 71-fold, at least 72-fold, at least 73-fold, at least 74-fold, or at least 75-fold.
273. The method of claim 269, wherein the inhibitor of p53 is a protein.
274. The method of claim 273, wherein the protein is i53.
275. The method of claim 269, wherein the inhibitor of p53 is an antibody that inhibits p53 activity.
276. The method of claim 269, wherein the inhibitor of p53 is a small molecule that inhibits p53 activity.
277. The method of claim 269, wherein the inhibitor of p53 is a small interfering RNA (siRNA) or a small non-coding microRNA that inhibits p53 activity.
278. The method of claim 269, wherein the guide editor comprises a napDNAbp and a polymerase.
279. The method of claim 278, wherein the napDNAbp is a nuclease-active Cas9 domain, a nuclease-inactive Cas9 domain, or a Cas9 nickase domain or variant thereof.
280. The method of claim 278, wherein the napDNAbp is selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute, and optionally has nickase activity.
281. The method of claim 278, wherein the napDNAbp comprises the amino acid sequence of any one of SEQ ID NOs: 2, 4-67, or 104, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 2, 4-67, or 104.
282. The method of claim 278, wherein the napDNAbp comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 37 (i.e., the napDNAbp of PE1 and PE2), or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to SEQ ID NO: 2 or SEQ ID NO: 37.
283. The method of claim 278, wherein the polymerase is a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase.
284. The method of claim 278, wherein the polymerase is a reverse transcriptase.
285. The method of claim 284, wherein the reverse transcriptase comprises the amino acid sequence of any one of SEQ ID NOs: 69-98, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 69-98.
286. The method of claim 278, wherein the napDNAbp and the polymerase of the guide editor are linked by a linker to form a fusion protein.
287. The method of claim 286, wherein the linker comprises the amino acid sequence of any one of SEQ ID NOs: 102 or 118-131, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102 or 118-131.
288. The method of claim 286, wherein the linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
289. The method of claim 269, wherein the guide editor, the pegRNA, and the inhibitor of p53 are encoded on one or more DNA vectors.
290. The method of claim 289, wherein the one or more DNA vectors comprise an AAV or lentiviral DNA vector.
291. The method of claim 290, wherein the AAV vector is serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
292. The method of claim 286, wherein the guide editor, as a fusion protein, is further linked to the inhibitor of p53 through a second linker.
293. The method of claim 292, wherein the second linker comprises a self-hydrolyzing linker.
294. The method of claim 292, wherein the second linker comprises the amino acid sequence of any one of SEQ ID NOs: 102, 118-131, or 233-236, or an amino acid sequence having at least 80%, 85%, 90%, 95%, or 99% sequence identity to any one of SEQ ID NOs: 102, 118-131, or 233-236.
295. The method of claim 292, wherein the second linker is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acids in length.
296. The method of claim 269, wherein the one or more modifications to the nucleic acid molecule installed at the target site comprise one or more transitions, one or more transversions, one or more insertions, one or more deletions, or one or more inversions, and optionally wherein the one or more modifications are less than 15bp.
297. The method of claim 296, wherein the one or more transitions are selected from the group consisting of: (a) T to C; (b) A to G; (c) C to T; and (d) G to A.
298. The method of claim 296, wherein the one or more transversions are selected from the group consisting of: (a) T to A; (b) T to G; (c) C to G; (d) C to A; (e) A to T; (f) A to C; (g) G to C; and (h) G to T.
299. The method of claim 269, wherein the one or more modifications comprise changing (1) a G:C base pair to a T:A base pair, (2) a G:C base pair to an A:T base pair, (3) a G:C base pair to a C:G base pair, (4) a T:A base pair to a G:C base pair, (5) a T:A base pair to an A:T base pair, (6) a T:A base pair to a C:G base pair, (7) a C:G base pair to a G:C base pair, (8) a C:G base pair to a T:A base pair, (9) a C:G base pair to an A:T base pair, (10) an A:T base pair to a T:A base pair, (11) an A:T base pair to a G:C base pair, or (12) an A:T base pair to a C:G base pair.
300. The method of claim 269, wherein the one or more modifications comprise an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
301. The method of claim 269, wherein the one or more modifications comprise correction of a disease-associated gene.
302. The method of claim 301, wherein the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease, hypertension, Alzheimer's disease, arthritis, diabetes, cancer, and obesity.
303. The method of claim 301, wherein the disease-associated gene is associated with a monogenic disorder selected from the group consisting of: adenosine deaminase (ADA) deficiency, alpha-1 antitrypsin deficiency, cystic fibrosis, Duchenne muscular dystrophy, galactosemia, hemochromatosis, Huntington's disease, maple syrup urine disease, Marfan syndrome, neurofibromatosis type 1, congenital Creutzfeldt-Jakob disease, phenylketonuria, severe combined immunodeficiency, sickle cell anemia, Smith-Lemli-Opitz syndrome, trinucleotide repeat disorders, prion disease, and Tay-Sachs disease.
304. The method of any one of the preceding claims, wherein the nucleic acid molecule is in a cell.
305. The method of claim 304, wherein the cell is a mammalian cell, a non-human primate cell, or a human cell.
306. The method of claim 304, wherein the cell is ex vivo.
307. The method of claim 304, wherein the cell is in a subject, optionally wherein the subject is a human.
308. A method for treating a disease in a subject in need thereof, the method comprising administering to the subject: (i) a guide editor, (ii) a pegRNA, and (iii) an inhibitor of the DNA mismatch repair pathway, wherein the guide editor comprises a nucleic acid programmable DNA binding protein (napDNAbp) and a DNA polymerase,
wherein the pegRNA comprises a spacer sequence, a gRNA core, and an extension arm comprising a DNA synthesis template and a Primer Binding Site (PBS),
wherein the spacer sequence comprises a region complementary to a target strand of a double stranded target DNA sequence in the subject,
wherein the gRNA core is associated with the napDNAbp,
wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double-stranded target DNA sequence and one or more nucleotide edits compared to a target strand of the double-stranded target DNA sequence,
wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence,
wherein the guide editor and the pegRNA install one or more nucleotide edits in the double stranded target DNA, wherein installation of the one or more nucleotide edits corrects one or more mutations in the double stranded target DNA that are associated with the disease, thereby treating the disease in the subject.
309. A method for treating a disease in a subject in need thereof, the method comprising administering to the subject: (i) a guide editor and (ii) a pegRNA, wherein the guide editor comprises a nucleic acid programmable DNA binding protein (napDNAbp) and a DNA polymerase,
wherein the pegRNA comprises a spacer sequence, a gRNA core, and an extension arm comprising a DNA synthesis template and a Primer Binding Site (PBS),
wherein the spacer sequence comprises a region complementary to a target strand of a double stranded target DNA sequence in the subject,
wherein the gRNA core is associated with the napDNAbp,
wherein the DNA synthesis template comprises a region complementary to a non-target strand of the double-stranded target DNA sequence and comprises three or more consecutive nucleotide mismatches relative to an endogenous sequence of the double-stranded target DNA sequence, wherein the three or more consecutive nucleotide mismatches comprise (i) insertions, deletions, or substitutions of x nucleotides that correct a mutation (e.g., a disease-related mutation), and (ii) insertions, deletions, or substitutions of y nucleotides immediately adjacent to the x nucleotides, wherein the insertions, deletions, or substitutions of y nucleotides are silent mutations, wherein (x+y) is an integer no less than 3, wherein y is an integer no less than 1, and wherein incorporation of the silent mutations increases the efficiency of guided editing, reduces the frequency of unintended indels, and/or improves the purity of editing results,
wherein the primer binding site comprises a region complementary to a non-target strand of the double-stranded target DNA sequence,
wherein the guide editor and the pegRNA install an x nucleotide insertion, deletion, or substitution in the double stranded target DNA, wherein installation of the x nucleotide insertion, deletion, or substitution corrects one or more mutations in the double stranded target DNA that are associated with the disease, thereby treating the disease in the subject.
310. The method of claim 308 or 309, wherein said subject is a human.
311. A fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) domain and a domain comprising RNA-dependent DNA polymerase activity, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 99, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 99.
312. A fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) domain and a domain comprising RNA-dependent DNA polymerase activity, wherein the napDNAbp comprises the amino acid sequence of SEQ ID NO: 104, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 104.
313. A fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) domain and a domain comprising RNA-dependent DNA polymerase activity, wherein the domain comprising RNA-dependent DNA polymerase activity comprises the amino acid sequence of SEQ ID NO: 98, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 98.
314. The fusion protein of any one of claims 311-313, further comprising a linker connecting the napDNAbp and the domain comprising RNA-dependent DNA polymerase activity.
315. The fusion protein of claim 314, wherein the linker comprises the amino acid sequence of SEQ ID NO: 105.
316. The fusion protein of claim 311, wherein the napDNAbp is a Cas9 nickase.
317. The fusion protein of claim 316, wherein the Cas9 nickase comprises, relative to SEQ ID NO: 37, an H840A substitution and at least one substitution at R221 or N394.
318. A complex comprising the fusion protein of any one of claims 311-317 and PEgRNA, wherein the PEgRNA directs the fusion protein to a target DNA sequence for guided editing.
319. The complex of claim 318, wherein the PEgRNA comprises a guide RNA and a nucleic acid extension arm located at the 3 'or 5' end of the guide RNA.
320. The complex of claim 319, wherein the PEgRNA is capable of binding to a napDNAbp and directing the napDNAbp to the target DNA sequence.
321. A polynucleotide encoding the fusion protein of any one of claims 311-317.
322. A vector comprising the polynucleotide of claim 321, wherein expression of the fusion protein is under the control of a promoter.
323. The vector of claim 322, wherein the promoter is a U6 promoter.
324. A cell comprising the fusion protein of any one of claims 311-317 and PEgRNA bound to napDNAbp of the fusion protein.
325. A cell comprising the complex of any one of claims 318-320.
326. A pharmaceutical composition comprising: (i) the fusion protein of any one of claims 311-317, the complex of any one of claims 318-320, the polynucleotide of claim 321, or the vector of claim 322 or 323; and (ii) a pharmaceutically acceptable excipient.
327. A method for editing a nucleic acid molecule by guided editing, the method comprising: contacting a nucleic acid molecule with the modified guide editor of any of claims 311-317 and a pegRNA, thereby installing one or more modifications to the nucleic acid molecule at a target site.
328. The method of claim 327, wherein the method further comprises contacting the nucleic acid molecule with a second-strand nicking gRNA.
329. The method of claim 327, wherein the guided editing efficiency is increased by at least 1.5 fold, at least 2.0 fold, at least 2.5 fold, at least 3.0 fold, at least 3.5 fold, at least 4.0 fold, at least 4.5 fold, at least 5.0 fold, at least 5.5 fold, at least 6.0 fold, at least 6.5 fold, at least 7.0 fold, at least 7.5 fold, at least 8.0 fold, at least 8.5 fold, at least 9.0 fold, at least 9.5 fold, or at least 10.0 fold relative to guided editing using PE2.
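
The substitution classes recited in claims 263-264 and 297-298, the fold comparisons recited in claims 246-247, 271-272, and 329, and the numeric condition on x and y recited in claim 309 can be illustrated with a minimal Python sketch. The function names below are hypothetical and appear nowhere in the disclosure. The sketch assumes that a fold increase is the ratio of the test value to the baseline value, that a fold reduction is the ratio of the baseline value to the test value, and that x is at least 1; claim 309 states only that y is no less than 1 and that (x+y) is no less than 3.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or a transversion.

    A transition swaps purine for purine (A<->G) or pyrimidine for
    pyrimidine (C<->T); every other substitution is a transversion.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def fold_increase(test_value: float, baseline_value: float) -> float:
    """Fold increase of an editing efficiency over a baseline (assumed: test / baseline)."""
    if baseline_value <= 0:
        raise ValueError("baseline_value must be positive")
    return test_value / baseline_value


def fold_reduction(test_value: float, baseline_value: float) -> float:
    """Fold reduction of an indel frequency versus a baseline (assumed: baseline / test)."""
    if test_value <= 0:
        raise ValueError("test_value must be positive")
    return baseline_value / test_value


def meets_mismatch_rule(x: int, y: int) -> bool:
    """Check the numeric condition of claim 309 on the consecutive mismatches.

    x: nucleotides that correct the mutation (x >= 1 is an assumption here).
    y: adjacent silent-mutation nucleotides (claim requires y >= 1).
    The claim also requires x + y >= 3.
    """
    return x >= 1 and y >= 1 and (x + y) >= 3


if __name__ == "__main__":
    print(classify_substitution("A", "G"))  # purine to purine
    print(classify_substitution("C", "G"))  # pyrimidine to purine
    print(fold_increase(30.0, 10.0))
    print(meets_mismatch_rule(1, 2))
```

Under these assumptions, an editing efficiency of 30% measured against a 10% baseline corresponds to a 3-fold increase, which would meet the "at least 1.5-fold" through "at least 3.0-fold" thresholds but not the higher ones.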
CN202280020781.3A 2021-01-11 2022-01-11 Boot editor variants, constructs and methods for enhancing boot editing efficiency and accuracy Pending CN117321201A (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US63/136,194 2021-01-11
US63/176,202 2021-04-16
US63/176,180 2021-04-16
US63/194,865 2021-05-28
US63/194,913 2021-05-28
US63/231,230 2021-08-09
US202163255897P 2021-10-14 2021-10-14
US63/255,897 2021-10-14
PCT/US2022/012054 WO2022150790A2 (en) 2021-01-11 2022-01-11 Prime editor variants, constructs, and methods for enhancing prime editing efficiency and precision

Publications (1)

Publication Number Publication Date
CN117321201A (en) 2023-12-29

Family

ID=89285299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280020781.3A Pending CN117321201A (en) 2021-01-11 2022-01-11 Boot editor variants, constructs and methods for enhancing boot editing efficiency and accuracy

Country Status (1)

Country Link
CN (1) CN117321201A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination