CN118234854A

CN118234854A - Improved prime editing system efficiency using cis-acting regulatory elements

Info

Publication number: CN118234854A
Application number: CN202280075555.5A
Authority: CN
Inventors: 康巧华
Original assignee: Sigma Aldrich Co LLC
Current assignee: Sigma Aldrich Co LLC
Priority date: 2021-09-13
Filing date: 2022-09-09
Publication date: 2024-06-21

Abstract

The present invention is a synthetic nucleic acid composition and method of use thereof, the synthetic nucleic acid composition comprising: i) A sequence encoding a CRISPR-Cas protein, ii) a sequence encoding a reverse transcriptase, and iii) a sequence encoding a cis-acting regulatory element.

Description

Improved prime editing system efficiency using cis-acting regulatory elements

Cross Reference to Related Applications

The present application claims priority from U.S. provisional application Ser. Nos. 63/243,423 and 63/363,247, filed on September 13, 2021 and April 20, 2022, respectively, both of which are incorporated herein by reference in their entireties.

Background

Targeted genomic modifications are powerful tools for genetic manipulation of DNA, including manipulation of eukaryotic cells, embryos and animals. For example, an exogenous sequence may be integrated at a targeted genomic location and/or a particular endogenous DNA (e.g., chromosomal) sequence may be deleted, inactivated or modified. Before the CRISPR/Cas9 method (Perez-Pinera P, Ousterout DG, Gersbach CA, Advances in targeted genome editing, Curr Opin Chem Biol., 2012, 16(3-4):268-77; Hsu PD, Lander ES, Zhang F, Development and Applications of CRISPR-Cas9 for Genome Engineering, Cell, 2014, 157(6):1262-1278), such modifications relied on engineered nucleases, such as zinc finger nucleases (ZFNs) or transcription activator-like effector nucleases (TALENs). These chimeric nucleases contain a programmable sequence-specific DNA binding module linked to a non-specific DNA cleavage domain. However, each new genomic target requires the design of a new ZFN or TALEN that contains a new sequence-specific DNA binding module. Thus, these custom-designed nucleases tend to be expensive and time-consuming to prepare. Furthermore, the specificity of ZFNs and TALENs is such that they can mediate off-target cleavage.

CRISPR/Cas9 technology has greatly enhanced the ability of researchers to target and manipulate DNA sequences, particularly eukaryotic sequences in vivo. However, CRISPR systems are not without their own limitations. For example, CRISPR/Cas9 systems function by creating double-strand breaks (DSBs) that allow for insertions, deletions, or base substitutions at the break site. However, DSBs are also associated with undesirable consequences including, for example, translocations. Furthermore, known pathogenic alleles arise from very precise (albeit deleterious) insertions, deletions or base substitutions, which require precise gene editing to correct. Current techniques often lack the necessary precision and/or efficiency, or lead to unacceptable results.

Recently, Anzalone et al. introduced a CRISPR-based system called prime editing (Anzalone et al., Nature, December 5, 2019, 576:149-157). The prime editing system allows "search and replace" genome editing without double-strand breaks or donor DNA. The authors describe that this system allows "genome editing in human cells that mediates targeted insertions, deletions, all 12 possible base-to-base conversions, and combinations thereof" (supra, page 149). However, as is known in the art, the factors affecting the efficiency of prime editing have not been widely studied (Kim et al., Nature Biotechnology, 2021, 39:198-206). There is a need for compositions and methods that improve the efficiency of prime editing systems.

Disclosure of Invention

In various aspects, the invention provides compositions and methods that substantially increase the efficiency of a prime editing system.

In a non-limiting example, the prime editing system (PES) comprises a Cas9 (H840A) nickase-reverse transcriptase (RT) fusion protein and a prime editing guide RNA (pegRNA). The PE-pegRNA complex binds to the target DNA and nicks the strand containing the PAM (protospacer adjacent motif). The resulting 3' end hybridizes to the primer binding site and then primes reverse transcription of new DNA containing the desired edit, using the reverse transcription template of the pegRNA; equilibration between the edited 3' flap and the unedited 5' flap, cellular 5' flap cleavage and ligation, and DNA repair result in stably edited DNA (Anzalone et al., 2019) (see FIG. 1; prior art). Current prime editors, as exemplified by the system shown in FIG. 1, do not exhibit the desired high efficiency, which limits their further use in research or therapeutic fields.

Current prime editing techniques use, for example, Cas9 (H840A) nickase-reverse transcriptase (RT) fusion proteins that bind to prime editing guide RNAs (pegRNAs). The desired editing by the prime editing technique depends on the equilibration between the edited 3' flap and the unedited 5' flap. Because of the large size of the Cas9 (H840A) nickase-RT fusion protein, its stable and efficient expression in target cells has always been a challenge, and this affects the desired editing obtained by using PES.

To address this issue, the inventors incorporated a cis-acting regulatory element (e.g., dENE or sRSM1) into the Cas9 (H840A) nickase-RT fusion expression cassette (FIG. 2) to improve its mRNA stability and protein expression. This is believed to greatly enhance the efficiency of prime editing, as well as of any other gene editing technique involving effector protein expression, including CRISPR-Cas9, CRISPR-Cas9 nickase, CRISPRi, CRISPRa, and the like. As shown in the examples section below, the present invention greatly improves the efficiency of current prime editing techniques without changing the characteristics of the desired editing and without adding any additional components to the prime editing complex.

Accordingly, the present invention relates to compositions and methods for substantially improving prime editing efficiency.

In one aspect, the invention contemplates a synthetic nucleic acid composition comprising: i) A sequence encoding a CRISPR-Cas protein, ii) a sequence encoding a reverse transcriptase, and iii) a sequence encoding a cis-acting regulatory element.

The CRISPR-Cas protein encoded by the synthetic nucleic acid composition of the invention may be any CRISPR-Cas protein known to one of ordinary skill in the art. In one aspect of the invention, the CRISPR-Cas protein is nCas9-H840A.

The reverse transcriptase encoded by the synthetic nucleic acid composition may be any reverse transcriptase known to one of ordinary skill in the art. In one aspect of the invention, the reverse transcriptase is M-MLV-RT.

The cis-acting regulatory element encoded by the synthetic nucleic acid compositions of the present invention may be any cis-acting regulatory element known to those skilled in the art. In one aspect of the invention, the cis-acting regulatory element is dENE, ENE or sRSM1.

In one aspect of the invention, the synthetic nucleic acid composition of the invention is DNA.

In one aspect of the invention, the synthetic nucleic acid composition of the invention is RNA.

The invention contemplates that the synthetic nucleic acid compositions of the invention further comprise an expression promoter.

The invention further contemplates the synthetic nucleic acid compositions of the invention in an expression vector.

The invention further contemplates the incorporation of the synthetic nucleic acid compositions of the invention into transfected viruses.

The invention further contemplates that the cis-acting regulatory elements of the synthetic nucleic acid compositions of the invention are located after the stop codon of the CRISPR-Cas9 sequence and before the mRNA terminator.

The invention further contemplates that the synthetic nucleic acid composition of the invention further comprises a prime editing guide RNA (pegRNA), wherein the pegRNA is derived from one of PE1, PE2, and PE3.

The invention further contemplates amino acid sequences encoded by the synthetic nucleic acid compositions of the invention.
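
By way of illustration only, the arrangement described above (a cis-acting regulatory element located after the stop codon of the coding sequence and before the mRNA terminator) can be sketched as an ordered list of cassette parts with a simple placement check. The part names, the `Part` class and the `cis_element_in_3prime_utr` helper are hypothetical and are introduced here only for the sketch; they are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One element of an expression cassette (illustrative model only)."""
    name: str
    kind: str  # "promoter", "cds", "stop_codon", "cis_element", "terminator"


def cis_element_in_3prime_utr(parts):
    """Return True if every cis-acting element lies after the stop codon
    and before the terminator, and at least one such element is present."""
    kinds = [p.kind for p in parts]
    if not {"cis_element", "stop_codon", "terminator"} <= set(kinds):
        return False
    stop = kinds.index("stop_codon")
    term = kinds.index("terminator")
    return all(stop < i < term for i, k in enumerate(kinds) if k == "cis_element")


# Hypothetical cassette following the placement described in the text.
CASSETTE = [
    Part("promoter", "promoter"),
    Part("nCas9(H840A)", "cds"),
    Part("M-MLV RT", "cds"),
    Part("stop", "stop_codon"),
    Part("dENE", "cis_element"),
    Part("terminator", "terminator"),
]

# Same parts with the cis-acting element misplaced upstream of the coding sequence.
MISPLACED = [CASSETTE[0], CASSETTE[4], CASSETTE[1], CASSETTE[2], CASSETTE[3], CASSETTE[5]]

print(cis_element_in_3prime_utr(CASSETTE))
print(cis_element_in_3prime_utr(MISPLACED))
```

The check only models the order of the parts; it says nothing about the sequences themselves or about expression levels.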

The invention also relates to methods of use. In one aspect, the invention features a method of modifying an endogenous DNA sequence, the method comprising: providing: i) an operable expression vector comprising a synthetic nucleic acid composition comprising: 1) a sequence encoding a CRISPR-Cas type II system protein, 2) a sequence encoding a reverse transcriptase, and 3) a sequence comprising a cis-acting regulatory element; ii) a prime editing guide RNA (pegRNA) comprising a primer binding site (PBS); and iii) a cell comprising a target endogenous DNA sequence that is at least 50% complementary to the PBS; transfecting the cell comprising the target endogenous DNA sequence with the synthetic nucleic acid composition and pegRNA of the invention; and culturing the transfected cells such that the desired modification is made to the endogenous DNA sequence.

The invention further contemplates that the CRISPR-Cas protein encoded by the synthetic nucleic acid composition used in the methods of the invention can be any CRISPR-Cas protein known to one of ordinary skill in the art. In one aspect of the invention, the CRISPR-Cas type II system protein is a Cas9 protein.

The methods of the invention further contemplate that the endogenous DNA sequences are at least 75% complementary to PBS.

The methods of the invention further contemplate that the endogenous DNA sequences are at least 90% complementary to PBS.

The methods of the invention further contemplate that the endogenous DNA sequences are at least 95% complementary to PBS.

The methods of the invention further contemplate that the endogenous DNA sequences are at least 98% complementary to PBS.

The methods of the invention further contemplate that the endogenous DNA sequence is 100% complementary to PBS.
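
As a minimal sketch of how the complementarity thresholds above might be evaluated, the following computes the percentage of PBS positions that form Watson-Crick pairs with a target DNA segment of equal length, read antiparallel. The function name, the equal-length assumption, the absence of wobble (G-U) pairing, and the toy sequences are all assumptions made for illustration; the specification does not prescribe a particular calculation.

```python
# RNA base of the PBS -> DNA base it pairs with (Watson-Crick only; assumption).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(pbs_rna, target_dna):
    """Percent of PBS bases (5'->3') that pair with the target strand read 3'->5'."""
    if not pbs_rna or len(pbs_rna) != len(target_dna):
        raise ValueError("sequences must be non-empty and of equal length")
    target_3_to_5 = target_dna.upper()[::-1]  # antiparallel orientation
    paired = sum(
        RNA_TO_DNA_PARTNER.get(r) == d
        for r, d in zip(pbs_rna.upper(), target_3_to_5)
    )
    return 100.0 * paired / len(pbs_rna)


def thresholds_met(percent, thresholds=(50, 75, 90, 95, 98, 100)):
    """List which of the thresholds named in the text are satisfied."""
    return [t for t in thresholds if percent >= t]


# Toy sequences (hypothetical): a fully paired PBS and one with a single mismatch.
print(percent_complementarity("CAAC", "GTTG"))
print(thresholds_met(percent_complementarity("CAAC", "GTAG")))
```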

The methods of the invention further contemplate that the CRISPR-Cas protein can be any CRISPR-Cas protein known to one of ordinary skill in the art. In one aspect of the invention, it is nCas9-H840A.

It is further contemplated that the reverse transcriptase of the methods of the present invention may be any reverse transcriptase known to one of ordinary skill in the art. In one aspect of the invention, it is M-MLV-RT.

It is further contemplated that the cis-acting regulatory element of the methods of the present invention may be any cis-acting regulatory element known to those of skill in the art. In one aspect of the invention, the cis-acting regulatory element is selected from the group consisting of dENE, ENE, and sRSM1.

The methods of the invention further contemplate that the operable expression vector encoding a synthetic nucleic acid of the invention is DNA.

The methods of the invention further contemplate that the operable expression vector encoding a synthetic nucleic acid of the invention is RNA.

The methods of the invention further contemplate the incorporation of the synthetic nucleic acid compositions of the invention into transfected viruses.

The methods of the invention further contemplate that the synthetic nucleic acid composition cis-acting regulatory elements of the invention are located after the stop codon of the CRISPR-Cas9 sequence and before the mRNA terminator.

The methods of the present invention further contemplate that the pegRNA is derived from one of PE1, PE2, and PE3.

The methods of the invention further contemplate introducing a CRISPR/Cas type II system protein encoded in an operable expression vector into a cell.

Drawings

FIG. 1 shows a diagram of the prime editing technique as known in the prior art. nCas9 (H840A) = Cas9 (H840A) nickase; RT = reverse transcriptase; PBS = primer binding site.

FIG. 2 shows a diagram of the introduction of cis-acting regulatory elements into a prime editing expression cassette.

FIG. 3 shows that PE2-dENE enhances the editing efficiency of PE2.

FIG. 4 shows that PE3-dENE enhances the editing efficiency of PE3.

FIG. 5 shows that the 3'-UTR dENE improves PE editing efficiency at the HEK3 target in K562 cells.

FIG. 6 shows that the 3'-UTR dENE does not improve PE editing efficiency at the HEK3 target in HEK293 cells.

Detailed Description

Definitions

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The following references provide the skilled artisan with a general definition of many of the terms used in the present invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). The following terms as used herein have their assigned meanings unless otherwise indicated.

When introducing elements of the present disclosure or the preferred embodiments thereof, the articles "a," "an," "the," and "said" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.

The transitional phrases "comprising," "consisting essentially of," and "consisting of" have the meanings given in MPEP 2111.03 (Manual of Patent Examining Procedure; United States Patent and Trademark Office). Any claim using the transitional phrase "consisting essentially of" will be understood to list only the essential elements of the invention, and any other elements listed in the dependent claims are understood to be non-essential to the invention listed in the claim from which they depend.

As used herein, the term "endogenous sequence" refers to the original chromosomal sequence of a cell.

As used herein, the term "exogenous" refers to a chromosomal sequence that is not native to the cell, or to a chromosomal sequence whose native location in the genome of the cell is at a different chromosomal location.

"Gene" as used herein refers to the DNA region (including exons and introns) encoding a gene product, as well as all the DNA regions that regulate the production of the gene product, whether or not such regulatory sequences are adjacent to the coding and/or transcribed sequences. Thus, genes include, but are not necessarily limited to, promoter sequences, terminators, translational regulatory sequences, such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, border elements, origins of replication, matrix attachment sites, and locus control regions.

The term "heterologous" refers to an entity that is not endogenous or native to the cell of interest. For example, a heterologous protein refers to a protein that is derived or originally derived from an exogenous source (such as an exogenously introduced nucleic acid sequence). In some cases, the heterologous protein is not normally produced by the cell of interest.

The terms "nucleic acid" and "polynucleotide" refer to polymers of deoxyribonucleotides or ribonucleotides in either a linear or circular conformation, and in either single- or double-stranded form. For the purposes of this disclosure, these terms should not be construed as limiting the length of the polymer. The terms may include known analogs of natural nucleotides, as well as nucleotides modified in the base, sugar and/or phosphate moieties (e.g., phosphorothioate backbones). Typically, analogs of a particular nucleotide have the same base-pairing specificity; i.e., an analog of A will base-pair with T.

The term "synthetic nucleic acid" refers to a nucleotide sequence synthesized in vitro (e.g., in a laboratory, and manually or with a nucleic acid synthesizer), and wherein the sequence is not found in nature. The sequence may be, for example, DNA or RNA or modifications thereof as described below, may be of any length, and may be any nucleotide sequence, provided that the sequence is not naturally occurring.

The term "nucleotide" refers to a deoxyribonucleotide or ribonucleotide. The nucleotides may be standard nucleotides (i.e., adenosine, guanosine, cytidine, thymidine, and uridine) or nucleotide analogs. Nucleotide analogs refer to nucleotides having a modified purine or pyrimidine base or a modified ribose moiety. Nucleotide analogs can be naturally occurring nucleotides (e.g., inosine) or non-naturally occurring nucleotides. Non-limiting examples of modifications to the sugar or base portion of a nucleotide include the addition (or removal) of acetyl, amino, carboxyl, carboxymethyl, hydroxyl, methyl, phosphoryl, and thiol groups, as well as the substitution of carbon and nitrogen atoms of the base with other atoms (e.g., 7-deazapurine). Nucleotide analogs also include dideoxynucleotides, 2'-O-methyl nucleotides, locked nucleic acids (LNA), peptide nucleic acids (PNA), and morpholino oligonucleotides (morpholinos).

The terms "polypeptide" and "protein" are used interchangeably and refer to a polymer of amino acid residues.

Techniques for determining nucleic acid and amino acid sequence identity are known in the art. Typically, such techniques involve determining the nucleotide sequence of the mRNA of a gene and/or determining the amino acid sequence encoded thereby, and comparing these sequences to a second nucleotide or amino acid sequence. Genomic sequences may also be determined and compared in this manner. In general, identity refers to the exact correspondence between nucleotides or amino acids of each of two polynucleotide or polypeptide sequences. Two or more sequences (polynucleotide or amino acid) may be compared by determining their percent identity. The percent identity of two sequences (whether nucleic acid sequences or amino acid sequences) is the number of exact matches between the two aligned sequences divided by the length of the shorter sequence and multiplied by 100. An approximate alignment of nucleic acid sequences is provided by the local homology algorithm of Smith and Waterman, Advances in Applied Mathematics 2:482-489 (1981). This algorithm can be applied to amino acid sequences by using the scoring matrix developed by Dayhoff, Atlas of Protein Sequences and Structure, M.O. Dayhoff (ed.), 5:353-358, National Biomedical Research Foundation, Washington, D.C., USA, and normalized by Gribskov, Nucl. Acids Res. 14(6):6745-6763 (1986). An exemplary implementation of this algorithm to determine percent identity of sequences is provided by the Genetics Computer Group (Madison, Wis.) in the "BestFit" utility. Other suitable programs for calculating percent identity or similarity between sequences are generally known in the art; for example, another alignment program is BLAST, used with default parameters. For example, BLASTN and BLASTP may be used with the following default parameters: genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; matrix = BLOSUM62; descriptions = 50 sequences; sort by = HIGH SCORE; databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details of these programs can be found on the GenBank website.
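
The percent identity definition given above (exact matches between two aligned sequences, divided by the length of the shorter sequence, multiplied by 100) can be illustrated with a short sketch. It assumes the two sequences have already been aligned, for example by one of the programs named above, and that gaps are written as "-"; the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Exact matches are counted column by column (a gap never counts as a match)
    and divided by the ungapped length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# One mismatch in six aligned positions.
print(round(percent_identity("ACGTAC", "ACGTTC"), 2))
# A gap in the shorter sequence: all four of its residues match.
print(percent_identity("ACG-T", "ACGAT"))
```

Because the denominator is the shorter sequence, a short sequence fully contained in a longer one scores 100% under this definition.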

As various changes could be made in the above cells and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and examples set forth below shall be interpreted as illustrative and not in a limiting sense.

Prime editing system

The prime editing system (PES) is an improvement over CRISPR/Cas9 technology. As first described by Anzalone et al., PES uses a prime editing guide RNA (pegRNA) to guide the CRISPR/Cas9 complex to the desired target site in the genome. The pegRNA is described (Marzec et al., Trends in Cell Biology, April 2020, 33:4, 257-259) as containing not only a spacer region complementary to the target DNA strand, but also a primer binding site (PBS) region and the sequence to be introduced into the targeted DNA region. The PBS is complementary to the second DNA strand, which thereby serves as a primer for the reverse transcriptase (RT) linked to the Cas9 nickase. RT is an RNA-dependent DNA polymerase that uses the sequence from the pegRNA as a template. The sequence is copied directly from the pegRNA into the target DNA sequence, thereby altering the target sequence in the desired manner.
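
The relationship between the nicked strand, the PBS and the reverse transcription template can be sketched as follows. This is a simplified illustration under stated assumptions, not a pegRNA design tool: the PBS is taken as the reverse complement of the bases immediately 5' of the nick on the PAM-containing strand, the template as the reverse complement of the desired edited sequence 3' of the nick, and the 3' extension of the pegRNA as template followed by PBS. The function names, the default PBS length and the toy sequences are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Reverse complement of a DNA string written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def pegrna_extension(pam_strand, nick_index, edited_downstream, pbs_len=13):
    """Return the 3' extension (RT template + PBS) as RNA, written 5'->3'.

    pam_strand        -- PAM-containing strand, 5'->3'
    nick_index        -- the nick falls between nick_index - 1 and nick_index
    edited_downstream -- desired (edited) sequence 3' of the nick, 5'->3'
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence 5' of the nick for the PBS")
    primer = pam_strand[nick_index - pbs_len:nick_index]  # freed 3' end of the nicked strand
    pbs = reverse_complement(primer)                       # pairs with that 3' end
    rt_template = reverse_complement(edited_downstream)    # copied by RT into new DNA
    return (rt_template + pbs).replace("T", "U")


# Toy example with a 4-nt PBS (real PBS lengths are longer).
print(pegrna_extension("ACGTACGTTGCAACGGATCC", 10, "CTAC", pbs_len=4))
# -> GUAGCAAC
```

In this toy case the PBS is CAAC and the template is GUAG, so the reverse transcriptase extends the nicked 3' end with the DNA sequence CTAC.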

In a non-limiting example, the prime editor (PE) comprises a Cas9 (H840A) nickase-reverse transcriptase (RT) fusion protein and a prime editing guide RNA (pegRNA); the PE-pegRNA complex binds to the target DNA and nicks the PAM-containing strand. The resulting 3' end hybridizes to the primer binding site and then primes reverse transcription of new DNA containing the desired edit, using the reverse transcription template of the pegRNA; equilibration between the edited 3' flap and the unedited 5' flap, cellular 5' flap cleavage and ligation, and DNA repair result in stably edited DNA (Anzalone et al., 2019; see FIG. 1). To date, several versions of the prime editor (PE) have been developed. PE1 [SEQ ID NO:1] uses wild-type Moloney murine leukemia virus reverse transcriptase (M-MLV RT) fused to the C-terminus of a Cas9 (H840A) nickase. PE2 [SEQ ID NO:2] uses an engineered M-MLV RT. PE3 [SEQ ID NO:3] introduces an additional guide RNA to nick the unedited strand, which increases editing efficiency but also increases indel frequency. In PE3b (Anzalone et al.), this nicking single guide RNA (sgRNA) targets the edited sequence, thereby preventing nicking of the unedited strand until editing has occurred, which results in fewer indels in mammalian cells.

Cis-regulatory element

The present invention substantially improves the efficiency of a prime editing system by incorporating one or more cis-regulatory elements (CREs) into the system (see FIG. 2). Those of ordinary skill in the art will appreciate, in light of the present description, that cis-regulatory elements other than those specifically exemplified herein may also be suitable for use with the present invention, and that screening for such cis-regulatory elements without undue experimentation, following the teachings of the present specification, is within the skill and knowledge of those of ordinary skill in the art.

As Wittkopp and Kalay teach (Nature Reviews Genetics, 13(1), pages 59-69), "cis-regulatory element" is a term for a collection of transcription factor binding sites and other non-coding DNA sufficient to activate (or repress) transcription in a defined spatial and/or temporal expression domain. Cis-regulatory elements are a class of cis-regulatory sequences required for the activation and maintenance of transcription. They consist of DNA (usually non-coding DNA) containing binding sites for transcription factors and other regulatory molecules. Promoters, enhancers and silencers are the most commonly recognized types of CRE.

However, promoters required for transcription in eukaryotes typically only produce basal levels of mRNA. Enhancers are more variable than promoters and help up-regulate expression and transcription.

Another way to view the regulation of gene expression by cis- and trans-regulatory elements is that a cis-regulatory element is typically a binding site for one or more trans-acting factors. Cis-regulatory elements are typically present on the same DNA molecule as the genes they regulate, whereas trans-regulatory elements can regulate genes distant from the gene from which they were transcribed. Transcription factors are one example of trans-acting factors.

Enhancers are CREs that affect (enhance) the transcription of genes on the same DNA molecule, and can be found upstream of, downstream of, within the introns of, or even relatively distant from the genes they regulate. Multiple enhancers can act in a synergistic manner to regulate transcription of a gene (Wittkopp and Kalay, supra). Many whole-genome sequencing projects have revealed that enhancers are often transcribed as long non-coding RNA (lncRNA) or enhancer RNA (eRNA), changes in the levels of which are often correlated with changes in the level of target gene mRNA (Melamed P., Yosefzon Y., et al., Transcription, 7(1):26-31).

While the inventors contemplate that any cis-acting regulatory element would convey benefit to CRISPR-based nucleic acid modification techniques, preferred non-limiting examples of suitable cis-acting regulatory elements are:

Element for Nuclear Expression (ENE): a U-rich internal loop (URIL) flanked by short duplexes, which confers RNA stability. Examples include the ENE from Kaposi's sarcoma-associated herpesvirus (KSHV), the ENE from human metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), the ENE from multiple endocrine neoplasia beta (MENβ), and the like.

Double Element for Nuclear Expression (dENE): contains two URILs with predicted duplex regions. Examples include the rice TWIFB1 dENE and its 20 mutants (M1 to M20), which are known in the art and are depicted in FIG. 5 of Torabi et al., "RNA stabilization by a poly(A) tail 3'-end binding pocket and other modes of poly(A)-RNA interaction", Science, 2021, 371(6529).

KSHV ENE sequence:

UGUUUUGGCUGGGUUUUUCCUUGUUCGCACCGGACACCU CCAGUGACCAGACGGCAAGGUUUUUAUCCCAGUGUAUAUU[SEQ ID NO:4]

Rhesus rhadinovirus (RRV) ENE sequence:

CGUUUGUGUUGGUUUUUAUGACCAGCUUGGUACAAAACC UGCUGGUGAUUUUUUACCCAACAAAUAAUAAAUAAAA[SEQ ID NO:5]

MALAT1 ENE sequence:

UAGGGUCAUGAAGGUUUUUCUUUUCCUGAGAAAACAACA CGUAUUGUUUUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGC UU[SEQ ID NO:6]

MALAT1 ENE + A-rich tract sequence:

UAGGGUCAUGAAGGUUUUUCUUUUCCUGAGAAAACAACA CGUAUUGUUUUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGC UUAAAAAAAAAAAAAGCAAAA[SEQ ID NO:7]

MALAT1 ENE + A-rich tract + mascRNA sequence:

UAGGGUCAUGAAGGUUUUUCUUUUCCUGAGAAAACAACACGUAUUGUUUUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGCUUAAAAAAAAAAAAAGCAAAAGAUGCUGGUGGUUGGCACUCCUGGUUUCCAGGACGGGGUUCAAAUCCCUGCGGCGUCUUUGCUUUGACU[SEQ ID NO:8]

MALAT1 ENE + A-rich tract variant sequence:

GAAGGUUUUUCUUUUCCUGAGAAAACAACACGUAUUGUU UUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGCUUAAAAAAA AAAAAAGCAAAA[SEQ ID NO:9]

MENβ ENE sequence:

GCCGCCGCAGGUGUUUCUUUUACUGAGUGCAGCCCAUGG CCGCACUCAGGUUUUGCUUUUCACCUUCCCAUCUG[SEQ ID NO:10]

MENβ ENE + A-rich tract sequence:

GCCGCCGCAGGUGUUUCUUUUACUGAGUGCAGCCCAUGG CCGCACUCAGGUUUUGCUUUUCACCUUCCCAUCUGUGAAAGA GUGAGCAGGAAAAAGCAAAA[SEQ ID NO:11]

MENβ ENE + A-rich tract variant sequence:

AGGUGUUUCUUUUACUGAGUGCAGCCCAUGGCCGCACUC AGGUUUUGCUUUUCACCUUCCCAUCUGUGAAAGAGUGAGCAG GAAAAAGCAAAA[SEQ ID NO:12]

Rice TWIFB1 dENE sequence: UGUUGGCUGUACUCUUUUCUUUGUCAUGGUUUUCUCAAAUAU GAGUUUUUACAUGACAAAGUUUUUAACGAGGCAGCAUGUA [ SEQ ID NO:13].

MCDiV ENE sequence:

GAGUGUAACUCAACAGUUUUUCCUAACCACGCGUCGCGU GGCAGGUUUUUUAAUCUGAGAGUUACAUUC[SEQ ID NO:14]

ATCOPIA27_ATh-I ENE sequence:

GUGCUGUACUCUUUUUCCUCACUAUGGUUUUGUCCCGAA AGGGUUUUCCUAGUAAGGUUUUAAUGAGGCAGCAU[SEQ ID NO:15]

TUCP_ZMa ENE sequence:

GGCUGUACUCUUUUUUCCUGUCUAGGGUUUCUCACAAGG GUGAGUUUUACCUAGACAGGUUUUUAACGAGGCAACC[SEQ ID NO:16]

Other ENEs or dENEs and variants or mutants thereof are known in the art and are described in Tycowski et al., "Conservation of a Triple-Helix-Forming RNA Stability Element in Noncoding and Genomic RNAs of Diverse Viruses", Cell Rep., 2012, 2:26-32, and Tycowski et al., "Myriad Triple-Helix-Forming Structures in the Transposable Element RNAs of Plants and Fungi", Cell Rep., 2016, 15:1266-1276.
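
Several of the sequences listed above are printed with internal spaces left over from line wrapping. A minimal sketch of how such a listing might be normalized before use is given below: whitespace is removed, the alphabet is checked, and the RNA is converted to the corresponding DNA for insertion into a DNA expression cassette. The helper names are hypothetical; the example sequence is the KSHV ENE [SEQ ID NO:4] as printed above.

```python
RNA_ALPHABET = set("ACGU")


def normalize_rna(raw):
    """Remove whitespace, upper-case, and verify that only A, C, G, U remain."""
    clean = "".join(raw.split()).upper()
    unexpected = set(clean) - RNA_ALPHABET
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return clean


def rna_to_dna(rna):
    """DNA sequence with the same sense as the RNA (U replaced by T)."""
    return rna.replace("U", "T")


# KSHV ENE as printed in the listing, including the stray space.
KSHV_ENE_RAW = (
    "UGUUUUGGCUGGGUUUUUCCUUGUUCGCACCGGACACCU "
    "CCAGUGACCAGACGGCAAGGUUUUUAUCCCAGUGUAUAUU"
)

kshv_ene = normalize_rna(KSHV_ENE_RAW)
kshv_ene_dna = rna_to_dna(kshv_ene)
print(len(kshv_ene), kshv_ene_dna[:10])
```

The same two helpers can be applied to any of the other listed elements; a sequence containing characters outside A, C, G and U is rejected rather than silently altered.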

Some computational frameworks (e.g., TEISER, Tool for Eliciting Informative Structural Elements in RNA) have been used to identify structural RNA stabilizing motif 1 (sRSM1), the statistically most significant 3' UTR element that stabilizes RNA; these are known in the art and are described in Goodarzi et al., "Systematic discovery of structural elements governing stability of mammalian messenger RNAs", Nature, 2012, 485:264.

Structural RNA stabilizing motif 1 (sRSM1) sequence set 1:

AAAACUAUUUUGAAGAUGGUGGUGAGCUGCAAAAUAGCUGGAUGGAUUUGAAUGAUUGGGAUGAUACAUCAUUGAACUGCACUUUAUAUAACCAAAGCUUAGCAGUUUGUUAGAUAAGAGUCUAUGUAUGUCUCUGGUUAGGAUGAAGUUAAUUUUAUGUUUUUAACAUGGUAUUUUUGAAGGAGCUAAUGAAACACUGG[SEQ ID NO:17]

Structural RNA stabilizing motif 1 (sRSM1) sequence set 2:

AUUGUUUCUGGAAACUGCUUGCCAAGACAACAUUUAUUAACUGUUAGAACACUUGCUUUAUGUUUGUGUGUACAUAUUUUCCACAAAUGUUAUAAUUUAUAUAGUGUGGUUGAACAGGAUGCAAUCUUUUGUUGUCUAAAGGUGCUGCAGUUAAAAAAAAAACAACCUUUUCUUUCAAUAUGGCAUGUAGUGGAGUUUUU[SEQ ID NO:18]

Other sRSM sequences are known to those skilled in the art, examples of which can be found in Goodarzi et al ,"Systematic discovery of structural elements governing stability of mammalian messenger RNAs",Nature,2012,485:264.

Other suitable 3' UTR sequences are known to those of ordinary skill in the art and include, but are not limited to, the c-fos gene and v-fos gene 3' UTRs, CD47 3' UTR, BIRC 3' UTR, beta-actin 3' UTR, beta-globin 3' UTR, HMGA2 3' UTR, cam 2a 3' UTR, cyclin B1 3' UTR, and U-rich motifs associated with increased mRNA stability.

Other cis-acting regulatory elements are known to those of ordinary skill in the art and are incorporated herein. Those skilled in the art will be able to identify and optimize suitable cis-acting regulatory elements without undue experimentation in light of the teachings of the present specification.

CRISPR/Cas proteins and systems

An understanding of CRISPR/Cas protein systems, both in general and in the context of the invention, is helpful in understanding the invention.

Prime editing guide RNA

As described above, several variations of the prime editor (PE) have been developed. The PE contains a reverse transcriptase fused to an RNA-programmable nickase, and uses a prime editing guide RNA to copy genetic information directly from an extension on the pegRNA into the target genomic locus. The pegRNA therefore "guides" the PE editing machinery to a specific site (the target DNA), where a single strand of the double-stranded DNA is nicked by the Cas9 enzyme. The pegRNA also contains the sequence encoding the desired edit to the target DNA. Depending on the design of the pegRNA, the PE can precisely and efficiently swap any single DNA letter for any other, and can make deletions and insertions. One of ordinary skill in the art will understand how to construct the appropriate pegRNA for a particular target site.

RNA-guided endonucleases

An RNA-guided endonuclease, such as Cas9, may comprise at least one nuclear localization signal, at least one nuclease domain, and at least one domain that interacts with the pegRNA to target the endonuclease to a particular nucleotide sequence for cleavage. Nucleic acids encoding RNA-guided endonucleases are also known, as are methods of modifying chromosomal sequences of eukaryotic cells or embryos using RNA-guided endonucleases. The RNA-guided endonuclease interacts with a specific pegRNA, each of which directs the endonuclease to a specific target site, where the RNA-guided endonuclease introduces a strand break that can be repaired by a DNA repair process such that the chromosomal sequence is modified. Since the pegRNA provides the specificity, the RNA-guided endonuclease is universal and can be used with different pegRNAs to target different genomic sequences. The methods disclosed herein can be used to target and modify specific chromosomal sequences and/or introduce exogenous sequences (or delete endogenous sequences) at targeted locations in the genome of a cell or embryo. Furthermore, the targeting is specific, with limited off-target effects.

The present disclosure provides fusion proteins, wherein the fusion proteins comprise a CRISPR/Cas-like protein or a fragment thereof and an effector domain. Suitable effector domains include, but are not limited to, cleavage domains, epigenetic modification domains, transcriptional activation domains, and transcriptional repressor domains. Each fusion protein is directed by a specific pegRNA to a specific chromosomal sequence, where the effector domain mediates targeted genomic modification or gene regulation. In one aspect, the fusion protein may act as a dimer, increasing the length of the target site and increasing the likelihood that it is unique in the genome (thus reducing off-target effects). For example, endogenous CRISPR systems modify genomic positions based on DNA binding word lengths of about 13-20 bp (Cong et al., Science, 339:819-823). At this word length, only 5-7% of the target sites within the genome are unique (Iseli et al., PLoS One (6):e579). In contrast, zinc finger nucleases typically have a DNA binding word length in the range of 30-36 bp, resulting in about 85-87% unique target sites within the human genome. The smaller size of the DNA binding sites utilized by CRISPR-based systems limits and complicates the design of targeted CRISPR-based nucleases near desired locations (such as disease SNPs, small exons, start and stop codons, and other locations within complex genomes). The present disclosure provides not only means for extending the CRISPR DNA binding word length (i.e., to limit off-target activity), but also CRISPR fusion proteins with modified functionality. Thus, the disclosed CRISPR fusion proteins have increased target specificity and unique functionality. Also provided herein are methods of using the fusion proteins to modify or regulate expression of targeted chromosomal sequences.
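
The relationship between binding word length and target-site uniqueness discussed above can be illustrated on a toy scale. The sketch below counts, for a given word length k, the fraction of positions in a sequence whose k-mer occurs exactly once. It considers a single strand only and uses a short made-up sequence, so it does not reproduce the genome-wide percentages cited above; the function name is hypothetical.

```python
from collections import Counter


def unique_site_fraction(sequence, k):
    """Fraction of k-mer positions whose k-mer occurs exactly once in the sequence."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        raise ValueError("k must be between 1 and the sequence length")
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


# Toy sequence: short words repeat, longer words become unique.
toy = "AAAAACGT"
print(unique_site_fraction(toy, 2))  # 3 of 7 positions are unique
print(unique_site_fraction(toy, 5))  # all 4 positions are unique
```

Longer words are more likely to be unique, which is the rationale given above for extending the effective binding word length.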

The RNA-guided endonuclease may comprise at least one nuclear localization signal that allows the endonuclease to enter the nucleus of eukaryotic cells and embryos (such as non-human single cell embryos). The RNA-guided endonuclease further comprises at least one nuclease domain and at least one domain that interacts with a pegRNA. The RNA-guided endonuclease is directed to a specific nucleic acid sequence (or target site) by the pegRNA. The pegRNA interacts with both the RNA-guided endonuclease and the target site such that, once guided to the target site, the RNA-guided endonuclease is able to introduce a strand break into the target site nucleic acid sequence. Because the pegRNA provides the specificity for targeted cleavage, the RNA-guided endonuclease is universal and can be used with different pegRNAs to cleave different target nucleic acid sequences. The RNA-guided endonuclease may be provided as a protein, as an isolated nucleic acid (i.e., RNA or DNA) encoding the RNA-guided endonuclease, as a vector comprising a nucleic acid encoding the RNA-guided endonuclease, or as a protein-RNA complex comprising the RNA-guided endonuclease plus a pegRNA.

RNA-guided endonucleases can be derived from Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) systems. The CRISPR/Cas system may be a type I, type II or type III system. Non-limiting examples of suitable CRISPR/Cas proteins include Cas3, Cas4, Cas5e (or CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9, Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (or CasA), Cse2 (or CasB), Cse3 (or CasE), Cse4 (or CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csz1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966.

In one embodiment, the RNA-guided endonuclease is derived from a type II CRISPR/Cas system. In particular embodiments, the RNA-guided endonuclease is derived from a Cas9 protein. The Cas9 protein may be from Streptococcus pyogenes, Streptococcus thermophilus, Streptomyces sp., Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicelulosiruptor becscii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsoni, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus or Acaryochloris marina.

Typically, the CRISPR/Cas protein comprises at least one RNA recognition and/or RNA binding domain. The RNA recognition and/or RNA binding domain interacts with the guide RNA. The CRISPR/Cas protein may also comprise nuclease domains (i.e., DNase or RNase domains), DNA binding domains, helicase domains, protein-protein interaction domains, dimerization domains, and other domains.

The CRISPR/Cas-like protein may be a wild-type CRISPR/Cas protein, a modified CRISPR/Cas protein, or a fragment of a wild-type or modified CRISPR/Cas protein. The CRISPR/Cas-like protein may be modified to increase nucleic acid binding affinity and/or specificity, alter enzyme activity, and/or alter another property of the protein. For example, the nuclease (i.e., DNase, RNase) domains of the CRISPR/Cas-like protein can be modified, deleted, or inactivated. Alternatively, the CRISPR/Cas-like protein may be truncated to remove domains not essential for the function of the fusion protein. The CRISPR/Cas-like protein may also be truncated or modified to optimize the activity of the effector domain of the fusion protein.

In some embodiments, the CRISPR/Cas-like protein may be derived from a wild-type Cas9 protein or a fragment thereof. In other embodiments, the CRISPR/Cas-like protein may be derived from a modified Cas9 protein. For example, the amino acid sequence of the Cas9 protein may be modified to alter one or more properties of the protein (e.g., nuclease activity, affinity, stability, etc.). Alternatively, the domain of the Cas9 protein that is not involved in RNA-guided cleavage may be removed from the protein such that the modified Cas9 protein is smaller than the wild-type Cas9 protein.

Typically, the Cas9 protein comprises at least two nuclease (i.e., DNase) domains. For example, the Cas9 protein may comprise a RuvC-like nuclease domain and an HNH-like nuclease domain. The RuvC and HNH domains work together, each cleaving a single strand, to create a double-strand break in DNA (Jinek et al., Science, 337:816-821). In some embodiments, the Cas9-derived protein may be modified to contain only one functional nuclease domain (either a RuvC-like or an HNH-like nuclease domain). For example, the Cas9-derived protein may be modified such that one of the nuclease domains is deleted or mutated such that it is no longer functional (i.e., there is no nuclease activity). In some embodiments where one of the nuclease domains is inactive, the Cas9-derived protein is capable of introducing a nick into double-stranded nucleic acid (such proteins are referred to as "nickases"), but does not cleave double-stranded DNA. For example, conversion of aspartic acid to alanine (D10A) in the RuvC-like domain converts the Cas9-derived protein to a nickase. Likewise, the conversion of histidine to alanine (H840A or H839A) in the HNH domain converts the Cas9-derived protein to a nickase. Each nuclease domain can be modified using well-known methods such as site-directed mutagenesis, PCR-mediated mutagenesis, and total gene synthesis, among other methods known in the art.
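The substitution notation used above (e.g., D10A: aspartic acid at position 10 replaced by alanine) can be illustrated with a short sketch. The helper name and the 14-residue toy sequence below are hypothetical and are used only to show the 1-based numbering; they are not the Cas9 sequence and do not represent any mutagenesis protocol.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution written as <wild-type><position><new>, e.g. 'D10A'.

    Positions are 1-based. A ValueError is raised if the stated wild-type
    residue is not actually present at that position.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1
    if protein[index] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}, found {protein[index]}")
    return protein[:index] + replacement + protein[index + 1:]


toy_sequence = "MGSTQLVNPDKRWE"  # hypothetical toy peptide with D at position 10
print(apply_substitution(toy_sequence, "D10A"))
```

Checking the wild-type residue before substituting guards against numbering mismatches, which is relevant where equivalent positions differ between annotations (as with H840A or H839A above).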

The RNA-guided endonuclease may comprise at least one nuclear localization signal (NLS). Typically, the NLS comprises a stretch of basic amino acids. Nuclear localization signals are known in the art (see, e.g., Lange et al., J. Biol. Chem., 2007, 282:5101-5105). For example, in one embodiment, the NLS may be a monopartite sequence, such as PKKKRKV (SEQ ID NO: 19) or PKKKRRV (SEQ ID NO: 8). In another embodiment, the NLS may be a bipartite sequence. In yet another embodiment, the NLS may be KRPAATKKAGQAKKKK (SEQ ID NO: 20). The NLS can be located at the N-terminus, the C-terminus, or an internal position of the RNA-guided endonuclease.
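As an illustration of the three possible NLS placements, the following minimal sketch searches a protein sequence for the NLS sequences recited above and reports whether each occurrence is N-terminal, C-terminal, or internal. The function name and the toy protein are hypothetical; exact string matching is an assumption made for simplicity and would miss variant NLS sequences.

```python
# NLS sequences as recited in the text, keyed by their sequence identifiers.
NLS_MOTIFS = {
    "SEQ ID NO: 19": "PKKKRKV",
    "SEQ ID NO: 8": "PKKKRRV",
    "SEQ ID NO: 20": "KRPAATKKAGQAKKKK",
}


def locate_nls(protein: str):
    """Return (identifier, 1-based start, placement) for every exact NLS match."""
    hits = []
    for label, motif in NLS_MOTIFS.items():
        start = protein.find(motif)
        while start != -1:
            end = start + len(motif)
            if start == 0:
                placement = "N-terminus"
            elif end == len(protein):
                placement = "C-terminus"
            else:
                placement = "internal"
            hits.append((label, start + 1, placement))
            start = protein.find(motif, start + 1)
    return hits


# Hypothetical toy protein: NLS + 10 arbitrary residues + NLS.
toy_protein = "PKKKRKV" + "MGSTQLVNPD" + "KRPAATKKAGQAKKKK"
for hit in locate_nls(toy_protein):
    print(hit)
```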

In some embodiments, the RNA-guided endonuclease may further comprise at least one cell penetrating domain. In one embodiment, the cell penetrating domain may be a cell penetrating peptide sequence derived from the HIV-1 TAT protein. For example, the TAT cell penetrating sequence may be GRKKRRQRRRPPQPKKKRKV (SEQ ID NO: 21). In another embodiment, the cell penetrating domain may be TLM (PLSSIFSRIGDPPKKKRKV; SEQ ID NO: 22), which is a cell penetrating peptide sequence derived from human hepatitis B virus. In yet another embodiment, the cell penetrating domain may be MPG (GALFLGWLGAAGSTMGAPKKKRKV; SEQ ID NO: 23 or GALFLGFLGAAGSTMGAWSQPKKKRKV; SEQ ID NO: 24). In another embodiment, the cell penetrating domain may be Pep-1 (KETWWETWWWWQPKKKKKKKV; SEQ ID NO: 25), VP22 (which is a cell penetrating peptide from herpes simplex virus), or a polyarginine peptide sequence. The cell penetrating domain may be located at the N-terminus, the C-terminus, or an internal position of the protein.

In other embodiments, the RNA-guided endonuclease may further comprise at least one marker domain. Non-limiting examples of marker domains include fluorescent proteins, purification tags, and epitope tags. In some embodiments, the marker domain may be a fluorescent protein. Non-limiting examples of suitable fluorescent proteins include green fluorescent proteins (e.g., GFP-2, tagGFP, turboGFP, EGFP, Emerald, Azami Green, Monomeric Azami Green, CopGFP, AceGFP, ZsGreen), yellow fluorescent proteins (e.g., YFP, EYFP, Citrine, Venus, YPet, PhiYFP, ZsYellow1), blue fluorescent proteins (e.g., EBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-sapphire), cyan fluorescent proteins (e.g., ECFP, Cerulean, CyPet, AmCyan1, Midoriishi-Cyan), red fluorescent proteins (e.g., mKate, mKate2, mPlum, DsRed monomer, mCherry, mRFP1, DsRed-Express, DsRed2, DsRed-Monomer, HcRed-Tandem, HcRed1, AsRed2, eqFP611, mRaspberry, mStrawberry, JRed), and orange fluorescent proteins (e.g., mOrange, mKO, Kusabira-Orange, Monomeric Kusabira-Orange, mTangerine, tdTomato), or any other suitable fluorescent protein. In other embodiments, the marker domain may be a purification tag and/or an epitope tag. Exemplary tags include, but are not limited to, glutathione-S-transferase (GST), chitin binding protein (CBP), maltose binding protein, thioredoxin (TRX), poly(NANP), tandem affinity purification (TAP) tag, myc, AcV5, AU1, AU5, E, ECS, E2, FLAG, HA, nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7, V5, VSV-G, 6×His, biotin carboxyl carrier protein (BCCP), and calmodulin.

In certain embodiments, the RNA-guided endonuclease may be part of a protein-RNA complex comprising a pegRNA. The pegRNA interacts with the RNA-guided endonuclease to direct the endonuclease to a specific target site, wherein the 5' end of the pegRNA base pairs with a specific protospacer sequence.
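The base pairing between the 5' end of the guide and the protospacer can be sketched as a simple complementarity check. This is an illustrative sketch only: the function name and the short sequences are hypothetical, only Watson-Crick pairs are considered (no wobble pairs or mismatch tolerance), and both strands are written 5' to 3', so the DNA strand is read in reverse to model antiparallel pairing.

```python
# Watson-Crick partners: RNA base -> DNA base it pairs with.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def pairs_with(rna_spacer: str, dna_strand: str) -> bool:
    """True if the RNA spacer (5'->3') is fully complementary to the DNA strand (5'->3').

    Pairing is antiparallel, so the first RNA base is compared with the last
    DNA base, the second RNA base with the second-to-last DNA base, and so on.
    """
    if len(rna_spacer) != len(dna_strand):
        return False
    return all(
        RNA_TO_DNA_PARTNER.get(rna_base) == dna_base
        for rna_base, dna_base in zip(rna_spacer, reversed(dna_strand))
    )


# Hypothetical 5-nt example: RNA 5'-GACUU-3' pairs with DNA 5'-AAGTC-3'.
print(pairs_with("GACUU", "AAGTC"))
print(pairs_with("GACUU", "AAGTT"))
```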

(II) fusion proteins

Another aspect of the present disclosure provides a fusion protein comprising a CRISPR/Cas-like protein or a fragment thereof and an effector domain in combination with pegRNA and a cis-acting regulatory element. CRISPR/Cas-like proteins are directed to a target site through pegRNA where the effector domain can modify or affect a target nucleic acid sequence. The effector domain may be a cleavage domain, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. The fusion protein may further comprise at least one additional domain selected from a nuclear localization signal, a cell penetrating domain or a labeling domain.

(A) CRISPR/Cas-like proteins

The fusion protein comprises a CRISPR/Cas-like protein or a fragment thereof. CRISPR/Cas-like proteins are described in detail in section (I) above. The CRISPR/Cas-like protein may be located at the N-terminus, C-terminus, or an internal position of the fusion protein.

In some embodiments, the CRISPR/Cas-like protein of the fusion protein may be derived from a Cas9 protein. The Cas9-derived protein may be wild-type, modified, or a fragment thereof. In some embodiments, the Cas9-derived protein may be modified to contain only one functional nuclease domain (either a RuvC-like or an HNH-like nuclease domain). For example, the Cas9-derived protein may be modified such that one of the nuclease domains is deleted or mutated such that it is no longer functional (i.e., there is no nuclease activity). In some embodiments where one of the nuclease domains is inactive, the Cas9-derived protein is capable of introducing a nick into double-stranded nucleic acid (such proteins are referred to as "nickases"), but does not cleave double-stranded DNA. For example, conversion of aspartic acid to alanine (D10A) in the RuvC-like domain converts the Cas9-derived protein to a nickase. Likewise, the conversion of histidine to alanine (H840A or H839A) in the HNH domain converts the Cas9-derived protein to a nickase. In other embodiments, both the RuvC-like nuclease domain and the HNH-like nuclease domain can be modified or removed such that the Cas9-derived protein is incapable of nicking or cleaving double-stranded nucleic acids. In other embodiments, all nuclease domains of the Cas9-derived protein may be modified or removed such that the Cas9-derived protein lacks all nuclease activity.

In any of the above embodiments, any or all of the nuclease domains can be inactivated by one or more of deletion, insertion, and/or substitution mutations using well-known methods, such as site-directed mutagenesis, PCR-mediated mutagenesis, and total-gene synthesis, among other methods known in the art. In one exemplary embodiment, the CRISPR/Cas-like protein of the fusion protein is derived from a Cas9 protein, wherein all nuclease domains have been inactivated or deleted.

(B) Effector domains

The fusion protein further comprises an effector domain. The effector domain may be a cleavage domain, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. The effector domain may be located at the N-terminal, C-terminal or internal position of the fusion protein.

(i) Cleavage domain

In some embodiments, the effector domain is a cleavage domain. As used herein, "cleavage domain" refers to a domain that cleaves DNA. The cleavage domain may be obtained from any endonuclease or exonuclease. Non-limiting examples of endonucleases from which the cleavage domain may be derived include restriction endonucleases and homing endonucleases. See, e.g., the New England Biolabs catalog or Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388. Additional enzymes that cleave DNA are known (e.g., S1 nuclease, mung bean nuclease, pancreatic DNase I, micrococcal nuclease, yeast HO endonuclease). See also Linn et al. (eds.), Nucleases, Cold Spring Harbor Laboratory Press, 1993. One or more of these enzymes (or functional fragments thereof) may be used as a source of cleavage domains.

In some embodiments, the cleavage domain may be derived from a type II-S endonuclease. Type II-S endonucleases cleave DNA at a site that is typically several base pairs from the recognition site and thus have separable recognition and cleavage domains. These enzymes are typically monomers that associate transiently to form dimers that cleave each strand of DNA at staggered positions. Non-limiting examples of suitable type II-S endonucleases include BfiI, BpmI, BsaI, BsgI, BsmBI, BsmI, BspMI, FokI, MboII, and SapI. In an exemplary embodiment, the cleavage domain of the fusion protein is a FokI cleavage domain or a derivative thereof.

In certain embodiments, the type II-S cleavage domain can be modified to promote dimerization of two different cleavage domains, each of which is attached to a CRISPR/Cas-like protein or fragment thereof. For example, the cleavage domain of FokI can be modified by mutating certain amino acid residues. As non-limiting examples, the amino acid residues at positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 of the FokI cleavage domain are targets for modification. For example, modified FokI cleavage domains that form obligate heterodimers include a pair in which the first modified cleavage domain includes mutations at amino acid positions 490 and 538 and the second modified cleavage domain includes mutations at amino acid positions 486 and 499 (Miller et al., 2007, Nat. Biotechnol., 25:778-785; Szczepek et al., 2007, Nat. Biotechnol., 25:786-793). For example, in one cleavage domain (E490K, I538K), the Glu (E) at position 490 may be changed to Lys (K) and the Ile (I) at position 538 may be changed to K, and in the other cleavage domain (Q486E, I499L), the Gln (Q) at position 486 may be changed to E and the I at position 499 may be changed to Leu (L). In other embodiments, the modified FokI cleavage domains may include three amino acid changes (Doyon et al., 2011, Nat. Methods, 8:74-81). For example, one modified FokI domain (termed ELD) may comprise the Q486E, I499L, and N496D mutations, while the other modified FokI domain (termed KKR) may comprise the E490K, I538K, and H537R mutations.
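The ELD and KKR variants described above can be recorded as a small lookup table and checked against the list of positions named as targets for modification. This is a bookkeeping sketch only; the dictionary layout and helper names are hypothetical, and the check says nothing about whether a given variant pair actually dimerizes.

```python
import re

# FokI cleavage domain positions named in the text as targets for modification.
TARGET_POSITIONS = {
    446, 447, 479, 483, 484, 486, 487, 490, 491,
    496, 498, 499, 500, 531, 534, 537, 538,
}

# Three-mutation obligate heterodimer variants named in the text.
FOKI_VARIANTS = {
    "ELD": ("Q486E", "I499L", "N496D"),
    "KKR": ("E490K", "I538K", "H537R"),
}


def parse_mutation(mutation: str):
    """Split 'Q486E' into (wild-type residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def mutated_positions(variant: str) -> set:
    """Return the set of positions altered in the named variant."""
    return {parse_mutation(m)[1] for m in FOKI_VARIANTS[variant]}


for name in FOKI_VARIANTS:
    positions = mutated_positions(name)
    print(name, sorted(positions), "all listed targets:", positions <= TARGET_POSITIONS)
```

The two variants alter disjoint sets of positions, consistent with the description of two different modified cleavage domains that pair with each other.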

In exemplary embodiments, the effector domain of the fusion protein is a FokI cleavage domain or a modified FokI cleavage domain.

In embodiments where the effector domain is a cleavage domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.

(ii) Epigenetic modification domains

In other embodiments, the effector domain of the fusion protein may be an epigenetic modification domain. Typically, the epigenetic modification domain alters the histone structure and/or chromosomal structure without altering the DNA sequence. Altering histone and/or chromatin structure can result in altered gene expression. Examples of epigenetic modifications include, but are not limited to, acetylation or methylation of lysine residues in histones, and methylation of cytosine residues in DNA. Non-limiting examples of suitable epigenetic modification domains include histone acetyl transferase domains, histone deacetylase domains, histone methyltransferase domains, histone demethylase domains, DNA methyltransferase domains, and DNA demethylase domains.

In embodiments where the effector domain is a histone acetyltransferase (HAT) domain, the HAT domain may be derived from EP300 (i.e., E1A binding protein p300), CREBBP (i.e., CREB binding protein), CDY1, CDY2, CDYL1, CLOCK, ELP3, ESA1, GCN5 (KAT2A), HAT1, KAT2B, KAT5, MYST1, MYST2, MYST3, MYST4, NCOA1, NCOA2, NCOA3, NCOAT, P/CAF, Tip60, TAFII250, or TF3C4. In one such embodiment, the HAT domain is p300.

In embodiments where the effector domain is an epigenetic modification domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.

(iii) Transcriptional activation domains

In other embodiments, the effector domain of the fusion protein may be a transcriptional activation domain. Typically, the transcriptional activation domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (i.e., transcription factors, RNA polymerases, etc.) to increase and/or activate transcription of a gene. In some embodiments, the transcriptional activation domain may be, but is not limited to, a herpes simplex virus VP16 activation domain, VP64 (which is a tetrameric derivative of VP16), an NF-κB p65 activation domain, p53 activation domains 1 and 2, a CREB (cAMP response element binding protein) activation domain, an E2A activation domain, or an NFAT (nuclear factor of activated T cells) activation domain. In other embodiments, the transcriptional activation domain may be Gal4, Gcn4, MLL, Rtg3, Gln3, Oaf1, Pip2, Pdr1, Pdr3, Pho4, or Leu3. The transcriptional activation domain may be wild-type, or it may be a modified form of the original transcriptional activation domain. In some embodiments, the effector domain of the fusion protein is a VP16 or VP64 transcriptional activation domain.

In embodiments where the effector domain is a transcriptional activation domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.

(iv) Transcriptional repressor domains

In other embodiments, the effector domain of the fusion protein may be a transcriptional repressor domain. Typically, the transcriptional repressor domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (i.e., transcription factors, RNA polymerases, etc.) to reduce and/or terminate transcription of a gene. Non-limiting examples of suitable transcriptional repressor domains include inducible cAMP early repressor (ICER) domains, Kruppel-associated box A (KRAB-A) repressor domains, YY1 glycine-rich repressor domains, Sp1-like repressors, E(spl) repressors, IκB repressor, and MeCP2.

In embodiments where the effector domain is a transcriptional repressor domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.

(C) Additional domains

In some embodiments, the fusion protein further comprises at least one additional domain. Non-limiting examples of suitable additional domains include nuclear localization signals, cell penetration or translocation domains, and labeling domains. Non-limiting examples of suitable nuclear localization signals, cell penetrating domains and labeling domains are presented in section (I) above.

(D) Fusion protein dimers

In embodiments where the effector domain of the fusion protein is a cleavage domain, a dimer comprising at least one fusion protein may be formed. The dimer may be a homodimer or a heterodimer. In some embodiments, the heterodimer comprises two different fusion proteins. In other embodiments, the heterodimer comprises one fusion protein and an additional protein.

In some embodiments, the dimer is a homodimer, wherein the two fusion protein monomers are identical in primary amino acid sequence. In one embodiment where the dimer is a homodimer, the Cas9-derived proteins are modified such that their endonuclease activity is removed, i.e., such that they have no functional nuclease domains. In certain embodiments in which the Cas9-derived proteins are modified such that their endonuclease activity is removed, each fusion protein monomer comprises the same Cas9-like protein and the same cleavage domain. The cleavage domain can be any cleavage domain, such as any of the exemplary cleavage domains provided herein. In a particular embodiment, the cleavage domain is a FokI cleavage domain or a modified FokI cleavage domain. In such embodiments, specific pegRNAs direct the fusion protein monomers to different but closely adjacent sites such that, upon dimer formation, the nuclease domains of the two monomers create a double-strand break in the target DNA.

In other embodiments, the dimer is a heterodimer of two different fusion proteins. For example, the CRISPR/Cas-like proteins of each fusion protein can be derived from different CRISPR/Cas proteins or from orthologous CRISPR/Cas proteins from different bacterial species. For example, each fusion protein can comprise a Cas9-like protein that is derived from a different bacterial species. In these embodiments, each fusion protein recognizes a different target site (i.e., a target site specified by the protospacer and/or PAM sequence). For example, the pegRNAs can position the heterodimer at different but closely adjacent sites such that the nuclease domains produce an efficient double-strand break in the target DNA. The heterodimer can also comprise Cas9 proteins with nicking activity that are modified such that the nicking positions differ.

Alternatively, the two fusion proteins of the heterodimer may have different effector domains. In embodiments where the effector domain is a cleavage domain, each fusion protein may contain a different modified cleavage domain. For example, each fusion protein may contain a different modified FokI cleavage domain, as detailed in section (II)(B)(i) above. In these embodiments, the Cas9 protein may be modified such that its endonuclease activity is removed.

As will be appreciated by those of skill in the art, the two fusion proteins forming the heterodimer may differ in both CRISPR/Cas-like protein domains and effector domains.

In any of the above embodiments, the homodimer or heterodimer may comprise at least one additional domain selected from a nuclear localization signal (NLS), a cell penetrating or translocation domain, and a marker domain, as detailed above.

In any of the above embodiments, one or both of the Cas9-derived proteins may be modified such that its endonuclease activity is removed or modified.

In alternative embodiments, the heterodimer comprises one fusion protein and an additional protein. For example, the additional protein may be a nuclease. In one embodiment, the nuclease is a zinc finger nuclease. The zinc finger nuclease comprises a zinc finger DNA binding domain and a cleavage domain. Each zinc finger recognizes and binds three (3) nucleotides. The zinc finger DNA binding domain can comprise from about three zinc fingers to about seven zinc fingers. The zinc finger DNA binding domain may be derived from a naturally occurring protein, or it may be engineered. See, for example, Beerli et al. (2002) Nat. Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nat. Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; Zhang et al. (2000) J. Biol. Chem. 275(43):33850-33860; Doyon et al. (2008) Nat. Biotechnol. 26:702-708; and Santiago et al. (2008) Proc. Natl. Acad. Sci. USA 105:5809-5814. The cleavage domain of the zinc finger nuclease may be any of the cleavage domains detailed in section (II)(B)(i) above. In exemplary embodiments, the cleavage domain of the zinc finger nuclease is a FokI cleavage domain or a modified FokI cleavage domain. Such a zinc finger nuclease will dimerize with a fusion protein comprising a FokI cleavage domain or a modified FokI cleavage domain.
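The arithmetic implied by "three nucleotides per finger" and "about three to about seven fingers" can be written out as a short sketch. The function names are hypothetical, and the calculation ignores any spacer between the two half-sites of a dimer, which is a simplifying assumption.

```python
BP_PER_FINGER = 3  # each zinc finger recognizes three nucleotides (from the text)


def binding_site_length(n_fingers: int) -> int:
    """Nucleotides recognized by a zinc finger array with n_fingers fingers."""
    if n_fingers < 1:
        raise ValueError("a zinc finger array needs at least one finger")
    return BP_PER_FINGER * n_fingers


def dimer_word_length(left_fingers: int, right_fingers: int) -> int:
    """Combined recognition length of two arrays acting as a dimer (spacer ignored)."""
    return binding_site_length(left_fingers) + binding_site_length(right_fingers)


for fingers in range(3, 8):
    print(f"{fingers} fingers: {binding_site_length(fingers)} nt per array, "
          f"{dimer_word_length(fingers, fingers)} nt per symmetric dimer")
```

Under these assumptions, a single array recognizes 9 to 21 nucleotides, and a pair of five- or six-finger arrays recognizes 30 or 36 nucleotides in total.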

In some embodiments, the zinc finger nuclease may comprise at least one additional domain selected from a nuclear localization signal and a cell penetrating or translocation domain, which are detailed above.

In certain embodiments, any of the fusion proteins detailed above, or a dimer comprising at least one fusion protein, may be part of a protein-RNA complex comprising at least one pegRNA. The pegRNA interacts with the CRISPR/Cas-like protein of the fusion protein to direct the fusion protein to a specific target site, wherein the 5' end of the pegRNA base pairs with a specific protospacer sequence.

(III) nucleic acids encoding RNA-guided endonucleases or fusion proteins

Another aspect of the invention provides a nucleic acid encoding any of the RNA-guided endonucleases or fusion proteins described in sections (I) and (II), respectively, above. The nucleic acid may be RNA or DNA. In one embodiment, the nucleic acid encoding an RNA-guided endonuclease or fusion protein is mRNA. The mRNA can be 5' capped and/or 3' polyadenylated. In another embodiment, the nucleic acid encoding an RNA-guided endonuclease or fusion protein is DNA. The DNA may be present in a vector (see below).

Nucleic acids encoding RNA-guided endonucleases or fusion proteins can be codon optimized for efficient translation into protein in the eukaryotic cell or animal of interest. For example, codons may be optimized for expression in humans, mice, rats, hamsters, cows, pigs, cats, dogs, fish, amphibians, plants, yeast, insects, and the like. Codon optimization programs are available as freeware. Commercial codon optimization programs are also available.
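The simplest form of codon optimization, choosing one preferred codon per amino acid, can be sketched as follows. This is an illustration only and not one of the programs referred to above: the preferred-codon table is an assumed, illustrative choice (each codon is a valid codon for its amino acid under the standard genetic code, but the selection is not derived from measured codon usage), and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed, illustrative "preferred" codon per amino acid (not measured usage data).
PREFERRED_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def back_translate(protein: str) -> str:
    """Encode a protein using the single preferred codon for each residue."""
    return "".join(PREFERRED_CODON[residue] for residue in protein)


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (length divisible by 3) into protein."""
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


nls = "PKKKRKV"  # NLS of SEQ ID NO: 19, used here as a short test peptide
coding = back_translate(nls)
print(coding, translate(coding) == nls)
```

Practical codon optimization typically also considers factors such as GC content, repeats, and unwanted sequence motifs, which this sketch does not address.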

In some embodiments, DNA encoding an RNA-guided endonuclease or fusion protein may be operably linked to at least one promoter control sequence. In some iterations, the DNA coding sequence may be operably linked to a promoter control sequence for expression in the eukaryotic cell or animal of interest. The promoter control sequence may be constitutive, regulated, or tissue specific. Suitable constitutive promoter control sequences include, but are not limited to, the cytomegalovirus immediate early promoter (CMV), the simian virus 40 (SV40) promoter, the adenovirus major late promoter, the Rous sarcoma virus (RSV) promoter, the mouse mammary tumor virus (MMTV) promoter, the phosphoglycerate kinase (PGK) promoter, the elongation factor 1-alpha (EF1α) promoter, ubiquitin promoters, actin promoters, tubulin promoters, immunoglobulin promoters, fragments thereof, or combinations of any of the foregoing. Examples of suitable regulated promoter control sequences include, but are not limited to, those regulated by heat shock, metals, steroids, antibiotics, or alcohols. Non-limiting examples of tissue specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, NphsI promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter. The promoter sequence may be wild-type, or it may be modified for more efficient or efficacious expression. In one exemplary embodiment, the coding DNA can be operably linked to a CMV promoter for constitutive expression in mammalian cells.

In certain embodiments, the sequence encoding the RNA-guided endonuclease or fusion protein may be operably linked to a promoter sequence that is recognized by a phage RNA polymerase for in vitro mRNA synthesis. In such embodiments, the in vitro transcribed RNA may be purified for use in the methods detailed in sections (IV) and (V) below. For example, the promoter sequence may be a T7, T3 or SP6 promoter sequence or a variant of a T7, T3 or SP6 promoter sequence. In one exemplary embodiment, the DNA encoding the fusion protein is operably linked to a T7 promoter, which T7 promoter is used for in vitro mRNA synthesis using a T7 RNA polymerase.

In alternative embodiments, the sequence encoding the RNA-guided endonuclease or fusion protein may be operably linked to a promoter sequence for in vitro expression of the RNA-guided endonuclease or fusion protein in bacterial or eukaryotic cells. In such embodiments, the expressed protein may be purified for use in the methods detailed in sections (IV) and (V) below. Suitable bacterial promoters include, but are not limited to, T7 promoters, lac operon promoters, trp promoters, variants thereof, and combinations thereof. An exemplary bacterial promoter is tac, which is a hybrid of the trp and lac promoters. Non-limiting examples of suitable eukaryotic promoters are listed above.

In further aspects, DNA encoding an RNA-guided endonuclease or fusion protein may also be linked to a polyadenylation signal (e.g., SV40 polyA signal, bovine growth hormone (BGH) polyA signal, etc.) and/or at least one transcription termination sequence. Furthermore, the sequence encoding the RNA-guided endonuclease or fusion protein may also be linked to a sequence encoding at least one nuclear localization signal, at least one cell penetrating domain, and/or at least one marker domain, which are detailed in section (I) above.

In various embodiments, DNA encoding an RNA-guided endonuclease or fusion protein may be present in a vector. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/minichromosomes, transposons and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, etc.). In one embodiment, the DNA encoding the RNA-guided endonuclease or fusion protein is present in a plasmid vector. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript and variants thereof. The vector may comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcription termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like. Additional information can be found in "Current Protocols in Molecular Biology", Ausubel et al., John Wiley & Sons, New York, 2003, or "Molecular Cloning: A Laboratory Manual", Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001.

In some embodiments, an expression vector comprising a sequence encoding an RNA-guided endonuclease or fusion protein may further comprise a sequence encoding a pegRNA. The sequence encoding the pegRNA is typically operably linked to at least one transcriptional control sequence for expression of the pegRNA in the cell or embryo of interest. For example, the DNA encoding the pegRNA may be operably linked to a promoter sequence that is recognized by RNA polymerase III (Pol III). Examples of suitable Pol III promoters include, but are not limited to, mammalian U6, U3, H1, and 7SL RNA promoters.

(IV) Methods for modifying chromosomal sequences using RNA-guided endonucleases

Another aspect of the disclosure includes a method of modifying a chromosomal sequence in a eukaryotic cell or embryo. The method comprises introducing into a eukaryotic cell or embryo (i) at least one RNA-guided endonuclease comprising at least one nuclear localization signal or a nucleic acid encoding at least one RNA-guided endonuclease comprising at least one nuclear localization signal, (ii) at least one pegRNA or DNA encoding at least one pegRNA, and optionally, (iii) at least one donor polynucleotide comprising a donor sequence. The method further comprises culturing the cell or embryo such that each pegRNA directs the RNA-guided endonuclease to a target site in the chromosomal sequence, wherein the RNA-guided endonuclease introduces a double-strand break at the target site, and the double-strand break is repaired by a DNA repair process such that the chromosomal sequence is modified.

In some embodiments, the method may comprise introducing an RNA-guided endonuclease (or encoding nucleic acid) and a pegRNA (or encoding DNA) into the cell or embryo, wherein the RNA-guided endonuclease introduces a double strand break in the targeted chromosomal sequence. In embodiments where the optional donor polynucleotide is not present, double strand breaks in the chromosomal sequence may be repaired by a non-homologous end joining (NHEJ) repair process. Because NHEJ is error-prone, a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of a break. Thus, the targeted chromosomal sequence may be modified or inactivated. For example, a single nucleotide change (SNP) may produce an altered protein product, or a shift in the reading frame of the coding sequence may inactivate or "knock out" the sequence such that no protein product is produced. In embodiments where an optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence at the target site during double strand break repair. For example, in embodiments in which the donor sequence is flanked by an upstream sequence and a downstream sequence that have substantial sequence identity to the upstream and downstream sequences, respectively, of the target site in the chromosomal sequence, the donor sequence may be exchanged with or integrated into the chromosomal sequence at the targeted site during repair mediated by a homology-directed repair process. Alternatively, in embodiments where the donor sequence is flanked by compatible overhangs (or the compatible overhangs are generated in situ by the RNA-guided endonuclease), the donor sequence may be directly ligated to the cleaved chromosomal sequence by a non-homologous repair process during double-strand break repair. The exchange or integration of the donor sequence into the chromosomal sequence modifies the targeted chromosomal sequence or introduces an exogenous sequence into the chromosomal sequence of the cell or embryo.

In other embodiments, the method may comprise introducing two RNA-guided endonucleases (or encoding nucleic acids) and two pegRNA (or encoding DNA) into the cell or embryo, wherein the RNA-guided endonucleases introduce two double strand breaks in the chromosomal sequence. See fig. 3B. The two breaks may be within a few base pairs, within tens of base pairs, or may be separated by thousands of base pairs. In embodiments where an optional donor polynucleotide is not present, the resulting double-strand break may be repaired by a non-homologous repair process such that a sequence loss between the two cleavage sites and/or a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during the repair of the break. In embodiments in which an optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during double-strand break repair by homology-based repair processes (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the targeted site in the chromosomal sequence, respectively), or by non-homologous repair processes (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).

In other embodiments, the method may comprise introducing into the cell or embryo one RNA-guided endonuclease modified to cleave one strand of a double-stranded sequence (or encoding nucleic acid) and two pegRNA (or encoding DNA), wherein each pegRNA directs the RNA-guided endonuclease to a specific target site at which the modified endonuclease cleaves one strand of the double-stranded chromosomal sequence (i.e., introduces a nick), and wherein the two nicks are in opposite strands and sufficiently close to constitute a double-stranded break. See fig. 3A. In embodiments where an optional donor polynucleotide is not present, the resulting double-strand break may be repaired by a non-homologous repair process such that a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during the repair of the break. In embodiments in which an optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during double-strand break repair by homology-based repair processes (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the targeted site in the chromosomal sequence, respectively), or by non-homologous repair processes (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).

(A) RNA-guided endonucleases

The method comprises introducing into a cell or embryo at least one RNA-guided endonuclease comprising at least one nuclear localization signal or a nucleic acid encoding at least one RNA-guided endonuclease comprising at least one nuclear localization signal. Such RNA-guided endonucleases and nucleic acids encoding RNA-guided endonucleases are described in sections (I) and (III), respectively, above. The guide RNA used with such an endonuclease may be a pegRNA.

In some embodiments, the RNA-guided endonuclease can be introduced into the cell or embryo as an isolated protein. In such embodiments, the RNA-guided endonuclease may further comprise at least one cell penetrating domain that facilitates cellular uptake of the protein. In other embodiments, the RNA-guided endonuclease may be introduced into the cell or embryo as an mRNA molecule. In other embodiments, the RNA-guided endonuclease may be introduced into the cell or embryo as a DNA molecule. Typically, the DNA sequence encoding the fusion protein is operably linked to a promoter sequence that is functional in the cell or embryo of interest. The DNA sequence may be linear or the DNA sequence may be part of a vector. In further embodiments, the fusion protein may be introduced into the cell or embryo as an RNA-protein complex comprising the fusion protein and pegRNA.

In alternative embodiments, the DNA encoding the RNA guided endonuclease may further comprise a sequence encoding pegRNA. Typically, each of the sequences encoding the RNA-guided endonucleases and pegRNA is operably linked to appropriate promoter control sequences that allow expression of the RNA-guided endonucleases and pegRNA, respectively, in a cell or embryo. The DNA sequences encoding RNA-guided endonucleases and pegRNA may further comprise additional expression control, regulatory and/or processing sequences. The DNA sequences encoding the RNA guided endonucleases and pegRNA may be linear or may be part of a vector.

(B) Prime editing guide RNA (pegRNA)

The method further comprises introducing at least one pegRNA or DNA encoding at least one pegRNA into the cell or embryo. The pegRNA interacts with the RNA-guided endonuclease to direct the endonuclease to a specific target site, at which the 5' end of the pegRNA base pairs with a specific protospacer sequence in the chromosomal sequence.

Each pegRNA contains three regions: a first region at the 5 'end complementary to the target site in the chromosomal sequence, a second internal region forming a stem-loop structure, and a third 3' region that remains substantially single-stranded. The first region of each pegRNA is different such that each pegRNA directs the fusion protein to a particular target site. The second and third regions of each pegRNA may be the same in all pegRNA.

The first region of the pegRNA is complementary to a sequence at a target site in the chromosomal sequence (i.e., a protospacer sequence) such that the first region of the pegRNA can base pair with the target site. In various embodiments, the first region of the pegRNA may comprise from about 10 nucleotides to more than about 25 nucleotides. For example, the length of the base pairing region between the first region of the pegRNA and the target site in the chromosomal sequence can be about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, or more than 25 nucleotides. In an exemplary embodiment, the first region of the pegRNA is about 19, 20, or 21 nucleotides in length.

The pegRNA also includes a second region that forms a secondary structure. In some embodiments, the secondary structure comprises a stem (or hairpin) and a loop. The length of the loop and the stem may vary. For example, the loop may range from about 3 to about 10 nucleotides in length, and the stem may range from about 6 to about 20 base pairs in length. The stem may comprise one or more bulges of 1 to about 10 nucleotides. Thus, the total length of the second region may range from about 16 to about 60 nucleotides. In one exemplary embodiment, the loop is about 4 nucleotides in length and the stem comprises about 12 base pairs.
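The length range given for the second region can be checked with simple arithmetic. The following is a minimal sketch, assuming that the second region consists only of the loop, both strands of the stem, and a single run of unpaired stem nucleotides; the function name and this counting rule are illustrative assumptions rather than part of the disclosure.

```python
def second_region_length(loop_nt: int, stem_bp: int, unpaired_nt: int = 0) -> int:
    """Count nucleotides in a stem-loop: loop + two stem strands + unpaired stem nucleotides.

    Assumption (not from the disclosure): every base pair contributes two
    nucleotides and all unpaired stem nucleotides are lumped into `unpaired_nt`.
    """
    if loop_nt < 0 or stem_bp < 0 or unpaired_nt < 0:
        raise ValueError("lengths must be non-negative")
    return loop_nt + 2 * stem_bp + unpaired_nt


# Smallest case: 3-nt loop, 6-bp stem, 1 unpaired nucleotide.
shortest = second_region_length(loop_nt=3, stem_bp=6, unpaired_nt=1)
# Largest case: 10-nt loop, 20-bp stem, 10 unpaired nucleotides.
longest = second_region_length(loop_nt=10, stem_bp=20, unpaired_nt=10)
# Exemplary embodiment: 4-nt loop and 12-bp stem, with no unpaired nucleotides assumed.
exemplary = second_region_length(loop_nt=4, stem_bp=12)

print(shortest, longest, exemplary)
```

Under these assumptions the extremes come out at 16 and 60 nucleotides, consistent with the stated range, and the exemplary loop and stem give 28 nucleotides.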

PegRNA also includes a third region at the 3' end that remains substantially single-stranded. Thus, the third region is not complementary to any chromosomal sequence in the cell of interest and is not complementary to the remainder of pegRNA. The length of the third region may vary. Typically, the third region is more than about 4 nucleotides in length. For example, the length of the third region may range from about 5 to about 60 nucleotides in length.

The combined length of the second and third regions of the pegRNA (also referred to as the universal region or the scaffold region) can range from about 30 to about 120 nucleotides. In one aspect, the combined length of the second and third regions of the pegRNA ranges from about 70 to about 100 nucleotides.

In some embodiments, the pegRNA comprises a single molecule containing all three regions. In other embodiments, the pegRNA may comprise two separate molecules. The first RNA molecule may comprise the first region of the pegRNA and one half of the "stem" of the second region of the pegRNA. The second RNA molecule may comprise the other half of the "stem" of the second region of the pegRNA and the third region of the pegRNA. Thus, in this embodiment, the first and second RNA molecules each contain a nucleotide sequence that is complementary to the other. For example, in one embodiment, the first and second RNA molecules each comprise a sequence (of about 6 to about 20 nucleotides) that base pairs with the other sequence to form a functional pegRNA.

In some embodiments pegRNA may be introduced into the cell or embryo as an RNA molecule. The RNA molecule can be transcribed in vitro. Alternatively, the RNA molecule may be chemically synthesized.

In other embodiments pegRNA may be introduced into the cell or embryo as a DNA molecule. In such cases, the DNA encoding pegRNA may be operably linked to a promoter control sequence to express pegRNA in the cell or embryo of interest. For example, the RNA coding sequence may be operably linked to a promoter sequence recognized by RNA polymerase III (Pol III). Examples of suitable Pol III promoters include, but are not limited to, mammalian U6 or H1 promoters. In exemplary embodiments, the RNA coding sequence is linked to a mouse or human U6 promoter. In other exemplary embodiments, the RNA coding sequence is linked to a mouse or human H1 promoter.

The DNA molecule encoding pegRNA may be linear or circular. In some embodiments, the DNA sequence encoding pegRNA may be part of a vector. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/minichromosomes, transposons and viral vectors. In an exemplary embodiment, the DNA encoding the RNA guided endonuclease is present in a plasmid vector. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript and variants thereof. The vector may comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcription termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like.

In embodiments where both RNA-guided endonucleases and pegRNA are introduced into a cell as DNA molecules, each may be part of a separate molecule (e.g., one vector containing the fusion protein coding sequence and a second vector containing the pegRNA coding sequence) or both may be part of the same molecule (e.g., one vector containing the coding (and regulatory) sequences of both fusion proteins and pegRNA).

(C) Target site

The RNA-guided endonuclease together with the pegRNA is directed to a target site in the chromosomal sequence, wherein the RNA-guided endonuclease introduces a break in the chromosomal sequence. The target site is not limited in sequence except that the sequence is immediately followed (downstream) by a consensus sequence. This consensus sequence is also known as the protospacer adjacent motif (PAM). Examples of PAMs include, but are not limited to, NGG, NGGNG, and NNAGAAW (where N is defined as any nucleotide and W is defined as A or T). As detailed in section (IV)(B) above, the first region of the pegRNA (at the 5' end) is complementary to the protospacer of the target sequence. Typically, the first region of the pegRNA is about 19 to 21 nucleotides in length. Thus, in certain aspects, the sequence of the target site in the chromosomal sequence is 5'-N19-21-NGG-3' (that is, 19 to 21 nucleotides of any sequence followed by NGG). PAM is shown in italics.
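The target-site rule described above (a stretch of nucleotides immediately followed by a PAM) can be illustrated with a short search routine. The following is a minimal sketch, assuming a plain DNA string containing only A, C, G and T and scanning only the strand as written; the function name `find_target_sites`, its parameters, and the example sequence are hypothetical and are not taken from the disclosure.

```python
import re

# IUPAC codes needed for the PAM examples in the text (N = any base, W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}


def find_target_sites(sequence: str, pam: str = "NGG", target_len: int = 20):
    """Return (start, target, pam_match) for every site immediately followed by the PAM.

    Only the strand given in `sequence` is scanned; a fuller tool would also
    scan the reverse complement.
    """
    seq = sequence.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{target_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up example: a 20-nt stretch followed by TGG (an NGG PAM) and three extra bases.
example = "GACCTGAAGCTGATCCTGACTGGCAT"
sites = find_target_sites(example, pam="NGG", target_len=20)
print(sites)
```

Passing `target_len=19` or `target_len=21` covers the other lengths in the stated 19 to 21 nucleotide range, and passing `pam="NNAGAAW"` or `pam="NGGNG"` applies the same search to the other PAM examples.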

The target site may be in a coding region of a gene, an intron of a gene, a control region of a gene, a non-coding region between genes, or the like. The gene may be a protein-encoding gene or an RNA-encoding gene. The gene may be any gene of interest.

(D) Optional donor polynucleotide

In some embodiments, the method further comprises introducing at least one donor polynucleotide to the target site. The donor polynucleotide comprises at least one donor sequence. In some aspects, the donor sequence of the donor polynucleotide corresponds to an endogenous or native chromosomal sequence. For example, the donor sequence may be substantially identical to a portion of the chromosomal sequence at or near the target site, but it comprises at least one nucleotide change. Thus, the donor sequence may comprise a modified form of the wild-type sequence at the target site such that, upon integration or exchange with the native sequence, the sequence at the targeted chromosomal location comprises at least one nucleotide change. For example, the change may be an insertion of one or more nucleotides, a deletion of one or more nucleotides, a substitution of one or more nucleotides, or a combination thereof. As a result of integrating the modified sequence, the cell or embryo/animal can produce a modified gene product from the targeted chromosomal sequence.

In other aspects, the donor sequence of the donor polynucleotide corresponds to an exogenous sequence. As used herein, an "exogenous" sequence refers to a sequence that is not native to the cell or embryo, or a sequence whose native location in the genome of the cell or embryo is at a different location. For example, the exogenous sequence may comprise a protein coding sequence that may be operably linked to an exogenous promoter control sequence such that, upon integration into the genome, the cell or embryo/animal is capable of expressing the protein encoded by the integrated sequence. Alternatively, the exogenous sequence may be integrated into the chromosomal sequence such that its expression is under the control of an endogenous promoter sequence. In other iterations, the exogenous sequence may be a transcription control sequence, another expression control sequence, an RNA coding sequence, or the like. Integration of exogenous sequences into chromosomal sequences is known as "knock-in".

The length of the donor sequence may and will vary, as will be appreciated by those skilled in the art. For example, the length of the donor sequence may vary from a few nucleotides to hundreds of thousands of nucleotides.

A donor polynucleotide comprising an upstream sequence and a downstream sequence. In some embodiments, the donor sequence in the donor polynucleotide is flanked by an upstream sequence and a downstream sequence that have substantial sequence identity to sequences located upstream and downstream, respectively, of the target site in the chromosomal sequence. Because of these sequence similarities, the upstream and downstream sequences of the donor polynucleotide allow for homologous recombination between the donor polynucleotide and the targeted chromosomal sequence, such that the donor sequence can be integrated into (or exchanged with) the chromosomal sequence.

As used herein, an upstream sequence refers to a nucleic acid sequence that shares substantial sequence identity with a chromosomal sequence upstream of the target site. Similarly, a downstream sequence refers to a nucleic acid sequence that shares substantial sequence identity with a chromosomal sequence downstream of the target site. As used herein, the phrase "substantial sequence identity" refers to sequences having at least about 75% sequence identity. Thus, the upstream and downstream sequences in the donor polynucleotide may have about 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to the sequence upstream or downstream of the targeted site. In exemplary embodiments, the upstream and downstream sequences in the donor polynucleotide may have about 95% to 100% sequence identity to the chromosomal sequence upstream or downstream of the targeted site. In one embodiment, the upstream sequence shares substantial sequence identity with a chromosomal sequence located immediately upstream of (i.e., adjacent to) the target site. In other embodiments, the upstream sequence shares substantial sequence identity with a chromosomal sequence located within about one hundred (100) nucleotides upstream of the target site. Thus, for example, the upstream sequence may share substantial sequence identity with chromosomal sequences located about 1 to about 20, about 21 to about 40, about 41 to about 60, about 61 to about 80, or about 81 to about 100 nucleotides upstream of the target site. In one embodiment, the downstream sequence shares substantial sequence identity with a chromosomal sequence located immediately downstream of (i.e., adjacent to) the target site. In other embodiments, the downstream sequence shares substantial sequence identity with a chromosomal sequence located within about one hundred (100) nucleotides downstream of the targeted site. Thus, for example, the downstream sequence may share substantial sequence identity with chromosomal sequences located about 1 to about 20, about 21 to about 40, about 41 to about 60, about 61 to about 80, or about 81 to about 100 nucleotides downstream of the targeted site.
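The 75% threshold for "substantial sequence identity" can be illustrated with a simple calculation. The following is a minimal sketch, assuming the two sequences have already been aligned without gaps and are of equal length; the disclosure does not specify an alignment method, so the function names and the ungapped comparison are illustrative assumptions only.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions that match between two equal-length, pre-aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def has_substantial_identity(seq_a: str, seq_b: str, threshold: float = 75.0) -> bool:
    """Apply the 'at least about 75%' criterion; the exact cut-off handling is an assumption."""
    return percent_identity(seq_a, seq_b) >= threshold


# Made-up 8-nt examples: one mismatch, then three mismatches.
print(percent_identity("ACGTACGT", "ACGTACGA"))
print(has_substantial_identity("ACGTACGT", "ACGTTTTT"))
```

In practice, homology arms of different lengths or with insertions and deletions would need a proper pairwise alignment before identity is computed.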

Each upstream or downstream sequence may range in length from about 20 nucleotides to about 5000 nucleotides. In some embodiments, the upstream and downstream sequences may comprise about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800 or 5000 nucleotides. In exemplary embodiments, the upstream and downstream sequences may range in length from about 50 to about 1500 nucleotides.

The donor polynucleotide comprising an upstream sequence and a downstream sequence having sequence similarity to the targeted chromosomal sequence may be linear or circular. In embodiments where the donor polynucleotide is circular, it may be part of a vector. For example, the vector may be a plasmid vector.

A donor polynucleotide comprising a targeted cleavage site. In other embodiments, the donor polynucleotide may additionally comprise at least one targeted cleavage site recognized by an RNA-guided endonuclease. The targeted cleavage site added to the donor polynucleotide may be placed upstream or downstream or both upstream and downstream of the donor sequence. For example, the donor sequence may be flanked by targeted cleavage sites such that, upon cleavage by an RNA-guided endonuclease, the donor sequence is flanked by overhangs that are compatible with overhangs in the chromosomal sequence that result after cleavage by the RNA-guided endonuclease. Thus, during double strand break repair, the donor sequence may be linked to the cleaved chromosomal sequence by a non-homologous repair process. Typically, the donor polynucleotide comprising the targeted cleavage site will be circular (e.g., may be part of a plasmid vector).

A donor polynucleotide comprising a short donor sequence with optional overhangs. In alternative embodiments, the donor polynucleotide may be a linear molecule comprising a short donor sequence with an optional short overhang that is compatible with the overhang produced by the RNA-guided endonuclease. In such embodiments, the donor sequence may be directly ligated to the cleaved chromosomal sequence during double strand break repair. In some cases, the donor sequence may be less than about 1,000, less than about 500, less than about 250, or less than about 100 nucleotides. In some cases, the donor polynucleotide may be a linear molecule comprising a short donor sequence with blunt ends. In other iterations, the donor polynucleotide may be a linear molecule comprising a short donor sequence with 5' and/or 3' overhangs. The overhang may comprise 1, 2, 3, 4 or 5 nucleotides.
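Whether a donor overhang is "compatible" with a chromosomal overhang can be expressed as a reverse-complement check. The following is a minimal sketch, assuming both overhangs are written 5' to 3', have the same polarity (both 5' overhangs or both 3' overhangs), and must pair along their full length; the function names and these rules are illustrative assumptions, not a definition from the disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_compatible(donor_overhang: str, chromosome_overhang: str) -> bool:
    """True if two single-stranded overhangs (both written 5'->3') can anneal end to end.

    Two empty strings are treated as a blunt-end pair, which is also joinable.
    """
    donor = donor_overhang.upper()
    chrom = chromosome_overhang.upper()
    if not donor and not chrom:
        return True  # blunt ends on both sides
    return len(donor) == len(chrom) and donor == reverse_complement(chrom)


# Made-up 3-nt and 4-nt overhangs for illustration.
print(overhangs_compatible("ACG", "CGT"))    # complementary overhangs
print(overhangs_compatible("ACG", "ACG"))    # identical but not complementary
print(overhangs_compatible("GATC", "GATC"))  # palindromic overhang pairs with itself
```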

Typically, the donor polynucleotide is DNA. The DNA may be single-stranded or double-stranded and/or linear or circular. The donor polynucleotide may be a DNA plasmid, bacterial artificial chromosome (BAC), yeast artificial chromosome (YAC), viral vector, linear DNA segment, PCR fragment, naked nucleic acid, or nucleic acid complexed with a delivery vehicle such as a liposome or poloxamer. In certain embodiments, the donor polynucleotide comprising the donor sequence may be part of a plasmid vector. In any of these cases, the donor polynucleotide comprising the donor sequence may further comprise at least one additional sequence.

(E) Introduction into cells or embryos

RNA-guided endonucleases (or encoding nucleic acids), pegRNA (or encoding DNA), and optionally donor polynucleotides can be introduced into cells or embryos by a variety of means. In some embodiments, the cell or embryo is transfected. Suitable transfection methods include calcium phosphate-mediated transfection, nucleofection (or electroporation), cationic polymer transfection (e.g., DEAE-dextran or polyethylenimine), viral transduction, virosome transfection, virion transfection, liposome transfection, cationic liposome transfection, immunoliposome transfection, nonliposomal lipid transfection, dendrimer transfection, heat shock transfection, magnetofection, lipofection, gene gun delivery, impalefection, sonoporation, optical transfection, and proprietary agent-enhanced uptake of nucleic acids. Methods of transfection are well known in the art (see, e.g., "Current Protocols in Molecular Biology", Ausubel et al., John Wiley & Sons, New York, 2003, or "Molecular Cloning: A Laboratory Manual", Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001). In other embodiments, the molecules are introduced into the cell or embryo by microinjection. Typically, the embryo is a fertilized one-cell stage embryo of the species of interest. For example, the molecules may be injected into the pronuclei of a one-cell embryo.

The RNA-guided endonuclease (or encoding nucleic acid), pegRNA (or DNA encoding pegRNA), and optionally the donor polynucleotide may be introduced into the cell or embryo simultaneously or sequentially. The ratio of RNA-guided endonuclease (or encoding nucleic acid) to pegRNA (or encoding DNA) will typically be about stoichiometric such that they can form an RNA-protein complex. In one embodiment, the DNA encoding the RNA-guided endonuclease and the DNA encoding pegRNA are delivered together in a plasmid vector.

(F) Culturing cells or embryos

The method further comprises maintaining the cell or embryo under conditions such that pegRNA directs the RNA-guided endonuclease to a target site in the chromosomal sequence and the RNA-guided endonuclease introduces at least one double-strand break in the chromosomal sequence. Double strand breaks can be repaired by a DNA repair process such that the chromosomal sequence is modified by deleting at least one nucleotide, inserting at least one nucleotide, replacing at least one nucleotide, or a combination thereof.

In embodiments where no donor polynucleotide is introduced into the cell or embryo, the double strand break may be repaired via a non-homologous end joining (NHEJ) repair process. Because NHEJ is error-prone, a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of a break. Thus, the chromosomal sequence may be modified such that the reading frame of a coding region is shifted and the chromosomal sequence is inactivated or "knocked out". An inactivated protein-coding chromosomal sequence does not give rise to the protein encoded by the wild-type chromosomal sequence.
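The reading-frame shift mentioned above depends on the net change in length at the repaired break. The following is a minimal sketch, assuming the break lies within a coding region and considering only the net number of nucleotides inserted and deleted; the function name is hypothetical, and in-frame changes may still alter or disrupt the protein.

```python
def is_frameshift(inserted_nt: int, deleted_nt: int) -> bool:
    """True if the net length change is not a multiple of three codon positions."""
    if inserted_nt < 0 or deleted_nt < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return (inserted_nt - deleted_nt) % 3 != 0


print(is_frameshift(inserted_nt=1, deleted_nt=0))  # single-base insertion shifts the frame
print(is_frameshift(inserted_nt=0, deleted_nt=3))  # three-base deletion keeps the frame
print(is_frameshift(inserted_nt=2, deleted_nt=5))  # net change of -3 keeps the frame
```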

In embodiments in which a donor polynucleotide comprising an upstream sequence and a downstream sequence is introduced into a cell or embryo, the double strand break may be repaired by a Homology Directed Repair (HDR) process such that the donor sequence is integrated into the chromosomal sequence. Thus, the exogenous sequence may be integrated into the genome of the cell or embryo, or the targeted chromosomal sequence may be modified by exchanging the wild-type chromosomal sequence for the modified sequence.

In embodiments in which a donor polynucleotide comprising a targeted cleavage site is introduced into a cell or embryo, the RNA-guided endonuclease can cleave both the targeted chromosomal sequence and the donor polynucleotide. The linearized donor polynucleotide may be integrated into the chromosomal sequence at the double strand break site by ligation between the donor polynucleotide and the cleaved chromosomal sequence via the NHEJ process.

In embodiments where a linear donor polynucleotide comprising a short donor sequence is introduced into a cell or embryo, the short donor sequence may be integrated into the chromosomal sequence at the double strand break site via the NHEJ process. Integration can be via blunt-end ligation between the short donor sequence and the chromosomal sequence at the double-strand break site. Alternatively, integration may be via sticky-end ligation (i.e., with 5' or 3' overhangs) between a short donor sequence flanked by overhangs and the compatible overhangs generated by the RNA-guided endonuclease in the cleaved chromosomal sequence.

Typically, cells are maintained under conditions suitable for cell growth and/or maintenance. Suitable cell culture conditions are well known in the art and are described, for example, in Santiago et al. (2008) PNAS 105:5809-5814; Moehle et al. (2007) PNAS 104:3055-3060; Urnov et al. (2005) Nature 435:646-651; and Lombardo et al. (2007) Nat. Biotechnology 25:1298-1306. Those skilled in the art understand that methods of culturing cells are known in the art and may and will vary depending on the cell type. In all cases, routine optimization can be used to determine the best technique for a particular cell type.

Embryos can be cultured in vitro (e.g., in cell culture). Typically, the embryos are cultured at an appropriate temperature and in an appropriate medium with the necessary O₂/CO₂ ratio to allow expression of the RNA-guided endonuclease and pegRNA, if necessary. Suitable non-limiting examples of media include M2, M16, KSOM, BMOC, and HTF media. Those skilled in the art will appreciate that the culture conditions may and will vary depending on the species of embryo. In all cases, routine optimization can be used to determine optimal culture conditions for a particular embryo species. In some cases, a cell line (e.g., an embryonic stem cell line) may be derived from an embryo cultured in vitro.

Alternatively, the embryo may be cultured in vivo by transferring the embryo into the uterus of a female host. Generally, the female host is from the same or a similar species as the embryo. Preferably, the female host is pseudopregnant. Methods for preparing pseudopregnant female hosts are known in the art. In addition, methods of transferring embryos into female hosts are known. Culturing embryos in vivo allows embryo development and can result in the live birth of an animal derived from the embryo. Such an animal will contain the modified chromosomal sequence in each cell of the body.

(G) Cell and embryo types

Various eukaryotic cells and embryos are suitable for use in this method. For example, the cell may be a human cell, a non-human mammalian cell, a non-mammalian vertebrate cell, an invertebrate cell, an insect cell, a plant cell, a yeast cell, or a single cell eukaryotic organism. Typically, the embryo is a non-human mammalian embryo. In particular embodiments, the embryo may be a one-cell non-human mammalian embryo. Exemplary mammalian embryos (including one-cell embryos) include, but are not limited to, mouse, rat, hamster, rodent, rabbit, feline, canine, ovine, porcine, bovine, equine, and primate embryos. In other embodiments, the cells may be stem cells. Suitable stem cells include, but are not limited to, embryonic stem cells, ES-like stem cells, fetal stem cells, adult stem cells, pluripotent stem cells, induced pluripotent stem cells, multipotent stem cells, oligopotent stem cells, unipotent stem cells, and the like. In an exemplary embodiment, the cell is a mammalian cell.

Non-limiting examples of suitable mammalian cells include Chinese hamster ovary (CHO) cells; baby hamster kidney (BHK) cells; mouse myeloma NS0 cells; mouse embryonic fibroblast 3T3 cells (NIH 3T3); mouse B lymphoma A20 cells; mouse melanoma B16 cells; mouse myoblast C2C12 cells; mouse myeloma SP2/0 cells; mouse embryonic mesenchymal C3H-10T1/2 cells; mouse carcinoma CT26 cells; mouse prostate DuCuP cells; mouse mammary EMT6 cells; mouse hepatoma Hepa1c1c7 cells; mouse myeloma J5582 cells; mouse epithelial MTD-1A cells; mouse myocardial MyEnd cells; mouse renal RenCa cells; mouse pancreatic RIN-5F cells; mouse melanoma X64 cells; mouse lymphoma YAC-1 cells; rat glioblastoma 9L cells; rat B lymphoma RBL cells; rat neuroblastoma B35 cells; rat hepatoma cells (HTC); buffalo rat liver BRL 3A cells; canine kidney cells (MDCK); canine mammary (CMT) cells; rat osteosarcoma D17 cells; rat monocyte/macrophage DH82 cells; monkey kidney SV-40 transformed fibroblast (COS7) cells; monkey kidney CVI-76 cells; African green monkey kidney (VERO-76) cells; human embryonic kidney cells (HEK293, HEK293T); human cervical carcinoma cells (HELA); human lung cells (W138); human liver cells (Hep G2); human U2-OS osteosarcoma cells; human A549 cells; human A431 cells; and human K562 cells. An extensive list of mammalian cell lines may be found in the American Type Culture Collection catalog (ATCC, Manassas, Va.).

(V) Methods of modifying chromosomal sequences or regulating expression of chromosomal sequences using fusion proteins

Another aspect of the disclosure includes methods for modifying a chromosomal sequence or regulating expression of a chromosomal sequence in a cell or embryo. The method comprises introducing into a cell or embryo (a) at least one fusion protein or nucleic acid encoding at least one fusion protein, wherein the fusion protein comprises a CRISPR/Cas-like protein or fragment thereof and an effector domain, and (b) at least one pegRNA or DNA encoding pegRNA, wherein pegRNA directs the CRISPR/Cas-like protein of the fusion protein to a targeted site in a chromosomal sequence, and the effector domain of the fusion protein modifies the chromosomal sequence or modulates expression of the chromosomal sequence.

Fusion proteins comprising a CRISPR/Cas-like protein or fragment thereof and an effector domain are described in detail in section (II) above. Typically, the fusion proteins disclosed herein further comprise at least one nuclear localization signal. Nucleic acids encoding the fusion proteins are described in section (III) above. In some embodiments, the fusion protein may be introduced into the cell or embryo as an isolated protein (which may further comprise a cell-penetrating domain). Furthermore, the isolated fusion protein may be part of a protein-RNA complex comprising the pegRNA. In other embodiments, the fusion protein may be introduced into the cell or embryo as an RNA molecule (which may be capped and/or polyadenylated). In still other embodiments, the fusion protein may be introduced into the cell or embryo as a DNA molecule. For example, the fusion protein and the pegRNA may be introduced into the cell or embryo as discrete DNA molecules or as part of the same DNA molecule. Such DNA molecules may be plasmid vectors.

In some embodiments, the method further comprises introducing at least one zinc finger nuclease into the cell or embryo. Zinc finger nucleases are described in section (II) (d) above. In other embodiments, the method further comprises introducing at least one donor polynucleotide into the cell or embryo. The donor polynucleotide is described in detail in section (IV) (d) above. Means for introducing molecules into cells or embryos and means for culturing cells or embryos are described in sections (IV) (e) and (IV) (f), respectively, above. Suitable cells and embryos are described in section (IV) (g) above.

In certain embodiments where the effector domain of the fusion protein is a cleavage domain (e.g., a FokI cleavage domain or a modified FokI cleavage domain), the method may comprise introducing one fusion protein (or nucleic acid encoding one fusion protein) and two pegRNAs (or DNA encoding two pegRNAs) into the cell or embryo. The two pegRNAs direct the fusion protein to two different target sites in the chromosomal sequence, wherein the fusion protein dimerizes (e.g., forms a homodimer) such that the two cleavage domains can introduce a double-strand break into the chromosomal sequence. In embodiments where the optional donor polynucleotide is not present, the double-strand break in the chromosomal sequence may be repaired by a non-homologous end joining (NHEJ) repair process. Because NHEJ is error-prone, a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. Thus, the targeted chromosomal sequence may be modified or inactivated. For example, a single nucleotide change (SNP) may produce an altered protein product, or a shift in the reading frame of the coding sequence may inactivate or "knock out" the sequence such that no protein product is produced. In embodiments where the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence at the targeted site during double-strand break repair. For example, in embodiments in which the donor sequence is flanked by an upstream sequence and a downstream sequence that have substantial sequence identity to the upstream and downstream sequences, respectively, of the targeted site in the chromosomal sequence, the donor sequence may be exchanged with or integrated into the chromosomal sequence at the targeted site during repair mediated by a homology-directed repair process. Alternatively, in embodiments where the donor sequence is flanked by compatible overhangs (or the compatible overhangs are generated in situ by the RNA-guided endonuclease), the donor sequence may be ligated directly to the cleaved chromosomal sequence by a non-homologous repair process during double-strand break repair. The exchange or integration of the donor sequence into the chromosomal sequence modifies the targeted chromosomal sequence or introduces an exogenous sequence into the chromosomal sequence of the cell or embryo.

In other embodiments where the effector domain of the fusion protein is a cleavage domain (e.g., a FokI cleavage domain or a modified FokI cleavage domain), the method may comprise introducing two different fusion proteins (or nucleic acids encoding two different fusion proteins) and two pegRNAs (or DNA encoding two pegRNAs) into the cell or embryo. The fusion proteins may differ as detailed in section (II) above. Each pegRNA directs a fusion protein to a specific target site in the chromosomal sequence, where the fusion proteins dimerize (e.g., form a heterodimer) such that the two cleavage domains can introduce a double-strand break into the chromosomal sequence. In embodiments where the optional donor polynucleotide is not present, the resulting double-strand break may be repaired by a non-homologous repair process such that a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during double-strand break repair by a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the upstream and downstream sequences, respectively, of the targeted site in the chromosomal sequence) or by a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).

In other embodiments where the effector domain of the fusion protein is a cleavage domain (e.g., a FokI cleavage domain or a modified FokI cleavage domain), the method may comprise introducing into the cell or embryo a fusion protein (or nucleic acid encoding a fusion protein), a pegRNA (or DNA encoding a pegRNA), and a zinc finger nuclease (or nucleic acid encoding a zinc finger nuclease), wherein the zinc finger nuclease comprises a FokI cleavage domain or a modified FokI cleavage domain. The pegRNA directs the fusion protein to a specific chromosomal sequence and the zinc finger nuclease is directed to another chromosomal sequence, wherein the fusion protein and the zinc finger nuclease dimerize such that the cleavage domain of the fusion protein and the cleavage domain of the zinc finger nuclease can introduce a double-strand break into the chromosomal sequence. See FIG. 1B. In embodiments where the optional donor polynucleotide is not present, the resulting double-strand break may be repaired by a non-homologous repair process such that a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during double-strand break repair by a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the upstream and downstream sequences, respectively, of the targeted site in the chromosomal sequence) or by a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).

In other embodiments where the effector domain of the fusion protein is a transcriptional activation domain or a transcriptional repressor domain, the method may comprise introducing into the cell or embryo a fusion protein (or nucleic acid encoding a fusion protein) and a pegRNA (or DNA encoding a pegRNA). The pegRNA directs the fusion protein to a specific chromosomal sequence, wherein the transcriptional activation domain or transcriptional repressor domain activates or represses, respectively, expression of the targeted chromosomal sequence. See FIG. 2A.

In an alternative embodiment where the effector domain of the fusion protein is an epigenetic modification domain, the method may comprise introducing into the cell or embryo a fusion protein (or nucleic acid encoding a fusion protein) and a pegRNA (or DNA encoding a pegRNA). The pegRNA directs the fusion protein to a specific chromosomal sequence, wherein the epigenetic modification domain modifies the structure of the targeted chromosomal sequence. See FIG. 2B. Epigenetic modifications include acetylation or methylation of histone proteins and/or nucleotide methylation. In some cases, structural modification of the chromosomal sequence results in changes in expression of the chromosomal sequence.

(VI) Genetically modified cells and animals

The present disclosure includes genetically modified cells, non-human embryos, and non-human animals comprising at least one chromosomal sequence that has been modified using an RNA-guided endonuclease-mediated or fusion protein-mediated process, e.g., using the methods described herein. The present disclosure provides cells comprising at least one DNA or RNA molecule encoding an RNA-guided endonuclease or fusion protein targeted to a chromosomal sequence of interest, at least one pegRNA, and optionally one or more donor polynucleotides. The present disclosure also provides non-human embryos comprising at least one DNA or RNA molecule encoding an RNA-guided endonuclease or fusion protein targeted to a chromosomal sequence of interest, at least one pegRNA, and optionally one or more donor polynucleotides.

The present disclosure provides genetically modified non-human animals, non-human embryos, or animal cells comprising at least one modified chromosomal sequence. The modified chromosomal sequence may be modified such that it (1) is inactivated, (2) has altered expression or produces an altered protein product, or (3) comprises an integrated sequence. Using the methods described herein, the chromosomal sequence is modified with RNA-guided endonuclease-mediated or fusion protein-mediated processes.

As discussed, one aspect of the present disclosure provides genetically modified animals in which at least one chromosomal sequence has been modified. In one embodiment, the genetically modified animal comprises at least one inactivated chromosomal sequence. The modified chromosomal sequence may be inactivated such that the sequence is not transcribed and/or the functional protein product is not produced. Thus, a genetically modified animal comprising an inactivated chromosomal sequence may be referred to as a "knockout" or a "conditional knockout". The inactivated chromosomal sequence may include a deletion mutation (i.e., deleting one or more nucleotides), an insertion mutation (i.e., inserting one or more nucleotides), or a nonsense mutation (i.e., a single nucleotide is substituted with another nucleotide such that a stop codon is introduced). As a result of the mutation, the targeted chromosomal sequence is inactivated and the functional protein is not produced. The inactivated chromosomal sequence does not contain exogenously introduced sequences. Also included herein are genetically modified animals in which two, three, four, five, six, seven, eight, nine, or ten or more chromosomal sequences are inactivated.

In another embodiment, the modified chromosomal sequence may be altered such that it encodes a variant protein product. For example, a genetically modified animal comprising a modified chromosomal sequence may comprise a targeted point mutation or other modification such that an altered protein product is produced. In one embodiment, the chromosomal sequence may be modified such that at least one nucleotide is altered and the expressed protein comprises an altered amino acid residue (missense mutation). In another embodiment, the chromosomal sequence may be modified to include more than one missense mutation such that more than one amino acid is altered. Furthermore, the chromosomal sequence may be modified to have a three-nucleotide deletion or insertion such that the expressed protein comprises a single amino acid deletion or insertion. The altered or variant protein may have altered properties or activity, such as altered substrate specificity, altered enzymatic activity, or altered kinetic rates, as compared to the wild-type protein.
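
The relationship between the length of an insertion or deletion and its effect on the reading frame can be illustrated with a minimal Python sketch. The function name is hypothetical, and the sketch makes the simplifying assumption that only the net length change within the coding sequence matters; it ignores, for example, an in-frame change that falls across a codon boundary and thereby also alters an adjacent residue.

```python
def indel_effect(net_length_change: int) -> str:
    """Classify an indel by its effect on the reading frame.

    net_length_change is the number of inserted bases minus the number of
    deleted bases within the coding sequence (hypothetical convention).
    """
    if net_length_change == 0:
        return "no length change"
    if net_length_change % 3 == 0:
        # A multiple of three bases adds or removes whole codons.
        codons = abs(net_length_change) // 3
        kind = "insertion" if net_length_change > 0 else "deletion"
        return f"in-frame {kind} of {codons} codon(s)"
    # Any other length shifts the downstream reading frame.
    return "frameshift"


for change in (-3, 3, -1, 2):
    print(change, indel_effect(change))
```

Under this simplification, a three-nucleotide deletion is reported as an in-frame deletion of one codon, whereas a one- or two-nucleotide change is reported as a frameshift.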

In another embodiment, the genetically modified animal may comprise at least one chromosomally integrated sequence. A genetically modified animal comprising an integrated sequence may be referred to as a "knock-in" or a "conditional knock-in". The chromosomally integrated sequence may encode, for example, an orthologous protein, an endogenous protein, or a combination of both. In one embodiment, a sequence encoding an orthologous or endogenous protein may be integrated into the chromosomal sequence encoding a protein such that the chromosomal sequence is inactivated but the exogenous sequence is expressed. In this case, the sequence encoding the orthologous or endogenous protein may be operably linked to a promoter control sequence. Alternatively, a sequence encoding an orthologous or endogenous protein may be integrated into a chromosomal sequence without affecting expression of the chromosomal sequence. For example, the sequence encoding the protein may be integrated into a "safe harbor" locus, such as the Rosa26 locus, the HPRT locus, or the AAV locus. The disclosure also includes genetically modified animals in which two, three, four, five, six, seven, eight, nine, or ten or more sequences (including protein-encoding sequences) are integrated into the genome.

The chromosomally integrated sequence encoding a protein may encode the wild-type form of the protein of interest or may encode a protein comprising at least one modification such that an altered form of the protein is produced. For example, a chromosomally integrated sequence encoding a protein associated with a disease or disorder may comprise at least one modification such that the altered form of the protein produced causes or potentiates the associated disorder. Alternatively, the chromosomally integrated sequence encoding a protein associated with a disease or disorder may comprise at least one modification such that the altered form of the protein protects against development of the associated disorder.

In further embodiments, the genetically modified animal may be a "humanized" animal comprising at least one chromosomally integrated sequence encoding a functional human protein. The functional human protein may have no corresponding ortholog in the genetically modified animal. Alternatively, the wild-type animal from which the genetically modified animal is derived may comprise an ortholog corresponding to the functional human protein. In this case, the orthologous sequence in the "humanized" animal is inactivated such that no functional protein is produced, and the "humanized" animal comprises at least one chromosomally integrated sequence encoding the human protein.

In another embodiment, the genetically modified animal may comprise at least one modified chromosomal sequence encoding a protein such that the expression pattern of the protein is altered. For example, regulatory regions controlling expression of the protein, such as a promoter or a transcription factor binding site, may be altered such that the protein is overproduced, or the tissue-specific or temporal expression of the protein is altered, or a combination thereof. Alternatively, the expression pattern of the protein may be altered using a conditional knockout system. A non-limiting example of a conditional knockout system is the Cre-lox recombination system. The Cre-lox recombination system comprises a Cre recombinase, a site-specific DNA recombinase that catalyzes the recombination of a nucleic acid sequence between specific sites (lox sites) in a nucleic acid molecule. Methods of using this system to produce temporal and tissue-specific expression are known in the art. In general, a genetically modified animal is generated with lox sites flanking a chromosomal sequence. The genetically modified animal comprising the lox-flanked chromosomal sequence can then be crossed with another genetically modified animal expressing Cre recombinase. Progeny comprising both the lox-flanked chromosomal sequence and the Cre recombinase are then produced, and the lox-flanked chromosomal sequence is recombined, leading to deletion or inversion of the chromosomal sequence encoding the protein. Expression of Cre recombinase can be temporally and conditionally regulated to effect temporally and conditionally regulated recombination of the chromosomal sequence.

In any of these embodiments, the modified chromosomal sequences of the genetically modified animals disclosed herein can be heterozygous. Alternatively, the modified chromosomal sequence of the genetically modified animal may be homozygous.

The genetically modified animals disclosed herein can be crossed to produce animals comprising more than one modified chromosomal sequence or to produce animals in which one or more modified chromosomal sequences are homozygous. For example, two animals comprising the same modified chromosomal sequence may be crossed to produce an animal in which the modified chromosomal sequence is homozygous. Alternatively, animals with different modified chromosomal sequences may be crossed to produce animals comprising two modified chromosomal sequences.
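
The expected outcome of such a cross can be sketched with a short Python example. This is an illustrative sketch only: it assumes simple Mendelian segregation at a single autosomal locus, and the function name and allele symbols ("M" for a modified allele, "w" for a wild-type allele) are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Expected offspring genotype fractions for one locus.

    Each parent is a two-character genotype such as "Mw"; each offspring
    inherits one allele from each parent with equal probability
    (assumed Mendelian segregation).
    """
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two heterozygous animals carrying the same modified chromosomal sequence.
for genotype, fraction in cross("Mw", "Mw").items():
    print(genotype, fraction)
```

Under these assumptions, one quarter of the offspring of two heterozygous parents are expected to be homozygous for the modified chromosomal sequence.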

For example, a first animal comprising an inactivated chromosomal sequence of gene "X" may be crossed with a second animal comprising a chromosomally integrated sequence encoding a human gene "X" protein to produce "humanized" gene "X" offspring comprising both the inactivated gene "X" chromosomal sequence and the chromosomally integrated human gene "X" sequence. In addition, a humanized gene "X" animal can be crossed with a humanized gene "Y" animal to produce humanized gene X/gene Y offspring. Those skilled in the art will appreciate that many combinations are possible.

In other embodiments, animals comprising the modified chromosomal sequence may be crossed to combine the modified chromosomal sequence with other genetic backgrounds. As non-limiting examples, other genetic backgrounds may include wild-type genetic backgrounds, genetic backgrounds with deletion mutations, genetic backgrounds with another targeted integration, genetic backgrounds with non-targeted integration.

As used herein, the term "animal" refers to a non-human animal. The animal may be an embryo, a larva, or an adult. Suitable animals include vertebrates such as mammals, birds, reptiles, amphibians, crustaceans and fish. Examples of suitable mammals include, but are not limited to, rodents, companion animals, livestock and primates. Non-limiting examples of rodents include mice, rats, hamsters, gerbils and guinea pigs. Suitable companion animals include, but are not limited to, cats, dogs, rabbits, hedgehogs and ferrets. Non-limiting examples of livestock include horses, goats, sheep, pigs, cattle, llamas, and alpacas. Suitable primates include, but are not limited to, pigtail, chimpanzee, lemur, macaque, marmoset, spider monkey, squirrel monkey, and green monkey. Non-limiting examples of birds include chickens, turkeys, ducks, and geese. Alternatively, the animal may be an invertebrate such as an insect, a nematode, or the like. Non-limiting examples of insects include fruit flies and mosquitoes. An exemplary animal is a rat. Non-limiting examples of suitable rat strains include Dahl Salt-Sensitive, Fischer 344, Lewis, Long Evans Hooded, Sprague-Dawley, and Wistar. In one embodiment, the animal is not a genetically modified mouse. In each of the above iterations of suitable animals of the invention, the animal does not include exogenously introduced, randomly integrated transposon sequences.

Another aspect of the present disclosure provides a genetically modified cell or cell line comprising at least one modified chromosomal sequence. The genetically modified cell or cell line can be derived from any of the genetically modified animals disclosed herein. Alternatively, the chromosomal sequence in the cell may be modified using the methods described herein, as described above (in the paragraph describing chromosomal sequence modification in animals). The disclosure also includes lysates of the cells or cell lines.

Typically, the cell is a eukaryotic cell. Suitable host cells include fungi or yeasts such as Pichia, Saccharomyces, or Schizosaccharomyces; insect cells such as SF9 cells from Spodoptera frugiperda or S2 cells from Drosophila melanogaster; and animal cells, such as mouse, rat, hamster, non-human primate, or human cells. Exemplary cells are mammalian. The mammalian cells may be primary cells. In general, any primary cell that is susceptible to double-strand breaks can be used. The cells may be of various cell types, such as fibroblasts, myoblasts, T or B cells, macrophages, epithelial cells, and the like.

When a mammalian cell line is used, the cell line may be any established cell line or a primary cell line that has not yet been described. The cell line may be adherent or non-adherent, or the cell line may be grown under conditions that promote adherent, non-adherent, or organotypic growth using standard techniques known to those skilled in the art. Non-limiting examples of suitable mammalian cells and cell lines are provided in section (IV) (g) herein. In other embodiments, the cells may be stem cells. Non-limiting examples of suitable stem cells are provided in section (IV) (g).

The present disclosure also provides genetically modified non-human embryos comprising at least one modified chromosomal sequence. The chromosomal sequence in the embryo may be modified using the methods described herein, as described above (in the paragraph describing chromosomal sequence modification in animals). In one embodiment, the embryo is a non-human fertilized single cell stage embryo of an animal species of interest. Exemplary mammalian embryos (including single cell embryos) include, but are not limited to, mouse, rat, hamster, rodent, rabbit, cat, canine, ovine, porcine, bovine, equine, and primate embryos.

Examples

The cis-acting regulatory element greatly enhances the efficiency of PE2 and PE3

In a proof-of-concept experiment, a wild-type double element for nuclear expression (dENE) from rice TWIFB1 was added to the 3' end of the prime editor expression cassette, immediately after the stop codon and before the mRNA terminator (e.g., BGH), as shown in FIG. 2; the dENE is thereby transcribed into an RNA sequence contained in the prime editor mRNA. As shown in FIGS. 3 and 4, K562 cells were nucleofected with a prime editor (PE2) expression construct either containing or lacking the dENE sequence, together with a pegRNA expression construct targeting the HEK3 site for different types of edits. In the case of PE3, an additional nicking guide RNA expression construct was added to the nucleofection mixture. Cells were harvested three days after nucleofection for next-generation sequencing (NGS) analysis of prime editing efficiency. As shown, including the 3'-UTR dENE in the prime editor (PE2) expression cassette greatly enhanced the editing efficiency of the +1 CTT insertion and +5 G deletion edits at the HEK3 target, to 8-fold (FIG. 3), and, in the case of PE3, enhanced the editing efficiency of the +1 T-to-A conversion edit and of the +1 T deletion and +5 G-to-C conversion edits at the HEK3 target to 2-fold (FIG. 4).
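
As an illustration of how fold values of this kind relate to sequencing data, the following minimal Python sketch computes an editing efficiency and a fold enhancement from NGS read counts. The function names and the read counts are hypothetical placeholders chosen for illustration; they are not data from the experiments described herein, and the actual analysis pipeline is not specified in this disclosure.

```python
from fractions import Fraction


def editing_efficiency(edited_reads: int, total_reads: int) -> Fraction:
    """Fraction of aligned reads that carry the intended edit."""
    if total_reads <= 0 or not 0 <= edited_reads <= total_reads:
        raise ValueError("need total > 0 and 0 <= edited <= total")
    return Fraction(edited_reads, total_reads)


def fold_enhancement(with_element: Fraction, without_element: Fraction) -> Fraction:
    """Ratio of efficiency with the element to efficiency without it."""
    if without_element == 0:
        raise ValueError("baseline efficiency is zero; fold change undefined")
    return with_element / without_element


def percent_increase(fold: Fraction) -> Fraction:
    """Convert a fold value to a percent increase (1.5-fold equals 50%)."""
    return (fold - 1) * 100


# Hypothetical placeholder read counts, for illustration only.
baseline = editing_efficiency(edited_reads=50, total_reads=10_000)
with_dene = editing_efficiency(edited_reads=400, total_reads=10_000)
fold = fold_enhancement(with_dene, baseline)
print(float(fold), float(percent_increase(fold)))
```

Exact fractions are used here so that the ratio is not affected by floating-point rounding; a real analysis would also need replicates and appropriate controls.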

The cis-acting regulatory elements enhance prime editing efficiency in a cell type-dependent manner

In two independent experiments, K562 or HEK293 cells were nucleofected with a prime editor (PE2) expression construct either containing or lacking the dENE sequence, together with a pegRNA expression construct targeting the HEK3 site (GGCCCAGACTGAGCACGTGATGG [SEQ ID NO:26]; the 3'-terminal TGG is the PAM sequence) for different types of edits, as shown in FIGS. 5 and 6. In the case of PE3, an additional nicking guide RNA expression construct was added to the nucleofection mixture. Cells were harvested three days after nucleofection for next-generation sequencing (NGS) analysis of prime editing efficiency. As shown in FIG. 5, in K562 cells, including the 3'-UTR dENE in the prime editor (PE2) expression cassette enhanced the editing efficiency of the +1 CTT insertion and +5 G deletion edits at the HEK3 target by about 50%, and, in the case of PE3, similarly enhanced the editing efficiency of the +1 T-to-A conversion, +1 CTT insertion and +5 G deletion, and +1 T deletion and +5 G-to-C conversion edits at the HEK3 target by about 50%. In contrast, in HEK293 cells, including the 3'-UTR dENE in the prime editor (PE2) expression cassette was not shown to enhance the editing efficiency of the same types of edits at the same HEK3 target, either with PE2 or in the case of PE3, as shown in FIG. 6. This indicates that the 3'-UTR element dENE acts in a cell type-dependent manner, enhancing the prime editing efficiency at the tested target in certain cell types.
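
The structure of the HEK3 target site quoted above can be illustrated with a minimal Python sketch that splits the 23-nucleotide sequence into a protospacer and a PAM. The sketch assumes a three-nucleotide PAM of the form NGG at the 3' end, as is typical for Streptococcus pyogenes Cas9; that assumption and the function names are illustrative and are not stated in this form in the disclosure.

```python
import re

# Target site sequence as given in the text (SEQ ID NO:26).
HEK3_TARGET = "GGCCCAGACTGAGCACGTGATGG"


def split_target(site: str, pam_length: int = 3) -> tuple:
    """Split a target site into (protospacer, PAM).

    Assumes the PAM occupies the last pam_length bases of the site.
    """
    site = site.upper()
    if not re.fullmatch(r"[ACGT]+", site):
        raise ValueError("site must contain only A, C, G, or T")
    if len(site) <= pam_length:
        raise ValueError("site is too short to contain a protospacer and a PAM")
    return site[:-pam_length], site[-pam_length:]


def is_ngg_pam(pam: str) -> bool:
    """True if the PAM matches the NGG pattern (assumed SpCas9-type PAM)."""
    return len(pam) == 3 and pam[1:] == "GG"


protospacer, pam = split_target(HEK3_TARGET)
print(protospacer, len(protospacer), pam, is_ngg_pam(pam))
```

With this split, the first 20 nucleotides form the protospacer and the final three nucleotides form the PAM.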

Claims

1. A synthetic nucleic acid composition comprising: i) A sequence encoding a CRISPR-Cas protein, ii) a sequence encoding a reverse transcriptase, and iii) a sequence encoding a cis-acting regulatory element.

2. The synthetic nucleic acid composition of claim 1, wherein the CRISPR-Cas protein is nCas-H840A.

3. The synthetic nucleic acid composition of any one of claims 1 or 2, wherein the reverse transcriptase is M-MLV-RT.

4. The synthetic nucleic acid composition of any one of claims 1-3, wherein the cis-acting regulatory element is dENE or ENE.

5. The synthetic nucleic acid composition of any one of claims 1-3, wherein the cis-acting regulatory element is sRSM1.

6. The synthetic nucleic acid composition of any one of claims 1-5, wherein the nucleic acid is DNA.

7. The synthetic nucleic acid composition of any one of claims 1-5, wherein the nucleic acid is RNA.

8. The synthetic nucleic acid composition of any one of claims 1-7, wherein the composition further comprises an expression promoter.

9. The synthetic nucleic acid composition of claim 8, wherein the composition is in an expression vector.

10. The synthetic nucleic acid composition of claim 8, wherein the composition is incorporated into a transfected virus.

11. The synthetic nucleic acid composition of any one of claims 1-10, wherein the cis-acting regulatory element is located after the stop codon of the CRISPR-Cas9 sequence and before an mRNA terminator.

12. The synthetic nucleic acid composition of any one of claims 1-11, further comprising a prime editing guide RNA (pegRNA), wherein the pegRNA is derived from one of PE1, PE2, and PE3.

13. An amino acid sequence encoded by the synthetic nucleic acid composition of any one of claims 1-11.

14. A method of modifying an endogenous DNA sequence, the method comprising:

a) Providing: i) An operable expression vector comprising a synthetic nucleic acid composition comprising: 1) a sequence encoding a CRISPR-Cas type II system protein, 2) a sequence encoding a reverse transcriptase, and 3) a sequence comprising a cis-acting regulatory element; ii) a prime editing guide RNA (pegRNA) comprising a Primer Binding Site (PBS); and iii) a cell comprising a target endogenous DNA sequence that is at least 50% complementary to the PBS;

b) Transfecting said cell comprising the endogenous DNA sequence of interest with the synthetic nucleic acid composition of the invention and pegRNA; and

c) Culturing the transfected cells such that the endogenous DNA sequence is subjected to the desired modification.

15. The method of claim 14, wherein the CRISPR-Cas type II system protein is a Cas9 protein.

16. The method according to claim 14, wherein the endogenous DNA sequence is at least 75% complementary to the PBS.

17. The method according to claim 14, wherein the endogenous DNA sequence is at least 90% complementary to the PBS.

18. The method according to claim 14, wherein the endogenous DNA sequence is at least 95% complementary to the PBS.

19. The method according to claim 14, wherein the endogenous DNA sequence is at least 98% complementary to the PBS.

20. The method according to claim 14, wherein the endogenous DNA sequence is 100% complementary to the PBS.

21. The method of any one of claims 14-20, wherein the CRISPR-Cas protein is nCas-H840A.

22. The method of any one of claims 14-21, wherein the reverse transcriptase is M-MLV-RT.

23. The method of any one of claims 14-22, wherein the cis-acting regulatory element is dENE or ENE.

24. The method of any one of claims 14-22, wherein the cis-acting regulatory element is sRSM1.

25. The method of any one of claims 14-24, wherein the operable expression vector is DNA.

26. The method of any one of claims 14-24, wherein the operable expression vector is RNA.

27. The synthetic nucleic acid composition of any one of claims 14-26, wherein the composition is incorporated into a transfected virus.

28. The synthetic nucleic acid composition of any one of claims 14-27, wherein the cis-acting regulatory element is located after the stop codon of the CRISPR-Cas9 sequence and before an mRNA terminator.

29. The method of any one of claims 14-28, wherein the pegRNA is derived from one of PE1, PE2, and PE3.

30. The method of any one of claims 14-29, wherein the CRISPR-Cas type II system protein encoded in the operable expression vector is introduced into the cell.