WO2022253277A1 - Type I-C CRISPR-Cas3 system and its application - Google Patents

Type I-C CRISPR-Cas3 system and its application

Info

Publication number
WO2022253277A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
protein
target
direct repeat
region
Prior art date
Application number
PCT/CN2022/096648
Other languages
English (en)
French (fr)
Inventor
赖锦盛
李英男
宋伟彬
赵海铭
Original Assignee
中国农业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国农业大学
Priority to CN202280039558.3A (patent/CN117529552A/zh)
Publication of WO2022253277A1 (patent/WO2022253277A1/zh)

Classifications

    • A - HUMAN NECESSITIES
    • A01 - AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H - NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H1/00 - Processes for modifying genotypes; Plants characterised by associated natural traits
    • A - HUMAN NECESSITIES
    • A01 - AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K - ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K67/00 - Rearing or breeding animals, not otherwise provided for; New or modified breeds of animals
    • A01K67/027 - New or modified breeds of vertebrates
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 - Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/10 - Cells modified by introduction of foreign genetic material

Definitions

  • the invention relates to the technical field of clustered regularly interspaced short palindromic repeats (CRISPR).
  • the present invention provides methods and compositions for nucleic acid editing (for example, gene or genome editing, large fragment deletion, single base editing, and genomic structural variation), including the use of a Type I-C CRISPR-Cas3 system.
  • the Type I-C CRISPR-Cas3 system of the present invention can achieve precise deletion of large genomic fragments, for example, knockout of a single gene coding frame of any length or knockout of gene regulatory elements such as long non-coding RNAs (lncRNAs) or enhancers, as well as single base editing of a gene or genome and large structural variation of the genome.
  • CRISPR/Cas technology is a widely used gene editing technology. It uses a guide RNA to bind specifically to a target sequence in the genome and cuts the DNA to generate a double-strand break, thereby achieving targeted editing of the genome.
  • the class 2 system is mainly composed of a single effector protein.
  • the widely used CRISPR/Cas9 system belongs to the type II family of the class 2 system.
  • the class 1 system is mainly composed of multiple effector proteins and is currently divided into three families: type I, type III, and type IV. The most thoroughly studied is the type I-E system in the type I family.
  • like the class 2 system, the class 1 system relies on a guide RNA for target recognition.
  • the Type I-E system is mainly composed of two parts: the Cas3 protein, which has nuclease activity, and the Cas5, Cas6, Cas7, Cas8e, and Cas11 proteins, which form the Cascade complex.
  • the guide RNA binds to the Cascade complex and recognizes the substrate DNA, and the Cas3 protein is then recruited to cleave the substrate DNA.
  • reported editing of human 293T cells with the type I-E system found that it mainly induces long-range deletions of genomic fragments. However, the length of the deleted fragment is random, which limits its use in production and other applications. There are few reports of eukaryotic genome editing with other families of class 1.
  • after extensive experimentation and repeated exploration, the inventors of the present application have unexpectedly developed a new Type I-C CRISPR-Cas3 system or vector system and a method of applying the system, which can be used to achieve precise large fragment deletion and/or other editing of a target gene, genome or other target nucleic acid (such as modifying genes, knocking out genes, changing the expression of gene products, repairing mutations, and/or inserting polynucleotides, single-base mutations, etc.).
  • the system is particularly advantageous for use in eukaryotic cells.
  • the present application provides a Type I-C CRISPR-Cas3 system, which comprises:
  • (1) cas5c protein or a nucleotide sequence encoding cas5c protein;
  • (2) cas8c protein or a nucleotide sequence encoding cas8c protein;
  • (3) cas7 protein or a nucleotide sequence encoding cas7 protein;
  • (4) cas11c protein or a nucleotide sequence encoding cas11c protein.
  • the system further includes: (5) cas3 protein or a nucleotide sequence encoding cas3 protein.
  • the protein described in any one of (1)-(5) optionally comprises an additional protein or polypeptide selected from epitope tags, reporter gene sequences, nuclear localization signal (NLS) sequences, targeting moieties, transcriptional activation domains (e.g., VP64), transcriptional repression domains (e.g., KRAB domains or SID domains), nuclease domains (e.g., Fok1), adenosine deaminase (e.g., TadA8e), cytosine deaminase (e.g., APOBEC3), and a domain having an activity selected from the group consisting of: methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cle
  • at least 1 (e.g., at least 2, at least 3, at least 4 or all 5) of the proteins described in (1)-(5) comprises the additional protein or polypeptide; e.g., each of the proteins described in (1)-(5) comprises the additional protein or polypeptide.
  • the additional protein or polypeptide is an NLS sequence; for example, the protein described in each of (1)-(5) comprises an NLS sequence.
  • the NLS sequence is as shown in SEQ ID NO:15.
  • the additional protein or polypeptide is linked to the protein with or without a linker.
  • the linker is a peptide linker or a non-peptide linker.
  • the peptide linker sequence is shown in SEQ ID NO: 16, 17 or 66.
  • the NLS sequence is at or near the terminus (e.g., N-terminus or C-terminus) of the protein.
  • the additional protein or polypeptide is an adenosine deaminase (eg, TadA8e) or a cytosine deaminase (eg, APOBEC3).
  • one of the proteins described in any one of (1)-(4) comprises adenosine deaminase (e.g., TadA8e) or cytosine deaminase (e.g., APOBEC3).
  • the adenosine deaminase or cytosine deaminase amino acid sequence is located at or near the terminus (e.g., N-terminus or C-terminus) of the protein (e.g., cas8c protein).
  • the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near the N-terminus of the cas8c protein.
  • the cas3 protein comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 1; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 1; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 1;
  • the cas5c protein comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 2; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 2; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 2;
  • the cas8c protein comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 3; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 3; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 3;
  • the cas7 protein comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 4; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 4; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 4;
  • the cas11c protein comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 5; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 5; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 5.
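The identity thresholds in option (iii) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by some external tool (the source does not say which alignment method or identity definition is intended), and the peptides below are hypothetical toy strings, not any SEQ ID NO from the application.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length and use '-' for gaps. Columns where both sequences
    have a gap are ignored; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy peptides (placeholders, not sequences from the source).
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"  # one substitution in ten residues

identity = percent_identity(reference, variant)
```

Other definitions of identity exist (for example, dividing by the length of the shorter sequence), so the denominator used here is a design choice rather than something the text prescribes.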
  • the cas3 protein comprises an NLS sequence and comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 18; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 18; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 18;
  • the cas5c protein comprises an NLS sequence and comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 19; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 19; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 19;
  • the cas8c protein comprises an NLS sequence and comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 21; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 21; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 21;
  • the cas7 protein comprises an NLS sequence and comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 20; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 20; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 20;
  • the cas11c protein comprises an NLS sequence and comprises, or consists of, a sequence selected from the following: (i) the sequence shown in SEQ ID NO: 22; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 22; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 22.
  • the system does not include a cas3 protein or a nucleotide sequence encoding a cas3 protein.
  • one of the cas proteins (e.g., cas5c, cas8c, cas7, or cas11c protein) in the system comprises adenosine deaminase (e.g., TadA8e) or cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near the terminus of the cas protein (e.g., the N-terminus or C-terminus).
  • the cas8c protein in the system comprises adenosine deaminase (e.g., TadA8e) or cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near the N-terminus of the cas8c protein.
  • the adenosine deaminase or cytosine deaminase is connected to the protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the peptide linker sequence is as shown in SEQ ID NO: 16, 17 or 66.
  • the cas8c protein in the system comprises TadA8e, and the cas8c protein comprises the sequence shown in SEQ ID NO:67.
  • the system further comprises a guide RNA of the Type I-C CRISPR-Cas3 system or a nucleotide sequence encoding the guide RNA; wherein the guide RNA comprises a direct repeat sequence and a guide sequence capable of hybridizing to the target sequence.
  • the direct repeat sequence comprises a stem-loop structure.
  • the direct repeat sequence is capable of binding to one or more cas proteins in the system; for example, the direct repeat sequence is capable of binding to one or more proteins selected from cas5c protein, cas8c protein, cas7 protein and cas11c protein; for example, the guide RNA can bind to the Cascade complex formed by cas5c protein, cas8c protein, cas7 protein, and cas11c protein.
  • when the target sequence is DNA, the target sequence is located at the 3' end of the protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTC-3'.
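As a rough illustration of the PAM rule above, the following Python sketch lists every candidate target that sits immediately 3' of a 5'-TTC-3' motif on either strand of a DNA sequence. The target length is a hypothetical parameter (the section does not state a guide length), and the demo sequence is invented.

```python
PAM = "TTC"  # 5'-TTC-3', as stated in the text


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(strand: str, target_len: int) -> list[tuple[int, str]]:
    """Return (pam_index, target) pairs for one strand read 5'->3'.

    A hit is reported only when a full-length target fits directly
    3' of the PAM. Indices are 0-based within the given strand.
    """
    hits = []
    start = strand.find(PAM)
    while start != -1:
        target_start = start + len(PAM)
        target = strand[target_start:target_start + target_len]
        if len(target) == target_len:
            hits.append((start, target))
        start = strand.find(PAM, start + 1)  # allow overlapping PAMs
    return hits


def find_targets_both_strands(top_strand: str, target_len: int) -> dict:
    """Scan the given strand and its reverse complement."""
    top = top_strand.upper()
    return {
        "top": find_targets(top, target_len),
        "bottom": find_targets(reverse_complement(top), target_len),
    }


# Hypothetical demo sequence and target length (not from the source).
demo = "GGTTCACGTACGTAAGGC"
hits = find_targets_both_strands(demo, target_len=10)
```

Bottom-strand indices are reported in the coordinates of the reverse complement; converting them to top-strand coordinates is left out to keep the sketch short.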
  • the direct repeat sequence comprises a first region and a second region, and the first region comprises a stem-loop structure.
  • the first region is located 5' to the second region.
  • the guide RNA comprises two copies of the direct repeat sequence, i.e., a first copy of the direct repeat sequence and a second copy of the direct repeat sequence, and a targeting sequence located between the first copy of the direct repeat sequence and the second copy of the direct repeat sequence.
  • the guide RNA comprises the second region of the first copy of the direct repeat sequence, the targeting sequence, and the first region of the second copy of the direct repeat sequence.
  • the targeting sequence is located between the second region of the first copy of the direct repeat and the first region of the second copy of the direct repeat.
  • the second region of the first copy of the direct repeat sequence is located at the 5' end of the targeting sequence, and the first region of the second copy of the direct repeat sequence is located at the 3' end of the targeting sequence.
  • there may or may not be additional nucleotides between the second region of the first copy of the direct repeat sequence and the targeting sequence.
  • the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 11.
  • the first region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 13.
  • the second region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 14.
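The two guide RNA layouts described above (full repeat copies on both sides, or only the second region of the first copy and the first region of the second copy) can be sketched in Python as simple string assembly. The sequences below are made-up placeholders, not SEQ ID NO: 11, 13 or 14, and the function name is hypothetical.

```python
def assemble_guide(dr_first_region: str, dr_second_region: str,
                   targeting: str, full_repeats: bool = False) -> str:
    """Assemble a single guide RNA in the 5'->3' direction.

    The direct repeat is taken as first region + second region, since the
    text places the first region 5' of the second region.
    full_repeats=True  -> repeat + targeting + repeat
    full_repeats=False -> second region + targeting + first region
    """
    direct_repeat = dr_first_region + dr_second_region
    if full_repeats:
        return direct_repeat + targeting + direct_repeat
    return dr_second_region + targeting + dr_first_region


# Placeholder RNA strings for illustration only (assumptions).
DR_FIRST_REGION = "GCGCUUCGGCGC"   # stands in for the stem-loop region
DR_SECOND_REGION = "AUUGAAA"       # stands in for the second region
TARGETING = "ACGUACGUAC"           # stands in for the targeting sequence

trimmed_guide = assemble_guide(DR_FIRST_REGION, DR_SECOND_REGION, TARGETING)
full_guide = assemble_guide(DR_FIRST_REGION, DR_SECOND_REGION, TARGETING,
                            full_repeats=True)
```

The sketch joins the parts directly; the text also allows additional nucleotides between the second region and the targeting sequence, which is not modeled here.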
  • the system further comprises one or more guide RNAs of the Type I-C CRISPR-Cas3 system or a nucleotide sequence encoding the one or more guide RNAs; wherein the one or more guide RNAs comprise a direct repeat sequence, a first guide sequence capable of hybridizing to a first target sequence, and a second guide sequence capable of hybridizing to a second target sequence; wherein the first target sequence and the second target sequence respectively flank the region to be modified (e.g., the region to be deleted) in the double-stranded target nucleic acid molecule.
  • the first target sequence and the second target sequence are respectively located on two single strands of the region to be modified.
  • the first target sequence and the second target sequence are respectively located at the 3' end of the region to be modified in each single strand.
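One possible reading of the flanking arrangement above can be checked with a short Python sketch: the two targets lie on opposite strands, and each sits 3' of the region to be modified on its own strand. In top-strand coordinates this puts the top-strand target downstream of the region and the bottom-strand target upstream of it. The `Target` type, coordinate convention and example numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    strand: str  # "+" for the top strand, "-" for the bottom strand
    start: int   # 0-based, inclusive, in top-strand coordinates
    end: int     # exclusive, in top-strand coordinates


def flanks_region(first: Target, second: Target,
                  region_start: int, region_end: int) -> bool:
    """Check the assumed geometry for a target pair around [region_start, region_end).

    Requires one target per strand. The "+" target must lie at or beyond
    region_end (3' of the region on the top strand); the "-" target must
    end at or before region_start (3' of the region on the bottom strand).
    """
    by_strand = {first.strand: first, second.strand: second}
    if set(by_strand) != {"+", "-"}:
        return False
    plus, minus = by_strand["+"], by_strand["-"]
    return plus.start >= region_end and minus.end <= region_start


# Hypothetical coordinates: delete positions 50..99 of a longer sequence.
ok_pair = flanks_region(Target("+", 120, 155), Target("-", 10, 45), 50, 100)
```

If the intended geometry differs from this reading, only the final comparison in `flanks_region` would need to change.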
  • the direct repeat sequence comprises a stem-loop structure.
  • the direct repeat sequence is capable of binding to one or more cas proteins in the system; for example, the direct repeat sequence is capable of binding to one or more proteins selected from cas5c protein, cas8c protein, cas7 protein and cas11c protein.
  • the guide RNA can bind to the Cascade complex formed by cas5c protein, cas8c protein, cas7 protein, and cas11c protein.
  • when the target sequence is DNA, the target sequence is located at the 3' end of the protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTC-3'.
  • the direct repeat sequence comprises a first region and a second region, and the first region comprises a stem-loop structure.
  • the first region is located 5' to the second region.
  • the one guide RNA comprises, in the 5' to 3' direction: the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the third copy of the direct repeat sequence.
  • alternatively, the one guide RNA comprises, in the 5' to 3' direction: the second region of the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the first region of the third copy of the direct repeat sequence.
  • the direct repeat sequence is shown in SEQ ID NO: 11.
  • the first region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 13.
  • the second region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 14.
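The single-transcript layout with two guide sequences described above can be sketched in Python as follows. Both variants are shown: three full repeat copies, or a trimmed form that starts with the second region of the first copy and ends with the first region of the third copy. Function and argument names are hypothetical, and the short strings in the example are placeholders rather than real repeat or guide sequences.

```python
def assemble_dual_guide(dr_first_region: str, dr_second_region: str,
                        first_guide: str, second_guide: str,
                        trimmed: bool = False) -> str:
    """Assemble one guide RNA carrying two guide sequences, 5'->3'.

    trimmed=False: repeat, guide 1, repeat, guide 2, repeat
    trimmed=True:  second region, guide 1, repeat, guide 2, first region
    The direct repeat is first region + second region (first region is 5').
    """
    direct_repeat = dr_first_region + dr_second_region
    if trimmed:
        parts = [dr_second_region, first_guide, direct_repeat,
                 second_guide, dr_first_region]
    else:
        parts = [direct_repeat, first_guide, direct_repeat,
                 second_guide, direct_repeat]
    return "".join(parts)


# Placeholder inputs for illustration only (assumptions).
full_array = assemble_dual_guide("AA", "CC", "G", "U")
trimmed_array = assemble_dual_guide("AA", "CC", "G", "U", trimmed=True)
```

In both variants the middle repeat copy stays intact, so each guide sequence is still bounded by a second region on its 5' side and a first region on its 3' side.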
  • the plurality of guide RNAs comprise:
  • a first guide RNA comprising a direct repeat sequence and a first guide sequence capable of hybridizing to a first target sequence
  • a second guide RNA comprising a direct repeat sequence and a second guide sequence capable of hybridizing to a second target sequence.
  • the first guide RNA comprises two copies of the direct repeat sequence, i.e., a first copy of the direct repeat sequence and a second copy of the direct repeat sequence, and the first guide sequence located between the two copies of the direct repeat sequence; or, the first guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the first guide sequence, and the first region of the second copy of the direct repeat sequence;
  • the second guide RNA comprises two copies of the direct repeat sequence, i.e., a first copy of the direct repeat sequence and a second copy of the direct repeat sequence, and the second guide sequence located between the two copies of the direct repeat sequence; or, the second guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the second guide sequence, and the first region of the second copy of the direct repeat sequence.
  • the first region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 13.
  • the second region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 14.
  • the application provides a Type I-C CRISPR-Cas3 vector system comprising one or more vectors, the one or more vectors comprising: a nucleotide sequence encoding the cas proteins of the Type I-C CRISPR-Cas3 system, the cas proteins comprising cas5c protein, cas8c protein, cas7 protein and cas11c protein.
  • said cas5c protein, cas8c protein, cas7 protein and cas11c protein are as defined above.
  • the one or more vectors further comprise a nucleotide sequence encoding a cas3 protein.
  • said cas3 protein is as defined above.
  • the one or more vectors comprise: a first expression cassette comprising a nucleotide sequence encoding a cas3 protein; and a second expression cassette comprising nucleotide sequences encoding a cas5c protein, a cas8c protein, a cas7 protein and a cas11c protein.
  • said first expression cassette comprises a promoter, such as an inducible promoter.
  • the second expression cassette comprises a promoter, such as an inducible promoter.
  • the nucleotide sequences encoding cas5c protein, cas8c protein, cas7 protein and cas11c protein are arranged in any order.
  • the nucleotide sequences encoding cas5c protein, cas8c protein, cas7 protein and cas11c protein are connected to each other by a nucleotide sequence encoding a self-cleaving peptide (such as T2A).
  • the one or more vectors do not comprise a nucleotide sequence encoding a cas3 protein.
  • one of the cas proteins (e.g., cas5c, cas8c, cas7, or cas11c protein) in the system comprises adenosine deaminase (e.g., TadA8e) or cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near the terminus of the cas protein (e.g., the N-terminus or C-terminus).
  • the cas8c protein in the system comprises adenosine deaminase (e.g., TadA8e) or cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near the N-terminus of the cas8c protein.
  • the adenosine deaminase or cytosine deaminase is connected to the protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the peptide linker sequence is as shown in SEQ ID NO: 16, 17 or 66.
  • the cas8c protein in the system comprises TadA8e, and the cas8c protein comprises the sequence shown in SEQ ID NO:67.
  • the one or more vectors comprise: a first expression cassette comprising a nucleotide sequence encoding a cas8c protein; and a second expression cassette comprising nucleotide sequences encoding a cas5c protein, a cas7 protein and a cas11c protein.
  • the cas8c protein comprises adenosine deaminase (eg, TadA8e) or cytosine deaminase (eg, APOBEC3).
  • said first expression cassette comprises a promoter, such as an inducible promoter.
  • the second expression cassette comprises a promoter, such as an inducible promoter.
  • the nucleotide sequences encoding cas5c protein, cas7 protein and cas11c protein are arranged in any order.
  • the nucleotide sequences encoding cas5c protein, cas7 protein and cas11c protein are connected to each other by a nucleotide sequence encoding a self-cleaving peptide (such as T2A).
  • the one or more vectors further include: a nucleotide sequence encoding a guide RNA of the Type I-C CRISPR-Cas3 system, the guide RNA being as defined in Part I.
  • the nucleotide sequence encoding the guide RNA of the Type I-C CRISPR-Cas3 system is located in an additional expression cassette; for example, the additional expression cassette comprises a promoter, such as an inducible promoter.
  • the one or more vectors also include: a nucleotide sequence encoding one or more guide RNAs of the Type I-C CRISPR-Cas3 system, the one or more guide RNAs being as defined in Section II.
  • the nucleotide sequence encoding one or more guide RNAs of the Type I-C CRISPR-Cas3 system is located in an additional expression cassette; for example, the additional expression cassette comprises a promoter, such as an inducible promoter.
  • the nucleotide sequences encoding the cas protein are all located on the same vector.
  • the nucleotide sequence encoding the cas protein and the nucleotide sequence encoding the guide RNA are both located on the same vector.
  • the application provides a Type I-C CRISPR-Cas3 system comprising: one or more guide RNAs or a nucleotide sequence encoding the one or more guide RNAs; wherein the one or more guide RNAs comprise a direct repeat sequence, a first guide sequence capable of hybridizing to a first target sequence, and a second guide sequence capable of hybridizing to a second target sequence; wherein the first target sequence and the second target sequence respectively flank the region to be modified (e.g., the region to be deleted) in the double-stranded target nucleic acid molecule.
  • the first target sequence and the second target sequence are respectively located on the two single strands of the region to be modified; for example, the first target sequence and the second target sequence are respectively located at the 3' end of the region to be modified in each single strand.
  • the direct repeat sequence comprises a stem-loop structure.
  • the direct repeat sequence is capable of binding to one or more cas proteins in the Type I-C CRISPR-Cas3 system.
  • when the target sequence is DNA, the target sequence is located at the 3' end of the protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTC-3'.
  • the direct repeat sequence comprises a first region and a second region, and the first region comprises a stem-loop structure.
  • the first region is located 5' to the second region.
  • the one guide RNA comprises, in the 5' to 3' direction: the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the third copy of the direct repeat sequence.
  • alternatively, the one guide RNA comprises, in the 5' to 3' direction: the second region of the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the first region of the third copy of the direct repeat sequence.
  • the plurality of guide RNAs comprise:
  • a first guide RNA comprising a direct repeat sequence and a first guide sequence capable of hybridizing to a first target sequence
  • a second guide RNA comprising a direct repeat sequence and a second guide sequence capable of hybridizing to a second target sequence.
  • the first guide RNA comprises two copies of the direct repeat sequence, i.e., a first copy of the direct repeat sequence and a second copy of the direct repeat sequence, and the first guide sequence located between the two copies of the direct repeat sequence; or, the first guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the first guide sequence, and the first region of the second copy of the direct repeat sequence;
  • the second guide RNA comprises two copies of the direct repeat sequence, i.e., a first copy of the direct repeat sequence and a second copy of the direct repeat sequence, and the second guide sequence located between the two copies of the direct repeat sequence; or, the second guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the second guide sequence, and the first region of the second copy of the direct repeat sequence.
  • the system further includes: the cas protein in the Type I-C CRISPR-Cas3 system or the nucleotide sequence encoding the cas protein.
  • each of the cas proteins further comprises an additional protein or polypeptide selected from the group consisting of epitope tags, reporter gene sequences, nuclear localization signal (NLS) sequences, targeting moieties, transcriptional activation domains (e.g., VP64), transcriptional repression domains (e.g., KRAB domain or SID domain), nuclease domains (e.g., Fok1), adenosine deaminase (e.g., TadA8e), cytosine deaminase (e.g., APOBEC3), and a domain having an activity selected from the group consisting of methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, double-stranded
  • the additional protein or polypeptide is an NLS sequence.
  • the additional protein or polypeptide is an adenosine deaminase (eg, TadA8e) or a cytosine deaminase (eg, APOBEC3).
  • the cas protein comprises cas3 protein, cas5c protein, cas8c protein and cas7 protein.
  • the cas3 protein, cas5c protein, cas8c protein, cas7 protein are as defined in Part I.
  • the cas protein comprises cas3 protein, cas5c protein, cas8c protein, cas7 protein and cas11c protein.
  • the cas3 protein, cas5c protein, cas8c protein, cas7 protein and cas11c protein are as defined in Part I.
  • the cas protein comprises cas5c protein, cas8c protein, cas7 protein and cas11c protein, and does not comprise cas3 protein.
  • the cas5c protein, cas8c protein, cas7 protein and cas11c protein are as defined in Part I.
  • one of the cas proteins (e.g., cas5c, cas8c, cas7, or cas11c protein) in the system comprises an adenosine deaminase (e.g., TadA8e) or a cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near an end of the cas protein (e.g., the N-terminus or C-terminus).
  • the cas8c protein in the system comprises an adenosine deaminase (e.g., TadA8e) or a cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near the N-terminus of the cas8c protein.
  • the adenosine deaminase or cytosine deaminase is connected to the cas protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the sequence of the peptide linker is as shown in SEQ ID NO: 16, 17 or 66.
  • the cas8c protein in the system comprises TadA8e, and the cas8c protein comprises the sequence shown in SEQ ID NO: 67.
  • the application provides a Type I-C CRISPR-Cas3 vector system comprising one or more vectors, the one or more vectors comprising: a nucleotide sequence encoding one or more guide RNAs of the Type I-C CRISPR-Cas3 system, the one or more guide RNAs being as defined in Section V.
  • the one or more vectors further comprise: a nucleotide sequence encoding the cas protein in the Type I-C CRISPR-Cas3 system.
  • each of the cas proteins further comprises an additional protein or polypeptide selected from the group consisting of epitope tags, reporter gene sequences, nuclear localization signal (NLS) sequences, targeting moieties, transcriptional activation domains (e.g., VP64), transcriptional repression domains (e.g., KRAB domain or SID domain), nuclease domains (e.g., Fok1), adenosine deaminases (e.g., TadA8e), cytosine deaminases (e.g., APOBEC3), and domains having an activity selected from the group consisting of methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, double-stranded
  • the additional protein or polypeptide is an NLS sequence.
  • the additional protein or polypeptide is an adenosine deaminase (e.g., TadA8e) or a cytosine deaminase (e.g., APOBEC3).
  • the cas protein comprises cas3 protein, cas5c protein, cas8c protein and cas7 protein.
  • the cas3 protein, cas5c protein, cas8c protein, cas7 protein are as defined in Part I.
  • the cas protein comprises cas3 protein, cas5c protein, cas8c protein, cas7 protein and cas11c protein.
  • the cas3 protein, cas5c protein, cas8c protein, cas7 protein and cas11c protein are as defined in Part I.
  • the cas protein comprises cas5c protein, cas8c protein, cas7 protein and cas11c protein, and does not comprise cas3 protein.
  • the cas5c protein, cas8c protein, cas7 protein and cas11c protein are as defined in Part I.
  • one of the cas proteins (e.g., cas5c, cas8c, cas7, or cas11c protein) in the system comprises an adenosine deaminase (e.g., TadA8e) or a cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near an end of the cas protein (e.g., the N-terminus or C-terminus).
  • the cas8c protein in the system comprises an adenosine deaminase (e.g., TadA8e) or a cytosine deaminase (e.g., APOBEC3); e.g., the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at or near the N-terminus of the cas8c protein.
  • the adenosine deaminase or cytosine deaminase is connected to the cas protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the sequence of the peptide linker is as shown in SEQ ID NO: 16, 17 or 66.
  • the cas8c protein in the system comprises TadA8e, and the cas8c protein comprises the sequence shown in SEQ ID NO: 67.
  • the nucleotide sequence encoding the one or more guide RNAs of the Type I-C CRISPR-Cas3 system and the nucleotide sequences encoding the cas proteins of the Type I-C CRISPR-Cas3 system are located in different expression cassettes.
  • the nucleotide sequence encoding the cas3 protein is located in a different expression cassette from the nucleotide sequences encoding the other cas proteins; for example, the nucleotide sequences encoding cas proteins located in the same expression cassette are linked to each other by a nucleotide sequence encoding a self-cleaving peptide (e.g., T2A).
  • the nucleotide sequences encoding the cas protein are all located on the same vector.
  • the nucleotide sequence encoding the cas protein and the nucleotide sequence encoding one or more guide RNAs are both located on the same vector.
  • the application provides a kit comprising the system or vector system described in any of Sections I-VI, for using the system to perform nucleic acid editing (such as gene or genome editing, deletion of large gene or genome fragments, genome single-base modification, or genome structural variation).
  • the application provides a delivery composition comprising the system or vector system described in any of Sections I-VI, and a delivery system.
  • the delivery system is selected from particles, vesicles or viral vectors.
  • the particles comprise lipids, sugars, metals or proteins.
  • the vesicle comprises exosomes or liposomes.
  • the viral vector comprises an adenovirus, lentivirus, or adeno-associated virus.
  • the application provides a method for inducing a deletion in a target genome comprising complementary first and second nucleic acid strands, the method comprising: contacting the system or vector system described in any of Sections I-VI with the target genome, or delivering it into a cell containing the target genome.
  • the one or more cas proteins contained in the system or vector system are capable of forming a complex with the guide RNA, and the complex, after binding to the target sequence, induces deletion of a region containing the target sequence.
  • the deletion is a large-fragment deletion, such as deletion of a fragment greater than 0.1 kb, greater than 0.2 kb, greater than 0.5 kb, greater than 1 kb, greater than 10 kb, greater than 50 kb, or greater than 100 kb, and such as less than 500 kb, less than 400 kb, less than 300 kb, or less than 200 kb.
  • the one or more guide RNAs comprised by the system or vector system comprise a direct repeat sequence, a first guide sequence capable of hybridizing to a first target sequence, and a second guide sequence capable of hybridizing to a second target sequence.
  • the first target sequence is located in the first nucleic acid strand of the target genome, and the second target sequence is located in the second nucleic acid strand of the target genome; for example, in the first nucleic acid strand, the first target sequence is located at the 3' end of the region to be deleted, and in the second nucleic acid strand, the second target sequence is located at the 3' end of the region to be deleted.
  • the length of the region to be deleted is greater than 0.1 kb, such as greater than 0.2 kb, greater than 0.3 kb, greater than 0.4 kb, or greater than 0.5 kb; for example, the length of the region to be deleted is less than 500 kb, such as less than 400 kb, less than 300 kb, or less than 200 kb; for example, the length of the region to be deleted is 0.2kb-200kb (for example, 0.2kb-2kb, 0.2kb-5kb, 0.2kb-10kb, 0.2kb-100kb, 0.2kb-200kb; for example 0.5kb-1.5kb, 0.5kb-2kb, 0.5kb-10kb).
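As an illustration of placing one target sequence on each strand of the region to be deleted, the following Python sketch picks a target at the 3' end of the region on the first strand and a target at the 3' end of the region on the second strand, and checks a region length against the 0.2kb-200kb range given in this application. The function names, the 0-based coordinates and the short example sequence are assumptions made for illustration only; PAM requirements and real guide design criteria are not modeled.

```python
# Illustrative sketch only: names and coordinates are assumptions, not part of the application.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def paired_target_sequences(first_strand: str, start: int, end: int, target_len: int):
    """Pick one target at the 3' end of the region to be deleted on each strand.

    first_strand: the first nucleic acid strand, written 5'->3'.
    start, end:   0-based, half-open coordinates of the region to be deleted.
    Returns (first_target, second_target), each written 5'->3' on its own strand.
    """
    region = first_strand[start:end].upper()
    if len(region) < 2 * target_len:
        raise ValueError("region too short for two non-overlapping targets")
    # 3' end of the region as read on the first strand.
    first_target = region[-target_len:]
    # The 3' end of the region on the second strand is the reverse complement
    # of the 5' end of the region on the first strand.
    second_target = reverse_complement(region[:target_len])
    return first_target, second_target


def region_length_ok(length_bp: int, min_bp: int = 200, max_bp: int = 200_000) -> bool:
    """Check a region length against the 0.2kb-200kb example range."""
    return min_bp <= length_bp <= max_bp
```

For example, with the hypothetical strand `GATTACAGGCCTTAAGC`, a region spanning positions 2 to 14 and a target length of 4, the sketch returns `CTTA` for the first strand and `GTAA` for the second strand.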
  • the target genome is present in a cell, or alternatively, the target genome is present in a nucleic acid molecule (e.g., a plasmid) in vitro.
  • the cells are prokaryotic cells.
  • the cells are eukaryotic cells.
  • the cells are selected from animal cells (e.g., mammalian cells, such as human cells), plant cells (e.g., maize cells, maize protoplasts, rice cells, Arabidopsis cells, Arabidopsis protoplasts).
  • the method is for chromosome elimination.
  • the application provides a method for inducing a structural variation of a genome comprising complementary first and second nucleic acid strands, the method comprising: contacting the system or vector system described in any of Sections I-VI with the target genome, or delivering it into a cell containing the target genome.
  • the one or more cas proteins contained in the system or vector system are capable of forming a complex with the guide RNA, and the complex, after binding to the target sequence, induces deletion of a region containing the target sequence, thereby inducing a change in genome structure.
  • the deletion is a large-fragment deletion, such as deletion of a fragment greater than 0.1 kb, greater than 0.2 kb, greater than 0.5 kb, greater than 1 kb, greater than 10 kb, greater than 50 kb, or greater than 100 kb, and such as less than 500 kb, less than 400 kb, less than 300 kb, or less than 200 kb.
  • the one or more guide RNAs comprised by the system or vector system comprise a direct repeat sequence, a first guide sequence capable of hybridizing to a first target sequence, and a second guide sequence capable of hybridizing to a second target sequence.
  • the first target sequence is located in the first nucleic acid strand of the target genome, and the second target sequence is located in the second nucleic acid strand of the target genome; for example, in the first nucleic acid strand, the first target sequence is located at the 3' end of the region to be deleted, and in the second nucleic acid strand, the second target sequence is located at the 3' end of the region to be deleted.
  • the length of the region to be deleted is greater than 0.1 kb, such as greater than 0.2 kb, greater than 0.3 kb, greater than 0.4 kb, or greater than 0.5 kb; for example, the length of the region to be deleted is less than 500 kb, such as less than 400 kb, less than 300 kb, or less than 200 kb; for example, the length of the region to be deleted is 0.2kb-200kb (for example, 0.2kb-2kb, 0.2kb-5kb, 0.2kb-10kb, 0.2kb-100kb, 0.2kb-200kb; for example 0.5kb-1.5kb, 0.5kb-2kb, 0.5kb-10kb).
  • the target genome is present in a cell, or alternatively, the target genome is present in a nucleic acid molecule (e.g., a plasmid) in vitro.
  • the cells are prokaryotic cells.
  • the cells are eukaryotic cells.
  • the cells are selected from animal cells (e.g., mammalian cells, such as human cells), plant cells (e.g., maize cells, maize protoplasts, rice cells, Arabidopsis cells, Arabidopsis protoplasts).
  • the application provides a method for modifying a target nucleic acid molecule, comprising: contacting the system or vector system described in any of Sections I-VI with the target nucleic acid molecule, or delivering it into a cell comprising the target nucleic acid molecule.
  • the one or more cas proteins contained in the system or vector system are capable of forming a complex with the guide RNA, and the complex, after binding to the target sequence, induces modification of the target nucleic acid molecule containing the target sequence.
  • the target nucleic acid molecule is RNA or DNA.
  • the target nucleic acid molecule is double-stranded DNA.
  • the target nucleic acid molecule is a gene or genome.
  • the target nucleic acid molecule is present in a cell, or alternatively, the target nucleic acid molecule is present in a nucleic acid molecule (e.g., a plasmid) in vitro.
  • the cells are prokaryotic cells.
  • the cells are eukaryotic cells.
  • the cells are selected from animal cells (e.g., mammalian cells, such as human cells), plant cells (e.g., maize cells, maize protoplasts, rice cells, Arabidopsis cells, Arabidopsis protoplasts).
  • said modification refers to a large deletion of said target nucleic acid molecule.
  • the modification refers to a break of the target nucleic acid molecule, such as a double-strand break of DNA; for example, the modification also includes insertion of an exogenous nucleic acid into the break.
  • the modification refers to the change of a single base (e.g., cytosine, adenine) in the target nucleic acid molecule.
  • the present application provides a method for inducing a single base mutation in a target nucleic acid molecule, comprising: contacting the system or vector system described in any of Sections I-VI with the target nucleic acid molecule, or delivering it into a cell comprising the target nucleic acid molecule.
  • the one or more cas proteins contained in the system or vector system are capable of forming a complex with the guide RNA, and the complex, after binding to the target sequence, induces modification of a single base in the target nucleic acid molecule containing the target sequence, and a single base mutation arises during nucleic acid repair or replication.
  • the modification of the single base refers to a modification that changes the complementary base-pairing pattern of the base to be modified; for example, before modification, the base to be modified pairs with a first base, and after modification, the modified base pairs with a second base.
  • the one or more cas proteins comprised in the system or vector system further comprise an adenosine deaminase (e.g., TadA8e) or a cytosine deaminase (e.g., APOBEC3).
  • the one or more cas proteins (e.g., cas8c protein) contained in the system or vector system further comprise an adenosine deaminase (e.g., TadA8e), and the base to be modified is adenine.
  • before modification, adenine pairs with thymine.
  • after modification, adenine is converted to hypoxanthine, and hypoxanthine pairs with cytosine.
  • the one or more cas proteins (e.g., cas8c protein) contained in the system or vector system further comprise a cytosine deaminase (e.g., APOBEC3), and the base to be modified is cytosine.
  • before modification, cytosine pairs with guanine.
  • after modification, cytosine is converted to uracil, and uracil pairs with adenine.
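The change in base pairing caused by deamination can be sketched in a few lines of Python. This is a minimal illustration, not the method of the application: it assumes that hypoxanthine (written `I`) is read as guanine and uracil (written `U`) is read as thymine when the modified strand is copied, and the function names are hypothetical. Editing windows, repair pathway choice and editing efficiency are not modeled.

```python
# Illustrative sketch; function names are assumptions, not part of the application.
DEAMINATION = {
    "adenosine": ("A", "I"),  # adenine -> hypoxanthine (written I)
    "cytosine": ("C", "U"),   # cytosine -> uracil
}

# How the modified base is read during repair or replication:
# hypoxanthine pairs with cytosine (read as G), uracil pairs with adenine (read as T).
READ_AS = {"I": "G", "U": "T"}


def deaminate(dna: str, index: int, deaminase: str) -> str:
    """Return the strand after deaminating the base at `index` (0-based)."""
    substrate, product = DEAMINATION[deaminase]
    if dna[index] != substrate:
        raise ValueError(f"{deaminase} deaminase expects {substrate} at position {index}")
    return dna[:index] + product + dna[index + 1:]


def fixed_mutation(modified: str) -> str:
    """Return the sequence that results once the modified base has been copied."""
    return "".join(READ_AS.get(base, base) for base in modified)
```

Under these assumptions, deaminating the adenine in `GATC` gives `GITC`, which is fixed as `GGTC` (an A-to-G change), and deaminating the cytosine gives `GATU`, which is fixed as `GATT` (a C-to-T change).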
  • the target nucleic acid molecule is RNA or DNA.
  • the target nucleic acid molecule is double-stranded DNA.
  • the target nucleic acid molecule is a gene or genome.
  • the target nucleic acid molecule is present in a cell, or alternatively, the target nucleic acid molecule is present in a nucleic acid molecule (e.g., a plasmid) in vitro.
  • the cells are prokaryotic cells.
  • the cells are eukaryotic cells.
  • the cells are selected from animal cells (e.g., mammalian cells, such as human cells), plant cells (e.g., maize cells, maize protoplasts, rice cells, Arabidopsis cells, Arabidopsis protoplasts).
  • the application provides a method for altering the expression of a gene product, comprising: contacting the system or vector system described in any of Sections I-VI with a target nucleic acid molecule encoding the gene product, or delivering it into a cell comprising the target nucleic acid molecule.
  • the one or more cas proteins contained in the system or vector system are capable of forming a complex with the guide RNA, and the complex, after binding to the target sequence, induces modification of the target nucleic acid molecule containing the target sequence, thereby altering the expression of the gene product.
  • the target nucleic acid molecule is present in a cell, or the target nucleic acid molecule is present in a nucleic acid molecule (e.g., a plasmid) in vitro.
  • the cells are prokaryotic cells.
  • the cells are eukaryotic cells.
  • the cells are selected from animal cells (e.g., mammalian cells, such as human cells), plant cells (e.g., maize cells, maize protoplasts, rice cells, Arabidopsis cells, Arabidopsis protoplasts).
  • expression of the gene product is altered (e.g., enhanced or decreased).
  • the gene product is a protein.
  • the application provides a method of producing a plant having a modified trait, said method comprising contacting a plant cell with a system or vector system as described in any of I-VI, or subjecting the plant cell to a plant as claimed in claim 1.
  • the plant is an agricultural plant such as corn, barley, cotton, rice, soybean, or wheat.
  • in the methods for inducing deletions in the target genome, methods for inducing genomic structural variations, methods for modifying target nucleic acid molecules, methods for inducing single-base mutations in target nucleic acid molecules, and methods for altering the expression of gene products, the cas protein or the nucleotide sequence encoding the cas protein, and the guide RNA or the nucleotide sequence encoding the guide RNA, contained in the system or vector system are present in a delivery system.
  • the delivery system is selected from particles, vesicles or viral vectors.
  • the particles comprise lipids, sugars, metals or proteins.
  • the vesicle comprises exosomes or liposomes.
  • the viral vector comprises an adenovirus, lentivirus, or adeno-associated virus.
  • the application provides the system or vector system, kit or delivery composition described in any of Sections I-VI, for use in nucleic acid editing, or for use in the manufacture of a preparation for nucleic acid editing.
  • the nucleic acid editing comprises gene or genome editing.
  • the gene or genome editing includes deletion of large fragments of nucleic acid, modification of genes, knockout of genes, changes in the expression of gene products, repair of mutations, insertion of polynucleotides, and/or single-base mutations.
  • the nucleic acid editing includes inducing genome structural variation or chromosome elimination.
  • the application provides the system or vector system, kit or delivery composition described in any of Sections I-VI, for use in the manufacture of a preparation for editing a target nucleotide sequence in a target locus to modify an organism or a non-human organism (such as a plant).
  • Type I-C CRISPR-Cas3 system refers to a class 1 CRISPR-Cas system comprising a multi-subunit crRNA-effector complex, more specifically to a type I system, and even more specifically to a subtype I-C system.
  • Subtype I-C systems can include a number of different Cas components, including, for example, Cas3, Cas5 (e.g., Cas5c), Cas7, and Cas8 (e.g., Cas8c), and optionally other Cas components (see, e.g., Makarova et al. 2020. Nature Reviews Microbiology 18(2):67–83).
  • the Cas protein used in the present application is derived from a prokaryote with a natural I-C system, such as Desulfovibrio vulgaris str.
  • the amino acid sequence of the cas3 protein can be found in NCBI Genbank ID: 504337588.
  • mutations or variations (including but not limited to substitutions, deletions and/or additions) may be present in the amino acid sequence of the cas3 protein, such as in cas3 proteins of I-C CRISPR-Cas3 systems from different sources, without affecting its biological function. Therefore, in the present invention, the term "cas3 protein" should include all such sequences, including for example the sequence shown in SEQ ID NO: 1 and its natural or artificial variants.
  • the amino acid sequence of the cas5c protein can be found in NCBI Genbank ID: 499490067.
  • mutations or variations (including but not limited to substitutions, deletions and/or additions) may be present in the amino acid sequence of the cas5c protein, such as in cas5c proteins of I-C CRISPR-Cas3 systems from different sources, without affecting its biological function. Therefore, in the present invention, the term "cas5c protein" should include all such sequences, including for example the sequence shown in SEQ ID NO: 2 and its natural or artificial variants.
  • the amino acid sequence of the cas8c protein can be found in NCBI Genbank ID: 499490068.
  • mutations or variations (including but not limited to substitutions, deletions and/or additions) may be present in the amino acid sequence of the cas8c protein, such as in cas8c proteins of I-C CRISPR-Cas3 systems from different sources, without affecting its biological function. Therefore, in the present invention, the term "cas8c protein" should include all such sequences, including for example the sequence shown in SEQ ID NO: 3 and its natural or artificial variants.
  • the amino acid sequence of the cas7 protein can be found in NCBI Genbank ID: 499490069.
  • mutations or variations (including but not limited to substitutions, deletions and/or additions) may be present in the amino acid sequence of the cas7 protein, such as in cas7 proteins of I-C CRISPR-Cas3 systems from different sources, without affecting its biological function. Therefore, in the present invention, the term "cas7 protein" should include all such sequences, including for example the sequence shown in SEQ ID NO: 4 and its natural or artificial variants.
  • the amino acid sequence of the cas11c protein can be found in NCBI Genbank ID: 499490068.
  • mutations or variations (including but not limited to substitutions, deletions and/or additions) may be present in the amino acid sequence of the cas11c protein, such as in cas11c proteins of I-C CRISPR-Cas3 systems from different sources, without affecting its biological function. Therefore, in the present invention, the term "cas11c protein" should include all such sequences, including for example the sequence shown in SEQ ID NO: 5 and its natural or artificial variants.
  • a guide RNA can comprise, consist essentially of, or consist of a direct repeat sequence and a guide sequence (also referred to as a spacer in the context of an endogenous CRISPR system).
  • a guide sequence is any polynucleotide sequence that is sufficiently complementary to a target sequence to hybridize to the target sequence and direct specific binding of a CRISPR/Cas complex to the target sequence.
  • the degree of complementarity between the guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the optimal alignment is within the ability of one of ordinary skill in the art. For example, there are public and commercially available alignment algorithms and programs such as, but not limited to, ClustalW, Smith-Waterman in matlab, Bowtie, Geneious, Biopython, and SeqMan.
  • the guide sequence is at least 5, at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, or at least 50 nucleotides in length. In some cases, the guide sequence is no more than 50, 45, 40, 35, 30, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length. In certain embodiments, the guide sequence is 10-50, or 15-40, or 20-40 nucleotides in length.
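To make the notion of a degree of complementarity concrete, the sketch below computes the percentage of positions at which an RNA guide sequence can form a Watson-Crick pair with a DNA target strand. It is a simplified assumption: both sequences are taken to be the same length and are compared position by position in antiparallel orientation, without the gapped alignment that programs such as those listed above perform. The function name is hypothetical.

```python
# Illustrative sketch; ungapped, antiparallel, Watson-Crick pairing only.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percentage of guide positions that pair with the target strand.

    guide_rna:  guide sequence, written 5'->3' (A, C, G, U).
    target_dna: the DNA strand the guide hybridizes to, written 5'->3'.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Hybridization is antiparallel: guide position i faces target position -(i + 1).
    paired = sum(
        RNA_TO_DNA_PAIR.get(g) == t for g, t in zip(guide, reversed(target))
    )
    return 100.0 * paired / len(guide)
```

With the hypothetical guide `AUGC`, the fully complementary target strand `GCAT` scores 100.0, while `GCAA` (one mismatch) scores 75.0.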
  • the direct repeat sequence is at least 10, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, or at least 70 nucleotides in length.
  • the direct repeat sequence is no more than 70, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 50, 45, 40, 35, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 15, 10 or fewer nucleotides in length.
  • the direct repeat sequence is 55-70 nucleotides in length, such as 55-65 nucleotides, such as 60-65 nucleotides, such as 62-65 nucleotides, e.g., 63-64 nucleotides.
  • the direct repeat sequence is 15-40 nucleotides in length, such as 15-38 nucleotides, such as 20-40 nucleotides, such as 22-38 nucleotides, for example 32 nucleotides. In certain embodiments, the direct repeat sequence is not less than 30nt in length, such as 30nt-37nt.
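The repeat-guide-repeat arrangement of a guide RNA, together with the length ranges given above, can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the default ranges are the 15-40 nucleotide direct repeat range and the 10-50 nucleotide guide sequence range taken from the text, and the sequences in the usage note are placeholders rather than sequences from the application.

```python
# Illustrative sketch; names and default ranges are assumptions drawn from the text.
def build_guide_rna(direct_repeat: str, guide: str,
                    repeat_range=(15, 40), guide_range=(10, 50)) -> str:
    """Assemble direct repeat + guide sequence + direct repeat as an RNA string."""
    # Accept DNA or RNA input and normalize to RNA letters.
    direct_repeat = direct_repeat.upper().replace("T", "U")
    guide = guide.upper().replace("T", "U")
    checks = (
        ("direct repeat", direct_repeat, repeat_range),
        ("guide sequence", guide, guide_range),
    )
    for name, seq, (low, high) in checks:
        if set(seq) - set("ACGU"):
            raise ValueError(f"{name} contains non-nucleotide characters")
        if not low <= len(seq) <= high:
            raise ValueError(f"{name} length {len(seq)} outside {low}-{high} nt")
    # Two copies of the direct repeat flank the guide sequence.
    return direct_repeat + guide + direct_repeat
```

For instance, a placeholder 32-nucleotide repeat (`"GU" * 16`) and a placeholder 20-nucleotide guide (`"ACGU" * 5`) give an 84-nucleotide guide RNA, whereas a 4-nucleotide guide is rejected.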
  • CRISPR/Cas complex refers to a ribonucleoprotein complex formed by the combination of a guide RNA or mature crRNA with the Cas protein, which comprises a guide sequence that hybridizes to the target sequence and is bound to the Cas protein.
  • the ribonucleoprotein complex can recognize and cleave polynucleotides that can hybridize to the guide RNA or mature crRNA.
  • target sequence refers to a polynucleotide that a guide sequence is designed to target, e.g., a sequence complementary to the guide sequence, wherein hybridization between the target sequence and the guide sequence promotes the formation of the CRISPR/Cas complex. Perfect complementarity is not required, so long as there is sufficient complementarity to cause hybridization and promote the formation of a CRISPR/Cas complex.
  • a target sequence can comprise any polynucleotide, such as DNA or RNA.
  • the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located in an organelle of the eukaryotic cell such as the mitochondria or chloroplast.
  • the expression "target sequence" or "target polynucleotide" may be any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell).
  • the target polynucleotide can be a polynucleotide present in the nucleus of a eukaryotic cell.
  • the target polynucleotide can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA).
  • the exact sequence and length requirements for the protospacer adjacent motif (PAM) vary depending on the Cas effector enzyme used, but the PAM is typically a 2-5 base pair sequence adjacent to the protospacer (i.e., the target sequence). Those skilled in the art will be able to identify the PAM sequence to use with a given Cas effector protein.
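Because the PAM sits next to the protospacer, candidate target sites can be listed by scanning a sequence for the motif. The sketch below is a hedged illustration: the PAM `TTC`, its position immediately 5' of the protospacer, and the 35-nucleotide protospacer length are placeholder assumptions that are not stated in this passage and would need to be replaced by the requirements of the Cas effector actually used. Only the given strand is scanned, and the function name is hypothetical.

```python
# Illustrative sketch; the PAM, its orientation and the protospacer length are assumptions.
def find_protospacers(dna: str, pam: str = "TTC", protospacer_len: int = 35):
    """List (pam_start, protospacer) for each PAM followed by a full-length protospacer."""
    dna = dna.upper()
    pam = pam.upper()
    hits = []
    pos = dna.find(pam)
    while pos != -1:
        start = pos + len(pam)
        protospacer = dna[start:start + protospacer_len]
        # Keep only sites where a complete protospacer fits after the PAM.
        if len(protospacer) == protospacer_len:
            hits.append((pos, protospacer))
        # Step by one so overlapping PAM occurrences are also found.
        pos = dna.find(pam, pos + 1)
    return hits
```

Scanning the reverse complement of the same sequence with the same function would cover the opposite strand.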
  • the term "adenosine deaminase” refers to a protein that catalyzes the hydrolytic deamination of adenine or adenosine.
  • the adenosine deaminase catalyzes the hydrolytic deamination of adenine or adenosine to inosine in deoxyribonucleic acid (DNA).
  • the adenosine deaminase is TadA8e.
  • the amino acid sequence of the adenosine deaminase can be found in NCBI Genbank ID: UNJ19119.1 or NCBI Genbank ID: QHD44350.1.
  • therefore, in the present invention, the term "adenosine deaminase" shall include all such sequences, including for example the sequences shown in NCBI Genbank ID: UNJ19119.1 or NCBI Genbank ID: QHD44350.1 and their natural or artificial variants.
  • cytosine deaminase refers to a protein that catalyzes the hydrolytic deamination of cytidine or cytosine.
  • the cytosine deaminase is APOBEC3.
  • the amino acid sequence of the cytosine deaminase can be found in NCBI Genbank ID: 76096346 or NCBI Genbank ID: 176865758.
  • mutations or variations (including but not limited to substitutions, deletions and/or additions) may be present in the amino acid sequence of the cytosine deaminase, such as in cytosine deaminases from different sources, without affecting its biological function. Therefore, in the present invention, the term "cytosine deaminase" shall include all such sequences, including for example the sequences shown in NCBI Genbank ID: 76096346 or NCBI Genbank ID: 176865758 and their natural or artificial variants.
  • the term "identity" is used to refer to the match of sequences between two polypeptides or between two nucleic acids.
  • the sequences are aligned for optimal comparison purposes (for example, gaps may be introduced in a first amino acid sequence or nucleic acid sequence to best align with a second amino acid or nucleic acid sequence).
  • the amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position.
  • the determination of percent identity between two sequences can also be accomplished using a mathematical algorithm.
  • a non-limiting example of a mathematical algorithm for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, as modified in Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877. Such an algorithm was incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403.
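The position-by-position comparison described above can be illustrated with a short Python sketch that computes percent identity from two sequences that have already been aligned (gaps written as `-`). It assumes the common convention of dividing the number of identical positions by the total number of aligned positions; the function name is hypothetical, and the alignment itself would come from a tool such as BLAST rather than from this code.

```python
# Illustrative sketch; assumes the two sequences are already aligned to equal length.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity = identical positions / aligned positions * 100."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    # A position counts as identical only if both residues match and neither is a gap.
    matches = sum(
        a == b and a != gap
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)
```

For the hypothetical alignment `ACGT-A` against `ACCTTA`, four of six positions are identical, which gives an identity of about 66.7%.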
  • vector refers to a nucleic acid delivery vehicle into which a polynucleotide can be inserted.
  • when a vector enables expression of the protein encoded by the inserted polynucleotide, the vector is called an expression vector.
  • a vector can be introduced into a host cell by transformation, transduction or transfection, so that the genetic material elements it carries can be expressed in the host cell.
  • Vectors are well known to those skilled in the art, including but not limited to: plasmids; phagemids; cosmids; artificial chromosomes, such as yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC) or P1-derived artificial chromosomes (PAC); phages, such as lambda phage or M13 phage; and animal viruses.
  • Animal viruses that can be used as vectors include, but are not limited to, retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpesviruses (such as herpes simplex virus), poxviruses, baculoviruses, papillomaviruses, and papovaviruses (e.g., SV40).
  • the I-C CRISPR-Cas3 system of the present invention has significant application value.
  • the I-C CRISPR-Cas3 system provided by the present invention can delete large fragments of the genome, and therefore has greater advantages in applications such as the knockout of gene coding regions, the knockout of long lncRNAs or enhancers, and chromosome elimination.
  • the I-C CRISPR-Cas3 system provided by the present invention has pre-crRNA processing activity. Compared with the Cas9 system, it does not require tracrRNA and can be more easily applied to multi-target gene editing.
  • the guide RNA comprising two reverse target sites provided by the present invention can achieve genome-accurate fragment deletion compared with the gene editing of the type I-E system using a single target site.
  • Figure 1 shows the vector design maps in Example 1.
  • Figure 1A is the vector design of the type I-C system;
  • Figure 1B is the vector design of the type I-C system with the Cas11c protein deleted (D11C);
  • Figure 1C is the vector design of the type I-E system.
  • Figure 2 shows the detection results of the YFFP reporter system in Example 2.
  • Figure 2A is a schematic diagram of the YFFP reporter system;
  • Figure 2B is the target design in the YFFP reporter system, in which IC-1, IC-2 and IC-3 are three exemplary type I-C targets in the YFFP recombinant sequence;
  • Figure 2C is the fluorescence microscopy result for protoplasts co-transformed with the type I-C system and the YFFP reporter system.
  • Figure 3 shows the detection results of the dual-luciferase reporter system in Example 3.
  • Figure 3A is the experimental flow chart for detection with the dual-luciferase reporter system;
  • Figure 3B shows the dual-luciferase reporter results for the type I-C, D11C, type I-E and Cas9 systems; the ordinate indicates the relative fluorescence value of each system.
  • Figure 4 shows the detection of endogenous gene editing activity in maize in Example 4.
  • Figure 4A is the PCR detection result for the type I-C system;
  • Figure 4B is the PCR detection result for the D11C system;
  • Figure 4C is the first-generation sequencing alignment for the first detected site on the O2 gene edited by the type I-C system;
  • Figure 4D is the first-generation sequencing alignment for the second detected site on the O2 gene edited by the type I-C system.
  • Figure 5 compares the editing activity of the type I-C and type I-E systems in Example 5 at the endogenous genes O2 (site 1, O2-1; and site 2, O2-2), PDL1, GL2 and IPK1.
  • Figure 6 is the design map of the adenine single-base editing vector (I-C TadA8e) in Example 6.
  • Figure 7 shows the gene editing detection results of the type I-C system in stable transgenic maize plants in Example 7.
  • Figure 7A is the dual-target design for the ZB7 gene, in which #g1 and #g2 are the two targets;
  • Figure 7B is the dual-target design for the GA2 gene, in which #g1 and #g2 are the two targets;
  • Figure 7C is the first-generation sequencing alignment used to detect editing in ZB7 transgenic plants;
  • Figure 7D is the first-generation sequencing alignment used to detect editing in GA2 transgenic plants.
  • Figure 8 shows the gene editing detection results of the type I-C system in stable transgenic rice plants in Example 8.
  • Figure 8A is the dual-target design for the SLR1 gene, in which #g1 and #g2 are the two targets;
  • Figure 8B is the first-generation sequencing alignment used to detect editing in SLR1 transgenic plants.
  • Figure 9 shows the gene editing detection results of the type I-C system in Arabidopsis protoplasts in Example 9.
  • Figure 9A is the target design for the RBSC1B, RBSC2B and RBSC3B genes, in which #g1 and #g2 are the two targets, located on sequences homologous among the RBSC1B, RBSC2B and RBSC3B genes;
  • Figure 9B is the next-generation sequencing alignment used to detect editing in Arabidopsis protoplasts.
  • LB liquid medium: 10 g tryptone, 5 g yeast extract and 10 g NaCl, made up to 1 L and sterilized.
  • CTAB solution: 16.7 g CTAB (cetyltrimethylammonium bromide), 234 mL 5 M NaCl, 83.5 mL 1 M Tris-HCl (pH 8.0) and 33.4 mL 0.5 M EDTA (pH 8.0), made up to 1 L with distilled water. Before use, add β-mercaptoethanol at a ratio of 100:1.
  • W5 solution: 154 mM NaCl, 125 mM CaCl2, 5 mM KCl and 4 mM MES, made up to 500 mL; adjust the pH to 5.7 with NaOH.
  • MMG solution: 0.4 mM mannitol, 15 mM MgCl2 and 4 mM MES, made up to 10 mL.
  • The plasmid maxiprep kit was purchased from QIAGEN (Cat. No. 12963).
  • The Blunt-simple vector was purchased from Shanghai Yisheng Biotechnology Co., Ltd. (Cat. No. CB111-02).
  • Escherichia coli DH5α competent cells were purchased from Beijing Qingke Biological Co., Ltd. (Cat. No. TSV-A07).
  • The dual-luciferase reporter system detection kit was purchased from Shanghai Yisheng Biotechnology Co., Ltd. (Cat. No. 11402ES60).
  • The P3301 vector was purchased from Youbao Biological Company (Cat. No. VT1386).
  • The pUC19 vector was purchased from Takara Biological Company (Cat. No. 3219).
  • A monocistronic expression vector expressing Cas8c, Cas7 and Cas5c was designed using the maize UBI promoter and the T2A self-cleaving peptide.
  • D11C: Cas11c protein deletion.
  • A nuclear localization signal was added at the N-terminus of each protein; the amino acid sequence of the nuclear localization signal is shown in SEQ ID NO: 15.
  • The vector structures are shown in Figures 1A and 1B, respectively.
  • An expression vector for the plant type I-E system (structure shown in Figure 1C) was constructed as a control group, following the application of the type I-E system in animal cells.
  • The Cas3 protein was expressed from the CMV35S promoter.
  • The guide RNA was expressed from the OsU3 promoter.
  • The above protein and RNA components were constructed into the P3301 vector (purchased from Youbao Bio, Cat. No. VT1386) for subsequent experimental detection.
  • Example 2: Detection with the YFFP reporter system
  • To make a preliminary assessment of whether the system has DNA cleavage activity, we first constructed a YFFP reporter system.
  • The construction method was as follows: a 55 bp DNA sequence containing the type I-C PAM recognition site, together with a TGA stop codon, was inserted after the 289th nucleotide of the YFP DNA sequence, giving a YFFP recombinant sequence with 223 bp homologous sequences on both sides of the insert.
  • The YFFP recombinant sequence was cloned into the pUC19 vector and expressed from the 35S promoter to obtain a recombinant vector containing the YFFP reporter system.
  • The principle of the YFFP reporter system is shown in Figure 2A.
  • If the type I-C system cleaves the DNA at the target site in the inserted DNA sequence, YFFP is repaired into a complete YFP through the single-strand annealing repair pathway and then emits green fluorescence under blue excitation light. The DNA cleavage activity of the type I-C system can therefore be judged from the fluorescent signal.
  • The protoplast concentration was 2 × 10^6/mL, counted with a hemocytometer.
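The hemocytometer count behind this concentration can be sketched as follows. This is a minimal illustration rather than the authors' protocol: it assumes a standard chamber in which each large 1 mm × 1 mm square holds 0.1 µL, and the function names `cells_per_ml` and `fold_dilution_to_target` are hypothetical.

```python
def cells_per_ml(square_counts, dilution_factor=1.0):
    """Estimate cells/mL from counts of large hemocytometer squares.

    Assumption: each large square (1 mm x 1 mm x 0.1 mm deep) holds 1e-4 mL.
    """
    if not square_counts:
        raise ValueError("need at least one square count")
    mean_count = sum(square_counts) / len(square_counts)
    return mean_count * dilution_factor * 1e4


def fold_dilution_to_target(measured_per_ml, target_per_ml=2e6):
    """Fold dilution needed to bring a suspension down to the target density."""
    if measured_per_ml < target_per_ml:
        raise ValueError("suspension is already below the target density")
    return measured_per_ml / target_per_ml
```

Under these assumptions, an average of 200 protoplasts per large square with no dilution corresponds to the 2 × 10^6/mL stated above.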
  • A dual-luciferase reporter system based on the maize protoplast transformation system was constructed.
  • The construction of this system is similar to that of the YFFP reporter system in Example 2.
  • The construction method was as follows: a 55 bp DNA sequence containing a type I-C PAM recognition site, together with a TGA stop codon, was inserted at nucleotide position 1190 of FLuc, 780 bp homology arms were designed on both sides of the inserted DNA, and the modified sequence was expressed from the 35S promoter.
  • The pUC19 vector was selected as the backbone, and the 35S promoter was used to drive expression of RLuc as the internal reference expression vector.
  • The detection process of the dual-luciferase reporter system is shown in Figure 3A. If the Cas system (type I-C, D11C, Cas9 or type I-E) cleaves the DNA at the target site in the inserted DNA sequence, the disrupted FLuc is repaired into a complete FLuc through the single-strand annealing repair pathway and luciferase activity is restored. The DNA cleavage activity of the Cas system can therefore be judged by measuring luciferase activity.
  • The I-C, D11C, Cas9 and I-E vectors constructed in step 2 above were each co-transformed into protoplasts together with the dual-luciferase reporter system constructed in step 1 above.
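The readout of such an assay is usually a normalized ratio. The sketch below assumes that the relative value is the FLuc signal divided by the RLuc internal-reference signal, optionally expressed as a fold change over a control sample; the text does not state the exact formula, and the function names are hypothetical.

```python
def relative_luciferase(fluc, rluc):
    """Normalize the firefly (FLuc) signal to the Renilla (RLuc) internal reference."""
    if rluc <= 0:
        raise ValueError("RLuc signal must be positive")
    return fluc / rluc


def fold_over_control(sample, control):
    """Fold change of a sample over a control.

    Both arguments are (fluc, rluc) tuples of raw luminescence readings.
    """
    return relative_luciferase(*sample) / relative_luciferase(*control)
```

Normalizing to RLuc corrects for differences in transformation efficiency between protoplast samples, which is why the internal reference vector is co-transformed.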
  • In step 2, we selected 34 nt DNA sequences adjacent to a 5'-TTC motif as target sites, and chose two such sequences in reverse orientation about 1 kb apart as spacer sequences for constructing the U3-RNA vector. Each constructed U3-RNA vector was then ligated into the p3301 vector. To further evaluate the role of the Cas11c protein, we also ligated the U3-RNA vector targeting the O2 gene into the p3301 vector lacking the Cas11c protein.
  • We used the type I-E system as an activity reference. Following the dual-target design for type I-C in step 2 above, we selected 32 nt DNA sequences adjacent to a 5'-AAG motif as target sites and designed a vector based on the type I-E system.
  • The target design method was the same as for the type I-C system, with the two reverse target sites about 1 kb apart.
  • The vectors constructed in step 2 and step 3 were each transformed into protoplasts, and maize genomic DNA was extracted after culturing at 28°C for 48 hours.
  • Primers were designed to amplify a region of about 1 kb upstream and downstream of the targets, the amplified products were ligated into the Blunt-simple vector, and 96 recombinant clones were randomly selected for colony PCR with the M13F/M13R primer pair, followed by gel electrophoresis analysis.
  • The electrophoresis results are shown in Figure 4A. The edited products of the type I-C system at the O2 locus gave PCR bands (about 1 kb) shorter than the amplified length of the wild-type genome (2 kb), as shown in the lanes marked with asterisks in Figure 4A.
  • The PCR products marked with red asterisks were subjected to first-generation sequencing with M13F, and the sequencing results were aligned with the B73 reference genome. The sequences contained large deletions, and the deleted fragments lay mainly between the two targets (as shown in Figures 4C and 4D).
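The target selection described for this example (34 nt sequences next to a 5'-TTC PAM, two targets in reverse orientation about 1 kb apart) can be sketched as a simple scan of both strands. This is an illustrative sketch, not the authors' design tool. It assumes the protospacer lies immediately 3' of the PAM, that "reverse" means the two targets sit on opposite strands with the minus-strand target to the left, and that the spacing is measured between PAM positions. The function names and the `tolerance` parameter are hypothetical.

```python
PAM = "TTC"
SPACER_LEN = 34
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, pam=PAM, spacer_len=SPACER_LEN):
    """List (strand, pam_pos, protospacer) for every PAM + protospacer on both strands.

    pam_pos is the 0-based top-strand coordinate of the leftmost PAM base.
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        start = s.find(pam)
        while start != -1:
            end = start + len(pam) + spacer_len
            if end <= n:  # keep only sites with a full-length protospacer
                pos = start if strand == "+" else n - start - len(pam)
                sites.append((strand, pos, s[start + len(pam):end]))
            start = s.find(pam, start + 1)
    return sites


def pair_sites(sites, target_gap=1000, tolerance=200):
    """Pair a minus-strand site (left) with a plus-strand site (right) near target_gap."""
    minus = [s for s in sites if s[0] == "-"]
    plus = [s for s in sites if s[0] == "+"]
    pairs = []
    for m in minus:
        for p in plus:
            gap = p[1] - m[1]
            if abs(gap - target_gap) <= tolerance:
                pairs.append((m, p, gap))
    return pairs
```

For the type I-E control, the same scan could be run with `pam="AAG"` and `spacer_len=32`, the values given above for that system.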
  • Example 5: Comparison of eukaryotic editing activity between the type I-C system and the type I-E system
  • Following the detection method in Example 4, we examined the DNA of protoplasts transformed with the type I-E vector shown in Figure 1C and performed first-generation sequencing of the PCR amplification products. The sequencing results showed that the fragment deletions produced by the type I-E system were also located mainly between the two targets.
  • In the previous literature (Dolan et al. 2019; Morisaka et al. 2019), editing with a single target mainly caused random large-fragment deletions. The design of two reverse target sites in the present invention therefore overcomes the random deletion length of the type I system and improves the precision of the deletion length.
  • 2. The editing efficiencies of the type I-E system and the type I-C system of the present invention at the maize endogenous genes O2 (site 1, O2-1; and site 2, O2-2), PDL1, GL2 and IPK1 are shown in Figure 5. The editing efficiency of the type I-C system was between 5% and 55%, with an average of 23.14%, and that of the type I-E system was between 4% and 45%, with an average of 14.87%. Among the maize endogenous genes tested so far, the editing efficiency of the type I-C system was therefore considerably higher than that of the type I-E system, which has already been applied to animal cell editing.
  • Example 6: Use of the type I-C system for adenine base editing
  • A monocistronic expression vector expressing Cas7, Cas5c and Cas11c was designed using the maize UBI promoter and the T2A self-cleaving peptide.
  • The TadA8e-Cas8c fusion protein was expressed from the CMV35S promoter, and a nuclear localization signal was added at the N-terminus of each protein (the amino acid sequence of the nuclear localization signal is shown in SEQ ID NO: 15).
  • The guide RNA was expressed from the OsU3 promoter, and the above protein and RNA components were constructed into the P3301 vector (purchased from Youbao Biology, Cat. No. VT1386) for subsequent experiments.
  • The vector design map is shown in Figure 6.
  • TTC: the PAM recognized by type I-C.
  • I-C TadA8e: adenine single-base editing vector.
  • In step 2, we selected 34 nt DNA sequences adjacent to a 5'-TTC motif as target sites, and chose two such sequences in reverse orientation about 1 kb apart as spacer sequences for constructing the U3-RNA vector. Each constructed U3-RNA vector was then ligated into the p3301 vector.
  • A transgenic event with one or more clones containing a deletion was considered a gene-editing-positive plant. The proportions of gene-editing-positive plants for the ZB7 and GA2 genes were 86.67% and 60%, respectively, as shown in Table 2.
  • In step 2, we selected 34 nt DNA sequences adjacent to a 5'-TTC motif as target sites, and chose two such sequences in reverse orientation about 1 kb apart as spacer sequences for constructing the U3-RNA vector. Each constructed U3-RNA vector was then ligated into the p1300 vector.
  • In step 3, the vector constructed in step 2 was transformed into Agrobacterium and callus was regenerated; DNA was then extracted from the leaves of T0 transgenic plants and amplified by PCR.
  • The detection method was the same as in step 3 of Example 7.
  • Genome-specific primers were designed about 500 bp upstream and downstream of the targets for PCR amplification, the PCR products were ligated into the Blunt-simple vector, and 24 recombinant clones were subjected to colony PCR with the M13F/M13R primer pair and to first-generation sequencing. The sequencing results were aligned with the reference gene sequence; the alignment results are shown in Figure 8B.
  • In step 3, a transgenic event with one or more clones containing a deletion was considered a gene-editing-positive plant. According to the statistics in Table 3, the proportion of gene-editing-positive plants among the T0 stable transgenic rice plants was 80%.
  • In step 2, we selected 34 nt DNA sequences adjacent to a 5'-TTC motif as target sites, and chose two such sequences in reverse orientation, at most about 7 kb apart, as spacer sequences for constructing the U3-RNA vector. Each constructed U3-RNA vector was then ligated into the p1300 vector.
  • The vector constructed in step 2 was transformed into Arabidopsis thaliana protoplasts; protoplast DNA was extracted after culturing in the dark at 22°C for 48 hours and amplified by PCR.
  • The detection method was the same as in step 4 of Example 4.
  • Genome-specific primers were designed about 500 bp upstream and downstream of the targets for PCR amplification, the PCR products were ligated into the Blunt-simple vector, and 96 recombinant clones were subjected to colony PCR with the M13F/M13R primer pair and to first-generation sequencing. The sequencing results were aligned with the reference gene sequence; the alignment results are shown in Figure 9B. According to the first-generation sequencing results, the editing efficiency of the type I-C system in Arabidopsis protoplasts was 7.29% (as shown in Table 4).
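The clone-based efficiencies reported in these examples are simple proportions. The sketch below assumes that editing efficiency is the number of sequenced clones carrying a deletion divided by the number of clones screened. The counts used in the usage note are an inference, not figures quoted from the tables, and the function name is hypothetical.

```python
def editing_efficiency(edited_clones, total_clones):
    """Percentage of screened clones that carry an edit."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    if not 0 <= edited_clones <= total_clones:
        raise ValueError("edited_clones must be between 0 and total_clones")
    return 100.0 * edited_clones / total_clones
```

For instance, 7 edited clones out of 96 screened gives about 7.29%, which matches the Arabidopsis protoplast value reported above. That the underlying count was 7 of 96 is an assumption.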

Abstract

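The invention relates to the technical field of clustered regularly interspaced short palindromic repeats (CRISPR). Provided are methods and compositions for nucleic acid editing (for example, gene or genome editing, large-fragment deletion, single-base editing, and genomic structural variation), including the use of a Type I-C CRISPR-Cas3 system. The Type I-C CRISPR-Cas3 system can achieve precise large-fragment deletions in a genome, for example knockout of any length of a single gene's coding frame and knockout of gene regulatory elements such as long lncRNAs or enhancers, and can also achieve single-base editing of genes or genomes and large genomic structural variations.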

Description

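Type I-C CRISPR-Cas3 System and Applications Thereof
Technical Field
The present invention relates to the technical field of clustered regularly interspaced short palindromic repeats (CRISPR). Specifically, the present invention provides methods and compositions for nucleic acid editing (for example, gene or genome editing, large-fragment deletion, single-base editing, and genomic structural variation), including the use of a Type I-C CRISPR-Cas3 system. The Type I-C CRISPR-Cas3 system of the present invention can achieve precise large-fragment deletions in a genome, for example knockout of any length of a single gene's coding frame and knockout of gene regulatory elements such as long lncRNAs or enhancers, and can also achieve single-base editing of genes or genomes and large genomic structural variations.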
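Background
CRISPR/Cas technology is a widely used gene-editing technology. Guided by RNA, it specifically binds a target sequence in the genome and cleaves the DNA to produce a double-strand break; site-directed genome editing is then achieved through the organism's non-homologous end joining or homologous recombination repair pathways.
Existing CRISPR systems are currently divided into two classes, Class 1 and Class 2 (Liu and Doudna 2020). Class 2 systems consist mainly of a single effector protein; the widely used CRISPR/Cas9 system belongs to the type II family of Class 2. Although the application of CRISPR/Cas9 in gene editing is mature, the edits it produces are mainly small deletions, so using CRISPR/Cas9 for large genomic deletions or chromosome elimination remains very difficult.
Class 1 systems consist mainly of multiple effector proteins and are currently divided into three families, type I, type III and type IV; the most thoroughly studied is the E subtype of the type I family. Like Class 2 systems, Class 1 systems are guided by a guide RNA, recognize a PAM motif and then invade the target sequence to bind and cleave the substrate DNA. The type I-E system consists of two parts: the Cas3 protein, which has nuclease activity, and the Cas5, Cas6, Cas7, Cas8e and Cas11 proteins, which form the Cascade complex. The guide RNA binds the Cascade complex to recognize substrate DNA, and the Cas3 protein is then recruited to cleave the substrate DNA. Reported editing of human 293T cells with the type I-E system shows that it mainly induces long-range deletions of long genomic fragments; however, the length of these deletions is random, which limits its use in practical applications. Meanwhile, genome editing of eukaryotes with other Class 1 families has rarely been reported.
Therefore, given the limited deletion length produced by current CRISPR/Cas genome editing and the random fragment deletions produced by type I system editing, developing a more robust CRISPR/Cas system capable of precise large-fragment genomic deletions is of great significance.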
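Summary of the Invention
Through extensive experimentation and repeated exploration, the inventors of the present application unexpectedly developed a new Type I-C CRISPR-Cas3 system or vector system, and methods of using the system, which can be used to achieve precise large-fragment deletions in a target gene or genome and/or other target nucleic acid editing (for example, modifying a gene, knocking out a gene, altering the expression of a gene product, repairing a mutation, and/or inserting a polynucleotide, single-base mutation, etc.). In certain embodiments, use of the system in eukaryotic cells is particularly advantageous.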
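I. Type I-C system containing Cas11c
In one aspect, the present application provides a Type I-C CRISPR-Cas3 system comprising:
(1) a Cas5c protein or a nucleotide sequence encoding a Cas5c protein;
(2) a Cas8c protein or a nucleotide sequence encoding a Cas8c protein;
(3) a Cas7 protein or a nucleotide sequence encoding a Cas7 protein; and
(4) a Cas11c protein or a nucleotide sequence encoding a Cas11c protein.
In certain preferred embodiments, the system further comprises: (5) a Cas3 protein or a nucleotide sequence encoding a Cas3 protein.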
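In certain preferred embodiments, in the system, the protein described in any one of (1)-(5) optionally comprises an additional protein or polypeptide selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g. VP64), a transcriptional repression domain (e.g. a KRAB domain or SID domain), a nuclease domain (e.g. Fok1), an adenosine deaminase (e.g. TadA8e), a cytosine deaminase (e.g. APOBEC3), a domain having an activity selected from methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof.
In certain preferred embodiments, at least 1 (e.g. at least 2, at least 3, at least 4, or all 5) of the proteins described in any one of (1)-(5) comprises the additional protein or polypeptide; for example, the protein described in each of (1)-(5) comprises the additional protein or polypeptide.
In certain preferred embodiments, the additional protein or polypeptide is an NLS sequence; for example, the protein described in each of (1)-(5) comprises an NLS sequence. In certain preferred embodiments, the NLS sequence is as shown in SEQ ID NO: 15.
In certain preferred embodiments, the additional protein or polypeptide is linked to the protein with or without a linker. In certain preferred embodiments, the linker is a peptide linker or a non-peptide linker. In certain preferred embodiments, the peptide linker sequence is as shown in SEQ ID NO: 16, 17 or 66.
In certain preferred embodiments, the NLS sequence is located at, near or close to a terminus (e.g. the N-terminus or C-terminus) of the protein.
In certain preferred embodiments, the additional protein or polypeptide is an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3). In certain preferred embodiments, one of the proteins described in any one of (1)-(4) comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3). In certain preferred embodiments, the adenosine deaminase or cytosine deaminase amino acid sequence is located at, near or close to a terminus (e.g. the N-terminus or C-terminus) of the protein (e.g. the Cas8c protein). In certain preferred embodiments, the adenosine deaminase or cytosine deaminase amino acid sequence is located at, near or close to the N-terminus of the Cas8c protein.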
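In certain preferred embodiments, in the system:
(1) the Cas3 protein comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 1; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 1; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 1;
(2) the Cas5c protein comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 2; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 2; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 2;
(3) the Cas8c protein comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 3; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 3; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 3;
(4) the Cas7 protein comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 4; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 4; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 4;
(5) the Cas11c protein comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 5; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 5; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 5.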
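The percent-identity thresholds in option (iii) can be illustrated with a small calculation. The sketch below uses one common convention, identical positions divided by aligned columns in a pre-computed pairwise alignment, ignoring columns where both sequences have a gap. Other conventions exist and this section does not say which one applies, so treat the function (a hypothetical name) as an assumption.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two sequences from the same pairwise alignment.

    Both strings must have equal length; gaps are marked with `gap`.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both sequences
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / columns
```

Under this convention, a variant with a single-residue insertion relative to a five-residue reference scores 5 matches over 6 columns (about 83.3%), which would satisfy the "at least 80%" option but not "at least 85%".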
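In certain preferred embodiments, in the system:
(1) the Cas3 protein comprises an NLS sequence and comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 18; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 18; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 18;
(2) the Cas5c protein comprises an NLS sequence and comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 19; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 19; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 19;
(3) the Cas8c protein comprises an NLS sequence and comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 21; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 21; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 21;
(4) the Cas7 protein comprises an NLS sequence and comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 20; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 20; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 20;
(5) the Cas11c protein comprises an NLS sequence and comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO: 22; (ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 22; or (iii) a sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the sequence shown in SEQ ID NO: 22.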
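In certain preferred embodiments, the system does not include a Cas3 protein or a nucleotide sequence encoding a Cas3 protein. In such embodiments, one Cas protein in the system (e.g. Cas5c, Cas8c, Cas7 or Cas11c) comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at, near or close to a terminus (e.g. the N-terminus or C-terminus) of the Cas protein. In certain preferred embodiments, the Cas8c protein in the system comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the adenosine deaminase or cytosine deaminase amino acid sequence is located at, near or close to the N-terminus of the Cas8c protein. For example, the adenosine deaminase or cytosine deaminase is linked to the protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the peptide linker sequence is as shown in SEQ ID NO: 16, 17 or 66. For example, the Cas8c protein in the system comprises TadA8e, and the Cas8c protein comprises the sequence shown in SEQ ID NO: 67.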
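In certain preferred embodiments, the system further comprises a guide RNA of the Type I-C CRISPR-Cas3 system or a nucleotide sequence encoding the guide RNA, wherein the guide RNA comprises a direct repeat sequence and a guide sequence capable of hybridizing with a target sequence.
In certain preferred embodiments, the direct repeat sequence comprises a stem-loop structure.
In certain preferred embodiments, the direct repeat sequence is capable of binding one or more Cas proteins in the system; for example, the direct repeat sequence is capable of binding one or more proteins selected from the Cas5c, Cas8c, Cas7 and Cas11c proteins; for example, the guide RNA is capable of binding the Cascade complex formed by the Cas5c, Cas8c, Cas7 and Cas11c proteins.
In certain preferred embodiments, when the target sequence is DNA, the target sequence is located 3' of a protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTC.
In certain preferred embodiments, in the system, the direct repeat sequence comprises a first region and a second region, and the first region comprises a stem-loop structure.
In certain preferred embodiments, the first region is located 5' of the second region.
In certain preferred embodiments, additional nucleotides are present or absent between the first region and the second region.
In certain preferred embodiments, in the system, the guide RNA comprises two copies of the direct repeat sequence, namely a first copy and a second copy of the direct repeat sequence, and a guide sequence located between the first copy and the second copy of the direct repeat sequence.
In certain preferred embodiments, in the system, the guide RNA comprises the second region of the first copy of the direct repeat sequence, the guide sequence, and the first region of the second copy of the direct repeat sequence.
In certain preferred embodiments, the guide sequence is located between the second region of the first copy of the direct repeat sequence and the first region of the second copy of the direct repeat sequence.
In certain preferred embodiments, the second region of the first copy of the direct repeat sequence is located 5' of the guide sequence, and the first region of the second copy of the direct repeat sequence is located 3' of the guide sequence.
In certain preferred embodiments, additional nucleotides are present or absent between the second region of the first copy of the direct repeat sequence and the guide sequence.
In certain preferred embodiments, additional nucleotides are present or absent between the guide sequence and the first region of the second copy of the direct repeat sequence.
In certain preferred embodiments, in the system, when the sequences of the Cas3, Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined above, the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 11.
In certain preferred embodiments, in the system, when the sequences of the Cas3, Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined above, the first region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 13, and the second region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 14.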
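II. Dual-targeting Type I-C system containing Cas11c
In certain preferred embodiments, the system further comprises one or more guide RNAs of the Type I-C CRISPR-Cas3 system or a nucleotide sequence encoding the one or more guide RNAs, wherein the one or more guide RNAs comprise a direct repeat sequence, a first guide sequence capable of hybridizing with a first target sequence, and a second guide sequence capable of hybridizing with a second target sequence, and wherein the first target sequence and the second target sequence respectively flank the region to be modified (e.g. the region to be deleted) in a double-stranded target nucleic acid molecule.
In certain preferred embodiments, the first target sequence and the second target sequence are located on the two different single strands of the region to be modified. For example, the first target sequence and the second target sequence are each located 3' of the region to be modified on their respective single strands.
In certain preferred embodiments, the direct repeat sequence comprises a stem-loop structure.
In certain preferred embodiments, the direct repeat sequence is capable of binding one or more Cas proteins in the system; for example, the direct repeat sequence is capable of binding one or more proteins selected from the Cas5c, Cas8c, Cas7 and Cas11c proteins. For example, the guide RNA is capable of binding the Cascade complex formed by the Cas5c, Cas8c, Cas7 and Cas11c proteins.
In certain preferred embodiments, when the target sequence is DNA, the target sequence is located 3' of a protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTC.
In certain preferred embodiments, in the system, the direct repeat sequence comprises a first region and a second region, and the first region comprises a stem-loop structure.
In certain preferred embodiments, the first region is located 5' of the second region.
In certain preferred embodiments, additional nucleotides are present or absent between the first region and the second region.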
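In certain preferred embodiments, in the system, the single guide RNA comprises:
(i) a first copy of the direct repeat sequence, a first guide sequence capable of hybridizing with the first target sequence, a second copy of the direct repeat sequence, a second guide sequence capable of hybridizing with the second target sequence, and a third copy of the direct repeat sequence; or
(ii) the second region of a first copy of the direct repeat sequence, a first guide sequence capable of hybridizing with the first target sequence, a second copy of the direct repeat sequence, a second guide sequence capable of hybridizing with the second target sequence, and the first region of a third copy of the direct repeat sequence.
In certain preferred embodiments, in (i), the single guide RNA comprises, in the 5' to 3' direction: the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the third copy of the direct repeat sequence.
In certain preferred embodiments, in (ii), the single guide RNA comprises, in the 5' to 3' direction: the second region of the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the first region of the third copy of the direct repeat sequence.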
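In certain preferred embodiments, when the sequences of the Cas3, Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined above, the direct repeat sequence is as shown in SEQ ID NO: 11.
In certain preferred embodiments, when the Cas3, Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined above, the first region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 13, and the second region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 14.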
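The two single-transcript layouts described above, (i) with full repeats and (ii) with trimmed outer repeats, can be sketched as string assembly. This is an illustration only: it assumes the direct repeat is the first region followed directly by the second region with no additional nucleotides, the function name is hypothetical, and the sequences in the usage note are placeholders rather than SEQ ID NO: 11, 13 or 14.

```python
def build_dual_guide(region1, region2, guide1, guide2, trimmed=False):
    """Assemble a dual-target guide RNA array in the 5' to 3' direction.

    region1: first (5', stem-loop) region of the direct repeat
    region2: second (3') region of the direct repeat
    trimmed=False -> layout (i):  repeat, guide1, repeat, guide2, repeat
    trimmed=True  -> layout (ii): region2, guide1, repeat, guide2, region1
    """
    repeat = region1 + region2
    if trimmed:
        parts = [region2, guide1, repeat, guide2, region1]
    else:
        parts = [repeat, guide1, repeat, guide2, repeat]
    return "".join(parts)
```

With placeholder inputs `region1="AAAA"`, `region2="CC"`, `guide1="GGG"` and `guide2="UUU"`, layout (i) yields `AAAACCGGGAAAACCUUUAAAACC` and layout (ii) yields `CCGGGAAAACCUUUAAAA`.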
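In certain preferred embodiments, in the system, the multiple guide RNAs comprise:
a first guide RNA comprising a direct repeat sequence and a first guide sequence capable of hybridizing with the first target sequence; and
a second guide RNA comprising a direct repeat sequence and a second guide sequence capable of hybridizing with the second target sequence.
In certain preferred embodiments, in the system:
the first guide RNA comprises two copies of the direct repeat sequence, namely a first copy and a second copy of the direct repeat sequence, and the first guide sequence located between the two copies of the repeat sequence; or the first guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the first guide sequence, and the first region of the second copy of the direct repeat sequence;
the second guide RNA comprises two copies of the direct repeat sequence, namely a first copy and a second copy of the direct repeat sequence, and the second guide sequence located between the two copies of the repeat sequence; or the second guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the second guide sequence, and the first region of the second copy of the direct repeat sequence.
In certain preferred embodiments, when the Cas3, Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined above, the first region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 13, and the second region of the direct repeat sequence comprises or consists of the sequence shown in SEQ ID NO: 14.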
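III. Type I-C vector system containing Cas11c
In one aspect, the present application provides a Type I-C CRISPR-Cas3 vector system comprising one or more vectors, the one or more vectors comprising nucleotide sequences encoding the Cas proteins of a Type I-C CRISPR-Cas3 system, the Cas proteins comprising a Cas5c protein, a Cas8c protein, a Cas7 protein and a Cas11c protein.
In certain preferred embodiments, the Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined above.
In certain preferred embodiments, the one or more vectors further comprise a nucleotide sequence encoding a Cas3 protein. In certain preferred embodiments, the Cas3 protein is as defined above. In certain preferred embodiments, the one or more vectors comprise: a first expression cassette comprising a nucleotide sequence encoding the Cas3 protein; and a second expression cassette comprising nucleotide sequences encoding the Cas5c, Cas8c, Cas7 and Cas11c proteins. For example, the first expression cassette comprises a promoter, e.g. an inducible promoter. For example, the second expression cassette comprises a promoter, e.g. an inducible promoter. For example, in the second expression cassette, the nucleotide sequences encoding the Cas5c, Cas8c, Cas7 and Cas11c proteins are arranged in any order. For example, in the second expression cassette, the nucleotide sequences encoding the Cas5c, Cas8c, Cas7 and Cas11c proteins are joined to one another by nucleotide sequences encoding a self-cleaving peptide (e.g. T2A).
In certain preferred embodiments, the one or more vectors do not comprise a nucleotide sequence encoding a Cas3 protein. In such embodiments, one Cas protein in the system (e.g. Cas5c, Cas8c, Cas7 or Cas11c) comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at, near or close to a terminus (e.g. the N-terminus or C-terminus) of the Cas protein. In certain preferred embodiments, the Cas8c protein in the system comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the adenosine deaminase or cytosine deaminase amino acid sequence is located at, near or close to the N-terminus of the Cas8c protein. For example, the adenosine deaminase or cytosine deaminase is linked to the protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the peptide linker sequence is as shown in SEQ ID NO: 16, 17 or 66. For example, the Cas8c protein in the system comprises TadA8e, and the Cas8c protein comprises the sequence shown in SEQ ID NO: 67.
In certain preferred embodiments, the one or more vectors comprise: a first expression cassette comprising a nucleotide sequence encoding the Cas8c protein; and a second expression cassette comprising nucleotide sequences encoding the Cas5c, Cas7 and Cas11c proteins. For example, the Cas8c protein comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3). For example, the first expression cassette comprises a promoter, e.g. an inducible promoter. For example, the second expression cassette comprises a promoter, e.g. an inducible promoter. For example, in the second expression cassette, the nucleotide sequences encoding the Cas5c, Cas7 and Cas11c proteins are arranged in any order. For example, in the second expression cassette, the nucleotide sequences encoding the Cas5c, Cas7 and Cas11c proteins are joined to one another by nucleotide sequences encoding a self-cleaving peptide (e.g. T2A).
In certain preferred embodiments, in the vector system, the one or more vectors further comprise a nucleotide sequence encoding a guide RNA of the Type I-C CRISPR-Cas3 system, the guide RNA being as defined in Part I.
In certain preferred embodiments, the nucleotide sequence encoding the guide RNA of the Type I-C CRISPR-Cas3 system is located in a further expression cassette; for example, the further expression cassette comprises a promoter, e.g. an inducible promoter.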
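IV. Dual-targeting Type I-C vector system containing Cas11c
In certain preferred embodiments, in the vector system, the one or more vectors further comprise a nucleotide sequence encoding one or more guide RNAs of the Type I-C CRISPR-Cas3 system, the one or more guide RNAs being as defined in Part II.
In certain preferred embodiments, the nucleotide sequence encoding the one or more guide RNAs of the Type I-C CRISPR-Cas3 system is located in a further expression cassette; for example, the further expression cassette comprises a promoter, e.g. an inducible promoter.
In certain preferred embodiments, in the vector system, the nucleotide sequences encoding the Cas proteins are all located on the same vector.
In certain preferred embodiments, the nucleotide sequences encoding the Cas proteins and the nucleotide sequence encoding the guide RNA are all located on the same vector.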
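V. Dual-targeting Type I-C CRISPR-Cas3 system
In one aspect, the present application provides a Type I-C CRISPR-Cas3 system comprising one or more guide RNAs or a nucleotide sequence encoding the one or more guide RNAs, wherein the one or more guide RNAs comprise a direct repeat sequence, a first guide sequence capable of hybridizing with a first target sequence, and a second guide sequence capable of hybridizing with a second target sequence, and wherein the first target sequence and the second target sequence respectively flank the region to be modified (e.g. the region to be deleted) in a double-stranded target nucleic acid molecule.
In certain preferred embodiments, the first target sequence and the second target sequence are located on the two different single strands of the region to be modified; for example, the first target sequence and the second target sequence are each located 3' of the region to be modified on their respective single strands.
In certain preferred embodiments, the direct repeat sequence comprises a stem-loop structure.
In certain preferred embodiments, the direct repeat sequence is capable of binding one or more Cas proteins of the Type I-C CRISPR-Cas3 system.
In certain preferred embodiments, when the target sequence is DNA, the target sequence is located 3' of a protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTC.
In certain preferred embodiments, in the system, the direct repeat sequence comprises a first region and a second region, and the first region comprises a stem-loop structure.
In certain preferred embodiments, the first region is located 5' of the second region.
In certain preferred embodiments, additional nucleotides are present or absent between the first region and the second region.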
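In certain preferred embodiments, in the system, the single guide RNA comprises:
(i) a first copy of the direct repeat sequence, a first guide sequence capable of hybridizing with the first target sequence, a second copy of the direct repeat sequence, a second guide sequence capable of hybridizing with the second target sequence, and a third copy of the direct repeat sequence; or
(ii) the second region of a first copy of the direct repeat sequence, a first guide sequence capable of hybridizing with the first target sequence, a second copy of the direct repeat sequence, a second guide sequence capable of hybridizing with the second target sequence, and the first region of a third copy of the direct repeat sequence.
In certain preferred embodiments, in (i), the single guide RNA comprises, in the 5' to 3' direction: the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the third copy of the direct repeat sequence.
In certain preferred embodiments, in (ii), the single guide RNA comprises, in the 5' to 3' direction: the second region of the first copy of the direct repeat sequence, the first guide sequence, the second copy of the direct repeat sequence, the second guide sequence, and the first region of the third copy of the direct repeat sequence.
In certain preferred embodiments, in the system, the multiple guide RNAs comprise:
a first guide RNA comprising a direct repeat sequence and a first guide sequence capable of hybridizing with the first target sequence; and
a second guide RNA comprising a direct repeat sequence and a second guide sequence capable of hybridizing with the second target sequence.
In certain preferred embodiments, in the system:
the first guide RNA comprises two copies of the direct repeat sequence, namely a first copy and a second copy of the direct repeat sequence, and the first guide sequence located between the two copies of the repeat sequence; or the first guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the first guide sequence, and the first region of the second copy of the direct repeat sequence;
the second guide RNA comprises two copies of the direct repeat sequence, namely a first copy and a second copy of the direct repeat sequence, and the second guide sequence located between the two copies of the repeat sequence; or the second guide RNA comprises, in the 5' to 3' direction, the second region of the first copy of the direct repeat sequence, the second guide sequence, and the first region of the second copy of the direct repeat sequence.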
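In certain preferred embodiments, the system further comprises the Cas proteins of the Type I-C CRISPR-Cas3 system or nucleotide sequences encoding the Cas proteins.
In certain preferred embodiments, each of the Cas proteins further comprises an additional protein or polypeptide selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g. VP64), a transcriptional repression domain (e.g. a KRAB domain or SID domain), a nuclease domain (e.g. Fok1), an adenosine deaminase (e.g. TadA8e), a cytosine deaminase (e.g. APOBEC3), a domain having an activity selected from methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof.
In certain preferred embodiments, the additional protein or polypeptide is an NLS sequence. In certain preferred embodiments, the additional protein or polypeptide is an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3).
In certain preferred embodiments, in the system, the Cas proteins comprise a Cas3 protein, a Cas5c protein, a Cas8c protein and a Cas7 protein. In certain preferred embodiments, the Cas3, Cas5c, Cas8c and Cas7 proteins are as defined in Part I.
In certain preferred embodiments, in the system, the Cas proteins comprise a Cas3 protein, a Cas5c protein, a Cas8c protein, a Cas7 protein and a Cas11c protein. In certain preferred embodiments, the Cas3, Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined in Part I.
In certain preferred embodiments, the Cas proteins comprise a Cas5c protein, a Cas8c protein, a Cas7 protein and a Cas11c protein, and do not comprise a Cas3 protein. In certain preferred embodiments, the Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined in Part I. In such embodiments, one Cas protein in the system (e.g. Cas5c, Cas8c, Cas7 or Cas11c) comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at, near or close to a terminus (e.g. the N-terminus or C-terminus) of the Cas protein. In certain preferred embodiments, the Cas8c protein in the system comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the adenosine deaminase or cytosine deaminase amino acid sequence is located at, near or close to the N-terminus of the Cas8c protein. For example, the adenosine deaminase or cytosine deaminase is linked to the protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the peptide linker sequence is as shown in SEQ ID NO: 16, 17 or 66. For example, the Cas8c protein in the system comprises TadA8e, and the Cas8c protein comprises the sequence shown in SEQ ID NO: 67.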
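VI. Dual-targeting Type I-C CRISPR-Cas3 vector system
In one aspect, the present application provides a Type I-C CRISPR-Cas3 vector system comprising one or more vectors, the one or more vectors comprising a nucleotide sequence encoding one or more guide RNAs of a Type I-C CRISPR-Cas3 system, the one or more guide RNAs being as defined in Part V.
In certain preferred embodiments, in the vector system, the one or more vectors further comprise nucleotide sequences encoding the Cas proteins of the Type I-C CRISPR-Cas3 system.
In certain preferred embodiments, each of the Cas proteins further comprises an additional protein or polypeptide selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g. VP64), a transcriptional repression domain (e.g. a KRAB domain or SID domain), a nuclease domain (e.g. Fok1), an adenosine deaminase (e.g. TadA8e), a cytosine deaminase (e.g. APOBEC3), a domain having an activity selected from methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof.
In certain preferred embodiments, the additional protein or polypeptide is an NLS sequence. In certain preferred embodiments, the additional protein or polypeptide is an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3).
In certain preferred embodiments, in the system, the Cas proteins comprise a Cas3 protein, a Cas5c protein, a Cas8c protein and a Cas7 protein. In certain preferred embodiments, the Cas3, Cas5c, Cas8c and Cas7 proteins are as defined in Part I.
In certain preferred embodiments, in the system, the Cas proteins comprise a Cas3 protein, a Cas5c protein, a Cas8c protein, a Cas7 protein and a Cas11c protein. In certain preferred embodiments, the Cas3, Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined in Part I.
In certain preferred embodiments, the Cas proteins comprise a Cas5c protein, a Cas8c protein, a Cas7 protein and a Cas11c protein, and do not comprise a Cas3 protein. In certain preferred embodiments, the Cas5c, Cas8c, Cas7 and Cas11c proteins are as defined in Part I. In such embodiments, one Cas protein in the system (e.g. Cas5c, Cas8c, Cas7 or Cas11c) comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the amino acid sequence of the adenosine deaminase or cytosine deaminase is located at, near or close to a terminus (e.g. the N-terminus or C-terminus) of the Cas protein. In certain preferred embodiments, the Cas8c protein in the system comprises an adenosine deaminase (e.g. TadA8e) or a cytosine deaminase (e.g. APOBEC3); for example, the adenosine deaminase or cytosine deaminase amino acid sequence is located at, near or close to the N-terminus of the Cas8c protein. For example, the adenosine deaminase or cytosine deaminase is linked to the protein with or without a linker; for example, the linker is a peptide linker or a non-peptide linker; for example, the peptide linker sequence is as shown in SEQ ID NO: 16, 17 or 66. For example, the Cas8c protein in the system comprises TadA8e, and the Cas8c protein comprises the sequence shown in SEQ ID NO: 67.
In certain preferred embodiments, in the vector system, the nucleotide sequence encoding the one or more guide RNAs of the Type I-C CRISPR-Cas3 system and the nucleotide sequences encoding the Cas proteins of the Type I-C CRISPR-Cas3 system are located in different expression cassettes.
In certain preferred embodiments, the nucleotide sequence encoding the Cas3 protein and the nucleotide sequences encoding the other Cas proteins are located in different expression cassettes; for example, the nucleotide sequences encoding Cas proteins that are located in the same expression cassette are joined to one another by nucleotide sequences encoding a self-cleaving peptide (e.g. T2A).
In certain preferred embodiments, in the vector system, the nucleotide sequences encoding the Cas proteins are all located on the same vector.
In certain preferred embodiments, the nucleotide sequences encoding the Cas proteins and the nucleotide sequence encoding the one or more guide RNAs are all located on the same vector.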
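Kits
In one aspect, the present application provides a kit comprising the system or vector system described in any one of Parts I-VI, and instructions for using the system for nucleic acid editing (e.g. gene or genome editing, large-fragment deletion in a gene or genome, single-base modification of a gene or genome, or genomic structural variation).
Delivery compositions
In one aspect, the present application provides a delivery composition comprising the system or vector system described in any one of Parts I-VI, and a delivery system.
In certain preferred embodiments, the delivery system is selected from particles, vesicles and viral vectors.
In certain preferred embodiments, the particles comprise lipids, sugars, metals or proteins.
In certain preferred embodiments, the vesicles comprise exosomes or liposomes.
In certain preferred embodiments, the viral vectors comprise adenoviruses, lentiviruses or adeno-associated viruses.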
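Methods
In one aspect, the present application provides a method of inducing a deletion in a target genome comprising complementary first and second nucleic acid strands, the method comprising: contacting the system or vector system described in any one of Parts I-VI with the target genome, or delivering it into a cell containing the target genome.
In certain preferred embodiments, the one or more Cas proteins contained in the system or vector system are capable of forming a complex with the guide RNA and, after the complex binds the target sequence, inducing deletion of the region containing the target sequence.
In certain preferred embodiments, the deletion is a large-fragment deletion, for example a deletion of greater than 0.1 kb, 0.2 kb, 0.5 kb, 1 kb, 10 kb, 50 kb or 100 kb, and for example of less than 500 kb, 400 kb, 300 kb or 200 kb.
In certain preferred embodiments, the one or more guide RNAs contained in the system or vector system comprise a direct repeat sequence, a first guide sequence capable of hybridizing with a first target sequence, and a second guide sequence capable of hybridizing with a second target sequence, wherein the first target sequence and the second target sequence respectively flank the region to be deleted in the target genome.
In certain preferred embodiments, the first target sequence is located on the first nucleic acid strand of the target genome and the second target sequence is located on the second nucleic acid strand of the target genome; for example, on the first nucleic acid strand the first target sequence is located 3' of the region to be deleted, and on the second nucleic acid strand the second target sequence is located 3' of the region to be deleted.
In certain preferred embodiments, the length of the region to be deleted is greater than 0.1 kb, for example greater than 0.2 kb, 0.3 kb, 0.4 kb or 0.5 kb; for example, the length of the region to be deleted is less than 500 kb, for example less than 400 kb, 300 kb or 200 kb; for example, the length of the region to be deleted is 0.2 kb-200 kb (e.g. 0.2 kb-2 kb, 0.2 kb-5 kb, 0.2 kb-10 kb, 0.2 kb-100 kb or 0.2 kb-200 kb; e.g. 0.5 kb-1.5 kb, 0.5 kb-2 kb or 0.5 kb-10 kb).
In certain preferred embodiments, the target genome is present in a cell, or the target genome is present in a nucleic acid molecule in vitro (e.g. a plasmid).
In certain preferred embodiments, the cell is a prokaryotic cell.
In certain preferred embodiments, the cell is a eukaryotic cell.
In certain preferred embodiments, the cell is selected from animal cells (e.g. mammalian cells, e.g. human cells) and plant cells (e.g. maize cells, maize protoplasts, rice cells, Arabidopsis cells or Arabidopsis protoplasts).
In certain preferred embodiments, the method is used for chromosome elimination.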
在一方面,本申请提供了一种诱导基因组结构变异的方法,所述基因组包含互补的第一核酸链和第二核酸链,所述方法包括:将I-VI任一部分中所述的系统或载体系统与靶基因组接触,或者递送至包含所述靶基因组的细胞中。
在某些优选的实施方案中,所述系统或载体系统中所包含的一种或多种cas蛋白能够与导向RNA形成复合物,并且在所述复合物与靶序列结合后,诱导对包含该靶序列的区域的缺失从而诱导基因组结构变异。
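In certain preferred embodiments, the deletion is a large-fragment deletion, for example a deletion of greater than 0.1 kb, greater than 0.2 kb, greater than 0.5 kb, greater than 1 kb, greater than 10 kb, greater than 50 kb, or greater than 100 kb, and, for example, less than 500 kb, less than 400 kb, less than 300 kb, or less than 200 kb.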
在某些优选的实施方案中,所述系统或载体系统所包含的一种或多种导向RNA包含同向重复序列、能够与第一靶序列杂交的第一导向序列以及能够与第二靶序列杂交的第二导向序列;其中,所述第一靶序列和第二靶序列分别位于所述靶基因组中待缺失区域的侧翼。
在某些优选的实施方案中,所述第一靶序列位于所述靶基因组的第一核酸链,所述第二靶序列位于所述靶基因组的第二核酸链;例如,在第一核酸链中,所述第一靶序列位于所述待缺失区域的3’端,并且,第二核酸链中,所述第二靶序列位于所述待缺失区域的3’端。
在某些优选的实施方案中,所述待缺失区域的长度大于0.1kb,例如大于0.2kb,大于0.3kb,大于0.4kb,大于0.5kb;例如,所述待缺失区域的长度小于500kb,例如小于400kb,小于300kb,小于200kb;例如所述待缺失区域的长度为0.2kb-200kb(例如0.2kb-2kb、0.2kb-5kb、0.2kb-10kb、0.2kb-100kb、0.2kb-200kb;例如0.5kb-1.5kb、0.5kb-2kb、0.5kb-10kb)。
在某些优选的实施方案中,所述靶基因组存在于细胞内,或者,所述靶基因组存在于体外的核酸分子(例如,质粒)中。
在某些优选的实施方案中,所述细胞是原核细胞。
在某些优选的实施方案中,所述细胞是真核细胞。
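In certain preferred embodiments, the cell is selected from animal cells (e.g., mammalian cells, such as human cells) and plant cells (e.g., maize cells, maize protoplasts, rice cells, Arabidopsis cells, Arabidopsis protoplasts).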
在一方面,本申请提供了修饰靶核酸分子的方法,其包括:将I-VI任一部分中所述的系统或载体系统与所述靶核酸分子接触,或者递送至包含所述靶核酸分子的细胞中。
在某些优选的实施方案中,所述系统或载体系统中所包含的一种或多种cas蛋白能够与导向RNA形成复合物,并且在所述复合物与靶序列结合后,诱导对包含该靶序列的靶核酸分子的修饰。
在某些优选的实施方案中,所述靶核酸分子是RNA或DNA。
在某些优选的实施方案中,所述靶核酸分子是双链DNA。
在某些优选的实施方案中,所述靶核酸分子是基因或基因组。
在某些优选的实施方案中,所述靶核酸分子存在于细胞内,或者,所述靶核酸分子存在于体外的核酸分子(例如,质粒)中。
在某些优选的实施方案中,所述细胞是原核细胞。
在某些优选的实施方案中,所述细胞是真核细胞。
在某些优选的实施方案中,所述细胞选自动物细胞(例如,哺乳动物细胞,例如人类细胞)、植物细胞(例如玉米细胞、玉米原生质体、水稻细胞,拟南芥细胞、拟南芥原生质体)。
在某些优选的实施方案中,所述修饰是指所述靶核酸分子的大片段缺失。
在某些优选的实施方案中,所述修饰是指所述靶核酸分子的断裂,如DNA的双链断裂;例如,所述修饰还包括将外源核酸插入所述断裂中。
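In certain preferred embodiments, the modification refers to a change of a single base (e.g., cytosine or adenine) in the target nucleic acid molecule.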
在一方面,本申请提供了一种诱导靶核酸分子产生单碱基突变的方法,其包括:将I-VI任一部分中所述的系统或载体系统与所述靶核酸分子接触,或者递送至包含所述靶核酸分子的细胞中。
在某些优选的实施方案中,所述系统或载体系统中所包含的一种或多种cas蛋白能够与导向RNA形成复合物,并且在所述复合物与靶序列结合后,诱导对包含该靶序列的靶核酸分子中单碱基的修饰,并在核酸修复或复制过程产生单碱基突变。
在某些优选的实施方案中,所述单碱基的修饰是指能改变待修饰碱基的碱基互补配对方式的修饰;例如,经修饰前,所述待修饰碱基与第一碱基互补配对,经修饰后,所述被修饰碱基与第二碱基互补配对。
在某些优选的实施方案中,所述系统或载体系统中所包含的一种或多种cas蛋白还包含腺苷脱氨酶(例如,TadA8e)或胞嘧啶脱氨酶(例如,APOBEC3)。
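In certain preferred embodiments, one or more cas proteins (e.g., the cas8c protein) comprised in the system or vector system further comprise an adenosine deaminase (e.g., TadA8e), and the base to be modified is adenine; before modification, adenine pairs with thymine; after modification, adenine is converted to hypoxanthine, and hypoxanthine pairs with cytosine.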
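In certain preferred embodiments, one or more cas proteins (e.g., the cas8c protein) comprised in the system or vector system further comprise a cytosine deaminase (e.g., APOBEC3), and the base to be modified is cytosine; before modification, cytosine pairs with guanine; after modification, cytosine is converted to uracil, and uracil pairs with adenine.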
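The change in base pairing described above can be sketched in a few lines of Python. This is an illustrative model only, not software from the application: the function name, the 0-based editing window and the example sequence are assumptions introduced here. It relies on the general facts that inosine (from deaminated adenine) is read as G and uracil (from deaminated cytosine) is read as T after repair or replication.

```python
# Illustrative model of deaminase-driven single-base edits (hypothetical helper).
DEAMINATION_READOUT = {
    "A": "G",  # adenosine deaminase: A -> inosine, pairs with C, read as G
    "C": "T",  # cytosine deaminase: C -> uracil, pairs with A, read as T
}


def simulate_base_edit(strand: str, target_base: str, window: range) -> str:
    """Return the sequence expected after repair/replication when every
    `target_base` at a 0-based position inside `window` is deaminated."""
    if target_base not in DEAMINATION_READOUT:
        raise ValueError("target_base must be 'A' or 'C'")
    new_base = DEAMINATION_READOUT[target_base]
    return "".join(
        new_base if (i in window and base == target_base) else base
        for i, base in enumerate(strand.upper())
    )


if __name__ == "__main__":
    # Hypothetical 10-nt stretch; positions 3..7 form the assumed editing window.
    print(simulate_base_edit("TTCAGCATGA", "A", range(3, 8)))
    print(simulate_base_edit("TTCAGCATGA", "C", range(3, 8)))
```

The window is a free parameter here; the application does not state an editing window for the TadA8e or APOBEC3 fusions.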
在某些优选的实施方案中,所述靶核酸分子是RNA或DNA。
在某些优选的实施方案中,所述靶核酸分子是双链DNA。
在某些优选的实施方案中,所述靶核酸分子是基因或基因组。
在某些优选的实施方案中,所述靶核酸分子存在于细胞内,或者,所述靶核酸分子存在于体外的核酸分子(例如,质粒)中。
在某些优选的实施方案中,所述细胞是原核细胞。
在某些优选的实施方案中,所述细胞是真核细胞。
在某些优选的实施方案中,所述细胞选自动物细胞(例如,哺乳动物细胞,例如人类细胞)、植物细胞(例如玉米细胞、玉米原生质体、水稻细胞,拟南芥细胞、拟南芥原生质体)。
在一方面,本申请提供了改变基因产物的表达的方法,其包括:将I-VI任一部分中所述的系统或载体系统与编码所述基因产物的靶核酸分子接触,或者递送至包含所述靶核酸分子的细胞中。
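In certain preferred embodiments, one or more cas proteins comprised in the system or vector system are capable of forming a complex with the guide RNA and, after the complex binds the target sequence, induce modification of the target nucleic acid molecule comprising that target sequence, thereby altering expression of the gene product.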
在某些优选的实施方案中,所述靶核酸分子存在于细胞内,或者所述靶核酸分子存在于体外的核酸分子(例如,质粒)中。
在某些优选的实施方案中,所述细胞是原核细胞。
在某些优选的实施方案中,所述细胞是真核细胞。
在某些优选的实施方案中,所述细胞选自动物细胞(例如,哺乳动物细胞,例如人类细胞)、植物细胞(例如玉米细胞,玉米原生质体、水稻细胞,拟南芥细胞、拟南芥原生质体)。
在某些优选的实施方案中,所述基因产物的表达被改变(例如,增强或降低)。
在某些优选的实施方案中,所述基因产物是蛋白。
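In one aspect, the present application provides a method of producing a plant having a modified trait, the method comprising contacting a plant cell with the system or vector system described in any one of Parts I-VI, or subjecting the plant cell to the method described above of inducing a deletion in a target genome, inducing genomic structural variation, modifying a target nucleic acid molecule, inducing a single-base mutation in a target nucleic acid molecule, or altering expression of a gene product, thereby modifying or editing a target nucleic acid molecule in a target gene or genome of the plant cell, and regenerating a plant from the plant cell.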
在某些优选的实施方案中,所述植物是农业植物,例如玉米、大麦、棉花、大米、大豆、小麦、水稻。
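In certain preferred embodiments, in the method of inducing a deletion in a target genome, inducing genomic structural variation, modifying a target nucleic acid molecule, inducing a single-base mutation in a target nucleic acid molecule, altering expression of a gene product, or producing a plant having a modified trait, the cas proteins or the nucleotide sequences encoding the cas proteins, and the guide RNA or the nucleotide sequences encoding the guide RNA, comprised in the system or vector system are present in a delivery system.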
在某些优选的实施方案中,所述递送系统选自粒子、囊泡或病毒载体。
在某些优选的实施方案中,所述粒子包含脂质、糖、金属或蛋白质。
在某些优选的实施方案中,所述囊泡包含外来体或脂质体。
在某些优选的实施方案中,所述病毒载体包含腺病毒、慢病毒或腺相关病毒。
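In one aspect, the present application provides use of the system or vector system, kit or delivery composition described in any one of Parts I-VI for nucleic acid editing, or in the preparation of a formulation for nucleic acid editing.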
在某些优选的实施方案中,所述核酸编辑包括基因或基因组编辑。
在某些优选的实施方案中,所述基因或基因组编辑包括核酸大片段缺失、修饰基因、敲除基因、改变基因产物的表达、修复突变、和/或插入多核苷酸、单碱基突变。
在某些优选的实施方案中,所述核酸编辑包括诱导基因组结构变异或染色体消除。
在一方面,本申请提供了I-VI任一部分中所述的系统或载体系统、试剂盒或递送组合物,在制备制剂中的用途,所述制剂用于编辑靶基因座中的靶核苷酸序列来修饰生物或非人类生物(例如植物)。
术语定义
在本申请中,除非另有说明,否则本文中使用的科学和技术名词具有本领域技术人员所通常理解的含义。并且,本文中所用的核酸化学实验室操作步骤均为相应领域内广泛使用的常规步骤。同时,为了更好地理解本发明,下面提供相关术语的定义和解释。除非在本文别处具体限定或不同地描述,否则以下与本发明有关的术语和描述应按照下面给出的定义来理解。
当本文使用术语“例如”、“如”、“诸如”、“包括”、“包含”或其变体时,这些术语将不被认为是限制性术语,而将被解释为表示“但不限于”或“不限于”。
除非本文另外指明或根据上下文明显矛盾,否则术语“一个”和“一种”以及“该”和类似指称物在描述本发明的上下文中(尤其在以下权利要求的上下文中)应被解释成覆盖单数和复数。
如本文所用,术语“Type I-C CRISPR-CAS3系统”是指包含多亚基crRNA-效应子复合物的1类CRISPR-CAS系统,更具体地涉及I型系统,甚至更具体地涉及亚型I-C系统。亚型I-C系统可以包括多个不同的CAS组件,例如包括Cas3、Cas5(例如cas5c)、Cas7和Cas8(例如,Cas8c)等CAS组件以及任选的其他CAS组件(参见例如Makarova et al.2020.Nature Reviews Microbiology 18(2):67–83.https://doi.org/10.1038/s41579-019-0299-x.、Koonin,Makarova,and Zhang 2017.Current Opinion in Microbiology 37:67–78.https://doi.org/10.1016/j.mib.2017.05.008.、Koonin and Makarova 2019.Russian Veterinary Journal 2019(2):29– 36.http://dx.doi.org/10.1098/rstb.2018.0087,上述文献全文通过引用并入本文)。在某些实施方案中,本申请中使用的CAS蛋白源自或衍生自具有天然I-C系统的原核生物,例如Desulfovibrio vulgaris str.Hildenborough(参见Hochstrasser et al.2016.Molecular Cell 63(5):840–51.https://doi.org/10.1016/j.molcel.2016.07.027.、McBride et al.2020.Molecular Cell 80(6):971-979.e7.https://doi.org/10.1016/j.molcel.2020.11.003.;上述文献全文通过引用并入本文)。但是应当理解,可以使用来自任何来源的CAS蛋白(例如,Cas3、Cas5(例如,Cas5c)、Cas7、Cas8(例如,Cas8c)、cas11c)或其衍生物。在某些实施方案中,本申请使用的不同CAS组件可源自或衍生自同一种生物或不同种生物。
在某些实施方案中,所述cas3蛋白的氨基酸序列可参见NCBI Genbank ID:504337588。然而,本领域技术人员理解,在cas3蛋白的氨基酸序列中,可天然产生或人工引入突变或变异(包括但不限于,置换,缺失和/或添加,例如不同来源的I-C CRISPR-CAS3系统中的cas3蛋白),而不影响其生物学功能。因此,在本发明中,术语“cas3蛋白”应包括所有此类序列,包括例如SEQ ID NO:1所示的序列以及其天然或人工的变体。
在某些实施方案中,所述cas5c蛋白的氨基酸序列可参见NCBI Genbank ID:499490067。然而,本领域技术人员理解,在cas5c蛋白的氨基酸序列中,可天然产生或人工引入突变或变异(包括但不限于,置换,缺失和/或添加,例如不同来源的I-C CRISPR-CAS3系统中的cas5c蛋白),而不影响其生物学功能。因此,在本发明中,术语“cas5c蛋白”应包括所有此类序列,包括例如SEQ ID NO:2所示的序列以及其天然或人工的变体。
在某些实施方案中,所述cas8c蛋白的氨基酸序列可参见NCBI Genbank ID:499490068。然而,本领域技术人员理解,在cas8c蛋白的氨基酸序列中,可天然产生或人工引入突变或变异(包括但不限于,置换,缺失和/或添加,例如不同来源的I-C CRISPR-CAS3系统中的cas8c蛋白),而不影响其生物学功能。因此,在本发明中,术语“cas8c蛋白”应包括所有此类序列,包括例如SEQ ID NO:3所示的序列以及其天然或人工的变体。
在某些实施方案中,所述cas7蛋白的氨基酸序列可参见NCBI Genbank ID:499490069。然而,本领域技术人员理解,在cas7蛋白的氨基酸序列中,可天然产生 或人工引入突变或变异(包括但不限于,置换,缺失和/或添加,例如不同来源的I-C CRISPR-CAS3系统中的cas7蛋白),而不影响其生物学功能。因此,在本发明中,术语“cas7蛋白”应包括所有此类序列,包括例如SEQ ID NO:4所示的序列以及其天然或人工的变体。
在某些实施方案中,所述cas11c蛋白的氨基酸序列可参见NCBI Genbank ID:499490068。然而,本领域技术人员理解,在cas11c蛋白的氨基酸序列中,可天然产生或人工引入突变或变异(包括但不限于,置换,缺失和/或添加,例如不同来源的I-C CRISPR-CAS3系统中的cas11c蛋白),而不影响其生物学功能。因此,在本发明中,术语“cas11c蛋白”应包括所有此类序列,包括例如SEQ ID NO:5所示的序列以及其天然或人工的变体。
如本文中所使用的,术语“导向RNA(guide RNA)”、“成熟crRNA”可互换地使用并且具有本领域技术人员通常理解的含义。一般而言,导向RNA可以包含同向(direct)重复序列和导向序列(guide sequence),或者基本上由或由同向重复序列和导向序列(在内源性CRISPR系统背景下也称为间隔序列(spacer))组成。在某些情况下,导向序列是与靶序列具有足够互补性从而与所述靶序列杂交并引导CRISPR/Cas复合物与所述靶序列的特异性结合的任何多核苷酸序列。在某些实施方案中,当最佳比对时,导向序列与其相应靶序列之间的互补程度为至少50%、至少60%、至少70%、至少80%、至少90%、至少95%、或至少99%。确定最佳比对在本领域的普通技术人员的能力范围内。例如,存在公开和可商购的比对算法和程序,诸如但不限于ClustalW、matlab中的史密斯-沃特曼算法(Smith-Waterman)、Bowtie、Geneious、Biopython以及SeqMan。
在某些情况下,所述导向序列在长度上为至少5个、至少10个、至少15个、至少16个、至少17个、至少18个、至少19个、至少20个、至少21个、至少22个、至少23个、至少24个、至少25个、至少26个、至少27个、至少28个、至少29个、至少30个、至少35个、至少40个、至少45个或至少50个核苷酸。在某些情况下,所述导向序列在长度上为不超过50个、45个、40个、35个、30个、25个、24个、23个、22个、21个、20个、15个、10个或更少个核苷酸。在某些实施方案中,所述导向序列在长度上为10-50个、或15-40个、或20-40个核苷酸。
在某些情况下,所述同向重复序列在长度上为至少10个、至少15个、至少16个、至少17个、至少18个、至少19个、至少20个、至少21个、至少22个、至少23个、 至少24个、至少25个、至少26个、至少27个、至少28个、至少29个、至少30个、至少35个、至少40个、至少45个、至少50个、至少55个、至少56个、至少57个、至少58个、至少59个、至少60个、至少61个、至少62个、至少63个、至少64个、至少65个或至少70个核苷酸。在某些情况下,所述同向重复序列在长度上为不超过70个、65个、64个、63个、62个、61个、60个、59个、58个、57个、56个、55个、50个、45个、40个、35个、30个、29个、28个、27个、26个、25个、24个、23个、22个、21个、20个、15个、10个或更少个核苷酸。在某些实施方案中,所述同向重复序列在长度上为55-70个核苷酸,例如55-65个核苷酸,例如60-65个核苷酸,例如62-65个核苷酸,例如63-64个核苷酸。在某些实施方案中,所述同向重复序列在长度上为15-40个核苷酸,例如15-38个核苷酸,例如20-40个核苷酸,例如22-38个核苷酸,例如32个核苷酸。在某些实施方案中,所述同向重复序列在长度上不少于30nt,例如30nt-37nt。
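As a rough illustration of how direct repeats and guide sequences are arranged in a guide RNA array, the sketch below interleaves guides with full direct-repeat copies (repeat, guide 1, repeat, guide 2, repeat) and checks the 10-50 nt guide length mentioned above. The function name is hypothetical, and the repeat used in the usage lines is a placeholder string, not SEQ ID NO:11.

```python
def build_crrna_array(direct_repeat: str, guides: list[str],
                      min_len: int = 10, max_len: int = 50) -> str:
    """Interleave guide sequences with direct repeats: DR-g1-DR-g2-...-DR.

    `min_len`/`max_len` mirror the 10-50 nt guide length given as one
    embodiment in the text; other embodiments use different bounds.
    """
    if not guides:
        raise ValueError("at least one guide sequence is required")
    for guide in guides:
        if not min_len <= len(guide) <= max_len:
            raise ValueError(f"guide length {len(guide)} outside {min_len}-{max_len} nt")
    parts = [direct_repeat]
    for guide in guides:
        parts.extend([guide, direct_repeat])
    return "".join(parts)


if __name__ == "__main__":
    placeholder_repeat = "D" * 32   # stands in for a 32-nt direct repeat
    first_guide = "A" * 34          # placeholder 34-nt guide sequences
    second_guide = "C" * 34
    array = build_crrna_array(placeholder_repeat, [first_guide, second_guide])
    print(len(array))               # 3 repeats + 2 guides
```

A variant that keeps only the second region of the first repeat and the first region of the last repeat would trim the two outer copies accordingly.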
如本文中所使用的,术语“CRISPR/Cas复合物”是指,导向RNA(guide RNA)或成熟crRNA与Cas蛋白结合所形成的核糖核蛋白复合体,其包含杂交到靶序列上并且与Cas蛋白结合的导向序列。该核糖核蛋白复合体能够识别并切割能与该导向RNA或成熟crRNA杂交的多核苷酸。
因此,在形成CRISPR/Cas复合物的情况下,“靶序列”是指被设计为具有靶向性的导向序列所靶向的多核苷酸,例如与该导向序列具有互补性的序列,其中靶序列与导向序列之间的杂交将促进CRISPR/Cas复合物的形成。完全互补性不是必需的,只要存在足够互补性以引起杂交并且促进一种CRISPR/Cas复合物的形成即可。靶序列可以包含任何多核苷酸,如DNA或RNA。在某些情况下,所述靶序列位于细胞的细胞核或细胞质中。在某些情况下,该靶序列可位于真核细胞的一个细胞器例如线粒体或叶绿体内。
在本发明中,表述“靶序列”或“靶多核苷酸”可以是对细胞(例如,真核细胞)而言任何内源或外源的多核苷酸。例如,该靶多核苷酸可以是一种存在于真核细胞的细胞核中的多核苷酸。该靶多核苷酸可以是一个编码基因产物(例如,蛋白质)的序列或一个非编码序列(例如,调节多核苷酸或无用DNA)。在某些情况下,据信该靶序列应该与原间隔序列临近基序(PAM)相关。对PAM的精确序列和长度要求取决于使用的Cas效应酶而不同,但是PAM典型地是临近原间隔序列(也即,靶序列)的2-5个碱基对序列。本领域技术人员能够鉴定与给定的Cas效应蛋白一起使用的PAM序列。
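For the type I-C system used in the examples of this application, the PAM is 5'-TTC and the target sequence lies immediately 3' of it. A minimal sketch of a PAM scan on both strands is given below. The function names, the default 34-nt protospacer length (taken from the examples) and the coordinate convention are assumptions made for illustration; this is not the applicants' design tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pam_sites(seq: str, pam: str = "TTC", protospacer_len: int = 34):
    """List (strand, pam_start, protospacer) for every PAM followed by a
    full-length protospacer on its 3' side.

    `pam_start` is a 0-based index on the strand that was scanned, so
    for "-" it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        start = scanned.find(pam)
        while start != -1:
            proto_start = start + len(pam)
            protospacer = scanned[proto_start:proto_start + protospacer_len]
            if len(protospacer) == protospacer_len:
                sites.append((strand, start, protospacer))
            start = scanned.find(pam, start + 1)
    return sites


if __name__ == "__main__":
    toy_sequence = "AATTC" + "G" * 34 + "A"   # hypothetical input
    for site in find_pam_sites(toy_sequence):
        print(site)
```

Pairs of hits on opposite strands that flank the region of interest would then be candidates for the two-target design described in this application.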
如本文所用,术语“腺苷脱氨酶”是指催化腺嘌呤或腺苷的水解脱氨的蛋白。在一些 实施方案中,所述腺苷脱氨酶催化腺嘌呤或腺苷在脱氧核糖核酸(DNA)中水解脱氨为肌苷。在一些实施方案中,所述腺苷脱氨酶是TadA8e。在某些实施方案中,所述腺苷脱氨酶的氨基酸序列可参见NCBI Genbank ID:UNJ19119.1或NCBI Genbank ID:QHD44350.1。然而,本领域技术人员理解,在腺苷脱氨酶的氨基酸序列中,可天然产生或人工引入突变或变异(包括但不限于,置换,缺失和/或添加,例如不同来源的腺苷脱氨酶),而不影响其生物学功能。因此,在本发明中,术语“腺苷脱氨酶”应包括所有此类序列,包括例如NCBI Genbank ID:UNJ19119.1或NCBI Genbank ID:QHD44350.1所示的序列以及其天然或人工的变体。
如本文所用,术语“胞嘧啶脱氨酶”是指催化胞苷或胞嘧啶水解脱氨的蛋白。在某些实施方案中,所述胞嘧啶脱氨酶是APOBEC3。在某些实施方案中,所述胞嘧啶脱氨酶的氨基酸序列可参见NCBI Genbank ID:76096346或NCBI Genbank ID:176865758。然而,本领域技术人员理解,在胞嘧啶脱氨酶的氨基酸序列中,可天然产生或人工引入突变或变异(包括但不限于,置换,缺失和/或添加,例如不同来源的胞嘧啶脱氨酶),而不影响其生物学功能。因此,在本发明中,术语“胞嘧啶脱氨酶”应包括所有此类序列,包括例如NCBI Genbank ID:76096346或NCBI Genbank ID:176865758所示的序列以及其天然或人工的变体。
如本文中所使用的,术语“同一性”用于指两个多肽之间或两个核酸之间序列的匹配情况。为了测定两个氨基酸序列或两个核酸序列的百分比同一性,为了最佳比较目的将序列进行比对(例如,可在第一氨基酸序列或核酸序列中引入缺口以与第二氨基酸或核酸序列最佳比对)。然后比较对应氨基酸位置或核苷酸位置处的氨基酸残基或核苷酸。当第一序列中的位置被与第二序列中的对应位置相同的氨基酸残基或核苷酸占据时,则分子在该位置上是同一的。两个序列之间的百分比同一性是由序列所共享的同一性位置的数目的函数(即,百分比同一性=同一重叠位置的数目/位置的总数×100%)。在某些实施方案中,两个序列长度相同。
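The percent-identity formula above (identical positions divided by total positions, times 100%) can be written directly as a small function. The sketch assumes the two sequences have already been aligned to equal length, with `-` marking gaps; the function name and gap character are choices made here, not part of the application.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity = identical positions / total aligned positions x 100.

    Both inputs must already be aligned (equal length, gaps as `gap`).
    A position counts as identical only if both residues match and
    neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # 4 identical positions out of 6 aligned positions
    print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
```

Alignment programs differ in whether gap columns are counted in the denominator; this sketch counts every aligned column, following the formula as stated in the text.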
两个序列之间的百分比同一性的测定还可使用数学算法来实现。用于两个序列的比较的数学算法的一个非限制性实例是Karlin和Altschul的算法,1990,Proc.Natl.Acad.Sci.U.S.A.87:2264-2268,如同Karlin和Altschul,1993,Proc.Natl.Acad.Sci.U.S.A.90:5873-5877中改进的。将这样的算法整合至Altschul等人,1990,J.Mol.Biol.215:403的NBLAST和XBLAST程序中。
如本文中所使用的,术语“载体”是指,可将多聚核苷酸插入其中的一种核酸运载工具。当载体能使插入的多核苷酸编码的蛋白获得表达时,载体称为表达载体。载体可以通过转化,转导或者转染导入宿主细胞,使其携带的遗传物质元件在宿主细胞中获得表达。载体是本领域技术人员公知的,包括但不限于:质粒;噬菌粒;柯斯质粒;人工染色体,例如酵母人工染色体(YAC)、细菌人工染色体(BAC)或P1来源的人工染色体(PAC);噬菌体如λ噬菌体或M13噬菌体及动物病毒等。可用作载体的动物病毒包括但不限于,逆转录酶病毒(包括慢病毒)、腺病毒、腺相关病毒、疱疹病毒(如单纯疱疹病毒)、痘病毒、杆状病毒、乳头瘤病毒、乳头多瘤空泡病毒(如SV40)。一种载体可以含有多种控制表达的元件,包括但不限于,启动子序列、转录起始序列、增强子序列、选择元件及报告基因。另外,载体还可含有复制起始位点。
发明的有益效果
与现有技术相比,本发明的I-C CRISPR-Cas3系统具有显著的应用价值。例如,本发明提供的I-C CRISPR-Cas3系统可以实现基因组大片段的缺失,如对于基因编码区的敲除,长的lncRNA或者增强子的敲除,以及应用于染色体消除等方面具有更大的优势。例如,本发明提供的I-C CRISPR-Cas3系统具有pre-crRNA的加工活性,其相较于Cas9系统无需tracrRNA,可以更加简便地应用于多靶点的基因编辑。例如,本发明提供的包含两个反向靶位点的导向RNA与type I-E系统利用单个靶位点的基因编辑相比可以实现基因组精确的片段缺失。
下面将结合附图和实施例对本发明的实施方案进行详细描述,但是本领域技术人员将理解,下列附图和实施例仅用于说明本发明,而不是对本发明的范围的限定。根据附图和优选实施方案的下列详细描述,本发明的各种目的和有利方面对于本领域技术人员来说将变得显然。
附图说明
图1为实施例1中载体的设计图谱。图1A为type I-C系统的载体设计,图1B为Cas11c蛋白缺失的type I-C系统(D11C)的载体设计,图1C为type I-E系统的载体设计。
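Figure 2 shows the results of the YFFP reporter assay in Example 2. Figure 2A is a schematic of the YFFP reporter system; Figure 2B shows the target design in the YFFP reporter system, where IC-1, IC-2 and IC-3 are three exemplary type I-C targets in the YFFP recombinant sequence; Figure 2C shows fluorescence microscopy of protoplasts co-transformed with the type I-C system and the YFFP reporter system.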
图3为实施例3中双荧光素酶报告系统的检测结果。图3A为双荧光素酶报告系统检测的实验流程图,图3B为type I-C系统、D11C系统、type I-E系统、Cas9系统的双荧光素酶报告系统检测结果,纵坐标表示各系统相对荧光值。
图4为实施例4中玉米内源基因编辑活性的检测。图4A为type I-C系统的PCR检测结果,图4B为D11C系统的PCR检测结果,图4C为type I-C系统在O2基因上第一个检测位点的一代测序比对结果,图4D为type I-C系统在O2基因上第二个检测位点的一代测序比对结果。
图5为实施例5中type I-C系统和type I-E系统的在玉米内源基因O2(位点1,O2-1;和,位点2,O2-2),PDL1,GL2和IPK1编辑活性比较的结果。
图6为实施例6中腺嘌呤单碱基编辑载体(I-C TadA8e)的设计图谱。
图7为实施例7中type I-C系统在玉米稳定转基因植株的基因编辑检测结果。图7A为靶向ZB7基因的双靶点设计,其中#g1和#g2分别为两个靶点;图7B为靶向GA2基因的双靶点设计,其中#g1和#g2分别为两个靶点;图7C为ZB7基因转基因植株编辑情况检测的一代测序比对结果;图7D为GA2基因转基因植株编辑情况检测的一代测序比对结果。
图8为实施例8中type I-C系统在水稻稳定转基因植株的基因编辑检测结果。图8A为靶向SLR1基因的双靶点设计,其中#g1和#g2分别为两个靶点;图8B为SLR1基因转基因植株编辑情况检测的一代测序比对结果。
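Figure 9 shows the gene-editing results of the type I-C system in Arabidopsis protoplasts in Example 9. Figure 9A shows the target design for the RBCS1B, RBCS2B and RBCS3B genes, where #g1 and #g2 are the two targets, located in a sequence homologous among the RBCS1B, RBCS2B and RBCS3B genes; Figure 9B shows the Sanger (first-generation) sequencing alignment used to assess editing in Arabidopsis protoplasts.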
序列信息
本发明涉及的部分序列的信息提供于下面的表1中。
表1:序列信息
[Table 1 is provided as images in the original publication: Figure PCTCN2022096648-appb-000001 through Figure PCTCN2022096648-appb-000015.]
具体实施方式
现参照下列意在举例说明本发明(而非限定本发明)的实施例来描述本发明。
除非特别指明,否则基本上按照本领域内熟知的以及在各种参考文献中描述的常规方法进行实施例中描述的实验和方法。例如,本发明中所使用的免疫学、生物化学、化学、分子生物学、微生物学、细胞生物学、基因组学和重组DNA等常规技术,可参见萨姆布鲁克(Sambrook)、弗里奇(Fritsch)和马尼亚蒂斯(Maniatis),《分子克隆:实验室手册》(MOLECULAR CLONING:A LABORATORY MANUAL),第 2次编辑(1989);《当代分子生物学实验手册》(CURRENT PROTOCOLS IN MOLECULAR BIOLOGY)(F.M.奥苏贝尔(F.M.Ausubel)等人编辑,(1987));《酶学方法》(METHODS IN ENZYMOLOGY)系列(学术出版公司):《PCR 2:实用方法》(PCR 2:A PRACTICAL APPROACH)(M.J.麦克弗森(M.J.MacPherson)、B.D.黑姆斯(B.D.Hames)和G.R.泰勒(G.R.Taylor)编辑(1995)),以及《动物细胞培养》(ANIMAL CELL CULTURE)(R.I.弗雷谢尼(R.I.Freshney)编辑(1987))。
另外,实施例中未注明具体条件者,按照常规条件或制造商建议的条件进行。所用试剂或仪器未注明生产厂商者,均为可以通过市购获得的常规产品。本领域技术人员知晓,实施例以举例方式描述本发明,且不意欲限制本发明所要求保护的范围。本文中提及的全部公开案和其他参考资料以其全文通过引用合并入本文。
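The compositions and sources of some of the reagents used in the following examples are as follows:
LB liquid medium: 10 g tryptone, 5 g yeast extract, 10 g NaCl, made up to 1 L and sterilized.
CTAB solution: 16.7 g CTAB (cetyltrimethylammonium bromide), 234 mL 5 M NaCl, 83.5 mL 1 M Tris-HCl (pH 8.0), 33.4 mL 0.5 M EDTA (pH 8.0), made up to 1 L with distilled water; before use, β-mercaptoethanol is added at a ratio of 100:1.
W5 solution: 154 mM NaCl, 125 mM CaCl2, 5 mM KCl, 4 mM MES, made up to 500 mL, with the pH adjusted to 5.7 with NaOH.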
MMG solution: 0.4 M mannitol, 15 mM MgCl2, 4 mM MES, made up to 10 mL.
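The plasmid maxi-prep kit was purchased from QIAGEN (Cat. No. 12963).
The Blunt-simple vector was purchased from Shanghai Yeasen Biotechnology Co., Ltd. (Cat. No. CB111-02).
E. coli DH5α competent cells were purchased from Beijing Tsingke Biotech Co., Ltd. (Cat. No. TSV-A07).
The dual-luciferase reporter assay kit was purchased from Shanghai Yeasen Biotechnology Co., Ltd. (Cat. No. 11402ES60).
The P3301 vector was purchased from Youbio (Cat. No. VT1386).
The pUC19 vector was purchased from Takara Bio (Cat. No. 3219).
Unless otherwise specified, all sequence synthesis in the following examples was performed by Nanjing GenScript Biotech Co., Ltd., and all sequencing was performed by Beijing Ruibiotech Co., Ltd. and Beijing Liuhe BGI.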
实施例1.实验相关的载体设计
1.通过NCBI数据库查找了Desulfovibrio vulgaris str.Hildenborough菌株中相应的蛋白注释,分别获得了Cas3、Cas5c、Cas8c、Cas7的蛋白质氨基酸序列信息(SEQ ID NOs:1-4),并进行了真核生物玉米的密码子优化,优化后的蛋白编码序列如SEQ ID NOs:6-10所示。
2.利用玉米UBI启动子以及T2A剪切肽设计了用于表达Cas8c、Cas7、Cas5c的单顺反子表达载体。为了检测Cas11c蛋白在真核生物编辑的作用,我们同时设计了含有Cas11c蛋白和Cas11c蛋白缺失(D11C)的载体,并在每个蛋白的N端加入了核定位信号(核定位信号氨基酸序列如SEQ ID NO:15所示)。载体结构分别如图1A和1B所示。
3.为了评估type Ⅰ-C系统的活性,我们同时根据动物细胞type Ⅰ-E系统的应用设计了植物type Ⅰ-E系统的表达载体(结构如图1C所示)作为对照组。上述所有的载体中Cas3蛋白用CMV35S启动子进行表达,导向RNA通过OsU3启动子进行表达,并将以上各个蛋白和RNA组分构建至P3301载体(购自优宝生物,货号:VT1386)进行后续的实验检测。
实施例2.YFFP报告系统检测
1.为了初步鉴定该系统能否具有DNA剪切活性,我们首先构建了YFFP的报告系统。构建方式为:将一段55bp含有type Ⅰ-C PAM识别位点的DNA序列插入至YFP的DNA序列第289个核苷酸残基之后,同时加入TGA终止密码子,构建一段两侧含有223bp同源序列的YFFP重组序列,并将其构建至PUC19载体上,用35S启动子进行表达,得到含有YFFP报告系统的重组载体。YFFP报告系统的原理如图2A所示,如果所述type Ⅰ-C系统可以在插入的DNA序列的靶位点上进行DNA的裂解,之后通过单链退火的修复途径将YFFP修复成为完整的YFP,进而在蓝色激发光下发出绿色荧光。因此,可通过荧光信号判断所述Type Ⅰ-C系统的DNA剪切活性。
2.在上述步骤1制备的重组载体上选择含有type Ⅰ-C识别PAM(TTC)的DNA序列作为靶位点,例如,如图2B中所示的IC-1、IC-2或IC-3所示的靶点,各靶点所用导向RNA序列分别如SEQ ID NO:49、50或51所示,进行P3301-type Ⅰ-C载体 (载体结构如图1A所示)的构建。
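3. Protoplast isolation
The middle section of the leaves is used for protoplast isolation; it is cut with a sharp blade into strips about 0.5 mm wide (20-30 leaf sections can be stacked and cut together);
The strips are transferred into the prepared enzyme solution and, in the dark, vacuum-infiltrated with a vacuum pump at -15 to -20 inHg for 30 minutes; digestion is then continued in the dark for 5-6 hours with gentle shaking (decolorizing shaker, 10 rpm);
After digestion, an equal volume of W5 solution is added and the tube is shaken horizontally by hand, fairly firmly, for 10 seconds to release the protoplasts;
The protoplasts are filtered through a 40 μm nylon mesh into a 50 mL round-bottom centrifuge tube and pelleted by centrifugation at 100 g for 3 minutes in a swing-out rotor, and the supernatant is removed;
The protoplasts are resuspended in W5 and kept on ice for 30 minutes so that they settle naturally, and as much of the supernatant as possible is discarded;
The protoplasts are resuspended in an appropriate volume of MMG solution to a concentration of 2×10^6/mL, as counted with a hemocytometer.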
4.将上述步骤1和2中构建完成的载体进行原生质体的共转化,28℃培养48h后用荧光显微镜进行观察。
结果如图2C所示,在荧光显微镜下观察到,转化type Ⅰ-C系统的原生质体中具有绿色荧光信号的细胞,表明实施例1中如图1A构建的载体具有DNA裂解的活性。
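The hemocytometer count used in step 3 to reach 2×10^6 protoplasts/mL can be worked through as follows. The sketch assumes a standard chamber in which one large (1 mm × 1 mm) square under a 0.1 mm depth holds 0.1 μL, so the conversion factor is 10^4; that chamber geometry, the function names and the example counts are assumptions, since the application does not specify them.

```python
def protoplasts_per_ml(counts_per_large_square: list[int],
                       dilution_factor: float = 1.0) -> float:
    """Concentration from hemocytometer counts.

    Assumes each counted square is 1 mm x 1 mm x 0.1 mm (0.1 uL), hence
    cells/mL = mean count x dilution x 10^4.
    """
    if not counts_per_large_square:
        raise ValueError("at least one square must be counted")
    mean_count = sum(counts_per_large_square) / len(counts_per_large_square)
    return mean_count * dilution_factor * 1e4


def resuspension_volume_ml(total_protoplasts: float,
                           target_per_ml: float = 2e6) -> float:
    """MMG volume needed to bring the pellet to the target concentration."""
    return total_protoplasts / target_per_ml


if __name__ == "__main__":
    concentration = protoplasts_per_ml([190, 205, 210, 195])  # hypothetical counts
    print(concentration)
    print(resuspension_volume_ml(1e7))
```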
实施例3.双荧光素酶报告系统的检测
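1. To further determine whether the Cas11c protein is required for editing in eukaryotes, we constructed a dual-luciferase reporter system based on the maize protoplast transformation system. Its construction is similar to that of the YFFP reporter system in Example 2: a 55 bp DNA sequence containing a type I-C PAM recognition site was inserted at nucleotide position 1190 of FLuc together with a TGA stop codon, 780 bp homology arms were designed on both sides of the inserted DNA, and expression of the modified nucleotide sequence was driven by the 35S promoter. In addition, the pUC19 vector was used as the backbone, with the 35S promoter driving RLuc expression, as the internal-reference expression vector. The workflow of the dual-luciferase reporter assay is shown in Figure 3A: if the Cas system (the I-C, D11C, Cas9 or type I-E system) cleaves the DNA at the target site in the inserted sequence, the disrupted FLuc is subsequently repaired into intact FLuc through the single-strand annealing repair pathway, restoring luciferase activity. The DNA cleavage activity of the Cas system can therefore be assessed by measuring luciferase activity.
2. DNA sequences containing the type I-C system PAM (TTC) were selected to construct the vectors shown in Figures 1A and 1B of Example 1, named I-C and D11C respectively; in parallel, DNA sequences with the Cas9 system PAM (NGG) were selected to construct the Cas9 control vector (see ref: Jinjie Zhu et al. 2015), and DNA sequences with the I-E system PAM (AAG) were selected to construct the type I-E system vector (ref: Hiroyuki Morisaka et al. 2019).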
3.将上述步骤2构建的I-C、D11C、Cas9和I-E载体分别与上述步骤1构建的双荧光素酶报告系统进行原生质体共转化。
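4. For each transformation, the four replicates of transformed protoplasts were pooled into one sample in a 1.5 mL centrifuge tube and centrifuged at 8000 rpm for 3 min; the supernatant was removed, 500 μL of the lysis buffer from the dual-luciferase reporter assay kit was added, and the sample was incubated on ice for 5 min. After centrifugation at 12000 rpm for 1 min, 20 μL of the supernatant was transferred to a black microplate, 100 μL of firefly luciferase reaction solution was added, and the signal was read on a plate reader. Then 100 μL of Renilla luciferase reaction solution was added and the signal was read again. All data were processed and plotted with GraphPad Prism 8.0.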
各编辑系统(type Ⅰ-C系统、D11C系统、type Ⅰ-E系统、Cas9系统)的双荧光素酶报告系统检测结果如图3B所示,由图可知,Ⅰ-C系统、Ⅰ-E系统、Cas9系统的相对荧光值都要高于未处理的实验组(约为未处理实验组的3-5倍);而缺少了Cas11c表达的系统(即D11C系统)的相对荧光值与未处理的实验组没有显著性差异,说明Cas11c蛋白在真核生物的编辑过程具有重要的作用。
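The relative value plotted in Figure 3B is not defined in detail in the text. A common way to process dual-luciferase readings, shown here as an assumption rather than as the applicants' procedure, is to divide each firefly (FLuc) reading by the matching Renilla (RLuc) reading and then express the mean ratio relative to the untreated control. All readings below are invented for illustration.

```python
from statistics import mean


def fluc_rluc_ratio(fluc: float, rluc: float) -> float:
    """Firefly signal normalized to the Renilla internal reference."""
    if rluc <= 0:
        raise ValueError("Renilla reading must be positive")
    return fluc / rluc


def fold_over_control(sample_reads, control_reads) -> float:
    """Mean FLuc/RLuc ratio of a sample divided by that of the control.

    Each argument is a list of (fluc, rluc) pairs, one pair per well.
    """
    sample = mean(fluc_rluc_ratio(f, r) for f, r in sample_reads)
    control = mean(fluc_rluc_ratio(f, r) for f, r in control_reads)
    return sample / control


if __name__ == "__main__":
    untreated = [(100.0, 100.0), (100.0, 100.0)]   # hypothetical wells
    type_ic = [(300.0, 100.0), (500.0, 100.0)]
    print(fold_over_control(type_ic, untreated))
```

With this normalization, a value near 1 means no detectable restoration of FLuc, which is how the D11C result above would appear.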
实施例4.玉米内源基因编辑活性的检测
1.为了进一步检测Ⅰ-C系统在真核生物内源基因的编辑活性,我们选择玉米O2基因(GRMZM2G015534)、PDL1基因(GRMZM2G091481)、GL2基因(GRMZM2G098239)作为靶向的基因。为了提高基因片段缺失长度的准确度,我们针对每个检测位点分别设计了两个反向双靶点。例如,针对O2基因的两个检测位点,所述双靶点的设计如图4C中c1和c2(第一个检测位点,O2-1),和,如图4D中c3和c4所示(第二个检测位点,O2-2)。
2.根据步骤1中的靶位点设计方法,我们选择具有5’-TTC特征的DNA序列(34nt)作为靶向位点,并选择两个反向距离1kb左右的DNA序列(34nt)作为间隔序列进行U3-RNA载体的构建。之后将每个构建完成的U3-RNA载体连接至p3301载体上。为了进一步评估Cas11c蛋白的作用,我们将靶向O2基因的U3-RNA载体同时连接至Cas11c蛋白缺失的p3301载体上。
3.为了评估type Ⅰ-C系统的活性,我们选择了type Ⅰ-E系统作为活性参照,并参 考上述步骤2中type Ⅰ-C的双靶点设计,选择具有5’-AAG特征的DNA序列(32nt)作为靶向位点,设计了基于type Ⅰ-E系统的载体。靶点设计方法与type Ⅰ-C系统相同,两个反向靶位点之间的距离为1kb左右。
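4. The vectors constructed in steps 2 and 3 were each transformed into protoplasts, and maize genomic DNA was extracted after 48 h of culture at 28°C. Primers were designed to amplify a region extending about 1 kb upstream and downstream of the targets; the amplified products were ligated into the Blunt-simple vector, and 96 recombinant clones were picked at random, checked by colony PCR with the M13F/M13R primer pair, and analyzed by gel electrophoresis. As shown in Figure 4A, editing products of the type I-C system at the O2 locus gave PCR bands (about 1 kb; lanes marked with asterisks in Figure 4A) shorter than the amplicon from the wild-type genome (2 kb). The PCR products marked with asterisks were Sanger-sequenced with M13F, and the reads were aligned to the B73 reference genome; all of these sequences contained large deletions, and the deleted fragments lay mainly between the two targets (Figures 4C and 4D).
5. Using the same detection method as in step 4, we also amplified the O2 locus from DNA of protoplasts transformed with the D11C vector; the amplified products were ligated into the Blunt-simple vector, and 96 recombinant clones were picked at random, checked by colony PCR with the M13F/M13R primer pair, and then analyzed by gel electrophoresis. As shown in Figure 4B, the PCR products of all 96 recombinant clones were the same size as the amplicon from the wild-type genome (2 kb); that is, the D11C vector lacking the Cas11c protein cannot effectively edit endogenous maize genes. This result further confirms that the Cas11c protein plays an important role when the type I-C system is used for genome editing in eukaryotes.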
6.将步骤4中的一代测序结果与原始的基因组进行BLAST比对,通过比对发现缺失的片段大部分都位于两个靶点之间,如图4C所示,第一个检测位点的缺失片段长度为904bp-944bp,如图4D所示,第二个检测位点的缺失片段长度为894bp-1282bp。初步统计编辑的效率,第一个检测位点的编辑效率为5.21%,第二个检测位点的编辑效率为15.58%。
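The clone-based estimate in steps 4-6 amounts to counting clones whose amplicon is markedly shorter than the wild-type amplicon and dividing by the number of clones analyzed. The sketch below shows that arithmetic. The 100 bp threshold (matching the "greater than 0.1 kb" wording for large deletions) and the individual clone sizes are assumptions for illustration; the application reports only the resulting percentages and deletion-length ranges.

```python
def deletion_lengths(amplicon_sizes_bp: list[int], wild_type_bp: int,
                     min_deletion_bp: int = 100) -> list[int]:
    """Deletion length for each clone at least `min_deletion_bp` shorter
    than the wild-type amplicon."""
    return [
        wild_type_bp - size
        for size in amplicon_sizes_bp
        if wild_type_bp - size >= min_deletion_bp
    ]


def editing_efficiency(amplicon_sizes_bp: list[int], wild_type_bp: int,
                       min_deletion_bp: int = 100) -> float:
    """Percentage of analyzed clones that carry a large deletion."""
    if not amplicon_sizes_bp:
        raise ValueError("no clones analyzed")
    edited = deletion_lengths(amplicon_sizes_bp, wild_type_bp, min_deletion_bp)
    return 100.0 * len(edited) / len(amplicon_sizes_bp)


if __name__ == "__main__":
    # Hypothetical set of 96 clones: 91 wild-type (2000 bp) and 5 shortened.
    clones = [2000] * 91 + [1096, 1080, 1070, 1060, 1056]
    print(deletion_lengths(clones, wild_type_bp=2000))
    print(round(editing_efficiency(clones, wild_type_bp=2000), 2))
```

Five deletion clones out of 96 is one count that would reproduce the 5.21% figure reported for the first site; the actual counts are not given in the text.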
实施例5.Type Ⅰ-C系统和type Ⅰ-E系统的真核编辑活性比较
1.根据实施例4中的实验方法,我们检测了由type Ⅰ-E系统即图1C中所示载体转化的原生质体的DNA,并将其PCR扩增产物进行一代测序,测序结果表明,type Ⅰ-E系统所产生的片段的缺失也是主要位于两个靶点之间。而之前文献(ref:Dolan et al.2019)(ref:Hiroyuki Morisaka et al.2019)中所使用的单个靶点的编辑结果主要造成随机的大片段的缺失,因此,本发明两个反向的靶位点的设计弥补了type Ⅰ系统对于片 段随机长度缺失的不足,提高了片段缺失长度的准确度。
2.type Ⅰ-E系统和本发明的type Ⅰ-C在玉米内源基因O2(位点1,O2-1;和,位点2,O2-2),PDL1,GL2和IPK1的编辑效率如图5所示,由图可知,type Ⅰ-C系统的编辑效率在5%-55%之间,平均效率为23.14%,type Ⅰ-E系统的编辑效率在4%-45%之间,平均效率为14.87%。因此,在我们目前所检测的玉米内源基因中,type Ⅰ-C系统的编辑效率远高于目前已经应用于动物细胞编辑的type Ⅰ-E系统。
实施例6.Type Ⅰ-C系统用于腺嘌呤碱基编辑
1.腺嘌呤单碱基编辑载体(I-C TadA8e)设计
利用玉米UBI启动子以及T2A剪切肽设计了用于表达Cas7、Cas5c、Cas11c的单顺反子表达载体,TadA8e-Cas8c融合蛋白用CMV35S启动子进行表达,并在每个蛋白的N端加入了核定位信号(核定位信号氨基酸序列如SEQ ID NO:15所示)。导向RNA通过OsU3启动子进行表达,并将以上各个蛋白和RNA组分构建至P3301载体(购自优宝生物,货号:VT1386)进行后续的实验。载体设计图谱如图6所示。
2.选取玉米基因组上含有type I-C识别PAM(TTC)的DNA序列作为靶序列,进行腺嘌呤单碱基编辑载体(I-C TadA8e)的构建。
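3. The vector constructed above was transformed into maize protoplasts; DNA was extracted after transformation, the regions upstream and downstream of the target site were PCR-amplified, the PCR products were ligated into the B vector and sequenced, and the sequencing results were examined for A-to-G base substitutions near the target sequence.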
实施例7.Type Ⅰ-C系统在玉米稳定转基因植株的基因编辑检测
1.为了检测Type I-C系统在玉米稳定转基因植株的编辑效率,我们选择了玉米ZB7基因(GRMZM2G027059)、GA2基因(GRMZM2G368411)作为靶向基因,并在每个基因上设计两个反向的靶点,如图7所示。例如,针对ZB7基因的两个检测位点,所述双靶点的设计如图7A中#g1和#g2。
2.根据步骤1中的靶位点设计方法,我们选择具有5’-TTC特征的DNA序列(34nt)作为靶向位点,并选择两个反向距离1kb左右的DNA序列(34nt)作为间隔序列进行U3-RNA载体的构建。之后将每个构建完成的U3-RNA载体连接至p3301载体上。
3.将步骤2中构建好的载体进行农杆菌的转化和愈伤组织的再生,提取T0代转基因植株的叶片的DNA并进行PCR扩增。检测方法同实施例4中步骤4的检测方法,在靶点的上下游500bp附近设计基因组特异性的引物进行PCR扩增,并将扩增的PCR连接Blunt-simple载体,随机挑选24个重组的克隆用M13F/M13R引物对进行菌P检测和一代测序,将一代测序的结果与参考基因的基因组序列进行比对,比对结果如图7C(ZB7基因)和7D(GA2基因)所示。
4.根据步骤3的一代测序结果,每个转基因事件含有1个及以上的缺失的克隆认为该转基因事件为基因编辑阳性植株,统计ZB7基因和GA2基因的基因编辑阳性植株的比例分别为86.67%和60%,如表2所示。
表2 基因编辑阳性植株的比例
Figure PCTCN2022096648-appb-000016
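The criterion in step 4 (a transgenic event is scored as editing-positive when at least one of its sequenced clones carries a deletion) can be expressed as a short calculation. The event names and clone counts below are hypothetical; the application gives only the resulting proportions.

```python
def positive_event_rate(deletion_clones_per_event: dict[str, int],
                        min_deletion_clones: int = 1) -> float:
    """Percentage of transgenic events with at least `min_deletion_clones`
    deletion-carrying clones among those sequenced."""
    if not deletion_clones_per_event:
        raise ValueError("no transgenic events provided")
    positives = sum(
        1 for n in deletion_clones_per_event.values()
        if n >= min_deletion_clones
    )
    return 100.0 * positives / len(deletion_clones_per_event)


if __name__ == "__main__":
    # Hypothetical: 15 events, 13 of which show one or more deletion clones.
    events = {f"event_{i}": 3 for i in range(13)}
    events.update({"event_13": 0, "event_14": 0})
    print(round(positive_event_rate(events), 2))
```

Thirteen positive events out of 15 is one combination consistent with the 86.67% reported for ZB7.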
实施例8.Type Ⅰ-C系统在水稻稳定转基因植株的基因编辑检测
1.为了检测Type I-C系统在水稻稳定转基因植株的编辑效率,我们选择了水稻SLR1基因(LOC_Os03g49990)作为靶向基因,并在该个基因上设计两个反向的靶点,所述双靶点的设计如图8A中#g1和#g2。
2.根据步骤1中的靶位点设计方法,我们选择具有5’-TTC特征的DNA序列(34nt)作为靶向位点,并选择两个反向距离1kb左右的DNA序列(34nt)作为间隔序列进行U3-RNA载体的构建。之后将每个构建完成的U3-RNA载体连接至p1300载体上。
3.将步骤2中构建好的载体进行农杆菌的转化和愈伤组织的再生,提取T0代转基因植株的叶片的DNA并进行PCR扩增。检测方法同实施例7中步骤3的检测方法,在靶点的上下游500bp附近设计基因组特异性的引物进行PCR扩增,并将扩增的PCR连接Blunt-simple载体,随机挑选24个重组的克隆用M13F/M13R引物对进行菌P检测和一代测序,将一代测序的结果与参考基因的基因组序列进行比对,比对结果如图8B所示。
4.根据步骤3的一代测序结果,每个转基因事件含有1个及以上的缺失的克隆认为该转基因事件为基因编辑阳性植株,根据表3的统计结果,在水稻稳定转基因植株的T0代 中基因编辑阳性植株的比例为80%。
表3 基因编辑阳性植株的比例
Figure PCTCN2022096648-appb-000017
实施例9.Type Ⅰ-C系统在拟南芥原生质体的基因编辑检测
1.为了检测Type I-C系统在拟南芥原生质体的编辑效率,我们选择了拟南芥RBCS1B基因(AT5G38430)、RBCS2B基因(AT5G38420)、RBCS3B基因(AT5G38410)作为靶向基因,并在这3个基因上的同源区域设计了两个反向的靶点,如图9A所示,所述双靶点的设计如图9A中#g1和#g2。
2.根据步骤1中的靶位点设计方法,我们选择具有5’-TTC特征的DNA序列(34nt)作为靶向位点,并选择两个反向距离最大为7kb左右的DNA序列(34nt)作为间隔序列进行U3-RNA载体的构建。之后将每个构建完成的U3-RNA载体连接至p1300载体上。
3.将步骤2中构建好的载体进行拟南芥原生质体的转化,22℃暗培养48h后提取原生质体的DNA并进行PCR扩增。检测方法同实施例4中步骤4的检测方法,在靶点的上下游500bp附近设计基因组特异性的引物进行PCR扩增,并将扩增的PCR连接Blunt-simple载体,随机挑选96个重组的克隆用M13F/M13R引物对进行菌P检测和一代测序,将一代测序的结果与参考基因的基因组序列进行比对,比对结果如图9B所示。根据一代测序结果,type I-C系统在拟南芥原生质体的编辑效率为7.29%(如表4所示)。
表4 编辑效率统计
Figure PCTCN2022096648-appb-000018
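Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that, in light of all the teachings that have been disclosed, various modifications and changes may be made to the details, and all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.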

Claims (52)

  1. 一种Type I-C CRISPR-Cas3系统,其包含:
    (1)cas5c蛋白或编码cas5c蛋白的核苷酸序列;
    (2)cas8c蛋白或编码cas8c蛋白的核苷酸序列;
    (3)cas7蛋白或编码cas7蛋白的核苷酸序列;以及,
    (4)cas11c蛋白或编码cas11c蛋白的核苷酸序列。
  2. 权利要求1所述的系统,其中,所述系统还包括:(5)cas3蛋白或编码cas3蛋白的核苷酸序列。
  3. 权利要求1或2所述的系统,其中,(1)-(5)任一项中所述的蛋白任选地包含另外的蛋白或多肽,所述另外的蛋白或多肽选自表位标签、报告基因序列、核定位信号(NLS)序列、靶向部分、转录激活结构域(例如,VP64)、转录抑制结构域(例如,KRAB结构域或SID结构域)、核酸酶结构域(例如,Fok1),腺苷脱氨酶(例如,TadA8e),胞嘧啶脱氨酶(例如,APOBEC3),具有选自下列的活性的结构域:甲基化酶活性,去甲基化酶活性,转录激活活性,转录抑制活性,转录释放因子活性,组蛋白修饰活性,核酸酶活性,单链RNA切割活性,双链RNA切割活性,单链DNA切割活性,双链DNA切割活性和核酸结合活性;以及其任意组合;
    例如,(1)-(5)任一项中所述的蛋白中的至少1个(例如至少2个,至少3个,至少4个或全部5个)包含所述另外的蛋白或多肽;例如,(1)-(5)每一项中所述的蛋白均包含所述另外的蛋白或多肽;
    例如,所述另外的蛋白或多肽是NLS序列;例如,(1)-(5)每一项中所述的蛋白均包含NLS序列;
    例如,所述NLS序列如SEQ ID NO:15所示;
    例如,所述另外的蛋白或多肽通过接头或者不通过接头与所述蛋白连接;
    例如,所述接头是肽接头或非肽接头;
    例如,所述肽接头序列如SEQ ID NO:16、17或66所示;
    例如,所述NLS序列位于、靠近或接近所述蛋白的末端(例如,N端或C端);
    例如,所述另外的蛋白或多肽是腺苷脱氨酶(例如,TadA8e)或胞嘧啶脱氨酶(例如,APOBEC3);例如,(1)-(4)任一项中所述的蛋白中的1个包含腺苷脱氨酶(例如,TadA8e)或胞嘧啶脱氨酶(例如,APOBEC3);
    例如,所述腺苷脱氨酶或胞嘧啶脱氨酶的氨基酸序列位于、靠近或接近所述蛋白(例如cas8c蛋白)的末端(例如,N端或C端);
    例如,所述腺苷脱氨酶或胞嘧啶脱氨酶氨基酸序列位于、靠近或接近所述cas8c蛋白的N端。
  4. 权利要求1-3任一项所述的系统,其中:
    (1)所述cas5c蛋白包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:2所示的序列;(ii)与SEQ ID NO:2所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:2所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列;
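    (2) the cas8c protein comprises, or consists of, a sequence selected from: (i) the sequence shown in SEQ ID NO:3; (ii) a sequence having substitution, deletion or addition of one or more amino acids (e.g., substitution, deletion or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acids) compared with the sequence shown in SEQ ID NO:3; or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in SEQ ID NO:3;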
    (3)所述cas7蛋白包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:4所示的序列;(ii)与SEQ ID NO:4所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:4所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列;
    (4)所述cas11c蛋白包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:5所示的序列;(ii)与SEQ ID NO:5所示的序列相比具有一个或多个氨基酸的置 换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:5所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列;
    优选地,所述cas3蛋白包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:1所示的序列;(ii)与SEQ ID NO:1所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:1所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列。
  5. 权利要求1-4任一项所述的系统,其中:
    (1)所述cas5c蛋白包含NLS序列,并且包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:19所示的序列;(ii)与SEQ ID NO:19所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:19所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列;
    (2)所述cas8c蛋白包含NLS序列,并且包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:21所示的序列;(ii)与SEQ ID NO:21所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:21所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列;
    (3)所述cas7蛋白包含NLS序列,并且包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:20所示的序列;(ii)与SEQ ID NO:20所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8 个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:20所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列;
    (4)所述cas11c蛋白包含NLS序列,并且包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:22所示的序列;(ii)与SEQ ID NO:22所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:22所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列;
    优选地,所述cas3蛋白包含NLS序列,并且包含选自下列的序列,或由选自下列的序列组成:(i)SEQ ID NO:18所示的序列;(ii)与SEQ ID NO:18所示的序列相比具有一个或多个氨基酸的置换、缺失或添加(例如1个,2个,3个,4个,5个,6个,7个,8个,9个或10个氨基酸的置换、缺失或添加)的序列;或(iii)与SEQ ID NO:18所示的序列具有至少80%、至少85%、至少90%、至少91%、至少92%、至少93%、至少94%、至少95%、至少96%、至少97%、至少98%、或至少99%的序列同一性的序列。
  6. 权利要求1-5任一项所述的系统,其中,所述系统不包含cas3蛋白或编码cas3蛋白的核苷酸序列;
    例如,所述系统中的一个cas蛋白(例如cas5c、cas8c蛋白、cas7或cas11c)包含腺苷脱氨酶(例如,TadA8e)或胞嘧啶脱氨酶(例如,APOBEC3);例如,所述腺苷脱氨酶或胞嘧啶脱氨酶的氨基酸序列位于、靠近或接近所述cas蛋白的末端(例如,N端或C端);
    例如,所述系统中的cas8c蛋白包含腺苷脱氨酶(例如,TadA8e)或胞嘧啶脱氨酶(例如,APOBEC3);
    例如,所述腺苷脱氨酶或胞嘧啶脱氨酶氨基酸序列位于、靠近或接近所述cas8c蛋白的N端;
    例如,所述腺苷脱氨酶或胞嘧啶脱氨酶通过接头或者不通过接头与所述蛋白连接;
    例如,所述接头是肽接头或非肽接头;
    例如,所述肽接头序列如SEQ ID NO:16、17或66所示;
    例如,所述系统中的cas8c蛋白包含TadA8e,所述cas8c蛋白包含如SEQ ID NO:67所示的序列。
  7. 权利要求1-6任一项所述的系统,其进一步包含Type I-C CRISPR-Cas3系统的导向RNA(guide RNA)或编码所述导向RNA的核苷酸序列;其中,所述导向RNA包含同向重复序列以及能够与靶序列杂交的导向序列;
    例如,所述同向重复序列包含茎环结构;
    例如,所述同向重复序列能够与所述系统中的一种或多种cas蛋白结合;例如,所述同向重复序列能够与选自cas5c蛋白、cas8c蛋白、cas7蛋白、cas11c蛋白中的一种或多种蛋白结合;例如,所述导向RNA能够与cas5c蛋白、cas8c蛋白、cas7蛋白、cas11c蛋白形成的Cascade复合物结合;
    例如,当所述靶序列为DNA时,所述靶序列位于原间隔序列临近基序(PAM)的3’端,并且所述PAM具有5’TTC-所示的序列。
  8. 权利要求7所述的系统,其中,所述同向重复序列包含第一区域和第二区域,所述第一区域包含茎环结构;
    例如,所述第一区域位于所述第二区域的5’端;
    例如,所述第一区域与所述第二区域之间含有或不含有多余核苷酸。
  9. 权利要求8所述的系统,其中,所述导向RNA包含两个拷贝的同向重复序列,即,同向重复序列的第一拷贝和同向重复序列的第二拷贝,以及位于所述同向重复序列第一拷贝和同向重复序列第二拷贝之间的导向序列。
  10. 权利要求8所述的系统,其中,所述导向RNA包含同向重复序列第一拷贝的第二区域,导向序列,以及同向重复序列第二拷贝的第一区域;
    优选地,所述导向序列位于所述同向重复序列第一拷贝的第二区域和所述同向重复序 列第二拷贝的第一区域之间;
    优选地,所述同向重复序列第一拷贝的第二区域位于所述导向序列的5’端,并且,所述同向重复序列第二拷贝的第一区域位于所述导向序列的3’端;
    优选地,所述同向重复序列第一拷贝的第二区域与所述导向序列之间含有或不含有多余核苷酸;
    优选地,所述导向序列与所述同向重复序列第二拷贝的第一区域之间含有或不含有多余核苷酸。
  11. 权利要求7所述的系统,其中,当所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白如权利要求4或5中定义时,所述同向重复序列包含SEQ ID NO:11所示的序列或由SEQ ID NO:11所示的序列组成。
  12. 权利要求8所述的系统,其中,当所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白如权利要求4或5中定义时,所述同向重复序列的第一区域包含SEQ ID NO:13所示的序列或由SEQ ID NO:13所示的序列组成,所述同向重复序列的第二区域包含SEQ ID NO:14所示的序列或由SEQ ID NO:14所示的序列组成。
  13. 权利要求1-6任一项所述的系统,其进一步包含Type I-C CRISPR-Cas3系统的一种或多种导向RNA或编码所述一种或多种导向RNA的核苷酸序列;其中,所述一种或多种导向RNA包含同向重复序列、能够与第一靶序列杂交的第一导向序列以及能够与第二靶序列杂交的第二导向序列;
    其中,所述第一靶序列和第二靶序列分别位于双链靶核酸分子中待修饰区域(例如待缺失区域)的侧翼;
    例如,所述第一靶序列和第二靶序列分别位于所述待修饰区域的两条单链上;例如,所述第一靶序列和第二靶序列在各自单链中分别位于所述待修饰区域的3’端;
    例如,所述同向重复序列包含茎环结构;
    例如,所述同向重复序列能够与所述系统中的一种或多种cas蛋白结合;例如,所述同向重复序列能够与选自cas5c蛋白、cas8c蛋白、cas7蛋白、cas11c蛋白中的一种或多种蛋白结合;例如,所述导向RNA能够与cas5c蛋白、cas8c蛋白、cas7蛋白、cas11c蛋 白形成的Cascade复合物结合;
    例如,当所述靶序列为DNA时,所述靶序列位于原间隔序列临近基序(PAM)的3’端,并且所述PAM具有5’TTC-所示的序列。
  14. 权利要求13所述的系统,其中,所述同向重复序列包含第一区域和第二区域,所述第一区域包含茎环结构;
    例如,所述第一区域位于所述第二区域的5’端;
    例如,所述第一区域与所述第二区域之间含有或不含有多余核苷酸。
  15. 权利要求13或14所述的系统,其中,所述一种导向RNA包含:
    (i)同向重复序列的第一拷贝,能够与第一靶序列杂交的第一导向序列,同向重复序列的第二拷贝,能够与第二靶序列杂交的第二导向序列,同向重复序列的第三拷贝;或者,
    (ii)同向重复序列的第一拷贝的第二区域,能够与第一靶序列杂交的第一导向序列,同向重复序列的第二拷贝,能够与第二靶序列杂交的第二导向序列,同向重复序列的第三拷贝的第一区域;
    优选地,(i)中,所述一种导向RNA从5’至3’方向包含:所述同向重复序列的第一拷贝,所述第一导向序列,所述同向重复序列的第二拷贝,所述第二导向序列,所述同向重复序列的第三拷贝;优选地,(ii)中,所述一种导向RNA从5’至3’方向包含:所述同向重复序列的第一拷贝的第二区域,所述第一导向序列,所述同向重复序列的第二拷贝,所述第二导向序列,所述同向重复序列的第三拷贝的第一区域;
    例如,当所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白如权利要求4或5中定义时,所述同向重复序列如SEQ ID NO:11所示;
    例如,当所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白如权利要求4或5中定义时,所述同向重复序列的第一区域包含SEQ ID NO:13所示的序列或由SEQ ID NO:13所示的序列组成,所述同向重复序列的第二区域包含SEQ ID NO:14所示的序列或由SEQ ID NO:14所示的序列组成。
  16. 权利要求13或14所述的系统,其中,所述多种导向RNA包含:
    包含同向重复序列以及能够与第一靶序列杂交的第一导向序列的第一导向RNA;和
    包含同向重复序列以及能够与第二靶序列杂交的第二导向序列的第二导向RNA。
  17. 权利要求16所述的系统,其中:
    所述第一导向RNA包含两个拷贝的同向重复序列,即,同向重复序列的第一拷贝和同向重复序列的第二拷贝,以及位于所述两个拷贝的重复序列之间的第一导向序列;或者,所述第一导向RNA从5’至3’方向包含同向重复序列第一拷贝的第二区域,第一导向序列,以及同向重复序列第二拷贝的第一区域;
    所述第二导向RNA包含两个拷贝的同向重复序列,即,同向重复序列的第一拷贝和同向重复序列的第二拷贝,以及位于所述两个拷贝的重复序列之间的第二导向序列;或者,所述第二导向RNA从5’至3’方向包含同向重复序列第一拷贝的第二区域,第二导向序列,以及同向重复序列第二拷贝的第一区域;
    例如,当所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白如权利要求4或5中定义时,所述同向重复序列如SEQ ID NO:11所示;
    例如,当所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白如权利要求4或5中定义时,所述同向重复序列的第一区域包含SEQ ID NO:13所示的序列或由SEQ ID NO:13所示的序列组成,所述同向重复序列的第二区域包含SEQ ID NO:14所示的序列或由SEQ ID NO:14所示的序列组成。
  18. 一种Type I-C CRISPR-Cas3载体系统,其包含一种或多种载体,所述一种或多种载体包含:编码Type I-C CRISPR-Cas3系统中的cas蛋白的核苷酸序列,所述cas蛋白包含cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白;
    例如,所述cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白如权利要求3-5任一项中定义。
  19. 权利要求18所述的载体系统,其中,所述一种或多种载体还包含编码cas3蛋白的核苷酸序列;
    例如,所述cas3蛋白如权利要求3-5任一项中定义。
  20. 权利要求19所述的载体系统,其中,所述一种或多种载体包含:
    第一表达盒,其包含编码cas3蛋白的核苷酸序列;以及,
    第二表达盒,其包含编码cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白的核苷酸序列;
    例如,所述第一表达盒包含启动子,例如诱导型启动子;
    例如,所述第二表达盒包含启动子,例如诱导型启动子;
    例如,在所述第二表达盒中,所述编码cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白的核苷酸序列以任意顺序排列;
    例如,在所述第二表达盒中,所述编码cas5c蛋白、cas8c蛋白、cas7蛋白和cas11c蛋白的核苷酸序列彼此之间由编码自裂解肽(例如T2A)的核苷酸序列连接。
  21. 权利要求18所述的载体系统,其中,所述一种或多种载体不包含编码cas3蛋白的核苷酸序列;
    例如,所述系统中的cas蛋白如权利要求6中定义。
  22. 权利要求21所述的载体系统,其中,所述一种或多种载体包含:
    第一表达盒,其包含编码cas8c蛋白的核苷酸序列;以及,
    第二表达盒,其包含编码cas5c蛋白、cas7蛋白和cas11c蛋白的核苷酸序列;
    例如,所述cas8c蛋白如权利要求6中定义;
    例如,所述第一表达盒包含启动子,例如诱导型启动子;
    例如,所述第二表达盒包含启动子,例如诱导型启动子;
    例如,在所述第二表达盒中,所述编码cas5c蛋白、cas7蛋白和cas11c蛋白的核苷酸序列以任意顺序排列;
    例如,在所述第二表达盒中,所述编码cas5c蛋白、cas7蛋白和cas11c蛋白的核苷酸序列彼此之间由编码自裂解肽(例如T2A)的核苷酸序列连接。
  23. 权利要求18-22任一项所述的载体系统,其中,所述一种或多种载体还包括:包含编码Type I-C CRISPR-Cas3系统中的导向RNA的核苷酸序列,所述导向RNA如权利要求7-12任一项中定义;
    例如,所述编码Type I-C CRISPR-Cas3系统中的导向RNA的核苷酸序列位于另外的表达盒中;例如,所述另外的表达盒包含启动子,例如诱导型启动子。
  24. 权利要求18-22任一项所述的载体系统,其中,所述一种或多种载体还包括:编码Type I-C CRISPR-Cas3系统中的一种或多种导向RNA的核苷酸序列,所述一种或多种导向RNA如权利要求13-17任一项中定义;
    例如,所述编码Type I-C CRISPR-Cas3系统中的一种或多种导向RNA的核苷酸序列位于另外的表达盒中;例如,所述另外的表达盒包含启动子,例如诱导型启动子。
  25. 权利要求18-24任一项所述的载体系统,其中,所述编码cas蛋白的核苷酸序列均位于同一载体上;
    例如,所述编码cas蛋白的核苷酸序列以及编码导向RNA的核苷酸序列均位于同一载体上。
  26. 一种Type I-C CRISPR-Cas3系统,其包含:一种或多种导向RNA或编码所述一种或多种导向RNA的核苷酸序列;其中,所述一种或多种导向RNA包含同向重复序列、能够与第一靶序列杂交的第一导向序列以及能够与第二靶序列杂交的第二导向序列;
    其中,所述第一靶序列和第二靶序列分别位于双链靶核酸分子中待修饰区域(例如待缺失区域)的侧翼;
    例如,所述第一靶序列和第二靶序列分别位于所述待修饰区域的两条单链上;例如,所述第一靶序列和第二靶序列在各自单链中分别位于所述待修饰区域的3’端;
    例如,所述同向重复序列包含茎环结构;
    例如,所述同向重复序列能够与Type I-C CRISPR-Cas3系统中的一种或多种cas蛋白结合;
    例如,当所述靶序列为DNA时,所述靶序列位于原间隔序列临近基序(PAM)的3’端,并且所述PAM具有5’TTC-所示的序列。
  27. 权利要求26所述的系统,其中,所述同向重复序列包含第一区域和第二区域,所述第一区域包含茎环结构;
    优选地,所述第一区域位于所述第二区域的5’端;
    例如,所述第一区域与所述第二区域之间含有或不含有多余核苷酸。
  28. 权利要求26或27所述的系统,其中,所述一种导向RNA包含:
    (i)同向重复序列的第一拷贝,能够与第一靶序列杂交的第一导向序列,同向重复序列的第二拷贝,能够与第二靶序列杂交的第二导向序列,同向重复序列的第三拷贝;或者,
    (ii)同向重复序列的第一拷贝的第二区域,能够与第一靶序列杂交的第一导向序列,同向重复序列的第二拷贝,能够与第二靶序列杂交的第二导向序列,同向重复序列的第三拷贝的第一区域;
    优选地,(i)中,所述一种导向RNA从5’至3’方向包含:所述同向重复序列的第一拷贝,所述第一导向序列,所述同向重复序列的第二拷贝,所述第二导向序列,所述同向重复序列的第三拷贝;
    优选地,(ii)中,所述一种导向RNA从5’至3’方向包含:所述同向重复序列的第一拷贝的第二区域,所述第一导向序列,所述同向重复序列的第二拷贝,所述第二导向序列,所述同向重复序列的第三拷贝的第一区域。
  29. 权利要求26或27所述的系统,其中,所述多种导向RNA包含:
    包含同向重复序列以及能够与第一靶序列杂交的第一导向序列的第一导向RNA;和
    包含同向重复序列以及能够与第二靶序列杂交的第二导向序列的第二导向RNA。
  30. 权利要求29所述的系统,其中:
    所述第一导向RNA包含两个拷贝的同向重复序列,即,同向重复序列的第一拷贝和同向重复序列的第二拷贝,以及位于所述两个拷贝的重复序列之间的第一导向序列;或者,所述第一导向RNA从5’至3’方向包含同向重复序列第一拷贝的第二区域,第一导向序列,以及同向重复序列第二拷贝的第一区域;
    所述第二导向RNA包含两个拷贝的同向重复序列,即,同向重复序列的第一拷贝和同向重复序列的第二拷贝,以及位于所述两个拷贝的重复序列之间的第二导向序列;或者,所述第二导向RNA从5’至3’方向包含同向重复序列第一拷贝的第二区域,第二导向 序列,以及同向重复序列第二拷贝的第一区域。
  31. The system of any one of claims 26-30, wherein the system further comprises: cas proteins of a Type I-C CRISPR-Cas3 system or nucleotide sequences encoding said cas proteins;
    例如,所述cas蛋白各自还包含另外的蛋白或多肽,所述另外的蛋白或多肽选自表位标签、报告基因序列、核定位信号(NLS)序列、靶向部分、转录激活结构域(例如,VP64)、转录抑制结构域(例如,KRAB结构域或SID结构域)、核酸酶结构域(例如,Fok1),腺苷脱氨酶(例如,TadA8e),胞嘧啶脱氨酶(例如,APOBEC3),具有选自下列的活性的结构域:甲基化酶活性,去甲基化酶活性,转录激活活性,转录抑制活性,转录释放因子活性,组蛋白修饰活性,核酸酶活性,单链RNA切割活性,双链RNA切割活性,单链DNA切割活性,双链DNA切割活性和核酸结合活性;以及其任意组合;
    例如,所述另外的蛋白或多肽是NLS序列;
    例如,所述另外的蛋白或多肽是腺苷脱氨酶(例如,TadA8e)或胞嘧啶脱氨酶(例如,APOBEC3)。
  32. 权利要求31所述的系统,其中,所述cas蛋白包含cas3蛋白、cas5c蛋白、cas8c蛋白和cas7蛋白;
    例如,所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白如权利要求3-5任一项中定义。
  33. 权利要求31所述的系统,其中,所述cas蛋白包含cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白以及cas11c蛋白;
    例如,所述cas3蛋白、cas5c蛋白、cas8c蛋白、cas7蛋白以及cas11c蛋白如权利要求3-5任一项中定义。
  34. 权利要求31所述的系统,其中,所述cas蛋白包含cas5c蛋白、cas8c蛋白、cas7蛋白以及cas11c蛋白,并且不包含cas3蛋白;
    例如,所述cas5c蛋白、cas8c蛋白、cas7蛋白以及cas11c蛋白如权利要求3-5任一项中定义;
    例如,所述cas蛋白如权利要求6中定义。
  35. 一种Type I-C CRISPR-Cas3载体系统,其包含一种或多种载体,所述一种或多种载体包含:编码Type I-C CRISPR-Cas3系统中的一种或多种导向RNA的核苷酸序列,所述一种或多种导向RNA如权利要求13-17任一项中定义。
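  36. The vector system of claim 35, wherein the one or more vectors further comprise: a nucleotide sequence encoding a cas protein of the Type I-C CRISPR-Cas3 system;
    for example, each cas protein further comprises an additional protein or polypeptide selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (for example, VP64), a transcriptional repression domain (for example, a KRAB domain or a SID domain), a nuclease domain (for example, FokI), an adenosine deaminase (for example, TadA8e), a cytosine deaminase (for example, APOBEC3), a domain having an activity selected from methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, and nucleic acid binding activity; and any combination thereof;
    for example, the additional protein or polypeptide is an NLS sequence;
    for example, the additional protein or polypeptide is an adenosine deaminase (for example, TadA8e) or a cytosine deaminase (for example, APOBEC3).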
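  37. The vector system of claim 36, wherein the cas protein comprises a cas3 protein, a cas5c protein, a cas8c protein, and a cas7 protein;
    for example, the cas3 protein, cas5c protein, cas8c protein, and cas7 protein are as defined in any one of claims 3-5.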
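  38. The vector system of claim 36, wherein the cas protein comprises a cas3 protein, a cas5c protein, a cas8c protein, a cas7 protein, and a cas11c protein;
    for example, the cas3 protein, cas5c protein, cas8c protein, cas7 protein, and cas11c protein are as defined in any one of claims 3-5.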
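  39. The vector system of claim 36, wherein the cas protein comprises a cas5c protein, a cas8c protein, a cas7 protein, and a cas11c protein, and does not comprise a cas3 protein;
    for example, the cas protein is as defined in claim 6.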
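  40. The vector system of any one of claims 35-39, wherein the nucleotide sequence encoding the one or more guide RNAs of the Type I-C CRISPR-Cas3 system and the nucleotide sequence encoding the cas protein of the Type I-C CRISPR-Cas3 system are located in different expression cassettes;
    for example, the nucleotide sequence encoding the cas3 protein and the nucleotide sequences encoding the other cas proteins are located in different expression cassettes; for example, the nucleotide sequences encoding cas proteins that are located in the same expression cassette are linked to one another by a nucleotide sequence encoding a self-cleaving peptide (for example, T2A).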
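  41. The vector system of any one of claims 35-40, wherein the nucleotide sequences encoding the cas proteins are all located on the same vector;
    for example, the nucleotide sequences encoding the cas proteins and the nucleotide sequence encoding the one or more guide RNAs are all located on the same vector.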
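The cassette arrangement described in claims 40 and 41 can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the class `Cassette`, the element label `guideRNA`, the cassette names, and the helper `guides_separate_from_cas` are hypothetical and do not come from the application. Promoters and terminators are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

CAS_GENES = {"cas3", "cas5c", "cas8c", "cas7", "cas11c"}


@dataclass(frozen=True)
class Cassette:
    """One expression cassette: a name plus its coding elements in order."""
    name: str
    elements: Tuple[str, ...]

    def layout(self, linker: str = "T2A") -> str:
        # Coding sequences sharing a cassette are joined by a
        # self-cleaving peptide linker (T2A is the example in the claim).
        return f"-{linker}-".join(self.elements)


def guides_separate_from_cas(cassettes: Iterable[Cassette]) -> bool:
    """True if no cassette carries both a guide RNA and a cas gene."""
    return not any(
        "guideRNA" in c.elements and CAS_GENES & set(c.elements)
        for c in cassettes
    )


# Hypothetical single-vector design: cas3 in its own cassette, the
# remaining cas genes linked by T2A, the guide RNA in a third cassette.
vector = [
    Cassette("cas3_cassette", ("cas3",)),
    Cassette("cascade_cassette", ("cas5c", "cas8c", "cas7", "cas11c")),
    Cassette("guide_cassette", ("guideRNA",)),
]
print(vector[1].layout())
print(guides_separate_from_cas(vector))
```

Placing all three cassettes in one list corresponds to the single-vector option of claim 41; splitting the list across several vectors would model the multi-vector option.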
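  42. A kit comprising the system of any one of claims 1-17, the vector system of any one of claims 18-25, the system of any one of claims 26-34, or the vector system of any one of claims 35-41; and instructions for using the system to perform nucleic acid editing (for example, gene or genome editing, large-fragment deletion in a gene or genome, single-base modification of a gene or genome, or genomic structural variation);
    for example, the kit comprises the system of any one of claims 7-12;
    for example, the kit comprises the system of any one of claims 13-17;
    for example, the kit comprises the vector system of any one of claims 18-22;
    for example, the kit comprises the vector system of any one of claims 23-25;
    for example, the kit comprises the system of any one of claims 31-34;
    for example, the kit comprises the vector system of any one of claims 36-41.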
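  43. A delivery composition comprising the system of any one of claims 1-17, the vector system of any one of claims 18-25, the system of any one of claims 26-34, or the vector system of any one of claims 35-41, and a delivery system;
    for example, the delivery system is selected from a particle, a vesicle, or a viral vector;
    for example, the particle comprises a lipid, a sugar, a metal, or a protein;
    for example, the vesicle comprises an exosome or a liposome;
    for example, the viral vector comprises an adenovirus, a lentivirus, or an adeno-associated virus;
    for example, the delivery composition comprises the system of any one of claims 7-12;
    for example, the delivery composition comprises the system of any one of claims 13-17;
    for example, the delivery composition comprises the vector system of any one of claims 18-22;
    for example, the delivery composition comprises the vector system of any one of claims 23-25;
    for example, the delivery composition comprises the system of any one of claims 31-34;
    for example, the delivery composition comprises the vector system of any one of claims 36-41.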
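  44. A method of inducing a deletion in a target genome, the target genome comprising a complementary first nucleic acid strand and second nucleic acid strand, the method comprising: contacting the system of any one of claims 13-17, the vector system of claim 24 or 25, the system of any one of claims 31-34, or the vector system of any one of claims 36-41 with the target genome, or delivering it into a cell comprising the target genome;
    for example, one or more cas proteins comprised in the system or vector system are capable of forming a complex with the guide RNA and, after the complex binds to a target sequence, inducing deletion of a region comprising the target sequence;
    for example, the deletion is a large-fragment deletion, for example a deletion of a fragment greater than 0.1 kb, greater than 0.2 kb, greater than 0.5 kb, greater than 1 kb, greater than 10 kb, greater than 50 kb, or greater than 100 kb, and, for example, less than 500 kb, less than 400 kb, less than 300 kb, or less than 200 kb;
    for example, the one or more guide RNAs comprised in the system or vector system comprise a direct repeat sequence, a first guide sequence capable of hybridizing to a first target sequence, and a second guide sequence capable of hybridizing to a second target sequence; wherein the first target sequence and the second target sequence are located on the two flanks of the region to be deleted in the target genome;
    for example, the first target sequence is located on the first nucleic acid strand of the target genome and the second target sequence is located on the second nucleic acid strand of the target genome; for example, on the first nucleic acid strand the first target sequence is located at the 3' end of the region to be deleted, and on the second nucleic acid strand the second target sequence is located at the 3' end of the region to be deleted;
    for example, the length of the region to be deleted is greater than 0.1 kb, for example greater than 0.2 kb, greater than 0.3 kb, greater than 0.4 kb, or greater than 0.5 kb; for example, the length of the region to be deleted is less than 500 kb, for example less than 400 kb, less than 300 kb, or less than 200 kb; for example, the length of the region to be deleted is 0.2 kb-200 kb (for example 0.2 kb-2 kb, 0.2 kb-5 kb, 0.2 kb-10 kb, 0.2 kb-100 kb, or 0.2 kb-200 kb; for example 0.5 kb-1.5 kb, 0.5 kb-2 kb, or 0.5 kb-10 kb);
    for example, the target genome is present in a cell, or the target genome is present in an in vitro nucleic acid molecule (for example, a plasmid);
    for example, the cell is a prokaryotic cell;
    for example, the cell is a eukaryotic cell;
    for example, the cell is selected from animal cells (for example, mammalian cells, such as human cells) and plant cells (for example, maize cells, maize protoplasts, rice cells, Arabidopsis cells, or Arabidopsis protoplasts);
    for example, the method is used for chromosome elimination.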
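The geometry recited in claim 44 (two target sequences on opposite strands, each lying 3' of the region to be deleted on its own strand) can be sketched as follows. This is an illustrative reading only. The function names, the `target_len` parameter, and the choice to place each target immediately adjacent to the region are assumptions; PAM requirements are not modeled. Coordinates are 0-based and half-open on the first strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def in_preferred_range(start: int, end: int,
                       low_bp: int = 200, high_bp: int = 200_000) -> bool:
    """Check the region length against the 0.2 kb-200 kb example range."""
    return low_bp <= end - start <= high_bp


def flanking_targets(first_strand: str, start: int, end: int,
                     target_len: int):
    """Pick one target per strand, each 3' of the region on its strand.

    On the first strand, 3' of the region means coordinates >= end.
    On the second strand, which runs antiparallel, 3' of the region
    corresponds to first-strand coordinates < start.
    """
    if start - target_len < 0 or end + target_len > len(first_strand):
        raise ValueError("not enough flanking sequence for the targets")
    first_target = first_strand[end:end + target_len]
    second_target = reverse_complement(first_strand[start - target_len:start])
    return first_target, second_target


# Toy genome: 4 nt left flank, 300 nt region to delete, 4 nt right flank.
genome = "AAAC" + "G" * 300 + "TTCA"
targets = flanking_targets(genome, 4, 304, 4)
```

Because the two targets sit on opposite strands, each one lies on the far side of the region when read 5' to 3' along its own strand, so together they bracket the region to be deleted.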
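  45. A method of inducing genomic structural variation, the genome comprising a complementary first nucleic acid strand and second nucleic acid strand, the method comprising: contacting the system of any one of claims 13-17, the vector system of claim 24 or 25, the system of any one of claims 31-34, or the vector system of any one of claims 36-41 with a target genome, or delivering it into a cell comprising the target genome;
    for example, one or more cas proteins comprised in the system or vector system are capable of forming a complex with the guide RNA and, after the complex binds to a target sequence, inducing deletion of a region comprising the target sequence, thereby inducing genomic structural variation;
    for example, the deletion is a large-fragment deletion, for example a deletion of a fragment greater than 0.1 kb, greater than 0.2 kb, greater than 0.5 kb, greater than 1 kb, greater than 10 kb, greater than 50 kb, or greater than 100 kb, and, for example, less than 500 kb, less than 400 kb, less than 300 kb, or less than 200 kb;
    for example, the one or more guide RNAs comprised in the system or vector system comprise a direct repeat sequence, a first guide sequence capable of hybridizing to a first target sequence, and a second guide sequence capable of hybridizing to a second target sequence; wherein the first target sequence and the second target sequence are located on the two flanks of the region to be deleted in the target genome;
    for example, the first target sequence is located on the first nucleic acid strand of the target genome and the second target sequence is located on the second nucleic acid strand of the target genome; for example, on the first nucleic acid strand the first target sequence is located at the 3' end of the region to be deleted, and on the second nucleic acid strand the second target sequence is located at the 3' end of the region to be deleted;
    for example, the length of the region to be deleted is greater than 0.1 kb, for example greater than 0.2 kb, greater than 0.3 kb, greater than 0.4 kb, or greater than 0.5 kb; for example, the length of the region to be deleted is less than 500 kb, for example less than 400 kb, less than 300 kb, or less than 200 kb; for example, the length of the region to be deleted is 0.2 kb-200 kb (for example 0.2 kb-2 kb, 0.2 kb-5 kb, 0.2 kb-10 kb, 0.2 kb-100 kb, or 0.2 kb-200 kb; for example 0.5 kb-1.5 kb, 0.5 kb-2 kb, or 0.5 kb-10 kb);
    for example, the target genome is present in a cell, or the target genome is present in an in vitro nucleic acid molecule (for example, a plasmid);
    for example, the cell is a prokaryotic cell;
    for example, the cell is a eukaryotic cell;
    for example, the cell is selected from animal cells (for example, mammalian cells, such as human cells) and plant cells (for example, maize cells, maize protoplasts, rice cells, Arabidopsis cells, or Arabidopsis protoplasts).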
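  46. A method of modifying a target nucleic acid molecule, comprising: contacting the system of any one of claims 7-17, the vector system of any one of claims 23-25, the system of any one of claims 31-34, or the vector system of any one of claims 36-41 with the target nucleic acid molecule, or delivering it into a cell comprising the target nucleic acid molecule;
    for example, one or more cas proteins comprised in the system or vector system are capable of forming a complex with the guide RNA and, after the complex binds to a target sequence, inducing modification of the target nucleic acid molecule comprising the target sequence;
    for example, the target nucleic acid molecule is RNA or DNA;
    for example, the target nucleic acid molecule is double-stranded DNA;
    for example, the target nucleic acid molecule is a gene or a genome;
    for example, the target nucleic acid molecule is present in a cell, or the target nucleic acid molecule is present in an in vitro nucleic acid molecule (for example, a plasmid);
    for example, the cell is a prokaryotic cell;
    for example, the cell is a eukaryotic cell;
    for example, the cell is selected from animal cells (for example, mammalian cells, such as human cells) and plant cells (for example, maize cells, maize protoplasts, rice cells, Arabidopsis cells, or Arabidopsis protoplasts);
    for example, the modification is a large-fragment deletion in the target nucleic acid molecule;
    for example, the modification is a break in the target nucleic acid molecule, such as a double-strand break in DNA; for example, the modification further comprises inserting an exogenous nucleic acid into the break;
    for example, the modification is a change of a single base (for example, cytosine or adenine) in the target nucleic acid molecule.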
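  47. A method of inducing a single-base mutation in a target nucleic acid molecule, comprising: contacting the system of any one of claims 7-17, the vector system of any one of claims 23-25, the system of any one of claims 31-34, or the vector system of any one of claims 36-41 with the target nucleic acid molecule, or delivering it into a cell comprising the target nucleic acid molecule;
    for example, the cas protein comprised in the system or vector system is as defined in claim 6;
    for example, one or more cas proteins comprised in the system or vector system are capable of forming a complex with the guide RNA and, after the complex binds to a target sequence, inducing modification of a single base in the target nucleic acid molecule comprising the target sequence, which yields a single-base mutation during nucleic acid repair or replication;
    for example, the single-base modification is a modification that changes the base-pairing behavior of the base to be modified; for example, before modification the base to be modified pairs with a first base, and after modification the modified base pairs with a second base;
    for example, one or more cas proteins comprised in the system or vector system further comprise an adenosine deaminase (for example, TadA8e) or a cytosine deaminase (for example, APOBEC3);
    for example, one or more cas proteins (for example, the cas8c protein) comprised in the system or vector system further comprise an adenosine deaminase (for example, TadA8e), and the base to be modified is adenine; before modification, adenine pairs with thymine; after modification, adenine is converted to hypoxanthine, which pairs with cytosine;
    for example, one or more cas proteins (for example, the cas8c protein) comprised in the system or vector system further comprise a cytosine deaminase (for example, APOBEC3), and the base to be modified is cytosine; before modification, cytosine pairs with guanine; after modification, cytosine is converted to uracil, which pairs with adenine;
    for example, the target nucleic acid molecule is RNA or DNA;
    for example, the target nucleic acid molecule is double-stranded DNA;
    for example, the target nucleic acid molecule is a gene or a genome;
    for example, the target nucleic acid molecule is present in a cell, or the target nucleic acid molecule is present in an in vitro nucleic acid molecule (for example, a plasmid);
    for example, the cell is a prokaryotic cell;
    for example, the cell is a eukaryotic cell;
    for example, the cell is selected from animal cells (for example, mammalian cells, such as human cells) and plant cells (for example, maize cells, maize protoplasts, rice cells, Arabidopsis cells, or Arabidopsis protoplasts).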
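The pairing logic behind the single-base mutations of claim 47 can be followed in a short sketch. It only models the textbook outcome after repair or replication: deaminated adenine (inosine, whose base is hypoxanthine) pairs with cytosine and is read as guanine, and deaminated cytosine (uracil) pairs with adenine and is read as thymine. The function name `base_edit`, the deaminase labels, and the example sequence are hypothetical. Editing windows and efficiencies are not modeled.

```python
# Deamination product and the base it pairs with afterwards.
DEAMINATION = {
    "A": ("I", "C"),  # adenine -> hypoxanthine (inosine), pairs with C
    "C": ("U", "A"),  # cytosine -> uracil, pairs with A
}
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}
TARGET_BASE = {"adenosine": "A", "cytosine": "C"}


def base_edit(seq: str, pos: int, deaminase: str) -> str:
    """Return the sequence fixed after replication of a deaminated base.

    `deaminase` is "adenosine" or "cytosine"; `pos` is a 0-based index
    into `seq`, which is written 5'->3' on the edited strand.
    """
    target = TARGET_BASE[deaminase]
    if seq[pos] != target:
        raise ValueError(f"position {pos} is {seq[pos]}, expected {target}")
    _modified, partner = DEAMINATION[target]
    # After one round of replication the new partner templates its own
    # Watson-Crick complement on the edited strand.
    fixed = WATSON_CRICK[partner]
    return seq[:pos] + fixed + seq[pos + 1:]


a_to_g = base_edit("GATC", 1, "adenosine")
c_to_t = base_edit("GATC", 3, "cytosine")
```

In this model an adenosine deaminase fusion yields an A-to-G change and a cytosine deaminase fusion yields a C-to-T change on the edited strand.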
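  48. A method of altering the expression of a gene product, comprising: contacting the system of any one of claims 7-17, the vector system of any one of claims 23-25, the system of any one of claims 31-34, or the vector system of any one of claims 36-41 with a target nucleic acid molecule encoding the gene product, or delivering it into a cell comprising the target nucleic acid molecule;
    for example, one or more cas proteins comprised in the system or vector system are capable of forming a complex with the guide RNA and, after the complex binds to a target sequence, inducing modification of the target nucleic acid molecule comprising the target sequence, thereby altering the expression of the gene product;
    for example, the target nucleic acid molecule is present in a cell, or the target nucleic acid molecule is present in an in vitro nucleic acid molecule (for example, a plasmid);
    for example, the cell is a prokaryotic cell;
    for example, the cell is a eukaryotic cell;
    for example, the cell is selected from animal cells (for example, mammalian cells, such as human cells) and plant cells (for example, maize cells, maize protoplasts, rice cells, Arabidopsis cells, or Arabidopsis protoplasts);
    for example, the expression of the gene product is altered (for example, increased or decreased);
    for example, the gene product is a protein.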
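  49. A method of producing a plant having a modified trait, the method comprising contacting a plant cell with the system of any one of claims 7-17, the vector system of any one of claims 23-25, the system of any one of claims 31-34, or the vector system of any one of claims 36-41, or subjecting the plant cell to the method of any one of claims 44-48, thereby modifying or editing a target nucleic acid molecule in a target gene or genome of the plant cell, and regenerating a plant from the plant cell;
    for example, the method comprises contacting the plant cell with the system of any one of claims 13-17, the vector system of claim 24 or 25, the system of any one of claims 31-34, or the vector system of any one of claims 36-41;
    for example, the plant is an agricultural plant, for example maize, barley, cotton, rice, soybean, or wheat.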
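  50. The method of any one of claims 44-49, wherein the cas protein or the nucleotide sequence encoding the cas protein, and the guide RNA or the nucleotide sequence encoding the guide RNA, comprised in the system or vector system are present in a delivery system;
    for example, the delivery system is selected from a particle, a vesicle, or a viral vector;
    for example, the particle comprises a lipid, a sugar, a metal, or a protein;
    for example, the vesicle comprises an exosome or a liposome;
    for example, the viral vector comprises an adenovirus, a lentivirus, or an adeno-associated virus.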
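  51. Use of the system of any one of claims 1-17, the vector system of any one of claims 18-25, the system of any one of claims 26-34, the vector system of any one of claims 35-41, the kit of claim 42, or the delivery composition of claim 43 for nucleic acid editing, or in the preparation of a formulation for nucleic acid editing;
    for example, the nucleic acid editing comprises gene or genome editing;
    for example, the gene or genome editing comprises large-fragment nucleic acid deletion, gene modification, gene knockout, altering the expression of a gene product, repairing a mutation, inserting a polynucleotide, and/or single-base mutation;
    for example, the nucleic acid editing comprises inducing genomic structural variation or chromosome elimination.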
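  52. Use of the system of any one of claims 1-17, the vector system of any one of claims 18-25, the system of any one of claims 26-34, the vector system of any one of claims 35-41, the kit of claim 42, or the delivery composition of claim 43 in the preparation of a formulation for editing a target nucleotide sequence at a target locus to modify an organism or a non-human organism (for example, a plant).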
PCT/CN2022/096648 2021-06-03 2022-06-01 Type I-C CRISPR-Cas3 system and use thereof WO2022253277A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280039558.3A CN117529552A (zh) 2021-06-03 2022-06-01 Type I-C CRISPR-Cas3 system and use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110620180 2021-06-03
CN202110620180.3 2021-06-03

Publications (1)

Publication Number Publication Date
WO2022253277A1 true WO2022253277A1 (zh) 2022-12-08

Family

ID=84323922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096648 WO2022253277A1 (zh) 2021-06-03 2022-06-01 Type I-C CRISPR-Cas3 system and use thereof

Country Status (2)

Country Link
CN (1) CN117529552A (zh)
WO (1) WO2022253277A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170175110A1 (en) * 2013-11-27 2017-06-22 Gen9, Inc. Libraries of Nucleic Acids and Methods for Making the Same
AU2015101792A4 (en) * 2014-12-24 2016-01-28 Massachusetts Institute Of Technology Engineering of systems, methods and optimized enzyme and guide scaffolds for sequence manipulation
WO2019241452A1 (en) * 2018-06-13 2019-12-19 Caribou Biosciences, Inc. Engineered cascade components and cascade complexes
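CN112272704A (zh) * 2018-06-13 2021-01-26 卡里布生物科学公司 (Caribou Biosciences, Inc.) Engineered cascade components and cascade complexes
CN111613272A (zh) * 2020-05-21 2020-09-01 西湖大学 (Westlake University) Programmed framework gRNA and use thereof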

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIROYUKI MORISAKA , KAZUTO YOSHIMI , YUGA OKUZAKI , PETER GEE , YAYOI KUNIHIRO , EKASIT SONPHO , HUAIGENG XU , NORIKO SASAKAWA , Y: "CRISPR-Cas3 induces broad and unidirectional genome editing in human cells", NATURE COMMUNICATIONS, vol. 10, no. 1, 1 December 2019 (2019-12-01), pages 1 - 13, XP055720163, DOI: 10.1038/s41467-019-13226-x *

Also Published As

Publication number Publication date
CN117529552A (zh) 2024-02-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22815319

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280039558.3

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22815319

Country of ref document: EP

Kind code of ref document: A1