WO2023232109A1 - New CRISPR gene editing system - Google Patents

New CRISPR gene editing system

Info

Publication number
WO2023232109A1
Authority
WO
WIPO (PCT)
Prior art keywords
guide rna
trac
sequence
protein
effector protein
Prior art date
Application number
PCT/CN2023/097783
Other languages
English (en)
French (fr)
Inventor
高彩霞
靳帅
朱子旭
李运嘉
K·T·赵
刘关稳
Original Assignee
中国科学院遗传与发育生物学研究所
北京齐禾生科生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院遗传与发育生物学研究所 and 北京齐禾生科生物科技有限公司
Publication of WO2023232109A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K19/00 Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/70 Vectors or expression systems specially adapted for E. coli
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases (RNAses), DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)

Definitions

  • the invention belongs to the field of genetic engineering. Specifically, the present invention relates to a new CRISPR gene editing system and its application. More specifically, the present invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof, as well as a gene editing system based thereon and its application.
  • TraC transposon and CRISPR-Cas12 intermediate
  • the Type V effector protein is a Cas12 protein with multiple functional domains. Its hallmark feature is a RuvC-like domain, which is generally responsible for cleaving the target DNA.
  • Type V subtypes are very abundant. The currently discovered and classified subtypes include Cas12a-k, a total of 11 subtypes. Among them, Cas12a and Cas12b have been developed into efficient eukaryotic gene editing systems.
  • Cas12a, also known as Cpf1, includes a RuvC-like domain similar to that of the Cas9 or TnpB protein, but unlike Cas9, Cas12a family proteins lack the HNH domain and use only the RuvC domain to cut both strands of DNA.
  • Cas12b was called C2c1 (Class 2 Candidate 1) when it was first discovered. Its C-terminal sequence is very similar to the TnpB protein of the IS605 family, but it has no significant sequence similarity with other Class 2 family proteins. Its Cas genes include a Cas1/Cas4 fusion gene, Cas2, and Cas12b. Maturation of its crRNA also requires tracrRNA. Cas12c was originally called C2c3 (Class 2 Candidate 3), and its Cas genes include only Cas1 and Cas12c. The Cas12c gene has only limited similarity with the TnpB homologous sequence of Cpf1.
  • Type V subtypes have increased explosively in recent years.
  • a total of 10 Type V subtypes have been discovered, including Cas12a, Cas12b and Cas12c proteins.
  • the nucleic acid interference activities of these subtypes have also been gradually demonstrated experimentally.
  • scientists from Arbor Biotechnologies have demonstrated through in vitro experiments the double-stranded DNA cleavage activity of the effector proteins Cas12c, Cas12g, Cas12h and Cas12i from Type V-C, Type V-G, Type V-H, and Type V-I.
  • the effector proteins of the Type V-D and Type V-E subtypes are CasY and CasX, respectively, also known as Cas12d and Cas12e.
  • the effector protein of the Type V-F subtype, which was previously considered one of the subtypes of the Type V-U family, is Cas14 (also known as Cas12f), which can cleave single-stranded DNA and RNA and is only one-third the size of the Cas9 protein.
  • the Cas14 protein was first developed into the nucleic acid detection tool DETECTR, and has recently been shown to have double-stranded DNA cleavage activity in prokaryotes and eukaryotes.
  • the CasΦ protein (also known as Cas12j), recently discovered in huge bacteriophages, has also been shown to cut double-stranded DNA in prokaryotes, animal cells, and plant cells.
  • the Type V-K effector protein Cas12k is "hijacked" by the transposon Tn7; it can generate an R-loop at the target site and uses the targeting ability of crRNA to achieve site-specific transposition of the transposon.
  • This hijacking protein provides a new strategy for targeted insertion into DNA.
  • Embodiment 1 An engineered clustered regularly interspaced short palindromic repeats (CRISPR) system, comprising:
  • the guide RNA is selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA comprising tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) comprising tracrRNA and crRNA;
  • reRNA guide RNA
  • sgRNA single guide RNA
  • the TraC effector protein can form a CRISPR complex with guide RNA
  • the TraC effector protein can target-bind to a target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, or can target-bind to a target DNA sequence under the guidance of a guide RNA containing tracrRNA and/or crRNA.
  • Embodiment 2 The engineered CRISPR system of embodiment 1, wherein the tracrRNA contains a non-targeting strand binding sequence (NTB) complementary to a non-targeting strand (NTS).
  • NTB non-targeting strand binding sequence
  • NTS non-targeting strand
  • Embodiment 3 An engineered clustered regularly interspaced short palindromic repeats (CRISPR) vector system comprising one or more constructs, comprising:
  • a second regulatory element operably linked to one or more nucleotide sequences encoding one or more guide RNAs selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA containing tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) containing tracrRNA and crRNA;
  • the TraC effector protein can form a CRISPR complex with guide RNA
  • the TraC effector protein can target-bind to a target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, or can target-bind to a target DNA sequence under the guidance of a guide RNA containing tracrRNA and/or crRNA.
  • Embodiment 4 The engineered CRISPR vector system of embodiment 3, wherein the tracrRNA contains a non-targeting strand binding sequence (NTB) complementary to a non-targeting strand (NTS).
  • Embodiment 5 The system of embodiment 2 or 4, wherein the guide RNA is a guide RNA comprising tracrRNA and crRNA, wherein the tracrRNA contains a non-targeting strand binding sequence (NTB) complementary to the non-targeting strand (NTS), and wherein the guide RNA hybridizes to the targeting strand (TS) of the target DNA sequence via the crRNA and to the non-targeting strand (NTS) via the NTB.
  • TS targeting strand
  • Embodiment 6 The system of embodiment 4, wherein, when transcribed, the one or more guide RNAs hybridize to the target DNA, and the guide RNA forms a complex with the TraC effector protein, which causes distal cleavage of the target DNA sequence.
  • Embodiment 7 The system of any one of embodiments 1-6, wherein the target DNA sequence is within a cell, preferably within a eukaryotic cell.
  • Embodiment 8 The system of any one of embodiments 1-7, wherein the effector protein comprises one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences or mitochondrial localization sequences.
  • NLS nuclear localization sequences
  • Embodiment 9 The system of any one of embodiments 1-8, wherein the nucleic acid sequences encoding the effector protein are codon optimized for expression in eukaryotic cells.
  • Embodiment 10 The system of any one of embodiments 1-9, wherein components a) and b) or their nucleotide sequences are constructed on the same or different vectors.
  • Embodiment 11 A method for modifying a DNA sequence of interest, the method comprising delivering the system of any one of embodiments 1-10 to the DNA sequence of interest or to a cell containing the DNA sequence of interest.
  • Embodiment 12 A method of modifying a DNA sequence of interest, the method comprising delivering a composition of a TraC effector protein and one or more nucleic acid components to the DNA sequence of interest, wherein the effector protein can target and bind the target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, and can also target and bind the target DNA sequence under the guidance of a guide RNA containing tracrRNA and crRNA; the effector protein and the one or more nucleic acid components form a CRISPR complex, and after the complex binds a DNA sequence of interest that is 3' to the protospacer adjacent motif (PAM), the effector protein induces modification of the DNA sequence of interest.
  • PAM protospacer adjacent motif
  • Embodiment 13 The method of embodiment 12, wherein the gene of interest is in a cell, preferably a eukaryotic cell.
  • Embodiment 14 The method of embodiment 13, wherein the cell is an animal cell or a human cell.
  • Embodiment 15 The method of embodiment 13, wherein the cell is a plant cell.
  • Embodiment 16 The method of embodiment 12, wherein the effector protein comprises one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences or mitochondrial localization sequences.
  • Embodiment 17 The method of embodiment 12, wherein the effector protein and nucleic acid component, or a construct expressing the effector protein and nucleic acid component, are comprised in a delivery system.
  • Embodiment 18 The method of embodiment 17, wherein the delivery system comprises a virus, virus-like particles, virions, liposomes, vesicles, exosomes, lipid nanoparticles (LNP), N-acetylgalactosamine (GalNAc) or engineered bacteria.
  • Embodiment 19 A transposon and CRISPR-Cas12 intermediate (TraC) effector protein or functional variant thereof for genome editing in an organism or an organism cell, wherein the TraC effector protein or functional variant thereof can form a CRISPR complex with a guide RNA;
  • the guide RNA is selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA comprising tracrRNA and crRNA, such as a single guide RNA (sgRNA) comprising tracrRNA and crRNA;
  • the TraC effector protein or a functional variant thereof can target and bind the target DNA sequence under the guidance of a guide RNA derived from the right end element of a transposon, and can also target and bind the target DNA sequence under the guidance of a guide RNA containing tracrRNA and crRNA.
  • Embodiment 20 Transposon and CRISPR-Cas12 intermediate protein (TraC) effector protein or functional variant thereof for genome editing in an organism or organism cell, the TraC effector protein or a functional variant thereof and
  • (ii) comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or even 100% sequence identity to one of SEQ ID NOs: 1-37, or an amino acid sequence having one or more, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, amino acid substitutions, deletions or additions relative to one of SEQ ID NOs: 1-37.
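The identity thresholds in item (ii) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and uses one common convention, counting identical residues over the aligned columns where neither sequence has a gap; other conventions use a different denominator and give different percentages. The function names and the short peptide strings are hypothetical and are not taken from SEQ ID NOs: 1-37.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the inputs are rows of a pairwise alignment of equal length,
    with '-' marking gaps. This is only one of several identity conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the percent identity is at least `threshold` (e.g. 30.0 or 95.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-residue peptides that differ at one position.
print(percent_identity("MKRLTAGHEV", "MKRLSAGHEV"))
print(meets_identity_threshold("MKRLTAGHEV", "MKRLSAGHEV", 95.0))
```

Because gap columns are skipped here, a candidate with long insertions or deletions can still score highly; a stricter reading would divide by the full alignment length instead.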
  • Embodiment 21 The TraC effector protein of embodiment 20 or a functional variant thereof, wherein the effector protein functional variant is derived from SEQ ID NO: 25 and includes one or more amino acid substitutions relative to SEQ ID NO: 25 selected from K78R, D86R, S137R, V145R, I147R, P148R, D150R, V228R, V254R, A510R, A278R, K315R, S334R, L343R, A369R, H392R, L394R, S408R, N456R, V500R, A510R and T573R.
  • Embodiment 22 The TraC effector protein of embodiment 20 or 21, or a functional variant thereof, wherein the effector protein functional variant is derived from SEQ ID NO: 25 and includes, relative to SEQ ID NO: 25, a set of mutations selected from any of the sets shown in Table 3 or Table 4.
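Substitutions such as K78R follow the usual one-letter convention: original residue, 1-based position, replacement residue. The sketch below shows how such codes could be parsed and applied to a protein sequence. It is an illustration under stated assumptions, not the applicants' procedure: the helper names are hypothetical, and the five-residue toy sequence stands in for SEQ ID NO: 25, which is not reproduced here.

```python
import re

# Original residue, 1-based position, replacement residue (e.g. "K78R").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as 'K78R' into ('K', 78, 'R')."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence: str, codes: list[str]) -> str:
    """Return a copy of `sequence` with each substitution applied.

    Raises ValueError if a position is out of range or the residue found
    at that position does not match the code's original residue.
    """
    residues = list(sequence)
    for code in codes:
        original, position, replacement = parse_substitution(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(
                f"{code}: found {residues[position - 1]} at position {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Toy sequence (hypothetical, not SEQ ID NO: 25).
toy_protein = "MKDSV"
print(apply_substitutions(toy_protein, ["K2R", "S4R"]))  # MRDRV
```

Checking the original residue before replacing it catches numbering mistakes, for example when a code was defined against a different reference sequence.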
  • Embodiment 23 The TraC effector protein or functional variant thereof of embodiment 20, wherein the effector protein functional variant comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 80-87.
  • Embodiment 24 The TraC effector protein of any one of embodiments 20-23, or a functional variant thereof, which has at least guide RNA-mediated sequence-specific targeting ability.
  • Embodiment 25 The TraC effector protein of any one of embodiments 20-23, or a functional variant thereof, which has guide RNA-mediated sequence-specific targeting ability, and double-stranded nucleic acid cleavage activity.
  • Embodiment 26 The TraC effector protein of any one of embodiments 20-23, or a functional variant thereof, having guide RNA-mediated sequence-specific targeting ability, and nickase activity.
  • Embodiment 27 The TraC effector protein of any one of embodiments 20-23, or a functional variant thereof, which has guide RNA-mediated sequence-specific targeting ability but does not have double-stranded nucleic acid cleavage activity and/or nickase activity.
  • Embodiment 28 The TraC effector protein of any one of embodiments 24-27, or a functional variant thereof, wherein the guide RNA is selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA comprising tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) comprising tracrRNA and crRNA.
  • Embodiment 29 The TraC effector protein of embodiment 28 or a functional variant thereof, wherein said TraC effector protein or functional variant thereof can target the target DNA sequence under the guidance of the guide RNA derived from the right end element of the transposon, or can target the target DNA sequence under the guidance of the guide RNA containing tracrRNA and crRNA.
  • Embodiment 30 The TraC effector protein of embodiment 28 or a functional variant thereof, wherein the guide RNA is a reRNA derived from the TnpB system, for example, the reRNA comprises the scaffold sequence shown in SEQ ID NO: 77 or 78.
  • Embodiment 31 The TraC effector protein of embodiment 28 or a functional variant thereof, wherein the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, for example, the sgRNA comprises the scaffold sequence shown in SEQ ID NO: 75 or 76.
  • Embodiment 32 The TraC effector protein of any one of embodiments 19-31, or a functional variant thereof, further comprising at least one nuclear localization sequence (NLS), cytoplasmic localization sequence, chloroplast localization sequence or mitochondrial localization sequence.
  • Embodiment 33 A fusion protein comprising the TraC effector protein or functional variant thereof according to any one of embodiments 19-32, and at least one other functional protein.
  • Embodiment 34 The fusion protein of embodiment 33, wherein the other functional protein is a deaminase.
  • Embodiment 35 The fusion protein of embodiment 34, wherein the deaminase is a cytosine deaminase, for example, the cytosine deaminase is selected from the group consisting of APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, double-stranded DNA deaminase (Ddd), single-stranded DNA deaminase (Sdd) or their functional variants.
  • Embodiment 36 The fusion protein of embodiment 35, further comprising a uracil DNA glycosylase inhibitor (UGI).
  • UGI uracil DNA glycosylase inhibitor
  • Embodiment 37 The fusion protein of embodiment 34, wherein the deaminase is an adenine deaminase, e.g., a DNA-dependent adenine deaminase derived from E. coli tRNA adenine deaminase TadA (ecTadA).
  • Embodiment 38 The fusion protein of any one of embodiments 34-37, wherein the fusion protein includes cytosine deaminase and adenine deaminase.
  • Embodiment 39 The fusion protein of embodiment 33, wherein the other functional protein is selected from the group consisting of a transcription activator protein, a transcription repressor protein, a DNA methylase, a DNA demethylase, and a reverse transcriptase.
  • Embodiment 40 The fusion protein of any one of embodiments 33-39, wherein different parts of the fusion protein can be connected independently through a linker or directly.
  • Embodiment 41 The fusion protein of any one of embodiments 33-40, further comprising at least one nuclear localization sequence (NLS), cytoplasmic localization sequence, chloroplast localization sequence or mitochondrial localization sequence.
  • Embodiment 42 Use of the TraC effector protein of any one of embodiments 19-32 or a functional variant thereof, or of the fusion protein of any one of embodiments 33-41, for genome editing in cells, preferably eukaryotic cells, more preferably plant cells.
  • Embodiment 43 The use of embodiment 42, wherein the genome editing includes base editing (Base Editor), prime editing (Prime Editor), and PrimeRoot editing (PrimeRoot Editor).
  • Embodiment 44 A genome editing system for site-directed modification of a target nucleic acid sequence in a cellular genome, comprising:
  • an expression construct comprising a nucleotide sequence encoding the TraC effector protein of any one of embodiments 19-32, or a functional variant thereof, or the fusion protein of any one of embodiments 33-41.
  • Embodiment 45 The genome editing system of embodiment 44, further comprising at least one guide RNA (gRNA) and/or an expression construct comprising a nucleotide sequence encoding the at least one guide RNA.
  • gRNA guide RNA
  • Embodiment 46 The genome editing system of embodiment 45, wherein the genome editing system comprises any one selected from:
  • an expression construct comprising a nucleotide sequence encoding the TraC effector protein of any one of embodiments 19-32 or a functional variant thereof or the fusion protein of any one of embodiments 33-41, and said at least one guide RNA;
  • an expression construct comprising a nucleotide sequence encoding the TraC effector protein of any one of embodiments 19-32 or a functional variant thereof or the fusion protein of any one of embodiments 33-41, and an expression construct comprising a nucleotide sequence encoding said at least one guide RNA;
  • v) an expression construct comprising a nucleotide sequence encoding the TraC effector protein of any one of embodiments 19-32 or a functional variant thereof or the fusion protein of any one of embodiments 33-41 and a nucleotide sequence encoding said at least one guide RNA.
  • Embodiment 47 The genome editing system of any one of embodiments 45-46, wherein the guide RNA is selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA comprising tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) comprising tracrRNA and crRNA.
  • Embodiment 48 The genome editing system of embodiment 47, wherein the guide RNA is a reRNA derived from the TnpB system, for example, the reRNA comprises the scaffold sequence shown in SEQ ID NO: 77 or 78.
  • Embodiment 49 The genome editing system of any one of embodiments 45-46, wherein the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, for example, the sgRNA comprises the scaffold sequence shown in SEQ ID NO: 75 or 76.
  • Embodiment 50 The genome editing system of embodiment 47 or 49, wherein the guide RNA comprises tracrRNA and crRNA, such as a single guide RNA (sgRNA) comprising tracrRNA and crRNA, wherein the crRNA comprises a sequence identical to the target sequence immediately adjacent to the PAM, and the tracrRNA contains a sequence complementary to a sequence located distal to the PAM relative to the target sequence (non-targeting strand binding sequence, NTB).
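The relationship between the PAM, the crRNA spacer and the NTB in Embodiment 50 can be sketched in code. This is only an illustration under assumptions that the text does not specify: the PAM string, the spacer length, the NTB length and the distance between the protospacer and the NTB binding site are hypothetical parameters, and the geometry (NTB site on the non-targeting strand, 3' of the protospacer) is one reading of "distal to the PAM". The function names are likewise hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def to_rna(dna: str) -> str:
    """Write a DNA sequence as RNA (T replaced by U)."""
    return dna.replace("T", "U")


def design_guide_segments(nts: str, pam: str, spacer_len: int,
                          ntb_offset: int, ntb_len: int) -> tuple[str, str]:
    """Return (spacer, ntb) as RNA strings, both written 5'->3'.

    nts        : non-targeting strand, 5'->3', containing the PAM
    pam        : exact PAM string to search for (hypothetical)
    spacer_len : protospacer length immediately 3' of the PAM
    ntb_offset : bases between the protospacer end and the NTB binding site
    ntb_len    : length of the NTB binding site on the non-targeting strand
    """
    pam_start = nts.find(pam)
    if pam_start < 0:
        raise ValueError("PAM not found on the non-targeting strand")
    proto_start = pam_start + len(pam)
    proto_end = proto_start + spacer_len
    ntb_start = proto_end + ntb_offset
    ntb_end = ntb_start + ntb_len
    if ntb_end > len(nts):
        raise ValueError("sequence too short for the requested lengths")
    # The spacer has the same sequence as the NTS protospacer,
    # so it base-pairs with the targeting strand (TS).
    spacer = to_rna(nts[proto_start:proto_end])
    # The NTB is the reverse complement of its NTS site,
    # so it base-pairs with the non-targeting strand.
    ntb = to_rna(reverse_complement(nts[ntb_start:ntb_end]))
    return spacer, ntb


# Toy sequence and parameters (all hypothetical).
spacer, ntb = design_guide_segments(
    "GGTTTAACGTACGTACGGCATTCC", pam="TTTA", spacer_len=10, ntb_offset=0, ntb_len=6
)
print(spacer, ntb)  # ACGUACGUAC AAUGCC
```

The sketch only derives the two targeting segments; assembling them with a scaffold such as the one in SEQ ID NO: 75 or 76 is outside its scope.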
  • Embodiment 51 The genome editing system of any one of embodiments 44-50, wherein the genome editing system further comprises a donor nucleic acid molecule comprising a nucleotide sequence to be site-specifically inserted into the genome, for example, said nucleotide sequence to be site-specifically inserted into the genome comprises sequences homologous to the sequences flanking the target sequence in the genome.
  • Embodiment 52 The genome editing system of any one of embodiments 44-51, wherein the nucleotide sequence encoding the TraC effector protein or a functional variant thereof or the fusion protein and/or the nucleotide sequence encoding the at least one guide RNA is operably linked to an expression control element such as a promoter.
  • Embodiment 53 The genome editing system of any one of embodiments 44-52, wherein the components of the genome editing system are comprised in a delivery system selected from the group consisting of viruses, virus-like particles, virions, liposomes, vesicles, exosomes, lipid nanoparticles (LNP), N-acetylgalactosamine (GalNAc) or engineered bacteria.
  • Embodiment 54 A method of producing a genetically modified cell, comprising introducing the genome editing system of any one of embodiments 44-53 into the cell.
  • Embodiment 55 The method of embodiment 54, wherein the cells are from prokaryotes or eukaryotes, preferably from mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, geese; plants, including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut, and Arabidopsis.
  • the present invention obtains new CRISPR effector proteins and their genome editing systems, enriching the selection and application scenarios of genome editing tools;
  • the TraC subbranch CRISPR effector protein obtained by the present invention has a dual-guide mechanism and has the targeted cleavage pathways of both the TnpB system and the CRISPR system. That is, the TraC effector protein can target and bind target DNA under the guidance of reRNA, and can also target and bind target DNA under the guidance of sgRNA, helping to achieve multiple genome editing with the same gene editing tool;
  • the TraC effector protein obtained by the present invention is the smallest monomer among the currently known monomeric Cas12 proteins, which is helpful for achieving delivery and editing in vivo;
  • the TraC effector protein obtained in the present invention interacts with the non-targeting strand of the target dsDNA to form a bubble structure under the guidance of an sgRNA containing a sequence (NTB) complementary to the non-targeting strand (NTS), which facilitates the opening and editing of PAM-distal DNA.
  • Figure 1 Shows three structural motifs conserved in 86 Cas12 proteins.
  • Figure 2 Shows the prokaryotic expression system of TraC protein.
  • Figure 3 A flow chart showing the use of a fluorescent reporter system to screen CRISPR systems with DNA double-strand binding ability.
  • Figure 5 Flow chart for screening CRISPR systems with DNA double-strand cutting ability.
  • Figure 6 Test results of DNA double-stranded cleavage ability.
  • A: The test results of TraC-875, TraC-365, TraC-655, and TraC-445; B: The test results of TraC-297, TraC-459, TraC-466, and TraC-949. LbCpf1 served as the positive control.
  • Figure 7 Flowchart of using the plasmid interference system to detect the DNA double-stranded cleavage ability of the new CRISPR system.
  • Figure 8 Results of testing TraC-459, TraC-875 and TraC-297 proteins using the plasmid interference system.
  • Figure 9 A: Prediction of secondary structure of accessory RNA and structural folding model analysis of V-type CRISPR and TnpB systems; B: Model of co-evolution of effectors and accessory RNA.
  • Figure 10 Shows optimization of TraC protein sgRNA.
  • A: Predicted sgRNA of the TraC-459 protein;
  • B: Effects of the truncation length of the tracrRNA:crRNA complementary region, the truncation length of the tracrRNA 5' region, and the spacer length on the editing efficiency of the TraC-459 protein.
  • Figure 11 Shows that optimized sgRNA-opt can significantly improve the editing efficiency of TraC-459.
  • Figure 12 shows the use of plasmid interference experiments to analyze the dsDNA cleavage ability of TraC-459 on E. coli under different guide RNAs.
  • Figure 13 shows the prediction results of the three-dimensional structure folding of TraC-459 protein.
  • Figure 14 Shows TraC-459 variant screen.
  • TraC effector protein targets DNA in a bubble-like structure guided by reprogrammed sgRNA.
  • Reprogrammed sgRNA can improve editing efficiency.
  • FIG. 18 Shows that TraC protein is affected by temperature in plant cells.
  • the editing efficiency of TraC-5M-7 at 32°C is 1-29 times higher than that at 25°C.
  • the protein or nucleic acid may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, but still possesses the activity described in the present invention.
  • those skilled in the art know that the methionine encoded by the start codon at the N-terminus of the polypeptide will be retained under certain practical circumstances (such as when expressed in a specific expression system), but will not substantially affect the function of the polypeptide.
  • "Genome" as used herein encompasses not only chromosomal DNA present in the nucleus, but also organellar DNA present in subcellular components of the cell (e.g., mitochondria, plastids).
  • organism includes any organism suitable for genome editing, preferably eukaryotes.
  • organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, and geese; plants including monocots and dicots, for example, rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis thaliana, etc.
  • "Genetically modified organism" or "genetically modified cell" means an organism or cell that contains exogenous polynucleotides or modified genes or expression control sequences within its genome.
  • exogenous polynucleotides can be stably integrated into the genome of an organism or cell and inherited for successive generations.
  • Exogenous polynucleotides can be integrated into the genome alone or as part of a recombinant DNA construct.
  • a modified gene or expression control sequence is one in which the sequence contains single or multiple deoxynucleotide substitutions, deletions, and additions in the genome of an organism or cell.
  • "Exogenous" with respect to a sequence means a sequence from a foreign species or, if from the same species, a sequence that has undergone significant changes in composition and/or locus from its native form by deliberate human intervention.
  • "Nucleic acid sequence" and equivalent terms are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases.
  • Nucleotides are referred to by their single-letter names as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytidine or deoxycytidine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purine (A or G), "Y" for pyrimidine (C or T), "K" for G or T, "H" for A or C or T, "I" for inosine, and "N" for any nucleotide.
  • Although nucleotide sequences may be represented herein as DNA sequences (including T), when referring to RNA, one skilled in the art can readily determine the corresponding RNA sequence (i.e., by substituting U for T).
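The single-letter codes and the DNA-to-RNA convention described above can be illustrated with a minimal Python sketch. It covers only the DNA codes listed in the text; inosine ("I") is omitted because it is not a degenerate code. The function names and the example motif "TTN" are illustrative assumptions and are not taken from the application.

```python
# Degenerate nucleotide codes as listed in the text above (DNA alphabet only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, seq: str) -> bool:
    """Return True if the concrete DNA sequence `seq` fits the degenerate `pattern`."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern.upper(), seq.upper()))


def dna_to_rna(seq: str) -> str:
    """Write a DNA-notation sequence as RNA by substituting U for T."""
    return seq.upper().replace("T", "U")


# "TTN" is a hypothetical motif used only to exercise the code.
print(matches("TTN", "TTG"))   # True
print(matches("RY", "CA"))     # False
print(dna_to_rna("GATTACA"))   # GAUUACA
```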
  • "Polypeptide," "peptide," and "protein" are used interchangeably herein and refer to a polymer of amino acid residues. The term applies to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
  • the terms "polypeptide," "peptide," "amino acid sequence," and "protein" may also include modified forms including, but not limited to, glycosylation, lipid linkage, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
  • Sequence "identity” has an art-recognized meaning, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along a region of the molecule.
  • The term "identity" is well known to those skilled in the art (Carrillo, H. & Lipman, D., SIAM J Applied Math 48:1073 (1988)).
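As an illustration of how a percentage of sequence identity can be computed, the following minimal Python sketch counts identical columns in a pair of sequences that have already been aligned (gaps written as "-"). It assumes that identity is reported over all alignment columns containing at least one residue. Other conventions exist, and the application does not prescribe one, so the function name and the denominator are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are ignored;
    a residue facing a gap counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Toy alignment: 5 identical columns out of 6.
print(round(percent_identity("MKV-LA", "MKVALA"), 1))  # 83.3
```

Measuring identity along a region of the molecule, as the text allows, amounts to passing only the aligned columns of that region.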
  • construct or “expression construct” refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in an organism. "Expression” refers to the production of a functional product.
  • expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or translation of the RNA into a precursor or mature protein.
  • the "expression construct" of the present invention can be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, an RNA capable of translation (such as mRNA).
  • An "expression construct" of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different sources, or control sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that which normally occurs in nature.
  • "Regulatory sequence" and "regulatory element" are used interchangeably and refer to nucleotide sequences that are located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence and that affect the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leaders, introns, and polyadenylation recognition sequences.
  • a promoter refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment.
  • a promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from said cell.
  • the promoter may be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.
  • "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed primarily, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in a specific cell or cell type.
  • Developmentally regulated promoter refers to a promoter whose activity is determined by developmental events.
  • inducible promoters selectively express operably linked DNA sequences in response to endogenous or exogenous stimuli (environment, hormones, chemical signals, etc.).
  • "Operably linked" means that a regulatory element (e.g., but not limited to, a promoter sequence, a transcription termination sequence, etc.) is linked to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element.
  • "Introducing" a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc.) or protein into an organism means transforming the organism's cells with the nucleic acid or protein so that the nucleic acid or protein can function in the cell.
  • Transformation as used in the present invention includes stable transformation and transient transformation.
  • “Stable transformation” refers to the introduction of exogenous nucleotide sequences into the genome, resulting in stable inheritance of the exogenous nucleotide sequences. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generations thereof.
  • "Transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell to perform its function without stable inheritance of the exogenous nucleotide sequence. In transient transformation, the foreign nucleic acid sequence is not integrated into the genome.
  • "Trait" refers to the physiological, morphological, biochemical or physical characteristics of a cell or organism.
  • "Agronomic traits" specifically refer to measurable indicator parameters of crop plants, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or accumulation rate, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, plant vegetative tissue nitrogen content, plant total free amino acid content, fruit free amino acid content, seed free amino acid content, plant vegetative tissue free amino acid content, total plant protein content, fruit protein content, seed protein content, plant vegetative tissue protein content, herbicide resistance and drought resistance, nitrogen absorption, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance and number of tillers, etc.
  • the present invention provides a new type of CRISPR effector protein, which has the targeted cleavage activity of both the TnpB system and the CRISPR system; that is, it can target the target DNA under the guidance of a reRNA, and can also bind the target DNA in a targeted manner under the guidance of a guide RNA comprising tracrRNA and/or crRNA, such as an sgRNA.
  • This subtype of CRISPR nuclease is also referred to herein as transposon and CRISPR-Cas12 intermediate (TraC) effector proteins.
  • the invention provides an engineered Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system, comprising:
  • the guide RNA is selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA comprising tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) comprising tracrRNA and crRNA;
  • the TraC effector protein can form a CRISPR complex with guide RNA
  • the TraC effector protein can target-bind to a target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, or can target-bind to a target DNA sequence under the guidance of a guide RNA containing tracrRNA and crRNA.
  • the engineered clustered regularly interspaced short palindromic repeats (CRISPR) system is a genome editing system for genome editing in an organism or cells of an organism.
  • the TraC effector protein is as defined below.
  • the tracrRNA contains a non-targeting strand binding sequence (NTB) complementary to a non-targeting strand (NTS).
  • the invention also provides an engineered clustered regularly interspaced short palindromic repeats (CRISPR) vector system comprising one or more constructs, comprising:
  • a second regulatory element operably linked to one or more nucleotide sequences encoding one or more guide RNAs selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA containing tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) containing tracrRNA and crRNA;
  • the TraC effector protein can form a CRISPR complex with guide RNA
  • the TraC effector protein can target-bind to a target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, or can target-bind to a target DNA sequence under the guidance of a guide RNA containing tracrRNA and/or crRNA.
  • the TraC effector protein is as defined below.
  • the tracrRNA contains a non-targeting strand binding sequence (NTB) complementary to a non-targeting strand (NTS).
  • the guide RNA is a guide RNA comprising tracrRNA and/or crRNA, wherein the tracrRNA contains a non-targeting strand binding sequence (NTB) complementary to the non-targeting strand (NTS), and wherein the crRNA of the guide RNA hybridizes to the targeting strand (TS) of the target DNA sequence and the NTB hybridizes to the non-targeting strand (NTS).
  • the one or more guide RNAs hybridize to the target DNA when transcribed, and the guide RNA forms a complex with the TraC effector protein, the complex causing distal cleavage of the target DNA sequence.
  • the target DNA sequence is within a cell, preferably within a eukaryotic cell.
  • the effector protein contains one or more nuclear localization signals.
  • the nucleic acid sequences encoding the effector protein are codon optimized for expression in eukaryotic cells.
  • components a) and b) or their nucleotide sequences are constructed on the same or different vectors.
  • the present invention provides a method for modifying a DNA sequence of interest, which method comprises delivering the system described herein to the DNA sequence of interest or a cell containing the DNA sequence of interest.
  • the invention provides a method of modifying a DNA sequence of interest, the method comprising delivering a composition of a TraC effector protein and one or more nucleic acid components to the DNA sequence of interest, wherein the effector protein is capable both of targeted binding to the target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, and of targeted binding to the target DNA sequence under the guidance of a guide RNA containing tracrRNA and/or crRNA; the effector protein forms a CRISPR complex with the one or more nucleic acid components, and after the complex targets a DNA sequence of interest that is 3' to the protospacer adjacent motif (PAM), the effector protein induces a modification of the DNA sequence of interest.
  • the DNA sequence of interest is within a cell, preferably a eukaryotic cell.
  • the cell is an animal cell or a human cell.
  • the cell is a plant cell.
  • the effector protein comprises one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences or mitochondrial localization sequences.
  • the effector protein and nucleic acid component or a construct expressing the effector protein and nucleic acid component, are comprised in a delivery system.
  • the delivery system includes viruses, virus-like particles, virions, liposomes, vesicles, exosomes, lipid nanoparticles (LNP), N-acetylgalactosamine (GalNAc), or engineered bacteria.
  • the invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof for genome editing in an organism or a cell of an organism, wherein the TraC effector protein or its functional variant is capable of forming a CRISPR complex with a guide RNA;
  • the guide RNA is selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA comprising tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) comprising tracrRNA and crRNA;
  • the TraC effector protein or its functional variant can either target the target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, or target the target DNA sequence under the guidance of a guide RNA containing tracrRNA and crRNA.
  • the invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof for genome editing in organisms or cells of organisms, the TraC effector protein or functional variant thereof:
  • (ii) comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or even 100% sequence identity to one of SEQ ID NOs: 1-37, or comprises an amino acid sequence having one or more, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, amino acid substitutions, deletions or additions relative to one of SEQ ID NOs: 1-37.
  • the effector protein or functional variant thereof is derived from SEQ ID NO: 25.
  • the effector protein or functional variant thereof comprises, relative to the sequence of SEQ ID NO: 25, one or more amino acid substitutions selected from the group consisting of K78R, D86R, S137R, V145R, I147R, P148R, D150R, V228R, V254R, A510R, A278R, K315R, S334R, L343R, A369R, H392R, L394R, S408R, N456R, V500R, A510R, T573R.
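The substitution notation used above (for example K78R: lysine at position 78 replaced by arginine) can be applied to a sequence programmatically. The following Python sketch is only an illustration under stated assumptions: the function name is hypothetical, positions are taken to be 1-based, and the five-residue toy sequence stands in for SEQ ID NO: 25, which is not reproduced here.

```python
import re


def apply_substitutions(seq: str, mutations: list[str]) -> str:
    """Apply point substitutions written as <wild type><1-based position><new residue>."""
    residues = list(seq)
    for mut in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if m is None:
            raise ValueError(f"cannot parse mutation {mut!r}")
        wild, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position outside the sequence")
        # Check against the original sequence so a mislabeled mutation is caught.
        if seq[pos - 1] != wild:
            raise ValueError(f"{mut}: expected {wild} at {pos}, found {seq[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy sequence for illustration only, not a TraC sequence.
print(apply_substitutions("MKDSV", ["K2R", "V5R"]))  # MRDSR
```

Checking the wild-type residue before substituting guards against off-by-one errors, for example when a start-codon methionine is or is not counted.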
  • the effector protein or functional variant thereof comprises any set of mutations selected from the group shown in Table 3 or Table 4 relative to the sequence of SEQ ID NO:25.
  • the effector protein or functional variant thereof comprises an amino acid sequence selected from SEQ ID NOs: 80-87.
  • the TraC effector protein or functional variant thereof has at least guide RNA-mediated sequence-specific targeting capability. That is, the TraC effector protein or its functional variant can form a complex with the guide RNA and bind to a specific target sequence (such as a DNA target sequence).
  • the TraC effector protein or functional variant thereof has guide RNA-mediated sequence-specific targeting capability, as well as double-stranded nucleic acid (e.g., double-stranded DNA) cleavage activity.
  • for example, after the TraC effector protein or its functional variant forms a complex with the guide RNA and binds to a specific target sequence (such as a DNA target sequence), it can cleave double-stranded nucleic acid (such as double-stranded DNA) within or near the target sequence, forming a double-strand break (DSB).
  • the TraC effector protein or functional variant thereof has guide RNA-mediated sequence-specific targeting capabilities, as well as nickase activity. For example, after the TraC effector protein or its functional variant forms a complex with the guide RNA and binds to a specific target sequence (such as a DNA target sequence), it can generate a nick in or near the target sequence.
  • TraC effector proteins with nickase activity or functional variants thereof are also called TraC nickases.
  • the TraC effector protein or functional variant thereof has guide RNA-mediated sequence-specific targeting capabilities but does not have double-stranded nucleic acid cleavage activity and/or nickase activity.
  • Such TraC effector proteins or functional variants thereof that do not have double-stranded nucleic acid cleavage activity and/or nickase activity are also called dead TraC effector proteins.
  • RNA molecules targeting target sequences. Generally speaking, the gRNA of the CRISPR system targets the target sequence through base pairing between the crRNA and the complementary strand of the target sequence.
  • the guide RNA can be selected from i) a guide RNA (reRNA) derived from the right end element of the transposon and/or ii) a guide RNA containing tracrRNA and/or crRNA, such as a single guide RNA (sgRNA) containing tracrRNA and crRNA.
  • the TraC effector protein of the present invention or a functional variant thereof is capable of targeting and binding to a target DNA sequence under the guidance of a guide RNA derived from the right end element of the transposon, and is also capable of targeting and binding to the target DNA sequence under the guidance of a guide RNA containing tracrRNA and/or crRNA.
  • the guide RNA is a guide RNA (reRNA) derived from the right end element of a transposon, for example, the reRNA comprises the scaffold sequence set forth in SEQ ID NO: 77 or 78.
  • the specific form or sequence of reRNA can vary according to the specific TraC effector protein.
  • for its design, reference can be made to Karvelis, T. et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692-696 (2021).
  • the guide RNA comprises tracrRNA and/or crRNA.
  • the guide RNA is a guide RNA formed by complementation of tracrRNA and crRNA.
  • the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, wherein tracrRNA and crRNA are fused.
  • the guide RNA may comprise only crRNA, which may also be referred to as sgRNA. The specific gRNA form or sequence may vary depending on the specific TraC nuclease.
  • the guide RNA containing tracrRNA and/or crRNA may also be called CRISPR system guide RNA.
  • Guide RNA containing tracrRNA and/or crRNA is a conventional form of guide RNA for CRISPR systems.
  • The sequence of the tracrRNA and/or crRNA can be obtained by analyzing sequences near the CRISPR effector protein locus. It is within the ability of those skilled in the art to analyze and obtain tracrRNA and/or crRNA-containing guide RNAs for CRISPR effector proteins.
  • the guide RNA comprising tracrRNA and/or crRNA is derived or matured from the nucleotide sequence of one of SEQ ID NOs: 38-74.
  • the guide RNA includes tracrRNA and crRNA, such as a single guide RNA (sgRNA) including tracrRNA and crRNA.
  • the crRNA comprises the same sequence as the target sequence immediately adjacent to the PAM (e.g., 3' to the PAM), thereby complementary binding to the opposite strand of the PAM (targeting strand).
  • the tracrRNA contains a sequence complementary to a sequence distal to the PAM (in the direction of the target sequence) (non-targeting strand binding sequence, NTB).
  • the non-targeting strand binding sequence is located at the 5' end of the tracrRNA.
  • NTB in tracrRNA to the distal sequence of PAM can help the effector protein-guide RNA complex to open the PAM distal DNA region and improve editing efficiency.
  • the complement of the non-targeting strand binding sequence is from about 10 to about 50 nucleotides from the PAM, such as about 10, about 16, about 20, about 24, about 28, about 30, about 40, or about 50 nucleotides, preferably about 20 nucleotides from the PAM.
  • the non-targeting strand binding sequence is about 5 to about 20 nucleotides in length, preferably about 8 to 12 nucleotides in length, and more preferably about 10 nucleotides in length.
  • the complementary sequence of the non-targeting strand binding sequence overlaps at least partially with the target sequence.
  • the complement of the non-targeting strand binding sequence is included in the target sequence.
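The relationship between the crRNA spacer, the non-targeting strand (NTS) and the NTB described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated in the application: the input is the PAM-containing (non-targeting) strand read 5' to 3' starting at the first nucleotide 3' of the PAM; the "distance from the PAM" is counted as the number of nucleotides between the PAM and the start of the NTS segment bound by the NTB; and the function and parameter names are hypothetical. The defaults follow the preferred values given in the text (about 20 nucleotides from the PAM, about 10 nucleotides long).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Substitute U for T."""
    return dna.upper().replace("T", "U")


def design_spacer_and_ntb(nts_after_pam: str, spacer_len: int = 20,
                          ntb_offset: int = 20, ntb_len: int = 10) -> tuple[str, str]:
    """Return (crRNA spacer, NTB), both as RNA written 5'->3'.

    The spacer has the same sequence as the target next to the PAM, so it pairs
    with the targeting strand. The NTB is the reverse complement of an NTS
    segment, so it pairs with the non-targeting strand.
    """
    if len(nts_after_pam) < max(spacer_len, ntb_offset + ntb_len):
        raise ValueError("input sequence is too short for the requested design")
    spacer = to_rna(nts_after_pam[:spacer_len])
    nts_segment = nts_after_pam[ntb_offset:ntb_offset + ntb_len]
    ntb = to_rna(reverse_complement(nts_segment))
    return spacer, ntb


# Toy sequence for illustration only.
nts = "ACGTACGTACGTACGTACGT" + "GGGGCCCCAA" + "TTTT"
spacer, ntb = design_spacer_and_ntb(nts)
print(spacer)  # ACGUACGUACGUACGUACGU
print(ntb)     # UUGGGGCCCC
```

Lowering `ntb_offset` (for example to 10 with `ntb_len=10`) places the bound NTS segment inside the 20-nucleotide target, which corresponds to the embodiments in which the complement of the NTB overlaps with or is included in the target sequence.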
  • a "target sequence" refers to a sequence of about 20 nucleotides in length in the genome that is characterized by a flanking (e.g., 5' flanking) PAM (protospacer adjacent motif) sequence.
  • PAM is necessary for the complex formed by CRISPR nuclease, such as the TraC effector protein of the present invention or its functional variant, and the guide RNA to recognize the target sequence.
  • the target sequence can be located on any strand of the genomic DNA molecule.
  • the strand bound by crRNA is called the targeting strand (TS), and the strand complementary to the targeting strand is called the non-targeting strand (NTS).
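Because the target sequence is defined by a 5'-flanking PAM and may lie on either strand, candidate targets can be enumerated by scanning both strands. The Python sketch below is an illustration only: the PAM "TTN" is a hypothetical placeholder (this passage does not state the PAM recognized by TraC proteins), and the function name and return format are assumptions.

```python
import re

# Regex fragments for the degenerate codes defined earlier in the description.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
               "K": "[GT]", "H": "[ACT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_targets(genome: str, pam: str, target_len: int = 20) -> list[tuple[str, int, str]]:
    """List (strand, start, target) for every target lying immediately 3' of a PAM.

    `start` is the 0-based index of the target on the strand that was scanned;
    the '-' strand is the reverse complement of the input.
    """
    pam_re = "".join(IUPAC_REGEX[code] for code in pam.upper())
    # A lookahead is used so that overlapping PAM matches are all reported.
    pattern = re.compile(rf"(?=({pam_re})([ACGT]{{{target_len}}}))")
    hits = []
    for strand, seq in (("+", genome.upper()), ("-", reverse_complement(genome))):
        for m in pattern.finditer(seq):
            hits.append((strand, m.start(2), m.group(2)))
    return hits


# Toy example with a shortened target length so the result is easy to check by hand.
print(find_targets("GGTTAACGTCC", "TTN", target_len=4))  # [('+', 5, 'ACGT')]
```

In this toy input the reverse strand also contains a PAM match, but fewer than four nucleotides follow it, so no target is reported there.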
  • the sgRNA comprises the scaffold sequence set forth in SEQ ID NO: 75 or 76.
  • nucleotides 154-209 in the scaffold sequence shown in SEQ ID NO:75 or nucleotides 92-147 in the scaffold sequence shown in SEQ ID NO:76 are reprogrammable regions, and these regions can be reprogrammed to contain non-targeting strand binding sequences (NTB).
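The coordinates above are 1-based and inclusive, so each reprogrammable region spans 56 nucleotides (209 - 154 + 1 = 147 - 92 + 1 = 56). A minimal Python sketch of swapping such a region for a new sequence is given below. It assumes that the whole region is replaced, which the text does not specify; the function name and the toy scaffold are hypothetical, and the actual sequences of SEQ ID NO: 75 and 76 are not reproduced here.

```python
def reprogram_region(scaffold: str, start: int, end: int, new_region: str) -> str:
    """Replace 1-based, inclusive positions start..end of `scaffold` with `new_region`."""
    if not 1 <= start <= end <= len(scaffold):
        raise ValueError("region lies outside the scaffold")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return scaffold[:start - 1] + new_region + scaffold[end:]


# Length of the regions named in the text.
print(209 - 154 + 1, 147 - 92 + 1)  # 56 56

# Toy scaffold for illustration only.
print(reprogram_region("AAAAACCCCCGGGGG", 6, 10, "UUU"))  # AAAAAUUUGGGGG
```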
  • the invention also provides a protein complex of the TraC effector protein or a functional variant thereof and at least one other functional protein.
  • the TraC effector protein or functional variant thereof and the other functional protein form a protein complex via an affinity tag that mediates specific binding.
  • the other functional protein forms a protein complex with the TraC effector protein or a functional variant thereof by specifically binding to the guide RNA.
  • the invention also provides a fusion protein of the TraC effector protein or a functional variant thereof and at least one other functional protein.
  • the other functional protein is a deaminase.
  • the protein complex or fusion protein can be used for base editing in organisms or organism cells.
  • the protein complex or fusion protein containing the TraC effector protein or its functional variant and a deaminase is also called a base editor.
  • the protein complex or fusion protein can comprise one or more of the deaminase enzymes.
  • the deaminase is cytosine deaminase.
  • Cytosine deaminase refers to a deaminase that can accept single-stranded DNA as a substrate and catalyze the deamination of cytidine or deoxycytidine to uracil or deoxyuracil, respectively.
  • Examples of cytosine deaminases include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, double-stranded DNA deaminase (Ddd), single-stranded DNA deaminase (Sdd) (for Ddd and Sdd, see CN202310220057.1 and PCT/CN2023/080052), or functional variants thereof.
  • the cytosine deaminase in the protein complex or fusion protein is capable of deaminating cytidine in the single-stranded DNA produced upon formation of the protein complex or fusion protein-guide RNA-DNA complex into U, after which the C to T base substitution is achieved through base mismatch repair.
  • the protein complex or fusion protein comprising cytosine deaminase further comprises a uracil DNA glycosylase inhibitor (UGI).
  • Uracil DNA glycosylase can catalyze the removal of U from DNA and initiate base excision repair (BER), resulting in the repair of U:G to C:G. Therefore, without being bound by any theory, inclusion of a uracil DNA glycosylase inhibitor (UGI) in the fusion protein of the invention will be able to increase the efficiency of C to T base editing.
  • the deaminase is adenine deaminase.
  • Adenine deaminase refers to a domain that can accept single-stranded DNA as a substrate and catalyze the formation of inosine (I) from adenosine or deoxyadenosine (A).
  • the adenine deaminase in the protein complex or fusion protein can deaminate adenosine in the single-stranded DNA generated upon formation of the protein complex or fusion protein-guide RNA-DNA complex into inosine (I). Because DNA polymerase treats inosine (I) as guanine (G), an A to G substitution can be achieved through base mismatch repair.
  • the adenine deaminase is a DNA-dependent adenine deaminase derived from E. coli tRNA adenine deaminase TadA (ecTadA).
  • the protein complex or fusion protein includes cytosine deaminase and adenine deaminase.
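The net sequence outcome of the two deaminase types described above (C to T for cytosine deaminase, A to G for adenine deaminase) can be illustrated with a short Python sketch. The notion of an editing window with fixed 1-based boundaries is an assumption introduced only for illustration, as are the function and parameter names; the application does not define a window here, and real editing outcomes are not deterministic.

```python
def predicted_edit(strand: str, window: tuple[int, int], deaminase: str) -> str:
    """Return the sequence expected if every editable base in the window is converted.

    `window` is 1-based and inclusive. `deaminase` is "cytosine" (C->T) or
    "adenine" (A->G, via inosine being read as G).
    """
    conversions = {"cytosine": ("C", "T"), "adenine": ("A", "G")}
    source, product = conversions[deaminase]
    start, end = window
    edited = []
    for position, base in enumerate(strand.upper(), start=1):
        if start <= position <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)


# Toy sequence and window for illustration only.
print(predicted_edit("GACCATGCAA", (3, 6), "cytosine"))  # GATTATGCAA
print(predicted_edit("GACCATGCAA", (3, 6), "adenine"))   # GACCGTGCAA
```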
  • the other functional proteins may be transcription activator proteins, transcription repressor proteins, DNA methylases, DNA demethylases, etc., thereby enabling transcriptional regulation functions and/or epigenetic modification functions.
  • the other functional protein may be reverse transcriptase.
  • the protein complex or fusion protein containing the TraC effector protein or its functional variant and reverse transcriptase can be used for large fragment DNA insertion, such as prime editor (Anzalone, A.V., Randolph, P.B., Davis, J.R.
  • the linkers described herein can be non-functional amino acid sequences of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 20-25, 25-50) or more amino acids in length, without secondary or higher-order structure.
  • the linker may be a flexible linker.
  • the TraC effector protein or functional variant thereof or the other functional protein forming a protein complex or the fusion protein is recombinantly produced.
  • the TraC effector protein or functional variant thereof or the other functional protein forming a protein complex or the fusion protein further contains a fusion tag, e.g., a tag for the isolation and/or purification of the TraC effector protein or functional variant thereof, the other functional protein forming a protein complex, or the fusion protein.
  • Methods for recombinantly producing proteins are known in the art, and many tags that can be used to isolate and/or purify proteins are known in the art, including but not limited to His tags, GST tags, etc. Generally speaking, these tags do not alter the activity of the protein of interest.
  • the TraC effector protein of the invention or a functional variant thereof or the other functional protein forming a protein complex or the fusion protein further comprises a nuclear localization sequence (NLS), for example, connected to the nuclear localization sequence through a linker.
  • the one or more NLSs in the TraC effector protein or functional variant thereof or the other functional protein forming a protein complex or the fusion protein should be of sufficient strength to drive accumulation of the TraC effector protein or functional variant thereof or the other functional protein forming a protein complex or the fusion protein in the nucleus in an amount that can achieve its genome editing function.
  • the strength of the nuclear localization activity is determined by the number and position of the NLSs, the one or more specific NLSs used, or a combination of these factors.
  • Exemplary nuclear localization sequences include, but are not limited to, SV40 nuclear localization signal sequence and nucleoplasmin nuclear localization signal sequence.
  • the TraC effector protein or its functional variant or the fusion protein of the present invention can also include other localization sequences, such as a cytoplasmic localization sequence, chloroplast localization sequence, mitochondrial localization sequence, etc.
  • the invention provides the TraC effector protein of the invention or a functional variant thereof or the other functional protein forming a protein complex or the fusion protein in a cell, preferably a eukaryotic cell, more preferably a plant.
  • the invention provides a genome editing system for site-directed modification of a target nucleic acid sequence in a cell genome, which includes the TraC effector protein of the invention or a functional variant thereof or all components forming a protein complex.
  • Said other functional protein or said fusion protein and/or an expression construct comprising a nucleotide sequence encoding said TraC effector protein of the invention or its functional variant or said fusion protein.
  • the terms "genome editing system” and “gene editing system” are used interchangeably and refer to a combination of components required for genome editing of the genome within the cells of an organism, wherein the individual components of the system, e.g. The TraC effector protein or its functional variant or the other functional protein forming a protein complex or the fusion protein, gRNA or corresponding expression construct, etc. can exist independently, or can be used in any combination as exist in the form of compositions.
  • the components of the genome editing system are comprised in a delivery system selected from viruses, virus-like particles, virions, liposomes, vesicles, exosomes, lipid nanoparticles (LNP), N-acetylgalactosamine (GalNAc) or engineered bacteria.
  • the genome editing system further includes at least one guide RNA (gRNA) and/or an expression construct comprising a nucleotide sequence encoding the at least one guide RNA.
  • the guide RNA is selected from i) a guide RNA derived from the right end element of a transposon (reRNA) and/or ii) a guide RNA comprising tracrRNA and/or crRNA, e.g., a single guide RNA comprising tracrRNA and crRNA (sgRNA).
  • the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, for example, the sgRNA comprises the scaffold sequence shown in SEQ ID NO: 75 or 76.
  • the guide RNA derived from the CRISPR system includes tracrRNA and crRNA, such as a single guide RNA (sgRNA) including tracrRNA and crRNA.
  • the crRNA comprises a sequence identical to the target sequence immediately adjacent to the PAM, and thereby binds by complementarity to the strand opposite the PAM.
  • the tracrRNA includes a sequence complementary to a PAM-distal stretch of the target sequence (non-targeting strand binding sequence, NTB).
  • the non-targeting strand binding sequence is located at the 5' end of the tracrRNA.
  • the complement of the non-targeting strand binding sequence is from about 10 to about 50 nucleotides from the PAM, such as about 10, about 16, about 20, about 24, about 28, about 30, about 40, or about 50 nucleotides, preferably about 20 nucleotides from the PAM.
  • the non-targeting strand binding sequence is about 5 to about 20 nucleotides in length, preferably about 8 to 12 nucleotides in length, and more preferably about 10 nucleotides in length.
  • the complementary sequence of the non-targeting strand binding sequence overlaps at least partially with the target sequence.
  • the complement of the non-targeting strand binding sequence is included in the target sequence.
  • nucleotides 154-209 in the scaffold sequence shown in SEQ ID NO:75 or nucleotides 92-147 in the scaffold sequence shown in SEQ ID NO:76 are reprogrammable regions, which can be reprogrammed to contain a non-targeting strand binding sequence (NTB).
  • the 5' or 3' end of the target sequence targeted by the genome editing system of the present invention needs to contain a protospacer adjacent motif (PAM).
  • the specific gRNA form or sequence will vary depending on the specific nuclease.
  • when used for prime editing, the gRNA can be a so-called pegRNA.
  • a pegRNA carries a reverse transcription template (RT) sequence and a primer binding site (PBS) sequence in addition to the sgRNA.
  • the PAM recognized by the nucleases of the invention or functional variants thereof is a T-rich PAM. In some embodiments, the PAM recognized by the nucleases of the invention or functional variants thereof is a G-rich PAM.
  • the PAM can be, for example, 5'-TTTN-3', 5'-TGTNNN-3', PolyT, PolyG, 5'-TTTG-3', 5'-TTC-3', 5'-TGA-3', 5'-YTTC-3', 5'-CTCGTG-3', 5'-GTTG-3', 5'-CTTG-3', 5'-TCTG-3', 5'-TTTA-3', 5'-TTAG-3', where N represents A, G, C or T and Y represents C or G.
  • Based on the presence of PAMs, those skilled in the art can easily determine target sequences in the genome that can be used for targeting and, optionally, editing, and design appropriate guide RNAs accordingly. For example, if there is a PAM sequence 5'-TTC-3' in the genome, then about 18 to about 35, preferably 20, 21, 22 or 23, consecutive nucleotides immediately adjacent to its 5' or 3' side can be used as the target sequence.
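  • The PAM-based target selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicants' design tool: the function name `find_targets`, its parameters, and the choice to scan only the given strand are hypothetical, and only the degenerate code N (A, C, G or T) is supported.

```python
import re

# Only the bases and the degenerate code N are supported in this sketch.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_targets(genome, pam="TTC", length=20, side="3"):
    """Return (pam_start, pam_seq, target_seq) for every PAM hit on the given strand.

    side="3": the target is the `length` nt immediately 3' of the PAM.
    side="5": the target is the `length` nt immediately 5' of the PAM.
    Hits without a full-length target next to them are skipped.
    """
    genome = genome.upper()
    pattern = "".join(CODES[base] for base in pam.upper())
    hits = []
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    for match in re.finditer(f"(?=({pattern}))", genome):
        start = match.start()
        end = start + len(pam)
        if side == "3":
            target = genome[end:end + length]
        else:
            target = genome[max(0, start - length):start]
        if len(target) == length:
            hits.append((start, match.group(1), target))
    return hits
```

  • A real design workflow would also scan the reverse-complement strand and filter candidates for off-target similarity; both are omitted here for brevity.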
  • the at least one guide RNA is encoded by different expression constructs. In some embodiments, the at least one guide RNA is encoded by the same expression construct. In some embodiments, the at least one guide RNA and the TraC effector protein of the invention or a functional variant thereof or the fusion protein are encoded by the same expression construct.
  • the genome editing system may comprise any one selected from:
  • the TraC effector protein of the present invention or a functional variant thereof or the other functional protein forming a protein complex or the fusion protein, and the at least one guide RNA, wherein optionally the TraC effector protein or functional variant thereof or the fusion protein and the at least one guide RNA form a complex;
  • an expression construct comprising a nucleotide sequence encoding the TraC effector protein of the invention or a functional variant thereof or the other functional protein forming a protein complex or the fusion protein, and the at least one guide RNA;
  • an expression construct comprising a nucleotide sequence encoding the TraC effector protein of the invention or a functional variant thereof or the other functional protein forming a protein complex or the fusion protein, and an expression construct comprising a nucleotide sequence encoding the at least one guide RNA;
  • v) an expression construct comprising both a nucleotide sequence encoding the TraC effector protein of the invention or a functional variant thereof or the other functional protein forming a protein complex or the fusion protein and a nucleotide sequence encoding the at least one guide RNA.
  • the genome editing system further comprises a donor nucleic acid molecule comprising a nucleotide sequence to be site-specifically inserted into the genome.
  • the nucleotide sequence to be site-specifically inserted into the genome is flanked by sequences homologous to the sequences flanking the target sequence in the genome. After editing, the nucleotide sequence to be inserted at the specific site can be integrated into the genome through homologous recombination.
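  • As a rough illustration of a donor flanked by homologous sequences, the sketch below takes the genomic sequence on either side of an insertion point as homology arms. The function name `build_donor`, the `cut_index` convention and the default arm length of 50 nt are assumptions made for this example; the text does not specify arm lengths.

```python
def build_donor(genome, cut_index, insert, arm_length=50):
    """Flank `insert` with the genomic sequence on either side of `cut_index`.

    `cut_index` is the 0-based position before which the insert should land.
    `arm_length` (hypothetical default) is the length of each homology arm.
    """
    if cut_index - arm_length < 0 or cut_index + arm_length > len(genome):
        raise ValueError("homology arms would run past the end of the sequence")
    left_arm = genome[cut_index - arm_length:cut_index]
    right_arm = genome[cut_index:cut_index + arm_length]
    return left_arm + insert + right_arm
```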
  • the nucleic acid encoding the TraC effector protein or a functional variant thereof or the other functional protein forming a protein complex or the fusion protein is codon-optimized for the organism from which the cells to be genome edited are derived.
  • Codon optimization refers to modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are more frequently or most frequently used in the genes of that host cell, while maintaining the native amino acid sequence. Different species display specific preferences for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) is often related to the efficiency of messenger RNA (mRNA) translation, which is thought to depend on the nature of the codons being translated and the availability of specific transfer RNA (tRNA) molecules.
  • The predominance of selected tRNAs within a cell generally reflects the codons most frequently used for peptide synthesis.
  • Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization.
  • Codon usage tables are readily available, for example in the Codon Usage Database available at www.kazusa.or.jp/codon/, and these tables can be adapted in different ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28:292 (2000).
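  • The codon replacement described above can be illustrated with a minimal sketch that swaps every codon for the most frequently used synonymous codon while leaving the encoded amino acid sequence unchanged. The usage frequencies in `TOY_USAGE` are invented for illustration and cover only a few amino acids; a real table, such as one from the Codon Usage Database, would be substituted in practice. The function names are likewise hypothetical.

```python
# Standard genetic code entries for the few amino acids used in this toy example.
CODON_TO_AA = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# HYPOTHETICAL relative usage frequencies; not taken from any real organism.
TOY_USAGE = {
    "ATG": 1.0,
    "GCT": 0.2, "GCC": 0.5, "GCA": 0.2, "GCG": 0.1,
    "AAA": 0.3, "AAG": 0.7,
    "TAA": 0.3, "TAG": 0.2, "TGA": 0.5,
}


def split_codons(cds):
    """Split a coding sequence into codons, rejecting partial codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence using the toy codon table."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(cds))


def codon_optimize(cds, usage=TOY_USAGE):
    """Replace each codon with the most frequently used synonymous codon."""
    best = {}
    for codon, amino_acid in CODON_TO_AA.items():
        if amino_acid not in best or usage[codon] > usage[best[amino_acid]]:
            best[amino_acid] = codon
    return "".join(best[CODON_TO_AA[codon]] for codon in split_codons(cds))
```

  • Always choosing the single most frequent codon is the simplest possible strategy; practical tools usually also balance GC content and avoid unwanted motifs, which this sketch does not attempt.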
  • the cells that can be genome-edited by the TraC effector protein or functional variant thereof, the fusion protein, or the genome editing system of the present invention can be derived from prokaryotes or eukaryotes, preferably eukaryotes, including but not limited to mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis, etc.
  • the nucleotide sequence encoding the TraC effector protein or a functional variant thereof or the fusion protein and/or the nucleotide sequence encoding the at least one guide RNA is operably linked to an expression control element such as a promoter.
  • examples of promoters include, but are not limited to, polymerase (pol) I, pol II or pol III promoters.
  • pol I promoters include the chicken RNA pol I promoter.
  • pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the simian virus 40 (SV40) immediate early promoter.
  • pol III promoters include the U6 and H1 promoters. Inducible promoters such as metallothionein promoters can be used.
  • promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, and the Sp6 phage promoter.
  • the promoter may be cauliflower mosaic virus 35S promoter, corn Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, corn U3 promoter, rice actin promoter.
  • the 5' end of the guide RNA coding sequence is linked to the 3' end of a first ribozyme coding sequence, and the first ribozyme is designed to cleave the first ribozyme-guide RNA fusion generated by intracellular transcription at the 5' end of the guide RNA, thereby forming a guide RNA that carries no extra nucleotides at its 5' end.
  • the 3' end of the guide RNA coding sequence is linked to the 5' end of a second ribozyme coding sequence, and the second ribozyme is designed to cleave the guide RNA-second ribozyme fusion generated by intracellular transcription at the 3' end of the guide RNA, thereby forming a guide RNA that carries no extra nucleotides at its 3' end.
  • the 5' end of the guide RNA coding sequence is linked to the 3' end of the first ribozyme coding sequence, and the 3' end of the guide RNA coding sequence is linked to the 5' end of the second ribozyme coding sequence; the first ribozyme is designed to cleave the first ribozyme-guide RNA-second ribozyme fusion generated by intracellular transcription at the 5' end of the guide RNA, and the second ribozyme is designed to cleave the same fusion at the 3' end of the guide RNA, thereby forming a guide RNA that carries no extra nucleotides at its 5' and 3' ends.
  • the design of the first or second ribozyme is within the capabilities of those skilled in the art. For example, see Gao et al., JIPB, Apr, 2014; Vol 56, Issue 4, 343-349.
  • the 5' end of the guide RNA coding sequence is linked to the 3' end of a first tRNA coding sequence, and the first tRNA is designed so that the first tRNA-guide RNA fusion generated by intracellular transcription is cleaved at the 5' end of the guide RNA (i.e., by the precise tRNA processing machinery present within the cell, which precisely excises the 5' and 3' additional sequences of the precursor tRNA to form the mature tRNA), thereby forming a guide RNA that carries no extra nucleotides at its 5' end.
  • the 3' end of the guide RNA coding sequence is linked to the 5' end of a second tRNA coding sequence, and the second tRNA is designed so that the guide RNA-second tRNA fusion generated by intracellular transcription is cleaved at the 3' end of the guide RNA, thereby forming a guide RNA that carries no extra nucleotides at its 3' end.
  • the 5' end of the guide RNA coding sequence is linked to the 3' end of the first tRNA coding sequence, and the 3' end of the guide RNA coding sequence is linked to the 5' end of the second tRNA coding sequence; the first tRNA is designed so that the first tRNA-guide RNA-second tRNA fusion generated by intracellular transcription is cleaved at the 5' end of the guide RNA, and the second tRNA is designed so that the same fusion is cleaved at the 3' end of the guide RNA, thereby forming a guide RNA that carries no extra nucleotides at its 5' and 3' ends.
  • the design of tRNA-guide RNA fusions is within the capabilities of those skilled in the art. For example, see Xie et al., PNAS, Mar 17, 2015; vol.112, no.11, 3570-3575.
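  • The ribozyme- and tRNA-flanked guide designs above share one idea: the transcript is a fusion of a 5' element, the guide RNA and a 3' element, and cleavage exactly at the junctions releases a guide RNA without extra nucleotides. The sketch below models only that bookkeeping on strings. The function names are hypothetical and the flanking sequences in the usage example are placeholders, not real ribozyme or tRNA sequences.

```python
def build_fusion(guide, upstream="", downstream=""):
    """Concatenate 5' element + guide RNA + 3' element in 5'->3' order."""
    return upstream + guide + downstream


def process_fusion(transcript, upstream="", downstream=""):
    """Model precise cleavage at both junctions by stripping the flanking elements.

    Raises ValueError if the transcript does not carry the expected flanks.
    """
    if not transcript.startswith(upstream) or not transcript.endswith(downstream):
        raise ValueError("transcript does not carry the expected flanking elements")
    guide_end = len(transcript) - len(downstream)
    return transcript[len(upstream):guide_end]


# Usage with PLACEHOLDER flanks (not real ribozyme or tRNA sequences).
guide = "GUGAUCGACAGCAACAAGUG"
transcript = build_fusion(guide, upstream="AAAACCCC", downstream="GGGGUUUU")
released = process_fusion(transcript, upstream="AAAACCCC", downstream="GGGGUUUU")
```

  • Passing an empty string for one flank covers the single-sided designs, in which only the 5' end or only the 3' end of the guide RNA is trimmed.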
  • the present invention provides a method for site-directed modification of a target nucleic acid sequence in the genome of a cell, comprising introducing the genome editing system of the present invention into the cell.
  • the invention also provides a method of producing genetically modified cells, comprising introducing the genome editing system of the invention into the cells.
  • the invention also provides genetically modified organisms comprising genetically modified cells or progeny cells thereof produced by the methods of the invention.
  • the target sequence to be modified can be located anywhere in the genome, such as within a functional gene such as a protein-coding gene, or can be located in a gene expression regulatory region such as a promoter region or enhancer region, thereby achieving modification of the function of the gene or modification of its expression. Modifications in the cellular target sequence can be detected by T7EI, PCR/RE or sequencing methods.
  • the gene editing system can be introduced into cells through various methods well known to those skilled in the art.
  • Methods that can be used to introduce the gene editing system of the present invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), biolistics, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
  • the methods of the invention are performed in vitro.
  • the cells are isolated cells, or cells in an isolated tissue or organ.
  • the methods of the present invention can also be performed in vivo.
  • the cells are cells in an organism, and the system of the present invention can be introduced into the cells in vivo by, for example, virus- or Agrobacterium-mediated methods.
  • Cells that can be genome-edited by the method of the present invention can be from prokaryotes or eukaryotes, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis, etc.
  • the invention provides a method of producing a genetically modified plant, comprising introducing a genome editing system of the invention into at least one said plant, thereby causing a modification in the genome of said at least one plant.
  • the genome editing system can be introduced into the plant by various methods well known to those skilled in the art.
  • Methods that can be used to introduce the genome editing system of the present invention into plants include, but are not limited to: the biolistic method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube channel method and the ovary injection method.
  • the modification of the target sequence can be achieved simply by introducing or producing the TraC effector protein or its functional variant or the fusion protein and the guide RNA in plant cells, and the modification can be stably inherited without stably transforming the plant with the genome editing system. This avoids potential off-target effects of a stably present genome editing system and avoids the integration of foreign nucleotide sequences into the plant genome, resulting in higher biosafety.
  • the introduction is performed in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.
  • the introduction includes transforming the genome editing system of the invention into isolated plant cells or tissues, and then regenerating the transformed plant cells or tissues into intact plants.
  • the regeneration is performed in the absence of selection pressure, that is, without the use of any selection agent against the selection gene carried on the expression vector during tissue culture. Not using a selection agent can improve plant regeneration efficiency and obtain herbicide-resistant plants that do not contain exogenous nucleotide sequences.
  • the genome editing system of the present invention can be transformed into specific parts of an intact plant, such as leaves, shoot tips, pollen tubes, young ears, or hypocotyls. This is particularly suitable for the transformation of plants that are difficult to regenerate in tissue culture.
  • in vitro expressed proteins and/or in vitro transcribed RNA molecules are directly transformed into the plant.
  • the protein and/or RNA molecules can achieve genome editing in plant cells and are subsequently degraded by the cells, avoiding the integration of exogenous nucleotide sequences in the plant genome.
  • the method further includes treating (e.g., culturing) the plant cells, tissues, or intact plants into which the genome editing system has been introduced at an elevated temperature (relative to a conventional culture temperature, e.g., room temperature), the elevated temperature being, for example, 32°C.
  • the plant is rice.
  • genetic modification of plants using the methods of the present invention can result in plants whose genomes are free of exogenous polynucleotide integration, that is, non-transgenic (transgene-free) modified plants.
  • the modification is associated with a plant trait, such as an agronomic trait; e.g., the modification results in the plant having altered (preferably improved) traits, e.g., agronomic traits, relative to a wild-type plant.
  • the method further includes the step of screening plants for desired modifications and/or desired traits, such as agronomic traits.
  • the method further includes obtaining progeny of the genetically modified plant.
  • the genetically modified plant or its progeny has the desired modifications and/or desired traits such as agronomic traits.
  • the present invention also provides a genetically modified plant or a progeny thereof or a part thereof, wherein said plant is obtained by the above-mentioned method of the present invention.
  • the genetically modified plant or progeny thereof or parts thereof are non-transgenic.
  • the genetically modified plant or its progeny has the desired genetic modification and/or the desired traits such as agronomic traits.
  • the present invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the above-mentioned method of the present invention with a second plant that does not contain the modification, thereby introducing the modification into the second plant.
  • the genetically modified first plant has desirable traits such as agronomic traits.
  • the invention also encompasses the use of the genome editing system of the invention in disease treatment.
  • the up-regulation, down-regulation, inactivation, activation or mutation correction of disease-related genes can be achieved, thereby achieving prevention and/or treatment of diseases.
  • the genome modification described in the present invention can be located in the protein coding region of a disease-related gene, or can be located in a gene expression regulatory region such as a promoter region or enhancer region, thereby achieving functional modification of the disease-related gene or modification of its expression. Therefore, modification of disease-related genes described herein includes modifications to the disease-related genes themselves (such as protein coding regions), as well as modifications to their expression regulatory regions (such as promoters, enhancers, introns, etc.).
  • a “disease-associated” gene refers to any gene that produces a transcription or translation product at abnormal levels or in an abnormal form in cells derived from disease-affected tissue as compared to non-disease control tissues or cells. Where altered expression is associated with the emergence and/or progression of a disease, it may be a gene that is expressed at an abnormally high level; it may be a gene that is expressed at an abnormally low level.
  • Disease-associated genes also refer to genes that have one or more mutations or genetic variants that are directly responsible for or in linkage disequilibrium with one or more genes responsible for the etiology of the disease. The mutation or genetic variation is, for example, a single nucleotide variation (SNV).
  • the invention also provides methods of treating a disease in a subject in need thereof, comprising delivering to said subject an effective amount of a genome editing system of the invention to modify a gene associated with said disease.
  • the present invention also provides the use of a genome editing system for the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the genome editing system is used to modify a gene associated with the disease.
  • the present invention also provides a pharmaceutical composition for treating a disease in a subject in need thereof, comprising the genome editing system of the present invention, and optionally a pharmaceutically acceptable carrier, wherein the genome editing system is used to modify a gene associated with the disease.
  • the "subject" of the present invention is a mammal, such as a human.
  • the genome editing systems described herein are used to introduce point mutations into nucleic acids.
  • the genome editing systems described herein are used for the correction of genetic defects, such as in the correction of point mutations that result in loss of function in a gene product.
  • the genetic defect is associated with a disease or condition (e.g., a lysosomal storage disease or a metabolic disease, such as type I diabetes).
  • the methods provided herein can be used to introduce inactivating point mutations into genes or alleles encoding gene products associated with diseases or disorders.
  • the protocols described herein are intended for the treatment of patients with diseases associated with or caused by point mutations that can be corrected by the genome editing systems provided herein.
  • the disease is a proliferative disease.
  • the disease is a genetic disease.
  • the disease is a de novo disease.
  • the disease is a metabolic disease.
  • the disease is a lysosomal storage disease.
  • the term "mitochondrial disease or disorder" refers to diseases caused by abnormal mitochondria, such as mitochondrial gene mutations, enzymatic pathways, etc.
  • such disorders include, but are not limited to: neurological disorders, loss of motor control, muscle weakness and pain, gastrointestinal disorders and difficulty swallowing, poor growth, heart disease, liver disease, diabetes, respiratory complications, epilepsy, vision/hearing problems, lactic acidosis, developmental delay, and susceptibility to infection.
  • diseases described in the present invention include, but are not limited to, genetic diseases, circulatory system diseases, muscle diseases, brain, central nervous system and immune system diseases, Alzheimer's disease, secretase disorders, amyotrophic lateral sclerosis (ALS), autism, trinucleotide repeat expansion disorders, hearing disorders, gene-targeted therapy of non-dividing cells (neurons, muscles), liver and kidney diseases, epithelial cell and lung diseases, cancer, Usher syndrome or retinitis pigmentosa-39, cystic fibrosis, HIV and AIDS, beta thalassemia, sickle cell disease, herpes simplex virus, drug addiction, age-related macular degeneration, and schizophrenia.
  • diseases described in the present invention further include the related diseases targeted by the genome editing systems listed in WO2015089465A1 (PCT/US2014/070135), WO2016205711A1 (PCT/US2016/038181), WO2018141835A1 (PCT/EP2018/052491), WO2020191234A1 (PCT/US2020/023713), WO2020191233A1 (PCT/US2020/023712), WO2019079347A1 (PCT/US2018/056146) and WO2021155065A1 (PCT/US2021/015580).
  • Administration of the genome editing systems or pharmaceutical compositions of the invention can be tailored to the body weight and species of the patient or subject.
  • the frequency of administration is within the limits of medical or veterinary medicine. It depends on general factors including the patient or subject's age, sex, general health, other conditions, and the specific condition or symptom being addressed.
  • the invention also includes kits for use in the methods of the invention, the kit comprising the genome editing system of the invention and instructions for use.
  • Kits generally include labels indicating the intended use and/or method of use of the contents of the kit.
  • the term "label" includes any written or recorded material on the kit, supplied with the kit, or otherwise accompanying the kit.
  • in the next step, candidate proteins of an already annotated type, or with sequences similar to annotated proteins, were filtered out through CRISPR type analysis and protein similarity analysis. After removing redundancy, 37 new proteins containing conserved domains were obtained (SEQ ID NO: 1-37). These proteins are defined as intermediates between transposons and CRISPR-Cas12 (TraC for short). Correspondingly, the CRISPR system using TraC as the effector protein is defined as the CRISPR-TraC system.
  • FIG. 2 illustrates the prokaryotic expression of TraC-N483 protein.
  • 483 represents the name of the new protein, and repeat is the CRISPR locus region.
  • NC1 and NC2 are non-coding RNA regions where tracrRNA may exist.
  • Example 2 Using a fluorescent reporter system to screen new CRISPR systems with DNA binding ability in prokaryotic cells
  • the inventor used a fluorescent reporter system to screen the function of the new CRISPR system. This system can screen CRISPR systems with DNA double-strand binding ability.
  • the specific experimental design is shown in Figure 3 and Figure 4:
  • one plasmid with a p15a backbone is used to express the Cas12 protein, a miniCRISPR (repeat-spacer-repeat) and the non-coding RNA sequence (ncRNA); another plasmid with a pBR322 backbone is used to express yellow fluorescent protein (YFP) (pUC-PAM-YFP), in which the 5' untranslated region of the YFP gene contains a target site complementary to the spacer sequence together with a random PAM library upstream of it.
  • the sequence of the target site with its upstream random PAM library is: nnnnnnGTGATCGACAGCAACAAGTGAGCG or nnnnGTGATCGACAGCAACAAGTGAGCG, where nnnnnn and nnnn are PAM libraries of different lengths, covering 4096
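  • The size of such a randomized PAM library follows directly from the number of random positions: four bases at n positions give 4^n variants, i.e. 4096 for six positions and 256 for four. A minimal sketch that enumerates the library and prepends each variant to the fixed sequence quoted above (the function names are hypothetical):

```python
from itertools import product

# Fixed sequence that follows the random positions, as quoted in the text.
FIXED = "GTGATCGACAGCAACAAGTGAGCG"


def pam_library(n):
    """Enumerate every possible n-nt PAM over the alphabet A, C, G, T."""
    return ["".join(bases) for bases in product("ACGT", repeat=n)]


def library_sequences(n, fixed=FIXED):
    """Prepend every n-nt PAM variant to the fixed sequence."""
    return [pam + fixed for pam in pam_library(n)]
```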
  • FIG. 4 shows the screening results of dLbCas12a protein. Bacteria with extremely low YFP expression in the P2 region (B box) were sorted by flow cytometry.
  • the above system can be used to screen the DNA double-stranded binding characteristics of candidate proteins of the new CRISPR system, and the inventors screened some representative candidate proteins.
  • TraC-N287, TraC-445, TraC-483 and TraC-655 were all found to recognize T-rich PAMs, whereas TraC-N701 recognizes a G-rich PAM. This implies that most of these proteins recognize PAMs that are T-rich or contain a small amount of G.
  • the enriched PAMs are consistent with the previously reported finding that most Cas12 family proteins recognize T-rich PAMs.
  • Example 3 Detailed PAM determination for proteins with double-strand cleavage function by next-generation sequencing
  • this system can be used to screen CRISPR systems with DNA double-strand cutting ability.
  • the specific experimental design is as follows:
  • the plasmid containing the PAM library and the plasmid expressing the protein were co-transfected (this is the treatment group).
  • the protein expression vector with the crRNA expression cassette deleted was co-transfected with the plasmid of the PAM library to form the control group.
  • PAMs that can be recognized and cleaved by the protein to be tested will be lost, resulting in a decrease in the proportion of targeted PAMs relative to the control group. Therefore, the PAM sequence of the protein to be tested can be obtained by comparing the depletion status of the two PAM libraries through next-generation sequencing.
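  • The depletion comparison described above can be sketched as a per-PAM log2 ratio of read fractions between the treatment and control libraries. This is an illustrative analysis under assumptions, not the inventors' pipeline: the inputs are assumed to be lists of PAM sequences already extracted from the sequencing reads, and the pseudocount and the threshold of -1.0 (a two-fold drop) are arbitrary choices.

```python
from collections import Counter
from math import log2


def pam_depletion(control_reads, treated_reads, pseudocount=1.0):
    """Return {pam: log2(treated fraction / control fraction)}.

    Negative scores mean the PAM is depleted in the treatment group.
    A pseudocount keeps PAMs that vanish completely from dividing by zero.
    """
    control = Counter(control_reads)
    treated = Counter(treated_reads)
    pams = set(control) | set(treated)
    control_total = sum(control.values()) + pseudocount * len(pams)
    treated_total = sum(treated.values()) + pseudocount * len(pams)
    scores = {}
    for pam in pams:
        control_fraction = (control[pam] + pseudocount) / control_total
        treated_fraction = (treated[pam] + pseudocount) / treated_total
        scores[pam] = log2(treated_fraction / control_fraction)
    return scores


def depleted_pams(scores, threshold=-1.0):
    """List PAMs whose score is at or below the (assumed) depletion threshold."""
    return sorted(pam for pam, score in scores.items() if score <= threshold)
```

  • Because the scores are computed on fractions rather than raw counts, strong depletion of some PAMs makes the remaining PAMs appear mildly enriched; this is expected and does not indicate real enrichment.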
  • Example 4 Using a plasmid interference system to screen new CRISPR systems with DNA cutting ability in prokaryotic cells
  • this example uses a plasmid interference system as a detection model.
  • the specific experimental design is shown in Figure 7.
  • the plasmid interference experimental system was used to verify the specific PAM information of the candidate protein with obvious PAM obtained in Example 3.
  • the specific implementation process is as follows. Taking the candidate protein TraC-459 as an example, Example 3 showed that the protein recognizes a typical 5'-TTC-3' PAM motif whose 3' side is adjacent to the GFP-T1 target site (SEQ ID NO: 79). Using the pUC-polyT-YFP vector as a template, a series of target vectors carrying candidate PAM sequences for TraC-459 were constructed (pUC-TTC-YFP, pUC-GTC-YFP, pUC-TCC-YFP, pUC-TTG-YFP, pUC-TGC-YFP, pUC-CTTC-YFP, pUC-GTTC-YFP and pUC-TTTC-YFP), and the Y53-459 vector and the above target vectors were co-transformed into E.
  • TraC-875 protein has strong cleavage activity under the 5'-CTCGTG-3' PAM motif, and its detailed PAM sequence needs further exploration; TraC-297 protein can extensively and efficiently cleave target sequences under the 5'-GTTG-3', 5'-CTTG-3', 5'-TCTG-3', 5'-TTTA-3' and 5'-TTAG-3' PAM motifs; TraC-949 protein can cleave target sequences under the 5'-NTGA-3' PAM motif, with the highest cleavage efficiency under the 5'-TTGA-3' PAM motif, while the cleavage efficiency for 5'-TTGA-3', 5'-ATGA-3', 5'-GTGA-3' and 5'-CTGA-3' PAM targets is relatively low.
  • The results are shown in Figure 8B.
  • the inventors predicted the secondary structure of the accessory RNA of the V-type CRISPR and TnpB systems and analyzed its structural folding model (Figure 9A).
  • the study found that different protein subtypes can be divided into three categories according to the folding model.
  • the folding model reflects the characteristics of the three types of CRISPR loci, which are the distance between the CRISPR protein and tracrRNA or the absence of tracrRNA.
  • the classification results indicate that the TnpB protein may have experienced a transposon jump to the CRISPR site, or reRNA split into tracrRNA and CRISPR RNA.
  • the diversity of accessory RNA assemblages also supports a model of coevolution of effectors and accessory RNAs (see Figure 9B for the evolutionary model).
  • Example 6 Editing activity of TraC protein using sgRNA as guide RNA
  • the TraC-459 protein in the TraC system was selected to verify its editing activity on DNA.
  • a predicted sgRNA (sgRNA-predicted, referred to as sgRNA-pre) was designed for the VEGFA-T1 site of HEK293T cells by recombining tracrRNA and crRNA (see Figure 10A).
  • truncating the tracrRNA:crRNA complementary region to a length of 11-15 bp, truncating the tracrRNA 5' region to a length of 19-21 bp, or using a spacer length of 22-27 bp gives better editing effects.
  • on this basis, the optimized sgRNA (sgRNA-optimal, referred to as sgRNA-opt) was obtained.
  • This optimization strategy is called the second generation sgRNA optimization method of the TraC system (referred to as sgRNA-v2).
  • Figure 11 shows that sgRNA-opt, as a guide RNA, can significantly improve the editing efficiency of TraC-459.
  • Example 7 Editing activity of TraC protein using reRNA as guide RNA
  • The coevolution model of Example 5 predicts that the TraC protein is an evolutionary descendant of TnpB. Because the TnpB system uses its 3' flanking sequence as a guide RNA for DNA cleavage (Karvelis, T. et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692-696 (2021)), this example examines whether TnpB guide RNAs can be used in the CRISPR system of the TraC protein.
  • The inventors selected the reRNAs of structurally similar TnpB mutant proteins (882-TnpB-reRNA and 966-TnpB-reRNA) as the guide RNAs to be tested.
  • Targeting the GFP-T1 site, the inventors fused the scaffold sequences of 882-TnpB-reRNA and 966-TnpB-reRNA to the GFP-T1 targeting sequence.
  • The plasmid interference experiment of Example 4 was then used to analyze the dsDNA cleavage ability of TraC-459 in E. coli under the different guide RNAs.
  • The experimental results are shown in Figure 12.
  • TraC-459 shows varying degrees of DNA interference activity under the different types of guide RNA relative to the blank vector control (shown as pEmpty in Figure 12).
  • TraC-459 has a dual-guide mechanism and possesses the targeted cleavage pathways of both the TnpB system and the CRISPR system; that is, the TraC effector protein can be targeted to target DNA under the guidance of either a reRNA or an sgRNA.
  • The inventors constructed a dimer sequence of the TraC-459 protein and used the multimer v3 model of AlphaFold2 to predict its three-dimensional folding.
  • None of the five best predicted TraC-459 structures (Rank 1 to 5) showed a dimer interaction (Figure 13).
  • A predicted aligned error (PAE) heatmap provides a distance error for each pair of residues: it gives AlphaFold2's estimate of the positional error at residue x when the predicted and true structures are aligned on residue y. Values range from 0 to 35 Angstroms (white to black).
  • This example obtained a series of optimized TraC-459 variants through arginine-scanning mutagenesis, directed evolution and artificial-intelligence-assisted evolution.
  • The screening process is shown in Figure 14a-c.
  • Some of the selected TraC-459 mutants have higher editing efficiency.
  • The experiment tested the editing efficiency of five mutants from the mutant library in three sets of parallel experiments (Table 2).
  • A ratio of mutant editing efficiency to wild-type TraC-459 editing efficiency greater than 1 indicates that the mutant has higher editing efficiency.
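The comparison in the preceding bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fold_change`, the choice to average the parallel experiments before taking the ratio, and all numbers are hypothetical and are not taken from Table 2.

```python
from statistics import mean


def fold_change(mutant_effs, wild_type_effs):
    """Ratio of mean mutant editing efficiency to mean wild-type efficiency."""
    wt = mean(wild_type_effs)
    if wt == 0:
        raise ValueError("wild-type efficiency is zero; the ratio is undefined")
    return mean(mutant_effs) / wt


# Hypothetical editing efficiencies (%) from three parallel experiments.
wild_type = [1.0, 1.2, 0.8]
mutant = [2.0, 2.4, 1.6]

ratio = fold_change(mutant, wild_type)
improved = ratio > 1  # a ratio above 1 marks a mutant with higher editing efficiency
```

Averaging replicates first is one possible convention; computing a ratio per experiment and then averaging would be an equally reasonable alternative.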
  • Mutants with improved editing efficiency screened by this method are shown in Table 3.
  • A representative five-arginine mutant obtained by arginine-scanning mutagenesis is TraC-5M-7 (S137R, P148R, D150R, K315R and A369R), shown as the TraC-5M-7 mutant in Figure 14b.
  • The editing efficiency of TraC-5M-7 at the VEGFA-T1 site is 24.02 times higher than that of the original TraC-459.
  • TraC mutants with improved editing efficiency designed by this method are shown in Table 3.
  • The inventors developed a deep learning model from data on a series of TraC variants and obtained seven representative mutants, TraC-B22, -B24, -B26, -B32, -B34, -B35 and -B36, with enhanced editing activity in human cells (editing activity is shown in Figure 14d and the mutation sites in Table 4).
  • TraC-459 is a highly compact monomeric Cas12-like protein.
  • The inventors found that it is the smallest monomeric CRISPR effector protein currently known, and that it has a unique sgRNA and reRNA dual-guide mechanism and dual-pairing function not found in other Cas12 subtypes.

Abstract

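The present invention belongs to the field of genetic engineering. Specifically, the invention relates to a novel CRISPR gene editing system and its applications. More specifically, the invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof, as well as gene editing systems based on it and their applications.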

Description

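Novel CRISPR Gene Editing System
Technical Field
The present invention belongs to the field of genetic engineering. Specifically, the invention relates to a novel CRISPR gene editing system and its applications. More specifically, the invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof, as well as gene editing systems based on it and their applications.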
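Background of the Invention
Type V effector proteins are Cas12 proteins with multiple functional domains. Their hallmark is a RuvC-like domain, which is generally responsible for cleaving target DNA. Type V is rich in subtypes: those discovered and classified so far comprise 11 subtypes, Cas12a-k, of which Cas12a and Cas12b have been developed into efficient eukaryotic gene editing systems. Cas12a, also known as Cpf1, contains a RuvC-like domain similar to that of Cas9 or TnpB, but unlike Cas9, Cas12a family proteins lack an HNH domain and use only the RuvC domain to cut both strands of DNA. Cas12b was called C2c1 (Class 2 Candidate 1) when first identified; its C-terminal sequence closely resembles TnpB proteins of the IS605 family but shows no significant sequence similarity to other Class II family proteins. Its Cas genes include a Cas1/Cas4 fusion gene, Cas2, and Cas12b, and maturation of its crRNA also requires a tracrRNA. Cas12c was originally called C2c3 (Class 2 Candidate 3); its Cas genes comprise only Cas1 and Cas12c, and Cas12c shares only limited similarity with the TnpB-homologous portion of Cpf1.
Thanks to improved bioinformatic analysis methods, the number of Type V subtypes has grown explosively in recent years; including Cas12a, Cas12b and Cas12c, a total of 10 Type V subtypes have been discovered. The nucleic acid interference activity of these subtypes has gradually been demonstrated experimentally. For example, scientists at Arbor Biotechnologies showed in vitro that the effector proteins Cas12c, Cas12g, Cas12h and Cas12i, from Type V-C, Type V-G, Type V-H and Type V-I respectively, cleave double-stranded DNA. The effector proteins of the Type V-D and Type V-E subtypes are CasY and CasX respectively, also known as Cas12d and Cas12e; they were first found in metagenomes of "unculturable" microorganisms, and in 2019 they were shown to have genome editing activity in E. coli and human cell lines. The effector protein of the Type V-F subtype, previously regarded as one of the subtypes of the Type V-U family, is Cas14 (also called Cas12f), which can cleave single-stranded DNA and RNA. This Cas14 protein, only one third the size of Cas9, was first developed into the nucleic acid detection tool DETECTR and was recently shown to cleave double-stranded DNA in prokaryotes and eukaryotes. The Casφ protein (also called Cas12j), recently discovered in huge phages, has likewise been shown to cleave double-stranded DNA in prokaryotic, animal and plant cells. In Type V-K systems, the effector protein Cas12k has been "hijacked" by the transposon Tn7: it can generate an R-loop at the target site and uses the targeting ability of the crRNA to achieve site-specific transposition. This hijacked protein offers a new strategy for targeted DNA insertion.
Identifying new CRISPR effector proteins capable of gene editing enriches the genome editing toolbox and is of great significance for biomedicine.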
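Summary of the Invention
The present invention provides at least the following technical solutions:
Embodiment 1. An engineered clustered regularly interspaced short palindromic repeats (CRISPR) system, comprising:
a) a transposon and CRISPR-Cas12 intermediate (TraC) effector protein, or one or more nucleotide sequences encoding the effector protein; and
b) one or more guide RNAs, or nucleotide sequences encoding the one or more guide RNAs,
wherein the guide RNA is selected from i) a guide RNA derived from the right-end element of a transposon (reRNA) and/or ii) a guide RNA comprising a tracrRNA and/or a crRNA, for example a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA;
the TraC effector protein is capable of forming a CRISPR complex with the guide RNA;
the TraC effector protein can be targeted to a target DNA sequence both under the guidance of a guide RNA derived from the right-end element of a transposon and under the guidance of a guide RNA comprising a tracrRNA and/or a crRNA.
Embodiment 2. The engineered CRISPR system of Embodiment 1, wherein the tracrRNA contains a non-target strand binding sequence (NTB) that is complementary to the non-target strand (NTS).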
实施方案3.一种包含一种或多种构建体的工程化的规律间隔成簇短回文重复序列CRISPR载体系统,包含:
a)可操作地连接至编码转座子和CRISPR-Cas12中间体(TraC)效应蛋白的核苷酸序列的第一调节元件;和
b)可操作地连接至一种或多种核苷酸序列的第二调节元件,该一种或多种核苷酸序列编码一种或多种向导RNA,该向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA);
所述TraC效应蛋白能够与向导RNA形成CRISPR复合物;
所述TraC效应蛋白既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和/或crRNA的向导RNA的引导下靶向结合目标DNA序列。
实施方案4.根据实施方案3所述的工程化的CRISPR载体系统,其中所述tracrRNA含有与非靶向链(NTS)互补配对的非靶向链结合序列(NTB)。
实施方案5.如实施方案2或4所述的系统,其中向导RNA为包含tracrRNA和crRNA的向导RNA,其中所述tracrRNA含有与非靶向链(NTS)互补配对的非靶向链结合序列(NTB),其中所述向导RNA通过crRNA与目标DNA序列的靶向链(TS)杂交,并且通过NTB与非靶向链(NTS)杂交。
实施方案6.如实施方案4所述的系统,其中当转录时,该一种或多种向导RNA 与目标DNA杂交,并且向导RNA与该TraC效应蛋白形成复合物,该复合物引起该目标DNA序列远端切割。
实施方案7.如实施方案1-6中任一项所述的系统,其中该目标DNA序列是在细胞内,优选为真核细胞内。
实施方案8.如实施方案1-7中任一项所述的系统,其中该效应蛋白包含一个或多个核定位序列(NLS)、细胞质定位序列、叶绿体定位序列或线粒体定位序列。
实施方案9.如实施方案1-8中任一项所述的系统,其中编码该效应蛋白的这些核酸序列被密码子优化,用于在真核细胞中表达。
实施方案10.如实施方案1-9中任一项所述的系统,其中组分a)和b)或它们的核苷酸序列构建在相同或不同载体上。
实施方案11.一种修饰目的DNA序列的方法,该方法包括将如实施方案1-10中任一项所述的系统地送到所述目的DNA序列或含有该目的DNA序列的细胞中。
实施方案12.一种修饰目的DNA序列的方法,该方法包括将TraC效应蛋白和一种或多种核酸组分的组合物递送至所述目的DNA序列,其中所述效应蛋白既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和crRNA的向导RNA的引导下靶向结合目标DNA序列;该效应蛋白与该一种或多种核酸组分形成CRISPR复合物,并且在所述复合物与是前间区序列邻近基序(PAM)的3’的目的DNA序列靶向结合后,该效应蛋白诱导对该目的DNA序列的修饰。
实施方案13.如实施方案12所述的方法,其中该目的基因是在细胞内,优选为真核细胞。
实施方案14.如实施方案13所述的方法,其中该细胞是动物细胞或人类细胞。
实施方案15.如实施方案13所述的方法,其中该细胞是植物细胞。
实施方案16.如实施方案12所述的方法,其中该效应蛋白包含一个或多个核定位序列(NLS)、细胞质定位序列、叶绿体定位序列或线粒体定位序列。
实施方案17.如实施方案12所述的方法,其中效应蛋白和核酸组分,或表达所述效应蛋白和核酸组分的构建体被包含在一个递送系统中。
实施方案18.如实施方案17所述的方法,其中递送系统包括病毒、病毒样颗粒、病毒体、脂质体、囊泡、外来体、脂质体纳米颗粒(LNP)、N-乙酰半乳糖胺(GalNAc)或工程菌。
实施方案19.一种用于在生物体或生物体细胞中进行基因组编辑的转座子与CRISPR-Cas12中间体(TraC)效应蛋白或其功能性变体,其中所述TraC效应蛋白或其功能性变体能够与向导RNA形成CRISPR复合物;
所述向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA);
所述TraC效应蛋白或其功能性变体既能够在衍生自转座子右端元件的向导RNA的 引导下靶向结合目标DNA序列,也能够在包含tracrRNA和crRNA的向导RNA的引导下靶向结合目标DNA序列。
实施方案20.用于在生物体或生物体细胞中进行基因组编辑的转座子与CRISPR-Cas12中间蛋白(TraC)效应蛋白或其功能性变体,所述TraC效应蛋白或其功能性变体(i)包含选自“TSxxCxxCx”、“GIDRG”和“CxxCGxxxxADxxAA”的至少一个、至少两个或全部三个氨基酸序列基序,其中x代表任意氨基酸,例如任意天然编码的氨基酸;和
(ii)包含与SEQ ID NO:1-37之一具有至少30%、至少35%、至少40%、至少45%、至少50%、至少55%、至少60%、至少65%、至少70%、至少75%、至少80%、至少85%、至少90%、至少95%、至少96%、至少97%、至少98%、至少99%、至少99.1%、至少99.2%、至少99.3%、至少99.4%、至少99.5%、至少99.6%、至少99.7%、至少99.8%、至少99.9%、甚至100%序列相同性的氨基酸序列,或包含相对于SEQ ID NO:1-37具有一或多个,例如1个、2个、3个、4个、5个、6个、7个、8个、9个或10个氨基酸取代、缺失或添加的氨基酸序列。
实施方案21.实施方案20所述的TraC效应蛋白或其功能性变体,其中所述效应蛋白功能性变体衍生自SEQ ID NO:25,且相对于SEQ ID NO:25序列包含选自K78R、D86R、S137R、V145R、I147R、P148R、D150R、V228R、V254R、A510R、A278R、K315R、S334R、L343R、A369R、H392R、L394R、S408R、N456R、V500R、A510R、T573R的一个或多个氨基酸取代。
实施方案22.实施方案20或21所述的TraC效应蛋白或其功能性变体,其中所述效应蛋白功能性变体衍生自SEQ ID NO:25,且相对于SEQ ID NO:25序列包含选自表3或表4中所示的任一组突变。
实施方案23.实施方案20所述的TraC效应蛋白或其功能性变体,其中所述效应蛋白功能性变体包含选自SEQ ID NO:80-87的氨基酸序列。
实施方案24.实施方案20-23中任一项的TraC效应蛋白或其功能性变体,其至少具有向导RNA介导的序列特异性靶向能力。
实施方案25.实施方案20-23中任一项的TraC效应蛋白或其功能性变体,其具有向导RNA介导的序列特异性靶向能力,以及双链核酸切割活性。
实施方案26.实施方案20-23中任一项的TraC效应蛋白或其功能性变体,其具有向导RNA介导的序列特异性靶向能力,以及切口酶活性。
实施方案27.实施方案20-23中任一项的TraC效应蛋白或其功能性变体,其具有向导RNA介导的序列特异性靶向能力,但不具有双链核酸切割活性和/或切口酶活性。
实施方案28.实施方案24-27中任一项的TraC效应蛋白或其功能性变体,其中所述向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA)。
实施方案29.实施方案28的TraC效应蛋白或其功能性变体,所述TraC效应蛋 白或其功能性变体既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和crRNA的向导RNA的引导下靶向结合目标DNA序列。
实施方案30.实施方案28的TraC效应蛋白或其功能性变体,其中所述向导RNA是衍生自TnpB系统的reRNA,例如,所述reRNA包含SEQ ID NO:77或78所示支架序列。
实施方案31.实施方案28的TraC效应蛋白或其功能性变体,其中所述向导RNA是tracrRNA和crRNA的单向导RNA(sgRNA),例如,所述sgRNA包含SEQ ID NO:75或76所示支架序列。
实施方案32.实施方案19-31中任一项的TraC效应蛋白或其功能性变体,其还包含至少一个核定位序列(NLS)、细胞质定位序列、叶绿体定位序列或线粒体定位序列。
实施方案33.一种融合蛋白,包含实施方案19-32中任一项所述TraC效应蛋白或其功能性变体,以及至少一种其它功能性蛋白。
实施方案34.实施方案33的融合蛋白,其中所述其它功能性蛋白是脱氨酶。
实施方案35.实施方案34的融合蛋白,其中所述脱氨酶是胞嘧啶脱氨酶,例如,所述胞嘧啶脱氨酶选自APOBEC1脱氨酶、激活诱导的胞苷脱氨酶(AID)、APOBEC3G、CDA1、人APOBEC3A脱氨酶、双链DNA脱氨酶(Ddd)、单链DNA脱氨酶(Sdd)或它们的功能性变体。
实施方案36.实施方案35的融合蛋白,所述融合蛋白还包含尿嘧啶DNA糖基化酶抑制剂(UGI)。
实施方案37.实施方案34的融合蛋白,其中所述脱氨酶是腺嘌呤脱氨酶,例如,衍生自大肠杆菌tRNA腺嘌呤脱氨酶TadA(ecTadA)的DNA依赖型腺嘌呤脱氨酶。
实施方案38.实施方案34-37中任一项的融合蛋白,其中所述融合蛋白包括胞嘧啶脱氨酶和腺嘌呤脱氨酶。
实施方案39.实施方案33的融合蛋白,其中所述其它功能性蛋白是选自转录激活蛋白、转录抑制蛋白、DNA甲基化酶、DNA去甲基化酶、逆转录酶。
实施方案40.实施方案33-39中任一项的融合蛋白,其中所述融合蛋白的不同部分之间可以独立地通过接头或直接相连。
实施方案41.实施方案33-40中任一项的融合蛋白,其还包含至少一个核定位序列(NLS)、细胞质定位序列、叶绿体定位序列或线粒体定位序列。
实施方案42.实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白在对细胞,优选真核细胞,更优选植物细胞进行基因组编辑的用途。
实施方案43.实施方案42的用途,其中所述基因组编辑包括碱基编辑(Base Editor)、引导编辑(Prime Editor)、PrimeRoot编辑(PrimRoot Editor)。
实施方案44.一种用于对细胞基因组中靶核酸序列进行定点修饰的基因组编辑系 统,其包含:
实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白;和/或
编码实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白的核苷酸序列的表达构建体。
实施方案45.实施方案44的基因组编辑系统,其还包括至少一种向导RNA(gRNA)和/或包含编码所述至少一种向导RNA的核苷酸序列的表达构建体。
实施方案46.实施方案45的基因组编辑系统,其中所述基因组编辑系统包含选自以下的任一项:
i)实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白,和所述至少一种向导RNA,任选地,所述TraC效应蛋白或其功能性变体或所述融合蛋白和所述至少一种向导RNA形成复合物;
ii)包含编码实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白的核苷酸序列的表达构建体,和所述至少一种向导RNA;
iii)实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白,和包含编码所述至少一种向导RNA的核苷酸序列的表达构建体;
iv)包含编码实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白的核苷酸序列的表达构建体,和包含编码所述至少一种向导RNA的核苷酸序列的表达构建体;
v)包含编码实施方案19-32中任一项的TraC效应蛋白或其功能性变体或实施方案33-41中任一项的融合蛋白的核苷酸序列和编码所述至少一种向导RNA的核苷酸序列的表达构建体。
实施方案47.实施方案45-46中任一项的基因组编辑系统,其中所述向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA)。
实施方案48.实施方案47的基因组编辑系统,其中所述向导RNA是衍生自TnpB系统的reRNA,例如,所述reRNA包含SEQ ID NO:77或78所示支架序列。
实施方案49.实施方案45-46中任一项的基因组编辑系统,其中所述向导RNA是包含tracrRNA和crRNA的单向导RNA(sgRNA),例如,所述sgRNA包含SEQ ID NO:75或76所示支架序列。
实施方案50.实施方案47或49的基因组编辑系统,其中所述向导RNA包含tracrRNA和crRNA,例如是包含tracrRNA和crRNA的单向导RNA(sgRNA),其中所述crRNA包含与PAM紧邻的靶序列相同的序列,tracrRNA包含与位于PAM靶序列方向远端的序列互补的序列(非靶向链结合序列,NTB)。
实施方案51.实施方案44-50中任一项的基因组编辑系统,其中所述基因组编辑系统还包含供体核酸分子,所述供体核酸分子包含待定点插入基因组中的核苷酸序列,例 如所述待定点插入基因组中的核苷酸序列两侧包含与基因组中靶序列两侧序列同源的序列。
实施方案52.实施方案44-51中任一项的基因组编辑系统,其中编码所述TraC效应蛋白或其功能性变体或所述融合蛋白的核苷酸序列和/或编码所述至少一种向导RNA的核苷酸序列与表达调控元件如启动子可操作地连接。
实施方案53.实施方案44-52中任一项的基因组编辑系统,其中所述基因组编辑系统的组分被包含在递送体系中,所述递送体系选自病毒、病毒样颗粒、病毒体、脂质体、囊泡、外来体、脂质体纳米颗粒(LNP)、N-乙酰半乳糖胺(GalNAc)或工程菌。
实施方案54.一种产生经遗传修饰的细胞的方法,包括将实施方案44-53中任一项的基因组编辑系统导入所述细胞。
实施方案55.实施方案54的方法,其中所述细胞来自原核生物或真核生物,优选来自哺乳动物如人、小鼠、大鼠、猴、犬、猪、羊、牛、猫;家禽如鸡、鸭、鹅;植物,包括单子叶植物和双子叶植物,例如水稻、玉米、小麦、高粱、大麦、大豆、花生、拟南芥。
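The main advantages of the present invention are:
(1) The invention provides new CRISPR effector proteins and genome editing systems based on them, broadening the choice of genome editing tools and their application scenarios;
(2) The CRISPR effector proteins of the TraC subclade obtained in the invention have a dual-guide mechanism and possess the targeted cleavage pathways of both the TnpB system and the CRISPR system; that is, a TraC effector protein can be targeted to target DNA under the guidance of either a reRNA or an sgRNA, which facilitates multiplex genome editing with a single gene editing tool;
(3) The TraC effector protein obtained in the invention is the smallest of the currently known monomeric Cas12 proteins, which facilitates delivery and editing in vivo;
(4) Guided by an sgRNA containing a non-target strand binding sequence (NTB) that is complementary to the non-target strand (NTS), the TraC effector protein obtained in the invention interacts with the non-target strand of the target dsDNA to form a bubble structure, which helps open and edit PAM-distal DNA.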
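Brief Description of the Drawings
Figure 1. Three structural motifs conserved across 86 Cas12 proteins.
Figure 2. Prokaryotic expression system for TraC proteins.
Figure 3. Flowchart for screening CRISPR systems capable of binding double-stranded DNA using a fluorescent reporter system.
Figure 4. Results of screening the dLbCas12a protein with the fluorescent reporter system.
Figure 5. Flowchart for screening CRISPR systems capable of cleaving double-stranded DNA.
Figure 6. Test results for double-stranded DNA cleavage. A: results for TraC-875, TraC-365, TraC-655 and TraC-445; B: results for TraC-297, TraC-459, TraC-466 and TraC-949. LbCpf1 served as the positive control.
Figure 7. Flowchart for testing the double-stranded DNA cleavage ability of the new CRISPR systems using the plasmid interference system.
Figure 8. Results of testing the TraC-459, TraC-875 and TraC-297 proteins with the plasmid interference system.
Figure 9. A: secondary structure prediction and structural folding model analysis of the accessory RNAs of Type V CRISPR and TnpB systems; B: model of the coevolution of effectors and accessory RNAs.
Figure 10. Optimization of the sgRNA for TraC proteins. A: predicted sgRNA for the TraC-459 protein; B: effect of the truncated length of the tracrRNA:crRNA complementary region, the truncated length of the tracrRNA 5' region, and the spacer length on the editing efficiency of TraC-459.
Figure 11. The optimized sgRNA-opt significantly improves the editing efficiency of TraC-459.
Figure 12. Plasmid interference analysis of dsDNA cleavage by TraC-459 in E. coli under different guide RNAs.
Figure 13. Predicted three-dimensional folding of the TraC-459 protein.
Figure 14. Screening of TraC-459 variants.
Figure 15. Secondary structure prediction of the TraC effector protein sgRNA complex shows a bubble-like region at the end of the tracrRNA.
Figure 16. The TraC effector protein targets DNA through the bubble-like region under the guidance of a reprogrammed sgRNA.
Figure 17. Reprogrammed sgRNAs can improve editing efficiency.
Figure 18. TraC proteins are affected by temperature in plant cells. The editing efficiency of TraC-5M-7 at 32°C is 1 to 29 times higher than at 25°C.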
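Detailed Description of the Invention
I. Definitions
In the present invention, unless otherwise stated, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. The terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are those widely used in the respective fields. For example, the standard recombinant DNA and molecular cloning techniques used in the invention are well known to those skilled in the art and are described more fully in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). To aid understanding of the invention, definitions and explanations of relevant terms are provided below.
As used herein, the term "and/or" covers every combination of the items it connects, and each combination should be regarded as individually listed herein. For example, "A and/or B" covers "A", "A and B", and "B". For example, "A, B and/or C" covers "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".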
“包含”一词在本文中用于描述蛋白质或核酸的序列时,所述蛋白质或核酸可以是由所述序列组成,或者在所述蛋白质或核酸的一端或两端可以具有额外的氨基酸或核苷 酸,但仍然具有本发明所述的活性。此外,本领域技术人员清楚多肽N端由起始密码子编码的甲硫氨酸在某些实际情况下(例如在特定表达系统表达时)会被保留,但不实质影响多肽的功能。因此,本申请说明书和权利要求书中在描述具体的多肽氨基酸序列时,尽管其可能不包含N端由起始密码子编码的甲硫氨酸,然而此时也涵盖包含该甲硫氨酸的序列,相应地,其编码核苷酸序列也可以包含起始密码子;反之亦然。
“基因组”如本文所用不仅涵盖存在于细胞核中的染色体DNA,而且还包括存在于细胞的亚细胞组分(如线粒体、质体)中的细胞器DNA。
如本文所用,“生物体”包括适于基因组编辑的任何生物体,优选真核生物。生物体的实例包括但不限于,哺乳动物如人、小鼠、大鼠、猴、犬、猪、羊、牛、猫;家禽如鸡、鸭、鹅;植物包括单子叶植物和双子叶植物,例如水稻、玉米、小麦、高粱、大麦、大豆、花生、拟南芥等。
“经遗传修饰的生物体”或“经遗传修饰的细胞”意指在其基因组内包含外源多核苷酸或修饰的基因或表达调控序列的生物体或细胞。例如外源多核苷酸能够稳定地整合进生物体或细胞的基因组中,并遗传连续的世代。外源多核苷酸可单独地或作为重组DNA构建体的部分整合进基因组中。修饰的基因或表达调控序列为在生物体或细胞基因组中所述序列包含单个或多个脱氧核苷酸取代、缺失和添加。
针对序列而言的“外源”意指来自外来物种的序列,或者如果来自相同物种,则指通过蓄意的人为干预而从其天然形式发生了组成和/或基因座的显著改变的序列。
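"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and refer to a single- or double-stranded RNA or DNA polymer, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" is adenosine or deoxyadenosine (for RNA or DNA respectively), "C" is cytidine or deoxycytidine, "G" is guanosine or deoxyguanosine, "U" is uridine, "T" is deoxythymidine, "R" is a purine (A or G), "Y" is a pyrimidine (C or T), "K" is G or T, "H" is A or C or T, "I" is inosine, and "N" is any nucleotide. Although nucleotide sequences herein may be written as DNA sequences (containing T), when RNA is meant those skilled in the art can readily determine the corresponding RNA sequence (i.e. by replacing T with U).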
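The single-letter codes and the T-to-U conversion described in this definition can be sketched as a small lookup table. The names `CODES`, `dna_to_rna` and `matches` are illustrative assumptions; inosine ("I") is left out because it is a distinct base rather than a set of alternatives.

```python
# Degenerate nucleotide codes as listed in the definition of "polynucleotide".
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def dna_to_rna(seq):
    """Return the RNA form of a DNA sequence by replacing T with U."""
    return seq.upper().replace("T", "U")


def matches(code, base):
    """True if a concrete DNA base is allowed by a (possibly degenerate) code."""
    return base.upper() in CODES[code.upper()]


rna = dna_to_rna("ttcGATT")
```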
“多肽”、“肽”、和“蛋白质”在本发明中可互换使用,指氨基酸残基的聚合物。该术语适用于其中一个或多个氨基酸残基是相应的天然存在的氨基酸的人工化学类似物的氨基酸聚合物,以及适用于天然存在的氨基酸聚合物。术语“多肽”、“肽”、“氨基酸序列”和“蛋白质”还可包括修饰形式,包括但不限于糖基化、脂质连接、硫酸盐化、谷氨酸残基的γ羧化、羟化和ADP-核糖基化。
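Sequence "identity" has its art-recognized meaning, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the full length of a polynucleotide or polypeptide or along a region of the molecule. (See, e.g.: Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991). Although there are many methods for measuring identity between two polynucleotides or polypeptides, the term "identity" is well known to the skilled person (Carrillo, H. & Lipman, D., SIAM J Applied Math 48:1073 (1988)).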
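As a rough illustration of one way percent identity can be computed, the sketch below counts identical positions in a pair of sequences that have already been aligned. It assumes gap characters are written as "-" and that gapped columns are excluded from the denominator; other conventions exist (for example dividing by the full alignment length), and the text above does not prescribe any particular one.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment: four ungapped columns, three of them identical.
example = percent_identity("AC-GT", "ACAGA")
```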
在肽或蛋白中,合适的保守型氨基酸取代是本领域技术人员已知的,并且一般可以进行而不改变所得分子的生物活性。通常,本领域技术人员认识到多肽的非必需区中的单个氨基酸取代基本上不改变生物活性(参见,例如,Watson et al.,Molecular Biology of the Gene,4th Edition,1987,The Benjamin/Cummings Pub.co.,p.224)。
如本发明所用,“构建体”或“表达构建体”是指适于感兴趣的核苷酸序列在生物体中表达的载体如重组载体。“表达”指功能产物的产生。例如,核苷酸序列的表达可指核苷酸序列的转录(如转录生成mRNA或功能RNA)和/或RNA翻译成前体或成熟蛋白质。
本发明的“表达构建体”可以是线性的核酸片段、环状质粒、病毒载体,或者,在一些实施方式中,可以是能够翻译的RNA(如mRNA)。
本发明的“表达构建体”可包含不同来源的调控序列和感兴趣的核苷酸序列,或相同来源但以不同于通常天然存在的方式排列的调控序列和感兴趣的核苷酸序列。
“调控序列”和“调控元件”可互换使用,指位于编码序列的上游(5'非编码序列)、中间或下游(3'非编码序列),并且影响相关编码序列的转录、RNA加工或稳定性或者翻译的核苷酸序列。调控序列可包括但不限于启动子、翻译前导序列、内含子和多腺苷酸化识别序列。
“启动子”指能够控制另一核酸片段转录的核酸片段。在本发明的一些实施方案中,启动子是能够控制细胞中基因转录的启动子,无论其是否来源于所述细胞。启动子可以是组成型启动子或组织特异性启动子或发育调控启动子或诱导型启动子。
“组成型启动子”指一般将引起基因在多数细胞类型中在多数情况下表达的启动子。“组织特异性启动子”和“组织优选启动子”可互换使用,并且指主要但非必须专一地在一种组织或器官中表达,而且也可在一种特定细胞或细胞型中表达的启动子。“发育调控启动子”指其活性由发育事件决定的启动子。“诱导型启动子”响应内源性或外源性刺激(环境、激素、化学信号等)而选择性表达可操纵连接的DNA序列。
如本文中所用,术语“可操作地连接”指调控元件(例如但不限于,启动子序列、转录终止序列等)与核酸序列(例如,编码序列或开放读码框)连接,使得核苷酸序列的转录被所述转录调控元件控制和调节。用于将调控元件区域可操作地连接于核酸分子的技术为本领域已知的。
将核酸分子(例如质粒、线性核酸片段、RNA等)或蛋白质“导入”生物体是指用所述核酸或蛋白质转化生物体细胞,使得所述核酸或蛋白质在细胞中能够发挥功能。本发明所用的“转化”包括稳定转化和瞬时转化。
“稳定转化”指将外源核苷酸序列导入基因组中,导致外源核苷酸序列稳定遗传。 一旦稳定转化,外源核酸序列稳定地整合进所述生物体和其任何连续世代的基因组中。
“瞬时转化”指将核酸分子或蛋白质导入细胞中,执行功能而没有外源核苷酸序列稳定遗传。瞬时转化中,外源核酸序列不整合进基因组中。
“性状”指细胞或生物体的生理的、形态的、生化的或物理的特征。
“农艺性状”特别是指作物植物的可测量的指标参数,包括但不限于:叶片绿色、籽粒产量、生长速率、总生物量或积累速率、成熟时的鲜重、成熟时的干重、果实产量、种子产量、植物总氮含量、果实氮含量、种子氮含量、植物营养组织氮含量、植物总游离氨基酸含量、果实游离氨基酸含量、种子游离氨基酸含量、植物营养组织游离氨基酸含量、植物总蛋白含量、果实蛋白含量、种子蛋白含量、植物营养组织蛋白质含量、除草剂的抗性抗旱性、氮的吸收、根的倒伏、收获指数、茎的倒伏、株高、穗高、穗长、抗病性、抗寒性、抗盐性和分蘖数等。
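II. Genome Editing System
The invention provides a new class of CRISPR effector proteins that have the targeted cleavage activities of both the TnpB system and the CRISPR system; that is, they can be targeted to target DNA under the guidance of a reRNA and also under the guidance of a guide RNA composed of a tracrRNA and/or a crRNA, such as an sgRNA. CRISPR nucleases of this subtype are also referred to herein as transposon and CRISPR-Cas12 intermediate (TraC) effector proteins.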
因此,在一方面,本发明提供了一种工程化的规律间隔成簇短回文重复序列(CRISPR)系统,包含:
a)转座子和CRISPR-Cas12中间体(TraC)效应蛋白或者编码该效应蛋白的一种或多种核苷酸序列;和
b)一种或多种向导RNA,或者编码该一种或多种向导RNA的核苷酸序列,
其中向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA);
所述TraC效应蛋白能够与向导RNA形成CRISPR复合物;
所述TraC效应蛋白既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和crRNA的向导RNA的引导下靶向结合目标DNA序列。
在一些实施方案中,所述工程化的规律间隔成簇短回文重复序列(CRISPR)系统是用于在生物体或生物体细胞中进行基因组编辑的基因组编辑系统。
在一些实施方案中,所述TraC效应蛋白如下文所定义。
在一些实施方案中,所述tracrRNA含有与非靶向链(NTS)互补配对的非靶向链结合序列(NTB)。
在一方面,本发明还提供了一种包含一种或多种构建体的工程化的规律间隔成簇短回文重复序列CRISPR载体系统,包含:
a)可操作地连接至编码转座子和CRISPR-Cas12中间体(TraC)效应蛋白的核苷酸序 列的第一调节元件;和
b)可操作地连接至一种或多种核苷酸序列的第二调节元件,该一种或多种核苷酸序列编码一种或多种向导RNA,该向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA);
所述TraC效应蛋白能够与向导RNA形成CRISPR复合物;
所述TraC效应蛋白既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和/或crRNA的向导RNA的引导下靶向结合目标DNA序列。
在一些实施方案中,所述TraC效应蛋白如下文所定义。
在一些实施方案中,所述tracrRNA含有与非靶向链(NTS)互补配对的非靶向链结合序列(NTB)。
在一些实施方案中,向导RNA为包含tracrRNA和/或crRNA的向导RNA,其中所述tracrRNA含有与非靶向链(NTS)互补配对的非靶向链结合序列(NTB),其中所述向导RNA通过crRNA与目标DNA序列的靶向链(TS)杂交,并且通过NTB与非靶向链(NTS)杂交。
在一些实施方案中,其中当转录时,该一种或多种向导RNA与目标DNA杂交,并且向导RNA与该TraC效应蛋白形成复合物,该复合物引起该目标DNA序列远端切割。
在一些实施方案中,该目标DNA序列是在细胞内,优选为真核细胞内。
在一些实施方案中,该效应蛋白包含一个或多个核定位信号。
在一些实施方案中,编码该效应蛋白的这些核酸序列被密码子优化,用于在真核细胞中表达。
在一些实施方案中,组分a)和b)或它们的核苷酸序列构建在相同或不同载体上。
在一方面,本发明提供一种修饰目的DNA序列的方法,该方法包括将本文所述的系统地送到所述目的DNA序列或含有该目的DNA序列的细胞中。
在一方面,本发明提供一种修饰目的DNA序列的方法,该方法包括将TraC效应蛋白和一种或多种核酸组分的组合物递送至所述目的DNA序列,其中所述效应蛋白既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和/或crRNA的向导RNA的引导下靶向结合目标DNA序列;该效应蛋白与该一种或多种核酸组分形成CRISPR复合物,并且在所述复合物与是前间区序列邻近基序(PAM)的3’的目的DNA序列靶向结合后,该效应蛋白诱导对该目的DNA序列的修饰。
在一些实施方案中,其中该目的DNA序列是在细胞内,优选为真核细胞。
在一些实施方案中,其中该细胞是动物细胞或人类细胞。
在一些实施方案中,其中该细胞是植物细胞。
在一些实施方案中,其中该效应蛋白包含一个或多个核定位序列(NLS)、细胞质定 位序列、叶绿体定位序列或线粒体定位序列。
在一些实施方案中,其中效应蛋白和核酸组分,或表达所述效应蛋白和核酸组分的构建体被包含在一个递送系统中。
在一些实施方案中,其中递送系统包括病毒、病毒样颗粒、病毒体、脂质体、囊泡、外来体、脂质体纳米颗粒(LNP)、N-乙酰半乳糖胺(GalNAc)或工程菌。
在一方面,本发明提供一种用于在生物体或生物体细胞中进行基因组编辑的转座子与CRISPR-Cas12中间体(TraC)效应蛋白或其功能性变体,其中所述TraC效应蛋白或其功能性变体能够与向导RNA形成CRISPR复合物;
所述向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA);
所述TraC效应蛋白或其功能性变体既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和crRNA的向导RNA的引导下靶向结合目标DNA序列。
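In one aspect, the invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein, or a functional variant thereof, for genome editing in an organism or in cells of an organism, wherein the TraC effector protein or functional variant thereof
(i) comprises at least one, at least two, or all three amino acid sequence motifs selected from "TSxxCxxCx", "GIDRG" and "CxxCGxxxxADxxAA", where x represents any amino acid, for example any naturally encoded amino acid; and
(ii) comprises an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or even 100% sequence identity to one of SEQ ID NOs: 1-37, or an amino acid sequence having one or more, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, amino acid substitutions, deletions or additions relative to SEQ ID NOs: 1-37.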
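Criterion (i) amounts to a pattern search, which can be sketched with regular expressions. This is a minimal sketch assuming that a lowercase "x" stands for exactly one residue written as an uppercase letter; the function names and the toy sequence are hypothetical and do not correspond to any SEQ ID NO.

```python
import re

# The three motifs named in criterion (i); lowercase 'x' is a wildcard position.
MOTIFS = ("TSxxCxxCx", "GIDRG", "CxxCGxxxxADxxAA")


def motif_to_regex(motif):
    """Compile a motif, treating each lowercase 'x' as any single residue letter."""
    return re.compile("".join("[A-Z]" if ch == "x" else re.escape(ch) for ch in motif))


def motifs_present(protein):
    """Return the motifs found anywhere in an uppercase protein sequence."""
    return [m for m in MOTIFS if motif_to_regex(m).search(protein)]


# Toy sequence containing the first two motifs but not the third.
found = motifs_present("MKTSAACGHCWLLGIDRGPP")
```

A protein would meet the "at least two" option of criterion (i) when `len(found) >= 2`; criterion (ii) would still have to be checked separately against the reference sequences.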
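In some embodiments, the effector protein or functional variant thereof is derived from SEQ ID NO: 25.
In some embodiments, the effector protein or functional variant thereof comprises, relative to the sequence of SEQ ID NO: 25, one or more amino acid substitutions selected from K78R, D86R, S137R, V145R, I147R, P148R, D150R, V228R, V254R, A510R, A278R, K315R, S334R, L343R, A369R, H392R, L394R, S408R, N456R, V500R, A510R and T573R.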
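Substitutions written in this notation (original residue, 1-based position, replacement residue) can be applied to a sequence programmatically. The sketch below is illustrative only: the function name is an assumption, and the four-residue toy sequence stands in for SEQ ID NO: 25, which is not reproduced here.

```python
import re


def apply_substitutions(seq, mutations):
    """Apply substitutions such as 'S137R' to a protein sequence (1-based positions)."""
    residues = list(seq)
    for mut in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if m is None:
            raise ValueError(f"unrecognised mutation: {mut}")
        old, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position {pos} is outside the sequence")
        if residues[pos - 1] != old:
            # Guard against applying a mutation to the wrong reference sequence.
            raise ValueError(f"{mut}: position {pos} holds {residues[pos - 1]}, not {old}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy example on a made-up four-residue sequence.
variant = apply_substitutions("MKSD", ["K2R", "D4R"])
```

Checking the original residue before substituting is a deliberate design choice: it catches numbering offsets, for example when a sequence is written with or without the N-terminal methionine.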
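In some embodiments, the effector protein or functional variant thereof comprises, relative to the sequence of SEQ ID NO: 25, any one set of mutations selected from those shown in Table 3 or Table 4.
In some specific embodiments, the effector protein or functional variant thereof comprises an amino acid sequence selected from SEQ ID NOs: 80-87.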
在一些实施方案中,所述TraC效应蛋白或其功能性变体至少具有向导RNA介导的 序列特异性靶向能力。也就是说,所述TraC效应蛋白或其功能性变体能够与向导RNA形成复合物并结合至特定靶序列(如DNA靶序列)。
在一些实施方案中,所述TraC效应蛋白或其功能性变体具有向导RNA介导的序列特异性靶向能力,以及双链核酸(如双链DNA)切割活性。例如,所述TraC效应蛋白或其功能性变体与向导RNA形成复合物并结合至特定靶序列(如DNA靶序列)后,能够在靶序列内或附近切割双链核酸(如双链DNA),形成双链断裂(DSB)。
在一些实施方案中,所述TraC效应蛋白或其功能性变体具有向导RNA介导的序列特异性靶向能力,以及切口酶活性。例如,所述TraC效应蛋白或其功能性变体与向导RNA形成复合物并结合至特定靶序列(如DNA靶序列)后,能够在靶序列内或附近产生切口(nick)。具有切口酶活性的TraC效应蛋白或其功能性变体也称为TraC切口酶。
在一些实施方案中,所述TraC效应蛋白或其功能性变体具有向导RNA介导的序列特异性靶向能力,但不具有双链核酸切割活性和/或切口酶活性。这样的不具有双链核酸切割活性和/或切口酶活性的TraC效应蛋白或其功能性变体也称为死亡的TraC效应蛋白。
“向导RNA”和“gRNA”在本文中可互换使用,指的是能够与TraC效应蛋白或其功能性变体形成复合物并由于与靶序列具有一定相同性而能够将所述复合物靶向靶序列的RNA分子。通常而言,CRISPR系统的gRNA通过crRNA与靶序列互补链之间的碱基配对而靶向所述靶序列。
本发明中,所述向导RNA可以选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA)。
在一些实施方案中,本发明的所述TraC效应蛋白或其功能性变体既能够在衍生自转座子右端元件的向导RNA的引导下靶向结合目标DNA序列,也能够在包含tracrRNA和/或crRNA的向导RNA的引导下靶向结合目标DNA序列。
在一些实施方案中,所述向导RNA是衍生自转座子的右端元件的向导RNA(reRNA),例如,所述reRNA包含SEQ ID NO:77或78所示支架序列。具体的reRNA的形式或序列可依据具体TraC效应蛋白而不同,设计可以参考Karvelis,T.et al.Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease.Nature 599,692-696(2021).
在一些实施方案中,所述向导RNA包含tracrRNA和/或crRNA。在一些实施方案中,所述向导RNA是tracrRNA和crRNA通过互补形成的向导RNA。在一些实施方案中,所述向导RNA是包含tracrRNA和crRNA的单向导RNA(sgRNA),其中tracrRNA和crRNA融合。在一些实施方式中,所述向导RNA可以仅包含crRNA,其也可称为sgRNA。具体的gRNA的形式或序列可依据具体TraC核酸酶而不同。
本文中,包含tracrRNA和/或crRNA的向导RNA也可称为CRISPR系统向导RNA。包含tracrRNA和/或crRNA的向导RNA是CRISPR系统的常规向导RNA形式。tracrRNA 和/或crRNA的序列可以通过分析CRISPR效应蛋白基因座附近的序列获得。分析并获得CRISPR效应蛋白的包含tracrRNA和/或crRNA的向导RNA在本领域技术人员的能力范围内。
在一些实施方案中,所述包含tracrRNA和/或crRNA的向导RNA衍生自或成熟自SEQ ID NO:38-74之一的核苷酸序列。
在一些实施方案中,所述向导RNA包含tracrRNA和crRNA,例如是包含tracrRNA和crRNA的单向导RNA(sgRNA)。在一些实施方案中,所述crRNA包含与PAM紧邻(例如在PAM的3’)的靶序列相同的序列,由此与PAM的相对链(靶向链)互补结合。在一些实施方案中,tracrRNA包含与在PAM远端(在靶序列方向)一段序列互补的序列(非靶向链结合序列,NTB)。在一些实施方案中,所述非靶向链结合序列位于tracrRNA的5’末端。
tracrRNA中的NTB与PAM远端的序列的结合可以有助于效应蛋白-向导RNA复合物打开PAM远端DNA区域,提高编辑效率。
在一些实施方案中,所述非靶向链结合序列的互补序列距离PAM大约10-大约50个核苷酸,例如大约10、大约16、大约20、大约24、大约28、大约30、大约40、或大约50个核苷酸,优选距离PAM大约20个核苷酸,。在一些实施方案中,所述非靶向链结合序列长度为大约5-大约20个核苷酸,优选约8-12个核苷酸,更优选大约10个核苷酸。在一些实施方案中,所述非靶向链结合序列的互补序列与靶序列至少部分重叠。在一些实施方案中,所述非靶向链结合序列的互补序列包含于所述靶序列中。
如本文所用,“靶序列”是指基因组中由侧翼(例如5’侧翼)的PAM(前间区序列邻近基序)序列所表征的长度约20个核苷酸的序列。通常而言,PAM是CRISPR核酸酶如本发明的TraC效应蛋白或其功能性变体与向导RNA形成的复合物识别靶序列所必需的。基于PAM的存在,本领域技术人员可以容易地确定基因组中可用于靶向的靶序列。而且取决于PAM的位置,靶序列可以位于基因组DNA分子的任一条链上,crRNA结合的链称为靶向链(TS),与靶向链互补的链称为非靶向链(NTS)。
在一些实施方案中,所述sgRNA包含SEQ ID NO:75或76所示支架序列。
在一些实施方案中,SEQ ID NO:75所示支架序列中第154-209位核苷酸或SEQ ID NO:76所示支架序列中第92-147位核苷酸为可重编程区,该区域可以被重编程为包含非靶向链结合序列(NTB)。
在一方面,本发明还提供了所述TraC效应蛋白或其功能性变体与至少一种其它功能性蛋白的蛋白复合物。在一些实施方案中,所述TraC效应蛋白或其功能性变体和所述其它功能性蛋白通过介导特异性结合的亲和性标签而形成蛋白复合物。在一些实施方案中,所述其它功能性蛋白通过特异性结合向导RNA而和所述TraC效应蛋白或其功能性变体形成蛋白复合物。
在一方面,本发明还提供了所述TraC效应蛋白或其功能性变体与至少一种其它功能性蛋白的融合蛋白。
在一些实施方案中,所述其它功能性蛋白是脱氨酶。由此,所述蛋白复合物或融合蛋白可以用于在生物体或生物体细胞中进行碱基编辑。包含所述TraC效应蛋白或其功能性变体和脱氨酶的蛋白复合物或融合蛋白也称为碱基编辑器。在一些实施方案中,所述蛋白复合物或融合蛋白可以包含一或多个所述脱氨酶。
在一些实施方案中,所述脱氨酶是胞嘧啶脱氨酶。“胞嘧啶脱氨酶”指的是能够接受单链DNA作为底物,催化胞苷或脱氧胞苷分别脱氨化为尿嘧啶或脱氧尿嘧啶的脱氨酶。可用于本发明的胞嘧啶脱氨酶的实例包括但不限于例如APOBEC1脱氨酶、激活诱导的胞苷脱氨酶(AID)、APOBEC3G、CDA1、人APOBEC3A脱氨酶、双链DNA脱氨酶(Ddd)、单链DNA脱氨酶(Sdd)(Ddd和Sdd参考CN202310220057.1、PCT/CN2023/080052)或它们的功能性变体。所述文献或专利各自通过引用整体并入本文。
在本发明的一些实施方案中,蛋白复合物或融合蛋白中的胞苷脱氨酶能够将蛋白复合物或融合蛋白-向导RNA-DNA复合物形成中产生的单链DNA的胞苷脱氨转换成U,再通过碱基错配修复实现C至T的碱基替换。
在一些实施方式中,所述包含胞嘧啶脱氨酶的蛋白复合物或融合蛋白还包含尿嘧啶DNA糖基化酶抑制剂(UGI)。尿嘧啶DNA糖基化酶可以催化U从DNA上的去除并启动碱基切除修复(BER),导致将U:G修复成C:G。因此,不受任何理论限制,在本发明的融合蛋白包含尿嘧啶DNA糖基化酶抑制剂(UGI)将能够增加C至T碱基编辑的效率。
在一些实施方案中,所述脱氨酶是腺嘌呤脱氨酶。“腺嘌呤脱氨脱氨酶”是指能够接受单链DNA作为底物,催化腺苷或脱氧腺苷(A)形成肌苷(I)的结构域。
在本发明中,蛋白复合物或融合蛋白中的腺嘌呤脱氨酶能够将蛋白复合物或融合蛋白-向导RNA-DNA复合物形成中产生的单链DNA的腺苷脱氨转换成肌苷(I),由于DNA聚合酶会将肌苷(I)当做鸟嘌呤(G)处理,因此通过碱基错配修复可以实现A至G的取代。
在一些实施方案中,所述腺嘌呤脱氨酶是衍生自大肠杆菌tRNA腺嘌呤脱氨酶TadA(ecTadA)的DNA依赖型腺嘌呤脱氨酶。
在一些实施方案中,所述蛋白复合物或融合蛋白包括胞嘧啶脱氨酶和腺嘌呤脱氨酶。
在一些实施方案中,所述其它功能蛋白可以是转录激活蛋白、转录抑制蛋白、DNA甲基化酶、DNA去甲基化酶等,从而能够实现转录调控功能和/或表观遗传修饰功能。在一些实施方案中,所述其它功能蛋白可以是逆转录酶。包含所述TraC效应蛋白或其功能性变体与逆转录酶的蛋白复合物或融合蛋白可以用于大片段DNA插入,例如引导编辑(prime editor)(Anzalone,A.V.,Randolph,P.B.,Davis,J.R.et al.Search-and-replace genome editing without double-strand breaks or donor DNA.Nature 576,149–157(2019).)、PrimeRoot编辑(PrimeRoot editor)(Sun,C.,Lei,Y.,Li,B.et al.Precise integration of large DNA sequences in plant genomes using PrimeRoot editors.Nat Biotechnol(2023).),所述文献各自通过引用整体并入本文。
本发明的融合蛋白的不同部分之间可以独立地通过接头或直接相连。本文所述的接头可以是长1-50个(例如1、2、3、4、5、6、7、8、9、10、11、12、13、14、15、16、 17、18、19、20个或20-25个、25-50个)或更多个氨基酸、无二级以上结构的非功能性氨基酸序列。例如,所述接头可以是柔性接头。
在本发明各方面的在一些实施方案中,所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白是重组产生的。在本发明各方面的在一些实施方案中,所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白还含有融合标签,例如用于TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白分离/和或纯化的标签。重组产生蛋白质的方法是本领域已知的。并且本领域已知多种可以用于分离/和或纯化蛋白质的标签,包括但不限于His标签、GST标签等。通常而言,这些标签不会改变目的蛋白的活性。
在本发明各方面的一些实施方案中,本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白还包含核定位序列(NLS),例如,通过接头与所述核定位序列相连。一般而言,所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白中的一个或多个NLS应具有足够的强度,以便在细胞核中驱动所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白以可实现其基因组编辑功能的量积聚。一般而言,核定位活性的强度由所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白中NLS的数目、位置、所使用的一个或多个特定的NLS、或这些因素的组合决定。示例性的核定位序列包括但不限于SV40核定位信号序列、nucleoplasmin核定位信号序列。此外,根据所需要编辑的DNA位置,本发明的所述TraC效应蛋白或其功能性变体或所述融合蛋白还可以包括其他的定位序列,例如细胞质定位序列、叶绿体定位序列、线粒体定位序列等。
在一方面,本发明提供本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白在对细胞,优选真核细胞,更优选植物细胞进行基因组编辑的用途。
在一方面,本发明提供了一种用于对细胞基因组中靶核酸序列进行定点修饰的基因组编辑系统,其包含本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白和/或包含编码本发明的所述TraC效应蛋白或其功能性变体或所述融合蛋白的核苷酸序列的表达构建体。
在本文中,术语“基因组编辑系统”和“基因编辑系统”可互换使用,是指用于对生物体细胞内基因组进行基因组编辑所需的成分的组合,其中所述系统的各个成分,例如所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白、gRNA或相应的表达构建体等可以各自独立地存在,或者可以以任意的组合作为组合物的形式存在。在一些实施方案中,所述基因组编辑系统的组分被包含在递送体系中,所述递送体系选自病毒、病毒样颗粒、病毒体、脂质体、囊泡、外来体、脂质体纳米颗粒(LNP)、N-乙酰半乳糖胺(GalNAc)或工程菌。
在一些实施方案中,所述基因组编辑系统还包括至少一种向导RNA(gRNA)和/或包含编码所述至少一种向导RNA的核苷酸序列的表达构建体。
在一些实施方案中,所述向导RNA选自i)衍生自转座子右端元件的向导RNA(reRNA)和/或ii)包含tracrRNA和/或crRNA的向导RNA,例如包含tracrRNA和crRNA的单向导RNA(sgRNA)。
在一些实施方案中,所述向导RNA是包含tracrRNA和crRNA的单向导RNA(sgRNA),例如,所述sgRNA包含SEQ ID NO:75或76所示支架序列。
在一些实施方案中,所述衍生自CRISPR系统的向导RNA包含tracrRNA和crRNA,例如是包含tracrRNA和crRNA的单向导RNA(sgRNA)。在一些实施方案中,所述crRNA包含与PAM紧邻的靶序列相同的序列,由此与PAM的相对链互补结合。在一些实施方案中,tracrRNA包含与PAM在靶序列方向远端的一段序列互补的序列(非靶向链结合序列,NTB)。在一些实施方案中,所述非靶向链结合序列位于tracrRNA的5’末端。
在一些实施方案中,所述非靶向链结合序列的互补序列距离PAM大约10-大约50个核苷酸,例如大约10、大约16、大约20、大约24、大约28、大约30、大约40、或大约50个核苷酸,优选距离PAM大约20个核苷酸,。在一些实施方案中,所述非靶向链结合序列长度为大约5-大约20个核苷酸,优选约8-12个核苷酸,更优选大约10个核苷酸。在一些实施方案中,所述非靶向链结合序列的互补序列与靶序列至少部分重叠。在一些实施方案中,所述非靶向链结合序列的互补序列包含于所述靶序列中。
在一些实施方案中,SEQ ID NO:75所示支架序列中第154-209位核苷酸或SEQ ID NO:76所示支架序列中第92-147位核苷酸为可重编程区,该区域可以被重编程为包含非靶向链结合序列(NTB)。
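In general, the 5' or 3' end of a target sequence targeted by the genome editing system of the invention must include a protospacer adjacent motif (PAM). The specific form or sequence of the gRNA varies with the particular nuclease.
In some embodiments, when used for prime editing, the gRNA may be a so-called pegRNA. A pegRNA adds a reverse transcription template (RT) sequence and a primer binding site (PBS) sequence to the sgRNA.
In some embodiments, the PAM recognized by the nuclease of the invention or functional variant thereof is a T-rich PAM. In some embodiments, the PAM recognized by the nuclease of the invention or functional variant thereof is a G-rich PAM. The PAM may be, for example, 5'-TTTN-3', 5'-TGTNNN-3', PolyT, PolyG, 5'-TTTG-3', 5'-TTC-3', 5'-TGA-3', 5'-YTTC-3', 5'-CTCGTG-3', 5'-GTTG-3', 5'-CTTG-3', 5'-TCTG-3', 5'-TTTA-3' or 5'-TTAG-3', where N represents A, G, C or T and Y represents C or G.
Based on the presence of a PAM, those skilled in the art can readily identify target sequences in the genome that can be targeted, and optionally edited, and design suitable guide RNAs accordingly. For example, if a PAM sequence 5'-TTC-3' is present in the genome, about 18 to about 35, preferably 20, 21, 22 or 23, contiguous nucleotides immediately 5' or 3' of it can serve as the target sequence.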
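The PAM-based target search described above can be sketched as follows. This is a minimal illustration under stated assumptions: it looks only for targets immediately 3' of the PAM, uses a 20-nucleotide target by default, supports only A/C/G/T/N in the PAM, and scans both strands; the function names are hypothetical. Positions reported for the "-" strand are coordinates in the reverse-complemented sequence.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(dna, pam="TTC", length=20):
    """Yield (strand, pam_start, target) for each PAM followed by a full-length 3' target."""
    # A lookahead lets overlapping PAM occurrences be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in pam) + "))")
    forward = dna.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for m in pattern.finditer(seq):
            start = m.start() + len(pam)
            target = seq[start:start + length]
            if len(target) == length:
                yield strand, m.start(), target


# Toy sequence with a single 5'-TTC-3' PAM followed by a 20-nt target.
hits = list(find_targets("GGTTCACGTACGTACGTACGTACGTGG"))
```

Calling `find_targets(dna, pam="NTGA")` would search for a degenerate PAM in the same way.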
在一些实施方案中,所述至少一种向导RNA由不同表达构建体编码。在一些实施 方案中,所述至少一种向导RNA由同一表达构建体编码。在一些实施方案中,所述至少一种向导RNA和本发明的所述TraC效应蛋白或其功能性变体或所述融合蛋白由同一表达构建体编码。
例如,在一些实施方案中,所述基因组编辑系统可以包含选自以下的任一项:
i)本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白和所述至少一种向导RNA,任选地,所述TraC效应蛋白或其功能性变体或所述融合蛋白和所述至少一种向导RNA形成复合物;
ii)包含编码本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白的核苷酸序列的表达构建体,和所述至少一种向导RNA;
iii)本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白,和包含编码所述至少一种向导RNA的核苷酸序列的表达构建体;
iv)包含编码本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白的核苷酸序列的表达构建体,和包含编码所述至少一种向导RNA的核苷酸序列的表达构建体;
v)包含编码本发明的所述TraC效应蛋白或其功能性变体或形成蛋白复合物的所述其它功能性蛋白或所述融合蛋白的核苷酸序列和编码所述至少一种向导RNA的核苷酸序列的表达构建体。
在一些实施方案中,所述基因组编辑系统还包含供体核酸分子,所述供体核酸分子包含待定点插入基因组中的核苷酸序列。在一些实施方案中,待定点插入基因组中的核苷酸序列两侧包含与基因组中靶序列两侧序列同源的序列。在编辑后,所述待定点插入基因组中的核苷酸序列可以通过同源重组整合进基因组中。
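To obtain efficient expression in cells, in some embodiments of the invention the nucleotide sequence encoding the TraC effector protein or functional variant thereof, or the other functional protein forming the protein complex, or the fusion protein, is codon-optimized for the organism from which the cell to be genome-edited is derived.
Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence (for example about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is used more frequently or most frequently in the genes of that host cell, while maintaining the native amino acid sequence. Different species exhibit particular preferences for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which in turn is thought to depend on the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons most frequently used in peptide synthesis. Genes can therefore be tailored for optimal expression in a given organism based on codon optimization. Codon usage tables are readily available, for example in the Codon Usage Database available at www.kazusa.or.jp/codon/, and these tables can be adapted in various ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28:292 (2000).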
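The core idea, swapping each codon for a synonymous codon preferred by the host while leaving the protein unchanged, can be sketched in Python. Everything specific here is an assumption for illustration: the `PREFERRED` table describes an imaginary host, and `CODON_TO_AA` covers only the handful of standard-code codons used in the example. A real workflow would draw on a full codon usage table for the target organism.

```python
# Hypothetical preferred codon per amino acid for an imaginary host organism.
PREFERRED = {"M": "ATG", "K": "AAG", "R": "CGC", "L": "CTG", "*": "TGA"}

# Fragment of the standard genetic code covering the codons used below.
CODON_TO_AA = {
    "ATG": "M", "AAA": "K", "AAG": "K",
    "AGA": "R", "CGC": "R", "TTA": "L", "CTG": "L",
    "TAA": "*", "TGA": "*",
}


def split_codons(cds):
    """Split a coding sequence into uppercase codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence using the codon table fragment above."""
    return "".join(CODON_TO_AA[c] for c in split_codons(cds))


def codon_optimize(cds):
    """Swap each codon for the host-preferred synonym, keeping the protein unchanged."""
    return "".join(PREFERRED[CODON_TO_AA[c]] for c in split_codons(cds))


native = "ATGAAAAGATTATAA"
optimized = codon_optimize(native)
```

Always choosing the single most frequent codon is the simplest strategy; the definition above also allows replacing only some codons or choosing merely "more frequently" used ones.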
可通过本发明的所述TraC效应蛋白或其功能性变体或所述融合蛋白或基因组编辑系统进行基因组编辑的细胞所来自的生物体可以是原核生物或真核生物,优选是真核生物,包括但不限于,哺乳动物如人、小鼠、大鼠、猴、犬、猪、羊、牛、猫;家禽如鸡、鸭、鹅;植物包括单子叶植物和双子叶植物,例如水稻、玉米、小麦、高粱、大麦、大豆、花生、拟南芥等。
在本发明一些实施方式中,编码所述TraC效应蛋白或其功能性变体或所述融合蛋白的核苷酸序列和/或编码所述至少一种向导RNA的核苷酸序列与表达调控元件如启动子可操作地连接。
本发明可使用的启动子的实例包括但不限于聚合酶(pol)I、pol II或pol III启动子。pol I启动子的实例包括鸡RNA pol I启动子。pol II启动子的实例包括但不限于巨细胞病毒立即早期(CMV)启动子、劳斯肉瘤病毒长末端重复(RSV-LTR)启动子和猿猴病毒40(SV40)立即早期启动子。pol III启动子的实例包括U6和H1启动子。可以使用诱导型启动子如金属硫蛋白启动子。启动子的其他实例包括T7噬菌体启动子、T3噬菌体启动子、β-半乳糖苷酶启动子和Sp6噬菌体启动子。当用于植物时,启动子可以是花椰菜花叶病毒35S启动子、玉米Ubi-1启动子、小麦U6启动子、水稻U3启动子、玉米U3启动子、水稻肌动蛋白启动子。
在一些实施方式中,为了在细胞内精确产生向导RNA,在编码所述至少一种向导RNA的核苷酸序列的表达构建体中,其中所述向导RNA编码序列的5’端连接至第一核酶编码序列的3’端,所述第一核酶被设计为在所述向导RNA的5’末端切割细胞内转录生成的第一核酶-向导RNA融合物,由此形成不携带5’端额外核苷酸的向导RNA。在一实施方案中,所述向导RNA编码序列的3’端连接至第二核酶编码序列的5’端,所述第二核酶被设计为在所述向导RNA的3’末端切割细胞内转录生成的向导RNA-第二核酶融合物,由此形成不携带3’端额外核苷酸的向导RNA。在一些实施方案中,所述向导RNA编码序列的5’端连接至第一核酶编码序列的3’端,所述向导RNA编码序列的3’端连接至第二核酶编码序列的5’端,所述第一核酶被设计为在所述向导RNA的5’末端切割细胞内转录生成的第一核酶-向导RNA-第二核酶融合物,所述第二核酶被设计为在所述向导RNA的3’末端切割细胞内转录生成的第一核酶-向导RNA-第二核酶酶融合物,由此形成不携带5’和3’端额外核苷酸的向导RNA。
所述第一或第二核酶的设计属于本领域技术人员的能力范围内。例如,可以参见Gao et al.,JIPB,Apr,2014;Vol 56,Issue 4,343-349。
在一些实施方式中,为了在细胞内精确产生向导RNA,在编码所述至少一种向导RNA的核苷酸序列的表达构建体中,其中所述向导RNA编码序列的5’端连接至第一tRNA编码序列的3’端,所述第一tRNA被设计为在所述向导RNA的5’末端切割(即,被细胞内存在的精确加工tRNA的机制(其精确切除前体tRNA的5’和3’额外序列以形成成熟tRNA)所切割)细胞内转录生成的第一tRNA-向导RNA融合物,由此形成不携 带5’端额外核苷酸的向导RNA。在一实施方案中,所述向导RNA编码序列的3’端连接至第二tRNA编码序列的5’端,所述第二tRNA被设计为在所述向导RNA的3’末端tRNA细胞内转录生成的向导RNA-第二tRNA融合物,由此形成不携带3’端额外核苷酸的向导RNA。在一些实施方案中,所述向导RNA编码序列的5’端连接至第一tRNA编码序列的3’端,所述向导RNA编码序列的3’端连接至第二tRNA编码序列的5’端,所述第一tRNA被设计为在所述向导RNA的5’末端切割细胞内转录生成的第一tRNA-向导RNA-第二tRNA融合物,所述第二tRNA被设计为在所述向导RNA的3’末端切割细胞内转录生成的第一tRNA-向导RNA-第二tRNA融合物,由此形成不携带5’和3’端额外核苷酸的向导RNA。
所述tRNA-向导RNA融合物的设计属于本领域技术人员的能力范围内。例如,可以参考Xie et al.,PNAS,Mar 17,2015;vol.112,no.11,3570-3575。
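III. Methods for Site-Directed Modification of a Target Nucleic Acid Sequence in a Cell Genome
In another aspect, the invention provides a method for site-directed modification of a target nucleic acid sequence in the genome of a cell, comprising introducing the genome editing system of the invention into the cell.
In another aspect, the invention also provides a method for producing a genetically modified cell, comprising introducing the genome editing system of the invention into the cell.
In another aspect, the invention also provides a genetically modified organism comprising a genetically modified cell produced by the method of the invention, or a progeny cell thereof.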
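In the present invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, thereby modifying the function or the expression of the gene. Modifications in the target sequence of the cell can be detected by T7EI, PCR/RE or sequencing methods.
In the method of the present invention, the gene editing system can be introduced into the cell by various methods well known to those skilled in the art.
Methods that can be used to introduce the gene editing system of the present invention into a cell include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), particle bombardment (gene gun), PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
In some embodiments, the method of the present invention is performed in vitro. For example, the cell is an isolated cell, or a cell in an isolated tissue or organ.
In other embodiments, the method of the present invention can also be performed in vivo. For example, the cell is a cell within an organism, and the system of the present invention can be introduced into the cell in vivo by, for example, virus-mediated or Agrobacterium-mediated methods.
Cells whose genomes can be edited by the method of the present invention may be derived from prokaryotes or eukaryotes, for example mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocotyledonous and dicotyledonous plants, such as rice, maize, wheat, sorghum, barley, soybean, peanut and Arabidopsis.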
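The present invention provides a method for producing a genetically modified plant, comprising introducing the genome editing system of the present invention into at least one plant, thereby resulting in a modification in the genome of the at least one plant.
In the method of the present invention, the genome editing system can be introduced into the plant by various methods well known to those skilled in the art. Methods that can be used to introduce the genome editing system of the present invention into a plant include, but are not limited to: particle bombardment (gene gun), PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method and the ovary injection method.
In the method of the present invention, the target sequence can be modified simply by introducing or producing the TraC effector protein or functional variant thereof or the fusion protein, and the guide RNA, in the plant cell, and the modification can be stably inherited without stably transforming the plant with the genome editing system. This avoids potential off-target effects of a stably present genome editing system and also avoids integration of exogenous nucleotide sequences into the plant genome, thereby providing higher biosafety.
In some preferred embodiments, the introduction is performed in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.
In some embodiments, the introduction comprises transforming the genome editing system of the present invention into isolated plant cells or tissues and then regenerating the transformed plant cells or tissues into intact plants. Preferably, the regeneration is performed in the absence of selection pressure, that is, no selection agent directed against a selection gene carried on the expression vector is used during tissue culture. Omitting the selection agent can improve the regeneration efficiency of the plants and yields herbicide-resistant plants free of exogenous nucleotide sequences.
In other embodiments, the genome editing system of the present invention can be transformed into a specific part of an intact plant, such as a leaf, shoot apex, pollen tube, young panicle or hypocotyl. This is particularly suitable for transforming plants that are difficult to regenerate by tissue culture.
In some embodiments of the present invention, in vitro-expressed protein and/or in vitro-transcribed RNA molecules are transformed directly into the plant. The protein and/or RNA molecules can effect genome editing in the plant cell and are subsequently degraded by the cell, avoiding integration of exogenous nucleotide sequences into the plant genome.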
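In some embodiments, the method further comprises treating (e.g. culturing) the plant cell, tissue or intact plant into which the genome editing system has been introduced at an elevated temperature (relative to the usual culture temperature, such as room temperature), the elevated temperature being, for example, 32°C. In some preferred embodiments, the plant is rice.
Therefore, in some embodiments, genetically modifying a plant using the method of the present invention can yield a plant whose genome has no integrated exogenous polynucleotide, i.e. a transgene-free modified plant.
In some embodiments of the present invention, the modification is associated with a plant trait such as an agronomic trait; for example, the modification results in the plant having an altered (preferably improved) trait, such as an agronomic trait, relative to a wild-type plant.
In some embodiments, the method further comprises a step of screening for plants having the desired modification and/or a desired trait such as an agronomic trait.
In some embodiments of the present invention, the method further comprises obtaining progeny of the genetically modified plant. Preferably, the genetically modified plant or progeny thereof has the desired modification and/or a desired trait such as an agronomic trait.
In another aspect, the present invention also provides a genetically modified plant or progeny or part thereof, wherein the plant is obtained by the above method of the present invention. In some embodiments, the genetically modified plant or progeny or part thereof is transgene-free. Preferably, the genetically modified plant or progeny thereof has the desired genetic modification and/or a desired trait such as an agronomic trait.
In another aspect, the present invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the above method of the present invention with a second plant that does not contain the modification, thereby introducing the modification into the second plant. Preferably, the genetically modified first plant has a desired trait such as an agronomic trait.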
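IV. Therapeutic applications
The present invention also encompasses use of the genome editing system of the present invention in the treatment of disease.
Modification of a disease-associated gene by the genome editing system of the present invention can achieve up-regulation, down-regulation, inactivation, activation or mutation correction of the disease-associated gene, thereby preventing and/or treating the disease. For example, the genome modification in the present invention may be located within the protein-coding region of the disease-associated gene, or may be located in a gene expression regulatory region such as a promoter or enhancer region, so that the function or the expression of the disease-associated gene can be modified. Accordingly, modifying a disease-associated gene as described herein includes modifying the disease-associated gene itself (e.g. the protein-coding region) as well as modifying its expression regulatory regions (such as promoters, enhancers and introns).
A "disease-associated" gene refers to any gene that yields transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissue, compared with tissues or cells of a non-disease control. Where the altered expression correlates with the occurrence and/or progression of the disease, it may be a gene expressed at an abnormally high level or a gene expressed at an abnormally low level. A disease-associated gene also refers to a gene possessing one or more mutations or genetic variations that are directly responsible for the etiology of a disease, or that are in linkage disequilibrium with one or more genes responsible for it. The mutation or genetic variation is, for example, a single nucleotide variation (SNV). The transcribed or translated products may be known or unknown, and may be at a normal or abnormal level.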
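Accordingly, the present invention also provides a method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the genome editing system of the present invention to modify a gene associated with the disease. The present invention also provides use of the genome editing system in the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the genome editing system is used to modify a gene associated with the disease. The present invention also provides a pharmaceutical composition for treating a disease in a subject in need thereof, comprising the genome editing system of the present invention and, optionally, a pharmaceutically acceptable carrier, wherein the genome editing system is used to modify a gene associated with the disease.
Preferably, the "subject" of the present invention is a mammal, for example a human.
In some embodiments, the genome editing system described herein is used to introduce a point mutation into a nucleic acid.
In some embodiments, the genome editing system described herein is used to correct a genetic defect, for example to correct a point mutation that causes loss of function in a gene product. In some embodiments, the genetic defect is associated with a disease or disorder (for example a lysosomal storage disease or a metabolic disease such as type I diabetes). In some embodiments, the methods provided herein can be used to introduce an inactivating point mutation into a gene or allele encoding a gene product associated with a disease or disorder.
In some embodiments, the approaches described herein are intended for treating a subject having a disease associated with or caused by a point mutation that can be corrected by the genome editing system provided herein. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease.
In some embodiments, the approaches described herein can be used to treat a mitochondrial disease or disorder. As used herein, "mitochondrial disease" relates to diseases caused by abnormal mitochondria, for example through mitochondrial gene mutations, enzymatic pathways and the like. Examples of such diseases include, but are not limited to: neurological disorders, loss of motor control, muscle weakness and pain, gastrointestinal disorders and swallowing difficulties, poor growth, heart disease, liver disease, diabetes, respiratory complications, epilepsy, visual/hearing problems, lactic acidosis, developmental delay and susceptibility to infection.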
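Examples of the diseases referred to in the present invention include, but are not limited to, genetic diseases, circulatory system diseases, muscle diseases, diseases of the brain, central nervous system and immune system, Alzheimer's disease, secretase disorders, amyotrophic lateral sclerosis (ALS), autism, trinucleotide repeat expansion disorders, hearing diseases, gene-targeted therapy of non-dividing cells (neurons, muscle), liver and kidney diseases, epithelial cell and lung diseases, cancer, Usher syndrome or retinitis pigmentosa-39, cystic fibrosis, HIV and AIDS, β-thalassemia, sickle cell disease, herpes simplex virus, drug addiction, age-related macular degeneration and schizophrenia. Other diseases treatable by correcting a point mutation or by introducing an inactivating mutation into a disease-associated gene are known to those skilled in the art, and the present disclosure is not limited in this respect. In addition to the diseases described here by way of example, other related diseases can also be treated with the strategies and genome editing system provided by the present invention, as will be apparent to those skilled in the art. For diseases or targets to which the present invention is applicable, reference is made to the relevant diseases listed as suitable for genome editing systems in WO2015089465A1 (PCT/US2014/070135), WO2016205711A1 (PCT/US2016/038181), WO2018141835A1 (PCT/EP2018/052491), WO2020191234A1 (PCT/US2020/023713), WO2020191233A1 (PCT/US2020/023712), WO2019079347A1 (PCT/US2018/056146) and WO2021155065A1 (PCT/US2021/015580).
Administration of the genome editing system or pharmaceutical composition of the present invention can be adjusted for the body weight and species of the patient or subject. The frequency of administration is within the range permitted by medicine or veterinary medicine, and depends on conventional factors including the age, sex, general health and other conditions of the patient or subject and the particular condition or symptom being addressed.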
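V. Kits
The present invention also includes a kit for use in the method of the present invention, the kit comprising the genome editing system of the present invention and instructions for use. The kit generally includes a label indicating the intended use of the kit contents and/or the method of use. The term label includes any written or recorded material supplied on or with the kit, or otherwise accompanying the kit.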
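Examples
Example 1: Bioinformatic mining of new CRISPR systems
First, we developed a more targeted strategy for identifying genes encoding CRISPR effector proteins, which identifies genes anchored on shared, highly conserved motifs. We used the MEME motif software to predict conserved domains in 86 known Cas12 proteins of the Cas12b to Cas12i families and found three structural motifs conserved across the 86 Cas12 proteins (Figure 1): "TSxxCxxCx", "GIDRG" and "CxxCGxxxxADxxAA".
We then searched microbial genome/metagenome data in published NCBI public databases. An initial search covered all proteins within 10 kb upstream or downstream of 32,562 CRISPR arrays in the GTDB database, from which 166 candidate proteins having at least two of the three conserved motifs shown in Figure 1A were selected. Next, CRISPR type analysis and protein similarity analysis were used to filter out candidates of the same type as annotated systems or with sequence similarity to annotated proteins; after removing redundancy, 37 entirely new proteins containing the conserved domains were obtained (SEQ ID NO: 1-37). These proteins are defined as intermediates between Transposon and CRISPR-Cas12 (TraC for short). Correspondingly, a CRISPR system with TraC as its effector protein is defined as a CRISPR-TraC system.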
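The motif-based selection step described above (at least two of the three conserved motifs, with x standing for any amino acid) could be sketched as follows. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the motifs are matched as plain regular expressions rather than with the MEME profiles actually used, and the toy sequences in the usage note are invented.

```python
import re

# The three conserved motifs named in the text; lowercase "x" means any residue.
MOTIFS = ("TSxxCxxCx", "GIDRG", "CxxCGxxxxADxxAA")


def motif_to_regex(motif):
    """Turn a motif such as 'TSxxCxxCx' into a compiled regular expression."""
    return re.compile(motif.replace("x", "[A-Z]"))


def motifs_present(protein):
    """Return the motifs (in the order above) found anywhere in the protein."""
    protein = protein.upper()
    return [m for m in MOTIFS if motif_to_regex(m).search(protein)]


def is_candidate(protein, min_motifs=2):
    """Apply the 'at least two of three motifs' criterion to one sequence."""
    return len(motifs_present(protein)) >= min_motifs
```

For example, the invented sequence `MMTSAACAACAKKGIDRGLL` contains the first two motifs and would pass the filter, whereas `MKKGIDRGAA` contains only one and would not.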
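The prokaryotic expression system for TraC proteins is exemplified in Figure 2, which shows prokaryotic expression of the TraC-N483 protein; in the figure, 483 denotes the name of the new protein and repeat denotes the CRISPR locus region. NC1 and NC2 are non-coding RNA regions that may contain the tracrRNA. For gene synthesis, the pTac promoter was used to drive expression of the protein gene, and the J22119 promoter was used to drive expression of the repeat-spacer-repeat-noncoding sequence (SEQ ID NO: 38-74).
Table 1: Sequence reference table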

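Example 2: Screening for new CRISPR systems with DNA-binding ability in prokaryotic cells using a fluorescent reporter system
The inventors used a fluorescent reporter system to screen the function of the new CRISPR systems; this system can screen for CRISPR systems capable of binding double-stranded DNA. The experimental design is shown in Figures 3 and 4:
Taking the cleavage-inactive dead LbCas12a as an example (i.e. the dCas is dLbCas12a, and the corresponding Y53 vector is designated Y53-dLbCas12a), one plasmid with a p15a backbone was used to express the Cas12 protein, the miniCRISPR (repeat-spacer-repeat) and the non-coding RNA sequence (ncRNA), and another plasmid with a pBR322 backbone was used to express yellow fluorescent protein (YFP) (pUC-PAM-YFP). The 5' untranslated region of the YFP gene contains a target site complementary to the spacer sequence, with a randomized PAM library upstream of it. The sequence is nnnnnnGTGATCGACAGCAACAAGTGAGCG or nnnnGTGATCGACAGCAACAAGTGAGCG, where nnnnnn and nnnn are PAM libraries of different lengths covering 4096 and 256 PAM sequences respectively (Figure 3).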
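The library sizes quoted above follow from 4^6 = 4096 and 4^4 = 256. As a minimal sketch (the function names are hypothetical; only the target sequence is taken from the text), the randomized PAM libraries can be enumerated in Python as:

```python
from itertools import product

# Target site downstream of the randomized PAM, as given in the text.
TARGET = "GTGATCGACAGCAACAAGTGAGCG"


def pam_library(length):
    """Enumerate every possible PAM of the given length over A, C, G, T."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def library_sites(length):
    """Each library member is a randomized PAM followed directly by the target."""
    return [pam + TARGET for pam in pam_library(length)]
```

Calling `len(pam_library(6))` and `len(pam_library(4))` reproduces the 4096 and 256 library members stated in the example.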
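If the tested protein can successfully process the crRNA to maturity and, guided by the crRNA, target the site in the YFP 5' untranslated region, the protein remains bound at the YFP 5' untranslated region and represses YFP transcription, reducing YFP expression. Because the protein acts only when a suitable PAM is present, bacteria with low YFP expression can be sorted by flow cytometry and then subjected to Sanger (first-generation) sequencing, rapidly yielding the PAM sequence of the tested protein. Figure 4 shows the screening result for the dLbCas12a protein: bacteria with very low YFP expression in the P2 region (box B) were sorted, and Sanger sequencing of the sorted bacteria showed that the PAM of the sorted YFP-negative cells was TTTN (consistent with previously reported PAM studies of LbCas12a), whereas bacteria before IPTG induction of protein expression, and the fluorescence-activated cell sorting (FACS) population after induction indicated by box A, both showed the PAM in random library form (NNNN).
Using this system, candidate proteins of the new CRISPR systems can be screened for their double-stranded DNA binding characteristics, and the inventors screened some representative candidates. TraC-N287, TraC-445, TraC-483 and TraC-655 all yielded T-rich PAMs, while TraC-N701 yielded a G-rich PAM. This suggests that most proteins of this class have T-rich PAMs and a few have G-rich PAMs, consistent with previous reports that most Cas12 family proteins recognize T-rich PAMs.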
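Example 3: Detailed PAM determination by next-generation sequencing for proteins with double-strand cleavage function
In addition, the system can be used to screen for CRISPR systems capable of cleaving double-stranded DNA. The experimental design is as follows:
As shown in Figure 5, the plasmid carrying the PAM library was co-transformed with the plasmid expressing the protein (the treatment group), while a protein expression vector lacking the crRNA expression cassette was co-transformed with the PAM library plasmid to form the control group. In theory, PAMs that can be recognized and cleaved by the tested protein are lost, so that the proportion of targeted PAMs decreases relative to the control group. The PAM sequence of the tested protein can therefore be obtained by using next-generation sequencing to compare depletion of the PAM library between the two groups.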
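The comparison between treatment and control groups described above can be illustrated with a small depletion score. This is a sketch under assumptions not stated in the text: the score is defined here as log2 of the control frequency over the treated frequency, a pseudocount is added to avoid division by zero, and the threshold of 1.0 is arbitrary. The function names and the read counts used in any example are hypothetical.

```python
import math
from collections import Counter


def pam_depletion(control_pams, treated_pams, pseudocount=1.0):
    """Score each PAM by log2(control frequency / treated frequency).

    A positive score means the PAM is under-represented in the treatment
    group, which is the signature expected for a recognized and cleaved PAM.
    """
    control = Counter(p.upper() for p in control_pams)
    treated = Counter(p.upper() for p in treated_pams)
    universe = sorted(set(control) | set(treated))
    control_total = sum(control.values()) + pseudocount * len(universe)
    treated_total = sum(treated.values()) + pseudocount * len(universe)
    scores = {}
    for pam in universe:
        control_freq = (control[pam] + pseudocount) / control_total
        treated_freq = (treated[pam] + pseudocount) / treated_total
        scores[pam] = math.log2(control_freq / treated_freq)
    return scores


def depleted_pams(scores, threshold=1.0):
    """PAMs whose depletion score reaches the (assumed) threshold, strongest first."""
    hits = [pam for pam, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda pam: -scores[pam])
```

The inputs are simply lists of PAM strings extracted from sequencing reads of each group; how reads are trimmed to the PAM is outside this sketch.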
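The inventors tested the PAM sequences of TraC-875, TraC-365, TraC-655 and TraC-445 (Figure 6A) and of TraC-297, TraC-459, TraC-466 and TraC-949 (Figure 6B); both groups of proteins were tested with LbCpf1 as a positive control. In both experiments the LbCpf1 positive control gave the expected result, showing enrichment of the TTTN PAM, which indicates that the results of both experiments are reliable. In the first group, TraC-875 and TraC-365 showed enrichment of a TGTNNN PAM, while TraC-655 and TraC-445 showed weaker enrichment of poly-T or poly-G PAMs (Figure 6A). This result is similar to the PAM results obtained by flow cytometry in Example 2: this class of Cas proteins has 5' poly-T or poly-G PAMs. The PAM results for TraC-297, TraC-459, TraC-466 and TraC-949 (Figure 6B) show that TraC-297 recognizes a TTTG-type PAM, TraC-459 a TTC-type PAM, TraC-466 a TTC-type PAM and TraC-949 a TGA-type PAM. These results further inform the working requirements of this class of proteins in eukaryotic systems.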
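Example 4: Screening for new CRISPR systems with DNA cleavage ability in prokaryotic cells using a plasmid interference system
To further test the double-stranded DNA cleavage ability of the new CRISPR systems, this example used a plasmid interference system as the test model; the experimental design is shown in Figure 7. The plasmid interference assay was used to verify the specific PAM information of the candidate proteins that showed a clear PAM in Example 3, as follows. Taking candidate protein TraC-459 as an example, Example 3 showed that this protein recognizes a typical 5'-TTC-3' PAM motif, with the GFP-T1 target (SEQ ID NO: 79) immediately 3' of the motif. Using the pUC-polyT-YFP vector as a template, a series of target vectors carrying PAM sequences potentially recognizable by TraC-459 was constructed (pUC-TTC-YFP, pUC-GTC-YFP, pUC-TCC-YFP, pUC-TTG-YFP, pUC-TGC-YFP, pUC-CTTC-YFP, pUC-GTTC-YFP and pUC-TTTC-YFP). The Y53-459 vector was co-transformed with each of these target vectors into competent E. coli cells, and the empty Y53 vector was co-transformed with each target vector as a control. After overnight culture on double-antibiotic LB solid medium, the numbers of positive colonies in the two groups were counted to test the targeting ability of TraC-459 on the different PAMs. The results show that candidate protein TraC-459 recognizes the TTC PAM and has lower targeting ability on the other PAMs (Figure 8A), in agreement with the next-generation sequencing results of Example 3.
Similarly, the PAMs of TraC-875 and TraC-297 were verified. TraC-875 showed strong cleavage activity under the 5'-CTCGTG-3' PAM motif, and its detailed PAM sequence remains to be explored further. TraC-297 broadly and efficiently cleaved target sequences under the 5'-GTTG-3', 5'-CTTG-3', 5'-TCTG-3', 5'-TTTA-3' and 5'-TTAG-3' PAM motifs. TraC-949 cleaved target sequences under the 5'-NTGA-3' PAM motif, with the highest cleavage efficiency for the target sequence under the 5'-TTGA-3' PAM motif and relatively lower cleavage efficiency for the 5'-TTGA-3', 5'-ATGA-3', 5'-GTGA-3' and 5'-CTGA-3' PAM targets. The results are shown in Figure 8B.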
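PAM motifs such as 5'-TTC-3' or 5'-NTGA-3' with the target immediately 3' of the motif, as described above, can be located in a sequence with a simple degenerate-base match. The following is an illustrative sketch only: the function names are hypothetical, only the PAM-containing strand is scanned, and the protospacer length is a caller-supplied parameter rather than a value taken from the text.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pattern, pam):
    """True if a concrete PAM fits a degenerate pattern such as 'NTGA'."""
    pattern, pam = pattern.upper(), pam.upper()
    if len(pattern) != len(pam):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, pam))


def find_targets(sequence, pattern, protospacer_len):
    """List (pam_start, pam, protospacer) for each 5' PAM followed by a target.

    The protospacer is taken as the bases immediately 3' of the PAM on the
    scanned strand; pam_start is a 0-based index into the sequence.
    """
    sequence = sequence.upper()
    hits = []
    last_start = len(sequence) - len(pattern) - protospacer_len
    for i in range(last_start + 1):
        pam = sequence[i:i + len(pattern)]
        if pam_matches(pattern, pam):
            start = i + len(pattern)
            hits.append((i, pam, sequence[start:start + protospacer_len]))
    return hits
```

A full search would also scan the reverse complement of the sequence; that step is omitted here for brevity.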
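Example 5: Evolutionary model of the newly predicted CRISPR systems
To further analyse the structural and functional features of the TraC proteins of the newly obtained CRISPR systems, the inventors predicted the secondary structures of the accessory RNAs of type V CRISPR and TnpB systems and analysed their folding models (Figure 9A). The study found that the different protein subtypes can be divided into three classes according to folding model; the folding models reflect three classes of CRISPR locus features, namely whether the tracrRNA is located near to or far from the CRISPR protein, or is absent. The classification suggests that TnpB proteins may have undergone transposon jumping into a CRISPR locus, or splitting of the reRNA into a tracrRNA and a CRISPR RNA. The diversity of accessory RNA combinations also supports a model of co-evolution of effectors and accessory RNAs (for the evolutionary model see Figure 9B).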
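Example 6: Editing activity of TraC protein with sgRNA as the guide RNA
This example selected the TraC-459 protein of the TraC system to verify its DNA editing activity.
The structure and length of the sgRNA affect the editing efficiency of a CRISPR system, so the inventors screened for the sgRNA structure best suited to the TraC system. First, for the VEGFA-T1 site in HEK293T cells, the inventors designed a predicted sgRNA (sgRNA-predicted, abbreviated sgRNA-pre) by recombining the tracrRNA and crRNA (see Figure 10A). Then, using sgRNA-pre as the baseline, they tested the effects on TraC-459 editing efficiency of the truncation length of the tracrRNA:crRNA complementary region, the truncation length of the tracrRNA 5' region, and the spacer length (see Figure 10B). The results show that a tracrRNA:crRNA complementary-region truncation of 11-15 bp, a tracrRNA 5'-region truncation of 19-21 bp, or a spacer length of 22-27 bp gave good editing. This yielded an optimized sgRNA (sgRNA-optimal, abbreviated sgRNA-opt); this optimization strategy is referred to as the second-generation sgRNA optimization method for the TraC system (abbreviated sgRNA-v2). Figure 11 shows that using sgRNA-opt as the guide RNA significantly increases the editing efficiency of TraC-459.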
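Example 7: Editing activity of TraC protein with reRNA as the guide RNA
The co-evolution model of Example 5 predicts that TraC proteins are evolutionary descendants of TnpB. Because the TnpB system uses its 3' flanking sequence as the guide RNA for DNA cleavage (Karvelis, T. et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692-696 (2021)), this example examines the use of TnpB guide RNAs in the CRISPR system of the TraC protein.
For in vivo validation, the inventors used three-dimensional structural clustering to select the reRNAs of structurally similar TnpB mutant proteins (882-TnpB-reRNA and 966-TnpB-reRNA) as the guide RNAs to be tested. For the GFP-T1 target, the inventors fused the scaffold sequences of 882-TnpB-reRNA and 966-TnpB-reRNA to the targeting sequence of GFP-T1. The plasmid interference assay of Example 4 was then used to analyse the dsDNA cleavage ability of TraC-459 in E. coli with the different guide RNAs; the results are shown in Figure 12. The results show that, with the different types of guide RNA, TraC-459 displayed varying degrees of DNA interference activity relative to the empty vector control (pEmpty in Figure 12).
Taken together, the results of Examples 6 and 7 show that TraC-459 has a dual-guide mechanism and possesses the targeted cleavage pathways of both the TnpB system and the CRISPR system; that is, the TraC effector protein can be targeted to bind target DNA under the guidance of an reRNA and also under the guidance of an sgRNA.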
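Example 8: Prediction of the working form of the TraC protein
To further elucidate the working form of the TraC system protein, the inventors constructed a two-copy (dimer) sequence of the TraC-459 protein and used the AlphaFold2 multimer v3 model to predict its three-dimensional folding. None of the five best predicted TraC-459 structures (Rank 1 to 5) showed dimeric interaction (Figure 13). The upper panel is a predicted aligned error (PAE) heat map, which gives a distance error for each pair of residues: it is AlphaFold2's estimate of the position error at residue x when the predicted and true structures are aligned on residue y. Values range from 0 to 35 Å (white to black). PAE is usually displayed as a heat map in which residue numbers run along the vertical and horizontal axes and the colour of each pixel represents the PAE value of the corresponding residue pair. If the relative position of two domains is reliably predicted, the PAE values for residue pairs with one residue in each domain will be low (less than 5 Å; white in the figure is 0). Both axes in the figure span the length of two TraC-459 monomers: the first 575 amino acids are one TraC-459 monomer and the last 575 are the other. The heat map is white only within the first TraC-459 monomer and within the second TraC-459 monomer, whereas the regions between the two monomers are black. Together with the compact 575-aa protein structure, this suggests that TraC-459 is the smallest Cas12 monomer.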
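The reading of the PAE heat map described above (low error within each monomer, high error between the two monomers) can be expressed numerically by averaging the intra-chain and inter-chain blocks of the PAE matrix. The sketch below is illustrative only: the function names are hypothetical, the matrix is assumed to be a square list of lists in Å with the two monomers concatenated along both axes, and the 5 Å cut-off is the one quoted in the text.

```python
def mean_block(pae, rows, cols):
    """Average PAE over the rectangular block defined by rows x cols."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def summarize_dimer_pae(pae, monomer_len):
    """Return (mean intra-chain PAE, mean inter-chain PAE) for a two-copy model.

    Rows and columns 0..monomer_len-1 belong to the first monomer and the
    remaining ones to the second monomer.
    """
    total = len(pae)
    first = range(0, monomer_len)
    second = range(monomer_len, total)
    intra = (mean_block(pae, first, first) + mean_block(pae, second, second)) / 2
    inter = (mean_block(pae, first, second) + mean_block(pae, second, first)) / 2
    return intra, inter


def looks_monomeric(pae, monomer_len, confident_below=5.0):
    """True if each chain is confidently placed internally but not relative to the other."""
    intra, inter = summarize_dimer_pae(pae, monomer_len)
    return intra < confident_below <= inter
```

For the model described in the example, `monomer_len` would be 575 and the matrix would be 1150 by 1150.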
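Example 9: Optimization of the TraC protein
To further verify the effectiveness of the TraC protein and broaden its range of applications, this example obtained a series of optimized TraC-459 variants by arginine-scanning mutagenesis, directed evolution and artificial intelligence-assisted evolution; the screening process is shown in Figures 14a-c. Testing of the intracellular editing efficiency of the TraC-459 mutant library showed that some of the selected TraC-459 mutants have higher editing efficiency. Taking the five-site mutants as an example, the editing efficiency of the five-site mutants in the library was measured in three parallel experiments (Table 2). A ratio of mutant editing efficiency to wild-type TraC-459 editing efficiency greater than 1 indicates that the mutant has higher editing efficiency. The mutants with improved editing efficiency selected by this method are listed in Table 3. A representative five-arginine mutant obtained by arginine-scanning mutagenesis is TraC-5M-7 (S137R, P148R, D150R, K315R and A369R), i.e. the TraC-5M-7 mutant in Figure 14b. The study shows that the editing efficiency of TraC-5M-7 at the VEGFA-T1 site is 24.02-fold higher than that of the original TraC-459.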
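Substitutions written in the form S137R (wild-type residue, 1-based position, new residue) and the mutant-to-wild-type efficiency ratio mentioned above can be handled with a few lines of Python. This is a sketch with hypothetical function names; the TraC-459 sequence (SEQ ID NO: 25) is not reproduced in this text, so any sequence passed in must be supplied by the user, and the numbers in the usage note are toy values.

```python
import re

# A substitution is written as <wild-type residue><1-based position><new residue>.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation):
    """Split a label such as 'S137R' into ('S', 137, 'R')."""
    match = MUTATION_PATTERN.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"not a substitution label: {mutation!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence, mutations):
    """Apply substitutions to a protein sequence, checking each wild-type residue."""
    residues = list(sequence.upper())
    for mutation in mutations:
        wild_type, position, replacement = parse_mutation(mutation)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: sequence has {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def relative_efficiency(mutant_efficiency, wild_type_efficiency):
    """Ratio used in the text; a value above 1 means the mutant edits better."""
    return mutant_efficiency / wild_type_efficiency
```

For instance, `apply_mutations("MSPDK", ["S2R", "D4R"])` on an invented five-residue sequence gives `MRPRK`, and a toy mutant efficiency of 6.0 against a wild-type efficiency of 2.0 gives a ratio of 3.0.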
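To further design TraC-459 variants with increased editing activity, the inventors used data from a series of TraC variants to develop a deep learning model and obtained 7 representative mutants, TraC-B22, -B24, -B26, -B32, -B34, -B35 and -B36, with enhanced editing activity in human cells (editing activity is shown in Figure 14d; the mutation sites are given in Table 4).
Table 2: Results of the arginine-scanning mutagenesis experiments for the five-amino-acid mutants

Table 3: TraC mutants with improved editing efficiency
Table 4: Mutation sites and amino acid sequences of representative TraC-459 mutants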
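Example 10: Dual-pairing function of the TraC system
Subsequently, by predicting the secondary structure of sgRNA-opt, we found a bubble-like structure at the end of the tracrRNA (Figure 15). In light of the evolutionary analysis, this protruding region may have evolved from the flanking DNA of the TnpB reRNA, and may therefore be a region that can be reprogrammed to target DNA. Reported structural information on TnpB shows that the TnpB protein cannot by itself open the PAM-distal end of the target site, resulting in low editing efficiency in GC-rich regions. Engineering this region so that it binds the PAM-distal region might therefore help open the DNA duplex and thereby increase editing efficiency at GC-rich targets (Figure 16a).
We reprogrammed the bubble region at the VEGFA-T1 target in HEK293 human cells to test the optimal pairing region and pairing length. Complementary regions from 13 bp to 47 bp downstream of the PAM sequence in VEGFA-T1 (L1 to L7 in Figure 16a) were evaluated, and the complementary region 21-32 nt downstream of the PAM sequence (construct L2) showed the highest editing activity (Figure 16b). Next, within the L2 complementary region, we tested complementary lengths from 20 bp down to 6 bp (S1 to S8) and found that a 10 bp complementary region (S6) had the highest editing activity (Figure 16c). Finally, we made the bubble region complementary to the PAM-distal region and found that the reprogrammed form of the sgRNA increased editing efficiency at 4 endogenous human cell-line targets whose PAM-distal regions are GC-rich (Figure 17). No such additional reprogrammable region has been found in other CRISPR systems.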
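As a rough illustration of how a pairing sequence for the bubble region might be derived from a window downstream of the PAM, the following Python sketch returns the RNA reverse complement of a chosen window. It rests on assumptions that are not spelled out in the text: the input is taken to be the PAM-containing strand read 5' to 3' with position 1 as the first base after the PAM, the window start and length are free parameters, and the function name is hypothetical. It does not reproduce the L1 to L7 or S1 to S8 constructs themselves.

```python
# DNA base on the PAM-containing strand -> complementary RNA base.
RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def pairing_sequence(downstream_of_pam, start, length):
    """RNA sequence (5'->3') complementary to a window downstream of the PAM.

    downstream_of_pam: DNA of the PAM-containing strand, 5'->3', beginning
        with the first base after the PAM (assumed orientation).
    start: 1-based position of the first base of the window.
    length: number of bases to pair with.
    """
    sequence = downstream_of_pam.upper()
    window = sequence[start - 1:start - 1 + length]
    if start < 1 or len(window) < length:
        raise ValueError("window falls outside the supplied sequence")
    # Complement each base, then reverse so the RNA is written 5'->3'.
    return window.translate(RNA_COMPLEMENT)[::-1]
```

With an invented sequence, a call such as `pairing_sequence(seq, 21, 10)` would give a 10-nt RNA pairing with positions 21 to 30 after the PAM, matching the region and length reported as most active only in its coordinates, not in its actual sequence.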
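In summary, this example shows that TraC-459 is a highly compact monomeric Cas12-like protein. By comparison with proteins known in the prior art, the inventors found that it is the smallest monomeric CRISPR effector protein known to date. Moreover, it has a unique dual-guide mechanism using sgRNA and reRNA, and a dual-pairing function, neither of which exists in other Cas12 subtypes.
Example 11: Effect of temperature on TraC protein in plant cells
Some plant cells can tolerate culture at high temperature. To verify the effect of temperature on the TraC protein in plant cells, the inventors tested 5 endogenous targets in rice protoplasts (OsAAT1, OsALST1, OsEPSPST1, OsPDST1, OsPDST2) at 25°C and at 32°C. In these in vitro tests, the editing efficiency of TraC-5M-7 at 32°C was 1- to 29-fold higher than at 25°C, with a maximum efficiency of 3.41% (Figure 18).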

Claims (55)

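  1. An engineered clustered regularly interspaced short palindromic repeats (CRISPR) system, comprising:
    a) a transposon and CRISPR-Cas12 intermediate (TraC) effector protein, or one or more nucleotide sequences encoding the effector protein; and
    b) one or more guide RNAs, or nucleotide sequences encoding the one or more guide RNAs,
    wherein the guide RNA is selected from i) a guide RNA derived from a transposon right-end element (reRNA) and/or ii) a guide RNA comprising a tracrRNA and/or a crRNA, for example a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA;
    the TraC effector protein is capable of forming a CRISPR complex with the guide RNA;
    the TraC effector protein is capable of targeted binding to a target DNA sequence both under the guidance of a guide RNA derived from a transposon right-end element and under the guidance of a guide RNA comprising a tracrRNA and/or a crRNA.
  2. The engineered CRISPR system of claim 1, wherein the tracrRNA contains a non-target-strand binding sequence (NTB) that base-pairs with the non-target strand (NTS).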
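  3. An engineered clustered regularly interspaced short palindromic repeats (CRISPR) vector system comprising one or more constructs, comprising:
    a) a first regulatory element operably linked to a nucleotide sequence encoding a transposon and CRISPR-Cas12 intermediate (TraC) effector protein; and
    b) a second regulatory element operably linked to one or more nucleotide sequences encoding one or more guide RNAs, the guide RNA being selected from i) a guide RNA derived from a transposon right-end element (reRNA) and/or ii) a guide RNA comprising a tracrRNA and/or a crRNA, for example a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA;
    the TraC effector protein is capable of forming a CRISPR complex with the guide RNA;
    the TraC effector protein is capable of targeted binding to a target DNA sequence both under the guidance of a guide RNA derived from a transposon right-end element and under the guidance of a guide RNA comprising a tracrRNA and/or a crRNA.
  4. The engineered CRISPR vector system of claim 3, wherein the tracrRNA contains a non-target-strand binding sequence (NTB) that base-pairs with the non-target strand (NTS).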
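  5. The system of claim 2 or 4, wherein the guide RNA is a guide RNA comprising a tracrRNA and a crRNA, wherein the tracrRNA contains a non-target-strand binding sequence (NTB) that base-pairs with the non-target strand (NTS), and wherein the guide RNA hybridizes to the target strand (TS) of the target DNA sequence via the crRNA and to the non-target strand (NTS) via the NTB.
  6. The system of claim 4, wherein, when transcribed, the one or more guide RNAs hybridize to the target DNA and the guide RNA forms a complex with the TraC effector protein, the complex causing distal cleavage of the target DNA sequence.
  7. The system of any one of claims 1-6, wherein the target DNA sequence is within a cell, preferably a eukaryotic cell.
  8. The system of any one of claims 1-7, wherein the effector protein comprises one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences or mitochondrial localization sequences.
  9. The system of any one of claims 1-8, wherein the nucleic acid sequences encoding the effector protein are codon-optimized for expression in eukaryotic cells.
  10. The system of any one of claims 1-9, wherein components a) and b), or their nucleotide sequences, are constructed on the same vector or on different vectors.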
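  11. A method of modifying a DNA sequence of interest, the method comprising delivering the system of any one of claims 1-10 to the DNA sequence of interest or to a cell containing the DNA sequence of interest.
  12. A method of modifying a DNA sequence of interest, the method comprising delivering a composition of a TraC effector protein and one or more nucleic acid components to the DNA sequence of interest, wherein the effector protein is capable of targeted binding to a target DNA sequence both under the guidance of a guide RNA derived from a transposon right-end element and under the guidance of a guide RNA comprising a tracrRNA and a crRNA; the effector protein forms a CRISPR complex with the one or more nucleic acid components, and, upon targeted binding of the complex to the DNA sequence of interest located 3' of a protospacer adjacent motif (PAM), the effector protein induces modification of the DNA sequence of interest.
  13. The method of claim 12, wherein the DNA sequence of interest is within a cell, preferably a eukaryotic cell.
  14. The method of claim 13, wherein the cell is an animal cell or a human cell.
  15. The method of claim 13, wherein the cell is a plant cell.
  16. The method of claim 12, wherein the effector protein comprises one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences or mitochondrial localization sequences.
  17. The method of claim 12, wherein the effector protein and the nucleic acid components, or constructs expressing the effector protein and the nucleic acid components, are contained in a delivery system.
  18. The method of claim 17, wherein the delivery system comprises a virus, a virus-like particle, a virosome, a liposome, a vesicle, an exosome, a lipid nanoparticle (LNP), N-acetylgalactosamine (GalNAc) or an engineered bacterium.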
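  19. A transposon and CRISPR-Cas12 intermediate (TraC) effector protein or functional variant thereof for genome editing in an organism or a cell of an organism, wherein the TraC effector protein or functional variant thereof is capable of forming a CRISPR complex with a guide RNA;
    the guide RNA is selected from i) a guide RNA derived from a transposon right-end element (reRNA) and/or ii) a guide RNA comprising a tracrRNA and a crRNA, for example a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA;
    the TraC effector protein or functional variant thereof is capable of targeted binding to a target DNA sequence both under the guidance of a guide RNA derived from a transposon right-end element and under the guidance of a guide RNA comprising a tracrRNA and a crRNA.
  20. A transposon and CRISPR-Cas12 intermediate (TraC) effector protein or functional variant thereof for genome editing in an organism or a cell of an organism, the TraC effector protein or functional variant thereof
    (i) comprising at least one, at least two or all three amino acid sequence motifs selected from "TSxxCxxCx", "GIDRG" and "CxxCGxxxxADxxAA", wherein x represents any amino acid, for example any naturally encoded amino acid; and
    (ii) comprising an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9% or even 100% sequence identity to one of SEQ ID NO: 1-37, or comprising an amino acid sequence having one or more, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, amino acid substitutions, deletions or additions relative to SEQ ID NO: 1-37.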
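  21. The TraC effector protein or functional variant thereof of claim 20, wherein the functional variant of the effector protein is derived from SEQ ID NO: 25 and comprises, relative to the sequence of SEQ ID NO: 25, one or more amino acid substitutions selected from K78R, D86R, S137R, V145R, I147R, P148R, D150R, V228R, V254R, A510R, A278R, K315R, S334R, L343R, A369R, H392R, L394R, S408R, N456R, V500R, A510R and T573R.
  22. The TraC effector protein or functional variant thereof of claim 20 or 21, wherein the functional variant of the effector protein is derived from SEQ ID NO: 25 and comprises, relative to the sequence of SEQ ID NO: 25, any one set of mutations selected from those shown in Table 3 or Table 4.
  23. The TraC effector protein or functional variant thereof of claim 20, wherein the functional variant of the effector protein comprises an amino acid sequence selected from SEQ ID NO: 80-87.
  24. The TraC effector protein or functional variant thereof of any one of claims 20-23, which has at least guide RNA-mediated sequence-specific targeting ability.
  25. The TraC effector protein or functional variant thereof of any one of claims 20-23, which has guide RNA-mediated sequence-specific targeting ability and double-stranded nucleic acid cleavage activity.
  26. The TraC effector protein or functional variant thereof of any one of claims 20-23, which has guide RNA-mediated sequence-specific targeting ability and nickase activity.
  27. The TraC effector protein or functional variant thereof of any one of claims 20-23, which has guide RNA-mediated sequence-specific targeting ability but lacks double-stranded nucleic acid cleavage activity and/or nickase activity.
  28. The TraC effector protein or functional variant thereof of any one of claims 24-27, wherein the guide RNA is selected from i) a guide RNA derived from a transposon right-end element (reRNA) and/or ii) a guide RNA comprising a tracrRNA and/or a crRNA, for example a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA.
  29. The TraC effector protein or functional variant thereof of claim 28, wherein the TraC effector protein or functional variant thereof is capable of targeted binding to a target DNA sequence both under the guidance of a guide RNA derived from a transposon right-end element and under the guidance of a guide RNA comprising a tracrRNA and a crRNA.
  30. The TraC effector protein or functional variant thereof of claim 28, wherein the guide RNA is an reRNA derived from a TnpB system, for example an reRNA comprising the scaffold sequence shown in SEQ ID NO: 77 or 78.
  31. The TraC effector protein or functional variant thereof of claim 28, wherein the guide RNA is a single guide RNA (sgRNA) of a tracrRNA and a crRNA, for example an sgRNA comprising the scaffold sequence shown in SEQ ID NO: 75 or 76.
  32. The TraC effector protein or functional variant thereof of any one of claims 19-31, further comprising at least one nuclear localization sequence (NLS), cytoplasmic localization sequence, chloroplast localization sequence or mitochondrial localization sequence.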
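  33. A fusion protein comprising the TraC effector protein or functional variant thereof of any one of claims 19-32 and at least one other functional protein.
  34. The fusion protein of claim 33, wherein the other functional protein is a deaminase.
  35. The fusion protein of claim 34, wherein the deaminase is a cytosine deaminase, for example a cytosine deaminase selected from APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, double-stranded DNA deaminase (Ddd), single-stranded DNA deaminase (Sdd) or functional variants thereof.
  36. The fusion protein of claim 35, further comprising a uracil DNA glycosylase inhibitor (UGI).
  37. The fusion protein of claim 34, wherein the deaminase is an adenine deaminase, for example a DNA-dependent adenine deaminase derived from the E. coli tRNA adenine deaminase TadA (ecTadA).
  38. The fusion protein of any one of claims 34-37, wherein the fusion protein comprises a cytosine deaminase and an adenine deaminase.
  39. The fusion protein of claim 33, wherein the other functional protein is selected from a transcriptional activator protein, a transcriptional repressor protein, a DNA methylase, a DNA demethylase and a reverse transcriptase.
  40. The fusion protein of any one of claims 33-39, wherein the different parts of the fusion protein may independently be connected via a linker or directly.
  41. The fusion protein of any one of claims 33-40, further comprising at least one nuclear localization sequence (NLS), cytoplasmic localization sequence, chloroplast localization sequence or mitochondrial localization sequence.
  42. Use of the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41 for genome editing of a cell, preferably a eukaryotic cell, more preferably a plant cell.
  43. The use of claim 42, wherein the genome editing comprises base editing (Base Editor), prime editing (Prime Editor) or PrimeRoot editing (PrimeRoot Editor).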
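  44. A genome editing system for site-directed modification of a target nucleic acid sequence in the genome of a cell, comprising:
    the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41; and/or
    an expression construct comprising a nucleotide sequence encoding the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41.
  45. The genome editing system of claim 44, further comprising at least one guide RNA (gRNA) and/or an expression construct comprising a nucleotide sequence encoding the at least one guide RNA.
  46. The genome editing system of claim 45, wherein the genome editing system comprises any one selected from the following:
    i) the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41, and the at least one guide RNA, wherein optionally the TraC effector protein or functional variant thereof or the fusion protein forms a complex with the at least one guide RNA;
    ii) an expression construct comprising a nucleotide sequence encoding the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41, and the at least one guide RNA;
    iii) the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41, and an expression construct comprising a nucleotide sequence encoding the at least one guide RNA;
    iv) an expression construct comprising a nucleotide sequence encoding the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41, and an expression construct comprising a nucleotide sequence encoding the at least one guide RNA;
    v) an expression construct comprising both a nucleotide sequence encoding the TraC effector protein or functional variant thereof of any one of claims 19-32 or the fusion protein of any one of claims 33-41 and a nucleotide sequence encoding the at least one guide RNA.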
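  47. The genome editing system of any one of claims 45-46, wherein the guide RNA is selected from i) a guide RNA derived from a transposon right-end element (reRNA) and/or ii) a guide RNA comprising a tracrRNA and/or a crRNA, for example a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA.
  48. The genome editing system of claim 47, wherein the guide RNA is an reRNA derived from a TnpB system, for example an reRNA comprising the scaffold sequence shown in SEQ ID NO: 77 or 78.
  49. The genome editing system of any one of claims 45-46, wherein the guide RNA is a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA, for example an sgRNA comprising the scaffold sequence shown in SEQ ID NO: 75 or 76.
  50. The genome editing system of claim 47 or 49, wherein the guide RNA comprises a tracrRNA and a crRNA, for example is a single guide RNA (sgRNA) comprising a tracrRNA and a crRNA, wherein the crRNA comprises a sequence identical to the target sequence immediately adjacent to the PAM, and the tracrRNA comprises a sequence (non-target-strand binding sequence, NTB) complementary to a sequence located distal to the PAM in the direction of the target sequence.
  51. The genome editing system of any one of claims 44-50, wherein the genome editing system further comprises a donor nucleic acid molecule comprising a nucleotide sequence to be site-specifically inserted into the genome, for example wherein the nucleotide sequence to be site-specifically inserted into the genome is flanked on both sides by sequences homologous to the sequences flanking the target sequence in the genome.
  52. The genome editing system of any one of claims 44-51, wherein the nucleotide sequence encoding the TraC effector protein or functional variant thereof or the fusion protein, and/or the nucleotide sequence encoding the at least one guide RNA, is operably linked to an expression regulatory element such as a promoter.
  53. The genome editing system of any one of claims 44-52, wherein the components of the genome editing system are contained in a delivery system selected from a virus, a virus-like particle, a virosome, a liposome, a vesicle, an exosome, a lipid nanoparticle (LNP), N-acetylgalactosamine (GalNAc) or an engineered bacterium.
  54. A method of producing a genetically modified cell, comprising introducing the genome editing system of any one of claims 44-53 into the cell.
  55. The method of claim 54, wherein the cell is derived from a prokaryote or a eukaryote, preferably from mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; or plants, including monocotyledonous and dicotyledonous plants, such as rice, maize, wheat, sorghum, barley, soybean, peanut and Arabidopsis.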
PCT/CN2023/097783 2022-06-01 2023-06-01 新的crispr基因编辑系统 WO2023232109A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210620277 2022-06-01
CN202210620277.9 2022-06-01

Publications (1)

Publication Number Publication Date
WO2023232109A1 true WO2023232109A1 (zh) 2023-12-07

Family

ID=88991305

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097783 WO2023232109A1 (zh) 2022-06-01 2023-06-01 新的crispr基因编辑系统

Country Status (2)

Country Link
CN (1) CN117187213A (zh)
WO (1) WO2023232109A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021031085A1 (zh) * 2019-08-19 2021-02-25 南方医科大学 一种高保真CRISPR/AsCpf1突变体的构建及其应用
US20210115421A1 (en) * 2019-10-17 2021-04-22 Pairwise Plants Services, Inc. Variants of cas12a nucleases and methods of making and use thereof
CN113373130A (zh) * 2021-05-31 2021-09-10 复旦大学 Cas12蛋白、含有Cas12蛋白的基因编辑系统及应用
CN114075559A (zh) * 2020-09-14 2022-02-22 珠海舒桐医疗科技有限公司 一种2型CRISPR/Cas9基因编辑系统及其应用

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KARVELIS TAUTVYDAS; DRUTEIKA GYTIS; BIGELYTE GRETA; BUDRE KAROLINA; ZEDAVEINYTE RIMANTE; SILANSKAS ARUNAS; KAZLAUSKAS DARIUS; VENC: "Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease", NATURE, vol. 599, no. 7886, 7 October 2021 (2021-10-07), pages 692 - 696, XP037627757, DOI: 10.1038/s41586-021-04058-1 *
MA WANG; XU YING-SHUANG; SUN XIAO-MAN; HUANG HE: "Transposon-Associated CRISPR-Cas System: A Powerful DNA Insertion Tool", TRENDS IN MICROBIOLOGY, ELSEVIER SCIENCE LTD., KIDLINGTON., GB, vol. 29, no. 7, 18 February 2021 (2021-02-18), GB , pages 565 - 568, XP086604210, ISSN: 0966-842X, DOI: 10.1016/j.tim.2021.01.017 *
ZHANG MENG-SI, ZHU DE-KANG, WANG MING-SHU: "Transposases in Bacterial Insertion Sequences and Their Transposition Mechanisms", CHINESE JOURNAL OF BIOCHEMISTRY AND MOLECULAR BIOLOGY, vol. 34, no. 10, 1 October 2018 (2018-10-01), pages 1057 - 1064, XP093115635, DOI: 10.13865/j.cnki.cjbmb.2018.10.06 *

Also Published As

Publication number Publication date
CN117187213A (zh) 2023-12-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23815286

Country of ref document: EP

Kind code of ref document: A1