WO2021169925A1 - Fusion protein and application thereof - Google Patents

Fusion protein and application thereof

Info

Publication number
WO2021169925A1
Authority
WO
WIPO (PCT)
Prior art keywords
fusion protein
nucleic acid
sequence
present
combination
Prior art date
Application number
PCT/CN2021/077328
Other languages
French (fr)
Chinese (zh)
Inventor
牛小牧
李彦莎
梁亚峰
Original Assignee
山东舜丰生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东舜丰生物科技有限公司
Publication of WO2021169925A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H5/00 Angiosperms, i.e. flowering plants, characterised by their plant parts; Angiosperms characterised otherwise than by their botanic taxonomy
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H6/00 Angiosperms, i.e. flowering plants, characterised by their botanic taxonomy
    • A01H6/20 Brassicaceae, e.g. canola, broccoli or rucola
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K19/00 Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216 Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • C12N15/8218 Antisense, co-suppression, viral induced gene silencing [VIGS], post-transcriptional induced gene silencing [PTGS]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/10 Cells modified by introduction of foreign genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/48 Hydrolases (3) acting on peptide bonds (3.4)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/48 Hydrolases (3) acting on peptide bonds (3.4)
    • C12N9/485 Exopeptidases (3.4.11-3.4.19)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y207/00 Transferases transferring phosphorus-containing groups (2.7)
    • C12Y207/10 Protein-tyrosine kinases (2.7.10)
    • C12Y207/10001 Receptor protein-tyrosine kinase (2.7.10.1)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y304/00 Hydrolases acting on peptide bonds, i.e. peptidases (3.4)
    • C12Y304/11 Aminopeptidases (3.4.11)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor

Definitions

  • The invention belongs to the field of biotechnology, and specifically relates to a fusion protein and its application.
  • DNA methylation is a chemical modification of DNA that can change gene expression without changing the DNA sequence. It is a common modification in eukaryotic cells. DNA methylation is established by DNA methyltransferases, which use S-adenosylmethionine (SAM) as the methyl donor.
  • The modified site can be the N-6 position of adenine (6mA), the N-4 position of cytosine, the N-7 position of guanine (7mG), or the C-5 position of cytosine (5mC). These modifications are catalyzed by different DNA methylases. The most thoroughly studied and most common is 5mC, methylation at the C-5 position of cytosine.
  • Methylated DNA can be demethylated.
  • Passive demethylation is related to semi-conservative DNA replication. Because the new strands produced by DNA replication carry no methylation, DNA demethylation occurs if the methylation maintenance system does not work. This is a passive process.
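The dilution described above can be illustrated with a small calculation. This is a minimal sketch and not part of the disclosed method: the function name and the `maintenance_efficiency` parameter are hypothetical, and the model assumes that every strand starts methylated and that a new strand can only be methylated when its template strand is.

```python
def methylated_strand_fraction(rounds: int, maintenance_efficiency: float = 0.0) -> float:
    """Fraction of DNA strands still methylated after `rounds` of replication.

    Assumption (illustrative only): all strands start methylated, and each
    newly synthesized strand is methylated with probability
    `maintenance_efficiency` when its template strand is methylated.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")

    fraction = 1.0
    for _ in range(rounds):
        # After replication, half of all strands are old templates (state kept)
        # and half are new strands (methylated only through maintenance).
        fraction = fraction * (1.0 + maintenance_efficiency) / 2.0
    return fraction


if __name__ == "__main__":
    for n in range(4):
        print(n, methylated_strand_fraction(n))
```

With the maintenance system inactive (efficiency 0), the methylated fraction halves in every round; with fully efficient maintenance it stays constant.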
  • Active demethylation is catalyzed by DNA demethylases. For example, TET1 (ten-eleven translocation 1) and ROS1 (repressor of silencing 1) are demethylases of animals and plants, respectively. They cannot directly remove the methyl group at the C-5 position of cytosine; instead, a new unmodified cytosine is introduced through a base mismatch repair mechanism.
  • DNA demethylation plays an important role in the reactivation of silent genes.
  • DNA methylation can lead to changes in DNA conformation in certain regions, thereby affecting the interaction between protein and DNA, leading to gene silencing.
  • DNA methylation controls a variety of biological processes, including flower morphology, sex determination, plant structure, flowering time, biomass, and leaf senescence.
  • the epigenetic traits of organisms can be manipulated. Therefore, the development of targeted nucleic acid methylation or demethylation tools has important scientific value for the study of methylation function and epigenetic breeding.
  • the present invention provides a protein capable of demethylating and modifying nucleic acid efficiently and site-specifically and its application.
  • a fusion protein comprising components selected from the following:
  • a positioning functional element D1, which has the function of targeting and binding DNA; and
  • a demethylation functional element D2, which has the function of converting methylated nucleotides into non-methylated nucleotides.
  • the D1 element has no catalytic activity and is selected from the following group: Cas protein, zinc finger protein or TALENs protein, or functional domains thereof, or a combination thereof.
  • the D1 element is selected from the following group: dCas9, dCpf1, dCas12, dCas13, dCms1, dMAD7, or functional domains thereof, or combinations thereof.
  • the D1 element is dCas9.
  • the D1 element comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in SEQ ID NO: 1.
  • the D2 element has the function of converting methylated cytosine into unmethylated cytosine.
  • the D2 element is a demethylase or its demethylation functional domain selected from the group consisting of ROS1, TET, DME, DML, or a combination thereof.
  • the D2 element is ROS1 or its functional domain.
  • the D2 element comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid substitutions, deletions or additions) compared with SEQ ID NO: 3; or
  • a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in SEQ ID NO: 3.
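The identity thresholds above can be checked with a simple calculation once two sequences have been aligned. The sketch below is illustrative only: the function names are hypothetical, the input must already be aligned (gaps written as `-`), and identity is computed over all alignment columns, which is one of several conventions in use. The text does not specify which convention applies.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: gaps are '-' and the denominator is the number of alignment
    columns that are not gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences, not taken from the sequence listing.
    print(percent_identity("MKTA-IAK", "MKTAYIAR"))
```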
  • the D1 element is located at the N end or C end of the D2 element.
  • the D1 element and the D2 element are connected by one or more of the following components: peptide bond, connecting peptide, nuclear localization signal, epitope tag, or a combination thereof.
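The arrangement of the two elements and their connectors can be sketched as a small data model. This is only an illustration of the ordering described above: the function name, the string labels, and the idea of representing the protein as a list of element names are assumptions made for the example.

```python
from typing import List

# Connector types named in the text; the string labels are illustrative.
ALLOWED_CONNECTORS = {
    "peptide bond",
    "connecting peptide",
    "nuclear localization signal",
    "epitope tag",
}


def assemble_fusion(d1: str, d2: str, connectors: List[str],
                    d1_at_n_terminus: bool = True) -> List[str]:
    """Return the element order of the fusion protein from N-terminus to C-terminus."""
    for connector in connectors:
        if connector not in ALLOWED_CONNECTORS:
            raise ValueError(f"unsupported connector: {connector}")
    # D1 may sit at either end of D2.
    first, last = (d1, d2) if d1_at_n_terminus else (d2, d1)
    return [first, *connectors, last]


if __name__ == "__main__":
    print(assemble_fusion("dCas9", "ROS1", ["nuclear localization signal"]))
```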
  • the nuclear localization signal comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in SEQ ID NO: 5 or SEQ ID NO: 7.
  • the epitope tag is selected from the group consisting of His tag, GST tag, HA tag, c-Myc tag, Flag tag, V5 tag, or a combination thereof.
  • the fusion protein comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in SEQ ID NO: 9.
  • the N-terminus or C-terminus of the fusion protein further includes one or more of the following elements: epitope tag, reporter gene sequence, nuclear localization signal (NLS), chloroplast signal peptide, transcription activation domain (e.g., VP64), transcription repression domain (e.g., KRAB domain and/or SID domain), nuclease domain (e.g., Fok1), or a combination thereof.
  • In the second aspect of the present invention, a fusion protein combination is provided, which includes a first fusion protein and a second fusion protein. Each has the structure of the fusion protein according to the first aspect of the present invention, and the D2 elements of the first fusion protein and the second fusion protein are different;
  • the first fusion protein or the second fusion protein has the structure shown in formula I from the N-terminus to the C-terminus;
  • D1 is a positioning functional element, which has the function of targeting and binding DNA;
  • D2 is a demethylation functional element, which has the function of converting methylated nucleotides into unmethylated nucleotides;
  • X is a connecting peptide, epitope tag or nuclear localization signal (NLS);
  • n is an integer from 0 to 6;
  • D2 in the respective structures of the first fusion protein and the second fusion protein are different.
  • n is 1.
  • X is a nuclear localization signal.
  • D2 of the first fusion protein is ROS1 or its functional domain
  • D2 of the second fusion protein is TET1 or its functional domain
  • nucleic acid encoding the fusion protein according to the first aspect of the present invention.
  • sequence of the nucleic acid includes the following elements:
  • Z1 is the nucleotide sequence encoding the positioning function element D1 in the fusion protein.
  • Z2 is the nucleotide sequence encoding the functional demethylation element D2 in the fusion protein.
  • the Z1 element comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 2;
  • the Z2 element comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 4;
  • the nucleic acid comprises a sequence encoding a nuclear localization signal
  • the sequence encoding the nuclear localization signal has a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 6 or SEQ ID NO: 8;
  • the nucleic acid has a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 10;
  • nucleic acid construct comprising a first nucleic acid sequence and one or more second nucleic acid sequences, wherein the first nucleic acid sequence encodes the fusion protein according to the first aspect of the present invention or the fusion protein combination according to the second aspect of the present invention, and the second nucleic acid sequence is a gRNA coding sequence.
  • the 5' and/or 3' end of the first nucleic acid sequence comprises one or more nuclear localization signals.
  • one end of the first nucleic acid sequence contains a promoter and, optionally, the other end contains a terminator; the promoter is an RNA polymerase II-dependent promoter selected from UBI, UBQ, 35S, Actin, SPL, CmYLCV, YAO, CDC45, rbcS, rbcL, PsGNS2, UEP1, TobRB7, Cab, or a combination thereof.
  • the nucleic acid construct contains 1-6 gRNA coding sequences.
  • in a preferred embodiment, the gRNA coding sequences are arranged in tandem at the 5' end or 3' end of the first nucleic acid sequence.
  • the second nucleic acid sequences are distributed at both ends of the first nucleic acid sequence.
  • each gRNA coding sequence in the second nucleic acid sequence contains an RNA polymerase III-dependent promoter selected from: U6, U3, U6a, U6b, U6c, U6-1, U3b, U3d, U6-26, U6-29, 7SL or 5H1.
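The constraints on the nucleic acid construct listed above (one promoter for the fusion protein coding sequence, 1-6 gRNA coding sequences, each with its own promoter) can be expressed as a small validation routine. This is a hypothetical sketch for checking a design on paper: the class and function names are invented for the example, the promoter names are copied from the text, and promoter combinations are not modeled.

```python
from dataclasses import dataclass
from typing import List

# Promoter names as listed in the text.
POL2_PROMOTERS = {"UBI", "UBQ", "35S", "Actin", "SPL", "CmYLCV", "YAO",
                  "CDC45", "rbcS", "rbcL", "PsGNS2", "UEP1", "TobRB7", "Cab"}
POL3_PROMOTERS = {"U6", "U3", "U6a", "U6b", "U6c", "U6-1", "U3b", "U3d",
                  "U6-26", "U6-29", "7SL", "5H1"}


@dataclass
class GuideCassette:
    promoter: str  # RNA polymerase III-dependent promoter
    spacer: str    # targeting sequence of the gRNA


@dataclass
class Construct:
    fusion_promoter: str  # RNA polymerase II-dependent promoter
    guides: List[GuideCassette]


def validate_construct(construct: Construct) -> List[str]:
    """Return a list of problems; an empty list means the design passes these checks."""
    problems = []
    if construct.fusion_promoter not in POL2_PROMOTERS:
        problems.append(f"unknown Pol II promoter: {construct.fusion_promoter}")
    if not 1 <= len(construct.guides) <= 6:
        problems.append(f"expected 1-6 gRNA coding sequences, got {len(construct.guides)}")
    for index, guide in enumerate(construct.guides, start=1):
        if guide.promoter not in POL3_PROMOTERS:
            problems.append(f"gRNA {index}: unknown Pol III promoter: {guide.promoter}")
    return problems


if __name__ == "__main__":
    design = Construct("UBI", [GuideCassette("U6", "A" * 20), GuideCassette("U3", "C" * 20)])
    print(validate_construct(design))
```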
  • a vector which contains the nucleic acid according to the third aspect of the present invention or the nucleic acid construct according to the fourth aspect of the present invention.
  • a protein component comprising the fusion protein described in the first aspect of the present invention or the fusion protein combination described in the second aspect of the present invention.
  • the nucleic acid component is one or more gRNA sequences
  • the protein component and the nucleic acid component combine with each other to form the complex.
  • a polynucleotide combination is provided, which encodes the fusion protein combination according to the second aspect of the present invention.
  • the polynucleotide combination includes a first polynucleotide and a second polynucleotide, each of which encodes a fusion protein as described in the first aspect of the present invention, and the D2 elements of the two fusion proteins are different.
  • the first polynucleotide and the second polynucleotide each further include one or more gRNA coding sequences.
  • the first polynucleotide and the second polynucleotide are located in the same vector or in different vectors.
  • the first polynucleotide and the second polynucleotide are located in different vectors.
  • the vector containing the first polynucleotide and the vector containing the second polynucleotide transform cells simultaneously or sequentially.
  • a host cell containing the fusion protein according to the first aspect of the present invention, or the fusion protein combination according to the second aspect, or the vector according to the fifth aspect, or the complex according to the sixth aspect, or the polynucleotide combination according to the seventh aspect of the present invention; or a host cell having the nucleic acid according to the third aspect or the nucleic acid construct according to the fourth aspect of the present invention integrated in its genome.
  • the host cell is a eukaryotic cell or a prokaryotic cell.
  • the host cell is a plant cell.
  • the plant is a monocotyledonous plant or a dicotyledonous plant.
  • the ninth aspect of the present invention provides a method for preparing the fusion protein of the first aspect of the present invention, which includes the following steps:
  • the host cell contains the vector according to the fifth aspect of the present invention, or the polynucleotide according to the third aspect of the present invention is integrated into the genome.
  • use of the fusion protein according to the first aspect of the present invention, or the fusion protein combination according to the second aspect, or the nucleic acid according to the third aspect, or the nucleic acid construct according to the fourth aspect, or the vector according to the fifth aspect, or the complex according to the sixth aspect, or the polynucleotide combination according to the seventh aspect of the present invention, in the demethylation modification of a target nucleic acid is provided.
  • the demethylation is the conversion of methylated cytosine to unmethylated cytosine.
  • the target nucleic acid is derived from a eukaryote or a prokaryote.
  • the target nucleic acid is derived from plant cells or animal cells.
  • the target nucleic acid is derived from the nucleus, cytoplasm, chloroplast or mitochondria.
  • the target nucleic acid is DNA, RNA or a combination thereof.
  • use of the fusion protein according to the first aspect of the present invention, or the fusion protein combination according to the second aspect, or the nucleic acid according to the third aspect, or the nucleic acid construct according to the fourth aspect, or the vector according to the fifth aspect, or the complex according to the sixth aspect, or the polynucleotide combination according to the seventh aspect of the present invention, in the preparation of a kit for demethylation modification of a target nucleic acid is provided.
  • kit comprising one or more of the following: the fusion protein according to the first aspect of the present invention, the fusion protein combination according to the second aspect, the nucleic acid according to the third aspect, the nucleic acid construct according to the fourth aspect, the vector according to the fifth aspect, the complex according to the sixth aspect, or the polynucleotide combination according to the seventh aspect of the present invention.
  • a method for reducing DNA methylation of a target gene or its promoter or its enhancer in a cell, in which the fusion protein described in the first aspect of the present invention is expressed in the cell and one or more gRNAs related to the target gene are provided.
  • the fourteenth aspect of the present invention provides a method for regulating the expression of a target gene, which includes the following steps: expressing the fusion protein described in the first aspect of the present invention and binding it to the target gene or an expression control element of the target gene, so as to demethylate the DNA at this site.
  • the regulation includes: activation, enhancement, inhibition, reduction or inactivation.
  • the present invention provides a method for activating or enhancing gene expression, which comprises the following steps: expressing the fusion protein described in the first aspect of the present invention and binding it to an expression control element of the target gene, so as to demethylate the DNA at this site.
  • the expression control elements include: promoter, enhancer, terminator, transposon, and silencer.
  • a method for regulating plant traits is provided, which is characterized by comprising the following steps:
  • a nucleic acid sequence expressing the fusion protein of the first aspect of the present invention and a gRNA related to the gene to be regulated is introduced into the plant cell and integrated into the genome;
  • the methods for introduction into cells include Agrobacterium infection, gene gun transformation, microinjection, electroporation, sonication, and polyethylene glycol (PEG)-mediated transformation.
  • the traits are epigenetic traits of plants.
  • Figure 1 shows the decrease in the methylation level of MEMS in the target region in the transgenic T1 plant in Example 1.
  • Figure 2 shows the expression level of ROS1 in the transgenic T1 plant in Example 1.
  • Figure 3 shows the genetic stability of MEMS site demethylation in the transgenic T2 plant in Example 1.
  • Figure 4 shows the results of demethylation of different regions in Example 1.
  • Figure 5 shows the genetic stability of the transgenic T2 plants in Example 2.
  • Figure 6 shows the structural composition of the demethylated gene editing tool.
  • the inventors developed, for the first time, an efficient and site-specific method for removing DNA methylation modification. Specifically, the present inventors fused dCas9 or its functional domain, which has the function of targeting and binding DNA, with the demethylase ROS1 or its functional domain to obtain a fusion protein, and introduced multiple gRNA sequences corresponding to the target nucleic acid, so that the target nucleic acid region can be precisely located and demethylated.
  • the demethylation method of the present invention has precise and efficient demethylation modification efficiency in plants, and has important scientific value for studying epigenetics of plants and regulating plant traits through demethylation. On this basis, the present invention has been completed.
  • fusion protein refers to the fusion protein described in the first aspect of the present invention, which has the function of targeted binding to DNA and converting target methylated nucleotides into unmethylated nucleotides.
  • fusion protein combination refers to a combination of multiple fusion proteins in the present invention.
  • each fusion protein has a different demethylase catalytic domain.
  • the different demethylase catalytic domains have different demethylation effects on different target nucleic acid sites, so that they complement each other.
  • Cas protein refers to a nuclease.
  • a preferred Cas protein is the Cas9 protein.
  • Typical Cas9 proteins include (but are not limited to): Cas9 derived from Staphylococcus aureus.
  • the Cas9 protein can also be replaced by Cas proteins derived from other CRISPR systems, such as Cpf1 nuclease.
  • the source of the Cpf1 nuclease is selected from the group consisting of Acidaminococcus, Lachnospiraceae, Acidaminococcus mutants, and Lachnospiraceae mutants.
  • the "d" in "dCas9, dCpf1, dCas12, dCas13, dCms1, dMAD7" stands for "dead", meaning a Cas protein that has lost its enzymatic cleavage activity: it cannot cut single-stranded or double-stranded DNA sequences, but can still form a complex with the gRNA that targets and binds to the DNA sequence.
  • an epitope tag can be fused to the N-terminus or C-terminus of the target protein by molecular genetic techniques without affecting the biological activity of the target protein, and makes the target protein easy to detect.
  • the "connecting peptide” is a short peptide chain composed of multiple amino acids that connects the D1 element and the D2 element to form a fusion protein.
  • the connecting peptide does not affect the expression of the fusion protein.
  • the length of the connecting peptide is generally 1-100 aa, preferably, 15-85 aa, more preferably, 25-70 aa, more preferably, 24-32 aa.
  • the commonly used connecting peptide can be XTEN.
  • gRNA, also called guide RNA, has the meaning commonly understood by those skilled in the art.
  • guide RNAs can include direct repeats and guide sequences, or consist essentially of direct repeats and guide sequences (also called spacers in the context of endogenous CRISPR systems).
  • gRNA can include crRNA and tracrRNA, or only crRNA, depending on the Cas protein it depends on.
  • crRNA and tracrRNA can be artificially modified and fused to form single guide RNA (sgRNA).
  • the gRNA of the present invention may be natural, or artificially modified or designed and synthesized.
  • the targeting sequence is any polynucleotide sequence that has sufficient complementarity with the target sequence to hybridize with the target sequence and guide the specific binding of the CRISPR/Cas complex to the target sequence, and usually has a length of 17-23 nt.
  • the degree of complementarity between the targeting sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the best alignment is within the abilities of those of ordinary skill in the art. For example, there are published and commercially available alignment algorithms and programs, such as but not limited to ClustalW, the Smith-Waterman algorithm in matlab, Bowtie, Geneious, Biopython, and SeqMan.
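The length and complementarity criteria above can be checked with a few lines of code. The sketch below is a simplified illustration rather than one of the alignment programs named in the text: the function names are hypothetical, both sequences are written 5' to 3', they must be of equal length, and pairing is evaluated position by position in antiparallel orientation without gaps.

```python
# Watson-Crick pairs between an RNA targeting sequence and a DNA target strand.
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def has_typical_length(targeting_rna: str) -> bool:
    """True if the targeting sequence is 17-23 nt long, the usual range given in the text."""
    return 17 <= len(targeting_rna) <= 23


def percent_complementarity(targeting_rna: str, target_dna: str) -> float:
    """Percent of positions that base-pair when the two strands are antiparallel.

    Assumption: equal-length sequences, both written 5'->3', no gaps.
    """
    guide = targeting_rna.upper().replace("T", "U")
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((g, t) in PAIRS for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    target = "ATGCATGCATGCATGCATGC"
    guide = "GCAUGCAUGCAUGCAUGCAU"
    print(has_typical_length(guide), percent_complementarity(guide, target))
```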
  • the "functional domain” refers to a region of a protein or enzyme that independently performs its biological function and has a specific structure. It can be a part of the protein structure, or it can be composed of one or more protein domains in an operably linked manner.
  • the domain is a combination of different secondary structures and super-secondary structures, and is a subunit that bears part or all of the physiological functions in the expression of protein functions.
  • the number of amino acid residues in common domains is between 100 and 400, the smallest domain has only 40 to 50 amino acid residues, and the larger domain can exceed 400 amino acid residues.
  • epigenetic refers to heritable changes in gene function that occur without changes in the DNA sequence of a gene and that ultimately lead to a change in phenotype.
  • mechanisms affecting epigenetics include the following: DNA modification (such as DNA methylation), protein covalent modification, paramutation, regulation of non-coding RNA, chromatin remodeling, or genome imprinting.
  • the "epigenetic traits” mentioned herein refer to the observable plant traits or characteristics controlled by or involved in the regulation of epigenetic mechanisms in plants.
  • the demethylation modification described in the present invention mainly refers to the modification of 5-methylcytosine (5mC), which is a reversible epigenetic modification and plays an important role in the growth and development of plants.
  • Common demethylases in plants include but are not limited to: ROS1, TET1, DME, DML, etc.
  • ROS1 is a bifunctional DNA glycosylase that can directly excise methylated cytosine to create an abasic site, which then initiates base mismatch repair to introduce an unmodified cytosine.
  • TET is a dioxygenase that can oxidize methylated cytosine to 5-hydroxymethylcytosine and then further to 5-formylcytosine and 5-carboxycytosine; a DNA glycosylase (TDG) then excises the 5-formylcytosine or 5-carboxycytosine to create an abasic (empty base) site, which initiates base mismatch repair and reintroduces an unmodified cytosine.
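The TET route described above is a short chain of states, which the following Python sketch walks through. It is a toy model under stated assumptions: the abbreviations 5hmC, 5fC and 5caC, the dictionary and the function name are introduced here for illustration, and excision is assumed to happen at the first TDG substrate reached.

```python
# Toy model of the TET/TDG route described in the text; all names are illustrative.
TET_OXIDATION = {"5mC": "5hmC", "5hmC": "5fC", "5fC": "5caC"}  # successive oxidation steps
TDG_SUBSTRATES = {"5fC", "5caC"}  # bases that TDG can excise


def tet_route(start: str = "5mC") -> list:
    """Return the states visited from a modified cytosine to unmodified C,
    assuming TDG excises the first substrate that appears."""
    if start not in TET_OXIDATION and start not in TDG_SUBSTRATES:
        raise ValueError(f"not a state on the TET route: {start}")
    path = [start]
    state = start
    while state not in TDG_SUBSTRATES:
        state = TET_OXIDATION[state]  # TET oxidizes one step further
        path.append(state)
    path.append("abasic site")  # TDG removes the oxidized base
    path.append("C")            # repair reintroduces an unmodified cytosine
    return path


if __name__ == "__main__":
    print(" -> ".join(tet_route()))
```

By contrast, the ROS1 route in the text skips the oxidation steps and goes from 5mC directly to the abasic site.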
  • the present invention provides a fusion protein, which has the function of targeted binding to DNA and converting target methylated nucleotides into unmethylated nucleotides.
  • the D1 element has no catalytic activity and is selected from the following group: Cas protein, zinc finger protein or TALENs protein, or functional domains thereof, or a combination thereof.
  • the D1 element is selected from the following group: dCas9, dCpf1, dCas12, dCas13, dCms1, dMAD7, or a combination thereof.
  • the D1 element is dCas9.
  • the D1 element is a functional domain of the dCas9 protein, comprising or consisting of the amino acid sequence shown in SEQ ID NO: 1; its corresponding coding nucleotide sequence is shown in SEQ ID NO: 2.
  • the D2 element has the function of converting methylated cytosine into unmethylated cytosine.
  • the D2 element is a demethylase or its demethylation functional domain selected from the following group: ROS1, TET, DME, DML, or a combination thereof; preferably, the D2 element is ROS1 or its functional domain.
  • the D2 element is a functional domain of the ROS1 protein, comprising or consisting of the amino acid sequence shown in SEQ ID NO: 3; its corresponding coding nucleotide sequence is shown in SEQ ID NO: 4.
  • the D1 element and the D2 element are connected by one or more of the following components: peptide bond, connecting peptide, nuclear localization signal, epitope tag, or a combination thereof.
  • the nuclear localization signal comprises or consists of the amino acid sequence shown in SEQ ID NO: 5 or SEQ ID NO: 7; the corresponding coding nucleotide sequences are shown in SEQ ID NO: 6 and SEQ ID NO: 8, respectively.
  • the fusion protein comprises a sequence selected from the following, or consists of a sequence selected from the following:
  • a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 9.
  • the N-terminal or C-terminal of the fusion protein further includes one or more of the following elements: epitope tag, reporter gene sequence, nuclear localization signal (NLS), chloroplast signal peptide, transcription activation domain (for example, VP64), transcription repression domain (for example, a KRAB domain and/or a SID domain), nuclease domain (for example, Fok1), or a combination thereof.
  • the present invention also includes fragments and analogs having the functions of the fusion protein of the present invention.
  • the terms “fragment”, “derivative” and “analog” refer to polypeptides that substantially maintain the same biological function or activity as the fusion protein of the present invention.
  • the fusion protein fragment, derivative or analog of the present invention may be: (i) a polypeptide in which one or more conservative or non-conservative amino acid residues (preferably conservative amino acid residues) are substituted, where the substituted amino acid residues may or may not be encoded by the genetic code; or (ii) a polypeptide with substituent groups in one or more amino acid residues; or (iii) a polypeptide formed by fusing the mature polypeptide with another compound (such as a compound that prolongs the half-life of the polypeptide, for example polyethylene glycol); or (iv) a polypeptide formed by fusing an additional amino acid sequence to the polypeptide sequence (such as a leader sequence or secretory sequence, a sequence used to purify the polypeptide, a proprotein sequence, or a fusion protein).
  • these fragments, derivatives and analogs fall within the scope well known to those skilled in the art.
  • the fusion protein variant is a sequence derived from the amino acid sequence shown in SEQ ID NO: 9 by substitution, deletion or addition of several (usually 1-60, preferably 1-30, more preferably 1-20, most preferably 1-10) amino acids, and/or by addition of one or several (usually within 20, preferably within 10, more preferably within 5) amino acids at the C-terminus and/or N-terminus.
  • when amino acids are substituted by amino acids with similar properties, the function of the protein is usually not changed, and the addition of one or several amino acids at the C-terminus and/or N-terminus usually does not change the function of the protein either.
  • the present invention also includes analogs of the claimed fusion protein.
  • the difference between these analogs and the sequence SEQ ID NO: 9 of the present invention may be the difference in the amino acid sequence, the difference in the modified form that does not affect the sequence, or both.
  • Analogs of these proteins include natural or induced genetic variants. Induced variants can be obtained by various techniques, such as random mutagenesis by radiation or exposure to mutagens, site-directed mutagenesis or other known biological techniques. Analogs also include analogs having residues different from natural L-amino acids (such as D-amino acids), and analogs having non-naturally occurring or synthetic amino acids (such as β, γ-amino acids). It should be understood that the protein of the present invention is not limited to the representative proteins exemplified above.
  • Modifications include chemically derived forms of the protein in vivo or in vitro; the modifications can maintain, enhance or partially inhibit the transport function of the protein. They include chemical modification of amino acid side chains and of peptide chain end groups, such as chemical modification of sulfhydryl groups, amino groups, carboxyl groups and disulfide bonds, among others. The chemical modifications include phosphorylation (such as phosphotyrosine, phosphoserine, phosphothreonine), glycosylation (mediated by glycosylase, such as N-glycosylation, O-glycosylation), fatty acylation (such as acetylation, palmitoylation), etc.
  • the present invention also relates to methods for producing fusion proteins or fragments, derivatives or analogs thereof. It includes (a) culturing the above-mentioned host cell under conditions conducive to the production of the fusion protein or its fragment, derivative or analogue; and (b) isolating the fusion protein or its fragment, derivative or analogue.
  • the cells are cultured on a nutrient medium suitable for the production of the fusion protein by a method well known in the art. If the polypeptide is secreted into the nutrient medium, the polypeptide can be directly recovered from the medium. If the polypeptide is not secreted into the medium, it can be recovered from cell lysates.
  • the polypeptide can be detected by methods known in the art that are specific to the polypeptide. These detection methods may include the use of specific antibodies, the formation of enzyme products, or the disappearance of enzyme substrates.
  • the produced polypeptide can be recovered by methods known in the art.
  • the cells can be harvested by centrifugation, broken up by physical or chemical methods, and the resulting crude extract is retained for further purification.
  • Any convenient method can be used to lyse the transformed host cells expressing the fusion protein of the present invention or its fragments, derivatives or analogs, including freeze-thaw cycles, ultrasound, mechanical disruption, or the use of cytolytic agents. These methods are well known to those skilled in the art.
  • the fusion protein of the present invention or its fragments, derivatives or analogues can be recovered and purified from the culture of transformed host cells.
  • the methods used include ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxyapatite chromatography, phytohemagglutinin chromatography, etc.
  • the nucleic acid encoding the fusion protein of the present invention can encode the amino acid sequence shown in SEQ ID NO: 9, and preferably has the nucleotide sequence shown in SEQ ID NO: 10.
  • the present invention also includes nucleic acids having at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence homology with the preferred nucleic acid sequence of the present invention (SEQ ID NO: 10).
  • “Homology” or “identity” refers to the matching of sequences between two polypeptides or between two nucleic acids. When a certain position in the two sequences being compared is occupied by the same base or amino acid monomer subunit (for example, a certain position in each of two DNA molecules is occupied by adenine, or a certain position in each of two polypeptides is occupied by lysine), the molecules are identical at that position.
  • the “percent identity” between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 out of 10 positions in two sequences match, then the two sequences have 60% identity.
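The percent-identity calculation above (matching positions divided by positions compared, times 100) can be written as a few lines of Python. This is a minimal sketch that assumes the two sequences are already aligned and of equal length, with no gap handling; the function name and the two example strings are hypothetical and were chosen only to reproduce the 6-out-of-10 case from the text.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Matching positions / positions compared x 100, for two pre-aligned
    sequences of equal length (gaps are not handled in this sketch)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical pair in which the first 6 of 10 positions match.
    print(percent_identity("ACGTACGTAC", "ACGTACTGCA"))
```

The same function works for amino acid sequences, since it only compares characters position by position.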
  • one or more nucleotides in the nucleotide sequence of SEQ ID NO: 10 can be substituted, deleted or added to generate a derivative sequence of SEQ ID NO: 10. Such a derivative sequence, even one that has low homology with SEQ ID NO: 10, can still essentially encode the amino acid sequence shown in SEQ ID NO: 9.
  • a “sequence derived from the nucleotide sequence in SEQ ID NO: 10 by substitution, deletion or addition of at least one nucleotide” means a nucleotide sequence that can hybridize with the nucleotide sequence shown in SEQ ID NO: 10 under moderately stringent conditions, more preferably under highly stringent conditions.
  • these variant forms include (but are not limited to): deletion, insertion and/or substitution of several (usually 1-90, preferably 1-60, more preferably 1-20, most preferably 1-10) nucleotides, and addition of several (usually within 60, preferably within 30, more preferably within 10, most preferably within 5) nucleotides at the 5' and/or 3' end.
  • the polynucleotide or nucleic acid sequence of the present invention may be in the form of DNA or RNA.
  • the form of DNA includes: cDNA, genomic DNA or synthetic DNA.
  • DNA can be single-stranded or double-stranded.
  • DNA can be a coding strand or a non-coding strand.
  • the term “polynucleotide encoding the fusion protein of the present invention” may include a polynucleotide encoding the fusion protein, or a polynucleotide that also includes additional coding and/or non-coding sequences.
  • the present invention also relates to variants of the aforementioned polynucleotides, which encode polypeptides having the same amino acid sequence as the present invention, or fragments, analogs and derivatives of such polypeptides.
  • the variants of this polynucleotide can be naturally occurring allelic variants or non-naturally occurring variants. These nucleotide variants include substitution variants, deletion variants and insertion variants.
  • an allelic variant is an alternative form of a polynucleotide. It may be a substitution, deletion or insertion of one or more nucleotides, but it will not substantially change the function of the encoded polypeptide.
  • the present invention also relates to polynucleotides that hybridize with the aforementioned sequences and have at least 50%, preferably at least 70%, and more preferably at least 80% identity between the two sequences.
  • the present invention particularly relates to polynucleotides that can hybridize with the polynucleotides of the present invention under stringent conditions.
  • stringent conditions refer to: (1) hybridization and elution at lower ionic strength and higher temperature, such as 0.2×SSC, 0.1% SDS, 60°C; or (2) hybridization in the presence of a denaturant, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42°C, etc.; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, more preferably at least 95%.
  • the full-length nucleic acid sequence of the present invention or its fragments can usually be obtained by PCR amplification method, recombination method or artificial synthesis method.
  • primers can be designed according to the relevant nucleotide sequence disclosed in the present invention, especially the open reading frame sequence, and a commercially available DNA library, or a cDNA library prepared by a conventional method known to those skilled in the art, can be used as a template to amplify the relevant sequences. When the sequence is long, it is often necessary to perform two or more PCR amplifications, and then splice the amplified fragments together in the correct order.
  • the recombination method can be used to obtain the relevant sequence in large quantities. It is usually cloned into a vector, and then transferred into a cell, and then the relevant sequence is isolated from the proliferated host cell by conventional methods.
  • artificial synthesis methods can also be used to synthesize related sequences, especially when the fragment length is short. Usually, fragments with very long sequences are obtained by first synthesizing multiple small fragments and then ligating them.
  • the DNA sequence encoding the protein (or fragment or derivative thereof) of the present invention can be obtained completely through chemical synthesis. This DNA sequence can then be introduced into various existing DNA molecules (such as vectors) and cells known in the art. In addition, mutations can also be introduced into the protein sequence of the present invention through chemical synthesis.
  • the present invention provides a fusion protein and its coding sequence for highly efficient and site-specific removal of DNA methylation modification, which is of great significance for studying the function of DNA methylation.
  • the present invention provides for the first time the application of a demethylating fusion protein in plants, and finds that it achieves precise and efficient demethylation in plants, which has important scientific value for studying plant epigenetics and for regulating plant traits through demethylation.
  • sgRNA design: five sgRNAs targeting the MEMS region were designed.
  • the corresponding sgRNA sequences are shown in Table 1.
  • sgRNA also has a sticky end for ligation.
  • sgMEMS-1 and sgMEMS-4 were ligated into the U6 vector; sgMEMS-2 and sgMEMS-5 into the U3b vector; sgMEMS-3 into the 7SL vector. Sequencing verified that each sgRNA was successfully ligated into the corresponding vector.
  • the three mixed sgRNA fragments obtained above were ligated to the p1300-UBQ-dCas9-TET1cd and p1300-UBQ-dCas9-ROS1cd vectors respectively using T4 ligase.
  • the p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd vectors can be obtained after the ligation reaction at 16°C for 2h. Sequencing verifies that the fragments are correctly ligated into the vectors.
  • the p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd vectors were digested with Kpn I+EcoR I, then purified and recovered with ethanol; the mixed sgRNA fragments were ligated into the p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd vectors with T4 ligase to give the final vectors p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd-sgMEMS4_5 and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd-sgMEMS4_5.
  • DNA extraction uses QIAGEN's plant DNA extraction kit.
  • A seedling is scored as a successfully edited positive seedling when the ratio of control DNA methylation to seedling DNA methylation is greater than 1.5.
  • One plant carrying the vector and one plant without the vector are selected, leaves are sampled again, and DNA is extracted with QIAGEN's plant DNA extraction kit.
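The screening criterion stated above (control methylation divided by seedling methylation greater than 1.5) can be expressed as a small Python helper. This is a sketch under assumptions: the function name, the argument names and the handling of a seedling value of zero are choices made here for illustration, and the patent does not specify the units of the methylation measurements.

```python
def is_successfully_edited(control_methylation: float,
                           seedling_methylation: float,
                           threshold: float = 1.5) -> bool:
    """Apply the ratio rule: control / seedling > threshold.
    Both values must be in the same units (for example, percent methylation)."""
    if control_methylation < 0 or seedling_methylation < 0:
        raise ValueError("methylation levels cannot be negative")
    if seedling_methylation == 0:
        # Assumption: complete loss of methylation counts as edited
        # whenever the control shows any methylation at all.
        return control_methylation > 0
    return control_methylation / seedling_methylation > threshold


if __name__ == "__main__":
    # Hypothetical readings, not data from the patent.
    print(is_successfully_edited(80.0, 40.0))  # ratio 2.0
    print(is_successfully_edited(80.0, 70.0))  # ratio about 1.14
```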
  • dCas9-ROS1cd and dCas9-TET1cd reduce the methylation level of MEMS in the target region in transgenic T1 plants
  • transgenic plants No. 13 and No. 14 of dCas9-ROS1cd and transgenic plants No. 5 and No. 14 of dCas9-TET1cd showed significant demethylation compared with the wild-type and dCas9 positive control plants.
  • the transgenic line with dCas9-TET1cd maintained the original low methylation level at the MEMS site, and the dCas9-TET1cd T2 individuals without the transgene showed methylation reversal.
  • Both dCas9-ROS1cd and dCas9-TET1cd can mediate the demethylation of MEMS sites in the ROS1 promoter region in plants, and the demethylation editing efficiency of dCas9-ROS1cd is higher than that of dCas9-TET1cd.
  • Demethylation of MEMS sites can effectively reduce the expression of ROS1 gene. It shows that DNA methylation and demethylation can effectively regulate gene expression.
  • sgRNAs 1, 2, and 3 are connected to the upstream of the fusion protein, while sgRNAs 4, 5, and 6 are connected to the downstream of the fusion protein.
  • the sequence of sgRNA is shown in Table 3. sgRNA is consistent.
  • Use T4 ligase to ligate the recovered fragments into Takara's p20T vector.
  • a, b, and c correspond to the methylation editing results at the 3 sites, respectively. The bottom of each figure shows the position of the editing region on the chromosome: the red lines mark the positions of CG sites on the genome, the blue lines mark the positions of CHG sites, the black arrows mark the positions of the primers used to analyze DNA methylation, and the genomic positions of the sgRNAs are also marked. The top of each figure shows the level of DNA methylation: solid symbols represent DNA methylation at the corresponding site and open symbols represent no DNA methylation; red represents CG methylation, blue represents CHG methylation, and green represents CHH methylation.
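The three methylation contexts named in the figure legend (CG, CHG and CHH, where H stands for A, C or T) are defined by the two bases that follow a cytosine. The Python sketch below classifies a cytosine accordingly; the function name and the example sequences are hypothetical, and the color mapping simply repeats the legend above.

```python
# Colors as given in the figure legend above.
LEGEND_COLOR = {"CG": "red", "CHG": "blue", "CHH": "green"}


def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at index i of a 5'->3' strand as CG, CHG or CHH.
    Returns "unknown" when too few downstream bases are available or a
    non-ACGT letter is encountered."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    nxt = seq[i + 1:i + 3]  # up to two downstream bases
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) == 2 and nxt[0] in "ACT":
        if nxt[1] == "G":
            return "CHG"
        if nxt[1] in "ACT":
            return "CHH"
    return "unknown"


if __name__ == "__main__":
    for s in ("ACGT", "ACAGT", "ACATT"):  # hypothetical sequences
        ctx = cytosine_context(s, 1)
        print(s, ctx, LEGEND_COLOR[ctx])
```

Cytosines on the opposite strand would be classified the same way after taking the reverse complement.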
  • the T2 individuals with and without the transgene maintained their hypomethylated state.
  • Both dCas9-ROS1cd and dCas9-TET1cd can mediate the demethylation of DNA at Chr4.8670151-8671193, and the demethylation editing efficiency of dCas9-TET1cd is higher than that of dCas9-ROS1cd.
  • at the Chr5.9872445-9873033 site (a solo-LTR site), only dCas9-ROS1cd successfully demethylated it.
  • at Chr3:2849440-2849791, only dCas9-TET1cd successfully demethylated it.
  • dCas9-ROS1cd and dCas9-TET1cd show different efficiencies, and they can complement each other when applied.


Abstract

Provided are a fusion protein and an application thereof. Specifically, provided is a fusion protein that comprises, or is composed of, the following components: (1) a positioning functional element D1, which has the function of targeting and binding DNA; and (2) a demethylation functional element D2, which has the function of converting a methylated nucleotide into an unmethylated nucleotide.

Description

A fusion protein and its application
Technical field
The invention belongs to the field of biotechnology, and specifically relates to a fusion protein and its application.
Background art
DNA methylation is a form of chemical modification of DNA that can change genetic expression without changing the DNA sequence, and it is a modification commonly found in eukaryotic cells. DNA methylation is established by DNA methyltransferases, which catalyze the reaction using S-adenosylmethionine (SAM) as the methyl donor.
Methylation can take several forms. The modified position can be N-6 of adenine (6mA), N-4 of cytosine, N-7 of guanine (7mG), or C-5 of cytosine (5mC), each catalyzed by a different DNA methylase. The most clearly studied and most common form, however, is 5mC, that is, methylation at the C-5 position of cytosine.
Methylated DNA can be demethylated. There are two mechanisms of DNA demethylation: passive demethylation and active demethylation. Passive demethylation is related to semi-conservative DNA replication: the new strand produced by DNA replication carries no DNA methylation, so if the methylation maintenance system does not act, DNA demethylation results, which is clearly a passive process. Active demethylation is related to catalysis by DNA demethylases. For example, TET1 (ten-eleven translocation 1) and ROS1 (repressor of silencing 1) are demethylases of animals and plants, respectively. Neither can directly remove the methyl group at the C-5 position of cytosine; both introduce a new, unmodified cytosine through the mechanism of base mismatch repair.
Studies have shown that DNA demethylation plays an important role in the reactivation of silenced genes. In addition, DNA methylation can change the DNA conformation in certain regions, thereby affecting the interaction between proteins and DNA and leading to gene silencing. In plants, DNA methylation controls a variety of biological processes, including flower morphology, sex determination, plant architecture, flowering time, biomass, and leaf senescence. By controlling DNA methylation, the epigenetic traits of organisms can be manipulated. Therefore, the development of tools for targeted nucleic acid methylation or demethylation has important scientific value for the study of methylation function and for epigenetic breeding.
Therefore, there is an urgent need in this field to develop a method that can efficiently and site-specifically methylate or demethylate DNA.
Summary of the invention
The present invention provides a protein that demethylates nucleic acids efficiently and site-specifically, and its application.
In the first aspect of the present invention, a fusion protein is provided, comprising components selected from the following:
(1) a positioning functional element D1, which has the function of targeting and binding DNA; and
(2) a demethylation functional element D2, which has the function of converting methylated nucleotides into unmethylated nucleotides.
In another preferred embodiment, the D1 element has no catalytic activity and is selected from the following group: Cas protein, zinc finger protein or TALENs protein, or functional domains thereof, or a combination thereof.
In another preferred embodiment, the D1 element is selected from the following group: dCas9, dCpf1, dCas12, dCas13, dCms1, dMAD7, or functional domains thereof, or a combination thereof.
In another preferred embodiment, the D1 element is dCas9.
In another preferred embodiment, the D1 element comprises a sequence selected from the following, or consists of a sequence selected from the following:
(1) the amino acid sequence shown in SEQ ID NO: 1;
(2) a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 1; or
(3) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 1.
In another preferred embodiment, the D2 element has the function of converting methylated cytosine into unmethylated cytosine.
In another preferred embodiment, the D2 element is a demethylase or its demethylation functional domain selected from the group consisting of ROS1, TET, DME, DML, or a combination thereof.
In another preferred embodiment, the D2 element is ROS1 or its functional domain.
In another preferred embodiment, the D2 element comprises a sequence selected from the following, or consists of a sequence selected from the following:
(1) the amino acid sequence shown in SEQ ID NO: 3;
(2) a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 3; or
(3) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 3.
In another preferred embodiment, the D1 element is located at the N-terminus or C-terminus of the D2 element.
In another preferred embodiment, the D1 element and the D2 element are connected by one or more of the following components: peptide bond, connecting peptide, nuclear localization signal, epitope tag, or a combination thereof.
In another preferred embodiment, the nuclear localization signal comprises a sequence selected from the following, or consists of a sequence selected from the following:
(1) the amino acid sequence shown in SEQ ID NO: 5 or SEQ ID NO: 7;
(2) a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 5 or SEQ ID NO: 7; or
(3) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 5 or SEQ ID NO: 7.
In another preferred embodiment, the epitope tag is selected from the group consisting of His tag, GST tag, HA tag, c-Myc tag, Flag tag, V5 tag, or a combination thereof.
In another preferred embodiment, the fusion protein comprises a sequence selected from the following, or consists of a sequence selected from the following:
(1) the amino acid sequence shown in SEQ ID NO: 9;
(2) a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 9; or
(3) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence shown in SEQ ID NO: 9.
In another preferred embodiment, the N-terminus or C-terminus of the fusion protein further includes one or more of the following elements: epitope tag, reporter gene sequence, nuclear localization signal (NLS), chloroplast signal peptide, transcription activation domain (for example, VP64), transcription repression domain (for example, a KRAB domain and/or a SID domain), nuclease domain (for example, Fok1), or a combination thereof.
在本发明的第二方面,提供了一种融合蛋白组合,所述融合蛋白组合中包括第一融合蛋白和第二融合蛋白,所述第一融合蛋白和所述第二融合蛋白的结构如本发明第一方面所述的融合蛋白所示;其特征在于,所述第一融合蛋白和所述第二融合蛋白中的D2是不同的;In the second aspect of the present invention, a fusion protein combination is provided, the fusion protein combination includes a first fusion protein and a second fusion protein, and the structure of the first fusion protein and the second fusion protein is the same as the present The fusion protein according to the first aspect of the invention is shown; it is characterized in that D2 in the first fusion protein and the second fusion protein are different;
In another preferred embodiment, the first fusion protein or the second fusion protein has, from the N-terminus to the C-terminus, the structure shown in Formula I:
D1-(X)n-D2   (Formula I)
wherein
D1 is a positioning functional element having the function of targeting and binding DNA;
D2 is a demethylation functional element having the function of converting methylated nucleotides into unmethylated nucleotides;
X is a linker peptide, an epitope tag or a nuclear localization signal (NLS);
n is an integer from 0 to 6;
"-" denotes a peptide bond connecting the above elements;
the positions of D1 and D2 are interchangeable;
and D2 in the first fusion protein is different from D2 in the second fusion protein.
When n is 0, D1 and D2 are directly connected by a peptide bond.
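The constraints of Formula I can be written down as a small data structure. The sketch below is illustrative only: the class name `FusionProtein`, the labels used for the X elements, and the `d2_first` flag for the interchangeable D1/D2 order are hypothetical names, not terms from the specification.

```python
from dataclasses import dataclass
from typing import Tuple

# X may be a linker peptide, an epitope tag or an NLS (labels are assumptions).
ALLOWED_X = {"linker", "epitope_tag", "NLS"}


@dataclass(frozen=True)
class FusionProtein:
    d1: str                      # positioning element, e.g. "dCas9"
    d2: str                      # demethylation element, e.g. "ROS1"
    x: Tuple[str, ...] = ()      # the (X)n elements; n = len(x)
    d2_first: bool = False       # D1 and D2 positions are interchangeable

    def __post_init__(self) -> None:
        if len(self.x) > 6:
            raise ValueError("n must be an integer from 0 to 6")
        unknown = [e for e in self.x if e not in ALLOWED_X]
        if unknown:
            raise ValueError(f"unsupported X element(s): {unknown}")

    def layout(self) -> str:
        """N-terminus to C-terminus order, with '-' standing for a peptide bond."""
        first, last = (self.d2, self.d1) if self.d2_first else (self.d1, self.d2)
        return "-".join((first, *self.x, last))


def is_valid_combination(first: FusionProtein, second: FusionProtein) -> bool:
    """A fusion protein combination requires the two D2 elements to differ."""
    return first.d2 != second.d2


print(FusionProtein("dCas9", "ROS1", ("NLS",)).layout())
print(FusionProtein("dCas9", "ROS1").layout())  # n = 0: direct peptide bond
```

With n = 0 the layout reduces to the two elements joined directly, which mirrors the statement that D1 and D2 are then connected by a peptide bond.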
In another preferred embodiment, n is 1.
In another preferred embodiment, X is a nuclear localization signal.
In another preferred embodiment, D2 of the first fusion protein is ROS1 or a functional domain thereof, and D2 of the second fusion protein is TET1 or a functional domain thereof.
In the third aspect of the present invention, a nucleic acid is provided, which encodes the fusion protein according to the first aspect of the present invention.
In another preferred embodiment, the sequence of the nucleic acid comprises the following elements:
(1) Z1, a nucleotide sequence encoding the positioning functional element D1 of the fusion protein; and
(2) Z2, a nucleotide sequence encoding the demethylation functional element D2 of the fusion protein.
In another preferred embodiment, the Z1 element comprises, or consists of, a sequence selected from the following:
(i) the sequence shown in SEQ ID NO: 2;
(ii) a sequence having one or more base substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 2;
(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 2;
(iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or
(v) the reverse complement of the sequence described in any one of (i) to (iii).
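Item (v) refers to the reverse complement of a sequence. A minimal sketch of that operation on a DNA string follows; the function name and the restriction to unambiguous A/C/G/T bases are assumptions of this example, and the input shown is a toy string rather than any sequence from the listing.

```python
# Watson-Crick pairing for unambiguous DNA bases.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, read 5' to 3'."""
    dna = dna.upper()
    try:
        # Walk the strand backwards and swap each base for its partner.
        return "".join(_COMPLEMENT[base] for base in reversed(dna))
    except KeyError as exc:
        raise ValueError(f"unsupported base: {exc.args[0]!r}") from None


print(reverse_complement("ATGC"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check.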
In another preferred embodiment, the Z2 element comprises, or consists of, a sequence selected from the following:
(i) the sequence shown in SEQ ID NO: 4;
(ii) a sequence having one or more base substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 4;
(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 4;
(iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or
(v) the reverse complement of the sequence described in any one of (i) to (iii).
In another preferred embodiment, the nucleic acid comprises a sequence encoding a nuclear localization signal, and the sequence encoding the nuclear localization signal comprises, or consists of, a sequence selected from the following:
(i) the sequence shown in either SEQ ID NO: 6 or SEQ ID NO: 8;
(ii) a sequence having one or more base substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions) compared with the sequence shown in either SEQ ID NO: 6 or SEQ ID NO: 8;
(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in either SEQ ID NO: 6 or SEQ ID NO: 8;
(iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or
(v) the reverse complement of the sequence described in any one of (i) to (iii).
In another preferred embodiment, the nucleic acid comprises, or consists of, a sequence selected from the following:
(i) the sequence shown in SEQ ID NO: 10;
(ii) a sequence having one or more base substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions) compared with the sequence shown in SEQ ID NO: 10;
(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 10;
(iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or
(v) the reverse complement of the sequence described in any one of (i) to (iii).
In the fourth aspect of the present invention, a nucleic acid construct is provided, comprising a first nucleic acid sequence and one or more second nucleic acid sequences, wherein the first nucleic acid sequence encodes the fusion protein according to the first aspect of the present invention or the fusion protein combination according to the second aspect of the present invention, and the second nucleic acid sequence is a gRNA coding sequence.
In another preferred embodiment, the 5' end and/or the 3' end of the first nucleic acid sequence comprises one or more nuclear localization signals.
In another preferred embodiment, one end of the first nucleic acid sequence contains a promoter and, optionally, the other end contains a terminator; the promoter is an RNA polymerase II-dependent promoter selected from UBI, UBQ, 35S, Actin, SPL, CmYLCV, YAO, CDC45, rbcS, rbcL, PsGNS2, UEP1, TobRB7, Cab, or a combination thereof.
In another preferred embodiment, the nucleic acid construct contains 1 to 6 gRNA coding sequences.
In another preferred embodiment, the gRNA coding sequences are arranged in tandem at the 5' end or the 3' end of the first nucleic acid sequence.
In another preferred embodiment, when two or more second nucleic acid sequences are present, the second nucleic acid sequences are distributed at both ends of the first nucleic acid sequence.
In another preferred embodiment, the 5' end of each gRNA coding sequence in the second nucleic acid sequence contains an RNA polymerase III-dependent promoter selected from: U6, U3, U6a, U6b, U6c, U6-1, U3b, U3d, U6-26, U6-29, 7SL or 5H1.
In the fifth aspect of the present invention, a vector is provided, which contains the nucleic acid according to the third aspect of the present invention or the nucleic acid construct according to the fourth aspect of the present invention.
In the sixth aspect of the present invention, a complex is provided, comprising:
(1) a protein component, comprising the fusion protein according to the first aspect of the present invention or the fusion protein combination according to the second aspect of the present invention; and
(2) a nucleic acid component, which is one or more gRNA sequences;
wherein the protein component and the nucleic acid component bind to each other to form the complex.
In the seventh aspect of the present invention, a polynucleotide combination is provided, which encodes the fusion protein combination according to the second aspect of the present invention.
In another preferred embodiment, the polynucleotide combination comprises a first polynucleotide and a second polynucleotide, wherein the first polynucleotide and the second polynucleotide each encode a fusion protein according to the first aspect of the present invention, and the D2 elements of the two fusion proteins are different.
In another preferred embodiment, the first polynucleotide and the second polynucleotide each further comprise one or more gRNA coding sequences.
In another preferred embodiment, the first polynucleotide and the second polynucleotide are located in the same vector or in different vectors.
In another preferred embodiment, the first polynucleotide and the second polynucleotide are located in different vectors.
In another preferred embodiment, the vector containing the first polynucleotide and the vector containing the second polynucleotide are used to transform cells simultaneously or sequentially.
In the eighth aspect of the present invention, a host cell is provided, wherein the host cell contains the fusion protein according to the first aspect, the fusion protein combination according to the second aspect, the vector according to the fifth aspect, or the complex according to the sixth aspect of the present invention; or the genome of the host cell has integrated therein the polynucleotide according to the third aspect, the nucleic acid construct according to the fourth aspect, or the polynucleotide combination according to the seventh aspect of the present invention.
In another preferred embodiment, the host cell is a eukaryotic cell or a prokaryotic cell.
In another preferred embodiment, the host cell is a plant cell.
In another preferred embodiment, the plant is a monocotyledonous plant or a dicotyledonous plant.
In the ninth aspect of the present invention, a method for preparing the fusion protein according to the first aspect of the present invention is provided, comprising the following steps:
(1) culturing the host cell according to the eighth aspect of the present invention under conditions suitable for expression; and
(2) isolating and extracting the fusion protein.
In another preferred embodiment, the host cell contains the vector according to the fifth aspect of the present invention, or has the polynucleotide according to the third aspect of the present invention integrated into its genome.
In the tenth aspect of the present invention, there is provided use of the fusion protein according to the first aspect, the fusion protein combination according to the second aspect, the nucleic acid according to the third aspect, the nucleic acid construct according to the fourth aspect, the vector according to the fifth aspect, the complex according to the sixth aspect, or the polynucleotide combination according to the seventh aspect of the present invention in the demethylation modification of a target nucleic acid.
In another preferred embodiment, the demethylation is the conversion of methylated cytosine to unmethylated cytosine.
In another preferred embodiment, the target nucleic acid is derived from a eukaryote or a prokaryote.
In another preferred embodiment, the target nucleic acid is derived from a plant cell or an animal cell.
In another preferred embodiment, the target nucleic acid is derived from the nucleus, cytoplasm, chloroplast or mitochondrion.
In another preferred embodiment, the target nucleic acid is DNA, RNA or a combination thereof.
In the eleventh aspect of the present invention, there is provided use of the fusion protein according to the first aspect, the fusion protein combination according to the second aspect, the nucleic acid according to the third aspect, the nucleic acid construct according to the fourth aspect, the vector according to the fifth aspect, the complex according to the sixth aspect, or the polynucleotide combination according to the seventh aspect of the present invention in the preparation of a kit for the demethylation modification of a target nucleic acid.
In the twelfth aspect of the present invention, a kit is provided, comprising one or more selected from the group consisting of: the fusion protein according to the first aspect, the fusion protein combination according to the second aspect, the nucleic acid according to the third aspect, the nucleic acid construct according to the fourth aspect, the vector according to the fifth aspect, the complex according to the sixth aspect, the polynucleotide combination according to the seventh aspect, and the host cell according to the eighth aspect of the present invention.
In the thirteenth aspect of the present invention, a method is provided for reducing DNA methylation of a target gene, or of its promoter or enhancer, in a cell, the method comprising expressing in the cell the fusion protein according to the first aspect of the present invention and one or more gRNAs associated with the target gene.
In the fourteenth aspect of the present invention, a method for regulating the expression of a target gene is provided, comprising the following steps: expressing the fusion protein according to the first aspect of the present invention and allowing it to bind to the target gene or to an expression control element of the target gene, so that the DNA at that site is demethylated.
In another preferred embodiment, the regulation includes activation, enhancement, inhibition, reduction or inactivation.
In another preferred embodiment, the present invention provides a method for activating or enhancing gene expression, comprising the following steps: expressing the fusion protein according to the first aspect of the present invention and allowing it to bind to an expression control element of the target gene, so that the DNA at that site is demethylated.
In another preferred embodiment, the expression control element includes a promoter, an enhancer, a terminator, a transposon or a silencer.
In the fifteenth aspect of the present invention, a method for regulating plant traits is provided, characterized by comprising the following steps:
(i) providing a plant cell;
(ii) introducing into the plant cell a nucleic acid sequence that expresses the fusion protein according to the first aspect of the present invention and a gRNA associated with the gene to be regulated, and integrating it into the genome;
(iii) culturing the cell into a seedling; and
(iv) screening for plants having the target trait.
In another preferred embodiment, the method of introduction into the cell includes Agrobacterium infection, gene gun transformation, microinjection, electroporation, sonication and polyethylene glycol (PEG)-mediated methods.
In another preferred embodiment, the trait is a plant epigenetic trait.
It should be understood that, within the scope of the present invention, the technical features described above and the technical features specifically described below (e.g., in the examples) may be combined with one another to form new or preferred technical solutions. For reasons of space, these combinations are not enumerated here.
Description of the drawings
Figure 1 shows the decrease in the methylation level of the targeted region MEMS in transgenic T1 plants in Example 1.
Figure 2 shows the expression level of ROS1 in transgenic T1 plants in Example 1.
Figure 3 shows the heritable stability of demethylation at the MEMS site in transgenic T2 plants in Example 1.
Figure 4 shows the demethylation results for different regions in Example 1.
Figure 5 shows the genetic stability of transgenic T2 plants in Example 2.
Figure 6 shows the structural composition of the demethylation editing tool.
Detailed description of the embodiments
After extensive and in-depth research and a large amount of screening, the present inventors have for the first time developed an efficient, site-specific method for removing DNA methylation. Specifically, the inventors fused dCas9, or a functional domain thereof, which has the function of targeting and binding DNA, with the demethylase ROS1, or a functional domain thereof, to obtain a fusion protein; in addition, multiple gRNA sequences corresponding to the target nucleic acid sequence were introduced to achieve precise positioning, so that the target nucleic acid region is demethylated. Experiments show that the demethylation method of the present invention achieves precise and efficient demethylation in plants and is of significant scientific value for studying plant epigenetics and for regulating plant traits through demethylation. The present invention has been completed on this basis.
Terms
As used herein, the term "fusion protein" refers to the fusion protein according to the first aspect of the present invention, which has the function of targeting and binding DNA and converting target methylated nucleotides into unmethylated nucleotides.
As used herein, the term "fusion protein combination" refers to a combination of multiple fusion proteins of the present invention, in which each fusion protein has a different demethylase catalytic domain. Preferably, the different demethylase catalytic domains differ in their demethylation effect at different target nucleic acid sites and thereby complement one another.
As used herein, the term "Cas protein" refers to a CRISPR-associated nuclease. A preferred Cas protein is the Cas9 protein. Typical Cas9 proteins include, but are not limited to, Cas9 derived from Staphylococcus aureus. In the present invention, the Cas9 protein may also be replaced by a Cas protein from another CRISPR system, such as a Cpf1 nuclease, the source of which is selected from the group consisting of Acidaminococcus, Lachnospiraceae, Acidaminococcus mutants and Lachnospiraceae mutants. The "d" in "dCas9, dCpf1, dCas12, dCas13, dCms1, dMAD7" stands for "dead" and denotes a Cas protein that has lost its cleavage activity: it cannot cut single-stranded or double-stranded DNA, but can still form a complex with a gRNA and target and bind a DNA sequence.
As used herein, the term "epitope tag" refers to a tag that can be fused, by molecular genetic means, to the N-terminus or C-terminus of a protein of interest; it generally does not affect the biological activity of the protein of interest and facilitates its detection.
As used herein, the term "linker peptide" refers to a short peptide chain of multiple amino acids that connects the D1 element and the D2 element to form the fusion protein; the linker peptide does not affect the function of the fusion protein. The length of the linker peptide is generally 1-100 aa, preferably 15-85 aa, more preferably 25-70 aa, and more preferably 24-32 aa. For example, XTEN is a commonly used linker peptide.
As used herein, the term "gRNA", also called guide RNA, has the meaning commonly understood by those skilled in the art. In general, a guide RNA may comprise a direct repeat sequence and a guide sequence, or may consist essentially of, or consist of, a direct repeat sequence and a guide sequence (also called a spacer in the context of an endogenous CRISPR system). Depending on the Cas protein on which it relies, the gRNA of a given CRISPR system may comprise both a crRNA and a tracrRNA, or only a crRNA. The crRNA and tracrRNA may be artificially fused to form a single guide RNA (sgRNA). The gRNA of the present invention may be natural, or may be artificially modified or designed and synthesized. In some cases, the guide sequence is any polynucleotide sequence that is sufficiently complementary to the target sequence to hybridize with it and to direct sequence-specific binding of the CRISPR/Cas complex to the target sequence; it is usually 17-23 nt long. In certain embodiments, when optimally aligned, the degree of complementarity between the guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the optimal alignment is within the ability of a person of ordinary skill in the art. For example, public and commercially available alignment algorithms and programs exist, such as, but not limited to, ClustalW, the Smith-Waterman algorithm in matlab, Bowtie, Geneious, Biopython and SeqMan.
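To make the complementarity percentages concrete, the following sketch scores a guide sequence against a candidate site of the same length. It rests on several assumptions not stated in the specification: no gaps are allowed, the DNA site is given as the protospacer (the strand with the same sense as the guide, so that a match at a position corresponds to a base pair with the opposite, target strand), and the function names are hypothetical. The sequences are toy strings.

```python
def guide_match_percent(guide_rna: str, protospacer_dna: str) -> float:
    """Percent of guide positions that would pair with the target strand.

    Assumption: the protospacer is the DNA strand with the same sense as the
    guide, so identity (with U read as T) stands in for complementarity.
    """
    guide = guide_rna.upper().replace("U", "T")
    proto = protospacer_dna.upper()
    if not guide or len(guide) != len(proto):
        raise ValueError("guide and protospacer must be non-empty and equal in length")
    matches = sum(1 for g, p in zip(guide, proto) if g == p)
    return 100.0 * matches / len(guide)


def has_typical_guide_length(guide_rna: str) -> bool:
    """The text gives 17-23 nt as the usual guide sequence length."""
    return 17 <= len(guide_rna) <= 23


guide = "GACUGACUGACUGACUGACU"          # 20 nt toy guide
print(has_typical_guide_length(guide))
print(guide_match_percent(guide, "GACTGACTGACTGACTGACT"))  # perfect match
print(guide_match_percent(guide, "GACTGACTGACTGACTGACA"))  # one mismatch
```

A gapped or local alignment, as produced by the tools listed above, would be needed when the guide and the site differ in length.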
As used herein, the term "functional domain" refers to a region of a protein or enzyme that has a specific structure and independently performs a biological function. It may be a part of the protein structure, or may be composed of one or more protein domains that are operably linked. A domain is a subunit formed from a combination of different secondary and supersecondary structures that carries out part or all of the physiological function of the protein. Common domains contain between 100 and 400 amino acid residues; the smallest domains have only 40 to 50 amino acid residues, and large domains can exceed 400 amino acid residues.
As used herein, the term "epigenetic" refers to heritable changes in gene function that occur without any change in the DNA sequence of the gene and that ultimately lead to a change in phenotype. The epigenetic mechanisms identified to date include DNA modification (such as DNA methylation), covalent modification of proteins, paramutation, regulation by non-coding RNA, chromatin remodeling and genomic imprinting. As used herein, "epigenetic trait" refers to an observable plant trait or characteristic that is controlled by, or whose regulation involves, epigenetic mechanisms in the plant.
Demethylase
The demethylation described in the present invention mainly refers to the removal of the 5-methylcytosine (5mC) modification, a reversible epigenetic modification that plays an important role in plant growth and development. Studies have shown that demethylation is closely associated with processes such as imprinted gene expression, fruit development, responses to biotic and abiotic stress, nodule development and nodule nitrogen fixation. Common demethylases in plants include, but are not limited to, ROS1, TET1, DME and DML.
ROS1 is a bifunctional DNA glycosylase that directly excises methylated cytosine to generate an abasic site, which then triggers base excision repair to introduce an unmodified cytosine.
TET is a dioxygenase that oxidizes methylated cytosine to 5-hydroxymethylcytosine and then further to 5-formylcytosine and 5-carboxylcytosine; the DNA glycosylase TDG then excises the 5-formylcytosine or 5-carboxylcytosine to generate an abasic site, which triggers base excision repair to reintroduce an unmodified cytosine.
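The two routes just described can be summarized as ordered lists of intermediates. This is only a bookkeeping sketch of the text above: the dictionary name, the abbreviations used as labels, and the helper function are choices made for this example, and the TET route lists 5caC although excision may already occur at 5fC.

```python
# Ordered intermediates from methylated to unmodified cytosine, as described
# in the text. Labels: 5mC = 5-methylcytosine, 5hmC = 5-hydroxymethylcytosine,
# 5fC = 5-formylcytosine, 5caC = 5-carboxylcytosine.
DEMETHYLATION_ROUTES = {
    # ROS1 excises 5mC directly, leaving an abasic site that is then repaired.
    "ROS1": ["5mC", "abasic site", "C"],
    # TET oxidizes 5mC stepwise; a glycosylase (TDG) then excises 5fC or 5caC.
    "TET": ["5mC", "5hmC", "5fC", "5caC", "abasic site", "C"],
}


def intermediates(enzyme: str) -> list:
    """Return the states between the 5mC substrate and the restored cytosine."""
    route = DEMETHYLATION_ROUTES[enzyme]
    return route[1:-1]


for name in DEMETHYLATION_ROUTES:
    print(name, "->", " -> ".join(DEMETHYLATION_ROUTES[name]))
```

The difference in route length reflects the point made above: ROS1 acts on 5mC directly, whereas the TET route requires oxidation before excision.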
Fusion protein of the present invention and its coding sequence
The present invention provides a fusion protein that has the function of targeting and binding DNA and converting target methylated nucleotides into unmethylated nucleotides.
The D1 element has no catalytic activity and is selected from the group consisting of a Cas protein, a zinc finger protein, a TALEN protein, a functional domain of any of these, or a combination thereof. For example, the D1 element is selected from the group consisting of dCas9, dCpf1, dCas12, dCas13, dCms1, dMAD7, or a combination thereof. Preferably, the D1 element is dCas9.
In a preferred embodiment, the D1 element is a functional domain of the dCas9 protein and comprises, or consists of, the amino acid sequence shown in SEQ ID NO: 1; the corresponding coding nucleotide sequence is shown in SEQ ID NO: 2.
Preferably, the D2 element has the function of converting methylated cytosine into unmethylated cytosine. For example, the D2 element is a demethylase, or a demethylation functional domain thereof, selected from the group consisting of ROS1, TET, DME, DML, or a combination thereof; preferably, the D2 element is ROS1 or a functional domain thereof.
In a preferred embodiment, the D2 element is a functional domain of the ROS1 protein and comprises, or consists of, the amino acid sequence shown in SEQ ID NO: 3; the corresponding coding nucleotide sequence is shown in SEQ ID NO: 4.
In another preferred embodiment, the D1 element and the D2 element are connected by one or more of the following: a peptide bond, a linker peptide, a nuclear localization signal, an epitope tag, or a combination thereof. Preferably, the nuclear localization signal comprises, or consists of, the amino acid sequence shown in SEQ ID NO: 5 or SEQ ID NO: 7; the corresponding coding nucleotide sequences are shown in SEQ ID NO: 6 and SEQ ID NO: 8, respectively.
In a particularly preferred embodiment, the fusion protein comprises or consists of a sequence selected from the following:
(1) the amino acid sequence shown in SEQ ID NO: 9;
(2) a sequence having one or more amino acid substitutions, deletions, or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid substitutions, deletions, or additions) compared with the sequence shown in SEQ ID NO: 9; or
(3) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in SEQ ID NO: 9.
In another preferred embodiment, the N-terminus or C-terminus of the fusion protein further comprises one or more of the following elements: an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS), a chloroplast signal peptide, a transcriptional activation domain (for example, VP64), a transcriptional repression domain (for example, a KRAB domain or a SID domain), a nuclease domain (for example, Fok1), or a combination thereof.
The present invention also includes fragments and analogs that have the function of the fusion protein of the present invention. As used herein, the terms "fragment" and "analog" refer to polypeptides that substantially retain the same biological function or activity as the fusion protein of the present invention.
A fragment, derivative, or analog of the fusion protein of the present invention may be: (i) a polypeptide in which one or more conservative or non-conservative amino acid residues (preferably conservative amino acid residues) are substituted, where the substituted amino acid residues may or may not be encoded by the genetic code; or (ii) a polypeptide having a substituent group in one or more amino acid residues; or (iii) a polypeptide formed by fusing the mature polypeptide to another compound (such as a compound that extends the half-life of the polypeptide, for example polyethylene glycol); or (iv) a polypeptide formed by fusing an additional amino acid sequence to the polypeptide sequence (such as a leader sequence, a secretory sequence, a sequence used to purify the polypeptide, a proprotein sequence, or a fusion protein). In accordance with the definitions herein, these fragments, derivatives, and analogs are within the scope known to those skilled in the art.
In the present invention, the fusion protein variant is a sequence derived from the amino acid sequence shown in SEQ ID NO: 9 by substitution, deletion, or addition of several (usually 1-60, preferably 1-30, more preferably 1-20, most preferably 1-10) amino acids, as well as by addition of one or several (usually up to 20, preferably up to 10, more preferably up to 5) amino acids at the C-terminus and/or N-terminus. For example, substituting an amino acid with one of similar properties usually does not change the function of the protein, and adding one or several amino acids at the C-terminus and/or N-terminus usually does not change the function of the protein either. These conservative variants are preferably generated by substitution according to Table A.
Table A
Figure PCTCN2021077328-appb-000001
Figure PCTCN2021077328-appb-000002
The present invention also includes analogs of the claimed fusion protein. These analogs may differ from the sequence SEQ ID NO: 9 of the present invention in amino acid sequence, in modifications that do not affect the sequence, or in both. Analogs of these proteins include natural or induced genetic variants. Induced variants can be obtained by various techniques, such as random mutagenesis by radiation or exposure to mutagens, site-directed mutagenesis, or other known molecular biology techniques. Analogs also include those having residues other than natural L-amino acids (such as D-amino acids) and those having non-naturally occurring or synthetic amino acids (such as β- and γ-amino acids). It should be understood that the protein of the present invention is not limited to the representative proteins exemplified above.
Modified forms (which usually do not change the primary structure) include chemically derivatized forms of the protein produced in vivo or in vitro, where the modification can maintain, enhance, or partially inhibit the transport function of the protein. The modifications include chemical modification of amino acid side chains and of the terminal groups of the peptide chain, such as chemical modification of sulfhydryl groups, amino groups, carboxyl groups, and disulfide bonds, and other modifications. The chemical modifications include phosphorylation (such as phosphotyrosine, phosphoserine, and phosphothreonine), glycosylation (mediated by glycosylation enzymes, such as N-glycosylation and O-glycosylation), lipid acylation (such as acetylation and palmitoylation), and the like.
The present invention also relates to a method for producing the fusion protein or a fragment, derivative, or analog thereof. The method comprises (a) culturing the above-mentioned host cell under conditions conducive to production of the fusion protein or the fragment, derivative, or analog thereof; and (b) isolating the fusion protein or the fragment, derivative, or analog thereof.
In the production method of the present invention, the cells are cultured in a nutrient medium suitable for production of the fusion protein by methods well known in the art. If the polypeptide is secreted into the nutrient medium, it can be recovered directly from the medium. If the polypeptide is not secreted into the medium, it can be recovered from cell lysates.
The polypeptide can be detected by methods known in the art that are specific for the polypeptide. These detection methods may include the use of specific antibodies, the formation of an enzyme product, or the disappearance of an enzyme substrate.
The polypeptide produced can be recovered by methods known in the art. For example, the cells can be harvested by centrifugation and disrupted by physical or chemical means, and the resulting crude extract retained for further purification. Transformed host cells expressing the fusion protein of the present invention, or a fragment, derivative, or analog thereof, can be lysed by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or the use of cell lysing agents. These methods are well known to those skilled in the art. The fusion protein of the present invention, or a fragment, derivative, or analog thereof, can be recovered and purified from cultures of the transformed host cells by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxyapatite chromatography, and phytohemagglutinin chromatography.
In a particularly preferred embodiment, the nucleic acid encoding the fusion protein of the present invention encodes the amino acid sequence shown in SEQ ID NO: 9, and preferably has the nucleotide sequence shown in SEQ ID NO: 10.
The present invention also includes nucleic acids having at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence homology to the preferred nucleic acid sequence of the present invention (SEQ ID NO: 10).
"Homology" or "identity" refers to the degree of sequence matching between two polypeptides or between two nucleic acids. When a position in both of the compared sequences is occupied by the same base or amino acid monomer subunit (for example, a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. The "percent identity" between two sequences is a function of the number of matching positions shared by the two sequences divided by the number of positions compared, × 100. For example, if 6 of 10 positions in two sequences match, the two sequences have 60% identity. Generally, the comparison is made when the two sequences are aligned to give maximum identity. Such an alignment can be achieved using, for example, the method of Needleman et al. (1970) J. Mol. Biol. 48:443-453, conveniently performed by a computer program such as the Align program (DNAstar, Inc.). The algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4:11-17 (1988)), which has been incorporated into the ALIGN program (version 2.0), can also be used to determine the percent identity between two amino acid sequences, using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. In addition, the Needleman and Wunsch (J. Mol. Biol. 48:444-453 (1970)) algorithm, which has been incorporated into the GAP program of the GCG software package (available at www.gcg.com), can be used to determine the percent identity between two amino acid sequences, using a Blossum 62 matrix or a PAM250 matrix, a gap weight of 16, 14, 12, 10, 8, 6, or 4, and a length weight of 1, 2, 3, 4, 5, or 6. Herein, variants of the gene can be obtained by inserting or deleting regulatory regions, by random or site-directed mutagenesis, and the like.
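The percent identity definition above can be illustrated with a minimal Python sketch. It is not the ALIGN or GAP program: the scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen for brevity, no PAM or Blossum matrix is used, and the sequences are invented. Gap columns are counted as compared positions, which is one of several conventions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a simple linear gap penalty (illustrative scoring)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Matching positions divided by positions compared, times 100."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# The worked example from the text: 6 of 10 positions match.
print(percent_identity("ACGTACGTAC", "ACGTTCGAGG"))  # 60.0
```

For unaligned input, `needleman_wunsch` would be called first and its two gapped strings passed to `percent_identity`.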
In the present invention, the nucleotide sequence of SEQ ID NO: 10 may undergo substitution, deletion, or addition of one or more nucleotides to generate a derived sequence of SEQ ID NO: 10. Owing to codon degeneracy, even a sequence with low homology to SEQ ID NO: 10 can still essentially encode the amino acid sequence shown in SEQ ID NO: 9. In addition, the phrase "a derived sequence obtained from the nucleotide sequence of SEQ ID NO: 10 by substitution, deletion, or addition of at least one nucleotide" also covers nucleotide sequences that hybridize to the nucleotide sequence shown in SEQ ID NO: 10 under moderately stringent conditions, more preferably under highly stringent conditions. These variant forms include (but are not limited to): deletion, insertion, and/or substitution of several (usually 1-90, preferably 1-60, more preferably 1-20, most preferably 1-10) nucleotides, and addition of several (usually up to 60, preferably up to 30, more preferably up to 10, most preferably up to 5) nucleotides at the 5' and/or 3' end.
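The point about codon degeneracy can be made concrete with a short Python sketch. The two DNA sequences below are invented for illustration and are not taken from SEQ ID NO: 10; the translation table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a, b):
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical coding sequences that use different synonymous codons.
seq_a = "ATGCTGTCTCGT"
seq_b = "ATGTTAAGCAGA"

print(translate(seq_a), translate(seq_b))          # same peptide from both
print(round(nucleotide_identity(seq_a, seq_b), 1)) # well below 100%
```

Both sequences translate to the same four-residue peptide although fewer than half of their nucleotide positions agree, which is why a low-homology nucleic acid can still encode the same protein.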
The polynucleotide or nucleic acid sequence of the present invention may be in the form of DNA or RNA. DNA forms include DNA, genomic DNA, and synthetic DNA. The DNA may be single-stranded or double-stranded, and may be the coding strand or the non-coding strand.
The term "polynucleotide encoding the fusion protein of the present invention" may refer to a polynucleotide that consists of the sequence encoding the fusion protein, or to a polynucleotide that also includes additional coding and/or non-coding sequences. The present invention also relates to variants of the above polynucleotides, which encode polypeptides having the same amino acid sequence as that of the present invention, or fragments, analogs, and derivatives of such polypeptides. A variant of the polynucleotide may be a naturally occurring allelic variant or a non-naturally occurring variant. These nucleotide variants include substitution variants, deletion variants, and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide; it may involve substitution, deletion, or insertion of one or more nucleotides, but does not substantially change the function of the encoded polypeptide.
The present invention also relates to polynucleotides that hybridize to the above sequences and have at least 50%, preferably at least 70%, more preferably at least 80% identity to them. The present invention particularly relates to polynucleotides that hybridize to the polynucleotides of the present invention under stringent conditions. In the present invention, "stringent conditions" means: (1) hybridization and washing at low ionic strength and high temperature, such as 0.2×SSC, 0.1% SDS, 60°C; or (2) hybridization in the presence of a denaturing agent, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42°C; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, preferably at least 95%.
The full-length nucleic acid sequence of the present invention, or a fragment thereof, can usually be obtained by PCR amplification, recombination, or artificial synthesis. For PCR amplification, primers can be designed according to the nucleotide sequences disclosed in the present invention, especially the open reading frame sequence, and the relevant sequence amplified using a commercially available DNA library, or a cDNA library prepared by conventional methods known to those skilled in the art, as the template. When the sequence is long, two or more rounds of PCR amplification are often needed, after which the amplified fragments are spliced together in the correct order. Once the relevant sequence has been obtained, it can be produced in large quantities by recombination. This is usually done by cloning it into a vector, transferring the vector into cells, and then isolating the relevant sequence from the propagated host cells by conventional methods.
In addition, the relevant sequence can be produced by artificial synthesis, especially when the fragment is short. Usually, a long fragment is obtained by first synthesizing several small fragments and then ligating them. At present, a DNA sequence encoding the protein of the present invention (or a fragment or derivative thereof) can be obtained entirely by chemical synthesis. The DNA sequence can then be introduced into the various existing DNA molecules (for example, vectors) and cells known in the art. Mutations can also be introduced into the protein sequence of the present invention by chemical synthesis.
The main advantages of the present invention include:
1) The present invention provides a fusion protein, and its coding sequence, for efficient and site-specific removal of DNA methylation, which is of great significance for studying the function of DNA methylation.
2) The present invention provides, for the first time, the use of a demethylating fusion protein in plants and finds that it achieves precise and efficient demethylation in plants, which is of significant scientific value for studying plant epigenetics and for regulating plant traits through demethylation.
The present invention is further described below with reference to specific examples. It should be understood that these examples are intended only to illustrate the present invention and not to limit its scope. Experimental methods in the following examples for which no specific conditions are given generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are percentages by weight and parts by weight.
Example 1: Construction of demethylation gene-editing tool vectors
1.1 dCas9-TET1cd demethylation tool
(1) Using dCas9 and TET1 sequences already available in the laboratory, the dCas9 and TET1cd sequence fragments were amplified with Q5 high-fidelity polymerase. The fragments were recovered from the gel for later use.
(2) The p1300-UBQ-CAS9 vector was digested with Nco I and BamH I, and the p1300-UBQ fragment was recovered by gel extraction for later use.
(3) The dCas9 fragment was recombined into the p1300-UBQ fragment with a recombinase to give p1300-UBQ-dCas9 for later use. Successful recombination was confirmed by Sanger sequencing.
(4) The p1300-UBQ-dCas9 vector obtained above was then digested with BamH I.
(5) The TET1cd fragment was recombined into the p1300-UBQ-dCas9 fragment with a recombinase to give the p1300-UBQ-dCas9-TET1cd vector, which is the final vector for targeted editing of DNA methylation. Successful recombination was confirmed by Sanger sequencing.
1.2 dCas9-ROS1cd demethylation tool
(1) The ROS1cd sequence was amplified from Arabidopsis cDNA.
(2) The ROS1cd fragment was recombined into the p1300-UBQ-dCas9 fragment with a recombinase to give the p1300-UBQ-dCas9-ROS1cd vector, which is the final vector for targeted editing of DNA methylation. Successful recombination was confirmed by Sanger sequencing.
Example 2: Regulation of ROS1 expression by demethylation of the MEMS in the ROS1 promoter region
2.1 Target design and construction
(1) Five sgRNAs targeting the MEMS region were designed according to sgRNA design rules; their sequences are shown in Table 1. In addition to the 20 bp that target the MEMS region, each sgRNA oligonucleotide carries a sticky end for the ligation reaction.
Table 1 sgRNA sequences targeting the MEMS region
Figure PCTCN2021077328-appb-000003
(2) The F and R sequences of each sgRNA were converted into a double-stranded DNA fragment with sticky ends using an annealing program. The forward and reverse primers were diluted to 100 μM, and 1 μL of each was mixed with 1 μL of T4 DNA ligase buffer, 0.5 μL of T4 polynucleotide kinase, and 6.5 μL of ddH2O. The mixture was incubated at 37°C for 30 min and 95°C for 5 min, then cooled to 25°C at 0.2°C/s, and finally diluted 250-fold with water.
(3) The U6, U3b, and 7SL vectors were digested with Bbs I, and the vector fragments were recovered for later use.
(4) 1 μL of double-stranded sgRNA fragment and 1 μL of digested vector were ligated with T4 ligase to insert the sgRNAs into the U6, U3b, and 7SL vectors. sgMEMS-1 and sgMEMS-4 were inserted into the U6 vector, sgMEMS-2 and sgMEMS-5 into the U3b vector, and sgMEMS-3 into the 7SL vector. Sequencing confirmed that each sgRNA had been inserted into the corresponding vector.
(5) The U6-sgMEMS-1, U3b-sgMEMS-2, and 7SL-sgMEMS-3 vectors obtained above were each amplified with the corresponding primers, and the fragments carrying the promoter and sgRNA were recovered from the gel for later use.
(6) The U6-sgMEMS-1, U3b-sgMEMS-2, and 7SL-sgMEMS-3 fragments obtained above were digested with Sbf I+Xho I, Xho I+Xba I, and Xba I+Xma I, respectively, then pooled and purified on a column for later use.
(7) The p1300-UBQ-dCas9-TET1cd and p1300-UBQ-dCas9-ROS1cd vectors were digested with Sbf I+Xma I and recovered by ethanol precipitation for later use.
(8) The pooled mixture of the three sgRNA fragments obtained above was ligated into the p1300-UBQ-dCas9-TET1cd and p1300-UBQ-dCas9-ROS1cd vectors with T4 ligase. Ligation at 16°C for 2 h gave the p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd vectors. Sequencing confirmed that the fragments had been inserted correctly.
(9) sgMEMS4 and sgMEMS5 were then inserted into the p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd vectors by a similar method. First, the U6-sgMEMS4 and U3b-sgMEMS5 fragments were amplified by PCR with the corresponding primers. Next, the U6-sgMEMS4 and U3b-sgMEMS5 fragments were digested with Kpn I+Xho I and Xho I+EcoR I, respectively, then pooled and purified on a column. The p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd vectors were then digested with Kpn I+EcoR I and recovered by ethanol purification. Finally, the pooled sgRNA fragments were ligated into the digested vectors with T4 ligase, giving the final vectors p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd-sgMEMS4_5 and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd-sgMEMS4_5.
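Two parts of the construction above lend themselves to a quick numerical check: the concentrations implied by the annealing recipe in step (2), and the fragment order fixed by the enzyme pairs in steps (6) to (9). The Python sketch below is illustrative only. It assumes that an end ligates only to an end cut by the same enzyme, and the function names are hypothetical.

```python
def annealing_mix(oligo_stock_um=100.0, dilution_factor=250):
    """Total volume and oligo concentrations for the step (2) recipe."""
    volumes_ul = {
        "forward oligo": 1.0,
        "reverse oligo": 1.0,
        "T4 DNA ligase buffer": 1.0,
        "T4 polynucleotide kinase": 0.5,
        "ddH2O": 6.5,
    }
    total_ul = sum(volumes_ul.values())
    # Concentration of each oligo in the reaction, then after dilution (in nM).
    in_mix_um = oligo_stock_um * volumes_ul["forward oligo"] / total_ul
    diluted_nm = in_mix_um / dilution_factor * 1000.0
    return total_ul, in_mix_um, diluted_nm


def ramp_seconds(start_c=95.0, end_c=25.0, rate_c_per_s=0.2):
    """Duration of the cooling ramp in the annealing program."""
    return (start_c - end_c) / rate_c_per_s


def chain_fragments(vector_ends, fragments):
    """Order fragments so that each junction joins ends cut by the same enzyme."""
    start, stop = vector_ends
    by_left = {left: (name, right) for name, (left, right) in fragments.items()}
    order, current = [], start
    while current != stop:
        if current not in by_left:
            raise ValueError(f"no fragment has a {current} end to continue the chain")
        name, current = by_left.pop(current)
        order.append(name)
    if by_left:
        raise ValueError(f"fragments left over: {sorted(n for n, _ in by_left.values())}")
    return order


first_round = {
    "U6-sgMEMS-1": ("Sbf I", "Xho I"),
    "U3b-sgMEMS-2": ("Xho I", "Xba I"),
    "7SL-sgMEMS-3": ("Xba I", "Xma I"),
}
second_round = {
    "U6-sgMEMS4": ("Kpn I", "Xho I"),
    "U3b-sgMEMS5": ("Xho I", "EcoR I"),
}

print(annealing_mix())
print(ramp_seconds())
print(chain_fragments(("Sbf I", "Xma I"), first_round))
print(chain_fragments(("Kpn I", "EcoR I"), second_round))
```

Under these assumptions the recipe gives a 10 μL reaction with each oligo at 10 μM, about 40 nM after the 250-fold dilution, and a cooling ramp of roughly 350 s. Each set of enzyme pairs admits only one fragment order between the vector ends, which is what allows the pooled fragments to be ligated in a single reaction.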
2.2 Genetic transformation
(1) The p1300-sgMEMS1_2_3-UBQ-dCas9-TET1cd-sgMEMS4_5 and p1300-sgMEMS1_2_3-UBQ-dCas9-ROS1cd-sgMEMS4_5 vectors were transformed directly into Agrobacterium GV3101.
a. The plasmid was added to competent Agrobacterium cells, which were then kept on ice for 5 min, in liquid nitrogen for 5 min, and in a 37°C water bath for 5 min.
b. The centrifuge tube was removed, an appropriate amount (500 μL) of antibiotic-free LB liquid medium was added, and the cells were cultured with shaking at 28°C for 2 h.
c. A small amount of the culture (50 μL) was spread on solid LB medium containing kanamycin, ampicillin, and rifampicin and incubated at 28°C for 2 d, after which colonies were visible.
(2) The Agrobacterium carrying the vector was transformed into Arabidopsis.
a. Three of the single colonies obtained above were picked into 3 mL of LB liquid medium containing the corresponding antibiotics and cultured with shaking at 28°C for 16 h.
b. 1 mL of this culture was transferred into 100 mL of LB liquid medium containing the corresponding antibiotics and cultured overnight to an OD value of 1.5–2.0.
c. The Agrobacterium cells were collected by centrifugation at 4000 × g for 10 min at room temperature and resuspended in 100 mL of freshly prepared 5% sucrose solution.
d. 20 μL of Silwet L-77 was added to the sucrose suspension.
e. The aerial parts of flowering Arabidopsis plants were dipped in the above solution for about 15 s. The plants were wrapped in plastic film, placed in a black tray, and kept in the greenhouse protected from light; after 16-24 h they were unwrapped and grown under normal conditions.
f. Once the siliques had turned yellow, the seeds were harvested as T1 seeds for later use.
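The composition of the dipping solution in steps c and d can be expressed as final concentrations. The short Python sketch below does that arithmetic. It treats the 5% sucrose as weight per volume, which is an assumption (the text gives only "5%"), and it ignores the small volume added by the surfactant.

```python
def percent_v_v(additive_ul, total_ml):
    """Volume percent of an additive given in microlitres in a volume given in millilitres."""
    return additive_ul * 100.0 / (total_ml * 1000.0)


def solute_grams(percent_w_v, total_ml):
    """Grams of solute for a weight/volume percentage (g per 100 mL)."""
    return percent_w_v * total_ml / 100.0


# Values from steps c and d: 100 mL of 5% sucrose plus 20 uL Silwet L-77.
print(solute_grams(5, 100))   # grams of sucrose, assuming w/v
print(percent_v_v(20, 100))   # Silwet L-77 as % v/v
```

Under these assumptions the solution contains 5 g of sucrose and about 0.02% (v/v) Silwet L-77.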
2.3 Screening of transgene-positive seedlings
(1) The T1 seeds were disinfected and sterilized with 5% sodium hypochlorite solution and washed 5 times with sterile water for later use.
(2) The seeds were resuspended in an appropriate amount of sterile water and poured onto 1/2 MS medium containing hygromycin so that they were evenly distributed over the medium. After the plates had dried, they were wrapped in foil and kept in a 4°C refrigerator for 7 d.
(3) The plates were placed in a constant-temperature incubator for 10-14 d. Positive seedlings were transplanted into soil and grown in the greenhouse.
2.4 Detection of DNA methylation level and editing efficiency
(1) When the positive seedlings had grown to a suitable size, leaves were taken for DNA extraction. DNA was extracted with a QIAGEN plant DNA extraction kit.
(2) Determination of the methylation level of the positive seedlings
a. DNA from the positive seedlings was treated with bisulfite, using the BisulFlash DNA Modification kit.
b. The treated DNA was amplified with primers designed specifically for methylation sequencing, and the DNA was recovered from the gel for later use.
c. The gel-recovered products for each positive seedling were pooled and submitted to a sequencing platform for sequencing.
d. The methylation data were analyzed and the methylation editing efficiency was calculated. A positive seedling was defined as successfully edited when the ratio of control DNA methylation to positive seedling methylation was greater than 1.5.
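The editing criterion in step d can be written as a small Python sketch. Bisulfite treatment converts unmethylated cytosine so that it is read as T while methylated cytosine is still read as C, so the methylation level at a site is commonly estimated as C reads divided by C plus T reads. The read counts below are invented, and the handling of a fully demethylated seedling (level 0) is an assumption, since the text does not address that case.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads that still show C after bisulfite conversion."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return c_reads / total


def is_edited(control_level, seedling_level, threshold=1.5):
    """Apply the rule: control methylation / seedling methylation > threshold."""
    if seedling_level == 0:
        # Assumption: complete loss of methylation counts as edited
        # whenever the control is methylated at all.
        return control_level > 0
    return control_level / seedling_level > threshold


def editing_efficiency(control_level, seedling_levels, threshold=1.5):
    """Fraction of positive seedlings that meet the editing criterion."""
    calls = [is_edited(control_level, s, threshold) for s in seedling_levels]
    return sum(calls) / len(calls)


# Hypothetical read counts (C reads, T reads) for one control and four seedlings.
control = methylation_level(80, 20)
seedlings = [methylation_level(c, t) for c, t in [(30, 70), (60, 40), (20, 80), (75, 25)]]

print([is_edited(control, s) for s in seedlings])
print(editing_efficiency(control, seedlings))
```

With these invented counts, two of the four seedlings exceed the 1.5-fold ratio, giving an editing efficiency of 0.5.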
2.5 Heritable stability of DNA demethylation
(1) One successfully edited positive seedling was selected, and its seeds were harvested as T2.
(2) The harvested T2 seeds were disinfected and sterilized, sown on normal 1/2 MS medium, kept at 4°C for 7 d and then in a constant-temperature incubator for 14 d, and transplanted into soil for cultivation in the greenhouse.
(3) Several plants were selected, their leaves were taken, and DNA was extracted by the CTAB method. Plants with and without the vector were identified by analysis with M13F and the sgRNA.
(4) One plant with the vector and one plant without the vector were selected, leaves were taken again, and DNA was extracted with a QIAGEN plant DNA extraction kit.
(5) The DNA methylation level was analyzed.
2.6 Gene expression analysis
(1) For the selected plants, take their leaves and extract RNA with QIAGEN's plant RNA extraction kit.
(2) Reverse-transcribe the RNA into cDNA with a reverse transcription kit from TransGen (全式金).
(3) Analyze gene expression levels with Takara's SYBR reagent.
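The application does not state how the SYBR qPCR data are converted into expression levels. One common approach is the 2^-ΔΔCt method, sketched below as an assumption rather than as the applicants' procedure; the reference gene, the Ct values, and the function name are all hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt method.

    Each argument is a threshold cycle (Ct) value. The reference gene is
    assumed to be expressed equally in sample and control.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the control
    delta_delta = delta_sample - delta_control           # ddCt
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene in a transgenic plant vs wild type.
    print(relative_expression(26.0, 18.0, 24.0, 18.0))
```

A value below 1 indicates lower expression of the target gene in the sample than in the control, and a value above 1 indicates higher expression.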
2.7 Experimental results
(1) dCas9-ROS1cd and dCas9-TET1cd reduce the methylation level of the targeted MEMS region in transgenic T1 plants
As shown in Figure 1, transgenic plants No. 13 and No. 14 of dCas9-ROS1cd and transgenic plants No. 5 and No. 14 of dCas9-TET1cd all showed significant demethylation compared with the wild type and the dCas9 positive-control plants.
(2) Expression level of ROS1
As shown in Figure 2, ROS1 expression in transgenic plants No. 13 and No. 14 of dCas9-ROS1cd and in transgenic plants No. 5 and No. 14 of dCas9-TET1cd was lower than in the wild type and the control group.
(3) Editing efficiency of dCas9-ROS1cd and dCas9-TET1cd at the MEMS site
Table 2
Figure PCTCN2021077328-appb-000004
(4) Heritable stability of demethylation at the MEMS site
As shown in Figure 3, among T2 plants, the dCas9-TET1cd lines that retained the transgene maintained the original low methylation level at the MEMS site, whereas dCas9-TET1cd T2 individuals without the transgene showed a reversal of methylation.
2.8 Experimental conclusion
Both dCas9-ROS1cd and dCas9-TET1cd can mediate demethylation of the MEMS site in the ROS1 promoter region in plants, and the demethylation editing efficiency of dCas9-ROS1cd is higher than that of dCas9-TET1cd. Demethylation of the MEMS site effectively reduces expression of the ROS1 gene, which indicates that DNA methylation and demethylation can effectively regulate gene expression.
Example 3: Demethylation experiment in an RdDM mutant (nrpd1)
3.1 Target design
The target design is the same as above: sgRNAs 1, 2 and 3 are placed upstream of the fusion protein, and sgRNAs 4, 5 and 6 are placed downstream of it. The sgRNA sequences are listed in Table 3; the composition of these sgRNA sequences is the same as that of the preceding sgRNAs.
Table 3 sgRNA sequences targeting multiple regions
Figure PCTCN2021077328-appb-000005
Figure PCTCN2021077328-appb-000006
3.2 Genetic transformation
See the genetic transformation procedure in step 2 of the experimental example.
3.3 Screening of positive seedlings
See the screening of positive seedlings in step 3 of the experimental example.
3.4 Detection of DNA methylation level and editing efficiency
(1) When the positive seedlings have grown to a suitable size, take their leaves and extract DNA by the CTAB method.
(2) Analyze the methylation level of the positive seedlings by Chop-PCR.
a. Digest 1 μg of DNA with an appropriate methylation-sensitive restriction endonuclease for 12-16 h.
b. Amplify the digested DNA with the corresponding primers, run the products on a gel, and judge the methylation level from the band intensity.
(3) Mark the positive seedlings judged by Chop-PCR to have reduced methylation, take their leaves again, and extract DNA with the QIAGEN kit.
(4) Analyze the DNA methylation level by methylation sequencing
a. Treat the kit-extracted DNA with bisulfite.
b. Amplify the treated DNA with the designed primers, run the products on a gel, gel-purify them, and set them aside.
c. Ligate the recovered fragments into Takara's p20T vector using T4 ligase.
d. Pick positive clones, verify them by colony PCR, and sequence them.
e. Analyze the methylation level with KISMETH.
f. Use Chop-PCR to calculate the methylation editing efficiency.
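The principle behind the clone-based bisulfite analysis in step (4) can be illustrated with a short Python sketch. After bisulfite treatment and PCR, an unmethylated cytosine reads as T while a methylated cytosine still reads as C, so the methylation level at each reference cytosine is the fraction of clones that retain C. The sketch assumes the clone reads are already aligned to the top strand of the reference and have the same length; it is not a reimplementation of KISMETH, and the sequences and function name are hypothetical.

```python
def methylation_per_cytosine(reference, clone_reads):
    """Map each reference C position to the fraction of clones retaining C.

    reference   : unconverted top-strand sequence of the amplicon
    clone_reads : bisulfite-converted clone sequences, pre-aligned to the
                  reference and of identical length (an assumption here)
    """
    reference = reference.upper()
    reads = [r.upper() for r in clone_reads]
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("reads must be aligned to the reference length")

    levels = {}
    for pos, base in enumerate(reference):
        if base != "C":
            continue
        # Only C (methylated) and T (converted, unmethylated) are informative.
        calls = [read[pos] for read in reads if read[pos] in ("C", "T")]
        if calls:
            levels[pos] = calls.count("C") / len(calls)
    return levels


if __name__ == "__main__":
    # Hypothetical 7-bp amplicon and four sequenced clones.
    ref = "ACGTCCA"
    clones = ["ACGTTCA", "ATGTTCA", "ACGTTTA", "ATGTTCA"]
    print(methylation_per_cytosine(ref, clones))
```

In this toy example the cytosine at index 1 is retained in two of four clones (0.5), the one at index 4 is fully converted (0.0), and the one at index 5 is retained in three of four clones (0.75).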
3.5 Heritable stability of DNA demethylation
The procedure is the same as above, except that methylation is determined by ligation-based clone sequencing.
3.6 Experimental results
(1) Demethylation of different regions by dCas9-ROS1cd and dCas9-TET1cd
Legend: panels a, b and c correspond to the methylation editing results at the three sites. The bottom of each panel shows the position of the edited region on the chromosome: red lines mark CG sites in the genome, blue lines mark CHG sites, black arrows mark the primers used for DNA methylation analysis, and the genomic positions of the sgRNAs are also labeled. The upper part of each panel shows the DNA methylation level: filled symbols indicate DNA methylation at the corresponding site and open symbols indicate no methylation; red denotes CG methylation, blue CHG methylation, and green CHH methylation.
As shown in Figure 4, at the Chr4.8670151-8671193 site, the L44 transgenic plant of dCas9-ROS1cd and the L4 transgenic plant of dCas9-TET1cd both underwent significant demethylation, with almost all DNA methylation removed in a targeted manner. At the Chr5.9872445-9873033 site (solo-LTR site), only the J41 transgenic plant of dCas9-ROS1cd showed significant demethylation. Conversely, at the Chr3:2849440-2849791 site, only the E6 plant of dCas9-TET1cd showed significant demethylation.
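The three sequence contexts named in the legend follow the standard plant convention, in which H stands for A, C or T. As a minimal sketch (top strand only, with a hypothetical function name and test sequence), a cytosine can be assigned to CG, CHG or CHH from its two downstream bases:

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at seq[pos] as 'CG', 'CHG' or 'CHH'.

    Returns None when too few downstream bases are available or when an
    ambiguous base (for example N) prevents classification.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position %d is not a cytosine" % pos)

    downstream = seq[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    first, second = downstream[0], downstream[1]
    if first in "ACT" and second == "G":
        return "CHG"
    if first in "ACT" and second in "ACT":
        return "CHH"
    return None


if __name__ == "__main__":
    example = "CGCAGCTTC"  # hypothetical sequence
    for i, base in enumerate(example):
        if base == "C":
            print(i, cytosine_context(example, i))
```

Cytosines on the bottom strand would be classified the same way after taking the reverse complement of the sequence, which this sketch leaves out.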
(2) Editing efficiency of dCas9-ROS1cd and dCas9-TET1cd in different regions
Table 4
Figure PCTCN2021077328-appb-000007
Figure PCTCN2021077328-appb-000008
(3) Heritable stability
As shown in Figure 5, among T2 plants, individuals with the transgene and individuals without it both maintained their hypomethylated state.
3.7 Experimental conclusion
Both dCas9-ROS1cd and dCas9-TET1cd can mediate DNA demethylation at the Chr4.8670151-8671193 site, and the demethylation editing efficiency of dCas9-TET1cd is higher than that of dCas9-ROS1cd. However, at the Chr5.9872445-9873033 site (solo-LTR site), only dCas9-ROS1cd achieved demethylation. Conversely, at the Chr3:2849440-2849791 site, only dCas9-TET1cd achieved demethylation. dCas9-ROS1cd and dCas9-TET1cd thus show different efficiencies at different sites, and the two can complement each other in application.
All documents mentioned in the present invention are incorporated by reference in this application, as if each document were individually incorporated by reference. In addition, it should be understood that, after reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.

Claims (12)

  1. A fusion protein, characterized in that it comprises components selected from the following:
    (1) a localization functional element D1, which has the function of targeting and binding DNA; and
    (2) a demethylation functional element D2, which has the function of converting methylated nucleotides into unmethylated nucleotides.
  2. The fusion protein according to claim 1, characterized in that D1 has no catalytic activity and is selected from the group consisting of: a Cas protein, a zinc finger protein, or a TALENs protein, or a functional domain thereof, or a combination thereof;
    preferably, D1 is selected from the group consisting of: dCas9, dCpf1, dCas12, dCas13, dCms1, dMAD7, or a functional domain thereof, or a combination thereof.
  3. The fusion protein according to claim 1 or 2, characterized in that D2 has the function of converting methylated cytosine into unmethylated cytosine;
    preferably, D2 is a demethylase, or a demethylation functional domain thereof, selected from the group consisting of: ROS1, TET, DME, DML, or a combination thereof.
  4. The fusion protein according to claim 1, characterized in that D1 and D2 are linked through one or more of the following: a peptide bond, a linker peptide, a nuclear localization signal, an epitope tag, or a combination thereof.
  5. A fusion protein combination, characterized in that the fusion protein combination comprises a first fusion protein and a second fusion protein;
    the first fusion protein and the second fusion protein each independently have the structure of the fusion protein according to any one of claims 1-4;
    and D2 in the first fusion protein is different from D2 in the second fusion protein;
    preferably, D2 of the first fusion protein is selected from ROS1 or a functional domain thereof, and D2 of the second fusion protein is selected from TET or a functional domain thereof.
  6. A nucleic acid, characterized in that it encodes the fusion protein according to any one of claims 1-4 or the fusion protein combination according to claim 5.
  7. A nucleic acid construct, characterized in that it comprises a first nucleic acid sequence and one or more second nucleic acid sequences, wherein the first nucleic acid sequence encodes the fusion protein according to any one of claims 1-4 or the fusion protein combination according to claim 5, and the second nucleic acid sequence is a gRNA coding sequence.
  8. A vector, characterized in that it contains the nucleic acid according to claim 6 or the nucleic acid construct according to claim 7.
  9. A complex, characterized in that it comprises:
    (1) a protein component, comprising the fusion protein according to any one of claims 1-4 or the fusion protein combination according to claim 5;
    (2) a nucleic acid component, which is one or more gRNA sequences;
    wherein the protein component and the nucleic acid component bind to each other to form the complex.
  10. A host cell, characterized in that the host cell contains the fusion protein according to any one of claims 1-4, or the fusion protein combination according to claim 5, or the vector according to claim 8, or the complex according to claim 9, or the genome of the host cell has integrated into it the nucleic acid according to claim 6 or the nucleic acid construct according to claim 7.
  11. Use of the fusion protein according to any one of claims 1-4, or the fusion protein combination according to claim 5, or the nucleic acid according to claim 6, or the nucleic acid construct according to claim 7, or the vector according to claim 8, or the complex according to claim 9, in the demethylation modification of a target nucleic acid.
  12. Use of the fusion protein according to any one of claims 1-4, or the fusion protein combination according to claim 5, or the nucleic acid according to claim 6, or the nucleic acid construct according to claim 7, or the vector according to claim 8, or the complex according to claim 9, in the preparation of a kit for the demethylation modification of a target nucleic acid.
PCT/CN2021/077328 2020-02-26 2021-02-23 Fusion protein and application thereof WO2021169925A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010120522.0 2020-02-26
CN202010120522.0A CN113307878A (en) 2020-02-26 2020-02-26 Fusion protein and application thereof

Publications (1)

Publication Number Publication Date
WO2021169925A1 true WO2021169925A1 (en) 2021-09-02

Family

ID=77370001

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/077328 WO2021169925A1 (en) 2020-02-26 2021-02-23 Fusion protein and application thereof

Country Status (2)

Country Link
CN (1) CN113307878A (en)
WO (1) WO2021169925A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113151293B (en) * 2020-10-20 2023-03-10 中国农业科学院生物技术研究所 Stress-resistant gene line AcDwEm and application thereof in improving salt resistance, drought resistance and high temperature resistance of crops
CN114591439B (en) * 2021-10-18 2023-06-20 翌圣生物科技(上海)股份有限公司 Recombinant TET enzyme MBD2-NgTET1 and application thereof in improving 5caC (cubic-alternating current) ratio in TET enzyme oxidation product
CN114540325B (en) * 2022-01-17 2022-12-09 广州医科大学 Method for targeted DNA demethylation, fusion protein and application thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016022363A2 (en) * 2014-07-30 2016-02-11 President And Fellows Of Harvard College Cas9 proteins including ligand-dependent inteins
WO2017208247A1 (en) * 2016-06-02 2017-12-07 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Assay for the removal of methyl-cytosine residues from dna
WO2018140362A1 (en) * 2017-01-26 2018-08-02 The Regents Of The University Of California Targeted gene demethylation in plants
WO2018154096A1 (en) * 2017-02-24 2018-08-30 Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts, Universitätsmedizin Method for re-expression of different hypermethylated genes involved in fibrosis, like hypermethylated rasal,1 and use thereof in treatment of fibrosis as well as kit of parts for re-expression of hypermethylated genes including rasal1 in a subject
KR20190115717A (en) * 2018-04-03 2019-10-14 서울대학교산학협력단 Composition and kit for reducing methylation of target DNA and induction of expression of target gene in animal cell, and method using the same
WO2019232069A1 (en) * 2018-05-30 2019-12-05 Emerson Collective Investments, Llc Cell therapy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MAEDER MORGAN L; ANGSTMAN JAMES F; RICHARDSON MARCY E; LINDER SAMANTHA J; CASCIO VINCENT M; TSAI SHENGDAR Q; HO QUAN H; SANDER JEF: "Targeted DNA demethylation and activation of endogenous genes using programmable TALE-TET1 fusion proteins", NATURE BIOTECHNOLOGY, GALE GROUP INC., NEW YORK, vol. 31, no. 12, 1 December 2013 (2013-12-01), New York, pages 1137 - 1142, XP037163685, ISSN: 1087-0156, DOI: 10.1038/nbt.2726 *

Also Published As

Publication number Publication date
CN113307878A (en) 2021-08-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21760799

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21760799

Country of ref document: EP

Kind code of ref document: A1