WO2021175289A1 - Multiplex genome editing method and system - Google Patents



Publication number
WO2021175289A1
Authority
WO
WIPO (PCT)
Prior art keywords
plant
editing
scrna
seq
deaminase
Application number
PCT/CN2021/079087
Other languages
English (en)
French (fr)
Inventor
Caixia Gao (高彩霞)
Chao Li (李超)
Kunling Chen (陈坤玲)
Original Assignee
Institute of Genetics and Developmental Biology, Chinese Academy of Sciences (中国科学院遗传与发育生物学研究所)
Application filed by Institute of Genetics and Developmental Biology, Chinese Academy of Sciences (中国科学院遗传与发育生物学研究所)
Priority to US17/909,309 (US20240117368A1)
Priority to EP21764356.8A (EP4116426A4)
Priority to BR112022017704A (BR112022017704A2)
Priority to CN202180019211.8A (CN115667528B)
Publication of WO2021175289A1 (Chinese-language publication)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • The invention relates to the field of plant genetic engineering. Specifically, it relates to a multiplex genome editing method and system suitable for plants, especially crops. More specifically, it relates to a CRISPR nickase-based system and method capable of achieving different types of genome editing simultaneously.
  • Clustered regularly interspaced short palindromic repeats and their associated proteins (CRISPR/Cas) have greatly advanced molecular biology. In Class 2 systems, a growing number of Cas proteins have been discovered and engineered, including Cas9, which targets DNA; Cas12, which targets single-stranded DNA (ssDNA) and RNA; Cas13, which targets RNA; and the CAST system, which is used to insert DNA. The diversity and simplicity of CRISPR/Cas systems make them a powerful molecular toolbox. In addition, Cas proteins can be engineered into variants that lack nuclease activity.
  • The Cas9 protein from Streptococcus pyogenes (SpCas9) contains two nuclease domains, RuvC and HNH, which cleave the non-target strand and the target strand, respectively.
  • Substituting alanine for Asp10 (D10A) in the RuvC domain or for His840 (H840A) in the HNH domain converts SpCas9 into a nickase, nCas9 (nickase Cas9); substituting both Asp10 and His840 with alanine abolishes its nuclease activity, yielding dCas9 (deactivated Cas9).
  • The development of these variants has turned the CRISPR/Cas9 system into a versatile genome editing toolbox (Figure 1a).
  • Cas9 is used to generate double-strand breaks (DSBs) in the genome, and paired nCas9 can also generate highly specific DSBs. nCas9 (D10A) has been used to develop the base editors CBE (cytosine base editor) and ABE (adenine base editor), and dCas9 is often fused to various effector proteins to achieve CRISPR interference (CRISPRi), CRISPR activation (CRISPRa), genome imaging, epigenetic modification and so on. In most cases, however, each of these systems performs only one type of genome editing per transformation.
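The link between SpCas9's two catalytic residues and the variants named above can be captured in a small lookup. This is a minimal Python sketch, not part of the patent: the helper names are hypothetical, and the single-substitution H840A nickase, which the text does not use, is included only for completeness.

```python
from __future__ import annotations

# Catalytic residues named in the text: Asp10 lies in RuvC, which cleaves the
# non-target strand; His840 lies in HNH, which cleaves the target strand.
RUVC_KNOCKOUT = "D10A"
HNH_KNOCKOUT = "H840A"


def cut_strands(mutations: set[str]) -> set[str]:
    """Strands that remain cleavable given a set of substitutions."""
    strands = set()
    if RUVC_KNOCKOUT not in mutations:
        strands.add("non-target")  # RuvC still active
    if HNH_KNOCKOUT not in mutations:
        strands.add("target")      # HNH still active
    return strands


def classify_spcas9(mutations: set[str]) -> str:
    """Name the SpCas9 variant implied by the substitutions (hypothetical helper)."""
    strands = cut_strands(mutations)
    if strands == {"target", "non-target"}:
        return "Cas9"           # both strands cut -> DSB
    if strands == {"target"}:
        return "nCas9 (D10A)"   # nickase used for CBE/ABE
    if strands == {"non-target"}:
        return "nCas9 (H840A)"  # the other single-domain nickase
    return "dCas9"              # no cleavage; DNA binding retained
```

For example, `classify_spcas9({"D10A"})` names the nickase used by CBE and ABE, whose only active domain, HNH, nicks the target strand.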
  • An RNA aptamer can be incorporated into the sgRNA backbone to form a scaffold RNA (scRNA).
  • scRNA: scaffold RNA
  • Through the aptamer hairpins, the dCas9/scRNA complex can recruit transcriptional activators or repressors, achieving gene activation and repression simultaneously at different sites.
  • Another strategy uses multiple orthologous CRISPR systems to achieve gene activation, repression and deletion simultaneously at different target sites.
  • Most of these multiplex genome engineering strategies were developed in bacteria, yeast and human cells. Because of limitations in delivery methods and PAM requirements, developing a multiplex genome editing system in plants from different orthologous CRISPR systems remains challenging.
  • Moreover, the efficiency of homologous recombination (HR) in plants remains relatively low. For breeders, stacking multiple important agronomic traits or rewiring gene regulatory networks at the genetic level is of great significance. There is therefore an urgent need in the art for methods and systems that achieve multiplex genome editing in plants such as crops.
  • SWISS: Simultaneous and Wide-editing Induced by Single System
  • As shown in Figure 1a, SWISS uses two scRNAs containing different RNA aptamers to recruit a cytosine deaminase or an adenine deaminase fused to the corresponding RNA aptamer-binding protein. After a single transformation, two types of editing, CBE and ABE, can be achieved at different target sites. Introducing paired sgRNAs into the SWISS system additionally generates a DSB at a third target site, making SWISS a CRISPR system with a triple editing function.
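The SWISS logic, in which the aptamer on each guide decides which deaminase is recruited while two plain sgRNAs yield a DSB, can be sketched as a lookup. This is an illustrative Python model under stated assumptions: the MS2/MCP and boxB/N22p pairs come from the text, the assignment of MCP to the cytidine deaminase and N22p to the adenine deaminase follows the SWISSv2 captions, and `edits_for_guides` is a hypothetical name. It ignores editing efficiency and the fusion-protein side effects discussed for FIG. 15.

```python
from __future__ import annotations

# Aptamer -> binding protein pairs as listed in the text.
APTAMER_TO_BINDER = {"MS2": "MCP", "boxB": "N22p"}
# Assumed SWISS wiring: MCP carries the cytidine deaminase (CBE),
# N22p carries the adenine deaminase (ABE).
BINDER_TO_EDIT = {"MCP": "CBE (C>T)", "N22p": "ABE (A>G)"}


def edits_for_guides(guides: dict[str, str | None]) -> dict[str, str]:
    """Predict the edit type at each target for one SWISS transformation.

    `guides` maps a target name to the aptamer on its scRNA, or to None for a
    plain sgRNA; plain sgRNAs must form one PAM-out nicking pair.
    """
    result: dict[str, str] = {}
    plain = []
    for name, aptamer in guides.items():
        if aptamer is None:
            plain.append(name)
        else:
            result[name] = BINDER_TO_EDIT[APTAMER_TO_BINDER[aptamer]]
    if len(plain) == 2:
        # Two nCas9 nicks on opposite strands combine into one DSB.
        result["+".join(plain)] = "DSB (paired nicks)"
    elif plain:
        raise ValueError("plain sgRNAs must come as a single pair")
    return result
```

For the full triple-editing configuration, `edits_for_guides({"T1": "MS2", "T2": "boxB", "sgL": None, "sgR": None})` returns `{"T1": "CBE (C>T)", "T2": "ABE (A>G)", "sgL+sgR": "DSB (paired nicks)"}`. Rejecting a lone plain sgRNA reflects the fact that a single nCas9 nick is not a DSB.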
  • Figure 1 Optimization of RNA aptamer-recruited plant cytosine base editor constructs.
  • (a) A multiplex genome editing system based on CRISPR scaffold RNA programming with the nCas9 nickase.
  • (b) Structure of the pOsU3-esgRNA-2×MS2 construct, which carries two MS2 hairpins at the 3' end of the esgRNA.
  • (c) Architecture of PBEc1 to PBEc5. Abbreviations: XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator.
  • FIG. 2 A variety of scaffold RNAs and binding protein orthologs can efficiently mediate C-to-T conversion.
  • (c) Comparison of C>T conversion in the BFP-to-GFP reporter system induced by various scRNAs and their cognate PBEc in rice protoplasts (n = 3). Values and error bars represent the mean ± standard error of three independent experiments.
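The captions report values as the mean ± standard error of three independent experiments. As a minimal sketch, assuming the standard error of the mean (sample standard deviation divided by √n), the summary can be computed with Python's standard library; `mean_sem` is a hypothetical helper and the replicate values below are made up.

```python
from __future__ import annotations

import math
import statistics


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for independent replicates.

    The SEM is the sample standard deviation (n - 1 denominator) / sqrt(n).
    """
    if len(values) < 2:
        raise ValueError("need at least two replicates for an SEM")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

With hypothetical replicates of 10.0, 12.0 and 14.0% editing, `mean_sem([10.0, 12.0, 14.0])` gives a mean of 12.0 and an SEM of 2/√3, about 1.15.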
  • FIG. 3 Optimizing plant adenine base editor constructs using multiple scaffold RNAs and binding protein orthologs.
  • Abbreviations: ecTadA7.10, evolved E. coli TadA; aa, amino acid; XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator.
  • (c) Structure of PABEc5 to PABEc7.
  • FIG. 4 Simultaneous multiplex genome editing by CRISPR scaffold RNA programming on the nCas9 (D10A) platform in rice protoplasts.
  • The left panel is a schematic of the SWISSv1.1 strategy.
  • The CBE target with esgRNA-2×MS2 and the paired sgRNAs for the DSB were assembled in the same vector.
  • A sample of untreated protoplasts was used as a control. Values and error bars represent the mean ± standard error of three independent experiments.
  • PABEc6 uses esgRNA-2×boxB and paired sgRNAs to induce ABE and a DSB simultaneously.
  • The left panel is a schematic of the SWISSv1.2 strategy.
  • The ABE target with esgRNA-2×boxB and the paired sgRNAs for the DSB were assembled in the same vector.
  • A sample of untreated protoplasts was used as a control. Values and error bars represent the mean ± standard error of three independent experiments.
  • The left panel is a schematic of the SWISSv2 strategy.
  • A CBE target with esgRNA-2×MS2 and an ABE target with esgRNA-2×boxB were assembled in the same vector.
  • A sample of untreated protoplasts was used as a control. Values and error bars represent the mean ± standard error of three independent experiments.
  • (d) Simultaneous CBE, ABE and DSB induced by MGE with esgRNA-2×MS2, esgRNA-2×boxB and paired sgRNAs.
  • A CBE target with esgRNA-2×MS2, an ABE target with esgRNA-2×boxB and paired sgRNAs for the DSB were assembled in the same vector.
  • A sample of untreated protoplasts was used as a control. Values and error bars represent the mean ± standard error of three independent experiments.
  • Figure 6 Base editing efficiency of different scRNAs and cognate PBEc at endogenous genes in rice protoplasts.
  • Data are displayed as box plots (center line, median; box limits, 25th and 75th percentiles; whiskers extend to the minimum and maximum values).
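The box-plot convention in this caption (median line, quartile box, min/max whiskers) corresponds to a five-number summary. A minimal sketch follows; the caption does not say how quartiles were computed, so the choice of `statistics.quantiles(..., method="inclusive")` is an assumption, and `box_plot_summary` is a hypothetical helper.

```python
from __future__ import annotations

import statistics


def box_plot_summary(values: list[float]) -> dict[str, float]:
    """Five-number summary matching the caption's box-plot convention.

    Quartiles use the 'inclusive' method, an assumption since the
    caption does not state which quartile definition was used.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # lower whisker
        "q1": q1,             # lower box limit (25th percentile)
        "median": median,     # center line
        "q3": q3,             # upper box limit (75th percentile)
        "max": max(values),   # upper whisker
    }
```

Other quartile definitions (for example `method="exclusive"`, the library default) can shift the box limits slightly for small samples, which matters when only a handful of targets are plotted.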
  • Figure 7 Product purity and indel frequency of esgRNA-2×MS2, esgRNA-3×MS2, sgRNA4.0, esgRNA-2×com and cognate PBEc in rice protoplasts.
  • (a) Product distribution among edited DNA sequencing reads for esgRNA-2×MS2, esgRNA-3×MS2, sgRNA4.0 and esgRNA-2×com with cognate PBEc in rice protoplasts. Values and error bars represent the mean ± standard error of three independent experiments.
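Product purity and product distribution describe which bases appear among edited reads at a target position. The sketch below assumes one common definition, the share of edited reads (reads whose base differs from the reference) that carry each alternative base, and ignores indel-containing reads. The function names are hypothetical and the patent's exact read-classification pipeline is not described here.

```python
from __future__ import annotations

from collections import Counter


def product_distribution(ref_base: str, observed: list[str]) -> dict[str, float]:
    """Fraction of *edited* reads carrying each non-reference base.

    `observed` lists the base seen at one target position in each read;
    reads that still carry `ref_base` are unedited and excluded.
    """
    edited = Counter(b for b in observed if b != ref_base)
    total = sum(edited.values())
    if total == 0:
        return {}
    return {base: count / total for base, count in sorted(edited.items())}


def product_purity(ref_base: str, desired: str, observed: list[str]) -> float:
    """Share of edited reads that carry the desired product (e.g. C>T)."""
    return product_distribution(ref_base, observed).get(desired, 0.0)
```

For hypothetical reads `list("CCCCCCTTTG")` at a target C, three of the four edited reads are T, so `product_purity("C", "T", ...)` returns 0.75.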
  • FIG. 8 C-to-T editing frequency of narrow-window APOBEC1 variants recruited by scaffold RNA in rice protoplasts.
  • YE1-PBE, YE2-PBE, EE-PBE and YEE-PBE. Abbreviations: XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator.
  • FIG. 9 Activity of PABEc8 to PABEc10, in which the binding protein is fused to the N-terminus of the adenosine deaminase.
  • (a) Structure of PABEc8 to PABEc10. Abbreviations: ecTadA7.10, evolved E. coli TadA; aa, amino acid; XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator.
  • Figure 10 Activity, product purity and indel frequency of selected scaffold RNAs and cognate PABEc in rice protoplasts.
  • The OsALS-T1, OsCDC48, OsDEP1-T1, OsNRT1.1B, OsEV and OsOD targets were tested. One of three independent biological replicates is shown.
  • (b) Product distribution among edited DNA sequencing reads for esgRNA-2×MS2, esgRNA-MS2+f6, esgRNA-1×PP7-1, esgRNA-2×boxB and esgRNA-2×com with the associated PABEc in rice protoplasts. Values and error bars represent the mean ± standard error of three independent experiments.
  • FIG. 11 Schematic of multiple sgRNA assembly for SWISSv1.1 and SWISSv1.2.
  • (a) Schematic of paired sgRNA assembly. The paired sgRNAs (sgL and sgR) are designed in a PAM-out orientation, with 40-68 bp between their cut sites. The PCR product is amplified from the esgRNA-pTaU6 template and then inserted into the BsaI site of pOsU3-esgRNA by Golden Gate assembly.
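The PAM-out design rule (sgL and sgR with PAMs pointing away from each other, cut sites 40-68 bp apart) can be checked from protospacer coordinates. This sketch assumes 20-nt SpCas9 protospacers with a 3' NGG PAM and the usual cut position 3 nt from the PAM; the `Protospacer` class, coordinates and function names are hypothetical and only the 40-68 bp window comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Protospacer:
    start: int       # 0-based genomic coordinate of the leftmost protospacer base
    strand: str      # "+" or "-"
    length: int = 20


def cut_boundary(p: Protospacer) -> int:
    """Genomic boundary where SpCas9 (or one nCas9 of a pair) cuts, 3 nt from the PAM."""
    if p.strand == "+":
        return p.start + p.length - 3   # PAM lies to the right
    if p.strand == "-":
        return p.start + 3              # PAM lies to the left
    raise ValueError("strand must be '+' or '-'")


def pam_out_distance(sg_l: Protospacer, sg_r: Protospacer,
                     min_bp: int = 40, max_bp: int = 68) -> int:
    """Distance between cut sites of a PAM-out pair, checked against the quoted range."""
    if (sg_l.strand, sg_r.strand) != ("-", "+"):
        raise ValueError("not PAM-out: sgL must be on '-' and sgR on '+'")
    distance = cut_boundary(sg_r) - cut_boundary(sg_l)
    if not min_bp <= distance <= max_bp:
        raise ValueError(f"cut sites {distance} bp apart, outside {min_bp}-{max_bp} bp")
    return distance
```

For example, a hypothetical sgL on the minus strand starting at position 100 (cut at 103) and an sgR on the plus strand starting at 130 (cut at 147) give a 44 bp spacing, inside the quoted window.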
  • The ABE target was inserted into the BsaI site of pOsU3-esgRNA-2×boxB, and the pOsU3-ABE target-esgRNA-2×boxB cassette was then amplified.
  • A PCR product carrying the paired sgRNAs was amplified from the paired-sgRNA plasmid. The two PCR products were then assembled into the EcoRI/HindIII-digested pOsU3-esgRNA backbone by multi-fragment one-step cloning.
  • Figure 12 Distribution of deletion products among indel-containing sequencing reads for SWISSv1.1, SWISSv1.2 and SWISSv3. Values and error bars represent the mean ± standard error of three independent experiments.
  • FIG. 13 Schematic of multiple sgRNA assembly for SWISSv2 and SWISSv3.
  • (a) Schematic of CBE target, ABE target and paired sgRNA assembly.
  • A PCR product carrying the CBE and ABE targets was amplified from the dual-sgRNA plasmid containing both targets.
  • A PCR product carrying the paired sgRNAs was amplified from the paired-sgRNA plasmid. The two PCR products were then assembled into the EcoRI/HindIII-digested pOsU3-esgRNA backbone by multi-fragment one-step cloning.
  • FIG. 14 Simultaneous CBE, ABE and DSB in rice.
  • A PCR product carrying the CBE and ABE targets was amplified from the dual-sgRNA plasmid containing both targets.
  • A PCR product carrying the paired sgRNAs was amplified from the paired-sgRNA plasmid.
  • The two PCR products were assembled into the HindIII-digested pH-MGE binary vector by multi-fragment one-step cloning.
  • PAM sequences are shown in brown. Ten T0 seedlings (T0-1 to T0-10) were analyzed. WT/D and WT/U denote wild-type (WT) control genomic DNA amplicons with and without T7E1 digestion, respectively. A total of 55 mutants were identified. Bands marked by red arrows are diagnostic of genome editing. Sequences were confirmed by Sanger sequencing, and the sequencing chromatograms of indels were further analyzed with the online tools DSDecodeM and TIDE.
  • FIG. 15 Analysis of undesired editing caused by SWISSv2 and SWISSv3.
  • T2A-mediated "self-cleavage" occurs when the ribosome skips formation of the glycyl-prolyl peptide bond at the C-terminus of the T2A peptide.
  • The position of T2A affects the expression level of the polycistronic construct.
  • Successful skipping yields three separate proteins as designed; however, skipping can also fail, producing fusion proteins.
  • Both esgRNA-2×MS2 and esgRNA-2×boxB can recruit such a fusion protein, which leads to undesired cytosine editing at the ABE target and undesired adenine editing at the CBE target.
  • (b) Frequency of undesired editing caused by SWISSv2 and SWISSv3.
  • Cytosines at the ABE target and adenines at the CBE target in SWISSv2 and SWISSv3 were analyzed. Values and error bars represent the mean ± standard error of three independent experiments.
  • Figure 16 Various designed scRNA/sgRNA structures.
  • Figure 17 Various designed scRNA/sgRNA structures.
  • The term “and/or” encompasses all combinations of the items it connects and should be treated as if each combination had been individually listed herein.
  • “A and/or B” encompasses “A”, “A and B”, and “B”.
  • “A, B, and/or C” encompasses “A”, “B”, “C”, “A and B”, “A and C”, “B and C”, and “A and B and C”.
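The “and/or” convention amounts to listing every non-empty combination of the connected items, 2^n - 1 of them for n items. A short sketch using `itertools.combinations` reproduces the enumeration above; `and_or` is a hypothetical helper.

```python
from __future__ import annotations

from itertools import combinations


def and_or(items: list[str]) -> list[str]:
    """All non-empty combinations covered by 'X, Y and/or Z' (2**n - 1 of them)."""
    return [" and ".join(combo)
            for k in range(1, len(items) + 1)
            for combo in combinations(items, k)]
```

`and_or(["A", "B", "C"])` yields exactly the seven combinations listed for “A, B, and/or C”, in the same order.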
  • A protein or nucleic acid comprising a given sequence may consist of that sequence, or may carry additional amino acids or nucleotides at one or both ends while still having the activity described in the present invention.
  • The methionine encoded by the start codon at the N-terminus of a polypeptide may be retained under certain practical conditions (for example, when expressed in a particular expression system) without substantially affecting the function of the polypeptide.
  • “Gene” as used herein covers not only chromosomal DNA present in the nucleus but also organellar DNA present in subcellular compartments of the cell (such as mitochondria and plastids).
  • “Genetically modified plant” means a plant whose genome contains an exogenous polynucleotide or a modified gene or expression control sequence.
  • The exogenous polynucleotide can be stably integrated into the plant genome and inherited over successive generations.
  • The exogenous polynucleotide can be integrated into the genome alone or as part of a recombinant DNA construct.
  • The modified gene or expression control sequence comprises one or more deoxynucleotide substitutions, deletions or additions in the plant genome.
  • “Exogenous”, with respect to a sequence, means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been substantially altered from its native form by deliberate human intervention.
  • “Nucleic acid sequence” and its synonymous terms are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases.
  • Nucleotides are referred to by their single-letter codes as follows: “A” is adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” is cytidine or deoxycytidine, “G” is guanosine or deoxyguanosine, “U” is uridine, “T” is deoxythymidine, “R” is a purine (A or G), “Y” is a pyrimidine (C or T), “K” is G or T, “H” is A, C or T, “D” is A, T or G, “I” is inosine, and “N” is any nucleotide.
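The degenerate codes above are enough to express PAM requirements such as 5'-NGG-3' for SpCas9 or the 5'-NG-3' PAM of the nCas9-NG variant in this document. This minimal sketch assumes DNA input, covers only the codes defined in the text, and scans a single strand; `matches` and `find_pam_sites` are hypothetical helpers.

```python
from __future__ import annotations

# Degenerate DNA codes exactly as defined in the text.
CODES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "K": {"G", "T"},
    "H": {"A", "C", "T"}, "D": {"A", "G", "T"},
    "N": {"A", "C", "G", "T"},
}


def matches(seq: str, pattern: str) -> bool:
    """True if `seq` is compatible with the degenerate `pattern`."""
    return len(seq) == len(pattern) and all(
        base in CODES[code] for base, code in zip(seq.upper(), pattern.upper())
    )


def find_pam_sites(seq: str, pam: str) -> list[int]:
    """0-based start positions on the given strand where `pam` matches."""
    return [i for i in range(len(seq) - len(pam) + 1)
            if matches(seq[i:i + len(pam)], pam)]
```

On the made-up sequence `"TTAGGCTGG"`, an NGG scan finds PAMs at positions 2 and 6, while the relaxed NG requirement also accepts positions 3 and 7, which illustrates why an NG-recognizing nickase widens the range of targetable sites.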
  • “Polypeptide”, “peptide” and “protein” are used interchangeably in the present invention and refer to a polymer of amino acid residues.
  • The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
  • The terms “polypeptide”, “peptide”, “amino acid sequence” and “protein” may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
  • “Expression construct” refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in an organism.
  • “Expression” refers to the production of a functional product.
  • Expression of a nucleotide sequence may refer to its transcription (for example, to produce mRNA or a functional RNA) and/or the translation of RNA into a precursor or mature protein.
  • The “expression construct” of the present invention can be a linear nucleic acid fragment, a circular plasmid or a viral vector, or, in some embodiments, an RNA (such as mRNA) that can be translated, for example RNA produced by in vitro transcription.
  • The “expression construct” of the present invention may contain regulatory sequences and nucleotide sequences of interest from different sources, or from the same source but arranged differently from how they normally occur in nature.
  • “Regulatory sequence” and “regulatory element” are used interchangeably and refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence that affect the transcription, RNA processing, stability or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.
  • “Promoter” refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment.
  • The promoter is a promoter capable of controlling gene transcription in a cell, regardless of whether it is derived from that cell.
  • The promoter can be a constitutive, tissue-specific, developmentally regulated or inducible promoter.
  • “Tissue-specific promoter” and “tissue-preferred promoter” are used interchangeably and refer to a promoter that is expressed mainly, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in a specific cell or cell type.
  • “Developmentally regulated promoter” refers to a promoter whose activity is determined by developmental events.
  • Inducible promoters selectively express operably linked DNA sequences in response to endogenous or exogenous stimuli (environmental, hormonal, chemical signals, etc.).
  • Examples of promoters include, but are not limited to, polymerase (pol) I, pol II and pol III promoters.
  • Examples of pol I promoters include the chicken RNA pol I promoter.
  • Examples of pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter and the simian virus 40 (SV40) immediate early promoter.
  • Examples of pol III promoters include the U6 and H1 promoters.
  • An inducible promoter such as the metallothionein promoter can be used.
  • Examples of promoters also include the T7 phage promoter, T3 phage promoter, β-galactosidase promoter and Sp6 phage promoter.
  • The promoter can be the cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, maize U3 promoter or rice actin promoter.
  • “Operably linked” refers to linking a regulatory element (for example, but not limited to, a promoter sequence or transcription termination sequence) to a nucleic acid sequence (for example, a coding sequence or open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional control element.
  • “Introducing” a nucleic acid molecule (e.g., a plasmid, linear nucleic acid fragment or RNA) or a protein into an organism means transforming cells of the organism with the nucleic acid or protein so that it can function in the cells.
  • “Transformation” as used in the present invention includes stable transformation and transient transformation.
  • “Stable transformation” refers to introducing an exogenous nucleotide sequence into the genome so that the exogenous gene is stably inherited. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and of any successive generations.
  • “Transient transformation” refers to introducing nucleic acid molecules or proteins into cells to perform a function without stable inheritance of the foreign gene. In transient transformation, the foreign nucleic acid sequence is not integrated into the genome.
  • “Trait” refers to a physiological, morphological, biochemical or physical characteristic of a cell or organism.
  • “Agronomic traits” refers in particular to measurable parameters of crop plants, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, vegetative tissue nitrogen content, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, vegetative tissue free amino acid content, total plant protein content, fruit protein content, seed protein content, vegetative tissue protein content, herbicide resistance, drought resistance, nitrogen uptake, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance and tiller number.
  • The present invention provides a genome editing system for multiplex editing in plants, especially crops, comprising:
  • “Genome editing system” refers to a combination of components required to edit the genome of a cell or organism.
  • The individual components of the system, such as the CRISPR nickase, the first scRNA, the first fusion protein, the second scRNA, the second fusion protein, the paired gRNAs and their expression vectors, may exist independently of one another, or any combination of them may be provided as a composition.
  • “CRISPR nickase” refers to the nickase form of a CRISPR nuclease, which introduces a nick into a double-stranded nucleic acid molecule without cleaving both strands, while retaining gRNA-guided, sequence-specific DNA binding.
  • The CRISPR nickase is a Cas9 nickase, such as a Cas9 nickase derived from S. pyogenes Cas9 (SpCas9).
  • The Cas9 nickase comprises the amino acid sequence shown in SEQ ID NO: 25 (nCas9 (D10A)).
  • The Cas9 nickase is a Cas9 variant nickase that recognizes the PAM sequence 5'-NG-3' and comprises the amino acid sequence shown in SEQ ID NO: 48 (nCas9-NG (D10A)).
  • “Guide RNA” and “gRNA” are used interchangeably and refer to an RNA molecule that can form a complex with a CRISPR nuclease or a derivative protein thereof, such as a CRISPR nickase, and that, owing to its identity with the target sequence, directs the complex to the target sequence.
  • The gRNA targets the target sequence by base pairing with the complementary strand of the target sequence.
  • The gRNA used by a Cas9 nuclease or a derivative protein thereof (such as a Cas9 nickase) usually consists of crRNA and tracrRNA molecules that are partially complementary and form a complex, in which the crRNA contains a sequence with sufficient identity to the target sequence to hybridize with the complementary strand of the target sequence.
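The targeting rule above, in which the spacer shares the target (protospacer) sequence and base-pairs with the complementary strand, can be checked mechanically. This sketch assumes a plain Watson-Crick model without G-U wobble and uses hypothetical helper names and a made-up 20-nt target.

```python
from __future__ import annotations

DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Watson-Crick pairs between an RNA base and a DNA base (no G-U wobble).
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def spacer_from_protospacer(protospacer: str) -> str:
    """gRNA spacer: identical to the protospacer (non-target strand), with U for T."""
    return protospacer.upper().replace("T", "U")


def reverse_complement(dna: str) -> str:
    """Complementary (target) strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(dna.upper()))


def pairs_with(spacer: str, strand: str) -> bool:
    """True if the spacer (5'->3') base-pairs antiparallel with a DNA strand (5'->3')."""
    return len(spacer) == len(strand) and all(
        (r, d) in RNA_DNA_PAIRS for r, d in zip(spacer, reversed(strand))
    )
```

For a hypothetical protospacer `GATTACAGATTACAGATTAC`, the spacer pairs with its reverse complement (the target strand) but not with the protospacer strand itself, which is the strand left single-stranded for the deaminases.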
  • sgRNA: single guide RNA
  • The sgRNA comprises the nucleotide sequence shown in SEQ ID NO: 3 or SEQ ID NO: 4.
  • “RNA aptamer” refers to an RNA molecule that can specifically bind to a particular protein.
  • RNA aptamers suitable for the present invention include, but are not limited to, MS2, PP7, boxB and com, whose specific binding proteins are MCP (SEQ ID NO: 34), PCP (SEQ ID NO: 35), N22p (SEQ ID NO: 36) and COM (SEQ ID NO: 37), respectively.
  • “scRNA”, or the interchangeably used term “scaffold RNA”, refers to an RNA molecule formed by incorporating an RNA aptamer into a gRNA of a CRISPR system, such as an sgRNA; it retains gRNA function and can recruit the protein that specifically binds the RNA aptamer, or a fusion protein containing that binding protein.
  • The scRNA comprises two or more RNA aptamers. In some embodiments, the scRNA comprises the nucleotide sequence set forth in one of SEQ ID NOs: 5-24.
  • The first scRNA comprises the nucleotide sequence shown in SEQ ID NO: 13 or 15.
  • The first RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO: 34.
  • The first scRNA comprises the nucleotide sequence shown in SEQ ID NO: 24.
  • The first RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO: 37.
  • The second scRNA comprises the nucleotide sequence shown in SEQ ID NO: 22.
  • The second RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO: 36.
  • “Cytosine deamination domain” refers to a domain that can accept single-stranded DNA as a substrate and catalyze the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
  • The cytosine deamination domain comprises at least one (e.g., one or two) cytosine deaminase polypeptide.
  • The cytidine deamination domain in the first fusion protein deaminates cytidine (C) in the single-stranded DNA generated during formation of the CRISPR nickase-first scRNA-first fusion protein-DNA complex to uracil (U); C-to-T base substitution is then achieved through base mismatch repair.
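The two-step C-to-T conversion (deamination to U on the exposed strand, then U read as T after repair or replication) can be sketched on a protospacer string. The editing window used here, positions 4-8 counted from the PAM-distal end, is an assumption borrowed from commonly reported APOBEC1-based editors; the scRNA-recruited deaminases in this document may edit a different window, and every C in the window is treated as fully edited. `cbe_outcome` is a hypothetical name.

```python
from __future__ import annotations


def cbe_outcome(protospacer: str, window: range = range(4, 9)) -> tuple[str, str]:
    """Idealized cytosine base editing of the non-target (protospacer) strand.

    Positions are 1-based from the PAM-distal end; `window` is an assumed
    activity window. Returns (strand carrying U, strand after repair).
    """
    bases = list(protospacer.upper())
    for pos in window:
        if 1 <= pos <= len(bases) and bases[pos - 1] == "C":
            bases[pos - 1] = "U"          # deamination on the exposed single strand
    deaminated = "".join(bases)
    repaired = deaminated.replace("U", "T")  # U is read as T after repair/replication
    return deaminated, repaired
```

For the made-up protospacer `ACCCCGTACGTACGTACGTA`, only the cytosines at positions 4 and 5 fall inside the assumed window, giving `ACCUUGTACGTACGTACGTA` and then `ACCTTGTACGTACGTACGTA`. In the real system, UGI helps the U survive long enough for this outcome to be fixed.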
  • Examples of cytosine deaminases include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, and functional variants thereof.
  • The cytosine deaminase is APOBEC1 deaminase or a functional variant thereof.
  • The cytosine deaminase comprises the amino acid sequence of one of SEQ ID NOs: 26-30.
  • In the first fusion protein, the first RNA aptamer-specific binding protein is located at the N-terminus of the cytosine deamination domain. In some embodiments, the first RNA aptamer-specific binding protein and the cytosine deamination domain in the first fusion protein are fused through a linker.
  • The first fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI).
  • In cells, uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), which restores the U:G mismatch to C:G. Therefore, without being bound by any theory, including a UGI in the first fusion protein of the present invention is expected to increase the efficiency of C-to-T base editing.
  • The UGI comprises the amino acid sequence shown in SEQ ID NO: 31.
  • “Adenine deamination domain” refers to a domain that can accept single-stranded DNA as a substrate and catalyze the conversion of adenosine or deoxyadenosine (A) to inosine (I).
  • The adenine deamination domain comprises at least one (e.g., one) DNA-dependent adenine deaminase polypeptide.
  • The adenine deamination domain in the second fusion protein deaminates adenosine in the single-stranded DNA generated during formation of the CRISPR nickase-second scRNA-second fusion protein-DNA complex to inosine (I). Because DNA polymerase reads inosine (I) as guanine (G), A-to-G substitution can be achieved through base mismatch repair.
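Because inosine is read as guanine, an A:T pair becomes G:C after the edited strand is copied twice. The sketch below traces that through two idealized rounds of replication, modelling each round as a plain reverse complement; this is a simplification of mismatch repair and replication, and the helper names and example sequence are hypothetical.

```python
from __future__ import annotations

# Base incorporated opposite each template base. Inosine (I) is read as
# guanine, so it templates incorporation of C.
TEMPLATE_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C", "I": "C"}


def replicate(strand: str) -> str:
    """Newly synthesized complementary strand, written 5'->3'."""
    return "".join(TEMPLATE_PAIR[b] for b in reversed(strand))


def abe_two_rounds(strand: str, position: int) -> tuple[str, str]:
    """Deaminate A->I at a 1-based position, then copy the strand twice.

    Returns (strand carrying I, same strand after two rounds of copying).
    """
    if strand[position - 1] != "A":
        raise ValueError("position must hold an adenine")
    edited = strand[:position - 1] + "I" + strand[position:]
    first_copy = replicate(edited)       # I templates C
    second_copy = replicate(first_copy)  # that C templates G
    return edited, second_copy
```

Starting from a made-up strand `TTAGC` with the A at position 3 deaminated, the result is `TTIGC` and, after two copies, `TTGGC`: the original A has become G.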
  • The DNA-dependent adenine deaminase is a variant of E. coli tRNA adenine deaminase TadA (ecTadA).
  • ecTadA: E. coli tRNA adenine deaminase TadA
  • An exemplary wild-type ecTadA amino acid sequence is shown in SEQ ID NO: 32.
  • The DNA-dependent adenine deaminase comprises the amino acid sequence shown in SEQ ID NO: 33.
  • Because E. coli tRNA adenine deaminase usually functions as a dimer, dimerization of two DNA-dependent adenine deaminases, or of a DNA-dependent adenine deaminase with a wild-type adenine deaminase, is expected to significantly increase the A-to-G editing activity of the fusion protein.
  • The adenine deamination domain comprises two of the DNA-dependent adenine deaminases.
  • The adenine deamination domain further comprises a wild-type adenine deaminase (e.g., E. coli tRNA adenine deaminase TadA) fused to the corresponding DNA-dependent adenine deaminase (e.g., a DNA-dependent variant of E. coli tRNA adenine deaminase TadA).
  • the DNA-dependent adenine deaminase e.g., a DNA-dependent variant of E.
  • coli tRNA adenine deaminase TadA is fused to a corresponding wild-type adenine deaminase (e.g., E. coli The C-terminus of tRNA adenine deaminase (TadA).
  • DNA-dependent adenine deaminase e.g., a DNA-dependent variant of E. coli tRNA adenine deaminase TadA
  • DNA-dependent adenine deaminase e.g., A DNA-dependent variant of Escherichia coli tRNA adenine deaminase TadA
  • wild-type adenine deaminase for example, Escherichia coli tRNA adenine deaminase TadA
  • In the second fusion protein, the second RNA aptamer-specific binding protein is located at the C-terminus of the adenine deamination domain. In some embodiments, the second RNA aptamer-specific binding protein and the adenine deamination domain of the second fusion protein are fused through a linker.
  • The linker may be a non-functional amino acid sequence without secondary or higher-order structure. It may be 1-50 amino acids long (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or longer.
  • The linker may be, for example, a flexible linker.
  • The linker is 16 amino acids in length; for example, it comprises the amino acid sequence shown in SEQ ID NO: 41.
  • The linker is 36 amino acids in length; for example, it comprises the amino acid sequence shown in SEQ ID NO: 42 or 43.
  • The CRISPR nickase, the first fusion protein and/or the second fusion protein of the present invention may further comprise a nuclear localization sequence (NLS).
  • One or more NLSs in the CRISPR nickase, the first fusion protein and/or the second fusion protein should be strong enough to drive nuclear accumulation of the protein to a level sufficient for its base editing function.
  • The strength of nuclear localization activity is determined by the number and position of the NLSs in the protein, the specific NLS or NLSs used, or a combination of these factors.
  • The NLSs of the CRISPR nickase, the first fusion protein and/or the second fusion protein of the present invention may be located at the N-terminus and/or the C-terminus, or internally.
  • The CRISPR nickase, the first fusion protein and/or the second fusion protein comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs.
  • The CRISPR nickase, the first fusion protein and/or the second fusion protein comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus.
  • The CRISPR nickase, the first fusion protein and/or the second fusion protein comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus.
  • When more than one NLS is present, each can be selected independently of the others.
  • The NLS comprises the amino acid sequence shown in SEQ ID NO: 39 or 40.
  • The CRISPR nickase, the first fusion protein and/or the second fusion protein of the present invention may also comprise other localization sequences, such as cytoplasmic, chloroplast, or mitochondrial localization sequences.
  • The CRISPR nickase, the first fusion protein and/or the second fusion protein of the present invention are linked to one another by a "self-cleaving peptide".
  • "Self-cleaving peptide" means a peptide that can undergo self-cleavage within a cell.
  • The self-cleaving peptide may include a protease recognition site, so that it is recognized and specifically cleaved by a protease in the cell.
  • The self-cleaving peptide may be a 2A polypeptide.
  • A 2A polypeptide is a class of short, virus-derived peptides whose self-cleavage occurs during translation. When a 2A polypeptide links two different target polypeptides expressed in the same reading frame, the two target polypeptides are produced at a ratio of almost 1:1.
  • 2A polypeptides include P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus, and F2A from foot-and-mouth disease virus.
  • The 2A polypeptide may be, for example, T2A comprising the amino acid sequence shown in SEQ ID NO: 38.
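The ribosome-skipping behaviour of 2A peptides can be illustrated by splitting a translated polyprotein at the skip site. This minimal sketch rests on stated assumptions:
- Skipping is modeled between the Gly and Pro of a C-terminal NPGP motif, with 100% efficiency.
- The T2A sequence used is a commonly cited one and may not match SEQ ID NO: 38.
- The flanking "proteins" are placeholders.

```python
# Hypothetical model of 2A "self-cleavage" by ribosome skipping.
# Assumption: skipping occurs between the Gly and Pro of the C-terminal
# "NPGP" motif shared by P2A/T2A/E2A/F2A, and is 100% efficient.
T2A = "EGRGSLLTCGDVEENPGP"  # commonly used T2A; SEQ ID NO: 38 may differ


def split_at_2a(polyprotein: str, motif: str = "NPGP") -> list[str]:
    """Split a translated polyprotein at each 2A skip site (between G and P)."""
    products = []
    start = 0
    hit = polyprotein.find(motif, start)
    while hit != -1:
        skip_site = hit + 3          # after N-P-G; the next peptide starts with P
        products.append(polyprotein[start:skip_site])
        start = skip_site
        hit = polyprotein.find(motif, start)
    products.append(polyprotein[start:])
    return products


# Placeholder "proteins" for illustration only.
upstream, downstream = "MAAAA", "MKKKK"
print(split_at_2a(upstream + T2A + downstream))
```

Each modeled translation event yields one upstream and one downstream product, which reflects the roughly 1:1 ratio. The upstream product keeps the 2A residues, and the downstream product begins with proline.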
  • The CRISPR nickase, the first fusion protein and/or the second fusion protein of the present invention can be expressed from the same expression vector.
  • The first scRNA, the second scRNA and/or the paired gRNAs may be expressed from the same expression construct.
  • With the genome editing system of the present invention, different types of genome editing at different target sites can be achieved simultaneously in a single transformation. For example, i), ii-1), ii-2) and ii-3) of the system can be introduced into plants together, in the same vector or in separate vectors. A single transformation then achieves C-to-T editing at the first target site, A-to-G editing at the second target site, and deletion mutations at the third target site.
  • The present invention provides a method for producing genetically modified plants, such as crop plants, comprising introducing the genome editing system of the present invention into plants.
  • i) and ii-1) of the system are introduced into plants together, thereby achieving C-to-T editing at the first target site.
  • i), ii-1) and ii-2) of the system are co-introduced into the plant, thereby achieving C-to-T editing at the first target site and A-to-G editing at the second target site.
  • i), ii-2) and ii-3) of the system are introduced into plants together, thereby achieving A-to-G editing at the second target site and deletion mutations at the third target site.
  • i), ii-1) and ii-3) of the system are introduced into plants together, thereby achieving C-to-T editing at the first target site and deletion mutations at the third target site.
  • i), ii-1), ii-2) and ii-3) of the system are co-introduced into the plant, thereby achieving C-to-T editing at the first target site, A-to-G editing at the second target site, and deletion mutations at the third target site.
  • i), ii-1), ii-2), ii-3) and combinations thereof are introduced into the plant simultaneously, for example in the same vector or in a single transformation.
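The correspondence between the introduced components and the resulting edits can be tabulated programmatically. The sketch assumes, as in the combinations listed above, that component i) (the CRISPR nickase) is required for every edit. It also assumes that ii-1), ii-2) and ii-3) each contribute one independent edit. The labels are shorthand, not defined names.

```python
from itertools import combinations

# Module labels follow the system description; the mapping is a sketch.
EDIT_BY_MODULE = {
    "ii-1": "C-to-T editing at the first target site",
    "ii-2": "A-to-G editing at the second target site",
    "ii-3": "deletion mutation at the third target site",
}


def expected_edits(modules: set[str]) -> list[str]:
    """Return the edits expected when the given modules are co-introduced."""
    if "i" not in modules:          # every edit needs the CRISPR nickase
        return []
    return [EDIT_BY_MODULE[m] for m in sorted(EDIT_BY_MODULE) if m in modules]


def all_configurations() -> list[tuple[str, ...]]:
    """All combinations of i) with at least one of ii-1), ii-2), ii-3)."""
    optional = sorted(EDIT_BY_MODULE)
    return [("i",) + combo
            for k in range(1, len(optional) + 1)
            for combo in combinations(optional, k)]


for config in all_configurations():
    print(" + ".join(config), "->", "; ".join(expected_edits(set(config))))
```

The enumeration gives seven configurations, which include the five combinations spelled out above.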
  • The method includes:
  • The genome editing system can be introduced into the plant by various methods well known to those skilled in the art.
  • Methods for introducing the editing system of the present invention into plants include, but are not limited to, the following: particle bombardment, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen-tube pathway method, and ovary injection.
  • The system is introduced into the plant by transient transformation.
  • The target site can be modified by introducing the proteins and RNA molecules into plant cells, or by producing them there. The modification is stably inherited without the editing system itself being stably transformed into the plant. This avoids potential off-target effects from a stably present editing system. It also avoids integration of exogenous nucleotide sequences into the plant genome, giving higher biosafety.
  • The introduction is performed in the absence of selection pressure, so as to avoid integration of foreign nucleotide sequences into the plant genome.
  • The introduction includes transforming the genome editing system of the present invention into isolated plant cells or tissues, and then regenerating the transformed cells or tissues into whole plants.
  • The regeneration is performed in the absence of selection pressure; that is, no selective agent for the selectable marker gene carried on the expression vector is used during tissue culture. Omitting selective agents can improve regeneration efficiency and yield modified plants free of exogenous nucleotide sequences.
  • The genome editing system of the present invention can be transformed into specific parts of the whole plant, such as leaves, shoot tips, pollen tubes, young ears, or hypocotyls. This is particularly suitable for plants that are difficult to regenerate through tissue culture.
  • Proteins expressed in vitro and/or RNA molecules transcribed in vitro are transformed directly into the plant.
  • The proteins and/or RNA molecules carry out gene editing in plant cells and are then degraded by the cell, which avoids integration of foreign nucleotide sequences into the plant genome.
  • Genetic modification and breeding of plants with the method of the present invention can therefore yield plants without foreign DNA integration, that is, transgene-free modified plants.
  • Plants that can be gene-edited by the system and method of the present invention include monocotyledonous plants and dicotyledonous plants.
  • The plant may be a crop plant such as wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava, or potato.
  • The target site is associated with a plant trait, such as an agronomic trait, so that the editing gives the plant an altered trait relative to a wild-type plant.
  • The target sequence to be modified can be located anywhere in the genome. It may lie within a functional gene, such as a protein-coding gene, to modify gene function. It may also lie in a gene expression regulatory region, such as a promoter or enhancer region, to modify gene expression.
  • The method further includes obtaining progeny of the genetically modified plant.
  • The present invention also provides a genetically modified plant, or its progeny or a part thereof, wherein the plant is obtained by the above method of the present invention.
  • The genetically modified plant, or its progeny or part thereof, is non-transgenic.
  • The present invention also provides a plant breeding method. The method comprises crossing a genetically modified first plant, obtained by the above method of the present invention, with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.
  • The plant cytosine base editing system PBE mainly consists of the following modules:
    1) a cytosine deaminase, which deaminates cytosine (C) to uracil (U);
    2) nCas9(D10A), which provides sgRNA-programmable targeting for DNA base editing and promotes the endogenous mismatch repair (MMR) pathway;
    3) a uracil DNA glycosylase inhibitor (UGI), which inhibits endogenous uracil DNA glycosylase (UDG) activity and prevents U from being converted into an AP site.
  • MS2 is a commonly used RNA aptamer. Studies have shown that an scRNA formed by adding two MS2 hairpins to the 3' end of esgRNA can efficiently mediate CRISPRa in human cells. Therefore, the scRNA vector pOsU3-esgRNA-2×MS2, driven by the OsU3 promoter, was constructed first (Figure 1b).
  • T2A "self-cleaving" peptides were used to express multiple protein modules simultaneously, giving constructs PBEc1 to PBEc5. In these constructs, nCas9(D10A), fused or not fused with APOBEC1 or UGI, served as the RNA-programmable module. The MS2-binding protein MCP, fused with APOBEC1 or UGI, served as the recruited module. All PBEc vectors are codon-optimized and driven by the maize Ubi-1 promoter (Figure 1c).
  • The BFP-to-GFP reporter system was used to evaluate the C>T efficiency of the PBEc vectors.
  • GFP fluorescence in the reporter requires that His66 of BFP, encoded by CAC, be changed to Tyr66, encoded by TAC. An scRNA plasmid, esgRNA-2×MS2-BFP, targeting this site was therefore constructed, with the target C at position 4 counting from the PAM-distal end.
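The reporter logic can be checked at the codon level. In this sketch the codons are written on the coding strand; this is an assumption, since the protospacer orientation is not stated here. Only the four codons needed are included in the lookup table.

```python
# Coding-strand codons relevant to the BFP-to-GFP reporter (His66 -> Tyr66).
CODON_TO_AA = {"CAC": "His", "CAT": "His", "TAC": "Tyr", "TAT": "Tyr"}


def c_to_t(codon: str, index: int) -> str:
    """Convert the C at 0-based `index` of a codon to T (one CBE event)."""
    if codon[index] != "C":
        raise ValueError(f"position {index} of {codon} is not C")
    return codon[:index] + "T" + codon[index + 1:]


his66 = "CAC"
print(CODON_TO_AA[c_to_t(his66, 0)])  # first C edited
print(CODON_TO_AA[c_to_t(his66, 2)])  # third C edited
```

Only conversion of the first C restores Tyr66. Conversion of the third C alone gives CAT, which still encodes His, so a GFP signal reports editing of the intended base. Converting both Cs gives TAT, which also encodes Tyr.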
  • Rice protoplasts were transformed by PEG-mediated transformation with different combinations of PBEc and esgRNA-2×MS2-BFP. PBE with sgRNA-BFP served as the control group, GFP as the positive control, and untransformed rice protoplasts as the negative control.
  • PBEc4 gives high product purity, with no obvious non-target products (≤0.04%). Its indel frequency (0.04-0.29%) is comparable to that of untreated rice protoplasts (0.01-0.32%) and far lower than that of Cas9 (4.82-11.75%) (Figure 5).
  • Thus, the scRNA-programmed nCas9(D10A) complex, by simultaneously recruiting APOBEC1 and UGI, enhances C>T base editing activity. The PBEc4 vector architecture was therefore chosen for further study.
  • An MS2 hairpin was incorporated into the tetraloop and stem loop 2 of sgRNA to construct sgRNA2.0 to sgRNA4.0 (Figure 2b and Figures 16, 17). It was also incorporated into the tetraloop and stem loop 2 of esgRNA, which carries an A-U or C-G flip and an extended hairpin.
  • sgRNA7-1, sgRNA7-2 and sgRNAB.0 were also constructed; they contain two variants of the PP7 hairpin and the boxB hairpin, respectively (Figure 2b and Figures 16, 17).
  • scRNAs were also constructed by attaching hairpins of 1 or 2 RNA aptamers (including MS2, PP7, boxB and com) to the 3' end (Figure 2b and Figures 16, 17).
  • A combination of the above two strategies was used to construct sgRNA4.0 containing MS2 hairpins (Figure 2b and Figures 16, 17).
  • In the reporter system, PBEc4 combined with esgRNA-2×MS2, esgRNA-3×MS2 or sgRNA4.0 gave C>T editing efficiencies of 7.47%, 8.00% and 8.83%, respectively. PBEc8 combined with esgRNA-2×com gave 6.90%. All four were higher than PBE combined with esgRNA (6.03%) (Figure 2c).
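The improvement over PBE can be expressed as a fold change. The sketch uses only the mean reporter efficiencies quoted above and ignores replicate variance, so it is illustrative rather than a statistical comparison.

```python
# Mean C>T efficiencies (%) in the BFP-to-GFP reporter, as reported above.
BASELINE = ("PBE + esgRNA", 6.03)
TESTED = {
    "PBEc4 + esgRNA-2xMS2": 7.47,
    "PBEc4 + esgRNA-3xMS2": 8.00,
    "PBEc4 + sgRNA4.0": 8.83,
    "PBEc8 + esgRNA-2xcom": 6.90,
}


def fold_change(values: dict[str, float], baseline: float) -> dict[str, float]:
    """Efficiency of each combination relative to the baseline editor."""
    return {name: value / baseline for name, value in values.items()}


for name, fc in fold_change(TESTED, BASELINE[1]).items():
    print(f"{name}: {fc:.2f}x of {BASELINE[0]}")
```

The ratios range from about 1.14-fold (PBEc8 with esgRNA-2×com) to about 1.46-fold (PBEc4 with sgRNA4.0).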
  • The high-efficiency scRNAs esgRNA-2×MS2, esgRNA-3×MS2, sgRNA4.0 and esgRNA-2×com were then tested at rice endogenous target sites.
  • Five endogenous target sites were cloned into the four scRNA vectors, which were co-transformed into rice protoplasts with the cognate PBEc4 or PBEc8.
  • As controls, sgRNA or esgRNA vectors containing the same target sites were co-transformed into rice protoplasts with PBE or Cas9, respectively. After 60 hours of culture at 22°C, protoplast DNA was extracted and analyzed by amplicon NGS.
  • The C>T editing efficiencies mediated by esgRNA-2×MS2, esgRNA-3×MS2 and esgRNA-2×com were 2.31- to 3.75-fold higher than those of sgRNA (Figure 2d, Figure 6). All three share the same main editing window, C3 to C9 (Figure 2d).
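Candidate bases in the C3 to C9 window can be listed from a protospacer sequence. This minimal sketch makes these assumptions:
- The protospacer is a 20-nt SpCas9 protospacer, numbered 1 to 20 from the PAM-distal end.
- The window is treated as a hard boundary; in practice editing efficiency varies by position.
- The protospacer used is hypothetical.

```python
PROTOSPACER_LEN = 20


def window_targets(protospacer: str, base: str, window: tuple[int, int]) -> list[int]:
    """1-based positions (counted from the PAM-distal end) where `base`
    occurs inside the inclusive editing window."""
    if len(protospacer) != PROTOSPACER_LEN:
        raise ValueError("expected a 20-nt SpCas9 protospacer")
    lo, hi = window
    return [pos for pos in range(lo, hi + 1) if protospacer[pos - 1] == base]


def predict_cbe(protospacer: str, window: tuple[int, int] = (3, 9)) -> str:
    """Upper-bound outcome: every C in the C3-C9 window converted to T."""
    seq = list(protospacer)
    for pos in window_targets(protospacer, "C", window):
        seq[pos - 1] = "T"
    return "".join(seq)


guide = "GACCTAGCGTACGATCGATC"  # hypothetical protospacer, PAM not shown
print(window_targets(guide, "C", (3, 9)))
print(predict_cbe(guide))
```

For this protospacer, C3, C4 and C8 fall in the window. `predict_cbe` returns the product in which all three are converted, which is an upper bound rather than the expected distribution of reads.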
  • In rice protoplasts, the different scRNAs also gave high product purity (>99.68%) (Figure 7a). Their indel frequencies were low (≤0.56%) and much lower than those of Cas9 (≤21.21%) (Figure 7b).
  • Incorporating RNA aptamers into sgRNA therefore provides an effective solution for multiplex recruitment by RNA-programmed nCas9(D10A) in plants.
  • The selected esgRNA-2×MS2, esgRNA-3×MS2 and esgRNA-2×com can serve as candidates for mediating CBE function in the multiplex genome editing system.
  • PABE-7 mainly consists of the following modules:
    1) a heterodimer of wild-type adenine deaminase ecTadA and the artificially evolved deoxyadenosine deaminase ecTadA7.10;
    2) nCas9(D10A), as in the PBE system;
    3) three copies of the SV40 NLS at the C-terminus of nCas9(D10A).
  • PABEc1, which is recruited by esgRNA-2×MS2, was first constructed based on PBEc4 (Figure 3a). The mGFP reporter system was used to test PABEc1-mediated A>G base editing activity.
  • The ecTadA-ecTadA7.10 heterodimer was fused to the C-terminus of nCas9(D10A) on PABEc3 to construct PABEc4 (Figure 3a).
  • The A>G base editing efficiency of PABEc4 was 10.60%, still lower than that of PABE-7 combined with esgRNA (Figure 3b). This result suggests that optimizing the PABEc architecture offers limited room for improving its A>G base editing activity in plants.
  • The PABEc3 architecture was therefore used to develop the multiplex system further, and other RNA aptamers were tested to enhance its activity.
  • The C-terminal MCP of PABEc3 was replaced with PCP, N22p or Com to construct PABEc5, PABEc6 and PABEc7 (Figure 3c). These recognize the PP7, boxB and com RNA hairpins, respectively.
  • In the reporter system, the A>G activity of PABEc6 combined with esgRNA-2×boxB was 26.53%, slightly higher than that of PABE-7 combined with esgRNA (25.57%) (Figure 3d).
  • For the other RNA aptamers, the combinations with the highest A>G activity in the reporter system were PABEc3 with esgRNA-2×MS2+f6 (18.07%), PABEc5 with esgRNA-1×PP7-1 (21.03%), and PABEc7 with esgRNA-2×com (22.47%). All three were lower than PABE-7 combined with esgRNA (Figure 3d).
  • Vectors PABEc8, PABEc9 and PABEc10 were also constructed on the PABEc2 architecture, with the PCP, N22p or Com binding protein at the N-terminus of the adenine deaminase (Figure 9a). However, in the mGFP reporter system every tested combination had an efficiency (≤21.23%) lower than that of PABE-2 combined with sgRNA (22.87%) (Figure 9b).
  • PBEc combined with scRNAs carrying RNA hairpins in the tetraloop and stem loop 2 gave a lower GFP fluorescence signal. PABEc combined with these scRNAs likewise gave a lower signal (Figure 3d). Therefore, scRNAs with 3'-end RNA hairpins mediate C>T and A>G editing more efficiently than scRNAs with hairpins in the tetraloop and stem loop 2.
  • The average efficiency in the main editing window, A4 to A8, was 4.65%, comparable to that of PABE-2 combined with sgRNA (average 4.78%) (Figure 3e, Figure 10a). N22p, at 33 amino acids, is the shortest of the tested binding proteins. It is therefore speculated that the activity of the ecTadA-ecTadA7.10 heterodimer depends both on the position of the binding protein and on its length.
  • To exploit the ability of nCas9(D10A) to mediate multiplex genome editing, SWISSv1.1 was first built on PBEc4 (Figure 4a). It co-expresses one esgRNA-2×MS2 and a pair of sgRNAs and simultaneously produces cytosine base editing and paired-nCas9-mediated DSBs at different target sites.
  • The maize Ubi-1 promoter and T2A "self-cleaving" peptides were used to co-express nCas9(D10A), MCP-APOBEC1-UGI and ecTadA-ecTadA7.10-N22p, giving the MGE vector (Figure 13a).
  • The TaU6 promoter drives esgRNA-2×MS2 for C>T editing, and the OsU3 promoter drives esgRNA-2×boxB for A>G editing (Figure 13b). Combined with MGE, these form SWISSv2 (Figure 4c).
  • SWISSv3 provides an alternative solution for gene stacking and heritable modification in plants.
  • Paired sgRNAs are designed according to the following rules: (1) the PAMs face outward (PAM-out orientation); (2) the distance between the nick sites is 40-68 bp.
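The two design rules can be checked computationally. This minimal sketch makes these assumptions:
- The guides are SpCas9-type guides with 20-nt protospacers.
- Each nick is placed 3 nt upstream of its PAM.
- Coordinates are given on the reference (top) strand.

The `Guide` class and `check_pair` function are hypothetical helpers.

```python
from dataclasses import dataclass

PROTOSPACER_LEN = 20
NICK_OFFSET = 3  # SpCas9 cuts 3 nt upstream (5') of the PAM


@dataclass(frozen=True)
class Guide:
    strand: str  # "+" or "-" relative to the reference (top) strand
    start: int   # 0-based top-strand coordinate of the leftmost protospacer base

    def nick_site(self) -> int:
        """Top-strand coordinate of the nick (number of bases to its left)."""
        if self.strand == "+":
            # Protospacer occupies [start, start+20); the PAM lies to the right.
            return self.start + PROTOSPACER_LEN - NICK_OFFSET
        # "-" guide: the PAM (CCN on the top strand) lies to the left of start.
        return self.start + NICK_OFFSET


def check_pair(left: Guide, right: Guide,
               min_gap: int = 40, max_gap: int = 68) -> tuple[bool, int]:
    """Apply both rules: PAM-out orientation and 40-68 bp between nicks."""
    pam_out = left.strand == "-" and right.strand == "+"
    distance = right.nick_site() - left.nick_site()
    return pam_out and min_gap <= distance <= max_gap, distance


print(check_pair(Guide("-", 100), Guide("+", 130)))  # (True, 44)
print(check_pair(Guide("+", 100), Guide("-", 130)))  # (False, 16): PAM-in
```

In the sketch, PAM-out means the left guide lies on the bottom strand with its PAM facing left, and the right guide lies on the top strand with its PAM facing right.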
  • SWISSv3 can also produce single mutants at each of the three target sites (Table 4).
  • SWISSv3, using scRNAs, can serve as a triple-function, comprehensive programmable genome editing system in plants, which will benefit crop molecular breeding.
  • a Number of T0 rice lines carrying the observed mutations relative to the total number of T0 transgenic rice lines analyzed.
  • b Indel genotypes were analyzed with the online tools DSDecodeM (reference) and TIDE (reference).
  • Because T2A is used to co-express multiple modules, it was speculated that the "self-cleavage" efficiency of T2A could affect product purity at CBE or ABE target sites (Figure 15a). As shown in Figure 15a, T2A-mediated "self-cleavage" occurs when the ribosome skips formation of the glycyl-prolyl peptide bond at the C-terminus of T2A. The position of T2A affects how well the multiple modules are co-expressed.
  • In MGE, successful skipping produces three independent target proteins, whereas failed skipping produces unintended fusion proteins. In particular, the MCP-APOBEC1-UGI-T2A-ecTadA-ecTadA7.10-N22p fusion protein can be recruited by both esgRNA-2×MS2 and esgRNA-2×boxB. Such recruitment would produce unintended C editing at the ABE target site or unintended A editing at the CBE target site, that is, undesired off-target edits. The amplicon NGS results from SWISSv2 and SWISSv3 in rice protoplasts were therefore analyzed for cytosine editing at ABE target sites and adenine editing at CBE target sites.
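The possible MGE translation products can be enumerated by treating each of the two T2A junctions as either skipped or read through. The sketch assumes the module order given for the MGE vector: nCas9(D10A), then MCP-APOBEC1-UGI, then ecTadA-ecTadA7.10-N22p. It says nothing about how often each outcome occurs.

```python
from itertools import product

# MGE polycistron order as described: three modules joined by two T2A peptides.
MODULES = ["nCas9(D10A)", "MCP-APOBEC1-UGI", "ecTadA-ecTadA7.10-N22p"]


def translation_products(skips_ok: tuple[bool, ...]) -> list[str]:
    """Proteins made when each T2A junction skips (True) or reads through (False)."""
    proteins = [MODULES[0]]
    for ok, module in zip(skips_ok, MODULES[1:]):
        if ok:
            proteins.append(module)               # separate polypeptide
        else:
            proteins[-1] += "-T2A-" + module      # read-through fusion
    return proteins


def dual_recruitable() -> set[str]:
    """Products carrying both MCP (binds MS2) and N22p (binds boxB)."""
    found = set()
    for outcome in product([True, False], repeat=len(MODULES) - 1):
        for protein in translation_products(outcome):
            if "MCP" in protein and "N22p" in protein:
                found.add(protein)
    return found


for protein in sorted(dual_recruitable()):
    print(protein)
```

The enumeration finds two products that carry both MCP and N22p. One is the MCP-APOBEC1-UGI-T2A-ecTadA-ecTadA7.10-N22p fusion discussed above. The other is a full read-through product that also contains nCas9(D10A).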
  • a The PAM motif is shown in bold and underlined; b NA, not available.
  • Multiplex genome editing systems using multiple-sgRNA strategies fall into two categories. The first performs the same type of genome editing at different target sites. The second, proposed in this study, performs different types of genome editing at different target sites.
  • To our knowledge, the SWISS system developed in this study is the first to use a single programmable Cas protein to mediate multiple different types of genome editing simultaneously in plants. Such multiplex editing could also be achieved with orthologous CRISPR/Cas proteins, but this approach has two drawbacks. Vectors carrying multiple orthologous proteins are larger, which hinders particle bombardment-mediated transformation. Their PAM requirements are also more restrictive.
  • The SWISS system uses only one type of nCas9(D10A), which alleviates both problems. In particular, a Cas9 variant that recognizes NG PAMs can further expand the editing scope of SWISS (Figure 4e, Table 2).
  • The RNA polymerase III promoters OsU3 and TaU6 were used to express multiple sgRNAs.
  • Other multiple-sgRNA strategies could also be used to optimize the SWISS system further, such as producing multiple sgRNAs with the Csy4 RNA ribonuclease or with ribozymes.
  • The average C>T activity of the scRNA-recruited architecture is higher than that of PBE and is not accompanied by a wider base editing window. This strategy can be used to increase the editing activity of narrow-window cytosine deaminase variants.
  • Although the A>G activity of the scRNA-recruited architecture is only comparable to that of PABE-2, it is sufficient for SWISSv3 to generate rice A>G mutants.
  • Using other RNA aptamers did not improve the efficiency of the PABEc architecture. This also means that the room for optimizing PABEc is relatively limited, and more efficient adenine deaminases need to be developed.
  • The dual-function SWISSv1.1 and SWISSv1.2 systems could also be implemented with PBE and PABE combined with multiple-sgRNA strategies.
  • The RNA aptamer recruitment strategy in this study offers an alternative approach that is particularly advantageous for multiplex genome editing in plants overexpressing nCas9(D10A). In the future, nCas9(D10A)-overexpressing rice could therefore serve as a development platform. Only the multiple sgRNAs and the base-editing recruitment modules would then need to be introduced, and a second transformation would realize the multiplex editing functions of SWISS. This would also reduce undesired off-target editing and benefit the molecular design breeding of crops.
  • A quadruple-function CRISPR system can also be realized.
  • The SWISS system can also use random, multiplex sgRNA strategies for directed evolution of plant endogenous genes. It also has applications beyond plants, such as changing cell fate or regulating metabolic pathways.


Abstract

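Provided are a multiplex genome editing method and system suitable for plants, in particular crops. The system and method are based on the nCas9 nuclease and can achieve different types of genome editing simultaneously.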

Description

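Multiplex Genome Editing Method and System
Technical Field
The present invention relates to the field of plant genetic engineering. Specifically, it relates to a multiplex genome editing method and system suitable for plants, in particular crops. More specifically, it relates to systems and methods, based on a CRISPR nickase, that can achieve different types of genome editing simultaneously.
Background of the Invention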
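Clustered Regularly Interspaced Short Palindromic Repeats and their CRISPR-associated systems (CRISPR/Cas) are a programmable molecular biology technology that has greatly advanced molecular biology. Among Class 2 systems, a growing number of Cas proteins have been discovered and engineered. These include Cas9, which targets DNA; Cas12, which targets single-stranded DNA (ssDNA) and RNA; Cas13, which targets RNA; and the CAST system, which is used to insert DNA. The diversity and simplicity of CRISPR/Cas systems make them a powerful molecular toolbox.

Cas proteins can also be engineered into variants lacking nuclease activity. The Cas9 protein from Streptococcus pyogenes (SpCas9) contains two nuclease domains: RuvC cleaves the non-target strand and HNH cleaves the target strand. Substituting alanine (Ala) for the aspartic acid at position 10 (Asp10) or for the histidine at position 840 (His840) therefore converts SpCas9 into a nickase, nCas9 (Nickase Cas9). Substituting alanine for both Asp10 and His840 abolishes nuclease activity and yields dCas9 (deactivated Cas9).

These variants have made the CRISPR/Cas9 system a genome editing toolbox (Figure 1a):
- Cas9 is used to generate double-strand breaks (DSBs) in the genome.
- Paired nCas9 can also generate highly specific DSBs.
- nCas9(D10A) has been used to develop the single-base editing systems CBE (Cytosine Base Editor) and ABE (Adenine Base Editor).
- dCas9 is often fused to effector proteins to achieve CRISPR interference (CRISPRi), CRISPR activation (CRISPRa), genome imaging, epigenetic modification and other functions.

In most cases, however, these systems perform only one type of genome editing per transformation.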
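The relationship between the catalytic substitutions and the resulting Cas9 activities can be written as a small classifier. The sketch relies only on the domain assignments stated above: RuvC cuts the non-target strand and HNH cuts the target strand. The function name is hypothetical.

```python
def classify_spcas9(mutations: set[str]) -> tuple[str, list[str]]:
    """Classify an SpCas9 variant from its catalytic-residue substitutions.

    RuvC (inactivated by D10A) cuts the non-target strand;
    HNH (inactivated by H840A) cuts the target strand.
    """
    cut_strands = []
    if "D10A" not in mutations:
        cut_strands.append("non-target")
    if "H840A" not in mutations:
        cut_strands.append("target")
    names = {2: "Cas9 (DSB)", 1: "nCas9 (nickase)", 0: "dCas9 (no cleavage)"}
    return names[len(cut_strands)], cut_strands


for muts in (set(), {"D10A"}, {"H840A"}, {"D10A", "H840A"}):
    print(sorted(muts), classify_spcas9(muts))
```

In this model nCas9(D10A), the nickase used throughout SWISS, retains only HNH activity and therefore nicks the target strand.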
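Several strategies have been developed to implement multiplex, comprehensive, programmable genome editing:
- One strategy regulates gene expression by using truncated sgRNAs or crRNAs to control the nuclease activity of Cas9 or Cas12a, while full-length sgRNAs or crRNAs generate DSBs at another site.
- A second strategy integrates the hairpin structures of RNA aptamers into the sgRNA scaffold to form scaffold RNAs (scRNAs). The dCas9/scRNA complex recruits gene activators or repressors through the hairpins, achieving transcriptional activation and repression simultaneously at different sites.
- A third strategy uses multiple orthologous CRISPR systems to achieve gene activation, repression and deletion simultaneously at different target sites.

However, most of these multiplex genome engineering strategies were developed in bacteria, yeast and human cells. Delivery methods and PAM requirements make it challenging to build a multiplex genome editing system in plants from different orthologous CRISPR systems. In addition, homologous recombination (HR) in plants remains inefficient. Stacking multiple important agronomic traits at the genetic level, or altering gene regulatory networks, is of great significance to breeders. There is therefore an urgent need in the art for methods and systems that achieve multiplex genome editing in plants such as crops.
Summary of the Invention
Compared with Cas9 and dCas9, the potential of nCas9 for multiplex genome editing has not been fully exploited. The present invention provides an nCas9 nuclease-based multiplex genome editing system named Simultaneous and Wide-editing Induced by Single System (SWISS) (Figure 1a). SWISS uses two scRNAs containing different RNA aptamers. One recruits a cytosine deaminase and the other an adenine deaminase, each fused to the corresponding RNA aptamer-binding protein, so both CBE and ABE editing can be achieved at different target sites in a single transformation. Introducing paired sgRNAs into the SWISS system additionally generates a DSB at a third target site, making SWISS a CRISPR system with triple editing functions (Figure 1a).
Brief Description of the Drawings
Figure 1. Optimization of RNA aptamer-recruited plant cytosine base editor constructs. (a) Multiplex genome editing system based on the nCas9 nuclease and programmed by CRISPR scaffold RNAs. (b) Structure of the pOsU3-esgRNA-2×MS2 construct, with two MS2 hairpins at the 3' end of the esgRNA. (c) Architectures of PBEc1 to PBEc5. Abbreviations: XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator. (d) Comparison of C>T conversion in the BFP-to-GFP reporter system induced by PBE and the five PBEcs in rice protoplasts (n=3). Values and error bars represent the mean ± s.e.m. of three independent experiments. (e) Comparison of C>T editing frequencies at rice endogenous genes induced by PBE and the five PBEcs (n=3). Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments.
Figure 2. Multiple scaffold RNAs and binding-protein orthologs can efficiently mediate C-to-T conversion. (a) Architectures of PBEc6 to PBEc8. (b) Schematic of scRNAs carrying MS2, PP7, boxB or com RNA hairpins either in the tetraloop and stem loop 2 of sgRNA and esgRNA or at the 3' end. (c) Comparison of C>T conversion in the BFP-to-GFP reporter system induced by various scRNAs and their cognate PBEcs in rice protoplasts (n=3). Values and error bars represent the mean ± s.e.m. of three independent experiments. (d) Comparison of C>T editing frequencies at rice endogenous genes induced by four scRNAs and their cognate PBEcs (n=3). Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments. (e) The scRNA recruitment strategy enhances the C>T editing frequencies of narrow-window APOBEC1 variants (n=3). Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments.
Figure 3. Optimization of plant adenine base editor constructs using multiple scaffold RNAs and binding-protein orthologs. (a) Architectures of PABEc1 to PABEc4. Abbreviations: ecTadA7.10, evolved E. coli TadA; aa, amino acids; XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator. (b) Comparison of A>G conversion in the mGFP-to-GFP reporter system induced by PABE and the four PABEcs in rice protoplasts (n=3). Values and error bars represent the mean ± s.e.m. of three independent experiments. (c) Architectures of PABEc5 to PABEc7. (d) Comparison of A>G conversion in the mGFP-to-GFP reporter system induced by various scRNAs and their cognate PABEcs in rice protoplasts (n=3). Values and error bars represent the mean ± s.e.m. of three independent experiments. (e) Comparison of A>G editing frequencies at rice endogenous genes induced by five scRNAs and their cognate PABEcs (n=3). Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments.
Figure 4. Simultaneous multiplex genome editing programmed by CRISPR scaffold RNAs on the nCas9(D10A) platform in rice protoplasts. (a) CBE and DSB induced simultaneously by PBEc4 with esgRNA-2×MS2 and paired sgRNAs. Left, schematic of the SWISSv1.1 strategy. Right, two sets of sgRNAs were tested (n=3). The CBE target on esgRNA-2×MS2 and the paired sgRNAs for DSB were assembled in the same vector. Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments. (b) ABE and DSB induced simultaneously by PABEc6 with esgRNA-2×boxB and paired sgRNAs. Left, schematic of the SWISSv1.2 strategy. Right, two sets of sgRNAs were tested (n=3). The ABE target on esgRNA-2×boxB and the paired sgRNAs for DSB were assembled in the same vector. Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments. (c) Simultaneous CBE and ABE induced by MGE with esgRNA-2×MS2 and esgRNA-2×boxB. Left, schematic of the SWISSv2 strategy. Right, two sets of sgRNAs were tested (n=3). A CBE target on esgRNA-2×MS2 and an ABE target on esgRNA-2×boxB were assembled in the same vector. Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments. (d) Simultaneous CBE, ABE and DSB induced by MGE with esgRNA-2×MS2, esgRNA-2×boxB and paired sgRNAs. Top, schematic of the SWISSv3 strategy. Bottom, two sets of sgRNAs were tested (n=3). The following were assembled in the same vector: a CBE target on esgRNA-2×MS2, an ABE target on esgRNA-2×boxB, and paired sgRNAs for DSB. Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments. (e) The nCas9-NG PAM variant expands the scope of the SWISSv2 and SWISSv3 multiplex genome editing strategies. Two sets of sgRNAs for SWISSv2 and one set for SWISSv3 were tested (n=3). Multiple sgRNAs were assembled in the same vector. Untreated protoplast samples were used as controls. Values and error bars represent the mean ± s.e.m. of three independent experiments.
Figure 5. Indel efficiencies of PBEcs and Cas9 in rice protoplasts. The combinations compared are PBE+sgRNA, PBEc1+esgRNA-2×MS2, PBEc2+esgRNA-2×MS2, PBEc3+esgRNA-2×MS2, PBEc4+esgRNA-2×MS2, PBEc5+esgRNA-2×MS2 and Cas9+sgRNA. Values and error bars represent the mean ± s.e.m. of three independent biological replicates.
Figure 6. Base editing efficiencies at endogenous genes of different scRNAs with cognate PBEcs in rice protoplasts. Data are shown as box plots (center line, median; box limits, 25th and 75th percentiles; whiskers extend to the minimum and maximum values). The data in each box plot comprise three independent experiments (n=39).
Figure 7. Product purity and indel frequencies of esgRNA-2×MS2, esgRNA-3×MS2, sgRNA4.0 and esgRNA-2×com with the related PBEcs in rice protoplasts. (a) Product distribution among edited DNA sequencing reads for esgRNA-2×MS2, esgRNA-3×MS2, sgRNA4.0 and esgRNA-2×com with cognate PBEcs in rice protoplasts. Values and error bars represent the mean ± s.e.m. of three independent experiments. (b) Indel efficiencies of esgRNA-2×MS2, esgRNA-3×MS2, sgRNA4.0 and esgRNA-2×com with cognate PBEcs in rice protoplasts. Values and error bars represent the mean ± s.e.m. of three independent biological replicates.
Figure 8. C-to-T editing frequencies of scaffold RNA-recruited narrow-window APOBEC1 variants in rice protoplasts. (a) Architectures of YE1-PBE, YE2-PBE, EE-PBE and YEE-PBE. Abbreviations: XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator. (b) Architectures of YE1-PBEc4, YE2-PBEc4, EE-PBEc4 and YEE-PBEc4. Abbreviations: XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator. (c) Activity of narrow-window APOBEC1 variants in the nCas9 fusion architecture versus the scRNA recruitment architecture. OsEV and OsOD were tested. One of three independent biological replicates is shown.
Figure 9. Activity of PABEc8 to PABEc10, which carry the binding protein at the N-terminus of the adenosine deaminase. (a) Architectures of PABEc8 to PABEc10. Abbreviations: ecTadA7.10, evolved E. coli TadA; aa, amino acids; XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator. (b) Comparison of A>G conversion in the mGFP-to-GFP reporter system induced by various scRNAs and their cognate PABEcs in rice protoplasts (n=3). Values and error bars represent the mean ± s.e.m. of three independent experiments.
Figure 10. Activity, product purity and indel frequencies of selected scaffold RNAs with cognate PABEcs in rice protoplasts. (a) Activity of esgRNA-2×MS2, esgRNA-MS2+f6, esgRNA-1×PP7-1, esgRNA-2×boxB and esgRNA-2×com with cognate PABEcs in rice protoplasts. The OsALS-T1, OsCDC48, OsDEP1-T1, OsNRT1.1B, OsEV and OsOD targets were tested. One of three independent biological replicates is shown. (b) Product distribution among edited DNA sequencing reads for esgRNA-2×MS2, esgRNA-MS2+f6, esgRNA-1×PP7-1, esgRNA-2×boxB and esgRNA-2×com with the associated PABEcs in rice protoplasts. Values and error bars represent the mean ± s.e.m. of three independent experiments. (c) Indel efficiencies of esgRNA-2×MS2, esgRNA-MS2+f6, esgRNA-1×PP7-1, esgRNA-2×boxB and esgRNA-2×com with cognate PABEcs in rice protoplasts. Values and error bars represent the mean ± s.e.m. of three independent biological replicates.
Figure 11. Schematic of multiple-sgRNA assembly for SWISSv1.1 and SWISSv1.2. (a) Paired sgRNA assembly. The paired sgRNAs (sgL and sgR) were designed in the PAM-out orientation, with 40-68 bp between the nick sites. PCR products amplified from the esgRNA-pTaU6 template were inserted into the BsaI sites of pOsU3-esgRNA by Golden Gate Assembly. (b) CBE target and paired sgRNA assembly. The CBE target was inserted into the BsaI sites of pOsU3-esgRNA-2×MS2, and the pOsU3-CBE target-esgRNA-2×MS2 portion was then amplified. A PCR product carrying the paired sgRNAs was amplified from the paired-sgRNA plasmid. The two PCR products were assembled into the EcoRI- and HindIII-digested pOsU3-esgRNA backbone by multi-step cloning. (c) ABE target and paired sgRNA assembly. The ABE target was inserted into the BsaI sites of pOsU3-esgRNA-2×boxB, and the pOsU3-ABE target-esgRNA-2×boxB portion was then amplified. A PCR product carrying the paired sgRNAs was amplified from the paired-sgRNA plasmid. The two PCR products were assembled into the EcoRI- and HindIII-digested pOsU3-esgRNA backbone by multi-fragment one-step cloning.
Figure 12. Distribution of deletion products among indel sequencing reads for SWISSv1.1, SWISSv1.2 and SWISSv3. Values and error bars represent the mean ± s.e.m. of three independent experiments.
Figure 13. Schematic of multiple-sgRNA assembly for SWISSv2 and SWISSv3. (a) Architectures of MGE and MGE-NG. Abbreviations: ecTadA7.10, evolved E. coli TadA; aa, amino acids; XTEN, 16-aa linker; NLS, nuclear localization signal; CaMV, cauliflower mosaic virus; Term, terminator. (b) CBE target and ABE target assembly. PCR products amplified from the esgRNA-2×boxB-pTaU6 template were inserted into the BsaI sites of pOsU3-esgRNA-2×MS2 by Golden Gate Assembly. (c) CBE target, ABE target and paired sgRNA assembly. A PCR product carrying the CBE and ABE targets was amplified from the CBE-target/ABE-target dual-sgRNA plasmid. A PCR product carrying the paired sgRNAs was amplified from the paired-sgRNA plasmid. The two PCR products were assembled into the EcoRI- and HindIII-digested pOsU3-esgRNA backbone by multi-fragment one-step cloning.
Figure 14. Simultaneous CBE, ABE and DSB in rice. (a) Schematic of the binary vector carrying the CBE target, ABE target and paired sgRNAs. A PCR product carrying the CBE and ABE targets was amplified from the CBE-target/ABE-target dual-sgRNA plasmid. A PCR product carrying the paired sgRNAs was amplified from the paired-sgRNA plasmid. The two PCR products were assembled into the HindIII-digested pH-MGE binary vector by multi-fragment one-step cloning. (b) T7E1 assay results for OsALS-T2, OsACC-T2 and OsBADH2 indel mutants. Target C/A bases are highlighted in red; PAM sequences are shown in brown. Ten T0 seedlings (T0-1 to T0-10) were analyzed. WT/D and WT/U denote wild-type (WT) control genomic DNA amplicons with and without T7E1 digestion, respectively. A total of 55 mutants were identified. Bands marked with red arrows are diagnostic of positive genome editing. Sequences were determined by Sanger sequencing, and indel chromatograms were further analyzed with the online tools DSDecodeM and TIDE.
Figure 15. Analysis of undesired off-target editing caused by SWISSv2 and SWISSv3.
(a) Schematic of possible MGE protein products mediated by the "self-cleaving" T2A peptide. T2A-mediated "self-cleavage" occurs when the ribosome skips formation of the glycyl-prolyl peptide bond at the C-terminus of T2A. The position of T2A affects the expression levels of the polycistronic construct. Successful skipping produces three independent proteins as designed, but skipping can also fail and produce fusion proteins. In particular, an MCP-APOBEC1-UGI-T2A-ecTadA-ecTadA7.10-N22p fusion protein can be recruited by both esgRNA-2×MS2 and esgRNA-2×boxB. This leads to undesired cytosine editing at the ABE target and undesired adenine editing at the CBE target.
(b) Efficiency of undesired off-target editing caused by SWISSv2 and SWISSv3. Cytosines at the ABE targets and adenines at the CBE targets were analyzed. Values and error bars represent the mean ± s.e.m. of three independent experiments.
Figure 16. Various designed scRNA/sgRNA structures.
Figure 17. Various designed scRNA/sgRNA structures.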
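Detailed Description of the Invention
I. Definitions
In the present invention, unless otherwise specified, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Moreover, the terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are those widely used in the respective fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). To aid understanding of the invention, definitions and explanations of relevant terms are provided below.
As used herein, the term "and/or" encompasses all combinations of the items it connects and shall be regarded as though each combination had been listed individually herein. For example, "A and/or B" covers "A", "A and B" and "B"; and "A, B and/or C" covers "A", "B", "C", "A and B", "A and C", "B and C" and "A and B and C".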
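The combinatorial reading of "and/or" can be made concrete with a short sketch. This is only an illustration, assuming the connected items are plain labels; the helper name `and_or_expansions` is hypothetical.

```python
from itertools import combinations


def and_or_expansions(items):
    """List every non-empty combination covered by 'X and/or Y', smallest first."""
    expansions = []
    for size in range(1, len(items) + 1):
        # combinations() keeps the input order inside each tuple
        for combo in combinations(items, size):
            expansions.append(" and ".join(combo))
    return expansions


print(and_or_expansions(["A", "B", "C"]))
# ['A', 'B', 'C', 'A and B', 'A and C', 'B and C', 'A and B and C']
```

For n connected items this yields 2^n - 1 combinations, which matches the seven listed above for "A, B and/or C".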
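When the word "comprising" is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of that sequence, or may carry additional amino acids or nucleotides at one or both ends while still having the activity described herein. Furthermore, those skilled in the art understand that the N-terminal methionine encoded by the start codon may be retained in some practical situations (for example, when expressed in a particular expression system) without substantially affecting the function of the polypeptide. Therefore, when a specific polypeptide amino acid sequence is described in the specification and claims of this application, sequences that include this N-terminal methionine are also covered even if the described sequence does not include it, and the corresponding encoding nucleotide sequence may likewise include the start codon; and vice versa.
"Genome" as used herein covers not only chromosomal DNA present in the nucleus but also organellar DNA present in subcellular components of the cell (such as mitochondria and plastids).
"Genetically modified plant" means a plant whose genome contains an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide can be stably integrated into the plant genome and inherited over successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one that contains one or more deoxynucleotide substitutions, deletions or additions in the plant genome.
"Exogenous", with respect to a sequence, means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been substantially altered from its native form by deliberate human intervention.
"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytidine or deoxycytidine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purines (A or G), "Y" for pyrimidines (C or T), "K" for G or T, "H" for A or C or T, "D" for A, T or G, "I" for inosine, and "N" for any nucleotide.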
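As an illustration, the degenerate codes defined above can be used to test a plain sequence against a pattern such as the SpCas9 PAM 5'-NGG-3'. The sketch below includes only the DNA codes listed in the definition and omits U and I. The names `IUPAC_DNA` and `matches` are hypothetical.

```python
# Degenerate DNA codes as defined in the text (U and I omitted: RNA / inosine only).
IUPAC_DNA = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "K": {"G", "T"},
    "H": {"A", "C", "T"},
    "D": {"A", "G", "T"},
    "N": {"A", "C", "G", "T"},  # any nucleotide
}


def matches(pattern: str, seq: str) -> bool:
    """Return True if a plain A/C/G/T sequence is compatible with a degenerate pattern.

    Pattern letters outside IUPAC_DNA raise KeyError; unexpected sequence letters simply fail.
    """
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC_DNA[code] for code, base in zip(pattern.upper(), seq.upper()))


print(matches("NGG", "TGG"), matches("NGG", "TGA"), matches("NG", "CG"))  # True False True
```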
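"Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to polymers of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
As used herein, "expression construct" refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in an organism. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to its transcription (e.g., into mRNA or a functional RNA) and/or to translation of the RNA into a precursor or mature protein.
The "expression construct" of the invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, a translatable RNA (such as mRNA), for example RNA produced by in vitro transcription.
The "expression construct" of the invention may comprise regulatory sequences and a nucleotide sequence of interest from different sources, or from the same source but arranged differently from how they normally occur in nature.
"Regulatory sequence" and "regulatory element" are used interchangeably and refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence that influence transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.
"Promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is one capable of controlling gene transcription in a cell, whether or not it originates from that cell. The promoter may be a constitutive, tissue-specific, developmentally regulated or inducible promoter.
"Constitutive promoter" refers to a promoter that generally causes a gene to be expressed in most cell types under most conditions. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed mainly, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in a particular cell or cell type. "Developmentally regulated promoter" refers to a promoter whose activity is determined by developmental events. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signals, etc.).
Examples of promoters include, but are not limited to, polymerase (pol) I, pol II and pol III promoters. Examples of pol I promoters include the chicken RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter and the simian virus 40 (SV40) immediate early promoter. Examples of pol III promoters include the U6 and H1 promoters. Inducible promoters such as the metallothionein promoter may be used. Other examples of promoters include the T7 phage promoter, T3 phage promoter, β-galactosidase promoter and SP6 phage promoter. For use in plants, the promoter may be the cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, maize U3 promoter or rice actin promoter.
As used herein, the term "operably linked" means that a regulatory element (for example, but not limited to, a promoter sequence or transcription termination sequence) is linked to a nucleic acid sequence (for example, a coding sequence or open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by that transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.
"Introducing" a nucleic acid molecule (e.g., a plasmid, linear nucleic acid fragment or RNA) or a protein into an organism means transforming cells of the organism with the nucleic acid or protein so that it can function in the cells. "Transformation" as used herein includes stable and transient transformation. "Stable transformation" means introducing an exogenous nucleotide sequence into the genome so that the exogenous gene is stably inherited; once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and of any of its successive generations. "Transient transformation" means introducing a nucleic acid molecule or protein into a cell so that it performs its function without stable inheritance of the exogenous gene; in transient transformation, the exogenous nucleic acid sequence is not integrated into the genome.
"Trait" refers to a physiological, morphological, biochemical or physical characteristic of a cell or organism.
"Agronomic trait" refers in particular to measurable parameters of crop plants, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content of vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content of vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content of vegetative tissue, herbicide resistance, drought resistance, nitrogen uptake, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold tolerance, salt tolerance and tiller number.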
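II. Multiplex genome editing system
In one aspect, the invention provides a genome editing system for multiplex editing in plants, particularly crops, comprising:
i) a CRISPR nickase and/or an expression construct comprising a nucleotide sequence encoding the CRISPR nickase; and
ii) one or more, or all, of the following:
ii-1) a first scRNA targeting a first target region in the plant genome and/or an expression construct comprising a nucleotide sequence encoding the first scRNA, the first scRNA comprising at least one first RNA aptamer; and a first fusion protein and/or an expression construct comprising a nucleotide sequence encoding the first fusion protein, the first fusion protein comprising a first RNA aptamer-specific binding protein and a cytosine deamination domain;
ii-2) a second scRNA targeting a second target region in the plant genome and/or an expression construct comprising a nucleotide sequence encoding the second scRNA, the second scRNA comprising at least one second RNA aptamer; and a second fusion protein and/or an expression construct comprising a nucleotide sequence encoding the second fusion protein, the second fusion protein comprising a second RNA aptamer-specific binding protein and an adenine deamination domain;
ii-3) paired gRNAs targeting a third target region in the plant genome and/or an expression construct comprising a nucleotide sequence encoding the paired gRNAs, the two gRNAs of the pair targeting different strands of the DNA in the third target region.
As used herein, "genome editing system" refers to the combination of components required to edit the genome in a cell or organism. The individual components of the system, such as the CRISPR nickase, first scRNA, first fusion protein, second scRNA, second fusion protein, paired gRNAs and their expression vectors, may each exist independently or may be present together in any combination as a composition.
As used herein, "CRISPR nickase" refers to a nickase form of a CRISPR nuclease that introduces a nick in a double-stranded nucleic acid molecule without cleaving both strands, while retaining gRNA-guided, sequence-specific DNA-binding ability.
In some embodiments, the CRISPR nickase is a Cas9 nickase, for example a Cas9 nickase derived from Streptococcus pyogenes Cas9 (SpCas9). In some embodiments, the Cas9 nickase comprises the amino acid sequence shown in SEQ ID NO:25 (nCas9(D10A)).
In some embodiments, the Cas9 nickase is a Cas9 variant nickase that recognizes the PAM sequence 5'-NG-3' and comprises the amino acid sequence shown in SEQ ID NO:48 (nCas9-NG(D10A)).
As used herein, "guide RNA" and "gRNA" are used interchangeably and refer to an RNA molecule that can form a complex with a CRISPR nuclease or a derivative thereof, such as a CRISPR nickase, and that, owing to its identity with a target sequence, can direct the complex to that target sequence. The gRNA targets the target sequence through base pairing with the complementary strand of the target sequence. For example, the gRNA used by a Cas9 nuclease or its derivatives, such as a Cas9 nickase, typically consists of partially complementary crRNA and tracrRNA molecules that form a complex, in which the crRNA contains a guide sequence (also called a spacer) sufficiently identical to the target sequence to hybridize to its complementary strand and to direct sequence-specific binding of the CRISPR complex (Cas9 + crRNA + tracrRNA) to the target sequence. However, it is known in the art that a single guide RNA (sgRNA) combining the features of both crRNA and tracrRNA can be designed.
In some embodiments, the sgRNA comprises the nucleotide sequence shown in SEQ ID NO:3 or SEQ ID NO:4.
As used herein, "RNA aptamer" refers to an RNA molecule capable of specifically binding a particular protein. Examples of RNA aptamers suitable for the invention include, but are not limited to, MS2, PP7, boxB and com, whose corresponding specific binding proteins are MCP (SEQ ID NO:34), PCP (SEQ ID NO:35), N22p (SEQ ID NO:36) and COM (SEQ ID NO:37), respectively.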
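One nickase can carry out different edits at different sites because each scRNA recruits only the fusion protein whose binder matches its aptamer. The sketch below models that lookup. The function and dictionary names are hypothetical, and the two fusions are the recruited modules described in the Examples. The model ignores binding affinity and any cross-reactivity.

```python
# Aptamer -> cognate binding protein pairs listed in the text.
APTAMER_BINDER = {"MS2": "MCP", "PP7": "PCP", "boxB": "N22p", "com": "COM"}


def recruited(scrna_aptamers, fusions):
    """Return names of fusion proteins whose binder recognises an aptamer on the scRNA.

    `fusions` maps a fusion-protein name to the binder it carries (illustrative structure).
    """
    binders_needed = {APTAMER_BINDER[a] for a in scrna_aptamers}
    return sorted(name for name, binder in fusions.items() if binder in binders_needed)


# The two recruited modules used in the Examples (SWISSv2/v3).
FUSIONS = {
    "MCP-APOBEC1-UGI": "MCP",           # cytosine deaminase module
    "ecTadA-ecTadA7.10-N22p": "N22p",   # adenine deaminase module
}

print(recruited(["MS2", "MS2"], FUSIONS))    # ['MCP-APOBEC1-UGI']
print(recruited(["boxB", "boxB"], FUSIONS))  # ['ecTadA-ecTadA7.10-N22p']
```

Because the two aptamer/binder pairs do not overlap, an esgRNA-2×MS2 site is expected to receive only the cytosine deaminase and an esgRNA-2×boxB site only the adenine deaminase.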
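As used herein, "scRNA", or the interchangeably used terms "scaffold RNA" and "Scaffold RNA", refers to an RNA molecule formed by incorporating an RNA aptamer into a gRNA of a CRISPR system, such as an sgRNA. The scRNA retains gRNA function and can recruit the specific binding protein of the RNA aptamer, or a fusion protein comprising that binding protein.
In some embodiments, the scRNA comprises two or more RNA aptamers. In some embodiments, the scRNA comprises a nucleotide sequence shown in one of SEQ ID NOs:5-24.
In some preferred embodiments, the first scRNA comprises the nucleotide sequence shown in SEQ ID NO:13 or 15. Correspondingly, the first RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO:34.
In some preferred embodiments, the first scRNA comprises the nucleotide sequence shown in SEQ ID NO:24. Correspondingly, the first RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO:37.
In some preferred embodiments, the second scRNA comprises the nucleotide sequence shown in SEQ ID NO:22. Correspondingly, the second RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO:36.
As used herein, "cytosine deamination domain" refers to a domain that can accept single-stranded DNA as a substrate and catalyze the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. In some embodiments, the cytosine deamination domain comprises at least one (e.g., one or two) cytosine deaminase polypeptide.
In the invention, the cytidine deamination domain of the first fusion protein deaminates cytidine (C) in the single-stranded DNA exposed during formation of the CRISPR nickase-first scRNA-first fusion protein-DNA complex, converting it to uracil (U). The C-to-T base substitution is then achieved through base mismatch repair.
Examples of cytosine deaminases useful in the invention include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, and functional variants thereof. In some embodiments, the cytosine deaminase is APOBEC1 deaminase or a functional variant thereof. In some embodiments, the cytosine deaminase comprises the amino acid sequence of one of SEQ ID NOs:26-30.
In some embodiments, in the first fusion protein, the first RNA aptamer-specific binding protein is located N-terminal to the cytosine deamination domain. In some embodiments, in the first fusion protein, the first RNA aptamer-specific binding protein and the cytosine deamination domain are fused via a linker.
In some embodiments, the first fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI). In cells, uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), which restores U:G to C:G. Therefore, without being bound by any theory, including UGI in the first fusion protein of the invention is expected to increase the efficiency of C-to-T base editing.
In some embodiments, the UGI comprises the amino acid sequence shown in SEQ ID NO:31.
As used herein, "adenine deamination domain" refers to a domain that can accept single-stranded DNA as a substrate and catalyze the conversion of adenosine or deoxyadenosine (A) to inosine (I). In some embodiments, the adenine deamination domain comprises at least one (e.g., one) DNA-dependent adenine deaminase polypeptide.
In the invention, the adenine deamination domain of the fusion protein deaminates adenosine in the single-stranded DNA exposed during formation of the CRISPR nickase-second scRNA-second fusion protein-DNA complex, converting it to inosine (I). Because DNA polymerase treats inosine (I) as guanine (G), an A-to-G substitution is achieved through base mismatch repair.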
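The two deamination routes above (C to U, read as T; A to I, read as G) can be combined into a sketch that predicts the fully edited product of a protospacer. This sketch rests on three assumptions. First, the protospacer is written 5' to 3' on the non-target (PAM-containing) strand, which is the strand exposed to the deaminase. Second, the editing windows are those reported in the Examples (C3-C9 for the cytosine editor and A4-A8 for the adenine editor). Third, every target base in the window is converted. Real reads usually carry only a subset of these edits. The protospacer and the helper name `fully_edited` are hypothetical.

```python
# Editing windows (1-based protospacer positions, counted from the PAM-distal end)
# as reported in the Examples: C3-C9 for the CBE, A4-A8 for the ABE.
WINDOWS = {"CBE": (3, 9, "C", "T"), "ABE": (4, 8, "A", "G")}


def fully_edited(protospacer: str, editor: str) -> str:
    """Return the protospacer with every in-window target base converted.

    The protospacer is read on the non-target strand. C->U is read as T and
    A->I is read as G after repair/replication, so the products are C>T or A>G.
    """
    start, end, ref, alt = WINDOWS[editor]
    bases = list(protospacer.upper())
    for i in range(start - 1, end):
        if bases[i] == ref:
            bases[i] = alt
    return "".join(bases)


site = "ACCCACCCACCCACCCACCC"  # hypothetical 20-nt protospacer
print(fully_edited(site, "CBE"))  # ACTTATTTACCCACCCACCC
print(fully_edited(site, "ABE"))  # ACCCGCCCACCCACCCACCC
```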
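In some embodiments, the DNA-dependent adenine deaminase is a variant of the E. coli tRNA adenine deaminase TadA (ecTadA). An exemplary wild-type ecTadA amino acid sequence is shown in SEQ ID NO:32. In some preferred embodiments of the invention, the DNA-dependent adenine deaminase comprises the amino acid sequence shown in SEQ ID NO:33.
Because E. coli tRNA adenine deaminase (ecTadA) generally functions as a dimer, a dimer formed by two DNA-dependent adenine deaminases, or by a DNA-dependent adenine deaminase and a wild-type adenine deaminase, is expected to markedly increase the A-to-G editing activity of the fusion protein. In some preferred embodiments, the adenine deamination domain comprises two of the DNA-dependent adenine deaminases.
In some preferred embodiments, the adenine deamination domain further comprises the corresponding wild-type adenine deaminase (e.g., E. coli tRNA adenine deaminase TadA) fused to the DNA-dependent adenine deaminase (e.g., a DNA-dependent variant of E. coli TadA). In some preferred embodiments, the DNA-dependent adenine deaminase (e.g., a DNA-dependent variant of E. coli TadA) is fused to the C-terminus of the corresponding wild-type adenine deaminase (e.g., E. coli TadA).
In some embodiments, the two DNA-dependent adenine deaminases (e.g., DNA-dependent variants of E. coli TadA), or the DNA-dependent adenine deaminase (e.g., a DNA-dependent variant of E. coli TadA) and the corresponding wild-type adenine deaminase (e.g., E. coli TadA), are fused via a linker.
In some embodiments, in the second fusion protein, the second RNA aptamer-specific binding protein is located C-terminal to the adenine deamination domain. In some embodiments, in the second fusion protein, the second RNA aptamer-specific binding protein and the adenine deamination domain are fused via a linker.
As used herein, a "linker" may be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or 20-25 or 25-50) or more amino acids with no secondary or higher-order structure. For example, the linker may be a flexible linker. In some embodiments, the linker is 16 amino acids long, for example comprising the amino acid sequence shown in SEQ ID NO:41. In some embodiments, the linker is 36 amino acids long, for example comprising the amino acid sequence shown in SEQ ID NO:42 or 43.
In some embodiments of the invention, the CRISPR nickase, first fusion protein and/or second fusion protein of the invention may further comprise a nuclear localization sequence (NLS). In general, the one or more NLSs in the CRISPR nickase, first fusion protein and/or second fusion protein should be strong enough to drive accumulation of the protein in the nucleus in an amount sufficient for its base editing function. In general, the strength of nuclear localization activity is determined by the number and position of the NLSs in the protein, the particular NLS(s) used, or a combination of these factors.
In some embodiments of the invention, the NLS of the CRISPR nickase, first fusion protein and/or second fusion protein may be located at the N-terminus, the C-terminus and/or internally. In some embodiments, the CRISPR nickase, first fusion protein and/or second fusion protein comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the CRISPR nickase, first fusion protein and/or second fusion protein comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the CRISPR nickase, first fusion protein and/or second fusion protein comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. When more than one NLS is present, each may be selected independently of the others. In some specific embodiments, the NLS comprises the amino acid sequence shown in SEQ ID NO:39 or 40.
In addition, depending on the location of the DNA to be edited, the CRISPR nickase, first fusion protein and/or second fusion protein of the invention may also include other localization sequences, such as cytoplasmic, chloroplast or mitochondrial localization sequences.
In some embodiments, the CRISPR nickase, first fusion protein and/or second fusion protein of the invention are linked to one another by "self-cleaving peptides".
As used herein, "self-cleaving peptide" means a peptide that can achieve self-cleavage within a cell. For example, the self-cleaving peptide may contain a protease recognition site so that it is recognized and specifically cleaved by an intracellular protease. Alternatively, the self-cleaving peptide may be a 2A polypeptide. 2A polypeptides are a class of short virus-derived peptides whose self-cleavage occurs during translation. When two different polypeptides of interest are linked by a 2A polypeptide and expressed in the same reading frame, the two polypeptides are produced at an almost 1:1 ratio. Commonly used 2A polypeptides include P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus and F2A from foot-and-mouth disease virus. Many functional variants of these 2A polypeptides are also known in the art and may be used in the invention. In some specific embodiments, the 2A polypeptide is T2A, for example comprising the amino acid sequence shown in SEQ ID NO:38.
By using self-cleaving peptides, the CRISPR nickase, first fusion protein and/or second fusion protein of the invention can be expressed from the same expression vector.
In some embodiments, the first scRNA, second scRNA and/or the paired gRNAs can be expressed from the same expression construct.
With the genome editing system of the invention, different types of genome edits at different target sites can be achieved simultaneously in a single transformation. For example, if components i), ii-1), ii-2) and ii-3) of the system are introduced together into a plant (in the same vector or in separate vectors), a single transformation can achieve C-to-T editing at the first target site, A-to-G editing at the second target site and a deletion mutation at the third target site.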
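III. Method for producing genetically modified plants
In another aspect, the invention provides a method for producing a genetically modified plant, such as a crop plant, comprising introducing the genome editing system of the invention into the plant.
In some embodiments, components i) and ii-1) of the system are introduced together into the plant, thereby achieving C-to-T editing at the first target site.
In some embodiments, components i), ii-1) and ii-2) of the system are introduced together into the plant, thereby achieving C-to-T editing at the first target site and A-to-G editing at the second target site.
In some embodiments, components i), ii-2) and ii-3) of the system are introduced together into the plant, achieving A-to-G editing at the second target site and a deletion mutation at the third target site.
In some embodiments, components i), ii-1) and ii-3) of the system are introduced together into the plant, achieving C-to-T editing at the first target site and a deletion mutation at the third target site.
In some embodiments, components i), ii-1), ii-2) and ii-3) of the system are introduced together into the plant, achieving C-to-T editing at the first target site, A-to-G editing at the second target site and a deletion mutation at the third target site.
In some embodiments, components i), ii-1), ii-2), ii-3) and combinations thereof are introduced into the plant simultaneously, for example in the same vector or in a single transformation.
In some embodiments, the method comprises:
a) introducing component i) of the genome editing system of the invention into a plant to obtain a transgenic plant stably expressing the CRISPR nickase;
b) introducing components ii-1), ii-2), ii-3) of the genome editing system of the invention, or any combination thereof, into the transgenic plant obtained in step a).
In the method of the invention for producing genetically modified plants, the genome editing system can be introduced into plants by various methods well known to those skilled in the art. Methods that can be used to introduce the editing system of the invention into plants include, but are not limited to, particle bombardment, PEG-mediated protoplast transformation, Agrobacterium tumefaciens-mediated transformation, plant virus-mediated transformation, the pollen-tube pathway method and ovary injection. Preferably, the system is introduced into plants by transient transformation.
In the method of the invention, the target site can be modified simply by introducing or producing the proteins and RNA molecules in plant cells, and the modification can be stably inherited without stably transforming the plant with the editing system. This avoids potential off-target effects of a stably present editing system and also avoids integration of exogenous nucleotide sequences into the plant genome, giving higher biosafety.
In some preferred embodiments, the introduction is performed in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.
In some embodiments, the introduction comprises transforming the genome editing system of the invention into isolated plant cells or tissues and then regenerating the transformed plant cells or tissues into whole plants. Preferably, regeneration is performed in the absence of selection pressure, i.e., no selective agent against the selection gene carried on the expression vector is used during tissue culture. Omitting the selective agent can increase plant regeneration efficiency and yield modified plants free of exogenous nucleotide sequences.
In other embodiments, the genome editing system of the invention can be transformed into a specific part of an intact plant, such as a leaf, shoot tip, pollen tube, young ear or hypocotyl. This is particularly suitable for transforming plants that are difficult to regenerate through tissue culture.
In some embodiments of the invention, in vitro expressed proteins and/or in vitro transcribed RNA molecules are transformed directly into the plant. The proteins and/or RNA molecules achieve gene editing in plant cells and are subsequently degraded by the cells, avoiding integration of exogenous nucleotide sequences into the plant genome.
Therefore, in some embodiments, genetic modification and breeding of plants using the method of the invention can yield plants without integrated exogenous DNA, i.e., transgene-free modified plants.
Plants that can be gene-edited with the system and method of the invention include monocotyledons and dicotyledons. For example, the plant may be a crop plant such as wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.
In some embodiments of the invention, the target site is associated with a plant trait, such as an agronomic trait, so that the editing gives the plant an altered trait relative to a wild-type plant. In the invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or within a gene expression regulatory region such as a promoter or enhancer region, thereby modifying gene function or gene expression.
In some embodiments of the invention, the method further comprises obtaining progeny of the genetically modified plant. In another aspect, the invention also provides a genetically modified plant, or progeny or a part thereof, obtained by the method of the invention described above. In some embodiments, the genetically modified plant, or progeny or part thereof, is non-transgenic.
In another aspect, the invention also provides a plant breeding method comprising crossing a genetically modified first plant obtained by the method of the invention described above with a second plant lacking the genetic modification, thereby introducing the genetic modification into the second plant.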
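Examples
Example 1. MS2-mediated C-to-T conversion
The plant cytosine base editing system PBE consists mainly of the following modules: 1) a cytosine deaminase that deaminates cytosine (C) to uracil (U); 2) nCas9(D10A), which provides sgRNA-programmable DNA targeting for base editing and promotes the endogenous mismatch repair (MMR) pathway; 3) a uracil DNA glycosylase inhibitor (UGI), which inhibits endogenous uracil DNA glycosylase (UDG) activity and thereby prevents U from being excised to leave an abasic (AP) site.
MS2 is a commonly used RNA aptamer, and previous studies have shown that an scRNA formed by adding two MS2 hairpins to the 3′ end of esgRNA efficiently mediates CRISPRa in human cells. We therefore first constructed this scRNA vector, pOsU3-esgRNA-2×MS2, driven by the OsU3 promoter (Figure 1b). To construct PBEc vectors, i.e., RNA aptamer-recruited PBE systems, multiple protein modules were co-expressed using the "self-cleaving" T2A peptide. nCas9(D10A), fused or not fused to APOBEC1 or UGI, served as the RNA-programmable module. The MS2-binding protein MCP fused to APOBEC1 or UGI served as the recruited module. This gave PBEc1 to PBEc5. All PBEc vectors were codon-optimized for crops and driven by the maize Ubi-1 promoter (Figure 1c).
To identify efficient PBEc vectors, their C>T efficiency was evaluated with a BFP-to-GFP reporter system. In this reporter, GFP fluorescence requires conversion of His66 (codon CAC) in BFP to Tyr66 (codon TAC). An scRNA plasmid targeting this site, esgRNA-2×MS2-BFP, was therefore constructed so that the target C lies at position 4 counting from the PAM-distal end. Different PBEc vectors combined with esgRNA-2×MS2-BFP were transformed into rice protoplasts by PEG-mediated transformation. PBE plus sgRNA-BFP served as the control group, GFP as the positive control and untransformed rice protoplasts as the negative control. After 36 h of culture at 22 °C, GFP fluorescence in each treatment was measured by flow cytometry. Across three replicate experiments, GFP fluorescence was 0.67-10.80% for PBEc1 to PBEc5. PBEc4, carrying the recruited module MCP-APOBEC1-UGI, gave the highest fluorescence, followed by PBEc5, carrying MCP-UGI-APOBEC1; these were 2.87-fold and 1.21-fold, respectively, that of the PBE plus sgRNA-BFP control (Figure 1d). PBEc1 and PBEc2, whose recruited module was MCP-APOBEC1 in both cases, were about as efficient as the PBE control (Figure 1d). Although MCP binds the MS2 hairpin as a dimer, C>T efficiency dropped sharply when the recruited module was MCP-UGI (PBEc3) (Figure 1d).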
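The reporter logic reduces to a single codon change. The sketch below shows that a C>T substitution at the first position of the His66 codon (CAC) yields TAC, which encodes Tyr. It includes only the two codons needed here, not a full genetic-code table, and the helper name `point_substitute` is hypothetical.

```python
# Only the codons needed for this reporter; not a full genetic code table.
CODON_TO_AA = {"CAC": "His", "TAC": "Tyr"}


def point_substitute(codon: str, pos: int, new_base: str) -> str:
    """Return `codon` with the base at 0-based `pos` replaced by `new_base`."""
    return codon[:pos] + new_base + codon[pos + 1:]


his66 = "CAC"                             # BFP codon 66
edited = point_substitute(his66, 0, "T")  # C>T at the first codon position
print(CODON_TO_AA[his66], "->", CODON_TO_AA[edited])  # His -> Tyr
```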
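To verify the C>T activity of the PBEc vectors at endogenous target sites, six endogenous rice target sites were cloned into the pOsU3-esgRNA-2×MS2-BFP or pOsU3-sgRNA vectors (Table 1). Each scRNA vector carrying a target site was co-transformed with PBEc1 to PBEc5 into rice protoplasts. As controls, sgRNA vectors carrying the same target sites were co-transformed with PBE or Cas9. After 60 h of culture at 22 °C, protoplast DNA was extracted and subjected to amplicon NGS. PBEc vectors using the MS2/MCP pair had the same editing window as PBE, C3-C9 (Figure 1e). The five PBEc vectors tested gave C3-C9 editing efficiencies of 0.13-11.73% at the endogenous sites (Figure 1e). PBEc4 was the most efficient: at four target sites (OsACC-T1, OsDEP1-T2, OsEV and OsOD) its activity was 3.62-fold that of PBE, whereas at two sites (OsCDC48 and OsDEP1-T1) it was slightly lower than PBE. In addition, PBEc4 gave high product purity, with negligible undesired products (<0.04%). Its indel frequency (0.04-0.29%) was similar to that of untreated rice protoplasts (0.01-0.32%) and far lower than that of Cas9 (4.82-11.75%) (Figure 5). Taken together, simultaneous recruitment of APOBEC1 and UGI by the scRNA-programmed nCas9(D10A) complex enhances C>T base editing activity. The PBEc4 architecture was therefore used for further study.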
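As an illustration of how a window-restricted efficiency such as "C3-C9 editing" can be computed from amplicon reads, the sketch below counts the percentage of reads with at least one C>T change inside the window. This is an assumed, simplified calculation, not the authors' analysis pipeline: reads are taken to be already aligned base-for-base to the protospacer with no indels, and the sequences and the helper name `window_edit_rate` are hypothetical.

```python
def window_edit_rate(reference, reads, start=3, end=9, ref="C", alt="T"):
    """Percentage of reads with at least one ref>alt change inside a 1-based window.

    Assumes every read is already aligned base-for-base to the protospacer
    (no indels); real amplicon analysis needs an aligner for indel reads.
    """
    window = [i for i in range(start - 1, end) if reference[i] == ref]
    edited = sum(1 for read in reads if any(read[i] == alt for i in window))
    return 100.0 * edited / len(reads)


ref_site = "ACCCACCCACCCACCCACCC"  # hypothetical protospacer
reads = [
    ref_site,                      # unedited
    "ACTCACCCACCCACCCACCC",        # C>T at C3: inside the window
    ref_site,                      # unedited
    "ACCCACCCACTCACCCACCC",        # C>T at C11: outside the window
]
print(window_edit_rate(ref_site, reads))  # 25.0
```

The same function covers the A4-A8 window used later for the adenine editors by passing `start=4, end=8, ref="A", alt="G"`.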
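Table 1. sgRNA target sequences used to compare PBEc and PABEc activities
Figure PCTCN2021079087-appb-000001
a: PAM motifs are shown in bold and underlined.
Example 2. C-to-T conversion mediated by different scRNAs
To develop a multiplex recruitment system and obtain scRNAs carrying different RNA aptamers, MCP in PBEc4 was replaced with PCP, N22p or Com. These recognize the viral RNA hairpins PP7, boxB and com, respectively, and gave PBEc6, PBEc7 and PBEc8 (Figure 2a). Earlier work reported that sgRNA2.0, with MS2 hairpins embedded in the tetraloop and stem loop 2 of the sgRNA secondary structure, is more efficient than sgRNA-2×MS2, an scRNA with two MS2 hairpins appended to the sgRNA 3′ end. We therefore embedded one MS2 hairpin in each of the tetraloop and stem loop 2 of either the sgRNA or the esgRNA (which carries an A-U or C-G flip and an extended hairpin), giving sgRNA2.0 to sgRNA4.0 (Figure 2b and Figures 16 and 17). The same strategy was used to construct sgRNA7-1, sgRNA7-2 and sgRNAB.0 (Figure 2b and Figures 16 and 17). sgRNA7-1 and sgRNA7-2 carry two different PP7 hairpin variants, and sgRNAB.0 carries a boxB hairpin. In parallel, a variety of scRNAs were constructed by appending one or two RNA aptamer hairpins (including MS2, PP7, boxB and com) to the 3′ end (Figure 2b and Figures 16 and 17). To recruit more APOBEC1-UGI modules, the two strategies were combined to construct sgRNA4.0 containing MS2 hairpins (Figure 2b and Figures 16 and 17).
To compare the activities of the different scRNAs with the corresponding PBEc vectors, efficiencies were evaluated with the BFP-to-GFP reporter system in rice protoplasts, with BFP-sgRNA or BFP-esgRNA plus the PBE vector as controls. Contrary to previous reports, all scRNAs in the sgRNA2.0 configuration (including MS2, PP7 and boxB) mediated very low C>T editing efficiencies in the reporter (0.07-0.43%) (Figure 2c). By contrast, scRNAs with two or three RNA aptamer hairpins at the 3′ end, as well as esgRNA-1×com with a single hairpin, all mediated efficient C>T editing (1.83-8.83%) (Figure 2c). The following four combinations gave reporter C>T efficiencies of 7.47%, 8.00%, 8.83% and 6.90%, respectively: PBEc4 with esgRNA-2×MS2, PBEc4 with esgRNA-3×MS2, PBEc4 with sgRNA4.0, and PBEc8 with esgRNA-2×com. All four were higher than PBE with esgRNA (6.03%) (Figure 2c). We attribute the discrepancy with earlier reports to the double-stranded linker between the aptamer hairpins in the 3′-end scRNAs used here, which stabilizes the multi-hairpin structure at the scRNA 3′ end.
To evaluate C>T editing at endogenous rice sites mediated by the efficient scRNAs identified above (esgRNA-2×MS2, esgRNA-3×MS2, sgRNA4.0 and esgRNA-2×com), five endogenous target sites were cloned into the four scRNA vectors and co-transformed with PBEc4 or PBEc8 into rice protoplasts. As controls, sgRNA or esgRNA vectors carrying the same target sites were co-transformed with PBE or Cas9. After 60 h of culture at 22 °C, protoplast DNA was extracted and subjected to amplicon NGS. Five target sites were tested: OsACC-T1, OsDEP1-T1, OsDEP1-T2, OsEV and OsOD. Across the C3-C9 editing window at these sites, four guides gave higher C>T base editing than sgRNA (mean 4.82%) and sgRNA4.0 (mean 4.78%): esgRNA (mean 7.96%), esgRNA-2×MS2 (mean 18.04%), esgRNA-3×MS2 (mean 14.96%) and esgRNA-2×com (mean 11.13%). Among these scRNAs, esgRNA-2×MS2, esgRNA-3×MS2 and esgRNA-2×com gave C>T editing efficiencies 2.31- to 3.75-fold higher than sgRNA (Figure 2d, Figure 6), with the same main single-base editing window, C3-C9 (Figure 2d). The different scRNAs also gave high product purity in rice protoplasts (>99.68%) (Figure 7a). Their indel frequencies were low (<0.56%) and far below those of Cas9 (up to 21.21%) (Figure 7b).
In addition, to develop efficient PBE systems with a narrow editing window, the APOBEC1 portion of PBE and PBEc4 was replaced with the APOBEC1 variants YE1, YE2, EE and YEE, which have reduced catalytic activity (Figure 8a, b). The editing windows and activities of these variants in the PBE and PBEc4 configurations were tested in rice protoplasts at two target sites, OsEV and OsOD. Amplicon sequencing showed that YE1-PBEc4, EE-PBEc4 and YEE-PBEc4, each combined with esgRNA-2×MS2, all increased C>T base editing activity. At the center of the editing window (C5 of OsEV and C6 of OsOD), their activity was approximately 1.37- to 1.78-fold that of the same variants in the PBE configuration (Figure 2e, Figure 8c). Among them, EE-PBEc4 retained a narrow window while producing few bystander edits (Figure 2e, Figure 8c). This indicates that the activity of narrow-window PBEs can be enhanced by scRNA-mediated recruitment.
In summary, incorporating different RNA aptamers into sgRNAs provides an effective solution for multiplex recruitment with RNA-programmed nCas9(D10A) in plants. Moreover, the selected esgRNA-2×MS2, esgRNA-3×MS2 and esgRNA-2×com are candidates for mediating CBE function in a multiplex genome editing system.
Example 3. Optimized vectors and scRNAs for A-to-G conversion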
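The plant adenine base editor PABE-7 consists mainly of the following modules: a heterodimer of wild-type adenine deaminase ecTadA and the laboratory-evolved deoxyadenosine deaminase ecTadA7.10; nCas9(D10A), as in the PBE system; and three copies of the SV40 NLS at the C-terminus of nCas9(D10A). To convert PABE-7 into an RNA aptamer-recruited configuration, we first built PABEc1, based on PBEc4, for recruitment by esgRNA-2×MS2 (Figure 3a). A>G base editing activity of PABEc1 was tested with the mGFP reporter system. In this reporter, an A>G conversion on the non-coding strand changes the stop codon TAG on the coding strand to CAG (Gln69), restoring GFP fluorescence. PABEc1, Ubi-mGFP and mGFP-esgRNA-2×MS2 were co-transformed into rice protoplasts, and GFP fluorescence was measured by flow cytometry after 24 h of culture at 22 °C. PBEc4 with esgRNA-2×MS2 had enhanced C>T activity, but PABEc1 did not show a comparable gain: its A>G activity in the mGFP reporter (1.73%) was much lower than that of PABE-2, the architecture commonly used in human cells (7.03%) (Figure 3b). A 32-aa linker between the ecTadA* variant and nCas9 gives higher efficiency than the 16-aa XTEN linker. The XTEN linker of PABEc1 was therefore replaced with a 32-aa linker ((SGGS)2-XTEN-(SGGS)2) to give PABEc2, and PABEc3 was built with a C-terminal MCP (Figure 3a). In the mGFP reporter, PABEc2 and PABEc3 gave activities of 7.23% and 8.03%, respectively. These were slightly higher than PABE-2 but still lower than PABE-7 with esgRNA (14.40%) (Figure 3b). To test whether a SAM-like architecture could increase A>G activity, the ecTadA-ecTadA7.10 heterodimer was fused to the C-terminus of nCas9(D10A) in PABEc3, giving PABEc4 (Figure 3a). However, PABEc4 gave an A>G editing efficiency of 10.60% in the mGFP reporter, still lower than PABE-7 with esgRNA (Figure 3b). These results suggest that optimizing the PABEc architecture offers only limited gains in A>G base editing activity in plants. The PABEc3 architecture was therefore used for further development of the multiplex system, and other RNA aptamers were tried to increase its activity.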
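The 32-aa length of the (SGGS)2-XTEN-(SGGS)2 linker follows from simple arithmetic: 4 + 4 + 16 + 4 + 4 = 32 amino acids. The sketch below builds such a linker string. It assumes the widely used 16-aa XTEN sequence from earlier base editors; the patent's own linker sequences (SEQ ID NOs:41-43) are not reproduced in this text, so the exact residues are an assumption, and `sggs_xten_linker` is a hypothetical helper.

```python
# Assumed 16-aa XTEN linker (the one commonly used in earlier base editors);
# the patent's own linker sequences (SEQ ID NOs:41-43) are not reproduced here.
XTEN = "SGSETPGTSESATPES"
SGGS = "SGGS"


def sggs_xten_linker(n_flank: int = 2) -> str:
    """Build (SGGS)n-XTEN-(SGGS)n as a one-letter amino acid string."""
    return SGGS * n_flank + XTEN + SGGS * n_flank


linker = sggs_xten_linker()
print(len(XTEN), len(linker))  # 16 32
```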
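The C-terminal MCP of PABEc3 was replaced with PCP, N22p or Com to give PABEc5, PABEc6 and PABEc7, which recognize the PP7, boxB and com RNA hairpins, respectively (Figure 3c). The corresponding scRNAs were tested with PABEc3, PABEc5, PABEc6 or PABEc7 in the mGFP reporter system. PABEc6 with esgRNA-2×boxB gave an A>G activity of 26.53% in the reporter, slightly higher than PABE-7 with esgRNA (25.57%) (Figure 3d). For the other RNA aptamers, the best combinations in the reporter were PABEc3 with esgRNA-2×MS2+f6 (18.07%), PABEc5 with esgRNA-1×PP7-1 (21.03%) and PABEc7 with esgRNA-2×com (22.47%). All three were lower than PABE-7 with esgRNA (Figure 3d). We also constructed PABEc8, PABEc9 and PABEc10, based on the PABEc2 architecture (Figure 9a), which carry PCP, N22p or Com, respectively, at the N-terminus of the adenine deaminase. However, all combinations tested gave mGFP reporter efficiencies (<21.23%) lower than PABE-2 with sgRNA (22.87%) (Figure 9b). As with PBEc, PABEc also gave weak GFP signals with scRNAs carrying RNA hairpins in the tetraloop and stem loop 2 (Figure 3d). Thus, scRNAs with 3′-end RNA hairpins mediate both C>T and A>G editing more efficiently than scRNAs with RNA hairpins in the tetraloop and stem loop 2.
To evaluate PABEc-mediated A>G base editing of endogenous rice genes, vectors for six endogenous target sites were built with esgRNA-2×MS2, esgRNA-2×MS2+f6, esgRNA-1×PP7-1, esgRNA-2×boxB and esgRNA-2×com (Table 1). Each was co-transformed with the corresponding PABEc into rice protoplasts. PABE-2 with sgRNA and PABE-7 with esgRNA served as controls. Among the PABEc/scRNA combinations tested, PABEc6 with esgRNA-2×boxB gave the highest A>G base editing efficiency. It averaged 4.65% across the main editing window A4-A8, comparable to PABE-2 with sgRNA (mean 4.78%) (Figure 3e, Figure 10a). N22p, at 33 amino acids, is the shortest of the binding proteins tested. We therefore speculate that the activity of the ecTadA-ecTadA7.10 heterodimer depends not only on the position of the binding protein but also on its length. Likewise, PABEc6 with esgRNA-2×boxB gave high product purity in rice protoplasts (mean 99.76%) (Figure 10b). Its indel frequencies were 0.04-0.38%, similar to the untreated control (0.05-0.40%) and far lower than Cas9 (10.94-23.84%) (Figure 10c). esgRNA-2×boxB was therefore selected as the scRNA for mediating ABE function in the multiplex genome editing system.
Example 4. Multiplex genome editing system based on Cas9 nickase
The successful use of scRNA-mediated CBE or ABE in rice protoplasts laid the foundation for a multiplex genome editing system that edits several sites simultaneously on the nCas9(D10A) platform. To use nCas9(D10A) for multiplex genome editing, we first assembled SWISSv1.1 from PBEc4, one co-expressed esgRNA-2×MS2 and a pair of sgRNAs. This system generates cytosine base edits and paired-nCas9-mediated DSBs simultaneously at different target sites (Figure 4a). Two sets of sgRNAs were tested (Table 2). In each set, all sgRNAs were assembled on the same vector and driven by OsU3 or TaU6 (Figure 11). In rice protoplasts, C>T editing within the C3-C9 window of the tested target sites was 0.33-31.32%, while indel frequencies at the other target site were 1.74-2.52% (Figure 4a). Using the same strategy, SWISSv1.2 was assembled from PABEc6, esgRNA-2×boxB and paired sgRNAs (Figure 4b). In the two test sets in rice protoplasts, A>G efficiency reached up to 2.85%, while indel frequency at the other target site reached up to 2.49% (Figure 4b). In addition, NGS showed that in SWISSv1.1 and SWISSv1.2, at least 79% of the indel-containing reads generated by paired nCas9(D10A) were deletions (Figure 12). These results show that both scRNA-mediated PBE and PABE systems can use multiple sgRNAs to achieve base editing and indels at the same time. They also confirm that paired nCas9(D10A) can generate indels when PBEc4 or PABEc6 is used.
To test whether scRNA-mediated base editing can provide CBE and ABE functions simultaneously at different target sites, we built the MGE vector (Figure 13a). MGE co-expresses nCas9(D10A), MCP-APOBEC1-UGI and ecTadA-ecTadA7.10-N22p from the Ubi-1 promoter, linked by the "self-cleaving" T2A peptide. esgRNA-2×MS2, driven by the TaU6 promoter, was used for C>T editing, and esgRNA-2×boxB, driven by the OsU3 promoter, for A>G editing (Figure 13b). Combining these scRNAs with MGE gave SWISSv2 (Figure 4c). Two sets of target sites were tested in rice protoplasts (Table 2). SWISSv2 produced C>T base editing (up to 13.19%) at one target site and, at the same time, A>G base editing (up to 4.27%) at another. Adding paired sgRNAs to SWISSv2 gave SWISSv3 (Figure 4d), which was again tested with two sets of target sites in rice protoplasts (Table 2, Figure 13c). Amplicon NGS showed that SWISSv3, a multiplexed and integrated programmable genome editing system, can perform three types of editing simultaneously at different target sites: C>T (up to 11.68%), A>G (up to 2.64%) and indels (up to 2.22%) (Figure 4e). In summary, SWISSv3 offers an alternative approach for gene stacking and heritable modification in plants.
Table 2. sgRNA target sequences used to test SWISSv1.1, SWISSv1.2, SWISSv2 and SWISSv3 in rice protoplasts
Figure PCTCN2021079087-appb-000002
a: PAM motifs are shown in bold and underlined.
b: Paired sgRNAs were designed according to the following rules: (1) PAM-out orientation; (2) a distance of 40-68 bp between the nick sites.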
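Footnote b's two design rules can be checked programmatically. This is a minimal sketch under the following assumptions: SpCas9-style sites (NGG PAM, 20-nt protospacer, cut 3 nt 5′ of the PAM), and an inclusive 40-68 bp range. `Guide` and `is_valid_pair` are hypothetical names, and the coordinates in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    """An SpCas9 sgRNA site described in + strand coordinates (illustrative model)."""
    name: str
    strand: str     # '+' if the 5'-NGG-3' PAM reads on the + strand, '-' otherwise
    pam_start: int  # 0-based index of the leftmost base of the 3-nt PAM footprint on +

    def nick_site(self) -> int:
        """Nick position as a boundary index on + coordinates (cut between nick-1 and nick).

        SpCas9 cuts 3 nt 5' of the PAM inside the 20-nt protospacer; nCas9(D10A)
        nicks only the strand complementary to the spacer at this position.
        """
        if self.strand == "+":
            return self.pam_start - 3  # protospacer lies to the left of the PAM
        return self.pam_start + 6      # '-' guide: protospacer lies to the right of the PAM footprint


def is_valid_pair(left: Guide, right: Guide, min_bp: int = 40, max_bp: int = 68) -> bool:
    """Footnote b: PAM-out orientation and 40-68 bp between the two nick sites."""
    pam_out = left.strand == "-" and right.strand == "+"  # both PAMs face outward
    distance = right.nick_site() - left.nick_site()
    return pam_out and min_bp <= distance <= max_bp


sg_l = Guide("sgL", "-", 100)     # nick at 106
sg_r = Guide("sgR", "+", 170)     # nick at 167
print(is_valid_pair(sg_l, sg_r))  # True (61 bp apart, PAMs facing outward)
```

A pair with the PAMs facing inward, or with nicks more than 68 bp apart, is rejected.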
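Example 5. Multiplex editing in rice plants
To verify the editing capability of SWISSv3 in rice plants, a multiple-sgRNA vector targeting OsALS, OsACC and OsBADH2 was constructed and assembled with MGE into the pCAMBIA1300 binary vector (Figure 14a). Mutations in regenerated rice plants were detected by T7E1 assay and Sanger sequencing (Figure 14b). CBE, ABE and indel frequencies were 25.45%, 16.36% and 52.73%, respectively (Table 3). More importantly, four plants (7.27%) were triple mutants carrying CBE, ABE and indel edits at the three different target sites. Up to 12.73% of the regenerated plants were double mutants, and SWISSv3 also produced single mutants at each of the three sites (Table 4). In summary, SWISSv3 uses scRNAs to act as a tri-functional, integrated programmable genome editing system in plants, which should facilitate molecular crop breeding.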
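The frequencies above are fractions of the T0 plants analyzed. As a quick consistency check (our own arithmetic, not data stated in the source), 4 triple mutants at 7.27% implies about 55 plants, and every other percentage then corresponds to a whole number of plants. Apart from the 4 triple mutants, the per-category counts printed below are therefore inferred, not reported.

```python
# Percentages reported above for T0 plants.
reported = {"CBE": 25.45, "ABE": 16.36, "indel": 52.73, "triple": 7.27, "double": 12.73}

# The only count stated explicitly is 4 triple mutants = 7.27%, which implies ~55 plants.
n_plants = round(4 / (reported["triple"] / 100))

# Back-calculate the (inferred) number of plants behind each percentage.
counts = {k: round(v / 100 * n_plants) for k, v in reported.items()}

# Each inferred count must reproduce the reported percentage to two decimals.
consistent = all(round(100 * c / n_plants, 2) == reported[k] for k, c in counts.items())
print(n_plants, counts, consistent)
# 55 {'CBE': 14, 'ABE': 9, 'indel': 29, 'triple': 4, 'double': 7} True
```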
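Table 3. Mutation frequencies induced by SWISSv3 in T0 rice plants
Figure PCTCN2021079087-appb-000003
Figure PCTCN2021079087-appb-000004
a: Number of T0 lines (rice) carrying the observed mutations relative to the total number of T0 transgenic rice lines analyzed. b: Indel genotypes were analyzed with the online tools DSDecodeM (reference) and TIDE (reference).
Table 4. Multiplex genome editing in T0 rice plants using SWISSv3
Figure PCTCN2021079087-appb-000005
a: N.A., not available.
Example 6. Off-target analysis of the SWISS system
SWISSv2 and SWISSv3 use T2A to co-express multiple modules, and we reasoned that the "self-cleavage" efficiency of T2A could affect product purity at the CBE or ABE target sites (Figure 15a). As shown in Figure 15a, T2A-mediated "self-cleavage" occurs by ribosomal skipping of the glycyl-prolyl peptide bond at the C-terminus of T2A, and the position of T2A affects the expression levels of the co-expressed modules. In MGE, successful skipping yields the three separate target proteins, whereas failed skipping produces unintended fusion proteins. In particular, the MCP-APOBEC1-UGI-T2A-ecTadA-ecTadA7.10-N22p fusion can be recruited by both esgRNA-2×MS2 and esgRNA-2×boxB. This would cause unintended C editing at the ABE target site or unintended A editing at the CBE target site, i.e., unintended off-target editing. We analyzed amplicon NGS data from rice protoplasts for SWISSv2 and SWISSv3 and measured cytosine editing at the ABE target sites and adenine editing at the CBE target sites. Both systems showed unintended editing. Unintended C>T and A>G frequencies were below 0.90% and 0.19%, respectively, but still higher than in the untreated control (C>T, <0.07%; A>G, <0.04%) (Figure 15b). This result also indicates that a more efficient strategy for co-expressing multiple modules is needed.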
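The reasoning about T2A read-through can be made quantitative with a toy model. This is a sketch under stated assumptions, not the authors' analysis. Each T2A junction is assumed to skip independently with a fixed efficiency, and a failed skip is assumed to always yield a fused product (ribosome drop-off is ignored). The order of modules in MGE (nCas9 first) is also assumed for illustration. `t2a_products` is a hypothetical helper.

```python
from itertools import product


def t2a_products(modules, skip_efficiency):
    """Expected yield of each polypeptide per translated transcript.

    Simplified model (an assumption): each T2A junction skips independently with
    probability `skip_efficiency`; a failed skip fuses the neighbouring modules.
    """
    yields = {}
    for outcome in product([True, False], repeat=len(modules) - 1):
        prob = 1.0
        for skipped in outcome:
            prob *= skip_efficiency if skipped else 1 - skip_efficiency
        if prob == 0.0:
            continue  # impossible outcome; do not list its products
        current, peptides = [modules[0]], []
        for module, skipped in zip(modules[1:], outcome):
            if skipped:
                peptides.append("-T2A-".join(current))  # ribosome skip releases a peptide
                current = [module]
            else:
                current.append(module)                   # read-through fuses modules
        peptides.append("-T2A-".join(current))
        for peptide in peptides:
            yields[peptide] = yields.get(peptide, 0.0) + prob
    return yields


# Module order is an assumption; the text only gives the MCP...N22p junction.
MGE = ["nCas9(D10A)", "MCP-APOBEC1-UGI", "ecTadA-ecTadA7.10-N22p"]
for peptide, y in sorted(t2a_products(MGE, 0.9).items()):
    print(f"{y:.2f}  {peptide}")
```

Under this model, with 90% skipping per junction, about 0.09 copies of the dual-recruitable MCP-APOBEC1-UGI-T2A-ecTadA-ecTadA7.10-N22p fusion would be made per transcript. This illustrates how even modest read-through could contribute to low-level cross-editing.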
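Cas-OFFinder was further used to search the whole genome for potential off-target sites with up to 3 nt of mismatches, and these sites were sequenced. No off-target editing was detected at any of them (Table 5). Because the SWISS system integrates both a cytosine deaminase and an adenine deaminase, unpredictable DNA and RNA off-target effects remain possible. These will need to be addressed with efficient, high-specificity deaminase variants.
Table 5. Potential off-target sites of OsALS-T2, OsACC-T2, OsBADH2-Indels-sgL and OsBADH2-Indels-sgR analyzed in the triple mutants.
Figure PCTCN2021079087-appb-000006
Figure PCTCN2021079087-appb-000007
a: PAM motifs are shown in bold and underlined; b: N.A., not available.
Multiplex genome editing systems that use a multiple-sgRNA strategy fall into two categories. The first performs the same type of genome editing at different target sites; the second, proposed in this study, performs different types of genome editing at different target sites. To our knowledge, the SWISS system developed here is the first to use a single programmable Cas protein to mediate several different types of genome editing simultaneously in plants. Such multiplex editing could also be achieved with orthologous CRISPR/Cas proteins. However, vectors encoding multiple orthologs are larger, which hampers biolistic transformation, and their PAM requirements are also more restrictive. Because the SWISS system uses only one nCas9(D10A), it mitigates both drawbacks. In particular, the Cas9 variant that recognizes the NG PAM can further expand the editing scope of SWISS (Figure 4e, Table 2).
In this study, the RNA polymerase III promoters OsU3 and TaU6 were used to express the multiple sgRNAs. Other multiple-sgRNA strategies could also be used to further optimize the SWISS system, for example producing multiple sgRNAs with the Csy4 ribonuclease or with ribozymes. The scRNA-recruited constructs had higher average C>T activity than PBE without widening the base editing window, so this strategy could be used to boost the activity of narrow-window cytosine deaminase variants. The A>G activity of the scRNA-recruited constructs was only comparable to that of PABE-2, but it was sufficient for SWISSv3 to produce rice A>G mutants. Unlike the PBEc constructs, however, the PABEc constructs did not become more efficient with different RNA aptamers. This implies limited room for optimizing PABEc and a need for more efficient adenine deaminases.
The dual-function SWISSv1.1 and SWISSv1.2 systems could also be implemented with PBE and PABE combined with a multiple-sgRNA strategy. The RNA aptamer recruitment strategy of this study provides an alternative that is particularly advantageous for multiplex genome editing in plants overexpressing nCas9(D10A). In future, nCas9(D10A)-overexpressing rice could be developed as a platform. Only the multiple sgRNAs and the base editor recruitment modules would then need to be transformed, and the multiplex editing functions of SWISS could be achieved by a second transformation. This would also reduce unintended off-target editing and facilitate molecular design breeding of crops. In addition, a third scRNA could be optimized with a truncated spacer (14-15 nt) to recruit epigenetic modifiers, transcriptional repressors, activators or fluorescent proteins, giving a four-function CRISPR system. The SWISS system could also be combined with random and multiple-sgRNA strategies for directed evolution of endogenous plant genes. Applications beyond plants, such as altering cell fate or metabolic regulatory pathways, are also possible.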

Claims (36)

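  1. A genome editing system for multiplex editing in plants, particularly crops, comprising:
    i) a CRISPR nickase and/or an expression construct comprising a nucleotide sequence encoding the CRISPR nickase; and
    ii) one or more, or all, of the following:
    ii-1) a first scRNA targeting a first target region in the plant genome and/or an expression construct comprising a nucleotide sequence encoding the first scRNA, the first scRNA comprising at least one first RNA aptamer; and a first fusion protein and/or an expression construct comprising a nucleotide sequence encoding the first fusion protein, the first fusion protein comprising a first RNA aptamer-specific binding protein and a cytosine deamination domain;
    ii-2) a second scRNA targeting a second target region in the plant genome and/or an expression construct comprising a nucleotide sequence encoding the second scRNA, the second scRNA comprising at least one second RNA aptamer; and a second fusion protein and/or an expression construct comprising a nucleotide sequence encoding the second fusion protein, the second fusion protein comprising a second RNA aptamer-specific binding protein and an adenine deamination domain;
    ii-3) paired gRNAs targeting a third target region in the plant genome and/or an expression construct comprising a nucleotide sequence encoding the paired gRNAs, the paired gRNAs targeting different strands of the DNA in the third target region.
  2. The system of claim 1, wherein the CRISPR nickase is a Cas9 nickase, for example comprising the amino acid sequence shown in SEQ ID NO:25 or 48.
  3. The system of claim 1 or 2, wherein the paired gRNAs comprise the nucleotide sequence shown in SEQ ID NO:3 or SEQ ID NO:4.
  4. The system of any one of claims 1-3, wherein the RNA aptamer is selected from MS2, PP7, boxB and com.
  5. The system of any one of claims 1-4, wherein the RNA aptamer-specific binding protein is selected from MCP, PCP, N22p and COM.
  6. The system of any one of claims 1-5, wherein the scRNA comprises two or more RNA aptamers.
  7. The system of any one of claims 1-6, wherein the scRNA comprises a nucleotide sequence shown in one of SEQ ID NOs:5-24.
  8. The system of any one of claims 1-7, wherein the first scRNA comprises the nucleotide sequence shown in SEQ ID NO:13 or 15.
  9. The system of claim 8, wherein the first RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO:34.
  10. The system of any one of claims 1-7, wherein the first scRNA comprises the nucleotide sequence shown in SEQ ID NO:24.
  11. The system of claim 10, wherein the first RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO:37.
  12. The system of any one of claims 1-11, wherein the second scRNA comprises the nucleotide sequence shown in SEQ ID NO:22.
  13. The system of claim 12, wherein the second RNA aptamer-specific binding protein comprises the amino acid sequence shown in SEQ ID NO:36.
  14. The system of any one of claims 1-13, wherein the cytosine deaminase is selected from APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, and functional variants thereof.
  15. The system of claim 14, wherein the cytosine deaminase is APOBEC1 deaminase or a functional variant thereof.
  16. The system of claim 15, wherein the cytosine deaminase comprises the amino acid sequence of one of SEQ ID NOs:26-30.
  17. The system of any one of claims 1-16, wherein the first RNA aptamer-specific binding protein is located N-terminal to the cytosine deamination domain.
  18. The system of any one of claims 1-17, wherein the first RNA aptamer-specific binding protein and the cytosine deamination domain are fused via a linker.
  19. The system of any one of claims 1-18, wherein the first fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI), for example, the UGI comprises the amino acid sequence shown in SEQ ID NO:31.
  20. The system of any one of claims 1-19, wherein the adenine deamination domain comprises at least one DNA-dependent adenine deaminase polypeptide.
  21. The system of claim 20, wherein the DNA-dependent adenine deaminase is a variant of E. coli tRNA adenine deaminase TadA (ecTadA), for example, the DNA-dependent adenine deaminase comprises the amino acid sequence shown in SEQ ID NO:33.
  22. The system of claim 21, wherein the adenine deamination domain further comprises the corresponding wild-type E. coli tRNA adenine deaminase TadA fused to the DNA-dependent variant of E. coli tRNA adenine deaminase TadA, for example, the wild-type E. coli tRNA adenine deaminase TadA comprises the amino acid sequence shown in SEQ ID NO:32.
  23. The system of claim 22, wherein the DNA-dependent variant of E. coli tRNA adenine deaminase TadA is fused to the C-terminus of the corresponding wild-type E. coli tRNA adenine deaminase TadA, preferably via a linker.
  24. The system of any one of claims 1-23, wherein the second RNA aptamer-specific binding protein is located C-terminal to the adenine deamination domain.
  25. The system of any one of claims 1-24, wherein the second RNA aptamer-specific binding protein and the adenine deamination domain are fused via a linker.
  26. The system of any one of claims 1-25, wherein the CRISPR nickase, first fusion protein and/or second fusion protein further comprise a nuclear localization sequence (NLS).
  27. The system of any one of claims 1-26, wherein the CRISPR nickase, first fusion protein and/or second fusion protein are linked to one another by "self-cleaving peptides".
  28. A method for producing a genetically modified plant, such as a crop plant, comprising introducing the genome editing system of any one of claims 1-27 into the plant.
  29. The method of claim 28, wherein components i) and ii-1) of the system are introduced together into the plant, thereby achieving C-to-T editing at the first target site.
  30. The method of claim 28, wherein components i), ii-1) and ii-2) of the system are introduced together into the plant, thereby achieving C-to-T editing at the first target site and A-to-G editing at the second target site.
  31. The method of claim 28, wherein components i), ii-2) and ii-3) of the system are introduced together into the plant, achieving A-to-G editing at the second target site and a deletion mutation at the third target site.
  32. The method of claim 28, wherein components i), ii-1) and ii-3) of the system are introduced together into the plant, achieving C-to-T editing at the first target site and a deletion mutation at the third target site.
  33. The method of claim 28, wherein components i), ii-1), ii-2) and ii-3) of the system are introduced together into the plant, achieving C-to-T editing at the first target site, A-to-G editing at the second target site and a deletion mutation at the third target site.
  34. The method of any one of claims 28-33, wherein components i), ii-1), ii-2), ii-3) or a combination thereof are introduced into the plant simultaneously, for example in the same vector or in a single transformation.
  35. The method of claim 28, wherein the method comprises:
    a) introducing component i) of the system into a plant to obtain a transgenic plant stably expressing the CRISPR nickase;
    b) introducing components ii-1), ii-2), ii-3) of the system, or any combination thereof, into the transgenic plant obtained in step a).
  36. The method of any one of claims 28-35, wherein the plant includes monocotyledons and dicotyledons, for example, the plant is a crop plant such as wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.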
PCT/CN2021/079087 2020-03-04 2021-03-04 Multiplex genome editing method and system WO2021175289A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US17/909,309 US20240117368A1 (en) 2020-03-04 2021-03-04 Multiplex genome editing method and system
EP21764356.8A EP4116426A4 (en) 2020-03-04 2021-03-04 MULTIPLEX GENOME EDITING METHOD AND SYSTEM
BR112022017704A BR112022017704A2 (pt) 2020-03-04 2021-03-04 Método e sistema de edição de genoma multiplex
CN202180019211.8A CN115667528B (zh) 2020-03-04 2021-03-04 多重基因组编辑方法和系统

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010143643.7 2020-03-04
CN202010143643 2020-03-04

Publications (1)

Publication Number Publication Date
WO2021175289A1 true WO2021175289A1 (zh) 2021-09-10

Family

ID=77612853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079087 WO2021175289A1 (zh) 2020-03-04 2021-03-04 Multiplex genome editing method and system

Country Status (5)

Country Link
US (1) US20240117368A1 (zh)
EP (1) EP4116426A4 (zh)
CN (1) CN115667528B (zh)
BR (1) BR112022017704A2 (zh)
WO (1) WO2021175289A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114686456A (zh) * 2022-05-10 2022-07-01 Sun Yat-sen University Base editing system based on bimolecular deaminase complementation and application thereof
WO2023163806A1 (en) * 2022-02-22 2023-08-31 Massachusetts Institute Of Technology Engineered nucleases and methods of use thereof
WO2024193739A1 (en) * 2023-03-23 2024-09-26 Ostravska Univerzita Method for producing proteins in host cells

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116751799B (zh) * 2023-06-14 2024-01-26 Jiangnan University Multi-site dual base editor and application thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016054106A1 (en) * 2014-09-29 2016-04-07 The Regents Of The University Of California SCAFFOLD RNAs
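CN106191099A (zh) * 2016-07-27 2016-12-07 Suzhou Hongxun Biotechnologies Co., Ltd. Parallel multiplex genome editing vector for Saccharomyces cerevisiae based on the CRISPR-Cas9 system and application thereof
CN107027313A (zh) * 2014-10-17 2017-08-08 The Penn State Research Foundation Methods and compositions for multiplex RNA-guided genome editing and other RNA technologies
WO2019138052A1 (en) * 2018-01-11 2019-07-18 Kws Saat Se Optimized plant crispr/cpf1 systems
CN110520163A (zh) * 2017-01-05 2019-11-29 Rutgers, The State University of New Jersey Targeted gene editing platform independent of DNA double-strand breaks and uses thereof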

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10287590B2 (en) * 2014-02-12 2019-05-14 Dna2.0, Inc. Methods for generating libraries with co-varying regions of polynuleotides for genome modification
CN108291218B (zh) * 2015-07-15 2022-08-19 Rutgers, The State University of New Jersey Nuclease-independent targeted gene editing platform and uses thereof
SG11201903089RA (en) * 2016-10-14 2019-05-30 Harvard College Aav delivery of nucleobase editors
CN107043779B (zh) * 2016-12-01 2020-05-12 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences Application of CRISPR/nCas9-mediated site-directed base substitution in plants
EP3589751A4 (en) * 2017-03-03 2021-11-17 The Regents of The University of California RNA TARGETING OF MUTATIONS VIA SUPPRESSOR RNA AND DEAMINASES
CA3064601A1 (en) * 2017-06-26 2019-01-03 The Broad Institute, Inc. Crispr/cas-adenine deaminase based compositions, systems, and methods for targeted nucleic acid editing
CN110066824B (zh) * 2018-01-24 2021-06-08 Institute of Plant Protection, Chinese Academy of Agricultural Sciences A set of artificial base editing systems for rice
US11739322B2 (en) * 2018-02-01 2023-08-29 Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences Method for genome editing using a self-inactivating CRISPR nuclease
WO2019153902A1 (zh) * 2018-02-11 2019-08-15 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences Method for site-directed replacement in plant genomes
CN110527695B (zh) * 2019-03-07 2020-06-16 Shandong Shunfeng Biotechnology Co., Ltd. Nucleic acid construct for site-directed gene mutation

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHAO LI, YUAN ZONG, SHUAI JIN, HAOCHENG ZHU, DEXING LIN, SHENGNAN LI, JIN-LONG QIU, YANPENG WANG, CAIXIA GAO: "SWISS: multiplexed orthogonal genome editing in plants with a Cas9 nickase and engineered CRISPR RNA scaffolds", GENOME BIOLOGY, vol. 21, no. 1, 1 December 2020 (2020-12-01), XP055768063, DOI: 10.1186/s13059-020-02051-x *
JESSE ZALATAN, MICHAEL LEE, RICARDO ALMEIDA, LUKE GILBERT, EVAN WHITEHEAD, MARIE LARUSSA, JORDAN. TSAI, JONATHAN WEISSMAN, JOHN DU: "Engineering Complex Synthetic Transcriptional Programs with CRISPR RNA Scaffolds", CELL, ELSEVIER, AMSTERDAM NL, vol. 160, no. 1-2, 1 January 2015 (2015-01-01), Amsterdam NL, pages 339 - 350, XP055278878, ISSN: 0092-8674, DOI: 10.1016/j.cell.2014.11.052 *
SAMBROOK, JFRITSCH, E.F.MANIATIS, T: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
See also references of EP4116426A4
WANG, HONGZHEN, YU JIAXIN, LIU QIANG, YU YINGJIE, CHENG JUN: "CRISPR/Cas9 Genome Editing Technique in Maize Breeding", MOLECULAR PLANT BREEDING, vol. 17, no. 20, 1 January 2019 (2019-01-01), pages 6696 - 6704, XP055842318, ISSN: 6696-6704, DOI: 10.13271/j.mpb.017.006696 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023163806A1 (en) * 2022-02-22 2023-08-31 Massachusetts Institute Of Technology Engineered nucleases and methods of use thereof
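CN114686456A (zh) * 2022-05-10 2022-07-01 Sun Yat-sen University Base editing system based on bimolecular deaminase complementation and application thereof
CN114686456B (zh) * 2022-05-10 2023-02-17 Sun Yat-sen University Base editing system based on bimolecular deaminase complementation and application thereof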
WO2024193739A1 (en) * 2023-03-23 2024-09-26 Ostravska Univerzita Method for producing proteins in host cells

Also Published As

Publication number Publication date
CN115667528B (zh) 2024-10-01
EP4116426A1 (en) 2023-01-11
US20240117368A1 (en) 2024-04-11
EP4116426A4 (en) 2024-05-22
CN115667528A (zh) 2023-01-31
BR112022017704A2 (pt) 2022-11-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21764356

Country of ref document: EP

Kind code of ref document: A1

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112022017704

Country of ref document: BR

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021764356

Country of ref document: EP

Effective date: 20221004

ENP Entry into the national phase

Ref document number: 112022017704

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20220902