WO2020020193A1 - Base editor based on human APOBEC3A deaminase and use thereof - Google Patents

Base editor based on human APOBEC3A deaminase and use thereof

Info

Publication number
WO2020020193A1
Authority
WO
WIPO (PCT)
Prior art keywords
plant
base editing
fusion protein
pbe
nuclease
Prior art date
Application number
PCT/CN2019/097398
Other languages
English (en)
French (fr)
Inventor
高彩霞
宗媛
Original Assignee
Institute of Genetics and Developmental Biology, Chinese Academy of Sciences (中国科学院遗传与发育生物学研究所)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Genetics and Developmental Biology, Chinese Academy of Sciences
Priority to CN201980049597.XA (CN112805385B)
Publication of WO2020020193A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K19/00 Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses

Definitions

  • The invention relates to the field of genetic engineering. Specifically, the present invention relates to a base editor based on human APOBEC3A deaminase and use thereof, in particular to use of the editor in plant base editing, wherein the editor can mediate efficient C to T nucleotide substitution.
  • BE base editor
  • DSB DNA double-strand breaks
  • Insertions and deletions Hess, GT et al. Mol. Cell 68, 26-43 (2017); Yang, B. et al. J. Genet. Genomics 44, 423-437 (2017).
  • This technology can complement HDR technology and circumvent some of its limitations.
  • The most widely used cytidine base editor, BE3, consists of a fusion of the cytosine deaminase APOBEC1 with Cas9 nickase (nCas9 (D10A)) and the uracil glycosylase inhibitor UGI (Komor, A.C. et al. Nature 533, 420-424 (2016)), and can directly achieve C to T point mutations at genomic DNA targets.
  • The current BE3 editor is limited by a narrow deamination window of about five base pairs, making it less efficient at certain target sites; efficiency usually drops as the target nucleotide C moves away from position 7.
  • BE3 clearly favors TC dinucleotides, and the editing activity for GC dinucleotides is significantly reduced or even undetectable. Both of these limitations prevent the editor from making precise mutations and diversified mutations, so further improvements to the base editor technology are needed.
  • The present invention includes a novel base editor system, A3A-PBE, which can efficiently introduce C to T substitution mutations at endogenous genomic sites within a broad deamination window of 17 bp.
  • A3A-BE3 can work efficiently in high-GC contexts and hypermethylated regions, producing diversified mutations in coding and non-coding regions. This makes the A3A-BE3 base editing system an attractive new tool for generating valuable precise mutations and diversified mutants in plant breeding, and it can help improve the efficiency of crop improvement through genome engineering.
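  • As an illustrative sketch of the window comparison described above (not part of the disclosed method), the following Python snippet lists which cytosines of a protospacer fall inside a given deamination window. The window bounds (positions 3-9 for PBE and 1-17 for A3A-PBE) are taken from the description of Figure 11b; the function name, the 20-nt example sequence, and the convention of counting position 1 from the PAM-distal end are assumptions made for illustration.

```python
# Hypothetical helper: report 1-based positions of cytosines that lie
# inside a deamination window. Position 1 is assumed to be the
# PAM-distal end of the protospacer.
def editable_cytosines(protospacer, window):
    start, end = window
    return [
        position
        for position, base in enumerate(protospacer.upper(), start=1)
        if base == "C" and start <= position <= end
    ]

# Window bounds as quoted in the Figure 11b description.
PBE_WINDOW = (3, 9)
A3A_PBE_WINDOW = (1, 17)

# Made-up 20-nt protospacer, used only to exercise the function.
EXAMPLE_PROTOSPACER = "CAGCTTGACCGTAACGCTGG"

print(editable_cytosines(EXAMPLE_PROTOSPACER, PBE_WINDOW))
print(editable_cytosines(EXAMPLE_PROTOSPACER, A3A_PBE_WINDOW))
```

    With this toy sequence, the wider window exposes cytosines that the narrower window cannot reach, which is the practical point of the broader deamination range; actual editing efficiency at each position also depends on sequence context and is not modeled here.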
  • Figure 1 Comparing the C to T base editing efficiency of A3A-PBE and PBE.
  • a A3A-PBE edits the range of cytosine bases.
  • b Schematic representation of three cytosine base editor constructs.
  • Figure 2 Comparing the C to T base editing efficiency of A3A-PBE and PBE.
  • a Flow cytometry analysis of the conversion of BFP to GFP in rice using three cytosine base editors. Protoplasts were transformed with each cytosine base editor together with pUbi-BFPm and pOsU3-BFP-sgRNA. GFP and untreated protoplast samples were used as controls. Scale bar, 150 μm.
  • b The frequency (%) of C to T substitution in the target region of the BFP coding sequence, measured by flow cytometry (FCM). Data are from three independent biological replicates; all values are mean ± standard error. **** P < 0.0001.
  • Figure 3 Comparison of C to T base editing efficiency of A3A-PBE and PBE.
  • a Frequency of targeted single C to T substitutions introduced by PBE, A3A-PBE, and A3A-Gam at 4 target sites in wheat protoplasts.
  • b Frequency of targeted single C to T substitutions introduced by PBE, A3A-PBE, and A3A-Gam at six target sites of rice protoplasts.
  • Figures 4 and 5 Testing the purity of cytosine base editing products of wheat genome loci. The product distribution and insertion frequency of four representative wheat genomic DNA sites in wheat protoplasts treated with PBE, A3A-PBE, and A3A-Gam are shown. A total of 19,000-140,000 sequencing reads were used per location.
  • Figures 6, 7 and 8 Testing the purity of cytosine base editing products of rice genome loci.
  • the product distribution and insertion frequency of six representative rice genomic DNA sites in rice protoplasts treated with PBE, A3A-PBE, and A3A-Gam are shown.
  • a total of 25,000-131,000 sequencing reads were used per location.
  • Figure 10 Comparison of C to T base editing efficiency of A3A-PBE and PBE base editors in potato protoplasts.
  • (c) Indel frequencies at ten target sites in potato. Insertion and deletion frequencies induced by PBE, A3A-PBE, and Cas9 at the respective sgRNA target sites. Data are from three independent biological replicates (n = 3), and each frequency is given as mean ± standard error.
  • Figure 11 A3A-PBE's broad applicability in C to T base editing.
  • a Use A3A-PBE and PBE base editors to compare C to T base substitution efficiency in high GC backgrounds.
  • b Effect of sequence background on the efficiency of base editing when using PBE (windows 3-9) and A3A-PBE (windows 1-17).
  • The data (mean ± standard error) were calculated using the data in Figures 3a-b and Figure 11a.
  • c Targeting a single C to T substitution frequency introduced by A3A-PBE in the cis element of the TaVRN1-A1 promoter.
  • Figure 12 Wide application of A3A-PBE in C to T base editing.
  • a Frequency of mutations induced by A3A-PBE in T0 wheat, rice and potato.
  • b Amino acid substitutions in TaALS confer herbicide resistance. Amino acid sequence alignment of wild type (WT) TaALS and T0-7 mutant TaALS. The phenotype of T0-7 after three weeks of growth in regeneration medium supplemented with 0.254 ppm nicosulfuron. Scale bar, 1 cm.
  • FIG. 13 Identification and analysis of wheat seedlings with A3A-PBE targeted C to T substitutions.
  • Protospacer-adjacent motif (PAM) sequences are highlighted in bold and EcoO109I restriction sites are underlined.
  • Figure 14 Constructs for TaALS and TaMTL base editing and detection of transgene integration in the resulting T0 mutants.
  • The results of the transgene integration test were obtained using 5 primer pairs on 10 representative TaALS mutant plants and 10 TaMTL mutants.
  • Figure 15 Analysis of purified A3A-PBE-ΔUGI protein by SDS-PAGE. 3 μg of purified protein was separated on 10% SDS-PAGE and then visualized by Coomassie blue staining.
  • FIG. 16 A3A-PBE is widely used in C to T base editing.
  • b Bioinformatics analysis of PBE and A3A-PBE in the rice genome targeted Cs (NGG PAM) or Gs (CCN PAM) range. PBE or A3A-PBE in cooperation with different Cas9 variants (VQR, EQR, VRER, SaCas9, and SaKKH) significantly increased the base editing range of targeted Cs or Gs in the rice genome.
  • Figure 17 Vector construction of Cpf1-based A3A base editor.
  • Figure 18 Base editing of rice endogenous genes using Cpf1-based A3A base editor.
  • FIG. 19 shows the base editing efficiency of a construct containing an A3A mutant (N57G substitution).
  • Figure 20 Shows the effect of NLS on the efficiency of base editing.
  • CRISPR effector protein generally refers to nucleases found in naturally occurring CRISPR systems, as well as modified forms thereof, variants thereof, catalytically active fragments thereof, and the like.
  • the term encompasses any effector protein based on the CRISPR system capable of achieving gene targeting (e.g., gene editing, gene targeting regulation, etc.) within a cell.
  • Examples of CRISPR effector proteins include Cas9 nucleases or variants thereof.
  • the Cas9 nuclease may be a Cas9 nuclease from a different species, such as spCas9 from S. pyogenes or SaCas9 derived from S. aureus.
  • "Cas9 nuclease" and "Cas9" are used interchangeably herein and refer to an RNA-guided nuclease that includes the Cas9 protein or a fragment thereof (e.g., a protein comprising an active DNA cleavage domain of Cas9 and/or a gRNA binding domain of Cas9).
  • Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated systems) genome editing system, which can target and cleave DNA target sequences to form DNA double-strand breaks (DSBs) under the guidance of a guide RNA.
  • CRISPR effector proteins may also include Cpf1 nucleases or variants thereof, such as highly specific variants.
  • The Cpf1 nuclease may be a Cpf1 nuclease from a different species, such as a Cpf1 nuclease from Francisella novicida U112, Acidaminococcus sp. BV3L6, or Lachnospiraceae bacterium ND2006.
  • gRNA and “guide RNA” are used interchangeably and refer to RNA capable of forming a complex with a CRISPR effector protein and being able to target the complex to a target sequence due to a certain complementarity with the target sequence molecule.
  • gRNAs typically consist of crRNA and tracrRNA molecules that are partially complementary and form a complex, where the crRNA contains a sequence that is sufficiently complementary to a target sequence to hybridize with it and direct the CRISPR complex (Cas9 + crRNA + tracrRNA) to bind specifically to the target sequence.
  • sgRNA single guide RNA
  • Cpf1 + crRNA complex RNA
  • "Genome", when used in the context of plant cells, encompasses not only chromosomal DNA present in the nucleus, but also organelle DNA present in subcellular components (such as mitochondria and plastids) of the cell.
  • plant includes the entire plant and any progeny, cells, tissues, or parts of the plant.
  • plant part includes any part of a plant, including, for example, but not limited to: seeds (including mature seeds, immature embryos without seed coats, and immature seeds); plant cuttings; plant cells; Plant cell culture; plant organs (e.g., pollen, embryos, flowers, fruits, buds, leaves, roots, stems, and related explants).
  • a plant tissue or plant organ may be a seed, a callus, or any other plant cell population organized into a structural or functional unit.
  • the plant cell or tissue culture can regenerate a plant having the physiological and morphological characteristics of the plant from which the cell or tissue is derived, and can regenerate a plant having substantially the same genotype as the plant. In contrast, some plant cells are unable to regenerate plants.
  • Regenerable cells in a plant cell or tissue culture can be embryos, protoplasts, meristematic cells, callus, pollen, leaves, anthers, roots, root tips, silk, flowers, nuts, spikes, cobs, shells, or stems.
  • Plant parts include harvestable parts and parts that can be used to propagate offspring plants.
  • Plant parts that can be used for reproduction include, for example, but are not limited to: seeds; fruits; cuttings; seedlings; tubers; and rootstocks.
  • Harvestable parts of a plant can be any useful part of a plant, including, for example, but not limited to: flowers; pollen; seedlings; tubers; leaves; stems; fruits; seeds; and roots.
  • Plant cells are the structural and physiological unit of a plant.
  • plant cells include protoplasts and protoplasts with partial cell walls.
  • Plant cells may be in the form of isolated single cells or cell aggregates (e.g., loose callus and cultured cells), and may be part of higher-level tissue units (e.g., plant tissues, plant organs, and plants).
  • plant cells can be protoplasts, gamete-producing cells, or cells or collections of cells capable of regenerating whole plants.
  • a seed that contains multiple plant cells and is capable of regenerating into a whole plant is considered a "plant part.”
  • protoplast refers to a plant cell with its cell wall completely or partially removed and its lipid bilayer membrane exposed.
  • a protoplast is an isolated plant cell without a cell wall, which has the potential to regenerate a cell culture or a whole plant.
  • Plant "offspring” includes any subsequent generation of a plant.
  • a "genetically modified plant” includes a plant that includes an exogenous polynucleotide or a modified gene or expression control sequence in its genome.
  • exogenous polynucleotides can be stably integrated into the genome and inherited for successive generations.
  • Exogenous polynucleotides can be integrated into the genome individually or as part of a recombinant DNA construct.
  • Modified genes or expression control sequences are those in the plant genome that contain single or multiple deoxynucleotide substitutions, deletions, and additions.
  • A genetically modified plant obtained by the present invention may contain one or more C to T substitutions relative to a wild-type plant (the corresponding plant that is not genetically modified).
  • Exogenous with respect to a sequence means a sequence from a foreign species or, if it is from the same species, a sequence that has been significantly altered in composition and / or locus from its natural form through deliberate human intervention.
  • The terms for nucleic acid sequences are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, which may optionally contain synthetic, non-natural, or altered nucleotide bases.
  • Nucleotides are referred to by their single-letter names as follows: “A” for adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” for cytidine or deoxycytidine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for deoxythymidine, “R” for purine (A or G), “Y” for pyrimidine (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide.
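  • The degenerate single-letter codes above can be applied mechanically, for instance to test whether a stretch of DNA fits a motif such as the NGG or CCN PAM patterns mentioned in the description of Figure 16. The following is a minimal sketch under the assumption that only the codes listed above are needed; the table and function names are hypothetical, and inosine is left out because it is not a degenerate DNA code.

```python
# Degenerate nucleotide codes exactly as listed in the text above.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}

def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base of `sequence` is allowed by `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in CODES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )

print(matches("NGG", "TGG"))  # a sequence fitting an NGG motif
print(matches("NGG", "TGA"))  # a sequence that does not fit
```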
  • Protein refers to polymers of amino acid residues.
  • the term applies to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
  • "Polypeptide" may also include modified forms, including but not limited to glycosylation, lipid linking, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation, and ADP-ribosylation.
  • expression construct refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in a plant. "Expression” refers to the production of a functional product.
  • expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (eg, transcription to produce mRNA or functional RNA) and / or translation of the RNA into a precursor or mature protein.
  • the "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, may be an RNA (such as mRNA) capable of translation.
  • the "expression construct" of the present invention may contain regulatory sequences and nucleotide sequences of interest from different sources, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that which is generally naturally occurring.
  • "Regulatory sequence" and "regulatory element" are used interchangeably and refer to a nucleotide sequence that is located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence and that affects the transcription, RNA processing or stability, or translation of the associated coding sequence.
  • a plant expression regulatory element refers to a nucleotide sequence capable of controlling transcription, RNA processing or stability, or translation of a nucleotide sequence of interest in a plant.
  • Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
  • a “promoter” refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment.
  • the promoter is a promoter capable of controlling gene transcription in a plant cell, whether or not it is derived from a plant cell.
  • the promoter may be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.
  • "Constitutive promoter" refers to a promoter that will generally cause expression of a gene in most cell types in most cases.
  • "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed primarily, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in a specific cell or cell type.
  • Development-regulated promoter refers to a promoter whose activity is determined by developmental events.
  • Inducible promoters selectively express operably linked DNA sequences in response to endogenous or exogenous stimuli (environment, hormones, chemical signals, etc.).
  • "Operably linked" means that a regulatory element (such as, but not limited to, a promoter sequence, a transcription termination sequence, etc.) is linked to a nucleic acid sequence (such as a coding sequence or an open reading frame) such that the transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element.
  • "Introducing" a nucleic acid molecule (e.g., a plasmid, linear nucleic acid fragment, RNA, etc.) or protein into a plant refers to transforming a plant cell with the nucleic acid or protein so that the nucleic acid or protein can function in the plant cell.
  • transformation includes stable transformations and transient transformations.
  • “Stable transformation” refers to the introduction of a foreign nucleotide sequence into the genome of a plant, resulting in stable inheritance of the foreign gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the plant and any successive generations thereof.
  • Transient transformation refers to the introduction of a nucleic acid molecule or protein into a plant cell to perform a function without the stable inheritance of a foreign gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the plant genome.
  • “Trait” refers to a physiological, morphological, biochemical, or physical characteristic of a plant or of specific plant material or cells. In some embodiments, the characteristic may be visible to the naked eye, such as the size of seeds or plants; may be measured using biochemical techniques, such as the content of protein, starch, or oil in seeds or leaves; may be an observable metabolic or physiological process, such as resistance to water stress or to specific salt, sugar, or nitrogen concentrations; may be a detectable level of gene expression; or may be an observable agronomic trait, such as resistance to osmotic stress or yield. In some embodiments, the trait also includes the resistance of the plant to herbicides.
  • “Agronomic traits” are measurable parameters, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content in vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content in vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content in vegetative tissue, drought resistance, nitrogen absorption, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance, and tiller number.
  • the present invention provides a base editing fusion protein comprising a nuclease-inactivated CRISPR effector protein (such as Cas9 and Cpf1, etc.) and APOBEC3A deaminase.
  • the base editing fusion protein comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 12-16.
  • base editing fusion protein and “base editor” are used interchangeably.
  • the invention also provides the use of the base editing fusion protein for base editing of a target sequence in a cell genome.
  • the present invention also provides a system for base editing a target sequence in a cell genome, which comprises at least one of the following i) to v):
  • an expression construct comprising a nucleotide sequence encoding a base editing fusion protein, and a guide RNA;
  • a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
  • an expression construct comprising a nucleotide sequence encoding a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
  • The base-editing fusion protein includes a nuclease-inactivated CRISPR effector protein (such as Cas9 or Cpf1) and APOBEC3A deaminase, and the guide RNA can target the base-editing fusion protein to a target sequence in a cell genome, such that the base-editing fusion protein causes one or more C's in the target sequence to be replaced by T.
  • the APOBEC3A deaminase is a human APOBEC3A deaminase.
  • The APOBEC3A deaminase comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 2, and substantially retains the deaminase activity of SEQ ID NO: 2.
  • The APOBEC3A deaminase comprises one or more (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) amino acid substitutions, deletions, or additions relative to SEQ ID NO: 2, and substantially retains the deaminase activity of SEQ ID NO: 2.
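  • To illustrate how a sequence identity threshold such as "at least 75%" might be checked, the sketch below computes percent identity between two sequences that are assumed to be already aligned to equal length (gaps written as "-"). The text does not specify an alignment method or an identity definition, so both the pre-aligned input and the choice of alignment length as the denominator are assumptions; the function name and the short toy sequences are hypothetical and are not SEQ ID NO: 2.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    identical = sum(
        a == b and a != "-" for a, b in zip(aligned_a, aligned_b)
    )
    return 100.0 * identical / len(aligned_a)

# Toy sequences for illustration only.
reference = "MEASPASG"
variant = "MEATPASG"  # one substitution in eight residues

identity = percent_identity(reference, variant)
print(identity, identity >= 75.0)
```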
  • the human APOBEC3A deaminase comprises the amino acid sequence shown in SEQ ID NO: 2.
  • the APOBEC3A deaminase contains an amino acid substitution at position 57 relative to SEQ ID NO: 2, such as a N57G substitution.
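  • Substitution labels such as N57G follow the usual pattern of wild-type residue, 1-based position, and replacement residue. The following minimal sketch parses such a label and applies it to a sequence while checking that the stated wild-type residue is actually present. The function names and the placeholder sequence are hypothetical; the real SEQ ID NO: 2 sequence is not reproduced here.

```python
import re

def parse_substitution(label: str):
    """Split a label such as 'N57G' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def apply_substitution(sequence: str, label: str) -> str:
    """Return `sequence` with the substitution applied (1-based position)."""
    wild_type, position, mutant = parse_substitution(label)
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError("wild-type residue does not match the sequence")
    return sequence[: position - 1] + mutant + sequence[position:]

# Placeholder sequence with N at position 57, for illustration only.
placeholder = "A" * 56 + "N" + "A" * 3
mutated = apply_substitution(placeholder, "N57G")
print(mutated[56])
```

    The same label format is used elsewhere in this description, for example D10A and H840A for Cas9 or D917A, D908A, and D832A for the Cpf1 variants.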
  • "Nuclease-inactivated CRISPR effector protein" refers to a CRISPR effector protein that lacks double-stranded nucleic acid cleavage activity but still retains gRNA-directed DNA targeting ability.
  • CRISPR effectors that lack double-stranded nucleic acid cleavage activity also encompass nickases, which form nicks in the double-stranded nucleic acid molecule, but do not completely cut the double-stranded nucleic acid.
  • the nuclease-inactivated CRISPR effector protein of the present invention has a nicking enzyme activity.
  • Eukaryotic mismatch repair uses nicks in a DNA strand to direct the removal and repair of mismatched bases in that strand.
  • the U: G mismatch formed by cytidine deaminase may be repaired as C: G.
  • the nuclease-inactivated CRISPR effector protein is nuclease-inactivated Cas9.
  • the DNA cleavage domain of Cas9 nuclease is known to contain two subdomains: the HNH nuclease subdomain and the RuvC subdomain.
  • the HNH subdomain cleaves strands complementary to the gRNA, while the RuvC subdomain cleaves non-complementary strands. Mutations in these subdomains can inactivate the nuclease activity of Cas9, forming "nuclease-inactivated Cas9".
  • the nuclease-inactivated Cas9 still retains gRNA-directed DNA-binding ability. Therefore, in principle, when fused to another protein, the nuclease-inactivated Cas9 can simply target the additional protein to almost any DNA sequence by co-expression with a suitable guide RNA.
  • The nuclease-inactivated Cas9 according to the present invention may be derived from Cas9 of different species, for example, from S. pyogenes Cas9 (SpCas9) or from S. aureus Cas9 (SaCas9). Simultaneously mutating the HNH nuclease subdomain and the RuvC subdomain of Cas9 (for example, with mutations D10A and H840A) inactivates the nuclease activity of Cas9, yielding nuclease-dead Cas9 (dCas9). A mutation that inactivates only one of the subdomains gives Cas9 nickase activity, i.e., yields a Cas9 nickase (nCas9), for example, nCas9 with only the D10A mutation.
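  • The relationship between the two inactivating mutations and the resulting Cas9 form can be summarized as a small decision rule. The sketch below is illustrative only and covers just the SpCas9 mutations D10A and H840A named above; the function name is hypothetical.

```python
# D10A lies in the RuvC subdomain and H840A in the HNH subdomain of SpCas9.
INACTIVATING_MUTATIONS = {"D10A", "H840A"}

def classify_cas9(mutations) -> str:
    """Classify a Cas9 variant by how many cleavage subdomains are mutated."""
    hits = INACTIVATING_MUTATIONS & set(mutations)
    if len(hits) == 2:
        return "dCas9"  # both subdomains inactivated: no DNA cleavage
    if len(hits) == 1:
        return "nCas9"  # one subdomain inactivated: nickase
    return "Cas9"       # neither mutation present: cleaves both strands

for variant in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
    print(variant, classify_cas9(variant))
```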
  • SpCas9 S. pyogenes Cas9
  • SaCas9 S. aureus Cas9
  • the nuclease-inactivated Cas9 of the invention comprises the amino acid substitutions D10A and / or H840A relative to wild-type Cas9.
  • nuclease-inactivated Cas9 may further include additional mutations.
  • nuclease-inactivated SpCas9 may also contain EQR, VQR or VRER mutations and SaCas9 may also contain KKH mutations (Kim et al. Nat. Biotechnol. 35, 371-376.).
  • the nuclease-inactivated SpCas9 comprises the amino acid sequence shown in SEQ ID NO: 4.
  • the nuclease-inactivated CRISPR effector protein is nuclease-inactivated Cpf1.
  • Cpf1 contains a DNA cleavage domain (RuvC), which can be mutated to make the DNA cleavage activity of Cpf1 absent, resulting in "Cpf1 without DNA cleavage activity".
  • the Cpf1 lacking the DNA cutting activity still retains the gRNA-directed DNA-binding ability. Therefore, in principle, when fused to another protein, Cpf1 with a loss of DNA cleavage activity can simply target the additional protein to almost any DNA sequence by co-expression with a suitable guide RNA.
  • The Cpf1 lacking DNA cleavage activity in the present invention can be derived from Cpf1 of different species, for example, the Cpf1 proteins derived from Francisella novicida U112, Acidaminococcus sp. BV3L6, and Lachnospiraceae bacterium ND2006, which are called FnCpf1, AsCpf1, and LbCpf1, respectively.
  • the Cpf1 lacking in DNA cleavage activity is FnCpf1 lacking in DNA cleavage activity.
  • the FnCpf1 lacking DNA cleavage activity comprises a D917A mutation relative to the wild-type FnCpf1.
  • the Cpf1 lacking in DNA cleavage activity is AsCpf1 lacking in DNA cleavage activity.
  • the AsCpf1 lacking DNA cleavage activity comprises a D908A mutation relative to the wild-type AsCpf1.
  • the Cpf1 lacking in DNA cleavage activity is LbCpf1 lacking in DNA cleavage activity.
  • the LbCpf1 lacking DNA cutting activity comprises a D832A mutation relative to the wild-type LbCpf1.
  • the APOBEC3A deaminase is fused to the N-terminus of the nuclease-inactivated CRISPR effector protein (such as nuclease-inactivated Cas9 or Cpf1).
  • the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein are fused via a linker.
  • The linker may be a non-functional amino acid sequence without secondary structure that is 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20-25, or 25-50) or more amino acids in length.
  • the linker may be a flexible linker, such as GGGGS, GS, GAP, (GGGGS) x3, GGS, (GGS) x7, and the like.
  • the linker is 32 amino acids in length.
  • the linker is an XTEN linker shown in SEQ ID NO: 3.
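  • The repeat notation used for the flexible linkers above, such as (GGGGS) x3 and (GGS) x7, can be expanded mechanically. The short sketch below expands the named repeats and reports their lengths against the 1-50 amino acid range given as an example in the text (the text also allows longer linkers, so the range check is illustrative rather than a requirement). The function names are hypothetical, and the XTEN linker of SEQ ID NO: 3 is not reproduced.

```python
def repeat_linker(unit: str, count: int) -> str:
    """Expand notation such as (GGGGS) x3 into an amino acid sequence."""
    return unit * count

def within_example_range(linker: str, shortest: int = 1, longest: int = 50) -> bool:
    """Check a linker against the 1-50 amino acid example range."""
    return shortest <= len(linker) <= longest

# Flexible linkers listed in the text.
linkers = {
    "GGGGS": repeat_linker("GGGGS", 1),
    "GS": repeat_linker("GS", 1),
    "GAP": repeat_linker("GAP", 1),
    "(GGGGS) x3": repeat_linker("GGGGS", 3),
    "GGS": repeat_linker("GGS", 1),
    "(GGS) x7": repeat_linker("GGS", 7),
}

for name, sequence in linkers.items():
    print(name, sequence, len(sequence), within_example_range(sequence))
```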
  • uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), resulting in the repair of U: G to C: G. Therefore, without being bound by any theory, the inclusion of a uracil DNA glycosylase inhibitor in the base editing fusion protein of the present invention or the system of the present invention will be able to increase the efficiency of base editing.
  • the base editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI).
  • UGI uracil DNA glycosylase inhibitor
  • the uracil DNA glycosylase inhibitor comprises the amino acid sequence shown in SEQ ID NO: 5.
  • the base editing fusion protein of the invention further comprises a Gam protein.
  • the amino acid sequence is as shown in SEQ ID NO: 6.
  • the base editing fusion protein of the invention further comprises a nuclear localization sequence (NLS).
  • NLS nuclear localization sequence
  • One or more NLS in the base editing fusion protein should have sufficient strength to drive accumulation of the base editing fusion protein in the nucleus of a plant cell in an amount that can achieve its base editing function.
  • the strength of nuclear localization activity is determined by the number, position of NLS in the base editing fusion protein, one or more specific NLS used, or a combination of these factors.
  • The NLS of the base editing fusion protein of the invention may be located at the N-terminus and/or C-terminus. In some embodiments of the present invention, the NLS of the base editing fusion protein of the present invention may be located between APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein. In some embodiments, the base editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLS.
  • the base editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLS at or near the N-terminus. In some embodiments, the base editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLS at or near the C-terminus. In some embodiments, the base editing fusion protein comprises a combination of these, such as including one or more NLS at the N-terminus and one or more NLS at the C-terminus. When there is more than one NLS, each can be selected to be independent of other NLS. In some embodiments of the invention, the base editing fusion protein comprises at least 2 NLS, for example, the at least 2 NLS are located at the C-terminus.
  • the NLS is located at the C-terminus of the base editing fusion protein. In some embodiments, the base editing fusion protein comprises at least 3 NLS. In some embodiments, the base editing fusion protein does not include NLS at the N-terminus and / or between the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein.
  • NLS consists of one or more short sequences of positively charged lysine or arginine exposed on the surface of a protein, but other types of NLS are also known.
  • Non-limiting examples of NLS include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3'), PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or 5'-CCAAAGAAGAAGAGGAAGGTT-3'), or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3').
  • the N-terminus of the base editing fusion protein comprises an NLS of the amino acid sequence shown by PKKKRKV.
  • the C-terminus of the base editing fusion protein comprises an NLS of the amino acid sequence shown by KRPAATKKAGQAKKKK.
  • efficiency is higher when the C-terminus of the base editing fusion protein comprises an NLS having the amino acid sequence PKKKRKV.
  • the base editing fusion protein of the present invention may further include other localization sequences, such as a cytoplasmic localization sequence, a chloroplast localization sequence, a mitochondrial localization sequence, and the like.
  • the base editing fusion protein comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 12-16.
  • the nucleotide sequence encoding the base editing fusion protein is codon optimized for the plant to be subjected to base editing.
  • Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently or most frequently in the genes of that host cell, while maintaining the native amino acid sequence. Different species exhibit particular preferences for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) is often correlated with the translation efficiency of messenger RNA (mRNA), which in turn is believed to depend on the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules.
  • Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura, Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res. 28: 292 (2000).
  • the base editing fusion protein is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NOs: 7-11.
  • the guide RNA is a single guide RNA (sgRNA).
  • Methods for constructing suitable sgRNAs from a given target sequence are known in the art. For example, see: Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951 (2014); Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013); Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J. Genet. Genomics 41, 63-68 (2014).
  • the guide RNA is esgRNA.
  • the construction of the esgRNA can refer to Li, C. et al. Genome Biol. 19, 59 (2018).
  • the nucleotide sequence encoding a base editing fusion protein and / or the nucleotide sequence encoding a guide RNA is operably linked to a plant expression regulatory element such as a promoter.
  • Examples of promoters include, but are not limited to, the cauliflower mosaic virus 35S promoter (Odell et al. (1985) Nature 313: 810-812), the corn Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the corn U3 promoter, the rice actin promoter, the TrpPro5 promoter (U.S. Patent Application No. 10/377,318; filed on March 16, 2005), the pEMU promoter (Last et al. (1991) Theor. Appl. Genet. 81: 581-588), the MAS promoter (Velten et al. (1984) EMBO J.
  • Promoters useful in the present invention also include commonly used tissue-specific promoters reviewed in Moore et al. (2006) Plant J. 45 (4): 651-683.
  • sgRNA that can be used in the present invention may be obtained by tRNA self-cleavage (Zhang et al. (2017) Genome Biology 18: 191).
  • the present invention provides a method for generating a genetically modified organism, which comprises introducing the system of the present invention for base editing of a target sequence in a cell genome into a cell of the organism, whereby the guide RNA targets the base editing fusion protein to a target sequence in the genome of the organism's cell, causing one or more Cs in the target sequence to be replaced by T.
  • the organism is a plant.
  • the design of target sequences that can be recognized and targeted by the Cas9 and guide RNA complex is within the skill of those of ordinary skill in the art.
  • the design of the target sequence or crRNA coding sequence that can be recognized and targeted by the Cpf1 protein and guide RNA (ie crRNA) complex can refer to, for example, Zhang et al., Cell 163, 1-13, October 22, 2015.
  • the target sequence is a sequence complementary to the guide sequence of about 20 nucleotides contained in the guide RNA, and its 3' end is immediately adjacent to the protospacer adjacent motif (PAM) NGG.
  • the target sequence has the following structure: 5'-N_X-NGG-3', where each N is independently selected from A, G, C, and T; X is an integer with 14 ≤ X ≤ 30; N_X represents X consecutive nucleotides; and NGG is the PAM sequence. In some preferred embodiments of the invention, X is 20.
  • the window for base editing is located at positions 1-17 of the target sequence. That is, the system of the present invention can make one or more C in the range of 1-17 positions from the 5 'end of the target sequence replaced by T.
  • it further comprises screening for an organism such as a plant having a desired nucleotide substitution.
  • Nucleotide substitutions in organisms such as plants can be detected by T7EI, PCR / RE or sequencing methods. For example, see Shan, Q., Wang, Y., Li, J. & Gao, C. Genome editing and wheat using the CRISPR / Cas system. Nat. Protoc. 9, 2395-2410 (2014).
  • the target sequence to be modified may be located at any position in the genome, for example, in a functional gene such as a protein coding gene, or may be located in a gene expression regulatory region such as a promoter region or an enhancer region, thereby achieving Modification of gene function or modification of gene expression.
  • C to T base editing in the target sequence of the cell can be detected by T7EI, PCR / RE or sequencing methods.
  • the base editing system can be introduced into cells by various methods well known to those skilled in the art.
  • Methods that can be used to introduce the genome editing system of the present invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, liposome transfection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus, and other viruses), particle bombardment, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
  • Cells that can be subjected to genome editing by the methods of the present invention can be from, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, geese; and plants, including monocotyledons and dicotyledons, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis, and the like.
  • the method of the invention is particularly suitable for producing genetically modified plants, such as crop plants.
  • the base editing system can be introduced into a plant by various methods well known to those skilled in the art. Methods that can be used to introduce the base editing system of the present invention into plants include, but are not limited to: the gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method, and ovary injection.
  • the base editing system is introduced into a plant by transient transformation.
  • the base editing fusion protein and guide RNA only need to be introduced or produced in a plant cell to modify the target sequence, and the modification can be stably inherited without the need to stably transform the plant with the base editing system. This avoids the potential off-target effects of a stably integrated base editing system, and also avoids the integration of foreign nucleotide sequences into the plant genome, thereby providing higher biological safety.
  • the introduction is performed in the absence of selection pressure, thereby avoiding the integration of exogenous nucleotide sequences in the plant genome.
  • the introducing comprises transforming the base editing system of the present invention into an isolated plant cell or tissue, and then regenerating the transformed plant cell or tissue into a whole plant.
  • the regeneration is performed in the absence of selection pressure, that is, no selection agent for the selection gene carried on the expression vector is used during tissue culture. Without the use of a selection agent, the regeneration efficiency of the plant can be improved, and a modified plant can be obtained without an exogenous nucleotide sequence.
  • the base editing system of the present invention can be transformed into specific parts on intact plants, such as leaves, stem tips, pollen tubes, young ears or hypocotyls. This is particularly suitable for the transformation of plants which are difficult to carry out tissue culture regeneration.
  • proteins expressed in vitro and / or RNA molecules transcribed in vitro are directly transformed into the plant.
  • the protein and / or RNA molecule can realize base editing in the plant cell, and then be degraded by the cell, avoiding the integration of the exogenous nucleotide sequence in the plant genome.
  • genetic modification and breeding of plants using the methods of the present invention can result in plants without foreign DNA integration, i.e. transgene-free modified plants.
  • the base editing system of the present invention has high specificity (low off-target rate) when performing base editing in plants, which also improves biosafety.
  • Plants that can be base edited by the method of the present invention include monocotyledons and dicotyledons.
  • the plant may be a crop plant such as wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.
  • the target sequence is related to a plant trait, such as an agronomic trait, whereby the base editing results in the plant having an altered trait relative to a wild-type plant.
  • the target sequence to be modified may be located at any position in the genome, for example, in a functional gene such as a protein coding gene, or may be located in a gene expression regulatory region such as a promoter region or an enhancer region, thereby achieving Modification of gene function or modification of gene expression.
  • the C to T substitution results in an amino acid substitution in the target protein.
  • the C to T substitution results in a change in the expression of the target gene.
  • the method further comprises obtaining progeny of the genetically modified plant.
  • the present invention also provides a genetically modified plant or progeny or part thereof, wherein the plant is obtained by the above-mentioned method of the present invention.
  • the genetically modified plant or its progeny or parts thereof are non-transgenic.
  • the present invention also provides a plant breeding method, which comprises crossing a genetically modified first plant obtained by the above-mentioned method of the present invention with a second plant that does not contain the genetic modification, thereby combining the genetically modified first plant The genetic modification is introduced into the second plant.
  • the protoplasts used in the present invention are derived from the winter wheat variety Kenong199, the japonica rice variety Zhonghua 11 and the potato variety Désirée.
  • the rAPOBEC1 in the plant nCas9-PBE system (hereinafter referred to as PBE) (Zong, Y. et al. Nat. Biotechnol. 35, 438-440 (2017)) was replaced with human APOBEC3A (hereinafter referred to as A3A) and codon-optimized for cereal plants (Figure 1b), yielding A3A-PBE.
  • each of the plant base editor constructs (PBE, A3A-PBE, and A3A-Gam) was co-transfected into rice protoplasts together with pUbi-BFPm and pOsU3-BFP-sgRNA.
  • Example 2 A3A-PBE mutation efficiency and editing window verification in wheat and rice cells
  • sgRNAs were designed for four target sites in three wheat genes (TaALS, TaMTL, TaLOX2-T1, and TaLOX2-T2), and one sgRNA each was designed for six rice genes (OsAAT-T1, OsCDC48, OsDEP1, OsPDS, OsNRT1.1B-T1, OsOD and OsEV) (Figure 3a-b and Table 1).
  • deletion and / or insertion mutations were generated using wild-type Cas9 (WT Cas9).
  • the underlined C / G bases are those edited by PBE, A3A-PBE, and A3A-Gam.
  • the PAM motifs in each target sequence are shown in bold.
  • A3A-PBE showed the highest gene editing efficiency, with editing frequencies of 0.3-36.9% in wheat and 0.5-31.1% in rice (Figure 3a-b).
  • the average editing efficiency of A3A-PBE at the 10 target sites was 13.1%, about 13 times and 5 times the average efficiency of PBE (1%) and of A3A-Gam (2.8%), respectively.
  • base editing efficiency at these target sites increased in the order PBE < A3A-Gam < A3A-PBE, which is consistent with the results of the reporter system (Figure 2a-b).
  • A3A-PBE did not induce unintended edits (<0.1%) at any wheat or rice genomic target locus, and its indel frequency (<0.1%) was significantly lower than that of wild-type Cas9 (WT Cas9) (2.2-21.6%) (Figures 5-10).
  • Example 3 A3A-PBE mutation efficiency in tetraploid potatoes and editing window verification
  • Tetraploid inheritance has made the study of potatoes and breeding by traditional crosses a challenge (Obidiegwu, JE, Flath, K. and Gebhardt, C. Theor. Appl. Genet. 127, 763-780 (2014)).
  • This example uses A3A-PBE in a tetraploid potato (Solanum tuberosum).
  • the A3A-PBE and PBE fusion proteins are driven by the 35S promoter, and the sgRNA is driven by the AtU6 promoter (Figure 11a).
  • the sgRNA was co-transformed into potato protoplasts along with the A3A-PBE or PBE construct, and base editing-induced mutations were detected 48 hours after transfection.
  • the average editing efficiency of PBE at these 10 target sites was 0.4% ( Figure 3c). Observing the C to T conversion of A3A-PBE at these 10 target sites, the average efficiency (4.3%) was about 11 times higher than PBE.
  • A3A-PBE provides higher C-to-T mutation efficiency and a wider editing window than PBE at multiple loci in wheat, rice, and potato cells.
  • Example 4 Testing A3A-PBE at high-GC sites within endogenous plant genes
  • A3A-PBE is more advantageous for targeted mutations. All in all, A3A-PBE can edit cytidine almost equally regardless of the sequence context, which is superior to PBE ( Figure 11b). Given the reduced requirements for target cytosine flanking sequences, this technology will improve the targeting range and thus be more conducive to generating point mutations.
  • Example 5 Investigate whether A3A-PBE can produce diversified mutations when combined with multiple sgRNAs
  • the TaVRN1-A1 promoter contains multiple regulatory sites, such as the VRN box, CArG box, and a putative AG hybridization box (Figure 11c). Mutations in these binding sites can affect wheat flowering time (Chengxia, L. and Jorge, D. Plant J. 55, 543-554 (2008); Kippes, N. et al. Proc. Natl. Acad. Sci. USA 112, E5401-E5410 (2015)).
  • Three sgRNAs were designed to target relevant binding sites (Figure 11c).
  • the TaVRN1 target sites were amplified, and reads carrying different mutations in these six cis-elements were identified, with efficiencies ranging from 1.2% to 27.7%.
  • A3A-PBE effectively edited C nucleotides at positions 4 to 16 of the sgRNA target sequence, which is sufficient to disrupt binding of the bZIP transcription factor (Figure 11c) (Chengxia, L. and Jorge, D. Plant J. 55, 543-554 (2008); Kippes, N. et al. Proc. Natl. Acad. Sci. USA 112, E5401-E5410 (2015)).
  • acetolactate synthase gene (ALS) in wheat, which is the first enzyme in the branched chain amino acid biosynthetic pathway.
  • Substitution of the conserved P197 amino acid of Lolium rigidum ALS with other amino acids can make the grass species resistant to the herbicide nicosulfuron (Powles, S.B. and Yu, Q. Annu. Rev. Plant Biol. 61, 317-347 (2010)).
  • P197 in Lolium rigidum corresponds to P174 in the hexaploid wheat target site TaALS.
  • A3A-PBE and pTaU6-ALS-sgRNA constructs were transferred into immature wheat embryos by gene gun method, and plants were regenerated without the use of herbicides or resistance screening.
  • as identified by PCR-RE and Sanger sequencing, 27 mutant plants containing at least one C-to-T substitution were regenerated from 120 transformed immature embryos, a mutation efficiency of 22.5% (27/120) (Figure 12a, Figure 13), about 4-10 times more efficient than previously reported CRISPR/Cas9-mediated gene knockouts or point mutations.
  • C-to-T substitutions were found at positions -7, 6, 7, 8, 9, 10, 12 and 13 of the protospacer (Figures 12a and 13).
  • the online tool CRISPR-P was used to predict potential off-target regions, and OsCDC48 and OsNRT1.1B-T2 off-target sites were identified and detected in the rice genome.
  • A3A-PBE without UGI (A3A-PBE-ΔUGI) protein was expressed and purified in E. coli (Figure 15).
  • In the absence of UGI, the fusion protein is less toxic to plant cells, easier to purify, and can increase the likelihood that a C nucleotide is converted to any of the other three nucleotides.
  • the A3A-PBE-ΔUGI protein forms a ribonucleoprotein complex with sgRNA transcribed in vitro, and complexes targeting two wheat genes (TaMTL and TaLOX2-T5) were delivered into protoplasts (Figure 16a and Table 1).
  • Plant A3A-PBE-ΔUGI RNP can be further optimized to produce non-transgenic mutant plants, which can facilitate the application of base editing in the breeding and commercialization of improved crop plants.
  • A3A was also mutated: the asparagine (N) at position 57 was replaced by glycine (G) (N57G substitution) to construct the A3A-PBE-N57G fusion protein.
  • A3A-PBE, A3A-PBE-N57G and A3A-PBE-ΔUGI were transformed into wheat and rice protoplasts, and base editing was performed for different genes. The results are shown in Figure 19.
  • A3A-PBE-N57G and A3A-PBE-ΔUGI can have higher editing efficiency at some sites.
  • nCas9 in the aforementioned A3A base editor is replaced with a nuclease-inactivated Cpf1 protein.
  • Vector construction is shown in Figure 17.
  • the obtained Cpf1-based A3A base editor was used to edit the endogenous target gene rice DEP1, and the mutation efficiency at the tenth position C was detected.
  • the results are shown in Figure 18.
  • the results show that compared with APOBEC1, human APOBEC3A can significantly improve base editing efficiency.
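
The fold changes and percentages quoted in the examples above can be checked with simple arithmetic. The following is a minimal sketch, not part of the disclosed method; the function names are hypothetical, and the only inputs are the averages and counts stated in the text.

```python
def fold_change(value: float, reference: float) -> float:
    """Ratio of two average editing efficiencies given in the same units."""
    if reference <= 0:
        raise ValueError("reference efficiency must be positive")
    return value / reference


def mutation_efficiency(mutants: int, total: int) -> float:
    """Percentage of mutant plants among the transformed embryos."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * mutants / total


# Averages (percent) and counts as quoted in the text.
a3a_vs_pbe = fold_change(13.1, 1.0)      # wheat/rice: text says about 13 times
a3a_vs_gam = fold_change(13.1, 2.8)      # wheat/rice: text says about 5 times
potato = fold_change(4.3, 0.4)           # potato: text says about 11 times
taals = mutation_efficiency(27, 120)     # text says 22.5% (27/120)

print(round(a3a_vs_pbe), round(a3a_vs_gam), round(potato), taals)
```

Rounded to whole numbers, the ratios agree with the "13 times", "5 times" and "about 11 times" figures given above.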

Abstract

Provided are a base editor based on human APOBEC3A deaminase and uses thereof, wherein the editor can mediate efficient C-to-T nucleotide substitutions.

Description

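Base editor based on human APOBEC3A deaminase and uses thereof
Technical Field
The present invention relates to the field of genetic engineering. Specifically, the present invention relates to a base editor based on human APOBEC3A deaminase and uses thereof, in particular the use of the editor in plant base editing, wherein the editor can mediate efficient C-to-T nucleotide substitutions.
Background of the Invention
At present, a large number of single-nucleotide variants associated with important agronomic traits have been identified and applied to crop improvement (Zhao, K. et al. Nat. Commun. 2, 467 (2011); Henikoff, S. and Comai, L. Annu. Rev. Plant Biol. 54, 375–401 (2003)). Genetic engineering of single-nucleotide polymorphisms in plants represents a major advance in molecular breeding (Voytas, D.F. and Gao, C. PLoS Biol. 12, e1001877 (2014); Gao, C. Nat. Rev. Mol. Cell Biol. 19, 275-276 (2018)).
The recently developed base editor (BE) technology enables single-nucleotide genome modification in many species, including plants, without introducing DNA double-strand breaks (DSBs), exogenous donor DNA templates, or unwanted indels (Hess, G.T. et al. Mol. Cell 68, 26-43 (2017); Yang, B. et al. J. Genet. Genomics 44, 423-437 (2017)). This technology can complement HDR technology and circumvent some of its limitations. The most widely used cytidine base editor, BE3, consists of the cytosine deaminase APOBEC1 fused to a Cas9 nickase (nCas9(D10A)) and the uracil glycosylase inhibitor UGI (Komor, A.C. et al. Nature 533, 420–424 (2016)), and directly produces C-to-T point mutations in genomic DNA targets.
BE3 has been modified to broaden its range of usable PAMs and to improve its editing efficiency and specificity (Kim, Y.B. et al. Nat. Biotechnol. 35, 371-376 (2017); Komor, A.C. et al. Sci. Adv. 3, eaao4774 (2017); Kim, K. et al. Nat. Biotechnol. 35, 435-437 (2017); Rees, H.A. et al. Nat. Commun. 8, 15790 (2017); Gehrke, J.M. et al. bioRxiv 273938. doi:10.1101/273938 (2018); St Martin, A. et al. Nucleic Acids Res. 9. doi:10.1093/nar/gky332 (2018)). However, despite this progress, current BE3 editors are limited by a narrow deamination window of five base pairs, which makes them inefficient at some target sites, and efficiency generally drops when the target C is far from position 7. In addition, BE3 shows a clear preference for TC dinucleotides, while its editing activity on GC dinucleotides is markedly reduced or even undetectable. Both limitations hamper precise and diversified mutagenesis, so base editor technology needs further improvement.
Summary of the Invention
The present invention includes a novel base editor system, A3A-PBE, which can efficiently introduce C-to-T substitutions at a broad range of endogenous genomic sites within a deamination window of 17 bp. A3A-BE3 works efficiently in high-GC contexts and in highly methylated regions, generating diverse mutations in both coding and non-coding regions. This makes the A3A-BE3 base editing system an attractive new tool for producing valuable precise mutations and diversified mutants in plant breeding, and helps improve the efficiency of crop improvement through genome engineering.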
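Brief Description of the Drawings
Figure 1: Comparison of the C-to-T base editing efficiency of A3A-PBE and PBE. a: The range of cytosine bases edited by A3A-PBE. b: Schematic of the three cytosine base editor constructs.
Figure 2: Comparison of the C-to-T base editing efficiency of A3A-PBE and PBE. a: Flow cytometry plots of BFP-to-GFP conversion in rice using the three cytosine base editors. Protoplasts after transformation with each cytosine base editor together with pUbi-BFPm and pOsU3-BFP-sgRNA. GFP and untreated protoplast samples were used as controls. Scale bar, 150 μm. b: Frequency (%) of C-to-T substitution in the target region of the BFP coding sequence, measured by flow cytometry (FCM). Data are from three independent biological replicates; all values are mean ± standard error. ****P<0.0001.
Figure 3: Comparison of the C-to-T base editing efficiency of A3A-PBE and PBE. a: Frequency of targeted single C-to-T substitutions introduced by PBE, A3A-PBE and A3A-Gam at 4 target sites in wheat protoplasts. b: Frequency of targeted single C-to-T substitutions introduced by PBE, A3A-PBE and A3A-Gam at 6 target sites in rice protoplasts. c: Frequency of targeted single C-to-T substitutions introduced by PBE and A3A-PBE at 10 target sites in potato protoplasts. Untreated protoplast samples were used as controls. Data are from three independent biological replicates (n=3); each frequency was calculated as mean ± standard error.
Figures 4 and 5: Product purity of cytosine base editing at wheat genomic loci. Shown are the product distributions and insertion frequencies at four representative wheat genomic DNA sites in wheat protoplasts treated with PBE, A3A-PBE and A3A-Gam. A total of 19,000-140,000 sequencing reads were used for each position.
Figures 6, 7 and 8: Product purity of cytosine base editing at rice genomic loci. Shown are the product distributions and insertion frequencies at six representative rice genomic DNA sites in rice protoplasts treated with PBE, A3A-PBE and A3A-Gam. A total of 25,000-131,000 sequencing reads were used for each position.
Figure 9: Indel frequencies at ten target sites in the wheat and rice genomes. Indel frequencies induced by PBE, A3A-PBE, A3A-Gam and Cas9 were measured. Data are from three independent biological replicates (n=3); each frequency was calculated as mean ± standard error.
Figure 10: Comparison of the C-to-T base editing efficiency of the A3A-PBE and PBE base editors in potato protoplasts. (a) Schematic of the two cytosine base editors and the sgRNA vector. (b) sgRNA sequences targeting StALS and StGBSS. C bases in the deamination window are highlighted in blue. PAM sequences are shown in red. (c) Indel frequencies at ten potato target sites. Indel frequencies induced by PBE, A3A-PBE and Cas9 with the respective sgRNAs. Data are from three independent biological replicates (n=3); each frequency was calculated as mean ± standard error.
Figure 11: Broad applicability of A3A-PBE in C-to-T base editing. a: Comparison of C-to-T base substitution efficiency in high-GC contexts using the A3A-PBE and PBE base editors. b: Effect of sequence context on base editing efficiency when using PBE (window of positions 3-9) and A3A-PBE (window of positions 1-17). Frequencies (mean ± standard error) were calculated using the data in Figures 3a-b and 11a. c: Frequency of targeted single C-to-T substitutions introduced by A3A-PBE in cis-elements of the TaVRN1-A1 promoter.
Figure 12: Broad applicability of A3A-PBE in C-to-T base editing. a: Mutation frequencies induced by A3A-PBE in T0 wheat, rice and potato. b: Amino acid substitution in TaALS confers herbicide resistance. Amino acid sequence alignment of wild-type (WT) TaALS and the T0-7 mutant TaALS. Phenotype of T0-7 after three weeks of growth on regeneration medium supplemented with 0.254 ppm nicosulfuron. Scale bar, 1 cm.
Figure 13: Identification and analysis of wheat seedlings carrying A3A-PBE-targeted C-to-T substitutions. (a) sgRNA sequence targeting a conserved region in an exon of the TaALS homoeologs. C bases in the deamination window are highlighted in red. The protospacer-adjacent motif (PAM) sequence is highlighted in bold, and the EcoO109I restriction site is underlined. (b) PCR-RE analysis of 10 representative taals mutants. Lanes T0-1 to T0-10 show the amplified PCR fragments of independent wheat plants after digestion with EcoO109I. Lanes labeled WT/D and WT/U are PCR fragments amplified from wild-type (WT) plants with and without EcoO109I digestion, respectively. Bands marked by arrows indicate positive base editing.
Figure 14: Constructs used for TaALS and TaMTL base editing, and detection of transgene integration in the resulting T0 mutants. (a) Maps of the A3A-PBE and pTaU6-sgRNA vectors used for TaALS and TaMTL base editing. The positions of the 5 primer pairs (F1/R1, F2/R2, F3/R3, F4/R4 and F5/R5) used to detect transgene integration are shown. (b) Results of the transgene integration assay using the 5 primer pairs on 10 representative taals mutant plants and 10 tamtl mutants. In four TaALS mutants (T0-3, T0-5, T0-6 and T0-7) and six TaMTL mutants (T0-1, T0-2, T0-3, T0-5, T0-6 and T0-9), none of the 5 primer pairs produced the expected PCR amplicon, indicating that these plants are non-transgenic. Genomic DNA extracted from wild-type wheat plants (cv Kenong 199) was used as the negative control. A3A-PBE or pTaU6-sgRNA plasmid DNA was used as the positive control.
Figure 15: SDS-PAGE analysis of purified A3A-PBE-ΔUGI protein. 3 μg of purified protein was separated by 10% SDS-PAGE and visualized by Coomassie blue staining.
Figure 16: Broad applicability of A3A-PBE in C-to-T base editing. a: Comparison of C-to-T base editing efficiency using A3A-PBE-ΔUGI (DNA) and A3A-PBE-ΔUGI (RNP). Untreated protoplast samples were used as controls. Data are from three independent biological replicates (n=3); each frequency was calculated as the mean. b: Bioinformatic analysis of the range of Cs (NGG PAM) or Gs (CCN PAM) targetable by PBE and A3A-PBE in the rice genome. Combining PBE or A3A-PBE with different Cas9 variants (VQR, EQR, VRER, SaCas9 and SaKKH) significantly increases the range of Cs or Gs targetable by base editing in the rice genome.
Figure 17: Vector construction of the Cpf1-based A3A base editor.
Figure 18: Base editing of an endogenous rice gene using the Cpf1-based A3A base editor.
Figure 19: Base editing efficiency of constructs containing the A3A mutant (N57G substitution).
Figure 20: Effect of NLS on base editing efficiency.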
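Detailed Description of the Invention
I. Definitions
In the present invention, unless otherwise stated, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Moreover, the terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are terms and routine procedures widely used in the respective fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). For a better understanding of the present invention, definitions and explanations of relevant terms are provided below.
As used herein, the term "CRISPR effector protein" generally refers to a nuclease present in a naturally occurring CRISPR system, as well as modified forms, variants and catalytically active fragments thereof, and the like. The term encompasses any effector protein based on a CRISPR system that is capable of gene targeting (e.g. gene editing, targeted gene regulation) within a cell.
Examples of "CRISPR effector proteins" include the Cas9 nuclease or variants thereof. The Cas9 nuclease may be a Cas9 nuclease from a different species, for example spCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 derived from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein and refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (e.g. a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated system) genome editing system, and can, under the guidance of a guide RNA, target and cleave a DNA target sequence to form a DNA double-strand break (DSB).
Examples of "CRISPR effector proteins" may also include the Cpf1 nuclease or variants thereof, such as high-specificity variants. The Cpf1 nuclease may be a Cpf1 nuclease from a different species, for example the Cpf1 nucleases from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.
As used herein, "gRNA" and "guide RNA" are used interchangeably and refer to an RNA molecule that can form a complex with a CRISPR effector protein and, owing to a degree of complementarity with the target sequence, can target the complex to the target sequence. For example, in Cas9-based gene editing systems, the gRNA typically consists of a crRNA and a tracrRNA molecule that form a complex through partial complementarity, wherein the crRNA comprises a sequence that is sufficiently complementary to the target sequence to hybridize with it and direct sequence-specific binding of the CRISPR complex (Cas9+crRNA+tracrRNA) to the target sequence. However, it is known in the art that a single guide RNA (sgRNA) can be designed that combines the features of both crRNA and tracrRNA. In Cpf1-based genome editing systems, the gRNA typically consists only of a mature crRNA molecule, wherein the crRNA comprises a sequence that is sufficiently identical to the target sequence to hybridize with the complement of the target sequence and direct sequence-specific binding of the complex (Cpf1+crRNA) to the target sequence. Designing a suitable gRNA sequence based on the CRISPR effector protein used and the target sequence to be edited is within the ability of those skilled in the art.
When used in reference to a plant cell, "genome" encompasses not only the chromosomal DNA present in the nucleus but also the organelle DNA present in subcellular components of the cell (e.g. mitochondria, plastids).
As used herein, the term "plant" includes whole plants and any progeny, cells, tissues or parts of plants. The term "plant part" includes any part of a plant, including, for example but not limited to: seeds (including mature seeds, immature embryos without seed coats, and immature seeds); plant cuttings; plant cells; plant cell cultures; and plant organs (e.g. pollen, embryos, flowers, fruits, shoots, leaves, roots, stems, and related explants). A plant tissue or plant organ may be a seed, callus, or any other population of plant cells organized into a structural or functional unit. A plant cell or tissue culture can regenerate a plant having the physiological and morphological characteristics of the plant from which the cell or tissue was derived, and can regenerate a plant having substantially the same genotype as that plant. In contrast, some plant cells are not capable of regenerating plants. Regenerable cells in a plant cell or tissue culture may be embryos, protoplasts, meristematic cells, callus, pollen, leaves, anthers, roots, root tips, silks, flowers, kernels, ears, cobs, husks, or stems.
Plant parts include harvestable parts and parts useful for propagating progeny plants. Plant parts useful for propagation include, for example but not limited to: seeds; fruits; cuttings; seedlings; tubers; and rootstocks. A harvestable part of a plant may be any useful part of the plant, including, for example but not limited to: flowers; pollen; seedlings; tubers; leaves; stems; fruits; seeds; and roots.
A plant cell is the structural and physiological unit of a plant. As used herein, plant cells include protoplasts and protoplasts with a partial cell wall. A plant cell may be in the form of an isolated single cell or a cell aggregate (e.g. friable callus and cultured cells), and may be part of a higher-order organized unit (e.g. a plant tissue, plant organ, or plant). Thus, a plant cell may be a protoplast, a gamete-producing cell, or a cell or collection of cells capable of regenerating into a whole plant. Accordingly, in the embodiments herein, a seed, which comprises multiple plant cells and is capable of regenerating into a whole plant, is considered a "plant part".
As used herein, the term "protoplast" refers to a plant cell whose cell wall has been completely or partially removed and whose lipid bilayer membrane is exposed. Typically, a protoplast is an isolated plant cell without a cell wall that has the potential to regenerate into a cell culture or a whole plant.
"Progeny" of a plant includes any subsequent generation of the plant.
A "genetically modified plant" includes a plant that comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide can be stably integrated into the genome and inherited over successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one in which the sequence in the plant genome comprises single or multiple deoxynucleotide substitutions, deletions and additions. For example, a genetically modified plant obtained by the present invention may comprise one or more A to G substitutions relative to a wild-type plant (the corresponding plant without the genetic modification).
"Exogenous" with respect to a sequence means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or genomic locus has been significantly altered from its native form by deliberate human intervention.
"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and refer to a single- or double-stranded RNA or DNA polymer, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytidine or deoxycytidine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purines (A or G), "Y" for pyrimidines (C or T), "K" for G or T, "H" for A or C or T, "I" for inosine, and "N" for any nucleotide.
"Polypeptide", "peptide" and "protein" are used interchangeably in the present invention and refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
As used in the present invention, an "expression construct" refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in a plant. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (e.g. transcription to produce mRNA or a functional RNA) and/or translation of the RNA into a precursor or mature protein.
An "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, a translatable RNA (such as mRNA).
An "expression construct" of the present invention may comprise regulatory sequences and a nucleotide sequence of interest from different sources, or regulatory sequences and a nucleotide sequence of interest from the same source but arranged in a manner different from that normally found in nature.
"Regulatory sequence" and "regulatory element" are used interchangeably and refer to a nucleotide sequence that is located upstream (5' non-coding sequence) of, within, or downstream (3' non-coding sequence) of a coding sequence and that affects the transcription, RNA processing or stability, or translation of the associated coding sequence. A plant expression regulatory element refers to a nucleotide sequence capable of controlling the transcription, RNA processing or stability, or translation of a nucleotide sequence of interest in a plant.
Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.
"Promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling gene transcription in a plant cell, regardless of whether it is derived from a plant cell. The promoter may be a constitutive promoter, a tissue-specific promoter, a developmentally regulated promoter or an inducible promoter.
"Constitutive promoter" refers to a promoter that will generally cause a gene to be expressed in most cell types under most circumstances. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed predominantly, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in one specific cell or cell type. "Developmentally regulated promoter" refers to a promoter whose activity is determined by developmental events. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signals, etc.).
As used herein, the term "operably linked" refers to the linkage of a regulatory element (for example but not limited to a promoter sequence, a transcription termination sequence, etc.) to a nucleic acid sequence (e.g. a coding sequence or open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking a regulatory element region to a nucleic acid molecule are known in the art.
"Introducing" a nucleic acid molecule (e.g. a plasmid, linear nucleic acid fragment, RNA, etc.) or a protein into a plant means transforming a plant cell with the nucleic acid or protein so that the nucleic acid or protein can function in the plant cell. "Transformation" as used in the present invention includes stable transformation and transient transformation.
"Stable transformation" refers to the introduction of an exogenous nucleotide sequence into the plant genome, resulting in stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the plant and any of its successive generations.
"Transient transformation" refers to the introduction of a nucleic acid molecule or protein into a plant cell, where it performs its function without stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the plant genome.
"Trait" refers to a physiological, morphological, biochemical or physical characteristic of a plant or of a particular plant material or cell. In some embodiments, these characteristics may be visible to the naked eye, such as seed or plant size; may be indicators measurable by biochemical techniques, such as the protein, starch or oil content of seeds or leaves; may be observable metabolic or physiological processes, such as resistance to water stress or to particular salt, sugar or nitrogen concentrations; may be detectable gene expression levels; or may be observable agronomic traits such as resistance to osmotic stress or yield. In some embodiments, traits also include herbicide resistance of the plant.
"Agronomic traits" are measurable parameters, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content of vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content of vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content of vegetative tissue, drought resistance, nitrogen uptake, root lodging, harvest index, stalk lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance and tiller number.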
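II. Base Editing System
First, the present invention provides a base editing fusion protein comprising a nuclease-inactivated CRISPR effector protein (such as Cas9 or Cpf1) and an APOBEC3A deaminase. In some embodiments, the base editing fusion protein comprises an amino acid sequence selected from SEQ ID NOs: 12-16.
The present inventors surprisingly found that a base editor formed by fusing a nuclease-inactivated CRISPR effector protein to an APOBEC3A deaminase can efficiently introduce C-to-T substitutions at a broad range of endogenous plant genomic sites, even at sites in a high-GC context, within a deamination window of 17 bp. In the embodiments herein, "base editing fusion protein" and "base editor" are used interchangeably.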
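
As an illustration of the 17 bp window, the following minimal Python sketch scans one strand of a DNA string for sites of the form 5'-N_X-NGG-3' (the target structure given elsewhere in this document, with X = 20 as the preferred value) and lists the C positions that fall within positions 1-17 of each target. The function names are hypothetical, only the given strand is scanned (targets on the reverse strand would need the reverse complement), and listing a C as editable says nothing about the editing frequency actually achieved.

```python
import re
from typing import Iterator, List, Tuple


def find_targets(seq: str, x: int = 20) -> Iterator[Tuple[int, str, str]]:
    """Yield (start, protospacer, pam) for each 5'-N_x-NGG-3' site on this strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are also reported.
    pattern = re.compile(rf"(?=([ACGT]{{{x}}})([ACGT]GG))")
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1), match.group(2)


def editable_c_positions(protospacer: str,
                         window: Tuple[int, int] = (1, 17)) -> List[int]:
    """1-based positions (counted from the 5' end) of Cs inside the window."""
    low, high = window
    return [pos for pos, base in enumerate(protospacer.upper(), start=1)
            if base == "C" and low <= pos <= high]


if __name__ == "__main__":
    # Toy sequence for illustration only, not taken from any gene in the text.
    toy = "ACGTACGTACGTACGTACGT" + "TGG"
    for start, protospacer, pam in find_targets(toy):
        print(start, protospacer, pam, editable_c_positions(protospacer))
```

Passing `window=(3, 9)` instead gives the narrower PBE window mentioned in the description of Figure 11b, which makes the difference in targetable Cs easy to compare on the same sequence.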
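The present invention also provides use of the base editing fusion protein for base editing of a target sequence in the genome of a cell.
The present invention also provides a system for base editing of a target sequence in the genome of a cell, comprising at least one of the following i) to v):
i) a base editing fusion protein, and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding a base editing fusion protein, and a guide RNA;
iii) a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding a base editing fusion protein and a nucleotide sequence encoding a guide RNA;
wherein the base editing fusion protein comprises a nuclease-inactivated CRISPR effector protein (such as Cas9 or Cpf1) and an APOBEC3A deaminase, and the guide RNA is capable of targeting the base editing fusion protein to a target sequence in the genome of the cell, whereby the base editing fusion protein causes one or more Cs in the target sequence to be replaced by T.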
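In some embodiments of the various aspects of the invention, the APOBEC3A deaminase is human APOBEC3A deaminase. In some embodiments, the APOBEC3A deaminase comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 2, and substantially retains the deaminase activity of SEQ ID NO: 2. In some embodiments, the APOBEC3A deaminase comprises one or more, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, amino acid substitutions, deletions or additions relative to SEQ ID NO: 2, and substantially retains the deaminase activity of SEQ ID NO: 2. In some embodiments, the human APOBEC3A deaminase comprises the amino acid sequence shown in SEQ ID NO: 2. In some embodiments, the APOBEC3A deaminase comprises an amino acid substitution at position 57 relative to SEQ ID NO: 2, for example an N57G substitution.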
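
The substitution notation and identity thresholds above can be made concrete with a short sketch. This is an illustration under stated assumptions only: SEQ ID NO: 2 is not reproduced in this section, so a placeholder sequence is used; the function names are hypothetical; and identity is computed as matches divided by alignment length over two pre-aligned, equal-length sequences, which is one common convention among several.

```python
import re


def apply_substitution(protein: str, substitution: str) -> str:
    """Apply a substitution written like 'N57G' (reference, 1-based position, new)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"cannot parse substitution {substitution!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError("position lies outside the sequence")
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + alt + protein[pos:]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' = gap)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # PLACEHOLDER sequence (not SEQ ID NO: 2): 100 residues with N at position 57.
    placeholder = "A" * 56 + "N" + "A" * 43
    variant = apply_substitution(placeholder, "N57G")
    print(variant[56], percent_identity(placeholder, variant))
```

With a real reference sequence, a single substitution such as N57G changes the identity only slightly, so such a variant remains well above the lowest thresholds listed above.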
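As used in the present invention, a "nuclease-inactivated CRISPR effector protein" refers to a CRISPR effector protein that lacks double-stranded nucleic acid cleavage activity but retains gRNA-directed DNA targeting ability. CRISPR effector proteins lacking double-stranded nucleic acid cleavage activity also encompass nickases, which form a nick in a double-stranded nucleic acid molecule but do not completely cut the double-stranded nucleic acid.
In some preferred embodiments of the invention, the nuclease-inactivated CRISPR effector protein of the invention has nickase activity. Without being bound by any theory, it is believed that eukaryotic mismatch repair uses a nick in a DNA strand to direct the removal and repair of mismatched bases in that strand. The U:G mismatch formed by the action of cytidine deaminase may be repaired to C:G. By introducing a nick in the strand containing the unedited G, the U:G mismatch can be preferentially repaired to the desired U:A or T:A.
In some embodiments, the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cas9. The DNA cleavage domain of the Cas9 nuclease is known to comprise two subdomains: the HNH nuclease subdomain and the RuvC subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC subdomain cleaves the non-complementary strand. Mutations in these subdomains can inactivate the nuclease activity of Cas9, forming a "nuclease-inactivated Cas9". The nuclease-inactivated Cas9 still retains gRNA-directed DNA binding ability. Thus, in principle, when fused to another protein, the nuclease-inactivated Cas9 can target that protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
The nuclease-inactivated Cas9 of the present invention may be derived from Cas9 of different species, for example from S. pyogenes Cas9 (SpCas9) or from S. aureus Cas9 (SaCas9). Simultaneously mutating the HNH nuclease subdomain and the RuvC subdomain of Cas9 (for example, with the mutations D10A and H840A) abolishes the nuclease activity of Cas9, giving nuclease-dead Cas9 (dCas9). Inactivating only one of the subdomains by mutation leaves Cas9 with nickase activity, giving a Cas9 nickase (nCas9), for example an nCas9 carrying only the D10A mutation.
Accordingly, in some embodiments of the invention, the nuclease-inactivated Cas9 of the invention comprises the amino acid substitutions D10A and/or H840A relative to wild-type Cas9.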
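
The relationship between the two mutations and the resulting Cas9 form can be summarized in a small lookup. The sketch below is illustrative only: the function name is hypothetical, and the assignment of D10A to the RuvC subdomain and H840A to the HNH subdomain is standard SpCas9 knowledge assumed here rather than stated explicitly in the passage above.

```python
from typing import List, Set, Tuple


def classify_spcas9(mutations: Set[str]) -> Tuple[str, List[str]]:
    """Return (form, strands still cut) for an SpCas9 carrying the given mutations.

    Assumption: D10A inactivates RuvC (cuts the non-complementary strand) and
    H840A inactivates HNH (cuts the strand complementary to the gRNA).
    """
    ruvc_inactive = "D10A" in mutations
    hnh_inactive = "H840A" in mutations

    strands_cut = []
    if not hnh_inactive:
        strands_cut.append("complementary")
    if not ruvc_inactive:
        strands_cut.append("non-complementary")

    if len(strands_cut) == 2:
        form = "Cas9"    # both subdomains active: double-strand break
    elif len(strands_cut) == 1:
        form = "nCas9"   # one subdomain active: nickase
    else:
        form = "dCas9"   # neither active: nuclease-dead
    return form, strands_cut


if __name__ == "__main__":
    for muts in (set(), {"D10A"}, {"H840A"}, {"D10A", "H840A"}):
        print(sorted(muts), classify_spcas9(muts))
```

Under these assumptions, nCas9(D10A) still nicks the gRNA-complementary strand, which is the strand carrying the unedited G opposite the deaminated C, in line with the mismatch repair rationale described above.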
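In some specific embodiments of the invention, the nuclease-inactivated Cas9 may also comprise additional mutations. For example, nuclease-inactivated SpCas9 may further comprise the EQR, VQR or VRER mutations, and SaCas9 may further comprise the KKH mutation (Kim et al. Nat. Biotechnol. 35, 371-376).
In some specific embodiments of the invention, the nuclease-inactivated SpCas9 comprises the amino acid sequence shown in SEQ ID NO: 4.
In some embodiments, the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cpf1. Cpf1 comprises one DNA cleavage domain (RuvC); mutating it abolishes the DNA cleavage activity of Cpf1, forming a "Cpf1 lacking DNA cleavage activity". The Cpf1 lacking DNA cleavage activity still retains gRNA-directed DNA binding ability. Thus, in principle, when fused to another protein, the Cpf1 lacking DNA cleavage activity can target that protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
The Cpf1 lacking DNA cleavage activity of the present invention may be derived from Cpf1 of different species, for example the Cpf1 proteins derived from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006, referred to as FnCpf1, AsCpf1 and LbCpf1, respectively.
In some embodiments, the Cpf1 lacking DNA cleavage activity is an FnCpf1 lacking DNA cleavage activity. In some specific embodiments, the FnCpf1 lacking DNA cleavage activity comprises a D917A mutation relative to wild-type FnCpf1.
In some embodiments, the Cpf1 lacking DNA cleavage activity is an AsCpf1 lacking DNA cleavage activity. In some specific embodiments, the AsCpf1 lacking DNA cleavage activity comprises a D908A mutation relative to wild-type AsCpf1.
In some embodiments, the Cpf1 lacking DNA cleavage activity is an LbCpf1 lacking DNA cleavage activity. In some specific embodiments, the LbCpf1 lacking DNA cleavage activity comprises a D832A mutation relative to wild-type LbCpf1.
In some embodiments of the invention, the APOBEC3A deaminase is fused to the N-terminus of the nuclease-inactivated CRISPR effector protein (such as nuclease-inactivated Cas9 or Cpf1).
In some embodiments of the invention, the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein (such as nuclease-inactivated Cas9 or Cpf1) are fused via a linker. The linker may be a non-functional amino acid sequence of 1-50 (for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length, without secondary or higher-order structure. For example, the linker may be a flexible linker such as GGGGS, GS, GAP, (GGGGS)x3, GGS or (GGS)x7. Preferably, the linker is 32 amino acids long. In some preferred embodiments, the linker is the XTEN linker shown in SEQ ID NO: 3.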
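
The repeat notation used for the flexible linkers can be expanded mechanically. The sketch below is only a convenience illustration with hypothetical function names; it expands entries such as (GGGGS)x3 into the full amino acid string and reports whether the length falls in the 1-50 range named above. It does not include the XTEN linker, because SEQ ID NO: 3 is not reproduced in this section.

```python
import re


def expand_linker(notation: str) -> str:
    """Expand '(GGS)x7'-style notation; plain sequences are returned unchanged."""
    text = notation.strip()
    repeat = re.fullmatch(r"\(([A-Z]+)\)\s*x\s*(\d+)", text)
    if repeat:
        return repeat.group(1) * int(repeat.group(2))
    if re.fullmatch(r"[A-Z]+", text):
        return text
    raise ValueError(f"unrecognized linker notation: {notation!r}")


def in_stated_range(linker: str, low: int = 1, high: int = 50) -> bool:
    """True if the linker length lies in the 1-50 amino acid range from the text."""
    return low <= len(linker) <= high


if __name__ == "__main__":
    for entry in ("GGGGS", "GS", "GAP", "(GGGGS)x3", "GGS", "(GGS)x7"):
        sequence = expand_linker(entry)
        print(entry, sequence, len(sequence), in_stated_range(sequence))
```

All six example linkers fall within the stated range, the longest being (GGS)x7 at 21 residues.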
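In cells, uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), which results in repair of U:G to C:G. Therefore, without being bound by any theory, including a uracil DNA glycosylase inhibitor in the base-editing fusion protein or the system of the invention is expected to increase the efficiency of base editing.
Accordingly, in some embodiments of the invention, the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI). In some specific embodiments, the uracil DNA glycosylase inhibitor comprises the amino acid sequence shown in SEQ ID NO:5.
In some embodiments, the base-editing fusion protein of the invention further comprises a Gam protein. In some embodiments, its amino acid sequence is shown in SEQ ID NO:6.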
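In some embodiments of the invention, the base-editing fusion protein further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the base-editing fusion protein should be strong enough to drive accumulation of the fusion protein in the nucleus of a plant cell in an amount sufficient for its base-editing function. In general, the strength of nuclear localization activity is determined by the number and position of the NLSs in the fusion protein, by the particular NLS or NLSs used, or by a combination of these factors.
In some embodiments of the invention, the NLS of the base-editing fusion protein may be located at the N-terminus and/or the C-terminus. In some embodiments of the invention, the NLS may be located between the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the base-editing fusion protein comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When more than one NLS is present, each may be selected independently of the others. In some embodiments of the invention, the base-editing fusion protein comprises at least 2 NLSs, for example at least 2 NLSs located at the C-terminus. In some embodiments, the NLS is located at the C-terminus of the base-editing fusion protein. In some embodiments, the base-editing fusion protein comprises at least 3 NLSs. In some embodiments, the base-editing fusion protein comprises no NLS at the N-terminus and/or between the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein.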
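In general, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, although other types of NLS are also known. Non-limiting examples of NLSs include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3'), PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or 5'-CCAAAGAAGAAGAGGAAGGTT-3'), or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3').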
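The pairing of each NLS peptide with its nucleotide sequence can be checked by translating the listed DNA with the standard genetic code. The snippet below is a small sketch for that purpose; the names `CODON_TABLE` and `translate` are illustrative and do not come from the original text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence (5'->3') into a one-letter peptide string."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# NLS coding sequences quoted in the text above.
nls_dna = [
    "AAGAAGAGAAAGGTC",
    "CCCAAGAAGAAGAGGAAGGTG",
    "CCAAAGAAGAAGAGGAAGGTT",
    "TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG",
]
for seq in nls_dna:
    print(seq, "->", translate(seq))
```

The two sequences given for PKKKRKV differ only in synonymous codons (CCC/CCA for proline, GTG/GTT for valine), so both encode the same peptide.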
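In some embodiments of the invention, the N-terminus of the base-editing fusion protein comprises an NLS with the amino acid sequence PKKKRKV. In some embodiments of the invention, the C-terminus of the base-editing fusion protein comprises an NLS with the amino acid sequence KRPAATKKAGQAKKKK. In some embodiments of the invention, efficiency is higher when the C-terminus of the base-editing fusion protein comprises an NLS with the amino acid sequence PKKKRKV.
In addition, depending on the location of the DNA to be edited, the base-editing fusion protein of the invention may also include other localization sequences, such as a cytoplasmic localization sequence, a chloroplast localization sequence or a mitochondrial localization sequence.
In some specific embodiments, the base-editing fusion protein comprises an amino acid sequence selected from SEQ ID NO:12-16.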
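To obtain efficient expression in plants, in some embodiments of the invention the nucleotide sequence encoding the base-editing fusion protein is codon-optimized for the plant to be base-edited.
Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently or most frequently in the genes of the host cell, while maintaining the native amino acid sequence. Different species exhibit particular preferences for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which in turn is believed to depend on the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal expression in a given organism on the basis of codon optimization. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000." Nucl. Acids Res., 28:292 (2000).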
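The core idea of codon optimization, swapping each codon for a host-preferred synonymous codon while leaving the encoded protein unchanged, can be sketched as follows. This is only an illustration under stated assumptions: the `PREFERRED` table is a made-up toy example and is not real codon-usage data for any plant, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host preferences (amino acid -> preferred codon); placeholder values only.
PREFERRED = {"P": "CCG", "K": "AAG", "R": "CGC", "V": "GTG"}


def translate(dna):
    """Translate DNA into a one-letter peptide string, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3].upper()] for i in range(0, usable, 3))


def codon_optimize(dna, preferred):
    """Replace each codon with the preferred synonymous codon, if one is listed."""
    usable = len(dna) - len(dna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE[codon]
        out.append(preferred.get(amino_acid, codon))  # keep the codon if no preference
    return "".join(out)


native = "CCAAAGAAGAAGAGGAAGGTT"             # one of the PKKKRKV coding sequences
optimized = codon_optimize(native, PREFERRED)
print(optimized, translate(optimized) == translate(native))
```

In practice the preference table would be derived from a codon usage table for the target species, and real optimization tools typically weigh further factors such as GC content and repeats.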
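In some specific embodiments, the base-editing fusion protein is encoded by a nucleotide sequence selected from SEQ ID NO:7-11.
In some embodiments of the invention, the guide RNA is a single guide RNA (sgRNA). Methods for constructing a suitable sgRNA for a given target sequence are known in the art. See, for example: Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951 (2014); Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013); Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J Genet Genomics. 41, 63-68 (2014). In some preferred embodiments of the invention, the guide RNA is an esgRNA. For construction of the esgRNA, see Li, C. et al. Genome Biol. 19, 59 (2018).
In some embodiments of the invention, the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to a plant expression regulatory element such as a promoter.
Examples of promoters that can be used in the invention include, but are not limited to: the cauliflower mosaic virus 35S promoter (Odell et al. (1985) Nature 313:810-812), the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter, the rice actin promoter, the TrpPro5 promoter (U.S. Patent Application No. 10/377,318; filed March 16, 2005), the pEMU promoter (Last et al. (1991) Theor. Appl. Genet. 81:581-588), the MAS promoter (Velten et al. (1984) EMBO J. 3:2723-2730), the maize H3 histone promoter (Lepetit et al. (1992) Mol. Gen. Genet. 231:276-285 and Atanassova et al. (1992) Plant J. 2(3):291-300) and the Brassica napus ALS3 promoter (PCT application WO 97/41228). Promoters that can be used in the invention also include the commonly used tissue-specific promoters reviewed in Moore et al. (2006) Plant J. 45(4):651-683.
An sgRNA with precise ends for use in the invention can be produced by means of tRNA self-cleavage (Zhang et al. (2017) Genome Biology 18:191).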
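III. Methods for producing genetically modified organisms
In another aspect, the invention provides a method for producing a genetically modified organism, comprising introducing the system of the invention for base editing of a target sequence in a cell genome into a cell of the organism, whereby the guide RNA targets the base-editing fusion protein to the target sequence in the genome of the cell, resulting in substitution of one or more Cs in the target sequence with T. In some preferred embodiments, the organism is a plant.
The design of target sequences that can be recognized and targeted by a complex of Cas9 and guide RNA is within the skill of a person of ordinary skill in the art. For the design of target sequences or crRNA coding sequences that can be recognized and targeted by a complex of the Cpf1 protein and guide RNA (i.e., crRNA), see for example Zhang et al., Cell 163, 1-13, October 22, 2015. In general, the target sequence is a sequence complementary to the guide sequence of about 20 nucleotides contained in the guide RNA, and its 3' end is immediately adjacent to the protospacer adjacent motif (PAM) NGG.
For example, in some embodiments of the invention, the target sequence has the structure 5'-Nx-NGG-3', wherein each N is independently selected from A, G, C and T; X is an integer with 14≤X≤30; Nx denotes X consecutive nucleotides; and NGG is the PAM sequence. In some preferred embodiments of the invention, X is 20. In some embodiments, the base editing window lies at positions 1-17 of the target sequence. That is, the system of the invention can cause one or more Cs within positions 1-17, counted from the 5' end of the target sequence, to be substituted with T.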
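The 5'-Nx-NGG-3' target structure lends itself to a simple sequence scan. The sketch below is a hedged illustration, not the design procedure used in the application: `find_targets` is a hypothetical helper, it scans only the given (forward) strand, and it ignores everything a real guide-design tool would also consider, such as off-target scoring.

```python
import re


def find_targets(seq, x=20):
    """Return (start, protospacer, pam) for every 5'-Nx-NGG-3' site on the given strand.

    `start` is the 0-based index of the first protospacer base. A lookahead is
    used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % x)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up example sequence: a 20-nt protospacer followed by TGGG.
example = "ACGTACGTACGTACGTACGT" + "TGGG"
for start, protospacer, pam in find_targets(example):
    print(start, protospacer, pam)
```

Sites on the opposite strand could be found by running the same scan on the reverse complement of the sequence.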
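In some embodiments, the method of the invention further comprises screening for organisms, such as plants, having the desired nucleotide substitution. Nucleotide substitutions in organisms such as plants can be detected by T7EI, PCR/RE or sequencing methods; see, for example, Shan, Q., Wang, Y., Li, J. & Gao, C. Genome editing in rice and wheat using the CRISPR/Cas system. Nat. Protoc. 9, 2395-2410 (2014).
In the invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, thereby modifying the function of the gene or its expression.
C-to-T base editing in the target sequence of the cell can be detected by T7EI, PCR/RE or sequencing methods.
In the method of the invention, the base editing system can be introduced into cells by various methods well known to those skilled in the art. Methods that can be used to introduce the genome editing system of the invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), particle bombardment, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
Cells that can be genome-edited by the method of the invention may be derived from, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut and Arabidopsis.
The method of the invention is particularly suitable for producing genetically modified plants, such as crop plants. In the method of the invention for producing a genetically modified plant, the base editing system can be introduced into the plant by various methods well known to those skilled in the art. Methods that can be used to introduce the base editing system of the invention into plants include, but are not limited to: particle bombardment, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method and the ovary injection method. Preferably, the base editing system is introduced into the plant by transient transformation.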
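In the method of the invention, the target sequence can be modified simply by introducing or producing the base-editing fusion protein and the guide RNA in the plant cell, and the modification is stably inherited without the base editing system having to be stably transformed into the plant. This avoids potential off-target effects of a stably present base editing system and also avoids integration of exogenous nucleotide sequences into the plant genome, giving higher biosafety.
In some preferred embodiments, the introduction is carried out in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.
In some embodiments, the introduction comprises transforming the base editing system of the invention into isolated plant cells or tissues and then regenerating the transformed plant cells or tissues into whole plants. Preferably, the regeneration is carried out in the absence of selection pressure, that is, no selection agent against the selection gene carried on the expression vector is used during tissue culture. Omitting the selection agent can increase the regeneration efficiency of the plants and yields modified plants free of exogenous nucleotide sequences.
In other embodiments, the base editing system of the invention can be transformed into a specific part of an intact plant, such as a leaf, shoot apex, pollen tube, young ear or hypocotyl. This is particularly suitable for transforming plants that are difficult to regenerate through tissue culture.
In some embodiments of the invention, a protein expressed in vitro and/or an RNA molecule transcribed in vitro is transformed directly into the plant. The protein and/or RNA molecule can carry out base editing in the plant cell and is subsequently degraded by the cell, avoiding integration of exogenous nucleotide sequences into the plant genome.
Thus, in some embodiments, genetic modification and breeding of plants using the method of the invention can yield plants without integration of exogenous DNA, that is, transgene-free modified plants. In addition, the base editing system of the invention has high specificity (a low off-target rate) when editing bases in plants, which also improves biosafety.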
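Plants that can be base-edited by the method of the invention include monocots and dicots. For example, the plant may be a crop plant such as wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.
In some embodiments of the invention, the target sequence is associated with a plant trait such as an agronomic trait, whereby the base editing results in the plant having an altered trait relative to a wild-type plant. In the invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, thereby modifying the function of the gene or its expression. Accordingly, in some embodiments of the invention, the C-to-T substitution results in an amino acid substitution in the target protein. In other embodiments of the invention, the C-to-T substitution results in a change in expression of the target gene.
In some embodiments of the invention, the method further comprises obtaining progeny of the genetically modified plant.
In another aspect, the invention also provides a genetically modified plant, or progeny or a part thereof, wherein the plant is obtained by the method of the invention described above. In some embodiments, the genetically modified plant, or progeny or part thereof, is transgene-free.
In another aspect, the invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the method of the invention described above with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.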
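Examples
To facilitate understanding of the invention, the invention is described more fully below with reference to specific examples and the accompanying drawings. Preferred examples of the invention are shown in the drawings. The invention may, however, be implemented in many different forms and is not limited to the examples described herein. Rather, these examples are provided so that the disclosure of the invention will be understood more thoroughly and completely.
The protoplasts used in the invention were derived from the winter wheat variety Kenong199, the japonica rice variety Zhonghua11 and the potato variety Désirée.
Example 1: Optimization of the PBE system and validation of its editing efficiency
The rAPOBEC1 in the plant nCas9-PBE system (hereinafter PBE) (Zong, Y. et al. Nat. Biotechnol. 35, 438-440 (2017)) was replaced with human APOBEC3A (hereinafter A3A), codon-optimized for cereal plants (Figure 1b), to give A3A-PBE.
UGI and the Mu Gam protein were added to A3A-PBE to produce A3A-Gam (Figure 1b), with the aim of increasing base editing efficiency and product purity (Komor, A.C. et al. Sci. Adv. 3, eaao4774 (2017)).
The base editing activity of these constructs was characterized using a previously described reporter system, which converts BFP to GFP when C4 of the BFP-sgRNA target sequence is changed to T4 (Zong, Y. et al. Nat. Biotechnol. 35, 438-440 (2017)). Each plant base editor construct (PBE, A3A-PBE and A3A-Gam) was co-transfected with pUbi-BFPm and pOsU3-BFP-sgRNA into rice protoplasts by PEG-mediated transformation.
Flow cytometry (FCM) analysis showed that A3A-PBE produced the highest proportion of GFP-expressing cells, at a frequency of 24.5%, approximately 12-fold higher than PBE (Figures 2a-b). The editing efficiency of A3A-Gam was lower than that of A3A-PBE but higher than that of PBE.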
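Example 2: Validation of the mutation efficiency and editing window of A3A-PBE in wheat and rice cells
To further test how well A3A-PBE edits endogenous genes, 4 sgRNAs were designed for 3 wheat genes (TaALS, TaMTL, TaLOX2-T1 and TaLOX2-T2), and 1 sgRNA each was designed for 6 rice genes (OsAAT-T1, OsCDC48, OsDEP1, OsPDS, OsNRT1.1B-T1, OsOD and OsEV) (Figures 3a-b and Table 1). As a control, wild-type Cas9 (WT Cas9) was used to generate deletion and/or insertion mutations (indels).
Table 1. Description of sgRNA target sites and sequences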
Figure PCTCN2019097398-appb-000001
Figure PCTCN2019097398-appb-000002
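Note: The underlined C/G bases are those edited by PBE, A3A-PBE and A3A-Gam. The PAM motif in each target sequence is shown in bold.
Next-generation sequencing (NGS) was used to obtain 100,000-270,000 reads per locus in order to assess C-to-T base editing of each gene in protoplasts. A3A-PBE showed the highest editing efficiency, with editing frequencies of 0.3-36.9% in wheat and 0.5-31.1% in rice (Figures 3a-b). The mean editing efficiency of A3A-PBE across the 10 target sites was 13.1%, about 13-fold and 5-fold higher than the mean efficiencies of PBE (1%) and A3A-Gam (2.8%), respectively. Base editing efficiency at these target sites increased in the order PBE < A3A-Gam < A3A-PBE, consistent with the results from the reporter system (Figures 2a-b).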
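The fold improvements quoted above follow directly from the three mean efficiencies. As a quick arithmetic check (a sketch only; the variable and function names are illustrative):

```python
# Mean C-to-T editing efficiencies (%) over the 10 wheat and rice target sites,
# as stated in the text.
mean_efficiency = {"PBE": 1.0, "A3A-Gam": 2.8, "A3A-PBE": 13.1}


def fold_over(reference, editor="A3A-PBE", data=mean_efficiency):
    """Ratio of the mean efficiency of `editor` to that of `reference`."""
    return data[editor] / data[reference]


# Editors ordered from lowest to highest mean efficiency.
ranking = sorted(mean_efficiency, key=mean_efficiency.get)

print(round(fold_over("PBE"), 1))      # A3A-PBE relative to PBE
print(round(fold_over("A3A-Gam"), 1))  # A3A-PBE relative to A3A-Gam
print(" < ".join(ranking))
```

The ratios come out at 13.1 and roughly 4.7, which the text rounds to 13-fold and 5-fold.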
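Analysis of editing efficiency by position across the 10 tested sites showed that, in most cases, the active deamination window of A3A-PBE spans approximately 17 nucleotides, from protospacer positions 1 to 17, which is wider than the editing window previously reported for PBE in plant systems (positions 3 to 9) (Figures 3a-b).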
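The practical effect of a wider window can be illustrated by listing which Cs of a protospacer fall inside each window. The following sketch uses a made-up 20-nt protospacer and hypothetical helper names; it marks every C in the window as converted, which is an idealized upper bound rather than a prediction of real editing outcomes.

```python
def cs_in_window(protospacer, window):
    """1-based positions of C within the inclusive editing window."""
    first, last = window
    return [i for i, base in enumerate(protospacer.upper(), start=1)
            if base == "C" and first <= i <= last]


def convert_c_to_t(protospacer, window):
    """Idealized outcome: every C inside the window is replaced by T."""
    editable = set(cs_in_window(protospacer, window))
    return "".join("T" if i in editable else base
                   for i, base in enumerate(protospacer.upper(), start=1))


PBE_WINDOW = (3, 9)       # window reported for PBE in plants
A3A_PBE_WINDOW = (1, 17)  # window observed for A3A-PBE

protospacer = "CACGTACGTACGTACGTACG"  # made-up example sequence
for name, window in (("PBE", PBE_WINDOW), ("A3A-PBE", A3A_PBE_WINDOW)):
    print(name, cs_in_window(protospacer, window), convert_c_to_t(protospacer, window))
```

For this example sequence, two Cs are reachable with the 3-9 window and five with the 1-17 window.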
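Because most of the targeted Cs lie outside positions 3-9 of the protospacer, A3A-PBE has an expanded targeting scope and can, to some extent, overcome the limitation imposed by the PAM requirement. In addition, A3A-PBE, like the other two constructs, did not induce unintended edits (<0.1%) at any of the wheat or rice genomic target loci, and its indel frequency (<0.1%) was markedly lower than that of wild-type Cas9 (WT Cas9) (2.2-21.6%) (Figures 5-10).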
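Example 3: Validation of the mutation efficiency and editing window of A3A-PBE in tetraploid potato
Tetraploid inheritance makes potato research and breeding by conventional crossing a challenge (Obidiegwu, J.E., Flath, K. and Gebhardt, C. Theor. Appl. Genet. 127, 763-780 (2014)). In this example, A3A-PBE was applied in tetraploid potato (Solanum tuberosum). The 35S promoter was used to drive the A3A-PBE and PBE fusion proteins, and the AtU6 promoter was used to drive the sgRNA (Figure 11a). To target two endogenous potato genes, StALS (StALS-T1 to StALS-T4) and StGBSS (StGBSS-T1 to StGBSS-T7), four and six sgRNAs were designed, respectively (Figure 3c, Figure 10b and Table 1).
The sgRNAs were co-transformed with the A3A-PBE or PBE construct into potato protoplasts, and base-editing-induced mutations were detected 48 hours after transfection. The mean editing efficiency of PBE at these 10 target sites was 0.4% (Figure 3c). The mean C-to-T conversion efficiency of A3A-PBE at the same 10 target sites (4.3%) was approximately 11-fold higher than that of PBE.
C-to-T conversion was observed at all 10 target sites edited by A3A-PBE, and effective editing was observed across protospacer positions 1 to 17 (Figure 3c), consistent with the results in wheat and rice cells (Figures 3a-b).
Likewise, the indels induced by A3A-PBE (<0.1%) were greatly reduced compared with WT Cas9 (6.2-34.5%) (Figure 10).
This is the first demonstration that gene editing based on cytidine deamination can be used to target the potato genome, paving the way for broad use of A3A-PBE in dicots.
In summary, these results show that A3A-PBE provides higher C-to-T mutation efficiency and a wider editing window than PBE at multiple loci in wheat, rice and potato cells.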
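Example 4: Testing A3A-PBE at high-GC sites within endogenous plant genes
Seven different sgRNAs were designed for 3 wheat genes and 3 rice genes (TaHPPD, TaDEP1, TaLOX2-T3, TaLOX2-T4, OsHPPD, OsAAT-T2 and OsNRT1.1B-T2) (Figure 12a, Table 1), and the editing activities of A3A-PBE and PBE were compared directly. This example shows that A3A-PBE has no apparent bias against target Cs located immediately downstream of a G (Komor, A.C. et al. Nature 533, 420-424 (2016)). At these seven target sites, the editing efficiency of A3A-PBE in a high-GC context reached up to 41.2% (Figure 11a).
By contrast, almost no C-to-T edited cells were observed at any of the target sites with PBE (<0.2%), an efficiency 50-fold lower than that of A3A-PBE. A3A-PBE is therefore better suited to targeted mutagenesis of the many 5'-GC-3'-containing sequences in plant genomes. Overall, A3A-PBE edits cytidines almost equally well regardless of sequence context, which is an advantage over PBE (Figure 11b). Given the reduced requirements on the sequence flanking the target cytosine, this technology improves the targeting scope and is thus more favorable for generating point mutations.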
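Example 5: Investigating whether A3A-PBE can generate diverse mutations when combined with multiple sgRNAs
The broad deamination window and high editing efficiency of A3A-PBE suggest that it may be useful for studying gene regulatory regions, where mutation of multiple sites may be required. It was therefore investigated whether A3A-PBE can generate diverse mutations when combined with multiple sgRNAs. The TaVRN1-A1 promoter contains multiple regulatory sites, such as the VRN box, the CArG box and a putative AG hybrid box (Figure 11c), and mutations in these binding sites can affect wheat flowering time (Chengxia, L. and Jorge, D. The Plant J. 55, 543-554 (2008); Kippes, N. et al. Proc. Natl. Acad. Sci. USA 112, E5401-E5410 (2015)).
Three sgRNAs were designed to target the relevant binding sites (Figure 11c). Amplicons of the TaVRN1 target sites were amplified from protoplasts treated with A3A-PBE or its variant A3A-PBE-VQR, and reads carrying different mutations in these six cis-elements were identified, with efficiencies ranging from 1.2% to 27.7%. For example, at the target site in the VRN box, A3A-PBE efficiently edited the C nucleotides at positions 4 to 16 of the sgRNA target sequence, which is sufficient to disrupt binding to the bZIP transcription factor (Figure 11c) (Chengxia, L. and Jorge, D. The Plant J. 55, 543-554 (2008); Kippes, N. et al. Proc. Natl. Acad. Sci. USA 112, E5401-E5410 (2015)).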
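Example 6: Regeneration of mutant plants base-edited by A3A-PBE
The acetolactate synthase gene (ALS) in wheat was targeted; it encodes the first enzyme in the branched-chain amino acid biosynthesis pathway. Substitution of the conserved P197 residue of rigid ryegrass (Lolium rigidum) ALS with other amino acids confers resistance to the herbicide nicosulfuron (Powles, S.B. and Yu, Q. Annu. Rev. Plant Biol. 61, 317-347 (2010)). P197 in rigid ryegrass (Lolium rigidum) corresponds to P174 at the TaALS target site in hexaploid wheat.
The A3A-PBE and pTaU6-ALS-sgRNA constructs were delivered into immature wheat embryos by particle bombardment, and plants were regenerated without herbicide or any other resistance selection. From 120 transformed immature embryos, 27 mutant plants containing at least one C-to-T substitution were regenerated, as identified by PCR-RE and Sanger sequencing, giving a mutation efficiency of 22.5% (27/120) (Figure 12a, Figure 13), about 4-10 times higher than previously reported efficiencies of CRISPR/Cas9-mediated gene knockout or point mutation. C-to-T substitutions were found at protospacer positions -7, 6, 7, 8, 9, 10, 12 and 13 (Figure 12a and Figure 13).
Among the 27 mutants, multiple combinations of amino acid substitutions were identified, and 12 mutants carried targeted mutations in all three genomes (Table 2). More importantly, two of the 27 mutants (T0-7 and T0-9) had all 6 alleles edited simultaneously, and the encoded proteins all contained amino acid substitutions (Figures 12a-b and Table 2).
The herbicide resistance of the T0-7 mutant was evaluated. After three weeks of culture on regeneration medium supplemented with 0.254 ppm nicosulfuron, the mutant plants still had a normal phenotype and showed no signs of damage, whereas wild-type (WT) plants showed severely stunted growth and wilted leaves (Figure 12b).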
Figure PCTCN2019097398-appb-000003
Figure PCTCN2019097398-appb-000004
Figure PCTCN2019097398-appb-000005
Figure PCTCN2019097398-appb-000006
Figure PCTCN2019097398-appb-000007
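Example 7: Validation of the diversity and precision of A3A-PBE base editing
Base-edited rice plants were obtained by Agrobacterium-mediated transformation, using the A3A-PBE system to target the OsCDC48 and OsNRT1.1B-T2 sites. The base substitution efficiency was 82.9% (34/41) for OsCDC48 and 44.1% (15/34) for OsNRT1.1B-T2, including 7 OsCDC48 and 4 OsNRT1.1B-T2 homozygous mutant lines (Figure 12a).
Potato StGBSS-T6 was targeted by PEG-mediated protoplast transformation. Two independent heterozygous mutant potato plants were regenerated from protoplasts, a base editing frequency of 6.5% (2/31).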
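The editing frequencies in regenerated rice and potato plants are simple ratios of edited to screened plants. The short check below reproduces them from the counts given in the text; the helper name `editing_frequency` is illustrative.

```python
def editing_frequency(edited, total):
    """Percentage of regenerated plants carrying an edit, rounded to one decimal."""
    return round(100 * edited / total, 1)


# (edited plants, plants screened) as reported in the text.
counts = {
    "OsCDC48": (34, 41),
    "OsNRT1.1B-T2": (15, 34),
    "StGBSS-T6": (2, 31),
}
for site, (edited, total) in counts.items():
    print(f"{site}: {editing_frequency(edited, total)}% ({edited}/{total})")
```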
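Different combinations of mutations can be obtained with A3A-PBE. For example, among the 34 OsCDC48 mutant plants there were five combinations: 3 with single-base substitutions, 1 with a double-base substitution, 8 with triple-base substitutions, 14 with five-base substitutions and 6 with six-base substitutions (Figure 12a). These substitutions were obtained more efficiently than previously reported and were more diverse than the mutations produced by PBE.
Potential off-target regions were predicted with the online tool CRISPR-P, and the off-target sites of OsCDC48 and OsNRT1.1B-T2 in the rice genome were identified and examined.
None of the transgenic rice plants carried indels or unintended edits at the two target sites (Figure 12a). No mutations were detected in the potential off-target regions with 3 mismatches for either target site (Table 4). This indicates that the A3A-PBE system can efficiently induce mutations at specific target sites in plants without causing other genomic modifications.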
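Example 8: Further optimization of A3A-PBE
A3A-PBE lacking UGI (A3A-PBE-ΔUGI) was expressed in E. coli and purified (Figure 15). Without UGI, the fusion protein is less toxic to plant cells, is easier to purify, and may increase the likelihood of converting C to the other three bases. The A3A-PBE-ΔUGI protein was assembled with in vitro transcribed sgRNA into ribonucleoprotein (RNP) complexes, and complexes targeting 2 wheat genes (TaMTL and TaLOX2-T5) were delivered into protoplasts (Figure 16a and Table 1).
Amplicon deep sequencing showed a C-to-T substitution frequency of 1.8% for A3A-PBE-ΔUGI RNP, lower than that of A3A-PBE-ΔUGI delivered as plasmid (3.9% on average) (Figure 16a), whereas delivery of PBE as RNP was not feasible. Plant A3A-PBE-ΔUGI RNP can be further optimized to produce transgene-free mutant plants, which may promote the use of base editing in the breeding and commercialization of improved crop plants.
In addition, A3A was mutated by changing N at position 57 to G (the N57G substitution) to construct the A3A-PBE-N57G fusion protein. A3A-PBE, A3A-PBE-N57G and A3A-PBE-ΔUGI were transformed into wheat and rice protoplasts for base editing of different genes. The results are shown in Figure 19. A3A-PBE-N57G and A3A-PBE-ΔUGI showed higher editing efficiency at some sites.
Furthermore, an NLS was added to the N-terminus of the A3A-PBE fusion protein to construct A3A-PBE-NLS, which was tested in wheat protoplasts. The results are shown in Figure 20. At some sites, A3A-PBE-NLS had editing efficiency comparable to or higher than that of A3A-PBE.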
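Example 9: Computational analysis of the rice reference genome sequence (Os-Nipponbare-Reference-IRGSP-1.0)
Computational analysis of the rice reference genome sequence (Os-Nipponbare-Reference-IRGSP-1.0) showed that, compared with PBE, the A3A-PBE base editor of the invention, with its 17-nucleotide editing window, increased the number of C/G bases within base editing range by 1.8-fold (Figure 16b). Similarly, when SpCas9, SaCas9 and their variants recognizing the NGG, NGA, NGCG, NNGRRT and NNNRRT PAMs are used, the A3A deaminase can mutate 90% of C/G bases genome-wide (Figure 16b).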
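PAMs such as NNGRRT are written with IUPAC degenerate codes (N is any base, R is A or G), so a genome-wide scan of the kind described above first needs a way to test whether a site matches a PAM. The sketch below shows one way to do this; it is an assumption-laden illustration, not the analysis pipeline used in the application, and the names `IUPAC`, `pam_regex` and `matches_pam` are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes sufficient for the PAMs mentioned in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def pam_regex(pam):
    """Compile a degenerate PAM string such as 'NNGRRT' into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in pam.upper()))


def matches_pam(pam, site):
    """True if `site` has the same length as `pam` and satisfies every position."""
    return pam_regex(pam).fullmatch(site.upper()) is not None


print(matches_pam("NGG", "TGG"))        # SpCas9-type PAM
print(matches_pam("NNGRRT", "CAGAAT"))  # SaCas9-type PAM
print(matches_pam("NGG", "TGA"))        # does not match NGG
```

Combining such a matcher with an editing window would allow the reachable C/G bases to be counted for each PAM, on both strands, over a whole genome.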
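Example 10: Cpf1-based A3A base editor
In this example, the nCas9 in the A3A base editor described above was replaced with a nuclease-inactivated Cpf1 protein. The vector construction is shown in Figure 17.
The resulting Cpf1-based A3A base editor was used to edit the endogenous target gene DEP1 in rice, and the mutation efficiency at the C in position 10 was measured. The results are shown in Figure 18. They show that human APOBEC3A significantly increases base editing efficiency compared with APOBEC1.
The examples described above represent only several embodiments of the invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, all of which fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.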

Claims (28)

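  1. A base editing system comprising at least one of the following i) to v):
    i) a base-editing fusion protein, and a guide RNA;
    ii) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;
    iii) a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
    iv) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
    v) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein and a nucleotide sequence encoding a guide RNA;
    wherein the base-editing fusion protein comprises a nuclease-inactivated CRISPR effector protein and an APOBEC3A deaminase, and the guide RNA is capable of targeting the base-editing fusion protein to a target sequence in a cell genome, whereby the base-editing fusion protein causes one or more Cs in the target sequence to be substituted with T.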
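  2. The system of claim 1, wherein the APOBEC3A deaminase comprises the amino acid sequence of SEQ ID NO:2, or an amino acid sequence having one or more, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, amino acid substitutions, deletions or additions relative to SEQ ID NO:2, or an amino acid sequence having the N57G substitution relative to SEQ ID NO:2.
  3. The system of claim 1, wherein the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cas9 comprising the amino acid substitutions D10A and/or H840A relative to wild-type Cas9, for example, the nuclease-inactivated Cas9 comprises the amino acid sequence of SEQ ID NO:4.
  4. The system of claim 1, wherein the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cpf1, for example LbCpf1.
  5. The system of claim 1, wherein the APOBEC3A deaminase is fused to the N-terminus of the nuclease-inactivated CRISPR effector protein.
  6. The system of claim 1, wherein the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein are fused via a linker.
  7. The system of claim 1, wherein the base-editing fusion protein further comprises a nuclear localization sequence (NLS) at its N-terminus and/or C-terminus.
  8. The system of claim 1, wherein the base-editing fusion protein further comprises a UGI sequence, for example a UGI sequence whose amino acid sequence is shown in SEQ ID NO:5.
  9. The system of claim 1, wherein the base-editing fusion protein further comprises a Gam protein sequence, for example a Gam sequence whose amino acid sequence is shown in SEQ ID NO:6.
  10. The system of claim 1, wherein the base-editing fusion protein comprises an amino acid sequence encoded by the nucleotide sequence shown in any one of SEQ ID NO:7-11, or comprises the amino acid sequence shown in any one of SEQ ID NO:12-16.
  11. The system of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein is codon-optimized for the plant to be base-edited, for example, the nucleotide sequence encoding the base-editing fusion protein is shown in any one of SEQ ID NO:7-9.
  12. The system of claim 1, wherein the guide RNA is a single guide RNA (sgRNA).
  13. The system of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to a plant expression regulatory element.
  14. The system of claim 13, wherein the regulatory element is a promoter, for example the 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter or the maize U3 promoter.
  15. The system of claim 1, wherein the CRISPR effector protein is a Cas9 nuclease or a Cpf1 nuclease.
  16. The system of claim 1, wherein the target region of the guide RNA is 20 nucleotides in length.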
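  17. A method for producing a genetically modified plant, comprising introducing the system of any one of claims 1-16 into a plant, whereby the guide RNA targets the base-editing fusion protein to a target sequence in the genome of the plant, resulting in substitution of one or more Cs in the target sequence with T.
  18. The method of claim 17, wherein the introduction is carried out in the absence of selection pressure.
  19. The method of claim 17, further comprising screening for plants having the desired nucleotide substitution.
  20. The method of any one of claims 17-19, wherein the plant is selected from monocots and dicots.
  21. The method of claim 20, wherein the plant is a crop plant, for example wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.
  22. The method of any one of claims 17-21, wherein the target sequence is associated with a plant trait such as an agronomic trait, whereby the base editing results in the plant having an altered trait relative to a wild-type plant.
  23. The method of any one of claims 17-22, wherein the system is introduced by transient transformation.
  24. The method of any one of claims 17-23, wherein the system is introduced into the plant by a method selected from: particle bombardment, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, virus-mediated transformation, the pollen tube pathway method and the ovary injection method.
  25. The method of any one of claims 17-24, further comprising obtaining progeny of the genetically modified plant.
  26. The method of any one of claims 17-25, wherein no exogenous DNA is integrated into the genome of the modified plant.
  27. A genetically modified plant, or progeny or a part thereof, wherein the plant is obtained by the method of any one of claims 17-26, and preferably the genetically modified plant is transgene-free.
  28. A plant breeding method, comprising crossing a genetically modified first plant obtained by the method of any one of claims 17-26 with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.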
PCT/CN2019/097398 2018-07-24 2019-07-24 Base editor based on human APOBEC3A deaminase and use thereof WO2020020193A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201980049597.XA CN112805385B (zh) 2018-07-24 2019-07-24 Base editor based on human APOBEC3A deaminase and use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810816603.7 2018-07-24
CN201810816603 2018-07-24

Publications (1)

Publication Number Publication Date
WO2020020193A1 true WO2020020193A1 (zh) 2020-01-30

Family

ID=69182103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097398 WO2020020193A1 (zh) 2018-07-24 2019-07-24 Base editor based on human APOBEC3A deaminase and use thereof

Country Status (2)

Country Link
CN (1) CN112805385B (zh)
WO (1) WO2020020193A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115678900A (zh) * 2021-07-30 2023-02-03 中国科学院天津工业生物技术研究所 Method for narrowing the editing window of a base editor, base editor and use thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017070632A2 (en) * 2015-10-23 2017-04-27 President And Fellows Of Harvard College Nucleobase editors and uses thereof
CN108070611A (zh) * 2016-11-14 2018-05-25 中国科学院遗传与发育生物学研究所 Method for base editing in plants

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11201903089RA (en) * 2016-10-14 2019-05-30 Harvard College Aav delivery of nucleobase editors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU, JIAHUI ET AL: "Research Progress of Base Editing System", WORLD SCI-TECH R&D, vol. 39, no. 6, 15 September 2017 (2017-09-15), pages 457 - 461, ISSN: 1006-6055 *
ST MARTIN, A.: "A fluorescent reporter for quantification and enrichment of DNA editing by APOBEC-Cas9 or cleavage by Cas9 in living cells", NUCLEIC ACIDS RESEARCH, vol. 46, no. 14, 9 May 2018 (2018-05-09), pages e84, XP055682169, ISSN: 0305-1048 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114317590A (zh) * 2020-09-30 2022-04-12 北京市农林科学院 Method for mutating base C to base T in a plant genome
CN114317590B (zh) * 2020-09-30 2024-01-16 北京市农林科学院 Method for mutating base C to base T in a plant genome
WO2022188816A1 (zh) * 2021-03-09 2022-09-15 苏州齐禾生科生物科技有限公司 Improved CG base editing system

Also Published As

Publication number Publication date
CN112805385A (zh) 2021-05-14
CN112805385B (zh) 2023-05-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19841954

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19841954

Country of ref document: EP

Kind code of ref document: A1