WO2021082830A1 - Method for targeted modification of plant genome sequence - Google Patents

Method for targeted modification of plant genome sequence

Info

Publication number
WO2021082830A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
pegrna
target
plant
reverse transcriptase
Application number
PCT/CN2020/117736
Other languages
English (en)
French (fr)
Inventor
高彩霞
林秋鹏
宗媛
靳帅
薛郴销
Original Assignee
中国科学院遗传与发育生物学研究所
Application filed by 中国科学院遗传与发育生物学研究所
Priority to BR112022008468A (BR112022008468A2, pt)
Priority to EP20882981.2A (EP4053284A4, en)
Priority to US17/773,426 (US20230075587A1, en)
Priority to CN202080077133.2A (CN114945671A, zh)
Publication of WO2021082830A1 (zh)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • C12N9/1276 RNA-directed DNA polymerase (2.7.7.49), i.e. reverse transcriptase or telomerase
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/686 Polymerase chain reaction [PCR]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes

Definitions

  • The invention relates to the field of plant genetic engineering. Specifically, the present invention relates to a method for targeted modification of plant genome sequences. More specifically, the present invention relates to a method for targeted modification of a specific sequence in the plant genome into a target sequence of interest by means of a nuclease-reverse transcriptase fusion protein guided by a guide RNA, and to genetically modified plants produced by the method and their progeny.
  • the single-base editing system can achieve efficient conversion of cytosine to thymine (C→T) and of adenine to guanine (A→G) at the target site.
  • However, this method supports only a limited range of base conversions and cannot achieve precise insertion or deletion of fragments. Therefore, there is still a need in the art for efficient methods that can achieve precise targeted modification of plant genome sequences.
  • the present invention includes a new plant DNA precision editing system, which consists of a Cas nuclease with target-strand nicking activity (Cas9-H840A) fused to a reverse transcriptase, and a pegRNA (prime editing gRNA) whose 3' end carries a repair template (RT template) and a free single-strand binding region (PBS).
  • Through the PBS, this system binds the free single-stranded DNA produced by a Cas nickase such as Cas9-H840A and reverse-transcribes a single-stranded DNA sequence according to the given RT template. After cell repair, any change to the DNA sequence downstream of position -3 of the PAM sequence can be achieved in the genome.
  • A new nicking sgRNA can additionally be introduced; it creates a nick on the non-target strand of the pegRNA, which helps promote cell repair according to the donor template. Experimental results show that the system effectively induces precise modification of target sites in plants.
  • FIG. 1 Schematic diagram of the principle of the present invention
  • FIG. 2 The working diagram of three different types of PPE (plant prime editor) systems.
  • The system that does not provide an additional nicking sgRNA is named PPE2; the system that provides an additional nicking sgRNA to help cut the strand opposite the pegRNA is named PPE3; when the PAM sequence of the nicking sgRNA that cuts the opposite strand is located in the spacer sequence of the pegRNA, the system is named PPE3b.
  • FIG. 3 Schematic diagram of PPE construct and pegRNA construct.
  • Figure 4 The working principle of the BFP-to-GFP reporter system for detecting precise editing in plant protoplasts.
  • FIG. 5 Flow cytometer measurement of the fluorescence intensity of the PPE system.
  • CK is the protoplast control without plasmid transformation
  • PBE is the BE3 single-base editing reporter system
  • PPE3b(ΔM-MLV) refers to the control group without M-MLV reverse transcriptase.
  • FIG. 6 Flow cytometry measuring the efficiency of the PPE system.
  • CK is the protoplast control without plasmid transformation
  • PBE is the BE3 single-base editing reporter system
  • PPE3b(ΔM-MLV) refers to the control group without M-MLV reverse transcriptase.
  • Figure 7 Editing of PPE system in rice endogenous targets.
  • FIG. 8 Editing of PPE system's endogenous targets in wheat.
  • Figure 9 By-products and their proportions produced by the PPE system.
  • Figure 10 Editing of PPE-CaMV system in plant endogenous targets.
  • Figure 11 Schematic diagram of pegRNA processed by ribozymes initiated by type II promoters.
  • Figure 12 Editing of PPE-R system in plant endogenous genes.
  • Figure 14 The effect of different PBS lengths on the PPE system.
  • Figure 15 The impact of different RT template lengths on the PPE system.
  • Figure 16 The influence of different RT template lengths on the precise editing ratio of the PPE system.
  • Figure 17 The effect of different nicking sgRNA positions on the PPE system.
  • FIG. 18 The PPE system implements different types of mutations in plant endogenous genes.
  • Figure 19 The PPE system realizes the insertion of fragments of different lengths in plant endogenous genes.
  • FIG. 20 The PPE system realizes the deletion of fragments of different lengths in plant endogenous genes.
  • Figure 21 Schematic diagram of PPE construct used for Agrobacterium infection in rice.
  • Figure 22 Rice mutants obtained using the PPE system and their sequencing results; the arrow indicates the location of the target mutation.
  • FIG. 23 Monoclonal sequencing results of T0-9 mutant plants.
  • Figure 24 Comparison of the effect of PBS lengths guided by different Tm values on editing efficiency in rice protoplasts, using published data for three target sites and newly obtained data for ten new target sites.
  • Figure 25 Normalization of prime editing frequency at different PBS melting temperatures. The highest editing frequency obtained at each target is normalized to 1, and the frequencies obtained at other PBS Tm values are adjusted accordingly.
  • Figure 26 Schematic diagram of prime editing using single-pegRNA and dual-pegRNA strategies.
  • (a) Use only NGG-pegRNA for editing (editing the forward DNA strand).
  • (b) Use only CCN-pegRNA for editing (editing the reverse DNA strand).
  • (c) Edit using the dual-pegRNA strategy. The dual-pegRNA strategy creates edits in both DNA strands at the same time.
  • Figure 27 Comparison of editing efficiency induced by NGG-pegRNA, CCN-pegRNA and dual-pegRNA strategies at 15 target sites.
  • Figure 28 Product purity of NGG-pegRNA, CCN-pegRNA and dual-pegRNA when editing 15 endogenous sites in rice protoplasts.
  • Figure 29 The percentage of rice genome bases that can theoretically be targeted by single-pegRNA and dual-pegRNA prime editing.
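The normalization described in the caption of Figure 25 can be sketched in Python as follows. The frequencies are hypothetical placeholder values, not data from the application, and the function name is an assumption made for illustration.

```python
def normalize_by_max(freq_by_tm: dict[float, float]) -> dict[float, float]:
    """Scale the editing frequencies of one target so the highest becomes 1."""
    peak = max(freq_by_tm.values())
    if peak <= 0:
        raise ValueError("at least one frequency must be positive")
    return {tm: freq / peak for tm, freq in freq_by_tm.items()}


# Hypothetical editing frequencies (%) at one target, keyed by PBS Tm in degrees C.
example = {24: 2.0, 30: 8.0, 36: 4.0}
print(normalize_by_max(example))
```

Normalizing per target in this way makes sites with very different absolute efficiencies comparable on a single Tm axis.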
  • the term “and/or” encompasses all combinations of items connected by the term, and should be treated as if each combination has been individually listed herein.
  • “A and/or B” encompasses “A”, “A and B”, and “B”.
  • “A, B, and/or C” encompasses "A”, “B”, “C”, “A and B”, “A and C”, “B and C”, and "A and B and C”.
  • the protein or nucleic acid may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, while still having the activity described in the present invention.
  • methionine encoded by the start codon at the N-terminus of the polypeptide will be retained under certain actual conditions (for example, when expressed in a specific expression system), but does not substantially affect the function of the polypeptide.
  • “Genome” as used herein covers not only chromosomal DNA present in the nucleus, but also organelle DNA present in subcellular components of the cell (such as mitochondria and plastids).
  • Genetically modified plant means a plant that contains an exogenous polynucleotide or a modified gene or expression control sequence in its genome.
  • exogenous polynucleotides can be stably integrated into the genome of plants and inherited for successive generations.
  • the exogenous polynucleotide can be integrated into the genome alone or as part of a recombinant DNA construct.
  • the modified gene or expression control sequence includes one or more deoxynucleotide substitutions, deletions and additions in the plant genome.
  • “Exogenous” in terms of sequence means a sequence from a foreign species or, if from the same species, a sequence that has undergone significant changes in composition and/or locus from its natural form through deliberate human intervention.
  • “Nucleic acid sequence” is used interchangeably with equivalent terms and refers to a single-stranded or double-stranded RNA or DNA polymer, optionally containing synthetic, non-natural or altered nucleotide bases.
  • Nucleotides are referred to by their single-letter names as follows: “A” is adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” is cytidine or deoxycytidine, “G” is guanosine or deoxyguanosine, “U” is uridine, “T” is deoxythymidine, “R” is purine (A or G), “Y” is pyrimidine (C or T), “K” is G or T, “H” is A or C or T, “D” is A, T or G, “I” is inosine, and “N” is any nucleotide.
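As an illustration of how the degenerate single-letter codes above are applied, for example when checking a PAM such as NGG, the following Python sketch maps each code to the bases it permits. It covers DNA letters only (U and I are left out), and the names `IUPAC` and `matches` are assumptions introduced for this example.

```python
# Degenerate nucleotide codes from the list above, mapped to the DNA bases they allow.
# Uridine ("U") and inosine ("I") are omitted from this DNA-only sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "D": "AGT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, seq: str) -> bool:
    """Return True if the DNA string `seq` fits the degenerate `pattern`."""
    if len(pattern) != len(seq):
        return False
    return all(
        base in IUPAC[code]
        for code, base in zip(pattern.upper(), seq.upper())
    )


print(matches("NGG", "TGG"))  # True
print(matches("NGG", "TGA"))  # False
```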
  • “Polypeptide”, “peptide”, and “protein” are used interchangeably in the present invention and refer to a polymer of amino acid residues.
  • the term applies to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally-occurring amino acids, as well as to naturally-occurring amino acid polymers.
  • the terms “polypeptide”, “peptide”, “amino acid sequence” and “protein” may also include modified forms, including but not limited to glycosylation, lipid linkage, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
  • expression construct refers to a vector suitable for expression of a nucleotide sequence of interest in an organism, such as a recombinant vector.
  • “Expression” refers to the production of a functional product.
  • the expression of a nucleotide sequence may refer to the transcription of the nucleotide sequence (such as transcription to generate mRNA or functional RNA) and/or the translation of RNA into a precursor or mature protein.
  • the "expression construct" of the present invention can be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, can be RNA (such as mRNA) that can be translated, for example, RNA generated by in vitro transcription.
  • the "expression construct" of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different sources, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from those normally occurring in nature.
  • “Regulatory sequence” and “regulatory element” are used interchangeably and refer to nucleotide sequences that are located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence and that affect the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
  • Promoter refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment.
  • a promoter is a promoter capable of controlling gene transcription in a cell, regardless of whether it is derived from the cell.
  • the promoter can be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.
  • “Tissue-specific promoter” and “tissue-preferred promoter” are used interchangeably and refer to a promoter that is expressed mainly, but not necessarily exclusively, in a tissue or organ, and that can also be expressed in a specific cell or cell type.
  • “Developmentally regulated promoter” refers to a promoter whose activity is determined by developmental events.
  • inducible promoters selectively express operably linked DNA sequences in response to endogenous or exogenous stimuli (environment, hormones, chemical signals, etc.).
  • promoters include, but are not limited to, polymerase (pol) I, pol II, or pol III promoters.
  • pol I promoter examples include the chicken RNA pol I promoter.
  • pol II promoters include, but are not limited to, cytomegalovirus immediate early (CMV) promoter, Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and simian virus 40 (SV40) immediate early promoter.
  • pol III promoters include U6 and H1 promoters.
  • An inducible promoter such as a metallothionein promoter can be used.
  • promoters include T7 phage promoter, T3 phage promoter, β-galactosidase promoter, and Sp6 phage promoter.
  • the promoter may be cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, maize U3 promoter, rice actin promoter.
  • “Operably linked” refers to the connection of regulatory elements (for example, but not limited to, promoter sequences, transcription termination sequences, etc.) to nucleic acid sequences (for example, coding sequences or open reading frames) such that the transcription of the nucleotide sequence is controlled and regulated by the transcription control element.
  • "Introducing" nucleic acid molecules (such as plasmids, linear nucleic acid fragments, RNA, etc.) or proteins into an organism refers to transforming the cells of the organism with the nucleic acid or protein so that the nucleic acid or protein can function in the cell.
  • the "transformation” used in the present invention includes stable transformation and transient transformation.
  • “Stable transformation” refers to the introduction of an exogenous nucleotide sequence into the genome, resulting in the stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generations thereof.
  • Transient transformation refers to the introduction of nucleic acid molecules or proteins into cells to perform functions without stable inheritance of foreign genes. In transient transformation, the foreign nucleic acid sequence is not integrated into the genome.
  • “Trait” refers to the physiological, morphological, biochemical or physical characteristics of cells or organisms.
  • “Agronomic traits” especially refer to measurable index parameters of crop plants, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or accumulation rate, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, plant total nitrogen content, fruit nitrogen content, seed nitrogen content, plant vegetative tissue nitrogen content, plant total free amino acid content, fruit free amino acid content, seed free amino acid content, plant vegetative tissue free amino acid content, plant total protein content, fruit protein content, seed protein content, plant vegetative tissue protein content, herbicide resistance, drought resistance, nitrogen absorption, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance and tiller number.
  • the present invention relates to a genome editing system for targeted modification of the genomic DNA sequence of an organism, which comprises:
  • fusion protein and/or an expression construct containing a nucleotide sequence encoding the fusion protein, wherein the fusion protein comprises CRISPR nickase and reverse transcriptase; and/or
  • the at least one pegRNA includes, in the 5' to 3' direction, a guide sequence, a scaffold sequence, a reverse transcription (RT) template sequence, and a primer binding site (PBS) sequence.
  • the at least one pegRNA can form a complex with the fusion protein and target the fusion protein to a target sequence in the genome, resulting in a nick in the target sequence.
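The 5' to 3' arrangement of the pegRNA described above can be sketched as a small Python data structure. All sequences are hypothetical placeholders (the scaffold in particular is not SEQ ID NO: 8), and the class and method names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PegRNA:
    guide: str        # guide (spacer) sequence
    scaffold: str     # gRNA scaffold sequence
    rt_template: str  # reverse transcription (RT) template sequence
    pbs: str          # primer binding site (PBS) sequence

    def sequence(self) -> str:
        """Concatenate the four parts in 5'->3' order."""
        return self.guide + self.scaffold + self.rt_template + self.pbs


# Placeholder RNA sequences; the lowercase scaffold is a dummy stand-in.
peg = PegRNA(
    guide="GAUUACAGAUUACAGAUUAC",
    scaffold="nnnnnnnnnn",
    rt_template="ACGUCCAGCA",
    pbs="AUCUGUAAUCU",
)
print(peg.sequence())
```

Keeping the four parts as separate fields makes it straightforward to vary the PBS or RT template length independently.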
  • the organism is a plant.
  • “Genome editing system” refers to the combination of components required for editing the genome in a cell.
  • Each component of the system such as fusion protein, gRNA, etc., can exist independently of each other, or can exist in any combination as a composition.
  • “Target sequence” refers to a sequence of about 20 nucleotides in length in the genome, characterized by a 5' or 3' flanking PAM (protospacer adjacent motif) sequence.
  • the target sequence is immediately adjacent to the PAM at the 3'end, such as 5'-NGG-3'. Based on the existence of PAM, those skilled in the art can easily determine the target sequence in the genome that can be used for targeting. And depending on the location of the PAM, the target sequence can be located on any strand of the genomic DNA molecule.
  • the target sequence is preferably 20 nucleotides.
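To illustrate how candidate target sequences can be located from the PAM on either strand, here is a minimal Python sketch. It assumes a 20-nucleotide target immediately followed by a 5'-NGG-3' PAM and an uppercase A/C/G/T input string; the function names are hypothetical.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(genome: str, length: int = 20):
    """List (strand, start, target, pam) for each target followed by 5'-NGG-3'.

    `start` is the 0-based index of the target on the strand being read:
    "+" is `genome` as given, "-" is its reverse complement.
    """
    hits = []
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = rf"(?=([ACGT]{{{length}}})([ACGT]GG))"
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for m in re.finditer(pattern, seq):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


# Hypothetical sequence: a 20-nt target, a TGG PAM, and a short flank.
demo = "GATTACAGATTACAGATTAC" + "TGG" + "CCCCC"
print(find_targets(demo))
```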
  • the CRISPR nickase in the fusion protein can form a nick in the target sequence in the genomic DNA.
  • the CRISPR nickase is a Cas9 nickase.
  • the Cas9 nickase is derived from SpCas9 of S. pyogenes, and at least includes the amino acid substitution H840A relative to wild-type SpCas9.
  • An exemplary wild-type SpCas9 includes the amino acid sequence shown in SEQ ID NO:1.
  • the Cas9 nickase comprises the amino acid sequence shown in SEQ ID NO: 2.
  • the Cas9 nickase in the fusion protein can form a nick between the -3 nucleotide and the -4 nucleotide relative to the PAM of the target sequence (the first nucleotide at the 5' end of the PAM sequence being +1).
  • the Cas9 nickase is a Cas9 nickase variant capable of recognizing altered PAM sequences.
  • the Cas9 nickase is a Cas9 variant that recognizes the PAM sequence 5'-NG-3'.
  • the Cas9 nickase variant that recognizes the PAM sequence 5'-NG-3' contains the following amino acid substitutions relative to wild-type Cas9: H840A, R1335V, L1111R, D1135V, G1218R, E1219F, A1322R, T1337R, where the amino acid numbering refers to SEQ ID NO: 1.
  • the nick formed by the Cas9 nickase of the present invention can cause the target sequence to form a free single strand with a 3' end (3' free single strand) and a free single strand with a 5' end (5' free single strand).
  • the reverse transcriptase in the fusion protein of the present invention can be derived from different sources.
  • the reverse transcriptase is a reverse transcriptase derived from a virus.
  • the reverse transcriptase is M-MLV reverse transcriptase or a functional variant thereof.
  • An exemplary wild-type M-MLV reverse transcriptase sequence is shown in SEQ ID NO: 3.
  • the reverse transcriptase is an enhanced M-MLV reverse transcriptase, for example, the amino acid sequence of the enhanced M-MLV reverse transcriptase is shown in SEQ ID NO: 4.
  • the reverse transcriptase is CaMV-RT from Cauliflower mosaic virus (CaMV), and its amino acid sequence is shown in SEQ ID NO: 5.
  • the reverse transcriptase is a reverse transcriptase derived from bacteria, such as retron-RT from Escherichia coli, and its amino acid sequence is shown in SEQ ID NO: 6.
  • the CRISPR nickase and the reverse transcriptase in the fusion protein are connected by a linker.
  • the linker can be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length, without secondary or higher-order structure.
  • the linker may be a flexible linker, such as GGGGS, GS, GAP, (GGGGS)x3, GGS and (GGS)x7, etc.
  • it may be the linker shown in SEQ ID NO: 7.
  • the CRISPR nickase in the fusion protein is fused to the N-terminus of the reverse transcriptase directly or through a linker. In some embodiments, the CRISPR nickase in the fusion protein is fused to the C-terminus of the reverse transcriptase directly or through a linker.
  • the fusion protein of the present invention may also include a nuclear localization sequence (NLS).
  • one or more NLS in the fusion protein should have sufficient strength to drive the accumulation of the fusion protein in an amount that can achieve its base editing function in the nucleus of the cell.
  • the strength of nuclear localization activity is determined by the number and location of NLS in the fusion protein, one or more specific NLS used, or a combination of these factors.
  • the guide sequence (also called seed sequence or spacer sequence) in the at least one pegRNA of the present invention is set to have sufficient sequence identity (preferably 100% identity) with the target sequence, so that it can bind the complementary strand of the target sequence through base pairing and achieve sequence-specific targeting.
  • scaffold sequences of gRNA suitable for genome editing based on CRISPR nuclease are known in the art, and these can be used in the pegRNA of the present invention.
  • the scaffold sequence of the gRNA is shown in SEQ ID NO: 8.
  • the primer binding sequence is set to be complementary to at least a part of the target sequence (preferably fully paired with at least a part of the target sequence).
  • preferably, the primer binding sequence is complementary (preferably fully paired) to at least a part of the 3' free single strand produced by the nick in the DNA strand where the target sequence is located, in particular to the nucleotide sequence at the 3' end of that 3' free single strand.
  • when the 3' free single strand binds to the primer binding sequence through base pairing, the 3' free single strand can serve as a primer and the reverse transcription (RT) template sequence immediately adjacent to the primer binding sequence serves as the template; reverse transcription is then performed by the reverse transcriptase in the fusion protein to extend a DNA sequence corresponding to the RT template sequence.
  • the length of the primer binding sequence depends on the length of the free single strand formed in the target sequence by the CRISPR nickase used; however, it should have the minimum length needed to ensure specific binding.
  • the primer binding sequence may be 4-20 nucleotides in length, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length.
  • the primer binding sequence is set to have a Tm (melting temperature) of no more than about 52°C. In some embodiments, the Tm (melting temperature) of the primer binding sequence is about 18°C-52°C, preferably about 24°C-36°C, more preferably about 28°C-32°C, more preferably about 30°C.
  • the method of calculating the Tm of a nucleic acid sequence is well known in the art, for example, it can be calculated using the Oligo Analysis Tool online analysis tool.
  • the appropriate Tm can be obtained by selecting the appropriate length of the PBS.
  • a PBS sequence with a suitable Tm can be obtained by selecting a suitable target sequence.
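Choosing a PBS length from a target Tm can be sketched as below. This is only an illustration: the text computes Tm with an online oligo analysis tool, whereas the sketch substitutes the simple Wallace rule (2 °C per A/T, 4 °C per G/C), so its values will differ from those of the tool. It also assumes a 20-nt target written 5'→3' on the PAM strand, uses DNA letters throughout, and limits the PBS to the part of the free strand that lies inside the target; all function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def wallace_tm(seq: str) -> int:
    """Rough Tm estimate in degrees C (Wallace rule); a stand-in assumption."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def choose_pbs(target: str, goal_tm: float = 30.0,
               min_len: int = 4, max_len: int = 17):
    """Return (length, pbs, tm) with the estimated Tm closest to `goal_tm`.

    The nick lies between positions -4 and -3 counted from the PAM, so the
    3' free single strand ends three nucleotides before the end of `target`.
    """
    nick = len(target) - 3
    primer_strand = target[:nick]  # part of the 3' free strand inside the target
    best = None
    for n in range(min_len, min(max_len, nick) + 1):
        pbs = reverse_complement(primer_strand[-n:])
        tm = wallace_tm(pbs)
        # Strict "<" keeps the shorter PBS when two lengths are equally close.
        if best is None or abs(tm - goal_tm) < abs(best[2] - goal_tm):
            best = (n, pbs, tm)
    return best


print(choose_pbs("GATTACAGATTACAGATTAC"))  # (11, 'ATCTGTAATCT', 28)
```

The default goal of 30 °C follows the "more preferably about 30°C" value stated above; a different goal within the stated 18°C-52°C range can be passed as `goal_tm`.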
  • the RT template sequence can be any sequence.
  • through the reverse transcription described above, its sequence information can be integrated into the DNA strand where the target sequence is located (that is, the strand containing the PAM of the target sequence), and the cell's DNA repair function then forms a DNA double strand containing the sequence information of the RT template.
  • the RT template sequence contains the desired modification.
  • the desired modification includes substitutions, deletions, and/or additions of one or more nucleotides.
  • the modification includes one or more substitutions selected from: C to T, C to G, C to A, G to T, G to C, G to A, A to T, A to G, A to C, T to C, T to G and T to A substitutions; and/or includes the deletion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides; and/or includes the insertion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides.
  • the RT template sequence is set to correspond to the sequence downstream of the nick of the target sequence (for example, complementary to at least a part of the sequence downstream of the nick of the target sequence), and contains the desired modification.
  • the desired modification includes substitutions, deletions and/or additions of one or more nucleotides.
  • the modification includes one or more substitutions selected from: C to T, C to G, C to A, G to T, G to C, G to A, A to T, A to G, A to C, T to C, T to G and T to A substitutions; and/or includes the deletion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides; and/or includes the insertion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides.
  • the RT template sequence may be about 1-300 or more nucleotides in length, for example, 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75, about 100, about 125, about 150, about 175, about 200, about 225, about 250, about 275, about 300 or more nucleotides.
  • the RT template sequence is 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 nucleotides in length.
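How an RT template that "corresponds to the sequence downstream of the nick and contains the desired modification" might be derived for a single substitution is sketched below. The sketch assumes the input is the PAM-containing strand written 5'→3', uses DNA letters, and takes the RT template as the reverse complement of the edited downstream segment; the function name, coordinates and example sequence are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rt_template_for_substitution(strand: str, nick: int, pos: int,
                                 new_base: str, rt_len: int = 13) -> str:
    """Build an RT template carrying one base substitution.

    strand   -- PAM-containing strand, 5'->3'
    nick     -- 0-based index of the first base downstream of the nick
    pos      -- 0-based index of the base to replace
    rt_len   -- RT template length (the text lists 7 to 23 nt as one option)
    """
    if not nick <= pos < nick + rt_len:
        raise ValueError("the edit must lie within the templated segment")
    edited = strand[:pos] + new_base + strand[pos + 1:]
    # The template is complementary to the edited sequence downstream of the nick.
    return reverse_complement(edited[nick:nick + rt_len])


# Hypothetical locus: 20-nt target, TGG PAM, 10-nt flank; nick after target[16].
locus = "GATTACAGATTACAGATTAC" + "TGG" + "ACGTACGTAC"
print(rt_template_for_substitution(locus, nick=17, pos=18, new_base="G", rt_len=10))
# ACGTCCAGCA
```

In the pegRNA this template sits immediately 5' of the PBS, so that extension of the primed 3' free single strand copies the edited sequence.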
  • the plant genome editing system further includes a nicking gRNA (a gRNA for generating an additional nick) and/or an expression construct containing a nucleotide sequence encoding the nicking gRNA, and the nicking gRNA includes a guide sequence and a scaffold sequence.
  • the nicking gRNA does not include a reverse transcription (RT) template sequence or a primer binding site (PBS) sequence.
  • the guide sequence (also called seed sequence or spacer sequence) in the nicking gRNA of the present invention is set to have sufficient sequence identity (preferably 100% identity) with the nicking target sequence in the genome, so that the fusion protein of the present invention can be targeted to the nicking target sequence, resulting in a nick in the nicking target sequence; the nicking target sequence and the target sequence targeted by the pegRNA (pegRNA target sequence) are located on opposite strands of the genomic DNA.
  • the nick formed by the nick RNA and the nick formed by the pegRNA are about 1 to about 300 or more nucleotides apart, such as 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75, about 100, about 125, about 150, about 175, about 200, about 225 , About 250, about 275, about 300 or more nucleotides.
  • the nick formed by the nicking RNA is located upstream or downstream of the formation of the pegRNA (the upstream or downstream refers to the DNA strand where the pegRNA target sequence is located).
  • the guide sequence in the nicked gRNA and the relative strand (modified) of the pegRNA target sequence after the editing event has sufficient sequence identity (preferably 100% identity), so that the nicked gRNA only targets To nick target sequences that are generated after pegRNA-induced target sequences are targeted and modified.
  • the PAM of the nick target sequence is located within the complement of the pegRNA target sequence.
  • the pegRNA and/or nicking gRNA can be precisely processed using a self-processing system.
  • the 5' end of the pegRNA and/or nicking gRNA is connected to the 3' end of a first ribozyme, which is designed to cleave the fusion at the 5' end of the pegRNA and/or nicking gRNA; and/or the 3' end of the pegRNA and/or nicking gRNA is connected to the 5' end of a second ribozyme, which is designed to cleave the fusion at the 3' end of the pegRNA and/or nicking gRNA.
  • The design of the first or second ribozyme is within the abilities of those skilled in the art. For example, see Gao et al., JIPB, Apr, 2014; Vol 56, Issue 4, 343-349.
  • For a method of precisely processing gRNA, see, for example, WO 2018/149418.
  • the genome editing system comprises at least one pair of pegRNA and/or an expression construct containing a nucleotide sequence encoding the at least one pair of pegRNA.
  • the two pegRNAs in the pegRNA pair are configured to target different target sequences on the same strand of genomic DNA.
  • the two pegRNAs in the pegRNA pair are configured to target target sequences on different strands of genomic DNA.
  • the PAM of the target sequence of one pegRNA in the pegRNA pair is located on the sense strand, and the PAM of the other pegRNA is located on the antisense strand.
  • the nicks induced by the two pegRNAs are located on both sides of the site to be modified.
  • the pegRNA-induced nick for the sense strand is located upstream (5' direction) of the site to be modified, and the pegRNA-induced nick for the antisense strand is located downstream (3' direction) of the site to be modified. The upstream or downstream is relative to the sense strand.
  • the induced nicks of the two pegRNAs are about 1 to about 300 or more nucleotides apart, for example, 1-15 nucleotides apart.
  • the two pegRNAs in the pegRNA pair are configured to introduce the same desired modification.
  • for example, one pegRNA is configured to introduce an A to G substitution in the sense strand, while the other pegRNA is configured to introduce the corresponding T to C substitution at the corresponding position on the antisense strand.
  • as another example, one pegRNA is configured to introduce a two-nucleotide deletion in the sense strand, and the other pegRNA is configured to introduce the same two-nucleotide deletion at the corresponding position of the antisense strand.
  • Other types of modification can be deduced by analogy.
  • pegRNAs targeting the two different strands can achieve the same desired modification through the design of appropriate RT template sequences.
  • the nucleotide sequence encoding the fusion protein is codon-optimized for the plant species whose genome is to be modified.
  • Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in the host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently or most frequently in the genes of the host cell, while maintaining the native amino acid sequence.
  • Different species exhibit particular preferences for certain codons of a specific amino acid. Codon bias (differences in codon usage between organisms) is often correlated with the translation efficiency of messenger RNA (mRNA), which in turn is thought to depend on the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules.
  • The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored for optimal expression in a given organism on the basis of codon optimization.
  • Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in different ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28:292 (2000).
  • the fusion protein of the present invention is encoded by the nucleotide sequence shown in any one of SEQ ID NO: 9-11 or comprises the amino acid sequence shown in any one of SEQ ID NO: 12-14.
  • Plants that can be genome modified by the genome editing system of the present invention include monocotyledonous plants and dicotyledonous plants.
  • the plants are crop plants, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, and alfalfa.
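  • the present invention provides a method for determining the PBS sequence in the pegRNA of the genome editing system of the present invention, the method comprising: a) determining at least one candidate target sequence based on the PAM recognized by the CRISPR nickase used and the site to be modified; b) obtaining a series of PBS sequences based on the position of the nick produced by the CRISPR nickase in the at least one candidate target sequence; c) calculating the Tm of the PBS sequences; and d) selecting a PBS sequence with a Tm not exceeding 52°C, for example a Tm of about 18°C-52°C, preferably about 24°C-36°C, more preferably about 28°C-32°C, and more preferably about 30°C.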
  • the present invention provides a method for producing a genetically modified plant, comprising introducing the genome editing system of the present invention into at least one of the plants, thereby causing a modification in the genome of the at least one plant.
  • the modification includes substitution, deletion and/or addition of one or more nucleotides.
  • the modification includes one or more substitutions selected from: C to T, C to G, C to A, G to T, G to C, G to A, A to T, A to G, A to C, T to C, T to G and T to A substitutions; and/or a deletion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides; and/or an insertion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides.
  • the method further includes screening for plants having the desired modification from the at least one plant.
  • the genome editing system can be introduced into plants by various methods well known to those skilled in the art.
  • Methods that can be used to introduce the genome editing system of the present invention into plants include, but are not limited to: particle bombardment (gene gun), PEG-mediated transformation of protoplasts, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method, and the ovary injection method.
  • the genome editing system is introduced into the plant by transient transformation.
  • the genome modification can be achieved simply by introducing or producing the fusion protein and gRNA in plant cells, and the modification can be stably inherited, without the need to stably transform the plant with exogenous polynucleotides encoding the components of the genome editing system. This avoids the potential off-target effects of a stably present (continuously produced) genome editing system, and also avoids the integration of exogenous nucleotide sequences into the plant genome, thereby providing greater biosafety.
  • the introduction is performed in the absence of selective pressure, so as to avoid the integration of foreign nucleotide sequences in the plant genome.
  • the introduction includes transforming the genome editing system of the present invention into an isolated plant cell or tissue, and then regenerating the transformed plant cell or tissue into a whole plant.
  • the regeneration is performed in the absence of selective pressure, that is, no selective agent for the selective gene carried on the expression vector is used during the tissue culture process.
  • omitting the selection agent can improve the regeneration efficiency of plants and yield modified plants that do not contain exogenous nucleotide sequences.
  • the genome editing system of the present invention can be transformed to specific parts on the whole plant, such as leaves, stem tips, pollen tubes, young ears or hypocotyls. This is particularly suitable for the transformation of plants that are difficult to regenerate from tissue culture.
  • the protein expressed in vitro and/or the RNA molecule transcribed in vitro (for example, the expression construct is an RNA molecule transcribed in vitro) is directly transformed into the plant.
  • the protein and/or RNA molecule can realize genome editing in plant cells and then be degraded by the cell, avoiding the integration of foreign nucleotide sequences in the plant genome.
  • genetic modification and breeding of plants using the method of the present invention can obtain plants whose genomes have no exogenous polynucleotide integration, that is, transgene-free modified plants.
  • the method further includes culturing the plant cell, tissue or whole plant into which the genome editing system has been introduced at an elevated temperature, for example, 37°C.
  • the modified genomic region is associated with a plant trait such as an agronomic trait, whereby the modification causes the plant to have an altered (preferably improved) trait, for example an agronomic trait, relative to a wild-type plant.
  • the method further includes the step of screening for plants with desired modifications and/or desired traits such as agronomic traits.
  • the method further includes obtaining progeny of the genetically modified plant.
  • the genetically modified plant or its progeny have desired modifications and/or desired traits such as agronomic traits.
  • the present invention also provides a genetically modified plant or its progeny or part thereof, wherein the plant is obtained by the above-mentioned method of the present invention.
  • the genetically modified plant or progeny or part thereof is non-transgenic.
  • the genetically modified plant or its progeny have desired genetic modification and/or desired traits such as agronomic traits.
  • the present invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the above-mentioned method of the present invention with a second plant that does not contain the modification, thereby introducing the modification into the second plant.
  • the genetically modified first plant has desired traits such as agronomic traits.
  • the nCas9(H840A)-M-MLV construct, the nCas9(H840A)-CaMV construct and the nCas9(H840A)-retron construct were constructed by Suzhou Jinweizhi Company.
  • the M-MLV used in this example carries 5 amino acid mutations compared with the wild-type M-MLV reverse transcriptase.
  • M-MLV, RT-CaMV and RT-retron were all codon-optimized for monocotyledons.
  • the pegRNA fragments were cloned by the Gibson method into a vector driven by the OsU3 promoter to obtain an OsU3-pegRNA construct suitable for rice.
  • the pegRNA fragments were cloned by the Gibson method into a vector driven by the TaU6 promoter to obtain a TaU6-pegRNA construct suitable for wheat.
  • pegRNA fragments (including the RT and PBS sequences) flanked by ribozymes at both the 5' and 3' ends were cloned by the Gibson method into a vector driven by the maize Ubiquitin-1 (Ubi-1) promoter to obtain the Ubi-pegRNA-R construct.
  • the nicking gRNA was ligated with T4 ligase into a vector driven by the TaU3 promoter to obtain the TaU3-nick vector.
  • the PAM sequence is shown in bold.
  • the protoplasts used in the present invention are from rice Zhonghua 11 variety and Kenong 199 wheat variety.
  • the seeds are rinsed with 75% ethanol for 1 minute, then treated with 4% sodium hypochlorite for 30 minutes, and washed with sterile water for more than 5 times. Cultivate on M6 medium for 3-4 weeks, at 26°C, protected from light.
  • Potted wheat seeds are planted in a culture room and cultivated for about 1-2 weeks (about 10 days) at a temperature of 25 ± 2°C, a light intensity of 1000 lx, and a photoperiod of 14-16 h/d.
  • the FACSAria III (BD Biosciences) instrument is used for flow cytometry analysis of protoplasts. The specific steps are as follows:
  • the 20 µL amplification system contains 4 µL 5× FastPfu buffer, 1.6 µL dNTPs (2.5 mM), 0.4 µL forward primer (10 µM), 0.4 µL reverse primer (10 µM), 0.4 µL FastPfu polymerase (2.5 U/µL), and 2 µL DNA template (~60 ng).
  • Amplification conditions: 95°C pre-denaturation for 5 min; 35 cycles of 95°C denaturation for 30 s, 50-64°C annealing for 30 s and 72°C extension for 30 s; 72°C final extension for 5 min; storage at 12°C.
  • the above amplification product is diluted 10-fold, and 1 µL is used as the template for the second round of PCR amplification, in which the amplification primers are sequencing primers containing barcodes.
  • the 50 µL amplification system contains 10 µL 5× FastPfu buffer, 4 µL dNTPs (2.5 mM), 1 µL forward primer (10 µM), 1 µL reverse primer (10 µM), 1 µL FastPfu polymerase (2.5 U/µL), and 1 µL DNA template.
  • the amplification conditions are as above, and the number of amplification cycles is 35.
  • PCR products were separated by 2% agarose gel electrophoresis, the target fragments were recovered with the AxyPrep DNA Gel Extraction kit, and the recovered products were quantified with a NanoDrop ultra-micro spectrophotometer; 100 ng of each recovered product was pooled and sent to Shenggong Bioengineering Co., Ltd. for amplicon sequencing library construction and amplicon sequencing analysis.
  • the original data is split according to the sequencing primers, and the WT is used as a control to compare and analyze the editing type and editing efficiency of the product at different gene targeting sites in the three repeated experiments.
  • the Cas9 (H840A) nickase-reverse transcriptase fusions (PPEs, plant prime editors) can be used to precisely modify the target sequence (Figures 1-2).
  • the nCas9(H840A)-M-MLV construct (PPE-M-MLV), the nCas9(H840A)-CaMV construct (PPE-CaMV) and the nCas9(H840A)-retron construct (PPE-retron) were constructed, together with OsU3/TaU6 promoter-driven pegRNA constructs carrying the target guide sequence and the RT and PBS sequences, and a TaU3 promoter-driven nicking gRNA construct that produces a nick on the non-target strand (Figure 3).
  • 10 endogenous sites in rice were targeted: OsCDC48-T1, OsCDC48-T2, OsCDC48-T3, OsALS-T1, OsALS-T2, OsDEP1, OsEPSPS-T1, OsEPSPS-T2, OsLDAMR and OsGAPDH.
  • 7 endogenous sites in wheat were targeted: TaUbi10-T1, TaUbi10-T2, TaGW2, TaGASR7, TaLOX2, TaMLO and TaDME.
  • pegRNA processed by ribozymes was also tested in the PPE system.
  • a Ubi-1-driven, ribozyme-processed pegRNA construct carrying the target guide RNA and the RT and PBS sequences was constructed to replace the OsU3-driven pegRNA construct in the original system, and the system was named PPE-R (R for ribozyme) (Figure 11).
  • the results at endogenous targets show that the ribozyme processing strategy can also achieve precise changes of endogenous sequences.
  • PPE-R is improved compared with PPE at some sites, with an efficiency of up to 9.7% (Figure 12). This result indicates that both ribozyme-processed pegRNA and pegRNA expressed from type II promoters are suitable for the PPE system.
  • the protoplasts were cultured at 37°C to test whether this could improve the editing efficiency.
  • Two rice endogenous sites (OsCDC48-T2 and OsALS-T2) were selected for testing. After the transformed protoplasts were cultured overnight at 26°C, they were incubated at 37°C for 8 hours and then returned to 26°C to continue the culture. The efficiencies of the treated and untreated groups were then compared. The results show that treatment at 37°C can significantly improve the editing efficiency of PPE systems (including PPE2, PPE3 and PPE3b), with an average 1.6-fold increase (from 3.9% to 6.3%) and a maximum 2.9-fold increase (Figure 13).
  • Example 4 Test the influence of different PBS, RT template length and nicking gRNA position on the PPE system
  • the effects of different PBS, RT template lengths and nicking gRNA positions on the PPE system were tested.
  • the results using OsCDC48-T1 as the test site showed that all of the tested PBS lengths (6-16 nt) and RT template lengths (7-23 nt) can produce targeted sequence modification at a specific site (Figures 14-15).
  • the efficiency at the OsCDC48-T1 site is 3.4% to 15.3%, and the efficiency at the OsCDC48-T2 site is 0.9% to 8.1%.
  • the efficiency at the OsALS-T2 site is 1.1% to 10.5%.
  • Embodiment 5 PPE system realizes multiple types of precise modification of endogenous sites
  • the maximum efficiency of targeted base insertion can reach 3.0%, and the longest insertion can reach 15 nt (Figure 19); the maximum efficiency of targeted base deletion can reach 19.2%, and the longest deletion can reach 40 nt (Figure 20). The system can therefore efficiently insert and delete small fragments, and the PPE system can achieve multiple types of targeted modification of endogenous sites.
  • Example 6 PPE system to obtain targeted editing plants
  • Example 7 The Tm of PBS affects the efficiency of the PPE system
  • Example 8 Using dual pegRNA strategy to significantly improve the efficiency of the PPE system
  • NGG-pegRNA and CCN-pegRNA different pegRNAs
  • Figure 26 Fifteen target sites were selected from ten rice genes, and a pair of pegRNA was designed for each target (Table 3). Then, the editing activities of only NGG-pegRNA, only CCN-pegRNA and double-pegRNA were compared at the same position.
  • the PAM for each target site is shown in bold, and PBS is underlined.
  • the dual-pegRNA strategy has the highest activity at most target sites (13 out of 15). The edits generated were C-to-A, G-to-A, G-to-T, A-to-G, T-to-A, C-to-G and CT-to-AG point mutations, 1 bp (T) or 2 bp (AT) deletions, and a 1 bp (A) insertion, with a maximum editing efficiency of 24.5% (Figure 27).
  • across all tested sites, the editing efficiency of dual-pegRNA is on average about 4.2 times higher than that of NGG-pegRNA alone (up to 27.9 times, at OsNRT1.1B (A insertion)) and on average 1.8 times higher than that of CCN-pegRNA alone (up to 7.2 times, at OsALS (A→G)). In addition, the proportion of by-products with dual pegRNA is not higher than that with a single pegRNA (Figure 28).


Abstract

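Provided is a method for the targeted modification of a specific sequence in a plant genome into a target sequence of interest by means of a guide RNA-directed nuclease-reverse transcriptase fusion protein, as well as genetically modified plants produced by the method and their progeny.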

Description

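Method for targeted modification of plant genome sequences
Technical Field
The present invention relates to the field of plant genetic engineering. Specifically, the present invention relates to a method for targeted modification of plant genome sequences. More specifically, the present invention relates to a method for the targeted modification of a specific sequence in a plant genome into a target sequence of interest by means of a guide RNA-directed nuclease-reverse transcriptase fusion protein, as well as genetically modified plants produced by the method and their progeny.
Background of the Invention
Many important agronomic traits depend on the genome sequence. Targeted changes to specific genomic sequences can confer new heritable traits on an organism, opening up possibilities for disease treatment and breeding improvement. At present, genome editing technologies (e.g., CRISPR/Cas) can cleave a specific sequence, which activates cellular repair pathways to repair the damage and thereby alters the sequence at the target site. Among these, repair via the homologous recombination (HR) pathway allows precise changes to the genome sequence. However, in most higher organisms, especially plants, the efficiency of homologous recombination is very low, which greatly limits the broad application of this approach. In addition, base editing systems can efficiently convert cytosine to thymine (C→T) and adenine to guanine (A→G) at a target site. However, the types of base conversion available with that approach are limited, and precise insertion or deletion of fragments cannot be achieved. Therefore, there remains a need in the art for efficient methods of precise, targeted modification of plant genome sequences.
Summary of the Invention
The present invention includes a novel precise DNA editing system for plants. The system consists of a Cas nuclease with target-strand nicking activity (Cas9-H840A) fused to a reverse transcriptase, and a pegRNA (prime editing gRNA) carrying at its 3' end a repair template (RT template) and a binding region for the free single strand (PBS). In this system, the PBS binds the free single strand generated by the Cas nickase (e.g., Cas9-H840A), and a single-stranded DNA sequence is then synthesized according to the given RT template; after cellular repair, any change to the DNA sequence downstream of position -3 of the PAM sequence can be achieved in the genome. In addition, introducing a nicking sgRNA that produces a nick on the non-target strand of the pegRNA helps promote repair by the cell according to the donor template. Experimental results show that the system effectively induces precise modification of target sites in plants.
Brief Description of the Drawings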
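Figure 1: Schematic of the principle of the present invention.
Figure 2: Working schematic of three different types of PPE (plant prime editor) system. The system without an additional nicking sgRNA is named PPE2; the system with an additional nicking sgRNA that helps cut the strand opposite to the pegRNA is named PPE3; when the PAM sequence of the nicking sgRNA that cuts the opposite strand lies within the spacer sequence of the pegRNA, the system is named PPE3b.
Figure 3: Schematic of the PPE constructs and pegRNA constructs.
Figure 4: Working principle of the BFP-to-GFP reporter system used to detect precise editing in plant protoplasts.
Figure 5: Fluorescence intensity of the PPE system measured by flow cytometry. "CK" is the control of protoplasts not transformed with plasmid, "PBE" is the BE3 base-editing reporter system, and "PPE3b(ΔM-MLV)" denotes the control group lacking M-MLV reverse transcriptase.
Figure 6: Efficiency of the PPE system measured by flow cytometry. "CK" is the control of protoplasts not transformed with plasmid, "PBE" is the BE3 base-editing reporter system, and "PPE3b(ΔM-MLV)" denotes the control group lacking M-MLV reverse transcriptase.
Figure 7: Editing of rice endogenous targets by the PPE system.
Figure 8: Editing of wheat endogenous targets by the PPE system.
Figure 9: By-products generated by the PPE system and their proportions.
Figure 10: Editing of plant endogenous targets by the PPE-CaMV system.
Figure 11: Schematic of ribozyme-processed pegRNA driven by a type II promoter.
Figure 12: Editing of plant endogenous genes by the PPE-R system.
Figure 13: Temperature treatment improves the editing efficiency of the PPE system.
Figure 14: Effect of different PBS lengths on the PPE system.
Figure 15: Effect of different RT template lengths on the PPE system.
Figure 16: Effect of different RT template lengths on the proportion of precise edits by the PPE system.
Figure 17: Effect of different nicking sgRNA positions on the PPE system.
Figure 18: The PPE system produces different types of mutation in plant endogenous genes.
Figure 19: The PPE system inserts fragments of different lengths into plant endogenous genes.
Figure 20: The PPE system deletes fragments of different lengths from plant endogenous genes.
Figure 21: Schematic of the PPE construct used for Agrobacterium infection of rice.
Figure 22: Rice mutants obtained with the PPE system and their sequencing results; arrows indicate the position of the intended mutation.
Figure 23: Results of sequencing individual clones from the T0-9 mutant plant.
Figure 24: Comparison of the effect of PBS lengths guided by different Tm values on editing efficiency, using published data for three target sites and newly obtained data for ten new target sites in rice protoplasts.
Figure 25: Normalization of prime editing frequencies at different PBS melting temperatures. The highest editing frequency obtained at each target was normalized to 1, and the frequencies obtained at other PBS Tm values were adjusted accordingly.
Figure 26: Schematic of prime editing with single-pegRNA and dual-pegRNA strategies. (a) Editing with NGG-pegRNA only (edits the forward DNA strand). (b) Editing with CCN-pegRNA only (edits the reverse DNA strand). (c) Editing with the dual-pegRNA strategy. The dual pegRNAs create two edits simultaneously in the two DNA strands.
Figure 27: Comparison of the editing efficiencies induced by the NGG-pegRNA, CCN-pegRNA and dual-pegRNA strategies at 15 target sites.
Figure 28: Product purity when editing 15 endogenous sites in rice protoplasts with NGG-pegRNA, CCN-pegRNA and dual-pegRNA.
Figure 29: Percentage of rice genome bases that can theoretically be targeted by prime editing with a single pegRNA and with dual pegRNAs.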
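Detailed Description
I. Definitions
In the present invention, unless otherwise specified, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. In addition, the terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are terms and routine procedures widely used in the corresponding fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). For a better understanding of the present invention, definitions and explanations of relevant terms are provided below.
As used herein, the term "and/or" covers all combinations of the items joined by the term, and each combination should be regarded as having been individually listed herein. For example, "A and/or B" covers "A", "A and B", and "B". For example, "A, B and/or C" covers "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".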
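When the term "comprising" is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends while still having the activity described in the present invention. In addition, those skilled in the art know that the methionine encoded by the start codon at the N-terminus of a polypeptide is retained in certain practical situations (for example, when expressed in a particular expression system) without substantially affecting the function of the polypeptide. Therefore, when a specific polypeptide amino acid sequence is described in the specification and claims of this application, although it may not include the N-terminal methionine encoded by the start codon, sequences including that methionine are also covered; correspondingly, the encoding nucleotide sequence may also include a start codon; and vice versa.
"Genome" as used herein covers not only the chromosomal DNA present in the nucleus but also the organelle DNA present in subcellular components of the cell (e.g., mitochondria, plastids).
"Genetically modified plant" means a plant that comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide can be stably integrated into the genome of the plant and inherited over successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one that, in the plant genome, comprises one or more deoxynucleotide substitutions, deletions and additions.
"Exogenous" with respect to a sequence means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been significantly altered from its native form by deliberate human intervention.
"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and denote a single- or double-stranded RNA or DNA polymer, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytidine or deoxycytidine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purine (A or G), "Y" for pyrimidine (C or T), "K" for G or T, "H" for A or C or T, "D" for A, T or G, "I" for inosine, and "N" for any nucleotide.
"Polypeptide", "peptide" and "protein" are used interchangeably in the present invention and refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.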
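As used in the present invention, "expression construct" refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in an organism. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (e.g., to produce mRNA or functional RNA) and/or translation of RNA into a precursor or mature protein.
The "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid or a viral vector or, in some embodiments, an RNA capable of being translated (such as mRNA), for example RNA produced by in vitro transcription.
The "expression construct" of the present invention may comprise regulatory sequences and a nucleotide sequence of interest from different sources, or regulatory sequences and a nucleotide sequence of interest from the same source but arranged in a manner different from that normally found in nature.
"Regulatory sequence" and "regulatory element" are used interchangeably and refer to nucleotide sequences that are located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence and that influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.
"Promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is a promoter capable of controlling gene transcription in a cell, whether or not it is derived from that cell. The promoter may be a constitutive promoter, a tissue-specific promoter, a developmentally regulated promoter or an inducible promoter.
"Constitutive promoter" refers to a promoter that will generally cause a gene to be expressed in most cell types under most conditions. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed predominantly, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in one specific cell or cell type. "Developmentally regulated promoter" refers to a promoter whose activity is determined by developmental events. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signals, etc.).
Examples of promoters include, but are not limited to, polymerase (pol) I, pol II and pol III promoters. Examples of pol I promoters include the chicken RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter and the simian virus 40 (SV40) immediate early promoter. Examples of pol III promoters include the U6 and H1 promoters. Inducible promoters such as the metallothionein promoter may be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter and the Sp6 phage promoter. When used in plants, the promoter may be the cauliflower mosaic virus 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter or the rice actin promoter.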
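As used herein, the term "operably linked" means that a regulatory element (for example, but not limited to, a promoter sequence, a transcription termination sequence, etc.) is linked to a nucleic acid sequence (for example, a coding sequence or open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking a regulatory element region to a nucleic acid molecule are known in the art.
"Introducing" a nucleic acid molecule (e.g., a plasmid, a linear nucleic acid fragment, RNA, etc.) or a protein into an organism means transforming cells of the organism with the nucleic acid or protein so that the nucleic acid or protein can function in the cell. "Transformation" as used in the present invention includes stable transformation and transient transformation. "Stable transformation" refers to the introduction of an exogenous nucleotide sequence into the genome, resulting in stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and of any successive generations. "Transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell, where it performs its function without stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the genome.
"Trait" refers to a physiological, morphological, biochemical or physical characteristic of a cell or organism.
"Agronomic trait" refers in particular to a measurable parameter of a crop plant, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content of plant vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content of plant vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content of plant vegetative tissue, herbicide resistance, drought resistance, nitrogen uptake, root lodging, harvest index, stalk lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance, tiller number, and the like.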
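II. Plant genome editing system
In one aspect, the present invention relates to a genome editing system for targeted modification of a genomic DNA sequence of an organism, comprising:
i) a fusion protein and/or an expression construct containing a nucleotide sequence encoding the fusion protein, wherein the fusion protein comprises a CRISPR nickase and a reverse transcriptase; and/or
ii) at least one pegRNA and/or an expression construct containing a nucleotide sequence encoding the at least one pegRNA,
wherein the at least one pegRNA comprises, in the 5' to 3' direction, a guide sequence, a scaffold sequence, a reverse transcription (RT) template sequence and a primer binding site (PBS) sequence.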
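By way of non-limiting illustration of the 5' to 3' layout recited above, the following Python sketch concatenates the four pegRNA elements in order. The function name `assemble_pegrna` and every sequence shown are hypothetical placeholders chosen for this illustration; in particular, the scaffold string is a stand-in and is not SEQ ID NO: 8.

```python
RNA_BASES = set("ACGU")


def assemble_pegrna(guide: str, scaffold: str, rt_template: str, pbs: str) -> str:
    """Join the pegRNA elements 5'->3': guide, scaffold, RT template, PBS."""
    parts = {
        "guide": guide,
        "scaffold": scaffold,
        "rt_template": rt_template,
        "pbs": pbs,
    }
    cleaned = []
    for name, seq in parts.items():
        # Accept DNA-style input (T) but always emit an RNA sequence (U).
        seq = seq.upper().replace("T", "U")
        if not seq or set(seq) - RNA_BASES:
            raise ValueError(f"{name} must be a non-empty A/C/G/U sequence")
        cleaned.append(seq)
    return "".join(cleaned)


# Placeholder sequences only; none of them is taken from the specification.
example = assemble_pegrna("GACGTT", "GUUUUAGA", "ACCUGA", "UCAAC")
print(example)
```

Because the PBS is the 3'-most element, it is the part of the pegRNA that remains free to pair with the nicked DNA strand, with the RT template lying immediately 5' of it.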
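In some embodiments, the at least one pegRNA is capable of forming a complex with the fusion protein and targeting the fusion protein to a target sequence in the genome, resulting in a nick within the target sequence.
In some embodiments, the organism is a plant.
As used herein, "genome editing system" refers to the combination of components required for genome editing of the genome within a cell. The individual components of the system, such as the fusion protein, gRNA, etc., may each be present independently or may be present in any combination as a composition.
As used herein, "target sequence" refers to a sequence of approximately 20 nucleotides in the genome that is characterized by a 5' or 3' flanking PAM (protospacer adjacent motif) sequence. In general, the PAM is required for the complex formed by a CRISPR nuclease or its variant and the guide RNA to recognize the target sequence. For example, for the Cas9 nuclease and its variants, the target sequence is immediately adjacent at its 3' end to a PAM, for example 5'-NGG-3'. Based on the presence of a PAM, those skilled in the art can readily identify target sequences in the genome that are available for targeting. Moreover, depending on the position of the PAM, the target sequence may be located on either strand of the genomic DNA molecule. For Cas9 or its derivatives such as Cas9 nickase, the target sequence is preferably 20 nucleotides.
In some embodiments, the CRISPR nickase in the fusion protein is capable of forming a nick within the target sequence in genomic DNA. In some embodiments, the CRISPR nickase is a Cas9 nickase.
In some embodiments, the Cas9 nickase is derived from SpCas9 of Streptococcus pyogenes (S. pyogenes) and comprises at least the amino acid substitution H840A relative to wild-type SpCas9. An exemplary wild-type SpCas9 comprises the amino acid sequence shown in SEQ ID NO: 1. In some embodiments, the Cas9 nickase comprises the amino acid sequence shown in SEQ ID NO: 2. In some embodiments, the Cas9 nickase in the fusion protein is capable of forming a nick between nucleotide -3 and nucleotide -4 relative to the PAM of the target sequence (the first nucleotide at the 5' end of the PAM sequence being position +1).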
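To make the position numbering above concrete, the following sketch scans one DNA strand for 5'-NGG-3' PAMs, takes the 20 nucleotides immediately 5' of each PAM as a candidate target sequence, and reports where a nick between positions -4 and -3 would fall. The function name `find_ngg_targets` and the example strand are hypothetical and serve only as an illustration; only the strand passed in is scanned, so sites on the opposite strand would have to be found by supplying its reverse complement.

```python
import re


def find_ngg_targets(strand: str, target_len: int = 20):
    """Return (target, pam, nick_index) for each 5'-NGG-3' PAM on `strand`.

    `nick_index` is the 0-based index of the first base 3' of the nick, so the
    nick lies between strand[nick_index - 1] (position -4) and
    strand[nick_index] (position -3).
    """
    strand = strand.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", strand):
        pam_start = match.start()
        if pam_start < target_len:
            continue  # not enough sequence 5' of the PAM for a full target
        target = strand[pam_start - target_len:pam_start]
        nick_index = pam_start - 3
        hits.append((target, match.group(1), nick_index))
    return hits


# Hypothetical strand: a 20-nt target followed by the PAM "AGG".
strand = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for target, pam, nick in find_ngg_targets(strand):
    print(target, pam, nick, strand[:nick])
```

Everything 5' of `nick_index` on the scanned strand corresponds to the free single strand with a 3' end that the PBS is designed to pair with.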
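In some embodiments, the Cas9 nickase is a Cas9 nickase variant capable of recognizing an altered PAM sequence. In some preferred embodiments, the Cas9 nickase is a Cas9 variant that recognizes the PAM sequence 5'-NG-3'. In some embodiments, the Cas9 nickase variant that recognizes the PAM sequence 5'-NG-3' comprises the following amino acid substitutions relative to wild-type Cas9: H840A, R1335V, L1111R, D1135V, G1218R, E1219F, A1322R and T1337R, wherein the amino acid numbering refers to SEQ ID NO: 1.
The nick formed by the Cas9 nickase of the present invention can cause the target sequence to form a free single strand with a 3' end (3' free single strand) and a free single strand with a 5' end (5' free single strand).
In some embodiments, the reverse transcriptase in the fusion protein of the present invention may be derived from different sources. In some embodiments, the reverse transcriptase is a reverse transcriptase of viral origin. For example, in some embodiments, the reverse transcriptase is M-MLV reverse transcriptase or a functional variant thereof. An exemplary wild-type M-MLV reverse transcriptase sequence is shown in SEQ ID NO: 3. In some embodiments, the reverse transcriptase is an enhanced M-MLV reverse transcriptase, for example one whose amino acid sequence is shown in SEQ ID NO: 4. In some embodiments, the reverse transcriptase is CaMV-RT from cauliflower mosaic virus (CaMV), whose amino acid sequence is shown in SEQ ID NO: 5. In some embodiments, the reverse transcriptase is a reverse transcriptase of bacterial origin, for example retron-RT from Escherichia coli, whose amino acid sequence is shown in SEQ ID NO: 6.
In some embodiments, the CRISPR nickase and the reverse transcriptase in the fusion protein are connected by a linker. As used herein, a "linker" may be a non-functional amino acid sequence of 1-50 (for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length, with no secondary or higher-order structure. For example, the linker may be a flexible linker such as GGGGS, GS, GAP, (GGGGS)x3, GGS, (GGS)x7, etc. For example, it may be the linker shown in SEQ ID NO: 7.
In some embodiments, the CRISPR nickase in the fusion protein is fused, directly or through a linker, to the N-terminus of the reverse transcriptase. In some embodiments, the CRISPR nickase in the fusion protein is fused, directly or through a linker, to the C-terminus of the reverse transcriptase.
In some embodiments of the present invention, the fusion protein of the present invention may also comprise a nuclear localization sequence (NLS). In general, the one or more NLSs in the fusion protein should be of sufficient strength to drive accumulation of the fusion protein in the nucleus of the cell in an amount that allows it to perform its base editing function. In general, the strength of the nuclear localization activity is determined by the number and position of the NLSs in the fusion protein, the particular NLS or NLSs used, or a combination of these factors.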
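The guide sequence (also called seed sequence or spacer sequence) in the at least one pegRNA of the present invention is set to have sufficient sequence identity (preferably 100% identity) with the target sequence, so that it can bind to the complementary strand of the target sequence by base pairing and achieve sequence-specific targeting.
A variety of gRNA scaffold sequences suitable for genome editing based on CRISPR nucleases (e.g., Cas9) are known in the art, and these can be used in the pegRNA of the present invention. In some specific embodiments, the scaffold sequence of the gRNA is shown in SEQ ID NO: 8.
In some embodiments, the primer binding sequence is set to be complementary to at least a portion of the target sequence (preferably perfectly paired with at least a portion of the target sequence). Preferably, the primer binding sequence is complementary to at least a portion of the 3' free single strand produced by the nick in the DNA strand carrying the target sequence (preferably perfectly paired with at least a portion of the 3' free single strand), and in particular is complementary (preferably perfectly paired) to the nucleotide sequence at the 3' end of the 3' free single strand. When the 3' free single strand of that strand binds to the primer binding sequence by base pairing, the 3' free single strand can serve as a primer and, with the reverse transcription (RT) template sequence immediately adjacent to the primer binding sequence as template, be extended by the reverse transcriptase in the fusion protein, yielding a DNA sequence corresponding to the RT template sequence.
The primer binding sequence depends on the length of the free single strand formed in the target sequence by the CRISPR nickase used; however, it should have a minimum length that ensures specific binding. In some embodiments, the primer binding sequence may be 4-20 nucleotides in length, for example 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length.
In some embodiments, the primer binding sequence is set to have a Tm (melting temperature) of no more than about 52°C. In some embodiments, the Tm (melting temperature) of the primer binding sequence is about 18°C-52°C, preferably about 24°C-36°C, more preferably about 28°C-32°C, and more preferably about 30°C.
Methods for calculating the Tm of a nucleic acid sequence are well known in the art; for example, the Oligo Analysis Tool online analysis tool can be used. An exemplary formula is $T_m = 4\,N_{G:C} + 2\,N_{A:T}$, where $N_{G:C}$ is the number of G and C bases in the sequence and $N_{A:T}$ is the number of A and T bases in the sequence. A suitable Tm can be obtained by choosing a suitable PBS length. Alternatively, a PBS sequence with a suitable Tm can be obtained by choosing a suitable target sequence.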
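As a worked instance of the exemplary formula, consider a hypothetical 10-nt PBS, 5'-GCGCAUAUAU-3' (an illustrative sequence, not one taken from the examples), with U counted together with A and T because the PBS is RNA:

```latex
\begin{aligned}
N_{G:C} &= 4, \qquad N_{A:T} = 6,\\
T_m &= 4\,N_{G:C} + 2\,N_{A:T} = 4 \times 4 + 2 \times 6 = 28\ ^{\circ}\mathrm{C}.
\end{aligned}
```

Under this simple rule, each A or U removed from the PBS lowers the calculated Tm by 2°C and each G or C by 4°C, which is how adjusting the PBS length moves the Tm toward the preferred range.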
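In some embodiments, the RT template sequence may be any sequence. Through the reverse transcription described above, its sequence information can be incorporated into the DNA strand carrying the target sequence (i.e., the strand containing the PAM of the target sequence), and a DNA duplex containing the sequence information of the RT template is then formed through cellular DNA repair. In some embodiments, the RT template sequence contains the desired modification. For example, the desired modification includes substitution, deletion and/or addition of one or more nucleotides. For example, the modification includes one or more substitutions selected from: C to T, C to G, C to A, G to T, G to C, G to A, A to T, A to G, A to C, T to C, T to G and T to A substitutions; and/or a deletion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides; and/or an insertion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides.
In some embodiments, the RT template sequence is set to correspond to the sequence downstream of the nick in the target sequence (for example, complementary to at least a portion of the sequence downstream of the nick in the target sequence) and contains the desired modification. The desired modification includes substitution, deletion and/or addition of one or more nucleotides. For example, the modification includes one or more substitutions selected from: C to T, C to G, C to A, G to T, G to C, G to A, A to T, A to G, A to C, T to C, T to G and T to A substitutions; and/or a deletion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides; and/or an insertion of one or more nucleotides, such as 1 to about 100 or more, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides.
In some embodiments, the RT template sequence may be about 1-300 or more nucleotides in length, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75, about 100, about 125, about 150, about 175, about 200, about 225, about 250, about 275, or about 300 nucleotides or more. Preferably, the RT template sequence is 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or 23 nucleotides in length.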
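In some embodiments, the plant genome editing system further includes a nicking gRNA (used to generate an additional nick) and/or an expression construct containing a nucleotide sequence encoding the nicking gRNA, the nicking gRNA comprising a guide sequence and a scaffold sequence. In some preferred embodiments, the nicking gRNA does not comprise a reverse transcription (RT) template sequence or a primer binding site (PBS) sequence.
The guide sequence (also called seed sequence or spacer sequence) in the nicking gRNA of the present invention is set to have sufficient sequence identity (preferably 100% identity) with a nicking target sequence in the genome, so that the fusion protein of the present invention can be targeted to the nicking target sequence and produce a nick within it; the nicking target sequence and the target sequence targeted by the pegRNA (pegRNA target sequence) are located on opposite strands of the genomic DNA. In some embodiments, the nick formed by the nicking gRNA and the nick formed by the pegRNA are about 1 to about 300 or more nucleotides apart, for example 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75, about 100, about 125, about 150, about 175, about 200, about 225, about 250, about 275, or about 300 nucleotides or more apart. In some embodiments, the nick formed by the nicking gRNA is located upstream or downstream of the nick formed by the pegRNA (upstream and downstream both being defined with respect to the DNA strand on which the pegRNA target sequence is located). In some embodiments, the guide sequence in the nicking gRNA has sufficient sequence identity (preferably 100% identity) with the (modified) opposite strand of the pegRNA target sequence after the editing event, so that the nicking gRNA targets only a nicking target sequence that arises once the pegRNA-induced targeting and modification of the target sequence is complete. In some embodiments, the PAM of the nicking target sequence is located within the complement of the pegRNA target sequence.
In some embodiments, the pegRNA and/or nicking gRNA can be precisely processed using a self-processing system. In some specific embodiments, the 5' end of the pegRNA and/or nicking gRNA is connected to the 3' end of a first ribozyme, which is designed to cleave the fusion at the 5' end of the pegRNA and/or nicking gRNA; and/or the 3' end of the pegRNA and/or nicking gRNA is connected to the 5' end of a second ribozyme, which is designed to cleave the fusion at the 3' end of the pegRNA and/or nicking gRNA. The design of the first or second ribozyme is within the abilities of those skilled in the art. For example, see Gao et al., JIPB, Apr, 2014; Vol 56, Issue 4, 343-349. For a method of precisely processing gRNA, see, for example, WO 2018/149418.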
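In some embodiments, the genome editing system comprises at least one pair of pegRNAs and/or an expression construct containing a nucleotide sequence encoding the at least one pair of pegRNAs. In some embodiments, the two pegRNAs in the pegRNA pair are set to target different target sequences on the same strand of genomic DNA. In some embodiments, the two pegRNAs in the pegRNA pair are set to target target sequences on different strands of genomic DNA. In some embodiments, the PAM of the target sequence of one pegRNA in the pegRNA pair is located on the sense strand, and the PAM of the other pegRNA is located on the antisense strand. In some embodiments, the nicks induced by the two pegRNAs are located on either side of the site to be modified. In some embodiments, the nick induced by the pegRNA for the sense strand is located upstream (5' direction) of the site to be modified, and the nick induced by the pegRNA for the antisense strand is located downstream (3' direction) of the site to be modified, upstream and downstream being defined relative to the sense strand. In some embodiments, the nicks induced by the two pegRNAs are about 1 to about 300 or more nucleotides apart, for example 1-15 nucleotides apart.
In some embodiments, the two pegRNAs in the pegRNA pair are set to introduce the same desired modification. For example, one pegRNA is set to introduce an A to G substitution in the sense strand, while the other pegRNA is set to introduce the corresponding T to C substitution at the corresponding position of the antisense strand. As another example, one pegRNA is set to introduce a two-nucleotide deletion in the sense strand, and the other pegRNA is set to introduce the same two-nucleotide deletion at the corresponding position of the antisense strand. Other types of modification can be deduced by analogy. The pegRNAs targeting the two different strands can be made to achieve the same desired modification through the design of appropriate RT template sequences.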
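The correspondence between the two strands described above can be sketched as follows: an edit written 5' to 3' on the sense strand is mapped to the equivalent edit on the antisense strand by reverse-complementing the original and altered bases and mirroring the coordinate. The helper names `revcomp` and `to_antisense`, and the 0-based coordinate convention, are assumptions made for this illustration only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (an empty string stays empty)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def to_antisense(pos: int, ref: str, alt: str, length: int):
    """Map a sense-strand edit onto the antisense strand.

    `pos` is the 0-based start of `ref` on a sense strand of `length` nt;
    `ref` is replaced by `alt` ("" for a pure deletion or insertion).
    Returns (pos, ref, alt) read 5'->3' on the antisense strand.
    """
    if pos < 0 or pos + len(ref) > length:
        raise ValueError("edit lies outside the sequence")
    return length - pos - len(ref), revcomp(ref), revcomp(alt)


# An A-to-G substitution on the sense strand is T-to-C on the antisense strand.
print(to_antisense(2, "A", "G", 10))
# A two-nucleotide (AT) deletion maps to a two-nucleotide deletion as well.
print(to_antisense(4, "AT", "", 10))
```

Each of the two pegRNAs would then carry an RT template encoding the edit as read on the strand that it primes.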
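To obtain efficient expression in plants, in some embodiments of the present invention the nucleotide sequence encoding the fusion protein is codon-optimized for the plant species whose genome is to be modified.
Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in the host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently or most frequently in the genes of the host cell, while maintaining the native amino acid sequence. Different species exhibit particular preferences for certain codons of a specific amino acid. Codon bias (differences in codon usage between organisms) is often correlated with the translation efficiency of messenger RNA (mRNA), which in turn is thought to depend on the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored for optimal gene expression in a given organism on the basis of codon optimization. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in different ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28:292 (2000).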
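A minimal sketch of the codon replacement described above is given below. It swaps each codon for the synonymous codon with the highest value in a supplied usage table, leaving the encoded amino acid sequence unchanged. The function `optimize_codons` and the usage fractions in `toy_usage` are invented for illustration only and are not real data for any plant species; actual values would be taken from a codon usage table for the species concerned.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str, usage: dict) -> str:
    """Replace each codon by its most-used synonym according to `usage`.

    Codons missing from `usage` count as 0; on a tie the original codon is kept.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items() if a == amino]
        out.append(max(synonyms, key=lambda c: (usage.get(c, 0.0), c == codon)))
    return "".join(out)


# Hypothetical usage fractions, for illustration only.
toy_usage = {"GCC": 0.4, "GCT": 0.2, "AAG": 0.7, "AAA": 0.3}
original = "ATGGCTAAATAA"
optimized = optimize_codons(original, toy_usage)
print(optimized, translate(original) == translate(optimized))
```

Always picking the single most frequent codon is the simplest possible policy; it is used here only because it makes the principle of preserving the amino acid sequence easy to check.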
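In some embodiments, the fusion protein of the present invention is encoded by the nucleotide sequence shown in any one of SEQ ID NOs: 9-11, or comprises the amino acid sequence shown in any one of SEQ ID NOs: 12-14.
Plants whose genomes can be modified by the genome editing system of the present invention include monocotyledons and dicotyledons. For example, the plant is a crop plant, including but not limited to wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato.
In one aspect, the present invention provides a method for determining the PBS sequence in a pegRNA of the genome editing system of the present invention, the method comprising:
a) determining at least one candidate target sequence according to the PAM recognized by the CRISPR nickase used and the site to be modified;
b) obtaining a series of PBS sequences according to the position of the nick produced by the CRISPR nickase in the at least one candidate target sequence;
c) calculating the Tm of the PBS sequences;
d) selecting a PBS sequence with a Tm of no more than 52 °C, for example a Tm of about 18 °C to 52 °C, preferably about 24 °C to 36 °C, more preferably about 28 °C to 32 °C, most preferably about 30 °C.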
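Steps b) to d) above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure used in this application: the Tm is estimated with the Wallace rule (2 °C per A/T and 4 °C per G/C), which this section does not name; the nick is assumed to lie 3 nt upstream of the PAM, as for SpCas9-derived nickases; and the 20-nt protospacer in the usage line is invented. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def wallace_tm(seq):
    """Estimate Tm in degrees C with the Wallace rule (assumed formula)."""
    s = seq.upper()
    at = s.count("A") + s.count("T") + s.count("U")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def candidate_pbs(protospacer, min_len=6, max_len=17, nick_offset=3):
    """Step b): list PBS candidates of increasing length.

    `protospacer` is the PAM-strand target sequence (5'->3') without the PAM.
    The PBS is complementary to the bases immediately 5' of the nick, which is
    assumed to lie `nick_offset` nt upstream of the PAM. Candidates are written
    with DNA letters for simplicity.
    """
    nick = len(protospacer) - nick_offset
    flap = protospacer[:nick].upper()
    longest = min(max_len, nick)
    return [reverse_complement(flap[-n:]) for n in range(min_len, longest + 1)]


def select_pbs(protospacer, target_tm=30, max_tm=52):
    """Steps c) and d): score candidates and pick the one closest to target_tm."""
    scored = [(pbs, wallace_tm(pbs)) for pbs in candidate_pbs(protospacer)]
    allowed = [item for item in scored if item[1] <= max_tm]
    return min(allowed, key=lambda item: (abs(item[1] - target_tm), len(item[0])))


# Invented 20-nt protospacer, for illustration only.
print(select_pbs("GACCATGGTTCAGCTAGCTA"))
```

Ties between candidates equally close to the target Tm are broken in favour of the shorter PBS, which is a design choice of this sketch rather than something stated in the text.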
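III. Methods for producing genetically modified plants
In another aspect, the present invention provides a method for producing a genetically modified plant, comprising introducing the genome editing system of the present invention into at least one said plant, thereby resulting in a modification in the genome of the at least one plant. The modification includes substitution, deletion and/or addition of one or more nucleotides. For example, the modification includes one or more substitutions selected from: C-to-T, C-to-G, C-to-A, G-to-T, G-to-C, G-to-A, A-to-T, A-to-G, A-to-C, T-to-C, T-to-G and T-to-A substitutions; and/or a deletion of one or more nucleotides, for example of 1 to about 100 or more nucleotides, such as 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides; and/or an insertion of one or more nucleotides, for example of 1 to about 100 or more nucleotides, such as 1, 2, 3, 4, 5, about 10, about 20, about 30, about 40, about 50, about 75 or about 100 nucleotides.
In some embodiments, the method further comprises screening the at least one plant for plants having the desired modification.
In the method of the present invention, the genome editing system can be introduced into plants by various methods well known to those skilled in the art. Methods that can be used to introduce the genome editing system of the present invention into plants include, but are not limited to: biolistic (gene gun) delivery, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant-virus-mediated transformation, the pollen tube pathway method and the ovary injection method. Preferably, the genome editing system is introduced into the plant by transient transformation.
In the method of the present invention, the genome can be modified simply by introducing or producing the fusion protein and gRNA in plant cells, and the modification can be stably inherited, without stably transforming the plant with exogenous polynucleotides encoding the components of the genome editing system. This avoids potential off-target effects of a stably present (continuously produced) genome editing system and also avoids integration of exogenous nucleotide sequences into the plant genome, giving greater biosafety.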
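In some preferred embodiments, the introduction is carried out in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.
In some embodiments, the introduction comprises transforming the genome editing system of the present invention into isolated plant cells or tissue, and then regenerating the transformed plant cells or tissue into whole plants. Preferably, the regeneration is carried out in the absence of selection pressure, that is, no selection agent against the selectable gene carried on the expression vector is used during tissue culture. Omitting the selection agent can increase the regeneration efficiency of the plant and yields modified plants free of exogenous nucleotide sequences.
In other embodiments, the genome editing system of the present invention can be transformed into a specific part of a whole plant, such as a leaf, shoot apex, pollen tube, young ear or hypocotyl. This is particularly suitable for transforming plants that are difficult to regenerate by tissue culture.
In some embodiments of the present invention, in vitro expressed protein and/or in vitro transcribed RNA molecules (for example, the expression construct is an in vitro transcribed RNA molecule) are transformed directly into the plant. The protein and/or RNA molecules are able to carry out genome editing in plant cells and are subsequently degraded by the cells, avoiding integration of exogenous nucleotide sequences into the plant genome.
Therefore, in some embodiments, genetic modification and breeding of plants using the method of the present invention can yield plants whose genomes carry no integrated exogenous polynucleotides, that is, transgene-free modified plants.
In some embodiments, the method further comprises culturing the plant cells, tissue or whole plant into which the genome editing system has been introduced at an elevated temperature, for example 37 °C.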
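In some embodiments of the present invention, the modified genomic region is associated with a plant trait such as an agronomic trait, so that the modification gives the plant an altered (preferably improved) trait, such as an agronomic trait, relative to a wild-type plant.
In some embodiments, the method further comprises a step of screening for plants having the desired modification and/or the desired trait, such as an agronomic trait.
In some embodiments of the present invention, the method further comprises obtaining progeny of the genetically modified plant. Preferably, the genetically modified plant or its progeny has the desired modification and/or the desired trait, such as an agronomic trait.
In another aspect, the present invention also provides a genetically modified plant, or progeny or a part thereof, wherein the plant is obtained by the method of the present invention described above. In some embodiments, the genetically modified plant, or progeny or part thereof, is transgene-free. Preferably, the genetically modified plant or its progeny has the desired genetic modification and/or the desired trait, such as an agronomic trait.
In another aspect, the present invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the method of the present invention described above with a second plant that does not contain the modification, thereby introducing the modification into the second plant. Preferably, the genetically modified first plant has the desired trait, such as an agronomic trait.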
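Examples
Materials and Methods
1. Vector construction
The nCas9(H840A)-M-MLV, nCas9(H840A)-CaMV and nCas9(H840A)-retron constructs were built by GENEWIZ (Suzhou). The M-MLV used in these examples carries 5 amino acid mutations relative to the wild-type M-MLV reverse transcriptase. M-MLV, RT-CaMV and RT-retron were all codon-optimized for monocotyledonous plants.
Using the Gibson method, the pegRNA fragment (including the RT template and PBS sequences) was cloned into a vector driven by the OsU3 promoter to obtain the OsU3-pegRNA construct for rice. Using the Gibson method, the pegRNA fragment (including the RT template and PBS sequences) was cloned into a vector driven by the TaU6 promoter to obtain the TaU6-pegRNA construct for wheat. Using the Gibson method, a pegRNA fragment (including the RT template and PBS sequences) flanked by ribozymes at both its 5' and 3' ends was cloned into a vector driven by the maize Ubiquitin-1 (Ubi-1) promoter to obtain the Ubi-pegRNA-R construct.
The nicking gRNA was cloned with T4 ligase into a vector driven by the TaU3 promoter to obtain the TaU3-nick vector.
Table 1. pegRNA target sites and the corresponding RT template and PBS sequences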
[Table 1 is reproduced as images in the original publication: Figures PCTCN2020117736-appb-000001 to PCTCN2020117736-appb-000006]
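PAM sequences are shown in bold.
2. Protoplast isolation and transformation
The protoplasts used in the present invention were derived from the rice cultivar Zhonghua 11 and the wheat cultivar Kenong 199.
2.1 Rice seedling culture
Seeds were rinsed with 75% ethanol for 1 minute, treated with 4% sodium hypochlorite for 30 minutes, and washed at least 5 times with sterile water. They were then cultured on M6 medium for 3-4 weeks at 26 °C in the dark.
2.2 Rice protoplast isolation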
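(1) Cut off the rice stems and slice their middle portion into 0.5-1 mm strips with a blade. Place the strips in 0.6 M mannitol solution for 10 min in the dark, filter through a strainer, transfer to 50 mL of enzyme solution (filtered through a 0.45 μm membrane), apply vacuum (about 15 kPa) for 30 min, then digest on a shaker (10 rpm) at room temperature for 5 h;
(2) Dilute the digest with 30-50 mL W5 and filter it through a 75 μm nylon mesh into a 50 mL round-bottom centrifuge tube;
(3) Centrifuge at 23 °C, 250 × g (rcf), acceleration 3 and deceleration 3, for 3 min, and discard the supernatant;
(4) Gently resuspend the cells in 20 mL W5 and repeat step (3);
(5) Resuspend in an appropriate volume of MMG, ready for transformation.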
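2.3 Rice protoplast transformation
(1) Add 10 μg of each vector to be transformed to a 2 mL centrifuge tube and mix. Using a cut-off pipette tip, add 200 μL of protoplasts and mix by flicking; add 220 μL of PEG4000 solution and mix by flicking; induce transformation at room temperature in the dark for 20-30 min;
(2) Add 880 μL W5, mix gently by inversion, centrifuge at 250 × g (rcf), acceleration 3 and deceleration 3, for 3 min, and discard the supernatant;
(3) Add 1 mL WI solution, mix gently by inversion, transfer gently to a flow cytometry tube, and incubate in the dark at 26 °C for 48 hours. Protoplasts requiring 37 °C treatment are, after transfer to the flow cytometry tube, incubated in the dark at 26 °C for 12 hours, then in the dark at 37 °C for 8 hours, and finally returned to 26 °C until the total incubation time reaches 48 hours.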
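2.4 Wheat seedling culture
Wheat seeds were sown in pots in a growth room and grown for about 1-2 weeks (around 10 days) at 25±2 °C, with a light intensity of 1000 lx and 14-16 h of light per day.
2.5 Wheat protoplast isolation
(1) Take young wheat leaves and slice their middle portion into 0.5-1 mm strips with a blade. Place the strips in 0.6 M mannitol solution for 10 min in the dark, filter through a strainer, transfer to 50 mL of enzyme solution (filtered through a 0.45 μm membrane), apply vacuum (about 15 kPa) for 30 min, then digest on a shaker (10 rpm) at room temperature for 5 h;
(2) Dilute the digest with 30-50 mL W5 and filter it through a 75 μm nylon mesh into a 50 mL round-bottom centrifuge tube;
(3) Centrifuge at 23 °C, 100 × g (rcf), acceleration 3 and deceleration 3, for 3 min, and discard the supernatant;
(4) Gently resuspend in 10 mL W5 and keep on ice for 30 min; once the protoplasts have settled, discard the supernatant;
(5) Resuspend in an appropriate volume of MMG and keep on ice, ready for transformation.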
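2.6 Wheat protoplast transformation
(1) Add 10 μg of each plasmid to be transformed to a 2 mL centrifuge tube and mix.
(2) Using a cut-off pipette tip, add 200 μL of protoplasts and mix by flicking; immediately add 250 μL of PEG4000 solution and mix by flicking; induce transformation at room temperature in the dark for 20-30 min;
(3) Add 800 μL W5 (room temperature), mix gently by inversion, centrifuge at 100 × g (rcf), acceleration 3 and deceleration 3, for 3 min, and discard the supernatant;
(4) Add 1 mL W5, mix gently by inversion, and transfer gently to a 6-well plate pre-loaded with 1 mL W5 per well; wrap the plate in foil and incubate in the dark at 26 °C for 48 h.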
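3. Observation of cell fluorescence by flow cytometry
Protoplasts were analysed on a FACSAria III instrument (BD Biosciences) as follows:
(1) After switching on the instrument, open the BD FACSDiva Software and carry out instrument calibration and related steps.
(2) Click "New Protocol" and create a suitable experimental protocol.
(3) Select "density plot" and draw one FSC/SSC scatter plot and one GFP/PE-Texas Red scatter plot.
(4) Adjust the FSC/SSC voltages so that the cell population appears in the centre of the scatter plot, and adjust the FL1 voltage so that the wild-type control protoplast population appears in the centre of the scatter plot. The GFP-positive protoplast population then appears where the GFP channel signal is stronger (this population is absent from the wild-type control sample). If necessary, adjust the compensation to separate the GFP-negative and GFP-positive populations more clearly.
(5) Set a gate around the GFP-positive population. A negative-control protoplast sample is needed to define the gate boundary.
(6) Right-click the cell population to be sorted, select "left sort", and set the analysis conditions and analysis mode according to the needs of the experiment and the percentage of target cells.
(7) Load the prepared protoplast samples in flow cytometry tubes one by one, record the data and analyse them.
(8) Close the software and switch off the instrument.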
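4. Protoplast DNA extraction and amplicon sequencing analysis
4.1 Protoplast DNA extraction
Protoplasts were collected in 2 mL centrifuge tubes and their DNA was extracted by the CTAB method (~30 μL). The concentration was measured with a NanoDrop spectrophotometer (30-60 ng/μL) and the DNA was stored at -20 °C.
4.2 Amplicon sequencing analysis
(1) The protoplast DNA template was amplified by PCR with genome-specific primers. The 20 μL reaction contained 4 μL 5× Fastpfu buffer, 1.6 μL dNTPs (2.5 mM), 0.4 μL forward primer (10 μM), 0.4 μL reverse primer (10 μM), 0.4 μL FastPfu polymerase (2.5 U/μL) and 2 μL DNA template (~60 ng). Cycling conditions: initial denaturation at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 50-64 °C for 30 s and extension at 72 °C for 30 s; final extension at 72 °C for 5 min; hold at 12 °C;
(2) The amplification product was diluted 10-fold and 1 μL was used as template for a second round of PCR with barcoded sequencing primers. The 50 μL reaction contained 10 μL 5× Fastpfu buffer, 4 μL dNTPs (2.5 mM), 1 μL forward primer (10 μM), 1 μL reverse primer (10 μM), 1 μL FastPfu polymerase (2.5 U/μL) and 1 μL DNA template. Cycling conditions were as above, with 35 cycles.
(3) PCR products were separated on a 2% agarose gel, and the target fragments were recovered with the AxyPrep DNA Gel Extraction kit and quantified with a NanoDrop spectrophotometer. 100 ng of each recovered product was pooled and sent to Sangon Biotech for amplicon library construction and amplicon sequencing.
(4) After sequencing, the raw data were demultiplexed by sequencing primer. With WT as the control, the editing types and editing efficiencies of the products at the different target sites were compared and analysed across 3 replicate experiments.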
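The components listed for the two PCR reactions in steps (1) and (2) do not add up to the stated reaction volumes. A minimal Python sketch of the arithmetic follows, on the assumption (not stated in the text) that the remainder is made up with nuclease-free water; the function name `water_volume` is hypothetical.

```python
def water_volume(total_ul, components_ul):
    """Return the volume (in microlitres) needed to reach total_ul.

    Assumption: the balance of the reaction is water, which the protocol
    does not state explicitly.
    """
    used = sum(components_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return round(total_ul - used, 2)


# Volumes in microlitres, copied from steps (1) and (2).
first_round = {
    "5x Fastpfu buffer": 4.0,
    "dNTPs (2.5 mM)": 1.6,
    "forward primer (10 uM)": 0.4,
    "reverse primer (10 uM)": 0.4,
    "FastPfu polymerase (2.5 U/uL)": 0.4,
    "DNA template": 2.0,
}
second_round = {
    "5x Fastpfu buffer": 10.0,
    "dNTPs (2.5 mM)": 4.0,
    "forward primer (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "FastPfu polymerase (2.5 U/uL)": 1.0,
    "DNA template": 1.0,
}

print(water_volume(20.0, first_round))   # balance for the 20 uL reaction
print(water_volume(50.0, second_round))  # balance for the 50 uL reaction
```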
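Example 1. The Cas9(H840A) nickase-reverse transcriptase fusion system precisely modifies a BFP-to-GFP reporter system in rice protoplasts
To test whether Cas9(H840A) nickase-reverse transcriptase fusions (PPEs, plant prime editors) can be used to precisely modify a target sequence (Figures 1-2), the following constructs were built: the nCas9(H840A)-M-MLV construct (PPE-M-MLV), the nCas9(H840A)-CaMV construct (PPE-CaMV), the nCas9(H840A)-retron construct (PPE-retron), pegRNA constructs driven by the OsU3/TaU6 promoter and carrying the target guide RNA together with the RT template and PBS sequences, and a nicking gRNA construct driven by the TaU3 promoter that produces a nick on the non-target strand (Figure 3). PPE activity in protoplasts was monitored by co-transformation with a BFP-to-GFP reporter system (Figure 4). When "CC" in the BFP gene sequence is converted to "GT", amino acid 66 changes from histidine (H) to tyrosine (Y), so that the gene encodes the GFP fluorescent protein and the cells emit green fluorescence. PPE efficiency can therefore be analysed by flow cytometry. The results show that a PPE lacking the reverse transcriptase (PPE3b(ΔM-MLV)) did not produce fluorescent cells, whereas co-transformation of protoplasts with PPE-M-MLV, pegRNA, nicking gRNA and the BFP-to-GFP reporter system (PPE3b) produced clearly visible green fluorescent cells, with an average efficiency of 4.4% (Figures 5-6). Replacing the M-MLV reverse transcriptase in this system with a reverse transcriptase from the CaMV virus (PPE3b-CaMV) or from a bacterial retron system (PPE3b-retron) likewise produced green fluorescent cells, with efficiencies of 3.7% and 2.4%, respectively (Figures 5-6). These results show that PPEs can introduce the specified modification into the target sequence of a reporter system in plant cells, providing initial evidence that PPEs work in plants, and that the reverse transcriptase in a PPE can be of other forms and origins.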
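Example 2. The Cas9(H840A) nickase-reverse transcriptase fusion system precisely modifies genomic sequences in rice and wheat protoplasts
To test whether the PPE system works at endogenous sites, 21 pegRNAs were designed and constructed against 10 rice endogenous sites (OsCDC48-T1, OsCDC48-T2, OsCDC48-T3, OsALS-T1, OsALS-T2, OsDEP1, OsEPSPS-T1, OsEPSPS-T2, OsLDMAR and OsGAPDH) and 7 wheat endogenous sites (TaUbi10-T1, TaUbi10-T2, TaGW2, TaGASR7, TaLOX2, TaMLO and TaDME), and PPE2 and PPE3 (or PPE3b) were tested at these sites. The results show that both PPE2 and PPE3 (or PPE3b) can introduce specific single-base changes at rice endogenous sites, including C to T, G to T, A to G, G to A, T to A and C to A, as well as insertions or deletions of specific bases, with a maximum efficiency in rice of 8.2% (Figure 7). PPE2 and PPE3 (or PPE3b) can also introduce single-base changes at endogenous sites in wheat, including A to T, C to G, G to C, T to G and C to A, with a maximum efficiency of 1.4% (Figure 8). Overall, the efficiencies of PPE2 and PPE3 (or PPE3b) differed little across the sites tested. A certain proportion of by-products was also observed with the PPE system, mainly insertions or replacements derived from the pegRNA scaffold (Figure 9). These results show that the PPE system, including PPE2, PPE3 and PPE3b, can carry out targeted modifications at plant endogenous sites, including precise single-base mutations, precise insertions and precise deletions.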
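Example 3. Improvement of the Cas9(H840A) nickase-reverse transcriptase fusion system
To further improve the PPE editing system, the PPE-CaMV system was tested at rice endogenous sites. The results show that PPE-CaMV can also make precise targeted modifications of endogenous sequences in rice, with a maximum efficiency of 5.8% (Figure 10). This indicates that PPE systems using reverse transcriptases of other species or other forms can likewise modify plant endogenous genes in a targeted manner.
In addition, ribozyme-processed pegRNA was tested in the PPE system. A construct in which Ubi-1 drives a ribozyme-processed pegRNA carrying the target guide RNA and the RT template and PBS sequences was built to replace the OsU3-driven pegRNA construct of the original system; this system was named PPE-R (R for ribozyme) (Figure 11). Results at endogenous targets show that the ribozyme-processing strategy can also produce precise changes in endogenous sequences. PPE-R outperformed PPE at some sites, with a maximum efficiency of 9.7% (Figure 12). This indicates that both ribozyme-processed pegRNA and pegRNA expressed from a type II (Pol II) promoter are suitable for the PPE system.
To increase the editing efficiency of the PPE system, protoplasts were cultured at 37 °C to test whether this improves editing. Two rice endogenous sites (OsCDC48-T2 and OsALS-T2) were selected. Transformed protoplasts were incubated overnight at 26 °C, then at 37 °C for 8 hours, and then returned to 26 °C; their editing efficiency was compared with that of the untreated group. The results show that 37 °C treatment significantly increased the editing efficiency of the PPE system (including the PPE2, PPE3 and PPE3b forms), by 1.6-fold on average (from 3.9% to 6.3%) and by up to 2.9-fold (Figure 13).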
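Example 4. Effects of PBS length, RT template length and nicking gRNA position on the PPE system
The effects of different PBS lengths, RT template lengths and nicking gRNA positions on the PPE system were tested. Results obtained with OsCDC48-T1 and the other test sites show that all PBS lengths (6-16 nt) and RT template lengths (7-23 nt) tested produced the intended sequence modification at the target site (Figures 14-15), with efficiencies of 3.4% to 15.3% at OsCDC48-T1, 0.9% to 8.1% at OsCDC48-T2 and 1.1% to 10.5% at OsALS-T2. RT template and PBS length therefore have a marked effect on editing efficiency. RT template length also had a fairly marked effect on the proportion and types of by-products generated by the PPE system (Figure 16). In addition, the choice of nicking gRNA affects PPE efficiency, which ranged from 3.2% to 19.2% at OsCDC48-T1 and from 2.9% to 8.6% at OsCDC48-T2 (Figure 17).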
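Example 5. The PPE system makes multiple types of precise modification at endogenous sites
To test whether the PPE system can make multiple types of precise modification, all 12 types of single-base conversion (N to N, where N is one of the four bases A, T, C and G), multi-base conversions, and base insertions and deletions of various lengths were designed at four rice sites (OsCDC48-T1, OsCDC48-T2, OsALS-T2 and OsGAPDH). The results show that the PPE system can make every type of single-base conversion, with a maximum efficiency of 8.0%. It can also convert several bases simultaneously, with a maximum efficiency of 1.5% (Figure 18). The efficiency of base insertion and deletion decreased as the length of the required change increased: targeted insertion reached a maximum efficiency of 3.0% and a maximum length of 15 nt (Figure 19), and targeted deletion reached a maximum efficiency of 19.2% and a maximum length of 40 nt (Figure 20). The system can thus efficiently insert and delete small fragments, and can make multiple types of targeted modification at endogenous sites.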
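Example 6. Obtaining plants with targeted edits using the PPE system
To obtain plants with targeted edits, the OsCDC48-T1 and OsALS-T2 sites were selected, binary vectors of the PPE3 form were constructed (Figure 21), and rice was transformed by Agrobacterium infection. The following were detected: 12 rice seedlings with the expected precise deletion at OsCDC48-T1, an editing efficiency of 21.8% (12/55); 2 rice seedlings with the expected G-to-T single-base mutation at OsALS-T2, an editing efficiency of 14.3% (2/14); and 1 rice seedling with the expected precise simultaneous mutation of 3 base pairs at OsCDC48-T1, an editing efficiency of 2.6% (1/38) (Figure 22) (Table 2). A very small number of unintended by-products were also found in plants (Figure 23). These results show that the PPE system can be used to obtain mutant plants carrying targeted modifications.
Table 2. Rice plants with targeted edits obtained using the PPE system
[Table 2 is reproduced as an image in the original publication: Figure PCTCN2020117736-appb-000007]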
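The seedling-level editing efficiencies quoted in Example 6 are the number of edited seedlings divided by the number screened. A short Python check of that arithmetic, using only the counts given in the text (the function name and dictionary labels are illustrative):

```python
def efficiency_percent(edited, screened):
    """Editing efficiency as a percentage, rounded to one decimal place."""
    return round(100 * edited / screened, 1)


# (edited seedlings, screened seedlings) as reported in Example 6.
counts = {
    "OsCDC48-T1 precise deletion": (12, 55),
    "OsALS-T2 G-to-T": (2, 14),
    "OsCDC48-T1 3-bp substitution": (1, 38),
}

for label, (edited, screened) in counts.items():
    print(label, efficiency_percent(edited, screened))
```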
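Example 7. The Tm of the PBS affects the efficiency of the PPE system
First, published data were used to evaluate how PBS length, which determines the Tm, affects prime editing in plants (Figure 24). The results show that editing efficiency was highest when the PBS Tm was close to 30 °C (30 °C for OsCDC48-T1, 28 °C for OsCDC48-T2 and 30 °C for OsALS-T1).
The editing efficiency of PPEs using PBSs with a Tm of 18 °C to 52 °C (corresponding to PBS lengths of 6 nt to 17 nt) was then evaluated at four target sites in rice protoplasts (OsACC-T1, OsCDC48-T3, OsEPSPS-T1 and OsPDS-T1). The results are shown in Figure 24. pegRNAs were more active when the PBS Tm was close to 30 °C (24-30 °C for OsACC-T1, 26-34 °C for OsEPSPS-T1, 28-36 °C for OsCDC48-T3 and 30 °C for OsPDS-T1). At these PBS Tm values, editing efficiency was 1.5 to 4.3 times higher than at other PBS Tm values. Six further target sites were tested (see Figure 24), namely OsALS-T2, OsDEP1-T1, OsEPSPS-T2, OsAAT-T1, OsGAPDH-T1 and OsLDMAR-T1, and five of the six (OsEPSPS-T2 being the exception) behaved similarly.
The overall editing efficiencies of PPEs at different PBS Tm values were then normalized and compared across all 13 target sites. The results show that editing efficiency followed a normal distribution (P>0.1) (Figure 25), generally peaking at a PBS Tm of 30 °C, followed by 32 °C and 28 °C, and decreasing as the PBS Tm rose above or fell below this range.
It was therefore concluded that the melting temperature of the PBS sequence is closely related to PPE editing efficiency (Figure 25) and is likely to be a major factor in pegRNA design for plants. A PBS Tm of 30 °C is recommended as a guide for designing pegRNAs for PPEs.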
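Example 8. A dual-pegRNA strategy markedly increases the efficiency of the PPE system
To optimize prime editing, the inventors developed a dual-pegRNA strategy that uses different pegRNAs directed at the forward and reverse DNA strands (termed NGG-pegRNA and CCN-pegRNA, respectively), which simultaneously encode the same edit (Figure 26). Fifteen target sites were selected from ten rice genes, and a pair of pegRNAs was designed for each target (Table 3). The editing activities of NGG-pegRNA alone, CCN-pegRNA alone and the dual pegRNAs were then compared at each site.
Table 3. Target sequences, RT templates and PBS sequences of the dual pegRNAs
[Table 3 is reproduced as images in the original publication: Figures PCTCN2020117736-appb-000008 and PCTCN2020117736-appb-000009]
The PAM of each target site is shown in bold and the PBS is underlined.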
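As expected, the dual-pegRNA strategy had the highest activity at most target sites (13 of 15). The dual pegRNAs produced C-to-A, G-to-A, G-to-T, A-to-G, T-to-A, C-to-G and CT-to-AG point mutations, 1 bp (T) or 2 bp (AT) deletions, and a 1 bp (A) insertion, with a maximum editing efficiency of 24.5% (Figure 27).
Across all tested sites, the editing efficiency of the dual pegRNAs was on average about 4.2-fold higher than that of a single NGG-pegRNA (up to 27.9-fold, for OsNRT1.1B (A insertion)) and on average 1.8-fold higher than that of a single CCN-pegRNA (up to 7.2-fold, for OsALS (A to G)). Moreover, the proportion of by-products with dual pegRNAs was no higher than with a single pegRNA (Figure 28).
Computational analysis based on the rice reference genome (Os-Nipponbare reference IRGSP-1.0) shows that, with a prime editing window from +1 to +15, dual pegRNAs can in theory target 20.0% of genomic bases. When combined with the SpCas9-NG variant, which recognizes NG PAMs, the dual-pegRNA strategy can target 87.9% of rice bases (Figure 29).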
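The kind of coverage analysis described above can be sketched in Python on a toy sequence. This is only an illustration under assumptions that the text does not spell out: a base counts as targetable by dual pegRNAs when it falls within the +1 to +15 window of both an NGG pegRNA on the sense strand and a CCN pegRNA on the antisense strand; the nick lies 3 nt from the PAM; and a full 20-nt spacer must fit in the sequence. The function name and the toy sequence are hypothetical, and the sketch does not reproduce the 20.0% or 87.9% figures, which come from a genome-wide analysis.

```python
def dual_targetable_positions(seq, window=15, nick_offset=3, spacer_len=20):
    """Return 0-based sense-strand positions editable from both strands.

    Assumed definition: position +1 is the first base 3' of the nick on the
    nicked strand, and the editing window spans +1..+window.
    """
    seq = seq.upper()
    n = len(seq)
    fwd, rev = set(), set()
    for p in range(n - 2):
        # NGG PAM on the sense strand occupies p..p+2; the spacer lies 5' of it.
        if seq[p + 1:p + 3] == "GG" and p >= spacer_len:
            first = p - nick_offset  # position +1 for this pegRNA
            fwd.update(range(first, min(first + window, n)))
        # CCN on the sense strand is an NGG PAM read on the antisense strand.
        if seq[p:p + 2] == "CC" and p + 2 + spacer_len < n:
            last = p + 2 + nick_offset  # position +1, in sense-strand coordinates
            rev.update(range(max(last - window + 1, 0), last + 1))
    return fwd & rev


# Toy 45-nt sequence with one CCN PAM and one NGG PAM close together.
toy = "A" * 20 + "CC" + "AGG" + "A" * 20
shared = dual_targetable_positions(toy)
print(sorted(shared), len(shared) / len(toy))
```

Relaxing the PAM test from "GG"/"CC" to a single "G"/"C" would mimic an NG-PAM variant in the same framework, again as an assumption rather than the published method.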
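Sequence description
>SEQ ID NO:1 Wild-type SpCas9 amino acid sequence
>SEQ ID NO:2 nCas9(H840A) amino acid sequence
>SEQ ID NO:3 Wild-type M-MLV amino acid sequence
>SEQ ID NO:4 M-MLV amino acid sequence of the present invention
>SEQ ID NO:5 CaMV reverse transcriptase amino acid sequence
>SEQ ID NO:6 Retron reverse transcriptase amino acid sequence
>SEQ ID NO:7 32aa linker amino acid sequence
>SEQ ID NO:8 gRNA scaffold sequence
>SEQ ID NO:9 Gene sequence of the fusion protein nCas9(H840A)-M-MLV of the present invention
>SEQ ID NO:10 Gene sequence of the fusion protein nCas9(H840A)-CaMV of the present invention
>SEQ ID NO:11 Gene sequence of the fusion protein nCas9(H840A)-retron of the present invention
>SEQ ID NO:12 Amino acid sequence of the fusion protein nCas9(H840A)-M-MLV of the present invention
>SEQ ID NO:13 Amino acid sequence of the fusion protein nCas9(H840A)-CaMV of the present invention
>SEQ ID NO:14 Amino acid sequence of the fusion protein nCas9(H840A)-retron of the present invention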

Claims (19)

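  1. A plant genome editing system for targeted modification of a plant genome, comprising:
    i) a fusion protein and/or an expression construct containing a nucleotide sequence encoding the fusion protein, wherein the fusion protein comprises a CRISPR nickase and a reverse transcriptase; and/or
    ii) at least one pegRNA and/or an expression construct containing a nucleotide sequence encoding the at least one pegRNA,
    wherein the at least one pegRNA comprises, in the 5' to 3' direction, a guide sequence, a scaffold sequence, a reverse transcription (RT) template sequence and a primer binding site (PBS) sequence,
    wherein the at least one pegRNA is capable of forming a complex with the fusion protein and targeting the fusion protein to a target sequence in the genome, resulting in a nick within the target sequence.
  2. The system of claim 1, wherein the CRISPR nickase is a Cas9 nickase, for example comprising the amino acid sequence shown in SEQ ID NO:2.
  3. The system of claim 1 or 2, wherein the reverse transcriptase is an M-MLV reverse transcriptase, preferably an enhanced M-MLV reverse transcriptase having the amino acid sequence shown in SEQ ID NO:4, or the reverse transcriptase is the CaMV reverse transcriptase shown in SEQ ID NO:5 or the retron reverse transcriptase shown in SEQ ID NO:6.
  4. The system of any one of claims 1-3, wherein the guide sequence in the pegRNA is configured to have sufficient sequence identity with the target sequence so that it can bind to the complementary strand of the target sequence by base pairing, achieving sequence-specific targeting.
  5. The system of any one of claims 1-4, wherein the scaffold sequence of the pegRNA comprises the sequence set forth in SEQ ID NO:8.
  6. The system of any one of claims 1-5, wherein the primer binding sequence is configured to be complementary to at least a part of the target sequence; preferably, the primer binding sequence is complementary to at least a part of the 3' free single strand produced by the nick, in particular to the nucleotide sequence at the 3' end of the 3' free single strand.
  7. The system of any one of claims 1-6, wherein the Tm (melting temperature) of the primer binding sequence is about 18 °C to 52 °C, preferably about 24 °C to 36 °C, more preferably about 28 °C to 32 °C, most preferably about 30 °C.
  8. The system of any one of claims 1-7, wherein the RT template sequence is configured to correspond to the sequence downstream of the nick and to contain the desired modification, the modification including substitution, deletion and/or addition of one or more nucleotides.
  9. The system of any one of claims 1-8, further comprising a nicking gRNA and/or an expression construct containing a nucleotide sequence encoding the nicking gRNA, the nicking gRNA comprising a guide sequence and a scaffold sequence, the guide sequence being configured to have sufficient sequence identity with a target sequence in the genome so that it can target the fusion protein to that target sequence and result in a nick within it, wherein the target sequence of the nicking gRNA and the target sequence of the pegRNA are located on opposite strands of the genomic DNA, and the nick induced by the nicking gRNA and the nick induced by the pegRNA are separated by about 1 to about 300 nucleotides.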
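  10. The system of any one of claims 1-8, comprising at least one pair of pegRNAs and/or an expression construct containing a nucleotide sequence encoding the at least one pair of pegRNAs.
  11. The system of claim 10, wherein the two pegRNAs of the pair are configured to target different target sequences on the same strand of the genomic DNA, or the two pegRNAs of the pair are configured to target target sequences on different strands of the genomic DNA.
  12. The system of claim 10 or 11, wherein the PAM of the target sequence of one pegRNA of the pair is located on the sense strand and the PAM of the other pegRNA is located on the antisense strand.
  13. The system of any one of claims 10-12, wherein the nicks induced by the two pegRNAs are located on either side of the site to be modified.
  14. The system of claim 13, wherein the nick induced by the pegRNA directed at the sense strand is located upstream (5' direction) of the site to be modified, and the nick induced by the pegRNA directed at the antisense strand is located downstream (3' direction) of the site to be modified.
  15. The system of claim 14, wherein the nicks induced by the two pegRNAs are separated by about 1 to about 300 or more nucleotides, for example by 1-15 nucleotides.
  16. The system of any one of claims 10-15, wherein the two pegRNAs of the pair are configured to introduce the same desired modification.
  17. A method for producing a genetically modified plant, comprising introducing the genome editing system of any one of claims 1-16 into at least one said plant, thereby resulting in a modification in the genome of the at least one plant, the modification including, for example, substitution, deletion and/or addition of one or more nucleotides.
  18. The method of claim 17, wherein the introduction comprises transforming the genome editing system of any one of claims 1-16 into isolated plant cells or tissue and then regenerating the transformed plant cells or tissue into whole plants; or
    the introduction comprises transforming the genome editing system of any one of claims 1-16 into a specific part of a whole plant, such as a leaf, shoot apex, pollen tube, young ear or hypocotyl.
  19. The method of claim 18, further comprising culturing the plant cells, tissue or whole plant into which the genome editing system has been introduced at an elevated temperature, for example 37 °C.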
PCT/CN2020/117736 2019-11-01 2020-09-25 Method for targeted modification of plant genome sequence WO2021082830A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
BR112022008468A BR112022008468A2 (pt) 2019-11-01 2020-09-25 Método para modificação direcionada de uma sequência de genoma de planta
EP20882981.2A EP4053284A4 (en) 2019-11-01 2020-09-25 METHOD FOR TARGETED MODIFICATION OF PLANT GENOME SEQUENCE
US17/773,426 US20230075587A1 (en) 2019-11-01 2020-09-25 Method for targeted modification of sequence of plant genome
CN202080077133.2A CN114945671A (zh) 2019-11-01 2020-09-25 Method for targeted modification of plant genome sequence

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911062005.6 2019-11-01
CN201911062005 2019-11-01
CN202010036374.4 2020-01-14
CN202010036374 2020-01-14

Publications (1)

Publication Number Publication Date
WO2021082830A1 true WO2021082830A1 (zh) 2021-05-06

Family

ID=75715770

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117736 WO2021082830A1 (zh) 2019-11-01 2020-09-25 Method for targeted modification of plant genome sequence

Country Status (5)

Country Link
US (1) US20230075587A1 (zh)
EP (1) EP4053284A4 (zh)
CN (1) CN114945671A (zh)
BR (1) BR112022008468A2 (zh)
WO (1) WO2021082830A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022242660A1 (en) * 2021-05-17 2022-11-24 Wuhan University System and methods for insertion and editing of large nucleic acid fragments
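WO2023030534A1 (zh) * 2021-09-06 2023-03-09 苏州齐禾生科生物科技有限公司 Improved prime editing system
WO2023227050A1 (zh) * 2022-05-25 2023-11-30 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences Method for site-directed insertion of an exogenous sequence into a genome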

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11866728B2 (en) 2022-01-21 2024-01-09 Renagade Therapeutics Management Inc. Engineered retrons and methods of use

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018149418A1 (en) 2017-02-20 2018-08-23 Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences Genome editing system and method
CN111378051A (zh) * 2020-03-25 2020-07-07 Beijing Academy of Agriculture and Forestry Sciences PE-P2 prime editing system and its use in genome base editing

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014186686A2 (en) * 2013-05-17 2014-11-20 Two Blades Foundation Targeted mutagenesis and genome engineering in plants using rna-guided cas nucleases
WO2016080795A1 (ko) * 2014-11-19 2016-05-26 기초과학연구원 두 개의 벡터로부터 발현된 cas9 단백질을 이용한 유전자 발현 조절 방법
EP3942040A1 (en) * 2019-03-19 2022-01-26 The Broad Institute, Inc. Methods and compositions for editing nucleotide sequences
WO2021072328A1 (en) * 2019-10-10 2021-04-15 The Broad Institute, Inc. Methods and compositions for prime editing rna
CN115279898A (zh) * 2019-10-23 2022-11-01 成对植物服务股份有限公司 用于植物中rna模板化编辑的组合物和方法

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ANZALONE, AV ET AL.: "Search-and-replace genome editing without double-strand breaks or donor DNA", NATURE, vol. 576, no. 7785, 21 October 2019 (2019-10-21), XP036953141, ISSN: 0028-0836, DOI: 10.1038/s41586-019-1711-4 *
GAO ET AL., JIPB, vol. 56, April 2014 (2014-04-01), pages 343 - 349
LIN, QP ET AL.: "Prime genome editing in rice and wheat", NATURE BIOTECHNOLOGY, vol. 38, no. 5, 16 March 2020 (2020-03-16), XP037113496, ISSN: 1087-0156, DOI: 10.1038/s41587-020-0455-x *
NAKAMURA, Y ET AL.: "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", NUCL. ACIDS RES., vol. 28, 2000, pages 292, XP002941557, DOI: 10.1093/nar/28.1.292
SAMBROOK, J.FRITSCH, E.F.MANIATIS, T.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
See also references of EP4053284A4

Also Published As

Publication number Publication date
EP4053284A1 (en) 2022-09-07
CN114945671A (zh) 2022-08-26
BR112022008468A2 (pt) 2022-07-19
US20230075587A1 (en) 2023-03-09
EP4053284A4 (en) 2024-03-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20882981

Country of ref document: EP

Kind code of ref document: A1

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112022008468

Country of ref document: BR

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020882981

Country of ref document: EP

Effective date: 20220601

ENP Entry into the national phase

Ref document number: 112022008468

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20220502