CN118109441A

CN118109441A - TadA8e mutant proteins and their use in mediating single base C editing

Info

Publication number: CN118109441A
Application number: CN202410254482.7A
Authority: CN
Inventors: 周焕斌 (Zhou Huanbin); 于曼 (Yu Man); 任斌 (Ren Bin); 旷永洁 (Kuang Yongjie)
Original assignee: Institute of Plant Protection of Chinese Academy of Agricultural Sciences
Current assignee: Institute of Plant Protection of Chinese Academy of Agricultural Sciences
Priority date: 2024-03-06
Filing date: 2024-03-06
Publication date: 2024-05-31

Abstract

The invention relates to TadA8e mutant proteins and their use in mediating single-base C editing. The amino acid sequence of the TadA8e mutant protein is shown in SEQ ID No. 2 or SEQ ID No. 4.

Description

TadA8e mutant proteins and their use in mediating single base C editing

Technical Field

The invention relates to the field of functional proteins, in particular to TadA8e mutant proteins and their use in mediating single-base C editing.

Background

Gene editing technologies (ZFNs, TALENs and CRISPR/Cas9) have developed rapidly and play an increasingly important role in the biomedical and agricultural fields. These technologies use a nuclease to cleave the target site, generating a DNA double-strand break that is then repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR) to edit the genome. DNA base editing technologies developed from the CRISPR/Cas system can make site-specific substitutions of target DNA bases without DNA double-strand breaks or any donor template, enabling precise genome editing. They accelerate the genetic improvement of animals and plants and are of great significance for research on plant gene function and for molecular breeding.

Single nucleotide polymorphisms (SNPs) are the genetic basis for improving important agronomic traits in crops. CRISPR-based base editing systems can introduce single-nucleotide substitutions into a target gene in a sequence-specific manner, producing loss-of-function or gain-of-function mutations. This greatly accelerates the annotation and correction of gene functions and enables de novo domestication or directed evolution of endogenous crop genes. Base editors such as ABE (A-to-G base editing) enable the artificial evolution of agriculturally important genes in crops, generating new gene/allele resources and germplasm. Developing base editors that produce more types of base substitution can broaden their application and help create new germplasm resources.

Disclosure of Invention

A first aspect of the invention provides a TadA8e mutant protein whose amino acid sequence is shown in SEQ ID No. 2 or SEQ ID No. 4.

A second aspect of the invention provides a fusion protein whose amino acid sequence consists, from N-terminus to C-terminus, of the TadA8e mutant protein according to the first aspect fused in tandem to a first Cas protein; or, from N-terminus to C-terminus, of the TadA8e mutant protein according to the first aspect, a second Cas protein and a uracil-N-glycosylase fused in tandem.

In a specific embodiment, the amino acid sequences of the first Cas protein and the second Cas protein are independently as shown in SEQ ID No. 8.

In a specific embodiment, the uracil-N-glycosylase has an amino acid sequence as set forth in SEQ ID No. 12.

A third aspect of the invention provides a nucleic acid encoding the TadA8e mutant protein according to the first aspect; preferably, the nucleotide sequence of the nucleic acid is shown in SEQ ID No. 1 or SEQ ID No. 3.

A fourth aspect of the invention provides a nucleic acid encoding the fusion protein according to the second aspect.

In a specific embodiment, the base sequences encoding the first Cas protein and encoding the second Cas protein are independently as set forth in SEQ ID No. 7.

In a specific embodiment, the base sequence encoding the uracil-N-glycosylase is shown in SEQ ID No. 11.

A fifth aspect of the invention provides a gene editing system comprising regulatory element I and regulatory element II, wherein:

Regulatory element I is the nucleic acid according to the fourth aspect of the invention and encodes the fusion protein according to the second aspect of the invention;

Regulatory element II comprises a gRNA for gene editing and an sgRNA element located at the 3' end of the gRNA, linked to and transcriptionally fused with the gRNA, which directs the protein encoded by regulatory element I to the target site in the genome of the organism to be mutated.

In a specific embodiment, the sgRNA element is located on a pENTR4:sgRNA vector.

A sixth aspect of the invention provides the use of the TadA8e mutant protein according to the first aspect, the fusion protein according to the second aspect, the nucleic acid according to the fourth aspect, or the gene editing system according to the fifth aspect for single-base C editing.

In a specific embodiment, the TadA8e mutant protein having the amino acid sequence shown in SEQ ID No. 2 mediates single-base C-to-G editing.

In a specific embodiment, the TadA8e mutant protein having the amino acid sequence shown in SEQ ID No. 4 mediates single-base C-to-T editing.

In a specific embodiment, the use is for knocking out genes in the genome of a gramineous plant.

In a specific embodiment, the gramineous plant is rice.

Beneficial effects of the invention: the TadA8e mutant proteins enable Cas9 to perform single-base C editing in gramineous plants such as rice. Specifically, the TadA-CG mutant protein enables Cas9-mediated C-to-G editing, and the TadA-N46P mutant protein enables Cas9-mediated C-to-T editing.

Detailed Description

The above-described aspects of the invention are described in further detail below in the form of preferred embodiments, which are not to be construed as limiting the invention.

Reagents for use in the examples of the invention are commercially available unless otherwise specified.

Example 1

TadA8e protein site-directed mutagenesis

Starting from the rice codon-optimized TadA8e gene, nucleotides 76 to 81 (counted from the 5' end) were replaced with AAGGCC, nucleotides 133 to 135 with CCT, nucleotides 178 to 180 with ATT, and nucleotides 283 to 285 with AAT, yielding the TadA-CG mutant gene. The nucleotide sequence of the TadA-CG mutant gene is shown in SEQ ID No. 1, and the encoded amino acid sequence is shown in SEQ ID No. 2.

Starting from the rice codon-optimized TadA8e gene, nucleotides 133 to 135 (counted from the 5' end) were replaced with CCT, yielding the TadA-N46P mutant gene, in which the asparagine at position 45 from the N-terminus of the encoded protein is mutated to proline. The nucleotide sequence of the TadA-N46P mutant gene is shown in SEQ ID No. 3, and the encoded amino acid sequence is shown in SEQ ID No. 4.
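The positions in this example are nucleotide positions, so the affected codon numbers follow from codon = (nt - 1) // 3 + 1. The Python sketch below applies that arithmetic to the replacements listed above and translates each new codon with the standard genetic code. It assumes position 1 is the first nucleotide of the coding sequence. The function names are hypothetical, and the lookup table is deliberately limited to the codons used here. The sketch cannot report the original residues, which would require the SEQ ID sequences.

```python
# Map 1-based nucleotide positions (from the 5' end of the coding sequence)
# to codon numbers and translate the replacement codons.
# Only the codons used in this example are included in the table.
CODON_TABLE = {"AAG": "Lys", "GCC": "Ala", "CCT": "Pro", "ATT": "Ile", "AAT": "Asn"}


def codon_numbers(start: int, end: int) -> list[int]:
    """Return the 1-based codon numbers spanned by nucleotides start..end."""
    if (start - 1) % 3 != 0 or end % 3 != 0:
        raise ValueError("span must start and end on codon boundaries")
    return list(range((start - 1) // 3 + 1, end // 3 + 1))


def describe(start: int, end: int, replacement: str) -> list[tuple[int, str, str]]:
    """Pair each affected codon number with its new codon and amino acid."""
    codons = [replacement[i:i + 3] for i in range(0, len(replacement), 3)]
    numbers = codon_numbers(start, end)
    if len(codons) != len(numbers):
        raise ValueError("replacement length does not match the span")
    return [(n, c, CODON_TABLE[c]) for n, c in zip(numbers, codons)]


# Replacements listed for TadA-CG; TadA-N46P carries only the 133-135 change.
TADA_CG = [(76, 81, "AAGGCC"), (133, 135, "CCT"), (178, 180, "ATT"), (283, 285, "AAT")]

for start, end, new in TADA_CG:
    print(f"nt {start}-{end}:", describe(start, end, new))
```

Under the stated assumption, nucleotides 133 to 135 fall in codon 45. This agrees with the statement that the 45th residue of TadA-N46P is changed to proline.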

Example 1

pUbi-TadA-CG-SpCas9n-UNG vector construction and its use in knocking out rice endogenous genes

The TadA-CG mutant gene (base sequence shown in SEQ ID No. 1) was synthesized by Tsingke Biotechnology Co., Ltd. (Beijing) and cloned into the pUC57 plasmid vector to obtain the positive plasmid pUC57-TadA-CG.

Using the pUC57-TadA-CG plasmid as the template and Cas9-CG-F1 (SEQ ID No. 5)/Cas9-CG-R1 (SEQ ID No. 6) as primers, a PCR fragment of about 3.3 kb containing the pUC57 vector backbone was amplified.

The base sequence of the SpCas9n gene is shown as SEQ ID No.7, and the encoded amino acid sequence is shown as SEQ ID No. 8.

Using the pUbi-Cas9 plasmid (H. Zhou, B. Liu, D. P. Weeks, M. H. Spalding & B. Yang. Large chromosomal deletions and heritable small genetic changes induced by CRISPR/Cas9 in rice. Nucleic Acids Res. 2014, 42(17): 10903-10914) as the template and Cas9-F (SEQ ID No. 9)/Cas9-R (SEQ ID No. 10) as primers, an SpCas9n PCR fragment of about 4.1 kb was amplified.

The base sequence of uracil-N-glycosylase (UNG) gene is shown as SEQ ID No.11, and the encoded amino acid sequence is shown as SEQ ID No. 12.

The uracil-N-glycosylase gene was synthesized by Tsingke Biotechnology Co., Ltd. (Beijing) and cloned into the pUC57 plasmid vector to obtain the positive plasmid pUC57-UNG. Using the pUC57-UNG plasmid as the template and Cas9-UNG-F1 (SEQ ID No. 13)/Cas9-UNG-R1 (SEQ ID No. 14) as primers, a 966 bp PCR fragment was amplified.

The three fragments were joined by In-Fusion cloning to obtain pUC57-TadA-CG-SpCas9-UNG. The pUC57-TadA-CG-SpCas9-UNG vector and the pUbi-Cas9 vector (H. Zhou, B. Liu, D. P. Weeks, M. H. Spalding & B. Yang. Large chromosomal deletions and heritable small genetic changes induced by CRISPR/Cas9 in rice. Nucleic Acids Res. 2014, 42(17): 10903-10914) were then digested with Acc65I/SpeI, yielding 5694 bp and 12025 bp fragments, respectively. These were ligated with T4 DNA ligase (Vazyme Biotech Co., Ltd., Nanjing) to construct the pUbi-TadA-CG-SpCas9n-UNG vector.
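Once the full plasmid sequences are available, the expected fragment sizes of an Acc65I/SpeI double digest can be checked in silico. The sketch below is a minimal, hypothetical helper and is not part of the original method. It finds Acc65I (G^GTACC) and SpeI (A^CTAGT) sites in a circular sequence and returns the fragment lengths. It ignores methylation sensitivity and partial digestion. The plasmid sequences are not given here, so the usage line runs on a made-up 34 bp toy sequence.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# Acc65I: G^GTACC; SpeI: A^CTAGT. Both sites are palindromic,
# so scanning the top strand alone finds every site.
ENZYMES = {"Acc65I": ("GGTACC", 1), "SpeI": ("ACTAGT", 1)}


def cut_positions(seq: str, enzymes=ENZYMES) -> list[int]:
    """Return sorted top-strand cut positions in a circular sequence."""
    seq = seq.upper()
    n = len(seq)
    # Append the first few bases so sites spanning the origin are found.
    longest = max(len(site) for site, _ in enzymes.values())
    wrapped = seq + seq[: longest - 1]
    cuts = set()
    for site, offset in enzymes.values():
        # Zero-width lookahead so overlapping matches are also reported.
        for m in re.finditer(f"(?={site})", wrapped):
            if m.start() < n:
                cuts.add((m.start() + offset) % n)
    return sorted(cuts)


def circular_fragments(seq: str, enzymes=ENZYMES) -> list[int]:
    """Fragment lengths (bp), largest first, after complete digestion."""
    cuts = cut_positions(seq, enzymes)
    n = len(seq)
    if not cuts:
        return [n]  # uncut: the plasmid stays circular
    # A single cut gives (c - c) % n == 0, i.e. one linear fragment of length n.
    sizes = [(cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n for i in range(len(cuts))]
    return sorted(sizes, reverse=True)


toy = "AAAA" + "GGTACC" + "T" * 10 + "ACTAGT" + "C" * 8  # hypothetical 34 bp plasmid
print(circular_fragments(toy))  # [18, 16]
```

With the real sequences, the results would be compared against the 5694 bp and 12025 bp fragments reported above.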

Several endogenous rice genes were knocked out with pUbi-TadA-CG-SpCas9n-UNG. This example uses C editing in OsCOI2 and OsJAR2.

The transcript and genomic sequences of OsCOI2 and OsJAR2 were obtained from the MSU/TIGR rice genome database (http://rice.plantbiology.msu.edu/).

For the OsCOI2 gene, primers containing the gRNA sequence (SEQ ID No. 15; PAM: cgg) with ends compatible with the BtgZI-cut vector were designed: gOsCOI2-F1 (SEQ ID No. 16, carrying the BtgZI sticky end tgtt at its 5' end) and gOsCOI2-R1 (SEQ ID No. 17).

For the OsJAR2 gene, primers containing the gRNA sequence (SEQ ID No. 18; PAM: cgg) with ends compatible with the BtgZI-cut vector were designed: gOsJAR2-F1 (SEQ ID No. 19, carrying the BtgZI sticky end tgtt at its 5' end) and gOsJAR2-R1 (SEQ ID No. 20).
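The primer design above can be expressed as a small helper. The sketch assumes a 5'-to-3' protospacer and checks that the stated PAM fits the SpCas9 NGG rule (cgg here, agg for OsALS1 later in this document). It prepends the tgtt BtgZI sticky end to the forward oligo and builds the reverse oligo as the reverse complement of the spacer, preceded by a vector-specific overhang. This document does not state the reverse overhang, so it is a required argument. The value "aaac" and the spacer in the usage lines are hypothetical placeholders, not the sequences of SEQ ID No. 15 to 20. The function names are also hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_ngg_pam(pam: str) -> bool:
    """True if a 3-nt PAM matches the SpCas9 NGG rule."""
    return len(pam) == 3 and pam[1:].upper() == "GG"


def guide_oligos(spacer: str, pam: str, rev_overhang: str,
                 fwd_overhang: str = "tgtt") -> tuple[str, str]:
    """Return (forward, reverse) oligos for a BtgZI-cut sgRNA vector.

    forward = fwd_overhang + spacer
    reverse = rev_overhang + reverse complement of spacer
    The PAM itself is not included in either oligo.
    """
    if not has_ngg_pam(pam):
        raise ValueError(f"PAM {pam!r} does not match NGG")
    spacer = spacer.lower()
    return fwd_overhang + spacer, rev_overhang + revcomp(spacer)


# Hypothetical spacer and reverse overhang, for illustration only.
fwd, rev = guide_oligos("gattacagattacagattac", "cgg", rev_overhang="aaac")
print(fwd)
print(rev)
```

When the two oligos are annealed, each strand leaves its own 5' overhang. This is why the reverse oligo is simply the overhang followed by the spacer's reverse complement.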

After synthesis, primers gOsCOI2-F1/gOsCOI2-R1 were phosphorylated with T4 polynucleotide kinase and annealed to form a duplex, which was cloned into the BtgZI site of the pENTR4:sgRNA vector (CN202111388739.0). Sequencing confirmed the correct insert, giving pENTR4:sgRNA-gOsCOI2; the forward primer sequence corresponds to the gRNA sequence inserted at the cleavage site of pENTR4:sgRNA. pENTR4:sgRNA-gOsCOI2 was linearized with ApaI, and pUbi-TadA-CG-SpCas9n-UNG was recombined with pENTR4:sgRNA-gOsCOI2 using the ClonExpress II One Step Cloning Kit (Vazyme Biotech Co., Ltd., Nanjing) to give pUbi-TadA-CG-SpCas9n-UNG-gOsCOI2. By the same procedure, gOsJAR2-F1/gOsJAR2-R1 were phosphorylated, annealed and cloned into the BtgZI site of pENTR4:sgRNA, and sequencing confirmed the correct insert, giving pENTR4:sgRNA-gOsJAR2. pENTR4:sgRNA-gOsJAR2 was linearized with ApaI, and pUbi-TadA-CG-SpCas9n-UNG and pENTR4:sgRNA-gOsJAR2 were recombined using the ClonExpress II One Step Cloning Kit (Vazyme Biotech Co., Ltd., Nanjing) to give pUbi-TadA-CG-SpCas9n-UNG-gOsJAR2.

pUbi-TadA-CG-SpCas9n-UNG-gOsCOI2 and pUbi-TadA-CG-SpCas9n-UNG-gOsJAR2 were each transformed into the japonica rice variety Kitaake by Agrobacterium-mediated transformation. Genomic DNA was then extracted from the regenerated T0 rice plants by the CTAB method.

For pUbi-TadA-CG-SpCas9n-UNG-gOsCOI2, identification primers specific to the OsCOI2 target site were designed: HT-gOsCOI2-F1 (SEQ ID No. 21) and HT-gOsCOI2-R1 (SEQ ID No. 22). PCR was performed using genomic DNA from the corresponding regenerated T0 plants as the template, and the PCR products were sequenced. Sequence analysis of OsCOI2 detected C-to-G mutations in 35.4% of the 48 regenerated rice seedlings. Editing converted C-13 to G; positions are counted from the PAM, with the first position next to the PAM numbered -1, so C-13 is the C at position 13.

For pUbi-TadA-CG-SpCas9n-UNG-gOsJAR2, identification primers specific to the OsJAR2 target site were designed: HT-gOsJAR2-F1 (SEQ ID No. 23) and HT-gOsJAR2-R1 (SEQ ID No. 24). PCR was performed using genomic DNA from the corresponding regenerated T0 plants as the template, and the PCR products were sequenced. Sequence analysis of OsJAR2 detected C-to-G mutations in 50% of the 38 rice seedlings. Editing converted C-12 and C-15 to G; positions are counted from the PAM, with the first position next to the PAM numbered -1, so C-12 is the C at position 12.
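The position labels used above count backward from the PAM. Assume the base immediately 5' of the PAM is -1 and the protospacer is 20 nt long. Under those assumptions, the labels convert to the common 1-to-20 protospacer numbering (1 = PAM-distal end), as sketched below. The function names are hypothetical. C-12, C-13 and C-15 then correspond to protospacer positions 9, 8 and 6.

```python
def to_protospacer_position(pam_relative: int, spacer_len: int = 20) -> int:
    """Convert PAM-relative numbering (-1 = base next to the PAM) to 1..spacer_len.

    Position 1 is the PAM-distal (5') end of the protospacer.
    """
    if not -spacer_len <= pam_relative <= -1:
        raise ValueError("position lies outside the protospacer")
    return spacer_len + 1 + pam_relative


def c_positions(protospacer: str) -> list[int]:
    """PAM-relative positions of every C in a 5'->3' protospacer."""
    n = len(protospacer)
    # The last base (index n - 1) sits next to the PAM and becomes -1.
    return [i - n for i, base in enumerate(protospacer.upper()) if base == "C"]


for label in (-12, -13, -15):
    print(f"C{label} -> protospacer position {to_protospacer_position(label)}")
```

`c_positions` lists every candidate C in a target, labelled in the same PAM-relative style as the text.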

Example 2

pUbi-TadA-N46P-SpCas9n-UNG vector construction and its use in knocking out rice endogenous genes

The TadA-N46P mutant gene (base sequence shown in SEQ ID No. 3) was synthesized by Tsingke Biotechnology Co., Ltd. (Beijing) and cloned into the pUC57 plasmid vector to obtain the positive plasmid pUC57-TadA-N46P.

Using the pUC57-TadA-N46P plasmid as the template and Cas9-CG-F1 (SEQ ID No. 5)/Cas9-N46P-R1 (SEQ ID No. 25) as primers, a PCR fragment of about 3.3 kb containing the pUC57 vector backbone was amplified.

Using the pUbi-Cas9 plasmid as the template and Cas9-F (SEQ ID No. 9)/Cas9-R (SEQ ID No. 10) as primers, an SpCas9n PCR fragment of about 4.1 kb was amplified.

Using the pUC57-UNG plasmid as the template and Cas9-UNG-F1 (SEQ ID No. 13)/Cas9-UNG-R1 (SEQ ID No. 14) as primers, a PCR fragment of about 950 bp was amplified.

The three fragments were joined by In-Fusion cloning to obtain pUC57-TadA-N46P-SpCas9-UNG. The pUC57-TadA-N46P-SpCas9-UNG vector and the pUbi-Cas9 vector were then digested with Acc65I/SpeI, yielding 5694 bp and 12025 bp fragments, respectively. These were ligated with T4 DNA ligase (Vazyme Biotech Co., Ltd., Nanjing) to construct the pUbi-TadA-N46P-SpCas9n-UNG vector.

Several endogenous rice genes were knocked out with pUbi-TadA-N46P-SpCas9n-UNG. This example uses C editing in OsCOI2 and OsALS1.

The transcript and genomic sequences of OsCOI2 and OsALS1 were obtained from the MSU/TIGR rice genome database (http://rice.plantbiology.msu.edu/).

For the OsALS1 gene, primers containing the gRNA sequence (SEQ ID No. 26; PAM: agg) with ends compatible with the BtgZI-cut vector were designed: gOsALS1-F1 (SEQ ID No. 27, carrying the BtgZI sticky end tgtt at its 5' end) and gOsALS1-R1 (SEQ ID No. 28).

After synthesis, primers gOsCOI2-F1/gOsCOI2-R1 were phosphorylated with T4 polynucleotide kinase and annealed to form a duplex, which was cloned into the BtgZI site of the pENTR4:sgRNA vector (CN202111388739.0). Sequencing confirmed the correct insert, giving pENTR4:sgRNA-gOsCOI2; the forward primer sequence corresponds to the gRNA sequence inserted at the cleavage site of pENTR4:sgRNA. pENTR4:sgRNA-gOsCOI2 was linearized with ApaI, and pUbi-TadA-N46P-SpCas9n-UNG was recombined with pENTR4:sgRNA-gOsCOI2 using the ClonExpress II One Step Cloning Kit (Vazyme Biotech Co., Ltd., Nanjing) to give pUbi-TadA-N46P-SpCas9n-UNG-gOsCOI2. By the same procedure, gOsALS1-F1/gOsALS1-R1 were phosphorylated, annealed and cloned into the BtgZI site of pENTR4:sgRNA, and sequencing confirmed the correct insert, giving pENTR4:sgRNA-gOsALS1. pENTR4:sgRNA-gOsALS1 was linearized with ApaI, and pUbi-TadA-N46P-SpCas9n-UNG and pENTR4:sgRNA-gOsALS1 were recombined using the ClonExpress II One Step Cloning Kit (Vazyme Biotech Co., Ltd., Nanjing) to give pUbi-TadA-N46P-SpCas9n-UNG-gOsALS1.

pUbi-TadA-N46P-SpCas9n-UNG-gOsCOI2 and pUbi-TadA-N46P-SpCas9n-UNG-gOsALS1 were each transformed into the japonica rice variety Kitaake by Agrobacterium-mediated transformation. Genomic DNA was then extracted from the regenerated T0 rice plants by the CTAB method.

For pUbi-TadA-N46P-SpCas9n-UNG-gOsCOI2, identification primers specific to the OsCOI2 target site were designed: HT-gOsCOI2-F1 (SEQ ID No. 21) and HT-gOsCOI2-R1 (SEQ ID No. 22). PCR was performed using genomic DNA from the corresponding regenerated T0 plants as the template, and the PCR products were sequenced. Sequence analysis of OsCOI2 detected C-to-T mutations in 27.1% and C-to-G mutations in 6.3% of the 48 regenerated rice seedlings. This indicates that the pUbi-TadA-N46P-SpCas9n-UNG plasmid mainly produces C-to-T mutations. Editing converted the Cs at positions C-12 to C-14 to T.
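Distinguishing C-to-T from C-to-G outcomes in each plant amounts to comparing the sequenced target region with the reference. The sketch below is a hypothetical helper, not the authors' analysis pipeline. It assumes the two sequences are already aligned without indels, cover the same protospacer 5' to 3', and end at the PAM-adjacent base (-1). Resolving mixed sequencing traces is outside its scope.

```python
from collections import Counter


def substitutions(reference: str, read: str) -> list[tuple[int, str, str]]:
    """(PAM-relative position, reference base, read base) for each mismatch.

    Both sequences must be the same protospacer, 5'->3', aligned without
    indels and ending at the base next to the PAM (position -1).
    """
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned and of equal length")
    n = len(reference)
    pairs = zip(reference.upper(), read.upper())
    return [(i - n, r, q) for i, (r, q) in enumerate(pairs) if r != q]


def classify(reference: str, read: str) -> Counter:
    """Tally substitution types such as 'C>T' and 'C>G' in one sequence."""
    return Counter(f"{r}>{q}" for _, r, q in substitutions(reference, read))


# Hypothetical 5 nt example: C>T at -4 and C>G at -3.
print(substitutions("ACCGT", "ATGGT"))
print(classify("ACCGT", "ATGGT"))
```

A plant would count toward the C-to-T or C-to-G fraction if its tally contains that substitution type.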

For pUbi-TadA-N46P-SpCas9n-UNG-gOsALS1, identification primers specific to the OsALS1 target site were designed: HT-gOsALS1-F1 (SEQ ID No. 29) and HT-gOsALS1-R1 (SEQ ID No. 30). PCR was performed using genomic DNA from the corresponding regenerated T0 plants as the template, and the PCR products were sequenced. Sequence analysis of OsALS1 detected C-to-T mutations in 83% and C-to-G mutations in 4.2% of the 48 regenerated rice seedlings. This again shows that the pUbi-TadA-N46P-SpCas9n-UNG plasmid mainly produces C-to-T mutations. Editing converted the Cs at positions C-13 to C-17 to T.
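Each reported percentage corresponds to a whole number of plants if it is read as (edited plants / plants examined) x 100. The sketch below back-calculates the implied counts under that assumption. It uses a half-unit tolerance at the reported precision rather than Python's round(), because round() rounds half to even and would turn 3/48 = 6.25% into 6.2. Under this reading, the figures correspond to 17/48, 19/38, 13/48, 3/48, 40/48 and 2/48 plants. The helper name is hypothetical.

```python
def implied_counts(percent: float, total: int, decimals: int) -> list[int]:
    """Integer counts k (0..total) whose share 100*k/total matches `percent`.

    A share matches if it lies within half a unit of the last reported
    decimal place; 1e-9 absorbs floating-point error at the boundary.
    """
    half_unit = 0.5 * 10 ** -decimals
    return [k for k in range(total + 1)
            if abs(100 * k / total - percent) <= half_unit + 1e-9]


# (label, reported %, plants examined, decimals in the reported %)
REPORTED = [
    ("OsCOI2 C>G, TadA-CG", 35.4, 48, 1),
    ("OsJAR2 C>G, TadA-CG", 50, 38, 0),
    ("OsCOI2 C>T, TadA-N46P", 27.1, 48, 1),
    ("OsCOI2 C>G, TadA-N46P", 6.3, 48, 1),
    ("OsALS1 C>T, TadA-N46P", 83, 48, 0),
    ("OsALS1 C>G, TadA-N46P", 4.2, 48, 1),
]

for label, pct, n, d in REPORTED:
    print(label, implied_counts(pct, n, d))
```

Each reported value matches exactly one count. This supports, but does not prove, the reading of the percentages as fractions of plants.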

Claims

1. A TadA8e mutant protein whose amino acid sequence is shown in SEQ ID No. 2 or SEQ ID No. 4.

2. A fusion protein whose amino acid sequence consists, from N-terminus to C-terminus, of the TadA8e mutant protein of claim 1 fused in tandem to a first Cas protein; or

whose amino acid sequence consists, from N-terminus to C-terminus, of the TadA8e mutant protein of claim 1, a second Cas protein and a uracil-N-glycosylase fused in tandem.

3. The fusion protein of claim 2, wherein the amino acid sequences of the first Cas protein and the second Cas protein are independently set forth in SEQ ID No. 8.

4. The fusion protein of claim 2, wherein the uracil-N-glycosylase has an amino acid sequence as set forth in SEQ ID No. 12.

5. A nucleic acid encoding the TadA8e mutant protein of claim 1; preferably, the nucleotide sequence of the nucleic acid is shown in SEQ ID No. 1 or SEQ ID No. 3.

6. A nucleic acid encoding the fusion protein of any one of claims 2 to 4;

preferably, the base sequences encoding the first Cas protein and the second Cas protein are each independently shown in SEQ ID No. 7;

preferably, the base sequence encoding the uracil-N-glycosylase is shown in SEQ ID No. 11.

7. A gene editing system comprising regulatory element I and regulatory element II, wherein:

regulatory element I is the nucleic acid of claim 6 and encodes the fusion protein of any one of claims 2 to 4;

regulatory element II comprises a gRNA for gene editing and an sgRNA element located at the 3' end of the gRNA, linked to and transcriptionally fused with the gRNA, which directs the protein encoded by regulatory element I to the target site in the genome of the organism to be mutated; preferably, the sgRNA element is located on a pENTR4:sgRNA vector.

8. Use of the TadA8e mutant protein of claim 1, the fusion protein of any one of claims 2 to 4, the nucleic acid of claim 5 or 6, or the gene editing system of claim 7 for single-base C editing.

9. The use according to claim 8, wherein the TadA8e mutant protein having the amino acid sequence shown in SEQ ID No. 2 mediates single-base C-to-G editing;

and the TadA8e mutant protein having the amino acid sequence shown in SEQ ID No. 4 mediates single-base C-to-T editing.

10. The use according to claim 8, wherein the use is for knocking out genes in the genome of a gramineous plant;

Preferably, the gramineous plant is rice.