CN113774082A

CN113774082A - Method for expressing nucleic acid

Info

Publication number: CN113774082A
Application number: CN202010442805.7A
Authority: CN
Inventors: 谢洪涛; 李羽; 张洋扬; 刘帅
Original assignee: Shandong Shunfeng Biotechnology Co Ltd
Current assignee: Shandong Shunfeng Biotechnology Co Ltd
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2021-12-10
Also published as: CN113994007A; CN113994007B; WO2021233442A1

Abstract

The invention provides a nucleic acid expression method, and in particular a nucleic acid construct driven by a specific promoter, which enables efficient gRNA-guided site-directed base mutagenesis in plants.

Description

Method for expressing nucleic acid

Technical Field

The invention relates to the technical field of biology, and in particular to a nucleic acid expression method.

Background

At present, single-base editing efficiency in dicotyledonous plants is low. Most dicots, such as soybean, cannot yet be edited at all, and in plants such as Arabidopsis thaliana and tomato the efficiency of single-base editing remains low, which seriously limits the application of biotechnology breeding in agricultural production. Increasing the efficiency of single-base editing in dicotyledonous plants therefore has great commercial value in agricultural production.

There is thus an urgent need in the art to develop a method for improving single-base editing efficiency in plants.

Disclosure of Invention

The purpose of the present invention is to provide a method for improving the efficiency of single base editing in plants.

In a first aspect, the invention provides a nucleic acid construct having the structure of formula I, from 5' to 3':

P1-S1-L1-S2-S3 (I);

In formula (I):

P1, S1, L1, S2 and S3 are the elements constituting the construct;

P1 is a first promoter sequence, the first promoter comprising the promoter of an elongation factor;

S1 and S2 are each independently one or more of (a) a coding sequence for a gene-editing enzyme and (b) a coding sequence for an adenine deaminase and/or a cytosine deaminase;

L1 is null or the coding sequence of a linker peptide;

S3 is null or the coding sequence of UGI (uracil DNA glycosylase inhibitor);

and each "-" is independently a bond or a nucleotide linker sequence.

In another preferred embodiment, S1 is the coding sequence of an adenine deaminase and/or a cytosine deaminase, and S2 is the coding sequence of a gene-editing enzyme.

In another preferred embodiment, when S1 is the coding sequence of an adenine deaminase, S3 is null.

In another preferred embodiment, when S1 is the coding sequence of a cytosine deaminase, S3 is the coding sequence of the uracil DNA glycosylase inhibitor UGI.
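As an illustration only, the element order of formula I and the S1/S3 pairing rules above can be captured in a small data model. This is a minimal sketch under stated assumptions: the class name, the deaminase labels and the element strings (such as "EF1a_promoter" or "linker") are hypothetical placeholders for real sequences, not names used by the invention.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the two deaminase classes named in the text.
ADENINE_DEAMINASE = "adenine_deaminase"
CYTOSINE_DEAMINASE = "cytosine_deaminase"


@dataclass
class FormulaIConstruct:
    """Element labels of formula I, read 5'->3': P1-S1-L1-S2-S3.

    None marks a null element (L1 and S3 may be null)."""
    p1: str            # first promoter, e.g. an EF1a promoter (label only)
    s1: str            # deaminase coding sequence (label only)
    s1_kind: str       # ADENINE_DEAMINASE or CYTOSINE_DEAMINASE
    l1: Optional[str]  # linker-peptide coding sequence, or None
    s2: str            # gene-editing enzyme coding sequence (label only)
    s3: Optional[str]  # UGI coding sequence, or None

    def validate(self) -> None:
        """Apply the preferred pairing rules: ABE -> S3 null, CBE -> S3 is UGI."""
        if self.s1_kind == ADENINE_DEAMINASE and self.s3 is not None:
            raise ValueError("S3 should be null when S1 encodes an adenine deaminase")
        if self.s1_kind == CYTOSINE_DEAMINASE and self.s3 != "UGI":
            raise ValueError("S3 should encode UGI when S1 encodes a cytosine deaminase")

    def layout(self) -> str:
        """Join the non-null elements in 5'->3' order with '-'."""
        parts = [self.p1, self.s1, self.l1, self.s2, self.s3]
        return "-".join(p for p in parts if p is not None)
```

For example, an ABE-type construct with a linker and no UGI lays out as `EF1a_promoter-TadA-linker-nCas9`, while a CBE-type construct without a linker lays out as `EF1a_promoter-APOBEC1-nCas9-UGI`.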

In another preferred embodiment, the elongation factor comprises a eukaryotic elongation factor or a prokaryotic elongation factor.

In another preferred embodiment, the eukaryotic elongation factor comprises EF1α, EF1β and EF2.

In another preferred embodiment, the prokaryotic elongation factor comprises EF-Tu, EF-Ts and EF-G. Preferably, the elongation factor is EF1α, more preferably a plant EF1α.

In another preferred embodiment, the plant is selected from the group consisting of: corn, rice, soybean, arabidopsis, tobacco, tomato, or combinations thereof.

In another preferred embodiment, the first promoter is derived from one or more plants selected from the group consisting of: corn, rice, soybean, arabidopsis, tobacco, tomato.

In another preferred embodiment, the first promoter is the promoter of tomato EF1α.

In another preferred embodiment, the sequence of the first promoter is shown in SEQ ID No. 1.

In another preferred embodiment, the nucleotide sequence of each L1 is independently 3-120 nt long, preferably 3-96 nt, and its length is preferably a multiple of 3.

In another preferred embodiment, the amino acid sequence encoded by each L1 is independently 3-40 aa long, preferably 6-32 aa, more preferably 18-32 aa, and most preferably 24-32 aa.
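The length preferences for L1 can be checked mechanically, since a linker of n nt that is a multiple of 3 encodes n/3 aa. The following is a minimal sketch that reports the nucleotide range, the reading-frame condition and the narrowest preferred amino-acid range separately; the function name and dictionary keys are hypothetical. Note that 3 nt corresponds to only 1 aa, so a 3-6 nt linker falls inside the nucleotide range but below the 3-aa lower bound; the sketch reports both checks rather than reconciling them.

```python
from typing import Optional, Tuple

# Preferred ranges quoted from the text (nt for the L1 DNA, aa for the encoded peptide).
NT_RANGE: Tuple[int, int] = (3, 120)
AA_TIERS = [(3, 40), (6, 32), (18, 32), (24, 32)]  # ordered broad -> narrow, each nested


def linker_report(linker_nt: str) -> dict:
    """Summarize an L1 linker coding sequence against the stated length preferences."""
    n = len(linker_nt)
    in_frame = n % 3 == 0  # a multiple of 3 keeps S2 in the same reading frame as S1
    aa: Optional[int] = n // 3 if in_frame else None
    matching = [t for t in AA_TIERS if aa is not None and t[0] <= aa <= t[1]]
    return {
        "nt": n,
        "within_nt_range": NT_RANGE[0] <= n <= NT_RANGE[1],
        "in_frame": in_frame,
        "aa": aa,
        # tiers are nested, so the last match is the narrowest preferred range
        "narrowest_aa_tier": matching[-1] if matching else None,
    }
```

A 96-nt linker, for instance, encodes 32 aa and falls in the narrowest preferred tier (24-32 aa), whereas a 10-nt linker would shift the reading frame of S2.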

In another preferred embodiment, the nucleotide linker sequence is 1-300 nt long, preferably 1-100 nt.

In another preferred embodiment, the nucleotide linker sequence does not affect the normal transcription and translation of the elements.

In another preferred embodiment, the gene editing enzyme is an enzyme of an editing tool selected from the group consisting of: a CRISPR enzyme, TALEN enzyme, ZFN enzyme, or a combination thereof.

In another preferred embodiment, the gene-editing enzyme is derived from a microorganism, preferably a bacterium.

In another preferred embodiment, the gene-editing enzyme is derived from a source selected from the group consisting of: Streptococcus pyogenes, Staphylococcus aureus, Streptococcus canis, or combinations thereof.

In another preferred embodiment, the gene-editing enzyme has double-stranded or single-stranded DNA cleavage activity, or no cleavage activity.

In another preferred example, the gene editing enzyme is a CRISPR enzyme having single-stranded DNA cleaving activity.

In another preferred embodiment, the gene-editing enzyme comprises a wild-type or mutant-type gene-editing enzyme.

In another preferred embodiment, the mutant gene-editing enzyme has at least 80%, preferably at least 90%, more preferably at least 95%, and still more preferably at least 98% or 99% identity to the wild-type gene-editing enzyme.

In another preferred embodiment, the mutant gene-editing enzyme is derived from the wild-type gene-editing enzyme by substitution or deletion of one or more (preferably 1-15, 1-10 or 1-7, more preferably 2-5) amino acids, and/or by addition of 1-5 (preferably 1-4, more preferably 1-3, most preferably 1-2) amino acids.
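Given an existing pairwise alignment of a wild-type and a mutant enzyme, the identity percentage and the numbers of substitutions, deletions and additions referred to above can be counted column by column. The sketch below assumes the two sequences are already aligned to equal length with "-" for gaps (producing the alignment itself is out of scope), and it uses one common identity convention (identical columns divided by non-empty columns); other conventions exist. The function name is hypothetical, and the 10-residue toy sequences in the example are illustrative only, with the last-residue change mimicking a D-to-A substitution.

```python
def compare_aligned(wild_type: str, mutant: str, gap: str = "-") -> dict:
    """Compare two pre-aligned protein sequences (equal length, gaps as '-').

    Percent identity = identical columns / aligned columns, ignoring columns that
    are gaps in both sequences (one common convention among several)."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = substitutions = deletions = additions = 0
    for w, m in zip(wild_type.upper(), mutant.upper()):
        if w == gap and m == gap:
            continue                 # empty column
        if w == m:
            matches += 1
        elif w == gap:
            additions += 1           # residue present only in the mutant
        elif m == gap:
            deletions += 1           # wild-type residue missing from the mutant
        else:
            substitutions += 1
    columns = matches + substitutions + deletions + additions
    return {
        "substitutions": substitutions,
        "deletions": deletions,
        "additions": additions,
        "identity_pct": 100.0 * matches / columns if columns else 0.0,
    }
```

For example, `compare_aligned("MDKKYSIGLD", "MDKKYSIGLA")` reports one substitution and 90% identity; the result can then be compared against the preferred thresholds stated above.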

In another preferred embodiment, the gene-editing enzyme is selected from the group consisting of: Cas9, Cas12, Cas13, Cms1, MAD7, or a combination thereof.

In another preferred embodiment, the gene-editing enzyme is selected from the group consisting of: nCas9, dCas9, nCas9NG, nCas9X, nCas12, nCas13, or a combination thereof.

In another preferred embodiment, the amino acid sequence of the gene-editing enzyme is as shown in SEQ ID No. 2.

In another preferred embodiment, the coding sequence for the gene-editing enzyme is selected from the group consisting of:

(i) a polynucleotide having a sequence as set forth in SEQ ID No. 3;

(ii) a polynucleotide having at least 75% (preferably at least 85%, more preferably at least 90%, 95%, 98% or 99%) homology to the sequence shown in SEQ ID No. 3;

(iii) a polynucleotide in which 1-60 (preferably 1-30, more preferably 1-10) nucleotides are truncated or added at the 5' end and/or the 3' end of the polynucleotide shown in SEQ ID No. 3;

(iv) a polynucleotide complementary to any one of the polynucleotides of (i) to (iii).
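Categories (i), (iii) and (iv) above can be tested directly by string comparison against a reference coding sequence, while category (ii) requires a sequence alignment and is not covered here. This is a minimal sketch under simplifying assumptions: it treats the 1-60 nt limit as a total over both ends, it does not handle a candidate that is trimmed at one end and extended at the other, and it reads "complementary" as the reverse complement. The function names are hypothetical, and the short reference sequence in the example is a placeholder, not SEQ ID No. 3.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (5'->3' in, 5'->3' out)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_end_variant(ref: str, cand: str, max_nt: int = 60) -> bool:
    """True if cand is ref with 1..max_nt nt in total removed from, or added to,
    the 5' and/or 3' end. Mixed cases (trim one end, extend the other) are not handled."""
    diff = abs(len(ref) - len(cand))
    if not 1 <= diff <= max_nt:
        return False
    return cand in ref if len(cand) < len(ref) else ref in cand


def classify(ref: str, cand: str) -> Optional[str]:
    """Assign cand to category (i), (iii) or (iv) relative to ref; (ii) needs an alignment."""
    ref, cand = ref.upper(), cand.upper()
    if cand == ref:
        return "(i)"
    if is_end_variant(ref, cand):
        return "(iii)"
    rc = reverse_complement(cand)
    if rc == ref or is_end_variant(ref, rc):
        return "(iv)"
    return None
```

With a placeholder reference `ref`, `classify(ref, ref[3:])` returns `"(iii)"` and `classify(ref, reverse_complement(ref))` returns `"(iv)"`.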

In another preferred embodiment, the coding sequence of the gene-editing enzyme is shown in SEQ ID No. 3.

In another preferred embodiment, the adenine deaminase comprises wild-type and mutant forms.

In another preferred embodiment, the adenine deaminase comprises wild-type and/or mutant TadA.

In another preferred embodiment, the adenine deaminase comprises TadA.

In another preferred embodiment, the mutant form of adenine deaminase comprises TadA7.10.

In another preferred embodiment, the adenine deaminase is a fusion protein of TadA and TadA7.10.

In another preferred embodiment, the coding sequence for adenine deaminase is selected from the group consisting of:

(i) a polynucleotide having the sequence shown in SEQ ID No. 5 or 19;

(ii) a polynucleotide having at least 75% (preferably at least 85%, more preferably at least 90%, 95%, 98% or 99%) nucleotide sequence homology to the sequence shown in SEQ ID No. 5 or 19;

(iii) a polynucleotide in which 1-60 (preferably 1-30, more preferably 1-10) nucleotides are truncated or added at the 5' end and/or the 3' end of the polynucleotide shown in SEQ ID No. 5 or 19.

In another preferred embodiment, the coding sequence of the adenine deaminase is as shown in SEQ ID No. 5 or 19.

In another preferred embodiment, the amino acid sequence of the adenine deaminase is as shown in SEQ ID No. 4.

In another preferred embodiment, the cytosine deaminase comprises wild-type and mutant forms.

In another preferred embodiment, the cytosine deaminase comprises APOBEC.

In another preferred embodiment, the APOBEC is selected from the group consisting of: APOBEC1 (A1), APOBEC2 (A2), APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3H, APOBEC4 (A4), activation-induced deaminase (AID), or a combination thereof.

In another preferred embodiment, the mutant form of cytosine deaminase comprises CBE2.0, CBE2.1, CBE2.2, CBE2.3 and CBE2.4.

In another preferred embodiment, the amino acid sequence of the cytosine deaminase is as shown in any one of SEQ ID No. 6, 8-11.

In another preferred embodiment, the nucleic acid construct is further operably linked to one or more localization signal sequences.

In another preferred embodiment, the localization signal is selected from the group consisting of: a nuclear localization signal, a chloroplast localization signal, a mitochondrial localization signal, or a combination thereof.

In another preferred embodiment, said localization signal comprises a nuclear localization signal, preferably comprising 1-2 nuclear localization signals.

In another preferred embodiment, the nuclear localization signal comprises bpNLS and SV40.

In another preferred embodiment, the nucleotide sequence of the nuclear localization signal is as shown in any one of SEQ ID NO. 12-14.

In another preferred embodiment, the amino acid sequence of the nuclear localization signal is shown in SEQ ID No. 15.

In another preferred embodiment, the nucleotide sequence of the S3 element is shown in SEQ ID NO. 16.

In another preferred embodiment, the nucleic acid constructs are further operably linked to one or more second nucleic acid constructs of formula II:

P2-Y1 (II)

In formula (II):

P2 is a second promoter sequence;

Y1 is the coding sequence of a gRNA;

and each "-" is independently a bond or a nucleotide linker sequence.

In another preferred embodiment, when at least two nucleic acid constructs of formula II are present, the gRNA sequences may differ from each other.

In another preferred embodiment, the nucleic acid construct of formula II is located at the 5' end or the 3' end of the nucleic acid construct of formula I, or such constructs are distributed at both ends.

In another preferred embodiment, the gRNA includes crRNA, tracrRNA, sgRNA.

In another preferred embodiment, the second promoter is derived from one or more plants selected from the group consisting of: rice, maize, soybean, arabidopsis, tobacco or tomato.

In another preferred embodiment, the second promoter comprises an RNA polymerase III dependent promoter.

In another preferred embodiment, the second promoter is an RNA polymerase III dependent promoter.

In another preferred embodiment, the second promoter is selected from the group consisting of: U6, U3, U6a, U6b, U6c, U6-1, U3b, U3d, U6-26, U6-29, H1, or combinations thereof.

In another preferred embodiment, the second promoter comprises the U6 promoter.

In another preferred embodiment, the nucleotide elements of the present invention described above are linked in frame so as to express a fusion protein with the correct amino acid sequence.
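In-frame linkage can be sanity-checked before cloning: each coding element must be a whole number of codons, and a stop codon may appear only as the final codon of the fused sequence. The following is a minimal sketch of such a check on the forward strand using the three standard stop codons; the function name and the short example sequences are hypothetical and do not correspond to the invention's actual elements.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_in_frame_fusion(parts: List[str]) -> List[str]:
    """Check coding elements joined 5'->3' for an in-frame fusion.

    Each element must be a whole number of codons, and a stop codon may only
    appear as the final codon of the fused sequence."""
    problems = []
    for idx, seq in enumerate(parts):
        if len(seq) % 3:
            problems.append(f"element {idx} is {len(seq)} nt, not a multiple of 3")
    fused = "".join(parts).upper()
    codons = [fused[i:i + 3] for i in range(0, len(fused) - len(fused) % 3, 3)]
    for pos, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {pos}")
    return problems
```

An empty result means the elements are codon-complete and the fused reading frame runs through to the final codon; any message points to a frameshift or a premature stop that would truncate the fusion protein.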

In another preferred embodiment, the nucleic acid construct of formula I and the nucleic acid construct of formula II further each independently have a terminator.

In another preferred embodiment, the nucleic acid construct of formula I and the nucleic acid construct of formula II share the same terminator.

In another preferred embodiment, the terminator comprises a terminator suitable for plant gene editing.

In another preferred embodiment, the terminator is selected from the group consisting of: NOS, Poly A, T-UBQ, rbcS, or a combination thereof.

In another preferred embodiment, the construct has a structure of formula IIIa or formula IIIb:

P1-S1-L1-S2-S3-P2-Y1 (IIIa);

P2-Y1-P1-S1-L1-S2-S3 (IIIb);

In formulas (IIIa) and (IIIb), each element is as defined above.
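The two layouts differ only in whether the formula II cassette(s) sit 3' (IIIa) or 5' (IIIb) of the formula I cassette. A minimal sketch that produces the element order, assuming one or more [promoter, gRNA] cassettes placed in the order given; the function name and labels such as "gRNA1" are hypothetical, and the case where cassettes are distributed at both ends is not modelled.

```python
from typing import List, Sequence

FORMULA_I = ["P1", "S1", "L1", "S2", "S3"]


def assemble(grna_cassettes: Sequence[Sequence[str]], layout: str) -> str:
    """Return the 5'->3' element order for formula IIIa or IIIb.

    grna_cassettes: one [promoter, gRNA] pair per formula II cassette; several
    cassettes (each with its own gRNA) are placed in the order given."""
    grna: List[str] = [el for cassette in grna_cassettes for el in cassette]
    if layout == "IIIa":          # formula II cassette(s) 3' of formula I
        order = FORMULA_I + grna
    elif layout == "IIIb":        # formula II cassette(s) 5' of formula I
        order = grna + FORMULA_I
    else:
        raise ValueError("layout must be 'IIIa' or 'IIIb'")
    return "-".join(order)
```

With a single `["P2", "Y1"]` cassette this reproduces the two formulas; with two cassettes such as `["U6", "gRNA1"]` and `["U3", "gRNA2"]`, each carrying a different gRNA, IIIa becomes `P1-S1-L1-S2-S3-U6-gRNA1-U3-gRNA2`.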

In another preferred embodiment, the nucleic acid construct is further operably linked to a first integration element (I1) and a second integration element (I2).

In another preferred embodiment, the first integration element comprises a 5' homology arm sequence.

In another preferred embodiment, the second integration element comprises a 3' homology arm sequence.

In another preferred embodiment, one or more additional expression cassettes are inserted between the I1 and I2 elements.

In another preferred embodiment, the additional expression cassette is separate from the expression cassette comprising the nucleic acid construct of formula I and the expression cassette comprising the nucleic acid construct of formula II.

In another preferred embodiment, the additional expression cassette expresses a marker gene.

In another preferred embodiment, the marker gene comprises a resistance gene (e.g., a hygromycin resistance gene, a herbicide resistance gene), a fluorescent gene, or a combination thereof.

In a second aspect, the invention provides a vector comprising a nucleic acid construct according to the first aspect of the invention.

In another preferred embodiment, the vector is a plant expression vector.

In another preferred embodiment, the vector is an expression vector that can transfect or transform a plant cell.

In another preferred embodiment, the vector is an Agrobacterium Ti vector.

In another preferred embodiment, said construct is integrated into the T-DNA region of said vector.

In another preferred embodiment, the vector is circular or linear.

In a third aspect, the invention provides a host cell comprising a nucleic acid construct according to the first aspect of the invention, or having integrated into its genome one or more nucleic acid constructs according to the first aspect of the invention.

In another preferred embodiment, the cell is a plant cell.

In another preferred embodiment, the plant is selected from the group consisting of: a monocot, a dicot, a gymnosperm, or a combination thereof.

In another preferred embodiment, the plant is selected from the group consisting of: a graminaceous plant, a leguminous plant, a cruciferous plant, a solanaceae plant, an Umbelliferae plant, or a combination thereof.

In another preferred embodiment, the plant is selected from the group consisting of: arabidopsis, wheat, barley, oats, maize, rice, sorghum, millet, soybean, peanut, tobacco, tomato, cabbage, canola, spinach, lettuce, cucumber, garland chrysanthemum, water spinach, celery, lettuce, or combinations thereof.

In another preferred embodiment, the host cell is obtained by introducing the nucleic acid construct of claim 1 into a cell by a method selected from the group consisting of: Agrobacterium transformation, particle gun bombardment, microinjection, electroporation, ultrasound, and polyethylene glycol (PEG)-mediated transformation.

In a fourth aspect, the present invention provides a reagent combination comprising:

(i) a first nucleic acid construct, or a first vector comprising said first nucleic acid construct, said first nucleic acid construct having the structure of formula I, from 5' to 3':

P1-S1-L1-S2-S3 (I)

wherein,

L1 is null or the coding sequence of a linker peptide;

and each "-" is a bond or a nucleotide linker sequence;

(ii) a second nucleic acid construct, or a second vector comprising the second nucleic acid construct, the second nucleic acid construct having the structure of formula (II), from 5' to 3':

P2-Y1 (II);

wherein P2 is a second promoter;

Y1 is the coding sequence of a gRNA;

and each "-" is a bond or a nucleotide linker sequence.

In another preferred embodiment, the first vector and the second vector are different vectors.

In another preferred embodiment, the first nucleic acid construct and the second nucleic acid construct are located on different vectors.

In another preferred embodiment, the first vector and the second vector are the same vector.

In another preferred embodiment, the first nucleic acid construct and the second nucleic acid construct are located on the same vector.

According to a fifth aspect of the invention there is provided a kit comprising a combination of reagents according to the fourth aspect of the invention.

In another preferred embodiment, the kit further comprises a label or instructions.

The sixth aspect of the present invention provides a method for gene editing in a plant, comprising the steps of:

(i) providing a plant to be edited; and

(ii) introducing a nucleic acid construct according to the first aspect of the invention, a vector according to the second aspect of the invention or a reagent combination according to the fourth aspect of the invention into a plant cell of said plant to be edited, thereby effecting gene editing in said plant cell.

In another preferred embodiment, the introduction is by Agrobacterium.

In another preferred embodiment, the introduction is by gene gun.

In another preferred embodiment, the gene editing is site-directed base substitution (or mutation).

In another preferred embodiment, the site-directed substitution (or mutation) comprises a mutation of A to G.

In another preferred embodiment, the site-directed substitution (or mutation) comprises a mutation of C to T.

In another preferred embodiment, the plant includes any higher plant type that can be subjected to transformation techniques, including monocots, dicots and gymnosperms.

In another preferred embodiment, the plant is a dicotyledonous plant.

The seventh aspect of the present invention provides a method for preparing a gene-edited plant cell, comprising the steps of:

transfecting a plant cell with a nucleic acid construct according to the first aspect of the invention, a vector according to the second aspect of the invention or a reagent combination according to the fourth aspect of the invention, such that a site-directed substitution (or mutation) occurs in a chromosome of the plant cell, thereby producing the gene-edited plant cell.

In another preferred embodiment, the transfection is carried out by Agrobacterium transformation or gene gun bombardment.

In an eighth aspect, the invention provides the use of a nucleic acid construct according to the first aspect of the invention, a vector according to the second aspect, a host cell according to the third aspect, a reagent combination according to the fourth aspect, or a kit according to the fifth aspect of the invention for gene editing in a plant.

The ninth aspect of the present invention provides a method for preparing a gene-edited plant, comprising the steps of:

regenerating the gene-edited plant cell produced by the method of the seventh aspect of the present invention into a plant body, thereby obtaining the gene-edited plant.

In a tenth aspect, the present invention provides a gene-edited plant prepared by the method of the ninth aspect.

It is to be understood that, within the scope of the present invention, the technical features described above and those specifically described below (e.g., in the examples) may be combined with each other to form new or preferred embodiments. For reasons of space, these combinations are not described individually herein.

Drawings

FIG. 1 shows the structure of the ABE single-base editor containing the SlEF1α promoter.

FIG. 2 shows the efficiency of different promoters in single base editing in tomato.

FIG. 3 shows single base editing efficiency in soybean using different promoters and different base editors.

Detailed Description

After extensive and intensive research, the present inventors unexpectedly found, for the first time, a highly efficient EF promoter (e.g., the tomato EF promoter). When built into the ABE and CBE single-base editing systems to drive expression of a fusion protein composed of (a) a gene-editing enzyme and (b) an adenine deaminase and/or a cytosine deaminase, this promoter significantly improves editing efficiency in plants. On this basis, the present inventors completed the present invention.

Terms

As used herein, the term "homology arm" refers to the flanking sequences on either side of the foreign sequence to be inserted in a targeting vector; these flanking sequences are identical to the genomic sequence and define the region where recombination occurs.

As used herein, the term "plant promoter" refers to a nucleic acid sequence capable of initiating transcription of a nucleic acid in a plant cell. The plant promoter may be derived from plants, microorganisms (such as bacteria, viruses) or animals, or may be a promoter artificially synthesized or modified.

As used herein, the terms "gene editing", "base mutation" and "base editing" refer to a base substitution, insertion and/or deletion at a position in a nucleotide sequence. The "editing" or "mutation" in the present invention is preferably a single-base mutation.

As used herein, the term "base substitution" refers to the mutation of a base at a position in a nucleotide sequence to a different base, such as a mutation of A to G.

As used herein, the term "A·T to G·C" refers to a mutation or substitution of an A-T base pair to a G-C base pair at a position in a double-stranded nucleic acid sequence (particularly a genomic sequence).

As used herein, the term "C·G to T·A" refers to a mutation or substitution of a C-G base pair to a T-A base pair at a position in a double-stranded nucleic acid sequence (particularly a genomic sequence).
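To make the two conversion types concrete, the sketch below applies them to a protospacer written 5' to 3' on the non-target strand: an ABE converts A to G (A·T to G·C after repair and replication) and a CBE converts C to T (C·G to T·A). The editing window is an assumption not stated in this document: positions 4-8, counting the PAM-distal 5' base as position 1, are used here only as an illustrative default, and real windows vary by editor. The function name and the example protospacer are hypothetical, and every target base in the window is converted, which overstates real-world outcomes.

```python
from typing import Tuple

# Assumed editing window: protospacer positions 4-8, counting the PAM-distal
# 5' base as position 1. Real windows vary by editor and are not given in the text.
DEFAULT_WINDOW: Tuple[int, int] = (4, 8)
CONVERSIONS = {"ABE": ("A", "G"),   # A:T -> G:C
               "CBE": ("C", "T")}   # C:G -> T:A


def simulate_base_edit(protospacer: str, editor: str,
                       window: Tuple[int, int] = DEFAULT_WINDOW) -> str:
    """Convert every target base inside the window on the protospacer strand."""
    target, product = CONVERSIONS[editor]
    start, end = window
    edited = [
        product if base == target and start <= pos <= end else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    ]
    return "".join(edited)
```

For the example protospacer `GACAAGAAGTACAGCATCGG`, the ABE setting converts the four A bases at positions 4, 5, 7 and 8, while the CBE setting leaves it unchanged because none of its C bases falls inside the assumed window.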

As used herein, the term "gene-editing enzyme" refers to a nuclease suitable for editing tools such as CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats), TALEN (Transcription Activator-Like Effector Nuclease) and ZFN (Zinc-Finger Nuclease). Preferably, the gene-editing enzyme is a CRISPR enzyme, also known as a Cas protein, including but not limited to: Cas9, Cas12, Cas13, Cas14, Csm1 and FDK1 proteins. "Cas protein" refers to a protein family whose members can differ in structure depending on their source, such as SpCas9 derived from Streptococcus pyogenes and SaCas9 derived from Staphylococcus aureus; the family can be further subdivided according to structural features (e.g., domains), for example the Cas12 family includes Cas12a (also known as Cpf1), Cas12b, Cas12c, Cas12i, and others. A Cas protein may have double-stranded, single-stranded or no cleavage activity. The Cas protein can be wild-type or a mutant thereof; mutations include amino acid substitutions and deletions, and a mutant may or may not have altered cleavage activity. Preferably, the Cas protein of the present invention is a mutant of a wild-type Cas protein that has only single-strand cleavage activity or no cleavage activity. Preferably, the Cas protein of the present invention is Cas9, Cas12, Cas13 or Cas14 having single-strand cleavage activity. In a preferred embodiment, the Cas9 proteins of the present invention include SpCas9n (D10A), nSpCas9NG, SaCas9n, ScCas9n and xCas9n, where "n" denotes nickase, i.e. a Cas protein having only single-strand cleavage activity. Mutating a known Cas protein to obtain a Cas protein with single-strand or no cleavage activity is a routine technique in the art.

As is known to those skilled in the art, many Cas proteins with nucleic acid cleavage activity reported in the prior art, whether native proteins or modified variants thereof, can achieve the functions of the present invention and are incorporated herein by reference.

As used herein, the term "coding sequence for a Cas protein" refers to a nucleotide sequence encoding a Cas protein. Where the inserted polynucleotide sequence is transcribed and translated to produce a functional Cas protein, the skilled artisan will recognize that, because of codon degeneracy, a large number of polynucleotide sequences can encode the same polypeptide. The skilled artisan will also recognize that different species have particular codon preferences, so the codons of a Cas protein may be optimized as desired for expression in different species; such variants are specifically encompassed by the term "coding sequence for a Cas protein". Furthermore, the term specifically includes full-length sequences substantially identical to the Cas gene sequence, as well as sequences encoding proteins that retain Cas protein function.
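The effect of codon degeneracy can be quantified with the standard genetic code: the number of distinct coding sequences for a peptide is the product of the number of codons for each residue. A minimal sketch, assuming the standard nuclear code (NCBI translation table 1) and ignoring the stop codon; the compact 64-letter string lists amino acids for codons enumerated in T, C, A, G order at each position. Function names are hypothetical.

```python
from collections import Counter
from math import prod

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODONS_PER_AA = Counter(CODON_TABLE.values())  # e.g. 'L' -> 6, 'M' -> 1


def translate(cds: str) -> str:
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences (stop codon excluded) that encode peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)
```

For example, `GCTGCC` and `GCAGCG` both translate to `AA`, and the four-residue peptide `MDKK` already has 8 possible coding sequences; for a protein the size of Cas9 the number is astronomically large, which is what makes species-specific codon optimization possible.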

As used herein, "gRNA", also referred to as guide RNA, has the meaning commonly understood by those skilled in the art. In general, a guide RNA may comprise, or consist essentially of, a direct repeat and a guide sequence. Depending on the Cas protein used in a given CRISPR system, a gRNA may consist of crRNA and tracrRNA or of crRNA alone. The crRNA and tracrRNA may be engineered and fused to form a single guide RNA (sgRNA). The gRNA of the invention can be natural, or artificially modified or designed and synthesized. In certain instances, the guide sequence is any polynucleotide sequence sufficiently complementary to the target sequence to hybridize with it and direct specific binding of the CRISPR/Cas complex to the target sequence; it is typically 17-23 nt long. In certain embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned, is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the optimal alignment is within the ability of one of ordinary skill in the art; published and commercially available alignment algorithms and programs include, but are not limited to, ClustalW, the Smith-Waterman algorithm in MATLAB, Bowtie, Geneious, Biopython and SeqMan.
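A common first step in choosing a guide sequence is to scan a target region for protospacers adjacent to a PAM. The sketch below assumes the SpCas9 convention of a 20-nt protospacer immediately followed by an NGG PAM on the same strand; the 20-nt default sits within the 17-23 nt range given above, but other Cas proteins and variants (for example NG-PAM variants) use different PAMs and are not covered. It scans one strand only, and its complementarity measure assumes an ungapped alignment. Function names are hypothetical.

```python
from typing import List, Tuple


def find_ngg_targets(seq: str, guide_len: int = 20) -> List[Tuple[int, str, str]]:
    """Scan one strand 5'->3' for protospacers of guide_len nt followed by an NGG PAM.

    Returns (0-based start, protospacer, PAM). NGG is the SpCas9 PAM; other Cas
    proteins use different PAMs and are not covered."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - guide_len - 2):
        pam = seq[i + guide_len:i + guide_len + 3]
        if pam[1:] == "GG":
            hits.append((i, seq[i:i + guide_len], pam))
    return hits


def percent_match(guide: str, protospacer: str) -> float:
    """Percent of positions where the guide (RNA, U read as T) equals the protospacer.

    Because the guide has the protospacer's sequence, this equals its percent
    complementarity to the target strand for an ungapped alignment."""
    g = guide.upper().replace("U", "T")
    p = protospacer.upper()
    if len(g) != len(p):
        raise ValueError("guide and protospacer must have equal length")
    return 100.0 * sum(a == b for a, b in zip(g, p)) / len(g)
```

Scanning the reverse strand as well (for example on its reverse complement) would be needed to find all candidate sites in a real region; a single mismatch in a 20-nt guide gives 95% complementarity.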

As used herein, the term "plant" includes whole plants, plant organs (e.g., leaves, stems, roots), seeds, plant cells and their progeny. The type of plant that can be used in the method of the invention is not particularly limited and generally includes any plant type amenable to gene-editing techniques, including monocotyledons, dicotyledons, gymnosperms and angiosperms, including woody plants.

The term "expression cassette" as used herein refers to a polynucleotide sequence comprising the sequence of the gene to be expressed and the elements required for its expression. The components required for expression include a promoter and a polyadenylation signal sequence. In addition, the expression cassettes of the invention optionally contain other sequences, including (but not limited to) enhancers and secretory signal peptide sequences.

In the present invention, nucleotide sequences are written in the 5' to 3' direction unless otherwise noted.

As used herein, the "uracil DNA glycosylase inhibitor (UGI)" inhibits intracellular uracil DNA glycosylase, preventing it from excising the deamination-generated U and thereby from initiating repair that reverts the U back to C.

EF promoter

The EF promoter is the promoter of an elongation factor (EF) gene; elongation factors are protein factors that promote extension of the polypeptide chain during mRNA translation. Eukaryotic elongation factors include EF1α, EF1β and EF2; prokaryotic elongation factors include EF-Tu, EF-Ts and EF-G. EF1α (eukaryotic elongation factor 1α) is an important component of protein biosynthesis: EF1A catalyzes the GTP-dependent binding of aminoacyl-tRNA to the ribosomal A site. Accounting for 3-10% of total soluble protein, EF1A is considered one of the most abundant soluble proteins in the cytoplasm.

In a preferred embodiment, the EF promoter includes, but is not limited to, the EF1α, EF1β, EF2, EF-Tu, EF-Ts and EF-G promoters.

In a preferred embodiment, the promoter of the present invention is the EF1α promoter element derived from a plant of the Solanaceae family (preferably tomato or a similar plant).

A typical promoter of the present invention has the sequence shown in SEQ ID No. 1.

It is understood that the term also includes promoters from other solanaceous plants that are homologous to the promoter shown in SEQ ID No. 1. The term further includes derivative promoters or active fragments of the promoter shown in SEQ ID No. 1 or of its homologous promoters, provided that they retain the ability to confer efficient gene editing, for example by retaining at least 50% of the promoter function of SEQ ID No. 1 (measured as the amount of foreign-gene expression it can drive).

As used herein, the term "solanaceous plant" includes tomato, potato, eggplant, pepper, wolfberry (goji) and tobacco.

As used herein, the term "promoter" or "promoter region" refers to a nucleic acid sequence that accurately and efficiently initiates transcription of a gene, directing transcription of the gene's nucleic acid sequence into mRNA. It is usually located upstream (5') of the coding sequence of the gene of interest and generally provides recognition sites for RNA polymerase and the other factors required for proper initiation of transcription.

Herein, the promoter or promoter region (domain) includes a variant of the promoter, which can be obtained by inserting or deleting a regulatory region, performing random or site-directed mutagenesis, or the like.

The present invention also includes nucleic acids having at least 50% (preferably at least 60%, 70% or 80%, more preferably at least 90%, still more preferably at least 95%, and most preferably at least 98%, e.g., 99%) homology to the preferred promoter sequence of the present invention (SEQ ID No. 1) that likewise specifically increase gene-editing efficiency in plants. "Homology" refers to the level of similarity (i.e., sequence similarity or identity) between two or more nucleic acids, expressed as percent positional identity.

It is understood that, although the examples of the present invention provide the EF1α promoter from a solanaceous plant (tomato), promoters derived from other similar plants (particularly plants of the same family as tomato) that share some homology (conservation) with the promoter of the present invention are also included within the scope of the invention, provided that one skilled in the art can readily isolate them from other plants after reading this application, based on the information provided herein.

As used herein, "exogenous" or "heterologous" refers to the relationship between two or more nucleic acid or protein sequences of different origin. For example, a promoter is foreign to a gene of interest if the combination of the promoter and the sequence of the gene of interest is not normally found in nature. A particular sequence is "foreign" to the cell or organism into which it is inserted.

As used herein, "cis-regulatory element" refers to a conserved base sequence that acts to regulate the transcription initiation and transcription efficiency of a gene.

The promoter of the present invention may be operably linked to a foreign gene that is exogenous (heterologous) to the promoter. The foreign gene (also referred to as the target gene) is not particularly limited. It may be a gene encoding a protein with a specific function, such as (a) a gene-editing enzyme and (b) an adenine deaminase and/or a cytosine deaminase.

Representative examples of such exogenous genes include, but are not limited to:
- resistance genes
- selection marker genes
- epitope tags
- reporter gene sequences
- nuclear localization signal sequences
- transcription activation domains (e.g., VP64)
- transcription repression domains (e.g., KRAB domains or SID domains)
- nuclease domains (e.g., FokI)
- viral capsid protein genes
- antibody genes
- domains having an activity selected from the group consisting of nucleotide deaminase activity, methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, and nucleic acid binding activity.

The resistance gene is selected from the group consisting of:
- a herbicide-resistance gene
- an antiviral gene
- a cold-tolerance gene
- a heat-tolerance gene
- a drought-tolerance gene
- a waterlogging-tolerance gene
- an insect-resistance gene

The selection marker gene is selected from the group consisting of:
- the gus (β-glucuronidase) gene
- the hyg (hygromycin resistance) gene
- the neo (neomycin resistance) gene
- the gfp (green fluorescent protein) gene

The invention also provides a gene expression cassette comprising, from 5' to 3', a promoter, a gene ORF sequence, and a terminator. Preferably, the promoter sequence is as shown in SEQ ID No. 1 or has at least 90% homology with the sequence shown in SEQ ID No. 1. The homology is preferably at least 95%, and more preferably at least 98%.

The invention also provides a recombinant vector comprising the promoter and/or the gene expression cassette of the invention. In a preferred embodiment, the recombinant vector contains a multiple cloning site or at least one restriction site downstream of the promoter. When a target gene is to be expressed, it is ligated into a suitable multiple cloning site or restriction site so that it is operably linked to the promoter. In another preferred mode, the recombinant vector comprises, in the 5' to 3' direction, a promoter, a gene of interest, and a terminator. If desired, the recombinant vector may further comprise an element selected from the group consisting of:
- a 3' polyadenylation signal
- an untranslated nucleic acid sequence
- transport and targeting nucleic acid sequences
- resistance selection markers (dihydrofolate reductase, neomycin resistance, hygromycin resistance, green fluorescent protein, etc.)
- an enhancer
- an operator

One of ordinary skill in the art can use well-known methods to construct expression vectors containing the promoter and/or gene sequences of interest described herein. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like.

The promoter, expression cassette, or vector of the present invention may be used to transform an appropriate host cell so that the host expresses the protein. The host cell may be a prokaryotic cell, such as Escherichia coli, Streptomyces, or Agrobacterium; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a plant cell. One of ordinary skill in the art will know how to select an appropriate vector and host cell.

Transformation of a host cell with recombinant DNA can be carried out by conventional techniques well known to those skilled in the art:
- When the host is a prokaryote (e.g., Escherichia coli), CaCl₂ treatment or electroporation can be used.
- When the host is a eukaryote, DNA transfection methods such as calcium phosphate co-precipitation or conventional mechanical methods (e.g., microinjection, electroporation, or liposome packaging) can be used.
- Plants can be transformed by Agrobacterium-mediated or biolistic transformation, for example by the leaf disc method, immature embryo transformation, or the floral dip method.

The transformed plant cells, tissues, or organs can then be regenerated into plants by conventional methods to obtain transgenic plants.

As a preferred mode of the present invention, a method for producing a transgenic plant is: a vector carrying a promoter and a target gene (which are operably linked) is transferred into Agrobacterium, and the Agrobacterium then integrates a vector fragment containing the promoter and the target gene into the plant chromosome. The transgenic recipient plant can be Arabidopsis thaliana, wheat, barley, oat, corn, rice, sorghum, millet, soybean, peanut, tobacco, tomato, cabbage, rape, spinach, lettuce, cucumber, garland chrysanthemum, swamp cabbage, celery, and leaf lettuce. In the embodiment of the invention, the recombinant vector is a pCAMBIA1300 vector, and the promoter of the invention is constructed into the vector to transform plants.

In a preferred embodiment, the invention clones an EF promoter (such as the tomato SlEF1a promoter). This promoter is used to drive expression of the coding sequence of a Cas enzyme-deaminase fusion protein, yielding a system for efficient single-base substitution and gene knockout in dicotyledonous plants.

Adenine deaminase

As used herein, the term "adenine deaminase" refers to an enzyme that catalyzes the hydrolytic deamination of adenine to hypoxanthine and ammonia. Adenine (A) is converted to hypoxanthine (I), which pairs with cytosine. At the DNA level, hypoxanthine is therefore read and copied as guanine (G), converting an A·T base pair into a G·C base pair. TadA adenine deaminase is derived from Escherichia coli; the TadA used for base editing is an artificially engineered mutant of ecTadA. Dimers of TadA and ecTadA are currently the commonly used adenine deaminases.

In the present invention, suitable TadA includes the wild-type form, its specific mutant form TadA7-10, or a combination of the wild-type and mutant forms. TadA7-10 can deaminate adenine using DNA as a substrate.

In the present invention, the adenine deaminase coding sequence in the nucleic acid construct can be codon-optimized according to the codon preference of the host.

Cytosine deaminase

As used herein, the term "cytosine deaminase (APOBEC)" refers to an enzyme that catalyzes the deamination of cytosine (C) to uracil (U). During DNA replication, DNA polymerase reads uracil as thymine (T), so a C·G base pair is converted into a T·A base pair. Eleven members of the APOBEC family have been identified:
- APOBEC1 (A1)
- APOBEC2 (A2)
- APOBEC3A-H (3A, 3B, 3C, 3D, 3E, 3F, 3H)
- APOBEC4 (A4)
- activation-induced cytidine deaminase (AID)

In the present invention, suitable cytosine deaminases include the wild-type form, specific mutant forms (e.g., CBE2.0, CBE2.1, CBE2.2, CBE2.3, CBE2.4), and combinations of wild-type and mutant forms. Mutant forms of cytosine deaminase can deaminate cytosine using DNA as a substrate.

In the present invention, the cytosine deaminase coding sequence in the nucleic acid construct can be codon-optimized according to the codon preference of the host.

In a preferred embodiment of the invention, the cytosine deaminase is CBE2.0, CBE2.1, CBE2.2, CBE2.3, or CBE2.4.

The amino acid sequence of CBE2.0 is shown in SEQ ID NO. 6, and the nucleotide sequence is shown in SEQ ID NO. 7.

The amino acid sequence of CBE2.1 is shown in SEQ ID NO. 8.

The amino acid sequence of CBE2.2 is shown in SEQ ID NO. 9.

The amino acid sequence of CBE2.3 is shown in SEQ ID NO. 10.

The amino acid sequence of CBE2.4 is shown in SEQ ID NO. 11.

Construct of the invention

The invention provides a nucleic acid construct for gene editing in plants. The nucleic acid construct has the following 5' to 3' structure of formula I:

P1-S1-L1-S2-S3 (I);

In formula (I):

- P1, S1, L1, S2 and S3 are the elements constituting the construct, as defined in the first aspect of the invention;
- each "-" is a bond or a nucleotide linker sequence.
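Formula I fixes the 5' to 3' order of the elements. Each "-" can be a direct bond or a connecting nucleotide sequence. The minimal sketch below assembles such a construct from element sequences and rejects any other order. The element sequences used in the test are short hypothetical placeholders, not the real P1 to S3 sequences, and the `Element` type and function name are illustrative only.

```python
from dataclasses import dataclass

# Required 5'->3' order of elements in formula I.
ORDER = ("P1", "S1", "L1", "S2", "S3")


@dataclass
class Element:
    name: str       # element label, e.g. "P1" (hypothetical representation)
    sequence: str   # DNA sequence written 5'->3'


def assemble_formula_i(elements, connectors=None) -> str:
    """Join elements 5'->3' as P1-S1-L1-S2-S3.

    connectors[i] is the nucleotide sequence placed between element i and
    element i+1; an empty string models a direct bond.
    """
    names = tuple(e.name for e in elements)
    if names != ORDER:
        raise ValueError(f"expected order {ORDER}, got {names}")
    if connectors is None:
        connectors = [""] * (len(elements) - 1)  # all direct bonds
    if len(connectors) != len(elements) - 1:
        raise ValueError("need exactly one connector per '-' in the formula")
    parts = [elements[0].sequence]
    for link, element in zip(connectors, elements[1:]):
        parts.append(link)
        parts.append(element.sequence)
    return "".join(parts).upper()
```

Keeping connectors as a separate list makes the difference between a bond and a linker sequence explicit, which mirrors the wording of the formula.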

In a preferred embodiment, the nucleic acid construct is further operably linked to one or more second nucleic acid constructs of formula II:

P2-Y1 (II);

wherein P2 and Y1 are as defined in the first aspect of the invention.

In a preferred embodiment, the nucleic acid construct is further operably linked to a first integration element (I1) and a second integration element (I2).

The I1 element (left integration element) and the I2 element (right integration element) act together to integrate the elements located between them, i.e., the nucleotide sequence from P1 to Y1, into the genome of a plant cell.

Representative I1 and I2 are Ti elements from Agrobacterium. Of course, other elements that may serve a similar integration function may also be used with the present invention.
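Only the sequence lying between I1 and I2 is integrated into the plant genome. A useful check on a finished vector is therefore to extract that region and confirm that it contains the P1 to Y1 cassette. The minimal sketch below performs this extraction on a linear string. The marker sequences in the test are hypothetical placeholders, not real Ti border sequences. The sketch does not handle a region that wraps around the origin of a circular plasmid, and it only searches the given strand.

```python
def region_between(plasmid: str, left: str, right: str) -> str:
    """Return the sequence strictly between the first `left` element (I1)
    and the next downstream `right` element (I2), read 5'->3' on the given
    strand. Assumes a linearized sequence with I1 upstream of I2."""
    s = plasmid.upper()
    start = s.find(left.upper())
    if start == -1:
        raise ValueError("left integration element (I1) not found")
    start += len(left)
    end = s.find(right.upper(), start)
    if end == -1:
        raise ValueError("right integration element (I2) not found downstream of I1")
    return s[start:end]
```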

The various elements used in the constructs of the invention are either known in the art or can be prepared by methods known to those skilled in the art. For example, the corresponding elements can be obtained by conventional methods such as PCR, total chemical synthesis, or restriction digestion, and then joined by well-known DNA ligation techniques to form the constructs of the present invention.

The vector of the present invention is formed by inserting the construct of the present invention into an exogenous vector, particularly a vector suitable for manipulating transgenic plants.

The vector of the present invention is used to transform plant cells and to mediate integration of the construct into the plant cell chromosome. Expression of the construct in the plant then yields gene-edited plant cells.

The gene-edited plant cell of the present invention is regenerated into a plant body, thereby obtaining a gene-edited plant.

The constructed nucleic acid constructs of the present invention can be introduced into plant cells by conventional plant recombination techniques (e.g., Agrobacterium transfer techniques) to obtain plant cells harboring the nucleic acid construct (or a vector carrying the nucleic acid construct), or to obtain plant cells having the nucleic acid construct integrated into their genome.

In plants of the present invention into which the nucleic acid construct has been introduced, the construct can be segregated away in the progeny. Progeny lacking the construct can be identified by conventional screening or other methods known in the art, producing gene-edited plants that do not contain the nucleic acid construct.

Specifically, the invention drives the expression of a gene editing enzyme (such as Cas9) and deaminase fusion protein coding sequence by a specific EF promoter, such as tomato EF1a, so as to improve the gene editing efficiency.

Vector construction

The vector is mainly characterized in that the following elements are joined together to form the specific nucleic acid construct:
- a specific EF promoter (such as tomato EF1a);
- the coding sequence of the deaminase-Cas fusion protein;
- optionally, a nuclear localization signal and a UGI coding sequence.

After expression, the fusion protein encoded by the construct is transported very efficiently into the nucleus. Guided by the guide RNA encoded by the construct of formula II, it reaches the target position in the genome and carries out an A·T to G·C or C·G to T·A base substitution there. This substantially avoids or eliminates the risk of insertions/deletions and can significantly improve gene-editing efficiency.

Converting A to G with adenine deaminase, or C to T with cytosine deaminase, does not require the DNA double-strand cleavage activity of the Cas protein. The Cas protein in the present invention is therefore a mutated Cas protein with no cleavage activity or with only single-strand cleavage activity. In a preferred embodiment, the Cas protein of the present invention may be nCas9, whose amino acid sequence is shown in SEQ ID No. 2.

To preserve the activity of the fusion protein, the component proteins are generally joined by a flexible short peptide, i.e., a linker (linker peptide sequence). Preferably, the linker is XTEN; its coding sequence is shown in SEQ ID NO. 17 and its amino acid sequence in SEQ ID NO. 18.
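When a deaminase, a linker such as XTEN, and nCas9 are fused, each coding segment must keep the reading frame, and no stop codon may appear before the end of the fusion ORF. The minimal sketch below performs both checks on concatenated coding segments. It assumes that the standard genetic code's stop codons apply and that only the final codon may be a stop. The segment sequences in the test are short hypothetical placeholders, not SEQ ID NO. 5, 17 or 2.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_fusion_orf(*segments: str) -> list:
    """Concatenate coding segments 5'->3' (e.g. deaminase, linker, nCas9) and
    return a list of problems: segments whose length is not a multiple of 3,
    and stop codons occurring before the final codon. An empty list means no
    problems were found."""
    problems = []
    for i, seg in enumerate(segments):
        if len(seg) % 3 != 0:
            problems.append(f"segment {i} length {len(seg)} is not a multiple of 3")
    orf = "".join(segments).upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    for idx, codon in enumerate(codons[:-1]):  # the last codon may be a stop
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {idx + 1}")
    return problems
```

This checks only the frame and stop codons. It says nothing about whether the fusion is active.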

An expression cassette for the guide RNA that is suitable for plant cells is selected and constructed in the same vector as the open reading frame (ORF) expression cassette of the fusion protein described above.

In the present invention, the vector may be, for example, a plasmid, a virus, a cosmid, or a phage. Such vectors are well known to those skilled in the art and widely described in the literature. Preferably, the expression vector in the present invention is a plasmid. Expression vectors can include:
- promoters;
- ribosome binding sites for translation initiation;
- polyadenylation sites;
- transcription terminators;
- enhancers, and the like.

The expression vector may also contain one or more selectable marker genes for selecting host cells that contain the vector. Such markers include the gene encoding dihydrofolate reductase, the gene conferring neomycin resistance, the gene conferring tetracycline or ampicillin resistance, and the like.

The nucleic acid constructs of the invention may be inserted into the vector by a variety of methods, for example, by ligation following digestion of the insert and vector with appropriate restriction endonucleases. A variety of cloning techniques are known in the art and are within the knowledge of those skilled in the art.
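Restriction cloning needs the chosen enzymes to cut the vector where intended and not inside the insert. The minimal sketch below lists the recognition sites of SbfI (CCTGCAGG) and SalI (GTCGAC), the two enzymes used in Example 1. Both sites are palindromic, so scanning one strand is enough. Dedicated tools also handle degenerate recognition sequences, cut positions, and circular topology; none of these is covered here.

```python
import re

# Recognition sequences (5'->3'); both are palindromic, so one strand suffices.
ENZYMES = {
    "SbfI": "CCTGCAGG",
    "SalI": "GTCGAC",
}


def find_sites(seq: str, enzymes: dict = ENZYMES) -> dict:
    """Return 0-based start positions of each recognition site in `seq`.
    A lookahead pattern is used so that overlapping sites are all reported."""
    s = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        pattern = f"(?={re.escape(site)})"
        hits[name] = [m.start() for m in re.finditer(pattern, s)]
    return hits
```

A site reported inside the insert would indicate that the enzyme is unsuitable for ligation-based cloning of that fragment.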

Vectors suitable for use in the present invention include commercially available plasmids such as, but not limited to: pBR322 (ATCC37017), pCAMBIA1300, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, Wis., USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8, pCM7, pSV2CAT, pOG44, pKK 1, pSG (VKA), pSVK3, pSpBPV, pMSG, and Stratine), etc.

Genetic transformation

In the present invention, there is no particular limitation on the method of introducing the construct of formula I into cells or integrating it into the genome. This can be done in a conventional manner, for example by introducing the construct of formula I or the corresponding vector into plant cells by a suitable method. Representative methods include, but are not limited to:
- Agrobacterium-mediated transformation;
- particle bombardment (gene gun);
- microinjection;
- electroporation;
- ultrasound;
- polyethylene glycol (PEG)-mediated methods.

In the present invention, the recipient plant is not particularly limited, and includes various crop plants (e.g., gramineae), forestry plants, horticultural plants (e.g., flowers), and the like. Representative examples include, but are not limited to: rice, soybean, tomato, corn, tobacco, wheat, sorghum, potato, and the like.

After the above DNA vector or fragment is introduced into a plant cell, the fusion protein and gRNA are expressed from the introduced DNA in the transformed cell. Guided by the corresponding gRNA, the gene-editing enzyme (such as a Cas9 nuclease) fused with adenine deaminase and/or cytosine deaminase makes one of two changes at the target position:
- it converts A to G, so that T on the complementary strand becomes C;
- or it converts C to T, so that G on the complementary strand becomes A.
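The strand logic described above can be modeled at the sequence level. Editing A to G on the protospacer strand shows up as T to C on the complementary strand, and C to T shows up as G to A. The minimal sketch below applies an ABE-type or CBE-type conversion inside an editing window. The text does not specify the window, so the 1-based window passed in is a hypothetical parameter, and real editors show position-dependent, incomplete, and sometimes bystander editing that this model ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Base conversion made on the protospacer (gRNA-matching) strand.
EDITS = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def apply_base_edit(protospacer: str, editor: str, window: tuple):
    """Convert every target base inside a 1-based inclusive `window`
    (hypothetical editing window) and return the edited strand together
    with its reverse complement."""
    src, dst = EDITS[editor]
    start, end = window
    seq = list(protospacer.upper())
    for i in range(start - 1, min(end, len(seq))):
        if seq[i] == src:
            seq[i] = dst
    edited = "".join(seq)
    return edited, reverse_complement(edited)


# sgRNA 2 from Example 1 with an assumed window of positions 4-8.
print(apply_base_edit("GGAACAGCTTGAACGTCAAT", "ABE", (4, 8))[0])
```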

Plant cells, tissues, or organs whose genomes have undergone site-specific base substitution by this method can be regenerated into the corresponding gene-edited plants by conventional methods. For example, plants carrying the base substitution can be regenerated by tissue culture.

Applications

The invention can be used in the field of plant genetic engineering, for plant research and breeding, in particular for genetic improvement of crops, forestry crops or horticultural plants with economic value.

The main advantages of the invention include:

(1) The invention is the first to link a specific promoter (such as the EF1a promoter) to the coding sequences of a gene-editing enzyme (such as Cas9 nuclease) and adenine deaminase and/or cytosine deaminase, optionally together with a nuclear localization signal and a uracil DNA glycosylase inhibitor (UGI). The resulting specific nucleic acid construct of the invention achieves gRNA-guided site-directed base mutation (such as A to G) in plants with high efficiency (up to 70% or higher).

(2) Certain nucleic acid constructs of the invention can edit some gene sites at which other promoters are ineffective, thereby overcoming genotype limitations on gene editing.

(3) The specific nucleic acid construct of the invention can edit some plants in which other promoters do not work, such as soybean. This effectively broadens the application range of the gene-editing system and removes species barriers.

The invention is further illustrated by the following specific examples. These examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Experimental procedures for which specific conditions are not given in the following examples are generally carried out under conventional conditions or according to the manufacturer's recommendations. Conventional conditions include those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989). Unless otherwise indicated, percentages and parts are by weight. Unless otherwise specified, the materials and reagents used are commercially available.

Example 1 Single base editing efficiency of different promoters in tomato

1. Target selection

Solyc05g012020, which affects fruit development in tomato, was selected as the target gene, and six target sites were chosen for sgRNA design. The six sgRNA sequences are:
- sgRNA 1: TACTGGAGTTGTACCTGGA (SEQ ID No.: 20)
- sgRNA 2: GGAACAGCTTGAACGTCAAT (SEQ ID No.: 21)
- sgRNA 3: GAACAGCCTTCTCATCATGA (SEQ ID No.: 22)
- sgRNA 4: GGTGAGGATTTGGGACAATT (SEQ ID No.: 23)
- sgRNA 5: CTGTGAATCTGATGAAGTTT (SEQ ID No.: 24)
- sgRNA 6: GAAAAGTAATAACAAAGGGC (SEQ ID No.: 25)
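A quick sanity check on the guide sequences listed above is to print each one's length and GC content. As listed, sgRNA 1 is 19 nt and the other five are 20 nt, and a check like this makes such differences visible. The sketch applies no pass/fail thresholds, because the text does not state any design criteria.

```python
SGRNAS = {
    "sgRNA1": "TACTGGAGTTGTACCTGGA",   # SEQ ID No.: 20
    "sgRNA2": "GGAACAGCTTGAACGTCAAT",  # SEQ ID No.: 21
    "sgRNA3": "GAACAGCCTTCTCATCATGA",  # SEQ ID No.: 22
    "sgRNA4": "GGTGAGGATTTGGGACAATT",  # SEQ ID No.: 23
    "sgRNA5": "CTGTGAATCTGATGAAGTTT",  # SEQ ID No.: 24
    "sgRNA6": "GAAAAGTAATAACAAAGGGC",  # SEQ ID No.: 25
}


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in `seq`."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


for name, spacer in SGRNAS.items():
    print(f"{name}: {len(spacer)} nt, GC {gc_content(spacer):.0f}%")
```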

2. Vector construction

The expression cassette of the ABE single-base editor (see Figure 1) was obtained by homologous recombination. The nucleotide sequence of the adenine deaminase ABE7.10 is shown in SEQ ID NO. 5 or 19, and that of the SlEF1a promoter in SEQ ID NO. 1. The procedure was as follows:

A) Using tomato genomic DNA as the template, the target fragment was amplified with the forward/reverse primers pSlEF1a-F/pSlEF1a-R to obtain a PCR product of about 1583 bp (primer annealing temperature 58 ℃).

The PCR conditions were:
- pre-denaturation at 95 ℃ for 5 min;
- 35 cycles of denaturation at 98 ℃ for 30 s, annealing at 58 ℃ for 30 s, and extension at 72 ℃ for 45 s;
- final extension at 72 ℃ for 5 min.

B) The vector proAtU6-gRNA-pro35S-ABE7.10-nspCas9 was digested with the restriction endonucleases SbfI and SalI, and the vector backbone was recovered.

C) The PCR product from step A was inserted into the backbone vector from step B by homologous recombination, giving the single-base editing vector proAtU6-gRNA-proSlEF1a-ABE7.10-nspCas9.

The reaction conditions for this recombination step were: 50 ℃ for 30 min.

D) The recombination product was transformed into Escherichia coli. Single clones were picked and sequenced to verify that the fragment had been successfully inserted into the vector.
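The hold times of the PCR program in step A can be totalled to estimate instrument run time. The sketch below adds the hold times only. It assumes no extra steps beyond those listed and ignores ramping between temperatures and any final 4 ℃ hold, so the actual run takes longer. The variable names are illustrative.

```python
# (temperature in degrees C, hold time in seconds) for each step in step A.
PRE_DENATURE = (95, 5 * 60)
CYCLE = [(98, 30), (58, 30), (72, 45)]  # denaturation, annealing, extension
CYCLES = 35
FINAL_EXTENSION = (72, 5 * 60)


def program_hold_time_s(pre, cycle, n_cycles, final) -> int:
    """Sum of hold times in seconds; temperature ramps are not included."""
    return pre[1] + n_cycles * sum(t for _, t in cycle) + final[1]


total = program_hold_time_s(PRE_DENATURE, CYCLE, CYCLES, FINAL_EXTENSION)
print(f"{total} s = {total / 60:.2f} min")  # 4275 s = 71.25 min
```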

Single-base editing vectors containing 35S, UBI, AtRPS5A, SlRPS5A1, SlRPS5A2 and SlTCTP promoters were constructed in the same manner.

3. Genetic transformation

(A) The plasmids constructed above were transformed directly into Agrobacterium EHA105:

(1) Add the plasmid DNA to Agrobacterium competent cells and place on ice for 30 min. Freeze in liquid nitrogen for 5 min, immediately transfer to a 37 ℃ water bath for 5 min, then return to ice for 5 min.

(2) Remove the centrifuge tube, add 700 µl of YEP medium, and culture with shaking for 2-4 h.

(3) Spread the bacterial culture on YEP plates containing the appropriate antibiotics. Incubate the plates inverted until colonies are visible, about 2 days.

(B) Tomato transgenesis

(1) Take sterile tomato seedlings 7-10 d old, with fully expanded cotyledons and the first true leaf just emerging. Cut the cotyledons into pieces about 5 mm square, removing the tip and a small part of the base and keeping the middle. Place the pieces front side up on pre-culture medium and culture in the dark at 25 ℃ for 2 d.

(2) Prepare the bacterial suspension:
- Streak the bacterial stock preserved at -80 ℃ on solid YEB medium and culture in the dark at 28 ℃ for 2 days.
- Pick a single colony into 5 ml of liquid YEB medium and culture at 28 ℃ and 200 rpm for 1 day.
- Add 2 ml of this culture to 50 ml of fresh YEB medium and culture at 28 ℃ and 200 rpm.
- Centrifuge at 4 ℃ and 5000 rpm for 10 min, resuspend the cells in infection buffer, and adjust the OD600 to about 0.6-0.8.

(3) Immerse the cotyledons pre-cultured for 2 d in the bacterial suspension for 5-10 min, then blot off the excess suspension on filter paper. Place the cotyledons front side up on co-culture medium (or on filter paper soaked in bacteria-free infection solution) and culture in the dark at 25 ℃ for 2 d.

(4) Transfer the cotyledons co-cultured for 2 d to sterilization medium and culture at 25 ℃ for 7 d: in the dark for the first 2-3 d and in the light for the remaining 4-5 d. After these 7 d, transfer the cotyledons to selection medium and culture for 30-45 d, subculturing every 15 d.

(5) After the sterilization step, detect the marker gene (GUS is taken as an example). Stain several cotyledons with GUS after 7 d of sterilization culture and adjust the infection time according to the size of the stained area. This need not be done for every batch; do it periodically to check the activity of the bacteria.

(6) Assess damage to the cotyledons caused by Agrobacterium as follows:
- Take several cotyledons after 7 d of sterilization culture.
- Continue growing them on sterilization medium for about 30 d, subculturing every 15 d.
- Observe their differentiation rate to judge how much the bacterial suspension damages the cotyledons.
- Adjust the infection time accordingly.

(7) When the differentiated shoots reach about 2 cm, excise them and transfer them to rooting medium. Culture until roots form.

(8) Transfer robust differentiated seedlings to rooting medium containing antibiotics for one week of rooting culture. Harden them off at room temperature for 2-3 days, then grow them in greenhouse substrate.

(9) Gene-editing detection. Extract genomic DNA from leaves of each plant and design primers flanking the gRNA target site. Subject the amplified fragments to Sanger sequencing to determine the genotype of each plant.
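Genotyping in step (9) compares each plant's sequenced amplicon with the reference at the target site. The minimal sketch below lists base substitutions and labels them as ABE-type (A>G / T>C) or CBE-type (C>T / G>A). It assumes that the read has already been aligned to the reference with no insertions or deletions, and it treats 'N' as an uncalled base. It does not resolve the overlapping peaks that heterozygous or chimeric plants produce in Sanger traces. The edited read in the test is hypothetical.

```python
def call_substitutions(reference: str, sample: str) -> list:
    """Return (1-based position, reference base, observed base) for every
    mismatch between equal-length, pre-aligned sequences. 'N' is ignored."""
    if len(reference) != len(sample):
        raise ValueError("substitution-only sketch: sequences must be aligned to equal length")
    calls = []
    for pos, (ref_base, obs_base) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if obs_base != "N" and ref_base != obs_base:
            calls.append((pos, ref_base, obs_base))
    return calls


def classify(calls: list) -> str:
    """Label the set of observed changes by the editor type it matches."""
    changes = {(ref_base, obs_base) for _, ref_base, obs_base in calls}
    if not changes:
        return "no substitution"
    if changes <= {("A", "G"), ("T", "C")}:
        return "ABE-type (A>G / T>C)"
    if changes <= {("C", "T"), ("G", "A")}:
        return "CBE-type (C>T / G>A)"
    return "other substitution(s)"
```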

4. Results of the experiment

The SlEF1a promoter achieved a single-base editing efficiency of up to 70%, which was 2- to 20-fold higher than that of the other promoters (see Figure 2).
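Editing efficiencies and fold differences of this kind are ratios. The text does not state the exact denominator, so the minimal sketch below assumes that efficiency means the number of plants carrying the intended edit divided by the number of plants genotyped. It then computes the fold change between two promoters. The counts are hypothetical illustrations only, not data from this document.

```python
def editing_efficiency(edited_plants: int, genotyped_plants: int) -> float:
    """Percent of genotyped plants carrying the intended edit (assumed definition)."""
    if genotyped_plants <= 0:
        raise ValueError("need at least one genotyped plant")
    if not 0 <= edited_plants <= genotyped_plants:
        raise ValueError("edited count must lie between 0 and the genotyped count")
    return 100.0 * edited_plants / genotyped_plants


def fold_change(efficiency: float, baseline: float) -> float:
    """Ratio of two efficiencies; infinite if the baseline is zero."""
    if baseline == 0:
        return float("inf")  # ratio undefined; report such cases separately
    return efficiency / baseline


# Hypothetical counts for illustration only; not data from this document.
promoter_a = editing_efficiency(12, 40)  # 30.0
promoter_b = editing_efficiency(3, 40)   # 7.5
print(fold_change(promoter_a, promoter_b))  # 4.0
```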

5. Conclusion of the experiment

The SlEF1a promoter can efficiently drive expression of the deaminase-Cas9 fusion protein. It thereby broadens the application range of single-base editing tools, which is important for improving plant traits and breeding new varieties.

Example 2 Single base editing efficiency of different promoters in Soybean

The soybean genes GmELF3a and GmALS1 were selected, and different promoters and base editors were used to examine the single-base editing efficiency of each promoter in soybean. The gRNAs used are shown in the following table:

First, following the method of Example 1, the editing efficiency of three promoters combined with ABE7.10 (SEQ ID NO: 5 or 19) and Cas9 was examined: the SlEF1a promoter (pSlEF1a), the CaMV35S promoter (35S), and the AA6 promoter (pAA6; reference: CN101370939A). In Figure 3, "A to G gRNA 1" shows the results for the different promoters combined with this adenine deaminase. In soybean, editing efficiency with the SlEF1a promoter was much higher than with the CaMV35S and AA6 promoters.

In addition, the same procedure was repeated with the adenine deaminase replaced by a cytosine deaminase whose amino acid sequence is shown in SEQ ID No. 6 or 8-11; the cytosine deaminase of SEQ ID No. 6 is preferred in this example. The editing efficiency of the different promoters combined with this cytosine deaminase and Cas9 was then examined. In Figure 3, "C to T gRNA 2" shows the results for the different promoters combined with the cytosine deaminase. In soybean, editing efficiency with the SlEF1a promoter was much higher than with the CaMV35S and AA6 promoters.

All documents referred to herein are incorporated by reference into this application as if each were individually incorporated by reference. Furthermore, it should be understood that various changes and modifications of the present invention can be made by those skilled in the art after reading the above teachings of the present invention, and these equivalents also fall within the scope of the present invention as defined by the appended claims.

Sequence listing

<110> Shunheng Biotech Co., Ltd

<120> A method for nucleic acid expression

<130> P2020-0390

<160> 27

<170> SIPOSequenceListing 1.0

<210> 1

<211> 1583

<212> DNA

<213> Artificial sequence (artificial sequence)

<400> 1

gattagtttg tcaaatagta gagttcattt aaaattcttc agccatatag ttctattttt 60

aagctagtcg actttttttt tcttactgaa aattaatatt tttttctttt tgaaatacta 120

atacatctaa atttaacaat tgccaaagtg atttttaatt agcttgctgg ctaatcacaa 180

taaaaattac tctcctttac tatataagta aatttttatt gctatatttg ttattattat 240

tattattatt aatatttatt ttctacaaat ttaataatat tttattttat atcattttaa 300

aaagataagt aatgaaatat taagaattcg tttataattc ttttgcaggt gggtttctat 360

ttgtaagcta atctttttca gttatccttt ttttaaaatc tttattatta ttatagctat 420

atcttttatc ttttaaaatt aacattatct attaaagata atttcaataa aagagtaaaa 480

attaatttag agttctactg tcttcaaatt tctattttaa aaaatacttt taaaacttga 540

tgtatttttt acgtggtttt tcactatgac ttaatttctg ttttattata atatgtataa 600

atataaaaat agattttcca taacatatta taaaaaatgt aaggggcatt tacgtaaata 660

gatagactta aaagaggcac cgagtgaacc ctaattctca tcgttgagac tataaaatgc 720

ccattatccc attcgcacag tctcttcatt acttttgctg ttatttctcc tcagctgtgc 780

cgcatatcgc ctaatttttc ttctctaagg tttcatcatc ttcaccaatt tctttaatct 840

cgattcaatt ttttatgttt gatctgttat tgttctgtca ctacatgtgt ttttcagttg 900

ttttactaga tgattttcac tgtcttcttg ttagatcata catatattga aaatgttttg 960

gattgacttt tttgtattgt gaatatctgt tattgtttga ttgttgttca gtatttacac 1020

acccgatctg tgttatgagc ttggtcataa ctatttctct gtatgtaaat acagatctgt 1080

taatgtttgt aatcaatttt tcatatgcac tgttgatatt gttctctctc ctgtcctgtt 1140

atatgttgat atgattcggt ttttgtataa cttgaactaa acactagtcc taaatgtttt 1200

ttttactatt taagatttat ataatatgga tagatttttt gagttcctag tctctgaaga 1260

ggttaagctt gctgtagttg tttaccagtt gaggtgcaat actaaaaatc aattcaatta 1320

ctgatatttt ttgctgttta ggtttttgac aaagtacttt aatttgcttt attgaactaa 1380

aaacgtagtc ctgaattcat tgcaagtgtg aaagctatag ttcattgttt ttgttgcaat 1440

tcttgaaaaa ttaattggtc aagctataat ggattttact ttttctgttt taatattgaa 1500

tttgctgaat ttatgaatgg gttgcatggt ttttgaaata tgttgttgtg tgttgtgtaa 1560

atgcagtttc ttagtgtctc aag 1583

<210> 2

<211> 1368

<212> PRT

<213> Streptococcus pyogenes (Streptococcus pyogenes)

<400> 2

Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys

1010 1015 1020

Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser

1025 1030 1035 1040

Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu

1045 1050 1055

Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile

1060 1065 1070

Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser

1075 1080 1085

Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly

1090 1095 1100

Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile

1105 1110 1115 1120

Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser

1125 1130 1135

Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly

1140 1145 1150

Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile

1155 1160 1165

Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala

1170 1175 1180

Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys

1185 1190 1195 1200

Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser

1205 1210 1215

Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr

1220 1225 1230

Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His

1250 1255 1260

Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val

1265 1270 1275 1280

Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys

1285 1290 1295

His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu

1300 1305 1310

Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp

1315 1320 1325

Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp

1330 1335 1340

Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile

1345 1350 1355 1360

Asp Leu Ser Gln Leu Gly Gly Asp

1365

<210> 3

<211> 4101

<212> DNA

<213> Streptococcus pyogenes

<400> 3

gacaagaagt acagcatcgg cctggccatc ggcaccaact ctgtgggctg ggccgtgatc 60

accgacgagt acaaggtgcc cagcaagaaa ttcaaggtgc tgggcaacac cgaccggcac 120

agcatcaaga agaacctgat cggagccctg ctgttcgaca gcggcgaaac agccgaggcc 180

acccggctga agagaaccgc cagaagaaga tacaccagac ggaagaaccg gatctgctat 240

ctgcaagaga tcttcagcaa cgagatggcc aaggtggacg acagcttctt ccacagactg 300

gaagagtcct tcctggtgga agaggataag aagcacgagc ggcaccccat cttcggcaac 360

atcgtggacg aggtggccta ccacgagaag taccccacca tctaccacct gagaaagaaa 420

ctggtggaca gcaccgacaa ggccgacctg cggctgatct atctggccct ggcccacatg 480

atcaagttcc ggggccactt cctgatcgag ggcgacctga accccgacaa cagcgacgtg 540

gacaagctgt tcatccagct ggtgcagacc tacaaccagc tgttcgagga aaaccccatc 600

aacgccagcg gcgtggacgc caaggccatc ctgtctgcca gactgagcaa gagcagacgg 660

ctggaaaatc tgatcgccca gctgcccggc gagaagaaga atggcctgtt cggaaacctg 720

attgccctga gcctgggcct gacccccaac ttcaagagca acttcgacct ggccgaggat 780

gccaaactgc agctgagcaa ggacacctac gacgacgacc tggacaacct gctggcccag 840

atcggcgacc agtacgccga cctgtttctg gccgccaaga acctgtccga cgccatcctg 900

ctgagcgaca tcctgagagt gaacaccgag atcaccaagg cccccctgag cgcctctatg 960

atcaagagat acgacgagca ccaccaggac ctgaccctgc tgaaagctct cgtgcggcag 1020

cagctgcctg agaagtacaa agagattttc ttcgaccaga gcaagaacgg ctacgccggc 1080

tacattgacg gcggagccag ccaggaagag ttctacaagt tcatcaagcc catcctggaa 1140

aagatggacg gcaccgagga actgctcgtg aagctgaaca gagaggacct gctgcggaag 1200

cagcggacct tcgacaacgg cagcatcccc caccagatcc acctgggaga gctgcacgcc 1260

attctgcggc ggcaggaaga tttttaccca ttcctgaagg acaaccggga aaagatcgag 1320

aagatcctga ccttccgcat cccctactac gtgggccctc tggccagggg aaacagcaga 1380

ttcgcctgga tgaccagaaa gagcgaggaa accatcaccc cctggaactt cgaggaagtg 1440

gtggacaagg gcgcttccgc ccagagcttc atcgagcgga tgaccaactt cgataagaac 1500

ctgcccaacg agaaggtgct gcccaagcac agcctgctgt acgagtactt caccgtgtat 1560

aacgagctga ccaaagtgaa atacgtgacc gagggaatga gaaagcccgc cttcctgagc 1620

ggcgagcaga aaaaggccat cgtggacctg ctgttcaaga ccaaccggaa agtgaccgtg 1680

aagcagctga aagaggacta cttcaagaaa atcgagtgct tcgactccgt ggaaatctcc 1740

ggcgtggaag atcggttcaa cgcctccctg ggcacatacc acgatctgct gaaaattatc 1800

aaggacaagg acttcctgga caatgaggaa aacgaggaca ttctggaaga tatcgtgctg 1860

accctgacac tgtttgagga cagagagatg atcgaggaac ggctgaaaac ctatgcccac 1920

ctgttcgacg acaaagtgat gaagcagctg aagcggcgga gatacaccgg ctggggcagg 1980

ctgagccgga agctgatcaa cggcatccgg gacaagcagt ccggcaagac aatcctggat 2040

ttcctgaagt ccgacggctt cgccaacaga aacttcatgc agctgatcca cgacgacagc 2100

ctgaccttta aagaggacat ccagaaagcc caggtgtccg gccagggcga tagcctgcac 2160

gagcacattg ccaatctggc cggcagcccc gccattaaga agggcatcct gcagacagtg 2220

aaggtggtgg acgagctcgt gaaagtgatg ggccggcaca agcccgagaa catcgtgatc 2280

gaaatggcca gagagaacca gaccacccag aagggacaga agaacagccg cgagagaatg 2340

aagcggatcg aagagggcat caaagagctg ggcagccaga tcctgaaaga acaccccgtg 2400

gaaaacaccc agctgcagaa cgagaagctg tacctgtact acctgcagaa tgggcgggat 2460

atgtacgtgg accaggaact ggacatcaac cggctgtccg actacgatgt ggaccatatc 2520

gtgcctcaga gctttctgaa ggacgactcc atcgacaaca aggtgctgac cagaagcgac 2580

aagaaccggg gcaagagcga caacgtgccc tccgaagagg tcgtgaagaa gatgaagaac 2640

tactggcggc agctgctgaa cgccaagctg attacccaga gaaagttcga caatctgacc 2700

aaggccgaga gaggcggcct gagcgaactg gataaggccg gcttcatcaa gagacagctg 2760

gtggaaaccc ggcagatcac aaagcacgtg gcacagatcc tggactcccg gatgaacact 2820

aagtacgacg agaatgacaa gctgatccgg gaagtgaaag tgatcaccct gaagtccaag 2880

ctggtgtccg atttccggaa ggatttccag ttttacaaag tgcgcgagat caacaactac 2940

caccacgccc acgacgccta cctgaacgcc gtcgtgggaa ccgccctgat caaaaagtac 3000

cctaagctgg aaagcgagtt cgtgtacggc gactacaagg tgtacgacgt gcggaagatg 3060

atcgccaaga gcgagcagga aatcggcaag gctaccgcca agtacttctt ctacagcaac 3120

atcatgaact ttttcaagac cgagattacc ctggccaacg gcgagatccg gaagcggcct 3180

ctgatcgaga caaacggcga aaccggggag atcgtgtggg ataagggccg ggattttgcc 3240

accgtgcgga aagtgctgag catgccccaa gtgaatatcg tgaaaaagac cgaggtgcag 3300

acaggcggct tcagcaaaga gtctatcctg cccaagagga acagcgataa gctgatcgcc 3360

agaaagaagg actgggaccc taagaagtac ggcggcttcg acagccccac cgtggcctat 3420

tctgtgctgg tggtggccaa agtggaaaag ggcaagtcca agaaactgaa gagtgtgaaa 3480

gagctgctgg ggatcaccat catggaaaga agcagcttcg agaagaatcc catcgacttt 3540

ctggaagcca agggctacaa agaagtgaaa aaggacctga tcatcaagct gcctaagtac 3600

tccctgttcg agctggaaaa cggccggaag agaatgctgg cctctgccgg cgaactgcag 3660

aagggaaacg aactggccct gccctccaaa tatgtgaact tcctgtacct ggccagccac 3720

tatgagaagc tgaagggctc ccccgaggat aatgagcaga aacagctgtt tgtggaacag 3780

cacaagcact acctggacga gatcatcgag cagatcagcg agttctccaa gagagtgatc 3840

ctggccgacg ctaatctgga caaagtgctg tccgcctaca acaagcaccg ggataagccc 3900

atcagagagc aggccgagaa tatcatccac ctgtttaccc tgaccaatct gggagcccct 3960

gccgccttca agtactttga caccaccatc gaccggaaga ggtacaccag caccaaagag 4020

gtgctggacg ccaccctgat ccaccagagc atcaccggcc tgtacgagac acggatcgac 4080

ctgtctcagc tgggaggcga c 4101
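Each nucleotide row in this listing ends with the running position of its last base, so a row-by-row parse can confirm that the printed counts agree with the bases actually shown. The following is a minimal sketch, assuming the row layout seen above (space-separated blocks followed by one integer); the function name parse_st25_dna is hypothetical and is not part of any standard tool.

```python
import re


def parse_st25_dna(rows: list[str], start: int = 0) -> str:
    """Join ST.25-style nucleotide rows, checking each row's running count.

    Assumes each non-blank row is space-separated blocks of bases followed by
    the cumulative position of the row's last base (e.g. "... cgaccggcac 120").
    `start` is the position before the first row, so a fragment can be checked.
    """
    pieces: list[str] = []
    total = start
    for row in rows:
        parts = row.split()
        if not parts:
            continue  # blank separator lines carry no sequence
        *blocks, count = parts
        bases = "".join(blocks).lower()
        if not re.fullmatch(r"[acgtn]+", bases):
            raise ValueError(f"non-nucleotide characters in row: {row!r}")
        total += len(bases)
        if total != int(count):
            raise ValueError(f"row ends at {total}, but the listing says {count}")
        pieces.append(bases)
    return "".join(pieces)


# First two rows of SEQ ID NO: 3, copied from the listing above.
rows = [
    "gacaagaagt acagcatcgg cctggccatc ggcaccaact ctgtgggctg ggccgtgatc 60",
    "",
    "accgacgagt acaaggtgcc cagcaagaaa ttcaaggtgc tgggcaacac cgaccggcac 120",
]
seq = parse_st25_dna(rows)
print(len(seq))  # 120
```

The declared length of 4101 nt is divisible by three (1367 codons), and the final row read in frame (ctg tct cag ctg gga ggc gac) gives Leu Ser Gln Leu Gly Gly Asp, the same as the last seven residues listed for SEQ ID NO: 2.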

<210> 4

<211> 364

<212> PRT

<213> Artificial sequence

<400> 4

Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr

1 5 10 15

Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala Val

20 25 30

Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro Ile

35 40 45

Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln

50 55 60

Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr

65 70 75 80

Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser

85 90 95

Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly Ala

100 105 110

Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His Arg

115 120 125

Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu

130 135 140

Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys Lys

145 150 155 160

Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly

165 170 175

Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly

180 185 190

Gly Ser Ser Gly Gly Ser Ser Glu Val Glu Phe Ser His Glu Tyr Trp

195 200 205

Met Arg His Ala Leu Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu

210 215 220

Val Pro Val Gly Ala Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu

225 230 235 240

Gly Trp Asn Arg Ala Ile Gly Leu His Asp Pro Thr Ala His Ala Glu

245 250 255

Ile Met Ala Leu Arg Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu

260 265 270

Ile Asp Ala Thr Leu Tyr Val Thr Phe Glu Pro Cys Val Met Cys Ala

275 280 285

Gly Ala Met Ile His Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg

290 295 300

Asn Ala Lys Thr Gly Ala Ala Gly Ser Leu Met Asp Val Leu His Tyr

305 310 315 320

Pro Gly Met Asn His Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp

325 330 335

Glu Cys Ala Ala Leu Leu Cys Tyr Phe Phe Arg Met Pro Arg Gln Val

340 345 350

Phe Asn Ala Gln Lys Lys Ala Gln Ser Ser Thr Asp

355 360

<210> 5

<211> 1092

<212> DNA

<213> Artificial sequence

<400> 5

tctgaagtcg agtttagcca cgagtattgg atgaggcacg cactgaccct ggcaaagcga 60

gcatgggatg aaagagaagt ccccgtgggc gccgtgctgg tgcacaacaa tagagtgatc 120

ggagagggat ggaacaggcc aatcggccgc cacgacccta ccgcacacgc agagatcatg 180

gcactgaggc agggaggcct ggtcatgcag aattaccgcc tgatcgatgc caccctgtat 240

gtgacactgg agccatgcgt gatgtgcgca ggagcaatga tccacagcag gatcggaaga 300

gtggtgttcg gagcacggga cgccaagacc ggcgcagcag gctccctgat ggatgtgctg 360

caccaccccg gcatgaacca ccgggtggag atcacagagg gaatcctggc agacgagtgc 420

gccgccctgc tgagcgattt ctttagaatg cggagacagg agatcaaggc ccagaagaag 480

gcacagagct ccaccgactc tggaggatct agcggaggtt cctctggaag cgagacacca 540

ggcacaagcg agtccgccac accagagagc tccggcggct cctccggagg ctcctctgag 600

gtggagtttt cccacgagta ctggatgaga catgccctga ccctggccaa gagggcacgc 660

gatgagaggg aggtgcctgt gggagccgtg ctggtgctga acaatagagt gatcggcgag 720

ggctggaaca gagccatcgg cctgcacgac ccaacagccc atgccgaaat tatggccctg 780

agacagggcg gcctggtcat gcagaactac agactgattg acgccaccct gtacgtgaca 840

ttcgagcctt gcgtgatgtg cgccggcgcc atgatccact ctaggatcgg ccgcgtggtg 900

tttggcgtga ggaacgcaaa aaccggcgcc gcaggctccc tgatggacgt gctgcactac 960

cccggcatga atcaccgcgt cgaaattacc gagggaatcc tggcagatga atgtgccgcc 1020

ctgctgtgct atttctttcg gatgcctaga caggtgttca atgctcagaa gaaggcccag 1080

agctccaccg ac 1092
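SEQ ID NO: 5 (1092 nt) has exactly three times the length of SEQ ID NO: 4 (364 residues). A quick way to check that a DNA entry encodes a protein entry is to translate it with the standard genetic code and compare the result against the three-letter residues in the listing. The sketch below does this for the first row of SEQ ID NO: 5; the helper names translate and three_to_one are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (whitespace ignored) to one-letter amino acids."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("DNA length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def three_to_one(residues: str) -> str:
    """Convert space-separated three-letter codes, as used in the listing, to one-letter codes."""
    return "".join(THREE_TO_ONE[r] for r in residues.split())


# Row 1 of SEQ ID NO: 5 (position number removed) and residues 1-20 of SEQ ID NO: 4.
seq5_row1 = "tctgaagtcg agtttagcca cgagtattgg atgaggcacg cactgaccct ggcaaagcga"
seq4_1_20 = three_to_one(
    "Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr "
    "Leu Ala Lys Arg"
)
print(translate(seq5_row1) == seq4_1_20)  # True
```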

<210> 6

<211> 228

<212> PRT

<213> Artificial sequence

<400> 6

Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg

1 5 10 15

Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg

20 25 30

Lys Glu Thr Cys Leu Leu Tyr Glu Ile Lys Trp Gly Thr Ser His Lys

35 40 45

Ile Trp Arg His Ser Ser Lys Asn Thr Thr Lys His Val Glu Val Asn

50 55 60

Phe Ile Glu Lys Phe Thr Ser Glu Arg His Phe Cys Pro Ser Thr Ser

65 70 75 80

Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser

85 90 95

Lys Ala Ile Thr Glu Phe Leu Ser Gln His Pro Asn Val Thr Leu Val

100 105 110

Ile Tyr Val Ala Arg Leu Tyr His His Met Asp Gln Gln Asn Arg Gln

115 120 125

Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Gln Ile Met Thr

130 135 140

Ala Pro Glu Tyr Asp Tyr Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro

145 150 155 160

Gly Lys Glu Ala His Trp Pro Arg Tyr Pro Pro Leu Trp Met Lys Leu

165 170 175

Tyr Ala Leu Glu Leu His Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu

180 185 190

Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala

195 200 205

Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala

210 215 220

Thr Gly Leu Lys

225

<210> 7

<211> 684

<212> DNA

<213> Artificial sequence

<400> 7

agcagtgaaa ccggaccagt ggcagtggac ccaaccctga ggagacggat tgagccccat 60

gaatttgaag tgttctttga cccaagggag ctgaggaagg agacatgcct gctgtacgag 120

atcaagtggg gcacaagcca caagatctgg cgccacagct ccaagaacac cacaaagcac 180

gtggaagtga atttcatcga gaagtttacc tccgagcggc acttctgccc ctctaccagc 240

tgttccatca catggtttct gtcttggagc ccttgcggcg agtgttccaa ggccatcacc 300

gagttcctgt ctcagcaccc taacgtgacc ctggtcatct acgtggcccg gctgtatcac 360

cacatggacc agcagaacag gcagggcctg cgcgatctgg tgaattctgg cgtgaccatc 420

cagatcatga cagccccaga gtacgactat tgctggcgga acttcgtgaa ttatccacct 480

ggcaaggagg cacactggcc aagataccca cccctgtgga tgaagctgta tgcactggag 540

ctgcacgcag gaatcctggg cctgcctcca tgtctgaata tcctgcggag aaagcagccc 600

cagctgacat ttttcaccat tgctctgcag tcttgtcact atcagcggct gcctcctcat 660

attctgtggg ctacaggcct taaa 684
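The <211> fields give a simple consistency check for coding entries: a DNA sequence that encodes a protein without a stop codon should be exactly three times the protein length, or three nucleotides longer if a stop codon is included. In this listing, the opening codons of SEQ ID NO: 5 and SEQ ID NO: 7 translate to the opening residues of SEQ ID NO: 4 and SEQ ID NO: 6 respectively, so the sketch below pairs them on that assumption; further DNA/protein pairs can be appended to PAIRS in the same form.

```python
# (DNA SEQ ID NO, DNA <211>, protein SEQ ID NO, protein <211>) as listed above;
# the pairing of each DNA with its protein is an assumption for this check.
PAIRS = [
    (5, 1092, 4, 364),
    (7, 684, 6, 228),
]


def check_coding_lengths(pairs):
    """Classify each pair by comparing the DNA length with 3 x the protein length."""
    report = []
    for dna_id, dna_len, prot_id, prot_len in pairs:
        if dna_len == 3 * prot_len:
            status = "ok, no stop codon"
        elif dna_len == 3 * (prot_len + 1):
            status = "ok, includes a stop codon"
        else:
            status = "MISMATCH"
        report.append((dna_id, prot_id, status))
    return report


for dna_id, prot_id, status in check_coding_lengths(PAIRS):
    print(f"SEQ ID NO: {dna_id} -> SEQ ID NO: {prot_id}: {status}")
```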

<210> 8

<211> 228

<212> PRT

<213> Artificial sequence

<400> 8

Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg

1 5 10 15

Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg

20 25 30

Lys Glu Ala Cys Leu Leu Tyr Glu Ile Lys Trp Gly Thr Ser His Lys

35 40 45

Ile Trp Arg Asn Ser Gly Lys Asn Thr Thr Lys His Val Glu Val Asn

50 55 60

Phe Ile Glu Lys Phe Thr Ser Glu Arg His Phe Cys Pro Ser Ile Ser

65 70 75 80

Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Trp Glu Cys Ser

85 90 95

Lys Ala Ile Arg Glu Phe Leu Ser Gln His Pro Asn Val Thr Leu Val

100 105 110

Ile Tyr Val Ala Arg Leu Phe Gln His Met Asp Gln Gln Asn Arg Gln

115 120 125

Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Gln Ile Met Thr

130 135 140

Ala Ser Glu Tyr Asp His Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro

145 150 155 160

Gly Lys Glu Ala His Trp Pro Arg Tyr Pro Pro Leu Trp Met Lys Leu

165 170 175

Tyr Ala Leu Glu Leu His Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu

180 185 190

Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala

195 200 205

Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala

210 215 220

Thr Gly Leu Lys

225
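SEQ ID NO: 6 and SEQ ID NO: 8 are both 228 residues long and differ only at scattered positions. Because the listing prints 16 residues per row with known start positions, substitutions can be named in the usual "original residue, position, new residue" form by comparing aligned rows. The sketch below, with hypothetical helper names, does this for two rows (residues 113-128 and 145-160); it assumes the two sequences are already aligned without gaps.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(row: str) -> str:
    """Convert a listing row of three-letter codes to one-letter codes."""
    return "".join(THREE_TO_ONE[r] for r in row.split())


def substitutions(ref: str, var: str, start: int = 1) -> list[str]:
    """Name each position where two gap-free, equal-length sequences differ."""
    if len(ref) != len(var):
        raise ValueError("sequences must be the same length")
    return [f"{a}{start + i}{b}" for i, (a, b) in enumerate(zip(ref, var)) if a != b]


# Residues 113-128 and 145-160 of SEQ ID NO: 6 (ref) and SEQ ID NO: 8 (var).
seq6_113 = to_one_letter("Ile Tyr Val Ala Arg Leu Tyr His His Met Asp Gln Gln Asn Arg Gln")
seq8_113 = to_one_letter("Ile Tyr Val Ala Arg Leu Phe Gln His Met Asp Gln Gln Asn Arg Gln")
seq6_145 = to_one_letter("Ala Pro Glu Tyr Asp Tyr Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro")
seq8_145 = to_one_letter("Ala Ser Glu Tyr Asp His Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro")

print(substitutions(seq6_113, seq8_113, start=113))  # ['Y119F', 'H120Q']
print(substitutions(seq6_145, seq8_145, start=145))  # ['P146S', 'Y150H']
```

Running the same comparison over all 228 residues gives the complete set of differences between the two entries.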

<210> 9

<211> 150

<212> PRT

<213> Artificial sequence

<400> 9

Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg

1 5 10 15

Ile Glu Pro Glu Phe Phe Asn Arg Asn Tyr Asp Pro Arg Glu Leu Arg

20 25 30

Lys Glu Thr Tyr Leu Leu Tyr Glu Ile Lys Trp Gly Lys Glu Ser Lys

35 40 45

Ile Trp Arg His Thr Ser Asn Asn Arg Thr Gln His Ala Glu Val Asn

50 55 60

Phe Leu Glu Asn Phe Phe Asn Glu Leu Tyr Phe Asn Pro Ser Thr His

65 70 75 80

Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser

85 90 95

Lys Ala Ile Val Glu Phe Leu Lys Glu His Pro Asn Val Asn Leu Glu

100 105 110

Ile Tyr Val Ala Arg Leu Tyr Leu Cys Glu Asp Glu Arg Asn Arg Gln

115 120 125

Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Arg Ile Met Asn

130 135 140

Leu Pro Asp Tyr Asn Tyr

145 150

<210> 10

<211> 228

<212> PRT

<213> Artificial sequence

<400> 10

Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg

1 5 10 15

Ile Glu Pro Phe Tyr Phe Gln Phe Asn Asn Asp Pro Arg Ala Cys Arg

20 25 30

Arg Lys Thr Tyr Leu Cys Tyr Glu Leu Lys Gln Asp Gly Ser Thr Trp

35 40 45

Val Trp Lys Arg Thr Leu His Asn Lys Gly Arg His Ala Glu Ile Cys

50 55 60

Phe Leu Glu Lys Ile Ser Ser Leu Glu Lys Leu Asp Pro Ala Gln His

65 70 75 80

Tyr Arg Ile Thr Trp Tyr Met Ser Trp Ser Pro Cys Ser Asn Cys Ala

85 90 95

Gln Lys Ile Val Asp Phe Leu Lys Glu His Pro His Val Asn Leu Arg

100 105 110

Ile Tyr Val Ala Arg Leu Tyr Tyr His Glu Glu Glu Arg Tyr Gln Glu

115 120 125

Gly Leu Arg Asn Leu Arg Arg Ser Gly Val Ser Ile Arg Val Met Asp

130 135 140

Leu Pro Asp Phe Glu His Cys Trp Glu Thr Phe Val Asp Asn Gly Gly

145 150 155 160

Gly Pro Phe Gln Pro Trp Pro Gly Leu Glu Glu Leu Asn Ser Lys Gln

165 170 175

Leu Ser Arg Arg Leu Gln Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu

180 185 190

Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala

195 200 205

Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala

210 215 220

Thr Gly Leu Lys

225

<210> 11

<211> 228

<212> PRT

<213> Artificial sequence

<400> 11

Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg

1 5 10 15

Ile Glu Pro Phe His Phe Gln Phe Asn Asn Asp Pro Arg Ala Tyr Arg

20 25 30

Arg Lys Thr Tyr Leu Cys Tyr Glu Leu Lys Gln Asp Gly Ser Thr Trp

35 40 45

Val Leu Asp Arg Thr Leu Arg Asn Lys Gly Arg His Ala Glu Ile Cys

50 55 60

Phe Leu Asp Lys Ile Asn Ser Trp Glu Arg Leu Asp Pro Ala Gln His

65 70 75 80

Tyr Arg Val Thr Trp Tyr Met Ser Trp Ser Pro Cys Ser Asn Cys Ala

85 90 95

Gln Gln Val Val Asp Phe Leu Lys Glu His Pro His Val Asn Leu Arg

100 105 110

Ile Phe Ala Ala Arg Leu Tyr Tyr His Glu Gln Arg Arg Tyr Gln Glu

115 120 125

Gly Leu Arg Ser Leu Arg Gly Ser Gly Val Pro Val Ala Val Met Thr

130 135 140

Leu Pro Asp Phe Glu His Cys Trp Glu Thr Phe Val Asp His Gly Gly

145 150 155 160

Arg Pro Phe Gln Pro Trp Asp Gly Leu Glu Glu Leu Asn Ser Arg Ser

165 170 175

Leu Ser Arg Arg Leu Gln Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu

180 185 190

Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala

195 200 205

Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala

210 215 220

Thr Gly Leu Lys

225

<210> 12

<211> 57

<212> DNA

<213> Artificial sequence

<400> 12

atgaaacgga cagccgacgg aagcgagttc gagtcaccaa agaagaagcg gaaagtc 57

<210> 13

<211> 51

<212> DNA

<213> Artificial sequence

<400> 13

aaaagaaccg ccgacggcag cgaattcgag cccaagaaga agaggaaagt c 51

<210> 14

<211> 66

<212> DNA

<213> Artificial sequence

<400> 14

gcttctccaa agcgtccgcg tgaccgtcac gatggagaat tgggtggacg caaacgtgca 60

agaggt 66

<210> 15

<211> 22

<212> PRT

<213> Artificial sequence

<400> 15

Ala Ser Pro Lys Arg Pro Arg Asp Arg His Asp Gly Glu Leu Gly Gly

1 5 10 15

Arg Lys Arg Ala Arg Gly

20
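SEQ ID NO: 14 (66 nt) translates codon for codon into the 22-residue SEQ ID NO: 15. The protein rows in this listing follow a fixed layout: 16 three-letter codes per row, with a label line marking position 1 and every fifth position. The sketch below translates SEQ ID NO: 14 and renders the result in that layout so it can be compared directly with the entry above; the function names are hypothetical, and label spacing is simplified to single spaces.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (whitespace ignored) to one-letter amino acids."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("DNA length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def st25_protein_rows(protein: str, width: int = 16) -> list[tuple[str, str]]:
    """Return (residue row, position label row) pairs in the listing's layout."""
    rows = []
    for offset in range(0, len(protein), width):
        chunk = protein[offset:offset + width]
        residues = " ".join(ONE_TO_THREE[aa] for aa in chunk)
        first, last = offset + 1, offset + len(chunk)
        labels = " ".join(str(p) for p in range(first, last + 1) if p == 1 or p % 5 == 0)
        rows.append((residues, labels))
    return rows


# SEQ ID NO: 14, position numbers removed.
seq14 = "gcttctccaa agcgtccgcg tgaccgtcac gatggagaat tgggtggacg caaacgtgca agaggt"
for residues, labels in st25_protein_rows(translate(seq14)):
    print(residues)
    print(labels)
```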

<210> 16

<211> 579

<212> DNA

<213> Artificial sequence

<400> 16

agcggcggga gcggcgggag cggcgggagc ggggggagca ctaatctgag cgacatcatt 60

gagaaggaga ctgggaaaca gctggtcatt caggagtcca tcctgatgct gcctgaggag 120

gtggaggaag tgatcggcaa caagccagag tctgacatcc tggtgcacac cgcctacgac 180

gagtccacag atgagaatgt gatgctgctg acctctgacg cccccgagta taagccttgg 240

gccctggtca tccaggattc taacggcgag aataagatca agatgctgag cggaggctcc 300

ggaggatctg gaggcagcac caacctgtct gacatcatcg agaaggagac aggcaagcag 360

ctggtcatcc aggagagcat cctgatgctg cccgaagaag tcgaagaagt gatcggaaac 420

aagcctgaga gcgatatcct ggtccatacc gcctacgacg agagtaccga cgaaaatgtg 480

atgctgctga catccgacgc cccagagtat aagccctggg ctctggtcat ccaggattcc 540

aacggagaga acaaaatcaa aatgctgtct ggcggctca 579

<210> 17

<211> 96

<212> DNA

<213> Artificial sequence

<400> 17

tctggagggt cctccggcgg atcgtccggc agcgagacgc caggcacctc cgagagcgct 60

acgcctgaat cctccggggg atcttcagga ggatca 96

<210> 18

<211> 32

<212> PRT

<213> Artificial sequence

<400> 18

Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr

1 5 10 15

Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser

20 25 30
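The 32-residue SEQ ID NO: 18 (encoded by the 96-nt SEQ ID NO: 17) also occurs inside SEQ ID NO: 4. There it sits immediately before a second stretch beginning Ser Glu Val Glu Phe Ser His Glu Tyr Trp, the same ten residues that open SEQ ID NO: 4. The sketch below locates the motif with a plain substring search over residues 161-208 of SEQ ID NO: 4 and reports 1-based, inclusive positions; the helper names are illustrative.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(text: str) -> str:
    """Convert whitespace-separated three-letter codes (any number of rows) to one-letter codes."""
    return "".join(THREE_TO_ONE[r] for r in text.split())


def locate(motif: str, fragment: str, fragment_start: int):
    """Return the 1-based (first, last) positions of motif, or None if it is absent."""
    i = fragment.find(motif)
    if i < 0:
        return None
    first = fragment_start + i
    return first, first + len(motif) - 1


seq18 = to_one_letter("""Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr
Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser""")

# Residues 161-208 of SEQ ID NO: 4 (three listing rows).
seq4_161_208 = to_one_letter("""Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly
Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly
Gly Ser Ser Gly Gly Ser Ser Glu Val Glu Phe Ser His Glu Tyr Trp""")

print(locate(seq18, seq4_161_208, 161))  # (167, 198)
```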

<210> 19

<211> 1092

<212> DNA

<213> Artificial sequence

<400> 19

tccgaagtcg agttttccca tgagtactgg atgagacacg cattgactct cgcaaagagg 60

gcttgggatg aacgcgaggt gcccgtgggg gcagtactcg tgcataacaa tcgcgtaatc 120

ggcgaaggtt ggaataggcc gatcggacgc cacgacccca ctgcacatgc ggaaatcatg 180

gcccttcgac agggagggct tgtgatgcag aattatcgac ttatcgatgc gacgctgtac 240

gtcacgcttg aaccttgcgt aatgtgcgcg ggagctatga ttcactcccg cattggacga 300

gttgtattcg gtgcccgcga cgccaagacg ggtgccgcag gttcactgat ggacgtgctg 360

catcacccag gcatgaacca ccgggtagaa atcacagaag gcatattggc ggacgaatgt 420

gcggcgctgt tgtccgactt ttttcgcatg cggaggcagg agatcaaggc ccagaaaaaa 480

gcacaatcct ctactgactc tggagggtcc tccggcggat cgtccggcag cgagacgcca 540

ggcacctccg agagcgctac gcctgaatcc tccgggggat cttcaggagg atcatccgaa 600

gtcgagtttt cccatgagta ctggatgaga cacgcattga ctctcgcaaa gagggctcgg 660

gatgaacgcg aggtgcccgt gggggcagta ctcgtgctta acaatcgcgt aatcggcgaa 720

ggttggaata gggcgatcgg actccacgac cccactgcac atgcggaaat catggccctt 780

cgacagggag ggcttgtgat gcagaattat cgacttatcg atgcgacgct gtacgtcacg 840

tttgaacctt gcgtaatgtg cgcgggagct atgattcact cccgcattgg acgagttgta 900

ttcggtgtcc gcaacgccaa gacgggtgcc gcaggttcac tgatggacgt gctgcattac 960

ccaggcatga accaccgggt agaaatcaca gaaggcatat tggcggacga atgtgcggcg 1020

ctgttgtgct acttttttcg catgccgagg caggtgttca atgcccagaa aaaagcacaa 1080

tcctctactg ac 1092
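SEQ ID NO: 19 has the same length as SEQ ID NO: 5 (1092 nt) but a different nucleotide sequence. Comparing the two codon by codon shows whether the differences are synonymous, which is what a codon-recoded version of the same protein would look like. The sketch below checks only the first listing row of each, and its helper names are hypothetical. In that row, 9 of the 20 codons differ, yet both rows translate to the same 20 residues.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(dna: str) -> list[str]:
    """Split an in-frame DNA string (whitespace ignored) into upper-case codons."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("DNA length is not a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def compare_recoding(a: str, b: str) -> tuple[bool, list[int]]:
    """Return (encodes the same protein?, 1-based positions of differing codons)."""
    ca, cb = codons(a), codons(b)
    if len(ca) != len(cb):
        raise ValueError("sequences differ in length")
    same_protein = all(CODON_TABLE[x] == CODON_TABLE[y] for x, y in zip(ca, cb))
    differing = [i + 1 for i, (x, y) in enumerate(zip(ca, cb)) if x != y]
    return same_protein, differing


# First rows of SEQ ID NO: 5 and SEQ ID NO: 19, position numbers removed.
seq5_row1 = "tctgaagtcg agtttagcca cgagtattgg atgaggcacg cactgaccct ggcaaagcga"
seq19_row1 = "tccgaagtcg agttttccca tgagtactgg atgagacacg cattgactct cgcaaagagg"

same, diff = compare_recoding(seq5_row1, seq19_row1)
print(same, diff)
```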

<210> 20

<211> 19

<212> DNA

<213> Artificial sequence

<400> 20

tactggagtt gtacctgga 19

<210> 21

<211> 20

<212> DNA

<213> Artificial sequence

<400> 21

ggaacagctt gaacgtcaat 20

<210> 22

<211> 20

<212> DNA

<213> Artificial sequence

<400> 22

gaacagcctt ctcatcatga 20

<210> 23

<211> 20

<212> DNA

<213> Artificial sequence

<400> 23

ggtgaggatt tgggacaatt 20

<210> 24

<211> 20

<212> DNA

<213> Artificial sequence

<400> 24

ctgtgaatct gatgaagttt 20

<210> 25

<211> 20

<212> DNA

<213> Artificial sequence

<400> 25

gaaaagtaat aacaaagggc 20

<210> 26

<211> 22

<212> DNA

<213> Artificial sequence

<400> 26

aaatatccac accttactaa gg 22

<210> 27

<211> 22

<212> DNA

<213> Artificial sequence

<400> 27

aggtcccccg ccggatgatc gg 22
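The Cas9 coding sequence in this listing is derived from Streptococcus pyogenes, whose Cas9 recognizes an NGG protospacer-adjacent motif (PAM) 3' of a roughly 20-nt protospacer. The listing does not state how SEQ ID NO: 20 to 27 are used. Purely as an illustration, the sketch below scans both strands of a DNA string for 20-nt protospacers followed by NGG. The target is hypothetical: it is built by appending an invented TGG to the 20-nt SEQ ID NO: 22 and padding with A's. The regex lookahead keeps overlapping sites.

```python
import re


def revcomp(dna: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


def find_ngg_sites(dna: str, spacer_len: int = 20) -> list[tuple[str, str, str]]:
    """List (strand, protospacer, PAM) for every spacer_len-nt stretch followed by NGG.

    The protospacer is reported 5'->3' on the strand that carries the NGG.
    A zero-width lookahead is used so that overlapping sites are all found.
    """
    dna = dna.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    sites = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for m in pattern.finditer(seq):
            sites.append((strand, m.group(1), m.group(2)))
    return sites


# Hypothetical target: SEQ ID NO: 22 followed by an invented TGG, padded with A's.
target = "AAAAA" + "GAACAGCCTTCTCATCATGA" + "TGG" + "AAAAA"
print(find_ngg_sites(target))  # [('+', 'GAACAGCCTTCTCATCATGA', 'TGG')]
```

Real guide design would also weigh off-target matches and the positions of the editing window, which this scan does not attempt.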

Claims

1. A nucleic acid construct having the structure of formula I (from 5' to 3'):

P1-S1-L1-S2-S3 (I);

wherein, in formula (I):

L1 is absent or is the coding sequence of a linker peptide;

and each "-" is independently a bond or a nucleotide linking sequence.
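Claim 1 fixes only the 5'-to-3' order of the elements, the optional nature of L1, and the choice for each "-" between a direct bond and a linking sequence. The sketch below treats each element as an opaque string to show that structure. The function name and placeholder strings are hypothetical. Per claim 2, S1 would be a deaminase coding sequence and S2 a gene-editing enzyme coding sequence. The links argument is a list because each "-" is chosen independently.

```python
from typing import Optional, Sequence


def assemble_formula_i(
    p1: str,
    s1: str,
    s2: str,
    s3: str,
    l1: Optional[str] = None,
    links: Optional[Sequence[str]] = None,
) -> str:
    """Concatenate elements in the 5'->3' order P1-S1-L1-S2-S3.

    l1=None models "L1 is absent". Each "-" is one entry of `links`:
    "" for a direct bond, or a nucleotide linking sequence.
    """
    parts = [p1, s1] + ([l1] if l1 is not None else []) + [s2, s3]
    if links is None:
        links = [""] * (len(parts) - 1)  # every "-" is a direct bond
    if len(links) != len(parts) - 1:
        raise ValueError(f"expected {len(parts) - 1} links, got {len(links)}")
    construct = parts[0]
    for link, part in zip(links, parts[1:]):
        construct += link + part
    return construct


# Placeholder strings stand in for real element sequences.
print(assemble_formula_i("P1", "S1", "S2", "S3", l1="L1", links=["-"] * 4))  # P1-S1-L1-S2-S3
print(assemble_formula_i("P1", "S1", "S2", "S3"))  # P1S1S2S3
```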

2. The nucleic acid construct of claim 1, wherein S1 is a coding sequence for adenine deaminase and/or cytosine deaminase and S2 is a coding sequence for a gene editing enzyme.

3. A vector comprising the nucleic acid construct of claim 1.

4. A host cell comprising the nucleic acid construct of claim 1, or having integrated into its genome one or more nucleic acid constructs of claim 1.

5. A reagent combination, comprising:

P1-S1-L1-S2-S3 (I)

wherein,

L1 is absent or is the coding sequence of a linker peptide;

and "-" is a bond or a nucleotide linking sequence;

P2-Y1 (II);

wherein P2 is a second promoter;

Y1 is the coding sequence of a gRNA;

and "-" is a bond or a nucleotide linking sequence.

6. A kit comprising the reagent combination of claim 5.

7. A method of gene editing in a plant comprising the steps of:

(i) providing a plant to be edited; and

(ii) introducing the nucleic acid construct of claim 1, the vector of claim 3, or the reagent combination of claim 5 into a plant cell of the plant to be edited, thereby effecting gene editing in the plant cell.

8. A method of preparing a gene-edited plant cell comprising the steps of:

transfecting a plant cell with the nucleic acid construct of claim 1, the vector of claim 3, or the reagent combination of claim 5, such that a site-directed substitution (or mutation) occurs in a chromosome of the plant cell, thereby producing the gene-edited plant cell.

9. Use of the nucleic acid construct of claim 1, the vector of claim 3, the host cell of claim 4, the reagent combination of claim 5, or the kit of claim 6 for gene editing in a plant.

10. A method of making a gene-edited plant comprising the steps of:

regenerating the gene-edited plant cell produced by the method of claim 8 into a whole plant, thereby obtaining the gene-edited plant.