CN113774082A - Method for expressing nucleic acid - Google Patents

Info

Publication number
CN113774082A
Authority
CN
China
Prior art keywords
leu
glu
lys
arg
nucleic acid
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010442805.7A
Other languages
Chinese (zh)
Inventor
谢洪涛
李羽
张洋扬
刘帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Shunfeng Biotechnology Co Ltd
Original Assignee
Shandong Shunfeng Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shandong Shunfeng Biotechnology Co Ltd
Priority to CN202010442805.7A (patent CN113774082A)
Priority to CN202180003994.0A (patent CN113994007B)
Priority to PCT/CN2021/095310 (patent WO2021233442A1)
Publication of CN113774082A
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8206 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation by physical or chemical, i.e. non-biological, means, e.g. electroporation, PEG mediated
    • C12N15/8207 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation by physical or chemical, i.e. non-biological, means, e.g. electroporation, PEG mediated by mechanical means, e.g. microinjection, particle bombardment, silicon whiskers
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/10 Cells modified by introduction of foreign genetic material
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04001 Cytosine deaminase (3.5.4.1)
    • C12Y305/04002 Adenine deaminase (3.5.4.2)
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H5/00 Angiosperms, i.e. flowering plants, characterised by their plant parts; Angiosperms characterised otherwise than by their botanic taxonomy

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Cell Biology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Medicinal Chemistry (AREA)
  • Physiology (AREA)
  • Botany (AREA)
  • Developmental Biology & Embryology (AREA)
  • Environmental Sciences (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Breeding Of Plants And Reproduction By Means Of Culturing (AREA)

Abstract

The invention provides a method for expressing a nucleic acid, and in particular a nucleic acid construct. By using a nucleic acid construct driven by a specific promoter, efficient gRNA-guided site-directed base mutation is achieved in plants.

Description

Method for expressing nucleic acid
Technical Field
The invention relates to the field of biotechnology, and in particular to a method for expressing a nucleic acid.
Background
At present, single base editing efficiency in dicotyledonous plants is low. Most dicotyledonous plants, such as soybean, cannot yet be edited, and single base editing efficiency in plants such as Arabidopsis thaliana and tomato is low, which seriously limits the application of biotechnology breeding in agricultural production. Increasing the efficiency of single base editing in dicotyledonous plants therefore has great commercial value in agricultural production.
Accordingly, there is an urgent need in the art to develop a method for improving single base editing efficiency in plants.
Disclosure of Invention
The purpose of the present invention is to provide a method for improving the efficiency of single base editing in plants.
In a first aspect, the invention provides a nucleic acid construct having the structure of formula I from 5' to 3':
P1-S1-L1-S2-S3 (I);
wherein,
P1, S1, L1, S2 and S3 are each elements constituting the construct;
P1 is a first promoter sequence, said first promoter comprising the promoter of an elongation factor;
S1 and S2 are each independently one or more of (a) a coding sequence for a gene editing enzyme, (b) a coding sequence for an adenine deaminase and/or a cytosine deaminase;
L1 is absent or is the coding sequence of a linker peptide;
S3 is absent or is the coding sequence of a uracil glycosylase inhibitor (UGI);
and each "-" is independently a bond or a nucleotide linker sequence.
In another preferred embodiment, the S1 is a coding sequence of adenine deaminase and/or cytosine deaminase, and the S2 is a coding sequence of gene editing enzyme.
In another preferred embodiment, when S1 is the coding sequence of adenine deaminase, S3 is null.
In another preferred embodiment, when S1 is the coding sequence of cytosine deaminase, S3 is the coding sequence of the uracil glycosylase inhibitor (UGI).
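The element order of formula I, together with the rule in the preceding embodiments that S3 is absent for an adenine deaminase and is UGI for a cytosine deaminase, can be sketched as a small data model. This is an illustrative sketch only: the names `FormulaIConstruct` and `build_construct`, and the string labels standing in for real sequences, are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FormulaIConstruct:
    """Hypothetical model of formula I: P1-S1-L1-S2-S3 (5' to 3')."""
    p1: str            # first promoter (an elongation factor promoter)
    s1: str            # deaminase coding sequence (label only)
    l1: Optional[str]  # linker peptide coding sequence, or None if absent
    s2: str            # gene editing enzyme coding sequence (label only)
    s3: Optional[str]  # UGI coding sequence, or None if absent

    def elements(self) -> List[str]:
        """Return the elements that are present, in 5' to 3' order."""
        ordered = [self.p1, self.s1, self.l1, self.s2, self.s3]
        return [e for e in ordered if e is not None]


def build_construct(deaminase: str, linker: Optional[str] = "linker") -> FormulaIConstruct:
    """Apply the S3 rule: adenine deaminase -> S3 absent; cytosine deaminase -> S3 is UGI."""
    if deaminase == "adenine_deaminase":
        s3 = None
    elif deaminase == "cytosine_deaminase":
        s3 = "UGI"
    else:
        raise ValueError(f"unknown deaminase: {deaminase}")
    # Promoter and enzyme labels are placeholders chosen for illustration.
    return FormulaIConstruct("EF1a_promoter", deaminase, linker, "nCas9", s3)


abe = build_construct("adenine_deaminase")
cbe = build_construct("cytosine_deaminase")
print("-".join(abe.elements()))
print("-".join(cbe.elements()))
```

Modelling L1 and S3 as optional fields mirrors the wording that either element may be absent, so the same type covers both the ABE-style and CBE-style arrangements.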
In another preferred embodiment, the elongation factor comprises a eukaryotic elongation factor or a prokaryotic elongation factor.
In another preferred embodiment, the eukaryotic elongation factor comprises EF1α, EF1β, EF2.
In another preferred embodiment, the prokaryotic elongation factor comprises EF-Tu, EF-Ts, EF-G; preferably, the elongation factor comprises EF1α; more preferably, a plant EF1α.
In another preferred embodiment, the plant is selected from the group consisting of: corn, rice, soybean, arabidopsis, tobacco, tomato, or combinations thereof.
In another preferred embodiment, the first promoter is derived from one or more plants selected from the group consisting of: corn, rice, soybean, arabidopsis, tobacco, tomato.
In another preferred embodiment, the first promoter is the promoter of tomato EF1α.
In another preferred embodiment, the sequence of the first promoter is shown in SEQ ID No. 1.
In another preferred embodiment, the length of each L1 nucleotide sequence is independently 3-120nt, preferably 3-96nt, and preferably a multiple of 3.
In another preferred embodiment, the length of the amino acid sequence encoded by L1 is independently 3-40 aa, preferably 6-32 aa, more preferably 18-32 aa, most preferably 24-32 aa.
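The relationship between the nucleotide length of L1 and the length of the encoded linker peptide is simple codon arithmetic: three nucleotides per amino acid. The following is a minimal sketch of that check, using the broadest nucleotide range stated above as defaults; the function names are hypothetical, and the sketch assumes the linker coding sequence carries no stop codon.

```python
def linker_nt_length_ok(nt_len: int, lo: int = 3, hi: int = 120) -> bool:
    """True if the length lies in [lo, hi] nt and is a whole number of codons."""
    return lo <= nt_len <= hi and nt_len % 3 == 0


def encoded_aa_length(nt_len: int) -> int:
    """Number of amino acids encoded by nt_len nucleotides (no stop codon assumed)."""
    if nt_len % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    return nt_len // 3


print(linker_nt_length_ok(96), encoded_aa_length(96))
print(linker_nt_length_ok(97))
```

Under this arithmetic a 96 nt linker coding sequence encodes 32 amino acids and a 120 nt sequence encodes 40, which matches the upper bounds of the preferred and broadest ranges.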
In another preferred embodiment, the nucleotide linker sequence is 1 to 300nt, preferably 1 to 100nt in length.
In another preferred embodiment, the nucleotide linker sequence does not affect the normal transcription and translation of the elements.
In another preferred embodiment, the gene editing enzyme is an enzyme of an editing tool selected from the group consisting of: a CRISPR enzyme, TALEN enzyme, ZFN enzyme, or a combination thereof.
In another preferred example, the gene-editing enzyme is derived from a microorganism; preferably of bacterial origin.
In another preferred embodiment, the gene-editing enzyme is derived from a source selected from the group consisting of: Streptococcus pyogenes, Staphylococcus aureus, Streptococcus canis, or combinations thereof.
In another preferred embodiment, the gene-editing enzyme has double-stranded or single-stranded DNA cleavage activity, or no cleavage activity.
In another preferred example, the gene editing enzyme is a CRISPR enzyme having single-stranded DNA cleaving activity.
In another preferred embodiment, the gene-editing enzyme comprises a wild-type or mutant-type gene-editing enzyme.
In another preferred embodiment, the identity between the wild-type gene-editing enzyme and the mutant gene-editing enzyme is ≥80%, preferably ≥90%, more preferably ≥95%, still more preferably ≥98% or ≥99%.
In another preferred embodiment, said mutant gene-editing enzyme is derived from said wild-type gene-editing enzyme by substitution or deletion of one or more, preferably 1-15, preferably 1-10, preferably 1-7, more preferably 2-5, amino acids; and/or by addition of 1 to 5, preferably 1 to 4, more preferably 1 to 3, most preferably 1 to 2 amino acids.
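The text does not state how sequence identity is calculated. As one common convention, identity can be taken as the share of matching positions over the columns of a pairwise alignment. The sketch below assumes the two sequences have already been aligned to equal length with "-" as the gap character; the function names and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; columns that are gaps in both are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # uninformative column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """Check an identity threshold such as 80%, 90%, 95%, 98% or 99%."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: one substitution in six residues.
print(round(percent_identity("MKTLLE", "MKTALE"), 1))
print(meets_identity_threshold("MKTLLE", "MKTALE", 80.0))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so any threshold comparison should state which definition was used.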
In another preferred embodiment, the gene-editing enzyme is selected from the group consisting of: Cas9, Cas12, Cas13, Cms1, MAD7, or a combination thereof.
In another preferred embodiment, the gene-editing enzyme is selected from the group consisting of: nCas9, dCas9, nCas9NG, nCas9X, nCas12, nCas13, or a combination thereof.
In another preferred embodiment, the amino acid sequence of the gene-editing enzyme is as shown in SEQ ID No. 2.
In another preferred embodiment, the coding sequence for the gene-editing enzyme is selected from the group consisting of:
(i) a polynucleotide having a sequence as set forth in SEQ ID No. 3;
(ii) a polynucleotide having ≥75% homology (preferably ≥85%, more preferably ≥90%, ≥95%, ≥98% or ≥99%) with the sequence shown in SEQ ID No. 3;
(iii) a polynucleotide in which 1 to 60 (preferably 1 to 30, more preferably 1 to 10) nucleotides are truncated or added at the 5' end and/or the 3' end of the polynucleotide shown in SEQ ID No. 3;
(iv) a polynucleotide complementary to any one of the polynucleotides of (i) to (iii).
In another preferred embodiment, the coding sequence of the gene-editing enzyme is shown in SEQ ID No. 3.
In another preferred embodiment, the adenine deaminase comprises a wild type and a mutant.
In another preferred embodiment, the adenine deaminase comprises wild-type and/or mutant TadA.
In another preferred embodiment, the adenine deaminase comprises TadA.
In another preferred embodiment, the mutant form of adenine deaminase comprises TadA 7-10.
In another preferred embodiment, the adenine deaminase is a fusion protein of TadA and TadA 7-10.
In another preferred embodiment, the coding sequence for adenine deaminase is selected from the group consisting of:
(i) a polynucleotide having a sequence as set forth in SEQ ID No. 5 or 19;
(ii) a polynucleotide having ≥75% homology (preferably ≥85%, more preferably ≥90%, ≥95%, ≥98% or ≥99%) with the sequence shown in SEQ ID No. 5 or 19;
(iii) a polynucleotide in which 1 to 60 (preferably 1 to 30, more preferably 1 to 10) nucleotides are truncated or added at the 5' end and/or the 3' end of the polynucleotide shown in SEQ ID No. 5 or 19;
(iv) a polynucleotide complementary to any one of the polynucleotides of (i) to (iii).
In another preferred embodiment, the coding sequence of adenine deaminase is as shown in SEQ ID No. 5 or 19.
In another preferred embodiment, the amino acid sequence of the adenine deaminase is as shown in SEQ ID No. 4.
In another preferred embodiment, the cytosine deaminase comprises a wild type and a mutant.
In another preferred embodiment, the cytosine deaminase comprises APOBEC.
In another preferred embodiment, the APOBEC is selected from the group consisting of: APOBEC1(a1), APOBEC2(a2), APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3H, APOBEC4(a4), Activation Induced Deaminase (AID), or a combination thereof.
In another preferred embodiment, the mutant form of cytosine deaminase comprises CBE2.0, CBE2.1, CBE2.2, CBE2.3, CBE2.4.
In another preferred embodiment, the amino acid sequence of the cytosine deaminase is as shown in any one of SEQ ID No. 6, 8-11.
In another preferred embodiment, the nucleic acid construct is further operably linked to one or more localization signal sequences.
In another preferred embodiment, the localization signal is selected from the group consisting of: a nuclear localization signal, a chloroplast localization signal, a mitochondrial localization signal, or a combination thereof.
In another preferred embodiment, said localization signal comprises a nuclear localization signal, preferably comprising 1-2 nuclear localization signals.
In another preferred example, the nuclear localization signal comprises bpNLS, SV40.
In another preferred embodiment, the nucleotide sequence of the nuclear localization signal is as shown in any one of SEQ ID NO. 12-14.
In another preferred embodiment, the amino acid sequence of the nuclear localization signal is shown in SEQ ID No. 15.
In another preferred embodiment, the nucleotide sequence of the S3 element is shown in SEQ ID NO. 16.
In another preferred embodiment, the nucleic acid construct is further operably linked to one or more second nucleic acid constructs of formula II:
P2-Y1 (II);
wherein,
P2 is a second promoter sequence;
Y1 is the coding sequence of a gRNA;
and each "-" is independently a bond or a nucleotide linker sequence.
In another preferred embodiment, when at least two nucleic acid constructs of formula II are present, the gRNA sequences may differ from each other.
In another preferred embodiment, the nucleic acid construct of formula II is located at the 5 'end or 3' end of the nucleic acid construct of formula I or distributed at both ends.
In another preferred embodiment, the gRNA includes crRNA, tracrRNA, sgRNA.
In another preferred embodiment, the second promoter is derived from one or more plants selected from the group consisting of: rice, maize, soybean, arabidopsis, tobacco or tomato.
In another preferred embodiment, the second promoter comprises an RNA polymerase III dependent promoter.
In another preferred embodiment, the second promoter is an RNA polymerase III dependent promoter.
In another preferred embodiment, the second promoter is selected from the group consisting of: U6, U3, U6a, U6b, U6c, U6-1, U3b, U3d, U6-26, U6-29, H1, or combinations thereof.
In another preferred embodiment, the second promoter comprises the U6 promoter.
In another preferred embodiment, the above-mentioned nucleotide elements of the present invention are linked in frame so as to express a fusion protein with the correct amino acid sequence.
In another preferred embodiment, the nucleic acid construct of formula I and the nucleic acid construct of formula II further each independently have a terminator.
In another preferred embodiment, the nucleic acid construct of formula I and the nucleic acid construct of formula II share the same terminator.
In another preferred embodiment, the terminator comprises a terminator suitable for plant gene editing.
In another preferred embodiment, the terminator is selected from the group consisting of: NOS, Poly A, T-UBQ, rbcS, or a combination thereof.
In another preferred embodiment, the construct has a structure of formula IIIa or formula IIIb:
P1-S1-L1-S2-S3-P2-Y1 (IIIa);
P2-Y1-P1-S1-L1-S2-S3 (IIIb);
wherein each element is as defined above.
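Formulas IIIa and IIIb differ only in whether the gRNA cassette of formula II sits 3' or 5' of the formula I cassette. A minimal sketch of the two orderings follows, using the element symbols as plain strings; the function name `combine` is hypothetical.

```python
FORMULA_I = ["P1", "S1", "L1", "S2", "S3"]  # editor fusion cassette
FORMULA_II = ["P2", "Y1"]                   # gRNA cassette


def combine(grna_cassette_first: bool) -> str:
    """Join the two cassettes 5' to 3'; gRNA cassette first gives IIIb, last gives IIIa."""
    parts = FORMULA_II + FORMULA_I if grna_cassette_first else FORMULA_I + FORMULA_II
    return "-".join(parts)


print(combine(grna_cassette_first=False))  # formula IIIa ordering
print(combine(grna_cassette_first=True))   # formula IIIb ordering
```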
In another preferred embodiment, the nucleic acid construct is further operably linked to a first integration element (I1) and a second integration element (I2).
In another preferred embodiment, said first integration element comprises a 5' homology arm sequence. In another preferred embodiment, said second integration element comprises a 3' homology arm sequence.
In another preferred embodiment, one or more additional expression cassettes are additionally inserted between the I1 and I2 elements.
In another preferred embodiment, the additional expression cassette is separate from the expression cassette comprising the nucleic acid construct of formula I and the expression cassette comprising the nucleic acid construct of formula II.
In another preferred embodiment, said additional expression cassette expresses an agent selected from the group consisting of: a marker gene.
In another preferred embodiment, the marker gene comprises a resistance gene (e.g., a hygromycin resistance gene, a herbicide resistance gene), a fluorescent gene, or a combination thereof.
In a second aspect, the invention provides a vector comprising a nucleic acid construct according to the first aspect of the invention.
In another preferred embodiment, the vector is a plant expression vector.
In another preferred embodiment, the vector is an expression vector that can transfect or transform a plant cell.
In another preferred embodiment, the vector is an agrobacterium Ti vector.
In another preferred embodiment, said construct is integrated into the T-DNA region of said vector.
In another preferred embodiment, the vector is circular or linear.
In a third aspect, the invention provides a host cell comprising a nucleic acid construct according to the first aspect of the invention, or having integrated into its genome one or more nucleic acid constructs according to the first aspect of the invention.
In another preferred embodiment, the cell is a plant cell.
In another preferred embodiment, the plant is selected from the group consisting of: a monocot, a dicot, a gymnosperm, or a combination thereof.
In another preferred embodiment, the plant is selected from the group consisting of: a graminaceous plant, a leguminous plant, a cruciferous plant, a solanaceae plant, an Umbelliferae plant, or a combination thereof.
In another preferred embodiment, the plant is selected from the group consisting of: arabidopsis, wheat, barley, oats, maize, rice, sorghum, millet, soybean, peanut, tobacco, tomato, cabbage, canola, spinach, lettuce, cucumber, garland chrysanthemum, water spinach, celery, lettuce, or combinations thereof.
In another preferred embodiment, the host cell is obtained by introducing the nucleic acid construct of claim 1 into a cell by a method selected from the group consisting of: agrobacterium transformation, particle gun, microinjection, electroporation, ultrasound, and polyethylene glycol (PEG) mediated methods.
In a fourth aspect, the present invention provides a reagent combination comprising:
(i) a first nucleic acid construct, or a first vector comprising said first nucleic acid construct, said first nucleic acid construct having the structure of formula I from 5' to 3':
P1-S1-L1-S2-S3 (I);
wherein,
P1 is a first promoter sequence, said first promoter comprising the promoter of an elongation factor;
S1 and S2 are each independently one or more of (a) a coding sequence for a gene editing enzyme, (b) a coding sequence for an adenine deaminase and/or a cytosine deaminase;
L1 is absent or is the coding sequence of a linker peptide;
S3 is absent or is the coding sequence of a uracil glycosylase inhibitor (UGI);
and each "-" is a bond or a nucleotide linker sequence;
(ii) a second nucleic acid construct, or a second vector comprising the second nucleic acid construct, the second nucleic acid construct having the structure of formula II from 5' to 3':
P2-Y1 (II);
wherein P2 is a second promoter;
Y1 is the coding sequence of a gRNA;
and each "-" is a bond or a nucleotide linker sequence.
In another preferred embodiment, the first vector and the second vector are different vectors.
In another preferred embodiment, the first nucleic acid construct and the second nucleic acid construct are located on different vectors.
In another preferred embodiment, the first vector and the second vector are the same vector.
In another preferred embodiment, the first nucleic acid construct and the second nucleic acid construct are located on the same vector.
According to a fifth aspect of the invention there is provided a kit comprising a combination of reagents according to the fourth aspect of the invention.
In another preferred embodiment, the kit further comprises a label or instructions.
The sixth aspect of the present invention provides a method for gene editing in a plant, comprising the steps of:
(i) providing a plant to be edited; and
(ii) introducing a nucleic acid construct according to the first aspect of the invention, a vector according to the second aspect of the invention or a reagent combination according to the fourth aspect of the invention into a plant cell of said plant to be edited, thereby effecting gene editing in said plant cell.
In another preferred embodiment, the introduction is by Agrobacterium.
In another preferred embodiment, the introduction is by gene gun.
In another preferred embodiment, the gene editing is site-directed base substitution (or mutation).
In another preferred embodiment, the site-directed substitution (or mutation) comprises a mutation of A to G.
In another preferred embodiment, the site-directed substitution (or mutation) comprises a mutation of C to T.
In another preferred embodiment, the plant includes any higher plant type that can be subjected to transformation techniques, including monocots, dicots and gymnosperms.
In another preferred embodiment, the plant is a dicotyledonous plant.
In another preferred embodiment, the plant is selected from the group consisting of: a graminaceous plant, a leguminous plant, a cruciferous plant, a solanaceae plant, an Umbelliferae plant, or a combination thereof.
In another preferred embodiment, the plant is selected from the group consisting of: arabidopsis, wheat, barley, oats, maize, rice, sorghum, millet, soybean, peanut, tobacco, tomato, cabbage, canola, spinach, lettuce, cucumber, garland chrysanthemum, water spinach, celery, lettuce, or combinations thereof.
The seventh aspect of the present invention provides a method for preparing a gene-edited plant cell, comprising the steps of:
transfecting a plant cell with a nucleic acid construct according to the first aspect of the invention, a vector according to the second aspect of the invention or a reagent combination according to the fourth aspect of the invention, such that a site-directed substitution (or mutation) occurs in a chromosome of said plant cell, thereby producing said gene-edited plant cell.
In another preferred embodiment, the transfection is carried out by Agrobacterium transformation or gene gun bombardment.
In an eighth aspect, the invention provides a use of a nucleic acid construct according to the first aspect of the invention, a vector according to the second aspect of the invention, a host cell according to the third aspect of the invention, a reagent combination according to the fourth aspect of the invention, or a kit according to the fifth aspect of the invention, for gene editing in a plant.
The ninth aspect of the present invention provides a method for preparing a gene-edited plant, comprising the steps of:
regenerating the gene-edited plant cell produced by the method of the seventh aspect of the present invention into a plant body, thereby obtaining the gene-edited plant.
In a tenth aspect, the present invention provides a gene-edited plant prepared by the method of the ninth aspect.
It is to be understood that, within the scope of the present invention, the above-described features of the present invention and those specifically described below (e.g., in the examples) may be combined with each other to form new or preferred embodiments. For reasons of space, these are not described one by one here.
Drawings
FIG. 1 shows the structure of the ABE single base editor containing slEF1α.
FIG. 2 shows the efficiency of different promoters in single base editing in tomato.
FIG. 3 shows single base editing efficiency in soybean using different promoters and different base editors.
Detailed Description
After extensive and intensive research, the present inventors have unexpectedly found, for the first time, that a highly efficient EF promoter (e.g., the tomato EF promoter), when built into ABE and CBE single base editing systems to drive expression of a fusion protein composed of (a) a gene-editing enzyme and (b) an adenine deaminase and/or a cytosine deaminase, significantly improves editing efficiency in plants. The present invention has been completed on this basis.
Terms
As used herein, the term "homology arm" refers to the flanking sequences on both sides of the foreign sequence to be inserted on a targeting vector, which are identical to the genomic sequence and serve to identify the region where recombination occurs.
As used herein, the term "plant promoter" refers to a nucleic acid sequence capable of initiating transcription of a nucleic acid in a plant cell. The plant promoter may be derived from plants, microorganisms (such as bacteria, viruses) or animals, or may be a promoter artificially synthesized or modified.
As used herein, the term "gene editing" or "base mutation" or "base editing" refers to the occurrence of a base substitution, insertion, and/or deletion at a position in a nucleotide sequence. The "editing" or "mutation" in the present invention is preferably a single-base mutation.
As used herein, the term "base substitution" refers to a mutation of a base at a position in a nucleotide sequence to another, different base, such as a mutation of A to G.
As used herein, the term "A.T to G.C" refers to a mutation or substitution of an A-T base pair to a G-C base pair at a position in a double-stranded nucleic acid sequence (particularly a genomic sequence).
As used herein, the term "C.G to T.A" refers to a mutation or substitution of a C-G base pair to a T-A base pair at a position in a double-stranded nucleic acid sequence (particularly a genomic sequence).
As used herein, the term "gene editing enzyme" refers to a nuclease suitable for editing tools such as CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats), TALEN (Transcription Activator-Like Effector Nucleases), and ZFN (Zinc Finger Nucleases). Preferably, the gene-editing enzyme is a CRISPR enzyme, also known as a Cas protein, including, but not limited to: Cas9 protein, Cas12 protein, Cas13 protein, Cas14 protein, Csm1 protein, and FDK1 protein. "Cas protein" refers to a protein family whose members can have different structures depending on their source, such as SpCas9 derived from Streptococcus pyogenes and SaCas9 derived from Staphylococcus aureus; members can be further classified according to structural features (e.g., domains), such as the Cas12 family, which includes Cas12a (also known as Cpf1), Cas12b, Cas12c, Cas12i, and the like. The Cas protein may have double-strand cleavage activity, single-strand cleavage activity, or no cleavage activity. The Cas protein can be a wild type or a mutant thereof; the mutation types of the mutant include amino acid substitution or deletion, and the mutation may or may not change the cleavage activity of the Cas protein. Preferably, the Cas protein of the present invention has only single-strand cleavage activity or no cleavage activity and is a mutant of a wild-type Cas protein. Preferably, the Cas protein of the present invention is Cas9, Cas12, Cas13 or Cas14 having single-strand cleavage activity. In a preferred embodiment, the Cas9 proteins of the present invention include SpCas9n (D10A), nSpCas9NG, SaCas9n, ScCas9n, and xCas9n, wherein "n" represents nick, i.e., a Cas protein having only single-strand cleavage activity. Mutating a known Cas protein to obtain a Cas protein with single-strand cleavage activity or no cleavage activity is a routine technical means in the art.
As known to those skilled in the art, many Cas proteins with nucleic acid cleavage activity have been reported in the prior art; these known proteins and modified variants thereof can achieve the functions of the present invention and are incorporated herein by reference.
As used herein, the term "coding sequence for a Cas protein" refers to a nucleotide sequence encoding a Cas protein. In the case where the inserted polynucleotide sequence is transcribed and translated to produce a functional Cas protein, the skilled artisan will recognize that, because of the degeneracy of the codons, a large number of polynucleotide sequences may encode the same polypeptide. In addition, the skilled artisan will also recognize that different species have certain preferences for codons, and that codons of Cas proteins may be optimized as desired for expression in different species, and such variants are specifically encompassed by the term "coding sequence for Cas protein". Furthermore, the term specifically includes full-length, substantially identical sequences to Cas gene sequences, as well as sequences encoding proteins that retain Cas protein function.
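The codon degeneracy mentioned above can be made concrete with a short count. The following is a minimal sketch in Python, assuming the standard genetic code; the four-residue peptide is a hypothetical toy input and is not taken from any SEQ ID of this application.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return all codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_coding_sequences(peptide):
    """Count the distinct nucleotide sequences that encode the peptide."""
    total = 1
    for amino_acid in peptide:
        total *= len(synonymous_codons(amino_acid))
    return total


# Hypothetical toy peptide (Leu-Glu-Lys-Arg), used only for illustration.
example_peptide = "LEKR"
n_sequences = count_coding_sequences(example_peptide)
print(n_sequences)
```

Even this four-residue toy peptide has many synonymous coding sequences, which is why codon-optimized variants for different host species fall under the same term.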
As used herein, "gRNA", also referred to as guide RNA, has the meaning commonly understood by those skilled in the art. In general, the guide RNA may comprise, or consist essentially of, a direct repeat and a guide sequence. Depending on the Cas protein used in a given CRISPR system, gRNAs may include a crRNA and a tracrRNA, or only a crRNA. The crRNA and tracrRNA may be artificially fused to form a single guide RNA (sgRNA). The gRNA of the invention can be natural, or can be artificially modified or designed and synthesized. In certain instances, the guide sequence is any polynucleotide sequence that is sufficiently complementary to the target sequence to hybridize to the target sequence and direct specific binding of the CRISPR/Cas complex to the target sequence, and it typically has a length of 17-23 nt. In certain embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned, is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the optimal alignment is within the ability of one of ordinary skill in the art. For example, there are published and commercially available alignment algorithms and programs such as, but not limited to, ClustalW, the Smith-Waterman algorithm (e.g., as implemented in MATLAB), Bowtie, Geneious, Biopython, and SeqMan.
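As an illustration of the degree of complementarity described above, the following Python sketch scores a guide sequence against a target strand. It assumes a simplified, ungapped, full-length pairing rather than a true optimal alignment, and writes the guide in DNA letters (U written as T). The function names are hypothetical; the guide is sgRNA 2 from Example 1, and the mismatched target is constructed only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def percent_complementarity(guide, target_strand):
    """Percentage of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3' and must have equal length, so
    guide position i pairs with the target position counted from the 3' end.
    Gaps and bulges are not modelled (simplifying assumption).
    """
    if len(guide) != len(target_strand):
        raise ValueError("sequences must have the same length")
    paired = sum(
        COMPLEMENT[g] == t for g, t in zip(guide, reversed(target_strand))
    )
    return 100.0 * paired / len(guide)


guide = "GGAACAGCTTGAACGTCAAT"  # sgRNA 2 from Example 1
perfect_target = reverse_complement(guide)
# Introduce two mismatches by swapping the first two target bases
# for their complements (illustrative only).
mismatched_target = (
    "".join(COMPLEMENT[b] for b in perfect_target[:2]) + perfect_target[2:]
)

perfect_score = percent_complementarity(guide, perfect_target)
mismatch_score = percent_complementarity(guide, mismatched_target)
print(perfect_score, mismatch_score)
```

In practice one of the alignment programs listed above would be used to find the optimal alignment first; this sketch only shows how the percentage itself is formed once the pairing is fixed.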
As used herein, the term "plant" includes whole plants, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, and progeny of the same. The type of plant that can be used in the method of the invention is not particularly limited and generally includes any plant type that can be subjected to gene editing techniques, including monocotyledons, dicotyledons, gymnosperms and angiosperms, and mainly including woody plants.
The term "expression cassette" as used herein refers to a polynucleotide sequence comprising the sequence components of the gene to be expressed and the elements required for expression. Components required for expression include a promoter and polyadenylation signal sequence. In addition, the expression cassettes of the invention optionally contain other sequences including (but not limited to): enhancers, secretory signal peptide sequences, and the like.
In the present invention, the nucleotide sequence is described in the 5 'to 3' direction unless otherwise noted.
As used herein, "uracil DNA glycosylase inhibitor (UGI)" refers to an inhibitor capable of inhibiting intracellular uracil DNA glycosylase, thereby preventing U from being converted back to C.
EF promoter
The EF promoter refers to the promoter of an elongation factor (EF), a protein factor that promotes extension of the polypeptide chain during mRNA translation. Elongation factors in eukaryotes include EF1α, EF1β and EF2. Elongation factors in prokaryotes include EF-Tu, EF-Ts and EF-G. EF1a (eukaryotic elongation factor 1α) is an important component of protein biosynthesis. It catalyzes the binding of aminoacyl-tRNA to the ribosomal A site through a GTP-dependent mechanism. Accounting for 3-10% of the total soluble protein, EF1a is considered one of the most abundant soluble proteins in the cytoplasm.
In a preferred embodiment, the EF promoter includes, but is not limited to: the EF1a promoter, EF1β promoter, EF2 promoter, EF-Tu promoter, EF-Ts promoter, and EF-G promoter.
In a preferred embodiment, the promoter of the present invention refers to the EF1a promoter element derived from a plant of the Solanaceae family (preferably, from tomato or similar plants).
A typical promoter of the present invention has the sequence shown in SEQ ID No. 1.
It is understood that the term also includes promoters from other Solanaceae plants that are homologous to the promoter shown in SEQ ID No. 1. In addition, the term also includes derived promoters or active fragments of the promoter shown in SEQ ID No. 1 or of homologous promoters thereof, provided that these derived promoters or active fragments retain the ability to support efficient gene editing, for example, retain at least 50% of the promoter function (expressed as the expression level of the foreign gene that can be driven) of the promoter shown in SEQ ID No. 1.
As used herein, the term "solanaceous plant" includes tomato, potato, eggplant, pepper, wolfberry, and tobacco.
As used herein, the term "promoter" or "promoter region" refers to a nucleic acid sequence that functions precisely and efficiently to initiate the transcription of a gene, directing the transcription of the gene's nucleic acid sequence into mRNA. It is usually located upstream (5') of the coding sequence of the gene of interest and generally provides a recognition site for RNA polymerase and other factors necessary for proper initiation of transcription.
Herein, the promoter or promoter region (domain) includes a variant of the promoter, which can be obtained by inserting or deleting a regulatory region, performing random or site-directed mutagenesis, or the like.
The present invention also includes nucleic acids having 50% or more (preferably 60% or more, 70% or more, 80% or more, more preferably 90% or more, more preferably 95% or more, most preferably 98% or more, e.g., 99%) homology to the preferred promoter sequences of the present invention (SEQ ID No.:1), which also have a function of specifically increasing the efficiency of gene editing in plants. "homology" refers to the level of similarity (i.e., sequence similarity or identity) between two or more nucleic acids in terms of percentage positional identity.
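The homology thresholds above can be read as percent identity over an alignment. The following Python sketch shows one way to compute that figure and map it to the highest tier it satisfies. It assumes the two sequences have already been aligned (gaps written as "-") and counts identity over all alignment columns, which is only one of several conventions in use; the application does not specify which is intended. The sequences are hypothetical toy strings, not SEQ ID No. 1.

```python
# Thresholds transcribed from the paragraph above, highest first.
IDENTITY_TIERS = (99, 98, 95, 90, 80, 70, 60, 50)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    matches = sum(a == b and a != "-" for a, b in columns)
    return 100.0 * matches / len(columns)


def highest_tier(identity):
    """Return the highest threshold met, or None if below 50%."""
    for tier in IDENTITY_TIERS:
        if identity >= tier:
            return tier
    return None


# Hypothetical toy alignment: one mismatch and one gap in ten columns.
reference = "ATGCATGCAT"
candidate = "ATGCTTGC-T"
identity = percent_identity(reference, candidate)
tier = highest_tier(identity)
print(identity, tier)
```

For real promoter sequences the alignment itself would come from a dedicated alignment tool; only the scoring step is sketched here.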
It is understood that although the EF1a promoter from a Solanaceae plant such as tomato is provided in the examples of the present invention, promoters derived from other similar plants (particularly plants of the same family as tomato) and having some homology (conservation) to the promoter of the present invention are also included within the scope of the present invention, provided that one skilled in the art, after reading the present application, can readily isolate such a promoter from other plants based on the information provided herein.
As used herein, "exogenous" or "heterologous" refers to the relationship between two or more nucleic acid or protein sequences of different origin. For example, a promoter is foreign to a gene of interest if the combination of the promoter and the sequence of the gene of interest is not normally found in nature. A particular sequence is "foreign" to the cell or organism into which it is inserted.
As used herein, "cis-regulatory element" refers to a conserved base sequence that acts to regulate the transcription initiation and transcription efficiency of a gene.
The promoter of the present invention may be operably linked to an exogenous gene, which may be exogenous (heterologous) with respect to the promoter. The foreign gene (also referred to as a target gene) of the present invention is not particularly limited, and may be a gene encoding a protein having a specific function, such as (a) a gene-editing enzyme and (b) an adenine deaminase and/or a cytosine deaminase.
Representative examples of such exogenous genes include (but are not limited to): resistance genes, selection marker genes, epitope tags, reporter gene sequences, nuclear localization signal sequences, transcription activation domains (e.g., VP64), transcription inhibition domains (e.g., KRAB domains or SID domains), nuclease domains (e.g., Fok1), viral capsid protein genes, antibody genes, and domains having an activity selected from the group consisting of nucleotide deaminase, methylase activity, demethylase, transcription activation activity, transcription inhibition activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, and nucleic acid binding activity.
The resistance gene is selected from the following group: a herbicide-resistant gene, an antiviral gene, a cold-resistant gene, a high-temperature-resistant gene, a drought-resistant gene, a waterlogging-resistant gene, or an insect-resistant gene. The screening marker gene is selected from the following group: gus (β-glucuronidase) gene, hyg (hygromycin) gene, neo (neomycin) gene, or gfp (green fluorescent protein) gene.
The invention also provides a gene expression cassette, which comprises the following elements from 5 'to 3': a promoter, a gene ORF sequence, and a terminator. Preferably, the promoter sequence is shown in SEQ ID No. 1 or has homology of more than or equal to 90%, preferably more than or equal to 95%, and more preferably more than or equal to 98% with the sequence shown in SEQ ID No. 1.
The invention also provides a recombinant vector comprising the promoter and/or the gene expression cassette of the invention. In a preferred embodiment, the recombinant vector comprises a multiple cloning site or at least one restriction site downstream of the promoter. When the target gene is to be expressed, it is ligated into a suitable multiple cloning site or restriction site, thereby operably linking the target gene with the promoter. As another preferred mode, the recombinant vector comprises (in the 5' to 3' direction): a promoter, a gene of interest, and a terminator. If desired, the recombinant vector may further comprise an element selected from the group consisting of: a 3' polyadenylation signal; an untranslated nucleic acid sequence; transport and targeting nucleic acid sequences; resistance selection markers (dihydrofolate reductase, neomycin resistance, hygromycin resistance, green fluorescent protein, etc.); an enhancer; or an operator.
One of ordinary skill in the art can use well-known methods to construct expression vectors containing the promoter and/or gene sequences of interest described herein. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like.
The promoter, expression cassette or vector of the present invention may be used to transform an appropriate host cell to allow the host to express the protein. The host cell may be a prokaryotic cell, such as E. coli, Streptomyces, or Agrobacterium; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a plant cell. It will be clear to one of ordinary skill in the art how to select an appropriate vector and host cell. Transformation of a host cell with recombinant DNA can be carried out using conventional techniques well known to those skilled in the art. When the host is a prokaryote (e.g., Escherichia coli), CaCl2 treatment or electroporation may be used. When the host is a eukaryote, the following DNA transfection methods may be used: calcium phosphate coprecipitation or conventional mechanical methods (e.g., microinjection, electroporation, liposome encapsulation, etc.). Plants may be transformed by methods such as Agrobacterium transformation or biolistic transformation, for example, the leaf disc method, immature embryo transformation, the floral dip method, etc. The transformed plant cells, tissues or organs can be regenerated into plants by conventional methods to obtain transgenic plants.
As a preferred mode of the present invention, a method for producing a transgenic plant is: a vector carrying a promoter and a target gene (which are operably linked) is transferred into Agrobacterium, and the Agrobacterium then integrates a vector fragment containing the promoter and the target gene into the plant chromosome. The transgenic recipient plant can be Arabidopsis thaliana, wheat, barley, oat, corn, rice, sorghum, millet, soybean, peanut, tobacco, tomato, cabbage, rape, spinach, lettuce, cucumber, garland chrysanthemum, swamp cabbage, celery, and leaf lettuce. In the embodiment of the invention, the recombinant vector is a pCAMBIA1300 vector, and the promoter of the invention is constructed into the vector to transform plants.
In a preferred embodiment, the invention clones an EF promoter (such as a tomato SlEF1a promoter), and uses the promoter to drive the expression of a fusion protein coding sequence of Cas enzyme and deaminase, so as to finally obtain a system for high-efficiency single base substitution and gene knockout of dicotyledonous plants.
Adenine deaminase
As used herein, the term "adenine deaminase" refers to an enzyme that catalyzes the hydrolytic deamination of adenine to form hypoxanthine and ammonia. Adenine (A) is converted to hypoxanthine (I), which can pair with cytosine and is read and copied at the DNA level as guanine (G), resulting in the conversion of an A.T pair to a G.C pair. The TadA adenine deaminase is derived from Escherichia coli, and the ecTadA mutant is obtained by artificial modification. Dimers of TadA and the ecTadA mutant are currently commonly used adenine deaminases.
In the present invention, suitable TadA comprises both the wild type form and its specific mutant form TadA7-10, or a combination of both the wild type form and the mutant form. TadA7-10 is capable of deaminating with DNA as a substrate.
In the present invention, the adenine deaminase coding sequence in the nucleic acid construct can be codon optimized in a manner preferred by the host, depending on the host.
Cytosine deaminase
As used herein, the term "cytosine deaminase (APOBEC)" refers to an enzyme that catalyzes the deamination of intracellular cytosine to uracil. Cytosine (C) is converted to uracil (U), which is read as T by DNA polymerase during DNA replication, resulting in the conversion of a C.G pair to a T.A pair. Eleven members of the APOBEC family have been found, including APOBEC1 (A1), APOBEC2 (A2), APOBEC3A-H (3A, 3B, 3C, 3D, 3E, 3F, 3H), APOBEC4 (A4), and activation-induced deaminase (AID).
In the present invention, suitable cytosine deaminases comprise both the wild-type form and specific mutated forms thereof (e.g. CBE2.0, CBE2.1, CBE2.2, CBE2.3, CBE2.4) and also combinations of wild-type and mutated forms. Mutant forms of cytosine deaminases are capable of deaminating using DNA as a substrate.
In the present invention, the cytosine deaminase coding sequence in the nucleic acid construct can be codon optimized in a manner that is preferred by the host, depending on the host.
In a preferred embodiment of the invention, the preferred cytosine deaminase is CBE2.0, CBE2.1, CBE2.2, CBE2.3, or CBE2.4.
The amino acid sequence of CBE2.0 is shown in SEQ ID NO. 6, and the nucleotide sequence is shown in SEQ ID NO. 7.
The amino acid sequence of CBE2.1 is shown in SEQ ID NO. 8.
The amino acid sequence of CBE2.2 is shown in SEQ ID NO. 9.
The amino acid sequence of CBE2.3 is shown in SEQ ID NO. 10.
The amino acid sequence of CBE2.4 is shown in SEQ ID NO. 11.
Construction of the invention
The invention provides a nucleic acid construct for gene editing in a plant, said nucleic acid construct having a 5'-3' structure of formula I:
P1-S1-L1-S2-S3 (I);
wherein
P1, S1, L1, S2 and S3 are each elements constituting the construct, as defined in the first aspect of the invention;
and each "-" is a bond or a nucleotide connecting sequence.
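The 5'-3' ordering of formula I, in which each "-" is either a direct bond or a connecting sequence, can be sketched as a simple concatenation. The Python below is illustrative only: the element sequences are hypothetical placeholders rather than any SEQ ID of this application, the function name is an assumption, and the actual definitions of P1, S1, L1, S2 and S3 are those of the first aspect of the invention, which are not restated in this section.

```python
FORMULA_I_ORDER = ("P1", "S1", "L1", "S2", "S3")


def assemble_construct(elements, connectors=None):
    """Concatenate the formula I elements in 5'->3' order.

    elements:   dict mapping each element name to its nucleotide sequence.
    connectors: optional dict mapping (left, right) element-name pairs to a
                connecting sequence; a missing entry means a direct bond.
    """
    connectors = connectors or {}
    missing = [name for name in FORMULA_I_ORDER if name not in elements]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    parts = []
    for index, name in enumerate(FORMULA_I_ORDER):
        if index > 0:
            previous = FORMULA_I_ORDER[index - 1]
            parts.append(connectors.get((previous, name), ""))
        parts.append(elements[name])
    return "".join(parts)


# Hypothetical placeholder sequences, for illustration only.
toy_elements = {
    "P1": "TTGACA",
    "S1": "ATGGCC",
    "L1": "GGATCC",
    "S2": "GACAAG",
    "S3": "TAA",
}
bonded = assemble_construct(toy_elements)
with_connector = assemble_construct(toy_elements, {("S1", "L1"): "GGT"})
print(bonded)
print(with_connector)
```

Keeping the connectors separate from the elements mirrors the wording of the formula, where the junctions are optional and the element order is fixed.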
In a preferred embodiment, the nucleic acid construct is further operably linked to one or more second nucleic acid constructs of formula II:
P2-Y1 (II);
wherein P2 and Y1 are as defined in the first aspect of the invention.
In a preferred embodiment, the nucleic acid construct is further operably linked to a first integration element (I1) and a second integration element (I2).
Wherein the I1 element (or left integrating element) and the I2 element (or right integrating element) can act synergistically to integrate the elements located therebetween (i.e., the nucleotide sequence from P1 to Y1) into the genome of a plant cell.
Representative I1 and I2 are Ti elements from Agrobacterium. Of course, other elements that may serve a similar integration function may also be used with the present invention.
The various elements used in the constructs of the invention are either known in the art or can be prepared by methods known to those skilled in the art. For example, the constructs of the present invention can be formed by conventional methods, such as PCR, total artificial chemical synthesis, enzymatic digestion to obtain the corresponding elements, and then ligating them together by well-known DNA ligation techniques.
The vector of the present invention is formed by inserting the construct of the present invention into a foreign vector, particularly a vector suitable for the manipulation of transgenic plants.
The vector of the present invention is used to transform plant cells, thereby mediating integration of the vector into the plant cell chromosome; the vector is then expressed in the plant to produce gene-edited plant cells.
The gene-edited plant cell of the present invention is regenerated into a plant body, thereby obtaining a gene-edited plant.
The constructed nucleic acid constructs of the present invention can be introduced into plant cells by conventional plant recombination techniques (e.g., Agrobacterium transfer techniques) to obtain plant cells harboring the nucleic acid construct (or a vector carrying the nucleic acid construct), or to obtain plant cells having the nucleic acid construct integrated into their genome.
For plants of the present invention into which the nucleic acid construct has been introduced, the construct can be segregated away or removed in their progeny by conventional screening or by other means known in the art, thereby producing gene-edited plants that do not contain the nucleic acid construct.
Specifically, the invention drives the expression of a gene editing enzyme (such as Cas9) and deaminase fusion protein coding sequence by a specific EF promoter, such as tomato EF1a, so as to improve the gene editing efficiency.
Vector construction
The vector is mainly characterized in that a specific EF promoter (such as tomato EF1a), the coding sequence of the deaminase-Cas fusion protein, and optionally a nuclear localization signal and a UGI coding sequence are linked together to form the specific nucleic acid construct. After the fusion protein encoded by the nucleic acid construct is produced in the cytoplasm, it can be transferred into the nucleus very efficiently and is guided by the guide RNA encoded by the construct of formula II to the target position in the genome, so that base substitution from A.T to G.C or from C.G to T.A is carried out at the target position. The risk of insertion/deletion is thereby substantially avoided or eliminated, and the efficiency of gene editing can be significantly improved.
Since adenine deaminase mutates A to G and cytosine deaminase mutates C to T, the DNA double-strand cleavage activity of the Cas protein is not required. Thus, in the present invention the Cas protein is a mutated Cas protein with no cleavage activity or with single-strand cleavage activity. In a preferred embodiment, the Cas protein of the present invention may be nCas9, the amino acid sequence of which is shown in SEQ ID No. 2. Generally, in order to increase the activity of the fusion protein, the component proteins are connected by a flexible short peptide, i.e., a linker (linker peptide sequence). Preferably, the linker can be XTEN, the coding sequence of which is shown in SEQ ID NO. 17 and the amino acid sequence of which is shown in SEQ ID NO. 18.
An expression cassette for guide RNA suitable for plant cells was selected and constructed in the same vector as the expression cassette containing the open reading frame (ORF) of the fusion protein described above.
In the present invention, the vector may be, for example, a plasmid, a virus, a cosmid, a phage, etc., all of which are well known to those skilled in the art and widely described in the art. Preferably, the expression vector in the present invention is a plasmid. Expression vectors can include promoters, ribosome binding sites for translation initiation, polyadenylation sites, transcription terminators, enhancers, and the like. The expression vector may also contain one or more selectable marker genes for use in selecting host cells containing the vector. Such selectable markers include the gene encoding dihydrofolate reductase, genes conferring neomycin resistance, genes conferring resistance to tetracycline or ampicillin, and the like.
The nucleic acid constructs of the invention may be inserted into the vector by a variety of methods, for example, by ligation following digestion of the insert and vector with appropriate restriction endonucleases. A variety of cloning techniques are known in the art and are within the knowledge of those skilled in the art.
Vectors suitable for use in the present invention include commercially available plasmids such as, but not limited to: pBR322 (ATCC 37017), pCAMBIA1300, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, Wis., USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8, pCM7, pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, pSVL (Pharmacia), etc.
Genetic transformation
In the present invention, there is no particular limitation on the method of introducing the construct of formula I of the present invention into cells or integrating it into the genome. This can be carried out in a conventional manner, for example by introducing the construct of formula I or the corresponding vector into plant cells by a suitable method. Representative methods of introduction include, but are not limited to: Agrobacterium-mediated transformation, particle bombardment (gene gun), microinjection, electroporation, ultrasound, and polyethylene glycol (PEG)-mediated methods.
In the present invention, the recipient plant is not particularly limited, and includes various crop plants (e.g., gramineae), forestry plants, horticultural plants (e.g., flowers), and the like. Representative examples include, but are not limited to: rice, soybean, tomato, corn, tobacco, wheat, sorghum, potato, and the like.
After the above DNA vector or fragment is introduced into a plant cell, the fusion protein and gRNA are expressed from the DNA in the transformed plant cell. Under the guidance of the corresponding gRNA, a gene editing enzyme (such as Cas9 nuclease) fused with adenine deaminase and/or cytosine deaminase mutates A at a target position to G (thereby mutating T of the complementary strand to C) or C at a target position to T (thereby mutating G of the complementary strand to A).
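The outcome described above, an edit on one strand and the matching change on the complementary strand, can be sketched in a few lines of Python. This is a bookkeeping illustration only: it does not model the editing window, PAM requirements, or bystander edits, and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Substitutions made by each editor type on the edited strand.
EDITS = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def apply_base_edit(top_strand, position, editor):
    """Edit one base on the top strand and return both strands after repair.

    position is a 0-based index into top_strand; editor is "ABE" or "CBE".
    Returns (edited_top, edited_bottom), both written 5'->3'.
    """
    source, replacement = EDITS[editor]
    if top_strand[position] != source:
        raise ValueError(f"{editor} expects {source} at position {position}")
    edited_top = top_strand[:position] + replacement + top_strand[position + 1:]
    return edited_top, reverse_complement(edited_top)


# Hypothetical toy sequences, for illustration only.
abe_top, abe_bottom = apply_base_edit("GGTGAGGATT", 4, "ABE")
cbe_top, cbe_bottom = apply_base_edit("TTCAG", 2, "CBE")
print(abe_top, abe_bottom)
print(cbe_top, cbe_bottom)
```

Comparing each returned bottom strand with the reverse complement of the unedited input shows the T-to-C (for ABE) or G-to-A (for CBE) change on the complementary strand.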
For plant cells or tissues or organs subjected to site-specific replacement of plant genomes by the method, corresponding gene-edited plants can be obtained by regeneration by a conventional method. For example, the plant after the base substitution is regenerated by tissue culture.
Applications
The invention can be used in the field of plant genetic engineering, for plant research and breeding, in particular for genetic improvement of crops, forestry crops or horticultural plants with economic value.
The main advantages of the invention include:
(1) For the first time, the invention links a specific promoter (such as the EF1a promoter) to the coding sequences of a gene editing enzyme (such as Cas9 nuclease) and adenine deaminase and/or cytosine deaminase, optionally together with a nuclear localization signal and UGI, thereby forming the specific nucleic acid construct of the invention. This construct successfully achieves gRNA-guided site-directed base mutation (such as mutation of A to G) in plants with very high mutation efficiency (70% or higher).
(2) Certain nucleic acid constructs of the invention can edit gene sites at which other promoters do not function, thereby circumventing the genotype restriction barrier to gene editing.
(3) The specific nucleic acid construct of the invention can edit plants in which other promoters do not function, such as soybean, thereby effectively expanding the application range of the gene editing system and eliminating species barriers.
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Experimental procedures for which no specific conditions are noted in the following examples are generally performed according to conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or according to the manufacturer's recommendations. Unless otherwise indicated, percentages and parts are by weight. Unless otherwise specified, the test materials and reagents used in the present invention are commercially available.
Example 1 Single base editing efficiency of different promoters in tomato
1. Target selection
Solyc05g012020, which affects fruit development in tomato, was selected as the target gene, and 6 target sites were selected for sgRNA design. The 6 designed sgRNAs have the following sequences: sgRNA 1: TACTGGAGTTGTACCTGGA (SEQ ID No.:20), sgRNA 2: GGAACAGCTTGAACGTCAAT (SEQ ID No.:21), sgRNA 3: GAACAGCCTTCTCATCATGA (SEQ ID No.:22), sgRNA 4: GGTGAGGATTTGGGACAATT (SEQ ID No.:23), sgRNA 5: CTGTGAATCTGATGAAGTTT (SEQ ID No.:24), sgRNA 6: GAAAAGTAATAACAAAGGGC (SEQ ID No.:25).
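A quick sanity check of the six guide sequences can be sketched in Python as follows. The sequences are transcribed from the paragraph above; the 17-23 nt bounds follow the typical guide length given in the definition of gRNA. The helper names are hypothetical, and listing adenine positions is only a convenience for an ABE experiment, since the application does not state an editing window.

```python
SGRNAS = {
    "sgRNA 1": "TACTGGAGTTGTACCTGGA",
    "sgRNA 2": "GGAACAGCTTGAACGTCAAT",
    "sgRNA 3": "GAACAGCCTTCTCATCATGA",
    "sgRNA 4": "GGTGAGGATTTGGGACAATT",
    "sgRNA 5": "CTGTGAATCTGATGAAGTTT",
    "sgRNA 6": "GAAAAGTAATAACAAAGGGC",
}


def is_valid_guide(seq, min_len=17, max_len=23):
    """True if seq uses only A/C/G/T and its length is within the bounds."""
    return set(seq) <= set("ACGT") and min_len <= len(seq) <= max_len


def adenine_positions(seq):
    """1-based positions of A within the guide, counted from the 5' end."""
    return [index + 1 for index, base in enumerate(seq) if base == "A"]


guide_lengths = [len(seq) for seq in SGRNAS.values()]
all_valid = all(is_valid_guide(seq) for seq in SGRNAS.values())

for name, seq in SGRNAS.items():
    print(name, len(seq), adenine_positions(seq))
```

Running the check shows that sgRNA 1 is one nucleotide shorter than the other five, which still falls inside the 17-23 nt range.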
2. Vector construction
An expression cassette of the ABE single base editor (see FIG. 1) was obtained by homologous recombination, wherein the nucleotide sequence of the adenine deaminase ABE7.10 is shown in SEQ ID NO. 5 or 19 and the nucleotide sequence of the SlEF1a promoter is shown in SEQ ID NO. 1. The specific procedure was as follows:
A) Using tomato genomic DNA as a template, the target fragment was amplified with the forward/reverse primers pSlEF1a-F/pSlEF1a-R to obtain a PCR product (about 1583 bp in length; primer annealing temperature 58 ℃).
Figure BDA0002504566540000191
The PCR reaction conditions are as follows: pre-denaturation at 95 ℃ for 5 min; 35 cycles of denaturation at 98 ℃ for 30 s, annealing at 58 ℃ for 30 s, and extension at 72 ℃ for 45 s; and a final extension at 72 ℃ for 5 min.
B) The vector proAtU6-gRNA-pro35S-ABE7.10-nspCas9 was digested with the restriction endonucleases SbfI and SalI, and the vector backbone was recovered.
C) The PCR product obtained in A) was ligated into the backbone vector obtained in B) by homologous recombination to obtain the single-base editing vector proAtU6-gRNA-proSlEF1a-ABE7.10-nspCas9.
Figure BDA0002504566540000201
The reaction conditions are as follows: 50 ℃ for 30 min.
D) Escherichia coli was transformed, and single colonies were picked and sequenced to verify that the fragment had been successfully ligated into the vector.
Single-base editing vectors containing 35S, UBI, AtRPS5A, SlRPS5A1, SlRPS5A2 and SlTCTP promoters were constructed in the same manner.
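For planning purposes, the PCR program given in step A) above can be written down as data and its total hold time added up. The Python below is a minimal sketch: the step values are transcribed from the text, while ramp times between temperatures are not stated in the application and are ignored here, so the actual run takes longer.

```python
# Each step is (name, temperature in degrees C, hold time in seconds).
# Ramp times are not given in the text and are ignored (assumption).
INITIAL_STEPS = [("pre-denaturation", 95, 5 * 60)]
CYCLE_STEPS = [
    ("denaturation", 98, 30),
    ("annealing", 58, 30),
    ("extension", 72, 45),
]
FINAL_STEPS = [("final extension", 72, 5 * 60)]
CYCLES = 35


def hold_seconds(steps):
    """Sum the hold times of a list of steps."""
    return sum(seconds for _name, _temp_c, seconds in steps)


def total_hold_seconds():
    """Total programmed hold time for the whole run, in seconds."""
    return (
        hold_seconds(INITIAL_STEPS)
        + CYCLES * hold_seconds(CYCLE_STEPS)
        + hold_seconds(FINAL_STEPS)
    )


total_seconds = total_hold_seconds()
print(f"{total_seconds} s = {total_seconds / 60:.2f} min")
# prints: 4275 s = 71.25 min
```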
3. Genetic transformation
(A) The plasmid constructed above was directly transformed into Agrobacterium EHA105:
(1) Plasmid DNA was added to Agrobacterium competent cells, which were incubated on ice for 30 min, placed in liquid nitrogen for 5 min, immediately transferred to a 37 ℃ water bath for 5 min, and then placed on ice for 5 min.
(2) The centrifuge tube was taken out, 700 µl of YEP medium was added, and the cells were cultured with shaking for 2-4 h.
(3) The bacterial suspension was taken out and spread on a YEP medium plate containing the corresponding antibiotics, and the plate was incubated upside down in an incubator for about 2 days until colonies were visible.
(B) Tomato transgenesis
(1) Sterile tomato seedlings 7-10 d old (cotyledons fully expanded and the first true leaf just emerging) were taken, and the cotyledons were cut into 5 mm square pieces (the tip and a small part of the base of the leaf were cut off, leaving the middle part). The pieces were placed on pre-culture medium with the upper side facing up and cultured in the dark at 25 ℃ for 2 d.
(2) The bacterial stock preserved at -80 ℃ was streaked on solid YEB medium and cultured in the dark at 28 ℃ for 2 days. A single colony was picked into 5 ml of liquid YEB medium and cultured at 28 ℃ and 200 rpm for 1 day. 2 ml of this culture was added to 50 ml of fresh YEB medium and cultured at 28 ℃ and 200 rpm. The culture was centrifuged at 4 ℃ and 5000 rpm for 10 min, the cells were resuspended in infection buffer, and the OD600 was adjusted to about 0.6-0.8.
(3) The cotyledons pre-cultured for 2 d were immersed in the bacterial suspension for 5-10 min, the excess suspension was blotted off on filter paper, and the cotyledons were placed with the upper side facing up on co-culture medium (or on filter paper soaked with bacteria-free infection solution) and cultured in the dark at 25 ℃ for 2 d.
(4) The cotyledons co-cultured for 2 d were transferred to sterilization medium and cultured at 25 ℃ for 7 d, in the dark for the first 2-3 d and in the light for the last 4-5 d. After these 7 d of culture, the cotyledons were transferred to selection medium and cultured for 30-45 d, with subculturing every 15 d.
(5) Marker gene detection after sterilization culture (taking GUS as an example): several cotyledons were taken after 7 d of sterilization culture and stained for GUS, and the infection time was adjusted according to the size of the stained area. (This is not done for every batch; it is done periodically to check the activity of the bacteria.)
(6) Detection of damage to the cotyledons by Agrobacterium: several cotyledons were taken after 7 d of sterilization culture and grown on sterilization medium for about a further 30 d, with subculturing every 15 d. The differentiation rate of the cotyledons was observed to judge the degree of damage caused by the bacterial suspension, and the infection time was adjusted accordingly.
(7) When the differentiated shoots had grown to about 2 cm, they were cut off, transferred to rooting medium, and cultured until roots grew out.
(8) The robust differentiated seedlings were transferred to rooting medium containing antibiotics for one week of rooting culture, hardened off at room temperature for 2-3 d, and then grown in substrate in a greenhouse.
(9) Gene editing detection: leaves were taken from each plant, genomic DNA was extracted, and primers were designed on both sides of the gRNA target site. The amplified fragments were subjected to Sanger sequencing to determine the genotype of each plant.
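The genotype determination in step (9) can be sketched as a comparison between a reference amplicon and a base-called Sanger read. The Python example below is a minimal sketch under strong assumptions that are not part of the original text: the two sequences are already aligned and of equal length, there are no insertions or deletions, and heterozygous (double-peak) positions are not modeled. The function names and the example sequences are hypothetical.

```python
from typing import List, Tuple

Substitution = Tuple[int, str, str]  # (1-based position, reference base, sample base)


def find_substitutions(reference: str, sample: str) -> List[Substitution]:
    """List every position at which the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != sample_base
    ]


def only_expected_edits(subs: List[Substitution], from_base: str, to_base: str) -> bool:
    """True if at least one substitution exists and all are from_base -> to_base."""
    return bool(subs) and all(r == from_base and s == to_base for _, r, s in subs)


if __name__ == "__main__":
    # Hypothetical sequences for illustration only.
    reference = "ACGTACGTAC"
    edited = "ACGTGCGTAC"
    subs = find_substitutions(reference, edited)
    print(subs, only_expected_edits(subs, "A", "G"))
```

For an adenine base editor the expected change is A to G on the edited strand; if the read covers the opposite strand, the same edit appears as T to C, so the `from_base` and `to_base` arguments would need to be chosen accordingly.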
4. Results of the experiment
The SlEF1a promoter achieved an editing efficiency of up to 70% in single-base editing, which was 2-20 fold higher than that of the other promoters (see Fig. 2).
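The text does not state how editing efficiency was calculated. One common convention, assumed here purely for illustration, is the fraction of genotyped plants that carry the intended edit; the fold difference between two promoters is then the ratio of their efficiencies. The function names and the plant counts in the following Python sketch are hypothetical and are not data from this experiment.

```python
def editing_efficiency(edited_plants: int, genotyped_plants: int) -> float:
    """Fraction of genotyped plants carrying the intended edit (assumed definition)."""
    if genotyped_plants <= 0:
        raise ValueError("at least one plant must be genotyped")
    if not 0 <= edited_plants <= genotyped_plants:
        raise ValueError("edited_plants must lie between 0 and genotyped_plants")
    return edited_plants / genotyped_plants


def fold_difference(efficiency: float, reference_efficiency: float) -> float:
    """Ratio of one promoter's efficiency to that of a reference promoter."""
    if reference_efficiency <= 0:
        raise ValueError("reference efficiency must be positive")
    return efficiency / reference_efficiency


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    promoter_a = editing_efficiency(14, 20)
    promoter_b = editing_efficiency(2, 20)
    print(f"{promoter_a:.0%} vs {promoter_b:.0%}: "
          f"{fold_difference(promoter_a, promoter_b):.1f}-fold")
```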
5. Conclusion of the experiment
The SlEF1a promoter can efficiently drive expression of the deaminase-Cas9 fusion protein, which effectively expands the range of application of single-base editing tools and is of great significance for improving plant traits and breeding new varieties.
Example 2 Single base editing efficiency of different promoters in Soybean
The GmELF3a and GmALS1 genes in soybean were selected, and different promoters and different base editors were used to examine the single-base editing efficiency of the different promoters in soybean. The gRNAs used are shown in the following table:
Figure BDA0002504566540000211
First, following the procedure of Example 1, the editing efficiencies of the SlEF1a promoter (pSlEF1a), the CaMV35S promoter (35S) and the AA6 promoter (pAA6, ref: CN101370939A) were examined when used in combination with ABE7.10 (SEQ ID NO: 5 or 19) and Cas9. As shown in Fig. 3, "A to G gRNA 1" gives the results for the different promoters used in combination with the above adenine deaminase; in soybean, the editing efficiency obtained with the SlEF1a promoter was much higher than that obtained with the CaMV35S and AA6 promoters.
In addition, following the same procedure, the adenine deaminase was replaced with a cytosine deaminase whose amino acid sequence is shown in SEQ ID NO: 6 or 8-11 (in this example, the cytosine deaminase shown in SEQ ID NO: 6 is preferred), and the editing efficiencies of the different promoters were examined when used in combination with the cytosine deaminase and Cas9. As shown in Fig. 3, "C to T gRNA 2" gives the results for the different promoters used in combination with the cytosine deaminase; in soybean, the editing efficiency obtained with the SlEF1a promoter was much higher than that obtained with the CaMV35S and AA6 promoters.
All documents referred to herein are incorporated by reference into this application as if each were individually incorporated by reference. Furthermore, it should be understood that various changes and modifications of the present invention can be made by those skilled in the art after reading the above teachings of the present invention, and these equivalents also fall within the scope of the present invention as defined by the appended claims.
Sequence listing
<110> Shandong Shunfeng Biotechnology Co., Ltd.
<120> A method for nucleic acid expression
<130> P2020-0390
<160> 27
<170> SIPOSequenceListing 1.0
<210> 1
<211> 1583
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 1
gattagtttg tcaaatagta gagttcattt aaaattcttc agccatatag ttctattttt 60
aagctagtcg actttttttt tcttactgaa aattaatatt tttttctttt tgaaatacta 120
atacatctaa atttaacaat tgccaaagtg atttttaatt agcttgctgg ctaatcacaa 180
taaaaattac tctcctttac tatataagta aatttttatt gctatatttg ttattattat 240
tattattatt aatatttatt ttctacaaat ttaataatat tttattttat atcattttaa 300
aaagataagt aatgaaatat taagaattcg tttataattc ttttgcaggt gggtttctat 360
ttgtaagcta atctttttca gttatccttt ttttaaaatc tttattatta ttatagctat 420
atcttttatc ttttaaaatt aacattatct attaaagata atttcaataa aagagtaaaa 480
attaatttag agttctactg tcttcaaatt tctattttaa aaaatacttt taaaacttga 540
tgtatttttt acgtggtttt tcactatgac ttaatttctg ttttattata atatgtataa 600
atataaaaat agattttcca taacatatta taaaaaatgt aaggggcatt tacgtaaata 660
gatagactta aaagaggcac cgagtgaacc ctaattctca tcgttgagac tataaaatgc 720
ccattatccc attcgcacag tctcttcatt acttttgctg ttatttctcc tcagctgtgc 780
cgcatatcgc ctaatttttc ttctctaagg tttcatcatc ttcaccaatt tctttaatct 840
cgattcaatt ttttatgttt gatctgttat tgttctgtca ctacatgtgt ttttcagttg 900
ttttactaga tgattttcac tgtcttcttg ttagatcata catatattga aaatgttttg 960
gattgacttt tttgtattgt gaatatctgt tattgtttga ttgttgttca gtatttacac 1020
acccgatctg tgttatgagc ttggtcataa ctatttctct gtatgtaaat acagatctgt 1080
taatgtttgt aatcaatttt tcatatgcac tgttgatatt gttctctctc ctgtcctgtt 1140
atatgttgat atgattcggt ttttgtataa cttgaactaa acactagtcc taaatgtttt 1200
ttttactatt taagatttat ataatatgga tagatttttt gagttcctag tctctgaaga 1260
ggttaagctt gctgtagttg tttaccagtt gaggtgcaat actaaaaatc aattcaatta 1320
ctgatatttt ttgctgttta ggtttttgac aaagtacttt aatttgcttt attgaactaa 1380
aaacgtagtc ctgaattcat tgcaagtgtg aaagctatag ttcattgttt ttgttgcaat 1440
tcttgaaaaa ttaattggtc aagctataat ggattttact ttttctgttt taatattgaa 1500
tttgctgaat ttatgaatgg gttgcatggt ttttgaaata tgttgttgtg tgttgtgtaa 1560
atgcagtttc ttagtgtctc aag 1583
<210> 2
<211> 1368
<212> PRT
<213> Streptococcus pyogenes (Streptococcus pyogenes)
<400> 2
Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser
1025 1030 1035 1040
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu
1045 1050 1055
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1060 1065 1070
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser
1075 1080 1085
Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly
1090 1095 1100
Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile
1105 1110 1115 1120
Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser
1125 1130 1135
Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1140 1145 1150
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile
1155 1160 1165
Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala
1170 1175 1180
Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys
1185 1190 1195 1200
Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser
1205 1210 1215
Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr
1220 1225 1230
Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val
1265 1270 1275 1280
Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys
1285 1290 1295
His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1300 1305 1310
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp
1315 1320 1325
Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp
1330 1335 1340
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile
1345 1350 1355 1360
Asp Leu Ser Gln Leu Gly Gly Asp
1365
<210> 3
<211> 4101
<212> DNA
<213> Streptococcus pyogenes (Streptococcus pyogenes)
<400> 3
gacaagaagt acagcatcgg cctggccatc ggcaccaact ctgtgggctg ggccgtgatc 60
accgacgagt acaaggtgcc cagcaagaaa ttcaaggtgc tgggcaacac cgaccggcac 120
agcatcaaga agaacctgat cggagccctg ctgttcgaca gcggcgaaac agccgaggcc 180
acccggctga agagaaccgc cagaagaaga tacaccagac ggaagaaccg gatctgctat 240
ctgcaagaga tcttcagcaa cgagatggcc aaggtggacg acagcttctt ccacagactg 300
gaagagtcct tcctggtgga agaggataag aagcacgagc ggcaccccat cttcggcaac 360
atcgtggacg aggtggccta ccacgagaag taccccacca tctaccacct gagaaagaaa 420
ctggtggaca gcaccgacaa ggccgacctg cggctgatct atctggccct ggcccacatg 480
atcaagttcc ggggccactt cctgatcgag ggcgacctga accccgacaa cagcgacgtg 540
gacaagctgt tcatccagct ggtgcagacc tacaaccagc tgttcgagga aaaccccatc 600
aacgccagcg gcgtggacgc caaggccatc ctgtctgcca gactgagcaa gagcagacgg 660
ctggaaaatc tgatcgccca gctgcccggc gagaagaaga atggcctgtt cggaaacctg 720
attgccctga gcctgggcct gacccccaac ttcaagagca acttcgacct ggccgaggat 780
gccaaactgc agctgagcaa ggacacctac gacgacgacc tggacaacct gctggcccag 840
atcggcgacc agtacgccga cctgtttctg gccgccaaga acctgtccga cgccatcctg 900
ctgagcgaca tcctgagagt gaacaccgag atcaccaagg cccccctgag cgcctctatg 960
atcaagagat acgacgagca ccaccaggac ctgaccctgc tgaaagctct cgtgcggcag 1020
cagctgcctg agaagtacaa agagattttc ttcgaccaga gcaagaacgg ctacgccggc 1080
tacattgacg gcggagccag ccaggaagag ttctacaagt tcatcaagcc catcctggaa 1140
aagatggacg gcaccgagga actgctcgtg aagctgaaca gagaggacct gctgcggaag 1200
cagcggacct tcgacaacgg cagcatcccc caccagatcc acctgggaga gctgcacgcc 1260
attctgcggc ggcaggaaga tttttaccca ttcctgaagg acaaccggga aaagatcgag 1320
aagatcctga ccttccgcat cccctactac gtgggccctc tggccagggg aaacagcaga 1380
ttcgcctgga tgaccagaaa gagcgaggaa accatcaccc cctggaactt cgaggaagtg 1440
gtggacaagg gcgcttccgc ccagagcttc atcgagcgga tgaccaactt cgataagaac 1500
ctgcccaacg agaaggtgct gcccaagcac agcctgctgt acgagtactt caccgtgtat 1560
aacgagctga ccaaagtgaa atacgtgacc gagggaatga gaaagcccgc cttcctgagc 1620
ggcgagcaga aaaaggccat cgtggacctg ctgttcaaga ccaaccggaa agtgaccgtg 1680
aagcagctga aagaggacta cttcaagaaa atcgagtgct tcgactccgt ggaaatctcc 1740
ggcgtggaag atcggttcaa cgcctccctg ggcacatacc acgatctgct gaaaattatc 1800
aaggacaagg acttcctgga caatgaggaa aacgaggaca ttctggaaga tatcgtgctg 1860
accctgacac tgtttgagga cagagagatg atcgaggaac ggctgaaaac ctatgcccac 1920
ctgttcgacg acaaagtgat gaagcagctg aagcggcgga gatacaccgg ctggggcagg 1980
ctgagccgga agctgatcaa cggcatccgg gacaagcagt ccggcaagac aatcctggat 2040
ttcctgaagt ccgacggctt cgccaacaga aacttcatgc agctgatcca cgacgacagc 2100
ctgaccttta aagaggacat ccagaaagcc caggtgtccg gccagggcga tagcctgcac 2160
gagcacattg ccaatctggc cggcagcccc gccattaaga agggcatcct gcagacagtg 2220
aaggtggtgg acgagctcgt gaaagtgatg ggccggcaca agcccgagaa catcgtgatc 2280
gaaatggcca gagagaacca gaccacccag aagggacaga agaacagccg cgagagaatg 2340
aagcggatcg aagagggcat caaagagctg ggcagccaga tcctgaaaga acaccccgtg 2400
gaaaacaccc agctgcagaa cgagaagctg tacctgtact acctgcagaa tgggcgggat 2460
atgtacgtgg accaggaact ggacatcaac cggctgtccg actacgatgt ggaccatatc 2520
gtgcctcaga gctttctgaa ggacgactcc atcgacaaca aggtgctgac cagaagcgac 2580
aagaaccggg gcaagagcga caacgtgccc tccgaagagg tcgtgaagaa gatgaagaac 2640
tactggcggc agctgctgaa cgccaagctg attacccaga gaaagttcga caatctgacc 2700
aaggccgaga gaggcggcct gagcgaactg gataaggccg gcttcatcaa gagacagctg 2760
gtggaaaccc ggcagatcac aaagcacgtg gcacagatcc tggactcccg gatgaacact 2820
aagtacgacg agaatgacaa gctgatccgg gaagtgaaag tgatcaccct gaagtccaag 2880
ctggtgtccg atttccggaa ggatttccag ttttacaaag tgcgcgagat caacaactac 2940
caccacgccc acgacgccta cctgaacgcc gtcgtgggaa ccgccctgat caaaaagtac 3000
cctaagctgg aaagcgagtt cgtgtacggc gactacaagg tgtacgacgt gcggaagatg 3060
atcgccaaga gcgagcagga aatcggcaag gctaccgcca agtacttctt ctacagcaac 3120
atcatgaact ttttcaagac cgagattacc ctggccaacg gcgagatccg gaagcggcct 3180
ctgatcgaga caaacggcga aaccggggag atcgtgtggg ataagggccg ggattttgcc 3240
accgtgcgga aagtgctgag catgccccaa gtgaatatcg tgaaaaagac cgaggtgcag 3300
acaggcggct tcagcaaaga gtctatcctg cccaagagga acagcgataa gctgatcgcc 3360
agaaagaagg actgggaccc taagaagtac ggcggcttcg acagccccac cgtggcctat 3420
tctgtgctgg tggtggccaa agtggaaaag ggcaagtcca agaaactgaa gagtgtgaaa 3480
gagctgctgg ggatcaccat catggaaaga agcagcttcg agaagaatcc catcgacttt 3540
ctggaagcca agggctacaa agaagtgaaa aaggacctga tcatcaagct gcctaagtac 3600
tccctgttcg agctggaaaa cggccggaag agaatgctgg cctctgccgg cgaactgcag 3660
aagggaaacg aactggccct gccctccaaa tatgtgaact tcctgtacct ggccagccac 3720
tatgagaagc tgaagggctc ccccgaggat aatgagcaga aacagctgtt tgtggaacag 3780
cacaagcact acctggacga gatcatcgag cagatcagcg agttctccaa gagagtgatc 3840
ctggccgacg ctaatctgga caaagtgctg tccgcctaca acaagcaccg ggataagccc 3900
atcagagagc aggccgagaa tatcatccac ctgtttaccc tgaccaatct gggagcccct 3960
gccgccttca agtactttga caccaccatc gaccggaaga ggtacaccag caccaaagag 4020
gtgctggacg ccaccctgat ccaccagagc atcaccggcc tgtacgagac acggatcgac 4080
ctgtctcagc tgggaggcga c 4101
<210> 4
<211> 364
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 4
Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr
1 5 10 15
Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala Val
20 25 30
Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro Ile
35 40 45
Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln
50 55 60
Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr
65 70 75 80
Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser
85 90 95
Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly Ala
100 105 110
Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His Arg
115 120 125
Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu
130 135 140
Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys Lys
145 150 155 160
Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly
165 170 175
Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly
180 185 190
Gly Ser Ser Gly Gly Ser Ser Glu Val Glu Phe Ser His Glu Tyr Trp
195 200 205
Met Arg His Ala Leu Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu
210 215 220
Val Pro Val Gly Ala Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu
225 230 235 240
Gly Trp Asn Arg Ala Ile Gly Leu His Asp Pro Thr Ala His Ala Glu
245 250 255
Ile Met Ala Leu Arg Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu
260 265 270
Ile Asp Ala Thr Leu Tyr Val Thr Phe Glu Pro Cys Val Met Cys Ala
275 280 285
Gly Ala Met Ile His Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg
290 295 300
Asn Ala Lys Thr Gly Ala Ala Gly Ser Leu Met Asp Val Leu His Tyr
305 310 315 320
Pro Gly Met Asn His Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp
325 330 335
Glu Cys Ala Ala Leu Leu Cys Tyr Phe Phe Arg Met Pro Arg Gln Val
340 345 350
Phe Asn Ala Gln Lys Lys Ala Gln Ser Ser Thr Asp
355 360
<210> 5
<211> 1092
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 5
tctgaagtcg agtttagcca cgagtattgg atgaggcacg cactgaccct ggcaaagcga 60
gcatgggatg aaagagaagt ccccgtgggc gccgtgctgg tgcacaacaa tagagtgatc 120
ggagagggat ggaacaggcc aatcggccgc cacgacccta ccgcacacgc agagatcatg 180
gcactgaggc agggaggcct ggtcatgcag aattaccgcc tgatcgatgc caccctgtat 240
gtgacactgg agccatgcgt gatgtgcgca ggagcaatga tccacagcag gatcggaaga 300
gtggtgttcg gagcacggga cgccaagacc ggcgcagcag gctccctgat ggatgtgctg 360
caccaccccg gcatgaacca ccgggtggag atcacagagg gaatcctggc agacgagtgc 420
gccgccctgc tgagcgattt ctttagaatg cggagacagg agatcaaggc ccagaagaag 480
gcacagagct ccaccgactc tggaggatct agcggaggtt cctctggaag cgagacacca 540
ggcacaagcg agtccgccac accagagagc tccggcggct cctccggagg ctcctctgag 600
gtggagtttt cccacgagta ctggatgaga catgccctga ccctggccaa gagggcacgc 660
gatgagaggg aggtgcctgt gggagccgtg ctggtgctga acaatagagt gatcggcgag 720
ggctggaaca gagccatcgg cctgcacgac ccaacagccc atgccgaaat tatggccctg 780
agacagggcg gcctggtcat gcagaactac agactgattg acgccaccct gtacgtgaca 840
ttcgagcctt gcgtgatgtg cgccggcgcc atgatccact ctaggatcgg ccgcgtggtg 900
tttggcgtga ggaacgcaaa aaccggcgcc gcaggctccc tgatggacgt gctgcactac 960
cccggcatga atcaccgcgt cgaaattacc gagggaatcc tggcagatga atgtgccgcc 1020
ctgctgtgct atttctttcg gatgcctaga caggtgttca atgctcagaa gaaggcccag 1080
agctccaccg ac 1092
<210> 6
<211> 228
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 6
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg
20 25 30
Lys Glu Thr Cys Leu Leu Tyr Glu Ile Lys Trp Gly Thr Ser His Lys
35 40 45
Ile Trp Arg His Ser Ser Lys Asn Thr Thr Lys His Val Glu Val Asn
50 55 60
Phe Ile Glu Lys Phe Thr Ser Glu Arg His Phe Cys Pro Ser Thr Ser
65 70 75 80
Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser
85 90 95
Lys Ala Ile Thr Glu Phe Leu Ser Gln His Pro Asn Val Thr Leu Val
100 105 110
Ile Tyr Val Ala Arg Leu Tyr His His Met Asp Gln Gln Asn Arg Gln
115 120 125
Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Gln Ile Met Thr
130 135 140
Ala Pro Glu Tyr Asp Tyr Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro
145 150 155 160
Gly Lys Glu Ala His Trp Pro Arg Tyr Pro Pro Leu Trp Met Lys Leu
165 170 175
Tyr Ala Leu Glu Leu His Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu Lys
225
<210> 7
<211> 684
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 7
agcagtgaaa ccggaccagt ggcagtggac ccaaccctga ggagacggat tgagccccat 60
gaatttgaag tgttctttga cccaagggag ctgaggaagg agacatgcct gctgtacgag 120
atcaagtggg gcacaagcca caagatctgg cgccacagct ccaagaacac cacaaagcac 180
gtggaagtga atttcatcga gaagtttacc tccgagcggc acttctgccc ctctaccagc 240
tgttccatca catggtttct gtcttggagc ccttgcggcg agtgttccaa ggccatcacc 300
gagttcctgt ctcagcaccc taacgtgacc ctggtcatct acgtggcccg gctgtatcac 360
cacatggacc agcagaacag gcagggcctg cgcgatctgg tgaattctgg cgtgaccatc 420
cagatcatga cagccccaga gtacgactat tgctggcgga acttcgtgaa ttatccacct 480
ggcaaggagg cacactggcc aagataccca cccctgtgga tgaagctgta tgcactggag 540
ctgcacgcag gaatcctggg cctgcctcca tgtctgaata tcctgcggag aaagcagccc 600
cagctgacat ttttcaccat tgctctgcag tcttgtcact atcagcggct gcctcctcat 660
attctgtggg ctacaggcct taaa 684
<210> 8
<211> 228
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 8
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg
20 25 30
Lys Glu Ala Cys Leu Leu Tyr Glu Ile Lys Trp Gly Thr Ser His Lys
35 40 45
Ile Trp Arg Asn Ser Gly Lys Asn Thr Thr Lys His Val Glu Val Asn
50 55 60
Phe Ile Glu Lys Phe Thr Ser Glu Arg His Phe Cys Pro Ser Ile Ser
65 70 75 80
Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Trp Glu Cys Ser
85 90 95
Lys Ala Ile Arg Glu Phe Leu Ser Gln His Pro Asn Val Thr Leu Val
100 105 110
Ile Tyr Val Ala Arg Leu Phe Gln His Met Asp Gln Gln Asn Arg Gln
115 120 125
Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Gln Ile Met Thr
130 135 140
Ala Ser Glu Tyr Asp His Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro
145 150 155 160
Gly Lys Glu Ala His Trp Pro Arg Tyr Pro Pro Leu Trp Met Lys Leu
165 170 175
Tyr Ala Leu Glu Leu His Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu Lys
225
<210> 9
<211> 150
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 9
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro Glu Phe Phe Asn Arg Asn Tyr Asp Pro Arg Glu Leu Arg
20 25 30
Lys Glu Thr Tyr Leu Leu Tyr Glu Ile Lys Trp Gly Lys Glu Ser Lys
35 40 45
Ile Trp Arg His Thr Ser Asn Asn Arg Thr Gln His Ala Glu Val Asn
50 55 60
Phe Leu Glu Asn Phe Phe Asn Glu Leu Tyr Phe Asn Pro Ser Thr His
65 70 75 80
Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser
85 90 95
Lys Ala Ile Val Glu Phe Leu Lys Glu His Pro Asn Val Asn Leu Glu
100 105 110
Ile Tyr Val Ala Arg Leu Tyr Leu Cys Glu Asp Glu Arg Asn Arg Gln
115 120 125
Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Arg Ile Met Asn
130 135 140
Leu Pro Asp Tyr Asn Tyr
145 150
<210> 10
<211> 228
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 10
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro Phe Tyr Phe Gln Phe Asn Asn Asp Pro Arg Ala Cys Arg
20 25 30
Arg Lys Thr Tyr Leu Cys Tyr Glu Leu Lys Gln Asp Gly Ser Thr Trp
35 40 45
Val Trp Lys Arg Thr Leu His Asn Lys Gly Arg His Ala Glu Ile Cys
50 55 60
Phe Leu Glu Lys Ile Ser Ser Leu Glu Lys Leu Asp Pro Ala Gln His
65 70 75 80
Tyr Arg Ile Thr Trp Tyr Met Ser Trp Ser Pro Cys Ser Asn Cys Ala
85 90 95
Gln Lys Ile Val Asp Phe Leu Lys Glu His Pro His Val Asn Leu Arg
100 105 110
Ile Tyr Val Ala Arg Leu Tyr Tyr His Glu Glu Glu Arg Tyr Gln Glu
115 120 125
Gly Leu Arg Asn Leu Arg Arg Ser Gly Val Ser Ile Arg Val Met Asp
130 135 140
Leu Pro Asp Phe Glu His Cys Trp Glu Thr Phe Val Asp Asn Gly Gly
145 150 155 160
Gly Pro Phe Gln Pro Trp Pro Gly Leu Glu Glu Leu Asn Ser Lys Gln
165 170 175
Leu Ser Arg Arg Leu Gln Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu Lys
225
<210> 11
<211> 228
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 11
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro Phe His Phe Gln Phe Asn Asn Asp Pro Arg Ala Tyr Arg
20 25 30
Arg Lys Thr Tyr Leu Cys Tyr Glu Leu Lys Gln Asp Gly Ser Thr Trp
35 40 45
Val Leu Asp Arg Thr Leu Arg Asn Lys Gly Arg His Ala Glu Ile Cys
50 55 60
Phe Leu Asp Lys Ile Asn Ser Trp Glu Arg Leu Asp Pro Ala Gln His
65 70 75 80
Tyr Arg Val Thr Trp Tyr Met Ser Trp Ser Pro Cys Ser Asn Cys Ala
85 90 95
Gln Gln Val Val Asp Phe Leu Lys Glu His Pro His Val Asn Leu Arg
100 105 110
Ile Phe Ala Ala Arg Leu Tyr Tyr His Glu Gln Arg Arg Tyr Gln Glu
115 120 125
Gly Leu Arg Ser Leu Arg Gly Ser Gly Val Pro Val Ala Val Met Thr
130 135 140
Leu Pro Asp Phe Glu His Cys Trp Glu Thr Phe Val Asp His Gly Gly
145 150 155 160
Arg Pro Phe Gln Pro Trp Asp Gly Leu Glu Glu Leu Asn Ser Arg Ser
165 170 175
Leu Ser Arg Arg Leu Gln Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu Lys
225
<210> 12
<211> 57
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 12
atgaaacgga cagccgacgg aagcgagttc gagtcaccaa agaagaagcg gaaagtc 57
<210> 13
<211> 51
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 13
aaaagaaccg ccgacggcag cgaattcgag cccaagaaga agaggaaagt c 51
<210> 14
<211> 66
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 14
gcttctccaa agcgtccgcg tgaccgtcac gatggagaat tgggtggacg caaacgtgca 60
agaggt 66
<210> 15
<211> 22
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 15
Ala Ser Pro Lys Arg Pro Arg Asp Arg His Asp Gly Glu Leu Gly Gly
1 5 10 15
Arg Lys Arg Ala Arg Gly
20
<210> 16
<211> 579
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 16
agcggcggga gcggcgggag cggcgggagc ggggggagca ctaatctgag cgacatcatt 60
gagaaggaga ctgggaaaca gctggtcatt caggagtcca tcctgatgct gcctgaggag 120
gtggaggaag tgatcggcaa caagccagag tctgacatcc tggtgcacac cgcctacgac 180
gagtccacag atgagaatgt gatgctgctg acctctgacg cccccgagta taagccttgg 240
gccctggtca tccaggattc taacggcgag aataagatca agatgctgag cggaggctcc 300
ggaggatctg gaggcagcac caacctgtct gacatcatcg agaaggagac aggcaagcag 360
ctggtcatcc aggagagcat cctgatgctg cccgaagaag tcgaagaagt gatcggaaac 420
aagcctgaga gcgatatcct ggtccatacc gcctacgacg agagtaccga cgaaaatgtg 480
atgctgctga catccgacgc cccagagtat aagccctggg ctctggtcat ccaggattcc 540
aacggagaga acaaaatcaa aatgctgtct ggcggctca 579
<210> 17
<211> 96
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 17
tctggagggt cctccggcgg atcgtccggc agcgagacgc caggcacctc cgagagcgct 60
acgcctgaat cctccggggg atcttcagga ggatca 96
<210> 18
<211> 32
<212> PRT
<213> Artificial sequence (artificial sequence)
<400> 18
Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr
1 5 10 15
Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser
20 25 30
<210> 19
<211> 1092
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 19
tccgaagtcg agttttccca tgagtactgg atgagacacg cattgactct cgcaaagagg 60
gcttgggatg aacgcgaggt gcccgtgggg gcagtactcg tgcataacaa tcgcgtaatc 120
ggcgaaggtt ggaataggcc gatcggacgc cacgacccca ctgcacatgc ggaaatcatg 180
gcccttcgac agggagggct tgtgatgcag aattatcgac ttatcgatgc gacgctgtac 240
gtcacgcttg aaccttgcgt aatgtgcgcg ggagctatga ttcactcccg cattggacga 300
gttgtattcg gtgcccgcga cgccaagacg ggtgccgcag gttcactgat ggacgtgctg 360
catcacccag gcatgaacca ccgggtagaa atcacagaag gcatattggc ggacgaatgt 420
gcggcgctgt tgtccgactt ttttcgcatg cggaggcagg agatcaaggc ccagaaaaaa 480
gcacaatcct ctactgactc tggagggtcc tccggcggat cgtccggcag cgagacgcca 540
ggcacctccg agagcgctac gcctgaatcc tccgggggat cttcaggagg atcatccgaa 600
gtcgagtttt cccatgagta ctggatgaga cacgcattga ctctcgcaaa gagggctcgg 660
gatgaacgcg aggtgcccgt gggggcagta ctcgtgctta acaatcgcgt aatcggcgaa 720
ggttggaata gggcgatcgg actccacgac cccactgcac atgcggaaat catggccctt 780
cgacagggag ggcttgtgat gcagaattat cgacttatcg atgcgacgct gtacgtcacg 840
tttgaacctt gcgtaatgtg cgcgggagct atgattcact cccgcattgg acgagttgta 900
ttcggtgtcc gcaacgccaa gacgggtgcc gcaggttcac tgatggacgt gctgcattac 960
ccaggcatga accaccgggt agaaatcaca gaaggcatat tggcggacga atgtgcggcg 1020
ctgttgtgct acttttttcg catgccgagg caggtgttca atgcccagaa aaaagcacaa 1080
tcctctactg ac 1092
<210> 20
<211> 19
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 20
tactggagtt gtacctgga 19
<210> 21
<211> 20
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 21
ggaacagctt gaacgtcaat 20
<210> 22
<211> 20
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 22
gaacagcctt ctcatcatga 20
<210> 23
<211> 20
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 23
ggtgaggatt tgggacaatt 20
<210> 24
<211> 20
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 24
ctgtgaatct gatgaagttt 20
<210> 25
<211> 20
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 25
gaaaagtaat aacaaagggc 20
<210> 26
<211> 22
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 26
aaatatccac accttactaa gg 22
<210> 27
<211> 22
<212> DNA
<213> Artificial sequence (artificial sequence)
<400> 27
aggtcccccg ccggatgatc gg 22
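
The short entries in the listing above can be checked mechanically against their declared `<211>` lengths. The following is a minimal sketch, not part of the filing: the sequences and lengths are copied from SEQ ID NOs: 20-27 as printed, while the names `ENTRIES`, `normalize` and `check_entry` are hypothetical helpers introduced only for illustration.

```python
# Entries copied from the sequence listing (SEQ ID NOs: 20-27).
# Each tuple is (SEQ ID NO, declared <211> length, sequence as printed).
ENTRIES = [
    (20, 19, "tactggagtt gtacctgga"),
    (21, 20, "ggaacagctt gaacgtcaat"),
    (22, 20, "gaacagcctt ctcatcatga"),
    (23, 20, "ggtgaggatt tgggacaatt"),
    (24, 20, "ctgtgaatct gatgaagttt"),
    (25, 20, "gaaaagtaat aacaaagggc"),
    (26, 22, "aaatatccac accttactaa gg"),
    (27, 22, "aggtcccccg ccggatgatc gg"),
]


def normalize(seq: str) -> str:
    """Remove the 10-base grouping spaces used by the listing format."""
    return "".join(seq.split())


def check_entry(declared_length: int, seq: str) -> bool:
    """True if the sequence has the declared length and only a/c/g/t bases."""
    bases = normalize(seq)
    return len(bases) == declared_length and set(bases) <= set("acgt")


if __name__ == "__main__":
    for seq_id, declared, seq in ENTRIES:
        status = "ok" if check_entry(declared, seq) else "MISMATCH"
        print(f"SEQ ID NO: {seq_id} declared={declared} {status}")
```

A check of this kind only confirms that the printed length and alphabet are internally consistent; it says nothing about whether a sequence is biologically correct.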

Claims (10)

1. A nucleic acid construct having a structure of formula I from 5' to 3':
P1-S1-L1-S2-S3 (I);
wherein,
P1, S1, L1, S2 and S3 are each an element constituting the construct;
P1 is a first promoter sequence, said first promoter comprising the promoter of an elongation factor;
S1 and S2 are each independently one or more of (a) a coding sequence for a gene editing enzyme and (b) a coding sequence for an adenine deaminase and/or a cytosine deaminase;
L1 is absent or is a coding sequence for a linker peptide;
S3 is absent or is a coding sequence for UGI (uracil DNA glycosylase inhibitor);
and each "-" is independently a bond or a nucleotide linker sequence.
2. The nucleic acid construct of claim 1, wherein S1 is a coding sequence for adenine deaminase and/or cytosine deaminase and S2 is a coding sequence for a gene editing enzyme.
3. A vector comprising the nucleic acid construct of claim 1.
4. A host cell comprising the nucleic acid construct of claim 1, or having integrated into its genome one or more nucleic acid constructs of claim 1.
5. A reagent combination, comprising:
(i) a first nucleic acid construct, or a first vector comprising said first nucleic acid construct, said first nucleic acid construct having a structure of formula I from 5' to 3':
P1-S1-L1-S2-S3 (I)
wherein,
P1 is a first promoter sequence, said first promoter comprising the promoter of an elongation factor;
S1 and S2 are each independently one or more of (a) a coding sequence for a gene editing enzyme and (b) a coding sequence for an adenine deaminase and/or a cytosine deaminase;
L1 is absent or is a coding sequence for a linker peptide;
S3 is absent or is a coding sequence for UGI (uracil DNA glycosylase inhibitor);
and "-" is a bond or a nucleotide linker sequence;
(ii) a second nucleic acid construct, or a second vector comprising the second nucleic acid construct, the second nucleic acid construct having a structure of formula II from 5' to 3':
P2-Y1 (II);
wherein P2 is a second promoter;
Y1 is a coding sequence for a gRNA;
and "-" is a bond or a nucleotide linker sequence.
6. A kit comprising the reagent combination of claim 5.
7. A method of gene editing in a plant comprising the steps of:
(i) providing a plant to be edited; and
(ii) introducing the nucleic acid construct of claim 1, the vector of claim 3, or the reagent combination of claim 5 into a plant cell of said plant to be edited, thereby effecting gene editing within said plant cell.
8. A method of preparing a gene-edited plant cell comprising the steps of:
transfecting a plant cell with the nucleic acid construct of claim 1, the vector of claim 3, or the reagent combination of claim 5, such that a site-directed substitution (or mutation) of a chromosome in the plant cell occurs, thereby producing the gene-edited plant cell.
9. Use of the nucleic acid construct of claim 1, the vector of claim 3, the host cell of claim 4, the reagent combination of claim 5, or the kit of claim 6 for gene editing in a plant.
10. A method of making a gene-edited plant comprising the steps of:
regenerating said gene-edited plant cell produced by the method of claim 8 into a plant body, thereby obtaining said gene-edited plant.
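
The element order recited in formulas I and II of the claims above can be pictured as a small data model. The following is an illustrative sketch only, not the applicant's implementation: the class names `Kind`, `FormulaI` and `FormulaII`, the method names, and the string labels are all hypothetical. L1 and S3 are modeled as optional; for S3 this is an assumption of the sketch about how the claim wording should be read.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    """The two alternatives recited for S1 and S2."""
    EDITING_ENZYME = "gene editing enzyme"
    DEAMINASE = "adenine and/or cytosine deaminase"


@dataclass(frozen=True)
class FormulaI:
    p1: str                   # first promoter label (hypothetical free text)
    s1: Kind
    s2: Kind
    l1: Optional[str] = None  # linker peptide coding sequence, None if absent
    s3: Optional[str] = None  # UGI coding sequence; optional by assumption

    def layout(self) -> str:
        """Return the 5'-to-3' element order, skipping absent elements."""
        parts: List[Optional[str]] = [
            "P1",
            "S1",
            "L1" if self.l1 else None,
            "S2",
            "S3" if self.s3 else None,
        ]
        return "-".join(p for p in parts if p)

    def matches_claim_2(self) -> bool:
        """Claim 2 arrangement: S1 is a deaminase, S2 a gene editing enzyme."""
        return self.s1 is Kind.DEAMINASE and self.s2 is Kind.EDITING_ENZYME


@dataclass(frozen=True)
class FormulaII:
    p2: str  # second promoter label
    y1: str  # gRNA coding sequence label

    def layout(self) -> str:
        return "P2-Y1"


example = FormulaI(
    p1="elongation factor promoter",
    s1=Kind.DEAMINASE,
    s2=Kind.EDITING_ENZYME,
    l1="linker",
    s3="UGI",
)
print(example.layout())  # P1-S1-L1-S2-S3
```

The sketch records only the order and presence of elements; it does not model the sequences themselves or any biological behavior.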
CN202010442805.7A 2020-05-22 2020-05-22 Method for expressing nucleic acid Pending CN113774082A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010442805.7A CN113774082A (en) 2020-05-22 2020-05-22 Method for expressing nucleic acid
CN202180003994.0A CN113994007B (en) 2020-05-22 2021-05-21 Method for expressing nucleic acid
PCT/CN2021/095310 WO2021233442A1 (en) 2020-05-22 2021-05-21 Nucleic acid expression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010442805.7A CN113774082A (en) 2020-05-22 2020-05-22 Method for expressing nucleic acid

Publications (1)

Publication Number Publication Date
CN113774082A true CN113774082A (en) 2021-12-10

Family

ID=78707733

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010442805.7A Pending CN113774082A (en) 2020-05-22 2020-05-22 Method for expressing nucleic acid
CN202180003994.0A Active CN113994007B (en) 2020-05-22 2021-05-21 Method for expressing nucleic acid

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202180003994.0A Active CN113994007B (en) 2020-05-22 2021-05-21 Method for expressing nucleic acid

Country Status (2)

Country Link
CN (2) CN113774082A (en)
WO (1) WO2021233442A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115851784B (en) * 2022-08-02 2023-06-27 安徽农业大学 Plant cytosine base editing system constructed by Lbcpf1 variant and application thereof
CN117402855B (en) * 2023-12-14 2024-03-19 中国农业科学院植物保护研究所 Cas protein, gene editing system and application

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160264982A1 (en) * 2013-07-16 2016-09-15 Shanghai Institutes For Biological Sciences, Chinese Academy Of Sciences Method for plant genome site-directed modification
WO2018039438A1 (en) * 2016-08-24 2018-03-01 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
CN106609282A (en) * 2016-12-02 2017-05-03 中国科学院上海生命科学研究院 Carrier for base substitution of specific sites of plant genome
CN109321584B (en) * 2017-12-27 2021-07-16 华东师范大学 Report system for simply qualitatively/quantitatively detecting working efficiency of single-base gene editing technology
CN110157726B (en) * 2018-02-11 2023-06-23 中国科学院分子植物科学卓越创新中心 Method for site-directed substitution of plant genome
CN110835634B (en) * 2018-08-15 2022-07-26 华东师范大学 Novel base conversion editing system and application thereof
CN110526993B (en) * 2019-03-06 2020-06-16 山东舜丰生物科技有限公司 Nucleic acid construct for gene editing
CN110527695B (en) * 2019-03-07 2020-06-16 山东舜丰生物科技有限公司 Nucleic acid construct for gene site-directed mutagenesis
CN110129363A (en) * 2019-06-11 2019-08-16 先正达作物保护股份公司 The method for improving tomato CRISPR/Cas9 gene editing efficiency

Also Published As

Publication number Publication date
CN113994007A (en) 2022-01-28
CN113994007B (en) 2023-07-04
WO2021233442A1 (en) 2021-11-25

Similar Documents

Publication Publication Date Title
CN107177625B (en) Artificial vector system for site-directed mutagenesis and site-directed mutagenesis method
WO2018086623A1 (en) A method for base editing in plants
AU2008264202B2 (en) Enhanced silk exsertion under stress
CN110526993B (en) Nucleic acid construct for gene editing
WO2014144094A1 (en) Tal-mediated transfer dna insertion
CN107567499A (en) Soybean U6 small nuclear RNAs gene promoter and its purposes in the constitutive expression of plant MicroRNA gene
CN110527695B (en) Nucleic acid construct for gene site-directed mutagenesis
CN113994007B (en) Method for expressing nucleic acid
CN116179589B (en) SlPRMT5 gene and application of protein thereof in regulation and control of tomato fruit yield
CN110066824B (en) Artificial base editing system for rice
AU2017234672B2 (en) Zea mays regulatory elements and uses thereof
CN112805385A (en) Base editor based on human APOBEC3A deaminase and application thereof
CN116694661A (en) ShN/AINV5-4D gene for regulating plant germination rate and application thereof
CN114686456B (en) Base editing system based on bimolecular deaminase complementation and application thereof
US9777286B2 (en) Zea mays metallothionein-like regulatory elements and uses thereof
CN113293174B (en) Nucleic acid construct for base editing
CN105585623A (en) Cultivating method for disease-resistant TaMYB-KW gene-transferred wheat, related biomaterials and application
AU2014329590A1 (en) Zea mays metallothionein-like regulatory elements and uses thereof
WO2022055751A1 (en) Plastid transformation by complementation of nuclear mutations
CN115466747A (en) Glycosyltransferase ZmKOB1 gene and application thereof in regulating and controlling maize ear fructification character or development
CN108841840B (en) Application of protein TaNADH-GoGAT in regulation and control of plant yield
CN112080513A (en) Rice artificial genome editing system with expanded editing range and application thereof
CN112813092A (en) Application of GbBCCP5 protein and coding gene thereof in regulation and control of biological oil content
WO2020177751A1 (en) Nucleic acid construct for gene editing
CN114214342B (en) Application of NtFBA1 gene in regulating and controlling PVY resistance of tobacco

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination