CN113913454B - Artificial gene editing system for rice - Google Patents

Publication number
CN113913454B
Authority
CN
China
Prior art keywords
nucleotide sequence
regulatory element
rice
sequence
leu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111388744.1A
Other languages
Chinese (zh)
Other versions
CN113913454A (en)
Inventor
周焕斌
柳浪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Plant Protection of Chinese Academy of Agricultural Sciences
Original Assignee
Institute of Plant Protection of Chinese Academy of Agricultural Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Plant Protection of Chinese Academy of Agricultural Sciences
Priority to CN202111388744.1A
Publication of CN113913454A
Application granted
Publication of CN113913454B
Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216 Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
    • Y02A40/146 Genetically Modified [GMO] plants, e.g. transgenic plants

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Biochemistry (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Botany (AREA)
  • Physics & Mathematics (AREA)
  • Cell Biology (AREA)
  • Plant Pathology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Microbiology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medicinal Chemistry (AREA)
  • Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The application relates to a set of artificial gene editing systems for gene editing in rice, comprising: a regulatory element I comprising a nucleotide sequence encoding an amino acid sequence I, the amino acid sequence I comprising one of an I-1 amino acid sequence, an I-2 amino acid sequence and an I-3 amino acid sequence; and a regulatory element II comprising a II-1 nucleotide sequence and a II-2 nucleotide sequence arranged in tandem from the 5' end to the 3' end. The II-1 nucleotide sequence comprises a target nucleotide sequence, which is derived from the genome of a target organism and contains a target site to be mutated in that genome. The II-2 nucleotide sequence comprises an sgRNA nucleic acid sequence derived from Streptococcus pyogenes. The II-1 nucleotide sequence and the II-2 nucleotide sequence are transcriptionally fused.

Description

Artificial gene editing system for rice
This application is a divisional of application No. 201811320030.8, entitled "A set of artificial gene editing systems for rice", filed on November 7, 2018.
Technical Field
The application relates to an artificial gene editing system for rice.
Background
Rice (Oryza sativa L.) is one of the world's major food crops and feeds nearly half of the world's population, including nearly the entire population of East and Southeast Asia. China has the highest total rice output in the world, accounting for about 30% of the world total. During production, the three major rice diseases (rice blast, false smut and sheath blight) seriously restrict the growth and development of rice, reducing yield and quality and threatening global food security. Research on increasing yield, improving grain quality, and enhancing the disease resistance and stress tolerance of rice plants, so as to ensure a stable grain supply, is therefore an important subject for the sustainable development of human society. Rice is also a model monocotyledonous plant, and the techniques, methods, theory and results of rice research provide important guidance for other gramineous plants such as wheat, corn and sorghum.
The CRISPR/Cas9 system developed in recent years is widely applicable because it allows site-directed modification of the genome. However, when cleaving nucleic acid, CRISPR/Cas9 systems require recognition of a conserved PAM sequence located 3' of the sequence targeted by the guide RNA (gRNA). At present, the PAM sequence most commonly recognized by SpCas9 is NGG. Although SpCas9 can also recognize NAG, and SpCas9 (VQR) can recognize NGA and similar sequences, the editing efficiency at these PAMs is low. Likewise, base editing technologies built on the CRISPR/SpCas9 system are limited in efficiency by the specificity of the target site to be edited and by the possible lack of a suitable PAM sequence, which greatly restricts the application of the CRISPR/Cas9 system in rice genome editing.
Therefore, if a CRISPR/Cas9 system could be developed that performs site-specific editing of plant genomes, especially the rice genome, the efficiency of plant genome editing would be greatly improved. Such a system could be widely applied in plant gene function research, crop breeding and related areas, and would greatly advance the field of plant genome editing.
Disclosure of Invention
The present application provides a set of artificial gene editing systems comprising:
a regulatory element I comprising a nucleotide sequence encoding an amino acid sequence I; wherein the amino acid sequence I comprises one of an I-1 amino acid sequence, an I-2 amino acid sequence and an I-3 amino acid sequence; the I-1 amino acid sequence is the amino acid sequence shown as SEQ ID No. 1; the I-2 amino acid sequence comprises the amino acid sequences shown as SEQ ID No. 2, SEQ ID No. 1 and SEQ ID No. 3 connected in series in that order from the N-terminus to the C-terminus; the I-3 amino acid sequence comprises the amino acid sequences shown as SEQ ID No. 4 and SEQ ID No. 1 connected in series in that order from the N-terminus to the C-terminus;
a regulatory element II comprising a II-1 nucleotide sequence and a II-2 nucleotide sequence arranged in tandem from the 5' end to the 3' end; the II-1 nucleotide sequence comprises a target nucleotide sequence; the target nucleotide sequence is derived from the genome of a target organism and contains a target site to be mutated in the genome of the target organism; the II-2 nucleotide sequence comprises an sgRNA nucleic acid sequence derived from Streptococcus pyogenes; the II-1 nucleotide sequence and the II-2 nucleotide sequence are transcriptionally fused, and the transcript can guide the protein encoded by the regulatory element I to the target site to be mutated in the genome of the target organism and cause a base mutation at that target site;
when there are multiple regulatory elements II, the II-1 nucleotide sequences they contain differ from one another. In addition, multiple regulatory elements II may be connected together in series.
In the present application, the target nucleotide sequence of the artificial gene editing system is determined jointly by the artificial gene editing system itself and the target site to be mutated in the genome of the target organism. As described above, the target nucleotide sequence is derived from the genome of the target organism, so the target site on the target nucleotide sequence is identical in sequence to the target site to be mutated in the genome. For simplicity, both are referred to as the target site; the mutation, however, occurs in the genome of the target organism, not in the sequence of the artificial gene editing system.
In a specific embodiment, when the I-1 amino acid sequence is used, the target site is at any one of positions 3 to 5 of the target nucleotide sequence, counted in the 3' to 5' direction; when the I-2 amino acid sequence is used, the target site is a base C at positions 2 to 10 of the target nucleotide sequence, counted in the 5' to 3' direction; when the I-3 amino acid sequence is used, the target site is a base A at positions 2 to 8 of the target nucleotide sequence, counted in the 5' to 3' direction.
When the amino acid sequence I is the I-1 amino acid sequence, the artificial gene editing system can delete or insert one or more bases at a specific endogenous site in the rice genome, yielding deletion or insertion mutants of the corresponding rice gene. Depending on the case, such mutants may lose the function of the original gene, or that function may be reduced or enhanced; mutants whose gene sequences have been verified are retained or discarded according to actual need.
Alternatively, when the amino acid sequence I is the I-2 amino acid sequence, the artificial gene editing system can mutate a specific endogenous base C in the rice genome to T, A or G in a site-directed manner, and rice gene function correction mutants are obtained by screening. On the reverse complementary sequence, this corresponds to a site-directed mutation of G to A, T or C. In either case, the target nucleotide sequence is the nucleotide sequence on the strand that carries the C at the target site.
Alternatively, when the amino acid sequence I is the I-3 amino acid sequence, the artificial gene editing system can mutate a specific endogenous base A in the rice genome to G in a site-directed manner, and rice gene function correction mutants are obtained by screening. On the reverse complementary sequence, this corresponds to a site-directed mutation of T to C. In either case, the target nucleotide sequence is the nucleotide sequence on the strand that carries the A at the target site.
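The editing windows described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `EDIT_RULES`, `editable_positions` and `apply_base_edit` are hypothetical, only the C to T outcome is modelled for the I-2 amino acid sequence (the text also allows A or G), and every editable base in the window is converted at once, which an actual edited plant need not show.

```python
# Hypothetical helper names; window bounds (1-based, counted 5' -> 3') follow the text.
EDIT_RULES = {
    "I-2": {"from": "C", "to": "T", "window": (2, 10)},  # text also allows C -> A or G
    "I-3": {"from": "A", "to": "G", "window": (2, 8)},
}


def editable_positions(target, system):
    """Return the 1-based positions in the window that carry the editable base."""
    rule = EDIT_RULES[system]
    start, end = rule["window"]
    target = target.upper()
    last = min(end, len(target))
    return [pos for pos in range(start, last + 1) if target[pos - 1] == rule["from"]]


def apply_base_edit(target, system, positions=None):
    """Convert the editable bases (all of them, or only the requested positions)."""
    rule = EDIT_RULES[system]
    allowed = editable_positions(target, system)
    chosen = allowed if positions is None else [p for p in positions if p in allowed]
    bases = list(target.upper())
    for pos in chosen:
        bases[pos - 1] = rule["to"]
    return "".join(bases)
```

Calling `editable_positions` on a 20-nucleotide target sequence lists the positions that could serve as target sites for the chosen amino acid sequence; bases outside the window are left untouched by `apply_base_edit`.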
In a specific embodiment, the target organism is rice, the nucleotide sequence of the regulatory element I is a nucleotide sequence suitable for expression in rice, and the nucleotide sequence of the regulatory element II is a nucleotide sequence suitable for transcription in rice.
In a specific embodiment, the nucleotide coding sequence capable of encoding the amino acid sequence shown as SEQ ID No.1 is shown as SEQ ID No. 5. The nucleotide coding sequence shown as SEQ ID No.5 can be preferably used in rice.
In a specific embodiment, the nucleotide coding sequence capable of encoding the amino acid sequence shown as SEQ ID No.2 is shown as SEQ ID No. 6. The nucleotide coding sequence shown as SEQ ID No.6 can be preferably used in rice.
In a specific embodiment, the nucleotide coding sequence capable of encoding the amino acid sequence shown as SEQ ID No.3 is shown as SEQ ID No. 7. The nucleotide coding sequence shown as SEQ ID No.7 can be preferably used in rice.
In a specific embodiment, the nucleotide coding sequence capable of encoding the amino acid sequence shown as SEQ ID No.4 is shown as SEQ ID No. 8. The nucleotide coding sequence shown as SEQ ID No.8 can be preferably used in rice.
In a specific embodiment, the nucleotide sequence of II-2 is shown as SEQ ID No. 9.
In a specific embodiment, the II-1 nucleotide sequence further comprises a cloning site containing a cleavage site for a type IIS restriction enzyme; the target nucleotide sequence is cloned into the II-1 nucleotide sequence through this cloning site (for example, by restriction digestion and ligation), so that the II-1 nucleotide sequence is transcriptionally fused to the II-2 nucleotide sequence. When there are multiple regulatory elements II, the type IIS restriction enzyme cleavage sites used for cloning the different target nucleotide sequences differ from one another.
Because the target nucleotide sequence varies with the base editing site, the other elements, including the restriction enzyme cleavage site, can be constructed in advance at the relevant positions. Before use, the target nucleotide sequence is cloned in through the restriction enzyme cleavage site according to the purpose of base editing. When there are multiple regulatory elements II, the restriction enzyme cleavage sites contained in the respective II-1 nucleotide sequences differ from one another, which ensures that each target nucleotide sequence is cloned into its intended position. Multiple target nucleotide sequences can be used for base substitution at multiple target sites to be mutated in the genome of the target organism.
In a specific embodiment, it is preferred that the nucleotide sequence of the cloning site comprises SEQ ID No.10 and/or SEQ ID No.11.
In one embodiment, the target nucleotide sequence is determined by:
1) Determining a nucleotide sequence to be modified on a rice genome;
2) Determining that the nucleotide sequence to be modified identified in step 1) is a specific sequence in the genome (the higher the specificity of the sequence, the more accurate the gene editing; otherwise erroneous recognition may occur),
and judging, according to the regulatory element I, whether the change caused by mutating the base at the nucleotide site to be mutated meets expectations; or judging, according to the regulatory element I, whether the change caused by mutating the reverse complementary base of the nucleotide site to be mutated meets expectations;
where expectations are met, the nucleotide site to be mutated is a potential target site;
3) Screening for a target sequence in the nucleotide sequence to be modified or its reverse complement: searching in the 3' direction from the potential target site to confirm the presence of a recognition module that can be recognized by the amino acid sequence I encoded by the regulatory element I, and
when the amino acid sequence I is the I-1 amino acid sequence, the target site is at positions -3 to -5 upstream of the 5' end of the recognition module, and the 17 to 21 nucleotides upstream of the 5' end of the recognition module are determined to be the target nucleotide sequence;
when the amino acid sequence I is the I-2 amino acid sequence, the target site is at positions -19 to -11 upstream of the 5' end of the recognition module, and the 17 to 21 nucleotides upstream of the 5' end of the recognition module are determined to be the target nucleotide sequence;
when the amino acid sequence I is the I-3 amino acid sequence, the target site is at positions -19 to -13 upstream of the 5' end of the recognition module, and the 17 to 21 nucleotides upstream of the 5' end of the recognition module are determined to be the target nucleotide sequence.
In one embodiment, the recognition module is 5'-N₁GN₂-3'; among the sequences of 17 to 21 nucleotides upstream of it, those containing five consecutive T's are excluded from use as the target nucleotide sequence; N₁ and N₂ are each independently A, G, C or T.
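The screening in step 3) can be sketched in Python as follows. This is a simplified illustration rather than the applicants' procedure: it scans a single strand only (the reverse complement would be scanned the same way), assumes the input contains only A, C, G and T, and does not check genome-wide specificity as required by step 2). The names `WINDOWS` and `find_candidate_targets` are hypothetical.

```python
# Offsets are relative to the 5' end of the recognition module (-1 is the base
# immediately upstream). The third field is the base required at the target site.
WINDOWS = {
    "I-1": (-5, -3, None),   # deletion/insertion site, any base
    "I-2": (-19, -11, "C"),  # base C to be substituted
    "I-3": (-19, -13, "A"),  # base A to be substituted
}


def find_candidate_targets(seq, system, length=20):
    """List target sequences that sit directly upstream of a 5'-N1GN2-3' module."""
    if not 17 <= length <= 21:
        raise ValueError("the text describes target sequences of 17 to 21 nucleotides")
    low, high, base = WINDOWS[system]
    seq = seq.upper()
    hits = []
    for i in range(length, len(seq) - 2):
        module = seq[i:i + 3]
        if module[1] != "G":          # recognition module must be N1-G-N2
            continue
        target = seq[i - length:i]
        if "TTTTT" in target:         # five consecutive T's are excluded
            continue
        offsets = [
            k for k in range(low, high + 1)
            if -k <= length and (base is None or seq[i + k] == base)
        ]
        if offsets:
            hits.append({"target": target, "module": module, "offsets": offsets})
    return hits
```

Each hit reports the candidate target nucleotide sequence, the recognition module found at its 3' side, and the offsets at which a suitable target site lies inside the window for the chosen amino acid sequence.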
In a specific embodiment, the target nucleotide sequence is at least one of the sequences shown as SEQ ID No.16, SEQ ID No.17 and SEQ ID No. 18.
In one embodiment, the artificial gene editing system further comprises a first promoter at the 5' end of the regulatory element I that is functional in rice and can initiate transcription of the regulatory element I; and/or the artificial gene editing system further comprises a second promoter at the 5' end of the regulatory element II that is functional in rice and can initiate transcription of the regulatory element II.
In a specific embodiment, the first promoter is an RNA polymerase II promoter; and/or the second promoter is an RNA polymerase III promoter.
In a specific embodiment, the first promoter is SEQ ID No.12; and/or the second promoter is SEQ ID No.13.
In one embodiment, the artificial gene editing system further comprises a first terminator at the 3' end of the regulatory element I capable of terminating transcription of the regulatory element I; and/or the artificial gene editing system further comprises a second terminator at the 3' end of the regulatory element II capable of terminating transcription of the regulatory element II.
In a specific embodiment, the first terminator is SEQ ID No.14; and/or the second terminator is SEQ ID No.15.
In a specific embodiment, the regulatory element I and the regulatory element II can be cloned into at least one vector. For example, the expression cassette of the regulatory element I and the transcription cassette of the regulatory element II can be cloned or integrated into the same vector, or they can be located on different vectors. The two cassettes, or the vectors containing them, can be introduced into rice callus or protoplast cells by the gene gun method, Agrobacterium infection or PEG-mediated transformation.
In a specific embodiment, the regulatory element I can be cloned into pCAMBIA1300, and the regulatory element II can be cloned into the entry vector pENTR4. pCAMBIA1300 is a binary vector based on the Gateway reaction and used for genetic transformation of rice; other similar binary vectors can also be used.
In a specific embodiment, the first promoter, the regulatory element I and the first terminator can be cloned into the pCAMBIA1300 vector.
In a specific embodiment, the second promoter, the regulatory element II and the second terminator are cloned into the pENTR4 vector. When there are multiple regulatory elements II, there are correspondingly multiple second promoters at their 5' ends and multiple second terminators at their 3' ends; that is, the second promoter, the regulatory element II and the second terminator form a set, and they are present set by set. Multiple sets containing different regulatory elements II may be connected together in series. Here, the difference between regulatory elements II lies mainly in the II-1 nucleotide sequence.
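The arrangement of several sets in series can be pictured with a small Python data structure. This is a sketch under assumptions: `GuideCassette` and `assemble_tandem` are hypothetical names, and the promoter, sgRNA and terminator strings passed in would be placeholders chosen by the reader, not the sequences of SEQ ID No. 13, SEQ ID No. 9 or SEQ ID No. 15.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideCassette:
    """One set: second promoter + II-1 (target) + II-2 (sgRNA) + second terminator."""
    promoter: str
    target: str      # II-1 nucleotide sequence
    scaffold: str    # II-2 nucleotide sequence
    terminator: str

    def sequence(self):
        # 5' -> 3' order of the elements within one set
        return self.promoter + self.target + self.scaffold + self.terminator


def assemble_tandem(cassettes):
    """Join sets in series, requiring mutually different II-1 sequences."""
    targets = [c.target.upper() for c in cassettes]
    if len(set(targets)) != len(targets):
        raise ValueError("II-1 nucleotide sequences must differ between sets")
    return "".join(c.sequence() for c in cassettes)
```

The uniqueness check mirrors the requirement that sets connected in series differ mainly in their II-1 nucleotide sequence.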
In a specific embodiment, the regulatory element I and the regulatory element II can be integrated into the same vector, or placed on two separate vectors that are used together.
In a second aspect, the application provides use of any of the artificial gene editing systems described above for mutating the rice genome.
In a third aspect, the application provides a method for site-specific editing of the rice genome, comprising the following steps:
1) Introducing any of the artificial gene editing systems of the application into rice callus or rice protoplasts by Agrobacterium-mediated transformation, gene gun bombardment or PEG-mediated transformation, and culturing to obtain rice plants;
2) Screening to obtain rice plants containing the desired mutation; further, the rice plants are capable of producing rice seeds containing the mutation.
As will be readily appreciated by those skilled in the art, the artificial gene editing system may be introduced into rice protoplasts or calli by PEG-mediated transformation, the gene gun method or Agrobacterium infection. It is also well known to those skilled in the art that rice genomic DNA consists of two strands, so the target nucleotide sequence may lie on either of the two complementary strands. The following cases illustrate this.
When the target nucleotide sequence lies on the sense strand of a functional gene, and a deletion or insertion of one to several bases at a specific site of that gene would produce the desired frameshift mutation and inactivate the gene, the system can be used to delete or insert bases directly on the sense strand, yielding a rice gene knockout mutant.
When the target nucleotide sequence lies on the sense strand of a functional gene, and a site-directed mutation of C to T at a specific site of that gene would yield the desired amino acid in the corresponding functional protein, the system can be used to replace C with T in the triplet codon directly on the sense strand, yielding a rice gene function correction mutant.
When the target nucleotide sequence lies on the antisense strand of a functional gene, and a site-directed mutation of G to A at a specific site of that gene would yield the desired amino acid in the corresponding functional protein, the system can also be used: C on the antisense strand is mutated to T, so that the complementary G on the sense strand becomes A and the amino acid encoded by the triplet codon on the sense strand is changed, yielding a rice gene function correction mutant.
When the target nucleotide sequence lies on the antisense strand of a functional gene, and a site-directed mutation of T to C at a specific site of that gene would yield the desired amino acid in the corresponding functional protein, the system can be used as follows: A on the antisense strand is mutated to G, so that the complementary T on the sense strand becomes C and the amino acid encoded by the triplet codon on the sense strand is changed, yielding a rice gene function correction mutant.
When the target nucleotide sequence lies on the sense strand of a functional gene, and a site-directed mutation of A to G at a specific site of that gene would yield the desired amino acid in the corresponding functional protein, the system can be used to replace A with G in the triplet codon directly on the sense strand, yielding a rice gene function correction mutant.
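The strand relationships in the cases above can be checked with a short Python sketch. It is an illustration only; the function names `reverse_complement` and `edit_via_antisense` are hypothetical, and the sketch assumes sequences written 5' to 3' using only A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' -> 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def edit_via_antisense(sense, sense_pos, from_base, to_base):
    """Edit the antisense base paired with a 1-based sense position.

    from_base and to_base describe the change on the antisense strand
    (for example C -> T); the returned string is the resulting sense strand.
    """
    antisense = list(reverse_complement(sense))
    idx = len(sense) - sense_pos          # antisense index pairing with sense_pos
    if antisense[idx] != from_base:
        raise ValueError("antisense base at this position is not " + from_base)
    antisense[idx] = to_base
    return reverse_complement("".join(antisense))
```

For example, converting C to T on the antisense strand opposite a sense-strand G returns a sense strand in which that G has become A, which is the third case described above.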
The beneficial effects of this application lie in:
a) There can be multiple regulatory elements II, so that multiple gene target sites in rice cells can be edited simultaneously.
b) Gene knockout (by deletion or insertion) in the rice genome, substitution of base pair AT with base pair GC, or substitution of base pair GC with base pair AT can be achieved by selecting different regulatory elements I in the artificial gene editing system of the present application.
c) The novel gene editing toolbox expands the range of PAM sequences available to the existing gene editing toolbox. With its broader PAM compatibility, it can be widely applied to knockout of target genes or directed single-base mutation in the rice genome, thereby creating loss-of-function or gain-of-function mutant materials. In particular, the use of base editing systems in plants is more efficient and economical than gene replacement by HR or gene insertion by NHEJ. The broad PAM range increases the possibility of achieving base substitution at any site, provides researchers in plant science with an important tool for studying gene function, and offers a new strategy for rice gene function research and for breeding new rice varieties by molecular breeding.
Drawings
FIG. 1 shows the editing effect of pUbi:Cas9NG at the target site of the OsCERK1 gene.
FIG. 2 shows the editing effect of pUbi:rBE22 at the target site of the OsRLCK185 gene.
FIG. 3 shows the editing effect of pUbi:rBE23 at the target site of the Os03g02040 gene.
Detailed Description
The foregoing of the present application is further illustrated in detail by the following examples, which are not to be construed as limiting the invention.
Reagents in the examples of the present application are all commercially available unless otherwise specified.
pCAMBIA1300 was obtained from the BioVector NTCC type culture collection. An attR1-ccdB-attR2 module was inserted into pCAMBIA1300 for the Gateway reaction, so that it can accept the attL1-targeting sequence transcription module-attL2 module from the entry vector.
Source of pENTR4 vector: purchased from Invitrogen, usa.
Source of pBlueScript SK vector: purchased from Clontech corporation.
Example 1
Construction of recombinant plasmids
The technical route for constructing the vector is as follows:
1.1 Construction of the pUbi:Cas9NG recombinant plasmid
The amino acid sequence of Cas9NG is shown in SEQ ID No. 1. The gene sequence SEQ ID No. 5 for expression in rice was designed from the amino acid sequence of Cas9NG, and the 4299 bp nucleotide sequence shown in SEQ ID No. 5 was artificially synthesized and cloned into pUC57; the resulting plasmid was named pUC57:Cas9NG (synthesis performed by Beijing Optimago Biotechnology Co., Ltd.). SEQ ID No. 12 (maize ubiquitin promoter, Ubip), SEQ ID No. 5 and SEQ ID No. 14 (NOS terminator) were then cloned into the pCAMBIA1300 vector in the 5' to 3' direction, and the resulting plasmid was named pUbi:Cas9NG.
The main components of plasmid pUbi:Cas9NG are: the CaMV35S promoter (GenBank accession number FJ362600.1, nucleotides 10382 to 11162), hygromycin gene (GenBank accession number KY420085.1), NOS terminator (SEQ ID No. 14), pVS1 RepA (GenBank accession number KY420084.1, nucleotides 5755 to 6435), pVS1 origin of replication (GenBank accession number KY420084.1, nucleotides 4066 to 5066), attR1 (GenBank accession number KR233518.1, nucleotides 2055 to 2174), ccdB expression cassette (GenBank accession number KR233518.1, nucleotides 3289 to 3594), attR2 (GenBank accession number KR233518.1, nucleotides 3635 to 3759), Ubip promoter (SEQ ID No. 12), Cas9NG gene (SEQ ID No. 5) and NOS terminator (SEQ ID No. 14).
1.2 Construction of the pUbi:rBE22 recombinant plasmid
The vector pUbi:rBE9 previously constructed in our laboratory (Ren Bin, Yan Fang, Kuang Yongjie, Li Na, Zhang Dawei, Zhou Xueping, Lin Honghui and Zhou Huanbin. Improved base editor for efficiently inducing genetic variations in rice with CRISPR/Cas9. Molecular Plant, 2018, 11:623-626) was double-digested with EcoR I and Spe I, and a 5.05 kb fragment was recovered; the cloning vector pBlueScript SK was double-digested with EcoR I and Spe I, and the 3 kb linearized vector backbone was recovered. The two were then ligated, and after transformation, colony PCR and restriction digest verification, the resulting recombinant plasmid was named pBS:rBE9.
Using rAPO-R1 (SEQ ID No.19: agcaagtccgattgaatact) and UGI-F1 (SEQ ID No.20: tccggcggaagtacaaac) as primers and the recombinant plasmid pBS:rBE9 as template, PCR amplification was performed with I-5™ 2X High-Fidelity Master Mix (available from Clausin (Beijing) Biotechnology Co., Ltd.) to obtain a vector backbone of about 4.0 kb. In parallel, using OsCas9-Fg1-F1 (SEQ ID No.21: attgggacaaactctgtgg) and OsCas9-Fg2-R1 (SEQ ID No.22: gtcaccgcccaactgcga) as primers and pUC57:Cas9NG as template, a PCR fragment of the Cas9NG gene of about 4.3 kb was amplified with I-5™ 2X High-Fidelity Master Mix. This fragment was purified, phosphorylated and ligated to the 4.0 kb vector backbone; after transformation, colony PCR, restriction digest verification and sequencing, the resulting recombinant plasmid was named pBS:rBE22.
pBS:rBE22 was double-digested with BamH I and Spe I, and the rBE 22.03 kb fragment was recovered; the vector pUbi:Cas9NG was digested with BamH I and Spe I, and a vector backbone of about 12 kb was recovered. The two were ligated, and after transformation, colony PCR and restriction digest verification, the resulting recombinant plasmid was named pUbi:rBE22.
The main components of plasmid pUbi:rBE22 are: the CaMV35S promoter (GenBank accession number FJ362600.1, nucleotides 10382 to 11162), hygromycin gene (GenBank accession number KY420085.1), NOS terminator (SEQ ID No. 14), pVS1 RepA (GenBank accession number KY420084.1, nucleotides 5755 to 6435), pVS1 origin of replication (GenBank accession number KY420084.1, nucleotides 4066 to 5066), attR1 (GenBank accession number KR233518.1, nucleotides 2055 to 2174), ccdB expression cassette (GenBank accession number KR233518.1, nucleotides 3289 to 3594), attR2 (GenBank accession number KR233518.1, nucleotides 3635 to 3759), Ubip promoter (SEQ ID No. 12), AID gene (SEQ ID No. 6), Cas9NG gene (SEQ ID No. 5), UGI gene (SEQ ID No. 7) and NOS terminator (SEQ ID No. 14).
1.3 Construction of the pUbi:rBE23 recombinant plasmid
Based on the amino acid sequence SEQ ID No.4, a gene sequence for expression in rice was designed; the 1191 bp nucleotide sequence shown in SEQ ID No.8 was artificially synthesized and cloned into pUC57, and the resulting plasmid was designated pUC57:TadA (synthesis performed by Beijing Qingke Biotechnology Co., Ltd.).
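The 1191 bp length follows from the 397-residue SEQ ID No.4 at three nucleotides per codon. The sketch below applies the same arithmetic to the lengths given in the <211> fields of the sequence listing. Pairing SEQ ID Nos.1-3 with Nos.5-7, and the labels in the comments, are assumptions made here for illustration; only the No.4/No.8 pairing is stated in the text.

```python
# (protein SEQ ID, length in aa, coding SEQ ID, length in bp), from the <211> fields.
PAIRS = [
    (1, 1417, 5, 4254),  # assumed pairing: Cas9NG protein / Cas9 gene
    (2, 211, 6, 633),    # assumed pairing: AID protein / AID gene
    (3, 91, 7, 273),     # assumed pairing: UGI protein / UGI gene
    (4, 397, 8, 1191),   # stated in the text: TadA protein / TadA gene
]


def extra_bases(aa_len: int, nt_len: int) -> int:
    """Nucleotides left over after three bases are assigned to each residue."""
    return nt_len - 3 * aa_len


# Map each coding SEQ ID to the number of bases not accounted for by codons.
report = {nt_id: extra_bases(aa_len, nt_len) for _, aa_len, nt_id, nt_len in PAIRS}
print(report)
```

A remainder of 0 means the coding sequence has no stop codon of its own, which suits a domain fused upstream of another reading frame; a remainder of 3 is consistent with a single terminal stop codon (SEQ ID No.5 ends in tga).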
Using pUC57-F1 (SEQ ID No.23: gcgcgcttggcgtaatca) and TadA-R1 (SEQ ID No.24: agccagaccaattgagtattttttgtc) as primers and the vector pUC57:TadA as template, PCR amplification was performed with I-5™ 2X High-Fidelity Master Mix, and the product was purified to obtain a vector backbone of 4.13 kb; then, using OsCas9-Fg1-F1 (SEQ ID No.21) and NLS-R2 (SEQ ID No.25: cactagttcacccgccaac) as primers and pUC57:Cas9NG as template, a PCR fragment of the Cas9NG gene of about 4.3 kb was obtained with I-5™ 2X High-Fidelity Master Mix. This fragment was purified, phosphorylated and ligated to the 4.13 kb vector backbone; after transformation, colony PCR, restriction digestion verification and sequencing, the resulting recombinant plasmid was designated pUC57:rBE23.
pUC57:rBE23 was digested with BamH I and Spe I and a 5.33 kb rBE23 fragment was recovered; the vector pUbi:Cas9NG was digested with BamH I and Spe I and a vector backbone of about 12 kb was recovered; the two were then ligated, and after transformation, colony PCR, restriction digestion verification and sequencing, the resulting recombinant plasmid was designated pUbi:rBE23.
The composition of plasmid pUbi:rBE23 is as follows: CaMV35S promoter (GenBank accession number FJ362600.1, nucleotides 10382 to 11162), hygromycin gene (GenBank accession number KY420085.1), NOS terminator (SEQ ID No.14), pVS1 RepA (GenBank accession number KY420084.1, nucleotides 5755 to 6435), pVS1 origin of replication (GenBank accession number KY420084.1, nucleotides 4066 to 5066), attR1 (GenBank accession number KR233518.1, nucleotides 2055 to 2174), ccdB expression cassette (GenBank accession number KR233518.1, nucleotides 3289 to 3594), attR2 (GenBank accession number KR233518.1, nucleotides 3635 to 3759), Ubip promoter (SEQ ID No.12), TadA gene (SEQ ID No.8), Cas9 gene (SEQ ID No.5) and NOS terminator (SEQ ID No.14).
1.4 Construction of pENTR4:sgRNA
The following elements, connected in sequence in the 5' to 3' direction, were artificially synthesized and cloned into the pENTR4 vector, and the resulting plasmid was designated pENTR4:sgRNA: the U6 promoter sequence (SEQ ID No.13), a nucleotide sequence containing two BtgZ I cleavage sites (SEQ ID No.10), the gRNA scaffold sequence (SEQ ID No.9), the (T)8 termination sequence (SEQ ID No.15), the U6 promoter sequence (SEQ ID No.13), a nucleotide sequence containing two Bsa I cleavage sites (SEQ ID No.11), the sgRNA sequence (SEQ ID No.9) and the (T)8 termination sequence (SEQ ID No.15). The two BtgZ I or Bsa I cleavage sites are used for cloning target nucleotide sequences from specific genes.
Example 2: Knockout of the rice endogenous gene OsCERK1 using pUbi:Cas9NG
2.1 Design and cloning of the recognition sequence for the OsCERK1 gene
The transcribed sequence and the genomic sequence of the OsCERK1 (LOC_Os08g42580) gene were obtained from the MSU/TIGR rice genome database (http://rice.plantbiology.msu.edu/).
For the OsCERK1 gene, primers containing the target nucleotide sequence (SEQ ID No.16: ggccttccttgggatccggcga; ggatcc is the BamH I cleavage site and the 3'-terminal cga is the PAM sequence) and ends matching the BtgZ I cleavage site were designed as follows: gOsCERK1-F1 (SEQ ID No.26: tgttggccttccttgggatccgg) and gOsCERK1-R1 (SEQ ID No.27: aaacccggatcccaaggaaggcc). After synthesis, the primers were phosphorylated with T4 polynucleotide kinase and annealed to form a double strand; gOsCERK1-F1/R1 was cloned into the BtgZ I cleavage site of the pENTR4:sgRNA vector, and after sequencing confirmed that the insert was entirely correct, the plasmid was designated pENTR4:sgRNA-gOsCERK1.
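The relationship between the target sequence and the two oligonucleotides can be sketched as follows. The code assumes SEQ ID No.16 reads ggccttccttgggatccggcga (the two fragments printed in the text, joined), that the last three bases are the PAM, and that the oligos carry the 4-nt prefixes tgtt and aaac seen in SEQ ID Nos.26 and 27; the function names are illustrative only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (a, c, g, t only)."""
    pairs = {"a": "t", "t": "a", "g": "c", "c": "g"}
    return "".join(pairs[base] for base in reversed(seq.lower()))


def design_guide_oligos(protospacer: str, fwd_prefix: str = "tgtt",
                        rev_prefix: str = "aaac"):
    """Build the forward and reverse oligos for annealing and cloning.

    The forward oligo is the prefix plus the protospacer; the reverse oligo
    is the prefix plus the reverse complement of the protospacer.
    """
    forward = fwd_prefix + protospacer.lower()
    reverse = rev_prefix + reverse_complement(protospacer)
    return forward, reverse


target = "ggccttccttgggatccggcga"                 # SEQ ID No.16 as read above
protospacer, pam = target[:-3], target[-3:]      # assumed: 3'-terminal 3 nt are the PAM
fwd_oligo, rev_oligo = design_guide_oligos(protospacer)
print(pam, fwd_oligo, rev_oligo)
```

With these assumptions the two strings reproduce gOsCERK1-F1 and gOsCERK1-R1 as listed, and the PAM cga fits the NGA motif discussed for Cas9NG.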
2.2 PEG-mediated transformation of protoplasts of the japonica rice variety Kitaake with the pUbi:Cas9NG system and detection of gene editing
1) Preparation of rice protoplast:
Dehulled mature rice seeds were treated with 50% commercial disinfectant for 25 min and washed 3-5 times with sterile water; the seeds were transferred to a sterile culture dish and excess water was removed; the seeds were then placed on 1/2 MS medium (2.2 g/L MS powder; 30 g/L sucrose; 6 g/L plant gel; pH 5.7) and cultured in a light incubator for 10 days. The stems and leaves of the rice seedlings were cut off with scissors and the stems were sliced with a single-edged blade; the cut material was transferred to a sterile Erlenmeyer flask, 10 ml of enzymolysis solution (1.5% cellulase; 0.3% macerozyme R-10; 0.4 M mannitol; 2 mM 2-(N-morpholino)ethanesulfonic acid (MES); 0.1× W5 solution; pH 5.7) was added and mixed gently, the flask was wrapped in aluminum foil, vacuum was applied for 30 min, and the flask was placed on a horizontal shaker (about 60 rpm) for 6 h of enzymolysis. After enzymolysis, the protoplast solution was collected by filtration through a nylon mesh (pore size 35 μm). The protoplast solution was centrifuged at room temperature (1000 g, 5 min) and the supernatant was discarded; the protoplast pellet was resuspended in W5 solution (154 mM NaCl; 125 mM CaCl2; 25 mM KCl; 2 mM MES; pH 5.7) and centrifuged at 1000 g for 5 min, the supernatant was discarded, and an appropriate amount of MMG solution (0.4 M mannitol; 20 mM CaCl2; 25 mM MES; pH 5.7) was added to resuspend the protoplasts.
2) PEG-mediated transformation of rice protoplasts and extraction of genomic DNA from protoplasts
20 μl of plasmid pUbi:Cas9NG (1000 ng/μl), 20 μl of plasmid pENTR4:sgRNA-gOsCERK1 (1000 ng/μl), 400 μl of protoplasts and 440 μl (an equal volume) of 40% PEG4000 solution (40% (w/v) PEG4000; 0.4 M mannitol; 100 mM Ca(NO3)2; pH 5.7) were gently mixed and left to stand for 15 min. The transformation was stopped by dilution with 1 ml of W5 solution, followed by centrifugation at 1000 g for 2 min. The supernatant was discarded, 1 ml of W5 solution was added, and the resuspended protoplasts were transferred to a 12-well cell culture plate, wrapped in aluminum foil to exclude light, and incubated at room temperature for 2 days; the protoplasts were then collected and their genomic DNA was extracted by the CTAB method.
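The volumes in this step can be cross-checked with a little arithmetic. The sketch below assumes that the first plasmid volume is 20 μl, which is what the stated 440 μl "equal volume" of PEG solution implies when added to 20 μl of the second plasmid and 400 μl of protoplasts; the variable names are illustrative.

```python
# Volumes in microlitres; the 20 ul for pUbi:Cas9NG is inferred, not printed.
plasmid_volumes_ul = {"pUbi:Cas9NG": 20, "pENTR4:sgRNA-gOsCERK1": 20}
plasmid_conc_ng_per_ul = 1000
protoplast_volume_ul = 400
peg_stock_percent = 40  # % (w/v) PEG4000 in the stock solution

# DNA plus protoplasts, to which an equal volume of PEG solution is added.
dna_plus_cells_ul = sum(plasmid_volumes_ul.values()) + protoplast_volume_ul
peg_volume_ul = dna_plus_cells_ul

# PEG is diluted by the DNA/protoplast volume during transformation.
final_peg_percent = peg_stock_percent * peg_volume_ul / (dna_plus_cells_ul + peg_volume_ul)

# Mass of each plasmid in micrograms.
dna_mass_ug = {name: vol * plasmid_conc_ng_per_ul / 1000
               for name, vol in plasmid_volumes_ul.items()}

print(dna_plus_cells_ul, final_peg_percent, dna_mass_ug)
```

Under these assumptions each plasmid contributes 20 μg of DNA and the PEG4000 concentration during the 15 min incubation is half that of the stock.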
3) Detection of mutation type at target site
Specific PCR primers for identification were designed based on the target site DNA sequence of the OsCERK1 gene: OsCERK1-F1 (SEQ ID No.28: gacgtctacgcctttggtgt) and OsCERK1-R1 (SEQ ID No.29: gtcagctgcaaaatgcaatg), giving a PCR product of 393 bp. The genomic DNA of the protoplasts was first digested with BamH I for 2 hours; the digestion product was then used as template, with OsCERK1-F1 (SEQ ID No.28) and OsCERK1-R1 (SEQ ID No.29) as primers, for PCR amplification with I-5™ 2X High-Fidelity Master Mix to obtain the 393 bp PCR fragment. The PCR product was digested with BamH I for 3 hours, PCR products whose target site had not been successfully edited were removed by agarose gel electrophoresis, and the fragments carrying base deletions or insertions at the target site were recovered with an AxyPrep gel recovery kit, ligated into a TA cloning vector and analyzed by Sanger sequencing to determine the mutation types. As shown in FIG. 1, 11 randomly selected clones were sequenced and 6 mutation types were detected in total, namely base deletions (-1, -2 and -4 bp), base insertions (+T and +A) and a base substitution (G for A), indicating that Cas9NG can recognize the NGA PAM motif to complete gene editing.
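The enrichment logic of this step is that an unedited target still carries the BamH I site (ggatcc) and is cut, whereas an edit that disrupts the site leaves the fragment intact. A minimal sketch of that reasoning follows; the two edited alleles are hypothetical examples constructed for illustration and are not the alleles shown in FIG. 1.

```python
def is_cut(sequence: str, site: str) -> bool:
    """Return True if the recognition site is present in the sequence.

    Checking one strand is sufficient here because ggatcc is palindromic.
    """
    return site.lower() in sequence.lower()


BAMHI_SITE = "ggatcc"
wild_type = "ggccttccttgggatccggcga"  # target sequence with PAM, as read from the text

alleles = {
    "wild type": wild_type,
    # Hypothetical 1-bp deletion inside the BamH I site (removes the 't').
    "-1 bp (hypothetical)": wild_type[:14] + wild_type[15:],
    # Hypothetical 1-bp insertion of 't' inside the BamH I site.
    "+T (hypothetical)": wild_type[:15] + "t" + wild_type[15:],
}

# Alleles that resist digestion are the ones recovered from the gel.
surviving = [name for name, seq in alleles.items() if not is_cut(seq, BAMHI_SITE)]
print(surviving)
```

Only edits that change the six-base recognition site are enriched this way, so the recovered clones are a biased sample of all editing events.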
Example 3: C-to-T base substitution in the rice endogenous gene OsRLCK185 using pUbi:rBE22
The transcribed sequence and the genomic sequence of the OsRLCK185 (LOC_Os05g30870) gene were obtained from the MSU/TIGR rice genome database (http://rice.plantbiology.msu.edu/).
For the OsRLCK185 gene, primers containing the target nucleotide sequence (SEQ ID No.17: gtgcactgccaagctcacactgc; gtgcac is the Alw44 I cleavage site and the 3'-terminal tgc is the PAM sequence) were designed as follows: gOsRLCK185-F1 (SEQ ID No.30: gtgtgtgcactgccaagctcacac) and gOsRLCK185-R1 (SEQ ID No.31: aaacgtgtgattggcagtgcac). After synthesis, the primers were phosphorylated with T4 polynucleotide kinase and annealed to form a double strand; gOsRLCK185-F1/R1 was cloned into the Bsa I cleavage site of the pENTR4:sgRNA vector, and after sequencing confirmed that the insert was entirely correct, the plasmid was designated pENTR4:sgRNA-gOsRLCK185.
The other operations are the same as in example 2.
Specific PCR primers for identification were designed based on the target site DNA sequence of the OsRLCK185 gene: OsRLCK185-F1 (SEQ ID No.32: tccatggccttgttcctctt) and OsRLCK185-R1 (SEQ ID No.33: tgctgctagacacatccaca), giving a PCR fragment of 484 bp. The genomic DNA of the protoplasts was first digested with Alw44 I for 2 hours; the digestion product was then used as template, with OsRLCK185-F1 (SEQ ID No.32) and OsRLCK185-R1 (SEQ ID No.33) as primers, for PCR amplification with I-5™ 2X High-Fidelity Master Mix to obtain the 484 bp PCR fragment. The PCR product was digested with Alw44 I for 3 hours, PCR products whose target site had not been successfully edited were removed by agarose gel electrophoresis, and the fragments in which the target site bases had been successfully substituted were recovered with an AxyPrep gel recovery kit, ligated into a TA cloning vector and analyzed by Sanger sequencing to determine the mutation types. As shown in FIG. 2, 10 randomly selected clones were sequenced, and mutation of the target base G to A was detected in every one of them; there were 3 mutation types, namely G4,6>A, G4,6,9>A and G4,6,9,14>A, where the numbers denote base positions within the target sequence. This indicates that rBE22 can recognize the NGC PAM motif to complete base editing.
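The three mutation types can be illustrated by applying C-to-T changes to the 20-nt target. The sketch assumes that the position numbers are 1-based from the 5' end of the target strand, and that the reported G-to-A changes are the same edits read on the complementary strand; both are interpretations made here, and the function name is illustrative.

```python
def apply_c_to_t(protospacer: str, positions) -> str:
    """Convert the cytosines at the given 1-based positions to thymines."""
    bases = list(protospacer.lower())
    for pos in positions:
        if bases[pos - 1] != "c":
            raise ValueError(f"position {pos} is not a cytosine")
        bases[pos - 1] = "t"
    return "".join(bases)


ALW44I_SITE = "gtgcac"
protospacer = "gtgcactgccaagctcacac"  # SEQ ID No.17 without the 3'-terminal PAM tgc

# The three position sets reported for FIG. 2.
mutation_types = [(4, 6), (4, 6, 9), (4, 6, 9, 14)]
edited = {positions: apply_c_to_t(protospacer, positions) for positions in mutation_types}

for positions, sequence in edited.items():
    # An edited allele is enriched only if the Alw44 I site is destroyed.
    print(positions, sequence, ALW44I_SITE in sequence)
```

Under this reading, positions 4 and 6 both fall inside the Alw44 I recognition site, which would explain why every recovered clone carries at least those two changes.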
Example 4: A-to-G base substitution in the rice endogenous gene Os03g02040 using pUbi:rBE23
The transcribed sequence and the genomic sequence of the Os03g02040 gene were obtained from the MSU/TIGR rice genome database (http://rice.plantbiology.msu.edu/).
For the Os03g02040 gene, primers containing the target nucleotide sequence (SEQ ID No.18: agatctagaggttggtctacgt; tctaga is the Xba I cleavage site and the 3'-terminal cgt is the PAM sequence) and ends matching the Bsa I cleavage site were designed as follows: gOs03g02040-F1 (SEQ ID No.34: tgttgagatctagaggttggtcta) and gOs03g02040-R1 (SEQ ID No.35: aaactagaccaacctctagatctc). After synthesis, the primers were phosphorylated with T4 polynucleotide kinase and annealed to form a double strand; gOs03g02040-F1/R1 was cloned into the Bsa I cleavage site of the pENTR4:sgRNA vector, and after sequencing confirmed that the insert was entirely correct, the plasmid was designated pENTR4:sgRNA-gOs03g02040.
The other operations are the same as in example 2.
Specific PCR primers for identification were designed based on the target site DNA sequence of the Os03g02040 gene: Os03g02040-F1 (SEQ ID No.36: cactagcacgacgcactttc) and Os03g02040-R1 (SEQ ID No.37: agaacacgcgcatcatatc), giving a PCR fragment of 493 bp. The genomic DNA of the protoplasts was digested with Alw44 I for 2 hours; the digestion product was then used as template, with Os03g02040-F1 (SEQ ID No.36) and Os03g02040-R1 (SEQ ID No.37) as primers, for PCR amplification with I-5™ 2X High-Fidelity Master Mix to obtain the 493 bp PCR fragment. The PCR product was digested with Xba I for 3 hours, PCR products whose target site had not been successfully edited were removed by agarose gel electrophoresis, and the fragments in which the target site base had been successfully substituted were recovered with an AxyPrep gel recovery kit, ligated into a TA cloning vector and analyzed by Sanger sequencing to determine the mutation types. As shown in FIG. 3, the sequencing results showed that mutation of the target base T to C was detected, indicating that rBE23 can recognize the NGT PAM motif to complete base editing.
Sequence listing
<110> Institute of Plant Protection, Chinese Academy of Agricultural Sciences
<120> set of artificial gene editing system for rice
<130> LHA2160702-D1
<160> 37
<170> SIPOSequenceListing 1.0
<210> 1
<211> 1417
<212> PRT
<213> Artificial sequence (non)
<400> 1
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Asp Lys Lys Tyr Ser Ile Gly Leu
35 40 45
Asp Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr
50 55 60
Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His
65 70 75 80
Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu
85 90 95
Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr
100 105 110
Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu
115 120 125
Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe
130 135 140
Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn
145 150 155 160
Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His
165 170 175
Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu
180 185 190
Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu
195 200 205
Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe
210 215 220
Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile
225 230 235 240
Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser
245 250 255
Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys
260 265 270
Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr
275 280 285
Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln
290 295 300
Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln
305 310 315 320
Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser
325 330 335
Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr
340 345 350
Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His
355 360 365
Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu
370 375 380
Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly
385 390 395 400
Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys
405 410 415
Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu
420 425 430
Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser
435 440 445
Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg
450 455 460
Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu
465 470 475 480
Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg
485 490 495
Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile
500 505 510
Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln
515 520 525
Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu
530 535 540
Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr
545 550 555 560
Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro
565 570 575
Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe
580 585 590
Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe
595 600 605
Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp
610 615 620
Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile
625 630 635 640
Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu
645 650 655
Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu
660 665 670
Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys
675 680 685
Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys
690 695 700
Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp
705 710 715 720
Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile
725 730 735
His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val
740 745 750
Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly
755 760 765
Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp
770 775 780
Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile
785 790 795 800
Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser
805 810 815
Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser
820 825 830
Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu
835 840 845
Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp
850 855 860
Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile
865 870 875 880
Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu
885 890 895
Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu
900 905 910
Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala
915 920 925
Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg
930 935 940
Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu
945 950 955 960
Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser
965 970 975
Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val
980 985 990
Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp
995 1000 1005
Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His
1010 1015 1020
Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr
1025 1030 1035 1040
Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1045 1050 1055
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr
1060 1065 1070
Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu
1075 1080 1085
Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr
1090 1095 1100
Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala
1105 1110 1115 1120
Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys
1125 1130 1135
Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Arg Pro Lys
1140 1145 1150
Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys
1155 1160 1165
Lys Tyr Gly Gly Phe Val Ser Pro Thr Val Ala Tyr Ser Val Leu Val
1170 1175 1180
Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys
1185 1190 1195 1200
Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn
1205 1210 1215
Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp
1220 1225 1230
Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly
1235 1240 1245
Arg Lys Arg Met Leu Ala Ser Ala Arg Phe Leu Gln Lys Gly Asn Glu
1250 1255 1260
Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His
1265 1270 1275 1280
Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1285 1290 1295
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile
1300 1305 1310
Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys
1315 1320 1325
Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln
1330 1335 1340
Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro
1345 1350 1355 1360
Arg Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Val Tyr Arg
1365 1370 1375
Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr
1380 1385 1390
Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Arg
1395 1400 1405
Pro Lys Lys Lys Arg Lys Val Gly Gly
1410 1415
<210> 2
<211> 211
<212> PRT
<213> Artificial sequence (non)
<400> 2
Met Asp Ser Leu Leu Met Asn Arg Arg Glu Phe Leu Tyr Gln Phe Lys
1 5 10 15
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
20 25 30
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
35 40 45
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
50 55 60
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
65 70 75 80
Phe Ile Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
85 90 95
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
100 105 110
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
115 120 125
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
130 135 140
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Gly Arg Thr Phe Lys
145 150 155 160
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
165 170 175
Arg Arg Ile Leu Leu Pro Leu Tyr Glu Val Asp Asp Leu Arg Asp Ala
180 185 190
Phe Arg Thr Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr
195 200 205
Pro Glu Ser
210
<210> 3
<211> 91
<212> PRT
<213> Artificial sequence (non)
<400> 3
Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly
1 5 10 15
Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val
20 25 30
Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr
35 40 45
Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp
50 55 60
Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly
65 70 75 80
Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser
85 90
<210> 4
<211> 397
<212> PRT
<213> Artificial sequence (non)
<400> 4
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro
35 40 45
Ile Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly
100 105 110
Ala Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser
165 170 175
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Glu Val Glu Phe Ser His Glu Tyr
195 200 205
Trp Met Arg His Ala Leu Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg
210 215 220
Glu Val Pro Val Gly Ala Val Leu Val Leu Asn Asn Arg Val Ile Gly
225 230 235 240
Glu Gly Trp Asn Arg Ala Ile Gly Leu His Asp Pro Thr Ala His Ala
245 250 255
Glu Ile Met Ala Leu Arg Gln Gly Gly Leu Val Met Gln Asn Tyr Arg
260 265 270
Leu Ile Asp Ala Thr Leu Tyr Val Thr Phe Glu Pro Cys Val Met Cys
275 280 285
Ala Gly Ala Met Ile His Ser Arg Ile Gly Arg Val Val Phe Gly Val
290 295 300
Arg Asn Ala Lys Thr Gly Ala Ala Gly Ser Leu Met Asp Val Leu His
305 310 315 320
Tyr Pro Gly Met Asn His Arg Val Glu Ile Thr Glu Gly Ile Leu Ala
325 330 335
Asp Glu Cys Ala Ala Leu Leu Cys Tyr Phe Phe Arg Met Pro Arg Gln
340 345 350
Val Phe Asn Ala Gln Lys Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly
355 360 365
Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser
370 375 380
Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser
385 390 395
<210> 5
<211> 4254
<212> DNA
<213> Artificial sequence (non)
<400> 5
atggactata aggatcacga tggcgactac aaggatcatg acattgacta taaggatgac 60
gacgataaga tggcacctaa gaagaaaagg aaagtcggca ttcatggcgt tccggcagcc 120
gacaaaaagt atagcatcgg cctcgatatt gggacaaact ctgtgggctg ggcggtaatt 180
accgacgagt acaaggtgcc tagtaagaaa tttaaagtgc tcggaaacac tgacaggcac 240
tctataaaga agaacctgat cggggcactg cttttcgact ccggagagac ggcggaggcg 300
acgcgtctca agcgtaccgc gcgccgcagg tacacaagaa ggaagaatag gatctgctac 360
ttgcaggaaa tcttcagtaa cgagatggcg aaggtcgacg atagtttctt tcatcggttg 420
gaagaatcgt tcctcgtaga ggaggacaaa aagcacgagc gtcacccaat attcgggaat 480
attgttgacg aggttgccta ccatgagaaa tatcctacaa tatatcacct ccgtaagaag 540
cttgtcgatt caactgataa ggctgatctc agactcatct atcttgccct cgcacatatg 600
attaagtttc gtggccactt cttgattgaa ggcgacctca acccggacaa ctcagatgtt 660
gacaagcttt ttatacagct cgtccagaca tataaccagc tgtttgaaga gaatcccatc 720
aatgcgagtg gggttgatgc taaagccatt ttgtccgcca ggttgtccaa atctcgcaga 780
ctggaaaacc tgatcgcaca gcttcccggt gaaaagaaaa acgggctctt cggcaatctc 840
atcgcactgt ccctcggcct caccccaaac ttcaagtcta acttcgacct ggccgaggat 900
gcgaagctcc agctgtcaaa agatacatac gacgacgatt tggacaatct gcttgcgcaa 960
ataggcgacc agtatgcgga cctgttcctg gctgccaaaa atctgtcaga tgcaatcctc 1020
ctgtccgata tattgcgtgt gaacaccgaa atcacgaagg caccgcttag cgcatccatg 1080
atcaagagat acgacgagca ccatcaggac ctcacactcc tcaaggcgct tgttcgtcag 1140
cagcttcccg agaaatataa ggaaattttt ttcgatcaaa gcaagaatgg atatgctggc 1200
tatattgacg gtggcgcttc gcaggaggag ttctataaat tcattaagcc gattctggag 1260
aagatggacg gaacggagga gctcctcgtc aagcttaacc gggaagacct gttgcggaag 1320
cagaggactt ttgataacgg ctctattccg caccaaatcc atctgggtga gttgcacgca 1380
atcttgagaa gacaagagga tttctacccg ttccttaagg ataacagaga gaagatagaa 1440
aaaatactga ccttcaggat accatactat gtgggcccac tggcgcgcgg aaatagtcgt 1500
ttcgcatgga tgactagaaa gtccgaagaa acgatcacgc catggaattt tgaggaagtg 1560
gtcgacaagg gcgcctctgc ccagagcttc atcgaaagga tgaccaattt tgacaaaaat 1620
ctgcctaacg aaaaggtgct tccgaagcac agcctgttgt atgaatactt cacagtttat 1680
aacgagctca ctaaggtcaa gtacgtcacg gagggcatgc gtaagcctgc tttcctgtct 1740
ggtgaacaaa aaaaggcgat tgtggacctc cttttcaaga cgaaccgtaa agttactgtg 1800
aagcaactga aagaggatta ctttaagaaa attgagtgct tcgacagtgt ggagatttcc 1860
ggtgtcgagg accggtttaa cgccagcctg ggtacgtatc atgacctgct taaaattatc 1920
aaggataaag atttcctgga taatgaagag aacgaagata tactggagga cattgtgttg 1980
actttgaccc tcttcgagga cagagagatg attgaggaaa gactgaagac ctacgcacac 2040
ctttttgatg acaaggtcat gaaacaactc aagcgccggc gctatactgg ctggggccgg 2100
ctttctcgca agctcatcaa tgggattcgg gataagcaat caggcaagac aattttggac 2160
ttcctcaaat ccgacggatt cgcaaatagg aattttatgc agctgataca tgacgactct 2220
ttgacattca aagaagacat acagaaggct caggtcagcg gccaaggaga ttctttgcac 2280
gagcatatcg ctaacttggc aggtagcccc gccataaaaa agggcattct tcaaacggta 2340
aaagttgttg acgaactcgt gaaggttatg ggccgtcata agccggaaaa cattgttatt 2400
gaaatggcta gggaaaatca gacgacccag aagggacaga aaaatagcag ggagcggatg 2460
aagagaattg aagagggaat taaggagctt ggatctcaga ttcttaagga gcaccctgtg 2520
gagaacaccc aacttcagaa tgaaaagctc tacctttact accttcaaaa cggccgggat 2580
atgtacgtcg atcaggaact tgacattaac cggttgagcg attatgacgt tgaccatatt 2640
gtgccccaat ctttccttaa agacgactct atcgacaata aagtgctgac gcgcagcgat 2700
aaaaatcgcg gtaagtcgga taatgtcccg tcggaagagg tggttaaaaa aatgaagaac 2760
tattggaggc aactcctgaa tgccaagctg atcactcaga ggaaattcga caatctcacc 2820
aaggcagaaa ggggtggact tagcgagctc gacaaggccg gttttatcaa aagacagctg 2880
gtggagacac gccaaatcac caaacacgtt gcccagatcc tggattcgag gatgaacacg 2940
aagtatgacg agaacgacaa gttgattagg gaagtcaagg tcatcacttt gaagtccaag 3000
ctggtgagcg actttcgcaa agacttccag ttttacaaag tcagggaaat taataactac 3060
caccacgccc acgacgccta ccttaacgcc gtggttggca cagcactcat caagaaatac 3120
cctaagctcg aatctgagtt cgtctatggc gactataagg tctacgacgt tagaaaaatg 3180
atcgcgaaat ctgagcagga aataggcaag gcaactgcca agtacttctt ctattccaat 3240
atcatgaact tttttaagac ggagattacc ctggcgaatg gtgagatccg caagcgccct 3300
ttgattgaga caaacggaga aacaggagag atcgtatggg acaaagggcg ggactttgct 3360
actgttagga aggtgctctc tatgccacaa gttaacattg tcaaaaaaac tgaagtgcag 3420
acaggtgggt ttagcaagga atctatccgc ccgaagagga actctgacaa gctgatcgcc 3480
cgcaagaaag attgggaccc gaaaaagtac ggaggattcg tttcccccac agttgcgtac 3540
tccgtgcttg tcgtggccaa agtggagaag ggcaagtcta agaagctcaa gagcgtcaaa 3600
gagttgttgg ggatcacgat tatggagcgg tcgtctttcg aaaagaatcc gatagatttt 3660
ctcgaggcca agggttataa agaagtcaag aaggatctta tcatcaagct ccctaagtac 3720
tccctctttg agcttgaaaa cggacggaaa agaatgctgg cttcagcgcg ctttcttcag 3780
aagggtaatg aactcgctct gccctcaaaa tatgtgaatt tcctttacct ggcatcacac 3840
tatgagaagc ttaagggttc tccagaggac aacgagcaga agcaactgtt cgttgaacaa 3900
cacaagcact accttgacga gattatcgag caaatcagcg agtttagcaa gcgcgttata 3960
ctggcagacg caaatcttga taaggtcctt agcgcctaca acaagcatag agacaaaccc 4020
atccgggagc aggccgagaa cattattcat ctcttcacct tgacgaatct tggggccccg 4080
cgcgcgttca agtacttcga tactaccata gacagaaagg tctatcgctc gacaaaggaa 4140
gttcttgacg ccacgctgat ccaccaaagt ataacaggcc tctatgagac acgcatcgac 4200
ctttcgcagt tgggcggtga ccgccccaaa aagaagagga aagttggcgg gtga 4254
<210> 6
<211> 633
<212> DNA
<213> Artificial sequence (non)
<400> 6
atggatagcc ttctcatgaa cagaagagag tttctctatc agtttaaaaa tgttcggtgg 60
gcgaagggga ggagagagac atatctctgc tatgttgtta agcggagaga ttctgcgacc 120
tcattctcac tcgattttgg ttatttgagg aacaagaatg gatgtcatgt cgaattgttg 180
tttctccggt atatttccga ctgggatttg gacccagggc ggtgttaccg ggtcacatgg 240
tttatttcct ggagtccatg ttacgactgt gcgcgccatg tcgccgactt cctcaggggt 300
aatcctaact tgtccttgcg gatttttaca gccagactct atttctgtga ggatcggaag 360
gcggaacccg aggggctgag aagactgcac cgcgctggcg tccaaatcgc catcatgact 420
tttaaggatt atttctactg ttggaacacg ttcgtcgaga accacggtcg gaccttcaaa 480
gcctgggaag ggctgcatga aaattccgtg aggttgtccc ggcaactccg cagaatactc 540
ctgccccttt atgaggtcga cgatctcaga gacgccttta gaactagcgg aagcgagacg 600
ccagggactt ctgaatcggc cacccccgag agc 633
<210> 7
<211> 273
<212> DNA
<213> Artificial sequence (non)
<400> 7
tccggcggaa gtacaaacct ttcagacatt atagaaaagg aaaccggcaa gcaactcgtc 60
atccaggaat ccatacttat gctccctgaa gaggtggaag aagtgatcgg taataaacca 120
gagagcgaca tacttgtcca caccgcttat gacgaaagta cagacgaaaa cgtcatgctt 180
ctgacgagtg atgcccccga atacaaacct tgggcgctcg tcatccagga ttccaatggg 240
gagaataaaa taaagatgct ctctggaggc agc 273
<210> 8
<211> 1191
<212> DNA
<213> Artificial sequence (non)
<400> 8
atgtccgaag tggaatttag ccatgaatat tggatgcggc acgccctcac gcttgccaag 60
agagcctggg atgagaggga ggttcccgtc ggtgccgtgt tggtccataa caacagggtg 120
attggggaag gatggaacag acccattggg cgccatgatc caactgccca tgcagagatt 180
atggcgctca ggcaaggggg gttggttatg caaaactacc ggcttattga cgcaaccctg 240
tatgtcaccc ttgaaccctg tgttatgtgc gcgggggcca tgatacactc tcggataggg 300
cgggtggtgt tcggggctcg ggatgctaag accggagctg ctggttccct catggatgtc 360
ttgcatcatc ctggtatgaa ccatagagtc gagattactg aaggcattct cgcagacgaa 420
tgcgctgccc ttctctcaga tttctttaga atgcgcagac aggaaataaa ggctcaaaaa 480
aaagcacaga gttccacgga ttccggcggg tcgagcggtg gcagctccgg ctccgagaca 540
cccggtacga gtgaatccgc tacgcccgaa tcctcggggg gaagctctgg aggctcatca 600
gaagtcgagt tctcccatga gtattggatg aggcacgccc tcactcttgc gaagagggcc 660
agggacgaga gggaggtgcc ggtcggtgct gtcctggtct tgaataacag ggtgataggc 720
gaaggttgga acagggctat tggccttcat gaccctactg ctcatgcgga aatcatggca 780
cttagacagg ggggcctcgt tatgcaaaat taccgcctga tcgacgccac tctttatgtc 840
acatttgaac catgtgttat gtgtgcgggc gctatgatcc attcacgcat aggtcgcgtg 900
gtttttggag ttcgcaacgc gaaaacaggg gctgcaggct ctctgatgga cgttttgcac 960
tatccgggaa tgaaccatag agtcgaaatc acagaaggga ttttggcaga cgaatgcgcg 1020
gctcttcttt gttatttttt cagaatgccc cgccaagtgt ttaatgctca aaagaaagcg 1080
cagagtagca cagactcggg gggatcttct gggggctcgt ctggttccga gactcccgga 1140
acttccgagt cggcaacacc tgaatcctcc ggcggctctt cgggcggatc t 1191
<210> 9
<211> 76
<212> DNA
<213> Artificial sequence (non)
<400> 9
gttttagagc tagaaatagc aagttaaaat aaggctagtc cgttatcaac ttgaaaaagt 60
ggcaccgagt cggtgc 76
<210> 10
<211> 25
<212> DNA
<213> Artificial sequence (non)
<400> 10
tgtgtagaga ccaaaggagg tctca 25
<210> 11
<211> 41
<212> DNA
<213> Artificial sequence (non)
<400> 11
tgttggctag gatccatcgc agtcagcgat gagtacagca a 41
<210> 12
<211> 1765
<212> DNA
<213> Artificial sequence (non)
<400> 12
gcagcgtgac ccggtcgtgc ccctctctag agataatgag cattgcatgt ctaagttata 60
aaaaattacc acatattttt tttgtcacac ttgtttgaag tgcagtttat ctatctttat 120
acatatattt aaactttact ctacgaataa tataatctat agtactacaa taatatcagt 180
gttttagaga atcatataaa tgaacagtta gacatggtct aaaggacaat tgagtatttt 240
gacaacagga ctctacagtt ttatcttttt agtgtgcatg tgttctcctt tttttttgca 300
aatagcttca cctatataat acttcatcca ttttattagt acatccattt agggtttagg 360
gttaatggtt tttatagact aattttttta gtacatctat tttattctat tttagcctct 420
aaattaagaa aactaaaact ctattttagt ttttttattt aataatttag atataaaata 480
gaataaaata aagtgactaa aaattaaaca aatacccttt aagaaattaa aaaaactaag 540
gaaacatttt tcttgtttcg agtagataat gccagcctgt taaacgccgt cgacgagtct 600
aacggacacc aaccagcgaa ccagcagcgt cgcgtcgggc caagcgaagc agacggcacg 660
gcatctctgt cgctgcctct ggacccctct cgagagttcc gctccaccgt tggacttgct 720
ccgctgtcgg catccagaaa ttgcgtggcg gagcggcaga cgtgagccgg cacggcaggc 780
ggcctcctcc tcctctcacg gcacggcagc tacgggggat tcctttccca ccgctccttc 840
gctttccctt cctcgcccgc cgtaataaat agacaccccc tccacaccct ctttccccaa 900
cctcgtgttg ttcggagcgc acacacacac aaccagatct cccccaaatc cacccgtcgg 960
cacctccgct tcaaggtacg ccgctcgtcc tccccccccc cccctctcta ccttctctag 1020
atcggcgttc cggtccatgg ttagggcccg gtagttctac ttctgttcat gtttgtgtta 1080
gatccgtgtt tgtgttagat ccgtgctgct agcgttcgta cacggatgcg acctgtacgt 1140
cagacacgtt ctgattgcta acttgccagt gtttctcttt ggggaatcct gggatggctc 1200
tagccgttcc gcagacggga tcgatttcat gatttttttt gtttcgttgc atagggtttg 1260
gtttgccctt ttcctttatt tcaatatatg ccgtgcactt gtttgtcggg tcatcttttc 1320
atgctttttt tttgtcttgg ttgtgatgat gtggtgtggt tgggcggtcg ttcattcgtt 1380
ctagatcgga gtagaatact gtttcaaact acctggtgta tttattaatt ttggaactgt 1440
atgtgtgtgt catacatctt catagttacg agtttaagat ggatggaaat atcgatctag 1500
gataggtata catgttgatg tgggttttac tgatgcatat acatgatggc atatgcagca 1560
tctattcata tgctctaacc ttgagtacct atctattata ataaacaagt atgttttata 1620
attattttga tcttgatata cttggatgat ggcatatgca gcagctatat gtggattttt 1680
ttagccctgc cttcatacgc tatttatttg cttggtactg tttcttttgt cgatgctcac 1740
cctgttgttt ggtgttactt ctgca 1765
<210> 13
<211> 322
<212> DNA
<213> Artificial sequence (non)
<400> 13
aagaacgaac taagccggac aaaaaaagga gcacatatac aaaccggttt tattcatgaa 60
tggtcacgat ggatgatggg gctcagactt gagctacgag gccgcaggcg agagaagcct 120
agtgtgctct ctgcttgttt gggccgtaac ggaggatacg gccgacgagc gtgtactacc 180
gcgcgggatg ccgctgggcg ctgcgggggc cgttggatgg ggatcggtgg gtcgcgggag 240
cgttgagggg agacaggttt agtaccacct cgcctaccga acaatgaaga acccacctta 300
taaccccgcg cgctgccgct tg 322
<210> 14
<211> 253
<212> DNA
<213> Artificial sequence (non)
<400> 14
gatcgttcaa acatttggca ataaagtttc ttaagattga atcctgttgc cggtcttgcg 60
atgattatca tataatttct gttgaattac gttaagcatg taataattaa catgtaatgc 120
atgacgttat ttatgagatg ggtttttatg attagagtcc cgcaattata catttaatac 180
gcgatagaaa acaaaatata gcgcgcaaac taggataaat tatcgcgcgc ggtgtcatct 240
atgttactag atc 253
<210> 15
<211> 8
<212> DNA
<213> Artificial sequence (non)
<400> 15
tttttttt 8
<210> 16
<211> 22
<212> DNA
<213> Artificial sequence (non)
<400> 16
ggccttcctt gggatccggc ga 22
<210> 17
<211> 23
<212> DNA
<213> Artificial sequence (non)
<400> 17
gtgcactgcc aagctcacac tgc 23
<210> 18
<211> 22
<212> DNA
<213> Artificial sequence (non)
<400> 18
agatctagag gttggtctac gt 22
<210> 19
<211> 20
<212> DNA
<213> Artificial sequence (non)
<400> 19
agcaagtccg attgaatact 20
<210> 20
<211> 18
<212> DNA
<213> Artificial sequence (non)
<400> 20
tccggcggaa gtacaaac 18
<210> 21
<211> 19
<212> DNA
<213> Artificial sequence (non)
<400> 21
attgggacaa actctgtgg 19
<210> 22
<211> 18
<212> DNA
<213> Artificial sequence (non)
<400> 22
gtcaccgccc aactgcga 18
<210> 23
<211> 18
<212> DNA
<213> Artificial sequence (non)
<400> 23
gcgcgcttgg cgtaatca 18
<210> 24
<211> 27
<212> DNA
<213> Artificial sequence (non)
<400> 24
agccagacca attgagtatt ttttgtc 27
<210> 25
<211> 18
<212> DNA
<213> Artificial sequence (non)
<400> 25
actagttcac ccgccaac 18
<210> 26
<211> 23
<212> DNA
<213> Artificial sequence (non)
<400> 26
tgttggcctt ccttgggatc cgg 23
<210> 27
<211> 23
<212> DNA
<213> Artificial sequence (non)
<400> 27
aaacccggat cccaaggaag gcc 23
<210> 28
<211> 20
<212> DNA
<213> Artificial sequence (non)
<400> 28
gacgtctacg cctttggtgt 20
<210> 29
<211> 19
<212> DNA
<213> Artificial sequence (non)
<400> 29
tcagctgcaa aatgcaatg 19
<210> 30
<211> 24
<212> DNA
<213> Artificial sequence (non)
<400> 30
gtgtgtgcac tgccaagctc acac 24
<210> 31
<211> 22
<212> DNA
<213> Artificial sequence (non)
<400> 31
aaacgtgtga ttggcagtgc ac 22
<210> 32
<211> 20
<212> DNA
<213> Artificial sequence (non)
<400> 32
tccatggcct tgttcctctt 20
<210> 33
<211> 20
<212> DNA
<213> Artificial sequence (non)
<400> 33
tgctgctaga cacatccaca 20
<210> 34
<211> 24
<212> DNA
<213> Artificial sequence (non)
<400> 34
tgttgagatc tagaggttgg tcta 24
<210> 35
<211> 24
<212> DNA
<213> Artificial sequence (non)
<400> 35
aaactagacc aacctctaga tctc 24
<210> 36
<211> 20
<212> DNA
<213> Artificial sequence (non)
<400> 36
cactagcacg acgcactttc 20
<210> 37
<211> 20
<212> DNA
<213> Artificial sequence (non)
<400> 37
cagaacacgc gcatcatatc 20
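
Each entry in the listing above declares its length under <211> and gives the residues under <400>, in groups of ten with a running count at the end of each line. The following is a minimal Python sketch for cross-checking the declared lengths against the residues actually listed. It is an illustration only and not part of the patent: the function name `parse_entries` is hypothetical, and the parser assumes it is given listing text only (no claims or other prose).

```python
import re


def parse_entries(text):
    """Return {seq_id: (declared_length, residues)} from ST.25-style text.

    Assumes every entry starts with a <210> line and that all lines after
    <400> up to the next <210> carry residues plus a running count.
    """
    entries = {}
    seq_id, declared, residues, in_seq = None, None, [], False
    for raw in text.splitlines():
        line = raw.strip()
        tag = re.match(r"<(\d{3})>\s*(.*)", line)
        if tag:
            code, value = tag.groups()
            if code == "210":
                if seq_id is not None:
                    entries[seq_id] = (declared, "".join(residues))
                seq_id, declared = int(value), None
                residues, in_seq = [], False
            elif code == "211":
                declared = int(value)
            elif code == "400":
                in_seq = True
            continue
        if in_seq and seq_id is not None:
            # Drop spaces and the trailing running count, keep letters only.
            residues.append(re.sub(r"[^a-z]", "", line.lower()))
    if seq_id is not None:
        entries[seq_id] = (declared, "".join(residues))
    return entries


# Two short entries copied from the listing above (SEQ ID No. 10 and 15).
sample = """<210> 10
<211> 25
<212> DNA
<400> 10
tgtgtagaga ccaaaggagg tctca 25
<210> 15
<211> 8
<212> DNA
<400> 15
tttttttt 8
"""

for seq_id, (declared, seq) in parse_entries(sample).items():
    print(seq_id, declared, len(seq), declared == len(seq))
```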

Claims (18)

1. An artificial gene editing system for rice, the artificial gene editing system comprising:
an I regulatory element, which is a nucleotide sequence encoding amino acid sequence I; wherein amino acid sequence I is the amino acid sequence shown in SEQ ID No. 1;
a II regulatory element, which is a II-1 nucleotide sequence and a II-2 nucleotide sequence connected in series from the 5' end to the 3' end; the II-1 nucleotide sequence comprises a target nucleotide sequence; the target site in the target nucleotide sequence is located at any one of positions 3 to 5 counted from the 3' end toward the 5' end of the target nucleotide sequence; the target nucleotide sequence is derived from the rice genome and contains the target site to be mutated in the rice genome; the II-2 nucleotide sequence is derived from Streptococcus pyogenes and is shown in SEQ ID No. 9; the II-1 nucleotide sequence and the II-2 nucleotide sequence are transcriptionally fused, and the resulting product can guide the protein encoded by the I regulatory element to the target site to be mutated in the rice genome and mutate the base at that target site;
when there are a plurality of II regulatory elements, the II-1 nucleotide sequences contained in the respective II regulatory elements differ from one another;
the target nucleotide sequence is determined by:
1) Determining a nucleotide sequence to be modified on a rice genome;
2) Judging that the nucleotide sequence to be modified determined in step 1) is a specific sequence in the genome,
judging, according to the I regulatory element, whether the change caused by mutating the base at the nucleotide site to be mutated meets expectations; or judging, according to the I regulatory element, whether the change caused by mutating the reverse-complementary base of the nucleotide site to be mutated meets expectations;
where expectations are met, the nucleotide site to be mutated is a potential target site;
3) Screening for a target sequence in the nucleotide sequence to be modified or its reverse complement: searching in the 3' direction from a potential target site to confirm that a recognition motif recognized by the amino acid sequence encoded by the I regulatory element is present and that the target site lies at position -3 to -5 upstream of the 5' end of the recognition motif, whereby the 17 to 21 nucleotides upstream of the 5' end of the recognition motif are determined to be the target nucleotide sequence; the recognition motif is 5'-N1GN2-3', wherein N1 and N2 are each independently A, G, C or T.
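
Step 3) of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentees' implementation: the function names, the 0-based `site` index, and the example sequence are all hypothetical. Position -1 is taken to be the base immediately 5' of the N1GN2 motif, so a site at position -3 to -5 sits 3 to 5 bases upstream of the motif's first base.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_targets(seq, site, min_len=17, max_len=21):
    """List (target, motif, offset) candidates on one strand.

    seq  : DNA string read 5'->3'
    site : 0-based index of the base to be mutated (assumed convention)
    A candidate is kept when a 5'-NGN-3' motif starts 3, 4 or 5 bases
    downstream of the site; the target is the 17-21 nt directly upstream
    of that motif.
    """
    seq = seq.upper()
    hits = []
    for offset in (3, 4, 5):
        motif_start = site + offset
        motif = seq[motif_start:motif_start + 3]
        if len(motif) != 3 or motif[1] != "G" or not set(motif) <= set("ACGT"):
            continue
        for length in range(min_len, max_len + 1):
            start = motif_start - length
            if start >= 0:  # target must fit inside the given sequence
                hits.append((seq[start:motif_start], motif, offset))
    return hits


# Made-up example: a 20-nt stretch followed by the motif AGC.
protospacer = "GCTAGCTAGGATCCATCGAT"
seq = protospacer + "AGC" + "TT"
for target, motif, offset in candidate_targets(seq, site=16):
    print(len(target), target, motif, -offset)
```

To screen the opposite strand, as the claim allows, the same function can be called as `candidate_targets(revcomp(seq), len(seq) - 1 - site)`.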
2. The artificial gene editing system of claim 1, wherein the nucleotide sequence of the I-th regulatory element is a nucleotide sequence suitable for expression in rice and the nucleotide sequence of the II-th regulatory element is a nucleotide sequence suitable for transcription in rice.
3. The artificial gene editing system according to claim 2, wherein the nucleotide coding sequence encoding the amino acid sequence shown as SEQ ID No.1 is shown as SEQ ID No. 5.
4. The artificial gene editing system according to claim 1, wherein the 3' end of the II-1 nucleotide sequence further comprises a cloning site containing a cleavage site for a type IIS restriction enzyme; the target nucleotide sequence is cloned into the II-1 nucleotide sequence via this cloning site, such that the II-1 nucleotide sequence is transcriptionally fused with the II-2 nucleotide sequence;
when there are a plurality of II regulatory elements, the type IIS restriction enzyme cleavage sites used for cloning the different target nucleotide sequences differ from one another.
5. The artificial gene editing system according to claim 1, wherein the target nucleotide sequence is the 17 to 21 nucleotides upstream of the 5' end of the recognition motif, and nucleotide sequences containing five consecutive T's are excluded.
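
The exclusion in claim 5 amounts to a simple filter over candidate target sequences. The sketch below is illustrative only; the function names and example sequences are hypothetical. A run of T's is known to act as a termination signal for RNA polymerase III, which is presumably why such targets are excluded (the second terminator of claim 10, SEQ ID No. 15, is itself a run of eight T's).

```python
def has_t_run(target, run=5):
    """True if the target contains `run` or more consecutive T's."""
    return "T" * run in target.upper()


def filter_targets(targets, run=5):
    """Keep only targets without a run of `run` consecutive T's."""
    return [t for t in targets if not has_t_run(t, run)]


# Made-up candidates: the first contains TTTTT and is dropped.
candidates = ["ACGTTTTTACGTACGTACG", "ACGTACGTACGTACGTACG"]
print(filter_targets(candidates))
```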
6. The artificial gene editing system of claim 1, further comprising a first promoter for use in rice, located at the 5' end of the I regulatory element, for initiating transcription of the I regulatory element; and/or the artificial gene editing system further comprises a second promoter for use in rice, located at the 5' end of the II regulatory element, for initiating transcription of the II regulatory element.
7. The artificial gene editing system of claim 6, wherein the first promoter is an RNA polymerase II promoter; and/or the second promoter is an RNA polymerase III promoter.
8. The artificial gene editing system of claim 6, wherein the sequence of the first promoter is SEQ ID No.12; and/or the sequence of the second promoter is SEQ ID No.13.
9. The artificial gene editing system of any of claims 1 to 8, further comprising a first terminator at the 3' end of the I regulatory element that terminates transcription of the I regulatory element; and/or the artificial gene editing system further comprises a second terminator at the 3' end of the II regulatory element that terminates transcription of the II regulatory element.
10. The artificial gene editing system of claim 9, wherein the sequence of the first terminator is SEQ ID No.14; and/or the sequence of the second terminator is SEQ ID No.15.
11. The artificial gene editing system of claim 1, wherein the I-th regulatory element and the II-th regulatory element are cloned into at least one vector.
12. The artificial gene editing system of claim 1, wherein the I-th regulatory element is cloned into pCAMBIA1300 and the II-th regulatory element is cloned into entry vector pENTR 4.
13. The artificial gene editing system of claim 9, wherein the first promoter, the I-th regulatory element and the first terminator are cloned into a pCAMBIA1300 vector.
14. The artificial gene editing system of claim 9, wherein the second promoter, the II regulatory element and the second terminator are cloned into a pENTR4 vector.
15. The artificial gene editing system of claim 1, wherein the I-th regulatory element and the II-th regulatory element are integrated on the same vector, or are distributed across two vectors that are used together.
16. Use of the artificial gene editing system of any of claims 1 to 15 for rice genome mutation.
17. A method for mutating a rice genome, comprising the steps of:
1) Introducing the artificial gene editing system according to any one of claims 1 to 15 into rice callus or rice protoplasts by Agrobacterium-mediated transformation, gene gun bombardment, or PEG-mediated transformation, and then culturing to obtain rice plants;
2) Screening to obtain rice plants containing the required mutation.
18. The method of claim 17, wherein said rice plant produces a rice seed comprising said mutation.
CN202111388744.1A 2018-11-07 2018-11-07 Artificial gene editing system for rice Active CN113913454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111388744.1A CN113913454B (en) 2018-11-07 2018-11-07 Artificial gene editing system for rice

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111388744.1A CN113913454B (en) 2018-11-07 2018-11-07 Artificial gene editing system for rice
CN201811320030.5A CN109321593B (en) 2018-11-07 2018-11-07 Artificial gene editing system for rice

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811320030.5A Division CN109321593B (en) 2018-11-07 2018-11-07 Artificial gene editing system for rice

Publications (2)

Publication Number Publication Date
CN113913454A CN113913454A (en) 2022-01-11
CN113913454B true CN113913454B (en) 2023-07-21

Family

ID=65261106

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202111388739.0A Active CN114045303B (en) 2018-11-07 2018-11-07 Artificial gene editing system for rice
CN202111388744.1A Active CN113913454B (en) 2018-11-07 2018-11-07 Artificial gene editing system for rice
CN201811320030.5A Active CN109321593B (en) 2018-11-07 2018-11-07 Artificial gene editing system for rice

Country Status (1)

Country Link
CN (3) CN114045303B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021056302A1 (en) * 2019-09-26 2021-04-01 Syngenta Crop Protection Ag Methods and compositions for dna base editing
CN110760540A (en) * 2019-11-29 2020-02-07 中国农业科学院植物保护研究所 Gene editing artificial system for rice and application thereof
CN111100852B (en) * 2019-12-16 2021-04-13 中国农业科学院植物保护研究所 Directional mutation method of OsALS1 and crop endogenous gene directed evolution method
CN117402855B (en) * 2023-12-14 2024-03-19 中国农业科学院植物保护研究所 Cas protein, gene editing system and application

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107043779A (en) * 2016-12-01 2017-08-15 中国农业科学院作物科学研究所 A kind of fixed point base of CRISPR/nCas9 mediations replaces the application in plant
CN107177625A (en) * 2017-05-26 2017-09-19 中国农业科学院植物保护研究所 The artificial carrier's system and directed mutagenesis method of a kind of rite-directed mutagenesis
CN108034671A (en) * 2017-12-08 2018-05-15 中国农业科学院植物保护研究所 One plasmid vector and establish the method for plant population using it

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014194190A1 (en) * 2013-05-30 2014-12-04 The Penn State Research Foundation Gene targeting and genetic modification of plants via rna-guided genome editing
EP3207139A1 (en) * 2014-10-17 2017-08-23 The Penn State Research Foundation Methods and compositions for multiplex rna guided genome editing and other rna technologies
US9982279B1 (en) * 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
US10011849B1 (en) * 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases

Also Published As

Publication number Publication date
CN109321593A (en) 2019-02-12
CN113913454A (en) 2022-01-11
CN114045303A (en) 2022-02-15
CN114045303B (en) 2023-08-29
CN109321593B (en) 2022-01-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant