CN110628794B - Cell enrichment technology for C-to-T base substitution using an inactivated screening agent resistance gene as a reporter system, and application thereof

Publication number
CN110628794B
Authority
CN
China
Prior art keywords
sgrna
organism
resistance gene
sequence
target sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910938668.3A
Other languages
Chinese (zh)
Other versions
CN110628794A (en
Inventor
徐雯
杨进孝
张成伟
赵思
冯峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Beijing Academy of Agriculture and Forestry Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Academy of Agriculture and Forestry Sciences
Priority to CN201910938668.3A
Publication of CN110628794A
Application granted
Publication of CN110628794B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/65 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression using markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66 General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8209 Selection, visualisation of transformants, reporter constructs, e.g. antibiotic resistance markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216 Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • C12N15/8218 Antisense, co-suppression, viral induced gene silencing [VIGS], post-transcriptional induced gene silencing [PTGS]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses

Abstract

The invention discloses a cell enrichment technology for C-to-T base substitution that uses an inactivated screening agent resistance gene as a reporter system, and an application thereof. The cell enrichment technology comprises the following reagents: an sgRNA, a C-to-T base substitution system, and a loss-of-function screening agent resistance gene. The sgRNA consists of a tRNA-sgRNA targeting a target sequence of the target gene and a tRNA-sgRNA targeting a target sequence of the loss-of-function screening agent resistance gene. Under the guidance of the tRNA-sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the C-to-T base substitution system performs C-to-T base substitution on that target sequence and thereby restores the function of the gene. The invention enriches C-to-T base-substituted cells at the cellular level and greatly improves C-to-T base substitution efficiency.

Description

Cell enrichment technology for C-to-T base substitution using an inactivated screening agent resistance gene as a reporter system, and application thereof
Technical Field
The invention relates to the field of biotechnology, and in particular to a cell enrichment technology for C-to-T base substitution that uses an inactivated screening agent resistance gene as a reporter system, and an application thereof.
Background
CRISPR-Cas9 has become a powerful genome editing tool and is widely applied in many tissues and cells. The CRISPR/Cas9 protein-RNA complex is directed to the target site by a guide RNA and cleaves it to generate a DNA double-strand break (DSB); the organism then initiates a DNA repair mechanism to repair the DSB. There are generally two repair mechanisms: non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ usually dominates, so repair produces random indels (insertions or deletions) far more often than precise repair. Precise base substitution through HDR is therefore greatly limited, because HDR is inefficient and requires a DNA template.
In 2016, the laboratories of David Liu and Akihiko Kondo independently reported two different types of cytosine base editors (CBEs), using two different cytidine deaminases: rAPOBEC1 (rat APOBEC1) and PmCDA1 (an ortholog of activation-induced cytidine deaminase, AID). The principle is that the cytidine deaminase edits a single cytosine (C) directly, without generating a DSB or initiating HDR, which greatly improves the efficiency of replacing C with thymine (T). Specifically, dead Cas9 (dCas9) or the Cas9 nickase (Cas9n), together with rAPOBEC1 or PmCDA1, is directed to the target site by the sgRNA; rAPOBEC1 or PmCDA1 catalyzes the deamination of C on the unpaired single-stranded DNA to uracil (U); U is paired with adenine (A) through DNA repair and is finally replaced by T through DNA replication, thereby achieving C-to-T conversion. Among the editors tested, the SpCas9n (D10A) & rAPOBEC1/PmCDA1 & UGI base editing system (UGI: uracil DNA glycosylase inhibitor) had the higher mean mutation rate, for two reasons: first, UGI inhibits uracil DNA glycosylase (UDG) from excising U from DNA; second, SpCas9n (D10A) nicks the non-edited strand, inducing the eukaryotic mismatch repair mechanism or the long-patch base excision repair (BER) mechanism, which preferentially repairs the U:G mismatch to U:A.
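The net effect of the deamination step described above can be illustrated as a string operation on the edited strand. This is a minimal sketch, not the editors' actual behavior: the function name `simulate_c_to_t`, the example protospacer, and the default editing window (positions 4-8) are all illustrative assumptions that do not come from this document.

```python
def simulate_c_to_t(protospacer: str, window: tuple = (4, 8)) -> str:
    """Return the protospacer with every C inside the window changed to T.

    Positions are 1-based and inclusive, counted from the PAM-distal end.
    The default window is an illustrative assumption, not a value from the text.
    """
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        # C is deaminated to U; after repair and replication it is read as T.
        if base == "C" and start <= pos <= end:
            edited.append("T")
        else:
            edited.append(base)
    return "".join(edited)


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer, used only for illustration.
    print(simulate_c_to_t("GGCACGTTCAGCATTGCCAA"))
```

Cytosines outside the window are left unchanged, which is why the position of the target C within the protospacer matters when a target sequence is designed.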
At present, research on enriching C-to-T base-substituted cells in plants by reporter-gene-mediated cell enrichment is very limited, and there is no report of using the selection marker employed during transformation to enrich C-to-T base-substituted cells at the cellular level and improve C-to-T base substitution efficiency.
Disclosure of Invention
The invention aims to provide a cell enrichment technology for C-to-T base substitution that uses an inactivated screening agent resistance gene as a reporter system, which can enrich C-to-T base-substituted cells at the cellular level and thereby improve the C-to-T base substitution efficiency at the target site.
To achieve the above object, the invention first provides a kit comprising an sgRNA or a biological material related to the sgRNA, a C-to-T base substitution system, and a loss-of-function screening agent resistance gene or a biological material related to the loss-of-function screening agent resistance gene;
the sgRNA consists of an sgRNA targeting a target sequence of the target gene and an sgRNA targeting a target sequence of the loss-of-function screening agent resistance gene;
the sgRNA has the following structure: RNA transcribed from the target sequence, followed by the sgRNA backbone;
the C-to-T base substitution system comprises a Cas9 nuclease or a biological material related to the Cas9 nuclease, and a cytosine deaminase or a biological material related to the cytosine deaminase;
under the guidance of the sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the C-to-T base substitution system performs C-to-T base substitution on that target sequence and thereby restores the function of the loss-of-function screening agent resistance gene;
the sgRNA backbone is S1) or S2) or S3):
S1) the RNA molecule obtained by replacing T with U in positions 571-646 of sequence 1;
S2) an RNA molecule obtained by substituting and/or deleting and/or adding one or more nucleotides in the RNA molecule of S1) and having the same function;
S3) an RNA molecule having 75% or more identity with S1) or S2) and having the same function.
In the kit, the sgRNA may specifically be a tRNA-sgRNA; the tRNA-sgRNA consists of a tRNA-sgRNA targeting a target sequence of the target gene and a tRNA-sgRNA targeting a target sequence of the loss-of-function screening agent resistance gene;
the tRNA-sgRNA has the following structure: tRNA, followed by RNA transcribed from the target sequence, followed by the sgRNA backbone;
the tRNA is R1) or R2) or R3):
R1) the RNA molecule obtained by replacing T with U in positions 474-550 of sequence 1;
R2) an RNA molecule obtained by substituting and/or deleting and/or adding one or more nucleotides in the RNA molecule of R1) and having the same function;
R3) an RNA molecule having 75% or more identity with R1) or R2) and having the same function.
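The tRNA-sgRNA layout and the "replace T with U" step can be sketched as simple string handling. This is a minimal sketch under stated assumptions: the helper names (`region`, `to_rna`, `trna_sgrna`) are hypothetical, and the sequences in the demo are short placeholders, because sequence 1 is not reproduced in this section. Coordinates are 1-based and inclusive, as in the text (for example, positions 474-550 and 571-646 of sequence 1).

```python
def region(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates out of range")
    return seq[start - 1:end]


def to_rna(dna: str) -> str:
    """Replace every T with U to obtain the RNA form of a DNA sequence."""
    return dna.upper().replace("T", "U")


def trna_sgrna(trna_dna: str, target_dna: str, backbone_dna: str) -> str:
    """Concatenate tRNA, target-derived RNA and sgRNA backbone, in that order."""
    return to_rna(trna_dna) + to_rna(target_dna) + to_rna(backbone_dna)


if __name__ == "__main__":
    # Placeholder sequences, for illustration only.
    toy_sequence_1 = "AACCGGTTAACCGGTT"
    print(region(toy_sequence_1, 3, 6))
    print(trna_sgrna("AACT", "GGTC", "GTTT"))
```

With the real sequence 1 in hand, `region(sequence_1, 474, 550)` and `region(sequence_1, 571, 646)` would yield the DNA forms of the tRNA and the sgRNA backbone described in R1) and S1).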
In the kit, the sgRNA may target one, two, or more target sequences of the target gene, and one, two, or more target sequences of the loss-of-function screening agent resistance gene. The target sequence may be 15-25 bp long, further 18-22 bp, and further 20 bp.
The loss-of-function screening agent resistance gene satisfies the following conditions: its function or activity is lost, and its function can be restored after C-to-T base substitution is performed on its target sequence. The target sequence of the loss-of-function screening agent resistance gene may be a target sequence located within the gene itself, or a target sequence additionally added inside the gene or at its 5' end or 3' end. When a target sequence (denoted the surrogate target sequence) is additionally added so that the gene can recover its function after C-to-T base substitution, the sequence of the loss-of-function screening agent resistance gene includes not only the loss-of-function screening agent resistance gene itself but also the surrogate target sequence and, if necessary, one, two, or more bases added to ensure that the screening agent resistance gene is translated in the normal reading frame after the surrogate target sequence is added.
Further, the loss-of-function screening agent resistance gene may be a sequence obtained by deleting the initiation codon (e.g., ATG) of the screening agent resistance gene and adding a surrogate target sequence to its 5' end. The surrogate target sequence satisfies the following condition: C-to-T base substitution of the surrogate target sequence by the C-to-T base substitution system restores the function of the loss-of-function screening agent resistance gene. The surrogate target sequence consists, in order, of the target sequence of the loss-of-function screening agent resistance gene and a PAM sequence. It should be noted that, to ensure that the screening agent resistance gene lacking its start codon is translated in the normal reading frame after the surrogate target sequence is added, one, two, or more bases may be added between the surrogate target sequence and the screening agent resistance gene lacking its start codon.
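The construction just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the gene and surrogate sequences in the demo are invented placeholders rather than the sequences of the invention, and the check only covers start-codon formation and reading frame (it does not look for premature stop codons).

```python
def build_inactive_gene(gene: str, surrogate: str, padding: str = "") -> str:
    """Remove the ATG start codon and prepend surrogate target + padding bases."""
    gene = gene.upper()
    if not gene.startswith("ATG"):
        raise ValueError("gene must start with ATG")
    return surrogate.upper() + padding.upper() + gene[3:]


def edit_restores_start(inactive: str, c_pos: int) -> bool:
    """True if changing the C at 1-based position c_pos to T creates an in-frame ATG."""
    i = c_pos - 1
    if i < 1 or i + 1 >= len(inactive) or inactive[i] != "C":
        return False
    edited = inactive[:i] + "T" + inactive[i + 1:]
    codon_start = i - 1  # the edited base is the middle base of the codon
    is_atg = edited[codon_start:codon_start + 3] == "ATG"
    # From the new ATG to the end there must be a whole number of codons.
    in_frame = (len(edited) - codon_start) % 3 == 0
    return is_atg and in_frame


if __name__ == "__main__":
    toy_gene = "ATGAAATAA"                       # hypothetical 3-codon gene
    toy_surrogate = "GGCACGTTCAGCATTGCCAATGG"    # hypothetical 20-nt target + 3-nt PAM
    inactive = build_inactive_gene(toy_gene, toy_surrogate, padding="C")
    print(edit_restores_start(inactive, 5))
```

Calling the same check on a construct built without the padding base shows why extra bases may be needed: the ATG still forms, but the downstream gene is no longer in frame.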
In one embodiment of the invention, the surrogate target sequence is positions 11305-11327 of sequence 1, and the target sequence of the loss-of-function screening agent resistance gene is positions 11305-11324 of sequence 1. Under the guidance of the tRNA-sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the C-to-T base substitution system performs C-to-T base substitution on the surrogate target sequence, so that the C at position 5 of the surrogate target sequence is mutated to T, forming ATG and thereby restoring the function of the screening marker gene. It should be noted that, to ensure that the screening agent resistance gene lacking its start codon is translated in the normal reading frame after the surrogate target sequence is added, one base C is added between the surrogate target sequence and the screening agent resistance gene lacking its start codon.
In another embodiment of the invention, the surrogate target sequence is sequence 10, and the target sequence of the loss-of-function screening agent resistance gene is positions 1-20 of sequence 10. Under the guidance of the tRNA-sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the C-to-T base substitution system performs C-to-T base substitution on the surrogate target sequence, so that the C at position 7 of the surrogate target sequence is mutated to T, forming ATG and thereby restoring the function of the screening marker gene.
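The reading-frame arithmetic behind the two embodiments can be worked through directly. This is a minimal sketch: the function name `padding_bases` is hypothetical, and the calculation assumes that the edited C becomes the middle base of the new ATG (so the codon starts one position earlier) and that the downstream gene simply continues after the surrogate target sequence. For the second embodiment, a 23-nt length for sequence 10 (20-nt target plus 3-nt PAM) is an assumption; its length is not stated in this section.

```python
def padding_bases(surrogate_len: int, edited_c_pos: int) -> int:
    """Bases to add after the surrogate so the downstream gene stays in frame.

    edited_c_pos is the 1-based position of the C that is changed to T;
    it is the middle base of the resulting ATG.
    """
    atg_start = edited_c_pos - 1
    if atg_start < 1 or edited_c_pos + 1 > surrogate_len:
        raise ValueError("the new start codon must lie inside the surrogate sequence")
    # Number of bases from the A of ATG to the end of the surrogate sequence.
    in_frame_len = surrogate_len - atg_start + 1
    return (-in_frame_len) % 3


if __name__ == "__main__":
    # First embodiment: 23-nt surrogate (positions 11305-11327), C at position 5.
    print(padding_bases(23, 5))
    # Second embodiment: C at position 7; the 23-nt length is an assumption.
    print(padding_bases(23, 7))
```

For the first embodiment, 20 bases run from the A of the new ATG to the end of the surrogate sequence, so one extra base completes the seventh codon, which agrees with the single added base C described above.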
Further, the screening agent resistance gene may be any screening agent resistance gene commonly used in the art, such as the Bar/PAT glufosinate N-acetyltransferase gene, the PMI 6-phosphomannose isomerase gene, the EPSPS 5-enolpyruvylshikimate-3-phosphate synthase gene, and the like. In one embodiment of the invention, the screening agent resistance gene is a hygromycin resistance gene.
In the kit, the C-to-T base substitution system further comprises UGI or a biological material related to the UGI.
In the above kit, the Cas9 nuclease includes Cas9 nucleases from different sources or variants thereof, catalytically inactive Cas9 (dead Cas9, dCas9) or variants thereof, and Cas9 nickase (Cas9n) or variants thereof. The Cas9 nucleases from different sources or variants thereof include Cas9 from different bacterial sources (such as SaCas9, SaCas9-KKH and the like), Cas9 PAM variants (such as xCas9, NG Cas9, Cas9-VQR, Cas9-VRER and the like), high-fidelity Cas9 variants (such as HypaCas9, eSpCas9(1.1), Cas9-HF1 and the like), and the like. In one specific embodiment of the invention, the Cas9 nuclease is Cas9n, specifically the SpCas9n protein. In another embodiment of the invention, the Cas9 nuclease is Cas9n, specifically the HypaCas9n protein.
The cytosine deaminase may be an hAPOBEC3A protein, a human AID protein, a PmCDA1 protein, or an rAPOBEC1 protein. In one specific embodiment of the invention, the cytosine deaminase is the PmCDA1 protein. In another specific embodiment of the invention, the cytosine deaminase is the rAPOBEC1 protein.
Further, the SpCas9n protein is A1) or A2) or A3):
A1) a protein whose amino acid sequence is shown in sequence 2;
A2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 2 of the sequence listing and having the same function;
A3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of A1) or A2);
the biological material related to the SpCas9n is any one of B1) to B5):
B1) a nucleic acid molecule encoding the SpCas9n;
B2) an expression cassette containing the nucleic acid molecule of B1);
B3) a recombinant vector containing the nucleic acid molecule of B1), or a recombinant vector containing the expression cassette of B2);
B4) a recombinant microorganism containing the nucleic acid molecule of B1), or a recombinant microorganism containing the expression cassette of B2), or a recombinant microorganism containing the recombinant vector of B3);
B5) a transgenic cell line containing the nucleic acid molecule of B1), or a transgenic cell line containing the expression cassette of B2);
the HypaCas9n protein is C1) or C2) or C3):
C1) a protein whose amino acid sequence is shown in sequence 7;
C2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 7 of the sequence listing and having the same function;
C3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of C1) or C2);
the biological material related to the HypaCas9n is any one of D1) to D5):
D1) a nucleic acid molecule encoding the HypaCas9n;
D2) an expression cassette containing the nucleic acid molecule of D1);
D3) a recombinant vector containing the nucleic acid molecule of D1), or a recombinant vector containing the expression cassette of D2);
D4) a recombinant microorganism containing the nucleic acid molecule of D1), or a recombinant microorganism containing the expression cassette of D2), or a recombinant microorganism containing the recombinant vector of D3);
D5) a transgenic cell line containing the nucleic acid molecule of D1), or a transgenic cell line containing the expression cassette of D2);
the PmCDA1 protein is E1) or E2) or E3):
E1) a protein whose amino acid sequence is shown in sequence 3;
E2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 3 of the sequence listing and having the same function;
E3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of E1) or E2);
the biological material related to the PmCDA1 protein is any one of F1) to F5):
F1) a nucleic acid molecule encoding the PmCDA1 protein;
F2) an expression cassette containing the nucleic acid molecule of F1);
F3) a recombinant vector containing the nucleic acid molecule of F1), or a recombinant vector containing the expression cassette of F2);
F4) a recombinant microorganism containing the nucleic acid molecule of F1), or a recombinant microorganism containing the expression cassette of F2), or a recombinant microorganism containing the recombinant vector of F3);
F5) a transgenic cell line containing the nucleic acid molecule of F1), or a transgenic cell line containing the expression cassette of F2);
the rAPOBEC1 protein is G1) or G2) or G3):
G1) a protein whose amino acid sequence is shown in sequence 12;
G2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 12 of the sequence listing and having the same function;
G3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of G1) or G2);
the biological material related to the rAPOBEC1 protein is any one of H1) to H5):
H1) a nucleic acid molecule encoding the rAPOBEC1 protein;
H2) an expression cassette containing the nucleic acid molecule of H1);
H3) a recombinant vector containing the nucleic acid molecule of H1), or a recombinant vector containing the expression cassette of H2);
H4) a recombinant microorganism containing the nucleic acid molecule of H1), or a recombinant microorganism containing the expression cassette of H2), or a recombinant microorganism containing the recombinant vector of H3);
H5) a transgenic cell line containing the nucleic acid molecule of H1), or a transgenic cell line containing the expression cassette of H2);
the UGI protein is I1) or I2) or I3):
I1) a protein whose amino acid sequence is shown in sequence 4;
I2) a protein obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 4 of the sequence listing and having the same function;
I3) a fusion protein obtained by attaching a tag to the N-terminus and/or the C-terminus of I1) or I2);
the biological material related to the UGI protein is any one of J1) to J5):
J1) a nucleic acid molecule encoding the UGI protein;
J2) an expression cassette containing the nucleic acid molecule of J1);
J3) a recombinant vector containing the nucleic acid molecule of J1), or a recombinant vector containing the expression cassette of J2);
J4) a recombinant microorganism containing the nucleic acid molecule of J1), or a recombinant microorganism containing the expression cassette of J2), or a recombinant microorganism containing the recombinant vector of J3);
J5) a transgenic cell line containing the nucleic acid molecule of J1), or a transgenic cell line containing the expression cassette of J2);
the biological material related to the loss-of-function screening agent resistance gene is any one of K1) to K4):
K1) an expression cassette containing the loss-of-function screening agent resistance gene;
K2) a recombinant vector containing the loss-of-function screening agent resistance gene, or a recombinant vector containing the expression cassette of K1);
K3) a recombinant microorganism containing the loss-of-function screening agent resistance gene, or a recombinant microorganism containing the expression cassette of K1), or a recombinant microorganism containing the recombinant vector of K2);
K4) a transgenic cell line containing the loss-of-function screening agent resistance gene, or a transgenic cell line containing the expression cassette of K1).
To facilitate purification of the proteins of A1), C1), E1), G1) and I1), a tag shown in the following table may be attached to the amino terminus or carboxyl terminus of the protein consisting of the amino acid sequence shown in sequence 2, 3, 4, 7 or 12 of the sequence listing.
Table. Tag sequences
Tag             Residues              Sequence
Poly-Arg        5-6 (typically 5)     RRRRR
Poly-His        2-10 (typically 6)    HHHHHH
FLAG            8                     DYKDDDDK
Strep-tag II    8                     WSHPQFEK
c-myc           10                    EQKLISEEDL
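Attaching one of the tags in the table to either terminus of a protein sequence can be sketched as follows. This is a minimal illustration: the `TAGS` dictionary copies the tag sequences from the table, while the function name `add_tag` and the example protein are hypothetical. The sketch is plain concatenation at the amino acid level and does not address construct details such as the position of the initiator methionine or linkers.

```python
# Tag sequences as listed in the table above.
TAGS = {
    "Poly-Arg": "RRRRR",
    "Poly-His": "HHHHHH",
    "FLAG": "DYKDDDDK",
    "Strep-tag II": "WSHPQFEK",
    "c-myc": "EQKLISEEDL",
}


def add_tag(protein: str, tag_name: str, terminus: str = "N") -> str:
    """Return the protein sequence with the named tag fused at the N or C terminus."""
    if tag_name not in TAGS:
        raise KeyError(f"unknown tag: {tag_name}")
    tag = TAGS[tag_name]
    if terminus == "N":
        return tag + protein
    if terminus == "C":
        return protein + tag
    raise ValueError("terminus must be 'N' or 'C'")


if __name__ == "__main__":
    # Hypothetical short peptide, for illustration only.
    print(add_tag("MKVLA", "FLAG", terminus="C"))
```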
The proteins in A2), C2), E2), G2) and I2) are proteins having 75% or more identity with the amino acid sequence of the protein shown in sequence 2, sequence 3, sequence 4, sequence 7 or sequence 12 and having the same function. The identity of 75% or more may be 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity.
The protein in A2), C2), E2), G2) and I2) can be artificially synthesized, or can be obtained by synthesizing the coding gene and then performing biological expression.
The coding genes of the proteins in A2), C2), E2), G2) and I2) can be obtained from the DNA sequence shown at positions 3529-7797 of sequence 1 (encoding the protein shown in sequence 2), positions 8089-8712 of sequence 1 (encoding the protein shown in sequence 3), positions 8734-9030 of sequence 1 (encoding the protein shown in sequence 4), sequence 6 (encoding the protein shown in sequence 7), or positions 1-687 of sequence 9 (encoding the protein shown in sequence 12) by deleting the codons of one or several amino acid residues, and/or introducing missense mutations of one or several base pairs, and/or attaching the coding sequence of a tag shown in the above table to the 5' end and/or 3' end.
Further, the nucleic acid molecule of B1) is b1) or b2) or b3):
b1) a cDNA molecule or DNA molecule shown at positions 3529-7797 of sequence 1 in the sequence listing;
b2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in b1) and encoding the SpCas9n;
b3) a cDNA molecule or DNA molecule hybridizing under stringent conditions with the nucleotide sequence defined in b1) or b2) and encoding the SpCas9n;
the nucleic acid molecule of D1) is d1) or d2) or d3):
d1) a cDNA molecule or DNA molecule shown in sequence 6 in the sequence listing;
d2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in d1) and encoding the HypaCas9n;
d3) a cDNA molecule or DNA molecule hybridizing under stringent conditions with the nucleotide sequence defined in d1) or d2) and encoding the HypaCas9n;
the nucleic acid molecule of F1) is f1) or f2) or f3):
f1) a cDNA molecule or DNA molecule shown at positions 8089-8712 of sequence 1 in the sequence listing;
f2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in f1) and encoding the PmCDA1;
f3) a cDNA molecule or DNA molecule hybridizing under stringent conditions with the nucleotide sequence defined in f1) or f2) and encoding the PmCDA1;
the nucleic acid molecule of H1) is h1) or h2) or h3):
h1) a cDNA molecule or DNA molecule shown at positions 1-687 of sequence 9 in the sequence listing;
h2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in h1) and encoding the rAPOBEC1;
h3) a cDNA molecule or DNA molecule hybridizing under stringent conditions with the nucleotide sequence defined in h1) or h2) and encoding the rAPOBEC1;
the nucleic acid molecule of J1) is j1) or j2) or j3):
j1) a cDNA molecule or DNA molecule shown at positions 8734-9030 of sequence 1 in the sequence listing;
j2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in j1) and encoding the UGI;
j3) a cDNA molecule or DNA molecule hybridizing under stringent conditions with the nucleotide sequence defined in j1) or j2) and encoding the UGI;
the loss-of-function screening agent resistance gene in K1) is the DNA molecule shown at positions 11305-12351 of sequence 1, or the sequence obtained by replacing positions 11305-11328 of that DNA molecule with sequence 10 while keeping the rest of the sequence unchanged.
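The replacement described above is a coordinate-based splice, which can be sketched as follows. This is a minimal illustration: the function names `to_local` and `replace_region` are hypothetical, and the demo uses placeholder strings because sequence 1 and sequence 10 are not reproduced in this section. Positions are 1-based and inclusive, as in the text.

```python
def to_local(position: int, region_start: int) -> int:
    """Convert a 1-based position in the full sequence to a 1-based position
    within a fragment that begins at region_start."""
    return position - region_start + 1


def replace_region(seq: str, start: int, end: int, new: str) -> str:
    """Replace positions start..end (1-based, inclusive) of seq with new."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates out of range")
    return seq[:start - 1] + new + seq[end:]


if __name__ == "__main__":
    # Positions 11305-11328 of sequence 1 are positions 1-24 of the fragment
    # that starts at position 11305.
    print(to_local(11305, 11305), to_local(11328, 11305))
    # Placeholder sequences, for illustration only.
    print(replace_region("AAACCCGGG", 4, 6, "TT"))
```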
Wherein the nucleic acid molecule may be DNA, such as cDNA, genomic DNA or recombinant DNA; the nucleic acid molecule may also be RNA, such as mRNA or hnRNA, etc.
The nucleotide sequence encoding the SpCas9n, the HypaCas9n, the PmCDA1, the rAPOBEC1 or the UGI of the present invention can easily be mutated by a person of ordinary skill in the art using known methods, such as directed evolution and point mutation. Artificially modified nucleotide sequences having 75% or more identity with the nucleotide sequence of the SpCas9n, the HypaCas9n, the PmCDA1, the rAPOBEC1 or the UGI of the present invention are derived from the nucleotide sequences of the present invention and are equivalent to them, as long as they encode the SpCas9n, the HypaCas9n, the PmCDA1, the rAPOBEC1 or the UGI and have the same function.
The term "identity" as used herein refers to sequence similarity to a native nucleic acid sequence. "Identity" includes nucleotide sequences that are 75% or more, or 85% or more, or 90% or more, or 95% or more identical to the nucleotide sequence of the present invention encoding the protein consisting of the amino acid sequence shown in sequence 2, 3, 4, 7 or 12. Identity can be assessed visually or with computer software. With computer software, the identity between two or more sequences can be expressed as a percentage (%), which can be used to evaluate the identity between related sequences.
The stringent conditions are: hybridizing and washing the membrane at 68 °C in a solution of 2×SSC, 0.1% SDS, twice for 5 min each, and then at 68 °C in a solution of 0.5×SSC, 0.1% SDS, twice for 15 min each; alternatively, hybridizing and washing the membrane at 65 °C in a solution of 0.1×SSPE (or 0.1×SSC), 0.1% SDS.
The above-mentioned identity of 75% or more may be 80%, 85%, 90% or 95% or more.
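A percent-identity check against the 75% threshold can be sketched as follows. This is a minimal illustration under stated assumptions: the inputs are taken to be already aligned and of equal length (with `-` marking gaps), identity is computed over aligned columns, and the function names are hypothetical. Alignment programs differ in how they count gaps and choose the denominator, so this is one possible definition rather than the one used in the patent.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_threshold(a: str, b: str, threshold: float = 75.0) -> bool:
    """True if the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    # Toy aligned sequences, for illustration only.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
    print(meets_threshold("ACGTACGT", "ACGTACGA"))
```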
B2) The expression cassette containing the nucleic acid molecule encoding the SpCas9n protein (SpCas9n gene expression cassette) refers to DNA capable of expressing the SpCas9n protein in host cells, and the DNA may include not only a promoter for starting the transcription of the SpCas9n gene, but also a terminator for terminating the transcription of the SpCas9n gene. Further, the expression cassette may also include an enhancer sequence. The existing expression vector can be used for constructing a recombinant vector containing the SpCas9n gene expression cassette.
D2) The expression cassette containing the nucleic acid molecule encoding the HypaCas9n protein (HypaCas9n gene expression cassette) refers to a DNA capable of expressing the HypaCas9n protein in a host cell, and the DNA may include not only a promoter for initiating transcription of the HypaCas9n gene, but also a terminator for terminating transcription of the HypaCas9n gene. Further, the expression cassette may also include an enhancer sequence. The existing expression vector can be used for constructing a recombinant vector containing the HypaCas9n gene expression cassette.
F2) The expression cassette containing the nucleic acid molecule encoding the PmCDA1 protein (PmCDA1 gene expression cassette) refers to a DNA capable of expressing the PmCDA1 protein in a host cell, and the DNA may include not only a promoter for initiating transcription of the PmCDA1 gene, but also a terminator for terminating transcription of the PmCDA1 gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the PmCDA1 gene expression cassette can be constructed by using the existing expression vector.
H2) The expression cassette containing a nucleic acid molecule encoding rAPOBEC1 protein (rAPOBEC1 gene expression cassette) refers to DNA capable of expressing rAPOBEC1 protein in host cells, and the DNA can not only comprise a promoter for starting transcription of rAPOBEC1 gene, but also comprise a terminator for stopping transcription of rAPOBEC1 gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the rAPOBEC1 gene expression cassette can be constructed by using the existing expression vector.
J2) The expression cassette containing a nucleic acid molecule encoding the UGI protein (UGI gene expression cassette) refers to a DNA capable of expressing the UGI protein in a host cell, and the DNA may include not only a promoter for initiating transcription of the UGI gene but also a terminator for terminating transcription of the UGI gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the UGI gene expression cassette can be constructed using an existing expression vector.
The vector may be a plasmid, cosmid, phage or viral vector. In a particular embodiment of the invention, the recombinant vector is specifically the sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 recombinant expression vector or the sgRNA-ATG-Hyg-ATG/r-sgRNA-GT recombinant expression vector.
The sequence of the sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector is sequence 1. The sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector contains six target sequences, which are shown in Table 1.
The sequence of the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 recombinant expression vector is obtained by replacing the five target sequences ALS-T1, CDC48-T1, NRT1.1B-T1, Waxy and ALS-T2 in sequence 1, in order, with the following five target sequences: ALS-T3, CDC48-T2, NRT1.1B-T3, NRT1.1B-T2 and DEP1, while keeping the other sequences unchanged. The corresponding target sequence information is shown in Table 1.
The sequence of the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 recombinant expression vector is obtained by replacing the coding sequence of the SpCas9n protein at positions 3529-7797 of sequence 1 with the coding sequence of the HypaCas9n protein shown in sequence 6 (encoding the HypaCas9n protein shown in sequence 7), while keeping the other sequences unchanged.
The sequence of the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 recombinant expression vector is obtained by replacing the coding sequence of the SpCas9n protein at positions 3529-7797 of sequence 1 with the coding sequence of the HypaCas9n protein shown in sequence 6 and replacing positions 131-1802 with sequence 8, while keeping the other sequences unchanged.
The sequence of the sgRNA-ATG-Hyg-ATG/r-sgRNA-GT recombinant expression vector is obtained by replacing positions 3529-8712 of sequence 1 with sequence 9 (the rAPOBEC1-SpCas9n fusion sequence), replacing positions 11305-11328 of sequence 1 with sequence 10 (the rAPOBEC1 surrogate target sequence), replacing the first five target sequences of the first expression cassette of sequence 1, in order, with the following five target sequences: ALS-T1, Waxy, NRT1.1B-T1, Waxy-T2 and rAPOBEC1-sug, and deleting positions 1339-1511 of sequence 1 (the sixth target sequence together with its tRNA nucleotide sequence and sgRNA nucleotide sequence), while keeping the other sequences unchanged. The corresponding target sequences are shown in Table 3.
The microorganism may be a yeast, bacterium, alga or fungus. The bacterium may be Agrobacterium, such as Agrobacterium EHA105. In a specific embodiment of the invention, the recombinant microorganism is specifically Agrobacterium EHA105 containing the sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 recombinant expression vector or the sgRNA-ATG-Hyg-ATG/r-sgRNA-GT recombinant expression vector.
The transgenic cell line does not include propagation material.
The kit has the following uses:
M1) enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell;
M2) preparing a product for enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell;
M3) improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell;
M4) preparing a product for improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell;
M5) performing C.T base substitution on a target sequence of the genome of an organism or organism cell;
M6) preparing a product for performing C.T base substitution on a target sequence of the genome of an organism or organism cell.
The above-mentioned loss-of-function screening agent resistance gene and the biological material related to the loss-of-function screening agent resistance gene also fall within the scope of the present invention.
In order to achieve the above object, the present invention also provides a novel use of the above-mentioned kit or the above-mentioned loss-of-function screening agent resistance gene or a biological material related to the loss-of-function screening agent resistance gene.
The present invention provides the use of the above kit, the above loss-of-function screening agent resistance gene, or a biological material related to the loss-of-function screening agent resistance gene in any one of M1) to M6):
M1) enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell;
M2) preparing a product for enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell;
M3) improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell;
M4) preparing a product for improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell;
M5) performing C.T base substitution on a target sequence of the genome of an organism or organism cell;
M6) preparing a product for performing C.T base substitution on a target sequence of the genome of an organism or organism cell.
In order to achieve the above object, the present invention also provides the method described in N1) or N2) or N3) or N4) or N5):
N1) A method for enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell, or for improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell, comprising the following steps: introducing into the organism or organism cell the gene encoding the Cas9 nuclease, a DNA molecule transcribing the sgRNA that targets the target sequence of the target gene, a DNA molecule transcribing the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the gene encoding the cytosine deaminase, the gene encoding the UGI, and the loss-of-function screening agent resistance gene, so that the Cas9 nuclease, the sgRNAs, the cytosine deaminase and the UGI are expressed; guided by the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the Cas9 nuclease, the cytosine deaminase and the UGI perform C.T base substitution on that target sequence and thereby restore the function of the loss-of-function screening agent resistance gene, which enriches the cells in which C.T base substitution has occurred in the screening agent resistance gene and, in turn, enriches the cells in which C.T base substitution has occurred in the target sequence of the target gene of the genome of the organism or organism cell, or improves the efficiency of C.T base substitution of that target sequence;
N2) A method for enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell, or for improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell, comprising the following steps: introducing into the organism or organism cell the gene encoding the Cas9 nuclease, a DNA molecule transcribing the sgRNA that targets the target sequence of the target gene, a DNA molecule transcribing the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the gene encoding the cytosine deaminase, and the loss-of-function screening agent resistance gene, so that the Cas9 nuclease, the sgRNAs and the cytosine deaminase are expressed; guided by the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the Cas9 nuclease and the cytosine deaminase perform C.T base substitution on that target sequence and thereby restore the function of the loss-of-function screening agent resistance gene, which enriches the cells in which C.T base substitution has occurred in the screening agent resistance gene and, in turn, enriches the cells in which C.T base substitution has occurred in the target sequence of the target gene of the genome of the organism or organism cell, or improves the efficiency of C.T base substitution of that target sequence;
N3) A method for enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell, or for improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell, comprising the following steps: introducing into the organism or organism cell the Cas9 nuclease, the sgRNA that targets the target sequence of the target gene, the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the cytosine deaminase, the UGI, and the loss-of-function screening agent resistance gene; guided by the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the Cas9 nuclease, the cytosine deaminase and the UGI perform C.T base substitution on that target sequence and thereby restore the function of the loss-of-function screening agent resistance gene, which enriches the cells in which C.T base substitution has occurred in the screening agent resistance gene and, in turn, enriches the cells in which C.T base substitution has occurred in the target sequence of the target gene of the genome of the organism or organism cell, or improves the efficiency of C.T base substitution of that target sequence;
N4) A method for enriching cells in which C.T base substitution has occurred in a target sequence of the genome of an organism or organism cell, or for improving the efficiency of C.T base substitution of a target sequence of the genome of an organism or organism cell, comprising the following steps: introducing into the organism or organism cell the Cas9 nuclease, the sgRNA that targets the target sequence of the target gene, the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the cytosine deaminase, and the loss-of-function screening agent resistance gene; guided by the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the Cas9 nuclease and the cytosine deaminase perform C.T base substitution on that target sequence and thereby restore the function of the loss-of-function screening agent resistance gene, which enriches the cells in which C.T base substitution has occurred in the screening agent resistance gene and, in turn, enriches the cells in which C.T base substitution has occurred in the target sequence of the target gene of the genome of the organism or organism cell, or improves the efficiency of C.T base substitution of that target sequence;
N5) A method for preparing a biological mutant, comprising the following steps: editing the genome of an organism according to the method of N1), N2), N3) or N4) to obtain a biological mutant; the biological mutant is an organism in which C.T base substitution has occurred.
In the method, the sgRNA targeting the target sequence of the target gene is a tRNA-sgRNA targeting the target sequence of the target gene, and the sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene is a tRNA-sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene. Further, the tRNA-sgRNA transcribed from the DNA molecule that transcribes the tRNA-sgRNA targeting the target sequence of the target gene, or from the DNA molecule that transcribes the tRNA-sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, is an immature RNA precursor; the tRNA in the RNA precursor is excised by two enzymes (RNase P and RNase Z) to give mature RNA. The number of independent mature RNAs obtained equals the number of targets in the recombinant expression vector. Each mature RNA consists, in order, of the RNA transcribed from the target sequence and the sgRNA scaffold, or of a few residual tRNA bases, the RNA transcribed from the target sequence and the sgRNA scaffold.
In the above method, the number of the UGIs may be one or two or more in the N1) or N3). In a specific embodiment of the present invention, the number of the UGIs is specifically one.
In the above method, in N1), the gene encoding the Cas9 nuclease, the DNA molecule transcribing the sgRNA that targets the target sequence of the target gene, the DNA molecule transcribing the sgRNA that targets the target sequence of the loss-of-function screening agent resistance gene, the gene encoding the cytosine deaminase and the gene encoding the UGI are introduced into the organism or organism cell by means of a recombinant vector containing an expression cassette for each of these elements. The expression cassettes may be introduced into the organism or organism cell together on a single recombinant expression vector, or jointly on two or more recombinant expression vectors.
In a specific embodiment of the present invention, the expression cassettes are introduced into the organism or organism cell on a single recombinant expression vector, specifically the above sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector, sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 recombinant expression vector, sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 recombinant expression vector, sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 recombinant expression vector or sgRNA-ATG-Hyg-ATG/r-sgRNA-GT recombinant expression vector.
In the above kit, use or method, the C.T base substitution means that base C is mutated to base T. The base C may be at any position in the target sequence.
In the above kit, use or method, the organism is P1), P2), P3) or P4):
P1) a plant or an animal;
P2) a monocotyledonous or dicotyledonous plant;
P3) a gramineous plant;
P4) rice (e.g., Nipponbare rice);
the organism cell is Q1), Q2), Q3) or Q4):
Q1) a plant cell or an animal cell;
Q2) a monocotyledonous or dicotyledonous plant cell;
Q3) a gramineous plant cell;
Q4) a rice cell (e.g., a Nipponbare rice cell).
The cell enrichment technology principle of the invention is as follows: a cell enrichment technique using inactivated resistance gene of the screening agent as a reporter gene for C.T base substitution is established, so that cells with C.T base substitution on the reporter gene can grow in a medium containing the screening agent, and cells without C.T base substitution can not grow in a medium containing the screening agent. On the basis of the reporter gene, if C.T base replacement editing is carried out on the endogenous target gene target spot, cells growing in a culture medium containing a screening agent have higher probability of C.T base replacement of the endogenous target gene target spot, so that enrichment of the cells with the C.T base replacement of the endogenous target gene target spot is realized, and the C.T base replacement efficiency of the endogenous target gene target spot is improved.
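The enrichment reasoning above can be made concrete with a small probability sketch. The model below is an assumption introduced purely for illustration and is not part of the described experiments: each cell is given a hypothetical base-editor activity level, and the reporter target and the endogenous target are each converted with that probability, independently of one another within the cell. Under this assumption, cells that survive selection (reporter converted) are more likely to carry the endogenous substitution than cells in the unselected population.

```python
from fractions import Fraction


def enrichment(activities):
    """Return (baseline, selected) endogenous substitution frequencies.

    activities: hypothetical per-cell probabilities that the base editor
    converts any given target in that cell. Reporter and endogenous
    conversions are assumed independent within a cell.
    """
    n = len(activities)
    p_target = sum(activities) / n          # unselected population
    p_reporter = sum(activities) / n        # fraction surviving selection
    p_both = sum(a * a for a in activities) / n
    p_target_given_reporter = p_both / p_reporter
    return p_target, p_target_given_reporter


# Three hypothetical classes of cells with low, medium and high editor activity.
cells = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
baseline, selected = enrichment(cells)
```

With these illustrative activity values the baseline frequency is 1/2 and the frequency among selected cells is 107/150, so selection on the reporter raises the share of cells carrying the endogenous substitution. The size of the gain depends entirely on the assumed spread of activities; if every cell had the same activity, the model would predict no enrichment.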
The invention has the following advantages:
1. Many different types of genes can serve as reporter genes for cell enrichment of C.T base substitution in plants. Because the genetic transformation methods for various crops (such as Agrobacterium-mediated transformation and particle bombardment) already have relatively mature and stable screening systems, using the resistance gene corresponding to the transformation screening agent as the reporter gene to enrich cells carrying endogenous genomic mutations is more broadly applicable and universal than using other reporters such as fluorescent reporter genes or endogenous herbicide resistance genes.
2. The technical design is simple and convenient, and the surrogate target and its design can be extended to the resistance genes corresponding to additional screening agents, meeting the needs of the different transformation screening systems of different crops.
3. The cell enrichment technology realizes the enrichment of C.T base replacement cells on the cellular level for different deaminase mediated base editors or different Cas9 enzyme mediated base editors, and greatly improves the C.T base replacement efficiency.
Drawings
FIG. 1 is a schematic structural diagram of a non-cell enrichment technology vector sgRNA-GT.
FIG. 2 is a schematic structural diagram of the cell enrichment technology vector sgRNA-ATG-Hyg-ATG/sgRNA-GT.
FIG. 3 is a schematic diagram of the working principle of the cell enrichment technique.
FIG. 4 compares the C.T base substitution efficiency at the targets in resistant calli between the cell enrichment technology vector sgRNA-ATG-Hyg-ATG/sgRNA-GT and the non-cell enrichment technology vector sgRNA-GT.
FIG. 5 compares the C.T base substitution efficiency at the targets in T0 seedlings between the cell enrichment technology vector sgRNA-ATG-Hyg-ATG/sgRNA-GT and the non-cell enrichment technology vector sgRNA-GT.
FIG. 6 compares the homozygous substitution efficiency of C.T base substitution at the targets in T0 seedlings between the cell enrichment technology vector sgRNA-ATG-Hyg-ATG/sgRNA-GT and the non-cell enrichment technology vector sgRNA-GT.
FIG. 7 is a schematic structural diagram of the HypaCas9n&PmCDA1&UGI-mediated cell enrichment technology vector sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT and the non-cell enrichment technology vector HypaCas9n-sgRNA-GT.
FIG. 8 compares the C.T base substitution efficiency at the targets in T0 seedlings between the cell enrichment technology vector sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT and the non-cell enrichment technology vector HypaCas9n-sgRNA-GT.
FIG. 9 is a schematic structural diagram of the rAPOBEC1&Cas9n&UGI-mediated cell enrichment technology vector sgRNA-ATG-Hyg-ATG/r-sgRNA-GT and the non-cell enrichment technology vector r-sgRNA-GT.
FIG. 10 compares the C.T base substitution efficiency at the targets in T0 seedlings between the cell enrichment technology vector sgRNA-ATG-Hyg-ATG/r-sgRNA-GT and the non-cell enrichment technology vector r-sgRNA-GT.
FIG. 11 is a schematic structural diagram of the cell enrichment technology vectors sgRNA-ATG-Hyg-ATG/sgRNA-GT and Dissugs and the non-cell enrichment technology vector sgRNA-GT.
FIG. 12 compares the C.T base substitution efficiency at the targets in T0 seedlings between the cell enrichment technology vectors sgRNA-ATG-Hyg-ATG/sgRNA-GT and Dissugs.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The experimental procedures in the following examples are conventional unless otherwise specified. Materials, reagents, instruments and the like used in the following examples are commercially available unless otherwise specified. In the following examples, unless otherwise specified, the 1 st position of each nucleotide sequence in the sequence listing is the 5 'terminal nucleotide of the corresponding DNA/RNA, and the last position is the 3' terminal nucleotide of the corresponding DNA/RNA.
Primer pair T1 consists of primer T1-F: 5'-gtaagaaccaccagcgacac-3' and primer T1-R: 5'-gtaattgtgcttggtgatgga-3', and is used to amplify target ALS-T1.
Primer pair T2 consists of primer T2-F: 5'-aatatgccattcaggtgctgg-3' and primer T2-R: 5'-atcataggcagcacatgctcc-3', and is used to amplify target ALS-T2.
Primer pair T3 consists of primer T3-F: 5'-atggctacgaccgccgcgg-3' and primer T3-R: 5'-gcctcaattttccctgtcacacgatc-3', and is used to amplify target ALS-T3.
Primer pair T4 consists of primer T4-F: 5'-attgtggctcgtgctctacc-3' and primer T4-R: 5'-agacacacccacaggaacatt-3', and is used to amplify target DEP1.
Primer pair T5 consists of primer T5-F: 5'-cttcaaattctaatccccaatcc-3' and primer T5-R: 5'-ggttgttgttgaggtttaggatc-3', and is used to amplify target Waxy.
Primer pair T6 consists of primer T6-F: 5'-ttacgaactttataactttgtcgg-3' and primer T6-R: 5'-atggaggcgatgaggaagac-3', and is used to amplify target NRT1.1B-T1.
Primer pair T7 consists of primer T7-F: 5'-ctaatcctaccaattaacgagtcg-3' and primer T7-R: 5'-accagttgaagaagcgcatc-3', and is used to amplify target NRT1.1B-T2.
Primer pair T8 consists of primer T8-F: 5'-cctccatcctcctcaccg-3' and primer T8-R: 5'-tgaccttgtggacgatggtg-3', and is used to amplify target NRT1.1B-T3.
Primer pair T9 consists of primer T9-F: 5'-acatcgagatggagaagcgg-3' and primer T9-R: 5'-ccatgctccaatcgatgaatac-3', and is used to amplify target CDC48-T1.
Primer pair T10 consists of primer T10-F: 5'-agacaccatctgcattgttct-3' and primer T10-R: 5'-ggatgtaagaaggcgacactag-3', and is used to amplify target CDC48-T2.
Primer pair T11 consists of primer T11-F: 5'-gtagcttcaaattctaatcc-3' and primer T11-R: 5'-ggaggccaccgaggacgtc-3', and is used to amplify target Waxy-T2.
In the following examples, C.T base substitutions refer to mutations from C to T at any position in the target sequence.
C.T base substitution efficiency = number of positive resistant calli (or positive T0 seedlings) in which C.T base substitution occurred / total number of positive resistant calli (or positive T0 seedlings) analyzed × 100%.
Homozygous substitution efficiency of C.T base substitution = number of homozygous mutant T0 seedlings / total number of T0 seedlings in which C.T base substitution occurred × 100%. A homozygous mutant T0 seedling is defined as a T0 seedling in which every site with a C.T base substitution is a homozygous mutation.
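The two efficiency definitions above can be written as a short calculation. This is a minimal sketch; the function names and the counts in the example are hypothetical and do not correspond to any result reported in this document.

```python
def substitution_efficiency(n_substituted: int, n_analyzed: int) -> float:
    """C.T base substitution efficiency in percent.

    n_substituted: positive resistant calli (or positive T0 seedlings)
                   in which C.T base substitution occurred
    n_analyzed:    total positive resistant calli (or positive T0 seedlings) analyzed
    """
    if n_analyzed <= 0:
        raise ValueError("n_analyzed must be positive")
    return 100.0 * n_substituted / n_analyzed


def homozygous_efficiency(n_homozygous: int, n_substituted: int) -> float:
    """Homozygous substitution efficiency in percent.

    n_homozygous:  T0 seedlings in which every substituted site is homozygous
    n_substituted: total T0 seedlings in which C.T base substitution occurred
    """
    if n_substituted <= 0:
        raise ValueError("n_substituted must be positive")
    return 100.0 * n_homozygous / n_substituted


# Hypothetical counts, for illustration only.
eff = substitution_efficiency(n_substituted=12, n_analyzed=48)
homo = homozygous_efficiency(n_homozygous=3, n_substituted=12)
```

Note that the denominators differ: the first ratio is taken over all positive samples analyzed, the second only over the seedlings that carry a substitution.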
Nipponbare rice: described in the reference: Effects of sodium nitroprusside and its photolysis products on the growth of Nipponbare rice seedlings and the expression of 5 hormone marker genes [J]. Journal of Henan Normal University (Natural Science Edition), 2017(2): 48-52. Available to the public from the Beijing Academy of Agriculture and Forestry Sciences.
Recovery medium: N6 solid medium containing 200 mg/L timentin.
Screening medium: N6 solid medium containing 50 mg/L hygromycin.
Differentiation medium: N6 solid medium containing 2 mg/L KT, 0.2 mg/L NAA, 0.5 g/L glutamic acid and 0.5 g/L proline.
Rooting medium: N6 solid medium containing 0.2 mg/L NAA, 0.5 g/L glutamic acid and 0.5 g/L proline.
Example 1 establishment of cell enrichment technique for C.T base substitution
I. Construction of the C.T base substitution cell enrichment technology vector
The common technical (non-cell enrichment) vectors for Cas9 nuclease, cytosine deaminase, and UGI-mediated c.t base replacement were named sgRNA-GT. Taking Cas9 nuclease as SpCas9n and cytosine deaminase as PmCDA1 as examples: the schematic structure of the sgRNA-GT vector is shown in fig. 1.
The cell enrichment technology vector for Cas9 nuclease, cytosine deaminase and UGI-mediated C.T base substitution is named sgRNA-ATG-Hyg-ATG/sgRNA-GT. Taking SpCas9n as the Cas9 nuclease and PmCDA1 as the cytosine deaminase as an example, the schematic structure of the sgRNA-ATG-Hyg-ATG/sgRNA-GT vector is shown in FIG. 2.
The non-cell enrichment technology vector contains the complete screening agent resistance gene. The cell enrichment technology vector is obtained from the non-cell enrichment technology vector by modifying the screening agent resistance gene so that it loses its function and by adding the corresponding surrogate target sequence to the sgRNA part.
Taking the hygromycin resistance gene Hygromycin as the screening agent resistance gene as an example: the screening agent resistance gene in the non-cell enrichment technology vector is the complete hygromycin resistance gene Hygromycin. The screening agent resistance gene in the cell enrichment technology vector is the loss-of-function hygromycin resistance gene (Hygromycin-ATG), which is obtained by removing the ATG from the complete hygromycin resistance gene Hygromycin and adding a surrogate target sequence (containing the PAM) at the 5' end.
The surrogate target sequence (Efemp1 mutant target) may be the following sequence: gcaacgagtggtgtggtgcctgg (the three 3'-terminal bases, tgg, are the PAM sequence). It is derived from the original surrogate target sequence of the Efemp1 gene from human cells, gcaaccagtggtgtggtgcctgg, by mutating base C at position 6 to base G.
The surrogate target sequence (rAPOBEC1-sug) may also be the following sequence: cggcgacggcgagcaagtggtgg (the three 3'-terminal bases, tgg, are the PAM sequence).
II. Working principle of the C.T base substitution cell enrichment technique
The working principle of the C.T base substitution cell enrichment technique is shown in FIG. 3. Taking the hygromycin resistance gene Hygromycin as the screening agent resistance gene as an example: in the cell enrichment technique, the hygromycin resistance gene loses its resistance function once its ATG is removed, and the plant cannot form resistant callus on hygromycin screening medium. When, under the guidance of the sgRNA, the C.T base substitution system of the cell enrichment technique (consisting of the Cas9 nuclease, the cytosine deaminase and the UGI) mutates C5 of the surrogate target sequence (Efemp1 mutant target) to T5 (base C at position 5 is mutated to base T), or mutates C7 of the surrogate target sequence (rAPOBEC1-sug) to T7 (base C at position 7 is mutated to base T), an ATG is formed, the hygromycin resistance gene is expressed normally, the resistance function is restored, and the plant can form resistant callus on hygromycin screening medium. Because C.T base substitution has already occurred in the cells that form resistant callus, the C.T base substitution efficiency at the endogenous genes of those cells is correspondingly higher, which achieves the enrichment of C.T base-substituted cells and improves the C.T base substitution efficiency at endogenous plant targets.
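The ATG-restoring edits described above can be checked directly on the two surrogate target sequences given in this example. The following minimal sketch applies a single C-to-T change at a 1-based position and looks for the resulting start codon; the helper name `c_to_t` is hypothetical, and the sketch only manipulates strings and does not model the editor itself.

```python
def c_to_t(seq: str, pos: int) -> str:
    """Return seq with the C at 1-based position pos changed to T."""
    i = pos - 1
    if seq[i].lower() != "c":
        raise ValueError(f"position {pos} is '{seq[i]}', not C")
    return seq[:i] + "t" + seq[i + 1:]


# Surrogate target sequences as given in the text (20-nt target + 3-nt PAM).
EFEMP1_MUTANT_TARGET = "gcaacgagtggtgtggtgcctgg"
RAPOBEC1_SUG = "cggcgacggcgagcaagtggtgg"

# Neither unedited sequence contains a start codon.
assert "atg" not in EFEMP1_MUTANT_TARGET
assert "atg" not in RAPOBEC1_SUG

edited_efemp1 = c_to_t(EFEMP1_MUTANT_TARGET, 5)   # C5 -> T5
edited_rapobec1 = c_to_t(RAPOBEC1_SUG, 7)         # C7 -> T7

# 0-based index of the newly formed ATG in each edited sequence.
atg_index_efemp1 = edited_efemp1.find("atg")
atg_index_rapobec1 = edited_rapobec1.find("atg")
```

In the Efemp1 mutant target the new ATG spans positions 4-6, and in rAPOBEC1-sug it spans positions 6-8, which is consistent with the C5 and C7 edits named in the text.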
Example 2, construction of Cas9n & PmCDA1& UGI-mediated cell enrichment technology vector and application thereof in rice genome editing
I. Construction of recombinant expression vectors
The recombinant expression vectors in this example are the Cas9n&PmCDA1&UGI (PCBE)-mediated C.T base substitution non-cell enrichment technology vector sgRNA-GT and the Cas9n&PmCDA1&UGI (PCBE)-mediated C.T base substitution cell enrichment technology vector sgRNA-ATG-Hyg-ATG/sgRNA-GT. Each vector is a circular plasmid. Schematic structural diagrams of the elements of the two recombinant expression vectors are shown in FIG. 1 and FIG. 2, respectively.
Each recombinant expression vector comes in two versions that differ in their target sequences, giving the following four recombinant expression vectors in total: the sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 recombinant expression vector, the sgRNA-GT-1 recombinant expression vector and the sgRNA-GT-2 recombinant expression vector.
Artificially synthesizing the four recombinant expression vectors, wherein the specific structural descriptions of the four recombinant expression vectors are respectively as follows:
The sequence of the sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector is sequence 1 in the sequence listing. In sequence 1, positions 131-467 are the nucleotide sequence of the OsU3 promoter; positions 474-550, 647-723, 820-896, 993-1069, 1166-1242 and 1339-1415 are all nucleotide sequences of tRNA; positions 551-570, 724-743, 897-916, 1070-1089 and 1243-1262 are five sequences targeting the OsALS, OsCDC48, OsNRT1.1B, OsWaxy and OsALS genes, respectively; positions 1416-1435 are the surrogate target sequence of the reporter gene; positions 571-646, 744-819, 917-992, 1090-1165, 1263-1338 and 1436-1511 are nucleotide sequences of the sgRNA; positions 1809-3522 are the nucleotide sequence of the OsUbq3 promoter; positions 3529-7797 are the coding sequence of the SpCas9n protein (without a stop codon), encoding the SpCas9n protein shown in sequence 2; positions 8089-8712 are the coding sequence of the PmCDA1 protein (without a stop codon), encoding the PmCDA1 protein shown in sequence 3; positions 8734-9030 are the coding sequence of the UGI protein, encoding the UGI protein shown in sequence 4; positions 9037-9231 are the nucleotide sequence of the 35S terminator; positions 9306-11298 are the nucleotide sequence of the ZmUbi1 promoter; positions 11305-11327 are the surrogate target sequence; positions 11329-12351 are the nucleotide sequence of hygromycin phosphotransferase without an initiation codon; and positions 12365-12617 are the nucleotide sequence of the Nos terminator. The six target sequences in the sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 recombinant expression vector are shown in Table 1; the targets are ALS-T1, CDC48-T1, NRT1.1B-T1, Waxy, ALS-T2 and Efemp1, respectively.
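The coordinates listed for the tRNA-sgRNA array follow a regular repeat, which the sketch below reproduces. It assumes that each unit is a 77-nt tRNA followed by a 20-nt target and a 76-nt sgRNA scaffold, starting at position 474. The tRNA and target lengths come from the coordinates stated in the text; the 76-nt scaffold length is an assumption inferred from the gap between each target and the following tRNA. The function and constant names are hypothetical.

```python
TRNA_LEN = 77          # e.g. positions 474-550
TARGET_LEN = 20        # e.g. positions 551-570
SCAFFOLD_LEN = 76      # assumed: fills the gap up to the next tRNA
FIRST_TRNA_START = 474


def unit_coordinates(n_units: int):
    """Return inclusive 1-based (start, end) ranges for each array unit."""
    unit_len = TRNA_LEN + TARGET_LEN + SCAFFOLD_LEN
    units = []
    for i in range(n_units):
        start = FIRST_TRNA_START + i * unit_len
        trna = (start, start + TRNA_LEN - 1)
        target = (trna[1] + 1, trna[1] + TARGET_LEN)
        scaffold = (target[1] + 1, target[1] + SCAFFOLD_LEN)
        units.append({"tRNA": trna, "target": target, "sgRNA": scaffold})
    return units


units = unit_coordinates(6)
for number, unit in enumerate(units, start=1):
    print(number, unit)
```

Under these assumptions the sixth unit spans positions 1339-1511, with its target at 1416-1435, which agrees with the positions given for the surrogate target in the array.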
The sequence of the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 recombinant expression vector is obtained by replacing the five target sequences ALS-T1, CDC48-T1, NRT1.1B-T1, Waxy and ALS-T2 in sequence 1, in order, with the following five target sequences: ALS-T3, CDC48-T2, NRT1.1B-T3, NRT1.1B-T2 and DEP1, while keeping all other sequences unchanged. The corresponding target sequence information is shown in Table 1.
The sequence of the sgRNA-GT-1 recombinant expression vector is obtained by replacing positions 11305-12351 of sequence 1 with the complete hygromycin resistance gene sequence shown in sequence 5, while keeping all other sequences unchanged.
The sequence of the sgRNA-GT-2 recombinant expression vector is obtained by replacing the five target sequences ALS-T1, CDC48-T1, NRT1.1B-T1, Waxy and ALS-T2 in the sgRNA-GT-1 recombinant expression vector, in order, with the following five target sequences: ALS-T3, CDC48-T2, NRT1.1B-T3, NRT1.1B-T2 and DEP1, while keeping all other sequences unchanged. The corresponding target sequence information is shown in Table 1.
The target nucleotide sequence of sgRNA and the corresponding PAM sequence of each vector are shown in table 1.
TABLE 1
Figure BDA0002222281170000141
II. Obtaining positive resistant rice calli
The sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 vector, the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 vector, the sgRNA-GT-1 vector and the sgRNA-GT-2 vector obtained in step I were each processed according to the following steps 1 to 8:
1. The vector was introduced into Agrobacterium EHA105 (product of Shanghai Diego Biotechnology Ltd., CAT #: AC1010) to obtain recombinant Agrobacterium.
2. After step 1, the recombinant Agrobacterium was cultured in medium (YEP medium containing 50 μg/ml kanamycin and 25 μg/ml rifampicin) at 28 ℃ with shaking at 150 rpm until the target OD600 was reached, then centrifuged at 10000 rpm for 1 min at room temperature. The cells were resuspended in infection solution (N6 liquid medium modified to contain glucose and sucrose at 10 g/L and 20 g/L, respectively) and diluted to an OD600 of 0.2 to obtain the Agrobacterium infection solution.
3. After step 2, mature seeds of the rice variety Nipponbare were dehusked and threshed, placed in a 100 mL Erlenmeyer flask, and soaked in 70% (v/v) aqueous ethanol for 30 s. The seeds were then sterilized in 25% (v/v) aqueous sodium hypochlorite with shaking at 120 rpm for 30 min, washed 3 times with sterile water, blotted dry with filter paper, placed downward on N6 solid medium, and cultured in the dark at 28 ℃ for 4-6 weeks to obtain rice calli.
4. After step 3, the rice calli were soaked for 10 min in Agrobacterium infection solution A (obtained by adding acetosyringone to the Agrobacterium infection solution at a volume ratio of acetosyringone to Agrobacterium infection solution of 25 μL : 50 mL), then placed on a culture dish lined with two layers of sterilized filter paper (containing about 200 ml of Agrobacterium-free infection solution) and cultured in the dark at 21 ℃ for 1 day.
5. The rice calli obtained in step 4 were placed on recovery medium and cultured in the dark at 25-28 ℃ for 3 days.
6. The rice calli obtained in step 5 were placed on screening medium and cultured in the dark at 28 ℃ for 2 weeks.
7. The rice calli obtained in step 6 were placed on fresh screening medium and cultured in the dark at 28 ℃ for a further 2 weeks to obtain resistant rice calli.
8. Genomic DNA was extracted from each of 20-24 resistant rice calli and used as the template for PCR amplification with a primer pair consisting of primer F (5'-attatgtagcttgtgcgtttcg-3') and primer R (5'-gatgaagagcttatcgacgt-3'). The PCR amplification products were subjected to agarose gel electrophoresis and judged as follows: if the PCR amplification product contains a DNA fragment of about 1150 bp, the corresponding resistant rice callus is a positive resistant rice callus; if the PCR amplification product does not contain a DNA fragment of about 1150 bp, the corresponding resistant rice callus is not a positive resistant rice callus.
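Two of the steps above involve simple calculations that can be sketched in code: diluting the resuspended Agrobacterium to an OD600 of 0.2 (step 2) and calling a callus positive from its gel band sizes (step 8). This is only an illustrative sketch. It assumes that OD600 scales linearly with dilution (C1·V1 = C2·V2), and the measured OD, the final volume and the ±100 bp window used to interpret "about 1150 bp" are hypothetical values that are not given in the original text.

```python
def dilution_volumes(measured_od, target_od, final_volume_ml):
    """Return (suspension_ml, diluent_ml) needed to reach target_od.

    Assumes OD600 is proportional to cell density (C1*V1 = C2*V2).
    """
    if measured_od <= 0 or target_od <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if measured_od < target_od:
        raise ValueError("suspension is already more dilute than the target")
    suspension_ml = target_od * final_volume_ml / measured_od
    return suspension_ml, final_volume_ml - suspension_ml


def is_positive_callus(band_sizes_bp, expected_bp=1150, tolerance_bp=100):
    """True if any observed band lies within tolerance of the expected size.

    tolerance_bp is a hypothetical reading of "about 1150 bp".
    """
    return any(abs(size - expected_bp) <= tolerance_bp for size in band_sizes_bp)


# Hypothetical example: a suspension measured at OD600 = 1.0, 50 mL needed.
suspension_ml, diluent_ml = dilution_volumes(1.0, 0.2, 50.0)
print(f"mix {suspension_ml:.1f} mL suspension with {diluent_ml:.1f} mL infection solution")

# Hypothetical gel readings for three calli.
calls = [is_positive_callus(bands) for bands in ([1150], [480], [])]
print(calls)
```

In practice the band call is made by eye against a DNA ladder; the numeric window only makes the decision rule explicit.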
III. Obtaining positive rice T0 seedlings
1. The sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 vector, the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 vector, the sgRNA-GT-1 vector and the sgRNA-GT-2 vector obtained in step I were each processed according to steps 1 to 7 of step II to obtain resistant rice calli.
2. The resistant rice calli obtained in step 1 were placed on differentiation medium and cultured under light at 25 ℃ for about 1 month. The differentiated plantlets were then transferred to rooting medium and cultured under light at 25 ℃ for 2 weeks to obtain rice T0 seedlings.
3. Genomic DNA was extracted from each rice T0 seedling obtained in step 2 and used as the template for PCR amplification with a primer pair consisting of primer F (5'-attatgtagcttgtgcgtttcg-3') and primer R (5'-gatgaagagcttatcgacgt-3'). The PCR amplification products were subjected to agarose gel electrophoresis and judged as follows: if the PCR amplification product contains a DNA fragment of about 1150 bp, the corresponding rice T0 seedling is a positive rice T0 seedling; if the PCR amplification product does not contain a DNA fragment of about 1150 bp, the corresponding rice T0 seedling is not a positive rice T0 seedling.
IV. Analysis of results
(I) Editing of the targets in rice calli
1. For each vector, genomic DNA from 20-24 positive resistant rice calli obtained in step II was used as the template (two independent infections were performed to obtain the mean and variance). Each target was PCR-amplified with its own primer pair to obtain PCR amplification products: ALS-T1 with primer pair T1; ALS-T2 with primer pair T2; ALS-T3 with primer pair T3; DEP1 with primer pair T4; Waxy with primer pair T5; NRT1.1B-T1 with primer pair T6; NRT1.1B-T2 with primer pair T7; NRT1.1B-T3 with primer pair T8; CDC48-T1 with primer pair T9; and CDC48-T2 with primer pair T10.
2. The PCR amplification products obtained in step 1 were Sanger sequenced and analyzed. Only the target regions were analyzed in the sequencing results. For each vector, the number of positive resistant rice calli carrying a C.T base substitution at each target was counted and the C.T base substitution efficiency was calculated. The results are shown in FIG. 4.
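The efficiency calculation described in step 2 amounts to dividing the number of edited calli by the number of calli analyzed. A minimal sketch follows, assuming that efficiency is expressed as a percentage and that the variance across the two independent infections is the sample variance; the counts are hypothetical and are not data from this document.

```python
from statistics import mean, variance


def substitution_efficiency(n_edited, n_total):
    """Percentage of analyzed calli carrying a C-to-T substitution at a target."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_edited <= n_total:
        raise ValueError("n_edited must be between 0 and n_total")
    return 100.0 * n_edited / n_total


# Hypothetical counts (edited, analyzed) from two independent infections.
replicates = [(15, 20), (12, 24)]
efficiencies = [substitution_efficiency(e, t) for e, t in replicates]

avg = mean(efficiencies)
var = variance(efficiencies)  # sample variance (n - 1 denominator)
print(f"efficiencies: {efficiencies}, mean: {avg:.1f}%, variance: {var:.1f}")
```

Whether the original analysis used the sample variance, the population variance or a standard deviation is not stated, so the choice here is only one possibility.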
The results show that: vectors (sgRNA) by using cell enrichment technique-ATG-Hyg-ATGsgRNA-GT-1 vector, sgRNA-ATG-Hyg-ATGa/sgRNA-GT-2 vector), the efficiency of C.T base substitution at C3 in ALS-T1 target is increased from 44% to 75%; the C.T base replacement efficiency of the 4 th C in the ALS-T2 target point is increased from 23% to 68%; the average C.T base substitution efficiency of C3, 4, 5 and 6 in the ALS-T3 target increased from 30% to 75%; the average C.T base replacement efficiency of C at positions 3, 4 and 5 in the NRT1.1B-T3 target point is increased from 40% to 75%; the average C.T base replacement efficiency of C at 8 th, 9 th, 11 th and 12 th positions in the NRT1.1B-T2 target point is increased from 5 percent to 15 percent; the C.T base replacement efficiency of the 11 th C in the Waxy target point is increased2.5% to 20%; the average C.T base substitution efficiency of C3 and C4 in CDC48-T1 target point is increased from 36% to 77%; the C.T base replacement efficiency of C at the 3 rd position in the CDC48-T2 target point is increased from 0 to 15 percent; the C.T base replacement efficiency of the 4 th C in the NRT1.1B-T1 target point is increased from 44% to 66%. In conclusion, the efficiency of C.T base replacement of most targets is improved to 1.5-8 times of that of the non-cell enrichment technology by using the cell enrichment technology.
(II) Editing of the targets in rice T0 seedlings
1. Genomic DNA of the positive rice T0 seedlings obtained in step III for the sgRNA-ATG-Hyg-ATG/sgRNA-GT-1 vector, the sgRNA-ATG-Hyg-ATG/sgRNA-GT-2 vector, the sgRNA-GT-1 vector and the sgRNA-GT-2 vector was used as the template. Each target was PCR-amplified with its own primer pair to obtain PCR amplification products: ALS-T1 with primer pair T1; ALS-T2 with primer pair T2; ALS-T3 with primer pair T3; DEP1 with primer pair T4; Waxy with primer pair T5; NRT1.1B-T1 with primer pair T6; NRT1.1B-T2 with primer pair T7; NRT1.1B-T3 with primer pair T8; CDC48-T1 with primer pair T9; and CDC48-T2 with primer pair T10.
2. The PCR amplification products obtained in step 1 were Sanger sequenced and analyzed. Only the target regions were analyzed in the sequencing results. For each vector, the number of positive rice T0 seedlings carrying a C.T base substitution at each target was counted and the C.T base substitution efficiency was calculated.
The results of the analysis of C.T base substitution efficiency are shown in FIG. 5. The results show that, in T0 seedlings, compared with the non-cell enrichment technology, the C.T base substitution efficiency increased from 40% to 66.7% at the ALS-T1 target, from 20.7% to 63.3% at the ALS-T2 target, from 23.3% to 47.8% at the ALS-T3 target, from 50% to 60% at the NRT1.1B-T2 target, from 30% to 36.7% at the NRT1.1B-T3 target, from 39.3% to 56.7% at the CDC48-T1 target, and from 13.3% to 33.3% at the CDC48-T2 target. Except for the Waxy and NRT1.1B-T1 targets, whose substitution efficiencies remained unchanged, the cell enrichment technology increased the C.T base substitution efficiency of most targets, and the average C.T base substitution efficiency increased from 36% to 49%.
The results of the analysis of homozygous C.T base substitution efficiency are shown in FIG. 6. The results show that, in T0 seedlings, the homozygous C.T base substitution efficiency increased at 5 targets. Specifically, the homozygous substitution efficiency increased from 8.3% to 45% for ALS-T1, from 16.7% to 35.3% for ALS-T2, from 0 to 9.1% for ALS-T3, from 0 to 9.1% for NRT1.1B-T3, and from 18.2% to 37.5% for CDC48-T1. The homozygous substitution efficiencies of the remaining targets were almost unchanged. In summary, compared with the non-cell enrichment technology, the cell enrichment technology can increase the homozygous substitution efficiency of some targets.
Example 3. Construction of HypaCas9n&PmCDA1&UGI-mediated cell enrichment technology vectors and their application in genome editing of rice T0 seedlings
I. Construction of recombinant expression vectors
The recombinant expression vectors in this example are the HypaCas9n&PmCDA1&UGI (HypaCas9-PCBE)-mediated C.T base substitution non-cell enrichment technology vector (named HypaCas9n-sgRNA-GT) and the HypaCas9n&PmCDA1&UGI (HypaCas9-PCBE)-mediated cell enrichment technology vector (named sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT). Each vector is a circular plasmid. A schematic diagram of the elements of the two recombinant expression vectors is shown in FIG. 7. The main structure of the vectors is similar to that of the Cas9n&PmCDA1&UGI-mediated non-cell enrichment and cell enrichment technology vectors, differing only in that HypaCas9n is used in place of SpCas9n. The working principle of the HypaCas9n&PmCDA1&UGI-mediated cell enrichment technology is the same as that of the Cas9n&PmCDA1&UGI-mediated cell enrichment technology.
Each of the two recombinant expression vectors comes in two versions that differ in their target sequences, giving four recombinant expression vectors in total: the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 recombinant expression vector, the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 recombinant expression vector, the HypaCas9n-sgRNA-GT-1 recombinant expression vector and the HypaCas9n-sgRNA-GT-2 recombinant expression vector.
The four recombinant expression vectors were artificially synthesized. Their structures are described below:
The sequence of the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 recombinant expression vector is obtained by replacing the coding sequence of the SpCas9n protein at positions 3529-7797 of sequence 1 with the coding sequence of the HypaCas9n protein shown in sequence 6 (encoding the HypaCas9n protein shown in sequence 7), while keeping all other sequences unchanged.
The sequence of the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 recombinant expression vector is obtained by replacing the coding sequence of the SpCas9n protein at positions 3529-7797 of sequence 1 with the coding sequence of the HypaCas9n protein shown in sequence 6 and replacing positions 131-1802 with sequence 8, while keeping all other sequences unchanged. In sequence 8, positions 1-337 are the nucleotide sequence of the OsU3 promoter; positions 344-420, 517-593 and 690-766 are tRNA sequences; positions 441-516, 614-689 and 787-862 are sgRNA sequences; positions 421-440 are the ALS-T3 target sequence; positions 594-613 are the CDC48-T2 target sequence; positions 767-786 are the Efemp1 surrogate target sequence; and positions 863-1153 are the nucleotide sequence of the OsU3 terminator. The corresponding target sequence information is shown in Table 2.
The sequence of the HypaCas9n-sgRNA-GT-1 recombinant expression vector is obtained by replacing the coding sequence of SpCas9n protein shown in 3529-7797 site of the sequence 1 with the coding sequence of HypaCas9n protein shown in the sequence 6, replacing the 11305-12351 site with the whole hygromycin resistance gene sequence shown in the sequence 5, and keeping other sequences unchanged.
The sequence of the HypaCas9n-sgRNA-GT-2 recombinant expression vector is obtained by replacing the coding sequence of SpCas9n protein shown in 3529-7797 site of the sequence 1 with the coding sequence of HypaCas9n protein shown in the sequence 6, replacing the 131-1802 site with the sequence 8, replacing the 11305-12351 site with the complete hygromycin resistance gene sequence shown in the sequence 5, and keeping other sequences unchanged.
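Each derived vector above is defined by replacing one or more coordinate ranges of sequence 1, with every range given in the coordinates of the original sequence. When several ranges are replaced with inserts of a different length, applying them from the highest coordinate to the lowest keeps the remaining coordinates valid. The following is a minimal sketch of that bookkeeping on a toy sequence, assuming 1-based inclusive positions; the function name and the toy data are illustrative and this is not the procedure used by the authors, who synthesized the vectors directly.

```python
def apply_replacements(sequence, replacements):
    """Replace several regions of `sequence`.

    `replacements` is a list of (start, end, new_fragment) tuples with
    1-based, inclusive coordinates that all refer to the ORIGINAL sequence.
    Regions must not overlap.
    """
    # Process from right to left so earlier coordinates are never shifted.
    ordered = sorted(replacements, key=lambda r: r[0], reverse=True)

    previous_start = None
    for start, end, _ in ordered:
        if start < 1 or end < start or end > len(sequence):
            raise ValueError(f"invalid region {start}-{end}")
        if previous_start is not None and end >= previous_start:
            raise ValueError("regions overlap")
        previous_start = start

    result = sequence
    for start, end, fragment in ordered:
        result = result[: start - 1] + fragment + result[end:]
    return result


# Toy example: replace positions 4-6 and 10-12 of a 12-nt sequence.
toy = "AAACCCGGGTTT"
edited = apply_replacements(toy, [(4, 6, "tt"), (10, 12, "acgt")])
print(edited)  # AAAttGGGacgt
```

For the HypaCas9n-sgRNA-GT-2 vector, the same call would take three regions of sequence 1 (131-1802, 3529-7797 and 11305-12351) with sequences 8, 6 and 5 as the replacement fragments.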
The target nucleotide sequence of sgRNA and the corresponding PAM sequence of each vector are shown in table 2.
TABLE 2
Figure BDA0002222281170000181
II. Obtaining positive rice T0 seedlings
1. The sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 vector, the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 vector, the HypaCas9n-sgRNA-GT-1 vector and the HypaCas9n-sgRNA-GT-2 vector obtained in step I were each processed according to steps 1 to 7 of step II of Example 2 to obtain resistant rice calli.
2. The resistant rice calli obtained in step 1 were processed according to steps 2-3 of step III of Example 2 to obtain positive rice T0 seedlings and the corresponding genomic DNA.
III. Analysis of results
1. Genomic DNA of the positive rice T0 seedlings obtained in step II for the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-1 vector, the sgRNA-ATG-Hyg-ATG/HypaCas9n-sgRNA-GT-2 vector, the HypaCas9n-sgRNA-GT-1 vector and the HypaCas9n-sgRNA-GT-2 vector was used as the template. Each target was PCR-amplified with its own primer pair to obtain PCR amplification products: ALS-T1 with primer pair T1; ALS-T2 with primer pair T2; ALS-T3 with primer pair T3; Waxy with primer pair T5; NRT1.1B-T1 with primer pair T6; CDC48-T1 with primer pair T9; and CDC48-T2 with primer pair T10.
2. The PCR amplification products obtained in step 1 were Sanger sequenced and analyzed. Only the target regions were analyzed in the sequencing results. For each vector, the number of positive rice T0 seedlings carrying a C.T base substitution at each target was counted and the C.T base substitution efficiency was calculated. The results are shown in FIG. 8.
The results show that, in T0 seedlings, compared with the HypaCas9n&PmCDA1&UGI-mediated non-cell enrichment technology (HypaCas9n-sgRNA-GT), the HypaCas9n&PmCDA1&UGI-mediated cell enrichment technology improved editing to varying degrees at all 7 tested targets. The base substitution efficiency increased from 20% to 50% for ALS-T1, from 5% to 18.2% for ALS-T2, from 5% to 11.8% for ALS-T3, from 15.8% to 18.2% for Waxy, from 55% to 81.8% for NRT1.1B-T1, from 15% to 33% for CDC48-T1, and from 0 to 5.9% for CDC48-T2.
Example 4. rAPOBEC1&Cas9n&UGI-mediated cell enrichment technology vectors and their application in genome editing of rice T0 seedlings
I. Construction of recombinant expression vectors
The recombinant expression vectors in this example are the rAPOBEC1&Cas9n&UGI (rCBE)-mediated C.T base substitution non-cell enrichment technology vector (named r-sgRNA-GT) and the rAPOBEC1&Cas9n&UGI (rCBE)-mediated C.T base substitution cell enrichment technology vector (named sgRNA-ATG-Hyg-ATG/r-sgRNA-GT). Each vector is a circular plasmid. A schematic diagram of the elements of the two recombinant expression vectors is shown in FIG. 9. The working principle of the rAPOBEC1&Cas9n&UGI-mediated cell enrichment technology is the same as that of the Cas9n&PmCDA1&UGI-mediated cell enrichment technology.
The following recombinant expression vectors were artificially synthesized: the sgRNA-ATG-Hyg-ATG/r-sgRNA-GT recombinant expression vector and the r-sgRNA-GT recombinant expression vector. The structures of the two recombinant vectors are described below:
The sequence of the sgRNA-ATG-Hyg-ATG/r-sgRNA-GT recombinant expression vector is obtained from sequence 1 as follows: positions 3529-8712 are replaced with sequence 9 (rAPOBEC1 fused with the SpCas9n sequence); positions 11305-11328 are replaced with sequence 10 (the rAPOBEC1 surrogate target sequence); the first five target sequences of the first expression cassette are replaced, in order, with the following five target sequences: ALS-T1, Waxy, NRT1.1B-T1, Waxy-T2 and rAPOBEC1-sug; and the sixth target sequence at positions 1339-1511, together with its tRNA nucleotide sequence and sgRNA nucleotide sequence, is deleted. All other sequences are kept unchanged. The corresponding target sequences are shown in Table 3.
The sequence of the r-sgRNA-GT recombinant expression vector is obtained from sequence 1 as follows: positions 3529-8712 are replaced with sequence 9 (rAPOBEC1 fused with the SpCas9n sequence); positions 11305-12351 are replaced with the complete hygromycin resistance gene sequence shown in sequence 5; the first five target sequences of the first expression cassette are replaced, in order, with the following five target sequences: ALS-T1, Waxy, NRT1.1B-T1, Waxy-T2 and rAPOBEC1-sug; and the sixth target sequence at positions 1339-1511, together with its tRNA nucleotide sequence and sgRNA nucleotide sequence, is deleted. All other sequences are kept unchanged. The corresponding target sequences are shown in Table 3.
The target nucleotide sequence of sgRNA and the corresponding PAM sequence of each vector are shown in table 3.
TABLE 3
Figure BDA0002222281170000191
II. Obtaining positive rice T0 seedlings
1. The sgRNA-ATG-Hyg-ATG/r-sgRNA-GT vector and the r-sgRNA-GT vector obtained in step I were each processed according to steps 1 to 7 of step II of Example 2 to obtain resistant rice calli.
2. The resistant rice calli obtained in step 1 were processed according to steps 2-3 of step III of Example 2 to obtain positive rice T0 seedlings and the corresponding genomic DNA.
III. Analysis of results
1. Genomic DNA of the positive rice T0 seedlings obtained in step II for the sgRNA-ATG-Hyg-ATG/r-sgRNA-GT vector and the r-sgRNA-GT vector was used as the template. Each target was PCR-amplified with its own primer pair to obtain PCR amplification products: ALS-T1 with primer pair T1; Waxy with primer pair T5; NRT1.1B-T1 with primer pair T6; and Waxy-T2 with primer pair T11.
2. The PCR amplification products obtained in step 1 were Sanger sequenced and analyzed. Only the target regions were analyzed in the sequencing results. For each vector, the number of positive rice T0 seedlings carrying a C.T base substitution at each target was counted and the C.T base substitution efficiency was calculated. The results are shown in FIG. 10.
The results show that, in T0 seedlings, compared with the r-sgRNA-GT non-cell enrichment technology, the rAPOBEC1&Cas9n&UGI-mediated cell enrichment technology (sgRNA-ATG-Hyg-ATG/r-sgRNA-GT) improved editing to varying degrees at all 4 tested targets (ALS-T1, Waxy, NRT1.1B-T1 and Waxy-T2). The substitution efficiency increased from 0 to 20% at the Waxy target, from 5% to 30% at the NRT1.1B-T1 target, from 9.1% to 33.3% at the ALS-T1 target, and from 6.7% to 45% at the Waxy-T2 target.
Example 5. Application of the optimized Cas9n&PmCDA1&UGI-mediated cell enrichment technology in rice genome editing
To improve the Cas9n&PmCDA1&UGI-mediated C.T base substitution efficiency, the Cas9n&PmCDA1&UGI-mediated cell enrichment technology was optimized: an optimized esgRNA was introduced into the Cas9n&PmCDA1&UGI-mediated cell enrichment technology, with the esgRNA used to edit the endogenous genomic target sequences and the sgRNA used to edit the surrogate target sequence of the reporter gene. This technology is named the differential proxy technology, and the differential proxy technology vector is named the differential proxy system DisSUGs (Discriminated sgRNAs based SurroGate system). A schematic diagram of the structure of the differential proxy system DisSUGs is shown in FIG. 11.
I. Construction of recombinant expression vectors
The following recombinant expression vectors were artificially synthesized: the DisSUGs-1 recombinant expression vector and the DisSUGs-2 recombinant expression vector. Both recombinant expression vectors are circular plasmids, and their structures are described below:
The sequence of the DisSUGs-1 recombinant expression vector is obtained by replacing each of the sgRNA nucleotide sequences at positions 571-646, 744-819, 917-992, 1090-1165 and 1263-1338 of sequence 1 with sequence 11 (the esgRNA nucleotide sequence), while keeping all other sequences unchanged.
The sequence of the DisSUGs-2 recombinant expression vector is that the first five target sequences of a first expression cassette in the DisSUGs-1 recombinant expression vector are sequentially and respectively replaced by the following five target sequences: ALS-T3, CDC48-T2, NRT1.1B-T3, NRT1.1B-T2 and DEP1, and the sequences obtained by keeping other sequences unchanged. The corresponding target sequence information is shown in Table 1.
The target nucleotide sequences of the esgRNAs or sgRNAs of each vector and the corresponding PAM sequences are shown in Table 1.
II. Obtaining positive rice T0 seedlings
1. The DisSUGs-1 vector and the DisSUGs-2 vector obtained in step I were each processed according to steps 1 to 7 of step II of Example 2 to obtain resistant rice calli.
2. The resistant rice calli obtained in step 1 were processed according to steps 2-3 of step III of Example 2 to obtain positive rice T0 seedlings and the corresponding genomic DNA.
III. Editing of the targets in rice T0 seedlings
1. Genomic DNA of the positive rice T0 seedlings obtained in step II for the DisSUGs-1 recombinant expression vector and the DisSUGs-2 recombinant expression vector was used as the template. Each target was PCR-amplified with its own primer pair to obtain PCR amplification products: ALS-T1 with primer pair T1; ALS-T2 with primer pair T2; ALS-T3 with primer pair T3; DEP1 with primer pair T4; Waxy with primer pair T5; NRT1.1B-T1 with primer pair T6; NRT1.1B-T2 with primer pair T7; NRT1.1B-T3 with primer pair T8; CDC48-T1 with primer pair T9; and CDC48-T2 with primer pair T10.
2. The PCR amplification products obtained in step 1 were Sanger sequenced and analyzed. Only the target regions were analyzed in the sequencing results. For each vector, the number of positive rice T0 seedlings carrying a C.T base substitution at each target was counted and the C.T base substitution efficiency was calculated. The results are shown in FIG. 12.
The results show that: and Cas9n&PmCDA1&UGI-mediated cell enrichment technology sgRNA-ATG-Hyg-ATGCompared with the sgRNA-GT, the probability of replacing C with T of T0 seedlings with 9 targets in 10 tested targets in the differential agent technology is increased. The concrete expression is as follows: the replacement efficiency of ALS-T1 is increased from 66.7% to 78.6%, the replacement efficiency of ALS-T2 is unchanged, the replacement efficiency of ALS-T3 is increased from 47.8% to 76.9%, the replacement efficiency of DEP1 is increased from 58.6% to 76.9%, the replacement efficiency of Waxy is increased from 13.3% to 30%, the replacement efficiency of NRT1.1B-T1 is increased from 58.6% to 73.3%, the replacement efficiency of NRT1.1B-T2 is increased from 60% to 76.9%, the replacement efficiency of NRT1.1B-T3 is increased from 36.7% to 42.3%, the replacement efficiency of CDC48-T1 is increased from 56.7% to 80.8%, and the replacement efficiency of CDC48-T2 is increased from 33.3% to 65.4%. The results show that the differential proxy technology ratio is Cas9n&PmCDA1&UGI-mediated cell enrichment technology sgRNA-ATG-Hyg-ATGThe enrichment efficiency of the/sgRNA-GT cells is higher.
The present invention has been described in detail above. It will be apparent to those skilled in the art that the invention can be practiced in a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with reference to specific embodiments, it will be appreciated that the invention can be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. The use of some of the essential features is possible within the scope of the claims attached below.
Sequence listing
<110> Beijing Academy of Agriculture and Forestry Sciences
<120> cell enrichment technique by C.T base substitution using inactivated screening agent resistance gene as reporter system and use thereof
<160>12
<170>PatentIn version 3.5
<210>1
<211>19029
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>1
ggtggcagga tatattgtgg tgtaaacatg gcactagcct caccgtcttc gcagacgagg 60
ccgctaagtc gcagctacgc tctcaacggc actgactagg tagtttaaac gtgcacttaa 120
ttaaggtacc gaagcaactt aaagttatca ggcatgcatg gatcttggag gaatcagatg 180
tgcagtcagg gaccatagca caagacaggc gtcttctact ggtgctacca gcaaatgctg 240
gaagccggga acactgggta cgttggaaac cacgtgatgt gaagaagtaa gataaactgt 300
aggagaaaag catttcgtag tgggccatga agcctttcag gacatgtatt gcagtatggg 360
ccggcccatt acgcaattgg acgacaacaa agactagtat tagtaccacc tcggctatcc 420
acatagatca aagctgattt aaaagagttg tgcagatgat ccgtggcgga tccaacaaag 480
caccagtggt ctagtggtag aatagtaccc tgccacggta cagacccggg ttcgattccc 540
ggctggtgca cgcgtccatg gagatccacc gttttagagc tagaaatagc aagttaaaat 600
aaggctagtc cgttatcaac ttgaaaaagt ggcaccgagt cggtgcaaca aagcaccagt 660
ggtctagtgg tagaatagta ccctgccacg gtacagaccc gggttcgatt cccggctggt 720
gcagaccagc cagcgtctgg cgcgttttag agctagaaat agcaagttaa aataaggcta 780
gtccgttatc aacttgaaaa agtggcaccg agtcggtgca acaaagcacc agtggtctag 840
tggtagaata gtaccctgcc acggtacaga cccgggttcg attcccggct ggtgcacggc 900
gacggcgagc aagtgggttt tagagctaga aatagcaagt taaaataagg ctagtccgtt 960
atcaacttga aaaagtggca ccgagtcggt gcaacaaagc accagtggtc tagtggtaga 1020
atagtaccct gccacggtac agacccgggt tcgattcccg gctggtgcat tgtaatcaac 1080
tccagtgtcg ttttagagct agaaatagca agttaaaata aggctagtcc gttatcaact 1140
tgaaaaagtg gcaccgagtc ggtgcaacaa agcaccagtg gtctagtggt agaatagtac 1200
cctgccacgg tacagacccg ggttcgattc ccggctggtg cagaacaacc aacatttggg 1260
tagttttaga gctagaaata gcaagttaaa ataaggctag tccgttatca acttgaaaaa 1320
gtggcaccga gtcggtgcaa caaagcacca gtggtctagt ggtagaatag taccctgcca 1380
cggtacagac ccgggttcga ttcccggctg gtgcagcaac gagtggtgtg gtgccgtttt 1440
agagctagaa atagcaagtt aaaataaggc tagtccgtta tcaacttgaa aaagtggcac 1500
cgagtcggtg cttttttttt tcgttttgca ttgagttttc tccgtcgcat gtttgcagtt 1560
ttattttccg ttttgcattg aaatttctcc gtctcatgtt tgcagcgtgt tcaaaaagta 1620
cgcagctgta tttcacttat ttacggcgcc acattttcat gccgtttgtg ccaactatcc 1680
cgagctagtg aatacagctt ggcttcacac aacactggtg acccgctgac ctgctcgtac 1740
ctcgtaccgt cgtacggcac agcatttgga attaaagggt gtgatcgata ctgcttgctg 1800
ctaagcttac aaattcgggt caaggcggaa gccagcgcgc caccccacgt cagcaaatac 1860
ggaggcgcgg ggttgacggc gtcacccggt cctaacggcg accaacaaac cagccagaag 1920
aaattacagt aaaaaaaaag taaattgcac tttgatccac cttttattac ctaagtctca 1980
atttggatca cccttaaacc tatcttttca atttgggccg ggttgtggtt tggactacca 2040
tgaacaactt ttcgtcatgt ctaacttccc tttcagcaaa catatgaacc atatatagag 2100
gagatcggcc gtatactaga gctgatgtgt ttaaggtcgt tgattgcacg agaaaaaaaa 2160
atccaaatcg caacaatagc aaatttatct ggttcaaagt gaaaagatat gtttaaaggt 2220
agtccaaagt aaaacttata gataataaaa tgtggtccaa agcgtaattc actcaaaaaa 2280
aatcaacgag acgtgtacca aacggagaca aacggcatct tctcgaaatt tcccaaccgc 2340
tcgctcgccc gcctcgtctt cccggaaacc gcggtggttt cagcgtggcg gattctccaa 2400
gcagacggag acgtcacggc acgggactcc tcccaccacc caaccgccat aaataccagc 2460
cccctcatct cctctcctcg catcagctcc acccccgaaa aatttctccc caatctcgcg 2520
aggctctcgt cgtcgaatcg aatcctctcg cgtcctcaag gtacgctgct tctcctctcc 2580
tcgcttcgtt tcgattcgat ttcggacggg tgaggttgtt ttgttgctag atccgattgg 2640
tggttagggt tgtcgatgtg attatcgtga gatgtttagg ggttgtagat ctgatggttg 2700
tgatttgggc acggttggtt cgataggtgg aatcgtggtt aggttttggg attggatgtt 2760
ggttctgatg attgggggga atttttacgg ttagatgaat tgttggatga ttcgattggg 2820
gaaatcggtg tagatctgtt ggggaattgt ggaactagtc atgcctgagt gattggtgcg 2880
atttgtagcg tgttccatct tgtaggcctt gttgcgagca tgttcagatc tactgttccg 2940
ctcttgattg agttattggt gccatgggtt ggtgcaaaca caggctttaa tatgttatat 3000
ctgttttgtg tttgatgtag atctgtaggg tagttcttct tagacatggt tcaattatgt 3060
agcttgtgcg tttcgatttg atttcatatg ttcacagatt agataatgat gaactctttt 3120
aattaattgt caatggtaaa taggaagtct tgtcgctata tctgtcataa tgatctcatg 3180
ttactatctg ccagtaattt atgctaagaa ctatattaga atatcatgtt acaatctgta 3240
gtaatatcat gttacaatct gtagttcatc tatataatct attgtggtaa tttcttttta 3300
ctatctgtgt gaagattatt gccactagtt cattctactt atttctgaag ttcaggatac 3360
gtgtgctgtt actacctatc tgaatacatg tgtgatgtgc ctgttactat ctttttgaat 3420
acatgtatgt tctgttggaa tatgtttgct gtttgatccg ttgttgtgtc cttaatcttg 3480
tgctagttct taccctatct gtttggtgat tatttcttgc agtacgtaat ggactacaag 3540
gaccacgacg gggattacaa agaccacgac atagactaca aggatgacga tgacaaaatg 3600
gcaccgaaga aaaaaaggaa ggtcggaatc catggcgttc cagctgccga taagaaatat 3660
tccatcggac tcgccattgg cacgaatagc gtcggatggg ctgttattac tgatgagtac 3720
aaagttccgt ctaagaagtt caaggtgctg ggcaacacag accgccacag cataaagaaa 3780
aatctcatcg gtgcactcct tttcgatagt ggggagactg cagaagcgac aagattgaaa 3840
aggactgcga gaaggcgcta tacacggcgt aagaatagaa tctgctacct tcaggagatt 3900
ttctctaacg aaatggctaa ggtcgatgac agtttctttc atagacttga ggaatcgttc 3960
ttggttgagg aggataagaa acatgagagg cacccgatat ttggaaacat cgtggatgag 4020
gtcgcatatc atgaaaagta ccccacaatc taccacctga gaaagaaact cgttgattcc 4080
accgacaaag cggatttgag actcatctac ctcgctcttg cccatatgat aaagttccgc 4140
ggacactttc tgatcgaggg cgacctcaac cctgataata gcgacgtcga taagctcttc 4200
atccagttgg ttcaaaccta caatcagctc tttgaggaaa acccaattaa tgctagtgga 4260
gtggatgcaa aagcgatact gtcggccaga ctctccaaga gcagaaggtt ggagaacctg 4320
atcgctcaac ttcctggaga aaagaaaaac ggtctttttg ggaatttgat tgccttgtct 4380
ctgggcctca caccaaactt caagtcaaat tttgacctcg ctgaggatgc caaacttcag 4440
ttgtctaagg atacctatga tgacgatctt gacaatttgc tggcacaaat tggcgaccag 4500
tacgcggatc tgttcctcgc agcgaagaat ctgagtgatg ctattctcct ttcggacata 4560
ctcagggtta acactgagat cacaaaagca cctttgagtg cgtcgatgat taagcgctat 4620
gatgaacatc accaagacct cactttgctg aaggcccttg tgcggcagca attgccagag 4680
aagtacaaag aaatcttctt tgaccaatct aagaacggat acgctggcta tattgatgga 4740
ggagcttctc aggaggaatt ctataagttt atcaaaccta tacttgagaa gatggatggt 4800
acagaggaac tccttgttaa attgaacaga gaagatttgc tgcgcaagca acggaccttt 4860
gacaacggat caattccgca tcagatacac ctcggcgagc ttcatgccat ccttcgccgg 4920
caggaagatt tctacccctt tttgaaggac aaccgcgaga agatagaaaa aatccttacg 4980
ttccggattc cttactatgt gggtccattg gcaaggggga attcccgctt tgcgtggatg 5040
actcggaaaa gcgaggaaac tatcacaccg tggaacttcg aggaagttgt ggacaaggga 5100
gcttctgccc aatcattcat tgagaggatg actaacttcg ataagaacct gccgaacgag 5160
aaagttctcc ccaagcactc cctcctttac gagtatttca ccgtgtataa cgaacttacg 5220
aaggttaaat acgtgactga gggtatgagg aagccagcat tcttgagcgg ggaacaaaag 5280
aaagcgattg ttgatttgct gtttaaaact aatcgcaagg tgacagtcaa gcagctcaaa 5340
gaggattatt tcaagaaaat tgaatgtttc gactctgtgg agatatcagg agtcgaagat 5400
aggtttaacg cttcccttgg cacataccat gacctcctta agatcattaa ggacaaagat 5460
ttcctggata acgaggaaaa tgaggacatc ctcgaagata ttgttcttac cttgacgctg 5520
tttgaggatc gcgaaatgat cgaggaacgg cttaagacgt atgctcactt gttcgacgat 5580
aaggttatga agcagctcaa gcgtagaagg tacactggat ggggccgtct gtctagaaag 5640
ctcatcaacg gaatacgtga taaacaaagt ggcaagacaa ttttggattt tctgaagtcg 5700
gacggattcg ccaacagaaa ttttatgcag ctgattcatg acgatagtct caccttcaaa 5760
gaggacatac agaaggctca agtgagtggt caaggggatt cgctgcatga acacatcgca 5820
aacctcgcgg gttcaccggc cataaagaaa ggaatccttc aaactgttaa ggtcgttgat 5880
gagttggtta aagtgatggg taggcacaag cccgaaaaca tagtgatcga gatggctcgc 5940
gaaaatcaga ctacacaaaa agggcagaag aactctcgcg agcggatgaa aaggattgag 6000
gaaggaatca aggaactggg ctcacagatt ctcaaagagc atccagtcga aaacacacag 6060
ctgcaaaatg agaagctcta tctttactat ctccaaaatg gccgggacat gtatgttgat 6120
caggagcttg acatcaaccg tttgtccgac tatgatgtgg accacattgt cccgcaatct 6180
ttccttaagg acgattcaat cgataataag gtgttgaccc ggagcgataa aaaccgtgga 6240
aagtctgaca atgtcccttc agaggaagtg gttaagaaga tgaagaacta ctggagacaa 6300
ttgctgaatg caaaactgat cacacagaga aagttcgaca acctcaccaa agcagagaga 6360
ggtgggctca gtgaacttga taaagcgggc ttcattaagc gtcagctcgt tgagactaga 6420
cagatcacga agcatgtcgc gcagattttg gattcgcgga tgaacacgaa gtacgacgag 6480
aatgataaac tgatacgtga agtcaaggtt atcactctta agtccaaatt ggtgagcgat 6540
ttcagaaagg acttccaatt ctataaggtc agggagatca acaattatca tcacgctcac 6600
gatgcctacc ttaatgctgt tgtggggacc gcccttatta agaaataccc taaattggag 6660
tctgaattcg tttacgggga ttataaggtc tacgacgtta ggaaaatgat agctaagagt 6720
gagcaggaga tcggtaaagc aactgcgaag tatttctttt actcgaacat catgaatttc 6780
tttaagaccg agataacgct ggcaaatggc gaaattagaa agaggcctct catagagact 6840
aacggtgaga caggggaaat cgtctgggat aagggtaggg actttgcgac agtgcgcaag 6900
gtcctctcta tgccgcaagt taatattgtg aagaaaaccg aggtgcagac gggaggcttc 6960
tccaaggaaa gcatacttcc caaacggaac tctgataagt tgatcgctcg taagaaagat 7020
tgggacccta agaaatatgg tgggttcgat tccccaactg ttgcttacag cgtgctggtc 7080
gttgccaagg tcgagaaggg taaatccaag aaactcaaaa gcgttaagga actccttggg 7140
attactatca tggagagatc ttcattcgaa aagaatccta tcgactttct tgaggccaaa 7200
ggatataagg aagttaagaa agatctgata atcaaactcc caaagtactc attgtttgag 7260
ctggaaaacg gcaggaagcg catgcttgct tccgccggag agttgcagaa agggaacgag 7320
ttggctctgc cttctaagta tgttaacttc ctctatcttg cctctcatta cgagaagctc 7380
aaaggctcac cagaggacaa cgaacagaaa caactttttg tcgagcaaca taagcactat 7440
ttggatgaga ttatagaaca gatcagtgaa ttctcgaaaa gggttatcct tgcagatgcg 7500
aatcttgaca aggtgttgtc tgcatacaac aaacatagag ataagccgat cagggagcaa 7560
gcggaaaata tcattcacct cttcactctt acaaacttgg gtgctcccgc tgccttcaag 7620
tattttgata ccacgattga ccggaaacgt tacacctcaa cgaaggaggt gctggatgcc 7680
accctcatcc accaatctat taccggactc tacgagacta gaatcgatct ctcacagctc 7740
ggcggggata aaagaccagc agcgacgaaa aaggcaggac aggctaagaa gaagaaagag 7800
ctcggaggag gaggcacggg aggaggaggc tccgccgagt atgtgcgcgc gctcttcgac 7860
ttcaacggca atgacgagga ggatctccct ttcaagaagg gcgacatcct ccgcatccgc 7920
gataagccgg aggagcagtg gtggaacgca gaggactccg agggcaagcg gggcatgatc 7980
ctggtgccat acgtcgagaa gtacagcggc gattacaagg accacgatgg cgactacaag 8040
gatcatgaca tcgattacaa ggacgatgac gataagtccg gcgtcgacat gacggacgcg 8100
gagtatgtgc gcatccacga gaagctcgat atctacacct tcaagaagca gttcttcaac 8160
aataagaagt cggtgtccca tcggtgctac gtcctcttcg agctgaagcg caggggagag 8220
cgccgcgcct gcttctgggg ctacgcggtg aataagccgc agtcaggcac agagcgcggc 8280
atccacgccg agatcttctc gatccggaag gtcgaggagt acctccgcga caacccaggc 8340
cagttcacga tcaattggta ctccagctgg tccccttgcg cagattgcgc agagaagatc 8400
ctcgagtggt acaaccagga gctgaggggc aatggccata ccctcaagat ctgggcctgc 8460
aagctgtact acgagaagaa cgcgaggaat cagatcggcc tctggaacct gcgggataat 8520
ggcgtgggcc tcaacgtgat ggtgtccgag cactaccagt gctgccgcaa gatcttcatc 8580
cagtcctccc acaatcagct gaacgagaat aggtggctcg aaaagaccct gaagcgcgcc 8640
gagaagtgga ggagcgagct gtctatcatg atccaggtca agatcctgca caccacaaag 8700
tcaccggcgg tgggcggcgg cggcagcgaa ttctccggcg gcagcacgaa cctcagcgac 8760
atcatcgaga aggagacagg caagcagctc gtgatccagg agtctatcct catgctgcct 8820
gaggaggtgg aggaggtcat cggcaacaag ccggagtccg atatcctcgt gcacaccgcc 8880
tacgacgagt cgacagatga gaatgtcatg ctcctgacct ccgacgcacc agagtacaag 8940
ccatgggcgc tcgtgatcca ggattccaac ggcgagaata agatcaagat gctgtctggc 9000
ggctccccga agaagaagcg caaggtctag actagtctga aatcaccagt ctctctctac 9060
aaatctatct ctctctataa taatgtgtga gtagttccca gataagggaa ttagggttct 9120
tatagggttt cgctcatgtg ttgagcatat aagaaaccct tagtatgtat ttgtatttgt 9180
aaaatacttc tatcaataaa atttctaatt cctaaaacca aaatccagtg gggcgcccga 9240
cctgtactcg cgaaggttaa cttacagaga gtgtccgggc gcgcctggtg gatcgtccgc 9300
ctaggctgca gtgcagcgtg acccggtcgt gcccctctct agagataatg agcattgcat 9360
gtctaagtta taaaaaatta ccacatattt tttttgtcac acttgtttga agtgcagttt 9420
atctatcttt atacatatat ttaaacttta ctctacgaat aatataatct atagtactac 9480
aataatatca gtgttttaga gaatcatata aatgaacagt tagacatggt ctaaaggaca 9540
attgagtatt ttgacaacag gactctacag ttttatcttt ttagtgtgca tgtgttctcc 9600
tttttttttg caaatagctt cacctatata atacttcatc cattttatta gtacatccat 9660
ttagggttta gggttaatgg tttttataga ctaatttttt tagtacatct attttattct 9720
attttagcct ctaaattaag aaaactaaaa ctctatttta gtttttttat ttaataattt 9780
agatataaaa tagaataaaa taaagtgact aaaaattaaa caaataccct ttaagaaatt 9840
aaaaaaacta aggaaacatt tttcttgttt cgagtagata atgccagcct gttaaacgcc 9900
gtcgacgagt ctaacggaca ccaaccagcg aaccagcagc gtcgcgtcgg gccaagcgaa 9960
gcagacggca cggcatctct gtcgctgcct ctggacccct ctcgagagtt ccgctccacc 10020
gttggacttg ctccgctgtc ggcatccaga aattgcgtgg cggagcggca gacgtgagcc 10080
ggcacggcag gcggcctcct cctcctctca cggcaccggc agctacgggg gattcctttc 10140
ccaccgctcc ttcgctttcc cttcctcgcc cgccgtaata aatagacacc ccctccacac 10200
cctctttccc caacctcgtg ttgttcggag cgcacacaca cacaaccaga tctcccccaa 10260
atccacccgt cggcacctcc gcttcaaggt acgccgctcg tcctcccccc ccccccctct 10320
ctaccttctc tagatcggcg ttccggtcca tggttagggc ccggtagttc tacttctgtt 10380
catgtttgtg ttagatccgt gtttgtgtta gatccgtgct gctagcgttc gtacacggat 10440
gcgacctgta cgtcagacac gttctgattg ctaacttgcc agtgtttctc tttggggaat 10500
cctgggatgg ctctagccgt tccgcagacg ggatcgattt catgattttt tttgtttcgt 10560
tgcatagggt ttggtttgcc cttttccttt atttcaatat atgccgtgca cttgtttgtc 10620
gggtcatctt ttcatgcttt tttttgtctt ggttgtgatg atgtggtctg gttgggcggt 10680
cgttctagat cggagtagaa ttctgtttca aactacctgg tggatttatt aattttggat 10740
ctgtatgtgt gtgccataca tattcatagt tacgaattga agatgatgga tggaaatatc 10800
gatctaggat aggtatacat gttgatgcgg gttttactga tgcatataca gagatgcttt 10860
ttgttcgctt ggttgtgatg atgtggtgtg gttgggcggt cgttcattcg ttctagatcg 10920
gagtagaata ctgtttcaaa ctacctggtg tatttattaa ttttggaact gtatgtgtgt 10980
gtcatacatc ttcatagtta cgagtttaag atggatggaa atatcgatct aggataggta 11040
tacatgttga tgtgggtttt actgatgcat atacatgatg gcatatgcag catctattca 11100
tatgctctaa ccttgagtac ctatctatta taataaacaa gtatgtttta taattatttt 11160
gatcttgata tacttggatg atggcatatg cagcagctat atgtggattt ttttagccct 11220
gccttcatac gctatttatt tgcttggtac tgtttctttt gtcgatgctc accctgttgt 11280
ttggtgttac ttctgcagga gctcgcaacg agtggtgtgg tgcctggcaa aaagcctgaa 11340
ctcaccgcga cgtctgtcga gaagtttctg atcgaaaagt tcgacagcgt ctccgacctg 11400
atgcagctct cggagggcga agaatctcgt gctttcagct tcgatgtagg agggcgtgga 11460
tatgtcctgc gggtaaatag ctgcgccgat ggtttctaca aagatcgtta tgtttatcgg 11520
cactttgcat cggccgcgct cccgattccg gaagtgcttg acattgggga gtttagcgag 11580
agcctgacct attgcatctc ccgccgttca cagggtgtca cgttgcaaga cctgcctgaa 11640
accgaactgc ccgctgttct acaaccggtc gcggaggcta tggatgcgat cgctgcggcc 11700
gatcttagcc agacgagcgg gttcggccca ttcggaccgc aaggaatcgg tcaatacact 11760
acatggcgtg atttcatatg cgcgattgct gatccccatg tgtatcactg gcaaactgtg 11820
atggacgaca ccgtcagtgc gtccgtcgcg caggctctcg atgagctgat gctttgggcc 11880
gaggactgcc ccgaagtccg gcacctcgtg cacgcggatt tcggctccaa caatgtcctg 11940
acggacaatg gccgcataac agcggtcatt gactggagcg aggcgatgtt cggggattcc 12000
caatacgagg tcgccaacat cttcttctgg aggccgtggt tggcttgtat ggagcagcag 12060
acgcgctact tcgagcggag gcatccggag cttgcaggat cgccacgact ccgggcgtat 12120
atgctccgca ttggtcttga ccaactctat cagagcttgg ttgacggcaa tttcgatgat 12180
gcagcttggg cgcagggtcg atgcgacgca atcgtccgat ccggagccgg gactgtcggg 12240
cgtacacaaa tcgcccgcag aagcgcggcc gtctggaccg atggctgtgt agaagtactc 12300
gccgatagtg gaaaccgacg ccccagcact cgtccgaggg caaagaaata gggccagtta 12360
ggccgatcgt tcaaacattt ggcaataaag tttcttaaga ttgaatcctg ttgccggtct 12420
tgcgatgatt atcatataat ttctgttgaa ttacgttaag catgtaataa ttaacatgta 12480
atgcatgacg ttatttatga gatgggtttt tatgattaga gtcccgcaat tatacattta 12540
atacgcgata gaaaacaaaa tatagcgcgc aaactaggat aaattatcgc gcgcggtgtc 12600
atctatgtta ctagatctgt agccctgcag gacgcgttta attaagtgca cgcggccgcc 12660
tacttagtca agagcctcgc acgcgactgt cacgcggcca ggatcgcctc gtgagcctcg 12720
caatctgtac ctagtgttta aactatcagt gtttgacagg atatattggc gggtaaacct 12780
aagagaaaag agcgtttatt agaataacgg atatttaaaa gggcgtgaaa aggtttatcc 12840
gttcgtccat ttgtatgtgc atgccaacca cagggttccc ctcgggatca aagtactttg 12900
atccaacccc tccgctgcta tagtgcagtc ggcttctgac gttcagtgca gccgtcttct 12960
gaaaacgaca tgtcgcacaa gtcctaagtt acgcgacagg ctgccgccct gcccttttcc 13020
tggcgttttc ttgtcgcgtg ttttagtcgc ataaagtaga atacttgcga ctagaaccgg 13080
agacattacg ccatgaacaa gagcgccgcc gctggcctgc tgggctatgc ccgcgtcagc 13140
accgacgacc aggacttgac caaccaacgg gccgaactgc acgcggccgg ctgcaccaag 13200
ctgttttccg agaagatcac cggcaccagg cgcgaccgcc cggagctggc caggatgctt 13260
gaccacctac gccctggcga cgttgtgaca gtgaccaggc tagaccgcct ggcccgcagc 13320
acccgcgacc tactggacat tgccgagcgc atccaggagg ccggcgcggg cctgcgtagc 13380
ctggcagagc cgtgggccga caccaccacg ccggccggcc gcatggtgtt gaccgtgttc 13440
gccggcattg ccgagttcga gcgttcccta atcatcgacc gcacccggag cgggcgcgag 13500
gccgccaagg cccgaggcgt gaagtttggc ccccgcccta ccctcacccc ggcacagatc 13560
gcgcacgccc gcgagctgat cgaccaggaa ggccgcaccg tgaaagaggc ggctgcactg 13620
cttggcgtgc atcgctcgac cctgtaccgc gcacttgagc gcagcgagga agtgacgccc 13680
accgaggcca ggcggcgcgg tgccttccgt gaggacgcat tgaccgaggc cgacgccctg 13740
gcggccgccg agaatgaacg ccaagaggaa caagcatgaa accgcaccag gacggccagg 13800
acgaaccgtt tttcattacc gaagagatcg aggcggagat gatcgcggcc gggtacgtgt 13860
tcgagccgcc cgcgcacgtc tcaaccgtgc ggctgcatga aatcctggcc ggtttgtctg 13920
atgccaagct ggcggcctgg ccggccagct tggccgctga agaaaccgag cgccgccgtc 13980
taaaaaggtg atgtgtattt gagtaaaaca gcttgcgtca tgcggtcgct gcgtatatga 14040
tgcgatgagt aaataaacaa atacgcaagg ggaacgcatg aaggttatcg ctgtacttaa 14100
ccagaaaggc gggtcaggca agacgaccat cgcaacccat ctagcccgcg ccctgcaact 14160
cgccggggcc gatgttctgt tagtcgattc cgatccccag ggcagtgccc gcgattgggc 14220
ggccgtgcgg gaagatcaac cgctaaccgt tgtcggcatc gaccgcccga cgattgaccg 14280
cgacgtgaag gccatcggcc ggcgcgactt cgtagtgatc gacggagcgc cccaggcggc 14340
ggacttggct gtgtccgcga tcaaggcagc cgacttcgtg ctgattccgg tgcagccaag 14400
cccttacgac atatgggcca ccgccgacct ggtggagctg gttaagcagc gcattgaggt 14460
cacggatgga aggctacaag cggcctttgt cgtgtcgcgg gcgatcaaag gcacgcgcat 14520
cggcggtgag gttgccgagg cgctggccgg gtacgagctg cccattcttg agtcccgtat 14580
cacgcagcgc gtgagctacc caggcactgc cgccgccggc acaaccgttc ttgaatcaga 14640
acccgagggc gacgctgccc gcgaggtcca ggcgctggcc gctgaaatta aatcaaaact 14700
catttgagtt aatgaggtaa agagaaaatg agcaaaagca caaacacgct aagtgccggc 14760
cgtccgagcg cacgcagcag caaggctgca acgttggcca gcctggcaga cacgccagcc 14820
atgaagcggg tcaactttca gttgccggcg gaggatcaca ccaagctgaa gatgtacgcg 14880
gtacgccaag gcaagaccat taccgagctg ctatctgaat acatcgcgca gctaccagag 14940
taaatgagca aatgaataaa tgagtagatg aattttagcg gctaaaggag gcggcatgga 15000
aaatcaagaa caaccaggca ccgacgccgt ggaatgcccc atgtgtggag gaacgggcgg 15060
ttggccaggc gtaagcggct gggttgtctg ccggccctgc aatggcactg gaacccccaa 15120
gcccgaggaa tcggcgtgac ggtcgcaaac catccggccc ggtacaaatc ggcgcggcgc 15180
tgggtgatga cctggtggag aagttgaagg ccgcgcaggc cgcccagcgg caacgcatcg 15240
aggcagaagc acgccccggt gaatcgtggc aagcggccgc tgatcgaatc cgcaaagaat 15300
cccggcaacc gccggcagcc ggtgcgccgt cgattaggaa gccgcccaag ggcgacgagc 15360
aaccagattt tttcgttccg atgctctatg acgtgggcac ccgcgatagt cgcagcatca 15420
tggacgtggc cgttttccgt ctgtcgaagc gtgaccgacg agctggcgag gtgatccgct 15480
acgagcttcc agacgggcac gtagaggttt ccgcagggcc ggccggcatg gccagtgtgt 15540
gggattacga cctggtactg atggcggttt cccatctaac cgaatccatg aaccgatacc 15600
gggaagggaa gggagacaag cccggccgcg tgttccgtcc acacgttgcg gacgtactca 15660
agttctgccg gcgagccgat ggcggaaagc agaaagacga cctggtagaa acctgcattc 15720
ggttaaacac cacgcacgtt gccatgcagc gtacgaagaa ggccaagaac ggccgcctgg 15780
tgacggtatc cgagggtgaa gccttgatta gccgctacaa gatcgtaaag agcgaaaccg 15840
ggcggccgga gtacatcgag atcgagctag ctgattggat gtaccgcgag atcacagaag 15900
gcaagaaccc ggacgtgctg acggttcacc ccgattactt tttgatcgat cccggcatcg 15960
gccgttttct ctaccgcctg gcacgccgcg ccgcaggcaa ggcagaagcc agatggttgt 16020
tcaagacgat ctacgaacgc agtggcagcg ccggagagtt caagaagttc tgtttcaccg 16080
tgcgcaagct gatcgggtca aatgacctgc cggagtacga tttgaaggag gaggcggggc 16140
aggctggccc gatcctagtc atgcgctacc gcaacctgat cgagggcgaa gcatccgccg 16200
gttcctaatg tacggagcag atgctagggc aaattgccct agcaggggaa aaaggtcgaa 16260
aaggtctctt tcctgtggat agcacgtaca ttgggaaccc aaagccgtac attgggaacc 16320
ggaacccgta cattgggaac ccaaagccgt acattgggaa ccggtcacac atgtaagtga 16380
ctgatataaa agagaaaaaa ggcgattttt ccgcctaaaa ctctttaaaa cttattaaaa 16440
ctcttaaaac ccgcctggcc tgtgcataac tgtctggcca gcgcacagcc gaagagctgc 16500
aaaaagcgcc tacccttcgg tcgctgcgct ccctacgccc cgccgcttcg cgtcggccta 16560
tcgcggccgc tggccgctca aaaatggctg gcctacggcc aggcaatcta ccagggcgcg 16620
gacaagccgc gccgtcgcca ctcgaccgcc ggcgcccaca tcaaggcacc ctgcctcgcg 16680
cgtttcggtg atgacggtga aaacctctga cacatgcagc tcccggagac ggtcacagct 16740
tgtctgtaag cggatgccgg gagcagacaa gcccgtcagg gcgcgtcagc gggtgttggc 16800
gggtgtcggg gcgcagccat gacccagtca cgtagcgata gcggagtgta tactggctta 16860
actatgcggc atcagagcag attgtactga gagtgcacca tatgcggtgt gaaataccgc 16920
acagatgcgt aaggagaaaa taccgcatca ggcgctcttc cgcttcctcg ctcactgact 16980
cgctgcgctc ggtcgttcgg ctgcggcgag cggtatcagc tcactcaaag gcggtaatac 17040
ggttatccac agaatcaggg gataacgcag gaaagaacat gtgagcaaaa ggccagcaaa 17100
aggccaggaa ccgtaaaaag gccgcgttgc tggcgttttt ccataggctc cgcccccctg 17160
acgagcatca caaaaatcga cgctcaagtc agaggtggcg aaacccgaca ggactataaa 17220
gataccaggc gtttccccct ggaagctccc tcgtgcgctc tcctgttccg accctgccgc 17280
ttaccggata cctgtccgcc tttctccctt cgggaagcgt ggcgctttct catagctcac 17340
gctgtaggta tctcagttcg gtgtaggtcg ttcgctccaa gctgggctgt gtgcacgaac 17400
cccccgttca gcccgaccgc tgcgccttat ccggtaacta tcgtcttgag tccaacccgg 17460
taagacacga cttatcgcca ctggcagcag ccactggtaa caggattagc agagcgaggt 17520
atgtaggcgg tgctacagag ttcttgaagt ggtggcctaa ctacggctac actagaagga 17580
cagtatttgg tatctgcgct ctgctgaagc cagttacctt cggaaaaaga gttggtagct 17640
cttgatccgg caaacaaacc accgctggta gcggtggttt ttttgtttgc aagcagcaga 17700
ttacgcgcag aaaaaaagga tctcaagaag atcctttgat cttttctacg gggtctgacg 17760
ctcagtggaa cgaaaactca cgttaaggga ttttggtcat gcattctagg tactaaaaca 17820
attcatccag taaaatataa tattttattt tctcccaatc aggcttgatc cccagtaagt 17880
caaaaaatag ctcgacatac tgttcttccc cgatatcctc cctgatcgac cggacgcaga 17940
aggcaatgtc ataccacttg tccgccctgc cgcttctccc aagatcaata aagccactta 18000
ctttgccatc tttcacaaag atgttgctgt ctcccaggtc gccgtgggaa aagacaagtt 18060
cctcttcggg cttttccgtc tttaaaaaat catacagctc gcgcggatct ttaaatggag 18120
tgtcttcttc ccagttttcg caatccacat cggccagatc gttattcagt aagtaatcca 18180
attcggctaa gcggctgtct aagctattcg tatagggaca atccgatatg tcgatggagt 18240
gaaagagcct gatgcactcc gcatacagct cgataatctt ttcagggctt tgttcatctt 18300
catactcttc cgagcaaagg acgccatcgg cctcactcat gagcagattg ctccagccat 18360
catgccgttc aaagtgcagg acctttggaa caggcagctt tccttccagc catagcatca 18420
tgtccttttc ccgttccaca tcataggtgg tccctttata ccggctgtcc gtcattttta 18480
aatataggtt ttcattttct cccaccagct tatatacctt agcaggagac attccttccg 18540
tatcttttac gcagcggtat ttttcgatca gttttttcaa ttccggtgat attctcattt 18600
tagccattta ttatttcctt cctcttttct acagtattta aagatacccc aagaagctaa 18660
ttataacaag acgaactcca attcactgtt ccttgcattc taaaacctta aataccagaa 18720
aacagctttt tcaaagttgt tttcaaagtt ggcgtataac atagtatcga cggagccgat 18780
tttgaaaccg cggtgatcac aggcagcaac gctctgtcat cgttacaatc aacatgctac 18840
cctccgcgag atcatccgtg tttcaaaccc ggcagcttag ttgccgttct tccgaatagc 18900
atcggtaaca tgagcaaagt ctgccgcctt acaacggctc tcccgctgac gccgtcccgg 18960
actgatgggc tgcctgtatc gagtggtgat tttgtgccga gctgccggtc ggggagctgt 19020
tggctggct 19029
<210>2
<211>1423
<212>PRT
<213> Artificial Sequence
<400>2
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Asp Lys Lys Tyr Ser Ile Gly Leu
35 40 45
Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr
50 55 60
Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His
65 70 75 80
Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu
85 90 95
Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr
100 105 110
Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu
115 120 125
Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe
130 135 140
Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn
145 150 155 160
Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His
165 170 175
Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu
180 185 190
Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu
195 200 205
Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe
210 215 220
Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile
225 230 235 240
Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser
245 250 255
Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys
260 265 270
Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr
275 280 285
Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln
290 295 300
Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln
305 310 315 320
Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser
325 330 335
Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr
340 345 350
Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His
355 360 365
Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu
370 375 380
Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly
385 390 395 400
Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys
405 410 415
Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu
420 425 430
Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser
435 440 445
Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg
450 455 460
Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu
465 470 475 480
Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg
485 490 495
Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile
500 505 510
Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln
515 520 525
Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu
530 535 540
Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr
545 550 555 560
Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro
565 570 575
Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe
580 585 590
Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe
595 600 605
Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp
610 615 620
Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile
625 630 635 640
Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu
645 650 655
Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu
660 665 670
Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys
675 680 685
Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys
690 695 700
Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp
705 710 715 720
Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile
725 730 735
His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val
740 745 750
Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly
755 760 765
Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp
770 775 780
Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile
785 790 795 800
Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser
805 810 815
Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser
820 825 830
Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu
835 840 845
Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp
850 855 860
Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile
865 870 875 880
Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu
885 890 895
Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu
900 905 910
Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala
915 920 925
Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg
930 935 940
Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu
945 950 955 960
Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser
965 970 975
Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val
980 985 990
Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp
995 1000 1005
Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala
1010 1015 1020
His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1025 1030 1035
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys
1040 1045 1050
Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile
1055 1060 1065
Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn
1070 1075 1080
Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys
1085 1090 1095
Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp
1100 1105 1110
Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met
1115 1120 1125
Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly
1130 1135 1140
Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu
1145 1150 1155
Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe
1160 1165 1170
Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val
1175 1180 1185
Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu
1190 1195 1200
Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile
1205 1210 1215
Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu
1220 1225 1230
Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly
1235 1240 1245
Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn
1250 1255 1260
Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1265 1270 1275
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln
1280 1285 1290
Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile
1295 1300 1305
Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp
1310 1315 1320
Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp
1325 1330 1335
Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr
1340 1345 1350
Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr
1355 1360 1365
Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp
1370 1375 1380
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg
1385 1390 1395
Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr
1400 1405 1410
Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1415 1420
<210>3
<211>208
<212>PRT
<213> Artificial Sequence
<400>3
Met Thr Asp Ala Glu Tyr Val Arg Ile His Glu Lys Leu Asp Ile Tyr
1 5 10 15
Thr Phe Lys Lys Gln Phe Phe Asn Asn Lys Lys Ser Val Ser His Arg
20 25 30
Cys Tyr Val Leu Phe Glu Leu Lys Arg Arg Gly Glu Arg Arg Ala Cys
35 40 45
Phe Trp Gly Tyr Ala Val Asn Lys Pro Gln Ser Gly Thr Glu Arg Gly
50 55 60
Ile His Ala Glu Ile Phe Ser Ile Arg Lys Val Glu Glu Tyr Leu Arg
65 70 75 80
Asp Asn Pro Gly Gln Phe Thr Ile Asn Trp Tyr Ser Ser Trp Ser Pro
85 90 95
Cys Ala Asp Cys Ala Glu Lys Ile Leu Glu Trp Tyr Asn Gln Glu Leu
100 105 110
Arg Gly Asn Gly His Thr Leu Lys Ile Trp Ala Cys Lys Leu Tyr Tyr
115 120 125
Glu Lys Asn Ala Arg Asn Gln Ile Gly Leu Trp Asn Leu Arg Asp Asn
130 135 140
Gly Val Gly Leu Asn Val Met Val Ser Glu His Tyr Gln Cys Cys Arg
145 150 155 160
Lys Ile Phe Ile Gln Ser Ser His Asn Gln Leu Asn Glu Asn Arg Trp
165 170 175
Leu Glu Lys Thr Leu Lys Arg Ala Glu Lys Trp Arg Ser Glu Leu Ser
180 185 190
Ile Met Ile Gln Val Lys Ile Leu His Thr Thr Lys Ser Pro Ala Val
195 200 205
<210>4
<211>98
<212>PRT
<213> Artificial Sequence (Artificial Sequence)
<400>4
Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly
1 5 10 15
Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val
20 25 30
Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr
35 40 45
Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp
50 55 60
Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly
65 70 75 80
Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg
85 90 95
Lys Val
<210>5
<211>1026
<212>DNA
<213> Artificial Sequence
<400>5
atgaaaaagc ctgaactcac cgcgacgtct gtcgagaagt ttctgatcga aaagttcgac 60
agcgtctccg acctgatgca gctctcggag ggcgaagaat ctcgtgcttt cagcttcgat 120
gtaggagggc gtggatatgt cctgcgggta aatagctgcg ccgatggttt ctacaaagat 180
cgttatgttt atcggcactt tgcatcggcc gcgctcccga ttccggaagt gcttgacatt 240
ggggagttta gcgagagcct gacctattgc atctcccgcc gttcacaggg tgtcacgttg 300
caagacctgc ctgaaaccga actgcccgct gttctacaac cggtcgcgga ggctatggat 360
gcgatcgctg cggccgatct tagccagacg agcgggttcg gcccattcgg accgcaagga 420
atcggtcaat acactacatg gcgtgatttc atatgcgcga ttgctgatcc ccatgtgtat 480
cactggcaaa ctgtgatgga cgacaccgtc agtgcgtccg tcgcgcaggc tctcgatgag 540
ctgatgcttt gggccgagga ctgccccgaa gtccggcacc tcgtgcacgc ggatttcggc 600
tccaacaatg tcctgacgga caatggccgc ataacagcgg tcattgactg gagcgaggcg 660
atgttcgggg attcccaata cgaggtcgcc aacatcttct tctggaggcc gtggttggct 720
tgtatggagc agcagacgcg ctacttcgag cggaggcatc cggagcttgc aggatcgcca 780
cgactccggg cgtatatgct ccgcattggt cttgaccaac tctatcagag cttggttgac 840
ggcaatttcg atgatgcagc ttgggcgcag ggtcgatgcg acgcaatcgt ccgatccgga 900
gccgggactg tcgggcgtac acaaatcgcc cgcagaagcg cggccgtctg gaccgatggc 960
tgtgtagaag tactcgccga tagtggaaac cgacgcccca gcactcgtcc gagggcaaag 1020
aaatag 1026
<210>6
<211>4269
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>6
atggactaca aggaccacga cggggattac aaagaccacg acatagacta caaggatgac 60
gatgacaaaa tggcaccgaa gaaaaaaagg aaggtcggaa tccatggcgt tccagctgcc 120
gataagaaat attccatcgg actcgccatt ggcacgaata gcgtcggatg ggctgttatt 180
actgatgagt acaaagttcc gtctaagaag ttcaaggtgc tgggcaacac agaccgccac 240
agcataaaga aaaatctcat cggtgcactc cttttcgata gtggggagac tgcagaagcg 300
acaagattga aaaggactgc gagaaggcgc tatacacggc gtaagaatag aatctgctac 360
cttcaggaga ttttctctaa cgaaatggct aaggtcgatg acagtttctt tcatagactt 420
gaggaatcgt tcttggttga ggaggataag aaacatgaga ggcacccgat atttggaaac 480
atcgtggatg aggtcgcata tcatgaaaag taccccacaa tctaccacct gagaaagaaa 540
ctcgttgatt ccaccgacaa agcggatttg agactcatct acctcgctct tgcccatatg 600
ataaagttcc gcggacactt tctgatcgag ggcgacctca accctgataa tagcgacgtc 660
gataagctct tcatccagtt ggttcaaacc tacaatcagc tctttgagga aaacccaatt 720
aatgctagtg gagtggatgc aaaagcgata ctgtcggcca gactctccaa gagcagaagg 780
ttggagaacc tgatcgctca acttcctgga gaaaagaaaa acggtctttt tgggaatttg 840
attgccttgt ctctgggcct cacaccaaac ttcaagtcaa attttgacct cgctgaggat 900
gccaaacttc agttgtctaa ggatacctat gatgacgatc ttgacaattt gctggcacaa 960
attggcgacc agtacgcgga tctgttcctc gcagcgaaga atctgagtga tgctattctc 1020
ctttcggaca tactcagggt taacactgag atcacaaaag cacctttgag tgcgtcgatg 1080
attaagcgct atgatgaaca tcaccaagac ctcactttgc tgaaggccct tgtgcggcag 1140
caattgccag agaagtacaa agaaatcttc tttgaccaat ctaagaacgg atacgctggc 1200
tatattgatg gaggagcttc tcaggaggaa ttctataagt ttatcaaacc tatacttgag 1260
aagatggatg gtacagagga actccttgtt aaattgaaca gagaagattt gctgcgcaag 1320
caacggacct ttgacaacgg atcaattccg catcagatac acctcggcga gcttcatgcc 1380
atccttcgcc ggcaggaaga tttctacccc tttttgaagg acaaccgcga gaagatagaa 1440
aaaatcctta cgttccggat tccttactat gtgggtccat tggcaagggg gaattcccgc 1500
tttgcgtgga tgactcggaa aagcgaggaa actatcacac cgtggaactt cgaggaagtt 1560
gtggacaagg gagcttctgc ccaatcattc attgagagga tgactaactt cgataagaac 1620
ctgccgaacg agaaagttct ccccaagcac tccctccttt acgagtattt caccgtgtat 1680
aacgaactta cgaaggttaa atacgtgact gagggtatga ggaagccagc attcttgagc 1740
ggggaacaaa agaaagcgat tgttgatttg ctgtttaaaa ctaatcgcaa ggtgacagtc 1800
aagcagctca aagaggatta tttcaagaaa attgaatgtt tcgactctgt ggagatatca 1860
ggagtcgaag ataggtttaa cgcttccctt ggcacatacc atgacctcct taagatcatt 1920
aaggacaaag atttcctgga taacgaggaa aatgaggaca tcctcgaaga tattgttctt 1980
accttgacgc tgtttgagga tcgcgaaatg atcgaggaac ggcttaagac gtatgctcac 2040
ttgttcgacg ataaggttat gaagcagctc aagcgtagaa ggtacactgg atggggccgt 2100
ctgtctagaa agctcatcaa cggaatacgt gataaacaaa gtggcaagac aattttggat 2160
tttctgaagt cggacggatt cgccaacaga gcttttgcgg cactgattgc tgacgatagt 2220
ctcaccttca aagaggacat acagaaggct caagtgagtg gtcaagggga ttcgctgcat 2280
gaacacatcg caaacctcgc gggttcaccg gccataaaga aaggaatcct tcaaactgtt 2340
aaggtcgttg atgagttggt taaagtgatg ggtaggcaca agcccgaaaa catagtgatc 2400
gagatggctc gcgaaaatca gactacacaa aaagggcaga agaactctcg cgagcggatg 2460
aaaaggattg aggaaggaat caaggaactg ggctcacaga ttctcaaaga gcatccagtc 2520
gaaaacacac agctgcaaaa tgagaagctc tatctttact atctccaaaa tggccgggac 2580
atgtatgttg atcaggagct tgacatcaac cgtttgtccg actatgatgt ggaccacatt 2640
gtcccgcaat ctttccttaa ggacgattca atcgataata aggtgttgac ccggagcgat 2700
aaaaaccgtg gaaagtctga caatgtccct tcagaggaag tggttaagaa gatgaagaac 2760
tactggagac aattgctgaa tgcaaaactg atcacacaga gaaagttcga caacctcacc 2820
aaagcagaga gaggtgggct cagtgaactt gataaagcgg gcttcattaa gcgtcagctc 2880
gttgagacta gacagatcac gaagcatgtc gcgcagattt tggattcgcg gatgaacacg 2940
aagtacgacg agaatgataa actgatacgt gaagtcaagg ttatcactct taagtccaaa 3000
ttggtgagcg atttcagaaa ggacttccaa ttctataagg tcagggagat caacaattat 3060
catcacgctc acgatgccta ccttaatgct gttgtgggga ccgcccttat taagaaatac 3120
cctaaattgg agtctgaatt cgtttacggg gattataagg tctacgacgt taggaaaatg 3180
atagctaaga gtgagcagga gatcggtaaa gcaactgcga agtatttctt ttactcgaac 3240
atcatgaatt tctttaagac cgagataacg ctggcaaatg gcgaaattag aaagaggcct 3300
ctcatagaga ctaacggtga gacaggggaa atcgtctggg ataagggtag ggactttgcg 3360
acagtgcgca aggtcctctc tatgccgcaa gttaatattg tgaagaaaac cgaggtgcag 3420
acgggaggct tctccaagga aagcatactt cccaaacgga actctgataa gttgatcgct 3480
cgtaagaaag attgggaccc taagaaatat ggtgggttcg attccccaac tgttgcttac 3540
agcgtgctgg tcgttgccaa ggtcgagaag ggtaaatcca agaaactcaa aagcgttaag 3600
gaactccttg ggattactat catggagaga tcttcattcg aaaagaatcc tatcgacttt 3660
cttgaggcca aaggatataa ggaagttaag aaagatctga taatcaaact cccaaagtac 3720
tcattgtttg agctggaaaa cggcaggaag cgcatgcttg cttccgccgg agagttgcag 3780
aaagggaacg agttggctct gccttctaag tatgttaact tcctctatct tgcctctcat 3840
tacgagaagc tcaaaggctc accagaggac aacgaacaga aacaactttt tgtcgagcaa 3900
cataagcact atttggatga gattatagaa cagatcagtg aattctcgaa aagggttatc 3960
cttgcagatg cgaatcttga caaggtgttg tctgcataca acaaacatag agataagccg 4020
atcagggagc aagcggaaaa tatcattcac ctcttcactc ttacaaactt gggtgctccc 4080
gctgccttca agtattttga taccacgatt gaccggaaac gttacacctc aacgaaggag 4140
gtgctggatg ccaccctcat ccaccaatct attaccggac tctacgagac tagaatcgat 4200
ctctcacagc tcggcgggga taaaagacca gcagcgacga aaaaggcagg acaggctaag 4260
aagaagaaa 4269
<210>7
<211>1423
<212>PRT
<213> Artificial Sequence (Artificial Sequence)
<400>7
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Asp Lys Lys Tyr Ser Ile Gly Leu
35 40 45
Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr
50 55 60
Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His
65 70 75 80
Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu
85 90 95
Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr
100 105 110
Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu
115 120 125
Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe
130 135 140
Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn
145 150 155 160
Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His
165 170 175
Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu
180 185 190
Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu
195 200 205
Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe
210 215 220
Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile
225 230 235 240
Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser
245 250 255
Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys
260 265 270
Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr
275 280 285
Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln
290 295 300
Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln
305 310 315 320
Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser
325 330 335
Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr
340 345 350
Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His
355 360 365
Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu
370 375 380
Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly
385 390 395 400
Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys
405 410 415
Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu
420 425 430
Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser
435 440 445
Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg
450 455 460
Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu
465 470 475 480
Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg
485 490 495
Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile
500 505 510
Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln
515 520 525
Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu
530 535 540
Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr
545 550 555 560
Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro
565 570 575
Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe
580 585 590
Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe
595 600 605
Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp
610 615 620
Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile
625 630 635 640
Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu
645 650 655
Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu
660 665 670
Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys
675 680 685
Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys
690 695 700
Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp
705 710 715 720
Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Ala Phe Ala Ala Leu Ile
725 730 735
Ala Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val
740 745 750
Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly
755 760 765
Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp
770 775 780
Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile
785 790 795 800
Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser
805 810 815
Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser
820 825 830
Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu
835 840 845
Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp
850 855 860
Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile
865 870 875 880
Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu
885 890 895
Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu
900 905 910
Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala
915 920 925
Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg
930 935 940
Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu
945 950 955 960
Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser
965 970 975
Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val
980 985 990
Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp
995 1000 1005
Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala
1010 1015 1020
His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1025 1030 1035
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys
1040 1045 1050
Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile
1055 1060 1065
Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn
1070 1075 1080
Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys
1085 1090 1095
Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp
1100 1105 1110
Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met
1115 1120 1125
Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly
1130 1135 1140
Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu
1145 1150 1155
Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe
1160 1165 1170
Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val
1175 1180 1185
Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu
1190 1195 1200
Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile
1205 1210 1215
Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu
1220 1225 1230
Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly
1235 1240 1245
Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn
1250 1255 1260
Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1265 1270 1275
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln
1280 1285 1290
Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile
1295 1300 1305
Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp
1310 1315 1320
Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp
1325 1330 1335
Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr
1340 1345 1350
Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr
1355 1360 1365
Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp
1370 1375 1380
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg
1385 1390 1395
Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr
1400 1405 1410
Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1415 1420
<210>8
<211>1153
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>8
gaagcaactt aaagttatca ggcatgcatg gatcttggag gaatcagatg tgcagtcagg 60
gaccatagca caagacaggc gtcttctact ggtgctacca gcaaatgctg gaagccggga 120
acactgggta cgttggaaac cacgtgatgt gaagaagtaa gataaactgt aggagaaaag 180
catttcgtag tgggccatga agcctttcag gacatgtatt gcagtatggg ccggcccatt 240
acgcaattgg acgacaacaa agactagtat tagtaccacc tcggctatcc acatagatca 300
aagctgattt aaaagagttg tgcagatgat ccgtggcgga tccaacaaag caccagtggt 360
ctagtggtag aatagtaccc tgccacggta cagacccggg ttcgattccc ggctggtgca 420
cgcccccacc cggcctcgag gttttagagc tagaaatagc aagttaaaat aaggctagtc 480
cgttatcaac ttgaaaaagt ggcaccgagt cggtgcaaca aagcaccagt ggtctagtgg 540
tagaatagta ccctgccacg gtacagaccc gggttcgatt cccggctggt gcaccctcaa 600
cagtatcatc aatgttttag agctagaaat agcaagttaa aataaggcta gtccgttatc 660
aacttgaaaa agtggcaccg agtcggtgca acaaagcacc agtggtctag tggtagaata 720
gtaccctgcc acggtacaga cccgggttcg attcccggct ggtgcagcaa cgagtggtgt 780
ggtgccgttt tagagctaga aatagcaagt taaaataagg ctagtccgtt atcaacttga 840
aaaagtggca ccgagtcggt gctttttttt ttcgttttgc attgagtttt ctccgtcgca 900
tgtttgcagt tttattttcc gttttgcatt gaaatttctc cgtctcatgt ttgcagcgtg 960
ttcaaaaagt acgcagctgt atttcactta tttacggcgc cacattttca tgccgtttgt 1020
gccaactatc ccgagctagt gaatacagct tggcttcaca caacactggt gacccgctga 1080
cctgctcgta cctcgtaccg tcgtacggca cagcatttgg aattaaaggg tgtgatcgat 1140
actgcttgct gct 1153
<210>9
<211>5004
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>9
atgtccagcg agacaggacc agtggcagtc gacccaacac tgcgcaggcg gatcgagcca 60
cacgagttcg aggtgttctt cgatccgagg gagctccgga aggagacatg cctcctgtac 120
gagatcaact ggggcggccg ccactctatc tggaggcata cctcacagaa cacaaataag 180
catgtggagg tcaacttcat cgagaagttc accacagagc ggtacttctg cccgaatacg 240
cgctgctcca tcacctggtt cctgtcgtgg tccccatgcg gagagtgctc gagggcaatc 300
acggagttcc tctcccgcta cccgcacgtc accctgttca tctacatcgc acggctctac 360
caccatgcgg acccgcggaa taggcagggc ctccgcgatc tgatctcttc aggcgtgaca 420
atccagatca tgacggagca ggagtcaggc tactgctgga ggaacttcgt caattacagc 480
ccatctaacg aggcacactg gccgcgctac ccgcatctct gggtgcgcct ctacgtgctc 540
gagctgtact gcatcatcct cggcctgccg ccatgcctca atatcctgcg caggaagcag 600
ccgcagctga cgttcttcac catcgccctc cagagctgcc actaccagcg gctccctccg 660
catatcctgt gggcgacagg cctcaagtca ggctcggaga cacctggcac gtccgagagc 720
gccaccccgg agtctatgga ctacaaggac cacgacgggg attacaaaga ccacgacata 780
gactacaagg atgacgatga caaaatggca ccgaagaaaa aaaggaaggt cggaatccat 840
ggcgttccag ctgccgataa gaaatattcc atcggactcg ccattggcac gaatagcgtc 900
ggatgggctg ttattactga tgagtacaaa gttccgtcta agaagttcaa ggtgctgggc 960
aacacagacc gccacagcat aaagaaaaat ctcatcggtg cactcctttt cgatagtggg 1020
gagactgcag aagcgacaag attgaaaagg actgcgagaa ggcgctatac acggcgtaag 1080
aatagaatct gctaccttca ggagattttc tctaacgaaa tggctaaggt cgatgacagt 1140
ttctttcata gacttgagga atcgttcttg gttgaggagg ataagaaaca tgagaggcac 1200
ccgatatttg gaaacatcgt ggatgaggtc gcatatcatg aaaagtaccc cacaatctac 1260
cacctgagaa agaaactcgt tgattccacc gacaaagcgg atttgagact catctacctc 1320
gctcttgccc atatgataaa gttccgcgga cactttctga tcgagggcga cctcaaccct 1380
gataatagcg acgtcgataa gctcttcatc cagttggttc aaacctacaa tcagctcttt 1440
gaggaaaacc caattaatgc tagtggagtg gatgcaaaag cgatactgtc ggccagactc 1500
tccaagagca gaaggttgga gaacctgatc gctcaacttc ctggagaaaa gaaaaacggt 1560
ctttttggga atttgattgc cttgtctctg ggcctcacac caaacttcaa gtcaaatttt 1620
gacctcgctg aggatgccaa acttcagttg tctaaggata cctatgatga cgatcttgac 1680
aatttgctgg cacaaattgg cgaccagtac gcggatctgt tcctcgcagc gaagaatctg 1740
agtgatgcta ttctcctttc ggacatactc agggttaaca ctgagatcac aaaagcacct 1800
ttgagtgcgt cgatgattaa gcgctatgat gaacatcacc aagacctcac tttgctgaag 1860
gcccttgtgc ggcagcaatt gccagagaag tacaaagaaa tcttctttga ccaatctaag 1920
aacggatacg ctggctatat tgatggagga gcttctcagg aggaattcta taagtttatc 1980
aaacctatac ttgagaagat ggatggtaca gaggaactcc ttgttaaatt gaacagagaa 2040
gatttgctgc gcaagcaacg gacctttgac aacggatcaa ttccgcatca gatacacctc 2100
ggcgagcttc atgccatcct tcgccggcag gaagatttct accccttttt gaaggacaac 2160
cgcgagaaga tagaaaaaat ccttacgttc cggattcctt actatgtggg tccattggca 2220
agggggaatt cccgctttgc gtggatgact cggaaaagcg aggaaactat cacaccgtgg 2280
aacttcgagg aagttgtgga caagggagct tctgcccaat cattcattga gaggatgact 2340
aacttcgata agaacctgcc gaacgagaaa gttctcccca agcactccct cctttacgag 2400
tatttcaccg tgtataacga acttacgaag gttaaatacg tgactgaggg tatgaggaag 2460
ccagcattct tgagcgggga acaaaagaaa gcgattgttg atttgctgtt taaaactaat 2520
cgcaaggtga cagtcaagca gctcaaagag gattatttca agaaaattga atgtttcgac 2580
tctgtggaga tatcaggagt cgaagatagg tttaacgctt cccttggcac ataccatgac 2640
ctccttaaga tcattaagga caaagatttc ctggataacg aggaaaatga ggacatcctc 2700
gaagatattg ttcttacctt gacgctgttt gaggatcgcg aaatgatcga ggaacggctt 2760
aagacgtatg ctcacttgtt cgacgataag gttatgaagc agctcaagcg tagaaggtac 2820
actggatggg gccgtctgtc tagaaagctc atcaacggaa tacgtgataa acaaagtggc 2880
aagacaattt tggattttct gaagtcggac ggattcgcca acagaaattt tatgcagctg 2940
attcatgacg atagtctcac cttcaaagag gacatacaga aggctcaagt gagtggtcaa 3000
ggggattcgc tgcatgaaca catcgcaaac ctcgcgggtt caccggccat aaagaaagga 3060
atccttcaaa ctgttaaggt cgttgatgag ttggttaaag tgatgggtag gcacaagccc 3120
gaaaacatag tgatcgagat ggctcgcgaa aatcagacta cacaaaaagg gcagaagaac 3180
tctcgcgagc ggatgaaaag gattgaggaa ggaatcaagg aactgggctc acagattctc 3240
aaagagcatc cagtcgaaaa cacacagctg caaaatgaga agctctatct ttactatctc 3300
caaaatggcc gggacatgta tgttgatcag gagcttgaca tcaaccgttt gtccgactat 3360
gatgtggacc acattgtccc gcaatctttc cttaaggacg attcaatcga taataaggtg 3420
ttgacccgga gcgataaaaa ccgtggaaag tctgacaatg tcccttcaga ggaagtggtt 3480
aagaagatga agaactactg gagacaattg ctgaatgcaa aactgatcac acagagaaag 3540
ttcgacaacc tcaccaaagc agagagaggt gggctcagtg aacttgataa agcgggcttc 3600
attaagcgtc agctcgttga gactagacag atcacgaagc atgtcgcgca gattttggat 3660
tcgcggatga acacgaagta cgacgagaat gataaactga tacgtgaagt caaggttatc 3720
actcttaagt ccaaattggt gagcgatttc agaaaggact tccaattcta taaggtcagg 3780
gagatcaaca attatcatca cgctcacgat gcctacctta atgctgttgt ggggaccgcc 3840
cttattaaga aataccctaa attggagtct gaattcgttt acggggatta taaggtctac 3900
gacgttagga aaatgatagc taagagtgag caggagatcg gtaaagcaac tgcgaagtat 3960
ttcttttact cgaacatcat gaatttcttt aagaccgaga taacgctggc aaatggcgaa 4020
attagaaaga ggcctctcat agagactaac ggtgagacag gggaaatcgt ctgggataag 4080
ggtagggact ttgcgacagt gcgcaaggtc ctctctatgc cgcaagttaa tattgtgaag 4140
aaaaccgagg tgcagacggg aggcttctcc aaggaaagca tacttcccaa acggaactct 4200
gataagttga tcgctcgtaa gaaagattgg gaccctaaga aatatggtgg gttcgattcc 4260
ccaactgttg cttacagcgt gctggtcgtt gccaaggtcg agaagggtaa atccaagaaa 4320
ctcaaaagcg ttaaggaact ccttgggatt actatcatgg agagatcttc attcgaaaag 4380
aatcctatcg actttcttga ggccaaagga tataaggaag ttaagaaaga tctgataatc 4440
aaactcccaa agtactcatt gtttgagctg gaaaacggca ggaagcgcat gcttgcttcc 4500
gccggagagt tgcagaaagg gaacgagttg gctctgcctt ctaagtatgt taacttcctc 4560
tatcttgcct ctcattacga gaagctcaaa ggctcaccag aggacaacga acagaaacaa 4620
ctttttgtcg agcaacataa gcactatttg gatgagatta tagaacagat cagtgaattc 4680
tcgaaaaggg ttatccttgc agatgcgaat cttgacaagg tgttgtctgc atacaacaaa 4740
catagagata agccgatcag ggagcaagcg gaaaatatca ttcacctctt cactcttaca 4800
aacttgggtg ctcccgctgc cttcaagtat tttgatacca cgattgaccg gaaacgttac 4860
acctcaacga aggaggtgct ggatgccacc ctcatccacc aatctattac cggactctac 4920
gagactagaa tcgatctctc acagctcggc ggggataaaa gaccagcagc gacgaaaaag 4980
gcaggacagg ctaagaagaa gaaa 5004
<210>10
<211>23
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>10
cggcgacggc gagcaagtgg tgg 23
<210>11
<211>86
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>11
gtttcagagc tatgctggaa acagcatagc aagttgaaat aaggctagtc cgttatcaac 60
ttgaaaaagt ggcaccgagt cggtgc 86
<210>12
<211>229
<212>PRT
<213> Artificial Sequence (Artificial Sequence)
<400>12
Met Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg
1 5 10 15
Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu
20 25 30
Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
35 40 45
Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val
50 55 60
Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr
65 70 75 80
Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys
85 90 95
Ser Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro His Val Thr Leu
100 105 110
Phe Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg
115 120 125
Gln Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile Gln Ile Met
130 135 140
Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn Tyr Ser
145 150 155 160
Pro Ser Asn Glu Ala His Trp Pro Arg Tyr Pro His Leu Trp Val Arg
165 170 175
Leu Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly Leu Pro Pro Cys
180 185 190
Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile
195 200 205
Ala Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp
210 215 220
Ala Thr Gly Leu Lys
225

Claims (12)

1. A kit comprising sgRNA or a biological material related to the sgRNA, a C.T base substitution system, and a loss-of-function screening agent resistance gene or a biological material related to the loss-of-function screening agent resistance gene;
the sgRNA consists of an sgRNA targeting a target sequence of a target gene and an sgRNA targeting a target sequence of the loss-of-function screening agent resistance gene;
the structure of each sgRNA is: RNA transcribed from the target sequence, followed by the sgRNA backbone;
the C.T base substitution system comprises a Cas9 nuclease or a biological material related to the Cas9 nuclease, and a cytosine deaminase or a biological material related to the cytosine deaminase;
under the guidance of the sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the C.T base substitution system can restore the function of the loss-of-function screening agent resistance gene by carrying out a C.T base substitution in that target sequence;
the sgRNA backbone is the RNA molecule obtained by replacing T with U in positions 571-646 of sequence 1;
the biological material related to the sgRNA is a nucleic acid molecule encoding the sgRNA, or an expression cassette, a recombinant vector, a recombinant microorganism or a transgenic cell line containing the nucleic acid molecule;
the biological material related to the loss-of-function screening agent resistance gene is a nucleic acid molecule encoding the loss-of-function screening agent resistance gene, or an expression cassette, a recombinant vector, a recombinant microorganism or a transgenic cell line containing the nucleic acid molecule;
the biological material related to the Cas9 nuclease is a nucleic acid molecule encoding the Cas9 nuclease, or an expression cassette, a recombinant vector, a recombinant microorganism or a transgenic cell line containing the nucleic acid molecule;
the biological material related to the cytosine deaminase is a nucleic acid molecule encoding the cytosine deaminase, or an expression cassette, a recombinant vector, a recombinant microorganism or a transgenic cell line containing the nucleic acid molecule;
the loss-of-function screening agent resistance gene is a sequence obtained by deleting the initiation codon of the screening agent resistance gene and adding a surrogate target sequence at the 5' end of the screening agent resistance gene; under the guidance of the sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the C.T base substitution system can restore the function of the loss-of-function screening agent resistance gene by carrying out a C.T base substitution in the surrogate target sequence;
the surrogate target sequence is positions 11305-11327 of sequence 1, or sequence 10;
the screening agent resistance gene is an exogenous resistance gene.
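As an illustration of how a surrogate target of this kind could work, the following Python sketch fuses sequence 10 to the first 60 nt of sequence 5 with its ATG removed, then converts one C to T. It rests on assumptions that the claim does not state: that the surrogate is joined directly to the second codon of the resistance gene, that the edited base is the C at position 7 of the target (turning ACG into ATG), and that restoring function means regaining an in-frame start codon. All function names are hypothetical.

```python
# Sequence 10 from the listing (23 nt; the last 3 nt, "tgg", fit an NGG PAM).
SURROGATE = "cggcgacggcgagcaagtggtgg"

# First 60 nt of sequence 5 (hygromycin resistance gene), kept in 10-nt blocks.
HYG_5PRIME = "".join(
    "atgaaaaagc ctgaactcac cgcgacgtct gtcgagaagt ttctgatcga aaagttcgac".split()
)


def build_reporter(surrogate: str, resistance_cds: str) -> str:
    """Delete the start codon and prepend the surrogate (assumed direct fusion)."""
    if not resistance_cds.startswith("atg"):
        raise ValueError("coding sequence is expected to start with atg")
    return surrogate + resistance_cds[3:]


def edit_c_to_t(seq: str, position: int) -> str:
    """Convert the C at a 1-based position to T, mimicking one C.T substitution."""
    i = position - 1
    if seq[i] != "c":
        raise ValueError(f"position {position} is not a C")
    return seq[:i] + "t" + seq[i + 1:]


def first_in_frame_atg(seq: str, frame_anchor: int):
    """Return the 0-based index of the first atg in frame with frame_anchor, else None."""
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "atg" and (frame_anchor - i) % 3 == 0:
            return i
    return None


reporter = build_reporter(SURROGATE, HYG_5PRIME)
anchor = len(SURROGATE)  # index where the second codon of the resistance gene begins

edited = edit_c_to_t(reporter, 7)  # assumed edited base: C7 of the surrogate

print("before edit:", first_in_frame_atg(reporter, anchor))
print("after edit: ", first_in_frame_atg(edited, anchor))
```

Under these assumptions the unedited fragment contains no ATG, while the edited fragment gains an ATG at positions 6-8 of the surrogate that is in frame with the rest of the resistance gene.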
2. The kit of claim 1, wherein: the screening agent resistance gene is a hygromycin resistance gene; the hygromycin resistance gene sequence is sequence 5.
3. The kit of claim 1, wherein: the sgRNA is tRNA-sgRNA;
the tRNA-sgRNA consists of a tRNA-sgRNA targeting a target sequence of a target gene and a tRNA-sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene;
the structure of each tRNA-sgRNA is: tRNA, followed by RNA transcribed from the target sequence, followed by the sgRNA backbone;
the tRNA is the RNA molecule obtained by replacing T with U in positions 474-550 of sequence 1.
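The layout described for the tRNA-sgRNA can be sketched in Python as a concatenation followed by T-to-U conversion. The three DNA segments below are copied from positions 344-516 of sequence 8 in the sequence listing. Reading them as a 77-nt tRNA, a 20-nt target-derived spacer and a 76-nt sgRNA backbone is an assumption, based on their lengths matching positions 474-550 and 571-646 of sequence 1, which is not reproduced here. The function names are hypothetical.

```python
# Segments copied from sequence 8 (kept in the listing's 10-nt blocks).
# Their roles (tRNA / spacer / backbone) are assumed, not stated in the listing.
TRNA = "".join(
    "aacaaag caccagtggt ctagtggtag aatagtaccc tgccacggta "
    "cagacccggg ttcgattccc ggctggtgca".split()
)
SPACER = "".join("cgcccccacc cggcctcgag".split())
SCAFFOLD = "".join(
    "gttttagagc tagaaatagc aagttaaaat aaggctagtc cgttatcaac "
    "ttgaaaaagt ggcaccgagt cggtgc".split()
)


def transcribe(dna: str) -> str:
    """Return the RNA form of a DNA string by replacing T with U."""
    return dna.lower().replace("t", "u")


def trna_sgrna_unit(spacer_dna: str, trna_dna: str = TRNA,
                    scaffold_dna: str = SCAFFOLD) -> str:
    """Assemble tRNA + target-derived RNA + sgRNA backbone as one RNA string."""
    return transcribe(trna_dna + spacer_dna + scaffold_dna)


unit = trna_sgrna_unit(SPACER)
print(len(TRNA), len(SPACER), len(SCAFFOLD), len(unit))
print(unit)
```

Keeping the spacer as a parameter reflects the claim's two guides, one for the target gene and one for the resistance gene, which share the same tRNA and backbone.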
4. The kit of claim 1, wherein: the C.T base substitution system further comprises UGI or biological material associated with the UGI;
the Cas9 nuclease is SpCas9n protein or HypaCas9n protein;
the cytosine deaminase is PmCDA1 protein or rAPOBEC1 protein;
the SpCas9n protein is a protein shown as a sequence 2;
the biological material related to the SpCas9n is any one of B1) to B5):
B1) a nucleic acid molecule encoding the SpCas9n;
B2) an expression cassette comprising the nucleic acid molecule of B1);
B3) a recombinant vector containing the nucleic acid molecule of B1) or a recombinant vector containing the expression cassette of B2);
B4) a recombinant microorganism containing B1) the nucleic acid molecule, or a recombinant microorganism containing B2) the expression cassette, or a recombinant microorganism containing B3) the recombinant vector;
B5) a transgenic cell line comprising B1) the nucleic acid molecule or a transgenic cell line comprising B2) the expression cassette;
the HypaCas9n protein is a protein shown as a sequence 7;
the biological material related to the HypaCas9n is any one of D1) to D5):
D1) a nucleic acid molecule encoding the HypaCas9n;
D2) an expression cassette comprising the nucleic acid molecule of D1);
D3) a recombinant vector containing the nucleic acid molecule of D1) or a recombinant vector containing the expression cassette of D2);
D4) a recombinant microorganism containing D1) the nucleic acid molecule, or a recombinant microorganism containing D2) the expression cassette, or a recombinant microorganism containing D3) the recombinant vector;
D5) a transgenic cell line comprising D1) the nucleic acid molecule or a transgenic cell line comprising the expression cassette of D2);
the PmCDA1 protein is a protein shown in a sequence 3;
the biological material related to the PmCDA1 protein is any one of F1) to F5):
F1) a nucleic acid molecule encoding the PmCDA1 protein;
F2) an expression cassette comprising the nucleic acid molecule of F1);
F3) a recombinant vector comprising the nucleic acid molecule of F1) or a recombinant vector comprising the expression cassette of F2);
F4) a recombinant microorganism containing F1) said nucleic acid molecule, or a recombinant microorganism containing F2) said expression cassette, or a recombinant microorganism containing F3) said recombinant vector;
F5) a transgenic cell line comprising the nucleic acid molecule of F1) or a transgenic cell line comprising the expression cassette of F2);
the rAPOBEC1 protein is a protein shown in a sequence 12;
the biological material related to the rAPOBEC1 protein is any one of H1) to H5):
H1) a nucleic acid molecule encoding said rAPOBEC1 protein;
H2) an expression cassette comprising the nucleic acid molecule of H1);
H3) a recombinant vector containing H1) the nucleic acid molecule or a recombinant vector containing H2) the expression cassette;
H4) a recombinant microorganism containing H1) the nucleic acid molecule, or a recombinant microorganism containing H2) the expression cassette, or a recombinant microorganism containing H3) the recombinant vector;
H5) a transgenic cell line comprising H1) the nucleic acid molecule or a transgenic cell line comprising H2) the expression cassette;
the UGI protein is a protein shown in a sequence 4;
the biological material related to the UGI protein is any one of J1) to J5):
J1) a nucleic acid molecule encoding the UGI protein;
J2) an expression cassette comprising the nucleic acid molecule of J1);
J3) a recombinant vector comprising J1) said nucleic acid molecule, or a recombinant vector comprising J2) said expression cassette;
J4) a recombinant microorganism containing J1) the nucleic acid molecule, or a recombinant microorganism containing J2) the expression cassette, or a recombinant microorganism containing J3) the recombinant vector;
J5) a transgenic cell line comprising J1) the nucleic acid molecule or a transgenic cell line comprising J2) the expression cassette.
5. The loss-of-function screening agent resistance gene of claim 1 or a biological material related to said loss-of-function screening agent resistance gene.
6. Use of the kit of any one of claims 1 to 4, or of the loss-of-function screening agent resistance gene of claim 5 or a biological material related to said loss-of-function screening agent resistance gene, in any one of M1) to M6):
M1) enriching cells in which a C.T base substitution has occurred in a genomic target sequence of an organism or an organism cell;
M2) preparing a product for enriching cells in which a C.T base substitution has occurred in a genomic target sequence of an organism or an organism cell;
M3) improving the efficiency of C.T base substitution in a genomic target sequence of an organism or an organism cell;
M4) preparing a product for improving the efficiency of C.T base substitution in a genomic target sequence of an organism or an organism cell;
M5) carrying out a C.T base substitution in a genomic target sequence of an organism or an organism cell;
M6) preparing a product for carrying out a C.T base substitution in a target sequence of an organism or an organism cell.
7. Any one of the following methods N1) to N5):
N1) A method for enriching cells in which a C.T base substitution has occurred in a genomic target sequence of an organism or an organism cell, or a method for increasing the efficiency of C.T base substitution in a genomic target sequence of an organism or an organism cell, comprising the following steps: introducing into an organism or an organism cell a gene encoding a Cas9 nuclease, a DNA molecule that transcribes an sgRNA targeting a target sequence of a target gene, a DNA molecule that transcribes an sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, a gene encoding a cytosine deaminase, a gene encoding UGI, and the loss-of-function screening agent resistance gene of any one of claims 1-4, so that the Cas9 nuclease, the sgRNAs, the cytosine deaminase and UGI are all expressed; under the guidance of the sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the Cas9 nuclease, the cytosine deaminase and UGI can restore the function of the loss-of-function screening agent resistance gene by carrying out a C.T base substitution in the target sequence of the loss-of-function screening agent resistance gene, which enriches cells in which the C.T base substitution of the screening agent resistance gene has occurred, and thereby enriches cells in which a C.T base substitution has occurred in the target sequence of the target gene in the genome of the organism or organism cell, or increases the efficiency of C.T base substitution in the target sequence of the target gene in the genome of the organism or organism cell;
N2) A method for enriching cells in which a C.T base substitution has occurred in a genomic target sequence of an organism or an organism cell, or a method for increasing the efficiency of C.T base substitution in a genomic target sequence of an organism or an organism cell, comprising the following steps: introducing into an organism or an organism cell a gene encoding a Cas9 nuclease, a DNA molecule that transcribes an sgRNA targeting a target sequence of a target gene, a DNA molecule that transcribes an sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, a gene encoding a cytosine deaminase, and the loss-of-function screening agent resistance gene of any one of claims 1-4, so that the Cas9 nuclease, the sgRNAs and the cytosine deaminase are all expressed; under the guidance of the sgRNA targeting the target sequence of the loss-of-function screening agent resistance gene, the Cas9 nuclease and the cytosine deaminase can restore the function of the loss-of-function screening agent resistance gene by carrying out a C.T base substitution in the target sequence of the loss-of-function screening agent resistance gene, which enriches cells in which the C.T base substitution of the screening agent resistance gene has occurred, and thereby enriches cells in which a C.T base substitution has occurred in the target sequence of the target gene in the genome of the organism or organism cell, or increases the efficiency of C.T base substitution in the target sequence of the target gene in the genome of the organism or organism cell;
n3) A method for enriching cells in which a C.T base substitution has occurred for a target sequence in the genome of an organism or a cell of an organism or a method for increasing the efficiency of a C.T base substitution for a target sequence in the genome of an organism or a cell of an organism, comprising the steps of: introducing the Cas9 nuclease of any of claims 1-4, sgRNA targeting a target gene sequence, sgRNA targeting the loss-of-function screener resistance gene target sequence, cytosine deaminase, UGI, and a loss-of-function screener resistance gene into an organism or biological cell; under the guidance of sgRNA of the target sequence of the screening agent resistance gene with targeted function loss, the Cas9 nuclease, the cytosine deaminase and the UGI can restore the function of the screening agent resistance gene with function loss by carrying out C.T base substitution on the target sequence of the screening agent resistance gene with function loss, thereby realizing the enrichment of cells with C.T base substitution of the screening agent resistance gene, further realizing the enrichment of cells with C.T base substitution of the target sequence of the target gene of the genome of an organism or an organism cell or improving the C.T base substitution efficiency of the target sequence of the target gene of the genome of the organism or the organism cell;
n4) A method for enriching cells in which a C.T base substitution has occurred for a target sequence in the genome of an organism or a cell of an organism or a method for increasing the efficiency of a C.T base substitution for a target sequence in the genome of an organism or a cell of an organism, comprising the steps of: introducing the Cas9 nuclease of any of claims 1-4, sgRNA targeting a target gene sequence, sgRNA targeting the loss-of-function screener resistance gene target sequence, cytosine deaminase, and a loss-of-function screener resistance gene into an organism or organism cell; under the guidance of sgRNA of the target sequence of the screening agent resistance gene with the targeted function loss, the Cas9 nuclease and the cytosine deaminase can restore the function of the screening agent resistance gene with the function loss by carrying out C.T base substitution on the target sequence of the screening agent resistance gene with the function loss, and further enrich the cells with the C.T base substitution of the screening agent resistance gene, thereby realizing the enrichment of the cells with the C.T base substitution of the target sequence of the target gene of the genome of the organism or the organism cell or improving the C.T base substitution efficiency of the target sequence of the target gene of the genome of the organism or the organism cell;
n5) biological mutant, comprising the following steps: editing the genome of the organism according to the method of N1) or N2) or N3) or N4) to obtain an organism mutant; the biological mutant is an organism in which C.T base substitution occurs.
8. The kit of any one of claims 1 to 4, the use of claim 6, or the method of claim 7, wherein: the C.T base substitution is a mutation of base C to base T.
9. The use according to claim 6 or the method according to claim 7, wherein: the organism is a plant or an animal; the cell of the organism is a plant cell or an animal cell.
10. The use or method according to claim 9, wherein: the plant is a monocotyledon or a dicotyledon; the plant cell is a monocotyledon cell or a dicotyledon cell.
11. The use or method according to claim 10, wherein: the monocotyledon is a gramineous plant; the monocotyledon cell is a gramineous plant cell.
12. The use or method according to claim 11, wherein: the gramineous plant is rice; the gramineous plant cell is a rice cell.
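The enrichment logic recited in claim 7 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the 12-nucleotide reporter sequence, the choice of a start codon changed from ATG to ACG as the inactivating mutation, the rule that resistance requires an ATG start codon, and the ten-cell population. It only shows how selecting for cells whose reporter was restored by a C-to-T substitution can raise the fraction of cells that also carry the substitution in the target gene.

```python
from dataclasses import dataclass

# Hypothetical reporter: a short coding sequence whose ATG start codon was
# changed to ACG to inactivate it (illustrative only, not from the patent).
INACTIVE_REPORTER = "ACGGCTAAATGA"


def c_to_t(seq: str, pos: int) -> str:
    """Return seq with the C at 0-based index pos replaced by T."""
    if seq[pos] != "C":
        raise ValueError(f"expected C at position {pos}, found {seq[pos]}")
    return seq[:pos] + "T" + seq[pos + 1:]


# A C-to-T substitution at index 1 turns ACG back into ATG.
RESTORED_REPORTER = c_to_t(INACTIVE_REPORTER, 1)


@dataclass(frozen=True)
class Cell:
    reporter: str        # resistance gene sequence carried by the cell
    target_edited: bool  # whether the target gene also carries a C-to-T edit


def is_resistant(cell: Cell) -> bool:
    """Toy rule: the reporter is functional only if it starts with ATG."""
    return cell.reporter.startswith("ATG")


def edited_fraction(cells: list[Cell]) -> float:
    """Fraction of cells whose target gene carries the substitution."""
    return sum(c.target_edited for c in cells) / len(cells)


# Made-up population of ten cells; the counts are not experimental data.
population = (
    [Cell(RESTORED_REPORTER, True)] * 2
    + [Cell(RESTORED_REPORTER, False)]
    + [Cell(INACTIVE_REPORTER, True)]
    + [Cell(INACTIVE_REPORTER, False)] * 6
)

# Applying the screening agent keeps only cells with a functional reporter.
survivors = [c for c in population if is_resistant(c)]

print(f"target-edited before selection: {edited_fraction(population):.2f}")
print(f"target-edited after selection:  {edited_fraction(survivors):.2f}")
```

In this toy population the edited fraction rises after selection only because reporter editing and target editing were assumed to co-occur in the same cells; the sketch does not model how strongly the two events are correlated in practice.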
CN201910938668.3A 2019-09-30 2019-09-30 Cell enrichment technology of C.T base substitution by taking inactivated screening agent resistance gene as report system and application thereof Active CN110628794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910938668.3A CN110628794B (en) 2019-09-30 2019-09-30 Cell enrichment technology of C.T base substitution by taking inactivated screening agent resistance gene as report system and application thereof

Publications (2)

Publication Number Publication Date
CN110628794A CN110628794A (en) 2019-12-31
CN110628794B true CN110628794B (en) 2021-07-16

Family

ID=68973506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910938668.3A Active CN110628794B (en) 2019-09-30 2019-09-30 Cell enrichment technology of C.T base substitution by taking inactivated screening agent resistance gene as report system and application thereof

Country Status (1)

Country Link
CN (1) CN110628794B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016176191A1 (en) * 2015-04-27 2016-11-03 The Trustees Of The University Of Pennsylvania Dual aav vector system for crispr/cas9 mediated correction of human disease
CN108795972A (en) * 2017-05-05 2018-11-13 中国科学院遗传与发育生物学研究所 Method for isolating cells without using a transgenic marker sequence
CN109652440A (en) * 2018-12-28 2019-04-19 北京市农林科学院 Application of the VQRn-Cas9&PmCDA1&UGI base editing system in plant gene editing
CN109666693A (en) * 2018-12-29 2019-04-23 北京市农林科学院 Application of MG132 in editing a recipient genome with a base editing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A highly efficient method for enriching TALEN or CRISPR/Cas9-edited mutant cells; Hideyo Yasuda et al.; Journal of Genetics and Genomics; 2016-11-01; Vol. 43; pp. 705-708 *

Also Published As

Publication number Publication date
CN110628794A (en) 2019-12-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant