CN110029096A

CN110029096A - Adenine base editing tool and use thereof

Info

Publication number: CN110029096A
Application number: CN201910382569.1A
Authority: CN
Inventors: 黄诗圣; 李向阳; 黄行许
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2019-05-09
Filing date: 2019-05-09
Publication date: 2019-07-19
Anticipated expiration: 2039-05-09
Also published as: CN110029096B

Abstract

The present invention relates to the field of biotechnology, and more particularly to an adenine base editing tool and its use. The invention provides a fusion protein comprising an ecTadA-ecTadA* dimer fragment and a SpCas9-NG D10A nickase segment, wherein the ecTadA-ecTadA* dimer fragment comprises an ecTadA segment and an ecTadA* segment. Using NG as the PAM sequence, the fusion protein mutates A to G at positions 4-7 from the 5' end of the sgRNA, which increases the number of target sites available for base editing. The fusion protein also offers high editing precision and low proximal off-target editing, and has good prospects for industrial application.

Description

Adenine base editing tool and use thereof

Technical field

The present invention relates to the field of biotechnology, and more particularly to an adenine base editing tool and its use.

Background art

CRISPR/Cas9 is currently the most widely used gene editing technology. Guided by an sgRNA, Cas9 reaches the specified region and exerts its nuclease activity, cutting between the 3rd and 4th base pairs upstream of the PAM. The cut produces a DNA double-strand break (DSB), which triggers the cell's own DNA repair mechanisms. These fall mainly into HDR (Homology-Directed Repair) and NHEJ (Non-Homologous End Joining). HDR can use a template to repair the break precisely, whereas NHEJ introduces random insertions or deletions; NHEJ dominates the repair process.
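The cut position described above can be illustrated with a minimal sketch. It assumes a 20-nt protospacer immediately followed by its PAM on the same strand; the function name and the example sequence are hypothetical and serve only as an illustration.

```python
def cas9_cut_index(pam_start: int) -> int:
    """Return the 0-based index of the first base on the PAM side of the cut.

    Cas9 cuts between the 3rd and 4th base upstream (5') of the PAM,
    so exactly three protospacer bases lie between the cut and the PAM.
    """
    if pam_start < 3:
        raise ValueError("need at least 3 bases upstream of the PAM")
    return pam_start - 3


# Hypothetical 20-nt protospacer followed by a TGG PAM (illustration only).
site = "ACGTACGTACGTACGTACGT" + "TGG"
cut = cas9_cut_index(pam_start=20)
left, right = site[:cut], site[cut:]
print(left, "|", right)
```

Here `left` holds the 17 bases on the 5' side of the break, and `right` holds the last 3 protospacer bases plus the PAM.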

The advent of CRISPR/Cas9 made genetic manipulation very convenient, but the random insertions or deletions introduced by NHEJ cannot achieve precise editing of the genome, and supplying a homologous recombination vector or a single-stranded DNA donor (single-stranded donor oligonucleotide, ssODN) after cutting is inefficient and time-consuming. In addition, the DSB caused by Cas9 cutting may lead to large deletions in the genome, which poses a safety risk.

David Liu et al. of Harvard University reported that fusing a deaminase to the Cas9 D10A nickase (nCas9), in which the RuvC domain is inactivated, enables single-base point mutations (C-to-T or A-to-G) in the genome without causing a DSB. Two kinds of tool exist: the cytosine base editor (CBE) and the adenine base editor (ABE).

Guided by the sgRNA, the fusion protein of cytosine deaminase or adenine deaminase with nCas9 reaches the target site and binds the DNA strand complementary to the sgRNA. The cytosine deaminase deaminates cytosine (C) within a certain range nearby to uracil (U); U pairs with adenine (A), and through DNA replication U is eventually replaced by thymine (T), the base complementary to A. Similarly, the adenine deaminase deaminates adenine (A) within a certain range nearby to hypoxanthine (I); I pairs with cytosine (C), and through DNA replication I is eventually replaced by guanine (G), the base complementary to C. This achieves the C-to-T or A-to-G conversion.
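The net outcome of the two deamination routes can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it records only the final base after replication (C read as T, A read as G), ignores the editing window and strand choice, and the function and variable names are hypothetical.

```python
# Final base after deamination and DNA replication:
# C -> U, read as T (CBE); A -> I, read as G (ABE).
DEAMINATION_OUTCOME = {"C": "T", "A": "G"}
EDITOR_TARGET = {"CBE": "C", "ABE": "A"}


def base_edit(strand: str, position: int, editor: str) -> str:
    """Return the strand after editing the base at a 1-based position.

    The strand is returned unchanged when the base at that position
    is not a substrate of the chosen editor.
    """
    base = strand[position - 1]
    if base != EDITOR_TARGET[editor]:
        return strand
    return strand[:position - 1] + DEAMINATION_OUTCOME[base] + strand[position:]


print(base_edit("GATTACA", 2, "ABE"))  # A at position 2 becomes G
print(base_edit("GATTACA", 6, "CBE"))  # C at position 6 becomes T
```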

Summary of the invention

In view of the above shortcomings of the prior art, the object of the present invention is to provide an adenine base editing tool and its use, so as to solve the problems in the prior art.

To achieve the above and other related objects, one aspect of the present invention provides a fusion protein comprising an ecTadA-ecTadA* dimer fragment and a SpCas9-NG D10A nickase segment, wherein the ecTadA-ecTadA* dimer fragment comprises an ecTadA segment and an ecTadA* segment.

In some embodiments of the present invention, the amino acid sequence of the ecTadA segment includes:

a) the amino acid sequence shown in SEQ ID NO.57; or

b) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.57 and has the function of the amino acid sequence defined in a), preferably the ability to form a dimer with the ecTadA* segment, the dimer having adenine deaminase activity.

In some embodiments of the present invention, the amino acid sequence of the ecTadA* segment includes:

c) the amino acid sequence shown in SEQ ID NO.58; or

d) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.58 and has the function of the amino acid sequence defined in c), preferably the ability to form a dimer with the ecTadA segment, the dimer having adenine deaminase activity.

In some embodiments of the present invention, the amino acid sequence of the SpCas9-NG D10A nickase segment includes:

e) the amino acid sequence shown in SEQ ID NO.59; or

f) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.59 and has the function of the amino acid sequence defined in e), preferably the ability to recognize NG as the PAM.

In some embodiments of the present invention, the fusion protein comprises, in order from the 5' end to the 3' end, the ecTadA-ecTadA* dimer fragment and the SpCas9-NG D10A nickase segment.

In some embodiments of the present invention, the ecTadA-ecTadA* dimer fragment comprises, in order from the 5' end to the 3' end, the ecTadA segment and the ecTadA* segment.

In some embodiments of the present invention, the fusion protein further comprises a nuclear localization signal segment. Preferably, the nuclear localization signal segment is located at the 5' end and/or the 3' end of the ecTadA-ecTadA* dimer fragment and the SpCas9-NG D10A nickase segment. Preferably, the amino acid sequence of the nuclear localization signal segment is as shown in SEQ ID NO.60.

In some embodiments of the present invention, the amino acid sequence of the fusion protein is as shown in SEQ ID No.61.

Another aspect of the present invention provides an isolated polynucleotide encoding the fusion protein.

Another aspect of the present invention provides a construct containing the isolated polynucleotide.

Another aspect of the present invention provides an expression system that contains the construct or has the polynucleotide integrated into its genome.

In some embodiments of the present invention, the host cell of the expression system is selected from eukaryotic cells or prokaryotic cells, preferably from mouse cells and human cells, more preferably from mouse neuroblastoma cells, human embryonic kidney cells or human cervical cancer cells, and still more preferably from N2a cells, HEK293FT cells or HeLa cells.

Another aspect of the present invention provides use of the fusion protein, the isolated polynucleotide, the construct or the expression system in gene editing.

In some embodiments of the present invention, the use is specifically use in gene editing of eukaryotes.

Another aspect of the present invention provides a base editing system comprising the fusion protein; the base editing system further comprises an sgRNA.

Another aspect of the present invention provides a gene editing method comprising: performing gene editing with the fusion protein or the base editing system.

Brief description of the drawings

Fig. 1 is a schematic diagram of the ABEmax-NG plasmid constructed in Embodiment 1.

Fig. 2 shows the experimental results of Embodiment 2 of the present invention, wherein a is a schematic diagram of the eGFP reporter system; b shows microscope photographs of the reporter system in HEK293FT cells; c shows the flow cytometry results; d shows the Sanger sequencing results.

Fig. 3 shows the experimental results of Embodiment 3 of the present invention, wherein a shows the deep-sequencing analysis of ABEmax-NG editing at 16 endogenous gene sites in N2a cells; b shows the base distribution at the corresponding editing sites; c shows the corresponding mutation efficiency, mutation by-products and indel statistics; d shows the deep-sequencing analysis of ABEmax-NG editing at 4 endogenous gene sites in mouse embryos; e shows the corresponding mutation efficiency, mutation by-products and indel statistics.

Fig. 4 shows the experimental results of Embodiment 3 of the present invention, wherein a shows the proximal off-target results for ABEmax-NG editing at 16 endogenous gene sites in N2a cells; b shows the proximal off-target results for ABEmax-NG editing at 4 endogenous gene sites in mouse embryos.

Fig. 5 shows the experimental results of Embodiment 4 of the present invention, wherein a shows the results of editing the splice sites of 4 endogenous genes in N2a cells with ABEmax-NG; b shows the corresponding RNA splicing detection results; c shows the corresponding Sanger sequencing results, which verify the newly emerged splice isoforms.

Fig. 6 shows the experimental results of Embodiment 5 of the present invention, wherein a is a schematic diagram of generating mice with a BBS2 splice acceptor site mutation using ABEmax-NG; b shows the genotyping results of the mutant mice; c shows the genotyping results of different tissues of the mutant mice; d shows the RNA isoform detection results for different tissues of the mutant mice.

Detailed description

Through extensive exploratory research, the inventors provide a fusion protein that is a new adenine base editing tool. The fusion protein can recognize NG as the PAM, which widens the targeting range of base editing. The present invention was completed on this basis.

The first aspect of the present invention provides a fusion protein comprising an ecTadA-ecTadA* dimer fragment and a SpCas9-NG D10A nickase segment, wherein the ecTadA-ecTadA* dimer fragment comprises an ecTadA segment and an ecTadA* segment. The fusion protein can cooperate with an sgRNA that targets the target region and, using NG as the PAM sequence, achieve efficient A-to-G base editing at positions 4-7 from the 5' end of the sgRNA within the target region, with high mutation precision and low proximal off-target editing.

In the fusion protein provided by the present invention, the amino acid sequence of the ecTadA segment may include: a) the amino acid sequence shown in SEQ ID NO.57; or b) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.57 and has the function of the amino acid sequence defined in a). Specifically, the amino acid sequence in b) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.57 by substitution, deletion or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment shown in SEQ ID NO.57. For example, the function may be the ability to form a dimer with the ecTadA* segment, the dimer having adenine deaminase activity; more specifically, it may be the function of deaminating adenine (A) to hypoxanthine (I). The amino acid sequence in b) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to SEQ ID NO.57.

In the fusion protein provided by the present invention, the amino acid sequence of the ecTadA* segment may include: c) the amino acid sequence shown in SEQ ID NO.58; or d) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.58 and has the function of the amino acid sequence defined in c). Specifically, the amino acid sequence in d) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.58 by substitution, deletion or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment shown in SEQ ID NO.58. For example, the function may be the ability to form a dimer with the ecTadA segment, the dimer having adenine deaminase activity; more specifically, it may be the function of deaminating adenine (A) to hypoxanthine (I). The amino acid sequence in d) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to SEQ ID NO.58.

In the fusion protein provided by the present invention, the amino acid sequence of the SpCas9-NG D10A nickase segment may include: e) the amino acid sequence shown in SEQ ID NO.59; or f) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO.59 and has the function of the amino acid sequence defined in e). Specifically, the amino acid sequence in f) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.59 by substitution, deletion or addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by addition of one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids at the N-terminus and/or C-terminus, and that has the function of the polypeptide fragment shown in SEQ ID NO.59. For example, the function may be the ability to recognize NG as the PAM; specifically, the fragment can use an NG sequence as the PAM and cooperate with an sgRNA that specifically targets a site and with the ecTadA-ecTadA* dimer fragment to achieve A-to-G base editing at positions 4-7 from the 5' end of the sgRNA within the target region. The amino acid sequence in f) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to SEQ ID NO.59. Target recognition by the CRISPR/Cas9 system generally requires a protospacer adjacent motif (PAM) beside the target site. As one of the Cas9 enzymes most frequently used for genome editing, Cas9 from Streptococcus pyogenes (SpCas9) can only recognize PAMs with the NGG sequence, which limits the range of the genome that can be targeted, whereas the SpCas9-NG D10A nickase segment in the present invention can recognize an NG sequence as the PAM.
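The targeting rule stated above (an NG PAM beside the target site, and an A at positions 4-7 counted from the 5' end of the sgRNA) can be sketched as a simple sequence scan. This is an illustrative sketch, not the method used in the embodiments: it scans only the strand it is given, assumes a 20-nt protospacer, and the function name, parameters and example sequence are hypothetical.

```python
def find_abe_ng_sites(seq, spacer_len=20, window=(4, 7)):
    """List candidate sites with an NG PAM and an A inside the editing window.

    Returns tuples of (0-based start, protospacer, PAM, 1-based A positions).
    Only the given strand is scanned; the reverse strand is not considered.
    """
    seq = seq.upper()
    first, last = window
    hits = []
    for start in range(len(seq) - spacer_len - 1):
        protospacer = seq[start:start + spacer_len]
        pam = seq[start + spacer_len:start + spacer_len + 2]
        if pam[1] != "G":  # NG PAM: any base followed by G
            continue
        a_positions = [pos for pos in range(first, last + 1)
                       if protospacer[pos - 1] == "A"]
        if a_positions:
            hits.append((start, protospacer, pam, a_positions))
    return hits


# Hypothetical 20-nt protospacer followed by a CG dinucleotide (an NG PAM).
example = "GCTACGATCCGGTTCAGCTT" + "CG"
for hit in find_abe_ng_sites(example):
    print(hit)
```

An NGG-restricted search would additionally require a G at the third PAM position, so relaxing the requirement to NG can only add candidate sites.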

In the fusion protein provided by the present invention, the substitution, deletion or addition may be a conservative amino acid substitution. A "conservative amino acid substitution" refers specifically to the replacement of an amino acid residue by another amino acid residue with a similar side chain. Families of amino acid residues with similar side chains are known to those skilled in the art and include, without limitation, basic side chains (e.g. lysine, arginine, histidine), acidic side chains (e.g. aspartic acid, glutamic acid), uncharged polar side chains (e.g. glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g. alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan) and aromatic side chains (e.g. tyrosine, phenylalanine, tryptophan, histidine). More specifically, conservative amino acid substitutions may include, without limitation, the cases listed in the tables below. The numbers in Table 1 (amino acid similarity matrix) indicate the similarity between two amino acids, and a substitution is considered conservative when the number is greater than or equal to 0. Table 2 gives exemplary conservative amino acid substitutions.

Table 1

	C	G	P	S	A	T	D	E	N	Q	H	K	R	V	M	I	L	F	Y	W
W	-8	-7	-6	-2	-6	-5	-7	-7	-4	-5	-3	-3	2	-6	-4	-5	-2	0	0	17
Y	0	-5	-5	-3	-3	-3	-4	-4	-2	-4	0	-4	-5	-2	-2	-1	-1	7	10
F	-4	-5	-5	-3	-4	-3	-6	-5	-4	-5	-2	-5	-4	-1	0	1	2	9
L	-6	-4	-3	-3	-2	-2	-4	-3	-3	-2	-2	-3	-3	2	4	2	6
I	-2	-3	-2	-1	-1	0	-2	-2	-2	-2	-2	-2	-2	4	2	5
M	-5	-3	-2	-2	-1	-1	-3	-2	0	-1	-2	0	0	2	6
V	-2	-1	-1	-1	0	0	-2	-2	-2	-2	-2	-2	-2	4
R	-4	-3	0	0	-2	-1	-1	-1	0	1	2	3	6
K	-5	-2	-1	0	-1	0	0	0	1	1	0	5
H	-3	-2	0	-1	-1	-1	1	1	2	3	6
Q	-5	-1	0	-1	0	-1	2	2	1	4
N	-4	0	-1	1	0	0	2	1	2
E	-5	0	-1	0	0	0	3	4
D	-5	1	-1	0	0	0	4
T	-2	0	0	1	1	3
A	-2	1	1	1	2
S	0	1	1	1
P	-3	-1	6
G	-3	5
C	12
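As a minimal sketch of how Table 1 can be applied, the following Python snippet encodes the lower-triangular matrix and tests whether a substitution meets the stated criterion (similarity greater than or equal to 0). The values are copied from Table 1; the function and variable names are hypothetical.

```python
# Column order of Table 1; each row lists scores for columns up to itself.
COLUMNS = "CGPSATDENQHKRVMILFYW"

TABLE_1 = """
C 12
G -3 5
P -3 -1 6
S 0 1 1 1
A -2 1 1 1 2
T -2 0 0 1 1 3
D -5 1 -1 0 0 0 4
E -5 0 -1 0 0 0 3 4
N -4 0 -1 1 0 0 2 1 2
Q -5 -1 0 -1 0 -1 2 2 1 4
H -3 -2 0 -1 -1 -1 1 1 2 3 6
K -5 -2 -1 0 -1 0 0 0 1 1 0 5
R -4 -3 0 0 -2 -1 -1 -1 0 1 2 3 6
V -2 -1 -1 -1 0 0 -2 -2 -2 -2 -2 -2 -2 4
M -5 -3 -2 -2 -1 -1 -3 -2 0 -1 -2 0 0 2 6
I -2 -3 -2 -1 -1 0 -2 -2 -2 -2 -2 -2 -2 4 2 5
L -6 -4 -3 -3 -2 -2 -4 -3 -3 -2 -2 -3 -3 2 4 2 6
F -4 -5 -5 -3 -4 -3 -6 -5 -4 -5 -2 -5 -4 -1 0 1 2 9
Y 0 -5 -5 -3 -3 -3 -4 -4 -2 -4 0 -4 -5 -2 -2 -1 -1 7 10
W -8 -7 -6 -2 -6 -5 -7 -7 -4 -5 -3 -3 2 -6 -4 -5 -2 0 0 17
"""

ROWS = {fields[0]: [int(value) for value in fields[1:]]
        for fields in (line.split() for line in TABLE_1.strip().splitlines())}


def similarity(a: str, b: str) -> int:
    """Look up the Table 1 score for a pair of one-letter amino acid codes."""
    i, j = COLUMNS.index(a), COLUMNS.index(b)
    if i < j:  # the matrix is symmetric; read from the longer row
        a, i, j = b, j, i
    return ROWS[a][j]


def is_conservative(a: str, b: str) -> bool:
    """Apply the stated rule: a score of 0 or more counts as conservative."""
    return similarity(a, b) >= 0


print(similarity("D", "E"), is_conservative("D", "E"))
print(similarity("C", "W"), is_conservative("C", "W"))
```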

Table 2

The fusion protein provided by the present invention may further include a nuclear localization signal (NLS) segment. The nuclear localization signal segment may be located at the N-terminus of the ecTadA-ecTadA* dimer fragment and may also be located at the C-terminus of the SpCas9-NG D10A nickase segment. The nuclear localization signal segment may include the amino acid sequence shown in SEQ ID NO.60.

In the fusion protein provided by the present invention, the fusion protein may comprise, in order from the 5' end to the 3' end, the ecTadA-ecTadA* dimer fragment and the SpCas9-NG D10A nickase segment, and the ecTadA-ecTadA* dimer fragment may comprise, in order from the 5' end to the 3' end, the ecTadA segment and the ecTadA* segment. In a specific embodiment of the present invention, the amino acid sequence of the fusion protein is as shown in SEQ ID NO.61.

The second aspect of the present invention provides an isolated polynucleotide encoding the fusion protein provided by the first aspect of the present invention.

The third aspect of the present invention provides a construct containing the isolated polynucleotide provided by the second aspect of the present invention. The construct can usually be obtained by inserting the isolated polynucleotide into a suitable expression vector, which a person skilled in the art can select; for example, the expression vector may be, without limitation, a pCMV expression vector, a pSV2 expression vector, a pGL3 expression vector, etc.

The fourth aspect of the present invention provides an expression system that contains the construct provided by the third aspect of the present invention, or has the exogenous isolated polynucleotide provided by the second aspect of the present invention integrated into its genome. The expression system may be a host cell that can express the fusion protein described above; the fusion protein can cooperate with an sgRNA so that it is directed to the target region and performs base editing there. In another specific embodiment of the present invention, the host cell may be a eukaryotic cell and/or a prokaryotic cell, more specifically a mouse cell, a human cell, etc., more specifically a mouse neuroblastoma cell, a human embryonic kidney cell, a human cervical cancer cell, etc., and more specifically an N2a cell, a HEK293FT cell, a HeLa cell, etc.

The fifth aspect of the present invention provides use of the fusion protein provided by the first aspect, the isolated polynucleotide provided by the second aspect, the construct provided by the third aspect, or the expression system provided by the fourth aspect in gene editing, preferably in gene editing of eukaryotes. The eukaryote may specifically be a metazoan, including but not limited to a mouse. The use may specifically include, without limitation, A-to-G base editing (more specifically, A-to-G base editing at positions 4-7 from the 5' end of the sgRNA within the target region), editing splice acceptor/donor sites to regulate RNA splicing, and using this tool to build mouse disease models or to treat human diseases. In a specific embodiment of the present invention, the gene to be edited may be CHRNE (ID:11448), SIX6 (ID:20476), ITPR1 (ID:16438), TMEM67 (ID:329795), LMBR1 (ID:56873), NFIX (ID:18032), DES (ID:13346), BHLHA9 (ID:320522), NDUFS1 (ID:227197), HOXD13 (ID:15433), AKR1C19 (ID:432720), LMNA (ID:16905), WNT5A (ID:22418), SUFU (ID:24069), GJA8 (ID:14616), EYA1 (ID:14048), BBS2 (ID:67378), OFD1 (ID:237222), MYO7A (ID:17921), SEPN1 (ID:74777), etc. In another specific embodiment of the present invention, the object to be edited may be an embryo, a cell, etc.

The sixth aspect of the present invention provides a base editing system comprising the fusion protein provided by the first aspect of the present invention; the base editing system further comprises an sgRNA. A person skilled in the art can select a suitable sgRNA that targets a specific site according to the region of the gene to be edited. For example, the sequence of the sgRNA is usually at least partly complementary to the target region, so that it can cooperate with the fusion protein and direct it to the target region, achieving A-to-G base editing at positions 4-7 from the 5' end of the sgRNA within the target region. The editing is specifically an adenine deamination reaction, i.e. adenine (A) is edited to hypoxanthine (I). The base editing system provided by the present invention greatly widens the range of the genome that can be targeted: it can use an NG sequence as the PAM and achieve A-to-G base editing at positions 4-7 from the 5' end of the sgRNA within the target region, with very high mutation precision and low proximal off-target editing. In a specific embodiment of the present invention, the sgRNA may target genes such as CHRNE, SIX6, ITPR1, TMEM67, LMBR1, NFIX, DES, BHLHA9, NDUFS1, HOXD13, AKR1C19, LMNA, WNT5A, SUFU, GJA8, EYA1, BBS2, OFD1, MYO7A and SEPN1. In another specific embodiment of the present invention, the object to be edited may be an embryo, a cell, etc.

The seventh aspect of the present invention provides a base editing method comprising: performing gene editing with the fusion protein provided by the first aspect of the present invention or the base editing system provided by the sixth aspect of the present invention. For example, the method may include: culturing the expression system provided by the fourth aspect of the present invention under suitable conditions to express the fusion protein, which then performs base editing on the target region in the presence of a cooperating sgRNA that targets that region. Methods of providing the sgRNA are known to those skilled in the art; for example, an expression system capable of expressing the sgRNA may be cultured under suitable conditions. Such an expression system may be a host cell containing an expression vector that carries a polynucleotide encoding the sgRNA, or a host cell in whose chromosome a polynucleotide encoding the sgRNA is integrated. In a specific embodiment of the present invention, the gene editing is in vitro gene editing.

The present invention provides a new adenine base editing tool by combining SpCas9-NG, which recognizes NG PAMs, with ABE. Using NG as the PAM sequence, the fusion protein provided by the present invention can mutate A to G at positions 4-7 from the 5' end of the sgRNA, which increases the number of target sites for base editing. It offers more sgRNAs for editing splice acceptor/donor sites, more genes whose RNA splicing can be regulated, and a larger base editing range in the genome. In addition, the fusion protein offers high editing precision and low proximal off-target editing, and has good prospects for industrial application.

The embodiments of the present invention are illustrated below by specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the invention.

Before the specific embodiments are described further, it should be understood that the scope of protection of the present invention is not limited to the specific embodiments described below. It should also be understood that the terms used in the embodiments are intended to describe specific embodiments, not to limit the scope of protection of the present invention. In the description and claims of the present invention, unless the context clearly indicates otherwise, the singular forms "a", "an" and "the" include the plural forms.

When an embodiment gives a numerical range, it should be understood that, unless the invention states otherwise, both endpoints of each range and any value between the two endpoints may be selected. Unless otherwise defined, all technical and scientific terms used in the present invention have the same meaning as commonly understood by those skilled in the art. In addition to the specific methods, equipment and materials used in the embodiments, any prior-art methods, equipment and materials similar or equivalent to those described in the embodiments may also be used to implement the invention, based on the skilled person's knowledge of the prior art and the description of the present invention.

Unless otherwise stated, the experimental methods, detection methods and preparation methods disclosed in the present invention employ conventional techniques of molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology and related fields. These techniques are fully explained in the literature; see, for example, Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, Chromatin (P.M. Wassarman and A.P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, Chromatin Protocols (P.B. Becker, ed.), Humana Press, Totowa, 1999.

Embodiment 1

The ABEmax-NG plasmid was constructed first. Using the Mut Express II Fast Mutagenesis Kit V2 (Vazyme, C214-02), 7 amino acid mutations (R1335V/L1111R/D1135V/G1218R/E1219F/A1322R/T1337R) were introduced into the ABEmax plasmid, which was purchased from Addgene (#112095). The sequence of the resulting ABEmax-NG plasmid is shown as SEQ ID NO.1 in the appended sequence listing.
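The slash-separated mutation list above follows the usual notation of original residue, position, new residue. A minimal Python sketch for parsing it is given below; the function name is hypothetical, and the sketch only parses the notation without touching any real sequence.

```python
import re

# Mutation list copied from the text above.
NG_MUTATIONS = "R1335V/L1111R/D1135V/G1218R/E1219F/A1322R/T1337R"


def parse_substitutions(spec: str):
    """Split a list such as 'R1335V/L1111R' into (original, position, new) tuples."""
    parsed = []
    for token in spec.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


# Print the substitutions ordered by residue position.
for original, position, new in sorted(parse_substitutions(NG_MUTATIONS),
                                      key=lambda item: item[1]):
    print(f"{original}{position} -> {new}")
```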

Embodiment 2

In this embodiment, an eGFP reporter system was used to verify the editing ability of ABEmax-NG in HEK293FT cells.

1.1 Construction of the enhanced green fluorescent protein reporter plasmid

Using the Mut Express II Fast Mutagenesis Kit V2 (Vazyme, C214-02), two base mutations were introduced into the enhanced green fluorescent protein expression vector. One is at the third position of the threonine-63 codon, which was mutated to T, A or G; this mutation does not change the amino acid sequence of eGFP but provides a variety of PAM sequences for testing recognition by ABEmax-NG. The other is at the first position of the glutamine-69 codon, which was mutated to T, converting it into a stop codon so that eGFP loses its fluorescence. The sequence of the resulting eGFP reporter plasmid is shown as SEQ ID NO.2 in the appended sequence listing.

1.2 Construction of the sgRNA plasmid

The sgRNA was designed and oligos were synthesized. The upstream sequence was 5'-accgAGCACTACACGCCGTAGGTC-3' (SEQ ID NO.3) and the downstream sequence was 5'-aaacGACCTACGGCGTGTAGTGCT-3' (SEQ ID NO.4). The upstream and downstream oligos were annealed with the program 95 °C for 5 min; 95 °C to 85 °C at -2 °C/s; 85 °C to 25 °C at -0.1 °C/s; hold at 4 °C, and then ligated into the pGL3-U6-sgRNA vector (Addgene #51133) linearized with BsaI (NEB: R0539L). The linearization system was as follows: pGL3-U6-sgRNA 2 μg; buffer (NEB: R0539L) 6 μL; BsaI 2 μL; ddH2O to 60 μL; digestion overnight at 37 °C. The ligation system was as follows: T4 ligation buffer (NEB: M0202L) 1 μL; linearized vector 20 ng; annealed oligo fragment (10 μM) 5 μL; T4 ligase (NEB: M0202L) 0.5 μL; ddH2O to 10 μL; ligation overnight at 16 °C. The ligated vector was transformed, and colonies were picked and identified. Positive clones were grown in shaking culture, the plasmid was extracted (Axygen: AP-MN-P-250G) and its concentration was measured.
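The two oligos above follow a simple pattern: the upstream oligo is the 20-nt spacer with an `accg` prefix, and the downstream oligo is the reverse complement of the spacer with an `aaac` prefix. The sketch below reproduces that pattern in Python. The prefixes are read off SEQ ID NO.3 and NO.4; treating them as the overhangs for the BsaI-cut vector is an assumption, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_sgrna_oligos(spacer: str, up_prefix: str = "accg", down_prefix: str = "aaac"):
    """Build the upstream/downstream oligo pair for a given spacer.

    The lowercase prefixes mirror those seen in SEQ ID NO.3 and NO.4.
    """
    spacer = spacer.upper()
    return up_prefix + spacer, down_prefix + reverse_complement(spacer)


upstream, downstream = design_sgrna_oligos("AGCACTACACGCCGTAGGTC")
print("5'-" + upstream + "-3'")
print("5'-" + downstream + "-3'")
```

Running the sketch on the spacer from this section yields the same two oligo sequences that are listed as SEQ ID NO.3 and SEQ ID NO.4.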

1.3 Cell culture, transfection and identification

HEK293FT cells (purchased from ATCC) were seeded and cultured in high-glucose DMEM (HyClone, SH30022.01B) supplemented with 10% FBS and containing 1% (v/v) penicillin-streptomycin (Gibco). When the cells reached 80% confluence, the medium was replaced with DMEM containing 10% serum and the cells were cultured for 2 hours to restore them to their best state. The plasmid amounts transfected per well were 1 μg ABEmax-NG plasmid, 0.5 μg sgRNA plasmid and 0.5 μg mutated eGFP expression plasmid. The plasmids were mixed in 50 μL Opti-MEM medium (Gibco, 11058021). 2 μL Lipofectamine 2000 transfection reagent (Thermo, 11668019) was mixed into 50 μL Opti-MEM medium and left to stand for 5 minutes. The Opti-MEM containing Lipofectamine 2000 was added to the Opti-MEM containing the plasmids, mixed by slow pipetting and left to stand for 20 minutes. The Opti-MEM containing the plasmids and Lipofectamine 2000 was then added to the wells of a 12-well plate. Six hours after transfection, the medium was replaced with DMEM containing 10% FBS. Forty-eight hours after transfection, fluorescence was observed and photographed under the microscope, and the proportion of fluorescent cells was measured by flow cytometry. GFP-positive cells were sorted and lysed for genotyping; the lysis buffer consisted of 50 mM KCl, 1.5 mM MgCl2, 10 mM Tris pH 8.0, 0.5% Nonidet P-40, 0.5% Tween 20 and 100 μg/ml proteinase K. The sequence around the target site was amplified by PCR, and the purified amplification product was identified by Sanger sequencing. The amplification system was as follows: 2× buffer (Vazyme, P505) 25 μL; dNTP 1 μL; F (10 pmol/μL) 1 μL; R (10 pmol/μL) 1 μL; template 1 μL; DNA polymerase (Vazyme, P505) 0.5 μL; ddH2O to 50 μL. The amplified PCR product was purified by the following steps: add three volumes of PCR-A (Axygen: AP-PCR-250G), load onto the column and centrifuge at 12000 rpm for 1 minute; add 700 μL W2 and centrifuge for 1 minute; discard the waste liquid, add 700 μL W2 and centrifuge for 1 minute; discard the waste liquid and spin the empty column for 1 minute; elute with 20 μL water. The results are shown in Fig. 2. ABEmax-NG can recognize NG PAMs in HEK293FT cells and repair the mutated enhanced fluorescent protein so that its fluorescence is restored.
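The repair logic of the reporter can be checked on the spacer sequence from section 1.2. The sketch below rests on two assumptions that are not stated explicitly in the text: the sgRNA pairs with the strand opposite the eGFP coding strand, so the reverse complement of the spacer is read as coding sequence, and the TAG found there is the introduced stop codon at the glutamine-69 position. Variable and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Spacer taken from SEQ ID NO.3 without its 'accg' prefix.
protospacer = "AGCACTACACGCCGTAGGTC"
coding_before = reverse_complement(protospacer)

# Assumed edit: A-to-G at position 7 from the 5' end (inside the 4-7 window).
edit_position = 7
assert protospacer[edit_position - 1] == "A"
edited = protospacer[:edit_position - 1] + "G" + protospacer[edit_position:]
coding_after = reverse_complement(edited)

# Position of the assumed stop codon on the coding-strand read.
stop_index = coding_before.find("TAG")
print(coding_before[stop_index:stop_index + 3], "->",
      coding_after[stop_index:stop_index + 3])
```

Under these assumptions, the A-to-G change on the edited strand reads as T-to-C on the coding strand, turning the TAG stop codon back into CAG (glutamine), which is consistent with the restored fluorescence reported above.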

Embodiment 3

In the present embodiment, endogenous gene site is compiled on N2a cell and mice embryonic using ABEmax-NG Volume.

2.1 Construction of sgRNA plasmids

Sixteen endogenous mouse genes were selected: CHRNE (ID: 11448), SIX6 (ID: 20476), ITPR1 (ID: 16438), TMEM67 (ID: 329795), LMBR1 (ID: 56873), NFIX (ID: 18032), DES (ID: 13346), BHLHA9 (ID: 320522), NDUFS1 (ID: 227197), HOXD13 (ID: 15433), AKR1C19 (ID: 432720), LMNA (ID: 16905), WNT5A (ID: 22418), SUFU (ID: 24069), GJA8 (ID: 14616) and EYA1 (ID: 14048). sgRNAs were designed for these genes; the oligos used are listed in the appended sequence listing as SEQ ID.5-36. The sgRNA plasmids were constructed as described in 1.2.

Four endogenous mouse genes were selected: BBS2 (ID: 67378), OFD1 (ID: 237222), MYO7A (ID: 17921) and SEPN1 (ID: 74777). sgRNAs were designed and oligos were synthesized, then annealed and ligated into the linearized PUC57-T7-sgRNA vector (addgene: 51132) by the method of 1.2. The linearization system was as described in 1.2. The oligos used are listed in the appended sequence listing as SEQ ID.37-44.
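The oligo pairs in the appended sequence listing follow a recognizable pattern: the top oligo is a 4-nt overhang followed by a 20-nt spacer (`accg` in SEQ ID.5-36, `tagg` in SEQ ID.37-44), and the bottom oligo is `aaac` followed by the reverse complement of the spacer. This pattern is inferred from the listed sequences rather than stated in the description, and the function names below are hypothetical. A minimal sketch that reproduces SEQ ID.5/6 and SEQ ID.37/38 from their spacers:

```python
# Translation table for complementing DNA bases (lower and upper case).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_oligos(spacer: str, top_overhang: str, bottom_overhang: str = "aaac"):
    """Build the (top, bottom) oligo pair for annealing and ligation.

    top    = top_overhang + spacer
    bottom = bottom_overhang + reverse complement of spacer
    The overhangs are assumptions read off the sequence listing.
    """
    spacer = spacer.lower()
    if not spacer or set(spacer) - set("acgt"):
        raise ValueError("spacer must contain only A, C, G, T")
    return top_overhang + spacer, bottom_overhang + reverse_complement(spacer)


if __name__ == "__main__":
    # Spacer taken from SEQ ID.5 (expression vector, 'accg' overhang).
    print(design_oligos("caatccagacactggtggtc", "accg"))
    # Spacer taken from SEQ ID.37 (T7 transcription vector, 'tagg' overhang).
    print(design_oligos("cgggcagcgaccataggaag", "tagg"))
```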

2.2 Cell culture, transfection and identification

N2a cells (purchased from ATCC) were cultured and transfected as described in 1.3. The plasmid amounts transfected were 1 μg of ABEmax-NG and 0.5 μg of sgRNA expression vector plasmid, with ABEmax as the control. GFP-positive cells were sorted 72 hours after transfection, then lysed, PCR-amplified and purified as described in 1.3, and the products were identified by next-generation sequencing. The results are shown in Fig. 3 and Fig. 4. ABEmax-NG can recognize NG PAMs in N2a cells and efficiently edits endogenous gene sites, with high accuracy, few by-products and low proximal off-target editing.
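To illustrate what "recognizing an NG PAM" means for target selection, the sketch below scans a DNA sequence for 20-nt protospacers followed immediately by an NG PAM and reports adenines at positions 4-7 counted from the 5' end of the protospacer, the window stated in the abstract. Several points are simplifying assumptions made here: only the given strand is scanned, position 1 is taken to be the 5'-most protospacer base, and the function name `find_abe_ng_targets` is hypothetical.

```python
def find_abe_ng_targets(seq, spacer_len=20, window=(4, 7)):
    """List candidate sites as (start, protospacer, pam, editable_A_positions).

    A site qualifies when the two bases 3' of the protospacer are N then G
    and at least one A falls inside the 1-based editing window.
    Only the strand passed in is scanned (assumption for brevity).
    """
    seq = seq.upper()
    first, last = window
    hits = []
    # The last usable start leaves room for the protospacer plus a 2-nt PAM.
    for start in range(len(seq) - spacer_len - 1):
        spacer = seq[start:start + spacer_len]
        pam = seq[start + spacer_len:start + spacer_len + 2]
        if pam[1] != "G":
            continue
        a_positions = [i for i in range(first, last + 1) if spacer[i - 1] == "A"]
        if a_positions:
            hits.append((start, spacer, pam, a_positions))
    return hits


if __name__ == "__main__":
    # Synthetic example: a single A at protospacer position 4, followed by a TG PAM.
    demo = "CCCA" + "C" * 16 + "TG"
    for hit in find_abe_ng_targets(demo):
        print(hit)
```

A real design workflow would also scan the reverse strand and check candidate guides for off-target matches, which this sketch does not attempt.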

2.3 In vitro transcription of sgRNA

Using the constructed PUC57-T7-sgRNA as the template, the fragment containing the sgRNA was amplified with the following primers: F: 5'-TCTCGCGCGTTTCGGTGATGACGG-3' (SEQ ID.45); R: 5'-AAAAAAAGCACCGACTCGGTGCCACTTTTTC-3' (SEQ ID.46). The amplification system was as follows: 2× buffer (Vazyme, P505) 25 μL; dNTP 1 μL; F (10 pmol/μL) 2 μL; R (10 pmol/μL) 2 μL; template 1 ng; DNA polymerase (Vazyme, P505) 0.5 μL; ddH2O to 50 μL. The amplified PCR product was purified by the following steps: 4 μL of RNAsecure (Life: AM7005) was added per 100 μL of volume; the mixture was held at 60 °C for 15 minutes; three volumes of PCR-A (Axygen: AP-PCR-250G) were added and the mixture was loaded onto a column and centrifuged at 12000 rpm for 1 minute; 500 μL of W2 was added, followed by centrifugation for 1 minute; the empty column was spun for 1 minute; the product was eluted with 20 μL of RNase-free water.

Transcription was carried out with an in vitro transcription kit (Ambion, Life Technologies, AM1354) by the following steps:

The reaction system was: reaction buffer 1 μL; enzyme mix 1 μL; A 1 μL; T 1 μL; G 1 μL; C 1 μL; template 800 ng; H2O to 10 μL. After mixing, the system was reacted at 37 °C for 5 hours. 1 μL of DNase was added and the reaction was continued at 37 °C for 15 minutes. The transcribed sgRNA was recovered with a recovery kit (Ambion, Life Technologies, AM1908) by the following steps: 90 μL of Elution solution was added to the reaction from the previous step and the mixture was transferred to a 1.5 ml EP tube; 350 μL of Binding solution was added and mixed; 250 μL of absolute ethanol was added and mixed; the mixture was loaded onto a column; the column was centrifuged at 10000 rpm for 30 seconds and the waste liquid was discarded; 500 μL of Washing solution was added, the column was centrifuged at 10000 rpm for 30 seconds and the waste liquid was discarded; the empty column was spun for 1 minute; the collection tube was replaced and the sgRNA was eluted with 100 μL of Elution solution; 10 μL of ammonium acetate (Ambion, Life Technologies, AM1908) was added and mixed; 275 μL of absolute ethanol was added and mixed; the sample was placed at -20 °C for 30 minutes while 70% ethanol was prepared and placed at -20 °C; the sample was centrifuged at 13000 rpm for 15 minutes at 4 °C. The supernatant was discarded and 500 μL of 70% ethanol was added; the sample was centrifuged for 5 minutes, the waste liquid was aspirated and the pellet was air-dried for 5 minutes; the pellet was dissolved in 20 μL of water; 1 μL was taken to measure the concentration.
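Because the sgRNA transcription system fixes the template by mass (800 ng) in a 10 μL reaction, the template volume and the water top-up depend on the measured concentration of the purified PCR product. The following is a minimal sketch of that calculation; the fixed 1 μL volumes are taken from the text, while the function name `sgrna_ivt_mix` and the example concentrations are hypothetical.

```python
def sgrna_ivt_mix(template_ng_per_ul, template_ng=800.0, final_ul=10.0):
    """Return component volumes (µL) for one sgRNA in vitro transcription reaction.

    template_ng_per_ul is the measured concentration of the purified template.
    Raises ValueError if the template is too dilute to fit in the reaction.
    """
    if template_ng_per_ul <= 0:
        raise ValueError("template concentration must be positive")
    # Fixed 1 µL components listed in the reaction system above.
    mix = {
        "reaction buffer": 1.0,
        "enzyme mix": 1.0,
        "A": 1.0,
        "T": 1.0,
        "G": 1.0,
        "C": 1.0,
    }
    template_ul = template_ng / template_ng_per_ul
    water_ul = final_ul - sum(mix.values()) - template_ul
    if water_ul < 0:
        raise ValueError("template too dilute: volume exceeds the reaction")
    mix["template"] = template_ul
    mix["H2O"] = water_ul
    return mix


if __name__ == "__main__":
    # Hypothetical template concentration of 400 ng/µL.
    for name, vol in sgrna_ivt_mix(400.0).items():
        print(f"{name}: {vol:.2f} uL")
```

With six fixed 1 μL components, at most 4 μL remains for template and water, so under these volumes the template needs to be at least 200 ng/μL for 800 ng to fit.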

2.4 In vitro transcription of ABEmax-NG and ABEmax

Plasmid digestion and recovery. This step linearizes the plasmid. The system was as follows: plasmid 10 μg; buffer I (NEB: R0539L) 10 μL; BbsI (NEB: R0539L) 4 μL; H2O to 100 μL. After mixing, the plasmid was digested at 37 °C overnight.

Recovery of the linearized plasmid. 4 μL of RNAsecure (Life: AM7005) was added to the digestion product and reacted at 60 °C for 10 minutes; the remaining steps were carried out with a recovery kit (QIAGEN: 28004): 5 volumes of buffer PB were added and the mixture was loaded onto a column; 750 μL of buffer PE was added and the column was centrifuged; the empty column was spun for 1 minute; the product was eluted with 10 μL of water and its concentration was measured.

In vitro transcription. The system was assembled in the following order according to the kit instructions (Invitrogen: AM1345): 1 μg of linearized vector; 10 μL of 2× NTP/ARCA; water to 20 μL; 2 μL of T7 enzyme mix; 2 μL of 10× reaction buffer. After mixing, the system was reacted at 37 °C for 2 hours. 1 μL of DNase was added and reacted for 15 minutes.

Tailing. The transcription product was subjected to tailing treatment to ensure the stability of the transcribed mRNA. The system was as follows: 20 μL of reaction product; 36 μL of H2O; 20 μL of 5× E-PAP buffer; 10 μL of 25 mM MnCl2; 10 μL of ATP solution; 4 μL of PEP. After mixing, the system was reacted at 37 °C for 30 minutes.

Recovery. Recovery was carried out with a recovery kit (QIAGEN: 74104) by the following steps: 350 μL of buffer RLT was added to the reaction product from the previous step; 250 μL of absolute ethanol was added, and the mixture was loaded onto a column and centrifuged; 500 μL of RPE was added and the column was centrifuged; 500 μL of RPE was added again and the column was centrifuged; the empty column was spun; the product was eluted with 30 μL of water. After the concentration was measured, the product was stored at -80 °C.

2.5 Mouse embryo injection, in vitro culture and identification

A mixture of ABEmax-NG/ABEmax and sgRNA was injected into 1-cell embryos at concentrations of 100 ng/μL and 50 ng/μL, respectively. The embryos were cultured in vitro to E4.5 in KSOM medium (Millipore, MR-106-D) at 37 °C and 5% CO₂. Each embryo was transferred to a 200 μL tube and 5 μL of alkaline lysis solution (200 mM KOH/50 mM dithiothreitol) was added. After incubation at 65 °C for 10 minutes, neutralization solution (900 mM Tris-HCl, pH 8.3/300 mM KCl/200 mM HCl), 5 μL of 400 μM random primers (Genscript, Nanjing, China), 6 μL of 10× PCR buffer (Takara, Dalian, China), 3 μL of dNTPs (2.5 mM) and 1 μL of Taq polymerase (Takara, Dalian, China) were added, and water was added to 60 μL. PCR was run for 50 cycles, each cycle comprising 92 °C for 1 minute; a 2-minute extension, with the temperature ramped from 37 °C to 55 °C at 10 sec/degree; and 55 °C for 4 minutes. The amplification product was used as the PCR template; the target fragment was PCR-amplified and purified as described in 1.3, and the purified product was identified by next-generation sequencing. The results are shown in Fig. 3 and Fig. 4. ABEmax-NG can recognize NG PAMs in mouse embryos and efficiently edits endogenous gene sites, with high accuracy, few by-products and low proximal off-target editing.
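The injection mixture is specified by final concentrations (100 ng/μL for ABEmax-NG or ABEmax and 50 ng/μL for sgRNA), so the volumes to combine follow from the standard dilution relation C1·V1 = C2·V2 applied to each component. A minimal sketch is given below; the stock concentrations, the total volume and the function name `injection_mix` are hypothetical values chosen for illustration and are not given in the text.

```python
def injection_mix(editor_stock, sgrna_stock, total_ul,
                  editor_final=100.0, sgrna_final=50.0):
    """Return volumes (µL) of editor RNA, sgRNA and water for the injection mix.

    Concentrations are in ng/µL. The final concentrations default to the
    values stated in the text; the stocks are whatever was measured after
    in vitro transcription.
    """
    if editor_stock <= 0 or sgrna_stock <= 0 or total_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    # C1 * V1 = C2 * V2, solved for V1 of each stock.
    editor_ul = editor_final * total_ul / editor_stock
    sgrna_ul = sgrna_final * total_ul / sgrna_stock
    water_ul = total_ul - editor_ul - sgrna_ul
    if water_ul < 0:
        raise ValueError("stocks too dilute for the requested final concentrations")
    return {"editor RNA": editor_ul, "sgRNA": sgrna_ul, "water": water_ul}


if __name__ == "__main__":
    # Hypothetical stocks: 1000 ng/µL editor RNA, 500 ng/µL sgRNA, 20 µL total.
    for name, vol in injection_mix(1000.0, 500.0, 20.0).items():
        print(f"{name}: {vol:.2f} uL")
```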

Embodiment 4

In this embodiment, ABEmax-NG is used to modulate the RNA splicing of endogenous genes in N2a cells.

3.1 Construction of sgRNA plasmids

Four endogenous mouse genes were selected: BBS2 (ID: 67378), OFD1 (ID: 237222), MYO7A (ID: 17921) and SEPN1 (ID: 74777). sgRNAs were designed for these genes; the oligos used are listed in the appended sequence listing as SEQ ID.47-54. The sgRNA plasmids were constructed as described in 1.2.

3.2 Cell culture, transfection and identification

N2a cells (purchased from ATCC) were cultured and transfected as described in 1.3. The plasmid amounts transfected were 1 μg of ABEmax-NG and 0.5 μg of sgRNA expression vector plasmid, with ABEmax as the control. GFP-positive cells were sorted 72 hours after transfection and cultured in a 24-well plate until confluent. Part of the cells were lysed, PCR-amplified and purified as described in 1.3, and the products were identified by next-generation sequencing. Total RNA was extracted from the remaining cells by the TRIzol method and reverse-transcribed with HiScript II Q RT SuperMix (Vazyme, R222), and the reverse transcription product was PCR-amplified as described in 1.3. The RNA isoforms were separated by agarose gel electrophoresis, and the separated isoforms were identified by Sanger sequencing.

The results are shown in Fig. 5. ABEmax-NG can recognize NG PAMs in N2a cells and efficiently edits the splice sites of endogenous genes, thereby changing the RNA splicing of those genes.

Embodiment 5

In this embodiment, BBS2 splice-site mutant mice are obtained using ABEmax-NG, demonstrating that ABEmax-NG can serve as an effective tool for generating RNA splicing model mice.

4.1 Construction of sgRNA plasmid

An sgRNA was designed to edit the splice site of the mouse BBS2 gene (ID: 67378); the oligos used are listed in the appended sequence listing as SEQ ID.55-56. The PUC57-T7-sgRNA vector plasmid was constructed as described in 2.1.

4.2 In vitro transcription

The sgRNA and ABEmax-NG were transcribed in vitro as described in 2.3 and 2.4.

4.3 Mouse embryo injection and transfer

Mouse embryos were injected as described in 2.5. The injected embryos were transferred into surrogate female mice (background: ICR strain).

4.4 Mouse genotyping and detection of RNA splicing

Mouse tails were collected and genomic DNA was extracted by the phenol-chloroform method, then PCR-amplified and purified as described in 1.3, and the products were identified by Sanger sequencing. Different mouse tissues were collected and genomic DNA was extracted by the phenol-chloroform method, then PCR-amplified and purified as described in 1.3, and the products were identified by next-generation sequencing; RNA was extracted, reverse-transcribed, amplified and purified as described in 3.2, and the products were identified by next-generation sequencing. The results are shown in Fig. 6. BBS2 splice-site mutant mice were successfully obtained using ABEmax-NG, and the splice-site mutation and the corresponding change in RNA splicing were detected in every tissue examined.

In conclusion the present invention effectively overcomes various shortcoming in the prior art and has high industrial utilization value.

The above-described embodiments merely illustrate the principles and effects of the present invention, and is not intended to limit the present invention.It is any ripe The personage for knowing this technology all without departing from the spirit and scope of the present invention, carries out modifications and changes to above-described embodiment.Cause This, institute is complete without departing from the spirit and technical ideas disclosed in the present invention by those of ordinary skill in the art such as At all equivalent modifications or change, should be covered by the claims of the present invention.

Sequence listing

<110>Shanghai Science and Technology Univ.

<120>a kind of adenine base edit tool and application thereof

<160> 61

<170> SIPOSequenceListing 1.0

<210> 1

<211> 8811

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 1

atatgccaag tacgccccct attgacgtca atgacggtaa atggcccgcc tggcattatg 60

cccagtacat gaccttatgg gactttccta cttggcagta catctacgta ttagtcatcg 120

ctattaccat ggtgatgcgg ttttggcagt acatcaatgg gcgtggatag cggtttgact 180

cacggggatt tccaagtctc caccccattg acgtcaatgg gagtttgttt tggcaccaaa 240

atcaacggga ctttccaaaa tgtcgtaaca actccgcccc attgacgcaa atgggcggta 300

ggcgtgtacg gtgggaggtc tatataagca gagctggttt agtgaaccgt cagatccgct 360

agagatccgc ggccgctaat acgactcact atagggagag ccgccaccat gaaacggaca 420

gccgacggaa gcgagttcga gtcaccaaag aagaagcgga aagtctctga agtcgagttt 480

agccacgagt attggatgag gcacgcactg accctggcaa agcgagcatg ggatgaaaga 540

gaagtccccg tgggcgccgt gctggtgcac aacaatagag tgatcggaga gggatggaac 600

aggccaatcg gccgccacga ccctaccgca cacgcagaga tcatggcact gaggcaggga 660

ggcctggtca tgcagaatta ccgcctgatc gatgccaccc tgtatgtgac actggagcca 720

tgcgtgatgt gcgcaggagc aatgatccac agcaggatcg gaagagtggt gttcggagca 780

cgggacgcca agaccggcgc agcaggctcc ctgatggatg tgctgcacca ccccggcatg 840

aaccaccggg tggagatcac agagggaatc ctggcagacg agtgcgccgc cctgctgagc 900

gatttcttta gaatgcggag acaggagatc aaggcccaga agaaggcaca gagctccacc 960

gactctggag gatctagcgg aggatcctct ggaagcgaga caccaggcac aagcgagtcc 1020

gccacaccag agagctccgg cggctcctcc ggaggatcct ctgaggtgga gttttcccac 1080

gagtactgga tgagacatgc cctgaccctg gccaagaggg cacgcgatga gagggaggtg 1140

cctgtgggag ccgtgctggt gctgaacaat agagtgatcg gcgagggctg gaacagagcc 1200

atcggcctgc acgacccaac agcccatgcc gaaattatgg ccctgagaca gggcggcctg 1260

gtcatgcaga actacagact gattgacgcc accctgtacg tgacattcga gccttgcgtg 1320

atgtgcgccg gcgccatgat ccactctagg atcggccgcg tggtgtttgg cgtgaggaac 1380

gcaaaaaccg gcgccgcagg ctccctgatg gacgtgctgc actaccccgg catgaatcac 1440

cgcgtcgaaa ttaccgaggg aatcctggca gatgaatgtg ccgccctgct gtgctatttc 1500

tttcggatgc ctagacaggt gttcaatgct cagaagaagg cccagagctc caccgactcc 1560

ggaggatcta gcggaggctc ctctggctct gagacacctg gcacaagcga gagcgcaaca 1620

cctgaaagca gcgggggcag cagcgggggg tcagacaaga agtacagcat cggcctggcc 1680

atcggcacca actctgtggg ctgggccgtg atcaccgacg agtacaaggt gcccagcaag 1740

aaattcaagg tgctgggcaa caccgaccgg cacagcatca agaagaacct gatcggagcc 1800

ctgctgttcg acagcggcga aacagccgag gccacccggc tgaagagaac cgccagaaga 1860

agatacacca gacggaagaa ccggatctgc tatctgcaag agatcttcag caacgagatg 1920

gccaaggtgg acgacagctt cttccacaga ctggaagagt ccttcctggt ggaagaggat 1980

aagaagcacg agcggcaccc catcttcggc aacatcgtgg acgaggtggc ctaccacgag 2040

aagtacccca ccatctacca cctgagaaag aaactggtgg acagcaccga caaggccgac 2100

ctgcggctga tctatctggc cctggcccac atgatcaagt tccggggcca cttcctgatc 2160

gagggcgacc tgaaccccga caacagcgac gtggacaagc tgttcatcca gctggtgcag 2220

acctacaacc agctgttcga ggaaaacccc atcaacgcca gcggcgtgga cgccaaggcc 2280

atcctgtctg ccagactgag caagagcaga cggctggaaa atctgatcgc ccagctgccc 2340

ggcgagaaga agaatggcct gttcggaaac ctgattgccc tgagcctggg cctgaccccc 2400

aacttcaaga gcaacttcga cctggccgag gatgccaaac tgcagctgag caaggacacc 2460

tacgacgacg acctggacaa cctgctggcc cagatcggcg accagtacgc cgacctgttt 2520

ctggccgcca agaacctgtc cgacgccatc ctgctgagcg acatcctgag agtgaacacc 2580

gagatcacca aggcccccct gagcgcctct atgatcaaga gatacgacga gcaccaccag 2640

gacctgaccc tgctgaaagc tctcgtgcgg cagcagctgc ctgagaagta caaagagatt 2700

ttcttcgacc agagcaagaa cggctacgcc ggctacattg acggcggagc cagccaggaa 2760

gagttctaca agttcatcaa gcccatcctg gaaaagatgg acggcaccga ggaactgctc 2820

gtgaagctga acagagagga cctgctgcgg aagcagcgga ccttcgacaa cggcagcatc 2880

ccccaccaga tccacctggg agagctgcac gccattctgc ggcggcagga agatttttac 2940

ccattcctga aggacaaccg ggaaaagatc gagaagatcc tgaccttccg catcccctac 3000

tacgtgggcc ctctggccag gggaaacagc agattcgcct ggatgaccag aaagagcgag 3060

gaaaccatca ccccctggaa cttcgaggaa gtggtggaca agggcgcttc cgcccagagc 3120

ttcatcgagc ggatgaccaa cttcgataag aacctgccca acgagaaggt gctgcccaag 3180

cacagcctgc tgtacgagta cttcaccgtg tataacgagc tgaccaaagt gaaatacgtg 3240

accgagggaa tgagaaagcc cgccttcctg agcggcgagc agaaaaaggc catcgtggac 3300

ctgctgttca agaccaaccg gaaagtgacc gtgaagcagc tgaaagagga ctacttcaag 3360

aaaatcgagt gcttcgactc cgtggaaatc tccggcgtgg aagatcggtt caacgcctcc 3420

ctgggcacat accacgatct gctgaaaatt atcaaggaca aggacttcct ggacaatgag 3480

gaaaacgagg acattctgga agatatcgtg ctgaccctga cactgtttga ggacagagag 3540

atgatcgagg aacggctgaa aacctatgcc cacctgttcg acgacaaagt gatgaagcag 3600

ctgaagcggc ggagatacac cggctggggc aggctgagcc ggaagctgat caacggcatc 3660

cgggacaagc agtccggcaa gacaatcctg gatttcctga agtccgacgg cttcgccaac 3720

agaaacttca tgcagctgat ccacgacgac agcctgacct ttaaagagga catccagaaa 3780

gcccaggtgt ccggccaggg cgatagcctg cacgagcaca ttgccaatct ggccggcagc 3840

cccgccatta agaagggcat cctgcagaca gtgaaggtgg tggacgagct cgtgaaagtg 3900

atgggccggc acaagcccga gaacatcgtg atcgaaatgg ccagagagaa ccagaccacc 3960

cagaagggac agaagaacag ccgcgagaga atgaagcgga tcgaagaggg catcaaagag 4020

ctgggcagcc agatcctgaa agaacacccc gtggaaaaca cccagctgca gaacgagaag 4080

ctgtacctgt actacctgca gaatgggcgg gatatgtacg tggaccagga actggacatc 4140

aaccggctgt ccgactacga tgtggaccat atcgtgcctc agagctttct gaaggacgac 4200

tccatcgaca acaaggtgct gaccagaagc gacaagaacc ggggcaagag cgacaacgtg 4260

ccctccgaag aggtcgtgaa gaagatgaag aactactggc ggcagctgct gaacgccaag 4320

ctgattaccc agagaaagtt cgacaatctg accaaggccg agagaggcgg cctgagcgaa 4380

ctggataagg ccggcttcat caagagacag ctggtggaaa cccggcagat cacaaagcac 4440

gtggcacaga tcctggactc ccggatgaac actaagtacg acgagaatga caagctgatc 4500

cgggaagtga aagtgatcac cctgaagtcc aagctggtgt ccgatttccg gaaggatttc 4560

cagttttaca aagtgcgcga gatcaacaac taccaccacg cccacgacgc ctacctgaac 4620

gccgtcgtgg gaaccgccct gatcaaaaag taccctaagc tggaaagcga gttcgtgtac 4680

ggcgactaca aggtgtacga cgtgcggaag atgatcgcca agagcgagca ggaaatcggc 4740

aaggctaccg ccaagtactt cttctacagc aacatcatga actttttcaa gaccgagatt 4800

accctggcca acggcgagat ccggaagcgg cctctgatcg agacaaacgg cgaaaccggg 4860

gagatcgtgt gggataaggg ccgggatttt gccaccgtgc ggaaagtgct gagcatgccc 4920

caagtgaata tcgtgaaaaa gaccgaggtg cagacaggcg gcttcagcaa agagtctatc 4980

agacccaaga ggaacagcga taagctgatc gccagaaaga aggactggga ccctaagaag 5040

tacggcggct tcgtgagccc caccgtggcc tattctgtgc tggtggtggc caaagtggaa 5100

aagggcaagt ccaagaaact gaagagtgtg aaagagctgc tggggatcac catcatggaa 5160

agaagcagct tcgagaagaa tcccatcgac tttctggaag ccaagggcta caaagaagtg 5220

aaaaaggacc tgatcatcaa gctgcctaag tactccctgt tcgagctgga aaacggccgg 5280

aagagaatgc tggcctctgc cagattcctg cagaagggaa acgaactggc cctgccctcc 5340

aaatatgtga acttcctgta cctggccagc cactatgaga agctgaaggg ctcccccgag 5400

gataatgagc agaaacagct gtttgtggaa cagcacaagc actacctgga cgagatcatc 5460

gagcagatca gcgagttctc caagagagtg atcctggccg acgctaatct ggacaaagtg 5520

ctgtccgcct acaacaagca ccgggataag cccatcagag agcaggccga gaatatcatc 5580

cacctgttta ccctgaccaa tctgggagcc cctagagcct tcaagtactt tgacaccacc 5640

atcgaccgga aggtgtacag aagcaccaaa gaggtgctgg acgccaccct gatccaccag 5700

agcatcaccg gcctgtacga gacacggatc gacctgtctc agctgggagg tgactctggc 5760

ggctcaaaaa gaaccgccga cggcagcgaa ttcgagccca agaagaagag gaaagtctaa 5820

ccggtcatca tcaccatcac cattgagttt aaacccgctg atcagcctcg actgtgcctt 5880

ctagttgcca gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg 5940

ccactcccac tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt 6000

gtcattctat tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaca 6060

atagcaggca tgctggggat gcggtgggct ctatggcttc tgaggcggaa agaaccagct 6120

ggggctcgat accgtcgacc tctagctaga gcttggcgta atcatggtca tagctgtttc 6180

ctgtgtgaaa ttgttatccg ctcacaattc cacacaacat acgagccgga agcataaagt 6240

gtaaagccta gggtgcctaa tgagtgagct aactcacatt aattgcgttg cgctcactgc 6300

ccgctttcca gtcgggaaac ctgtcgtgcc agctgcatta atgaatcggc caacgcgcgg 6360

ggagaggcgg tttgcgtatt gggcgctctt ccgcttcctc gctcactgac tcgctgcgct 6420

cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata cggttatcca 6480

cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga 6540

accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct gacgagcatc 6600

acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa agataccagg 6660

cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat 6720

acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca cgctgtaggt 6780

atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa ccccccgttc 6840

agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg gtaagacacg 6900

acttatcgcc actggcagca gccactggta acaggattag cagagcgagg tatgtaggcg 6960

gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaga acagtatttg 7020

gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc tcttgatccg 7080

gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag attacgcgca 7140

gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac actcagtgga 7200

acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatc ttcacctaga 7260

tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgag taaacttggt 7320

ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgtt 7380

catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccat 7440

ctggccccag tgctgcaatg ataccgcgag acccacgctc accggctcca gatttatcag 7500

caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcct 7560

ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtt 7620

tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg 7680

cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgca 7740

aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt 7800

tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagat 7860

gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgac 7920

cgagttgctc ttgcccggcg tcaatacggg ataataccgc gccacatagc agaactttaa 7980

aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt 8040

tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactt 8100

tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa 8160

gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcattt 8220

atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa aataaacaaa 8280

taggggttcc gcgcacattt ccccgaaaag tgccacctga cgtcgacgga tcgggagatc 8340

gatctcccga tcccctaggg tcgactctca gtacaatctg ctctgatgcc gcatagttaa 8400

gccagtatct gctccctgct tgtgtgttgg aggtcgctga gtagtgcgcg agcaaaattt 8460

aagctacaac aaggcaaggc ttgaccgaca attgcatgaa gaatctgctt agggttaggc 8520

gttttgcgct gcttcgcgat gtacgggcca gatatacgcg ttgacattga ttattgacta 8580

gttattaata gtaatcaatt acggggtcat tagttcatag cccatatatg gagttccgcg 8640

ttacataact tacggtaaat ggcccgcctg gctgaccgcc caacgacccc cgcccattga 8700

cgtcaataat gacgtatgtt cccatagtaa cgccaatagg gactttccat tgacgtcaat 8760

gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat c 8811

<210> 2

<211> 4368

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 2

ggggttgggg ttgcgccttt tccaaggcag ccctgggttt gcgcagggac gcggctgctc 60

tgggcgtggt tccgggaaac gcagcggcgc cgaccctggg actcgcacat tcttcacgtc 120

cgttcgcagc gtcacccgga tcttcgccgc tacccttgtg ggccccccgg cgacgcttcc 180

tgctccgccc ctaagtcggg aaggttcctt gcggttcgcg gcgtgccgga cgtgacaaac 240

ggaagccgca cgtctcacta gtaccctcgc agacggacag cgccagggag caatggcagc 300

gcgccgaccg cgatgggctg tggccaatag cggctgctca gcagggcgcg ccgagagcag 360

cggccgggaa ggggcggtgc gggaggcggg gtgtggggcg gtagtgtggg ccctgttcct 420

gcccgcgcgg tgttccgcat tctgcaagcc tccggagcgc acgtcggcag tcggctccct 480

cgttgaccga atcaccgacc tctctcccca gggggatcca tggtgagcaa gggcgaggag 540

ctgttcaccg gggtggtgcc catcctggtc gagctggacg gcgacgtaaa cggccacaag 600

ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg gcaagctgac cctgaagttc 660

atctgcacca ccggcaagct gcccgtgccc tggcccaccc tcgtgaccac nctgacctac 720

ggcgtgtagt gcttcagccg ctaccccgac cacatgaagc agcacgactt cttcaagtcc 780

gccatgcccg aaggctacgt ccaggagcgc accatcttct tcaaggacga cggcaactac 840

aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg tgaaccgcat cgagctgaag 900

ggcatcgact tcaaggagga cggcaacatc ctggggcaca agctggagta caactacaac 960

agccacaacg tctatatcat ggccgacaag cagaagaacg gcatcaaggt gaacttcaag 1020

atccgccaca acatcgagga cggcagcgtg cagctcgccg accactacca gcagaacacc 1080

cccatcggcg acggccccgt gctgctgccc gacaaccact acctgagcac ccagtccgcc 1140

ctgagcaaag accccaacga gaagcgcgat cacatggtcc tgctggagtt cgtgaccgcc 1200

gccgggatca ctctcggcat ggacgagctg tacaagtaaa gcggccgcga ctctagatca 1260

taatcagcca taccacattt gtagaggttt tacttgcttt aaaaaacctc ccacacctcc 1320

ccctgaacct gaaacataaa atgaatgcaa ttgttgttgt taacttgttt attgcagctt 1380

ataatggtta caaataaagc aatagcatca caaatttcac aaataaagca tttttttcac 1440

tgcattctag ttgtggtttg tccaaactca tcaatgtatc ttagtcgacc gatgcccttg 1500

agagccttca acccagtcag ctccttccgg tgggcgcggg gcatgactat cgtcgccgca 1560

cttatgactg tcttctttat catgcaactc gtaggacagg tgccggcagc gctcttccgc 1620

ttcctcgctc actgactcgc tgcgctcggt cgttcggctg cggcgagcgg tatcagctca 1680

ctcaaaggcg gtaatacggt tatccacaga atcaggggat aacgcaggaa agaacatgtg 1740

agcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc gcgttgctgg cgtttttcca 1800

taggctccgc ccccctgacg agcatcacaa aaatcgacgc tcaagtcaga ggtggcgaaa 1860

cccgacagga ctataaagat accaggcgtt tccccctgga agctccctcg tgcgctctcc 1920

tgttccgacc ctgccgctta ccggatacct gtccgccttt ctcccttcgg gaagcgtggc 1980

gctttctcat agctcacgct gtaggtatct cagttcggtg taggtcgttc gctccaagct 2040

gggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc gccttatccg gtaactatcg 2100

tcttgagtcc aacccggtaa gacacgactt atcgccactg gcagcagcca ctggtaacag 2160

gattagcaga gcgaggtatg taggcggtgc tacagagttc ttgaagtggt ggcctaacta 2220

cggctacact agaagaacag tatttggtat ctgcgctctg ctgaagccag ttaccttcgg 2280

aaaaagagtt ggtagctctt gatccggcaa acaaaccacc gctggtagcg gtggtttttt 2340

tgtttgcaag cagcagatta cgcgcagaaa aaaaggatct caagaagatc ctttgatctt 2400

ttctacgggg tctgacgctc agtggaacga aaactcacgt taagggattt tggtcatgag 2460

attatcaaaa aggatcttca cctagatcct tttaaattaa aaatgaagtt ttaaatcaat 2520

ctaaagtata tatgagtaaa cttggtctga cagttaccaa tgcttaatca gtgaggcacc 2580

tatctcagcg atctgtctat ttcgttcatc catagttgcc tgactccccg tcgtgtagat 2640

aactacgata cgggagggct taccatctgg ccccagtgct gcaatgatac cgcgggaccc 2700

acgctcaccg gctccagatt tatcagcaat aaaccagcca gccggaaggg ccgagcgcag 2760

aagtggtcct gcaactttat ccgcctccat ccagtctatt aattgttgcc gggaagctag 2820

agtaagtagt tcgccagtta atagtttgcg caacgttgtt gccattgcta caggcatcgt 2880

ggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc ggttcccaac gatcaaggcg 2940

agttacatga tcccccatgt tgtgcaaaaa agcggttagc tccttcggtc ctccgatcgt 3000

tgtcagaagt aagttggccg cagtgttatc actcatggtt atggcagcac tgcataattc 3060

tcttactgtc atgccatccg taagatgctt ttctgtgact ggtgagtact caaccaagtc 3120

attctgagaa tagtgtatgc ggcgaccgag ttgctcttgc ccggcgtcaa tacgggataa 3180

taccgcgcca catagcagaa ctttaaaagt gctcatcatt ggaaaacgtt cttcggggcg 3240

aaaactctca aggatcttac cgctgttgag atccagttcg atgtaaccca ctcgtgcacc 3300

caactgatct tcagcatctt ttactttcac cagcgtttct gggtgagcaa aaacaggaag 3360

gcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa tgttgaatac tcatactctt 3420

cctttttcaa tattattgaa gcatttatca gggttattgt ctcatgagcg gatacatatt 3480

tgaatgtatt tagaaaaata aacaaatagg ggttccgcgc acatttcccc gaaaagtgcc 3540

acctgacgcg ccctgtagcg gcgcattaag cgcggcgggt gtggtggtta cgcgcagcgt 3600

gaccgctaca cttgccagcg ccctagcgcc cgctcctttc gctttcttcc cttcctttct 3660

cgccacgttc gccggctttc cccgtcaagc tctaaatcgg gggctccctt tagggttccg 3720

atttagtgct ttacggcacc tcgaccccaa aaaacttgat tagggtgatg gttcacgtag 3780

tgggccatcg ccctgataga cggtttttcg ccctttgacg ttggagtcca cgttctttaa 3840

tagtggactc ttgttccaaa ctggaacaac actcaaccct atctcggtct attcttttga 3900

tttataaggg attttgccga tttcggccta ttggttaaaa aatgagctga tttaacaaaa 3960

atttaacgcg aattttaaca aaatattaac gcttacaatt tgccattcgc cattcaggct 4020

gcgcaactgt tgggaagggc gatcggtgcg ggcctcttcg ctattacgcc agcccaagct 4080

accatgataa gtaagtaata ttaaggtacg ggaggtactt ggagcggccg caataaaata 4140

tctttatttt cattacatct gtgtgttggt tttttgtgtg aatcgatagt actaacatac 4200

gctctccatc aaaacaaaac gaaacaaaac aaactagcaa aataggctgt ccccagtgca 4260

agtgcaggtg ccagaacatt tctctatcga taggtaccga ttagtgaacg gatctcgacg 4320

gtatcgatca cgagactagc cagagatcca ctttggccgc ggctcgag 4368

<210> 3

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 3

accgagcact acacgccgta ggtc 24

<210> 4

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 4

aaacgaccta cggcgtgtag tgct 24

<210> 5

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 5

accgcaatcc agacactggt ggtc 24

<210> 6

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 6

aaacgaccac cagtgtctgg attg 24

<210> 7

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 7

accgcgggca gcgaccatag gaag 24

<210> 8

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 8

aaaccttcct atggtcgctg cccg 24

<210> 9

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 9

accgatggaa agcagacacg atag 24

<210> 10

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 10

aaacctatcg tgtctgcttt ccat 24

<210> 11

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 11

accggaacat gaactcttac gact 24

<210> 12

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 12

aaacagtcgt aagagttcat gttc 24

<210> 13

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 13

accgcctcta ttgtgctgtc atgt 24

<210> 14

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 14

aaacacatga cagcacaata gagg 24

<210> 15

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 15

accgcagcag ctcgtccttc actg 24

<210> 16

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 16

aaaccagtga aggacgagct gctg 24

<210> 17

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 17

accgtattac agaaaccagc cccg 24

<210> 18

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 18

aaaccggggc tggtttctgt aata 24

<210> 19

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 19

accgggctaa cgtgcgggag cgca 24

<210> 20

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 20

aaactgcgct cccgcacgtt agcc 24

<210> 21

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 21

accgattgat gtaatggatg cagt 24

<210> 22

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 22

aaacactgca tccattacat caat 24

<210> 23

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 23

accggtttca gaatcgaagg gtga 24

<210> 24

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 24

aaactcaccc ttcgattctg aaac 24

<210> 25

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 25

accgagacat attcctcact acaa 24

<210> 26

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 26

aaacttgtag tgaggaatat gtct 24

<210> 27

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 27

accggcgcat ggccacttcc tgtg 24

<210> 28

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 28

aaaccacagg aagtggccat gcgc 24

<210> 29

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 29

accgcttgta tcaggaccac atgc 24

<210> 30

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 30

aaacgcatgt ggtcctgata caag 24

<210> 31

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 31

accgaacgtg atggccatgt cgcc 24

<210> 32

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 32

aaacggcgac atggccatca cgtt 24

<210> 33

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 33

accgagccag actctgccga tgac 24

<210> 34

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 34

aaacgtcatc ggcagagtct ggct 24

<210> 35

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 35

accgtttgga aggaaagtgg tata 24

<210> 36

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 36

aaactatacc actttccttc caaa 24

<210> 37

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 37

taggcgggca gcgaccatag gaag 24

<210> 38

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 38

aaaccttcct atggtcgctg cccg 24

<210> 39

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 39

taggggctaa cgtgcgggag cgca 24

<210> 40

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 40

aaactgcgct cccgcacgtt agcc 24

<210> 41

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 41

taggagacat attcctcact acaa 24

<210> 42

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 42

aaacttgtag tgaggaatat gtct 24

<210> 43

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 43

taggtttgga aggaaagtgg tata 24

<210> 44

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 44

aaactatacc actttccttc caaa 24

<210> 45

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 45

tctcgcgcgt ttcggtgatg acgg 24

<210> 46

<211> 31

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 46

aaaaaaagca ccgactcggt gccacttttt c 31

<210> 47

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 47

accggttcag gttactggag acaa 24

<210> 48

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 48

aaacttgtct ccagtaacct gaac 24

<210> 49

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 49

accgctgata cctgaagtgt gtcc 24

<210> 50

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 50

aaacggacac acttcaggta tcag 24

<210> 51

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 51

accgcctcag gaggacgacc tggc 24

<210> 52

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 52

aaacgccagg tcgtcctcct gagg 24

<210> 53

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 53

accgcactca ccggaacatc acgg 24

<210> 54

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 54

aaacccgtga tgttccggtg agtg 24

<210> 55

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 55

tagggttcag gttactggag acaa 24

<210> 56

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 56

aaacttgtct ccagtaacct gaac 24
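
Many of the 24-nt oligonucleotides above appear to form annealing pairs: the first oligonucleotide carries a 4-nt overhang (accg or tagg) followed by a 20-nt spacer, and the second carries aaac followed by the reverse complement of that spacer. This reading is inferred from the sequences themselves and is not stated in the listing. A minimal Python sketch, with hypothetical function names, that checks the pattern for SEQ ID NO. 55 and 56:

```python
# Complement table for lowercase DNA, as written in the sequence listing.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_annealing_pair(top, bottom, top_overhang_len=4, bottom_overhang="aaac"):
    """Check that `bottom` equals the overhang plus the reverse complement
    of the spacer in `top`. The overhang lengths are assumptions inferred
    from the listing, not values given by the patent text."""
    top = top.replace(" ", "").lower()
    bottom = bottom.replace(" ", "").lower()
    spacer = top[top_overhang_len:]
    return bottom == bottom_overhang + reverse_complement(spacer)


# Sequences copied from SEQ ID NO. 55 and 56 (spaces are listing formatting).
SEQ_55 = "tagggttcag gttactggag acaa"
SEQ_56 = "aaacttgtct ccagtaacct gaac"

print(is_annealing_pair(SEQ_55, SEQ_56))
```

The same check can be applied to other consecutive pairs, for example SEQ ID NO. 29 and 30. Entries such as SEQ ID NO. 45 and 46 differ in length and do not follow this pattern.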

<210> 57

<211> 166

<212> PRT

<213>artificial sequence (Artificial Sequence)

<400> 57

Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr

1 5 10 15

Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala Val

20 25 30

Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro Ile

35 40 45

Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln

50 55 60

Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr

65 70 75 80

Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser

85 90 95

Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly Ala

100 105 110

Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His Arg

115 120 125

Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu

130 135 140

Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys Lys

145 150 155 160

Ala Gln Ser Ser Thr Asp

165

<210> 58

<211> 166

<212> PRT

<213>artificial sequence (Artificial Sequence)

<400> 58

Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr

1 5 10 15

Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala Val

20 25 30

Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala Ile

35 40 45

Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln

50 55 60

Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr

65 70 75 80

Val Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser

85 90 95

Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly Ala

100 105 110

Ala Gly Ser Leu Met Asp Val Leu His Tyr Pro Gly Met Asn His Arg

115 120 125

Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu

130 135 140

Cys Tyr Phe Phe Arg Met Pro Arg Gln Val Phe Asn Ala Gln Lys Lys

145 150 155 160

Ala Gln Ser Ser Thr Asp

165

<210> 59

<211> 1367

<212> PRT

<213>artificial sequence (Artificial Sequence)

<400> 59

Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly

1 5 10 15

Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys

20 25 30

Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly

35 40 45

Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys

50 55 60

Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr

65 70 75 80

Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe

85 90 95

Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His

100 105 110

Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His

115 120 125

Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser

130 135 140

Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met

145 150 155 160

Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp

165 170 175

Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn

180 185 190

Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys

195 200 205

Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu

210 215 220

Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu

225 230 235 240

Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp

245 250 255

Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp

260 265 270

Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu

275 280 285

Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile

290 295 300

Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met

305 310 315 320

Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala

325 330 335

Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp

340 345 350

Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln

355 360 365

Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly

370 375 380

Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys

385 390 395 400

Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly

405 410 415

Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu

420 425 430

Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro

435 440 445

Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met

450 455 460

Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val

465 470 475 480

Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn

485 490 495

Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu

500 505 510

Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr

515 520 525

Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys

530 535 540

Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val

545 550 555 560

Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser

565 570 575

Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr

580 585 590

Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn

595 600 605

Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu

610 615 620

Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His

625 630 635 640

Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr

645 650 655

Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys

660 665 670

Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala

675 680 685

Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys

690 695 700

Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His

705 710 715 720

Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile

725 730 735

Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg

740 745 750

His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr

755 760 765

Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu

770 775 780

Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val

785 790 795 800

Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln

805 810 815

Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu

820 825 830

Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp

835 840 845

Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly

850 855 860

Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn

865 870 875 880

Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

885 890 895

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys

900 905 910

Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys

915 920 925

His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu

930 935 940

Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys

945 950 955 960

Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu

965 970 975

Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val

980 985 990

Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val

995 1000 1005

Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser

1010 1015 1020

Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn

1025 1030 1035 1040

Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile

1045 1050 1055

Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val

1060 1065 1070

Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met

1075 1080 1085

Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe

1090 1095 1100

Ser Lys Glu Ser Ile Arg Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala

1105 1110 1115 1120

Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Val Ser Pro

1125 1130 1135

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys

1140 1145 1150

Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met

1155 1160 1165

Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys

1170 1175 1180

Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr

1185 1190 1195 1200

Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala

1205 1210 1215

Arg Phe Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro

1235 1240 1245

Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr

1250 1255 1260

Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile

1265 1270 1275 1280

Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His

1285 1290 1295

Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe

1300 1305 1310

Thr Leu Thr Asn Leu Gly Ala Pro Arg Ala Phe Lys Tyr Phe Asp Thr

1315 1320 1325

Thr Ile Asp Arg Lys Val Tyr Arg Ser Thr Lys Glu Val Leu Asp Ala

1330 1335 1340

Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp

1345 1350 1355 1360

Leu Ser Gln Leu Gly Gly Asp

1365

<210> 60

<211> 18

<212> PRT

<213>artificial sequence (Artificial Sequence)

<400> 60

Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys Arg

1 5 10 15

Lys Val

<210> 61

<211> 1803

<212> PRT

<213>artificial sequence (Artificial Sequence)

<400> 61

Met Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys

1 5 10 15

Arg Lys Val Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His

20 25 30

Ala Leu Thr Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val

35 40 45

Gly Ala Val Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn

50 55 60

Arg Pro Ile Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala

65 70 75 80

Leu Arg Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala

85 90 95

Thr Leu Tyr Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met

100 105 110

Ile His Ser Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys

115 120 125

Thr Gly Ala Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met

130 135 140

Asn His Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala

145 150 155 160

Ala Leu Leu Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala

165 170 175

Gln Lys Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly

180 185 190

Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu

195 200 205

Ser Ser Gly Gly Ser Ser Gly Gly Ser Ser Glu Val Glu Phe Ser His

210 215 220

Glu Tyr Trp Met Arg His Ala Leu Thr Leu Ala Lys Arg Ala Arg Asp

225 230 235 240

Glu Arg Glu Val Pro Val Gly Ala Val Leu Val Leu Asn Asn Arg Val

245 250 255

Ile Gly Glu Gly Trp Asn Arg Ala Ile Gly Leu His Asp Pro Thr Ala

260 265 270

His Ala Glu Ile Met Ala Leu Arg Gln Gly Gly Leu Val Met Gln Asn

275 280 285

Tyr Arg Leu Ile Asp Ala Thr Leu Tyr Val Thr Phe Glu Pro Cys Val

290 295 300

Met Cys Ala Gly Ala Met Ile His Ser Arg Ile Gly Arg Val Val Phe

305 310 315 320

Gly Val Arg Asn Ala Lys Thr Gly Ala Ala Gly Ser Leu Met Asp Val

325 330 335

Leu His Tyr Pro Gly Met Asn His Arg Val Glu Ile Thr Glu Gly Ile

340 345 350

Leu Ala Asp Glu Cys Ala Ala Leu Leu Cys Tyr Phe Phe Arg Met Pro

355 360 365

Arg Gln Val Phe Asn Ala Gln Lys Lys Ala Gln Ser Ser Thr Asp Ser

370 375 380

Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser

385 390 395 400

Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser Asp

405 410 415

Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp

420 425 430

Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val

435 440 445

Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala

450 455 460

Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg

465 470 475 480

Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu

485 490 495

Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe

500 505 510

His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu

515 520 525

Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu

530 535 540

Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr

545 550 555 560

Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile

565 570 575

Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn

580 585 590

Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln

595 600 605

Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala

610 615 620

Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile

625 630 635 640

Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile

645 650 655

Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu

660 665 670

Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp

675 680 685

Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe

690 695 700

Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu

705 710 715 720

Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile

725 730 735

Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala Leu

740 745 750

Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln

755 760 765

Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu

770 775 780

Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr

785 790 795 800

Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln

805 810 815

Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly Glu

820 825 830

Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys

835 840 845

Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr

850 855 860

Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr

865 870 875 880

Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val

885 890 895

Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe

900 905 910

Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu

915 920 925

Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val

930 935 940

Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys

945 950 955 960

Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys

965 970 975

Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val

980 985 990

Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr

995 1000 1005

His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu

1010 1015 1020

Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe

1025 1030 1035 1040

Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu

1045 1050 1055

Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly

1060 1065 1070

Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln

1075 1080 1085

Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn

1090 1095 1100

Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu

1105 1110 1115 1120

Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu

1125 1130 1135

His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu

1140 1145 1150

Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg His

1155 1160 1165

Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr

1170 1175 1180

Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu

1185 1190 1195 1200

Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu

1205 1210 1215

Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn

1220 1225 1230

Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser

1235 1240 1245

Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp

1250 1255 1260

Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys

1265 1270 1275 1280

Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr

1285 1290 1295

Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp

1300 1305 1310

Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala

1315 1320 1325

Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His

1330 1335 1340

Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn

1345 1350 1355 1360

Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu

1365 1370 1375

Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile

1380 1385 1390

Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly

1395 1400 1405

Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr

1410 1415 1420

Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu

1425 1430 1435 1440

Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile

1445 1450 1455

Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg

1460 1465 1470

Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp

1475 1480 1485

Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro

1490 1495 1500

Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser

1505 1510 1515 1520

Lys Glu Ser Ile Arg Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg

1525 1530 1535

Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Val Ser Pro Thr

1540 1545 1550

Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser

1555 1560 1565

Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu

1570 1575 1580

Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly

1585 1590 1595 1600

Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser

1605 1610 1615

Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Arg

1620 1625 1630

Phe Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn

1635 1640 1645

Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu

1650 1655 1660

Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu

1665 1670 1675 1680

Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu

1685 1690 1695

Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg

1700 1705 1710

Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr

1715 1720 1725

Leu Thr Asn Leu Gly Ala Pro Arg Ala Phe Lys Tyr Phe Asp Thr Thr

1730 1735 1740

Ile Asp Arg Lys Val Tyr Arg Ser Thr Lys Glu Val Leu Asp Ala Thr

1745 1750 1755 1760

Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu

1765 1770 1775

Ser Gln Leu Gly Gly Asp Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly

1780 1785 1790

Ser Glu Phe Glu Pro Lys Lys Lys Arg Lys Val

1795 1800
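
The 1803-residue sequence of SEQ ID NO. 61 can be read as a concatenation of the shorter sequences listed above. The sketch below tabulates one such reading; the segment labels, the linker boundaries, and the function name are this sketch's interpretation of the listing, not definitions from the patent. The C-terminal nuclear localization signal appears as a 17-residue variant of SEQ ID NO. 60 in which the Ser between Glu and Pro is absent.

```python
# Segment lengths as read from SEQ ID NO. 57-61 in this listing.
# Labels and boundaries are assumptions made for illustration.
SEGMENTS = [
    ("Met (start)", 1),
    ("NLS (SEQ ID NO. 60)", 18),
    ("ecTadA (SEQ ID NO. 57)", 166),
    ("linker", 32),
    ("ecTadA* (SEQ ID NO. 58)", 166),
    ("linker", 32),
    ("SpCas9-NG D10A nickase (SEQ ID NO. 59)", 1367),
    ("linker (Ser Gly Gly Ser)", 4),
    ("C-terminal NLS variant", 17),
]


def layout(segments):
    """Return (name, start, end) tuples with 1-based inclusive coordinates."""
    rows, pos = [], 1
    for name, length in segments:
        rows.append((name, pos, pos + length - 1))
        pos += length
    return rows


ROWS = layout(SEGMENTS)
for name, start, end in ROWS:
    print(f"{start:>5}-{end:<5} {name}")
```

The last coordinate printed should equal the <211> length of SEQ ID NO. 61, which gives a quick consistency check on the segment lengths.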

Claims

1. A fusion protein, comprising an ecTadA-ecTadA* dimer fragment and a SpCas9-NG D10A nickase segment, wherein the ecTadA-ecTadA* dimer fragment comprises an ecTadA segment and an ecTadA* segment.

2. The fusion protein according to claim 1, characterized in that the amino acid sequence of the ecTadA segment comprises:

a) the amino acid sequence shown in SEQ ID NO. 57; or

b) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO. 57 and has the function of the amino acid sequence defined in a), preferably being capable of forming a dimer with the ecTadA* segment, the dimer having adenine deaminase activity.

3. The fusion protein according to claim 1, characterized in that the amino acid sequence of the ecTadA* segment comprises:

c) the amino acid sequence shown in SEQ ID NO. 58; or

d) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO. 58 and has the function of the amino acid sequence defined in c), preferably being capable of forming a dimer with the ecTadA segment, the dimer having adenine deaminase activity.

4. The fusion protein according to claim 1, characterized in that the amino acid sequence of the SpCas9-NG D10A nickase segment comprises:

e) the amino acid sequence shown in SEQ ID NO. 59; or

f) an amino acid sequence that has 80% or more sequence similarity to SEQ ID NO. 59 and has the function of the amino acid sequence defined in e), preferably being capable of recognizing NG as the PAM.

5. The fusion protein according to claim 1, characterized in that the fusion protein comprises, in order from the 5' end to the 3' end, the ecTadA-ecTadA* dimer fragment and the SpCas9-NG D10A nickase segment;

and/or the ecTadA-ecTadA* dimer fragment comprises, in order from the 5' end to the 3' end, the ecTadA segment and the ecTadA* segment;

and/or the fusion protein further comprises a nuclear localization signal segment, wherein preferably the nuclear localization signal segment is located at the 5' end and/or the 3' end of the ecTadA-ecTadA* dimer fragment and the SpCas9-NG D10A nickase segment, and preferably the amino acid sequence of the nuclear localization signal segment is as shown in SEQ ID NO. 60.

6. The fusion protein according to claim 1, characterized in that the amino acid sequence of the fusion protein is as shown in SEQ ID NO. 61.

7. An isolated polynucleotide encoding the fusion protein according to any one of claims 1 to 6.

8. A construct comprising the isolated polynucleotide according to claim 7.

9. An expression system, which contains the construct according to claim 8 or has the exogenous polynucleotide according to claim 7 integrated into its genome.

10. The expression system according to claim 9, characterized in that the host cell of the expression system is selected from eukaryotic cells or prokaryotic cells, preferably from mouse cells or human cells, more preferably from mouse neuroblastoma cells, human embryonic kidney cells, or human cervical cancer cells, and still more preferably from N2a cells, HEK293FT cells, or HeLa cells.

11. Use of the fusion protein according to any one of claims 1 to 6, the isolated polynucleotide according to claim 7, the construct according to claim 8, or the expression system according to any one of claims 9 to 10 in gene editing.

12. The use according to claim 11, characterized in that the use is specifically a use in gene editing of eukaryotes.

13. A base editing system, comprising the fusion protein according to any one of claims 1 to 6, the base editing system further comprising an sgRNA.

14. A gene editing method, comprising: performing gene editing with the fusion protein according to any one of claims 1 to 6 or the base editing system according to claim 13.
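
Claims 2 to 4 set a threshold of 80% or more sequence similarity but do not state how similarity is measured. As a rough illustration only, the Python sketch below assumes ungapped, position-by-position identity between two equal-length sequences; an actual assessment would normally use a sequence alignment tool and a substitution matrix. The function and variable names are hypothetical. The residues are copied from SEQ ID NO. 57 (ecTadA) and SEQ ID NO. 58 (ecTadA*) in the sequence listing.

```python
def percent_identity(a, b):
    """Ungapped identity (%) between two equal-length residue lists.
    This is a simplification assumed for illustration, not the
    similarity measure defined by the claims."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# SEQ ID NO. 57 (ecTadA), three-letter codes as printed in the listing.
ECTADA = """
Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr
Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala Val
Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro Ile
Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln
Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr
Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser
Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly Ala
Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His Arg
Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu
Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys Lys
Ala Gln Ser Ser Thr Asp
""".split()

# SEQ ID NO. 58 (ecTadA*), three-letter codes as printed in the listing.
ECTADA_STAR = """
Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr
Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala Val
Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala Ile
Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln
Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr
Val Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser
Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly Ala
Ala Gly Ser Leu Met Asp Val Leu His Tyr Pro Gly Met Asn His Arg
Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu
Cys Tyr Phe Phe Arg Met Pro Arg Gln Val Phe Asn Ala Gln Lys Lys
Ala Gln Ser Ser Thr Asp
""".split()

# 1-based positions (listing numbering) where the two sequences differ.
MISMATCHES = [
    (i + 1, x, y)
    for i, (x, y) in enumerate(zip(ECTADA, ECTADA_STAR))
    if x != y
]
IDENTITY = percent_identity(ECTADA, ECTADA_STAR)

print(len(ECTADA), len(MISMATCHES), round(IDENTITY, 1))
for pos, wild_type, variant in MISMATCHES:
    print(pos, wild_type, "->", variant)
```

Under this simplified measure the two deaminase segments differ at a small number of positions and remain well above the 80% threshold with respect to each other; positions are numbered as in the listing, which starts at Ser rather than at an initiator Met.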