CN111893104A - A structure-based method for the optimized design of CRISPR proteins

Info

Publication number
CN111893104A
Authority
CN
China
Legal status
Granted
Application number
CN202010666107.5A
Other languages
English (en)
Other versions
CN111893104B (zh)
Inventor
黄强
薛冬梅
朱海霞
杜文豪
Current Assignee
Fudan University
Original Assignee
Fudan University
Application filed by Fudan University
Priority to CN202010666107.5A
Publication of CN111893104A
Application granted
Publication of CN111893104B
Legal status: Active

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases (RNases, DNases)
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/70 Vectors or expression systems specially adapted for E. coli
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Abstract

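The invention belongs to the technical field of protein engineering and specifically relates to a structure-based method for the optimized design of CRISPR proteins. Starting from solved CRISPR/Cas9 protein structures, the method first identifies, by analysis and comparison, the amino acid sites that strongly affect a given function of Cas9; it then uses Rosetta to optimize the protein-DNA interaction interface of Cas9; finally, new mutants are designed and obtained. With this method a mutant, yCas9, was obtained. Its cleavage activity is comparable to that of xCas9: it has a broad PAM recognition range (NGG, NGA, NGT, NGC, GAA and GAT) and a low off-target rate in gene editing. yCas9 is therefore functionally on a par with xCas9 and is a gene-editing tool of potential practical value, which demonstrates the effectiveness and practicality of the design method.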

Description

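A structure-based method for the optimized design of CRISPR proteins
Technical Field
The invention belongs to the technical field of protein engineering. It relates to a structure-based method for the optimized design of CRISPR proteins, to a novel SpCas9 mutant obtained by this method, and to the use of that mutant as a gene-editing tool.
Background Art
CRISPR is an immune system in bacteria and archaea that defends against invading foreign nucleic acids. In recent years it has been developed into a gene-editing tool that cleaves DNA fragments specifically [1]. The most widely used system today is Streptococcus pyogenes Cas9 (SpCas9) from the type II CRISPR system [2]. However, SpCas9 recognizes only a limited range of PAMs (protospacer adjacent motifs) and is prone to off-target cleavage during gene editing, which limits its wider application [3-5]. Broadening PAM recognition and reducing off-target effects are therefore prerequisites for SpCas9 to play a larger role in gene editing.
Several approaches have been proposed to reach these goals. One uses a bacterial selection system to screen engineered SpCas9 variants; it yielded the VQR, EQR and VRER mutants with expanded PAM recognition [6]. Another is structure-guided engineering of SpCas9, which yielded the high-fidelity mutants eSpCas9, SpCas9-HF, HypaCas9 and evoCas9 [7, 8]. In addition, PACE directed evolution produced a SpCas9 mutant (xCas9) that combines high specificity with more flexible PAM recognition [9]. Although these approaches can yield new Cas9 mutants, they involve a large element of trial and error and are time-consuming and laborious.
As computing speed has increased, many computational tools have come to be used to assist protein design. This approach is not restricted by the system under study: given the three-dimensional structure of a protein, the structural model can first be refined, and the free-energy changes of physical and chemical processes can then be computed by free-energy simulation to guide protein design and engineering. The Rosetta software suite for biomolecular structure prediction and design (https://www.rosettacommons.org) contains 140 applications and supports de novo protein structure prediction, protein structure design and protein redesign [10, 11]. Protein design with Rosetta involves less trial and error, is simpler to carry out and is more efficient.
In view of the above problems, it is highly desirable to develop a method that uses Rosetta for the optimized design of CRISPR proteins.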
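Summary of the Invention
The first object of the invention is to provide a structure-based method for the optimized design of CRISPR proteins, in order to obtain novel mutants of SpCas9.
The second object of the invention is to use this method to obtain a novel SpCas9 mutant (yCas9), providing the field of biology with a new gene-editing tool of practical value.
The method provided by the invention takes an energetic perspective: Rosetta is used to optimize the key amino acids of the Cas9 protein that interact with DNA, so as to strengthen the binding of Cas9 to DNA and thereby obtain a mutant with high PAM compatibility or a low off-target rate. This mutant is denoted yCas9.
Specifically, the invention provides a structure-based method for the optimized design of CRISPR proteins. Its workflow is shown in Figure 1, and its steps are as follows.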
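Step 1: Identify the amino acid sites that strongly affect the properties of Cas9.
The first step of the optimized design is to identify the amino acid sites that strongly affect the properties of Cas9. Protein structures are downloaded from the Protein Data Bank (http://www.rcsb.org/), and the sites that strongly affect PAM recognition or off-target activity of Cas9 are identified by analyzing and comparing the electron density maps of the proteins. Specifically, the seven directed-evolution sites mutated in xCas9 (A262T, R324L, S409I, E480K, E543D, M694I and E1219V) are distributed on both sides of the sgRNA-DNA heteroduplex (Figure 2). Acting together, these seven amino acids broaden the PAM recognition range of xCas9 and reduce its off-target effects. These seven sites are therefore selected as the sites for optimized design with Rosetta.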
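Step 2: Relax energy minimization.
Relax [12, 13] is a module in the structure-prediction part of the Rosetta software. Within the Rosetta force field, the module repacks the amino acid side chains of the structure over multiple iterations, adjusting the van der Waals interactions in each round, and searches for the locally optimal conformation by energy minimization. In this step, the structure chosen in Step 1 is therefore processed with the Relax module to bring it to its lowest-energy state.
In the invention, the SpCas9 structure (PDB ID: 4UN3) is subjected to 25 rounds of iteration with the Relax module, and the structure with the lowest energy according to the Rosetta scoring function is selected as the starting structure.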
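The selection of the lowest-energy Relax output can be sketched as follows. This is a minimal illustration, not the inventors' script: it assumes the Relax results are available as a Rosetta-style score file in which the first `SCORE:` line names the columns (including `total_score` and `description`). The model names and score values in `example` are invented placeholders.

```python
def lowest_energy_model(score_text):
    """Return (description, total_score) of the lowest-scoring model.

    Assumes a Rosetta-style score file: the first line starting with
    'SCORE:' is the column header, later 'SCORE:' lines are models.
    """
    header = None
    best = None
    for line in score_text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # column names
            continue
        row = dict(zip(header, fields))
        score = float(row["total_score"])
        if best is None or score < best[1]:
            best = (row["description"], score)
    return best


# Placeholder data for illustration only (not results from the patent).
example = """SEQUENCE:
SCORE: total_score fa_atr fa_rep description
SCORE: -3510.2 -8100.1 950.3 4un3_0001
SCORE: -3544.9 -8120.7 940.8 4un3_0002
SCORE: -3498.6 -8090.4 961.0 4un3_0003
"""

print(lowest_energy_model(example))
```

In Rosetta scoring, lower (more negative) total scores indicate more favorable models, which is why the minimum is taken.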
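Step 3: Fixbb optimization of the protein-DNA interaction interface.
Fixbb (fixed backbone design) [14, 15] is a protein-design module of the Rosetta software. It keeps the protein backbone fixed while letting the side chains rotate freely to find their best positions, and designs the amino acid best suited to each position. In this step, the structure processed in Step 2 is therefore passed to Fixbb with its backbone fixed, the sites selected in Step 1 are subjected to combinatorial design over all 20 amino acids, and the program is set to output 10,000 combinations.
Specifically, the invention uses the Fixbb module to fix the backbone atoms of the starting structure selected in Step 2 and performs combinatorial mutation design over the 20 amino acids at the seven sites identified in Step 1 (262, 324, 409, 480, 543, 694 and 1219), with the program set to output 10,000 combinations.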
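One way to express "design these seven sites, leave everything else untouched" for a fixed-backbone run is a Rosetta resfile. The sketch below generates such a file in Python. The patent does not show its input files, so this is an assumption about how the run could be set up. In particular, the chain identifier `"B"` is a hypothetical placeholder that must be checked against the actual PDB file.

```python
# Sites named in the text for combinatorial design.
DESIGN_SITES = [262, 324, 409, 480, 543, 694, 1219]


def make_resfile(sites, chain="B"):
    """Build resfile text that allows all 20 amino acids at `sites`.

    NATRO as the default keeps every other residue at its native
    rotamer; ALLAA opens a listed position to all 20 amino acids.
    `chain` is a hypothetical value, not taken from the patent.
    """
    lines = ["NATRO", "start"]
    lines += [f"{pos} {chain} ALLAA" for pos in sites]
    return "\n".join(lines) + "\n"


print(make_resfile(DESIGN_SITES))
```

A file like this would typically be passed to the fixed-backbone application through its resfile option, with the number of output designs (10,000 in the text) set through the usual output-count option.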
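Step 4: Remove duplicate results and screen mutants.
Each site can take any of the 20 amino acids, so n sites give 20^n possible combinations (20^7 = 1.28 × 10^9 for the seven sites used here). Few of these, however, are both compatible with the fixed backbone and low in energy, so the Fixbb output contains many duplicates. In this step, the 10,000 output results are compared by sequence, duplicates are removed, the remaining results are sorted by energy from low to high, and candidates are chosen from this list. The mutant finally selected has a low score, i.e. is closer to the native conformation, and is named yCas9.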
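The de-duplication and ranking described in Step 4 can be sketched as below. Each design is reduced to the residues at the seven design sites (in the order 262, 324, 409, 480, 543, 694, 1219) paired with its score. The scores are invented placeholders, and the function name and data layout are assumptions for illustration.

```python
def rank_unique_designs(designs):
    """Collapse duplicate sequences, keep each one's best (lowest)
    score, and return (sequence, score) pairs sorted by score."""
    best = {}
    for seq, score in designs:
        if seq not in best or score < best[seq]:
            best[seq] = score
    return sorted(best.items(), key=lambda item: item[1])


# Size of the full combinatorial space for seven sites.
n_sites = 7
print(20 ** n_sites)

# Placeholder scores, for illustration only.
designs = [
    ("ARNKDLT", -3550.1),  # residues of yCas9 at the seven sites
    ("ARNKDLT", -3549.8),  # duplicate sequence from another run
    ("TLIKDIV", -3541.0),  # residues of xCas9 at the seven sites
    ("ARSEEME", -3530.5),  # wild-type residues
]
for seq, score in rank_unique_designs(designs):
    print(seq, score)
```

Keeping only the lowest score per sequence is one reasonable convention; the text itself only says that duplicates are removed before sorting.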
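Step 5: Expression and validation of the mutant.
For the selected mutant, a plasmid is first constructed, the protein is then expressed and purified, and finally its cleavage activity is assayed, in order to characterize the performance of the mutant designed with the Rosetta modules Relax and Fixbb.
The yCas9 obtained by the invention has the amino acid sequence shown in SEQ ID NO.6. Relative to wild-type SpCas9, S at position 409, E at position 480, E at position 543, M at position 694 and E at position 1219 are mutated to N, K, D, L and T, respectively.
yCas9 has gene-editing activity comparable to that of xCas9: a broad PAM recognition range (NGG, NGA, NGT, NGC, GAA and GAT) and a low off-target rate in gene editing.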
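The relationship between the xCas9 and yCas9 mutation sets stated in the text can be made explicit with a short comparison. This is only a bookkeeping sketch over the mutation names given above; the helper name `parse` is hypothetical.

```python
import re

XCAS9 = ["A262T", "R324L", "S409I", "E480K", "E543D", "M694I", "E1219V"]
YCAS9 = ["S409N", "E480K", "E543D", "M694L", "E1219T"]


def parse(mutations):
    """Map position -> (wild-type residue, new residue)."""
    out = {}
    for m in mutations:
        wt, pos, new = re.fullmatch(r"([A-Z])(\d+)([A-Z])", m).groups()
        out[int(pos)] = (wt, new)
    return out


x, y = parse(XCAS9), parse(YCAS9)

# Same substitution in both mutants.
shared = sorted(p for p in y if p in x and x[p] == y[p])
# Same site, different substituted residue.
different = sorted(p for p in y if p in x and x[p] != y[p])
# xCas9 sites that remain wild type in yCas9.
wild_type_in_y = sorted(p for p in x if p not in y)

print(shared, different, wild_type_in_y)
```

The comparison shows that yCas9 shares two substitutions with xCas9, carries a different residue at three of the sites, and keeps the wild-type residue at the remaining two.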
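In the invention, the nucleotide sequence and amino acid sequence of the wild-type SpCas9 nuclease are shown in SEQ ID NO.1 and SEQ ID NO.2, respectively.
The nucleotide sequence and amino acid sequence of the xCas9 nuclease are shown in SEQ ID NO.3 and SEQ ID NO.4, respectively.
The nucleotide sequence and amino acid sequence of the yCas9 nuclease are shown in SEQ ID NO.5 and SEQ ID NO.6, respectively.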
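The pairing of each nucleotide sequence with its amino acid sequence can be checked by translating the coding sequence with the standard genetic code. The sketch below builds the codon table directly and translates the first 60 nucleotides of SEQ ID NO.1 as they appear in the sequence listing. The helper name `translate` is an assumption, and reading frame 1 is assumed.

```python
BASES = "tcag"
# Standard genetic code, codons enumerated in t, c, a, g order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) in reading frame 1."""
    dna = dna.replace(" ", "").lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 60 nt of SEQ ID NO.1, copied from the sequence listing.
seq1_start = ("ggcgacaaga agtactccat tgggctcgat "
              "atcggcacaa acagcgtcgg ctgggccgtc")
print(translate(seq1_start))
```

The result can be compared by eye with the first 20 residues of SEQ ID NO.2 (Gly Asp Lys Lys Tyr Ser Ile Gly Leu Asp ...).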
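The invention further provides a polynucleotide sequence that can be transcribed and translated into the yCas9 nuclease; its sequence is SEQ ID NO.7.
The invention further provides an expression vector containing the above polynucleotide sequence.
The invention further provides a host cell that can be transformed with the above expression vector.
The invention further provides a method for preparing the yCas9 nuclease, comprising: first, constructing an expression vector carrying the polynucleotide sequence of the yCas9 nuclease; then transforming the expression vector into host cells, and screening and picking single clones; and finally inducing expression in a single clone and isolating the yCas9 nuclease from the expression product by affinity chromatography.
The CRISPR/Cas9 (yCas9) nuclease, the polynucleotide sequence and the expression vector provided by the invention can each serve as a tool for editing genomic DNA fragments.
In the invention, the gene editing may be single-site editing or multi-site editing at two or more sites.
The editing operations include deletion, mutation, insertion, inversion, transposition, duplication and translocation.
In the invention, the CRISPR/Cas9 editing tool includes a guide sgRNA that matches the target DNA fragment.
The CRISPR/Cas9 nuclease is combined with an sgRNA capable of guiding it, so as to edit the target gene.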
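Brief Description of the Drawings
Figure 1: Flow chart of the Rosetta combinatorial mutation design.
Figure 2: Three-dimensional distribution of the xCas9 directed-evolution sites.
Figure 3: Construction of the pET21-6His-TEV-yCas9 plasmid.
Figure 4: Sequencing of the plasmid containing the mutated amino acids of yCas9.
Figure 5: Electrophoretic identification of the purified CRISPR/Cas9 target proteins.
Figure 6: On-target and off-target sgRNA sequences.
Figure 7: PAM recognition assay for SpCas9.
Figure 8: PAM recognition assay for xCas9.
Figure 9: PAM recognition assay for yCas9.
Figure 10: Off-target assay for SpCas9.
Figure 11: Off-target assay for xCas9.
Figure 12: Off-target assay for yCas9.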
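Detailed Description
Unless otherwise specified, the experimental methods used in the following examples are conventional methods.
Unless otherwise specified, the materials and reagents used in the following examples were obtained commercially.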
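I. Combinatorial mutation design of SpCas9
Step 1: Identify the amino acid sites that strongly affect the properties of Cas9.
The seven directed-evolution sites mutated in xCas9 (A262T, R324L, S409I, E480K, E543D, M694I and E1219V) are distributed on both sides of the sgRNA-DNA heteroduplex (Figure 2). Acting together, these seven amino acids broaden the PAM recognition range of xCas9 and reduce its off-target effects. These seven sites were therefore selected as the sites for optimized design with Rosetta.
Step 2: Relax energy minimization.
The SpCas9 structure (PDB ID: 4UN3) was subjected to 25 rounds of iteration with the Relax module, and the structure with the lowest energy according to the Rosetta scoring function was selected as the starting structure.
Step 3: Fixbb optimization of the protein-DNA interaction interface.
The Fixbb module was used to fix the backbone atoms of the starting structure selected in Step 2, and combinatorial mutation design over the 20 amino acids was performed at the seven sites identified in Step 1 (262, 324, 409, 480, 543, 694 and 1219), with the program set to output 10,000 combinations.
Step 4: Remove duplicate results and screen mutants.
The 10,000 combinations were compared by sequence, duplicates were removed, and the remaining results were sorted by score. The mutant finally selected has a low score, i.e. is closer to the native conformation, and was named yCas9.
Step 5: Expression and validation of the mutant.
In the yCas9 of the invention, S at position 409, E at position 480, E at position 543, M at position 694 and E at position 1219 of wild-type SpCas9 are mutated to N, K, D, L and T, respectively. yCas9 has gene-editing activity comparable to that of xCas9. The yCas9 contains the amino acid sequence shown in SEQ ID NO.6.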
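II. Plasmid construction, protein expression and purification, and assays for the yCas9 mutant
(1) yCas9 plasmid construction: Using plasmids pET21-6His-TEV-SpCas9 (SEQ ID NO.8) and pET21-6His-TEV-xCas9 (SEQ ID NO.9) as templates, the mutant plasmid was constructed by one-step directional cloning and point mutation; the construction strategy is shown in Figure 3. After construction, the plasmid was sequenced by Shanghai Jieli Biotechnology Co., Ltd. (上海杰李生物技术有限公司). The result, shown in Figure 4, confirms that the yCas9 plasmid was constructed successfully.
(2) yCas9 protein expression and purification: First, the yCas9 plasmid obtained above was transformed into Rosetta(DE3) cells for expression. When the OD600 of the culture reached 0.9, IPTG was added to a final concentration of 0.5 μM and the culture was grown overnight at 16 °C. Next, the cells were collected by centrifugation, resuspended in lysis buffer, disrupted by sonication for 20 min and centrifuged at 12,000 rpm for 1 h. The protein was then purified by Ni-column affinity chromatography. Finally, the fractions were checked by SDS-PAGE, and the target protein was collected, concentrated to 250 μL and stored in protein storage buffer. The concentrated protein was checked again by electrophoresis; as shown in Figure 5, very little contaminating protein remains in the yCas9 preparation, so it can be used in the subsequent experiments.
(3) Preparation of substrate DNA: The substrate DNA sequence is shown in SEQ ID NO.10 and was obtained by PCR amplification and gel extraction. The substrate DNA is 920 bp long and contains a 20 bp target sequence and a PAM sequence, the PAM being 5'-TGG-3', 5'-TGA-3', 5'-TGC-3', 5'-TGT-3', 5'-GAA-3' or 5'-GAT-3'. The original substrate DNA carries the PAM TGG (SEQ ID NO.10); DNA carrying the other PAM sequences was obtained by point mutation.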
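(4) Preparation of sgRNA: The sgRNA sequence is shown in SEQ ID NO.11 and was obtained by PCR amplification, in vitro transcription and purification. In the wild-type sgRNA (SEQ ID NO.11), the spacer that matches the target sequence of the substrate DNA is 5'-UACCGCUCCAGUCGUUCAUG-3' (SEQ ID NO.12). Starting from the PAM-distal end, two bases of the sgRNA spacer at a time were mutated to their complementary bases, eight times in total, giving nine sgRNAs including the wild type; their sequences are shown in Figure 6. These sgRNAs are used for off-target detection.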
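The construction of the mismatched sgRNA panel can be sketched as follows. Figure 6 is not reproduced here, so the exact layout of the mismatches is an assumption: the sketch assumes that each variant carries a single two-base mismatch, and that the mismatch window moves two bases at a time from the 5' (PAM-distal) end of the spacer. If the mutations in Figure 6 are cumulative instead, the loop would need to mutate all windows up to the current one.

```python
SPACER = "UACCGCUCCAGUCGUUCAUG"  # SEQ ID NO.12, written 5'->3'
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def mismatch_variants(spacer, window=2, count=8):
    """Return [wild type, variant 1, ..., variant `count`].

    ASSUMPTION: variant i replaces only bases (i-1)*window to
    i*window - 1 (0-based, from the 5' end) with their complements.
    """
    variants = [spacer]
    for i in range(count):
        start = i * window
        mutated = spacer[start:start + window].translate(COMPLEMENT)
        variants.append(spacer[:start] + mutated + spacer[start + window:])
    return variants


for number, seq in enumerate(mismatch_variants(SPACER)):
    print(number, seq)
```

The 5' end of the spacer is treated as PAM-distal because the PAM lies immediately 3' of the protospacer for SpCas9.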
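(5) PAM recognition assay for yCas9: Substrate DNAs carrying different PAM sequences were used to test the PAM recognition of yCas9. The 20 μL reaction contained 200 nM Cas9, 200 nM sgRNA, 30 nM DNA and reaction buffer (20 mM HEPES, 150 mM KCl, 1 mM DTT, 10 mM MgCl2, pH 7.5). Cas9 and sgRNA were first added to the reaction buffer, mixed well and pre-incubated at 37 °C for 10 min; the DNA was then added and incubation continued for 2 h. After the reaction, substrate cleavage by Cas9 was analyzed by agarose gel electrophoresis; the results are shown in Figures 7, 8 and 9. Wild-type SpCas9 mainly recognizes TGG, TGA and TGC, showing that the range of PAMs it can recognize is very limited (Figure 7, Lanes 2-4). Figure 8 shows that xCas9 recognizes not only TGG, TGA and TGC but also TGT, GAA and GAT. The cleavage activity of yCas9 on DNA containing these PAM sequences is comparable to that of xCas9 (Figure 9). Notably, yCas9 recognizes GAT better than xCas9 does (Figure 9, Lane 7). These results show that the Rosetta-designed yCas9 mutant has a broader PAM recognition range than wild-type SpCas9 and recognizes GAT better than xCas9.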
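As a worked example of the reaction composition, the amounts of each component in the 20 μL reaction follow directly from concentration times volume. The helper name `picomoles` is hypothetical; the concentrations and volume are the ones stated in the text.

```python
def picomoles(conc_nM, volume_uL):
    """Amount in pmol: nM x uL gives fmol, so divide by 1000."""
    return conc_nM * volume_uL / 1000.0


VOLUME_UL = 20
components_nM = {"Cas9": 200, "sgRNA": 200, "DNA": 30}

for name, conc in components_nM.items():
    print(f"{name}: {picomoles(conc, VOLUME_UL):.1f} pmol")

# Molar excess of the Cas9:sgRNA complex over substrate DNA.
print(round(components_nM["Cas9"] / components_nM["DNA"], 2))
```

With these values the reaction holds 4 pmol each of Cas9 and sgRNA and 0.6 pmol of DNA, so the nuclease complex is in roughly 6.7-fold molar excess over the substrate.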
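(6) Off-target assay for yCas9: Different sgRNAs (sgRNAs 0 to 8 in Figure 6) were used to measure the in vitro cleavage activity of yCas9 and thereby assess its off-target effects. The 20 μL reaction contained 200 nM Cas9, 200 nM sgRNA, 30 nM DNA and reaction buffer (20 mM HEPES, 150 mM KCl, 1 mM DTT, 10 mM MgCl2, pH 7.5). Cas9 and sgRNA were first added to the reaction buffer, mixed well and pre-incubated at 37 °C for 10 min; the DNA was then added and incubation continued for 3 h. After the reaction, substrate cleavage by Cas9 was analyzed by agarose gel electrophoresis; the results are shown in Figures 10, 11 and 12. sgRNAs 1 to 8 can all guide SpCas9 to cleave the substrate DNA. The cleavage activity guided by sgRNAs 1 to 5 is comparable to that guided by sgRNA 0 (Figure 10, Lanes 3-8), indicating that SpCas9 has a pronounced off-target effect. Compared with SpCas9, the off-target efficiency of xCas9 is clearly lower: with sgRNAs 1 to 8, the cleavage activity of xCas9 on the substrate is reduced in every case (Figure 11, Lanes 3-11), and lanes 9 and 11 show that sgRNAs 6 and 8 barely guide xCas9 to cleave the substrate DNA. Figure 12 shows that the off-target behavior of yCas9 is comparable to that of xCas9; within the same reaction time, cleavage of DNA by yCas9 is almost identical to that by xCas9. The off-target rate of yCas9 is therefore far lower than that of wild-type SpCas9.
In summary, the yCas9 mutant is functionally comparable to xCas9: it both broadens PAM recognition and reduces off-target effects in gene editing. It is a gene-editing tool of potential practical value and can improve the practicality and effectiveness of CRISPR-mediated gene editing. The structure-based rational design method described here is therefore feasible and effective, and offers a new route for developing multifunctional Cas9 proteins with multi-PAM recognition and low off-target activity.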
References
[1] Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes[J]. Nat Biotechnol, 2014, 32(4): 347-355.
[2] Lier C, Baticle E, Horvath P, et al. Analysis of the type II-A CRISPR-Cas system of Streptococcus agalactiae reveals distinctive features according to genetic lineages[J]. Front Genet, 2015, 6: 214.
[3] Hsu PD, Scott DA, Weinstein JA, et al. DNA targeting specificity of RNA-guided Cas9 nucleases[J]. Nat Biotechnol, 2013, 31(9): 827-832.
[4] Fu Y, Sander JD, Reyon D, et al. Improving CRISPR-Cas nuclease specificity using truncated guide RNAs[J]. Nat Biotechnol, 2014, 32(3): 279-284.
[5] Cong L, Ran FA, Cox D, et al. Multiplex genome engineering using CRISPR/Cas systems[J]. Science, 2013, 339(6121): 819-823.
[6] Kleinstiver BP, Prew MS, Tsai SQ, et al. Engineered CRISPR/Cas9 nucleases with altered PAM specificities[J]. Nature, 2015, 523(7561): 481-485.
[7] Chen JS, Dagdas YS, Kleinstiver BP, et al. Enhanced proofreading governs CRISPR/Cas9 targeting accuracy[J]. Nature, 2017, 550(7676): 407-410.
[8] Kleinstiver BP, Pattanayak V, Prew MS, et al. High-fidelity CRISPR/Cas9 nucleases with no detectable genome-wide off-target effects[J]. Nature, 2016, 529(7587): 490-495.
[9] Hu JH, Miller SM, Geurts MH, et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity[J]. Nature, 2018, 556(7699): 57-63.
[10] Leaver-Fay A, Tyka M, Lewis SM, et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules[J]. Methods Enzymol, 2011, 487: 545-574.
[11] Broo KS, Brive L, Ahlberg P, et al. Catalysis of hydrolysis and transesterification reactions of p-nitrophenyl esters by a designed helix-loop-helix dimer[J]. J Am Chem Soc, 1997, 119: 11362-11372.
[12] Nivon LG, Moretti R, Baker D. A Pareto-optimal refinement method for protein design scaffolds[J]. PLoS One, 2013, 8(4): e59004.
[13] Conway P, Tyka MD, DiMaio F, et al. Relaxation of backbone bond geometry improves protein energy landscape modeling[J]. Protein Sci, 2014, 23(1): 47-55.
[14] Kuhlman B, Dantas G, Ireton GC, et al. Design of a novel globular protein fold with atomic-level accuracy[J]. Science, 2003, 302(5649): 1364-1368.
[15] Hu X, Wang H, Ke H, et al. High-resolution design of a protein loop[J]. Proc Natl Acad Sci U S A, 2007, 104(45): 17668-17673.
Sequence Listing
<110> Fudan University
<120> A structure-based method for the optimized design of CRISPR proteins
<130> 001
<160> 12
<170> SIPOSequenceListing 1.0
<210> 1
<211> 4104
<212> DNA
<213> Artificial Sequence
<400> 1
ggcgacaaga agtactccat tgggctcgat atcggcacaa acagcgtcgg ctgggccgtc 60
attacggacg agtacaaggt gccgagcaaa aaattcaaag ttctgggcaa taccgatcgc 120
cacagcataa agaagaacct cattggcgcc ctcctgttcg actccgggga gacggccgaa 180
gccacgcggc tcaaaagaac agcacggcgc agatataccc gcagaaagaa tcggatctgc 240
tacctgcagg agatctttag taatgagatg gctaaggtgg atgactcttt cttccatagg 300
ctggaggagt cctttttggt ggaggaggat aaaaagcacg agcgccaccc aatctttggc 360
aatatcgtgg acgaggtggc gtaccatgaa aagtacccaa ccatatatca tctgaggaag 420
aagcttgtag acagtactga taaggctgac ttgcggttga tctatctcgc gctggcgcat 480
atgatcaaat ttcggggaca cttcctcatc gagggggacc tgaacccaga caacagcgat 540
gtcgacaaac tctttatcca actggttcag acttacaatc agcttttcga agagaacccg 600
atcaacgcat ccggagttga cgccaaagca atcctgagcg ctaggctgtc caaatcccgg 660
cggctcgaaa acctcatcgc acagctccct ggggagaaga agaacggcct gtttggtaat 720
cttatcgccc tgtcactcgg gctgaccccc aactttaaat ctaacttcga cctggccgaa 780
gatgccaagc ttcaactgag caaagacacc tacgatgatg atctcgacaa tctgctggcc 840
cagatcggcg accagtacgc agaccttttt ttggcggcaa agaacctgtc agacgccatt 900
ctgctgagtg atattctgcg agtgaacacg gagatcacca aagctccgct gagcgctagt 960
atgatcaagc gctatgatga gcaccaccaa gacttgactt tgctgaaggc ccttgtcaga 1020
cagcaactgc ctgagaagta caaggaaatt ttcttcgatc agtctaaaaa tggctacgcc 1080
ggatacattg acggcggagc aagccaggag gaattttaca aatttattaa gcccatcttg 1140
gaaaaaatgg acggcaccga ggagctgctg gtaaagctta acagagaaga tctgttgcgc 1200
aaacagcgca ctttcgacaa tggaagcatc ccccaccaga ttcacctggg cgaactgcac 1260
gctatcctca ggcggcaaga ggatttctac ccctttttga aagataacag ggaaaagatt 1320
gagaaaatcc tcacatttcg gataccctac tatgtaggcc ccctcgcccg gggaaattcc 1380
agattcgcgt ggatgactcg caaatcagaa gagaccatca ctccctggaa cttcgaggaa 1440
gtcgtggata agggggcctc tgcccagtcc ttcatcgaaa ggatgactaa ctttgataaa 1500
aatctgccta acgaaaaggt gcttcctaaa cactctctgc tgtacgagta cttcacagtt 1560
tataacgagc tcaccaaggt caaatacgtc acagaaggga tgagaaagcc agcattcctg 1620
tctggagagc agaagaaagc tatcgtggac ctcctcttca agacgaaccg gaaagttacc 1680
gtgaaacagc tcaaagaaga ctatttcaaa aagattgaat gtttcgactc tgttgaaatc 1740
agcggagtgg aggatcgctt caacgcatcc ctgggaacgt atcacgatct cctgaaaatc 1800
attaaagaca aggacttcct ggacaatgag gagaacgagg acattcttga ggacattgtc 1860
ctcaccctta cgttgtttga agatagggag atgattgaag aacgcttgaa aacttacgct 1920
catctcttcg acgacaaagt catgaaacag ctcaagaggc gccgatatac aggatggggg 1980
cggctgtcaa gaaaactgat caatgggatc cgagacaagc agagtggaaa gacaatcctg 2040
gattttctta agtccgatgg atttgccaac cggaacttca tgcagttgat ccatgatgac 2100
tctctcacct ttaaggagga catccagaaa gcacaagttt ctggccaggg ggacagtctt 2160
cacgagcaca tcgctaatct tgcaggtagc ccagctatca aaaagggaat actgcagacc 2220
gttaaggtcg tggatgaact cgtcaaagta atgggaaggc ataagcccga gaatatcgtt 2280
atcgagatgg cccgagagaa ccaaactacc cagaagggac agaagaacag tagggaaagg 2340
atgaagagga ttgaagaggg tataaaagaa ctggggtccc aaatccttaa ggaacaccca 2400
gttgaaaaca cccagcttca gaatgagaag ctctacctgt actacctgca gaacggcagg 2460
gacatgtacg tggatcagga actggacatc aatcggctct ccgactacga cgtggatcat 2520
atcgtgcccc agtcttttct caaagatgat tctattgata ataaagtgtt gacaagatcc 2580
gataaaaata gagggaagag tgataacgtc ccctcagaag aagttgtcaa gaaaatgaaa 2640
aattattggc ggcagctgct gaacgccaaa ctgatcacac aacggaagtt cgataatctg 2700
actaaggctg aacgaggtgg cctgtctgag ttggataaag caggcttcat caaaaggcag 2760
cttgttgaga cacgccagat caccaagcac gtggcccaaa ttctcgattc acgcatgaac 2820
accaagtacg atgaaaatga caaactgatt cgagaggtga aagttattac tctgaagtct 2880
aagctggtct cagatttcag aaaggacttt cagttttata aggtgagaga gatcaacaat 2940
taccaccatg cgcatgatgc ctacctgaat gcagtggtag gcactgcact tatcaaaaaa 3000
tatcccaagc ttgaatctga atttgtttac ggagactata aagtgtacga tgttaggaaa 3060
atgatcgcaa agtctgagca ggaaataggc aaggccaccg ctaagtactt cttttacagc 3120
aatattatga attttttcaa gaccgagatt acactggcca atggagagat tcggaagcga 3180
ccacttatcg aaacaaacgg agaaacagga gaaatcgtgt gggacaaggg tagggatttc 3240
gcgacagtcc ggaaggtcct gtccatgccg caggtgaaca tcgttaaaaa gaccgaagta 3300
cagaccggag gcttctccaa ggaaagtatc ctcccgaaaa ggaacagcga caagctgatc 3360
gcacgcaaaa aagattggga ccccaagaaa tacggcggat tcgattctcc tacagtcgct 3420
tacagtgtac tggttgtggc caaagtggag aaagggaagt ctaaaaaact caaaagcgtc 3480
aaggaactgc tgggcatcac aatcatggag cgatcaagct tcgaaaaaaa ccccatcgac 3540
tttctcgagg cgaaaggata taaagaggtc aaaaaagacc tcatcattaa gcttcccaag 3600
tactctctct ttgagcttga aaacggccgg aaacgaatgc tcgctagtgc gggcgagctg 3660
cagaaaggta acgagctggc actgccctct aaatacgtta atttcttgta tctggccagc 3720
cactatgaaa agctcaaagg gtctcccgaa gataatgagc agaagcagct gttcgtggaa 3780
caacacaaac actaccttga tgagatcatc gagcaaataa gcgaattctc caaaagagtg 3840
atcctcgccg acgctaacct cgataaggtg ctttctgctt acaataagca cagggataag 3900
cccatcaggg agcaggcaga aaacattatc cacttgttta ctctgaccaa cttgggcgcg 3960
cctgcagcct tcaagtactt cgacaccacc atagacagaa agcggtacac ctctacaaag 4020
gaggtcctgg acgccacact gattcatcag tcaattacgg ggctctatga aacaagaatc 4080
gacctctctc agctcggtgg agac 4104
<210> 2
<211> 1368
<212> PRT
<213> Artificial Sequence
<400> 2
Gly Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser
1025 1030 1035 1040
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu
1045 1050 1055
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1060 1065 1070
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser
1075 1080 1085
Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly
1090 1095 1100
Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile
1105 1110 1115 1120
Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser
1125 1130 1135
Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1140 1145 1150
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile
1155 1160 1165
Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala
1170 1175 1180
Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys
1185 1190 1195 1200
Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser
1205 1210 1215
Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr
1220 1225 1230
Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val
1265 1270 1275 1280
Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys
1285 1290 1295
His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1300 1305 1310
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp
1315 1320 1325
Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp
1330 1335 1340
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile
1345 1350 1355 1360
Asp Leu Ser Gln Leu Gly Gly Asp
1365
<210> 3
<211> 4104
<212> DNA
<213> Artificial Sequence
<400> 3
ggcgacaaga agtactccat tgggctcgat atcggcacaa acagcgtcgg ctgggccgtc 60
attacggacg agtacaaggt gccgagcaaa aaattcaaag ttctgggcaa taccgatcgc 120
cacagcataa agaagaacct cattggcgcc ctcctgttcg actccgggga gacggccgaa 180
gccacgcggc tcaaaagaac agcacggcgc agatataccc gcagaaagaa tcggatctgc 240
tacctgcagg agatctttag taatgagatg gctaaggtgg atgactcttt cttccatagg 300
ctggaggagt cctttttggt ggaggaggat aaaaagcacg agcgccaccc aatctttggc 360
aatatcgtgg acgaggtggc gtaccatgaa aagtacccaa ccatatatca tctgaggaag 420
aagcttgtag acagtactga taaggctgac ttgcggttga tctatctcgc gctggcgcat 480
atgatcaaat ttcggggaca cttcctcatc gagggggacc tgaacccaga caacagcgat 540
gtcgacaaac tctttatcca actggttcag acttacaatc agcttttcga agagaacccg 600
atcaacgcat ccggagttga cgccaaagca atcctgagcg ctaggctgtc caaatcccgg 660
cggctcgaaa acctcatcgc acagctccct ggggagaaga agaacggcct gtttggtaat 720
cttatcgccc tgtcactcgg gctgaccccc aactttaaat ctaacttcga cctggccgaa 780
gataccaagc ttcaactgag caaagacacc tacgatgatg atctcgacaa tctgctggcc 840
cagatcggcg accagtacgc agaccttttt ttggcggcaa agaacctgtc agacgccatt 900
ctgctgagtg atattctgcg agtgaacacg gagatcacca aagctccgct gagcgctagt 960
atgatcaagc tgtatgatga gcaccaccaa gacttgactt tgctgaaggc ccttgtcaga 1020
cagcaactgc ctgagaagta caaggaaatt ttcttcgatc agtctaaaaa tggctacgcc 1080
ggatacattg acggcggagc aagccaggag gaattttaca aatttattaa gcccatcttg 1140
gaaaaaatgg acggcaccga ggagctgctg gtaaagctta acagagaaga tctgttgcgc 1200
aaacagcgca ctttcgacaa tggaattatc ccccaccaga ttcacctggg cgaactgcac 1260
gctatcctca ggcggcaaga ggatttctac ccctttttga aagataacag ggaaaagatt 1320
gagaaaatcc tcacatttcg gataccctac tatgtaggcc ccctcgcccg gggaaattcc 1380
agattcgcgt ggatgactcg caaatcagaa gagaccatca ctccctggaa cttcgagaaa 1440
gtcgtggata agggggcctc tgcccagtcc ttcatcgaaa ggatgactaa ctttgataaa 1500
aatctgccta acgaaaaggt gcttcctaaa cactctctgc tgtacgagta cttcacagtt 1560
tataacgagc tcaccaaggt caaatacgtc acagaaggga tgagaaagcc agcattcctg 1620
tctggagatc agaagaaagc tatcgtggac ctcctcttca agacgaaccg gaaagttacc 1680
gtgaaacagc tcaaagaaga ctatttcaaa aagattgaat gtttcgactc tgttgaaatc 1740
agcggagtgg aggatcgctt caacgcatcc ctgggaacgt atcacgatct cctgaaaatc 1800
attaaagaca aggacttcct ggacaatgag gagaacgagg acattcttga ggacattgtc 1860
ctcaccctta cgttgtttga agatagggag atgattgaag aacgcttgaa aacttacgct 1920
catctcttcg acgacaaagt catgaaacag ctcaagaggc gccgatatac aggatggggg 1980
cggctgtcaa gaaaactgat caatgggatc cgagacaagc agagtggaaa gacaatcctg 2040
gattttctta agtccgatgg atttgccaac cggaacttca ttcagttgat ccatgatgac 2100
tctctcacct ttaaggagga catccagaaa gcacaagttt ctggccaggg ggacagtctt 2160
cacgagcaca tcgctaatct tgcaggtagc ccagctatca aaaagggaat actgcagacc 2220
gttaaggtcg tggatgaact cgtcaaagta atgggaaggc ataagcccga gaatatcgtt 2280
atcgagatgg cccgagagaa ccaaactacc cagaagggac agaagaacag tagggaaagg 2340
atgaagagga ttgaagaggg tataaaagaa ctggggtccc aaatccttaa ggaacaccca 2400
gttgaaaaca cccagcttca gaatgagaag ctctacctgt actacctgca gaacggcagg 2460
gacatgtacg tggatcagga actggacatc aatcggctct ccgactacga cgtggatcat 2520
atcgtgcccc agtcttttct caaagatgat tctattgata ataaagtgtt gacaagatcc 2580
gataaaaata gagggaagag tgataacgtc ccctcagaag aagttgtcaa gaaaatgaaa 2640
aattattggc ggcagctgct gaacgccaaa ctgatcacac aacggaagtt cgataatctg 2700
actaaggctg aacgaggtgg cctgtctgag ttggataaag caggcttcat caaaaggcag 2760
cttgttgaga cacgccagat caccaagcac gtggcccaaa ttctcgattc acgcatgaac 2820
accaagtacg atgaaaatga caaactgatt cgagaggtga aagttattac tctgaagtct 2880
aagctggtct cagatttcag aaaggacttt cagttttata aggtgagaga gatcaacaat 2940
taccaccatg cgcatgatgc ctacctgaat gcagtggtag gcactgcact tatcaaaaaa 3000
tatcccaagc ttgaatctga atttgtttac ggagactata aagtgtacga tgttaggaaa 3060
atgatcgcaa agtctgagca ggaaataggc aaggccaccg ctaagtactt cttttacagc 3120
aatattatga attttttcaa gaccgagatt acactggcca atggagagat tcggaagcga 3180
ccacttatcg aaacaaacgg agaaacagga gaaatcgtgt gggacaaggg tagggatttc 3240
gcgacagtcc ggaaggtcct gtccatgccg caggtgaaca tcgttaaaaa gaccgaagta 3300
cagaccggag gcttctccaa ggaaagtatc ctcccgaaaa ggaacagcga caagctgatc 3360
gcacgcaaaa aagattggga ccccaagaaa tacggcggat tcgattctcc tacagtcgct 3420
tacagtgtac tggttgtggc caaagtggag aaagggaagt ctaaaaaact caaaagcgtc 3480
aaggaactgc tgggcatcac aatcatggag cgatcaagct tcgaaaaaaa ccccatcgac 3540
tttctcgagg cgaaaggata taaagaggtc aaaaaagacc tcatcattaa gcttcccaag 3600
tactctctct ttgagcttga aaacggccgg aaacgaatgc tcgctagtgc gggcgttctg 3660
cagaaaggta acgagctggc actgccctct aaatacgtta atttcttgta tctggccagc 3720
cactatgaaa agctcaaagg gtctcccgaa gataatgagc agaagcagct gttcgtggaa 3780
caacacaaac actaccttga tgagatcatc gagcaaataa gcgaattctc caaaagagtg 3840
atcctcgccg acgctaacct cgataaggtg ctttctgctt acaataagca cagggataag 3900
cccatcaggg agcaggcaga aaacattatc cacttgttta ctctgaccaa cttgggcgcg 3960
cctgcagcct tcaagtactt cgacaccacc atagacagaa agcggtacac ctctacaaag 4020
gaggtcctgg acgccacact gattcatcag tcaattacgg ggctctatga aacaagaatc 4080
gacctctctc agctcggtgg agac 4104
<210> 4
<211> 1368
<212> PRT
<213> Artificial Sequence
<400> 4
Gly Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Thr Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Leu Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ile Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Lys
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Asp Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Ile Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser
1025 1030 1035 1040
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu
1045 1050 1055
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1060 1065 1070
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser
1075 1080 1085
Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly
1090 1095 1100
Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile
1105 1110 1115 1120
Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser
1125 1130 1135
Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1140 1145 1150
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile
1155 1160 1165
Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala
1170 1175 1180
Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys
1185 1190 1195 1200
Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser
1205 1210 1215
Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr
1220 1225 1230
Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val
1265 1270 1275 1280
Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys
1285 1290 1295
His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1300 1305 1310
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp
1315 1320 1325
Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp
1330 1335 1340
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile
1345 1350 1355 1360
Asp Leu Ser Gln Leu Gly Gly Asp
1365
<210> 5
<211> 4104
<212> DNA
<213> Artificial Sequence
<400> 5
ggcgacaaga agtactccat tgggctcgat atcggcacaa acagcgtcgg ctgggccgtc 60
attacggacg agtacaaggt gccgagcaaa aaattcaaag ttctgggcaa taccgatcgc 120
cacagcataa agaagaacct cattggcgcc ctcctgttcg actccgggga gacggccgaa 180
gccacgcggc tcaaaagaac agcacggcgc agatataccc gcagaaagaa tcggatctgc 240
tacctgcagg agatctttag taatgagatg gctaaggtgg atgactcttt cttccatagg 300
ctggaggagt cctttttggt ggaggaggat aaaaagcacg agcgccaccc aatctttggc 360
aatatcgtgg acgaggtggc gtaccatgaa aagtacccaa ccatatatca tctgaggaag 420
aagcttgtag acagtactga taaggctgac ttgcggttga tctatctcgc gctggcgcat 480
atgatcaaat ttcggggaca cttcctcatc gagggggacc tgaacccaga caacagcgat 540
gtcgacaaac tctttatcca actggttcag acttacaatc agcttttcga agagaacccg 600
atcaacgcat ccggagttga cgccaaagca atcctgagcg ctaggctgtc caaatcccgg 660
cggctcgaaa acctcatcgc acagctccct ggggagaaga agaacggcct gtttggtaat 720
cttatcgccc tgtcactcgg gctgaccccc aactttaaat ctaacttcga cctggccgaa 780
gatgccaagc ttcaactgag caaagacacc tacgatgatg atctcgacaa tctgctggcc 840
cagatcggcg accagtacgc agaccttttt ttggcggcaa agaacctgtc agacgccatt 900
ctgctgagtg atattctgcg agtgaacacg gagatcacca aagctccgct gagcgctagt 960
atgatcaagc gctatgatga gcaccaccaa gacttgactt tgctgaaggc ccttgtcaga 1020
cagcaactgc ctgagaagta caaggaaatt ttcttcgatc agtctaaaaa tggctacgcc 1080
ggatacattg acggcggagc aagccaggag gaattttaca aatttattaa gcccatcttg 1140
gaaaaaatgg acggcaccga ggagctgctg gtaaagctta acagagaaga tctgttgcgc 1200
aaacagcgca ctttcgacaa tggaaatatc ccccaccaga ttcacctggg cgaactgcac 1260
gctatcctca ggcggcaaga ggatttctac ccctttttga aagataacag ggaaaagatt 1320
gagaaaatcc tcacatttcg gataccctac tatgtaggcc ccctcgcccg gggaaattcc 1380
agattcgcgt ggatgactcg caaatcagaa gagaccatca ctccctggaa cttcgagaaa 1440
gtcgtggata agggggcctc tgcccagtcc ttcatcgaaa ggatgactaa ctttgataaa 1500
aatctgccta acgaaaaggt gcttcctaaa cactctctgc tgtacgagta cttcacagtt 1560
tataacgagc tcaccaaggt caaatacgtc acagaaggga tgagaaagcc agcattcctg 1620
tctggagatc agaagaaagc tatcgtggac ctcctcttca agacgaaccg gaaagttacc 1680
gtgaaacagc tcaaagaaga ctatttcaaa aagattgaat gtttcgactc tgttgaaatc 1740
agcggagtgg aggatcgctt caacgcatcc ctgggaacgt atcacgatct cctgaaaatc 1800
attaaagaca aggacttcct ggacaatgag gagaacgagg acattcttga ggacattgtc 1860
ctcaccctta cgttgtttga agatagggag atgattgaag aacgcttgaa aacttacgct 1920
catctcttcg acgacaaagt catgaaacag ctcaagaggc gccgatatac aggatggggg 1980
cggctgtcaa gaaaactgat caatgggatc cgagacaagc agagtggaaa gacaatcctg 2040
gattttctta agtccgatgg atttgccaac cggaacttcc tgcagttgat ccatgatgac 2100
tctctcacct ttaaggagga catccagaaa gcacaagttt ctggccaggg ggacagtctt 2160
cacgagcaca tcgctaatct tgcaggtagc ccagctatca aaaagggaat actgcagacc 2220
gttaaggtcg tggatgaact cgtcaaagta atgggaaggc ataagcccga gaatatcgtt 2280
atcgagatgg cccgagagaa ccaaactacc cagaagggac agaagaacag tagggaaagg 2340
atgaagagga ttgaagaggg tataaaagaa ctggggtccc aaatccttaa ggaacaccca 2400
gttgaaaaca cccagcttca gaatgagaag ctctacctgt actacctgca gaacggcagg 2460
gacatgtacg tggatcagga actggacatc aatcggctct ccgactacga cgtggatcat 2520
atcgtgcccc agtcttttct caaagatgat tctattgata ataaagtgtt gacaagatcc 2580
gataaaaata gagggaagag tgataacgtc ccctcagaag aagttgtcaa gaaaatgaaa 2640
aattattggc ggcagctgct gaacgccaaa ctgatcacac aacggaagtt cgataatctg 2700
actaaggctg aacgaggtgg cctgtctgag ttggataaag caggcttcat caaaaggcag 2760
cttgttgaga cacgccagat caccaagcac gtggcccaaa ttctcgattc acgcatgaac 2820
accaagtacg atgaaaatga caaactgatt cgagaggtga aagttattac tctgaagtct 2880
aagctggtct cagatttcag aaaggacttt cagttttata aggtgagaga gatcaacaat 2940
taccaccatg cgcatgatgc ctacctgaat gcagtggtag gcactgcact tatcaaaaaa 3000
tatcccaagc ttgaatctga atttgtttac ggagactata aagtgtacga tgttaggaaa 3060
atgatcgcaa agtctgagca ggaaataggc aaggccaccg ctaagtactt cttttacagc 3120
aatattatga attttttcaa gaccgagatt acactggcca atggagagat tcggaagcga 3180
ccacttatcg aaacaaacgg agaaacagga gaaatcgtgt gggacaaggg tagggatttc 3240
gcgacagtcc ggaaggtcct gtccatgccg caggtgaaca tcgttaaaaa gaccgaagta 3300
cagaccggag gcttctccaa ggaaagtatc ctcccgaaaa ggaacagcga caagctgatc 3360
gcacgcaaaa aagattggga ccccaagaaa tacggcggat tcgattctcc tacagtcgct 3420
tacagtgtac tggttgtggc caaagtggag aaagggaagt ctaaaaaact caaaagcgtc 3480
aaggaactgc tgggcatcac aatcatggag cgatcaagct tcgaaaaaaa ccccatcgac 3540
tttctcgagg cgaaaggata taaagaggtc aaaaaagacc tcatcattaa gcttcccaag 3600
tactctctct ttgagcttga aaacggccgg aaacgaatgc tcgctagtgc gggcgcactg 3660
cagaaaggta acgagctggc actgccctct aaatacgtta atttcttgta tctggccagc 3720
cactatgaaa agctcaaagg gtctcccgaa gataatgagc agaagcagct gttcgtggaa 3780
caacacaaac actaccttga tgagatcatc gagcaaataa gcgaattctc caaaagagtg 3840
atcctcgccg acgctaacct cgataaggtg ctttctgctt acaataagca cagggataag 3900
cccatcaggg agcaggcaga aaacattatc cacttgttta ctctgaccaa cttgggcgcg 3960
cctgcagcct tcaagtactt cgacaccacc atagacagaa agcggtacac ctctacaaag 4020
gaggtcctgg acgccacact gattcatcag tcaattacgg ggctctatga aacaagaatc 4080
gacctctctc agctcggtgg agac 4104
<210> 6
<211> 1368
<212> PRT
<213> Artificial Sequence
<400> 6
Gly Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Asn Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Lys
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Asp Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Leu Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser
1025 1030 1035 1040
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu
1045 1050 1055
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile
1060 1065 1070
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser
1075 1080 1085
Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly
1090 1095 1100
Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile
1105 1110 1115 1120
Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser
1125 1130 1135
Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1140 1145 1150
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile
1155 1160 1165
Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala
1170 1175 1180
Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys
1185 1190 1195 1200
Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser
1205 1210 1215
Ala Gly Thr Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr
1220 1225 1230
Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val
1265 1270 1275 1280
Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys
1285 1290 1295
His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu
1300 1305 1310
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp
1315 1320 1325
Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp
1330 1335 1340
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile
1345 1350 1355 1360
Asp Leu Ser Gln Leu Gly Gly Asp
1365
<210> 7
<211> 9553
<212> DNA
<213> Artificial Sequence
<400> 7
tggcgaatgg gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 60
cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 120
ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 180
gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 240
acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 300
ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 360
ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 420
acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaatttcag gtggcacttt 480
tcggggaaat gtgcgcggaa cccctatttg tttatttttc taaatacatt caaatatgta 540
tccgctcatg agacaataac cctgataaat gcttcaataa tattgaaaaa ggaagagtat 600
gagtattcaa catttccgtg tcgcccttat tccctttttt gcggcatttt gccttcctgt 660
ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct gaagatcagt tgggtgcacg 720
agtgggttac atcgaactgg atctcaacag cggtaagatc cttgagagtt ttcgccccga 780
agaacgtttt ccaatgatga gcacttttaa agttctgcta tgtggcgcgg tattatcccg 840
tattgacgcc gggcaagagc aactcggtcg ccgcatacac tattctcaga atgacttggt 900
tgagtactca ccagtcacag aaaagcatct tacggatggc atgacagtaa gagaattatg 960
cagtgctgcc ataaccatga gtgataacac tgcggccaac ttacttctga caacgatcgg 1020
aggaccgaag gagctaaccg cttttttgca caacatgggg gatcatgtaa ctcgccttga 1080
tcgttgggaa ccggagctga atgaagccat accaaacgac gagcgtgaca ccacgatgcc 1140
tgcagcaatg gcaacaacgt tgcgcaaact attaactggc gaactactta ctctagcttc 1200
ccggcaacaa ttaatagact ggatggaggc ggataaagtt gcaggaccac ttctgcgctc 1260
ggcccttccg gctggctggt ttattgctga taaatctgga gccggtgagc gtgggtctcg 1320
cggtatcatt gcagcactgg ggccagatgg taagccctcc cgtatcgtag ttatctacac 1380
gacggggagt caggcaacta tggatgaacg aaatagacag atcgctgaga taggtgcctc 1440
actgattaag cattggtaac tgtcagacca agtttactca tatatacttt agattgattt 1500
aaaacttcat ttttaattta aaaggatcta ggtgaagatc ctttttgata atctcatgac 1560
caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag aaaagatcaa 1620
aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa caaaaaaacc 1680
accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt ttccgaaggt 1740
aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc cgtagttagg 1800
ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa tcctgttacc 1860
agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa gacgatagtt 1920
accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc ccagcttgga 1980
gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa gcgccacgct 2040
tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa caggagagcg 2100
cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg ggtttcgcca 2160
cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc tatggaaaaa 2220
cgccagcaac gcggcctttt tacggttcct ggccttttgc tggccttttg ctcacatgtt 2280
ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg agtgagctga 2340
taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg aagcggaaga 2400
gcgcctgatg cggtattttc tccttacgca tctgtgcggt atttcacacc gcatatatgg 2460
tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatac actccgctat 2520
cgctacgtga ctgggtcatg gctgcgcccc gacacccgcc aacacccgct gacgcgccct 2580
gacgggcttg tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct 2640
gcatgtgtca gaggttttca ccgtcatcac cgaaacgcgc gaggcagctg cggtaaagct 2700
catcagcgtg gtcgtgaagc gattcacaga tgtctgcctg ttcatccgcg tccagctcgt 2760
tgagtttctc cagaagcgtt aatgtctggc ttctgataaa gcgggccatg ttaagggcgg 2820
ttttttcctg tttggtcact gatgcctccg tgtaaggggg atttctgttc atgggggtaa 2880
tgataccgat gaaacgagag aggatgctca cgatacgggt tactgatgat gaacatgccc 2940
ggttactgga acgttgtgag ggtaaacaac tggcggtatg gatgcggcgg gaccagagaa 3000
aaatcactca gggtcaatgc cagcgcttcg ttaatacaga tgtaggtgtt ccacagggta 3060
gccagcagca tcctgcgatg cagatccgga acataatggt gcagggcgct gacttccgcg 3120
tttccagact ttacgaaaca cggaaaccga agaccattca tgttgttgct caggtcgcag 3180
acgttttgca gcagcagtcg cttcacgttc gctcgcgtat cggtgattca ttctgctaac 3240
cagtaaggca accccgccag cctagccggg tcctcaacga caggagcacg atcatgcgca 3300
cccgtggggc cgccatgccg gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg 3360
gaccagtgac gaaggcttga gcgagggcgt gcaagattcc gaataccgca agcgacaggc 3420
cgatcatcgt cgcgctccag cgaaagcggt cctcgccgaa aatgacccag agcgctgccg 3480
gcacctgtcc tacgagttgc atgataaaga agacagtcat aagtgcggcg acgatagtca 3540
tgccccgcgc ccaccggaag gagctgactg ggttgaaggc tctcaagggc atcggtcgag 3600
atcccggtgc ctaatgagtg agctaactta cattaattgc gttgcgctca ctgcccgctt 3660
tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 3720
gcggtttgcg tattgggcgc cagggtggtt tttcttttca ccagtgagac gggcaacagc 3780
tgattgccct tcaccgcctg gccctgagag agttgcagca agcggtccac gctggtttgc 3840
cccagcaggc gaaaatcctg tttgatggtg gttaacggcg ggatataaca tgagctgtct 3900
tcggtatcgt cgtatcccac taccgagata tccgcaccaa cgcgcagccc ggactcggta 3960
atggcgcgca ttgcgcccag cgccatctga tcgttggcaa ccagcatcgc agtgggaacg 4020
atgccctcat tcagcatttg catggtttgt tgaaaaccgg acatggcact ccagtcgcct 4080
tcccgttccg ctatcggctg aatttgattg cgagtgagat atttatgcca gccagccaga 4140
cgcagacgcg ccgagacaga acttaatggg cccgctaaca gcgcgatttg ctggtgaccc 4200
aatgcgacca gatgctccac gcccagtcgc gtaccgtctt catgggagaa aataatactg 4260
ttgatgggtg tctggtcaga gacatcaaga aataacgccg gaacattagt gcaggcagct 4320
tccacagcaa tggcatcctg gtcatccagc ggatagttaa tgatcagccc actgacgcgt 4380
tgcgcgagaa gattgtgcac cgccgcttta caggcttcga cgccgcttcg ttctaccatc 4440
gacaccacca cgctggcacc cagttgatcg gcgcgagatt taatcgccgc gacaatttgc 4500
gacggcgcgt gcagggccag actggaggtg gcaacgccaa tcagcaacga ctgtttgccc 4560
gccagttgtt gtgccacgcg gttgggaatg taattcagct ccgccatcgc cgcttccact 4620
ttttcccgcg ttttcgcaga aacgtggctg gcctggttca ccacgcggga aacggtctga 4680
taagagacac cggcatactc tgcgacatcg tataacgtta ctggtttcac attcaccacc 4740
ctgaattgac tctcttccgg gcgctatcat gccataccgc gaaaggtttt gcgccattcg 4800
atggtgtccg ggatctcgac gctctccctt atgcgactcc tgcattagga agcagcccag 4860
tagtaggttg aggccgttga gcaccgccgc cgcaaggaat ggtgcatgca aggagatggc 4920
gcccaacagt cccccggcca cggggcctgc caccataccc acgccgaaac aagcgctcat 4980
gagcccgaag tggcgagccc gatcttcccc atcggtgatg tcggcgatat aggcgccagc 5040
aaccgcacct gtggcgccgg tgatgccggc cacgatgcgt ccggcgtaga ggatcgagat 5100
ctcgatcccg cgaaattaat acgactcact ataggggaat tgtgagcgga taacaattcc 5160
cctctagaaa taattttgtt taactttaag aaggagatat accatgggca gcagccatca 5220
tcatcatcat cacagcagcg gcctggtgcc gcgcggcagc catatggaaa atctctactt 5280
ccaaggcgac aagaagtact ccattgggct cgatatcggc acaaacagcg tcggctgggc 5340
cgtcattacg gacgagtaca aggtgccgag caaaaaattc aaagttctgg gcaataccga 5400
tcgccacagc ataaagaaga acctcattgg cgccctcctg ttcgactccg gggagacggc 5460
cgaagccacg cggctcaaaa gaacagcacg gcgcagatat acccgcagaa agaatcggat 5520
ctgctacctg caggagatct ttagtaatga gatggctaag gtggatgact ctttcttcca 5580
taggctggag gagtcctttt tggtggagga ggataaaaag cacgagcgcc acccaatctt 5640
tggcaatatc gtggacgagg tggcgtacca tgaaaagtac ccaaccatat atcatctgag 5700
gaagaagctt gtagacagta ctgataaggc tgacttgcgg ttgatctatc tcgcgctggc 5760
gcatatgatc aaatttcggg gacacttcct catcgagggg gacctgaacc cagacaacag 5820
cgatgtcgac aaactcttta tccaactggt tcagacttac aatcagcttt tcgaagagaa 5880
cccgatcaac gcatccggag ttgacgccaa agcaatcctg agcgctaggc tgtccaaatc 5940
ccggcggctc gaaaacctca tcgcacagct ccctggggag aagaagaacg gcctgtttgg 6000
taatcttatc gccctgtcac tcgggctgac ccccaacttt aaatctaact tcgacctggc 6060
cgaagatgcc aagcttcaac tgagcaaaga cacctacgat gatgatctcg acaatctgct 6120
ggcccagatc ggcgaccagt acgcagacct ttttttggcg gcaaagaacc tgtcagacgc 6180
cattctgctg agtgatattc tgcgagtgaa cacggagatc accaaagctc cgctgagcgc 6240
tagtatgatc aagcgctatg atgagcacca ccaagacttg actttgctga aggcccttgt 6300
cagacagcaa ctgcctgaga agtacaagga aattttcttc gatcagtcta aaaatggcta 6360
cgccggatac attgacggcg gagcaagcca ggaggaattt tacaaattta ttaagcccat 6420
cttggaaaaa atggacggca ccgaggagct gctggtaaag cttaacagag aagatctgtt 6480
gcgcaaacag cgcactttcg acaatggaaa tatcccccac cagattcacc tgggcgaact 6540
gcacgctatc ctcaggcggc aagaggattt ctaccccttt ttgaaagata acagggaaaa 6600
gattgagaaa atcctcacat ttcggatacc ctactatgta ggccccctcg cccggggaaa 6660
ttccagattc gcgtggatga ctcgcaaatc agaagagacc atcactccct ggaacttcga 6720
gaaagtcgtg gataaggggg cctctgccca gtccttcatc gaaaggatga ctaactttga 6780
taaaaatctg cctaacgaaa aggtgcttcc taaacactct ctgctgtacg agtacttcac 6840
agtttataac gagctcacca aggtcaaata cgtcacagaa gggatgagaa agccagcatt 6900
cctgtctgga gatcagaaga aagctatcgt ggacctcctc ttcaagacga accggaaagt 6960
taccgtgaaa cagctcaaag aagactattt caaaaagatt gaatgtttcg actctgttga 7020
aatcagcgga gtggaggatc gcttcaacgc atccctggga acgtatcacg atctcctgaa 7080
aatcattaaa gacaaggact tcctggacaa tgaggagaac gaggacattc ttgaggacat 7140
tgtcctcacc cttacgttgt ttgaagatag ggagatgatt gaagaacgct tgaaaactta 7200
cgctcatctc ttcgacgaca aagtcatgaa acagctcaag aggcgccgat atacaggatg 7260
ggggcggctg tcaagaaaac tgatcaatgg gatccgagac aagcagagtg gaaagacaat 7320
cctggatttt cttaagtccg atggatttgc caaccggaac ttcctgcagt tgatccatga 7380
tgactctctc acctttaagg aggacatcca gaaagcacaa gtttctggcc agggggacag 7440
tcttcacgag cacatcgcta atcttgcagg tagcccagct atcaaaaagg gaatactgca 7500
gaccgttaag gtcgtggatg aactcgtcaa agtaatggga aggcataagc ccgagaatat 7560
cgttatcgag atggcccgag agaaccaaac tacccagaag ggacagaaga acagtaggga 7620
aaggatgaag aggattgaag agggtataaa agaactgggg tcccaaatcc ttaaggaaca 7680
cccagttgaa aacacccagc ttcagaatga gaagctctac ctgtactacc tgcagaacgg 7740
cagggacatg tacgtggatc aggaactgga catcaatcgg ctctccgact acgacgtgga 7800
tcatatcgtg ccccagtctt ttctcaaaga tgattctatt gataataaag tgttgacaag 7860
atccgataaa aatagaggga agagtgataa cgtcccctca gaagaagttg tcaagaaaat 7920
gaaaaattat tggcggcagc tgctgaacgc caaactgatc acacaacgga agttcgataa 7980
tctgactaag gctgaacgag gtggcctgtc tgagttggat aaagcaggct tcatcaaaag 8040
gcagcttgtt gagacacgcc agatcaccaa gcacgtggcc caaattctcg attcacgcat 8100
gaacaccaag tacgatgaaa atgacaaact gattcgagag gtgaaagtta ttactctgaa 8160
gtctaagctg gtctcagatt tcagaaagga ctttcagttt tataaggtga gagagatcaa 8220
caattaccac catgcgcatg atgcctacct gaatgcagtg gtaggcactg cacttatcaa 8280
aaaatatccc aagcttgaat ctgaatttgt ttacggagac tataaagtgt acgatgttag 8340
gaaaatgatc gcaaagtctg agcaggaaat aggcaaggcc accgctaagt acttctttta 8400
cagcaatatt atgaattttt tcaagaccga gattacactg gccaatggag agattcggaa 8460
gcgaccactt atcgaaacaa acggagaaac aggagaaatc gtgtgggaca agggtaggga 8520
tttcgcgaca gtccggaagg tcctgtccat gccgcaggtg aacatcgtta aaaagaccga 8580
agtacagacc ggaggcttct ccaaggaaag tatcctcccg aaaaggaaca gcgacaagct 8640
gatcgcacgc aaaaaagatt gggaccccaa gaaatacggc ggattcgatt ctcctacagt 8700
cgcttacagt gtactggttg tggccaaagt ggagaaaggg aagtctaaaa aactcaaaag 8760
cgtcaaggaa ctgctgggca tcacaatcat ggagcgatca agcttcgaaa aaaaccccat 8820
cgactttctc gaggcgaaag gatataaaga ggtcaaaaaa gacctcatca ttaagcttcc 8880
caagtactct ctctttgagc ttgaaaacgg ccggaaacga atgctcgcta gtgcgggcgc 8940
actgcagaaa ggtaacgagc tggcactgcc ctctaaatac gttaatttct tgtatctggc 9000
cagccactat gaaaagctca aagggtctcc cgaagataat gagcagaagc agctgttcgt 9060
ggaacaacac aaacactacc ttgatgagat catcgagcaa ataagcgaat tctccaaaag 9120
agtgatcctc gccgacgcta acctcgataa ggtgctttct gcttacaata agcacaggga 9180
taagcccatc agggagcagg cagaaaacat tatccacttg tttactctga ccaacttggg 9240
cgcgcctgca gccttcaagt acttcgacac caccatagac agaaagcggt acacctctac 9300
aaaggaggtc ctggacgcca cactgattca tcagtcaatt acggggctct atgaaacaag 9360
aatcgacctc tctcagctcg gtggagactg actcgagcac caccaccacc accactgaga 9420
tccggctgct aacaaagccc gaaaggaagc tgagttggct gctgccaccg ctgagcaata 9480
actagcataa ccccttgggg cctctaaacg ggtcttgagg ggttttttgc tgaaaggagg 9540
aactatatcc gga 9553
<210> 8
<211> 9553
<212> DNA
<213> Artificial Sequence
<400> 8
tggcgaatgg gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 60
cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 120
ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 180
gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 240
acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 300
ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 360
ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 420
acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaatttcag gtggcacttt 480
tcggggaaat gtgcgcggaa cccctatttg tttatttttc taaatacatt caaatatgta 540
tccgctcatg agacaataac cctgataaat gcttcaataa tattgaaaaa ggaagagtat 600
gagtattcaa catttccgtg tcgcccttat tccctttttt gcggcatttt gccttcctgt 660
ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct gaagatcagt tgggtgcacg 720
agtgggttac atcgaactgg atctcaacag cggtaagatc cttgagagtt ttcgccccga 780
agaacgtttt ccaatgatga gcacttttaa agttctgcta tgtggcgcgg tattatcccg 840
tattgacgcc gggcaagagc aactcggtcg ccgcatacac tattctcaga atgacttggt 900
tgagtactca ccagtcacag aaaagcatct tacggatggc atgacagtaa gagaattatg 960
cagtgctgcc ataaccatga gtgataacac tgcggccaac ttacttctga caacgatcgg 1020
aggaccgaag gagctaaccg cttttttgca caacatgggg gatcatgtaa ctcgccttga 1080
tcgttgggaa ccggagctga atgaagccat accaaacgac gagcgtgaca ccacgatgcc 1140
tgcagcaatg gcaacaacgt tgcgcaaact attaactggc gaactactta ctctagcttc 1200
ccggcaacaa ttaatagact ggatggaggc ggataaagtt gcaggaccac ttctgcgctc 1260
ggcccttccg gctggctggt ttattgctga taaatctgga gccggtgagc gtgggtctcg 1320
cggtatcatt gcagcactgg ggccagatgg taagccctcc cgtatcgtag ttatctacac 1380
gacggggagt caggcaacta tggatgaacg aaatagacag atcgctgaga taggtgcctc 1440
actgattaag cattggtaac tgtcagacca agtttactca tatatacttt agattgattt 1500
aaaacttcat ttttaattta aaaggatcta ggtgaagatc ctttttgata atctcatgac 1560
caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag aaaagatcaa 1620
aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa caaaaaaacc 1680
accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt ttccgaaggt 1740
aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc cgtagttagg 1800
ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa tcctgttacc 1860
agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa gacgatagtt 1920
accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc ccagcttgga 1980
gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa gcgccacgct 2040
tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa caggagagcg 2100
cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg ggtttcgcca 2160
cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc tatggaaaaa 2220
cgccagcaac gcggcctttt tacggttcct ggccttttgc tggccttttg ctcacatgtt 2280
ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg agtgagctga 2340
taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg aagcggaaga 2400
gcgcctgatg cggtattttc tccttacgca tctgtgcggt atttcacacc gcatatatgg 2460
tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatac actccgctat 2520
cgctacgtga ctgggtcatg gctgcgcccc gacacccgcc aacacccgct gacgcgccct 2580
gacgggcttg tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct 2640
gcatgtgtca gaggttttca ccgtcatcac cgaaacgcgc gaggcagctg cggtaaagct 2700
catcagcgtg gtcgtgaagc gattcacaga tgtctgcctg ttcatccgcg tccagctcgt 2760
tgagtttctc cagaagcgtt aatgtctggc ttctgataaa gcgggccatg ttaagggcgg 2820
ttttttcctg tttggtcact gatgcctccg tgtaaggggg atttctgttc atgggggtaa 2880
tgataccgat gaaacgagag aggatgctca cgatacgggt tactgatgat gaacatgccc 2940
ggttactgga acgttgtgag ggtaaacaac tggcggtatg gatgcggcgg gaccagagaa 3000
aaatcactca gggtcaatgc cagcgcttcg ttaatacaga tgtaggtgtt ccacagggta 3060
gccagcagca tcctgcgatg cagatccgga acataatggt gcagggcgct gacttccgcg 3120
tttccagact ttacgaaaca cggaaaccga agaccattca tgttgttgct caggtcgcag 3180
acgttttgca gcagcagtcg cttcacgttc gctcgcgtat cggtgattca ttctgctaac 3240
cagtaaggca accccgccag cctagccggg tcctcaacga caggagcacg atcatgcgca 3300
cccgtggggc cgccatgccg gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg 3360
gaccagtgac gaaggcttga gcgagggcgt gcaagattcc gaataccgca agcgacaggc 3420
cgatcatcgt cgcgctccag cgaaagcggt cctcgccgaa aatgacccag agcgctgccg 3480
gcacctgtcc tacgagttgc atgataaaga agacagtcat aagtgcggcg acgatagtca 3540
tgccccgcgc ccaccggaag gagctgactg ggttgaaggc tctcaagggc atcggtcgag 3600
atcccggtgc ctaatgagtg agctaactta cattaattgc gttgcgctca ctgcccgctt 3660
tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 3720
gcggtttgcg tattgggcgc cagggtggtt tttcttttca ccagtgagac gggcaacagc 3780
tgattgccct tcaccgcctg gccctgagag agttgcagca agcggtccac gctggtttgc 3840
cccagcaggc gaaaatcctg tttgatggtg gttaacggcg ggatataaca tgagctgtct 3900
tcggtatcgt cgtatcccac taccgagata tccgcaccaa cgcgcagccc ggactcggta 3960
atggcgcgca ttgcgcccag cgccatctga tcgttggcaa ccagcatcgc agtgggaacg 4020
atgccctcat tcagcatttg catggtttgt tgaaaaccgg acatggcact ccagtcgcct 4080
tcccgttccg ctatcggctg aatttgattg cgagtgagat atttatgcca gccagccaga 4140
cgcagacgcg ccgagacaga acttaatggg cccgctaaca gcgcgatttg ctggtgaccc 4200
aatgcgacca gatgctccac gcccagtcgc gtaccgtctt catgggagaa aataatactg 4260
ttgatgggtg tctggtcaga gacatcaaga aataacgccg gaacattagt gcaggcagct 4320
tccacagcaa tggcatcctg gtcatccagc ggatagttaa tgatcagccc actgacgcgt 4380
tgcgcgagaa gattgtgcac cgccgcttta caggcttcga cgccgcttcg ttctaccatc 4440
gacaccacca cgctggcacc cagttgatcg gcgcgagatt taatcgccgc gacaatttgc 4500
gacggcgcgt gcagggccag actggaggtg gcaacgccaa tcagcaacga ctgtttgccc 4560
gccagttgtt gtgccacgcg gttgggaatg taattcagct ccgccatcgc cgcttccact 4620
ttttcccgcg ttttcgcaga aacgtggctg gcctggttca ccacgcggga aacggtctga 4680
taagagacac cggcatactc tgcgacatcg tataacgtta ctggtttcac attcaccacc 4740
ctgaattgac tctcttccgg gcgctatcat gccataccgc gaaaggtttt gcgccattcg 4800
atggtgtccg ggatctcgac gctctccctt atgcgactcc tgcattagga agcagcccag 4860
tagtaggttg aggccgttga gcaccgccgc cgcaaggaat ggtgcatgca aggagatggc 4920
gcccaacagt cccccggcca cggggcctgc caccataccc acgccgaaac aagcgctcat 4980
gagcccgaag tggcgagccc gatcttcccc atcggtgatg tcggcgatat aggcgccagc 5040
aaccgcacct gtggcgccgg tgatgccggc cacgatgcgt ccggcgtaga ggatcgagat 5100
ctcgatcccg cgaaattaat acgactcact ataggggaat tgtgagcgga taacaattcc 5160
cctctagaaa taattttgtt taactttaag aaggagatat accatgggca gcagccatca 5220
tcatcatcat cacagcagcg gcctggtgcc gcgcggcagc catatggaaa atctctactt 5280
ccaaggcgac aagaagtact ccattgggct cgatatcggc acaaacagcg tcggctgggc 5340
cgtcattacg gacgagtaca aggtgccgag caaaaaattc aaagttctgg gcaataccga 5400
tcgccacagc ataaagaaga acctcattgg cgccctcctg ttcgactccg gggagacggc 5460
cgaagccacg cggctcaaaa gaacagcacg gcgcagatat acccgcagaa agaatcggat 5520
ctgctacctg caggagatct ttagtaatga gatggctaag gtggatgact ctttcttcca 5580
taggctggag gagtcctttt tggtggagga ggataaaaag cacgagcgcc acccaatctt 5640
tggcaatatc gtggacgagg tggcgtacca tgaaaagtac ccaaccatat atcatctgag 5700
gaagaagctt gtagacagta ctgataaggc tgacttgcgg ttgatctatc tcgcgctggc 5760
gcatatgatc aaatttcggg gacacttcct catcgagggg gacctgaacc cagacaacag 5820
cgatgtcgac aaactcttta tccaactggt tcagacttac aatcagcttt tcgaagagaa 5880
cccgatcaac gcatccggag ttgacgccaa agcaatcctg agcgctaggc tgtccaaatc 5940
ccggcggctc gaaaacctca tcgcacagct ccctggggag aagaagaacg gcctgtttgg 6000
taatcttatc gccctgtcac tcgggctgac ccccaacttt aaatctaact tcgacctggc 6060
cgaagatgcc aagcttcaac tgagcaaaga cacctacgat gatgatctcg acaatctgct 6120
ggcccagatc ggcgaccagt acgcagacct ttttttggcg gcaaagaacc tgtcagacgc 6180
cattctgctg agtgatattc tgcgagtgaa cacggagatc accaaagctc cgctgagcgc 6240
tagtatgatc aagcgctatg atgagcacca ccaagacttg actttgctga aggcccttgt 6300
cagacagcaa ctgcctgaga agtacaagga aattttcttc gatcagtcta aaaatggcta 6360
cgccggatac attgacggcg gagcaagcca ggaggaattt tacaaattta ttaagcccat 6420
cttggaaaaa atggacggca ccgaggagct gctggtaaag cttaacagag aagatctgtt 6480
gcgcaaacag cgcactttcg acaatggaag catcccccac cagattcacc tgggcgaact 6540
gcacgctatc ctcaggcggc aagaggattt ctaccccttt ttgaaagata acagggaaaa 6600
gattgagaaa atcctcacat ttcggatacc ctactatgta ggccccctcg cccggggaaa 6660
ttccagattc gcgtggatga ctcgcaaatc agaagagacc atcactccct ggaacttcga 6720
ggaagtcgtg gataaggggg cctctgccca gtccttcatc gaaaggatga ctaactttga 6780
taaaaatctg cctaacgaaa aggtgcttcc taaacactct ctgctgtacg agtacttcac 6840
agtttataac gagctcacca aggtcaaata cgtcacagaa gggatgagaa agccagcatt 6900
cctgtctgga gagcagaaga aagctatcgt ggacctcctc ttcaagacga accggaaagt 6960
taccgtgaaa cagctcaaag aagactattt caaaaagatt gaatgtttcg actctgttga 7020
aatcagcgga gtggaggatc gcttcaacgc atccctggga acgtatcacg atctcctgaa 7080
aatcattaaa gacaaggact tcctggacaa tgaggagaac gaggacattc ttgaggacat 7140
tgtcctcacc cttacgttgt ttgaagatag ggagatgatt gaagaacgct tgaaaactta 7200
cgctcatctc ttcgacgaca aagtcatgaa acagctcaag aggcgccgat atacaggatg 7260
ggggcggctg tcaagaaaac tgatcaatgg gatccgagac aagcagagtg gaaagacaat 7320
cctggatttt cttaagtccg atggatttgc caaccggaac ttcatgcagt tgatccatga 7380
tgactctctc acctttaagg aggacatcca gaaagcacaa gtttctggcc agggggacag 7440
tcttcacgag cacatcgcta atcttgcagg tagcccagct atcaaaaagg gaatactgca 7500
gaccgttaag gtcgtggatg aactcgtcaa agtaatggga aggcataagc ccgagaatat 7560
cgttatcgag atggcccgag agaaccaaac tacccagaag ggacagaaga acagtaggga 7620
aaggatgaag aggattgaag agggtataaa agaactgggg tcccaaatcc ttaaggaaca 7680
cccagttgaa aacacccagc ttcagaatga gaagctctac ctgtactacc tgcagaacgg 7740
cagggacatg tacgtggatc aggaactgga catcaatcgg ctctccgact acgacgtgga 7800
tcatatcgtg ccccagtctt ttctcaaaga tgattctatt gataataaag tgttgacaag 7860
atccgataaa aatagaggga agagtgataa cgtcccctca gaagaagttg tcaagaaaat 7920
gaaaaattat tggcggcagc tgctgaacgc caaactgatc acacaacgga agttcgataa 7980
tctgactaag gctgaacgag gtggcctgtc tgagttggat aaagcaggct tcatcaaaag 8040
gcagcttgtt gagacacgcc agatcaccaa gcacgtggcc caaattctcg attcacgcat 8100
gaacaccaag tacgatgaaa atgacaaact gattcgagag gtgaaagtta ttactctgaa 8160
gtctaagctg gtctcagatt tcagaaagga ctttcagttt tataaggtga gagagatcaa 8220
caattaccac catgcgcatg atgcctacct gaatgcagtg gtaggcactg cacttatcaa 8280
aaaatatccc aagcttgaat ctgaatttgt ttacggagac tataaagtgt acgatgttag 8340
gaaaatgatc gcaaagtctg agcaggaaat aggcaaggcc accgctaagt acttctttta 8400
cagcaatatt atgaattttt tcaagaccga gattacactg gccaatggag agattcggaa 8460
gcgaccactt atcgaaacaa acggagaaac aggagaaatc gtgtgggaca agggtaggga 8520
tttcgcgaca gtccggaagg tcctgtccat gccgcaggtg aacatcgtta aaaagaccga 8580
agtacagacc ggaggcttct ccaaggaaag tatcctcccg aaaaggaaca gcgacaagct 8640
gatcgcacgc aaaaaagatt gggaccccaa gaaatacggc ggattcgatt ctcctacagt 8700
cgcttacagt gtactggttg tggccaaagt ggagaaaggg aagtctaaaa aactcaaaag 8760
cgtcaaggaa ctgctgggca tcacaatcat ggagcgatca agcttcgaaa aaaaccccat 8820
cgactttctc gaggcgaaag gatataaaga ggtcaaaaaa gacctcatca ttaagcttcc 8880
caagtactct ctctttgagc ttgaaaacgg ccggaaacga atgctcgcta gtgcgggcga 8940
gctgcagaaa ggtaacgagc tggcactgcc ctctaaatac gttaatttct tgtatctggc 9000
cagccactat gaaaagctca aagggtctcc cgaagataat gagcagaagc agctgttcgt 9060
ggaacaacac aaacactacc ttgatgagat catcgagcaa ataagcgaat tctccaaaag 9120
agtgatcctc gccgacgcta acctcgataa ggtgctttct gcttacaata agcacaggga 9180
taagcccatc agggagcagg cagaaaacat tatccacttg tttactctga ccaacttggg 9240
cgcgcctgca gccttcaagt acttcgacac caccatagac agaaagcggt acacctctac 9300
aaaggaggtc ctggacgcca cactgattca tcagtcaatt acggggctct atgaaacaag 9360
aatcgacctc tctcagctcg gtggagactg actcgagcac caccaccacc accactgaga 9420
tccggctgct aacaaagccc gaaaggaagc tgagttggct gctgccaccg ctgagcaata 9480
actagcataa ccccttgggg cctctaaacg ggtcttgagg ggttttttgc tgaaaggagg 9540
aactatatcc gga 9553
<210> 9
<211> 9553
<212> DNA
<213> Artificial Sequence
<400> 9
tggcgaatgg gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 60
cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 120
ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 180
gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 240
acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 300
ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 360
ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 420
acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaatttcag gtggcacttt 480
tcggggaaat gtgcgcggaa cccctatttg tttatttttc taaatacatt caaatatgta 540
tccgctcatg agacaataac cctgataaat gcttcaataa tattgaaaaa ggaagagtat 600
gagtattcaa catttccgtg tcgcccttat tccctttttt gcggcatttt gccttcctgt 660
ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct gaagatcagt tgggtgcacg 720
agtgggttac atcgaactgg atctcaacag cggtaagatc cttgagagtt ttcgccccga 780
agaacgtttt ccaatgatga gcacttttaa agttctgcta tgtggcgcgg tattatcccg 840
tattgacgcc gggcaagagc aactcggtcg ccgcatacac tattctcaga atgacttggt 900
tgagtactca ccagtcacag aaaagcatct tacggatggc atgacagtaa gagaattatg 960
cagtgctgcc ataaccatga gtgataacac tgcggccaac ttacttctga caacgatcgg 1020
aggaccgaag gagctaaccg cttttttgca caacatgggg gatcatgtaa ctcgccttga 1080
tcgttgggaa ccggagctga atgaagccat accaaacgac gagcgtgaca ccacgatgcc 1140
tgcagcaatg gcaacaacgt tgcgcaaact attaactggc gaactactta ctctagcttc 1200
ccggcaacaa ttaatagact ggatggaggc ggataaagtt gcaggaccac ttctgcgctc 1260
ggcccttccg gctggctggt ttattgctga taaatctgga gccggtgagc gtgggtctcg 1320
cggtatcatt gcagcactgg ggccagatgg taagccctcc cgtatcgtag ttatctacac 1380
gacggggagt caggcaacta tggatgaacg aaatagacag atcgctgaga taggtgcctc 1440
actgattaag cattggtaac tgtcagacca agtttactca tatatacttt agattgattt 1500
aaaacttcat ttttaattta aaaggatcta ggtgaagatc ctttttgata atctcatgac 1560
caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag aaaagatcaa 1620
aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa caaaaaaacc 1680
accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt ttccgaaggt 1740
aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc cgtagttagg 1800
ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa tcctgttacc 1860
agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa gacgatagtt 1920
accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc ccagcttgga 1980
gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa gcgccacgct 2040
tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa caggagagcg 2100
cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg ggtttcgcca 2160
cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc tatggaaaaa 2220
cgccagcaac gcggcctttt tacggttcct ggccttttgc tggccttttg ctcacatgtt 2280
ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg agtgagctga 2340
taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg aagcggaaga 2400
gcgcctgatg cggtattttc tccttacgca tctgtgcggt atttcacacc gcatatatgg 2460
tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatac actccgctat 2520
cgctacgtga ctgggtcatg gctgcgcccc gacacccgcc aacacccgct gacgcgccct 2580
gacgggcttg tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct 2640
gcatgtgtca gaggttttca ccgtcatcac cgaaacgcgc gaggcagctg cggtaaagct 2700
catcagcgtg gtcgtgaagc gattcacaga tgtctgcctg ttcatccgcg tccagctcgt 2760
tgagtttctc cagaagcgtt aatgtctggc ttctgataaa gcgggccatg ttaagggcgg 2820
ttttttcctg tttggtcact gatgcctccg tgtaaggggg atttctgttc atgggggtaa 2880
tgataccgat gaaacgagag aggatgctca cgatacgggt tactgatgat gaacatgccc 2940
ggttactgga acgttgtgag ggtaaacaac tggcggtatg gatgcggcgg gaccagagaa 3000
aaatcactca gggtcaatgc cagcgcttcg ttaatacaga tgtaggtgtt ccacagggta 3060
gccagcagca tcctgcgatg cagatccgga acataatggt gcagggcgct gacttccgcg 3120
tttccagact ttacgaaaca cggaaaccga agaccattca tgttgttgct caggtcgcag 3180
acgttttgca gcagcagtcg cttcacgttc gctcgcgtat cggtgattca ttctgctaac 3240
cagtaaggca accccgccag cctagccggg tcctcaacga caggagcacg atcatgcgca 3300
cccgtggggc cgccatgccg gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg 3360
gaccagtgac gaaggcttga gcgagggcgt gcaagattcc gaataccgca agcgacaggc 3420
cgatcatcgt cgcgctccag cgaaagcggt cctcgccgaa aatgacccag agcgctgccg 3480
gcacctgtcc tacgagttgc atgataaaga agacagtcat aagtgcggcg acgatagtca 3540
tgccccgcgc ccaccggaag gagctgactg ggttgaaggc tctcaagggc atcggtcgag 3600
atcccggtgc ctaatgagtg agctaactta cattaattgc gttgcgctca ctgcccgctt 3660
tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 3720
gcggtttgcg tattgggcgc cagggtggtt tttcttttca ccagtgagac gggcaacagc 3780
tgattgccct tcaccgcctg gccctgagag agttgcagca agcggtccac gctggtttgc 3840
cccagcaggc gaaaatcctg tttgatggtg gttaacggcg ggatataaca tgagctgtct 3900
tcggtatcgt cgtatcccac taccgagata tccgcaccaa cgcgcagccc ggactcggta 3960
atggcgcgca ttgcgcccag cgccatctga tcgttggcaa ccagcatcgc agtgggaacg 4020
atgccctcat tcagcatttg catggtttgt tgaaaaccgg acatggcact ccagtcgcct 4080
tcccgttccg ctatcggctg aatttgattg cgagtgagat atttatgcca gccagccaga 4140
cgcagacgcg ccgagacaga acttaatggg cccgctaaca gcgcgatttg ctggtgaccc 4200
aatgcgacca gatgctccac gcccagtcgc gtaccgtctt catgggagaa aataatactg 4260
ttgatgggtg tctggtcaga gacatcaaga aataacgccg gaacattagt gcaggcagct 4320
tccacagcaa tggcatcctg gtcatccagc ggatagttaa tgatcagccc actgacgcgt 4380
tgcgcgagaa gattgtgcac cgccgcttta caggcttcga cgccgcttcg ttctaccatc 4440
gacaccacca cgctggcacc cagttgatcg gcgcgagatt taatcgccgc gacaatttgc 4500
gacggcgcgt gcagggccag actggaggtg gcaacgccaa tcagcaacga ctgtttgccc 4560
gccagttgtt gtgccacgcg gttgggaatg taattcagct ccgccatcgc cgcttccact 4620
ttttcccgcg ttttcgcaga aacgtggctg gcctggttca ccacgcggga aacggtctga 4680
taagagacac cggcatactc tgcgacatcg tataacgtta ctggtttcac attcaccacc 4740
ctgaattgac tctcttccgg gcgctatcat gccataccgc gaaaggtttt gcgccattcg 4800
atggtgtccg ggatctcgac gctctccctt atgcgactcc tgcattagga agcagcccag 4860
tagtaggttg aggccgttga gcaccgccgc cgcaaggaat ggtgcatgca aggagatggc 4920
gcccaacagt cccccggcca cggggcctgc caccataccc acgccgaaac aagcgctcat 4980
gagcccgaag tggcgagccc gatcttcccc atcggtgatg tcggcgatat aggcgccagc 5040
aaccgcacct gtggcgccgg tgatgccggc cacgatgcgt ccggcgtaga ggatcgagat 5100
ctcgatcccg cgaaattaat acgactcact ataggggaat tgtgagcgga taacaattcc 5160
cctctagaaa taattttgtt taactttaag aaggagatat accatgggca gcagccatca 5220
tcatcatcat cacagcagcg gcctggtgcc gcgcggcagc catatggaaa atctctactt 5280
ccaaggcgac aagaagtact ccattgggct cgatatcggc acaaacagcg tcggctgggc 5340
cgtcattacg gacgagtaca aggtgccgag caaaaaattc aaagttctgg gcaataccga 5400
tcgccacagc ataaagaaga acctcattgg cgccctcctg ttcgactccg gggagacggc 5460
cgaagccacg cggctcaaaa gaacagcacg gcgcagatat acccgcagaa agaatcggat 5520
ctgctacctg caggagatct ttagtaatga gatggctaag gtggatgact ctttcttcca 5580
taggctggag gagtcctttt tggtggagga ggataaaaag cacgagcgcc acccaatctt 5640
tggcaatatc gtggacgagg tggcgtacca tgaaaagtac ccaaccatat atcatctgag 5700
gaagaagctt gtagacagta ctgataaggc tgacttgcgg ttgatctatc tcgcgctggc 5760
gcatatgatc aaatttcggg gacacttcct catcgagggg gacctgaacc cagacaacag 5820
cgatgtcgac aaactcttta tccaactggt tcagacttac aatcagcttt tcgaagagaa 5880
cccgatcaac gcatccggag ttgacgccaa agcaatcctg agcgctaggc tgtccaaatc 5940
ccggcggctc gaaaacctca tcgcacagct ccctggggag aagaagaacg gcctgtttgg 6000
taatcttatc gccctgtcac tcgggctgac ccccaacttt aaatctaact tcgacctggc 6060
cgaagatacc aagcttcaac tgagcaaaga cacctacgat gatgatctcg acaatctgct 6120
ggcccagatc ggcgaccagt acgcagacct ttttttggcg gcaaagaacc tgtcagacgc 6180
cattctgctg agtgatattc tgcgagtgaa cacggagatc accaaagctc cgctgagcgc 6240
tagtatgatc aagctgtatg atgagcacca ccaagacttg actttgctga aggcccttgt 6300
cagacagcaa ctgcctgaga agtacaagga aattttcttc gatcagtcta aaaatggcta 6360
cgccggatac attgacggcg gagcaagcca ggaggaattt tacaaattta ttaagcccat 6420
cttggaaaaa atggacggca ccgaggagct gctggtaaag cttaacagag aagatctgtt 6480
gcgcaaacag cgcactttcg acaatggaat tatcccccac cagattcacc tgggcgaact 6540
gcacgctatc ctcaggcggc aagaggattt ctaccccttt ttgaaagata acagggaaaa 6600
gattgagaaa atcctcacat ttcggatacc ctactatgta ggccccctcg cccggggaaa 6660
ttccagattc gcgtggatga ctcgcaaatc agaagagacc atcactccct ggaacttcga 6720
gaaagtcgtg gataaggggg cctctgccca gtccttcatc gaaaggatga ctaactttga 6780
taaaaatctg cctaacgaaa aggtgcttcc taaacactct ctgctgtacg agtacttcac 6840
agtttataac gagctcacca aggtcaaata cgtcacagaa gggatgagaa agccagcatt 6900
cctgtctgga gatcagaaga aagctatcgt ggacctcctc ttcaagacga accggaaagt 6960
taccgtgaaa cagctcaaag aagactattt caaaaagatt gaatgtttcg actctgttga 7020
aatcagcgga gtggaggatc gcttcaacgc atccctggga acgtatcacg atctcctgaa 7080
aatcattaaa gacaaggact tcctggacaa tgaggagaac gaggacattc ttgaggacat 7140
tgtcctcacc cttacgttgt ttgaagatag ggagatgatt gaagaacgct tgaaaactta 7200
cgctcatctc ttcgacgaca aagtcatgaa acagctcaag aggcgccgat atacaggatg 7260
ggggcggctg tcaagaaaac tgatcaatgg gatccgagac aagcagagtg gaaagacaat 7320
cctggatttt cttaagtccg atggatttgc caaccggaac ttcattcagt tgatccatga 7380
tgactctctc acctttaagg aggacatcca gaaagcacaa gtttctggcc agggggacag 7440
tcttcacgag cacatcgcta atcttgcagg tagcccagct atcaaaaagg gaatactgca 7500
gaccgttaag gtcgtggatg aactcgtcaa agtaatggga aggcataagc ccgagaatat 7560
cgttatcgag atggcccgag agaaccaaac tacccagaag ggacagaaga acagtaggga 7620
aaggatgaag aggattgaag agggtataaa agaactgggg tcccaaatcc ttaaggaaca 7680
cccagttgaa aacacccagc ttcagaatga gaagctctac ctgtactacc tgcagaacgg 7740
cagggacatg tacgtggatc aggaactgga catcaatcgg ctctccgact acgacgtgga 7800
tcatatcgtg ccccagtctt ttctcaaaga tgattctatt gataataaag tgttgacaag 7860
atccgataaa aatagaggga agagtgataa cgtcccctca gaagaagttg tcaagaaaat 7920
gaaaaattat tggcggcagc tgctgaacgc caaactgatc acacaacgga agttcgataa 7980
tctgactaag gctgaacgag gtggcctgtc tgagttggat aaagcaggct tcatcaaaag 8040
gcagcttgtt gagacacgcc agatcaccaa gcacgtggcc caaattctcg attcacgcat 8100
gaacaccaag tacgatgaaa atgacaaact gattcgagag gtgaaagtta ttactctgaa 8160
gtctaagctg gtctcagatt tcagaaagga ctttcagttt tataaggtga gagagatcaa 8220
caattaccac catgcgcatg atgcctacct gaatgcagtg gtaggcactg cacttatcaa 8280
aaaatatccc aagcttgaat ctgaatttgt ttacggagac tataaagtgt acgatgttag 8340
gaaaatgatc gcaaagtctg agcaggaaat aggcaaggcc accgctaagt acttctttta 8400
cagcaatatt atgaattttt tcaagaccga gattacactg gccaatggag agattcggaa 8460
gcgaccactt atcgaaacaa acggagaaac aggagaaatc gtgtgggaca agggtaggga 8520
tttcgcgaca gtccggaagg tcctgtccat gccgcaggtg aacatcgtta aaaagaccga 8580
agtacagacc ggaggcttct ccaaggaaag tatcctcccg aaaaggaaca gcgacaagct 8640
gatcgcacgc aaaaaagatt gggaccccaa gaaatacggc ggattcgatt ctcctacagt 8700
cgcttacagt gtactggttg tggccaaagt ggagaaaggg aagtctaaaa aactcaaaag 8760
cgtcaaggaa ctgctgggca tcacaatcat ggagcgatca agcttcgaaa aaaaccccat 8820
cgactttctc gaggcgaaag gatataaaga ggtcaaaaaa gacctcatca ttaagcttcc 8880
caagtactct ctctttgagc ttgaaaacgg ccggaaacga atgctcgcta gtgcgggcgt 8940
tctgcagaaa ggtaacgagc tggcactgcc ctctaaatac gttaatttct tgtatctggc 9000
cagccactat gaaaagctca aagggtctcc cgaagataat gagcagaagc agctgttcgt 9060
ggaacaacac aaacactacc ttgatgagat catcgagcaa ataagcgaat tctccaaaag 9120
agtgatcctc gccgacgcta acctcgataa ggtgctttct gcttacaata agcacaggga 9180
taagcccatc agggagcagg cagaaaacat tatccacttg tttactctga ccaacttggg 9240
cgcgcctgca gccttcaagt acttcgacac caccatagac agaaagcggt acacctctac 9300
aaaggaggtc ctggacgcca cactgattca tcagtcaatt acggggctct atgaaacaag 9360
aatcgacctc tctcagctcg gtggagactg actcgagcac caccaccacc accactgaga 9420
tccggctgct aacaaagccc gaaaggaagc tgagttggct gctgccaccg ctgagcaata 9480
actagcataa ccccttgggg cctctaaacg ggtcttgagg ggttttttgc tgaaaggagg 9540
aactatatcc gga 9553
<210> 10
<211> 920
<212> DNA
<213> Artificial Sequence
<400> 10
tagtcctgtc gggtttcgcc acctctgact tgagcgtcga tttttgtgat gctcgtcagg 60
ggggcggagc ctatggaaaa acgccagcaa cgcggccttt ttacggttcc tggccttttg 120
ctggcctttt gctcacatgt tctttcctgc gttatcccct gattctgtgg ataaccgtat 180
taccgccttt gagtgagctg ataccgctcg ccgcagccga acgaccgagc gcagcgagtc 240
agtgagcgag gaagcggaag agcgcccaat acgcaaaccg cctctccccg cgcgttggcc 300
gattcattaa tgcagctggc acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa 360
cgcaattaat gtgagttagc tcactcatta ggcaccccag gctttacact ttatgcttcc 420
ggctcgtatg ttgtgtggaa ttgtgagcgg ataacaattt cacacaggaa acagctatga 480
ccatgattac gccaagctcg aaattaaccc tcactaaagg gaacaaaagc tggagctcca 540
ccgcggtggc ggccgctcta gaactagtgg atcccccggg ctgcaggaat tcgatatcaa 600
gcttatcgat taccgctcca gtcgttcatg tggttagagc tagaaatagc aagttaaaat 660
aaggctagtc cgttatcaac ttgaaaaagt ggcaccgagt cggtgctctc gagggggggc 720
ccggtaccca attcgcccta tagtgagtcg tattacaatt cactggccgt cgttttacaa 780
cgtcgtgact gggaaaaccc tggcgttacc caacttaatc gccttgcagc acatccccct 840
ttcgccagct ggcgtaatag cgaagaggcc cgcaccgatc gcccttccca acagttgcgc 900
agcctgaatg gcgaatggaa 920
<210> 11
<211> 97
<212> DNA
<213> Artificial Sequence
<400> 11
taccgctcca gtcgttcatg gttttagagc tagaaatagc aagttaaaat aaggctagtc 60
cgttatcaac ttgaaaaagt ggcaccgagt cggtgct 97
<210> 12
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 12
taccgctcca gtcgttcatg 20

Claims (9)

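1. A structure-based method for the optimized design of a CRISPR protein, characterized in that, from an energy perspective, Rosetta is used to optimize the key amino acids of the Cas9 protein that interact with DNA so as to strengthen the binding of Cas9 to DNA, thereby obtaining a mutant with high PAM compatibility or a low off-target rate, the mutant being denoted yCas9, the specific steps being:
Step 1: identify the amino acid positions that strongly influence the properties of Cas9;
The protein structure is downloaded from the Protein Data Bank, and the electron density maps of the proteins are analyzed and compared to find the amino acid positions that strongly influence PAM recognition or off-target activity of Cas9; the seven directed-evolution mutations of xCas9, namely A262T, R324L, S409I, E480K, E543D, M694I and E1219V, are distributed on both sides of the sgRNA-DNA heteroduplex, and these seven amino acids act together to broaden the PAM recognition range of xCas9 and reduce its off-target effects; these seven positions are therefore selected as the amino acid positions for Rosetta optimization design;
Step 2: Relax energy minimization, selecting the lowest-energy structure as the initial structure;
Relax is a structure prediction module in the Rosetta software; within the Rosetta force field, the module repacks the amino acid side chains of the structure over multiple iterations and continually adjusts the van der Waals forces in each round, searching for a locally optimal conformation of the structure by energy minimization; the Relax module is therefore applied to the SpCas9 structure (PDB ID: 4UN3) for multiple rounds of iteration, and the lowest-energy structure according to the Rosetta scoring function is selected as the initial structure;
Step 3: Fixbb optimization of the protein-DNA interaction interface;
Fixbb is a protein design module in the Rosetta software; the module keeps the protein backbone fixed while letting the side chains rotate freely to find their best positions, and designs the amino acid best suited to each position;
Accordingly, the Fixbb module is used to fix the backbone atoms of the initial structure selected in Step 2 and to perform combinatorial mutation design over the 20 amino acids at the seven positions determined in Step 1 (262, 324, 409, 480, 543, 694 and 1219), with the program outputting 10000 combinations;
Step 4: remove duplicate results and screen the mutants;
The output of the Fixbb module contains many duplicate results; in this step the 10000 outputs are compared by sequence, duplicates are removed, the remaining results are sorted by energy from low to high, and candidates are chosen from them; the mutant with a low score, i.e. closer to the native conformation, is finally selected and named yCas9;
Step 5: expression and validation of the mutant;
For the selected mutant, the plasmid is first constructed, the protein is then expressed and purified, and finally the cleavage activity of the protein is assayed to characterize the performance of the designed mutant;
The resulting yCas9 has the amino acid sequence shown in SEQ ID NO.6, in which amino acids S at position 409, E at position 480, E at position 543, M at position 694 and E at position 1219 of wild-type SpCas9 are mutated to N, K, D, L and T, respectively.
2. An SpCas9 mutant, yCas9, obtained by the optimized design method of claim 1, wherein the mutant yCas9 has gene-editing activity comparable to that of xCas9: a broad PAM recognition range, recognizing NGG, NGA, NGT, NGC, GAA and GAT, and a low off-target rate during gene editing.
3. A polynucleotide sequence that can be transcribed and translated into the yCas9 nuclease, the sequence being SEQ ID NO.7.
4. An expression vector comprising the polynucleotide sequence of claim 3.
5. A host cell for transformation with the expression vector of claim 4.
6. A method for preparing the yCas9 nuclease, comprising the specific steps of: first, constructing an expression vector carrying the polynucleotide sequence of the yCas9 nuclease; then, transforming the expression vector into a host cell, and screening and picking a single clone; finally, inducing expression in the single clone and isolating the yCas9 nuclease from the expression product by affinity chromatography.
7. Use of the yCas9 nuclease of claim 2, the polynucleotide sequence of claim 3 and the expression vector of claim 4 as an editing tool for editing genomic DNA, for the editing of genomic DNA fragments.
8. The use of claim 7, wherein the gene editing is single-site editing or multi-site editing with two or more editing sites; the editing means include deletion, mutation, insertion, inversion, shifting, duplication or translocation.
9. The use of claim 7, wherein yCas9 serves as the editing tool together with a guide sgRNA matching the target DNA fragment to edit the gene of interest.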
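Steps 3 and 4 of claim 1 can be illustrated with a short Python sketch. It is not the applicants' code. The chain ID `B`, the `NATAA` default line, the seven-letter encoding of each design, and the example scores are all assumptions made for illustration. The keywords `NATAA`, `ALLAA` and `start` follow the Rosetta resfile syntax, and the residue numbers written to the resfile must match the numbering of the relaxed input structure.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Seven positions named in claim 1, with the wild-type SpCas9 residue at each.
WILD_TYPE: Dict[int, str] = {262: "A", 324: "R", 409: "S", 480: "E",
                             543: "E", 694: "M", 1219: "E"}
POSITIONS: List[int] = sorted(WILD_TYPE)


def make_resfile(positions: Sequence[int], chain: str = "B") -> str:
    """Build resfile text: repack all residues with their native amino acid
    (NATAA) and allow all 20 amino acids (ALLAA) at the design positions.
    The chain ID is an assumption; check it against the input PDB file."""
    lines = ["NATAA", "start"]
    lines += [f"{pos} {chain} ALLAA" for pos in positions]
    return "\n".join(lines) + "\n"


def dedupe_and_rank(designs: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the lowest score seen for each distinct design, then sort the
    survivors from lowest to highest energy (Step 4)."""
    best: Dict[str, float] = {}
    for residues, score in designs:
        if residues not in best or score < best[residues]:
            best[residues] = score
    return sorted(best.items(), key=lambda item: item[1])


def name_mutations(residues: str) -> List[str]:
    """Turn the residues at the seven positions into labels such as S409N."""
    return [f"{WILD_TYPE[pos]}{pos}{new}"
            for pos, new in zip(POSITIONS, residues)
            if new != WILD_TYPE[pos]]


# Hypothetical Fixbb outputs: (residues at the seven positions, total score).
# The scores are invented placeholders, not results from the patent.
example_designs = [
    ("ARNKDLT", -2101.4),
    ("ARSEEME", -2095.0),
    ("ARNKDLT", -2100.9),  # duplicate sequence, higher energy
    ("TLIKDIV", -2090.2),
]
ranked = dedupe_and_rank(example_designs)
top_mutations = name_mutations(ranked[0][0])
```

With these placeholder inputs, the duplicate `ARNKDLT` entry collapses to a single row, and the top-ranked design is reported with the five substitutions that claim 1 lists for yCas9. In practice the sequences and scores would be read from the Fixbb output structures and score file rather than typed in by hand.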
CN202010666107.5A 2020-07-12 2020-07-12 Structure-based method for the optimized design of a CRISPR protein Active CN111893104B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010666107.5A CN111893104B (zh) 2020-07-12 2020-07-12 Structure-based method for the optimized design of a CRISPR protein

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010666107.5A CN111893104B (zh) 2020-07-12 2020-07-12 Structure-based method for the optimized design of a CRISPR protein

Publications (2)

Publication Number Publication Date
CN111893104A true CN111893104A (zh) 2020-11-06
CN111893104B CN111893104B (zh) 2021-06-04

Family

ID=73193082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010666107.5A Active CN111893104B (zh) Structure-based method for the optimized design of a CRISPR protein 2020-07-12 2020-07-12

Country Status (1)

Country Link
CN (1) CN111893104B (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
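CN112626049A (zh) * 2020-12-14 2021-04-09 Rice Research Institute, Anhui Academy of Agricultural Sciences SpCas9-NRRH mutant recognizing specific sites in rice gene targeting and use thereof
CN116486903A (zh) * 2023-04-17 2023-07-25 深圳新锐基因科技有限公司 Method and device for improving protein stability based on the evolutionary direction of homologous protein sequences combined with free energy change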

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
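CN102156823A (zh) * 2011-02-18 2011-08-17 Fudan University Method for screening compounds targeting the inactive conformation of protein kinases
CN107247885A (zh) * 2017-07-06 2017-10-13 Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences Structure prediction method for voltage-gated sodium ion channels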
WO2019191261A1 (en) * 2018-03-28 2019-10-03 Sanofi Pasteur Inc. Methods of generating broadly protective vaccine compositions comprising neuraminidase

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDREW LEAVER-FAY ET AL.: "ROSETTA3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules", 《METHODS ENZYMOL. AUTHOR MANUSCRIPT》 *
BRIAN J. BENDER ET AL.: "Protocols for Molecular Modeling with Rosetta3 and RosettaScripts", 《BIOCHEMISTRY》 *
Huang Ren, Xue Cheng et al. (eds.): "Bioinformatics Network Resources and Applications" (《生物信息学网络资源与应用》), 30 June 2003, Sun Yat-sen University Press *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
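CN112626049A (zh) * 2020-12-14 2021-04-09 Rice Research Institute, Anhui Academy of Agricultural Sciences SpCas9-NRRH mutant recognizing specific sites in rice gene targeting and use thereof
CN112626049B (zh) * 2020-12-14 2022-04-01 Rice Research Institute, Anhui Academy of Agricultural Sciences SpCas9-NRRH mutant recognizing specific sites in rice gene targeting and use thereof
CN116486903A (zh) * 2023-04-17 2023-07-25 深圳新锐基因科技有限公司 Method and device for improving protein stability based on the evolutionary direction of homologous protein sequences combined with free energy change
CN116486903B (zh) * 2023-04-17 2023-12-29 深圳新锐基因科技有限公司 Method and device for improving protein stability based on the evolutionary direction of homologous protein sequences combined with free energy change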

Also Published As

Publication number Publication date
CN111893104B (zh) 2021-06-04

Similar Documents

Publication Publication Date Title
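CN111893104B (zh) Structure-based method for the optimized design of a CRISPR protein
CN110923183A (zh) Method for constructing a lanosterol-producing Escherichia coli strain
CN106755031B (zh) Rhamnolipid production plasmid, construction method thereof, engineered Escherichia coli strain and use
CN109402109B (zh) Improved overlap extension PCR method
KR101495276B1 (ko) Light-inducible promoter and gene expression system comprising the same
CN110241099B (zh) Truncated variant of the CRISPR nuclease SpCas9 of Streptococcus pyogenes and use thereof
CN115247173A (zh) Gene editing system for constructing nuclear transfer donor cells of iron-deficiency anemia pigs with TMPRSS6 gene mutation and use thereof
CN111748034B (zh) Method for preparing a monoclonal antibody against Mycoplasma synoviae
CN107075495B (zh) Lyase, DNA encoding the lyase, vector containing the DNA, and method for the asymmetric synthesis of (S)-phenylacetylcarbinol
CN106715689B (zh) Lyase and method for the asymmetric synthesis of (S)-phenylacetylcarbinol
CN112501139A (zh) Recombinant Newcastle disease virus strain, and preparation method and use thereof
CN113755512B (zh) Method for preparing tandem repeat proteins and use thereof
RU2792132C1 (ru) Recombinant plasmid pET-GST-3CL providing synthesis of SARS-CoV-2 3CL protease in E. coli cells in soluble form
RU2774333C1 (ru) Recombinant plasmid pET-GST-3CL-GPG providing synthesis of SARS-CoV-2 3CL protease in E. coli cells in soluble form
CN113234746B (zh) Method for pesticide-induced protein interaction and induced gene expression
CN112553177B (zh) Transglutaminase variant with improved thermostability
CN114317473B (zh) Transglutaminase variant with improved catalytic activity and thermostability
CN115161335B (zh) Gene editing system for constructing nuclear transfer donor cells of ALS model pigs with TARDBP gene mutation and use thereof
KR100902634B1 (ko) Nucleic acid delivery complex comprising a recombinant HMGB1 peptide
CN112813087A (zh) Method for preparing SalI restriction endonuclease
KR20220080101A (ko) Chimeric thermostable aminoacyl-tRNA synthetase for enhanced unnatural amino acid incorporation
CN115247175A (zh) Gene editing system for constructing nuclear transfer donor cells of epigenetic dysregulation model pigs with SETDB1 gene mutation and use thereof
CN112662647A (zh) Method for preparing recombinant NcoI restriction endonuclease
CN115247191A (zh) Gene editing system and use thereof in constructing nuclear transfer donor cells of nevoid basal cell carcinoma syndrome pigs with double gene mutations
CN115232818A (zh) Gene editing system for constructing nuclear transfer donor cells of congenital myasthenia model pigs with DOK7 gene mutation and use thereof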

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant