CN115678913A - Use of epigenetic factors to optimize gene editing tools in eukaryotic cells

Info

Publication number: CN115678913A
Application number: CN202111281795.4A
Authority: CN (China)
Legal status: Pending
Inventors: 张学礼 (Zhang Xueli), 毕昌昊 (Bi Changhao), 杨超 (Yang Chao), 董兴啸 (Dong Xingxiao)
Current assignee: Tianjin Institute of Industrial Biotechnology of CAS
Original assignee: Tianjin Institute of Industrial Biotechnology of CAS
Prior art keywords: leu, sequence, lys, ser, protein
Classification: Micro-Organisms Or Cultivation Processes Thereof

Abstract

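The invention discloses the use of epigenetic factors to optimize gene editing tools in eukaryotic cells. The claimed recombinant gene editing system is obtained by modifying an existing gene editing system. It expresses a fusion protein containing a sequence-specific binding protein, a genome-modification-inducing factor and an epigenetic factor, and its gene editing efficiency is higher than that of the parent gene editing system. The recombinant gene editing system may be a vector, an mRNA or a DNA molecule. The recombinant gene editing system, the epigenetic factor, the fusion protein and/or biological materials related to the fusion protein can be used to optimize gene editing; in practical production they can increase gene editing efficiency, enable new editing types and establish new gene editing tools.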

Description

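Use of epigenetic factors to optimize gene editing tools in eukaryotic cells
Technical Field
The invention relates to the field of biotechnology, and in particular to ways of using epigenetic factors to optimize gene editing tools in eukaryotic cells and to applications thereof.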
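Background Art
A gene editing system is a protein system that specifically targets the genome and induces genomic mutations. Existing gene editing systems consist mainly of a sequence-specific binding protein (Cas9 protein, zinc finger nuclease, transcription activator-like effector nuclease, etc.) and a genome-modification-inducing factor (deaminase, transposase, reverse transcriptase, etc.). They can induce mutations, insertions and deletions in the genome and offer hope for treating diseases caused by gene mutations. At present, work on optimizing such systems focuses mainly on evolving the genome-binding protein and the inducing factor, for example the shift of the genome-targeting protein from zinc finger nucleases to the Cas9 protein; likewise, genome-modification-inducing factors have gone from nonexistent to deaminases, transposases and others. Base editing systems are one example. The CBE system is a cytosine base editor: its cytidine deaminase converts the target cytidine directly into uridine, the uracil glycosylase inhibitor protects the resulting uracil from excision, and DNA replication then reads it as thymine, completing a C-to-T conversion without cutting both DNA strands. In the GBE base editing system, uracil glycosylase excises the uracil, and the cell's own DNA repair machinery then produces a C-to-G conversion. However, few studies so far have explored how changes in the environment surrounding the genome affect the function of gene editing systems, especially in eukaryotic cells. In eukaryotic cells the basic structural unit of the genome is the nucleosome, in which DNA is wrapped around histones. This compact chromatin structure is a major obstacle to the binding of genome-targeting proteins to DNA sequences and substantially limits the efficiency of gene editing systems.
Moreover, the chromatin environment of eukaryotic cells includes not only the association of DNA with histones but also chemical modifications of the amino acids in histone tails and of the DNA itself; together these make up a complex chromatin environment. It is to be expected that this complex chromatin environment strongly affects the function of gene editing systems.
The factors currently known to be involved in epigenetics include chromatin remodeling factors, histone modification factors, DNA and RNA modification factors, miRNAs and lncRNAs. Chromatin remodeling factors are complexes that interact with histones and thereby induce changes in the chromatin environment. Histone modification factors are the main proteins responsible for post-translational modification of histones, including methylation, acetylation and ubiquitination factors, and have important effects on transcriptional regulation and on the repair of DNA strand breaks. DNA and RNA modifications are chemical modifications of the DNA or RNA itself, most commonly methylation of cytosine and of adenine; these modifications likewise affect transcriptional regulation and DNA repair.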
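Summary of the Invention
The technical problem to be solved by the invention is how to optimize gene editing systems and/or how to use epigenetic factors to optimize gene editing systems.
To solve this technical problem, the invention first provides a recombinant gene editing system. The recombinant gene editing system may be obtained by modifying a gene editing system. The recombinant gene editing system may express a fusion protein. The fusion protein may contain a sequence-specific binding protein, a genome-modification-inducing factor and an epigenetic factor. The gene editing efficiency of the recombinant gene editing system is higher than that of the gene editing system.
The recombinant gene editing system described above may be a vector, an mRNA or a DNA molecule.
The gene editing system may contain a gene encoding the sequence-specific binding protein and a gene encoding the genome-modification-inducing factor. The gene editing system may express a fusion protein containing the sequence-specific binding protein and the genome-modification-inducing factor.
In the recombinant gene editing system described above, the gene editing system may be a base editing system. The base editing system may be a CBE base editing system or a GBE base editing system. The CBE base editing system may be the BE4max base editing system. The gene editing system may also be another gene editing system, such as a prime editor or a transposase-based gene editing system. The recombinant gene editing system described above may be a recombinant base editing system.
In the recombinant gene editing system described above, the sequence-specific binding protein may be a Cas9 protein. The genome-modification-inducing factor may be a deaminase.
In the recombinant base editing system described above, the sequence-specific binding protein may also be a zinc finger nuclease, a transcription activator-like effector nuclease or the like. The genome-modification-inducing factor may also be a transposase, a reverse transcriptase or the like.
In the recombinant gene editing system described above, the epigenetic factor may be a chromatin remodeling factor, a histone modification factor and/or an RNA modification factor.
In the recombinant gene editing system described above, the gene editing system may be a CBE base editing system or a GBE base editing system. The deaminase may be a cytidine deaminase. The Cas9 protein may be nCas9.
In the recombinant gene editing system described above, the Cas9 protein may also be another Cas9 protein. The deaminase may also be another deaminase. The fusion protein may also contain other proteins. The other protein may be uracil glycosylase inhibitor protein (UGI) and/or uracil glycosylase (UDG). The amino acid sequence of the uracil glycosylase inhibitor protein may be positions 2006-2088 of SEQ ID NO: 1 in the sequence listing. The amino acid sequence of the uracil glycosylase may be positions 1993-2224 of SEQ ID NO: 3 in the sequence listing.
In the recombinant gene editing system described above, the amino acid sequence of the cytidine deaminase may be positions 20-247 of SEQ ID NO: 1 in the sequence listing. The amino acid sequence of the nCas9 protein may be positions 629-1995 of SEQ ID NO: 1 in the sequence listing.
In the recombinant gene editing system described above, the chromatin remodeling factor may be SOX2. The histone modification factor may be SETD2. The RNA modification factor may be METTL3.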
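The SOX2 may be any one of A1), A2), A3), A4) or A5):
A1) A protein whose amino acid sequence is positions 280-596 of SEQ ID NO: 1 in the sequence listing.
A2) A protein encoded by the nucleotide sequence at positions 838-1053 of SEQ ID NO: 6 in the sequence listing.
A3) A protein encoded by the nucleotide sequence at positions 838-1188 of SEQ ID NO: 7 in the sequence listing.
A4) A protein that is derived from A1), A2) or A3) by substitution and/or deletion and/or addition of one or more amino acid residues and has the same function, or a protein that has 80% or more identity to the protein of A1), A2) or A3) and has the same function.
A5) A fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of A1), A2), A3) or A4).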
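In the recombinant gene editing system described above, the SETD2 may be any one of B1), B2) or B3):
B1) A protein encoded by the nucleotide sequence at positions 25-915 of SEQ ID NO: 9 in the sequence listing. The amino acid sequence of this SETD2 consists of 297 amino acid residues.
B2) A protein that is derived from B1) by substitution and/or deletion and/or addition of one or more amino acid residues and has the same function, or a protein that has 80% or more identity to the protein of B1) and has the same function.
B3) A fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of B1) or B2).
In the recombinant gene editing system described above, the METTL3 may be any one of C1), C2) or C3):
C1) A protein encoded by the nucleotide sequence at positions 25-1761 of SEQ ID NO: 10 in the sequence listing. The amino acid sequence of this METTL3 consists of 579 amino acid residues.
C2) A protein that is derived from C1) by substitution and/or deletion and/or addition of one or more amino acid residues and has the same function, or a protein that has 80% or more identity to the protein of C1) and has the same function.
C3) A fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of C1) or C2).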
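In the recombinant gene editing system described above, the fusion protein may be any one of D1), D2), D3), D4), D5), D6), D7), D8) or D9):
D1) A protein whose amino acid sequence is SEQ ID NO: 1 in the sequence listing.
D2) A protein whose amino acid sequence is SEQ ID NO: 3 in the sequence listing.
D3) A protein encoded by the nucleotide sequence at positions 1-5871 of SEQ ID NO: 6 in the sequence listing. The amino acid sequence of this fusion protein consists of 1957 amino acid residues.
D4) A protein encoded by the nucleotide sequence at positions 1-6006 of SEQ ID NO: 7 in the sequence listing. The amino acid sequence of this fusion protein consists of 2002 amino acid residues.
D5) A protein encoded by the nucleotide sequence at positions 1-6072 of SEQ ID NO: 8 in the sequence listing. The amino acid sequence of this fusion protein consists of 2024 amino acid residues.
D6) A protein encoded by the nucleotide sequence at positions 1-6612 of SEQ ID NO: 9 in the sequence listing. The amino acid sequence of this fusion protein consists of 2204 amino acid residues.
D7) A protein encoded by the nucleotide sequence at positions 1-7458 of SEQ ID NO: 10 in the sequence listing. The amino acid sequence of this fusion protein consists of 2486 amino acid residues.
D8) A protein that is derived from D1), D2), D3), D4), D5), D6) or D7) by substitution and/or deletion and/or addition of one or more amino acid residues and has the same function, or a protein that has 80% or more identity to the protein of D1), D2), D3), D4), D5), D6) or D7) and has the same function.
D9) A fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of D1), D2), D3), D4), D5), D6), D7) or D8).
The protein tag is a polypeptide or protein that is expressed as a fusion with the target protein by means of in vitro recombinant DNA technology, in order to facilitate expression, detection, tracing and/or purification of the target protein. The protein tag may be a Flag tag, His tag, MBP tag, HA tag, myc tag, GST tag and/or SUMO tag, among others.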
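The fusion protein described above and biological materials related to it also fall within the scope of protection of the invention. The biological material may be any of the following:
E1) A nucleic acid molecule encoding the fusion protein described above.
E2) An expression cassette containing the nucleic acid molecule of E1).
E3) A recombinant vector containing the nucleic acid molecule of E1), or a recombinant vector containing the expression cassette of E2).
E4) A recombinant microorganism containing the nucleic acid molecule of E1), or a recombinant microorganism containing the expression cassette of E2), or a recombinant microorganism containing the recombinant vector of E3).
E5) A transgenic plant cell line containing the nucleic acid molecule of E1), or a transgenic cell line containing the expression cassette of E2), or a transgenic cell line containing the recombinant vector of E3).
E6) A transgenic plant tissue containing the nucleic acid molecule of E1), or a transgenic tissue containing the expression cassette of E2), or a transgenic tissue containing the recombinant vector of E3).
E7) A transgenic animal organ containing the nucleic acid molecule of E1), or a transgenic animal organ containing the expression cassette of E2), or a transgenic animal organ containing the recombinant vector of E3).
In the biological material described above, the nucleic acid molecule of E1) may be any of the following:
E11) A DNA molecule whose coding sequence is SEQ ID NO: 2 in the sequence listing.
E12) A DNA molecule whose coding sequence is SEQ ID NO: 4 in the sequence listing.
E13) A DNA molecule whose coding sequence is SEQ ID NO: 6 in the sequence listing.
E14) A DNA molecule whose coding sequence is SEQ ID NO: 7 in the sequence listing.
E15) A DNA molecule whose coding sequence is SEQ ID NO: 8 in the sequence listing.
E16) A DNA molecule whose coding sequence is SEQ ID NO: 9 in the sequence listing.
E17) A DNA molecule whose coding sequence is SEQ ID NO: 10 in the sequence listing.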
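To solve the above technical problem, the invention also provides the use of the epigenetic factor described above in increasing the gene editing efficiency of a gene editing system.
To solve the above technical problem, the invention also provides the use of the epigenetic factor described above and/or the fusion protein described above and/or its related biological materials in gene editing.
The purpose of the use may be disease diagnosis, disease prognosis and/or disease treatment, or it may be a purpose other than disease diagnosis, disease prognosis and disease treatment. The direct purpose of the use may be to obtain information on intermediate results toward a disease diagnosis, disease prognosis and/or disease treatment result, or it may be a purpose other than disease diagnosis, disease prognosis and/or disease treatment.
In the invention, genes encoding epigenetic factors were introduced into the CBE and GBE base editing systems to obtain recombinant base editing systems that express fusion proteins containing the epigenetic factor. Plasmids of these recombinant base editing systems were co-transfected into HEK293T cells together with gRNA plasmids targeting specific genes, and base editing efficiency was analyzed. Compared with the CBE and GBE base editing systems, the recombinant base editing systems increased the base editing range and the base editing efficiency to varying degrees. In practical production, the recombinant gene editing system provided by the invention and/or the use of epigenetic factors in optimizing gene editing systems can increase gene editing efficiency, achieve new editing outcomes and establish new gene editing tools.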
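Brief Description of the Drawings
Figure 1 is a diagram of the base editing systems fused with pioneer factors.
Figure 2 shows the editing results of the editors fused with pioneer factor proteins.
Figure 3 is a schematic of cell transfection and selection.
Figure 4 shows the effect of fusing the pioneer factor SOX2 protein on the editing results of CBE (A) and GBE (B) in HEK293T (293T) cells. In (A), the vertical axis is the C-to-T conversion efficiency and the horizontal axis is the position of the cytosine within the protospacer; in (B), the vertical axis is the C-to-G conversion efficiency.
Figure 5 shows the editing efficiency at the cytosines of the MYC stop-mutation target site (A) and the editing efficiency at C11, the cytosine that induces the stop mutation (B). In (A), the vertical axis is the C-to-T conversion efficiency; in (B), the vertical axis is the C-to-T conversion efficiency of the cytosine at position 11.
Figure 6 shows the effect of splitting SOX2 into domains on the editing efficiency of CBE (A) and GBE (B). In (A), the vertical axis is the C-to-T conversion efficiency and the horizontal axis is the position of the cytosine within the protospacer; in (B), the vertical axis is the C-to-G conversion efficiency and the horizontal axis shows GBE and the GBE variants built on the different SOX2 domains.
Figure 7 shows the effect of the SETD2 and METTL3 proteins on GBE editing efficiency. The vertical axis is the C-to-G conversion efficiency.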
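Detailed Description of the Embodiments
The invention is described in further detail below with reference to specific embodiments. The examples are given only to illustrate the invention and not to limit its scope. The examples provided below may serve as a guide for further improvement by those of ordinary skill in the art and do not limit the invention in any way.
Unless otherwise specified, the experimental methods in the following examples are conventional methods, carried out according to the techniques or conditions described in the literature of the field or according to the product instructions. Unless otherwise specified, the materials and reagents used in the following examples are commercially available.
In the examples of the invention, the BE4max base editing system expression plasmid was obtained from Addgene (#112093). The GBE base editing system expression plasmid (APOBEC-nCas9-Ung) and the gRNA expression plasmid backbone (RNF2 sgRNA) were kept in the laboratory (reference: Zhao D, Li J, Li S, et al. New base editors change C to A in bacteria and C to G in mammalian cells [J]. Nature Biotechnology, 2021, 39(1); available to the public from the applicant, solely for repeating the invention). HEK293T cells (293T) and cervical cancer cells (HeLa) were a gift from Professor 尚永丰 (Shang Yongfeng) of Peking University Health Science Center (reference: Yang C, Wu J, Liu X, et al. Circadian Rhythm Is Disrupted by ZNF704 in Breast Carcinogenesis [J]. Cancer Research, 2020, 80(19): canres.0493.2020.).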
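Example 1: Screening a class of epigenetic factors, pioneer factor proteins, for optimizing base editing systems
1. Pioneer factors tested
Pioneer factors are a class of chromatin remodeling factors. The screening criteria were: (1) the pioneer-factor function has been reported in the literature; (2) the coding sequence is between 500 bp and 2000 bp long; (3) the functional domains of the pioneer factor are clearly delineated.
After screening, the following pioneer factor proteins were selected for validation: PAX7 (NG_023262), PBX1 (NG_028246), FOXA1 (NG_033028) and SOX2 (NG_009080; amino acid sequence at positions 280-596 of SEQ ID NO: 1 in the sequence listing).
2. Construction of recombinant gene editing systems containing pioneer factor proteins
2.1 Primer design and PCR amplification
RNA was extracted from 293T (HEK293T) cells with an RNA extraction kit and reverse-transcribed to obtain a cDNA library.
Upstream and downstream primers (Table 1) were designed from the gene sequences of the pioneer factors selected in step 1, and PCR was performed with the cDNA library as template to obtain the gene sequences of the selected pioneer factors PAX7, PBX1, FOXA1 and SOX2 (the SOX2 nucleotide sequence is positions 838-1788 of SEQ ID NO: 2 in the sequence listing).
Table 1. PCR amplification primers
[Table 1 is provided as an image in the original: Figure BDA0003331360480000051]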
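2.2 Construction of editing-system expression plasmids with different pioneer-factor arrangements
The BE4max base editing system (a CBE base editing system) expresses the fusion protein APOBEC1-nCas9-2xUGI; the GBE base editing system expresses the fusion protein APOBEC1-nCas9-UNG.
Here APOBEC1 is the cytidine deaminase (amino acid sequence at positions 20-247 of SEQ ID NO: 1 in the sequence listing), nCas9 is the Cas9 protein (positions 629-1995 of SEQ ID NO: 1), UGI is the uracil glycosylase inhibitor protein (positions 2006-2088 of SEQ ID NO: 1), and UNG is uracil glycosylase (positions 1993-2224 of SEQ ID NO: 3).
Starting from the BE4max and GBE base editing systems, the gene sequences of the four pioneer factors amplified in step 2.1 were each integrated into the BE4max base editing system plasmid and the GBE base editing system plasmid by recombination, using a seamless cloning kit (Beyotime, cat. no. D7010S).
The position at which the pioneer-factor sequence is inserted into the base editing system plasmid determines where the pioneer factor sits relative to the deaminase and the Cas protein in the expressed fusion protein, and this arrangement may affect editing. The seamless cloning kit was therefore used to insert each of the four pioneer-factor gene sequences (PAX7, PBX1, FOXA1 and SOX2) at different positions relative to the APOBEC1 and nCas9 coding sequences on the CBE and GBE base editing system plasmids. As shown in Figure 1, this gave new fusion proteins in which the pioneer factor is located at the amino terminus of APOBEC1 (NH3), in the middle (Middle, i.e. between APOBEC1 and nCas9) or at the carboxyl terminus of nCas9 (COOH) of the fusion protein APOBEC1-nCas9-2xUGI or APOBEC1-nCas9-UNG, so that the editing effect of each arrangement could be tested.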
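The results confirmed that the recombinant base editing system SoxM-CBE (SOX2-middle-CBE in Figure 2A), obtained by inserting the coding sequence of the pioneer factor SOX2 at the middle position of the BE4max base editing system plasmid, had a higher base editing efficiency than both the BE4max base editing system (BE4max in Figure 2A) and the recombinant base editing systems obtained by inserting the coding sequences of the other three pioneer factors at the same position (PAX7-middle-CBE, PBX1-middle-CBE and FOXA1-middle-CBE in Figure 2A). The recombinant base editing system SoxN-GBE (SOX2-NH3-GBE; SOX2-GBE(N) in Figure 2B), obtained by inserting the SOX2 coding sequence at the amino-terminal position of the GBE base editing system plasmid, had a higher base editing efficiency at the RP11 site than both the GBE base editing system (GBE in Figure 2B) and the recombinant base editing systems obtained by inserting the coding sequences of the other three pioneer factors at the same position (PAX7-GBE(N), PBX1-GBE(N) and FOXA1-GBE(N) in Figure 2B).
The SoxM-CBE recombinant base editing system plasmid, derived from the BE4max base editing system, contains the recombinant gene APOBEC1-SOX2-nCas9-2xUGI, whose nucleotide sequence is SEQ ID NO: 2 in the sequence listing, and expresses the fusion protein APOBEC1-SOX2-nCas9-2xUGI, whose amino acid sequence is SEQ ID NO: 1 in the sequence listing.
From N-terminus to C-terminus, the fusion protein APOBEC1-SOX2-nCas9-2xUGI consists of cytidine deaminase 1 (APOBEC1), the pioneer factor SOX2, the Cas9 protein (nCas9) and the uracil glycosylase inhibitor protein (UGI). Nucleotides 58-741 of SEQ ID NO: 2 are the coding sequence of the APOBEC1 gene, nucleotides 1885-5985 of SEQ ID NO: 2 are the coding sequence of the nCas9 gene, and nucleotides 6016-6264 of SEQ ID NO: 2 are the coding sequence of the UGI gene.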
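The residue ranges in SEQ ID NO: 1 and the nucleotide ranges in SEQ ID NO: 2 quoted in this example are related by simple codon arithmetic, which the following minimal sketch checks. It assumes that the reading frame starts at nucleotide 1 of SEQ ID NO: 2 and that all positions are 1-based and inclusive. The function name `aa_to_nt` and the dictionary names are illustrative and do not come from the patent.

```python
def aa_to_nt(aa_start, aa_end):
    """Map a 1-based inclusive residue range to the 1-based inclusive
    nucleotide range of the codons encoding it (frame starts at nt 1)."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    return 3 * (aa_start - 1) + 1, 3 * aa_end


# Residue ranges in SEQ ID NO: 1 as stated in the description.
DOMAINS_AA = {
    "APOBEC1": (20, 247),
    "SOX2": (280, 596),
    "nCas9": (629, 1995),
    "UGI": (2006, 2088),
}

# Derived nucleotide ranges in SEQ ID NO: 2.
DOMAINS_NT = {name: aa_to_nt(*span) for name, span in DOMAINS_AA.items()}

for name, (start, end) in DOMAINS_NT.items():
    print(f"{name}: nt {start}-{end}")
```

Under these assumptions the derived ranges agree with the nucleotide positions given in the text for APOBEC1, SOX2, nCas9 and the first UGI copy.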
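The SoxN-GBE recombinant base editing system plasmid, derived from the GBE base editing system, contains the recombinant gene SOX2-APOBEC1-nCas9-UNG, whose nucleotide sequence is SEQ ID NO: 4 in the sequence listing, and expresses the fusion protein SOX2-APOBEC1-nCas9-UNG, whose amino acid sequence is SEQ ID NO: 3 in the sequence listing.
From N-terminus to C-terminus, the fusion protein SOX2-APOBEC1-nCas9-UNG consists of the pioneer factor SOX2, cytidine deaminase 1 (APOBEC1), the Cas9 protein (nCas9) and uracil glycosylase (UNG). Nucleotides 5977-6672 of SEQ ID NO: 4 are the coding sequence of the UNG gene.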
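Example 2: Validating in 293T cells the optimizing effect of pioneer factors on base editing systems
1. Construction of recombinant gRNA expression plasmids
gRNA expression plasmids were constructed according to the characteristics of the GBE and BE4max base editing systems, 10 gRNA expression plasmids for each base editing system (Table 2).
Table 2. Primers for sgRNA cloning and deep sequencing
[Table 2 is provided as an image in the original: Figure BDA0003331360480000071]
Specifically, upstream and downstream primers were designed with the RNF2 sgRNA sequence as template (target site names and the corresponding amplification primers are listed in Table 2). The primers were annealed to form double-stranded DNA encoding the gRNA, which was ligated into the gRNA expression plasmid backbone (RNF2 sgRNA) by the Golden Gate method (BsaI enzyme, Thermo), giving 18 gRNA expression plasmids in total.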
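2. Transfection of 293T cells with the recombinant plasmids
The plasmids of the recombinant base editing systems SoxM-CBE and SoxN-GBE from Example 1 and the 18 recombinant gRNA expression plasmids from step 1 of Example 2 were each transformed into E. coli Trans5α (Transgene) for amplification, and the recombinant base editing system plasmids and gRNA expression plasmids were then isolated with a plasmid extraction kit (Tiangen).
293T cells grown to 90% confluence were passaged into 24-well cell culture plates, and transfection was performed the next day. Each transfection used 600 ng of SoxM-CBE or SoxN-GBE plasmid and 300 ng of each gRNA expression plasmid (SoxN-GBE+gRNA, SoxM-CBE+gRNA), together with the transfection reagent PEI (Polysciences, USA; 3 μL of PEI per 1 μg of plasmid) and Opti-MEM medium (Gibco; 100 μL of medium per 1 μg of plasmid). The mixture was mixed thoroughly, left to stand for 15 min and added to the 293T cells. 24 h after transfection, the medium was replaced with medium (Gibco) containing puromycin (Sigma; 1:2500), and culture was continued for 6 days in total, with the medium changed every two days (Figure 3).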
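The reagent amounts in this step scale with the total plasmid mass, so the per-well volumes can be worked out with a short calculation. The sketch below is only an illustration: the function names are hypothetical, the 1:2500 puromycin ratio is assumed to be a volume ratio of stock to medium, and the medium volume per well is not stated in the text, so it is passed in as a parameter.

```python
def transfection_mix(editor_ng=600, grna_ng=300,
                     pei_ul_per_ug=3, optimem_ul_per_ug=100):
    """Return per-well reagent volumes (uL) scaled to total plasmid mass."""
    total_ng = editor_ng + grna_ng
    return {
        "total_plasmid_ug": total_ng / 1000,
        "pei_ul": total_ng * pei_ul_per_ug / 1000,
        "optimem_ul": total_ng * optimem_ul_per_ug / 1000,
    }


def puromycin_stock_ul(medium_ul, dilution=2500):
    """Volume of puromycin stock for a 1:dilution (v/v, assumed) mix."""
    return medium_ul / dilution


mix = transfection_mix()
print(mix)
# 500 uL of medium per well is a hypothetical value, not taken from the text.
print(puromycin_stock_ul(500))
```

With the stated 600 ng of editor plasmid and 300 ng of gRNA plasmid, this gives 0.9 μg of total plasmid, 2.7 μL of PEI and 90 μL of Opti-MEM per well.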
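3. Verifying the editing efficiency of the recombinant base editing systems fused with the pioneer factor SOX2
Six days after the transfection in step 2, the transfected cells were collected, washed once with PBS and lysed with an appropriate amount of cell lysis buffer to extract genomic DNA. With this genomic DNA as template, the target fragments were amplified by PCR, recovered with a purification kit and analyzed by deep sequencing. The deep sequencing data were analyzed with CRISPResso2 on Linux, and the editing efficiency and editing window of the different pioneer-factor-fused base editing systems were tallied and compared (Figure 4). The efficient editing range of BE4max was positions 4-9, whereas that of the optimized recombinant base editing system SoxM-CBE was positions 5-16, so SoxM-CBE showed a broader editing range (Figure 4A). The average editing efficiency of the GBE base editing system was 13.73%, whereas that of the optimized recombinant base editing system SoxN-GBE was 28.32%, roughly a twofold increase, so SoxN-GBE showed a higher editing efficiency (Figure 4B).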
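The per-position conversion efficiency reported here is, in essence, the fraction of sequencing reads that carry the edited base at each cytosine of the protospacer. The following sketch illustrates that calculation only; it is not CRISPResso2 and does not reproduce its alignment or filtering. It assumes that reads are already aligned to the reference without indels. The function names are hypothetical, and the four reads are made-up toy data built from the MYC protospacer quoted in Example 3.

```python
from collections import Counter


def conversion_rates(reference, reads, ref_base="C", edited_base="T"):
    """Return {1-based position: fraction of reads with edited_base}
    for every ref_base in the reference (reads pre-aligned, no indels)."""
    usable = [r for r in reads if len(r) == len(reference)]
    rates = {}
    for i, base in enumerate(reference):
        if base != ref_base:
            continue
        counts = Counter(read[i] for read in usable)
        rates[i + 1] = counts[edited_base] / len(usable) if usable else 0.0
    return rates


def edit(seq, positions, new_base="T"):
    """Return seq with the given 1-based positions replaced (toy helper)."""
    chars = list(seq)
    for pos in positions:
        chars[pos - 1] = new_base
    return "".join(chars)


protospacer = "CACGGCCGACCAGCTGGAGA"  # MYC target site from Example 3
toy_reads = [                          # illustrative reads, not real data
    protospacer,
    edit(protospacer, [11]),
    edit(protospacer, [6, 11]),
    protospacer,
]
rates = conversion_rates(protospacer, toy_reads)
for position, rate in sorted(rates.items()):
    print(f"C{position}: {rate:.0%}")
```

Averaging such per-position or per-site values over all target sites gives a summary figure of the kind quoted for GBE and SoxN-GBE.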
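Example 3: Using the SoxM-CBE system to induce a stop mutation in the proto-oncogene MYC in cervical cancer cells
1. Transfection of cervical cancer cells with the SoxM-CBE base editing system plasmid
HeLa cervical cancer cells were passaged; when they reached 90% confluence, they were split into 24-well plates, and transfection was performed the next day. Each transfection used 600 ng of the recombinant base editing system plasmid from Example 1 (SoxM-CBE) and 300 ng of one recombinant gRNA plasmid from step 1 of Example 2 (gRNA target site sequence: 5'-CACGGCCGACCAGCTGGAGA-3'; the target gene is MYC (MYC-site in Table 2)), together with the transfection reagent PEI (3 μL of PEI per 1 μg of plasmid) and Opti-MEM medium (100 μL of medium per 1 μg of plasmid). The mixture was mixed thoroughly, left to stand for 15 min and added to the cells. 24 h after transfection, the medium was replaced with selection medium containing puromycin (1:2500), and culture was continued for 6 days in total, with the selection medium changed every two days.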
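2. Analysis of the editing efficiency of the recombinant base editing system
The transfected cells from step 1 were collected, washed once with PBS and lysed with an appropriate amount of cell lysis buffer to extract genomic DNA. Primers flanking the MYC target site were designed (F: 5'-CCCTCCTACGTTGCGGTCA-3', R: 5'-CGAGAAGCCGCTCCACAT-3'), and PCR with the extracted genomic DNA as template gave the PCR product (SEQ ID NO: 5 in the sequence listing). The PCR product was recovered with a purification kit and analyzed by deep sequencing to determine the efficiency of stop-codon-inducing edits at the MYC target site (Figure 5). Compared with the control BE4max base editing system, the recombinant base editing system SoxM-CBE showed a C-to-T conversion efficiency of nearly 30% at C11, the cytosine at position 11 of the editing window.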
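Why C11 matters can be seen directly from the protospacer sequence: a C-to-T change at a cytosine that begins a CAA, CAG or CGA triplet produces a TAA, TAG or TGA stop codon. The sketch below lists the cytosines of the MYC target site, flags those that begin such a triplet, and shows which cytosines fall inside the efficient editing ranges reported in Example 2 (positions 4-9 for BE4max, 5-16 for SoxM-CBE). It is frame-agnostic: whether a flagged triplet is actually in the MYC reading frame is not checked here and is an assumption left to the reader. All function names are illustrative.

```python
STOP_AFTER_C_TO_T = {"CAA": "TAA", "CAG": "TAG", "CGA": "TGA"}


def cytosine_positions(protospacer):
    """1-based positions of every C in the protospacer."""
    return [i + 1 for i, base in enumerate(protospacer) if base == "C"]


def stop_candidates(protospacer):
    """Cytosines whose C-to-T change turns the triplet starting there
    into a stop codon (reading frame is NOT verified)."""
    hits = []
    for pos in cytosine_positions(protospacer):
        triplet = protospacer[pos - 1:pos + 2]
        if triplet in STOP_AFTER_C_TO_T:
            hits.append((pos, triplet, STOP_AFTER_C_TO_T[triplet]))
    return hits


def in_window(positions, first, last):
    """Positions that fall within an inclusive editing range."""
    return [p for p in positions if first <= p <= last]


myc_site = "CACGGCCGACCAGCTGGAGA"
cs = cytosine_positions(myc_site)
print("cytosines:", cs)
print("stop candidates:", stop_candidates(myc_site))
print("within 4-9:", in_window(cs, 4, 9))
print("within 5-16:", in_window(cs, 5, 16))
```

For this site the triplet starting at C11 is CAG, so a C-to-T change there gives TAG, and position 11 lies outside the 4-9 range but inside the 5-16 range.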
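In summary, the invention demonstrates that inserting the pioneer factor SOX2 at the amino terminus of the deaminase in the GBE base editing system (SoxN-GBE: SOX2-APOBEC1-nCas9-UNG) or at the carboxyl terminus of the deaminase in the CBE base editing system (SoxM-BE4max: APOBEC1-SOX2-nCas9-UGI) gives recombinant base editing systems that, respectively, increase the editing efficiency of GBE and broaden the editing range of CBE.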
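Example 4: Exploring the effect of the functional domains of the pioneer factor SOX2 protein on base editing
1. Splitting SOX2 into functional domains and constructing recombinant base editing systems based on them
The SOX2 protein comprises three main functional domains (Figure 6): HMG (high mobility group), RBD (RNA binding domain) and SAD (SOX2 activation domain). With the cDNA from step 2.1 of Example 1 as template, DNA fragments of the three functional domains were amplified by PCR (primers in Table 1) and then ligated into the BE4max base editing system plasmid and the GBE base editing system plasmid with the seamless cloning kit. Screening gave six recombinant base editing systems based on the three functional domains: HMG-Middle-CBE, RBD-Middle-CBE, SAD-Middle-CBE, HMG-NH3-GBE, RBD-NH3-GBE and SAD-NH3-GBE.
2. Cell-level validation of the base editing systems fused with different domains
600 ng of each of the six recombinant base editing system plasmids from step 1 and 300 ng of one recombinant gRNA plasmid from step 1 of Example 2 (target gene FANCF; target site sequence and amplification primers in Table 2) were co-transfected into 293T cells by the transfection method of step 1 of Example 3, and the base editing effect was measured. Compared with the CBE base editing system, the recombinant base editing systems HMG-Middle-CBE (HmgM-CBE in Figure 6A) and SAD-Middle-CBE (SadM-CBE in Figure 6A) both showed a clear increase in efficiency at cytosines toward the rear of the protospacer (a broadened editing range). Compared with the GBE base editing system, the recombinant base editing system SAD-NH3-GBE (SadN-GBE in Figure 6B) showed a clear improvement in editing of the cytosine at position 6 of the editing window of the VISTA sequence (Figure 6B).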
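The HmgM-CBE recombinant base editing system plasmid, derived from the CBE base editing system, contains the recombinant gene APOBEC1-HMG-nCas9-2xUGI, whose nucleotide sequence is SEQ ID NO: 6 in the sequence listing, and expresses the fusion protein APOBEC1-HMG-nCas9-2xUGI. The amino acid sequence of this fusion protein consists of 1957 amino acid residues and is encoded by the nucleotide sequence at positions 1-5871 of SEQ ID NO: 6.
From N-terminus to C-terminus, the fusion protein APOBEC1-HMG-nCas9-2xUGI consists of cytidine deaminase 1 (APOBEC1), the HMG domain of the pioneer factor SOX2, the Cas9 protein (nCas9) and the uracil glycosylase inhibitor protein (UGI). The amino acid sequence of the HMG domain consists of 72 amino acid residues and is encoded by the nucleotide sequence at positions 838-1053 of SEQ ID NO: 6.
The SadM-CBE recombinant base editing system plasmid, derived from the CBE base editing system, contains the recombinant gene APOBEC1-SAD-nCas9-2xUGI, whose nucleotide sequence is SEQ ID NO: 7 in the sequence listing, and expresses the fusion protein APOBEC1-SAD-nCas9-2xUGI. The amino acid sequence of this fusion protein consists of 2002 amino acid residues and is encoded by the nucleotide sequence at positions 1-6006 of SEQ ID NO: 7.
From N-terminus to C-terminus, the fusion protein APOBEC1-SAD-nCas9-2xUGI consists of cytidine deaminase 1 (APOBEC1), the SAD domain of the pioneer factor SOX2, the Cas9 protein (nCas9) and the uracil glycosylase inhibitor protein (UGI). The amino acid sequence of the SAD domain consists of 117 amino acid residues and is encoded by the nucleotide sequence at positions 838-1188 of SEQ ID NO: 7.
The SadN-GBE recombinant base editing system plasmid, derived from the GBE base editing system, contains the recombinant gene SAD-APOBEC1-nCas9-UNG, whose nucleotide sequence is SEQ ID NO: 8 in the sequence listing, and expresses the fusion protein SAD-APOBEC1-nCas9-UNG. The amino acid sequence of this fusion protein consists of 2024 amino acid residues and is encoded by the nucleotide sequence at positions 1-6072 of SEQ ID NO: 8.
From N-terminus to C-terminus, the fusion protein SAD-APOBEC1-nCas9-UNG consists of the SAD domain of the pioneer factor SOX2, cytidine deaminase (APOBEC1), the Cas9 protein (nCas9) and uracil glycosylase (UNG).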
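Example 5: Effect of the epigenetic factors SETD2 and METTL3 on the GBE editing system
1. Construction of recombinant GBE base editing systems based on the SETD2 and METTL3 proteins
Based on a literature search, the epigenetic factors histone methyltransferase SETD2 and RNA methyltransferase METTL3 were selected for optimizing the base editing system, and recombinant GBE base editing systems were constructed. With 293T cDNA as template, the fragment encoding the core catalytic domain of SETD2 (nucleotides 25-915 of SEQ ID NO: 9 in the sequence listing) and the METTL3 gene fragment (nucleotides 25-1761 of SEQ ID NO: 10 in the sequence listing) were amplified by PCR (primers in Table 1) and then each ligated into the GBE base editing system plasmid with the seamless cloning kit. Screening gave the new recombinant base editing systems SETD2-NH3-GBE and METTL3-NH3-GBE.
The SETD2-NH3-GBE recombinant base editing system plasmid, derived from the GBE base editing system, contains the recombinant gene SETD2-APOBEC1-nCas9-UNG, whose nucleotide sequence is SEQ ID NO: 9 in the sequence listing, and expresses the fusion protein SETD2-APOBEC1-nCas9-UNG. The amino acid sequence of this fusion protein consists of 2204 amino acid residues and is encoded by the nucleotide sequence at positions 1-6612 of SEQ ID NO: 9.
From N-terminus to C-terminus, the fusion protein SETD2-APOBEC1-nCas9-UNG consists of the histone methyltransferase SETD2, cytidine deaminase (APOBEC1), the Cas9 protein (nCas9) and uracil glycosylase (UNG). The amino acid sequence of SETD2 consists of 297 amino acid residues and is encoded by the nucleotide sequence at positions 25-915 of SEQ ID NO: 9.
The METTL3-NH3-GBE recombinant base editing system plasmid, derived from the GBE base editing system, contains the recombinant gene METTL3-APOBEC1-nCas9-UNG, whose nucleotide sequence is SEQ ID NO: 10 in the sequence listing, and expresses the fusion protein METTL3-APOBEC1-nCas9-UNG. The amino acid sequence of this fusion protein consists of 2486 amino acid residues and is encoded by the nucleotide sequence at positions 1-7458 of SEQ ID NO: 10.
From N-terminus to C-terminus, the fusion protein METTL3-APOBEC1-nCas9-UNG consists of the RNA methyltransferase METTL3, cytidine deaminase 1 (APOBEC1), the Cas9 protein (nCas9) and uracil glycosylase (UNG). The amino acid sequence of METTL3 consists of 579 amino acid residues and is encoded by the nucleotide sequence at positions 25-1761 of SEQ ID NO: 10.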
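2. Cell-level validation of the SETD2-NH3-GBE and METTL3-NH3-GBE recombinant base editing systems
The SETD2-NH3-GBE and METTL3-NH3-GBE recombinant base editing systems were each co-transfected into 293T cells together with one gRNA expression plasmid from step 1 of Example 2 (target gene VISTA; target site sequence and amplification primers in Table 2) by the method of step 1 of Example 3, and base editing of the cytosine at position 6 of the editing window in the target gene was measured. Both SETD2-NH3-GBE and METTL3-NH3-GBE showed a clear improvement in editing compared with the GBE editing system (Figure 7).
The invention has been described in detail above. Those skilled in the art can practice the invention over a wide range of equivalent parameters, concentrations and conditions without departing from its spirit and scope and without undue experimentation. Although specific examples are given, it should be understood that the invention can be further modified. In short, in accordance with the principles of the invention, this application is intended to cover any variations, uses or adaptations of the invention, including departures from the scope disclosed in this application that are made using conventional techniques known in the art. Some of the essential features may be applied within the scope of the appended claims.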
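Sequence Listing
<110> Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences
<120> Use of epigenetic factors to optimize gene editing tools in eukaryotic cells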
<130> GNCSQ212224
<160> 10
<170> SIPOSequenceListing 1.0
<210> 1
<211> 2202
<212> PRT
<213> Artificial Sequence
<400> 1
Met Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys
1 5 10 15
Arg Lys Val Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu
20 25 30
Arg Arg Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg
35 40 45
Glu Leu Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly
50 55 60
Arg His Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val
65 70 75 80
Glu Val Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro
85 90 95
Asn Thr Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly
100 105 110
Glu Cys Ser Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro His Val
115 120 125
Thr Leu Phe Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg
130 135 140
Asn Arg Gln Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile Gln
145 150 155 160
Ile Met Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn
165 170 175
Tyr Ser Pro Ser Asn Glu Ala His Trp Pro Arg Tyr Pro His Leu Trp
180 185 190
Val Arg Leu Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly Leu Pro
195 200 205
Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe
210 215 220
Thr Ile Ala Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile
225 230 235 240
Leu Trp Ala Thr Gly Leu Lys Ser Gly Gly Ser Ser Gly Gly Ser Ser
245 250 255
Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser
260 265 270
Gly Gly Ser Ser Gly Gly Ser Met Tyr Asn Met Met Glu Thr Glu Leu
275 280 285
Lys Pro Pro Gly Pro Gln Gln Thr Ser Gly Gly Gly Gly Gly Asn Ser
290 295 300
Thr Ala Ala Ala Ala Gly Gly Asn Gln Lys Asn Ser Pro Asp Arg Val
305 310 315 320
Lys Arg Pro Met Asn Ala Phe Met Val Trp Ser Arg Gly Gln Arg Arg
325 330 335
Lys Met Ala Gln Glu Asn Pro Lys Met His Asn Ser Glu Ile Ser Lys
340 345 350
Arg Leu Gly Ala Glu Trp Lys Leu Leu Ser Glu Thr Glu Lys Arg Pro
355 360 365
Phe Ile Asp Glu Ala Lys Arg Leu Arg Ala Leu His Met Lys Glu His
370 375 380
Pro Asp Tyr Lys Tyr Arg Pro Arg Arg Lys Thr Lys Thr Leu Met Lys
385 390 395 400
Lys Asp Lys Tyr Thr Leu Pro Gly Gly Leu Leu Ala Pro Gly Gly Asn
405 410 415
Ser Met Ala Ser Gly Val Gly Val Gly Ala Gly Leu Gly Ala Gly Val
420 425 430
Asn Gln Arg Met Asp Ser Tyr Ala His Met Asn Gly Trp Ser Asn Gly
435 440 445
Ser Tyr Ser Met Met Gln Asp Gln Leu Gly Tyr Pro Gln His Pro Gly
450 455 460
Leu Asn Ala His Gly Ala Ala Gln Met Gln Pro Met His Arg Tyr Asp
465 470 475 480
Val Ser Ala Leu Gln Tyr Asn Ser Met Thr Ser Ser Gln Thr Tyr Met
485 490 495
Asn Gly Ser Pro Thr Tyr Ser Met Ser Tyr Ser Gln Gln Gly Thr Pro
500 505 510
Gly Met Ala Leu Gly Ser Met Gly Ser Val Val Lys Ser Glu Ala Ser
515 520 525
Ser Ser Pro Pro Val Val Thr Ser Ser Ser His Ser Arg Ala Pro Cys
530 535 540
Gln Ala Gly Asp Leu Arg Asp Met Ile Ser Met Tyr Leu Pro Gly Ala
545 550 555 560
Glu Val Pro Glu Pro Ala Ala Pro Ser Arg Leu His Met Ser Gln His
565 570 575
Tyr Gln Ser Gly Pro Val Pro Gly Thr Ala Ile Asn Gly Thr Leu Pro
580 585 590
Leu Ser His Met Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu
595 600 605
Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser
610 615 620
Ser Gly Gly Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr
625 630 635 640
Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser
645 650 655
Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys
660 665 670
Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala
675 680 685
Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn
690 695 700
Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val
705 710 715 720
Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu
725 730 735
Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu
740 745 750
Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys
755 760 765
Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala
770 775 780
Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp
785 790 795 800
Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val
805 810 815
Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly
820 825 830
Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg
835 840 845
Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu
850 855 860
Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys
865 870 875 880
Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp
885 890 895
Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln
900 905 910
Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu
915 920 925
Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu
930 935 940
Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr
945 950 955 960
Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu
965 970 975
Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly
980 985 990
Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu
995 1000 1005
Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp
1010 1015 1020
Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln
1025 1030 1035 1040
Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe
1045 1050 1055
Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr
1060 1065 1070
Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg
1075 1080 1085
Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn
1090 1095 1100
Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu
1105 1110 1115 1120
Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro
1125 1130 1135
Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr
1140 1145 1150
Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser
1155 1160 1165
Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg
1170 1175 1180
Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu
1185 1190 1195 1200
Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala
1205 1210 1215
Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp
1220 1225 1230
Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu
1235 1240 1245
Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys
1250 1255 1260
Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg
1265 1270 1275 1280
Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly
1285 1290 1295
Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser
1300 1305 1310
Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser
1315 1320 1325
Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly
1330 1335 1340
Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile
1345 1350 1355 1360
Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys
1365 1370 1375
Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg
1380 1385 1390
Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met
1395 1400 1405
Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys
1410 1415 1420
Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu
1425 1430 1435 1440
Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp
1445 1450 1455
Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser
1460 1465 1470
Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp
1475 1480 1485
Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys
1490 1495 1500
Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr
1505 1510 1515 1520
Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser
1525 1530 1535
Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg
1540 1545 1550
Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr
1555 1560 1565
Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1570 1575 1580
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr
1585 1590 1595 1600
Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu
1605 1610 1615
Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu
1620 1625 1630
Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met
1635 1640 1645
Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe
1650 1655 1660
Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala
1665 1670 1675 1680
Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr
1685 1690 1695
Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys
1700 1705 1710
Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln
1715 1720 1725
Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
1730 1735 1740
Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly
1745 1750 1755 1760
Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val
1765 1770 1775
Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly
1780 1785 1790
Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe
1795 1800 1805
Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1810 1815 1820
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met
1825 1830 1835 1840
Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro
1845 1850 1855
Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu
1860 1865 1870
Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln
1875 1880 1885
His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser
1890 1895 1900
Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala
1905 1910 1915 1920
Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile
1925 1930 1935
Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys
1940 1945 1950
Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu
1955 1960 1965
Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu
1970 1975 1980
Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Ser Gly Gly Ser Gly
1985 1990 1995 2000
Gly Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr
2005 2010 2015
Gly Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu
2020 2025 2030
Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His
2035 2040 2045
Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser
2050 2055 2060
Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn
2065 2070 2075 2080
Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Gly Gly Ser Gly
2085 2090 2095
Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln
2100 2105 2110
Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu
2115 2120 2125
Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr
2130 2135 2140
Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro
2145 2150 2155 2160
Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn
2165 2170 2175
Lys Ile Lys Met Leu Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly Ser
2180 2185 2190
Glu Phe Glu Pro Lys Lys Lys Arg Lys Val
2195 2200
<210> 2
<211> 6609
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 2
atgaaacgga cagccgacgg aagcgagttc gagtcaccaa agaagaagcg gaaagtctcc 60
tcagagactg ggcctgtcgc cgtcgatcca accctgcgcc gccggattga acctcacgag 120
tttgaagtgt tctttgaccc ccgggagctg agaaaggaga catgcctgct gtacgagatc 180
aactggggag gcaggcactc catctggagg cacacctctc agaacacaaa taagcacgtg 240
gaggtgaact tcatcgagaa gtttaccaca gagcggtact tctgccccaa taccagatgt 300
agcatcacat ggtttctgag ctggtcccct tgcggagagt gtagcagggc catcaccgag 360
ttcctgtcca gatatccaca cgtgacactg tttatctaca tcgccaggct gtatcaccac 420
gcagacccaa ggaataggca gggcctgcgc gatctgatca gctccggcgt gaccatccag 480
atcatgacag agcaggagtc cggctactgc tggcggaact tcgtgaatta ttctcctagc 540
aacgaggccc actggcctag gtacccacac ctgtgggtgc gcctgtacgt gctggagctg 600
tattgcatca tcctgggcct gcccccttgt ctgaatatcc tgcggagaaa gcagccccag 660
ctgaccttct ttacaatcgc cctgcagtct tgtcactatc agaggctgcc accccacatc 720
ctgtgggcca caggcctgaa gtctggagga tctagcggag gatcctctgg cagcgagaca 780
ccaggaacaa gcgagtcagc aacaccagag agcagtggcg gcagcagcgg cggcagcatg 840
tacaacatga tggagacgga gctgaagccg ccgggcccgc agcaaacttc ggggggcggc 900
ggcggcaact ccaccgcggc ggcggccggc ggcaaccaga aaaacagccc ggaccgcgtc 960
aagcggccca tgaatgcctt catggtgtgg tcccgcgggc agcggcgcaa gatggcccag 1020
gagaacccca agatgcacaa ctcggagatc agcaagcgcc tgggcgccga gtggaaactt 1080
ttgtcggaga cggagaagcg gccgttcatc gacgaggcta agcggctgcg agcgctgcac 1140
atgaaggagc acccggatta taaataccgg ccccggcgga aaaccaagac gctcatgaag 1200
aaggataagt acacgctgcc cggcgggctg ctggcccccg gcggcaatag catggcgagc 1260
ggggtcgggg tgggcgccgg cctgggcgcg ggcgtgaacc agcgcatgga cagttacgcg 1320
cacatgaacg gctggagcaa cggcagctac agcatgatgc aggaccagct gggctacccg 1380
cagcacccgg gcctcaatgc gcacggcgca gcgcagatgc agcccatgca ccgctacgac 1440
gtgagcgccc tgcagtacaa ctccatgacc agctcgcaga cctacatgaa cggctcgccc 1500
acctacagca tgtcctactc gcagcagggc acccctggca tggctcttgg ctccatgggt 1560
tcggtggtca agtccgaggc cagctccagc ccccctgtgg ttacctcttc ctcccactcc 1620
agggcgccct gccaggccgg ggacctccgg gacatgatca gcatgtatct ccccggcgcc 1680
gaggtgccgg aacccgccgc ccccagcaga cttcacatgt cccagcacta ccagagcggc 1740
ccggtgcccg gcacggccat taacggcaca ctgcccctct cacacatgag cggaggatct 1800
agcggaggat caagcggaag cgagactcct ggaaccagcg aaagcgcaac cccagaaagc 1860
agcggaggaa gtagcggagg aagcgacaag aagtacagca tcggcctggc catcggcacc 1920
aactctgtgg gctgggccgt gatcaccgac gagtacaagg tgcccagcaa gaaattcaag 1980
gtgctgggca acaccgaccg gcacagcatc aagaagaacc tgatcggagc cctgctgttc 2040
gacagcggcg aaacagccga ggccacccgg ctgaagagaa ccgccagaag aagatacacc 2100
agacggaaga accggatctg ctatctgcaa gagatcttca gcaacgagat ggccaaggtg 2160
gacgacagct tcttccacag actggaagag tccttcctgg tggaagagga taagaagcac 2220
gagcggcacc ccatcttcgg caacatcgtg gacgaggtgg cctaccacga gaagtacccc 2280
accatctacc acctgagaaa gaaactggtg gacagcaccg acaaggccga cctgcggctg 2340
atctatctgg ccctggccca catgatcaag ttccggggcc acttcctgat cgagggcgac 2400
ctgaaccccg acaacagcga cgtggacaag ctgttcatcc agctggtgca gacctacaac 2460
cagctgttcg aggaaaaccc catcaacgcc agcggcgtgg acgccaaggc catcctgtct 2520
gccagactga gcaagagcag acggctggaa aatctgatcg cccagctgcc cggcgagaag 2580
aagaatggcc tgttcggaaa cctgattgcc ctgagcctgg gcctgacccc caacttcaag 2640
agcaacttcg acctggccga ggatgccaaa ctgcagctga gcaaggacac ctacgacgac 2700
gacctggaca acctgctggc ccagatcggc gaccagtacg ccgacctgtt tctggccgcc 2760
aagaacctgt ccgacgccat cctgctgagc gacatcctga gagtgaacac cgagatcacc 2820
aaggcccccc tgagcgcctc tatgatcaag agatacgacg agcaccacca ggacctgacc 2880
ctgctgaaag ctctcgtgcg gcagcagctg cctgagaagt acaaagagat tttcttcgac 2940
cagagcaaga acggctacgc cggctacatt gacggcggag ccagccagga agagttctac 3000
aagttcatca agcccatcct ggaaaagatg gacggcaccg aggaactgct cgtgaagctg 3060
aacagagagg acctgctgcg gaagcagcgg accttcgaca acggcagcat cccccaccag 3120
atccacctgg gagagctgca cgccattctg cggcggcagg aagattttta cccattcctg 3180
aaggacaacc gggaaaagat cgagaagatc ctgaccttcc gcatccccta ctacgtgggc 3240
cctctggcca ggggaaacag cagattcgcc tggatgacca gaaagagcga ggaaaccatc 3300
accccctgga acttcgagga agtggtggac aagggcgctt ccgcccagag cttcatcgag 3360
cggatgacca acttcgataa gaacctgccc aacgagaagg tgctgcccaa gcacagcctg 3420
ctgtacgagt acttcaccgt gtataacgag ctgaccaaag tgaaatacgt gaccgaggga 3480
atgagaaagc ccgccttcct gagcggcgag cagaaaaagg ccatcgtgga cctgctgttc 3540
aagaccaacc ggaaagtgac cgtgaagcag ctgaaagagg actacttcaa gaaaatcgag 3600
tgcttcgact ccgtggaaat ctccggcgtg gaagatcggt tcaacgcctc cctgggcaca 3660
taccacgatc tgctgaaaat tatcaaggac aaggacttcc tggacaatga ggaaaacgag 3720
gacattctgg aagatatcgt gctgaccctg acactgtttg aggacagaga gatgatcgag 3780
gaacggctga aaacctatgc ccacctgttc gacgacaaag tgatgaagca gctgaagcgg 3840
cggagataca ccggctgggg caggctgagc cggaagctga tcaacggcat ccgggacaag 3900
cagtccggca agacaatcct ggatttcctg aagtccgacg gcttcgccaa cagaaacttc 3960
atgcagctga tccacgacga cagcctgacc tttaaagagg acatccagaa agcccaggtg 4020
tccggccagg gcgatagcct gcacgagcac attgccaatc tggccggcag ccccgccatt 4080
aagaagggca tcctgcagac agtgaaggtg gtggacgagc tcgtgaaagt gatgggccgg 4140
cacaagcccg agaacatcgt gatcgaaatg gccagagaga accagaccac ccagaaggga 4200
cagaagaaca gccgcgagag aatgaagcgg atcgaagagg gcatcaaaga gctgggcagc 4260
cagatcctga aagaacaccc cgtggaaaac acccagctgc agaacgagaa gctgtacctg 4320
tactacctgc agaatgggcg ggatatgtac gtggaccagg aactggacat caaccggctg 4380
tccgactacg atgtggacca tatcgtgcct cagagctttc tgaaggacga ctccatcgac 4440
aacaaggtgc tgaccagaag cgacaagaac cggggcaaga gcgacaacgt gccctccgaa 4500
gaggtcgtga agaagatgaa gaactactgg cggcagctgc tgaacgccaa gctgattacc 4560
cagagaaagt tcgacaatct gaccaaggcc gagagaggcg gcctgagcga actggataag 4620
gccggcttca tcaagagaca gctggtggaa acccggcaga tcacaaagca cgtggcacag 4680
atcctggact cccggatgaa cactaagtac gacgagaatg acaagctgat ccgggaagtg 4740
aaagtgatca ccctgaagtc caagctggtg tccgatttcc ggaaggattt ccagttttac 4800
aaagtgcgcg agatcaacaa ctaccaccac gcccacgacg cctacctgaa cgccgtcgtg 4860
ggaaccgccc tgatcaaaaa gtaccctaag ctggaaagcg agttcgtgta cggcgactac 4920
aaggtgtacg acgtgcggaa gatgatcgcc aagagcgagc aggaaatcgg caaggctacc 4980
gccaagtact tcttctacag caacatcatg aactttttca agaccgagat taccctggcc 5040
aacggcgaga tccggaagcg gcctctgatc gagacaaacg gcgaaaccgg ggagatcgtg 5100
tgggataagg gccgggattt tgccaccgtg cggaaagtgc tgagcatgcc ccaagtgaat 5160
atcgtgaaaa agaccgaggt gcagacaggc ggcttcagca aagagtctat cctgcccaag 5220
aggaacagcg ataagctgat cgccagaaag aaggactggg accctaagaa gtacggcggc 5280
ttcgacagcc ccaccgtggc ctattctgtg ctggtggtgg ccaaagtgga aaagggcaag 5340
tccaagaaac tgaagagtgt gaaagagctg ctggggatca ccatcatgga aagaagcagc 5400
ttcgagaaga atcccatcga ctttctggaa gccaagggct acaaagaagt gaaaaaggac 5460
ctgatcatca agctgcctaa gtactccctg ttcgagctgg aaaacggccg gaagagaatg 5520
ctggcctctg ccggcgaact gcagaaggga aacgaactgg ccctgccctc caaatatgtg 5580
aacttcctgt acctggccag ccactatgag aagctgaagg gctcccccga ggataatgag 5640
cagaaacagc tgtttgtgga acagcacaag cactacctgg acgagatcat cgagcagatc 5700
agcgagttct ccaagagagt gatcctggcc gacgctaatc tggacaaagt gctgtccgcc 5760
tacaacaagc accgggataa gcccatcaga gagcaggccg agaatatcat ccacctgttt 5820
accctgacca atctgggagc ccctgccgcc ttcaagtact ttgacaccac catcgaccgg 5880
aagaggtaca ccagcaccaa agaggtgctg gacgccaccc tgatccacca gagcatcacc 5940
ggcctgtacg agacacggat cgacctgtct cagctgggag gtgacagcgg cgggagcggc 6000
gggagcgggg ggagcactaa tctgagcgac atcattgaga aggagactgg gaaacagctg 6060
gtcattcagg agtccatcct gatgctgcct gaggaggtgg aggaagtgat cggcaacaag 6120
ccagagtctg acatcctggt gcacaccgcc tacgacgagt ccacagatga gaatgtgatg 6180
ctgctgacct ctgacgcccc cgagtataag ccttgggccc tggtcatcca ggattctaac 6240
ggcgagaata agatcaagat gctgagcgga ggatccggag gatctggagg cagcaccaac 6300
ctgtctgaca tcatcgagaa ggagacaggc aagcagctgg tcatccagga gagcatcctg 6360
atgctgcccg aagaagtcga agaagtgatc ggaaacaagc ctgagagcga tatcctggtc 6420
cataccgcct acgacgagag taccgacgaa aatgtgatgc tgctgacatc cgacgcccca 6480
gagtataagc cctgggctct ggtcatccag gattccaacg gagagaacaa aatcaaaatg 6540
ctgtctggcg gctcaaaaag aaccgccgac ggcagcgaat tcgagcccaa gaagaagagg 6600
aaagtctaa 6609
<210> 3
<211> 2224
<212> PRT
<213> Artificial Sequence
<400> 3
Met Pro Lys Lys Lys Arg Lys Val Met Tyr Asn Met Met Glu Thr Glu
1 5 10 15
Leu Lys Pro Pro Gly Pro Gln Gln Thr Ser Gly Gly Gly Gly Gly Asn
20 25 30
Ser Thr Ala Ala Ala Ala Gly Gly Asn Gln Lys Asn Ser Pro Asp Arg
35 40 45
Val Lys Arg Pro Met Asn Ala Phe Met Val Trp Ser Arg Gly Gln Arg
50 55 60
Arg Lys Met Ala Gln Glu Asn Pro Lys Met His Asn Ser Glu Ile Ser
65 70 75 80
Lys Arg Leu Gly Ala Glu Trp Lys Leu Leu Ser Glu Thr Glu Lys Arg
85 90 95
Pro Phe Ile Asp Glu Ala Lys Arg Leu Arg Ala Leu His Met Lys Glu
100 105 110
His Pro Asp Tyr Lys Tyr Arg Pro Arg Arg Lys Thr Lys Thr Leu Met
115 120 125
Lys Lys Asp Lys Tyr Thr Leu Pro Gly Gly Leu Leu Ala Pro Gly Gly
130 135 140
Asn Ser Met Ala Ser Gly Val Gly Val Gly Ala Gly Leu Gly Ala Gly
145 150 155 160
Val Asn Gln Arg Met Asp Ser Tyr Ala His Met Asn Gly Trp Ser Asn
165 170 175
Gly Ser Tyr Ser Met Met Gln Asp Gln Leu Gly Tyr Pro Gln His Pro
180 185 190
Gly Leu Asn Ala His Gly Ala Ala Gln Met Gln Pro Met His Arg Tyr
195 200 205
Asp Val Ser Ala Leu Gln Tyr Asn Ser Met Thr Ser Ser Gln Thr Tyr
210 215 220
Met Asn Gly Ser Pro Thr Tyr Ser Met Ser Tyr Ser Gln Gln Gly Thr
225 230 235 240
Pro Gly Met Ala Leu Gly Ser Met Gly Ser Val Val Lys Ser Glu Ala
245 250 255
Ser Ser Ser Pro Pro Val Val Thr Ser Ser Ser His Ser Arg Ala Pro
260 265 270
Cys Gln Ala Gly Asp Leu Arg Asp Met Ile Ser Met Tyr Leu Pro Gly
275 280 285
Ala Glu Val Pro Glu Pro Ala Ala Pro Ser Arg Leu His Met Ser Gln
290 295 300
His Tyr Gln Ser Gly Pro Val Pro Gly Thr Ala Ile Asn Gly Thr Leu
305 310 315 320
Pro Leu Ser His Met Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser
325 330 335
Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly
340 345 350
Ser Ser Gly Gly Ser Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro
355 360 365
Thr Leu Arg Arg Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp
370 375 380
Pro Arg Glu Leu Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp
385 390 395 400
Gly Gly Arg His Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys
405 410 415
His Val Glu Val Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe
420 425 430
Cys Pro Asn Thr Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro
435 440 445
Cys Gly Glu Cys Ser Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro
450 455 460
His Val Thr Leu Phe Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp
465 470 475 480
Pro Arg Asn Arg Gln Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr
485 490 495
Ile Gln Ile Met Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe
500 505 510
Val Asn Tyr Ser Pro Ser Asn Glu Ala His Trp Pro Arg Tyr Pro His
515 520 525
Leu Trp Val Arg Leu Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly
530 535 540
Leu Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr
545 550 555 560
Phe Phe Thr Ile Ala Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro
565 570 575
His Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Ser Glu Thr Pro Gly
580 585 590
Thr Ser Glu Ser Ala Thr Pro Glu Leu Lys Asp Lys Lys Tyr Ser Ile
595 600 605
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
610 615 620
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
625 630 635 640
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
645 650 655
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
660 665 670
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
675 680 685
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
690 695 700
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
705 710 715 720
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
725 730 735
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
740 745 750
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
755 760 765
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
770 775 780
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
785 790 795 800
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
805 810 815
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
820 825 830
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
835 840 845
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
850 855 860
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
865 870 875 880
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
885 890 895
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
900 905 910
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
915 920 925
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
930 935 940
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
945 950 955 960
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
965 970 975
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
980 985 990
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
995 1000 1005
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
1010 1015 1020
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
1025 1030 1035 1040
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
1045 1050 1055
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
1060 1065 1070
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
1075 1080 1085
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
1090 1095 1100
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
1105 1110 1115 1120
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
1125 1130 1135
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
1140 1145 1150
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
1155 1160 1165
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
1170 1175 1180
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
1185 1190 1195 1200
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile
1205 1210 1215
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
1220 1225 1230
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
1235 1240 1245
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
1250 1255 1260
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
1265 1270 1275 1280
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
1285 1290 1295
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
1300 1305 1310
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
1315 1320 1325
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
1330 1335 1340
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
1345 1350 1355 1360
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
1365 1370 1375
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
1380 1385 1390
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1395 1400 1405
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
1410 1415 1420
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
1425 1430 1435 1440
His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
1445 1450 1455
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
1460 1465 1470
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
1475 1480 1485
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
1490 1495 1500
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
1505 1510 1515 1520
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
1525 1530 1535
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
1540 1545 1550
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
1555 1560 1565
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His
1570 1575 1580
Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1585 1590 1595 1600
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val
1605 1610 1615
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys
1620 1625 1630
Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1635 1640 1645
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile
1650 1655 1660
Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp
1665 1670 1675 1680
Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val
1685 1690 1695
Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu
1700 1705 1710
Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp
1715 1720 1725
Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1730 1735 1740
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1745 1750 1755 1760
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu
1765 1770 1775
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1780 1785 1790
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1795 1800 1805
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly
1810 1815 1820
Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1825 1830 1835 1840
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys
1845 1850 1855
Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu
1860 1865 1870
Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1875 1880 1885
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg
1890 1895 1900
Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly
1905 1910 1915 1920
Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
1925 1930 1935
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1940 1945 1950
Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
1955 1960 1965
Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys
1970 1975 1980
Lys Thr Arg Asp Ser Gly Gly Ser Met Phe Gly Glu Ser Trp Lys Lys
1985 1990 1995 2000
His Leu Ser Gly Glu Phe Gly Lys Pro Tyr Phe Ile Lys Leu Met Gly
2005 2010 2015
Phe Val Ala Glu Glu Arg Lys His Tyr Thr Val Tyr Pro Pro Pro His
2020 2025 2030
Gln Val Phe Thr Trp Thr Gln Met Cys Asp Ile Lys Asp Val Lys Val
2035 2040 2045
Val Ile Leu Gly Gln Asp Pro Tyr His Gly Pro Asn Gln Ala His Gly
2050 2055 2060
Leu Cys Phe Ser Val Gln Arg Pro Val Pro Pro Pro Pro Ser Leu Glu
2065 2070 2075 2080
Asn Ile Tyr Lys Glu Leu Ser Thr Asp Ile Glu Asp Phe Val His Pro
2085 2090 2095
Gly His Gly Asp Leu Ser Gly Trp Ala Lys Gln Gly Val Leu Leu Leu
2100 2105 2110
Asn Ala Val Leu Thr Val Arg Ala His Gln Ala Asn Ser His Lys Glu
2115 2120 2125
Arg Gly Trp Glu Gln Phe Thr Asp Ala Val Val Ser Trp Leu Asn Gln
2130 2135 2140
Asn Ser Asn Gly Leu Val Phe Leu Leu Trp Gly Ser Tyr Ala Gln Lys
2145 2150 2155 2160
Lys Gly Ser Ala Ile Asp Arg Lys Arg His His Val Leu Gln Thr Ala
2165 2170 2175
His Pro Ser Pro Leu Ser Val Tyr Arg Gly Phe Phe Gly Cys Arg His
2180 2185 2190
Phe Ser Lys Thr Asn Glu Leu Leu Gln Lys Ser Gly Lys Lys Pro Ile
2195 2200 2205
Asp Trp Lys Glu Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
2210 2215 2220
<210> 4
<211> 6675
<212> DNA
<213> Artificial Sequence
<400> 4
atgccaaaga agaagaggaa ggttatgtac aacatgatgg agacggagct gaagccgccg 60
ggcccgcagc aaacttcggg gggcggcggc ggcaactcca ccgcggcggc ggccggcggc 120
aaccagaaaa acagcccgga ccgcgtcaag cggcccatga atgccttcat ggtgtggtcc 180
cgcgggcagc ggcgcaagat ggcccaggag aaccccaaga tgcacaactc ggagatcagc 240
aagcgcctgg gcgccgagtg gaaacttttg tcggagacgg agaagcggcc gttcatcgac 300
gaggctaagc ggctgcgagc gctgcacatg aaggagcacc cggattataa ataccggccc 360
cggcggaaaa ccaagacgct catgaagaag gataagtaca cgctgcccgg cgggctgctg 420
gcccccggcg gcaatagcat ggcgagcggg gtcggggtgg gcgccggcct gggcgcgggc 480
gtgaaccagc gcatggacag ttacgcgcac atgaacggct ggagcaacgg cagctacagc 540
atgatgcagg accagctggg ctacccgcag cacccgggcc tcaatgcgca cggcgcagcg 600
cagatgcagc ccatgcaccg ctacgacgtg agcgccctgc agtacaactc catgaccagc 660
tcgcagacct acatgaacgg ctcgcccacc tacagcatgt cctactcgca gcagggcacc 720
cctggcatgg ctcttggctc catgggttcg gtggtcaagt ccgaggccag ctccagcccc 780
cctgtggtta cctcttcctc ccactccagg gcgccctgcc aggccgggga cctccgggac 840
atgatcagca tgtatctccc cggcgccgag gtgccggaac ccgccgcccc cagcagactt 900
cacatgtccc agcactacca gagcggcccg gtgcccggca cggccattaa cggcacactg 960
cccctctcac acatgagcgg aggatctagc ggaggatcaa gcggaagcga gactcctgga 1020
accagcgaaa gcgcaacccc agaaagcagc ggaggaagta gcggaggaag ctcatcggag 1080
accggccctg ttgctgttga ccccaccctg cggcggagaa tcgagccaca cgagttcgag 1140
gtgttcttcg acccaaggga gctccgcaag gagacgtgcc tcctgtacga gatcaactgg 1200
ggcggcaggc actccatctg gaggcacacc agccaaaaca ccaacaagca cgtggaggtc 1260
aacttcatcg agaagttcac caccgagagg tacttctgcc caaacacccg ctgctccatc 1320
acctggttcc tgtcctggag cccatgcggc gagtgctcca gggccatcac cgagttcctc 1380
agccgctacc cacacgtcac cctgttcatc tacatcgcca ggctctacca ccacgccgac 1440
ccaaggaaca ggcagggcct ccgcgacctg atctccagcg gcgtgaccat ccaaatcatg 1500
accgagcagg agtccggcta ctgctggagg aacttcgtca actactcccc aagcaacgag 1560
gcccactggc caaggtaccc acacctctgg gtgcgcctct acgtgctcga gctgtactgc 1620
atcatcctcg gcctgccacc atgcctcaac atcctgaggc gcaagcaacc acagctgacc 1680
ttcttcacca tcgccctcca aagctgccac taccagaggc tcccaccaca catcctgtgg 1740
gctaccggcc tcaagtccgg cagcgagacg ccaggcacct ccgagagcgc tacgcctgaa 1800
cttaaggaca agaagtactc gatcggcctc gccatcggga cgaactcagt tggctgggcc 1860
gtgatcaccg acgagtacaa ggtgccctct aagaagttca aggtcctggg gaacaccgac 1920
cgccattcca tcaagaagaa cctcatcggc gctctcctgt tcgacagcgg ggagaccgct 1980
gaggctacga ggctcaagag aaccgctagg cgccggtaca cgagaaggaa gaacaggatc 2040
tgctacctcc aagagatttt ctccaacgag atggccaagg ttgacgattc attcttccac 2100
cgcctggagg agtctttcct cgtggaggag gataagaagc acgagcggca tcccatcttc 2160
ggcaacatcg tggacgaggt tgcctaccac gagaagtacc ctacgatcta ccatctgcgg 2220
aagaagctcg tggactccac cgataaggcg gacctcagac tgatctacct cgctctggcc 2280
cacatgatca agttccgcgg ccatttcctg atcgaggggg atctcaaccc agacaacagc 2340
gatgttgaca agctgttcat ccaactcgtg cagacctaca accaactctt cgaggagaac 2400
ccgatcaacg cctctggcgt ggacgcgaag gctatcctgt ccgcgaggct ctcgaagtcc 2460
aggaggctgg agaacctgat cgctcagctc ccaggcgaga agaagaacgg cctgttcggg 2520
aacctcatcg ctctcagcct ggggctcacc ccgaacttca agtcgaactt cgatctcgct 2580
gaggacgcca agctgcaact ctccaaggac acctacgacg atgacctcga taacctcctg 2640
gcccagatcg gcgatcaata cgcggacctg ttcctcgctg ccaagaacct gtcggacgcc 2700
atcctcctgt cagatatcct ccgcgtgaac accgagatca cgaaggctcc actctctgcc 2760
tccatgatca agcgctacga cgagcaccat caggatctga ccctcctgaa ggcgctggtc 2820
cgccaacagc tcccggagaa gtacaaggag attttcttcg atcagtcgaa gaacggctac 2880
gctgggtaca tcgacggcgg ggcctcacaa gaggagttct acaagttcat caagccaatc 2940
ctggagaaga tggacggcac ggaggagctc ctggtgaagc tcaacaggga ggacctcctg 3000
cggaagcaga gaaccttcga taacggcagc atcccccacc aaatccatct cggggagctg 3060
cacgccatcc tgagaaggca agaggacttc taccctttcc tcaaggataa ccgggagaag 3120
atcgagaaga tcctgacctt cagaatccca tactacgtcg gccctctcgc gcgggggaac 3180
tcaagattcg cttggatgac ccgcaagtct gaggagacca tcacgccgtg gaacttcgag 3240
gaggtggtgg acaagggcgc tagcgctcag tcgttcatcg agaggatgac caacttcgac 3300
aagaacctgc ccaacgagaa ggtgctccct aagcactcgc tcctgtacga gtacttcacc 3360
gtctacaacg agctcacgaa ggtgaagtac gtcaccgagg gcatgcgcaa gccagcgttc 3420
ctgtccgggg agcagaagaa ggctatcgtg gacctcctgt tcaagaccaa ccggaaggtc 3480
acggttaagc aactcaagga ggactacttc aagaagatcg agtgcttcga ttcggtcgag 3540
atcagcggcg ttgaggaccg cttcaacgcc agcctcggga cctaccacga tctcctgaag 3600
atcatcaagg ataaggactt cctggacaac gaggagaacg aggatatcct ggaggacatc 3660
gtgctgaccc tcacgctgtt cgaggacagg gagatgatcg aggagcgcct gaagacgtac 3720
gcccatctct tcgatgacaa ggtcatgaag caactcaagc gccggagata caccggctgg 3780
gggaggctgt cccgcaagct catcaacggc atccgggaca agcagtccgg gaagaccatc 3840
ctcgacttcc tcaagagcga tggcttcgcc aacaggaact tcatgcaact gatccacgat 3900
gacagcctca ccttcaagga ggatatccaa aaggctcaag tgagcggcca gggggactcg 3960
ctgcacgagc atatcgcgaa cctcgctggc tcccccgcga tcaagaaggg catcctccag 4020
accgtgaagg ttgtggacga gctcgtgaag gtcatgggcc ggcacaagcc tgagaacatc 4080
gtcatcgaga tggccagaga gaaccaaacc acgcagaagg ggcaaaagaa ctctagggag 4140
cgcatgaagc gcatcgagga gggcatcaag gagctggggt cccaaatcct caaggagcac 4200
ccagtggaga acacccaact gcagaacgag aagctctacc tgtactacct ccagaacggc 4260
agggatatgt acgtggacca agagctggat atcaaccgcc tcagcgatta cgacgtcgat 4320
catatcgttc cccagtcttt cctgaaggat gactccatcg acaacaaggt cctcaccagg 4380
tcggacaaga accgcggcaa gtcagataac gttccatctg aggaggtcgt taagaagatg 4440
aagaactact ggaggcagct cctgaacgcc aagctgatca cgcaaaggaa gttcgacaac 4500
ctcaccaagg ctgagagagg cgggctctca gagctggaca aggccggctt catcaagcgg 4560
cagctggtcg agaccagaca aatcacgaag cacgttgcgc aaatcctcga ctctcggatg 4620
aacacgaagt acgatgagaa cgacaagctg atcagggagg ttaaggtgat caccctgaag 4680
tctaagctcg tctccgactt caggaaggat ttccagttct acaaggttcg cgagatcaac 4740
aactaccacc atgcccatga cgcttacctc aacgctgtgg tcggcaccgc tctgatcaag 4800
aagtacccaa agctggagtc cgagttcgtg tacggggact acaaggttta cgatgtgcgc 4860
aagatgatcg ccaagtcgga gcaagagatc ggcaaggcta ccgccaagta cttcttctac 4920
tcaaacatca tgaacttctt caagaccgag atcacgctgg ccaacggcga gatccggaag 4980
agaccgctca tcgagaccaa cggcgagacg ggggagatcg tgtgggacaa gggcagggat 5040
ttcgcgaccg tccgcaaggt tctctccatg ccccaggtga acatcgtcaa gaagaccgag 5100
gtccaaacgg gcgggttctc aaaggagtct atcctgccta agcggaacag cgacaagctc 5160
atcgccagaa agaaggactg ggacccaaag aagtacggcg ggttcgacag ccctaccgtg 5220
gcctactcgg tcctggttgt ggcgaaggtt gagaagggca agtccaagaa gctcaagagc 5280
gtgaaggagc tcctggggat caccatcatg gagaggtcca gcttcgagaa gaacccaatc 5340
gacttcctgg aggccaaggg ctacaaggag gtgaagaagg acctgatcat caagctcccg 5400
aagtactctc tcttcgagct ggagaacggc aggaagagaa tgctggcttc cgctggcgag 5460
ctccagaagg ggaacgagct cgcgctgcca agcaagtacg tgaacttcct ctacctggct 5520
tcccactacg agaagctcaa gggcagcccg gaggacaacg agcaaaagca gctgttcgtc 5580
gagcagcaca agcattacct cgacgagatc atcgagcaaa tctccgagtt cagcaagcgc 5640
gtgatcctcg ccgacgcgaa cctggataag gtcctctccg cctacaacaa gcaccgggac 5700
aagcccatca gagagcaagc ggagaacatc atccatctct tcaccctgac gaacctcggc 5760
gctcctgctg ctttcaagta cttcgacacc acgatcgatc ggaagagata cacctccacg 5820
aaggaggtcc tggacgcgac cctcatccac cagtcgatca ccggcctgta cgagacgagg 5880
atcgacctct cacaactcgg cggggataag agacccgcag caaccaagaa ggcagggcaa 5940
gcaaagaaga agaagacgcg tgactccggc ggcagcatgt ttggagagag ctggaagaag 6000
cacctcagcg gggagttcgg gaaaccgtat tttatcaagc taatgggatt tgttgcagaa 6060
gaaagaaagc attacactgt ttatccaccc ccacaccaag tcttcacctg gacccagatg 6120
tgtgacataa aagatgtgaa ggttgtcatc ctgggacagg atccatatca tggacctaat 6180
caagctcacg ggctctgctt tagtgttcaa aggcctgttc cgcctccgcc cagtttggag 6240
aacatttata aagagttgtc tacagacata gaggattttg ttcatcctgg ccatggagat 6300
ttatctgggt gggccaagca aggtgttctc cttctcaacg ctgtcctcac ggttcgtgcc 6360
catcaagcca actctcataa ggagcgaggc tgggagcagt tcactgatgc agttgtgtcc 6420
tggctaaatc agaactcgaa tggccttgtt ttcttgctct ggggctctta tgctcagaag 6480
aagggcagtg ccattgatag gaagcggcac catgtactac agacggctca tccctcccct 6540
ttgtcagtgt atagagggtt ctttggatgt agacactttt caaagaccaa tgagctgctg 6600
cagaagtctg gcaagaagcc cattgactgg aaggagctgt cgggggggag cccaaagaag 6660
aagcggaagg tgtag 6675
<210> 5
<211> 204
<212> DNA
<213> Artificial Sequence
<400> 5
ccctcctacg ttgcggtcac acccttctcc cttcggggag acaacgacgg cggtggcggg 60
agcttctcca cggccgacca gctggagatg gtgaccgagc tgctgggagg agacatggtg 120
aaccagagtt tcatctgcga cccggacgac gagaccttca tcaaaaacat catcatccag 180
gactgtatgt ggagcggctt ctcg 204
<210> 6
<211> 5871
<212> DNA
<213> Artificial Sequence
<400> 6
atgaaacgga cagccgacgg aagcgagttc gagtcaccaa agaagaagcg gaaagtctcc 60
tcagagactg ggcctgtcgc cgtcgatcca accctgcgcc gccggattga acctcacgag 120
tttgaagtgt tctttgaccc ccgggagctg agaaaggaga catgcctgct gtacgagatc 180
aactggggag gcaggcactc catctggagg cacacctctc agaacacaaa taagcacgtg 240
gaggtgaact tcatcgagaa gtttaccaca gagcggtact tctgccccaa taccagatgt 300
agcatcacat ggtttctgag ctggtcccct tgcggagagt gtagcagggc catcaccgag 360
ttcctgtcca gatatccaca cgtgacactg tttatctaca tcgccaggct gtatcaccac 420
gcagacccaa ggaataggca gggcctgcgc gatctgatca gctccggcgt gaccatccag 480
atcatgacag agcaggagtc cggctactgc tggcggaact tcgtgaatta ttctcctagc 540
aacgaggccc actggcctag gtacccacac ctgtgggtgc gcctgtacgt gctggagctg 600
tattgcatca tcctgggcct gcccccttgt ctgaatatcc tgcggagaaa gcagccccag 660
ctgaccttct ttacaatcgc cctgcagtct tgtcactatc agaggctgcc accccacatc 720
ctgtgggcca caggcctgaa gtctggagga tctagcggag gatcctctgg cagcgagaca 780
ccaggaacaa gcgagtcagc aacaccagag agcagtggcg gcagcagcgg cggcagccgc 840
gtcaagcggc ccatgaatgc cttcatggtg tggtcccgcg ggcagcggcg caagatggcc 900
caggagaacc ccaagatgca caactcggag atcagcaagc gcctgggcgc cgagtggaaa 960
cttttgtcgg agacggagaa gcggccgttc atcgacgagg ctaagcggct gcgagcgctg 1020
cacatgaagg agcacccgga ttataaatac cggagcggag gatctagcgg aggatcaagc 1080
ggaagcgaga ctcctggaac cagcgaaagc gcaaccccag aaagcagcgg aggaagtagc 1140
ggaggaagcg acaagaagta cagcatcggc ctggccatcg gcaccaactc tgtgggctgg 1200
gccgtgatca ccgacgagta caaggtgccc agcaagaaat tcaaggtgct gggcaacacc 1260
gaccggcaca gcatcaagaa gaacctgatc ggagccctgc tgttcgacag cggcgaaaca 1320
gccgaggcca cccggctgaa gagaaccgcc agaagaagat acaccagacg gaagaaccgg 1380
atctgctatc tgcaagagat cttcagcaac gagatggcca aggtggacga cagcttcttc 1440
cacagactgg aagagtcctt cctggtggaa gaggataaga agcacgagcg gcaccccatc 1500
ttcggcaaca tcgtggacga ggtggcctac cacgagaagt accccaccat ctaccacctg 1560
agaaagaaac tggtggacag caccgacaag gccgacctgc ggctgatcta tctggccctg 1620
gcccacatga tcaagttccg gggccacttc ctgatcgagg gcgacctgaa ccccgacaac 1680
agcgacgtgg acaagctgtt catccagctg gtgcagacct acaaccagct gttcgaggaa 1740
aaccccatca acgccagcgg cgtggacgcc aaggccatcc tgtctgccag actgagcaag 1800
agcagacggc tggaaaatct gatcgcccag ctgcccggcg agaagaagaa tggcctgttc 1860
ggaaacctga ttgccctgag cctgggcctg acccccaact tcaagagcaa cttcgacctg 1920
gccgaggatg ccaaactgca gctgagcaag gacacctacg acgacgacct ggacaacctg 1980
ctggcccaga tcggcgacca gtacgccgac ctgtttctgg ccgccaagaa cctgtccgac 2040
gccatcctgc tgagcgacat cctgagagtg aacaccgaga tcaccaaggc ccccctgagc 2100
gcctctatga tcaagagata cgacgagcac caccaggacc tgaccctgct gaaagctctc 2160
gtgcggcagc agctgcctga gaagtacaaa gagattttct tcgaccagag caagaacggc 2220
tacgccggct acattgacgg cggagccagc caggaagagt tctacaagtt catcaagccc 2280
atcctggaaa agatggacgg caccgaggaa ctgctcgtga agctgaacag agaggacctg 2340
ctgcggaagc agcggacctt cgacaacggc agcatccccc accagatcca cctgggagag 2400
ctgcacgcca ttctgcggcg gcaggaagat ttttacccat tcctgaagga caaccgggaa 2460
aagatcgaga agatcctgac cttccgcatc ccctactacg tgggccctct ggccagggga 2520
aacagcagat tcgcctggat gaccagaaag agcgaggaaa ccatcacccc ctggaacttc 2580
gaggaagtgg tggacaaggg cgcttccgcc cagagcttca tcgagcggat gaccaacttc 2640
gataagaacc tgcccaacga gaaggtgctg cccaagcaca gcctgctgta cgagtacttc 2700
accgtgtata acgagctgac caaagtgaaa tacgtgaccg agggaatgag aaagcccgcc 2760
ttcctgagcg gcgagcagaa aaaggccatc gtggacctgc tgttcaagac caaccggaaa 2820
gtgaccgtga agcagctgaa agaggactac ttcaagaaaa tcgagtgctt cgactccgtg 2880
gaaatctccg gcgtggaaga tcggttcaac gcctccctgg gcacatacca cgatctgctg 2940
aaaattatca aggacaagga cttcctggac aatgaggaaa acgaggacat tctggaagat 3000
atcgtgctga ccctgacact gtttgaggac agagagatga tcgaggaacg gctgaaaacc 3060
tatgcccacc tgttcgacga caaagtgatg aagcagctga agcggcggag atacaccggc 3120
tggggcaggc tgagccggaa gctgatcaac ggcatccggg acaagcagtc cggcaagaca 3180
atcctggatt tcctgaagtc cgacggcttc gccaacagaa acttcatgca gctgatccac 3240
gacgacagcc tgacctttaa agaggacatc cagaaagccc aggtgtccgg ccagggcgat 3300
agcctgcacg agcacattgc caatctggcc ggcagccccg ccattaagaa gggcatcctg 3360
cagacagtga aggtggtgga cgagctcgtg aaagtgatgg gccggcacaa gcccgagaac 3420
atcgtgatcg aaatggccag agagaaccag accacccaga agggacagaa gaacagccgc 3480
gagagaatga agcggatcga agagggcatc aaagagctgg gcagccagat cctgaaagaa 3540
caccccgtgg aaaacaccca gctgcagaac gagaagctgt acctgtacta cctgcagaat 3600
gggcgggata tgtacgtgga ccaggaactg gacatcaacc ggctgtccga ctacgatgtg 3660
gaccatatcg tgcctcagag ctttctgaag gacgactcca tcgacaacaa ggtgctgacc 3720
agaagcgaca agaaccgggg caagagcgac aacgtgccct ccgaagaggt cgtgaagaag 3780
atgaagaact actggcggca gctgctgaac gccaagctga ttacccagag aaagttcgac 3840
aatctgacca aggccgagag aggcggcctg agcgaactgg ataaggccgg cttcatcaag 3900
agacagctgg tggaaacccg gcagatcaca aagcacgtgg cacagatcct ggactcccgg 3960
atgaacacta agtacgacga gaatgacaag ctgatccggg aagtgaaagt gatcaccctg 4020
aagtccaagc tggtgtccga tttccggaag gatttccagt tttacaaagt gcgcgagatc 4080
aacaactacc accacgccca cgacgcctac ctgaacgccg tcgtgggaac cgccctgatc 4140
aaaaagtacc ctaagctgga aagcgagttc gtgtacggcg actacaaggt gtacgacgtg 4200
cggaagatga tcgccaagag cgagcaggaa atcggcaagg ctaccgccaa gtacttcttc 4260
tacagcaaca tcatgaactt tttcaagacc gagattaccc tggccaacgg cgagatccgg 4320
aagcggcctc tgatcgagac aaacggcgaa accggggaga tcgtgtggga taagggccgg 4380
gattttgcca ccgtgcggaa agtgctgagc atgccccaag tgaatatcgt gaaaaagacc 4440
gaggtgcaga caggcggctt cagcaaagag tctatcctgc ccaagaggaa cagcgataag 4500
ctgatcgcca gaaagaagga ctgggaccct aagaagtacg gcggcttcga cagccccacc 4560
gtggcctatt ctgtgctggt ggtggccaaa gtggaaaagg gcaagtccaa gaaactgaag 4620
agtgtgaaag agctgctggg gatcaccatc atggaaagaa gcagcttcga gaagaatccc 4680
atcgactttc tggaagccaa gggctacaaa gaagtgaaaa aggacctgat catcaagctg 4740
cctaagtact ccctgttcga gctggaaaac ggccggaaga gaatgctggc ctctgccggc 4800
gaactgcaga agggaaacga actggccctg ccctccaaat atgtgaactt cctgtacctg 4860
gccagccact atgagaagct gaagggctcc cccgaggata atgagcagaa acagctgttt 4920
gtggaacagc acaagcacta cctggacgag atcatcgagc agatcagcga gttctccaag 4980
agagtgatcc tggccgacgc taatctggac aaagtgctgt ccgcctacaa caagcaccgg 5040
gataagccca tcagagagca ggccgagaat atcatccacc tgtttaccct gaccaatctg 5100
ggagcccctg ccgccttcaa gtactttgac accaccatcg accggaagag gtacaccagc 5160
accaaagagg tgctggacgc caccctgatc caccagagca tcaccggcct gtacgagaca 5220
cggatcgacc tgtctcagct gggaggtgac agcggcggga gcggcgggag cggggggagc 5280
actaatctga gcgacatcat tgagaaggag actgggaaac agctggtcat tcaggagtcc 5340
atcctgatgc tgcctgagga ggtggaggaa gtgatcggca acaagccaga gtctgacatc 5400
ctggtgcaca ccgcctacga cgagtccaca gatgagaatg tgatgctgct gacctctgac 5460
gcccccgagt ataagccttg ggccctggtc atccaggatt ctaacggcga gaataagatc 5520
aagatgctga gcggaggatc cggaggatct ggaggcagca ccaacctgtc tgacatcatc 5580
gagaaggaga caggcaagca gctggtcatc caggagagca tcctgatgct gcccgaagaa 5640
gtcgaagaag tgatcggaaa caagcctgag agcgatatcc tggtccatac cgcctacgac 5700
gagagtaccg acgaaaatgt gatgctgctg acatccgacg ccccagagta taagccctgg 5760
gctctggtca tccaggattc caacggagag aacaaaatca aaatgctgtc tggcggctca 5820
aaaagaaccg ccgacggcag cgaattcgag cccaagaaga agaggaaagt c 5871
<210> 7
<211> 6006
<212> DNA
<213> Artificial Sequence
<400> 7
atgaaacgga cagccgacgg aagcgagttc gagtcaccaa agaagaagcg gaaagtctcc 60
tcagagactg ggcctgtcgc cgtcgatcca accctgcgcc gccggattga acctcacgag 120
tttgaagtgt tctttgaccc ccgggagctg agaaaggaga catgcctgct gtacgagatc 180
aactggggag gcaggcactc catctggagg cacacctctc agaacacaaa taagcacgtg 240
gaggtgaact tcatcgagaa gtttaccaca gagcggtact tctgccccaa taccagatgt 300
agcatcacat ggtttctgag ctggtcccct tgcggagagt gtagcagggc catcaccgag 360
ttcctgtcca gatatccaca cgtgacactg tttatctaca tcgccaggct gtatcaccac 420
gcagacccaa ggaataggca gggcctgcgc gatctgatca gctccggcgt gaccatccag 480
atcatgacag agcaggagtc cggctactgc tggcggaact tcgtgaatta ttctcctagc 540
aacgaggccc actggcctag gtacccacac ctgtgggtgc gcctgtacgt gctggagctg 600
tattgcatca tcctgggcct gcccccttgt ctgaatatcc tgcggagaaa gcagccccag 660
ctgaccttct ttacaatcgc cctgcagtct tgtcactatc agaggctgcc accccacatc 720
ctgtgggcca caggcctgaa gtctggagga tctagcggag gatcctctgg cagcgagaca 780
ccaggaacaa gcgagtcagc aacaccagag agcagtggcg gcagcagcgg cggcagcgac 840
gtgagcgccc tgcagtacaa ctccatgacc agctcgcaga cctacatgaa cggctcgccc 900
acctacagca tgtcctactc gcagcagggc acccctggca tggctcttgg ctccatgggt 960
tcggtggtca agtccgaggc cagctccagc ccccctgtgg ttacctcttc ctcccactcc 1020
agggcgccct gccaggccgg ggacctccgg gacatgatca gcatgtatct ccccggcgcc 1080
gaggtgccgg aacccgccgc ccccagcaga cttcacatgt cccagcacta ccagagcggc 1140
ccggtgcccg gcacggccat taacggcaca ctgcccctct cacacatgag cggaggatct 1200
agcggaggat caagcggaag cgagactcct ggaaccagcg aaagcgcaac cccagaaagc 1260
agcggaggaa gtagcggagg aagcgacaag aagtacagca tcggcctggc catcggcacc 1320
aactctgtgg gctgggccgt gatcaccgac gagtacaagg tgcccagcaa gaaattcaag 1380
gtgctgggca acaccgaccg gcacagcatc aagaagaacc tgatcggagc cctgctgttc 1440
gacagcggcg aaacagccga ggccacccgg ctgaagagaa ccgccagaag aagatacacc 1500
agacggaaga accggatctg ctatctgcaa gagatcttca gcaacgagat ggccaaggtg 1560
gacgacagct tcttccacag actggaagag tccttcctgg tggaagagga taagaagcac 1620
gagcggcacc ccatcttcgg caacatcgtg gacgaggtgg cctaccacga gaagtacccc 1680
accatctacc acctgagaaa gaaactggtg gacagcaccg acaaggccga cctgcggctg 1740
atctatctgg ccctggccca catgatcaag ttccggggcc acttcctgat cgagggcgac 1800
ctgaaccccg acaacagcga cgtggacaag ctgttcatcc agctggtgca gacctacaac 1860
cagctgttcg aggaaaaccc catcaacgcc agcggcgtgg acgccaaggc catcctgtct 1920
gccagactga gcaagagcag acggctggaa aatctgatcg cccagctgcc cggcgagaag 1980
aagaatggcc tgttcggaaa cctgattgcc ctgagcctgg gcctgacccc caacttcaag 2040
agcaacttcg acctggccga ggatgccaaa ctgcagctga gcaaggacac ctacgacgac 2100
gacctggaca acctgctggc ccagatcggc gaccagtacg ccgacctgtt tctggccgcc 2160
aagaacctgt ccgacgccat cctgctgagc gacatcctga gagtgaacac cgagatcacc 2220
aaggcccccc tgagcgcctc tatgatcaag agatacgacg agcaccacca ggacctgacc 2280
ctgctgaaag ctctcgtgcg gcagcagctg cctgagaagt acaaagagat tttcttcgac 2340
cagagcaaga acggctacgc cggctacatt gacggcggag ccagccagga agagttctac 2400
aagttcatca agcccatcct ggaaaagatg gacggcaccg aggaactgct cgtgaagctg 2460
aacagagagg acctgctgcg gaagcagcgg accttcgaca acggcagcat cccccaccag 2520
atccacctgg gagagctgca cgccattctg cggcggcagg aagattttta cccattcctg 2580
aaggacaacc gggaaaagat cgagaagatc ctgaccttcc gcatccccta ctacgtgggc 2640
cctctggcca ggggaaacag cagattcgcc tggatgacca gaaagagcga ggaaaccatc 2700
accccctgga acttcgagga agtggtggac aagggcgctt ccgcccagag cttcatcgag 2760
cggatgacca acttcgataa gaacctgccc aacgagaagg tgctgcccaa gcacagcctg 2820
ctgtacgagt acttcaccgt gtataacgag ctgaccaaag tgaaatacgt gaccgaggga 2880
atgagaaagc ccgccttcct gagcggcgag cagaaaaagg ccatcgtgga cctgctgttc 2940
aagaccaacc ggaaagtgac cgtgaagcag ctgaaagagg actacttcaa gaaaatcgag 3000
tgcttcgact ccgtggaaat ctccggcgtg gaagatcggt tcaacgcctc cctgggcaca 3060
taccacgatc tgctgaaaat tatcaaggac aaggacttcc tggacaatga ggaaaacgag 3120
gacattctgg aagatatcgt gctgaccctg acactgtttg aggacagaga gatgatcgag 3180
gaacggctga aaacctatgc ccacctgttc gacgacaaag tgatgaagca gctgaagcgg 3240
cggagataca ccggctgggg caggctgagc cggaagctga tcaacggcat ccgggacaag 3300
cagtccggca agacaatcct ggatttcctg aagtccgacg gcttcgccaa cagaaacttc 3360
atgcagctga tccacgacga cagcctgacc tttaaagagg acatccagaa agcccaggtg 3420
tccggccagg gcgatagcct gcacgagcac attgccaatc tggccggcag ccccgccatt 3480
aagaagggca tcctgcagac agtgaaggtg gtggacgagc tcgtgaaagt gatgggccgg 3540
cacaagcccg agaacatcgt gatcgaaatg gccagagaga accagaccac ccagaaggga 3600
cagaagaaca gccgcgagag aatgaagcgg atcgaagagg gcatcaaaga gctgggcagc 3660
cagatcctga aagaacaccc cgtggaaaac acccagctgc agaacgagaa gctgtacctg 3720
tactacctgc agaatgggcg ggatatgtac gtggaccagg aactggacat caaccggctg 3780
tccgactacg atgtggacca tatcgtgcct cagagctttc tgaaggacga ctccatcgac 3840
aacaaggtgc tgaccagaag cgacaagaac cggggcaaga gcgacaacgt gccctccgaa 3900
gaggtcgtga agaagatgaa gaactactgg cggcagctgc tgaacgccaa gctgattacc 3960
cagagaaagt tcgacaatct gaccaaggcc gagagaggcg gcctgagcga actggataag 4020
gccggcttca tcaagagaca gctggtggaa acccggcaga tcacaaagca cgtggcacag 4080
atcctggact cccggatgaa cactaagtac gacgagaatg acaagctgat ccgggaagtg 4140
aaagtgatca ccctgaagtc caagctggtg tccgatttcc ggaaggattt ccagttttac 4200
aaagtgcgcg agatcaacaa ctaccaccac gcccacgacg cctacctgaa cgccgtcgtg 4260
ggaaccgccc tgatcaaaaa gtaccctaag ctggaaagcg agttcgtgta cggcgactac 4320
aaggtgtacg acgtgcggaa gatgatcgcc aagagcgagc aggaaatcgg caaggctacc 4380
gccaagtact tcttctacag caacatcatg aactttttca agaccgagat taccctggcc 4440
aacggcgaga tccggaagcg gcctctgatc gagacaaacg gcgaaaccgg ggagatcgtg 4500
tgggataagg gccgggattt tgccaccgtg cggaaagtgc tgagcatgcc ccaagtgaat 4560
atcgtgaaaa agaccgaggt gcagacaggc ggcttcagca aagagtctat cctgcccaag 4620
aggaacagcg ataagctgat cgccagaaag aaggactggg accctaagaa gtacggcggc 4680
ttcgacagcc ccaccgtggc ctattctgtg ctggtggtgg ccaaagtgga aaagggcaag 4740
tccaagaaac tgaagagtgt gaaagagctg ctggggatca ccatcatgga aagaagcagc 4800
ttcgagaaga atcccatcga ctttctggaa gccaagggct acaaagaagt gaaaaaggac 4860
ctgatcatca agctgcctaa gtactccctg ttcgagctgg aaaacggccg gaagagaatg 4920
ctggcctctg ccggcgaact gcagaaggga aacgaactgg ccctgccctc caaatatgtg 4980
aacttcctgt acctggccag ccactatgag aagctgaagg gctcccccga ggataatgag 5040
cagaaacagc tgtttgtgga acagcacaag cactacctgg acgagatcat cgagcagatc 5100
agcgagttct ccaagagagt gatcctggcc gacgctaatc tggacaaagt gctgtccgcc 5160
tacaacaagc accgggataa gcccatcaga gagcaggccg agaatatcat ccacctgttt 5220
accctgacca atctgggagc ccctgccgcc ttcaagtact ttgacaccac catcgaccgg 5280
aagaggtaca ccagcaccaa agaggtgctg gacgccaccc tgatccacca gagcatcacc 5340
ggcctgtacg agacacggat cgacctgtct cagctgggag gtgacagcgg cgggagcggc 5400
gggagcgggg ggagcactaa tctgagcgac atcattgaga aggagactgg gaaacagctg 5460
gtcattcagg agtccatcct gatgctgcct gaggaggtgg aggaagtgat cggcaacaag 5520
ccagagtctg acatcctggt gcacaccgcc tacgacgagt ccacagatga gaatgtgatg 5580
ctgctgacct ctgacgcccc cgagtataag ccttgggccc tggtcatcca ggattctaac 5640
ggcgagaata agatcaagat gctgagcgga ggatccggag gatctggagg cagcaccaac 5700
ctgtctgaca tcatcgagaa ggagacaggc aagcagctgg tcatccagga gagcatcctg 5760
atgctgcccg aagaagtcga agaagtgatc ggaaacaagc ctgagagcga tatcctggtc 5820
cataccgcct acgacgagag taccgacgaa aatgtgatgc tgctgacatc cgacgcccca 5880
gagtataagc cctgggctct ggtcatccag gattccaacg gagagaacaa aatcaaaatg 5940
ctgtctggcg gctcaaaaag aaccgccgac ggcagcgaat tcgagcccaa gaagaagagg 6000
aaagtc 6006
<210> 8
<211> 6072
<212> DNA
<213> Artificial Sequence
<400> 8
atgccaaaga agaagaggaa ggttgacgtg agcgccctgc agtacaactc catgaccagc 60
tcgcagacct acatgaacgg ctcgcccacc tacagcatgt cctactcgca gcagggcacc 120
cctggcatgg ctcttggctc catgggttcg gtggtcaagt ccgaggccag ctccagcccc 180
cctgtggtta cctcttcctc ccactccagg gcgccctgcc aggccgggga cctccgggac 240
atgatcagca tgtatctccc cggcgccgag gtgccggaac ccgccgcccc cagcagactt 300
cacatgtccc agcactacca gagcggcccg gtgcccggca cggccattaa cggcacactg 360
cccctctcac acatgagcgg aggatctagc ggaggatcaa gcggaagcga gactcctgga 420
accagcgaaa gcgcaacccc agaaagcagc ggaggaagta gcggaggaag ctcatcggag 480
accggccctg ttgctgttga ccccaccctg cggcggagaa tcgagccaca cgagttcgag 540
gtgttcttcg acccaaggga gctccgcaag gagacgtgcc tcctgtacga gatcaactgg 600
ggcggcaggc actccatctg gaggcacacc agccaaaaca ccaacaagca cgtggaggtc 660
aacttcatcg agaagttcac caccgagagg tacttctgcc caaacacccg ctgctccatc 720
acctggttcc tgtcctggag cccatgcggc gagtgctcca gggccatcac cgagttcctc 780
agccgctacc cacacgtcac cctgttcatc tacatcgcca ggctctacca ccacgccgac 840
ccaaggaaca ggcagggcct ccgcgacctg atctccagcg gcgtgaccat ccaaatcatg 900
accgagcagg agtccggcta ctgctggagg aacttcgtca actactcccc aagcaacgag 960
gcccactggc caaggtaccc acacctctgg gtgcgcctct acgtgctcga gctgtactgc 1020
atcatcctcg gcctgccacc atgcctcaac atcctgaggc gcaagcaacc acagctgacc 1080
ttcttcacca tcgccctcca aagctgccac taccagaggc tcccaccaca catcctgtgg 1140
gctaccggcc tcaagtccgg cagcgagacg ccaggcacct ccgagagcgc tacgcctgaa 1200
cttaaggaca agaagtactc gatcggcctc gccatcggga cgaactcagt tggctgggcc 1260
gtgatcaccg acgagtacaa ggtgccctct aagaagttca aggtcctggg gaacaccgac 1320
cgccattcca tcaagaagaa cctcatcggc gctctcctgt tcgacagcgg ggagaccgct 1380
gaggctacga ggctcaagag aaccgctagg cgccggtaca cgagaaggaa gaacaggatc 1440
tgctacctcc aagagatttt ctccaacgag atggccaagg ttgacgattc attcttccac 1500
cgcctggagg agtctttcct cgtggaggag gataagaagc acgagcggca tcccatcttc 1560
ggcaacatcg tggacgaggt tgcctaccac gagaagtacc ctacgatcta ccatctgcgg 1620
aagaagctcg tggactccac cgataaggcg gacctcagac tgatctacct cgctctggcc 1680
cacatgatca agttccgcgg ccatttcctg atcgaggggg atctcaaccc agacaacagc 1740
gatgttgaca agctgttcat ccaactcgtg cagacctaca accaactctt cgaggagaac 1800
ccgatcaacg cctctggcgt ggacgcgaag gctatcctgt ccgcgaggct ctcgaagtcc 1860
aggaggctgg agaacctgat cgctcagctc ccaggcgaga agaagaacgg cctgttcggg 1920
aacctcatcg ctctcagcct ggggctcacc ccgaacttca agtcgaactt cgatctcgct 1980
gaggacgcca agctgcaact ctccaaggac acctacgacg atgacctcga taacctcctg 2040
gcccagatcg gcgatcaata cgcggacctg ttcctcgctg ccaagaacct gtcggacgcc 2100
atcctcctgt cagatatcct ccgcgtgaac accgagatca cgaaggctcc actctctgcc 2160
tccatgatca agcgctacga cgagcaccat caggatctga ccctcctgaa ggcgctggtc 2220
cgccaacagc tcccggagaa gtacaaggag attttcttcg atcagtcgaa gaacggctac 2280
gctgggtaca tcgacggcgg ggcctcacaa gaggagttct acaagttcat caagccaatc 2340
ctggagaaga tggacggcac ggaggagctc ctggtgaagc tcaacaggga ggacctcctg 2400
cggaagcaga gaaccttcga taacggcagc atcccccacc aaatccatct cggggagctg 2460
cacgccatcc tgagaaggca agaggacttc taccctttcc tcaaggataa ccgggagaag 2520
atcgagaaga tcctgacctt cagaatccca tactacgtcg gccctctcgc gcgggggaac 2580
tcaagattcg cttggatgac ccgcaagtct gaggagacca tcacgccgtg gaacttcgag 2640
gaggtggtgg acaagggcgc tagcgctcag tcgttcatcg agaggatgac caacttcgac 2700
aagaacctgc ccaacgagaa ggtgctccct aagcactcgc tcctgtacga gtacttcacc 2760
gtctacaacg agctcacgaa ggtgaagtac gtcaccgagg gcatgcgcaa gccagcgttc 2820
ctgtccgggg agcagaagaa ggctatcgtg gacctcctgt tcaagaccaa ccggaaggtc 2880
acggttaagc aactcaagga ggactacttc aagaagatcg agtgcttcga ttcggtcgag 2940
atcagcggcg ttgaggaccg cttcaacgcc agcctcggga cctaccacga tctcctgaag 3000
atcatcaagg ataaggactt cctggacaac gaggagaacg aggatatcct ggaggacatc 3060
gtgctgaccc tcacgctgtt cgaggacagg gagatgatcg aggagcgcct gaagacgtac 3120
gcccatctct tcgatgacaa ggtcatgaag caactcaagc gccggagata caccggctgg 3180
gggaggctgt cccgcaagct catcaacggc atccgggaca agcagtccgg gaagaccatc 3240
ctcgacttcc tcaagagcga tggcttcgcc aacaggaact tcatgcaact gatccacgat 3300
gacagcctca ccttcaagga ggatatccaa aaggctcaag tgagcggcca gggggactcg 3360
ctgcacgagc atatcgcgaa cctcgctggc tcccccgcga tcaagaaggg catcctccag 3420
accgtgaagg ttgtggacga gctcgtgaag gtcatgggcc ggcacaagcc tgagaacatc 3480
gtcatcgaga tggccagaga gaaccaaacc acgcagaagg ggcaaaagaa ctctagggag 3540
cgcatgaagc gcatcgagga gggcatcaag gagctggggt cccaaatcct caaggagcac 3600
ccagtggaga acacccaact gcagaacgag aagctctacc tgtactacct ccagaacggc 3660
agggatatgt acgtggacca agagctggat atcaaccgcc tcagcgatta cgacgtcgat 3720
catatcgttc cccagtcttt cctgaaggat gactccatcg acaacaaggt cctcaccagg 3780
tcggacaaga accgcggcaa gtcagataac gttccatctg aggaggtcgt taagaagatg 3840
aagaactact ggaggcagct cctgaacgcc aagctgatca cgcaaaggaa gttcgacaac 3900
ctcaccaagg ctgagagagg cgggctctca gagctggaca aggccggctt catcaagcgg 3960
cagctggtcg agaccagaca aatcacgaag cacgttgcgc aaatcctcga ctctcggatg 4020
aacacgaagt acgatgagaa cgacaagctg atcagggagg ttaaggtgat caccctgaag 4080
tctaagctcg tctccgactt caggaaggat ttccagttct acaaggttcg cgagatcaac 4140
aactaccacc atgcccatga cgcttacctc aacgctgtgg tcggcaccgc tctgatcaag 4200
aagtacccaa agctggagtc cgagttcgtg tacggggact acaaggttta cgatgtgcgc 4260
aagatgatcg ccaagtcgga gcaagagatc ggcaaggcta ccgccaagta cttcttctac 4320
tcaaacatca tgaacttctt caagaccgag atcacgctgg ccaacggcga gatccggaag 4380
agaccgctca tcgagaccaa cggcgagacg ggggagatcg tgtgggacaa gggcagggat 4440
ttcgcgaccg tccgcaaggt tctctccatg ccccaggtga acatcgtcaa gaagaccgag 4500
gtccaaacgg gcgggttctc aaaggagtct atcctgccta agcggaacag cgacaagctc 4560
atcgccagaa agaaggactg ggacccaaag aagtacggcg ggttcgacag ccctaccgtg 4620
gcctactcgg tcctggttgt ggcgaaggtt gagaagggca agtccaagaa gctcaagagc 4680
gtgaaggagc tcctggggat caccatcatg gagaggtcca gcttcgagaa gaacccaatc 4740
gacttcctgg aggccaaggg ctacaaggag gtgaagaagg acctgatcat caagctcccg 4800
aagtactctc tcttcgagct ggagaacggc aggaagagaa tgctggcttc cgctggcgag 4860
ctccagaagg ggaacgagct cgcgctgcca agcaagtacg tgaacttcct ctacctggct 4920
tcccactacg agaagctcaa gggcagcccg gaggacaacg agcaaaagca gctgttcgtc 4980
gagcagcaca agcattacct cgacgagatc atcgagcaaa tctccgagtt cagcaagcgc 5040
gtgatcctcg ccgacgcgaa cctggataag gtcctctccg cctacaacaa gcaccgggac 5100
aagcccatca gagagcaagc ggagaacatc atccatctct tcaccctgac gaacctcggc 5160
gctcctgctg ctttcaagta cttcgacacc acgatcgatc ggaagagata cacctccacg 5220
aaggaggtcc tggacgcgac cctcatccac cagtcgatca ccggcctgta cgagacgagg 5280
atcgacctct cacaactcgg cggggataag agacccgcag caaccaagaa ggcagggcaa 5340
gcaaagaaga agaagacgcg tgactccggc ggcagcatgt ttggagagag ctggaagaag 5400
cacctcagcg gggagttcgg gaaaccgtat tttatcaagc taatgggatt tgttgcagaa 5460
gaaagaaagc attacactgt ttatccaccc ccacaccaag tcttcacctg gacccagatg 5520
tgtgacataa aagatgtgaa ggttgtcatc ctgggacagg atccatatca tggacctaat 5580
caagctcacg ggctctgctt tagtgttcaa aggcctgttc cgcctccgcc cagtttggag 5640
aacatttata aagagttgtc tacagacata gaggattttg ttcatcctgg ccatggagat 5700
ttatctgggt gggccaagca aggtgttctc cttctcaacg ctgtcctcac ggttcgtgcc 5760
catcaagcca actctcataa ggagcgaggc tgggagcagt tcactgatgc agttgtgtcc 5820
tggctaaatc agaactcgaa tggccttgtt ttcttgctct ggggctctta tgctcagaag 5880
aagggcagtg ccattgatag gaagcggcac catgtactac agacggctca tccctcccct 5940
ttgtcagtgt atagagggtt ctttggatgt agacactttt caaagaccaa tgagctgctg 6000
cagaagtctg gcaagaagcc cattgactgg aaggagctgt cgggggggag cccaaagaag 6060
aagcggaagg tg 6072
<210> 9
<211> 6612
<212> DNA
<213> Artificial Sequence
<400> 9
atgccaaaga agaagaggaa ggttctggtt gggccctcct gtgtcatgga tgacttcagg 60
gacccacagc gatggaagga atgtgccaag caagggaaaa tgccatgtta ctttgatctt 120
attgaagaaa atgtttattt aacagaaaga aagaagaata aatctcatcg agatattaag 180
cgaatgcagt gtgagtgtac acctctttct aaagatgaaa gagctcaagg tgaaatagca 240
tgtggggaag attgtcttaa tcgtcttctc atgattgaat gttcttctcg gtgtccaaat 300
ggggattatt gttccaatag acggtttcag agaaaacagc atgcagatgt ggaagtcata 360
ctcacagaaa agaaaggctg gggcttgaga gctgccaaag accttccttc gaacaccttt 420
gtcctagaat attgtggaga ggtactcgat cataaagagt ttaaagctcg agtgaaggag 480
tatgcacgaa acaaaaacat ccattactat ttcatggccc tgaagaatga tgagataata 540
gatgccactc aaaaaggaaa ttgctctcgt ttcatgaatc acagctgtga accaaattgt 600
gaaacccaaa aatggactgt gaacggacaa ctgagggttg ggttttttac caccaaactg 660
gttccttcag gctcagagtt aacgtttgac tatcagttcc agagatatgg aaaagaagcc 720
cagaaatgtt tctgcggatc agccaattgc cggggttacc tgggaggaga aaacagagtc 780
agcatcagag cagcaggagg gaaaatgaag aaggaacgat ctcgtaagaa ggattcagtg 840
gatggagagc tagaagctct gatggaaaat ggtgagggtc tctctgataa aaaccaggtg 900
ctcagcttat cccggagcgg aggatctagc ggaggatcaa gcggaagcga gactcctgga 960
accagcgaaa gcgcaacccc agaaagcagc ggaggaagta gcggaggaag ctcatcggag 1020
accggccctg ttgctgttga ccccaccctg cggcggagaa tcgagccaca cgagttcgag 1080
gtgttcttcg acccaaggga gctccgcaag gagacgtgcc tcctgtacga gatcaactgg 1140
ggcggcaggc actccatctg gaggcacacc agccaaaaca ccaacaagca cgtggaggtc 1200
aacttcatcg agaagttcac caccgagagg tacttctgcc caaacacccg ctgctccatc 1260
acctggttcc tgtcctggag cccatgcggc gagtgctcca gggccatcac cgagttcctc 1320
agccgctacc cacacgtcac cctgttcatc tacatcgcca ggctctacca ccacgccgac 1380
ccaaggaaca ggcagggcct ccgcgacctg atctccagcg gcgtgaccat ccaaatcatg 1440
accgagcagg agtccggcta ctgctggagg aacttcgtca actactcccc aagcaacgag 1500
gcccactggc caaggtaccc acacctctgg gtgcgcctct acgtgctcga gctgtactgc 1560
atcatcctcg gcctgccacc atgcctcaac atcctgaggc gcaagcaacc acagctgacc 1620
ttcttcacca tcgccctcca aagctgccac taccagaggc tcccaccaca catcctgtgg 1680
gctaccggcc tcaagtccgg cagcgagacg ccaggcacct ccgagagcgc tacgcctgaa 1740
cttaaggaca agaagtactc gatcggcctc gccatcggga cgaactcagt tggctgggcc 1800
gtgatcaccg acgagtacaa ggtgccctct aagaagttca aggtcctggg gaacaccgac 1860
cgccattcca tcaagaagaa cctcatcggc gctctcctgt tcgacagcgg ggagaccgct 1920
gaggctacga ggctcaagag aaccgctagg cgccggtaca cgagaaggaa gaacaggatc 1980
tgctacctcc aagagatttt ctccaacgag atggccaagg ttgacgattc attcttccac 2040
cgcctggagg agtctttcct cgtggaggag gataagaagc acgagcggca tcccatcttc 2100
ggcaacatcg tggacgaggt tgcctaccac gagaagtacc ctacgatcta ccatctgcgg 2160
aagaagctcg tggactccac cgataaggcg gacctcagac tgatctacct cgctctggcc 2220
cacatgatca agttccgcgg ccatttcctg atcgaggggg atctcaaccc agacaacagc 2280
gatgttgaca agctgttcat ccaactcgtg cagacctaca accaactctt cgaggagaac 2340
ccgatcaacg cctctggcgt ggacgcgaag gctatcctgt ccgcgaggct ctcgaagtcc 2400
aggaggctgg agaacctgat cgctcagctc ccaggcgaga agaagaacgg cctgttcggg 2460
aacctcatcg ctctcagcct ggggctcacc ccgaacttca agtcgaactt cgatctcgct 2520
gaggacgcca agctgcaact ctccaaggac acctacgacg atgacctcga taacctcctg 2580
gcccagatcg gcgatcaata cgcggacctg ttcctcgctg ccaagaacct gtcggacgcc 2640
atcctcctgt cagatatcct ccgcgtgaac accgagatca cgaaggctcc actctctgcc 2700
tccatgatca agcgctacga cgagcaccat caggatctga ccctcctgaa ggcgctggtc 2760
cgccaacagc tcccggagaa gtacaaggag attttcttcg atcagtcgaa gaacggctac 2820
gctgggtaca tcgacggcgg ggcctcacaa gaggagttct acaagttcat caagccaatc 2880
ctggagaaga tggacggcac ggaggagctc ctggtgaagc tcaacaggga ggacctcctg 2940
cggaagcaga gaaccttcga taacggcagc atcccccacc aaatccatct cggggagctg 3000
cacgccatcc tgagaaggca agaggacttc taccctttcc tcaaggataa ccgggagaag 3060
atcgagaaga tcctgacctt cagaatccca tactacgtcg gccctctcgc gcgggggaac 3120
tcaagattcg cttggatgac ccgcaagtct gaggagacca tcacgccgtg gaacttcgag 3180
gaggtggtgg acaagggcgc tagcgctcag tcgttcatcg agaggatgac caacttcgac 3240
aagaacctgc ccaacgagaa ggtgctccct aagcactcgc tcctgtacga gtacttcacc 3300
gtctacaacg agctcacgaa ggtgaagtac gtcaccgagg gcatgcgcaa gccagcgttc 3360
ctgtccgggg agcagaagaa ggctatcgtg gacctcctgt tcaagaccaa ccggaaggtc 3420
acggttaagc aactcaagga ggactacttc aagaagatcg agtgcttcga ttcggtcgag 3480
atcagcggcg ttgaggaccg cttcaacgcc agcctcggga cctaccacga tctcctgaag 3540
atcatcaagg ataaggactt cctggacaac gaggagaacg aggatatcct ggaggacatc 3600
gtgctgaccc tcacgctgtt cgaggacagg gagatgatcg aggagcgcct gaagacgtac 3660
gcccatctct tcgatgacaa ggtcatgaag caactcaagc gccggagata caccggctgg 3720
gggaggctgt cccgcaagct catcaacggc atccgggaca agcagtccgg gaagaccatc 3780
ctcgacttcc tcaagagcga tggcttcgcc aacaggaact tcatgcaact gatccacgat 3840
gacagcctca ccttcaagga ggatatccaa aaggctcaag tgagcggcca gggggactcg 3900
ctgcacgagc atatcgcgaa cctcgctggc tcccccgcga tcaagaaggg catcctccag 3960
accgtgaagg ttgtggacga gctcgtgaag gtcatgggcc ggcacaagcc tgagaacatc 4020
gtcatcgaga tggccagaga gaaccaaacc acgcagaagg ggcaaaagaa ctctagggag 4080
cgcatgaagc gcatcgagga gggcatcaag gagctggggt cccaaatcct caaggagcac 4140
ccagtggaga acacccaact gcagaacgag aagctctacc tgtactacct ccagaacggc 4200
agggatatgt acgtggacca agagctggat atcaaccgcc tcagcgatta cgacgtcgat 4260
catatcgttc cccagtcttt cctgaaggat gactccatcg acaacaaggt cctcaccagg 4320
tcggacaaga accgcggcaa gtcagataac gttccatctg aggaggtcgt taagaagatg 4380
aagaactact ggaggcagct cctgaacgcc aagctgatca cgcaaaggaa gttcgacaac 4440
ctcaccaagg ctgagagagg cgggctctca gagctggaca aggccggctt catcaagcgg 4500
cagctggtcg agaccagaca aatcacgaag cacgttgcgc aaatcctcga ctctcggatg 4560
aacacgaagt acgatgagaa cgacaagctg atcagggagg ttaaggtgat caccctgaag 4620
tctaagctcg tctccgactt caggaaggat ttccagttct acaaggttcg cgagatcaac 4680
aactaccacc atgcccatga cgcttacctc aacgctgtgg tcggcaccgc tctgatcaag 4740
aagtacccaa agctggagtc cgagttcgtg tacggggact acaaggttta cgatgtgcgc 4800
aagatgatcg ccaagtcgga gcaagagatc ggcaaggcta ccgccaagta cttcttctac 4860
tcaaacatca tgaacttctt caagaccgag atcacgctgg ccaacggcga gatccggaag 4920
agaccgctca tcgagaccaa cggcgagacg ggggagatcg tgtgggacaa gggcagggat 4980
ttcgcgaccg tccgcaaggt tctctccatg ccccaggtga acatcgtcaa gaagaccgag 5040
gtccaaacgg gcgggttctc aaaggagtct atcctgccta agcggaacag cgacaagctc 5100
atcgccagaa agaaggactg ggacccaaag aagtacggcg ggttcgacag ccctaccgtg 5160
gcctactcgg tcctggttgt ggcgaaggtt gagaagggca agtccaagaa gctcaagagc 5220
gtgaaggagc tcctggggat caccatcatg gagaggtcca gcttcgagaa gaacccaatc 5280
gacttcctgg aggccaaggg ctacaaggag gtgaagaagg acctgatcat caagctcccg 5340
aagtactctc tcttcgagct ggagaacggc aggaagagaa tgctggcttc cgctggcgag 5400
ctccagaagg ggaacgagct cgcgctgcca agcaagtacg tgaacttcct ctacctggct 5460
tcccactacg agaagctcaa gggcagcccg gaggacaacg agcaaaagca gctgttcgtc 5520
gagcagcaca agcattacct cgacgagatc atcgagcaaa tctccgagtt cagcaagcgc 5580
gtgatcctcg ccgacgcgaa cctggataag gtcctctccg cctacaacaa gcaccgggac 5640
aagcccatca gagagcaagc ggagaacatc atccatctct tcaccctgac gaacctcggc 5700
gctcctgctg ctttcaagta cttcgacacc acgatcgatc ggaagagata cacctccacg 5760
aaggaggtcc tggacgcgac cctcatccac cagtcgatca ccggcctgta cgagacgagg 5820
atcgacctct cacaactcgg cggggataag agacccgcag caaccaagaa ggcagggcaa 5880
gcaaagaaga agaagacgcg tgactccggc ggcagcatgt ttggagagag ctggaagaag 5940
cacctcagcg gggagttcgg gaaaccgtat tttatcaagc taatgggatt tgttgcagaa 6000
gaaagaaagc attacactgt ttatccaccc ccacaccaag tcttcacctg gacccagatg 6060
tgtgacataa aagatgtgaa ggttgtcatc ctgggacagg atccatatca tggacctaat 6120
caagctcacg ggctctgctt tagtgttcaa aggcctgttc cgcctccgcc cagtttggag 6180
aacatttata aagagttgtc tacagacata gaggattttg ttcatcctgg ccatggagat 6240
ttatctgggt gggccaagca aggtgttctc cttctcaacg ctgtcctcac ggttcgtgcc 6300
catcaagcca actctcataa ggagcgaggc tgggagcagt tcactgatgc agttgtgtcc 6360
tggctaaatc agaactcgaa tggccttgtt ttcttgctct ggggctctta tgctcagaag 6420
aagggcagtg ccattgatag gaagcggcac catgtactac agacggctca tccctcccct 6480
ttgtcagtgt atagagggtt ctttggatgt agacactttt caaagaccaa tgagctgctg 6540
cagaagtctg gcaagaagcc cattgactgg aaggagctgt cgggggggag cccaaagaag 6600
aagcggaagg tg 6612
<210> 10
<211> 7458
<212> DNA
<213> Artificial Sequence
<400> 10
atgccaaaga agaagaggaa ggtttcggac acgtggagct ctatccaggc ccacaagaag 60
cagctggact ctctgcggga gaggctgcag cggaggcgga agcaggactc ggggcacttg 120
gatctacgga atccagaggc agcattgtct ccaaccttcc gtagtgacag cccagtgcct 180
actgcaccca cctctggtgg ccctaagccc agcacagctt cagcagttcc tgaattagct 240
acagatcctg agttagagaa gaagttgcta caccacctct ctgatctggc cttaacattg 300
cccactgatg ctgtgtccat ctgtcttgcc atctccacgc cagatgctcc tgccactcaa 360
gatggggtag aaagcctcct gcagaagttt gcagctcagg agttgattga ggtaaagcga 420
ggtctcctac aagatgatgc acatcctact cttgtaacct atgctgacca ttccaagctc 480
tctgccatga tgggtgctgt ggcagaaaag aagggccctg gggaggtagc agggactgtc 540
acagggcaga agcggcgtgc agaacaggac tcgactacag tagctgcctt tgccagttcg 600
ttagtctctg gtctgaactc ttcagcatcg gaaccagcaa aggagccagc caagaaatca 660
aggaaacatg ctgcctcaga tgttgatctg gagatagaga gccttctgaa ccaacagtcc 720
actaaggaac aacagagcaa gaaggtcagt caggagatcc tagagctatt aaatactaca 780
acagccaagg aacaatccat tgttgaaaaa tttcgctctc gaggtcgggc ccaagtgcaa 840
gaattctgtg actatggaac caaggaggag tgcatgaaag ccagtgatgc tgatcgaccc 900
tgtcgcaagc tgcacttcag acgaattatc aataaacaca ctgatgagtc tttaggtgac 960
tgctctttcc ttaatacatg tttccacatg gatacctgca agtatgttca ctatgaaatt 1020
gatgcttgca tggattctga ggcccctggc agcaaagacc acacgccaag ccaggagctt 1080
gctcttacac agagtgtcgg aggtgattcc agtgcagacc gactcttccc acctcagtgg 1140
atctgttgtg atatccgcta cctggacgtc agtatcttgg gcaagtttgc agttgtgatg 1200
gctgacccac cctgggatat tcacatggaa ctgccctatg ggaccctgac agatgatgag 1260
atgcgcaggc tcaacatacc cgtactacag gatgatggct ttctcttcct ctgggtcaca 1320
ggcagggcca tggagttggg gagagaatgt ctaaacctct gggggtatga acgggtagat 1380
gaaattattt gggtgaagac aaatcaactg caacgcatca ttcggacagg ccgtacaggt 1440
cactggttga accatgggaa ggaacactgc ttggttggtg tcaaaggaaa tccccaaggc 1500
ttcaaccagg gtctggattg tgatgtgatc gtagctgagg ttcgttccac cagtcataaa 1560
ccagatgaaa tctatggcat gattgaaaga ctatctcctg gcactcgcaa gattgagtta 1620
tttggacgac cacacaatgt gcaacccaac tggatcaccc ttggaaacca actggatggg 1680
atccacctac tagacccaga tgtggttgca cggttcaagc aaaggtaccc agatggtatc 1740
atctctaaac ctaagaattt aagcggagga tctagcggag gatcaagcgg aagcgagact 1800
cctggaacca gcgaaagcgc aaccccagaa agcagcggag gaagtagcgg aggaagctca 1860
tcggagaccg gccctgttgc tgttgacccc accctgcggc ggagaatcga gccacacgag 1920
ttcgaggtgt tcttcgaccc aagggagctc cgcaaggaga cgtgcctcct gtacgagatc 1980
aactggggcg gcaggcactc catctggagg cacaccagcc aaaacaccaa caagcacgtg 2040
gaggtcaact tcatcgagaa gttcaccacc gagaggtact tctgcccaaa cacccgctgc 2100
tccatcacct ggttcctgtc ctggagccca tgcggcgagt gctccagggc catcaccgag 2160
ttcctcagcc gctacccaca cgtcaccctg ttcatctaca tcgccaggct ctaccaccac 2220
gccgacccaa ggaacaggca gggcctccgc gacctgatct ccagcggcgt gaccatccaa 2280
atcatgaccg agcaggagtc cggctactgc tggaggaact tcgtcaacta ctccccaagc 2340
aacgaggccc actggccaag gtacccacac ctctgggtgc gcctctacgt gctcgagctg 2400
tactgcatca tcctcggcct gccaccatgc ctcaacatcc tgaggcgcaa gcaaccacag 2460
ctgaccttct tcaccatcgc cctccaaagc tgccactacc agaggctccc accacacatc 2520
ctgtgggcta ccggcctcaa gtccggcagc gagacgccag gcacctccga gagcgctacg 2580
cctgaactta aggacaagaa gtactcgatc ggcctcgcca tcgggacgaa ctcagttggc 2640
tgggccgtga tcaccgacga gtacaaggtg ccctctaaga agttcaaggt cctggggaac 2700
accgaccgcc attccatcaa gaagaacctc atcggcgctc tcctgttcga cagcggggag 2760
accgctgagg ctacgaggct caagagaacc gctaggcgcc ggtacacgag aaggaagaac 2820
aggatctgct acctccaaga gattttctcc aacgagatgg ccaaggttga cgattcattc 2880
ttccaccgcc tggaggagtc tttcctcgtg gaggaggata agaagcacga gcggcatccc 2940
atcttcggca acatcgtgga cgaggttgcc taccacgaga agtaccctac gatctaccat 3000
ctgcggaaga agctcgtgga ctccaccgat aaggcggacc tcagactgat ctacctcgct 3060
ctggcccaca tgatcaagtt ccgcggccat ttcctgatcg agggggatct caacccagac 3120
aacagcgatg ttgacaagct gttcatccaa ctcgtgcaga cctacaacca actcttcgag 3180
gagaacccga tcaacgcctc tggcgtggac gcgaaggcta tcctgtccgc gaggctctcg 3240
aagtccagga ggctggagaa cctgatcgct cagctcccag gcgagaagaa gaacggcctg 3300
ttcgggaacc tcatcgctct cagcctgggg ctcaccccga acttcaagtc gaacttcgat 3360
ctcgctgagg acgccaagct gcaactctcc aaggacacct acgacgatga cctcgataac 3420
ctcctggccc agatcggcga tcaatacgcg gacctgttcc tcgctgccaa gaacctgtcg 3480
gacgccatcc tcctgtcaga tatcctccgc gtgaacaccg agatcacgaa ggctccactc 3540
tctgcctcca tgatcaagcg ctacgacgag caccatcagg atctgaccct cctgaaggcg 3600
ctggtccgcc aacagctccc ggagaagtac aaggagattt tcttcgatca gtcgaagaac 3660
ggctacgctg ggtacatcga cggcggggcc tcacaagagg agttctacaa gttcatcaag 3720
ccaatcctgg agaagatgga cggcacggag gagctcctgg tgaagctcaa cagggaggac 3780
ctcctgcgga agcagagaac cttcgataac ggcagcatcc cccaccaaat ccatctcggg 3840
gagctgcacg ccatcctgag aaggcaagag gacttctacc ctttcctcaa ggataaccgg 3900
gagaagatcg agaagatcct gaccttcaga atcccatact acgtcggccc tctcgcgcgg 3960
gggaactcaa gattcgcttg gatgacccgc aagtctgagg agaccatcac gccgtggaac 4020
ttcgaggagg tggtggacaa gggcgctagc gctcagtcgt tcatcgagag gatgaccaac 4080
ttcgacaaga acctgcccaa cgagaaggtg ctccctaagc actcgctcct gtacgagtac 4140
ttcaccgtct acaacgagct cacgaaggtg aagtacgtca ccgagggcat gcgcaagcca 4200
gcgttcctgt ccggggagca gaagaaggct atcgtggacc tcctgttcaa gaccaaccgg 4260
aaggtcacgg ttaagcaact caaggaggac tacttcaaga agatcgagtg cttcgattcg 4320
gtcgagatca gcggcgttga ggaccgcttc aacgccagcc tcgggaccta ccacgatctc 4380
ctgaagatca tcaaggataa ggacttcctg gacaacgagg agaacgagga tatcctggag 4440
gacatcgtgc tgaccctcac gctgttcgag gacagggaga tgatcgagga gcgcctgaag 4500
acgtacgccc atctcttcga tgacaaggtc atgaagcaac tcaagcgccg gagatacacc 4560
ggctggggga ggctgtcccg caagctcatc aacggcatcc gggacaagca gtccgggaag 4620
accatcctcg acttcctcaa gagcgatggc ttcgccaaca ggaacttcat gcaactgatc 4680
cacgatgaca gcctcacctt caaggaggat atccaaaagg ctcaagtgag cggccagggg 4740
gactcgctgc acgagcatat cgcgaacctc gctggctccc ccgcgatcaa gaagggcatc 4800
ctccagaccg tgaaggttgt ggacgagctc gtgaaggtca tgggccggca caagcctgag 4860
aacatcgtca tcgagatggc cagagagaac caaaccacgc agaaggggca aaagaactct 4920
agggagcgca tgaagcgcat cgaggagggc atcaaggagc tggggtccca aatcctcaag 4980
gagcacccag tggagaacac ccaactgcag aacgagaagc tctacctgta ctacctccag 5040
aacggcaggg atatgtacgt ggaccaagag ctggatatca accgcctcag cgattacgac 5100
gtcgatcata tcgttcccca gtctttcctg aaggatgact ccatcgacaa caaggtcctc 5160
accaggtcgg acaagaaccg cggcaagtca gataacgttc catctgagga ggtcgttaag 5220
aagatgaaga actactggag gcagctcctg aacgccaagc tgatcacgca aaggaagttc 5280
gacaacctca ccaaggctga gagaggcggg ctctcagagc tggacaaggc cggcttcatc 5340
aagcggcagc tggtcgagac cagacaaatc acgaagcacg ttgcgcaaat cctcgactct 5400
cggatgaaca cgaagtacga tgagaacgac aagctgatca gggaggttaa ggtgatcacc 5460
ctgaagtcta agctcgtctc cgacttcagg aaggatttcc agttctacaa ggttcgcgag 5520
atcaacaact accaccatgc ccatgacgct tacctcaacg ctgtggtcgg caccgctctg 5580
atcaagaagt acccaaagct ggagtccgag ttcgtgtacg gggactacaa ggtttacgat 5640
gtgcgcaaga tgatcgccaa gtcggagcaa gagatcggca aggctaccgc caagtacttc 5700
ttctactcaa acatcatgaa cttcttcaag accgagatca cgctggccaa cggcgagatc 5760
cggaagagac cgctcatcga gaccaacggc gagacggggg agatcgtgtg ggacaagggc 5820
agggatttcg cgaccgtccg caaggttctc tccatgcccc aggtgaacat cgtcaagaag 5880
accgaggtcc aaacgggcgg gttctcaaag gagtctatcc tgcctaagcg gaacagcgac 5940
aagctcatcg ccagaaagaa ggactgggac ccaaagaagt acggcgggtt cgacagccct 6000
accgtggcct actcggtcct ggttgtggcg aaggttgaga agggcaagtc caagaagctc 6060
aagagcgtga aggagctcct ggggatcacc atcatggaga ggtccagctt cgagaagaac 6120
ccaatcgact tcctggaggc caagggctac aaggaggtga agaaggacct gatcatcaag 6180
ctcccgaagt actctctctt cgagctggag aacggcagga agagaatgct ggcttccgct 6240
ggcgagctcc agaaggggaa cgagctcgcg ctgccaagca agtacgtgaa cttcctctac 6300
ctggcttccc actacgagaa gctcaagggc agcccggagg acaacgagca aaagcagctg 6360
ttcgtcgagc agcacaagca ttacctcgac gagatcatcg agcaaatctc cgagttcagc 6420
aagcgcgtga tcctcgccga cgcgaacctg gataaggtcc tctccgccta caacaagcac 6480
cgggacaagc ccatcagaga gcaagcggag aacatcatcc atctcttcac cctgacgaac 6540
ctcggcgctc ctgctgcttt caagtacttc gacaccacga tcgatcggaa gagatacacc 6600
tccacgaagg aggtcctgga cgcgaccctc atccaccagt cgatcaccgg cctgtacgag 6660
acgaggatcg acctctcaca actcggcggg gataagagac ccgcagcaac caagaaggca 6720
gggcaagcaa agaagaagaa gacgcgtgac tccggcggca gcatgtttgg agagagctgg 6780
aagaagcacc tcagcgggga gttcgggaaa ccgtatttta tcaagctaat gggatttgtt 6840
gcagaagaaa gaaagcatta cactgtttat ccacccccac accaagtctt cacctggacc 6900
cagatgtgtg acataaaaga tgtgaaggtt gtcatcctgg gacaggatcc atatcatgga 6960
cctaatcaag ctcacgggct ctgctttagt gttcaaaggc ctgttccgcc tccgcccagt 7020
ttggagaaca tttataaaga gttgtctaca gacatagagg attttgttca tcctggccat 7080
ggagatttat ctgggtgggc caagcaaggt gttctccttc tcaacgctgt cctcacggtt 7140
cgtgcccatc aagccaactc tcataaggag cgaggctggg agcagttcac tgatgcagtt 7200
gtgtcctggc taaatcagaa ctcgaatggc cttgttttct tgctctgggg ctcttatgct 7260
cagaagaagg gcagtgccat tgataggaag cggcaccatg tactacagac ggctcatccc 7320
tcccctttgt cagtgtatag agggttcttt ggatgtagac acttttcaaa gaccaatgag 7380
ctgctgcaga agtctggcaa gaagcccatt gactggaagg agctgtcggg ggggagccca 7440
aagaagaagc ggaaggtg 7458
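
The sequence listing above follows a fixed layout: each `<400>` block gives the sequence in lowercase, in groups of ten bases, with a running base count at the end of every line, and `<211>` states the total length. As an illustration only, the following Python sketch shows how such a block might be read back into one contiguous string, checked against its running counts, and sliced using the 1-based inclusive positions that the claims use. The helper names `parse_listing_block` and `subsequence` are hypothetical and are not part of the application.

```python
import re

def parse_listing_block(lines):
    """Join the sequence lines of one <400> block into a single string.

    Each line holds space-separated groups of bases followed by a running
    count. The count is checked against the number of bases read so far,
    so the lines must start at base 1 of the block.
    """
    seq = []
    total = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        *groups, count = parts
        if not count.isdigit():
            raise ValueError(f"missing running count: {line!r}")
        chunk = "".join(groups)
        if not re.fullmatch(r"[acgtn]+", chunk):
            raise ValueError(f"unexpected characters: {line!r}")
        total += len(chunk)
        if total != int(count):
            raise ValueError(f"running count {count} does not match {total}")
        seq.append(chunk)
    return "".join(seq)

def subsequence(seq, start, end):
    """Return positions start..end (1-based, inclusive), as cited in the claims."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions outside the sequence")
    return seq[start - 1:end]

# First two lines of sequence 9, copied from the listing above.
example = [
    "atgccaaaga agaagaggaa ggttctggtt gggccctcct gtgtcatgga tgacttcagg 60",
    "gacccacagc gatggaagga atgtgccaag caagggaaaa tgccatgtta ctttgatctt 120",
]
seq9_start = parse_listing_block(example)
print(len(seq9_start))                  # 120
print(subsequence(seq9_start, 25, 36))  # ctggttgggccc
```

Applied to a complete block, the length of the returned string can be compared with the `<211>` value (for example 6612 for sequence 9), and `subsequence(seq, 25, 915)` would return the 891-nucleotide region that the claims assign to SETD2.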

Claims (10)

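1. A recombinant gene editing system, characterized in that: the recombinant gene editing system is obtained by modifying a gene editing system; the recombinant gene editing system expresses a fusion protein comprising a sequence-specific binding protein, a genome modification inducing factor, and an epigenetic factor; and the gene editing efficiency of the recombinant gene editing system is higher than that of the gene editing system.
2. The recombinant gene editing system according to claim 1, characterized in that: the gene editing system is a base editing system.
3. The recombinant gene editing system according to claim 1 or 2, characterized in that: the sequence-specific binding protein is a Cas9 protein; the genome modification inducing factor is a deaminase.
4. The recombinant gene editing system according to claim 1 or 2, characterized in that: the epigenetic factor is a chromatin remodeling factor, a histone modification factor, and/or an RNA modification factor.
5. The recombinant gene editing system according to claim 2, 3, or 4, characterized in that: the gene editing system is a CBE base editing system or a GBE base editing system; the deaminase is a cytidine deaminase; the Cas9 protein is nCas9.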
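6. The recombinant gene editing system according to claim 4 or 5, characterized in that: the chromatin remodeling factor is SOX2; the histone modification factor is SETD2; the RNA modification factor is METTL3;
the SOX2 is any one of A1), A2), A3), A4), or A5):
A1) a protein whose amino acid sequence is positions 280-596 of sequence 1 in the sequence listing;
A2) a protein encoded by a coding sequence consisting of nucleotides 838-1053 of sequence 6 in the sequence listing;
A3) a protein encoded by a coding sequence consisting of nucleotides 838-1188 of sequence 7 in the sequence listing;
A4) a protein that is derived from A1), A2), or A3) by substitution and/or deletion and/or addition of one or more amino acid residues in the amino acid sequence shown in A1), A2), or A3) and has the same function, or a protein that has 80% or more identity with the protein shown in A1), A2), or A3) and has the same function;
A5) a fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of A1), A2), A3), or A4);
and/or, the SETD2 is any one of B1), B2), or B3):
B1) a protein encoded by a coding sequence consisting of nucleotides 25-915 of sequence 9 in the sequence listing;
B2) a protein that is derived from B1) by substitution and/or deletion and/or addition of one or more amino acid residues in the amino acid sequence shown in B1) and has the same function, or a protein that has 80% or more identity with the protein shown in B1) and has the same function;
B3) a fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of B1) or B2);
and/or, the METTL3 is any one of C1), C2), or C3):
C1) a protein encoded by a coding sequence consisting of nucleotides 25-1761 of sequence 10 in the sequence listing;
C2) a protein that is derived from C1) by substitution and/or deletion and/or addition of one or more amino acid residues in the amino acid sequence shown in C1) and has the same function, or a protein that has 80% or more identity with the protein shown in C1) and has the same function;
C3) a fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of C1) or C2).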
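7. The recombinant gene editing system according to claim 1 or 6, characterized in that: the fusion protein is any one of D1), D2), D3), D4), D5), D6), D7), D8), or D9):
D1) a protein whose amino acid sequence is sequence 1 in the sequence listing;
D2) a protein whose amino acid sequence is sequence 3 in the sequence listing;
D3) a protein encoded by a coding sequence consisting of nucleotides 1-5871 of sequence 6 in the sequence listing;
D4) a protein encoded by a coding sequence consisting of nucleotides 1-6006 of sequence 7 in the sequence listing;
D5) a protein encoded by a coding sequence consisting of nucleotides 1-6072 of sequence 8 in the sequence listing;
D6) a protein encoded by a coding sequence consisting of nucleotides 1-6612 of sequence 9 in the sequence listing;
D7) a protein encoded by a coding sequence consisting of nucleotides 1-7458 of sequence 10 in the sequence listing;
D8) a protein that is derived from D1), D2), D3), D4), D5), D6), or D7) by substitution and/or deletion and/or addition of one or more amino acid residues in the amino acid sequence shown in D1), D2), D3), D4), D5), D6), or D7) and has the same function, or a protein that has 80% or more identity with the protein shown in D1), D2), D3), D4), D5), D6), or D7) and has the same function;
D9) a fusion protein obtained by attaching a protein tag to the N-terminus and/or C-terminus of D1), D2), D3), D4), D5), D6), D7), or D8).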
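8. The fusion protein described in any one of claims 1-7, or a biological material related thereto, wherein the biological material is any one of the following:
E1) a nucleic acid molecule encoding the fusion protein described in any one of claims 1-7;
E2) an expression cassette containing the nucleic acid molecule of E1);
E3) a recombinant vector containing the nucleic acid molecule of E1), or a recombinant vector containing the expression cassette of E2);
E4) a recombinant microorganism containing the nucleic acid molecule of E1), or a recombinant microorganism containing the expression cassette of E2), or a recombinant microorganism containing the recombinant vector of E3);
E5) a transgenic plant cell line containing the nucleic acid molecule of E1), or a transgenic cell line containing the expression cassette of E2), or a transgenic cell line containing the recombinant vector of E3);
E6) a transgenic plant tissue containing the nucleic acid molecule of E1), or a transgenic tissue containing the expression cassette of E2), or a transgenic tissue containing the recombinant vector of E3);
E7) a transgenic animal organ containing the nucleic acid molecule of E1), or a transgenic animal organ containing the expression cassette of E2), or a transgenic animal organ containing the recombinant vector of E3).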
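9. Use of the epigenetic factor described in any one of claims 1-7 for improving the gene editing efficiency of a gene editing system.
10. Use of the epigenetic factor described in any one of claims 1-7, and/or the fusion protein of claim 8, and/or the biological material related thereto, in gene editing.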
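
Several of the claims above cover variant proteins having "80% or more identity" with a reference protein. This section does not state how identity is to be computed, and conventions differ between alignment tools, particularly in the choice of denominator. Purely as a sketch under stated assumptions, the following Python example computes percent identity for two sequences that have already been aligned, counting identical columns over all aligned columns and ignoring columns that are gaps in both. The function names and the toy peptides are hypothetical and are not taken from the application.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Assumption: identity = identical columns / aligned columns * 100,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no residue in either sequence
        columns += 1
        if a == b and a != gap:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns")
    return 100.0 * matches / columns

def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Toy aligned peptides (hypothetical, not taken from the sequence listing).
ref = "MPKKKRKVLVGPSC"
var = "MPKKKRKVLIGPS-"
print(round(percent_identity(ref, var), 1))  # 85.7
print(meets_threshold(ref, var))             # True
```

In practice the alignment itself would come from a dedicated alignment program, and the result depends on that program's scoring parameters, so a check like this only illustrates the arithmetic behind the threshold.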
CN202111281795.4A 2021-11-01 2021-11-01 Use of epigenetic factors for optimizing gene editing tools in eukaryotic cells Pending CN115678913A (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111281795.4A CN115678913A (zh) 2021-11-01 2021-11-01 Use of epigenetic factors for optimizing gene editing tools in eukaryotic cells

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111281795.4A CN115678913A (zh) 2021-11-01 2021-11-01 Use of epigenetic factors for optimizing gene editing tools in eukaryotic cells

Publications (1)

Publication Number Publication Date
CN115678913A (zh) 2023-02-03

Family

ID=85059622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111281795.4A Pending CN115678913A (zh) 2021-11-01 2021-11-01 Use of epigenetic factors for optimizing gene editing tools in eukaryotic cells

Country Status (1)

Country Link
CN (1) CN115678913A (zh)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination