CN112805385B - Base editor based on human APOBEC3A deaminase and uses thereof

Publication number
CN112805385B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201980049597.XA
Other languages
English (en)
Other versions
CN112805385A (zh)
Inventor
Gao Caixia (高彩霞)
Zong Yuan (宗媛)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Qihe Biotechnology Co ltd
Original Assignee
Suzhou Qihe Biotechnology Co ltd

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K19/00: Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62: DNA sequences coding for fusion proteins
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases RNAses, DNAses

Abstract

Provided are a base editor based on human APOBEC3A deaminase and uses thereof, wherein the editor can mediate efficient C-to-T nucleotide substitution.

Description

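Base editor based on human APOBEC3A deaminase and uses thereof
Technical Field
The present invention relates to the field of genetic engineering. Specifically, it relates to a base editor based on human APOBEC3A deaminase and uses thereof, in particular the use of the editor for base editing in plants, wherein the editor can mediate efficient C-to-T nucleotide substitution.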
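Background of the Invention
A large number of single-nucleotide variants associated with important agronomic traits have been developed and applied to crop improvement (Zhao, K. et al. Nat. Commun. 2, 467 (2011); Henikoff, S. and Comai, L. Annu. Rev. Plant Biol. 54, 375–401 (2003)). Genetic engineering of single-nucleotide polymorphisms in plants represents a major advance in molecular breeding (Voytas, D.F. and Gao, C. PLoS Biol. 12, e1001877 (2014); Gao, C. Nat. Rev. Mol. Cell Biol. 19, 275-276 (2018)).
The recently developed base editor (BE) technology enables single-nucleotide genome modification in many species, including plants, without introducing DNA double-strand breaks (DSBs), exogenous donor DNA templates, or excess insertions and deletions (Hess, G.T. et al. Mol. Cell 68, 26-43 (2017); Yang, B. et al. J. Genet. Genomics 44, 423-437 (2017)). The technology complements homology-directed repair (HDR) and circumvents some of its limitations. The most widely used cytidine base editor, BE3, is a fusion of the cytidine deaminase APOBEC1 with a Cas9 nickase (nCas9(D10A)) and the uracil glycosylase inhibitor UGI (Komor, A.C. et al. Nature 533, 420–424 (2016)), and it directly introduces C-to-T point mutations at genomic DNA targets.
BE3 has been modified to broaden its PAM compatibility and to improve its editing efficiency and specificity (Kim, Y.B. et al. Nat. Biotechnol. 35, 371-376 (2017); Komor, A.C. et al. Sci. Adv. 3, eaao4774 (2017); Kim, K. et al. Nat. Biotechnol. 35, 435-437 (2017); Rees, H.A. et al. Nat. Commun. 8, 15790 (2017); Gehrke, J.M. et al. bioRxiv 273938. doi:10.1101/273938 (2018); St Martin, A. et al. Nucleic Acids Res. 9. doi:10.1093/nar/gky332 (2018)). Despite these advances, current BE3 editors are limited by a narrow deamination window of about five base pairs, which makes them inefficient at some target sites; efficiency generally drops when the target C lies far from position 7. In addition, BE3 strongly prefers TC dinucleotides, and its editing activity on GC dinucleotides is markedly reduced or even undetectable. Both limitations hinder precise and diversified mutagenesis, so base editor technology needs further improvement.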
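Summary of the Invention
The present invention includes a novel base editor system, A3A-PBE, which efficiently introduces C-to-T substitutions at a wide range of endogenous genomic sites within a deamination window spanning 17 bp. A3A-PBE works efficiently in GC-rich contexts and in highly methylated regions, and generates diversified mutations in both coding and non-coding regions. This makes the A3A-PBE base editing system an attractive new tool for producing valuable precise and diversified mutants in plant breeding, and it helps raise the efficiency of crop improvement through genome engineering.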
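Brief Description of the Drawings
Figure 1: Comparison of the C-to-T base editing efficiency of A3A-PBE and PBE. a: Range of cytosine bases edited by A3A-PBE. b: Schematic of the three cytosine base editor constructs.
Figure 2: Comparison of the C-to-T base editing efficiency of A3A-PBE and PBE. a: Flow cytometry plots of BFP-to-GFP conversion in rice using the three cytosine base editors. Protoplasts were transformed with each cytosine base editor together with pUbi-BFPm and pOsU3-BFP-sgRNA. GFP and untreated protoplast samples served as controls. Scale bar, 150 μm. b: Frequency (%) of C-to-T substitution in the target region of the BFP coding sequence, measured by flow cytometry (FCM). Data are from three independent biological replicates; all values are mean ± standard error. ****P<0.0001.
Figure 3: Comparison of the C-to-T base editing efficiency of A3A-PBE and PBE. a: Frequencies of targeted single C-to-T substitutions introduced by PBE, A3A-PBE and A3A-Gam at 4 target sites in wheat protoplasts. b: Frequencies of targeted single C-to-T substitutions introduced by PBE, A3A-PBE and A3A-Gam at 6 target sites in rice protoplasts. c: Frequencies of targeted single C-to-T substitutions introduced by PBE and A3A-PBE at 10 target sites in potato protoplasts. Untreated protoplast samples served as controls. Data are from three independent biological replicates (n=3); each frequency is given as mean ± standard error.
Figures 4 and 5: Product purity of cytosine base editing at wheat genomic loci. Product distributions and indel frequencies are shown for four representative wheat genomic DNA sites in wheat protoplasts treated with PBE, A3A-PBE and A3A-Gam. A total of 19,000-140,000 sequencing reads were used per site.
Figures 6, 7 and 8: Product purity of cytosine base editing at rice genomic loci. Product distributions and indel frequencies are shown for six representative rice genomic DNA sites in rice protoplasts treated with PBE, A3A-PBE and A3A-Gam. A total of 25,000-131,000 sequencing reads were used per site.
Figure 9: Indel frequencies at ten target sites in the wheat and rice genomes. Indel frequencies induced by PBE, A3A-PBE, A3A-Gam and Cas9 were measured. Data are from three independent biological replicates (n=3); each frequency is given as mean ± standard error.
Figure 10: Comparison of the C-to-T base editing efficiency of the A3A-PBE and PBE base editors in potato protoplasts. (a) Schematic of the two cytosine base editors and the sgRNA vector. (b) sgRNA sequences targeting StALS and StGBSS. C bases in the deamination window are highlighted in blue. PAM sequences are shown in red. (c) Indel frequencies at ten potato target sites, induced by PBE, A3A-PBE and Cas9 with the corresponding sgRNAs. Data are from three independent biological replicates (n=3); each frequency is given as mean ± standard error.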
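Figure 11: Broad applicability of A3A-PBE in C-to-T base editing. a: Comparison of C-to-T substitution efficiency in high-GC contexts using the A3A-PBE and PBE base editors. b: Effect of sequence context on base editing efficiency with PBE (window 3-9) and A3A-PBE (window 1-17). Frequencies (mean ± standard error) were calculated from the data in Figures 3a-b and 11a. c: Frequencies of targeted single C-to-T substitutions introduced by A3A-PBE in cis-elements of the TaVRN1-A1 promoter.
Figure 12: Broad applicability of A3A-PBE in C-to-T base editing. a: Mutation frequencies induced by A3A-PBE in T0 wheat, rice and potato. b: Amino acid substitutions in TaALS confer herbicide resistance. Amino acid sequence alignment of wild-type (WT) TaALS and the TaALS of mutant T0-7. Phenotype of T0-7 after three weeks of growth on regeneration medium supplemented with 0.254 ppm nicosulfuron. Scale bar, 1 cm.
Figure 13: Identification and analysis of wheat seedlings carrying A3A-PBE-targeted C-to-T substitutions. (a) sgRNA sequence targeting a conserved region in an exon of the TaALS homoeologs. C bases in the deamination window are highlighted in red. The protospacer-adjacent motif (PAM) sequence is shown in bold, and the EcoO109I restriction site is underlined. (b) PCR-RE analysis of 10 representative taals mutants. Lanes T0-1 to T0-10 show PCR fragments amplified from independent wheat plants after digestion with EcoO109I. Lanes labelled WT/D and WT/U show PCR fragments amplified from wild-type (WT) plants with and without EcoO109I digestion, respectively. Bands marked by arrows indicate positive base editing.
Figure 14: Constructs used for base editing of TaALS and TaMTL, and detection of transgene integration in the resulting T0 mutants. (a) Maps of the A3A-PBE and pTaU6-sgRNA vectors used for base editing of TaALS and TaMTL. The positions of the 5 primer pairs (F1/R1, F2/R2, F3/R3, F4/R4 and F5/R5) used to detect transgene integration are shown. (b) Results of the transgene integration assay using the 5 primer pairs on 10 representative taals mutant plants and 10 tamtl mutants. In four TaALS mutants (T0-3, T0-5, T0-6 and T0-7) and six TaMTL mutants (T0-1, T0-2, T0-3, T0-5, T0-6 and T0-9), none of the 5 primer pairs gave the expected PCR product, indicating that these plants are transgene-free. Genomic DNA extracted from wild-type wheat (cv. Kenong 199) served as the negative control, and A3A-PBE or pTaU6-sgRNA plasmid DNA served as the positive control.
Figure 15: SDS-PAGE analysis of purified A3A-PBE-ΔUGI protein. 3 μg of purified protein was separated by 10% SDS-PAGE and visualized by Coomassie blue staining.
Figure 16: Broad applicability of A3A-PBE in C-to-T base editing. a: Comparison of C-to-T base editing efficiency using A3A-PBE-ΔUGI (DNA) and A3A-PBE-ΔUGI (RNP). Untreated protoplast samples served as controls. Data are from three independent biological replicates (n=3); each frequency is given as the mean. b: Bioinformatic analysis of the range of Cs (NGG PAM) or Gs (CCN PAM) targetable by PBE and A3A-PBE in the rice genome. Combining PBE or A3A-PBE with different Cas9 variants (VQR, EQR, VRER, SaCas9 and SaKKH) significantly increases the range of Cs or Gs targetable by base editing in the rice genome.
Figure 17: Vector construction of the Cpf1-based A3A base editor.
Figure 18: Base editing of an endogenous rice gene with the Cpf1-based A3A base editor.
Figure 19: Base editing efficiency of constructs comprising an A3A mutant (N57G substitution).
Figure 20: Effect of the NLS on base editing efficiency.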
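Detailed Description of the Invention
I. Definitions
In the present invention, unless otherwise stated, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology are terms and routine procedures widely used in the respective fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). To aid understanding of the invention, definitions and explanations of relevant terms are provided below.
As used herein, the term "CRISPR effector protein" generally refers to a nuclease present in a naturally occurring CRISPR system, as well as modified forms, variants and catalytically active fragments thereof. The term covers any effector protein based on a CRISPR system that can achieve gene targeting (for example gene editing or targeted gene regulation) in a cell.
Examples of a "CRISPR effector protein" include Cas9 nuclease or variants thereof. The Cas9 nuclease may come from different species, for example spCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 derived from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein and refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (for example a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA-binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated system) genome editing system; guided by a guide RNA, it targets and cleaves a DNA target sequence to form a DNA double-strand break (DSB).
Examples of a "CRISPR effector protein" may also include Cpf1 nuclease or variants thereof, for example high-specificity variants. The Cpf1 nuclease may come from different species, for example the Cpf1 nucleases from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.
As used herein, "gRNA" and "guide RNA" are used interchangeably and refer to an RNA molecule that can form a complex with a CRISPR effector protein and, owing to a degree of complementarity with a target sequence, direct that complex to the target sequence. For example, in Cas9-based gene editing systems the gRNA usually consists of a crRNA and a tracrRNA that form a complex through partial complementarity, where the crRNA contains a sequence sufficiently complementary to the target sequence to hybridize with it and to direct sequence-specific binding of the CRISPR complex (Cas9+crRNA+tracrRNA) to the target sequence. It is known in the art, however, that a single guide RNA (sgRNA) can be designed that combines the features of crRNA and tracrRNA. In Cpf1-based genome editing systems, the gRNA usually consists of a mature crRNA molecule only, where the crRNA contains a sequence sufficiently identical to the target sequence to hybridize with the complement of the target sequence and to direct sequence-specific binding of the complex (Cpf1+crRNA) to the target sequence. Designing a suitable gRNA sequence for the CRISPR effector protein used and the target sequence to be edited is within the ability of those skilled in the art.
"Genome", when applied to plant cells, covers not only the chromosomal DNA present in the nucleus but also the organelle DNA present in subcellular components of the cell (such as mitochondria and plastids).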
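As used herein, the term "plant" includes a whole plant and any progeny, cells, tissues or parts of a plant. The term "plant part" includes any part of a plant, including, for example and without limitation: seeds (including mature seeds, immature embryos without seed coats, and immature seeds); plant cuttings; plant cells; plant cell cultures; and plant organs (for example pollen, embryos, flowers, fruits, shoots, leaves, roots, stems and related explants). A plant tissue or plant organ may be a seed, callus, or any other population of plant cells organized into a structural or functional unit. A plant cell or tissue culture may be able to regenerate a plant having the physiological and morphological characteristics of the plant from which the cell or tissue was derived, and a plant having substantially the same genotype as that plant. In contrast, some plant cells cannot regenerate into plants. Regenerable cells in a plant cell or tissue culture may be embryos, protoplasts, meristematic cells, callus, pollen, leaves, anthers, roots, root tips, silks, flowers, kernels, ears, cobs, husks or stalks.
Plant parts include harvestable parts and parts that can be used to propagate progeny plants. Plant parts usable for propagation include, for example and without limitation: seeds; fruits; cuttings; seedlings; tubers; and rootstocks. A harvestable part of a plant may be any useful part of the plant, including, for example and without limitation: flowers; pollen; seedlings; tubers; leaves; stems; fruits; seeds; and roots.
Plant cells are the structural and physiological units of plants. As used herein, plant cells include protoplasts and protoplasts with partial cell walls. A plant cell may be an isolated single cell or a cell aggregate (for example friable callus and cultured cells), and may be part of a higher-order organized unit (for example a plant tissue, plant organ or plant). Thus, a plant cell may be a protoplast, a gamete-producing cell, or a cell or collection of cells that can regenerate into a whole plant. Accordingly, in embodiments herein, a seed, which comprises multiple plant cells and can regenerate into a whole plant, is considered a "plant part".
As used herein, the term "protoplast" refers to a plant cell whose cell wall has been completely or partially removed and whose lipid bilayer membrane is exposed. Typically, a protoplast is an isolated plant cell without a cell wall that has the potential to regenerate into a cell culture or a whole plant.
"Progeny" of a plant includes any subsequent generation of the plant.
A "genetically modified plant" includes a plant that comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, an exogenous polynucleotide can be stably integrated into the genome and inherited over successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one in which the sequence in the plant genome comprises single or multiple deoxynucleotide substitutions, deletions and additions. For example, a genetically modified plant obtained by the present invention may contain one or more C-to-T substitutions relative to the wild-type plant (the corresponding plant without the genetic modification).
"Exogenous", with respect to a sequence, means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been significantly altered from its native form by deliberate human intervention.
"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and denote a single- or double-stranded RNA or DNA polymer that optionally contains synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter names as follows: "A" is adenosine or deoxyadenosine (for RNA or DNA respectively), "C" is cytidine or deoxycytidine, "G" is guanosine or deoxyguanosine, "U" is uridine, "T" is deoxythymidine, "R" is a purine (A or G), "Y" is a pyrimidine (C or T), "K" is G or T, "H" is A or C or T, "I" is inosine, and "N" is any nucleotide.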
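The single-letter codes defined above are what make PAM notations such as NGG or NNGRRT machine-readable. The following is a minimal Python sketch, not part of the patent, that turns such a pattern into a regular expression; the names `IUPAC`, `iupac_to_regex` and `matches` are hypothetical helpers introduced only for illustration, and inosine ("I") is left out because it is a modified base rather than an ambiguity code.

```python
import re

# Ambiguity codes listed in the definition above (assumption: DNA alphabet,
# with U treated as T; inosine is deliberately not handled).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def iupac_to_regex(pattern: str) -> str:
    """Convert an IUPAC pattern such as 'NGG' into a regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def matches(pattern: str, sequence: str) -> bool:
    """Return True if the whole sequence is an instance of the pattern."""
    return re.fullmatch(iupac_to_regex(pattern), sequence.upper()) is not None


print(iupac_to_regex("NGG"))
print(matches("NNGRRT", "ACGAAT"))
```

A dictionary lookup keeps the mapping explicit, so an unsupported code raises a `KeyError` instead of silently matching nothing.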
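"Polypeptide", "peptide" and "protein" are used interchangeably in the present invention and refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
As used in the present invention, an "expression construct" is a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in a plant. "Expression" means the production of a functional product. For example, expression of a nucleotide sequence may refer to its transcription (for example into mRNA or functional RNA) and/or to translation of RNA into a precursor or mature protein.
The "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid or a viral vector or, in some embodiments, a translatable RNA (such as mRNA).
The "expression construct" of the present invention may comprise regulatory sequences and a nucleotide sequence of interest from different sources, or regulatory sequences and a nucleotide sequence of interest from the same source but arranged differently from their natural arrangement.
"Regulatory sequence" and "regulatory element" are used interchangeably and refer to a nucleotide sequence located upstream of (5' non-coding sequence), within, or downstream of (3' non-coding sequence) a coding sequence that influences the transcription, RNA processing or stability, or translation of the associated coding sequence. A plant expression regulatory element is a nucleotide sequence that can control the transcription, RNA processing or stability, or translation of a nucleotide sequence of interest in a plant.
Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.
A "promoter" is a nucleic acid fragment that can control the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is one that can control gene transcription in plant cells, whether or not it is derived from a plant cell. The promoter may be a constitutive, tissue-specific, developmentally regulated or inducible promoter.
A "constitutive promoter" is a promoter that generally causes a gene to be expressed in most cell types under most conditions. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed mainly, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in one particular cell or cell type. A "developmentally regulated promoter" is a promoter whose activity is determined by developmental events. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signals and the like).
As used herein, the term "operably linked" means that a regulatory element (for example, but not limited to, a promoter sequence or a transcription termination sequence) is linked to a nucleic acid sequence (for example a coding sequence or open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.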
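"Introducing" a nucleic acid molecule (for example a plasmid, linear nucleic acid fragment or RNA) or a protein into a plant means transforming plant cells with the nucleic acid or protein so that it can function in the plant cell. "Transformation" as used in the present invention includes stable transformation and transient transformation.
"Stable transformation" means introducing an exogenous nucleotide sequence into the plant genome so that the exogenous gene is stably inherited. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the plant and of any successive generation.
"Transient transformation" means introducing a nucleic acid molecule or protein into a plant cell where it performs its function without stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the plant genome.
A "trait" is a physiological, morphological, biochemical or physical characteristic of a plant or of particular plant material or cells. In some embodiments, these characteristics may be visible to the naked eye, such as seed or plant size; measurable by biochemical techniques, such as the protein, starch or oil content of seeds or leaves; observable metabolic or physiological processes, such as measured resistance to water stress or to particular salt, sugar or nitrogen concentrations; detectable gene expression levels; or observable agronomic traits such as osmotic stress resistance or yield. In some embodiments, traits also include herbicide resistance of the plant.
An "agronomic trait" is a measurable parameter, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content of vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content of vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content of vegetative tissue, drought resistance, nitrogen uptake, root lodging, harvest index, stalk lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance and tiller number.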
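II. Base Editing System
First, the present invention provides a base-editing fusion protein comprising a nuclease-inactivated CRISPR effector protein (such as Cas9 or Cpf1) and an APOBEC3A deaminase. In some embodiments, the base-editing fusion protein comprises an amino acid sequence selected from SEQ ID NO:12-16.
The inventors surprisingly found that a base editor formed by fusing a nuclease-inactivated CRISPR effector protein with an APOBEC3A deaminase can efficiently introduce C-to-T substitutions at a wide range of endogenous plant genomic sites, even sites in high-GC contexts, within a deamination window spanning 17 bp. In the embodiments herein, "base-editing fusion protein" and "base editor" are used interchangeably.
The present invention also provides the use of the base-editing fusion protein for base editing of a target sequence in the genome of a cell.
The present invention also provides a system for base editing of a target sequence in the genome of a cell, comprising at least one of the following i) to v):
i) a base-editing fusion protein, and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;
iii) a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein and a nucleotide sequence encoding a guide RNA;
wherein the base-editing fusion protein comprises a nuclease-inactivated CRISPR effector protein (such as Cas9 or Cpf1) and an APOBEC3A deaminase, and the guide RNA can direct the base-editing fusion protein to a target sequence in the genome of the cell, so that the base-editing fusion protein causes one or more Cs in the target sequence to be replaced by T.
In some embodiments of the various aspects of the invention, the APOBEC3A deaminase is human APOBEC3A deaminase. In some embodiments, the APOBEC3A deaminase comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO:2, and substantially retains the deaminase activity of SEQ ID NO:2. In some embodiments, the APOBEC3A deaminase comprises one or more, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, amino acid substitutions, deletions or additions relative to SEQ ID NO:2, and substantially retains the deaminase activity of SEQ ID NO:2. In some embodiments, the human APOBEC3A deaminase comprises the amino acid sequence shown in SEQ ID NO:2. In some embodiments, the APOBEC3A deaminase comprises an amino acid substitution at position 57 relative to SEQ ID NO:2, for example an N57G substitution.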
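The substitution notation used above (N57G: asparagine at position 57 replaced by glycine) can be applied programmatically. The sketch below is illustrative only: `apply_substitution` is a hypothetical helper, and `A3A_N_TERM` holds just the first 64 residues of SEQ ID NO:2, transcribed from the sequence listing into one-letter code, which is enough to reach position 57.

```python
import re

# First 64 residues of SEQ ID NO:2 (human APOBEC3A) in one-letter code,
# transcribed from the three-letter sequence listing (truncated on purpose).
A3A_N_TERM = "MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLC"


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution written as <wild type><position><new residue>."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = position - 1  # the notation is 1-based
    if index < 0 or index >= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


mutant = apply_substitution(A3A_N_TERM, "N57G")
print(A3A_N_TERM[50:60], "->", mutant[50:60])
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are common when a sequence carries an extra tag or lacks the initiator methionine.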
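As used in the present invention, a "nuclease-inactivated CRISPR effector protein" is a CRISPR effector protein that has lost double-stranded nucleic acid cleavage activity but retains gRNA-guided DNA targeting ability. CRISPR effector proteins lacking double-stranded cleavage activity also include nickases, which nick a double-stranded nucleic acid molecule without completely cutting both strands.
In some preferred embodiments of the invention, the nuclease-inactivated CRISPR effector protein has nickase activity. Without being bound by any theory, eukaryotic mismatch repair is thought to use a nick in a DNA strand to direct removal and repair of the mismatched base on that strand. The U:G mismatch produced by cytidine deaminase may be repaired back to C:G. Introducing a nick on the strand containing the unedited G makes it possible to preferentially repair the U:G mismatch to the desired U:A or T:A.
In some embodiments, the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cas9. The DNA cleavage domain of Cas9 nuclease is known to comprise two subdomains: the HNH nuclease subdomain and the RuvC subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC subdomain cleaves the non-complementary strand. Mutations in these subdomains can inactivate the nuclease activity of Cas9 to form a "nuclease-inactivated Cas9". The nuclease-inactivated Cas9 still retains gRNA-guided DNA binding ability. In principle, therefore, when fused to another protein, a nuclease-inactivated Cas9 can direct that protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
The nuclease-inactivated Cas9 of the present invention may be derived from the Cas9 of different species, for example from Streptococcus pyogenes (S. pyogenes) Cas9 (SpCas9) or from Staphylococcus aureus (S. aureus) Cas9 (SaCas9). Mutating both the HNH nuclease subdomain and the RuvC subdomain of Cas9 (for example with the mutations D10A and H840A) abolishes the nuclease activity of Cas9 and gives nuclease-dead Cas9 (dCas9). Inactivating only one of the subdomains by mutation gives Cas9 nickase activity, that is, a Cas9 nickase (nCas9), for example nCas9 carrying only the D10A mutation.
Accordingly, in some embodiments of the invention, the nuclease-inactivated Cas9 comprises the amino acid substitutions D10A and/or H840A relative to wild-type Cas9.
In some specific embodiments of the invention, the nuclease-inactivated Cas9 may also comprise additional mutations. For example, a nuclease-inactivated SpCas9 may also comprise the EQR, VQR or VRER mutations, and SaCas9 may also comprise the KKH mutations (Kim et al. Nat. Biotechnol. 35, 371-376).
In some specific embodiments of the invention, the nuclease-inactivated SpCas9 comprises the amino acid sequence shown in SEQ ID NO:4.
In some embodiments, the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cpf1. Cpf1 comprises one DNA cleavage domain (RuvC); mutating it abolishes the DNA cleavage activity of Cpf1 and gives a "Cpf1 lacking DNA cleavage activity". The Cpf1 lacking DNA cleavage activity still retains gRNA-guided DNA binding ability. In principle, therefore, when fused to another protein, a Cpf1 lacking DNA cleavage activity can direct that protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
The Cpf1 lacking DNA cleavage activity of the present invention may be derived from the Cpf1 of different species, for example the Cpf1 proteins from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006, referred to as FnCpf1, AsCpf1 and LbCpf1 respectively.
In some embodiments, the Cpf1 lacking DNA cleavage activity is an FnCpf1 lacking DNA cleavage activity. In some specific embodiments, it comprises the D917A mutation relative to wild-type FnCpf1.
In some embodiments, the Cpf1 lacking DNA cleavage activity is an AsCpf1 lacking DNA cleavage activity. In some specific embodiments, it comprises the D908A mutation relative to wild-type AsCpf1.
In some embodiments, the Cpf1 lacking DNA cleavage activity is an LbCpf1 lacking DNA cleavage activity. In some specific embodiments, it comprises the D832A mutation relative to wild-type LbCpf1.
In some embodiments of the invention, the APOBEC3A deaminase is fused to the N-terminus of the nuclease-inactivated CRISPR effector protein (such as nuclease-inactivated Cas9 or Cpf1).
In some embodiments of the invention, the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein (such as nuclease-inactivated Cas9 or Cpf1) are fused through a linker. The linker may be a non-functional amino acid sequence of 1-50 (for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, or 25-50) or more amino acids with no secondary or higher-order structure. For example, the linker may be a flexible linker such as GGGGS, GS, GAP, (GGGGS)x3, GGS or (GGS)x7. Preferably, the linker is 16 amino acids long. In some preferred embodiments, the linker is the XTEN linker shown in SEQ ID NO:3.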
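In cells, uracil DNA glycosylase catalyses the removal of U from DNA and initiates base excision repair (BER), which repairs U:G back to C:G. Therefore, without being bound by any theory, including a uracil DNA glycosylase inhibitor in the base-editing fusion protein or in the system of the present invention can increase the efficiency of base editing.
Accordingly, in some embodiments of the invention, the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI). In some specific embodiments, the uracil DNA glycosylase inhibitor comprises the amino acid sequence shown in SEQ ID NO:5.
In some embodiments, the base-editing fusion protein of the present invention further comprises a Gam protein. In some embodiments, its amino acid sequence is as shown in SEQ ID NO:6.
In some embodiments of the invention, the base-editing fusion protein further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the base-editing fusion protein should be strong enough to drive accumulation of the fusion protein in the nucleus of a plant cell in an amount sufficient for its base editing function. In general, the strength of nuclear localization activity is determined by the number and position of the NLSs in the base-editing fusion protein, the particular NLS or NLSs used, or a combination of these factors.
In some embodiments of the invention, the NLS of the base-editing fusion protein may be located at the N-terminus and/or the C-terminus. In some embodiments, the NLS may be located between the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, it comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, it comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, it comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When more than one NLS is present, each may be selected independently of the others. In some embodiments of the invention, the base-editing fusion protein comprises at least 2 NLSs, for example at least 2 NLSs located at the C-terminus. In some embodiments, the NLS is located at the C-terminus of the base-editing fusion protein. In some embodiments, the base-editing fusion protein comprises at least 3 NLSs. In some embodiments, the base-editing fusion protein contains no NLS at the N-terminus and/or between the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein.
In general, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, but other types of NLS are also known. Non-limiting examples of NLSs include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3'), PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or 5'-CCAAAGAAGAAGAGGAAGGTT-3'), or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3').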
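A quick way to sanity-check pairs of NLS peptide and coding sequence such as those listed above is to translate the nucleotide sequence with the standard genetic code. The following Python sketch does this; the table construction and the `translate` helper are illustrative assumptions and are not taken from the patent.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order
# (one amino-acid letter per codon, '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Peptide / nucleotide pairs exactly as quoted in the paragraph above.
nls_examples = {
    "AAGAAGAGAAAGGTC": "KKRKV",
    "CCCAAGAAGAAGAGGAAGGTG": "PKKKRKV",
    "CCAAAGAAGAAGAGGAAGGTT": "PKKKRKV",
    "TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG": "SGGSPKKKRKV",
}

for dna, peptide in nls_examples.items():
    print(peptide, translate(dna) == peptide)
```

The two PKKKRKV coding sequences differ only in synonymous codons (CCC/CCA for proline, GTG/GTT for valine), which is why both translate to the same peptide.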
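In some embodiments of the invention, the N-terminus of the base-editing fusion protein comprises an NLS having the amino acid sequence PKKKRKV. In some embodiments of the invention, the C-terminus of the base-editing fusion protein comprises an NLS having the amino acid sequence KRPAATKKAGQAKKKK. In some embodiments of the invention, a base-editing fusion protein whose C-terminus comprises an NLS having the amino acid sequence PKKKRKV is more efficient.
In addition, depending on the location of the DNA to be edited, the base-editing fusion protein of the present invention may also include other localization sequences, for example a cytoplasmic localization sequence, a chloroplast localization sequence or a mitochondrial localization sequence.
In some specific embodiments, the base-editing fusion protein comprises an amino acid sequence selected from SEQ ID NO:12-16.
To obtain efficient expression in plants, in some embodiments of the invention the nucleotide sequence encoding the base-editing fusion protein is codon-optimized for the plant to be base-edited.
Codon optimization is a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence (for example about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently or most frequently in the genes of that host cell, while maintaining the native amino acid sequence. Different species show particular preferences for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which in turn is thought to depend on the properties of the codons being translated and on the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored for optimal expression in a given organism on the basis of codon optimization. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in various ways. See Nakamura, Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res. 28:292 (2000).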
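The core idea of codon optimization, swapping each codon for the host's most frequent synonymous codon while leaving the protein unchanged, can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions: the usage frequencies in the example are made up, `most_frequent_codons` and `codon_optimize` are hypothetical helpers, and real optimization tools also weigh factors such as GC content, repeats and restriction sites.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def most_frequent_codons(usage: dict) -> dict:
    """Pick, for each amino acid, the codon with the highest usage frequency."""
    best = {}
    for codon, freq in usage.items():
        aa = CODON_TO_AA[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def codon_optimize(cds: str, usage: dict) -> str:
    """Replace each codon by the host's preferred synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = most_frequent_codons(usage)
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Keep the original codon when the usage table has no entry for it.
        optimized.append(best.get(CODON_TO_AA[codon], codon))
    return "".join(optimized)


# Toy usage table with invented frequencies (NOT real species data).
toy_usage = {"GCC": 0.40, "GCT": 0.10, "AAG": 0.70, "AAA": 0.30}
original = "GCTAAAATG"
optimized = codon_optimize(original, toy_usage)
print(original, "->", optimized, translate(original) == translate(optimized))
```

In practice the usage dictionary would be filled from a codon usage table for the target crop, such as those in the Codon Usage Database mentioned above.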
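In some specific embodiments, the base-editing fusion protein is encoded by a nucleotide sequence selected from SEQ ID NO:7-11.
In some embodiments of the invention, the guide RNA is a single guide RNA (sgRNA). Methods for constructing a suitable sgRNA for a given target sequence are known in the art. See, for example: Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951 (2014); Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013); Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J. Genet. Genomics 41, 63–68 (2014). In some preferred embodiments of the invention, the guide RNA is an esgRNA. For construction of the esgRNA, see Li, C. et al. Genome Biol. 19, 59 (2018).
In some embodiments of the invention, the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to a plant expression regulatory element such as a promoter.
Examples of promoters usable in the present invention include, but are not limited to: the cauliflower mosaic virus 35S promoter (Odell et al. (1985) Nature 313:810-812), the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter, the rice actin promoter, the TrpPro5 promoter (U.S. patent application No. 10/377,318; filed March 16, 2005), the pEMU promoter (Last et al. (1991) Theor. Appl. Genet. 81:581-588), the MAS promoter (Velten et al. (1984) EMBO J. 3:2723-2730), the maize H3 histone promoter (Lepetit et al. (1992) Mol. Gen. Genet. 231:276-285 and Atanassova et al. (1992) Plant J. 2(3):291-300) and the Brassica napus ALS3 promoter (PCT application WO 97/41228). Promoters usable in the present invention also include the commonly used tissue-specific promoters reviewed in Moore et al. (2006) Plant J. 45(4):651-683.
Precise sgRNA molecules usable in the present invention can be produced with the aid of tRNA self-cleavage (Zhang et al. (2017) Genome Biology 18:191).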
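III. Method for Producing a Genetically Modified Organism
In another aspect, the present invention provides a method for producing a genetically modified organism, comprising introducing the system of the present invention for base editing of a target sequence in the genome of a cell into a cell of the organism, whereby the guide RNA directs the base-editing fusion protein to the target sequence in the genome of the cell, causing one or more Cs in the target sequence to be replaced by T. In some preferred embodiments, the organism is a plant.
Designing target sequences that can be recognized and targeted by the complex of Cas9 and guide RNA is within the skill of those of ordinary skill in the art. For the design of target sequences or crRNA coding sequences that can be recognized and targeted by the complex of a Cpf1 protein and guide RNA (i.e. crRNA), see for example Zhang et al., Cell 163, 1-13, October 22, 2015. In general, the target sequence is a sequence complementary to the guide sequence of about 20 nucleotides contained in the guide RNA, and its 3' end is immediately adjacent to the protospacer adjacent motif (PAM) NGG.
For example, in some embodiments of the invention, the target sequence has the structure 5'-NX-NGG-3', where each N is independently selected from A, G, C and T; X is an integer with 14≤X≤30; NX denotes X consecutive nucleotides; and NGG is the PAM sequence. In some preferred embodiments of the invention, X is 20. In some embodiments, the base editing window lies at positions 1-17 of the target sequence. That is, the system of the present invention can cause one or more Cs within positions 1-17, counted from the 5' end of the target sequence, to be replaced by T.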
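The target structure described above (X nucleotides followed by an NGG PAM, with an editing window at positions 1-17) lends itself to a small search routine. The Python sketch below is a simplified illustration, not the method used in the patent: it scans only the given strand, the function names are hypothetical, and `edit_all` models an idealised outcome in which every C inside the window is converted.

```python
import re


def find_targets(sequence: str, protospacer_len: int = 20):
    """Return (start, protospacer, PAM) for every N{X}-NGG site on this strand."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def editable_cytosines(protospacer: str, window=(1, 17)):
    """1-based positions of C within the editing window, counted from the 5' end."""
    first, last = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if base == "C" and first <= pos <= last
    ]


def edit_all(protospacer: str, window=(1, 17)) -> str:
    """Idealised product: every C inside the window becomes T."""
    positions = set(editable_cytosines(protospacer, window))
    return "".join(
        "T" if pos in positions else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


# Made-up 23-nt example: 20-nt protospacer followed by the PAM TGG.
example = "ACCGTTACGATCGATTAGCCTGG"
for start, protospacer, pam in find_targets(example):
    print(start, protospacer, pam)
    print("window 1-17:", editable_cytosines(protospacer, (1, 17)))
    print("window 3-9: ", editable_cytosines(protospacer, (3, 9)))
    print("edited:     ", edit_all(protospacer))
```

Passing `window=(3, 9)` reproduces the narrower window attributed to PBE elsewhere in this document, which makes the difference in reachable cytosines easy to compare for a given site.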
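In some embodiments, the method of the present invention further comprises screening for organisms, such as plants, that carry the desired nucleotide substitution. Nucleotide substitutions in organisms such as plants can be detected by T7EI, PCR/RE or sequencing methods; see, for example, Shan, Q., Wang, Y., Li, J. & Gao, C. Genome editing in rice and wheat using the CRISPR/Cas system. Nat. Protoc. 9, 2395-2410 (2014).
In the present invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, so as to modify the function of the gene or its expression.
C-to-T base editing in the target sequence of the cell can be detected by T7EI, PCR/RE or sequencing methods.
In the method of the present invention, the base editing system can be introduced into cells by various methods well known to those skilled in the art. Methods that can be used to introduce the genome editing system of the present invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), particle bombardment, PEG-mediated protoplast transformation and Agrobacterium-mediated transformation.
Cells whose genomes can be edited by the method of the present invention may come from, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, for example rice, maize, wheat, sorghum, barley, soybean, peanut and Arabidopsis.
The method of the present invention is particularly suitable for producing genetically modified plants, for example crop plants. In the method for producing a genetically modified plant, the base editing system can be introduced into the plant by various methods well known to those skilled in the art. Methods that can be used include, but are not limited to: particle bombardment, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method and the ovary injection method. Preferably, the base editing system is introduced into the plant by transient transformation.
In the method of the present invention, introducing or producing the base-editing fusion protein and the guide RNA in plant cells is sufficient to modify the target sequence, and the modification can be stably inherited without stably transforming the plant with the base editing system. This avoids potential off-target effects of a stably present base editing system and also avoids integration of exogenous nucleotide sequences into the plant genome, giving greater biosafety.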
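In some preferred embodiments, the introduction is carried out in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.
In some embodiments, the introduction comprises transforming isolated plant cells or tissues with the base editing system of the present invention and then regenerating the transformed plant cells or tissues into whole plants. Preferably, the regeneration is carried out in the absence of selection pressure, that is, without using any selective agent against the selection gene carried on the expression vector during tissue culture. Omitting the selective agent can improve the regeneration efficiency of the plant and yields modified plants free of exogenous nucleotide sequences.
In other embodiments, the base editing system of the present invention can be delivered to specific parts of an intact plant, for example leaves, shoot tips, pollen tubes, young ears or hypocotyls. This is particularly suitable for transforming plants that are difficult to regenerate by tissue culture.
In some embodiments of the invention, in vitro expressed protein and/or in vitro transcribed RNA molecules are delivered directly into the plant. The protein and/or RNA molecules can carry out base editing in plant cells and are subsequently degraded by the cells, avoiding integration of exogenous nucleotide sequences into the plant genome.
Accordingly, in some embodiments, genetic modification and breeding of plants using the method of the present invention can yield plants with no exogenous DNA integration, that is, transgene-free modified plants. In addition, the base editing system of the present invention shows high specificity (a low off-target rate) when editing bases in plants, which also improves biosafety.
Plants that can be base-edited by the method of the present invention include monocots and dicots. For example, the plant may be a crop plant such as wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.
In some embodiments of the invention, the target sequence is associated with a plant trait such as an agronomic trait, so that the base editing gives the plant an altered trait relative to the wild-type plant. In the present invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, so as to modify the function of the gene or its expression. Accordingly, in some embodiments of the invention, the C-to-T substitution results in an amino acid substitution in the target protein. In other embodiments of the invention, the C-to-T substitution results in a change in expression of the target gene.
In some embodiments of the invention, the method further comprises obtaining progeny of the genetically modified plant.
In another aspect, the present invention also provides a genetically modified plant or progeny or part thereof, wherein the plant is obtained by the method of the present invention described above. In some embodiments, the genetically modified plant or progeny or part thereof is transgene-free.
In another aspect, the present invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the method of the present invention described above with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.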
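Examples
To aid understanding of the present invention, it is described more fully below with reference to specific examples and the accompanying drawings, which show preferred embodiments of the invention. The invention can, however, be implemented in many different forms and is not limited to the examples described herein. Rather, these examples are provided so that the disclosure of the invention will be understood more thoroughly and completely.
The protoplasts used in the present invention were derived from the winter wheat cultivar Kenong199, the japonica rice cultivar Zhonghua11 and the potato cultivar Désirée.
Example 1: Optimization of the PBE system and verification of its editing efficiency
rAPOBEC1 in the plant nCas9-PBE system (hereinafter PBE) (Zong, Y. et al. Nat. Biotechnol. 35, 438-440 (2017)) was replaced with human APOBEC3A (hereinafter A3A), codon-optimized for cereal plants (Figure 1b), to give A3A-PBE.
UGI and the Mu Gam protein were added to A3A-PBE to produce A3A-Gam (Figure 1b), with the aim of increasing base editing efficiency and product purity (Komor, A.C. et al. Sci. Adv. 3, eaao4774 (2017)).
The base editing activity of these constructs was characterized with a previously described reporter system in which BFP is converted to GFP when C4 of the BFP-sgRNA target sequence is changed to T4 (Zong, Y. et al. Nat. Biotechnol. 35, 438-440 (2017)). Each plant base editor construct (PBE, A3A-PBE and A3A-Gam) was co-transfected with pUbi-BFPm and pOsU3-BFP-sgRNA into rice protoplasts by PEG-mediated transformation.
Flow cytometry (FCM) analysis showed that A3A-PBE produced the highest proportion of GFP-expressing cells, at a frequency of 24.5%, about 12-fold higher than PBE (Figure 2a-b). The editing efficiency of A3A-Gam was lower than that of A3A-PBE but higher than that of PBE.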
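Example 2: Mutation efficiency and editing window of A3A-PBE in wheat and rice cells
To further test how well A3A-PBE edits endogenous genes, four sgRNAs were designed for three wheat genes (TaALS, TaMTL, TaLOX2-T1 and TaLOX2-T2), and one sgRNA each for six rice genes (OsAAT-T1, OsCDC48, OsDEP1, OsPDS, OsNRT1.1B-T1, OsOD and OsEV) (Figure 3a-b and Table 1). As a control, wild-type Cas9 (WT Cas9) was used to generate deletion and/or insertion mutations (indels).
Table 1. Description of sgRNA target sites and sequences
[Table 1 appears as images in the original publication: Figure BDA0002914838020000141, Figure BDA0002914838020000151]
Note: underlined C/G bases are the bases edited by PBE, A3A-PBE and A3A-Gam. The PAM motif in each target sequence is shown in bold.
C-to-T base editing of each gene in protoplasts was assessed by next-generation sequencing (NGS), with 100,000-270,000 reads obtained per locus. A3A-PBE showed the highest editing efficiency, with editing frequencies of 0.3-36.9% in wheat and 0.5-31.1% in rice (Figure 3a-b). The mean editing efficiency of A3A-PBE across the 10 target sites was 13.1%, about 13-fold and 5-fold higher than the mean efficiencies of PBE (1%) and A3A-Gam (2.8%), respectively. Base editing efficiency at these target sites increased in the order PBE < A3A-Gam < A3A-PBE, consistent with the results from the reporter system (Figure 2a-b).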
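The fold improvements quoted above follow directly from the three mean efficiencies. As a small worked check in Python (the dictionary and the `fold_change` helper are illustrative names, and only the three percentages stated in the text are used):

```python
# Mean C-to-T editing efficiencies (%) across the 10 wheat and rice target
# sites, as stated in the paragraph above.
mean_efficiency = {"PBE": 1.0, "A3A-Gam": 2.8, "A3A-PBE": 13.1}


def fold_change(editor: str, baseline: str) -> float:
    """Ratio of the mean efficiency of one editor to that of a baseline editor."""
    return mean_efficiency[editor] / mean_efficiency[baseline]


for baseline in ("PBE", "A3A-Gam"):
    ratio = fold_change("A3A-PBE", baseline)
    print(f"A3A-PBE vs {baseline}: {ratio:.1f}-fold")
```

The ratios come out at roughly 13.1 and 4.7, which the text rounds to 13-fold and 5-fold.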
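Analysis of editing efficiency by position across the 10 tested sites showed that, in most cases, the active deamination window of A3A-PBE spans about 17 nucleotides, from protospacer positions 1 to 17, which is wider than the editing window previously reported for PBE in plant systems (positions 3 to 9) (Figure 3a-b).
Because most of the targeted Cs lie outside protospacer positions 3-9, this means that the targeting range of A3A-PBE is increased and that the constraint imposed by the PAM requirement can be overcome to some extent. In addition, A3A-PBE, like the other two constructs, did not induce unintended edits (<0.1%) at any wheat or rice genomic target locus, and its indel frequency (<0.1%) was markedly lower than that of wild-type Cas9 (WT Cas9) (2.2-21.6%) (Figures 4-9).
Example 3: Mutation efficiency and editing window of A3A-PBE in tetraploid potato
Tetraploid inheritance makes research on potato, and breeding by conventional crossing, a challenge (Obidiegwu, J.E., Flath, K. and Gebhardt, C. Theor. Appl. Genet. 127, 763-780 (2014)). In this example, A3A-PBE was applied in tetraploid potato (Solanum tuberosum). The 35S promoter was used to drive the A3A-PBE and PBE fusion proteins, and the AtU6 promoter was used to drive the sgRNA (Figure 10a). To target two endogenous potato genes, StALS (StALS-T1 to StALS-T4) and StGBSS (StGBSS-T1 to StGBSS-T6), four and six sgRNAs were designed, respectively (Figure 3c, Figure 10b and Table 1).
The sgRNAs were co-transformed with the A3A-PBE or PBE construct into potato protoplasts, and mutations induced by base editing were assayed 48 hours after transfection. The mean editing efficiency of PBE across these 10 target sites was 0.4% (Figure 3c). The mean C-to-T conversion efficiency of A3A-PBE across the same 10 target sites was 4.3%, about 11-fold higher than that of PBE.
C-to-T conversion was observed at all 10 target sites edited by A3A-PBE, and effective editing spanned positions 1 to 17 within the protospacer (Figure 3c), consistent with the results in wheat and rice cells (Figure 3a-b).
Likewise, indels induced by A3A-PBE (<0.1%) were greatly reduced compared with WT Cas9 (6.2-34.5%) (Figure 10).
This is the first demonstration that gene editing by cytidine deamination can be used to target the potato genome, which paves the way for broad use of A3A-PBE in dicotyledonous plants.
Taken together, these results show that A3A-PBE provides higher C-to-T mutation efficiency and a wider editing window than PBE at multiple loci in wheat, rice and potato cells.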
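Example 4: Testing the A3A-PBE fusion at high-GC sites within endogenous plant genes
Seven different sgRNAs were designed for three wheat genes and three rice genes (TaHPPD, TaDEP1, TaLOX2-T3, TaLOX2-T4, OsHPPD, OsAAT-T2 and OsNRT1.1B-T2) (Figure 11a, Table 1), and the editing activities of A3A-PBE and PBE were compared directly. This example shows that the A3A-PBE fusion has no obvious bias against a target C immediately downstream of a G (Komor, A.C. et al. Nature 533, 420–424 (2016)). At these seven target sites, in high-GC contexts, the editing efficiency of A3A-PBE reached up to 41.2% (Figure 11a).
In contrast, almost no C-to-T edited cells were observed with PBE at any of the target sites (<0.2%), an efficiency about 50-fold lower than that of A3A-PBE base editing. A3A-PBE is therefore more advantageous for targeted mutagenesis, given the large number of sequences containing 5'-GC-3' in plant genomes. Overall, A3A-PBE edits cytidines almost equally well regardless of sequence context, which is an advantage over PBE (Figure 11b). Because the requirements on the sequence flanking the target cytosine are relaxed, the technology improves the targeting range and is more favourable for generating point mutations.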
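Sequence-context preferences such as the TC versus GC bias discussed above are usually assessed by looking at the base immediately 5' of each cytosine. The following Python sketch tabulates those dinucleotide contexts for a protospacer; `cytosine_contexts` is a hypothetical helper, the example sequence is made up, and the optional `upstream` argument is an assumption that lets the first protospacer base be classified when the flanking genomic base is known.

```python
from collections import Counter


def cytosine_contexts(protospacer: str, upstream: str = "") -> Counter:
    """Count the dinucleotide (5' neighbour + C) for every C in the protospacer.

    A C at protospacer position 1 is only counted when at least one
    upstream base is supplied, because its 5' neighbour is otherwise unknown.
    """
    seq = (upstream[-1:] + protospacer).upper()
    counts = Counter()
    for i in range(1, len(seq)):
        if seq[i] == "C":
            counts[seq[i - 1:i + 1]] += 1
    return counts


# Made-up 20-nt protospacer used only to demonstrate the tabulation.
contexts = cytosine_contexts("ACCGTTACGATCGATTAGCC")
for dinucleotide, n in sorted(contexts.items()):
    print(dinucleotide, n)
```

Grouping measured editing frequencies by these context labels is one simple way to compare how strongly two editors depend on the 5' neighbour.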
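Example 5: Investigating whether A3A-PBE can produce diversified mutations when combined with multiple sgRNAs
The wide deamination window and high editing efficiency of A3A-PBE suggest that it may be useful for studying gene regulatory regions, where mutation of multiple sites may be required. We therefore investigated whether A3A-PBE can produce diversified mutations when combined with multiple sgRNAs. The TaVRN1-A1 promoter contains several regulatory sites, such as the VRN box, the CArG box and a putative AG hybrid box (Figure 11c), and mutations in these binding sites can affect wheat flowering time (Chengxia, L. and Jorge, D. The Plant J. 55, 543-554 (2008); Kippes, N. et al. Proc. Natl. Acad. Sci. USA 112, E5401-E5410 (2015)).
Three sgRNAs were designed to target the relevant binding sites (Figure 11c). Amplicons of the TaVRN1 target sites were amplified from protoplasts treated with A3A-PBE or its variant A3A-PBE-VQR, and reads carrying different mutations in these six cis-elements were identified, with efficiencies ranging from 1.2% to 27.7%. For example, at the VRN box target site, A3A-PBE efficiently edited the C nucleotides at positions 4 to 16 of the sgRNA target sequence, which is sufficient to disrupt binding to bZIP transcription factors (Figure 11c) (Chengxia, L. and Jorge, D. The Plant J. 55, 543-554 (2008); Kippes, N. et al. Proc. Natl. Acad. Sci. USA 112, E5401-E5410 (2015)).
Example 6: Regeneration of mutant plants base-edited with A3A-PBE
The acetolactate synthase gene (ALS) in wheat was targeted; ALS is the first enzyme in the branched-chain amino acid biosynthesis pathway. Substitution of the conserved P197 residue of Lolium rigidum ALS with other amino acids confers resistance to the herbicide nicosulfuron (Powles, S.B. and Yu, Q. Annu. Rev. Plant Biol. 61, 317-347 (2010)). P197 in Lolium rigidum corresponds to P174 at the TaALS target site in hexaploid wheat.
The A3A-PBE and pTaU6-ALS-sgRNA constructs were delivered into immature wheat embryos by particle bombardment, and plants were regenerated without herbicide or resistance selection. PCR-RE and Sanger sequencing identified 27 mutant plants carrying at least one C-to-T substitution, regenerated from 120 transformed immature embryos, a mutation efficiency of 22.5% (27/120) (Figure 12a, Figure 13), about 4- to 10-fold higher than previously reported efficiencies for CRISPR/Cas9-mediated gene knockout or point mutation. C-to-T substitutions were found at protospacer positions -7, 6, 7, 8, 9, 10, 12 and 13 (Figure 12a and Figure 13).
Among the 27 mutants, various combinations of amino acid substitutions were identified, and 12 mutants carried targeted mutations in all three genomes (Table 2). More importantly, two of the 27 mutants (T0-7 and T0-9) had all six alleles edited simultaneously, and the encoded proteins all contained amino acid substitutions (Figure 12a-b and Table 2).
The herbicide resistance of the T0-7 mutant was evaluated. After three weeks of culture on regeneration medium supplemented with 0.254 ppm nicosulfuron, the mutant plants still showed a normal phenotype with no signs of damage, whereas wild-type (WT) plants showed severe stunting and leaf wilting (Figure 12b).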
[Tables appear as images in the original publication: Figure BDA0002914838020000181, Figure BDA0002914838020000191, Figure BDA0002914838020000201, Figure BDA0002914838020000211, Figure BDA0002914838020000221]
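Example 7: Verification of the diversity and precision of A3A-PBE base editing
Base-edited rice plants were obtained by Agrobacterium-mediated transformation using the A3A-PBE system to target the OsCDC48 and OsNRT1.1B-T2 sites. The base substitution efficiency was 82.9% (34/41) for OsCDC48 and 44.1% (15/34) for OsNRT1.1B-T2, including 7 OsCDC48 and 4 OsNRT1.1B-T2 homozygous mutant lines (Figure 12a).
Potato StGBSS-T6 was targeted by PEG-mediated protoplast transformation. Two independent heterozygous mutant potato plants were regenerated from protoplasts, a base editing frequency of 6.5% (2/31).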
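The editing frequencies above are simple ratios of edited plants to plants analysed. A minimal Python check of the reported percentages follows; the helper name `editing_frequency` and the dictionary layout are illustrative, while the counts are those given in the text.

```python
def editing_frequency(edited: int, total: int) -> float:
    """Percentage of analysed plants that carry the edit, to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * edited / total, 1)


# (edited plants, plants analysed) as reported in this example.
counts = {
    "OsCDC48": (34, 41),
    "OsNRT1.1B-T2": (15, 34),
    "StGBSS-T6": (2, 31),
}

for target, (edited, total) in counts.items():
    print(f"{target}: {editing_frequency(edited, total)}% ({edited}/{total})")
```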
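A3A-PBE can produce different combinations of mutations. For example, the 34 OsCDC48 mutant plants fell into five classes: 3 with single-base substitutions, 1 with double-base substitutions, 8 with triple-base substitutions, 14 with five-base substitutions and 6 with six-base substitutions (Figure 12a). These substitutions were obtained more efficiently than previously reported, and the mutations were more diverse than those produced by PBE.
Potential off-target regions were predicted with the online tool CRISPR-P, and off-target sites of OsCDC48 and OsNRT1.1B-T2 in the rice genome were identified and examined.
None of the transgenic rice plants carried indels or unintended edits at the two target sites (Figure 12a). No mutations were detected in the potential 3-mismatch off-target regions of either target (Table 4). This indicates that the A3A-PBE system can efficiently induce mutations at specific target sites in plants without causing other genome modifications.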
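Example 8: Further optimization of the A3A-PBE fusion
A3A-PBE protein lacking UGI (A3A-PBE-ΔUGI) was expressed in E. coli and purified (Figure 15). Without UGI, the fusion protein is less toxic to plant cells and easier to purify, and it may increase the chance of converting C into any of the other three nucleotides. The A3A-PBE-ΔUGI protein was assembled with in vitro transcribed sgRNA into ribonucleoprotein (RNP) complexes, and complexes targeting 2 wheat genes (TaMTL and TaLOX2-T5) were delivered into protoplasts (Figure 16a and Table 1).
Amplicon deep sequencing showed that the C-to-T substitution frequency of A3A-PBE-ΔUGI RNP was 1.8%, lower than that of A3A-PBE-ΔUGI delivered as plasmid (3.9% on average) (Figure 16a), whereas PBE in RNP form was not feasible. Plant A3A-PBE-ΔUGI RNP can be further optimized to produce transgene-free mutant plants, which may promote the use of base editing in the breeding and commercialization of improved crop plants.
In addition, A3A was mutated by changing the N at position 57 to G (N57G substitution) to construct the A3A-PBE-N57G fusion protein. A3A-PBE, A3A-PBE-N57G and A3A-PBE-ΔUGI were transformed into wheat and rice protoplasts to edit bases in different genes. The results are shown in Figure 19. A3A-PBE-N57G and A3A-PBE-ΔUGI can show higher editing efficiency at some sites.
Furthermore, an NLS was added to the N-terminus of the A3A-PBE fusion protein to construct A3A-PBE-NLS, which was tested in wheat protoplasts. The results are shown in Figure 20. At some sites A3A-PBE-NLS showed editing efficiency comparable to or higher than that of A3A-PBE.
Example 9: Computational analysis of the rice reference genome sequence (Os-Nipponbare-Reference-IRGSP-1.0)
Computational analysis of the rice reference genome sequence (Os-Nipponbare-Reference-IRGSP-1.0) showed that, compared with PBE, the A3A-PBE base editor of the present invention, with its 17-nucleotide editing window, increases the number of C/G bases within base editing range by 1.8-fold (Figure 16b). Similarly, when SpCas9, SaCas9 and their variants recognizing NGG, NGA, NGCG, NNGRRT and NNNRRT PAMs are used, the A3A deaminase can mutate 90% of the C/G bases genome-wide (Figure 16b).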
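A genome-wide coverage estimate of this kind can be approximated by scanning both strands for protospacers followed by each PAM and marking every C that falls inside the editing window. The Python sketch below illustrates the idea on a toy sequence. It is not the pipeline used for Figure 16b: the function names are hypothetical, a single 20-nt spacer length is assumed for every PAM (SaCas9-type enzymes typically use longer spacers), and a real analysis would stream chromosomes from a FASTA file instead of holding a string in memory.

```python
import re

# Only the IUPAC codes needed for the PAMs discussed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def targetable_c(strand: str, pam: str, spacer_len: int = 20, window=(1, 17)) -> set:
    """0-based indices of Cs on `strand` inside the window of any spacer+PAM site."""
    pam_regex = "".join(IUPAC[code] for code in pam)
    site = re.compile(rf"(?=[ACGT]{{{spacer_len}}}{pam_regex})")
    hits = set()
    for match in site.finditer(strand):
        for pos in range(window[0], window[1] + 1):
            index = match.start() + pos - 1
            if strand[index] == "C":
                hits.add(index)
    return hits


def covered_fraction(genome: str, pams, window=(1, 17)) -> float:
    """Fraction of all C/G bases (top-strand coordinates) that are targetable."""
    genome = genome.upper()
    bottom = revcomp(genome)
    n = len(genome)
    covered = set()
    for pam in pams:
        covered |= targetable_c(genome, pam, window=window)
        # A C on the bottom strand is a G at the mirrored top-strand index.
        covered |= {n - 1 - i for i in targetable_c(bottom, pam, window=window)}
    total = sum(base in "CG" for base in genome)
    return len(covered) / total if total else 0.0


toy_genome = "ACCGTTACGATCGATTAGCCTGG"  # made-up sequence for demonstration
print("window 1-17:", covered_fraction(toy_genome, ["NGG"], (1, 17)))
print("window 3-9: ", covered_fraction(toy_genome, ["NGG"], (3, 9)))
```

Calling `covered_fraction` with a longer PAM list, for example `["NGG", "NGA", "NGCG", "NNGRRT", "NNNRRT"]`, shows how additional Cas9 variants enlarge the covered set; the union over PAMs avoids double-counting bases reachable by more than one enzyme.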
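Example 10: Cpf1-based A3A base editor
In this example, the nCas9 in the A3A base editor described above was replaced with a nuclease-inactivated Cpf1 protein. Vector construction is shown in Figure 17.
The resulting Cpf1-based A3A base editor was used to edit the endogenous rice target gene DEP1, and the mutation efficiency at the C at position 10 was measured. The results are shown in Figure 18. They show that, compared with APOBEC1, human APOBEC3A significantly increases base editing efficiency.
The examples above represent only some embodiments of the present invention. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and all of these fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.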
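Sequence Listing
<110> Institute of Genetics and Developmental Biology, Chinese Academy of Sciences
<120> Base editor based on human APOBEC3A deaminase and uses thereof
<130> P2019TC821
<150> 201810816603.7
<151> 2018-07-24
<160> 16
<170> PatentIn version 3.5
<210> 1
<211> 597
<212> DNA
<213> artificial sequence
<220>
<223> Human APOBEC3A coding sequence, codon-optimized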
<400> 1
atggaggcca gcccggctag cggcccaagg catctcatgg acccgcacat cttcaccagc 60
aacttcaaca acggcatcgg caggcacaag acctacttgt gctacgaggt ggagaggctc 120
gacaacggaa cctccgtgaa gatggaccaa cacagggggt tcctccacaa ccaagccaag 180
aacctcctct gcggcttcta cggcaggcac gccgagttga ggttcctcga cttggtgcca 240
tccctccaac tcgatccagc ccaaatctac cgcgtgacct ggttcatctc ctggtcccca 300
tgcttctcct ggggttgcgc cggcgaggtt cgggctttcc tccaagaaaa cacccacgtc 360
cgcctccgca ttttcgccgc caggatctat gattacgacc ctctctacaa ggaggccctc 420
cagatgctgc gggacgccgg tgctcaggtg agtatcatga cctacgacga gttcaagcac 480
tgctgggaca ccttcgttga ccaccagggc tgcccattcc aaccatggga cggtctggat 540
gaacacagcc aagccttgtc cggcaggctc cgggccatcc tccaaaacca ggggaac 597
<210> 2
<211> 199
<212> PRT
<213> artificial sequence
<220>
<223> Human APOBEC3A amino acid sequence
<400> 2
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn
195
<210> 3
<211> 16
<212> PRT
<213> artificial sequence
<220>
<223> XTEN amino acid sequence
<400> 3
Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser
1 5 10 15
<210> 4
<211> 1369
<212> PRT
<213> artificial sequence
<220>
<223> nCas9 amino acid sequence
<400> 4
Leu Lys Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser
1 5 10 15
Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys
20 25 30
Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu
35 40 45
Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg
50 55 60
Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile
65 70 75 80
Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp
85 90 95
Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys
100 105 110
Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala
115 120 125
Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val
130 135 140
Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala
145 150 155 160
His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn
165 170 175
Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr
180 185 190
Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp
195 200 205
Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu
210 215 220
Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly
225 230 235 240
Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn
245 250 255
Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr
260 265 270
Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala
275 280 285
Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser
290 295 300
Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala
305 310 315 320
Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu
325 330 335
Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe
340 345 350
Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala
355 360 365
Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met
370 375 380
Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu
385 390 395 400
Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His
405 410 415
Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro
420 425 430
Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg
435 440 445
Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala
450 455 460
Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu
465 470 475 480
Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met
485 490 495
Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His
500 505 510
Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val
515 520 525
Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu
530 535 540
Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val
545 550 555 560
Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe
565 570 575
Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu
580 585 590
Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu
595 600 605
Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu
610 615 620
Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr
625 630 635 640
Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg
645 650 655
Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg
660 665 670
Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly
675 680 685
Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr
690 695 700
Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser
705 710 715 720
Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys
725 730 735
Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met
740 745 750
Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn
755 760 765
Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg
770 775 780
Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His
785 790 795 800
Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr
805 810 815
Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn
820 825 830
Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu
835 840 845
Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn
850 855 860
Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met
865 870 875 880
Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg
885 890 895
Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu
900 905 910
Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
915 920 925
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr
930 935 940
Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys
945 950 955 960
Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val
965 970 975
Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala
980 985 990
Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu
995 1000 1005
Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile
1010 1015 1020
Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe
1025 1030 1035
Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu
1040 1045 1050
Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly
1055 1060 1065
Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr
1070 1075 1080
Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys
1085 1090 1095
Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro
1100 1105 1110
Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp
1115 1120 1125
Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser
1130 1135 1140
Val Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu
1145 1150 1155
Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser
1160 1165 1170
Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr
1175 1180 1185
Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser
1190 1195 1200
Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala
1205 1210 1215
Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr
1220 1225 1230
Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly
1235 1240 1245
Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His
1250 1255 1260
Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser
1265 1270 1275
Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser
1280 1285 1290
Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu
1295 1300 1305
Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala
1310 1315 1320
Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr
1325 1330 1335
Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile
1340 1345 1350
Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
1355 1360 1365
Asp
<210> 5
<211> 90
<212> PRT
<213> artificial sequence
<220>
<223> UGI amino acid sequence
<400> 5
Thr Arg Asp Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys
1 5 10 15
Glu Thr Gly Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro
20 25 30
Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu
35 40 45
Val His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu
50 55 60
Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp
65 70 75 80
Ser Asn Gly Glu Asn Lys Ile Lys Met Leu
85 90
<210> 6
<211> 176
<212> PRT
<213> artificial sequence
<220>
<223> GAM amino acid sequence
<400> 6
Met Ala Lys Pro Ala Lys Arg Ile Lys Ser Ala Ala Ala Ala Tyr Val
1 5 10 15
Pro Gln Asn Arg Asp Ala Val Ile Thr Asp Ile Lys Arg Ile Gly Asp
20 25 30
Leu Gln Arg Glu Ala Ser Arg Leu Glu Thr Glu Met Asn Asp Ala Ile
35 40 45
Ala Glu Ile Thr Glu Lys Phe Ala Ala Arg Ile Ala Pro Ile Lys Thr
50 55 60
Asp Ile Glu Thr Leu Ser Lys Gly Val Gln Gly Trp Cys Glu Ala Asn
65 70 75 80
Arg Asp Glu Leu Thr Asn Gly Gly Lys Val Lys Thr Ala Asn Leu Val
85 90 95
Thr Gly Asp Val Ser Trp Arg Val Arg Pro Pro Ser Val Ser Ile Arg
100 105 110
Gly Met Asp Ala Val Met Glu Thr Glu Thr Leu Glu Arg Leu Gly Leu
115 120 125
Gln Arg Phe Ile Arg Thr Lys Gln Glu Ile Asn Lys Glu Ala Ile Leu
130 135 140
Leu Glu Pro Lys Ala Val Ala Gly Val Ala Gly Ile Thr Val Lys Ser
145 150 155 160
Gly Ile Glu Asp Phe Ser Ile Ile Pro Phe Glu Gln Glu Ala Gly Ile
165 170 175
<210> 7
<211> 5106
<212> DNA
<213> artificial sequence
<220>
<223> A3A-PBE
<400> 7
atggaggcca gcccggctag cggcccaagg catctcatgg acccgcacat cttcaccagc 60
aacttcaaca acggcatcgg caggcacaag acctacttgt gctacgaggt ggagaggctc 120
gacaacggaa cctccgtgaa gatggaccaa cacagggggt tcctccacaa ccaagccaag 180
aacctcctct gcggcttcta cggcaggcac gccgagttga ggttcctcga cttggtgcca 240
tccctccaac tcgatccagc ccaaatctac cgcgtgacct ggttcatctc ctggtcccca 300
tgcttctcct ggggttgcgc cggcgaggtt cgggctttcc tccaagaaaa cacccacgtc 360
cgcctccgca ttttcgccgc caggatctat gattacgacc ctctctacaa ggaggccctc 420
cagatgctgc gggacgccgg tgctcaggtg agtatcatga cctacgacga gttcaagcac 480
tgctgggaca ccttcgttga ccaccagggc tgcccattcc aaccatggga cggtctggat 540
gaacacagcc aagccttgtc cggcaggctc cgggccatcc tccaaaacca ggggaactcc 600
gggagcgaga cgccaggcac ctccgagtcg gccaccccag aatctcttaa ggacaagaag 660
tactcgatcg gcctcgccat cgggacgaac tcagttggct gggccgtgat caccgacgag 720
tacaaggtgc cctctaagaa gttcaaggtc ctggggaaca ccgaccgcca ttccatcaag 780
aagaacctca tcggcgctct cctgttcgac agcggggaga ccgctgaggc tacgaggctc 840
aagagaaccg ctaggcgccg gtacacgaga aggaagaaca ggatctgcta cctccaagag 900
attttctcca acgagatggc caaggttgac gattcattct tccaccgcct ggaggagtct 960
ttcctcgtgg aggaggataa gaagcacgag cggcatccca tcttcggcaa catcgtggac 1020
gaggttgcct accacgagaa gtaccctacg atctaccatc tgcggaagaa gctcgtggac 1080
tccaccgata aggcggacct cagactgatc tacctcgctc tggcccacat gatcaagttc 1140
cgcggccatt tcctgatcga gggggatctc aacccagaca acagcgatgt tgacaagctg 1200
ttcatccaac tcgtgcagac ctacaaccaa ctcttcgagg agaacccgat caacgcctct 1260
ggcgtggacg cgaaggctat cctgtccgcg aggctctcga agtccaggag gctggagaac 1320
ctgatcgctc agctcccagg cgagaagaag aacggcctgt tcgggaacct catcgctctc 1380
agcctggggc tcaccccgaa cttcaagtcg aacttcgatc tcgctgagga cgccaagctg 1440
caactctcca aggacaccta cgacgatgac ctcgataacc tcctggccca gatcggcgat 1500
caatacgcgg acctgttcct cgctgccaag aacctgtcgg acgccatcct cctgtcagat 1560
atcctccgcg tgaacaccga gatcacgaag gctccactct ctgcctccat gatcaagcgc 1620
tacgacgagc accatcagga tctgaccctc ctgaaggcgc tggtccgcca acagctcccg 1680
gagaagtaca aggagatttt cttcgatcag tcgaagaacg gctacgctgg gtacatcgac 1740
ggcggggcct cacaagagga gttctacaag ttcatcaagc caatcctgga gaagatggac 1800
ggcacggagg agctcctggt gaagctcaac agggaggacc tcctgcggaa gcagagaacc 1860
ttcgataacg gcagcatccc ccaccaaatc catctcgggg agctgcacgc catcctgaga 1920
aggcaagagg acttctaccc tttcctcaag gataaccggg agaagatcga gaagatcctg 1980
accttcagaa tcccatacta cgtcggccct ctcgcgcggg ggaactcaag attcgcttgg 2040
atgacccgca agtctgagga gaccatcacg ccgtggaact tcgaggaggt ggtggacaag 2100
ggcgctagcg ctcagtcgtt catcgagagg atgaccaact tcgacaagaa cctgcccaac 2160
gagaaggtgc tccctaagca ctcgctcctg tacgagtact tcaccgtcta caacgagctc 2220
acgaaggtga agtacgtcac cgagggcatg cgcaagccag cgttcctgtc cggggagcag 2280
aagaaggcta tcgtggacct cctgttcaag accaaccgga aggtcacggt taagcaactc 2340
aaggaggact acttcaagaa gatcgagtgc ttcgattcgg tcgagatcag cggcgttgag 2400
gaccgcttca acgccagcct cgggacctac cacgatctcc tgaagatcat caaggataag 2460
gacttcctgg acaacgagga gaacgaggat atcctggagg acatcgtgct gaccctcacg 2520
ctgttcgagg acagggagat gatcgaggag cgcctgaaga cgtacgccca tctcttcgat 2580
gacaaggtca tgaagcaact caagcgccgg agatacaccg gctgggggag gctgtcccgc 2640
aagctcatca acggcatccg ggacaagcag tccgggaaga ccatcctcga cttcctcaag 2700
agcgatggct tcgccaacag gaacttcatg caactgatcc acgatgacag cctcaccttc 2760
aaggaggata tccaaaaggc tcaagtgagc ggccaggggg actcgctgca cgagcatatc 2820
gcgaacctcg ctggctcccc cgcgatcaag aagggcatcc tccagaccgt gaaggttgtg 2880
gacgagctcg tgaaggtcat gggccggcac aagcctgaga acatcgtcat cgagatggcc 2940
agagagaacc aaaccacgca gaaggggcaa aagaactcta gggagcgcat gaagcgcatc 3000
gaggagggca tcaaggagct ggggtcccaa atcctcaagg agcacccagt ggagaacacc 3060
caactgcaga acgagaagct ctacctgtac tacctccaga acggcaggga tatgtacgtg 3120
gaccaagagc tggatatcaa ccgcctcagc gattacgacg tcgatcatat cgttccccag 3180
tctttcctga aggatgactc catcgacaac aaggtcctca ccaggtcgga caagaaccgc 3240
ggcaagtcag ataacgttcc atctgaggag gtcgttaaga agatgaagaa ctactggagg 3300
cagctcctga acgccaagct gatcacgcaa aggaagttcg acaacctcac caaggctgag 3360
agaggcgggc tctcagagct ggacaaggcc ggcttcatca agcggcagct ggtcgagacc 3420
agacaaatca cgaagcacgt tgcgcaaatc ctcgactctc ggatgaacac gaagtacgat 3480
gagaacgaca agctgatcag ggaggttaag gtgatcaccc tgaagtctaa gctcgtctcc 3540
gacttcagga aggatttcca gttctacaag gttcgcgaga tcaacaacta ccaccatgcc 3600
catgacgctt acctcaacgc tgtggtcggc accgctctga tcaagaagta cccaaagctg 3660
gagtccgagt tcgtgtacgg ggactacaag gtttacgatg tgcgcaagat gatcgccaag 3720
tcggagcaag agatcggcaa ggctaccgcc aagtacttct tctactcaaa catcatgaac 3780
ttcttcaaga ccgagatcac gctggccaac ggcgagatcc ggaagagacc gctcatcgag 3840
accaacggcg agacggggga gatcgtgtgg gacaagggca gggatttcgc gaccgtccgc 3900
aaggttctct ccatgcccca ggtgaacatc gtcaagaaga ccgaggtcca aacgggcggg 3960
ttctcaaagg agtctatcct gcctaagcgg aacagcgaca agctcatcgc cagaaagaag 4020
gactgggacc caaagaagta cggcgggttc gacagcccta ccgtggccta ctcggtcctg 4080
gttgtggcga aggttgagaa gggcaagtcc aagaagctca agagcgtgaa ggagctcctg 4140
gggatcacca tcatggagag gtccagcttc gagaagaacc caatcgactt cctggaggcc 4200
aagggctaca aggaggtgaa gaaggacctg atcatcaagc tcccgaagta ctctctcttc 4260
gagctggaga acggcaggaa gagaatgctg gcttccgctg gcgagctcca gaaggggaac 4320
gagctcgcgc tgccaagcaa gtacgtgaac ttcctctacc tggcttccca ctacgagaag 4380
ctcaagggca gcccggagga caacgagcaa aagcagctgt tcgtcgagca gcacaagcat 4440
tacctcgacg agatcatcga gcaaatctcc gagttcagca agcgcgtgat cctcgccgac 4500
gcgaacctgg ataaggtcct ctccgcctac aacaagcacc gggacaagcc catcagagag 4560
caagcggaga acatcatcca tctcttcacc ctgacgaacc tcggcgctcc tgctgctttc 4620
aagtacttcg acaccacgat cgatcggaag agatacacct ccacgaagga ggtcctggac 4680
gcgaccctca tccaccagtc gatcaccggc ctgtacgaga cgaggatcga cctctcacaa 4740
ctcggcgggg ataagagacc cgcagcaacc aagaaggcag ggcaagcaaa gaagaagaag 4800
acgcgtgact ccggcggcag caccaacctg tccgacatca tcgagaagga gacgggcaag 4860
caactcgtga tccaggagag catcctcatg ctgccagagg aggtggagga ggtcatcggc 4920
aacaagccag agtccgacat cctggtgcac accgcctacg acgagtccac cgacgagaac 4980
gtcatgctcc tgaccagcga cgccccagag tacaagccat gggccctcgt catccaggac 5040
agcaacgggg agaacaagat caagatgctg tcggggggga gcccaaagaa gaagcggaag 5100
gtgtag 5106
<210> 8
<211> 6009
<212> DNA
<213> artificial sequence
<220>
<223> A3A-Gam
<400> 8
atggcgaagc cggccaagag gatcaaatcc gctgctgctg cctacgtgcc gcaaaatagg 60
gatgccgtga tcaccgacat caagaggatc ggcgatctgc agagggaggc gtctcgtctc 120
gaaactgaga tgaacgacgc gatcgcggag atcaccgaga agttcgccgc tcgtatcgcc 180
ccgatcaaga ccgacatcga aactctctcc aagggcgtgc aaggttggtg cgaggccaat 240
agggacgagc tcaccaatgg cggcaaggtg aagaccgcca acctcgtgac cggcgatgtg 300
tcttggaggg tgaggccacc atccgtgagc attcgtggta tggacgccgt gatggaaact 360
ctcgagcgcc tcggcctcca aaggttcatc cgcaccaagc aagaaatcaa caaggaggcg 420
atcctcctcg agccaaaagc cgtggccggc gtggccggca tcacagtcaa gtccggcatc 480
gaggacttct ccatcatccc gttcgagcaa gaagccggca tctccggcag cgagacgcca 540
ggcacctccg agagcgctac gcctgaatcc aggcctgagg ccagcccggc tagcggccca 600
aggcatctca tggacccgca catcttcacc agcaacttca acaacggcat cggcaggcac 660
aagacctact tgtgctacga ggtggagagg ctcgacaacg gaacctccgt gaagatggac 720
caacacaggg ggttcctcca caaccaagcc aagaacctcc tctgcggctt ctacggcagg 780
cacgccgagt tgaggttcct cgacttggtg ccatccctcc aactcgatcc agcccaaatc 840
taccgcgtga cctggttcat ctcctggtcc ccatgcttct cctggggttg cgccggcgag 900
gttcgggctt tcctccaaga aaacacccac gtccgcctcc gcattttcgc cgccaggatc 960
tatgattacg accctctcta caaggaggcc ctccagatgc tgcgggacgc cggtgctcag 1020
gtgagtatca tgacctacga cgagttcaag cactgctggg acaccttcgt tgaccaccag 1080
ggctgcccat tccaaccatg ggacggtctg gatgaacaca gccaagcctt gtccggcagg 1140
ctccgggcca tcctccaaaa ccaggggaac agcggaggat cttccggagg atctagcggc 1200
tccgagacac caggaacatc cgaaagcgct acaccagaat ctagcggagg ctcttccgga 1260
ggatctctta aggacaagaa gtactcgatc ggcctcgcca tcgggacgaa ctcagttggc 1320
tgggccgtga tcaccgacga gtacaaggtg ccctctaaga agttcaaggt cctggggaac 1380
accgaccgcc attccatcaa gaagaacctc atcggcgctc tcctgttcga cagcggggag 1440
accgctgagg ctacgaggct caagagaacc gctaggcgcc ggtacacgag aaggaagaac 1500
aggatctgct acctccaaga gattttctcc aacgagatgg ccaaggttga cgattcattc 1560
ttccaccgcc tggaggagtc tttcctcgtg gaggaggata agaagcacga gcggcatccc 1620
atcttcggca acatcgtgga cgaggttgcc taccacgaga agtaccctac gatctaccat 1680
ctgcggaaga agctcgtgga ctccaccgat aaggcggacc tcagactgat ctacctcgct 1740
ctggcccaca tgatcaagtt ccgcggccat ttcctgatcg agggggatct caacccagac 1800
aacagcgatg ttgacaagct gttcatccaa ctcgtgcaga cctacaacca actcttcgag 1860
gagaacccga tcaacgcctc tggcgtggac gcgaaggcta tcctgtccgc gaggctctcg 1920
aagtccagga ggctggagaa cctgatcgct cagctcccag gcgagaagaa gaacggcctg 1980
ttcgggaacc tcatcgctct cagcctgggg ctcaccccga acttcaagtc gaacttcgat 2040
ctcgctgagg acgccaagct gcaactctcc aaggacacct acgacgatga cctcgataac 2100
ctcctggccc agatcggcga tcaatacgcg gacctgttcc tcgctgccaa gaacctgtcg 2160
gacgccatcc tcctgtcaga tatcctccgc gtgaacaccg agatcacgaa ggctccactc 2220
tctgcctcca tgatcaagcg ctacgacgag caccatcagg atctgaccct cctgaaggcg 2280
ctggtccgcc aacagctccc ggagaagtac aaggagattt tcttcgatca gtcgaagaac 2340
ggctacgctg ggtacatcga cggcggggcc tcacaagagg agttctacaa gttcatcaag 2400
ccaatcctgg agaagatgga cggcacggag gagctcctgg tgaagctcaa cagggaggac 2460
ctcctgcgga agcagagaac cttcgataac ggcagcatcc cccaccaaat ccatctcggg 2520
gagctgcacg ccatcctgag aaggcaagag gacttctacc ctttcctcaa ggataaccgg 2580
gagaagatcg agaagatcct gaccttcaga atcccatact acgtcggccc tctcgcgcgg 2640
gggaactcaa gattcgcttg gatgacccgc aagtctgagg agaccatcac gccgtggaac 2700
ttcgaggagg tggtggacaa gggcgctagc gctcagtcgt tcatcgagag gatgaccaac 2760
ttcgacaaga acctgcccaa cgagaaggtg ctccctaagc actcgctcct gtacgagtac 2820
ttcaccgtct acaacgagct cacgaaggtg aagtacgtca ccgagggcat gcgcaagcca 2880
gcgttcctgt ccggggagca gaagaaggct atcgtggacc tcctgttcaa gaccaaccgg 2940
aaggtcacgg ttaagcaact caaggaggac tacttcaaga agatcgagtg cttcgattcg 3000
gtcgagatca gcggcgttga ggaccgcttc aacgccagcc tcgggaccta ccacgatctc 3060
ctgaagatca tcaaggataa ggacttcctg gacaacgagg agaacgagga tatcctggag 3120
gacatcgtgc tgaccctcac gctgttcgag gacagggaga tgatcgagga gcgcctgaag 3180
acgtacgccc atctcttcga tgacaaggtc atgaagcaac tcaagcgccg gagatacacc 3240
ggctggggga ggctgtcccg caagctcatc aacggcatcc gggacaagca gtccgggaag 3300
accatcctcg acttcctcaa gagcgatggc ttcgccaaca ggaacttcat gcaactgatc 3360
cacgatgaca gcctcacctt caaggaggat atccaaaagg ctcaagtgag cggccagggg 3420
gactcgctgc acgagcatat cgcgaacctc gctggctccc ccgcgatcaa gaagggcatc 3480
ctccagaccg tgaaggttgt ggacgagctc gtgaaggtca tgggccggca caagcctgag 3540
aacatcgtca tcgagatggc cagagagaac caaaccacgc agaaggggca aaagaactct 3600
agggagcgca tgaagcgcat cgaggagggc atcaaggagc tggggtccca aatcctcaag 3660
gagcacccag tggagaacac ccaactgcag aacgagaagc tctacctgta ctacctccag 3720
aacggcaggg atatgtacgt ggaccaagag ctggatatca accgcctcag cgattacgac 3780
gtcgatcata tcgttcccca gtctttcctg aaggatgact ccatcgacaa caaggtcctc 3840
accaggtcgg acaagaaccg cggcaagtca gataacgttc catctgagga ggtcgttaag 3900
aagatgaaga actactggag gcagctcctg aacgccaagc tgatcacgca aaggaagttc 3960
gacaacctca ccaaggctga gagaggcggg ctctcagagc tggacaaggc cggcttcatc 4020
aagcggcagc tggtcgagac cagacaaatc acgaagcacg ttgcgcaaat cctcgactct 4080
cggatgaaca cgaagtacga tgagaacgac aagctgatca gggaggttaa ggtgatcacc 4140
ctgaagtcta agctcgtctc cgacttcagg aaggatttcc agttctacaa ggttcgcgag 4200
atcaacaact accaccatgc ccatgacgct tacctcaacg ctgtggtcgg caccgctctg 4260
atcaagaagt acccaaagct ggagtccgag ttcgtgtacg gggactacaa ggtttacgat 4320
gtgcgcaaga tgatcgccaa gtcggagcaa gagatcggca aggctaccgc caagtacttc 4380
ttctactcaa acatcatgaa cttcttcaag accgagatca cgctggccaa cggcgagatc 4440
cggaagagac cgctcatcga gaccaacggc gagacggggg agatcgtgtg ggacaagggc 4500
agggatttcg cgaccgtccg caaggttctc tccatgcccc aggtgaacat cgtcaagaag 4560
accgaggtcc aaacgggcgg gttctcaaag gagtctatcc tgcctaagcg gaacagcgac 4620
aagctcatcg ccagaaagaa ggactgggac ccaaagaagt acggcgggtt cgacagccct 4680
accgtggcct actcggtcct ggttgtggcg aaggttgaga agggcaagtc caagaagctc 4740
aagagcgtga aggagctcct ggggatcacc atcatggaga ggtccagctt cgagaagaac 4800
ccaatcgact tcctggaggc caagggctac aaggaggtga agaaggacct gatcatcaag 4860
ctcccgaagt actctctctt cgagctggag aacggcagga agagaatgct ggcttccgct 4920
ggcgagctcc agaaggggaa cgagctcgcg ctgccaagca agtacgtgaa cttcctctac 4980
ctggcttccc actacgagaa gctcaagggc agcccggagg acaacgagca aaagcagctg 5040
ttcgtcgagc agcacaagca ttacctcgac gagatcatcg agcaaatctc cgagttcagc 5100
aagcgcgtga tcctcgccga cgcgaacctg gataaggtcc tctccgccta caacaagcac 5160
cgggacaagc ccatcagaga gcaagcggag aacatcatcc atctcttcac cctgacgaac 5220
ctcggcgctc ctgctgcttt caagtacttc gacaccacga tcgatcggaa gagatacacc 5280
tccacgaagg aggtcctgga cgcgaccctc atccaccagt cgatcaccgg cctgtacgag 5340
acgaggatcg acctctcaca actcggcggg gataagagac ccgcagcaac caagaaggca 5400
gggcaagcaa agaagaagaa gacgcgttca ggcggctccg gcggctccac caacctgtcc 5460
gacatcatcg agaaggagac gggcaagcaa ctcgtgatcc aggagagcat cctcatgctg 5520
ccagaggagg tggaggaggt catcggcaac aagccagagt ccgacatcct ggtgcacacc 5580
gcctacgacg agtccaccga cgagaacgtc atgctcctga ccagcgacgc cccagagtac 5640
aagccatggg ccctcgtcat ccaggacagc aacggggaga acaagatcaa gatgctgtcg 5700
gggacgcgtg actccggcgg cagcaccaac ctgtccgaca tcatcgagaa ggagacgggc 5760
aagcaactcg tgatccagga gagcatcctc atgctgccag aggaggtgga ggaggtcatc 5820
ggcaacaagc cagagtccga catcctggtg cacaccgcct acgacgagtc caccgacgag 5880
aacgtcatgc tcctgaccag cgacgcccca gagtacaagc catgggccct cgtcatccag 5940
gacagcaacg gggagaacaa gatcaagatg ctgtcggggg ggagcccaaa gaagaagcgg 6000
aaggtgtag 6009
<210> 9
<211> 4803
<212> DNA
<213> artificial sequence
<220>
<223> A3A-PBE-ΔUGI
<400> 9
atggaggcca gcccggctag cggcccaagg catctcatgg acccgcacat cttcaccagc 60
aacttcaaca acggcatcgg caggcacaag acctacttgt gctacgaggt ggagaggctc 120
gacaacggaa cctccgtgaa gatggaccaa cacagggggt tcctccacaa ccaagccaag 180
aacctcctct gcggcttcta cggcaggcac gccgagttga ggttcctcga cttggtgcca 240
tccctccaac tcgatccagc ccaaatctac cgcgtgacct ggttcatctc ctggtcccca 300
tgcttctcct ggggttgcgc cggcgaggtt cgggctttcc tccaagaaaa cacccacgtc 360
cgcctccgca ttttcgccgc caggatctat gattacgacc ctctctacaa ggaggccctc 420
cagatgctgc gggacgccgg tgctcaggtg agtatcatga cctacgacga gttcaagcac 480
tgctgggaca ccttcgttga ccaccagggc tgcccattcc aaccatggga cggtctggat 540
gaacacagcc aagccttgtc cggcaggctc cgggccatcc tccaaaacca ggggaactcc 600
gggagcgaga cgccaggcac ctccgagtcg gccaccccag aatctcttaa ggacaagaag 660
tactcgatcg gcctcgccat cgggacgaac tcagttggct gggccgtgat caccgacgag 720
tacaaggtgc cctctaagaa gttcaaggtc ctggggaaca ccgaccgcca ttccatcaag 780
aagaacctca tcggcgctct cctgttcgac agcggggaga ccgctgaggc tacgaggctc 840
aagagaaccg ctaggcgccg gtacacgaga aggaagaaca ggatctgcta cctccaagag 900
attttctcca acgagatggc caaggttgac gattcattct tccaccgcct ggaggagtct 960
ttcctcgtgg aggaggataa gaagcacgag cggcatccca tcttcggcaa catcgtggac 1020
gaggttgcct accacgagaa gtaccctacg atctaccatc tgcggaagaa gctcgtggac 1080
tccaccgata aggcggacct cagactgatc tacctcgctc tggcccacat gatcaagttc 1140
cgcggccatt tcctgatcga gggggatctc aacccagaca acagcgatgt tgacaagctg 1200
ttcatccaac tcgtgcagac ctacaaccaa ctcttcgagg agaacccgat caacgcctct 1260
ggcgtggacg cgaaggctat cctgtccgcg aggctctcga agtccaggag gctggagaac 1320
ctgatcgctc agctcccagg cgagaagaag aacggcctgt tcgggaacct catcgctctc 1380
agcctggggc tcaccccgaa cttcaagtcg aacttcgatc tcgctgagga cgccaagctg 1440
caactctcca aggacaccta cgacgatgac ctcgataacc tcctggccca gatcggcgat 1500
caatacgcgg acctgttcct cgctgccaag aacctgtcgg acgccatcct cctgtcagat 1560
atcctccgcg tgaacaccga gatcacgaag gctccactct ctgcctccat gatcaagcgc 1620
tacgacgagc accatcagga tctgaccctc ctgaaggcgc tggtccgcca acagctcccg 1680
gagaagtaca aggagatttt cttcgatcag tcgaagaacg gctacgctgg gtacatcgac 1740
ggcggggcct cacaagagga gttctacaag ttcatcaagc caatcctgga gaagatggac 1800
ggcacggagg agctcctggt gaagctcaac agggaggacc tcctgcggaa gcagagaacc 1860
ttcgataacg gcagcatccc ccaccaaatc catctcgggg agctgcacgc catcctgaga 1920
aggcaagagg acttctaccc tttcctcaag gataaccggg agaagatcga gaagatcctg 1980
accttcagaa tcccatacta cgtcggccct ctcgcgcggg ggaactcaag attcgcttgg 2040
atgacccgca agtctgagga gaccatcacg ccgtggaact tcgaggaggt ggtggacaag 2100
ggcgctagcg ctcagtcgtt catcgagagg atgaccaact tcgacaagaa cctgcccaac 2160
gagaaggtgc tccctaagca ctcgctcctg tacgagtact tcaccgtcta caacgagctc 2220
acgaaggtga agtacgtcac cgagggcatg cgcaagccag cgttcctgtc cggggagcag 2280
aagaaggcta tcgtggacct cctgttcaag accaaccgga aggtcacggt taagcaactc 2340
aaggaggact acttcaagaa gatcgagtgc ttcgattcgg tcgagatcag cggcgttgag 2400
gaccgcttca acgccagcct cgggacctac cacgatctcc tgaagatcat caaggataag 2460
gacttcctgg acaacgagga gaacgaggat atcctggagg acatcgtgct gaccctcacg 2520
ctgttcgagg acagggagat gatcgaggag cgcctgaaga cgtacgccca tctcttcgat 2580
gacaaggtca tgaagcaact caagcgccgg agatacaccg gctgggggag gctgtcccgc 2640
aagctcatca acggcatccg ggacaagcag tccgggaaga ccatcctcga cttcctcaag 2700
agcgatggct tcgccaacag gaacttcatg caactgatcc acgatgacag cctcaccttc 2760
aaggaggata tccaaaaggc tcaagtgagc ggccaggggg actcgctgca cgagcatatc 2820
gcgaacctcg ctggctcccc cgcgatcaag aagggcatcc tccagaccgt gaaggttgtg 2880
gacgagctcg tgaaggtcat gggccggcac aagcctgaga acatcgtcat cgagatggcc 2940
agagagaacc aaaccacgca gaaggggcaa aagaactcta gggagcgcat gaagcgcatc 3000
gaggagggca tcaaggagct ggggtcccaa atcctcaagg agcacccagt ggagaacacc 3060
caactgcaga acgagaagct ctacctgtac tacctccaga acggcaggga tatgtacgtg 3120
gaccaagagc tggatatcaa ccgcctcagc gattacgacg tcgatcatat cgttccccag 3180
tctttcctga aggatgactc catcgacaac aaggtcctca ccaggtcgga caagaaccgc 3240
ggcaagtcag ataacgttcc atctgaggag gtcgttaaga agatgaagaa ctactggagg 3300
cagctcctga acgccaagct gatcacgcaa aggaagttcg acaacctcac caaggctgag 3360
agaggcgggc tctcagagct ggacaaggcc ggcttcatca agcggcagct ggtcgagacc 3420
agacaaatca cgaagcacgt tgcgcaaatc ctcgactctc ggatgaacac gaagtacgat 3480
gagaacgaca agctgatcag ggaggttaag gtgatcaccc tgaagtctaa gctcgtctcc 3540
gacttcagga aggatttcca gttctacaag gttcgcgaga tcaacaacta ccaccatgcc 3600
catgacgctt acctcaacgc tgtggtcggc accgctctga tcaagaagta cccaaagctg 3660
gagtccgagt tcgtgtacgg ggactacaag gtttacgatg tgcgcaagat gatcgccaag 3720
tcggagcaag agatcggcaa ggctaccgcc aagtacttct tctactcaaa catcatgaac 3780
ttcttcaaga ccgagatcac gctggccaac ggcgagatcc ggaagagacc gctcatcgag 3840
accaacggcg agacggggga gatcgtgtgg gacaagggca gggatttcgc gaccgtccgc 3900
aaggttctct ccatgcccca ggtgaacatc gtcaagaaga ccgaggtcca aacgggcggg 3960
ttctcaaagg agtctatcct gcctaagcgg aacagcgaca agctcatcgc cagaaagaag 4020
gactgggacc caaagaagta cggcgggttc gacagcccta ccgtggccta ctcggtcctg 4080
gttgtggcga aggttgagaa gggcaagtcc aagaagctca agagcgtgaa ggagctcctg 4140
gggatcacca tcatggagag gtccagcttc gagaagaacc caatcgactt cctggaggcc 4200
aagggctaca aggaggtgaa gaaggacctg atcatcaagc tcccgaagta ctctctcttc 4260
gagctggaga acggcaggaa gagaatgctg gcttccgctg gcgagctcca gaaggggaac 4320
gagctcgcgc tgccaagcaa gtacgtgaac ttcctctacc tggcttccca ctacgagaag 4380
ctcaagggca gcccggagga caacgagcaa aagcagctgt tcgtcgagca gcacaagcat 4440
tacctcgacg agatcatcga gcaaatctcc gagttcagca agcgcgtgat cctcgccgac 4500
gcgaacctgg ataaggtcct ctccgcctac aacaagcacc gggacaagcc catcagagag 4560
caagcggaga acatcatcca tctcttcacc ctgacgaacc tcggcgctcc tgctgctttc 4620
aagtacttcg acaccacgat cgatcggaag agatacacct ccacgaagga ggtcctggac 4680
gcgaccctca tccaccagtc gatcaccggc ctgtacgaga cgaggatcga cctctcacaa 4740
ctcggcgggg ataagagacc cgcagcaacc aagaaggcag ggcaagcaaa gaagaagaag 4800
tag 4803
<210> 10
<211> 5127
<212> DNA
<213> artificial sequence
<220>
<223> A3A-PBE-NLS
<400> 10
atgccaaaga agaagaggaa ggttgaggcc agcccggcta gcggcccaag gcatctcatg 60
gacccgcaca tcttcaccag caacttcaac aacggcatcg gcaggcacaa gacctacttg 120
tgctacgagg tggagaggct cgacaacgga acctccgtga agatggacca acacaggggg 180
ttcctccaca accaagccaa gaacctcctc tgcggcttct acggcaggca cgccgagttg 240
aggttcctcg acttggtgcc atccctccaa ctcgatccag cccaaatcta ccgcgtgacc 300
tggttcatct cctggtcccc atgcttctcc tggggttgcg ccggcgaggt tcgggctttc 360
ctccaagaaa acacccacgt ccgcctccgc attttcgccg ccaggatcta tgattacgac 420
cctctctaca aggaggccct ccagatgctg cgggacgccg gtgctcaggt gagtatcatg 480
acctacgacg agttcaagca ctgctgggac accttcgttg accaccaggg ctgcccattc 540
caaccatggg acggtctgga tgaacacagc caagccttgt ccggcaggct ccgggccatc 600
ctccaaaacc aggggaactc cgggagcgag acgccaggca cctccgagtc ggccacccca 660
gaatctctta aggacaagaa gtactcgatc ggcctcgcca tcgggacgaa ctcagttggc 720
tgggccgtga tcaccgacga gtacaaggtg ccctctaaga agttcaaggt cctggggaac 780
accgaccgcc attccatcaa gaagaacctc atcggcgctc tcctgttcga cagcggggag 840
accgctgagg ctacgaggct caagagaacc gctaggcgcc ggtacacgag aaggaagaac 900
aggatctgct acctccaaga gattttctcc aacgagatgg ccaaggttga cgattcattc 960
ttccaccgcc tggaggagtc tttcctcgtg gaggaggata agaagcacga gcggcatccc 1020
atcttcggca acatcgtgga cgaggttgcc taccacgaga agtaccctac gatctaccat 1080
ctgcggaaga agctcgtgga ctccaccgat aaggcggacc tcagactgat ctacctcgct 1140
ctggcccaca tgatcaagtt ccgcggccat ttcctgatcg agggggatct caacccagac 1200
aacagcgatg ttgacaagct gttcatccaa ctcgtgcaga cctacaacca actcttcgag 1260
gagaacccga tcaacgcctc tggcgtggac gcgaaggcta tcctgtccgc gaggctctcg 1320
aagtccagga ggctggagaa cctgatcgct cagctcccag gcgagaagaa gaacggcctg 1380
ttcgggaacc tcatcgctct cagcctgggg ctcaccccga acttcaagtc gaacttcgat 1440
ctcgctgagg acgccaagct gcaactctcc aaggacacct acgacgatga cctcgataac 1500
ctcctggccc agatcggcga tcaatacgcg gacctgttcc tcgctgccaa gaacctgtcg 1560
gacgccatcc tcctgtcaga tatcctccgc gtgaacaccg agatcacgaa ggctccactc 1620
tctgcctcca tgatcaagcg ctacgacgag caccatcagg atctgaccct cctgaaggcg 1680
ctggtccgcc aacagctccc ggagaagtac aaggagattt tcttcgatca gtcgaagaac 1740
ggctacgctg ggtacatcga cggcggggcc tcacaagagg agttctacaa gttcatcaag 1800
ccaatcctgg agaagatgga cggcacggag gagctcctgg tgaagctcaa cagggaggac 1860
ctcctgcgga agcagagaac cttcgataac ggcagcatcc cccaccaaat ccatctcggg 1920
gagctgcacg ccatcctgag aaggcaagag gacttctacc ctttcctcaa ggataaccgg 1980
gagaagatcg agaagatcct gaccttcaga atcccatact acgtcggccc tctcgcgcgg 2040
gggaactcaa gattcgcttg gatgacccgc aagtctgagg agaccatcac gccgtggaac 2100
ttcgaggagg tggtggacaa gggcgctagc gctcagtcgt tcatcgagag gatgaccaac 2160
ttcgacaaga acctgcccaa cgagaaggtg ctccctaagc actcgctcct gtacgagtac 2220
ttcaccgtct acaacgagct cacgaaggtg aagtacgtca ccgagggcat gcgcaagcca 2280
gcgttcctgt ccggggagca gaagaaggct atcgtggacc tcctgttcaa gaccaaccgg 2340
aaggtcacgg ttaagcaact caaggaggac tacttcaaga agatcgagtg cttcgattcg 2400
gtcgagatca gcggcgttga ggaccgcttc aacgccagcc tcgggaccta ccacgatctc 2460
ctgaagatca tcaaggataa ggacttcctg gacaacgagg agaacgagga tatcctggag 2520
gacatcgtgc tgaccctcac gctgttcgag gacagggaga tgatcgagga gcgcctgaag 2580
acgtacgccc atctcttcga tgacaaggtc atgaagcaac tcaagcgccg gagatacacc 2640
ggctggggga ggctgtcccg caagctcatc aacggcatcc gggacaagca gtccgggaag 2700
accatcctcg acttcctcaa gagcgatggc ttcgccaaca ggaacttcat gcaactgatc 2760
cacgatgaca gcctcacctt caaggaggat atccaaaagg ctcaagtgag cggccagggg 2820
gactcgctgc acgagcatat cgcgaacctc gctggctccc ccgcgatcaa gaagggcatc 2880
ctccagaccg tgaaggttgt ggacgagctc gtgaaggtca tgggccggca caagcctgag 2940
aacatcgtca tcgagatggc cagagagaac caaaccacgc agaaggggca aaagaactct 3000
agggagcgca tgaagcgcat cgaggagggc atcaaggagc tggggtccca aatcctcaag 3060
gagcacccag tggagaacac ccaactgcag aacgagaagc tctacctgta ctacctccag 3120
aacggcaggg atatgtacgt ggaccaagag ctggatatca accgcctcag cgattacgac 3180
gtcgatcata tcgttcccca gtctttcctg aaggatgact ccatcgacaa caaggtcctc 3240
accaggtcgg acaagaaccg cggcaagtca gataacgttc catctgagga ggtcgttaag 3300
aagatgaaga actactggag gcagctcctg aacgccaagc tgatcacgca aaggaagttc 3360
gacaacctca ccaaggctga gagaggcggg ctctcagagc tggacaaggc cggcttcatc 3420
aagcggcagc tggtcgagac cagacaaatc acgaagcacg ttgcgcaaat cctcgactct 3480
cggatgaaca cgaagtacga tgagaacgac aagctgatca gggaggttaa ggtgatcacc 3540
ctgaagtcta agctcgtctc cgacttcagg aaggatttcc agttctacaa ggttcgcgag 3600
atcaacaact accaccatgc ccatgacgct tacctcaacg ctgtggtcgg caccgctctg 3660
atcaagaagt acccaaagct ggagtccgag ttcgtgtacg gggactacaa ggtttacgat 3720
gtgcgcaaga tgatcgccaa gtcggagcaa gagatcggca aggctaccgc caagtacttc 3780
ttctactcaa acatcatgaa cttcttcaag accgagatca cgctggccaa cggcgagatc 3840
cggaagagac cgctcatcga gaccaacggc gagacggggg agatcgtgtg ggacaagggc 3900
agggatttcg cgaccgtccg caaggttctc tccatgcccc aggtgaacat cgtcaagaag 3960
accgaggtcc aaacgggcgg gttctcaaag gagtctatcc tgcctaagcg gaacagcgac 4020
aagctcatcg ccagaaagaa ggactgggac ccaaagaagt acggcgggtt cgacagccct 4080
accgtggcct actcggtcct ggttgtggcg aaggttgaga agggcaagtc caagaagctc 4140
aagagcgtga aggagctcct ggggatcacc atcatggaga ggtccagctt cgagaagaac 4200
ccaatcgact tcctggaggc caagggctac aaggaggtga agaaggacct gatcatcaag 4260
ctcccgaagt actctctctt cgagctggag aacggcagga agagaatgct ggcttccgct 4320
ggcgagctcc agaaggggaa cgagctcgcg ctgccaagca agtacgtgaa cttcctctac 4380
ctggcttccc actacgagaa gctcaagggc agcccggagg acaacgagca aaagcagctg 4440
ttcgtcgagc agcacaagca ttacctcgac gagatcatcg agcaaatctc cgagttcagc 4500
aagcgcgtga tcctcgccga cgcgaacctg gataaggtcc tctccgccta caacaagcac 4560
cgggacaagc ccatcagaga gcaagcggag aacatcatcc atctcttcac cctgacgaac 4620
ctcggcgctc ctgctgcttt caagtacttc gacaccacga tcgatcggaa gagatacacc 4680
tccacgaagg aggtcctgga cgcgaccctc atccaccagt cgatcaccgg cctgtacgag 4740
acgaggatcg acctctcaca actcggcggg gataagagac ccgcagcaac caagaaggca 4800
gggcaagcaa agaagaagaa gacgcgtgac tccggcggca gcaccaacct gtccgacatc 4860
atcgagaagg agacgggcaa gcaactcgtg atccaggaga gcatcctcat gctgccagag 4920
gaggtggagg aggtcatcgg caacaagcca gagtccgaca tcctggtgca caccgcctac 4980
gacgagtcca ccgacgagaa cgtcatgctc ctgaccagcg acgccccaga gtacaagcca 5040
tgggccctcg tcatccagga cagcaacggg gagaacaaga tcaagatgct gtcggggggg 5100
agcccaaaga agaagcggaa ggtgtag 5127
<210> 11
<211> 5106
<212> DNA
<213> artificial sequence
<220>
<223> A3A-PBE-N57G
<400> 11
atggaggcca gcccggctag cggcccaagg catctcatgg acccgcacat cttcaccagc 60
aacttcaaca acggcatcgg caggcacaag acctacttgt gctacgaggt ggagaggctc 120
gacaacggaa cctccgtgaa gatggaccaa cacagggggt tcctccacgg ccaagccaag 180
aacctcctct gcggcttcta cggcaggcac gccgagttga ggttcctcga cttggtgcca 240
tccctccaac tcgatccagc ccaaatctac cgcgtgacct ggttcatctc ctggtcccca 300
tgcttctcct ggggttgcgc cggcgaggtt cgggctttcc tccaagaaaa cacccacgtc 360
cgcctccgca ttttcgccgc caggatctat gattacgacc ctctctacaa ggaggccctc 420
cagatgctgc gggacgccgg tgctcaggtg agtatcatga cctacgacga gttcaagcac 480
tgctgggaca ccttcgttga ccaccagggc tgcccattcc aaccatggga cggtctggat 540
gaacacagcc aagccttgtc cggcaggctc cgggccatcc tccaaaacca ggggaactcc 600
gggagcgaga cgccaggcac ctccgagtcg gccaccccag aatctcttaa ggacaagaag 660
tactcgatcg gcctcgccat cgggacgaac tcagttggct gggccgtgat caccgacgag 720
tacaaggtgc cctctaagaa gttcaaggtc ctggggaaca ccgaccgcca ttccatcaag 780
aagaacctca tcggcgctct cctgttcgac agcggggaga ccgctgaggc tacgaggctc 840
aagagaaccg ctaggcgccg gtacacgaga aggaagaaca ggatctgcta cctccaagag 900
attttctcca acgagatggc caaggttgac gattcattct tccaccgcct ggaggagtct 960
ttcctcgtgg aggaggataa gaagcacgag cggcatccca tcttcggcaa catcgtggac 1020
gaggttgcct accacgagaa gtaccctacg atctaccatc tgcggaagaa gctcgtggac 1080
tccaccgata aggcggacct cagactgatc tacctcgctc tggcccacat gatcaagttc 1140
cgcggccatt tcctgatcga gggggatctc aacccagaca acagcgatgt tgacaagctg 1200
ttcatccaac tcgtgcagac ctacaaccaa ctcttcgagg agaacccgat caacgcctct 1260
ggcgtggacg cgaaggctat cctgtccgcg aggctctcga agtccaggag gctggagaac 1320
ctgatcgctc agctcccagg cgagaagaag aacggcctgt tcgggaacct catcgctctc 1380
agcctggggc tcaccccgaa cttcaagtcg aacttcgatc tcgctgagga cgccaagctg 1440
caactctcca aggacaccta cgacgatgac ctcgataacc tcctggccca gatcggcgat 1500
caatacgcgg acctgttcct cgctgccaag aacctgtcgg acgccatcct cctgtcagat 1560
atcctccgcg tgaacaccga gatcacgaag gctccactct ctgcctccat gatcaagcgc 1620
tacgacgagc accatcagga tctgaccctc ctgaaggcgc tggtccgcca acagctcccg 1680
gagaagtaca aggagatttt cttcgatcag tcgaagaacg gctacgctgg gtacatcgac 1740
ggcggggcct cacaagagga gttctacaag ttcatcaagc caatcctgga gaagatggac 1800
ggcacggagg agctcctggt gaagctcaac agggaggacc tcctgcggaa gcagagaacc 1860
ttcgataacg gcagcatccc ccaccaaatc catctcgggg agctgcacgc catcctgaga 1920
aggcaagagg acttctaccc tttcctcaag gataaccggg agaagatcga gaagatcctg 1980
accttcagaa tcccatacta cgtcggccct ctcgcgcggg ggaactcaag attcgcttgg 2040
atgacccgca agtctgagga gaccatcacg ccgtggaact tcgaggaggt ggtggacaag 2100
ggcgctagcg ctcagtcgtt catcgagagg atgaccaact tcgacaagaa cctgcccaac 2160
gagaaggtgc tccctaagca ctcgctcctg tacgagtact tcaccgtcta caacgagctc 2220
acgaaggtga agtacgtcac cgagggcatg cgcaagccag cgttcctgtc cggggagcag 2280
aagaaggcta tcgtggacct cctgttcaag accaaccgga aggtcacggt taagcaactc 2340
aaggaggact acttcaagaa gatcgagtgc ttcgattcgg tcgagatcag cggcgttgag 2400
gaccgcttca acgccagcct cgggacctac cacgatctcc tgaagatcat caaggataag 2460
gacttcctgg acaacgagga gaacgaggat atcctggagg acatcgtgct gaccctcacg 2520
ctgttcgagg acagggagat gatcgaggag cgcctgaaga cgtacgccca tctcttcgat 2580
gacaaggtca tgaagcaact caagcgccgg agatacaccg gctgggggag gctgtcccgc 2640
aagctcatca acggcatccg ggacaagcag tccgggaaga ccatcctcga cttcctcaag 2700
agcgatggct tcgccaacag gaacttcatg caactgatcc acgatgacag cctcaccttc 2760
aaggaggata tccaaaaggc tcaagtgagc ggccaggggg actcgctgca cgagcatatc 2820
gcgaacctcg ctggctcccc cgcgatcaag aagggcatcc tccagaccgt gaaggttgtg 2880
gacgagctcg tgaaggtcat gggccggcac aagcctgaga acatcgtcat cgagatggcc 2940
agagagaacc aaaccacgca gaaggggcaa aagaactcta gggagcgcat gaagcgcatc 3000
gaggagggca tcaaggagct ggggtcccaa atcctcaagg agcacccagt ggagaacacc 3060
caactgcaga acgagaagct ctacctgtac tacctccaga acggcaggga tatgtacgtg 3120
gaccaagagc tggatatcaa ccgcctcagc gattacgacg tcgatcatat cgttccccag 3180
tctttcctga aggatgactc catcgacaac aaggtcctca ccaggtcgga caagaaccgc 3240
ggcaagtcag ataacgttcc atctgaggag gtcgttaaga agatgaagaa ctactggagg 3300
cagctcctga acgccaagct gatcacgcaa aggaagttcg acaacctcac caaggctgag 3360
agaggcgggc tctcagagct ggacaaggcc ggcttcatca agcggcagct ggtcgagacc 3420
agacaaatca cgaagcacgt tgcgcaaatc ctcgactctc ggatgaacac gaagtacgat 3480
gagaacgaca agctgatcag ggaggttaag gtgatcaccc tgaagtctaa gctcgtctcc 3540
gacttcagga aggatttcca gttctacaag gttcgcgaga tcaacaacta ccaccatgcc 3600
catgacgctt acctcaacgc tgtggtcggc accgctctga tcaagaagta cccaaagctg 3660
gagtccgagt tcgtgtacgg ggactacaag gtttacgatg tgcgcaagat gatcgccaag 3720
tcggagcaag agatcggcaa ggctaccgcc aagtacttct tctactcaaa catcatgaac 3780
ttcttcaaga ccgagatcac gctggccaac ggcgagatcc ggaagagacc gctcatcgag 3840
accaacggcg agacggggga gatcgtgtgg gacaagggca gggatttcgc gaccgtccgc 3900
aaggttctct ccatgcccca ggtgaacatc gtcaagaaga ccgaggtcca aacgggcggg 3960
ttctcaaagg agtctatcct gcctaagcgg aacagcgaca agctcatcgc cagaaagaag 4020
gactgggacc caaagaagta cggcgggttc gacagcccta ccgtggccta ctcggtcctg 4080
gttgtggcga aggttgagaa gggcaagtcc aagaagctca agagcgtgaa ggagctcctg 4140
gggatcacca tcatggagag gtccagcttc gagaagaacc caatcgactt cctggaggcc 4200
aagggctaca aggaggtgaa gaaggacctg atcatcaagc tcccgaagta ctctctcttc 4260
gagctggaga acggcaggaa gagaatgctg gcttccgctg gcgagctcca gaaggggaac 4320
gagctcgcgc tgccaagcaa gtacgtgaac ttcctctacc tggcttccca ctacgagaag 4380
ctcaagggca gcccggagga caacgagcaa aagcagctgt tcgtcgagca gcacaagcat 4440
tacctcgacg agatcatcga gcaaatctcc gagttcagca agcgcgtgat cctcgccgac 4500
gcgaacctgg ataaggtcct ctccgcctac aacaagcacc gggacaagcc catcagagag 4560
caagcggaga acatcatcca tctcttcacc ctgacgaacc tcggcgctcc tgctgctttc 4620
aagtacttcg acaccacgat cgatcggaag agatacacct ccacgaagga ggtcctggac 4680
gcgaccctca tccaccagtc gatcaccggc ctgtacgaga cgaggatcga cctctcacaa 4740
ctcggcgggg ataagagacc cgcagcaacc aagaaggcag ggcaagcaaa gaagaagaag 4800
acgcgtgact ccggcggcag caccaacctg tccgacatca tcgagaagga gacgggcaag 4860
caactcgtga tccaggagag catcctcatg ctgccagagg aggtggagga ggtcatcggc 4920
aacaagccag agtccgacat cctggtgcac accgcctacg acgagtccac cgacgagaac 4980
gtcatgctcc tgaccagcga cgccccagag tacaagccat gggccctcgt catccaggac 5040
agcaacgggg agaacaagat caagatgctg tcggggggga gcccaaagaa gaagcggaag 5100
gtgtag 5106
<210> 12
<211> 1701
<212> PRT
<213> artificial sequence
<220>
<223> A3A-PBE
<400> 12
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Thr Arg Asp Ser Gly Gly Ser Thr
1595 1600 1605
Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
1610 1615 1620
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val
1625 1630 1635
Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr
1640 1645 1650
Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala
1655 1660 1665
Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly
1670 1675 1680
Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys Lys
1685 1690 1695
Arg Lys Val
1700
<210> 13
<211> 2002
<212> PRT
<213> artificial sequence
<220>
<223> A3A-Gam
<400> 13
Met Ala Lys Pro Ala Lys Arg Ile Lys Ser Ala Ala Ala Ala Tyr Val
1 5 10 15
Pro Gln Asn Arg Asp Ala Val Ile Thr Asp Ile Lys Arg Ile Gly Asp
20 25 30
Leu Gln Arg Glu Ala Ser Arg Leu Glu Thr Glu Met Asn Asp Ala Ile
35 40 45
Ala Glu Ile Thr Glu Lys Phe Ala Ala Arg Ile Ala Pro Ile Lys Thr
50 55 60
Asp Ile Glu Thr Leu Ser Lys Gly Val Gln Gly Trp Cys Glu Ala Asn
65 70 75 80
Arg Asp Glu Leu Thr Asn Gly Gly Lys Val Lys Thr Ala Asn Leu Val
85 90 95
Thr Gly Asp Val Ser Trp Arg Val Arg Pro Pro Ser Val Ser Ile Arg
100 105 110
Gly Met Asp Ala Val Met Glu Thr Leu Glu Arg Leu Gly Leu Gln Arg
115 120 125
Phe Ile Arg Thr Lys Gln Glu Ile Asn Lys Glu Ala Ile Leu Leu Glu
130 135 140
Pro Lys Ala Val Ala Gly Val Ala Gly Ile Thr Val Lys Ser Gly Ile
145 150 155 160
Glu Asp Phe Ser Ile Ile Pro Phe Glu Gln Glu Ala Gly Ile Ser Gly
165 170 175
Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Arg Pro
180 185 190
Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His Ile
195 200 205
Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr Leu
210 215 220
Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met Asp
225 230 235 240
Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys Gly
245 250 255
Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro Ser
260 265 270
Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile Ser
275 280 285
Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala Phe
290 295 300
Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg Ile
305 310 315 320
Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg Asp
325 330 335
Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His Cys
340 345 350
Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp Asp
355 360 365
Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala Ile
370 375 380
Leu Gln Asn Gln Gly Asn Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly
385 390 395 400
Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly
405 410 415
Gly Ser Ser Gly Gly Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly Leu
420 425 430
Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr
435 440 445
Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His
450 455 460
Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu
465 470 475 480
Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr
485 490 495
Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu
500 505 510
Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe
515 520 525
Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn
530 535 540
Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His
545 550 555 560
Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu
565 570 575
Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu
580 585 590
Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe
595 600 605
Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile
610 615 620
Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser
625 630 635 640
Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys
645 650 655
Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr
660 665 670
Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln
675 680 685
Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln
690 695 700
Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser
705 710 715 720
Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr
725 730 735
Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His
740 745 750
Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu
755 760 765
Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly
770 775 780
Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys
785 790 795 800
Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu
805 810 815
Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser
820 825 830
Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg
835 840 845
Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu
850 855 860
Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg
865 870 875 880
Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile
885 890 895
Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln
900 905 910
Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu
915 920 925
Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr
930 935 940
Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro
945 950 955 960
Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe
965 970 975
Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe
980 985 990
Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp
995 1000 1005
Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
1010 1015 1020
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile
1025 1030 1035
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu
1040 1045 1050
Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp
1055 1060 1065
Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly
1070 1075 1080
Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser
1085 1090 1095
Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn
1100 1105 1110
Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
1115 1120 1125
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
1130 1135 1140
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys
1145 1150 1155
Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val
1160 1165 1170
Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg
1175 1180 1185
Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg
1190 1195 1200
Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile
1205 1210 1215
Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys
1220 1225 1230
Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp
1235 1240 1245
Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His
1250 1255 1260
Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
1265 1270 1275
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val
1280 1285 1290
Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln
1295 1300 1305
Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu
1310 1315 1320
Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly
1325 1330 1335
Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His
1340 1345 1350
Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
1355 1360 1365
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
1370 1375 1380
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val
1385 1390 1395
Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn
1400 1405 1410
Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu
1415 1420 1425
Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys
1430 1435 1440
Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys
1445 1450 1455
Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile
1460 1465 1470
Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr
1475 1480 1485
Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe
1490 1495 1500
Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val
1505 1510 1515
Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile
1520 1525 1530
Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp
1535 1540 1545
Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala
1550 1555 1560
Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys
1565 1570 1575
Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu
1580 1585 1590
Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys
1595 1600 1605
Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys
1610 1615 1620
Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala
1625 1630 1635
Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser
1640 1645 1650
Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu
1655 1660 1665
Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu
1670 1675 1680
Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu
1685 1690 1695
Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val
1700 1705 1710
Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln
1715 1720 1725
Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala
1730 1735 1740
Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
1745 1750 1755
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln
1760 1765 1770
Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu
1775 1780 1785
Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala
1790 1795 1800
Lys Lys Lys Lys Thr Arg Ser Gly Gly Ser Gly Gly Ser Thr Asn
1805 1810 1815
Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile
1820 1825 1830
Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile
1835 1840 1845
Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp
1850 1855 1860
Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro
1865 1870 1875
Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu
1880 1885 1890
Asn Lys Ile Lys Met Leu Ser Gly Thr Arg Asp Ser Gly Gly Ser
1895 1900 1905
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu
1910 1915 1920
Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu
1925 1930 1935
Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala
1940 1945 1950
Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp
1955 1960 1965
Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn
1970 1975 1980
Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys
1985 1990 1995
Lys Arg Lys Val
2000
<210> 14
<211> 1600
<212> PRT
<213> artificial sequence
<220>
<223> A3A-PBE-ΔUGI
<400> 14
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys
1595 1600
<210> 15
<211> 1708
<212> PRT
<213> artificial sequence
<220>
<223> A3A-PBE-NLS
<400> 15
Met Pro Lys Lys Lys Arg Lys Val Glu Ala Ser Pro Ala Ser Gly Pro
1 5 10 15
Arg His Leu Met Asp Pro His Ile Phe Thr Ser Asn Phe Asn Asn Gly
20 25 30
Ile Gly Arg His Lys Thr Tyr Leu Cys Tyr Glu Val Glu Arg Leu Asp
35 40 45
Asn Gly Thr Ser Val Lys Met Asp Gln His Arg Gly Phe Leu His Asn
50 55 60
Gln Ala Lys Asn Leu Leu Cys Gly Phe Tyr Gly Arg His Ala Glu Leu
65 70 75 80
Arg Phe Leu Asp Leu Val Pro Ser Leu Gln Leu Asp Pro Ala Gln Ile
85 90 95
Tyr Arg Val Thr Trp Phe Ile Ser Trp Ser Pro Cys Phe Ser Trp Gly
100 105 110
Cys Ala Gly Glu Val Arg Ala Phe Leu Gln Glu Asn Thr His Val Arg
115 120 125
Leu Arg Ile Phe Ala Ala Arg Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys
130 135 140
Glu Ala Leu Gln Met Leu Arg Asp Ala Gly Ala Gln Val Ser Ile Met
145 150 155 160
Thr Tyr Asp Glu Phe Lys His Cys Trp Asp Thr Phe Val Asp His Gln
165 170 175
Gly Cys Pro Phe Gln Pro Trp Asp Gly Leu Asp Glu His Ser Gln Ala
180 185 190
Leu Ser Gly Arg Leu Arg Ala Ile Leu Gln Asn Gln Gly Asn Ser Gly
195 200 205
Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Leu Lys
210 215 220
Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly
225 230 235 240
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
245 250 255
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
260 265 270
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
275 280 285
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
290 295 300
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
305 310 315 320
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
325 330 335
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
340 345 350
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
355 360 365
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
370 375 380
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
385 390 395 400
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
405 410 415
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
420 425 430
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
435 440 445
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
450 455 460
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
465 470 475 480
Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
485 490 495
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
500 505 510
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
515 520 525
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
530 535 540
Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
545 550 555 560
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
565 570 575
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
580 585 590
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
595 600 605
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
610 615 620
Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
625 630 635 640
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
645 650 655
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
660 665 670
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
675 680 685
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
690 695 700
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
705 710 715 720
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
725 730 735
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
740 745 750
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
755 760 765
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
770 775 780
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
785 790 795 800
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
805 810 815
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
820 825 830
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
835 840 845
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
850 855 860
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
865 870 875 880
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
885 890 895
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
900 905 910
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
915 920 925
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
930 935 940
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
945 950 955 960
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
965 970 975
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
980 985 990
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
995 1000 1005
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
1010 1015 1020
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr
1025 1030 1035
Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile
1040 1045 1050
Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser
1055 1060 1065
Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser
1070 1075 1080
Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val
1085 1090 1095
Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys
1100 1105 1110
Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg
1115 1120 1125
Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln
1130 1135 1140
Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
1145 1150 1155
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile
1160 1165 1170
Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp
1175 1180 1185
Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn
1190 1195 1200
Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr
1205 1210 1215
Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr
1220 1225 1230
Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser
1235 1240 1245
Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser
1250 1255 1260
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly
1265 1270 1275
Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly
1280 1285 1290
Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys
1295 1300 1305
Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val
1310 1315 1320
Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn
1325 1330 1335
Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys
1340 1345 1350
Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val
1355 1360 1365
Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val
1370 1375 1380
Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu
1385 1390 1395
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val
1400 1405 1410
Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu
1415 1420 1425
Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu
1430 1435 1440
Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe
1445 1450 1455
Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu
1460 1465 1470
Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr
1475 1480 1485
Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val
1490 1495 1500
Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn
1505 1510 1515
Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile
1520 1525 1530
His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys
1535 1540 1545
Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys
1550 1555 1560
Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu
1565 1570 1575
Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg
1580 1585 1590
Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Thr
1595 1600 1605
Arg Asp Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys
1610 1615 1620
Glu Thr Gly Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu
1625 1630 1635
Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp
1640 1645 1650
Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val
1655 1660 1665
Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu
1670 1675 1680
Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser
1685 1690 1695
Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1700 1705
<210> 16
<211> 1701
<212> PRT
<213> artificial sequence
<220>
<223> A3A-PBE-N57G
<400> 16
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Gly Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Thr Arg Asp Ser Gly Gly Ser Thr
1595 1600 1605
Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
1610 1615 1620
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val
1625 1630 1635
Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr
1640 1645 1650
Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala
1655 1660 1665
Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly
1670 1675 1680
Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys Lys
1685 1690 1695
Arg Lys Val
1700
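
The listing above gives each protein in three-letter amino acid codes with position rulers underneath. As an illustrative aid only (it is not part of the patent), the following Python sketch converts such a block into a one-letter string and looks up the residue at a chosen position. The helper name `parse_listing` and the variable `SEQ16_HEAD` are hypothetical; the data are the first 64 residues of SEQ ID NO: 16 copied from the listing.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing(text):
    """Hypothetical helper: turn a three-letter listing block into a one-letter string."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, ruler lines (digits only), and <2xx>/<4xx> header lines.
        if not tokens or all(t.isdigit() for t in tokens) or tokens[0].startswith("<"):
            continue
        residues.extend(THREE_TO_ONE[t] for t in tokens)
    return "".join(residues)


# First four residue lines of SEQ ID NO: 16 (A3A-PBE-N57G), with their rulers.
SEQ16_HEAD = """\
<400> 16
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Gly Gln Ala Lys Asn Leu Leu Cys
50 55 60
"""

seq = parse_listing(SEQ16_HEAD)
print(len(seq))     # 64 residues parsed
print(seq[57 - 1])  # G: residue at 1-based position 57
```

In this excerpt, position 57 is Gly, which matches the N57G label of SEQ ID NO: 16. In SEQ ID NO: 15 the corresponding residue is Asn at position 64, because that construct carries seven additional N-terminal residues (the NLS) after the initial Met.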

Claims (27)

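1. A method of producing a genetically modified plant, comprising introducing into the plant a system for base editing of a target sequence in the genome of a plant cell, the system comprising at least one of the following i) to v):
i) a base-editing fusion protein, and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;
iii) a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein and a nucleotide sequence encoding a guide RNA;
wherein the base-editing fusion protein comprises a nuclease-inactivated CRISPR effector protein and an APOBEC3A deaminase, and the guide RNA is capable of targeting the base-editing fusion protein to the target sequence in the genome of the plant cell, whereby the base-editing fusion protein causes one or more C in the target sequence to be substituted with T,
wherein the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cas9 or a nuclease-inactivated LbCpf1, the nuclease-inactivated Cas9 consisting of the amino acid sequence of SEQ ID NO: 4.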
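2. The method of claim 1, wherein the APOBEC3A deaminase consists of an amino acid sequence comprising an N57G substitution relative to SEQ ID NO: 2.
3. The method of claim 1, wherein the APOBEC3A deaminase is fused to the N-terminus of the nuclease-inactivated CRISPR effector protein.
4. The method of claim 1, wherein the APOBEC3A deaminase and the nuclease-inactivated CRISPR effector protein are fused via a linker.
5. The method of claim 1, wherein the base-editing fusion protein further comprises a nuclear localization sequence (NLS) at its N-terminus and/or C-terminus.
6. The method of claim 1, wherein the base-editing fusion protein further comprises a UGI sequence.
7. The method of claim 6, wherein the UGI amino acid sequence is shown in SEQ ID NO: 5.
8. The method of claim 1, wherein the base-editing fusion protein further comprises a Gam protein sequence.
9. The method of claim 8, wherein the Gam protein amino acid sequence is shown in SEQ ID NO: 6.
10. The method of claim 1, wherein the base-editing fusion protein consists of an amino acid sequence encoded by the nucleotide sequence shown in any one of SEQ ID NOs: 7-11, or consists of the amino acid sequence shown in any one of SEQ ID NOs: 12-16.
11. The method of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein is codon-optimized for the plant to be base-edited.
12. The method of claim 11, wherein the nucleotide sequence encoding the base-editing fusion protein is shown in any one of SEQ ID NOs: 7-9.
13. The method of claim 1, wherein the guide RNA is a single guide RNA (sgRNA).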
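14. The method of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to a plant expression regulatory element.
15. The method of claim 14, wherein the regulatory element is a promoter.
16. The method of claim 15, wherein the promoter is a 35S promoter, a maize Ubi-1 promoter, a wheat U6 promoter, a rice U3 promoter, or a maize U3 promoter.
17. The method of claim 1, wherein the target region of the guide RNA is 20 nucleotides in length.
18. The method of claim 1, wherein the introduction is performed in the absence of selection pressure.
19. The method of claim 1, further comprising screening for plants having the desired nucleotide substitution.
20. The method of claim 1, wherein the plant is selected from monocotyledonous and dicotyledonous plants.
21. The method of claim 1, wherein the plant is wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava, or potato.
22. The method of claim 1, wherein the target sequence is associated with a plant trait, whereby the base editing results in the plant having an altered trait relative to a wild-type plant.
23. The method of claim 1, wherein the system is introduced by transient transformation.
24. The method of claim 1, wherein the system is introduced into the plant by a method selected from: particle bombardment, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, virus-mediated transformation, the pollen tube pathway method, and ovary injection.
25. The method of claim 1, further comprising obtaining progeny of the genetically modified plant.
26. The method of claim 1, wherein no exogenous DNA is integrated into the genome of the modified plant.
27. A plant breeding method, comprising crossing a genetically modified first plant obtained by the method of any one of claims 1-26 with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.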
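
Claims 1 and 17 describe a guide RNA with a 20-nucleotide target region directing the fusion protein to a target sequence, where one or more C are replaced by T. The Python sketch below models only that outcome on a DNA string; it is not the patent's method and makes no claim about real editing efficiency. The function names, the example target, and the `window` parameter (a 1-based inclusive range of editable positions) are assumptions introduced for illustration, since the claims do not specify an editing window.

```python
from typing import Tuple

TARGET_LENGTH = 20  # claim 17: the guide RNA target region is 20 nucleotides long


def c_to_t_edit(target: str, window: Tuple[int, int] = (1, TARGET_LENGTH)) -> str:
    """Replace every C inside the 1-based inclusive window with T (idealized outcome)."""
    target = target.upper()
    if len(target) != TARGET_LENGTH:
        raise ValueError("target region must be 20 nucleotides long")
    start, end = window
    edited = []
    for pos, base in enumerate(target, start=1):
        # Only cytosines inside the assumed window are converted.
        edited.append("T" if base == "C" and start <= pos <= end else base)
    return "".join(edited)


def edit_genome(genome: str, target: str, window: Tuple[int, int]) -> str:
    """Apply the idealized edit at the first occurrence of target in genome."""
    genome = genome.upper()
    target = target.upper()
    idx = genome.find(target)
    if idx < 0:
        return genome  # target not present: nothing to edit
    return genome[:idx] + c_to_t_edit(target, window) + genome[idx + TARGET_LENGTH:]


# Hypothetical 20-nt target; C at positions 3, 4, 7, 12, 14 and 15.
example_target = "GACCTTCAGGACTCCAAGTG"
print(c_to_t_edit(example_target, (1, 10)))  # GATTTTTAGGACTCCAAGTG
print(edit_genome("AAAA" + example_target + "TTTT", example_target, (1, 10)))
```

A real editor converts only a fraction of the cytosines in its window, and the outcome varies by site; this sketch shows the all-or-nothing limiting case so that the position bookkeeping is easy to follow.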
CN201980049597.XA 2018-07-24 2019-07-24 Base editor based on human APOBEC3A deaminase and use thereof Active CN112805385B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2018108166037 2018-07-24
CN201810816603 2018-07-24
PCT/CN2019/097398 WO2020020193A1 (zh) 2018-07-24 2019-07-24 Base editor based on human APOBEC3A deaminase and use thereof

Publications (2)

Publication Number Publication Date
CN112805385A CN112805385A (zh) 2021-05-14
CN112805385B CN112805385B (zh) 2023-05-30

Family

ID=69182103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980049597.XA Active CN112805385B (zh) 2018-07-24 2019-07-24 Base editor based on human APOBEC3A deaminase and use thereof

Country Status (2)

Country Link
CN (1) CN112805385B (zh)
WO (1) WO2020020193A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
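CN114317590B (zh) * 2020-09-30 2024-01-16 Beijing Academy of Agriculture and Forestry Sciences Method for mutating base C to base T in a plant genome
CN117043345A (zh) * 2021-03-09 2023-11-10 Suzhou Qihe Biotechnology Co., Ltd. Improved CG base editing system
CN115678900A (zh) * 2021-07-30 2023-02-03 Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences Method for narrowing the editing window of a base editor, base editor, and use thereof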

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018071868A1 (en) * 2016-10-14 2018-04-19 President And Fellows Of Harvard College Aav delivery of nucleobase editors

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL294014B2 (en) * 2015-10-23 2024-07-01 Harvard College Nucleobase editors and their uses
JP2019523011A (ja) * 2016-11-14 2019-08-22 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences Method for base editing in plants

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018071868A1 (en) * 2016-10-14 2018-04-19 President And Fellows Of Harvard College Aav delivery of nucleobase editors

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A fluorescent reporter for quantification and enrichment of DNA editing by APOBEC-Cas9 or cleavage by Cas9 in living cell; Martin et al.; Nucleic Acids Research; 2018-05-09; Vol. 46, No. 14; abstract, page 5, left column, paragraph 2 to right column, paragraph 1, Figure 1 *
Efficient genome modification by CRISPR-Cas9 nickase with minimal off-target effects; Bin Shen et al.; Nature Methods; 2014-03-02; Vol. 11, No. 4; pages 399-404 *
High-precision CRISPR-Cas9 base editors with minimized bystander and off-target mutations; Gehrke et al.; bioRxiv; 2018-03-01; page 6, paragraph 1 *
WP_001107930.1; GenBank; 2013-08-28; full text *

Also Published As

Publication number Publication date
CN112805385A (zh) 2021-05-14
WO2020020193A1 (zh) 2020-01-30

Similar Documents

Publication Publication Date Title
US11820990B2 (en) Method for base editing in plants
CN108866092B (zh) Production of herbicide-resistant genes and uses thereof
WO2019120310A1 (en) Base editing system and method based on cpf1 protein
JP2019523011A (ja) Method for base editing in plants
US20220333126A1 (en) Methods and compositions for herbicide tolerance in plants
US20200199609A1 (en) Compositions and methods for stature modification in plants
CN112805385B (zh) Base editor based on human APOBEC3A deaminase and use thereof
CN114945670A (zh) A base editing system and method of use thereof
EP3262177A1 (en) Haploid induction
US20220346341A1 (en) Methods and compositions to increase yield through modifications of fea3 genomic locus and associated ligands
US20200340009A1 (en) Cenh3 deletion mutants
WO2019161147A9 (en) Methods and compositions for increasing harvestable yield via editing ga20 oxidase genes to generate short stature plants
CN114395580B (zh) Gene for controlling maize plant height
EP4130257A1 (en) Improved cytosine base editing system
US20210155949A1 (en) Improving agronomic characteristics in maize by modification of endogenous mads box transcription factors
WO2018228348A1 (en) Methods to improve plant agronomic trait using bcs1l gene and guide rna/cas endonuclease systems
WO2023115030A2 (en) Lodging resistance in eragrostis tef
CN115843314A (zh) Method for obtaining wheat with increased powdery mildew resistance
CN114174518A (zh) Abiotic stress tolerant plants and methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220627

Address after: Room D340, F3, building 2, No. 2250, Pudong South Road, Pudong New Area, Shanghai 200120

Applicant after: Shanghai Blue Cross Medical Science Research Institute

Address before: 100101 courtyard 1, Beichen West Road, Chaoyang District, Beijing

Applicant before: INSTITUTE OF GENETICS AND DEVELOPMENTAL BIOLOGY, CHINESE ACADEMY OF SCIENCES

TA01 Transfer of patent application right

Effective date of registration: 20220921

Address after: Unit E598, 5th Floor, Lecheng Plaza, Phase II, Biomedical Industrial Park, No. 218, Sangtian Street, Suzhou Industrial Park, Suzhou Area, China (Jiangsu) Pilot Free Trade Zone, Suzhou City, Jiangsu Province, 215127

Applicant after: Suzhou Qihe Biotechnology Co.,Ltd.

Address before: Room D340, F3, building 2, No. 2250, Pudong South Road, Pudong New Area, Shanghai 200120

Applicant before: Shanghai Blue Cross Medical Science Research Institute

GR01 Patent grant