CN109295053A - Method for regulating RNA splicing by inducing base mutations at splice sites or base substitutions in the polypyrimidine tract

Info

Publication number
CN109295053A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810819909.8A
Other languages
English (en)
Other versions
CN109295053B (zh)
Inventor
常兴
袁娟娟
马云青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institutes for Biological Sciences SIBS of CAS
Original Assignee
Shanghai Institutes for Biological Sciences SIBS of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institutes for Biological Sciences SIBS of CAS
Publication of CN109295053A
Application granted
Publication of CN109295053B
Active
Anticipated expiration

Classifications

    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C12N15/102 Mutagenizing nucleic acids
    • A61K31/7088 Compounds having three or more nucleosides or nucleotides
    • A61K38/465 Hydrolases (3) acting on ester bonds (3.1), e.g. lipases, ribonucleases
    • A61K38/50 Hydrolases (3) acting on carbon-nitrogen bonds, other than peptide bonds (3.5), e.g. asparaginase
    • A61K48/0066 Manipulation of the nucleic acid to modify its expression pattern, e.g. enhance its duration of expression, achieved by the presence of particular introns in the delivered nucleic acid
    • A61P21/00 Drugs for disorders of the muscular or neuromuscular system
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/111 General methods applicable to biologically active non-coding nucleic acids
    • C12N15/52 Genes encoding for enzymes or proenzymes
    • C12N15/62 DNA sequences coding for fusion proteins
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C12Y305/04001 Cytosine deaminase (3.5.4.1)
    • A61K38/00 Medicinal preparations containing peptides
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2320/33 Alteration of splicing

Abstract

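Disclosed is a method for regulating RNA splicing by inducing base mutations at splice sites or base substitutions in the polypyrimidine tract. The method comprises expressing a targeted cytosine deaminase in a cell so as to induce, in an intron of interest of a gene of interest in that cell, mutation of the 3' splice site AG to AA, mutation of the 5' splice site GT to AT, or mutation of multiple Cs in the polypyrimidine tract to Ts. The method can specifically block exon recognition and regulate alternative splicing of endogenous mRNA: it can induce exon skipping, activate alternative splice sites, induce switching between mutually exclusive exons, induce intron inclusion, and enhance exon inclusion.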

Description

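Method for regulating RNA splicing by inducing base mutations at splice sites or base substitutions in the polypyrimidine tract
Technical Field
The present invention relates to a method for regulating RNA splicing by inducing base mutations at splice sites or base substitutions in the polypyrimidine tract.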
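Background
Correct expression of eukaryotic genes requires that introns be spliced out of the precursor mRNA and that the exons be joined to form mature mRNA. More than 98% of introns are removed by the spliceosome, a highly dynamic protein complex. The spliceosome comprises more than 150 components, including small nuclear ribonucleoproteins (snRNPs) such as U1, U2, U4, U5 and U6. During splicing, the U1 snRNP recognizes the GU sequence at the 5' splice site of the intron, splicing factor 1 (SF1) binds the intron branch point, the 35 kDa subunit of the U2 auxiliary factor (U2AF) binds the AG sequence at the 3' splice site, and the 65 kDa subunit binds the polypyrimidine tract, which completes exon recognition. U5 and U6 then catalyze removal of the intron by remodeling RNA structure and RNA-protein interactions. RNA splicing plays an important role in regulating gene expression. Studies have found that 15% of heritable human diseases are caused by aberrant pre-mRNA processing, so RNA splicing is a potential therapeutic target for these diseases. For example, using antisense oligonucleotides (ASOs) to modulate the splicing of disease-related genes provides some relief in Duchenne muscular dystrophy and spinal muscular atrophy.
Beyond intron removal, 75% of human genes undergo alternative RNA splicing during expression, which greatly increases the diversity of the human proteome. However, convenient and effective methods for regulating alternative splicing are still lacking, and the functions of most alternatively spliced protein isoforms remain unclear.
Antisense oligonucleotides can bind cis-acting elements in RNA (such as exonic splicing enhancers) and block splicing of an exon. However, ASO-based splicing regulation requires careful design and stringent screening, demands continuous dosing during treatment, and relies on costly synthesis, so it is expensive in both time and money. A therapeutic strategy capable of curing such diseases with a single treatment is therefore urgently needed.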
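Summary of the Invention
Provided herein is a method for regulating RNA splicing of a gene of interest in a cell, characterized in that the method comprises expressing a targeted cytosine deaminase in the cell to induce, in an intron of interest of the gene of interest, mutation of the 3' splice site AG to AA, mutation of the 5' splice site GT to AT, or mutation of multiple Cs in the polypyrimidine tract to Ts.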
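In one or more embodiments, the targeted cytosine deaminase used in the method described herein may be selected from:
(1) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cas enzyme whose nuclease activity is partially or completely abolished but whose helicase activity is retained;
(2) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a TALEN protein that specifically recognizes a target sequence;
(3) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a zinc finger protein that specifically recognizes a target sequence;
(4) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cpf enzyme whose nuclease activity is partially or completely abolished but whose helicase activity is retained; and
(5) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with an Ago protein.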
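In one or more embodiments, the targeted cytosine deaminase is a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cas enzyme or a Cpf enzyme whose nuclease activity is partially or completely abolished but whose helicase activity is retained. The method comprises expressing the targeted cytosine deaminase and an sgRNA in the cell, wherein the sgRNA is specifically recognized by the Cas enzyme or Cpf enzyme and binds to a sequence containing the splice site of the intron of interest of the gene of interest, or to the complementary sequence of the polypyrimidine tract of interest.
In one or more embodiments, the targeted cytosine deaminase is a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with an Ago protein; the method comprises the step of expressing in the cell the targeted cytosine deaminase and a gDNA recognized by the Ago protein.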
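In one or more embodiments, provided herein is a method for regulating RNA splicing of a gene of interest in a cell, comprising the step of expressing in the cell (1) a fusion protein of a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, with the cytosine deaminase AID or a mutant thereof, and (2) an sgRNA; wherein the Cas-recognition region of the sgRNA is specifically recognized by the Cas protein, and the sgRNA binds to a sequence containing the splice site of the intron of interest of the gene of interest, or to the complementary sequence of the polypyrimidine tract of interest.
In one or more embodiments, the sgRNA binds to a sequence containing the 5' splice site of the intron of interest of the gene of interest, and the fusion protein mutates the GT at the 5' splice site to AT, thereby inducing exon skipping, activating an alternative splice site, inducing switching between mutually exclusive exons, or inducing intron inclusion.
In one or more embodiments, the sgRNA binds to a sequence containing the 3' splice site of the intron of interest of the gene of interest, and the fusion protein mutates the AG at the 3' splice site to AA, thereby inducing exon skipping, activating an alternative splice site, inducing switching between mutually exclusive exons, or inducing intron inclusion.
In one or more embodiments, the sgRNA binds to the strand complementary to the polypyrimidine tract of interest and induces mutation of Cs in the polypyrimidine tract to Ts, thereby enhancing exon inclusion.
In one or more embodiments, expression vectors for the fusion protein and the sgRNA are introduced into the cell, thereby regulating RNA splicing of the gene of interest in that cell.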
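In one or more embodiments, the method further comprises the step of simultaneously introducing an expression plasmid for Ugi.
In one or more embodiments, the method further comprises the step of simultaneously introducing an expression plasmid for a fusion protein of a nuclease-deficient or partially nuclease-deficient Cas9 protein, AID or a mutant thereof, and Ugi.
In one or more embodiments, the fusion protein and the AID, its fragment or its mutant are as described in any part or any embodiment herein.
In one or more embodiments, the cell of interest and the gene of interest are as described in any part or any embodiment herein.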
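In certain embodiments, provided herein is a method for inducing exon skipping, comprising the step of expressing in a cell of interest (1) a fusion protein of a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, with the cytosine deaminase AID or a mutant thereof and optionally Ugi, and (2) an sgRNA; wherein the Cas-recognition region of the sgRNA is specifically recognized by the Cas protein, and the sgRNA binds to a sequence containing the splice site of the intron of interest of the gene of interest.
In certain embodiments, provided herein is a method for activating an alternative splice site, comprising the step of expressing in a cell of interest (1) a fusion protein of a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, with the cytosine deaminase AID or a mutant thereof and optionally Ugi, and (2) an sgRNA; wherein the Cas-recognition region of the sgRNA is specifically recognized by the Cas protein, the sgRNA binds to a sequence containing the splice site of the intron of interest of the gene of interest, and an alternative splice site is present near the intron of interest.
In certain embodiments, also provided herein is a method for inducing switching between mutually exclusive exons, comprising the step of expressing in a cell of interest (1) a fusion protein of a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, with the cytosine deaminase AID or a mutant thereof and optionally Ugi, and (2) an sgRNA; wherein the Cas-recognition region of the sgRNA is specifically recognized by the Cas protein, and the target-binding region of the sgRNA contains the sequence of the splice site of the intron of interest of the gene of interest; wherein the gene of interest is selected from PKM.
In certain embodiments, also provided herein is a method for inducing intron inclusion, comprising the step of expressing in a cell of interest (1) a fusion protein of a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, with the cytosine deaminase AID or a mutant thereof and optionally Ugi, and (2) an sgRNA; wherein the Cas-recognition region of the sgRNA is specifically recognized by the Cas protein, and the sgRNA contains the splice site of the intron of interest; wherein the intron of interest is short (<150 bp) and rich in G/C bases.
In certain embodiments, also provided herein is a method for enhancing exon inclusion, comprising the step of expressing in a cell of interest (1) a fusion protein of a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, with the cytosine deaminase AID or a mutant thereof and Ugi, and (2) an sgRNA; wherein the Cas-recognition region of the sgRNA is specifically recognized by the Cas protein, and the sgRNA contains the complementary sequence of the polypyrimidine tract upstream of the exon of interest.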
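Also provided herein is a fusion protein comprising a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, and the cytosine deaminase AID or a mutant thereof.
In one or more embodiments, the fusion protein herein further comprises Ugi.
Also provided herein is a fusion protein for generating point mutations in a cell, for regulating RNA splicing of a gene of interest in a cell, or for inducing exon skipping, activating alternative splice sites, inducing switching between mutually exclusive exons, inducing intron inclusion, or enhancing exon inclusion in a cell of interest. The fusion protein comprises a Cas protein, whose nuclease activity is partially or completely abolished but whose helicase activity is retained, and the cytosine deaminase AID or a mutant thereof, and optionally further comprises a linker sequence, a nuclear localization sequence and Ugi.
Also included herein are methods of treating disease using the method for regulating RNA splicing described herein.
Also provided herein is the use of the fusion protein described herein or its expression vector, together with the corresponding sgRNA or its expression vector, in preparing a kit for regulating RNA splicing, as well as a kit containing the fusion protein described herein or its expression vector and the corresponding sgRNA or its expression vector.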
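Brief Description of the Drawings
Figure 1: TAM induces skipping of CD45 exon 5 by converting the invariant guanine at the 3' splice site to adenine. (A) Schematic of using TAM to convert the guanine at the 3' splice site of the CD45RB exon to adenine and trigger exon skipping. In WT Raji cells, exon 5 of CD45 is constitutively spliced to produce the longest CD45 isoform (CD45RA+RB+RC+, top). TAM converts the AG dinucleotide at the 3'SS of exon 5 to AA, which eliminates the splice site and disrupts exon recognition, so that exon 5 is skipped and a CD45 isoform lacking CD45RB is produced (CD45RA+RC+, bottom). (B, C) TAM causes skipping of the CD45RB exon. Raji cells were transfected with expression plasmids for AIDx-nCas9-Ugi and either the target sgRNA (CD45-E5-3'SS) or a control sgRNA against AAVS1 (Ctrl). Seven days after transfection, expression of the targeted exon (CD45RB), its upstream exon (exon 4, CD45RA), its downstream exon (exon 6, CD45RC) and total CD45 was measured by flow cytometry with exon-specific antibodies (B), or expression of the corresponding exons was measured by exon-specific real-time PCR (C). Data are representative (B) or a summary (C) of two independent experiments. **, p<0.01 by Student's t-test. (D) G>A mutations at the 3'SS are enriched in CD45RBlow cells. The intron-exon junction was amplified from genomic DNA of the cells shown in B and of CD45RBhi and CD45RBlow cells sorted from TAM-treated cells. Amplicons were analyzed by high-throughput sequencing at more than 8000x coverage. The base composition of each nucleotide with detectable mutation (mutant reads/WT reads >0.1%) is depicted, and the G>A conversion percentage of the mutated Gs is labeled. The positions of the sgRNA and PAM sequences are shown above the intron-exon junction sequence. The dashed line marks the intron/exon junction. Data are representative of two independent experiments. (E) Flow cytometric analysis of CD45RB expression in control Raji cells or in CD45RBhi and CD45RBlow cells sorted from TAM-treated cells. (F) TAM induces CD45RB skipping without altering the coding sequence of CD45. As in D, the exon-intron junction was amplified from cDNA and base substitutions were analyzed by high-throughput sequencing. Note that, in contrast to genomic DNA, the two exonic mutations are undetectable in cDNA of TAM-treated cells.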
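Figure 2: TAM induces skipping of the CD45RB exon by converting the invariant guanine at the 5' splice site to adenine. (A) Schematic of directing TAM to convert the invariant guanine at the 5'SS of the CD45RB exon to adenine and trigger exon skipping. (B, C) TAM causes skipping of the CD45RB exon. Raji cells were transfected with expression plasmids for AIDx-nCas9-Ugi and either the target sgRNA (E5-5'SS) or a control sgRNA against AAVS1 (Ctrl). Seven days after transfection, expression of the targeted exon (CD45RB), its upstream exon (exon 4, CD45RA), its downstream exon (exon 6, CD45RC) and total CD45 was measured by flow cytometry with exon-specific antibodies (B) or by exon-specific real-time PCR (C). Data are representative (B) or a summary (C) of two independent experiments. **, p<0.01 by Student's t-test. (D) G>A mutations at the 5' site of the CD45RB exon are enriched in CD45RBlow cells. The intron-exon junction was amplified from the cells shown in B and from CD45RBhi and CD45RBlow cells sorted from TAM-treated Raji cells. Amplicons were analyzed by high-throughput sequencing at more than 8000x coverage. The base composition of each nucleotide with detectable mutation (mutant reads/WT reads >0.1%) is depicted, and the G>A conversion percentage of the target G is labeled on the left. The positions of the sgRNA and PAM sequences are marked above the intron-exon junction sequence. The dashed line marks the intron/exon junction. Data are representative of two independent experiments. (E) Flow cytometric analysis of CD45RB expression in control Raji cells or in CD45RBhi and CD45RBlow cells sorted from TAM-treated cells. (F) TAM induces CD45RB skipping with minimal change to the CD45 protein sequence. The exon-intron junction was amplified from cDNA and base substitutions were analyzed by high-throughput sequencing. Note that, compared with genomic DNA, the two mutations are markedly reduced in cDNA of TAM-treated cells.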
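Figure 3: TAM promotes skipping of RPS24 exon 5 by converting the invariant guanine at the 5'SS to adenine. (A) TAM converts the guanine at the 5' splice site of RPS24 exon 5 to adenine. 293T cells were transfected with expression plasmids for nCas9-AIDx-Ugi and either a control sgRNA (Ctrl) or an sgRNA targeting the 5'SS of RPS24 exon 5 (E5-5'SS). Six days after transfection, the sgRNA-targeted region was amplified from genomic DNA (top 2 panels) or cDNA (bottom 2 panels) and analyzed by high-throughput sequencing at more than 8000x coverage. The base composition of nucleotides with detectable mutation (>0.1%) is depicted. The positions of the sgRNA and PAM sequences are shown above the exon/intron junction sequence from RefSeq. The dashed line marks the intron/exon junction. Data are representative of two independent experiments. (B) TAM promotes skipping of RPS24 exon 5. As in A, splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or E5-5'SS sgRNA (bottom). The count and percentage (in parentheses) of junction reads are shown above each junction arc. For clarity, only junction arcs representing more than 1% of total transcripts are depicted. (C) Ratio between RPS24 isoforms with exon 5 included or skipped, determined by isoform-specific real-time PCR. Data are a summary of three independent experiments. (D, E) G-to-A mutation at the 5'SS causes complete skipping of RPS24 exon 5. Two single-cell clones were obtained from TAM-treated cells and analyzed by Sanger sequencing; the genotype of the cells is indicated on the right (D). Expression of the isoform containing exon 5 was determined by real-time PCR (E). Data are a summary of three independent experiments.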
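Figure 4: TAM induces skipping of TP53 exon 8 or exon 9 by mutating the guanine at the respective splice site. (A-C) TAM causes skipping of TP53 exon 8 by mutating its 5'SS. (A) As in Figure 1, 293T cells were transfected with expression plasmids for nCas9-AIDx-Ugi and either a control sgRNA targeting AAVS1 (Ctrl) or an sgRNA targeting the 5'SS of TP53 exon 8 (E8-5'SS). Six days after transfection, the sgRNA-targeted region was amplified from genomic DNA (top 2 panels) or cDNA (bottom 2 panels) and analyzed by high-throughput sequencing. The base composition of nucleotides with detectable mutation (>0.1%) is depicted. The positions of the sgRNA and PAM sequences are shown above the exon/intron junction sequence from RefSeq. The dashed line marks the intron/exon junction. Data are representative of two independent experiments. (B) Splicing of TP53 exon 8 analyzed by RT-PCR. (C) As in A, splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or E8-5'SS sgRNA (bottom). For clarity, only junction arcs representing more than 1% of total transcripts are depicted. The count and percentage (in parentheses) of junction reads are shown above each junction arc. Note that in TAM-treated cells 42.1% of total transcripts skipped exon 8, while 1.1% activated a cryptic splice site within exon 8. (D-F) TAM causes skipping of TP53 exon 9 by mutating its 3'SS. (D) As in (A), 293T cells were transfected with TAM and an sgRNA targeting the 3'SS of TP53 exon 9. Seven days after transfection, the intron-exon junction was amplified from genomic DNA and analyzed by high-throughput sequencing. (E) Splicing of TP53 analyzed by RT-PCR. (F) As in D, splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. Junctions accounting for more than 1% of total transcripts are depicted. Note that the 3'SS mutation caused exon skipping in 34% of total transcripts and activated a cryptic splice site in 23.6% of mRNAs. TAM-treated cells also activated a cryptic exon within intron 8 (4.3% of total transcripts). (A-F) Data are representative of two independent experiments.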
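Figure 5: TAM activates an alternative splice site and converts Stat3α to Stat3β. (A) Schematic of using TAM to eliminate the canonical 3'SS of Stat3 exon 23 (Stat3α) and promote use of the downstream alternative 3'SS (Stat3β). (B) TAM mutates the invariant G at the canonical 3'SS of Stat3 exon 23. As in Figure 1, 293T cells were transfected with expression plasmids for AIDx-nCas9-Ugi and an sgRNA targeting Stat3 exon 23 (E23-3'SS) or AAVS1 (Ctrl). The intron-exon junction was amplified from DNA (top 2 panels) or cDNA (bottom 2 panels) and analyzed by high-throughput sequencing. The base composition of nucleotides with detectable mutation (>0.1%) is depicted. Note that TAM also induced two mutations within exon 23, which were much less frequent in cDNA (26% and 6%) than in genomic DNA (54% and 16%). Data are representative of two independent experiments. (C) TAM enhances use of the distal 3'SS of Stat3 exon 23. Splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or E23-3'SS sgRNA (bottom). Junctions accounting for more than 1% of total transcripts are depicted. The count and percentage (in parentheses) of junction reads are shown above each junction arc. Note that a cryptic splice site was activated in ~10% of transcripts only in cells treated with the Stat3-E23-3'SS sgRNA. Data are representative of two independent experiments. (D-F) TAM converts Stat3α to Stat3β. Expression of Stat3α and Stat3β in TAM-treated cells was measured by RT-PCR (D) and isoform-specific real-time quantitative PCR (E), and the ratio of Stat3α to Stat3β was determined (F).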
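Figure 6: TAM switches PKM2 to PKM1 by eliminating the 5'SS or 3'SS of exon 10. (A) Schematic showing TAM switching PKM2 to PKM1 in C2C12 cells. Top: in WT C2C12 cells, exon 10, but not exon 9, of the PKM gene is spliced in to produce PKM2, whose cDNA is recognized by the restriction enzyme PstI. Bottom: TAM converts the GT dinucleotide at the 5'SS of exon 10 to AT (or the AG at the 3'SS to AA), so that exon 9 rather than exon 10 is spliced in to produce PKM1, whose cDNA is recognized by the restriction enzyme NcoI. (B) TAM increases PKM1 while suppressing PKM2 expression. C2C12 cells were transfected with TAM and a target sgRNA (PKM-E10-5'SS or PKM-E10-3'SS) or a control sgRNA (Ctrl). Seven days after transfection, the cells were differentiated into muscle cells, PKM was amplified from cDNA, and the amplicons were digested with PstI or NcoI. Fragments corresponding to PKM1 or PKM2 are indicated, and GAPDH and total PKM (amplicon of exons 5 and 6) are included as loading controls. (C, D) TAM converts the invariant G to A at the 3'SS (C) or 5'SS (D) of PKM exon 10. The intron-exon junction was amplified from genomic DNA (top two panels) or cDNA (bottom two panels) and analyzed by high-throughput sequencing. The base composition of each guanine and the percentage of A are depicted. Data are representative of two independent experiments. (E) Real-time PCR analysis of the PKM1-to-PKM2 ratio. Data are representative (B, D, E) or a summary (C) of two independent experiments. (F) TAM switches PKM2 to PKM1. As in C, splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or E10-5'SS sgRNA (bottom). The count and percentage (in parentheses) of junction reads are shown above each junction arc. (G, H) Similarly, TAM can switch PKM2 to PKM1 in undifferentiated C2C12 cells.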
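Figure 7: TAM suppresses PKM1 expression by eliminating the 3'SS or 5'SS of PKM exon 9. (A) TAM converts the invariant G to A at the 3'SS or 5'SS of PKM exon 9. (B) In muscle cells differentiated from C2C12 cells, the target region was amplified from genomic DNA of control or TAM-treated (E9-3'SS) cells and analyzed by high-throughput sequencing. The percentage of G or A is depicted for each guanine with a mutation frequency above 0.1%. Data are representative of two independent experiments. Note that at this site TAM also caused a C>T mutation within exon 9. (C, D, E) TAM suppresses PKM1 expression while promoting PKM2 expression. (C) PKM was amplified from cDNA and the amplicons were digested with NcoI. Fragments corresponding to PKM1 or PKM2 are indicated, and GAPDH and total PKM (amplicon of exons 5 and 6) are included as loading controls. (D) Expression of PKM1 and PKM2 was measured by real-time PCR and the PKM1-to-PKM2 ratio was calculated. (E) Splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or E9-3'SS sgRNA (bottom). The count and percentage (in parentheses) of junction reads are shown above each junction arc. Data are a summary of two independent experiments. ***, p<0.0001 by Student's t-test. (F) As above, in muscle cells differentiated from C2C12 cells, the target region was amplified from genomic DNA of control or TAM-treated (E9-5'SS) cells and analyzed by high-throughput sequencing. The percentage of G or A is depicted for each guanine with a mutation frequency above 0.1%. Data are representative of two independent experiments. (G) Real-time quantitative PCR analysis of PKM1 and PKM2 expression.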
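Figure 8: The second intron of BAP1 is retained after TAM converts the invariant G to A at the 5'SS. (A) Schematic of directing TAM to mutate the invariant G at the 5' splice site of BAP1 exon 2, and of the resulting intron retention. The second intron of BAP1 is likely spliced by intron definition, in which the 5'SS pairs with the downstream 3'SS. Converting the invariant G to A abolishes recognition of the 5'SS by the U1 snRNP and disrupts intron definition, leading to inclusion of the intron. (B, C) TAM induces retention of BAP1 intron 2. 293T cells were transfected with expression plasmids for AIDx-nCas9-Ugi and an sgRNA against AAVS1 (Ctrl) or against the 5'SS of BAP1 exon 2 (BAP1-E2-5'SS). Seven days after transfection, splicing of BAP1 mRNA was analyzed by RT-PCR (B) or isoform-specific real-time PCR (C). (D) The retained intron contains the 5'SS G>A mutation. The intron-exon junction was amplified from genomic DNA (top two panels) or cDNA (bottom two panels) of 293T cells treated with control sgRNA (Ctrl) or the target sgRNA (E2-5'SS). The base composition of each guanine with detectable mutation is depicted. The positions of the sgRNA and PAM sequences are marked above the intron-exon junction sequence, and the dashed line marks the intron/exon junction. Data are representative of two independent experiments. Note that, because intron 2 is efficiently spliced in control cells, only cells receiving the E2-5'SS sgRNA had reads covering the intron, 99% of which contained the G>A mutation. (E) Mutating the 5'SS induces retention of the second intron rather than skipping of the second exon of BAP1. As in D, splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or E2-5'SS sgRNA (bottom). The count and percentage (in parentheses) of junction reads are shown above each junction arc. Note that in sgRNA-treated cells 2.4% of mRNAs were spliced so as to skip the second exon, whereas more than 60% retained the second intron. Data are representative (B, D, E) or a summary (C) of two independent experiments.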
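Figure 9: Converting the invariant G to A at the 3'SS of BAP1 exon 3 leads to retention of intron 2. (A) Schematic of directing TAM to mutate the invariant G at the 3'SS of BAP1 exon 3, and of the resulting intron retention. (B, C) TAM induces retention of BAP1 intron 2. 293T cells were transfected with expression plasmids for AIDx-nCas9-Ugi and an sgRNA against AAVS1 (Ctrl) or against the 3'SS of BAP1 intron 2. Seven days after transfection, splicing of BAP1 mRNA was analyzed by RT-PCR (B) and isoform-specific real-time PCR (C). (D) The retained second intron contains a G>A mutation at the 3'SS. The splice junction was amplified from genomic DNA (top 2 panels) or cDNA (bottom 2 panels) of 293T cells treated with control sgRNA (Ctrl) or the sgRNA targeting the 3'SS (E3-3'SS). The base composition of each guanine with detectable mutation (G>A conversion efficiency above 0.1%) is depicted. The positions of the sgRNA and PAM sequences are shown above the intron-exon junction sequence. The dashed line marks the intron/exon junction. Data are representative of two independent experiments. Note that, because intron 2 is efficiently spliced in Ctrl cells, only cells receiving the E3-3'SS sgRNA had reads covering the intron. (E) TAM mainly induces retention of the second intron of BAP1. As in D, splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or E3-3'SS sgRNA (bottom). The count and percentage (in parentheses) of junction reads are shown above each junction arc. Note that in sgRNA-treated cells 4.7% of mRNAs skipped exon 3, 8.7% used a downstream cryptic splice site, and more than 20% retained intron 2. Data are representative (B, D, E) or a summary (C) of two independent experiments.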
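Figure 10: Converting Cs to Ts in the polypyrimidine tract (PPT) upstream of GANAB exon 6 enhances its inclusion. (A) Schematic of directing TAM to convert Cs to Ts in the PPT of GANAB exon 6 to increase the strength of the 3'SS. The polypyrimidine tract of GANAB exon 6 contains multiple Cs (left); converting these Cs to Ts (right) increases the strength of this 3'SS (from 6.88 to 10.12) and enhances inclusion of exon 6. (B) TAM converts Cs to Ts in the PPT of GANAB exon 6. 293T cells were transfected with expression plasmids for AIDx-nCas9-Ugi and either a control sgRNA (Ctrl) or an sgRNA targeting the PPT of GANAB exon 6 (PPT-E6GANAB). Six days after transfection, the sgRNA-targeted region was amplified from genomic DNA and analyzed by high-throughput sequencing at more than 8000x coverage. The base composition of nucleotides with detectable mutation (>0.1%) is depicted. The positions of the sgRNA and PAM sequences are shown above the junction sequence. The dashed line marks the intron/exon junction. Data are representative of two independent experiments. (C, D, E) TAM enhances inclusion of GANAB exon 6. (C) As in B, splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The plots show the coverage and percentage of each splice junction in cells treated with control sgRNA (top) or PPT-E6GANAB sgRNA (bottom). The count and percentage (in parentheses) of junction reads are shown above each junction arc. (D, E) Splicing of GANAB mRNA was analyzed by RT-PCR (D) or isoform-specific real-time PCR (E). Data are representative (C, D) or a summary (E) of two independent experiments. (F, G) TAM promotes inclusion of exon 6 of ThyN1. (H, I) TAM enhances inclusion of exon 13 of OS9.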
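Figure 11: Converting C to T in the polypyrimidine tract (PPT) upstream of RPS24 exon 5 enhances its inclusion. (A) TAM converts C to T in the PPT of RPS24 exon 5. 293T cells were transfected with expression plasmids for AIDx-nCas9-Ugi and an sgRNA against AAVS1 (Ctrl) or against the polypyrimidine tract of RPS24 exon 5 (PPT-E5RPS25). Six days after transfection, the sgRNA-targeted region was amplified from genomic DNA and analyzed by high-throughput sequencing at more than 8000x coverage. The percentage is depicted for each cytosine with detectable mutation (>0.1%). Data are representative of two independent experiments. (B, C) As in A, TAM enhances inclusion of RPS24 exon 5. Splicing of RPS24 mRNA was analyzed by high-throughput sequencing of junctions amplified from cDNA (B) or by isoform-specific real-time PCR (C). (D, E) C-to-T conversion in the PPT increases the level of RPS24 exon 6. Two single-cell clones were derived from TAM-treated cells and analyzed by Sanger sequencing (D); the genotype of the clonal cells is indicated on the right. (E) The level of RPS24 exon 6 was determined by isoform-specific real-time PCR. Data are representative (A, B, D) or a summary (C, E) of two independent experiments.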
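Figure 12: Using TAM to induce exon skipping in cells from a Duchenne muscular dystrophy patient, repairing the reading frame of the DMD gene and restoring expression of dystrophin (DMD). (A) Schematic of directing TAM to convert G to A at the 5'SS of DMD exon 50 and restore dystrophin expression in patient cells. Compared with normal cells (top), this patient lacks exon 51 because of an inherited mutation, which disrupts the dystrophin reading frame and causes complete loss of the protein (middle). After TAM mutates the GU at the 5'SS of exon 50 to AU, exon 50 is skipped in the patient cells, which restores the dystrophin reading frame and protein expression. (B) After iPSCs from the Duchenne muscular dystrophy patient were treated with control sgRNA (Ctrl) or the target sgRNA (E50-5'SS), the corresponding DNA was amplified by PCR and the induced mutations were analyzed by high-throughput sequencing. Data are representative of two independent experiments. (C, D) iPSCs from a healthy donor, patient-derived iPSCs and repaired patient-derived iPSCs were each differentiated into cardiomyocytes, and expression of the DMD gene was then assessed by RT-PCR (C) or western blot (D). (E) In the repaired cells, exons 49 and 52 are spliced together precisely.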
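Figure 13: Schematic of regulating RNA splicing with TAM. Using TAM to mutate the GT at the 5' splice site of an intron to AT can induce exon skipping, activate an alternative splice site, induce switching between mutually exclusive exons, or induce intron inclusion. Mutating the AG at the 3' splice site of the intron to AA can likewise induce exon skipping, activate an alternative splice site, induce switching between mutually exclusive exons, or induce intron inclusion. Inducing C-to-T mutations in the polypyrimidine tract at the 3' end of an intron can strengthen a weak splice site and thereby enhance exon inclusion.
Figure 14: Using TAM to induce exon skipping in cells from a Duchenne muscular dystrophy patient, repairing the reading frame of the DMD gene and restoring expression of dystrophin (DMD).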
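The reading-frame argument behind Figures 12 and 14 can be checked with simple arithmetic: a set of missing exons keeps the downstream frame only if their combined length is a multiple of three. The sketch below is illustrative only; the exon lengths are assumptions introduced for this example (they are not stated in the source), and `frame_preserved` is a hypothetical helper name.

```python
def frame_preserved(removed_exon_lengths):
    """Return True if removing exons of these lengths keeps the reading frame."""
    return sum(removed_exon_lengths) % 3 == 0

# Assumed exon lengths in base pairs (illustrative values, not from the source).
EXON_LEN = {50: 109, 51: 233}

# Patient genotype described in the legend: exon 51 is deleted.
deletion_only = frame_preserved([EXON_LEN[51]])

# After the 5'SS of exon 50 is mutated, exon 50 is skipped as well,
# so exon 49 is joined directly to exon 52.
deletion_plus_skip = frame_preserved([EXON_LEN[50], EXON_LEN[51]])

print("exon 51 deleted, in frame:", deletion_only)
print("exon 50 skipped + exon 51 deleted, in frame:", deletion_plus_skip)
```

Under these assumed lengths, the deletion alone shifts the frame, while the deletion combined with skipping of exon 50 removes a multiple of three bases.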
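Detailed Description
It should be understood that, within the scope of the present invention, the technical features described above and those described in detail below (e.g., in the Examples) may be combined with one another to form preferred technical solutions.
Herein, point mutations are generated in a cell, in particular mutation of the 3' splice site AG to AA in an intron of interest of a gene of interest, mutation of the 5' splice site GT to AT in an intron of interest of a gene of interest, or mutation of multiple Cs (e.g., 2-10) in the polypyrimidine tract of an intron of interest of a gene of interest to Ts, so as to regulate RNA splicing of the gene of interest in that cell and thereby induce exon skipping, activate alternative splice sites, induce switching between mutually exclusive exons, induce intron inclusion or enhance exon inclusion. "Regulate" as used herein means to alter the usual splicing pattern of an RNA.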
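The three kinds of point mutation named above can be illustrated on a toy sequence. This is a minimal sketch, not the method of the source: the sequence, the coordinates and the function names are all assumptions made for illustration, and the code only rewrites a string (it does not model the deaminase or splicing outcome).

```python
def mutate_5ss(seq, intron_start):
    """GT -> AT at the first two bases of the intron (5' splice site)."""
    if seq[intron_start:intron_start + 2] != "GT":
        raise ValueError("intron does not start with GT")
    return seq[:intron_start] + "A" + seq[intron_start + 1:]

def mutate_3ss(seq, intron_end):
    """AG -> AA at the last two bases of the intron (intron_end is exclusive)."""
    if seq[intron_end - 2:intron_end] != "AG":
        raise ValueError("intron does not end with AG")
    return seq[:intron_end - 1] + "A" + seq[intron_end:]

def mutate_ppt(seq, ppt_start, ppt_end):
    """C -> T for every C inside the polypyrimidine-tract window."""
    return seq[:ppt_start] + seq[ppt_start:ppt_end].replace("C", "T") + seq[ppt_end:]

# Toy gene (hypothetical): exon 1, one intron, exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTACTAACTCCTCTCCAG", "GGATAA"
seq = exon1 + intron + exon2
intron_start = len(exon1)                # 6
intron_end = intron_start + len(intron)  # 28
ppt_start, ppt_end = 18, 26              # the TCCTCTCC run before the AG

print(mutate_5ss(seq, intron_start))
print(mutate_3ss(seq, intron_end))
print(mutate_ppt(seq, ppt_start, ppt_end))
```

Each function changes only the bases named in the text and leaves the sequence length unchanged.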
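The present invention may be practiced with a targeted cytosine deaminase. Herein, the targeted cytosine deaminase is constructed by fusing a cytosine deaminase to a protein with targeting function.
Herein, cytosine deaminase refers to any enzyme having cytosine deaminase activity, including but not limited to enzymes of the APOBEC family such as APOBEC-2, AID, APOBEC-3A, APOBEC-3B, APOBEC-3C, APOBEC-3DE, APOBEC-3G, APOBEC-3F, APOBEC-3H, APOBEC4, APOBEC1 and pmCDA1. Cytosine deaminases suitable for use herein may come from any species, and are preferably mammalian, especially human, cytosine deaminases. Preferably, the cytosine deaminase suitable for use herein is activation-induced cytidine deaminase (AID), such as human AID. The APOBEC cytosine deaminases are a family of RNA-editing enzymes with a nuclear localization signal at the N-terminus and a nuclear export signal at the C-terminus; their catalytic domain is shared across the APOBEC family. The N-terminal structure is generally considered necessary for somatic hypermutation (SHM). A cytosine deaminase deaminates cytosine, converting it to uracil; subsequent DNA repair can convert the uracil to other bases. It should be understood that any cytosine deaminase known in the art, or any fragment or mutant thereof that retains the biological activity of deaminating cytosine to uracil, may be used herein.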
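In certain embodiments, AID is used herein as the cytosine deaminase in the targeted cytosine deaminase. In AID, amino acid residues 9-26 form the nuclear localization signal (NLS) domain, of which residues 13-26 in particular participate in DNA binding; residues 56-94 form the catalytic domain; residues 109-182 form the APOBEC-like domain; residues 193-198 form the nuclear export signal (NES) domain; residues 39-42 interact with catenin-like protein 1 (CTNNBL1); and residues 113-123 form the hotspot recognition loop.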
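The residue ranges listed above lend themselves to a small lookup table. The following sketch simply encodes those ranges (1-based, inclusive, as given in the text) and reports which annotated regions contain a given AID position; the dictionary and function names are hypothetical.

```python
# Residue ranges taken from the paragraph above (1-based, inclusive).
AID_REGIONS = {
    "NLS": (9, 26),
    "DNA binding": (13, 26),
    "CTNNBL1 interaction": (39, 42),
    "catalytic domain": (56, 94),
    "APOBEC-like domain": (109, 182),
    "hotspot recognition loop": (113, 123),
    "NES": (193, 198),
}

def regions_at(position):
    """Return the names of all annotated regions that contain this residue."""
    return [name for name, (start, end) in AID_REGIONS.items()
            if start <= position <= end]

for pos in (5, 13, 82, 115, 195):
    print(pos, regions_at(pos))
```

A lookup of this kind is convenient when deciding whether a proposed substitution falls inside or outside an annotated region.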
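The full-length sequence of AID (shown as amino acids 1457-1654 of SEQ ID NO:25) may be used herein, as may fragments of AID. Preferably, the fragment includes at least the NLS domain, the catalytic domain and the APOBEC-like domain. Accordingly, in certain embodiments the fragment comprises at least amino acid residues 9-182 of AID (i.e., amino acid residues 1465-1638 of SEQ ID NO:25). In other embodiments, the fragment comprises at least amino acid residues 1-182 of AID (i.e., amino acid residues 1457-1638 of SEQ ID NO:25). For example, in certain embodiments the AID fragment used herein consists of amino acid residues 1-182, of amino acid residues 1-186, or of amino acid residues 1-190. Accordingly, in certain embodiments the AID fragment used herein consists of amino acid residues 1457-1638 of SEQ ID NO:25, of amino acid residues 1457-1642 of SEQ ID NO:25, or of amino acid residues 1457-1646 of SEQ ID NO:25.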
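The paired coordinates above imply a constant offset between AID residue numbering and positions in SEQ ID NO:25: AID residue 1 corresponds to position 1457. A short sketch of that conversion follows; the constant and function names are hypothetical, and the offset is inferred only from the ranges quoted in the text.

```python
# AID residue 1 maps to position 1457 of SEQ ID NO:25 (from the ranges above).
AID_OFFSET = 1457 - 1

def aid_to_seq25(position):
    """Convert a 1-based AID residue number to its position in SEQ ID NO:25."""
    return position + AID_OFFSET

def fragment_in_seq25(first, last):
    """Convert an AID fragment (first, last) to SEQ ID NO:25 coordinates."""
    return aid_to_seq25(first), aid_to_seq25(last)

for first, last in [(1, 198), (9, 182), (1, 182), (1, 186), (1, 190)]:
    print((first, last), "->", fragment_in_seq25(first, last))
```

Every range stated in the paragraph is reproduced by this single offset, which is a quick consistency check on the quoted coordinates.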
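Variants of AID that retain its cytosine deaminase activity (i.e., the biological activity of deaminating cytosine to uracil) may also be used herein. For example, relative to the wild-type AID sequence such a variant may have 1-10, such as 1-8, 1-5 or 1-3, amino acid variations, including amino acid deletions, substitutions and mutations. Preferably, these variations do not occur within the NLS domain, catalytic domain and APOBEC-like domain described above, or, if they do occur within these domains, do not affect the original biological function of the domains. For example, these variations preferably do not occur at positions 24, 27, 38, 56, 58, 87, 90, 112, 140, etc. of the AID amino acid sequence. In certain embodiments, these variations also do not occur within amino acids 39-42 or amino acids 113-123. Thus, for example, variations may occur within amino acids 1-8, amino acids 28-37, amino acids 43-55 and/or amino acids 183-198. In certain embodiments, variations occur at positions 10, 82 and 156, for example substitution mutations at positions 10, 82 and 156, which may be K10E, T82I and E156G. In these embodiments, an exemplary AID mutant has an amino acid sequence comprising the amino acid sequence shown at positions 1447-1629 of SEQ ID NO:31, or consisting of the amino acid residues shown at positions 1447-1629 of SEQ ID NO:31. Other examples of AID, its fragments or mutants can be found in CN 201710451424.3, the entire contents of which are incorporated herein by reference.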
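Herein, the protein with targeting function may be any protein known in the art that can target a gene of interest in the genome of a cell, including but not limited to TALEN proteins that specifically recognize a target sequence, zinc finger proteins that specifically recognize a target sequence, Ago proteins, Cpf enzymes and Cas enzymes. TALEN proteins, zinc finger proteins, Ago proteins, Cpf enzymes and Cas enzymes known in the art may be used to practice the invention.
Accordingly, in certain embodiments the targeted cytosine deaminase suitable for use herein may be selected from: (1) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cas enzyme whose nuclease activity is partially or completely abolished but whose helicase activity is retained; (2) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a TALEN protein that specifically recognizes a target sequence; (3) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a zinc finger protein that specifically recognizes a target sequence; (4) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cpf enzyme whose nuclease activity is partially or completely abolished but whose helicase activity is retained; and (5) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with an Ago protein.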
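When a Cpf enzyme is used, it is preferably a Cpf enzyme whose nuclease activity is partially or completely abolished but whose helicase activity is retained. Guided by the sgRNA it recognizes, the Cpf enzyme binds a specific DNA sequence and allows the cytosine deaminase fused to it to introduce the mutations described herein. An Ago protein, by contrast, binds a specific DNA sequence under the guidance of the gDNA it recognizes.
In certain embodiments, targeted AID-mediated mutagenesis (TAM), which uses the cytosine deaminase AID, is employed herein to mutate the guanine at an intron splice site to adenine, specifically blocking exon recognition and regulating alternative splicing of endogenous mRNA. The TAM technique herein uses a fusion protein of a nuclease-deficient Cas protein with the cytosine deaminase AID or an active fragment or mutant thereof. Guided by the sgRNA, the fusion protein is recruited to a specific DNA sequence, where AID, its active fragment or its mutant mutates cytosine (C) to thymine (T); when the deaminated cytosine lies on the strand opposite a target guanine, the result on the target strand is a guanine (G) to adenine (A) mutation.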
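Because a cytosine deaminase acts on C, a G>A change at a splice site is obtained by deaminating the C that pairs with that G on the opposite strand. The sketch below illustrates this strand bookkeeping on a toy sequence; the sequence and function names are assumptions for illustration, and the C-to-U-to-T step is collapsed into a single C-to-T string edit.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(strand):
    """Reverse complement of a DNA strand written 5'->3'."""
    return strand.translate(_COMPLEMENT)[::-1]

def deaminate(strand, index):
    """C -> T at `index` (deamination to U, read as T after replication)."""
    if strand[index] != "C":
        raise ValueError("deaminase substrate must be a cytosine")
    return strand[:index] + "T" + strand[index + 1:]

def g_to_a_via_opposite_strand(coding, g_index):
    """Edit the C opposite coding[g_index] and return the new coding strand."""
    opposite = revcomp(coding)
    paired_index = len(coding) - 1 - g_index  # position of the pairing base
    return revcomp(deaminate(opposite, paired_index))

# Toy 3' splice site (hypothetical): intron ...TTTCAG | exon GAT...
coding = "TTTCAGGAT"
edited = g_to_a_via_opposite_strand(coding, g_index=5)
print(coding, "->", edited)
```

In this toy case the AG at the end of the intron becomes AA on the coding strand, although the chemical change was a C-to-T conversion on the other strand.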
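CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a gene-editing system that bacteria use to defend against viral attack or to evade mammalian immune responses. After modification and optimization, the system is now widely used for in vitro biochemical reactions and for gene editing in cells and whole organisms. Typically, a Cas protein with endonuclease activity (also called a Cas enzyme) forms a complex with the sgRNA it specifically recognizes; the pairing region of the sgRNA (i.e., the target-binding region) base-pairs with the template strand of the target DNA, and Cas cuts the double-stranded DNA at a specific position. The present disclosure exploits this property of Cas/sgRNA, namely the specific binding of the sgRNA to its target, to position Cas at the desired site, where the AID, its active fragment or its mutant in the fusion protein mutates guanine (G) to adenine (A) or cytosine (C) to thymine (T).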
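Cas proteins suitable for use herein have nuclease activity, in particular endonuclease activity, that is partially abolished (retaining only the ability to break a single DNA strand) or completely abolished (no ability to break double-stranded DNA), while retaining helicase activity. They may be derived from any Cas protein or variant known in the art, including but not limited to Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1, homologs thereof, or modified forms thereof.
In some embodiments, a nuclease-deficient Cas9 enzyme and the single-stranded sgRNA it specifically recognizes are used. The Cas9 enzyme may come from different species, including but not limited to Cas9 from Streptococcus pyogenes (SpCas9), Cas9 from Staphylococcus aureus (SaCas9) and Cas9 from Streptococcus thermophilus (St1Cas9). Any variant of a Cas9 enzyme may be used, provided that it specifically recognizes its sgRNA and lacks nuclease activity.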
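Nuclease-deficient Cas proteins can be prepared by methods known in the art, including but not limited to deleting the entire endonuclease catalytic domain of the Cas protein or mutating one or several amino acids within that domain. The mutation may be a deletion or substitution of one or several amino acid residues (e.g., 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, up to the entire catalytic domain), or an insertion of one or several new amino acid residues (e.g., 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, or 1-10 or 1-15). Routine methods in the art can be used to delete the domain or mutate the residues, and to test whether the mutated Cas protein still has nuclease activity. For example, for Cas9, its two endonuclease catalytic domains, RuvC1 and HNH, can each be mutated: the amino acid at position 10 (in the RuvC1 domain), an aspartate, is mutated to alanine or another amino acid, and the amino acid at position 841 (in the HNH domain), a histidine, is mutated to alanine or another amino acid. These two mutations abolish the endonuclease activity of Cas9. Preferably, the Cas enzyme has no nuclease activity at all. In one or more embodiments, the amino acid sequence of the nuclease-dead Cas9 enzyme used herein is as shown at positions 42-1452 of SEQ ID NO:25. In other embodiments, the Cas enzyme used herein is partially nuclease-deficient, i.e., it can cause a single-strand break in DNA. A representative example of such a Cas enzyme is shown at amino acid residues 42-1419 of SEQ ID NO:33. In other embodiments, the amino acid sequence of the Cas enzyme used herein is as shown at positions 199-1566 of SEQ ID NO:23, or at amino acid residues 199-1262 of SEQ ID NO:50. Other examples of Cas enzymes can be found in CN 201710451424.3, the entire contents of which are incorporated herein by reference.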
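For the Cas/sgRNA complex to function, a protospacer adjacent motif (PAM) is required on the non-template strand (3' to 5') of the DNA. Different Cas enzymes have different PAMs. For example, the PAM of SpCas9 is typically NGG (SEQ ID NO:34); the PAM of SaCas9 is typically NNGRR (SEQ ID NO:35); and the PAM of St1Cas9 is typically NNAGAA (SEQ ID NO:36); where N is A, C, T or G, and R is G or A.
In certain preferred embodiments, the PAM of SaCas9 is NNGRRT (SEQ ID NO:37). In certain preferred embodiments, the PAM of SpCas9 is TGG (SEQ ID NO:38). In certain preferred embodiments, the PAM of the SaCas9 KKH mutant is NNNRRT (SEQ ID NO:39); where N is A, C, T or G, and R is G or A.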
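The PAM motifs above use the ambiguity codes N (A, C, G or T) and R (A or G), so they translate directly into regular expressions. The sketch below is an illustration under that reading; the dictionary keys and function names are hypothetical labels, and only the two ambiguity codes that appear in the text are handled.

```python
import re

# Ambiguity codes used in the text: N = A/C/G/T, R = A/G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# PAM motifs as listed above; the keys are labels chosen for this sketch.
PAMS = {
    "SpCas9": "NGG",
    "SaCas9": "NNGRR",
    "SaCas9 (preferred)": "NNGRRT",
    "St1Cas9": "NNAGAA",
    "SaCas9-KKH": "NNNRRT",
}

def pam_regex(motif):
    """Compile a PAM motif written with N/R codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))

def is_pam(enzyme, site):
    """Check whether `site` is a full match for the PAM motif of `enzyme`."""
    return pam_regex(PAMS[enzyme]).fullmatch(site) is not None

print(is_pam("SpCas9", "TGG"))
print(is_pam("SaCas9", "ACGAG"))
print(is_pam("St1Cas9", "GCAGAA"))
```

Using `fullmatch` keeps the check strict about motif length, so a 3-base site is never accepted for a 5- or 6-base PAM.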
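An sgRNA usually comprises two parts: a target-binding region and a protein-recognition region (for example a Cas enzyme recognition region or a Cpf enzyme recognition region). The target-binding region and the protein-recognition region are usually joined in the 5' to 3' direction. The target-binding region is usually 15 to 25 bases long, more usually 18 to 22 bases, for example 20 bases. The target-binding region binds specifically to the template strand of the DNA, thereby recruiting the fusion protein to the intended site. Usually, the region opposite the sgRNA-binding region on the DNA template strand is immediately adjacent to the PAM, or separated from it by a few bases (for example within 10, within 8, or within 5). Therefore, when designing an sgRNA, one usually first determines the PAM of the enzyme to be used (for example a Cas enzyme), then looks for sites on the non-template strand of the DNA that can serve as the PAM, and then takes as the sequence of the sgRNA target-binding region a fragment 15 to 25 bases long, more usually 18 to 22 bases long, that lies downstream of the PAM site on that non-template strand (3' to 5') and is either immediately adjacent to the PAM site or within 10 bases (for example within 8 or within 5) of it. The protein-recognition region of the sgRNA is determined by the enzyme used, which is within the knowledge of those skilled in the art.
Accordingly, the sequence of the target-binding region of the sgRNA herein is a fragment 15 to 25 bases long, more usually 18 to 22 bases long, that lies downstream of the PAM site on the DNA strand containing the PAM recognized by the chosen enzyme (for example a Cas enzyme or Cpf enzyme), and is either immediately adjacent to that PAM site or within 10 bases (for example within 8 or within 5) of it; its protein-recognition region is specifically recognized by the chosen enzyme.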
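The design rule just described (find a PAM, then take the adjacent 15 to 25 bases as the target-binding region) can be sketched as a simple scan. The sketch below is illustrative only and makes assumptions not stated in the text: it uses the common SpCas9 convention in which the spacer lies immediately 5' of an NGG PAM when the PAM-containing strand is written 5' to 3', it handles only the directly adjacent case, and the function names are hypothetical.

```python
import re


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq, spacer_len=20):
    """List (strand, start, spacer, pam) for every spacer directly followed by NGG.

    Both strands are scanned; `start` is the 0-based offset on the scanned strand.
    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits
```

In practice the candidates would then be filtered by how far the PAM lies from the splice site or polypyrimidine tract of interest, which this sketch leaves out.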
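Since the aim herein is to mutate the guanine at an intron splice site to adenine, or to mutate C to T in the polypyrimidine tract upstream of a 3' splice site, sgRNA design must take into account whether a PAM sequence exists near the splice site and how far that PAM is from the splice site. Accordingly, the sgRNA usually binds to a sequence containing the splice site of the intron of interest in the gene of interest, or to the complementary sequence of the polypyrimidine tract of interest. Put differently, the target-binding region of the sgRNA contains the complementary sequence of the splice site of the intron of interest, or contains the sequence of the polypyrimidine tract of the intron of interest.
The sgRNA can be prepared by conventional methods in the art, for example by conventional chemical synthesis. The sgRNA can also be introduced into cells via an expression vector and expressed within the cells, or delivered into an animal or human body using adeno-associated virus. Expression vectors for the sgRNA can be constructed by methods well known in the art.
In certain embodiments, also provided herein is an sgRNA sequence or its complementary sequence, comprising a target-binding region and a protein-recognition region, wherein the target-binding region binds to a sequence containing the splice site of the intron of interest in the gene of interest, or to the complementary sequence of the polypyrimidine tract of interest. Usually, the target-binding region is 15 to 25 bases long, such as 18 to 22 bases, preferably 20 bases. In certain embodiments, the target-binding region of the sgRNA binds to a sequence of DMD exon 50 containing the 3' splice site; preferably, the target-binding region of the sgRNA is as shown in SEQ ID NO:17 or 51.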
用于本文的靶向性的胞嘧啶脱氨酶优选是前文所述的Cas酶与前文所述的AID、其片段或突变体的融合蛋白。Cas酶通常在融合蛋白氨基酸序列的N端,AID、其片段或突变体在C端;当然,AID、其片段或突变体也可在融合蛋白氨基酸序列的N端,而Cas酶在C端。在某些实施方案中,本文提供主要由Cas酶和AID、其片段或突变体形成的融合蛋白。应理解的是,本文所述的“主要由……形成”的融合蛋白或类似表述并不意指融合蛋白仅包括Cas酶和AID、其片段或突变体,该限定应理解为融合蛋白可仅包括Cas酶和AID、其片段或突变体,或还可含有其他不影响到该融合蛋白中的Cas酶的靶向作用及AID、其片段或突变体突变靶序列的功能的部分,包括但不限于各种接头序列、核定位序列、Ugi序列以及如下文所述因基因克隆操作、和/或为了构建融合蛋白、促进重组蛋白的表达、获得自动分泌到宿主细胞外的重组蛋白、或利于重组蛋白的检测和/或纯化等而在融合蛋白中引入的氨基酸序列。
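The Cas enzyme may be fused to AID, or a fragment or mutant thereof, through a linker. The linker may be a peptide of 3 to 25 residues, for example 3 to 15, 5 to 15, or 10 to 20 residues. Suitable examples of peptide linkers are well known in the art. Usually, the linker contains one or more tandemly repeated motifs, and the motif usually contains Gly and/or Ser. For example, the motif may be SGGS (SEQ ID NO:40), GSSGS (SEQ ID NO:41), GGGS (SEQ ID NO:42), GGGGS (SEQ ID NO:43), SSSSG (SEQ ID NO:44), GSGSA (SEQ ID NO:45) or GGSGG (SEQ ID NO:46). Preferably, the motifs are adjacent in the linker sequence, with no amino acid residues inserted between the repeats. The linker sequence may consist of 1, 2, 3, 4 or 5 repeats of the motif. In certain embodiments, the linker sequence is a polyglycine linker. The number of glycines in the linker is not particularly limited and is usually 2 to 20, for example 2 to 15, 2 to 10, or 2 to 8. Besides glycine and serine, the linker may also contain other known amino acid residues, such as alanine (A), leucine (L), threonine (T), glutamic acid (E), phenylalanine (F), arginine (R) and glutamine (Q). In certain embodiments, the linker sequence is XTEN, whose amino acid sequence is shown in amino acid residues 183-198 of SEQ ID NO:29. Other exemplary linker sequences can be found among those listed in CN201710451424.3, such as SEQ ID NO:21-31 of that application.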
应理解,在基因克隆操作中,常常需要设计合适的酶切位点,这势必在所表达的氨基酸序列末端引入了一个或多个不相干的残基,而这并不影响目的序列的活性。为了构建融合蛋白、促进重组蛋白的表达、获得自动分泌到宿主细胞外的重组蛋白、或利于重组蛋白的纯化,常常需要将一些氨基酸添加至重组蛋白的N-末端、C-末端或该蛋白内的其它合适区域内,例如,包括但不限于,适合的接头肽、信号肽、前导肽、末端延伸等。因此,本文融合蛋白的氨基端或羧基端还可含有一个或多个多肽片段,作为蛋白标签。任何合适的标签都可以用于本文。例如,所述的标签可以是FLAG,HA,HA1,c-Myc,Poly-His,Poly-Arg,Strep-TagII,AU1,EE,T7,4A6,ε,B,gE以及Ty1。这些标签可用于对蛋白进行纯化。
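The fusion protein herein may further contain a nuclear localization sequence (NLS). NLSs of any origin and amino acid composition known in the art may be used. Such NLSs include but are not limited to: the NLS of the SV40 large T antigen; an NLS from nucleoplasmin, for example the nucleoplasmin bipartite NLS; the NLS from c-myc; the NLS from hRNPA1 M9; the sequence of the IBB domain from importin-α; the sequence of the myoma T protein; the sequence of mouse c-abl IV; the sequence of influenza virus NS1; the sequence of the hepatitis virus δ antigen; the sequence of the mouse Mx1 protein; the sequence of human poly(ADP-ribose) polymerase; the sequence of the steroid hormone receptor (human) glucocorticoid; and so on. The amino acid sequences of these NLSs can be found in SEQ ID NO:33-47 of CN201710451424.3. In certain specific embodiments, the sequence shown in amino acid residues 26-33 of SEQ ID NO:25 is used herein as the NLS. The NLS may be located at the N-terminus or C-terminus of the fusion protein; it may also be located within the fusion protein sequence, for example at the N-terminus and/or C-terminus of the Cas9 enzyme in the fusion protein, or at the N-terminus and/or C-terminus of the AID, or fragment or mutant thereof, in the fusion protein.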
可以通过任何适合的技术检测本发明融合蛋白在细胞核中的积聚。例如,可将检测标记融合到Cas酶上,使得在与检测细胞核的位置的手段(例如,对于细胞核特异的染料,如DAPI)相结合时融合蛋白在细胞内的位置可以被可视化。在某些实施方案中,本文使用3*flag作为标记,该肽段序列可如SEQ ID NO:25第1-23位氨基酸残基所示。应理解,通常,若存在标记序列时,标记序列通常在融合蛋白的N端。标记序列与NLS之间可直接连接,也可通过适当的接头序列连接。NLS序列可直接与Cas酶或AID、其片段或突变体连接,也可通过适当的接头序列与Cas酶或AID、其片段或突变体连接。
因此,在某些实施方案中,本文的融合蛋白由Cas酶和AID、其片段或突变体组成。在其它实施方案中,本文的融合蛋白由Cas酶通过接头与AID、其片段或突变体连接而成。在某些实施方案中,本文的融合蛋白由NLS、Cas酶、AID或其片段或突变体以及Cas酶和AID或其片段或突变体之间的任选的接头序列组成。在某些实施方案中,本文的融合蛋白除含有NLS、Cas酶和AID、其片段或突变体外,还可含有噬菌体蛋白,如作为UNG抑制剂的UGI。示例性的UGI的氨基酸序列可如本申请SEQ ID NO:23第1576-1659位氨基酸残基所示。因此,在某些实施方案中,本文的融合蛋白含有本文所述的Cas9酶、本文所述的AID或其片段或突变体、UGI以及NLS,或由这些部分及它们之间的任选的接头序列以及任选的用于检测、分离或纯化的氨基酸序列组成。Ugi序列可位于融合蛋白的N端、C端,或者位于融合蛋白之中,例如位于NLS序列与Cas酶之间或位于Cas酶与AID、其片段或突变体之间。在某些实施方案中,本文的融合蛋白从N端到C端依次为AID或其片段或突变体、Cas酶、Ugi和NLS,或者可为Cas酶、AID或其片段或突变体、Ugi和NLS,它们之间可由接头连接。
在某些实施方案中,本文使用CN 201710451424.3所公开的融合蛋白。更具体而言,本文使用本申请所公开的其氨基酸序列如SEQ ID NO:25、27、29、31、33、48或50所示,或如SEQ ID NO:25第26-1654位氨基酸所示,或如SEQ ID NO:27第26-1638位所示,或如SEQID NO:31第26-1629位氨基酸所示,或如SEQ ID NO:33第26-1638位氨基酸所示的融合蛋白,或如SEQ ID NO:48第26-1629位氨基酸所示。在某些实施方案中,本文的融合蛋白如本申请SEQ ID NO:23所示。
可构建表达上述融合蛋白的表达载体/质粒和表达所需sgRNA的载体/质粒,将其转入感兴趣的细胞中,以通过诱导感兴趣基因剪接位点的碱基突变来调控其RNA剪接。
“表达载体”可以是本领域熟知的各种细菌质粒、噬菌体、酵母质粒、植物细胞病毒、哺乳动物细胞病毒如腺病毒、逆转录病毒或其它载体。只要能在宿主体内复制和稳定,任何质粒和载体都可以用。表达载体的一个重要特征是通常含有复制起点、启动子、标记基因和翻译控制元件。表达载体还可包括翻译起始用的核糖体结合位点和转录终止子。本文所述的多核苷酸序列可操作性地连接到表达载体中的适当启动子上,以经由该启动子指导mRNA合成。这些启动子的代表性例子有:大肠杆菌的lac或trp启动子;λ噬菌体PL启动子;真核启动子包括CMV立即早期启动子、HSV胸苷激酶启动子、早期和晚期SV40启动子、反转录病毒的LTRs和其它一些已知的可控制基因在原核或真核细胞或其病毒中表达的启动子。标记基因可用于提供用于选择转化的宿主细胞的表型性状,包括但不限于真核细胞培养用的二氢叶酸还原酶、新霉素抗性以及绿色荧光蛋白(GFP),或用于大肠杆菌的四环素或氨苄青霉素抗性。当本文所述的多核苷酸在高等真核细胞中表达时,如果在载体中插入增强子序列,则将会使转录得到增强。增强子是DNA的顺式作用因子,通常大约有10到300个碱基对,作用于启动子以增强基因的转录。
本领域一般技术人员清楚如何选择适当的载体、启动子、增强子和宿主细胞。可采用本领域技术人员熟知的方法构建含本文所述的多核苷酸序列和合适的转录/翻译控制信号的表达载体。这些方法包括体外重组DNA技术、DNA合成技术、体内重组技术等。
本文的融合蛋白、其编码序列或表达载体,和/或sgRNA、其编码序列或表达载体可以组合物的形式提供。例如,组合物可含有本文的融合蛋白和sgRNA或sgRNA的表达载体,或可含有本文融合蛋白的表达载体和sgRNA或sgRNA的表达载体。在组合物中,融合蛋白或其表达载体、或sgRNA或其表达载体可以混合物的形式提供,或者可单独包装。组合物可以是溶液的形式,也可以是冻干形式。优选的是,组合物中的融合蛋白是本文所述的AID、其片段或突变体与本文所述的Cas酶的融合蛋白。
组合物可提供在试剂盒中。因此,本文提供含有本文所述组合物的试剂盒。或者,本文也提供一种试剂盒,该试剂盒含有本文的融合蛋白和sgRNA或sgRNA的表达载体,或含有本文融合蛋白的表达载体和sgRNA或sgRNA的表达载体。试剂盒中,融合蛋白或其表达载体、或sgRNA或其表达载体可独立包装,或以混合物的形式提供。试剂盒中还可包括例如用于将所述融合蛋白或其表达载体和/或sgRNA或其表达载体转入细胞的试剂,以及指导技术人员进行所述转入的说明书。或者,试剂盒还可包括指导技术人员采用试剂盒所含成分实施本文所述的各种方法和用途的说明书。试剂盒中还包括其它的试剂,例如用于PCR的试剂等。
本文的融合蛋白、其编码序列或表达载体,和/或sgRNA或其表达载体可用于诱导感兴趣基因剪接位点的碱基突变来调控其RNA剪接。因此,本文提供一种诱导感兴趣的细胞内感兴趣的基因的剪接位点发生碱基突变的方法,所述方法包括在所述细胞内表达本文所述的融合蛋白的步骤,根据所表达的融合蛋白,该方法还包括表达sgRNA或gDNA的步骤。例如,在某些实施方案中,在细胞中表达本文所述的AID、其片段或突变体与Cas酶的融合蛋白及其识别的sgRNA。在某些实施方案中,在细胞中表达胞嘧啶脱氨酶、其保留了酶活的片段或突变体与特异识别靶向序列的TALEN蛋白的融合蛋白。在某些实施方案中,在细胞中表达胞嘧啶脱氨酶、其保留了酶活的片段或突变体与特异识别靶向序列的锌指蛋白的融合蛋白。在某些实施方案中,在细胞中表达胞嘧啶脱氨酶、其保留了酶活的片段或突变体与核酸酶活性部分或完全缺失但保留了解旋酶活性的Cpf酶的融合蛋白以及Cpf酶识别的sgRNA。在其它实施方案中,在细胞中表达胞嘧啶脱氨酶、其保留了酶活的片段或突变体与Ago蛋白的融合蛋白以及Ago蛋白识别的gDNA。
本文中,感兴趣的细胞尤其还包括那些需要在其中使感兴趣基因的剪接位点发生碱基突变以调控其RNA剪接的细胞。这类细胞包括原核细胞和真核细胞,例如植物细胞、动物细胞、微生物细胞等。尤其优选的是动物细胞,例如哺乳动物细胞、啮齿类动物细胞,包括人、马、牛、羊、鼠、兔等等。微生物细胞包括本领域周知的来自各种微生物种类的细胞,尤其是那些具有医疗研究价值、生产价值(例如燃料如乙醇的生产、蛋白质生产、油脂如DHA生产)的微生物种类的细胞。细胞还可以是各种器官来源的细胞,例如来自人肝脏、肾脏、皮肤等处的细胞,或者是血液细胞。细胞还可以是目前在售的各种成熟的细胞系,例如293细胞、COS细胞。在某些实施方案中,细胞是来自健康个体的细胞;在其他实施方案中,细胞是来自患病个体的患病组织的细胞,例如来自炎症组织的细胞、肿瘤细胞。在某些实施方案中,感兴趣的细胞是诱导型多能干细胞。细胞还可以是经基因工程改造过,以使其具有某种特定功能(例如生产感兴趣的蛋白)或产生感兴趣的表型的细胞。应理解的是,感兴趣的细胞包括体细胞和生殖细胞。在某些实施方案中,细胞是动物或人体内特定的细胞。
感兴趣的基因可以是任何感兴趣的核酸序列,尤其是各种与疾病相关,或与各种感兴趣的蛋白质的生产相关,或各种与感兴趣的生物学功能相关的基因或核酸序列。这类感兴趣的基因或核酸序列包括但不限于编码各种功能蛋白的核酸序列。本文中,功能蛋白指能够完成生物体的生理功能的蛋白质,包括催化蛋白、运输蛋白、免疫蛋白和调节蛋白等。在某些具体实施方式中,所述功能蛋白包括但不限于:疾病的发生、发展和转移中涉及的蛋白,细胞分化、增殖与凋亡中涉及的蛋白,参与新陈代谢的蛋白,发育相关的蛋白,以及各种药物靶点等等。例如,功能蛋白可以是抗体、酶、脂蛋白、激素类蛋白、运输和贮存蛋白、运动蛋白、受体蛋白、膜蛋白等。
作为示范性的例子,感兴趣的基因包括但不限于RPS24、CD45、DMD、PKM、BAP1、TP53、STAT3、GANAB、ThyN1、OS9、SMN2、β血红球蛋白基因、LMNA、MDM4、Bcl2和LRP8等。
在某些实施方案中,本文所述的方法包括将本文的融合蛋白或其表达载体和其识别的sgRNA或其表达载体或gDNA或其表达载体转入所述细胞内。在细胞组成型表达本文所述融合蛋白的情况下,可仅将相应的sgRNA或其表达载体或其识别的gDNA或其表达载体转入细胞中。在细胞诱导型表达本文所述融合蛋白的情况下,在转入sgRNA或gDNA之后,还可用诱导剂孵育细胞,或对细胞施与相应的诱导措施(例如光照)。优选地是,利用本文所述的AID、其片段或突变体与本文所述的Cas酶的融合蛋白及其识别的sgRNA来实施本文实施的方法。
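The fusion protein or its expression vector, and/or the sgRNA it recognizes or the expression vector thereof, or the gDNA or the expression vector thereof, can be introduced into cells by conventional transfection methods. For example, when the cell of interest is a prokaryote such as Escherichia coli, competent cells able to take up DNA can be harvested after the exponential growth phase and treated by the CaCl2 method, using procedures well known in the art. Another method uses MgCl2. If needed, transformation can also be carried out by electroporation. When the host is a eukaryote, the following DNA transfection methods may be chosen: calcium phosphate co-precipitation, and conventional mechanical methods such as microinjection, electroporation and liposome packaging. For example, for transfection, a plasmid DNA-liposome complex is first prepared, and the plasmid DNA-liposome complex and the corresponding sgRNA or gDNA are then co-transfected into cells. Commercially available transfection kits or reagents can be used to introduce the vectors or plasmids described herein into the cells of interest; such reagents include but are not limited to the 2000 reagent. After the cells are transformed, the resulting transformants can be cultured by conventional methods to allow them to express the fusion protein described herein. Depending on the cells used, the culture medium may be chosen from the various conventional media.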
通常,针对不同的细胞,可采用已知技术设计表达本文融合蛋白和sgRNA或gDNA的表达载体,以使这些表达载体适于在该细胞中表达。例如,可在表达载体中提供利于在该细胞中启动表达的启动子以及其他相关的调控序列。这些都可由技术人员根据实际情况加以选择和实施。
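For the sgRNA used herein, one can look for a site that can serve as a PAM near the splice site of interest in the gene of interest, select a Cas enzyme that recognizes that PAM, and then design and prepare, as described herein, the fusion protein of the invention containing that Cas enzyme together with the corresponding sgRNA. Accordingly, the target-recognition region of the sgRNA used herein usually contains the complementary sequence of the splice site of the intron of interest in the gene of interest.
The splice sites described herein have the meaning well known in the art and include 5' splice sites and 3' splice sites. Herein, the 5' splice site and the 3' splice site are both defined relative to the intron. Usually, a site that can serve as a PAM is chosen near the splice site of the exon or intron of interest in the gene of interest. For example, the exon or intron of interest may be exon 5 of RPS24, exon 5 of CD45, exon 8 or 9 of the TP53 gene, exon 9 or 10 of PKM, intron 2 of BAP1, intron 8 of TP53, and so on. Alternatively, in certain embodiments, a site that can serve as a PAM is chosen near the polypyrimidine tract located within the intron upstream of the 3' splice site of the gene of interest. The target-binding region of such an sgRNA therefore contains the sequence of the polypyrimidine tract of the intron of interest in the gene of interest.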
本文的方法可以是体外方法,也可以是体内方法;此外,本文的方法包括治疗目的的方法和非治疗目的的方法。当体内实施时,可采用本领域周知的手段将本文的融合蛋白或其表达载体和其识别的sgRNA或其表达载体或gDNA或其表达载体转入实验对象体内,如相应的组织细胞内。应理解,体内实施时,实施对象可以是人或各种非人动物,包括本领域惯常采用的各种非人模式生物。体内实验应满足伦理要求。
本文所述的诱导感兴趣的细胞内感兴趣的基因的剪接位点发生碱基突变的方法是一种通用的RNA剪接调控方法,可用于基因治疗。因此,本文提供一种基因治疗方法,所述方法包括给予有需要的对象治疗有效量的表达本文所述融合蛋白的载体和相应的sgRNA或gDNA的表达载体。治疗有效量可根据对象的年龄、性别、所患疾病的性质和严重程度等方面予以确定。通常,给予治疗有效量的所述载体应足以缓和所述疾病的症状或治愈所述疾病。该基因治疗可用于因基因发生变异而致病的疾病的治疗,也可以用于通过调节不同剪接亚型而得以缓解症状或治愈的疾病的治疗。例如,因基因变异导致的疾病包括但不限于:DMD基因变异造成的杜氏肌无力症,SMN,β血红球蛋白IVS2 647G>A突变造成的地中海贫血症,LMNA突变造成的早衰症和家族性高胆固醇血症等。可通过调节不同剪接亚型的比例而实现症状缓解或治愈的疾病包括肿瘤,所述剪接亚型包括但不限于Stat3α向Stat3β的转换,PKM2向PKM1的转换,MDM4外显子6的跳跃,Bcl2可变剪接位点的选择,LRP8外显子八跳跃。
在某些实施方案中,本文提供一种肿瘤治疗方法,所述方法包括给予有需要的对象治疗有效量的表达本文任一实施方案所述的融合蛋白的载体和相应的sgRNA的表达载体的步骤。在某些实施方案中,所述sgRNA的靶标结合区包含Stat3内含子22的3’剪接位点的互补序列。在某些实施方案中,适用于此方法的sgRNA的靶标结合区如SEQ ID NO:3所示。或者,所述sgRNA的靶标结合区包含PKM内含子10的5’或3’剪接位点的互补序列。在某些实施方案中,适用于此方法的sgRNA的靶标结合区如SEQ ID NO:15或16所示。
在某些实施方案中,本文还提供一种治疗由于DMD基因变异造成的杜氏肌无力症的方法,所述方法包括给予有需要的对象治疗有效量的表达本文所述融合蛋白的载体和相应的sgRNA的表达载体的步骤,其中,所述sgRNA的靶标结合区包含DMD外显子50的5’剪接位点的互补序列。在某些实施方案中,适用于此方法的sgRNA的靶标结合区如SEQ ID NO:17或51所示。在某些实施方案中,适用于此方法的融合蛋白的氨基酸序列可如SEQ ID NO:23或50所示。
可采用本领域周知的手段实施本文所述的基因治疗方法。通常,基因治疗的给药途径包括离体途径和在体途径。例如,可采用合适的骨架载体(例如腺相关病毒载体)构建表达本文所述融合蛋白的表达载体和表达sgRNA或gDNA的载体,将其以通用的方式例如给予患者,例如注射。或者,在涉及血液疾病时,可获取对象存在基因变异的血液细胞,体外采用本文所述方法处理,使所述细胞消除所述变异后体外扩增,再回输给该对象。此外,还可利用本文所述的方法改造对象的多能干细胞,回输至患者,以达到治疗目的。
本文再一方面还提供本文任一实施方案所述的融合蛋白、其编码序列和/或表达载体,和/或sgRNA和/或其表达载体在制备用于调控RNA剪接的试剂或试剂盒、在制备用于基因治疗的试剂、或在制备用于治疗因基因变异导致的疾病或受益于功能蛋白不同剪接亚型的比例改变的肿瘤的药物中的应用。本文也涉及一种用于调控RNA剪接、基因治疗(尤其是治疗因基因变异导致的疾病或受益于功能蛋白不同剪接亚型的比例改变的肿瘤)的本文任一实施方案所述的融合蛋白、其编码序列和/或表达载体,以及sgRNA和/或其表达载体。
利用本文所述方法,可有效的诱导外显子跳读(例如RPS24外显子5,CD45外显子5,DMD基因外显子50,23,51等),调控互斥外显子的选择(PKM1/PKM2等),诱导内含子保留/包含(BAP1和TP53等)以及诱导可变剪接位点的利用(STAT3α/β等)等。同时可通过将3’剪接位点上游的C突变为T,促进选择性外显子的包含比例(RPS24外显子5,GANAB外显子5,ThyN1外显子6,OS9外显子13和SMN2外显子7)。此外,本文还证明,通过这一方法可以有效的纠正人类遗传突变造成的基因剪接缺陷。因此,本文所公开的方法是一种通用的RNA剪接调控方法,可用于疾病治疗,尤其可用于以下疾病的基因治疗:DMD基因变异造成的杜氏肌无力症,SMN,β血红球蛋白IVS2 647G>A突变造成的地中海贫血症,LMNA突变造成的早衰症,家族性高胆固醇血症。同时本文所述方法还可通过调节不同剪接亚型的比例,包括但不限于诱导Stat3α向Stat3β的转换,PKM2向PKM1的转换,MDM4外显子6的跳跃,Bcl2可变剪接位点的选择,LRP8外显子八跳跃等,从而实现对肿瘤等疾病的治疗。
下文将以具体实施例的方式阐述本发明。应理解,这些实施例仅仅是示例性的,而非限制本发明的范围。下列实施例中未注明具体条件的实验方法,通常按照常规条件如Sambrook&Russell所著的Molecular Cloning:A Laboratory Manual(分子克隆实验指南第三版)中所述的条件,或按照制造厂商所建议的条件。除非另行定义,文中所使用的所有专业与科学用语与本领域熟练人员所熟悉的意义相同。此外,任何与所记载内容相似或均等的方法及材料皆可应用于本发明中。文中所述的较佳实施方法与材料仅作示范之用。
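I. Materials and Methods
(1) Construction of plasmids expressing the AIDX-Cas9 or Cas9-AIDX fusion protein
Plasmids expressing the AIDX-Cas9 or Cas9-AIDX fusion proteins required herein were constructed according to the methods disclosed in the examples of CN 201710451424.3 (the entire contents of which are incorporated herein by reference).
The following experiments use the AIDX-nCas9-Ugi fusion protein. Its expression plasmid, MO91-AIDX-XTEN-nCas9-Ugi, was constructed according to the methods of Examples 1-3 and 14 of CN 201710451424.3. This plasmid expresses the fusion protein shown in SEQ ID NO:23, in which residues 1-182 are the amino acid sequence of AIDX, residues 183-198 are the amino acid sequence of the XTEN linker, residues 199-1566 are the amino acid sequence of nCas9, residues 1567-1570 and 1654-1657 are linker sequences, residues 1571-1653 are the Ugi sequence, and residues 1658-1664 are the amino acid sequence of the SV40 NLS. The coding sequence of the fusion protein is shown in SEQ ID NO:22.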
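The domain boundaries given for SEQ ID NO:23 can be checked for internal consistency with a few lines of code. This is only a bookkeeping sketch: the coordinates are copied from the paragraph above, the total length is the <211> value in the sequence listing, and the helper names are hypothetical.

```python
# Domain boundaries (1-based, inclusive) as stated for SEQ ID NO:23.
DOMAINS = [
    ("AIDX", 1, 182),
    ("XTEN linker", 183, 198),
    ("nCas9", 199, 1566),
    ("linker", 1567, 1570),
    ("Ugi", 1571, 1653),
    ("linker", 1654, 1657),
    ("SV40 NLS", 1658, 1664),
]
LISTED_LENGTH = 1670  # <211> value of SEQ ID NO:23 in the sequence listing


def check_contiguous(domains):
    """Return the last annotated residue if the domains tile the sequence
    from residue 1 without gaps or overlaps; raise ValueError otherwise."""
    expected_start = 1
    for name, start, end in domains:
        if start != expected_start or end < start:
            raise ValueError(f"gap or overlap at {name}: {start}-{end}")
        expected_start = end + 1
    return expected_start - 1


def extract(protein, start, end):
    """Slice a one-letter protein string using 1-based inclusive coordinates."""
    return protein[start - 1:end]


last_annotated = check_contiguous(DOMAINS)
unannotated_tail = LISTED_LENGTH - last_annotated
```

The annotated domains end at residue 1664, leaving six residues unannotated; in the sequence listing, residues 1665-1670 of SEQ ID NO:23 are six histidines.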
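(2) Preparation of gRNA
1. Find a 20 bp target sequence. If the first base of the 20 bp target sequence is not G, a G must be added to its 5' end so that it can be transcribed efficiently from the RNA polymerase III U6 promoter. Note that the target sequence must not contain an XhoI or NheI recognition site.
2. Clone the sgRNA into pLX (Addgene) to obtain pLX sgRNA. The following four primers are required, of which R1 and F2 are sgRNA-specific: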
F1:AAACTCGAGTGTACAAAAAAGCAGGCTTTAAAG(SEQ ID NO:18)
R1:rc(GN19)GGTGTTTCGTCCTTTCC(SEQ ID NO:19)
F2:GN19GTTTTAGAGCTAGAAATAGCAA(SEQ ID NO:20)
R2:AAAGCTAGCTAATGCCAACTTTGTACAAGAAAGCTG(SEQ ID NO:21)
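Here, GN19 = the new target-binding sequence, and rc(GN19) = the reverse complement of the new target-binding sequence.
3. Amplify pLX sgRNA with F1+R1 and with F2+R2 in separate reactions;
4. Gel-purify the products of the two amplifications, combine them, and use them as template for a third PCR with F1+R2;
5. Digest the product of the PCR in step 4 with NheI and XhoI; and
6. Ligate and transform, thereby obtaining the sgRNA expression vector.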
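Steps 1 and 2 above can be sketched in code: prepend a G where needed, reject targets carrying an XhoI or NheI site, and assemble the two sgRNA-specific primers from the constant parts of SEQ ID NO:19 and 20. This is an illustrative sketch only; the function names are hypothetical, and the recognition sites (CTCGAG for XhoI, GCTAGC for NheI) are the standard ones rather than values quoted from the text.

```python
XHOI_SITE = "CTCGAG"   # standard XhoI recognition sequence
NHEI_SITE = "GCTAGC"   # standard NheI recognition sequence
R1_CONSTANT = "GGTGTTTCGTCCTTTCC"        # constant 3' part of R1 (SEQ ID NO:19)
F2_CONSTANT = "GTTTTAGAGCTAGAAATAGCAA"   # constant 3' part of F2 (SEQ ID NO:20)


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_primers(target):
    """Return (R1, F2) for a target-binding sequence, following steps 1 and 2."""
    t = target.upper()
    if not t.startswith("G"):
        t = "G" + t  # leading G for efficient transcription from the U6 promoter
    # Both sites are palindromic, so checking one strand covers both.
    for name, site in (("XhoI", XHOI_SITE), ("NheI", NHEI_SITE)):
        if site in t:
            raise ValueError(f"target contains a {name} site ({site})")
    r1 = revcomp(t) + R1_CONSTANT
    f2 = t + F2_CONSTANT
    return r1, f2
```

With the RPS24-E5-5'SS target of SEQ ID NO:9, which starts with T, the sketch prepends a G before building both primers; a target that already starts with G, such as SEQ ID NO:3, is used unchanged.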
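(3) Transfection of cells
293T cells were transfected at 70-90% confluence. For transfection, a plasmid DNA-liposome complex was first prepared: a four-fold amount of the 2000 reagent was diluted in culture medium, the plasmid expressing the fusion protein described herein and the corresponding sgRNA plasmid were each diluted in culture medium, and the diluted plasmids were then each added to the diluted 2000 reagent (1:1) and incubated for 30 minutes. The plasmid DNA-liposome complex was then used to transfect 293T cells. As a control, reporter cells constructed according to Example 4 of CN201710451424.3 were transfected with the plasmid DNA-liposome complex alone. Puromycin (2 μg/ml) and blasticidin (20 μg/ml) were added and cells were selected for 3 days; gene expression, splicing and mutation were analyzed by high-throughput sequencing on day 7 after transfection.
(4) Quantitative PCR, high-throughput sequencing and related methods
Unless otherwise stated, quantitative PCR, high-throughput sequencing and other biological methods herein were carried out using methods and reagents customary in the art.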
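II. Results
1. G-to-A mutation at a splice site causes exon skipping.
RPS24 is a component protein of the ribosome, and mutations in it cause congenital aplastic anemia. Exon 5 of RPS24 can be alternatively spliced to produce two isoforms with different 3' UTRs; hepatocellular carcinoma cells tend to express the isoform that includes exon 5, but its physiological function is unclear.
In this experiment, the TAM technique was used with an sgRNA (RPS24-E5-5'SS, whose target-binding region sequence is shown in SEQ ID NO:9) designed to mutate the G at the 5' splice site or 3' splice site of RPS24 exon 5 to A and thereby regulate its alternative splicing. 293T cells were transfected as described above, and gene expression, splicing and mutation were analyzed by high-throughput sequencing on day 7 after transfection.
In 293T cells, the AIDX-nCas9-Ugi fusion protein, which carries the UNG inhibitor UGI, was directed by the sgRNA to the 5' splice site of RPS24 exon 5. Sequencing showed that more than 40% of the G at the first position of intron 5 (IVS5+1) was mutated to A, that 30% of the last base of exon 5 showed G-to-A mutation, and that two further sites in exon 5 showed G-to-A mutation at less than 10% (Figure 3, A). Sequencing of exon junctions showed that, relative to the control group, the inclusion ratio of exon 5 was reduced in cells transfected with the RPS24 sgRNA (Figure 3, B). Quantitative PCR gave a consistent result (Figure 3, C), and no exonic mutation was found in the mature RNA (Figure 3, A).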
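The editing frequencies and inclusion ratios reported in these results are proportions of sequencing reads. A minimal sketch of both calculations follows; it assumes the reads have already been aligned and counted, and the function names and any numbers passed to them are hypothetical rather than taken from the experiments.

```python
def base_fraction(bases, alt="A"):
    """Fraction of reads showing `alt` at one position.

    `bases` is a string with one base call per read at that position;
    calls other than A/C/G/T (for example N) are ignored.
    """
    counted = [b for b in bases.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("no usable base calls at this position")
    return counted.count(alt) / len(counted)


def percent_spliced_in(inclusion_reads, skipping_reads):
    """Exon inclusion as a percentage of junction reads."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        raise ValueError("no junction reads")
    return 100.0 * inclusion_reads / total
```

Comparing `percent_spliced_in` between sgRNA-treated and control cells gives the change in exon inclusion, while `base_fraction` at a position such as IVS5+1 gives the G-to-A editing frequency.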
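We also obtained two monoclonal cell lines of the same genotype, in which the 5' splice site was completely mutated to A and the exon also carried one G-to-A mutation (Figure 3, D). In these two clones, the isoform including RPS24 exon 5 was undetectable, showing that G-to-A mutation of the 5' splice site can cause skipping of RPS24 exon 5 (Figure 3, E).
These results show that the TAM technique can efficiently mutate the G at a splice site to A and cause exon skipping (mutation of the 5' splice site of RPS24 exon 5).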
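2. G-to-A mutation at the splice sites of CD45 exon 5 causes exon skipping.
To further test whether splice sites can be efficiently disrupted to regulate exon skipping, we chose the three alternative exons of the CD45 gene as targets. CD45 is a receptor tyrosine phosphatase that regulates the development and function of T and B lymphocytes by modulating signaling through antigen receptors (such as the TCR or BCR). The CD45 gene consists of about 33 exons, of which exons 4, 5 and 6, encoding the extracellular A, B and C regions of the CD45 protein respectively, can be alternatively spliced. The expression pattern of CD45 isoforms depends on the developmental stage of T and B cells; the longest CD45 isoform (B220), containing all three alternative exons, is expressed on the surface of B cells.
We designed sgRNAs against the G at the 5' splice site and at the 3' splice site of CD45 exon 5 (CD45-E5-5'SS and CD45-E5-3'SS, whose target-binding region sequences are shown in SEQ ID NO:1 and 2 respectively) and edited the exon 5 splice sites in Raji cells, a germinal center B cell line that expresses the unspliced CD45 isoform. 400 ng of the AIDx-nCas9-Ugi expression plasmid, 300 ng of the sgRNA expression plasmid and 50 ng of a Ugi expression plasmid were electroporated into 1×10^5 Raji cells using a Neon (Life Technologies) at 1,100 V with a single 40 ms pulse. At 24 h after transfection, 2 μg/ml puromycin was added to select transfected cells for 3 days.
We found that the two sgRNAs induced G>A mutation of the splice site in 53.6% and 73.4% of the DNA respectively (Figures 1 and 2). When the splice sites of exon 5 were disrupted, CD45RB expression was markedly reduced while CD45RA and CD45RC expression did not change appreciably. This indicates that splice sites act independently when exon skipping is induced, and that mutation of either the 5'SS or the 3'SS can cause exon skipping.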
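4. G-to-A mutation at the splice site of TP53 exon 8 causes exon skipping.
In this experiment, the TAM technique was used with an sgRNA (TP53-E8-5'SS, whose sequence is shown in SEQ ID NO:7) designed to mutate the G at the 5' splice site of TP53 exon 8 to A and thereby regulate its alternative splicing (Figure 4). 293T cells were transfected as described above, and gene expression, splicing and mutation were analyzed by high-throughput sequencing on day 7 after transfection.
Sequencing showed that more than 80% of the G at the first position of intron 8 (IVS8+1) was mutated to A (Figure 4, A). Sequencing of exon junctions showed that more than 40% of TP53 transcripts in sgRNA-transfected cells skipped exon 8; quantitative PCR (Figure 4, B, C) gave a consistent result, and no exonic mutation was found in the mature RNA. No skipping of exon 8 was detectable in the control group.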
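5. G-to-A mutation at the splice site of TP53 exon 9 causes exon skipping.
We also showed that the same approach can cause skipping of TP53 exon 9. Specifically, 293T cells were transfected with TAM and an sgRNA targeting the 3'SS of TP53 exon 9 (TP53-E9-3'SS, whose target-binding region sequence is shown in SEQ ID NO:8). Seven days after transfection, the intron-exon junction was amplified from genomic DNA and analyzed by high-throughput sequencing. Splicing of TP53 was analyzed by RT-PCR. Splice junctions were amplified from cDNA and analyzed by high-throughput sequencing. The 3'SS mutation caused the exon to be skipped in 34% of total transcripts and activated a cryptic splice site in 23.6% of the mRNA. TAM-treated cells also activated an exon located within intron 8 (4.3% of total transcripts) (Figure 4, D-F).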
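6. Precise editing of a splice site can change the choice of alternative splice sites
Besides exon skipping, RNA splicing can also involve a choice between alternative splice sites, giving rise to new protein isoforms with different physiological functions. For example, use of an alternative splice site in Stat3 exon 23 produces a truncated STAT3β isoform lacking the C-terminal transactivation domain. Full-length STAT3α promotes tumorigenesis, whereas STAT3β acts in a dominant-negative manner, inhibiting STAT3α function and promoting tumor cell apoptosis. In breast cancer cells in particular, inducing STAT3β expression inhibits cell survival more effectively than knocking out total STAT3 expression, suggesting that induction of STAT3β could serve as a tumor therapy strategy. Because the canonical and alternative splice sites of STAT3 are only 50 bp apart, it is difficult to induce STAT3β expression with the conventional dual-sgRNA excision approach, whereas the TAM technique offers a more precise way of editing the gene. In this experiment, an sgRNA was designed to disrupt the canonical splice site: TAM was used to eliminate the canonical 3'SS of Stat3 exon 23 (Stat3α) and promote use of the downstream alternative 3'SS (Stat3β), as shown schematically in Figure 5(A). 293T cells were transfected with AIDx-nCas9-Ugi and either an sgRNA targeting Stat3 exon 23 (STAT3-E23-3'SS, whose target-binding region sequence is shown in SEQ ID NO:3) or an sgRNA against AAVS1 (Ctrl). Intron-exon junctions were amplified from DNA (top two panels) or cDNA (bottom two panels) and analyzed by high-throughput sequencing. Expressing TAM and the sgRNA in 293T cells in this way mutated more than 50% of the G at the 3' splice site to A (Figure 5, B). The results show that TAM enhanced use of the distal 3'SS in Stat3 exon 23 (Figure 5, C). Quantitative PCR and immunoblotting showed that STAT3β expression was increased and STAT3α expression was reduced (Figure 5, E-F). As expected, proliferation of TAM-edited cells was inhibited more strongly than that of cells in which total STAT3 protein expression was knocked out.
These results show that, when alternative splice sites lie very close together, the TAM technique overcomes the limitation of the conventional dual-sgRNA excision approach, precisely disrupting the chosen splice site and regulating the choice of alternative splice sites.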
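7. Mutually exclusive exons
Mutually exclusive exons are another major form of alternative splicing: one or the other exon is selectively included in different transcripts, producing proteins with different functions. Pyruvate kinase (PKM) is a rate-limiting enzyme of glycolysis. During splicing, exons 9 and 10 of PKM can be selectively included to produce two isoforms, PKM1 and PKM2. PKM1 includes exon 9 but not exon 10 and is mainly expressed in adult tissues, whereas PKM2 includes exon 10 but not exon 9 and is mainly expressed in embryonic stem cells and tumor cells. Because PKM2 is associated with tumorigenesis, we sought to use the TAM technique to switch PKM splicing in tumor cells from PKM2 to PKM1.
Figure 6(A) is a schematic of TAM switching PKM2 to PKM1 in C2C12 cells. In the upper panel, exon 10 rather than exon 9 of the PKM gene is spliced in to produce PKM2, whose cDNA is recognized by the restriction enzyme PstI. In the lower panel, TAM converts the GT dinucleotide at the 5'SS of exon 10 to AT; as a result, exon 9 rather than exon 10 is spliced in to produce PKM1, whose cDNA is recognized by the restriction enzyme NcoI.
We designed sgRNAs against the 3' or 5' splice site of intron 10 (PKM-3'SS-E10 or PKM-5'SS-E10, whose target-binding region sequences are shown in SEQ ID NO:15 or 16 respectively) and infected C2C12 cells so that the G was mutated to A (Figure 6, C, D). In muscle cells differentiated from C2C12, PKM2 expression was markedly reduced and PKM1 expression was increased (Figure 6, B, E, F). Likewise, in undifferentiated C2C12 cells, PKM2 expression was markedly reduced and PKM1 expression was increased (Figure 6, G, H).
Conversely, sgRNAs against the 5' or 3' splice site of intron 9 (PKM-3'SS-E9, PKM-5'SS-E9, whose target-binding region sequences are shown in SEQ ID NO:13 or 14 respectively) mutated the G to A, with PKM1 expression reduced (Figure 7) and PKM2 expression increased. This further shows that mutation of a splice site can change the choice between mutually exclusive exons.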
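8. Inducing intron inclusion
Intron inclusion is another form of alternative splicing, and recent studies have shown that it occurs in many human diseases, including tumors. We show that using TAM and an sgRNA to disrupt the splice site of the corresponding intron can specifically induce inclusion of that intron.
BAP1 is a histone deubiquitinase; its second intron is retained in some tumors, which lowers BAP1 expression. The second intron of BAP1 is probably spliced by intron definition, in which the 5'SS pairs with the downstream 3'SS. Converting the G to A at the 5'SS disrupts its recognition by the U1 RNP and breaks intron definition, so that the intron is included. In this experiment, TAM was directed to mutate the G at the 5' splice site of BAP1 intron 2, as shown schematically in Figure 8(A).
We designed an sgRNA against the 5' splice site of intron 2 (BAP1-E2-5'SS, whose target-binding region sequence is shown in SEQ ID NO:5) and transfected 293T cells with the AIDx-nCas9-Ugi expression plasmid and an expression plasmid for either an sgRNA against AAVS1 (Ctrl) or this sgRNA against BAP1 intron 2. Seven days after transfection, splicing of BAP1 mRNA was analyzed by RT-PCR (Figure 8, B) or isoform-specific real-time PCR (Figure 8, C). The results show that more than 70% of the G was mutated to A (Figure 8, D). After mutation, inclusion of intron 2 was induced, with more than 60% of BAP1 mRNA containing intron 2. Similarly, mutating the 3' splice site of intron 2 (sgRNA sequence shown in SEQ ID NO:6, BAP1-E3-3'SS) also induced intron inclusion in BAP1 (Figure 9, B-E).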
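9. C-to-T mutation at position -3 of the 3' splice site can promote exon inclusion
Besides splice sites, other cis-acting elements in the mRNA can also alter pre-mRNA splicing, so the TAM technique can also be used to edit other splicing regulatory elements. Because changes to intronic elements do not affect the expressed sequence of the gene, we focused on editing intronic splicing regulatory elements. Upstream of the 3' splice site lies a polypyrimidine tract composed of cytosine (C) and thymine (T). Our experiments show that TAM and the corresponding sgRNA can mutate C to T in the polypyrimidine tract, strengthening the 3' splice site and promoting inclusion of the downstream exon.
293T cells were transfected with the AIDx-nCas9-Ugi expression plasmid and an expression plasmid for either an sgRNA against AAVS1 (Ctrl) or an sgRNA against the polypyrimidine tract of RPS24 exon 5 (RPS24-E5-PPT, whose target-binding region sequence is shown in SEQ ID NO:10). Six days after transfection, the sgRNA-targeted region was amplified from genomic DNA and analyzed by high-throughput sequencing at more than 8000x coverage. The results show that more than 50% of the C in the polypyrimidine tract was mutated to T, and that the inclusion rate of exon 5 increased (Figure 11, B, C). We then sorted two single-cell clones carrying complete C-to-T mutation, in which the inclusion rate of exon 5 increased 8-fold and 5-fold respectively (Figure 11, E).
In addition, we transfected 293T cells with the AIDx-nCas9-Ugi expression plasmid and an expression plasmid for either a control sgRNA (Ctrl) or an sgRNA targeting the PPT of GANAB exon 6 (GANAB-E6-PPT, whose target-binding region sequence is shown in SEQ ID NO:4). Six days after transfection, the sgRNA-targeted region was amplified from genomic DNA and analyzed by high-throughput sequencing at more than 8000x coverage. The results are shown in Figure 10 (B-E): several C positions were mutated to T, the highest being IVS5-6C, at which more than 70% of the C was mutated to T, and high-throughput sequencing showed that inclusion of exon 6 increased by 50%. A similar approach also increased inclusion of ThyN1 exon 6 (sgRNA target-binding region sequence shown in SEQ ID NO:12, THYN1-E6-PPT) (Figure 10, F-G) and of OS9 exon 13 (sgRNA target-binding region sequence shown in SEQ ID NO:11, OS9-E13-PPT) (Figure 10, H-I).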
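10. The TAM technique can restore DMD protein expression in human iPS cells and in the mdx mouse model (C2C12 and iPS)
Duchenne muscular dystrophy (DMD) is a muscle-wasting disease affecting about 1 in 4,000 males in the United States. Heritable mutations in the patient's DMD gene alter its open reading frame or create a premature stop codon, leading to a deficiency of dystrophin in skeletal muscle and hence to disease. Compared with such mutated DMD genes, a truncated dystrophin retains partial function and causes the milder Becker muscular dystrophy. Studies have therefore used antisense oligonucleotides, or dual-sgRNA-mediated CRISPR, to skip certain exons so that the DMD open reading frame is restored and dystrophin expression is promoted. This approach of skipping non-essential regions of the DMD gene to partially restore dystrophin expression is expected to benefit 80% of DMD patients. However, treatment with antisense oligonucleotides requires continuous dosing, which is extremely costly in time and money, so new gene therapy strategies for DMD are much needed.
To test whether the TAM technique can regulate exon skipping in the DMD gene, we used iPS cells from a DMD patient lacking exon 51. Sequence analysis showed that the dystrophin open reading frame can be restored when exon 50 is skipped using an sgRNA (whose target-binding region sequence is shown in SEQ ID NO:17, DMD EXON50 5'SS) (Figure 12). We transfected the patient-derived iPSCs with the expression plasmid for the sgRNA (target-binding region sequence shown in SEQ ID NO:17) and the AIDx-nCas9-Ugi expression plasmid; high-throughput sequencing showed that more than 12% G>A mutation was induced (Figure 12, B), and we then isolated a monoclonal cell line carrying complete G>A mutation (Figure 12, B). We then differentiated the iPSCs into cardiomyocytes and found that the TAM-edited cells skipped exon 50 (Figure 12, C, D). Western blotting further showed that dystrophin expression was restored in the TAM-repaired cells (Figure 12, E).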
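The reading-frame argument above is simple arithmetic: a transcript stays in frame only if the total number of nucleotides removed is a multiple of three. The sketch below illustrates this; the exon lengths (109 bp for exon 50 and 233 bp for exon 51) are commonly cited values for human DMD and are not given in this document, so treat them as assumptions, and the function name is hypothetical.

```python
# Assumed exon lengths in base pairs (commonly cited for human DMD,
# not stated in this document).
EXON_LENGTHS = {50: 109, 51: 233}


def frame_preserved(removed_exons, lengths=EXON_LENGTHS):
    """True if removing the listed exons keeps the downstream reading frame."""
    return sum(lengths[e] for e in removed_exons) % 3 == 0
```

Under these assumed lengths, losing exon 51 alone shifts the frame, whereas losing exons 50 and 51 together removes 342 nucleotides, a multiple of three, which is consistent with the restoration of the open reading frame described above.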
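In the same type of experiment, AIDx-saCas9(KKH, nickase)-Ugi (coding sequence shown in SEQ ID NO:49, amino acid sequence shown in SEQ ID NO:50) and the corresponding sgRNA (whose sequence is shown in SEQ ID NO:51 and whose scaffold sequence is shown in SEQ ID NO:52) were used to induce skipping of DMD exon 50. Specifically, iPSCs from a Duchenne patient were treated with a control sgRNA (ctrl) or the target sgRNA (E50-5'SS) together with AIDx-saCas9(KKH, nickase)-Ugi, after which the corresponding DNA was amplified by PCR and the induced mutations were analyzed by high-throughput sequencing. The data are representative of two independent experiments. The results are shown in Figure 14(A). iPSCs from a healthy individual, patient-derived iPSCs and repaired patient-derived iPSCs were each differentiated into cardiomyocytes, and expression of the DMD gene and dystrophin was then examined by RT-PCR, western blot or immunofluorescence staining; the results are shown in Figure 14, B, C and D respectively. Figure 14, E, F and G show that the repaired cardiomyocytes reversed the dystrophic phenotype, as assessed by creatine kinase release induced by hypo-osmotic stress (E), miR31 expression (F), and expression of β-dystroglycan protein (G). In addition, whole-genome sequencing demonstrated the high specificity of the gene editing: two rounds of whole-genome sequencing found only one off-target site (Figure 14, H and I).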
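The sequence information referred to herein is as follows:
Sequence Listing
<110> Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
<120> Method for regulating RNA splicing by inducing base mutation at splice sites or base substitution in the polypyrimidine tract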
<130> 172473Z1
<150> CN 201710611651.8
<151> 2017-07-25
<160> 52
<170> SIPOSequenceListing 1.0
<210> 1
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 1
cctgagatag cattgctgcc 20
<210> 2
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 2
aacacctaag gtaggaaagt 20
<210> 3
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 3
gtcgttctgt aggaaatggg 20
<210> 4
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 4
ctgccccagt ttctcggata 20
<210> 5
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 5
taccgaaatc ttccacgagc 20
<210> 6
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 6
cacctgcgat gaggaaagga 20
<210> 7
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 7
cctcgcttag tgctccctgg 20
<210> 8
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 8
gctaggaaag aggcaaggaa 20
<210> 9
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 9
tatacctgtg atccaatctc 20
<210> 10
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 10
tgattcagtg agctggagat 20
<210> 11
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 11
cccctctaag aggaggatcc 20
<210> 12
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 12
gtacactgtt gtcacatagg 20
<210> 13
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 13
ctatctgtaa ggtttagggt 20
<210> 14
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 14
ccctacctgc cagactccgt 20
<210> 15
<211> 23
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 15
ctaggggagc aacatccgtc cag 23
<210> 16
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 16
tcctacctgc cagacttggt 20
<210> 17
<211> 20
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 17
atacttacag gctccaatag 20
<210> 18
<211> 33
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 18
aaactcgagt gtacaaaaaa gcaggcttta aag 33
<210> 19
<211> 37
<212> DNA
<213> 人工序列(Artificial Sequence)
<220>
<221> misc_feature
<222> (2)..(20)
<223> n is a, c, g or t
<400> 19
gnnnnnnnnn nnnnnnnnnn ggtgtttcgt cctttcc 37
<210> 20
<211> 42
<212> DNA
<213> 人工序列(Artificial Sequence)
<220>
<221> misc_feature
<222> (2)..(20)
<223> n is a, c, g or t
<400> 20
gnnnnnnnnn nnnnnnnnnn gttttagagc tagaaatagc aa 42
<210> 21
<211> 36
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 21
aaagctagct aatgccaact ttgtacaaga aagctg 36
<210> 22
<211> 5013
<212> DNA
<213> 人工序列(Artificial Sequence)
<400> 22
atggacagcc tcttgatgaa ccggaggaag tttctttacc aattcaaaaa tgtccgctgg 60
gctaagggtc ggcgtgagac ctacctgtgc tacgtagtga agaggcgtga cagtgctaca 120
tccttttcac tggactttgg ttatcttcgc aataagaacg gctgccacgt ggaattgctc 180
ttcctccgct acatctcgga ctgggaccta gaccctggcc gctgctaccg cgtcacctgg 240
ttcacctcct ggagcccctg ctacgactgt gcccgacatg tggccgactt tctgcgaggg 300
aaccccaacc tcagtctgag gatcttcacc gcgcgcctct acttctgtga ggaccgcaag 360
gctgagcccg aggggctgcg gcggctgcac cgcgccgggg tgcaaatagc catcatgacc 420
ttcaaagatt atttttactg ctggaatact tttgtagaaa accatgaaag aactttcaaa 480
gcctgggaag ggctgcatga aaattcagtt cgtctctcca gacagcttcg gcgcatcctt 540
ttgcccagcg gcagcgagac tcccgggacc tcagagtccg ccacacccga aagtatggat 600
aagaaatact caataggctt agctatcggc acaaatagcg tcggatgggc ggtgatcact 660
gatgaatata aggttccgtc taaaaagttc aaggttctgg gaaatacaga ccgccacagt 720
atcaaaaaaa atcttatagg ggctctttta tttgacagtg gagagacagc ggaagcgact 780
cgtctcaaac ggacagctcg tagaaggtat acacgtcgga agaatcgtat ttgttatcta 840
caggagattt tttcaaatga gatggcgaaa gtagatgata gtttctttca tcgacttgaa 900
gagtcttttt tggtggaaga agacaagaag catgaacgtc atcctatttt tggaaatata 960
gtagatgaag ttgcttatca tgagaaatat ccaactatct atcatctgcg aaaaaaattg 1020
gtagattcta ctgataaagc ggatttgcgc ttaatctatt tggccttagc gcatatgatt 1080
aagtttcgtg gtcatttttt gattgaggga gatttaaatc ctgataatag tgatgtggac 1140
aaactattta tccagttggt acaaacctac aatcaattat ttgaagaaaa ccctattaac 1200
gcaagtggag tagatgctaa agcgattctt tctgcacgat tgagtaaatc aagacgatta 1260
gaaaatctca ttgctcagct ccccggtgag aagaaaaatg gcttatttgg gaatctcatt 1320
gctttgtcat tgggtttgac ccctaatttt aaatcaaatt ttgatttggc agaagatgct 1380
aaattacagc tttcaaaaga tacttacgat gatgatttag ataatttatt ggcgcaaatt 1440
ggagatcaat atgctgattt gtttttggca gctaagaatt tatcagatgc tattttactt 1500
tcagatatcc taagagtaaa tactgaaata actaaggctc ccctatcagc ttcaatgatt 1560
aaacgctacg atgaacatca tcaagacttg actcttttaa aagctttagt tcgacaacaa 1620
cttccagaaa agtataaaga aatctttttt gatcaatcaa aaaacggata tgcaggttat 1680
attgatgggg gagctagcca agaagaattt tataaattta tcaaaccaat tttagaaaaa 1740
atggatggta ctgaggaatt attggtgaaa ctaaatcgtg aagatttgct gcgcaagcaa 1800
cggacctttg acaacggctc tattccccat caaattcact tgggtgagct gcatgctatt 1860
ttgagaagac aagaagactt ttatccattt ttaaaagaca atcgtgagaa gattgaaaaa 1920
atcttgactt ttcgaattcc ttattatgtt ggtccattgg cgcgtggcaa tagtcgtttt 1980
gcatggatga ctcggaagtc tgaagaaaca attaccccat ggaattttga agaagttgtc 2040
gataaaggtg cttcagctca atcatttatt gaacgcatga caaactttga taaaaatctt 2100
ccaaatgaaa aagtactacc aaaacatagt ttgctttatg agtattttac ggtttataac 2160
gaattgacaa aggtcaaata tgttactgaa ggaatgcgaa aaccagcatt tctttcaggt 2220
gaacagaaga aagccattgt tgatttactc ttcaaaacaa atcgaaaagt aaccgttaag 2280
caattaaaag aagattattt caaaaaaata gaatgttttg atagtgttga aatttcagga 2340
gttgaagata gatttaatgc ttcattaggt acctaccatg atttgctaaa aattattaaa 2400
gataaagatt ttttggataa tgaagaaaat gaagatatct tagaggatat tgttttaaca 2460
ttgaccttat ttgaagatag ggagatgatt gaggaaagac ttaaaacata tgctcacctc 2520
tttgatgata aggtgatgaa acagcttaaa cgtcgccgtt atactggttg gggacgtttg 2580
tctcgaaaat tgattaatgg tattagggat aagcaatctg gcaaaacaat attagatttt 2640
ttgaaatcag atggttttgc caatcgcaat tttatgcagc tgatccatga tgatagtttg 2700
acatttaaag aagacattca aaaagcacaa gtgtctggac aaggcgatag tttacatgaa 2760
catattgcaa atttagctgg tagccctgct attaaaaaag gtattttaca gactgtaaaa 2820
gttgttgatg aattggtcaa agtaatgggg cggcataagc cagaaaatat cgttattgaa 2880
atggcacgtg aaaatcagac aactcaaaag ggccagaaaa attcgcgaga gcgtatgaaa 2940
cgaatcgaag aaggtatcaa agaattagga agtcagattc ttaaagagca tcctgttgaa 3000
aatactcaat tgcaaaatga aaagctctat ctctattatc tccaaaatgg aagagacatg 3060
tatgtggacc aagaattaga tattaatcgt ttaagtgatt atgatgtcga tcacattgtt 3120
ccacaaagtt tccttaaaga cgattcaata gacaataagg tcttaacgcg ttctgataaa 3180
aatcgtggta aatcggataa cgttccaagt gaagaagtag tcaaaaagat gaaaaactat 3240
tggagacaac ttctaaacgc caagttaatc actcaacgta agtttgataa tttaacgaaa 3300
gctgaacgtg gaggtttgag tgaacttgat aaagctggtt ttatcaaacg ccaattggtt 3360
gaaactcgcc aaatcactaa gcatgtggca caaattttgg atagtcgcat gaatactaaa 3420
tacgatgaaa atgataaact tattcgagag gttaaagtga ttaccttaaa atctaaatta 3480
gtttctgact tccgaaaaga tttccaattc tataaagtac gtgagattaa caattaccat 3540
catgcccatg atgcgtatct aaatgccgtc gttggaactg ctttgattaa gaaatatcca 3600
aaacttgaat cggagtttgt ctatggtgat tataaagttt atgatgttcg taaaatgatt 3660
gctaagtctg agcaagaaat aggcaaagca accgcaaaat atttctttta ctctaatatc 3720
atgaacttct tcaaaacaga aattacactt gcaaatggag agattcgcaa acgccctcta 3780
atcgaaacta atggggaaac tggagaaatt gtctgggata aagggcgaga ttttgccaca 3840
gtgcgcaaag tattgtccat gccccaagtc aatattgtca agaaaacaga agtacagaca 3900
ggcggattct ccaaggagtc aattttacca aaaagaaatt cggacaagct tattgctcgt 3960
aaaaaagact gggatccaaa aaaatatggt ggttttgata gtccaacggt agcttattca 4020
gtcctagtgg ttgctaaggt ggaaaaaggg aaatcgaaga agttaaaatc cgttaaagag 4080
ttactaggga tcacaattat ggaaagaagt tcctttgaaa aaaatccgat tgacttttta 4140
gaagctaaag gatataagga agttaaaaaa gacttaatca ttaaactacc taaatatagt 4200
ctttttgagt tagaaaacgg tcgtaaacgg atgctggcta gtgccggaga attacaaaaa 4260
ggaaatgagc tggctctgcc aagcaaatat gtgaattttt tatatttagc tagtcattat 4320
gaaaagttga agggtagtcc agaagataac gaacaaaaac aattgtttgt tgagcagcat 4380
aagcattatt tagatgagat tattgagcaa atcagtgaat tttctaagcg tgttatttta 4440
gcagatgcca atttagataa agttcttagt gcatataaca aacatagaga caaaccaata 4500
cgtgaacaag cagaaaatat tattcattta tttacgttga cgaatcttgg agctcccgct 4560
gcttttaaat attttgatac aacaattgat cgtaaacgat atacgtctac aaaagaagtt 4620
ttagatgcca ctcttatcca tcaatccatc actggtcttt atgaaacacg cattgatttg 4680
agtcagctag gaggtgactc tggtggttct actaatctgt cagatattat tgaaaaggag 4740
accggtaagc aactggttat ccaggaatcc atcctcatgc tcccagagga ggtggaagaa 4800
gtcattggga acaagccgga aagcgatata ctcgtgcaca ccgcctacga cgagagcacc 4860
gacgagaatg tcatgcttct gactagcgac gcccctgaat acaagccttg ggctctggtc 4920
atacaggata gcaacggtga gaacaagatt aagatgctct ctggtggttc tcccaagaag 4980
aagaggaaag tccatcacca ccaccatcac taa 5013
<210> 23
<211> 1670
<212> PRT
<213> 人工序列(Artificial Sequence)
<400> 23
Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys
1 5 10 15
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
20 25 30
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
35 40 45
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
50 55 60
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
65 70 75 80
Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
85 90 95
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
100 105 110
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
115 120 125
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
130 135 140
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys
145 150 155 160
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
165 170 175
Arg Arg Ile Leu Leu Pro Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu
180 185 190
Ser Ala Thr Pro Glu Ser Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala
195 200 205
Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys
210 215 220
Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser
225 230 235 240
Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr
245 250 255
Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg
260 265 270
Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met
275 280 285
Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu
290 295 300
Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile
305 310 315 320
Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu
325 330 335
Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile
340 345 350
Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile
355 360 365
Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile
370 375 380
Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn
385 390 395 400
Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys
405 410 415
Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys
420 425 430
Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro
435 440 445
Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu
450 455 460
Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile
465 470 475 480
Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp
485 490 495
Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys
500 505 510
Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln
515 520 525
Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys
530 535 540
Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr
545 550 555 560
Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro
565 570 575
Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn
580 585 590
Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile
595 600 605
Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln
610 615 620
Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys
625 630 635 640
Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly
645 650 655
Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr
660 665 670
Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser
675 680 685
Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys
690 695 700
Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn
705 710 715 720
Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala
725 730 735
Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys
740 745 750
Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys
755 760 765
Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg
770 775 780
Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys
785 790 795 800
Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp
805 810 815
Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu
820 825 830
Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln
835 840 845
Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu
850 855 860
Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe
865 870 875 880
Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His
885 890 895
Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser
900 905 910
Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser
915 920 925
Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu
930 935 940
Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu
945 950 955 960
Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg
965 970 975
Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln
980 985 990
Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys
995 1000 1005
Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln
1010 1015 1020
Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val
1025 1030 1035 1040
Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr
1045 1050 1055
Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu
1060 1065 1070
Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys
1075 1080 1085
Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly
1090 1095 1100
Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val
1105 1110 1115 1120
Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg
1125 1130 1135
Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys
1140 1145 1150
Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe
1155 1160 1165
Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp
1170 1175 1180
Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1185 1190 1195 1200
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val
1205 1210 1215
Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala
1220 1225 1230
Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile
1235 1240 1245
Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn
1250 1255 1260
Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr
1265 1270 1275 1280
Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr
1285 1290 1295
Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg
1300 1305 1310
Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys
1315 1320 1325
Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val
1330 1335 1340
Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1345 1350 1355 1360
Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro
1365 1370 1375
Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu
1380 1385 1390
Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg
1395 1400 1405
Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu
1410 1415 1420
Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1425 1430 1435 1440
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe
1445 1450 1455
Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser
1460 1465 1470
Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val
1475 1480 1485
Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala
1490 1495 1500
Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala
1505 1510 1515 1520
Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser
1525 1530 1535
Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly
1540 1545 1550
Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Ser Gly
1555 1560 1565
Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln
1570 1575 1580
Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu
1585 1590 1595 1600
Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr
1605 1610 1615
Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro
1620 1625 1630
Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn
1635 1640 1645
Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1650 1655 1660
His His His His His His
1665 1670
<210> 24
<211> 4989
<212> DNA
<213> Artificial Sequence
<400> 24
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaacg aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtcgccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttggtagtg atgctttaga cgattttgac 4320
ttagatatgc ttggttcaga cgcgttagac gacttcggtg gaggatccat ggacagcctc 4380
ttgatgaacc ggaggaagtt tctttaccaa ttcaaaaatg tccgctgggc taagggtcgg 4440
cgtgagacct acctgtgcta cgtagtgaag aggcgtgaca gtgctacatc cttttcactg 4500
gactttggtt atcttcgcaa taagaacggc tgccacgtgg aattgctctt cctccgctac 4560
atctcggact gggacctaga ccctggccgc tgctaccgcg tcacctggtt cacctcctgg 4620
agcccctgct acgactgtgc ccgacatgtg gccgactttc tgcgagggaa ccccaacctc 4680
agtctgagga tcttcaccgc gcgcctctac ttctgtgagg accgcaaggc tgagcccgag 4740
gggctgcggc ggctgcaccg cgccggggtg caaatagcca tcatgacctt caaagattat 4800
ttttactgct ggaatacttt tgtagaaaac catgaaagaa ctttcaaagc ctgggaaggg 4860
ctgcatgaaa attcagttcg tctctccaga cagcttcggc gcatcctttt gcccctgtat 4920
gaggttgatg acttacgaga cgcatttcgt acttggggac gtgattacaa agacgatgac 4980
gataagtga 4989
<210> 25
<211> 1662
<212> PRT
<213> Artificial Sequence
<400> 25
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His
1010 1015 1020
Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1025 1030 1035 1040
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val
1045 1050 1055
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys
1060 1065 1070
Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1075 1080 1085
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile
1090 1095 1100
Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp
1105 1110 1115 1120
Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val
1125 1130 1135
Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu
1140 1145 1150
Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp
1155 1160 1165
Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1170 1175 1180
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1185 1190 1195 1200
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu
1205 1210 1215
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1220 1225 1230
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly
1250 1255 1260
Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1265 1270 1275 1280
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys
1285 1290 1295
Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu
1300 1305 1310
Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1315 1320 1325
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg
1330 1335 1340
Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly
1345 1350 1355 1360
Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
1365 1370 1375
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1380 1385 1390
Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
1395 1400 1405
Asp Glu Gly Ala Pro Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser
1410 1415 1420
Pro Lys Lys Lys Arg Lys Val Gly Ser Asp Ala Leu Asp Asp Phe Asp
1425 1430 1435 1440
Leu Asp Met Leu Gly Ser Asp Ala Leu Asp Asp Phe Gly Gly Gly Ser
1445 1450 1455
Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys
1460 1465 1470
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
1475 1480 1485
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
1490 1495 1500
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
1505 1510 1515 1520
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
1525 1530 1535
Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
1540 1545 1550
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
1555 1560 1565
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
1570 1575 1580
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
1585 1590 1595 1600
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys
1605 1610 1615
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
1620 1625 1630
Arg Arg Ile Leu Leu Pro Leu Tyr Glu Val Asp Asp Leu Arg Asp Ala
1635 1640 1645
Phe Arg Thr Trp Gly Arg Asp Tyr Lys Asp Asp Asp Asp Lys
1650 1655 1660
<210> 26
<211> 4941
<212> DNA
<213> Artificial Sequence
<400> 26
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaacg aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtcgccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttggtagtg atgctttaga cgattttgac 4320
ttagatatgc ttggttcaga cgcgttagac gacttcggtg gaggatccat ggacagcctc 4380
ttgatgaacc ggaggaagtt tctttaccaa ttcaaaaatg tccgctgggc taagggtcgg 4440
cgtgagacct acctgtgcta cgtagtgaag aggcgtgaca gtgctacatc cttttcactg 4500
gactttggtt atcttcgcaa taagaacggc tgccacgtgg aattgctctt cctccgctac 4560
atctcggact gggacctaga ccctggccgc tgctaccgcg tcacctggtt cacctcctgg 4620
agcccctgct acgactgtgc ccgacatgtg gccgactttc tgcgagggaa ccccaacctc 4680
agtctgagga tcttcaccgc gcgcctctac ttctgtgagg accgcaaggc tgagcccgag 4740
gggctgcggc ggctgcaccg cgccggggtg caaatagcca tcatgacctt caaagattat 4800
ttttactgct ggaatacttt tgtagaaaac catgaaagaa ctttcaaagc ctgggaaggg 4860
ctgcatgaaa attcagttcg tctctccaga cagcttcggc gcatcctttt gcccgattac 4920
aaagacgatg acgataagtg a 4941
<210> 27
<211> 1646
<212> PRT
<213> Artificial Sequence
<400> 27
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His
1010 1015 1020
Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1025 1030 1035 1040
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val
1045 1050 1055
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys
1060 1065 1070
Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1075 1080 1085
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile
1090 1095 1100
Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp
1105 1110 1115 1120
Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val
1125 1130 1135
Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu
1140 1145 1150
Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp
1155 1160 1165
Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1170 1175 1180
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1185 1190 1195 1200
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu
1205 1210 1215
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1220 1225 1230
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly
1250 1255 1260
Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1265 1270 1275 1280
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys
1285 1290 1295
Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu
1300 1305 1310
Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1315 1320 1325
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg
1330 1335 1340
Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly
1345 1350 1355 1360
Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
1365 1370 1375
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1380 1385 1390
Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
1395 1400 1405
Asp Glu Gly Ala Pro Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser
1410 1415 1420
Pro Lys Lys Lys Arg Lys Val Gly Ser Asp Ala Leu Asp Asp Phe Asp
1425 1430 1435 1440
Leu Asp Met Leu Gly Ser Asp Ala Leu Asp Asp Phe Gly Gly Gly Ser
1445 1450 1455
Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys
1460 1465 1470
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
1475 1480 1485
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
1490 1495 1500
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
1505 1510 1515 1520
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
1525 1530 1535
Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
1540 1545 1550
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
1555 1560 1565
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
1570 1575 1580
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
1585 1590 1595 1600
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys
1605 1610 1615
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
1620 1625 1630
Arg Arg Ile Leu Leu Pro Asp Tyr Lys Asp Asp Asp Asp Lys
1635 1640 1645
<210> 28
<211> 4731
<212> DNA
<213> Artificial Sequence
<400> 28
atggacagcc tcttgatgaa ccggaggaag tttctttacc aattcaaaaa tgtccgctgg 60
gctaagggtc ggcgtgagac ctacctgtgc tacgtagtga agaggcgtga cagtgctaca 120
tccttttcac tggactttgg ttatcttcgc aataagaacg gctgccacgt ggaattgctc 180
ttcctccgct acatctcgga ctgggaccta gaccctggcc gctgctaccg cgtcacctgg 240
ttcacctcct ggagcccctg ctacgactgt gcccgacatg tggccgactt tctgcgaggg 300
aaccccaacc tcagtctgag gatcttcacc gcgcgcctct acttctgtga ggaccgcaag 360
gctgagcccg aggggctgcg gcggctgcac cgcgccgggg tgcaaatagc catcatgacc 420
ttcaaagatt atttttactg ctggaatact tttgtagaaa accatgaaag aactttcaaa 480
gcctgggaag ggctgcatga aaattcagtt cgtctctcca gacagcttcg gcgcatcctt 540
ttgcccagcg gcagcgagac tcccgggacc tcagagtccg ccacacccga aagtgataaa 600
aagtattcta ttggtttagc catcggcact aattccgttg gatgggctgt cataaccgat 660
gaatacaaag taccttcaaa gaaatttaag gtgttgggga acacagaccg tcattcgatt 720
aaaaagaatc ttatcggtgc cctcctattc gatagtggcg aaacggcaga ggcgactcgc 780
ctgaaacgaa ccgctcggag aaggtataca cgtcgcaaga accgaatatg ttacttacaa 840
gaaattttta gcaatgagat ggccaaagtt gacgattctt tctttcaccg tttggaagag 900
tccttccttg tcgaagagga caagaaacat gaacggcacc ccatctttgg aaacatagta 960
gatgaggtgg catatcatga aaagtaccca acgatttatc acctcagaaa aaagctagtt 1020
gactcaactg ataaagcgga cctgaggtta atctacttgg ctcttgccca tatgataaag 1080
ttccgtgggc actttctcat tgagggtgat ctaaatccgg acaactcgga tgtcgacaaa 1140
ctgttcatcc agttagtaca aacctataat cagttgtttg aagagaaccc tataaatgca 1200
agtggcgtgg atgcgaaggc tattcttagc gcccgcctct ctaaatcccg acggctagaa 1260
aacctgatcg cacaattacc cggagagaag aaaaatgggt tgttcggtaa ccttatagcg 1320
ctctcactag gcctgacacc aaattttaag tcgaacttcg acttagctga agatgccaaa 1380
ttgcagctta gtaaggacac gtacgatgac gatctcgaca atctactggc acaaattgga 1440
gatcagtatg cggacttatt tttggctgcc aaaaacctta gcgatgcaat cctcctatct 1500
gacatactga gagttaatac tgagattacc aaggcgccgt tatccgcttc aatgatcaaa 1560
aggtacgatg aacatcacca agacttgaca cttctcaagg ccctagtccg tcagcaactg 1620
cctgagaaat ataaggaaat attctttgat cagtcgaaaa acgggtacgc aggttatatt 1680
gacggcggag cgagtcaaga ggaattctac aagtttatca aacccatatt agagaagatg 1740
gatgggacgg aagagttgct tgtaaaactc aatcgcgaag atctactgcg aaagcagcgg 1800
actttcgaca acggtagcat tccacatcaa atccacttag gcgaattgca tgctatactt 1860
agaaggcagg aggattttta tccgttcctc aaagacaatc gtgaaaagat tgagaaaatc 1920
ctaacctttc gcatacctta ctatgtggga cccctggccc gagggaactc tcggttcgca 1980
tggatgacaa gaaagtccga agaaacgatt actccatgga attttgagga agttgtcgat 2040
aaaggtgcgt cagctcaatc gttcatcgag aggatgacca actttgacaa gaatttaccg 2100
aacgaaaaag tattgcctaa gcacagttta ctttacgagt atttcacagt gtacaatgaa 2160
ctcacgaaag ttaagtatgt cactgagggc atgcgtaaac ccgcctttct aagcggagaa 2220
cagaagaaag caatagtaga tctgttattc aagaccaacc gcaaagtgac agttaagcaa 2280
ttgaaagagg actactttaa gaaaattgaa tgcttcgatt ctgtcgagat ctccggggta 2340
gaagatcgat ttaatgcgtc acttggtacg tatcatgacc tcctaaagat aattaaagat 2400
aaggacttcc tggataacga agagaatgaa gatatcttag aagatatagt gttgactctt 2460
accctctttg aagatcggga aatgattgag gaaagactaa aaacatacgc tcacctgttc 2520
gacgataagg ttatgaaaca gttaaagagg cgtcgctata cgggctgggg acgattgtcg 2580
cggaaactta tcaacgggat aagagacaag caaagtggta aaactattct cgattttcta 2640
aagagcgacg gcttcgccaa taggaacttt atgcagctga tccatgatga ctctttaacc 2700
ttcaaagagg atatacaaaa ggcacaggtt tccggacaag gggactcatt gcacgaacat 2760
attgcgaatc ttgctggttc gccagccatc aaaaagggca tactccagac agtcaaagta 2820
gtggatgagc tagttaaggt catgggacgt cacaaaccgg aaaacattgt aatcgagatg 2880
gcacgcgaaa atcaaacgac tcagaagggg caaaaaaaca gtcgagagcg gatgaagaga 2940
atagaagagg gtattaaaga actgggcagc cagatcttaa aggagcatcc tgtggaaaat 3000
acccaattgc agaacgagaa actttacctc tattacctac aaaatggaag ggacatgtat 3060
gttgatcagg aactggacat aaaccgttta tctgattacg acgtcgatgc cattgtaccc 3120
caatcctttt tgaaggacga ttcaatcgac aataaagtgc ttacacgctc ggataagaac 3180
cgagggaaaa gtgacaatgt tccaagcgag gaagtcgtaa agaaaatgaa gaactattgg 3240
cggcagctcc taaatgcgaa actgataacg caaagaaagt tcgataactt aactaaagct 3300
gagaggggtg gcttgtctga acttgacaag gccggattta ttaaacgtca gctcgtggaa 3360
acccgccaaa tcacaaagca tgttgcacag atactagatt cccgaatgaa tacgaaatac 3420
gacgagaacg ataagctgat tcgggaagtc aaagtaatca ctttaaagtc aaaattggtg 3480
tcggacttca gaaaggattt tcaattctat aaagttaggg agataaataa ctaccaccat 3540
gcgcacgacg cttatcttaa tgccgtcgta gggaccgcac tcattaagaa atacccgaag 3600
ctagaaagtg agtttgtgta tggtgattac aaagtttatg acgtccgtaa gatgatcgcg 3660
aaaagcgaac aggagatagg caaggctaca gccaaatact tcttttattc taacattatg 3720
aatttcttta agacggaaat cactctggca aacggagaga tacgcaaacg acctttaatt 3780
gaaaccaatg gggagacagg tgaaatcgta tgggataagg gccgggactt cgcgacggtg 3840
agaaaagttt tgtccatgcc ccaagtcaac atagtaaaga aaactgaggt gcagaccgga 3900
gggttttcaa aggaatcgat tcttccaaaa aggaatagtg ataagctcat cgctcgtaaa 3960
aaggactggg acccgaaaaa gtacggtggc ttcgatagcc ctacagttgc ctattctgtc 4020
ctagtagtgg caaaagttga gaagggaaaa tccaagaaac tgaagtcagt caaagaatta 4080
ttggggataa cgattatgga gcgctcgtct tttgaaaaga accccatcga cttccttgag 4140
gcgaaaggtt acaaggaagt aaaaaaggat ctcataatta aactaccaaa gtatagtctg 4200
tttgagttag aaaatggccg aaaacggatg ttggctagcg ccggagagct tcaaaagggg 4260
aacgaactcg cactaccgtc taaatacgtg aatttcctgt atttagcgtc ccattacgag 4320
aagttgaaag gttcacctga agataacgaa cagaagcaac tttttgttga gcagcacaaa 4380
cattatctcg acgaaatcat agagcaaatt tcggaattca gtaagagagt catcctagct 4440
gatgccaatc tggacaaagt attaagcgca tacaacaagc acagggataa acccatacgt 4500
gagcaggcgg aaaatattat ccatttgttt actcttacca acctcggcgc tccagccgca 4560
ttcaagtatt ttgacacaac gatagatcgc aaacgataca cttctaccaa ggaggtgcta 4620
gacgcgacac tgattcacca atccatcacg ggattatatg aaactcggat agatttgtca 4680
cagcttgggg gtgactctgg tggttctccc aagaagaaga ggaaagtcta a 4731
<210> 29
<211> 1576
<212> PRT
<213> Artificial Sequence
<400> 29
Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys
1 5 10 15
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
20 25 30
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
35 40 45
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
50 55 60
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
65 70 75 80
Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
85 90 95
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
100 105 110
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
115 120 125
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
130 135 140
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys
145 150 155 160
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
165 170 175
Arg Arg Ile Leu Leu Pro Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu
180 185 190
Ser Ala Thr Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile
195 200 205
Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val
210 215 220
Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile
225 230 235 240
Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala
245 250 255
Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg
260 265 270
Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala
275 280 285
Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val
290 295 300
Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val
305 310 315 320
Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg
325 330 335
Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr
340 345 350
Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu
355 360 365
Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln
370 375 380
Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala
385 390 395 400
Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser
405 410 415
Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn
420 425 430
Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn
435 440 445
Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser
450 455 460
Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly
465 470 475 480
Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala
485 490 495
Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala
500 505 510
Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp
515 520 525
Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr
530 535 540
Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile
545 550 555 560
Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile
565 570 575
Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg
580 585 590
Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro
595 600 605
His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu
610 615 620
Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile
625 630 635 640
Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn
645 650 655
Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro
660 665 670
Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe
675 680 685
Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val
690 695 700
Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu
705 710 715 720
Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe
725 730 735
Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr
740 745 750
Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys
755 760 765
Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe
770 775 780
Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp
785 790 795 800
Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile
805 810 815
Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg
820 825 830
Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu
835 840 845
Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile
850 855 860
Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu
865 870 875 880
Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp
885 890 895
Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly
900 905 910
Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro
915 920 925
Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu
930 935 940
Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met
945 950 955 960
Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu
965 970 975
Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile
980 985 990
Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu
995 1000 1005
Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu
1010 1015 1020
Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp Ala Ile Val Pro
1025 1030 1035 1040
Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg
1045 1050 1055
Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val
1060 1065 1070
Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu
1075 1080 1085
Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly
1090 1095 1100
Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu
1105 1110 1115 1120
Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met
1125 1130 1135
Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val
1140 1145 1150
Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln
1155 1160 1165
Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1170 1175 1180
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys
1185 1190 1195 1200
Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg
1205 1210 1215
Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys
1220 1225 1230
Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr
1235 1240 1245
Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly
1250 1255 1260
Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val
1265 1270 1275 1280
Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu
1285 1290 1295
Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn
1300 1305 1310
Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr
1315 1320 1325
Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala
1330 1335 1340
Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu
1345 1350 1355 1360
Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile
1365 1370 1375
Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile
1380 1385 1390
Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys
1395 1400 1405
Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1410 1415 1420
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu
1425 1430 1435 1440
Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val
1445 1450 1455
Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu
1460 1465 1470
Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu
1475 1480 1485
Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu
1490 1495 1500
Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala
1505 1510 1515 1520
Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr
1525 1530 1535
Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu
1540 1545 1550
Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Ser Gly Gly
1555 1560 1565
Ser Pro Lys Lys Lys Arg Lys Val
1570 1575
<210> 30
<211> 4890
<212> DNA
<213> Artificial Sequence
<400> 30
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaaag aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtagccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttagcggca gcgagactcc cgggacctca 4320
gagtccgcca cacccgaaag tatggacagc ctcttgatga accggaggga gtttctttac 4380
caattcaaaa atgtccgctg ggctaagggt cggcgtgaga cctacctgtg ctacgtagtg 4440
aagaggcgtg acagtgctac atccttttca ctggactttg gttatcttcg caataagaac 4500
ggctgccacg tggaattgct cttcctccgc tacatctcgg actgggacct agaccctggc 4560
cgctgctacc gcgtcacctg gttcatctcc tggagcccct gctacgactg tgcccgacat 4620
gtggccgact ttctgcgagg gaaccccaac ctcagtctga ggatcttcac cgcgcgcctc 4680
tacttctgtg aggaccgcaa ggctgagccc gaggggctgc ggcggctgca ccgcgccggg 4740
gtgcaaatag ccatcatgac cttcaaagat tatttttact gctggaatac ttttgtagaa 4800
aaccatggaa gaactttcaa agcctgggaa gggctgcatg aaaattcagt tcgtctctcc 4860
agacagcttc ggcgcatcct tttgccctga 4890
<210> 31
<211> 1629
<212> PRT
<213> Artificial Sequence
<400> 31
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Lys Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His
1010 1015 1020
Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1025 1030 1035 1040
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val
1045 1050 1055
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys
1060 1065 1070
Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1075 1080 1085
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile
1090 1095 1100
Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp
1105 1110 1115 1120
Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val
1125 1130 1135
Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu
1140 1145 1150
Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp
1155 1160 1165
Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1170 1175 1180
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1185 1190 1195 1200
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu
1205 1210 1215
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1220 1225 1230
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly
1250 1255 1260
Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1265 1270 1275 1280
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys
1285 1290 1295
Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu
1300 1305 1310
Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1315 1320 1325
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg
1330 1335 1340
Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly
1345 1350 1355 1360
Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
1365 1370 1375
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1380 1385 1390
Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
1395 1400 1405
Asp Glu Gly Ala Pro Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser
1410 1415 1420
Pro Lys Lys Lys Arg Lys Val Ser Gly Ser Glu Thr Pro Gly Thr Ser
1425 1430 1435 1440
Glu Ser Ala Thr Pro Glu Ser Met Asp Ser Leu Leu Met Asn Arg Arg
1445 1450 1455
Glu Phe Leu Tyr Gln Phe Lys Asn Val Arg Trp Ala Lys Gly Arg Arg
1460 1465 1470
Glu Thr Tyr Leu Cys Tyr Val Val Lys Arg Arg Asp Ser Ala Thr Ser
1475 1480 1485
Phe Ser Leu Asp Phe Gly Tyr Leu Arg Asn Lys Asn Gly Cys His Val
1490 1495 1500
Glu Leu Leu Phe Leu Arg Tyr Ile Ser Asp Trp Asp Leu Asp Pro Gly
1505 1510 1515 1520
Arg Cys Tyr Arg Val Thr Trp Phe Ile Ser Trp Ser Pro Cys Tyr Asp
1525 1530 1535
Cys Ala Arg His Val Ala Asp Phe Leu Arg Gly Asn Pro Asn Leu Ser
1540 1545 1550
Leu Arg Ile Phe Thr Ala Arg Leu Tyr Phe Cys Glu Asp Arg Lys Ala
1555 1560 1565
Glu Pro Glu Gly Leu Arg Arg Leu His Arg Ala Gly Val Gln Ile Ala
1570 1575 1580
Ile Met Thr Phe Lys Asp Tyr Phe Tyr Cys Trp Asn Thr Phe Val Glu
1585 1590 1595 1600
Asn His Gly Arg Thr Phe Lys Ala Trp Glu Gly Leu His Glu Asn Ser
1605 1610 1615
Val Arg Leu Ser Arg Gln Leu Arg Arg Ile Leu Leu Pro
1620 1625
<210> 32
<211> 4917
<212> DNA
<213> Artificial Sequence
<400> 32
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaaag aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
catatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtagccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttggtagtg atgctttaga cgattttgac 4320
ttagatatgc ttggttcaga cgcgttagac gacttcggtg gaggatccat ggacagcctc 4380
ttgatgaacc ggaggaagtt tctttaccaa ttcaaaaatg tccgctgggc taagggtcgg 4440
cgtgagacct acctgtgcta cgtagtgaag aggcgtgaca gtgctacatc cttttcactg 4500
gactttggtt atcttcgcaa taagaacggc tgccacgtgg aattgctctt cctccgctac 4560
atctcggact gggacctaga ccctggccgc tgctaccgcg tcacctggtt cacctcctgg 4620
agcccctgct acgactgtgc ccgacatgtg gccgactttc tgcgagggaa ccccaacctc 4680
agtctgagga tcttcaccgc gcgcctctac ttctgtgagg accgcaaggc tgagcccgag 4740
gggctgcggc ggctgcaccg cgccggggtg caaatagcca tcatgacctt caaagattat 4800
ttttactgct ggaatacttt tgtagaaaac catgaaagaa ctttcaaagc ctgggaaggg 4860
ctgcatgaaa attcagttcg tctctccaga cagcttcggc gcatcctttt gccctga 4917
<210> 33
<211> 1638
<212> PRT
<213> Artificial Sequence
<400> 33
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Lys Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His
1010 1015 1020
Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1025 1030 1035 1040
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val
1045 1050 1055
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys
1060 1065 1070
Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1075 1080 1085
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile
1090 1095 1100
Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp
1105 1110 1115 1120
Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val
1125 1130 1135
Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu
1140 1145 1150
Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp
1155 1160 1165
Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1170 1175 1180
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1185 1190 1195 1200
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu
1205 1210 1215
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1220 1225 1230
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly
1250 1255 1260
Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1265 1270 1275 1280
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys
1285 1290 1295
Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu
1300 1305 1310
Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1315 1320 1325
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg
1330 1335 1340
Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly
1345 1350 1355 1360
Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
1365 1370 1375
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1380 1385 1390
Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
1395 1400 1405
Asp Glu Gly Ala Pro Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser
1410 1415 1420
Pro Lys Lys Lys Arg Lys Val Gly Ser Asp Ala Leu Asp Asp Phe Asp
1425 1430 1435 1440
Leu Asp Met Leu Gly Ser Asp Ala Leu Asp Asp Phe Gly Gly Gly Ser
1445 1450 1455
Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys
1460 1465 1470
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
1475 1480 1485
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
1490 1495 1500
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
1505 1510 1515 1520
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
1525 1530 1535
Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
1540 1545 1550
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
1555 1560 1565
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
1570 1575 1580
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
1585 1590 1595 1600
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys
1605 1610 1615
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
1620 1625 1630
Arg Arg Ile Leu Leu Pro
1635
<210> 34
<211> 3
<212> DNA
<213> Artificial Sequence
<220>
<221> misc_feature
<222> (1)..(1)
<223> n is a, c, g, or t
<400> 34
ngg 3
<210> 35
<211> 5
<212> DNA
<213> Artificial Sequence
<220>
<221> misc_feature
<222> (1)..(2)
<223> n is a, c, g, or t
<220>
<221> misc_feature
<222> (4)..(5)
<223> r is a or g
<400> 35
nngrr 5
<210> 36
<211> 6
<212> DNA
<213> Artificial Sequence
<220>
<221> misc_feature
<222> (1)..(2)
<223> n is a, c, g, or t
<400> 36
nnagaa 6
<210> 37
<211> 6
<212> DNA
<213> Artificial Sequence
<220>
<221> misc_feature
<222> (1)..(2)
<223> n is a, c, g, or t
<220>
<221> misc_feature
<222> (4)..(5)
<223> r is a or g
<400> 37
nngrrt 6
<210> 38
<211> 3
<212> DNA
<213> Artificial Sequence
<400> 38
tgg 3
<210> 39
<211> 6
<212> DNA
<213> Artificial Sequence
<220>
<221> misc_feature
<222> (1)..(3)
<223> n is a, c, g, or t
<220>
<221> misc_feature
<222> (4)..(5)
<223> r is a or g
<400> 39
nnnrrt 6
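The short motifs in SEQ ID NO: 34 to 39 use the degenerate symbols defined in their <223> lines (n = a, c, g, or t; r = a or g). As an illustration only, and not part of the filed listing, the following minimal sketch shows one way such a motif could be tested against a concrete sequence. The function names `motif_to_regex` and `matches` are hypothetical, and only the symbols that appear in these entries are handled.

```python
import re

# Degenerate symbols exactly as defined in the <223> lines of SEQ ID NO: 34-39.
# Only a, c, g, t, n and r are covered; other IUPAC codes are deliberately omitted.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "n": "[acgt]", "r": "[ag]"}


def motif_to_regex(motif):
    """Translate a degenerate motif such as 'nngrrt' into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in motif.lower()))


def matches(motif, site):
    """Return True if the concrete sequence `site` is an instance of `motif`."""
    return motif_to_regex(motif).fullmatch(site.lower()) is not None
```

For example, `matches("ngg", "tgg")` checks SEQ ID NO: 38 against the motif of SEQ ID NO: 34. A motif symbol outside the table raises `KeyError`, so an unsupported symbol fails loudly rather than silently matching.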
<210> 40
<211> 4
<212> PRT
<213> Artificial Sequence
<400> 40
Ser Gly Gly Ser
1
<210> 41
<211> 5
<212> PRT
<213> Artificial Sequence
<400> 41
Gly Ser Ser Gly Ser
1 5
<210> 42
<211> 4
<212> PRT
<213> Artificial Sequence
<400> 42
Gly Gly Gly Ser
1
<210> 43
<211> 5
<212> PRT
<213> Artificial Sequence
<400> 43
Gly Gly Gly Gly Ser
1 5
<210> 44
<211> 5
<212> PRT
<213> Artificial Sequence
<400> 44
Ser Ser Ser Ser Gly
1 5
<210> 45
<211> 5
<212> PRT
<213> Artificial Sequence
<400> 45
Gly Ser Gly Ser Ala
1 5
<210> 46
<211> 5
<212> PRT
<213> Artificial Sequence
<400> 46
Gly Gly Ser Gly Gly
1 5
<210> 47
<211> 4890
<212> DNA
<213> Artificial Sequence
<400> 47
atggactata aggaccacga cggagactac aaggatcatg atattgatta caaagacgat 60
gacgataaga tggccccaaa gaagaagcgg aaggtcggta tccacggagt cccagcagct 120
accatggaca agaagtattc tatcggactg gccatcggga ctaatagcgt cgggtgggcc 180
gtgatcactg acgagtacaa ggtgccctct aagaagttca aggtgctcgg gaacaccgac 240
cggcattcca tcaagaaaaa tctgatcgga gctctcctct ttgattcagg ggagaccgct 300
gaagcaaccc gcctcaagcg gactgctaga cggcggtaca ccaggaggaa gaaccggatt 360
tgttaccttc aagagatatt ctccaacgaa atggcaaagg tcgacgacag cttcttccat 420
aggctggaag aatcattcct cgtggaagag gataagaagc atgaacggca tcccatcttc 480
ggtaatatcg tcgacgaggt ggcctatcac gagaaatacc caaccatcta ccatcttcgc 540
aaaaagctgg tggactcaac cgacaaggca gacctccggc ttatctacct ggccctggcc 600
cacatgatca agttcagagg ccacttcctg atcgagggcg acctcaatcc tgacaatagc 660
gatgtggata aactgttcat ccagctggtg cagacttaca accagctctt tgaagagaac 720
cccatcaatg caagcggagt cgatgccaag gccattctgt cagcccggct gtcaaagagc 780
cgcagacttg agaatcttat cgctcagctg ccgggtgaaa agaaaaatgg actgttcggg 840
aacctgattg ctctttcact tgggctgact cccaatttca agtctaattt cgacctggca 900
gaggatgcca agctgcaact gtccaaggac acctatgatg acgatctcga caacctcctg 960
gcccagatcg gtgaccaata cgccgacctt ttccttgctg ctaagaatct ttctgacgcc 1020
atcctgctgt ctgacattct ccgcgtgaac actgaaatca ccaaggcccc tctttcagct 1080
tcaatgatta agcggtatga tgagcaccac caggacctga ccctgcttaa ggcactcgtc 1140
cggcagcagc ttccggagaa gtacaaggaa atcttctttg accagtcaaa gaatggatac 1200
gccggctaca tcgacggagg tgcctcccaa gaggaatttt ataagtttat caaacctatc 1260
cttgagaaga tggacggcac cgaagagctc ctcgtgaaac tgaatcggga ggatctgctg 1320
cggaagcagc gcactttcga caatgggagc attccccacc agatccatct tggggagctt 1380
cacgccatcc ttcggcgcca agaggacttc tacccctttc ttaaggacaa cagggagaag 1440
attgagaaaa ttctcacttt ccgcatcccc tactacgtgg gacccctcgc cagaggaaat 1500
agccggtttg cttggatgac cagaaagtca gaagaaacta tcactccctg gaacttcgaa 1560
gaggtggtgg acaagggagc cagcgctcag tcattcatcg aacggatgac taacttcgat 1620
aagaacctcc ccaatgagaa ggtcctgccg aaacattccc tgctctacga gtactttacc 1680
gtgtacaacg agctgaccaa ggtgaaatat gtcaccgaag ggatgaggaa gcccgcattc 1740
ctgtcaggcg aacaaaagaa ggcaattgtg gaccttctgt tcaagaccaa tagaaaggtg 1800
accgtgaagc agctgaagga ggactatttc aagaaaattg aatgcttcga ctctgtggag 1860
attagcgggg tcgaagatcg gttcaacgca agcctgggta cctaccatga tctgcttaag 1920
atcatcaagg acaaggattt tctggacaat gaggagaaag aggacatcct tgaggacatt 1980
gtcctgactc tcactctgtt cgaggaccgg gaaatgatcg aggagaggct taagacctac 2040
gcccatctgt tcgacgataa agtgatgaag caacttaaac ggagaagata taccggatgg 2100
ggacgcctta gccgcaaact catcaacgga atccgggaca aacagagcgg aaagaccatt 2160
cttgatttcc ttaagagcga cggattcgct aatcgcaact tcatgcaact tatccatgat 2220
gattccctga cctttaagga ggacatccag aaggcccaag tgtctggaca aggtgactca 2280
ctgcacgagc atatcgcaaa tctggctggt tcacccgcta ttaagaaggg tattctccag 2340
accgtgaaag tcgtggacga gctggtcaag gtgatgggtc gccataaacc agagaacatt 2400
gtcatcgaga tggccaggga aaaccagact acccagaagg gacagaagaa cagcagggag 2460
cggatgaaaa gaattgagga agggattaag gagctcgggt cacagatcct taaagagcac 2520
ccggtggaaa acacccagct tcagaatgag aagctctatc tgtactacct tcaaaatgga 2580
cgcgatatgt atgtggacca agagcttgat atcaacaggc tctcagacta cgacgtggac 2640
gccatcgtcc ctcagagctt cctcaaagac gactcaattg acaataaggt gctgactcgc 2700
tcagacaaga accggggaaa gtcagataac gtgccctcag aggaagtcgt gaaaaagatg 2760
aagaactatt ggcgccagct tctgaacgca aagctgatca ctcagcggaa gttcgacaat 2820
ctcactaagg ctgagagggg cggactgagc gaactggaca aagcaggatt cattaaacgg 2880
caacttgtgg agactcggca gattactaaa catgtagccc aaatccttga ctcacgcatg 2940
aataccaagt acgacgaaaa cgacaaactt atccgcgagg tgaaggtgat taccctgaag 3000
tccaagctgg tcagcgattt cagaaaggac tttcaattct acaaagtgcg ggagatcaat 3060
aactatcatc atgctcatga cgcatatctg aatgccgtgg tgggaaccgc cctgatcaag 3120
aagtacccaa agctggaaag cgagttcgtg tacggagact acaaggtcta cgacgtgcgc 3180
aagatgattg ccaaatctga gcaggagatc ggaaaggcca ccgcaaagta cttcttctac 3240
agcaacatca tgaatttctt caagaccgaa atcacccttg caaacggtga gatccggaag 3300
aggccgctca tcgagactaa tggggagact ggcgaaatcg tgtgggacaa gggcagagat 3360
ttcgctaccg tgcgcaaagt gctttctatg cctcaagtga acatcgtgaa gaaaaccgag 3420
gtgcaaaccg gaggcttttc taaggaatca atcctcccca agcgcaactc cgacaagctc 3480
attgcaagga agaaggattg ggaccctaag aagtacggcg gattcgattc accaactgtg 3540
gcttattctg tcctggtcgt ggctaaggtg gaaaaaggaa agtctaagaa gctcaagagc 3600
gtgaaggaac tgctgggtat caccattatg gagcgcagct ccttcgagaa gaacccaatt 3660
gactttctcg aagccaaagg ttacaaggaa gtcaagaagg accttatcat caagctccca 3720
aagtatagcc tgttcgaact ggagaatggg cggaagcgga tgctcgcctc cgctggcgaa 3780
cttcagaagg gtaatgagct ggctctcccc tccaagtacg tgaatttcct ctaccttgca 3840
agccattacg agaagctgaa ggggagcccc gaggacaacg agcaaaagca actgtttgtg 3900
gagcagcata agcattatct ggacgagatc attgagcaga tttccgagtt ttctaaacgc 3960
gtcattctcg ctgatgccaa cctcgataaa gtccttagcg catacaataa gcacagagac 4020
aaaccaattc gggagcaggc tgagaatatc atccacctgt tcaccctcac caatcttggt 4080
gcccctgccg cattcaagta cttcgacacc accatcgacc ggaaacgcta tacctccacc 4140
aaagaagtgc tggacgccac cctcatccac cagagcatca ccggacttta cgaaactcgg 4200
attgacctct cacagctcgg aggggatgag ggagctccca agaaaaagcg caaggtaggt 4260
agttccggat ctccgaaaaa gaaacgcaaa gttagcggca gcgagactcc cgggacctca 4320
gagtccgcca cacccgaaag tatggacagc ctcttgatga accggaggaa gtttctttac 4380
caattcaaaa atgtccgctg ggctaagggt cggcgtgaga cctacctgtg ctacgtagtg 4440
aagaggcgtg acagtgctac atccttttca ctggactttg gttatcttcg caataagaac 4500
ggctgccacg tggaattgct cttcctccgc tacatctcgg actgggacct agaccctggc 4560
cgctgctacc gcgtcacctg gttcacctcc tggagcccct gctacgactg tgcccgacat 4620
gtggccgact ttctgcgagg gaaccccaac ctcagtctga ggatcttcac cgcgcgcctc 4680
tacttctgtg aggaccgcaa ggctgagccc gaggggctgc ggcggctgca ccgcgccggg 4740
gtgcaaatag ccatcatgac cttcaaagat tatttttact gctggaatac ttttgtagaa 4800
aaccatgaaa gaactttcaa agcctgggaa gggctgcatg aaaattcagt tcgtctctcc 4860
agacagcttc ggcgcatcct tttgccctga 4890
<210> 48
<211> 1629
<212> PRT
<213> Artificial Sequence
<400> 48
Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp
1 5 10 15
Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
Gly Ile His Gly Val Pro Ala Ala Thr Met Asp Lys Lys Tyr Ser Ile
35 40 45
Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp
50 55 60
Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp
65 70 75 80
Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
85 90 95
Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
100 105 110
Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser
115 120 125
Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
130 135 140
Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe
145 150 155 160
Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile
165 170 175
Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
180 185 190
Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His
195 200 205
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
210 215 220
Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
225 230 235 240
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg
245 250 255
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly
260 265 270
Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly
275 280 285
Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys
290 295 300
Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
305 310 315 320
Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
325 330 335
Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
340 345 350
Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu
355 360 365
His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu
370 375 380
Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
385 390 395 400
Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe
405 410 415
Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val
420 425 430
Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
435 440 445
Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu
450 455 460
Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
465 470 475 480
Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu
485 490 495
Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu
500 505 510
Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
515 520 525
Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro
530 535 540
Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr
545 550 555 560
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
565 570 575
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu
580 585 590
Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
595 600 605
Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val
610 615 620
Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
625 630 635 640
Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Lys Glu Asp Ile
645 650 655
Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met
660 665 670
Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val
675 680 685
Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser
690 695 700
Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile
705 710 715 720
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
725 730 735
Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
740 745 750
Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu
755 760 765
Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
770 775 780
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile
785 790 795 800
Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys
805 810 815
Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
820 825 830
Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
835 840 845
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
850 855 860
Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
865 870 875 880
Ala Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys
885 890 895
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
900 905 910
Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu
915 920 925
Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala
930 935 940
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg
945 950 955 960
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu
965 970 975
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
980 985 990
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg
995 1000 1005
Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His
1010 1015 1020
Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys
1025 1030 1035 1040
Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val
1045 1050 1055
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys
1060 1065 1070
Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1075 1080 1085
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile
1090 1095 1100
Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp
1105 1110 1115 1120
Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val
1125 1130 1135
Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu
1140 1145 1150
Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp
1155 1160 1165
Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1170 1175 1180
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1185 1190 1195 1200
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu
1205 1210 1215
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1220 1225 1230
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu
1235 1240 1245
Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly
1250 1255 1260
Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala
1265 1270 1275 1280
Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys
1285 1290 1295
Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu
1300 1305 1310
Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1315 1320 1325
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg
1330 1335 1340
Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly
1345 1350 1355 1360
Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
1365 1370 1375
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1380 1385 1390
Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
1395 1400 1405
Asp Glu Gly Ala Pro Lys Lys Lys Arg Lys Val Gly Ser Ser Gly Ser
1410 1415 1420
Pro Lys Lys Lys Arg Lys Val Ser Gly Ser Glu Thr Pro Gly Thr Ser
1425 1430 1435 1440
Glu Ser Ala Thr Pro Glu Ser Met Asp Ser Leu Leu Met Asn Arg Arg
1445 1450 1455
Lys Phe Leu Tyr Gln Phe Lys Asn Val Arg Trp Ala Lys Gly Arg Arg
1460 1465 1470
Glu Thr Tyr Leu Cys Tyr Val Val Lys Arg Arg Asp Ser Ala Thr Ser
1475 1480 1485
Phe Ser Leu Asp Phe Gly Tyr Leu Arg Asn Lys Asn Gly Cys His Val
1490 1495 1500
Glu Leu Leu Phe Leu Arg Tyr Ile Ser Asp Trp Asp Leu Asp Pro Gly
1505 1510 1515 1520
Arg Cys Tyr Arg Val Thr Trp Phe Thr Ser Trp Ser Pro Cys Tyr Asp
1525 1530 1535
Cys Ala Arg His Val Ala Asp Phe Leu Arg Gly Asn Pro Asn Leu Ser
1540 1545 1550
Leu Arg Ile Phe Thr Ala Arg Leu Tyr Phe Cys Glu Asp Arg Lys Ala
1555 1560 1565
Glu Pro Glu Gly Leu Arg Arg Leu His Arg Ala Gly Val Gln Ile Ala
1570 1575 1580
Ile Met Thr Phe Lys Asp Tyr Phe Tyr Cys Trp Asn Thr Phe Val Glu
1585 1590 1595 1600
Asn His Glu Arg Thr Phe Lys Ala Trp Glu Gly Leu His Glu Asn Ser
1605 1610 1615
Val Arg Leu Ser Arg Gln Leu Arg Arg Ile Leu Leu Pro
1620 1625
<210> 49
<211> 4089
<212> DNA
<213> Artificial Sequence
<400> 49
atggacagcc tcttgatgaa ccggaggaag tttctttacc aattcaaaaa tgtccgctgg 60
gctaagggtc ggcgtgagac ctacctgtgc tacgtagtga agaggcgtga cagtgctaca 120
tccttttcac tggactttgg ttatcttcgc aataagaacg gctgccacgt ggaattgctc 180
ttcctccgct acatctcgga ctgggaccta gaccctggcc gctgctaccg cgtcacctgg 240
ttcacctcct ggagcccctg ctacgactgt gcccgacatg tggccgactt tctgcgaggg 300
aaccccaacc tcagtctgag gatcttcacc gcgcgcctct acttctgtga ggaccgcaag 360
gctgagcccg aggggctgcg gcggctgcac cgcgccgggg tgcaaatagc catcatgacc 420
ttcaaagatt atttttactg ctggaatact tttgtagaaa accatgaaag aactttcaaa 480
gcctgggaag ggctgcatga aaattcagtt cgtctctcca gacagcttcg gcgcatcctt 540
ttgcccagcg gcagcgagac tcccgggacc tcagagtccg ccacacccga aagtgggaaa 600
cggaactaca tcctggggct tgacattggg ataaccagcg ttggctacgg aattattgat 660
tatgagacac gcgatgtgat tgacgccggg gttaggctgt tcaaagaggc caacgttgaa 720
aacaacgagg gaagacggag taagcgcgga gcaagaagac tcaagcgcag acggagacat 780
cggattcaga gggtgaaaaa gctgctcttc gattacaatc tcctgaccga tcatagtgag 840
ctgagcggaa tcaaccccta cgaggcgcga gtgaaagggc tttcccagaa gctgtccgaa 900
gaggagttct ccgccgcgtt gctgcacctg gccaaacgga ggggggttca caatgtaaac 960
gaagtggagg aggacacggg caatgaactt agtacgaaag aacagatcag taggaactct 1020
aaggctctcg aagagaaata cgtcgctgag ttgcagcttg agagactgaa aaaagacggc 1080
gaagtacgcg gatctattaa taggttcaag acttcagatt acgtaaagga agccaagcag 1140
ctcctgaaag tacagaaagc gtaccatcag ctcgatcaga gcttcatcga tacctacata 1200
gatttgctgg agacacggag gacatactac gagggcccag gggaaggatc tccttttggg 1260
tggaaggaca tcaaggaatg gtacgagatg cttatgggac attgtacata ttttccggag 1320
gagctcagga gcgtcaagta cgcctacaat gccgacctgt acaatgccct caatgacctc 1380
aataacctcg tgattaccag ggacgagaac gagaagctgg agtactatga aaagttccag 1440
attatcgaga atgtgtttaa gcagaagaag aagccgacac ttaagcagat tgcaaaggaa 1500
atcctcgtga atgaggaaga tatcaaggga tacagagtga caagtacagg caagcccgag 1560
ttcacaaatc tgaaggtgta ccacgatatt aaggacataa ccgcacgaaa ggagataatc 1620
gaaaacgctg agctcctcga tcagatcgca aaaattctta ccatctacca gtctagtgag 1680
gacattcagg aggaactgac taatctgaac agtgagctca cccaagagga aattgagcag 1740
atttcaaacc tgaaaggcta caccgggacg cacaatctga gcctcaaagc aatcaacctc 1800
attctggatg aactttggca cacaaatgac aaccaaattg ccatattcaa ccgcctgaaa 1860
ctggtgccaa aaaaagtgga tctgtcacag caaaaggaaa tccctacaac cttggttgac 1920
gattttattc tgtcccccgt tgtcaagcgg agcttcatcc agtcaatcaa ggtgatcaat 1980
gccatcatta aaaaatacgg attgccaaac gatataatta tcgagcttgc acgagagaag 2040
aactcaaagg acgcccagaa gatgattaac gaaatgcaga agcgcaaccg ccagacaaac 2100
gaacgcatag aggaaattat aagaacaacc ggcaaagaga atgccaagta tctgatcgag 2160
aaaatcaagc tgcacgacat gcaagaaggc aagtgcctgt actctctgga agctatccca 2220
ctcgaagatc tgctgaataa tccattcaat tacgaggtgg accacatcat ccctagatcc 2280
gtaagctttg acaattcctt caataacaaa gttctggtta aacaggagga aaattctaaa 2340
aaagggaacc ggaccccgtt ccagtacctg agctccagtg acagcaagat tagctacgag 2400
acttttaaga aacatattct gaatctggcc aaaggcaaag gcaggatcag caagaccaag 2460
aaggagtacc tcctcgaaga acgcgacatt aacagattta gtgtgcagaa agatttcatc 2520
aaccgaaacc ttgtcgatac tcggtacgcc acgagaggcc tgatgaatct cctcaggagc 2580
tacttccgcg tcaataatct ggacgttaaa gtcaagagca taaatggggg attcaccagc 2640
tttctgagga gaaagtggaa gtttaagaag gaacgaaaca aaggatacaa gcaccatgct 2700
gaggatgctt tgatcatcgc taacgcggac tttatcttta aggaatggaa aaagctggat 2760
aaggcaaaga aagtgatgga aaaccagatg ttcgaggaga agcaggcaga gtcaatgcct 2820
gagatcgaga cagagcagga atacaaggaa attttcatca cccctcatca gattaaacac 2880
ataaaggact tcaaagacta taaatactct catagggtgg acaaaaaacc caatcgcaag 2940
ctcattaatg acaccctgta ctcaacacgg aaggatgata aaggtaatac cttgattgtg 3000
aataatctta atggattgta tgacaaagat aacgacaagc tcaagaagct gatcaacaag 3060
tctccagaga agctccttat gtatcaccac gacccacaga cttatcagaa attgaaactg 3120
atcatggagc aatacgggga tgagaagaac ccactctaca aatattatga ggaaacaggt 3180
aattacctga ccaagtactc caagaaggat aacggaccag tgatcaaaaa gataaagtac 3240
tatggcaaca aacttaatgc gcatttggac ataactgacg attaccccaa ttctcgaaac 3300
aaggttgtga agctctccct gaagccttat agatttgacg tgtacctgga taatggggtt 3360
tataaattcg tcaccgtgaa aaatctggac gtgatcaaaa aggagaacta ttatgaagta 3420
aactcaaagt gctatgagga ggcgaagaag ctgaagaaga tctccaatca ggccgagttc 3480
atcgcttcct tctataagaa cgatctcatc aagatcaatg gagagcttta tcgcgtcatt 3540
ggtgtgaaca atgacttgct gaacaggatc gaagtcaata tgatagacat tacctaccgg 3600
gagtatctcg aaaacatgaa tgataaacgg ccgcctcaca tcatcaagac aatcgcatct 3660
aaaactcagt caataaaaaa gtactctacc gatatcctgg ggaatctcta tgaagtgaag 3720
tcaaagaagc acccacaaat cattaaaaaa ggtggatcct ctggtggttc tactaatctg 3780
tcagatatta ttgaaaagga gaccggtaag caactggtta tccaggaatc catcctcatg 3840
ctcccagagg aggtggaaga agtcattggg aacaagccgg aaagcgatat actcgtgcac 3900
accgcctacg acgagagcac cgacgagaat gtcatgcttc tgactagcga cgcccctgaa 3960
tacaagcctt gggctctggt catacaggat agcaacggtg agaacaagat taagatgctc 4020
tctggtggtt ctcccaagaa gaagaggaaa gtcggatcct acccatacga tgttccagat 4080
tacgcttaa 4089
<210> 50
<211> 1362
<212> PRT
<213> Artificial Sequence
<400> 50
Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys
1 5 10 15
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
20 25 30
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
35 40 45
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
50 55 60
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
65 70 75 80
Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
85 90 95
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
100 105 110
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
115 120 125
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
130 135 140
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Glu Arg Thr Phe Lys
145 150 155 160
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
165 170 175
Arg Arg Ile Leu Leu Pro Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu
180 185 190
Ser Ala Thr Pro Glu Ser Gly Lys Arg Asn Tyr Ile Leu Gly Leu Asp
195 200 205
Ile Gly Ile Thr Ser Val Gly Tyr Gly Ile Ile Asp Tyr Glu Thr Arg
210 215 220
Asp Val Ile Asp Ala Gly Val Arg Leu Phe Lys Glu Ala Asn Val Glu
225 230 235 240
Asn Asn Glu Gly Arg Arg Ser Lys Arg Gly Ala Arg Arg Leu Lys Arg
245 250 255
Arg Arg Arg His Arg Ile Gln Arg Val Lys Lys Leu Leu Phe Asp Tyr
260 265 270
Asn Leu Leu Thr Asp His Ser Glu Leu Ser Gly Ile Asn Pro Tyr Glu
275 280 285
Ala Arg Val Lys Gly Leu Ser Gln Lys Leu Ser Glu Glu Glu Phe Ser
290 295 300
Ala Ala Leu Leu His Leu Ala Lys Arg Arg Gly Val His Asn Val Asn
305 310 315 320
Glu Val Glu Glu Asp Thr Gly Asn Glu Leu Ser Thr Lys Glu Gln Ile
325 330 335
Ser Arg Asn Ser Lys Ala Leu Glu Glu Lys Tyr Val Ala Glu Leu Gln
340 345 350
Leu Glu Arg Leu Lys Lys Asp Gly Glu Val Arg Gly Ser Ile Asn Arg
355 360 365
Phe Lys Thr Ser Asp Tyr Val Lys Glu Ala Lys Gln Leu Leu Lys Val
370 375 380
Gln Lys Ala Tyr His Gln Leu Asp Gln Ser Phe Ile Asp Thr Tyr Ile
385 390 395 400
Asp Leu Leu Glu Thr Arg Arg Thr Tyr Tyr Glu Gly Pro Gly Glu Gly
405 410 415
Ser Pro Phe Gly Trp Lys Asp Ile Lys Glu Trp Tyr Glu Met Leu Met
420 425 430
Gly His Cys Thr Tyr Phe Pro Glu Glu Leu Arg Ser Val Lys Tyr Ala
435 440 445
Tyr Asn Ala Asp Leu Tyr Asn Ala Leu Asn Asp Leu Asn Asn Leu Val
450 455 460
Ile Thr Arg Asp Glu Asn Glu Lys Leu Glu Tyr Tyr Glu Lys Phe Gln
465 470 475 480
Ile Ile Glu Asn Val Phe Lys Gln Lys Lys Lys Pro Thr Leu Lys Gln
485 490 495
Ile Ala Lys Glu Ile Leu Val Asn Glu Glu Asp Ile Lys Gly Tyr Arg
500 505 510
Val Thr Ser Thr Gly Lys Pro Glu Phe Thr Asn Leu Lys Val Tyr His
515 520 525
Asp Ile Lys Asp Ile Thr Ala Arg Lys Glu Ile Ile Glu Asn Ala Glu
530 535 540
Leu Leu Asp Gln Ile Ala Lys Ile Leu Thr Ile Tyr Gln Ser Ser Glu
545 550 555 560
Asp Ile Gln Glu Glu Leu Thr Asn Leu Asn Ser Glu Leu Thr Gln Glu
565 570 575
Glu Ile Glu Gln Ile Ser Asn Leu Lys Gly Tyr Thr Gly Thr His Asn
580 585 590
Leu Ser Leu Lys Ala Ile Asn Leu Ile Leu Asp Glu Leu Trp His Thr
595 600 605
Asn Asp Asn Gln Ile Ala Ile Phe Asn Arg Leu Lys Leu Val Pro Lys
610 615 620
Lys Val Asp Leu Ser Gln Gln Lys Glu Ile Pro Thr Thr Leu Val Asp
625 630 635 640
Asp Phe Ile Leu Ser Pro Val Val Lys Arg Ser Phe Ile Gln Ser Ile
645 650 655
Lys Val Ile Asn Ala Ile Ile Lys Lys Tyr Gly Leu Pro Asn Asp Ile
660 665 670
Ile Ile Glu Leu Ala Arg Glu Lys Asn Ser Lys Asp Ala Gln Lys Met
675 680 685
Ile Asn Glu Met Gln Lys Arg Asn Arg Gln Thr Asn Glu Arg Ile Glu
690 695 700
Glu Ile Ile Arg Thr Thr Gly Lys Glu Asn Ala Lys Tyr Leu Ile Glu
705 710 715 720
Lys Ile Lys Leu His Asp Met Gln Glu Gly Lys Cys Leu Tyr Ser Leu
725 730 735
Glu Ala Ile Pro Leu Glu Asp Leu Leu Asn Asn Pro Phe Asn Tyr Glu
740 745 750
Val Asp His Ile Ile Pro Arg Ser Val Ser Phe Asp Asn Ser Phe Asn
755 760 765
Asn Lys Val Leu Val Lys Gln Glu Glu Asn Ser Lys Lys Gly Asn Arg
770 775 780
Thr Pro Phe Gln Tyr Leu Ser Ser Ser Asp Ser Lys Ile Ser Tyr Glu
785 790 795 800
Thr Phe Lys Lys His Ile Leu Asn Leu Ala Lys Gly Lys Gly Arg Ile
805 810 815
Ser Lys Thr Lys Lys Glu Tyr Leu Leu Glu Glu Arg Asp Ile Asn Arg
820 825 830
Phe Ser Val Gln Lys Asp Phe Ile Asn Arg Asn Leu Val Asp Thr Arg
835 840 845
Tyr Ala Thr Arg Gly Leu Met Asn Leu Leu Arg Ser Tyr Phe Arg Val
850 855 860
Asn Asn Leu Asp Val Lys Val Lys Ser Ile Asn Gly Gly Phe Thr Ser
865 870 875 880
Phe Leu Arg Arg Lys Trp Lys Phe Lys Lys Glu Arg Asn Lys Gly Tyr
885 890 895
Lys His His Ala Glu Asp Ala Leu Ile Ile Ala Asn Ala Asp Phe Ile
900 905 910
Phe Lys Glu Trp Lys Lys Leu Asp Lys Ala Lys Lys Val Met Glu Asn
915 920 925
Gln Met Phe Glu Glu Lys Gln Ala Glu Ser Met Pro Glu Ile Glu Thr
930 935 940
Glu Gln Glu Tyr Lys Glu Ile Phe Ile Thr Pro His Gln Ile Lys His
945 950 955 960
Ile Lys Asp Phe Lys Asp Tyr Lys Tyr Ser His Arg Val Asp Lys Lys
965 970 975
Pro Asn Arg Lys Leu Ile Asn Asp Thr Leu Tyr Ser Thr Arg Lys Asp
980 985 990
Asp Lys Gly Asn Thr Leu Ile Val Asn Asn Leu Asn Gly Leu Tyr Asp
995 1000 1005
Lys Asp Asn Asp Lys Leu Lys Lys Leu Ile Asn Lys Ser Pro Glu Lys
1010 1015 1020
Leu Leu Met Tyr His His Asp Pro Gln Thr Tyr Gln Lys Leu Lys Leu
1025 1030 1035 1040
Ile Met Glu Gln Tyr Gly Asp Glu Lys Asn Pro Leu Tyr Lys Tyr Tyr
1045 1050 1055
Glu Glu Thr Gly Asn Tyr Leu Thr Lys Tyr Ser Lys Lys Asp Asn Gly
1060 1065 1070
Pro Val Ile Lys Lys Ile Lys Tyr Tyr Gly Asn Lys Leu Asn Ala His
1075 1080 1085
Leu Asp Ile Thr Asp Asp Tyr Pro Asn Ser Arg Asn Lys Val Val Lys
1090 1095 1100
Leu Ser Leu Lys Pro Tyr Arg Phe Asp Val Tyr Leu Asp Asn Gly Val
1105 1110 1115 1120
Tyr Lys Phe Val Thr Val Lys Asn Leu Asp Val Ile Lys Lys Glu Asn
1125 1130 1135
Tyr Tyr Glu Val Asn Ser Lys Cys Tyr Glu Glu Ala Lys Lys Leu Lys
1140 1145 1150
Lys Ile Ser Asn Gln Ala Glu Phe Ile Ala Ser Phe Tyr Lys Asn Asp
1155 1160 1165
Leu Ile Lys Ile Asn Gly Glu Leu Tyr Arg Val Ile Gly Val Asn Asn
1170 1175 1180
Asp Leu Leu Asn Arg Ile Glu Val Asn Met Ile Asp Ile Thr Tyr Arg
1185 1190 1195 1200
Glu Tyr Leu Glu Asn Met Asn Asp Lys Arg Pro Pro His Ile Ile Lys
1205 1210 1215
Thr Ile Ala Ser Lys Thr Gln Ser Ile Lys Lys Tyr Ser Thr Asp Ile
1220 1225 1230
Leu Gly Asn Leu Tyr Glu Val Lys Ser Lys Lys His Pro Gln Ile Ile
1235 1240 1245
Lys Lys Gly Gly Ser Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile
1250 1255 1260
Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met
1265 1270 1275 1280
Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp
1285 1290 1295
Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met
1300 1305 1310
Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile
1315 1320 1325
Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser
1330 1335 1340
Pro Lys Lys Lys Arg Lys Val Gly Ser Tyr Pro Tyr Asp Val Pro Asp
1345 1350 1355 1360
Tyr Ala
<210> 51
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 51
acttacaggc tccaatagtg 20
<210> 52
<211> 103
<212> DNA
<213> Artificial Sequence
<220>
<221> misc_feature
<222> (1)..(20)
<223> n is a, c, g, or t
<400> 52
nnnnnnnnnn nnnnnnnnnn gttatagtac tctggaaaca gaatctacta taacaaggca 60
aaatgccgtg tttatctcgt caacttgttg gcgagatttt ttt 103
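To illustrate how SEQ ID NO:51 and SEQ ID NO:52 might be combined, here is a minimal sketch. It assumes the 20 `n` positions annotated at (1)..(20) of SEQ ID NO:52 are a placeholder for a 20-nt targeting sequence such as SEQ ID NO:51. The listing does not state this pairing, and the function name `fill_spacer` is hypothetical.

```python
# Sketch only: fill the leading "n" placeholder of a template with a spacer.
# Pairing SEQ ID NO:51 with SEQ ID NO:52 is an assumption made for illustration.

SEQ_ID_51 = "acttacaggctccaatagtg"  # 20 nt, copied from <400> 51

SEQ_ID_52 = (
    "nnnnnnnnnnnnnnnnnnnn"                         # positions 1..20, misc_feature
    "gttatagtactctggaaacagaatctactataacaaggca"     # rest of the first line
    "aaatgccgtgtttatctcgtcaacttgttggcgagattttttt"  # second line
)


def fill_spacer(template: str, spacer: str) -> str:
    """Replace the leading run of 'n' in `template` with `spacer`."""
    n_len = len(template) - len(template.lstrip("n"))
    if len(spacer) != n_len:
        raise ValueError(f"spacer must be {n_len} nt, got {len(spacer)}")
    if set(spacer) - set("acgt"):
        raise ValueError("spacer may only contain a, c, g, t")
    return spacer + template[n_len:]


if __name__ == "__main__":
    filled = fill_spacer(SEQ_ID_52, SEQ_ID_51)
    print(len(filled))
    print(filled)
```

The length check mirrors the `<211>` fields: the template is 103 nt, of which 20 nt are the placeholder, so only a 20-nt spacer is accepted.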

Claims (15)

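1. A method for regulating RNA splicing of a gene of interest in a cell, characterized in that the method comprises expressing a targeted cytosine deaminase in the cell, so as to induce mutation of the 3' splice site AG of an intron of interest of the gene of interest in the cell to AA, or mutation of the 5' splice site GT of an intron of interest of the gene of interest to AT, or mutation of each of a plurality of C residues in the polypyrimidine tract of an intron of interest of the gene of interest to T.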
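2. The method of claim 1, characterized in that the targeted cytosine deaminase is selected from:
(1) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cas enzyme whose nuclease activity is partially or completely lost but whose helicase activity is retained;
(2) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a TALEN protein that specifically recognizes a target sequence;
(3) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a zinc finger protein that specifically recognizes a target sequence;
(4) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cpf enzyme whose nuclease activity is partially or completely lost but whose helicase activity is retained; and
(5) a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with an Ago protein.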
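3. The method of claim 2, characterized in that the targeted cytosine deaminase is a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cas enzyme whose nuclease activity is partially or completely lost but whose helicase activity is retained, or is a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with a Cpf enzyme whose nuclease activity is partially or completely lost but whose helicase activity is retained; the method comprises expressing the targeted cytosine deaminase and an sgRNA in the cell, wherein the sgRNA is specifically recognized by the Cas enzyme or the Cpf enzyme, and binds to a sequence containing a splice site of the intron of interest of the gene of interest or binds to the complementary sequence of the polypyrimidine tract of interest.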
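4. The method of claim 3, characterized in that
the sgRNA binds to a sequence containing the 5' splice site of the intron of interest of the gene of interest, and the fusion protein mutates the GT at the 5' splice site to AT, thereby inducing exon skipping, activating an alternative splice site, inducing mutually exclusive exon switching, or inducing intron inclusion; or
the sgRNA binds to a sequence containing the 3' splice site of the intron of interest of the gene of interest, and the fusion protein mutates the AG at the 3' splice site to AA, thereby inducing exon skipping, activating an alternative splice site, inducing mutually exclusive exon switching, or inducing intron inclusion; or
the sgRNA binds to the complementary sequence of the polypyrimidine tract of interest and induces mutation of C to T in that polypyrimidine tract, thereby enhancing exon inclusion.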
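5. The method of claim 2, characterized in that the targeted cytosine deaminase is a fusion protein of a cytosine deaminase, or an enzymatically active fragment or mutant thereof, with an Ago protein; the method comprises the step of expressing in the cell the targeted cytosine deaminase and a gDNA recognized by the Ago protein.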
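6. The method of claim 3, characterized in that the fusion protein further contains Ugi, or the method further comprises the step of simultaneously introducing a Ugi expression plasmid;
or, the method comprises the step of directly introducing the fusion protein and the sgRNA.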
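7. The method of any one of claims 2-3, characterized in that
the nuclease activity of the Cas enzyme is completely lost, so that the enzyme has no DNA double-strand cleavage ability, or is partially lost, so that the enzyme has only DNA single-strand cleavage ability; and/or
the Cas enzyme is selected from: Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified forms thereof; preferably, the Cas enzyme is a Cas9 enzyme, preferably selected from: Cas9 from Streptococcus pyogenes, Cas9 from Staphylococcus aureus, and Cas9 from Streptococcus thermophilus; and/or
the cytosine deaminase is full-length human activation-induced cytosine deaminase (hAID), or an enzymatically active fragment or mutant thereof, wherein the fragment comprises at least the NLS domain, the catalytic domain and the APOBEC-like domain of the cytosine deaminase; and/or the fusion protein further comprises one or more of the following sequences: a linker, a nuclear localization sequence, Ugi, and amino acid residues or amino acid sequences introduced in order to construct the fusion protein, promote expression of the recombinant protein, obtain a recombinant protein that is automatically secreted outside the host cell, or facilitate purification of the recombinant protein.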
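8. The method of claim 7, characterized in that
the Cas enzyme is a Cas9 enzyme in which the two endonuclease catalytic domains RuvC1 and/or HNH are mutated, so that the enzyme loses nuclease activity while retaining helicase activity; preferably, both RuvC1 and HNH of the Cas9 enzyme are mutated, so that the enzyme loses nuclease activity while retaining helicase activity; more preferably, the amino acid at position 10 of the Cas9 enzyme, asparagine, is mutated to alanine or another amino acid, and the amino acid at position 841, histidine, is mutated to alanine or another amino acid; more preferably, the amino acid sequence of the Cas9 enzyme is as shown in positions 199-1566 of SEQ ID NO:23, or in amino acid residues 42-1452 of SEQ ID NO:25, or in amino acid residues 42-1419 of SEQ ID NO:33, or in amino acid residues 199-1262 of SEQ ID NO:50; and/or
the fragment of the cytosine deaminase comprises at least amino acid residues 9-182 of the cytosine deaminase, for example at least amino acids 1-182; preferably, the fragment consists of amino acid residues 1-182, of amino acid residues 1-186, or of amino acid residues 1-190; alternatively, the amino acid sequence of the cytosine deaminase is as shown in amino acid residues 1457-1654 of SEQ ID NO:25, and the fragment comprises at least amino acid residues 1465-1638 of SEQ ID NO:25, for example at least amino acid residues 1457-1638 of SEQ ID NO:25; preferably, the fragment consists of amino acid residues 1457-1638 of SEQ ID NO:25, amino acid residues 1457-1642 of SEQ ID NO:25, or amino acid residues 1457-1646 of SEQ ID NO:25; the mutant has substitution mutations at positions 10, 82 and 156; preferably, the substitution mutations are K10E, T82I and E156G; more preferably, the mutant contains the amino acid sequence shown in positions 1447-1629 of SEQ ID NO:31, or consists of the amino acid residues shown in positions 1447-1629 of SEQ ID NO:31.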
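9. The method of claim 8, characterized in that the amino acid sequence of the fusion protein is as shown in SEQ ID NO:23, 25, 27, 29, 31, 33, 48 or 50, or in amino acids 26-1654 of SEQ ID NO:25, or in positions 26-1638 of SEQ ID NO:27, or in amino acids 26-1629 of SEQ ID NO:31, or in amino acids 26-1638 of SEQ ID NO:33, or in amino acids 26-1629 of SEQ ID NO:48.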
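10. A fusion protein, characterized in that the fusion protein contains a Cas enzyme whose nuclease activity is partially or completely lost but whose helicase activity is retained, a cytosine deaminase or an enzymatically active fragment or mutant thereof, and Ugi, and optionally a nuclear localization sequence and a linker sequence.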
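11. The fusion protein of claim 10, characterized in that
the Cas enzyme is as defined in claim 7 or 8;
the cytosine deaminase or the enzymatically active fragment or mutant thereof is as defined in claim 7 or 8;
the amino acid sequence of the Ugi is as shown in amino acid residues 1576-1659 of SEQ ID NO:23.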
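12. A composition or a kit containing the composition, wherein
the composition contains the fusion protein of claim 10 or 11 or an expression vector thereof;
the kit optionally further contains an sgRNA that can be recognized by the fusion protein in the composition, or an expression vector thereof;
preferably, the kit contains viral particles capable of expressing the fusion protein and the sgRNA of the composition.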
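13. An sgRNA comprising a protein recognition region and a target recognition region, characterized in that the target-binding region binds to a sequence containing a splice site of an intron of interest of a gene of interest, or binds to the complementary sequence of a polypyrimidine tract of interest of the gene of interest.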
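14. The sgRNA of claim 13, characterized in that the target-binding region of the sgRNA binds to a sequence containing the 5' splice site of DMD exon 50; preferably, the target-binding region of the sgRNA is as shown in SEQ ID NO:17 or 51.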
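15. Use of the fusion protein of claim 10 or 11 or an expression vector thereof, together with an sgRNA that can be recognized by the fusion protein or an expression vector thereof, in the preparation of a reagent or kit for regulating RNA splicing, a reagent for gene therapy, or a medicament for treating a disease caused by a genetic variation or a tumor that benefits from a change in the ratio of different splice isoforms of a functional protein;
preferably, the disease caused by a genetic variation is selected from: Duchenne muscular dystrophy caused by DMD gene variation, SMN, thalassemia caused by the β-globin IVS2 647G>A mutation, progeria caused by LMNA mutation, and familial hypercholesterolemia; the splice isoform change is selected from: switching from Stat3α to Stat3β, switching from PKM2 to PKM1, skipping of MDM4 exon 6, selection of Bcl2 alternative splice sites, and skipping of LRP8 exon 8.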
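The three sequence changes recited in claims 1 and 4 can be sketched as simple string operations on an intron written 5' to 3' on the transcribed-sense strand. This is an illustrative model only: the intron sequence, the window coordinates and the function names are hypothetical, and converting every C in the window is a simplification of "a plurality of C residues". A G-to-A change on the strand shown corresponds to C-to-T deamination on the complementary strand.

```python
# Illustrative sketch of the edits in claims 1 and 4 (not the patent's protocol).
# The intron below is a made-up example, not a sequence from the listing.

def mutate_5ss(intron: str) -> str:
    """5' splice site: GT -> AT (first two intron bases)."""
    if not intron.startswith("GT"):
        raise ValueError("intron does not start with GT")
    return "A" + intron[1:]


def mutate_3ss(intron: str) -> str:
    """3' splice site: AG -> AA (last two intron bases)."""
    if not intron.endswith("AG"):
        raise ValueError("intron does not end with AG")
    return intron[:-1] + "A"


def mutate_ppt(intron: str, start: int, end: int) -> str:
    """Polypyrimidine tract: C -> T for every C in intron[start:end]."""
    window = intron[start:end].replace("C", "T")
    return intron[:start] + window + intron[end:]


if __name__ == "__main__":
    # hypothetical intron: 5' region + polypyrimidine tract + terminal AG
    intron = "GTAAGTACTAAC" + "TTCTCCCTTTCC" + "AG"
    print(mutate_5ss(intron))
    print(mutate_3ss(intron))
    print(mutate_ppt(intron, 12, 24))
```

Each function checks the expected dinucleotide first, so applying it to a window that does not contain a canonical GT or AG fails loudly rather than silently editing the wrong base.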

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2017106116518 2017-07-25
CN201710611651 2017-07-25

Publications (2)

Publication Number Publication Date
CN109295053A (zh) 2019-02-01
CN109295053B (zh) 2023-12-22

Family

ID=65039989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810819909.8A Active CN109295053B (zh) 2017-07-25 2018-07-24 Method for regulating RNA splicing by inducing base mutation at a splice site or base substitution in the polypyrimidine tract

Country Status (5)

Country Link
US (1) US20210355508A1 (zh)
EP (1) EP3712272A4 (zh)
CN (1) CN109295053B (zh)
CA (1) CA3106738A1 (zh)
WO (1) WO2019020007A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3990119A4 (en) * 2019-06-26 2024-03-27 Fred Hutchinson Cancer Center METHODS AND COMPOSITIONS USING BRD9 ACTIVATION THERAPIES FOR THE TREATMENT OF CANCER AND RELATED DISEASES
WO2021113390A1 (en) * 2019-12-02 2021-06-10 Shape Therapeutics Inc. Compositions for treatment of diseases
US20240165271A1 (en) * 2021-03-26 2024-05-23 The Board Of Regents Of The University Of Texas System Nucleotide editing to reframe dmd transcripts by base editing and prime editing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012174610A1 (en) * 2011-06-24 2012-12-27 Murdoch Childrens Research Institute Treatment and diagnosis of epigenetic disorders and conditions
WO2015089406A1 (en) * 2013-12-12 2015-06-18 President And Fellows Of Harvard College Cas variants for gene editing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7067793B2 * 2015-10-23 2022-05-16 President and Fellows of Harvard College Nucleobase editors and uses thereof
EP3368063B1 (en) * 2015-10-28 2023-09-06 Vertex Pharmaceuticals Inc. Materials and methods for treatment of duchenne muscular dystrophy
CN107043779B (zh) * 2016-12-01 2020-05-12 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences Use of CRISPR/nCas9-mediated site-directed base substitution in plants

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KOMOR A C et al.: "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage", Nature *
MA Yunqing et al.: "AID-mediated in situ target mutation: a new DNA base-editing technology for mammals", Chinese Journal of Cell Biology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
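CN110714008A (zh) * 2019-10-17 2020-01-21 Nantong University Method for targeted editing of ISS sequences using a CRISPR recombinant plasmid constructed from nucleotide sequences
CN112063621A (zh) * 2020-09-02 2020-12-11 Westlake University Duchenne muscular dystrophy-related exonic splicing enhancers, sgRNA, gene editing tools and applications
WO2022047876A1 (zh) * 2020-09-02 2022-03-10 Westlake University Duchenne muscular dystrophy-related exonic splicing enhancers, sgRNA, gene editing tools and applications
CN115011598A (zh) * 2020-09-02 2022-09-06 Westlake University Duchenne muscular dystrophy-related exonic splicing enhancers, sgRNA, gene editing tools and applications
WO2023169410A1 (zh) * 2022-03-08 2023-09-14 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences Cytosine deaminases and their use in base editing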

Also Published As

Publication number Publication date
EP3712272A4 (en) 2021-10-13
WO2019020007A1 (zh) 2019-01-31
CN109295053B (zh) 2023-12-22
CA3106738A1 (en) 2019-01-31
US20210355508A1 (en) 2021-11-18
EP3712272A1 (en) 2020-09-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200031 Yueyang Road, Shanghai, No. 319, No.

Applicant after: Shanghai Institute of nutrition and health, Chinese Academy of Sciences

Address before: 200031 Yueyang Road, Shanghai, No. 319, No.

Applicant before: SHANGHAI INSTITUTES FOR BIOLOGICAL SCIENCES, CHINESE ACADEMY OF SCIENCES

GR01 Patent grant