CN108070611B - Plant base editing method - Google Patents

Plant base editing method

Info

Publication number
CN108070611B
Authority
CN
China
Legal status
Active
Application number
CN201711122179.8A
Other languages
English (en)
Other versions
CN108070611A (zh)
Inventor
Gao Caixia (高彩霞)
Zong Yuan (宗媛)
Current Assignee
Institute of Genetics and Developmental Biology of CAS
Original Assignee
Institute of Genetics and Developmental Biology of CAS
Application filed by Institute of Genetics and Developmental Biology of CAS
Publication of CN108070611A
Application granted
Publication of CN108070611B
Active legal status (current)
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8202 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation by biological means, e.g. cell mediated or natural vector
    • C12N15/8205 Agrobacterium mediated transformation
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8242 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits
    • C12N15/8243 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits involving biosynthetic or metabolic pathways, i.e. metabolic engineering, e.g. nicotine, caffeine
    • C12N15/8261 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
    • C12N15/8262 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield involving plant development
    • C12N15/8263 Ablation; Apoptosis
    • C12N15/8271 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance
    • C12N15/8274 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance for herbicide resistance
    • C12N15/8278 Sulfonylurea
    • C12N15/8287 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for fertility modification, e.g. apomixis
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/80 Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04005 Cytidine deaminase (3.5.4.5)
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H1/00 Processes for modifying genotypes ; Plants characterised by associated natural traits
    • A01H1/02 Methods or apparatus for hybridisation; Artificial pollination ; Fertility
    • A01H1/06 Processes for producing mutations, e.g. treatment with chemicals or with radiation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
    • Y02A40/146 Genetically Modified [GMO] plants, e.g. transgenic plants

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Cell Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Nutrition Science (AREA)
  • Mycology (AREA)
  • Botany (AREA)
  • Developmental Biology & Embryology (AREA)
  • Environmental Sciences (AREA)
  • Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Peptides Or Proteins (AREA)

Abstract

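The present invention relates to the field of plant genetic engineering. Specifically, it relates to a plant base editing method. More specifically, it relates to a method for efficient base editing of a target sequence in the genome of a plant (e.g., a crop plant) using a guide RNA-directed Cas9-cytidine deaminase fusion protein, as well as to plants produced by the method and their progeny.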

Description

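Plant base editing method
Technical Field
The present invention relates to the field of plant genetic engineering. Specifically, it relates to a plant base editing method. More specifically, it relates to a method for efficient base editing of a target sequence in the genome of a plant (e.g., a crop plant) using a guide RNA-directed Cas9-cytidine deaminase fusion protein, as well as to plants produced by the method and their progeny.
Background
Efficient crop improvement depends on the ability to obtain new genetic mutations that can readily be introduced into modern cultivars. Genetic studies, especially genome-wide association studies, have shown that single-nucleotide changes are the main cause of differences in crop traits. Single-base variations can cause amino acid substitutions, leading to the evolution of superior alleles and elite traits. Before the advent of genome editing, Targeting Induced Local Lesions IN Genomes (TILLING) could be used to generate the mutations urgently needed for crop improvement. However, TILLING screens are time-consuming and labor-intensive, and the point mutations identified are often limited in number and type. Genome editing technologies, particularly those based on the CRISPR/Cas9 system, can introduce specific base substitutions at genomic loci through homologous recombination (HR)-mediated DNA repair. At present, however, the successful use of this approach is severely limited, mainly because HR-mediated repair of double-strand breaks occurs at very low frequency in plants. In addition, efficiently supplying sufficient DNA repair template remains a major obstacle. These problems make efficient and simple HR-based site-directed mutagenesis in plants a major challenge. There therefore remains an urgent need in the art for new methods of site-directed mutagenesis of plant genomes.
Summary of the Invention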
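In a first aspect, the present invention provides a system for base editing a target sequence in a plant genome, comprising at least one of the following i) to v):
i) a base editing fusion protein, and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding the base editing fusion protein, and a guide RNA;
iii) a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding the base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding the base editing fusion protein and a nucleotide sequence encoding a guide RNA;
wherein the base editing fusion protein comprises a nuclease-inactivated Cas9 domain and a deaminase domain, and the guide RNA is capable of targeting the base editing fusion protein to the target sequence in the plant genome.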
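In a second aspect, the present invention provides a method for producing a genetically modified plant, comprising introducing into a plant the system of the invention for base editing a target sequence in a plant genome, whereby the guide RNA targets the base editing fusion protein to the target sequence in the plant genome, resulting in substitution of one or more Cs in the target sequence with T.
In a third aspect, the present invention provides a genetically modified plant or progeny thereof, wherein the plant is obtained by the method of the invention.
In a fourth aspect, the present invention provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the method of the invention with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.
Description of the Drawings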
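Figure 1. Establishment of a targeted plant base editing (PBE)-mediated genome editing system in plant protoplasts using a BFP-to-GFP reporter system. (a) Schematic of the n/dCas9-PBE expression vectors. APOBEC1, n/dCas9, XTEN and UGI were all codon-optimized for wheat, NLSs were added at both ends of n/dCas9-PBE, and the fusion protein is driven by the maize Ubiquitin-1 promoter. (b) The BFP-to-GFP reporter system used to detect precise genome editing mediated by the targeted PBE system in plant protoplasts. (c) Efficiency of targeted PBE-mediated BFP-to-GFP mutation in wheat and rice protoplasts, measured by flow cytometry. Protoplasts in four fields of view are shown. Protoplasts were transformed with the following DNA constructs (left to right): (i) pnCas9-PBE, pUbi-BFPm and pBFP-sgRNA; (ii) pdCas9-PBE, pUbi-BFPm and pBFP-sgRNA; (iii) pUbi-BFPm and pBFP-sgRNA only; (iv) pUbi-GFP, a positive control expressing GFP from the maize Ubiquitin-1 promoter. Scale bar: 800 μm. (d) Genotypes and frequencies at the BFP target site induced by the targeted PBE system. The target base is the C at position 4; the numbers below indicate positions within the target sequence. Below each sequence is the percentage of total sequencing reads carrying the corresponding base. Three neighboring Cs (C3, C6, C9) were also converted to T, but these conversions do not change the protein sequence.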
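Figure 2. Point mutations in endogenous genes mediated by the targeted PBE system in plant protoplasts. (a) Frequencies of targeted single C-to-T substitutions. Values and error bars represent the mean and standard deviation of three biological replicates performed on different days. (b) Frequencies of multiple cytidine substitutions produced by nCas9-PBE. "--" indicates none. Frequencies are based on the number of mutant reads relative to the total number of captured reads. (c) Indel formation frequencies at 7 target sites in endogenous genes in plant protoplasts. Shown are the percentages of total DNA sequencing reads carrying a T or an indel at the target position after treatment with nCas9-PBE, dCas9-PBE or wild-type Cas9. Values and error bars represent the mean and standard deviation of three biological replicates performed on different days. (d) Spectrum of point mutations generated by nCas9-PBE in the CENP-A targeting domain (CATD) of ZmCENH3. Three consecutive residues of wild-type ZmCENH3, alanine-leucine-leucine (ALL), were mutated to AFL, VLL, ALF, AFF, VFL, VLF or VFF. The CATD regions of Arabidopsis CENH3 (AtCENH3) and barley CENH3 (HvCENH3) are included for comparison. The underlined leucine residues have previously been shown to be required for proper function of AtCENH3 and HvCENH3. Substitution of the underlined leucine and alanine in AtCENH3 has been found to cause haploid induction. The GenBank accession numbers of ZmCENH3, AtCENH3 and HvCENH3 are AF519807, AF465800 and JF419329, respectively.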
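Figure 3. Use of the targeted PBE system to obtain genetically modified wheat and rice plants. (a) Sequence of the sgRNA designed to target the exon 5 site of TaLOX2. The target C in the deamination window is shown in bold italics, the PAM sequence is shown in italics, and the SalI restriction site is GTCGAC. (b) T7EI and PCR-RE assays of two TaLOX2 point mutants. Lanes T0-1 to T0-12 show PCR fragments amplified from independent wheat plants and digested with T7EI and SalI. Lanes WT/D and WT/U are PCR fragments amplified from wild-type plants, with and without T7EI and SalI digestion, respectively. Bands marked with arrows result from mutations induced by targeted PBE. The genotypes of the two point mutants were further confirmed by Sanger sequencing. (c) Structure of the Agrobacterium expression vector pAG-n/dCas9-PBE used to target the OsCDC48 gene. Hyg is driven by the 2x35S promoter. (d) Sequence of the sgRNA designed to target the exon 9 site of OsCDC48, and T7EI assay results for 12 OsCDC48 point mutants. (e) Genotypes and frequencies of rice OsCDC48 point mutants identified by Sanger sequencing. The frequency of each genotype (mutants of that genotype relative to the total number of mutant rice plants) is shown on the right. The target bases are C3, C4, C7 and C8, where the subscript numbers indicate positions within the target sequence. The sequences below are the sequencing results of the mutant rice plants.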
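Figure 4. Genotypes of 12 representative OsCDC48 point mutants identified by Sanger sequencing. The Cs in bold italics are the Cs to be mutated at positions 3, 4, 7 and 8 within the deamination window. A lowercase t indicates a successful C-to-T substitution in the target site.
Figure 5. Base editing of ZmALS1/ZmALS2 with the nCas9-PBE system. (a) Genomic structure of ZmALS1/ZmALS2 and their shared sgRNA target sequence; (b) C-to-T conversion types in ZmALS1/ZmALS2 detected in different nCas9-PBE-modified lines.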
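The read-based frequencies in the figure legends above (the percentage of total sequencing reads carrying a given base at a target position) can be illustrated with a minimal Python sketch. It assumes the amplicon reads have already been aligned and trimmed to the target sequence, so that position p of every read corresponds to position p of the target; the function name and the example reads are hypothetical and are not data from this patent.

```python
from collections import Counter


def base_frequencies(reads, position):
    """Percentage of reads carrying each base at a 1-based target position.

    Assumes (hypothetically) that every read is pre-aligned to the target,
    so index position-1 of each read is the same target base.
    """
    counts = Counter(read[position - 1] for read in reads)
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


# Hypothetical aligned reads, for illustration only.
reads = ["GACCTT", "GATCTT", "GATCTT", "GACCTT"]
print(base_frequencies(reads, 3))  # {'C': 50.0, 'T': 50.0}
```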
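Detailed Description of the Invention
I. Definitions
In the present invention, unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Furthermore, the terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are terms and routine procedures widely used in the respective fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). To aid understanding of the present invention, definitions and explanations of relevant terms are provided below.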
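"Cas9 nuclease" and "Cas9" are used interchangeably herein and refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (e.g., a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA-binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated system) genome editing system; directed by a guide RNA, it can target and cleave a DNA target sequence to form a DNA double-strand break (DSB).
"Guide RNA" and "gRNA" are used interchangeably herein. A guide RNA typically consists of crRNA and tracrRNA molecules that are partially complementary and form a complex, wherein the crRNA comprises a sequence sufficiently complementary to the target sequence to hybridize with it and to direct sequence-specific binding of the CRISPR complex (Cas9 + crRNA + tracrRNA) to that target sequence. However, it is known in the art that a single guide RNA (sgRNA) can be designed that combines the features of both crRNA and tracrRNA.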
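"Deaminase" refers to an enzyme that catalyzes a deamination reaction. In some embodiments of the invention, the deaminase is a cytidine deaminase, which catalyzes the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
When used with respect to plant cells, "genome" encompasses not only chromosomal DNA present in the nucleus but also organellar DNA present in subcellular components of the cell (such as mitochondria and plastids).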
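As used herein, the term "plant" includes whole plants and any progeny, cells, tissues or parts of plants. The term "plant part" includes any part of a plant, including, for example but not limited to: seeds (including mature seeds, immature embryos without seed coats, and immature seeds); plant cuttings; plant cells; plant cell cultures; and plant organs (e.g., pollen, embryos, flowers, fruits, shoots, leaves, roots, stems and related explants). A plant tissue or plant organ may be a seed, a callus, or any other population of plant cells organized into a structural or functional unit. A plant cell or tissue culture can regenerate a plant having the physiological and morphological characteristics of the plant from which the cell or tissue was derived, and can regenerate a plant having substantially the same genotype as that plant. In contrast, some plant cells cannot regenerate into plants. Regenerable cells in a plant cell or tissue culture may be embryos, protoplasts, meristematic cells, callus, pollen, leaves, anthers, roots, root tips, silks, flowers, kernels, ears, cobs, husks or stalks.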
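Plant parts include harvestable parts and parts useful for propagating progeny plants. Plant parts useful for propagation include, for example but not limited to: seeds; fruits; cuttings; seedlings; tubers; and rootstocks. A harvestable part of a plant may be any useful part of the plant, including, for example but not limited to: flowers; pollen; seedlings; tubers; leaves; stems; fruits; seeds; and roots.
A plant cell is the structural and physiological unit of a plant. As used herein, plant cells include protoplasts and protoplasts with partial cell walls. Plant cells may be in the form of isolated single cells or cell aggregates (e.g., friable callus and cultured cells), and may be part of a higher-order organized unit (e.g., a plant tissue, plant organ or plant). Thus, a plant cell may be a protoplast, a gamete-producing cell, or a cell or collection of cells capable of regenerating into a whole plant. Accordingly, in the embodiments herein, a seed, which comprises multiple plant cells and is capable of regenerating into a whole plant, is considered a "plant part".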
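As used herein, the term "protoplast" refers to a plant cell whose cell wall has been completely or partially removed, leaving its lipid bilayer membrane exposed. Typically, a protoplast is an isolated plant cell without a cell wall that has the potential to regenerate into a cell culture or a whole plant.
Plant "progeny" includes any subsequent generation of a plant.
A "genetically modified plant" includes a plant that contains within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide can be stably integrated into the genome and inherited over successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one that contains one or more deoxynucleotide substitutions, deletions or additions in the plant genome. For example, a genetically modified plant obtained by the present invention may contain one or more C-to-T substitutions relative to a wild-type plant (the corresponding plant without the genetic modification).
"Exogenous", with respect to a sequence, means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or genomic locus has been substantially altered from its native form by deliberate human intervention.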
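"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytidine or deoxycytidine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purines (A or G), "Y" for pyrimidines (C or T), "K" for G or T, "H" for A or C or T, "I" for inosine, and "N" for any nucleotide.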
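Degenerate codes such as N, R and Y are what allow motifs like the NGG PAM to be written compactly. A minimal Python sketch (the helper names are hypothetical) converts a pattern built from the DNA codes defined above into a regular expression. U and I are left out because they do not occur in genomic DNA targets, and the IUPAC codes not defined in this document (S, W, M, B, D, V) are omitted, so they raise a KeyError.

```python
import re

# DNA codes defined in the text; U (RNA) and I (inosine) are deliberately excluded.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def iupac_to_regex(pattern: str) -> str:
    """Translate a degenerate DNA pattern (e.g. 'NGG') into a regex string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC_DNA[code]  # KeyError for codes not defined above
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def matches(pattern: str, seq: str) -> bool:
    """True if seq matches the degenerate pattern over its whole length."""
    return re.fullmatch(iupac_to_regex(pattern), seq.upper()) is not None


print(iupac_to_regex("NGG"))                         # [ACGT]GG
print(matches("NGG", "AGG"), matches("NGG", "AGC"))  # True False
```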
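"Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.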
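As used herein, "expression construct" refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in a plant. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (e.g., to produce mRNA or a functional RNA) and/or translation of RNA into a precursor or mature protein.
An "expression construct" of the invention may be a linear nucleic acid fragment, a circular plasmid or a viral vector or, in some embodiments, a translatable RNA (such as mRNA).
An "expression construct" of the invention may comprise regulatory sequences and a nucleotide sequence of interest from different sources, or regulatory sequences and a nucleotide sequence of interest from the same source but arranged differently from the way they normally occur in nature.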
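"Regulatory sequence" and "regulatory element" are used interchangeably and refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence that affect transcription, RNA processing or stability, or translation of the associated coding sequence. A plant expression regulatory element is a nucleotide sequence capable of controlling transcription, RNA processing or stability, or translation of a nucleotide sequence of interest in a plant.
Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.
"Promoter" refers to a nucleic acid fragment capable of controlling transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is one capable of controlling gene transcription in plant cells, whether or not it is derived from a plant cell. The promoter may be a constitutive promoter, a tissue-specific promoter, a developmentally regulated promoter or an inducible promoter.
"Constitutive promoter" refers to a promoter that generally causes a gene to be expressed in most cell types under most conditions. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed mainly, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in a particular cell or cell type. "Developmentally regulated promoter" refers to a promoter whose activity is determined by developmental events. "Inducible promoter" refers to a promoter that selectively expresses an operably linked DNA sequence in response to endogenous or exogenous stimuli (environmental, hormonal, chemical signals, etc.).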
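As used herein, the term "operably linked" means that a regulatory element (for example but not limited to a promoter sequence, a transcription termination sequence, etc.) is linked to a nucleic acid sequence (e.g., a coding sequence or open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking a regulatory element region to a nucleic acid molecule are known in the art.
"Introducing" a nucleic acid molecule (e.g., a plasmid, linear nucleic acid fragment or RNA) or a protein into a plant means transforming plant cells with the nucleic acid or protein such that the nucleic acid or protein can function in the plant cells. "Transformation" as used herein includes both stable and transient transformation.
"Stable transformation" refers to introducing an exogenous nucleotide sequence into the plant genome, resulting in stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the plant and of any of its successive generations.
"Transient transformation" refers to introducing a nucleic acid molecule or protein into plant cells, where it performs its function without stable inheritance of the exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the plant genome.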
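"Trait" refers to a physiological, morphological, biochemical or physical characteristic of a plant or of particular plant material or cells. In some embodiments, these characteristics may be visible to the naked eye, such as seed or plant size; indicators measurable by biochemical techniques, such as the protein, starch or oil content of seeds or leaves; observable metabolic or physiological processes, such as resistance to water stress or to particular salt, sugar or nitrogen concentrations; detectable gene expression levels; or agronomic traits such as observable resistance to osmotic stress, or yield. In some embodiments, traits also include the ploidy of a plant, for example haploidy, which is important for plant breeding. In some embodiments, traits also include the resistance of a plant to herbicides.
"Agronomic traits" are measurable parameters including, but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, vegetative tissue nitrogen content, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, vegetative tissue free amino acid content, total plant protein content, fruit protein content, seed protein content, vegetative tissue protein content, drought resistance, nitrogen uptake, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance and tiller number.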
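II. Plant base editing system
The present invention provides a system for base editing a target sequence in a plant genome, comprising at least one of the following i) to v):
i) a base editing fusion protein, and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding the base editing fusion protein, and a guide RNA;
iii) a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding the base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding the base editing fusion protein and a nucleotide sequence encoding a guide RNA;
wherein the base editing fusion protein comprises a nuclease-inactivated Cas9 domain and a deaminase domain, and the guide RNA is capable of targeting the base editing fusion protein to the target sequence in the plant genome.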
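The DNA cleavage domain of Cas9 is known to contain two subdomains: the HNH nuclease subdomain and the RuvC subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC subdomain cleaves the non-complementary strand. Mutations in these subdomains can inactivate the nuclease activity of Cas9, yielding a "nuclease-inactivated Cas9". The nuclease-inactivated Cas9 still retains gRNA-guided DNA-binding ability. Therefore, in principle, when fused to another protein, a nuclease-inactivated Cas9 can target that protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
Cytidine deaminase can catalyze the deamination of cytosine (C) in DNA to form uracil (U). When a nuclease-inactivated Cas9 is fused to a cytidine deaminase, the fusion protein, directed by a guide RNA, can target the target sequence in the plant genome. Because Cas9 nuclease activity is absent, the DNA double strand is not cleaved; instead, the deaminase domain of the fusion protein deaminates cytosines in the single-stranded DNA exposed when the Cas9-guide RNA-DNA complex forms, converting them to U, and a C-to-T substitution is then achieved through repair of the resulting base mismatch.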
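The C-to-U-to-T sequence of events described above can be traced on a toy duplex with a short Python sketch. It is bookkeeping only, not a model of enzyme kinetics or repair pathways: it assumes the U-containing strand is copied with U read as T, which is the outcome the U:G mismatch must resolve to for a C:G-to-T:A edit. Function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def deaminate(strand: str, positions) -> str:
    """Turn C into U at the given 1-based positions of the exposed
    single-stranded (non-target) strand."""
    bases = list(strand)
    for p in positions:
        if bases[p - 1] != "C":
            raise ValueError(f"position {p} is {bases[p - 1]}, not C")
        bases[p - 1] = "U"
    return "".join(bases)


def resolve_to_ta(strand_with_u: str):
    """Desired outcome (assumed): U is copied as T, so the opposite strand
    now carries A where it previously carried G."""
    edited = strand_with_u.replace("U", "T")
    return edited, reverse_complement(edited)


with_u = deaminate("GACCT", [3])
print(with_u)                 # GAUCT
print(resolve_to_ta(with_u))  # ('GATCT', 'AGATC')
```

Before editing, the opposite strand of GACCT reads AGGTC; afterwards it reads AGATC, i.e. the original C:G pair has become T:A.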
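Accordingly, in some embodiments of the invention, the deaminase is a cytidine deaminase, for example a deaminase of the apolipoprotein B mRNA-editing complex (APOBEC) family. The deaminase of the invention is in particular a deaminase that can accept single-stranded DNA as a substrate.
Examples of cytidine deaminases useful in the invention include, but are not limited to: APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G or CDA1.
In some specific embodiments of the invention, the cytidine deaminase comprises the amino acid sequence set forth in SEQ ID NO: 11.
The nuclease-inactivated Cas9 of the invention may be derived from Cas9 of various species, for example from Streptococcus pyogenes Cas9 (SpCas9; nucleotide sequence set forth in SEQ ID NO: 18; amino acid sequence set forth in SEQ ID NO: 21). Mutating both the HNH nuclease subdomain and the RuvC subdomain of SpCas9 (e.g., with the mutations D10A and H840A) abolishes the nuclease activity of S. pyogenes Cas9, yielding nuclease-dead Cas9 (dCas9). Inactivating only one of the two subdomains gives Cas9 nickase activity, i.e., a Cas9 nickase (nCas9), for example an nCas9 carrying only the D10A mutation.
Accordingly, in some embodiments of the invention, the nuclease-inactivated Cas9 comprises the amino acid substitution D10A and/or H840A relative to wild-type Cas9.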
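In some preferred embodiments of the invention, the nuclease-inactivated Cas9 has nickase activity. Without being bound by any theory, it is believed that eukaryotic mismatch repair uses nicks in a DNA strand to direct removal and repair of the mismatched base on that strand. A U:G mismatch formed by cytidine deaminase could be repaired back to C:G. Introducing a nick in the strand containing the unedited G makes it possible to preferentially repair the U:G mismatch to the desired U:A or T:A. Therefore, the nuclease-inactivated Cas9 is preferably a Cas9 nickase that retains the cleavage activity of the Cas9 HNH subdomain while the cleavage activity of the RuvC subdomain is inactivated. For example, the nuclease-inactivated Cas9 comprises the amino acid substitution D10A relative to wild-type Cas9.
In some specific embodiments of the invention, the nuclease-inactivated Cas9 comprises the amino acid sequence of SEQ ID NO: 14. In some preferred embodiments, the nuclease-inactivated Cas9 comprises the amino acid sequence of SEQ ID NO: 13.
In some embodiments of the invention, the deaminase domain is fused to the N-terminus of the nuclease-inactivated Cas9 domain. In some embodiments, the deaminase domain is fused to the C-terminus of the nuclease-inactivated Cas9 domain.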
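In some embodiments of the invention, the deaminase domain and the nuclease-inactivated Cas9 domain are fused via a linker. The linker may be a non-functional amino acid sequence without secondary or higher-order structure, 1-50 amino acids long (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or 20-25 or 25-50 amino acids) or longer. For example, the linker may be a flexible linker such as GGGGS, GS, GAP, (GGGGS)x3, GGS or (GGS)x7. In some preferred embodiments, the linker is the XTEN linker set forth in SEQ ID NO: 12.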
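The repeat shorthand used for the flexible linkers above, such as (GGGGS)x3, can be expanded mechanically to check linker lengths against the 1-50 residue range. A minimal Python sketch; the function name is hypothetical, and the XTEN linker is not handled because its sequence (SEQ ID NO: 12) is not reproduced in this section.

```python
import re


def expand_linker(notation: str) -> str:
    """Expand repeat shorthand like '(GGGGS)x3' or '(GGS) x 7'.

    Plain sequences such as 'GAP' are returned unchanged.
    """
    text = notation.strip()
    m = re.fullmatch(r"\(([A-Z]+)\)\s*x\s*(\d+)", text)
    if m is None:
        return text
    unit, copies = m.group(1), int(m.group(2))
    return unit * copies


for linker in ["GGGGS", "GS", "GAP", "(GGGGS)x3", "GGS", "(GGS)x7"]:
    seq = expand_linker(linker)
    print(f"{linker:>10} -> {seq} ({len(seq)} aa)")
```

With this expansion, (GGGGS)x3 is the 15-residue GGGGSGGGGSGGGGS and (GGS)x7 is 21 residues, both within the stated range.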
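In cells, uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), resulting in repair of U:G to C:G. Therefore, without being bound by any theory, including a uracil DNA glycosylase inhibitor in the base editing fusion protein or in the system of the invention is expected to increase base editing efficiency.
Accordingly, in some embodiments of the invention, the base editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI). In some specific embodiments, the uracil DNA glycosylase inhibitor comprises the amino acid sequence set forth in SEQ ID NO: 15.
In some embodiments of the invention, the base editing fusion protein further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the base editing fusion protein should be strong enough to drive accumulation of the fusion protein in the nucleus of plant cells in an amount sufficient for its base editing function. The strength of nuclear localization activity is generally determined by the number and positions of the NLSs in the fusion protein, the particular NLS or NLSs used, or a combination of these factors.
In some embodiments of the invention, the NLSs of the base editing fusion protein may be located at the N-terminus and/or the C-terminus. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the fusion protein comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When more than one NLS is present, each may be selected independently of the others. In some preferred embodiments of the invention, the fusion protein comprises 2 NLSs, for example one at the N-terminus and one at the C-terminus.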
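In general, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, although other types of NLS are also known. Non-limiting examples of NLSs include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3'), PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or 5'-CCAAAGAAGAAGAGGAAGGTT-3'), or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3').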
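A quick consistency check is to translate each NLS nucleotide sequence listed above with the standard genetic code and compare the result with the stated peptide. The Python sketch below builds the standard codon table from the conventional TCAG-ordered 64-letter string; the helper names are hypothetical.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# NLS peptides and the nucleotide sequences given for them in the text.
NLS_EXAMPLES = {
    "KKRKV": ["AAGAAGAGAAAGGTC"],
    "PKKKRKV": ["CCCAAGAAGAAGAGGAAGGTG", "CCAAAGAAGAAGAGGAAGGTT"],
    "SGGSPKKKRKV": ["TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG"],
}

for peptide, seqs in NLS_EXAMPLES.items():
    for seq in seqs:
        print(peptide, translate(seq), translate(seq) == peptide)
```

All four nucleotide sequences translate to the peptides stated for them.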
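In some embodiments of the invention, the N-terminus of the base editing fusion protein comprises an NLS with the amino acid sequence PKKKRKV. In some embodiments, the C-terminus of the base editing fusion protein comprises an NLS with the amino acid sequence SGGSPKKKRKV.
In addition, depending on the location of the DNA to be edited, the base editing fusion protein of the invention may also comprise other localization sequences, such as cytoplasmic, chloroplast or mitochondrial localization sequences.
In some specific embodiments of the invention, the base editing fusion protein comprises the amino acid sequence set forth in SEQ ID NO: 22 or 23.
To achieve efficient expression in plants, in some embodiments of the invention, the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the plant to be base-edited.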
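Codon optimization refers to a method of modifying a nucleic acid sequence to enhance its expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons used more frequently or most frequently in the genes of that host cell, while maintaining the native amino acid sequence. Different species show particular preferences for certain codons of particular amino acids. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which in turn is thought to depend on the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored by codon optimization for optimal expression in a given organism. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura, Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000." Nucl. Acids Res. 28:292 (2000).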
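The simplest form of codon optimization described above, back-translating each residue to the single most frequently used codon of the host, can be sketched in Python as follows. The usage fractions are hypothetical placeholders, not values from the Codon Usage Database, and cover only the residues needed for the example; the sketch ignores every other sequence feature.

```python
# HYPOTHETICAL host codon-usage fractions (illustrative placeholders only).
# Only the codons listed here are considered for each amino acid.
HOST_USAGE = {
    "K": {"AAA": 0.30, "AAG": 0.70},
    "P": {"CCA": 0.20, "CCC": 0.25, "CCG": 0.40, "CCT": 0.15},
    "R": {"AGA": 0.10, "AGG": 0.20, "CGC": 0.45, "CGG": 0.25},
    "V": {"GTC": 0.30, "GTG": 0.45, "GTT": 0.25},
}


def most_frequent_codon(aa: str, usage) -> str:
    table = usage[aa]
    return max(table, key=table.get)


def optimize(protein: str, usage) -> str:
    """Back-translate a protein using the most frequent host codon per residue.

    The amino acid sequence is preserved because only synonymous codons are used.
    """
    return "".join(most_frequent_codon(aa, usage) for aa in protein)


print(optimize("PKKKRKV", HOST_USAGE))  # CCGAAGAAGAAGCGCAAGGTG
```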
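In some specific embodiments of the invention, the codon-optimized nucleotide sequence encoding the base editing fusion protein is set forth in SEQ ID NO: 19 or 20.
In some embodiments of the invention, the guide RNA is a single guide RNA (sgRNA). Methods for constructing a suitable sgRNA for a given target sequence are known in the art. See, for example: Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951 (2014); Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013); Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J. Genet. Genomics 41, 63-68 (2014).
In some embodiments of the invention, the nucleotide sequence encoding the base editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to a plant expression regulatory element such as a promoter.
Examples of promoters usable in the invention include, but are not limited to: the cauliflower mosaic virus 35S promoter (Odell et al. (1985) Nature 313:810-812), the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter, the rice actin promoter, the TrpPro5 promoter (U.S. Patent Application No. 10/377,318; filed March 16, 2005), the pEMU promoter (Last et al. (1991) Theor. Appl. Genet. 81:581-588), the MAS promoter (Velten et al. (1984) EMBO J. 3:2723-2730), the maize H3 histone promoter (Lepetit et al. (1992) Mol. Gen. Genet. 231:276-285 and Atanassova et al. (1992) Plant J. 2(3):291-300) and the Brassica napus ALS3 promoter (PCT application WO 97/41228). Promoters usable in the invention also include the commonly used tissue-specific promoters reviewed in Moore et al. (2006) Plant J. 45(4):651-683.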
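III. Method for producing a genetically modified plant
In another aspect, the present invention provides a method for producing a genetically modified plant, comprising introducing into a plant the system of the invention for base editing a target sequence in a plant genome, whereby the guide RNA targets the base editing fusion protein to the target sequence in the plant genome, resulting in substitution of one or more Cs in the target sequence with T.
Designing target sequences that can be recognized and targeted by the Cas9-guide RNA complex is within the ordinary skill in the art. In general, the target sequence is complementary to the guide sequence of about 20 nucleotides contained in the guide RNA, and its 3' end is immediately adjacent to the protospacer adjacent motif (PAM) NGG.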
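For example, in some embodiments of the invention, the target sequence has the structure 5'-N_X-NGG-3', where each N is independently selected from A, G, C and T; X is an integer with 14 ≤ X ≤ 30; N_X denotes X consecutive nucleotides; and NGG is the PAM sequence. In some specific embodiments of the invention, X is 20.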
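With the structure defined above and X = 20, candidate target sequences can be located by scanning both strands of a DNA sequence for 20 nucleotides followed by NGG. A minimal Python sketch with hypothetical names; it reports positions on whichever strand was scanned and applies no off-target or efficiency scoring.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_targets(seq: str, spacer_len: int = 20):
    """Return (strand, start, target) for every 5'-N{spacer_len}-NGG-3' site.

    `start` is the 0-based index of the target's first base on the strand
    that was scanned ('+' = input as given, '-' = its reverse complement).
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1]))
    site_len = spacer_len + 3  # N_X plus the 3-nt NGG PAM
    hits = []
    for strand, s in strands:
        for i in range(len(s) - site_len + 1):
            target = s[i:i + site_len]
            if target.endswith("GG"):  # the N of NGG may be any base
                hits.append((strand, i, target))
    return hits


# The ZmCENH3 target given later in this document (20 nt + AGG PAM).
print(find_targets("AGCCCTCCTTGCGCTGCAAGAGG"))
# [('+', 0, 'AGCCCTCCTTGCGCTGCAAGAGG')]
```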
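The base editing system of the invention has a wide deamination window in plants, for example a window 7 nucleotides long. In some embodiments of the method, one or more Cs within positions 3 to 9 of the target sequence are substituted with T. For example, where present, any 1, 2, 3, 4, 5, 6 or 7 Cs within positions 3 to 9 of the target sequence are substituted with T. For instance, if positions 3 to 9 of the target sequence contain 4 Cs, any 1, 2, 3 or 4 of them may be substituted with T. The Cs may be contiguous or separated by other nucleotides. Therefore, if multiple Cs are present in the target sequence, the method of the invention can produce a variety of mutation combinations. In some embodiments, the method further comprises screening for plants carrying the desired nucleotide substitution. Nucleotide substitutions in plants can be detected by T7EI, PCR/RE or sequencing methods; see, for example, Shan, Q., Wang, Y., Li, J. & Gao, C. Genome editing in rice and wheat using the CRISPR/Cas system. Nat. Protoc. 9, 2395-2410 (2014).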
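The combinatorics of the 3-9 deamination window can be made concrete with a short Python sketch that lists every product obtainable when any non-empty subset of the window Cs is converted to T; with k Cs in the window there are 2^k - 1 such products. The 23-nt target is hypothetical, chosen so that its window Cs sit at positions 3, 4, 7 and 8 (the same pattern as the OsCDC48 target bases C3, C4, C7 and C8 in Figure 3).

```python
from itertools import combinations


def window_cs(target: str, start: int = 3, end: int = 9):
    """1-based positions of C within the deamination window start..end."""
    return [p for p in range(start, end + 1) if target[p - 1].upper() == "C"]


def edit_products(target: str, start: int = 3, end: int = 9):
    """Every sequence reachable by turning a non-empty subset of window Cs into T."""
    sites = window_cs(target, start, end)
    products = []
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            edited = list(target)
            for p in subset:
                edited[p - 1] = "T"
            products.append("".join(edited))
    return products


# HYPOTHETICAL 23-nt target (20-nt protospacer + AGG PAM), Cs at 3, 4, 7, 8.
example = "GTCCATCCGAAAGTTCAGTCAGG"
print(window_cs(example))           # [3, 4, 7, 8]
print(len(edit_products(example)))  # 15
```

Cs outside the window (here at positions 16 and 20) are left untouched by this model.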
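In the method of the invention, the base editing system can be introduced into plants by various methods well known to those skilled in the art. Methods usable for introducing the base editing system of the invention into plants include, but are not limited to: biolistic (gene gun) transformation, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method and ovary injection.
In the method of the invention, modification of the target sequence can be achieved simply by introducing or producing the base editing fusion protein and guide RNA in plant cells, and the modification can be stably inherited without stably transforming the plant with the base editing system. This avoids potential off-target effects of a stably present base editing system and also avoids integration of exogenous nucleotide sequences into the plant genome, providing greater biosafety.
In some preferred embodiments, the introduction is performed in the absence of selection pressure, thereby avoiding integration of exogenous nucleotide sequences into the plant genome.
In some embodiments, the introduction comprises transforming the base editing system of the invention into isolated plant cells or tissues and then regenerating the transformed plant cells or tissues into whole plants. Preferably, the regeneration is performed in the absence of selection pressure, i.e., no selection agent against the selectable marker gene carried on the expression vector is used during tissue culture. Omitting the selection agent can improve plant regeneration efficiency and yield modified plants free of exogenous nucleotide sequences.
In other embodiments, the base editing system of the invention may be transformed into a specific part of an intact plant, such as a leaf, shoot tip, pollen tube, young ear or hypocotyl. This is particularly suitable for plants that are difficult to regenerate through tissue culture.
In some embodiments of the invention, in vitro expressed protein and/or in vitro transcribed RNA molecules are transformed directly into the plant. The protein and/or RNA molecules carry out base editing in the plant cells and are subsequently degraded by the cells, avoiding integration of exogenous nucleotide sequences into the plant genome.
Plants that can be base-edited by the method of the invention include monocotyledonous and dicotyledonous plants. For example, the plant may be a crop plant such as wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.
In some embodiments of the invention, the target sequence is associated with a plant trait, such as an agronomic trait, whereby the base editing results in the plant having an altered trait relative to a wild-type plant.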
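In the present invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, thereby modifying gene function or gene expression. Accordingly, in some embodiments of the invention, the C-to-T substitution results in an amino acid substitution in the target protein or in truncation of the target protein (by creating a stop codon). In other embodiments of the invention, the C-to-T substitution results in a change in expression of the target gene.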
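Which codons can be turned into a stop codon by C-to-T editing follows directly from the genetic code. The Python sketch below enumerates them for two cases: C-to-T on the coding strand, and C-to-T on the template strand, which appears as G-to-A on the coding strand. Function names are hypothetical; whether a given codon actually falls inside a usable deamination window still depends on the available PAM sites.

```python
from itertools import combinations, product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def conversions(codon: str, src: str, dst: str):
    """Codons reachable by converting any non-empty subset of `src` bases to `dst`."""
    sites = [i for i, base in enumerate(codon) if base == src]
    reachable = set()
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            bases = list(codon)
            for i in subset:
                bases[i] = dst
            reachable.add("".join(bases))
    return reachable


def stop_creatable(src: str, dst: str):
    """Sense codons (coding strand) that can become a stop codon via src->dst."""
    codons = ("".join(c) for c in product("ACGT", repeat=3))
    return sorted(c for c in codons
                  if c not in STOP_CODONS and conversions(c, src, dst) & STOP_CODONS)


print(stop_creatable("C", "T"))  # coding-strand C->T: ['CAA', 'CAG', 'CGA']
print(stop_creatable("G", "A"))  # template-strand C->T: ['TGG']
```

In protein terms, C-to-T editing can truncate a protein only at Gln (CAA, CAG), Arg (CGA) or Trp (TGG) codons.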
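In some specific embodiments, the genes modified by the method of the invention may be wheat LOX2; rice CDC48, NRT1.1B and SPL14; and maize CENH3 and ALS.
In some embodiments of the invention, the method further comprises obtaining progeny of the genetically modified plant.
In another aspect, the present invention also provides a genetically modified plant, or progeny or a part thereof, wherein the plant is obtained by the method of the invention described above.
In another aspect, the present invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the method of the invention described above with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.
IV. Method for producing haploid maize plants and uses thereof
CENH3 encodes the centromeric histone, which is required for normal centromere function in animals and plants. TILLING studies have shown that amino acid substitutions at several residues in the C-terminal region of Arabidopsis CENH3 (AtCENH3) can induce haploids, which is highly advantageous for accelerating crop breeding. Substitution of the highly conserved leucine residues in the CENP-A targeting domain (CATD, Fig. 2d) of plant CENH3 proteins with phenylalanine (F) causes haploid induction in Arabidopsis, but not in barley, even though loading of CENH3 onto centromeres is impaired there. The inventors found that mutating the CATD domain of ZmCENH3 by the base editing method of the invention, in particular mutating the conserved leucine residues, can be used to induce haploids in maize and can be widely applied in maize variety improvement.
The present invention provides a method for producing a maize haploid inducer line, comprising modifying the ZmCENH3 gene in a maize plant by the base editing method of the invention, resulting in one or more amino acid substitutions in the CATD domain of ZmCENH3, wherein the amino acid substitutions confer haploid-inducing ability on the maize plant.
In a specific embodiment, the modification results in one or more amino acid substitutions in the conserved motif alanine-leucine-leucine (ALL) at positions 109-111 of the ZmCENH3 of SEQ ID NO: 25. For example, the ALL motif is modified to a single substitution: AFL, VLL or ALF; a double substitution: AFF, VFL or VLF; or a triple substitution: VFF.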
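In some specific embodiments, the target sequence in ZmCENH3 modified by the base editing method of the invention is AGCCCTCCTTGCGCTGCAAGAGG, in which the PAM sequence (underlined in the original) is the 3'-terminal AGG.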
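The single, double and triple substitutions listed for the ALL motif can be reproduced from this target sequence with a short Python sketch. It assumes, as our own reading of the sequence rather than a statement in the text, that the A-L-L codons are GCC CTC CTT at positions 2-10 of the target, and it applies every combination of C-to-T edits within positions 3-9.

```python
from itertools import combinations

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
               for i, b1 in enumerate(BASES)
               for j, b2 in enumerate(BASES)
               for k, b3 in enumerate(BASES)}

TARGET = "AGCCCTCCTTGCGCTGCAAGAGG"  # ZmCENH3 target from the text; AGG = PAM
FRAME_START = 2  # ASSUMPTION: the A-L-L codons (GCC CTC CTT) start at position 2
N_CODONS = 3


def motif(seq: str) -> str:
    """Translate the three codons assumed to encode the ALL motif."""
    cds = seq[FRAME_START - 1:FRAME_START - 1 + 3 * N_CODONS]
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


window = [p for p in range(3, 10) if TARGET[p - 1] == "C"]  # window: positions 3-9
outcomes = set()
for k in range(1, len(window) + 1):
    for subset in combinations(window, k):
        edited = list(TARGET)
        for p in subset:
            edited[p - 1] = "T"
        outcomes.add(motif("".join(edited)))

print(motif(TARGET))                       # ALL
print(sorted(outcomes - {motif(TARGET)}))  # ['AFF', 'AFL', 'ALF', 'VFF', 'VFL', 'VLF', 'VLL']
```

Under that assumption, C3 changes A to V, C5 changes the first L to F and C8 changes the second L to F, while C4 and C7 are silent; the seven non-wild-type outcomes are exactly the variants named above.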
In some embodiments, the maize plant is the Zong31 variety. In some embodiments, the maize plant is the HiII variety.
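The present invention provides a method for producing maize haploids. The method comprises crossing a maize haploid-inducer line obtained by the method of the present invention with a wild-type maize plant and harvesting the hybrid progeny to obtain haploid maize plants. In some specific embodiments, the haploid-inducer line is used as the male parent and the wild-type maize plant as the female parent.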
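The present invention also covers maize haploid-inducer lines and maize haploids obtained by the method of the present invention, and their use in maize breeding.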
V. Method for producing herbicide-resistant maize plants
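The present invention provides a method for producing a herbicide-resistant maize plant. The method comprises modifying the ZmALS gene (encoding acetolactate synthase) in a maize plant by the base editing method of the present invention, resulting in one or more amino acid substitutions in ZmALS, wherein the substitutions confer herbicide resistance on the maize plant.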
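In a specific embodiment, the modification simultaneously causes one or more amino acid substitutions in both ZmALS1 (SEQ ID NO:27) and ZmALS2 (SEQ ID NO:29). For example, residue 165 of both ZmALS1 and ZmALS2 is substituted.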
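In some specific embodiments, the target sequence used to modify ZmALS1 and ZmALS2 by the base editing method of the present invention is CAGGTGCCGCGACGCATGATTGG. The 3'-terminal TGG is the PAM sequence and is underlined in the original.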
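The likely amino acid effect of the C7-to-T7 edit reported in Example 4 can be explored with a small sketch. The reading frame is an assumption. If codon 1 starts at protospacer position 1, the first 18 nt translate to QVPRRM, and C7T converts the CCG proline codon into TCG (serine). The text reports a substitution at residue 165 but does not name the amino acids, so treat this as an illustration rather than a stated result. The helper names are illustrative.

```python
# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PROTOSPACER = "CAGGTGCCGCGACGCATGAT"  # ZmALS1/ZmALS2 target without the TGG PAM
FRAME_START = 1  # assumption: protospacer position 1 is the first base of a codon


def translate(seq: str) -> str:
    """Translate complete codons from the first base of seq."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def c_to_t_effect(protospacer: str, pos: int, frame_start: int = FRAME_START):
    """Codon and amino acid before and after a C->T change at 1-based position pos."""
    if protospacer[pos - 1] != "C":
        raise ValueError(f"position {pos} is not a C")
    codon_start = (pos - frame_start) // 3 * 3 + frame_start - 1  # 0-based index
    edited = protospacer[:pos - 1] + "T" + protospacer[pos:]
    ref = protospacer[codon_start:codon_start + 3]
    alt = edited[codon_start:codon_start + 3]
    return ref, CODON_TABLE[ref], alt, CODON_TABLE[alt]


print(translate(PROTOSPACER[:18]))    # QVPRRM
print(c_to_t_effect(PROTOSPACER, 7))  # ('CCG', 'P', 'TCG', 'S')
```

Editing the neighbouring C8 instead would give CTG (leucine) at the same codon under the same frame assumption.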
In some embodiments, the maize plant is the Zong31 variety. In some embodiments, the maize plant is the HiII variety.
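The present invention also provides a method for breeding herbicide-resistant maize plants. The method comprises crossing a first herbicide-resistant maize plant, obtained by the method described above, with a second plant, thereby introducing the herbicide resistance into the second plant.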
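The present invention also covers herbicide-resistant maize plants obtained by the method of the present invention, and their progeny.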
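A method for controlling undesired plants in a maize-growing area, comprising applying an ALS-inhibitor herbicide to the plants in the area, wherein the maize plants are herbicide-resistant maize plants obtained by the method of the present invention.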
Examples
Materials and methods
Construction of the pn/dCas9-PBE expression vectors
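The APOBEC1, XTEN, nCas9(D10A), dCas9 and UGI sequences were codon-optimized for wheat (SEQ ID NO:1-5) and ordered from GenScript (Nanjing). The full-length n/dCas9 fragments were amplified with the primer pair AflII-F (carrying an AflII restriction site) and MluI-R (carrying an MluI restriction site). The PCR products were digested with AflII and MluI and inserted into the pUC57-APOBEC1-XTEN-UGI vector (sequence shown in SEQ ID NO:10), which had been digested with the same two enzymes. This produced the fusion cloning vectors pUC57-APOBEC1-XTEN-n/dCas9-UGI. The APOBEC1-XTEN-n/dCas9-UGI fragments were then amplified with the primer pair BamHI-F and Bsp1047I-R and digested with BamHI and Bsp1047I. They were inserted into pUbi-GFP (sequence shown in SEQ ID NO:8) digested with the same two enzymes, producing the fusion expression vectors pn/dCas9-PBE.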
Construction of the sgRNA expression vectors
The sgRNA target sequences used in the experiments are listed in Table 1:
Table 1. Target genes and sgRNA target sequences
[Table 1: image not reproduced]
Italic C: a C to be mutated within the deamination window (positions 3-9). Bold: PAM sequence.
sgRNA expression vectors were constructed as previously described, based on pTaU6-sgRNA (Addgene ID 53062), pOsU3-sgRNA (Addgene ID 53063) or pZmU3-sgRNA (Addgene ID 53061) (Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951 (2014); Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013); Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J. Genet. Genomics 41, 63-68 (2014)). The vectors constructed were:
pTaU6-BFP-sgRNA, pOsU3-BFP-sgRNA, pZmU3-BFP-sgRNA, pTaU6-LOX2-S1-sgRNA, pTaU6-LOX2-S2-sgRNA, pTaU6-LOX2-S3-sgRNA, pOsU3-CDC48-sgRNA, pOsU3-NRT1.1-sgRNA, pOsU3-SPL14-sgRNA and pZmU3-CENH3-sgRNA.
BFP and GFP expression vectors
pUbi-BFPm: the vector sequence is shown in SEQ ID NO:9. The amino acid sequence of the expressed BFP is shown in SEQ ID NO:17.
pUbi-GFP: the vector sequence is shown in SEQ ID NO:8. The amino acid sequence of the expressed GFP is shown in SEQ ID NO:16.
Construction of the pAG-n/dCas9-PBE-CDC48-sgRNA expression vectors
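The APOBEC1-XTEN-n/dCas9-UGI fragments were fused by Gibson cloning, using the Gibson-F and Gibson-R primers, into pHUE411 (Addgene #62203) digested with StuI and SacI. This generated the vectors pHUE411-APOBEC1-XTEN-n/dCas9-UGI, which lack an sgRNA target site. Paired oligonucleotides containing the OsCDC48 target sequence were synthesized, annealed and cloned into BsaI-digested pHUE411, yielding pHUE411-sgRNA-CDC48. The PmeI-AvrII fragment of pHUE411-sgRNA-CDC48 was then inserted into pHUE411-APOBEC1-XTEN-n/dCas9-UGI digested with the same enzymes. This produced the Agrobacterium transformation vectors pAG-n/dCas9-PBE-CDC48-sgRNA.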
Protoplast assays
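This study used the wheat cultivar Bobwhite, the rice cultivar Nipponbare and the maize inbred line Zong31. Protoplasts were transformed as described below, with an average transformation efficiency of 55-70%. Each plasmid (10 μg) was introduced by PEG-mediated transformation. After 48 hours, the protoplasts were collected and DNA was extracted for T7EI and PCR/RE assays.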
Preparation and transformation of wheat (maize) protoplasts
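1) Take young wheat (maize) leaves and cut the middle portion into strips 0.5-1 mm wide. Place the strips in 0.6 M mannitol in the dark for 10 minutes, then filter through a mesh. Digest in 50 ml enzyme solution at 20-25℃ in the dark with gentle shaking at 10 rpm for 5 hours.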
2) Add 10 ml W5 to dilute the digest, then filter it through a 75 μm nylon membrane into a 50 ml round-bottom centrifuge tube.
3) Centrifuge at 100 g and 23℃ for 3 min, then discard the supernatant.
4) Gently resuspend in 10 ml W5 and leave on ice for 30 min to let the protoplasts settle, then discard the supernatant.
5) Resuspend in an appropriate volume of MMG and keep on ice until transformation.
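6) In a 2 ml centrifuge tube, combine 10-20 μg plasmid, 200 μl protoplasts (about 4 × 10⁵ cells) and 220 μl freshly prepared PEG solution. Mix, then leave at room temperature in the dark for 10-20 minutes to induce transformation.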
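7) After transformation, slowly add 880 μl W5 solution and mix by gentle inversion. Centrifuge at 100 g for 3 min in a swinging-bucket rotor and remove the supernatant.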
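8) Resuspend in 2 ml W5 solution, transfer to a six-well plate and culture in the dark at room temperature (or 25℃). For protoplast genomic DNA extraction, culture for 48 h.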
Preparation and transformation of rice protoplasts:
1) Isolate protoplasts from seedling leaf sheaths: cut the sheaths into strips about 0.5 mm wide with a sharp blade.
2) Immediately after cutting, transfer the strips into 0.6 M mannitol and keep in the dark for 10 min.
3) Filter off the mannitol, transfer the strips into enzyme solution and vacuum-infiltrate in the dark for 30 min.
4) Digest in the dark for 5-6 h with gentle shaking (decolorizing shaker at speed setting 10).
5) After digestion, add an equal volume of W5 and shake horizontally for 10 s to release the protoplasts.
6) Filter the protoplasts through a 40 μm nylon membrane into a 50 ml round-bottom centrifuge tube, rinsing with W5 solution.
7) Pellet the protoplasts at 250 g for 3 min in a swinging-bucket rotor and remove the supernatant.
8) Resuspend the protoplasts in 10 ml W5, centrifuge at 250 g for 3 min and discard the supernatant.
9) Resuspend the protoplasts in MMG solution to a density of 2 × 10⁶/ml.
Note: all of the above steps are performed at room temperature.
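10) In a 2 ml centrifuge tube, combine 10-20 μg plasmid, 200 μl protoplasts (about 4 × 10⁵ cells) and 220 μl freshly prepared PEG solution. Mix, then leave at room temperature in the dark for 10-20 minutes to induce transformation.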
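11) After transformation, slowly add 880 μl W5 solution and mix by gentle inversion. Centrifuge at 250 g for 3 min in a swinging-bucket rotor and remove the supernatant.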
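12) Resuspend in 2 ml WI solution, transfer to a six-well plate and culture in the dark at room temperature (or 25℃). For protoplast genomic DNA extraction, culture for 48 h.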
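As a quick consistency check, the cell number quoted in step 10 follows from the density set in step 9. The second line gives the approximate volume before the final spin in step 11, ignoring the small volume of the plasmid solution. The wheat/maize protocol quotes the same cell number per 200 μl, although it does not state the density.

```latex
\begin{aligned}
N_{\text{cells}} &= 0.2\ \text{ml} \times 2\times10^{6}\ \text{cells ml}^{-1} = 4\times10^{5}\ \text{cells} \\
V_{\text{before spin}} &\approx 200\ \mu\text{l} + 220\ \mu\text{l} + 880\ \mu\text{l} = 1300\ \mu\text{l}
\end{aligned}
```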
Biolistic transformation of DNA constructs into wheat callus cells
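Plasmid DNA (pnCas9-PBE and pTaU6-LOX2-S1-sgRNA) was used to bombard Bobwhite immature embryos. Biolistic transformation was performed as previously described (Zhang, K., Liu, J., Zhang, Y., Yang, Z. & Gao, C. Biolistic genetic transformation of a wide range of Chinese elite wheat (Triticum aestivum L.) varieties. J. Genet. Genomics 42, 39-42 (2015)). After bombardment, the embryos were processed as described in that reference, except that no selective agent was used during tissue culture.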
Agrobacterium-mediated transformation of pAG-n/dCas9-PBE-CDC48-sgRNA into rice callus cells
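The pAG-n/dCas9-PBE-CDC48-sgRNA binary vectors were introduced into Agrobacterium strain AGL1 by electroporation. Agrobacterium-mediated transformation, tissue culture and regeneration of the rice cultivar Nipponbare followed Shan et al. (Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013)). Hygromycin selection (50 μg/ml) was applied throughout the subsequent tissue culture.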
Identification of mutations by T7EI and PCR/RE assays
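For rice, DNA was extracted from individual plants, screened for mutations by the T7EI assay and confirmed by Sanger sequencing. For wheat, to save cost and labor, 3-4 randomly chosen plants were pooled into a group and screened by T7EI and PCR/RE. When a group tested positive, every plant in it was tested individually by T7EI and PCR/RE and then confirmed by Sanger sequencing.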
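The wheat pooling strategy is a two-stage group test, which can be sketched as follows. The sketch assumes the assay still detects one mutant plant mixed with up to three wild-type plants. This is a simplification, since dilution in a pool can lower sensitivity. The function name and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def pooled_screen(
    plants: Sequence[str],
    is_positive: Callable[[Sequence[str]], bool],
    pool_size: int = 4,
) -> Tuple[List[str], int]:
    """Two-stage pooled screen.

    Stage 1: assay each pool of `pool_size` plants once.
    Stage 2: assay every member of a positive pool individually.
    Returns the positive plants and the total number of assays run.
    """
    positives: List[str] = []
    assays = 0
    for start in range(0, len(plants), pool_size):
        pool = plants[start:start + pool_size]
        assays += 1
        if is_positive(pool):
            for plant in pool:
                assays += 1
                if is_positive([plant]):
                    positives.append(plant)
    return positives, assays


# Toy example: 12 regenerated plants, one of which (p5) carries the edit
plants = [f"p{i}" for i in range(1, 13)]
mutants = {"p5"}
found, n_assays = pooled_screen(plants, lambda group: any(p in mutants for p in group))
print(found, n_assays)  # ['p5'] 7  (3 pool assays + 4 individual assays, versus 12)
```

Pooling saves work when mutants are rare, as in the wheat experiment (2/160). With many positives, the second stage erodes the savings.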
T7EI assay:
1) Extract plant genomic DNA, amplify the target by PCR and check the product by electrophoresis.
2) Add T7EI buffer to the PCR product as follows:
10× T7EI buffer 1.1 μl
PCR product 5 μl
ddH₂O 4.4 μl
3) Heat at 95℃ for 5 min in a PCR machine, then remove and cool to room temperature so that the PCR products re-anneal to form heteroduplex DNA.
4) Add T7EI endonuclease as follows:
Annealed product from step 3) 10.5 μl
T7EI (5 U/μl) 0.5 μl
Total volume 11 μl
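5) Digest at 37℃ for 1 h and run the entire 11 μl on an agarose gel. Cleavage of the PCR product indicates the presence of a mutation, such as an indel.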
PCR/RE assay:
1) Extract plant genomic DNA.
2) Using synthesized gene-specific primers, amplify a 350-1000 bp fragment containing the target site:
10× EasyTaq Buffer 5 μl
dNTPs (2.5 mM) 4 μl
Forward primer (10 μM) 2 μl
Reverse primer (10 μM) 2 μl
EasyTaq 0.5 μl
DNA 2 μl
ddH₂O to 50 μl
3) Typical cycling conditions: 94℃ for 5 min; 30-35 cycles of 94℃ for 30 s, 58℃ for 30 s and 72℃ for 30 s; 72℃ for 5 min; hold at 12℃. Check 5 μl of the PCR product by electrophoresis.
4) Digest the PCR product with the restriction enzyme; a typical reaction is:
10× FastDigest Buffer 2 μl
Restriction enzyme 1 μl
PCR product 3-5 μl
ddH₂O to 20 μl
5) Digest at 37℃ for 2-3 h and analyze on a 1.2% agarose gel.
6) Recover and purify the uncleaved (mutant) band and perform TA cloning as follows:
pEasy-T Vector 1 μl
Recovered uncleaved PCR product 4 μl
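7) Ligate at 22℃ for 10 min and transform E. coli competent cells. Plate on LB agar containing Amp100 (ampicillin), IPTG and X-gal and incubate for 12-16 h. Pick white colonies, identify positive clones and send them for sequencing.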
Deep sequencing
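Each sgRNA expression vector was co-transformed with pnCas9-PBE, pdCas9-PBE or pwCas9 into wheat, rice or maize protoplasts. After 48 hours, the protoplasts were collected and DNA was extracted for deep sequencing. In the first round of PCR, the target regions were amplified with site-specific primers (Table 5). In the second round, forward and reverse barcodes were added to the ends of the PCR products for library construction (Table 5). Equal amounts of the different PCR products were pooled, and the samples were sequenced on an Illumina HiSeq 4000 at the Beijing Genomics Institute.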
Example 1. Base editing of BFP by nCas9-PBE and dCas9-PBE in plant protoplasts
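From N-terminus to C-terminus, the nCas9-PBE fusion protein consists of an NLS (SEQ ID NO:30), APOBEC1 (SEQ ID NO:11), the XTEN linker (SEQ ID NO:12), a Cas9 nickase (nCas9, SEQ ID NO:13), uracil DNA glycosylase inhibitor (UGI, SEQ ID NO:15) and a second NLS (SEQ ID NO:31). The dCas9-PBE fusion protein consists, in the same order, of an NLS, APOBEC1, the XTEN linker, catalytically inactive Cas9 (dCas9, SEQ ID NO:14), UGI and an NLS. The fusion-protein coding sequences were codon-optimized for efficient expression in cereal crops. In the plasmid constructs pnCas9-PBE and pdCas9-PBE they were placed downstream of Ubi-1, the promoter of the maize Ubiquitin-1 gene (Fig. 1a).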
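Because these domains are expressed as a single translational fusion, each coding segment must preserve the reading frame. The sketch below checks the segment lengths given in the sequence listing. NLS coding sequences and any codons added at the AflII/MluI/BamHI cloning junctions are not given in this text and are left out. The helper is illustrative only.

```python
from typing import Dict, List, Tuple

# Coding-sequence lengths (nt) taken from the sequence listing
SEGMENTS: List[Tuple[str, int]] = [
    ("APOBEC1", 681),       # SEQ ID NO:1
    ("XTEN linker", 48),    # SEQ ID NO:2
    ("nCas9(D10A)", 4152),  # SEQ ID NO:3 (dCas9, SEQ ID NO:4, has the same length)
    ("UGI", 249),           # SEQ ID NO:5
]


def check_in_frame(segments: List[Tuple[str, int]]) -> Dict[str, int]:
    """Return the codon count of each segment; raise if a length is not a multiple of 3."""
    codons: Dict[str, int] = {}
    for name, length in segments:
        if length % 3:
            raise ValueError(f"{name} ({length} nt) would shift the reading frame")
        codons[name] = length // 3
    return codons


codons = check_in_frame(SEGMENTS)
print(codons)
print(sum(codons.values()))  # 1710 codons, excluding NLSs and any cloning-site codons
```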
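The two constructs were compared for their ability to convert blue fluorescent protein (BFP) into green fluorescent protein (GFP) in wheat and rice protoplasts. The conversion requires changing the first nucleotide of codon 66 of the BFP coding sequence from C to T, which turns CAC (histidine) into TAC (tyrosine).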
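The target region of the BFP-specific sgRNA was designed to cover codons 65 to 71 plus the first two nucleotides of codon 72, with the last three bases (CAG) forming the protospacer adjacent motif (PAM) (Fig. 1b). The C to be mutated lies at position 4 of the target sequence. The sgRNAs were transcribed from the TaU6 and OsU3 promoters, respectively (Fig. 1b). The sgRNA expression vectors pTaU6-BFP-sgRNA and pOsU3-BFP-sgRNA were constructed as described above.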
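Because CAG is not an optimal PAM for CRISPR/Cas9, it was mutated to CGG. The resulting BFP sequence (BFPm) was cloned downstream of the Ubi-1 promoter, forming the expression construct pUbi-BFPm (Fig. 1b), which served as the target to be edited in protoplasts.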
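In rice protoplasts, the combination of pnCas9-PBE, pUbi-BFPm and pOsU3-BFP-sgRNA produced GFP expression in 5.8% of cells. Replacing pnCas9-PBE with pdCas9-PBE gave only 0.5% GFP-expressing cells. Without either pnCas9-PBE or pdCas9-PBE, no GFP-expressing cells were found. In the parallel positive control, cells transformed with the GFP expression construct pUbi-GFP, 58.4% of cells expressed GFP (Fig. 1c).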
In wheat protoplasts, pnCas9-PBE likewise produced more GFP-expressing cells (6.8%) than pdCas9-PBE (0.3%).
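Rice protoplasts were transformed with pnCas9-PBE, pUbi-BFPm and pOsU3-BFP-sgRNA; wheat and maize protoplasts received pTaU6-BFP-sgRNA or pZmU3-BFP-sgRNA instead. Deep amplicon sequencing of these protoplasts showed that about 4.00% of total DNA reads carried C-to-T mutations (Fig. 1d). Mutations occurred only at the Cs at positions 3, 4, 6 and 9 of the protospacer (target sequence), at frequencies of 2.48-3.92%, 3.06-3.79%, 5.86-8.75% and 6.47-7.86%, respectively (Fig. 1d). The desired C (position 4) was mutated, although not at the highest frequency. With pdCas9-PBE in place of pnCas9-PBE, the Cs at these positions were also mutated, but at markedly lower frequencies (0.06-0.22% versus 2.48-8.75%) (Fig. 1d).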
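Thus, the fluorescent reporter assay shows that both nCas9-PBE and dCas9-PBE can convert C to T in the target region in wheat, rice and maize protoplasts. The deamination window covers positions 3-9 of the protospacer (target sequence), and nCas9-PBE is more active than dCas9-PBE.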
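The edited positions reported above can be cross-checked against the sequence listing. The sketch takes nucleotides 196-215 of the BFPm coding sequence (SEQ ID NO:7) as the protospacer. This placement is our reading of "codons 65 to 71" and assumes that the conventional GFP codon numbering lags the codon index of this coding sequence by one; the text does not state it. The Cs inside positions 3-9 turn out to be exactly 3, 4, 6 and 9, the positions where mutations were observed. C4T converts CAC into the TAC found at the same place in the GFP sequence (SEQ ID NO:6).

```python
# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Nucleotides 181-240 of the coding sequences in SEQ ID NO:7 (BFPm) and SEQ ID NO:6 (GFP)
BFPM_181_240 = "ctcgtgaccaccttcacccacggcgtgcggtgcttcagccgctaccccgaccacatgaag".upper()
GFP_181_240 = "ctcgtgaccaccttcacctacggcgtgcagtgcttcagccgctaccccgaccacatgaag".upper()
OFFSET = 181      # coding-sequence coordinate of the first character above
PROTO_START = 196  # assumption: first base of conventional codon 65

start = PROTO_START - OFFSET
protospacer = BFPM_181_240[start:start + 20]

# Cs inside the deamination window (1-based positions 3..9)
window_cs = [p for p in range(3, 10) if protospacer[p - 1] == "C"]

# C4T: position 4 is the first base of codon 66 (protospacer positions 4-6)
edited = protospacer[:3] + "T" + protospacer[4:]

print(protospacer)  # ACCCACGGCGTGCGGTGCTT
print(window_cs)    # [3, 4, 6, 9]
print(protospacer[3:6], CODON_TABLE[protospacer[3:6]], "->", edited[3:6], CODON_TABLE[edited[3:6]])  # CAC H -> TAC Y
print(edited[3:6] == GFP_181_240[18:21])  # True: the edit restores the GFP codon
```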
Example 2. Base editing of endogenous genes by nCas9-PBE and dCas9-PBE in plant protoplasts
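This example further characterized the mutations produced by pnCas9-PBE and pdCas9-PBE in endogenous genes of wheat, rice and maize. As a control for indel induction, the construct pwCas9 (Addgene ID 53064), which expresses wild-type Cas9, was also used.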
Three sgRNA target sites (S1, S2 and S3) were designed in the wheat gene TaLOX2. One site each was designed in the rice genes OsCDC48, OsNRT1.1B and OsSPL14, and one in the maize gene ZmCENH3 (see Table 1).
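Each sgRNA expression construct was combined with pnCas9-PBE, pdCas9-PBE or pwCas9 and co-expressed in wheat, rice or maize protoplasts. Protoplast DNA was extracted, and PCR amplicons of the 7 targets were prepared and sequenced. Between 100,000 and 270,000 sequencing reads per sample were used for detailed analysis of the mutations.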
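With pnCas9-PBE, C-to-T conversions were observed at all 7 targets, within a deamination window covering positions 3-9 of the protospacer (target sequence) (Fig. 2a). Single C-to-T substitution frequencies ranged from 0.57% to 7.07%. The C at or near position 7 had the highest substitution frequency, and editing was independent of sequence context (Fig. 2a). Frequencies of multiple-C editing (2 to 5 Cs) ranged from 0.31% to 12.48%, with simultaneous editing of two or three Cs being most frequent (Fig. 2b). pwCas9 induced indels at 6.27%-11.68%, whereas the indel frequencies induced by pnCas9-PBE at the 7 sites were very low (0.01%-0.34%) (Fig. 2c and Table 3).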
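The read-level frequencies above are ratios of mutant reads to total reads, as defined in the footnote to Table 2. A simplified per-position tally could look like the sketch below. It assumes the reads are already trimmed to the 20-nt protospacer, counts any length change as an indel, and compares equal-length reads base by base without alignment. Real amplicon pipelines align reads first. The function name and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_reads(reference: str, reads: List[str], window=range(3, 10)) -> Tuple[Dict[int, float], float]:
    """Per-position C->T frequency (%) for window Cs, and indel frequency (%).

    Frequencies are mutant reads / total reads. A read whose length differs from
    the reference is counted as an indel; equal-length reads are compared base by base.
    """
    total = len(reads)
    c_to_t: Counter = Counter()
    indels = 0
    for read in reads:
        if len(read) != len(reference):
            indels += 1
            continue
        for pos in window:
            if reference[pos - 1] == "C" and read[pos - 1] == "T":
                c_to_t[pos] += 1
    freqs = {pos: 100.0 * c_to_t[pos] / total for pos in window if reference[pos - 1] == "C"}
    return freqs, 100.0 * indels / total


# Toy data: 100 reads over a hypothetical 20-nt target with window Cs at 3, 4, 6 and 8
ref = "GACCTCACGATGGTACAGTT"
reads = ([ref] * 90
         + [ref[:3] + "T" + ref[4:]] * 5   # C4T
         + [ref[:7] + "T" + ref[8:]] * 4   # C8T
         + [ref[:10] + ref[11:]])          # 1-nt deletion
freqs, indel_pct = summarize_reads(ref, reads)
print(freqs, indel_pct)  # {3: 0.0, 4: 5.0, 6: 0.0, 8: 4.0} 1.0
```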
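By contrast, pdCas9-PBE produced only low frequencies of single-C editing (<0.96%) or multiple-C editing (<1.29%) at the 7 target sites (Fig. 2a and Table 2). Its indel frequency (<0.06%) was comparable to that of pnCas9-PBE (Fig. 2c and Table 3). These results confirm the superiority of nCas9-PBE over dCas9-PBE seen in the earlier reporter assay. They also confirm the size of the deamination window in cereal plant cells, a 7-nucleotide window at positions 3-9. The high efficiency of nCas9-PBE in editing single and multiple Cs, independently of the sequence context of the target site, was also confirmed.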
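Using ZmCENH3 as a representative, we analyzed the amino acid changes caused by nCas9-PBE in the targeted region. As shown in Fig. 2d, in wild-type ZmCENH3 the conserved residue is the middle leucine of an alanine-leucine-leucine stretch (ALL, residues 109-111). In amplicons of the ZmCENH3 target site from nCas9-PBE-treated protoplasts, A109V, L110F and/or L111F substitutions were readily identified (Fig. 2d). Because nCas9-PBE can edit both single and multiple Cs, it produced 7 different substitution types in ALL (Fig. 2d): single substitutions of each residue (AFL, VLL and ALF), double substitutions (AFF, VFL and VLF) and a triple substitution of all three residues (VFF). The AFL-type substitution has been studied in Arabidopsis and barley, but the other single, double and triple substitutions have not been reported before.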
Table 2. Efficiency of multiple-C substitutions induced by dCas9-PBE
[Table 2: image not reproduced]
"--" indicates none;
a Number of mutant reads relative to total reads.
Table 3. Indel induction efficiency (%)
Target nCas9-PBE dCas9-PBE wCas9 Untreated
TaLOX2-S1 0.01 0.01 11.68 0.01
TaLOX2-S2 0.03 0.02 6.92 0.02
TaLOX2-S3 0.04 0.03 7.10 0.06
OsCDC48 0.22 0.13 9.04 0.02
OsNRT1.1B 0.06 0.05 7.12 0.03
OsSPL14 0.08 0.14 7.34 0.04
ZmCENH3 0.02 0.01 6.27 0.01
Example 3. Analysis of mutant plants
This example further examined the functional differences between nCas9-PBE and dCas9-PBE, and verified the properties of nCas9-PBE by analyzing mutant plants.
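For wheat transformation, pnCas9-PBE and pTaU6-sgRNA-LOX2-s1 (Fig. 3a) were co-delivered by particle bombardment into the bread wheat cultivar Bobwhite, and plants were regenerated without herbicide selection. The sgRNA-LOX2-s1 target site was amplified from genomic DNA of the regenerated plants with specific primers (Table 4). T7EI and restriction enzyme digestion (PCR-RE) assays identified two plants carrying nucleotide substitutions among 160 immature embryos, a mutation efficiency of 1.25% (2/160) (Fig. 3b). Sanger sequencing confirmed that mutant T0-3 carried three C-to-T substitutions, at positions 3, 6 and 9 of the protospacer, while mutant T0-7 carried a single C-to-T mutation at position 3 (Fig. 3b). No indels were detected in either mutant.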
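For rice transformation, the japonica cultivar Nipponbare was used because it is highly amenable to genetic manipulation and its whole-genome sequence is available. The target was OsCDC48, a gene found to regulate senescence and cell death in rice (Fig. 3c and Table 1). The pAG-n/dCas9-PBE-CDC48-sgRNA binary vectors were transformed into rice by Agrobacterium-mediated gene transfer.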
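For nCas9-PBE and dCas9-PBE, 92 and 87 independent transgenic T0 plants were obtained, respectively. T7EI analysis and Sanger sequencing showed that 40 of the 92 nCas9-PBE plants carried at least one C-to-T substitution in the OsCDC48 target region, a mutant frequency of 43.48% (40/92) (Fig. 3d). Representative sequencing results are shown in Fig. 4. In these 40 mutants, mutations were found only at the Cs at positions 3, 4, 7 and 8 of the target region, and 7 types of point mutation were identified (Fig. 3e). These comprised four single-nucleotide mutations (C3T, C4T, C7T and C8T), two dinucleotide mutations (C3C4 to T3T4 and C7C8 to T7T8) and one trinucleotide mutation (C3C4C7 to T3T4T7) (Fig. 3e). Thus, the deamination window of nCas9-PBE in these rice mutants covers 7 nucleotides (positions 3-9). The mutation frequency of individual Cs ranged from 5.00% (C3, 2/40) to 32.50% (C7, 13/40), the C at position 7 being edited most often. No indels were found in the target regions of the 40 mutants. Surprisingly, T7EI assays and Sanger sequencing detected no point mutations or indels in the OsCDC48 target region of any of the 87 dCas9-PBE plants.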
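The efficiencies in Examples 3 and 4 are simple count ratios. A small helper reproduces the figures quoted in the text; the helper name is illustrative.

```python
def percent(count: int, total: int, digits: int = 2) -> float:
    """Percentage of count in total, rounded to `digits` decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, digits)


# Ratios quoted in this example and in the wheat experiment above
reported = {
    "rice nCas9-PBE mutants (40/92)": (40, 92),
    "C3 among rice mutants (2/40)": (2, 40),
    "C7 among rice mutants (13/40)": (13, 40),
    "wheat mutants (2/160)": (2, 160),
}
for label, (k, n) in reported.items():
    print(f"{label}: {percent(k, n):.2f}%")
# rice nCas9-PBE mutants (40/92): 43.48%
# C3 among rice mutants (2/40): 5.00%
# C7 among rice mutants (13/40): 32.50%
# wheat mutants (2/160): 1.25%
```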
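Potential off-target effects of the base editing method were examined in the 40 nCas9-PBE mutants. Five potential off-target sites were found in the Nipponbare genome sequence, each with three nucleotide mismatches to the OsCDC48 sgRNA target (Table 4). T7EI assays and Sanger sequencing of amplicons from these 5 sites detected no point mutations or indels, indicating that base editing by nCas9-PBE is highly specific.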
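A search for candidate off-target sites, such as genomic sites with up to three mismatches to the sgRNA target, can be sketched as a brute-force scan of both strands for 20-nt sequences followed by an NGG PAM. This is a simplified stand-in for dedicated off-target prediction tools and is not the procedure used here, which the text does not describe. It counts substitutions only and ignores bulges. The function name and the toy genome are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_sites(genome: str, protospacer: str, max_mismatches: int = 3) -> List[Tuple[str, int, str, int]]:
    """Return (strand, index, site+PAM, mismatches) for each 20-nt site followed by NGG.

    Indexes on the '-' strand refer to positions in the reverse-complemented genome.
    Only substitutions are counted; DNA/RNA bulges are not modelled.
    """
    n = len(protospacer)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - n - 2):
            site, pam = seq[i:i + n], seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue
            mismatches = sum(a != b for a, b in zip(site, protospacer))
            if mismatches <= max_mismatches:
                hits.append((strand, i, site + pam, mismatches))
    return hits


# Toy genome: the on-target site, a 3-mismatch site and a 4-mismatch site
P = "GATTACAGATTACAGATTAC"
genome = ("TTTTT" + P + "AGG" + "TTTTT"
          + "TTCTACAGATTACAGATTAC" + "AGG" + "TTTTT"
          + "TTCAACAGATTACAGATTAC" + "AGG" + "TTTTT")
hits = find_candidate_sites(genome, P)
print([(strand, i, mm) for strand, i, _, mm in hits])  # [('+', 5, 0), ('+', 33, 3)]
```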
Table 4. Analysis of potential OsCDC48 off-target effects
[Table 4: image not reproduced]
Lowercase bases are mismatches relative to the OsCDC48 target. Bold letters indicate the PAM sequence.
Example 4. Base editing of the ALS genes in maize plants
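Using AtALS (AT3G48560.1) as the seed sequence, a BLASTN search of the maize database (https://phytozome.jgi.doe.gov) identified two ZmALS homologs. These were named ZmALS1 (Locus Name: GRMZM2G143357) and ZmALS2 (Locus Name: GRMZM2G143008). ZmALS1 and ZmALS2 share 93.84% sequence similarity. A common sgRNA target sequence, CAGGTGCCGCGACGCATGATTGG, was designed in the conserved region (Svitashev et al., Plant Physiol., 2015) of ZmALS1 (SEQ ID NO:26) and ZmALS2 (SEQ ID NO:28). The base editing system was constructed as described above and delivered into the maize variety Zong31 by particle bombardment. PCR/RE analysis identified plants in which C7 had been converted to T7 in both ZmALS1 and ZmALS2 (Fig. 5).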
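The data above show that the modified and optimized nCas9-PBE induces efficient and specific C-to-T base editing in wheat, rice and maize cells. The point mutations it produces also differ in character from those obtained by TILLING. With appropriately designed sgRNAs, nCas9-PBE can introduce point mutations relatively easily at a chosen residue and at its neighbors. This makes it possible to analyze the effects of mutating amino acids in key protein domains singly or in combination. Point mutations identified by TILLING, by contrast, usually affect a single amino acid residue, and simultaneous mutations of a target residue and its neighbors are difficult to obtain by TILLING. nCas9-PBE therefore has a clear advantage for rapidly generating multiple mutants and for detailed functional analysis of one or more amino acids in key protein domains. Its key features are a relatively large deamination window (7 bases of the protospacer/target sequence), editing that does not depend on the sequence context of the target region, and almost no indel formation. The larger deamination window of nCas9-PBE in cereal plants is advantageous for generating more diverse mutations.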
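The present invention demonstrates that C-to-T base editing mediated by Cas9 variant-cytidine deaminase fusion proteins is an efficient tool for creating targeted point mutations in plant genomes, increasing the efficiency of crop improvement by genome engineering.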
Table 5. Primers used in the Examples
[Table 5: images not reproduced]
Sequence Listing
<110> Institute of Genetics and Developmental Biology, Chinese Academy of Sciences
<120> Plant base editing method
<130> TC1499
<160> 31
<170> PatentIn version 3.5
<210> 1
<211> 681
<212> DNA
<213> Artificial Sequence
<220>
<223> APOBEC1 encoding sequence
<400> 1
tcatcggaga ccggccctgt tgctgttgac cccaccctgc ggcggagaat cgagccacac 60
gagttcgagg tgttcttcga cccaagggag ctccgcaagg agacgtgcct cctgtacgag 120
atcaactggg gcggcaggca ctccatctgg aggcacacca gccaaaacac caacaagcac 180
gtggaggtca acttcatcga gaagttcacc accgagaggt acttctgccc aaacacccgc 240
tgctccatca cctggttcct gtcctggagc ccatgcggcg agtgctccag ggccatcacc 300
gagttcctca gccgctaccc acacgtcacc ctgttcatct acatcgccag gctctaccac 360
cacgccgacc caaggaacag gcagggcctc cgcgacctga tctccagcgg cgtgaccatc 420
caaatcatga ccgagcagga gtccggctac tgctggagga acttcgtcaa ctactcccca 480
agcaacgagg cccactggcc aaggtaccca cacctctggg tgcgcctcta cgtgctcgag 540
ctgtactgca tcatcctcgg cctgccacca tgcctcaaca tcctgaggcg caagcaacca 600
cagctgacct tcttcaccat cgccctccaa agctgccact accagaggct cccaccacac 660
atcctgtggg ctaccggcct c 681
<210> 2
<211> 48
<212> DNA
<213> Artificial Sequence
<220>
<223> XTEN encoding sequence
<400> 2
aagtccggca gcgagacgcc aggcacctcc gagagcgcta cgcctgaa 48
<210> 3
<211> 4152
<212> DNA
<213> Artificial Sequence
<220>
<223> nCas9(D10A) encoding sequence
<400> 3
atggacaaga agtactcgat cggcctcgcc atcgggacga actcagttgg ctgggccgtg 60
atcaccgacg agtacaaggt gccctctaag aagttcaagg tcctggggaa caccgaccgc 120
cattccatca agaagaacct catcggcgct ctcctgttcg acagcgggga gaccgctgag 180
gctacgaggc tcaagagaac cgctaggcgc cggtacacga gaaggaagaa caggatctgc 240
tacctccaag agattttctc caacgagatg gccaaggttg acgattcatt cttccaccgc 300
ctggaggagt ctttcctcgt ggaggaggat aagaagcacg agcggcatcc catcttcggc 360
aacatcgtgg acgaggttgc ctaccacgag aagtacccta cgatctacca tctgcggaag 420
aagctcgtgg actccaccga taaggcggac ctcagactga tctacctcgc tctggcccac 480
atgatcaagt tccgcggcca tttcctgatc gagggggatc tcaacccaga caacagcgat 540
gttgacaagc tgttcatcca actcgtgcag acctacaacc aactcttcga ggagaacccg 600
atcaacgcct ctggcgtgga cgcgaaggct atcctgtccg cgaggctctc gaagtccagg 660
aggctggaga acctgatcgc tcagctccca ggcgagaaga agaacggcct gttcgggaac 720
ctcatcgctc tcagcctggg gctcaccccg aacttcaagt cgaacttcga tctcgctgag 780
gacgccaagc tgcaactctc caaggacacc tacgacgatg acctcgataa cctcctggcc 840
cagatcggcg atcaatacgc ggacctgttc ctcgctgcca agaacctgtc ggacgccatc 900
ctcctgtcag atatcctccg cgtgaacacc gagatcacga aggctccact ctctgcctcc 960
atgatcaagc gctacgacga gcaccatcag gatctgaccc tcctgaaggc gctggtccgc 1020
caacagctcc cggagaagta caaggagatt ttcttcgatc agtcgaagaa cggctacgct 1080
gggtacatcg acggcggggc ctcacaagag gagttctaca agttcatcaa gccaatcctg 1140
gagaagatgg acggcacgga ggagctcctg gtgaagctca acagggagga cctcctgcgg 1200
aagcagagaa ccttcgataa cggcagcatc ccccaccaaa tccatctcgg ggagctgcac 1260
gccatcctga gaaggcaaga ggacttctac cctttcctca aggataaccg ggagaagatc 1320
gagaagatcc tgaccttcag aatcccatac tacgtcggcc ctctcgcgcg ggggaactca 1380
agattcgctt ggatgacccg caagtctgag gagaccatca cgccgtggaa cttcgaggag 1440
gtggtggaca agggcgctag cgctcagtcg ttcatcgaga ggatgaccaa cttcgacaag 1500
aacctgccca acgagaaggt gctccctaag cactcgctcc tgtacgagta cttcaccgtc 1560
tacaacgagc tcacgaaggt gaagtacgtc accgagggca tgcgcaagcc agcgttcctg 1620
tccggggagc agaagaaggc tatcgtggac ctcctgttca agaccaaccg gaaggtcacg 1680
gttaagcaac tcaaggagga ctacttcaag aagatcgagt gcttcgattc ggtcgagatc 1740
agcggcgttg aggaccgctt caacgccagc ctcgggacct accacgatct cctgaagatc 1800
atcaaggata aggacttcct ggacaacgag gagaacgagg atatcctgga ggacatcgtg 1860
ctgaccctca cgctgttcga ggacagggag atgatcgagg agcgcctgaa gacgtacgcc 1920
catctcttcg atgacaaggt catgaagcaa ctcaagcgcc ggagatacac cggctggggg 1980
aggctgtccc gcaagctcat caacggcatc cgggacaagc agtccgggaa gaccatcctc 2040
gacttcctca agagcgatgg cttcgccaac aggaacttca tgcaactgat ccacgatgac 2100
agcctcacct tcaaggagga tatccaaaag gctcaagtga gcggccaggg ggactcgctg 2160
cacgagcata tcgcgaacct cgctggctcc cccgcgatca agaagggcat cctccagacc 2220
gtgaaggttg tggacgagct cgtgaaggtc atgggccggc acaagcctga gaacatcgtc 2280
atcgagatgg ccagagagaa ccaaaccacg cagaaggggc aaaagaactc tagggagcgc 2340
atgaagcgca tcgaggaggg catcaaggag ctggggtccc aaatcctcaa ggagcaccca 2400
gtggagaaca cccaactgca gaacgagaag ctctacctgt actacctcca gaacggcagg 2460
gatatgtacg tggaccaaga gctggatatc aaccgcctca gcgattacga cgtcgatcat 2520
atcgttcccc agtctttcct gaaggatgac tccatcgaca acaaggtcct caccaggtcg 2580
gacaagaacc gcggcaagtc agataacgtt ccatctgagg aggtcgttaa gaagatgaag 2640
aactactgga ggcagctcct gaacgccaag ctgatcacgc aaaggaagtt cgacaacctc 2700
accaaggctg agagaggcgg gctctcagag ctggacaagg ccggcttcat caagcggcag 2760
ctggtcgaga ccagacaaat cacgaagcac gttgcgcaaa tcctcgactc tcggatgaac 2820
acgaagtacg atgagaacga caagctgatc agggaggtta aggtgatcac cctgaagtct 2880
aagctcgtct ccgacttcag gaaggatttc cagttctaca aggttcgcga gatcaacaac 2940
taccaccatg cccatgacgc ttacctcaac gctgtggtcg gcaccgctct gatcaagaag 3000
tacccaaagc tggagtccga gttcgtgtac ggggactaca aggtttacga tgtgcgcaag 3060
atgatcgcca agtcggagca agagatcggc aaggctaccg ccaagtactt cttctactca 3120
aacatcatga acttcttcaa gaccgagatc acgctggcca acggcgagat ccggaagaga 3180
ccgctcatcg agaccaacgg cgagacgggg gagatcgtgt gggacaaggg cagggatttc 3240
gcgaccgtcc gcaaggttct ctccatgccc caggtgaaca tcgtcaagaa gaccgaggtc 3300
caaacgggcg ggttctcaaa ggagtctatc ctgcctaagc ggaacagcga caagctcatc 3360
gccagaaaga aggactggga cccaaagaag tacggcgggt tcgacagccc taccgtggcc 3420
tactcggtcc tggttgtggc gaaggttgag aagggcaagt ccaagaagct caagagcgtg 3480
aaggagctcc tggggatcac catcatggag aggtccagct tcgagaagaa cccaatcgac 3540
ttcctggagg ccaagggcta caaggaggtg aagaaggacc tgatcatcaa gctcccgaag 3600
tactctctct tcgagctgga gaacggcagg aagagaatgc tggcttccgc tggcgagctc 3660
cagaagggga acgagctcgc gctgccaagc aagtacgtga acttcctcta cctggcttcc 3720
cactacgaga agctcaaggg cagcccggag gacaacgagc aaaagcagct gttcgtcgag 3780
cagcacaagc attacctcga cgagatcatc gagcaaatct ccgagttcag caagcgcgtg 3840
atcctcgccg acgcgaacct ggataaggtc ctctccgcct acaacaagca ccgggacaag 3900
cccatcagag agcaagcgga gaacatcatc catctcttca ccctgacgaa cctcggcgct 3960
cctgctgctt tcaagtactt cgacaccacg atcgatcgga agagatacac ctccacgaag 4020
gaggtcctgg acgcgaccct catccaccag tcgatcaccg gcctgtacga gacgaggatc 4080
gacctctcac aactcggcgg ggataagaga cccgcagcaa ccaagaaggc agggcaagca 4140
aagaagaaga ag 4152
<210> 4
<211> 4152
<212> DNA
<213> Artificial Sequence
<220>
<223> dCas9 encoding sequence
<400> 4
atggacaaga agtactcgat cggcctcgcc atcgggacga actcagttgg ctgggccgtg 60
atcaccgacg agtacaaggt gccctctaag aagttcaagg tcctggggaa caccgaccgc 120
cattccatca agaagaacct catcggcgct ctcctgttcg acagcgggga gaccgctgag 180
gctacgaggc tcaagagaac cgctaggcgc cggtacacga gaaggaagaa caggatctgc 240
tacctccaag agattttctc caacgagatg gccaaggttg acgattcatt cttccaccgc 300
ctggaggagt ctttcctcgt ggaggaggat aagaagcacg agcggcatcc catcttcggc 360
aacatcgtgg acgaggttgc ctaccacgag aagtacccta cgatctacca tctgcggaag 420
aagctcgtgg actccaccga taaggcggac ctcagactga tctacctcgc tctggcccac 480
atgatcaagt tccgcggcca tttcctgatc gagggggatc tcaacccaga caacagcgat 540
gttgacaagc tgttcatcca actcgtgcag acctacaacc aactcttcga ggagaacccg 600
atcaacgcct ctggcgtgga cgcgaaggct atcctgtccg cgaggctctc gaagtccagg 660
aggctggaga acctgatcgc tcagctccca ggcgagaaga agaacggcct gttcgggaac 720
ctcatcgctc tcagcctggg gctcaccccg aacttcaagt cgaacttcga tctcgctgag 780
gacgccaagc tgcaactctc caaggacacc tacgacgatg acctcgataa cctcctggcc 840
cagatcggcg atcaatacgc ggacctgttc ctcgctgcca agaacctgtc ggacgccatc 900
ctcctgtcag atatcctccg cgtgaacacc gagatcacga aggctccact ctctgcctcc 960
atgatcaagc gctacgacga gcaccatcag gatctgaccc tcctgaaggc gctggtccgc 1020
caacagctcc cggagaagta caaggagatt ttcttcgatc agtcgaagaa cggctacgct 1080
gggtacatcg acggcggggc ctcacaagag gagttctaca agttcatcaa gccaatcctg 1140
gagaagatgg acggcacgga ggagctcctg gtgaagctca acagggagga cctcctgcgg 1200
aagcagagaa ccttcgataa cggcagcatc ccccaccaaa tccatctcgg ggagctgcac 1260
gccatcctga gaaggcaaga ggacttctac cctttcctca aggataaccg ggagaagatc 1320
gagaagatcc tgaccttcag aatcccatac tacgtcggcc ctctcgcgcg ggggaactca 1380
agattcgctt ggatgacccg caagtctgag gagaccatca cgccgtggaa cttcgaggag 1440
gtggtggaca agggcgctag cgctcagtcg ttcatcgaga ggatgaccaa cttcgacaag 1500
aacctgccca acgagaaggt gctccctaag cactcgctcc tgtacgagta cttcaccgtc 1560
tacaacgagc tcacgaaggt gaagtacgtc accgagggca tgcgcaagcc agcgttcctg 1620
tccggggagc agaagaaggc tatcgtggac ctcctgttca agaccaaccg gaaggtcacg 1680
gttaagcaac tcaaggagga ctacttcaag aagatcgagt gcttcgattc ggtcgagatc 1740
agcggcgttg aggaccgctt caacgccagc ctcgggacct accacgatct cctgaagatc 1800
atcaaggata aggacttcct ggacaacgag gagaacgagg atatcctgga ggacatcgtg 1860
ctgaccctca cgctgttcga ggacagggag atgatcgagg agcgcctgaa gacgtacgcc 1920
catctcttcg atgacaaggt catgaagcaa ctcaagcgcc ggagatacac cggctggggg 1980
aggctgtccc gcaagctcat caacggcatc cgggacaagc agtccgggaa gaccatcctc 2040
gacttcctca agagcgatgg cttcgccaac aggaacttca tgcaactgat ccacgatgac 2100
agcctcacct tcaaggagga tatccaaaag gctcaagtga gcggccaggg ggactcgctg 2160
cacgagcata tcgcgaacct cgctggctcc cccgcgatca agaagggcat cctccagacc 2220
gtgaaggttg tggacgagct cgtgaaggtc atgggccggc acaagcctga gaacatcgtc 2280
atcgagatgg ccagagagaa ccaaaccacg cagaaggggc aaaagaactc tagggagcgc 2340
atgaagcgca tcgaggaggg catcaaggag ctggggtccc aaatcctcaa ggagcaccca 2400
gtggagaaca cccaactgca gaacgagaag ctctacctgt actacctcca gaacggcagg 2460
gatatgtacg tggaccaaga gctggatatc aaccgcctca gcgattacga cgtcgatgct 2520
atcgttcccc agtctttcct gaaggatgac tccatcgaca acaaggtcct caccaggtcg 2580
gacaagaacc gcggcaagtc agataacgtt ccatctgagg aggtcgttaa gaagatgaag 2640
aactactgga ggcagctcct gaacgccaag ctgatcacgc aaaggaagtt cgacaacctc 2700
accaaggctg agagaggcgg gctctcagag ctggacaagg ccggcttcat caagcggcag 2760
ctggtcgaga ccagacaaat cacgaagcac gttgcgcaaa tcctcgactc tcggatgaac 2820
acgaagtacg atgagaacga caagctgatc agggaggtta aggtgatcac cctgaagtct 2880
aagctcgtct ccgacttcag gaaggatttc cagttctaca aggttcgcga gatcaacaac 2940
taccaccatg cccatgacgc ttacctcaac gctgtggtcg gcaccgctct gatcaagaag 3000
tacccaaagc tggagtccga gttcgtgtac ggggactaca aggtttacga tgtgcgcaag 3060
atgatcgcca agtcggagca agagatcggc aaggctaccg ccaagtactt cttctactca 3120
aacatcatga acttcttcaa gaccgagatc acgctggcca acggcgagat ccggaagaga 3180
ccgctcatcg agaccaacgg cgagacgggg gagatcgtgt gggacaaggg cagggatttc 3240
gcgaccgtcc gcaaggttct ctccatgccc caggtgaaca tcgtcaagaa gaccgaggtc 3300
caaacgggcg ggttctcaaa ggagtctatc ctgcctaagc ggaacagcga caagctcatc 3360
gccagaaaga aggactggga cccaaagaag tacggcgggt tcgacagccc taccgtggcc 3420
tactcggtcc tggttgtggc gaaggttgag aagggcaagt ccaagaagct caagagcgtg 3480
aaggagctcc tggggatcac catcatggag aggtccagct tcgagaagaa cccaatcgac 3540
ttcctggagg ccaagggcta caaggaggtg aagaaggacc tgatcatcaa gctcccgaag 3600
tactctctct tcgagctgga gaacggcagg aagagaatgc tggcttccgc tggcgagctc 3660
cagaagggga acgagctcgc gctgccaagc aagtacgtga acttcctcta cctggcttcc 3720
cactacgaga agctcaaggg cagcccggag gacaacgagc aaaagcagct gttcgtcgag 3780
cagcacaagc attacctcga cgagatcatc gagcaaatct ccgagttcag caagcgcgtg 3840
atcctcgccg acgcgaacct ggataaggtc ctctccgcct acaacaagca ccgggacaag 3900
cccatcagag agcaagcgga gaacatcatc catctcttca ccctgacgaa cctcggcgct 3960
cctgctgctt tcaagtactt cgacaccacg atcgatcgga agagatacac ctccacgaag 4020
gaggtcctgg acgcgaccct catccaccag tcgatcaccg gcctgtacga gacgaggatc 4080
gacctctcac aactcggcgg ggataagaga cccgcagcaa ccaagaaggc agggcaagca 4140
aagaagaaga ag 4152
<210> 5
<211> 249
<212> DNA
<213> Artificial Sequence
<220>
<223> UGI encoding sequence
<400> 5
accaacctgt ccgacatcat cgagaaggag acgggcaagc aactcgtgat ccaggagagc 60
atcctcatgc tgccagagga ggtggaggag gtcatcggca acaagccaga gtccgacatc 120
ctggtgcaca ccgcctacga cgagtccacc gacgagaacg tcatgctcct gaccagcgac 180
gccccagagt acaagccatg ggccctcgtc atccaggaca gcaacgggga gaacaagatc 240
aagatgctg 249
<210> 6
<211> 720
<212> DNA
<213> Artificial Sequence
<220>
<223> GFP encoding sequence
<400> 6
atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60
ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120
ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc 180
ctcgtgacca ccttcaccta cggcgtgcag tgcttcagcc gctaccccga ccacatgaag 240
cagcacgact tcttcaagtc cgccatgccc gaaggctacg tccaggagcg caccatcttc 300
ttcaaggacg acggcaacta caagacccgc gccgaggtga agttcgaggg cgacaccctg 360
gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg acggcaacat cctggggcac 420
aagctggagt acaactacaa cagccacaac gtctatatca tggccgacaa gcagaagaac 480
ggcatcaagg tgaacttcaa gatccgccac aacatcgagg acggcagcgt gcagctcgcc 540
gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc cgacaaccac 600
tacctgagca cccagtccgc cctgagcaaa gaccccaacg agaagcgcga tcacatggtc 660
ctgctggagt tcgtgaccgc cgccgggatc actcacggca tggacgagct gtacaagtaa 720
<210> 7
<211> 720
<212> DNA
<213> Artificial Sequence
<220>
<223> BFPm encoding sequence
<400> 7
atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60
ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120
ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc 180
ctcgtgacca ccttcaccca cggcgtgcgg tgcttcagcc gctaccccga ccacatgaag 240
cagcacgact tcttcaagtc cgccatgccc gaaggctacg tccaggagcg caccatcttc 300
ttcaaggacg acggcaacta caagacccgc gccgaggtga agttcgaggg cgacaccctg 360
gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg acggcaacat cctggggcac 420
aagctggagt acaactacaa cagccacaac gtctatatca tggccgacaa gcagaagaac 480
ggcatcaagg tgaacttcaa gatccgccac aacatcgagg acggcagcgt gcagctcgcc 540
gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc cgacaaccac 600
tacctgagca cccagtccgc cctgagcaaa gaccccaacg agaagcgcga tcacatggtc 660
ctgctggagt tcgtgaccgc cgccgggatc actcacggca tggacgagct gtacaagtaa 720
<210> 8
<211> 5696
<212> DNA
<213> Artificial Sequence
<220>
<223> pUbi-GFP vector
<400> 8
gagctcggta cctgacccgg tcgtgcccct ctctagagat aatgagcatt gcatgtctaa 60
gttataaaaa attaccacat attttttttg tcacacttgt ttgaagtgca gtttatctat 120
ctttatacat atatttaaac tttactctac gaataatata atctatagta ctacaataat 180
atcagtgttt tagagaatca tataaatgaa cagttagaca tggtctaaag gacaattgag 240
tattttgaca acaggactct acagttttat ctttttagtg tgcatgtgtt ctcctttttt 300
tttgcaaata gcttcaccta tataatactt catccatttt attagtacat ccatttaggg 360
tttagggtta atggttttta tagactaatt tttttagtac atctatttta ttctatttta 420
gcctctaaat taagaaaact aaaactctat tttagttttt ttatttaata atttagatat 480
aaaatagaat aaaataaagt gactaaaaat taaacaaata ccctttaaga aattaaaaaa 540
actaaggaaa catttttctt gtttcgagta gataatgcca gcctgttaaa cgccgtcgac 600
gagtctaacg gacaccaacc agcgaaccag cagcgtcgcg tcgggccaag cgaagcagac 660
ggcacggcat ctctgtcgct gcctctggac ccctctcgat cgagagttcc gctccaccgt 720
tggacttgct ccgctgtcgg catccagaaa ttgcgtggcg gagcggcaga cgtgagccgg 780
cacggcaggc ggcctcctcc tcctctcacg gcaccggcag ctacggggga ttcctttccc 840
accgctcctt cgctttccct tcctcgcccg ccgtaataaa tagacacccc ctccacaccc 900
tctttcccca acctcgtgtt gttcggagcg cacacacaca caaccagatc tcccccaaat 960
ccacccgtcg gcacctccgc ttcaaggtac gccgctcgtc ctcccccccc ccccctctct 1020
accttctcta gatcggcgtt ccggtccatg gttagggccc ggtagttcta cttctgttca 1080
tgtttgtgtt agatccgtgt ttgtgttaga tccgtgctgc tagcgttcgt acacggatgc 1140
gacctgtacg tcagacacgt tctgattgct aacttgccag tgtttctctt tggggaatcc 1200
tgggatggct ctagccgttc cgcagacggg atcgatttca tgattttttt tgtttcgttg 1260
catagggttt ggtttgccct tttcctttat ttcaatatat gccgtgcact tgtttgtcgg 1320
gtcatctttt catgcttttt tttgtcttgg ttgtgatgat gtggtctggt tgggcggtcg 1380
ttctagatcg gagtagaatt aattctgttt caaactacct ggtggattta ttaattttgg 1440
atctgtatgt gtgtgccata catattcata gttacgaatt gaagatgatg gatggaaata 1500
tcgatctagg ataggtatac atgttgatgc gggttttact gatgcatata cagagatgct 1560
ttttgttcgc ttggttgtga tgatgtggtg tggttgggcg gtcgttcatt cgttctagat 1620
cggagtagaa tactgtttca aactacctgg tgtatttatt aattttggaa ctgtatgtgt 1680
gtgtcataca tcttcatagt tacgagttta agatggatgg aaatatcgat ctaggatagg 1740
tatacatgtt gatgtgggtt ttactgatgc atatacatga tggcatatgc agcatctatt 1800
catatgctct aaccttgagt acctatctat tataataaac aagtatgttt tataattatt 1860
ttgatcttga tatacttgga tgatggcata tgcagcagct atatgtggat ttttttagcc 1920
ctgccttcat acgctattta tttgcttggt actgtttctt ttgtcgatgc tcaccctgtt 1980
gtttggtgtt acttctgcaa agcttccacc atggcgtgca ggtcgactct agaggatcca 2040
tggtgagcaa gggcgaggag ctgttcaccg gggtggtgcc catcctggtc gagctggacg 2100
gcgacgtaaa cggccacaag ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg 2160
gcaagctgac cctgaagttc atctgcacca ccggcaagct gcccgtgccc tggcccaccc 2220
tcgtgaccac cttcacctac ggcgtgcagt gcttcagccg ctaccccgac cacatgaagc 2280
agcacgactt cttcaagtcc gccatgcccg aaggctacgt ccaggagcgc accatcttct 2340
tcaaggacga cggcaactac aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg 2400
tgaaccgcat cgagctgaag ggcatcgact tcaaggagga cggcaacatc ctggggcaca 2460
agctggagta caactacaac agccacaacg tctatatcat ggccgacaag cagaagaacg 2520
gcatcaaggt gaacttcaag atccgccaca acatcgagga cggcagcgtg cagctcgccg 2580
accactacca gcagaacacc cccatcggcg acggccccgt gctgctgccc gacaaccact 2640
acctgagcac ccagtccgcc ctgagcaaag accccaacga gaagcgcgat cacatggtcc 2700
tgctggagtt cgtgaccgcc gccgggatca ctcacggcat ggacgagctg tacaagtaac 2760
cgggcgagct cgaattcgct gaaatcacca gtctctctct acaaatctat ctctctctat 2820
tttctccata aataatgtgt gagtagtttc ccgataaggg aaattagggt tcttataggg 2880
tttcgctcat gtgttgagca tataagaaac ccttagtatg tatttgtatt tgtaaaatac 2940
ttctatcaat aaaatttcta attcctaaaa ccaaaatcca gtactaaaat ccagatctcc 3000
taaagtccct atagatcttt gtcgtgaata taaaccagac acgagacgac taaacctgga 3060
gcccagacgc cgttcgaagc tagaagtacc gcttaggcag gaggccgtta gggaaaagat 3120
gctaaggcag ggttggttac gttgactccc ccgtaggttt ggtttaaata tgatgaagtg 3180
gacggaagga aggaggaaga caaggaagga taaggttgca ggccctgtgc aaggtaagaa 3240
gatggaaatt tgatagaggt acgctactat acttatacta tacgctaagg gaatgcttgt 3300
atttataccc tataccccct aataacccct tatcaattta agaaataatc cgcataagcc 3360
cccgcttaaa aattggtatc agagccatga ataggtctat gaccaaaact caagaggata 3420
aaacctcacc aaaatacgaa agagttctta actctaaaga taaaagatct ttcaagatca 3480
aaactagttc cctcacaccg gagcatgcga tatcctcgag agatctaggc gtaatcatgg 3540
tcatagctgt ttcctgtgtg aaattgttat ccgctcacaa ttccacacaa catacgagcc 3600
ggaagcataa agtgtaaagc ctggggtgcc taatgagtga gctaactcac attaattgcg 3660
ttgcgctcac tgcccgcttt ccagtcggga aacctgtcgt gccagctgca ttaatgaatc 3720
ggccaacgcg cggggagagg cggtttgcgt attgggcgct cttccgcttc ctcgctcact 3780
gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat cagctcactc aaaggcggta 3840
atacggttat ccacagaatc aggggataac gcaggaaaga acatgtgagc aaaaggccag 3900
caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag gctccgcccc 3960
cctgacgagc atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc gacaggacta 4020
taaagatacc aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt tccgaccctg 4080
ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct ttctcaatgc 4140
tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg ctgtgtgcac 4200
gaaccccccg ttcagcccga ccgctgcgcc ttatccggta actatcgtct tgagtccaac 4260
ccggtaagac acgacttatc gccactggca gcagccactg gtaacaggat tagcagagcg 4320
aggtatgtag gcggtgctac agagttcttg aagtggtggc ctaactacgg ctacactaga 4380
aggacagtat ttggtatctg cgctctgctg aagccagtta ccttcggaaa aagagttggt 4440
agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt ttgcaagcag 4500
cagattacgc gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc tacggggtct 4560
gacgctcagt ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg 4620
atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat 4680
gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc 4740
tgtctatttc gttcatccat agttgcctga ctccccgtcg tgtagataac tacgatacgg 4800
gagggcttac catctggccc cagtgctgca atgataccgc gagacccacg ctcaccggct 4860
ccagatttat cagcaataaa ccagccagcc ggaagggccg agcgcagaag tggtcctgca 4920
actttatccg cctccatcca gtctattaat tgttgccggg aagctagagt aagtagttcg 4980
ccagttaata gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt gtcacgctcg 5040
tcgtttggta tggcttcatt cagctccggt tcccaacgat caaggcgagt tacatgatcc 5100
cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt cagaagtaag 5160
ttggccgcag tgttatcact catggttatg gcagcactgc ataattctct tactgtcatg 5220
ccatccgtaa gatgcttttc tgtgactggt gagtactcaa ccaagtcatt ctgagaatag 5280
tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatac gggataatac cgcgccacat 5340
agcagaactt taaaagtgct catcattgga aaacgttctt cggggcgaaa actctcaagg 5400
atcttaccgc tgttgagatc cagttcgatg taacccactc gtgcacccaa ctgatcttca 5460
gcatctttta ctttcaccag cgtttctggg tgagcaaaaa caggaaggca aaatgccgca 5520
aaaaagggaa taagggcgac acggaaatgt tgaatactca tactcttcct ttttcaatat 5580
tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga atgtatttag 5640
aaaaataaac aaataggggt tccgcgcaca tttccccgaa aagtgccacc tgacgt 5696
<210> 9
<211> 5696
<212> DNA
<213> Artificial Sequence
<220>
<223> pUbi-BFPm vector
<400> 9
gagctcggta cctgacccgg tcgtgcccct ctctagagat aatgagcatt gcatgtctaa 60
gttataaaaa attaccacat attttttttg tcacacttgt ttgaagtgca gtttatctat 120
ctttatacat atatttaaac tttactctac gaataatata atctatagta ctacaataat 180
atcagtgttt tagagaatca tataaatgaa cagttagaca tggtctaaag gacaattgag 240
tattttgaca acaggactct acagttttat ctttttagtg tgcatgtgtt ctcctttttt 300
tttgcaaata gcttcaccta tataatactt catccatttt attagtacat ccatttaggg 360
tttagggtta atggttttta tagactaatt tttttagtac atctatttta ttctatttta 420
gcctctaaat taagaaaact aaaactctat tttagttttt ttatttaata atttagatat 480
aaaatagaat aaaataaagt gactaaaaat taaacaaata ccctttaaga aattaaaaaa 540
actaaggaaa catttttctt gtttcgagta gataatgcca gcctgttaaa cgccgtcgac 600
gagtctaacg gacaccaacc agcgaaccag cagcgtcgcg tcgggccaag cgaagcagac 660
ggcacggcat ctctgtcgct gcctctggac ccctctcgat cgagagttcc gctccaccgt 720
tggacttgct ccgctgtcgg catccagaaa ttgcgtggcg gagcggcaga cgtgagccgg 780
cacggcaggc ggcctcctcc tcctctcacg gcaccggcag ctacggggga ttcctttccc 840
accgctcctt cgctttccct tcctcgcccg ccgtaataaa tagacacccc ctccacaccc 900
tctttcccca acctcgtgtt gttcggagcg cacacacaca caaccagatc tcccccaaat 960
ccacccgtcg gcacctccgc ttcaaggtac gccgctcgtc ctcccccccc ccccctctct 1020
accttctcta gatcggcgtt ccggtccatg gttagggccc ggtagttcta cttctgttca 1080
tgtttgtgtt agatccgtgt ttgtgttaga tccgtgctgc tagcgttcgt acacggatgc 1140
gacctgtacg tcagacacgt tctgattgct aacttgccag tgtttctctt tggggaatcc 1200
tgggatggct ctagccgttc cgcagacggg atcgatttca tgattttttt tgtttcgttg 1260
catagggttt ggtttgccct tttcctttat ttcaatatat gccgtgcact tgtttgtcgg 1320
gtcatctttt catgcttttt tttgtcttgg ttgtgatgat gtggtctggt tgggcggtcg 1380
ttctagatcg gagtagaatt aattctgttt caaactacct ggtggattta ttaattttgg 1440
atctgtatgt gtgtgccata catattcata gttacgaatt gaagatgatg gatggaaata 1500
tcgatctagg ataggtatac atgttgatgc gggttttact gatgcatata cagagatgct 1560
ttttgttcgc ttggttgtga tgatgtggtg tggttgggcg gtcgttcatt cgttctagat 1620
cggagtagaa tactgtttca aactacctgg tgtatttatt aattttggaa ctgtatgtgt 1680
gtgtcataca tcttcatagt tacgagttta agatggatgg aaatatcgat ctaggatagg 1740
tatacatgtt gatgtgggtt ttactgatgc atatacatga tggcatatgc agcatctatt 1800
catatgctct aaccttgagt acctatctat tataataaac aagtatgttt tataattatt 1860
ttgatcttga tatacttgga tgatggcata tgcagcagct atatgtggat ttttttagcc 1920
ctgccttcat acgctattta tttgcttggt actgtttctt ttgtcgatgc tcaccctgtt 1980
gtttggtgtt acttctgcaa agcttccacc atggcgtgca ggtcgactct agaggatcca 2040
tggtgagcaa gggcgaggag ctgttcaccg gggtggtgcc catcctggtc gagctggacg 2100
gcgacgtaaa cggccacaag ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg 2160
gcaagctgac cctgaagttc atctgcacca ccggcaagct gcccgtgccc tggcccaccc 2220
tcgtgaccac cttcacccac ggcgtgcggt gcttcagccg ctaccccgac cacatgaagc 2280
agcacgactt cttcaagtcc gccatgcccg aaggctacgt ccaggagcgc accatcttct 2340
tcaaggacga cggcaactac aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg 2400
tgaaccgcat cgagctgaag ggcatcgact tcaaggagga cggcaacatc ctggggcaca 2460
agctggagta caactacaac agccacaacg tctatatcat ggccgacaag cagaagaacg 2520
gcatcaaggt gaacttcaag atccgccaca acatcgagga cggcagcgtg cagctcgccg 2580
accactacca gcagaacacc cccatcggcg acggccccgt gctgctgccc gacaaccact 2640
acctgagcac ccagtccgcc ctgagcaaag accccaacga gaagcgcgat cacatggtcc 2700
tgctggagtt cgtgaccgcc gccgggatca ctcacggcat ggacgagctg tacaagtaac 2760
cgggcgagct cgaattcgct gaaatcacca gtctctctct acaaatctat ctctctctat 2820
tttctccata aataatgtgt gagtagtttc ccgataaggg aaattagggt tcttataggg 2880
tttcgctcat gtgttgagca tataagaaac ccttagtatg tatttgtatt tgtaaaatac 2940
ttctatcaat aaaatttcta attcctaaaa ccaaaatcca gtactaaaat ccagatctcc 3000
taaagtccct atagatcttt gtcgtgaata taaaccagac acgagacgac taaacctgga 3060
gcccagacgc cgttcgaagc tagaagtacc gcttaggcag gaggccgtta gggaaaagat 3120
gctaaggcag ggttggttac gttgactccc ccgtaggttt ggtttaaata tgatgaagtg 3180
gacggaagga aggaggaaga caaggaagga taaggttgca ggccctgtgc aaggtaagaa 3240
gatggaaatt tgatagaggt acgctactat acttatacta tacgctaagg gaatgcttgt 3300
atttataccc tataccccct aataacccct tatcaattta agaaataatc cgcataagcc 3360
cccgcttaaa aattggtatc agagccatga ataggtctat gaccaaaact caagaggata 3420
aaacctcacc aaaatacgaa agagttctta actctaaaga taaaagatct ttcaagatca 3480
aaactagttc cctcacaccg gagcatgcga tatcctcgag agatctaggc gtaatcatgg 3540
tcatagctgt ttcctgtgtg aaattgttat ccgctcacaa ttccacacaa catacgagcc 3600
ggaagcataa agtgtaaagc ctggggtgcc taatgagtga gctaactcac attaattgcg 3660
ttgcgctcac tgcccgcttt ccagtcggga aacctgtcgt gccagctgca ttaatgaatc 3720
ggccaacgcg cggggagagg cggtttgcgt attgggcgct cttccgcttc ctcgctcact 3780
gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat cagctcactc aaaggcggta 3840
atacggttat ccacagaatc aggggataac gcaggaaaga acatgtgagc aaaaggccag 3900
caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag gctccgcccc 3960
cctgacgagc atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc gacaggacta 4020
taaagatacc aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt tccgaccctg 4080
ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct ttctcaatgc 4140
tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg ctgtgtgcac 4200
gaaccccccg ttcagcccga ccgctgcgcc ttatccggta actatcgtct tgagtccaac 4260
ccggtaagac acgacttatc gccactggca gcagccactg gtaacaggat tagcagagcg 4320
aggtatgtag gcggtgctac agagttcttg aagtggtggc ctaactacgg ctacactaga 4380
aggacagtat ttggtatctg cgctctgctg aagccagtta ccttcggaaa aagagttggt 4440
agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt ttgcaagcag 4500
cagattacgc gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc tacggggtct 4560
gacgctcagt ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg 4620
atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat 4680
gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc 4740
tgtctatttc gttcatccat agttgcctga ctccccgtcg tgtagataac tacgatacgg 4800
gagggcttac catctggccc cagtgctgca atgataccgc gagacccacg ctcaccggct 4860
ccagatttat cagcaataaa ccagccagcc ggaagggccg agcgcagaag tggtcctgca 4920
actttatccg cctccatcca gtctattaat tgttgccggg aagctagagt aagtagttcg 4980
ccagttaata gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt gtcacgctcg 5040
tcgtttggta tggcttcatt cagctccggt tcccaacgat caaggcgagt tacatgatcc 5100
cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt cagaagtaag 5160
ttggccgcag tgttatcact catggttatg gcagcactgc ataattctct tactgtcatg 5220
ccatccgtaa gatgcttttc tgtgactggt gagtactcaa ccaagtcatt ctgagaatag 5280
tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatac gggataatac cgcgccacat 5340
agcagaactt taaaagtgct catcattgga aaacgttctt cggggcgaaa actctcaagg 5400
atcttaccgc tgttgagatc cagttcgatg taacccactc gtgcacccaa ctgatcttca 5460
gcatctttta ctttcaccag cgtttctggg tgagcaaaaa caggaaggca aaatgccgca 5520
aaaaagggaa taagggcgac acggaaatgt tgaatactca tactcttcct ttttcaatat 5580
tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga atgtatttag 5640
aaaaataaac aaataggggt tccgcgcaca tttccccgaa aagtgccacc tgacgt 5696
<210> 10
<211> 3514
<212> DNA
<213> Artificial Sequence
<220>
<223> pUC57-APOBEC1-XTEN-UGI vector
<400> 10
cgacgtagcc acctactccc aacatcagcc ggactccgat tacctcggga acttgctccg 60
tagtaagaca ttcatcgcgc ttgctgcctt cgaccaagaa gcggttgttg gcgctctcgc 120
ggcttacgtt ctgcccaggt ttgagcagcc gcgtagtgag atctatatct atgatctcgc 180
agtctccggc gagcaccgga ggcagggcat tgccaccgcg ctcatcaatc tcctcaagca 240
tgaggccaac gcgcttggtg cttatgtgat ctacgtgcaa gcagattacg gtgacgatcc 300
cgcagtggct ctctatacaa agttgggcat acgggaagaa gtgatgcact ttgatatcga 360
cccaagtacc gccacctaac aattcgttca agccgagatc ggcttcccgg ccgcggagtt 420
gttcggtaaa ttgtcacaac gccgctcatg acattaacct ataaaaatag gcgtatcacg 480
aggccctttt gtctcgcgcg tttcggtgat gacggtgaaa acctctgaca catgcagctc 540
ccggagaagg tcacagcttg tctgtaagcg gatgccggga gcagacaagc ccgtcagggc 600
gcgtcagcgg gtgttggcgg gtgtcggggc tggcttaact atgcggcatc agagcagatt 660
gtactgagag tgcaccatat gcggtgtgaa ataccgcaca gatgcgtaag gagaaaatac 720
cgcatcaggc gccattcgcc attcaggctg cgcaactgtt gggaagggcg atcggtgcgg 780
gcctcttcgc tattacgcca gctggcgaaa gggggatgtg ctgcaaggcg attaagttgg 840
gtaacgccag ggttttccca gtcacgacgt tgtaaaacga cggccagtga attcgagctc 900
ggtacctcgc gaatgccaaa gaagaagagg aaggtttcat cggagaccgg ccctgttgct 960
gttgacccca ccctgcggcg gagaatcgag ccacacgagt tcgaggtgtt cttcgaccca 1020
agggagctcc gcaaggagac gtgcctcctg tacgagatca actggggcgg caggcactcc 1080
atctggaggc acaccagcca aaacaccaac aagcacgtgg aggtcaactt catcgagaag 1140
ttcaccaccg agaggtactt ctgcccaaac acccgctgct ccatcacctg gttcctgtcc 1200
tggagcccat gcggcgagtg ctccagggcc atcaccgagt tcctcagccg ctacccacac 1260
gtcaccctgt tcatctacat cgccaggctc taccaccacg ccgacccaag gaacaggcag 1320
ggcctccgcg acctgatctc cagcggcgtg accatccaaa tcatgaccga gcaggagtcc 1380
ggctactgct ggaggaactt cgtcaactac tccccaagca acgaggccca ctggccaagg 1440
tacccacacc tctgggtgcg cctctacgtg ctcgagctgt actgcatcat cctcggcctg 1500
ccaccatgcc tcaacatcct gaggcgcaag caaccacagc tgaccttctt caccatcgcc 1560
ctccaaagct gccactacca gaggctccca ccacacatcc tgtgggctac cggcctcaag 1620
tccggcagcg agacgccagg cacctccgag agcgctacgc ctgaacttaa gcaaatcacg 1680
cgtgactccg gcggcagcac caacctgtcc gacatcatcg agaaggagac gggcaagcaa 1740
ctcgtgatcc aggagagcat cctcatgctg ccagaggagg tggaggaggt catcggcaac 1800
aagccagagt ccgacatcct ggtgcacacc gcctacgacg agtccaccga cgagaacgtc 1860
atgctcctga ccagcgacgc cccagagtac aagccatggg ccctcgtcat ccaggacagc 1920
aacggggaga acaagatcaa gatgctgtcg ggggggagcc caaagaagaa gcggaaggtg 1980
tagggatccc gggcccgtcg actgcagagg cctgcatgca agcttggcgt aatcatggtc 2040
atagctgttt cctgtgtgaa attgttatcc gctcacaatt ccacacaaca tacgagccgg 2100
aagcataaag tgtaaagcct ggggtgccta atgagtgagc taactcacat taattgcgtt 2160
gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc cagctgcatt aatgaatcgg 2220
ccaacgcgcg gggagaggcg gtttgcgtat tgggcgctct tccgcttcct cgctcactga 2280
ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca gctcactcaa aggcggtaat 2340
acggttatcc acagaatcag gggataacgc aggaaagaac atgtgagcaa aaggccagca 2400
aaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt ttccataggc tccgcccccc 2460
tgacgagcat cacaaaaatc gacgctcaag tcagaggtgg cgaaacccga caggactata 2520
aagataccag gcgtttcccc ctggaagctc cctcgtgcgc tctcctgttc cgaccctgcc 2580
gcttaccgga tacctgtccg cctttctccc ttcgggaagc gtggcgcttt ctcatagctc 2640
acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc aagctgggct gtgtgcacga 2700
accccccgtt cagcccgacc gctgcgcctt atccggtaac tatcgtcttg agtccaaccc 2760
ggtaagacac gacttatcgc cactggcagc agccactggt aacaggatta gcagagcgag 2820
gtatgtaggc ggtgctacag agttcttgaa gtggtggcct aactacggct acactagaag 2880
aacagtattt ggtatctgcg ctctgctgaa gccagttacc ttcggaaaaa gagttggtag 2940
ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt ttttttgttt gcaagcagca 3000
gattacgcgc agaaaaaaag gatctcaaga agatcctttg atcttttcta cggggtctga 3060
cgctcagtgg aacgaaaact cacgttaagg gattttggtc atgaggcgca caccgtggaa 3120
acggatgaag gcacgaaccc agttgacata agcctgttcg gttcgtaaac tgtaatgcaa 3180
gtagcgtatg cgctcacgca actggtccag aaccttgacc gaacgcagcg gtggtaacgg 3240
cgcagtggcg gttttcatgg cttgttatga ctgttttttt gtacagtcta tgcctcgggc 3300
atccaagcag caagcgcgtt acgccgtggg tcgatgtttg atgttatgga gcagcaacga 3360
tgttacgcag cagcaacgat gttacgcagc agggcagtcg ccctaaaaca aagttaggtg 3420
gctcaagtat gggcatcatt cgcacatgta ggctcggccc tgaccaagtc aaatccatgc 3480
gggctgctct tgatcttttc ggtcgtgagt tcgg 3514
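In the DNA entries above, the <211> field gives the total length. Each sequence line holds up to six 10-base groups and ends with a running count. The sketch below joins such lines and checks each running count. It uses the last two lines of SEQ ID NO: 10, which are preceded by 3420 bases. The helper name `parse_st25_dna` is hypothetical and is not part of any standard tool.

```python
def parse_st25_dna(lines, start_count=0):
    """Join ST.25-style DNA lines and check each line's trailing running total.

    Each line holds up to six 10-base groups followed by the cumulative
    length reached at the end of that line (e.g. '... tcgg 3514').
    """
    total = start_count
    parts = []
    for line in lines:
        *groups, count = line.split()   # last token is the running total
        bases = "".join(groups)
        total += len(bases)
        if int(count) != total:
            raise ValueError(f"running total {count} != computed {total}")
        parts.append(bases)
    return "".join(parts), total


# Last two lines of SEQ ID NO: 10; 3420 bases precede them.
tail = [
    "gctcaagtat gggcatcatt cgcacatgta ggctcggccc tgaccaagtc aaatccatgc 3480",
    "gggctgctct tgatcttttc ggtcgtgagt tcgg 3514",
]
seq, total = parse_st25_dna(tail, start_count=3420)
print(total)  # 3514, matching <211> 3514
```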
<210> 11
<211> 227
<212> PRT
<213> Artificial Sequence
<220>
<223> APOBEC1
<400> 11
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg
20 25 30
Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His Ser
35 40 45
Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val Asn
50 55 60
Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr Arg
65 70 75 80
Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser
85 90 95
Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro His Val Thr Leu Phe
100 105 110
Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg Gln
115 120 125
Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile Gln Ile Met Thr
130 135 140
Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn Tyr Ser Pro
145 150 155 160
Ser Asn Glu Ala His Trp Pro Arg Tyr Pro His Leu Trp Val Arg Leu
165 170 175
Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu
225
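The sketch below connects SEQ ID NO: 10 to SEQ ID NO: 11. It translates nucleotides 913-1020 of the pUC57-APOBEC1-XTEN-UGI vector, starting at the ATG inside the group "gaatgccaaa". The result is PKKKRKV (the classic SV40 nuclear localization signal) preceded by Met, followed by the first 28 residues of APOBEC1 as listed in SEQ ID NO: 11. The codon table is the standard genetic code. The helper names are illustrative only.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna):
    """Translate an in-frame DNA string (spaces allowed) to one-letter protein."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Nucleotides 913-1020 of SEQ ID NO: 10, beginning at the ATG.
SEQ10_913 = ("atgccaaa gaagaagagg aaggtttcat cggagaccgg ccctgttgct "
             "gttgacccca ccctgcggcg gagaatcgag ccacacgagt tcgaggtgtt cttcgaccca")

# First 28 residues of SEQ ID NO: 11 (APOBEC1).
APOBEC1_N = ("Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg "
             "Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro")

protein = translate(SEQ10_913)
apobec = "".join(THREE_TO_ONE[r] for r in APOBEC1_N.split())
print(protein[:8], protein[8:] == apobec)  # MPKKKRKV True
```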
<210> 12
<211> 16
<212> PRT
<213> Artificial Sequence
<220>
<223> XTEN
<400> 12
Lys Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu
1 5 10 15
<210> 13
<211> 1384
<212> PRT
<213> Artificial Sequence
<220>
<223> nCas9(D10A)
<400> 13
Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala
1010 1015 1020
Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe
1025 1030 1035
Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala
1040 1045 1050
Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu
1055 1060 1065
Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val
1070 1075 1080
Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr
1085 1090 1095
Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys
1100 1105 1110
Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro
1115 1120 1125
Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1130 1135 1140
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys
1145 1150 1155
Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser
1160 1165 1170
Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys
1175 1180 1185
Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu
1190 1195 1200
Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly
1205 1210 1215
Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val
1220 1225 1230
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys
1250 1255 1260
His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys
1265 1270 1275
Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala
1280 1285 1290
Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn
1295 1300 1305
Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala
1310 1315 1320
Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser
1325 1330 1335
Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr
1340 1345 1350
Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys
1370 1375 1380
Lys
<210> 14
<211> 1384
<212> PRT
<213> Artificial Sequence
<220>
<223> dCas9
<400> 14
Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp Ala Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala
1010 1015 1020
Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe
1025 1030 1035
Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala
1040 1045 1050
Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu
1055 1060 1065
Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val
1070 1075 1080
Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr
1085 1090 1095
Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys
1100 1105 1110
Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro
1115 1120 1125
Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1130 1135 1140
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys
1145 1150 1155
Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser
1160 1165 1170
Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys
1175 1180 1185
Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu
1190 1195 1200
Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly
1205 1210 1215
Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val
1220 1225 1230
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys
1250 1255 1260
His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys
1265 1270 1275
Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala
1280 1285 1290
Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn
1295 1300 1305
Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala
1310 1315 1320
Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser
1325 1330 1335
Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr
1340 1345 1350
Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys
1370 1375 1380
Lys
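Wild-type SpCas9 has Asp10 in the RuvC nuclease domain and His840 in the HNH nuclease domain. The nickase nCas9(D10A) replaces only Asp10 with Ala. dCas9 carries both D10A and H840A. The sketch below compares two printed lines from SEQ ID NO: 13 and SEQ ID NO: 14: residues 1-16 and residues 833-848. It locates residues using the 16-residues-per-line layout and the position ruler printed under each line. The sketch covers only those two lines. It does not establish that these are the only differences between the two entries.

```python
def residue_diffs(line_a, line_b, first_pos):
    """Compare two ST.25 protein lines (three-letter codes) position by position.

    first_pos is the residue number of the first residue on the line.
    Returns (position, residue_a, residue_b) for every mismatch.
    """
    a, b = line_a.split(), line_b.split()
    if len(a) != len(b):
        raise ValueError("lines differ in length")
    return [(first_pos + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Residues 1-16 and 833-848 as printed in SEQ ID NO: 13 (nCas9) and 14 (dCas9).
NCAS9_1 = "Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val"
DCAS9_1 = "Met Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val"
NCAS9_833 = "Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys"
DCAS9_833 = "Leu Ser Asp Tyr Asp Val Asp Ala Ile Val Pro Gln Ser Phe Leu Lys"

print(NCAS9_1.split()[9], DCAS9_1.split()[9])    # Ala Ala  (residue 10 in both)
print(residue_diffs(NCAS9_1, DCAS9_1, 1))        # []
print(residue_diffs(NCAS9_833, DCAS9_833, 833))  # [(840, 'His', 'Ala')]
```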
<210> 15
<211> 83
<212> PRT
<213> Artificial Sequence
<220>
<223> UGI
<400> 15
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
1 5 10 15
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile
20 25 30
Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu
35 40 45
Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr
50 55 60
Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile
65 70 75 80
Lys Met Leu
<210> 16
<211> 239
<212> PRT
<213> Artificial Sequence
<220>
<223> GFP
<400> 16
Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15
Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30
Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60
Phe Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65 70 75 80
Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95
Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110
Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125
Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140
Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160
Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175
Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
195 200 205
Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220
Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235
<210> 17
<211> 239
<212> PRT
<213> Artificial Sequence
<220>
<223> BFP
<400> 17
Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15
Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30
Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60
Phe Thr His Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65 70 75 80
Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95
Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110
Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125
Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140
Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160
Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175
Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
195 200 205
Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220
Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235
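Within the line for residues 65-80, SEQ ID NO: 16 (GFP) and SEQ ID NO: 17 (BFP) differ only at residue 67, which is Tyr in GFP and His in BFP. One plausible reading is that this listing pairs them as a reporter: in the pUbi-BFPm vector (SEQ ID NO: 9), residue 67 is encoded by CAC, and a C-to-T change at the first base gives TAC (Tyr). That is the kind of substitution a cytidine-deaminase base editor makes. This link is an inference and is not stated in this part of the listing. The sketch assumes the BFP open reading frame starts at the ATG whose A is nucleotide 2040 of SEQ ID NO: 9. Reading from there gives MVSKGEEL..., which matches the start of SEQ ID NO: 17. All helper names are hypothetical.

```python
def residue_diffs(line_a, line_b, first_pos):
    """Mismatches between two three-letter protein lines, as (pos, a, b)."""
    a, b = line_a.split(), line_b.split()
    if len(a) != len(b):
        raise ValueError("lines differ in length")
    return [(first_pos + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def codon_for_residue(line, line_start, orf_start, residue):
    """Return the codon for `residue` when it lies on one listing line."""
    seq = "".join(line.split()).upper()
    offset = orf_start + 3 * (residue - 1) - line_start
    if offset < 0 or offset + 3 > len(seq):
        raise ValueError("codon is not on this line")
    return seq[offset:offset + 3]


def c_to_t(codon, index):
    """Apply a single C-to-T transition at a given codon position."""
    if codon[index] != "C":
        raise ValueError("no C at that codon position")
    return codon[:index] + "T" + codon[index + 1:]


# Residues 65-80 of SEQ ID NO: 17 (BFP) and SEQ ID NO: 16 (GFP).
BFP_65 = "Phe Thr His Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys"
GFP_65 = "Phe Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys"

# Nucleotides 2221-2280 of SEQ ID NO: 9 (pUbi-BFPm).
SEQ9_2221 = "tcgtgaccac cttcacccac ggcgtgcggt gcttcagccg ctaccccgac cacatgaagc"
ORF_START = 2040  # assumption: first base of the ATG that begins the BFP ORF

MINI_CODONS = {"CAC": "His", "TAC": "Tyr"}  # only the two codons used here

codon = codon_for_residue(SEQ9_2221, 2221, ORF_START, 67)
edited = c_to_t(codon, 0)
print(residue_diffs(BFP_65, GFP_65, 65))  # [(67, 'His', 'Tyr')]
print(codon, MINI_CODONS[codon], "->", edited, MINI_CODONS[edited])  # CAC His -> TAC Tyr
```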
<210> 18
<211> 4107
<212> DNA
<213> Artificial Sequence
<220>
<223> WT spCas9 nucleotide sequence
<400> 18
atggataaga aatactcaat aggcttagat atcggcacaa atagcgtcgg atgggcggtg 60
atcactgatg aatataaggt tccgtctaaa aagttcaagg ttctgggaaa tacagaccgc 120
cacagtatca aaaaaaatct tataggggct cttttatttg acagtggaga gacagcggaa 180
gcgactcgtc tcaaacggac agctcgtaga aggtatacac gtcggaagaa tcgtatttgt 240
tatctacagg agattttttc aaatgagatg gcgaaagtag atgatagttt ctttcatcga 300
cttgaagagt cttttttggt ggaagaagac aagaagcatg aacgtcatcc tatttttgga 360
aatatagtag atgaagttgc ttatcatgag aaatatccaa ctatctatca tctgcgaaaa 420
aaattggtag attctactga taaagcggat ttgcgcttaa tctatttggc cttagcgcat 480
atgattaagt ttcgtggtca ttttttgatt gagggagatt taaatcctga taatagtgat 540
gtggacaaac tatttatcca gttggtacaa acctacaatc aattatttga agaaaaccct 600
attaacgcaa gtggagtaga tgctaaagcg attctttctg cacgattgag taaatcaaga 660
cgattagaaa atctcattgc tcagctcccc ggtgagaaga aaaatggctt atttgggaat 720
ctcattgctt tgtcattggg tttgacccct aattttaaat caaattttga tttggcagaa 780
gatgctaaat tacagctttc aaaagatact tacgatgatg atttagataa tttattggcg 840
caaattggag atcaatatgc tgatttgttt ttggcagcta agaatttatc agatgctatt 900
ttactttcag atatcctaag agtaaatact gaaataacta aggctcccct atcagcttca 960
atgattaaac gctacgatga acatcatcaa gacttgactc ttttaaaagc tttagttcga 1020
caacaacttc cagaaaagta taaagaaatc ttttttgatc aatcaaaaaa cggatatgca 1080
ggttatattg atgggggagc tagccaagaa gaattttata aatttatcaa accaatttta 1140
gaaaaaatgg atggtactga ggaattattg gtgaaactaa atcgtgaaga tttgctgcgc 1200
aagcaacgga cctttgacaa cggctctatt ccccatcaaa ttcacttggg tgagctgcat 1260
gctattttga gaagacaaga agacttttat ccatttttaa aagacaatcg tgagaagatt 1320
gaaaaaatct tgacttttcg aattccttat tatgttggtc cattggcgcg tggcaatagt 1380
cgttttgcat ggatgactcg gaagtctgaa gaaacaatta ccccatggaa ttttgaagaa 1440
gttgtcgata aaggtgcttc agctcaatca tttattgaac gcatgacaaa ctttgataaa 1500
aatcttccaa atgaaaaagt actaccaaaa catagtttgc tttatgagta ttttacggtt 1560
tataacgaat tgacaaaggt caaatatgtt actgaaggaa tgcgaaaacc agcatttctt 1620
tcaggtgaac agaagaaagc cattgttgat ttactcttca aaacaaatcg aaaagtaacc 1680
gttaagcaat taaaagaaga ttatttcaaa aaaatagaat gttttgatag tgttgaaatt 1740
tcaggagttg aagatagatt taatgcttca ttaggtacct accatgattt gctaaaaatt 1800
attaaagata aagatttttt ggataatgaa gaaaatgaag atatcttaga ggatattgtt 1860
ttaacattga ccttatttga agatagggag atgattgagg aaagacttaa aacatatgct 1920
cacctctttg atgataaggt gatgaaacag cttaaacgtc gccgttatac tggttgggga 1980
cgtttgtctc gaaaattgat taatggtatt agggataagc aatctggcaa aacaatatta 2040
gattttttga aatcagatgg ttttgccaat cgcaatttta tgcagctgat ccatgatgat 2100
agtttgacat ttaaagaaga cattcaaaaa gcacaagtgt ctggacaagg cgatagttta 2160
catgaacata ttgcaaattt agctggtagc cctgctatta aaaaaggtat tttacagact 2220
gtaaaagttg ttgatgaatt ggtcaaagta atggggcggc ataagccaga aaatatcgtt 2280
attgaaatgg cacgtgaaaa tcagacaact caaaagggcc agaaaaattc gcgagagcgt 2340
atgaaacgaa tcgaagaagg tatcaaagaa ttaggaagtc agattcttaa agagcatcct 2400
gttgaaaata ctcaattgca aaatgaaaag ctctatctct attatctcca aaatggaaga 2460
gacatgtatg tggaccaaga attagatatt aatcgtttaa gtgattatga tgtcgatcac 2520
attgttccac aaagtttcct taaagacgat tcaatagaca ataaggtctt aacgcgttct 2580
gataaaaatc gtggtaaatc ggataacgtt ccaagtgaag aagtagtcaa aaagatgaaa 2640
aactattgga gacaacttct aaacgccaag ttaatcactc aacgtaagtt tgataattta 2700
acgaaagctg aacgtggagg tttgagtgaa cttgataaag ctggttttat caaacgccaa 2760
ttggttgaaa ctcgccaaat cactaagcat gtggcacaaa ttttggatag tcgcatgaat 2820
actaaatacg atgaaaatga taaacttatt cgagaggtta aagtgattac cttaaaatct 2880
aaattagttt ctgacttccg aaaagatttc caattctata aagtacgtga gattaacaat 2940
taccatcatg cccatgatgc gtatctaaat gccgtcgttg gaactgcttt gattaagaaa 3000
tatccaaaac ttgaatcgga gtttgtctat ggtgattata aagtttatga tgttcgtaaa 3060
atgattgcta agtctgagca agaaataggc aaagcaaccg caaaatattt cttttactct 3120
aatatcatga acttcttcaa aacagaaatt acacttgcaa atggagagat tcgcaaacgc 3180
cctctaatcg aaactaatgg ggaaactgga gaaattgtct gggataaagg gcgagatttt 3240
gccacagtgc gcaaagtatt gtccatgccc caagtcaata ttgtcaagaa aacagaagta 3300
cagacaggcg gattctccaa ggagtcaatt ttaccaaaaa gaaattcgga caagcttatt 3360
gctcgtaaaa aagactggga tccaaaaaaa tatggtggtt ttgatagtcc aacggtagct 3420
tattcagtcc tagtggttgc taaggtggaa aaagggaaat cgaagaagtt aaaatccgtt 3480
aaagagttac tagggatcac aattatggaa agaagttcct ttgaaaaaaa tccgattgac 3540
tttttagaag ctaaaggata taaggaagtt aaaaaagact taatcattaa actacctaaa 3600
tatagtcttt ttgagttaga aaacggtcgt aaacggatgc tggctagtgc cggagaatta 3660
caaaaaggaa atgagctggc tctgccaagc aaatatgtga attttttata tttagctagt 3720
cattatgaaa agttgaaggg tagtccagaa gataacgaac aaaaacaatt gtttgttgag 3780
cagcataagc attatttaga tgagattatt gagcaaatca gtgaattttc taagcgtgtt 3840
attttagcag atgccaattt agataaagtt cttagtgcat ataacaaaca tagagacaaa 3900
ccaatacgtg aacaagcaga aaatattatt catttattta cgttgacgaa tcttggagct 3960
cccgctgctt ttaaatattt tgatacaaca attgatcgta aacgatatac gtctacaaaa 4020
gaagttttag atgccactct tatccatcaa tccatcactg gtctttatga aacacgcatt 4080
gatttgagtc agctaggagg tgactga 4107
<210> 19
<211> 5214
<212> DNA
<213> Artificial Sequence
<220>
<223> Fusion NLS-APOBEC1-XTEN-nCas9-UGI-NLS nucleotide sequence
<400> 19
atgccaaaga agaagaggaa ggtttcatcg gagaccggcc ctgttgctgt tgaccccacc 60
ctgcggcgga gaatcgagcc acacgagttc gaggtgttct tcgacccaag ggagctccgc 120
aaggagacgt gcctcctgta cgagatcaac tggggcggca ggcactccat ctggaggcac 180
accagccaaa acaccaacaa gcacgtggag gtcaacttca tcgagaagtt caccaccgag 240
aggtacttct gcccaaacac ccgctgctcc atcacctggt tcctgtcctg gagcccatgc 300
ggcgagtgct ccagggccat caccgagttc ctcagccgct acccacacgt caccctgttc 360
atctacatcg ccaggctcta ccaccacgcc gacccaagga acaggcaggg cctccgcgac 420
ctgatctcca gcggcgtgac catccaaatc atgaccgagc aggagtccgg ctactgctgg 480
aggaacttcg tcaactactc cccaagcaac gaggcccact ggccaaggta cccacacctc 540
tgggtgcgcc tctacgtgct cgagctgtac tgcatcatcc tcggcctgcc accatgcctc 600
aacatcctga ggcgcaagca accacagctg accttcttca ccatcgccct ccaaagctgc 660
cactaccaga ggctcccacc acacatcctg tgggctaccg gcctcaagtc cggcagcgag 720
acgccaggca cctccgagag cgctacgcct gaacttaagg acaagaagta ctcgatcggc 780
ctcgccatcg ggacgaactc agttggctgg gccgtgatca ccgacgagta caaggtgccc 840
tctaagaagt tcaaggtcct ggggaacacc gaccgccatt ccatcaagaa gaacctcatc 900
ggcgctctcc tgttcgacag cggggagacc gctgaggcta cgaggctcaa gagaaccgct 960
aggcgccggt acacgagaag gaagaacagg atctgctacc tccaagagat tttctccaac 1020
gagatggcca aggttgacga ttcattcttc caccgcctgg aggagtcttt cctcgtggag 1080
gaggataaga agcacgagcg gcatcccatc ttcggcaaca tcgtggacga ggttgcctac 1140
cacgagaagt accctacgat ctaccatctg cggaagaagc tcgtggactc caccgataag 1200
gcggacctca gactgatcta cctcgctctg gcccacatga tcaagttccg cggccatttc 1260
ctgatcgagg gggatctcaa cccagacaac agcgatgttg acaagctgtt catccaactc 1320
gtgcagacct acaaccaact cttcgaggag aacccgatca acgcctctgg cgtggacgcg 1380
aaggctatcc tgtccgcgag gctctcgaag tccaggaggc tggagaacct gatcgctcag 1440
ctcccaggcg agaagaagaa cggcctgttc gggaacctca tcgctctcag cctggggctc 1500
accccgaact tcaagtcgaa cttcgatctc gctgaggacg ccaagctgca actctccaag 1560
gacacctacg acgatgacct cgataacctc ctggcccaga tcggcgatca atacgcggac 1620
ctgttcctcg ctgccaagaa cctgtcggac gccatcctcc tgtcagatat cctccgcgtg 1680
aacaccgaga tcacgaaggc tccactctct gcctccatga tcaagcgcta cgacgagcac 1740
catcaggatc tgaccctcct gaaggcgctg gtccgccaac agctcccgga gaagtacaag 1800
gagattttct tcgatcagtc gaagaacggc tacgctgggt acatcgacgg cggggcctca 1860
caagaggagt tctacaagtt catcaagcca atcctggaga agatggacgg cacggaggag 1920
ctcctggtga agctcaacag ggaggacctc ctgcggaagc agagaacctt cgataacggc 1980
agcatccccc accaaatcca tctcggggag ctgcacgcca tcctgagaag gcaagaggac 2040
ttctaccctt tcctcaagga taaccgggag aagatcgaga agatcctgac cttcagaatc 2100
ccatactacg tcggccctct cgcgcggggg aactcaagat tcgcttggat gacccgcaag 2160
tctgaggaga ccatcacgcc gtggaacttc gaggaggtgg tggacaaggg cgctagcgct 2220
cagtcgttca tcgagaggat gaccaacttc gacaagaacc tgcccaacga gaaggtgctc 2280
cctaagcact cgctcctgta cgagtacttc accgtctaca acgagctcac gaaggtgaag 2340
tacgtcaccg agggcatgcg caagccagcg ttcctgtccg gggagcagaa gaaggctatc 2400
gtggacctcc tgttcaagac caaccggaag gtcacggtta agcaactcaa ggaggactac 2460
ttcaagaaga tcgagtgctt cgattcggtc gagatcagcg gcgttgagga ccgcttcaac 2520
gccagcctcg ggacctacca cgatctcctg aagatcatca aggataagga cttcctggac 2580
aacgaggaga acgaggatat cctggaggac atcgtgctga ccctcacgct gttcgaggac 2640
agggagatga tcgaggagcg cctgaagacg tacgcccatc tcttcgatga caaggtcatg 2700
aagcaactca agcgccggag atacaccggc tgggggaggc tgtcccgcaa gctcatcaac 2760
ggcatccggg acaagcagtc cgggaagacc atcctcgact tcctcaagag cgatggcttc 2820
gccaacagga acttcatgca actgatccac gatgacagcc tcaccttcaa ggaggatatc 2880
caaaaggctc aagtgagcgg ccagggggac tcgctgcacg agcatatcgc gaacctcgct 2940
ggctcccccg cgatcaagaa gggcatcctc cagaccgtga aggttgtgga cgagctcgtg 3000
aaggtcatgg gccggcacaa gcctgagaac atcgtcatcg agatggccag agagaaccaa 3060
accacgcaga aggggcaaaa gaactctagg gagcgcatga agcgcatcga ggagggcatc 3120
aaggagctgg ggtcccaaat cctcaaggag cacccagtgg agaacaccca actgcagaac 3180
gagaagctct acctgtacta cctccagaac ggcagggata tgtacgtgga ccaagagctg 3240
gatatcaacc gcctcagcga ttacgacgtc gatcatatcg ttccccagtc tttcctgaag 3300
gatgactcca tcgacaacaa ggtcctcacc aggtcggaca agaaccgcgg caagtcagat 3360
aacgttccat ctgaggaggt cgttaagaag atgaagaact actggaggca gctcctgaac 3420
gccaagctga tcacgcaaag gaagttcgac aacctcacca aggctgagag aggcgggctc 3480
tcagagctgg acaaggccgg cttcatcaag cggcagctgg tcgagaccag acaaatcacg 3540
aagcacgttg cgcaaatcct cgactctcgg atgaacacga agtacgatga gaacgacaag 3600
ctgatcaggg aggttaaggt gatcaccctg aagtctaagc tcgtctccga cttcaggaag 3660
gatttccagt tctacaaggt tcgcgagatc aacaactacc accatgccca tgacgcttac 3720
ctcaacgctg tggtcggcac cgctctgatc aagaagtacc caaagctgga gtccgagttc 3780
gtgtacgggg actacaaggt ttacgatgtg cgcaagatga tcgccaagtc ggagcaagag 3840
atcggcaagg ctaccgccaa gtacttcttc tactcaaaca tcatgaactt cttcaagacc 3900
gagatcacgc tggccaacgg cgagatccgg aagagaccgc tcatcgagac caacggcgag 3960
acgggggaga tcgtgtggga caagggcagg gatttcgcga ccgtccgcaa ggttctctcc 4020
atgccccagg tgaacatcgt caagaagacc gaggtccaaa cgggcgggtt ctcaaaggag 4080
tctatcctgc ctaagcggaa cagcgacaag ctcatcgcca gaaagaagga ctgggaccca 4140
aagaagtacg gcgggttcga cagccctacc gtggcctact cggtcctggt tgtggcgaag 4200
gttgagaagg gcaagtccaa gaagctcaag agcgtgaagg agctcctggg gatcaccatc 4260
atggagaggt ccagcttcga gaagaaccca atcgacttcc tggaggccaa gggctacaag 4320
gaggtgaaga aggacctgat catcaagctc ccgaagtact ctctcttcga gctggagaac 4380
ggcaggaaga gaatgctggc ttccgctggc gagctccaga aggggaacga gctcgcgctg 4440
ccaagcaagt acgtgaactt cctctacctg gcttcccact acgagaagct caagggcagc 4500
ccggaggaca acgagcaaaa gcagctgttc gtcgagcagc acaagcatta cctcgacgag 4560
atcatcgagc aaatctccga gttcagcaag cgcgtgatcc tcgccgacgc gaacctggat 4620
aaggtcctct ccgcctacaa caagcaccgg gacaagccca tcagagagca agcggagaac 4680
atcatccatc tcttcaccct gacgaacctc ggcgctcctg ctgctttcaa gtacttcgac 4740
accacgatcg atcggaagag atacacctcc acgaaggagg tcctggacgc gaccctcatc 4800
caccagtcga tcaccggcct gtacgagacg aggatcgacc tctcacaact cggcggggat 4860
aagagacccg cagcaaccaa gaaggcaggg caagcaaaga agaagaagac gcgtgactcc 4920
ggcggcagca ccaacctgtc cgacatcatc gagaaggaga cgggcaagca actcgtgatc 4980
caggagagca tcctcatgct gccagaggag gtggaggagg tcatcggcaa caagccagag 5040
tccgacatcc tggtgcacac cgcctacgac gagtccaccg acgagaacgt catgctcctg 5100
accagcgacg ccccagagta caagccatgg gccctcgtca tccaggacag caacggggag 5160
aacaagatca agatgctgtc gggggggagc ccaaagaaga agcggaaggt gtag 5214
<210> 20
<211> 5214
<212> DNA
<213> Artificial Sequence
<220>
<223> Fusion NLS-APOBEC1-XTEN-dCas9-UGI-NLS nucleotide sequence
<400> 20
atgccaaaga agaagaggaa ggtttcatcg gagaccggcc ctgttgctgt tgaccccacc 60
ctgcggcgga gaatcgagcc acacgagttc gaggtgttct tcgacccaag ggagctccgc 120
aaggagacgt gcctcctgta cgagatcaac tggggcggca ggcactccat ctggaggcac 180
accagccaaa acaccaacaa gcacgtggag gtcaacttca tcgagaagtt caccaccgag 240
aggtacttct gcccaaacac ccgctgctcc atcacctggt tcctgtcctg gagcccatgc 300
ggcgagtgct ccagggccat caccgagttc ctcagccgct acccacacgt caccctgttc 360
atctacatcg ccaggctcta ccaccacgcc gacccaagga acaggcaggg cctccgcgac 420
ctgatctcca gcggcgtgac catccaaatc atgaccgagc aggagtccgg ctactgctgg 480
aggaacttcg tcaactactc cccaagcaac gaggcccact ggccaaggta cccacacctc 540
tgggtgcgcc tctacgtgct cgagctgtac tgcatcatcc tcggcctgcc accatgcctc 600
aacatcctga ggcgcaagca accacagctg accttcttca ccatcgccct ccaaagctgc 660
cactaccaga ggctcccacc acacatcctg tgggctaccg gcctcaagtc cggcagcgag 720
acgccaggca cctccgagag cgctacgcct gaacttaagg acaagaagta ctcgatcggc 780
ctcgccatcg ggacgaactc agttggctgg gccgtgatca ccgacgagta caaggtgccc 840
tctaagaagt tcaaggtcct ggggaacacc gaccgccatt ccatcaagaa gaacctcatc 900
ggcgctctcc tgttcgacag cggggagacc gctgaggcta cgaggctcaa gagaaccgct 960
aggcgccggt acacgagaag gaagaacagg atctgctacc tccaagagat tttctccaac 1020
gagatggcca aggttgacga ttcattcttc caccgcctgg aggagtcttt cctcgtggag 1080
gaggataaga agcacgagcg gcatcccatc ttcggcaaca tcgtggacga ggttgcctac 1140
cacgagaagt accctacgat ctaccatctg cggaagaagc tcgtggactc caccgataag 1200
gcggacctca gactgatcta cctcgctctg gcccacatga tcaagttccg cggccatttc 1260
ctgatcgagg gggatctcaa cccagacaac agcgatgttg acaagctgtt catccaactc 1320
gtgcagacct acaaccaact cttcgaggag aacccgatca acgcctctgg cgtggacgcg 1380
aaggctatcc tgtccgcgag gctctcgaag tccaggaggc tggagaacct gatcgctcag 1440
ctcccaggcg agaagaagaa cggcctgttc gggaacctca tcgctctcag cctggggctc 1500
accccgaact tcaagtcgaa cttcgatctc gctgaggacg ccaagctgca actctccaag 1560
gacacctacg acgatgacct cgataacctc ctggcccaga tcggcgatca atacgcggac 1620
ctgttcctcg ctgccaagaa cctgtcggac gccatcctcc tgtcagatat cctccgcgtg 1680
aacaccgaga tcacgaaggc tccactctct gcctccatga tcaagcgcta cgacgagcac 1740
catcaggatc tgaccctcct gaaggcgctg gtccgccaac agctcccgga gaagtacaag 1800
gagattttct tcgatcagtc gaagaacggc tacgctgggt acatcgacgg cggggcctca 1860
caagaggagt tctacaagtt catcaagcca atcctggaga agatggacgg cacggaggag 1920
ctcctggtga agctcaacag ggaggacctc ctgcggaagc agagaacctt cgataacggc 1980
agcatccccc accaaatcca tctcggggag ctgcacgcca tcctgagaag gcaagaggac 2040
ttctaccctt tcctcaagga taaccgggag aagatcgaga agatcctgac cttcagaatc 2100
ccatactacg tcggccctct cgcgcggggg aactcaagat tcgcttggat gacccgcaag 2160
tctgaggaga ccatcacgcc gtggaacttc gaggaggtgg tggacaaggg cgctagcgct 2220
cagtcgttca tcgagaggat gaccaacttc gacaagaacc tgcccaacga gaaggtgctc 2280
cctaagcact cgctcctgta cgagtacttc accgtctaca acgagctcac gaaggtgaag 2340
tacgtcaccg agggcatgcg caagccagcg ttcctgtccg gggagcagaa gaaggctatc 2400
gtggacctcc tgttcaagac caaccggaag gtcacggtta agcaactcaa ggaggactac 2460
ttcaagaaga tcgagtgctt cgattcggtc gagatcagcg gcgttgagga ccgcttcaac 2520
gccagcctcg ggacctacca cgatctcctg aagatcatca aggataagga cttcctggac 2580
aacgaggaga acgaggatat cctggaggac atcgtgctga ccctcacgct gttcgaggac 2640
agggagatga tcgaggagcg cctgaagacg tacgcccatc tcttcgatga caaggtcatg 2700
aagcaactca agcgccggag atacaccggc tgggggaggc tgtcccgcaa gctcatcaac 2760
ggcatccggg acaagcagtc cgggaagacc atcctcgact tcctcaagag cgatggcttc 2820
gccaacagga acttcatgca actgatccac gatgacagcc tcaccttcaa ggaggatatc 2880
caaaaggctc aagtgagcgg ccagggggac tcgctgcacg agcatatcgc gaacctcgct 2940
ggctcccccg cgatcaagaa gggcatcctc cagaccgtga aggttgtgga cgagctcgtg 3000
aaggtcatgg gccggcacaa gcctgagaac atcgtcatcg agatggccag agagaaccaa 3060
accacgcaga aggggcaaaa gaactctagg gagcgcatga agcgcatcga ggagggcatc 3120
aaggagctgg ggtcccaaat cctcaaggag cacccagtgg agaacaccca actgcagaac 3180
gagaagctct acctgtacta cctccagaac ggcagggata tgtacgtgga ccaagagctg 3240
gatatcaacc gcctcagcga ttacgacgtc gatgctatcg ttccccagtc tttcctgaag 3300
gatgactcca tcgacaacaa ggtcctcacc aggtcggaca agaaccgcgg caagtcagat 3360
aacgttccat ctgaggaggt cgttaagaag atgaagaact actggaggca gctcctgaac 3420
gccaagctga tcacgcaaag gaagttcgac aacctcacca aggctgagag aggcgggctc 3480
tcagagctgg acaaggccgg cttcatcaag cggcagctgg tcgagaccag acaaatcacg 3540
aagcacgttg cgcaaatcct cgactctcgg atgaacacga agtacgatga gaacgacaag 3600
ctgatcaggg aggttaaggt gatcaccctg aagtctaagc tcgtctccga cttcaggaag 3660
gatttccagt tctacaaggt tcgcgagatc aacaactacc accatgccca tgacgcttac 3720
ctcaacgctg tggtcggcac cgctctgatc aagaagtacc caaagctgga gtccgagttc 3780
gtgtacgggg actacaaggt ttacgatgtg cgcaagatga tcgccaagtc ggagcaagag 3840
atcggcaagg ctaccgccaa gtacttcttc tactcaaaca tcatgaactt cttcaagacc 3900
gagatcacgc tggccaacgg cgagatccgg aagagaccgc tcatcgagac caacggcgag 3960
acgggggaga tcgtgtggga caagggcagg gatttcgcga ccgtccgcaa ggttctctcc 4020
atgccccagg tgaacatcgt caagaagacc gaggtccaaa cgggcgggtt ctcaaaggag 4080
tctatcctgc ctaagcggaa cagcgacaag ctcatcgcca gaaagaagga ctgggaccca 4140
aagaagtacg gcgggttcga cagccctacc gtggcctact cggtcctggt tgtggcgaag 4200
gttgagaagg gcaagtccaa gaagctcaag agcgtgaagg agctcctggg gatcaccatc 4260
atggagaggt ccagcttcga gaagaaccca atcgacttcc tggaggccaa gggctacaag 4320
gaggtgaaga aggacctgat catcaagctc ccgaagtact ctctcttcga gctggagaac 4380
ggcaggaaga gaatgctggc ttccgctggc gagctccaga aggggaacga gctcgcgctg 4440
ccaagcaagt acgtgaactt cctctacctg gcttcccact acgagaagct caagggcagc 4500
ccggaggaca acgagcaaaa gcagctgttc gtcgagcagc acaagcatta cctcgacgag 4560
atcatcgagc aaatctccga gttcagcaag cgcgtgatcc tcgccgacgc gaacctggat 4620
aaggtcctct ccgcctacaa caagcaccgg gacaagccca tcagagagca agcggagaac 4680
atcatccatc tcttcaccct gacgaacctc ggcgctcctg ctgctttcaa gtacttcgac 4740
accacgatcg atcggaagag atacacctcc acgaaggagg tcctggacgc gaccctcatc 4800
caccagtcga tcaccggcct gtacgagacg aggatcgacc tctcacaact cggcggggat 4860
aagagacccg cagcaaccaa gaaggcaggg caagcaaaga agaagaagac gcgtgactcc 4920
ggcggcagca ccaacctgtc cgacatcatc gagaaggaga cgggcaagca actcgtgatc 4980
caggagagca tcctcatgct gccagaggag gtggaggagg tcatcggcaa caagccagag 5040
tccgacatcc tggtgcacac cgcctacgac gagtccaccg acgagaacgt catgctcctg 5100
accagcgacg ccccagagta caagccatgg gccctcgtca tccaggacag caacggggag 5160
aacaagatca agatgctgtc gggggggagc ccaaagaaga agcggaaggt gtag 5214
<210> 21
<211> 1368
<212> PRT
<213> Artificial Sequence
<220>
<223> WT spCas9
<400> 21
Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala
1010 1015 1020
Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe
1025 1030 1035
Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala
1040 1045 1050
Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu
1055 1060 1065
Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val
1070 1075 1080
Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr
1085 1090 1095
Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys
1100 1105 1110
Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro
1115 1120 1125
Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1130 1135 1140
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys
1145 1150 1155
Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser
1160 1165 1170
Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys
1175 1180 1185
Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu
1190 1195 1200
Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly
1205 1210 1215
Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val
1220 1225 1230
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys
1250 1255 1260
His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys
1265 1270 1275
Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala
1280 1285 1290
Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn
1295 1300 1305
Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala
1310 1315 1320
Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser
1325 1330 1335
Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr
1340 1345 1350
Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
<210> 22
<211> 1737
<212> PRT
<213> Artificial Sequence
<220>
<223> Fusion NLS-APOBEC1-XTEN-nCas9-UGI-NLS
<400> 22
Met Pro Lys Lys Lys Arg Lys Val Ser Ser Glu Thr Gly Pro Val Ala
1 5 10 15
Val Asp Pro Thr Leu Arg Arg Arg Ile Glu Pro His Glu Phe Glu Val
20 25 30
Phe Phe Asp Pro Arg Glu Leu Arg Lys Glu Thr Cys Leu Leu Tyr Glu
35 40 45
Ile Asn Trp Gly Gly Arg His Ser Ile Trp Arg His Thr Ser Gln Asn
50 55 60
Thr Asn Lys His Val Glu Val Asn Phe Ile Glu Lys Phe Thr Thr Glu
65 70 75 80
Arg Tyr Phe Cys Pro Asn Thr Arg Cys Ser Ile Thr Trp Phe Leu Ser
85 90 95
Trp Ser Pro Cys Gly Glu Cys Ser Arg Ala Ile Thr Glu Phe Leu Ser
100 105 110
Arg Tyr Pro His Val Thr Leu Phe Ile Tyr Ile Ala Arg Leu Tyr His
115 120 125
His Ala Asp Pro Arg Asn Arg Gln Gly Leu Arg Asp Leu Ile Ser Ser
130 135 140
Gly Val Thr Ile Gln Ile Met Thr Glu Gln Glu Ser Gly Tyr Cys Trp
145 150 155 160
Arg Asn Phe Val Asn Tyr Ser Pro Ser Asn Glu Ala His Trp Pro Arg
165 170 175
Tyr Pro His Leu Trp Val Arg Leu Tyr Val Leu Glu Leu Tyr Cys Ile
180 185 190
Ile Leu Gly Leu Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro
195 200 205
Gln Leu Thr Phe Phe Thr Ile Ala Leu Gln Ser Cys His Tyr Gln Arg
210 215 220
Leu Pro Pro His Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Ser Glu
225 230 235 240
Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Leu Lys Asp Lys Lys
245 250 255
Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val
260 265 270
Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly
275 280 285
Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu
290 295 300
Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala
305 310 315 320
Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu
325 330 335
Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg
340 345 350
Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His
355 360 365
Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr
370 375 380
Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys
385 390 395 400
Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe
405 410 415
Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp
420 425 430
Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe
435 440 445
Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu
450 455 460
Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln
465 470 475 480
Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu
485 490 495
Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu
500 505 510
Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp
515 520 525
Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala
530 535 540
Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val
545 550 555 560
Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg
565 570 575
Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg
580 585 590
Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys
595 600 605
Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe
610 615 620
Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu
625 630 635 640
Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr
645 650 655
Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His
660 665 670
Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn
675 680 685
Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val
690 695 700
Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys
705 710 715 720
Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys
725 730 735
Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys
740 745 750
Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu
755 760 765
Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu
770 775 780
Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile
785 790 795 800
Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu
805 810 815
Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile
820 825 830
Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp
835 840 845
Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn
850 855 860
Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp
865 870 875 880
Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp
885 890 895
Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly
900 905 910
Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly
915 920 925
Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn
930 935 940
Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile
945 950 955 960
Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile
965 970 975
Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr
980 985 990
Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro
995 1000 1005
Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln
1010 1015 1020
Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu
1025 1030 1035
Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
1040 1045 1050
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
1055 1060 1065
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn
1070 1075 1080
Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe
1085 1090 1095
Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp
1100 1105 1110
Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val
1115 1120 1125
Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu
1130 1135 1140
Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly
1145 1150 1155
Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu
1160 1165 1170
Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp
1175 1180 1185
Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
1190 1195 1200
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe
1205 1210 1215
Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr
1220 1225 1230
His His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala
1235 1240 1245
Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly
1250 1255 1260
Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu
1265 1270 1275
Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn
1280 1285 1290
Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu
1295 1300 1305
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu
1310 1315 1320
Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val
1325 1330 1335
Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln
1340 1345 1350
Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser
1355 1360 1365
Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr
1370 1375 1380
Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val
1385 1390 1395
Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys
1400 1405 1410
Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys
1415 1420 1425
Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1430 1435 1440
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu
1445 1450 1455
Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln
1460 1465 1470
Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu
1475 1480 1485
Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp
1490 1495 1500
Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu
1505 1510 1515
Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile
1520 1525 1530
Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys
1535 1540 1545
His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His
1550 1555 1560
Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr
1565 1570 1575
Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu
1580 1585 1590
Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr
1595 1600 1605
Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro
1610 1615 1620
Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Thr Arg
1625 1630 1635
Asp Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu
1640 1645 1650
Thr Gly Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro
1655 1660 1665
Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile
1670 1675 1680
Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met
1685 1690 1695
Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val
1700 1705 1710
Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly
1715 1720 1725
Gly Ser Pro Lys Lys Lys Arg Lys Val
1730 1735
<210> 23
<211> 1737
<212> PRT
<213> Artificial Sequence
<220>
<223> Fusion NLS-APOBEC1-XTEN-dCas9-UGI-NLS
<400> 23
Met Pro Lys Lys Lys Arg Lys Val Ser Ser Glu Thr Gly Pro Val Ala
1 5 10 15
Val Asp Pro Thr Leu Arg Arg Arg Ile Glu Pro His Glu Phe Glu Val
20 25 30
Phe Phe Asp Pro Arg Glu Leu Arg Lys Glu Thr Cys Leu Leu Tyr Glu
35 40 45
Ile Asn Trp Gly Gly Arg His Ser Ile Trp Arg His Thr Ser Gln Asn
50 55 60
Thr Asn Lys His Val Glu Val Asn Phe Ile Glu Lys Phe Thr Thr Glu
65 70 75 80
Arg Tyr Phe Cys Pro Asn Thr Arg Cys Ser Ile Thr Trp Phe Leu Ser
85 90 95
Trp Ser Pro Cys Gly Glu Cys Ser Arg Ala Ile Thr Glu Phe Leu Ser
100 105 110
Arg Tyr Pro His Val Thr Leu Phe Ile Tyr Ile Ala Arg Leu Tyr His
115 120 125
His Ala Asp Pro Arg Asn Arg Gln Gly Leu Arg Asp Leu Ile Ser Ser
130 135 140
Gly Val Thr Ile Gln Ile Met Thr Glu Gln Glu Ser Gly Tyr Cys Trp
145 150 155 160
Arg Asn Phe Val Asn Tyr Ser Pro Ser Asn Glu Ala His Trp Pro Arg
165 170 175
Tyr Pro His Leu Trp Val Arg Leu Tyr Val Leu Glu Leu Tyr Cys Ile
180 185 190
Ile Leu Gly Leu Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro
195 200 205
Gln Leu Thr Phe Phe Thr Ile Ala Leu Gln Ser Cys His Tyr Gln Arg
210 215 220
Leu Pro Pro His Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Ser Glu
225 230 235 240
Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Leu Lys Asp Lys Lys
245 250 255
Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val
260 265 270
Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly
275 280 285
Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu
290 295 300
Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala
305 310 315 320
Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu
325 330 335
Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg
340 345 350
Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His
355 360 365
Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr
370 375 380
Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys
385 390 395 400
Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe
405 410 415
Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp
420 425 430
Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe
435 440 445
Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu
450 455 460
Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln
465 470 475 480
Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu
485 490 495
Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu
500 505 510
Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp
515 520 525
Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala
530 535 540
Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val
545 550 555 560
Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg
565 570 575
Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg
580 585 590
Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys
595 600 605
Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe
610 615 620
Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu
625 630 635 640
Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr
645 650 655
Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His
660 665 670
Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn
675 680 685
Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val
690 695 700
Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys
705 710 715 720
Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys
725 730 735
Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys
740 745 750
Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu
755 760 765
Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu
770 775 780
Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile
785 790 795 800
Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu
805 810 815
Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile
820 825 830
Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp
835 840 845
Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn
850 855 860
Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp
865 870 875 880
Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp
885 890 895
Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly
900 905 910
Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly
915 920 925
Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn
930 935 940
Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile
945 950 955 960
Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile
965 970 975
Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr
980 985 990
Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro
995 1000 1005
Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln
1010 1015 1020
Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu
1025 1030 1035
Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
1040 1045 1050
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
1055 1060 1065
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn
1070 1075 1080
Arg Leu Ser Asp Tyr Asp Val Asp Ala Ile Val Pro Gln Ser Phe
1085 1090 1095
Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp
1100 1105 1110
Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val
1115 1120 1125
Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu
1130 1135 1140
Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly
1145 1150 1155
Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu
1160 1165 1170
Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp
1175 1180 1185
Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
1190 1195 1200
Glu Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe
1205 1210 1215
Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr
1220 1225 1230
His His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala
1235 1240 1245
Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly
1250 1255 1260
Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu
1265 1270 1275
Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn
1280 1285 1290
Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu
1295 1300 1305
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu
1310 1315 1320
Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val
1325 1330 1335
Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln
1340 1345 1350
Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser
1355 1360 1365
Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr
1370 1375 1380
Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val
1385 1390 1395
Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys
1400 1405 1410
Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys
1415 1420 1425
Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
1430 1435 1440
Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu
1445 1450 1455
Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln
1460 1465 1470
Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu
1475 1480 1485
Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp
1490 1495 1500
Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu
1505 1510 1515
Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile
1520 1525 1530
Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys
1535 1540 1545
His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His
1550 1555 1560
Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr
1565 1570 1575
Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu
1580 1585 1590
Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr
1595 1600 1605
Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro
1610 1615 1620
Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Thr Arg
1625 1630 1635
Asp Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu
1640 1645 1650
Thr Gly Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro
1655 1660 1665
Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile
1670 1675 1680
Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met
1685 1690 1695
Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val
1700 1705 1710
Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly
1715 1720 1725
Gly Ser Pro Lys Lys Lys Arg Lys Val
1730 1735
<210> 24
<211> 474
<212> DNA
<213> Artificial Sequence
<220>
<223> ZmCENH3 encoding sequence
<400> 24
atggctcgaa ccaagcacca ggccgtgagg aagacggcgg agaagcccaa gaagaagctc 60
cagttcgagc gctcaggtgg tgcgagtacc tcggcgacgc cggaaagggc tgctgggacc 120
gggggaagag cggcgtctgg aggtgactca gttaagaaga cgaaaccacg ccaccgctgg 180
cggccaggga ctgtagcgct gcgggagatc aggaagtacc agaagtccac tgaaccgctc 240
atcccctttg cgcctttcgt ccgtgtggtg agggagttaa ccaatttcgt aacaaacggg 300
aaagtagagc gctataccgc agaagccctc cttgcgctgc aagaggcagc agaattccac 360
ttgatagaac tgtttgaaat ggcgaatctg tgtgccatcc atgccaagcg tgtcacaatc 420
atgcaaaagg acatacaact tgcaaggcgt atcggaggaa ggcgttgggc atga 474
<210> 25
<211> 157
<212> PRT
<213> Artificial Sequence
<220>
<223> ZmCENH3
<400> 25
Met Ala Arg Thr Lys His Gln Ala Val Arg Lys Thr Ala Glu Lys Pro
1 5 10 15
Lys Lys Lys Leu Gln Phe Glu Arg Ser Gly Gly Ala Ser Thr Ser Ala
20 25 30
Thr Pro Glu Arg Ala Ala Gly Thr Gly Gly Arg Ala Ala Ser Gly Gly
35 40 45
Asp Ser Val Lys Lys Thr Lys Pro Arg His Arg Trp Arg Pro Gly Thr
50 55 60
Val Ala Leu Arg Glu Ile Arg Lys Tyr Gln Lys Ser Thr Glu Pro Leu
65 70 75 80
Ile Pro Phe Ala Pro Phe Val Arg Val Val Arg Glu Leu Thr Asn Phe
85 90 95
Val Thr Asn Gly Lys Val Glu Arg Tyr Thr Ala Glu Ala Leu Leu Ala
100 105 110
Leu Gln Glu Ala Ala Glu Phe His Leu Ile Glu Leu Phe Glu Met Ala
115 120 125
Asn Leu Cys Ala Ile His Ala Lys Arg Val Thr Ile Met Gln Lys Asp
130 135 140
Ile Gln Leu Ala Arg Arg Ile Gly Gly Arg Arg Trp Ala
145 150 155
<210> 26
<211> 1917
<212> DNA
<213> Artificial Sequence
<220>
<223> ZmALS1 encoding sequence
<400> 26
atggccaccg ccgccaccgc ggccgccgcg ctcaccggcg ccactaccgc tacgcccaag 60
tcgaggcgcc gagcccacca cttggccacc cggcgcgccc tcgccgcgcc catcaggtgc 120
tcagcgttgt cacgcgccac gccgacggct cccccggcca ctccgctacg tccgtggggc 180
cccaacgagc cccgcaaggg ctccgacatc ctcgtcgagg ctctcgagcg ctgtggcgtc 240
cgtgacgtct tcgcctaccc cggcggcgca tccatggaga tccaccaggc actcacccgc 300
tcccccgtca tcgccaacca cctcttccgc cacgaacaag gggaggcctt cgccgcctcc 360
ggctacgcgc gctcctcggg ccgcgttggc gtctgcatcg ccacctccgg ccccggcgcc 420
accaacctag tctctgcgct cgcagacgcg ttgctcgact ccgtccccat tgtcgccatc 480
acgggacagg tgccgcgacg catgattggc accgacgcct ttcaggagac gcccatcgtc 540
gaggtcaccc gctccatcac caagcacaac tacctggtcc tcgacgtcga cgacatcccc 600
cgcgtcgtgc aggaggcctt cttcctcgca tcctctggtc gcccggggcc ggtgcttgtt 660
gacatcccca aggacatcca gcagcagatg gcggtgccgg cctgggacac gcccatgagt 720
ctgcctgggt acatcgcgcg ccttcccaag cctcccgcga ctgaatttct tgagcaggtg 780
ctgcgtcttg ttggtgaatc acggcgccct gttctttatg ttggcggtgg ctgtgcagca 840
tcaggtgagg agttgtgccg ctttgtggag ttgactggaa tcccagtcac aactactctt 900
atgggccttg gcaacttccc cagcgacgac ccactgtcac tgcgcatgct tggtatgcat 960
ggcacagtgt atgcaaatta tgcagtggat aaggccgatc tgttgcttgc atttggtgtg 1020
cggtttgatg atcgtgtgac agggaaaatt gaggcttttg caggcagagc taagattgtg 1080
cacattgata ttgatcctgc tgagattggc aagaacaagc agccacatgt gtccatctgt 1140
gcagatgtta agcttgcttt gcagggcatg aatactcttc tggaaggaag cacatcaaag 1200
aagagctttg acttcggctc atggcatgat gaattggatc agcaaaagcg ggagtttccc 1260
cttgggtata aaatcttcaa tgaggaaatc cagccacaat atgctattca ggttcttgat 1320
gagttgacga aggggaaggc catcattgcc acaggtgttg ggcagcacca gatgtgggcg 1380
gcacagtatt acacttacaa gcggccaagg cagtggctgt cttcagctgg tcttggggct 1440
atgggatttg gtttgccggc tgctgctggt gctgctgtgg ccaacccagg tgtcactgtt 1500
gttgacatcg acggagatgg tagcttcctc atgaacattc aggagctagc tatgatccgt 1560
attgagaacc tcccagtcaa ggtctttgtg ctaaacaacc agcacctcgg gatggtggtg 1620
cagtgggagg acaggttcta taaggccaat agagcacaca cattcttggg aaacccagag 1680
aacgaaagtg agatatatcc agattttgtg gcaattgcca aagggttcaa cattccagca 1740
gtccgtgtga caaagaagag cgaagtccat gcagcaatca agaagatgct tgaggctcca 1800
gggccgtacc tcttggatat aatcgtcccg caccaggagc atgtgttgcc tatgatccct 1860
agtggtgggg ctttcaagga tatgatcctg gatggtgatg gcaggactgt gtattga 1917
<210> 27
<211> 638
<212> PRT
<213> Artificial Sequence
<220>
<223> ZmALS1
<400> 27
Met Ala Thr Ala Ala Thr Ala Ala Ala Ala Leu Thr Gly Ala Thr Thr
1 5 10 15
Ala Thr Pro Lys Ser Arg Arg Arg Ala His His Leu Ala Thr Arg Arg
20 25 30
Ala Leu Ala Ala Pro Ile Arg Cys Ser Ala Leu Ser Arg Ala Thr Pro
35 40 45
Thr Ala Pro Pro Ala Thr Pro Leu Arg Pro Trp Gly Pro Asn Glu Pro
50 55 60
Arg Lys Gly Ser Asp Ile Leu Val Glu Ala Leu Glu Arg Cys Gly Val
65 70 75 80
Arg Asp Val Phe Ala Tyr Pro Gly Gly Ala Ser Met Glu Ile His Gln
85 90 95
Ala Leu Thr Arg Ser Pro Val Ile Ala Asn His Leu Phe Arg His Glu
100 105 110
Gln Gly Glu Ala Phe Ala Ala Ser Gly Tyr Ala Arg Ser Ser Gly Arg
115 120 125
Val Gly Val Cys Ile Ala Thr Ser Gly Pro Gly Ala Thr Asn Leu Val
130 135 140
Ser Ala Leu Ala Asp Ala Leu Leu Asp Ser Val Pro Ile Val Ala Ile
145 150 155 160
Thr Gly Gln Val Pro Arg Arg Met Ile Gly Thr Asp Ala Phe Gln Glu
165 170 175
Thr Pro Ile Val Glu Val Thr Arg Ser Ile Thr Lys His Asn Tyr Leu
180 185 190
Val Leu Asp Val Asp Asp Ile Pro Arg Val Val Gln Glu Ala Phe Phe
195 200 205
Leu Ala Ser Ser Gly Arg Pro Gly Pro Val Leu Val Asp Ile Pro Lys
210 215 220
Asp Ile Gln Gln Gln Met Ala Val Pro Ala Trp Asp Thr Pro Met Ser
225 230 235 240
Leu Pro Gly Tyr Ile Ala Arg Leu Pro Lys Pro Pro Ala Thr Glu Phe
245 250 255
Leu Glu Gln Val Leu Arg Leu Val Gly Glu Ser Arg Arg Pro Val Leu
260 265 270
Tyr Val Gly Gly Gly Cys Ala Ala Ser Gly Glu Glu Leu Cys Arg Phe
275 280 285
Val Glu Leu Thr Gly Ile Pro Val Thr Thr Thr Leu Met Gly Leu Gly
290 295 300
Asn Phe Pro Ser Asp Asp Pro Leu Ser Leu Arg Met Leu Gly Met His
305 310 315 320
Gly Thr Val Tyr Ala Asn Tyr Ala Val Asp Lys Ala Asp Leu Leu Leu
325 330 335
Ala Phe Gly Val Arg Phe Asp Asp Arg Val Thr Gly Lys Ile Glu Ala
340 345 350
Phe Ala Gly Arg Ala Lys Ile Val His Ile Asp Ile Asp Pro Ala Glu
355 360 365
Ile Gly Lys Asn Lys Gln Pro His Val Ser Ile Cys Ala Asp Val Lys
370 375 380
Leu Ala Leu Gln Gly Met Asn Thr Leu Leu Glu Gly Ser Thr Ser Lys
385 390 395 400
Lys Ser Phe Asp Phe Gly Ser Trp His Asp Glu Leu Asp Gln Gln Lys
405 410 415
Arg Glu Phe Pro Leu Gly Tyr Lys Ile Phe Asn Glu Glu Ile Gln Pro
420 425 430
Gln Tyr Ala Ile Gln Val Leu Asp Glu Leu Thr Lys Gly Lys Ala Ile
435 440 445
Ile Ala Thr Gly Val Gly Gln His Gln Met Trp Ala Ala Gln Tyr Tyr
450 455 460
Thr Tyr Lys Arg Pro Arg Gln Trp Leu Ser Ser Ala Gly Leu Gly Ala
465 470 475 480
Met Gly Phe Gly Leu Pro Ala Ala Ala Gly Ala Ala Val Ala Asn Pro
485 490 495
Gly Val Thr Val Val Asp Ile Asp Gly Asp Gly Ser Phe Leu Met Asn
500 505 510
Ile Gln Glu Leu Ala Met Ile Arg Ile Glu Asn Leu Pro Val Lys Val
515 520 525
Phe Val Leu Asn Asn Gln His Leu Gly Met Val Val Gln Trp Glu Asp
530 535 540
Arg Phe Tyr Lys Ala Asn Arg Ala His Thr Phe Leu Gly Asn Pro Glu
545 550 555 560
Asn Glu Ser Glu Ile Tyr Pro Asp Phe Val Ala Ile Ala Lys Gly Phe
565 570 575
Asn Ile Pro Ala Val Arg Val Thr Lys Lys Ser Glu Val His Ala Ala
580 585 590
Ile Lys Lys Met Leu Glu Ala Pro Gly Pro Tyr Leu Leu Asp Ile Ile
595 600 605
Val Pro His Gln Glu His Val Leu Pro Met Ile Pro Ser Gly Gly Ala
610 615 620
Phe Lys Asp Met Ile Leu Asp Gly Asp Gly Arg Thr Val Tyr
625 630 635
<210> 28
<211> 1917
<212> DNA
<213> Artificial Sequence
<220>
<223> ZmALS2 encoding sequence
<400> 28
atggccaccg ccgccgccgc gtctaccgcg ctcactggcg ccactaccgc tgcgcccaag 60
gcgaggcgcc gggcgcacct cctggccacc cgccgcgccc tcgccgcgcc catcaggtgc 120
tcagcggcgt cacccgccat gccgatggct cccccggcca ccccgctccg gccgtggggc 180
cccaccgatc cccgcaaggg cgccgacatc ctcgtcgagt ccctcgagcg ctgcggcgtc 240
cgcgacgtct tcgcctaccc cggcggcgcg tccatggaga tccaccaggc actcacccgc 300
tcccccgtca tcgccaacca cctcttccgc cacgagcaag gggaggcctt tgcggcctcc 360
ggctacgcgc gctcctcggg ccgcgtcggc gtctgcatcg ccacctccgg ccccggcgcc 420
accaaccttg tctccgcgct cgccgacgcg ctgctcgatt ccgtccccat ggtcgccatc 480
acgggacagg tgccgcgacg catgattggc accgacgcct tccaggagac gcccatcgtc 540
gaggtcaccc gctccatcac caagcacaac tacctggtcc tcgacgtcga cgacatcccc 600
cgcgtcgtgc aggaggcttt cttcctcgcc tcctctggtc gaccggggcc ggtgcttgtc 660
gacatcccca aggacatcca gcagcagatg gcggtgcctg tctgggacaa gcccatgagt 720
ctgcctgggt acattgcgcg ccttcccaag ccccctgcga ctgagttgct tgagcaggtg 780
ctgcgtcttg ttggtgaatc ccggcgccct gttctttatg ttggcggtgg ctgcgcagca 840
tctggtgagg agttgcgacg ctttgtggag ctgactggaa tcccggtcac aactactctt 900
atgggcctcg gcaacttccc cagcgacgac ccactgtctc tgcgcatgct aggtatgcat 960
ggcacggtgt atgcaaatta tgcagtggat aaggccgatc tgttgcttgc acttggtgtg 1020
cggtttgatg atcgtgtgac agggaagatt gaggcttttg caagcagggc taagattgtg 1080
cacgttgata ttgatccggc tgagattggc aagaacaagc agccacatgt gtccatctgt 1140
gcagatgtta agcttgcttt gcagggcatg aatgctcttc ttgaaggaag cacatcaaag 1200
aagagctttg actttggctc atggaacgat gagttggatc agcagaagag ggaattcccc 1260
cttgggtata aaacatctaa tgaggagatc cagccacaat atgctattca ggttcttgat 1320
gagctgacga aaggcgaggc catcatcggc acaggtgttg ggcagcacca gatgtgggcg 1380
gcacagtact acacttacaa gcggccaagg cagtggttgt cttcagctgg tcttggggct 1440
atgggatttg gtttgccggc tgctgctggt gcttctgtgg ccaacccagg tgttactgtt 1500
gttgacatcg atggagatgg tagctttctc atgaacgttc aggagctagc tatgatccga 1560
attgagaacc tcccggtgaa ggtctttgtg ctaaacaacc agcacctggg gatggtggtg 1620
cagtgggagg acaggttcta taaggccaac agagcgcaca catacttggg aaacccagag 1680
aatgaaagtg agatatatcc agatttcgtg acgatcgcca aagggttcaa cattccagcg 1740
gtccgtgtga caaagaagaa cgaagtccgc gcagcgataa agaagatgct cgagactcca 1800
gggccgtacc tcttggatat aatcgtccca caccaggagc atgtgttgcc tatgatccct 1860
agtggtgggg ctttcaagga tatgatcctg gatggtgatg gcaggactgt gtactga 1917
<210> 29
<211> 638
<212> PRT
<213> Artificial Sequence
<220>
<223> ZmALS2
<400> 29
Met Ala Thr Ala Ala Ala Ala Ser Thr Ala Leu Thr Gly Ala Thr Thr
1 5 10 15
Ala Ala Pro Lys Ala Arg Arg Arg Ala His Leu Leu Ala Thr Arg Arg
20 25 30
Ala Leu Ala Ala Pro Ile Arg Cys Ser Ala Ala Ser Pro Ala Met Pro
35 40 45
Met Ala Pro Pro Ala Thr Pro Leu Arg Pro Trp Gly Pro Thr Asp Pro
50 55 60
Arg Lys Gly Ala Asp Ile Leu Val Glu Ser Leu Glu Arg Cys Gly Val
65 70 75 80
Arg Asp Val Phe Ala Tyr Pro Gly Gly Ala Ser Met Glu Ile His Gln
85 90 95
Ala Leu Thr Arg Ser Pro Val Ile Ala Asn His Leu Phe Arg His Glu
100 105 110
Gln Gly Glu Ala Phe Ala Ala Ser Gly Tyr Ala Arg Ser Ser Gly Arg
115 120 125
Val Gly Val Cys Ile Ala Thr Ser Gly Pro Gly Ala Thr Asn Leu Val
130 135 140
Ser Ala Leu Ala Asp Ala Leu Leu Asp Ser Val Pro Met Val Ala Ile
145 150 155 160
Thr Gly Gln Val Pro Arg Arg Met Ile Gly Thr Asp Ala Phe Gln Glu
165 170 175
Thr Pro Ile Val Glu Val Thr Arg Ser Ile Thr Lys His Asn Tyr Leu
180 185 190
Val Leu Asp Val Asp Asp Ile Pro Arg Val Val Gln Glu Ala Phe Phe
195 200 205
Leu Ala Ser Ser Gly Arg Pro Gly Pro Val Leu Val Asp Ile Pro Lys
210 215 220
Asp Ile Gln Gln Gln Met Ala Val Pro Val Trp Asp Lys Pro Met Ser
225 230 235 240
Leu Pro Gly Tyr Ile Ala Arg Leu Pro Lys Pro Pro Ala Thr Glu Leu
245 250 255
Leu Glu Gln Val Leu Arg Leu Val Gly Glu Ser Arg Arg Pro Val Leu
260 265 270
Tyr Val Gly Gly Gly Cys Ala Ala Ser Gly Glu Glu Leu Arg Arg Phe
275 280 285
Val Glu Leu Thr Gly Ile Pro Val Thr Thr Thr Leu Met Gly Leu Gly
290 295 300
Asn Phe Pro Ser Asp Asp Pro Leu Ser Leu Arg Met Leu Gly Met His
305 310 315 320
Gly Thr Val Tyr Ala Asn Tyr Ala Val Asp Lys Ala Asp Leu Leu Leu
325 330 335
Ala Leu Gly Val Arg Phe Asp Asp Arg Val Thr Gly Lys Ile Glu Ala
340 345 350
Phe Ala Ser Arg Ala Lys Ile Val His Val Asp Ile Asp Pro Ala Glu
355 360 365
Ile Gly Lys Asn Lys Gln Pro His Val Ser Ile Cys Ala Asp Val Lys
370 375 380
Leu Ala Leu Gln Gly Met Asn Ala Leu Leu Glu Gly Ser Thr Ser Lys
385 390 395 400
Lys Ser Phe Asp Phe Gly Ser Trp Asn Asp Glu Leu Asp Gln Gln Lys
405 410 415
Arg Glu Phe Pro Leu Gly Tyr Lys Thr Ser Asn Glu Glu Ile Gln Pro
420 425 430
Gln Tyr Ala Ile Gln Val Leu Asp Glu Leu Thr Lys Gly Glu Ala Ile
435 440 445
Ile Gly Thr Gly Val Gly Gln His Gln Met Trp Ala Ala Gln Tyr Tyr
450 455 460
Thr Tyr Lys Arg Pro Arg Gln Trp Leu Ser Ser Ala Gly Leu Gly Ala
465 470 475 480
Met Gly Phe Gly Leu Pro Ala Ala Ala Gly Ala Ser Val Ala Asn Pro
485 490 495
Gly Val Thr Val Val Asp Ile Asp Gly Asp Gly Ser Phe Leu Met Asn
500 505 510
Val Gln Glu Leu Ala Met Ile Arg Ile Glu Asn Leu Pro Val Lys Val
515 520 525
Phe Val Leu Asn Asn Gln His Leu Gly Met Val Val Gln Trp Glu Asp
530 535 540
Arg Phe Tyr Lys Ala Asn Arg Ala His Thr Tyr Leu Gly Asn Pro Glu
545 550 555 560
Asn Glu Ser Glu Ile Tyr Pro Asp Phe Val Thr Ile Ala Lys Gly Phe
565 570 575
Asn Ile Pro Ala Val Arg Val Thr Lys Lys Asn Glu Val Arg Ala Ala
580 585 590
Ile Lys Lys Met Leu Glu Thr Pro Gly Pro Tyr Leu Leu Asp Ile Ile
595 600 605
Val Pro His Gln Glu His Val Leu Pro Met Ile Pro Ser Gly Gly Ala
610 615 620
Phe Lys Asp Met Ile Leu Asp Gly Asp Gly Arg Thr Val Tyr
625 630 635
<210> 30
<211> 7
<212> PRT
<213> Artificial Sequence
<220>
<223> N terminal NLS
<400> 30
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 31
<211> 11
<212> PRT
<213> Artificial Sequence
<220>
<223> C terminal NLS
<400> 31
Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1 5 10
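Each "encoding sequence" entry in the listing above is paired with a protein entry. For example, SEQ ID NO:24 encodes SEQ ID NO:25 (ZmCENH3). The following is a minimal Python sketch for checking such pairings. It assumes the standard genetic code (NCBI translation table 1) and the ST.25-style layout used here: lowercase DNA in blocks of ten followed by a position number, and three-letter residues with separate numbering lines. The helper names are illustrative and do not come from the patent. The excerpts are copied from SEQ ID NO:24, 25, 23 and 31.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Three-letter residue codes as written in the <400> protein entries.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing_dna(text: str) -> str:
    """Keep only nucleotides, dropping spaces and trailing position numbers."""
    return "".join(ch for ch in text.upper() if ch in "ACGT")


def parse_listing_protein(text: str) -> str:
    """Map three-letter residues to one-letter code, skipping numbering lines."""
    return "".join(THREE_TO_ONE[tok] for tok in text.split() if tok in THREE_TO_ONE)


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to (not including) the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First line of SEQ ID NO:24 versus residues 1-20 of SEQ ID NO:25 (ZmCENH3).
cds_start = "atggctcgaa ccaagcacca ggccgtgagg aagacggcgg agaagcccaa gaagaagctc 60"
prot_start = """Met Ala Arg Thr Lys His Gln Ala Val Arg Lys Thr Ala Glu Lys Pro
1 5 10 15
Lys Lys Lys Leu"""
print(translate(parse_listing_dna(cds_start)) == parse_listing_protein(prot_start))  # True

# Last line of SEQ ID NO:24 (nt 421-474) is in frame and ends with the TGA stop.
cds_end = "atgcaaaagg acatacaact tgcaaggcgt atcggaggaa ggcgttgggc atga 474"
print(translate(parse_listing_dna(cds_end)))  # MQKDIQLARRIGGRRWA

# The C-terminal residues of SEQ ID NO:23 match the NLS of SEQ ID NO:31.
fusion_tail = parse_listing_protein(
    "Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly 1715 1720 1725 "
    "Gly Ser Pro Lys Lys Lys Arg Lys Val 1730 1735")
nls_c = parse_listing_protein("Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val")
print(fusion_tail.endswith(nls_c))  # True
```

These checks cover only short excerpts. They confirm the reading frame and numbering conventions, not the full-length sequences. Complete entries can be pasted into the same helpers.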

Claims (29)

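1. Use of a base editing system for base editing of a target sequence in a plant genome, wherein the system comprises at least one of the following i) to v):
i) a base editing fusion protein, and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding a base editing fusion protein, and a guide RNA;
iii) a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding a base editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding a base editing fusion protein and a nucleotide sequence encoding a guide RNA;
wherein the base editing fusion protein comprises a nuclease-inactivated Cas9 domain, a deaminase domain and a uracil DNA glycosylase inhibitor (UGI), and the guide RNA is capable of targeting the base editing fusion protein to the target sequence in the plant genome;
wherein the deaminase is a deaminase of the apolipoprotein B mRNA editing complex (APOBEC) family; and
wherein the nuclease-inactivated Cas9 comprises the amino acid substitution D10A and/or H840A relative to wild-type Cas9.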
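2. The use of claim 1, wherein the deaminase is an APOBEC1 deaminase or an activation-induced cytidine deaminase (AID).
3. The use of claim 2, wherein the amino acid sequence of the deaminase is as shown in SEQ ID NO:11.
4. The use of claim 1, wherein the amino acid sequence of the nuclease-inactivated Cas9 is as shown in SEQ ID NO:13 or 14.
5. The use of claim 1, wherein the deaminase domain is fused to the N-terminus of the nuclease-inactivated Cas9 domain, or wherein the deaminase domain is fused to the C-terminus of the nuclease-inactivated Cas9 domain.
6. The use of claim 1, wherein the deaminase domain and the nuclease-inactivated Cas9 domain are fused via a linker.
7. The use of claim 6, wherein the linker is the XTEN linker shown in SEQ ID NO:12.
8. The use of claim 1, wherein the amino acid sequence of the uracil DNA glycosylase inhibitor is as shown in SEQ ID NO:15.
9. The use of claim 1, wherein the base editing fusion protein further comprises a nuclear localization sequence (NLS) at its N-terminus and/or C-terminus.
10. The use of claim 9, wherein the amino acid sequence of the NLS is as shown in SEQ ID NO:30 or 31.
11. The use of claim 1, wherein the amino acid sequence of the base editing fusion protein is as shown in SEQ ID NO:22 or 23.
12. The use of claim 1, wherein the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the plant to be base edited.
13. The use of claim 1, wherein the nucleotide sequence encoding the base editing fusion protein is as shown in SEQ ID NO:19 or 20.
14. The use of claim 1, wherein the guide RNA is a single guide RNA (sgRNA).
15. The use of claim 1, wherein the nucleotide sequence encoding the base editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to a plant expression regulatory element.
16. The use of claim 15, wherein the regulatory element is a promoter.
17. The use of claim 16, wherein the promoter is selected from the 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter or the maize U3 promoter.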
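18. A method for producing a genetically modified plant, comprising introducing the system of any one of claims 1-17 into a plant, whereby the guide RNA targets the base editing fusion protein to the target sequence in the genome of the plant, resulting in one or more C in the target sequence being replaced by T.
19. The method of claim 18, wherein the introduction is performed in the absence of selection pressure.
20. The method of claim 18 or 19, wherein one or more C within positions 3 to 9 of the target sequence are replaced by T.
21. The method of claim 18 or 19, further comprising screening for plants having the desired nucleotide substitution.
22. The method of claim 18 or 19, wherein the plant is selected from monocotyledonous and dicotyledonous plants.
23. The method of claim 22, wherein the plant is a crop plant.
24. The method of claim 23, wherein the crop plant is selected from wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.
25. The method of claim 18 or 19, wherein the target sequence is associated with a plant trait, whereby the base editing causes the plant to have an altered trait relative to a wild-type plant.
26. The method of claim 25, wherein the plant trait is an agronomic trait.
27. The method of claim 18 or 19, wherein the system is introduced into the plant by a method selected from: particle bombardment, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, virus-mediated transformation, the pollen-tube pathway method and ovary injection.
28. The method of claim 18 or 19, further comprising obtaining progeny of the genetically modified plant.
29. A plant breeding method, comprising crossing a genetically modified first plant obtained by the method of any one of claims 18-28 with a second plant that does not contain the genetic modification, thereby introducing the genetic modification into the second plant.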
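Claim 18 describes the outcome as one or more C in the target sequence being replaced by T, and claim 20 narrows this to Cs within positions 3 to 9. The Python sketch below enumerates which products such a window permits for a given protospacer. It assumes, since the claims do not say so, that positions are counted 1-based from the PAM-distal 5' end of a 20-nt protospacer written on the strand that matches the sgRNA spacer. That is the convention commonly used for Cas9-derived cytidine base editors. The protospacer is invented for illustration, and the sketch predicts only which products are possible, not how often each occurs.

```python
from itertools import combinations


def window_cytosines(protospacer: str, start: int = 3, end: int = 9) -> list[int]:
    """1-based positions of C inside the editing window [start, end].

    Assumption: position 1 is the PAM-distal 5' nucleotide of the protospacer,
    written on the strand that has the same sequence as the sgRNA spacer.
    """
    seq = protospacer.upper()
    return [pos for pos in range(start, min(end, len(seq)) + 1) if seq[pos - 1] == "C"]


def apply_c_to_t(protospacer: str, positions) -> str:
    """Return the protospacer with C->T at the given 1-based positions."""
    seq = list(protospacer.upper())
    for pos in positions:
        if seq[pos - 1] != "C":
            raise ValueError(f"position {pos} is {seq[pos - 1]}, not C")
        seq[pos - 1] = "T"
    return "".join(seq)


def possible_products(protospacer: str, start: int = 3, end: int = 9) -> list[str]:
    """All outcomes in which one or more window Cs are converted to T."""
    sites = window_cytosines(protospacer, start, end)
    return [apply_c_to_t(protospacer, combo)
            for k in range(1, len(sites) + 1)
            for combo in combinations(sites, k)]


# Hypothetical 20-nt protospacer (not from the patent); the PAM is not included.
spacer = "GACCTGCATCGACGTACGTA"
print(window_cytosines(spacer))                        # [3, 4, 7]
print(len(possible_products(spacer)))                  # 7
print(apply_c_to_t(spacer, window_cytosines(spacer)))  # GATTTGTATCGACGTACGTA
```

The C at position 10 falls outside the assumed window and is therefore never edited in this model. Passing different `start` and `end` values lets the same helpers model a narrower or wider window.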
CN201711122179.8A 2016-11-14 2017-11-14 Plant base editing method Active CN108070611B (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610998842X 2016-11-14
CN201610998842 2016-11-14

Publications (2)

Publication Number Publication Date
CN108070611A CN108070611A (zh) 2018-05-25
CN108070611B true CN108070611B (zh) 2021-06-29

Family

ID=62109117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711122179.8A Active CN108070611B (zh) 2016-11-14 2017-11-14 Plant base editing method

Country Status (11)

Country Link
US (1) US11447785B2 (zh)
EP (1) EP3538661A4 (zh)
JP (1) JP2019523011A (zh)
KR (1) KR20190039430A (zh)
CN (1) CN108070611B (zh)
AR (1) AR110075A1 (zh)
AU (1) AU2017358264A1 (zh)
BR (1) BR112019002455A2 (zh)
CA (1) CA3043774A1 (zh)
EA (1) EA201990815A1 (zh)
WO (1) WO2018086623A1 (zh)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3221516A1 (en) * 2013-08-22 2015-02-26 E. I. Du Pont De Nemours And Company Plant genome modification using guide rna/cas endonuclease systems and methods of use
US9526784B2 (en) * 2013-09-06 2016-12-27 President And Fellows Of Harvard College Delivery system for functional nucleases
US11584936B2 (en) * 2014-06-12 2023-02-21 King Abdullah University Of Science And Technology Targeted viral-mediated plant genome editing using CRISPR /Cas9
JP6434235B2 (ja) * 2014-07-03 2018-12-05 Bridgestone Corporation Tire
WO2016183438A1 (en) 2015-05-14 2016-11-17 Massachusetts Institute Of Technology Self-targeting genome editing system
CN106609282A (zh) 2016-12-02 2017-05-03 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences A vector for site-directed base substitution in plant genomes

Patent Citations (6)

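Publication number Priority date Publication date Assignee Title
CN105934516A (zh) * 2013-12-12 2016-09-07 President and Fellows of Harvard College Cas variants for gene editing
WO2015133554A1 (ja) * 2014-03-05 2015-09-11 National University Corporation Kobe University Method for modifying a genome sequence to specifically convert a nucleobase of a targeted DNA sequence, and molecular complex used therein
CN106459957A (zh) * 2014-03-05 2017-02-22 National University Corporation Kobe University Method for modifying a genome sequence to specifically convert a nucleobase of a targeted DNA sequence, and molecular complex used therein
CN108513575A (zh) * 2015-10-23 2018-09-07 President and Fellows of Harvard College Nucleobase editors and uses thereof
CN107043779A (zh) * 2016-12-01 2017-08-15 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences Application of CRISPR/nCas9-mediated site-directed base substitution in plants
CN106834341A (zh) * 2016-12-30 2017-06-13 China Agricultural University Gene site-directed mutagenesis vector, construction method and application thereof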

Non-Patent Citations (2)

Title
Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage; Alexis C. Komor et al.; Nature; 2016-04-20; Vol. 533, No. 7603; pp. 420-424 *
Study on the application of the base editing system BE3 in the silkworm; Li Yufeng; China Master's Theses Full-text Database, Basic Sciences; 2019-01-15 (No. 01); A006-773 *

Also Published As

Publication number Publication date
CA3043774A1 (en) 2018-05-17
AR110075A1 (es) 2019-02-20
JP2019523011A (ja) 2019-08-22
EA201990815A1 (ru) 2019-09-30
WO2018086623A1 (en) 2018-05-17
EP3538661A1 (en) 2019-09-18
AU2017358264A1 (en) 2019-02-21
CN108070611A (zh) 2018-05-25
US20190292553A1 (en) 2019-09-26
BR112019002455A2 (pt) 2019-06-25
EP3538661A4 (en) 2020-04-15
US11447785B2 (en) 2022-09-20
KR20190039430A (ko) 2019-04-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20180525

Assignee: Shanghai Blue Cross Medical Science Research Institute

Assignor: INSTITUTE OF GENETICS AND DEVELOPMENTAL BIOLOGY, CHINESE ACADEMY OF SCIENCES

Contract record no.: X2022990000347

Denomination of invention: Plant base editing method

Granted publication date: 20210629

License type: Common License

Record date: 20220705

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20180525

Assignee: Suzhou Qihe Biotechnology Co.,Ltd.

Assignor: Shanghai Blue Cross Medical Science Research Institute

Contract record no.: X2023990000162

Denomination of invention: Plant base editing method

Granted publication date: 20210629

License type: Common License

Record date: 20230117