WO2020241869A1 - GENOME EDITING SYSTEM USING Cas PROTEIN HAVING TWO TYPES OF NUCLEIC ACID BASE-CONVERTING ENZYMES FUSED THERETO - Google Patents

GENOME EDITING SYSTEM USING Cas PROTEIN HAVING TWO TYPES OF NUCLEIC ACID BASE-CONVERTING ENZYMES FUSED THERETO

Publication number
WO2020241869A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
base
editing
dna
cells
Application number
PCT/JP2020/021456
Other languages
French (fr)
Japanese (ja)
Inventor
望 谷内江
宗 石黒
莉奈 坂田
秀人 森
Original Assignee
国立大学法人東京大学
Application filed by 国立大学法人東京大学
Publication of WO2020241869A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K19/00: Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/62: DNA sequences coding for fusion proteins
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/78: Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)

Definitions

  • The present invention relates to a genome editing system using a Cas protein to which two types of nucleobase converting enzymes are fused. More specifically, the present invention relates to a fusion protein in which different nucleobase converting enzymes are bound to the N-terminal and C-terminal of a Cas protein, wherein the Cas protein has lost part or all of its nuclease activity.
  • the present invention also relates to a polynucleotide encoding the fusion protein, an expression vector containing the polynucleotide, a genome editing system containing the polynucleotide, and a method for producing a DNA-edited cell using the system.
  • The genome, which is the blueprint of life, is composed of a DNA base sequence written in the four letters A, T, G, and C.
  • Genome editing technology that can freely edit a specific target sequence in intracellular genomic DNA has developed rapidly and is bringing about great changes across the whole field of biology, including agriculture and medicine.
  • A Cas9 protein having DNA cleaving activity is recruited by a guide RNA (gRNA) to a target DNA region located upstream of the 3' protospacer adjacent motif (PAM), where it induces a double-stranded DNA break (DSB).
  • This method has been widely applied as a genome editing tool, for example to knock out a specific gene through error-prone DNA repair or to insert foreign DNA into a chromosome through DSB-induced homologous recombination.
  • However, cytotoxicity due to double-stranded DNA cleavage and limited editing accuracy have been problems.
  • Base editing tools have therefore been developed in which a deaminase is fused with a mutant Cas9 that does not have DNA cleaving activity. More specifically, it has been shown that tethering a deoxynucleoside deaminase to a complex consisting of a nuclease-deficient or nickase Cas9 (dCas9 or nCas9) and a gRNA induces efficient and direct base substitution in the genomic sequence (Non-Patent Document 3).
  • CBEs: cytosine base editors
  • ABEs: adenine base editors (Non-Patent Documents 6 and 7)
  • CBEs are usually composed of a cytidine deaminase and a uracil glycosylase inhibitor (UGI).
  • The cytidine deaminase, such as rAPOBEC1 used in BEs (Non-Patent Document 5) or PmCDA1 used in Target-AID (Non-Patent Document 4), converts cytidine in the DNA strand not bound by the gRNA into uridine.
  • UGI, in turn, inhibits base excision repair. Used together, they make it possible to replace cytosine with uracil and then with thymine through DNA repair.
  • In ABEs, for example, a heterodimer complex consisting of wild-type and modified TadA adenosine deaminase is used.
  • The enzyme converts adenine to inosine, which is then replicated as guanine (Non-Patent Document 6).
  • In this way, C in the target DNA can be converted to T, or A to G, with high efficiency (C→T or A→G) through a deamination reaction without cutting the DNA.
  • Expectations are therefore rising for base editing as an excellent genome editing tool in areas such as crop breeding and gene therapy.
  • However, the base substitution patterns available with conventional base editing tools have been limited to two types, C→T or A→G.
  • the present invention has been made in view of the above-mentioned problems of the prior art, and an object of the present invention is to provide a base editing tool capable of simultaneously performing a plurality of types of base substitutions.
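The difference between single-function editing (C→T only or A→G only) and the simultaneous editing targeted here can be illustrated with a small, highly idealized sketch. It assumes that every eligible base inside an editing window is converted, whereas real editors act probabilistically. The 20-nt sequence and the window positions are hypothetical and are not taken from this document.

```python
def base_edit(protospacer: str, window: range, edits: dict[str, str]) -> str:
    """Apply the given substitutions to every matching base inside the window.

    Idealized: real base editors convert each base only with some probability.
    """
    return "".join(
        edits.get(base, base) if i in window else base
        for i, base in enumerate(protospacer)
    )


# Hypothetical target sequence and editing window (0-based positions 3 to 7).
TARGET = "GACCATGCAGTTCAGGCTAA"
WINDOW = range(3, 8)

cbe_result = base_edit(TARGET, WINDOW, {"C": "T"})             # cytosine editing only
abe_result = base_edit(TARGET, WINDOW, {"A": "G"})             # adenine editing only
dual_result = base_edit(TARGET, WINDOW, {"C": "T", "A": "G"})  # both at once

print(cbe_result)
print(abe_result)
print(dual_result)
```

Only the third call produces a product carrying both substitution types in the same molecule, which is the outcome a dual-function editor is meant to enable.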
  • To achieve the above object, the present inventors aimed to provide a base editing tool that enables simultaneous base substitution of both C→T and A→G, and first prepared three base editors, Target-ACE, Target-ACEmax, and ACBEmax, by fusing cytidine deaminase and adenosine deaminase to a single nCas9 (D10A).
  • In Target-ACE, PmCDA1 derived from Target-AID and the TadA heterodimer derived from ABE-7.10 are fused to the C-terminal and N-terminal of nCas9, respectively, together with the other functional domains present in the corresponding single-function base editors.
  • Target-ACEmax was constructed from Target-ACE by applying GenScript codon optimization and adding a binode-type nuclear localization signal (binode-type NLS) to the N-terminal.
  • ACBEmax was constructed by replacing the codon-optimized PmCDA1 domain in Target-ACEmax with the codon-optimized cytidine deaminase domain rAPOBEC1 derived from BE4max.
  • Target-ACEmax has the highest co-editing efficiency of heterologous bases.
  • Because off-target effects are generally regarded as a problem in genome editing technology, and the RNA off-target effect in particular is a concern in base editing technology, exome analysis and transcriptome analysis were performed. As a result, it was also revealed that Target-ACEmax does not unnecessarily enhance these off-target effects compared with conventional base editing tools.
  • the present invention relates to a fusion protein in which different nucleobase converting enzymes are bound to the N-terminal and C-terminal of the Cas protein, and the Cas protein loses part or all of the nuclease activity.
  • the present invention also relates to a polynucleotide encoding the fusion protein, an expression vector containing the polynucleotide, a genome editing system containing the polynucleotide, and a method for producing a DNA-edited cell using the system. Specifically, it provides the following.
  • <1> A fusion protein in which different nucleobase converting enzymes are bound to the N-terminal and C-terminal of a Cas protein, wherein the Cas protein has lost part or all of its nuclease activity.
  • <2> The fusion protein according to <1>, wherein the different nucleobase converting enzymes are adenosine deaminase and cytidine deaminase.
  • <3> The fusion protein according to <2>, wherein the adenosine deaminase is bound to the N-terminal of the Cas protein and the cytidine deaminase is bound to the C-terminal of the Cas protein.
  • <4> The fusion protein according to any one of <1> to <3>, to which at least one of a nuclear localization signal and a uracil glycosylase inhibitor is bound.
  • <5> A polynucleotide encoding the fusion protein according to any one of <1> to <4>.
  • <6> The polynucleotide according to <5>, wherein the codons are optimized according to the host cell into which it is to be introduced.
  • <7> An expression vector containing the polynucleotide according to <5> or <6>.
  • <8> A genome editing system including the following (A) and (B).
  • FIG. 1 is a schematic diagram showing the development lines of the dual-function base editors (Target-ACEmax, Target-ACE, ACBEmax) of the present invention.
  • the development line from the single-function base editor to the dual-function base editor is indicated by an arrow, and the combination of single-function base editors corresponding to the dual-function base editor is indicated by a broken line.
  • A schematic diagram of the C→T base editing reporter system is shown. Following C→T base editing on the antisense strand, DNA replication restores the translation of EGFP by converting the codon GTG (encoding valine) to the start codon ATG (encoding methionine).
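The reporter logic for the C→T case can be traced step by step in a short sketch. It models the deamination and subsequent replication as a direct C→T change on the antisense strand, which is a simplification, and the helper names are illustrative rather than taken from this document.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def deaminate_c(strand: str, position: int) -> str:
    """Model C->T editing (C->U, then read as T after replication) at one position."""
    if strand[position] != "C":
        raise ValueError("no cytosine at the requested position")
    return strand[:position] + "T" + strand[position + 1:]


sense_before = "GTG"                                  # valine codon, no translation start
antisense_before = reverse_complement(sense_before)   # antisense strand of the codon
antisense_after = deaminate_c(antisense_before, 2)    # edit the C pairing with the first G
sense_after = reverse_complement(antisense_after)     # codon seen on the sense strand

print(sense_before, "->", sense_after)
```

The edited cytosine is the one on the antisense strand that pairs with the first G of GTG, so the sense strand reads ATG once the edit is fixed by replication.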
  • A schematic diagram of the A→G base editing reporter system is shown.
  • FIG. 5 is a graph showing the frequency with which C→T base editing was detected by amplicon sequencing at the gRNA target site (-30 bp to +10 bp with respect to the PAM) of the C→T base editing reporter cells.
  • The results of three independent transfection experiments are indicated by dots, and their average value is indicated by a bar.
  • FIG. 5 is a graph showing the frequency with which C→T base editing was detected by amplicon sequencing at the gRNA target site (-30 bp to +10 bp with respect to the PAM) of the C→T base editing reporter cells.
  • The results of three independent transfection experiments are indicated by dots, and their average value is indicated by a bar.
  • FIG. 5 is a graph showing the frequency with which A→G base editing was detected by amplicon sequencing at the gRNA target site (-30 bp to +10 bp with respect to the PAM) of the A→G base editing reporter cells.
  • The results of three independent transfection experiments are indicated by dots, and their average value is indicated by a bar.
  • A graph showing the mean C→T and A→G base editing spectra at genomic target sites.
  • A graph showing the editing frequency of C→T or A→G at 47 endogenous target sites of the human genome.
  • The black horizontal bars represent the average editing spectrum of C→T or A→G.
  • A two-sided Mann-Whitney U test was performed to compare any pair of the two datasets obtained from three cell cultures.
  • The combinations of positions whose average co-editing frequency ranked in the top 5 under any base editing condition are shown.
  • Each bar shows the mean co-editing frequency measured at target sites bearing cytosine and adenine at each combination of positions in three cell cultures.
  • A two-sided Mann-Whitney U test was performed to compare Target-ACEmax with the corresponding enzyme mixture and with the other two dual-function base editors for mean co-editing frequency at combinations of positions with sufficient sample size.
  • A two-sided Welch's t-test was performed to compare each dual-function base editor with the corresponding single-function enzyme mixture. Arrows indicate datasets with higher mean scores than the others (** P < 0.01).
  • A graph showing the number and frequency of genome-wide C→U RNA mutations detected in cells subjected to various base editing conditions. Each bar shows the number of mutations identified by RNA-seq, and the jitter plot shows their mutant allele frequencies. The experiment was performed twice for each base editing condition.
  • A graph showing the number and frequency of genome-wide A→I RNA mutations detected in cells subjected to various base editing conditions. Each bar shows the number of mutations identified by RNA-seq, and the jitter plot shows their mutant allele frequencies. The experiment was performed twice for each base editing condition.
  • A diagram outlining the conditional probability model used to predict the frequencies of various base editing patterns in an arbitrary target DNA sequence for the various base editors. The dot plots at the bottom show the correlation between measured and predicted values.
  • A graph showing the bystander mutation risk score associated with codon conversion for various base editing methods. The horizontal bar represents the median risk score for the various codon conversion types represented by the dots. The region around 10^0 to 4×10^0 (shaded pink) indicates the range of the average risk scores of the single-function base editors other than BE4max (C).
  • the present invention provides a fusion protein in which different nucleobase converting enzymes are bound to the N-terminal and C-terminal of the Cas protein, wherein the Cas protein loses part or all of its nuclease activity.
  • the Cas protein in the fusion protein of the present invention has lost part or all of its nuclease activity.
  • a Cas protein that has lost part of its nuclease activity is called "nCas”
  • a Cas protein that has lost all of its nuclease activity is called "dCas”.
  • The Cas protein typically comprises a domain involved in cleavage of the target strand (RuvC domain) and a domain involved in cleavage of the non-target strand (HNH domain); in the Cas protein used in the present invention, the introduction of a mutation into at least one of these domains preferably results in the loss of nuclease activity in that domain.
  • In the case of the SpCas9 protein (Cas9 protein derived from S. pyogenes), such a mutation is, for example, a mutation of the 10th amino acid (aspartic acid) from the N-terminal to alanine (D10A: a mutation in the RuvC domain).
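The nCas/dCas naming above reduces to a simple rule over which nuclease domains are inactivated, sketched below. D10A (RuvC) comes from the text; H840A as the SpCas9 HNH-inactivating mutation is a commonly used example that does not appear in this section, and the function names are illustrative.

```python
NUCLEASE_DOMAINS = frozenset({"RuvC", "HNH"})

# D10A is named in the text; H840A is an assumption added for illustration.
SPCAS9_MUTATIONS = {"D10A": "RuvC", "H840A": "HNH"}


def classify_cas(inactivated: set[str]) -> str:
    """Name a Cas variant by how much nuclease activity it has lost."""
    unknown = inactivated - NUCLEASE_DOMAINS
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    if not inactivated:
        return "Cas"    # fully active nuclease
    if inactivated == NUCLEASE_DOMAINS:
        return "dCas"   # all nuclease activity lost
    return "nCas"       # part of the nuclease activity lost (nickase)


def classify_spcas9(mutations: list[str]) -> str:
    """Classify an SpCas9 variant from its list of point mutations."""
    return classify_cas({SPCAS9_MUTATIONS[m] for m in mutations})


print(classify_spcas9(["D10A"]))
print(classify_spcas9(["D10A", "H840A"]))
```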
  • Cas9 proteins of various origins are known (eg, WO2014 / 131833), and their nCas or dCas can be utilized.
  • The amino acid sequence and base sequence of the Cas9 protein are registered in public databases such as GenBank (http://www.ncbi.nlm.nih.gov) (for example, accession number: Q99ZW2.1), and these can be used in the present invention.
  • mutations to alter PAM recognition may be introduced into the Cas protein (Benjamin, P. et al., Nature 523, 481-485 (2015), Hirano, S. et al., Molecular Cell. 61, 886-894 (2016)).
  • Cas proteins other than Cas9, for example, Cpf1 (Cas12a), Cas12b, CasX (Cas12e), Cas14 and the like, can also be used.
  • A "nucleobase converting enzyme" refers to an enzyme that can convert a target nucleotide into another nucleotide without breaking the DNA strand, by catalyzing a reaction that converts a substituent on the purine or pyrimidine ring of a DNA base into another group or atom.
  • a “deaminase” is an enzyme that catalyzes a deamination reaction that converts an amino group of a base into a carbonyl group and belongs to the nucleic acid / nucleotide deaminase superfamily.
  • deaminase examples include cytosine deaminase capable of converting cytosine or 5-methylcytosine to uracil or thymine, adenosine deaminase capable of converting adenine to hypoxanthine, and guanosine deaminase capable of converting guanine to xanthine.
  • the origin of the nucleobase converting enzyme is not particularly limited, and examples thereof include mammals (eg, humans, pigs, cows, horses, monkeys), fish (eg, lampreys), and bacteria (eg, Escherichia coli).
  • the "different nucleobase converting enzyme" fused to the N-terminal and C-terminal of the Cas protein is not particularly limited, but a combination of adenosine deaminase and cytidine deaminase is preferable. Further, in the fusion protein, it is preferable that adenosine deaminase is bound to the N-terminal of the Cas protein and cytidine deaminase is bound to the C-terminal of the Cas protein.
  • adenosine deaminase examples include Escherichia coli-derived TadA and its variants.
  • A typical amino acid sequence of wild-type TadA is the amino acid sequence consisting of positions 218 to 383 of SEQ ID NO: 2, and a typical amino acid sequence of a variant thereof (TadA*) is the amino acid sequence consisting of positions 20 to 185 of SEQ ID NO: 2.
  • The fusion protein of the present invention desirably has both TadA and a variant thereof, from the viewpoint of making it easier to reduce off-target RNA editing while maintaining the activity of on-target DNA base substitution.
  • Examples of cytosine deaminase include PmCDA1 (Petromyzon marinus cytosine deaminase 1), APOBEC (APOBEC1, APOBEC2, APOBEC3 (APOBEC3A, 3B, 3C, 3D (3E), 3F, 3G, etc.), etc.), Anc689, which is an ancestral amino acid sequence of APOBEC, AID (activation-induced cytidine deaminase; AICDA) derived from mammals, members of the AID family, and variants thereof.
  • PmCDA1 is preferable from the viewpoint that it easily exhibits high catalytic activity when bound to the C-terminal of Cas protein.
  • a typical amino acid sequence of PmCDA1 includes the amino acid sequence consisting of positions 1876 to 2083 shown in SEQ ID NO: 2.
  • Examples of the amino acid sequence of APOBEC1 include the amino acid sequence at positions 1876 to 2103 shown in SEQ ID NO: 6.
  • At least one of the nuclear localization signal and the uracil glycosylase inhibitor is further bound to the "fusion protein" of the present invention.
  • The "nuclear localization signal" is not particularly limited as long as it is an amino acid sequence that serves as a marker for transporting the fusion protein of the present invention to the cell nucleus; it may be a sequence having one cluster of basic amino acids (mononode type) or a sequence in which two clusters of basic amino acids are linked via a spacer sequence (binode type).
  • the nuclear localization signal may be bound to at least one of the N-terminal and the C-terminal in the fusion protein of the present invention, or may be inserted between each domain.
  • The "uracil glycosylase inhibitor" according to the present invention may be any inhibitor that can suppress the degradation, by uracil DNA glycosylase, of uracil converted from cytosine by cytidine deaminase; an example is the amino acid sequence consisting of positions 2095 to 2177 of SEQ ID NO: 2.
  • The uracil glycosylase inhibitor may be bound to at least one of the N-terminal and the C-terminal of the fusion protein of the present invention, or may be inserted between the domains, as long as it exhibits the inhibitory activity; however, it is preferably bound to the C-terminal. Further, the number of uracil glycosylase inhibitors in the fusion protein of the present invention is not particularly limited as long as the inhibitory activity can be exhibited, and may be one or more than one (for example, 2, 3, 4, or 5).
  • The above-mentioned Cas protein, nucleobase converting enzyme, nuclear localization signal and uracil glycosylase inhibitor are arranged, for example, in the following order from the N-terminal: adenosine deaminase-Cas protein-cytidine deaminase-uracil glycosylase inhibitor.
  • A preferred embodiment of the fusion protein of the present invention includes, in order from the N-terminal, nuclear localization signal-adenosine deaminase-Cas protein-cytidine deaminase-nuclear localization signal-uracil glycosylase inhibitor; more preferably, in order from the N-terminal, binode nuclear localization signal-modified TadA (TadA*)-TadA-SpCas9 (D10A)-PmCDA1-mononode nuclear localization signal-uracil glycosylase inhibitor; and still more preferably, the protein (Target-ACEmax) consisting of the amino acid sequence set forth in SEQ ID NO: 2.
  • Another preferred embodiment of the fusion protein of the present invention includes, in order from the N-terminal, adenosine deaminase-Cas protein-cytidine deaminase-nuclear localization signal-uracil glycosylase inhibitor; more preferably, in order from the N-terminal, modified TadA (TadA*)-TadA-SpCas9 (D10A)-PmCDA1-mononode nuclear localization signal-uracil glycosylase inhibitor; and still more preferably, the protein (Target-ACE) consisting of the amino acid sequence set forth in SEQ ID NO: 4.
  • A further preferred embodiment of the fusion protein of the present invention includes, in order from the N-terminal, nuclear localization signal-adenosine deaminase-Cas protein-cytidine deaminase-nuclear localization signal-uracil glycosylase inhibitor-nuclear localization signal; more preferably, in order from the N-terminal, binode nuclear localization signal-modified TadA (TadA*)-TadA-SpCas9 (D10A)-APOBEC1-uracil glycosylase inhibitor-uracil glycosylase inhibitor-binode nuclear localization signal; and still more preferably, the protein (ACBEmax) consisting of the amino acid sequence set forth in SEQ ID NO: 6.
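The three domain orders described above can be written down as plain lists and checked against the preferred arrangement (adenosine deaminase on the N-terminal side of the Cas protein, cytidine deaminase on the C-terminal side). This is a minimal sketch: the domain labels are shorthand chosen here, linkers are omitted, and the lists follow the "more preferably" orders as read from the text.

```python
# Domain orders from N-terminal to C-terminal; labels are shorthand, linkers omitted.
ARCHITECTURES = {
    "Target-ACEmax": ["binode NLS", "TadA*", "TadA", "SpCas9(D10A)",
                      "PmCDA1", "mononode NLS", "UGI"],
    "Target-ACE": ["TadA*", "TadA", "SpCas9(D10A)",
                   "PmCDA1", "mononode NLS", "UGI"],
    "ACBEmax": ["binode NLS", "TadA*", "TadA", "SpCas9(D10A)",
                "APOBEC1", "UGI", "UGI", "binode NLS"],
}

ADENOSINE_DEAMINASES = {"TadA", "TadA*"}
CYTIDINE_DEAMINASES = {"PmCDA1", "APOBEC1"}


def deaminase_sides(domains: list[str]) -> tuple[bool, bool]:
    """Return (adenosine deaminase N-terminal to Cas, cytidine deaminase C-terminal to Cas)."""
    cas_index = next(i for i, d in enumerate(domains) if d.startswith("SpCas9"))
    n_side = set(domains[:cas_index])
    c_side = set(domains[cas_index + 1:])
    return bool(n_side & ADENOSINE_DEAMINASES), bool(c_side & CYTIDINE_DEAMINASES)


for name, domains in ARCHITECTURES.items():
    print(name, ":", " - ".join(domains), deaminase_sides(domains))
```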
  • The Cas protein, nucleobase converting enzyme, nuclear localization signal and uracil glycosylase inhibitor described above may be bound directly, or indirectly via a linker sequence.
  • The "linker sequence" is not particularly limited as long as the function of each of the proteins is not suppressed; examples include a linker composed of glycine and serine (GGS peptide, 2×GGS peptide, 3×GGS peptide, GS peptide, 2×GS peptide, 3×GS peptide, etc.).
  • only one of these linker sequences may be arranged between each protein, or a plurality of (for example, 2, 3, 4 or 5 sequences) may be arranged.
  • it may be composed of only one kind of linker sequence, or may be composed of a plurality of kinds of linker sequences.
  • The number of amino acids in the "linker sequence" arranged between the proteins is not particularly limited as long as the function of each protein is not suppressed, but is usually 1 to 300 amino acids. The lower limit is preferably 2 amino acids or more (for example, 3 amino acids or more, 4 amino acids or more, 5 amino acids or more, 6 amino acids or more, 7 amino acids or more, 8 amino acids or more, 9 amino acids or more), and more preferably 10 amino acids or more (for example, 20 amino acids or more, 30 amino acids or more, 40 amino acids or more, 50 amino acids or more, 60 amino acids or more, 70 amino acids or more, 80 amino acids or more, 90 amino acids or more).
  • The upper limit is preferably 200 amino acids or less (for example, 190 amino acids or less, 180 amino acids or less, 170 amino acids or less, 160 amino acids or less), more preferably 150 amino acids or less (for example, 140 amino acids or less, 130 amino acids or less, 120 amino acids or less, 110 amino acids or less), still more preferably 100 amino acids or less (for example, 90 amino acids or less, 80 amino acids or less, 70 amino acids or less, 60 amino acids or less), and even more preferably 50 amino acids or less (for example, 40 amino acids or less, 30 amino acids or less, 20 amino acids or less, 10 amino acids or less).
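As a small illustration of the linker notation and the length ranges above, the sketch below builds an n×GGS linker and checks its length against a chosen range. The function names are hypothetical; the default bounds mirror the "usually 1 to 300 amino acids" range, and the narrower bounds in the example are just one of the preferred ranges listed.

```python
def make_linker(unit: str, repeats: int) -> str:
    """Build a repeat linker, e.g. unit='GGS', repeats=3 gives the 3xGGS peptide."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def linker_length_ok(linker: str, lower: int = 1, upper: int = 300) -> bool:
    """Check the linker length against an inclusive range of amino acids."""
    return lower <= len(linker) <= upper


three_ggs = make_linker("GGS", 3)
print(three_ggs, len(three_ggs))
print(linker_length_ok(three_ggs))                      # usual range, 1 to 300
print(linker_length_ok(three_ggs, lower=10, upper=50))  # a narrower preferred range
```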
  • the fusion protein of the present invention may have other functional proteins in addition to the above-mentioned Cas protein, nucleobase converting enzyme, nuclear translocation signal, uracil glycosylase inhibitor and linker sequence.
  • the other functional protein is not particularly limited, and is appropriately selected according to the function to be imparted to the fusion protein of the present invention.
  • Functional proteins used for the purpose of facilitating purification and detection of the fusion protein include, for example, the 3×FLAG tag peptide and FLAG tag peptide (both registered trademarks, Sigma-Aldrich), XTEN peptide, SH3 peptide and modified versions thereof, Myc tags, His tags, HA tags, and fluorescent protein tags (GFP and the like).
  • Although the fusion protein of the present invention has been described above, specific amino acid sequences of the various proteins constituting the fusion protein, other than those shown above, can be searched for and obtained as appropriate by those skilled in the art from known literature, NCBI (https://www.ncbi.nlm.nih.gov/guide/), and the like. Further, the proteins according to the present invention are not limited to the typical amino acid sequences (for example, NCBI reference sequences) obtained in this way, and may also include variants and homologues of the typical amino acid sequences as long as each function of the protein is maintained. Such variants and homologues usually have high homology to the typical amino acid sequence. High homology is usually 60% or more, preferably 70% or more, more preferably 80% or more, and still more preferably 90% or more (e.g., 95% or more, 96% or more, 97% or more, 98% or more, 99% or more).
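The percentage thresholds above can be made concrete with a simple identity calculation. The text does not say how homology is computed, so this sketch assumes percent identity over two sequences that have already been aligned to equal length, with "-" marking gaps; the example peptides are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues (gaps never match)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be pre-aligned, non-empty, and of equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-residue peptides differing at two positions.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQK"

identity = percent_identity(reference, variant)
print(identity, identity >= 60.0)
```

In practice an alignment tool would be used to produce the aligned sequences, and the denominator convention (alignment length versus shorter sequence length) varies between tools.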
  • The fusion protein of the present invention can be biosynthesized by a person skilled in the art by genetic engineering techniques, based on the information of the nucleotide sequence encoding its amino acid sequence, using Escherichia coli, animal cells, insect cells, plant cells, a cell-free protein synthesis system (for example, reticulocyte lysate or wheat germ extract), or the like. It can also be chemically synthesized using a peptide synthesizer or the like, based on the information on the amino acid sequence.
  • the present invention provides polynucleotides encoding the fusion proteins described above.
  • the polynucleotide is not particularly limited, but it is preferable to optimize the codons used over the entire length of the CDS according to the host cell to be introduced.
  • the protein expression level can be expected to increase by converting the sequence into codons that are frequently used in the host organism.
  • the codon usage frequency data in the host used can be obtained from the genetic code usage frequency database (http://www.kazusa.or.jp/codon/index.html) published on the homepage of the Kazusa DNA Research Institute.
  • Codons in the DNA sequence that are used infrequently in the host can be converted into codons that encode the same amino acid and are used frequently.
  • Alternatively, the sequence can be optimized using programs provided by various manufacturers. Examples of such manufacturers and programs include GenScript (OptimumGene), Integrated DNA Technologies (Codon Optimization), and Eurofins Genomics (GENEius), among which OptimumGene by GenScript is preferable.
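The basic idea of replacing rare codons with frequent synonymous ones can be sketched as follows. This is the simplest "most frequent codon" strategy, not the algorithm used by any of the commercial programs named above, and the usage values are hypothetical placeholders rather than figures from the Kazusa database.

```python
# Hypothetical codon usage (per thousand codons) for three amino acids only.
USAGE = {
    "L": {"CTG": 40.0, "CTA": 7.0, "TTA": 8.0},
    "K": {"AAG": 32.0, "AAA": 24.0},
    "M": {"ATG": 22.0},
}
CODON_TO_AA = {codon: aa for aa, codons in USAGE.items() for codon in codons}


def optimize(cds: str) -> str:
    """Replace each codon with the most frequently used synonymous codon."""
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TO_AA[cds[i:i + 3]]  # KeyError for codons outside the toy table
        best = max(USAGE[amino_acid], key=USAGE[amino_acid].get)
        optimized.append(best)
    return "".join(optimized)


original = "ATGTTAAAACTA"   # Met-Leu-Lys-Leu written with rarer codons
print(optimize(original))
```

The encoded amino acid sequence is unchanged; only the synonymous codon choice differs.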
  • the present invention may also take the form of an expression vector such that the fusion protein encoded by the polynucleotide can be expressed in the host.
  • The expression vector comprises one or more regulatory elements that are operably linked to the polynucleotide (DNA) to be expressed.
  • Here, "operably linked" means that the DNA is linked to the regulatory element in such a way that it can be expressed.
  • "Regulatory elements" include promoters, enhancers, internal ribosome entry sites (IRES), self-cleaving peptides (e.g., self-cleaving 2A peptides), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences).
  • Examples of promoters include the phosphoglycerate kinase (PGK) promoter and the EF1α promoter, pol I promoters, and combinations thereof.
  • the expression vector of the present invention may contain a selectable marker (drug resistance gene, nutritional requirement complementary gene, etc.) so that introduction into a host cell can be detected.
  • a person skilled in the art can select an appropriate expression vector according to the type of cell to be introduced and the like.
  • For such an expression vector, DNA encoding the full length can be constructed by chemically synthesizing the DNA strand, or, as shown in the Examples described later, by connecting synthesized, partially overlapping short oligo DNA strands using a PCR method or the Gibson Assembly method.
  • the present invention provides a genome editing system including the following (A) and (B).
  • the guide RNA is a combination of crRNA and tracrRNA corresponding to the CRISPR / Cas9 system.
  • the crRNA and tracrRNA may be in the form of a single molecule (for example, a form in which an intervening sequence is sandwiched between them) or a two-molecule form.
  • the crRNA contains at least a base sequence complementary to the base sequence of the target DNA region (targeted base sequence) and a base sequence capable of interacting with tracrRNA in this order from the 5'side.
  • the crRNA forms a double-stranded RNA with the tracrRNA in a base sequence capable of interacting with the tracrRNA, and the formed double-stranded RNA interacts with the Cas9 protein. This guides the Cas9 protein to the target DNA region.
  • Base editing at the target site occurs within the sequence on the target DNA recognized by the guide RNA (guide RNA target sequence), which is located upstream (5' side) of the PAM sequence recognized by the Cas9 protein. More specifically, the position is determined by both the base-pairing complementarity between the guide RNA target sequence and the targeting base sequence complementary to it, and the PAM sequence located on the 3' side of the complementary strand of the guide RNA target sequence.
  • The guide RNA for the Cas9 protein according to the present invention is therefore designed to recognize the complementary strand of the base sequence on the 5' side of the PAM sequence, so that the PAM sequence can be recognized by the Cas9 protein. In other words, the guide RNA target sequence is the base sequence of the complementary strand of the base sequence on the 5' side of the PAM sequence.
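The design rule above (a target sequence immediately 5' of a PAM) can be illustrated with a minimal Python sketch. It assumes the NGG PAM of SpCas9 and a 20-nt target length, neither of which is fixed by the text above; the function names and the example sequence are hypothetical, and the design tools listed in the text apply additional scoring that is not reproduced here.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_guide_targets(seq, spacer_len=20):
    """List (start, target, pam) for every NGG PAM on the given strand.

    Assumption: SpCas9-style NGG PAM directly 3' of a spacer_len-nt target.
    """
    seq = seq.upper()
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len
    # A lookahead is used so that overlapping candidates are all reported.
    return [(m.start(), m.group(1), m.group(2)) for m in re.finditer(pattern, seq)]

# Hypothetical example: one 20-nt target followed by the PAM "TGG".
example = "GACCTGAAGCTTACGATCGATGGAAT"
hits = find_guide_targets(example)
```

To scan the opposite strand, the same function can be applied to `reverse_complement(example)`.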
  • CRISPR Design Tool (http://crispr.mit.edu/) (Massachusetts Institute of Technology)
  • E-CRISPR (http://www.e-crisp.org/E-CRISPR/)
  • Zifit CRISPR (http://zifit.partners.org/ZiFiT/) (Zinc Finger Consortium)
  • Cas9des (//Cas9.cbi.pku.edu.cn/) (Beijing University), CRISPRdirect (http://crispr.dbcls.jp/) (Tokyo University), CRISPR-P (http://cbi.hzau.edu.cn/crispr/) (China Agricultural University), and Guide RNA Target Design Tool (https://wwws.blueheronbio.com/external/tools/gRNASrc.jsp). The guide RNA target sequence can be determined by using these tools.
  • the guide RNA may not contain tracrRNA.
  • Examples of the CRISPR / Cas system using such a guide RNA as a component include the CRISPR / Cpf1 system.
  • A person skilled in the art can prepare a guide RNA according to the present invention, based on the sequence information, by using an in vitro transcription reaction system (for example, a T7 RNA polymerase system) with the DNA encoding the guide RNA as a template.
  • The expression vector containing the polynucleotide encoding the guide RNA according to the present invention can be appropriately prepared by a person skilled in the art in the same manner as for the fusion protein described above. As the promoter contained in the expression vector, a polIII promoter suited to expressing short RNAs (for example, U6, H1, SNR6, SNR52, SCR1, or RPR1 promoter) is preferably used.
  • the DNA editing system of the present invention may constitute a kit containing a combination of the fusion protein of the present invention and a guide RNA or the like.
  • the kit may further include one or more additional reagents. Examples of such additional reagents include, but are not limited to, dilution buffers, reconstruction buffers, wash buffers, nucleic acid transfer reagents, protein transfer reagents, and control reagents.
  • the kit may also include additional instructions for carrying out the methods of the invention.
  • the present invention provides a method for producing a cell in which DNA has been edited, which comprises introducing the above-mentioned genome editing system into the cell.
  • The cells used in the present invention may be eukaryotic cells such as animal cells, plant cells, algae cells and fungal cells, or prokaryotic cells such as bacteria and archaea. Eukaryotic cells are preferred.
  • eukaryotic cells include animal cells, plant cells, algae cells, and fungal cells.
  • animal cells include cells of fish, birds, reptiles, amphibians, and insects.
  • the "animal cell" includes, for example, cells constituting an individual animal, cells constituting an organ or tissue extracted from an animal, cultured cells derived from an animal tissue, and the like. Specific examples include germ cells such as oocytes and sperm; embryonic cells of embryos at each stage (for example, 1-cell stage, 2-cell stage, 4-cell stage, 8-cell stage, and 16-cell stage embryos, morula-stage embryos, etc.); stem cells such as induced pluripotent stem (iPS) cells and embryonic stem (ES) cells; and somatic cells such as fibroblasts, hematopoietic cells, neurons, muscle cells, bone cells, hepatocytes, pancreatic cells, brain cells and kidney cells.
  • pre-fertilization and post-fertilization oocytes can be used, but post-fertilization oocytes, that is, fertilized eggs are preferable.
  • the fertilized egg is of a pronuclear stage embryo.
  • “Mammals” is a concept that includes humans and non-human mammals.
  • non-human mammals include even-toed ungulates such as cows, wild boars, pigs, sheep and goats, odd-toed ungulates such as horses, rodents such as mice, rats, guinea pigs, hamsters and squirrels, lagomorphs such as rabbits, and carnivores such as dogs, cats and ferrets.
  • the non-human mammal may be a domestic animal or a companion animal (pet animal) or a wild animal.
  • plant cells include cells of cereals, oil crops, forage crops, fruits, and vegetables.
  • the "plant cell” includes, for example, cells constituting an individual plant, cells constituting an organ or tissue separated from a plant, cultured cells derived from a plant tissue, and the like.
  • plant organs and tissues include leaves, stems, shoot apex (growth point), roots, tubers, callus and the like.
  • plants include rice, corn, banana, peanut, sunflower, tomato, rape, tobacco, wheat, barley, potato, soybean, cotton, carnation, etc., and their breeding materials (e.g., seeds, tubers, etc.) are also included.
  • Examples of methods for introducing a genome editing system into cells include electroporation, the calcium phosphate method, the liposome method, the DEAE dextran method, microinjection, cationic lipid-mediated transfection, transduction, and viral vector infection. Such methods are described in many standard laboratory manuals such as "Leonard G. Davis et al., Basic methods in molecular biology, New York: Elsevier, 1986".
  • Whether the desired DNA editing has occurred in the cells produced in this manner can be confirmed by those skilled in the art using known DNA analysis methods (for example, PCR, sequencing, or Southern blotting).
  • Examples of next-generation sequencing (NGS) methods include single molecule sequencing methods; sequencing-by-synthesis methods (for example, sequencing by Illumina Solexa Genome Analyzer, HiSeq®, NextSeq, MiSeq or MiniSeq); pyrosequencing methods (for example, sequencing by the sequencer GS LX or FLX manufactured by Roche Diagnostics (454), so-called 454 sequencing); and ligase reaction sequencing methods (for example, sequencing by SOLiD® or 5500xl manufactured by Life Technology Co., Ltd.).
  • Examples of the single molecule sequencing method include the PacBio RS II or PacBio Sequel system manufactured by Pacific Biosciences of California, and PromethION, GridION or MinION manufactured by Oxford Nanopore Technologies.
  • The BE4 plasmid (pCMV-BE4), BE4max plasmid (pCMV-BE4max), ABE7.10 plasmid (pCMV-ABE7.10) and ABEmax plasmid (pCMV-ABEmax), which have the same backbone, were obtained from Addgene (catalog numbers: 100802, 112093, 102919 and 112095, respectively).
  • the other single-function base editor and dual-function base editor were constructed by concatenating PCR fragments with Gibson assembly as follows.
  • The Target-AID plasmid (pCMV-Target-AID) was built by assembling two fragments encoding the N-terminal and C-terminal halves of Target-AID with a backbone fragment amplified from pCMV-ABE7.10 using the primer pair RS047 / RS8. The two fragments were amplified from pcDNA3.1_pCMV-nCas-PmCDA1-ugi pH1-gRNA (HPRT) (Catalog No. 79620, manufactured by Addgene) using the primer pairs RS045 / HM129 and HM128 / RS046, respectively.
  • To construct the Target-AIDmax plasmid (pCMV-Target-AIDmax), the pUC-optimized-PmCDA1-ugi plasmid, which encodes the C-terminal region of Target-AIDmax with optimized codons, was first built using the gene synthesis service of GenScript.
  • This C-terminal fragment was amplified with the primer pair SI1304 / SI1307 and concatenated with a fragment amplified from pCMV-BE4max using the primer pair SI945 / SI1308 and the backbone fragment amplified from pCMV-ABEmax using SI1310 / SI1309.
  • The BE4max (C) plasmid (pCMV-BE4max (C)) was constructed by replacing the C-terminal region of Target-AIDmax with the codon-optimized rAPOBEC1 and two UGI domains of BE4max.
  • For this construction, the nCas9 fragment obtained from pCMV-Target-AIDmax using SI447 / SI1105 was used together with the rAPOBEC1 fragment and the two-UGI fragment obtained from BE4max using the primer pairs SI1352 / SI1357 and SI1359 / SI1350, respectively.
  • The Target-ACE plasmid was constructed by ligating a fragment encoding the plasmid backbone and ABE7.10, amplified from pCMV-ABE7.10 using the primer pair RS047 / RS052, with a fragment encoding the C-terminal region of Target-AID, amplified from pcDNA-pCMV-nCas9 using the primer pair RS05RS / RS04.
  • The Target-ACEmax plasmid (pCMV-Target-ACEmax) was constructed by ligating the ABEmax fragment obtained from pCMV-ABEmax using the primer pair SI945 / SI1305, a fragment encoding the C-terminal region of Target-AIDmax obtained from pUC-optimized-PmCDA1-ugi using the primer pair SI1304 / SI1307, and a fragment encoding the plasmid backbone obtained from pCMV-ABEmax using the primer pair SI1310 / SI1309.
  • The ACBEmax plasmid (pCMV-ACBEmax) was constructed by concatenating the ABEmax fragment obtained from pCMV-Target-ACEmax using the primer pair SI447 / SI1105 with the fragments encoding the rAPOBEC1 domain, the 2×UGI domain, and the backbone that had been prepared for the construction of pCMV-BE4max (C).
  • the two UGI domains have non-synonymous nucleotide substitutions in the GS linker between tandem UGIs (SGGSG [G> E] SGGS).
  • The gRNA spacer insert was prepared by a one-pot reaction in which an ssDNA pair was phosphorylated and annealed. To prepare each spacer fragment, a T4 polynucleotide kinase reaction sample was prepared using the two ssDNAs according to the manufacturer's (Takara) protocol and placed in a thermal cycler at 37 °C for 30 minutes and then 95 °C for 5 minutes. Starting from 95 °C, the reaction was then carried out for 70 cycles (12 seconds per cycle) while lowering the temperature by 1 °C per cycle, and the sample was finally maintained at 25 °C.
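As a quick consistency check on the annealing ramp described above, the following Python sketch computes the end temperature and duration of the ramp from the stated numbers (95 °C start, 1 °C decrease per cycle, 70 cycles of 12 seconds). The function name is hypothetical, and the sketch only restates the arithmetic; it is not a thermal cycler program.

```python
def ramp_down(start_c=95, step_c=1, cycles=70, seconds_per_cycle=12):
    """Return (final temperature in deg C, total ramp time in minutes)."""
    final_c = start_c - step_c * cycles
    total_min = cycles * seconds_per_cycle / 60
    return final_c, total_min

final_temperature, ramp_minutes = ramp_down()
```

With the stated values the ramp ends at 25 °C, which matches the hold temperature given in the text, and takes 14 minutes.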
  • The annealed spacer insert was then ligated to the pU6-gRNA cloning backbone (pSI-356) by Golden Gate assembly using BsmBI (NEB) and T4 DNA ligase (NEB). This assembly was performed in a thermal cycler under the following conditions: 15 cycles of 37 °C for 5 minutes, 20 °C for 5 minutes and 55 °C for 30 minutes, then maintained at 4 °C.
  • The C→T reporter system was designed so that, when C→T base editing occurs in the antisense strand, GTG changes to ATG and the start codon is restored, as shown in FIG. 3A. As shown in FIG. 3B, the A→G reporter system was designed so that, when A→G base editing occurs in the antisense strand, TAA changes to CAA, the stop codon is disrupted, and translation of the downstream EGFP proceeds.
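The logic of the two reporters can be traced with a short Python sketch: an edit on the antisense strand is applied to the base paired with the first position of the sense codon, and the sense codon is then read back. The helper name and the position argument are hypothetical and serve only to illustrate the codon changes stated above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def edit_antisense(sense_codon, position, from_base, to_base):
    """Edit the antisense base paired with sense_codon[position]; return the new sense codon."""
    # Base-by-base complement, kept in the same index order as the sense codon.
    antisense = sense_codon.translate(COMPLEMENT)
    if antisense[position] != from_base:
        raise ValueError("antisense base at this position is not " + from_base)
    edited = antisense[:position] + to_base + antisense[position + 1:]
    return edited.translate(COMPLEMENT)

# C->T reporter: antisense C->T turns sense GTG into the start codon ATG.
c_to_t_result = edit_antisense("GTG", 0, "C", "T")
# A->G reporter: antisense A->G turns the stop codon TAA into CAA.
a_to_g_result = edit_antisense("TAA", 0, "A", "G")
```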
  • The reporter cassette fragments were amplified using the primer sets 112-V4-BC2-FW / SI680 and RS204 / SI666 with pLV-eGFP (catalog number: 36083, manufactured by Addgene) as a template. Each of them was inserted into the EcoRI and BamHI recognition sites of the pLVSIN-CMV-Puro backbone vector (manufactured by Takara) using T4 DNA ligase (manufactured by NEB) and cloned.
  • These plasmids were constructed in the same manner as the lentivirus C→T base editing reporter plasmid, except that the respective reporter cassette fragments were amplified using the primer sets SI760 / SI680 and SI761 / SI680.
  • poly-C: polycytosine repeats
  • poly-A: polyadenine repeats
  • poly-CA: alternating polyadenine / cytosine repeats
  • The poly-C target sites were required to have cytosines filling a 7-bp sliding window whose 5' end shifts over the range of -24 bp to -16 bp from the PAM at 2 bp intervals.
  • The poly-A target sites were required to have adenines filling a 6-bp sliding window whose 5' end shifts over the range of -21 bp to -13 bp from the PAM at 2 bp intervals.
  • The poly-AC and poly-CA target sites were required to have the respective repeat sequence filling a 6-bp sliding window whose 5' end shifts over the range of -24 bp to -14 bp from the PAM at 2 bp intervals.
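The sliding-window conditions above can be sketched in Python as follows. The coordinate convention (position -1 is the base adjacent to the PAM), the assumption that an alternating repeat starts in phase with the first letter of the motif, and the function names are all illustrative assumptions rather than details taken from the text.

```python
def window_starts(first, last, step=2):
    """5'-end positions (relative to the PAM) visited by the sliding window."""
    return list(range(first, last + 1, step))

def fills_window(upstream, start, width, motif):
    """True if the repeat `motif` fills the window whose 5' end is at `start`.

    `upstream` is the sequence 5' of the PAM written 5'->3', so its last
    base is position -1 and position `start` is index len(upstream) + start.
    """
    i = len(upstream) + start
    if i < 0 or i + width > len(upstream):
        return False
    expected = (motif * width)[:width]
    return upstream[i:i + width] == expected

poly_c_starts = window_starts(-24, -16)   # 7-bp window
poly_a_starts = window_starts(-21, -13)   # 6-bp window
poly_ac_starts = window_starts(-24, -14)  # 6-bp window

# Hypothetical 24-nt upstream sequence with seven cytosines at its 5' end.
site = "CCCCCCC" + "GATGATGATGATGATGA"
```

Under these assumptions the poly-C and poly-A windows each visit five positions and the poly-AC and poly-CA windows visit six.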
  • Candidate target sites containing a homopolymer of 4 bp or more in the gRNA seed region ranging from -8 bp to -1 bp, and those overlapping annotated exons, were excluded.
  • For each sliding window position of poly-C and poly-A, the two candidate sites with the highest predicted gRNA activity scores were selected.
  • One candidate site was selected for each of the poly-AC and poly-CA sliding window positions.
  • From these, 7 poly-C, 7 poly-A, 6 poly-AC and 4 poly-CA target sites were further selected by screening.
  • the selected target sites of poly-C and poly-A contain both cytosine and adenine bases.
  • 24 target sites were screened from the gRNA library collection previously prepared by us for another assay.
  • Each of these target sites contains one or more cytosines and one or more adenines in the -20 bp to -14 bp region.
  • a total of 47 on-target sites were analyzed because EGFP control data could not be obtained in the CUL3-NGG site 2 amplicon sequencing assay.
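The site counts given above appear to add up as follows; this short Python tally is only a reading of the stated numbers, with the assumption that the single excluded site (CUL3-NGG site 2) is the only one removed from the combined set.

```python
# Screened repeat-containing sites, as stated in the text.
selected = {"poly-C": 7, "poly-A": 7, "poly-AC": 6, "poly-CA": 4}
from_grna_library = 24   # sites taken from the existing gRNA library
excluded = 1             # site lacking EGFP control data

total_sites = sum(selected.values()) + from_grna_library
analyzed_sites = total_sites - excluded
```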
  • HEK293Ta cells were purchased from GeneCopoeia and maintained in Dulbecco's modified Eagle's medium (DMEM, Sigma) supplemented with 10% fetal bovine serum (FBS) (Thermo Fisher Scientific) and 1% penicillin-streptomycin (Sigma) at 37 °C under 5% CO2. In addition, cells were routinely checked for mycoplasma contamination by nested PCR using the culture medium as a template.
  • The next day, the culture medium was changed to fresh medium, and two days later, the culture supernatant containing lentivirus particles was collected and dispensed into 1.5 mL tubes. The resulting virus samples were stored at -80 °C until infection.
  • HEK293Ta cells were seeded in a collagen I coated 96-well plate (manufactured by Asone) with 200 µL DMEM to a density of approximately 5 × 10^3 cells / well. The next day, 0.48 µL PEI, 50 µL 1× PBS, 120 ng base editor expression plasmid or control EGFP expression plasmid (pLV-eGFP), and 40 ng gRNA expression plasmid were mixed, incubated at room temperature for 15 minutes, and then added to each well for transfection. This experiment was independently repeated 3 times.
  • HEK293Ta cells were seeded on a collagen I coated 6-well plate (manufactured by Asone) with 2 mL DMEM to a density of approximately 2 × 10^5 cells / well. The next day, 3.0 µL of 1 mg / mL PEI, 200 µL 1× PBS, 666 ng base editor expression plasmid or control EGFP expression plasmid (pLV-eGFP), and 333 ng EMX1-targeting gRNA plasmid were mixed, incubated at room temperature for 15 minutes, and added to each well. This experiment was independently repeated twice.
  • The culture medium was aspirated and 200 µL of freshly prepared 50 mM NaOH was added to each cell sample in a 24-well plate. Then, 100 µL of the lysate was transferred to a 96-well PCR plate (manufactured by Nippon Genetics), heated at 95 °C for 15 minutes, cooled to 4 °C, and neutralized by adding 20 µL of 1 M Tris-HCl (pH 8.0). Using the cell lysate thus prepared as a template, the target region in each sample was amplified with the corresponding 1st HTS primers.
  • PCR was performed in a 20 µL solution containing 1 µL template, 1 µL each of 10 µM primers, 0.2 µL Phusion DNA polymerase, 5× Phusion HF buffer (manufactured by NEB) and 1.6 µL of 2.5 mM dNTPs under the following temperature cycle conditions: 98 °C for 30 seconds; 30 cycles of 98 °C for 10 seconds, 60 °C for 10 seconds and 72 °C for 10 seconds; and a final extension at 72 °C for 5 minutes.
  • The PCR product thus obtained was subjected to electrophoresis on a 2% agarose gel. Further, 1 µL of a 10-fold diluted solution was used as a template and re-amplified in a 20 µL solution containing custom Illumina index primers under the following temperature cycle conditions: 98 °C for 30 seconds; 15 cycles of 98 °C for 10 seconds, 60 °C for 10 seconds and 72 °C for 30 seconds; and a final extension at 72 °C for 5 minutes.
  • Each index library thus obtained was subjected to electrophoresis on a 2% agarose gel, and a band having an expected molecular weight was recovered using a FastGene Gel / PCR extraction kit (manufactured by Nippon Genetics).
  • PCR was performed in a 20 µL solution containing 2 µL genomic DNA template, 1.20 µL each of 8.3 µM primers, 0.2 µL Phusion DNA polymerase, 5× Phusion HF buffer, and 1.6 µL of 2.5 mM dNTPs under the following temperature cycle conditions: 98 °C for 30 seconds; 30 cycles of 98 °C for 10 seconds, 60 °C for 10 seconds and 72 °C for 60 seconds; and a final extension at 72 °C for 5 minutes.
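The final concentrations implied by the two 20 µL PCR mixes described in this section can be checked with the usual dilution relation C1·V1 = C2·V2. The Python sketch below assumes that each primer is added separately at the stated volume and that the 5× buffer is brought to 1× (the buffer volume is not given in the text); the function name is hypothetical.

```python
def final_concentration(stock, volume_added_ul, total_ul=20.0):
    """Dilution: final concentration in the same unit as `stock`."""
    return stock * volume_added_ul / total_ul

# Amplicon PCR from cell lysate: 1 uL of each 10 uM primer.
primer_lysate_um = final_concentration(10.0, 1.0)
# PCR from genomic DNA: 1.20 uL of each 8.3 uM primer.
primer_genomic_um = final_concentration(8.3, 1.20)
# Both mixes: 1.6 uL of 2.5 mM dNTPs.
dntp_mm = final_concentration(2.5, 1.6)
# Assumption: volume of 5x buffer needed for 1x in 20 uL.
buffer_ul = 20.0 / 5
```

Under these assumptions both primer setups give about 0.5 µM of each primer and 0.2 mM dNTPs, which suggests the two recipes are intended to be equivalent.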
  • A PCR reaction was carried out in a 20 µL solution under the following temperature cycle conditions to amplify the purified product: 98 °C for 30 seconds; 15 cycles of 98 °C for 10 seconds, 65 °C for 10 seconds and 72 °C for 90 seconds; and a final extension at 72 °C for 5 minutes.
  • Each index library thus obtained was subjected to electrophoresis on a 2% agarose gel, and a band having an expected molecular weight was recovered using a FastGene Gel / PCR extraction kit (manufactured by Nippon Genetics).
  • the sequence library was quantified by qPCR using the KAPA library quantification kit Illumina (manufactured by KAPA Biosystems) for multiplexing.
  • The multiplexed library was sequenced using Illumina HiSeq2500 (TruSeq rapid SBS kit; 2 × 151 bp paired-end) or MiSeq (MiSeq v3 kit; 2 × 2200 bp paired-end) with 20% to 30% of PhiX control, which was quantified by the same qPCR protocol.
  • RNA-seq: RNA sequencing
  • WES: whole-exome sequencing
  • To prepare the RNA-seq library, the cells were centrifuged at 1,000 rpm for 5 minutes, and the culture supernatant was removed. Then, total RNA was extracted using ISOSPIN Cell & Tissue RNA (Nippon Gene), and an RNA-seq library was prepared using a TruSeq Stranded mRNA library preparation kit (manufactured by Illumina).
  • Genomic DNA was extracted using NucleoSpin Tissue (Macherey-Nagel). Then, 500 ng of genomic DNA was fragmented to an average size of 150 to 300 bp in a 50 µL solution using an ultrasonic irradiation device (manufactured by Waken Yakuhin Co., Ltd., product name: Covaris E-220). Then, end repair, A-tailing and SureSelect adapter ligation (manufactured by Agilent) were performed.
  • the adapter-binding DNA was concentrated using the KAPA HyperPrep kit (KAPA Biosystems). Then, 750 ng of the pre-amplified DNA was hybridized to the SureSelectXT Human All Exon V3 kit probe (manufactured by Agilent) over 20 hours.
  • Post-capture DNA library amplification was performed using KAPA DNA polymerase and SureSelect Indexing post-capture polymerase chain reaction primers to index the library. Finally, the library was purified using Agencourt AMPure XP beads. Fragment size distributions and yields of the RNA-seq and WES libraries were quantified using a LabChip GX electrophoresis system (manufactured by PerkinElmer). After multiplexing, the final library was sequenced with Illumina NovaSeq 6000 (S2 flow cell; 2 × 101 bp paired-end).
  • RNA-seq and WES base call files were demultiplexed using bcl2fastq2 (version v2.20.0).
  • seqtk (version 1.3-r107-dirty)
  • The subsampled RNA-seq reads were then mapped to the reference human genome GRCh38.d1.vd1 using STAR (version 2.7.3a) with the transcript annotation GTF (GENCODE Release 22 GRCh38.p2), and deduplicated using Picard MarkDuplicates (version 2.0.1).
  • Subsampled DNA reads were also processed for mapping according to the National Cancer Institute Genomic Data Commons DNA-Seq analysis pipeline. That is, the reads were aligned to the reference human genome GRCh38.d1.vd1 with BWA-MEM (version 0.7.15), and PCR duplicates were removed using Picard MarkDuplicates (version 2.0.1).
  • The GATK HaplotypeCaller (version 4.1.4.1) was used to detect both DNA and RNA variants against the reference human genome GRCh38.d1.vd1. Mutation positions detected by the GATK HaplotypeCaller are described in Grunewald, J. et al.
  • Let the base editing pattern in the window from m bp to n bp relative to the PAM be S_{m,n}, represented as the string of base transition states s_m, s_{m+1}, ..., s_{n-1}, s_n.
  • The frequency of a given outcome S_{m,n} at the test site was predicted by the following formula.
  • This prediction model takes into account other independent base transition patterns and calculates the geometric mean of the probabilities of the base transitions at all edited positions. The specific P (s_j
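The formula itself is not reproduced in this text, so the following Python sketch only illustrates the one operation that is named above, a geometric mean over probabilities. The input list stands for hypothetical per-position probabilities; how those probabilities are estimated from the training data, and the exact form of the published model, are not shown and should not be inferred from this sketch.

```python
import math

def geometric_mean(probabilities):
    """Geometric mean of a non-empty list of probabilities in [0, 1]."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    if any(p == 0 for p in probabilities):
        return 0.0  # a single impossible transition makes the pattern impossible
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(log_sum / len(probabilities))

# Hypothetical probabilities for the edited positions of one pattern.
pattern_score = geometric_mean([0.25, 1.0])
```

Working in log space avoids underflow when many small probabilities are combined.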
  • The base editing prediction model was evaluated by 5-fold, 15-fold, and leave-one-out cross-validation experiments. For each test target site, the frequencies of all base editing result patterns detected in the amplicon sequencing dataset for the window ranging from -25 bp to -5 bp from the PAM were predicted by training on the amplicon sequencing data of other target sites whose amplicons did not overlap with that of the test target site. In the k-fold cross-validation experiment, [47 / k] target sites were randomly selected as test samples from the 47 target sites. The editing results were predicted by training on the amplicon sequencing data of the remaining target sites, and this was repeated 100 times with randomly changed test samples.
  • In the leave-one-out cross-validation, amplicon sequencing data from the 46 other target regions were used to predict the editing results for each target site.
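The sampling scheme described above can be sketched in Python as follows. It assumes that [47 / k] denotes the integer part of 47 / k, uses integer indices in place of real target sites, and does not model the exclusion of overlapping amplicons; the function names and the fixed seed are illustrative.

```python
import random

def k_fold_test_sets(sites, k, repeats=100, seed=0):
    """Repeatedly draw floor(len(sites) / k) random test sites; the rest are training sites."""
    rng = random.Random(seed)
    n_test = len(sites) // k
    splits = []
    for _ in range(repeats):
        test = set(rng.sample(sites, n_test))
        train = [s for s in sites if s not in test]
        splits.append((sorted(test), train))
    return splits

def leave_one_out(sites):
    """One split per site: that site is tested, all others are used for training."""
    return [([s], [t for t in sites if t != s]) for s in sites]

sites = list(range(47))
five_fold = k_fold_test_sets(sites, k=5)
fifteen_fold = k_fold_test_sets(sites, k=15)
loo = leave_one_out(sites)
```

With 47 sites this gives 9 test sites per draw for k = 5, 3 for k = 15, and 46 training sites per leave-one-out split.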
  • Prediction performance was measured by converting the predicted editing frequencies into relative editing frequencies among all edited reads, randomly selecting one prediction result for each editing result pattern when multiple predictions existed, and calculating Pearson's correlation coefficient between the predicted and experimentally measured values.
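A minimal Python sketch of the two computations named above, normalisation to relative frequencies and Pearson's correlation coefficient, is given below. The function names and the example numbers are hypothetical; the random selection among multiple predictions is not modelled.

```python
import math

def to_relative(frequencies):
    """Rescale frequencies so that they sum to 1 across all edited reads."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical predicted and measured frequencies for three editing patterns.
predicted = to_relative([1, 1, 2])
measured = to_relative([2, 2, 4])
r = pearson_r(predicted, measured)
```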
  • The predicted probabilities of the base editing outcomes that convert the source codon into the target codon were summed for each gRNA, and the maximum summed probability among the possible gRNAs was defined as the conversion probability.
  • The conversion probability for codons that have no potential gRNA was defined as 0 for every destination codon type.
  • Finally, a CCM was generated to show the frequency of each source-to-destination codon conversion type with a conversion probability threshold of 5%. For the various source-destination codon types, bystander mutation risk scores were calculated by dividing the CCM frequency obtained when bystander mutations are allowed by that obtained when they are not.
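The definition of the conversion probability above (sum per gRNA, then maximum over gRNAs, 0 when no gRNA exists) can be written as a short Python sketch. The dictionary layout, the gRNA labels, and the choice of "greater than or equal to" for the 5% threshold are assumptions made for illustration.

```python
def conversion_probability(outcome_probs_by_grna):
    """Maximum over gRNAs of the summed probabilities of outcomes giving the target codon.

    `outcome_probs_by_grna` maps a gRNA identifier to the predicted
    probabilities of its outcomes that yield the destination codon.
    """
    if not outcome_probs_by_grna:
        return 0.0  # no potential gRNA for this codon
    return max(sum(probs) for probs in outcome_probs_by_grna.values())

def passes_threshold(probability, threshold=0.05):
    """Assumption: a conversion counts when its probability reaches the threshold."""
    return probability >= threshold

# Hypothetical predictions for two candidate gRNAs.
example = {"gRNA-1": [0.02, 0.01], "gRNA-2": [0.04, 0.03]}
best = conversion_probability(example)
```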
  • gRNA target sites that may correct each mutation were first screened within the ±25 bp region from the target mutation. Using the base editing prediction model trained with all amplicon sequencing datasets, the probability that each mutation would be corrected by the various gRNA target sites without inducing unwanted bystander mutations in the ±15 bp region from the target mutation was predicted. For each mutation, its correction probability was defined as the maximum probability of target codon conversion among those induced by the various gRNAs.
  • With the aim of providing a base editing tool that allows simultaneous base substitution of both C→T and A→G, cytidine deaminase and adenosine deaminase were fused to a single nCas9 (D10A).
  • In Target-ACE, PmCDA1 derived from Target-AID and the TadA heterodimer derived from ABE7.10 are fused to the C-terminal and N-terminal of nCas9, respectively, and it is configured with the other functional domains in a manner consistent with the single-function base editors.
  • the amino acid sequence of Target-ACE and the nucleotide sequence encoding are shown in SEQ ID NOs: 4 and 3, respectively.
  • positions 2 to 167 indicate a modified TadA (TadA*)
  • positions 168 to 171 indicate a GGS linker
  • positions 172 to 175 indicate a GGS linker
  • positions 176 to 191 indicate an XTEN linker
  • positions 192 to 195 indicate a GGS linker
  • positions 196 to 199 indicate a GGS linker
  • positions 200 to 365 indicate TadA
  • positions 366 to 369 indicate a GGS linker
  • positions 370 to 373 indicate a GGS linker
  • positions 374 to 389 indicate an XTEN linker
  • positions 390 to 393 indicate a GGS linker
  • positions 394 to 397 indicate a GGS linker
  • positions 398 to 1764 indicate SpCas9 (D10A)
  • positions 1765 to 1774 indicate a 2×GS linker
  • positions 1775 to 1831 indicate mutant SH3
  • positions 1832 to 1857 indicate 3×FLAG
  • positions 1858 to 2065 indicate PmCDA1
  • positions 2066 to 2073 indicate a monopartite nuclear localization signal (nuclear localization signal derived from SV40 T antigen)
  • positions 2077 to 2159 indicate a uracil glycosylase inhibitor.
  • Target-ACEmax was constructed by applying GenScript codon optimization to Target-ACE and adding a bipartite nuclear localization signal (NLS) to its N-terminal (Fig. 2).
  • the amino acid sequence of Target-ACEmax and the encoding nucleotide sequence are shown in SEQ ID NOs: 2 and 1, respectively.
  • positions 2 to 19 indicate a bipartite nuclear localization signal
  • positions 20 to 185 indicate a modified TadA (TadA*)
  • positions 186 to 189 indicate a GGS linker
  • positions 190 to 193 indicate a GGS linker
  • positions 194 to 209 indicate an XTEN linker
  • positions 210 to 213 indicate a GGS linker
  • positions 214 to 217 indicate a GGS linker
  • positions 218 to 383 indicate TadA
  • positions 384 to 387 indicate a GGS linker
  • positions 388 to 391 indicate a GGS linker
  • positions 392 to 407 indicate an XTEN linker
  • positions 408 to 411 indicate a GGS linker
  • positions 412 to 415 indicate a GGS linker
  • positions 416 to 1782 indicate SpCas9 (D10A)
  • positions 1783 to 1792 indicate a 2×GS linker
  • positions 1793 to 1849 indicate mutant SH3
  • positions 1850 to 1875 indicate 3×FLAG
  • positions 1876 to 2083 indicate PmCDA1
  • positions 2084 to 2091 indicate a monopartite nuclear localization signal (nuclear localization signal derived from SV40 T antigen), and positions 2095 to 2177 indicate a uracil glycosylase inhibitor.
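Because Target-ACEmax differs from Target-ACE at the protein level by the added N-terminal nuclear localization signal (positions 2 to 19), every later domain should be shifted by the same amount. The Python sketch below checks this using the position ranges listed for the two proteins; the start positions 176 and 1765 of Target-ACE are inferred from the neighbouring ranges because the corresponding source lines are damaged, and the helper names are hypothetical.

```python
# (start, end) ranges in listing order; 176 and 1765 are inferred start positions.
TARGET_ACE = [
    (2, 167), (168, 171), (172, 175), (176, 191), (192, 195), (196, 199),
    (200, 365), (366, 369), (370, 373), (374, 389), (390, 393), (394, 397),
    (398, 1764), (1765, 1774), (1775, 1831), (1832, 1857), (1858, 2065),
    (2066, 2073), (2077, 2159),
]
TARGET_ACEMAX = [
    (20, 185), (186, 189), (190, 193), (194, 209), (210, 213), (214, 217),
    (218, 383), (384, 387), (388, 391), (392, 407), (408, 411), (412, 415),
    (416, 1782), (1783, 1792), (1793, 1849), (1850, 1875), (1876, 2083),
    (2084, 2091), (2095, 2177),
]
N_TERMINAL_NLS = (2, 19)

def length(domain):
    """Number of residues covered by an inclusive (start, end) range."""
    return domain[1] - domain[0] + 1

def gaps(domains):
    """Inclusive ranges not covered between consecutive listed domains."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(domains, domains[1:]) if b[0] > a[1] + 1]

offset = length(N_TERMINAL_NLS)
shifted = [(start + offset, end + offset) for start, end in TARGET_ACE]
```

The comparison of `shifted` with `TARGET_ACEMAX` is a check on the transcription of the lists, not a statement about the sequences themselves, which are given in the SEQ ID NOs.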
  • ACBEmax was constructed by replacing the codon-optimized PmCDA1 domain in Target-ACEmax with the codon-optimized cytidine deaminase domain rAPOBEC1 derived from BE4max.
  • positions 2 to 19 indicate a bipartite nuclear localization signal
  • positions 20 to 185 indicate a modified TadA (TadA*)
  • positions 186 to 189 indicate a GGS linker
  • positions 190 to 193 indicate a GGS linker
  • positions 194 to 209 indicate an XTEN linker
  • positions 210 to 213 indicate a GGS linker
  • positions 214 to 217 indicate a GGS linker
  • positions 218 to 383 indicate TadA
  • positions 416 to 1782 indicate SpCas9 (D10A)
  • positions 1783 to 1792 indicate a 2×GS linker
  • positions 1793 to 1849 indicate mutant SH3
  • positions 1850 to 1875 indicate 3×FLAG
  • positions 2104 to 2186 indicate a uracil glycosylase inhibitor
  • positions 2187 to 2196 indicate a 3×GGS linker
  • positions 2197 to 2279 indicate a uracil glycosylase inhibitor
  • positions 2280 to 2283 indicate a GGS linker
  • positions 2284 to 2300 indicate a bipartite nuclear localization signal.
  • As controls for these dual-function base editors, in addition to Target-AID, BE4max, ABE, and ABEmax, codon-optimized Target-AIDmax and BE4max (C) were constructed.
  • BE4max (C) is obtained by replacing the C-terminal PmCDA1 domain derived from Target-AIDmax with the rAPOBEC1 domain derived from BE4max.
  • The base editing spectra at 47 genomic target sites of human embryonic kidney (HEK293Ta) cells were analyzed by amplicon sequencing in triplicate (1,833 assays).
  • It became clear that the dual-function base editors inherited similar base editing properties from the corresponding single-function base editors (Fig. 6).
  • The C→T and A→G editing frequencies at the 47 endogenous target sites containing different numbers of cytosines and adenines differed significantly, but were generally consistent with the base editing spectral data (FIG. 7).
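The assay count is consistent with a simple product of sites, editing conditions and replicates. The Python check below assumes that the 13 base editors mentioned later in this section are the conditions tested at every site; that reading is an inference from the numbers, not something stated explicitly.

```python
target_sites = 47
base_editor_conditions = 13  # assumption: the 13 base editors compared in this section
replicates = 3               # triplicate amplicon sequencing

total_assays = target_sites * base_editor_conditions * replicates
```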
  • In the amplicon sequencing data for the 47 genomic target sites, 722 editing result patterns involving both C→T and A→G edits by the dual-function base editors and the four controls (four combinations of the corresponding base editors) were observed with a read frequency of 0.1% or more (FIG. 8).
  • Target-ACEmax and Target-ACE, and their corresponding base editor mixtures exhibit clusters of multi-base editing patterns that differ from ACBEmax and its corresponding base editor mixtures.
  • dinucleotide homologous co-editing spectra and dinucleotide heterogeneous co-editing spectra were then examined for each of the 13 base editors.
  • the mean co-editing frequency for each cytosine-cytosine pair, adenine-adenine pair or cytosine-adenine pair at different locations from the PAM was calculated using the amplicon sequencing results of the relevant target sites.
  • The multiple C→T editing spectra and the multiple A→G editing spectra were reacquired with the same tendency as the average frequencies of C→T and A→G at the 47 target sites.
  • Target-ACEmax and Target-AIDmax + ABEmax were 19.2% and 21.
  • The non-specific A→I editing activity of Target-ACEmax (mean 3,359 over two replicates) was relatively lower than that of Target-AIDmax + ABEmax (mean 4,179).
  • The amplicon sequencing data obtained in this example proved to be sufficient for the training procedure. Since the machine learning method can predict multinucleotide co-editing, it was used to predict the frequencies of all codon conversion patterns in the human genome obtainable by the various base editing methods. When bystander mutations were not allowed to occur, this analysis showed that Target-ACEmax and its corresponding base editor mix, Target-AIDmax + ABEmax, have the highest potential for diversifying genomic codons. The same analysis was then repeated while allowing bystander mutations to occur, and the bystander risk of causing undesired mutations was estimated for all base editing methods (Fig. 15).
  • Target-ACEmax was predicted to be the most likely to correct pairs of heterologous disease mutations reported in the ClinVar database (Landrum, MJ et al. Nucleic Acids Res 44, D862-868 (2016)) (Fig. 16).
  • the present invention is expected to have a wide range of applications in the fields of agriculture, forestry and fisheries, livestock, medical care, etc., including breeding and gene therapy.

Abstract

According to the present invention, a fusion protein has different nucleic acid base-converting enzymes which are respectively bonded to an N-terminal and a C-terminal of a Cas protein, wherein the Cas protein partially or fully lacks nuclease activity.

Description

Genome editing system using a Cas protein fused to two types of nucleobase-converting enzymes
The present invention relates to a genome editing system using a Cas protein to which two types of nucleobase-converting enzymes are fused. More specifically, the present invention relates to a fusion protein in which different nucleobase-converting enzymes are bound to the N-terminus and the C-terminus of a Cas protein, wherein the Cas protein has lost part or all of its nuclease activity. The present invention also relates to a polynucleotide encoding the fusion protein, an expression vector containing the polynucleotide, a genome editing system containing them, and a method for producing a cell having edited DNA using the system.
The genome, the blueprint of life, is composed of DNA base sequences written in four letters: A, T, G, and C. In recent years, genome editing technologies that can freely edit a specific target sequence in intracellular genomic DNA have developed rapidly and are bringing about major changes across the whole of biology, including agriculture and medicine.
In particular, CRISPR-Cas9 genome editing is a genome editing tool in which the Cas9 protein, which has DNA-cleaving activity, is recruited by a guide RNA (gRNA) to a target DNA region located upstream of a 3' protospacer adjacent motif (PAM) and induces a double-strand DNA break (DSB) (Non-Patent Documents 1 and 2). This method has been widely applied as a genome editing tool, for example to knock out specific genes through error-prone DNA repair and to insert foreign DNA into chromosomes through DSB-induced homologous recombination. However, cytotoxicity caused by double-strand DNA breaks and the accuracy of editing have been problems in CRISPR-Cas9 genome editing.
Against this background, base editing tools in which a deaminase is fused to a mutant Cas9 lacking DNA-cleaving activity have been developed over the past few years. More specifically, base editing induced by tethering a deoxynucleoside deaminase to a complex of a nuclease-deficient Cas9 or nickase Cas9 (dCas9 or nCas9) and a gRNA has been shown to induce efficient and direct base substitutions in genomic sequences (Non-Patent Document 3).
Among the currently available base editors, cytosine base editors (CBEs; Non-Patent Documents 4, 5 and 7) and adenine base editors (ABEs; Non-Patent Documents 6 and 7) enable highly efficient and precise base substitutions within a narrow region of the gRNA target site.
CBEs are usually composed of a cytidine deaminase and a uracil glycosylase inhibitor (UGI). The cytidine deaminase, for example rAPOBEC1 used in BEs (Non-Patent Document 5) or PmCDA1 used in Target-AID (Non-Patent Document 4), converts cytidine in the DNA strand not bound by the gRNA into uridine. The UGI inhibits base excision repair. Using these components, cytosine can be replaced with uracil and, through DNA repair, ultimately with thymine.
In ABEs, for example, a heterodimeric complex consisting of wild-type and engineered TadA adenosine deaminases is used. This enzyme converts adenine into inosine, which is replicated as guanine (Non-Patent Document 6).
Thus, base editing can convert C to T or A to G in target DNA with high efficiency (C→T or A→G) through a deamination reaction without cleaving the DNA, and expectations for it are growing as a superior genome editing tool in areas such as crop breeding and gene therapy. However, the base substitution patterns achievable with conventional base editing tools have been limited to two types, C→T or A→G.
The present invention has been made in view of the above problems of the prior art, and an object thereof is to provide a base editing tool that enables multiple types of base substitution simultaneously.
To achieve the above object, and aiming to provide a base editing tool that enables both C→T and A→G base substitutions simultaneously, the present inventors fused a cytidine deaminase and an adenosine deaminase to a single nCas9 (D10A) and first prepared three base editors: Target-ACE, Target-ACEmax, and ACBEmax.
Target-ACE is constructed by fusing nCas9 with PmCDA1 derived from Target-AID at the C-terminus and with the TadA heterodimer derived from ABE-7.10 at the N-terminus, together with the other, mutually compatible functional domains present in the single-function base editors.
Target-ACEmax was constructed by applying GenScript codon optimization to Target-ACE and adding a bipartite nuclear localization signal (bipartite NLS) to its N-terminus.
ACBEmax was constructed by replacing the codon-optimized PmCDA1 domain of Target-ACEmax with a codon-optimized version of rAPOBEC1, the cytidine deaminase domain derived from BE4max.
Next, genome editing was performed at 47 target regions in the genome of cultured human cells using conventional single-function base editors, combinations thereof, and Target-ACEmax and the other editors described above, and the resulting base substitution patterns were observed by amplicon sequencing. As a result, Target-ACE, Target-ACEmax, and ACBEmax were all found to have high heterologous base co-editing efficiency. In particular, Target-ACEmax was found to have the highest heterologous base co-editing efficiency.
In addition, exome analysis and transcriptome analysis were performed to examine the editing of unintended genomic regions (DNA off-target effects), which is regarded as a problem in genome editing technologies in general, and the editing of unintended intracellular RNA molecules (RNA off-target effects), which is a concern in base editing technologies. The results showed that Target-ACEmax does not unduly increase these off-target effects compared with conventional base editing tools.
Furthermore, based on the large body of base editing pattern data obtained by amplicon sequencing, a machine learning model was developed that predicts the base editing patterns of an arbitrary target sequence. Using this machine learning model, it was shown that, for protein-coding genes, Target-ACEmax can accurately induce diverse amino acid substitutions while suppressing unwanted editing of bases other than the target bases, and that it can simultaneously repair, with high efficiency, more than one of the diverse disease-associated gene mutations observed in humans. The present invention was thereby completed.
That is, the present invention relates to a fusion protein in which different nucleobase-converting enzymes are bound to the N-terminus and the C-terminus of a Cas protein, wherein the Cas protein has lost part or all of its nuclease activity. The present invention also relates to a polynucleotide encoding the fusion protein, an expression vector containing the polynucleotide, a genome editing system containing them, and a method for producing a cell having edited DNA using the system. More specifically, it provides the following.
<1> A fusion protein in which different nucleobase-converting enzymes are bound to the N-terminus and the C-terminus of a Cas protein, wherein the Cas protein has lost part or all of its nuclease activity.
<2> The fusion protein according to <1>, wherein the different nucleobase-converting enzymes are an adenosine deaminase and a cytidine deaminase.
<3> The fusion protein according to <2>, wherein the adenosine deaminase is bound to the N-terminus of the Cas protein and the cytidine deaminase is bound to the C-terminus of the Cas protein.
<4> The fusion protein according to any one of <1> to <3>, to which at least one of a nuclear localization signal and a uracil glycosylase inhibitor is further bound.
<5> A polynucleotide encoding the fusion protein according to any one of <1> to <4>.
<6> The polynucleotide according to <5>, wherein the codons are optimized for the host cell into which the polynucleotide is to be introduced.
<7> An expression vector containing the polynucleotide according to <5> or <6>.
<8> A genome editing system comprising the following (A) and (B):
(A) the fusion protein according to any one of <1> to <4>, the polynucleotide according to <5> or <6>, or the expression vector according to <6>;
(B) a guide RNA, a polynucleotide encoding the guide RNA, or an expression vector containing the polynucleotide.
<9> A method for producing a cell having edited DNA, the method comprising introducing the genome editing system according to <8> into a cell.
According to the present invention, multiple types of base substitution can be performed simultaneously with a single base editing tool.
A schematic diagram showing the development lineage of the dual-function base editors of the present invention (Target-ACEmax, Target-ACE, and ACBEmax). The development lineage from single-function base editors to dual-function base editors is indicated by arrows, and the combinations of single-function base editors corresponding to each dual-function base editor are indicated by broken lines.
A diagram outlining the structure of Target-ACEmax.
A schematic diagram of the C→T base editing reporter system. C→T base editing on the antisense strand followed by DNA replication converts the codon GTG (encoding valine) into the start codon ATG (encoding methionine), thereby restoring translation of EGFP.
A schematic diagram of the A→G base editing reporter system. A→G base editing on the antisense strand followed by DNA replication converts the stop codon TAA into the codon CAA (encoding glutamine), allowing translation of the downstream EGFP to proceed.
Fluorescence micrographs of C→T and A→G base editing reporter cells transiently transfected with various base editors and the corresponding on-target (OT) or non-target (NT) gRNAs. Scale bars represent 40 μm. Consistent results were independently reproduced in four cell cultures.
A graph showing the frequency of start codon restoration in the C→T base editing reporter cells. The results of three independent transfection experiments are shown as dots, and their means are shown as bars.
A graph showing the frequency of stop codon disruption in the A→G base editing reporter cells. The results of three independent transfection experiments are shown as dots, and their means are shown as bars.
A graph showing the frequency at which C→T base editing was detected by amplicon sequencing at the gRNA target site (-30 bp to +10 bp relative to the PAM) of the C→T base editing reporter cells. The results of three independent transfection experiments are shown as dots, and their means are shown as bars.
A graph showing the frequency at which A→G base editing was detected by amplicon sequencing at the gRNA target site (-30 bp to +10 bp relative to the PAM) of the A→G base editing reporter cells. The results of three independent transfection experiments are shown as dots, and their means are shown as bars.
A graph showing the mean C→T and A→G base editing spectra at genomic target sites.
A graph showing the C→T or A→G editing frequencies at 47 endogenous target sites in the human genome. Black horizontal bars represent the mean C→T or A→G editing spectra. A two-sided Mann-Whitney U test was performed to compare any pair of two datasets obtained from three cell cultures. Arrows indicate the dataset with the higher mean editing frequency (*P<0.05; **P<0.01; ***P<0.001).
A diagram showing the editing patterns produced by C→T and A→G co-editing. A total of 722 heterologous co-editing patterns obtained under the base editing conditions with a read frequency threshold of 0.1% (upper panel) were hierarchically clustered based on the frequencies at which they were generated under the various base editing conditions (lower panel). The total number of editing outcome patterns observed in three independent replicate experiments is shown to the right of each base editing condition.
A diagram showing the mean frequencies of simultaneous C→T and A→G editing occurring in the region -20 to -1 bp upstream of the PAM.
A graph showing the co-editing frequencies, obtained with the dual-function base editors or the mixtures of single-function base editors, for combinations of a cytosine and an adenine located at specific positions relative to the PAM. Position combinations whose mean co-editing frequency ranked in the top five under any base editing condition are shown. Each bar shows the mean co-editing frequency measured, in three cell cultures, for target sites having a cytosine and an adenine at the respective combination of positions. For position combinations with a sufficient sample size, a two-sided Mann-Whitney U test was performed to compare the mean co-editing frequency of Target-ACEmax with those of its corresponding enzyme mixture and of the other two dual-function base editors (*P<0.05; **P<0.01; ***P<0.001).
A graph showing the base editing frequencies at site 1 of the EMX1 gene and sites 1 and 2 of the FANCF gene, and at their corresponding off-target sites. The amplicon sequencing experiments were performed three times.
A graph showing the estimated DNA off-target risk. The jitter plot shows the relative off-target/on-target editing frequencies measured for three gRNAs in three cell culture replicates, and the black horizontal bars represent the medians, used as the DNA off-target risk scores. The region from about 10^-4 to 10^-2 (shaded pink in the color version) indicates the score range of the single-function base editors. A two-sided Welch's t-test was performed to compare each dual-function base editor with the corresponding mixture of single-function enzymes. Arrows indicate the dataset with the higher mean score (**P<0.01).
A graph showing the number and frequency of genome-wide C→U RNA variants detected in cells subjected to the various base editing conditions. Each bar shows the number of variants identified by RNA-seq, and the jitter plot shows their variant allele frequencies. The experiment was performed twice for each base editing condition.
A graph showing the number and frequency of genome-wide A→I RNA variants detected in cells subjected to the various base editing conditions. Each bar shows the number of variants identified by RNA-seq, and the jitter plot shows their variant allele frequencies. The experiment was performed twice for each base editing condition.
A diagram outlining the conditional probability model used to predict the frequencies of various base editing patterns in an arbitrary target DNA sequence for the various base editors. The dot plots at the bottom show the correlation between measured and predicted values.
A graph showing the bystander mutation risk scores associated with codon conversion for the various base editing methods. Horizontal bars represent the median risk scores over the various codon conversion types, which are shown as dots. The region from about 10^0 to 4 × 10^0 (shaded pink in the color version) indicates the range of mean risk scores of the single-function base editors other than BE4max (C). A two-sided Mann-Whitney U test was performed to compare any pair of two datasets. Arrows indicate the dataset with the higher mean score (***P<0.001).
A graph showing the frequency of correctable heterologous disease mutation pairs. For pathogenic C·G→T·A and A·T→G·C single-nucleotide variants (SNVs) reported in the ClinVar database, gRNA target sites that could potentially correct each mutation were first screened within a region of ±25 bp from the target mutation. The probability of correcting the mutation with each of these gRNA target sites, without inducing undesired bystander mutations in the region ±15 bp from the target mutation, was predicted using the base editing prediction model trained on all the amplicon sequencing datasets. For each mutation, the correction rate was defined as the maximum probability of the target conversion pattern among those induced by the various gRNAs. Finally, the ability of the different base editing conditions to correct heterologous mutations was evaluated by counting the combinations of two heterologous mutations in the same disease gene that were both predicted to be correctable at a correction-rate threshold of 5%.
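The codon changes described for the two reporter systems can be checked with a short sketch. The following Python snippet is only an illustration: the function names and the choice of edited position are assumptions made here, and it models nothing beyond complementary base pairing (no editing window, PAM, or repair pathway).

```python
# Illustrative check of the reporter codon conversions (not part of the disclosure).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def edit_antisense(sense_codon: str, sense_pos: int, src: str, dst: str) -> str:
    """Edit the antisense base paired with sense_codon[sense_pos] from src to dst,
    then return the sense-strand codon that results after replication."""
    antisense = list(reverse_complement(sense_codon))
    idx = len(sense_codon) - 1 - sense_pos  # index of the paired antisense base
    if antisense[idx] != src:
        raise ValueError(f"expected {src} on the antisense strand, found {antisense[idx]}")
    antisense[idx] = dst
    return reverse_complement("".join(antisense))


# C->T reporter: GTG (valine) becomes the start codon ATG (methionine).
c_to_t = edit_antisense("GTG", 0, "C", "T")
# A->G reporter: stop codon TAA becomes CAA (glutamine).
a_to_g = edit_antisense("TAA", 0, "A", "G")
print(c_to_t, a_to_g)
```

In both cases the edit is made on the antisense base paired with the first sense-strand position, which is why a C→T or A→G change on the antisense strand appears as G→A or T→C on the sense strand.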
<Fusion protein>
The present invention provides a fusion protein in which different nucleobase-converting enzymes are bound to the N-terminus and the C-terminus of a Cas protein, wherein the Cas protein has lost part or all of its nuclease activity.
The Cas protein in the fusion protein of the present invention has lost part or all of its nuclease activity. A Cas protein that has lost part of its nuclease activity is referred to as "nCas", and a Cas protein that has lost all of its nuclease activity is referred to as "dCas". A Cas protein typically contains a domain involved in cleavage of the target strand (RuvC domain) and a domain involved in cleavage of the non-target strand (HNH domain); in the Cas protein used in the present invention, the nuclease activity of at least one of these domains has preferably been lost through the introduction of a mutation into that domain. In the case of the SpCas9 protein (the Cas9 protein derived from S. pyogenes), examples of such mutations include mutation of the 10th amino acid from the N-terminus (aspartic acid) to alanine (D10A: a mutation in the RuvC domain), mutation of the 840th amino acid from the N-terminus (histidine) to alanine (H840A: a mutation in the HNH domain), mutation of the 863rd amino acid from the N-terminus (asparagine) to alanine (N863A: a mutation in the HNH domain), mutation of the 762nd amino acid from the N-terminus (glutamic acid) to alanine (E762A: a mutation in the RuvCII domain), and mutation of the 986th amino acid from the N-terminus (aspartic acid) to alanine (D986A: a mutation in the RuvCIII domain). Cas9 proteins of various other origins are also known (for example, WO2014/131833), and their nCas or dCas forms can be used. The amino acid sequences and nucleotide sequences of Cas9 proteins are registered in public databases such as GenBank (http://www.ncbi.nlm.nih.gov) (for example, accession number Q99ZW2.1), and these can be used in the present invention.
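The mutation notation used above (for example, D10A) encodes the wild-type residue, its position from the N-terminus, and the replacement residue. As a minimal sketch, the Python snippet below parses that notation and pairs each listed SpCas9 mutation with the domain named in the text; the helper names are hypothetical, and the table covers only the five mutations mentioned here.

```python
import re
from typing import NamedTuple


class PointMutation(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number counted from the N-terminus
    mutant: str     # one-letter code of the replacement residue


# Domain assignments exactly as stated in the text for SpCas9.
SPCAS9_MUTATION_DOMAINS = {
    "D10A": "RuvC",
    "H840A": "HNH",
    "N863A": "HNH",
    "E762A": "RuvCII",
    "D986A": "RuvCIII",
}

# One-letter amino acid codes needed for the mutations above.
RESIDUE_NAMES = {
    "A": "alanine",
    "D": "aspartic acid",
    "E": "glutamic acid",
    "H": "histidine",
    "N": "asparagine",
}


def parse_mutation(code: str) -> PointMutation:
    """Split a code such as 'D10A' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a point-mutation code: {code!r}")
    return PointMutation(match.group(1), int(match.group(2)), match.group(3))


def describe(code: str) -> str:
    """Return a readable description of one of the listed SpCas9 mutations."""
    mut = parse_mutation(code)
    domain = SPCAS9_MUTATION_DOMAINS[code]
    return (f"{RESIDUE_NAMES[mut.wild_type]} {mut.position} -> "
            f"{RESIDUE_NAMES[mut.mutant]} ({domain} domain)")


for code in SPCAS9_MUTATION_DOMAINS:
    print(code, ":", describe(code))
```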
Further mutations, for example mutations that alter PAM recognition, may be introduced into the Cas protein (Benjamin, P. et al., Nature 523, 481-485 (2015); Hirano, S. et al., Molecular Cell 61, 886-894 (2016)).
In the present invention, Cas proteins other than Cas9, for example Cpf1 (Cas12), Cas12b, CasX (Cas12e), and Cas14, can also be used.
In the present invention, a "nucleobase-converting enzyme" means an enzyme that can convert a target nucleotide into another nucleotide without cleaving the DNA strand, by catalyzing a reaction that converts a substituent on the purine or pyrimidine ring of a DNA base into another group or atom; an example is a deaminase. A "deaminase" is an enzyme that catalyzes a deamination reaction converting an amino group of a base into a carbonyl group, and belongs to the nucleic acid/nucleotide deaminase superfamily. Examples of such deaminases include cytidine deaminases, which can convert cytosine or 5-methylcytosine into uracil or thymine, respectively; adenosine deaminases, which can convert adenine into hypoxanthine; and guanosine deaminases, which can convert guanine into xanthine. The origin of the nucleobase-converting enzyme is not particularly limited, and examples include mammals (for example, humans, pigs, cows, horses, and monkeys), fish (for example, lampreys), and bacteria (for example, Escherichia coli).
The "different nucleobase-converting enzymes" fused to the N-terminus and the C-terminus of the Cas protein are not particularly limited, but a combination of an adenosine deaminase and a cytidine deaminase is preferred. In the fusion protein, the adenosine deaminase is preferably bound to the N-terminus of the Cas protein and the cytidine deaminase to the C-terminus of the Cas protein.
Examples of the adenosine deaminase include TadA derived from Escherichia coli and variants thereof. A typical amino acid sequence of wild-type TadA is the amino acid sequence consisting of positions 218 to 383 of SEQ ID NO: 2, and an example of the amino acid sequence of a variant thereof (TadA*) is the amino acid sequence consisting of positions 20 to 185 of SEQ ID NO: 2. The fusion protein of the present invention desirably has both TadA and a variant thereof, because this makes it easier to reduce off-target RNA editing while maintaining on-target DNA base substitution activity.
Examples of the cytidine deaminase include PmCDA1 (Petromyzon marinus cytosine deaminase 1) derived from lamprey, APOBEC (APOBEC1, APOBEC2, APOBEC3 (APOBEC3A, 3B, 3C, 3D (3E), 3F, 3G, 3H, etc.), APOBEC4, etc.), Anc689, which is an ancestral amino acid sequence of APOBEC, AID (activation-induced cytidine deaminase (AICDA)) derived from mammals, members of the AID family, and variants thereof. Among these, PmCDA1 is preferred because it tends to show high catalytic activity when bound to the C-terminus of the Cas protein. A typical amino acid sequence of PmCDA1 is the amino acid sequence consisting of positions 1876 to 2083 of SEQ ID NO: 2. An example of the amino acid sequence of APOBEC1 is the amino acid sequence consisting of positions 1876 to 2103 of SEQ ID NO: 6.
 本発明の「融合タンパク質」には、さらに、核移行シグナル及びウラシルグリコシラーゼ阻害剤の少なくとも1つが結合していていることが好ましい。 It is preferable that at least one of the nuclear localization signal and the uracil glycosylase inhibitor is further bound to the "fusion protein" of the present invention.
 本発明にかかる「核移行シグナル」は、本発明の融合タンパク質を細胞核へ輸送する際の目印となるアミノ酸配列であれば、特に制限はないが、塩基性アミノ酸からなるクラスターを1つ有する配列(単節型)であってもよく、2つの塩基性アミノ酸からなるクラスターがスペーサー配列を介して結合している配列(双節型)であってもよい。核移行シグナルは、本発明の融合タンパク質において、N末端及びC末端の少なくともどちらか一方に結合していてもよく、また各ドメインの間に挿入されていてもよい。 The "nuclear localization signal" according to the present invention is not particularly limited as long as it is an amino acid sequence that serves as a marker for transporting the fusion protein of the present invention to the cell nucleus, but is a sequence having one cluster consisting of basic amino acids ( It may be a mononode type) or a sequence in which clusters consisting of two basic amino acids are linked via a spacer sequence (binode type). The nuclear localization signal may be bound to at least one of the N-terminal and the C-terminal in the fusion protein of the present invention, or may be inserted between each domain.
 The "uracil glycosylase inhibitor" according to the present invention may be any inhibitor that can suppress the degradation, by uracil DNA glycosylase, of the uracil converted from cytosine by the cytidine deaminase. An example is the amino acid sequence consisting of positions 2095 to 2177 of SEQ ID NO: 2.
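The residue ranges quoted above are 1-based and inclusive, which differs from the 0-based, end-exclusive slicing used by most programming languages. The following Python sketch shows one way to convert between the two conventions. The helper names (`DOMAIN_RANGES`, `domain_length`, `extract_domain`) are hypothetical and introduced only for illustration; the actual sequence of SEQ ID NO: 2 must be taken from the sequence listing and is not reproduced here.

```python
# 1-based, inclusive residue ranges quoted in the text for SEQ ID NO: 2.
DOMAIN_RANGES = {
    "PmCDA1": (1876, 2083),
    "UGI": (2095, 2177),
}


def domain_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


def extract_domain(protein: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein string."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("range lies outside the sequence")
    # Shift only the start: Python slices are 0-based and end-exclusive.
    return protein[start - 1:end]


if __name__ == "__main__":
    for name, (start, end) in DOMAIN_RANGES.items():
        print(name, domain_length(start, end), "residues")
```

Under this convention the PmCDA1 range spans 208 residues and the uracil glycosylase inhibitor range spans 83 residues.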
 As long as it exhibits this inhibitory activity, the uracil glycosylase inhibitor may be bound to at least one of the N-terminus and the C-terminus of the fusion protein of the present invention, or may be inserted between domains, but it is preferably bound to the C-terminus. The number of uracil glycosylase inhibitors in the fusion protein of the present invention is likewise not particularly limited as long as the inhibitory activity is exhibited; there may be one, or a plurality (for example, 2, 3, 4 or 5).
 In the fusion protein of the present invention, the above-mentioned Cas protein, nucleobase converting enzymes, nuclear localization signal and uracil glycosylase inhibitor are arranged, for example, in the following order from the N-terminus: adenosine deaminase-Cas protein-cytidine deaminase-uracil glycosylase inhibitor.
 A preferred embodiment of the fusion protein of the present invention has, in order from the N-terminus, nuclear localization signal-adenosine deaminase-Cas protein-cytidine deaminase-nuclear localization signal-uracil glycosylase inhibitor; more preferably, in order from the N-terminus, bipartite nuclear localization signal-modified TadA (TadA*)-TadA-SpCas9 (D10A)-PmCDA1-monopartite nuclear localization signal-uracil glycosylase inhibitor; and still more preferably, it is the protein consisting of the amino acid sequence of SEQ ID NO: 2 (Target-ACEmax).
 Another preferred embodiment of the fusion protein of the present invention has, in order from the N-terminus, adenosine deaminase-Cas protein-cytidine deaminase-nuclear localization signal-uracil glycosylase inhibitor; more preferably, in order from the N-terminus, modified TadA (TadA*)-TadA-SpCas9 (D10A)-PmCDA1-monopartite nuclear localization signal-uracil glycosylase inhibitor; and still more preferably, it is the protein consisting of the amino acid sequence of SEQ ID NO: 4 (Target-ACE).
 A further preferred embodiment of the fusion protein of the present invention has, in order from the N-terminus, nuclear localization signal-adenosine deaminase-Cas protein-cytidine deaminase-nuclear localization signal-uracil glycosylase inhibitor-nuclear localization signal; more preferably, in order from the N-terminus, bipartite nuclear localization signal-modified TadA (TadA*)-TadA-SpCas9 (D10A)-APOBEC1-uracil glycosylase inhibitor-uracil glycosylase inhibitor-bipartite nuclear localization signal; and still more preferably, it is the protein consisting of the amino acid sequence of SEQ ID NO: 6 (ACBEmax).
 In the fusion protein of the present invention, the above-mentioned Cas protein, nucleobase converting enzymes, nuclear localization signal and uracil glycosylase inhibitor may be bound directly, or may be bound indirectly via a linker sequence.
 In the present invention, the "linker sequence" is not particularly limited as long as the function of each protein is not inhibited; examples include linkers composed of glycine and serine (GGS peptide, 2×GGS peptide, 3×GGS peptide, GS peptide, 2×GS peptide, 3×GS peptide, etc.). Only one such linker sequence may be placed between proteins, or a plurality (for example, 2, 3, 4 or 5 sequences) may be placed. When a plurality of linker sequences are placed, they may consist of a single kind of linker sequence or of multiple kinds of linker sequences.
 In the present invention, the number of amino acids in the "linker sequence" placed between proteins is not particularly limited as long as the function of each protein is not inhibited, but it is usually 1 to 300 amino acids. The lower limit is preferably 2 amino acids or more (for example, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, or 9 or more amino acids), and more preferably 10 amino acids or more (for example, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, or 90 or more amino acids). The upper limit is preferably 200 amino acids or less (for example, 190 or less, 180 or less, 170 or less, or 160 or less amino acids), more preferably 150 amino acids or less (for example, 140 or less, 130 or less, 120 or less, or 110 or less amino acids), still more preferably 100 amino acids or less (for example, 90 or less, 80 or less, 70 or less, or 60 or less amino acids), and even more preferably 50 amino acids or less (for example, 40 or less, 30 or less, 20 or less, or 10 or less amino acids).
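As a small illustration of the linker description above, the following Python sketch builds a glycine/serine linker from a repeat unit and checks its length against the "usually 1 to 300 amino acids" range. The function names and the default bounds are assumptions made for this example; the specification does not prescribe any particular procedure.

```python
def gs_linker(unit: str, repeats: int) -> str:
    """Build a linker such as 3xGGS by repeating a glycine/serine unit."""
    if not unit or set(unit) - {"G", "S"}:
        raise ValueError("unit must consist of G and S only")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def within_usual_range(linker: str, lower: int = 1, upper: int = 300) -> bool:
    """Check the linker length against a chosen range (default: 1 to 300)."""
    return lower <= len(linker) <= upper


if __name__ == "__main__":
    linker = gs_linker("GGS", 3)
    print(linker, len(linker), within_usual_range(linker))
```

Narrower preferred ranges from the text can be checked by passing different `lower` and `upper` values, for example `within_usual_range(linker, 2, 50)`.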
 In addition to the above-mentioned Cas protein, nucleobase converting enzymes, nuclear localization signal, uracil glycosylase inhibitor and linker sequence, the fusion protein of the present invention may contain other functional proteins. The other functional proteins are not particularly limited and are selected as appropriate according to the function to be imparted to the fusion protein of the present invention. Examples of functional proteins used to facilitate purification and detection of the fusion protein include the 3×FLAG tag peptide, the FLAG tag peptide (both registered trademarks, Sigma-Aldrich), the XTEN peptide, the SH3 peptide and modified forms thereof, the Myc tag, the His tag, the HA tag, and fluorescent protein tags (GFP, etc.).
 The fusion protein of the present invention has been described above. Beyond the sequences shown above, a person skilled in the art can obtain specific amino acid sequences of the various proteins constituting the fusion protein by searching known literature or databases such as NCBI (http://www.ncbi.nlm.nih.gov/guide/). Moreover, the proteins according to the present invention are not limited to the typical amino acid sequences obtainable in this way (for example, NCBI reference sequences); as long as each function of the protein is maintained, they may also include variants and homologues of those typical amino acid sequences. Such variants and homologues usually have high homology to the typical amino acid sequence. High homology is usually 60% or more, preferably 70% or more, more preferably 80% or more, and still more preferably 90% or more (for example, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more).
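The text does not state how homology is to be calculated. As one possible reading, the following Python sketch computes percent identity over two sequences that have already been aligned, with `-` marking gaps. Ignoring gap columns is only one of several conventions and is an assumption here, as is the function name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap column: skipped under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment, not taken from the sequence listing.
    print(percent_identity("MKT-LV", "MKTALI"))
```

In practice the alignment itself would come from an established tool, and the resulting value would then be compared with the thresholds listed above.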
 A person skilled in the art can also biosynthesize the fusion protein of the present invention by genetic engineering techniques, based on the nucleotide sequence encoding its amino acid sequence, using Escherichia coli, animal cells, insect cells, plant cells, a cell-free protein synthesis system (for example, reticulocyte extract or wheat germ extract) or the like. It can also be chemically synthesized, based on the amino acid sequence information, using a peptide synthesizer or the like.
 <Polynucleotides and vectors>
 The present invention provides a polynucleotide encoding the fusion protein described above. The polynucleotide is not particularly limited, but its codon usage is preferably optimized over the full length of the CDS to suit the host cell into which it is introduced. When a heterologous polynucleotide is expressed, converting its sequence to codons that are frequently used in the host organism can be expected to increase the level of protein expression. Codon usage data for the host can be obtained, for example, from the Codon Usage Database published on the website of the Kazusa DNA Research Institute (http://www.kazusa.or.jp/codon/index.html), or from literature describing codon usage in each host. By referring to the obtained data and the DNA sequence to be introduced, codons in the DNA sequence that are used infrequently in the host may be converted to frequently used codons encoding the same amino acid. Optimization can also be performed using programs provided by various vendors. Examples of such vendors and programs include GenScript (OptimumGene), Integrated DNA Technologies (Codon Optimization) and Eurofins Genomics (GENEius); among these, codon optimization by GenScript (OptimumGene) is preferable.
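The codon replacement step described above can be sketched in Python as follows. This is a minimal illustration, not the algorithm used by any of the named commercial programs: it replaces each codon with the synonymous codon that has the highest frequency in a supplied usage table. The `TOY_USAGE` values are invented placeholders for demonstration only; real frequencies would have to be taken from a codon usage database for the chosen host.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder frequencies (NOT real host data), for illustration only.
TOY_USAGE = {"CTG": 0.40, "CTA": 0.07, "GCC": 0.40, "GCG": 0.11}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str, usage: dict) -> str:
    """Swap each codon for the most frequent synonymous codon in `usage`."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(synonyms, key=lambda c: usage.get(c, 0.0))
        # Keep the original codon unless a strictly more frequent one exists.
        if usage.get(best, 0.0) > usage.get(codon, 0.0):
            codon = best
        out.append(codon)
    return "".join(out)


if __name__ == "__main__":
    original = "ATGCTAGCGTAA"
    optimized = optimize_codons(original, TOY_USAGE)
    print(optimized, translate(original) == translate(optimized))
```

Because only synonymous codons are exchanged, the encoded amino acid sequence is unchanged. Practical optimizers typically also weigh factors such as GC content and repeats, which this sketch ignores.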
 The present invention may also take the form of an expression vector so that the fusion protein encoded by the polynucleotide can be expressed in a host. In the form of an expression vector, it contains one or more regulatory elements operably linked to the polynucleotide (DNA) to be expressed. Here, "operably linked" means that the DNA is linked to the regulatory element in such a way that it can be expressed. Examples of the "regulatory element" include promoters, enhancers, internal ribosome entry sites (IRES), self-cleaving peptides (for example, self-cleaving 2A peptides), and other expression control elements (for example, transcription termination signals such as polyadenylation signals and poly-U sequences). Depending on the purpose, the regulatory element may, for example, direct constitutive expression of the DNA in a variety of host cells, or direct expression of the DNA only in specific cells, tissues or organs. It may also direct expression of the DNA only at a specific time, or direct artificially inducible expression of the DNA.
 Examples of the promoter include polII promoters (for example, the retroviral Rous sarcoma virus (RSV) LTR promoter, the cytomegalovirus (CMV) promoter, the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter and the EF1α promoter), polI promoters, and combinations thereof. The expression vector of the present invention may also contain a selectable marker (a drug resistance gene, an auxotrophy-complementing gene, etc.) so that its introduction into the host cell can be detected.
 A person skilled in the art can select an appropriate expression vector according to, for example, the type of cell into which it is to be introduced. DNA encoding the full length of such an expression vector can be constructed by chemically synthesizing the DNA strand, or, as shown in the Examples below, by joining synthesized, partially overlapping short oligo DNA strands using PCR or Gibson Assembly.
 <Genome editing system>
 The present invention provides a genome editing system comprising the following (A) and (B).
(A) The fusion protein described above, a polynucleotide encoding the protein, or an expression vector containing the polynucleotide.
(B) A guide RNA, a polynucleotide encoding the guide RNA, or an expression vector containing the polynucleotide.
 (A) is as described above. Regarding (B), the guide RNA is a combination of crRNA and tracrRNA compatible with the CRISPR/Cas9 system. The crRNA and tracrRNA may be in a single-molecule form (for example, joined via an intervening sequence) or in a two-molecule form. The crRNA comprises at least a base sequence complementary to the base sequence of the target DNA region (targeting base sequence) and a base sequence capable of interacting with the tracrRNA, in this order from the 5' side. Through the base sequence capable of interacting with the tracrRNA, the crRNA forms double-stranded RNA with the tracrRNA, and the double-stranded RNA thus formed interacts with the Cas9 protein. The Cas9 protein is thereby guided to the target DNA region.
 In the target DNA region, base editing at the target site occurs within the sequence on the DNA that is targeted by the guide RNA (guide RNA target sequence), which is located upstream (on the 5' side) of the PAM sequence recognized by the Cas9 protein. More specifically, editing occurs at a position determined by both the base-pairing complementarity between the guide RNA target sequence and the targeting base sequence complementary to it, and the PAM sequence located on the 3' side of the complementary strand of the guide RNA target sequence. The guide RNA for the Cas9 protein according to the present invention is designed to recognize the complementary strand of the base sequence on the 5' side of the PAM sequence so that the Cas9 protein can recognize the PAM sequence. In other words, the guide RNA target sequence recognized by the guide RNA is designed to be the base sequence of the complementary strand of the base sequence on the 5' side of the PAM sequence.
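The relationship between the PAM and the adjacent target sequence can be illustrated with a short Python sketch that scans both strands of a DNA sequence for candidate sites. The sketch assumes the NGG PAM commonly associated with SpCas9 and a 20-nucleotide spacer; neither value is stated in the passage above, and PAM requirements differ between Cas proteins. The function names are hypothetical, and the design tools listed in the text perform additional checks (such as off-target scoring) that are not modeled here.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_candidate_targets(seq: str, spacer_len: int = 20):
    """List (strand, start, spacer, pam) for spacers directly 5' of an NGG PAM.

    `start` is 0-based on the strand that was scanned; for the "-" strand
    it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


if __name__ == "__main__":
    # Toy sequence: a 20-nt spacer followed by an AGG PAM.
    for hit in find_candidate_targets("GATTACAGATTACAGATTACAGG"):
        print(hit)
```

Each reported spacer lies immediately on the 5' side of its PAM on the scanned strand, matching the geometry described in the paragraph above.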
 Various methods are known for designing such a PAM sequence and guide RNA target sequence for the Cas9 protein. They can be determined using, for example, CRISPR Design Tool (http://crispr.mit.edu/) (Massachusetts Institute of Technology), E-CRISP (http://www.e-crisp.org/E-CRISP/), Zifit Targeter (http://zifit.partners.org/ZiFiT/) (Zinc Finger Consortium), Cas9 design (http://cas9.cbi.pku.edu.cn/) (Peking University), CRISPRdirect (http://crispr.dbcls.jp/) (University of Tokyo), CRISPR-P (http://cbi.hzau.edu.cn/crispr/) (Huazhong Agricultural University), or Guide RNA Target Design Tool (https://wwws.blueheronbio.com/external/tools/gRNASrc.jsp) (Blue Heron Biotech).
 In some CRISPR/Cas systems, the guide RNA does not include a tracrRNA. An example of a CRISPR/Cas system that uses such a guide RNA as a component is the CRISPR/Cpf1 system.
 A person skilled in the art can prepare the guide RNA according to the present invention, based on its sequence information, using an in vitro transcription system (for example, a T7 DNA polymerase system) with DNA encoding the guide RNA as a template.
 An expression vector containing a polynucleotide encoding the guide RNA according to the present invention can be prepared as appropriate by a person skilled in the art, as with the fusion protein described above. As the promoter contained in this expression vector, a polIII promoter (for example, the U6, H1, SNR6, SNR52, SCR1 or RPR1 promoter) is suitably used because it readily transcribes short coding regions efficiently.
 The DNA editing system of the present invention may constitute a kit containing a combination of the fusion protein of the present invention, the guide RNA and the like. The kit may further contain one or more additional reagents. Examples of such additional reagents include, but are not limited to, dilution buffers, reconstitution solutions, wash buffers, nucleic acid introduction reagents, protein introduction reagents and control reagents. The kit may also further include instructions for carrying out the method of the present invention.
 (Method for producing cells with edited DNA)
 The present invention provides a method for producing a cell in which DNA has been edited, the method comprising introducing the above genome editing system into the cell.
 The cells used in the present invention may be eukaryotic cells such as animal cells, plant cells, algal cells and fungal cells, or prokaryotic cells such as bacteria and archaea. Eukaryotic cells are preferred.
 Examples of "eukaryotic cells" include animal cells, plant cells, algal cells and fungal cells. Examples of animal cells include mammalian cells as well as cells of fish, birds, reptiles, amphibians and insects.
 "Animal cells" include, for example, cells constituting an individual animal, cells constituting an organ or tissue excised from an animal, and cultured cells derived from animal tissue. Specific examples include germ cells such as oocytes and sperm; embryonic cells of embryos at each stage (for example, 1-cell stage embryos, 2-cell stage embryos, 4-cell stage embryos, 8-cell stage embryos, 16-cell stage embryos and morula stage embryos); stem cells such as induced pluripotent stem (iPS) cells and embryonic stem (ES) cells; and somatic cells such as fibroblasts, hematopoietic cells, neurons, muscle cells, bone cells, hepatocytes, pancreatic cells, brain cells and kidney cells. As oocytes for producing genome-edited animals, oocytes before or after fertilization can be used, but oocytes after fertilization, that is, fertilized eggs, are preferred. Particularly preferably, the fertilized egg is a pronuclear stage embryo.
 "Mammal" is a concept that encompasses humans and non-human mammals. Examples of non-human mammals include artiodactyls such as cattle, wild boars, pigs, sheep and goats; perissodactyls such as horses; rodents such as mice, rats, guinea pigs, hamsters and squirrels; lagomorphs such as rabbits; and carnivores such as dogs, cats and ferrets. The non-human mammal may be livestock or a companion animal (pet), or may be a wild animal.
 Examples of "plant cells" include cells of cereals, oil crops, forage crops, fruits and vegetables. "Plant cells" include, for example, cells constituting an individual plant, cells constituting an organ or tissue separated from a plant, and cultured cells derived from plant tissue. Examples of plant organs and tissues include leaves, stems, shoot apices (growing points), roots, tubers and calli. Examples of plants include rice, corn, banana, peanut, sunflower, tomato, rapeseed, tobacco, wheat, barley, potato, soybean, cotton and carnation, and their propagation materials (for example, seeds, tuberous roots and tubers) are also included.
 Examples of methods for introducing the genome editing system into cells include electroporation, the calcium phosphate method, the liposome method, the DEAE-dextran method, microinjection, cationic lipid-mediated transfection, transduction, and infection using a viral vector. Such methods are described in many standard laboratory manuals, such as "Leonard G. Davis et al., Basic methods in molecular biology, New York: Elsevier, 1986".
 A person skilled in the art can confirm that the desired DNA editing has occurred in the cells produced in this way by using known DNA analysis techniques (for example, PCR, sequencing or Southern blotting).
 In addition, comprehensive sequencing using next-generation sequencing (NGS) or single-molecule sequencing makes it possible to confirm not only the desired DNA editing but also the presence or absence of off-target effects. The next-generation sequencing method is not particularly limited; examples include sequencing-by-synthesis (for example, sequencing with the Illumina Solexa Genome Analyzer, HiSeq (registered trademark), NextSeq, MiSeq or MiniSeq), pyrosequencing (for example, sequencing with the GSLX or FLX sequencer manufactured by Roche Diagnostics (454), so-called 454 sequencing), and sequencing by ligation (for example, sequencing with SOLiD (registered trademark) or 5500xl manufactured by Life Technologies). Examples of single-molecule sequencing methods include the PacBio RS II or PacBio Sequel system manufactured by Pacific Biosciences of California, and PromethION, GridION or MinION manufactured by Oxford Nanopore Technologies.
 The present invention is described more specifically below on the basis of Examples, but the present invention is not limited to the following Examples. The materials and methods used in the Examples are described first.
 (Preparation of base editor expression plasmids)
 All base editor expression plasmids were prepared using the same backbone sequence as that used in pCMV-BE3 (Addgene, catalog number: 73021).
 The BE4 plasmid (pCMV-BE4), BE4max plasmid (pCMV-BE4max), ABE7.10 plasmid (pCMV-ABE7.10) and ABEmax plasmid (pCMV-ABEmax), which have the same backbone, were obtained from Addgene (catalog numbers: 100802, 112093, 102919 and 112095, respectively).
 The other single-function and dual-function base editors were constructed by joining PCR fragments using Gibson assembly, as follows.
 The Target-AID plasmid (pCMV-Target-AID) was constructed by assembling two fragments encoding the N-terminal and C-terminal halves of Target-AID together with a backbone fragment amplified from pCMV-ABE7.10 using primer pair RS047/RS8. The two fragments were amplified from pcDNA3.1_pCMV-nCas-PmCDA1-ugi pH1-gRNA(HPRT) (Addgene, catalog number: 79620) using primer pairs RS045/HM129 and HM128/RS046, respectively.
 To construct the Target-AIDmax plasmid (pCMV-Target-AIDmax), the pUC-optimized-PmCDA1-ugi plasmid, which encodes the codon-optimized C-terminal region of Target-AIDmax, was first constructed using GenScript's gene synthesis service. The C-terminal fragment was amplified with primer pair SI1304/SI1307 and joined to an nCas9 fragment amplified from pCMV-BE4max using primer pair SI945/SI1308 and a backbone fragment amplified from pCMV-ABEmax using primer pair SI1310/SI1309.
 The BE4max(C) plasmid (pCMV-BE4max(C)) was constructed by replacing the C-terminal region of Target-AIDmax with the codon-optimized rAPOBEC1 and the two UGI domains of BE4max. To this end, the nCas9 fragment obtained from pCMV-Target-AIDmax using primer pair SI447/SI1105 was joined to the rAPOBEC1 fragment and the two-UGI fragment obtained from BE4max using primer pairs SI1352/SI1357 and SI1359/SI1350, respectively, and to the backbone fragment obtained from pCMV-BE4max using primer pair SI1351/SI448.
 The Target-ACE plasmid (pCMV-Target-ACE) was constructed by joining a fragment encoding the plasmid backbone, ABE7.10 amplified from pCMV-ABE7.10 using primer pair RS047/RS052, and a fragment encoding the C-terminal region of Target-AID amplified from pcDNA-pCMV-nCas9 using primer pair RS05RS/RS04.
The Target-ACEmax plasmid (pCMV-Target-ACEmax) was constructed by ligating the ABEmax fragment obtained from pCMV-ABEmax with primer pair SI945/SI1305, a fragment encoding the C-terminal region of Target-AIDmax obtained from pUC-optimized-PmCDA1-ugi with primer pair SI1304/SI1307, and a fragment encoding the plasmid backbone obtained from pCMV-ABEmax with primer pair SI1310/SI1309.
The ACBEmax plasmid (pCMV-ACBEmax) was constructed by ligating the ABEmax fragment obtained from pCMV-Target-ACEmax with primer pair SI447/SI1105 to the three fragments, prepared for constructing pCMV-BE4max(C), that encode the rAPOBEC1 domain, the 2×UGI domain, and the backbone. Note that the two UGI domains carry a non-synonymous nucleotide substitution in the GS linker between the tandem UGIs (SGGSG[G>E]SGGS).
(Preparation of gRNA expression plasmids)
gRNA spacer inserts were prepared by a single-pot reaction that phosphorylates and anneals an ssDNA pair. To prepare each spacer fragment, a T4 polynucleotide kinase reaction sample containing the two ssDNAs was prepared according to the manufacturer's (Takara) protocol and placed in a thermal cycler. It was incubated at 37°C for 30 minutes and at 95°C for 5 minutes, then subjected to 70 cycles of 12 seconds each, starting at 95°C and decreasing by 1°C per cycle, and finally held at 25°C.
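The annealing program described above can be written out step by step. The following is a minimal sketch, assuming that the first of the 70 ramp cycles runs at 95°C (so the last runs at 26°C, just before the 25°C hold); the function name `annealing_program` is hypothetical and is not part of the original protocol.

```python
def annealing_program(start_c=95, step_c=1, cycles=70, seconds_per_cycle=12):
    """Return (temperature in deg C, duration in seconds) steps before the 25 C hold."""
    # Kinase step, then denaturation, as stated in the text.
    steps = [(37, 30 * 60), (95, 5 * 60)]
    # Assumption: cycle 1 is at start_c, each later cycle is step_c lower.
    for k in range(cycles):
        steps.append((start_c - k * step_c, seconds_per_cycle))
    return steps


program = annealing_program()
ramp = program[2:]
total_minutes = sum(seconds for _, seconds in program) / 60
print(len(ramp), ramp[0][0], ramp[-1][0], total_minutes)
```

Under these assumptions the ramp alone takes 14 minutes, and the whole program takes 49 minutes before the hold.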
The annealed spacer insert was then ligated into the pU6-gRNA cloning backbone (pSI-356) by Golden Gate assembly using BsmBI (NEB) and T4 DNA ligase (NEB). The assembly was performed in a thermal cycler under the following conditions.
15 cycles of 37°C for 5 minutes, 20°C for 5 minutes, and 55°C for 30 minutes, followed by a hold at 4°C.
(Preparation of lentiviral base editing reporter plasmids)
As shown in FIG. 3A, the C→T reporter system was designed so that C→T base editing on the antisense strand changes GTG to ATG and restores the start codon. As shown in FIG. 3B, the A→G reporter system was designed so that A→G base editing on the antisense strand changes TAA to CAA and destroys the stop codon, allowing translation of the downstream EGFP to proceed.
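The strand logic of the two reporters can be checked with a few lines of code. This is a minimal sketch: the helper `edit_opposite_strand` is hypothetical, and it assumes that the edited base is the one paired with the first base of the sense codon, which is consistent with the GTG→ATG and TAA→CAA changes stated in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def edit_opposite_strand(sense_codon, sense_index, edit_from, edit_to):
    """Apply an edit_from->edit_to change on the antisense strand and
    return the resulting sense-strand codon."""
    antisense_base = COMPLEMENT[sense_codon[sense_index]]
    if antisense_base != edit_from:
        raise ValueError("antisense base at this position is not editable")
    # The sense strand ends up with the complement of the edited base.
    new_sense_base = COMPLEMENT[edit_to]
    return sense_codon[:sense_index] + new_sense_base + sense_codon[sense_index + 1:]


# C->T reporter: antisense C->T turns sense GTG into the start codon ATG.
print(edit_opposite_strand("GTG", 0, "C", "T"))
# A->G reporter: antisense A->G turns the stop codon TAA into CAA.
print(edit_opposite_strand("TAA", 0, "A", "G"))
```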
To construct the lentiviral C→T base editing reporter plasmid (pLV-SI-112) and its positive control plasmid (pRS112), the reporter cassette fragments were amplified from pLV-eGFP (Addgene, catalog number 36083) as the template using primer sets 112-V4-BC2-FW/SI680 and RS204/SI666, respectively. Each fragment was inserted into the EcoRI and BamHI recognition sites of the pLVSIN-CMV-Puro backbone vector (Takara) using T4 DNA ligase (NEB) and cloned.
The lentiviral A→G base editing reporter plasmid (pLV-SI-121) and its positive control plasmid (pLV-SI-122) were constructed in the same manner as the lentiviral C→T base editing reporter plasmid, except that the reporter cassette fragments were amplified using primer sets SI760/SI680 and SI761/SI680, respectively.
The sequences of all plasmids prepared as described above were confirmed by Sanger sequencing.
(Selection of gRNA target sites)
To select gRNA target sites containing cytosines and adenines for the amplicon sequencing assay, target sites with polycytosine repeats (poly-C), polyadenine repeats (poly-A), alternating polyadenine/cytosine repeats (poly-AC), and alternating polycytosine/adenine repeats (poly-CA) were first searched for in the human genome (hg19). Poly-C target sites were required to have cytosines filling a 7-bp sliding window whose 5' end shifted from -24 bp to -16 bp relative to the PAM at 2-bp intervals. Poly-A target sites were required to have adenines filling a 6-bp sliding window whose 5' end shifted from -21 bp to -13 bp relative to the PAM at 2-bp intervals. Poly-AC and poly-CA target sites were required to have the respective repeat filling a 6-bp sliding window whose 5' end shifted from -24 bp to -14 bp relative to the PAM at 2-bp intervals. Candidate target sites that contained a homopolymer of 4 bp or more within the gRNA seed region spanning -8 bp to -1 bp, and those that overlapped annotated exons, were excluded. For each sliding window position of poly-C and poly-A, the two candidate sites with the highest predicted gRNA activity scores were selected. Similarly, one candidate site was selected for each sliding window position of poly-AC and poly-CA.
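Two of the selection rules above are easy to express in code: the enumeration of sliding window start positions and the seed-region homopolymer filter. The sketch below is illustrative only. It assumes that a protospacer is given 5'→3' with its last base at position -1 (adjacent to the PAM), and the function names are hypothetical.

```python
import re


def window_starts(first, last, step=2):
    """5'-end positions (relative to the PAM) visited by a sliding window."""
    return list(range(first, last + 1, step))


def has_seed_homopolymer(protospacer, seed_len=8, min_run=4):
    """True if the seed region (-8 bp to -1 bp) contains a homopolymer
    of min_run bases or more."""
    seed = protospacer[-seed_len:].upper()
    # (.)\1{3,} matches any base repeated at least 4 times in a row.
    return re.search(r"(.)\1{%d,}" % (min_run - 1), seed) is not None


print(window_starts(-24, -16))   # poly-C windows
print(window_starts(-21, -13))   # poly-A windows
print(window_starts(-24, -14))   # poly-AC and poly-CA windows
print(has_seed_homopolymer("ACGTACGTACGTAAAACGTC"))
```

Under this reading there are five window positions each for poly-C and poly-A and six for poly-AC and poly-CA.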
Using amplicon sequencing primers designed for these target sites, 7 poly-C, 7 poly-A, 6 poly-AC, and 4 poly-CA target sites were further screened for robust amplification with the inventors' amplicon sequencing library preparation protocol. The selected poly-C and poly-A target sites contain both cytosine and adenine bases. In addition to these 24 target sites, 24 target sites were screened from a gRNA library collection that the inventors had previously prepared for another assay. Each of these target sites was required to contain at least one cytosine and at least one adenine in the region from -20 bp to -14 bp. Because EGFP control data could not be obtained in the amplicon sequencing assay for CUL3-NGG site 2, a total of 47 on-target sites were analyzed.
For off-target site analysis by amplicon sequencing, two FANCF on-target sites and one EMX1 on-target site, each having both cytosine and adenine bases in the region from -20 bp to -14 bp, were selected together with many of their off-target sites identified by GUIDE-seq in previous reports (Tsai, S.Q. et al. Nat Biotechnol 33, 187-197 (2015) and Kleinstiver, B.P. et al. Nature 529, 490-495 (2016)). Five of the previously reported off-target sites were selected for each on-target site, but one of the EMX1 off-target sites was excluded from the analysis because it was not amplified by the amplicon sequencing library preparation protocol.
(Cell culture)
HEK293Ta cells were purchased from GeneCopoeia and maintained at 37°C under 5% CO2 in Dulbecco's modified Eagle's medium (DMEM; Sigma) supplemented with 10% fetal bovine serum (FBS; Thermo Fisher Scientific) and 1% penicillin-streptomycin (Sigma). The cells were routinely checked for mycoplasma contamination by nested PCR using the culture medium as the template.
(Cell lines carrying base editing reporter systems)
The C→T and A→G reporter systems, and positive control systems each carrying the expected mutation, were introduced into human embryonic kidney HEK293Ta cells by lentiviral transduction.
For lentiviral packaging, approximately 2 × 10^5 cells/well were seeded in a 6-well plate one day before transfection. In each packaging reaction, 489 ng of the lentiviral plasmid and 366 ng and 122 ng of the two helper plasmids psPAX2 (Addgene, catalog number 12260) and pMD2.G (Addgene, catalog number 12259), respectively, were co-transfected with 9.38 μL of 1 mg/mL polyethyleneimine MAX (PEI; Polysciences) in 300 μL of phosphate-buffered saline (PBS). The next day, the culture medium was replaced with fresh medium, and two days later the culture supernatant containing lentiviral particles was collected and dispensed into 1.5 mL tubes. The virus samples were stored at -80°C until infection.
For lentiviral infection, cells were seeded in a 6-well plate at approximately 2 × 10^5 cells/well in 2 mL of DMEM and incubated for 24 hours. The virus supernatant was thawed at room temperature, mixed with 1 μL of 8 mg/mL polybrene (Sigma), and added to each cell sample. The day after infection, approximately 5 × 10^3 infected cells were reseeded in a 96-well culture plate to measure the functional titer by the CellTiter-Glo assay (Promega). Two days after infection, 2.0 μg/mL puromycin (Thermo Fisher Scientific) was added to the medium, followed by incubation for 3 days to select successfully transduced cells. After puromycin selection, the reporter cell lines and the corresponding positive control cell lines were confirmed to show no background fluorescence and to express EGFP, respectively.
(Transfection for the EGFP reporter activation assay)
Cells carrying a base editing reporter system and the corresponding control cells were seeded in a collagen I-coated 24-well plate (AS ONE) at a density of approximately 5 × 10^4 cells/well in 500 μL of DMEM. The next day, 1.2 μL of 1 mg/mL PEI, 50 μL of PBS, 300 ng of base editor expression plasmid, and 100 ng of gRNA expression plasmid were mixed and incubated at room temperature for 20 minutes before being added to each well for transfection.
In experiments using a mixture of base editors, two base editor expression plasmids were mixed at a mass ratio of 1:1 and 300 ng of the mixture was used. Fluorescence imaging was performed 3 days after transfection using an InCell Analyzer 6000 confocal microscope (GE Healthcare) equipped with a 20× objective lens. The experiment was repeated four times. In one of the four replicates, cell nuclei were stained with 10 mg/mL Hoechst 33342 (Thermo Fisher Scientific).
(Transfection for genomic on-target and off-target assays)
HEK293Ta cells were seeded in a collagen I-coated 96-well plate (AS ONE) at a density of approximately 5 × 10^3 cells/well in 200 μL of DMEM. The next day, 0.48 μL of PEI, 50 μL of 1× PBS, 120 ng of base editor expression plasmid or control EGFP expression plasmid (pLV-eGFP), and 40 ng of gRNA expression plasmid were mixed and incubated at room temperature for 15 minutes before being added to each well for transfection. This experiment was independently repeated three times.
(Transfection for genome-wide DNA and RNA off-target assays)
HEK293Ta cells were seeded in a collagen I-coated 6-well plate (AS ONE) at a density of approximately 2 × 10^5 cells/well in 2 mL of DMEM. The next day, 3.0 μL of 1 mg/mL PEI, 200 μL of 1× PBS, 666 ng of base editor expression plasmid or control EGFP expression plasmid (pLV-eGFP), and 333 ng of EMX1-targeting gRNA plasmid were mixed, incubated at room temperature for 15 minutes, and added to each well. This experiment was independently repeated twice.
(Amplicon sequencing in the EGFP reporter activation assay)
After confocal imaging, the culture medium was aspirated and 200 μL of freshly prepared 50 mM NaOH was added to each cell sample in the 24-well plate. Then, 100 μL of each lysate was transferred to a 96-well PCR plate (Nippon Genetics), heated at 95°C for 15 minutes, cooled to 4°C, and neutralized by adding 20 μL of 1 M Tris-HCl (pH 8.0). Using the resulting cell lysate as the template, the target region in each sample was amplified with the corresponding 1st HTS primers.
PCR was performed in a 20 μL reaction containing 1 μL of template, 1 μL of each 10 μM primer, 0.2 μL of Phusion DNA polymerase, 5× Phusion HF Buffer (NEB), and 1.6 μL of 2.5 mM dNTPs under the following thermal cycling conditions.
98°C for 30 seconds; 30 cycles of 98°C for 10 seconds, 60°C for 10 seconds, and 72°C for 10 seconds; and a final extension at 72°C for 5 minutes.
The PCR products thus obtained were subjected to electrophoresis on a 2% agarose gel. Then, 1 μL of a 10-fold dilution of each product was used as the template and re-amplified in a 20 μL reaction containing custom Illumina index primers under the following thermal cycling conditions.
98°C for 30 seconds; 15 cycles of 98°C for 10 seconds, 60°C for 10 seconds, and 72°C for 30 seconds; and a final extension at 72°C for 5 minutes.
Each indexed library thus obtained was subjected to electrophoresis on a 2% agarose gel, and the band of the expected size was recovered using the FastGene Gel/PCR Extraction Kit (Nippon Genetics).
(Amplicon sequencing in genomic on-target and off-target assays)
Three days after transfection, the culture medium was removed and 50 μL of freshly prepared 50 mM NaOH was added to each cell sample in the 96-well plate. The cell samples were transferred to a 96-well qPCR plate (Bio-Rad), sealed with an optically clear adhesive PCR seal (Bio-Rad), centrifuged at 2,400 rpm for 2 minutes, heated at 95°C for 15 minutes, cooled to 4°C, and neutralized by adding 5 μL of 1 M Tris-HCl (pH 8.0). The plate containing the cell lysates was centrifuged again and stored at -20°C. Each target region was amplified with the corresponding 1st HTS primer pair.
PCR was performed in a 20 μL reaction containing 2 μL of genomic DNA template, 1.20 μL of each 8.3 μM primer, 0.2 μL of Phusion DNA polymerase, 5× Phusion HF Buffer, and 1.6 μL of 2.5 mM dNTPs under the following thermal cycling conditions.
98°C for 30 seconds; 30 cycles of 98°C for 10 seconds, 60°C for 10 seconds, and 72°C for 60 seconds; and a final extension at 72°C for 5 minutes.
For each replicate experiment, 3 μL of each PCR product obtained with the same base editor reagent was pooled and purified using 1.8× volume of Agencourt AMPure XP magnetic beads (Beckman Coulter).
The purified product was amplified by PCR in a 20 μL reaction using 1 μL of the 10 ng/μL first PCR product as the template and custom Illumina index primers under the following thermal cycling conditions.
98°C for 30 seconds; 15 cycles of 98°C for 10 seconds, 65°C for 10 seconds, and 72°C for 90 seconds; and a final extension at 72°C for 5 minutes.
Each indexed library thus obtained was subjected to electrophoresis on a 2% agarose gel, and the band of the expected size was recovered using the FastGene Gel/PCR Extraction Kit (Nippon Genetics).
For multiplexing, the sequencing libraries were quantified by qPCR using the KAPA Library Quantification Kit for Illumina (KAPA Biosystems). The multiplexed library was run on an Illumina HiSeq 2500 (TruSeq Rapid SBS kit; 2×151 bp paired-end) or MiSeq (MiSeq v3 kit; 2×2200 bp paired-end) with 20% to 30% PhiX as a control, which was quantified by the same qPCR protocol.
(Preparation of RNA-seq and WES libraries)
Three days after transfection, the cells were detached with trypsin and split into two 1.5 mL tubes for transcriptome sequencing (RNA-seq) and whole-exome sequencing (WES).
For RNA-seq library preparation, the cells were centrifuged at 1,000 rpm for 5 minutes and the culture supernatant was removed. Total RNA was then extracted using ISOSPIN Cell & Tissue RNA (Nippon Gene), and RNA-seq libraries were prepared using the TruSeq Stranded mRNA Library Prep Kit (Illumina).
For WES library preparation, genomic DNA was extracted using NucleoSpin Tissue (Macherey-Nagel). Then, 500 ng of genomic DNA in a 50 μL solution was fragmented to an average size of 150 to 300 bp using a sonicator (Covaris E-220; Wakenyaku Co., Ltd.), followed by end repair, A-tailing, and SureSelect adapter ligation (Agilent).
The adapter-ligated DNA was enriched using the KAPA HyperPrep kit (KAPA Biosystems). Then, 750 ng of the pre-amplified DNA was hybridized to SureSelectXT Human All Exon V3 kit probes (Agilent) for 20 hours.
To index the libraries, post-capture DNA library amplification was performed using KAPA DNA polymerase and SureSelect Indexing post-capture PCR primers. Finally, the libraries were purified using Agencourt AMPure XP beads. The fragment size distributions and yields of the RNA-seq and WES libraries were quantified using a LabChip GX electrophoresis system (PerkinElmer). After multiplexing, the final libraries were sequenced on an Illumina NovaSeq 6000 (S2 flow cell; 2×101 bp paired-end).
(Amplicon sequencing analysis)
First, to identify the custom sample indices and demultiplex the paired-end reads, common adapter sequences were mapped to the amplicon sequencing reads using NCBI BLAST+ (version 2.7.0) with the blastn-short option. Next, the paired-end reads of each sample were merged using FLASH (version 1.2.0), and the merged sequencing reads were further mapped to the corresponding reference sequence of the target region using the EMBOSS needle package (version 6.6.0) with an identity threshold of 80%.
In the single and multiple editing spectrum analyses using genomic on-target and off-target amplicon sequencing data, the EGFP transfection control data were always used in the same way to subtract background signals.
(RNA and DNA variant detection pipeline)
RNA-seq and WES base call files were demultiplexed using bcl2fastq2 (version v2.20.0). To eliminate sequencing coverage bias in variant calling, all RNA-seq and WES libraries were randomly subsampled to 74 million and 94 million reads per sample, respectively, using seqtk (version 1.3-r107-dirty). The subsampled RNA-seq reads were then mapped to the reference human genome GRCh38.d1.vd1 using STAR (version 2.7.3a) with the transcript annotation GTF (GENCODE release 22, GRCh38.p2) and deduplicated using Picard MarkDuplicates (version 2.0.1). The subsampled DNA reads were processed for mapping according to the National Cancer Institute Genomic Data Commons DNA-Seq analysis pipeline. That is, reads were aligned to the reference human genome GRCh38.d1.vd1 with BWA-MEM (version 0.7.15), and PCR duplicates were removed using Picard MarkDuplicates (version 2.0.1). GATK HaplotypeCaller (version 4.1.4.1) was used to detect both DNA and RNA variants against the reference human genome GRCh38.d1.vd1. The variant positions detected by GATK HaplotypeCaller were further filtered as described in Grünewald, J. et al. Nat Biotechnol 37, 1041-1048 (2019).
(Estimation of DNA off-target risk)
To examine the DNA off-target effects of the various base editing methods caused by nonspecific binding of the targeting gRNA, three gRNAs targeting the EMX1 and FANCF genes were used, for which off-target sites have been commonly observed by GUIDE-seq (Tsai, S.Q. et al. Nat Biotechnol 33, 187-197 (2015)). After HEK293Ta cells were transfected with the base editor and on-target gRNA reagents, base editing efficiencies were analyzed by amplicon sequencing at five off-target sites for FANCF target site 1, five off-target sites for FANCF target site 2, and four off-target sites for the EMX1 target site. Across all tested off-target sites, the median of the relative off-target activities, each normalized to the corresponding on-target activity, was calculated as the DNA off-targeting risk score of each base editing method.
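The risk score described above reduces to a median of ratios. The following sketch shows that calculation for one base editing method. The function name, the dictionary layout, and all numeric values are hypothetical and are used only for illustration.

```python
from statistics import median


def off_targeting_risk_score(on_target, off_target):
    """on_target: gRNA name -> on-target editing efficiency.
    off_target: gRNA name -> list of off-target editing efficiencies.
    Returns the median off-target activity relative to on-target activity."""
    relative = []
    for guide, efficiencies in off_target.items():
        # Normalize each off-target site to its own on-target activity.
        relative.extend(e / on_target[guide] for e in efficiencies)
    return median(relative)


# Hypothetical efficiencies (fractions of edited reads), not measured data.
on_target = {"FANCF_site1": 0.50, "EMX1": 0.40}
off_target = {"FANCF_site1": [0.05, 0.10], "EMX1": [0.02]}
print(off_targeting_risk_score(on_target, off_target))
```

In the setting described in the text, the pooled list would hold 14 ratios (5 + 5 + 4 off-target sites) per base editing method.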
(Base editing prediction model)
Model training
A conditional probability model was trained using the amplicon sequencing datasets for various target sites. To minimize the influence of potential sequencing errors on training, observed editing outcomes with a relative read frequency of less than 1 × 10^-4 were first removed. The nucleotide base transition state at position i bp from the PAM in a target site was denoted s_i, and its probability P(s_i). For each target region, P(s_i) and P(s_j|s_i) were calculated for every combination of i and j in the specified region (i ≠ j). The trained model was finally constructed from the mean values of P(s_i) and P(s_j|s_i) across the various training target sites observed in 100 or more reads.
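The per-site training step can be pictured as counting single and pairwise states in the observed outcomes. The sketch below is one possible implementation for a single target site, not the inventors' code. It assumes that each outcome is a tuple of states aligned with a list of positions, that frequencies are renormalized after the low-frequency filter, and that averaging across sites is done separately. The name `train_site` and the toy counts are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def train_site(positions, outcome_counts, min_freq=1e-4):
    """Return (P(s_i), P(s_j | s_i)) estimated from one target site.
    outcome_counts maps a tuple of per-position states to a read count."""
    total = sum(outcome_counts.values())
    # Drop rare outcomes that may reflect sequencing errors.
    kept = {o: c for o, c in outcome_counts.items() if c / total >= min_freq}
    kept_total = sum(kept.values())
    single = defaultdict(int)   # (i, s_i) -> reads
    pair = defaultdict(int)     # ((i, s_i), (j, s_j)) -> reads
    for outcome, count in kept.items():
        tagged = list(zip(positions, outcome))
        for a in tagged:
            single[a] += count
        for a, b in permutations(tagged, 2):   # ordered pairs with i != j
            pair[(a, b)] += count
    p_single = {a: c / kept_total for a, c in single.items()}
    # Key (a, b) holds P(state b | state a).
    p_cond = {(a, b): c / single[a] for (a, b), c in pair.items()}
    return p_single, p_cond


# Toy read counts for two positions; states are the observed bases.
counts = {("T", "A"): 60, ("C", "A"): 30, ("T", "G"): 10}
p_single, p_cond = train_site([-18, -17], counts)
print(p_single[(-18, "T")], p_cond[((-18, "T"), (-17, "G"))])
```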
Prediction of base editing outcomes
Let S_{m,n} denote a base editing pattern in the window spanning m bp to n bp from the PAM, which can be expressed as the string of transition states s_m, s_{m+1}, ..., s_{n-1}, s_n. Using the trained model, the frequency of a given outcome S_{m,n} at a test target site was predicted by the following formula.
Figure JPOXMLDOC01-appb-M000001
In essence, to predict the frequency of a particular editing outcome involving multiple base transitions, this prediction model calculates the geometric mean of the base transition probabilities at all edited positions, taking into account the other, independent base transition patterns. A particular P(s_j|s_i) lacking training data is ignored (treated as 1), and a P(s_i) lacking training data is set to 0.
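The formula itself is given only as a figure that is not reproduced here, so the following sketch is an assumed reading of the verbal description, not the patent's formula. For each position i it multiplies P(s_i) by the available P(s_j|s_i) terms and then takes the geometric mean over positions, with the two missing-data rules stated in the text. The name `predict_outcome` and the toy probabilities are hypothetical.

```python
import math


def predict_outcome(states, p_single, p_cond):
    """states: list of (position, state) pairs describing one outcome.
    p_single[(i, s_i)] = P(s_i); p_cond[(a, b)] = P(state b | state a)."""
    if not states:
        raise ValueError("an outcome needs at least one position")
    logs = []
    for a in states:
        p = p_single.get(a, 0.0)              # missing P(s_i) is 0
        for b in states:
            if b != a:
                p *= p_cond.get((a, b), 1.0)  # missing P(s_j|s_i) is treated as 1
        if p == 0.0:
            return 0.0
        logs.append(math.log(p))
    # Geometric mean of the per-position estimates (assumed form).
    return math.exp(sum(logs) / len(logs))


p_single = {(-18, "T"): 0.7, (-17, "G"): 0.1}
p_cond = {((-18, "T"), (-17, "G")): 1 / 7, ((-17, "G"), (-18, "T")): 1.0}
print(predict_outcome([(-18, "T"), (-17, "G")], p_single, p_cond))
```

In this toy case both positions give the same joint estimate (0.7 × 1/7 and 0.1 × 1.0), so their geometric mean equals that common value.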
Validation of the base editing prediction model
The base editing prediction model was evaluated by 5-fold, 15-fold, and leave-one-out cross-validation experiments. For each tested target site, the frequencies of all base editing outcome patterns detected in the amplicon sequencing dataset for the window spanning -25 bp to -5 bp from the PAM were predicted by training on the amplicon sequencing data of the other target sites, which did not overlap with the tested site. In the k-fold cross-validation experiments, [47/k] target sites were randomly selected from the 47 target sites as test samples. Their editing outcomes were predicted by training on the amplicon sequencing data of the remaining target sites, and this was repeated 100 times with randomly changed test samples. For leave-one-out cross-validation, the editing outcomes of each target site were predicted using the amplicon sequencing data of the other 46 target regions. Prediction performance was measured by first converting the predicted editing frequencies into relative editing frequencies among all edited reads, randomly selecting one prediction result for each editing outcome pattern when there were multiple, and calculating Pearson's correlation coefficient between the predictions and the experimental measurements.
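The repeated random splits and the correlation metric can be sketched as follows. This is an illustration under stated assumptions: [47/k] is read as the floor of 47/k (9 test sites for k = 5 and 3 for k = 15), the random seed is arbitrary, and the function names are hypothetical.

```python
import random


def kfold_trials(sites, k, repeats=100, seed=0):
    """Yield (train, test) splits with floor(len(sites) / k) random test sites."""
    rng = random.Random(seed)
    n_test = len(sites) // k          # assumed meaning of [47/k]
    for _ in range(repeats):
        test = rng.sample(sites, n_test)
        held_out = set(test)
        train = [s for s in sites if s not in held_out]
        yield train, test


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


sites = list(range(47))               # placeholder identifiers for the 47 sites
trials = list(kfold_trials(sites, k=5))
print(len(trials), len(trials[0][1]), len(trials[0][0]))
```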
 (Simulation of co-editing frequencies in synthetic target sequences)
 To predict multidimensional co-editing spectra for the various base editing methods using the base editing prediction model, 100 synthetic target sequences consisting only of cytosine and/or adenine bases in the region from -20 bp to -1 bp relative to the PAM were simulated. For each target sequence, all possible outcomes involving C→T and A→G edits (2^20 outcomes in total) were predicted using the base editing prediction model trained on all 47 amplicon sequencing datasets. The averages of the homologous and heterologous dinucleotide editing spectra were calculated using all predicted frequencies. Trinucleotide editing spectra were also predicted for a simulated sequence in which poly-AC spans -20 bp to -1 bp relative to the PAM.
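The count of 2^20 follows because each of the 20 positions holds a C or an A and can independently be left unedited or edited. A minimal Python sketch of this enumeration is given below; the function name is hypothetical and the short example sequence is used only to keep the output small.

```python
from itertools import product

# Each editable base and the base it becomes after editing.
EDIT = {"C": "T", "A": "G"}


def enumerate_outcomes(seq):
    """Return every sequence reachable from seq by C->T and/or A->G edits."""
    if set(seq) - set(EDIT):
        raise ValueError("sequence must consist only of C and A")
    # Each position contributes two choices: unedited base or edited base.
    options = [(base, EDIT[base]) for base in seq]
    return ["".join(choice) for choice in product(*options)]


outcomes = enumerate_outcomes("CACA")  # 4 positions -> 2**4 = 16 outcomes
n_outcomes_20mer = 2 ** 20             # 1,048,576 outcomes for a 20-nt C/A sequence
```

The enumeration includes the unedited sequence itself, which is why the total is exactly a power of two.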
 (Codon conversion matrix and bystander mutation risk)
 To estimate the codon conversion potential and the bystander mutation risk of each base editing method, codon conversion matrices (CCMs) were generated both with and without allowing bystander mutations, that is, unwanted mutations generated together with the target codon conversion.
 First, for each of the 11,250,496 source codons in the human genome (hg38), candidate gRNA target sites were screened within ±25 bp of the target codon. For every gRNA, the probabilities of all possible C→T and A→G editing patterns within ±15 bp of the target codon triplet were predicted using the base editing prediction model trained on the amplicon sequencing data of all 47 genomic sites. Without bystander mutations, the probability of converting the target source codon to each destination codon was defined as the maximum, over all candidate gRNAs, of the probability of producing the intended outcome. When bystander mutations were allowed, the predicted probabilities of all base editing outcomes that include the target source codon conversion were summed for each gRNA, and the maximum summed probability among the candidate gRNAs was defined as the conversion probability. For codons with no targetable gRNA, the conversion probability was defined as 0 for every destination codon type. After the conversion probabilities to the various destination codons had been calculated for all genomic codons, the CCM was finally generated to show the frequency of each source-destination codon conversion type at a conversion probability threshold of 5%. For each source-destination codon type, the bystander mutation risk score was calculated by dividing the CCM frequency obtained when bystander mutations were allowed by the CCM frequency obtained when they were not.
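The two definitions of conversion probability and the risk score can be sketched in Python as follows. The data layout (one list of `(resulting_codon, has_bystander, probability)` tuples per gRNA), the function names, and the use of "greater than or equal to" for the 5% threshold are assumptions made for illustration only.

```python
def conversion_probability(per_grna_outcomes, destination, allow_bystander):
    """Maximum over gRNAs of the probability of reaching the destination codon.

    per_grna_outcomes : list (one entry per gRNA) of lists of
                        (resulting_codon, has_bystander, probability) tuples
    """
    best = 0.0  # stays 0 when no gRNA can target the codon
    for outcomes in per_grna_outcomes:
        p = sum(
            prob
            for codon, has_bystander, prob in outcomes
            if codon == destination and (allow_bystander or not has_bystander)
        )
        best = max(best, p)
    return best


def ccm_frequency(probabilities, threshold=0.05):
    """Number of source codons whose conversion probability meets the threshold."""
    return sum(1 for p in probabilities if p >= threshold)


def bystander_risk_score(freq_allowing, freq_not_allowing):
    """CCM frequency with bystanders allowed divided by that without."""
    return freq_allowing / freq_not_allowing


# Hypothetical predictions for one source codon targeted by two gRNAs.
grnas = [
    [("TGG", False, 0.04), ("TGG", True, 0.10)],
    [("TGG", False, 0.06), ("TGG", True, 0.01)],
]
p_clean = conversion_probability(grnas, "TGG", allow_bystander=False)
p_any = conversion_probability(grnas, "TGG", allow_bystander=True)
```

In this toy case the best gRNA differs between the two definitions: the second gRNA gives the highest bystander-free probability, while the first gives the highest summed probability once bystander outcomes are counted.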
 (ClinVar analysis)
 For the pathogenic C·G→T·A and A·T→G·C SNVs reported in the ClinVar database, gRNA target sites that could correct each mutation were first screened within ±25 bp of the target mutation. The probability that the mutation is corrected by each gRNA target site without inducing unwanted bystander mutations within ±15 bp of the target mutation was predicted using the base editing prediction model trained on all amplicon sequencing datasets. For each mutation, the correction probability was defined as the maximum probability of the target codon conversion among those obtained with the different gRNAs. Finally, to estimate the overall potential of dual-function base editors to correct two heterologous disease mutations simultaneously, the combinations of two heterologous mutations that were both predicted to be correctable at a correction probability threshold of 5% were counted. To reduce computational cost, the combinations of heterologous mutations were restricted to those within the same gene.
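The final counting step can be sketched in Python as below. The sketch assumes that a "heterologous" pair means one C·G→T·A mutation together with one A·T→G·C mutation, and that the threshold is applied as "greater than or equal to 5%"; the record layout and the function name are hypothetical.

```python
from collections import defaultdict


def count_correctable_pairs(variants, threshold=0.05):
    """Count same-gene pairs of one CG>TA and one AT>GC correctable mutation.

    variants : iterable of (gene, kind, correction_probability),
               with kind being "CG>TA" or "AT>GC"
    """
    per_gene = defaultdict(lambda: {"CG>TA": 0, "AT>GC": 0})
    for gene, kind, probability in variants:
        if probability >= threshold:
            per_gene[gene][kind] += 1
    # Within a gene, every correctable CG>TA pairs with every correctable AT>GC.
    return sum(c["CG>TA"] * c["AT>GC"] for c in per_gene.values())


# Hypothetical correction probabilities.
variants = [
    ("geneA", "CG>TA", 0.20),
    ("geneA", "CG>TA", 0.06),
    ("geneA", "AT>GC", 0.10),
    ("geneA", "AT>GC", 0.01),  # below the 5% threshold
    ("geneB", "CG>TA", 0.50),  # no AT>GC partner in the same gene
]
n_pairs = count_correctable_pairs(variants)
```

Grouping by gene before pairing is what keeps the computation tractable: pairs are only formed inside a gene, never across the whole variant set.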
 (Statistical analysis)
 All genome editing experiments were repeated at least three times with independent cell culture samples. Statistical tests were performed in Python (version 3.7.4) with SciPy (version 1.4.1) or in R (version 3.6.0).
 Examples carried out using the materials and methods described above are described below.
 With the aim of providing a base editing tool that enables C→T and A→G base substitutions at the same time, cytidine deaminase and adenosine deaminase were fused to a single nCas9 (D10A), and three dual-function base editors, Target-ACE, Target-ACEmax, and ACBEmax, were developed and tested (Fig. 1).
 Target-ACE was constructed by fusing PmCDA1 from Target-AID to the C-terminus of nCas9 and the TadA heterodimer from ABE-7.10 to its N-terminus, together with the other functional domains that are consistently present in the single-function base editors. The amino acid sequence of Target-ACE and the nucleotide sequence encoding it are shown in SEQ ID NOs: 4 and 3, respectively. In the amino acid sequence of SEQ ID NO: 4, positions 2 to 167 represent modified TadA (TadA*); positions 168 to 171 and 172 to 175, GGS linkers; positions 176 to 191, an XTEN linker; positions 192 to 195 and 196 to 199, GGS linkers; positions 200 to 365, TadA; positions 366 to 369 and 370 to 373, GGS linkers; positions 374 to 389, an XTEN linker; positions 390 to 393 and 394 to 397, GGS linkers; positions 398 to 1764, SpCas9 (D10A); positions 1765 to 1774, a 2×GS linker; positions 1775 to 1831, mutated SH3; positions 1832 to 1857, 3×FLAG; positions 1858 to 2065, PmCDA1; positions 2066 to 2073, a monopartite nuclear localization signal (derived from the SV40 T antigen); and positions 2077 to 2159, a uracil glycosylase inhibitor.
 Target-ACEmax was constructed from Target-ACE by applying GenScript codon optimization and adding a bipartite nuclear localization signal (NLS) to the N-terminus (Fig. 2). The amino acid sequence of Target-ACEmax and the nucleotide sequence encoding it are shown in SEQ ID NOs: 2 and 1, respectively. In the amino acid sequence of SEQ ID NO: 2, positions 2 to 19 represent the bipartite nuclear localization signal; positions 20 to 185, modified TadA (TadA*); positions 186 to 189 and 190 to 193, GGS linkers; positions 194 to 209, an XTEN linker; positions 210 to 213 and 214 to 217, GGS linkers; positions 218 to 383, TadA; positions 384 to 387 and 388 to 391, GGS linkers; positions 392 to 407, an XTEN linker; positions 408 to 411 and 412 to 415, GGS linkers; positions 416 to 1782, SpCas9 (D10A); positions 1783 to 1792, a 2×GS linker; positions 1793 to 1849, mutated SH3; positions 1850 to 1875, 3×FLAG; positions 1876 to 2083, PmCDA1; positions 2084 to 2091, a monopartite nuclear localization signal (derived from the SV40 T antigen); and positions 2095 to 2177, a uracil glycosylase inhibitor.
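The position lists given for SEQ ID NO: 4 and SEQ ID NO: 2 are mutually consistent: the bipartite NLS at positions 2 to 19 is 18 residues long, and every downstream domain of Target-ACEmax sits exactly 18 positions after its counterpart in Target-ACE. The following Python sketch checks this; the tuple layout and the abbreviated domain labels (for example "bpNLS" and "UGI") are shorthand introduced here, while the coordinates are those stated in the description.

```python
# Domain boundaries (1-based, inclusive) as listed for SEQ ID NO: 4 (Target-ACE).
TARGET_ACE = [
    (2, 167, "TadA*"), (168, 171, "GGS"), (172, 175, "GGS"),
    (176, 191, "XTEN"), (192, 195, "GGS"), (196, 199, "GGS"),
    (200, 365, "TadA"), (366, 369, "GGS"), (370, 373, "GGS"),
    (374, 389, "XTEN"), (390, 393, "GGS"), (394, 397, "GGS"),
    (398, 1764, "SpCas9(D10A)"), (1765, 1774, "2xGS"),
    (1775, 1831, "mutated SH3"), (1832, 1857, "3xFLAG"),
    (1858, 2065, "PmCDA1"), (2066, 2073, "NLS"), (2077, 2159, "UGI"),
]

# Domain boundaries as listed for SEQ ID NO: 2 (Target-ACEmax).
TARGET_ACEMAX = [
    (2, 19, "bpNLS"),
    (20, 185, "TadA*"), (186, 189, "GGS"), (190, 193, "GGS"),
    (194, 209, "XTEN"), (210, 213, "GGS"), (214, 217, "GGS"),
    (218, 383, "TadA"), (384, 387, "GGS"), (388, 391, "GGS"),
    (392, 407, "XTEN"), (408, 411, "GGS"), (412, 415, "GGS"),
    (416, 1782, "SpCas9(D10A)"), (1783, 1792, "2xGS"),
    (1793, 1849, "mutated SH3"), (1850, 1875, "3xFLAG"),
    (1876, 2083, "PmCDA1"), (2084, 2091, "NLS"), (2095, 2177, "UGI"),
]


def domain_length(domain):
    """Number of residues covered by an inclusive (start, end, name) domain."""
    start, end, _name = domain
    return end - start + 1


def shift(domains, offset):
    """Move every domain downstream by a fixed number of residues."""
    return [(start + offset, end + offset, name) for start, end, name in domains]


nls_length = domain_length(TARGET_ACEMAX[0])           # 18 residues
rebuilt = [TARGET_ACEMAX[0]] + shift(TARGET_ACE, nls_length)
consistent = rebuilt == TARGET_ACEMAX
```

A check of this kind is a quick way to catch transcription errors when such coordinate lists are copied between constructs.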
 ACBEmax was constructed by replacing the codon-optimized PmCDA1 domain of Target-ACEmax with a codon-optimized version of rAPOBEC1, the cytidine deaminase domain from BE4max.
 The amino acid sequence of ACBEmax and the nucleotide sequence encoding it are shown in SEQ ID NOs: 6 and 5, respectively. In the amino acid sequence of SEQ ID NO: 6, positions 2 to 19 represent a bipartite nuclear localization signal; positions 20 to 185, modified TadA (TadA*); positions 186 to 189 and 190 to 193, GGS linkers; positions 194 to 209, an XTEN linker; positions 210 to 213 and 214 to 217, GGS linkers; positions 218 to 383, TadA; positions 384 to 387 and 388 to 391, GGS linkers; positions 392 to 407, an XTEN linker; positions 408 to 411 and 412 to 415, GGS linkers; positions 416 to 1782, SpCas9 (D10A); positions 1783 to 1792, a 2×GS linker; positions 1793 to 1849, mutated SH3; positions 1850 to 1875, 3×FLAG; positions 1876 to 2103, APOBEC1; positions 2104 to 2186, a uracil glycosylase inhibitor; positions 2187 to 2196, a 3×GGS linker; positions 2197 to 2279, a uracil glycosylase inhibitor; positions 2280 to 2283, a GGS linker; and positions 2284 to 2300, a bipartite nuclear localization signal.
 In addition, as controls for these dual-function base editors, codon-optimized Target-AIDmax and BE4max(C) were constructed in addition to Target-AID, BE4max, ABE, and ABEmax. BE4max(C) was obtained by replacing the C-terminal PmCDA1 domain of Target-AIDmax with the rAPOBEC1 domain from BE4max.
 To test the base editing activities of the single-function and dual-function base editors in living cells, the present inventors constructed C→T and A→G base editing reporter cells designed so that the corresponding base substitution activates expression of the EGFP protein (Figs. 3A and 3B). All single-function CBEs activated only the C→T reporter cells, and the ABEs activated only the A→G reporter cells (Fig. 4). In contrast, the three dual-function base editors, Target-ACE, Target-ACEmax, and ACBEmax, activated both the C→T and the A→G reporter cells, as did their controls (mixtures of the enzymes corresponding to each dual-function base editor: Target-AID+ABE, Target-AIDmax+ABEmax, BE4max(C)+ABEmax, and BE4max+ABEmax). These results were also confirmed by amplicon sequencing of the gRNA target regions in the reporter cells (Figs. 5A to 5D).
 To characterize the C→T and A→G base editing activities of the various base editing methods, the base editing spectra at 47 genomic target sites in human embryonic kidney (HEK293Ta) cells were analyzed by amplicon sequencing in triplicate (1,833 assays). Taking the average C→T or A→G editing frequency at the cytosines or adenines at each position relative to the PAM revealed that the dual-function base editors inherited base editing characteristics similar to those of the corresponding single-function base editors (Fig. 6). In addition, the C→T and A→G editing frequencies at the 47 endogenous target sites, which contain different numbers of cytosines and adenines, varied widely but were overall consistent with the base editing spectrum data (Fig. 7).
 In the amplicon sequencing data for the 47 genomic target sites, 722 editing outcome patterns involving both C→T and A→G edits were observed at read frequencies of 0.1% or higher for the dual-function base editors and the four controls (the four combinations of corresponding base editors) (Fig. 8). Overall, Target-ACEmax and Target-ACE, together with their corresponding base editor mixtures, formed a cluster of multi-base editing patterns distinct from that of ACBEmax and its corresponding base editor mixtures. To observe co-editing patterns across the 47 different target sites, the homologous and heterologous dinucleotide co-editing spectra were next examined for each of the 13 base editors. The average co-editing frequency of each cytosine-cytosine, adenine-adenine, or cytosine-adenine pair at different positions relative to the PAM was calculated using the amplicon sequencing results of the relevant target sites. For the various base editor reagents, the multiple C→T and multiple A→G editing spectra recapitulated the trends of the average C→T and A→G frequencies at the 47 target sites. For simultaneous C→T and A→G editing, Target-ACEmax and Target-AIDmax+ABEmax showed efficient co-editing spectra, with peak heights of 19.2% and 21.0%, respectively, around the cytosine at -18 bp and the adenine at 15 bp relative to the PAM (Figs. 9 and 10). ACBEmax and BE4max+ABEmax showed a different efficient co-editing spectrum, and their peak frequencies at the cytosine-adenine position combination of -12 and -15 bp (23.2% and 23.6%, respectively) were the highest among all base editor reagents.
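The averaging of pairwise co-editing frequencies over target sites can be sketched as follows. This is a minimal Python illustration under stated assumptions: reads are summarized per site as a mapping from the set of edited positions to a read count, the denominator is taken to be all reads at the site, and the function names are hypothetical.

```python
def co_editing_frequency(read_counts, pos_a, pos_b):
    """Fraction of reads at one site in which both positions are edited.

    read_counts : dict mapping frozenset of edited positions -> number of reads
    """
    total = sum(read_counts.values())
    both = sum(
        n for edited, n in read_counts.items() if pos_a in edited and pos_b in edited
    )
    return both / total


def mean_co_editing(sites, pos_a, pos_b):
    """Average the pairwise co-editing frequency over the relevant target sites."""
    values = [co_editing_frequency(rc, pos_a, pos_b) for rc in sites]
    return sum(values) / len(values)


# Hypothetical read summaries for two target sites that carry a C at -18
# and an A at -15 relative to the PAM.
site_1 = {frozenset(): 70, frozenset({-18}): 10, frozenset({-18, -15}): 20}
site_2 = {frozenset(): 90, frozenset({-18, -15}): 10}
average = mean_co_editing([site_1, site_2], -18, -15)
```

Repeating this calculation for every pair of positions yields a two-dimensional spectrum of the kind summarized by the peak heights reported above.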
 To investigate the DNA off-target effects of the various base editing methods caused by nonspecific gRNA binding, amplicon sequencing was performed for three gRNAs at their on-target sites and at commonly observed off-target sites (Tsai, S.Q. et al. Nat Biotechnol 33, 187-197 (2015).; Kleinstiver, B.P. et al. Nature 529, 490-495 (2016).; Fig. 11). Based on the amplicon sequencing data, off-target risk scores were calculated for the various base editing methods (Fig. 12). The off-target risk scores of the dual-function base editors were within the range of those of the single-function base editors. In contrast, the off-target risk scores of the base editor mixtures were markedly higher than those of the corresponding dual-function base editors. In addition, the genome-wide DNA and RNA off-target activities of the 13 base editors were broadly surveyed by whole-exome sequencing (WES) and transcriptome sequencing (RNA-seq). No elevated levels of single-nucleotide variants (SNVs) were detected in the WES datasets, but the numbers of C→U RNA edits found with the rAPOBEC1-associated, but not the PmCDA1-associated, base editing methods were markedly higher than those of the other samples (322), consistent with previous reports (Grunewald, J. et al. Nature 569, 433-437 (2019).; Grunewald, J. et al. Nat Biotechnol 37, 1041-1048 (2019).) (Figs. 13A and 13B). When ABEmax, or a base editor mixture containing ABE or ABEmax, was used, nonspecific A→I RNA editing activity was significantly higher overall than with the other methods (P value = 0.00019), as previously reported (Grunewald, J. et al. Nature 569, 433-437 (2019).; Zhou, C. et al. Nature 571, 275-278 (2019).; Rees, H.A., Wilson, C., Doman, J.L. & Liu, D.R. Sci Adv 5, eaax5717 (2019).). Notably, the nonspecific A→I editing activity of Target-ACEmax (3,359 on average across two replicates) was relatively lower than that of Target-AIDmax+ABEmax (4,179 on average).
 Similar to recent machine learning approaches that predict the outcomes of genome editing mediated by wild-type Cas9 (Shen, M.W. et al. Nature 563, 646-651 (2018).; Allen, F. et al. Nat Biotechnol, 64-72 (2018).; Chen, W. et al. Nucleic Acids Res, gkz487 (2019).), the present inventors developed a base editing prediction method that is trained on amplicon sequencing data and predicts the base editing patterns of a given target sequence and their frequencies (Fig. 14). In brief, the method succeeded in predicting base editing outcomes for targets not used in training, with Pearson's correlation coefficients of 0.70 and 0.71 for Target-ACEmax and ACBEmax, respectively. It was also demonstrated that the amplicon sequencing data obtained in this example are sufficient for the training procedure. Because the machine learning method can predict multinucleotide co-editing, it was used to predict the frequencies of all codon conversion patterns in the human genome obtainable with the various base editing methods. When bystander mutations were not allowed, this analysis showed that Target-ACEmax and its corresponding base editor mixture, Target-AIDmax+ABEmax, have the highest potential to diversify genomic codons. The same analysis was then repeated with bystander mutations allowed, and the bystander risk of generating unwanted mutations was estimated for all base editing methods (Fig. 15). As a result, the bystander mutation risks of Target-ACEmax and ACBEmax were within the range of that of BE4max, which was the highest among the single-function base editors. The bystander mutation risk score of ACBEmax was significantly lower than that of Target-ACEmax, which is largely consistent with the finding that ACBEmax showed no marked improvement in expanding the genome-wide codon conversion patterns.
 In addition, the present inventors' model predicted that Target-ACEmax has the highest potential to correct pairs of heterologous disease mutations reported in the ClinVar database (Landrum, M.J. et al. Nucleic Acids Res 44, D862-868 (2016).) (Fig. 16).
 As described above, the present invention makes it possible to perform multiple kinds of base substitutions at the same time. The present invention is therefore expected to find wide application in agriculture, forestry, fisheries, livestock farming, medicine, and other fields, including breeding and gene therapy.

Claims (9)

  1.  A fusion protein in which different nucleobase-converting enzymes are bound to the N-terminus and the C-terminus of a Cas protein, wherein the Cas protein has lost part or all of its nuclease activity.
  2.  The fusion protein according to claim 1, wherein the different nucleobase-converting enzymes are adenosine deaminase and cytidine deaminase.
  3.  The fusion protein according to claim 2, wherein the adenosine deaminase is bound to the N-terminus of the Cas protein and the cytidine deaminase is bound to the C-terminus of the Cas protein.
  4.  The fusion protein according to any one of claims 1 to 3, to which at least one of a nuclear localization signal and a uracil glycosylase inhibitor is further bound.
  5.  A polynucleotide encoding the fusion protein according to any one of claims 1 to 4.
  6.  The polynucleotide according to claim 5, wherein the codons are optimized for the host cell into which the polynucleotide is to be introduced.
  7.  An expression vector comprising the polynucleotide according to claim 5 or 6.
  8.  A genome editing system comprising the following (A) and (B):
    (A) the fusion protein according to any one of claims 1 to 4, the polynucleotide according to claim 5 or 6, or the expression vector according to claim 7;
    (B) a guide RNA, a polynucleotide encoding the guide RNA, or an expression vector comprising the polynucleotide.
  9.  A method for producing a cell having edited DNA, the method comprising introducing the genome editing system according to claim 8 into a cell.
PCT/JP2020/021456 2019-05-30 2020-05-29 GENOME EDITING SYSTEM USING Cas PROTEIN HAVING TWO TYPES OF NUCLEIC ACID BASE-CONVERTING ENZYMES FUSED THERETO WO2020241869A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962854605P 2019-05-30 2019-05-30
US62/854,605 2019-05-30

Publications (1)

Publication Number Publication Date
WO2020241869A1 true WO2020241869A1 (en) 2020-12-03

Family

ID=73552147

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/021456 WO2020241869A1 (en) 2019-05-30 2020-05-29 GENOME EDITING SYSTEM USING Cas PROTEIN HAVING TWO TYPES OF NUCLEIC ACID BASE-CONVERTING ENZYMES FUSED THERETO

Country Status (1)

Country Link
WO (1) WO2020241869A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114561392A (en) * 2022-03-22 2022-05-31 绍兴市妇幼保健院 Method for removing HBV e antigen by closing target gene based on base editing technology
CN114561429A (en) * 2022-03-22 2022-05-31 绍兴市妇幼保健院 Treatment method for inhibiting HBV surface antigen based on base editing ATG initiation codon

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017183724A1 (en) * 2016-04-21 2017-10-26 国立大学法人神戸大学 Method for increasing mutation introduction efficiency in genome sequence modification technique, and molecular complex to be used therefor
WO2018027078A1 (en) * 2016-08-03 2018-02-08 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
WO2018213708A1 (en) * 2017-05-18 2018-11-22 The Broad Institute, Inc. Systems, methods, and compositions for targeted nucleic acid editing
JP2019509012A (en) * 2015-10-23 2019-04-04 プレジデント アンド フェローズ オブ ハーバード カレッジ Nucleobase editing factors and uses thereof
WO2020028823A1 (en) * 2018-08-03 2020-02-06 Beam Therapeutics Inc. Multi-effector nucleobase editors and methods of using same to modify a nucleic acid target sequence

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GAUDELLI, NICOLE M. ET AL.: "Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage", NATURE, vol. 551, 2017, pages 464 - 471, XP037203026, DOI: 10.1038/nature24644 *
KOBLAN, LUKE W. ET AL.: "Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction", NAT. BIOTECHNOL., vol. 36, 2018, pages 843 - 846, XP036929657, DOI: 10.1038/nbt.4172 *
KOMOR, ALEXIS C. ET AL.: "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage", NATURE, vol. 533, 2016, pages 420 - 424, XP055551781, DOI: 10.1038/nature17946 *
NISHIDA, KEIJI ET AL.: "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems", SCIENCE, vol. 353, 2016, pages 1 - 14, XP055367833 *
RINA C. SAKATA, ISHIGURO SOH, MORI HIDETO, TANAKA MAMORU, SEKI MOTOAKI, MASUYAMA NANAMI, NISHIDA KEIJI, NISHIMASU HIROSHI, KONDO A: "A single CRISPR base editor to induce simultaneous C-to-T and A-to-G mutations", BIORXIV, 8 August 2019 (2019-08-08), pages 1 - 17, XP055768028, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/729269v1.full> [retrieved on 20200804] *
SAKATA, RINA C. ET AL.: "Base editors for simultaneous introduction of C-to-T and A-to-G mutations", NAT. BIOTECHNOL., vol. 38, June 2020 (2020-06-01), pages 865 - 869, XP037187563, DOI: 10.1038/s41587-020-0509-0 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20814290

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20814290

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP