WO2018135838A2 - Method for identifying base editing off-target site by DNA single strand break - Google Patents

Method for identifying base editing off-target site by DNA single strand break

Info

Publication number
WO2018135838A2
WO2018135838A2 PCT/KR2018/000747 KR2018000747W WO2018135838A2 WO 2018135838 A2 WO2018135838 A2 WO 2018135838A2 KR 2018000747 W KR2018000747 W KR 2018000747W WO 2018135838 A2 WO2018135838 A2 WO 2018135838A2
Authority
WO
WIPO (PCT)
Prior art keywords
dna
target
emx
sequence
coding gene
Prior art date
Application number
PCT/KR2018/000747
Other languages
French (fr)
Korean (ko)
Other versions
WO2018135838A3 (en)
Inventor
Jin-Soo Kim (김진수)
Original Assignee
Institute for Basic Science (기초과학연구원)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute for Basic Science (기초과학연구원)
Priority to CN201880007380.8A (published as CN110234770A)
Priority to JP2019559249A (published as JP2020505062A)
Priority to EP18741209.3A (published as EP3572525A4)
Publication of WO2018135838A2
Publication of WO2018135838A3

Classifications

    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01G HORTICULTURE; CULTIVATION OF VEGETABLES, FLOWERS, RICE, FRUIT, VINES, HOPS OR SEAWEED; FORESTRY; WATERING
    • A01G13/00 Protecting plants
    • A01G13/02 Protective coverings for plants; Coverings for the ground; Devices for laying-out or removing coverings
    • A01G13/0206 Canopies, i.e. devices providing a roof above the plants
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01G HORTICULTURE; CULTIVATION OF VEGETABLES, FLOWERS, RICE, FRUIT, VINES, HOPS OR SEAWEED; FORESTRY; WATERING
    • A01G13/00 Protecting plants
    • A01G13/02 Protective coverings for plants; Coverings for the ground; Devices for laying-out or removing coverings
    • A01G13/025 Devices for laying-out or removing plant coverings
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2497 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing N-glycosyl compounds (3.2.2)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/02 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2) hydrolysing N-glycosyl compounds (3.2.2)
    • C12Y302/02027 Uracil-DNA glycosylase (3.2.2.27)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04005 Cytidine deaminase (3.5.4.5)
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F16 ENGINEERING ELEMENTS AND UNITS; GENERAL MEASURES FOR PRODUCING AND MAINTAINING EFFECTIVE FUNCTIONING OF MACHINES OR INSTALLATIONS; THERMAL INSULATION IN GENERAL
    • F16B DEVICES FOR FASTENING OR SECURING CONSTRUCTIONAL ELEMENTS OR MACHINE PARTS TOGETHER, e.g. NAILS, BOLTS, CIRCLIPS, CLAMPS, CLIPS OR WEDGES; JOINTS OR JOINTING
    • F16B2/00 Friction-grip releasable fastenings
    • F16B2/20 Clips, i.e. with gripping action effected solely by the inherent resistance to deformation of the material of the fastening
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/11 Antisense
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • a composition for DNA single strand break comprising cytidine deaminase, inactivated target specific endonuclease, and guide RNA, DNA using the same.
  • Base editors: programmable deaminases
  • Base editors, composed of a DNA-binding module and a cytidine deaminase, enable targeted nucleotide substitutions or base modifications in the genome without generating DNA double-strand breaks.
  • programmable nucleases such as CRISPR-Cas9 and zinc-finger nucleases (ZFNs)
  • ZFNs: zinc-finger nucleases
  • Base editors can correct point mutations that cause genetic disease in human cells, animals, and plants, or produce single-nucleotide polymorphisms (SNPs).
  • a base editor comprising a catalytically deficient Cas9 (dCas9) or D10A Cas9 nickase (nCas9) derived from S.
  • dCas9: catalytically deficient Cas9
  • nCas9: D10A Cas9 nickase
  • Target-AID comprising dCas9 or nCas9 and PmCDA1, an activation-induced cytidine deaminase (AID) ortholog from sea lamprey, or human AID; 3) CRISPR-X comprising dCas9 and sgRNAs linked to MS2 RNA hairpins to recruit hyperactive AID variants fused to MS2-binding proteins; and 4) zinc-finger proteins or transcription activator-like effectors (TALEs) fused to cytidine deaminase.
  • TALEs: transcription activator-like effectors
  • a means for analyzing the genome-wide target specificity of base editors and a means for analyzing their off-target sites, off-target effects, etc. are provided.
  • One example provides a composition for DNA single strand breaks, comprising (a) a deaminase or a coding gene thereof (cDNA, rDNA, or mRNA), (b) an inactivated target specific endonuclease or a coding gene thereof (cDNA, rDNA, or mRNA), and (c) a guide RNA or a coding gene thereof.
  • a deaminase or a coding gene thereof: cDNA, rDNA, or mRNA
  • an inactivated target specific endonuclease or a coding gene thereof: cDNA, rDNA, or mRNA
  • a composition for DNA single strand breaks, comprising a guide RNA or a coding gene thereof.
  • the composition may be free of Uracil-Specific Excision Reagent (USER).
  • RNA deaminase or its coding gene
  • cDNA deaminase or its coding gene
  • an inactivated target specific endonuclease or its coding gene: cDNA, rDNA, or mRNA
  • the step of introducing the guide RNA or a coding gene thereof into the cell, or contacting it with DNA isolated from the cell, may not include treatment with a uracil-specific excision reagent (USER).
  • USER: uracil-specific excision reagent
  • Another example is
  • the method may further comprise, after step (ii) and before, concurrently with, or after step (iii), a step (iii-1) of determining whether base editing (e.g., conversion of cytosine (C) to uracil (U) or thymine (T)) has occurred in the nucleic acid sequence data obtained by the analysis.
  • the method may not comprise a step of treating with a uracil-specific excision reagent (USER) to generate a double-strand break in the DNA.
  • the method: e.g., a method for determining base editing efficiency at an on-target site, or a method for identifying off-target sites
  • if the cleavage site is not the target site (on-target site), the method may further include a step of identifying (determining) it as an off-target site.
  • a base editor composed of Cas9 nickase and deaminase (e.g.,
  • BE3 off-target sites can be identified computationally using whole genome sequencing data.
  • a technique is provided for generating double strand breaks in DNA using a deaminase that does not itself cause double strand breaks in DNA.
  • One example provides a composition for DNA single strand breaks, comprising (a) a deaminase or a coding gene thereof (cDNA, rDNA, or mRNA), (b) an inactivated target specific endonuclease or a coding gene thereof (cDNA, rDNA, or mRNA), and (c) a guide RNA or a coding gene thereof.
  • the composition may be free of Uracil-Specific Excision Reagent (USER).
  • Coding genes as used herein may be used in the form of cDNA, rDNA or recombinant vector comprising the same, or mRNA.
  • the deaminase may be cytidine deaminase.
  • Cytidine deaminase converts cytosine (eg, cytosine present in double-stranded DNA or RNA), which is a base present in nucleotides, into uracil.
  • The cytidine deaminase refers to any enzyme having the activity of converting cytosine, located on the strand where the PAM sequence of the target site sequence (target sequence) is present, to uracil (C-to-U conversion or C-to-U editing).
  • the cytidine deaminase may be derived from mammals, including primates such as humans and monkeys and rodents such as rats and mice, but is not limited thereto.
  • the cytidine deaminase may be one or more selected from the group consisting of enzymes belonging to the APOBEC ("apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like") family, AID (activation-induced cytidine deaminase), CDA (cytidine deaminase; e.g., CDA1), and the like; for example, it may be one or more selected from the following group, but is not limited thereto:
  • APOBEC1: human (Homo sapiens) APOBEC1 (protein: GenBank Accession Nos. NP_001291495.1, NP_001635.2, NP_005880.2, etc.; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001304566.1, NM_001644.4, NM_005889.3, etc.), mouse (Mus musculus) APOBEC1 (protein: GenBank Accession Nos. NP_001127863.1, NP_112436.1; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001134391.1, NM_031159.3, etc.);
  • APOBEC2: human APOBEC2 (protein: GenBank Accession No. NP_006780.1, etc.; gene: GenBank Accession No. NM_006789.3, etc.), mouse APOBEC2 (protein: GenBank Accession No. NP_033824.1, etc.; gene: GenBank Accession No. NM_009694.3, etc.);
  • APOBEC3B: human APOBEC3B (protein: GenBank Accession Nos. NP_001257340.1, NP_004891.4, etc.; genes (mRNA or cDNA; the same applies hereinafter) (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001270411.1, NM_004900.4, etc.), mouse (Mus musculus) APOBEC3B (protein: GenBank Accession Nos. NP_001153887.1, NP_001333970.1, NP_084531.1, etc.; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001160415.1, NM_001347041.1, NM_030255.3, etc.);
  • APOBEC3C: human APOBEC3C (protein: GenBank Accession No. NP_055323.2, etc.; gene: GenBank Accession No. NM_014508.2, etc.);
  • APOBEC3D (including APOBEC3E): human APOBEC3D (protein: GenBank Accession No. NP_689639.2, etc.; gene: GenBank Accession No. NM_152426.3, etc.);
  • APOBEC3F: human APOBEC3F (protein: GenBank Accession Nos. NP_660341.2, NP_001006667.1, etc.; genes (listed in the same order as the proteins they encode): NM_145298.5, NM_001006666.1, etc.);
  • APOBEC3G: human APOBEC3G (protein: GenBank Accession Nos. NP_068594.1, NP_001336365.1, NP_001336366.1, NP_001336367.1, etc.; genes (listed in the same order as the proteins they encode): NM_021822.3, NM_001349436.1, NM_001349437.1, NM_001349438.1, etc.);
  • APOBEC3H: human APOBEC3H (protein: GenBank Accession Nos. NP_001159474.2, NP_001159475.2, NP_001159476.2, NP_861438.3, etc.; genes (listed in the same order as the proteins they encode): NM_001166002.2, NM_001166003.2, NM_001166004.2, NM_181773.4, etc.);
  • APOBEC4 (including APOBEC3E): human APOBEC4 (protein: GenBank Accession No. NP_982279.1, etc.; gene: GenBank Accession No. NM_203454.2, etc.); mouse APOBEC4 (protein: GenBank Accession No. NP_001074666.1, etc.; gene: GenBank Accession No. NM_001081197.1, etc.);
  • Activation-induced cytidine deaminase (AICDA or AID): human AID (protein: GenBank Accession Nos. NP_001317272.1, NP_065712.1, etc.; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001330343.1, NM_020661.3, etc.); mouse AID (protein: GenBank Accession No. NP_033775.1, etc.; gene: GenBank Accession No. NM_009645.2, etc.); and
  • CDA (cytidine deaminase; EC number 3.5.4.5; for example, CDA1): GenBank Accession Nos. NP_001776.1 (gene: NM_001785.2), CAA06460.1 (gene: AJ005261.1), NP_416648.1 (gene: NC_000913.3), and the like.
  • target specific nucleases, also called programmable nucleases, collectively refer to all forms of endonucleases capable of recognizing and cleaving specific positions on the desired genomic DNA.
  • the target specific nuclease may be one or more selected from all nucleases that recognize a specific sequence of the target gene and have nucleotide cleavage activity that may result in insertion and/or deletion (indel) in the target gene.
  • the target specific nuclease may be one or more selected from the group consisting of RGENs (RNA-guided engineered nucleases; for example, Cas9, Cpf1, etc.) derived from the microbial immune system CRISPR, but is not limited thereto.
  • RGEN: RNA-guided engineered nuclease (for example, Cas9, Cpf1, etc.)
  • the target specific nuclease may be at least one selected from the group consisting of Cas proteins (e.g., Cas9 protein (CRISPR (clustered regularly interspaced short palindromic repeats) associated protein 9), Cpf1 protein (CRISPR from Prevotella and Francisella 1), etc.), that is, endonucleases involved in the type II and/or type V CRISPR system.
  • the target specific nuclease may further comprise a target DNA specific guide RNA for guiding to the target site of the genomic DNA.
  • the guide RNA may be transcribed in vitro, for example from an oligonucleotide duplex or a plasmid template, but is not limited thereto.
  • the target specific nuclease and guide RNA may be used in the form of a ribonucleoprotein (RNP), in which the target specific nuclease or its coding gene and the guide RNA or its coding gene are included as a mixture or as a complex.
  • RNP: ribonucleoprotein
  • Cas9 protein is a major protein component of the CRISPR / Cas system, a protein that can function as an activated endonuclease or nickase.
  • Cas9 protein or genetic information can be obtained from known databases such as GenBank of the National Center for Biotechnology Information (NCBI).
  • NCBI: National Center for Biotechnology Information
  • the Cas9 protein: Cas9 protein from the genus Streptococcus (Streptococcus sp.), such as Cas9 protein from Streptococcus pyogenes (e.g., SwissProt Accession number Q99ZW2 (NP_269215.1) (coding gene: SEQ ID NO: 4));
  • Cas9 protein from the genus Campylobacter, such as Campylobacter jejuni;
  • Cas9 protein from the genus Streptococcus, such as Streptococcus thermophilus or Streptococcus aureus;
  • Cas9 protein from the genus Pasteurella, such as Pasteurella multocida; and
  • Cas9 protein from the genus Francisella, such as Francisella novicida.
  • The Cas9 protein may be one or more selected from the group consisting of the above, but is not limited thereto.
  • the Cpf1 protein is an endonuclease of a new CRISPR system distinct from the CRISPR/Cas system; it is relatively small compared to Cas9, does not require tracrRNA, and can act with a single guide RNA. It also recognizes a thymine-rich protospacer-adjacent motif (PAM) sequence and cuts the DNA double strand to produce cohesive ends (a cohesive double-strand break).
  • PAM: protospacer-adjacent motif (thymine-rich for Cpf1)
  • the Cpf1 protein may be derived from the genera Candidatus, Lachnospira, Butyrivibrio, Peregrinibacteria, and Acidaminococcus.
  • BV3L6, Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_K08D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, Eubacterium eligens, and the like, but are not limited thereto.
  • the target specific endonucleases may be isolated from microorganisms, or produced artificially or non-naturally, for example by recombinant or synthetic methods.
  • the target specific endonucleases: e.g., Cas9, Cpf1, etc.
  • Recombinant DNA refers to DNA molecules artificially produced by genetic recombination methods such as molecular cloning to include heterologous or homologous genetic material obtained from various organisms.
  • When the recombinant DNA is expressed in an appropriate organism to produce a target specific endonuclease (in vivo or in vitro), the recombinant DNA may have a nucleotide sequence reconstructed by selecting, among the codons encoding the protein to be produced, codons optimized for expression in that organism.
  • the inactivated target specific endonuclease refers to a target specific endonuclease that has lost the endonuclease activity of cleaving a DNA double strand; for example, it may be at least one selected from inactivated target specific endonucleases that have lost endonuclease activity but have nickase activity, and inactivated target specific endonucleases that have lost both endonuclease activity and nickase activity.
  • the inactivated target specific endonuclease may have nickase activity, in which case, simultaneously or sequentially with the conversion of cytosine to uracil, a nick is introduced in the strand opposite to the strand on which the cytosine is converted to uracil (e.g., on the strand opposite to the strand where the PAM sequence is present, at a position corresponding to between the third and fourth nucleotides in the 5' direction from the PAM sequence).
  • Such modifications (mutations) of the target specific endonuclease include substitution of at least the catalytic aspartate residue (e.g., the aspartic acid at the tenth position (D10) of a Streptococcus pyogenes-derived Cas9 protein) with another amino acid.
  • the "other amino acid" refers to an amino acid selected from alanine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan, valine, asparagine, cysteine, glutamine, glycine, serine, threonine, tyrosine, aspartic acid, glutamic acid, arginine, histidine, lysine, and all known variants of these amino acids, excluding the amino acid that the wild-type protein originally had at the mutation site.
  • the modified Cas9 protein may be a modified Cas9 that has lost endonuclease activity and has nickase activity, obtained by introducing a mutation at the D10 or H840 position (e.g., substitution with another amino acid) into a Cas9 protein derived from Streptococcus pyogenes (e.g., SwissProt Accession number Q99ZW2 (NP_269215.1)), or a modified Cas9 that has lost both endonuclease activity and nickase activity, obtained by introducing mutations at both the D10 and H840 positions of the Streptococcus pyogenes Cas9 protein.
  • Streptococcus pyogenes: e.g., SwissProt Accession number Q99ZW2 (NP_269215.1)
  • a mutation at the D10 or H840 position: e.g., substitution with another amino acid
  • the mutation at the D10 position of the Cas9 protein may be a D10A mutation (a mutation in which D, the tenth amino acid of the Cas9 protein, is replaced by A; hereinafter, mutations introduced into Cas9 are written in the same way).
  • the mutation at the H840 position may be a H840A mutation.
  • the inactivated target specific endonuclease may comprise a nickase (e.g., a D10A Cas9 nickase in which D10 of the Cas9 protein from Streptococcus pyogenes (SEQ ID NO: 4) is substituted with A; encoded by SEQ ID NO: 11).
  • Nickase: e.g., a D10A Cas9 nickase in which D10 of the Cas9 protein from Streptococcus pyogenes (SEQ ID NO: 4) is substituted with A.
  • the cytidine deaminase and the inactivated target specific endonuclease may be used as a fusion protein in which they are fused to each other directly or via a peptide linker (e.g., arranged in the order cytidine deaminase, then inactivated target specific endonuclease, in the N-terminal to C-terminal direction; i.e., the inactivated target specific endonuclease is fused to the C-terminus of the cytidine deaminase).
  • the fusion protein may be arranged, in the N-terminal to C-terminal direction, in the order cytidine deaminase then inactivated target specific endonuclease, or in the order inactivated target specific endonuclease then cytidine deaminase.
  • the cytidine deaminase coding gene and the inactivated target specific endonuclease coding gene, arranged so as to encode the fusion protein, may be used in a form included in a single plasmid.
  • the plasmid may be any plasmid including an expression system capable of inserting the cytidine deaminase coding gene and/or inactivated target specific endonuclease coding gene and expressing it in a host cell.
  • the plasmid contains elements for expression of the gene of interest, and may include a replication origin, a promoter, an operator, a transcription terminator, and the like.
  • Appropriate enzyme sites: e.g., restriction enzyme sites
  • the plasmid may be one or more selected from the group consisting of plasmids used in the art, such as the pcDNA series, pSC101, pGV1106, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, pIJ61, pLAFR1, pHV, the pGEX series, the pET series, and pUC19, but is not limited thereto.
  • the host cell is a cell (eukaryotic cell comprising a mammalian cell such as a human cell) or the cytidine to be subjected to base correction or double strand cleavage by the cytidine deaminase.
  • a cell eukaryotic cell comprising a mammalian cell such as a human cell
  • the cytidine to be subjected to base correction or double strand cleavage by the cytidine deaminase.
  • the guide RNA serves to guide a mixture or fusion protein of the cytidine deaminase and the inactivated target specific endonuclease to the target site, and may be one or more selected from the group consisting of CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and single guide RNA (sgRNA). Specifically, it may be a double-stranded crRNA:tracrRNA complex in which the crRNA and tracrRNA are bound to each other, or a single-stranded guide RNA (sgRNA) in which a crRNA or portion thereof and a tracrRNA or portion thereof are linked by an oligonucleotide linker.
  • the specific sequence of the guide RNA may be appropriately selected depending on the type of target specific endonuclease used or the microorganism from which it is derived, as is readily understood by those skilled in the art.
  • When using a Cas9 protein from Streptococcus pyogenes as the target specific endonuclease, the crRNA can be expressed by the following general formula (1): 5'-(N_cas9)_l-(GUUUUAGAGCUA)-(X_cas9)_m-3'
  • N_cas9 is a targeting sequence, i.e., a region determined according to the sequence of the target site of the target gene (i.e., a sequence capable of hybridizing with the sequence of the target site), and l denotes the number of nucleotides included in the targeting sequence, which may be an integer from 17 to 23 or from 18 to 22, such as 20;
  • the region comprising 12 consecutive nucleotides (GUUUUAGAGCUA; SEQ ID NO: 1) located adjacent to the 3' side of the targeting sequence is the essential part of the crRNA;
  • X_cas9 is a region comprising m nucleotides located at the 3' end of the crRNA (i.e., adjacent to the 3' side of the essential part of the crRNA), where m is an integer from 8 to 12, such as 11;
  • the m nucleotides may be the same as or different from each other, and may be independently selected from the group consisting of A, U, C, and G.
  • X_cas9 may include UGCUGUUUUG (SEQ ID NO: 2), but is not limited thereto.
  • tracrRNA may be represented by the following general formula (2):
  • the region represented by SEQ ID NO: 3 is the essential part of the tracrRNA;
  • Y_cas9 is a region comprising p nucleotides located adjacent to the 5' end of the essential part of the tracrRNA, where p may be an integer from 6 to 20, such as from 8 to 19; the p nucleotides may be the same as or different from each other and may each be independently selected from the group consisting of A, U, C, and G.
  • In the sgRNA, the crRNA portion, comprising the targeting sequence and the essential part of the crRNA, and the tracrRNA portion, comprising the essential part (60 nucleotides) of the tracrRNA, form a hairpin (stem-loop) structure through an oligonucleotide linker.
  • the oligonucleotide linker corresponds to the loop structure.
  • the sgRNA is a double-stranded RNA molecule in which a crRNA portion including the targeting sequence and the essential part of the crRNA and a tracrRNA portion including the essential part of the tracrRNA are bound to each other, and whose ends may be connected via an oligonucleotide linker to form a hairpin structure.
  • the sgRNA can be represented by the following general formula 3:
  • the oligonucleotide linker included in the sgRNA may contain 3 to 5, such as 4, nucleotides;
  • the nucleotides may be the same as or different from each other, and may be independently selected from the group consisting of A, U, C, and G.
  • the crRNA or sgRNA may further comprise 1 to 3 guanines (G) at the 5' end (i.e., the 5' end of the targeting sequence region of the crRNA).
  • the tracrRNA or sgRNA may further comprise a termination region comprising 5 to 7 uracils (U) at the 3' end of the essential part (60 nt) of the tracrRNA.
  • the target sequence of the guide RNA may be a contiguous nucleic acid sequence of about 17 to about 23, or about 18 to about 22, such as 20, nucleotides located adjacent to the 5′ side of the PAM (protospacer adjacent motif) sequence (5′-NGG-3′, where N is A, T, G, or C, for S. pyogenes Cas9) on the target DNA.
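The adjacency rule described above can be illustrated with a small Python sketch that scans one strand of a DNA string for 20-nt stretches immediately followed by 5′-NGG-3′. The function name `find_candidate_targets` and its parameters are hypothetical and are not taken from the specification; the example input is the EMX1 on-target sequence used in the Examples. Only the strand passed in is scanned, so a real search would also need the reverse complement.

```python
import re


def find_candidate_targets(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each site followed by an NGG PAM.

    Hypothetical helper for illustration only; scans the given strand.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# EMX1 on-target site: 20-nt target sequence followed by the PAM GGG
hits = find_candidate_targets("GAGTCCGAGCAGAAGAAGAAGGG")
print(hits)
```

The protospacer length is a parameter because the text allows target sequences of about 17 to about 23 nucleotides.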
  • the targeting sequence of the guide RNA, which is capable of hybridizing with the target sequence of the guide RNA, means a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity with the nucleotide sequence of the strand complementary to the DNA strand on which the target sequence is located (i.e., the DNA strand containing the PAM sequence (5′-NGG-3′, where N is A, T, G, or C)), and is capable of complementarily binding to the nucleotide sequence of that complementary strand.
  • the nucleic acid sequence of the target site is represented by the nucleic acid sequence of the strand on which the PAM sequence is located, of the two DNA strands at the corresponding site of the target gene.
  • because the guide RNA is an RNA molecule, the targeting sequence included in the guide RNA has the same nucleic acid sequence as the target site sequence except that T is replaced with U.
  • in other words, the targeting sequence of the guide RNA and the sequence of the target site are represented by the same nucleic acid sequence except that T and U are interchanged.
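As a minimal sketch of the T-to-U relationship just described, the following Python function derives a guide RNA targeting sequence from a target site sequence. The name `targeting_sequence` and the `pam_len` parameter are illustrative assumptions; `pam_len` is set to 3 when the target site sequence is written together with its 3′ PAM (as for the EMX1 sequence in the Examples) and to 0 when the PAM is not included.

```python
def targeting_sequence(target_site: str, pam_len: int = 3) -> str:
    """Return the RNA targeting sequence for a DNA target site.

    Hypothetical helper: drops `pam_len` nucleotides from the 3' end
    (the PAM, if present) and replaces T with U.
    """
    site = target_site.upper()
    if pam_len:
        site = site[:-pam_len]  # remove the PAM from the 3' end
    return site.replace("T", "U")


# EMX1 on-target site including its PAM (GGG)
print(targeting_sequence("GAGTCCGAGCAGAAGAAGAAGGG"))
```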
  • the guide RNA may be used in the form of RNA (or included in the composition), or in the form of a plasmid containing DNA encoding the same (or in the composition).
  • the compositions and methods described herein may be characterized as not including, or not using, a uracil-specific excision reagent (USER).
  • the uracil-specific excision reagent may include any material that removes the uracil converted from cytosine by the cytidine deaminase and/or introduces a DNA break at the position from which the uracil was removed.
  • the uracil-specific removal reagent comprises uracil DNA glycosylase (UDG), endonuclease VIII, and combinations thereof.
  • the uracil-specific removal reagent may comprise an endonuclease VIII or a combination of uracil DNA glycosylase and endonuclease VIII.
  • uracil DNA glycosylase (UDG) is an enzyme that prevents mutagenesis by removing uracil (U) present in DNA; by cleaving the N-glycosidic bond of uracil, it initiates base-excision repair
  • uracil DNA glycosylase is Escherichia coli uracil DNA glycosylase (eg, GenBank Accession Nos.
  • the endonuclease VIII serves to remove the nucleotide from which the uracil has been removed; it may be at least one selected from all enzymes having both N-glycosylase activity, which removes from double-stranded DNA the uracil damaged by the uracil DNA glycosylase, and AP-lyase activity, which cleaves the 3′ and 5′ sides of the apurinic site (AP site) resulting from removal of the damaged uracil.
  • the endonuclease VIII is human endonuclease VIII (eg, GenBank Accession Nos.
  • mouse endonuclease VIII (e.g., GenBank Accession Nos. BAC06477.1, NP_082623.1, etc.)
  • Escherichia coli endonuclease VIII (e.g., GenBank Accession Nos. OBZ49008.1, OBZ43214.1, OBZ42025.1, ANJ41661.1, KYL40995.1, KMV55034.1, ⁇ 53379.1, ⁇ 50038.1, KMV40847.1, AQW72152.1, etc.)
  • another example provides a method of generating a single strand break in DNA, comprising introducing (a) a deaminase or its coding gene (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or its coding gene (cDNA, rDNA, or mRNA), and (c) a guide RNA or a coding gene thereof into a cell, or contacting them with DNA isolated from a cell.
  • the method may be one that does not include treatment with a uracil-specific excision reagent (USER).
  • by generating (or introducing) a single strand break in DNA (e.g., genomic DNA), the site at which base editing (i.e., conversion of C to U) occurs and the base editing efficiency can be analyzed; from these, the base editing efficiency at the on-target site, the specificity for the on-target sequence, off-target sequences, and the like can be identified (or measured).
  • another example provides a method for analyzing the nucleic acid sequence of DNA into which base editing has been introduced by a deaminase, comprising analyzing the nucleic acid sequence of the DNA fragment in which the single strand break has occurred.
  • the method may not comprise a step of treating with a uracil-specific excision reagent (USER) to generate a double strand break in the DNA.
  • in another example, the method may further comprise determining, in the nucleic acid sequence data obtained by the analysis, whether base editing (e.g., conversion of cytosine (C) to uracil (U) or thymine (T)) has occurred.
  • the method may not comprise a step of treating with a uracil-specific excision reagent (USER) to generate double strand breaks in the DNA.
  • the method (e.g., the method of identifying base editing efficiency at an on-target site or of identifying off-target sites) may further comprise, after step (iii), (iv) identifying (determining) the cleaved site as an off-target site if it is not the on-target site.
  • the deaminase, inactivated target-specific endonuclease, guide RNA, and uracil-specific excision reagent are as described above.
  • the methods provided herein may be performed in cells or in vitro, for example, in vitro. More specifically, all steps of the method may be carried out in vitro, or step (i) may be performed in cells and step (ii) and subsequent steps performed in vitro using DNA (e.g., genomic DNA) extracted from the cells in which step (i) was performed.
  • step (i) may comprise transfecting cells with a deaminase (or its coding gene), an inactivated target-specific endonuclease (or its coding gene), and a guide RNA, or contacting them with DNA extracted from cells.
  • the cell may be selected from all eukaryotic cells intended to introduce base correction and / or single stranded cleavage by deaminase, eg, may be selected from mammalian cells, including human cells.
  • the transfection may be performed by introducing into the cell a ribonucleoprotein (RNP), or plasmids (recombinant vectors) comprising, together or separately, the deaminase coding gene and the target-specific endonuclease coding gene, along with a guide RNA or a plasmid comprising the guide RNA coding gene
  • the introduction may be carried out by introducing into the cell by any conventional means, for example, the introduction may be performed by electroporation, lipofection, microinjection and the like, but is not limited thereto.
  • step (i) may be performed by contacting DNA extracted from the cells in which base editing (base editing site, base editing efficiency, etc.) and/or single strand cleavage (cleavage site, cleavage efficiency, etc.) by the deaminase and inactivated target-specific endonuclease is to be confirmed with the deaminase and inactivated target-specific endonuclease (e.g., a fusion protein comprising a cytidine deaminase and an inactivated Cas9 protein) and a guide RNA.
  • DNA extracted from the cell may be a genomic DNA or a polymerase chain reaction (PCR) amplification product comprising a target gene or a target site.
  • the method may further comprise a step of removing the deaminase, inactivated target-specific endonuclease, and/or guide RNA used in step (i).
  • a step of blunting (end-repairing) the double-stranded DNA fragment in which the single strand break has occurred may be further included.
  • the end-repair step may comprise (b) a 3′-to-5′ trimming (cleavage) step that removes the overhanging nucleotides on the 3′ side (the position complementary to the 5′ side of the cleavage point of the cleaved strand) in the double-stranded DNA fragment in which the single strand break has occurred, and/or (c) a 5′-to-3′ DNA synthesis step that extends the 3′ terminal nucleotide at the cleavage point of the cleaved strand in the double-stranded DNA fragment in which the single strand break has occurred (see the figure in Example 1). The 3′-to-5′ trimming step can be carried out using any suitable conventional exonuclease. The 5′-to-3′ DNA synthesis step can be carried out using any suitable conventional DNA polymerase.
  • a step of amplifying the DNA fragment in which the single strand break has occurred (e.g., a contiguous 10 to 30 nt or 15 to 25 nt oligonucleotide comprising the cleavage site of the cleaved strand and/or a contiguous 10 to 30 nt or 15 to 25 nt oligonucleotide comprising the (complementary) position corresponding to the cleavage position on the uncleaved strand) may be further included.
  • the DNA fragments used for nucleic acid sequencing in step (ii) above may be contiguous 10 to 30 nt or 15 to 25 nt oligonucleotides comprising the cleavage site of the strand in which the single strand break has occurred; contiguous 10 to 30 nt or 15 to 25 nt oligonucleotides comprising the (complementary) position opposite the cleavage position on the uncleaved strand; and/or amplification products of these oligonucleotides.
  • an off-target site (non-target position) refers to a position that is not the target site of the deaminase and inactivated target-specific endonuclease but at which they nevertheless have activity. In other words, it refers to a position other than the target position at which base editing and/or cleavage by the deaminase and inactivated target-specific endonuclease occurs.
  • as used herein, the non-target position includes not only actual non-target positions of the deaminase and inactivated target-specific endonuclease but also potential non-target positions, i.e., positions likely to become non-target positions.
  • the non-target position may mean any position other than the target position cleaved by the deaminase and inactivated target specific endonuclease in vitro, but not limited thereto.
  • the deaminase and inactivated target specific endonuclease having activity at a position other than the target position can be caused by various causes.
  • for example, the deaminase and inactivated target-specific endonuclease may act on sequences other than the target sequence (non-target sequences) that have a low level of nucleotide mismatch with the target sequence designed for the target site and therefore high sequence homology with the target sequence.
  • the non-target position may be, but is not limited to, a sequence region (gene region) that satisfies one or more of the following conditions:
  • the strand in which the cleavage occurred and the complementary strand comprise the PAM sequence
  • the strand in which the cleavage occurred and the complementary strand comprise 15 or fewer, or 10 or fewer, mismatched nucleotides relative to the target sequence, for example 1 to 15, 1 to 14, 1 to 13, 1 to 12, 1 to 11, 1 to 10, 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 1 to 4, 1 to 3, 1 to 2, or 1 mismatched nucleotide; and
  • the strand in which the cleavage occurred and the complementary strand comprise base editing (conversion of one or more cytosines (C) to uracil (U) or thymine (T)).
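The first two conditions in the list above (presence of a PAM and a bounded number of mismatches) lend themselves to a short Python sketch. All function names and the example off-target sequence below are hypothetical; the default limit of 10 mismatches is one of the values mentioned in the text (15 is the other), and the third condition (base conversion) requires read-level data and is not covered here.

```python
def count_mismatches(site: str, target: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(site) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(site.upper(), target.upper()))


def has_ngg_pam(pam: str) -> bool:
    """True if the 3-nt string matches 5'-NGG-3' (N = A, T, G, or C)."""
    pam = pam.upper()
    return len(pam) == 3 and pam[0] in "ACGT" and pam[1:] == "GG"


def check_offtarget_conditions(site, pam, target, max_mismatches=10):
    """Evaluate the PAM and mismatch conditions for a candidate site."""
    mismatches = count_mismatches(site, target)
    return {
        "has_pam": has_ngg_pam(pam),
        "mismatches": mismatches,
        # "1 to N" mismatches: identical sequences are the on-target site itself
        "within_mismatch_limit": 1 <= mismatches <= max_mismatches,
    }


# Hypothetical candidate differing from the EMX1 target at one position
print(check_offtarget_conditions(
    "GAGTCTGAGCAGAAGAAGAA", "AGG", "GAGTCCGAGCAGAAGAAGAA"))
```

Returning the individual results rather than a single yes/no reflects the wording "one or more of the following conditions".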
  • activity of the deaminase and inactivated target-specific endonuclease at non-target positions can cause unwanted mutations in the genome, which can lead to serious problems. Therefore, accurately detecting and analyzing non-target sequences can be as important as assessing activity at the target site, and can be useful for developing deaminases and inactivated target-specific endonucleases that act specifically at the target site without off-target effects.
  • the cytidine deaminase and inactivated target-specific endonuclease may have activity both in vivo and in vitro; non-target positions can therefore be detected in vitro (e.g., using genomic DNA), and when the enzymes are applied in vivo they are expected to have activity at the same positions (genomic sites containing the non-target sequences) as the detected non-target positions.
  • Step (ii) analyzes the nucleic acid sequence of the DNA fragments cleaved (single strand break) in step (i) and may be carried out by any conventional nucleic acid sequencing method. For example, when the isolated DNA used in step (i) is genomic DNA, the nucleic acid sequencing may be performed by whole genome sequencing.
  • whole genome sequencing refers to reading the full-length genome at a depth of coverage such as 10×, 20×, or 40× by next-generation sequencing.
  • next-generation sequencing refers to a technology that fragments the full-length genome in a chip-based and PCR-based paired-end format and sequences the fragments at very high speed based on chemical reactions (hybridization).
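For orientation, the 10×, 20×, and 40× figures above are depths of coverage. A common back-of-the-envelope estimate is mean depth = (number of reads × read length) / genome size. The sketch below applies that estimate; the function name and the numbers in the example are illustrative assumptions, not values from the specification.

```python
def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate mean sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


# Hypothetical run: 400 million 150-nt reads on a 3 Gb genome
print(mean_depth(400_000_000, 150, 3_000_000_000))
```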
  • Step (iii) identifies (or determines) the positions at which DNA was cleaved from the sequence read data obtained in step (ii); by analyzing the sequencing data, on-target sites and off-target sites can be detected easily. The specific location of DNA cleavage can be determined from the sequencing data by various approaches, and the present specification provides several rational methods for doing so. These are, however, only examples within the technical idea of the present invention, and the scope of the present invention is not limited to them.
  • the position where the 5 'end is vertically aligned may mean the position where the DNA is cleaved.
  • aligning the sequence data according to position on the genome may be performed using an analysis program (e.g., BWA/GATK or ISAAC).
  • vertical alignment refers to an arrangement in which, when whole genome sequencing results are analyzed with a program such as BWA/GATK or ISAAC, two or more sequence reads start at the same position on the genome (share the same 5′ end) for each of the adjacent Watson strand and Crick strand.
  • when the DNA fragments cleaved in step (i) at the target position and non-target positions are sequenced and aligned, the commonly cleaved sites are vertically aligned because the reads all start at the same 5′ end, whereas at uncleaved sites no such common 5′ end exists and the reads are arranged in a staggered manner.
  • the vertically aligned position can be seen as the cleaved site in step (i), which may mean a target position or non-target position cleaved by an inactivated target specific endonuclease.
  • alignment means mapping base sequence data to a reference genome and arranging bases having the same position in the genome according to each position.
  • any computer program capable of performing the alignment can be used; it may be selected from programs known in the art or be a program designed for the purpose, but is not limited thereto.
  • the position where the DNA is cleaved by the deaminase and the inactivated target specific endonuclease can be determined by a method such as finding a position where the 5 'end is vertically aligned as described above. If the location is not an on-target site, it can be determined as an off-target site.
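The idea of finding positions where 5′ ends are vertically aligned can be sketched in Python as a simple pile-up count. This is an illustrative sketch, not the specification's implementation: reads are assumed to be already aligned and given as hypothetical `(chrom, start, end, strand)` tuples with 0-based, half-open coordinates, and the example reads are made up.

```python
from collections import Counter


def five_prime_end(start: int, end: int, strand: str) -> int:
    """Reference coordinate of a read's 5' end ([start, end) is half-open)."""
    return start if strand == "+" else end - 1


def five_prime_pileup(reads):
    """Count how many reads share each (chrom, 5' end position, strand)."""
    counts = Counter()
    for chrom, start, end, strand in reads:
        counts[(chrom, five_prime_end(start, end, strand), strand)] += 1
    return counts


def vertically_aligned(counts, min_reads: int = 2):
    """Positions where at least `min_reads` reads start at the same 5' end."""
    return sorted(key for key, n in counts.items() if n >= min_reads)


# Hypothetical reads: three share a 5' end at chr2:100, two are staggered
reads = [
    ("chr2", 100, 250, "+"), ("chr2", 100, 240, "+"), ("chr2", 100, 260, "+"),
    ("chr2", 37, 187, "+"), ("chr2", 52, 202, "+"),
]
print(vertically_aligned(five_prime_pileup(reads)))
```

Positions returned by `vertically_aligned` would then be compared with the on-target site; any other position is a candidate off-target site.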
  • a sequence identical to the base sequence designed as the target position of the deaminase and inactivated target specific endonuclease is a target position, and a sequence not identical to the base sequence may be regarded as a non-target position. This is obvious by definition of the non-target location described above.
  • the method (e.g., the method of identifying base editing efficiency at an on-target site or of identifying off-target sites) may further comprise, after step (iii), identifying (determining) the cleaved site as an off-target site if it is not the on-target site.
  • the cleaved strands in the DNA fragments cleaved by the base editor are vertically aligned at the 5′ end.
  • the number of cleavage sites can be determined from the number of DNA reads whose 5′ ends are vertically aligned (DNA read: as used herein, a DNA fragment, or a population of DNA fragments, whose 5′ ends are vertically aligned and which have the same nucleic acid sequence). For example, when the number of DNA reads vertically aligned at the 5′ end is 1, it can be confirmed that cleavage by the base editor occurred at only one position, that is, at the target position.
  • when the number of DNA reads vertically aligned at the 5′ end is 2 or more, for example 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more, it can be confirmed that cleavage has occurred at a plurality of locations, which means that DNA cleavage has occurred at a position other than the target position (non-target position).
  • among the DNA reads whose 5′ ends are vertically aligned, a read that is not at the target position (i.e., one having a nucleic acid sequence different from that of the target position) can be identified as a non-target position.
  • identifying the single-stranded cleavage position of step (iii) may comprise (a) identifying (or measuring) the number of DNA reads that are vertically aligned at the 5 'end.
  • when the number of DNA reads vertically aligned at the 5′ end is 2 or more, for example 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more, it can be confirmed (or determined) that DNA cleavage has occurred at a position other than the target position (non-target position).
  • the step of identifying the non-target position of step (iv) may comprise (iv-1) identifying (or determining) as a non-target position a DNA read that has a nucleic acid sequence different from that of the target position among the two or more DNA reads whose 5′ ends are vertically aligned.
  • in addition, the accuracy of non-target position identification can be further increased by checking whether the non-target position includes the PAM sequence (more specifically, whether a DNA read having a nucleic acid sequence different from that of the target position, among the DNA reads whose 5′ ends are vertically aligned in the cleaved DNA fragments, or its complementary strand (the strand with the complementary sequence) includes the PAM sequence), thereby excluding positions cleaved erroneously rather than by target-specific cleavage by the target-specific endonuclease included in the base editor.
  • identifying the single strand cleavage position of step (iii) may further comprise (b) checking whether the non-target position includes a PAM sequence, e.g., checking whether a DNA read having a nucleic acid sequence different from that of the target position, among the DNA reads whose 5′ ends are vertically aligned in the cleaved DNA fragments, or its complementary strand (the strand with the complementary sequence) includes a PAM sequence specific for the target-specific endonuclease included in the base editor.
  • the step of identifying the non-target position may comprise identifying (or determining) as a non-target position a DNA read having a nucleic acid sequence different from that of the target position, among the DNA reads whose 5′ ends are vertically aligned, when the DNA read or its complementary strand includes a PAM sequence specific for the target-specific endonuclease included in the base editor.
  • the non-target position may consist of a sequence having homology with the sequence of the target position. More specifically, because the target position sequence is represented by the nucleic acid sequence of the strand containing the PAM sequence, at a non-target position the nucleic acid sequence of a DNA read that differs from the target position sequence among the DNA reads whose 5′ ends are vertically aligned in the cleaved DNA fragments, or of its complementary strand (the strand with the complementary sequence), has one or more nucleotide mismatches with the target position (target sequence), more specifically 15 or fewer or 10 or fewer, such as 1 to 15, 1 to 14, 1 to 13, 1 to 12, 1 to 11, 1 to 10, 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 1 to 4, 1 to 3, or 1 to 2 nucleotide mismatches.
  • identifying the single strand cleavage position of step (iii) may comprise (c) checking whether a DNA read having a nucleic acid sequence different from that of the target position, among the DNA reads whose 5′ ends are vertically aligned in the cleaved DNA fragments, or its complementary strand has mismatched nucleotides relative to the target position sequence; such a DNA read can be identified (or determined) as indicating that DNA cleavage has occurred at a position other than the target position (non-target position).
  • the step of identifying the non-target position of step (iv) may comprise (iv-3) identifying (or determining) as a non-target position a DNA read having a nucleic acid sequence different from that of the target position, among the DNA reads whose 5′ ends are vertically aligned in the cleaved DNA fragments, when the DNA read or its complementary strand (the strand with the complementary sequence) has 15 or fewer, or 10 or fewer, mismatched nucleotides relative to the target position sequence (such as 1 to 15, 1 to 14, 1 to 13, 1 to 12, 1 to 11, 1 to 10, or 1 to 9).
  • Step (iii) may comprise one or more of steps (a), (b), and (c) (e.g., step (a) and one or more of steps (b) and (c)); when two or more of steps (a), (b), and (c) are included, they may be performed simultaneously or sequentially in any order.
  • step (iv) may comprise at least one of steps (iv-1), (iv-2), and (iv-3) (e.g., step (iv-1) and at least one of steps (iv-2) and (iv-3)); when at least two of steps (iv-1), (iv-2), and (iv-3) are included, they may be performed simultaneously or sequentially, regardless of order.
  • step (iii) may further comprise a step of determining (measuring) whether a DNA read having a nucleic acid sequence different from that of the target position, among the DNA reads whose 5′ ends are vertically aligned in the cleaved DNA fragments, or its complementary strand (the strand with the complementary sequence) includes base editing (conversion of one or more cytosines (C) to uracil (U) or thymine (T)).
  • identifying the non-target position of step (iv) may comprise (iv-4) identifying (or determining) as a non-target position a DNA read having a nucleic acid sequence different from that of the target position, among the DNA reads whose 5′ ends are vertically aligned in the cleaved DNA fragments, when the DNA read or its complementary strand (the strand with the complementary sequence) includes base editing (conversion of one or more cytosines (C) to uracil (U) or thymine (T)).
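A minimal Python sketch of the base conversion check follows, assuming a read that has already been aligned without gaps to the reference sequence of a candidate site. The function name `c_to_t_positions` and the example read are hypothetical. Only the strand given is inspected; on reads from the opposite strand the same event would appear as G to A, which this sketch does not handle.

```python
def c_to_t_positions(reference: str, read: str):
    """0-based positions where the reference has C and the read has T (or U).

    Assumes `read` is aligned to `reference` position by position (no indels).
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()))
        if ref_base == "C" and read_base in ("T", "U")
    ]


# EMX1 target sequence vs. a made-up read carrying one C-to-T conversion
print(c_to_t_positions("GAGTCCGAGCAGAAGAAGAA", "GAGTTCGAGCAGAAGAAGAA"))
```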
  • in one embodiment, genomic DNA is subjected to step (i) to introduce single strand breaks, whole genome sequencing is performed (step (ii)), and the reads are aligned with ISAAC to distinguish positions that are vertically aligned (cleaved positions) from positions that are staggered (uncleaved positions).
  • for example, a position at which two or more sequence reads corresponding to each of the Watson strand and the Crick strand are vertically aligned may be determined to be a non-target position; more specifically, a position at which more than 20% of the sequence reads are vertically aligned and at which 10 or more sequence reads have the same 5′ end on each of the Watson strand and the Crick strand can be determined to be a non-target position, that is, a cleaved position.
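The absolute and relative thresholds mentioned above (10 or more reads, more than 20% of the reads) can be combined in a small Python predicate. This is a sketch under stated assumptions: the function name, its parameters, and the example counts are hypothetical, `depth` is taken to mean the number of reads covering the position on the strand being examined, and the predicate would be applied to each strand for which the criterion is required.

```python
def is_cleavage_site(n_same_5p_end: int, depth: int,
                     min_reads: int = 10, min_fraction: float = 0.20) -> bool:
    """Apply the absolute (read count) and relative (fraction) thresholds."""
    if depth <= 0:
        return False
    fraction = n_same_5p_end / depth
    return n_same_5p_end >= min_reads and fraction > min_fraction


print(is_cleavage_site(12, 30))  # 12 reads, 40% of depth
print(is_cleavage_site(10, 60))  # 10 reads, but under 20% of depth
```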
  • steps (ii) and (iii) may be performed by Digenome-seq (digested-genome sequencing); further details are described in Korean Patent Publication No. 10-2016-0058703 (incorporated herein by reference).
  • by the above method, the base editing site and/or single strand cleavage site of the deaminase can be identified.
  • the base editing efficiency or target specificity at the on-target site may be expressed as [base editing or cleavage frequency at the on-target site] / [total base editing or cleavage frequency].
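The ratio of on-target frequency to total frequency can be written as a short Python function. As an assumption made for this sketch, the total is taken to be the on-target frequency plus the frequencies at all identified off-target sites; the function name and the example numbers are hypothetical.

```python
def on_target_specificity(on_target_freq: float, off_target_freqs) -> float:
    """On-target frequency divided by the total (on-target + off-target)."""
    total = on_target_freq + sum(off_target_freqs)
    if total == 0:
        raise ValueError("no base editing or cleavage events observed")
    return on_target_freq / total


# Hypothetical frequencies (in %): 30 on-target, two off-target sites at 5 each
print(on_target_specificity(30.0, [5.0, 5.0]))
```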
  • identification of the non-target position can be carried out in vitro by treating genomic DNA with the deaminase and inactivated target-specific endonuclease. It can then be confirmed whether a non-target effect is actually observed in vivo at the non-target positions identified (detected) by this method. This is, however, only an additional verification process; it is not an essential step within the scope of the present invention and may be performed additionally as necessary.
  • off-target effect can be used to mean the level at which base editing and/or double-strand cleavage occurs at an off-target site.
  • insertion and/or deletion (indel) refers collectively to mutations in which some bases are inserted into or deleted from the base sequence of DNA.
  • the DNA single strand cleavage method using cytidine deaminase provided herein, and the nucleic acid sequencing technique using it, allow the base editing position, target specificity, and/or non-target positions of the cytidine deaminase to be identified more accurately and efficiently.
  • [Brief Description of Drawings]
  • FIG. 1 is a representative IGV image showing C→U conversion and straight alignment at the EMX1 on-target site.
  • FIG. 2 shows, for sequence reads from only one strand obtained from Digenome-seq results, the number of nicked sites with uniform alignment and, among these, the number of PAM-containing sites with 10 or fewer mismatches.
  • FIG. 3A is a cleavage map of the rAPOBEC1-XTEN-dCas9-NLS vector.
  • FIG. 3B is a cleavage map of the rAPOBEC1-XTEN-dCas9-UGI-NLS vector.
  • FIG. 3C is a cleavage map of the rAPOBEC1-XTEN-Cas9n-UGI-NLS vector.
  • FIG. 5 is a cleavage map of the pET28b-BE1 vector.
  • HEK293T cells (ATCC CRL-11268) were maintained in DMEM (Dulbecco's Modified Eagle Medium) supplemented with 10% (w/v) FBS and 1% (w/v) penicillin/streptomycin (Welgene).
  • HEK293T cells (1.5 × 10⁵) were seeded in 24-well plates and transfected, using Lipofectamine 2000 (Invitrogen), with sgRNA plasmid (500 ng) and Base Editor plasmid (Addgene plasmid #73019 (expresses BE1 with C-terminal NLS in mammalian cells; rAPOBEC1-XTEN-dCas9-NLS; FIG.
  • the sgRNAs used in the Examples below had, as the targeting sequence (N_cas9)_l of the following general formula 3, the sequence obtained by replacing T with U in the target site sequence (target sequence; EMX1 on-target sequence: GAGTCCGAGCAGAAGAAGAAGGG (SEQ ID NO)) excluding the PAM sequence at its 3′ end (5′-NGG-3′, where N is A, T, G, or C): 5′-(N_cas9)_l-(GUUUUAGAGCUA; SEQ ID NO: 1)-(GAAA)-(SEQ ID NO: 3)-3′ (general formula 3; oligonucleotide linker: GAAA).
  • a plasmid encoding His6-rAPOBEC1-XTEN-dCas9 protein (pET28b-BE1; expresses BE1 with N-terminal His6 tag in E. coli; FIG. 5) was provided by David Liu (Addgene plasmid #73018). A plasmid encoding His6-rAPOBEC1-nCas9 protein (BE3 delta UGI; a BE3 variant lacking the UGI domain) (pET28b-BE3 delta UGI; FIG. 6) was constructed from the plasmid pET28b-BE1, which encodes His6-rAPOBEC1-XTEN-dCas9 protein, by substitution to H840 using site-directed mutagenesis.
  • Rosetta expression cells (Novagen, catalog number: 70954-3CN) were transformed with pET28b-BE1 or pET28b-BE3 delta UGI prepared above and incubated overnight at 37 °C in Luria-Bertani (LB) broth containing 100 μg/ml kanamycin and 50 mg/ml carbenicillin. Then, 10 ml of the overnight culture of Rosetta cells containing pET28b-BE1 or pET28b-BE3 delta UGI was inoculated into 400 ml of LB broth containing 100 μg/ml kanamycin and 50 mg/ml carbenicillin and cultured until the OD600 reached 0.5-0.6.
  • cells were harvested by centrifugation at 5,000 × g for 10 min at 4 °C and solubilized in 5 ml of lysis buffer (50 mM NaH2PO4, 300 mM NaCl, 1 mM DTT, and 10 mM imidazole, pH 8.0) supplemented with lysozyme (Sigma) and protease inhibitors (Roche complete, EDTA-free).
  • the resulting cell lysate was centrifuged at 13,000 rpm for 30 minutes at 4 °C, and the soluble cell lysate obtained was incubated with Ni-NTA agarose resin (Qiagen) at 4 °C for 1 hour.
  • Genomic DNA was purified (extracted) from HEK293T cells using the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's instructions. Genomic DNA (10 μg) was incubated with the rAPOBEC1-nCas9 protein (300 nM) purified in Reference Example 2 above and sgRNA (900 nM) in a reaction volume of 500 μl for 8 hours at 37 °C in buffer (100 mM NaCl, 40 mM Tris-HCl, 10 mM MgCl2, and 100 μg/ml BSA, pH 7.9).
  • the sgRNA used above had, as the targeting sequence (N_cas9)_l of the following general formula 3, the sequence obtained by replacing T with U in the target site sequence (target sequence; EMX1 on-target sequence: GAGTCCGAGCAGAAGAAGAAGGG (SEQ ID NO: 14)) excluding the PAM sequence at its 3′ end (5′-NGG-3′, where N is A, T, G, or C):
  • the uracil-containing genomic DNA was purified with the DNeasy Blood & Tissue Kit (Qiagen). Target sites were PCR-amplified using SUN-PCR blend and subjected to Sanger sequencing to confirm BE3-mediated cytosine deamination and DNA cleavage.
  • Genomic DNA (1) was fragmented to the 400-500 bp range using the Covaris system (Life Technologies) and blunt-ended using End Repair Mix (Thermo Fisher).
  • a library was generated by ligating adaptors to the fragmented DNA, and whole genome sequencing (WGS) was then carried out at Macrogen using a HiSeq X Ten Sequencer (Illumina) (Kim, D., Kim, S., Kim, S., Park, J. & Kim, J.S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Research 26, 406-415 (2016)).
  • targets and potential non-target sites were amplified with a KAPA HiFi HotStart PCR kit (KAPA Biosystems #KK2501). Pooled PCR amplicons were sequenced using MiniSeq (Illumina) or Illumina MiSeq (LAS Inc., Korea) equipped with the TruSeq HT Dual Index System (Illumina).
  • the primers used for the target deep sequencing are as follows:
  • FIG. 1 is a representative IGV image showing C→U conversion and straight alignment at the EMX1 on-target site.
  • Groups A and B are identified as having absolute numbers (n ⁇ 5 or 10) and relative numbers (10% or 20%, respectively) having the same 5 ′ end and the number of positions homologous to the target sequence. Shows.
  • The Cas9-induced indel frequency and the BE3-induced substitution frequency at the BE3 non-target positions for EMX1 identified via Digenome-seq were measured in HEK293T cells using targeted deep sequencing (see Reference Example 5). WGS data obtained from intact genomic DNA or from genomic DNA treated with rAPOBEC1-nCas9 were used to determine whether C→T conversion occurred at each of these positions.

Abstract

Provided are a DNA single strand break composition comprising cytidine deaminase, inactivated target-specific endonuclease, and guide RNA, a method for producing a DNA single strand break, using the same, a DNA nucleotide sequencing method with base editing introduced thereto, and a method for identifying (or measuring or detecting) a base editing position, base editing efficiency on an on-target site and an off-target site, and/or target specificity.

Description

[Description of the Invention]
[Title of the Invention]
Method for identifying base editing off-target site by DNA single strand break
[Technical Field]
The present disclosure relates to a composition for DNA single strand breaks comprising a cytidine deaminase, an inactivated target-specific endonuclease, and a guide RNA; a method for generating a DNA single strand break using the same; a method for analyzing the nucleic acid sequence of DNA into which base editing has been introduced; and a method for identifying (or measuring or detecting) a base editing site, base editing efficiency at an on-target site, an off-target site, and/or target specificity.
[Background Art]
Base editors (programmable deaminases), which comprise a DNA-binding module and a cytidine deaminase, enable targeted nucleotide substitution or base editing in a genome without generating DNA double-strand breaks. Unlike programmable nucleases such as CRISPR-Cas9 and zinc-finger nucleases (ZFNs), which induce small insertions or deletions (indels) at target sites, programmable deaminases convert C to T (or, at a lower frequency, C to G or A) within a window of several nucleotides at the target site. Base editors can correct point mutations that cause genetic disease, or generate single-nucleotide polymorphisms (SNPs), in human cells, animals, and plants.
Four classes of base editors have been reported:
1) Base Editors (BEs), comprising a catalytically deficient Cas9 (dCas9) or a D10A Cas9 nickase (nCas9) derived from S. pyogenes, and rAPOBEC1, a rat cytidine deaminase;
2) Target-AID, comprising dCas9 or nCas9 and either PmCDA1, an activation-induced cytidine deaminase (AID) ortholog from sea lamprey, or human AID;
3) CRISPR-X, comprising dCas9 and sgRNAs linked to MS2 RNA hairpins in order to recruit a hyperactive AID variant fused to an MS2-binding protein; and
4) zinc-finger proteins or transcription activator-like effectors (TALEs) fused to a cytidine deaminase.
Despite broad interest in base editing with base editors, no means has yet been developed for analyzing the genome-wide target specificity of programmable deaminases. A means is therefore needed for analyzing the genome-wide target specificity of base editors and, through it, their base editing efficiency, off-target sites, off-target effects, and the like.
[Disclosure of the Invention]
[Technical Problem]
The present specification provides a means for analyzing the genome-wide target specificity of base editors and, through it, a means for analyzing the off-target sites, off-target effects, and the like of base editors.
One example provides a composition for DNA single strand breaks comprising (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it. The composition may be one that does not contain Uracil-Specific Excision Reagent (USER).
Another example provides a method for generating a DNA single strand break, comprising introducing into a cell, or contacting with DNA isolated from a cell, (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it. The method may be one that does not include a step of treating with Uracil-Specific Excision Reagent (USER).
Another example provides a method for analyzing the nucleic acid sequence of DNA into which base editing has been introduced by a deaminase, comprising:
(i) inducing a DNA single strand break by introducing into a cell, or contacting with DNA isolated from a cell, (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it; and
(ii) analyzing the nucleic acid sequence of the DNA fragment having the single strand break.
The method may be one that does not include a step of treating with Uracil-Specific Excision Reagent (USER).
Another example provides a method for identifying (or measuring or detecting) a base editing site or single strand break site of a deaminase, base editing efficiency at an on-target site, an off-target site, and/or target specificity, comprising:
(i) inducing a DNA single strand break by introducing into a cell, or contacting with DNA isolated from a cell, (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it;
(ii) analyzing the nucleic acid sequence of the cleaved DNA fragment; and
(iii) identifying the single strand break site from the nucleic acid sequence data obtained by the analysis.
The method may further comprise, for example, after step (ii) and before, simultaneously with, or after step (iii), (iii-1) determining whether base editing (e.g., conversion of cytosine (C) to uracil (U) or thymine (T)) has occurred in the nucleic acid sequence data (sequence reads) obtained by the analysis. The method may be one that does not include a step of generating a double strand break in DNA by treating with Uracil-Specific Excision Reagent (USER). In one example, the method (e.g., a method for identifying base editing efficiency at an on-target site, or an off-target site) may further comprise, after step (iii), (iv) identifying (determining) the break site as an off-target site when it is not the on-target site.
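As an illustration only, steps (iii-1) and (iv) above can be sketched in Python. The sketch is not part of the described method: the `BreakSite` type, the `tolerance` parameter, and the coordinates in the usage note are hypothetical, and reads are assumed to be already aligned to the reference without gaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakSite:
    """A single strand break position recovered from sequencing data."""
    chrom: str
    pos: int  # 1-based genomic coordinate of the break


def classify_sites(sites, on_target, tolerance=0):
    """Step (iv): label each break site as on-target or off-target.

    `tolerance` is a hypothetical convenience: a site within that many
    bases of the intended position still counts as on-target.
    """
    labels = {}
    for site in sites:
        same_chrom = site.chrom == on_target.chrom
        near = abs(site.pos - on_target.pos) <= tolerance
        labels[site] = "on-target" if (same_chrom and near) else "off-target"
    return labels


def c_to_t_positions(reference, read):
    """Step (iii-1): 0-based offsets where the reference has C and the read has T.

    Assumes an ungapped alignment of equal length. A deaminated C (that is, U)
    is read as T by the sequencer, so C-to-T mismatches mark candidate edits.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have equal length")
    pairs = zip(reference.upper(), read.upper())
    return [i for i, (ref, obs) in enumerate(pairs) if ref == "C" and obs == "T"]
```

For example, with a made-up on-target site `BreakSite("chr2", 100)`, a break at `BreakSite("chr5", 2000)` would be labeled off-target, and comparing the reference `GACCT` with the read `GATCT` reports a conversion at offset 2.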
[Technical Solution]
In the present specification, Digenome-seq was modified to assess the specificity, in the human genome, of a base editor composed of a Cas9 nickase and a deaminase (e.g., Base Editor 3; BE3). Genomic DNA was treated in vitro with BE3 and a guide RNA, and it was confirmed that a break was generated in one of the two strands of the DNA duplex. With the deaminase-based DNA single strand break method provided herein and the nucleic acid sequence analysis method using it, BE3 off-target sites can be identified computationally from whole genome sequencing data. First, a technique is provided for generating double strand breaks in DNA using a deaminase that does not itself cause double strand breaks in DNA.
One example provides a composition for DNA single strand breaks comprising (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it. The composition may be one that does not contain Uracil-Specific Excision Reagent (USER).
A coding gene, as used herein, may be used in the form of cDNA, rDNA, a recombinant vector comprising the same, or mRNA.
The deaminase may be a cytidine deaminase. Cytidine deaminase refers to any enzyme having the activity of converting cytosine, a base present in nucleotides (e.g., cytosine present in double-stranded DNA or RNA), into uracil (C-to-U conversion or C-to-U editing); it converts cytosine located on the strand carrying the PAM sequence of the target site sequence (target sequence) into uracil. In one example, the cytidine deaminase may be derived from a mammal, such as a primate (e.g., human or monkey) or a rodent (e.g., rat or mouse), but is not limited thereto. For example, the cytidine deaminase may be one or more selected from the group consisting of enzymes of the APOBEC ("apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like") family, AID (activation-induced cytidine deaminase), CDA (cytidine deaminase; e.g., CDA1), and the like, and may, for example, be one or more selected from the following, without limitation:
APOBEC1: human (Homo sapiens) APOBEC1 (protein: GenBank Accession Nos. NP_001291495.1, NP_001635.2, NP_005880.2, etc.; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001304566.1, NM_001644.4, NM_005889.3, etc.), mouse (Mus musculus) APOBEC1 (protein: GenBank Accession Nos. NP_001127863.1, NP_112436.1, etc.; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001134391.1, NM_031159.3, etc.);
APOBEC2: human APOBEC2 (protein: GenBank Accession No. NP_006780.1, etc.; gene: GenBank Accession No. NM_006789.3, etc.), mouse APOBEC2 (protein: GenBank Accession No. NP_033824.1, etc.; gene: GenBank Accession No. NM_009694.3, etc.);
APOBEC3B: human APOBEC3B (protein: GenBank Accession Nos. NP_001257340.1, NP_004891.4, etc.; genes (mRNA or cDNA, the same hereinafter) (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001270411.1, NM_004900.4, etc.), mouse (Mus musculus) APOBEC3B (protein: GenBank Accession Nos. NP_001153887.1, NP_001333970.1, NP_084531.1, etc.; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001160415.1, NM_001347041.1, NM_030255.3, etc.);
APOBEC3C: human APOBEC3C (protein: GenBank Accession No. NP_055323.2, etc.; gene: GenBank Accession No. NM_014508.2, etc.);
APOBEC3D (including APOBEC3E): human APOBEC3D (protein: GenBank Accession No. NP_689639.2, etc.; gene: GenBank Accession No. NM_152426.3, etc.);
APOBEC3F: human APOBEC3F (protein: GenBank Accession Nos. NP_660341.2, NP_001006667.1, etc.; genes (listed in the same order as the proteins they encode): NM_145298.5, NM_001006666.1, etc.);
APOBEC3G: human APOBEC3G (protein: GenBank Accession Nos. NP_068594.1, NP_001336365.1, NP_001336366.1, NP_001336367.1, etc.; genes (listed in the same order as the proteins they encode): NM_021822.3, NM_001349436.1, NM_001349437.1, NM_001349438.1, etc.);
APOBEC3H: human APOBEC3H (protein: GenBank Accession Nos. NP_001159474.2, NP_001159475.2, NP_001159476.2, NP_861438.3, etc.; genes (listed in the same order as the proteins they encode): NM_001166002.2, NM_001166003.2, NM_001166004.2, NM_181773.4, etc.);
APOBEC4 (including APOBEC3E): human APOBEC4 (protein: GenBank Accession No. NP_982279.1, etc.; gene: GenBank Accession No. NM_203454.2, etc.); mouse APOBEC4 (protein: GenBank Accession No. NP_001074666.1, etc.; gene: GenBank Accession No. NM_001081197.1, etc.);
Activation-induced cytidine deaminase (AICDA or AID): human AID (protein: GenBank Accession Nos. NP_001317272.1, NP_065712.1, etc.; genes (listed in the same order as the proteins they encode): GenBank Accession Nos. NM_001330343.1, NM_020661.3, etc.); mouse AID (protein: GenBank Accession No. NP_033775.1, etc.; gene: GenBank Accession No. NM_009645.2, etc.); and
CDA (cytidine deaminase; EC number 3.5.4.5; e.g., CDA1): GenBank Accession Nos. NP_001776.1 (gene: NM_001785.2), CAA06460.1 (gene: AJ005261.1), NP_416648.1 (gene: NC_000913.3), etc.
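The C-to-U conversion that defines a cytidine deaminase, described before the list above, can be illustrated with a short Python sketch. It is a toy model rather than a description of any enzyme's actual behavior: the function name, the example sequences, and in particular the editing window positions are assumptions chosen for illustration and must be supplied by the caller.

```python
def deaminate(pam_strand, window):
    """Convert every C to U inside an editing window on the PAM-containing strand.

    pam_strand: target sequence written 5'->3' on the strand carrying the PAM.
    window: (start, end), 1-based inclusive positions counted from the 5' end.
            The window is an input assumption; the text only says that editing
            occurs within several nucleotides at the target site.
    """
    start, end = window
    if not 1 <= start <= end <= len(pam_strand):
        raise ValueError("window must lie within the sequence")
    edited = []
    for position, base in enumerate(pam_strand.upper(), start=1):
        in_window = start <= position <= end
        edited.append("U" if base == "C" and in_window else base)
    return "".join(edited)
```

Calling `deaminate("GACCCTTCAG", (3, 5))` on this made-up sequence converts the three cytosines at positions 3 to 5 and leaves the cytosine at position 8 untouched.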
As used herein, a target-specific nuclease, also called a programmable nuclease (gene scissors), collectively refers to all forms of endonuclease capable of recognizing and cleaving a specific site on a desired genomic DNA. For example, the target-specific nuclease may be one or more selected from all nucleases that recognize a specific sequence of a target gene and have nucleotide-cleaving activity, thereby causing indels (insertions and/or deletions) in the target gene.
For example, the target-specific nuclease may be one or more selected from the group consisting of RGENs (RNA-guided engineered nucleases; e.g., Cas9, Cpf1, etc.) derived from CRISPR, a microbial immune system, but is not limited thereto.
In one embodiment, the target-specific nuclease may be one or more selected from the group consisting of endonucleases involved in type II and/or type V CRISPR systems, such as Cas proteins (e.g., Cas9 protein (CRISPR (clustered regularly interspaced short palindromic repeats)-associated protein 9)) and Cpf1 protein (CRISPR from Prevotella and Francisella 1). In this case, the target-specific nuclease may further comprise a target DNA-specific guide RNA for guiding it to the target site of genomic DNA. The guide RNA may be one transcribed in vitro, for example from an oligonucleotide duplex or a plasmid template, but is not limited thereto. The target-specific nuclease and the guide RNA may be used in the form of a ribonucleoprotein (RNP), in which the target-specific nuclease or a gene encoding it and the RNA or a gene encoding it are contained as a mixture or as a complex bound to each other.
Cas9 protein is a major protein component of the CRISPR/Cas system and can function as an activated endonuclease or nickase.
Cas9 protein or gene information can be obtained from a known database such as GenBank of the NCBI (National Center for Biotechnology Information). For example, the Cas9 protein may be one or more selected from the group consisting of the following, but is not limited thereto:
a Cas9 protein derived from Streptococcus sp., e.g., Streptococcus pyogenes (e.g., SwissProt Accession number Q99ZW2 (NP_269215.1); coding gene: SEQ ID NO: 4);
a Cas9 protein derived from the genus Campylobacter, e.g., Campylobacter jejuni;
a Cas9 protein derived from the genus Streptococcus, e.g., Streptococcus thermophilus or Streptococcus aureus;
a Cas9 protein derived from Neisseria meningitidis;
a Cas9 protein derived from the genus Pasteurella, e.g., Pasteurella multocida; and
a Cas9 protein derived from the genus Francisella, e.g., Francisella novicida.
Cpf1 protein is an endonuclease of a new CRISPR system distinct from the CRISPR/Cas system described above; it is relatively small compared with Cas9, requires no tracrRNA, and can act with a single guide RNA. It also recognizes a thymine-rich PAM (protospacer-adjacent motif) sequence and cuts the DNA double strand to generate cohesive ends (a cohesive double-strand break).
For example, the Cpf1 protein may be derived from the genus Candidatus, the genus Lachnospira, the genus Butyrivibrio, Peregrinibacteria, the genus Acidaminococcus, the genus Porphyromonas, the genus Prevotella, the genus Francisella, Candidatus Methanoplasma, or the genus Eubacterium, for example from a microorganism such as Parcubacteria bacterium (GWC2011_GWC2_44_17), Lachnospiraceae bacterium (MC2017), Butyrivibrio proteoclasticus, Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_K08D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, or Eubacterium eligens, but is not limited thereto. The target-specific endonuclease may be one isolated from a microorganism, or one that is artificially or non-naturally produced (non-naturally occurring), for example by a recombinant or synthetic method. In one example, the target-specific endonuclease (e.g., Cas9, Cpf1, etc.) may be a recombinant protein produced from recombinant DNA. Recombinant DNA (rDNA) refers to a DNA molecule artificially made by a genetic recombination method, such as molecular cloning, so as to include heterologous or homologous genetic material obtained from various organisms. For example, when the recombinant DNA is expressed in a suitable organism to produce the target-specific endonuclease (in vivo or in vitro), the recombinant DNA may have a nucleotide sequence reconstructed by selecting, from among the codons encoding the protein to be produced, codons optimized for expression in that organism.
The inactivated target-specific endonuclease refers to a target-specific endonuclease that has lost the endonuclease activity of cleaving both strands of a DNA duplex. For example, it may be one or more selected from an inactivated target-specific endonuclease that has lost endonuclease activity but retains nickase activity, and an inactivated target-specific endonuclease that has lost both endonuclease activity and nickase activity. In one example, the inactivated target-specific endonuclease may have nickase activity; in this case, a nick is introduced, simultaneously with the conversion of cytosine to uracil or sequentially in either order, on the strand in which cytosine is converted to uracil or on the opposite strand (e.g., the opposite strand). For example, the nick is introduced on the strand opposite the PAM-containing strand, at the position corresponding to between the third and fourth nucleotides in the 5′ direction from the PAM sequence. Such a modification (mutation) of the target-specific endonuclease may include a Cas9 mutation in which at least a catalytic aspartate residue (e.g., for Cas9 protein derived from Streptococcus pyogenes, one or more selected from the group consisting of aspartic acid at position 10 (D10), glutamic acid at position 762 (E762), histidine at position 840 (H840), asparagine at position 854 (N854), asparagine at position 863 (N863), aspartic acid at position 986 (D986), and the like) is substituted with any other amino acid; the other amino acid may be alanine, but is not limited thereto.
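The nick position described above (between the third and fourth nucleotides counted in the 5′ direction from the PAM) can be located with a small Python sketch. This is an illustration under stated assumptions: the function names and the 20-nt example sequence are hypothetical, and the protospacer is written 5′→3′ on the PAM-containing strand with the PAM immediately 3′ of it, so the returned index marks the corresponding position on the opposite, nicked strand.

```python
def nick_index(protospacer_len, offset_from_pam=3):
    """Return the 0-based index i such that the nick lies between
    protospacer[i - 1] and protospacer[i] in PAM-strand coordinates.

    offset_from_pam=3 places the nick between the 3rd and 4th nucleotides
    counted from the PAM toward the 5' end, as stated in the text.
    """
    if not 0 < offset_from_pam < protospacer_len:
        raise ValueError("offset must fall inside the protospacer")
    return protospacer_len - offset_from_pam


def split_at_nick(protospacer):
    """Split a protospacer (5'->3', PAM strand) at the nick position."""
    i = nick_index(len(protospacer))
    return protospacer[:i], protospacer[i:]
```

For a 20-nt protospacer the index is 17, so `split_at_nick` returns a 17-nt PAM-distal part and the 3 nt adjacent to the PAM.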
As used herein, the "other amino acid" means an amino acid selected from alanine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan, valine, asparagine, cysteine, glutamine, glycine, serine, threonine, tyrosine, aspartic acid, glutamic acid, arginine, histidine, lysine, and all known variants of these amino acids, excluding the amino acid that the wild-type protein originally has at the mutated position.
In one example, when the inactivated target-specific endonuclease is a modified Cas9 protein, the modified Cas9 protein may be one or more selected from the group consisting of: a modified Cas9 in which a mutation (e.g., substitution with another amino acid) is introduced at position D10 or H840 of the Cas9 protein derived from Streptococcus pyogenes (e.g., SwissProt Accession number Q99ZW2 (NP_269215.1)), so that endonuclease activity is lost and nickase activity is retained; and a modified Cas9 protein in which mutations (e.g., substitution with other amino acids) are introduced at both positions D10 and H840 of the Cas9 protein derived from Streptococcus pyogenes, so that both endonuclease activity and nickase activity are lost. For example, the mutation at position D10 of the Cas9 protein may be the D10A mutation (meaning that D, the 10th amino acid of the Cas9 protein, is substituted with A; mutations introduced into Cas9 are denoted in the same way hereinafter), and the mutation at position H840 may be the H840A mutation. In one embodiment, the inactivated target-specific endonuclease may be a nickase having the D10A mutation, in which D10 of the Cas9 protein derived from Streptococcus pyogenes (SEQ ID NO: 4) is substituted with A (e.g., encoded by SEQ ID NO: 11).
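The substitution notation defined above (wild-type residue, 1-based position, new residue, as in D10A or H840A) can be made concrete with a minimal Python sketch. The function name and the toy peptide are hypothetical and are used only so that the example is self-contained; it is not the sequence of Cas9 or of any SEQ ID NO in this document.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(protein, mutation):
    """Apply a point substitution such as 'D10A' to a protein sequence.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the wild-type residue does not match the sequence.
    """
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"malformed mutation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}")
    return protein[:position - 1] + replacement + protein[position:]


# Toy peptide (hypothetical) with aspartic acid at position 10.
toy_peptide = "M" + "A" * 8 + "DGG"
toy_d10a = apply_substitution(toy_peptide, "D10A")
```

Checking the wild-type residue before substituting guards against applying a mutation name to the wrong sequence or numbering scheme.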
The cytidine deaminase and the inactivated target-specific endonuclease may be used (or included in the composition) in any of the following forms: as a fusion protein in which the two are fused to each other directly or via a peptide linker (e.g., arranged, from the N-terminus to the C-terminus, in the order cytidine deaminase-inactivated target-specific endonuclease (i.e., the inactivated target-specific endonuclease is fused to the C-terminus of the cytidine deaminase), or in the order inactivated target-specific endonuclease-cytidine deaminase (i.e., the cytidine deaminase is fused to the C-terminus of the inactivated target-specific endonuclease)); as a mixture of purified cytidine deaminase or mRNA encoding it and inactivated target-specific endonuclease or mRNA encoding it; as a single plasmid containing both the cytidine deaminase coding gene and the inactivated target-specific endonuclease coding gene (e.g., the two genes are included so as to encode the fusion protein described above); or as a mixture of a cytidine deaminase expression plasmid and an inactivated target-specific endonuclease expression plasmid, in which the two coding genes are carried on separate plasmids. In one embodiment, a fusion protein arranged from the N-terminus to the C-terminus in the order cytidine deaminase-inactivated target-specific endonuclease, or in the order inactivated target-specific endonuclease-cytidine deaminase, may be used, or a single plasmid containing the cytidine deaminase coding gene and the inactivated target-specific endonuclease coding gene so as to encode the fusion protein may be used.
The plasmid may be any plasmid that includes an expression system into which the cytidine deaminase coding gene and/or the inactivated target-specific endonuclease coding gene can be inserted and from which it can be expressed in a host cell. The plasmid contains elements for expression of the gene of interest and may include a replication origin, a promoter, an operator, a transcription terminator, and the like. It may further include an appropriate enzyme site (e.g., a restriction enzyme site) for introduction into the genome of the host cell, and/or optionally a selection marker for confirming successful introduction into the host cell, and/or a ribosome binding site (RBS) for translation into protein, and/or transcriptional regulatory elements. The plasmid may be one or more selected from the group consisting of plasmids used in the art, such as the pcDNA series, pSC101, pGV1106, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, pIJ61, pLAFR1, pHV, the pGEX series, the pET series, and pUC19, but is not limited thereto. The host cell may be selected from cells into which base editing or double-strand cleavage by the cytidine deaminase is to be introduced (e.g., eukaryotic cells, including mammalian cells such as human cells), or from any cells capable of expressing the cytidine deaminase coding gene and/or the inactivated target-specific endonuclease coding gene to produce the cytidine deaminase and the inactivated target-specific endonuclease (e.g., E. coli).
The guide RNA serves to guide the mixture or fusion protein of the cytidine deaminase and the inactivated target-specific endonuclease to the target site. It may be one or more selected from the group consisting of CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and single guide RNA (sgRNA). Specifically, it may be a double-stranded crRNA:tracrRNA complex in which crRNA and tracrRNA are bound to each other, or a single-stranded guide RNA (sgRNA) in which a crRNA or a portion thereof and a tracrRNA or a portion thereof are connected by an oligonucleotide linker.
The specific sequence of the guide RNA may be selected appropriately according to the type of target-specific endonuclease used, the microorganism from which it is derived, and the like, and this is readily understood by a person of ordinary skill in the art to which the invention pertains.
When a Cas9 protein from Streptococcus pyogenes is used as the target-specific endonuclease, the crRNA may be represented by the following General Formula 1:
5'-(N_cas9)_l-(GUUUUAGAGCUA)-(X_cas9)_m-3' (General Formula 1)
In General Formula 1,
N_cas9 is the targeting sequence, i.e., a region determined by the sequence of the target site of the target gene (that is, a sequence capable of hybridizing with the sequence of the target site), and l denotes the number of nucleotides contained in the targeting sequence and may be an integer from 17 to 23 or from 18 to 22, for example 20;
the region comprising the 12 consecutive nucleotides (GUUUUAGAGCUA; SEQ ID NO: 1) located adjacent to the 3' side of the targeting sequence is the essential part of the crRNA; and
X_cas9 is a region comprising m nucleotides located at the 3' end of the crRNA (i.e., adjacent to the 3' side of the essential part of the crRNA), where m may be an integer from 8 to 12, for example 11, and the m nucleotides may be the same as or different from one another and may each be independently selected from the group consisting of A, U, C, and G. In one example, X_cas9 may comprise UGCUGUUUUG (SEQ ID NO: 2), but is not limited thereto.
In addition, the tracrRNA may be represented by the following General Formula 2:
5'-(Y_cas9)_p-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC)-3' (General Formula 2)
In General Formula 2,
the region represented by the 60 nucleotides (UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC; SEQ ID NO: 3) is the essential part of the tracrRNA, and
Y_cas9 is a region comprising p nucleotides located adjacent to the 5' end of the essential part of the tracrRNA, where p may be an integer from 6 to 20, for example an integer from 8 to 19, and the p nucleotides may be the same as or different from one another and may each be independently selected from the group consisting of A, U, C, and G.
In addition, the sgRNA may be one in which a crRNA portion, comprising the targeting sequence and the essential part of the crRNA, and a tracrRNA portion, comprising the essential part (60 nucleotides) of the tracrRNA, form a hairpin (stem-loop) structure through an oligonucleotide linker (here, the oligonucleotide linker corresponds to the loop). More specifically, the sgRNA may have a hairpin structure in which, in a double-stranded RNA molecule formed by binding of the crRNA portion (comprising the targeting sequence and the essential part of the crRNA) to the tracrRNA portion (comprising the essential part of the tracrRNA), the 3' end of the crRNA portion and the 5' end of the tracrRNA portion are connected through the oligonucleotide linker.
In one example, the sgRNA may be represented by the following General Formula 3:
5'-(N_cas9)_l-(GUUUUAGAGCUA)-(oligonucleotide linker)-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC)-3' (General Formula 3)
In General Formula 3, (N_cas9)_l is the targeting sequence, as described above for General Formula 1.
The oligonucleotide linker contained in the sgRNA may comprise 3 to 5 nucleotides, for example 4 nucleotides, and the nucleotides may be the same as or different from one another and may each be independently selected from the group consisting of A, U, C, and G.
The crRNA or sgRNA may further comprise 1 to 3 guanines (G) at its 5' end (i.e., the 5' end of the targeting sequence region of the crRNA).
The tracrRNA or sgRNA may further comprise, at the 3' end of the essential part (60 nt) of the tracrRNA, a termination region comprising 5 to 7 uracils (U).
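The assembly rules stated for General Formula 3 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `build_sgrna`, the default `GAAA` linker, and the example targeting sequence are hypothetical and are not taken from this specification, and the 60-nt tracrRNA essential part is written on the assumption that SEQ ID NO: 3 reads `...UAUCAACUU...`.

```python
# Illustrative sketch only: names, the GAAA linker, and defaults are assumptions.
CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # SEQ ID NO: 1 (12 nt)
TRACRRNA_ESSENTIAL = (            # SEQ ID NO: 3 (60 nt), as assumed above
    "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)


def build_sgrna(targeting_dna, linker="GAAA", extra_5p_g=0, u_tail=0):
    """Assemble 5'-(N)l-(crRNA essential)-(linker)-(tracrRNA essential)-3'."""
    # The targeting sequence equals the target-site sequence with T replaced by U.
    targeting = targeting_dna.upper().replace("T", "U")
    if set(targeting) - set("ACGU"):
        raise ValueError("targeting sequence must contain only A, C, G, T/U")
    if not 17 <= len(targeting) <= 23:
        raise ValueError("targeting sequence must be 17 to 23 nt")
    if not 3 <= len(linker) <= 5 or set(linker) - set("ACGU"):
        raise ValueError("linker must be 3 to 5 nt of A, U, C, G")
    if not 0 <= extra_5p_g <= 3:
        raise ValueError("at most 3 additional 5' guanines")
    if u_tail not in (0, 5, 6, 7):
        raise ValueError("U termination region must be absent or 5 to 7 nt")
    return (
        "G" * extra_5p_g
        + targeting
        + CRRNA_ESSENTIAL
        + linker
        + TRACRRNA_ESSENTIAL
        + "U" * u_tail
    )


if __name__ == "__main__":
    # Hypothetical 20-nt targeting sequence, used purely as an example.
    print(build_sgrna("GAGTCCGAGCAGAAGAAGAA", extra_5p_g=1, u_tail=6))
```

With a 20-nt targeting sequence and a 4-nt linker, the core sgRNA in this sketch is 96 nt long (20 + 12 + 4 + 60) before any optional 5' guanines or 3' uracils are added.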
The target sequence of the guide RNA may be a contiguous nucleic acid sequence of about 17 to about 23, or about 18 to about 22, for example 20, nucleotides located adjacent to the 5' side of a PAM (Protospacer Adjacent Motif) sequence on the target DNA (for S. pyogenes Cas9, 5'-NGG-3', where N is A, T, G, or C).
The targeting sequence of the guide RNA, which is capable of hybridizing with the target sequence of the guide RNA, means a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to the nucleotide sequence of the strand complementary to the DNA strand on which the target sequence is located (i.e., the DNA strand on which the PAM sequence 5'-NGG-3' (N is A, T, G, or C) is located), and it is capable of complementary binding to the nucleotide sequence of that complementary strand.
Herein, the nucleic acid sequence of a target site is represented by the sequence of the strand on which the PAM sequence is located, out of the two DNA strands at the corresponding region of the target gene. Because the DNA strand to which the guide RNA actually binds is the strand complementary to the strand bearing the PAM sequence, the targeting sequence contained in the guide RNA has the same nucleic acid sequence as the target site, except that T is replaced with U because it is RNA. Accordingly, herein, the targeting sequence of the guide RNA and the sequence of the target site (or the cleavage site) are represented by the same nucleic acid sequence, except that T and U are interchanged.
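The relationship between the PAM, the target sequence, and the targeting sequence can be sketched as follows. This is a minimal Python illustration, not a method of this specification: the function name `find_targets` and the example DNA string are hypothetical, and the sketch scans only the strand it is given (a full search would also scan the reverse complement).

```python
import re


def find_targets(dna, length=20):
    """List (start, target_site, targeting_rna) for each 5'-NGG-3' PAM found.

    The target site is the `length` nucleotides immediately 5' of the PAM on
    the PAM-bearing strand; the guide RNA targeting sequence is the same
    sequence with T replaced by U.
    """
    dna = dna.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        start = pam_start - length
        if start < 0:
            continue  # not enough sequence 5' of this PAM
        site = dna[start:pam_start]
        hits.append((start, site, site.replace("T", "U")))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: 2 nt of flank, a 20-nt site, a GGG PAM, 2 nt of flank.
    for hit in find_targets("TTGAGTCCGAGCAGAAGAAGAAGGGCT"):
        print(hit)
```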
The guide RNA may be used (or included in the composition) in the form of RNA, or in the form of a plasmid containing DNA encoding it.
The compositions and methods described herein may be characterized in that they do not include or use a uracil-specific excision reagent (USER). The uracil-specific excision reagent may include any substance that serves to remove the uracil converted from cytosine by the cytidine deaminase and/or to introduce a DNA break at the position from which the uracil has been removed.
In one example, the uracil-specific excision reagent (USER) includes uracil DNA glycosylase (UDG), endonuclease VIII, and combinations thereof. In one example, the uracil-specific excision reagent may comprise endonuclease VIII or a combination of uracil DNA glycosylase and endonuclease VIII.
Uracil DNA glycosylase (UDG) is an enzyme that prevents mutagenesis of DNA by removing uracil (U) present in the DNA. It may be one or more selected from all enzymes that initiate the base-excision repair (BER) pathway by cleaving the N-glycosylic bond of uracil. For example, the uracil DNA glycosylase may be one or more selected from the group consisting of Escherichia coli uracil DNA glycosylase (e.g., GenBank Accession Nos. ADX49788.1, ACT28166.1, EFN36865.1, BAA10923.1, ACA76764.1, ACX38762.1, EFU59768.1, EFU53885.1, EFJ57281.1, EFU47398.1, EFK71412.1, EFJ92376.1, EFJ79936.1, EFO59084.1, EFK47562.1, KXH01728.1, ESE25979.1, ESD99489.1, ESD73882.1, ESD69341.1, etc.), human uracil DNA glycosylase (e.g., GenBank Accession Nos. NP_003353.1, NP_550433.1, etc.), and mouse uracil DNA glycosylase (e.g., GenBank Accession Nos. NP_001035781.1, NP_035807.2, etc.), but is not limited thereto.
The endonuclease VIII serves to remove the nucleotide from which the uracil has been removed. It may be one or more selected from all enzymes having both N-glycosylase activity, which removes the uracil damaged by the uracil DNA glycosylase from double-stranded DNA, and AP-lyase activity, which cleaves the 3' and 5' ends of the apurinic site (AP site) resulting from removal of the damaged uracil. For example, the endonuclease VIII may be one or more selected from the group consisting of human endonuclease VIII (e.g., GenBank Accession Nos. BAC06476.1, NP_001339449.1, NP_001243481.1, NP_078884.2, NP_001339448.1, etc.), mouse endonuclease VIII (e.g., GenBank Accession Nos. BAC06477.1, NP_082623.1, etc.), and Escherichia coli endonuclease VIII (e.g., GenBank Accession Nos. OBZ49008.1, OBZ43214.1, OBZ42025.1, ANJ41661.1, KYL40995.1, KMV55034.1, KMV53379.1, KMV50038.1, KMV40847.1, AQW72152.1, etc.), but is not limited thereto.
Another example provides a method of generating a single-strand break in DNA, comprising introducing into a cell, or contacting with DNA isolated from a cell, (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it. The method may be one that does not include a step of treating with a uracil-specific excision reagent (USER).
By generating (or introducing) a single-strand break in DNA in this way, it is possible to analyze, in genomic DNA or at a target site of DNA, the position at which base editing (i.e., C-to-U conversion) by the cytidine deaminase occurred, the position at which the single-strand break was generated (introduced), the base editing efficiency, and the like. This in turn allows the base editing efficiency at the on-target site, the specificity for the on-target sequence, off-target sequences, and the like to be identified (or measured).
Another example provides a method of analyzing the nucleic acid sequence of DNA into which base editing has been introduced by a deaminase, comprising:
(i) introducing into a cell, or contacting with DNA isolated from a cell, (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it, to induce a DNA single-strand break; and
(ii) analyzing the nucleic acid sequence of the DNA fragment having the single-strand break.
The method may be one that does not include a step of treating with a uracil-specific excision reagent (USER) to generate a double-strand break in the DNA.
Another example provides a method of identifying (or measuring or detecting) the base editing site of a deaminase, the single-strand cleavage site, the base editing efficiency at the on-target site, off-target sites, and/or target specificity, comprising:
(i) introducing into a cell, or contacting with DNA isolated from a cell, (a) a deaminase or a gene encoding it (cDNA, rDNA, or mRNA), (b) an inactivated target-specific endonuclease or a gene encoding it (cDNA, rDNA, or mRNA), and (c) a guide RNA or a gene encoding it, to induce a DNA single-strand break;
(ii) analyzing the nucleic acid sequence of the cleaved DNA fragment; and
(iii) identifying the single-strand cleavage site from the nucleic acid sequence data obtained by the analysis.
The method may further comprise, for example after step (ii) and before, simultaneously with, or after step (iii), (iii-1) determining whether base editing (e.g., conversion of cytosine (C) to uracil (U) or thymine (T)) is present in the nucleic acid sequence data (sequence reads) obtained by the analysis. The method may be one that does not include a step of treating with a uracil-specific excision reagent (USER) to generate a double-strand break in the DNA.
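The check for base editing in sequence reads, step (iii-1), can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical, and the sketch assumes that each read has already been aligned to the reference without gaps, so that positions correspond one to one; it is not the analysis pipeline of this specification.

```python
def c_to_t_positions(reference, read):
    """Return 0-based positions where the reference has C and the read has T (or U)."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    pairs = zip(reference.upper(), read.upper())
    return [i for i, (ref, obs) in enumerate(pairs) if ref == "C" and obs in ("T", "U")]


def editing_efficiency(reference, reads):
    """Fraction of reads carrying at least one C-to-T (or C-to-U) conversion."""
    if not reads:
        return 0.0
    edited = sum(1 for read in reads if c_to_t_positions(reference, read))
    return edited / len(reads)


if __name__ == "__main__":
    ref = "GAGTCCGAGC"                      # toy reference window
    reads = ["GAGTTTGAGC", "GAGTCCGAGC"]    # one edited read, one unedited read
    print(c_to_t_positions(ref, reads[0]))
    print(editing_efficiency(ref, reads))
```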
In one example, the method (e.g., a method of determining the base editing efficiency at an on-target site or of identifying off-target sites) may further comprise, after step (iii), (iv) identifying (determining) the cleavage site as an off-target site if it is not the on-target site.
The deaminase, the inactivated target-specific endonuclease, the guide RNA, and the uracil-specific excision reagent are as described above.
The methods provided herein may be performed in a cell or in vitro, for example in vitro. More specifically, all steps of the method may be performed in vitro, or step (i) may be performed in a cell and the steps from step (ii) onward may be performed in vitro using DNA (e.g., genomic DNA) extracted from the cell in which step (i) was performed. Step (i) is a step of transfecting a cell with the deaminase (or a gene encoding it), the inactivated target-specific endonuclease (or a gene encoding it), and the guide RNA, or contacting them with (e.g., incubating them together with) DNA extracted from the cell, to induce base editing (base conversion, e.g., conversion of cytosine to uracil) and a nick in a single strand of the DNA within the target site targeted by the guide RNA. The cell may be selected from all eukaryotic cells into which base editing and/or single-strand cleavage by the deaminase is to be introduced, for example from mammalian cells, including human cells. The transfection may be performed by introducing into the cell, by any conventional means,
(1) a mixture of the deaminase, the inactivated target-specific endonuclease, and the guide RNA, or a complex in which they are bound (ribonucleoprotein; RNP),
(2) a mixture of deaminase-encoding mRNA, inactivated target-specific endonuclease-encoding mRNA, and the guide RNA, or
(3) a plasmid (recombinant vector) containing the deaminase coding gene and the target-specific endonuclease coding gene, separately or together, and a guide RNA or a plasmid containing a guide RNA coding gene.
For example, the introduction may be performed by electroporation, lipofection, microinjection, or the like, but is not limited thereto.
In one embodiment, step (i) may be performed in vitro by incubating DNA extracted from the cell (the cell in which base editing (base editing site, base editing efficiency, etc.) and/or single-strand cleavage (cleavage site, cleavage efficiency, etc.) by the deaminase and the inactivated endonuclease is to be determined) together with the deaminase and the inactivated target-specific endonuclease (e.g., a fusion protein comprising cytidine deaminase and an inactivated Cas9 protein) and the guide RNA. The DNA extracted from the cell may be genomic DNA or a PCR (polymerase chain reaction) amplification product comprising the target gene or target site.
Optionally, after performing (or completing) step (i) and before performing step (ii), the method may further comprise a step of removing the deaminase, the inactivated target-specific endonuclease, and/or the guide RNA used in step (i). In addition, after performing (or completing) step (i) and before performing step (ii), the method may further comprise an end-blunting (or end repair) step for the double-stranded DNA fragment in which the single-strand break has occurred. The end-blunting step may comprise (b) a 3'-to-5' trimming step of removing (cleaving), in the double-stranded DNA fragment in which the single-strand break has occurred, the overhanging nucleotides on the 3' side of the uncleaved strand (the position complementary to the 5'-end side of the cleavage point on the cleaved strand), and/or (c) a 5'-to-3' DNA synthesis step of extending, in the double-stranded DNA fragment in which the single-strand break has occurred, the nucleotides at the 3' end at the cleavage point of the cleaved strand (see the figure in Example 1). The 3'-to-5' trimming step may be performed using a suitable conventional exonuclease. The 5'-to-3' DNA synthesis step may be performed using a suitable conventional DNA polymerase.
In addition, optionally, after performing (or completing) step (i) and before performing step (ii), the method may further comprise a step of amplifying the DNA fragment having the single-strand break, in order to facilitate the nucleic acid sequence analysis of the DNA fragment in step (ii). The fragment amplified may be a contiguous oligonucleotide of 10 to 30 nt or 15 to 25 nt comprising the cleavage site of the cleaved strand of the DNA duplex, and/or a contiguous oligonucleotide of 10 to 30 nt or 15 to 25 nt comprising the position on the uncleaved strand that corresponds to (is complementary to) the cleavage site. The DNA fragment having the single-strand break that is used for nucleic acid sequence analysis in step (ii) may comprise a contiguous oligonucleotide of 10 to 30 nt or 15 to 25 nt comprising the cleavage site of the strand in which the single-strand break occurred, and/or a contiguous oligonucleotide of 10 to 30 nt or 15 to 25 nt comprising the position on the uncleaved strand that corresponds to (is complementary to) the cleavage site; and/or an amplification product of these oligonucleotides.
Because the deaminase and the inactivated target-specific endonuclease are used together with a guide RNA and therefore have sequence specificity, they act mostly at the on-target site. However, depending on how many sequences similar to the target sequence exist at sites other than the target sequence, the side effect of acting at off-target sites may occur. Herein, an off-target site refers to a position that is not a target site of the deaminase and the inactivated target-specific endonuclease but at which they are nevertheless active; that is, a position other than the on-target site at which base editing and/or cleavage by the deaminase and the inactivated target-specific endonuclease occurs. In one example, the term off-target site may be used to cover not only actual off-target sites of the deaminase and the inactivated target-specific endonuclease but also potential off-target sites that are likely to become off-target sites. The off-target site may mean, without being limited thereto, any position other than the on-target site that is cleaved by the deaminase and the inactivated target-specific endonuclease in vitro.
The activity of the deaminase and the inactivated target-specific endonuclease at positions other than the on-target site can arise from various causes. For example, the deaminase and the inactivated target-specific endonuclease may act on a sequence other than the target sequence (an off-target sequence) that has a low level of nucleotide mismatch with, and thus high sequence homology to, the target sequence designed for the target site.
The off-target site may be, without being limited thereto, a sequence region (gene region) that satisfies one or more of the following conditions:
the number of DNA reads whose 5' ends are vertically aligned is 2 or more, for example 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more;
in the double-stranded DNA fragment, the strand complementary to the cleaved strand comprises a PAM sequence;
in the double-stranded DNA fragment, the strand complementary to the cleaved strand comprises 15 or fewer, or 10 or fewer, mismatched nucleotides relative to the sequence of the on-target site (the target sequence), for example 1 to 15, 1 to 14, 1 to 13, 1 to 12, 1 to 11, 1 to 10, 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 1 to 4, 1 to 3, 1 to 2, or 1 mismatched nucleotide; and
in the double-stranded DNA fragment, the strand complementary to the cleaved strand comprises base editing (conversion of one or more cytosines (C) to uracil (U) or thymine (T)).
If the deaminase and the inactivated target-specific endonuclease act at an off-target site, they can cause unwanted gene mutations in the genome, which may lead to serious problems. Accurately detecting and analyzing off-target sequences can therefore be just as important as the activity of the deaminase and the inactivated target-specific endonuclease at the on-target site, and this may be useful for developing a deaminase and an inactivated target-specific endonuclease that act specifically at the on-target site only, without off-target effects.
For the purposes of the present invention, the cytidine deaminase and the inactivated target-specific endonuclease can be active both in vivo and in vitro. They can therefore be used in vitro to detect off-target sites in DNA (e.g., genomic DNA), and when applied in vivo they can be expected to be active at the same positions as the detected off-target sites (genomic positions (sites) containing the off-target sequences).
Step (ii) is a step of analyzing the nucleic acid sequence of the DNA fragments cleaved (single-strand cleaved) in step (i), and may be performed by any conventional nucleic acid sequencing method. For example, when the isolated DNA used in step (i) is genomic DNA, the nucleic acid sequence analysis may be performed by whole genome sequencing. Unlike indirect methods that search for sequences homologous to the target-site sequence and predict them to be off-target sites, whole genome sequencing detects, at the whole-genome level, the off-target sites that are actually cleaved by the target-specific nuclease, and can therefore detect off-target sites more accurately. As used herein, "whole genome sequencing (WGS)" means a method of reading the whole genome by next generation sequencing at multiple-fold coverage, such as 10X, 20X, or 40X. "Next generation sequencing" means a technique in which the whole genome is fragmented in a chip-based and PCR-based paired-end format and the fragments are sequenced at ultra-high speed on the basis of a chemical reaction (hybridization).
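To make the "10X, 20X, 40X" coverage figures concrete, the following is a minimal arithmetic sketch. It assumes the usual definition of mean coverage (total sequenced bases divided by genome size); the function name and the read count, read length, and genome size in the example are illustrative assumptions, not values from this specification.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 600 million reads of 150 nt on a genome of about 3.0 Gb.
depth = mean_coverage(600_000_000, 150, 3_000_000_000)
print(f"{depth:.0f}X")
```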
Step (iii) is a step of identifying (or determining) the positions at which the DNA was cleaved in the base sequence data (sequence reads) obtained in step (ii); by analyzing the sequencing data, on-target sites and off-target sites can be detected easily. Determining the specific positions at which the DNA was cleaved from the sequence data can be performed by various approaches, and the present specification provides several reasonable methods for determining such positions. However, these are merely examples falling within the technical idea of the present invention, and the scope of the present invention is not limited by these methods.
For example, as one way of determining the cleaved positions, when the sequence data obtained through whole genome sequencing are aligned according to their positions in the genome, a position at which the 5' ends are vertically aligned may indicate a position at which the DNA was cleaved. The step of aligning the sequence data according to genomic position may be performed using an analysis program (e.g., BWA/GATK or ISAAC). As used herein, the term "vertical alignment" refers to an arrangement in which, when whole genome sequencing results are analyzed with a program such as BWA/GATK or ISAAC, the 5' ends of two or more sequence reads begin at the same nucleotide position in the genome for each of the adjacent Watson strand and Crick strand. This arises because the DNA fragments that were cleaved in step (i), and thus share the same 5' end, are each sequenced.
That is, when cleavage in step (i) occurs at on-target and off-target sites, aligning the sequence data causes the commonly cleaved sites to be vertically aligned, because each read there begins with its 5' end at that position, whereas at uncleaved sites no such common 5' end exists and the reads are arranged in a staggered manner. A vertically aligned position can therefore be regarded as a site cleaved in step (i), which in turn may mean an on-target or off-target site cleaved by the inactivated target-specific endonuclease.
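The idea of vertical alignment can be sketched as a simple pile-up of read 5' ends. The following is only an illustrative sketch, not the analysis pipeline of this specification: it assumes the reads have already been mapped (e.g., by BWA or ISAAC) and reduced to hypothetical `(start, end, strand)` records, and the names `Read`, `five_prime_pileup`, and `vertically_aligned` are invented for the example.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    start: int   # leftmost aligned genomic position (1-based, inclusive)
    end: int     # rightmost aligned genomic position (1-based, inclusive)
    strand: str  # "+" for the Watson strand, "-" for the Crick strand


def five_prime_position(read: Read) -> int:
    """Genomic coordinate of the read's 5' end (start for "+", end for "-")."""
    return read.start if read.strand == "+" else read.end


def five_prime_pileup(reads: Iterable[Read]) -> Counter:
    """Count how many reads share each (strand, 5'-end position)."""
    return Counter((r.strand, five_prime_position(r)) for r in reads)


def vertically_aligned(reads: Iterable[Read], min_reads: int = 2) -> dict:
    """Positions where at least `min_reads` reads start at the same 5' end."""
    return {k: n for k, n in five_prime_pileup(reads).items() if n >= min_reads}


reads = [
    Read(100, 249, "+"), Read(100, 230, "+"), Read(100, 260, "+"),  # shared 5' end
    Read(37, 180, "+"), Read(61, 205, "+"),                          # staggered
    Read(10, 99, "-"), Read(25, 140, "-"),                           # staggered
]
print(vertically_aligned(reads))
```

In this toy input only position 100 on the "+" strand is reported, because three reads begin there while every other 5' end occurs once.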
The term "alignment" means mapping the sequence data to a reference genome and then arranging bases having the same genomic position at their corresponding positions. Accordingly, any computer program capable of aligning sequence data in this manner may be used, whether it is a program already known in the art or one written for the purpose. In one example, alignment was performed using ISAAC, but the program is not limited thereto.
From the alignment result, the positions at which the DNA was cleaved by the deaminase and the inactivated target-specific endonuclease can be determined by a method such as finding positions where the 5' ends are vertically aligned as described above, and a cleaved position that is not the on-target site can be judged to be an off-target site. In other words, a sequence identical to the base sequence designed as the target of the deaminase and the inactivated target-specific endonuclease is the on-target site, and a sequence not identical to that base sequence can be regarded as an off-target site. This is self-evident from the definition of the off-target site given above.
The method (e.g., the method for determining base editing efficiency at the on-target site or for identifying off-target sites) may further comprise, after step (iii), a step (step (iv)) of identifying (judging) the cleavage position as an off-target site when it is not the on-target site.
In a DNA fragment cleaved by the base editor (the deaminase and the inactivated target-specific endonuclease), the 5' ends of the cleaved strand are vertically aligned. The number of cleavage positions can be determined from the number of DNA reads whose 5' ends are vertically aligned (DNA read: as used herein, a DNA fragment, or a population of such DNA fragments, whose 5' ends are vertically aligned and which have the same nucleic acid sequence). For example, when the number of DNA reads whose 5' ends are vertically aligned is 1, it can be confirmed that the base editor cleaved at only one position, namely the on-target site. When the number of DNA reads whose 5' ends are vertically aligned is 2 or more, for example, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more, it can be confirmed that cleavage occurred at two or more positions, which means that DNA cleavage occurred at positions other than the on-target site (off-target sites). In addition, among the DNA reads whose 5' ends are vertically aligned, those that are not at the on-target site (i.e., those having a nucleic acid sequence different from that of the on-target site) can be identified (or determined) as off-target sites.
Accordingly, the step of identifying single-strand cleavage positions in step (iii) may comprise (a) identifying (or measuring) the number of DNA reads whose 5' ends are vertically aligned. In this case, when the number of DNA reads whose 5' ends are vertically aligned is 2 or more, for example, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more, it can be confirmed (or determined) that DNA cleavage occurred at a position other than the on-target site (an off-target site). Also in this case, the step of identifying off-target sites in step (iv) may comprise (iv-1) identifying (or determining), as an off-target site, a DNA read that has a nucleic acid sequence different from that of the on-target site among the two or more DNA reads whose 5' ends are vertically aligned.
In addition, the accuracy of off-target site identification can be further increased by checking whether the off-target site comprises a PAM sequence (more specifically, whether, among the cleaved DNA fragments, the strand complementary to a DNA read that has vertically aligned 5' ends and a nucleic acid sequence different from that of the on-target site comprises a PAM sequence), thereby excluding positions that were cleaved by error rather than by target-specific cleavage by the target-specific endonuclease included in the base editor. Accordingly, the step of identifying single-strand cleavage positions in step (iii) may further comprise (b) checking whether the off-target site comprises a PAM sequence, for example, checking whether, among the cleaved DNA fragments, the strand complementary to a DNA read that has vertically aligned 5' ends and a nucleic acid sequence different from that of the on-target site comprises a PAM sequence specific to the target-specific endonuclease included in the base editor. In this case, the step of identifying off-target sites in step (iv) may comprise (iv-2) identifying (or determining) the position as an off-target site when that complementary strand comprises a PAM sequence specific to the target-specific endonuclease included in the base editor.
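The PAM check of step (b) can be illustrated with a short sketch. It assumes an SpCas9-type PAM (5'-NGG-3') located at the 3' end of the site on the PAM-containing strand, and it assumes the candidate site has already been extracted as a string over the cleaved (read) strand; the helper names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_ngg_pam(site: str) -> bool:
    """True if the last three nucleotides of `site` are 5'-NGG-3'."""
    return re.fullmatch(r"[ACGT]GG", site[-3:].upper()) is not None


def pam_containing_strand(cleaved_strand_seq: str):
    """Return the complementary strand if it ends in an NGG PAM, else None."""
    complementary = reverse_complement(cleaved_strand_seq)
    return complementary if has_ngg_pam(complementary) else None


# Illustration with the EMX1 on-target sequence (PAM-containing strand).
emx1 = "GAGTCCGAGCAGAAGAAGAAGGG"
cleaved_strand = reverse_complement(emx1)  # the strand carrying the nick
print(pam_containing_strand(cleaved_strand))
```

A candidate whose complementary strand lacks the PAM returns `None` and could be excluded as a position not cleaved in a target-specific manner.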
In addition, the off-target site may consist of a sequence having homology with the sequence of the on-target site. More specifically, because the on-target sequence is expressed as the nucleic acid sequence of the strand containing the PAM sequence, the off-target site may be one in which, among the cleaved DNA fragments, the nucleic acid sequence of the strand complementary to a DNA read that has vertically aligned 5' ends and a nucleic acid sequence different from that of the on-target site has one or more nucleotide mismatches with the on-target site, more specifically 15 or fewer, or 10 or fewer, for example, 1 to 15, 1 to 14, 1 to 13, 1 to 12, 1 to 11, 1 to 10, 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 1 to 4, 1 to 3, or 1 to 2 nucleotide mismatches with the on-target site (target sequence).
Accordingly, the step of identifying single-strand cleavage positions in step (iii) may further comprise (c) identifying (or measuring) the number of nucleotides mismatched with the on-target sequence in the strand complementary to a DNA read that, among the cleaved DNA fragments, has vertically aligned 5' ends and a nucleic acid sequence different from that of the on-target site. In this case, when the number of mismatched nucleotides is 15 or fewer, or 10 or fewer, for example, 1 to 15, 1 to 14, 1 to 13, 1 to 12, 1 to 11, 1 to 10, 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 1 to 4, 1 to 3, or 1 to 2, it can be confirmed (or determined) that the DNA read reflects DNA cleavage at a position other than the on-target site (an off-target site). Also in this case, the step of identifying off-target sites in step (iv) may comprise (iv-3) identifying (or determining) the position as an off-target site when that complementary strand has 15 or fewer, or 10 or fewer, for example, 1 to 15, 1 to 14, 1 to 13, 1 to 12, 1 to 11, 1 to 10, 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 1 to 4, 1 to 3, or 1 to 2, mismatched nucleotides relative to the on-target sequence.
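Step (c) amounts to counting position-by-position differences between a candidate sequence and the on-target sequence. The sketch below assumes both sequences are given on the PAM-containing strand and have equal length (no insertions or deletions); the function names and the default limit of 10 are illustrative choices, and the text equally allows a limit of 15.

```python
def count_mismatches(candidate: str, target: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(candidate) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(candidate.upper(), target.upper()))


def within_mismatch_limit(candidate: str, target: str, max_mismatches: int = 10) -> bool:
    """True if the candidate has at least 1 and at most `max_mismatches` mismatches."""
    return 1 <= count_mismatches(candidate, target) <= max_mismatches


target = "GAGTCCGAGCAGAAGAAGAA"     # EMX1 target sequence without the PAM
candidate = "GAGTCTAAGCAGAAGAAGAA"  # hypothetical candidate with two changes
print(count_mismatches(candidate, target))
print(within_mismatch_limit(candidate, target))
```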
Step (iii) may comprise one or more of steps (a), (b), and (c) (e.g., step (a) and one or more of steps (b) and (c)); when it comprises two or more of steps (a), (b), and (c), they may be performed simultaneously or sequentially in any order. Likewise, step (iv) may comprise one or more of steps (iv-1), (iv-2), and (iv-3) (e.g., step (iv-1) and one or more of steps (iv-2) and (iv-3)); when it comprises two or more of steps (iv-1), (iv-2), and (iv-3), they may be performed simultaneously or sequentially in any order.
The step (iii-1) of checking for base editing (e.g., conversion of cytosine (C) to uracil (U) or thymine (T)) may comprise checking (measuring) whether, among the cleaved DNA fragments, the strand complementary to a DNA read that has vertically aligned 5' ends and a nucleic acid sequence different from that of the on-target site comprises base editing (conversion of one or more cytosines (C) to uracil (U) or thymine (T)). In this case, the step of identifying off-target sites in step (iv) may comprise (iv-4) identifying (or determining) the position as an off-target site when that complementary strand comprises base editing (conversion of one or more cytosines (C) to uracil (U) or thymine (T)).
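Checking for base editing can be sketched as a comparison between the reference sequence and the observed sequence of the same strand. This is a simplified illustration that assumes an ungapped, equal-length comparison on the strand where deamination is expected; the function name and the example read are hypothetical.

```python
def c_to_t_conversions(reference: str, observed: str) -> list[int]:
    """0-based positions where the reference has C and the observed read has T or U."""
    if len(reference) != len(observed):
        raise ValueError("sequences must have the same length")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference.upper(), observed.upper()))
        if ref == "C" and obs in ("T", "U")
    ]


reference = "GAGTCCGAGCAGAAGAAGAA"  # EMX1 target sequence without the PAM
observed = "GAGTTTGAGCAGAAGAAGAA"   # hypothetical read with two edited cytosines
positions = c_to_t_conversions(reference, observed)
print(positions, bool(positions))
```

A non-empty result indicates that at least one cytosine was converted, which is the condition used in step (iv-4).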
In one example, genomic DNA is single-strand cleaved by performing step (i), whole genome analysis (step (ii)) is performed, and the reads are aligned with ISAAC to identify the pattern of vertical alignment at cleaved positions and staggered alignment at uncleaved positions; when this is represented as a 5' end plot, a distinctive pattern may appear at the cleavage sites.
Furthermore, as a specific but non-limiting example, a position at which two or more sequence reads corresponding to each of the Watson strand and the Crick strand are vertically aligned may be judged to be an off-target site. In addition, a position at which 20% or more of the sequence reads are vertically aligned and the number of sequence reads having the same 5' end is 10 or more on each of the Watson strand and the Crick strand may be judged to be an off-target site, that is, a cleaved position.
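The two example criteria above can be written as simple threshold tests. The sketch below is one possible reading, not the definitive rule: it assumes that "20% or more of the sequence reads" refers to the fraction of reads covering the position that share the same 5' end, and the function and parameter names are invented for illustration.

```python
def meets_basic_criterion(watson_n: int, crick_n: int, min_reads: int = 2) -> bool:
    """At least `min_reads` vertically aligned reads on each strand."""
    return watson_n >= min_reads and crick_n >= min_reads


def meets_strict_criterion(
    watson_n: int,
    crick_n: int,
    depth: int,
    min_reads: int = 10,
    min_fraction: float = 0.20,
) -> bool:
    """At least `min_reads` reads per strand share a 5' end, and the aligned
    reads make up at least `min_fraction` of the reads covering the position
    (`depth`); this interpretation of the 20% rule is an assumption."""
    if depth <= 0:
        return False
    fraction = (watson_n + crick_n) / depth
    return watson_n >= min_reads and crick_n >= min_reads and fraction >= min_fraction


print(meets_basic_criterion(3, 2))
print(meets_strict_criterion(12, 11, depth=60))
```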
In the above method, steps (ii) and (iii) may be performed by Digenome-seq (digested-genome sequencing), details of which are described in Korean Patent Publication No. 10-2016-0058703 (the document is incorporated herein by reference).
By the method described above, it is possible to identify (or measure or detect) the base editing positions and/or single-strand cleavage positions of the deaminase; the base editing efficiency or target specificity at the on-target site (i.e., [frequency of base editing or cleavage at the on-target site] / [total frequency of base editing or cleavage]); and/or off-target sites (positions identified as base editing positions of the deaminase that are not the on-target site).
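The target specificity ratio defined above can be computed directly. In this minimal sketch the total frequency is taken to be the on-target frequency plus the sum of the frequencies at all identified off-target sites, which is an assumption about how "total" is tallied; the function name and the numbers are illustrative only.

```python
from typing import Sequence


def target_specificity(on_target_freq: float, off_target_freqs: Sequence[float]) -> float:
    """[on-target frequency] / [on-target frequency + sum of off-target frequencies]."""
    total = on_target_freq + sum(off_target_freqs)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    return on_target_freq / total


# Hypothetical frequencies (in %): one on-target site and three off-target sites.
print(round(target_specificity(50.0, [5.0, 3.0, 2.0]), 3))
```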
The off-target site identification (detection) may be performed in vitro by treating genomic DNA with the deaminase and the inactivated target-specific endonuclease. It can then be checked whether off-target effects actually occur in vivo as well at the off-target sites identified (detected) by the method. However, this is merely an additional verification process; it is not a step necessarily entailed in the scope of the present invention, but only a step that may additionally be performed as needed.
As used herein, the term "off-target effect" may refer to the level at which base editing and/or double-strand cleavage occurs at an off-target site. The term "indel (insertion and/or deletion)" collectively refers to mutations in which some bases are inserted into and/or deleted from the base sequence of DNA.
【Effects of the Invention】
The method of DNA single-strand cleavage using cytidine deaminase and the nucleic acid sequence analysis technique using it, as provided herein, make it possible to identify the base editing positions, target specificity, and/or off-target sites of cytidine deaminase more accurately and efficiently.
【Brief Description of Drawings】
FIG. 1 is a representative IGV image showing C→U conversion and straight alignment at the on-target site of EMX1.
FIG. 2 shows the number of nicked sites obtained by Digenome-seq that have uniform alignment of sequence reads on only one strand, and the number of PAM-containing sites among them that have 10 or fewer mismatches.
FIG. 3a is a cleavage map of the rAPOBEC1-XTEN-dCas9-NLS vector.
FIG. 3b is a cleavage map of the rAPOBEC1-XTEN-dCas9-UGI-NLS vector.
FIG. 3c is a cleavage map of the rAPOBEC1-XTEN-Cas9n-UGI-NLS vector.
FIG. 4 is a cleavage map of the Cas9 expression plasmid.
FIG. 5 is a cleavage map of the pET28b-BE1 vector.
FIG. 6 is a cleavage map of the pET28b-BE3 delta UGI vector.
【Specific Details for Carrying Out the Invention】
Hereinafter, the present invention will be described in more detail with reference to the following examples. However, these examples are intended only to illustrate the present invention, and the scope of the present invention is not limited by them.
[Reference Example]
1. Cell Culture and Transfection
HEK293T cells (ATCC CRL-11268) were maintained in DMEM (Dulbecco's Modified Eagle Medium) supplemented with 10% (w/v) FBS and 1% (w/v) penicillin/streptomycin (Welgene). HEK293T cells (1.5 × 10^5) were seeded into 24-well plates and, at ~80% confluency, transfected using Lipofectamine 2000 (Invitrogen) with an sgRNA plasmid (500 ng) and either a Base Editor plasmid (1.5 μg) (Addgene plasmid #73019 (expresses BE1 with C-terminal NLS in mammalian cells; rAPOBEC1-XTEN-dCas9-NLS; FIG. 3a), #73020 (expresses BE2 in mammalian cells; rAPOBEC1-XTEN-dCas9-UGI-NLS; FIG. 3b), or #73021 (expresses BE3 in mammalian cells; rAPOBEC1-XTEN-Cas9n-UGI-NLS; FIG. 3c)) or a Cas9 expression plasmid (Addgene plasmid #43945; FIG. 4). Genomic DNA was isolated 72 hours after transfection using the DNeasy Blood & Tissue Kit (Qiagen). The cells were not tested for mycoplasma contamination.
The sgRNA used in the examples below was prepared by taking, as the targeting sequence '(N_cas9)_l' of General Formula 3 below, the sequence obtained from the target site sequence (target sequence; EMX1 on-target sequence; GAGTCCGAGCAGAAGAAGAAGGG (SEQ ID NO: 14)) by excluding the PAM sequence at its 3' end (5'-NGG-3', where N is A, T, G, or C) and replacing T with U: 5'-(N_cas9)_l-(GUUUUAGAGCUA; SEQ ID NO: 1)-(GAAA)-(SEQ ID NO: 3)-3' (General Formula 3; oligonucleotide linker: GAAA).
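The derivation of the targeting sequence described above (remove the PAM, then replace T with U) can be sketched as follows. The sketch assumes a 3-nt PAM at the 3' end of the target site sequence and uses the EMX1 on-target sequence as input; the function name is hypothetical, and the remainder of the sgRNA scaffold is deliberately left out.

```python
def targeting_sequence(target_site_dna: str, pam_length: int = 3) -> str:
    """Drop the 3'-terminal PAM and convert the remaining DNA sequence to RNA."""
    site = target_site_dna.upper()
    if len(site) <= pam_length:
        raise ValueError("target site must be longer than the PAM")
    protospacer = site[:-pam_length]      # target sequence without the PAM
    return protospacer.replace("T", "U")  # RNA alphabet: T becomes U


emx1_on_target = "GAGTCCGAGCAGAAGAAGAAGGG"
print(targeting_sequence(emx1_on_target))
```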
2. Protein Purification
A plasmid encoding the His6-rAPOBEC1-XTEN-dCas9 protein (pET28b-BE1; expresses BE1 with N-terminal His6 tag in E. coli; FIG. 5) was provided by David Liu (Addgene plasmid #73018). A plasmid (pET28b-BE3 delta UGI; FIG. 6) encoding the His6-rAPOBEC1-nCas9 protein (BE3 delta UGI; a BE3 variant lacking the UGI domain) was constructed by substituting H840 for A840 of dCas9 in pET28b-BE1 using site-directed mutagenesis.
Rosetta expression cells (Novagen, catalog number: 70954-3CN) were transformed with the pET28b-BE1 or pET28b-BE3 delta UGI prepared above and cultured overnight at 37°C in Luria-Bertani (LB) broth containing 100 μg/ml kanamycin and 50 mg/ml carbenicillin. Ten ml of the overnight culture of Rosetta cells containing pET28b-BE1 or pET28b-BE3 delta UGI was inoculated into 400 ml of LB broth containing 100 μg/ml kanamycin and 50 mg/ml carbenicillin and cultured at 30°C until the OD600 reached 0.5-0.6. The cultured cells were cooled to 16°C for 1 hour, supplemented with 0.5 mM IPTG (isopropyl β-D-1-thiogalactopyranoside), and cultured for a further 14-18 hours.
For protein purification, cells were harvested by centrifugation at 5,000 × g for 10 minutes at 4°C and lysed by sonication in 5 ml of lysis buffer (50 mM NaH2PO4, 300 mM NaCl, 1 mM DTT, and 10 mM imidazole, pH 8.0) supplemented with lysozyme (Sigma) and protease inhibitor (Roche complete, EDTA-free). The resulting cell lysate was centrifuged at 13,000 rpm for 30 minutes at 4°C, and the soluble lysate obtained was incubated with Ni-NTA agarose resin (Qiagen) for 1 hour at 4°C. The cell lysate/Ni-NTA mixture was applied to a column and washed with buffer (50 mM NaH2PO4, 300 mM NaCl, and 20 mM imidazole, pH 8.0). The BE3 protein was eluted with elution buffer (50 mM NaH2PO4, 300 mM NaCl, and 250 mM imidazole, pH 8.0). The eluted protein was buffer-exchanged into storage buffer (20 mM HEPES-KOH (pH 7.5), 150 mM KCl, 1 mM DTT, and 20% glycerol) and concentrated using a centrifugal filter unit (Millipore) to obtain purified rAPOBEC1-nCas9 protein.
3. Deamination of Genomic DNA
Genomic DNA was purified (extracted) from HEK293T cells using the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's instructions. The genomic DNA (10 μg) was incubated with the rAPOBEC1-nCas9 protein purified in Reference Example 2 (300 nM) and sgRNA (900 nM) in a reaction volume of 500 μl for 8 hours at 37 °C in buffer (100 mM NaCl, 40 mM Tris-HCl, 10 mM MgCl2, and 100 μg/ml BSA, pH 7.9).
The sgRNA used above was prepared by taking the target site sequence (target sequence; EMX1 on-target sequence: GAGTCCGAGCAGAAGAAGAAGGG (SEQ ID NO: 14)) without the PAM sequence at its 3' end (5'-NGG-3', where N is A, T, G, or C), replacing T with U, and using the result as the targeting sequence (Ncas9)l of General Formula 3 below:
5'-(Ncas9)l-(GUUUUAGAGCUA)-(GAAA)-
(General Formula 3; oligonucleotide linker: GAAA).
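The derivation of the targeting sequence can be sketched as follows. This is a minimal illustration, not part of the disclosed method: the function name `targeting_sequence` and its `pam_length` parameter are hypothetical, and the sketch assumes the target site is written 5'→3' with the three-base PAM (GGG for the EMX1 site) as its last bases.

```python
def targeting_sequence(target_site: str, pam_length: int = 3) -> str:
    """Return the RNA targeting sequence for a DNA target site.

    Assumes the site is written 5'->3' and ends with the PAM.
    """
    site = target_site.upper()
    protospacer = site[:-pam_length]      # drop the PAM (e.g. NGG)
    return protospacer.replace("T", "U")  # write the DNA sequence as RNA


EMX1_ON_TARGET = "GAGTCCGAGCAGAAGAAGAAGGG"  # EMX1 on-target site (SEQ ID NO: 14)
print(targeting_sequence(EMX1_ON_TARGET))   # GAGUCCGAGCAGAAGAAGAA
```

The 20-nt result would occupy the (Ncas9)l position of the general formula.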
After the sgRNA was removed using RNase A (50 μg/mL), the uracil-containing genomic DNA was purified with the DNeasy Blood & Tissue Kit (Qiagen). The target site was amplified by PCR using SUN-PCR blend and subjected to Sanger sequencing to confirm BE3-mediated cytosine deamination and DNA cleavage.
4. Whole Genome and Digenome Sequencing
Genomic DNA (1 μg) was fragmented to the 400-500 bp range using the Covaris system (Life Technologies) and blunt-ended using End Repair Mix (Thermo Fisher). The fragmented DNA was ligated with adapters to generate libraries, which were then subjected to whole genome sequencing (WGS) on a HiSeq X Ten Sequencer (Illumina) at Macrogen (Kim, D., Kim, S., Kim, S., Park, J. & Kim, J.S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Research 26, 406-415 (2016)).
5. Targeted deep sequencing
To generate deep sequencing libraries, the on-target and potential off-target sites were amplified with the KAPA HiFi HotStart PCR kit (KAPA Biosystems #KK2501). Pooled PCR amplicons were sequenced using a MiniSeq (Illumina) with the TruSeq HT Dual Index system (Illumina) or an Illumina MiSeq (LAS Inc., Korea). The primers used for targeted deep sequencing are as follows:
EMX1
On-target sequence: GAGTCCGAGCAGAAGAAGAAGGG (SEQ ID NO: 14)
1st PCR
Forward (5'→3'): AGTGTTGAGGCCCCAGTG (SEQ ID NO: 15);
Reverse (5'→3'): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGCAGCAAGCAGCACTCT (SEQ ID NO: 16);
2nd PCR
Forward (5'→3'): ACACTCTTTCCCTACACGACGCTCTTCCGATCTGGGCCTCCTGAGTTTCTCAT (SEQ ID NO: 17);
Reverse (5'→3'): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGCAGCAAGCAGCACTCT (SEQ ID NO: 18).
Example 1. Identification of BE3 off-target sites using Digenome-seq
Human genomic DNA was treated in vitro with a ribonucleoprotein formed from EMX1-specific sgRNA (see Reference Example 3; on-target sequence: SEQ ID NO: 14) and rAPOBEC1-nCas9 protein (BE3; purified in Reference Example 2) to induce C→U conversion on one strand and a nick on the other strand at on-target and off-target sites, and Digenome-seq was then performed as described in Reference Example 4. In this example, uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII were not used. After end repair and adaptor ligation, the BE3-treated genomic DNA was subjected to whole genome sequencing (WGS).
The process is shown schematically below:
5'-GAGTCCGAGCAGAAGAAGAAGGG-3'
3'-CTCAGGCTCGTCTTCTTCTTCCC-5'
    ↓ rAPOBEC1-nCas9
5'-GAGTUUGAGCAGAAGAAGAAGGG-3'
3'-CTCAGGCTCGTCTTCTT CTTCCC-5' (nicked strand)
    ↓ End repair (3'-to-5' trimming by exonuclease; 5'-to-3' fill-in by DNA polymerase)
[Figure imgf000031_0001]
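The two events in the scheme above can be mimicked with a short sketch. It is an idealization under stated assumptions, not the disclosed procedure: the function `simulate_be3` is hypothetical, every cytosine at positions 4 to 8 of the PAM-containing strand is assumed to be converted, and the nick on the opposite strand is assumed to fall 3 bp upstream of the PAM.

```python
def simulate_be3(site: str, window=(4, 8), pam_length: int = 3):
    """Idealized BE3 action on the PAM-containing strand of a target site.

    Returns the strand with every C inside the window written as U, and the
    1-based position after which the opposite strand is assumed to be nicked.
    """
    seq = site.upper()
    first, last = window
    edited = "".join(
        "U" if base == "C" and first <= pos <= last else base
        for pos, base in enumerate(seq, start=1)
    )
    # Assumption: the nick lies 3 bp upstream of the PAM on the other strand.
    nick_after = len(seq) - pam_length - 3
    return edited, nick_after


edited, nick_after = simulate_be3("GAGTCCGAGCAGAAGAAGAAGGG")
print(edited)      # GAGTUUGAGCAGAAGAAGAAGGG
print(nick_after)  # 17
```

For the EMX1 on-target site this reproduces the deaminated strand shown in the scheme, with the break after the 17th base pair.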
On-target and off-target sites showing uniform alignment of sequence reads on one strand and C→U conversion on the other strand were identified computationally.
FIG. 1 is a representative IGV image showing C→U conversion and straight alignment at the EMX1 on-target site.
FIG. 2 shows the number of nicked sites (sites at which the 5' ends of sequence reads are vertically aligned) having uniform alignment of sequence reads on only one strand in the Digenome-seq results, and the number of PAM-containing sites among them with 10 or fewer mismatches. Groups A and B show the numbers of sites that were identified by the absolute number (n ≥ 5 or 10) and relative number (10% or 20%, respectively) of sequence reads with the same 5' end and that are homologous to the target sequence.
All sites associated with a uniform alignment pattern were listed by counting, across the human genome, the absolute number (n ≥ 5 or 10) and relative number (10% or 20%, respectively) of sequence reads with the same 5' end. As shown in FIG. 2, this yielded 90,496 or 1,807 such sites, respectively. Of the sites with a single-strand break (nick), 34 (Group A) or 142 (Group B, which includes Group A) had a PAM (5'-NGN-3' or 5'-NNG-3') 3 base pairs downstream of the nick position and were homologous to the EMX1 target sequence, with 10 or fewer mismatches.
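The counting criterion described above can be illustrated with a minimal sketch. The function names, the read tuples, and the depth value are hypothetical and chosen only for illustration; the actual Digenome-seq pipeline is not reproduced here. Only the two cutoff pairs (n ≥ 5 with 10%, n ≥ 10 with 20%) come from the text.

```python
from collections import Counter


def five_prime_end_counts(read_starts):
    """Count reads sharing the same 5' end.

    `read_starts` holds one (chromosome, position, strand) tuple per read.
    """
    return Counter(read_starts)


def passes_cutoff(count: int, depth: int, min_count: int, min_ratio: float) -> bool:
    """True if enough reads share one 5' end, in absolute and relative terms."""
    return depth > 0 and count >= min_count and count / depth >= min_ratio


# Hypothetical reads: six begin at the same position on the same strand.
reads = [("chrN", 1000, "+")] * 6 + [("chrN", 970, "+"), ("chrN", 1040, "-")]
counts = five_prime_end_counts(reads)
count = counts[("chrN", 1000, "+")]
depth = 59  # hypothetical sequencing depth at this position

print(count)                                   # 6
print(passes_cutoff(count, depth, 5, 0.10))    # True
print(passes_cutoff(count, depth, 10, 0.20))   # False
```

A site such as this one would pass the looser cutoff pair but not the stricter one.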
The Cas9-induced indel frequencies and BE3-induced substitution frequencies at the BE3 off-target sites for EMX1 identified by Digenome-seq were measured in HEK293T cells by targeted deep sequencing (see Reference Example 5). WGS data obtained using intact genomic DNA or rAPOBEC1-nCas9-treated genomic DNA were examined to determine whether C→T conversion occurred at each of these sites.
The DNA sequences nicked by BE3 for EMX1 (1 on-target site + 141 off-target sites = 142 in total) are summarized in Table 1 below (in Table 1, bases mismatched with the on-target sequence are shown in lowercase).
[Table 1]
ID        Chr    Position   DNA sequence at nicked site
EMX1-001  chr2   73160998   GAGTCCGAGCAGAAGAAGAAGGG  (on-target; SEQ ID NO: 14)
EMX1-002  chr4   131662222  GAaTCCaAG-AGAAGAAGAATGG
EMX1-003  chr2   219845072  GAGgCCGAGCAGAAGAAagACGG
EMX1-004  chr11  62365273   GAaTCCaAGCAGAAGAAGAgAAG
EMX1-005  chr8   128801258  GAGTCCtAGCAGgAGAAGAAGAG
EMX1-006  chr15  44109763   GAGTCtaAGCAGAAGAAGAAGAG
EMX1-007  chr19  24250503   GAGTCCaAGCAGtAGAgGAAGGG
EMX1-008  chr6   9118799    acGTCtGAGCAGAAGAAGAATGG
EMX1-009  chr5   9227162    aAGTCtGAGCAcAAGAAGAATGG
EMX1-010  chr1   4515013    GtGTCCtAG-AGAAGAAGAAGGG
EMX1-011  chr5   45359067   GAGTtaGAGCAGAAGAAGAAAGG
EMX1-012  chr13  96928092   GAGaCaGAG-AGAAGAAGAATGG
EMX1-013  chr18  34906762   GAGcCtGAGCgGAAGAgGAAAGG
EMX1-014  chr1   184236243  aAtaCaGAGCAGAAGAAGAATGG
EMX1-015  chr18  1677040    agtcCaGAGCAaAAtAAGAAGGG
EMX1-016  chr1   33606480   GAGcCtGAGCAGAAGgAGAAGGG
EMX1-017  chr3   111296327  GAagaaGAGCAaAAGAAGAAGGG
EMX1-018  chr22  34716275   GtGaCaGAGCAaAAGAAGAAAGG
EMX1-019  chr3   37781974   GAagagGAGCAaAAGAAGAAGGG
EMX1-020  chr20  6653999    aAGTCCagaCAGAAGAAGAAGGA
EMX1-021  chr16  78848850   aAaTCCaAcCAGAAGAAGAAAGG
EMX1-022  chr6   92449690   GttcaaGAGCAGgAGAAGAAGGG
EMX1-023  chr4   87256692   GAGTaaGAGaAGAAGAAGAAGGG
EMX1-024  chr11  43747948   aAGcCCGAGCAaAgGAAGAAAGG
EMX1-025  chr5   160643032  cctataGAGCAaAAGAAGAAAGG
EMX1-026  chr11  120873098  GAtcaaGAGaAGAAGAAGAAGGG
EMX1-027  chr5   62692054   cAaaaaGAGCAaAAGAAGAACGG
EMX1-028  chrX   3077291    tAcagtGAGCAaAAGAAGAAGGG
EMX1-029  chr14  98236084   GttcaaGAGCAGgAGAAGAAGGG
EMX1-030  chr2   205473563  ttcTCaGAGCAaAAGAAGAATGG
EMX1-031  chr3   189633259  cttTGCcAGGAGAAGgAcAtTGC
EMX1-032  chr10  58498683   agGTtaGAGCAaAAGAAGAAAGG
EMX1-033  chr1   35818892   tAtaCgGAGCAGAAGAAG TGG
EMX1-034  chr3   45605387   GAGTCCacaCAGAAGAAGAAAGA
EMX1-035  chr3   5031614    GAaTCCaAGCAGgAGAAGAAGGA
EMX1-036  chr12  106646090  aAGTCCatGCAGAAGAgGAAGGG
EMX1-037  chr1   23720618   aAGTCCGAGgAGAgGAAGAAAGG
EMX1-038  chr11  107812992  aAGTCCaAGt-GAAGAAGAAAGG
EMX1-039  chr4   169444372  GAGaaCGAGaAGAAagAGgAGAG
EMX1-040  chr6   18327737   GAGagaGAGagagAGAgGgAGGG
EMX1-041  chr2   230161576  ctGgCaGAGCAaAAGAAGAgGGG
EMX1-042  chr3   95690186   tcaTCCaAGCAGAAGAAGAAGAG
EMX1-043  chr4   33321466   GtacagGAGCAGgAGAAGAATGG
EMX1-044  chr22  49900715   aAGaagGAGaAGAAGAAGAAGGG
EMX1-045  chr12  94591214   GAGagaGAGagagAGAgaAAGGG
EMX1-046  chr5   146833190  GAGcCgGAGCAGAAGAAGgAGGG
EMX1-047  chr6   111509461  GAGggaGAGagGgAGAgagAAAG
EMX1-048  chr1   26490139   ttaTCtccGagaAgGAAGAAGGG
EMX1-049  chr6   31265461   GAtTCtGtcCcGAAtcAGAAGGG
EMX1-050  chr14  30099303   atGcaaGAGaAGAAGAAGAAAGG
EMX1-051  chr3   83057859   agcaggGAGCAGAgGAAGAATGG
EMX1-052  chr15  35575311   GAGaagGAGaAGAAGAAGAAGGG
EMX1-053  chr1   55846672   actctaGAGCAGAAaAAGAATGG
EMX1-054  chr6   104384459  GAGgagGAGgAGgAGgAaggAGG
EMX1-055  chr19  9975831    aAagagGAGaAGAAGAAGAAGGG
EMX1-056  chr12  99525769   GgGgagGAGCAGAAGAAGAgAGG
EMX1-057  chr6   162280006  agGcCgagGCAGgAGAAtAgGAG
EMX1-058  chr7   85359110   GAGaagGAGCAGAAaAAGAATGG
EMX1-059  chr2   10462867   acagtaGAGCAGAAGAAGAcTGG
EMX1-060  chr3   18195303   atccaaGAGCAGgAGAAGAAGGG
EMX1-061  chr2   57855994   ataagaGAGCAaAAGAAGAAAGG
EMX1-062  chr6   33957284   GAGagaGAGagagAGAgaAACGG
EMX1-063  chr22  37474903   GAGaagGAGaAGAAGgAGAAGAG
EMX1-064  chr8   141193983  aAGaagaAGaAGAAGAAGAAGAG
EMX1-065  chr1   110038435  tttcggGAGCAGAAGAAGAACAG
EMX1-066  chr4   117483357  atcaCaGAGCAGgAGAAGAAGGG
EMX1-067  chr4   6150362    aAacagGAGCAGAgGAAGAAGGG
EMX1-068  chr2   116142148  aAGaagagGaAGAgGAgGAAAAG
EMX1-069  chr12  30794309   GAaatgGAGaAGAAGAAGAAGGG
EMX1-070  chr22  44527016   GAGagaGAaagaAAGAAaAAGGA
EMX1-071  chr9   96189722   GctgtgGAGCAaAAGAAGAAAGG
EMX1-072  chr8   113493465  GAGgagGAGCAGAAGAAGAAAAG
EMX1-073  chr11  46171476   tAaaagGAGCAGAAaAAGAAGGG
EMX1-074  chrX   3075272    tAccttGAGCAaAAGAAGAAGGG
EMX1-075  chr5   56038567   aAGaagGAGaAGAAGAAGAAGGG
EMX1-076  chr2   71789100   GcaggaGAGCAGAAGAAGAAAGG
EMX1-077  chr7   52389195   aAGagCGAGattAAGAgGAATGG
EMX1-078  chr5   31088930   aAGaaaGgagAGgAGAgGAgAGG
EMX1-079  chr11  111680806  agtagtGAGCAGAAGAAGAtAGG
EMX1-080  chr20  51306677   aAGaagGAGaAGAAGAAGAAGAG
EMX1-081  chr19  38433655   GAGagaGAGagagAGAgaAAGAG
EMX1-082  chr8   60956107   GgccagGAGCAGgAGAAGAAGGG
EMX1-083  chr16  26617803   agaggaGAGCAGAAGAAGgATGG
EMX1-084  chr12  52621931   aAGaagGAGaAGAAGAAGgAGGA
EMX1-085  chr3   156028864  cAtTaaGAGCAGgAGAAGAAGGG
EMX1-086  chr6   40280504   cgcTgatAcagaAAGAAGAATGG
EMX1-087  chr1   35385601   GAagtgGAGCAGgAGAAGAAGGG
EMX1-088  chr1   59299359   tttgtgGAGCAGAAaAAGAAAGG
EMX1-089  chr15  61646877   aAGTCaGAGgAGAAGAAGAAGGG
EMX1-090  chr2   159685754  aAagCtGAGCAGAAaAAGAAGGG
EMX1-091  chr12  41494108   GcagtgGAGCAGAAGAAGAtGGG
EMX1-092  chr7   119831026  acaaaaGAGCAGAgGAAGAAAGG
EMX1-093  chr1   234492864  GAagtaGAGCAGAAGAAGAAGCG
EMX1-094  chr14  104091588  aAagagGgagAGAAGAAGAAGGG
EMX1-095  chr1   31954326   aAGaagGAGaAGAAGAAGAAGAG
EMX1-096  chr8   120587501  aAGgCCaAGCAGAAGAgtAATGG
EMX1-097  chr2   46020469   acacaaGAGCAGAAGAAGAAAGA
EMX1-098  chr2   219294645  GccaatGAGCAGgAGAAGAAGGG
EMX1-099  chr8   11924153   cAtataGAGCAaAAGAAGAgAGG
EMX1-100  chr6   54740531   GAGgtgGAGggGAAGAgGgAAGG
EMX1-101  chr1   156786840  GAGagaGAGagagAGAgaAAGGG
EMX1-102  chr6   30791217   aAGgagGAGaAGAAGAAGAAGGG
EMX1-103  chr3   192777993  GAGggaGAGagagAGAgagAAAG
EMX1-104  chr2   36207879   agtcggGAGCAGgAGAAGAAAGG
EMX1-105  chr16  54831367   GttcaaGAGCAGAAGAAGAATGG
EMX1-106  chr6   160868147  tctaaaGAGCAGAAaAAGAAAGG
EMX1-107  chr2   24438043   actgatGAGCAGAAGAAGAAAGG
EMX1-108  chr22  37102243   aAGaagGAGaAGAAGAAGgAGGA
EMX1-109  chr11  121786535  agGaaaagagAGAAGAAGAAGGG
EMX1-110  chr7   3337380    GAGgagGAGaAGAAGAAGAAGGG
EMX1-111  chr8   112924257  GAGagaGAGagagAGAgaAAGGG
EMX1-112  chr16  69047289   GAGgCCGAagctgAGgtGggAGG
EMX1-113  chr8   105164125  GAGcCCaAGaAGAAGAAGAAGGA
EMX1-114  chr13  83353702   t GTaCagagAGAAGAAGAAAGG
EMX1-115  chr2   102929260  GccTtCagagAGAAGAAGAATGG
EMX1-116  chr15  22366621   GgagtaGAGCAGAgGAAGAAGGG
EMX1-117  chr2   172374203  GAagtaGAGCAGAAGAAGAAGCG
EMX1-118  chr8   31096390   GctcCtGAGCAGAAGAAGAACAG
EMX1-119  chr2   66729772   agtTCaGAGCAGgAGAAGAATGG
EMX1-120  chr2   14472327   atGaaCagagAGAAGAAGAATGG
EMX1-121  chr8   140468447  GAGagCGAGagagAGAgagAGGG
EMX1-122  chr7   52204863   aAaaagGAGCAGAAGAAGAAGGA
EMX1-123  chr1   151027598  ttcTCCaAGCAGAAGAAGAAGAG
EMX1-124  chr1   35590719   GAGagaGAGagagAGAgaAAGGG
EMX1-125  chr1   106744880  ttGgaaagagAGAAGAAGAAGGG
EMX1-126  chr10  115484209  aAGaggaAGaAGAAGAAGAAGAG
EMX1-127  chr3   119686684  GAGagaGAGaAagAGAAagAGAG
EMX1-128  chr8   53295601   GAagaaGAGaAGAAGAAGAAGGG
EMX1-129  chr18  12032247   GAtTCtGAGaAaAttAAGAtGGG
EMX1-130  chr15  61383748   GgGctCcgGCAGAAGAtGccATG
EMX1-131  chr1   209298672  GAtTCCaAGCAatgGAgGAgGGG
EMX1-132  chr7   17446438   GtccaaGAGCAGgAGAAGAAGGG
EMX1-133  chr13  74473871   atcTggGAGCAGgAGAAGAAGGG
EMX1-134  chr5   5141237    GAGgatccGagGAtGtAGAAGGG
EMX1-135  chr12  5041728    GAagaaGAagAaAgaAAGAAAGA
EMX1-136  chr8   112756160  cAGagaGAGaAtAAGtAGcATAG
EMX1-137  chr8   17384135   tgaggaagagAGAAGAAGAAAGG
EMX1-138  chr12  4545932    cAagCatgagAGAAGAAGAtGGG
EMX1-139  chr10  58848728   GAGcaCGAGCAagAGAAGAAGGG
EMX1-140  chr14  48932119   GAGTCCcAGCAaAAGAAGAAAAG
EMX1-141  chr3   145057362  GAGTCCct-CAGgAGAAGAAAGG
EMX1-142  chr9   111348573  GAGTCCttG-AGAAGAAGgAAGG
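The lowercase convention used in Table 1 can be reproduced with a short sketch. The function `mark_mismatches` is hypothetical and assumes an ungapped site of the same length as the on-target sequence; rows containing "-" would first need an alignment step, which is not shown. Following the table, only the 20 protospacer positions are marked and the 3-base PAM is left in uppercase.

```python
ON_TARGET = "GAGTCCGAGCAGAAGAAGAAGGG"  # EMX1 on-target site (SEQ ID NO: 14)


def mark_mismatches(site: str, on_target: str = ON_TARGET, pam_length: int = 3):
    """Lowercase protospacer bases that differ from the on-target sequence.

    Returns the marked sequence and the number of mismatched bases.
    """
    if len(site) != len(on_target):
        raise ValueError("site and on-target sequence must have equal length")
    site = site.upper()
    proto_len = len(site) - pam_length
    marked = [
        base if base == ref else base.lower()
        for base, ref in zip(site[:proto_len], on_target[:proto_len])
    ]
    mismatches = sum(1 for base in marked if base.islower())
    return "".join(marked) + site[proto_len:], mismatches


print(mark_mismatches("GAGTCCTAGCAGGAGAAGAAGAG"))
# ('GAGTCCtAGCAGgAGAAGAAGAG', 2)
```

The example input is the EMX1-005 site written in uppercase; the output matches its Table 1 entry.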
For the nicked sites listed in Table 1, the count (number of sequence reads with the same 5' end), depth (number of sequence reads at the given position), % (count/depth), and number of reads with C→T conversion are shown in Table 2 below:
[Table 2]
[Figure imgf000035_0001]
[Figure imgf000036_0001]
[Figure imgf000037_0001]
EMX1-135  6  59  10.2  N.A.  N.A.  -  V
EMX1-136  5  49  10.2  N.A.  N.A.  -  V
EMX1-137  7  69  10.1  N.A.  N.A.  -  V
EMX1-138  5  50  10.0  0     0     -  V
EMX1-139  5  50  10.0  0     0     -  V
EMX1-140  7  44  15.9  0     0     -  V
EMX1-141  5  40  12.5  1     0     -  V
EMX1-142  6  49  12.2  1     0     -  V
(N.A.: not applicable because there are no cytosines to be deaminated at these sites)  (N.A .: not applicable because there are no cytosines to be deaminated at these sites)
As shown in Table 2, in the WGS data obtained using BE3-treated genomic DNA or intact (BE3-untreated) genomic DNA, C→T conversion was observed at 16 (BE3-treated) or 1 (BE3-untreated) of the 142 Group B sites, respectively. Seventy of these sites have no cytosine at positions 4 to 8 (numbered 1 to 20 in the 5' to 3' direction), the window of BE3-mediated deamination (shown as N.A. in Table 2).
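The N.A. designation can be checked with a minimal sketch. The function `has_editable_cytosine` is hypothetical; it looks only at the strand as listed in Table 1 and treats positions 4 to 8 as the deamination window, as stated above.

```python
def has_editable_cytosine(site: str, window=(4, 8)) -> bool:
    """True if the site has a C at any 1-based position inside the window."""
    first, last = window
    return "C" in site.upper()[first - 1:last]


for name, site in [
    ("EMX1-135", "GAagaaGAagAaAgaAAGAAAGA"),
    ("EMX1-138", "cAagCatgagAGAAGAAGAtGGG"),
]:
    print(name, "C in window" if has_editable_cytosine(site) else "N.A.")
# EMX1-135 N.A.
# EMX1-138 C in window
```

Both calls agree with Table 2, where EMX1-135 is marked N.A. and EMX1-138 carries numeric C→T read counts.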
To confirm off-target effects at some of the Group A and Group B sites identified by Digenome-seq, targeted deep sequencing was performed in HEK293T cells, and the BE3-induced base editing frequencies and Cas9-induced indel frequencies were measured, as shown in Table 3 below:
[Table 3]
Validation by NGS
                       Indel frequency (%)              Base editing frequency (%)
                       (-) Cas9  (+) Cas9  Validation   (-) BE3  (+) BE3  Validation
EMX1-001 (on-target)   0.15      61.59     Validated    0.10     49.33    Validated
EMX1-002               0.01      0.01      Invalidated  0.16     1.05     Validated
EMX1-003               0.00      7.94      Validated    0.24     4.04     Validated
EMX1-004               0.00      0.01      Validated    0.16     0.93     Validated
EMX1-005               0.00      8.63      Validated    0.05     2.47     Validated
EMX1-006               0.29      38.25     Validated    0.04     15.59    Validated
EMX1-007               0.01      0.01      Invalidated  0.08     0.13     Validated
EMX1-008               0.02      0.17      Validated    0.03     0.62     Validated
EMX1-009               0.10      3.45      Validated    0.02     0.15     Validated
EMX1-010               0.08      0.08      Invalidated  0.07     0.70     Validated
EMX1-034               0.00      0.00      Invalidated  0.33     0.40     Invalidated
EMX1-035               0.46      0.89      Validated    0.23     0.48     Validated
EMX1-036               0.01      0.02      Invalidated  0.09     0.31     Validated
EMX1-037               0.01      0.23      Validated    0.20     0.23     Validated
EMX1-038               0.01      0.01      Invalidated  0.14     0.16     Validated
EMX1-140               0.01      0.00      Invalidated  0.38     0.36     Invalidated
EMX1-141               0.00      0.00      Invalidated  0.30     0.37     Invalidated
EMX1-142               0.01      0.01      Invalidated  0.19     0.17     Invalidated
As shown in Table 3, a total of 18 sites were analyzed, and BE3-induced point mutations were observed at 14 sites, including the EMX1 on-target site, at frequencies above the noise level caused by sequencing errors (0.002-0.38%) (validation rate of 78%). BE3 may induce mutations at the other BE3-associated Digenome-positive sites at frequencies below the background noise level. Importantly, the method made it possible to identify BE3 off-target sites at which base editing was detected at a frequency as low as 0.13%, showing that Digenome-seq is a highly sensitive method. The EMX1-specific Cas9 nuclease induced indels at 9 of the 18 sites at frequencies above the noise level, showing that BE3 and Cas9 off-target sites often differ. Taken together, these results show that BE3 off-target sites can be identified using Digenome-seq data.
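The summary figures in the paragraph above can be recomputed from the validation calls in Table 3. The sketch below simply tallies those calls as transcribed; the variable names are hypothetical, and the noise-level criterion behind each call is taken as given rather than re-derived.

```python
# Validation calls transcribed from Table 3:
# (site, Cas9 indel validated, BE3 base editing validated)
TABLE3 = [
    ("EMX1-001", True, True), ("EMX1-002", False, True), ("EMX1-003", True, True),
    ("EMX1-004", True, True), ("EMX1-005", True, True), ("EMX1-006", True, True),
    ("EMX1-007", False, True), ("EMX1-008", True, True), ("EMX1-009", True, True),
    ("EMX1-010", False, True), ("EMX1-034", False, False), ("EMX1-035", True, True),
    ("EMX1-036", False, True), ("EMX1-037", True, True), ("EMX1-038", False, True),
    ("EMX1-140", False, False), ("EMX1-141", False, False), ("EMX1-142", False, False),
]

total = len(TABLE3)
be3_validated = sum(1 for _, _, be3 in TABLE3 if be3)
cas9_validated = sum(1 for _, cas9, _ in TABLE3 if cas9)
# Sites validated for BE3 base editing but not for Cas9 indels.
be3_only = [site for site, cas9, be3 in TABLE3 if be3 and not cas9]

print(total, be3_validated, cas9_validated)       # 18 14 9
print(f"{100 * be3_validated / total:.0f}%")      # 78%
print(be3_only)
```

The last list contains the sites at which the two enzymes disagree, which is the basis for the statement that BE3 and Cas9 off-target sites often differ.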
From the above description, those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. In this regard, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of the present invention should be construed as including, rather than the above detailed description, the meaning and scope of the claims set out below and all changes or modifications derived from their equivalent concepts.

Claims

[Claims]
[Claim 1]
A method for identifying an off-target site of a cytidine deaminase, comprising:
(i) introducing into a cell, or contacting with DNA isolated from a cell, a cytidine deaminase or a gene encoding it, an inactivated target-specific endonuclease or a gene encoding it, and a guide RNA or a gene encoding it, to induce a DNA single-strand break;
(ii) analyzing the nucleic acid sequence of the DNA fragment having the single-strand break; and
(iii) identifying the site of the single-strand break from the nucleic acid sequence data obtained by the analysis,
wherein the inactivated target-specific endonuclease is a Cas9 protein or Cpf1 protein that has lost the endonuclease activity of cleaving a DNA double strand.
[Claim 2]
A method for analyzing the sequence of DNA into which base editing has been introduced by a cytidine deaminase, comprising:
(i) introducing into a cell, or contacting with DNA isolated from a cell, a cytidine deaminase or a gene encoding it, an inactivated target-specific endonuclease or a gene encoding it, and a guide RNA or a gene encoding it, to induce a DNA single-strand break; and
(ii) analyzing the nucleic acid sequence of the DNA fragment having the single-strand break,
wherein the inactivated target-specific endonuclease is a Cas9 protein or Cpf1 protein that has lost the endonuclease activity of cleaving a DNA double strand.
[Claim 3]
A method for identifying a base-editing site of a cytidine deaminase, comprising:
(i) inducing a DNA single-strand break by introducing into a cell, or contacting with DNA isolated from a cell, a cytidine deaminase or a gene encoding it, an inactivated target-specific endonuclease or a gene encoding it, and a guide RNA or a gene encoding it;
(ii) analyzing the nucleic acid sequence of the DNA fragment having the single-strand break; and
(iii) identifying the single-strand break site from the nucleic acid sequence data obtained by the analysis,
wherein the inactivated target-specific endonuclease is a Cas9 protein or a Cpf1 protein that has lost the endonuclease activity of cleaving double-stranded DNA.
[Claim 4]
The method according to any one of claims 1 to 3, wherein the inactivated target-specific endonuclease is a Cas9 protein derived from Streptococcus pyogenes in which
(1) D10, H840, or D10 and H840;
(2) one or more selected from the group consisting of D1135, R1335, and T1337; or
(3) all of the amino acid residues of (1) and (2)
are substituted with an amino acid different from that of the wild type.
[Claim 5]
The method according to any one of claims 1 to 4, wherein the guide RNA is a dual RNA comprising a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA), or a single guide RNA (sgRNA).
[Claim 6]
The method according to any one of claims 1 to 5, wherein the cytidine deaminase is APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like), AID (activation-induced cytidine deaminase), CDA (cytidine deaminase), or a combination thereof.
[Claim 7]
The method according to any one of claims 1 to 6, wherein the cytidine deaminase and the inactivated target-specific endonuclease are used in the form of:
a fusion protein;
a mixture of the cytidine deaminase or mRNA encoding it and the inactivated target-specific endonuclease or mRNA encoding it; or
a plasmid containing the cytidine deaminase-encoding gene and the inactivated target-specific endonuclease-encoding gene, either separately or together.
[Claim 8]
The method according to any one of claims 1 to 7, wherein a uracil-specific excision reagent (USER) is not used,
the uracil-specific excision reagent being uracil DNA glycosylase (UDG), endonuclease VIII, or a combination thereof.
[Claim 9]
The method according to any one of claims 1 to 8, wherein the method is performed in vitro.
[Claim 10]
The method according to any one of claims 1 to 9, wherein the DNA isolated from the cell is genomic DNA.
[Claim 11]
The method according to any one of claims 1 to 10, wherein
the DNA isolated from the cell in step (i) is genomic DNA, and
the nucleic acid sequence analysis of step (ii) is performed by whole genome sequencing.
[Claim 12]
The method according to claim 1, further comprising, after step (iii),
(iv) determining the break site to be an off-target site when it is not the on-target site.
[Claim 13]
The method according to claim 12, wherein step (iii) comprises determining the number of DNA reads whose 5' ends are vertically aligned.
[Claim 14]
The method according to claim 13, wherein the off-target site is a site at which the number of DNA reads whose 5' ends are vertically aligned is 2 or more.
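The read-counting step of claims 13 and 14 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the read representation (chromosome, 0-based start, exclusive end, strand) and the function names `count_five_prime_ends` and `candidate_break_sites` are assumptions made for illustration. Only the threshold of 2 reads comes from claim 14.

```python
from collections import Counter

def count_five_prime_ends(reads):
    """Count aligned reads per (chromosome, strand, 5'-end position).

    Each read is assumed to be a tuple (chrom, start, end, strand) with a
    0-based start and an exclusive end, which is a hypothetical format.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        # The 5' end is the leftmost base for forward-strand reads and
        # the rightmost base for reverse-strand reads.
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, strand, five_prime)] += 1
    return counts

def candidate_break_sites(reads, min_reads=2):
    """Return positions where at least min_reads reads share a 5' end."""
    counts = count_five_prime_ends(reads)
    return sorted(site for site, n in counts.items() if n >= min_reads)

# Illustrative reads, not data from the application.
example_reads = [
    ("chr1", 100, 150, "+"),
    ("chr1", 100, 140, "+"),
    ("chr1", 120, 170, "+"),
    ("chr1", 60, 101, "-"),
]
sites = candidate_break_sites(example_reads)
```

In this toy input, two forward-strand reads begin at position 100, so that position is reported. The reverse-strand read also has its 5' end at position 100 but is counted separately because it lies on the other strand.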
[Claim 15]
The method according to claim 14,
wherein the off-target site satisfies one or more of the following:
in the DNA fragment, the strand complementary to the cleaved strand comprises a PAM sequence;
in the DNA fragment, the strand complementary to the cleaved strand comprises 15 or fewer nucleotide mismatches relative to the sequence of the on-target site; and
in the DNA fragment, the strand complementary to the cleaved strand comprises a conversion of one or more cytosines (C) to uracil (U) or thymine (T).
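The three criteria listed in claim 15 can be sketched in Python as simple sequence checks. This is an illustrative sketch only. The function names are hypothetical, the PAM test assumes the 5'-NGG-3' PAM of Streptococcus pyogenes Cas9 (the claim itself says only "a PAM sequence"), and the sequences are assumed to have already been read from the strand complementary to the cleaved strand.

```python
def count_mismatches(candidate: str, on_target: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(candidate) != len(on_target):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(candidate.upper(), on_target.upper()))

def has_ngg_pam(pam: str) -> bool:
    """Check for a 5'-NGG-3' PAM (an assumption; other PAMs exist)."""
    pam = pam.upper()
    return len(pam) == 3 and pam[1:] == "GG"

def has_c_conversion(reference: str, observed: str) -> bool:
    """Check whether any reference C is observed as T or U."""
    pairs = zip(reference.upper(), observed.upper())
    return any(r == "C" and o in ("T", "U") for r, o in pairs)

def off_target_criteria(candidate, on_target, pam, reference, observed,
                        max_mismatches=15):
    """Evaluate the three criteria and report whether any one is met."""
    checks = {
        "pam": has_ngg_pam(pam),
        "mismatches": count_mismatches(candidate, on_target) <= max_mismatches,
        "c_conversion": has_c_conversion(reference, observed),
    }
    checks["any"] = any(checks.values())
    return checks

# Illustrative sequences, not taken from the application.
result = off_target_criteria(
    candidate="ACGTACGT", on_target="ACGAACGT", pam="TGA",
    reference="ACGT", observed="ACGT",
)
```

Because claim 15 requires only one or more of the criteria, the sketch reports each check separately along with an `any` flag, which leaves a stricter combination to the caller.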
PCT/KR2018/000747 2017-01-17 2018-01-16 Method for identifying base editing off-target site by dna single strand break WO2018135838A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880007380.8A CN110234770A (en) 2017-01-17 2018-01-16 Identify that base editor is missed the target the method in site by DNA single-strand break
JP2019559249A JP2020505062A (en) 2017-01-17 2018-01-16 Base editing non-target position confirmation method by DNA single strand break
EP18741209.3A EP3572525A4 (en) 2017-01-17 2018-01-16 Method for identifying base editing off-target site by dna single strand break

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762446951P 2017-01-17 2017-01-17
US62/446,951 2017-01-17

Publications (2)

Publication Number Publication Date
WO2018135838A2 true WO2018135838A2 (en) 2018-07-26
WO2018135838A3 WO2018135838A3 (en) 2018-12-06

Family

ID=62908205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/000747 WO2018135838A2 (en) 2017-01-17 2018-01-16 Method for identifying base editing off-target site by dna single strand break

Country Status (6)

Country Link
US (1) US20180258418A1 (en)
EP (1) EP3572525A4 (en)
JP (1) JP2020505062A (en)
KR (1) KR102084186B1 (en)
CN (1) CN110234770A (en)
WO (1) WO2018135838A2 (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
WO2020063520A1 (en) * 2018-09-30 2020-04-02 中山大学 Method for detecting off-target effect of adenine base editor system based on whole-genome sequencing and use thereof in gene editing
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
WO2020173150A1 (en) * 2019-02-28 2020-09-03 中国科学院脑科学与智能技术卓越创新中心 Off-target single nucleotide variants caused by single-base editing and high-specificity off-target-free single-base gene editing tool
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020235974A2 (en) * 2019-05-22 2020-11-26 주식회사 툴젠 Single base substitution protein, and composition comprising same
US20230086199A1 (en) * 2019-11-26 2023-03-23 The Broad Institute, Inc. Systems and methods for evaluating cas9-independent off-target editing of nucleic acids
US20230295710A1 (en) * 2020-03-04 2023-09-21 Suzhou Qi Biodesign Biotechnology Company Limited Method for detecting random off-target effect of single-base editing system
KR20230002481A (en) * 2020-04-24 2023-01-05 기초과학연구원 Genome editing using CAS9 or CAS9 variants
WO2022056301A1 (en) * 2020-09-11 2022-03-17 Metagenomi Ip Technologies, Llc Base editing enzymes
WO2023132704A1 (en) * 2022-01-07 2023-07-13 주식회사 툴젠 Method for predicting possible off-targets in gene editing process

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160058703A (en) 2014-11-14 2016-05-25 기초과학연구원 Method for detecting genome-wide off-target sites of programmable nucleases

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100080068A (en) * 2008-12-31 2010-07-08 주식회사 툴젠 A novel zinc finger nuclease and uses thereof
US10119133B2 (en) * 2013-03-15 2018-11-06 The General Hospital Corporation Using truncated guide RNAs (tru-gRNAs) to increase specificity for RNA-guided genome editing
WO2014186686A2 (en) * 2013-05-17 2014-11-20 Two Blades Foundation Targeted mutagenesis and genome engineering in plants using rna-guided cas nucleases
AU2014281027A1 (en) * 2013-06-17 2016-01-28 Massachusetts Institute Of Technology Optimized CRISPR-Cas double nickase systems, methods and compositions for sequence manipulation
US9737604B2 (en) * 2013-09-06 2017-08-22 President And Fellows Of Harvard College Use of cationic lipids to deliver CAS9
US9068179B1 (en) * 2013-12-12 2015-06-30 President And Fellows Of Harvard College Methods for correcting presenilin point mutations
EP3613854A1 (en) * 2014-03-05 2020-02-26 National University Corporation Kobe University Genomic sequence modification method for specifically converting nucleic acid bases of targeted dna sequence, and molecular complex for use in same
AU2015298571B2 (en) * 2014-07-30 2020-09-03 President And Fellows Of Harvard College Cas9 proteins including ligand-dependent inteins
CN105647968B (en) * 2016-02-02 2019-07-23 浙江大学 A kind of CRISPR/Cas9 working efficiency fast testing system and its application
US11920151B2 (en) * 2016-09-13 2024-03-05 Toolgen Incorporated Method for identifying DNA base editing by means of cytosine deaminase
KR102151065B1 (en) * 2016-12-23 2020-09-02 기초과학연구원 Composition and method for base editing in animal embryos
GB2605925B (en) * 2016-12-23 2023-02-22 Harvard College Gene editing of PCSK9

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"GenBank", Database accession no. NM_001134391.1
KIM, D.; KIM, S.; KIM, S.; PARK, J.; KIM, J.S.: "Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq", GENOME RESEARCH, vol. 26, 2016, pages 406 - 415, XP055448257, DOI: 10.1101/gr.199588.115
See also references of EP3572525A4

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
WO2020063520A1 (en) * 2018-09-30 2020-04-02 中山大学 Method for detecting off-target effect of adenine base editor system based on whole-genome sequencing and use thereof in gene editing
WO2020173150A1 (en) * 2019-02-28 2020-09-03 中国科学院脑科学与智能技术卓越创新中心 Off-target single nucleotide variants caused by single-base editing and high-specificity off-target-free single-base gene editing tool
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence

Also Published As

Publication number Publication date
JP2020505062A (en) 2020-02-20
KR102084186B1 (en) 2020-03-03
WO2018135838A3 (en) 2018-12-06
KR20180084671A (en) 2018-07-25
CN110234770A (en) 2019-09-13
EP3572525A2 (en) 2019-11-27
US20180258418A1 (en) 2018-09-13
EP3572525A4 (en) 2020-09-30

Similar Documents

Publication Publication Date Title
WO2018135838A2 (en) Method for identifying base editing off-target site by dna single strand break
KR102026421B1 (en) Method of identifying base editing by cytosine deaminase in DNA
JP7038079B2 (en) CRISPR hybrid DNA / RNA polynucleotide and usage
US20200325471A1 (en) Compositions and methods for detecting nucleic acid regions
CN107690480B (en) Evaluation of CAS9 molecule/guide RNA molecule complexes
US20220033785A1 (en) Rna programmable epigenetic rna modifiers and uses thereof
KR102602047B1 (en) Using truncated guide rnas (tru-grnas) to increase specificity for rna-guided genome editing
Xie et al. High-fidelity SaCas9 identified by directional screening in human cells
KR20180103923A (en) Compositions and methods for the treatment of hemochromatosis
KR102210700B1 (en) Method of identifying base editing using adenosine deaminase
JP2023517041A (en) Class II type V CRISPR system
KR101600902B1 (en) Method of synthesis of gene library using codon randomization and mutagenesis
CA3219187A1 (en) Class ii, type v crispr systems
JP2024501892A (en) Novel nucleic acid-guided nuclease
劉雁成 et al. Genetic And Functional Analyses of Archaeal ATP-Dependent RNA Ligase
CN117062827A (en) CRISPR related transposon subsystem and methods of use thereof
KR20100088174A (en) Purification and usage of the extremozyme from an anaerobic archaebacteria
JP2014221028A (en) Methods for introducing nick into double-stranded deoxyribonucleic acid and proteins having nicking enzyme activity

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18741209; Country of ref document: EP; Kind code of ref document: A2
ENP Entry into the national phase
    Ref document number: 2019559249; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2018741209; Country of ref document: EP; Effective date: 20190819