WO2017099494A1 - Genome editing composition comprising cpf1, and use thereof - Google Patents


Info

Publication number
WO2017099494A1
Authority
WO
WIPO (PCT)
Prior art keywords
crrna
sequence
target
cpf1
protein
Prior art date
Application number
PCT/KR2016/014379
Other languages
French (fr)
Korean (ko)
Other versions
WO2017099494A8 (en)
Inventor
김진수
허준호
김대식
김정은
김경미
김혜란
구태영
Original Assignee
Institute for Basic Science (기초과학연구원)
Priority date
Filing date
Publication date
Application filed by Institute for Basic Science (기초과학연구원)
Publication of WO2017099494A1
Publication of WO2017099494A8


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression

Definitions

  • The present invention relates to a composition for genome editing comprising Cpf1, a genome editing method using the same, and a technology for producing a transformed eukaryotic organism.
  • The type II CRISPR-Cas9 system, which efficiently cleaves target genes through the combination of a Cas9 protein and a guide RNA, is widely used in various ways.
  • In addition to S. pyogenes-derived Cas9, Cas9 orthologs from other species have been developed for use as genetic scissors.
  • This technology is faster and more efficient than conventional methods of producing mutants, and has the advantage that only a guide RNA needs to be produced for each target gene.
  • The Cas9 system has a number of advantages, but one limitation is that the target DNA must carry a sequence called a protospacer adjacent motif (PAM).
  • PAM: protospacer adjacent motif
  • Cas9 proteins in current use, including S. pyogenes Cas9, recognize a PAM located on the 3' side of the target sequence.
  • The widely used S. pyogenes Cas9 recognizes a 5'-NGG-3' PAM on the 3' side of the target region and therefore cannot be used for targets that lack this sequence.
  • Another feature of Cas9 systems such as S. pyogenes Cas9 is that a single protein carries two nuclease domains, which cut both strands of the target DNA to leave blunt ends. As a result, gene knock-out through insertions and deletions (indels) introduced by non-homologous end joining (NHEJ) is efficient, whereas knock-in by homologous recombination (HR) is inefficient.
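The NGG restriction described above can be illustrated with a minimal Python sketch that scans one strand for 5'-NGG-3' PAMs and reports the protospacer lying immediately 5' of each. The function name `find_spcas9_sites` and the 20-nt protospacer length are illustrative assumptions, not taken from this document; only the strand passed in is scanned (the reverse complement would need a separate pass), and a sequence with no NGG yields no usable site.

```python
def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """Scan one strand for 5'-NGG-3' PAMs that have a full protospacer 5' of them.

    Returns a list of (protospacer_start, protospacer, pam) tuples, 0-based.
    """
    seq = seq.upper()
    sites = []
    # The PAM occupies seq[i:i+3]: "N" is seq[i] and "GG" is seq[i+1:i+3].
    # Starting at i = protospacer_len guarantees a full-length protospacer.
    for i in range(protospacer_len, len(seq) - 2):
        if seq[i + 1:i + 3] == "GG":
            start = i - protospacer_len
            sites.append((start, seq[start:i], seq[i:i + 3]))
    return sites


# Usage: a target followed by AGG gives one site; a sequence lacking NGG gives none.
print(find_spcas9_sites("ACGT" * 5 + "AGG"))
print(find_spcas9_sites("ACGT" * 10))
```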
  • One example provides a complex comprising a Cpf1 protein or DNA encoding it, and a guide RNA or DNA encoding it.
  • Another example provides a composition for genome editing comprising a Cpf1 protein or DNA encoding it, and a guide RNA or DNA encoding it.
  • Another example provides a genome editing method using a Cpf1 protein or DNA encoding it, and a guide RNA or DNA encoding it.
  • The Cpf1 protein or DNA encoding it and/or the guide RNA or DNA encoding it may be used as a mixture comprising the Cpf1 protein and the guide RNA, or as a ribonucleoprotein (RNP) in which the two form a complex; or
  • the DNA encoding the Cpf1 protein and the DNA encoding the guide RNA may be included in separate vectors or together in one vector.
  • The compositions and methods may be applied to eukaryotic organisms.
  • The eukaryotic organism may be selected from eukaryotic cells (e.g., yeast, and cells derived from eukaryotic animals and/or eukaryotic plants), eukaryotic animals (e.g., humans, monkeys), and eukaryotic plants (e.g., algae such as green algae, corn, soybeans, wheat, rice, etc.).
  • Another example provides a method for producing a transformed organism by genome editing with a Cpf1 protein or DNA encoding it and a guide RNA or DNA encoding it.
  • The transgenic organism may be selected from the group consisting of eukaryotic cells (e.g., yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryos, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g., humans, primates such as monkeys, dogs, pigs, cattle, sheep, goats, mice, rats, etc.), and eukaryotic plants (e.g., algae such as green algae, soybeans, wheat, rice, etc.).
  • Another example provides a method of delivering an RNA-guided endonuclease (RGEN), or a complex comprising the RGEN or DNA encoding it and a guide RNA or DNA encoding it, to an organism by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection, or the like.
  • RGEN: RNA-guided endonuclease
  • Cpf1 is a type V CRISPR system protein: a single protein binds a crRNA and cleaves the target gene.
  • The Cpf1 protein acts with a single crRNA, whereas Cas9 requires both a crRNA and a trans-activating crRNA (tracrRNA).
  • Cpf1 recognizes a PAM located on the 5' side of the target sequence, and the guide RNA that determines the target is shorter than that of Cas9. Owing to these features, Cpf1 can edit the genome at target sequences where Cas9 cannot be used, and its guide RNA (crRNA) is relatively easy to prepare compared with that of Cas9.
  • Because Cpf1 generates a 5' overhang (sticky end) rather than a blunt end at the cleavage site of the target DNA, more precise and diverse genetic editing is possible.
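To make the blunt-versus-staggered distinction concrete, the sketch below classifies the ends left by a double-strand break from its two strand-specific cut positions. The function `cut_ends` is hypothetical, and the Cpf1 offsets used in the usage lines (a cut after the 18th protospacer nucleotide on the PAM-containing strand and after the 23rd on the complementary strand) are an assumption taken from published characterizations of Cpf1, not from this document.

```python
def cut_ends(top: str, top_cut: int, bottom_cut: int):
    """Classify the ends left by a double-strand break.

    Both cut positions are in top-strand coordinates (0-based): the top strand
    is broken between top_cut - 1 and top_cut, the bottom strand between
    bottom_cut - 1 and bottom_cut. Returns (end_type, single-stranded
    sequence written as top-strand bases).
    """
    if top_cut == bottom_cut:
        return ("blunt", "")
    if top_cut < bottom_cut:
        # The top strand of the right-hand fragment starts before its partner
        # strand, so it protrudes at its 5' end.
        return ("5' overhang", top[top_cut:bottom_cut])
    # Otherwise the top strand of the left-hand fragment runs past its partner
    # and protrudes at its 3' end.
    return ("3' overhang", top[bottom_cut:top_cut])


# Usage: a 4-nt PAM (TTTA) followed by a protospacer, with the assumed Cpf1 offsets.
top = "TTTA" + "ACGT" * 6 + "AC"
pam_len = 4
print(cut_ends(top, pam_len + 18, pam_len + 23))  # ("5' overhang", 'GTACG')
print(cut_ends(top, 10, 10))                        # ('blunt', '')
```

A Cas9-style break corresponds to equal cut positions on both strands, which is why it is reported as blunt.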
  • As used herein, the term 'genome editing' may be used to mean the loss, alteration, and/or repair of gene function caused by the loss or alteration of nucleic acid molecules (e.g., 1-100,000 bp, 1-10,000 bp, 1-1,000 bp, 1-100 bp, 1-70 bp, 1-50 bp, 1-30 bp, 1-20 bp, or 1-10 bp) following cleavage at a target site of a target gene.
  • With the type V CRISPR-Cpf1 system using the Cpf1 protein, the target DNA can be cleaved at a desired position.
  • The type V CRISPR-Cpf1 system using the Cpf1 protein can edit specific genes in cells.
  • A method for overcoming the disadvantages of the conventional microinjection method is also provided.
  • However, genome editing using the Cpf1 system is not limited thereto.
  • CRISPR-Cpf1 ribonucleoproteins may be introduced into cells or organisms in the form of a recombinant vector comprising DNA encoding Cpf1 together with a recombinant vector comprising DNA encoding the crRNA, or in the form of a ribonucleoprotein complex comprising the Cpf1 protein and the crRNA.
  • Another example provides a composition for genome editing comprising a Cpf1 protein or DNA encoding it and a guide RNA (CRISPR RNA; crRNA) or DNA encoding it, or a ribonucleoprotein comprising them.
  • crRNA: CRISPR RNA (guide RNA)
  • Another example provides a method for genome editing of an organism comprising delivering to the organism a ribonucleoprotein comprising a Cpf1 protein and a guide RNA (CRISPR RNA; crRNA).
  • In the genome editing composition or genome editing method, the Cpf1 protein or DNA encoding it and the guide RNA or DNA encoding it may be used as a mixture comprising the Cpf1 protein and the guide RNA or as a ribonucleoprotein (RNP) in which they form a complex, or the DNA encoding the Cpf1 protein and the DNA encoding the guide RNA may be included in separate vectors or together in one vector.
  • The above compositions and methods can be applied to eukaryotic organisms.
  • The eukaryotic organism may be selected from the group consisting of eukaryotic cells (e.g., yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryos, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g., vertebrates or invertebrates; more specifically, mammals including primates such as humans and monkeys, dogs, pigs, cattle, sheep, goats, mice, rats, etc.), and eukaryotic plants (e.g., algae such as green algae, and monocotyledonous or dicotyledonous plants such as corn, soybeans, wheat, rice, etc.).
  • The method for producing a transgenic organism may include delivering a Cpf1 protein or DNA encoding it and a guide RNA (CRISPR RNA; crRNA) or DNA encoding it to eukaryotic cells. If the transgenic organism is a transgenic eukaryotic animal or a transgenic eukaryotic plant, the method may further comprise culturing and/or differentiating the eukaryotic cells simultaneously with or after the delivery.
  • crRNA: CRISPR RNA (guide RNA)
  • Another example provides a transformed organism produced by the method for producing a transformed organism.
  • The transformed organism may be selected from the group consisting of eukaryotic cells (e.g., yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryos, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g., vertebrates or invertebrates; more specifically, mammals including primates such as humans and monkeys, dogs, pigs, cattle, sheep, goats, mice, rats, etc.), and eukaryotic plants (e.g., algae such as green algae, and monocotyledonous or dicotyledonous plants such as corn, soybeans, wheat, rice, etc.).
  • The eukaryotic animal may be a non-human animal, and the eukaryotic cell may be a cell isolated from a eukaryotic animal, including humans.
  • The term 'ribonucleoprotein' refers to a complex of an RNA-guided endonuclease and a guide RNA.
  • crRNA: guide RNA
  • The Cpf1 protein is an endonuclease of a new CRISPR system distinct from the CRISPR/Cas9 system: it is relatively small compared with Cas9, does not require tracrRNA, and can act with a single guide RNA.
  • The Cpf1 protein recognizes a thymine-rich PAM (protospacer-adjacent motif) sequence located at the 5' end of the target, such as 5'-TTN-3' or 5'-TTTN-3' (N is any nucleotide, i.e., a nucleotide having an A, T, G, or C base), and cuts both strands of the DNA to create cohesive ends (a cohesive double-strand break). The cohesive ends thus generated may facilitate NHEJ-mediated transgene knock-in at the target (cleavage) position.
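As a Cpf1 counterpart to a Cas9 PAM scan, the sketch below looks for a T-rich PAM (TTTN by default, or TTN) and takes the protospacer from the nucleotides immediately 3' of it, mirroring the 5'-side PAM described above. `find_cpf1_sites` is a hypothetical helper; the default protospacer length of 20 nt is simply one value inside the ranges given in this document, and only the strand passed in is scanned.

```python
import re


def find_cpf1_sites(seq: str, pam: str = "TTTN", protospacer_len: int = 20):
    """Scan one strand for a 5'-side PAM followed by a full-length protospacer.

    Returns (pam_start, pam, protospacer) tuples with 0-based positions.
    """
    seq = seq.upper()
    pattern = pam.upper().replace("N", "[ACGT]")
    sites = []
    # A zero-width lookahead lets overlapping PAMs (e.g. inside TTTTT) all match.
    for m in re.finditer(f"(?=({pattern}))", seq):
        pam_start = m.start()
        spacer_start = pam_start + len(pam)
        protospacer = seq[spacer_start:spacer_start + protospacer_len]
        if len(protospacer) == protospacer_len:
            sites.append((pam_start, seq[pam_start:spacer_start], protospacer))
    return sites


print(find_cpf1_sites("TTTA" + "C" * 20))
print(find_cpf1_sites("TTTA" + "C" * 20, pam="TTN"))
```

The shorter TTN motif matches more often (twice in the example, at TTT and TTA), which is one way to see why the choice of PAM affects how many sites are targetable.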
  • The Cpf1 protein may be derived from a microorganism of the genus Candidatus, Lachnospira, Butyrivibrio, Peregrinibacteria, or Eubacterium, for example from Parcubacteria bacterium, Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_K08D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, Eubacterium eligens, and the like, but is not limited thereto.
  • The Cpf1 protein is
  • The Cpf1 protein may further include elements commonly used for nuclear import in eukaryotic cells (e.g., a nuclear localization signal (NLS)), but is not limited thereto.
  • The Cpf1 protein may be used in the form of a purified protein, in the form of DNA encoding it, or in the form of a recombinant vector including that DNA.
  • The guide RNA may be appropriately selected depending on the type of Cpf1 protein with which it forms a complex and/or the microorganism from which that protein is derived.
  • The crRNA used in the Cpf1 system can be represented by the following general formula (Formula 1):
  • In Formula 1, n1 is absent, U, A, or G; n2 is A or G; n3 is U, A, or C; n4 is absent, G, C, or A; n5 is A, U, C, G, or absent; n6 is U, G, or C; and n7 is U or G.
  • N_Cpf1 is a targeting sequence comprising a nucleotide sequence that can hybridize with the gene target site and is determined by the target sequence of the target gene; q is an integer from 15 to 30 (e.g., from 15 to 29, 15 to 28, 15 to 27, 15 to 26, 15 to 25, 15 to 24, 15 to 23, 15 to 22, or 15 to 21).
  • The target sequence of the target gene (the sequence that hybridizes with the crRNA) may consist of 15 to 30, 15 to 29, 15 to 28, 15 to 27, 15 to 26, 15 to 25, 15 to 24, 15 to 23, 15 to 22, 15 to 21, 15 to 20, 16 to 30, 16 to 29, 16 to 28, 16 to 27, 16 to 26, 16 to 25, 16 to 24, 16 to 23, 16 to 22, 16 to 21, 16 to 20, 17 to 30, 17 to 29, 17 to 28, 17 to 27, 17 to 26, 17 to 25, 17 to 24, 17 to 23, 17 to 22, 17 to 21, 17 to 20, 18 to 30, 18 to 29, 18 to 28, 18 to 27, 18 to 26, 18 to 25, 18 to 24, 18 to 23, 18 to 22, 18 to 21, or 18 to 20 consecutive nucleotides adjacent to the 3' end of a PAM sequence (5'-TTN-3' or 5'-TTTN-3', where N is any nucleotide having an A, T, G, or C base).
  • Herein, the target sequence refers to the nucleotide sequence of the target site of the target gene.
  • In Formula 1, the 5 nucleotides from the 6th to the 10th position counted from the 5' end (the 5'-terminal stem region) and the 5 nucleotides from the 15th (16th when n4 is present) to the 19th (20th when n4 is present) position (the 3'-terminal stem region) are complementary to each other in an antiparallel orientation and form a double-stranded (stem) structure, and the 3 to 5 nucleotides between the 5'-terminal and 3'-terminal stem regions can form a loop structure.
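The stem rule for the crRNA 5' region can be checked mechanically. Because the drawing of Formula 1 is not reproduced here, the sketch takes the presence of n4 as an explicit flag, and it counts only Watson-Crick pairs (A-U, G-C) as complementary, which is an assumption. The two example handles are the AsCpf1 and LbCpf1 crRNA repeat sequences as commonly reported in the published literature; they are not quoted in this text.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_regions(handle: str, n4_present: bool):
    """Return the 5' stem (nt 6-10) and 3' stem (nt 15-19, or 16-20 if n4 is present)."""
    handle = handle.upper()
    five_stem = handle[5:10]
    three_stem = handle[15:20] if n4_present else handle[14:19]
    return five_stem, three_stem


def forms_stem(handle: str, n4_present: bool) -> bool:
    """True if the two 5-nt stem regions pair antiparallel (Watson-Crick only)."""
    five_stem, three_stem = stem_regions(handle, n4_present)
    if len(five_stem) != 5 or len(three_stem) != 5:
        return False
    # Antiparallel: nt 6 pairs with the last stem nt, nt 7 with the one before, etc.
    return all((a, b) in PAIRS for a, b in zip(five_stem, reversed(three_stem)))


AS_HANDLE = "UAAUUUCUACUCUUGUAGAU"   # AsCpf1 repeat as commonly reported (assumption)
LB_HANDLE = "UAAUUUCUACUAAGUGUAGAU"  # LbCpf1 repeat as commonly reported (assumption)
print(forms_stem(AS_HANDLE, n4_present=False))  # True
print(forms_stem(LB_HANDLE, n4_present=True))   # True
```

In both examples the 5' stem is UCUAC and the 3' stem is GUAGA, with a 4-nt (AsCpf1) or 5-nt (LbCpf1) loop between them, which falls within the 3 to 5 nt loop described above.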
  • The crRNA of the Cpf1 protein (e.g., one represented by Formula 1) may further include 1 to 3 guanines (G) at the 5' end.
  • The nucleotide sequence capable of hybridizing with the gene target site refers to a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity with the nucleotide sequence of the gene target site (the target sequence). Hereinafter, unless otherwise specified, the term is used with the same meaning, and sequence homology can be determined using conventional sequence comparison tools such as BLAST.
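The percentage thresholds above can be illustrated by counting antiparallel Watson-Crick pairs between a guide and a DNA strand. This is a simplified, ungapped sketch (a tool such as BLAST would also handle gaps and local alignment); the function name `percent_complementarity` is hypothetical, and treating only A-T/A-U and G-C as pairs is an assumption.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
                ("A", "U"), ("U", "A")}


def percent_complementarity(guide: str, strand: str) -> float:
    """Percent of positions where guide (5'->3') pairs antiparallel with strand (5'->3')."""
    if len(guide) != len(strand) or not guide:
        raise ValueError("sequences must be non-empty and of equal length")
    guide, strand = guide.upper(), strand.upper()
    # Antiparallel alignment: guide position i faces strand position len-1-i.
    paired = sum((g, s) in WATSON_CRICK for g, s in zip(guide, reversed(strand)))
    return 100.0 * paired / len(guide)


score = percent_complementarity("AAAA", "TTTG")
print(score, score >= 50)  # 75.0 True
```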
  • The crRNA that hybridizes with the target sequence has a sequence complementary to the corresponding sequence on the strand opposite the one carrying the target sequence (the target sequence lies on the same strand as the PAM sequence). In other words, the targeting portion of the crRNA may consist of the target sequence, written as DNA, with T replaced by U.
  • The crRNA may therefore be described by its target sequence; even where not stated otherwise, such a crRNA sequence should be read as the target sequence with T replaced by U.
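The T-to-U convention above can be shown by assembling a crRNA from a target sequence. In this minimal sketch, the hypothetical `crrna_from_target` places the 5'-terminal handle before the targeting sequence (consistent with the handle being the 5' region and the targeting sequence following it), enforces the 15 to 30 nt length range, and allows 0 to 3 extra 5' guanines as described in this document. The handle used in the usage line is an assumed AsCpf1 repeat, not a sequence given here.

```python
def crrna_from_target(target_dna: str, handle: str, extra_5p_g: int = 0,
                      min_len: int = 15, max_len: int = 30) -> str:
    """Assemble a crRNA: optional 5' G's + 5' handle + target sequence with T -> U."""
    target_dna = target_dna.upper()
    if set(target_dna) - set("ACGT"):
        raise ValueError("target must be a DNA sequence of A/C/G/T")
    if not min_len <= len(target_dna) <= max_len:
        raise ValueError(f"target length must be {min_len}-{max_len} nt")
    if not 0 <= extra_5p_g <= 3:
        raise ValueError("0 to 3 extra 5' guanines are allowed")
    # The target is read on the PAM-containing strand, so the spacer is the
    # same sequence written as RNA.
    spacer = target_dna.replace("T", "U")
    return "G" * extra_5p_g + handle.upper() + spacer


print(crrna_from_target("ACGT" * 5, "UAAUUUCUACUCUUGUAGAU"))
```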
  • The nucleotide sequence of the gene target site (target sequence) may be adjacent at its 5' end to TTTN or TTN (N is A, T, C, or G), or to a sequence having at least 50%, at least 66%, or at least 75% sequence homology with them (the PAM).
  • PAM: protospacer-adjacent motif
  • The target sequence may also be adjacent at its 3' end to the reverse complement of the PAM sequence (an inverted PAM: NAAA or NAA, or a sequence having at least 50%, at least 66%, or at least 75% sequence homology with them; N is A, T, C, or G).
  • The inverted PAM sequence at the 3' end may be directly connected to the 3' end of the target sequence (0 nt apart) or separated from it by 1 to 10 nt.
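The inverted PAM is the reverse complement of the 5' PAM (TTTN becomes NAAA, TTN becomes NAA). The sketch below derives it and looks for it 0 to 10 nt downstream of the target sequence. Both function names are hypothetical, `target_end` is assumed to be the 0-based index just past the target sequence, and only exact matches to the inverted PAM are searched (the homology-based variants are not modeled).

```python
import re

_DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def inverted_pam_gap(seq: str, target_end: int, pam: str = "TTTN", max_gap: int = 10):
    """Return the gap (0..max_gap nt) between the 3' end of the target sequence
    (seq[:target_end]) and a downstream inverted PAM, or None if there is none."""
    inverted = reverse_complement(pam)  # TTTN -> NAAA, TTN -> NAA
    pattern = inverted.replace("N", "[ACGT]")
    seq = seq.upper()
    for gap in range(max_gap + 1):
        start = target_end + gap
        if re.fullmatch(pattern, seq[start:start + len(inverted)]):
            return gap
    return None


print(reverse_complement("TTTN"), reverse_complement("TTN"))  # NAAA NAA
print(inverted_pam_gap("C" * 20 + "CCGAAA", target_end=20))   # 2
```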
  • Exemplary 5'-terminal region sequences of the crRNA (the portion excluding the targeting sequence region) are described in Table 2:
  • The crRNA may be a crRNA transcribed in vitro using a plasmid as a template.
  • The crRNA may lack a phosphate-phosphate bond at the 5' end (e.g., a diphosphate and/or triphosphate).
  • When the crRNA does not include a phosphate-phosphate bond at the 5' end, the immune response and/or cytotoxicity may be markedly reduced compared with when it does.
  • Reduced cytotoxicity means that no innate immune response is triggered, and/or that inhibition of cell survival, inhibition of cell proliferation, and/or induction of cell damage, hemolysis, and/or death is alleviated (reduced) and/or eliminated.
  • A guide RNA that does not contain a phosphate-phosphate bond at the 5' end may carry a monophosphate group or an OH group at the 5' end or, more broadly, may have any 5'-end form of RNA that can be present in eukaryotic cells or eukaryotes without causing cytotoxicity, as distinguished from pathogens such as viruses or bacteria (e.g., a 5'-end form that is naturally or artificially modified for immunosuppression, stability enhancement, labeling, etc.).
  • The crRNA may be prepared by in vitro transcription using a prokaryotic RNA polymerase such as T7 RNA polymerase, T3 RNA polymerase, or SP6 RNA polymerase, followed by removal of two or more of the three phosphate groups at the 5' end (i.e., removal of the triphosphate and/or diphosphate), or it may be chemically synthesized so that it does not contain a phosphate-phosphate bond (e.g., a diphosphate and/or triphosphate) at the 5' end.
  • Removal of the 5'-terminal phosphate groups, such as removal of two or more phosphate groups (i.e., the triphosphate and/or diphosphate), may be performed by any conventional method that breaks the ester bonds with the phosphate groups to release two or three phosphate groups from the RNA, for example by phosphatase treatment, but is not limited thereto.
  • The phosphatase may be calf intestinal alkaline phosphatase (CIP), shrimp alkaline phosphatase (SAP), Antarctic phosphatase, and the like, but is not limited thereto; any enzyme that releases phosphate groups from RNA may be selected.
  • The Cpf1 protein and crRNA used in the genome editing composition, the genome editing method, the composition for preparing a transformant, and the transformant preparation method provided herein may be a purified Cpf1 protein and a crRNA that does not contain a phosphate-phosphate bond at the 5' end (e.g., a chemically synthesized crRNA).
  • The efficiency of delivering the Cpf1 protein into cells or organisms using a vector can be improved.
  • The vector may be, for example, a viral vector such as an adeno-associated virus (AAV) vector.
  • For viral vectors such as the AAV vector, it is well known that, because of the vector's packaging limit, virus production and intracellular delivery become inefficient when a gene exceeding that limit is cloned.
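The packaging problem is easy to see with simple arithmetic: at 3 bp per codon, the 1,307-residue AsCpf1 needs a 3,924 bp open reading frame including the stop codon, before any promoter, crRNA cassette, or polyA signal is added. In this sketch the capacity of about 4.7 kb is a commonly cited figure used here as an assumption, the 800 bp "other elements" value is a made-up placeholder, and inverted terminal repeats are ignored.

```python
AAV_CAPACITY_BP = 4700  # assumption: commonly cited ~4.7 kb AAV packaging capacity


def orf_length_bp(n_amino_acids: int, stop_codon: bool = True) -> int:
    """Coding length in bp: 3 bp per residue, plus 3 bp if a stop codon is included."""
    return 3 * n_amino_acids + (3 if stop_codon else 0)


def fits_in_aav(element_lengths_bp, capacity_bp: int = AAV_CAPACITY_BP) -> bool:
    """True if the summed insert length does not exceed the packaging capacity."""
    return sum(element_lengths_bp) <= capacity_bp


full = orf_length_bp(1307)                     # full-length AsCpf1 ORF
n_half = orf_length_bp(901, stop_codon=False)  # residues 1-901
other = 800  # hypothetical placeholder for promoter + crRNA cassette + polyA
print(full, fits_in_aav([full, other]))        # 3924 False
print(n_half, fits_in_aav([n_half, other]))    # 2703 True
```

Splitting the coding sequence into two fragments, each carried by its own vector, is one way to keep every insert under the limit.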
  • The Cpf1 protein or DNA encoding it may be cleaved at any one or more (e.g., one) positions, and the composition may comprise one or more (e.g., two) of the two or more (e.g., two) cleavage fragments produced by the cleavage.
  • The two or more Cpf1 cleavage fragments may together cover the full-length Cpf1 without overlap.
  • The two or more cleavage fragments (DNA fragments) may be included together in one vector or in two or more separate vectors for delivery to a cell or organism.
  • The cleavage point of the Cpf1 protein, or of the DNA encoding it, may be located at a surface-exposed site of the Cpf1 protein or at a site outside any domain with a defined function (for example, a domain-domain linker), or within the DNA sequence encoding such an exposed site or non-domain site.
  • For Acidaminococcus sp. BV3L6-derived Cpf1 (AsCpf1), the cleavage point on the protein may be at least one point selected from the group consisting of the positions between the 901st and 902nd amino acids, between the 886th and 887th amino acids, between the 399th and 400th amino acids, and between the 526th and 527th amino acids of the AsCpf1 amino acid sequence (GenBank Accession No. WP_021736722.1; 1,307 amino acids in length).
  • The cleavage fragments may include a) a first protein fragment from the 1st to the 901st amino acid of the AsCpf1 amino acid sequence (1,307 amino acids in length), or a first DNA fragment encoding it, and a second protein fragment from the 902nd to the 1,307th amino acid, or a second DNA fragment encoding it;
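The fragment boundaries follow directly from the split points, and the coding-sequence coordinates follow from 3 bp per residue. The helper names below are hypothetical; the sketch checks that the fragments are contiguous and non-overlapping, matching the requirement that they cover the full-length protein, and computes ORF coordinates without a stop codon.

```python
def split_fragments(length: int, cut_after):
    """Split residues 1..length into contiguous, non-overlapping fragments.

    Each value in cut_after is a residue number; the chain is split between it
    and the next residue (e.g. 901 means between residues 901 and 902).
    """
    cuts = sorted(set(cut_after))
    if any(not 1 <= c < length for c in cuts):
        raise ValueError("cut positions must lie between 1 and length - 1")
    starts = [1] + [c + 1 for c in cuts]
    ends = cuts + [length]
    return list(zip(starts, ends))


def coding_range(fragment):
    """1-based nucleotide range of the ORF that encodes a residue range (no stop codon)."""
    first_aa, last_aa = fragment
    return (3 * (first_aa - 1) + 1, 3 * last_aa)


frags = split_fragments(1307, [901])
print(frags)                             # [(1, 901), (902, 1307)]
print([coding_range(f) for f in frags])  # [(1, 2703), (2704, 3921)]
```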
  • Although the cleavage sites and cleavage fragments have been described using AsCpf1 as an example, they may be applied at the corresponding positions in Cpf1 derived from other organisms. The 'corresponding position' in Cpf1 from another organism will be apparent to those of ordinary skill in the art by comparing the AsCpf1 amino acid sequence (or the DNA sequence encoding it) with the amino acid sequence (or DNA sequence) of the Cpf1 of that organism using conventional sequence comparison tools (e.g., BLAST, such as PSI-BLAST; blast.ncbi.nlm.nih.gov/Blast.cgi).
  • BLAST: Basic Local Alignment Search Tool
  • PSI-BLAST: Position-Specific Iterated BLAST
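Once a pairwise alignment is available (for example, from BLAST), finding the 'corresponding position' reduces to walking both aligned rows and counting non-gap residues. The sketch below is a hypothetical helper operating on toy aligned strings, not real Cpf1 sequences; to transfer a split point such as 901/902, each of the two flanking residues would be mapped separately.

```python
def map_position(aligned_a: str, aligned_b: str, pos_a: int):
    """Map 1-based residue pos_a of sequence A to the aligned residue number in B.

    aligned_a and aligned_b are the two rows of a pairwise alignment ('-' = gap).
    Returns None when the residue in A is aligned to a gap in B.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    i_a = i_b = 0
    for ca, cb in zip(aligned_a, aligned_b):
        if cb != "-":
            i_b += 1
        if ca != "-":
            i_a += 1
            if i_a == pos_a:
                return i_b if cb != "-" else None
    raise ValueError("pos_a is beyond the end of sequence A")


# Toy alignment (not real Cpf1 sequences)
a, b = "MKT-LV", "MR-ALV"
print(map_position(a, b, 4))  # 4
print(map_position(a, b, 3))  # None
```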
  • When the Cpf1 protein or the gene encoding it is split into two or more cleavage fragments, each fragment may be linked at its N-terminus and/or C-terminus (for a protein fragment) or at its 5' end and/or 3' end (for a gene fragment) to a binding protein or a nucleic acid molecule encoding the binding protein.
  • The binding proteins may be different proteins that bind to different sites of the same bioactive material.
  • The bioactive material may be rapamycin, and the binding proteins may be selected from the group consisting of the FRB protein and the FKBP protein, but are not limited thereto.
  • The two or more truncated gene fragments may be included in separate vectors or together in one vector.
  • The truncated gene fragments, whether included together in one vector or in separate vectors, … the 5' or 3' end of each truncated gene fragment.
  • The vector comprising the first DNA fragment may comprise, in the 5' to 3' direction, a promoter, a crRNA-coding DNA, a promoter, and the first DNA fragment encoding the first protein fragment of the Cpf1 protein, and the vector comprising the second DNA fragment may comprise a crRNA-coding DNA, a promoter, and the second DNA fragment encoding the second protein fragment of the Cpf1 protein (see FIG. 32A).
  • All steps of the genome editing method and the method for producing a transformed organism provided herein may be performed inside or outside cells, in vivo or ex vivo.
  • Another example of the present invention provides a technique for overcoming the technical obstacles of delivering ribonucleoproteins into cells (e.g., embryos) by microinjection: each embryo must be checked under a microscope and processed individually, which takes a long time, particularly when a large number of embryos are processed in sequence, while embryos remain at the one-cell stage for only a short time.
  • When crRNA was used in vector form (a recombinant vector, e.g., a plasmid) rather than as a PCR product (amplicon), genome editing efficiency (cleavage, insertion, deletion, etc.) was confirmed to improve compared with the PCR-product form (see FIGS. 14a and 14b); accordingly, a technique of using crRNA cloned into a vector is provided.
  • The vector may include a crRNA expression cassette comprising crRNA-coding DNA and/or a transcriptional regulatory sequence, such as a promoter, operably linked thereto.
  • RGEN: RNA-guided endonuclease
  • RNP: ribonucleoprotein
  • The RNA-guided endonuclease (RGEN), guide RNA, or ribonucleoprotein (RNP) complex, DNA encoding these, or a recombinant vector comprising such DNA may be delivered to cells (e.g., eukaryotic cells) or organisms (e.g., eukaryotic organisms) by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection (e.g., using Lipofectamine), and the like.
  • When the cells to be delivered to are plant cells, the plant cells may be mixed with a surfactant such as polyethylene glycol (PEG) and then with a complex or ribonucleoprotein containing the endonuclease and guide RNA.
  • The endonuclease (e.g., Cpf1, Cas9, etc.) and the guide RNA (e.g., crRNA, sgRNA, etc.), or DNA encoding them, may be delivered by introducing the (purified) endonuclease and guide RNA expressed in vitro, or a ribonucleoprotein in which they are complexed, into eukaryotic cells and/or eukaryotic organisms by microinjection, electroporation, lipofection, or the like.
  • Alternatively, an expression cassette comprising DNA encoding the endonuclease (e.g., Cpf1, Cas9, etc.) and an expression cassette comprising DNA encoding the guide RNA (e.g., crRNA, sgRNA, etc.) may be included in separate vectors or together in one vector, and the recombinant vector(s) may be delivered by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation,
  • The expression cassette may include conventional gene expression control sequences operably linked to the endonuclease-coding DNA or crRNA-coding DNA.
  • 'Operably linked' means a functional linkage between a gene expression control sequence and another nucleotide sequence.
  • The gene expression control sequence may be at least one selected from the group consisting of a replication origin, a promoter, a transcription terminator, and the like.
  • A promoter, as described herein, is a transcriptional regulatory sequence that regulates the initiation of transcription of a particular gene and is typically about 100 to about 2,500 bp long.
  • Any promoter that can regulate transcription initiation in cells such as eukaryotic cells (e.g., plant cells, or animal cells such as mammalian cells from humans, mice, etc.) may be used without limitation.
  • The promoter may be one or more selected from the group consisting of the CMV promoter (e.g., the human or mouse CMV immediate-early promoter), U6 promoter, EF1-α (elongation factor 1-α) promoter, EF1-α short (EFS) promoter, SV40 promoter, adenovirus promoter, λ pL promoter, λ pR promoter, lac promoter, tac promoter, T7 promoter, vaccinia virus 7.5K promoter, HSV promoter, SV40E1 promoter, respiratory syncytial virus (RSV) promoter, metallothionein promoter, β-actin promoter, ubiquitin C promoter, human IL-2 (interleukin-2) gene promoter, and human granulocyte-macrophage colony-stimulating factor gene promoter, but is not limited thereto.
  • For example, the promoter may be the CMV immediate-early promoter, U6 promoter, EF1-α (elongation factor 1-α) promoter, EF1-α short (EFS) promoter, or the like.
  • The transcription termination sequence may be a polyadenylation sequence (pA) or the like.
  • The replication origin may be an f1 replication origin, SV40 replication origin, pMB1 replication origin, adeno replication origin, AAV replication origin, BBV replication origin, and the like.
  • The vectors described herein include plasmid vectors, cosmid vectors, and viral vectors such as bacteriophage vectors, adenovirus vectors, retrovirus vectors, and adeno-associated virus vectors.
  • Vectors that can be used as the recombinant vector include plasmids used in the art (e.g., pcDNA series, pSC101, pGV1106, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, pIJ61, pLAFR1, pHV14, pGEX series, pET series, pUC19, etc.), phages (e.g., λgt4λB, λ-Charon, λΔz1, M13, etc.), and viral vectors (e.g., adeno-associated virus (AAV) vectors, etc.).
  • the eukaryotic organism may be a eukaryotic cell (e.g., yeast, or cells derived from eukaryotic animals and/or eukaryotic plants, such as embryos, stem cells, or somatic cells), a eukaryotic animal (e.g., a vertebrate or invertebrate, more specifically a mammal, including primates such as humans and monkeys, dogs, pigs, cattle, sheep, goats, mice, and rats), or a eukaryotic plant (e.g., algae such as green algae, or monocotyledonous or dicotyledonous plants such as corn, soybean, wheat, and rice).
  • an RNA-guided endonuclease may exist as a complex with a single guide RNA (sgRNA) or a dual guide RNA, and refers to an endonuclease that genetically modifies a gene target site by cleaving the target sequence recognized by the guide RNA; typically, it may be an endonuclease of a type II and/or type V CRISPR/Cas system, such as Cas9 protein (CRISPR-associated protein 9) or Cpf1 protein (CRISPR from Prevotella and Francisella 1).
  • the Cas9 protein may be derived from Streptococcus sp., for example, Streptococcus pyogenes.
  • the Cpf1 protein is as described above (see, e.g., Table 1).
  • endonucleases such as Cas9 protein and Cpf1 protein may be isolated from microorganisms or produced non-naturally (non-naturally occurring) by recombinant or synthetic methods.
  • the endonuclease may further include, at its N-terminus or C-terminus (or at the 5' end or 3' end of the nucleic acid molecule encoding it), an element commonly used for nuclear transport in eukaryotic cells, such as a nuclear localization signal (NLS; e.g., PKKKRKV or KRPAATKKAGQAKKKK, or a nucleic acid molecule encoding it), but is not limited thereto.
  • the endonuclease protein may be used as a purified protein, as DNA encoding it, or as a recombinant vector comprising that DNA.
  • the guide RNA may be appropriately selected depending on the type of endonuclease and / or the microorganism derived from the endonuclease.
  • the guide RNA may be at least one selected from the group consisting of CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and single guide RNA (sgRNA); depending on the endonuclease type, crRNA alone, crRNA together with tracrRNA, or sgRNA may be used.
  • a complex comprising a Cas9 protein requires two guide RNAs, namely a CRISPR RNA (crRNA) having a nucleotide sequence that can hybridize with the target site of the gene and an additional trans-activating crRNA (tracrRNA); the crRNA and tracrRNA are used either as a double-stranded crRNA:tracrRNA complex or as a single guide RNA (sgRNA) in which the two are joined through a linker.
  • the specific sequence of the guide RNA may be appropriately selected according to the type of Cas9 protein or Cpf1 protein (i.e., the microorganism from which it is derived), as is readily understood by those skilled in the art.
  • the crRNA used in a Cas9 system comprising the Cas9 protein derived from Streptococcus pyogenes can be represented by the following general formula: 5'-(N_cas9)_l-(GUUUUAGAGCUA)-(X_cas9)_m-3', wherein:
  • N_cas9 is the targeting sequence region, comprising a nucleotide sequence that can hybridize with the gene target site;
  • the targeting sequence region is determined by the target site of the target gene, and l, the number of nucleotides it contains, may be an integer from 18 to 22, for example 20;
  • the region of 12 consecutive nucleotides (GUUUUAGAGCUA) located immediately 3' of the targeting sequence region is the essential part of the crRNA;
  • X_cas9 is a region of m nucleotides at the 3' end of the crRNA (i.e., immediately 3' of the essential part of the crRNA), where m may be an integer from 8 to 12, for example 10;
  • the m nucleotides may be the same as or different from each other, and each may be independently selected from the group consisting of A, U, C, and G;
  • X_cas9 may be, but is not limited to, UGCUGUUUUG.
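The crRNA general formula above is simple string concatenation, so a short sketch can make the length constraints concrete. The following is a minimal, hypothetical helper (the names `dna_to_rna` and `build_cas9_crrna` are illustrative and do not come from the patent); it assumes the target is given as genomic DNA without the PAM, converts T to U, checks l (18 to 22) and m (8 to 12), and uses UGCUGUUUUG only as the default example X_cas9.

```python
# Hypothetical helper illustrating the Cas9 crRNA general formula:
#   5'-(N_cas9)_l-(GUUUUAGAGCUA)-(X_cas9)_m-3'
CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # 12-nt essential part stated in the text
VALID_RNA = set("ACGU")


def dna_to_rna(seq: str) -> str:
    """Convert a DNA sequence to RNA by replacing T with U."""
    return seq.upper().replace("T", "U")


def build_cas9_crrna(target_dna: str, x_cas9: str = "UGCUGUUUUG") -> str:
    """Assemble a crRNA (5'->3') from a genomic target sequence given without its PAM."""
    n_cas9 = dna_to_rna(target_dna)
    if not 18 <= len(n_cas9) <= 22:
        raise ValueError("targeting sequence must be 18-22 nt (l)")
    if not 8 <= len(x_cas9) <= 12:
        raise ValueError("X_cas9 must be 8-12 nt (m)")
    for part in (n_cas9, x_cas9):
        if set(part) - VALID_RNA:
            raise ValueError(f"non-ACGU characters in {part!r}")
    return n_cas9 + CRRNA_ESSENTIAL + x_cas9


# Example input: the 20-nt sequence listed in the VEGFA row of Table 6.
print(build_cas9_crrna("AGCGAGAACAGCCCAGAAGT"))
# AGCGAGAACAGCCCAGAAGUGUUUUAGAGCUAUGCUGUUUUG
```

With a 20-nt target and the 10-nt default X_cas9, the result is 42 nt long (20 + 12 + 10).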
  • tracrRNA used in the Cas9 system including the Cas9 protein derived from Streptococcus pyogenes, can be represented by the following general formula:
  • Y_cas9 is a region of p nucleotides located immediately 5' of the essential part of the tracrRNA, where p may be an integer from 6 to 20, for example 8 to 19; the p nucleotides may be the same as or different from each other, and each may be independently selected from the group consisting of A, U, C, and G.
  • the sgRNA used in a Cas9 system comprising the Cas9 protein derived from Streptococcus pyogenes may form a hairpin structure in which a crRNA site (comprising the targeting sequence region and the essential part of the crRNA) and a tracrRNA site (comprising the essential part of the tracrRNA) are joined through a nucleotide linker.
  • in the sgRNA, the 3' end of the crRNA site and the 5' end of the tracrRNA site are connected to each other through the nucleotide linker.
  • the targeting sequence site and the essential site of the crRNA and the essential site of the tracrRNA are as described above.
  • the nucleotide linker included in the sgRNA may contain three to five nucleotides, for example four; these nucleotides may be the same as or different from each other, and each may be independently selected from the group consisting of A, U, C, and G.
  • for example, the linker may be 'GAAA'.
  • the sgRNA may be represented by the following general formula 2:
  • N_cas9 is the targeting sequence region, comprising a nucleotide sequence capable of hybridizing with the gene target site; it is determined by the target site of the target gene, and m, the number of nucleotides it contains, is an integer from 16 to 24, or from 18 to 22;
  • the linker may contain three to five nucleotides, for example four;
  • the nucleotides in the targeting sequence region and the linker may be the same as or different from each other, and each may be independently selected from the group consisting of A, U, C, and G; the linker may be, for example, 'GAAA'.
  • the crRNA (e.g., represented by Formula 2) or sgRNA (e.g., represented by Formula 4) of the Cas9 protein may further include 1 to 3 guanines (G) at its 5' end (i.e., the 5' end of the targeting sequence region of the crRNA).
  • the tracrRNA or sgRNA of the Cas9 protein may further comprise a termination region comprising 5 to 7 uracils (U) at the 3 ′ end of the essential portion (60nt) of the tracrRNA.
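The sgRNA layout described above (crRNA site, linker, tracrRNA site, plus the optional extra 5' guanines and 3' uracil terminator) can be sketched as follows. This is a hypothetical illustration, not the patent's Formula 2: the 60-nt essential tracrRNA sequence is not reproduced in this text, so the sketch takes it as a caller-supplied argument and uses an N-placeholder in the example; the function name and parameter names are assumptions.

```python
# Hypothetical helper (not from the patent): assembles a Cas9 sgRNA as
#   [0 or 1-3 G] + targeting sequence + crRNA essential part
#   + nucleotide linker + tracrRNA essential part + [0 or 5-7 U]
CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # 12-nt crRNA essential part given in the text


def build_cas9_sgrna(
    spacer_rna: str,
    tracr_essential: str,
    linker: str = "GAAA",
    n_5prime_g: int = 0,
    n_3prime_u: int = 0,
) -> str:
    """Join crRNA and tracrRNA parts into one sgRNA string (5'->3')."""
    if not 16 <= len(spacer_rna) <= 24:
        raise ValueError("targeting sequence must be 16-24 nt")
    if not 3 <= len(linker) <= 5:
        raise ValueError("linker must be 3-5 nt")
    if n_5prime_g not in (0, 1, 2, 3):
        raise ValueError("use 0 or 1-3 extra 5' guanines")
    if n_3prime_u not in (0, 5, 6, 7):
        raise ValueError("use 0 or 5-7 terminal uracils")
    return (
        "G" * n_5prime_g
        + spacer_rna
        + CRRNA_ESSENTIAL
        + linker
        + tracr_essential
        + "U" * n_3prime_u
    )


# Placeholder only: the real 60-nt tracrRNA essential sequence is not given here.
TRACR_PLACEHOLDER = "N" * 60

sg = build_cas9_sgrna("AGCGAGAACAGCCCAGAAGU", TRACR_PLACEHOLDER, n_5prime_g=2, n_3prime_u=6)
print(len(sg))  # 104 = 2 + 20 + 12 + 4 + 60 + 6
```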
  • the crRNA used herein is as described above (see Formula 1 and Table 2).
  • another example provides the use of a Cpf1 protein and a crRNA targeting the Hif1-alpha gene for treating eye diseases.
  • Hif1-alpha (hypoxia-inducible factor 1-alpha) is a subunit of hypoxia-inducible factor 1 (HIF-1), a heterodimeric transcription factor, and is encoded by the HIF1A gene.
  • the Hif1-alpha may be mammalian, e.g., human Hif1-alpha (NCBI accession nos. NP_001230013.1, NP_001521.1, NP_851397.1, etc.), but is not limited thereto.
  • the HIF1A gene may be mammalian, e.g., the human HIF1A gene (NCBI accession nos. NM_181054.1, NM_001243084.1, NM_001530.1, etc.), but is not limited thereto.
  • specifically, one example provides a pharmaceutical composition for preventing or treating eye diseases, comprising a Cpf1 protein (or DNA encoding it) and a crRNA comprising a nucleotide sequence hybridizable with a target sequence of the Hif1-alpha gene (or DNA encoding it).
  • another example provides a method for preventing or treating eye diseases, comprising administering the Cpf1 protein (or DNA encoding it) and the crRNA comprising a nucleotide sequence hybridizable with a target sequence of the Hif1-alpha gene (or DNA encoding it) to a subject in need of the prevention or treatment of eye diseases.
  • the Cpf1 and crRNA are as described above.
  • DNA encoding the Cpf1 protein and DNA encoding the crRNA may be included or administered as recombinant vectors, either in separate vectors or together in a single vector.
  • a vector of the kind described above can be used, for example, an adeno-associated virus (AAV) vector.
  • the crRNA may include a nucleotide sequence capable of hybridizing with a target sequence of the Hif1-α gene selected from SEQ ID NO: 69 to SEQ ID NO: 79.
  • the eye disease may be diabetic retinopathy or age-related (senile) macular degeneration.
  • the Cpf1 protein or a recombinant vector comprising DNA encoding it, and a crRNA comprising a nucleotide sequence capable of hybridizing with a 15 to 30 nt target sequence in the target region of the Hif1-alpha gene or a recombinant vector comprising DNA encoding it, or a complex (ribonucleoprotein) thereof, may be administered intravenously or locally to the lesion, e.g., by retinal injection (e.g., subretinal injection or ...).
  • the subject may be a mammal, such as a human or a mouse.
  • the present invention can be used to perform genome editing more effectively in eukaryotic cells (e.g., mammalian cells such as human and mouse cells, and eukaryotic plant cells) using the Cpf1 system, and to prepare transformed cells and/or transgenic animals/plants in which desired genes are knocked out or knocked in.
  • a ribonucleoprotein (RNP) comprising an RNA-guided endonuclease and a guide RNA is delivered to the eukaryotic organism.
  • the ribonucleoprotein can be delivered to eukaryotic organisms more efficiently by electroporation than by microinjection.
  • FIG. 1 schematically shows the process of delivering RNPs containing recombinant AsCpf1 and crRNA into mouse blastocysts by microinjection.
  • FIG. 2 shows the results of a T7E1 assay confirming nucleotide sequence mutations in the blastocysts.
  • FIG. 3 shows the results of targeted deep sequencing of Cpf1 RNP-mediated genome editing, confirming specific mutations at the sequence positions where Cpf1 is expected to cleave the genome.
  • FIG. 5 shows the mutated nucleotide sequences confirmed by targeted deep sequencing.
  • FIG. 6 shows the results of whole-genome sequencing of tail gDNA at off-target (nonspecific) positions.
  • FIGS. 7 to 10 relate to genome editing in mouse embryos by electroporation of SpCas9 and AsCpf1 RNPs:
  • FIG. 7 schematically illustrates the process of combining SpCas9/AsCpf1 with sgRNA/crRNA and delivering the complexes into multiple mouse embryos by electroporation.
  • FIG. 9 shows the sequence mutations produced by SpCas9 RNP electroporation.
  • FIG. 11 is a schematic diagram of genome editing of homologous FAD2 genes in soybean protoplasts using AsCpf1 and LbCpf1 recombinant proteins.
  • FIG. 12 shows the genome editing efficiency obtained using AsCpf1 and LbCpf1.
  • FIG. 13 shows specific sequence mutations confirmed by targeted deep sequencing.
  • FIGS. 14a and 14b compare cellular genome editing efficiency using plasmid U6-crRNA and PCR product U6-crRNA.
  • FIG. 14a is an electrophoresis photograph comparing cellular genome editing efficiency with plasmid U6-crRNA and PCR product U6-crRNA by T7E1 assay.
  • FIG. 14b is a graph of the quantitative analysis of cellular genome editing efficiency by targeted deep sequencing.
  • FIGS. 15a and 15b show the purification of recombinant Cpf1 protein and the results of an in vitro cleavage assay of its activity:
  • FIG. 15b shows target DNA cleaved using purified recombinant Cpf1 protein with in vitro transcribed (T7) or synthetic crRNA and resolved by electrophoresis on a TBE-agarose gel.
  • FIGS. 16a to 16c show cellular genome editing with RNPs consisting of recombinant Cpf1 and crRNA:
  • FIG. 16a is an electrophoresis photograph of a T7E1 assay of cellular genome editing by delivery of RNPs consisting of As-/Lb-Cpf1 and crRNA;
  • FIG. 16b is a graph quantifying the cellular genome editing efficiency of Cpf1 RNPs measured by targeted deep sequencing;
  • FIG. 16c is an electrophoresis photograph of a T7E1 assay of cellular genome editing using synthetic crRNA, comparing its efficiency with that of in vitro transcribed crRNA.
  • FIGS. 17a to 17c show in vitro cleavage of the cellular genome using Cpf1 and crRNA and the Digenome-seq results:
  • FIG. 17a is a schematic diagram of qPCR and Digenome-seq following in vitro cleavage of the cellular genome using Cpf1 protein and crRNA;
  • FIG. 17b is a graph quantifying, by qPCR, the target-site genome remaining after cleavage of the cellular genome with Lb-/As-Cpf1 protein (3-300 nM) and crRNA (9-900 nM);
  • FIG. 17c shows IGV images comparing sequence reads near the target site from whole-genome sequencing of the cellular genome before and after in vitro cleavage.
  • FIG. 18a shows the genomic positions and sequences of off-target candidates detected by Digenome-seq;
  • FIG. 18b shows a sequence logo of the conserved sequence at the off-target candidate positions.
  • FIG. 19a is an electrophoresis photograph comparing cellular genome editing efficiency using plasmid crRNA and PCR product crRNA by T7E1 assay.
  • FIG. 19b is a graph of indel frequencies (%) measured by targeted deep sequencing using crRNAs for each of the four Cpf1 orthologs (error bars indicate s.e.m.).
  • FIG. 19c shows the frequencies of mutations induced by LbCpf1, AsCpf1, and SpCas9 at ten endogenous target sites in HEK293T cells.
  • FIGS. 20a to 20c show the specificity of Cpf1, measured as indel frequencies (%) by targeted deep sequencing in HEK293T cells using crRNAs for the on-target sequence and for sequences having one or two nucleotide mismatches with the on-target sequence:
  • FIG. 20a is a graph of the results for DNMT1-3;
  • FIG. 20b is a graph of the results for DNMT1-4;
  • FIG. 20c is a graph of the results for AAVS1 (error bars indicate s.e.m.).
  • FIGS. 21a to 21f show the genome-wide target specificity of Cpf1 and Cas9 nucleases measured by Digenome-seq:
  • FIGS. 21a and 21b are genome-wide Circos plots showing DNA cleavage scores obtained by whole-genome sequencing and Digenome-seq.
  • in these plots, original (uncleaved) genomic DNA is shown in red, LbCpf1-cleaved genomic DNA in green, AsCpf1-cleaved genomic DNA in blue, and SpCas9-cleaved genomic DNA in yellow; an asterisk indicates one false-positive site found in the original genomic DNA, arrows indicate on-target sites, and sequence logos ...
  • FIG. 21c shows homologous sites and the fractions captured by Digenome-seq (left y-axis; square marks are results for AsCpf1).
  • FIG. 21d is a graph of off-target sites identified in human cells by targeted deep sequencing, with the DNA sequences of the on-target and off-target sites (PAM sequences in bold, mismatched nucleotides in lowercase).
  • FIG. 21e shows results obtained using crRNAs redesigned to target the off-target sites.
  • FIG. 21f is a graph of the Cpf1 off-target effect when using plasmids encoding Cpf1 and crRNA versus RNPs in which Cpf1 and crRNA are complexed. The specificity ratio is the ratio of the on-target indel frequency to the off-target indel frequency, obtained with Cpf1 RNP or with plasmid; the fold difference between the two ratios (RNA/plasmid) is shown.
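The FIG. 21f metric, as read from the caption above, is a ratio of ratios. The sketch below is a minimal illustration under that reading (specificity ratio = on-target indel % divided by off-target indel %; fold difference = RNP ratio divided by plasmid ratio). The function names and the input percentages are hypothetical and are not data from the patent.

```python
def specificity_ratio(on_target_indel_pct: float, off_target_indel_pct: float) -> float:
    """On-target indel frequency divided by off-target indel frequency."""
    if off_target_indel_pct <= 0:
        # An off-target frequency of zero gives no finite ratio.
        raise ValueError("off-target indel frequency must be > 0")
    return on_target_indel_pct / off_target_indel_pct


def rnp_vs_plasmid_fold(rnp: tuple, plasmid: tuple) -> float:
    """Fold difference between the RNP and plasmid specificity ratios.

    Each argument is an (on-target %, off-target %) pair.
    """
    return specificity_ratio(*rnp) / specificity_ratio(*plasmid)


# Hypothetical numbers: RNP 40% on / 0.5% off, plasmid 60% on / 3% off.
print(rnp_vs_plasmid_fold((40.0, 0.5), (60.0, 3.0)))  # 4.0
```

A fold difference above 1 would mean the RNP format discriminates on-target from off-target sites better than plasmid delivery at that site.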
  • FIGS. 22A to 22F show sequence logos of Digenome-captured sites of Cpf1-mediated DNA cleavage; the upper panels are sequence logos of Digenome-captured sites obtained using AsCpf1, and the lower panels those obtained using LbCpf1.
  • FIGS. 24A to 24F are graphs showing indel frequencies at Digenome-captured sites in HEK293T/17 cells, the dark bars being the LbCpf1 plasmid ...
  • FIG. 25 is a graph showing indel frequencies at on-target and off-target sites when crRNAs truncated at the 3' end (tru-crRNAs) and full-length crRNAs are used (error bars represent mean ± s.e.m.).
  • FIGS. 26a to 26e show that Cpf1 orthologs exhibit different overhang patterns and mutation characteristics:
  • FIG. 26a is a representative Integrative Genomics Viewer (IGV) image showing overhang patterns at DNMT1 target sites;
  • FIG. 26b is a graph showing the number of mutant sequence reads binned by deletion/insertion size in base pairs;
  • FIG. 26c shows mutant sequences derived from the Cpf1 or Cas9 target site; for each nuclease, the first row is the original target sequence and the rows from the second onward show the mutated sequences;
  • the PAM sequence (Cpf1: TTTC) is shown in bold;
  • the target sequence recognized by the crRNA/sgRNA is underlined;
  • underlined sequences in the rows from the second onward represent microhomology sequences;
  • the numbers on the right indicate the number of nucleotides deleted (indicated by '-') or inserted (in lowercase);
  • FIGS. 26d and 26e show mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9, with FIG. 26d showing deletions versus insertions.
  • FIGS. 27A and 27B show mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9:
  • FIG. 27a is a graph showing the number of mutant sequence reads binned by indel size in base pairs, measured by targeted deep sequencing of HEK293T cells transfected with LbCpf1, AsCpf1, or SpCas9 plasmids;
  • FIG. 27b shows mutant sequences derived from the EMX1-2 target site (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 42); for each nuclease, the first row is the original target site sequence and the rows from the second onward show the mutated sequences; the PAM sequence (Cpf1: TTTG) is shown in bold in the first row, and the target sequence recognized by the crRNA/sgRNA is underlined;
  • underlined sequences in the rows from the second onward indicate microhomology sequences, and the numbers on the right indicate the number of nucleotides deleted (indicated by '-') or inserted (in lowercase).
  • FIG. 28 schematically shows the Digenome-seq process.
  • FIGS. 29A and 29B show recombinant vector constructs expressing Cpf1 protein fragments split at different positions of the Cpf1 protein:
  • FIG. 29a shows wild-type Acidaminococcus sp. Cpf1 (AsCpf1) protein and four types of Split-Cpf1;
  • FIG. 29b schematically shows a recombinant vector expressing each half-domain of Split-Cpf1.
  • FIG. 30a is an agarose gel image showing the results of DNMT1-target genome editing using Split-Cpf1 by T7E1 assay; stars indicate the positions of DNA fragments cleaved by the T7E1 enzyme;
  • FIG. 30b is a graph comparing genome editing efficiency, quantified by targeted deep sequencing, according to the split position;
  • FIG. 30c is a graph comparing Split-Cpf1 genome editing efficiency, quantified by targeted deep sequencing, according to the target position.
  • FIGS. 31a to 31e show the results of analyzing inducible genome editing efficiency achieved by controlling the binding of the half-domains of Split-Cpf1:
  • FIG. 31a schematically shows a recombinant vector construct expressing each half-domain of Inducible-Split-Cpf1;
  • FIG. 31b shows targeted deep sequencing results for Split-Cpf1 and Inducible-Split-Cpf1 with rapamycin treatment;
  • FIGS. 31c to 31f show the inducible genome editing efficiency of Inducible-Split-Cpf1 according to the target position, analyzed by targeted deep sequencing.
  • FIGS. 32A and 32B show the construction of viral vectors expressing each half-domain of Split-Cpf1:
  • FIG. 32a schematically shows the composition of AAV viral vectors expressing each half-domain of Split-Cpf1 (Split-3-AsCpf1);
  • FIG. 32b shows the DNMT1-3 target genome editing efficiency of the AAV-Split-Cpf1 vectors, confirmed by T7E1 assay.
  • FIG. 33 shows the nucleotide sequence of the pU6-As-crRNA plasmid; the underlined portion corresponds to the AsCpf1 crRNA.
  • FIG. 34 shows the nucleotide sequence of the pU6-Lb-crRNA plasmid; the underlined portion corresponds to the LbCpf1 crRNA.
  • FIG. 35 shows the nucleotide sequence of the U6-As-crRNA amplicon; the underlined portion corresponds to the AsCpf1 crRNA.
  • FIG. 36 shows the nucleotide sequence of the U6-Lb-crRNA amplicon; the underlined portion corresponds to the LbCpf1 crRNA.
  • FIG. 37 is a graph of indel frequencies (%) from deep sequencing analysis after delivering the LbCpf1 protein and a crRNA hybridizable with a target sequence of the Hif1-α gene into 293T cells via AAV vectors.
  • FIG. 38 schematically shows a recombinant AAV vector (all-in-one AAV vector) comprising, in one vector, DNA encoding the LbCpf1 protein and DNA encoding a crRNA targeting Lb-TS6 of Hif1-α.
  • FIGS. 39A to 39C show, in the 5' to 3' direction, the nucleotide sequence of a recombinant AAV vector comprising DNA encoding the LbCpf1 protein and DNA encoding a crRNA targeting Lb-TS6 of Hif1-α.
  • Example 1: Production and Purification of Recombinant Cpf1 Protein. Recombinant Cpf1 protein was expressed from a plasmid having the E. coli codon-optimized DNA sequence of AsCpf1 or LbCpf1 (SEQ ID NO: 44: E. coli codon-optimized AsCpf1-coding nucleic acid; SEQ ID NO: 46: E. coli codon-optimized LbCpf1-coding nucleic acid) together with sequences for expressing and purifying a protein comprising a nuclear localization sequence (NLS)-linker-HA tag (amino acid sequence: (KRPAATKKAGQAKKKK)-(GS)-(YPYDVPDYA)-(YPYDVPDYA YPYDVPDYA); DNA sequence: ... CGCTTATCCCTACGACGT CCGCATAT (ATACCCATATGATGTCCC ...).
  • protein purification was performed as follows: the prepared cell pellet was resuspended in 50 ml of lysis buffer (50 mM HEPES pH 7, 200 mM NaCl, 5 mM MgCl2, 1 mM DTT, 10 mM imidazole) supplemented with lysozyme (Sigma) and protease inhibitor (Roche cOmplete, EDTA-free), and the cells were lysed by sonication. The resulting cell lysate was centrifuged at 16,000 g for 30 minutes and then passed through a 0.22 µm syringe filter. The filtered lysate was applied to a nickel column (Ni-NTA agarose, Qiagen), washed with 2 M salt, and eluted with 250 mM imidazole. The eluted protein solution was buffer-exchanged into lysis buffer lacking magnesium and imidazole and concentrated. The purified Cpf1 protein was checked by SDS-PAGE and used in the examples below. In the following examples using human cells, the plasmid encoding the E. coli codon-optimized Cpf1 protein was obtained from Addgene.
  • HEK293T cells were maintained in medium supplemented with 10% (v/v) fetal bovine serum (FBS) and 1% (v/v) ...
  • HEK293T cells were seeded in 24-well plates at 70-80% confluency and transfected with a Cpf1 expression plasmid (500 ng) and a crRNA plasmid (500 ng) using Lipofectamine 2000 (Invitrogen).
  • Genome DNA was purified from HeLa cells (ATCC) using a DNeasy Tissue kit (Qiagen).
  • the purified genomic DNA (8 µg) was digested with Cpf1 protein (40 µg) and crRNA (2.7 µg each) in reaction buffer (100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl2, 100 µg/ml BSA, pH 7.9).
  • the digested genomic DNA was treated with RNase A (50 µg/mL) to degrade the crRNA and purified again using the DNeasy Tissue kit (Qiagen).
  • the purified DNA was then subjected to whole-genome sequencing (WGS).
  • the cleavage score can be calculated for each nucleotide position across the entire genome.
  • the cleavage score at position i in the chromosome was calculated using the following formula (see Figure 28):
  • a pUC19 vector (Addgene) comprising a DNA sequence encoding the crRNA and a U6 promoter operably linked thereto (As-crRNA plasmid (SEQ ID NOs 65 and 33) or Lb-crRNA plasmid (SEQ ID NOs 66 and 34)), or a PCR product (amplicon; As-crRNA amplicon (SEQ ID NOs 67 and 35) or Lb-crRNA amplicon (SEQ ID NOs 68 and 36)), was delivered to HEK293T/17 cells together with DNA encoding the Cpf1 protein.
  • the underlined portion is a gene region encoding the crRNA
  • 'NNNNNNNNNNNNNNNNNNNNNNN' is the portion determined by the target sequence. The DNAs encoding the Cpf1 protein and the crRNA were all delivered by lipofection. The cell delivery conditions described above are summarized in Table 3 below.
  • the crRNA sequences and target sequences used above are also summarized in Table 4 below.
  • all AsCpf1 crRNAs described below have the sequence of SEQ ID NO: 36 shown in Table 4, in which the targeting sequence region (underlined) is replaced with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence with T replaced by U).
  • all LbCpf1 crRNAs described below have the sequence of SEQ ID NO: 37 shown in Table 4, in which the targeting sequence region (underlined) is replaced with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence with T replaced by U).
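The crRNA design rule above (replace the underlined targeting region of a fixed crRNA sequence with the target sequence, T replaced by U) can be sketched as a template fill. Because the SEQ ID NO: 36/37 sequences are not reproduced here, the template below is a hypothetical placeholder: a marker for the fixed part followed by 23 N's, mirroring the 'NNNNNNNNNNNNNNNNNNNNNNN' notation. The function name is illustrative only.

```python
import re

# Hypothetical template: "[5'-repeat]" stands in for the fixed crRNA part,
# which is NOT reproduced here; the 23 N's mark the targeting region.
HYPOTHETICAL_TEMPLATE = "[5'-repeat]" + "N" * 23


def fill_crrna_template(template: str, target_dna: str) -> str:
    """Replace the single run of N in `template` with the target as RNA (T -> U)."""
    runs = list(re.finditer(r"N+", template))
    if len(runs) != 1:
        raise ValueError("template must contain exactly one run of N")
    run = runs[0]
    spacer_rna = target_dna.upper().replace("T", "U")
    if len(spacer_rna) != run.end() - run.start():
        raise ValueError("target length must equal the length of the N run")
    return template[: run.start()] + spacer_rna + template[run.end():]


# Example target: the 23-nt EMX1-2 sequence (SEQ ID NO: 42) listed for FIG. 27b.
print(fill_crrna_template(HYPOTHETICAL_TEMPLATE, "CTGATGGTCCATGTCTGTTACTC"))
# [5'-repeat]CUGAUGGUCCAUGUCUGUUACUC
```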
  • T7E1 assay: after PCR amplification of the specific target region from genomic DNA, the product was treated with T7 endonuclease I (T7E1) and analyzed by electrophoresis.
  • targeted deep sequencing: the target region of the target gene was amplified by PCR, amplified again with barcoded PCR primers for deep sequencing, and purified using a DNA purification kit.
  • the target mutagenesis frequency (indel frequency, %) in the target DNA was calculated; the results are shown in FIG. 14A (T7E1 assay) and FIG. 14B (targeted deep sequencing).
  • T7E1 assay results are also shown in FIG. 19A.
  • compared with PCR amplicons, crRNA plasmids increased the target mutagenesis frequency 2- to 30-fold at the three endogenous target sites tested.
  • PCR amplicons can give rise to faulty guide RNA transcripts from oligonucleotide templates with synthesis errors, which are thought to cause off-target DNA cleavage at sites that may form RNA bulges.
  • plasmids encoding each of the four Cpf1 orthologs were transfected together with crRNA plasmids, and the targeted mutagenesis frequency (indel frequency, %) was measured by targeted deep sequencing.
  • the crRNAs for DNMT1-4 and AAVS1 have the sequence of SEQ ID NO. 38 or SEQ ID NO. ... in which the targeting region is replaced with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence with T replaced by U).
  • the obtained indel frequencies (%) are shown in FIG. 19B.
  • LbCpf1 and AsCpf1 recognize 5'-TTTN-3' PAMs, whereas FnCpf1 and MbCpf1 recognize 5'-TTN-3' PAMs and are known to be inefficient or inactive in human cells. As shown in FIG. 19B, when these Cpf1 orthologs were co-transfected into human cells in various combinations with plasmids encoding the crRNAs of each ortholog, each Cpf1 ortholog showed the highest efficiency when transfected with its cognate crRNA. In addition, all four Cpf1 orthologs, including FnCpf1 and MbCpf1, are different ...
  • the genome editing efficiencies of LbCpf1 and AsCpf1 were measured at 10 chromosomal target sites in HEK293T cells, each containing two PAM sequences (one recognized by Cpf1, 5'-TTTN-3', and the other recognized by SpCas9, 5'-NGG-3'), and compared with that of SpCas9.
  • the genome editing efficiency was calculated as the indel frequency measured by targeted deep sequencing as described above.
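Indel frequency is the percentage of sequenced reads that carry an insertion or deletion at the target site. The sketch below assumes reads have already been aligned to the amplicon and are summarized as (0-based reference start, CIGAR string) pairs in SAM notation; the analysis window around the cleavage site is a hypothetical parameter, since the patent text does not state one, and the function names are illustrative.

```python
import re
from typing import Iterable, Tuple

# SAM CIGAR operations: M, D, N, = and X consume the reference; I, S, H, P do not.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def has_indel_in_window(ref_start: int, cigar: str, win_start: int, win_end: int) -> bool:
    """True if an insertion or deletion falls inside the window [win_start, win_end)."""
    pos = ref_start  # reference coordinate of the next aligned base
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "I" and win_start <= pos < win_end:
            return True  # insertion sits immediately before reference base `pos`
        if op == "D" and pos < win_end and pos + length > win_start:
            return True  # deleted reference bases overlap the window
        if op in REF_CONSUMING:
            pos += length
    return False


def indel_frequency(alignments: Iterable[Tuple[int, str]], win_start: int, win_end: int) -> float:
    """Percentage of aligned reads carrying an indel inside the window."""
    reads = list(alignments)
    if not reads:
        raise ValueError("no reads")
    mutant = sum(has_indel_in_window(s, c, win_start, win_end) for s, c in reads)
    return 100.0 * mutant / len(reads)


reads = [(0, "50M"), (0, "20M2D28M"), (0, "22M1I27M"), (0, "5M3D45M")]
print(indel_frequency(reads, 15, 30))  # 50.0
```

Restricting the count to a window keeps indels far from the cut site (such as the 3-bp deletion at position 5 above) from being scored as nuclease-induced.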
  • Ten target sequences used in the test are shown in Table 6 below:
  • VEGFA: CGTCCMCTCTGGGCTGTTCTCTC (SEQ ID NO: ...); AGCGAGAACAGCCCAGAAGT (SEQ ID NO: ...)
  • SpCas9 sgRNA: the (N_cas9)m region of the following general formula (SEQ ID NO: 63) was replaced with the SpCas9 target sequence in Table 6 with T replaced by U, and 'GAAA' was used as the linker.
  • FIG. 15B: T7, crRNA prepared by in vitro transcription with T7 RNA polymerase; synthetic, chemically synthesized crRNA.
  • Cpf1 cleaved the target DNA only in the presence of crRNA.
  • synthetic crRNA, which lacks a 5'-terminal phosphate, and in vitro transcribed crRNA, which carries one, showed similar cleavage efficiencies, indicating that the presence or absence of a 5'-terminal phosphate on the crRNA does not affect in vitro cleavage.
  • Cpf1 protein: AsCpf1 or LbCpf1.
  • crRNA: DNMT1-3 target crRNA (see Table 4), prepared by in vitro transcription.
  • HEK293T/17 cells were treated with the RNPs by electroporation (Cpf1 20 µg : crRNA 20 µg) or lipofection (Cpf1 10 µg : crRNA 2 µg). After RNP delivery, the cells were incubated at 37 °C.
  • genomic DNA isolated from the cells was incubated for 12 hours with recombinant Cpf1 protein (3-300 nM) and crRNA (9-900 nM; crRNAs for each of target sequences 1 to 8 of Table 6 (SEQ ID NOs: 19, 20, 21, 23, 24, 25, 27, and 28)).
  • the y-axis values in FIG. 17B represent the relative ratio of the uncut ...
  • as shown in FIG. 17B, with 3 nM Lb-/As-Cpf1 protein and 9 nM crRNA, 60% of the on-target site genome was cleaved, and with 30 nM As-/Lb-Cpf1 protein and 90 nM crRNA and with 300 nM Lb-/As-Cpf1 protein and 900 nM crRNA ...
  • results confirmed using the Integrative Genomics Viewer (IGV) are shown in FIG. 17C.
  • as shown in FIG. 17C, in the genome treated with Cpf1 protein and crRNA, the 5' ends of sequence reads were vertically aligned at the target position, whereas in the genome not treated with Cpf1 protein and crRNA, sequence reads showed no tendency to align at the target position.
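The vertical alignment seen in FIG. 17C arises because reads from a cleaved genome begin at the cut. The sketch below is a deliberately simplified proxy for that pattern and is NOT the Digenome-seq cleavage score formula (which is not reproduced in this text): for each position where at least `min_reads` forward-strand reads start, it reports the fraction of covering reads that start exactly there. The thresholds and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def vertical_alignment_sites(
    reads: Iterable[Tuple[int, int]],
    min_reads: int = 3,
    min_fraction: float = 0.5,
) -> Dict[int, float]:
    """Positions where many reads start, with the fraction of covering reads starting there.

    `reads` are (start, end) half-open reference intervals of forward-strand reads.
    """
    reads = list(reads)
    if any(end <= start for start, end in reads):
        raise ValueError("each read needs end > start")
    starts = Counter(start for start, _ in reads)
    hits = {}
    for pos, n_start in starts.items():
        if n_start < min_reads:
            continue
        depth = sum(1 for start, end in reads if start <= pos < end)
        fraction = n_start / depth  # depth >= n_start because those reads cover pos
        if fraction >= min_fraction:
            hits[pos] = fraction
    return dict(sorted(hits.items()))


cut = [(100, 250)] * 4 + [(20, 180), (50, 200)]  # four reads start at a cut site
uncut = [(0, 150), (20, 170), (40, 190)]         # staggered read starts
print(vertical_alignment_sites(cut))    # {100: 0.666...}
print(vertical_alignment_sites(uncut))  # {}
```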
  • in addition to the known Cpf1 PAM sequence (TTTN), inverted-PAM sequences were present on the opposite side.
  • the inverted PAMs included not only AAA but also AAG, AGA, and GAA.
  • both LbCpf1 and AsCpf1 recognize and cleave a 27-nt target DNA sequence consisting of a 5'-TTTN-3' PAM (N is A, T, C, or G) and the 23-nt protospacer located immediately 3' of it; the targeting sequence of the crRNA corresponds to the protospacer sequence with T replaced by U.
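The 27-nt recognition rule above (5'-TTTN-3' PAM followed by a 23-nt protospacer) lends itself to a simple sequence scan. This is a minimal sketch, assuming an exact TTTN match and a fixed 23-nt protospacer on either strand; the names `Cpf1Site` and `find_cpf1_sites` are hypothetical, and real guide-design tools apply additional filters.

```python
import re
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Cpf1Site(NamedTuple):
    strand: str       # "+" or "-" relative to the input sequence
    site_start: int   # 0-based leftmost input coordinate of the 27-nt PAM+protospacer site
    pam: str          # 5'-TTTN-3'
    protospacer: str  # 23 nt immediately 3' of the PAM (on the PAM strand)
    spacer_rna: str   # protospacer with T replaced by U


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_cpf1_sites(dna: str, spacer_len: int = 23) -> List[Cpf1Site]:
    dna = dna.upper()
    site_len = 4 + spacer_len
    # Zero-width lookahead so overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=(TTT[ACGT])([ACGT]{%d}))" % spacer_len)
    sites = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for m in pattern.finditer(seq):
            i = m.start()
            # Map minus-strand hits back to input coordinates.
            start = i if strand == "+" else len(dna) - i - site_len
            proto = m.group(2)
            sites.append(Cpf1Site(strand, start, m.group(1), proto, proto.replace("T", "U")))
    return sites


# Synthetic example built from the EMX1-2 protospacer with a TTTG PAM.
for site in find_cpf1_sites("GGTTTG" + "CTGATGGTCCATGTCTGTTACTC" + "GG"):
    print(site.strand, site.site_start, site.pam, site.spacer_rna)
# + 2 TTTG CUGAUGGUCCAUGUCUGUUACUC
```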
  • three endogenous target sites (DNMT1-3, DNMT1-4, and AAVS1) were selected (on target), and crRNAs hybridizing with the on-target sequences or with off-target sequences containing one or two mismatches relative to each target site were used.
  • the selected three endogenous target sites (on target) are shown in Table 7 below:
  • the off-target sequences of the three selected endogenous target sites are shown in FIGS. 20A, 20B and 20C, respectively.
  • LbCpf1 crRNAs and AsCpf1 crRNAs were prepared as described for Table 4 and used for the test.
  • the obtained indel frequencies (%) are shown in FIG. 20a (DNMT1-3), FIG. 20b (DNMT1-4), and FIG. 20c (AAVS1) (error bars indicate s.e.m.).
  • both LbCpf1 and AsCpf1 showed little activity at sites containing a single mismatch (particularly when the mismatch lay within 20 nt of the PAM), and activity was almost completely lost at sites containing two mismatches (particularly within 20 nt of the PAM).
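Whether a mismatch lies "within 20 nt of the PAM" depends only on its position counted from the PAM-proximal end of the protospacer, which for Cpf1 is the 5' end because the PAM sits 5' of the protospacer. The sketch below computes those positions; the function names and the example mismatched sequence are hypothetical.

```python
from typing import List


def mismatch_positions(on_target: str, off_target: str) -> List[int]:
    """1-based mismatch positions counted from the PAM-proximal end of the protospacer.

    For Cpf1 the PAM lies 5' of the protospacer, so position 1 is the base right after the PAM.
    """
    if len(on_target) != len(off_target):
        raise ValueError("sequences must be the same length")
    pairs = zip(on_target.upper(), off_target.upper())
    return [i + 1 for i, (a, b) in enumerate(pairs) if a != b]


def pam_proximal_mismatches(on_target: str, off_target: str, window: int = 20) -> List[int]:
    """Mismatch positions lying within `window` nt of the PAM."""
    return [p for p in mismatch_positions(on_target, off_target) if p <= window]


on = "CTGATGGTCCATGTCTGTTACTC"   # EMX1-2 protospacer (SEQ ID NO: 42)
off = "CTGAAGGTCCATGTCTGTTACGC"  # hypothetical site with two mismatches
print(mismatch_positions(on, off))       # [5, 22]
print(pam_proximal_mismatches(on, off))  # [5]
```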
  • Cas-OFFinder was used to identify potential off-target sites in the human genome.
  • Sites differing from the 10 on-target sites (Table 6) by 1 to 4 or 1 to 5 nucleotides were selected as potential off-target sites. Off-target mutation in HEK293 cells (indel frequency, %) was measured by targeted deep sequencing.
  • DNMT1- 960 504 TTTCCTGATGGTCCATactTG 0.01 0.00 0.00 3 ⁇ 11 Chr3 81 TTgCaC 5%%% X X
  • VEGFA-2
  • VEGFA- Chr 10405 144 T TACt aCCAACTTCTt t GCTGTT 0.022 0.025 0.026 02_02 X 3 CTC 4%%%
  • Lowercase letters indicate mismatched positions.
  • 'Mis-No.' means the number of mismatches.
  • '(-) Cpf' means the control without Cpf1.
  • 'As' and 'Lb' mean AsCpf1 and LbCpf1, respectively.
  • 'D-Cap.' means 'Digenome capture'. Sites whose cleavage score from Digenome sequencing (Example 4) is greater than or equal to the cutoff value (2.5) are marked 'o'; sites below the cutoff are marked 'x'.
  • Indel frequencies shown in Tables 8 to 17 were measured by targeted deep sequencing.
  • Digenome-seq (Example 4) was performed for a total of eight highly efficient Cpf1 targets, using crRNAs for target sequences 1-8 in Table 6. Cell-free genomic DNA was isolated from HeLa cells with the DNeasy Tissue kit (Qiagen). It was digested with high concentrations (300 nM Cpf1 and 900 nM crRNA) of AsCpf1 and LbCpf1 ribonucleoproteins (RNPs) prepared by the method of Example 3, then subjected to whole-genome sequencing (WGS; see Example 4). For comparison, the same test was performed with SpCas9.
  • The cleavage scores obtained with AsCpf1 and LbCpf1 are shown in FIGS. 21A (DNMT1-3) and 21B (DNMT1-4) and in Tables 18 to 33.
  • Chromosome | Location | Site | Score | Bulge
    chr5 | 13135736 | TTTCCTGATGGTCCAcacCTGmaca | 13.20 | No
    chr8 | 112204853 | TTTCCTGATGGTCCAcacCTGmaga | 12.38 | No
    chr19 | 10244444 | mCCTGATGGTCCATGTCTGTTACTC | 11.97 |
  • Chromosome | Location | Site | Score | Bulge
    chr19 | 10244367 | TTTArrTCCCTOAGCTAAAATAAAGG | 6.86 | No
  • Chromosome | Location | Site | Score | Bulge
    chr2 | 73160921 | nTGTCCTCCGGTTCTGGMCCACACC | 12.19 | No
    chr2 | 177017501 | TOATCCTCCGGTOTGGAACCAgAtC | 8.08 | No
    chr17 | 46690720 | TCATCCTCCGGTTCTGGAACCAgAtt | 4.71 | No
    chr6 | 134409314 | TTTCTCCTCaGGTOTGGAACCAataC | 3.77 | No
  • Chromosome | Location | Site | Score | Bulge
    chrX | 133609321 | TTTGCTGACCTGCTGGAmCATCAM | 4.15 | No
    chr11 | 93732147 | TTTGCTGACCTGCTaGATaACATCAAA | 3.91 | No
    chr5 | 30248701 | TTTGCTcACCTGCTGGAmCATCAAA | 2.91 | No
  • chr2 34206860 9.31 Bulge
    chr4 | 96317122 | TTTCCTTAtGATGaAGCCAGAGAaGcT | 5.26 | No
    chr16 | 34823594 | TTTACaTAaGATGAAaCCAGAGAGaAa | 4.34 | No
    chr19 | 55626945 | ITTGCmCGATGGAGCCAGAGAGGAT | 2.63 | No
  • Chromosome | Location | Site | Score | Bulge
    chr12 | 17538224 | TTTACTGATGGTCttacTtTaTaggcC | 15.78 | No
    chr7 | 134517009 | TCTCCTGATGGTCCATacCTGTTAaca | 14.35 | No
    chr5 | 13135739 | mCCTGATGGTCCAcacCTGTTAaca | 13.65 | No
    chr9 | 25518292 | TCTCCTGATGGTCtATaTCTGTTAaa | 12.61 | No
    chr5 | 39969440 | TCTCCTGATGGTCCATacCTGTTAacg | 12.11 | No
    chr8 | 112204856 | TTTCCTGATGGTCCAcacCTGmaga | 12.05 | No
    chr11 | 82700148 | TTTACTGATGGTCtcatTtaaTcttTa | 11.02 | No
    chr3 | 164692191 | TTTCCTGATGGTCCAcacCTG TAaca | 10.97 | No
    chr4 | 123785685 | TTTCCTGATGGTC
  • Chromosome | Location | Site | Score | Bulge
    chr2 | 73160922 | TTTGTCCTCCGG CTGGAACCACACC | 7.57 | No
    chr2 | 177017500 | TTCATCCTCCGGTTCTGGAACCAgAtC | 6.59 | No
    chr6 | 134409310 | TTTCTCCTCaGGTOTGGAACCAataC | 4.44 | No
  • Chromosome | Location | Site | Score | Bulge
    chr3 | 46414552 | nTTGTGGGCAACATGCTGGTCATCCT | 18.09 | No
    RNA
  • chr15 58588554 5.11 Bulge
    chr4 | 110395952 | TTTAGTGGGCAAaccatTtacaAaata | 4.19 | No
    chr1 | 72141686 | T TGGTaGGtAACATGgTGGaagTCaa | 4.18 | No
    chr15 | 24068708 | TTTTGTGGGCAACATatataTaggtcT | 3.81 | No
  • Chromosome | Location | Site | Score | Bulge
    chr11 | 93732153 | TTTGCTGACCTGCTaGATaACATCAAA | 48.57 | No
    chr5 | 30248702 | TTTGCTcACCTGCTGGAmCATCAAA | 27.49 | No
    chr6 | 49794715 | TTTCCTGACCTGCTa ATa t at cacAA | 8.55 | No
    chrX 133609322CT 6.TTGCTGACTCAT
  • Chromosome | Location | Site | Score | Bulge
    chrX | 133620495 | mATGTCCCCTGTTGACTGGTCATTC | 12.93 | No
    chr11 | 93732073 | mATaTCCCCTGTTGACTGGTCATTa | 7.92 | No
    chr5 | 161040022 | mATGTCCCCTcTTGcCTGGTCATaa | 4.46 | No
    Table 33
  • FIGS. 22A-22F show sequence logos of Cpf1-mediated Digenome-captured sites. In each figure, the top logo is for sites captured with AsCpf1 and the bottom logo is for sites captured with LbCpf1.
  • The 50 and 98 in vitro cleavage sites obtained with the eight LbCpf1 and AsCpf1 nucleases, respectively, carry mismatches. Most of these mismatches lie in the PAM-distal region about 13 nt from the PAM, rather than in the PAM-proximal region within 10 nt of the PAM.
  • The results are shown in FIG. 21C.
  • (Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics. 2014 May 15;30(10):1473-5). Only 0.9% of the identified homologous sites with 5 or 6 mismatches were cleaved in vitro. Homologous sites with four or fewer mismatches are more likely to be cleaved and captured by Digenome-seq, but such sites are rare in the human genome (6 ± 2 such sites per crRNA).
  • LbCpf1 showed higher specificity than AsCpf1 or Cas9 (see FIG. 21A). For some target sites, both LbCpf1 and AsCpf1 cleaved only the on-target site within the human genome (see FIGS. 21B and 23). FIG. 23 shows sequence logos of Digenome-captured sites, generated with WebLogo.
  • FIG. 21D shows off-target sites identified in human cells by targeted deep sequencing, together with the DNA sequences of the on-target and off-target sites. PAM sequences are in bold and mismatched nucleotides are in lowercase. FIGS. 24A-24F show indel frequencies at Digenome-captured sites in HEK293T17 cells.
  • Dark bars show results from HEK293T17 cells transfected with the LbCpf1 plasmid; light bars show results with the AsCpf1 plasmid.
  • The off-target index (OTI) was calculated as the sum of the indel frequencies at validated off-target sites divided by the on-target indel frequency.
  • For the two DNMT1 sites, the OTIs of LbCpf1 were 0.005 and 0.012 and the OTIs of AsCpf1 were 0.267 and 0.024, respectively.
  • FIG. 21E is a graph showing targeted mutagenesis (indel frequency, %) at AsCpf1 off-target sites. These indels were obtained with crRNAs redesigned to match the off-target sites.
  • The OT6 site contains an atypical 5'-TCTN-3' PAM sequence. crRNAs specific for the OT6 and OT2 sites (which differ by only one nucleotide at the 3' end) induced indels at the OT6 site at frequencies of 3.7% and 8.1%, respectively. These results show that Cpf1 can cleave chromosomal target sites with atypical PAM sequences, which extends the scope of Cpf1-mediated genome editing.
  • Cpf1 RNPs were also tested by transfection into human cells. Like Cas9 RNPs, Cpf1 RNPs cleave the target site immediately after delivery and are then degraded by endogenous proteases and RNases. This reduces off-target effects without compromising on-target activity. Indeed, at several off-target sites validated with plasmids, Cpf1 RNPs did not induce indels above noise levels (see FIG. 21F).
  • FIG. 21F is a graph comparing Cpf1 off-target effects between a plasmid encoding Cpf1 and crRNA and an RNP in which Cpf1 and crRNA form a complex.
  • The specificity ratio is the fold difference (RNP/plasmid) in the ratio of on-target to off-target indel frequency (OTI), comparing Cpf1 RNPs with plasmids. These results show that RNPs markedly reduce off-target effects compared with plasmids.
  • The OTI was lower than 0.0004 with both AsCpf1 RNPs and LbCpf1 RNPs, showing that these RNPs have almost no off-target effect.
  • Example 11. Off-target effect measurement using crRNAs truncated at the 3' end
  • Truncated crRNAs (tru-crRNAs) were designed by shortening the crRNA targeting sequence from its 3' end, giving targeting sequences of 22, 20, 18, and 16 nt.
  • The tru-crRNAs target the sequence located immediately 3' of the PAM sequence (5'-TTTC-3') in the DNMT1- target site of SEQ ID NO: 29 (TTTCCTGATGGTCCATGTCTGTTACTC).
  • That is, the crRNA targeting sequence is the consecutive 22-, 20-, 18-, or 16-nt sequence located 3' of the PAM (5'-TTTC-3') in SEQ ID NO: 29, with T replaced by U.
  • Each tru-crRNA, or the full-length crRNA, was transfected into HEK293T cells together with an AsCpf1 expression plasmid using Lipofectamine 2000. The full-length crRNA's targeting sequence is the 23-nt sequence of SEQ ID NO: 29 excluding the PAM, with T replaced by U.
  • After 72 hours, genomic DNA was isolated and targeted deep
  • Genomics Viewer can be used to easily represent overhang patterns at the cleavage site.
  • FIG. 26A shows the DNMT1-?> target site (SEQ ID NO: 19) and the DNMT1-4 target
  • Overhang was produced but not 2-nt overhang, whereas AsCpf1 produced 2- to 4-nt overhangs at the 5' end of the cleavage site.
  • Cas9 produced blunt ends or 1-nt 5' overhangs at the cleavage site.
  • DNMT-2-2 target site: SEQ ID NO: 19
  • DNMT1-4 target site: as above
  • FIG. 26B is a graph showing the number of variant sequence reads binned by deletion / insertion size in base pairs.
  • FIG. 26C shows mutant sequences induced at the Cpf1 or Cas9 target sites. For each nuclease, the first line shows the original target sequence and the following lines show mutated sequences. In the first line, the PAM sequence (Cpf1: TTTC) is in bold and the target sequence bound by the crRNA/sgRNA is underlined.
  • From the second line onward, underlined sequences indicate microhomology sequences. The numbers on the right give the numbers of deleted (shown as '-') or inserted (shown in lowercase) bases.
  • FIGS. 27A and 27B show mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9.
  • FIG. 27A is a graph showing the number of mutant sequence reads binned by indel size in base pairs. Mutation characteristics were determined by targeted deep sequencing of HEK293T cells transfected with LbCpf1, AsCpf1, or SpCas9 plasmids.
  • FIG. 27B shows mutant sequences induced at the EMX1-2 target site (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 42). For each nuclease, the first line is the original target sequence and the following lines are mutated sequences. In the first line, the PAM sequence (Cpf1: TTTG) is in bold and the target sequence bound by the crRNA/sgRNA is underlined. From the second line onward, underlined sequences indicate microhomology sequences. The numbers on the right give the numbers of deleted (shown as '-') or inserted (shown in lowercase) bases.
  • FIGS. 26D and 26E show mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9; FIG. 26D compares deletion and insertion sequences.
  • Example 13. Genome editing by microinjecting Cpf1-crRNA RNPs into mouse embryos to generate sequence-specific mutations at target sites
  • Recombinant Acidaminococcus sp. BV3L6 Cpf1 (AsCpf1) protein was expressed in and purified from E. coli (see Example 1). crRNAs targeting the mouse FoxN1 gene (see SEQ ID NOs: 1 to 3) were constructed and combined with the protein to form RNPs (AsCpf1 protein 200 ng/µl, crRNA 100 ng/µl). The crRNAs were constructed by the method described in Table 4, based on the target sequences of SEQ ID NO: 2 and SEQ ID NO: 3.
  • The prepared RNPs were delivered into mouse embryos by microinjection (see FIG. 1). The injected embryos were cultured to the blastocyst stage, and gDNA was purified to examine the nucleotide sequence.
  • The results of the T7E1 assay are shown in FIG. 2. As shown in FIG. 2, 10 of 12 blastocysts (83%) showed nucleotide sequence mutations (marked with asterisks).
  • Mice were born from embryos injected with Cpf1 RNPs. gDNA purified from their tails was analyzed by targeted deep sequencing to determine whether specific (on-target) and non-specific (off-target) sequence mutations were present in the individual animals.
  • Recombinant AsCpf1 or SpCas9 protein: 100 ng/µl
  • sgRNA (500 ng/µl; prepared as described in Table 5 based on the target sequence of SEQ ID NO: 6) or crRNA (250 ng/µl; prepared as described in Table 4 based on the target sequence of SEQ ID NO: 2 or 3)
  • Opti-MEM (Thermo)
  • 50 mouse embryos were added, and electroporation was performed with a NEPA21 electroporator (NEPA GENE Co., Ltd.).
  • Electroporation used two pulse sets:
    - Poring pulses: 225 V, 1.5 ms pulse length, 50 ms interval, 4 pulses, 10% decay rate, + polarity.
    - Transfer pulses: 20 V, 50 ms pulse length, 50 ms interval, 5 pulses, 40% decay rate, +/- polarity.
    SpCas9 was tested first: RNPs were formed from SpCas9 and an sgRNA targeting VEGFA and electroporated into mouse embryos. The embryos were cultured to the blastocyst stage and gDNA was purified. Nucleotide sequence mutations were then analyzed by the T7E1 assay and targeted deep sequencing (see Figs.).
  • Blastocyst analysis showed that SpCas9 RNPs delivered by electroporation induced efficient genome editing. Mutations were detected in 12 of 15 blastocysts (all lanes except 8, 13, and 15), an efficiency of 80%.
  • FAD2 homologous genes Glyma10g42470 and C TTTA- (SEQ ID NO: 15)
  • crRNAs were constructed by the method described in Table 4, based on the obtained target sequences.
  • Plant protoplasts: 300 µl of a polyethylene glycol (PEG) solution (40 mg PEG 4000, 0.2 M mannitol, and 0.1 M CaCl2) with an equal volume of MMG solution (0.4 M mannitol, 15 mM MgCl2)
  • Recombinant AsCpf1 (or LbCpf1) protein (40 µg per 2 × 10^5 protoplasts) and crRNA (80 µg per 2 × 10^5 protoplasts) were premixed. The mixture was combined with 2 × 10^5 soybean protoplasts to deliver RNPs into the plant cells (FIG. 11).
  • The transfected protoplasts were cultured in W5 solution (2 mM MES [pH 5.7], 154 mM NaCl, 125 mM CaCl2, 5 mM KCl) for 24 hours. gDNA was then isolated to confirm genetic modification at the target gene.
  • Cpf1 protein is more targeted than artificial nucleases
  • In this example, a split-Cpf1 system was constructed.
  • The wild-type (WT) AsCpf1 protein (SEQ ID NO: 43)
  • Viral packaging capacity is limited. It is therefore difficult to package an expression cassette containing everything needed for AsCpf1 expression and nuclear import: the promoter sequence (CMV promoter; SEQ ID NO: 64), a nuclear localization signal (KRPAATKKAGQAKKKK), a poly(A) signal, and so on. To reduce the size of the expression cassette, the AsCpf1 protein was expressed as two fragments, and four types of split-AsCpf1 were designed.
  • WT AsCpf1 (SEQ ID NO: 43) was divided into two fragments at one of four positions (see FIG. 29A):
    - Split-1-AsCpf1: between amino acids 901 and 902.
    - Split-2-AsCpf1: between amino acids 886 and 887.
    - Split-3-AsCpf1: after amino acid 399.
    - Split-4-AsCpf1: between amino acids 526 and 527.
  • GGCCG ATCGGGAAGGCCCGCAGGAAAACCGGCCGAAAATAT AAAAGCAATCCTT
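The off-target tables above do three things. They mark mismatches in lowercase, count them in the 'Mis-No.' column, and flag Digenome-captured sites with 'o' when the cleavage score is at least 2.5. The Python sketch below shows that bookkeeping. The function names are hypothetical and bulge-containing sites are not handled. Positions are counted from the PAM-proximal end, assuming each protospacer is written 5' to 3' starting next to the Cpf1 PAM.

```python
from __future__ import annotations

# Cutoff quoted in the table legend for marking Digenome-captured sites.
DIGENOME_CUTOFF = 2.5


def mismatch_positions(on_target: str, site: str) -> list[int]:
    """Return 1-based positions where `site` differs from `on_target`.

    Assumes both protospacers are written 5'->3' starting next to the Cpf1 PAM,
    so position 1 is the PAM-proximal base. Bulges (unequal lengths) are rejected.
    """
    if len(on_target) != len(site):
        raise ValueError("sequences must have equal length; bulges are not handled")
    return [
        i + 1
        for i, (a, b) in enumerate(zip(on_target.upper(), site.upper()))
        if a != b
    ]


def mark_mismatches(on_target: str, site: str) -> str:
    """Upper-case `site`, with mismatched bases in lower case (as in the tables)."""
    bad = set(mismatch_positions(on_target, site))
    return "".join(
        base.lower() if pos in bad else base.upper()
        for pos, base in enumerate(site, start=1)
    )


def digenome_mark(score: float, cutoff: float = DIGENOME_CUTOFF) -> str:
    """'o' if the Digenome cleavage score reaches the cutoff, otherwise 'x'."""
    return "o" if score >= cutoff else "x"


if __name__ == "__main__":
    on_target = "ACGTACGT"  # toy sequences, not taken from the tables
    candidate = "ACGAACGA"
    print(len(mismatch_positions(on_target, candidate)))  # 'Mis-No.' -> 2
    print(mark_mismatches(on_target, candidate))          # ACGaACGa
    print(digenome_mark(2.63), digenome_mark(2.4))        # o x
```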
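The off-target index (OTI) and the RNP-versus-plasmid specificity ratio described above reduce to a few lines of arithmetic. The sketch below makes two assumptions: indel frequencies are given in percent, and the off-target list holds only validated sites. Because on/sum(off) equals 1/OTI, the specificity ratio also equals OTI(plasmid)/OTI(RNP). The numbers in the example are illustrative, not measured values.

```python
from __future__ import annotations


def off_target_index(off_target_indels: list[float], on_target_indel: float) -> float:
    """OTI = (sum of indel frequencies at validated off-target sites) / on-target indel frequency."""
    if on_target_indel <= 0:
        raise ValueError("on-target indel frequency must be positive")
    return sum(off_target_indels) / on_target_indel


def specificity_ratio(
    rnp_on: float, rnp_off: list[float], plasmid_on: float, plasmid_off: list[float]
) -> float:
    """Fold difference (RNP / plasmid) in the on-target:off-target indel ratio."""
    if sum(rnp_off) <= 0 or sum(plasmid_off) <= 0:
        # With no off-target signal the ratio is unbounded; report that case separately.
        raise ValueError("off-target indel frequencies must sum to a positive value")
    rnp_ratio = rnp_on / sum(rnp_off)
    plasmid_ratio = plasmid_on / sum(plasmid_off)
    return rnp_ratio / plasmid_ratio


if __name__ == "__main__":
    # Illustrative values (percent), not data from the document.
    print(off_target_index([0.2, 0.1], 30.0))            # ~0.01
    print(specificity_ratio(20.0, [0.02], 40.0, [0.4]))  # ~10
```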
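The truncated crRNAs of Example 11 keep the PAM-proximal part of the 23-nt targeting sequence and drop bases from its 3' end. The sketch below derives only the targeting (spacer) portion, with T replaced by U. The crRNA's constant repeat region would be built as described in Table 4, which is not reproduced here. The protospacer is the 23-nt sequence CTGATGGTCCATGTCTGTTACTC quoted in the text, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def crrna_targeting_sequence(protospacer: str, length: Optional[int] = None) -> str:
    """Return the crRNA targeting sequence for a protospacer written 5'->3' from the PAM.

    T is replaced by U. If `length` is given, bases are removed from the 3' end,
    so the PAM-proximal `length` nucleotides are kept (tru-crRNA design).
    """
    seq = protospacer.upper()
    if length is not None:
        if not 0 < length <= len(seq):
            raise ValueError("length must be between 1 and the protospacer length")
        seq = seq[:length]
    return seq.replace("T", "U")


if __name__ == "__main__":
    protospacer = "CTGATGGTCCATGTCTGTTACTC"  # 23 nt, as quoted in the text
    print("full:", crrna_targeting_sequence(protospacer))
    for n in (22, 20, 18, 16):
        print(f"tru-{n}:", crrna_targeting_sequence(protospacer, n))
```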
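The microinjection mix is given as mass concentrations (200 ng/µl AsCpf1, 100 ng/µl crRNA), while Digenome-seq used molar conditions (300 nM Cpf1, 900 nM crRNA). Converting with a molecular weight lets the two be compared. The estimates below are assumptions, not values from the document: about 110 Da per amino acid, about 320.5 Da per RNA nucleotide plus 159 Da, and hypothetical lengths of 1,300 residues for the protein and 43 nt for the crRNA. The actual molecular weights of the constructs used should be substituted.

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, mw_g_per_mol: float) -> float:
    """Convert a mass concentration to nanomolar.

    1 ng/ul = 1e-3 g/L; dividing by the molecular weight gives mol/L,
    and multiplying by 1e9 gives nM, so nM = ng/ul * 1e6 / MW.
    """
    return conc_ng_per_ul * 1e6 / mw_g_per_mol


def protein_mw_estimate(n_residues: int, avg_residue_da: float = 110.0) -> float:
    """Rough protein molecular weight (assumes ~110 Da per residue)."""
    return n_residues * avg_residue_da


def ssrna_mw_estimate(n_nt: int) -> float:
    """Rough single-stranded RNA molecular weight: n * 320.5 + 159 Da."""
    return n_nt * 320.5 + 159.0


if __name__ == "__main__":
    # Hypothetical sizes; replace with the real construct lengths or measured MWs.
    protein_nM = ng_per_ul_to_nM(200, protein_mw_estimate(1300))
    crrna_nM = ng_per_ul_to_nM(100, ssrna_mw_estimate(43))
    print(f"protein ~{protein_nM:.0f} nM, crRNA ~{crrna_nM:.0f} nM, "
          f"crRNA:protein ~{crrna_nM / protein_nM:.1f}:1")
```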
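The NEPA21 settings given for embryo electroporation can be written as two pulse sets, and the sketch below expands each set into a per-pulse schedule. How the device applies the 'decay rate' and 'interval' is an assumption here. Each pulse is taken to be decay % lower than the previous one, and the interval is taken as the gap between the end of one pulse and the start of the next. Polarity switching is recorded but not modelled, so the instrument manual should be checked before relying on this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PulseSet:
    name: str
    voltage_v: float    # voltage of the first pulse
    length_ms: float
    interval_ms: float  # ASSUMPTION: off-time between the end of one pulse and the next
    count: int
    decay_pct: float    # ASSUMPTION: each pulse is decay_pct % below the previous one
    polarity: str       # recorded as given; polarity switching is not modelled

    def schedule(self) -> list[tuple[float, float]]:
        """Return (start time in ms, voltage in V) for each pulse, starting at t = 0."""
        pulses = []
        t, v = 0.0, self.voltage_v
        for _ in range(self.count):
            pulses.append((t, v))
            t += self.length_ms + self.interval_ms
            v *= 1 - self.decay_pct / 100
        return pulses


# Settings quoted in the text.
PORING = PulseSet("poring", 225, 1.5, 50, 4, 10, "+")
TRANSFER = PulseSet("transfer", 20, 50, 50, 5, 40, "+/-")

if __name__ == "__main__":
    for ps in (PORING, TRANSFER):
        for t, v in ps.schedule():
            print(f"{ps.name}: t={t:.1f} ms, {v:.1f} V ({ps.polarity})")
```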
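The W5 solution used to culture the transfected protoplasts is specified in millimolar terms. The sketch below converts the quoted composition into grams for a given volume. The molar masses are standard values for the reagent forms assumed here (MES free acid, CaCl2 dihydrate); other hydrates would change the masses. pH adjustment to 5.7 is not modelled.

```python
from __future__ import annotations

# Molar masses (g/mol) for the reagent forms assumed here.
MOLAR_MASS = {
    "MES (free acid)": 195.24,
    "NaCl": 58.44,
    "CaCl2 dihydrate": 147.01,
    "KCl": 74.55,
}

# W5 composition as quoted in the text, in mM (pH 5.7 adjusted separately).
W5_MM = {"MES (free acid)": 2, "NaCl": 154, "CaCl2 dihydrate": 125, "KCl": 5}


def grams_needed(recipe_mm: dict[str, float], volume_l: float) -> dict[str, float]:
    """mass (g) = concentration (mol/L) x volume (L) x molar mass (g/mol)."""
    return {
        reagent: conc_mm / 1000 * volume_l * MOLAR_MASS[reagent]
        for reagent, conc_mm in recipe_mm.items()
    }


if __name__ == "__main__":
    for reagent, grams in grams_needed(W5_MM, 1.0).items():
        print(f"{reagent}: {grams:.3f} g per litre")
```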
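The four split-AsCpf1 designs cut the wild-type protein into an N-terminal and a C-terminal fragment at the positions listed above. Each fragment therefore has a shorter coding sequence than the full-length protein. The sketch below computes fragment lengths and coding lengths, with some simplifications:
- It uses a placeholder sequence, because SEQ ID NO: 43 is not reproduced here.
- It treats Split-3 as a cut after residue 399.
- It ignores the promoter, NLS, poly(A) signal, and any added tags.

```python
from __future__ import annotations

# Split points quoted in the text: the cut falls after residue k (1-based),
# i.e. between residues k and k + 1.
SPLIT_AFTER = {
    "Split-1-AsCpf1": 901,
    "Split-2-AsCpf1": 886,
    "Split-3-AsCpf1": 399,
    "Split-4-AsCpf1": 526,
}


def split_protein(seq: str, after_residue: int) -> tuple[str, str]:
    """Split a protein sequence into N- and C-terminal fragments after `after_residue`."""
    if not 0 < after_residue < len(seq):
        raise ValueError("split point must fall inside the sequence")
    return seq[:after_residue], seq[after_residue:]


def coding_length_nt(n_residues: int, add_stop: bool = True) -> int:
    """Nucleotides needed to encode `n_residues` amino acids (+3 for a stop codon)."""
    return 3 * n_residues + (3 if add_stop else 0)


if __name__ == "__main__":
    placeholder = "M" + "A" * 1299  # hypothetical 1,300-residue stand-in for SEQ ID NO: 43
    for name, k in SPLIT_AFTER.items():
        n_frag, c_frag = split_protein(placeholder, k)
        print(f"{name}: N {len(n_frag)} aa ({coding_length_nt(len(n_frag))} nt), "
              f"C {len(c_frag)} aa ({coding_length_nt(len(c_frag))} nt)")
```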

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Medicinal Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The present invention relates to a genome editing composition comprising Cpf1, a genome editing method using the same, and a technique for preparing transformed eukaryotic organisms.

Description

[Description of Invention]
[Title of Invention]
Genome editing composition comprising Cpf1, and use thereof
[Technical Field]
The present invention relates to a composition for genome editing comprising Cpf1, a genome editing method using the same, and a technique for producing transformed eukaryotic organisms.
[Background Art]
Producing genome-edited animals and plants has taken considerable time and effort. It has also been difficult because many reagents must be prepared separately for each target gene. Recently, the type II CRISPR-Cas9 system has come into wide use in a variety of ways; it efficiently cleaves target genes through the combination of a Cas9 protein and a guide RNA. Besides the most widely used Cas9, derived from S. pyogenes, orthologous Cas9 proteins from other species are also being developed for use as genetic scissors. This technology is faster and more efficient than conventional methods for producing mutants, and only a guide RNA needs to be prepared for each target gene.
The Cas9 system has many advantages, but it also has limitations. The most notable is that the target DNA must contain a sequence called a protospacer adjacent motif (PAM). S. pyogenes Cas9 and the other Cas9 proteins that have recently come into use all recognize a PAM located 3' of the target sequence. The widely used S. pyogenes Cas9 recognizes a 3' NGG PAM at the target site, so it cannot be used for targets that lack this sequence. Another feature of Cas9 systems such as S. pyogenes Cas9 is that a single protein has two nuclease domains, which cut both strands of the target DNA to leave blunt ends. With blunt ends, gene knock-out via insertions and deletions (indels) generated by non-homologous end joining (NHEJ) is efficient. Knock-in by homologous recombination (HR), however, is inefficient.
Meanwhile, for genome editing with the CRISPR-Cas9 system, microinjection of CRISPR-Cas9 ribonucleoprotein (RNP) into embryos has been reported. This method reliably delivers the RNP to the embryo. Its drawback is that each embryo must be processed one at a time under a microscope.
[Substitute sheet (Rule 26) RO/KR]
Processing a large number of embryos one after another takes a long time. This is a technical obstacle because embryos remain at the one-cell stage only briefly. There is therefore a need for an efficient genome editing technology that can overcome the limitations of the CRISPR-Cas9 system and replace it. There is also a need for a technology that delivers RNPs into cells effectively enough to support it.
[Content of Invention]
[Problem to Be Solved]
Provided herein is a technique for editing genomes in eukaryotic organisms, such as animals and plants, using the type V CRISPR-Cpf1 system. This system employs Cpf1 and can compensate for the shortcomings of the type II CRISPR-Cas9 system.
One example provides a complex comprising a Cpf1 protein or DNA encoding the same, and a guide RNA or DNA encoding the same.
Another example provides a composition for genome editing comprising a Cpf1 protein or DNA encoding the same, and a guide RNA or DNA encoding the same.
Another example provides a genome editing method using a Cpf1 protein or DNA encoding the same, and a guide RNA or DNA encoding the same.
The complex, composition, or method may include or use a Cpf1 protein or DNA encoding it, and a guide RNA or DNA encoding it. These may be used in one of three forms:
- a mixture comprising the Cpf1 protein and the guide RNA;
- a ribonucleoprotein (RNP) in which the Cpf1 protein and the guide RNA form a complex; or
- DNA encoding the Cpf1 protein and DNA encoding the guide RNA, contained in separate vectors or together in one vector.
The compositions and methods may be applied to eukaryotic organisms, which may be selected from the group consisting of:
- eukaryotic cells, e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or plants (such as embryonic cells, stem cells, somatic cells, and germ cells);
- eukaryotic animals, e.g., primates such as humans and monkeys, dogs, pigs, cattle, sheep, goats, mice, and rats; and
- eukaryotic plants, e.g., algae such as green algae, corn, soybean, wheat, and rice.
Another example provides a method for producing a transformed organism by genome editing using a Cpf1 protein or DNA encoding the same and a guide RNA or DNA encoding the same.
Another example provides a transformed organism produced by the method for producing a transformed organism. The transformed organism may be selected from the group consisting of:
- all eukaryotic cells, e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or plants (such as embryonic cells, stem cells, somatic cells, and germ cells);
- eukaryotic animals, e.g., primates such as humans and monkeys, dogs, pigs, cattle, sheep, goats, mice, and rats; and
- eukaryotic plants, e.g., algae such as green algae, corn, soybean, wheat, and rice.
Another example provides a method of delivering an RNA-guided endonuclease (RGEN) or DNA encoding it, together with a guide RNA or DNA encoding it, to an organism. The method is characterized by using local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection, or the like.
[Means for Solving the Problem]
As one way to overcome the limitations of the type II CRISPR-Cas9 system, a technique using Cpf1, a type V CRISPR system protein, is provided herein.
Cpf1 is a type V CRISPR system protein. Like Cas9, the type II CRISPR system protein, a single Cpf1 protein binds a crRNA and cleaves the target gene, but the two work in very different ways. In particular, the Cpf1 protein works with a single crRNA. It is therefore unnecessary to use both crRNA and trans-activating crRNA (tracrRNA), as with Cas9, or to build an artificial single guide RNA (sgRNA) combining tracrRNA and crRNA. Unlike Cas9, the PAM of the Cpf1 system lies 5' of the target sequence, and the guide RNA that determines the target is shorter than that of Cas9. These features let Cpf1 edit target sequences where Cas9 cannot be used, and its guide RNA (crRNA) is relatively easy to produce compared with Cas9's. Furthermore, Cpf1 leaves a 5' overhang (sticky end) rather than a blunt end at the cleavage site of the target DNA, which allows more accurate and diverse genome editing.
Provided herein is a technique for editing a target genome more conveniently, accurately, and effectively using the Cpf1 system.
As used herein, unless otherwise stated, the term 'genome editing' means the loss, alteration, and/or restoration (correction) of gene function. This is caused by deletion, insertion, substitution, or the like of one or more nucleotides (e.g., 1-100,000 bp, 1-10,000 bp, 1-1000 bp, 1-100 bp, 1-70 bp, 1-50 bp, 1-30 bp, 1-20 bp, or 1-10 bp) following cleavage at a target site of a target gene.
According to one embodiment, the type V CRISPR-Cpf1 system using the Cpf1 protein can cleave target DNA at a desired position. According to another embodiment, the type V CRISPR-Cpf1 system using the Cpf1 protein can edit specific genes in cells.
An approach is also provided for delivering the CRISPR-Cpf1 ribonucleoprotein (RNP), or DNA encoding it, into cells while overcoming the drawbacks of conventional microinjection. In one example, the genome is edited by delivering the ribonucleoprotein, or DNA encoding it carried on a plasmid, into a large number of cells at once by electroporation, lipofection, or the like. However, genome editing using the Cpf1 system is not limited thereto.
The CRISPR-Cpf1 ribonucleoprotein may be introduced into a cell or organism in several forms. These include a recombinant vector comprising DNA encoding Cpf1 together with a recombinant vector comprising DNA encoding the crRNA, a mixture comprising the Cpf1 protein and crRNA, or a ribonucleoprotein in which they form a complex.
One example provides a composition for genome editing comprising a ribonucleoprotein. The ribonucleoprotein comprises a Cpf1 protein or DNA encoding it and a guide RNA (CRISPR RNA; crRNA) or DNA encoding it.
Another example provides a method of genome editing in an organism, comprising delivering to the organism a ribonucleoprotein that comprises a Cpf1 protein and a guide RNA (CRISPR RNA; crRNA). The composition or method may include or use a Cpf1 protein or DNA encoding it, and a guide RNA or DNA encoding it, in one of three forms:
- a mixture comprising the Cpf1 protein and the guide RNA;
- a ribonucleoprotein (RNP) in which the Cpf1 protein and the guide RNA form a complex; or
- DNA encoding the Cpf1 protein and DNA encoding the guide RNA, contained in separate vectors or together in one vector.
싱-기 조성물 및 방법은 진핵 유기체에 적용되는 것일 수 있다. 상기 진핵 유기체는 진핵 세포 (예컨대, 효모 등의 균류, 진핵 동물 및 /또는 진핵 식물 유래 세포 (예컨대, 배아세포, 줄기세포, 체세포, 생식세포 등) 등), 진핵 동물 (예컨대 척추동물 또는 무척추동물, 보다 구체적으로, 인간, 원승이 등의 영장류, 개, 돼지 소, 양, 염소, 마우스, 래트 등을 포함하는 포유류 등), 및 진핵 식물 (예컨대, 녹조류 등의 조류 옥수수, 콩, 밀, 벼 등의 단자엽 또는 쌍자엽 식물 등)로 이루어진 군에서 선택된 것일 수 있다.  The single-group compositions and methods can be applied to eukaryotic organisms. The eukaryotic organism may be eukaryotic cells (e.g., yeast, eukaryotic and / or eukaryotic plant-derived cells (e.g. embryos, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g. vertebrates or invertebrates). More specifically, humans, mammals including primates such as pigs, dogs, pigs, sheep, goats, mice, rats, etc.), and eukaryotic plants (e.g., bird corn such as green algae, soybeans, wheat, rice, etc.). It may be selected from the group consisting of monocotyledonous or dicotyledonous plants, such as).
다른 예는 Cpf l 단백질을 이용한 유전체 교정에 의한 형질 전환유기체의 제조 방법을 제공한다. 보다 구체적으로, 상기 형질 전환유기체의 제조 방법은 Cpfl 단백질 또는 이를 암호화하는 DNA 및 가이드 RNA (CRISPR RNA; crRNA) 또는 이를 암호화하는 DNA를 진핵 세포에 전달하는 단계를 포함할 수 있다. 상기 형질 전환 유기체가 형질전환 진핵 동물 또는 형질전환 진핵 식물인 경우, 상기 제조 방법은 상기 전달하는 단계와 동시 또는 그 이후에 상기 진핵 세포의 배양 및 /또는 분화 단계를 추가로 포함할 수 있다.  Another example provides a method for producing a transgenic organism by genome correction using a Cpf l protein. More specifically, the method for producing a transgenic organism may include delivering a Cpfl protein or DNA encoding the same and guide RNA (CRISPR RNA; crRNA) or DNA encoding the same to eukaryotic cells. If the transgenic organism is a transgenic eukaryotic animal or transgenic eukaryotic plant, the preparation method may further comprise culturing and / or differentiating the eukaryotic cells simultaneously with or after the delivering.
Another example provides a transgenic organism produced by the above method for producing a transgenic organism.
The transgenic organism may be selected from the group consisting of any eukaryotic cell (e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryonic cells, stem cells, somatic cells, and germ cells), eukaryotic animals (e.g., vertebrates or invertebrates; more specifically, mammals including primates such as humans and monkeys, dogs, pigs, cattle, sheep, goats, mice, and rats), and eukaryotic plants (e.g., algae such as green algae, and monocotyledonous or dicotyledonous plants such as corn, soybean, wheat, and rice).
In the genome editing method and the method for producing a transgenic organism provided herein, the eukaryotic animal may be a non-human animal, and the eukaryotic cell may include cells isolated from eukaryotic animals, including humans.
As used herein, the term "ribonucleoprotein" refers to a protein-ribonucleic acid complex comprising the Cpf1 protein, which is an RNA-guided endonuclease, and a guide RNA (crRNA).
The Cpf1 protein is the endonuclease of a new CRISPR system distinct from the CRISPR/Cas system described above. Compared with Cas9 it is relatively small, it does not require a tracrRNA, and it can act with a single guide RNA. In addition, the Cpf1 protein recognizes, as its PAM (protospacer-adjacent motif), a thymine-rich DNA sequence located at the 5' end of the target, such as 5'-TTN-3' or 5'-TTTN-3' (N is any nucleotide, i.e., a nucleotide having an A, T, G, or C base), and cuts both strands of the DNA to produce cohesive ends (a cohesive double-strand break). The cohesive ends thus produced can facilitate NHEJ-mediated transgene knock-in at the target site (or cleavage site).
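The PAM rule above can be illustrated with a short scan. The sketch below is not from the source: it assumes the protospacer begins immediately 3' of the PAM (0-nt spacing), uses a default protospacer length of 20 nt (within the 15 to 30 nt range given later), and scans only the strand supplied. The function name find_cpf1_targets is hypothetical; a real design tool would also scan the reverse strand.

```python
import re
from typing import List, Tuple


def find_cpf1_targets(dna: str, spacer_len: int = 20,
                      pam: str = "TTTN") -> List[Tuple[int, str, str]]:
    """Return (pam_start, pam_seq, protospacer) for each PAM on this strand.

    Hypothetical helper: assumes the protospacer starts right after the PAM
    (0-nt spacing) and that 'N' in the PAM means any of A, C, G, T.
    """
    dna = dna.upper()
    # A lookahead lets overlapping PAMs (e.g. inside a TTTTT run) all be found.
    pattern = "(?=(" + pam.replace("N", "[ACGT]") + "))"
    hits = []
    for m in re.finditer(pattern, dna):
        start = m.start()
        spacer_start = start + len(pam)
        protospacer = dna[spacer_start:spacer_start + spacer_len]
        if len(protospacer) == spacer_len:  # skip PAMs too near the 3' end
            hits.append((start, m.group(1), protospacer))
    return hits
```

For example, find_cpf1_targets("TTTA" + "ACGT" * 5) returns a single hit at position 0 whose protospacer is the 20 nt following TTTA. Passing pam="TTN" applies the shorter PAM form mentioned above.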
For example, the Cpf1 protein may be derived from the genus Candidatus, Lachnospira, Butyrivibrio, Peregrinibacteria, Acidaminococcus, Porphyromonas, Prevotella, Francisella, Candidatus Methanoplasma, or Eubacterium; for example, it may be derived from a microorganism such as Parcubacteria bacterium (GWC2011_GWC2_44_17), Lachnospiraceae bacterium (MC2017), Butyrivibrio proteoclasticus, Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_K08D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, or Eubacterium eligens, but is not limited thereto. In one embodiment, the Cpf1 protein may be derived from Parcubacteria bacterium (GWC2011_GWC2_44_17), Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, or Eubacterium eligens, but is not limited thereto.
Examples of such Cpf1 proteins are summarized by source microorganism in Table 1 below:
Table 1 (Cpf1 proteins by source microorganism; table image not reproduced)
... method, i.e., it may be non-naturally occurring. The Cpf1 protein may further comprise elements commonly used for delivery into the nucleus of eukaryotic cells (e.g., a nuclear localization signal (NLS)), but is not limited thereto. The Cpf1 protein may be used as a purified protein, as DNA encoding it, or as a recombinant vector comprising that DNA.
The guide RNA may be selected as appropriate for the type of Cpf1 protein with which it is to form a complex and/or the microorganism from which that protein is derived.
In one example, the crRNA used in the Cpf1 system may be represented by the following General Formula 1:
5'-n1-n2-A-U-n3-U-C-U-A-C-U-n4-n5-n6-n7-G-U-A-G-A-U-(N_Cpf1)q-3' (General Formula 1; SEQ ID NO: 60).
In General Formula 1, n1 is absent or is U, A, or G; n2 is A or G; n3 is U, A, or C; n4 is absent or is G, C, or A; n5 is A, U, C, or G, or is absent; n6 is U, G, or C; n7 is U or G; and
N_Cpf1 is a targeting sequence comprising a nucleotide sequence capable of hybridizing with a gene target site, and is determined by the target sequence of the target gene. q is the number of nucleotides it contains and may be an integer from 15 to 30, or an integer within any narrower range whose lower limit is 15, 16, 17, or 18 and whose upper limit is any integer from 20 to 30 (e.g., 15 to 25, 16 to 22, 17 to 20, or 18 to 20). The target sequence of the target gene (the sequence that hybridizes with the crRNA) is the nucleotide sequence of the target site of the target gene consisting of 15 to 30 nucleotides (or a number of nucleotides within any of the narrower ranges above) located adjacent to (e.g., contiguous with) the 3' side of a PAM sequence (5'-TTN-3' or 5'-TTTN-3', where N is any nucleotide having an A, T, G, or C base). In General Formula 1, counting from the 5' end, the five nucleotides at positions 6 to 10 (the 5'-end stem region) and the five nucleotides at positions 15 to 19 (positions 16 to 20 when n4 is present; the 3'-end stem region) are complementary to each other in antiparallel orientation and form a double-stranded structure (stem), and the 3 to 5 nucleotides between the 5'-end stem region and the 3'-end stem region may form a loop structure.
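General Formula 1 can be encoded as a regular expression, which makes it easy to check whether a candidate crRNA fits the formula and to read off its stem, loop, and targeting sequence. The sketch below is a hypothetical encoding, not part of the source: it also allows the optional 1 to 3 extra 5' guanines described in the next paragraph, and it is tested with the 20-nt handle UAAUUUCUACUCUUGUAGAU, which is commonly reported as the AsCpf1 repeat (treat that attribution as an assumption and check it against Table 2).

```python
import re
from typing import Dict, Optional

# Hypothetical regex for General Formula 1:
# 5'-n1-n2-A-U-n3-U-C-U-A-C-U-n4-n5-n6-n7-G-U-A-G-A-U-(N_Cpf1)q-3'
FORMULA_1 = re.compile(
    r"^(?P<extra_g>G{0,3})"                            # optional extra 5' G's
    r"(?P<n1>[UAG]?)(?P<n2>[AG])AU(?P<n3>[UAC])"
    r"(?P<stem5>UCUAC)U"                               # 5'-end stem region
    r"(?P<n4>[GCA]?)(?P<n5>[AUCG]?)(?P<n6>[UGC])(?P<n7>[UG])"
    r"(?P<stem3>GUAGA)U"                               # 3'-end stem region
    r"(?P<spacer>[ACGU]{15,30})$"                      # targeting sequence, q = 15..30
)

RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(RNA_PAIR[base] for base in reversed(seq))


def parse_crrna(crrna: str) -> Optional[Dict[str, object]]:
    """Split a crRNA into the parts of General Formula 1, or return None."""
    s = crrna.upper()
    m = FORMULA_1.match(s)
    if m is None:
        return None
    parts: Dict[str, object] = dict(m.groupdict())
    # Loop = the 3-5 nt between the two stem halves.
    parts["loop"] = s[m.end("stem5"):m.start("stem3")]
    # The stem halves should pair antiparallel, as the text states.
    parts["stems_pair"] = reverse_complement_rna(m.group("stem5")) == m.group("stem3")
    return parts
```

With the example handle and a 20-nt spacer, the parser reports a 4-nt loop (UCUU) and confirms that UCUAC and GUAGA pair antiparallel; a spacer shorter than 15 nt is rejected.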
The crRNA for the Cpf1 protein (e.g., one represented by General Formula 1) may further comprise 1 to 3 guanines (G) at its 5' end.
As used herein, a nucleotide sequence capable of hybridizing with a gene target site means a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity with the nucleotide sequence of the gene target site (the target sequence). This meaning applies throughout unless otherwise stated, and the degree of sequence homology can be determined with conventional sequence comparison tools (e.g., BLAST). For example, a crRNA capable of hybridizing with the target sequence may have a sequence complementary to the corresponding sequence on the strand opposite the strand carrying the target sequence (the target sequence being on the same strand as the PAM sequence). Put differently, the crRNA may comprise, as its targeting sequence, the target sequence written as DNA with T replaced by U.
In this specification, a crRNA may be designated by its target sequence; in that case, even without explicit mention, the crRNA sequence is to be interpreted as the target sequence with T replaced by U.
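The two conventions above (spacer = target DNA with T replaced by U, and "hybridizable" = at least a given percent complementarity to the opposite strand) can be sketched as follows. This is an illustration, not the source's procedure: the function names are hypothetical, and the complementarity score is an ungapped, position-by-position count, whereas the text refers to tools such as BLAST, which also handle gaps.

```python
# Base in the DNA strand that each crRNA base pairs with.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def target_to_spacer(target_dna: str) -> str:
    """crRNA targeting sequence: PAM-strand target DNA with T replaced by U."""
    return target_dna.upper().replace("T", "U")


def percent_complementarity(spacer_rna: str, opposite_strand_dna: str) -> float:
    """Percent of positions where the spacer base-pairs with the opposite strand.

    Both inputs are written 5'->3'. Pairing is antiparallel, so the opposite
    strand is read in reverse while the spacer is read forward (no gaps).
    """
    spacer = spacer_rna.upper()
    opposite = opposite_strand_dna.upper()
    if not spacer or len(spacer) != len(opposite):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        1 for r, d in zip(spacer, reversed(opposite)) if RNA_TO_DNA_PARTNER[r] == d
    )
    return 100.0 * matches / len(spacer)


def is_hybridizable(spacer_rna: str, opposite_strand_dna: str,
                    threshold: float = 50.0) -> bool:
    """Apply the broadest definition in the text (>= 50% by default)."""
    return percent_complementarity(spacer_rna, opposite_strand_dna) >= threshold
```

For a 20-nt target, a perfectly complementary opposite strand scores 100.0 and a single mismatch scores 95.0; the threshold parameter lets the stricter levels listed above (60% to 100%) be applied.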
The nucleotide sequence of the gene target site (target sequence) may be linked at its 5' end to a PAM (protospacer-adjacent motif) that is TTTN or TTN (N is A, T, C, or G), or a sequence having at least 50%, at least 66%, or at least 75% sequence homology thereto (e.g., the 5' end of the target sequence may be joined directly to the PAM sequence (0-nt distance) or separated from it by 1 to 10 nt). Alternatively, in addition to the 5'-end PAM sequence, the target sequence may be linked at its 3' end to a sequence that is reverse-complementary to the PAM sequence (NAAA or NAA, or a sequence having at least 50%, at least 66%, or at least 75% sequence homology thereto; N is A, T, C, or G), i.e., a 3'-end inverted PAM sequence (e.g., the 3' end of the target sequence may be joined directly to the inverted PAM sequence (0-nt distance) or separated from it by 1 to 10 nt).
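The flanking arrangement just described (a 5' PAM and an optional 3' inverted PAM, each 0 to 10 nt from the target) can be checked with a small helper. This is a hypothetical sketch: it models exact matches only (N = any base) and does not implement the looser 50/66/75% homology variants mentioned in the text.

```python
from typing import Dict

DNA_COMP = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def inverted_pam(pam: str) -> str:
    """Reverse complement of a PAM motif, e.g. TTTN -> NAAA, TTN -> NAA."""
    return "".join(DNA_COMP[b] for b in reversed(pam))


def motif_matches(motif: str, window: str) -> bool:
    """Exact motif match where 'N' matches any base."""
    return len(motif) == len(window) and all(
        m == "N" or m == w for m, w in zip(motif, window))


def pam_flanks(seq: str, start: int, end: int, pam: str = "TTTN",
               max_gap: int = 10) -> Dict[str, bool]:
    """Report PAMs flanking a target occupying seq[start:end] (0-based).

    five_prime_pam: a PAM ending 0..max_gap nt before `start`.
    three_prime_inverted_pam: the inverted PAM starting 0..max_gap nt after `end`.
    """
    seq = seq.upper()
    inv = inverted_pam(pam)
    k = len(pam)
    five = any(motif_matches(pam, seq[start - g - k:start - g])
               for g in range(max_gap + 1) if start - g - k >= 0)
    three = any(motif_matches(inv, seq[end + g:end + g + k])
                for g in range(max_gap + 1) if end + g + k <= len(seq))
    return {"five_prime_pam": five, "three_prime_inverted_pam": three}
```

For "TTTA" + 20-nt target + "CAAA", both flags are True; inserting two bases between the PAM and the target still satisfies the 5' condition because the permitted distance is up to 10 nt.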
Table 2 lists, by the source microorganism of the Cpf1 protein, exemplary 5'-end region sequences (the portion excluding the targeting sequence) of crRNAs usable with each Cpf1 protein:
Table 2 (5'-end crRNA region sequences by source microorganism; table image not reproduced)
(-: indicates that no nucleotide is present at that position)
In one example, the crRNA may be transcribed in vitro using a plasmid as the template.
In another example, the crRNA may lack a phosphate-phosphate bond (e.g., a diphosphate or triphosphate) at its 5' end. Because it lacks a 5'-terminal phosphate-phosphate bond, such a crRNA may have a markedly lower capacity to induce an immune response and/or markedly lower cytotoxicity than a crRNA that has one. Reduced cytotoxicity may mean that no innate immune response is triggered, and/or that inhibition of cell survival, inhibition of cell proliferation, and/or induction of cell damage, hemolysis, and/or cell death is alleviated (reduced) and/or eliminated. For example, a guide RNA lacking a 5'-terminal phosphate-phosphate bond may carry a monophosphate group or an OH group at its 5' end; more generally, it may carry any modified 5'-end form found in RNAs that can exist in eukaryotic cells or eukaryotes without causing cytotoxicity, as distinguished from pathogens such as viruses or bacteria (e.g., a 5'-end form modified naturally or artificially for immunosuppression, enhanced stability, labeling, or the like). The crRNA may be produced by in vitro transcription with a prokaryotic RNA polymerase, such as T7 RNA polymerase, T3 RNA polymerase, or SP6 RNA polymerase, followed by removal of two or more of the three 5'-terminal phosphate groups, e.g., all three (that is, removal of the triphosphate and/or diphosphate); alternatively, it may be chemically synthesized so that it contains no 5'-terminal phosphate-phosphate bond (e.g., no diphosphate and/or triphosphate). The 5'-terminal phosphate groups, e.g., two or more of them (i.e., the triphosphate and/or diphosphate), may be removed by any conventional method that hydrolyzes the ester bonds to the phosphate groups and thereby releases two or three phosphate groups from the RNA, for example by phosphatase treatment, but the method is not limited thereto. The phosphatase may be one or more selected from the group consisting of Calf Intestinal Alkaline Phosphatase (CIP), Shrimp Alkaline Phosphatase (SAP), Antarctic Phosphatase, and the like, but is not limited thereto, and may be selected from any enzyme that releases phosphate groups from RNA.
In one example, the Cpf1 protein and crRNA used in the genome editing composition, the genome editing method, the composition for producing a transformant, and the method for producing a transformant provided herein may comprise, or be used as, a purified Cpf1 protein and a crRNA lacking a 5'-terminal phosphate-phosphate bond (e.g., a chemically synthesized crRNA).
Meanwhile, because the gene encoding the Cpf1 protein is large, delivering the Cpf1 protein into cells or organisms with a vector (e.g., a viral vector such as an adeno-associated virus (AAV) vector) may be inefficient, which can be an obstacle to applying Cpf1 technology. In particular, for viral vectors such as AAV vectors, it is well known that, because of the vector's packaging limit, viral production efficiency and intracellular delivery efficiency decrease when a gene exceeding that limit is cloned into the vector.
To solve this problem, the Cpf1 protein or DNA encoding it used herein may comprise one or more (e.g., two) of the two or more (e.g., two) cleavage fragments produced by cleaving it at one or more (e.g., one) arbitrary positions. The two or more Cpf1 cleavage fragments may together cover the full-length Cpf1 without overlap. The two or more cleavage fragments (DNA fragments) may be contained together in one vector, or each in two or more vectors, for delivery to a cell or organism.
The cleavage point of the Cpf1 protein may lie at a site exposed on the surface of its three-dimensional structure or at a site outside any domain with a defined function (e.g., an inter-domain linker); for DNA encoding the Cpf1 protein, the cleavage point may lie within the DNA sequence encoding such a surface-exposed or non-domain site.
For example, for Cpf1 derived from Acidaminococcus sp. BV3L6 (AsCpf1), the cleavage point on the protein may be one or more points selected from the group consisting of the points between amino acids 901 and 902, between amino acids 886 and 887, between amino acids 399 and 400, and between amino acids 526 and 527 of the AsCpf1 amino acid sequence (GenBank Accession No. WP_021736722.1; 1,307 amino acids long).
For example, with respect to the AsCpf1 amino acid sequence (1,307 amino acids long), the cleavage fragments may comprise:
1) a first protein fragment from amino acid 1 to amino acid 901, or a first DNA fragment encoding it, and a second protein fragment from amino acid 902 to amino acid 1307, or a second DNA fragment encoding it;
2) a first protein fragment from amino acid 1 to amino acid 886, or a first DNA fragment encoding it, and a second protein fragment from amino acid 887 to amino acid 1307, or a second DNA fragment encoding it;
3) a first protein fragment from amino acid 1 to amino acid 399, or a first DNA fragment encoding it, and a second protein fragment from amino acid 400 to amino acid 1307, or a second DNA fragment encoding it; or
4) a first protein fragment from amino acid 1 to amino acid 526, or a first DNA fragment encoding it, and a second protein fragment from amino acid 527 to amino acid 1307, or a second DNA fragment encoding it.
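The four split designs above can be checked with simple arithmetic: each pair of fragments should tile residues 1 to 1307 with no gap or overlap, and each protein fragment maps to a nucleotide range of the coding sequence (residue n occupies nucleotides 3n-2 to 3n). The sketch below is illustrative only; it assumes an intron-free coding sequence starting at the codon for residue 1, excludes the stop codon, and uses hypothetical helper names.

```python
from typing import List, Tuple

ASCPF1_LENGTH = 1307  # amino acids, as stated for AsCpf1 in the text

# Split points from the text: the last residue of the first fragment
# (e.g. 901 means fragments 1-901 and 902-1307).
SPLIT_POINTS = [901, 886, 399, 526]


def split_fragments(length: int, split_after: int) -> List[Tuple[int, int]]:
    """1-based inclusive residue ranges of the two fragments."""
    if not 1 <= split_after < length:
        raise ValueError("split point must fall inside the protein")
    return [(1, split_after), (split_after + 1, length)]


def covers_without_overlap(fragments: List[Tuple[int, int]], length: int) -> bool:
    """True if the fragments tile residues 1..length with no gap or overlap."""
    expected_start = 1
    for start, end in sorted(fragments):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


def codon_range(aa_start: int, aa_end: int) -> Tuple[int, int]:
    """1-based inclusive CDS nucleotide range encoding residues aa_start..aa_end.

    Assumes an intron-free CDS beginning at the codon for residue 1;
    the stop codon is not included.
    """
    return (3 * aa_start - 2, 3 * aa_end)


if __name__ == "__main__":
    for point in SPLIT_POINTS:
        frags = split_fragments(ASCPF1_LENGTH, point)
        assert covers_without_overlap(frags, ASCPF1_LENGTH)
        print(point, frags, [codon_range(a, b) for a, b in frags])
```

For the 901/902 split, for instance, the first DNA fragment spans nucleotides 1 to 2703 and the second spans 2704 to 3921 of the coding sequence.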
Although the cleavage sites and cleavage fragments have been described using AsCpf1 as an example, they may be applied at the corresponding positions of Cpf1 proteins derived from other organisms. The "corresponding position in a Cpf1 derived from another organism" can be determined by comparing the AsCpf1 amino acid sequence (or the DNA sequence encoding it) with the amino acid sequence of that organism's Cpf1 (or the DNA sequence encoding it) using conventional sequence comparison tools (e.g., BLAST (Basic Local Alignment Search Tool), such as PSI-BLAST (Position-Specific Iterated BLAST); blast.ncbi.nlm.nih.gov/Blast.cgi), as will be apparent to those of ordinary skill in the art.
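Once a pairwise alignment has been obtained (e.g., from BLAST, as the text suggests), finding the "corresponding position" amounts to walking both aligned strings while counting non-gap residues. The sketch below assumes the alignment is given as two equal-length strings with '-' for gaps; the function name and the toy alignment in the example are hypothetical and are not real Cpf1 sequences.

```python
from typing import Optional


def map_position(aligned_ref: str, aligned_other: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based residue number in the reference (e.g. AsCpf1) to the
    corresponding residue number in another Cpf1 via a gapped alignment.

    Returns None if the reference residue is aligned to a gap.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    ref_i = other_i = 0
    for a, b in zip(aligned_ref, aligned_other):
        if a != "-":
            ref_i += 1
        if b != "-":
            other_i += 1
        if a != "-" and ref_i == ref_pos:
            return other_i if b != "-" else None
    raise ValueError("ref_pos lies beyond the reference sequence")
```

With the toy alignment "MKT-LVE" (reference) versus "MR-QLVE" (other), reference residue 4 maps to residue 4 of the other sequence, while reference residue 3 falls opposite a gap and has no counterpart. A split point such as "between 901 and 902" would be transferred by mapping both flanking residues.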
The cleavage fragments of the Cpf1 protein or of the gene encoding it may comprise two or more cleavage fragments, each of which may be linked, at its N-terminus and/or C-terminus (for protein fragments) or at its 5' end and/or 3' end (for gene fragments), to a binding protein or to a nucleic acid molecule encoding the binding protein. The binding proteins may be different proteins that bind to different sites of the same bioactive substance. In one example, the bioactive substance is rapamycin, and the binding protein may be selected from the group consisting of the FRB protein and the FKBP protein, but is not limited thereto.
When the gene fragments encoding the two or more Cpf1 protein fragments (split gene fragments) are delivered via recombinant vectors, the two or more split gene fragments may be contained in separate vectors or together in a single vector. In another example, each split gene fragment, whether in a separate vector or in a shared vector, may be linked to crRNA-encoding DNA on its 5' or 3' side (e.g., the 5' side). In one example, the vector comprising the first DNA fragment may comprise, in the 5'-to-3' direction, a promoter, crRNA-encoding DNA, a promoter, and the first DNA fragment encoding the first protein fragment of the Cpf1 protein; and the vector comprising the second DNA fragment may comprise, in the 5'-to-3' direction, a promoter, crRNA-encoding DNA, a promoter, and the second DNA fragment encoding the second protein fragment of the Cpf1 protein (see FIG. 32a).
All steps of the genome editing method and the method for producing a transgenic organism provided herein may be performed intracellularly or extracellularly, or in vivo or ex vivo.
Another example of the present invention provides a technique for overcoming a drawback of delivering ribonucleoproteins into cells (e.g., embryos) by microinjection: each embryo must be processed one at a time while being checked under a microscope, so processing a large number of embryos in sequence takes a long time. This is a technical obstacle because embryos remain at the one-cell stage for only a short time.
Furthermore, it was confirmed that using the crRNA in a form contained in a vector (e.g., a plasmid), i.e., as a recombinant vector, rather than as a PCR product (amplicon), increases gene editing efficiency (cleavage, insertion, deletion, etc.) (see FIGS. 14a and 14b). Accordingly, a technique of using the crRNA in a form contained in (cloned into) a vector is provided. The vector may comprise a crRNA expression cassette comprising crRNA-coding DNA and/or transcriptional regulatory sequences, such as a promoter, operably linked thereto.
Specifically, in another example, a mixture or ribonucleoprotein (RNP) comprising an RNA-guided endonuclease (RGEN) and a guide RNA, DNA encoding them, or a recombinant vector comprising that DNA may be delivered to a cell (e.g., a eukaryotic cell) or an organism (e.g., a eukaryotic organism) by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection (e.g., using Lipofectamine), or the like.
Another example provides a method of genome editing of a cell (e.g., a eukaryotic cell) or an organism (e.g., a eukaryotic organism), and a method for producing a transgenic organism, using a mixture or ribonucleoprotein (RNP) comprising an RNA-guided endonuclease (RGEN) and a guide RNA, DNA encoding them, or a recombinant vector comprising that DNA, wherein the mixture, ribonucleoprotein, DNA, or recombinant vector may be delivered to the cell or organism by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection (e.g., using Lipofectamine), or the like. When the target cell is a plant cell, the plant cell may be mixed with a surfactant such as polyethylene glycol (PEG) and then mixed with the mixture or ribonucleoprotein comprising the endonuclease and the guide RNA for delivery.
Another example provides a method for delivering a mixture or ribonucleoprotein (RNP) comprising an RNA-guided endonuclease (RGEN) and a guide RNA, DNA encoding them, or a recombinant vector comprising that DNA to a cell (e.g., a eukaryotic cell) or an organism (e.g., a eukaryotic organism), the method comprising introducing the mixture, ribonucleoprotein, DNA, or recombinant vector into the cell or organism by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection (e.g., using Lipofectamine), or the like. When the target cell is a plant cell, the plant cell may be mixed with a surfactant such as polyethylene glycol (PEG) and then mixed with the mixture or ribonucleoprotein comprising the endonuclease and the guide RNA for delivery.
In the methods described above, one option is to deliver the endonuclease (e.g., Cpf1, Cas9) and the guide RNA (e.g., crRNA, sgRNA) as proteins and RNA. A mixture of the in vitro-expressed (purified) endonuclease and the guide RNA, or a ribonucleoprotein in which they are complexed, may be delivered into eukaryotic cells and/or eukaryotic organisms by microinjection, electroporation, lipofection, or the like. In another example, the endonuclease (e.g., Cpf1, Cas9) and the guide RNA (e.g., crRNA, sgRNA) may be delivered as DNA. In that case, a recombinant vector is used. It contains an expression cassette comprising DNA encoding the endonuclease and an expression cassette comprising DNA encoding the guide RNA, either in separate vectors or together in a single vector. The recombinant vector may be delivered into eukaryotic cells and/or eukaryotic organisms by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection, or the like.
In addition to the endonuclease-coding DNA or the crRNA-coding DNA, the expression cassette may contain conventional gene expression control sequences operably linked to that coding DNA. The term "operably linked" refers to a functional linkage (in cis) between a gene expression control sequence and another nucleotide sequence.
The gene expression control sequence may be one or more selected from the group consisting of a replication origin, a promoter, a transcription termination sequence (terminator), and the like. A promoter, as used herein, is a transcriptional regulatory sequence that controls the initiation of transcription of a particular gene. It is typically a polynucleotide fragment of about 100 to about 2,500 bp. In one embodiment, any promoter may be used, provided it can control transcription initiation in a cell such as a eukaryotic cell (e.g., a plant cell, or an animal cell such as a human or mouse mammalian cell).
For example, the promoter may be one or more selected from the following group, but is not limited thereto:
- the CMV promoter (cytomegalovirus promoter; e.g., the human or mouse CMV immediate-early promoter);
- the U6 promoter;
- the EF1-α (elongation factor 1-α) promoter and the EF1-α short (EFS) promoter;
- the SV40 promoter and the SV40E1 promoter;
- the adenovirus promoter (major late promoter);
- the pLλ, pRλ, lac, tac, and T7 promoters;
- the vaccinia virus 7.5K promoter and the HSV promoter;
- the respiratory syncytial virus (RSV) promoter;
- the metallothionein, β-actin, and ubiquitin C promoters;
- the human IL-2 (human interleukin-2) gene promoter, the human lymphotoxin gene promoter, and the human GM-CSF (human granulocyte-macrophage colony-stimulating factor) gene promoter.

In one example, the promoter may be selected from the group consisting of the CMV immediate-early promoter, the U6 promoter, the EF1-α (elongation factor 1-α) promoter, the EF1-α short (EFS) promoter, and the like. The transcription termination sequence may be a polyadenylation sequence (pA) or the like. The replication origin may be the f1 origin, the SV40 origin, the pMB1 origin, the adeno origin, the AAV origin, the BBV origin, or the like.
The vector described herein may be selected from the group consisting of plasmid vectors, cosmid vectors, and viral vectors such as bacteriophage vectors, adenovirus vectors, retrovirus vectors, and adeno-associated virus vectors.
A vector usable as the recombinant vector may be constructed from any of the following, but is not limited thereto:
- plasmids used in the art (e.g., the pcDNA series, pSC101, pGV1106, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, pIJ61, pLAFR1, pHV14, the pGEX series, the pET series, pUC19);
- phages (e.g., λgt4λB, λ-Charon, λΔz1, M13);
- viral vectors (e.g., adeno-associated virus (AAV) vectors).
The eukaryotic organism may be selected from the following group, but is not limited thereto:
- eukaryotic cells, e.g., fungi such as yeast, or cells derived from eukaryotic animals and/or eukaryotic plants, such as embryonic cells, stem cells, and somatic cells;
- eukaryotic animals, e.g., vertebrates or invertebrates; more specifically, mammals including primates such as humans and monkeys, as well as dogs, pigs, cattle, sheep, goats, mice, and rats;
- eukaryotic plants, e.g., algae such as green algae, and monocotyledonous or dicotyledonous plants such as maize, soybean, wheat, and rice.
The RNA-guided endonuclease is an endonuclease that exists as a mixture or complex with a single guide RNA (sgRNA) or a dual guide RNA. It performs genome editing by cleaving the target sequence at the gene target site specified by the RNA. Representative examples are the endonucleases of type II and/or type V CRISPR/Cas systems, such as the Cas9 protein (CRISPR-associated protein 9) and the Cpf1 protein (CRISPR from Prevotella and Francisella 1).
The Cas9 protein may be derived from Streptococcus sp., for example from Streptococcus pyogenes (SwissProt Accession number Q99ZW2), but is not limited thereto.
The Cpf1 protein is as described above (see, e.g., Table 1).
Endonucleases such as the Cas9 protein and Cpf1 may be isolated from microorganisms, or they may be non-naturally occurring, produced by recombinant or synthetic methods. The endonuclease may further include an element commonly used for delivery into the nucleus of eukaryotic cells, but is not limited thereto. Such an element may be placed at the N-terminus or C-terminus of the protein, or at the 5' or 3' end of the nucleic acid molecule encoding it. An example is a nuclear localization signal (NLS), such as PKKKRKV or KRPAATKKAGQAKKKK, or a nucleic acid molecule encoding one. The endonuclease may be used as a purified protein, as DNA encoding it, or as a recombinant vector comprising that DNA.
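The N- or C-terminal NLS fusion described above can be sketched at the protein-sequence level. The function name `add_nls` and the placeholder fragment are hypothetical; only the two NLS peptides come from the text. The sketch simply concatenates sequences. It does not model linkers or fusions at the 5'/3' ends of the encoding DNA.

```python
# Hypothetical helper illustrating N- or C-terminal NLS fusion at the sequence level.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter amino-acid codes

NLS_PKKKRKV = "PKKKRKV"                    # first NLS example given in the text
NLS_KRPAATKKAGQAKKKK = "KRPAATKKAGQAKKKK"  # second NLS example given in the text


def add_nls(protein: str, nls: str = NLS_PKKKRKV, terminus: str = "C") -> str:
    """Return `protein` with `nls` fused directly (no linker) to its N- or C-terminus."""
    protein, nls = protein.upper(), nls.upper()
    if not protein or not set(protein) <= STANDARD_AA or not set(nls) <= STANDARD_AA:
        raise ValueError("use non-empty sequences of standard one-letter amino-acid codes")
    if terminus == "N":
        return nls + protein  # NLS precedes the nuclease sequence
    if terminus == "C":
        return protein + nls  # NLS follows the nuclease sequence
    raise ValueError("terminus must be 'N' or 'C'")


# Placeholder fragment standing in for a full Cas9 or Cpf1 sequence.
print(add_nls("MDKKYSIGL", terminus="C"))  # MDKKYSIGLPKKKRKV
```

Whether one NLS or several are used, and at which terminus, is a design choice that the text leaves open. The sketch therefore exposes both as parameters.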
The guide RNA may be selected according to the type of endonuclease with which it forms a complex and/or the microorganism from which that endonuclease is derived. For example, the guide RNA may be one or more selected from the group consisting of CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and single-stranded guide RNA (sgRNA). Depending on the type of endonuclease, it may be crRNA alone, a complex of crRNA and tracrRNA, or a single-stranded guide RNA (sgRNA).
For example, a complex containing the Cas9 protein (the Cas9 system) requires two guide RNAs for the desired genome editing:
- a CRISPR RNA (crRNA) having a nucleotide sequence capable of hybridizing with the target site of the gene; and
- an additional trans-activating crRNA (tracrRNA).

The crRNA and tracrRNA are used either as a double-stranded crRNA:tracrRNA complex or joined through a linker as a single-stranded guide RNA (sgRNA). A complex containing the Cpf1 protein (the Cpf1 system) requires only one guide RNA for the desired genome editing: a crRNA having a nucleotide sequence capable of hybridizing with the target site of the gene.
The specific sequence of the guide RNA may be selected according to the type (source microorganism) of the Cas9 or Cpf1 protein, as those of ordinary skill in the art will readily understand.
In one example, the crRNA used in a Cas9 system containing the Cas9 protein from Streptococcus pyogenes may be represented by General Formula 2:
5'-(N_cas9)_l-GUUUUAGAGCUA-(X_cas9)_m-3' (General Formula 2; SEQ ID NO: 61)
In General Formula 2:
- N_cas9 is the targeting sequence region. It contains a nucleotide sequence capable of hybridizing with the gene target site and is determined by the target site of the target gene. l is the number of nucleotides in the targeting sequence region and may be an integer from 18 to 22, for example 20.
- The region of 12 consecutive nucleotides (GUUUUAGAGCUA) located immediately 3' of the targeting sequence region is an essential part of the crRNA.
- X_cas9 is a region of m nucleotides at the 3' end of the crRNA, i.e., located immediately 3' of the essential part of the crRNA. m may be an integer from 8 to 12, for example 10. The m nucleotides may be the same as or different from each other, and each is independently selected from the group consisting of A, U, C, and G.

In one example, X_cas9 may include UGCUGUUUUG, but is not limited thereto.
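General Formula 2 is a simple concatenation with length constraints, so it can be sketched in a few lines. The function names and the 20-nt targeting sequence below are hypothetical placeholders. The fixed 12-nt part, the default X_cas9 tail, and the ranges for l and m are taken from the text.

```python
import re

CAS9_CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # 12-nt essential part of the crRNA in General Formula 2
_RNA_ONLY = re.compile(r"[ACGU]+")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write DNA T as RNA U."""
    return seq.upper().replace("T", "U")


def build_cas9_crrna(targeting: str, tail: str = "UGCUGUUUUG") -> str:
    """Assemble 5'-(N_cas9)_l-GUUUUAGAGCUA-(X_cas9)_m-3' (General Formula 2)."""
    targeting, tail = to_rna(targeting), to_rna(tail)
    for name, seq in (("targeting", targeting), ("tail", tail)):
        if not _RNA_ONLY.fullmatch(seq):
            raise ValueError(f"{name} must contain only A, C, G and U/T")
    if not 18 <= len(targeting) <= 22:  # l = 18 to 22 per the text
        raise ValueError(f"targeting region must be 18-22 nt, got {len(targeting)}")
    if not 8 <= len(tail) <= 12:  # m = 8 to 12 per the text
        raise ValueError(f"X_cas9 tail must be 8-12 nt, got {len(tail)}")
    return targeting + CAS9_CRRNA_ESSENTIAL + tail


crrna = build_cas9_crrna("GAGTCCGAGCAGAAGAAGAA")  # placeholder 20-nt targeting region
print(len(crrna))  # 42
```

The sketch accepts a DNA-style targeting sequence and converts T to U. This is only a convenience; the formula itself is written in RNA.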
In addition, the tracrRNA used in a Cas9 system containing the Cas9 protein from Streptococcus pyogenes may be represented by General Formula 3:
5'-(Y_cas9)_p-UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC-3' (General Formula 3; SEQ ID NO: 62)
In General Formula 3:
- The region containing the 60 nucleotides UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC is an essential part of the tracrRNA.
- Y_cas9 is a region of p nucleotides located immediately adjacent to the 5' end of the essential part of the tracrRNA. p may be an integer from 6 to 20, for example from 8 to 19. The p nucleotides may be the same as or different from each other, and each is independently selected from the group consisting of A, U, C, and G.
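General Formula 3 can be sketched the same way: a variable 5' prefix followed by the fixed 60-nt essential part. The function name and the 8-nt prefix are hypothetical illustrations. The essential sequence and the 6 to 20 nt range for p come from the text.

```python
CAS9_TRACR_ESSENTIAL = (
    "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)  # 60-nt essential part of the tracrRNA in General Formula 3


def build_cas9_tracrrna(prefix: str) -> str:
    """Assemble 5'-(Y_cas9)_p-<60-nt essential part>-3' (General Formula 3)."""
    prefix = prefix.upper().replace("T", "U")
    if set(prefix) - set("ACGU"):
        raise ValueError("Y_cas9 must contain only A, C, G and U/T")
    if not 6 <= len(prefix) <= 20:  # p = 6 to 20 per the text
        raise ValueError(f"Y_cas9 must be 6-20 nt, got {len(prefix)}")
    return prefix + CAS9_TRACR_ESSENTIAL


assert len(CAS9_TRACR_ESSENTIAL) == 60  # sanity check against the stated length
tracr = build_cas9_tracrrna("AAACAGCA")  # hypothetical 8-nt Y_cas9 prefix
```

The length assertion guards against a copy error in the constant, since the text explicitly states that the essential part is 60 nt.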
In addition, the sgRNA used in a Cas9 system containing the Cas9 protein from Streptococcus pyogenes may form a hairpin structure. It has two regions:
- a crRNA region, comprising the targeting sequence region and the essential region of the Cas9 crRNA; and
- a tracrRNA region, comprising the essential region of the Cas9 tracrRNA.

These two regions are joined through a nucleotide linker. More specifically, the crRNA region and the tracrRNA region pair with each other to form a double-stranded RNA molecule. The sgRNA may have a hairpin structure in which, within that molecule, the 3' end of the crRNA region and the 5' end of the tracrRNA region are connected through the nucleotide linker.
The targeting sequence region and essential region of the crRNA, and the essential region of the tracrRNA, are as described above. The nucleotide linker in the sgRNA may contain 3 to 5 nucleotides, for example 4. These nucleotides may be the same as or different from each other, and each is independently selected from the group consisting of A, U, C, and G. In one example, the linker may have the nucleotide sequence 'GAAA', but is not limited thereto.
For example, the sgRNA may be represented by General Formula 4:
5'-(N_cas9)_m-GUUUCAGUUGCU-(linker)-AUGCUCUGUAAUCAUUUAA GUAUUUUG CCXJACCUCUGUUUGACACGUCUG U CUAAA-3' (General Formula 4; SEQ ID NO: 63)
In General Formula 4:
- N_cas9 is the targeting sequence region. It contains a nucleotide sequence capable of hybridizing with the gene target site and is determined by the target site of the target gene. m is the number of nucleotides in the targeting sequence region and may be an integer from 16 to 24 or from 18 to 22.
- The linker may contain 3 to 5 nucleotides, for example 4.
- The nucleotides in the targeting sequence region and the linker may be the same as or different from each other, and each is independently selected from the group consisting of A, U, C, and G. The linker may be, for example, 'GAAA'.
The crRNA (e.g., represented by General Formula 2) or sgRNA (e.g., represented by General Formula 4) for the Cas9 protein may further include 1 to 3 guanines (G) at its 5' end, i.e., at the 5' end of the targeting sequence region of the crRNA.

The tracrRNA or sgRNA for the Cas9 protein may further include a termination region containing 5 to 7 uracils (U) at the 3' end of the essential part (60 nt) of the tracrRNA. In another example, in a Cpf1 system containing the Cpf1 protein, the crRNA used is as described above (see General Formula 1 and Table 2).
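The sgRNA rules for the Cas9 system can be combined into one assembly function. It covers the crRNA region, the linker, the tracrRNA essential part, the optional extra 5' guanines, and the optional 3' uracil termination region. This is a sketch of the prose description only: it follows the layout "crRNA region, then linker, then tracrRNA region" and does not reproduce the printed General Formula 4. The function name and the placeholder targeting sequence are hypothetical.

```python
CAS9_CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # essential crRNA part (General Formula 2)
CAS9_TRACR_ESSENTIAL = (
    "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)  # 60-nt essential tracrRNA part (General Formula 3)
_RNA = set("ACGU")


def build_cas9_sgrna(targeting: str, linker: str = "GAAA",
                     extra_5p_g: int = 0, poly_u: int = 0) -> str:
    """crRNA region + linker + tracrRNA essential part, with optional 5' G and 3' U tract."""
    targeting = targeting.upper().replace("T", "U")
    linker = linker.upper().replace("T", "U")
    if set(targeting + linker) - _RNA:
        raise ValueError("sequences must contain only A, C, G and U/T")
    if not 16 <= len(targeting) <= 24:  # targeting length range given for General Formula 4
        raise ValueError("targeting region must be 16-24 nt")
    if not 3 <= len(linker) <= 5:  # 3 to 5 nt linker
        raise ValueError("linker must be 3-5 nt")
    if not 0 <= extra_5p_g <= 3:  # optional 1 to 3 extra 5' guanines
        raise ValueError("extra_5p_g must be 0-3")
    if poly_u and not 5 <= poly_u <= 7:  # optional 5 to 7 U termination region
        raise ValueError("poly_u must be 0 or 5-7")
    crrna_region = "G" * extra_5p_g + targeting + CAS9_CRRNA_ESSENTIAL
    tracr_region = CAS9_TRACR_ESSENTIAL + "U" * poly_u
    # The 3' end of the crRNA region is joined through the linker to the 5' end of the tracrRNA region.
    return crrna_region + linker + tracr_region


sgrna = build_cas9_sgrna("GAGTCCGAGCAGAAGAAGAA", extra_5p_g=1, poly_u=6)  # placeholder target
```

With a 20-nt targeting region, one extra G, the GAAA linker, and six uracils, the result is 1 + 20 + 12 + 4 + 60 + 6 = 103 nt.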
Another example provides the use of a Cpf1 protein and a crRNA targeting the Hif1-alpha gene for treating eye diseases.
Hif1-alpha (hypoxia-inducible factor 1-alpha) is a subunit of hypoxia-inducible factor 1 (HIF-1), a heterodimeric transcription factor, and is encoded by the HIF1A gene. The Hif1-alpha may be a mammalian Hif1-alpha, for example human Hif1-alpha. It may be represented by NCBI accession nos. NP_001230013.1, NP_001521.1, NP_851397.1, and the like, but is not limited thereto. The HIF1A gene may be a mammalian HIF1A gene, for example the human HIF1A gene. It may be represented by NCBI accession nos. NM_181054.1, NM_001243084.1, NM_001530.1, and the like, but is not limited thereto. Specifically, one example provides a pharmaceutical composition for preventing or treating an eye disease, comprising:
a Cpf1 protein or DNA encoding the same; and
a crRNA comprising a nucleotide sequence capable of hybridizing with a nucleotide sequence of 15 to 30 consecutive nucleotides (the target sequence) at the target site of the Hif1-alpha gene, or DNA encoding the same.
Another example provides a method for preventing or treating an eye disease. The method comprises administering the following to a subject in need of such prevention or treatment:
a Cpf1 protein or DNA encoding the same; and
a crRNA comprising a nucleotide sequence capable of hybridizing with a nucleotide sequence of 15 to 30 consecutive nucleotides (the target sequence) at the target site of the Hif1-alpha gene, or DNA encoding the same.
The Cpf1 and crRNA are as described above.
In the pharmaceutical composition and in the preventive or therapeutic method, a recombinant vector may be included or administered. It contains the DNA encoding the Cpf1 protein and the DNA encoding the crRNA, either in separate vectors or together in a single vector.

The vector may be of any kind described above, for example an adeno-associated virus (AAV) vector. The crRNA may comprise a nucleotide sequence capable of hybridizing with a sequence selected from the Hif1-alpha gene target sequences of SEQ ID NOs: 69 to 79.
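The crRNA is defined by its ability to hybridize with a 15 to 30 nt target sequence. That check can be sketched as antiparallel Watson-Crick complementarity. The sketch makes several assumptions. "Hybridizable" is taken to mean full complementarity, optionally tolerating a set number of mismatches. The target is given as a DNA strand written 5' to 3'. The function names are hypothetical. The target sequences of SEQ ID NOs: 69 to 79 are not reproduced in this section, so only placeholder sequences are used, and the Cpf1 PAM requirement is not modeled.

```python
_DNA_TO_COMPLEMENTARY_RNA = str.maketrans("ACGT", "UGCA")  # A->U, C->G, G->C, T->A


def complementary_spacer(target_dna: str) -> str:
    """RNA (5'->3') that pairs base-for-base, antiparallel, with a 15-30 nt DNA target."""
    target_dna = target_dna.upper()
    if set(target_dna) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    if not 15 <= len(target_dna) <= 30:  # 15 to 30 consecutive nt per the text
        raise ValueError(f"target must be 15-30 nt, got {len(target_dna)}")
    # Complement each base, then reverse so the result reads 5'->3'.
    return target_dna.translate(_DNA_TO_COMPLEMENTARY_RNA)[::-1]


def can_hybridize(spacer_rna: str, target_dna: str, max_mismatches: int = 0) -> bool:
    """True if the spacer pairs with the target with at most `max_mismatches` mismatches."""
    expected = complementary_spacer(target_dna)
    spacer = spacer_rna.upper().replace("T", "U")
    if len(spacer) != len(expected):
        return False
    mismatches = sum(a != b for a, b in zip(spacer, expected))
    return mismatches <= max_mismatches
```

For example, the placeholder target `"A" * 19 + "C"` pairs with the spacer `"G" + "U" * 19`.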
The eye disease may be diabetic retinopathy or age-related (senile) macular degeneration.
A mixture or ribonucleoprotein may be administered intravenously or locally to the lesion, for example by retinal injection (e.g., subretinal injection or intravitreal injection). It comprises:
- the Cpf1 protein, or a recombinant vector containing DNA encoding it; and
- a crRNA comprising a nucleotide sequence capable of hybridizing with a target sequence of 15 to 30 consecutive nucleotides at the target site of the Hif1-alpha gene, or a recombinant vector containing DNA encoding that crRNA.
The subject may be a mammal such as a human or a mouse.
【Effects of the Invention】
The present invention uses the Cpf1 system to perform genome editing more effectively in eukaryotic cells (e.g., mammalian cells such as human and mouse cells, and eukaryotic plant cells). It can be used to produce transformed cells and/or transgenic animals or plants in which a desired gene is knocked out or knocked in. In addition, a ribonucleoprotein comprising an RNA-guided endonuclease and a guide RNA can be delivered to a eukaryotic organism more efficiently by electroporation than by microinjection.
【Brief Description of Drawings】
FIG. 1 schematically shows the delivery of RNPs containing recombinant AsCpf1 and crRNA into mouse blastocysts by microinjection.
FIG. 2 shows the results of a T7E1 assay confirming sequence mutations in blastocysts.
FIG. 3 shows the results of confirming Cpf1 RNP genome editing by targeted deep sequencing. Mutations were found specifically at the sequence position where Cpf1 was expected to cleave the genome.
FIGS. 4 to 6 show the results of off-target (nonspecific) sequence mutation analysis in mice whose genomes were edited with Cpf1 RNP:
FIG. 4 shows T7E1 assay results confirming sequence mutations at the specific target position, using gDNA purified from the tails of mice produced with Cpf1 RNP;
FIG. 5 shows the results of confirming the mutated sequences by targeted deep sequencing; and
FIG. 6 shows the results of whole-genome sequencing of tail gDNA, which confirm that there were no sequence mutations at nonspecific positions.
FIGS. 7 to 10 relate to genome editing in mouse embryos by electroporation of SpCas9 and AsCpf1 RNPs:
FIG. 7 schematically shows SpCas9/AsCpf1 being combined with sgRNA/crRNA and delivered into multiple mouse embryos by electroporation;
FIG. 8 shows the results of confirming, by T7E1 assay, the sequence mutations caused by SpCas9 RNP electroporation;
FIG. 9 shows the results of targeted deep sequencing analysis of the sequence mutations produced by SpCas9 RNP electroporation; and
FIG. 10 shows the results of targeted deep sequencing analysis of the sequence mutations produced by AsCpf1 RNP electroporation.
FIG. 11 is a schematic diagram of a method for editing homologous FAD2 genes in soybean protoplasts with recombinant AsCpf1 and LbCpf1 proteins.
FIGS. 12 and 13 show the results of sequence mutation analysis of the FAD2 genes:
FIG. 12 shows genome editing efficiency using AsCpf1 and LbCpf1; and
FIG. 13 shows the results of confirming specific sequence mutations by targeted deep sequencing.
FIGS. 14a and 14b show the results of cellular genome editing using plasmid U6-crRNA and PCR product U6-crRNA, and a comparison of their efficiencies:
FIG. 14a is an electrophoresis image from a T7E1 assay comparing cellular genome editing efficiency with plasmid U6-crRNA against that with PCR product U6-crRNA; and
FIG. 14b is a graph showing the quantitative analysis of cellular genome editing efficiency by targeted deep sequencing.
FIGS. 15a and 15b show the purification of recombinant Cpf1 protein and the results of an in vitro cleavage assay confirming its activity:
FIG. 15a shows SDS-PAGE of AsCpf1 and LbCpf1 expressed in and purified from bacteria; and
FIG. 15b shows TBE-agarose gel electrophoresis of target DNA cleaved with purified recombinant Cpf1 protein and either in vitro-transcribed (T7) or synthetic crRNA.
FIGS. 16a to 16c show the results of cellular genome editing with RNPs composed of recombinant Cpf1 and crRNA:
FIG. 16a is an electrophoresis image from a T7E1 assay confirming cellular genome editing by delivery of RNPs composed of As-/Lb-Cpf1 and crRNA;
FIG. 16b is a graph quantifying the cellular genome editing efficiency of Cpf1 RNPs as measured by targeted deep sequencing; and
FIG. 16c is an electrophoresis image from a T7E1 assay comparing the efficiency of cellular genome editing with chemically synthesized crRNA against that with in vitro-transcribed crRNA.
FIGS. 17a to 17c show the in vitro cleavage of cellular genomic DNA with Cpf1 and crRNA, and the resulting Digenome-seq data:
FIG. 17a is a schematic of qPCR and Digenome-seq following in vitro cleavage of cellular genomic DNA with Cpf1 protein and crRNA;
FIG. 17b is a graph of the intact target-site DNA remaining after cleavage with Lb-/As-Cpf1 protein (3 nM to 300 nM) and crRNA (9 nM to 900 nM), quantified by qPCR; and
FIG. 17c shows IGV views comparing the sequence reads near the target site from whole-genome sequencing of cellular genomic DNA before and after in vitro cleavage.
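The text does not say how the qPCR data in FIG. 17b were converted into the amount of intact target remaining. One common approach is the delta-delta-Ct method, sketched below under several assumptions: the amplicon spans the cleavage site; a reference amplicon elsewhere in the genome normalizes DNA input; and amplification efficiency is a perfect doubling per cycle. The function name and the Ct values are hypothetical.

```python
def remaining_target_fraction(ct_target_cut: float, ct_ref_cut: float,
                              ct_target_uncut: float, ct_ref_uncut: float,
                              efficiency: float = 2.0) -> float:
    """Fraction of intact target left after in vitro cleavage (delta-delta-Ct).

    `efficiency` = 2.0 assumes the amplicon doubles every PCR cycle.
    """
    delta_cut = ct_target_cut - ct_ref_cut        # target vs reference, nuclease-treated DNA
    delta_uncut = ct_target_uncut - ct_ref_uncut  # target vs reference, untreated DNA
    return efficiency ** -(delta_cut - delta_uncut)


# Hypothetical Ct values: the target amplifies 3 cycles later, relative to the reference, after cleavage.
remaining = remaining_target_fraction(25.0, 20.0, 22.0, 20.0)
cleaved_percent = 100.0 * (1.0 - remaining)
```

Here a shift of 3 cycles corresponds to 2^-3 = 12.5% of the target remaining intact, i.e., 87.5% cleaved.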
FIGS. 18a and 18b show Digenome-seq results using Cpf1 and crRNA:
FIG. 18a shows the genomic positions and sequences of the off-target candidates detected by Digenome-seq; and
FIG. 18b shows the conserved sequence of the off-target candidate sites as a sequence logo.
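A sequence logo such as the one in FIG. 18b is usually derived from per-position base frequencies of aligned sites. Each position is drawn with a height equal to its information content, i.e., 2 bits minus the Shannon entropy for DNA. A minimal sketch follows, assuming equal-length aligned sites with no small-sample correction. The function names and the toy sequences are made up for illustration and are not data from the figure.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_frequencies(sites: list[str]) -> list[dict[str, float]]:
    """Per-position base frequencies for equal-length aligned DNA sites."""
    if not sites:
        raise ValueError("need at least one site")
    sites = [s.upper() for s in sites]
    width = len(sites[0])
    if any(len(s) != width or set(s) - set(BASES) for s in sites):
        raise ValueError("sites must be equal-length strings of A, C, G, T")
    columns = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        columns.append({b: counts[b] / len(sites) for b in BASES})
    return columns


def information_content(column: dict[str, float]) -> float:
    """Bits at one position: 2 minus Shannon entropy (no small-sample correction)."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


toy_sites = ["TTTACG", "TTTCCG", "TTTGCA", "TTTACG"]  # made-up sequences, not figure data
pfm = position_frequencies(toy_sites)
bits = [information_content(col) for col in pfm]
```

A fully conserved position scores 2 bits. A position split 50/25/25 among three bases scores 0.5 bits.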
Figure 19a is an electrophoresis image of a T7E1 assay comparing cellular genome editing efficiency when crRNA is supplied as a plasmid versus as a PCR product.
Figure 19b is a graph showing indel frequencies (%) measured by targeted deep sequencing using crRNAs for each of four Cpf1 orthologs (error bars indicate s.e.m.).
Figure 19c is a graph showing the mutation frequencies (indel frequencies, %) induced by LbCpf1, AsCpf1, and SpCas9 at ten endogenous target sites in HEK293T cells (mean indel frequencies ± s.e.m. are shown).
Figures 20a to 20c show the specificity of Cpf1 as indel frequencies (%), measured by targeted deep sequencing in HEK293T cells, obtained with a crRNA for the on-target site and with crRNAs for sequences carrying one or two mismatched nucleotides relative to the on-target site, in which
20a is a graph showing the results for DNMT1-3;
20b is a graph showing the results for DNMT1-4; and
20c is a graph showing the results for AAVS1 (error bars indicate s.e.m.).
Figures 21a to 21f show the genome-wide target specificity of Cpf1 and Cas9 nucleases as measured by Digenome-seq, in which
21a and 21b are genome-wide Circos plots showing DNA cleavage scores obtained by whole-genome sequencing and Digenome-seq analysis: intact genomic DNA is shown in red, genomic DNA cleaved with LbCpf1 in green, genomic DNA cleaved with AsCpf1 in blue, and genomic DNA cleaved with SpCas9 in yellow; the asterisk marks the one false-positive site found in intact genomic DNA, the arrows mark the on-target sites, and the sequence logos were generated with WebLogo from the DNA sequences at the in vitro cleavage sites identified by Digenome-seq; 21c shows the fraction of homologous sites captured by Digenome-seq (left Y axis; squares, AsCpf1; triangles, LbCpf1) and the number of homologous sites differing from the 8 Cpf1 on-target sites by up to 6 nucleotides, binned by the number of mismatches (right Y axis, bars) (error bars indicate s.e.m.);
21d is a graph showing off-target sites validated in human cells by targeted deep sequencing, together with the DNA sequences of the on-target and off-target sites (PAM sequences in bold; mismatched nucleotides in lowercase);
21e is a graph showing targeted mutagenesis (indel frequency, %) at AsCpf1 off-target sites obtained with crRNAs redesigned to hybridize to those off-target sites; and
21f is a graph showing Cpf1 off-target effects when plasmids encoding Cpf1 and crRNA are used versus when RNPs of Cpf1 complexed with crRNA are used; the specificity ratio is the fold difference (RNP/plasmid) between the ratio of on-target to off-target indel frequency obtained with Cpf1 RNP and the same ratio obtained with plasmids. Figures 22a to 22f show sequence logos of Cpf1-mediated Digenome-captured sites; the upper panels are sequence logos of Digenome-captured sites obtained with AsCpf1, and the lower panels those obtained with LbCpf1.
Figure 23 shows sequence logos of Digenome-captured sites.
Figures 24a to 24f are graphs showing indel frequencies at Digenome-captured sites in HEK293T/17 cells; dark bars are results from HEK293T/17 cells transfected with the LbCpf1 plasmid, and light bars are results from HEK293T/17 cells transfected with the AsCpf1 plasmid.
Figure 25 is a graph showing indel frequencies at on-target and off-target sites when crRNAs truncated at the 3' end (tru-crRNAs) or full-length crRNAs are used (error bars represent mean ± s.e.m.). Figures 26a to 26e show that Cpf1 orthologs exhibit different overhang patterns and mutation characteristics, in which
26a shows representative Integrative Genomics Viewer (IGV) images of the overhang patterns at the DNMT1-3 target site and at a second DNMT1 target site, and 26b is a graph showing the number of mutant sequence reads binned by deletion/insertion size in base pairs;
26c shows mutant sequences induced at the target sites of Cpf1 or Cas9: for each nuclease, the first line is the original target sequence and the following lines show mutated sequences; in the first line, the PAM sequence (Cpf1: TTTC) is shown in bold and the target sequence to which the crRNA/sgRNA hybridizes is underlined; in the following lines, underlined sequences are microhomology sequences, and the numbers on the right give the number of nucleotides deleted (shown as '-') or inserted (shown in lowercase);
26d and 26e show the mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9: 26d is a graph showing the proportion of mutant sequences that are deletions versus insertions, and 26e is a graph showing the proportion that are in-frame versus out-of-frame indels (data represent mean ± s.e.m., n = 10 target sites).
Figures 27a and 27b show the mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9, in which
27a is a graph showing the number of mutant sequence reads binned by deletion/insertion (indel) size in base pairs, with the mutation characteristics measured by targeted deep sequencing of HEK293T cells transfected with LbCpf1, AsCpf1, or SpCas9 plasmids; and
27b shows mutant sequences induced at the DNMT1-3 target site (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 42): for each nuclease, the first line is the original target-site sequence and the following lines show mutated sequences; in the first line, the PAM sequence (Cpf1: TTTG) is shown in bold and the target sequence to which the crRNA/sgRNA hybridizes is underlined; in the following lines, underlined sequences are microhomology sequences, and the numbers on the right give the number of nucleotides deleted (shown as '-') or inserted (shown in lowercase).
Figure 28 schematically shows the Digenome-seq process.
Figures 29a and 29b show the split positions of the Cpf1 protein and the construction of recombinant vectors expressing the split Cpf1 protein, in which
29a shows wild-type Acidaminococcus sp. Cpf1 (AsCpf1) protein and the four Split-Cpf1 variants; and
29b schematically shows recombinant vectors expressing each half domain of Split-Cpf1.
Figures 30a to 30c show genome editing results using Split-Cpf1 and crRNA expression vectors, in which
30a is an agarose gel analysis showing the results of DNMT1-targeted genome editing with Split-Cpf1 as determined by T7E1 assay, with asterisks marking the DNA fragments cleaved by the T7E1 enzyme;
30b is a graph comparing genome editing efficiency by split position, quantified by targeted deep sequencing; and
30c is a graph comparing Split-Cpf1 genome editing efficiency across target sites, quantified by targeted deep sequencing.
Figures 31a to 31f show analyses of inducible genome editing efficiency achieved by controlling the association of the Split-Cpf1 half domains, in which
31a schematically shows the recombinant vector constructs expressing each half domain of Inducible-Split-Cpf1;
31b shows the efficiency of DNMT1-3 target genome editing with Split-Cpf1 and Inducible-Split-Cpf1, with and without rapamycin treatment, as determined by targeted deep sequencing; and
31c to 31f show the inducible genome editing efficiency of Inducible-Split-Cpf1 at different target sites, analyzed by targeted deep sequencing. Figures 32a and 32b show the construction of viral vectors expressing each half domain of Split-Cpf1, in which 32a schematically shows the AAV vector constructs expressing each half domain of Split-Cpf1 (Split-3-AsCpf1); and
32b shows the efficiency of DNMT1-3 target genome editing with the AAV-Split-Cpf1 vectors, as determined by T7E1 assay.
Figure 33 shows the nucleotide sequence of the pU6-As-crRNA plasmid; the underlined portion corresponds to the AsCpf1 crRNA.
Figure 34 shows the nucleotide sequence of the pU6-Lb-crRNA plasmid; the underlined portion corresponds to the LbCpf1 crRNA.
Figure 35 shows the nucleotide sequence of the U6-As-crRNA amplicon; the underlined portion corresponds to the AsCpf1 crRNA.
Figure 36 shows the nucleotide sequence of the U6-Lb-crRNA amplicon; the underlined portion corresponds to the LbCpf1 crRNA.
Figure 37 is a graph showing indel frequencies (%), analyzed by deep sequencing, obtained by delivering LbCpf1 protein and a crRNA capable of hybridizing to a target sequence of the Hif1-a gene into 293T cells via an AAV vector.
Figure 38 is a schematic diagram of an exemplary recombinant AAV vector (all-in-one AAV vector) containing, in a single vector, DNA encoding the LbCpf1 protein and DNA encoding a crRNA targeting Lb-TS6 of Hif1-a.
Figures 39a to 39c show, consecutively in the 5' to 3' direction, the nucleotide sequence of the recombinant AAV vector containing, in a single vector, DNA encoding the LbCpf1 protein and DNA encoding a crRNA targeting Lb-TS6 of Hif1-a.
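The specificity ratio described for Figure 21f combines two ratios: on-target over off-target indel frequency for RNP delivery, divided by the same ratio for plasmid delivery. A minimal sketch of that arithmetic follows, assuming this reading of the definition; the function names and the example indel frequencies are hypothetical and are not values taken from the figure.

```python
def specificity_ratio(on_target_pct: float, off_target_pct: float) -> float:
    """Ratio of on-target to off-target indel frequency (both given in %)."""
    if off_target_pct <= 0:
        raise ValueError("off-target indel frequency must be positive to form a ratio")
    return on_target_pct / off_target_pct


def rnp_vs_plasmid_fold_difference(rnp_on: float, rnp_off: float,
                                   plasmid_on: float, plasmid_off: float) -> float:
    """Fold difference (RNP/plasmid) between the two specificity ratios."""
    return specificity_ratio(rnp_on, rnp_off) / specificity_ratio(plasmid_on, plasmid_off)


# Hypothetical indel frequencies (%), chosen only to illustrate the calculation.
print(rnp_vs_plasmid_fold_difference(40.0, 0.5, 50.0, 5.0))  # 8.0
```

Under this reading, a fold difference above 1 would mean that RNP delivery gave the more favorable on-target/off-target ratio at that site.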
【Detailed Description for Carrying Out the Invention】
Hereinafter, the present invention will be described in more detail with reference to Examples. These Examples are provided only to illustrate the present invention more specifically, and it will be apparent to those of ordinary skill in the art to which the present invention pertains that the scope of the present invention is not limited by these Examples.
Example 1: Production and Purification of Recombinant Cpf1 Protein
Plasmids (pMAL-c5x, New England Biolabs; and pDEST-hisMBP) carrying the E. coli codon-optimized DNA sequence of AsCpf1 or LbCpf1 (SEQ ID NO: 44: E. coli codon-optimized AsCpf1-coding nucleic acid; SEQ ID NO: 46: E. coli codon-optimized LbCpf1-coding nucleic acid), together with a sequence for protein expression and purification comprising a nuclear localization sequence (NLS), a linker, and an HA tag (amino acid sequence: (KRPAATKKAGQAKKKK)-(GS)-(YPYDVPDYA)-(YPYDVPDYA YPYDVPDYA); DNA sequence: CGCTTATCCCTACGACGT…CTGATTAT…ATACCCATATGATGTCCC…), were introduced into bacteria (Rosetta; EMD Millipore) and cultured at 18°C for 24 hours to express the AsCpf1 and LbCpf1 proteins. Ten milliliters of the 24-hour culture of Rosetta cells containing the Cpf1 plasmids was added to 2 L of Luria broth (LB) growth medium supplemented with 50 mg/ml carbenicillin and incubated. The cells were grown at 37°C until the OD600 reached 0.6, cooled to 16°C, and induced with 0.5 mM IPTG (isopropyl β-D-1-thiogalactopyranoside) for 14 to 18 hours. The cells were then harvested and frozen at -80°C until protein purification.
Protein purification was performed as follows. The prepared cell pellet was resuspended in 50 ml of lysis buffer (50 mM HEPES pH 7, 200 mM NaCl, 5 mM MgCl2, 1 mM DTT, 10 mM imidazole) supplemented with lysozyme (Sigma) and protease inhibitor (Roche complete, EDTA-free) and lysed by sonication. The resulting cell lysate was centrifuged at 16,000 g for 30 minutes and passed through a syringe filter (0.22 micron). The cleared lysate was applied to a nickel column (Ni-NTA agarose, Qiagen), washed with 2 M salt, and eluted with 250 mM imidazole. The eluted protein solution was buffer-exchanged into lysis buffer lacking magnesium and imidazole and concentrated. The purified Cpf1 protein was checked by SDS-PAGE and used in the following Examples. In the Examples below that use human cells, a plasmid encoding human codon-optimized Cpf1 protein, obtained from Addgene, was used in place of the plasmid encoding the E. coli codon-optimized Cpf1 protein.
The SDS-PAGE results obtained above are shown in Figure 15a.
Example 2: Cell Culture and Transfection
HEK293T cells were maintained in DMEM supplemented with 10% (v/v) FBS (fetal bovine serum) and 1% (v/v) antibiotics. For Cpf1-mediated genome editing, HEK293T cells were seeded in 24-well plates at 70-80% confluency and then transfected with a Cpf1 expression plasmid (500 ng) and a crRNA plasmid (500 ng) using Lipofectamine 2000 (Invitrogen). Genomic DNA was isolated 72 hours after transfection using the DNeasy Blood & Tissue Kit (Qiagen).
Example 3: Preparation of RNP and Digenome (Digested Genome) (In Vitro Cleavage of Genomic DNA)
Genomic DNA was purified from HeLa cells (ATCC) using the DNeasy Tissue kit (Qiagen). Cpf1 protein (40 μg) and crRNA (2.7 μg each) were pre-incubated at room temperature for 10 minutes to form a ribonucleoprotein (RNP) complex. The purified genomic DNA (8 μg) was incubated with the RNP complex in reaction buffer (100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA, pH 7.9) at 37°C for 8 hours. The resulting digested genomic DNA was treated with RNase A (50 μg/mL) to degrade the crRNA and purified once more using the DNeasy Tissue kit (Qiagen).
Example 4: Sequencing of the Whole Genome and the Digenome
Whole-genome sequencing (WGS) was performed on genomic DNA digested with Cas9 or Cpf1, at a sequencing depth of 30x to 40x, using an Illumina HiSeq X Ten sequencer (Macrogen, South Korea). From the WGS data, a DNA cleavage score can be calculated for each nucleotide position across the entire genome. The cleavage score at position i in a chromosome was calculated with the following formula (see Figure 28):
[Formula image not reproduced; it is expressed in terms of the number of forward sequence reads starting at position i and the number of reverse sequence reads starting at position i.]
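Because the scoring formula is only available as an image, it is not reproduced here. As a hedged illustration of its inputs, the sketch below tallies forward-strand and reverse-strand read starts per position and flags positions where a pile-up of forward starts at i coincides with a pile-up of reverse starts within a small window, which is how staggered cuts show up in aligned reads. The score (the smaller of the two pile-ups), the symmetric `window`, the `min_score` threshold, and the input format of (5'-end position, strand) pairs are all hypothetical simplifications, not the patent's formula or its 2.5 cutoff.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_read_starts(reads: Iterable[Tuple[int, str]]) -> Tuple[Counter, Counter]:
    """Tally 5'-end positions of aligned reads separately for '+' and '-' strands."""
    forward: Counter = Counter()
    reverse: Counter = Counter()
    for position, strand in reads:
        if strand == "+":
            forward[position] += 1
        elif strand == "-":
            reverse[position] += 1
        else:
            raise ValueError(f"unknown strand {strand!r}")
    return forward, reverse


def candidate_cleavage_sites(reads: Iterable[Tuple[int, str]],
                             window: int = 5,
                             min_score: int = 2) -> List[Tuple[int, int]]:
    """Toy score: forward starts at i versus the largest reverse pile-up within +/- window."""
    forward, reverse = count_read_starts(reads)
    hits = []
    for i, f_count in sorted(forward.items()):
        # Allow the reverse-strand pile-up to sit a few bases away (staggered ends).
        best_reverse = max(reverse.get(i + a, 0) for a in range(-window, window + 1))
        score = min(f_count, best_reverse)
        if score >= min_score:
            hits.append((i, score))
    return hits


# Hypothetical reads: a staggered cut near position 100 plus two unrelated reads.
reads = [(100, "+")] * 3 + [(104, "-")] * 3 + [(250, "+"), (900, "-")]
print(candidate_cleavage_sites(reads))  # [(100, 3)]
```

Narrowing `window` below the overhang length makes the paired pile-ups invisible, which is why any real score has to encode the expected end structure of the nuclease.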
The formula assumes that, in addition to blunt ends, Cas9 produces 1-nt to 2-nt overhangs at the 5' and 3' ends, and that Cpf1 produces 1-nt to 5-nt overhangs at the 5' end. In vitro cleavage sites whose DNA cleavage score from this formula was at or above the cutoff value of 2.5 were identified computationally.
Example 5: Comparison of Cellular Genome Editing Efficiency According to crRNA Construct
To compare cellular genome editing efficiency when crRNA is delivered as a PCR product (PCR amplicon) containing a cassette capable of expressing the crRNA versus as plasmid DNA containing such a cassette, lipofection experiments were performed in HEK293T/17 cells (ATCC) as follows.
A pcDNA3.1 vector (Invitrogen) containing DNA encoding a Cpf1 protein (AsCpf1 or LbCpf1) operably linked to a CMV promoter (SEQ ID NO: 64) (AsCpf1 plasmid or LbCpf1 plasmid) was delivered into HEK293T/17 cells together with either a pUC19 vector (Addgene) containing a crRNA-encoding DNA sequence operably linked to a U6 promoter (As-crRNA plasmid, SEQ ID NO: 65 and Figure 33; or Lb-crRNA plasmid, SEQ ID NO: 66 and Figure 34) or a PCR product (amplicon; As-crRNA amplicon, SEQ ID NO: 67 and Figure 35; or Lb-crRNA amplicon, SEQ ID NO: 68 and Figure 36). In Figures 33 to 36, the underlined portion is the gene region encoding the crRNA, and 'NNNNNNNNNNNNNNNNNNNNNNN' is the portion determined by the target sequence. The DNA encoding the Cpf1 protein and the crRNA was delivered in all cases by lipofection. The cell delivery conditions are summarized in Table 3 below: 【Table 3】
[Table 3 image not reproduced]
The crRNA sequences and target sequences used above are also summarized in Table 4 below: 【Table 4】
[Table 4 image not reproduced]
((1) Unless otherwise specified, the nucleotide sequences described herein, including those in Table 4, are written in the 5' to 3' direction.
(2) All AsCpf1 crRNAs described below are obtained by replacing the targeting-sequence portion (underlined) of SEQ ID NO: 36 in Table 4 with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence with T replaced by U).
(3) All LbCpf1 crRNAs described below are obtained by replacing the targeting-sequence portion (underlined) of SEQ ID NO: 37 in Table 4 with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence with T replaced by U).)
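Notes (2) and (3) define each crRNA as a fixed backbone whose targeting portion is replaced by the target sequence with T changed to U. Because the backbone sequences (SEQ ID NOs: 36 and 37) appear only in the Table 4 image, the sketch below takes the backbone as a caller-supplied string and assumes the targeting portion sits at its 3' end, the usual layout for Cpf1 crRNAs (repeat 5' of the spacer). `HYPOTHETICAL_REPEAT` is a placeholder, not the patent's sequence.

```python
def dna_to_rna(seq: str) -> str:
    """Return the RNA equivalent of a DNA sequence (T -> U), upper-cased."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected bases: {sorted(invalid)}")
    return seq.replace("T", "U")


def build_crrna(repeat_rna: str, target_dna: str) -> str:
    """Join a 5' backbone (already RNA) to the target sequence converted to RNA."""
    return repeat_rna + dna_to_rna(target_dna)


# Placeholder for the non-targeting part of SEQ ID NO: 36 or 37 (not reproduced here).
HYPOTHETICAL_REPEAT = "<repeat>"
print(build_crrna(HYPOTHETICAL_REPEAT, "CTGATGGTCCATGTCTGTTACTC"))
```

The target used above is the DNMT1-3 Cpf1 target sequence listed in Table 6; the printed spacer portion is CUGAUGGUCCAUGUCUGUUACUC.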
After DNA delivery, the cells were cultured at 37°C for 72 hours and genomic DNA was isolated from each sample. The frequency of targeted mutagenesis (indel frequency, %) in the target DNA was then determined by T7E1 assay (PCR amplification of the specific region of genomic DNA, treatment with T7 Endonuclease I at 37°C for 20 minutes, and electrophoresis) and by targeted deep sequencing (PCR amplification of the target region of the target gene, a second round of PCR with barcoded primers for deep sequencing, purification with a DNA purification kit, and sequencing). The results are shown in Figure 14a (T7E1 assay), Figure 14b (targeted deep sequencing), and Figure 19a (T7E1 assay).
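The two read-outs named above can be turned into percentages as sketched below. For targeted deep sequencing, the indel frequency is taken as the share of aligned reads carrying an insertion or deletion at the target site. For T7E1 gels, the text does not say how bands were quantified, so the widely used estimate 100 × (1 − sqrt(1 − f_cut)), with f_cut the cleaved fraction of band intensity, is an assumption here. The in-frame check mirrors the in-frame versus out-of-frame split of Figure 26e, using the net indel length. Function names are illustrative.

```python
import math
from typing import Sequence


def indel_frequency_from_reads(indel_reads: int, total_reads: int) -> float:
    """Percentage of aligned reads that carry an indel at the target site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must lie between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


def indel_frequency_from_t7e1(uncut_intensity: float,
                              cleaved_intensities: Sequence[float]) -> float:
    """Estimate % indels from T7E1 band intensities (parental band plus cleavage products)."""
    cleaved = sum(cleaved_intensities)
    total = uncut_intensity + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = cleaved / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


def is_in_frame(net_indel_length: int) -> bool:
    """An indel keeps the reading frame when its net length is a multiple of 3."""
    if net_indel_length == 0:
        raise ValueError("a net length of 0 is not an indel")
    return net_indel_length % 3 == 0


print(indel_frequency_from_reads(250, 1000))                   # 25.0
print(round(indel_frequency_from_t7e1(64.0, [18.0, 18.0]), 1))  # 20.0
print(is_in_frame(-3), is_in_frame(-1))                        # True False
```

The square-root term reflects the assumption that T7E1 only cleaves heteroduplexes formed when edited and unedited strands reanneal, so the cleaved fraction underestimates the indel frequency.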
As shown in Figures 14a and 14b, when the DNMT1 gene was targeted, delivering the crRNA as a plasmid gave higher genome editing efficiency than delivering it as a PCR product, for both AsCpf1 and LbCpf1. A similar trend was observed when the AAVS1 gene was targeted. Furthermore, as shown in Figure 19a, using crRNA plasmids instead of amplicons increased the frequency of targeted mutagenesis by about 2- to 30-fold at the three endogenous target sites tested. PCR amplicons produced faulty guide RNA transcripts from synthesis-failed oligonucleotide templates, which are thought to cause off-target DNA cleavage at sites that appear to harbor potential RNA bulges. These results show that delivering the crRNA expression cassette as a plasmid, rather than as a PCR product, is a means of increasing genome editing efficiency.
In addition, crRNA orthogonality was tested for Cpf1 orthologs of various origins: Lachnospiraceae bacterium (LbCpf1), Acidaminococcus sp. (AsCpf1), Francisella novicida (FnCpf1), and Moraxella bovoculi 237 (MbCpf1).
Following the procedure described above, plasmids containing DNA encoding each of the four Cpf1 orthologs (LbCpf1, AsCpf1, FnCpf1, and MbCpf1) were introduced into HEK293T cells in various combinations with plasmids encoding the crRNA for each ortholog, and the frequency of targeted mutagenesis (indel frequency, %) was measured by targeted deep sequencing.
The crRNA sequences for FnCpf1 and MbCpf1 used here are summarized in Table 5 below:
【Table 5】
[Table 5 image not reproduced]
(In Table 5, the crRNAs for DNMT1-4 and AAVS1 are obtained by replacing the targeting-sequence portion (underlined) of SEQ ID NO: 38 or SEQ ID NO: 39 with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence with T replaced by U).)
The resulting indel frequencies (%) are shown in FIG. 19B. LbCpf1 and AsCpf1 recognize 5'-TTTN-3' PAMs. FnCpf1 and MbCpf1 recognize 5'-TTN-3' PAMs and have previously been reported to be inefficient or inactive in human cells. As shown in FIG. 19B, these Cpf1 orthologs were co-transfected into human cells in various combinations with plasmids encoding crRNA orthologs. Under these conditions, each Cpf1 ortholog was most efficient when transfected with its cognate crRNA. All four Cpf1 orthologs, including FnCpf1 and MbCpf1, could also cleave chromosomal target sites when combined with non-cognate crRNAs derived from other species. The genome-editing activity of FnCpf1 and MbCpf1 could be rescued by using crRNA plasmids. However, AsCpf1 and LbCpf1 were the most efficient of the orthologs tested, so this study focused on these two Cpf1 proteins (AsCpf1 and LbCpf1).
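As a rough illustration of the two PAM classes named above, the following Python sketch checks which orthologs could accept the bases immediately 5' of a protospacer. The consensus strings come from the text: TTTN for LbCpf1 and AsCpf1, and TTN for FnCpf1 and MbCpf1. The helper names are hypothetical, and actual PAM preferences are more nuanced than a single consensus string.

```python
import re
from typing import List

# IUPAC codes needed for the PAMs discussed here (subset only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# PAM consensus per ortholog as stated in the text (5'->3'; PAM lies 5' of the protospacer).
PAMS = {"LbCpf1": "TTTN", "AsCpf1": "TTTN", "FnCpf1": "TTN", "MbCpf1": "TTN"}


def pam_regex(consensus: str) -> re.Pattern:
    """Translate an IUPAC PAM consensus into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def compatible_orthologs(upstream: str) -> List[str]:
    """Orthologs whose PAM matches the bases immediately 5' of a protospacer."""
    upstream = upstream.upper()
    hits = []
    for name, consensus in PAMS.items():
        k = len(consensus)
        if len(upstream) >= k and pam_regex(consensus).fullmatch(upstream[-k:]):
            hits.append(name)
    return hits
```

For example, an upstream sequence ending in GTTA satisfies the TTN consensus of FnCpf1 and MbCpf1 but not the TTTN consensus of LbCpf1 and AsCpf1.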
The genome-editing efficiencies of LbCpf1 and AsCpf1 were measured at 10 chromosomal target sites in HEK293T cells and compared with those of SpCas9. Each site contains two PAM sequences: the Cpf1 PAM (5'-TTTN-3') and the SpCas9 PAM (5'-NGG-3'). Genome-editing efficiency was calculated as the indel frequency measured by targeted deep sequencing, following the method described above. The 10 target sequences used in the test are listed in Table 6 below:
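Indel frequency, the efficiency measure used throughout, is the percentage of sequencing reads that carry an insertion or deletion. The Python sketch below illustrates that calculation under two assumptions: each read has already been aligned to the amplicon, and each alignment is summarized by its CIGAR string. The function names are hypothetical. A real analysis would usually count indels only within a window around the expected cleavage site and would filter out sequencing errors; this sketch does neither.

```python
import re
from typing import Iterable

# CIGAR insertion (I) or deletion (D) operations, e.g. "2D" in "70M2D80M"
_INDEL = re.compile(r"\d+[ID]")


def has_indel(cigar: str) -> bool:
    """True if the alignment contains at least one insertion or deletion."""
    return bool(_INDEL.search(cigar))


def indel_frequency(cigars: Iterable[str]) -> float:
    """Percentage of aligned reads whose CIGAR string contains an indel."""
    cigars = list(cigars)
    if not cigars:
        raise ValueError("no reads")
    return 100.0 * sum(has_indel(c) for c in cigars) / len(cigars)
```

With reads `["150M", "70M2D80M", "75M1I74M", "150M"]`, two of the four alignments contain an indel, so the function returns 50.0.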
Table 6
No. | Gene | Cpf1 crRNA target sequence | SpCas9 sgRNA target sequence
1 | DNMT1-3 | CTGATGGTCCATGTCTGTTACTC (SEQ ID NO: 19) | AGTAACAGACATGGACCATC (SEQ ID NO: 50)
2 | DNMT1-4 | TTTCCCTTCAGCTAAAATAAAGG (SEQ ID NO: 20) | TTTCCCTTCAGCTAAAATAA (SEQ ID NO: 51)
3 | AAVS1 | CTTACGATGGAGCCAGAGAGGAT (SEQ ID NO: 21) | TGCTTACGATGGAGCCAGAG (SEQ ID NO: 52)
4 | EMX1 | TCCTCCGGTTCTGGAACCACACC (SEQ ID NO: 23) | AGGTGTGGTTCCAGAACCGG (SEQ ID NO: 53)
5 | CCR5-1 | GTGGGCAACATGCTGGTCATCCT (SEQ ID NO: 24) | TGGTTTTGTGGGCAACATGC (SEQ ID NO: 54)
6 | CCR5-9 | GCCTGAATAATTGCAGTAGCTCT (SEQ ID NO: 25) | TAGAGCTACTGCAATTATTC (SEQ ID NO: 55)
7 | HPRT-1 | CTGACCTGCTGGATTACATCAAA (SEQ ID NO: 27) | GTGCTTTGATGTAATCCAGC (SEQ ID NO: 56)
8 | HPRT-4 | TGTCCCCTGTTGACTGGTCATTC (SEQ ID NO: 28) | CTAGAATGACCAGTCAACAG (SEQ ID NO: 57)
9 | HBB-1 | AGTCCTTTGGGGATCTGTCCACT (SEQ ID NO: 40) | TCCACTCCTGATGCTGTTAT (SEQ ID NO: 58)
10 | VEGFA | CGTCC?CTCTGGGCTGTTCTC (SEQ ID NO: 41) | AGCGAGAACAGCCCAGAAGT (SEQ ID NO: 59)
(? marks character(s) illegible in the source.)
LbCpf1 and AsCpf1 crRNAs were prepared from the target sequences shown in Table 6 by the method described in Table 4 and used in the test.
Each SpCas9 sgRNA was prepared so that (Ncas9)m in the general formula below (SEQ ID NO: 63) is replaced by the SpCas9 target sequence from Table 6 with T substituted by U, and so that it contains 'G A' as a linker. SpCas9 sgRNAs described hereafter were prepared in the same way:
5'-(Ncas9)m-GUUUCAGUUGC-(linker)-AUGCUCUGU UCAUUUAA GUAUUUUG CGGACCUCUGUUUGACACGUCUGAAUAACUAAAAA-3' (General Formula 4; SEQ ID NO: 63)
The results are shown in FIG. 19C. As shown in FIG. 19C, all nuclease types tested induced mutations in human cells (HEK293 cells) over a wide range of frequencies (SpCas9: mean 37 ± 5%; LbCpf1: 21 ± 6%; AsCpf1: 21 ± 5%).
Example 6: Genome editing in cells by purification of recombinant Cpf1 protein and ribonucleoprotein (RNP) delivery
6.1. In vitro cleavage assay using recombinant Cpf1 protein
An in vitro cleavage assay was performed to determine whether the purified recombinant AsCpf1 and LbCpf1 proteins bind crRNA and cleave DNA. Three components were incubated together at 37 °C for 1 hour:
- recombinant AsCpf1 (1 µM) or LbCpf1 (1 µM) obtained in Example 1;
- a DNMT1-targeting crRNA (see Table 4; 1 µM), either produced by in vitro transcription with T7 RNA polymerase (New England Biolabs) or chemically synthesized;
- a DNA fragment containing the DNMT1 target sequence (see Table 4).
Cleavage of the target DNA was then confirmed by TBE-agarose gel electrophoresis. crRNA produced by in vitro transcription with T7 RNA polymerase carries a 5'-terminal triphosphate (PPP), whereas chemically synthesized crRNA does not. The electrophoresis results are shown in FIG. 15B (T7: crRNA prepared by in vitro transcription with T7 RNA polymerase; synthetic: chemically synthesized crRNA).
As shown in FIG. 15B, Cpf1 cleaved the target DNA only in the presence of crRNA. Synthetic crRNA lacking a 5'-terminal phosphate and in vitro-transcribed crRNA carrying one gave similar cleavage efficiencies. This indicates that the presence or absence of a 5'-terminal phosphate on the crRNA does not affect in vitro cleavage.
6.2. Genome-editing test in cells using recombinant Cpf1 protein
Recombinant AsCpf1 and LbCpf1 proteins were applied in cell experiments to test cellular genome editing via ribonucleoprotein (RNP) delivery.
The recombinant Cpf1 protein (AsCpf1 or LbCpf1) purified in Example 1 was mixed at an appropriate ratio with a DNMT1-3-targeting crRNA (see Table 4; prepared by in vitro transcription) to form an RNP. The RNP was delivered to HEK293T/17 cells by electroporation or lipofection:
- electroporation: 20 µg Cpf1 mixed with 20 µg crRNA;
- lipofection: 10 µg Cpf1 mixed with 2 µg crRNA.
After RNP delivery, the cells were cultured at 37 °C for 72 hours and genomic DNA was isolated. The mutation frequency (%) at the target site (DNMT1) was then determined by T7E1 assay and by targeted deep sequencing, following the method described in Example 5. For comparison, the same test was performed with SpCas9 (SwissProt accession number Q99ZW2; NP_269215.1) and an sgRNA (target sequence: AGTACGTTAATGTTTCCTGA). The results are shown in FIG. 16A (T7E1 assay) and FIG. 16B (targeted deep sequencing).
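This section does not state how T7E1 band intensities were converted into mutation frequencies. One commonly used estimate, shown here as an assumption and not as the authors' method, is indel % = 100 × (1 − √(1 − f_cut)), where f_cut is the fraction of the PCR product cleaved by T7E1. The formula assumes random reannealing of edited and unedited strands. The function name is hypothetical.

```python
import math
from typing import Sequence


def t7e1_indel_percent(uncut: float, cleaved: Sequence[float]) -> float:
    """Estimate indel % from T7E1 gel band intensities.

    fraction_cut = sum(cleaved) / (uncut + sum(cleaved))
    indel %      = 100 * (1 - sqrt(1 - fraction_cut))
    """
    total = uncut + sum(cleaved)
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cut = sum(cleaved) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))
```

For an uncut band of 75 and two cleavage bands of 12.5 each (arbitrary units), f_cut = 0.25 and the estimate is about 13.4%. Because heteroduplex formation caps the detectable signal, T7E1 tends to underestimate high editing rates relative to deep sequencing.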
As shown in FIGS. 16A and 16B, crRNA-bound AsCpf1 and LbCpf1 RNPs induced mutations at the target site (DNMT1) with efficiencies similar to SpCas9. This held for delivery by both electroporation and lipofection.
RNP delivery by electroporation, as described above, was also performed with synthetic crRNA lacking a 5' phosphate. The resulting genome-editing efficiency in cells was measured and compared with that obtained using crRNA prepared by in vitro transcription. The results are shown in FIG. 16C. As shown in FIG. 16C, synthetic crRNA gives genome-editing efficiencies similar to those of in vitro-transcribed crRNA.
These results show that RNPs containing recombinant Cpf1 protein can be used effectively for cellular genome editing, whether delivered into cells by electroporation or by lipofection. RNP delivery offers two advantages over DNA plasmid delivery. First, it achieves effective genome editing in a shorter time. Second, because no DNA is used, there is no risk of foreign DNA being inserted into the cellular genome. In addition, the Cpf1 PAM differs in sequence from the Cas9 PAM, so Cpf1 enables editing at genomic positions that cannot be targeted by Cas9. Using Cas9 and Cpf1 proteins orthogonally allows different target genes to be edited simultaneously. Using a catalytically dead Cpf1 mutant (dCpf1) together with dCas9 also makes it possible to selectively activate and repress the expression of multiple target genes at the same time.
Example 7. Identification of the inverted PAM repeat of Cpf1 using Digenome-seq
Genomic DNA isolated from cells was incubated for 12 hours with recombinant Cpf1 protein (3 to 300 nM) and crRNA (9 to 900 nM) (see FIG. 17A). The crRNAs used were those for sequences 1 to 8 of Table 6 (SEQ ID NOs: 19, 20, 21, 23, 24, 25, 27, and 28). After 12 hours, the Cpf1 protein and crRNA were removed with proteinase K and RNase A, respectively, and the genomic DNA was purified. Cleavage efficiency at the target site was then quantified by qPCR:
- forward primer: AAGTCACTCTGGGGAACACG;
- reverse primer: TCCCTTAGCACTCTGCCACT;
- PCR conditions: 2-step, 95 °C for 10 s and 60 °C for 10 s, 40 cycles.
The results are shown in FIG. 17B. The y-axis of FIG. 17B gives the relative proportion of uncut genomic DNA, with the control set to 1. As shown in FIG. 17B, about 60% of the genomic DNA at the on-target site was cleaved with 3 nM LbCpf1/AsCpf1 protein and 9 nM crRNA. More than 95% was cleaved with 30 nM AsCpf1/LbCpf1 protein and 90 nM crRNA, and also with 300 nM LbCpf1/AsCpf1 protein and 900 nM crRNA.
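A relative-quantity calculation of the kind plotted on the y-axis of FIG. 17B can be sketched as follows. The section does not describe the normalization, so the sketch makes three assumptions: equal template input across samples, an amplicon spanning the cut site (so only intact DNA amplifies), and a fixed amplification efficiency. The function name and the `efficiency` parameter are hypothetical. Normalizing to a reference locus (ΔΔCt) would be a natural extension.

```python
def relative_uncut(ct_sample: float, ct_control: float, efficiency: float = 1.0) -> float:
    """Uncut template in the sample relative to the untreated control (control = 1).

    Assumes equal input DNA and an amplicon spanning the cleavage site, so only
    intact molecules amplify. efficiency=1.0 means perfect doubling per cycle.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** -(ct_sample - ct_control)
```

A sample that crosses threshold one cycle later than the control retains half of the intact template (0.5), i.e. about 50% cleavage. Three cycles later corresponds to 0.125.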
Whole-genome sequencing was performed on genomic DNA cleaved by Cpf1 protein and crRNA, and the results were viewed with the Integrative Genomics Viewer (IGV) (FIG. 17C). As shown in FIG. 17C, in genomic DNA treated with Cpf1 protein and crRNA, the 5' ends of the reads were vertically aligned at the target site. In genomic DNA not treated with Cpf1 protein and crRNA, sequence reads showed no tendency to align at the target site.
To identify off-target sites, Digenome-seq was performed on genomic DNA cleaved by Cpf1 protein and crRNA (see Example 4). The results are shown in FIG. 18A. As shown in FIG. 18A, one on-target site and 25 candidate off-target sites were identified.
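Conceptually, Digenome-seq candidates are positions where many read 5' ends pile up, as in the vertical alignment seen in IGV. The Python sketch below is a simplified stand-in for that idea, not the Digenome-seq scoring used in Example 4. It counts read 5' ends per strand and reports forward-strand pile-ups that have a reverse-strand pile-up nearby. The read representation, the function names, and the thresholds `min_reads` and `max_offset` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (leftmost 0-based start, exclusive end, strand) for each aligned read
Read = Tuple[int, int, str]


def five_prime_end(read: Read) -> Tuple[int, str]:
    """Genomic position of a read's 5' end: leftmost base on '+', rightmost on '-'."""
    start, end, strand = read
    return (start, "+") if strand == "+" else (end - 1, "-")


def candidate_sites(reads: Iterable[Read], min_reads: int = 5,
                    max_offset: int = 6) -> List[Tuple[int, int]]:
    """Pair forward- and reverse-strand 5'-end pile-ups lying within max_offset nt.

    Cpf1 leaves a staggered cut, so the two pile-ups can sit a few nt apart;
    max_offset allows for that.
    """
    counts = Counter(five_prime_end(r) for r in reads)
    fwd = sorted(p for (p, s), n in counts.items() if s == "+" and n >= min_reads)
    rev = [p for (p, s), n in counts.items() if s == "-" and n >= min_reads]
    hits = []
    for p in fwd:
        nearby = [q for q in rev if abs(q - p) <= max_offset]
        if nearby:
            hits.append((p, min(nearby, key=lambda q: abs(q - p))))
    return hits
```

As an example, consider six forward reads starting at position 1000 and six reverse reads ending at position 1004, plus scattered single reads. The sketch returns one candidate, `(1000, 1004)`.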
A sequence logo generated from the 26 sites obtained above is shown in FIG. 18B. As shown in FIG. 18B, an inverted PAM sequence (NAAA) is present on the opposite side of the site, in addition to the known Cpf1 PAM sequence (TTTN). The inverted PAM appeared mostly as AAA, and in some cases as AAG, AGA, or GAA. These results suggest a mechanism for Cpf1 cleavage of genomic DNA. One Cpf1 protein binds the genome through its crRNA and forms a dimer with a second Cpf1 protein, which in turn binds the opposite PAM sequence (NAAA).

This inverted-PAM information could be used to select target sites with high Cpf1 cleavage efficiency. At target sites bearing such an inverted PAM, using two or more Cpf1 crRNAs simultaneously, in a manner similar to nickases, may increase cleavage efficiency. The same information might also be used to adjust the length of the overhang formed at the cleavage site, thereby increasing the efficiency of homologous recombination (HR)-mediated knock-in.
Example 8: Mismatch tolerance test of Cpf1
Both LbCpf1 and AsCpf1 recognize and cleave a 27-nt target DNA sequence. This sequence consists of a 5'-TTTN-3' PAM (N is A, T, C, or G) followed immediately 3' by a 23-nt protospacer sequence. The protospacer matches the targeting sequence of the crRNA; that is, the crRNA targeting sequence is the protospacer sequence with T converted to U.
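The target structure described above (a 5'-TTTN-3' PAM followed by a 23-nt protospacer) can be located with a regular-expression scan. The sketch below scans one strand only and reports overlapping sites. The function name is hypothetical, and so is the option of passing a shorter PAM pattern such as `TT[ACGT]` (the TTN PAM of FnCpf1 and MbCpf1).

```python
import re
from typing import List, Tuple


def find_cpf1_sites(seq: str, pam_regex: str = "TTT[ACGT]",
                    spacer_len: int = 23) -> List[Tuple[int, str, str]]:
    """Return (start, PAM, protospacer) for every PAM followed by a full-length
    protospacer on the given strand; the lookahead lets sites overlap."""
    seq = seq.upper()
    pattern = re.compile(f"(?=({pam_regex})([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

Applied to the 27-nt DNMT1-3 on-target sequence TTTCCTGATGGTCCATGTCTGTTACTC, the default pattern finds one site: PAM TTTC and protospacer CTGATGGTCCATGTCTGTTACTC. The corresponding crRNA targeting sequence is that protospacer with T replaced by U. Scanning the reverse strand would require running the function on the reverse complement.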
Three endogenous target sites (DNMT1-3, DNMT1-4, and AAVS1) were selected as on-targets. HEK293 cells were transfected with two kinds of plasmid:
- plasmids encoding crRNAs that hybridize either with the on-target sequence or with off-target sequences containing one or two mismatches;
- plasmids encoding LbCpf1 or AsCpf1.
Indel frequencies (%) were then measured by targeted deep sequencing. This tested the extent to which Cpf1 tolerates mismatches between the on-target DNA sequence and the crRNA sequence.
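When describing mismatched crRNAs like those used in this test, it helps to record where each mismatch sits relative to the PAM. The small sketch below assumes both sequences are given 5'→3' starting at the PAM-proximal base; for Cpf1 the protospacer lies 3' of the PAM, so position 1 is the base immediately after the PAM. The function name is hypothetical. The example compares the DNMT1-3 on-target protospacer with the off-target DNMT1-3_02 listed in Table 8.

```python
from typing import List


def mismatch_positions(protospacer: str, site: str) -> List[int]:
    """1-based positions, counted from the PAM-proximal end, where `site` differs.

    Case is ignored, so off-target tables that mark mismatches in lowercase
    can be passed in directly.
    """
    if len(protospacer) != len(site):
        raise ValueError("sequences must have equal length")
    pairs = zip(protospacer.upper(), site.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


if __name__ == "__main__":
    on_target = "CTGATGGTCCATGTCTGTTACTC"      # DNMT1-3 (SEQ ID NO: 19)
    off_target = "CTGcTGGTCCATGTCTaaTACTC"     # DNMT1-3_02 protospacer, PAM removed
    print(mismatch_positions(on_target, off_target))
```

For this pair the function reports mismatches at positions 4, 17, and 18, which agrees with the mismatch count of 3 given for DNMT1-3_02.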
The three selected endogenous target sites (on-target) are shown in Table 7 below:
Table 7
[Table image: imgf000043_0001]
The off-target sequences for the three selected endogenous target sites are shown in FIGS. 20A, 20B, and 20C, respectively.
LbCpf1 and AsCpf1 crRNAs were prepared from the on-target and off-target sequences shown in Table 7 and FIGS. 20A to 20C by the method described in Table 4, and then used in the test. The resulting indel frequencies (%) are shown in FIG. 20A (DNMT1-3), FIG. 20B (DNMT1-4), and FIG. 20C (AAVS1). Error bars indicate s.e.m.
As shown in FIGS. 20A to 20C, for DNMT1-3 (FIG. 20A) and DNMT1-4 (FIG. 20B), both LbCpf1 and AsCpf1 showed almost no activity when a single mismatch was present. This was especially so when the mismatch lay within 20 nt of the PAM (distance measured from the 5' end). With two mismatches, particularly within 20 nt of the PAM, Cpf1 activity was almost completely lost. These results show that Cpf1 has high specificity in human cells.
Example 9: Identification of potential off-target sites in the human genome
Potential off-target sites in the human genome were identified with Cas-OFFinder. Sites differing from the 10 on-target sites tested above (Table 6) by 1 to 4 or 1 to 5 nucleotides were selected as potential off-target sites. Off-target mutation frequencies (indel frequency, %) at these sites in HEK293 cells were measured by targeted deep sequencing.
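Cas-OFFinder enumerates genomic sites that differ from a target by up to a set number of mismatches. The sketch below illustrates the same idea with a brute-force scan of both strands of a short sequence. It is not Cas-OFFinder's algorithm or interface, and it ignores DNA and RNA bulges. The function and parameter names are hypothetical, and the default PAM pattern encodes the TTTN PAM described above.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def search_off_targets(genome: str, protospacer: str, max_mismatches: int,
                       pam_regex: str = "TTT[ACGT]") -> List[Tuple[str, int, str, int]]:
    """Return (strand, PAM start, PAM + site, mismatches) for every PAM-adjacent
    site with at most `max_mismatches` mismatches (substitutions only)."""
    genome, protospacer = genome.upper(), protospacer.upper()
    n = len(protospacer)
    pam = re.compile(f"(?=({pam_regex}))")  # lookahead: overlapping PAMs allowed
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for m in pam.finditer(seq):
            start = m.start() + len(m.group(1))
            site = seq[start:start + n]
            if len(site) < n:
                continue  # PAM too close to the end for a full protospacer
            mm = sum(a != b for a, b in zip(site, protospacer))
            if mm <= max_mismatches:
                # positions on '-' are in reverse-complement coordinates
                hits.append((strand, m.start(), seq[m.start():start] + site, mm))
    return hits
```

Run against the DNMT1-3_02 sequence from Table 8, the sketch reports one site with three mismatches when `max_mismatches` is 3, and nothing when it is 2. A genome-scale search needs an indexed or parallel tool such as Cas-OFFinder itself.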
Table 8
ID | Location | PAM-target sequence | Mis. no. | Indel (%), (-)Cpf1 | Indel (%), AsCpf1 | Indel (%), LbCpf1 | D-cap., As | D-cap., Lb
On-target | Chr19:10244442 | TTTCCTGATGGTCCATGTCTGTTACTC (TTTC-SEQ ID NO: 19) | 0 | 0.010 | 47.16 | 34.45 | O | O
DNMT1-3_02 | Chr7:68777905 | TTTCCTGcTGGTCCATGTCTaaTACTC | 3 | 0.01 | 0.00 | 0.01 | X | X
DNMT1-3_03 | Chr16:75745870 | TTTTCTGATGGTCCATacCTGTTACaC | 3 | 0.00 | 0.01 | 0.00 | O | O
DNMT1-3_04 | ChrX:82549885 | TTTCCTGATGGTCCAcacCTGTTACaC | 4 | 0.03 | 0.02 | 0.03 | O | X
DNMT1-3_05 | Chr11:56468896 | TTTTCTtATtGTaCATGTCTGTaACTC | 4 | 0.01 | 0.01 | 0.00 | X | X
DNMT1-3_06 | Chr8:72133967 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.01 | 0.02 | 0.01 | X | X
DNMT1-3_07 | Chr8:96877520 | TTTTCTGcTtcTCCATGTtTGTTACTt | 5 | 0.00 | 0.01 | 0.01 | X | X
DNMT1-3_08 | Chr12:3176249 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.03 | 0.03 | 0.03 | X | X
DNMT1-3_09 | Chr12:61150324 | TTTCCTGAgGGTgCATtTgTGTTtCTC | 5 | 0.02 | 0.01 | 0.01 | X | X
DNMT1-3_10 | Chr3:85367521 | TTTTCTGtTtGTCCAatTCTGTTACTg | 5 | 0.01 | 0.00 | 0.01 | X | X
DNMT1-3_11 | Chr3:96050481 | TTTCCTGATGGTCCATactTGTTgCaC | 5 | 0.01 | 0.00 | 0.00 | X | X
DNMT1-3_12 | Chr3:140583435 | TTTTCTGcTGcTCCcTGTCTGTTttTC | 5 | 0.03 | 0.02 | 0.02 | X | X
DNMT1-3_13 | Chr3:156104287 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.01 | X | X
DNMT1-3_14 | Chr3:183313194 | TTTTCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_15 | Chr3:189760807 | TTTGCTaATaGgCCATGTaTGgTACTC | 5 | 0.03 | 0.02 | 0.02 | X | X
DNMT1-3_16 | Chr7:17901053 | TTTCCTGATGGTCCAcagCTGTcACaC | 5 | 0.01 | 0.01 | 0.02 | X | X
DNMT1-3_17 | Chr7:47459950 | TTTCCTGATGGTCCAcGcCTaTTgCaC | 5 | 0.03 | 0.01 | 0.01 | X | X
DNMT1-3_18 | Chr7:54229603 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.01 | X | X
DNMT1-3_19 | Chr7:105875645 | TTTCCTGATGGTtCAcaTCTGTTgCaC | 5 | 0.08 | 0.07 | 0.07 | X | X
DNMT1-3_20 | Chr7:113376854 | TTTTCTGATGtTCCAaGTCTGcTtCTt | 5 | 0.02 | 0.04 | 0.04 | X | X
DNMT1-3_21 | Chr4:6585775 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.01 | 0.00 | 0.01 | X | X
DNMT1-3_22 | Chr4:10356702 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.01 | X | X
DNMT1-3_23 | Chr4:66166214 | TT?TCTGATGGTCCAcacCTGTTtCaC | 5 | 0.00 | 0.01 | 0.00 | X | X
DNMT1-3_24 | Chr4:115245817 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.01 | 0.01 | 0.02 | X | X
DNMT1-3_25 | Chr4:117890965 | TTTCCTGATaGTCCAcaTCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_26 | Chr4:128766667 | TTTCCTGATGGTCCAcacCTGTTgCcC | 5 | 0.02 | 0.01 | 0.02 | X | X
DNMT1-3_27 | Chr5:20124084 | TTTACTGtatGTtCATGTCTGTTtCTC | 5 | 0.00 | 0.01 | 0.00 | X | X
DNMT1-3_28 | Chr5:35891114 | TTTCCTGATGGTCtAcacCTGT?CTC | 5 | 0.00 | 0.00 | 0.01 | X | X
DNMT1-3_29 | Chr5:54119462 | TTTCCTGATGGTCCAcacCTGTTAaTg | 5 | 0.02 | 0.00 | 0.00 | O | X
DNMT1-3_30 | Chr5:55879968 | TTTCCTGATGGTCCAcacCTGTTAacC | 5 | 0.02 | 0.04 | 0.01 | O | O
DNMT1-3_31 | Chr5:64545017 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.01 | X | X
DNMT1-3_32 | Chr5:103867835 | TTTTCTtATtGTCaATcaCTGTTACTC | 5 | 0.01 | 0.01 | 0.00 | X | X
DNMT1-3_33 | Chr5:113070541 | TTTCCTGATGGTCCAcacCTGTTgCcC | 5 | 0.01 | 0.00 | 0.02 | X | X
DNMT1-3_34 | Chr5:128492015 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.01 | 0.00 | 0.00 | X | X
DNMT1-3_35 | Chr5:174988329 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.01 | 0.01 | X | X
DNMT1-3_36 | Chr16:33476317 | TTTGgTGAgGGTCCAaGTCTtTTACcC | 5 | 0.02 | 0.04 | 0.02 | X | X
DNMT1-3_37 | Chr1:32689627 | TTTCCaGATcGTCaATGTaTGgTACTC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_38 | Chr1:146123481 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | O | X
DNMT1-3_39 | Chr1:147665313 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_40 | Chr13:71866678 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_41 | Chr13:82088053 | TTTCCcGATGGTCCAcaTCTGTTACca | 5 | 0.01 | 0.01 | 0.00 | X | X
DNMT1-3_42 | Chr2:45818935 | TTTCCTGATGGTCCATactTaTTACaC | 5 | 0.00 | 0.01 | 0.01 | X | X
DNMT1-3_43 | Chr2:69794408 | TTTCCTGATGGTCCcTGgCccTcACTC | 5 | 0.04 | 0.05 | 0.04 | X | X
DNMT1-3_44 | Chr2:96833634 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.04 | 0.04 | 0.05 | X | X
DNMT1-3_45 | Chr2:121847952 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_46 | Chr2:124681966 | TTTTCTGtTaGTCCATtTgTGTTACTg | 5 | 0.01 | 0.01 | 0.00 | X | X
DNMT1-3_47 to DNMT1-3_61 | (rows illegible in the source)
DNMT1-3_62 | ChrX:97546176 | TTTCCTGATGGTCCAcGcCTGTTAaca | 5 | 0.01 | 0.33 | 0.01 | O | O
DNMT1-3_63 | ChrX:146525283 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.01 | X | X
DNMT1-3_64 | Chr18:13687381 | TTTCCTGATGGcCCAcacCTGTTACaC | 5 | 0.01 | 0.01 | 0.01 | X | X
DNMT1-3_65 | Chr18:33330596 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.01 | 0.00 | 0.02 | X | X
DNMT1-3_66 | Chr18:60738422 | TTTGCTcATGcTCCATGcCTGTgAgTC | 5 | 0.02 | 0.02 | 0.01 | X | X
DNMT1-3_67 | Chr18:68681615 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_68 | Chr11:5024686 | TTTCCTGATGGTCCAcacCTGTTgCaC | 5 | 0.00 | 0.00 | 0.00 | X | X
DNMT1-3_69 | Chr11:22189152 | TTTCCTGATGaTCCATacCTGTTgCaC | 5 | 0.01 | 0.00 | 0.01 | X | X
DNMT1-3_70 | Chr11:26124228 | TTTCCTGATGGTCCAcaTCTGTTAaca | 5 | 0.00 | 0.12 | 0.14 | O | O
(? marks character(s) illegible in the source.)
Table 9
DNMT1-4
Site | Location | PAM-target sequence | Mis-No. | (-)Cpf1 indel (%) | AsCpf1 indel (%) | LbCpf1 indel (%) | D-Cap. (As) | D-Cap. (Lb)
On-target | chr19:10244338 | TTTATTTCCCTTCAGCTAAAATAAAGG (TTTA-SEQ ID NO: 20) | 0 | 0.07 | 12.24 | 3.38 | o | o
DNMT1-4_02 | chr7:142706843 | TTmTTCCOTgAGCTAAAATAAAtG | 2 | 0.09 | 0.09 | 0.07 | x | x
DNMT1-4_03 | chr4:177105277 | TTTGTTCCCCAGtTAAAATAtgGG | 3 | 0.01 | 0.01 | 0.01 | x | x
DNMT1-4_04 | chr4:182294850 | TTTATTgCCCTCAGCTAAAATAcAGt | 3 | 0.03 | 0.04 | 0.05 | x | x
DNMT1-4_05 | chr8:99975672 | TTTCTTTCCCTTtAGCTAAAcTtcAGG | 4 | 0.05 | 0.05 | 0.05 | x | x
DNMT1-4_06 | chr3:52291752 | TTTTTTTCCCTOc11TAAAAaAAAGG | 4 | 6.16 | 6.16 | 6.12 | x | x
DNMT1-4_07 | chr7:39109053 | TTTaTTtCCTTCAGCTAAAATAAAat | 4 | 0.04 | 0.06 | 0.04 | x | x
DNMT1-4_08 | chr7:91633038 | TTTTTTcCCCTTCAGgTAtAATAAAGa | 4 | 0.23 | 0.22 | 0.29 | x | x
DNMT1-4_09 | chr7:113732889 | TGTTgCCaTTtAGCTAAAcTAAAGG | 4 | 0.05 | 0.04 | 0.04 | x | x
DNMT1-4_10 | chr4:147661442 | TTTGTaCCCTTgAGCTAcAATAAAaG | 4 | 0.12 | 0.09 | 0.08 | x | x
DNMT1-4_11 | chr4:181699076 | TTTmrCtOTgAGCTAAAATAtAcG | 4 | 0.16 | 0.14 | 0.12 | x | x
DNMT1-4_12 | chr5:152541320 | TTTTTTTCCaTTtAGCTAAgATAAAGc | 4 | 1.52 | 1.53 | 1.49 | x | x
DNMT1-4_13 | chr10:67098820 | TTTTTTTCCCTaCAGgaAAAAaAAAGG | 4 | 21.96 | 21.25 | 21.58 | x | x
DNMT1-4_14 | chr10:85896433 | TTTATTTaCtTTCAGtTAAAATAAAtG | 4 | 0.01 | 0.02 | 0.01 | x | x
DNMT1-4_15 | chr6:80334505 | TTTTTTTCCtgTCAGaTAAAATAAAGa | 4 | 0.59 | 0.51 | 0.56 | x | x
DNMT1-4_16 | chrX:29928216 | TTTCTTTCCCTTCAttTAcAATAAtGG | 4 | 0.02 | 0.02 | 0.02 | x | x
DNMT1-4_17 | chrX:136506280 | TTTTTTTCCtTTCAGCTgAAATAgAGa | 4 | 1.35 | 1.27 | 1.20 | x | x
DNMT1-4_18 | chr18:37416158 | TTTmTCCCcTCAGCcAAcAgAAAGG | 4 | 0.12 | 0.17 | 0.16 | x | x
DNMT1-4_19 | chr11:130818572 | TTTAaTTCCCTTCAGgTAAAATtAgGG | 4 | 0.02 | 0.02 | 0.03 | x | x
Table 10
EMX1-2
Site | Location | PAM-target sequence | Mis-No. | (-)Cpf1 indel (%) | AsCpf1 indel (%) | LbCpf1 indel (%) | D-Cap. (As) | D-Cap. (Lb)
On-target | chr2:73160920 | TTTGTCCTCCGGTTCTGGAACCACACC (TTTG-SEQ ID NO: 23) | 0 | 0.02 | 12.66 | 25.33 | o | o
… | chr6:134409288 | TTTCTCCTCaGGTTCTGGAACCAataC | 4 | 0.00 | 0.04 | 0.07 | o | o
EMX1-2_03 | chr1:23408279 | TTTCTCCTCCGGcTtTaGAgtCACACC | 5 | 0.05 | 0.04 | 0.04 | x | x
EMX1-2_04 | chr1:7977477 | TTTCTCCTgCGGgTCTGcAAtCtCACC | 5 | 0.01 | 0.00 | 0.02 | x | x
EMX1-2_05 | chr10:68703484 | TTTATggTggGGTTCTGGAACCAaACC | 5 | 0.01 | 0.01 | 0.00 | x | x
EMX1-2_06 | chr10:102894100 | TTTGTCCgCCGGTTCTGGAACCAggtt | 5 | 0.01 | 0.00 | 0.00 | x | x
EMX1-2_07 | chr10:119307580 | TTTGTtCTtCGGTTCTGaAACCAtACt | 5 | 0.01 | 0.02 | 0.01 | x | x
EMX1-2_08 | chr11:93751348 | TTTATCaTggtGgTCTGGAACCACACC | 5 | 0.02 | 0.02 | 0.01 | x | x
EMX1-2_09 | chr12:51833378 | TTTTTttTttaGTTCTGGAACCACACC | 5 | 0.96 | 1.06 | 0.98 | x | x
EMX1-2_10 | chr14:48093772 | TTTATatTCaGGTTCTGGAACCAacCC | 5 | 0.19 | 0.16 | 0.11 | x | x
EMX1-2_11 | chr2:159516970 | TTTCTCCaCaGcTTCTGGgACCcCACC | 5 | 0.12 | 0.09 | 0.08 | x | x
EMX1-2_12 | chr5:45576885 | TTTATtCTggtGTTCTGGAACCAaACC | 5 | 0.04 | 0.03 | 0.02 | x | x
EMX1-2_13 | chr5:149563041 | TTTGcCCgCCGGTTtTGGAACCAgAtC | 5 | 0.06 | 0.04 | 0.05 | x | x
EMX1-2_14 | chr6:46703876 | TTTCatCTCCaGTTCTGGcACCtCACC | 5 | 0.07 | 0.04 | 0.05 | x | x
EMX1-2_15 | chr6:122815701 | TTTCaCCaCCtGTTCTGGAACCACAaa | 5 | 0.20 | 0.16 | 0.18 | x | x
EMX1-2_16 | chr7:120921149 | TTTATtCTgtGGaTCTGGAACCACAtC | 5 | 0.01 | 0.01 | 0.02 | x | x
EMX1-2_17 | chr7:32492726 | TTTAcCCTCCacTTCTGGAACtcCACC | 5 | 0.01 | 0.02 | 0.01 | x | x
EMX1-2_18 | chr9:102937840 | TTATtCTCtGGTTCTGGAACCAagtC | 5 | 0.00 | 0.00 | 0.01 | x | x
EMX1-2_19 | chrX:135428078 | TTTCTCCataGtTTCTGGAACCACAtC | 5 | 0.00 | 0.00 | 0.00 | x | x
Table 11
CCR5-1
Figure imgf000053_0001
Table 12
Figure imgf000053_0002
Site | Location | PAM-target sequence | Mis-No. | (-)Cpf1 indel (%) | AsCpf1 indel (%) | LbCpf1 indel (%) | D-Cap. (As) | D-Cap. (Lb)
CCR5-09-06 | chr4:182288863 | TTTTGCCTtgATAATTGCAGaAGCTgT | 4 | 0.01 | 0.00 | 0.00 | x | x
CCR5-09-07 | chr16:55176669 | TTTTcCCTGAATAcTTcCAGTgGCTCT | 4 | 0.02 | 0.01 | 0.01 | x | x
CCR5-09-08 | chr2:184980577 | TTTTGtCTGgATAAcTGCAGTAtCTCT | 4 | 0.00 | 0.00 | 0.00 | x | x
CCR5-09-09 | chr21:34303495 | TTTGGCCTcAAacATTGCAGaAGCTCT | 4 | 0.01 | 0.00 | 0.00 | x | x
CCR5-09-10 | chr17:42584285 | TTTCtCCTGAATtATTGCAGTAGCTac | 4 | 0.01 | 0.02 | 0.00 | x | x
CCR5-09-11 | chrX:80917477 | TTAGCCTGAATtATTaCAaTAGCTtT | 4 | 0.00 | 0.00 | 0.00 | x | x
… | chr11:29993622 | TTTTGCCTGgAcAATTGCAaTAGCTtT | 4 | 0.00 | 0.01 | 0.01 | x | x
Table 13
HPRT1-1
Site | Location | PAM-target sequence | Mis-No. | (-)Cpf1 indel (%) | AsCpf1 indel (%) | LbCpf1 indel (%) | D-Cap. (As) | D-Cap. (Lb)
On-target | chrX:133609298 | TTTGCTGACCTGCTGGATTACATCAAA (TTTG-SEQ ID NO: 27) | 0 | 0.02 | | 9.76 | o | o
HPRT1-01-02 | chr5:30248678 | TTTGCTcACCTGCTGGATTACATCAAA | 1 | 0.01 | 0.08 | 0.04 | o | o
… | chr11:93732144 | TTTGCTGACCTGCTaGATaACATCAAA | 2 | 0.03 | 0.04 | 0.02 | o | o
HPRT1-01-04 | chr12:62892531 | TTTACTGACaactTGGATTACATCAAA | 4 | 0.01 | 0.01 | 0.01 | x | x
HPRT1-01-05 | chr17:78758131 | TTTTtTaACCTGCTGGATTAaATgAAA | 4 | 0.10 | 0.08 | 0.08 | x | x
Table 14
HPRT1-4
Site | Location | PAM-target sequence | Mis-No. | (-)Cpf1 indel (%) | AsCpf1 indel (%) | LbCpf1 indel (%) | D-Cap. (As) | D-Cap. (Lb)
On-target | chrX:133620466 | TTTATGTCCCCTGTTGACTGGTCATTC (TTTA-SEQ ID NO: 28) | 0 | 1.67 | 34.96 | 36.18 | o | o
HPRT1-04_02 | chr11:93732023 | TTTATaTCCCCTGTTGACTGGTCATTa | 2 | 0.03 | 0.11 | 7.57 | o | o
HPRT1-04_03 | chr5:161039971 | TTTATGTCCCCTcTTGcCTGGTCATaa | 4 | 0.10 | 0.08 | 0.08 | o | o
Table 15
AAVS1
Site | Location | PAM-target sequence | Mis-No. | (-)Cpf1 indel (%) | AsCpf1 indel (%) | LbCpf1 indel (%) | D-Cap. (As) | D-Cap. (Lb)
On-target | chr19:55626916 | TTTGCTTACGATGGAGCCAGAGAGGAT (TTTG-SEQ ID NO: 21) | 0 | 0.00 | 22.42 | 22.78 | o | o
AAVS1_02 | chr2:79999133 | TTTTCTTttGATGGtGCCAGAGAGGAT | 3 | 0.00 | 0.01 | 0.00 | x | x
AAVS1_03 | chr8:113838377 | TTTTCTTctGcTGGAGCCAGAGAGGcT | 4 | 0.01 | 0.01 | 0.01 | x | x
AAVS1_04 | chr4:96317093 | TTTCCTTAtGATGaAGCCAGAGAaGcT | 4 | 0.00 | 0.08 | 0.53 | o | o
Table 16
Figure imgf000056_0001
Table 17
VEGFA-2
Site | Location | PAM-target sequence | Mis-No. | (-)Cpf1 indel (%) | AsCpf1 indel (%) | LbCpf1 indel (%)
On-target | chr6:43738576 | TTTTCGTCCAACTTCTGGGCTGTTCTC (TTTT-SEQ ID NO: 41) | 0 | | 0.942 | 0.199
VEGFA-02_02 | chrX:104051443 | TTACtaCCAACTTCTttGCTGTTCTC | 4 | 0.022 | 0.025 | 0.026
In Tables 8 to 17, lowercase letters mark mismatched positions, 'Mis-No.' is the number of mismatches, '(-)Cpf1' denotes the control to which no Cpf1 was added, and 'As' and 'Lb' denote AsCpf1 and LbCpf1, respectively. 'D-Cap.' stands for 'Digenome capture': a site whose cleavage score from Digenome-seq (Example 4) is at or above the cutoff value (2.5) is marked 'o', and a site scoring below the cutoff is marked 'x'.
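The notation described above can be applied mechanically to any row of Tables 8 to 17. The Python sketch below is only an illustration, assuming that every lowercase letter in a PAM-target sequence counts as one mismatch and that the 2.5 cutoff is applied as "score at or above 2.5 gives 'o'"; the function names are hypothetical.

```python
# Hypothetical helpers illustrating the notation of Tables 8 to 17.
# Assumptions: lowercase bases are mismatches against the on-target sequence,
# and a Digenome-seq cleavage score at or above 2.5 is reported as 'o'.

DIGENOME_CUTOFF = 2.5  # cutoff value given in the table notes


def count_mismatches(pam_target: str) -> int:
    """Return the Mis-No. value: the number of lowercase (mismatched) bases."""
    return sum(1 for base in pam_target if base.islower())


def d_cap_mark(cleavage_score: float, cutoff: float = DIGENOME_CUTOFF) -> str:
    """Return 'o' for a Digenome-captured site (score >= cutoff), otherwise 'x'."""
    return "o" if cleavage_score >= cutoff else "x"


if __name__ == "__main__":
    site = "TTTCCTGATGGTCCAcGcCTGTTAaca"  # DNMT1-3_62 in Table 8
    print(count_mismatches(site))  # 5, as in the Mis-No. column
    print(d_cap_mark(4.45), d_cap_mark(2.49))  # o x
```

Mismatches inside the PAM are counted too, which matches how the Mis-No. column treats sites such as DNMT1-3_62.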
LbCpf1 and AsCpf1 crRNAs were prepared from the target sequences shown in Tables 8 to 17 by the method described in Table 4 and used in the tests.
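As an illustration of how a target sequence in the tables relates to the spacer of a crRNA, the hedged sketch below removes the 4-nt PAM and rewrites the remaining protospacer as RNA. It assumes that the full 23-nt sequence after the PAM is used as the spacer, and it deliberately omits the Cpf1 direct-repeat portion of the crRNA; the actual construct follows the method of Table 4, which is not reproduced here. The function name is hypothetical.

```python
PAM_LENGTH = 4  # e.g. TTTA, TTTG or TTTT at the 5' end of each target


def spacer_rna(pam_target: str, pam_length: int = PAM_LENGTH) -> str:
    """Drop the PAM and write the protospacer (DNA) as the matching RNA spacer.

    Assumption: the whole sequence after the PAM is used as the spacer;
    the direct-repeat handle of the crRNA is not included.
    """
    protospacer = pam_target[pam_length:].upper()
    if set(protospacer) - set("ACGT"):
        raise ValueError("unexpected base in protospacer: " + protospacer)
    return protospacer.replace("T", "U")


if __name__ == "__main__":
    # DNMT1-4 on-target sequence from Table 9 (PAM TTTA)
    print(spacer_rna("TTTATTTCCCTTCAGCTAAAATAAAGG"))
```

The spacer has the same sequence as the PAM-containing (non-target) strand, with U in place of T, so that it base-pairs with the complementary target strand.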
The indel frequencies shown in Tables 8 to 17 were measured by targeted deep sequencing.
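In targeted deep sequencing, the indel frequency at a site is conventionally the percentage of aligned reads that carry an insertion or deletion. A minimal sketch under that assumption, taking read counts that have already been aligned and classified; the counts in the example are hypothetical and are not taken from this document.

```python
def indel_frequency(indel_reads: int, total_reads: int) -> float:
    """Percentage of aligned reads at a site that carry an insertion or deletion."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must lie between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


if __name__ == "__main__":
    # Hypothetical counts: 1,224 indel-bearing reads out of 10,000 aligned reads.
    print(f"{indel_frequency(1224, 10000):.2f}%")
```

Comparing the value for a nuclease-treated sample with the (-)Cpf1 control at the same site, as the tables do, separates nuclease-induced indels from background sequencing or PCR errors.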
As shown in Tables 8 to 17, LbCpf1 and AsCpf1 were used to examine off-target activity at sites carrying up to five mismatches relative to two on-target sites (labeled DNMT1-3 and EMX1-2). Of the 87 sites examined, off-target activity was validated at 3 sites for LbCpf1 and at 4 sites for AsCpf1, but the off-target indel frequencies (0.04% to 0.7%) were far lower than the on-target indel frequencies (34% and 25% with LbCpf1, and 47% and 13% with AsCpf1, at DNMT1-3 and EMX1-2, respectively). We also examined whether Cpf1 discriminates homologous sites carrying a single mismatch relative to two other on-target sites (CCR5-1 and HPRT-1). LbCpf1 induced indels at 19% and 10% at the CCR5-1 and HPRT-1 on-target sites, respectively, but at only 0.4% and 0.04% at the corresponding single-base-mismatched sites. These off-target frequencies are about 1/48 (= 0.4%/19%) and 1/250 (= 0.04%/10%) of the respective on-target frequencies, showing that Cpf1 discriminates even single-base mismatches well. Overall, indel frequencies were examined at 130 bona fide off-target sites; off-target activity was validated at 9 of them, but the indel frequency at most of these sites was 1% or less. These results show that Cpf1 is highly specific in human cells.

To identify genome-wide Cpf1 off-target sites in an unbiased manner, Digenome-seq (Example 4) was performed with a total of eight highly active Cpf1 nucleases (using the crRNAs for target sequences 1 to 8 in Table 6). Cell-free genomic DNA isolated from HeLa cells with the DNeasy Tissue kit (Qiagen) was digested with AsCpf1 and LbCpf1 ribonucleoproteins (RNPs), prepared by the method of Example 3, at high concentrations (300 nM Cpf1 and 900 nM crRNA), and then subjected to whole genome sequencing (WGS; see Example 4). For comparison, the same test was performed with SpCas9.
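The fold-discrimination figures given for CCR5-1 and HPRT-1 in the off-target paragraph above are simple ratios of on-target to single-mismatch indel frequencies. The short sketch below redoes that arithmetic with the LbCpf1 percentages quoted in the text; the function name is illustrative only.

```python
def fold_discrimination(on_target_pct: float, off_target_pct: float) -> float:
    """How many times higher the on-target indel frequency is than the off-target one."""
    if off_target_pct <= 0:
        raise ValueError("off-target frequency must be positive")
    return on_target_pct / off_target_pct


if __name__ == "__main__":
    # LbCpf1 values from the text: (site, on-target %, single-mismatch site %)
    for site, on_pct, off_pct in [("CCR5-1", 19.0, 0.4), ("HPRT-1", 10.0, 0.04)]:
        ratio = fold_discrimination(on_pct, off_pct)
        print(f"{site}: off-target = 1/{ratio:.1f} of on-target")
```

The CCR5-1 ratio works out to 47.5, which the text rounds to 1/48; the HPRT-1 ratio is 250.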
Of the cleavage scores obtained (Example 4), the results for AsCpf1 and LbCpf1 are shown in FIG. 21A (DNMT1-3), FIG. 21B (DNMT1-4), and Tables 18 to 33.
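Tables 18 to 33 list, for each Digenome-captured site, the chromosome, the location, the DNA sequence at the cleavage site, the cleavage score, and whether a bulge is present. As a sketch only, assuming one whitespace-separated record per site in that column order, the records can be parsed and ranked by score against the 2.5 cutoff; the class and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class DigenomeSite(NamedTuple):
    chromosome: str
    location: int
    sequence: str
    score: float
    bulge: bool


def parse_site(record: str) -> DigenomeSite:
    """Parse 'chromosome location sequence score bulge'; the bulge field may be two words."""
    chrom, location, sequence, score, bulge = record.split(maxsplit=4)
    return DigenomeSite(chrom, int(location), sequence, float(score), bulge.strip() != "No")


def captured_sites(records: Iterable[str], cutoff: float = 2.5) -> List[DigenomeSite]:
    """Keep sites scoring at or above the cutoff, highest cleavage score first."""
    sites = [parse_site(r) for r in records if r.strip()]
    kept = [s for s in sites if s.score >= cutoff]
    return sorted(kept, key=lambda s: s.score, reverse=True)


if __name__ == "__main__":
    table_24 = [  # LbCpf1_HPRT1-4 records (Table 24)
        "chr5 161040022 TTTATGTCCCCTcTTGcCTGGTCATaa 3.88 No",
        "chrX 133620495 TTTATGTCCCCTGTTGACTGGTCATTC 2.90 No",
        "chr11 93732073 TTTATaTCCCCTGTTGACTGGTCATTa 28.40 No",
    ]
    for site in captured_sites(table_24):
        print(site.chromosome, site.location, site.score)
```

Sorting by score rather than by genomic position mirrors the order in which the tables list the sites.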
Table 18
LbCpf1_DNMT1-3
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr5 | 13135736 | TTTCCTGATGGTCCAcacCTGTTAaca | 13.20 | No
chr8 | 112204853 | TTTCCTGATGGTCCAcacCTGTTAaga | 12.38 | No
chr19 | 10244444 | TTTCCTGATGGTCCATGTCTGTTACTC | 11.97 | No
chr11 | 26124230 | TTTCCTGATGGTCCAcaTCTGTTAaca | 11.51 | No
chr16 | 75745894 | TTTTCTGATGGTCCATacCTGTTACaC | 9.36 | No
chr3 | 30592945 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.74 | No
chr10 | 66295933 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.67 | No
chr5 | 39969437 | TCTCCTGATGGTCCATacCTGTTAacg | 8.65 | No
chr10 | 6784959 | TTTCCTGATGGTCCAcacCTGTTAaca | 7.24 | No
chr3 | 166705664 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.96 | No
chr2 | 62165341 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.56 | No
chr1 | 89819957 | TTTCCTGATGGcCCATacCTGTTAaca | 5.31 | No
chrX | 115862097 | TTTCaTGATGGTCCATacCTGTAaca | 5.29 | No
chrX | 92676365 | TTTCCTGATGGTCCATacCTGTTAaca | 5.22 | No
chr3 | 164692184 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.05 | No
chr16 | 13699913 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.84 | No
chr2 | 153648723 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.83 | No
chr1 | 236623991 | TTTACTGATGaTCCATGTCTaaacgTt | 4.74 | No
chrX | 97546178 | TTTCCTGATGGTCCAcGcCTGTTAaca | 4.45 | No
chr11 | 38911731 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.07 | No
chrX | 57676022 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.01 | No
chr5 | 55879970 | TTTCCTGATGGTCCAcacCTGTTAacC | 3.76 | No
chrX | 153891299 | TTTCCTGATGGTCCAcacCTGTTAaca | 3.62 | No
chr14 | 21663713 | TTTCCTGATGGTCCAcacCTGTTAaTt | 3.24 | No
chr6 | 55276466 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.85 | No
chr10 | 113265597 | TTTCCTGATGGTCCATaTCTGTggCat | 2.66 | No
chr7 | 7682807 | TTTCCTGATGGTCCAcacCTGTTAtca | 2.57 | No
chrX | 8935018 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.50 | No
Table 19
LbCpf1_DNMT1-4
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr19 | 10244367 | TTTATTTCCCTTCAGCTAAAATAAAGG | 6.86 | No
Table 20
LbCpf1_EMX1-2
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr2 | 73160921 | TTTGTCCTCCGGTTCTGGAACCACACC | 12.19 | No
chr2 | 177017501 | TTCATCCTCCGGTTCTGGAACCAgAtC | 8.08 | No
chr17 | 46690720 | TTCATCCTCCGGTTCTGGAACCAgAtt | 4.71 | No
chr6 | 134409314 | TTTCTCCTCaGGTTCTGGAACCAataC | 3.77 | No
Table 21
Figure imgf000060_0002
Table 22
Figure imgf000060_0003
Table 23
Figure imgf000060_0001
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chrX | 133609321 | TTTGCTGACCTGCTGGATTACATCAAA | 4.15 | No
chr11 | 93732147 | TTTGCTGACCTGCTaGATaACATCAAA | 3.91 | No
chr5 | 30248701 | TTTGCTcACCTGCTGGATTACATCAAA | 2.91 | No
Table 24
LbCpf1_HPRT1-4
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr11 | 93732073 | TTTATaTCCCCTGTTGACTGGTCATTa | 28.40 | No
chr5 | 161040022 | TTTATGTCCCCTcTTGcCTGGTCATaa | 3.88 | No
chrX | 133620495 | TTTATGTCCCCTGTTGACTGGTCATTC | 2.90 | No
Table 25
LbCpf1_AAVS1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr2 | 34206860 | TTTCCaTACaATGGAGCCAGAGa-GAT | 9.31 | RNA bulge
chr4 | 96317122 | TTTCCTTAtGATGaAGCCAGAGAaGcT | 5.26 | No
chr16 | 34823594 | TTTACaTAaGATGAAaCCAGAGAGaAa | 4.34 | No
chr19 | 55626945 | TTTGCTTACGATGGAGCCAGAGAGGAT | 2.63 | No
Table 26
AsCpf1_DNMT1-3
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr12 | 17538224 | TTTACTGATGGTCttacTtTaTaggcC | 15.78 | No
chr7 | 134517009 | TCTCCTGATGGTCCATacCTGTTAaca | 14.35 | No
chr5 | 13135739 | TTTCCTGATGGTCCAcacCTGTTAaca | 13.65 | No
chr9 | 25518292 | TCTCCTGATGGTCtATaTCTGTTAaaa | 12.61 | No
chr5 | 39969440 | TCTCCTGATGGTCCATacCTGTTAacg | 12.11 | No
chr8 | 112204856 | TTTCCTGATGGTCCAcacCTGTTAaga | 12.05 | No
chr11 | 82700148 | TTTACTGATGGTCtcatTtaaTcttTa | 11.02 | No
chr3 | 164692191 | TTTCCTGATGGTCCAcacCTGTTAaca | 10.97 | No
chr4 | 123785685 | TTTCCTGATGGTCtcatatTtTcttTa | 8.95 | No
chr1 | 213377380 | TTTCCTGATGGTCCATGTCTGaattag | 8.65 | No
chr10 | 6784966 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.18 | No
chr7 | 123688384 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.03 | No
chr2 | 200523682 | TTTACTGATGGTattataggaagttat | 7.99 | No
chr16 | 75745895 | TTTTCTGATGGTCCATacCTGTTACaC | 7.12 | No
chr10 | 111147398 | TTTCCTGATGGTCCATacCTGcTgCaC | 7.06 | No
chr11 | 38911734 | TTTCCTGATGGTCCAcacCTGTTAaca | 6.69 | No
chrX | 57676029 | TTTCCTGATGGTCCAcacCTGTTAaca | 6.55 | No
chr16 | 13699916 | TTTCCTGATGGTCCAcacCTGTTAaca | 6.45 | No
chr3 | 30592953 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.77 | No
chr19 | 43263943 | TTTACTGATGGTCCAaacaTcTaAgat | 5.66 | No
chr19 | 43416520 | TTTACTGATGGTCCAaacaTcTaAgat | 5.41 | No
chr6 | 55276469 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.03 | No
chr10 | 66295940 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.93 | No
chr19 | 43435385 | TTTACTGATGGTCCAaacaTcTaAgat | 4.85 | No
chrX | 82549910 | TTTCCTGATGGTCCAcacCTGTTACaC | 4.58 | No
chr5 | 54119487 | TTTCCTGATGGTCCAcacCTGTTAaTg | 4.58 | No
chr1 | 236623994 | TTTACTGATGaTCCATGTCTaaacgTt | 4.39 | No
chr19 | 10244446 | TTTCCTGATGGTCCATGTCTGTTACTC | 4.35 | No
chr4 | 9395117 | TTTCCTGATGGTCtAcaTCTGTTAaca | 4.20 | No
chr3 | 166705667 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.13 | No
chr14 | 21663712 | TTTCCTGATGGTCCAcacCTGTTAaTt | 4.12 | No
chr7 | 80711731 | TTTACTGATGGTCacTaTaaacacaga | 3.79 | No
chr5 | 55879977 | TTTCCTGATGGTCCAcacCTGTTAacC | 3.69 | No
chr7 | 7682808 | TTTCCTGATGGTCCAcacCTGTTAtca | 3.56 | No
chr10 | 113265596 | TTTCCTGATGGTCCATaTCTGTggCat | 3.25 | No
chrX | 97546179 | TTTCCTGATGGTCCAcGcCTGTTAaca | 3.24 | No
chr1 | 146123502 | TTTCCTGATGGTCCAcacCTGTTgCaC | 3.22 | No
chr7 | 3669637 | TTTCCTGATGGTCCcatcCaaTgttTa | 3.22 | No
chr11 | 26124238 | TTTCCTGATGGTCCAcaTCTGTTAaca | 3.09 | No
chrX | 153891300 | TTTCCTGATGGTCCAcacCTGTTAaca | 3.00 | No
chrX | 92676366 | TTTCCTGATGGTCCATacCTGTTAaca | 2.88 | No
chr2 | 153648726 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.80 | No
chr2 | 62165344 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.77 | No
chr19 | 43524715 | TTTACTGATGGTCtAaacaTcTaAgat | 2.72 | No
chr10 | 14334368 | TTTCCTGATGGTCttcaatatccttct | 2.67 | No
chr19 | 43377706 | TTTACTGATGGTCCAaacaTcTaAgat | 2.53 | No
Table 27
Figure imgf000063_0001
Table 28
AsCpf1_EMX1-2
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr2 | 73160922 | TTTGTCCTCCGGTTCTGGAACCACACC | 7.57 | No
chr2 | 177017500 | TTCATCCTCCGGTTCTGGAACCAgAtC | 6.59 | No
chr6 | 134409310 | TTTCTCCTCaGGTTCTGGAACCAataC | 4.44 | No
chr17 | 46690718 | TTCATCCTCCGGTTCTGGAACCAgAtt | 3.28 | No
chr7 | 145773724 | TTTGTCCTCCaGaTaTGGAACCAtgtg | 3.14 | No
Table 29
AsCpf1_CCR5-1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr3 | 46414552 | TTTTGTGGGCAACATGCTGGTCATCCT | 18.09 | No
chr1 | 113920223 | TTTGGTGGGCAACATGCcaG-CATTaa | 16.52 | RNA bulge
chr8 | 138491414 | TTTAGTGGGaAcagTctgGtcatgagt | 14.80 | No
chr16 | 61917098 | TTTGGTGGGCAACATGCTataCAaaaT | 12.36 | No
chr10 | 56138671 | TCTGGTGGaCAACATGCTGaTCAaagg | 11.54 | No
chr8 | 54163354 | CTTGGTGGGCAACtcGcTGGTCATgtT | 11.22 | No
chr6 | 137588270 | TTTGGTGGGgAACATaCaaGTCATatT | 9.30 | No
chr20 | 43657503 | TTTGGTGGGCAAgcTaCTtaTacggag | 9.04 | No
chr8 | 24661222 | TTTAGTGGGCAAacTatTGaaaAgata | 8.63 | No
chr6 | 127930554 | TTTGGTGGGCAACtctaTtaTtgTatc | 8.26 | No
chr7 | 62666896 | TTTAGTGGGCAAtcTaCTGGaaggaag | 6.91 | No
chrX | 65962874 | TTTGGTGGGCAAgcTatTaaTgATtgc | 6.07 | No
chr19 | 44648031 | GTTAGTGGGCAACATaCTGtaaAgacc | 5.79 | No
chr2 | 78618092 | TTTGGTGGGCAACtTttTatTgtTgCT | 5.59 | No
chr3 | 46399210 | TTTTGTGGGCAACATGCTGGTCgTCCT | 5.17 | No
chr15 | 58588554 | TTTAGTGGGaAACtT-CTGGTCATaCa | 5.11 | RNA bulge
chr4 | 110395952 | TTTAGTGGGCAAaccatTtacaAaata | 4.19 | No
chr1 | 72141686 | TTGGTaGGtAACATGgTGGaagTCaa | 4.18 | No
chr15 | 24068708 | TTTTGTGGGCAACATatataTaggtcT | 3.81 | No
chr5 | 56998240 | TTTAGTGGGCAACtgtaTttagAaatc | 2.60 | No
Table 30
AsCpf1_CCR5-9
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr3 | 44394779 | TTTAGCCTGAATAATattcaaTtgTCT | 35.07 | No
chr15 | 35754229 | TTTGGCCTGAATAAcaatAtacatgtT | 14.46 | No
chr6 | 3563258 | TTTGGtCTGAATAATTtCAGTAGCTCT | 11.91 | No
chr12 | 58841086 | TTTAGCCTGAATAATaCAtTtaaTaa | 11.58 | No
chr1 | 144014812 | TTTTGCCTGAATgATTGCAGTAttTac | 10.39 | No
chr9 | 37289591 | TTTGGaCTGAATtaTTGCAGTAacatT | 7.57 | No
chr15 | 66090016 | TTTAGCCTGAAattTTGCAGTAGtcaT | 6.55 | No
chr7 | 35337090 | TTTAGCCTGAATAATattccattgccT | 6.37 | No
chr7 | 13593045 | TTTAGCCTGAATAAcattgtattgTgT | 5.59 | No
chr4 | 98108485 | TTTGCCCTGAATAATTGCAGcataatT | 5.47 | No
chr8 | 74162570 | TTTAGCCTGAATAtTatAtaTtatcaT | 4.86 | No
chr3 | 46415212 | TTTGGCCTGAATAATTGCAGTAGCTCT | 4.42 | No
chr5 | 59993666 | TTTAGCCTGAATAtTatttGTtaggga | 3.73 | No
chr15 | 95848470 | TTTGGCCTGAATtATattacTtAGTCa | 3.41 | No
chr4 | 108769431 | TTTAGCCTGAATAATaatAcTgcaTta | 3.16 | No
Table 31
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr11 | 93732153 | TTTGCTGACCTGCTaGATaACATCAAA | 48.57 | No
chr5 | 30248702 | TTTGCTcACCTGCTGGATTACATCAAA | 27.49 | No
chr6 | 49794715 | TTTCCTGACCTGCTaATatatcacAA | 8.55 | No
chrX | 133609322 | TTTGCTGACCTGCTGGATTACATCAAA | 6.67 | No
Table 32
AsCpf1_HPRT1-4
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chrX | 133620495 | TTTATGTCCCCTGTTGACTGGTCATTC | 12.93 | No
chr11 | 93732073 | TTTATaTCCCCTGTTGACTGGTCATTa | 7.92 | No
chr5 | 161040022 | TTTATGTCCCCTcTTGcCTGGTCATaa | 4.46 | No
Table 33
Figure imgf000066_0001
As shown in FIGS. 21A and 21B and Tables 18 to 33, sequence reads corresponding to on-target and off-target in vitro cleavage sites were aligned uniformly rather than randomly, and in vitro Cpf1 cleaved with high specificity, at 1 to 46 sites including the on-target site. The number of in vitro cleavage sites (Digenome-captured sites) was 6 ± 3 for LbCpf1 and 12 ± 5 for AsCpf1, markedly lower than the 90 ± 30 obtained for SpCas9 in our previous study. FIGS. 22A to 22F show sequence logos of Cpf1-mediated Digenome-captured sites: the top panels show Digenome-captured sites obtained with AsCpf1, and the bottom panels those obtained with LbCpf1. As shown in FIGS. 22A to 22F, the 50 and 98 in vitro cleavage sites obtained with the eight LbCpf1 and AsCpf1 nucleases, respectively, carry mismatches, most of which lie in the PAM-distal region about 13 nt from the PAM rather than in the PAM-proximal region about 10 nt from the PAM.
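As a rough illustration of the PAM-proximal versus PAM-distal tally described above, the sketch below counts mismatched bases by protospacer position for two of the off-target rows listed earlier. It assumes, as in the tables, that each entry is a 4-nt PAM followed by the protospacer and that lowercase letters mark mismatches; the 10-nt boundary and the helper names are illustrative choices, not the analysis used for FIGS. 22A to 22F.

```python
from collections import Counter


def mismatch_positions(site: str, pam_len: int = 4) -> list[int]:
    """1-based protospacer positions (1 = base next to the PAM) written in
    lowercase, i.e. mismatched relative to the on-target sequence."""
    protospacer = site[pam_len:]
    return [i + 1 for i, base in enumerate(protospacer) if base.islower()]


def proximal_vs_distal(sites: list[str], boundary: int = 10) -> Counter:
    """Tally mismatches within `boundary` nt of the PAM (proximal) versus
    farther away (distal)."""
    counts = Counter()
    for site in sites:
        for pos in mismatch_positions(site):
            counts["proximal" if pos <= boundary else "distal"] += 1
    return counts


sites = [
    "TTTAGCCTGAATAATattcaaTtgTCT",  # chr3 44394779
    "TTTGGaCTGAATtaTTGCAGTAacatT",  # chr9 37289591
]
print(proximal_vs_distal(sites))
```

Most mismatches in these two rows fall beyond position 10, in line with the PAM-distal bias described in the text.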
Of the 50 sites cleaved by the eight LbCpf1 nucleases, 46 were also cleaved by AsCpf1. Four sites had a one-nucleotide deletion relative to the corresponding on-target site, which could create an RNA bulge in the DNA-crRNA duplex region. The two nucleases, LbCpf1 and AsCpf1, cleaved six (LbCpf1) and four (AsCpf1) sites containing non-canonical PAM sequences such as 5'-TCTN-3' and 5'-TTCN-3'. Digenome-seq captured all eight on-target sites as well as the eight off-target sites identified by deep sequencing above.
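The canonical versus non-canonical distinction above can be expressed as a small classifier. This is a sketch under the assumption that 5'-TTTN-3' is treated as canonical (matching the TTTC, TTTG, and TTTA PAMs used in this document) and that only the two non-canonical forms named in the text, TCTN and TTCN, are recognized; `classify_pam` is a hypothetical helper.

```python
def classify_pam(pam: str) -> str:
    """Classify a 4-nt PAM written 5'->3' (hypothetical helper).

    Assumes TTTN is canonical; TCTN and TTCN are the non-canonical forms
    named in the text; anything else is reported as "other".
    """
    pam = pam.upper()
    if len(pam) != 4 or set(pam) - set("ACGT"):
        raise ValueError(f"expected a 4-nt DNA PAM, got {pam!r}")
    if pam[:3] == "TTT":
        return "canonical"
    if pam[:3] in ("TCT", "TTC"):
        return "non-canonical"
    return "other"


print(classify_pam("TTTC"), classify_pam("TCTA"), classify_pam("GGGA"))
```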
The results are shown in FIG. 21C. As shown in FIG. 21C, only 0.9% of the homologous sites with five or six mismatches identified by Cas-OFFinder (a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics. 2014 May 15;30(10):1473-5) were cleaved in vitro. Homologous sites with four or fewer mismatches were likely to be cleaved and were captured by Digenome-seq, but such sites are rare in the human genome (6 ± 2 such sites per crRNA).
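To make "homologous sites with N mismatches" concrete, the sketch below performs a naive scan of both strands for TTTN-PAM sites whose protospacer differs from the guide target by at most a given number of bases. It is only an illustration of the idea; Cas-OFFinder is a separate, much faster tool, and this function, its name, and the TTTN assumption are not its interface. Bulges (insertions or deletions) are not searched.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_sites(genome: str, protospacer: str, max_mismatches: int,
                         pam_prefix: str = "TTT") -> list[tuple[str, int, int]]:
    """Naive scan for TTTN-PAM sites within `max_mismatches` of `protospacer`.

    Returns (strand, start, mismatches), where `start` is the 0-based
    forward-strand coordinate of the leftmost base of the PAM+protospacer site.
    """
    genome, protospacer = genome.upper(), protospacer.upper()
    site_len = 4 + len(protospacer)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - site_len + 1):
            if not seq.startswith(pam_prefix, i):  # first 3 nt of TTTN
                continue
            target = seq[i + 4:i + site_len]
            mismatches = sum(a != b for a, b in zip(target, protospacer))
            if mismatches <= max_mismatches:
                # Map minus-strand indices back to forward-strand coordinates.
                start = i if strand == "+" else len(seq) - i - site_len
                hits.append((strand, start, mismatches))
    return hits


spacer = "CTGATGGTCCATGTCTGTTACTC"
toy_genome = "GG" + "TTTC" + spacer + "AA"  # toy sequence, not genomic data
print(find_candidate_sites(toy_genome, spacer, max_mismatches=1))
```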
We compared the genome-wide specificity of LbCpf1 and AsCpf1 with that of SpCas9 at two overlapping sites in the DNMT1 locus, measured by Digenome-seq (see Example 4) (see FIGS. 21A and 21B). The genome-wide distribution plot of in vitro cleavage sites in FIG. 21A shows that Cas9 and Cpf1 cleave chromosomal DNA at very different sites. New motifs, or sequence logos, obtained by comparing the DNA sequences at in vitro cleavage sites show that LbCpf1 is more specific than AsCpf1 or Cas9 (see FIG. 21A). For both LbCpf1 and AsCpf1, there were target sites at which the nuclease cleaved only the on-target site in the entire human genome (see FIGS. 21B and 23). FIG. 23 shows sequence logos of Digenome-captured sites, generated with WebLogo (http://weblogo.berkeley.edu/logo.cgi) from the Digenome-captured sites; only a single DNMT1 on-target site was captured by LbCpf1 and AsCpf1.
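A sequence logo stacks letters whose total height per column is that column's information content. The sketch below computes that quantity as log2(4) minus the Shannon entropy of the base frequencies. It is a simplified illustration, not WebLogo's implementation: tools such as WebLogo can additionally apply a small-sample correction, which is omitted here, and the input must already be aligned to equal length.

```python
import math
from collections import Counter


def information_content(aligned: list[str]) -> list[float]:
    """Bits per column: 2 - H, with H the Shannon entropy of base frequencies
    (no small-sample correction)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    bits = []
    for column in zip(*(s.upper() for s in aligned)):
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(column).values())
        bits.append(2.0 - entropy)
    return bits


# Fully conserved columns give 2 bits; a column with all four bases gives 0.
print(information_content(["TTTA", "TTTC", "TTTG", "TTTT"]))
```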
In vitro cleavage sites identified by Digenome-seq were validated in HEK293 cells by targeted deep sequencing. At most of the validated off-target sites, indel frequencies were below 1% (see FIGS. 21D and 24A to 24F), far lower than the indel frequencies at the corresponding on-target sites. FIG. 21D is a graph of off-target sites identified in human cells by targeted deep sequencing, shown together with the DNA sequences of the on-target and off-target sites (PAM sequences in bold; mismatched nucleotides in lowercase). FIGS. 24A to 24F are graphs of indel frequencies at Digenome-captured sites in HEK293T17 cells; dark bars show results from HEK293T17 cells transfected with the LbCpf1 plasmid, and light bars show results from HEK293T17 cells transfected with the AsCpf1 plasmid.
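Indel frequency from targeted deep sequencing is, at its simplest, the percentage of aligned reads carrying an insertion or deletion. The sketch below computes it from SAM-style CIGAR strings. It is a simplification under stated assumptions: real pipelines usually restrict the count to a window around the expected cleavage site and compare against untreated controls to separate true indels from sequencing noise, neither of which is done here.

```python
import re

# SAM CIGAR operations: length followed by one of M, I, D, N, S, H, P, =, X.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """True if the CIGAR string contains an insertion (I) or deletion (D)."""
    return any(op in ("I", "D") for _, op in CIGAR_OP.findall(cigar))


def indel_frequency(cigars) -> float:
    """Percentage of reads carrying at least one indel."""
    cigars = list(cigars)
    if not cigars:
        raise ValueError("no reads")
    return 100.0 * sum(map(has_indel, cigars)) / len(cigars)


reads = ["150M", "70M3D80M", "75M1I74M", "150M"]  # toy alignments
print(indel_frequency(reads))
```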
To quantify genome-wide off-target effects, we calculated the off-target effect index (OTI), defined as the ratio of the sum of indel rates at validated off-target sites to the on-target indel rate. The OTIs of LbCpf1 at two DNMT1 sites (DNMT1-3 and DNMT1-4) were 0.005 and 0.012, respectively, and those of AsCpf1 were 0.267 and 0.024, respectively. These results suggest that off-target effects are site-dependent and that LbCpf1 is relatively more specific than AsCpf1. By comparison, our previous study showed that the OTI of Cas9 at these two sites was >2.0.
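The OTI definition reduces to one division. For example, an OTI of 0.005 (LbCpf1 at DNMT1-3) means the summed off-target indel rate is 0.5% of the on-target indel rate. The sketch below uses hypothetical indel rates, not data from this example.

```python
def off_target_index(off_target_indel_rates, on_target_indel_rate: float) -> float:
    """OTI = sum of indel rates at validated off-target sites / on-target
    indel rate. All rates must use the same units (e.g. percent)."""
    if on_target_indel_rate <= 0:
        raise ValueError("on-target indel rate must be positive")
    return sum(off_target_indel_rates) / on_target_indel_rate


# Hypothetical: two off-target sites at 0.2% and 0.1%, on-target at 30%.
oti = off_target_index([0.2, 0.1], on_target_indel_rate=30.0)
print(round(oti, 4))
```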
To rule out the possibility that indel frequencies at these validated off-target sites were lowered by local chromatin inaccessibility, we transfected new crRNAs whose sequences perfectly match the off-target sites (see FIG. 21E). FIG. 21E is a graph of targeted mutagenesis (indel frequency, %) at AsCpf1 off-target sites obtained with crRNAs redesigned to hybridize to those off-target sites. Each off-target-specific crRNA induced indels at its corresponding site but not at the on-target site. As shown in FIG. 21E, the OT6 site contains the non-canonical 5'-TCTN-3' PAM, and crRNAs specific to the OT6 and OT12 sites (which differ by a single nucleotide at the 3' end) induced indels at the OT6 site at frequencies of 3.7% and 8.1%, respectively. These results show that Cpf1 can cleave chromosomal target sites with non-canonical PAM sequences, which expands the scope of Cpf1-mediated genome editing.
Example 10: Off-target effect test using RNPs
To avoid or reduce off-target effects, preassembled Cpf1 RNPs were transfected into human cells. Like Cas9 RNPs, Cpf1 RNPs were expected to cleave target sites immediately after transfection and then be degraded by endogenous proteases and ribonucleases, lowering off-target effects without reducing on-target activity. Indeed, Cpf1 RNPs did not induce indels above background noise at several off-target sites that had been validated with plasmids (see FIG. 21F).
FIG. 21F is a graph comparing Cpf1 off-target effects when plasmids encoding Cpf1 and crRNA were used and when RNPs (preformed Cpf1-crRNA complexes) were used. The specificity ratio is the fold difference (RNP/plasmid) between the ratio of on-target to off-target indel frequency obtained with Cpf1 RNPs and the same ratio obtained with plasmids; the results show that off-target effects are markedly lower with RNPs than with plasmids. Based on the results of FIG. 21F, the OTI was below 0.0004 (<0.0004) with both AsCpf1 RNPs and LbCpf1 RNPs, showing that these RNPs have almost no off-target effects.
Example 11: Measurement of off-target effects using crRNAs truncated at the 3' end
The off-target effects of crRNAs truncated at the 3' end (tru-crRNAs) were tested.
Tru-crRNAs were designed by truncating the targeting sequence of the crRNA from the 3' end so that the targeting sequences were 22, 20, 18, and 16 nt long. Specifically, the tru-crRNAs were designed to hybridize with the contiguous 22-, 20-, 18-, and 16-nt sequences located immediately 3' of the PAM (5'-TTTC-3') in the DNMT1 target site of SEQ ID NO: 29 (TTTCCTGATGGTCCATGTCTGTTACTC); that is, the targeting sequence of each crRNA is the corresponding contiguous 22-, 20-, 18-, or 16-nt sequence immediately 3' of the PAM in SEQ ID NO: 29, with T replaced by U. Each tru-crRNA and the full-length crRNA (whose targeting sequence is the 23-nt sequence of SEQ ID NO: 29 excluding the PAM, with T replaced by U) was transfected into HEK293T cells together with an AsCpf1 expression plasmid using Lipofectamine 2000. After 72 hours, genomic DNA was isolated, and indel frequencies at on-target and off-target sites were measured by targeted deep sequencing.
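The tru-crRNA design rule above (keep the first N nt 3' of the PAM, then convert T to U) can be sketched as follows, using the 23-nt protospacer that follows the TTTC PAM in SEQ ID NO: 29. Only the guide (targeting) segment is generated; the direct-repeat handle that sits 5' of the guide in a Cpf1 crRNA is omitted, and the function name is hypothetical.

```python
def truncated_spacers(target_site: str, pam_len: int = 4,
                      lengths=(23, 22, 20, 18, 16)) -> dict[int, str]:
    """Map each guide length to the RNA targeting sequence obtained by taking
    the first `length` nt 3' of the PAM and replacing T with U."""
    protospacer = target_site[pam_len:].upper()
    guides = {}
    for n in lengths:
        if n > len(protospacer):
            raise ValueError(f"guide length {n} exceeds protospacer length")
        guides[n] = protospacer[:n].replace("T", "U")
    return guides


site = "TTTC" + "CTGATGGTCCATGTCTGTTACTC"  # PAM + 23-nt protospacer
for length, guide in truncated_spacers(site).items():
    print(length, guide)
```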
The results are shown in FIG. 25. As shown in FIG. 25, off-target effects were reduced to about one-tenth when tru-crRNAs were used. This reduction is expected to be more pronounced when the off-target site carries mismatched nucleotides at the PAM-distal 3' end.
Example 12: Identification of cleavage ends produced by Cpf1
An advantage of the Digenome-seq method described in Example 4 is that the overhang pattern at cleavage sites can be readily visualized with the Integrative Genomics Viewer (IGV).
FIG. 26A shows representative Integrative Genomics Viewer (IGV; see http://software.broadinstitute.org/software/igv/) images of the overhang patterns at the DNMT1-3 target site (SEQ ID NO: 19) and the DNMT1-4 target site (SEQ ID NO: 20). LbCpf1 mostly produced 3-nt 5' overhangs, and no 2-nt overhangs, at the cleavage site, whereas AsCpf1 produced 2- to 4-nt 5' overhangs. Cas9 produced blunt ends or 1-nt 5' overhangs.
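Whether a double-strand break leaves blunt ends or an overhang, and of what polarity, follows from where each strand is cut. The sketch below assumes both cut positions are given as 0-based boundaries (a cut between base k-1 and base k) in forward-strand coordinates; the coordinates in the example call are hypothetical and are not the cut positions observed in FIG. 26A.

```python
def overhang(top_cut: int, bottom_cut: int) -> tuple[str, int]:
    """Classify the end produced by a double-strand cut.

    top_cut / bottom_cut: 0-based boundaries in forward-strand coordinates
    at which the top (5'->3') and bottom strands are cleaved.
    """
    if top_cut == bottom_cut:
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # Bottom strand protrudes on the left fragment, top strand on the
        # right fragment; both protruding ends are 5' ends.
        return ("5'", bottom_cut - top_cut)
    return ("3'", top_cut - bottom_cut)


print(overhang(18, 21))  # hypothetical staggered cut
print(overhang(17, 17))  # hypothetical blunt cut
```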
We then tested whether the different overhang patterns generated at the DNMT1-3 target site (SEQ ID NO: 19) and the DNMT1-4 target site (SEQ ID NO: 20) give rise to different mutation patterns.
FIG. 26B is a graph of the number of mutant sequence reads binned by deletion/insertion size in base pairs. FIG. 26C shows mutant sequences induced at Cpf1 or Cas9 target sites. For each nuclease, the first row is the original target sequence and the subsequent rows are mutant sequences. In the first row, the PAM sequence (Cpf1: TTTC) is shown in bold and the target sequence hybridized by the crRNA/sgRNA is underlined; in the subsequent rows, underlined sequences indicate microhomology, and the numbers on the right give the number of deleted (shown as '-') or inserted (shown in lowercase) nucleotides.
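Binning mutant reads by indel size, as in FIG. 26B, can be sketched from SAM-style CIGAR strings by taking inserted minus deleted bases per read. This is a simplification we chose for illustration: a read carrying an equal-sized insertion and deletion nets to 0 and is treated as unmodified here, and no window around the cleavage site is applied.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def net_indel_size(cigar: str) -> int:
    """Inserted minus deleted bases in one read (negative = net deletion)."""
    size = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "I":
            size += int(length)
        elif op == "D":
            size -= int(length)
    return size


def bin_by_indel_size(cigars) -> Counter:
    """Number of mutant reads per net indel size; unmodified reads are dropped."""
    return Counter(s for s in map(net_indel_size, cigars) if s != 0)


reads = ["150M", "70M3D80M", "72M3D78M", "75M1I74M"]  # toy alignments
print(bin_by_indel_size(reads))
```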
FIGS. 27A and 27B show mutation patterns induced by LbCpf1, AsCpf1, and SpCas9.
FIG. 27A is a graph of the number of mutant sequence reads binned by deletion/insertion (indel) size in base pairs; mutation patterns were determined by targeted deep sequencing of HEK293T cells transfected with LbCpf1, AsCpf1, or SpCas9 plasmids.
FIG. 27B shows mutant sequences induced at the EMX1-2 target site (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 42). For each nuclease, the first row is the original target-site sequence and the subsequent rows are mutant sequences. In the first row, the PAM sequence (Cpf1: TTTG) is shown in bold and the target sequence hybridized by the crRNA/sgRNA is underlined; in the subsequent rows, underlined sequences indicate microhomology, and the numbers on the right give the number of deleted (shown as '-') or inserted (shown in lowercase) nucleotides.
LbCpf1, AsCpf1, and Cas9 induce rather different mutant sequences, although some microhomology is found at the deletion junctions. Single-nucleotide insertions or deletions are rare with Cpf1 nucleases but can be the predominant mutation pattern with Cas9. These results show that the differences in cleavage sites and overhang patterns between Cpf1 and Cas9 give rise to different mutation patterns.
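Microhomology at a deletion junction means that the same short sequence flanks both ends of the deleted segment, so the deletion can be placed at more than one position with an identical product. The sketch below measures that length for a deletion of `ref[start:end]` by checking how far the deletion can slide right and left. The definition (slide extent, capped at the deletion length) and the helper name are our assumptions for illustration.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Length of microhomology flanking the deletion of ref[start:end]."""
    if not 0 <= start < end <= len(ref):
        raise ValueError("invalid deletion coordinates")
    del_len = end - start
    right = 0  # bases at the start of the deleted segment repeated just after it
    while (end + right < len(ref) and right < del_len
           and ref[start + right] == ref[end + right]):
        right += 1
    left = 0  # bases just before the deletion repeated at its end
    while (start - left - 1 >= 0 and left < del_len
           and ref[start - left - 1] == ref[end - left - 1]):
        left += 1
    return min(left + right, del_len)


# Toy example: deleting "GCTAAAA" leaves one copy of the flanking "GCT".
print(microhomology_length("CCGCTAAAAGCTGG", 2, 9))
```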
FIGS. 26D and 26E show mutation patterns induced by LbCpf1, AsCpf1, and SpCas9. FIG. 26D is a graph of the proportions of deletions versus insertions among mutant sequences, and FIG. 26E is a graph of the proportions of in-frame versus out-of-frame indels (data represent mean ± s.e.m.; n = 10 target sites).
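The two splits in FIGS. 26D and 26E follow directly from signed indel sizes: the sign separates deletions from insertions, and divisibility by 3 separates in-frame from out-of-frame indels. The sketch below uses hypothetical read sizes, with negative values as deletions and 0 (unmodified) ignored.

```python
def mutation_profile(indel_sizes) -> dict[str, float]:
    """Fractions of deletions vs insertions and in-frame vs out-of-frame
    indels among mutant reads (negative size = deletion)."""
    sizes = [s for s in indel_sizes if s != 0]
    if not sizes:
        raise ValueError("no mutant reads")
    n = len(sizes)
    deletions = sum(s < 0 for s in sizes) / n
    in_frame = sum(s % 3 == 0 for s in sizes) / n  # -3, -6, +3, ... are in-frame
    return {
        "deletion": deletions,
        "insertion": 1 - deletions,
        "in-frame": in_frame,
        "out-of-frame": 1 - in_frame,
    }


print(mutation_profile([-3, -6, -1, 1, 0]))  # hypothetical reads
```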
As shown in FIG. 26D, unlike Cas9, Cpf1 rarely induces insertions. In addition, as shown in FIG. 26E, the proportion of in-frame mutations caused by deletions of 3, 6, 9 nt, and so on was higher with Cpf1 than with Cas9. These results suggest that, compared with Cas9, selecting target sites on the basis of microhomology is more important when Cpf1 is used to inactivate protein-coding genes.
Example 13: Genome editing that induces target-specific sequence mutations by delivering Cpf1-crRNA RNPs into mouse embryos by microinjection
To date, the generation of mutant mice by microinjecting Cpf1 RNPs into mouse embryos has not been reported.
Recombinant Acidaminococcus sp. BV3L6 Cpf1 (AsCpf1) protein was expressed in and purified from E. coli (see Example 1), and crRNAs targeting the mouse gene FoxN1 (see SEQ ID NOs: 1 to 3) were prepared and combined with the protein to form RNPs (AsCpf1 protein 200 ng/µl, crRNA 100 ng/µl). The crRNAs were prepared by the method described in Table 4, based on the target sequences of SEQ ID NO: 2 and SEQ ID NO: 3.
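RNP recipes are given here as mass concentrations, but the crRNA:protein molar ratio is often the more informative quantity. The sketch below converts ng/µl to µM using approximate average masses. The 110 Da per residue and 330 Da per nucleotide averages, the 43-nt crRNA length, and the neglect of purification tags are all assumptions for illustration, not values from this document; only the 1,307-aa AsCpf1 length comes from the text.

```python
AVG_RESIDUE_DA = 110.0  # approximate mean amino-acid residue mass (assumption)
AVG_RNA_NT_DA = 330.0   # approximate mean ssRNA nucleotide mass (assumption)


def micromolar(ng_per_ul: float, mw_da: float) -> float:
    """Convert a mass concentration to µM (1 ng/µl = 1 mg/l)."""
    return ng_per_ul * 1000.0 / mw_da


def crrna_to_protein_ratio(protein_ng_ul: float, protein_aa: int,
                           crrna_ng_ul: float, crrna_nt: int) -> float:
    """Molar ratio of crRNA to protein in an RNP mix."""
    protein_um = micromolar(protein_ng_ul, protein_aa * AVG_RESIDUE_DA)
    crrna_um = micromolar(crrna_ng_ul, crrna_nt * AVG_RNA_NT_DA)
    return crrna_um / protein_um


# 200 ng/µl AsCpf1 (1,307 aa, tags ignored) with 100 ng/µl of a
# hypothetical 43-nt crRNA.
print(round(crrna_to_protein_ratio(200, 1307, 100, 43), 2))
```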
The RNPs thus prepared were delivered into mouse embryos by microinjection (see FIG. 1), and the injected embryos were cultured to the blastocyst stage, after which gDNA was purified and examined for sequence mutations. The results of the T7E1 assay are shown in FIG. 2. As shown in FIG. 2, mutations were found in 10 of 12 blastocysts (83%) (marked with asterisks).
Targeted deep sequencing was performed to confirm that the mutations were induced specifically at the sequence targeted by the crRNA, and the results are shown in FIG. 3. These results show that microinjection of AsCpf1 RNPs is an efficient method for genome editing in animals.
In addition, mice were born from embryos genome-edited with Cpf1 RNPs, and we examined whether the sequence mutations in these animals were specific and whether non-specific mutations were absent. gDNA was purified from the tails of these mice; mutations at the specific target site were confirmed by the T7E1 assay and targeted deep sequencing (see FIGS. 4 and 5), and non-specific mutations were analyzed by whole genome sequencing (WGS) (see FIG. 6). Comparison of the WGS data with the reference genome confirmed that no non-specific mutations occurred and that genome editing occurred only at the specific target sequence (see FIG. 6).
Example 14: Genome editing by delivering Cpf1 and Cas9 RNPs into mouse embryos by electroporation
Because Cpf1 RNP delivery by microinjection requires mouse embryos to be processed one at a time, experiments must be completed within the few hours during which the embryos remain at the one-cell stage; the number of embryos that can be processed at once is therefore limited by the number of experimenters and injection setups.
To overcome this, we established a method for genome editing in mouse embryos by applying electroporation, which can process many embryos at once, to recombinant Streptococcus pyogenes Cas9 (SpCas9) and AsCpf1 proteins (see FIG. 7). In this example, recombinant AsCpf1 or SpCas9 protein (100 ng/µl) and sgRNA (500 ng/µl; prepared as described in Table 5 based on the VEGFA target sequence of SEQ ID NO: 6) or crRNA (250 ng/µl; prepared as described in Table 4 based on the target sequence of SEQ ID NO: 2 or 3) were diluted in Opti-MEM (Thermo) medium to prepare RNPs. Fifty mouse embryos were added, and electroporation was performed with a NEPA21 electroporator (NEPA GENE Co. Ltd).
Electroporation used poring pulses (225 V, 1.5 ms, 50-ms interval, 4 pulses, 10% decay rate, + polarity) and transfer pulses (20 V, 50 ms, 50-ms interval, 5 pulses, 40% decay rate, +/- polarity). We first tested SpCas9: RNPs were assembled from SpCas9 and a VEGFA-targeting sgRNA and electroporated into mouse embryos. The embryos were cultured to the blastocyst stage, gDNA was purified, and sequence mutations were analyzed by the T7E1 assay and targeted deep sequencing (see FIGS. 8 and 9).
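The two pulse blocks above can be recorded as structured settings. The sketch below expands each block into per-pulse voltages under two assumptions that are ours, not the document's: that the decay rate lowers each successive pulse by that percentage of the previous pulse, and that "+/-" polarity means the train is repeated with reversed polarity. Check the instrument's own documentation before relying on either interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulseSetting:
    """One pulse block; values from this example, field names hypothetical."""
    voltage_v: float
    length_ms: float
    interval_ms: float
    count: int
    decay_pct: float
    alternate_polarity: bool = False  # assumed meaning of "+/-"


def pulse_amplitudes(p: PulseSetting) -> list[float]:
    """Per-pulse voltages, assuming each pulse is decay_pct % below the last."""
    amps = []
    v = p.voltage_v
    for _ in range(p.count):
        amps.append(v)
        v *= 1 - p.decay_pct / 100
    if p.alternate_polarity:
        amps += [-a for a in amps]  # same train with reversed polarity
    return amps


PORING = PulseSetting(225, 1.5, 50, 4, 10)
TRANSFER = PulseSetting(20, 50, 50, 5, 40, alternate_polarity=True)
print([round(a, 3) for a in pulse_amplitudes(PORING)])
```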
As shown in FIGS. 8 and 9, blastocyst analysis confirmed that SpCas9 delivered by electroporation induced efficient genome editing (mutations in 12 of 15 blastocysts, observed in all columns except 8, 13, and 15; 80% efficiency).
In the same way, when AsCpf1 RNPs targeting FoxN1 exon 7 were delivered into mouse embryos by electroporation, targeted deep sequencing of blastocysts confirmed efficient genome editing (16 of 25, 64%) (see FIG. 10).
Example 15: Genome editing that induces specific sequence mutations by delivering Cpf1 RNPs into plants using polyethylene glycol (PEG)
To date, the use of Cpf1 RNPs for plant genome editing has not been reported. In this example, we establish a plant genome-editing method using recombinant AsCpf1 and Lachnospiraceae bacterium ND2006 Cpf1 (LbCpf1), and apply it to produce and use soybean (Glycine max) plants in which the FAD2 homologous genes are knocked out. To this end, we identified AsCpf1 and LbCpf1 target sequences that simultaneously and specifically recognize both soybean FAD2 homologs (Glyma10g42470 and Glyma20g24530). The identified target sequences are shown in Table 34 below:
Table 34
Target (PAM and target sequence for the FAD2 homologous genes Glyma10g42470 and Glyma20g24530) | Sequence (5'->3') | SEQ ID NO
Target sequence 1 | TTTCTACATTGCCACCACCTAOTCC | 7
Target sequence 2 | TTTCCCTCATTGCATGGCCMTCTAT | 8
Target sequence 3 (LbCpf1) | TTTAGTCCCmiTTCTCATGGAAM | 9
Target sequence 4 | TTTCTCATGGAAAATAAGCCATCGCC | 10
Target sequence 5 | TTTCTCCCAAAACCAAAATCCAAAGT | 11
Target sequence 6 | TTTCGCTGCTATGTGTTTATGGGGTG | 12
Target sequence 7 | TTTGGCAACTATGGACAGAGATTATGO | 13
Target sequence 8 | TTTGATGACACACCATTTTACAAGGC | 14
Target sequence 9 (AsCpf1) | TTTACAAGGCACTGTGGAGAGAAGC | 15
(In the original table, PAM sequences are shown in bold.)
crRNAs were prepared from the identified target sequences by the method described in Table 4.
Plant protoplasts (2×10^5 soybean protoplasts) suspended in an equal volume of MMG solution (0.4 M mannitol, 15 mM MgCl2) were mixed with premixed recombinant AsCpf1 (or LbCpf1) protein (40 µg per 2×10^5 protoplasts) and crRNA (80 µg per 2×10^5 protoplasts), and the RNPs were delivered into the plant cells using 300 µl of 40% polyethylene glycol (PEG) solution (PEG 4000, 0.2 M mannitol, 0.1 M CaCl2) (see FIG. 11).
After delivery, the protoplasts were cultured in W5 solution (2 mM MES [pH 5.7], 154 mM NaCl, 125 mM CaCl2, 5 mM KCl) for 24 hours, and gDNA was isolated to check whether gene editing had occurred at the target genes. Targeted deep sequencing showed efficient genome editing, demonstrating that this method can produce plant cells in which both homologous FAD2 genes are knocked out (see FIG. 12). Sequence analysis also confirmed that the mutations occurred at the target positions where Cpf1 is expected to cleave the target genes (see FIG. 13).
Example 16: Genome editing using Split-Cpf1
16.1. Construction of Split-Cpf1
Cpf1 is a next-generation genome-editing nuclease that has attracted attention for engineering genetic modifications in eukaryotic cells and organisms because it tends to act more target-specifically than previously used artificial nucleases. Despite its usefulness, the gene encoding Cpf1 is large, so delivering Cpf1 into cells with viral vectors is quite inefficient, which hinders the application of Cpf1 technology. Viral vectors have a packaging limit, and it is well known that virus production and intracellular delivery efficiency decline when the encoded gene exceeds that limit.
To solve this problem, a Split-Cpf1 system was constructed in this Example. Wild-type (WT) AsCpf1 protein (SEQ ID NO: 43) consists of 1,307 amino acids (see FIG. 29A). An expression cassette containing everything required for AsCpf1 expression and nuclear delivery, namely a promoter (CMV promoter; SEQ ID NO: 64), a nuclear localization signal (KRPAATKKAGQAKKKK) and a poly(A) signal, reaches the packaging limit when transferred into a viral vector. To reduce the size of the expression cassette, AsCpf1 was therefore expressed as two separate fragments, and four types of Split-AsCpf1 were designed.
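The packaging argument can be made concrete with simple length arithmetic. The sketch below computes the coding length of AsCpf1 (or a fragment of it) fused in frame to the 16-aa NLS named above, adds promoter and poly(A) lengths, and compares the total against a packaging limit. The promoter and poly(A) sizes and the ~4.7 kb limit (a commonly cited figure for AAV) are illustrative inputs, not values from this document; the in-frame fusion with a single stop codon is also an assumption.

```python
NLS = "KRPAATKKAGQAKKKK"  # nuclear localization signal given in the text (16 aa)


def orf_bp(n_aa: int, with_nls: bool = True) -> int:
    """Coding length (bp) of n_aa residues, optional in-frame NLS, plus one stop codon.

    Ignores the extra start codon a split C-terminal fragment would need.
    """
    total_aa = n_aa + (len(NLS) if with_nls else 0)
    return 3 * total_aa + 3


def cassette_bp(n_aa: int, promoter_bp: int, polya_bp: int, other_bp: int = 0) -> int:
    """Promoter + ORF + poly(A) + any other elements (e.g. vector ITRs)."""
    return promoter_bp + orf_bp(n_aa) + polya_bp + other_bp


def fits(n_aa: int, promoter_bp: int, polya_bp: int, limit_bp: int, other_bp: int = 0) -> bool:
    """True if the whole cassette is within the packaging limit."""
    return cassette_bp(n_aa, promoter_bp, polya_bp, other_bp) <= limit_bp


if __name__ == "__main__":
    PROMOTER_BP, POLYA_BP, LIMIT_BP = 600, 200, 4700  # illustrative values only
    for label, n in [("full AsCpf1", 1307), ("N-half, Split-4", 526), ("C-half, Split-4", 781)]:
        print(label, cassette_bp(n, PROMOTER_BP, POLYA_BP), fits(n, PROMOTER_BP, POLYA_BP, LIMIT_BP))
    # full AsCpf1 4772 False / N-half, Split-4 2429 True / C-half, Split-4 3194 True
```

Under these assumed element sizes, only the split fragments stay below the limit, which is the motivation given above for splitting the protein.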
Each design divides WT AsCpf1 (SEQ ID NO: 43) into two fragments at a different boundary: Split-1-AsCpf1 between amino acids 901 and 902, Split-2-AsCpf1 between amino acids 886 and 887, Split-3-AsCpf1 between amino acids 399 and 400, and Split-4-AsCpf1 between amino acids 526 and 527 (see FIG. 29A).
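The four boundaries translate directly into fragment ranges. The sketch below (helper names are hypothetical) derives the N- and C-terminal residue ranges and lengths for each design and cuts a one-letter protein string at a given boundary. It takes the 1,307-residue length of SEQ ID NO: 43 from the text and does not model added start codons, linkers or NLS tags.

```python
ASCPF1_LEN = 1307  # residues in WT AsCpf1 (SEQ ID NO: 43)

# Last residue (1-based) of the N-terminal fragment for each design in the text.
SPLIT_AFTER = {"Split-1": 901, "Split-2": 886, "Split-3": 399, "Split-4": 526}


def halves(split_after: int, length: int = ASCPF1_LEN) -> tuple[tuple[int, int], tuple[int, int]]:
    """1-based inclusive (start, end) ranges of the N- and C-terminal fragments."""
    if not 1 <= split_after < length:
        raise ValueError("split point must leave both fragments non-empty")
    return (1, split_after), (split_after + 1, length)


def split_protein(seq: str, split_after: int) -> tuple[str, str]:
    """Cut a one-letter protein string between residues split_after and split_after + 1."""
    return seq[:split_after], seq[split_after:]


if __name__ == "__main__":
    for name, k in SPLIT_AFTER.items():
        (n0, n1), (c0, c1) = halves(k)
        print(f"{name}: N {n0}-{n1} ({n1 - n0 + 1} aa), C {c0}-{c1} ({c1 - c0 + 1} aa)")
    # last line printed: Split-4: N 1-526 (526 aa), C 527-1307 (781 aa)
```

Applied to the full SEQ ID NO: 43 string, `split_protein(seq, 526)` would return the two Split-4 fragments listed in Table 35.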
The resulting half domains are summarized in Table 35 below:
【Table 35】
Split-1-AsCpf1: N-terminal half domain (residues 1-901 of SEQ ID NO: 43); C-terminal half domain (residues 902-1307 of SEQ ID NO: 43)
(Protein sequences) [not legible in the scanned table]
LL ACAAGGAACGCCCTGATCGAGGAGCAG CTGGAGAACCTGAAT TCGGCTTTAAGLL ACAAGGAACGCCCTGATCGAGGAGCAG CTGGAGAACCTGAAT TCGGCTTTAAG
GCCACATATCGCAATGCCATCCACGAC AGCAAGAGGACCGGCATCGCCGAGAAGGCCACATATCGCAATGCCATCCACGAC AGCAAGAGGACCGGCATCGCCGAGAAG
TAOTCATCGGCCGGACAGACAACCTG GCCGTGTACCAGCAGTTCGAGAAGATGTAOTCATCGGCCGGACAGACAACCTG GCCGTGTACCAGCAGTTCGAGAAGATG
ACCGATGCCATCAATAAGAGACACGCC CTGATCGATAAGCTGAATTGCCTGGTGACCGATGCCATCAATAAGAGACACGCC CTGATCGATAAGCTGAATTGCCTGGTG
GAGATCTACAAGGGCCTGTOAAGGCC CTGAAGGACTATCCAGCAGAGAAAGTGGAGATCTACAAGGGCCTGTOAAGGCC CTGAAGGACTATCCAGCAGAGAAAGTG
GAGCTGTTTAATGGCAAGGTGCTGAAG GGAGGCGTGCTGAACCCATACCAGCTGGAGCTGTTTAATGGCAAGGTGCTGAAG GGAGGCGTGCTGAACCCATACCAGCTG
CAGCTGGGCACCGTGACCACAACCGAG ACAGACCAGTOACCTCCTTTGCCAAGCAGCTGGGCACCGTGACCACAACCGAG ACAGACCAGTOACCTCCTTTGCCAAG
CACGAGAACGCCCTGCTGCGGAGCTO ATGGGCACCCAGTCTGGCTTCCTGTTTCACGAGAACGCCCTGCTGCGGAGCTO ATGGGCACCCAGTCTGGCTTCCTGTTT
GACAAGTTTACAACCTACnCTCCGGC TACGTGCCTGCCCCATATACATCTAAGGACAAGTTTACAACCTACnCTCCGGC TACGTGCCTGCCCCATATACATCTAAG
TnTATGAGAACAGGAAGAACGTGTO ATCGATCCCCTGACCGGOTCGTGGACTnTATGAGAACAGGAAGAACGTGTO ATCGATCCCCTGACCGGOTCGTGGAC
AGCGCCGAGGATATCAGCACAGCCATC CCOTCGTGTGGAAAACCATCAAGAATAGCGCCGAGGATATCAGCACAGCCATC CCOTCGTGTGGAAAACCATCAAGAAT
CCACACCGCATCGTGCAGGACAAOTC CACGAGAGCCGCAAGCAOTCCTGGAGCCACACCGCATCGTGCAGGACAAOTC CACGAGAGCCGCAAGCAOTCCTGGAG
CCCAAGmAAGGAGAATTGTCACATC GGCTOGACITTCTGCACTACGACGTGCCCAAGmAAGGAGAATTGTCACATC GGCTOGACITTCTGCACTACGACGTG
TOACACGCCTGATCACCGCCGTGCCC AAAACCGGCGACTTCATCCTGCACTTTTOACACGCCTGATCACCGCCGTGCCC AAAACCGGCGACTTCATCCTGCACTTT
AGCCTGCGGGAGCACTTTGAGAACGTG AAGATGAACAGAAATCTGTCOTCCAGAGCCTGCGGGAGCACTTTGAGAACGTG AAGATGAACAGAAATCTGTCOTCCAG
AAGMGGCCATCGGCATCTTCGTGAGC AGGGGCCTGCCCGGCTTTATGCCTGCAAAGMGGCCATCGGCATCTTCGTGAGC AGGGGCCTGCCCGGCTTTATGCCTGCA
ACCTCCATCGAGGAGGTGTTTTCOTC TGGGATATCGTGTTCGAGAAGAACGAGACCTCCATCGAGGAGGTGTTTTCOTC TGGGATATCGTGTTCGAGAAGAACGAG
CCTTTTTATAACCAGCTGCTGACACAG ACACAGTTTGACGCCAAGGGCACCCCTCCTTTTTATAACCAGCTGCTGACACAG ACACAGTTTGACGCCAAGGGCACCCCT
ACCCAGATCGACCTGTATAACCAGCTG TTCATCGCCGGCAAGAGAATCGTGCCAACCCAGATCGACCTGTATAACCAGCTG TTCATCGCCGGCAAGAGAATCGTGCCA
CTGGGAGGAATCTCTCGGGAGGCAGGC GTGATCGAGAATCACAGATTCACCGGCCTGGGAGGAATCTCTCGGGAGGCAGGC GTGATCGAGAATCACAGATTCACCGGC
ACCGAGAAGATCAAGGGCCTGAACGAG AGATACCGGGACCTGTATCCTGCCAACACCGAGAAGATCAAGGGCCTGAACGAG AGATACCGGGACCTGTATCCTGCCAAC
GTGCTGAATCTGGCCATCCAGAAGAAT GAGCTGATCGCCCTGCTGGAGGAGAAGGTGCTGAATCTGGCCATCCAGAAGAAT GAGCTGATCGCCCTGCTGGAGGAGAAG
GATGAGACAGCCCACATCATCGCCTCC GGCATCGTGTOAGGGATGGCTCCAACGATGAGACAGCCCACATCATCGCCTCC GGCATCGTGTOAGGGATGGCTCCAAC
CTGCCACACAGATOATCCCCCTGTTT ATCCTGCCAAAGCTGCTGGAGAATGACCTGCCACACAGATOATCCCCCTGTTT ATCCTGCCAAAGCTGCTGGAGAATGAC
AAGCAGATCCTGTCCGATAGGAACACC GAnCTCACGCCATCGACACCATGGTGAAGCAGATCCTGTCCGATAGGAACACC GAnCTCACGCCATCGACACCATGGTG
CTGTCTTTCATCCTGGAGGAGTTTAAG GCCCTGATCCGCAGCGTGCTGCAGATGCTGTCTTTCATCCTGGAGGAGTTTAAG GCCCTGATCCGCAGCGTGCTGCAGATG
AGCGACGAGGAAGTGATCCAGTCOTC CGGAACTCCAATGCCGCCACAGGCGAGAGCGACGAGGAAGTGATCCAGTCOTC CGGAACTCCAATGCCGCCACAGGCGAG
TGCAAGTACAAGACACTGCTGAGAAAC GACTATATCAAGAGCCCCGTGCGCGATTGCAAGTACAAGACACTGCTGAGAAAC GACTATATCAAGAGCCCCGTGCGCGAT
GAGAACGTGCTGGAGACAGCCGAGGCC CTGAATGGCGTGTGCnCGACTCCCGG oioovooovaovooiovwoooivoiv GAGAACGTGCTGGAGACAGCCGAGGCC CTGAATGGCGTGTGCnCGACTCCCGG oioovooovaovooiovwoooivoiv
0W0303001V01330110VI3V0IVI 0W0303001V01330110VI3V0IVI
3VI0IV0WIV0111D000V030V33V3VI0IV0WIV0111D000V030V33V
VW0V0V3V3030V031133V0I30a0VW0V0V3V3030V031133V0I30a0
0WIVI00V3000W0V0DVW330IV0WIVI00V3000W0V0DVW330IV
3IV3OO0131VI3V10I0DO03W0W 3IV3OO0131VI3V10I0DO03W0W
D101U3I331V3003901W3W0W  D101U3I331V3003901W3W0W
0VO0WIWDI33Vi»013OOJ3I33D 0VO0WIWDI33Vi »013OOJ3I33D
OIOVOVXODOIVOVOl LOWOIOOW OIOVOVXODOIVOVOl LOWOIOOW
D11OW0VO9I03313V1030OW0W  D11OW0VO9I03313V1030OW0W
33V3331VJLLVW0VD00DW0W3VI 33V3331VJLLVW0VD00DW0W3VI
OLLOOVOIOIOIIOODVOOIVOVOOIOOLLOOVOIOIOIIOODVOOIVOVOOIO
OW3IV30D33VOI30033001310LLOW3IV30D33VOI30033001310LL
0VO0DD3VO0IO0V03WD0I0V0IVO0VO0DD3VO0IO0V03WD0I0V0IVO
OXOOOOLUOOlOVODimOOVOOVIOXOOOOLUOOlOVODimOOVOOVI
0130009109I300V3V0DX00V3I3I0130009109I300V3V0DX00V3I3I
OWOIOOIVOVOOWOVOOVOOVOOW OWOIOOIVOVOOWOVOOVOOVOOW
0WDI333W0VI300I0VD3OV31V0 0WDI333W0VI300I0VD3OV31V0
913D0D3303V3V303V333I01331V913D0D3303V3V303V333I01331V
0V000V33WW0V30W31L0300V0 0V000V33WW0V30W31L0300V0
OOVOIOOVODWOODVaOOOOIOXOIV  OOVOIOOVODWOODVaOOOOIOXOIV
3IV0V00V30130W3IVIV00V03V3 3IV0V00V30130W3IVIV00V03V3
0W0I30DV303OV30IODW0VO0W 0W0I30DV303OV30IODW0VO0W
0301310W33VDIVDW003V3V013 0301310W33VDIVDW003V3V013
3V00V01W33131VOO03WOVDOI3 0VD3013IW0V00D0V0IV10I03303V00V01W33131VOO03WOVDOI3 0VD3013IW0V00D0V0IV10I0330
OWOIOIVODWOOVOVODWOIOOVO IWO0V0I0V0VIVOO0I3V33VO001 OWOIOIVODWOOVOVODWOIOOVO IWO0V0I0V0VIVOO0I3V33VO001
0I303033V30V3J,W3V0VO0133W  0I303033V30V3J, W3V0VO0133W
3iyDVD3VI003300IW0301V033D 0W3V330V3IV3JI3XV3V3V0V3I3 3iyDVD3VI003300IW0301V033D 0W3V330V3IV3JI3XV3V3V0V3I3
0VODIV3Q3O31DV0V303WOV3111 3V03XV30VDW0I00V03WX110I3 0VODIV3Q3O31DV0V303WOV3111 3V03XV30VDW0I00V03WX110I3
6/.6 /.
.CMO/9TOZaM/X3d t6l7660//J0Z OAV OIO3V30LLLJ13LL0W0VO0OV33V .CMO / 9TOZaM / X3d t6l7660 // J0Z OAV OIO3V30LLLJ13LL0W0VO0OV33V
l 30309VIVi)DW31V3IV0V0DV3 X3I0I0DV00W3DV01V0I03W333 0I0DI333OODV33O0V01V0I31010 3V03V03DI013V0V3V3IW0I01V1 3V93VI3133V DV33V10I333V3V0 OOOOIWOOOOWWOVOXVOOWOIO  l 30309VIVi) DW31V3IV0V0DV3 X3I0I0DV00W3DV01V0I03W333 0I0DI333OODV33O0V01V0I31010 3V03V03DI013V0V3V3IW0I01V1 3V93VI3133V DV33V10I333V3OOWOOOXO
0W0W3W01D3IV0WDV0V09DID O000VOV3O9IVO0VDW0IVOOV031 OWI333033V13LLOI30V03300V3  0W0W3W01D3IV0WDV0V09DID O000VOV3O9IVO0VDW0IVOOV031 OWI333033V13LLOI30V03300V3
DIOOWDVOVOOIOLLILOIDOOOOOV DIOOWDVOVOOIOLLILOIDOOOOOV
901LVI0I3V3V3V0D13IWI333W  901LVI0I3V3V3V0D13IWI333W
3003V33VD00a0W3301 L0Vi»VV 3W1V131V0V33LL0133VI0I33W  3003V33VD00a0W3301 L0Vi »VV 3W1V131V0V33LL0133VI0I33W
3ODV3V0VO0I0D301VO0IV3IV0V3 3W3V03D0DXW3VDV331L00V3IV 3V03V10I00I3333IW010DV3330 XVI3VI3V03300I33V00WIVI3V0 I3I33XV30O033X330VI3I3I0XV0 3IVI3I33WDV0W33VIVI0W33I 0I31LUV000W3V3LL3V03IV001 owmoiooooDvovovovioooow 3ODV3V0VO0I0D301VO0IV3IV0V3 3W3V03D0DXW3VDV331L00V3IV 3V03V10I00I3333IW010DV3330 XVI3VI3V03300I33V00WIVI3V0 I3I33XV30O033X330VI3I300V33V3V3V3V3V3
OVOOVOOOOOOWWOWOOOOVIOOO VDV0V3JJ 0W0VW30DV0DW0V0 I331W3WOI30V03V13IV3VOOW OVOOVOOOOOOWWOWOOOOVIOOO VDV0V3JJ 0W0VW30DV0DW0V0 I331W3WOI30V03V13IV3VOOW
V3V3IV0V00131333V03IV0 UW  V3V3IV0V00131333V03IV0 UW
3W3310X30ID3IV33033W3V3V3  3W3310X30ID3IV33033W3V3V3
1 33V0V3JIX3V333QV3V0X00099W  1 33V0V3JIX3V333QV3V0X00099W
0808
Split-2-AsCpf1: N-terminal half domain (residues 1-886 of SEQ ID NO: 43); C-terminal half domain (residues 887-1307 of SEQ ID NO: 43)
(Protein sequences) Only residues 481-886 of the N-terminal half domain are legible in the scanned table:
LDWFAVDESN EVDPEFSARL TGIKLEMEPS LSFYNKARNY ATKKPYSVEK FKLNFQMPTL
ASGWDVNKEK NNGAILFVKN GLYYLGIMPK QKGRYKALSF EPTEKTSEGF DKMYYDYFPD
AAKMIPKCST QLKAVTAHFQ THTTPILLSN NFIEPLEITK EIYDLNNPEK EPKKFQTAYA
KKTGDQKGYR EALCKWIDFT RDFLSKYTKT TSIDLSSLRP SSQYKDLGEY YAELNPLLYH
ISFQRIAEKE IMDAVETGKL YLFQIYNKDF AKGHHGKPNL HTLYWTGLFS PENLAKTSIK
LNGQAELFYR PKSRMKRMAH RLGEKMLNKK LKDQKTPIPD TLYQELYDYV NHRLSHDLSD
EARALLPNVI TKEVSHEIIK DRRFTSDKFF FHVPITLNYQ AANSPS
(Coding DNA sequences) [not legible in the scanned table]
Split-3-AsCpf1: N-terminal half domain (residues 1-399 of SEQ ID NO: 43); C-terminal half domain (residues 400-1307 of SEQ ID NO: 43)
(Protein and coding DNA sequences) [not legible in the scanned table]
Split-4-AsCpf1
Half-domain 1 (aa 1-526 of SEQ ID NO: 43):
MTQFEGFTNL YQVSKTLRFE LIPQGKTLKH IQEQGFIEED KARNDHYKEL KPIIDRIYKT
YADQCLQLVQ LDWENLSAAI DSYRKEKTEE TRNALIEEQA TYRNAIHDYF IGRTDNLTDA
INKRHAEIYK GLFKAELFNG KVLKQLGTVT TTEHENALLR SFDKFTTYFS GFYENRKNVF
SAEDISTAIP HRIVQDNFPK FKENCHIFTR LITAVPSLRE HFENVKKAIG IFVSTSIEEV
FSFPFYNQLL TQTQIDLYNQ LLGGISREAG TEKIKGLNEV LNLAIQKNDE TAHIIASLPH
RFIPLFKQIL SDRNTLSFIL EEFKSDEEVI QSFCKYKTLL RNENVLETAE ALFNELNSID
LTHIFISHKK LETISSALCD HWDTLRNALY ERRISELTGK ITKSAKEKVQ RSLKHEDINL
QEIISAAGKE LSEAFKQKTS EILSHAHAAL DQPLPTTLKK QEEKEILKSQ LDSLLGLYHL
LDWFAVDESN EVDPEFSARL TGIKLEMEPS LSFYNKARNY ATKKPY
Half-domain 2 (aa 527-1307 of SEQ ID NO: 43):
SVEKFKLNFQ MPTLASGWDV NKEKNNGAIL FVKNGLYYLG IMPKQKGRYK ALSFEPTEKT
SEGFDKMYYD YFPDAAKMIP KCSTQLKAVT AHFQTHTTPI LLSNNFIEPL EITKEIYDLN
NPEKEPKKFQ TAYAKKTGDQ KGYREALCKW IDFTRDFLSK YTKTTSIDLS SLRPSSQYKD
LGEYYAELNP LLYHISFQRI AEKEIMDAVE TGKLYLFQIY NKDFAKGHHG KPNLHTLYWT
GLFSPENLAK TSIKLNGQAE LFYRPKSRMK RMAHRLGEKM LNKKLKDQKT PIPDTLYQEL
YDYVNHRLSH DLSDEARALL PNVITKEVSH EIIKDRRFTS DKFFFHVPIT LNYQAANSPS
KFNQRVNAYL KEHPETPIIG IDRGERNLIY ITVIDSTGKI LEQRSLNTIQ QFDYQKKLDN
REKERVAARQ AWSVVGTIKD LKQGYLSQVI HEIVDLMIHY QAVVVLENLN FGFKSKRTGI
AEKAVYQQFE KMLIDKLNCL VLKDYPAEKV GGVLNPYQLT DQFTSFAKMG TQSGFLFYVP
APYTSKIDPL TGFVDPFVWK TIKNHESRKH FLEGFDFLHY DVKTGDFILH FKMNRNLSFQ
RGLPGFMPAW DIVFEKNETQ FDAKGTPFIA GKRIVPVIEN HRFTGRYRDL YPANELIALL
EEKGIVFRDG SNILPKLLEN DDSHAIDTMV ALIRSVLQMR NSNAATGEDY INSPVRDLNG
VCFDSRFQNP EWPMDADANG AYHIALKGQL LLNHLKESKD LKLQNGISNQ DWLAYIQELR
N
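The half-domain labels above are residue ranges of one parent sequence cut at a single split site. The minimal sketch below reproduces that bookkeeping. It uses a 1307-residue placeholder instead of SEQ ID NO: 43, and the function names are illustrative only.

```python
def split_protein(sequence: str, split_after: int) -> tuple[str, str]:
    """Split a protein sequence after residue `split_after` (1-based numbering).

    Half-domain 1 covers residues 1..split_after and half-domain 2 covers
    residues split_after+1..end, mirroring the "aa 1-526 / aa 527-1307" labels.
    """
    seq = "".join(sequence.split())  # remove spaces between 10-residue blocks
    if not 0 < split_after < len(seq):
        raise ValueError("split site must fall strictly inside the sequence")
    return seq[:split_after], seq[split_after:]


def residue_ranges(half_1: str, half_2: str) -> str:
    """Format the 1-based residue ranges covered by the two halves."""
    n1, n2 = len(half_1), len(half_2)
    return f"aa 1-{n1} | aa {n1 + 1}-{n1 + n2}"


# Placeholder of AsCpf1 length (1307 aa); SEQ ID NO: 43 itself is not reproduced here.
h1, h2 = split_protein("A" * 1307, 526)
print(residue_ranges(h1, h2))  # aa 1-526 | aa 527-1307
```

Stripping whitespace first means a sequence can be pasted exactly as printed, in blocks of 10.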
Coding DNA sequence, half-domain 1 (aa 1-526):
ATGACACAGTTCGAGGGCTTTACCAAC CTGTATCAGGTGAGCAAGACACTGCGG TTTGAGCTGATCCCACAGGGCAAGACC
CTGAAGCACATCCAGGAGCAGGGCTTC ATCGAGGAGGACAAGGCCCGCAATGAT CACTACAAGGAGCTGAAGCCCATCATC
GATCGGATCTACAAGACCTATGCCGAC CAGTGCCTGCAGCTGGTGCAGCTGGAT
[portion illegible in the source]
TGCAAGTACAAGACACTGCTGAGAAAC GAGAACGTGCTGGAGACAGCCGAGGCC CTGTTTAACGAGCTGAACAGCATCGAC
CTGACACACATCTTCATCAGCCACAAG AAGCTGGAGACAATCAGCAGCGCCCTG TGCGACCACTGGGATACACTGAGGAAT
GCCCTGTATGAGCGGAGAATCTCCGAG CTGACAGGCAAGATCACCAAGTCTGCC AAGGAGAAGGTGCAGCGCAGCCTGAAG
CACGAGGATATCAACCTGCAGGAGATC ATCTCTGCCGCAGGCAAGGAGCTGAGC GAGGCCTTCAAGCAGAAAACCAGCGAG
ATCCTGTCCCACGCACACGCCGCCCTG GATCAGCCACTGCCTACAACCCTGAAG AAGCAGGAGGAGAAGGAGATCCTGAAG
TCTCAGCTGGACAGCCTGCTGGGCCTG TACCACCTGCTGGACTGGTTTGCCGTG GATGAGTCCAACGAGGTGGACCCCGAG
TTCTCTGCCCGGCTGACCGGCATCAAG CTGGAGATGGAGCCTTCTCTGAGCTTC TACAACAAGGCCAGAAATTATGCCACC
AAGAAGCCCTAC
Coding DNA sequence, half-domain 2 (aa 527-1307):
TCCGTGGAGAAGTTCAAGCTGAACTTT CAGATGCCTACACTGGCCTCTGGCTGG GACGTGAATAAGGAGAAGAACAATGGC
GCCATCCTGTTTGTGAAGAACGGCCTG TACTATCTGGGCATCATGCCAAAGCAG AAGGGCAGGTATAAGGCCCTGAGCTTC
GAGCCCACAGAGAAAACCAGCGAGGGC TTTGATAAGATGTACTATGACTACTTC
[portion illegible in the source]
AAGGATAGGCGCTTTACCAGCGACAAG CTTTTTCCACGTGCCTATCACACTG AACTATCAGGCCGCCAATTCCCCATCT
AAGTTCAACCAGAGGGTGAATGCCTAC CTGAAGGAGCACCCCGAGACACCTATC ATCGGCATCGATCGGGGCGAGAGAAAC
CTGATCTATATCACAGTGATCGACTCC ACCGGCAAGATCCTGGAGCAGCGGAGC CTGAACACCATCCAGCAGTTTGAmC
CAGAAGAAGCTGGACAACAGGGAGAAG GAGAGGGTGGCAGCAAGGCAGGCCTGG TCTGTGGTGGGCACAATCAAGGATCTG
AAGCAGGGCTATCTGAGCCAGGTCATC CACGAGATCGTGGACCTGATGATCCAC TACCAGGCCGTGGTGGTGCTGGAGAAC
CTGAATTTCGGCTTTAAGAGCAAGAGG ACCGGCATCGCCGAGAAGGCCGTGTAC CAGCAGTTCGAGAAGATGCTGATCGAT
AAGCTGAATTGCCTGGTGCTGAAGGAC TATCCAGCAGAGAAAGTGGGAGGCGTG CTGAACCCATACCAGCTGACAGACCAG
TTCACCTCCTTTGCCAAGATGGGCACC CAGTCTGGCTTCCTGTTTTACGTGCCT GCCCCATATACATCTAAGATCGATCCC
CTGACCGGCTTCGTGGACCCCTTCGTG TGGAAAACCATCAAGAATCACGAGAGC CGCAAGCACTTCCTGGAGGGCTTCGAC
TTTCTGCACTACGACGTGAAAACCGGC GACTTCATCCTGCACTTTAAGATGAAC AGAAATCTGTCCTTCCAGAGGGGCCTG
CCCGGCTTTATGCCTGCATGGGATATC GTGTTCGAGAAGAACGAGACACAGTTT GACGCCAAGGGCACCCCTTTCATCGCC
GGCAAGAGAATCGTGCCAGTGATCGAG AATCACAGATTCACCGGCAGATACCGG GACCTGTATCCTGCCAACGAGCTGATC
GCCCTGCTGGAGGAGAAGGGCATCGTG TTCAGGGATGGCTCCAACATCCTGCCA AAGCTGCTGGAGAATGACGATTCTCAC
GCCATCGACACCATGGTGGCCCTGATC CGCAGCGTGCTGCAGATGCGGAACTCC AATGCCGCCACAGGCGAGGACTATATC
AACAGCCCCGTGCGCGATCTGAATGGC GTGTGCTTCGACTCCCGGTTTCAGAAC CCAGAGTGGCCCATGGACGCCGATGCC
AATGGCGCCTACCACATCGCCCTGAAG GGCCAGCTGCTGCTGAATCACCTGAAG GAGAGCAAGGATCTGAAGCTGCAGAAC
GGCATCTCCAATCAGGACTGGCTGGCC TACATCCAGGAGCTGCGCAAC
The amino acid sequence of WT AsCpf1 (SEQ ID NO: 43) was divided into two half-domains, and each half-domain was cloned into its own recombinant vector so that it could be expressed independently from the CMV promoter. In each recombinant vector, the nuclear localization signal needed for delivery into the nucleus was added to the half-domain. Each vector also contained the CMV promoter sequence (SEQ ID NO: 64) and a poly(A) signal (see FIG. 29B; original backbone vector: pcDNA3.1 (Invitrogen); HA: YPYDVPDYA; SV40 NLS: PKKKRKV; nucleoplasmin NLS: KRPAATKKAGQAKKKK; 3xHA: YPYDVPDYAYPYDVPDYAYPYDVPDYA).
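Each 27-nt block of the coding DNA listed above encodes one 9-residue stretch of the half-domain protein sequences. Translating a block and comparing the result with the protein is therefore a quick way to catch text-extraction damage. The minimal sketch below assumes the standard genetic code (NCBI translation table 1). Codons containing a non-ACGT character become X, and a block that has lost characters fails the length check. The function name `translate` is illustrative.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame coding DNA; any codon with a non-ACGT character becomes 'X'."""
    dna = "".join(dna.split()).upper()  # blocks may be separated by spaces
    if len(dna) % 3:
        raise ValueError(f"length {len(dna)} is not a multiple of 3")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3))


# The first two blocks of half-domain 1 should encode MTQFEGFTN and LYQVSKTLR.
print(translate("ATGACACAGTTCGAGGGCTTTACCAAC CTGTATCAGGTGAGCAAGACACTGCGG"))
```

Building the table from the 64-letter string avoids typing every codon by hand, and the index arithmetic follows the T, C, A, G ordering used in the NCBI table.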
16.2. Gene editing using Split-Cpf1
Recombinant vectors expressing each half-domain of Split-Cpf1 were delivered into HEK293T17 cells (ATCC) using Lipofectamine. They were co-delivered with a plasmid expressing a crRNA (prepared as described for Table 4) that acts on the DNMT1-3 target (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 19).
Recombinant vectors expressing each half-domain of Split-Cpf1 were constructed as follows (see FIG. 29B). For pAD1 (containing the Split-Cpf1 half-domain 1 sequence), half-domain 1 for each split site was inserted into the pcDNA3.1 vector (Invitrogen) by Gibson cloning. Each half-domain insert was prepared by PCR using pY010 (Addgene) as the template. For Gibson cloning, the vector was cut with the restriction enzymes HindIII and EcoRI. pAD2 contains the Split-Cpf1 half-domain 2 sequence and was constructed in the same way as pAD1.
All of the following gene editing tests were performed in HEK293T17 cells (ATCC). Genomic DNA was extracted from the HEK293T17 cells, and the DNMT1-3 target site was amplified by PCR. The primers were DNMT1-3-1F: ccagaagtcccgtgcaaatc, DNMT1-3-1R: ATCTTTCTCAAGGGGCTGCT, and DNMT1-3-2F: cagtgcatgttggggattcc. The PCR conditions were 1st PCR Tm: 60 °C and 2nd PCR Tm: 60 °C. Whether genome editing had occurred was then determined by the T7E1 assay.
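The T7E1 assay reports editing as cleaved bands on a gel. When such gels are quantified, a commonly used approximation converts the cleaved fraction into an estimated indel percentage: indel % = 100 × (1 − sqrt(1 − f_cut)), where f_cut is the cleaved-band intensity divided by the total intensity. The text does not state that this estimate was applied here. The band intensities below are hypothetical, and the function name is illustrative.

```python
import math


def t7e1_indel_percent(uncut: float, *cut_bands: float) -> float:
    """Estimate indel % from T7E1 gel band intensities.

    fraction_cut = sum(cut) / (uncut + sum(cut))
    indel %      = 100 * (1 - sqrt(1 - fraction_cut))
    The approximation assumes random re-annealing of edited and unedited amplicons.
    """
    cut = sum(cut_bands)
    total = uncut + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cut = cut / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))


# Hypothetical band intensities (arbitrary units), not data from this document.
print(round(t7e1_indel_percent(75.0, 15.0, 10.0), 1))  # 13.4
```

Because T7E1 underestimates editing at high indel levels and cannot see all mutation types, targeted deep sequencing (used next in this example) is the quantitative readout.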
The agarose gel analysis results are shown in FIG. 30A. As shown in FIG. 30A, no genome editing was detected when either half-domain of Split-AsCpf1 was expressed alone. When the two half-domains were co-expressed, genome editing occurred with all four variants, Split-1 through Split-4, and the DNA fragments cleaved in the T7E1 assay were visible on the agarose gel.
To quantify genome editing efficiency, targeted deep sequencing was performed, and the results are shown in FIG. 30B. As shown in FIG. 30B, the half-domains of Split-AsCpf1 induced genome editing when they combined after expression to re-form the AsCpf1 protein. Editing efficiency differed depending on where WT AsCpf1 was divided into two fragments. To measure how Split-AsCpf1 editing efficiency depends on the target site, two further targets were added to the DNMT1-3 target in cell experiments: CCR5-1 (GTGGGCAACATGCTGGTCATCCT; SEQ ID NO: 24) and DNMT1-4 (TTTCCCTTCAGCTAAAATAAAGG; SEQ ID NO: 20). Editing efficiency was again measured by targeted deep sequencing, and the resulting indel frequencies (%) are shown in FIG. 30C. As shown in FIG. 30C, Split-1-AsCpf1 through Split-4-AsCpf1 all worked on all three targets. Split-3-AsCpf1 edited the genome with high efficiency even when compared with WT AsCpf1.
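The indel frequency (%) reported from targeted deep sequencing is the share of sequencing reads carrying an insertion or deletion. The minimal sketch below assumes the reads have already been aligned to the amplicon and summarized as SAM-style CIGAR strings, since the text does not describe the analysis pipeline. The read list is hypothetical. Real pipelines usually also restrict the count to a window around the expected cleavage site, which this sketch omits.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """True if the alignment's CIGAR string contains an insertion (I) or deletion (D)."""
    return any(op in "ID" for _, op in CIGAR_OP.findall(cigar))


def indel_frequency(cigars: list[str]) -> float:
    """Indel frequency (%) = reads with an indel / all aligned reads * 100."""
    if not cigars:
        raise ValueError("no reads")
    return 100.0 * sum(has_indel(c) for c in cigars) / len(cigars)


# Hypothetical amplicon alignments, not data from this document.
reads = ["150M", "70M3D80M", "150M", "72M1I77M"]
print(indel_frequency(reads))  # 50.0
```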
This example shows two things. Splitting Cpf1 addresses the problem that the large size of the Cpf1 gene lowers virus production and intracellular delivery efficiency. It also identified a split position that works with high efficiency even compared with conventional WT Cpf1.
Split-Cpf1 acts on the target site only after its two half-domains associate. If that association could be controlled by a specific signaling molecule, gene scissors delivered into cells by a virus could be switched on only at a chosen time by adding the molecule. To implement this, the FRB protein (SEQ ID NO: 81: EMWHEGLEEA SRLYFGERNV KGMFEVLEPL HAMMERGPQT LKETSFNQAY GRDLMEAQEW CRKYMKSGNV KDLTQAWDLY YHVFRRISKQ) and the FKBP protein (SEQ ID NO: 82: GVQVETISPG DGRTFPKRGQ TCVVHYTGML EDGKKFDSSR DRNKPFKFML GKQEVIRGWE EGVAQMSVGQ RAKLTISPDY AYGATGHPGI IPPHATLVFD VELLKLE) were fused to the half-domains of Split-Cpf1 (see FIG. 31A; hereinafter referred to as Inducible-Split-Cpf1). pAD1 and pAD2 shown in FIG. 31A were prepared by the procedure described above. The sequences encoding FRB and FKBP were prepared by oligo extension. The prepared FRB and FKBP sequences were joined to the half-domains by overlapping PCR, and the half-domain-FRB or half-domain-FKBP PCR products were cloned into pAD1 and pAD2 by Gibson cloning. For Gibson cloning, the vector was cut with the restriction enzymes EcoRI and HindIII.
FRB and FKBP are proteins known to bind strongly to rapamycin. Because they bind to different positions on the rapamycin molecule, neither interferes with the other's binding. The expectation was as follows. Without rapamycin, the fused proteins would hinder the spontaneous association of the Split-Cpf1 half-domains and thereby block association and genome editing. With rapamycin present, FRB and FKBP would bind tightly around rapamycin, bringing the half-domains together, inducing their association, and promoting genome editing. On this basis, experiments were performed in HEK293T17 cells.
A plasmid expressing the DNMT1-3-targeting crRNA and plasmids (pcDNA3.1) expressing the FRB- or FKBP-fused half-domains were delivered into cells. Rapamycin was added at 200 nM, and 72 hours after transfection the samples were analyzed by targeted deep sequencing to determine whether genome editing had occurred. The results are shown in FIG. 31B. As shown in FIG. 31B, every Inducible-Split-Cpf1 variant carrying fused FRB or FKBP, from Inducible-Split-1 to Inducible-Split-4, tended to show suppressed genome editing in the absence of rapamycin and enhanced genome editing in its presence. In particular, Inducible-Split-1 and Inducible-Split-4 showed almost no genome editing without rapamycin, at the level of cells not treated with Inducible-Split-Cpf1. They edited with high efficiency only when rapamycin was present, and therefore best matched the original goal.
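Dosing cells at the 200 nM rapamycin used above is a C1·V1 = C2·V2 dilution. The sketch below shows the arithmetic. The 100 µM stock and the 2 mL culture volume are hypothetical values chosen for illustration, not conditions stated in the text.

```python
def stock_volume_ul(final_nM: float, final_volume_ml: float, stock_uM: float) -> float:
    """Volume of stock (µL) needed to reach `final_nM` in `final_volume_ml`: C1*V1 = C2*V2."""
    final_volume_ul = final_volume_ml * 1000.0  # mL -> µL
    stock_nM = stock_uM * 1000.0  # µM -> nM
    return final_nM * final_volume_ul / stock_nM


# 200 nM rapamycin (as in the text) in a hypothetical 2 mL culture from a hypothetical 100 µM stock.
print(stock_volume_ul(200, 2, 100))  # 4.0
```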
Inducible-Split-1 and Inducible-Split-4 were further tested on three targets in addition to DNMT1-3: the HBB target (AGTCCmGGGGATCTGTCCACT; SEQ ID NO: 40), the CCR5-8 target (GACACCGAAGCAGAGI TTTAGG; SEQ ID NO: 49), and the HPRT1-1 target (CTGACCTGCTGGATTACATCAAA; SEQ ID NO: 27). For every target, the rapamycin-induced genome editing efficiency of Inducible-Split-Cpf1 under rapamycin treatment was analyzed by targeted deep sequencing, and the results are shown in FIGS. 31C to 31F. As shown in FIGS. 31C to 31F, Inducible-Split-Cpf1 also worked significantly on these targets.
Based on the Split-Cpf1 results above, the expression cassettes were transferred into an AAV vector. The constructed AAV vector (original backbone vector: AAV-MCS expression vector (VPK-410, Cell Biolabs, Inc.)) contains a cassette expressing a half-domain of Split-Cpf1 (Split-3-AsCpf1) and a cassette expressing the AsCpf1 crRNA. Because wild-type AsCpf1 was divided into two fragments, the constructs were 2.1 kb (half-domain 1) and 3.8 kb (half-domain 2). Both are smaller than 4.7 kb, the known size limit for viral packaging (see FIG. 32A; L
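The packaging argument above is a size comparison against the 4.7 kb limit. The sketch below applies that check to the sizes given in the text: the two Split-3-AsCpf1 constructs and the 4675 bp LbCpf1 all-in-one region described in Example 17. The function name is illustrative. The check ignores other practical factors, such as ITR spacing and packaging efficiency near the limit.

```python
AAV_PACKAGING_LIMIT_KB = 4.7  # limit cited in the text


def fits_in_aav(size_kb: float, limit_kb: float = AAV_PACKAGING_LIMIT_KB) -> bool:
    """True if an insert of `size_kb` is at or below the packaging limit."""
    return size_kb <= limit_kb


constructs = {
    "Split-3-AsCpf1 half-domain 1 construct": 2.1,
    "Split-3-AsCpf1 half-domain 2 construct": 3.8,
    "LbCpf1 all-in-one expression region (4675 bp)": 4.675,
}
for name, size in constructs.items():
    status = "fits" if fits_in_aav(size) else "exceeds limit"
    print(f"{name}: {size} kb, {status}")
```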
With Split-Cpf1, additional sequences can be added without causing problems for viral packaging. Split-Cpf1 is therefore also expected to be expressible as a fusion with proteins that have specific functions.
To confirm that the constructed AAV-Split-3-Cpf1 vector works, it was first delivered into cells in plasmid form to check whether genome editing occurred. Gene editing efficiency was measured by T7E1 assay for AAV-Split-3-Cpf1 and for three controls: the AAV-Cpf1 vector (containing full-length AsCpf1), the p3-Split-3-Cpf1 vector (Split-3-Cpf1 cloned into the pcDNA3.1 vector (Addgene)), and the p3-Cpf1 vector (full-length AsCpf1 cloned into the p3 vector). The results are shown in FIG. 32B. As shown in FIG. 32B, consistent with the trend seen with the p3 vectors, the AAV-Split-Cpf1 vector edited the genome with efficiency close to that of the controls. The constructed viral vector is expected to be usable for actual AAV production and for in vivo genome editing experiments.
Example 17: Hif1-alpha protein knockout test using Cpf1
Hif1-alpha protein is a transcription factor. When the intracellular environment becomes hypoxic, it binds specifically to the gene encoding vascular endothelial growth factor-A (VEGF-A) and activates its transcription. In ocular diseases such as diabetic retinopathy and age-related macular degeneration, abnormal cellular hypoxia induces abnormal VEGF-A expression. Knocking out the VEGF-A-activating Hif1a transcription factor with LbCpf1 is therefore a possible route to treatments for ocular diseases. In this example, effective intraocular delivery of LbCpf1 and a crRNA targeting the Hif1a gene using adeno-associated virus was demonstrated, showing the potential for treating ocular diseases.
The Hif1a gene encodes the hypoxia-inducible factor 1 (Hif1)-alpha protein. For allelic knockout of this gene, crRNAs (for LbCpf1) were prepared that target 5'-RGEN target-3' sequences present in Hif1a exons.
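Choosing exon target sequences for Cpf1 starts from the PAM rule recited in claim 4: the target sequence is preceded on its 5' side by a TTTN (or TTN) PAM. The minimal sketch below scans both strands of a DNA string for such sites. The 23-nt protospacer length matches the target sequences listed in Example 16 but is only a default. The flanking sequence in the test is made up, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna: str, pam_prefix: str = "TTT", protospacer_len: int = 23):
    """Return (strand, pam_start, pam, protospacer) for every 5'-PAM-protospacer hit.

    PAM = pam_prefix + N: TTTN by default, or TTN with pam_prefix="TT".
    pam_start is the 0-based index on the strand that was scanned.
    """
    hits = []
    pam_len = len(pam_prefix) + 1
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for i in range(len(seq) - pam_len - protospacer_len + 1):
            pam = seq[i:i + pam_len]
            if pam.startswith(pam_prefix) and pam[-1] in "ACGT":
                hits.append((strand, i, pam, seq[i + pam_len:i + pam_len + protospacer_len]))
    return hits
```

Candidate sites found this way would still need off-target and efficiency screening; the patent reports indel frequencies (FIG. 37) rather than a selection algorithm.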
[Table 36] Target sequences of Cpf1 sgRNAs (single guide RNA; crRNA) usable for Hif1a gene knockout
Figure imgf000101_0001
The LbCpf1 crRNA for each target sequence was made by replacing the targeting sequence region (underlined) of SEQ ID NO: 37 in Table 4 with the sequence corresponding to the target sequence in Table 36, that is, the target sequence with T replaced by U. Two kinds of plasmid were delivered into 293T cells (ATCC) by transfection with Lipofectamine. The first was a pcDNA3.1 vector (Invitrogen) containing a DNA sequence encoding the LbCpf1 protein operably linked to the CMV promoter (SEQ ID NO: 64) (LbCpf1 plasmid). The second was a set of plasmids containing DNA encoding each crRNA for the Hif1a gene, including LB-TS6 in Table 36, inserted into the pUC19 vector (Addgene) (Lb-crRNA plasmids). Genomic DNA was then extracted from the 293T cells with the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's instructions. The target sequences within the Hif1a gene (Table 36) were amplified from the extracted genomic DNA by PCR.
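The crRNA construction described above amounts to a fixed 5' repeat followed by the target sequence with T replaced by U. SEQ ID NO: 37 is not reproduced in this section, so the sketch below assumes the commonly used LbCpf1 direct repeat (UAAUUUCUACUAAGUGUAGAU) as a stand-in for its non-targeting part. It enforces the 15-30 nt targeting length recited in claim 1. The DNMT1-3 target from Example 16 is used only as an example input string.

```python
# Assumption: commonly used LbCpf1 direct repeat (5'->3'), standing in for the
# non-targeting part of SEQ ID NO: 37, which is not reproduced here.
LB_DIRECT_REPEAT = "UAAUUUCUACUAAGUGUAGAU"


def build_crrna(target_dna: str, direct_repeat: str = LB_DIRECT_REPEAT) -> str:
    """Return direct repeat + targeting region, where targeting region = target with T->U."""
    target = "".join(target_dna.split()).upper()
    if not 15 <= len(target) <= 30:
        raise ValueError("targeting region must be 15-30 nt")
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    return direct_repeat + target.replace("T", "U")


print(build_crrna("CTGATGGTCCATGTCTGTTACTC"))
```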
The frequency of indels (insertions or deletions) introduced into the amplified PCR products was analyzed by deep sequencing, and the results are shown in FIG. 37.
As shown in FIG. 37, the LbCpf1 protein introduced into the cells acted together with the crRNA to induce indels in the Hif1a gene. For reference, no indels (0%) were observed when only the LbCpf1-encoding plasmid was transfected.
DNA encoding the crRNA for the Hif1a target sequence LB-TS6, which gave a high indel frequency in FIG. 37, was cloned into an AAV vector together with DNA encoding LbCpf1. The resulting recombinant AAV vector is an all-in-one system that expresses two molecules from one vector: LbCpf1 under the control of the elongation factor 1α-short promoter, and the crRNA under the control of the U6 promoter (FIG. 38, FIGS. 39A-39C, and SEQ ID NO: 80). FIGS. 39A-39C show the complete sequence of the recombinant AAV (SEQ ID NO: 80) continuously in the 5'-to-3' direction. The underlined and/or italicized regions are, in order from 5' to 3': the inverted terminal repeat (ITR, 5'), U6 promoter, LbCpf1 crRNA (LB-TS6; underlined and bold), elongation factor 1α-short promoter, LbCpf1 (bold italic), NLS, HA tag, bGH poly(A) signal, and ITR sequence (3'). Of these, the U6 promoter, LbCpf1 crRNA (LB-TS6), elongation factor 1α-short promoter, LbCpf1, NLS, HA tag, and bGH poly(A) signal regions total 4675 bp (FIG. 38).
The recombinant AAV vector was thus designed to express LbCpf1 and the crRNA within 4.7 kb, the AAV packaging size limit.
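Claim 5 below defines the crRNA by General Formula 1, a fixed core with optional or variable positions n1 to n7 followed by a 15-30 nt targeting region. That structure maps directly onto a regular expression. The sketch below is one illustrative reading of the formula. It assumes DNA input may use T in place of U. The test sequences use the commonly cited LbCpf1 and AsCpf1 direct repeats, which are an assumption here because the document's own repeat sequences are not reproduced in this section.

```python
import re
from typing import Optional

# General Formula 1 (claim 5): 5'-n1-n2-A-U-n3-U-C-U-A-C-U-n4-n5-n6-n7-G-U-A-G-A-U-(Ncpf1)q-3'
FORMULA_1 = re.compile(
    r"(?P<n1>[UAG]?)"      # n1: absent, U, A, or G
    r"(?P<n2>[AG])"        # n2: A or G
    r"AU"
    r"(?P<n3>[UAC])"       # n3: U, A, or C
    r"UCUACU"
    r"(?P<n4>[GCA]?)"      # n4: absent, G, C, or A
    r"(?P<n5>[AUCG]?)"     # n5: A, U, C, G, or absent
    r"(?P<n6>[UGC])"       # n6: U, G, or C
    r"(?P<n7>[UG])"        # n7: U or G
    r"GUAGAU"
    r"(?P<spacer>[ACGU]{15,30})"  # (Ncpf1)q with 15 <= q <= 30
)


def parse_crrna(crrna: str) -> Optional[dict]:
    """Return the variable positions and spacer if `crrna` fits General Formula 1, else None."""
    match = FORMULA_1.fullmatch(crrna.upper().replace("T", "U"))
    return match.groupdict() if match else None


print(parse_crrna("UAAUUUCUACUAAGUGUAGAU" + "CUGAUGGUCCAUGUCUGUUACUC"))
```

Absent positions come back as empty strings. The regex engine's backtracking resolves the optional n4 and n5, so a repeat such as the AsCpf1 one, where n5 is absent, still matches.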

Claims

[Claim 1]
A composition for genome editing, comprising:
a Cpf1 protein or DNA encoding the same; and
a crRNA comprising a nucleotide sequence hybridizable with a 15-nt to 30-nt nucleotide sequence (target sequence) of a target site of a gene, or DNA encoding the crRNA.
[Claim 2]
The composition for genome editing of claim 1, wherein the Cpf1 protein is derived from a microorganism of the genus Candidatus, the genus Lachnospira, the genus Butyrivibrio, Peregrinibacteria, the genus Acidaminococcus, the genus Porphyromonas, the genus Prevotella, the genus Francisella, Candidatus Methanoplasma, or the genus Eubacterium.
[Claim 3]
The composition for genome editing of claim 2, wherein the Cpf1 protein is derived from Parcubacteria bacterium (GWC2011_GWC2_44_17), Lachnospiraceae bacterium (MC2017), Butyrivibrio proteoclasticus, Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_K08D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, or Eubacterium eligens.
[Claim 4]
The composition for genome editing of claim 1, wherein the target sequence is linked at its 5' end to a protospacer-adjacent motif (PAM) of TTTN or TTN (where N is A, T, C, or G), or is additionally linked at its 3' end to a sequence reverse-complementary to the PAM sequence (NAAA or NAA, where N is A, T, C, or G).
[Claim 5]
The composition for genome editing of claim 4, wherein the crRNA (CRISPR RNA) is represented by the following General Formula 1:
5'-n1-n2-A-U-n3-U-C-U-A-C-U-n4-n5-n6-n7-G-U-A-G-A-U-(Ncpf1)q-3' (General Formula 1; SEQ ID NO: 60)
wherein, in General Formula 1,
n1 is absent or is U, A, or G; n2 is A or G; n3 is U, A, or C; n4 is absent or is G, C, or A; n5 is A, U, C, G, or absent; n6 is U, G, or C; and n7 is U or G;
Ncpf1 is a targeting sequence region comprising a nucleotide sequence hybridizable with the target sequence, and is determined according to the target site of the target gene; and
q is the number of nucleotides in the targeting sequence and is an integer from 15 to 30.
[Claim 6]
The composition for genome editing according to claim 5, wherein the crRNA further comprises one to three guanines (G) at its 5' end.
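General Formula 1 (claim 5) fixes most of the crRNA repeat and leaves a few positions variable or optional, and claim 6 allows one to three extra 5' guanines. That structure maps directly onto a regular expression, so a sketch like the one below can check whether a candidate crRNA fits the formula and pull out its targeting sequence. The function `match_formula_1` is hypothetical, and the example repeat used in the usage note is simply one sequence built from allowed choices at each position, not a sequence taken from the patent.

```python
import re
from typing import Optional

# Repeat part of General Formula 1, one character class per position.
# n1, n4 and n5 may be absent, so they carry "?".
_REPEAT = (
    "[UAG]?"   # n1: absent, U, A or G
    "[AG]"     # n2: A or G
    "AU"
    "[UAC]"    # n3: U, A or C
    "UCUACU"
    "[GCA]?"   # n4: absent, G, C or A
    "[AUCG]?"  # n5: absent, A, U, C or G
    "[UGC]"    # n6: U, G or C
    "[UG]"     # n7: U or G
    "GUAGAU"
)
# (Ncpf1)q: targeting sequence of q = 15..30 nucleotides.
_SPACER = "(?P<spacer>[ACGU]{15,30})"


def match_formula_1(seq: str, extra_g: bool = False) -> Optional[str]:
    """Return the targeting sequence if `seq` fits General Formula 1, else None.

    With extra_g=True, 1 to 3 additional 5' guanines (claim 6) are required
    in front of the repeat. DNA input (T) is converted to RNA (U) first.
    """
    rna = seq.upper().replace("T", "U")
    prefix = "G{1,3}" if extra_g else ""
    m = re.fullmatch(prefix + _REPEAT + _SPACER, rna)
    return m.group("spacer") if m else None
```

For instance, `match_formula_1("UAAUUUCUACUCUUGUAGAU" + "ACGU" * 5)` returns the 20-nt targeting part `ACGUACGUACGUACGUACGU` (n1=U, n2=A, n3=U, n4=C, n5 absent, n6=U, n7=U), while a 14-nt targeting part is rejected because q must be at least 15.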
[Claim 7]
The composition for genome editing according to claim 1, wherein the Cpf1 protein is derived from a microorganism selected from the microorganisms listed in the table below:
Parcubacteria bacterium GWC2011_GWC2_44_17 (PbCpf1)
Peregrinibacteria bacterium GW2011_GWA_33_10 (PeCpf1)
Acidaminococcus sp. BV3L6 (AsCpf1)
Porphyromonas macacae (PmCpf1)
Lachnospiraceae bacterium ND2006 (LbCpf1)
Porphyromonas crevioricanis (PcCpf1)
Prevotella disiens (PdCpf1)
Moraxella bovoculi 237 (MbCpf1)
Leptospira inadai (LiCpf1)
Lachnospiraceae bacterium MA2020 (Lb2Cpf1)
Francisella novicida U112 (FnCpf1)
Candidatus Methanoplasma termitum (CMtCpf1)
Eubacterium eligens (EeCpf1)
[Claim 8]
The composition for genome editing according to claim 1, wherein the Cpf1 protein comprises one or more of two or more fragments generated by cleaving the Cpf1 protein at one or more arbitrary positions.
[Claim 9]
The composition for genome editing according to claim 8, wherein the Cpf1 protein comprises two or more fragments, each of which is bound at its N-terminus or C-terminus to a binding protein, the binding proteins being different proteins that bind to different sites of the same bioactive substance.
[Claim 10]
The composition for genome editing according to claim 9, wherein the bioactive substance is rapamycin and the binding protein is selected from the group consisting of FRB protein and FKBP protein.
[Claim 11]
The composition for genome editing according to any one of claims 1 to 10, wherein the DNA encoding the crRNA is contained in a vector.
[Claim 12]
The composition for genome editing according to any one of claims 1 to 10, wherein the crRNA is a crRNA transcribed in vitro using a plasmid as a template.
[Claim 13]
The composition for genome editing according to any one of claims 1 to 10, wherein the crRNA does not contain a phosphate-phosphate bond at its 5' end.
[Claim 14]
The composition for genome editing according to any one of claims 1 to 10, wherein the Cpf1 protein or the DNA encoding it further comprises a nuclear localization signal (NLS) sequence or DNA encoding the same.
[Claim 15]
The composition for genome editing according to claim 11, comprising a recombinant vector comprising the DNA encoding the Cpf1 protein and a recombinant vector comprising the DNA encoding the crRNA.
[Claim 16]
The composition for genome editing according to claim 15, wherein the DNA encoding the Cpf1 protein and the DNA encoding the crRNA are contained together in a single recombinant vector or each in a separate vector.
[Claim 17]
The composition for genome editing according to any one of claims 1 to 10, for use in genome editing of eukaryotic cells or eukaryotic organisms.
[Claim 18]
The composition for genome editing according to claim 17, wherein the eukaryotic organism is a eukaryotic animal or a eukaryotic plant.
[Claim 19]
A genome editing method comprising introducing the composition for genome editing according to any one of claims 1 to 10 into a cell or an organism.
[Claim 20]
The genome editing method according to claim 19, wherein the composition for genome editing comprises the DNA encoding the crRNA in a form contained in a vector.
[Claim 21]
The genome editing method according to claim 19, wherein the crRNA contained in the composition for genome editing is a crRNA transcribed in vitro using a plasmid as a template.
[Claim 22]
The genome editing method according to claim 19, wherein the crRNA contained in the composition for genome editing does not contain a phosphate-phosphate bond at its 5' end.
[Claim 23]
The genome editing method according to claim 19, wherein the Cpf1 protein, or the DNA encoding it, contained in the composition for genome editing further comprises a nuclear localization signal (NLS) sequence or DNA encoding the same.
[Claim 24]
The genome editing method according to claim 19, wherein the step of introducing the composition for genome editing is performed by local injection, microinjection, electroporation, or lipofection.
[Claim 25]
The genome editing method according to claim 19, wherein the cell or organism is a eukaryotic cell or a eukaryotic organism.
[Claim 26]
The genome editing method according to claim 25, wherein the eukaryotic cell is a cell isolated from a eukaryotic animal or a eukaryotic plant.
[Claim 27]
The genome editing method according to claim 25, wherein the eukaryotic organism is a eukaryotic animal or a eukaryotic plant.
[Claim 28]
A method for producing a transformant, comprising introducing the composition for genome editing according to any one of claims 1 to 10 into a cell or an organism.
[Claim 29]
The method for producing a transformant according to claim 28, wherein the composition for genome editing comprises the DNA encoding the crRNA in a form contained in a vector.
[Claim 30]
The method for producing a transformant according to claim 28, wherein the crRNA contained in the composition for genome editing is a crRNA transcribed in vitro using a plasmid as a template.
[Claim 31]
The method for producing a transformant according to claim 28, wherein the crRNA contained in the composition for genome editing does not contain a phosphate-phosphate bond at its 5' end.
[Claim 32]
The method for producing a transformant according to claim 28, wherein the Cpf1 protein, or the DNA encoding it, contained in the composition for genome editing further comprises a nuclear localization signal (NLS) sequence or DNA encoding the same.
[Claim 33]
The method for producing a transformant according to claim 28, wherein the step of introducing the composition for genome editing is performed by local injection, microinjection, electroporation, or lipofection.
[Claim 34]
The method for producing a transformant according to claim 28, wherein gene cleavage, nucleotide insertion, nucleotide substitution, or nucleotide deletion has been induced in the transformant.
[Claim 35]
The method for producing a transformant according to claim 28, wherein the cell or organism is a eukaryotic cell or a eukaryotic organism.
[Claim 36]
The method for producing a transformant according to claim 35, wherein the eukaryotic organism is a eukaryotic animal or a eukaryotic plant.
[Claim 37]
A transformant produced by the method according to claim 28.
[Claim 38]
The transformant according to claim 37, wherein the transformant is a eukaryotic cell, a eukaryotic animal, or a eukaryotic plant in which gene cleavage, nucleotide insertion, or nucleotide deletion has been induced.
[Claim 39]
A method for delivering an RNA-guided endonuclease and a guide RNA to a eukaryotic cell or a eukaryotic organism, comprising introducing a mixture or a ribonucleoprotein comprising an RNA-guided endonuclease (RGEN) and a guide RNA into the eukaryotic cell or eukaryotic organism by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, or lipofection.
[Claim 40]
A pharmaceutical composition for preventing or treating an eye disease, comprising:
a Cpf1 protein or DNA encoding the same; and
a crRNA comprising a nucleotide sequence capable of hybridizing with a nucleotide sequence (target sequence) of 15 to 30 consecutive nucleotides (nt) in a target site of the Hif1-alpha gene, or DNA encoding the same.
[Claim 41]
The pharmaceutical composition for preventing or treating an eye disease according to claim 40, wherein the crRNA (CRISPR RNA) is represented by the following General Formula 1:
5'-n1-n2-A-U-n3-U-C-U-A-C-U-n4-n5-n6-n7-G-U-A-G-A-U-(Ncpf1)q-3' (General Formula 1; SEQ ID NO: 60)
In General Formula 1,
n1 is absent or is U, A, or G; n2 is A or G; n3 is U, A, or C; n4 is absent or is G, C, or A; n5 is absent or is A, U, C, or G; n6 is U, G, or C; n7 is U or G;
Ncpf1 is a targeting-sequence region comprising a nucleotide sequence capable of hybridizing with the target sequence, and is determined according to the target site of the target gene; and
q denotes the number of nucleotides in the targeting sequence and is an integer from 15 to 30.
[Claim 42]
The pharmaceutical composition for preventing or treating an eye disease according to claim 40, wherein the Cpf1 protein is derived from a microorganism selected from the microorganisms listed in the table below:
Parcubacteria bacterium GWC2011_GWC2_44_17 (PbCpf1)
Peregrinibacteria bacterium GW2011_GWA_33_10 (PeCpf1)
Acidaminococcus sp. BV3L6 (AsCpf1)
Porphyromonas macacae (PmCpf1)
Lachnospiraceae bacterium ND2006 (LbCpf1)
Porphyromonas crevioricanis (PcCpf1)
Prevotella disiens (PdCpf1)
Moraxella bovoculi 237 (MbCpf1)
Leptospira inadai (LiCpf1)
Lachnospiraceae bacterium MA2020 (Lb2Cpf1)
Francisella novicida U112 (FnCpf1)
Candidatus Methanoplasma termitum (CMtCpf1)
Eubacterium eligens (EeCpf1)
[Claim 43]
The pharmaceutical composition for preventing or treating an eye disease according to claim 40, wherein the pharmaceutical composition comprises a recombinant vector in which the DNA encoding the Cpf1 protein and the DNA encoding the crRNA are contained each in a separate vector or together in a single vector.
[Claim 44]
The pharmaceutical composition for preventing or treating an eye disease according to claim 43, wherein the vector is an adeno-associated virus (AAV).
[Claim 45]
The pharmaceutical composition for preventing or treating an eye disease according to any one of claims 40 to 44, wherein the crRNA comprises a nucleotide sequence capable of hybridizing with a sequence selected from the Hif1-alpha target sequences of SEQ ID NOs: 69 to 79.
[Claim 46]
The pharmaceutical composition for preventing or treating an eye disease according to any one of claims 40 to 44, wherein the eye disease is diabetic retinopathy or age-related macular degeneration.
[Claim 47]
A method for preventing or treating an eye disease, comprising administering, to a subject in need of prevention or treatment of the eye disease:
a Cpf1 protein or DNA encoding the same; and
a crRNA comprising a nucleotide sequence capable of hybridizing with a nucleotide sequence (target sequence) of 15 to 30 consecutive nt in a target site of the Hif1-alpha gene, or DNA encoding the same.
[Claim 48]
The method for preventing or treating an eye disease according to claim 47, wherein the crRNA (CRISPR RNA) is represented by the following General Formula 1:
5'-n1-n2-A-U-n3-U-C-U-A-C-U-n4-n5-n6-n7-G-U-A-G-A-U-(Ncpf1)q-3' (General Formula 1; SEQ ID NO: 60)
In General Formula 1,
n1 is absent or is U, A, or G; n2 is A or G; n3 is U, A, or C; n4 is absent or is G, C, or A; n5 is absent or is A, U, C, or G; n6 is U, G, or C; n7 is U or G;
Ncpf1 is a targeting-sequence region comprising a nucleotide sequence capable of hybridizing with the target sequence, and is determined according to the target site of the target gene; and
q denotes the number of nucleotides in the targeting sequence and is an integer from 15 to 30.
[Claim 49]
The method for preventing or treating an eye disease according to claim 47, wherein the Cpf1 protein is derived from a microorganism selected from the microorganisms listed in the table below:
Parcubacteria bacterium GWC2011_GWC2_44_17 (PbCpf1)
Peregrinibacteria bacterium GW2011_GWA_33_10 (PeCpf1)
Acidaminococcus sp. BV3L6 (AsCpf1)
Porphyromonas macacae (PmCpf1)
Lachnospiraceae bacterium ND2006 (LbCpf1)
Porphyromonas crevioricanis (PcCpf1)
Prevotella disiens (PdCpf1)
Moraxella bovoculi 237 (MbCpf1)
Leptospira inadai (LiCpf1)
Lachnospiraceae bacterium MA2020 (Lb2Cpf1)
Francisella novicida U112 (FnCpf1)
Candidatus Methanoplasma termitum (CMtCpf1)
Eubacterium eligens (EeCpf1)
[Claim 50]
The method for preventing or treating an eye disease according to claim 47, wherein the administering comprises administering a recombinant vector in which the DNA encoding the Cpf1 protein and the DNA encoding the crRNA are contained each in a separate vector or together in a single vector.
[Claim 51]
The method for preventing or treating an eye disease according to claim 50, wherein the vector is an adeno-associated virus (AAV).
[Claim 52]
The method for preventing or treating an eye disease according to any one of claims 47 to 51, wherein the crRNA comprises a nucleotide sequence capable of hybridizing with a sequence selected from the Hif1-alpha target sequences of SEQ ID NOs: 69 to 79.
[Claim 53]
The method for preventing or treating an eye disease according to any one of claims 47 to 51, wherein the eye disease is diabetic retinopathy or age-related macular degeneration.
[Claim 54]
The method for preventing or treating an eye disease according to any one of claims 47 to 51, wherein the administering is performed by retinal injection of a mixture or a ribonucleoprotein comprising a recombinant vector comprising the Cpf1 protein or DNA encoding the same, and a recombinant vector comprising a crRNA, or DNA encoding the same, that comprises a nucleotide sequence capable of hybridizing with a target sequence of 15 to 30 consecutive nt in a target site of the Hif1-alpha gene.
PCT/KR2016/014379 2015-12-08 2016-12-07 Genome editing composition comprising cpf1, and use thereof WO2017099494A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
KR10-2015-0174212 2015-12-08
KR20150174212 2015-12-08
US201662299043P 2016-02-24 2016-02-24
US62/299,043 2016-02-24
KR20160036381 2016-03-25
KR10-2016-0036381 2016-03-25

Publications (2)

Publication Number Publication Date
WO2017099494A1 (en) 2017-06-15
WO2017099494A8 WO2017099494A8 (en) 2017-08-10

Family

ID=59013788

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2016/014379 WO2017099494A1 (en) 2015-12-08 2016-12-07 Genome editing composition comprising cpf1, and use thereof

Country Status (2)

Country Link
KR (2) KR101897213B1 (en)
WO (1) WO2017099494A1 (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180078620A1 (en) * 2016-07-28 2018-03-22 Institute For Basic Science Method of Treating or Preventing Eye Disease Using Cas9 Protein and Guide RNA
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
US10011849B1 (en) 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
US10190137B2 (en) 2013-11-07 2019-01-29 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAS
CN109666684A (en) * 2018-12-25 2019-04-23 北京化工大学 A kind of CRISPR/Cas12a gene editing system and its application
WO2019173942A1 (en) * 2018-03-12 2019-09-19 Nanjing Bioheng Biotech Co., Ltd Engineered chimeric guide rna and uses thereof
US10428319B2 (en) 2017-06-09 2019-10-01 Editas Medicine, Inc. Engineered Cas9 nucleases
EP3502261A4 (en) * 2016-08-19 2020-07-15 Toolgen Incorporated Artificially engineered angiogenesis regulatory system
CN113373170A (en) * 2021-04-29 2021-09-10 江西农业大学 pFNCpfAb/pCrAb double-plasmid system and application thereof
CN113969281A (en) * 2021-12-24 2022-01-25 汕头大学 Modified CrRNA fragment and African swine fever virus kit
US11236313B2 (en) 2016-04-13 2022-02-01 Editas Medicine, Inc. Cas9 fusion molecules, gene editing systems, and methods of use thereof
US11390884B2 (en) 2015-05-11 2022-07-19 Editas Medicine, Inc. Optimized CRISPR/cas9 systems and methods for gene editing in stem cells
US11499151B2 (en) 2017-04-28 2022-11-15 Editas Medicine, Inc. Methods and systems for analyzing guide RNA molecules
US11597924B2 (en) 2016-03-25 2023-03-07 Editas Medicine, Inc. Genome editing systems comprising repair-modulating enzyme molecules and methods of their use
US11667911B2 (en) 2015-09-24 2023-06-06 Editas Medicine, Inc. Use of exonucleases to improve CRISPR/CAS-mediated genome editing
US11680268B2 (en) 2014-11-07 2023-06-20 Editas Medicine, Inc. Methods for improving CRISPR/Cas-mediated genome-editing
US11866726B2 (en) 2017-07-14 2024-01-09 Editas Medicine, Inc. Systems and methods for targeted integration and genome editing and detection thereof using integrated priming sites
US11911415B2 (en) 2015-06-09 2024-02-27 Editas Medicine, Inc. CRISPR/Cas-related methods and compositions for improving transplantation

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11667917B2 (en) 2017-11-21 2023-06-06 Genkore Co. Ltd. Composition for genome editing using CRISPR/CPF1 system and use thereof
JP7075170B2 (en) * 2018-01-23 2022-05-25 インスティチュート フォー ベーシック サイエンス Extended single guide RNA and its uses
KR102177174B1 (en) 2018-05-18 2020-11-10 울산대학교 산학협력단 A retinal degenerated animal model by PDE6B gene deletion and the preparation method thereof
EP3830301B1 (en) 2018-08-01 2024-05-22 Mammoth Biosciences, Inc. Programmable nuclease compositions and methods of use thereof
CN113227367B (en) 2018-08-09 2023-05-12 G+Flas生命科学公司 Compositions and methods for genome engineering with CAS12A protein
WO2020032711A1 (en) * 2018-08-09 2020-02-13 (주)지플러스 생명과학 Novel crispr-associated protein and use thereof
EP3931313A2 (en) 2019-01-04 2022-01-05 Mammoth Biosciences, Inc. Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection
KR102493904B1 (en) 2019-12-13 2023-01-31 한국생명공학연구원 Immunodeficient Animal Model Mutated IL2Rg Gene by EeCpf1 and Method for Producing the same
KR102551876B1 (en) 2019-12-18 2023-07-05 한국생명공학연구원 Composition for Genome Editing or Inhibiting Gene Expression comprising Cpf1 and Chimeric DNA-RNA Guide
KR102471698B1 (en) * 2020-03-24 2022-11-28 연세대학교 산학협력단 Novel guide RNA and method for diagnosing Coronavirus disease 2019
WO2021194172A1 (en) * 2020-03-24 2021-09-30 연세대학교 산학협력단 Novel guide rna and method for diagnosing coronavirus infection 2019 using same
KR20240020336A (en) 2022-08-04 2024-02-15 성균관대학교산학협력단 Protospacer Adjacent Motif-independent mutant Cas9 protein

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150101476A (en) * 2012-10-23 2015-09-03 주식회사 툴젠 Composition for cleaving a target DNA comprising a guide RNA specific for the target DNA and Cas protein-encoding nucleic acid or Cas protein, and use thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106061510B (en) 2013-12-12 2020-02-14 布罗德研究所有限公司 Delivery, use and therapeutic applications of CRISPR-CAS systems and compositions for genome editing
US9790490B2 (en) * 2015-06-18 2017-10-17 The Broad Institute Inc. CRISPR enzymes and systems

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHO ET AL.: "Analysis of Off-target Effects of CRISPR/Cas-derived RNA-guided Endonucleases and Nickases", GENOME RESEARCH, vol. 24, no. 1, 19 November 2013 (2013-11-19), pages 132 - 141, XP055227885 *
FAGERLUND ET AL.: "The Cpf1 CRISPR-Cas Protein Expands Genome-editing Tools", GENOME BIOLOGY, vol. 16, no. 251, 17 November 2015 (2015-11-17), pages 1 - 3, XP002757560 *
MAKAROVA ET AL.: "An Updated Evolutionary Classification of CRISPR-Cas Systems", NATURE REVIEWS MICROBIOLOGY, vol. 13, no. 11, 28 September 2015 (2015-09-28), pages 722 - 736, XP055271841 *
ZETSCHE ET AL.: "Cpfl is a Single RNA-guided Endonuclease of a Class 2 CRISPR-Cas System", CELL, vol. 163, no. 3, 25 September 2015 (2015-09-25), pages 759 - 771, XP055267511 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11390887B2 (en) 2013-11-07 2022-07-19 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAS
US10190137B2 (en) 2013-11-07 2019-01-29 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAS
US10640788B2 (en) 2013-11-07 2020-05-05 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAs
US11680268B2 (en) 2014-11-07 2023-06-20 Editas Medicine, Inc. Methods for improving CRISPR/Cas-mediated genome-editing
US11390884B2 (en) 2015-05-11 2022-07-19 Editas Medicine, Inc. Optimized CRISPR/cas9 systems and methods for gene editing in stem cells
US11911415B2 (en) 2015-06-09 2024-02-27 Editas Medicine, Inc. CRISPR/Cas-related methods and compositions for improving transplantation
US11667911B2 (en) 2015-09-24 2023-06-06 Editas Medicine, Inc. Use of exonucleases to improve CRISPR/CAS-mediated genome editing
US11597924B2 (en) 2016-03-25 2023-03-07 Editas Medicine, Inc. Genome editing systems comprising repair-modulating enzyme molecules and methods of their use
US11236313B2 (en) 2016-04-13 2022-02-01 Editas Medicine, Inc. Cas9 fusion molecules, gene editing systems, and methods of use thereof
US11123409B2 (en) * 2016-07-28 2021-09-21 Institute For Basic Science Method of treating or preventing eye disease using Cas9 protein and guide RNA
US20180078620A1 (en) * 2016-07-28 2018-03-22 Institute For Basic Science Method of Treating or Preventing Eye Disease Using Cas9 Protein and Guide RNA
EP3502261A4 (en) * 2016-08-19 2020-07-15 Toolgen Incorporated Artificially engineered angiogenesis regulatory system
US11499151B2 (en) 2017-04-28 2022-11-15 Editas Medicine, Inc. Methods and systems for analyzing guide RNA molecules
US11098297B2 (en) 2017-06-09 2021-08-24 Editas Medicine, Inc. Engineered Cas9 nucleases
US10428319B2 (en) 2017-06-09 2019-10-01 Editas Medicine, Inc. Engineered Cas9 nucleases
US10626416B2 (en) 2017-06-23 2020-04-21 Inscripta, Inc. Nucleic acid-guided nucleases
US10435714B2 (en) 2017-06-23 2019-10-08 Inscripta, Inc. Nucleic acid-guided nucleases
US10337028B2 (en) 2017-06-23 2019-07-02 Inscripta, Inc. Nucleic acid-guided nucleases
US10011849B1 (en) 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
US11697826B2 (en) 2017-06-23 2023-07-11 Inscripta, Inc. Nucleic acid-guided nucleases
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
US11866726B2 (en) 2017-07-14 2024-01-09 Editas Medicine, Inc. Systems and methods for targeted integration and genome editing and detection thereof using integrated priming sites
WO2019173942A1 (en) * 2018-03-12 2019-09-19 Nanjing Bioheng Biotech Co., Ltd Engineered chimeric guide rna and uses thereof
CN109666684A (en) * 2018-12-25 2019-04-23 北京化工大学 A kind of CRISPR/Cas12a gene editing system and its application
CN113373170A (en) * 2021-04-29 2021-09-10 江西农业大学 pFNCpfAb/pCrAb double-plasmid system and application thereof
CN113969281A (en) * 2021-12-24 2022-01-25 汕头大学 Modified CrRNA fragment and African swine fever virus kit

Also Published As

Publication number Publication date
KR101897213B1 (en) 2018-09-11
KR101958437B1 (en) 2019-03-15
KR20180028996A (en) 2018-03-19
WO2017099494A8 (en) 2017-08-10
KR20170068400A (en) 2017-06-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16873359

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/10/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16873359

Country of ref document: EP

Kind code of ref document: A1