KR101897213B1 - Composition for Genome Editing Comprising Cpf1 and Use thereof - Google Patents

Composition for Genome Editing Comprising Cpf1 and Use thereof

Info

Publication number
KR101897213B1
Authority
KR
South Korea
Prior art keywords
sequence
cpf1
crrna
target
protein
Prior art date
Application number
KR1020160167045A
Other languages
Korean (ko)
Other versions
KR20170068400A (en)
Inventor
김진수
허준호
김대식
김정은
김경미
김혜란
구태영
Original Assignee
기초과학연구원
Priority date
Filing date
Publication date
Application filed by 기초과학연구원
Publication of KR20170068400A
Application granted
Publication of KR101897213B1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression

Abstract

Provided are Cpf1, a genome editing method using the same, and a technique for producing a transformed eukaryotic organism.

Figure R1020160167045

Description

Composition for Genome Editing Comprising Cpf1 and Use Thereof

The present disclosure relates to Cpf1, a genome editing method using the same, and a technique for producing a transformed eukaryotic organism.

Producing genome-edited animals and plants has traditionally required much time and effort, and was difficult because many reagents had to be prepared separately for each target gene. Recently, the type II CRISPR-Cas9 system, which efficiently cleaves a target gene through the binding of Cas9 protein and guide RNA, has come into wide use. In addition to the most commonly used S. pyogenes Cas9, other Cas9 orthologs are also being used as gene scissors. This technique is faster and more efficient than conventional methods of producing mutants, and has the advantage that only the guide RNA needs to be prepared for each target gene.

The Cas9 system has many advantages, but it is limited in that the target DNA must have a sequence called the protospacer adjacent motif (PAM). S. pyogenes Cas9 and the other Cas9 proteins that have recently come into use all recognize a PAM at the 3' side of the target sequence. The widely used S. pyogenes Cas9 recognizes a 3' NGG PAM in the target gene region and cannot be used for targets that lack this sequence. Another characteristic of Cas9 systems such as S. pyogenes Cas9 is that a single protein has two nuclease domains and cleaves both strands of the target DNA to leave blunt ends. In this case, gene knock-out through insertions and deletions (indels) introduced by non-homologous end joining (NHEJ) is efficient, but knock-in using homologous recombination (HR) is inefficient.

Meanwhile, for genome editing using the CRISPR-Cas9 system, a method of delivering CRISPR-Cas9 ribonucleoprotein (RNP) into embryos by microinjection has been reported. Although this method can reliably deliver RNP to the embryo, it has the disadvantage that each embryo must be processed one by one under a microscope. In particular, processing a large number of embryos in sequence takes a long time, which is a technical obstacle because an embryo remains at the 1-cell stage only briefly.

Therefore, there is a need to develop an efficient genome editing technique that overcomes the limitations of the CRISPR-Cas9 system and can replace it, and to develop a technique for intracellular delivery of RNP that allows such editing to be performed effectively.

Provided herein is a technique for editing the genome of eukaryotic organisms such as animals and plants using the type V CRISPR-Cpf1 system, which uses Cpf1 and can overcome the drawbacks of the type II CRISPR-Cas9 system.

One example provides a complex comprising a Cpf1 protein or a DNA encoding it, and a guide RNA or DNA encoding the same.

Another example provides a composition for genome editing comprising a Cpf1 protein or a DNA encoding it, and a guide RNA or a DNA encoding it.

Another example provides a genome editing method using a Cpf1 protein or a DNA encoding it, and a guide RNA or a DNA encoding it.

The Cpf1 protein or the DNA encoding it, and the guide RNA or the DNA encoding it, contained in or used in the above complex, composition for genome editing, or genome editing method, may be used in the form of a mixture comprising a Cpf1 protein and a guide RNA, or a ribonucleoprotein (RNP), or in a form in which the DNA encoding the Cpf1 protein and the DNA encoding the guide RNA are contained in separate vectors or together in one vector.

The composition and method may be applied to eukaryotic organisms. The eukaryotic organisms may be selected from the group consisting of eukaryotic cells (e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryonic cells, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g., humans, primates, dogs, pigs, cows, sheep, goats, etc.), and eukaryotic plants (e.g., algae such as green algae, corn, soybean, wheat, rice, etc.).

Another example provides a method for producing a transformed organism by genetic modification using a Cpf1 protein or a DNA encoding it and a guide RNA or a DNA encoding the same.

Another example provides a transformed organism produced by the method for producing the transformed organism. The transformed organism may be selected from the group consisting of any eukaryotic cells (e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryonic cells, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g., mice, rats, etc.), and eukaryotic plants (e.g., algae such as green algae, corn, soybean, wheat, rice, etc.).

Another example provides a method of delivering an RNA-guided endonuclease (RGEN) or a DNA encoding it, and a guide RNA or a DNA encoding it, to an organism by direct injection into a target site, microinjection, electroporation, or lipofection.

One way to overcome the drawbacks of the type II CRISPR-Cas9 system is to use Cpf1, a type V CRISPR system protein.

Cpf1 is a type V CRISPR system protein that resembles Cas9, a type II CRISPR system protein, in that a single protein binds to crRNA and cleaves the target gene. In particular, because the Cpf1 protein functions with a single crRNA, there is no need, as with Cas9, to use crRNA together with trans-activating crRNA (tracrRNA) or to construct a single guide RNA (sgRNA) in which tracrRNA and crRNA are artificially combined. Unlike with Cas9, the PAM in the Cpf1 system is located at the 5' side of the target sequence, and the guide RNA that determines the target is shorter than that of Cas9. Owing to these features, Cpf1 can be used for genome editing at target sequences where Cas9 cannot be used, and preparing the guide RNA (crRNA) is relatively easy compared with Cas9. In addition, Cpf1 has the advantage that a 5' overhang (sticky end) rather than a blunt end is generated at the position where the target DNA is cleaved, so that more accurate and more varied gene editing can be performed.

Techniques are provided herein for editing a target genome more conveniently, accurately, and effectively using the Cpf1 system.

As used herein, the term "genome editing" may be used to mean alteration and/or restoration (modification) of gene function by deletion, insertion, substitution, etc. of one or more nucleic acid molecules (e.g., 1-100,000 bp, 1-10,000 bp, 1-1000 bp, 1-70 bp, 1-50 bp, 1-30 bp, 1-20 bp, etc.).

According to one embodiment, the target DNA can be cleaved at a desired position with the type V CRISPR-Cpf1 system using the Cpf1 protein. According to another embodiment, the type V CRISPR-Cpf1 system using the Cpf1 protein is capable of editing specific genes in a cell.

In addition, a technique is provided for delivering CRISPR-Cpf1 ribonucleoprotein (RNP) or a DNA encoding it into cells that overcomes the disadvantages of the existing microinjection method. As an example, there is provided a technique for introducing the ribonucleoprotein or the DNA encoding it into a large number of cells at one time by a method such as electroporation or lipofection; however, the genome editing technique using the Cpf1 system is not limited thereto.

The CRISPR-Cpf1 ribonucleoprotein may be introduced into a cell or an organism in the form of a recombinant vector comprising DNA encoding Cpf1 and DNA encoding a crRNA, or in the form of a mixture comprising Cpf1 protein and a crRNA or a ribonucleoprotein.

One example provides a composition for genome editing comprising a Cpf1 protein or a DNA encoding it and a guide RNA (CRISPR RNA; crRNA) or a DNA encoding it, or a ribonucleoprotein comprising the Cpf1 protein and the guide RNA.

Another example provides a method for genome editing of an organism, comprising delivering a ribonucleoprotein comprising a Cpf1 protein and a guide RNA (CRISPR RNA; crRNA) to the organism.

The Cpf1 protein or the DNA encoding it, and the guide RNA or the DNA encoding it, contained in or used in the above composition for genome editing or genome editing method, may be used in the form of a mixture comprising a Cpf1 protein and a guide RNA, or a ribonucleoprotein (RNP), or in a form in which the DNA encoding the Cpf1 protein and the DNA encoding the guide RNA are contained in separate vectors or together in one vector.

The composition and method may be applied to eukaryotic organisms. Such eukaryotic organisms may be selected from the group consisting of eukaryotic cells (e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryonic cells, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g., vertebrates or invertebrates, more specifically mammals including primates such as humans and monkeys, dogs, pigs, cows, sheep, goats, mice, and rats), and eukaryotic plants (e.g., algae such as green algae, and monocotyledonous or dicotyledonous plants such as corn, soybean, wheat, and rice).

Another example provides a method for preparing a transformed organism by genome editing using a Cpf1 protein. More specifically, the method for producing the transformed organism may include the step of delivering a Cpf1 protein or a DNA encoding it and a guide RNA (CRISPR RNA; crRNA) or a DNA encoding it to eukaryotic cells. If the transformed organism is a transformed eukaryotic animal or a transformed eukaryotic plant, the method may further comprise the step of culturing and/or differentiating the eukaryotic cell, either simultaneously with or subsequent to the delivering step.

Another example provides a transformed organism produced by the method for producing a transformed organism.

The transformed organism may be selected from the group consisting of any eukaryotic cells (e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryonic cells, stem cells, somatic cells, germ cells, etc.), eukaryotic animals (e.g., vertebrates or invertebrates, more specifically mammals including primates such as humans and monkeys, dogs, pigs, cows, sheep, etc.), and eukaryotic plants (e.g., algae such as green algae, corn, soybeans, wheat, rice, and the like).

In the genome editing method and the method for producing a transformed organism provided herein, the eukaryotic animal may exclude humans, and the eukaryotic cell may include cells isolated from a eukaryotic animal, including a human.

As used herein, the term "ribonucleoprotein" refers to a protein-ribonucleic acid complex comprising a Cpf1 protein, which is an RNA-guided endonuclease, and a guide RNA (crRNA).

The Cpf1 protein is an endonuclease of a new CRISPR system that is distinct from the CRISPR/Cas9 system; it is relatively small compared to Cas9, does not require tracrRNA, and can act with a single guide RNA. The Cpf1 protein also recognizes a thymine-rich PAM sequence such as 5'-TTN-3' or 5'-TTTN-3' (N is any nucleotide having a base of A, T, G, or C) and cuts both strands of the DNA to produce cohesive ends (cohesive double-strand break). The resulting cohesive ends can facilitate NHEJ-mediated transgene knock-in at the target location (or cleavage site).
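The PAM recognition described above can be illustrated with a short sketch. It only scans one strand of a DNA string for the 5'-TTN-3' or 5'-TTTN-3' motifs named in the text; the function name, the dictionary, and the example sequence are hypothetical and serve only as an illustration, not as the method used in the patent.

```python
import re

# PAM motifs named in the text: 5'-TTN-3' or 5'-TTTN-3' (N = A, T, G, or C).
PAM_PATTERNS = {"TTN": "TT[ATGC]", "TTTN": "TTT[ATGC]"}


def find_pam_sites(dna: str, pam: str = "TTTN") -> list[tuple[int, str]]:
    """Return (0-based start index, matched PAM) for each PAM on the given strand.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = re.compile(f"(?=({PAM_PATTERNS[pam]}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]


# Hypothetical sequence, for illustration only.
example = "GATTTACCGTTTTGCA"
print(find_pam_sites(example, "TTTN"))
```

Because the Cpf1 PAM lies 5' of the target, a candidate target sequence would start immediately after each reported PAM on the same strand. Scanning the reverse-complement strand is left out of this sketch.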

For example, the Cpf1 protein may be derived from a microorganism of the genus Candidatus, Lachnospira, Butyrivibrio, Peregrinibacteria, Acidaminococcus, Porphyromonas, Prevotella, Francisella, Candidatus Methanoplasma, or Eubacterium; for example, Parcubacteria bacterium (GWC2011_GWC2_44_17), Lachnospiraceae bacterium (MC2017), Butyrivibrio proteoclasticus, Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_KO8D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, Eubacterium eligens, and the like, but is not limited thereto. In one example, the Cpf1 protein may be derived from Parcubacteria bacterium (GWC2011_GWC2_44_17), Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, or Eubacterium eligens, but is not limited thereto.

Examples of such Cpf1 proteins are summarized in Table 1 below for the derived microorganisms:

Cpf1 protein | Derived microorganism | Genbank protein ID | NCBI protein GI from NR Database or local GI (for proteins originated from WGS database) | Contig ID in WGS database
PbCpf1 | Parcubacteria bacterium GWC2011_GWC2_44_17 | KKT48220.1 | 818703647 | LCIC01000001.1
PeCpf1 | Peregrinibacteria bacterium GW2011_GWA_33_10 | KKP36646.1 | 818249855 | LBOR01000010.1
AsCpf1 | Acidaminococcus sp. BV3L6 | WP_021736722.1 | 545612232 | AWUR01000016.1
PmCpf1 | Porphyromonas macacae | WP_018359861.1 | 517171043 | BAKQ01000001.1
LbCpf1 | Lachnospiraceae bacterium ND2006 | WP_035635841.1 | 737666241 | JNKS01000011.1
PcCpf1 | Porphyromonas crevioricanis | WP_036890108.1 | 739008549 | JQJC01000021.1
PdCpf1 | Prevotella disiens | WP_004356401.1 | 490490171 | AEDO01000031.1
MbCpf1 | Moraxella bovoculi 237 | KDN25524.1 | 639140168 | AOMT01000011.1
LiCpf1 | Leptospira inadai | WP_020988726.1 | 537834683 | AHMM02000017.1
Lb2Cpf1 | Lachnospiraceae bacterium MA2020 | WP_044919442.1 | 769142322 | JQKK01000008.1
FnCpf1 | Francisella novicida U112 | WP_003040289.1 | 489130501 | CP000439.1
CMtCpf1 | Candidatus Methanoplasma termitum | AIZ56868.1 | 851218172 | CP010070.1
EeCpf1 | Eubacterium eligens | WP_012739647.1 | 502240446 | CP001104.1

The Cpf1 protein may be isolated from a microorganism, or may be non-naturally occurring, produced by recombinant or synthetic methods. The Cpf1 protein may further include, but is not limited to, elements commonly used for delivery into the nucleus of eukaryotic cells (e.g., a nuclear localization signal (NLS), etc.). The Cpf1 protein may be used in the form of a purified protein, in the form of a DNA encoding it, or in the form of a recombinant vector containing the DNA.

The guide RNA can be appropriately selected according to the kind of Cpf1 protein to be complexed and / or the microorganism derived therefrom.

In one example, the crRNA used in the Cpf1 system can be represented by the following general formula 1:

5'-n1-n2-AU-n3-UCUACU-n4-n5-n6-n7-GUAGAU-(Ncpf1)q-3' (SEQ ID NO: 60).

In the general formula 1,

n1 is absent or is U, A, or G; n2 is A or G; n3 is U, A, or C; n4 is absent or is G, C, or A; n5 is A, U, C, or G; n6 is absent or is U, G, or C; and n7 is U or G.

Ncpf1 is a targeting sequence comprising a nucleotide sequence capable of hybridizing with a gene target site and is determined according to the target sequence of the target gene. q, the number of nucleotides in the targeting sequence, may be an integer of 15 to 30, 15 to 29, 15 to 28, 15 to 27, 15 to 26, 15 to 25, 15 to 24, 15 to 23, 15 to 22, 15 to 21, 15 to 20, 16 to 30, 16 to 29, 16 to 28, 16 to 27, 16 to 26, 16 to 25, 16 to 24, 16 to 23, 16 to 22, 16 to 21, 16 to 20, 17 to 30, 17 to 29, 17 to 28, 17 to 27, 17 to 26, 17 to 25, 17 to 24, 17 to 23, 17 to 22, 17 to 21, 17 to 20, 18 to 30, 18 to 29, 18 to 28, 18 to 27, 18 to 26, 18 to 25, 18 to 24, 18 to 23, 18 to 22, 18 to 21, or 18 to 20. The target sequence of the target gene (the sequence that hybridizes with the crRNA) may be a region of the target gene consisting of (for example, consecutive) nucleotides located adjacent to the 3' side of a PAM sequence (5'-TTN-3' or 5'-TTTN-3'; N is any nucleotide having a base of A, T, G, or C), with a length in any of the ranges given above for q.

In the crRNA, the five nucleotides from the 6th to the 10th nucleotide counted from the 5' end (5' terminal stem portion) and the five nucleotides from the 15th nucleotide (the 16th when n4 is present) to the 19th nucleotide (the 20th when n4 is present) (3' terminal stem portion) are complementary to each other in an antiparallel orientation and form a double-stranded structure (stem structure), and the 3 to 5 nucleotides between the 5' terminal stem portion and the 3' terminal stem portion may form a loop structure.
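The stem pairing can be checked directly from the fixed segments of general formula 1. The sketch below assumes the 5' stem is the first five nucleotides of the UCUACU segment and the 3' stem is the first five nucleotides of the GUAGAU segment, and considers only Watson-Crick pairs; the function names are hypothetical.

```python
# Fixed stem segments read from general formula 1, written 5'->3'.
FIVE_PRIME_STEM = "UCUAC"   # first five nucleotides of the UCUACU segment
THREE_PRIME_STEM = "GUAGA"  # first five nucleotides of the GUAGAU segment

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def forms_stem(five_prime: str, three_prime: str) -> bool:
    """True if the two segments pair perfectly in antiparallel orientation."""
    return reverse_complement_rna(five_prime) == three_prime


print(forms_stem(FIVE_PRIME_STEM, THREE_PRIME_STEM))
```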

The crRNA of the Cpf1 protein (for example, represented by the general formula 1) may further include 1 to 3 guanines (G) at the 5 'terminus.

In the present specification, a nucleotide sequence capable of hybridizing with a gene target site means a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity with the nucleotide sequence (target sequence) of the gene target site (hereinafter used in the same sense unless otherwise specified; sequence homology can be confirmed using conventional sequence comparison means (for example, BLAST)). For example, a crRNA hybridizable with the target sequence may have a sequence complementary to the corresponding sequence located on the strand opposite to the nucleic acid strand on which the target sequence is located (the target sequence being on the same strand as the PAM sequence). In other words, the crRNA may include, as its targeting sequence region, a sequence in which T is replaced with U in the target sequence represented as a DNA sequence.
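As a rough illustration of the percentage thresholds above, the sketch below counts position-by-position Watson-Crick pairs between a guide targeting sequence and the DNA strand it binds. It is a simplification, not the BLAST-based comparison mentioned in the text: it assumes equal-length, ungapped sequences, ignores G-U wobble pairing, and the function name and inputs are hypothetical.

```python
# RNA base expected to pair with each DNA base (Watson-Crick only).
DNA_TO_PAIRING_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna_5to3: str, bound_dna_3to5: str) -> float:
    """Percentage of positions at which the guide pairs with the bound DNA strand.

    The guide is written 5'->3' and the DNA strand it hybridizes with is
    written 3'->5', so that paired bases share the same index.
    """
    if len(guide_rna_5to3) != len(bound_dna_3to5):
        raise ValueError("sequences must have equal length")
    if not guide_rna_5to3:
        raise ValueError("sequences must not be empty")
    paired = sum(
        DNA_TO_PAIRING_RNA[dna_base] == rna_base
        for rna_base, dna_base in zip(guide_rna_5to3.upper(), bound_dna_3to5.upper())
    )
    return 100.0 * paired / len(guide_rna_5to3)


print(percent_complementarity("AUGC", "TACG"))
```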

In the present specification, a crRNA can be expressed as a target sequence, and in this case, the crRNA sequence can be interpreted as a sequence in which T is replaced with U in the target sequence.
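A minimal sketch of this convention follows: the targeting region of the crRNA is obtained by replacing T with U in the target DNA sequence and is appended to a 5' terminal region. The AsCpf1 5' terminal sequence is copied from Table 2 of this document; the 20-nt target is hypothetical, and the function names and the default 15 to 30 nt length check (the widest range given for q) are illustrative assumptions.

```python
def target_dna_to_rna(target_dna: str) -> str:
    """Write the target sequence as RNA by replacing T with U."""
    return target_dna.upper().replace("T", "U")


def build_crrna(five_prime_region: str, target_dna: str,
                q_min: int = 15, q_max: int = 30) -> str:
    """Join a 5' terminal region (gap marks removed) to the targeting sequence."""
    handle = five_prime_region.replace("-", "").replace(" ", "")
    if not q_min <= len(target_dna) <= q_max:
        raise ValueError(f"target length must be {q_min} to {q_max} nt")
    return handle + target_dna_to_rna(target_dna)


# 5' terminal region listed for AsCpf1 in Table 2; the target is hypothetical.
ASCPF1_5P_REGION = "UAAUUUCUACU-CUUGUAGAU"
crrna = build_crrna(ASCPF1_5P_REGION, "ACGTACGTACGTACGTACGT")
print(crrna)
```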

The nucleotide sequence (target sequence) of the gene target site may be linked at its 5' end to a PAM sequence, namely TTTN or TTN (N is A, T, C, or G) or a sequence having at least 50%, at least 66%, or at least 75% sequence homology therewith (for example, the PAM sequence may be linked directly to the 5' end of the target sequence (0 nt distance) or at a distance of 1 to 10 nt). In addition to the 5' terminal PAM sequence, the target sequence may be linked at its 3' end to a sequence complementary to the PAM sequence in the reverse orientation (an inverted PAM sequence: NAAA or NAA, or a sequence having at least 50%, at least 66%, or at least 75% sequence homology therewith; N is A, T, C, or G) (for example, the inverted PAM sequence may be linked directly to the 3' end of the target sequence (0 nt distance) or at a distance of 1 to 10 nt).

The 5 'terminal region sequence (excluding the targeting sequence region) of the crRNA sequence of the usable Cpf1 protein according to the Cpf1-derived microorganism is exemplified in Table 2:

Cpf1-derived microorganism | 5' terminal region sequence (5'-3') of the guide RNA (crRNA)
Parcubacteria bacterium GWC2011_GWC2_44_17 (PbCpf1) | AAAUUUCUACU-UUUGUAGAU
Peregrinibacteria bacterium GW2011_GWA_33_10 (PeCpf1) | GGAUUUCUACU-UUUGUAGAU
Acidaminococcus sp. BV3L6 (AsCpf1) | UAAUUUCUACU-CUUGUAGAU
Porphyromonas macacae (PmCpf1) | UAAUUUCUACU-AUUGUAGAU
Lachnospiraceae bacterium ND2006 (LbCpf1) | AAUUUCUACUAAGUGUAGAU
Porphyromonas crevioricanis (PcCpf1) | UAAUUUCUACU-AUUGUAGAU
Prevotella disiens (PdCpf1) | UAAUUUCUACU-UCGGUAGAU
Moraxella bovoculi 237 (MbCpf1) | AAUUUCUACUG-UUUGUAGAU
Leptospira inadai (LiCpf1) | GAAUUUCUACU-UUUGUAGAU
Lachnospiraceae bacterium MA2020 (Lb2Cpf1) | GAAUUUCUACU-AUUGUAGAU
Francisella novicida U112 (FnCpf1) | AAUUUCUACU-GUUGUAGAU
Candidatus Methanoplasma termitum (CMtCpf1) | GAAUCUCUACUCUUUGUAGAU
Eubacterium eligens (EeCpf1) | UAAUUUCUACU - UUGUAGAU

(-: meaning no nucleotide exists)
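The relationship between general formula 1 and the 5' terminal region sequences in Table 2 can be sketched as a regular expression. The pattern below encodes the fixed segments and the allowed values of n1 to n7 as given for general formula 1, with n5 taken here to be any of A, U, C, or G, which is an assumption consistent with the listed sequences. The names are hypothetical, and gap marks and spaces are stripped before matching.

```python
import re

# 5' terminal region of general formula 1 (everything before the targeting sequence).
GENERAL_FORMULA_1 = re.compile(
    r"^[UAG]?"   # n1: absent, or U, A, or G
    r"[AG]"      # n2: A or G
    r"AU"
    r"[UAC]"     # n3: U, A, or C
    r"UCUACU"
    r"[GCA]?"    # n4: absent, or G, C, or A
    r"[ACGU]"    # n5: assumed to be any of A, U, C, or G
    r"[UGC]?"    # n6: absent, or U, G, or C
    r"[UG]"      # n7: U or G
    r"GUAGAU$"
)

# Sequences copied from Table 2, keyed by the abbreviations used there.
TABLE_2 = {
    "PbCpf1": "AAAUUUCUACU-UUUGUAGAU",
    "PeCpf1": "GGAUUUCUACU-UUUGUAGAU",
    "AsCpf1": "UAAUUUCUACU-CUUGUAGAU",
    "PmCpf1": "UAAUUUCUACU-AUUGUAGAU",
    "LbCpf1": "AAUUUCUACUAAGUGUAGAU",
    "PcCpf1": "UAAUUUCUACU-AUUGUAGAU",
    "PdCpf1": "UAAUUUCUACU-UCGGUAGAU",
    "MbCpf1": "AAUUUCUACUG-UUUGUAGAU",
    "LiCpf1": "GAAUUUCUACU-UUUGUAGAU",
    "Lb2Cpf1": "GAAUUUCUACU-AUUGUAGAU",
    "FnCpf1": "AAUUUCUACU-GUUGUAGAU",
    "CMtCpf1": "GAAUCUCUACUCUUUGUAGAU",
    "EeCpf1": "UAAUUUCUACU - UUGUAGAU",
}


def matches_general_formula_1(seq: str) -> bool:
    """Check a 5' terminal region against the pattern, ignoring gap marks."""
    cleaned = seq.replace("-", "").replace(" ", "").upper()
    return GENERAL_FORMULA_1.match(cleaned) is not None


for name, seq in TABLE_2.items():
    print(name, matches_general_formula_1(seq))
```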

In one example, the crRNA may be a crRNA transcribed in vitro using a plasmid as a template.

In another example, the crRNA may be one that does not contain a phosphate-phosphate linkage (e.g., a diphosphate or a triphosphate) at the 5' terminus. Because such a crRNA lacks a phosphate-phosphate bond at the 5' terminus, its ability to induce an immune response and/or its cytotoxicity may be significantly reduced compared with a crRNA that contains the phosphate-phosphate bond. Such reduced cytotoxicity may mean that an innate immune response is not induced, and/or that a decrease in cell viability, inhibition of cell proliferation, and/or cell damage, lysis, and/or death is reduced and/or eliminated. For example, a guide RNA that does not include a phosphate-phosphate bond at the 5' terminus may contain a monophosphate group or an OH group at the 5' terminus, or may have any modified 5' terminal form of RNA that does not induce cytotoxicity in a eukaryotic cell or eukaryotic organism, as distinguished from a pathogen such as a virus or a bacterium (e.g., a 5' terminal form naturally or artificially modified for immunosuppression, stability enhancement, labeling, etc.). The crRNA may be prepared by in vitro transcription using a prokaryote-derived RNA polymerase such as T7 RNA polymerase, T3 RNA polymerase, or SP6 RNA polymerase, followed by removal of two or more of the three phosphate groups at the 5' terminus (e.g., removal of the triphosphate and/or diphosphate), or may be chemically synthesized so that no phosphate-phosphate bond (such as a diphosphate and/or triphosphate) is included at the 5' terminus. Removal of phosphate groups at the 5' terminus, such as removal of two or more phosphate groups (i.e., the triphosphate and/or diphosphate), may be accomplished by any conventional method that breaks the ester bonds between phosphate groups to release two or three phosphate groups from the RNA, for example by phosphatase treatment, but is not limited thereto. The phosphatase may be at least one selected from the group consisting of calf intestinal alkaline phosphatase (CIP), shrimp alkaline phosphatase (SAP), Antarctic phosphatase, and the like, but is not limited thereto, and may be selected from among all enzymes that release phosphate groups from RNA.

In one example, the Cpf1 protein and the crRNA used in the composition for genome editing, the genome editing method, the composition for producing a transformant, and the method for producing a transformant provided herein may be a purified Cpf1 protein and a crRNA that does not have a phosphate-phosphate bond at the 5' terminus (e.g., a chemically synthesized crRNA).

Meanwhile, because the gene encoding the Cpf1 protein is large, delivery of the Cpf1 protein into a cell or an organism using a vector (for example, a viral vector such as an adeno-associated virus (AAV) vector) is inefficient, which may be a barrier to applying Cpf1 technology. In particular, viral vectors such as AAV vectors have a packaging limit, and it is generally known that cloning a gene that exceeds this limit lowers both virus production efficiency and intracellular delivery efficiency.

To solve this problem, the Cpf1 protein or the DNA encoding it used herein may include one or more (e.g., two) of two or more (e.g., two) cleavage fragments produced by cleavage at one or more arbitrary positions. The two or more Cpf1 cleavage fragments may cover the full length of Cpf1 without overlap. The two or more cleavage fragments (DNA fragments) may be contained together in one vector or separately in two or more vectors, and delivered to a cell or an organism.

The cleavage site of the Cpf1 protein or of the DNA encoding it may be a site exposed on the exterior of the tertiary structure of the Cpf1 protein, or a site outside the domains having a defined function (for example, a domain-domain linker), or may be located in the DNA sequence encoding such a site.

For example, in the case of Cpf1 derived from Acidaminococcus sp. BV3L6 (AsCpf1), the cleavage site on the protein may be located, in the AsCpf1 amino acid sequence (Genbank Accession No. WP_021736722.1), between the 901st amino acid and the 902nd amino acid, between the 886th amino acid and the 887th amino acid, between the 399th amino acid and the 400th amino acid, or between the 526th amino acid and the 527th amino acid.

For example, based on the AsCpf1 amino acid sequence (1307 amino acids in length), the cleavage fragments may comprise:

1) a first protein fragment from the first amino acid to the 901st amino acid or a first DNA fragment encoding the same, and a second protein fragment from the 902nd amino acid to the 1307th amino acid, or a second DNA fragment encoding the same;

2) a first protein fragment from the first amino acid to the 886th amino acid or a first DNA fragment encoding the same, and a second protein fragment from the 887th amino acid to the 1307th amino acid, or a second DNA fragment encoding the same;

3) a first protein fragment from the first amino acid to the 399th amino acid or a first DNA fragment encoding the same, and a second protein fragment from the 400th amino acid to the 1307th amino acid, or a second DNA fragment encoding the same; or

4) a first protein fragment from the first amino acid to the 526th amino acid or a first DNA fragment encoding the same, and a second protein fragment from the 527th amino acid to the 1307th amino acid, or a second DNA fragment encoding the same.
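The four split options listed above can be tabulated with a short sketch. The protein length (1307 amino acids) and the split positions are taken from the text; the coding lengths are simply three nucleotides per amino acid and exclude any stop codon, promoter, or linker, which is a simplifying assumption. The function names are hypothetical.

```python
ASCPF1_LENGTH = 1307                 # amino acids, as stated in the text
SPLIT_SITES = [901, 886, 399, 526]   # last residue of the first fragment


def split_fragments(length: int, last_of_first: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return 1-based inclusive residue ranges of the two fragments."""
    if not 1 <= last_of_first < length:
        raise ValueError("split site must lie inside the protein")
    return (1, last_of_first), (last_of_first + 1, length)


def coding_length_nt(start: int, end: int) -> int:
    """Nucleotides needed to encode residues start..end (no stop codon)."""
    return 3 * (end - start + 1)


for site in SPLIT_SITES:
    first, second = split_fragments(ASCPF1_LENGTH, site)
    # The two ranges cover the full length without overlap.
    print(first, coding_length_nt(*first), second, coding_length_nt(*second))
```

Comparing each coding length with the packaging limit of a chosen vector would be the natural next step; no limit value is given in the text, so none is assumed here.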

Although the cleavage positions and cleavage fragments are exemplified with AsCpf1, they can also be applied to the corresponding positions in Cpf1 derived from another organism. The "corresponding position in Cpf1 derived from another organism" can be determined by comparing the AsCpf1 amino acid sequence or the DNA sequence encoding it with the amino acid sequence of the Cpf1 of the other organism or the DNA sequence encoding it, using conventional sequence comparison means (for example, BLAST, such as Position-Specific Iterative BLAST (PSI-BLAST), blast.ncbi.nlm.nih.gov/Blast.cgi), and is a matter that can be clearly understood by a person having ordinary skill in the art.

The cleavage fragments of the Cpf1 protein or of the gene encoding it may comprise two or more cleavage fragments, wherein each of the two or more cleavage fragments may be linked, at the N-terminus and/or C-terminus (in the case of a protein fragment) or at the 5' end and/or 3' end (in the case of a gene fragment), to a binding protein or to a nucleic acid molecule encoding a binding protein. The binding proteins may be different proteins that bind to different sites of the same bioactive substance. In one example, the bioactive substance is rapamycin, and the binding proteins may be selected from the group consisting of FRB protein and FKBP protein, but are not limited thereto.

When gene fragments (truncated gene fragments) encoding the two or more Cpf1 protein fragments are delivered through a recombinant vector, the two or more truncated gene fragments may be contained in separate vectors or included together in one vector.

In another example, the truncated gene fragments, whether contained in separate vectors or together in one vector, may each be linked to a crRNA-encoding DNA at the 5' end or 3' end (e.g., the 5' end) of the truncated gene fragment. In one example, the vector comprising the first DNA fragment may comprise, in the 5' to 3' direction, a promoter, a crRNA-encoding DNA, a promoter, and a first DNA fragment encoding a first protein fragment of the Cpf1 protein, and the vector comprising the second DNA fragment may comprise, in the 5' to 3' direction, a promoter, a crRNA-encoding DNA, a promoter, and a second DNA fragment encoding a second protein fragment of the Cpf1 protein (see FIG. 32(a)).

All the steps performed in the genome editing method and the method for producing a transformed organism provided herein may be performed intracellularly or extracellularly, or in vivo or ex vivo.

Another example of the present invention provides a technique to overcome the technical obstacles that arise when a ribonucleoprotein is delivered into cells (for example, embryos) by microinjection: each embryo must be handled one by one while being observed under a microscope, and, when a large number of embryos are processed in sequence, the embryos remain at the one-cell stage only for a short time.

In addition, it was confirmed that gene editing (cleavage, insertion, deletion, etc.) efficiency is improved when crRNA is used in a form contained in a vector such as a plasmid (recombinant vector) rather than in the form of a PCR product (amplicon) (see FIGS. 14A and 14B), and a technique of using crRNA in vector (cloned) form is provided. The vector may comprise a crRNA expression cassette comprising a crRNA-encoding DNA and/or a transcriptional control sequence, such as a promoter, operably linked thereto.

Specifically, another example provides a method of delivering a mixture comprising an RNA-guided endonuclease (RGEN) and a guide RNA, a ribonucleoprotein (RNP), DNA encoding them, or a recombinant vector containing the DNA into cells (e.g., eukaryotic cells) or organisms (e.g., eukaryotic organisms) by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection (e.g., using Lipofectamine), or the like.

Other examples provide a genome editing method and a method for preparing a transformed organism, for cells (e.g., eukaryotic cells) or organisms (e.g., eukaryotic organisms), using a mixture comprising an RNA-guided endonuclease (RGEN) and a guide RNA, a ribonucleoprotein (RNP), DNA encoding them, or a recombinant vector comprising the DNA, in which the mixture, ribonucleoprotein, DNA, or recombinant vector is delivered to the cells or organisms by local injection, microinjection, electroporation, lipofection, or the like. When the cell to be delivered to is a plant cell, the plant cell may be mixed with a surfactant such as polyethylene glycol (PEG) and with the mixture containing the endonuclease and the guide RNA, or the ribonucleoprotein, for delivery.

Other examples provide a method of delivering a mixture comprising an RNA-guided endonuclease (RGEN) and a guide RNA, a ribonucleoprotein (RNP), DNA encoding them, or a recombinant vector comprising the DNA into cells (e.g., eukaryotic cells) or organisms (e.g., eukaryotic organisms), in which the mixture, ribonucleoprotein, DNA, or recombinant vector is delivered by local injection (e.g., direct injection into a lesion or target site), microinjection, electroporation, lipofection, or the like. When the cell to be delivered to is a plant cell, the plant cell may be mixed with a surfactant such as polyethylene glycol (PEG) and with the mixture containing the endonuclease and the guide RNA, or the ribonucleoprotein, for delivery.

In the above-described methods, delivery of the mixture or ribonucleoprotein comprising the endonuclease (for example, Cpf1, Cas9, etc.) and the guide RNA (for example, crRNA, sgRNA, etc.), or of the DNA encoding them, can be accomplished by directly delivering a mixture of the endonuclease, expressed and purified in vitro, and the guide RNA, or the ribonucleoprotein in which they are complexed, to the eukaryotic cell and/or eukaryotic organism by microinjection, electroporation, lipofection, or the like. In another example, the delivery can be accomplished by delivering recombinant vectors that contain an expression cassette comprising the DNA encoding the endonuclease and an expression cassette comprising the DNA encoding the guide RNA, either in separate vectors or together in one vector, to the eukaryotic cell and/or eukaryotic organism by local injection (for example, direct injection into a lesion or target site), microinjection, electroporation, lipofection, or the like.

The expression cassette may comprise, in addition to the endonuclease coding DNA or the crRNA coding DNA, a conventional gene expression control sequence in operable linkage with the endonuclease coding DNA or the crRNA coding DNA. The term " operatively linked " means a functional association (cis) between a gene expression control sequence and another nucleotide sequence.

The gene expression control sequence may be at least one selected from the group consisting of a replication origin, a promoter, and a transcription termination sequence (terminator).

The promoter described herein is one of the transcription control sequences that regulate the transcription initiation of a specific gene and is usually a polynucleotide fragment of about 100 to about 2500 bp in length. In one embodiment, any promoter capable of regulating transcription initiation in a cell, such as a eukaryotic cell, for example a plant cell or an animal cell (e.g., a mammalian cell such as a human or mouse cell), can be used. For example, the promoter may be selected from the group consisting of a CMV promoter (e.g., a human or mouse CMV immediate-early promoter), a U6 promoter, an EF1-alpha (elongation factor 1-alpha) promoter, an EF1-alpha short (EFS) promoter, an SV40 promoter, an adenovirus promoter (major late promoter), a pLλ promoter, a trp promoter, a lac promoter, a tac promoter, a T7 promoter, a vaccinia virus 7.5K promoter, an HSV tk promoter, an SV40E1 promoter, a respiratory syncytial virus (RSV) promoter, a metallothionein promoter, a β-actin promoter, a ubiquitin C promoter, a human IL-2 gene promoter, a human lymphotoxin gene promoter, and a human GM-CSF (granulocyte-macrophage colony stimulating factor) gene promoter, but is not limited thereto. In one example, the promoter may be selected from the group consisting of a CMV immediate-early promoter, a U6 promoter, an EF1-alpha (elongation factor 1-alpha) promoter, and an EF1-alpha short (EFS) promoter. The transcription termination sequence may be a polyadenylation sequence (pA) or the like. The replication origin may be an f1 replication origin, SV40 replication origin, pMB1 replication origin, adeno replication origin, AAV replication origin, BBV replication origin, or the like.

The vectors described herein may be selected from the group consisting of plasmid vectors, cosmid vectors, and viral vectors such as bacteriophage vectors, adenovirus vectors, retroviral vectors, and adeno-associated viral vectors. The vector that can be used as the recombinant vector may be constructed from a plasmid (for example, pcDNA series, pSC101, pGV1106, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, pIJ61, pLAFR1, etc.), a phage (e.g., λgt4λB, λ-Charon, λΔz1, M13, etc.), or a viral vector (e.g., an adeno-associated virus (AAV) vector, etc.), but is not limited thereto.

The eukaryotic organism may be selected from eukaryotic cells (e.g., fungi such as yeast, and cells derived from eukaryotic animals and/or eukaryotic plants, such as embryonic cells, stem cells, somatic cells, etc.), eukaryotic animals (e.g., vertebrates or invertebrates; specific examples include mammals, including primates such as humans and monkeys, dogs, pigs, cows, sheep, goats, mice, rats, and the like), and eukaryotic plants (e.g., algae such as green algae, and monocotyledonous or dicotyledonous plants such as corn), but is not limited thereto.

The RNA-guided endonuclease may be present in the form of a mixture or a complex with a single guide RNA (sgRNA) or a dual guide RNA, and refers to an endonuclease that recognizes and cleaves a gene target region by means of the targeting sequence contained in the guide RNA. It may be an endonuclease of a type II and/or type V CRISPR/Cas system, typified by Cas9 protein (CRISPR associated protein 9) and Cpf1 protein (CRISPR from Prevotella and Francisella 1).

The Cas9 protein may be derived from Streptococcus sp., such as Streptococcus pyogenes (SwissProt accession number Q99ZW2), but is not limited thereto.

The Cpf1 protein is as described above (see, for example, Table 1).

Endonucleases such as Cas9 protein and Cpf1 may be isolated from microorganisms or may be non-naturally occurring, produced by recombinant or synthetic methods. The endonuclease may further comprise, but is not limited to, a nuclear localization signal (NLS) (e.g., PKKKRKV, KRPAATKKAGQAKKKK, or a nucleic acid molecule encoding the same) at its N-terminus or C-terminus (or at the 5' end or 3' end of the nucleic acid molecule encoding it). The endonuclease protein can be used in the form of a purified protein, or in the form of a DNA encoding it or a recombinant vector containing the DNA.

The guide RNA may be appropriately selected depending on the kind of the endonuclease to be complexed and/or the microorganism from which it is derived. For example, the guide RNA may be at least one selected from the group consisting of CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and single guide RNA (sgRNA), and may be a crRNA alone, a complex of crRNA and tracrRNA, or an sgRNA.

For example, a complex containing Cas9 protein (Cas9 system) requires two guide RNAs, namely a CRISPR RNA (crRNA) having a nucleotide sequence capable of hybridizing with the target region of the gene and an additional trans-activating crRNA (tracrRNA). The crRNA and tracrRNA are used either in the form of a crRNA:tracrRNA complex in which the two are bound to each other, or in the form of a single guide RNA (sgRNA) in which they are linked through a linker. The complex containing the Cpf1 protein (Cpf1 system) requires only one guide RNA for genome editing, namely a crRNA having a nucleotide sequence capable of hybridizing with the target site of the gene.

The specific sequence of the guide RNA can be appropriately selected according to the type of Cas9 protein or Cpf1 protein (derived microorganism), and it is easily understood by those skilled in the art.

In one example, the crRNA used in the Cas9 system, including the Cas9 protein from Streptococcus pyogenes , can be represented by the following general formula 2:

5'-(N_cas9)_l-GUUUUAGAGCUA-(X_cas9)_m-3' (Formula 2; SEQ ID NO: 61)

In the general formula 2,

N_cas9 is a targeting sequence region including a nucleotide sequence capable of hybridizing with a gene target site, and is determined according to the target region of the target gene; l is an integer of 18 to 22 representing the number of nucleotides contained in the targeting sequence region, for example 20;

The site containing 12 consecutive nucleotides (GUUUUAGAGCUA) located in the 3' direction of the targeting sequence region is an essential part of the crRNA,

X_cas9 is a site containing m nucleotides located on the 3' side of the crRNA (i.e., located adjacent to the 3' direction of the essential part of the crRNA), m may be an integer of 8 to 12, and the m nucleotides may be the same or different and may be independently selected from the group consisting of A, U, C, and G.

In one example, X_cas9 may include, but is not limited to, UGCUGUUUUG.
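As an illustration only, general formula 2 can be sketched as a small string-assembly helper. The function name `build_cas9_crrna` and the 20-nt spacer in the usage line are hypothetical and are not taken from this specification; the 12-nt essential part, the example 3' segment (UGCUGUUUUG), and the length ranges are those stated above.

```python
# Minimal sketch of general formula 2 (S. pyogenes Cas9 crRNA):
#   5'-(N_cas9)_l-GUUUUAGAGCUA-(X_cas9)_m-3'
# The helper name and the example spacer are assumptions for illustration.

ESSENTIAL_CRRNA = "GUUUUAGAGCUA"  # 12-nt essential part given in formula 2
EXAMPLE_X_CAS9 = "UGCUGUUUUG"     # example X_cas9 given in the text


def build_cas9_crrna(spacer: str, x_cas9: str = EXAMPLE_X_CAS9) -> str:
    """Assemble a crRNA string laid out as in general formula 2."""
    # Accept DNA or RNA letters and normalise to RNA.
    spacer = spacer.upper().replace("T", "U")
    x_cas9 = x_cas9.upper().replace("T", "U")
    if not 18 <= len(spacer) <= 22:
        raise ValueError("targeting sequence region must be 18 to 22 nt (l)")
    if not 8 <= len(x_cas9) <= 12:
        raise ValueError("X_cas9 must be 8 to 12 nt (m)")
    if set(spacer + x_cas9) - set("AUCG"):
        raise ValueError("only A, U, C and G are allowed")
    return spacer + ESSENTIAL_CRRNA + x_cas9


# Hypothetical 20-nt targeting sequence, written as DNA for convenience.
crrna = build_cas9_crrna("GACGTTACGTAGCTAGCTAG")
print(crrna)
```

The helper only checks lengths and alphabet; choosing a targeting sequence that actually hybridizes with a given target site is outside the scope of this sketch.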

Also, the tracrRNA used in the Cas9 system, including Cas9 protein from Streptococcus pyogenes , can be represented by the following general formula 3:

5'-(Y_cas9)_p-UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC-3' (Formula 3; SEQ ID NO: 62)

In the general formula 3,

The site containing 60 nucleotides (UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC) is an essential part of the tracrRNA,

Y_cas9 is a site containing p nucleotides located adjacent to the 5' end of the essential part of the tracrRNA, p may be an integer of 6 to 20, such as an integer of 8 to 19, and the p nucleotides may be the same or different and may be independently selected from the group consisting of A, U, C and G.

In addition, the sgRNA used in the Cas9 system including the Cas9 protein derived from Streptococcus pyogenes may form a hairpin structure in which a crRNA region, containing the targeting sequence region and the essential region of the Cas9 crRNA, and a tracrRNA region, containing the essential region of the Cas9 tracrRNA, are connected through a nucleotide linker. More specifically, the sgRNA may have a hairpin structure in which the crRNA region and the tracrRNA region are bound to each other to form a double-stranded RNA region, and the 3' end of the crRNA region and the 5' end of the tracrRNA region are connected through a nucleotide linker.

The targeting sequence region and essential region of the crRNA and the essential region of the tracrRNA are as described above. The nucleotide linker contained in the sgRNA may contain 3 to 5 nucleotides, for example, 4 nucleotides. The nucleotides may be the same or different from each other and are each independently selected from the group consisting of A, U, C and G. In one example, the linker may be, but is not limited to, a nucleotide sequence of 'GAAA'.
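The linkage described above (crRNA region, nucleotide linker, tracrRNA region) can be sketched as follows. This is a hedged illustration rather than the method of this specification: the helper name `build_sgrna` and the example spacer are hypothetical, while the essential parts of the crRNA and tracrRNA and the 'GAAA' linker are copied from general formulas 2 and 3 and the linker example above.

```python
# Minimal sketch: sgRNA = targeting region + crRNA essential part
#                         + nucleotide linker + tracrRNA essential part.
# Helper name and example spacer are assumptions for illustration.

CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # essential part of the crRNA (formula 2)
TRACR_ESSENTIAL = (               # 60-nt essential part of the tracrRNA (formula 3)
    "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)


def build_sgrna(spacer: str, linker: str = "GAAA") -> str:
    """Join the crRNA region and the tracrRNA region through a linker."""
    spacer = spacer.upper().replace("T", "U")
    linker = linker.upper().replace("T", "U")
    if not 18 <= len(spacer) <= 22:
        raise ValueError("targeting sequence region must be 18 to 22 nt")
    if not 3 <= len(linker) <= 5:
        raise ValueError("linker must be 3 to 5 nt")
    if set(spacer + linker) - set("AUCG"):
        raise ValueError("only A, U, C and G are allowed")
    # The 3' end of the crRNA region is connected to the 5' end of the tracrRNA region.
    return spacer + CRRNA_ESSENTIAL + linker + TRACR_ESSENTIAL


sgrna = build_sgrna("GACGTTACGTAGCTAGCTAG")  # hypothetical spacer
print(len(sgrna), sgrna)
```

With a 20-nt targeting region and the 4-nt linker, the assembled string is 96 nt long (20 + 12 + 4 + 60).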

For example, the sgRNA may be represented by the following general formula 4:

5'-(N_cas9)_m-GUUUCAGUUGCU-(linker)-AUGCUCUGUAAUCAUUUAAAAGUAUUUUGAGAGGACCUCUGUUUGACACGUCUGAAUAACUAAAAA-3' (Formula 4; SEQ ID NO: 63)

In the general formula 4,

N_cas9 is a targeting sequence region containing a nucleotide sequence capable of hybridizing with a gene target site and is determined according to the target region of the target gene; m is an integer of 16 to 24 representing the number of nucleotides contained in the targeting sequence region, and may be, for example, an integer of 18 to 22;

The linker may comprise from 3 to 5, for example 4, nucleotides,

The nucleotides included in the targeting sequence region and in the linker may be the same or different from each other and may be independently selected from the group consisting of A, U, C and G; the linker may be, for example, 'GAAA'.

The crRNA of the Cas9 protein (for example, represented by the general formula 2) or the sgRNA (for example, represented by the general formula 4) may further comprise 1 to 3 guanines (G) at the 5' terminus (i.e., at the 5' terminus of the targeting sequence region of the crRNA).

The tracrRNA or sgRNA of the Cas9 protein may further comprise a termination site comprising 5 to 7 uracil (U) at the 3 'end of the essential part (60 nt) of the tracrRNA.

In another example, in a Cpf1 system containing a Cpf1 protein, the crRNA used herein is as described above (see Formulas 1 and 2).

In another example, there is provided a use of a Cpf1 protein and a crRNA targeting the Hif1-alpha gene for the treatment of ocular disease.

Hypoxia-inducible factor 1-alpha (HIF1-alpha) is a subunit of hypoxia-inducible factor 1 (HIF-1), a heterodimeric transcription factor, and is encoded by the HIF1A gene. The Hif1-alpha may be derived from a mammal, such as human Hif1-alpha, and may be represented by NCBI accession no. NP_001230013.1, NP_001521.1, NP_851397.1, and the like. The HIF1A gene may be derived from a mammal, such as the human HIF1A gene, and may be represented by NCBI accession no. NM_181054.1, NM_001243084.1, NM_001530.1, and the like, but is not limited thereto.

Specifically, one example provides a pharmaceutical composition for preventing or treating ocular diseases, comprising:

a Cpf1 protein or a DNA encoding the same, and

a crRNA comprising a nucleotide sequence capable of hybridizing with a consecutive nucleotide sequence (target sequence) of 15 nt to 30 nt at the target site of the Hif1-alpha gene, or a DNA encoding the same.
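To make the "consecutive target sequence of 15 nt to 30 nt" concrete, the following sketch lists candidate Cpf1 target sequences in a DNA string. It assumes a 5'-TTTN-3' PAM located immediately 5' of the target sequence on the same strand, which is consistent with the TTTC and TTTG PAMs cited in the figure descriptions of this document but is otherwise an assumption of this example. The function names, the default length of 23 nt, and the toy input are hypothetical; no actual Hif1-alpha sequence is used.

```python
# Minimal sketch: enumerate candidate Cpf1 target sequences (15 to 30 nt)
# that directly follow an assumed 5'-TTTN-3' PAM, on both strands.
# Names, default length, and the toy sequence are illustrative assumptions.

def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cpf1_targets(dna: str, target_len: int = 23):
    """Return (strand, PAM start index on that strand, PAM, target) tuples."""
    if not 15 <= target_len <= 30:
        raise ValueError("target sequence must be 15 to 30 nt")
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # The PAM (4 nt) and the full target must fit inside the sequence.
        for i in range(len(seq) - 4 - target_len + 1):
            pam = seq[i:i + 4]
            if pam.startswith("TTT") and pam[3] in "ACGT":
                hits.append((strand, i, pam, seq[i + 4:i + 4 + target_len]))
    return hits


toy = "AATTTCACGTACGTACGTACGTACGTGG"  # made-up sequence, not a real gene
hits = find_cpf1_targets(toy, target_len=20)
print(hits)
```

For the minus strand, the reported index refers to the reverse-complemented string; mapping it back to forward-strand coordinates is left out to keep the sketch short.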

Another example provides a method for preventing or treating an ocular disease, comprising administering, to a subject in need of prevention or treatment of the ocular disease:

a Cpf1 protein or a DNA encoding the same, and

a crRNA comprising a nucleotide sequence capable of hybridizing with a consecutive nucleotide sequence (target sequence) of 15 nt to 30 nt at the target site of the Hif1-alpha gene, or a DNA encoding the same.

The Cpf1 and crRNA are as described above.

In the pharmaceutical composition and the prophylactic or therapeutic method of the present invention, a recombinant vector containing DNA encoding the Cpf1 protein and DNA encoding the crRNA may be contained or administered in separate vectors or in a single vector.

As the above-mentioned vector, a vector of the above-mentioned kind can be used. For example, adeno-associated virus (AAV) can be used.

The crRNA may comprise a nucleotide sequence capable of hybridizing with a sequence selected from the target sequences of Hif1-a gene of SEQ ID NO: 69 to SEQ ID NO: 79.

The ocular disease may be diabetic retinopathy or age-related (senile) macular degeneration (AMD).

The Cpf1 protein or a recombinant vector comprising a DNA encoding the Cpf1 protein, and the crRNA comprising a nucleotide sequence capable of hybridizing with a consecutive target sequence of 15 nt to 30 nt at a target site of the Hif1-alpha gene or a recombinant vector comprising a DNA encoding the crRNA, or the ribonucleoprotein, may be administered intravenously or by local administration to the lesion, such as by retinal injection (e.g., subretinal injection or intravitreal injection).

The subject may be a mammal such as a human, a mouse, or the like.

The present invention makes it possible to perform genome editing more effectively in eukaryotic cells (e.g., mammalian cells such as human and mouse cells, and eukaryotic plant cells) using the Cpf1 system, and to produce knock-out or knock-in transformed cells and/or transgenic animals/plants. In addition, when a ribonucleoprotein containing an RNA-guided endonuclease and a guide RNA is delivered to a eukaryotic organism, employing electroporation rather than microinjection allows the ribonucleoprotein to be delivered more efficiently.

FIG. 1 schematically shows the process of delivering an RNP containing recombinant AsCpf1 and crRNA into mouse blastocysts by microinjection.
FIG. 2 shows the result of confirming base sequence mutations in blastocysts by T7E1 assay.
FIG. 3 shows the result of targeted deep sequencing after genome editing with Cpf1 RNP, confirming that mutations occur specifically at the nucleotide sequence position where Cpf1 is expected to cleave the genome.
FIGS. 4 to 6 show the results of analyzing non-specific nucleotide sequence mutations in mice genome-edited with Cpf1 RNP,
FIG. 4 shows the result of purifying gDNA from the tails of mice prepared using Cpf1 RNP and confirming the base sequence mutation at the specific position by T7E1 assay,
FIG. 5 shows the result of targeted deep sequencing of the mutated nucleotide sequence,
FIG. 6 shows the result of whole-genome sequencing of tail gDNA, confirming that there is no sequence mutation at non-specific positions.
FIGS. 7 to 10 relate to genome editing in mouse embryos by delivering SpCas9 and AsCpf1 RNPs by electroporation,
FIG. 7 schematically shows the process of complexing SpCas9/AsCpf1 with sgRNA/crRNA and delivering the complexes to multiple mouse embryos by electroporation,
FIG. 8 shows the result of confirming, by T7E1 assay, the nucleotide sequence mutations caused by SpCas9 RNP electroporation,
FIG. 9 shows the result of targeted deep sequencing analysis of the nucleotide sequence mutations caused by SpCas9 RNP electroporation,
FIG. 10 shows the result of targeted deep sequencing analysis of the nucleotide sequence mutations caused by AsCpf1 RNP electroporation.
FIG. 11 is a schematic diagram showing a method of editing the homologous FAD2 genes in soybean protoplasts using AsCpf1 and LbCpf1 recombinant proteins.
FIGS. 12 and 13 show the results of analyzing nucleotide sequence mutations of the FAD2 genes,
FIG. 12 shows the genome editing efficiency obtained using AsCpf1 and LbCpf1,
FIG. 13 shows the result of detecting specific base sequence mutations by targeted deep sequencing.
FIGS. 14A and 14B show the results of comparing cell genome editing efficiency using plasmid U6-crRNA and PCR product U6-crRNA,
FIG. 14a is an electrophoresis image showing the results of comparing the cell genome editing efficiencies of plasmid U6-crRNA and PCR product U6-crRNA by T7E1 assay,
FIG. 14b is a graph showing the results of quantitative analysis of the cell genome editing efficiency by targeted deep sequencing.
FIGS. 15a and 15b show recombinant Cpf1 protein purification and in vitro cleavage assay results confirming its activity,
FIG. 15a shows the results of SDS-PAGE of AsCpf1 and LbCpf1 expressed in and purified from bacteria,
FIG. 15b shows the result of cleaving target DNA using purified recombinant Cpf1 protein and in vitro transcribed (T7) or synthetic crRNA, followed by electrophoresis on a TBE-agarose gel.
FIGS. 16a to 16c show the results of cell genome editing with RNPs consisting of recombinant Cpf1 and crRNA,
FIG. 16a is an electrophoresis image of a T7E1 assay of cell genome editing by delivery of RNPs consisting of As-/Lb-Cpf1 and crRNA,
FIG. 16b is a graph showing the cell genome editing efficiency of Cpf1 RNP measured by targeted deep sequencing,
FIG. 16c is an electrophoresis image comparing, by T7E1 assay, the cell genome editing efficiency of synthetic crRNA with that of crRNA prepared by in vitro transcription.
FIGS. 17a to 17c show the results of in vitro cleavage of the cell genome with Cpf1 and crRNA and of Digenome-seq,
FIG. 17a is a schematic diagram of qPCR and Digenome-seq following in vitro cleavage of the cell genome using Cpf1 protein and crRNA,
FIG. 17b is a graph showing the result of quantifying by qPCR the genomic DNA remaining at the target position after cleavage with Lb-/As-Cpf1 protein (3 nM-300 nM) and crRNA (9 nM-900 nM),
FIG. 17c shows an IGV comparison of the sequence reads near the target position obtained by whole-genome sequencing of the cell genome before and after in vitro cleavage.
FIGS. 18a and 18b show Digenome-seq results obtained using Cpf1 and crRNA,
FIG. 18a shows the sequences and genomic locations of the off-target candidates detected by Digenome-seq,
FIG. 18b shows the conserved sequence of the off-target candidate positions as a sequence logo.
FIG. 19A is an electrophoresis image showing the results of comparing the cell genome editing efficiency of plasmid crRNA and PCR product crRNA by T7E1 assay.
FIG. 19B is a graph showing the indel frequencies (%) measured by targeted deep sequencing using crRNA for each of four Cpf1 orthologs (error bars indicate s.e.m.).
FIG. 19c is a graph showing the mutation frequencies (indel frequencies (%)) induced by LbCpf1, AsCpf1, and SpCas9, respectively, at 10 endogenous target sites in HEK293T cells.
FIGS. 20a to 20c show the indel frequencies (%) measured by targeted deep sequencing in HEK293T cells when using on-target crRNAs and crRNAs with one or two mismatched nucleotides, demonstrating the specificity of Cpf1,
FIG. 20a is a graph showing the results for DNMT1-3,
FIG. 20b is a graph showing the results for DNMT1-4,
FIG. 20c is a graph showing the results for AAVS1 (error bars indicate s.e.m.).
FIGS. 21a to 21f show the results of measuring the genome-wide target specificity of Cpf1 and Cas9 nucleases by Digenome-seq,
FIGS. 21a and 21b are genome-wide circular plots showing DNA cleavage scores obtained by whole-genome sequencing and Digenome-seq analysis. The original (intact) genomic DNA is shown in red, genomic DNA cleaved with LbCpf1 in green, genomic DNA cleaved with AsCpf1 in blue, and genomic DNA cleaved with SpCas9 in yellow. The asterisk indicates one false-positive site found in the original genomic DNA, the arrow indicates the on-target site, and the sequence logos were obtained with WebLogo using the DNA sequences at the in vitro cleavage sites identified by Digenome-seq,
FIG. 21c is a graph showing the fractions of homologous sites captured by Digenome-seq (left Y-axis; squares represent the results for AsCpf1 and triangles represent the results for LbCpf1) and the number of homologous sites that differ from the 8 Cpf1 on-target sites by up to 6 nucleotides (error bars indicate s.e.m.),
FIG. 21d is a graph showing the off-target sites identified in human cells by targeted deep sequencing. The DNA sequences of the on-target and off-target regions are also shown (the bold text is the PAM sequence and the mismatched nucleotides are shown in lowercase letters),
FIG. 21e is a graph showing the targeted mutagenesis (indel frequency (%)) obtained at AsCpf1 off-target sites using crRNAs redesigned to hybridize to the off-target sites,
FIG. 21f is a graph showing the Cpf1 off-target effect when plasmids encoding Cpf1 and crRNA are used and when RNP complexes of Cpf1 and crRNA are used. The specificity ratio is the ratio between the on-target indel frequency and the off-target indel frequency, and the fold difference (RNP/plasmid) between the specificity ratio obtained with the RNP and that obtained with the plasmid is shown.
FIGS. 22a to 22f show sequence logos of Cpf1-mediated Digenome-captured sites, with the sequence logos obtained using AsCpf1 at the top and those obtained using LbCpf1 at the bottom.
Figure 23 shows the Sequence logos of the Digenome-captured site.
FIGS. 24A to 24F are graphs showing the indel frequencies at the Digenome-captured sites in HEK293T17 cells. The dark bars are the results obtained in HEK293T17 cells transfected with the LbCpf1 plasmid and the light bars are the results obtained in HEK293T17 cells transfected with the AsCpf1 plasmid.
FIG. 25 is a graph showing the indel frequencies at the on-target and off-target sites when crRNAs truncated at the 3' end (tru-crRNAs) and full-length crRNAs were used (error bars represent mean ± s.e.m.).
FIGS. 26a to 26e show that Cpf1 orthologs exhibit different overhang patterns and mutation signatures,
FIG. 26a shows representative Integrative Genomics Viewer (IGV) images of the overhang patterns at the DNMT1-3 and DNMT1-4 target sites,
FIG. 26b is a graph showing the number of mutant sequence reads by deletion/insertion size in base pairs,
FIG. 26c shows mutant sequences derived from Cpf1 or Cas9 target sites. For each nuclease, the first line is the original target sequence and the lines from the second onward are sequences carrying mutations. In the first line, the PAM sequence (Cpf1: TTTC) is shown in bold and the target sequence hybridizing with the crRNA/sgRNA is underlined; in the sequences from the second line onward, underlined sequences indicate microhomology sequences. The numbers on the right indicate the number of deleted (denoted by '-') or inserted (denoted by lowercase letters) nucleotides,
FIGS. 26d and 26e show the mutation signatures induced by LbCpf1, AsCpf1 and SpCas9; FIG. 26d shows whether the mutant sequences are deletions or insertions, and FIG. 26e is a graph showing the fractions of in-frame indels versus out-of-frame indels (data represent mean ± s.e.m. (n = 10 target sites)).
FIGS. 27A and 27B show the mutation signatures induced by LbCpf1, AsCpf1, and SpCas9,
FIG. 27a is a graph showing the number of mutant sequence reads by deletion/insertion (indel) size in base pairs. The mutation signatures were measured by targeted deep sequencing in HEK293T cells transfected with the LbCpf1, AsCpf1, or SpCas9 plasmid, respectively.
FIG. 27b shows mutant sequences derived from the EMX1-2 target site (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 42). For each nuclease, the first line is the original target sequence and the lines from the second onward are sequences carrying mutations. In the first line, the PAM sequence (Cpf1: TTTG) is shown in bold and the target sequence hybridizing with the crRNA/sgRNA is underlined; in the sequences from the second line onward, underlined sequences indicate microhomology sequences. The numbers on the right indicate the number of deleted (denoted by '-') or inserted (denoted by lowercase letters) nucleotides.
FIG. 28 schematically shows the Digenome-sequencing process.
FIGS. 29A and 29B show the split positions of the Cpf1 protein and the structure of recombinant vectors expressing the split Cpf1 protein,
FIG. 29a shows information on the wild-type Acidaminococcus sp. Cpf1 (AsCpf1) protein and four kinds of Split-Cpf1,
FIG. 29b schematically shows recombinant vectors expressing each half-domain of Split-Cpf1.
FIGS. 30A to 30C show the results of genome editing using Split-Cpf1 and a crRNA expression vector,
FIG. 30a is an agarose gel image showing the result of Split-Cpf1 genome editing at the DNMT1-3 target as analyzed by T7E1 assay. The asterisk indicates the position of the DNA fragments cleaved by the T7E1 enzyme,
FIG. 30b is a graph comparing the genome editing efficiency according to the split position, quantified by targeted deep sequencing,
FIG. 30c is a graph comparing the Split-Cpf1 genome editing efficiency according to the target position, quantified by targeted deep sequencing.
FIGS. 31A to 31E show the results of analyzing inducible genome editing efficiency obtained by controlling the binding of each half-domain of Split-Cpf1,
FIG. 31a schematically shows recombinant vector constructs expressing each half-domain of Inducible-Split-Cpf1,
FIG. 31b shows the result of confirming, by targeted deep sequencing, the genome editing efficiency at the DNMT1-3 target using Split-Cpf1 and Inducible-Split-Cpf1 with or without rapamycin treatment,
FIGS. 31c to 31f show the results of targeted deep sequencing analysis of the inducible genome editing efficiency of Inducible-Split-Cpf1 according to the target position.
FIGS. 32A and 32B relate to the construction of viral vectors expressing each half-domain of Split-Cpf1,
FIG. 32a schematically shows AAV viral vector constructs expressing each half-domain of Split-Cpf1 (Split-3-AsCpf1),
FIG. 32b shows the result of confirming, by T7E1 assay, the genome editing efficiency at the DNMT1-3 target using the AAV-Split-Cpf1 vector.
Figure 33 shows the nucleotide sequence of the pU6-As-crRNA plasmid, with the underlined portion corresponding to the AsCpf1 crRNA.
Fig. 34 shows the nucleotide sequence of the pU6-Lb-crRNA plasmid. The underlined portion corresponds to the LbCpf1 crRNA.
Figure 35 shows the nucleotide sequence of the U6-As-crRNA-amplicon, and the underlined portion corresponds to the AsCpf1 crRNA.
Figure 36 shows the nucleotide sequence of the U6-Lb-crRNA-amplicon, and the underlined portion corresponds to the LbCpf1 crRNA.
FIG. 37 is a graph showing the indel frequencies (%), analyzed by deep sequencing, obtained by delivering the LbCpf1 protein and crRNAs hybridizable with target sequences of the Hif1-a gene to 293T cells via AAV vector.
FIG. 38 is a schematic diagram exemplarily showing a recombinant AAV vector (all-in-one AAV vector) containing, in one vector, a DNA encoding the LbCpf1 protein and a DNA encoding the crRNA targeting Lb-TS6 of Hif1-a.
FIGS. 39A to 39C show, in sequence from 5' to 3', the nucleotide sequence of a recombinant AAV vector containing, in one vector, a DNA encoding the LbCpf1 protein and a DNA encoding the crRNA targeting Lb-TS6 of Hif1-a.
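Several of the figures described above report indel frequencies (%) from targeted deep sequencing and the split between in-frame and out-of-frame indels. The sketch below shows one common way such numbers are computed; it is an assumption of this example, not a description of the analysis pipeline used for the figures. The function names, the read counts, and the list of indel sizes are made up for illustration.

```python
# Minimal sketch of two summary statistics named in the figure descriptions.
# All names and numbers here are illustrative assumptions.
from collections import Counter


def indel_frequency(indel_reads: int, total_reads: int) -> float:
    """Percentage of sequencing reads that carry an insertion or deletion."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * indel_reads / total_reads


def classify_frame(indel_size_bp: int) -> str:
    """Deletions are negative sizes, insertions positive; zero is not an indel."""
    if indel_size_bp == 0:
        raise ValueError("size 0 is not an indel")
    # A net length change divisible by 3 preserves the reading frame.
    return "in-frame" if abs(indel_size_bp) % 3 == 0 else "out-of-frame"


sizes = [-3, -7, 1, -12, 2, -1]            # made-up indel sizes in bp
frame_counts = Counter(classify_frame(s) for s in sizes)
freq = indel_frequency(indel_reads=150, total_reads=1000)
print(freq, dict(frame_counts))
```

Out-of-frame indels in a coding sequence shift the reading frame, which is why the in-frame versus out-of-frame split is of interest when the goal is a gene knock-out.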

Hereinafter, the present invention will be described in more detail with reference to Examples. It is to be understood by those skilled in the art that these embodiments are only for describing the present invention in more detail and that the scope of the present invention is not limited by these embodiments.

Example 1: Production and purification of recombinant Cpf1 protein

Plasmids (pMAL-c5x, New England Biolabs; and pDEST-hisMBP) carrying an E. coli codon-optimized Cpf1-coding nucleic acid for each of AsCpf1 and LbCpf1 (AsCpf1-coding nucleic acid: SEQ ID NO: 46) together with a nuclear localization signal (NLS)-(linker)-HA tag sequence for protein expression and purification (amino acid sequence: (KRPAATKKAGQAKKKK)-(GS)-(YPYDVPDYA)(YPYDVPDYAYPYDVPDYA); DNA sequence: AAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGGGATCCTACCCATACGATGTTCCAGATTACGCTTATCCCTACGACGTGCCTGATTATGCATACCCATATGATGTCCCCGACTATGCC) were introduced into bacteria (Rosetta; EMD Millipore), which were cultured at 18 °C for 24 hours to express the AsCpf1 and LbCpf1 proteins. 10 ml of the Rosetta cells containing the Cpf1 plasmids, cultured for 24 hours, was inoculated into 2 L of Luria broth (LB) growth medium supplemented with 50 mg/ml carbenicillin. The cells were cultured at 37 °C until the OD600 reached 0.6, cooled to 16 °C, and then induced with 0.5 mM IPTG (isopropyl beta-D-1-thiogalactopyranoside) for 14-18 hours. The cells were then harvested and frozen at -80 °C until protein purification.
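As a consistency check, the DNA sequence given above for the NLS-(linker)-HA tag can be translated with the standard genetic code and compared with the stated amino acid sequence. The sketch below is self-contained; the function name `translate` and the way the DNA is split into commented pieces are choices of this example, while the nucleotides themselves are copied from the paragraph above.

```python
# Minimal sketch: translate the NLS-(linker)-HA tag DNA with the standard
# genetic code and compare it with the amino acid sequence in the text.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


TAG_DNA = (
    "AAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG"  # NLS
    "GGATCC"                                            # GS linker
    "TACCCATACGATGTTCCAGATTACGCT"                       # HA tag, copy 1
    "TATCCCTACGACGTGCCTGATTATGCA"                       # HA tag, copy 2
    "TACCCATATGATGTCCCCGACTATGCC"                       # HA tag, copy 3
)

tag_protein = translate(TAG_DNA)
print(tag_protein)
```

The translation reads KRPAATKKAGQAKKKK, then GS, then three copies of YPYDVPDYA, which agrees with the amino acid sequence stated for the tag.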

The prepared cell pellet was resuspended in lysis buffer (50 mM HEPES pH 7, 200 mM NaCl, 5 mM MgCl2, 1 mM DTT, 10 mM imidazole, pH 7.4) supplemented with lysozyme (Sigma) and protease inhibitor (Roche complete, EDTA-free) and lysed by sonication. The obtained cell lysate was centrifuged at 16,000 g for 30 minutes and passed through a syringe filter (0.22 micron). The cleared lysate was applied to a nickel column (Ni-NTA agarose, Qiagen), washed with 2 M salt, and eluted with 250 mM imidazole. The eluted protein solution was buffer-exchanged and concentrated using lysis buffer containing neither magnesium nor imidazole. The purified Cpf1 protein was analyzed by SDS-PAGE and used in the following examples. In the following examples, when human cells were used, a plasmid encoding the human codon-optimized Cpf1 protein, obtained from Addgene, was used instead of the plasmid encoding the E. coli codon-optimized Cpf1 protein.

The obtained SDS-PAGE results are shown in Fig. 15A.

Example 2: Cell culture and transfection

HEK293T cells were maintained in DMEM medium supplemented with 10% (v/v) FBS (fetal bovine serum) and 1% (v/v) antibiotics. For Cpf1-mediated genome editing, HEK293T cells were seeded in 24-well plates at 70-80% confluency and then transfected with a Cpf1 expression plasmid (500 ng) and a crRNA plasmid (500 ng) using Lipofectamine 2000 (Invitrogen). 72 hours after transfection, genomic DNA was isolated using a DNeasy Blood & Tissue Kit (Qiagen).

Example 3: Preparation of RNP and Digenome (digested genome) (in vitro cleavage of genomic DNA)

Genomic DNA was purified from HeLa cells (ATCC) using a DNeasy Tissue kit (Qiagen). Cpf1 protein (40 μg) and crRNA (2.7 μg each) were preincubated at room temperature for 10 minutes to form a ribonucleoprotein (RNP) complex. The purified genomic DNA (8 μg) was incubated with the RNP complex in a reaction buffer (100 mM NaCl, 50 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA, pH 7.9) at 37 °C for 8 hours. The digested genomic DNA thus obtained was treated with RNase A (50 μg/mL) to degrade the crRNA and purified again using a DNeasy Tissue kit (Qiagen).

Example 4: Whole genome and digested genome (Digenome) sequencing analysis

Whole genome sequencing (WGS) was performed on genomic DNA digested with Cas9 or Cpf1. WGS was performed at a sequencing depth of 30X to 40X using an Illumina HiSeq X Ten sequencer (Macrogen, South Korea). Using the WGS data, a DNA cleavage score was calculated for each nucleotide position across the entire genome. The cleavage score at position i of a chromosome was calculated by the following equation (see Fig. 28):

[Equation image: Figure 112016120627256-pat00001]

[Symbol image: Figure 112016120627256-pat00002]: number of forward sequence reads starting at position i

[Symbol image: Figure 112016120627256-pat00003]: number of reverse sequence reads starting at position i

[Symbol image: Figure 112016120627256-pat00004]: sequencing depth at position i

The above formula assumes that Cas9 can produce 1-nt to 2-nt overhangs at the 5' or 3' end in addition to blunt ends, and that Cpf1 produces 5' overhangs of 1 nt to 5 nt. In vitro cleavage sites whose DNA cleavage score, calculated by the above formula, was at or above the cutoff value of 2.5 were identified computationally.
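The three per-position quantities defined above (forward read starts, reverse read starts, and sequencing depth) can be tallied from aligned reads as sketched below. This is an illustrative sketch only and does not reproduce the cleavage score formula itself, which is given in the equation image. The read representation (1-based inclusive `start`, `end`, and `strand`), the convention that a reverse read "starts" at its rightmost coordinate, and all function names are assumptions made for this example.

```python
from collections import Counter


def read_start_profile(reads):
    """Tally forward starts, reverse starts, and depth per position.

    reads: iterable of (start, end, strand) with 1-based inclusive
    coordinates and strand "+" or "-" (hypothetical representation).
    """
    forward_starts = Counter()
    reverse_starts = Counter()
    depth = Counter()
    for start, end, strand in reads:
        if strand == "+":
            forward_starts[start] += 1
        else:
            # Assumption: the 5' end of a reverse-strand read is its
            # rightmost reference coordinate.
            reverse_starts[end] += 1
        for pos in range(start, end + 1):
            depth[pos] += 1
    return forward_starts, reverse_starts, depth


def sites_above_cutoff(scores, cutoff=2.5):
    """Keep positions whose cleavage score is at or above the cutoff."""
    return {pos: score for pos, score in scores.items() if score >= cutoff}


if __name__ == "__main__":
    demo_reads = [(100, 104, "+"), (100, 103, "+"), (95, 99, "-"), (96, 99, "-")]
    fwd, rev, dep = read_start_profile(demo_reads)
    print(fwd[100], rev[99], dep[99], dep[100])
```

Many reads sharing the same start position on both strands around one site is the vertical alignment pattern that the cleavage score is designed to detect; `sites_above_cutoff` only applies the 2.5 threshold to scores computed elsewhere.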

Example 5: Comparison of genome editing efficiency in cells according to crRNA construct

To compare the genome editing efficiency obtained when the crRNA is delivered as a PCR product (amplicon) containing a crRNA expression cassette with that obtained when it is delivered as plasmid DNA containing a crRNA expression cassette, lipofection experiments were performed on HEK293T/17 cells (ATCC) as follows.

A pcDNA3.1 vector (Invitrogen) containing a DNA sequence encoding the Cpf1 protein (AsCpf1 or LbCpf1) and a CMV promoter operatively linked thereto (SEQ ID NO: 64) (AsCpf1 plasmid or LbCpf1 plasmid) was delivered to HEK293T/17 cells together with a crRNA plasmid (As-crRNA plasmid (SEQ ID NO: 65 and FIG. 33) or Lb-crRNA plasmid (SEQ ID NO: 66 and FIG. 34)) or a PCR product (amplicon; As-crRNA amplicon (SEQ ID NO: 67 and FIG. 35) or Lb-crRNA amplicon (SEQ ID NO: 68 and FIG. 36)). In FIGS. 33 to 36, the underlined portion is the gene region encoding the crRNA, and 'NNNNNNNNNNNNNNNNNNNNNNNN' is a region determined according to the target sequence. The DNA encoding the Cpf1 protein and the DNA encoding the crRNA were delivered by lipofection. The above cell delivery conditions are summarized in Table 3 below:

Figure 112016120627256-pat00005

In addition, the crRNA sequences and target sequences used above are summarized in Table 4 below:

Target gene: DNMT1-3 (DNMT1-TS3)
  Target sequence: CTGATGGTCCATGTCTGTTACTC (SEQ ID NO: 19)
  crRNA for AsCpf1: UAAUUUCUACUCUUGUAGAU CUGAUGGUCCAUGUCUGUUACUC (SEQ ID NO: 36)
  crRNA for LbCpf1: AAUUUCUACUAAGUGUAGAU CUGAUGGUCCAUGUCUGUUACUC (SEQ ID NO: 37)
Target gene: DNMT1-4
  Target sequence: TTTCCCTTCAGCTAAAATAAAGG (SEQ ID NO: 20)
Target gene: AAVS1
  Target sequence: CTTACGATGGAGCCAGAGAGGAT (SEQ ID NO: 21)

((1) The nucleotide sequences described herein, including those in Table 4, are written in the 5' to 3' direction unless otherwise noted.

(2) All AsCpf1 crRNAs described below are obtained by replacing the targeting sequence region (underlined) of SEQ ID NO: 36 shown in Table 4 with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence in which T is replaced with U).

(3) All LbCpf1 crRNAs described below are obtained by replacing the targeting sequence region (underlined) of SEQ ID NO: 37 shown in Table 4 with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence in which T is replaced with U).)
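The crRNA construction rule described in notes (2) and (3) can be sketched as follows. The constant 5' portions are taken from SEQ ID NOS: 36 and 37 in Table 4; calling them "scaffolds" and the function name `build_crrna` are labels assumed here for illustration only.

```python
# Constant 5' portion of each crRNA, taken from Table 4
# (the part preceding the 23-nt targeting sequence region).
SCAFFOLDS = {
    "AsCpf1": "UAAUUUCUACUCUUGUAGAU",
    "LbCpf1": "AAUUUCUACUAAGUGUAGAU",
}


def build_crrna(ortholog: str, target_dna: str) -> str:
    """Return the crRNA (5' to 3') for a DNA target sequence.

    The targeting region is the target sequence with T replaced by U,
    appended to the ortholog-specific constant portion.
    """
    targeting_region = target_dna.upper().replace("T", "U")
    return SCAFFOLDS[ortholog] + targeting_region


if __name__ == "__main__":
    dnmt1_3 = "CTGATGGTCCATGTCTGTTACTC"  # SEQ ID NO: 19
    print(build_crrna("AsCpf1", dnmt1_3))
    print(build_crrna("LbCpf1", dnmt1_3))
```

For the DNMT1-3 target this reproduces the two crRNA sequences listed in Table 4; the same call with the DNMT1-4 or AAVS1 target sequence gives the corresponding crRNAs.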

After DNA delivery, the cells were incubated at 37 °C for 72 hours, and genomic DNA was isolated from the cells. The genome editing efficiency was analyzed by the T7E1 assay (treatment with T7E1 (T7 Endonuclease I) at 37 °C for 20 minutes, followed by electrophoresis) and by targeted deep sequencing (the target region of the target gene was PCR-amplified, amplified again using barcoded PCR primers for deep sequencing, and then purified using a DNA purification kit). The results are shown in Fig. 14a (T7E1 assay result), Fig. 14b (targeted deep-sequencing result), and Fig. 19a (T7E1 assay result).

As shown in FIGS. 14A and 14B, when the DNMT1 gene was targeted, delivering the crRNA in plasmid form resulted in genome editing at higher efficiency than delivering it in PCR product form, for both AsCpf1 and LbCpf1. A similar trend was observed when the AAVS1 gene was targeted. Also, as shown in Fig. 19A, when crRNA plasmids were used, the frequency of targeted mutagenesis at the three endogenous target sites tested was 2 to 30 times higher than when amplicons were used. PCR amplicons can give rise to incorrect guide RNA transcripts from oligonucleotide templates containing synthesis errors, which would likely result in off-target DNA cleavage at sites that appear to have RNA bulges. These results show that delivering the crRNA expression cassette in plasmid form improves genome editing efficiency compared with delivering it in PCR product form.

In addition, Cpf1 orthologs derived from various species (Lachnospiraceae bacterium (LbCpf1), Acidaminococcus sp. (AsCpf1), Francisella novicida (FnCpf1), and Moraxella bovoculi 237 (MbCpf1)) were compared.

With reference to the above-described process, plasmids containing DNA encoding each of the four Cpf1 orthologs (LbCpf1, AsCpf1, FnCpf1, and MbCpf1) were introduced into HEK293T cells in various combinations with plasmids encoding the crRNA of each ortholog, and the frequency of targeted mutagenesis (indel frequency (%)) was measured by targeted deep sequencing.

The crRNA sequences for FnCpf1 and MbCpf1 used at this time are summarized in Table 5 below:

Target gene: DNMT1-3
  Target sequence: CTGATGGTCCATGTCTGTTACTC (SEQ ID NO: 19)
  crRNA for FnCpf1 (coding DNA: SEQ ID NO: 47): AAUUUCUACUGUUGUAGAU CUGAUGGUCCAUGUCUGUUACUC (SEQ ID NO: 38)
  crRNA for MbCpf1 (coding DNA: SEQ ID NO: 48): AAUUUCUACUGUUUGUAGAU CUGAUGGUCCAUGUCUGUUACUC (SEQ ID NO: 39)
Target gene: DNMT1-4 (Cpf1 protein: FnCpf1, MbCpf1)
  Target sequence: TTTCCCTTCAGCTAAAATAAAGG (SEQ ID NO: 20)
Target gene: AAVS1 (Cpf1 protein: FnCpf1, MbCpf1)
  Target sequence: CTTACGATGGAGCCAGAGAGGAT (SEQ ID NO: 21)

(In Table 5, the crRNAs for DNMT1-4 and AAVS1 are obtained by replacing the targeting sequence region of SEQ ID NO: 38 or SEQ ID NO: 39 with the sequence corresponding to the target sequence of the target gene (i.e., the target sequence in which T is replaced with U).)

The obtained Indel frequency (%) is shown in Fig. 19B.

LbCpf1 and AsCpf1 recognize 5'-TTTN-3' PAMs, whereas FnCpf1 and MbCpf1, which are known to be inefficient or inactive in human cells, recognize 5'-TTN-3' PAMs. As shown in Figure 19b, when these Cpf1 orthologs were co-transfected into human cells in various combinations with plasmids encoding the crRNA orthologs, each Cpf1 ortholog showed the highest efficiency when co-transfected with its cognate crRNA. In addition, all four Cpf1 orthologs, including FnCpf1 and MbCpf1, were found to be capable of cleaving chromosomal target sites even when used in combination with non-cognate crRNAs from different species. Although the genome editing activity of FnCpf1 and MbCpf1 can be rescued by using a crRNA plasmid, the present study focuses on two Cpf1 orthologs (AsCpf1 and LbCpf1), because they are relatively more efficient than FnCpf1 and MbCpf1.

The genome editing efficiencies of LbCpf1 and AsCpf1 at 10 chromosomal target sites in cells (each containing a PAM sequence (5'-TTTN-3') recognized by Cpf1 and one PAM sequence (5'-NGG-3') recognized by SpCas9) were measured and compared with that of SpCas9. Genome editing efficiencies were calculated as indel frequencies measured by targeted deep sequencing with reference to the method described above. The 10 target sequences used in this test are shown in Table 6 below:

No. | Gene | Target sequence of Cpf1 crRNA | Target sequence of SpCas9 sgRNA
1 | DNMT1-3 | CTGATGGTCCATGTCTGTTACTC (SEQ ID NO: 19) | AGTAACAGACATGGACCATC (SEQ ID NO: 50)
2 | DNMT1-4 | TTTCCCTTCAGCTAAAATAAAGG (SEQ ID NO: 20) | TTTCCCTTCAGCTAAAATAA (SEQ ID NO: 51)
3 | AAVS1 | CTTACGATGGAGCCAGAGAGGAT (SEQ ID NO: 21) | TGCTTACGATGGAGCCAGAG (SEQ ID NO: 52)
4 | EMX1 | TCCTCCGGTTCTGGAACCACCC (SEQ ID NO: 23) | AGGTGTGGTTCCAGAACCGG (SEQ ID NO: 53)
5 | CCR5-1 | GTGGGCAACATGCTGGTCATCCT (SEQ ID NO: 24) | TGGTTTTGTGGGCAACATGC (SEQ ID NO: 54)
6 | CCR5-9 | GCCTGAATAATTGCAGTAGCTCT (SEQ ID NO: 25) | TAGAGCTACTGCAATTATTC (SEQ ID NO: 55)
7 | HPRT-1 | CTGACCTGCTGGATTACATCAAA (SEQ ID NO: 27) | GTGCTTTGATGTAATCCAGC (SEQ ID NO: 56)
8 | HPRT-4 | TGTCCCCTGTTGACTGGTCATTC (SEQ ID NO: 28) | CTAGAATGACCAGTCAACAG (SEQ ID NO: 57)
9 | HBB-1 | AGTCCTTTGGGGATCTGTCCACT (SEQ ID NO: 40) | TCCACTCCTGATGCTGTTAT (SEQ ID NO: 58)
10 | VEGFA | CGTCCAACTTCTGGGCTGTTCTC (SEQ ID NO: 41) | AGCGAGAACAGCCCAGAAGT (SEQ ID NO: 59)

Based on the target sequences shown in Table 6 above, LbCpf1 crRNAs and AsCpf1 crRNAs were prepared as described for Table 4 and used for the test.

The sgRNA of SpCas9 was prepared by replacing '(N cas9)m' in the following sequence (SEQ ID NO: 63) with the sequence obtained by substituting U for T in the SpCas9 target sequence of Table 6, and substituting 'GAAA' for the (linker) (hereinafter, the sgRNA of SpCas9 was constructed in the same manner):

5'-(N cas9)m-GUUUCAGUUGCU-(linker)-AUGCUCUGUAAUCAUUUAAAAGUAUUUUGAGAGGACCUCUGUUUGACACGUCUGAAUAACUAAAAA-3' (SEQ ID NO: 63)

The results obtained are shown in Fig. 19C. As shown in Figure 19c, all of the nuclease types used in the test showed a wide range of mutation frequencies in human cells (HEK293 cells) (SpCas9: mean 37 +/- 5%; LbCpf1: 21 +/- 6%; AsCpf1: 21 +/- 5%).

Example 6: Purification of recombinant Cpf1 protein and genome editing in cells via ribonucleoprotein (RNP) delivery

6.1. In vitro cleavage assay using recombinant Cpf1 protein

In vitro cleavage assays were performed to determine whether the purified recombinant AsCpf1 and LbCpf1 proteins bind to crRNA and cleave DNA. To this end, the recombinant AsCpf1 (1 μM) or LbCpf1 (1 μM) obtained in Example 1, a crRNA targeting DNMT1 (1 μM) that was either transcribed in vitro by T7 RNA polymerase (New England Biolabs) or chemically synthesized, and a DNA fragment containing the target (DNMT1) sequence (see Table 4) were incubated together at 37 °C for 1 hour, and cleavage of the target DNA was then examined by TBE-agarose gel electrophoresis. A crRNA prepared by in vitro transcription with T7 RNA polymerase (New England Biolabs) carries a triphosphate (PPP) at the 5' end, whereas a chemically synthesized crRNA does not. The electrophoresis result is shown in Fig. 15B (T7: crRNA prepared by in vitro transcription with T7 RNA polymerase; synthetic: chemically synthesized crRNA).

As shown in FIG. 15B, Cpf1 cleaved the target DNA only when crRNA was present. In addition, the cleavage efficiency obtained with the synthetic crRNA lacking phosphate at the 5' end was similar to that obtained with the in vitro transcribed crRNA carrying phosphate at the 5' end, suggesting that the presence of phosphate at the 5' end has no effect on cleavage efficiency.

6.2. Genome editing test in cells using recombinant Cpf1 protein

The recombinant AsCpf1 and LbCpf1 proteins were applied to cell experiments to test genome editing in cells via ribonucleoprotein (RNP) delivery.

The recombinant Cpf1 protein (AsCpf1 or LbCpf1) purified in Example 1 and the crRNA targeting DNMT1-3 (see Table 4; crRNA prepared by in vitro transcription) were mixed at an appropriate ratio to prepare an RNP, which was delivered into HEK293T/17 cells by electroporation or lipofection (20 μg of Cpf1 : 20 μg of crRNA for electroporation and 10 μg of Cpf1 : 2 μg of crRNA for lipofection). After RNP delivery, the cells were cultured at 37 °C for 72 hours. Genomic DNA was then isolated, the nucleotide sequence of the target site (DNMT1) was analyzed by the T7E1 assay and targeted deep sequencing with reference to the method described in Example 5, and the efficiency was calculated as a frequency (%). For comparison, the same test was performed using SpCas9 (SwissProt Accession number Q99ZW2 (NP_269215.1)) and an sgRNA (target sequence: AGTACGTTAATGTTTCCTGA). The results are shown in FIG. 16a (T7E1 assay result) and FIG. 16b (targeted deep-sequencing result), respectively.

As shown in FIGS. 16A and 16B, with both electroporation and lipofection, AsCpf1 and LbCpf1 bound to the crRNA showed a mutation efficiency similar to that of SpCas9 at the target site (DNMT1).

RNP delivery was also performed using a synthetic crRNA lacking the 5' phosphate, and the genome editing efficiency was measured and compared with that obtained using the crRNA prepared by in vitro transcription. The results are shown in Fig. 16C. As shown in FIG. 16C, the synthetic crRNA gave a genome editing efficiency similar to that of the crRNA prepared by in vitro transcription.

The above results show that an RNP containing the recombinant Cpf1 protein can be used effectively for genome editing in cells when delivered to the cells by electroporation or lipofection. Such an RNP delivery method can achieve genome editing in a shorter time than the DNA plasmid delivery method and, because no DNA is used, has the advantage that no DNA is inserted into the genome of the cell. In addition, because Cpf1 recognizes a sequence different from that recognized by Cas9, it can edit the genome at positions that could not be targeted by Cas9. When the Cas9 and Cpf1 proteins are used orthogonally, different target genes can be edited simultaneously. When a catalytically dead Cpf1 mutant (dCpf1) is used together with dCas9, the expression of multiple target genes can also be selectively activated and repressed at the same time.

Example 7. Identification of the inverted PAM repeat of Cpf1 using Digenome-seq

The genomic DNA isolated from the cells was treated with recombinant Cpf1 protein (3 nM-300 nM) and crRNA (9 nM-900 nM; SEQ ID NOS: 19, 20, 21, 23, 24, 25, 27, and 28) for 12 hours (see Fig. 17A). After 12 hours, the Cpf1 protein and the crRNA were removed with protease K and RNase A, respectively, the genome was purified, and qPCR (primers used: Forward: AAG TCA CTC TGG GGA ACA CG, Reverse: TCC CTT AGC ACT CTG CCA CT; conditions: 2-step (95 °C for 10 sec, 60 °C for 10 sec) x 40 cycles) was performed to determine the cleavage efficiency of the genome at the target site. The results are shown in Fig. 17B. The value on the y-axis in Fig. 17B is the relative ratio of uncut genome when the control is set to one. As shown in FIG. 17B, about 60% of the genome at the target site was cleaved when 3 nM Lb-/AsCpf1 protein and 9 nM crRNA were used, and 95% or more of the target site was cleaved when 30 nM Lb-/AsCpf1 protein and 90 nM crRNA, or 300 nM Lb-/AsCpf1 protein and 900 nM crRNA, were used.
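The relationship between the y-axis value of Fig. 17B (relative amount of uncut genome, control set to one) and the cleaved percentage quoted above can be sketched as follows. The conversion from qPCR Ct values to a relative amount is an assumption made here for illustration (a standard delta-Ct calculation with perfect doubling per cycle); the specification does not state how the relative ratio was computed, and the function names are hypothetical.

```python
def relative_uncut(ct_treated: float, ct_control: float, efficiency: float = 2.0) -> float:
    """Relative amount of intact target in the treated sample (control = 1).

    Assumption: amplification multiplies the template by `efficiency` per
    cycle, so each additional cycle needed corresponds to 1/efficiency
    as much intact template.
    """
    return efficiency ** -(ct_treated - ct_control)


def percent_cleaved(relative_uncut_value: float) -> float:
    """Convert the relative uncut ratio (control = 1) to percent cleaved."""
    return (1.0 - relative_uncut_value) * 100.0


if __name__ == "__main__":
    # A relative uncut ratio of 0.4 corresponds to about 60% cleavage,
    # and a ratio of 0.05 to about 95% cleavage.
    for ratio in (0.4, 0.05):
        print(ratio, percent_cleaved(ratio))
```

Because the primers flank the target site, only uncut template amplifies, so a lower relative amount of product means a higher fraction of cleaved genome.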

Whole genome sequencing was performed on the genome cleaved by the Cpf1 protein and the crRNA, and the results were examined using the Integrative Genomics Viewer (IGV). As shown in FIG. 17C, in the genome treated with the Cpf1 protein and the crRNA, the 5' ends of the reads were vertically aligned at the target position, whereas no such alignment was observed in the genome not treated with the Cpf1 protein and the crRNA.

Digenome-seq was performed on the genome cleaved by the Cpf1 protein and the crRNA to search for off-target sites (see Example 4). The results are shown in Fig. 18A. As shown in FIG. 18A, one on-target site and 25 off-target candidate sites were found.

A sequence logo obtained from the sequences of these 26 sites is shown in Fig. 18B. As shown in Fig. 18B, in addition to the known PAM sequence (TTTN) of Cpf1, an inverted-PAM sequence (NAAA) was found on the opposite side. The inverted PAM was not only AAA but also AAG, AGA, and GAA. These results suggest that, when the Cpf1 protein cleaves the genome, one Cpf1 protein binds to the genome via the crRNA and may act by forming a dimer with another Cpf1 protein bound to the inverted-PAM sequence (NAAA) on the opposite side. The inverted-PAM information can be used to select target sites at which Cpf1 has high cleavage efficiency. When two or more Cpf1 crRNAs are used simultaneously, in a manner similar to nickases, at a target position having an inverted-PAM sequence, the efficiency is likely to increase. This information can also be used to increase homologous recombination (HR)-mediated knock-in efficiency by adjusting the length of the overhang formed at the cleavage site.

Example 8: Mismatch tolerance test of Cpf1

For both LbCpf1 and AsCpf1, a 23-nt protospacer sequence located 3' of a 5'-TTTN-3' (N is A, T, C, or G) PAM sequence was used as the target (the targeting sequence of the crRNA is the protospacer sequence in which T is replaced with U).
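The target-site rule stated above (a 5'-TTTN-3' PAM immediately followed on its 3' side by a 23-nt protospacer) can be sketched as a simple sequence scan. This is an illustration only: it scans a single strand as given, the function name `find_cpf1_targets` is an assumption, and it is not the tool used in the specification.

```python
import re

# Lookahead so that overlapping candidate sites are all reported:
# group 1 is the PAM (TTTN), group 2 the 23-nt protospacer 3' of it.
_SITE = re.compile(r"(?=(TTT[ACGT])([ACGT]{23}))")


def find_cpf1_targets(sequence: str):
    """Return (0-based PAM start, PAM, protospacer) for each candidate site."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in _SITE.finditer(sequence)]


if __name__ == "__main__":
    # DNMT1-3 target (SEQ ID NO: 19) preceded by its TTTC PAM,
    # with two arbitrary flanking bases on each side.
    demo = "GG" + "TTTC" + "CTGATGGTCCATGTCTGTTACTC" + "AA"
    for start, pam, protospacer in find_cpf1_targets(demo):
        print(start, pam, protospacer)
```

To cover both strands, the same scan would also be run on the reverse complement of the input; that step is omitted here for brevity.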

Three endogenous target sites (DNMT1-3, DNMT1-4, and AAVS1) were selected. Plasmids encoding various crRNAs, each hybridizable with the on-target sequence of the target site or with an off-target sequence containing one or two mismatches, were transfected into HEK293 cells together with a plasmid encoding LbCpf1 or AsCpf1, and the indel frequency (%) was measured by targeted deep sequencing to determine the extent to which Cpf1 tolerates mismatches between the on-target DNA sequence and the crRNA sequence.

The three selected endogenous target sites (on target) are shown in Table 7 below:

On-target site | Sequence
DNMT1-3 | CTGATGGTCCATGTCTGTTACTC (SEQ ID NO: 19)
DNMT1-4 | TTTCCCTTCAGCTAAAATAAAGG (SEQ ID NO: 20)
AAVS1 | CTTACGATGGAGCCAGAGAGGAT (SEQ ID NO: 21)

Off-target sequences of the three selected endogenous target sites are shown in Figures 20a, 20b, and 20c, respectively.

Based on the on-target and off-target sequences shown in Table 7 and Figs. 20a to 20c, LbCpf1 crRNAs and AsCpf1 crRNAs were prepared as described for Table 4 and used for the test.

The obtained indel frequencies (%) are shown in FIG. 20a (indel frequency of DNMT1-3), FIG. 20b (indel frequency of DNMT1-4), and FIG. 20c (indel frequency of AAVS1), respectively.

As shown for DNMT1-3 (Fig. 20A) and DNMT1-4 (Fig. 20B), both LbCpf1 and AsCpf1 were sensitive to a single mismatch (in particular, when the mismatch was located within 20 nt of the PAM, i.e., of the 5' end), and Cpf1 activity was almost completely lost when two mismatches were present (especially within 20 nt of the PAM). These results show that Cpf1 has high specificity in human cells.

Example 9: Identification of potential off-target sites in the human genome

Cas-OFFinder was used to identify potential off-target sites in the human genome. Potential off-target sites differing by 1 to 4 or 1 to 5 nucleotides from the 10 on-target sites tested (Table 6) were selected, and off-target mutations (indel frequency (%)) in HEK293 cells were measured by targeted deep sequencing.

Figure 112018015311475-pat00089

Figure 112018015311475-pat00090

Figure 112018015311475-pat00091

Figure 112018015311475-pat00092

EMX1-2
Site | Location | PAM-target sequence | Mis-No. | Indel frequency (%): (-) Cpf | AsCpf1 | LbCpf1 | D-cap.: As | Lb
On-target | chr2 73160920 | TTTG TCCTCCGGTTCTGGAACCACCC (TTTG-SEQ ID NO: 23) | 0 | 0.02% | 12.66% | 25.33% | o | o
EMX1-2_02 | chr6 134409288 | TTTCTCCTCaGGTTCTGGAACCAataC | 4 | 0.00% | 0.04% | 0.07% | o | o
EMX1-2_03 | chr1 23408279 | TTTCTCCTCCGGcTtTaGAgtCACACC | 5 | 0.05% | 0.04% | 0.04% | x | x
EMX1-2_04 | chr1 7977477 | TTTCTCCTgCGGgTCTGcAAtCtCACC | 5 | 0.01% | 0.00% | 0.02% | x | x
EMX1-2_05 | chr10 68703484 | TTTATggTggGGTTCTGGAACCAaACC | 5 | 0.01% | 0.01% | 0.00% | x | x
EMX1-2_06 | chr10 102894100 | TTTGTCCgCCGGTTCTGGAACCAggtt | 5 | 0.01% | 0.00% | 0.00% | x | x
EMX1-2_07 | chr10 119307580 | TTTGTtCTtCGGTTCTGaAACCAtACt | 5 | 0.01% | 0.02% | 0.01% | x | x
EMX1-2_08 | chr11 93751348 | TTTATCaTggtGgTCTGGAACCACACC | 5 | 0.02% | 0.02% | 0.01% | x | x
EMX1-2_09 | chr12 51833378 | TTTTTttTttaGTTCTGGAACCACACC | 5 | 0.96% | 1.06% | 0.98% | x | x
EMX1-2_10 | chr14 48093772 | TTTATatTCaGGTTCTGGAACCAacCC | 5 | 0.19% | 0.16% | 0.11% | x | x
EMX1-2_11 | chr2 159516970 | TTTCTCCaCaGcTTCTGGgACCcCACC | 5 | 0.12% | 0.09% | 0.08% | x | x
EMX1-2_12 | chr5 45576885 | TTTATtCTggtGTTCTGGAACCAaACC | 5 | 0.04% | 0.03% | 0.02% | x | x
EMX1-2_13 | chr5 149563041 | TTTGcCCgCCGGTTtTGGAACCAgAtC | 5 | 0.06% | 0.04% | 0.05% | x | x
EMX1-2_14 | chr6 46703876 | TTTCatCTCCaGTTCTGGcACCtCACC | 5 | 0.07% | 0.04% | 0.05% | x | x
EMX1-2_15 | chr6 122815701 | TTTCaCCaCCtGTTCTGGAACCACAaa | 5 | 0.20% | 0.16% | 0.18% | x | x
EMX1-2_16 | chr7 120921149 | TTTATtCTgtGGaTCTGGAACCACAtC | 5 | 0.01% | 0.01% | 0.02% | x | x
EMX1-2_17 | chr7 32492726 | TTTAcCCTCCacTTCTGGAACtcCACC | 5 | 0.01% | 0.02% | 0.01% | x | x
EMX1-2_18 | chr9 102937840 | TTTATtCTCtGGTTCTGGAACCAagtC | 5 | 0.00% | 0.00% | 0.01% | x | x
EMX1-2_19 | chrX 135428078 | TTTCTCCataGtTTCTGGAACCACAtC | 5 | 0.00% | 0.00% | 0.00% | x | x


Figure 112018015311475-pat00093

CCR5-9
Site | Location | PAM-target sequence | Mis-No. | Indel frequency (%): (-) Cpf | AsCpf1 | LbCpf1 | D-cap.: As | Lb
On-target | chr3 46415182 | TTTG GCCTGAATAATTGCAGTAGCTCT (TTTG-SEQ ID NO: 25) | 0 | 0.00% | 12.61% | 11.37% | o | o
CCR5-09-02 | chr6 3563234 | TTTGGtCTGAATAATTtCAGTAGCTCT | 2 | 0.00% | 0.00% | 0.00% | o | x
CCR5-09-03 | chr8 28861456 | TTTTGCCTGgAcAATTGCAGTAaCTaT | 4 | 0.01% | 0.00% | 0.00% | x | x
CCR5-09-04 | chr7 115259943 | TTTTGCCTGgATAATTGCAGTAGCctc | 4 | 0.29% | 0.28% | 0.28% | x | x
CCR5-09-05 | chr4 20700452 | TTTCaCCTGAATAATTGCAcTAGCTaa | 4 | 0.00% | 0.00% | 0.00% | x | x
CCR5-09-06 | chr4 182288863 | TTTTGCCTtgATAATTGCAGaAGCTgT | 4 | 0.01% | 0.00% | 0.00% | x | x
CCR5-09-07 | chr16 55176669 | TTTTcCCTGAATAcTTcCAGTgGCTCT | 4 | 0.02% | 0.01% | 0.01% | x | x
CCR5-09-08 | chr2 184980577 | TTTTGtCTGgATAAcTGCAGTAtCTCT | 4 | 0.00% | 0.00% | 0.00% | x | x
CCR5-09-09 | chr21 34303495 | TTTGGCCTcAAacATTGCAGaAGCTCT | 4 | 0.01% | 0.00% | 0.00% | x | x
CCR5-09-10 | chr17 42584285 | TTTCtCCTGAATtATTGCAGTAGCTac | 4 | 0.01% | 0.02% | 0.00% | x | x
CCR5-09-11 | chrX 80917477 | TTTAGCCTGAATtATTaCAaTAGCTtT | 4 | 0.00% | 0.00% | 0.00% | x | x
CCR5-09-12 | chr11 29993622 | TTTTGCCTGgAcAATTGCAaTAGCTtT | 4 | 0.00% | 0.01% | 0.01% | x | x


HPRT1-1
Site | Location | PAM-target sequence | Mis-No. | Indel frequency (%): (-) Cpf | AsCpf1 | LbCpf1 | D-cap.: As | Lb
On-target | chrX 133609298 | TTTG CTGACCTGCTGGATTACATCAAA (TTTG-SEQ ID NO: 27) | 0 | 0.02% | 9.76% | 10.39% | o | o
HPRT1-01-02 | chr5 30248678 | TTTGCTcACCTGCTGGATTACATCAAA | 1 | 0.01% | 0.08% | 0.04% | o | o
HPRT1-01-03 | chr11 93732144 | TTTGCTGACCTGCTaGATaACATCAAA | 2 | 0.03% | 0.04% | 0.02% | o | o
HPRT1-01-04 | chr12 62892531 | TTTACTGACaactTGGATTACATCAAA | 4 | 0.01% | 0.01% | 0.01% | x | x
HPRT1-01-05 | chr17 78758131 | TTTTtTaACCTGCTGGATTAaATgAAA | 4 | 0.10% | 0.08% | 0.08% | x | x

HPRT1-4
Site | Location | PAM-target sequence | Mis-No. | Indel frequency (%): (-) Cpf | AsCpf1 | LbCpf1 | D-cap.: As | Lb
On-target | chrX 133620466 | TTTA TGTCCCCTGTTGACTGGTCATTC (TTTA-SEQ ID NO: 28) | 0 | 1.67% | 34.96% | 36.18% | o | o
HPRT1-04_02 | chr11 93732023 | TTTATaTCCCCTGTTGACTGGTCATTa | 2 | 0.03% | 0.11% | 7.57% | o | o
HPRT1-04_03 | chr5 161039971 | TTTATGTCCCCTcTTGcCTGGTCATaa | 4 | 0.10% | 0.08% | 0.08% | o | o

AAVS1
Site | Location | PAM-target sequence | Mis-No. | Indel frequency (%): (-) Cpf | AsCpf1 | LbCpf1 | D-cap.: As | Lb
On-target | chr19 55626916 | TTTG CTTACGATGGAGCCAGAGAGGAT (TTTG-SEQ ID NO: 21) | 0 | 0.00% | 22.42% | 22.78% | o | o
AAVS1_02 | chr2 79999133 | TTTTCTTttGATGGtGCCAGAGAGGAT | 3 | 0.00% | 0.01% | 0.00% | x | x
AAVS1_03 | chr8 113838377 | TTTTCTTctGcTGGAGCCAGAGAGGcT | 4 | 0.01% | 0.01% | 0.01% | x | x
AAVS1_04 | chr4 96317093 | TTTCCTTAtGATGaAGCCAGAGAaGcT | 4 | 0.00% | 0.08% | 0.53% | o | o

HBB-1
Site | Location | PAM-target sequence | Mis-No. | Indel frequency (%): (-) Cpf | AsCpf1 | LbCpf1
On-target | chr11 5247941 | TTTG AGTCCTTTGGGGATCTGTCCACT (TTTG-SEQ ID NO: 40) | 0 | 0.063% | 1.855% | 0.916%
HBB-01_02 | chr11 5255355 | TTTGAGTCCTTTGGGGATCTGTCCtCT | 1 | 0.095% | 0.824% | 1.108%
HBB-01_03 | chrX 62099518 | TTTTAGTCCTTTGGGTATaTaTCCAgT | 4 | 0.031% | 0.039% | 0.029%
HBB-01_04 | chr11 93986836 | TTTTAaTCCTTTGGGtATaTGcCCACT | 4 | 0.052% | 0.062% | 0.063%

VEGFA-2
Site | Location | PAM-target sequence | Mis-No. | Indel frequency (%): (-) Cpf | AsCpf1 | LbCpf1
On-target | chr6 43738576 | TTTT CGTCCAACTTCTGGGCTGTTCTC (TTTT-SEQ ID NO: 41) | 0 | 0.000% | 0.942% | 0.199%
VEGFA-02_02 | chrX 104051443 | TTTACTACCAACTTCTttGCTGTTCTC | 4 | 0.022% | 0.025% | 0.026%

In Tables 8 to 17, lowercase letters indicate the positions of mismatches, 'Mis-No.' represents the number of mismatches, '(-) Cpf' means the case where Cpf1 was not added, and 'As' and 'Lb' mean 'AsCpf1' and 'LbCpf1', respectively. Also, 'D-cap.' means 'Digenome capture': 'o' indicates that the cleavage score obtained by Digenome sequencing (Example 4) is greater than or equal to the cutoff value (2.5), and 'x' indicates that it is below the cutoff value.

Based on the target sequences shown in Tables 8 to 17, LbCpf1 crRNAs and AsCpf1 crRNAs were prepared as described for Table 4 and used in the test.
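The lowercase notation described above can be read programmatically, as in the following sketch, which lists the mismatch positions of a PAM-target sequence from Tables 8 to 17. The function name is an assumption, and positions are counted here from the first base of the PAM (1-based), which is a convention chosen only for this example.

```python
def mismatch_positions(site_sequence: str):
    """Return 1-based positions of lowercase (mismatched) bases.

    Positions are counted from the first base of the 4-nt PAM, so the
    23-nt target region occupies positions 5 to 27.
    """
    return [i for i, base in enumerate(site_sequence, start=1) if base.islower()]


if __name__ == "__main__":
    # EMX1-2_02 and HPRT1-01-03 as listed in the tables.
    for name, seq in [
        ("EMX1-2_02", "TTTCTCCTCaGGTTCTGGAACCAataC"),
        ("HPRT1-01-03", "TTTGCTGACCTGCTaGATaACATCAAA"),
    ]:
        positions = mismatch_positions(seq)
        print(name, len(positions), positions)
```

For these two rows the number of lowercase bases equals the listed 'Mis-No.' values (4 and 2, respectively).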

Indel frequency (%) shown in Tables 8 to 17 was measured by targeted deep sequencing method.

As shown in Tables 8 to 17, when off-target activity of LbCpf1 and AsCpf1 was examined at 87 sites differing by up to five mismatches from two on-target sites (the DNMT1-3 and EMX1-2 sites), off-target indels were validated at only three or four sites, depending on the Cpf1 ortholog, and the off-target indel frequencies ranged from 0.04% to 0.7%, far lower than the on-target indel frequencies (34% and 25% with LbCpf1 and 47% and 13% with AsCpf1). Homologous sites differing by a single mismatch were also examined for two other on-target sites (CCR5-1 and HPRT-1). The on-target indel frequencies of LbCpf1 at the CCR5-1 and HPRT-1 sites were 19% and 10%, respectively, whereas those at the single-base mismatched sites were 0.4% and 0.04%, respectively. These correspond to 1/48 (= 19% / 0.4%) and 1/250 (= 10% / 0.04%) of the on-target indel frequencies, respectively, showing that even a single-base mismatch is discriminated. Overall, indel frequencies were examined at 130 potential off-target sites. Of these, 9 sites were validated as bona fide off-target sites, but the indel frequency was below 1% at most of them. These results show that Cpf1 is highly specific in human cells.

To identify genome-wide Cpf1 off-target sites in an unbiased manner, Digenome-seq (see Example 4) was carried out for a total of eight efficient Cpf1 crRNAs (the crRNAs for target sequences 1 to 8 in Table 6). Cell-free genomic DNA isolated from HeLa cells using a DNeasy Tissue kit (Qiagen) was cleaved with a high concentration (300 nM Cpf1 and 900 nM crRNA) of AsCpf1 or LbCpf1 ribonucleoprotein (RNP) obtained by the method of Example 3 and then subjected to whole genome sequencing (WGS; see Example 4). For comparison, the same test was carried out using SpCas9.

The cleavage scores (Example 4) obtained using AsCpf1 and LbCpf1 are shown in Fig. 21a (results for DNMT1-3), Fig. 21b (results for DNMT1-4), and Tables 18 to 33.
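The discrimination ratios quoted above follow directly from dividing the on-target indel frequency by the frequency at the single-mismatch site. A minimal sketch of this arithmetic is given below; the function name is an assumption made for illustration.

```python
def fold_discrimination(on_target_percent: float, off_target_percent: float) -> float:
    """Ratio of on-target to off-target indel frequency."""
    if off_target_percent <= 0:
        raise ValueError("off-target frequency must be positive")
    return on_target_percent / off_target_percent


if __name__ == "__main__":
    # LbCpf1 values quoted in the text for CCR5-1 and HPRT-1.
    print(fold_discrimination(19.0, 0.4))   # about 47.5, quoted as 1/48
    print(fold_discrimination(10.0, 0.04))  # about 250, quoted as 1/250
```

The first ratio is 47.5 before rounding, which the text reports as 1/48.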

LbCpf1_DNMT1-3
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr5 | 13135736 | TTTCCTGATGGTCCAcacCTGTTAaca | 13.20 | No
chr8 | 112204853 | TTTCCTGATGGTCCAcacCTGTTAaga | 12.38 | No
chr19 | 10244444 | TTTCCTGATGGTCCATGTCTGTTACTC | 11.97 | No
chr11 | 26124230 | TTTCCTGATGGTCCAcaTCTGTTAaca | 11.51 | No
chr16 | 75745894 | TTTTCTGATGGTCCATacCTGTTACaC | 9.36 | No
chr3 | 30592945 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.74 | No
chr10 | 66295933 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.67 | No
chr5 | 39969437 | TCTCCTGATGGTCCATacCTGTTAacg | 8.65 | No
chr10 | 6784959 | TTTCCTGATGGTCCAcacCTGTTAaca | 7.24 | No
chr3 | 166705664 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.96 | No
chr2 | 62165341 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.56 | No
chr1 | 89819957 | TTTCCTGATGGcCCATacCTGTTAaca | 5.31 | No
chrX | 115862097 | TTTCaTGATGGTCCATacCTGTTAaca | 5.29 | No
chrX | 92676365 | TTTCCTGATGGTCCATacCTGTTAaca | 5.22 | No
chr3 | 164692184 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.05 | No
chr16 | 13699913 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.84 | No
chr2 | 153648723 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.83 | No
chr1 | 236623991 | TTTACTGATGaTCCATGTCTaaacgTt | 4.74 | No
chrX | 97546178 | TTTCCTGATGGTCCAcGcCTGTTAaca | 4.45 | No
chr11 | 38911731 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.07 | No
chrX | 57676022 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.01 | No
chr5 | 55879970 | TTTCCTGATGGTCCAcacCTGTTAacC | 3.76 | No
chrX | 153891299 | TTTCCTGATGGTCCAcacCTGTTAaca | 3.62 | No
chr14 | 21663713 | TTTCCTGATGGTCCAcacCTGTTAaTt | 3.24 | No
chr6 | 55276466 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.85 | No
chr10 | 113265597 | TTTCCTGATGGTCCATaTCTGTggCat | 2.66 | No
chr7 | 7682807 | TTTCCTGATGGTCCAcacCTGTTAtca | 2.57 | No
chrX | 8935018 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.50 | No

LbCpf1_DNMT1-4
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr19 | 10244367 | TTTATTTCCCTTCAGCTAAAATAAAGG | 6.86 | No

LbCpf1_EMX1-2
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr2 | 73160921 | TTTGTCCTCCGGTTCTGGAACCACACC | 12.19 | No
chr2 | 177017501 | TTCATCCTCCGGTTCTGGAACCAgAtC | 8.08 | No
chr17 | 46690720 | TTCATCCTCCGGTTCTGGAACCAgAtt | 4.71 | No
chr6 | 134409314 | TTTCTCCTCaGGTTCTGGAACCAataC | 3.77 | No

LbCpf1_CCR5-1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr3 | 46399210 | TTTTGTGGGCAACATGCTGGTCgTCCT | 33.76 | No
chr3 | 46414552 | TTTTGTGGGCAACATGCTGGTCATCCT | 13.90 | No
chr8 | 54163354 | CTTGGTGGGCAACtcGcTGGTCATgtT | 2.93 | No

LbCpf1_CCR5-9
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr3 | 46415211 | TTTGGCCTGAATAATTGCAGTAGCTCT | 51.78 | No
chr1 | 1.44E+08 | TTTTGCCTGAATgATTGCAGTAttTac | 11.04 | No
chr9 | 37289589 | TTTGGaCTGAATtaTTGCAGTAacatT | 3.13 | No

LbCpf1_HPRT1-1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chrX | 133609321 | TTTGCTGACCTGCTGGATTACATCAAA | 4.15 | No
chr11 | 93732147 | TTTGCTGACCTGCTaGATaACATCAAA | 3.91 | No
chr5 | 30248701 | TTTGCTcACCTGCTGGATTACATCAAA | 2.91 | No

LbCpf1_HPRT1-4
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr11 | 93732073 | TTTATaTCCCCTGTTGACTGGTCATTa | 28.40 | No
chr5 | 161040022 | TTTATGTCCCCTcTTGcCTGGTCATaa | 3.88 | No
chrX | 133620495 | TTTATGTCCCCTGTTGACTGGTCATTC | 2.90 | No

LbCpf1_AAVS1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr2 | 34206860 | TTTCCaTACaATGGAGCCAGAGA-GAT | 9.31 | RNA Bulge
chr4 | 96317122 | TTTCCTTAtGATGaAGCCAGAGAaGcT | 5.26 | No
chr16 | 34823594 | TTTACaTAaGATGAAaCCAGAGAGaAa | 4.34 | No
chr19 | 55626945 | TTTGCTTACGATGGAGCCAGAGAGGAT | 2.63 | No

AsCpf1_DNMT1-3
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr12 | 17538224 | TTTACTGATGGTCttacTtTaTaggcC | 15.78 | No
chr7 | 134517009 | TCTCCTGATGGTCCATacCTGTTAaca | 14.35 | No
chr5 | 13135739 | TTTCCTGATGGTCCAcacCTGTTAaca | 13.65 | No
chr9 | 25518292 | TCTCCTGATGGTCtATaTCTGTTAaaa | 12.61 | No
chr5 | 39969440 | TCTCCTGATGGTCCATacCTGTTAacg | 12.11 | No
chr8 | 112204856 | TTTCCTGATGGTCCAcacCTGTTAaga | 12.05 | No
chr11 | 82700148 | TTTACTGATGGTCtcatTtaaTcttTa | 11.02 | No
chr3 | 164692191 | TTTCCTGATGGTCCAcacCTGTTAaca | 10.97 | No
chr4 | 123785685 | TTTCCTGATGGTCtcatatTtTcttTa | 8.95 | No
chr1 | 213377380 | TTTCCTGATGGTCCATGTCTGaattag | 8.65 | No
chr10 | 6784966 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.18 | No
chr7 | 123688384 | TTTCCTGATGGTCCAcacCTGTTAaca | 8.03 | No
chr2 | 200523682 | TTTACTGATGGTattataggaagttat | 7.99 | No
chr16 | 75745895 | TTTTCTGATGGTCCATacCTGTTACaC | 7.12 | No
chr10 | 111147398 | TTTCCTGATGGTCCATacCTGcTgCaC | 7.06 | No
chr11 | 38911734 | TTTCCTGATGGTCCAcacCTGTTAaca | 6.69 | No
chrX | 57676029 | TTTCCTGATGGTCCAcacCTGTTAaca | 6.55 | No
chr16 | 13699916 | TTTCCTGATGGTCCAcacCTGTTAaca | 6.45 | No
chr3 | 30592953 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.77 | No
chr19 | 43263943 | TTTACTGATGGTCCAaacaTcTaAgat | 5.66 | No
chr19 | 43416520 | TTTACTGATGGTCCAaacaTcTaAgat | 5.41 | No
chr6 | 55276469 | TTTCCTGATGGTCCAcacCTGTTAaca | 5.03 | No
chr10 | 66295940 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.93 | No
chr19 | 43435385 | TTTACTGATGGTCCAaacaTcTaAgat | 4.85 | No
chrX | 82549910 | TTTCCTGATGGTCCAcacCTGTTACaC | 4.58 | No
chr5 | 54119487 | TTTCCTGATGGTCCAcacCTGTTAaTg | 4.58 | No
chr1 | 236623994 | TTTACTGATGaTCCATGTCTaaacgTt | 4.39 | No
chr19 | 10244446 | TTTCCTGATGGTCCATGTCTGTTACTC | 4.35 | No
chr4 | 9395117 | TTTCCTGATGGTCtAcaTCTGTTAaca | 4.20 | No
chr3 | 166705667 | TTTCCTGATGGTCCAcacCTGTTAaca | 4.13 | No
chr14 | 21663712 | TTTCCTGATGGTCCAcacCTGTTAaTt | 4.12 | No
chr7 | 80711731 | TTTACTGATGGTCacTaTaaacacaga | 3.79 | No
chr5 | 55879977 | TTTCCTGATGGTCCAcacCTGTTAacC | 3.69 | No
chr7 | 7682808 | TTTCCTGATGGTCCAcacCTGTTAtca | 3.56 | No
chr10 | 113265596 | TTTCCTGATGGTCCATaTCTGTggCat | 3.25 | No
chrX | 97546179 | TTTCCTGATGGTCCAcGcCTGTTAaca | 3.24 | No
chr1 | 146123502 | TTTCCTGATGGTCCAcacCTGTTgCaC | 3.22 | No
chr7 | 3669637 | TTTCCTGATGGTCCcatcCaaTgttTa | 3.22 | No
chr11 | 26124238 | TTTCCTGATGGTCCAcaTCTGTTAaca | 3.09 | No
chrX | 153891300 | TTTCCTGATGGTCCAcacCTGTTAaca | 3.00 | No
chrX | 92676366 | TTTCCTGATGGTCCATacCTGTTAaca | 2.88 | No
chr2 | 153648726 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.80 | No
chr2 | 62165344 | TTTCCTGATGGTCCAcacCTGTTAaca | 2.77 | No
chr19 | 43524715 | TTTACTGATGGTCtAaacaTcTaAgat | 2.72 | No
chr10 | 14334368 | TTTCCTGATGGTCttcaatatctcttct | 2.67 | No
chr19 | 43377706 | TTTACTGATGGTCCAaacaTcTaAgat | 2.53 | No

AsCpf1_DNMT1-4
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr19 | 10244367 | TTTATTTCCCTTCAGCTAAAATAAAGG | 8.92 | No

AsCpf1_EMX1-2
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr2 | 73160922 | TTTGTCCTCCGGTTCTGGAACCACACC | 7.57 | No
chr2 | 177017500 | TTCATCCTCCGGTTCTGGAACCAgAtC | 6.59 | No
chr6 | 134409310 | TTTCTCCTCaGGTTCTGGAACCAataC | 4.44 | No
chr17 | 46690718 | TTCATCCTCCGGTTCTGGAACCAgAtt | 3.28 | No
chr7 | 145773724 | TTTGTCCTCCaGaTaTGGAACCAtgtg | 3.14 | No

AsCpf1_CCR5-1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr3 | 46414552 | TTTTGTGGGCAACATGCTGGTCATCCT | 18.09 | No
chr1 | 113920223 | TTTGGTGGGCAACATGCcaG-CATTaa | 16.52 | RNA Bulge
chr8 | 138491414 | TTTAGTGGGaAcagTctgGtcatgagt | 14.80 | No
chr16 | 61917098 | TTTGGTGGGCAACATGCTataCAaaaT | 12.36 | No
chr10 | 56138671 | TCTGGTGGaCAACATGCTGaTCAaagg | 11.54 | No
chr8 | 54163354 | CTTGGTGGGCAACtcGcTGGTCATgtT | 11.22 | No
chr6 | 137588270 | TTTGGTGGGgAACATaCaaGTCATatT | 9.30 | No
chr20 | 43657503 | TTTGGTGGGCAAgcTaCTtaTacggag | 9.04 | No
chr8 | 24661222 | TTTAGTGGGCAAacTatTGaaaAgata | 8.63 | No
chr6 | 127930554 | TTTGGTGGGCAACtctaTtaTtgTatc | 8.26 | No
chr7 | 62666896 | TTTAGTGGGCAAtcTaCTGGaaggaag | 6.91 | No
chrX | 65962874 | TTTGGTGGGCAAgcTatTaaTgATtgc | 6.07 | No
chr19 | 44648031 | GTTAGTGGGCAACATaCTGtaaAgacc | 5.79 | No
chr2 | 78618092 | TTTGGTGGGCAACtTttTatTgtTgCT | 5.59 | No
chr3 | 46399210 | TTTTGTGGGCAACATGCTGGTCgTCCT | 5.17 | No
chr15 | 58588554 | TTTAGTGGGaAACtT-CTGGTCATaCa | 5.11 | RNA Bulge
chr4 | 110395952 | TTTAGTGGGCAAaccatTtacaAaata | 4.19 | No
chr1 | 72141686 | TTTGGTaGGtAACATGgTGGaagTCaa | 4.18 | No
chr15 | 24068708 | TTTTGTGGGCAACATatataTaggtcT | 3.81 | No
chr5 | 56998240 | TTTAGTGGGCAACtgtaTttagAaatc | 2.60 | No

AsCpf1_CCR5-9
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr3 | 44394779 | TTTAGCCTGAATAATattcaaTtgTCT | 35.07 | No
chr15 | 35754229 | TTTGGCCTGAATAAcaatAtacatgtT | 14.46 | No
chr6 | 3563258 | TTTGGtCTGAATAATTtCAGTAGCTCT | 11.91 | No
chr12 | 58841086 | TTTAGCCTGAATAATTaCAtTtaaTaa | 11.58 | No
chr1 | 144014812 | TTTTGCCTGAATgATTGCAGTAttTac | 10.39 | No
chr9 | 37289591 | TTTGGaCTGAATtaTTGCAGTAacatT | 7.57 | No
chr15 | 66090016 | TTTAGCCTGAAattTTGCAGTAGtcaT | 6.55 | No
chr7 | 35337090 | TTTAGCCTGAATAATattccattgccT | 6.37 | No
chr7 | 13593045 | TTTAGCCTGAATAAcattgtattgTgT | 5.59 | No
chr4 | 98108485 | TTTGCCCTGAATAATTGCAGcataatT | 5.47 | No
chr8 | 74162570 | TTTAGCCTGAATAtTatAtaTtatcaT | 4.86 | No
chr3 | 46415212 | TTTGGCCTGAATAATTGCAGTAGCTCT | 4.42 | No
chr5 | 59993666 | TTTAGCCTGAATAtTatttGTtaggga | 3.73 | No
chr15 | 95848470 | TTTGGCCTGAATtATattacTtAGTCa | 3.41 | No
chr4 | 108769431 | TTTAGCCTGAATAATaatAcTgcaTta | 3.16 | No

AsCpf1_HPRT1-1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr11 | 93732153 | TTTGCTGACCTGCTaGATaACATCAAA | 48.57 | No
chr5 | 30248702 | TTTGCTcACCTGCTGGATTACATCAAA | 27.49 | No
chr6 | 49794715 | TTTCCTGACCTGCTatATatatacacAA | 8.55 | No
chrX | 133609322 | TTTGCTGACCTGCTGGATTACATCAAA | 6.67 | No

AsCpf1_HPRT1-4
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chrX | 133620495 | TTTATGTCCCCTGTTGACTGGTCATTC | 12.93 | No
chr11 | 93732073 | TTTATaTCCCCTGTTGACTGGTCATTa | 7.92 | No
chr5 | 161040022 | TTTATGTCCCCTcTTGcCTGGTCATaa | 4.46 | No

AsCpf1_AAVS1
Chromosome | Location | DNA sequence at cleavage site | DNA cleavage score | Bulge
chr19 | 55626945 | TTTGCTTACGATGGAGCCAGAGAGGAT | 14.05 | No
chr1 | 182157128 | TTTACTTA-GATGaAGCCAcAGAGGcc | 11.23 | RNA Bulge
chr4 | 96317123 | TTTCCTTAtGATGaAGCCAGAGAaGcT | 3.76 | No
chr12 | 31744997 | TTTACTTA-GATGGAGaCAGAGtctcc | 4.29 | RNA Bulge

As shown in FIGS. 21A and 21B and Tables 18 to 33, sequence reads corresponding to on-target and off-target in vitro cleavage sites were confirmed to align uniformly rather than randomly. Cpf1 showed high specificity, cleaving genomic DNA in vitro at 1 to 46 sites, including the on-target site. The number of in vitro cleavage sites (Digenome-captured sites) was 6 ± 3 for LbCpf1 and 12 ± 5 for AsCpf1, which is significantly lower than the 90 ± 30 obtained for SpCas9 in our previous study.

FIGS. 22A to 22F show sequence logos of Cpf1-mediated Digenome-captured sites; the logos obtained using AsCpf1 are shown at the top and those obtained using LbCpf1 at the bottom. As shown in FIGS. 22A to 22F, the 50 and 98 in vitro cleavage sites obtained using the eight LbCpf1 and AsCpf1 nucleases, respectively, carry mismatches that are located mostly in the PAM-distal region, about 13 nt from the PAM sequence, rather than in the PAM-proximal region.

Of the 50 sites cleaved by the eight LbCpf1 nucleases, 46 were also cleaved by AsCpf1. Four sites lacked one nucleotide compared with their respective on-target sites, which can potentially produce an RNA bulge in the DNA-crRNA duplex region. LbCpf1 and AsCpf1 cleaved 6 and 4 sites, respectively, that contained non-canonical PAM sequences such as 5'-TCTN-3' and 5'-TTCN-3'. All 8 on-target sites and the 8 off-target sites identified by deep sequencing above were captured by Digenome-seq.
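The distinction between canonical and non-canonical PAM sequences can be illustrated with a short sketch. It assumes, as in the cleavage-site tables above, that the first four nucleotides of each listed site are the PAM and that the canonical Cpf1 PAM is 5'-TTTN-3'. The function names are hypothetical and are not part of the described method.

```python
# Illustrative sketch: label the PAM of a Digenome-captured site.
# Assumption: the first 4 nt of each site are the PAM and 5'-TTTN-3' is canonical.

def matches_pattern(pattern: str, seq: str) -> bool:
    """Return True if seq matches pattern; 'N' in the pattern matches any base."""
    seq = seq.upper()
    return len(pattern) == len(seq) and all(
        p == "N" or p == s for p, s in zip(pattern, seq)
    )

def classify_pam(site: str) -> str:
    """Classify the 4-nt PAM at the 5' end of a site sequence."""
    pam = site[:4]
    if matches_pattern("TTTN", pam):
        return "canonical (TTTN)"
    for pattern in ("TCTN", "TTCN"):
        if matches_pattern(pattern, pam):
            return f"non-canonical ({pattern})"
    return "non-canonical (other)"

# Sites copied from the AsCpf1 DNMT1-3 and EMX1-2 tables above.
examples = [
    "TTTCCTGATGGTCCATGTCTGTTACTC",  # chr19 10244446 (DNMT1-3)
    "TCTCCTGATGGTCCATacCTGTTAaca",  # chr7 134517009 (DNMT1-3)
    "TTCATCCTCCGGTTCTGGAACCAgAtC",  # chr2 177017500 (EMX1-2)
]
for site in examples:
    print(site[:4], classify_pam(site))
```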

The results obtained are shown in FIG. 21C. As shown in FIG. 21C, only 0.9% of the homologous sites with 5 or 6 mismatches identified using Cas-OFFinder (see Fast and versatile algorithm for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics. 2014 May 15; 30(10): 1473-5) were cleaved in vitro. Homologous sites with four or fewer mismatches were more likely to be cleaved and were therefore captured by Digenome-seq, but such sites are rare in the human genome (6 ± 2 such sites per crRNA).

The genome-wide specificities of LbCpf1 and AsCpf1 were measured by Digenome-seq (see Example 4) at two overlapping sites in the DNMT1 locus and compared with that of SpCas9 (see FIGS. 21A and 21B). The genome-wide distribution plot of in vitro cleavage sites in FIG. 21A shows that Cas9 and Cpf1 cleave chromosomal DNA at very different sites. De novo motifs, or sequence logos, obtained by comparing the DNA sequences at in vitro cleavage sites show that LbCpf1 has higher specificity than AsCpf1 or Cas9 (see FIG. 21A). Both LbCpf1 and AsCpf1 targeting the DNMT1-4 site were shown to cleave only the on-target site in the entire human genome (see FIGS. 21B and 23). FIG. 23 shows sequence logos of Digenome-captured sites, obtained through WebLogo (http://weblogo.berkeley.edu/logo.cgi) using the Digenome-captured sites; for DNMT1-4, only the single on-target site was captured with LbCpf1 and AsCpf1.

The in vitro cleavage sites identified by Digenome-seq were validated in HEK293 cells by targeted deep sequencing. The indel frequency at most of the validated off-target sites was less than 1% (see FIGS. 21D and 24A to 24F), which is very low compared with the indel frequency at the corresponding on-target sites. FIG. 21D is a graph showing the off-target sites validated in human cells by targeted deep sequencing, together with the DNA sequences of the on-target and off-target sites (the PAM sequence is shown in bold and mismatched nucleotides in lower case). FIGS. 24A to 24F are graphs showing the indel frequencies at the Digenome-captured sites in HEK293T17 cells. The dark bars show results obtained in HEK293T17 cells transfected with the LbCpf1 plasmid, and the light bars show results obtained in HEK293T17 cells transfected with the AsCpf1 plasmid.
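The lower-case convention for mismatched nucleotides lends itself to a quick tally. The sketch below assumes that the cleavage-site tables use the same notation as FIG. 21D (lower case for a mismatched nucleotide) and that '-' marks a nucleotide missing relative to the on-target site. The helper name is hypothetical.

```python
# Illustrative sketch: count mismatches and gaps in a site written in the
# notation assumed above (lower case = mismatch, '-' = missing nucleotide).

def count_mismatches(site: str) -> dict:
    """Return the number of lower-case (mismatched) bases and '-' gaps."""
    return {
        "mismatches": sum(1 for base in site if base.islower()),
        "gaps": site.count("-"),
    }

# Sequences copied from the AsCpf1 DNMT1-3 and LbCpf1 AAVS1 tables above.
print(count_mismatches("TTTCCTGATGGTCCAcacCTGTTAaca"))
print(count_mismatches("TTTCCaTACaATGGAGCCAGAGA-GAT"))
```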

To quantify the genome-wide off-target effect, the off-target effect index (OTI) was calculated as the ratio of the sum of the indel frequencies at the validated off-target sites to the on-target indel frequency. At the two DNMT1 sites (DNMT1-3 and DNMT1-4), the OTI of LbCpf1 was 0.005 and 0.012, respectively, and the OTI of AsCpf1 was 0.267 and 0.024, respectively. These results suggest that the off-target effect is site-dependent and that LbCpf1 has relatively high specificity compared with AsCpf1. By contrast, in a previous study by the present inventors, the OTI of Cas9 at the two sites was found to be > 2.0.
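The OTI definition given above amounts to a single division, sketched here for concreteness. The function name and the indel frequencies in the example are hypothetical and are not measured values from this document.

```python
# Illustrative sketch of the off-target effect index (OTI):
# OTI = (sum of indel frequencies at validated off-target sites)
#       / (on-target indel frequency)

def off_target_index(on_target_indel, off_target_indels):
    """Compute the OTI from indel frequencies expressed in the same unit (e.g. %)."""
    if on_target_indel <= 0:
        raise ValueError("on-target indel frequency must be positive")
    return sum(off_target_indels) / on_target_indel

# Hypothetical indel frequencies (%), chosen only to show the arithmetic.
on_target = 40.0
off_targets = [0.1, 0.06, 0.04]
print(round(off_target_index(on_target, off_targets), 4))
```

A lower OTI means that less of the total editing activity falls on off-target sites relative to the on-target site.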

To exclude the possibility that the indel frequencies at these validated off-target sites were lowered by local chromatin accessibility, new crRNAs whose sequences fully match the off-target sites were tested by transfection (see FIG. 21E). FIG. 21E is a graph showing targeted mutagenesis (indel frequency (%)) obtained at AsCpf1 off-target sites using crRNAs redesigned to hybridize to the off-target sites. Each off-target-specific crRNA induced indels at its corresponding site, but not at the on-target site. As shown in FIG. 21E, the OT6 site contains a non-canonical 5'-TCTN-3' PAM sequence, and the crRNAs specific for OT6 and OT12 (which differ by only one nucleotide at the 3' end) induced indels at frequencies of 3.7% and 8.1%, respectively. These results show that Cpf1 can also cleave chromosomal target sites with non-canonical PAM sequences, thereby extending the range of Cpf1-mediated genome editing.

Example 10: Off-target effect test using RNP

To avoid or reduce off-target effects, preassembled Cpf1 RNPs were tested by transfection into human cells. Like Cas9 RNPs, Cpf1 RNPs cleave the target site immediately after transfection and are then degraded by endogenous proteases and ribonucleases in the cell, which is expected to reduce off-target effects without reducing on-target effects. Indeed, Cpf1 RNPs did not induce indels above the noise level at some of the off-target sites validated using plasmids (see FIG. 21F).

FIG. 21F is a graph showing the Cpf1 off-target effect when a plasmid encoding Cpf1 and crRNA is used and when an RNP in which Cpf1 and crRNA are complexed is used. The specificity ratio (RNP/plasmid) is obtained by dividing the ratio of the on-target indel frequency to the off-target indel frequency obtained using Cpf1 RNP by the corresponding ratio obtained using the plasmid. Based on the results of FIG. 21F, the OTI was lower than 0.0004 (< 0.0004) in both cases where AsCpf1 RNP and LbCpf1 RNP were used. These results show that these RNPs have little off-target effect.
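A minimal sketch of the specificity ratio follows. It rests on one reading of the description of FIG. 21F, namely that the on-target to off-target indel ratio obtained with RNP is divided by the same ratio obtained with plasmid; that reading is an assumption. The function name and all frequencies are hypothetical.

```python
# Illustrative sketch, assuming:
# specificity ratio = (on/off indel ratio with RNP) / (on/off indel ratio with plasmid)

def specificity_ratio(rnp_on, rnp_off, plasmid_on, plasmid_off):
    """Return how many times more specific RNP delivery is than plasmid delivery."""
    if min(rnp_on, rnp_off, plasmid_on, plasmid_off) <= 0:
        raise ValueError("all indel frequencies must be positive")
    rnp_ratio = rnp_on / rnp_off              # on-target vs off-target with RNP
    plasmid_ratio = plasmid_on / plasmid_off  # on-target vs off-target with plasmid
    return rnp_ratio / plasmid_ratio

# Hypothetical indel frequencies (%), not taken from this document.
print(specificity_ratio(rnp_on=30.0, rnp_off=0.5, plasmid_on=40.0, plasmid_off=2.0))
```

Under this reading, a value above 1 indicates that RNP delivery improves the on-target to off-target balance relative to plasmid delivery.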

Example 11: Measurement of off-target effect using crRNA truncated at the 3' end

The off-target effect of crRNAs truncated at the 3' end (tru-crRNAs) was tested.

The crRNAs truncated at the 3' end (tru-crRNAs) were designed by shortening the targeting sequence of the crRNA from its 3' end, so that the targeting sequence lengths were 22 nt, 20 nt, 18 nt, and 16 nt, respectively. Specifically, the targeting sequences of the tru-crRNAs correspond to the consecutive 22 nt, 20 nt, 18 nt, and 16 nt located immediately adjacent, in the 3' direction, to the PAM sequence (5'-TTTC-3') of the DNMT1-3 target site of SEQ ID NO: 29 (TTTC CTGATGGTCCATGTCTGTTACTC), with T replaced by U. Each tru-crRNA, or the full-length crRNA (whose targeting sequence is the 23-nt sequence of SEQ ID NO: 29 excluding the PAM sequence, with T replaced by U), was transfected into HEK293T cells together with the AsCpf1 expression plasmid using Lipofectamine 2000. After 72 hours, genomic DNA was isolated and indel frequencies at the on-target and off-target sites were measured by targeted deep sequencing.
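The truncation scheme described above can be sketched as follows. The code derives only the targeting sequence (the part that hybridizes to the target), not the full crRNA. The constant and function names are hypothetical; the sequence is the DNMT1-3 target site of SEQ ID NO: 29 quoted above.

```python
# Illustrative sketch: derive full-length and 3'-truncated targeting sequences
# from the DNMT1-3 target site (SEQ ID NO: 29: TTTC + 23-nt target).

PAM = "TTTC"
TARGET_DNA = "CTGATGGTCCATGTCTGTTACTC"  # 23 nt immediately 3' of the PAM

def targeting_sequence(target_dna: str, length: int) -> str:
    """Keep the PAM-proximal `length` nt (i.e. truncate from the 3' end)
    and write the result as RNA by replacing T with U."""
    if not 0 < length <= len(target_dna):
        raise ValueError("length must be between 1 and the target length")
    return target_dna[:length].upper().replace("T", "U")

for n in (23, 22, 20, 18, 16):
    print(n, targeting_sequence(TARGET_DNA, n))
```

Because the truncation removes nucleotides from the PAM-distal end, every shortened sequence is a prefix of the full-length 23-nt targeting sequence.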

The results obtained are shown in FIG. 25. As shown in FIG. 25, when the tru-crRNAs were used, the off-target effect was reduced to about 1/10. This reduction in off-target effect is expected to be more pronounced when the off-target site contains a mismatched nucleotide at the PAM-distal 3' end.

Example 12: Identification of cleavage ends by Cpf1

When the Digenome-seq assay described in Example 4 is used, the Integrative Genomics Viewer (IGV) makes it easy to visualize overhang patterns at cleavage sites.

FIG. 26A is a representative Integrative Genomics Viewer (IGV; http://software.broadinstitute.org/software/igv/) image showing overhang patterns at the DNMT1-3 target site (SEQ ID NO: 19) and the DNMT1-4 target site. LbCpf1 generally produced a 3-nt overhang at the 5' end of the cleavage site but not a 2-nt overhang, whereas AsCpf1 produced 2-nt to 4-nt overhangs at the 5' end of the cleavage site. Cas9 produced a blunt end or a 1-nt overhang at the 5' end of the cleavage site.

Whether the different overhang patterns generated at the DNMT1-3 target site (SEQ ID NO: 19) and the DNMT1-4 target site (SEQ ID NO: 20) as described above cause different mutation characteristics was then tested.

FIG. 26B is a graph showing the number of mutant sequence reads binned by deletion/insertion size in base pairs. FIG. 26C shows mutant sequences derived from the target sites of Cpf1 or Cas9. For each nuclease, the sequence in the first row is the original target sequence and the sequences from the second row onward are mutant sequences. In the first row, the PAM sequence (Cpf1: TTTC) is shown in bold and the target sequence hybridizing with the crRNA/sgRNA is underlined; underlined sequences from the second row onward indicate microhomology sequences, and the numbers on the right indicate the number of deleted (denoted by '-') or inserted (denoted by lower case) nucleotides.

FIGS. 27A and 27B show the mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9.

FIG. 27A is a graph showing the number of mutant sequence reads binned by deletion/insertion (indel) size in base pairs. The mutation characteristics were measured by targeted deep sequencing in HEK293T cells transfected with the LbCpf1, AsCpf1, or SpCas9 plasmid, respectively.

FIG. 27B shows mutant sequences derived from the EMX1-2 target site (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 42). For each nuclease, the sequence in the first row is the original target sequence and the sequences from the second row onward are mutant sequences. In the first row, the PAM sequence (Cpf1: TTTG) is shown in bold and the target sequence hybridizing with the crRNA/sgRNA is underlined; underlined sequences from the second row onward indicate microhomology sequences, and the numbers on the right indicate the number of deleted (denoted by '-') or inserted (denoted by lower case) nucleotides.

LbCpf1, AsCpf1, and Cas9 induced rather different mutant sequences, although some microhomology was found at the deletion junctions. With the Cpf1 nucleases, insertion or deletion of a single nucleotide was rare, whereas with Cas9 it can be the dominant mutation pattern. These results show that the differences between Cpf1 and Cas9 in cleavage sites and overhang patterns cause different mutation characteristics.

FIGS. 26D and 26E show mutation characteristics induced by LbCpf1, AsCpf1, and SpCas9. FIG. 26E is a graph showing the fractions of in-frame and out-of-frame indels (data represent mean ± s.e.m.; n = 10 target sites).

As shown in FIG. 26D, unlike Cas9, Cpf1 induced almost no insertion mutations. Also, as shown in FIG. 26E, the proportion of in-frame mutations caused by deletions of 3 nt, 6 nt, and 9 nt was higher with Cpf1 than with Cas9. These results suggest that, when inactivating protein-coding genes, selecting target sites based on microhomology is more important with Cpf1 than with Cas9.
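The in-frame versus out-of-frame distinction follows from codon length: an indel preserves the reading frame when its size is a multiple of 3 (for example, the 3-nt, 6-nt, and 9-nt deletions mentioned above). The sketch below illustrates the classification. The function names and the list of indel sizes are hypothetical, not data from this document.

```python
# Illustrative sketch: classify indels as in-frame or out-of-frame by size.
# Convention assumed here: negative size = deletion, positive size = insertion.

def is_in_frame(indel_size_bp: int) -> bool:
    """An indel is in-frame if its size is a non-zero multiple of 3."""
    return indel_size_bp != 0 and indel_size_bp % 3 == 0

def in_frame_fraction(indel_sizes) -> float:
    """Fraction of indels (size != 0) that preserve the reading frame."""
    sizes = [s for s in indel_sizes if s != 0]
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if is_in_frame(s)) / len(sizes)

# Hypothetical indel sizes from a set of mutant reads.
sizes = [-3, -6, -9, -1, -7, 1, -12, -4]
print(in_frame_fraction(sizes))
```

In-frame indels are less likely to disrupt the encoded protein, which is why a higher in-frame fraction makes target-site choice more important for gene inactivation.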

Example 13: Induction of target-site-specific nucleotide sequence mutations by delivering Cpf1 and crRNA as RNP into mouse embryos by microinjection

To date, there has been no report of mutant mice generated by microinjecting Cpf1 RNP into mouse embryos.

Recombinant Acidaminococcus sp. BV3L6 Cpf1 (AsCpf1) protein was expressed in and purified from E. coli (see Example 1), crRNAs targeting the mouse gene FoxN1 were constructed (see SEQ ID NOS: 1 to 3), and RNP was prepared from them (protein 200 ng/μl, crRNA 100 ng/μl). The crRNA was prepared by the method described in Table 4 based on the target sequence of SEQ ID NO: 2 and SEQ ID NO:

The RNP thus prepared was delivered into mouse embryos by microinjection (see FIG. 1), the injected embryos were cultured to the blastocyst stage, and gDNA was purified to examine nucleotide sequence mutations. The results of the T7E1 assay are shown in FIG. 2. As shown in FIG. 2, nucleotide sequence mutations were found in 10 of 12 blastocysts (83%) (marked with an asterisk).

Targeted deep sequencing was performed to confirm that the mutations were specifically induced in the sequence targeted by the crRNA, and the results are shown in FIG. These results demonstrate that microinjection of AsCpf1 RNP is an efficient method for genome editing in animals.

In addition, an embryo was edited using Cpf1 RNP, and the resulting individual was examined to determine whether the nucleotide sequence mutation was specific and whether non-specific mutations had occurred. gDNA was purified from the tail of this mouse, the presence of a mutation at the specific site was confirmed by the T7E1 assay and targeted deep sequencing (see FIGS. 4 and 5), and whole genome sequencing (WGS) was performed (see FIG. 6). Comparison of the WGS data with the reference genome revealed that no non-specific mutations had occurred and that genome editing had taken place only at the specific sequence (see FIG. 6).

Example 14: Delivery of Cpf1 and Cas9 RNPs into mouse embryos by electroporation

Because Cpf1 RNP delivery by microinjection requires mouse embryos to be processed one by one, and the experiment must be completed within the few hours during which the embryo remains at the 1-cell stage, this method has the disadvantage that the number of embryos that can be processed at one time is limited by the number of experimenters and injection devices.

To overcome this problem, electroporation, which can treat many embryos at once, was applied to recombinant Streptococcus pyogenes Cas9 (SpCas9) and AsCpf1 proteins to establish a method of genome editing in mouse embryos (see FIG. 7). In this example, recombinant AsCpf1 or SpCas9 protein (100 ng/μl) and sgRNA (500 ng/μl; prepared with reference to Table 5 based on the VEGFA target sequence) or crRNA (... ng/μl; prepared according to the description of Table 4 based on the target sequence of SEQ ID NO: 2 or 3) were diluted in Opti-MEM (Thermo) medium to prepare RNP. Fifty mouse embryos were placed in this solution and electroporation was carried out using a NEPA21 (NEPA GENE Co. Ltd) electroporator.

Electroporation was performed with a poring pulse (225 V, 1.5 ms, interval 50 ms, 4 times, decay rate 10%, polarity +) and a transfer pulse (20 V, 50 ms, interval 50 ms, 5 times, decay rate 40% -). SpCas9 was tried first: RNP was prepared with SpCas9 and an sgRNA targeting VEGFA and electroporated into mouse embryos. The embryos were cultured to the blastocyst stage, gDNA was purified, and mutations were analyzed by the T7E1 assay and targeted deep sequencing (see FIGS. 8 and 9).

As shown in FIGS. 8 and 9, blastocyst analysis showed that efficient genome editing occurred when SpCas9 was delivered by electroporation (mutations were confirmed in 12 of 15 blastocysts, that is, in all lanes except lanes 8, 13, and 15; 80% efficiency).

In the same manner, when AsCpf1 RNP targeting FoxN1 exon 7 was delivered into mouse embryos by electroporation, blastocyst analysis by targeted deep sequencing confirmed that efficient genome editing occurred (16 out of 25, 64%).

Example 15: Induction of specific nucleotide sequence mutations in plants by delivering Cpf1 RNP using polyethylene glycol (PEG)

Until now, there has been no report on the use of Cpf1 RNP for plant genome editing. This example describes a method for editing a plant genome using recombinant AsCpf1 and Lachnospiraceae bacterium ND2006 Cpf1 (LbCpf1), and presents a method for producing and using plants in which the FAD2 homologous genes of soybean (Glycine max) are knocked out. For this purpose, target sequences of AsCpf1 and LbCpf1 that simultaneously recognize both FAD2 homologous genes of soybean (Glyma10g42470 and Glyma20g24530) were obtained. The target sequences thus obtained are shown in Table 34 below:

PAM and target sequences for FAD2 homologous genes Glyma10g42470 and Glyma20g24530
No. | PAM | Target sequence | SEQ ID NO
1 | TTTC | TACATTGCCACCACCTACTTCC | TTTC-SEQ ID NO: 7
2 | TTTC | CCTCATTGCATGGCCAATCTAT | TTTC-SEQ ID NO: 8
3 (LbCpf1) | TTTA | GTCCCTTATTTCTCATGGAAAA | TTTA-SEQ ID NO: 9
4 | TTTC | TCATGGAAAATAAGCCATCGCC | TTTC-SEQ ID NO: 10
5 | TTTG | TCCCAAAACCAAAATCCAAAGT | TTTG-SEQ ID NO: 11
6 | TTTG | GCTGCTATGTGTTTATGGGGTG | TTTG-SEQ ID NO: 12
7 | TTTG | GCAACTATGGACAGAGATATG | TTTG-SEQ ID NO: 13
8 | TTTG | ATGACACACCATTTTACAAGGC | TTTG-SEQ ID NO: 14
9 (AsCpf1) | TTTA | CAAGGCACTGTGGAGAGAAGC | TTTA-SEQ ID NO: 15

(PAM sequence is shown in bold)

Based on the obtained target sequences, crRNAs were prepared by the method described in Table 4.

RNP was delivered into plant cells by mixing a 40% polyethylene glycol (PEG) solution (300 μl; PEG 4000, 0.2 M mannitol, and 0.1 M CaCl2) with the same volume of an MMG solution (0.4 M mannitol, 15 mM MgCl2) containing plant protoplasts (2 × 10^5 protoplasts) and premixed recombinant AsCpf1 (or LbCpf1) protein (40 μg per 2 × 10^5 protoplasts) and crRNA (80 μg per 2 × 10^5 protoplasts).

The treated plant protoplasts were cultured for 24 hours in W5 solution (2 mM MES [pH 5.7], 154 mM NaCl, 125 mM CaCl2, 5 mM KCl), and gDNA was then isolated to confirm whether genetic modification had occurred in the target genes. By applying this method, plant cells in which the two homologous FAD2 genes were knocked out were produced, as confirmed by targeted deep sequencing (see FIG. 12). Sequence analysis also confirmed that the nucleotide sequence mutations occurred at the site in the target gene predicted to be cleaved by Cpf1 (see FIG. 13).

Example 16: Genome editing using Split-Cpf1

16.1. Construction of Split-Cpf1

The Cpf1 protein is a next-generation programmable nuclease ("gene scissors") that has been shown to be more target-specific than previously used artificial nucleases, and it has been attracting attention as a tool for designing genetic modifications in eukaryotic cells and organisms. Despite being such a useful tool, the gene encoding the Cpf1 protein is so large that delivery of Cpf1 into cells using viral vectors is inefficient, which is a stumbling block to the application of Cpf1 technology. Viral vectors have packaging limits, and it is well known that when a gene exceeding the packaging limit is loaded, both virus production efficiency and intracellular delivery efficiency decrease.

To solve this problem, a Split-Cpf1 system was constructed in this example. The wild-type (WT) AsCpf1 protein (SEQ ID NO: 43) consists of 1,307 amino acids (see FIG. 29A). Because an expression cassette to be transferred to a viral vector must contain, in addition to the AsCpf1 coding sequence, the promoter sequence required for protein expression (CMV promoter: SEQ ID NO: 64), the nuclear localization signal required for transfer into the cell nucleus (KRPAATKKAGQAKKKK), and a poly A signal, the AsCpf1 protein was divided into two fragments as a way to reduce the size of the expression cassette, and four kinds of Split-AsCpf1 were designed.

Each Split-AsCpf1 was obtained by dividing WT AsCpf1 (SEQ ID NO: 43) into two fragments: Split-1-AsCpf1 between the 901st and 902nd amino acids, Split-2-AsCpf1 between the 886th and 887th amino acids, Split-3-AsCpf1 between the 399th and 400th amino acids, and Split-4-AsCpf1 between the 526th and 527th amino acids (see FIG. 29A).
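The four split positions determine the residue ranges and lengths of the resulting half-domains, as the following sketch shows. The split positions and the total length of 1,307 amino acids come from the text above; the dictionary and function names are hypothetical.

```python
# Illustrative sketch: residue ranges of the two half-domains for each Split-AsCpf1.

FULL_LENGTH = 1307  # amino acids in WT AsCpf1 (SEQ ID NO: 43)

# Last residue of half-domain 1 for each split design.
SPLIT_AFTER = {
    "Split-1-AsCpf1": 901,
    "Split-2-AsCpf1": 886,
    "Split-3-AsCpf1": 399,
    "Split-4-AsCpf1": 526,
}

def half_domains(split_after: int, full_length: int = FULL_LENGTH):
    """Return 1-based inclusive residue ranges (domain 1, domain 2)."""
    if not 0 < split_after < full_length:
        raise ValueError("split position must lie inside the protein")
    return (1, split_after), (split_after + 1, full_length)

for name, position in SPLIT_AFTER.items():
    (s1, e1), (s2, e2) = half_domains(position)
    print(f"{name}: domain 1 = aa {s1}-{e1} ({e1 - s1 + 1} aa), "
          f"domain 2 = aa {s2}-{e2} ({e2 - s2 + 1} aa)")
```

For example, Split-1-AsCpf1 yields fragments of 901 and 406 amino acids, each smaller than the 1,307-residue parent protein, which is the point of the design.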

The resulting half-domains are summarized in the following Table 35:

Domain 1 Domain 2 Split-1-AsCpf1 (Aa 1-901 of SEQ ID NO: 43) MTQFEGFTNL YQVSKTLRFE LIPQGKTLKH IQEQGFIEED KARNDHYKEL KPIIDRIYKT YADQCLQLVQ LDWENLSAAI DSYRKEKTEE TRNALIEEQA TYRNAIHDYF IGRTDNLTDA INKRHAEIYK GLFKAELFNG KVLKQLGTVT TTEHENALLR SFDKFTTYFS GFYENRKNVF SAEDISTAIP HRIVQDNFPK FKENCHIFTR LITAVPSLRE HFENVKKAIG IFVSTSIEEV FSFPFYNQLL TQTQIDLYNQ LLGGISREAG TEKIKGLNEV LNLAIQKNDE TAHIIASLPH RFIPLFKQIL SDRNTLSFIL EEFKSDEEVI QSFCKYKTLL RNENVLETAE ALFNELNSID LTHIFISHKK LETISSALCD HWDTLRNALY ERRISELTGK ITKSAKEKVQ RSLKHEDINL QEIISAAGKE LSEAFKQKTS EILSHAHAAL DQPLPTTLKK QEEKEILKSQ LDSLLGLYHL LDWFAVDESN EVDPEFSARL TGIKLEMEPS LSFYNKARNY ATKKPYSVEK FKLNFQMPTL ASGWDVNKEK NNGAILFVKN GLYYLGIMPK QKGRYKALSF EPTEKTSEGF DKMYYDYFPD AAKMIPKCST QLKAVTAHFQ THTTPILLSN NFIEPLEITK EIYDLNNPEK EPKKFQTAYA KKTGDQKGYR EALCKWIDFT RDFLSKYTKT TSIDLSSLRP SSQYKDLGEY YAELNPLLYH ISFQRIAEKE IMDAVETGKL YLFQIYNKDF AKGHHGKPNL HTLYWTGLFS PENLAKTSIK LNGQAELFYR PKSRMKRMAH RLGEKMLNKK LKDQKTPIPD TLYQELYDYV NHRLSHDLSD EARALLPNVI TKEVSHEIIK DRRFTSDKFF FHVPITLNYQ AANSPSKFNQ RVNAYLKEHP E (Aa 902-1307 of SEQ ID NO: 43) TPIIGIDRGE RNLIYITVID STGKILEQRS LNTIQQFDYQ KKLDNREKER VAARQAWSVV GTIKDLKQGY LSQVIHEIVD LMIHYQAVVV LENLNFGFKS KRTGIAEKAV YQQFEKMLID KLNCLVLKDY PAEKVGGVLN PYQLTDQFTS FAKMGTQSGF LFYVPAPYTS KIDPLTGFVD PFVWKTIKNH ESRKHFLEGF DFLHYDVKTG DFILHFKMNR NLSFQRGLPG FMPAWDIVFE KNETQFDAKG TPFIAGKRIV PVIENHRFTG RYRDLYPANE LIALLEEKGI VFRDGSNILP KLLENDDSHA IDTMVALIRS VLQMRNSNAA TGEDYINSPV RDLNGVCFDS RFQNPEWPMD ADANGAYHIA LKGQLLLNHL KESKDLKLQN GISNQDWLAY IQELRN (Coding DNA sequence) SEQ ID NO: 83 (Coding DNA sequence) SEQ ID NO: 84 Split-2-AsCpf1 (1-886 aa of SEQ ID NO: 43) MTQFEGFTNL YQVSKTLRFE LIPQGKTLKH IQEQGFIEED KARNDHYKEL KPIIDRIYKT
YADQCLQLVQ LDWENLSAAI DSYRKEKTEE TRNALIEEQA TYRNAIHDYF IGRTDNLTDA INKRHAEIYK GLFKAELFNG KVLKQLGTVT TTEHENALLR SFDKFTTYFS GFYENRKNVF SAEDISTAIP HRIVQDNFPK FKENCHIFTR LITAVPSLRE HFENVKKAIG IFVSTSIEEV FSFPFYNQLL TQTQIDLYNQ LLGGISREAG TEKIKGLNEV LNLAIQKNDE TAHIIASLPH RFIPLFKQIL SDRNTLSFIL EEFKSDEEVI QSFCKYKTLL RNENVLETAE ALFNELNSID LTHIFISHKK LETISSALCD HWDTLRNALY ERRISELTGK ITKSAKEKVQ RSLKHEDINL QEIISAAGKE LSEAFKQKTS EILSHAHAAL DQPLPTTLKK QEEKEILKSQ LDSLLGLYHL LDWFAVDESN EVDPEFSARL TGIKLEMEPS LSFYNKARNY ATKKPYSVEK FKLNFQMPTL ASGWDVNKEK NNGAILFVKN GLYYLGIMPK QKGRYKALSF EPTEKTSEGF DKMYYDYFPD AAKMIPKCST QLKAVTAHFQ THTTPILLSN NFIEPLEITK EIYDLNNPEK EPKKFQTAYA KKTGDQKGYR EALCKWIDFT RDFLSKYTKT TSIDLSSLRP SSQYKDLGEY YAELNPLLYH ISFQRIAEKE IMDAVETGKL YLFQIYNKDF AKGHHGKPNL HTLYWTGLFS PENLAKTSIK LNGQAELFYR PKSRMKRMAH RLGEKMLNKK LKDQKTPIPD TLYQELYDYV NHRLSHDLSD EARALLPNVI TKEVSHEIIK DRRFTSDKFF FHVPITLNYQ AANSPS
(Aa 887-1307 of SEQ ID NO: 43) KFNQRVNAYL KEHPETPIIG IDRGERNLIY ITVIDSTGKI LEQRSLNTIQ QFDYQKKLDN REKERVAARQ AWSVVGTIKD LKQGYLSQVI HEIVDLMIHY QAVVVLENLN FGFKSKRTGI AEKAVYQQFE KMLIDKLNCL VLKDYPAEKV GGVLNPYQLT DQFTSFAKMG TQSGFLFYVP APYTSKIDPL TGFVDPFVWK TIKNHESRKH FLEGFDFLHY DVKTGDFILH FKMNRNLSFQ RGLPGFMPAW DIVFEKNETQ FDAKGTPFIA GKRIVPVIEN HRFTGRYRDL YPANELIALL EEKGIVFRDG SNILPKLLEN DDSHAIDTMV ALIRSVLQMR NSNAATGEDY INSPVRDLNG VCFDSRFQNP EWPMDADANG AYHIALKGQL LLNHLKESKD LKLQNGISNQ DWLAYIQELR N
(Coding DNA sequence) SEQ ID NO: 85 (Coding DNA sequence) SEQ ID NO: 86 Split-3-AsCpf1 (A.a. 1-399 of SEQ ID NO: 43) MTQFEGFTNL YQVSKTLRFE LIPQGKTLKH IQEQGFIEED KARNDHYKEL KPIIDRIYKT YADQCLQLVQ LDWENLSAAI DSYRKEKTEE TRNALIEEQA TYRNAIHDYF IGRTDNLTDA INKRHAEIYK GLFKAELFNG KVLKQLGTVT TTEHENALLR SFDKFTTYFS GFYENRKNVF SAEDISTAIP HRIVQDNFPK FKENCHIFTR LITAVPSLRE HFENVKKAIG IFVSTSIEEV FSFPFYNQLL TQTQIDLYNQ LLGGISREAG TEKIKGLNEV LNLAIQKNDE TAHIIASLPH RFIPLFKQIL SDRNTLSFIL EEFKSDEEVI QSFCKYKTLL RNENVLETAE ALFNELNSID LTHIFISHKK LETISSALCD HWDTLRNALY ERRISELTG (Aa 400-1307 of SEQ ID NO: 43) KITKSAKEKV QRSLKHEDIN LQEIISAAGK ELSEAFKQKT SEILSHAHAA LDQPLPTTLK KQEEKEILKS QLDSLLGLYH LLDWFAVDES NEVDPEFSAR LTGIKLEMEP SLSFYNKARN YATKKPYSVE KFKLNFQMPT LASGWDVNKE KNNGAILFVK NGLYYLGIMP KQKGRYKALS FEPTEKTSEG FDKMYYDYFP DAAKMIPKCS TQLKAVTAHF QTHTTPILLS NNFIEPLEIT KEIYDLNNPE KEPKKFQTAY AKKTGDQKGY REALCKWIDF TRDFLSKYTK TTSIDLSSLR PSSQYKDLGE YYAELNPLLY HISFQRIAEK EIMDAVETGK LYLFQIYNKD FAKGHHGKPN LHTLYWTGLF SPENLAKTSI KLNGQAELFY RPKSRMKRMA HRLGEKMLNK KLKDQKTPIP DTLYQELYDY VNHRLSHDLS DEARALLPNV ITKEVSHEII KDRRFTSDKF FFHVPITLNY QAANSPSKFN QRVNAYLKEH PETPIIGIDR GERNLIYITV IDSTGKILEQ RSLNTIQQFD YQKKLDNREK ERVAARQAWS VVGTIKDLKQ GYLSQVIHEI VDLMIHYQAV VVLENLNFGF KSKRTGIAEK AVYQQFEKML IDKLNCLVLK DYPAEKVGGV LNPYQLTDQF TSFAKMGTQS GFLFYVPAPY TSKIDPLTGF VDPFVWKTIK NHESRKHFLE GFDFLHYDVK TGDFILHFKM NRNLSFQRGL PGFMPAWDIV FEKNETQFDA KGTPFIAGKR IVPVIENHRF TGRYRDLYPA NELIALLEEK GIVFRDGSNI LPKLLENDDS HAIDTMVALI RSVLQMRNSN AATGEDYINS PVRDLNGVCF DSRFQNPEWP MDADANGAYH IALKGQLL LN HLKESKDLKL QNGISNQDWL AYIQELRN (Coding DNA sequence) SEQ ID NO: 87 (Coding DNA sequence) SEQ ID NO: 88 Split-4-AsCpf1 (Aa 1-526 of SEQ ID NO: 43) MTQFEGFTNL YQVSKTLRFE LIPQGKTLKH IQEQGFIEED KARNDHYKEL KPIIDRIYKT YADQCLQLVQ LDWENLSAAI DSYRKEKTEE TRNALIEEQA TYRNAIHDYF IGRTDNLTDA INKRHAEIYK GLFKAELFNG KVLKQLGTVT TTEHENALLR SFDKFTTYFS GFYENRKNVF SAEDISTAIP HRIVQDNFPK FKENCHIFTR LITAVPSLRE HFENVKKAIG IFVSTSIEEV FSFPFYNQLL TQTQIDLYNQ LLGGISREAG 
TEKIKGLNEV LNLAIQKNDE TAHIIASLPH RFIPLFKQIL SDRNTLSFIL EEFKSDEEVI QSFCKYKTLL RNENVLETAE ALFNELNSID LTHIFISHKK LETISSALCD HWDTLRNALY ERRISELTGK ITKSAKEKVQ RSLKHEDINL QEIISAAGKE LSEAFKQKTS EILSHAHAAL DQPLPTTLKK QEEKEILKSQ LDSLLGLYHL LDWFAVDESN EVDPEFSARL TGIKLEMEPS LSFYNKARNY ATKKPY (Aa 527-1307 of SEQ ID NO: 43) SVEKFKLNFQ MPTLASGWDV NKEKNNGAIL FVKNGLYYLG IMPKQKGRYK ALSFEPTEKT SEGFDKMYYD YFPDAAKMIP KCSTQLKAVT AHFQTHTTPI LLSNNFIEPL EITKEIYDLN NPEKEPKKFQ TAYAKKTGDQ KGYREALCKW IDFTRDFLSK YTKTTSIDLS SLRPSSQYKD LGEYYAELNP LLYHISFQRI AEKEIMDAVE TGKLYLFQIY NKDFAKGHHG KPNLHTLYWT GLFSPENLAK TSIKLNGQAE LFYRPKSRMK RMAHRLGEKM LNKKLKDQKT PIPDTLYQEL YDYVNHRLSH DLSDEARALL PNVITKEVSH EIIKDRRFTS DKFFFHVPIT LNYQAANSPS KFNQRVNAYL KEHPETPIIG IDRGERNLIY ITVIDSTGKI LEQRSLNTIQ QFDYQKKLDN REKERVAARQ AWSVVGTIKD LKQGYLSQVI HEIVDLMIHY QAVVVLENLN FGFKSKRTGI AEKAVYQQFE KMLIDKLNCL VLKDYPAEKV GGVLNPYQLT DQFTSFAKMG TQSGFLFYVP APYTSKIDPL TGFVDPFVWK TIKNHESRKH FLEGFDFLHY DVKTGDFILH FKMNRNLSFQ RGLPGFMPAW DIVFEKNETQ FDAKGTPFIA GKRIVPVIEN HRFTGRYRDL YPANELIALL EEKGIVFRDG SNILPKLLEN DDSHAIDTMV ALIRSVLQMR NSNAATGEDY INSPVRDLNG VCFDSRFQNP EWPMDADANG AYHIALKGQL LLNHLKESKD LKLQNGISNQ DWLAYIQELR N (Coding DNA sequence) SEQ ID NO: 89 (Coding DNA sequence) SEQ ID NO: 90

The amino acid sequence of WT AsCpf1 (SEQ ID NO: 43) was divided into two half-domains, and each half-domain was constructed as a recombinant vector that can be expressed independently under the CMV promoter. In each recombinant vector, a nuclear localization signal required for transfer into the cell nucleus was added to the half-domain, and the CMV promoter sequence (SEQ ID NO: 64) and a poly A signal were included (see FIG. 29B; original backbone vector: pcDNA3.1 (Invitrogen), HA: YPYDVPDYA, SV40 NLS: PKKKRKV, nucleoplasmin NLS: KRPAATKKAGQAKKKK, 3xHA: YPYDVPDYAYPYDVPDYAYPYDVPDYA).

16.2. Genome editing using Split-Cpf1

The recombinant vectors expressing each half-domain of Split-Cpf1 and a plasmid expressing a crRNA (prepared as described in Table 4) acting on the DNMT1-3 target (CTGATGGTCCATGTCTGTTACTC; SEQ ID NO: 19) were delivered into HEK293T17 cells (ATCC) using Lipofectamine.

The recombinant vectors expressing each half-domain of Split-Cpf1 were constructed as follows (see FIG. 29B). pAD1 (containing the Split-Cpf1 half-domain 1 sequence) was constructed by inserting half-domain 1 into the pcDNA3.1 vector (Invitrogen) by the Gibson cloning method; each half-domain was prepared by PCR using pY010 (Addgene) as a template. For Gibson cloning, the vector was cut with the restriction enzymes HindIII and EcoRI. pAD2 contains the Split-Cpf1 half-domain 2 sequence and was produced in the same way as pAD1.

All of the following genome editing experiments were carried out in HEK293T17 cells (ATCC).

Genomic DNA was extracted from the HEK293T17 cells, the DNMT1-3 target site was amplified by PCR (primer sequences: DNMT1-3-1F: ccagaagtcccgtgcaaatc, DNMT1-3-1R: ATCTTTCTCAAGGGGCTGCT, DNMT1-3-2F: cagtgcatgttggggattcc; PCR conditions: 1st PCR Tm: 60 °C, 2nd PCR Tm: 60 °C), and the T7E1 assay was used to confirm genome editing.

The agarose gel analysis is shown in FIG. 30A. As shown in FIG. 30A, no genome editing was detected when the half-domains of Split-AsCpf1 were expressed separately. When the two half-domains were co-expressed, however, all four variants (Split-1 to Split-4) cleaved the target DNA, and the DNA fragments produced by T7E1 cleavage appeared on the agarose gel.

Targeted deep sequencing was performed to quantify the genome editing efficiency, and the results are shown in FIG. 30B. As shown in FIG. 30B, the half-domains constituting Split-AsCpf1 assembled into the AsCpf1 protein after expression and edited the genome, and the genome editing efficiency differed depending on the position at which the WT AsCpf1 protein was divided into two fragments.

Further, to measure how the genome editing efficiency of Split-AsCpf1 depends on the target position, cell experiments were also conducted on the CCR5-1 target (GTGGGCAACATGCTGGTCATCCT; SEQ ID NO: 24) and the DNMT1-4 target (TTTCCCTTCAGCTAAAATAAAGG; SEQ ID NO: 20) in addition to the DNMT1-3 target, and the genome editing efficiency was measured by targeted deep sequencing. The obtained indel frequency (%) is shown in FIG. 30C. As shown in FIG. 30C, Split-1-AsCpf1 to Split-4-AsCpf1 all worked on the three targets, and Split-3-AsCpf1 edited the genome with higher efficiency than WT AsCpf1.

This example addresses the problem that the large size of the Cpf1 gene impairs virus production and intracellular delivery efficiency, and it also demonstrates the usefulness of the technique by identifying a split position that works with higher efficiency than the existing WT Cpf1.

Split-Cpf1 acts on the target site once its two half-domains bind to each other. If this binding can be regulated with a specific signaling substance, gene scissors delivered into cells by a virus can be made to operate only at the desired time by supplying that substance. To implement this, the FRB protein (SEQ ID NO: 81: EMWHEGLEEA SRLYFGERNV KGMFEVLEPL HAMMERGPQT LKETSFNQAY GRDLMEAQEW CRKYMKSGNV KDLTQAWDLY YHVFRRISKQ) and the FKBP protein (SEQ ID NO: 82: GVQVETISPG DGRTFPKRGQ TCVVHYTGML EDGKKFDSSR DRNKPFKFML GKQEVIRGWE EGVAQMSVGQ RAKLTISPDY AYGATGHPGI IPPHATLVFD VELLKLE) were fused to the respective half-domains of Split-Cpf1 (see FIG. 31A; hereinafter referred to as Inducible-Split-Cpf1). The pAD1 and pAD2 vectors shown in FIG. 31A were produced by the process described above. The FRB and FKBP sequences were prepared by oligo extension. The prepared FRB and FKBP were joined to the half-domains by overlapping PCR, and the half-domain-FRB or half-domain-FKBP PCR products were cloned into pAD1 and pAD2 by Gibson cloning. The restriction enzymes EcoRI and HindIII were used to cut the vector for Gibson cloning.

FRB and FKBP are proteins known to bind strongly to rapamycin. They bind to different sites on the rapamycin structure, so neither interferes with the binding of the other to rapamycin. The fusion proteins were expected to hinder the spontaneous association of the Split-Cpf1 half-domains, and thus genome editing, in the absence of rapamycin, whereas in the presence of rapamycin they were expected to bind strongly to rapamycin, induce association of the half-domains, and promote genome editing. The experiment was carried out in HEK293T17 cells.

A plasmid expressing the crRNA for the DNMT1-3 target and plasmids (pcDNA3.1) expressing the half-domains fused to FRB or FKBP were delivered into the cells. Rapamycin was applied at 200 nM, and the samples were analyzed 72 hours after transfection. The results are shown in FIG. 31B. As shown in FIG. 31B, for Inducible-Split-Cpf1 fused with the FRB or FKBP protein, Inducible-Split-1 to Inducible-Split-4 all showed suppressed genome editing in the absence of rapamycin and a tendency toward enhanced genome editing in its presence. In particular, Inducible-Split-1 and Inducible-Split-4 worked at high efficiency only in the presence of rapamycin; in its absence, genome editing was almost undetectable, at a level close to that of the condition without Inducible-Split. These two were therefore the most suitable for the intended purpose.

In addition to the DNMT1-3 target, Inducible-Split-1 and Inducible-Split-4 were tested on the HBB-1 target (AGTCCTTTGGGGATCTGTCCACT; SEQ ID NO: 40), the CCR5-8 target (GACACCGAAGCAGAGTTTTTAGG; SEQ ID NO: 49), and the HPRT1-1 target (CTGACCTGCTGGATTACATCAAA; SEQ ID NO: 27). For all targets, the rapamycin-induced genome editing efficiency of Inducible-Split-Cpf1 was analyzed by targeted deep sequencing, and the results are shown in FIGS. 31C to 31F. As shown in FIGS. 31C to 31F, Inducible-Split-Cpf1 also worked significantly on these targets.

Based on the Split-Cpf1 information thus obtained, the expression cassettes were transferred to an AAV vector. A cassette capable of expressing each half-domain of Split-Cpf1 (Split-3-AsCpf1) was cloned into the AAV-MCS expression vector (VPK-410, Cell Biolabs, Inc.). Because the wild-type AsCpf1 is divided into two pieces, the cassette sizes are 2.1 kb (half-domain 1) and 3.8 kb (half-domain 2), both smaller than 4.7 kb (see FIG. 32A).
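The size comparison above can be checked with simple arithmetic. The following is a minimal sketch, assuming that 4.7 kb is the AAV packaging limit (as the text states for the recombinant AAV vector in Example 17); the function names are illustrative and do not come from the source.

```python
# Sizes quoted in the text, in kilobases.
AAV_PACKAGING_LIMIT_KB = 4.7  # assumed packaging limit, per the text
SPLIT3_CASSETTES_KB = {
    "half-domain 1": 2.1,
    "half-domain 2": 3.8,
}


def fits_in_aav(cassette_kb, limit_kb=AAV_PACKAGING_LIMIT_KB):
    """Return True if a cassette is no larger than the packaging limit."""
    return cassette_kb <= limit_kb


def headroom_kb(cassette_kb, limit_kb=AAV_PACKAGING_LIMIT_KB):
    """Remaining capacity (kb) available for additional fused sequences."""
    return round(limit_kb - cassette_kb, 3)


if __name__ == "__main__":
    for name, size in SPLIT3_CASSETTES_KB.items():
        print(name, fits_in_aav(size), headroom_kb(size))
```

The headroom value is what leaves space for fusing additional functional sequences to each half-domain.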

With Split-Cpf1, additional sequences can be added without causing problems in virus packaging, so it is expected that proteins with specific functions can be fused to Split-Cpf1 and expressed.

To confirm that the constructed AAV-Split-3-Cpf1 vector works, it was first delivered to cells in plasmid form and genome editing was checked. Genome editing by the AAV-Split-3-Cpf1 vector was measured by the T7E1 assay, with the p3-Split-3-Cpf1 vector (Split-3-Cpf1 cloned into the pcDNA3.1 vector (Addgene)) and the p3-Cpf1 vector (full-length AsCpf1 cloned into the p3 vector) as control groups. As shown in FIG. 32B, the AAV-Split-Cpf1 vector worked with a genome editing efficiency close to that of the control groups, following the same trend as the experiments with the p3 vector. The prepared viral vectors are expected to be usable for actual AAV production and in vivo genome editing experiments.

Example 17: Hif1-alpha protein knock-out test using Cpf1

The Hif1-alpha protein is a transcription factor that, when the cellular environment becomes hypoxic, specifically binds to the gene expressing vascular endothelial growth factor-A (VEGF-A) and activates its transcription. In ocular diseases such as diabetic retinopathy and age-related macular degeneration (AMD), abnormal hypoxia in the cells induces abnormal expression of VEGFA. Knocking out the Hif1a transcription factor that activates VEGFA by means of LbCpf1 may therefore allow treatments for eye disease to be developed. This example shows the potential for treating eye diseases by demonstrating effective intraocular delivery of LbCpf1 and a crRNA targeting the Hif1a gene using adeno-associated virus.

The target sequences that can be used for allele knockout of the Hif1a gene, which encodes the hypoxia-inducible factor 1 (Hif1)-alpha protein, are 5'-RGEN target-3' sequences present in Hif1a exons and targeted by crRNA (LbCpf1).

Target sequences of Cpf1 sgRNA (single guide RNA) (crRNA) that can be used for Hif1a gene knockout

sgRNA ID   Hif1a gene        PAM    Target sequence (5' to 3')
LB-TS1     Hif1a-Exon2-TS2   TTTT   TATGAGCTTGCTCATCAGTTGCC (SEQ ID NO: 69)
LB-TS2     Hif1a-Exon4-TS1   TTTG   AACTAACTGGACACAGTGTGTTT (SEQ ID NO: 70)
LB-TS3     Hif1a-Exon4-TS2   TTTG   ATTTTACTCATCCATGTGACCAT (SEQ ID NO: 71)
LB-TS4     Hif1a-Exon4-TS3   TTTT   ACTCATCCATGTGACCATGAGGA (SEQ ID NO: 72)
LB-TS5     Hif1a-Exon8-TS1   TTTA   CTAAAGGACAAGTCACCACAGGA (SEQ ID NO: 73)
LB-TS6     Hif1a-Exon8-TS2   TTTG   GCAAGCATCCTGTACTGTCCTGT (SEQ ID NO: 74)
LB-TS7     Hif1a-Exon8-TS3   TTTT   GGCAAGCATCCTGTACTGTCCTG (SEQ ID NO: 75)
LB-TS8     Hif1a-Exon8-TS4   TTTC   AACCCAGACATATCCACCTCTTT (SEQ ID NO: 76)
LB-TS9     Hif1a-Exon9-TS1   TTTG   TTGAAGGGAGAAAATCAAGTCGT (SEQ ID NO: 77)
LB-TS10    Hif1a-Exon9-TS2   TTTT   GACAGTGGTATTATTCAGCACGA (SEQ ID NO: 78)
LB-TS11    Hif1a-Exon9-TS3   TTTG   ACAGTGGTATTATTCAGCACGAC (SEQ ID NO: 79)
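The entries listed above all pair a four-base PAM with a 23-nt target. The following is a minimal sketch of how such entries could be sanity-checked, assuming a TTTN PAM (the form given for SEQ ID NO: 18 in the sequence listing) and a 23-nt target length; the function name and the selection of three rows are illustrative only.

```python
import re

# Three rows copied from the Hif1a target list: ID -> (PAM, target 5'->3').
HIF1A_TARGETS = {
    "LB-TS1": ("TTTT", "TATGAGCTTGCTCATCAGTTGCC"),
    "LB-TS6": ("TTTG", "GCAAGCATCCTGTACTGTCCTGT"),
    "LB-TS11": ("TTTG", "ACAGTGGTATTATTCAGCACGAC"),
}

# Assumed PAM form: three T's followed by any base.
PAM_PATTERN = re.compile(r"^TTT[ACGT]$")


def is_valid_target(pam, target, expected_len=23):
    """Check PAM form, target length, and that only A/C/G/T are used."""
    return (
        PAM_PATTERN.match(pam) is not None
        and len(target) == expected_len
        and set(target) <= set("ACGT")
    )


if __name__ == "__main__":
    for name, (pam, target) in HIF1A_TARGETS.items():
        print(name, is_valid_target(pam, target))
```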

The LbCpf1 crRNA for each target sequence has the same form as the LbCpf1 crRNA (SEQ ID NO: 37) shown in Table 4, apart from the target sequence portion.
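As an illustration of how a crRNA for one of these targets might be written out, the sketch below assumes that the Hif1a crRNAs follow the same layout as SEQ ID NO: 37, that is, a 5' repeat portion followed by the target sequence written as RNA. The repeat portion is derived by removing the DNMT1-3 target (SEQ ID NO: 19) from SEQ ID NO: 37; the helper names are hypothetical.

```python
# SEQ ID NO: 37 (LbCpf1 crRNA of DNMT1-3) and SEQ ID NO: 19 (DNMT1-3 target).
LB_CRRNA_DNMT1_3 = "aauuucuacuaaguguagaucugaugguccaugucuguuacuc"
DNMT1_3_TARGET = "CTGATGGTCCATGTCTGTTACTC"


def to_rna(dna):
    """Write a DNA sequence as lowercase RNA (T replaced by U)."""
    return dna.lower().replace("t", "u")


def repeat_from_reference(crrna, target_dna):
    """Return the part of a reference crRNA that precedes its guide."""
    guide = to_rna(target_dna)
    if not crrna.endswith(guide):
        raise ValueError("reference crRNA does not end with the target")
    return crrna[: -len(guide)]


def build_crrna(repeat, target_dna):
    """Assumed layout: repeat portion followed by the target as RNA."""
    return repeat + to_rna(target_dna)


if __name__ == "__main__":
    repeat = repeat_from_reference(LB_CRRNA_DNMT1_3, DNMT1_3_TARGET)
    # LB-TS6 target (SEQ ID NO: 74)
    print(build_crrna(repeat, "GCAAGCATCCTGTACTGTCCTGT"))
```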

A plasmid (Invitrogen; LbCpf1 plasmid) containing the DNA sequence encoding the LbCpf1 protein and the CMV promoter (SEQ ID NO: 64) operatively linked thereto, and plasmids (pUC19 vector; Lb-crRNA plasmid) encoding the respective crRNA for the Hif1a gene (including LB-TS6), were introduced into 293T cells (ATCC) by transfection with lipofectamine. Genomic DNA was then extracted from the 293T cells using the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's instructions. The target sequence in the Hif1a gene (Table 36) was amplified from the extracted genomic DNA by PCR.

The frequency (%) of indels (insertions or deletions) introduced into the amplified PCR product was analyzed by deep sequencing, and the results are shown in FIG. 37.
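For reference, an indel frequency of this kind is commonly expressed as the percentage of sequencing reads at the target site that carry an insertion or deletion. The sketch below assumes that definition; the source does not specify how reads were classified, and the read counts in the usage line are hypothetical numbers chosen only to show the calculation.

```python
def indel_frequency(indel_reads, total_reads):
    """Percentage of reads carrying an insertion or deletion."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must be between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(indel_frequency(250, 1000))
```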

As shown in FIG. 37, the LbCpf1 protein introduced into the cells worked together with the crRNA to induce indels in the Hif1a gene. For reference, no indels were detected (0%) when only the plasmid encoding LbCpf1 was transfected.

A DNA encoding the crRNA for the Hif1a target sequence (LB-TS6) that showed a good indel level in FIG. 37 and a DNA encoding LbCpf1 were cloned into an AAV vector. The prepared recombinant AAV vector is an all-in-one vector system in which two molecules, LbCpf1 under the control of the elongation factor 1a-short promoter and the crRNA under the control of the U6 promoter, are expressed simultaneously from a single vector (FIG. 38, FIGS. 39A to 39C, and SEQ ID NO: 80). FIGS. 39A to 39C show the entire sequence of the prepared recombinant AAV (SEQ ID NO: 80) continuously in the 5' to 3' direction. The underlined and/or italicized regions are, in order (5' to 3'), the inverted terminal repeat (ITR, 5'), U6 promoter, LbCpf1 crRNA (LB-TS6; underlined and bold), elongation factor 1a-short promoter, LbCpf1 (bold italic), and the ITR sequence (3'). Among them, the region spanning the U6 promoter, LbCpf1 crRNA (LB-TS6; underlined and bold), elongation factor 1a-short promoter, LbCpf1 (bold italic), NLS, HA tag, and bGH poly A signal is 4675 bp in total (FIG. 38).

The LbCpf1 and crRNA expression elements thus fit within the 4.7 kb packaging limit of the recombinant AAV vector.

<110> INSTITUTE FOR BASIC SCIENCE <120> Composition for Genome Editing Comprising Cpf1 and Use thereof <130> DPP20164740KR <150> KR 10-2015-0174212 <151> 2015-12-08 <150> US 62/299,043 <151> 2016-02-24 <150> KR 10-2016-0036381 <151> 2016-03-25 <160> 90 <170> KoPatentIn 2.0 <210> 1 <211> 84 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 Target region of FoxN1 exon 7 (wild type) <400> 1 cttgtcgatt ttggaaggat tgagggccca cagacagccc tttcgagagg aacttccgga 60 tttattctcc accttctcaa agca 84 <210> 2 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence 1 (crRNA A) of the target sequence of FoxN1 exon 7 <400> 2 gaaggattga gggcccacag aca 23 <210> 3 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence 2 (crRNA B) of target sequence of FoxN1 exon 7 <400> 3 gagaggaact tccggattta ttc 23 <210> 4 <211> 78 <212> DNA <213> Artificial Sequence <220> <223> on-target 6 bp deletion at FoxN1 exon 7 by Cpf1 <400> 4 cttgtcgatt ttggaaggat tgaggggaca gccctttcga gaggaacttc cggatttatt 60 ctccaccttc tcaaagca 78 <210> 5 <211> 92 <212> DNA <213> Artificial Sequence <220> <223> Target region of VEGFa for SgCas9 <400> 5 ttgaagatgt actctatctc gtcggggtac tcctggaaga tgtccaccag ggtctcaatc 60 ggacggcagt agcttcgctg gtagacatcc at 92 <210> 6 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> Target sequence of VEGFa for SgCas9 <400> 6 ctcctggaag atgtccacca 20 <210> 7 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 1 for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 7 tacattgcca ccacctactt cc 22 <210> 8 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 2 for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 8 cctcattgca tggccaatct at 22 <210> 9 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 3 (LbCpf1) for FAD2 homologous genes Glyma10g42470 and Glyma20g24530
<400> 9 gtcccttatt tctcatggaa aa 22 <210> 10 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 4 for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 10 tcatggaaaa taagccatcg cc 22 <210> 11 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 5 for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 11 tcccaaaacc aaaatccaaa gt 22 <210> 12 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 6 for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 12 gctgctatgt gtttatgggg tg 22 <210> 13 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 7 for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 13 gcaactatgg acagagatta tg 22 <210> 14 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 8 for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 14 atgacacacc attttacaag gc 22 <210> 15 <211> 21 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and Target sequence 9 (AsCpf1) for FAD2 homologous genes Glyma10g42470 and Glyma20g24530 <400> 15 caaggcactg tggagagaag c 21 <210> 16 <211> 73 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target region containing FAD2 containing target sequence 3 <400> 16 ttgatgatgt tatgggtttg accgttcact cagcactttt agtcccttat ttctcatgga 60 aaataagcca tcg 73 <210> 17 <211> 39 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target region (on target) for DNMT1-3 <400> 17 gtttcctgat ggtccatgtc tgttactcgc ctgtcaagt 39 <210> 18 <211> 27 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 PAM and target sequence for DNMT1-3, where N is A, T, C, or G <400> 18 tttnctgatg gtccatgtct gttactc 27 <210> 19 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence of DNMT1-3 <400> 19 ctgatggtcc atgtctgtta ctc 23 <210> 20 <211> 23
<212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence of DNMT1-4 <400> 20 tttcccttca gctaaaataa agg 23 <210> 21 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence of AAVS1 <400> 21 cttacgatgg agccagagag gat 23 <210> 22 <211> 27 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence of DNMT1-3 (5 'PAM contained) <400> 22 tttcctgatg gtccatgtct gttactc 27 <210> 23 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of EMX1 <400> 23 tcctccggtt ctggaaccac acc 23 <210> 24 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of CCR5-1 <400> 24 gtgggcaaca tgctggtcat cct 23 <210> 25 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of CCR5-9 <400> 25 gcctgaataa ttgcagtagc tct 23 <210> 26 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of AAVS1 <400> 26 cttacgatgg agccagagag gat 23 <210> 27 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of HPRT-1 <400> 27 ctgacctgct ggattacatc aaa 23 <210> 28 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of HPRT-4 <400> 28 tgtcccctgt tgactggtca ttc 23 <210> 29 <211> 27 <212> DNA <213> Artificial Sequence <220> AsCpf1 crRNA for DNMT1 (DNMT1-3) <400> 29 tttcctgatg gtccatgtct gttactc 27 <210> 30 <211> 47 <212> DNA <213> Artificial Sequence <220> <223> Target region of DNMT1-3 for LbCpf1 <400> 30 tacgttaatg tttcctgatg gtccatgtct gttactcgcc tgtcaag 47 <210> 31 <211> 47 <212> DNA <213> Artificial Sequence <220> <223> Target region of DNMT1-3 for AsCpf1 <400> 31 tacgttaatg tttcctgatg gtccatgtct gttactcgcc tgtcaag 47 <210> 32 <211> 47 <212> DNA <213> Artificial Sequence <220> <223> Target region of DNMT1-3 for SpCas9 <400> 32 tacgttaatg tttcctgatg gtccatgtct gttactcgcc tgtcaag 47 <210> 33 <211> 67 <212> DNA <213> Artificial Sequence <220> <223> Target region of EMX1-2 for LbCpf1 <400> 33 ccgtttgtac tttgtcctcc ggttctggaa ccacaccttc acctgggcca gggagggagg 60 
ggcacag 67 <210> 34 <211> 67 <212> DNA <213> Artificial Sequence <220> <223> Target region of EMX1-2 for AsCpf1 <400> 34 ccgtttgtac tttgtcctcc ggttctggaa ccacaccttc acctgggcca gggagggagg 60 ggcacag 67 <210> 35 <211> 67 <212> DNA <213> Artificial Sequence <220> <223> Target region of EMX1-2 for SpCas9 <400> 35 ccgtttgtac tttgtcctcc ggttctggaa ccacaccttc acctgggcca gggagggagg 60 ggcacag 67 <210> 36 <211> 43 <212> RNA <213> Artificial Sequence <220> <223> AsCpf1 crRNA of DNMT1-3 <400> 36 uaauuucuac ucuuguagau cugauggucc augucuguua cuc 43 <210> 37 <211> 43 <212> RNA <213> Artificial Sequence <220> <223> LbCpf1 crRNA of DNMT1-3 <400> 37 aauuucuacu aaguguagau cugauggucc augucuguua cuc 43 <210> 38 <211> 43 <212> RNA <213> Artificial Sequence <220> <223> FnCpf1 crRNA of DNMT1-3 <400> 38 uaauuucuac uguuguagau cugauggucc augucuguua cuc 43 <210> 39 <211> 44 <212> RNA <213> Artificial Sequence <220> <223> MbCpf1 crRNA of DNMT1-3 <400> 39 aaauuucuac uguuuguaga ucugaugguc caugucuguu acuc 44 <210> 40 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of HBB-1 <400> 40 agtcctttgg ggatctgtcc act 23 <210> 41 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 On target of VEGFA <400> 41 cgtccaactt ctgggctgtt ctc 23 <210> 42 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence of EMX1-2 <400> 42 ctgatggtcc atgtctgtta ctc 23 <210> 43 <211> 1307 <212> PRT <213> Artificial Sequence <220> <223> Cpf1 protein derived from Acidaminococcus sp.
BVBLG (AsCpf1) <400> 43 Met Thr Gln Phe Glu Gly Phe Thr Asn Leu Tyr Gln Val Ser Lys Thr   1 5 10 15 Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys Thr Leu Lys His Ile Gln              20 25 30 Glu Gln Gly Phe Ile Glu Glu Asp Lys Ala Arg Asn Asp His Tyr Lys          35 40 45 Glu Leu Lys Pro Ile Ile Asp Arg Ile Tyr Lys Thr Tyr Ala Asp Gln      50 55 60 Cys Leu Gln Leu Val Gln Leu Asp Trp Glu Asn Leu Ser Ala Ala Ile  65 70 75 80 Asp Ser Tyr Arg Lys Glu Lys Thr Glu Glu Thr Arg Asn Ala Leu Ile                  85 90 95 Glu Glu Gln Ala Thr Tyr Arg Asn Ala Ile His Asp Tyr Phe Ile Gly             100 105 110 Arg Thr Asp Leu Thr Asp Ala Ile Asn Lys Arg His Ala Glu Ile         115 120 125 Tyr Lys Gly Leu Phe Lys Ala Glu Leu Phe Asn Gly Lys Val Leu Lys     130 135 140 Gln Leu Gly Thr Val Thr Thr Thr Glu His Glu Asn Ala Leu Leu Arg 145 150 155 160 Ser Phe Asp Lys Phe Thr Thr Tyr Phe Ser Gly Phe Tyr Glu Asn Arg                 165 170 175 Lys Asn Val Phe Ser Ala Glu Asp Ile Ser Thr Ala Ile Pro His Arg             180 185 190 Ile Val Gln Asp Asn Phe Pro Lys Phe Lys Glu Asn Cys His Ile Phe         195 200 205 Thr Arg Leu Ile Thr Ala Val Ser Ser Leu Arg Glu His Phe Glu Asn     210 215 220 Val Lys Lys Ala Ile Gly Ile Phe Val Ser Thr Ser Ile Glu Glu Val 225 230 235 240 Phe Ser Phe Pro Phe Tyr Asn Gln Leu Leu Thr Gln Thr Gln Ile Asp                 245 250 255 Leu Tyr Asn Gln Leu Leu Gly Gly Ile Ser Arg Glu Ala Gly Thr Glu             260 265 270 Lys Ile Lys Gly Leu Asn Glu Val Leu Asn Leu Ala Ile Gln Lys Asn         275 280 285 Asp Glu Thr Ala His Ile Ile Ala Ser Leu Pro His Arg Phe Ile Pro     290 295 300 Leu Phe Lys Gln Ile Leu Ser Asp Arg Asn Thr Leu Ser Phe Ile Leu 305 310 315 320 Glu Glu Phe Lys Ser Asp Glu Glu Val Ile Gln Ser Phe Cys Lys Tyr                 325 330 335 Lys Thr Leu Leu Arg Asn Glu Asn Val Leu Glu Thr Ala Glu Ala Leu             340 345 350 Phe Asn Glu Leu Asn Ser Ile Asp Leu Thr His Ile Phe Ile Ser His         355 360 365 Lys Lys Leu Glu Thr Ile Ser Ser Ala Leu Cys 
Asp His Trp Asp Thr     370 375 380 Leu Arg Asn Ala Leu Tyr Glu Arg Arg Ile Ser Glu Leu Thr Gly Lys 385 390 395 400 Ile Thr Lys Ser Ala Lys Glu Lys Val Gln Arg Ser Leu Lys His Glu                 405 410 415 Asp Ile Asn Leu Gln Glu Ile Ile Ser Ala Ala Gly Lys Glu Leu Ser             420 425 430 Glu Ala Phe Lys Gln Lys Thr Ser Glu Ile Leu Ser His Ala His Ala         435 440 445 Ala Leu Asp Gln Pro Leu Pro Thr Thr Leu Lys Lys Gln Glu Glu Lys     450 455 460 Glu Ile Leu Lys Ser Gln Leu Asp Ser Leu Leu Gly Leu Tyr His Leu 465 470 475 480 Leu Asp Trp Phe Ala Val Asp Glu Ser Asn Glu Val Asp Pro Glu Phe                 485 490 495 Ser Ala Arg Leu Thr Gly Ile Lys Leu Glu Met Glu Pro Ser Leu Ser             500 505 510 Phe Tyr Asn Lys Ala Arg Asn Tyr Ala Thr Lys Lys Pro Tyr Ser Val         515 520 525 Glu Lys Phe Lys Leu Asn Phe Gln Met Pro Thr Leu Ala Ser Gly Trp     530 535 540 Asp Val Asn Lys Glu Lys Asn Asn Gly Ala Ile Leu Phe Val Lys Asn 545 550 555 560 Gly Leu Tyr Tyr Leu Gly Ile Met Pro Lys Gln Lys Gly Arg Tyr Lys                 565 570 575 Ala Leu Ser Phe Glu Pro Thr Glu Lys Thr Ser Glu Gly Phe Asp Lys             580 585 590 Met Tyr Tyr Asp Tyr Phe Pro Asp Ala Ala Lys Met Ile Pro Lys Cys         595 600 605 Ser Thr Gln Leu Lys Ala Val Thr Ala His Phe Gln Thr His Thr Thr     610 615 620 Pro Ile Leu Leu Ser Asn Asn Phe Ile Glu Pro Leu Glu Ile Thr Lys 625 630 635 640 Glu Ile Tyr Asp Leu Asn Asn Pro Glu Lys Glu Pro Lys Lys Phe Gln                 645 650 655 Thr Ala Tyr Ala Lys Lys Thr Gly Asp Gln Lys Gly Tyr Arg Glu Ala             660 665 670 Leu Cys Lys Trp Ile Asp Phe Thr Arg Asp Phe Leu Ser Lys Tyr Thr         675 680 685 Lys Thr Thr Ser Ile Asp Leu Ser Ser Leu Arg Pro Ser Ser Gln Tyr     690 695 700 Lys Asp Leu Gly Glu Tyr Tyr Ala Glu Leu Asn Pro Leu Leu Tyr His 705 710 715 720 Ile Ser Phe Gln Arg Ile Ala Glu Lys Glu Ile Met Asp Ala Val Glu                 725 730 735 Thr Gly Lys Leu Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ala Lys             740 745 750 Gly His 
His Gly Lys Pro Asn Leu His Thr Leu Tyr Trp Thr Gly Leu         755 760 765 Phe Ser Pro Glu Asn Leu Ala Lys Thr Ser Ile Lys Leu Asn Gly Gln     770 775 780 Ala Glu Leu Phe Tyr Arg Pro Lys Ser Arg Met Lys Arg Met Ala His 785 790 795 800 Arg Leu Gly Glu Lys Met Leu Asn Lys Lys Leu Lys Asp Gln Lys Thr                 805 810 815 Pro Ile Pro Asp Thr Leu Tyr Gln Glu Leu Tyr Asp Tyr Val Asn His             820 825 830 Arg Leu Ser His Asp Leu Ser Asp Glu Ala Arg Ala Leu Leu Pro Asn         835 840 845 Val Ile Thr Lys Glu Val Ser His Glu Ile Ile Lys Asp Arg Arg Phe     850 855 860 Thr Ser Asp Lys Phe Phe Phe His Val Pro Ile Thr Leu Asn Tyr Gln 865 870 875 880 Ala Ala Asn Ser Ser Lys Phe Asn Gln Arg Val Asn Ala Tyr Leu                 885 890 895 Lys Glu His Pro Glu Thr Pro Ile Ile Gly Ile Asp Arg Gly Glu Arg             900 905 910 Asn Leu Ile Tyr Ile Thr Val Ile Asp Ser Thr Gly Lys Ile Leu Glu         915 920 925 Gln Arg Ser Leu Asn Thr Ile Gln Gln Phe Asp Tyr Gln Lys Lys Leu     930 935 940 Asp Asn Arg Glu Lys Glu Arg Val Ala Ala Arg Gln Ala Trp Ser Val 945 950 955 960 Val Gly Thr Ile Lys Asp Leu Lys Gln Gly Tyr Leu Ser Gln Val Ile                 965 970 975 His Glu Ile Val Asp Leu Met Ile His Tyr Gln Ala Val Val Val Leu             980 985 990 Glu Asn Leu Asn Phe Gly Phe Lys Ser Lys Arg Thr Gly Ile Ala Glu         995 1000 1005 Lys Ala Val Tyr Gln Gln Phe Glu Lys Met Leu Ile Asp Lys Leu Asn    1010 1015 1020 Cys Leu Val Leu Lys Asp Tyr Pro Ala Glu Lys Val Gly Gly Val Leu 1025 1030 1035 1040 Asn Pro Tyr Gln Leu Thr Asp Gln Phe Thr Ser Phe Ala Lys Met Gly                1045 1050 1055 Thr Gln Ser Gly Phe Leu Phe Tyr Val Pro Ala Pro Tyr Thr Ser Lys            1060 1065 1070 Ile Asp Pro Leu Thr Gly Phe Val Asp Pro Phe Val Trp Lys Thr Ile        1075 1080 1085 Lys Asn His Glu Ser Arg Lys His Phe Leu Glu Gly Phe Asp Phe Leu    1090 1095 1100 His Tyr Asp Val Lys Thr Gly Asp Phe Ile Leu His Phe Lys Met Asn 1105 1110 1115 1120 Arg Asn Leu Ser Phe Gln Arg Gly Leu Pro Gly Phe Met 
Pro Ala Trp                1125 1130 1135 Asp Ile Val Phe Glu Lys Asn Glu Thr Gln Phe Asp Ala Lys Gly Thr            1140 1145 1150 Pro Phe Ile Ala Gly Lys Arg Ile Val Pro Val Ile Glu Asn His Arg        1155 1160 1165 Phe Thr Gly Arg Tyr Arg Asp Leu Tyr Pro Ala Asn Glu Leu Ile Ala    1170 1175 1180 Leu Leu Glu Glu Lys Gly Ile Val Phe Arg Asp Gly Ser Asn Ile Leu 1185 1190 1195 1200 Pro Lys Leu Leu Glu Asn Asp Asp Ser Ala Ile Asp Thr Met Val                1205 1210 1215 Ala Leu Ile Arg Ser Val Leu Gln Met Arg Asn Ser Asn Ala Ala Thr            1220 1225 1230 Gly Glu Asp Tyr Ile Asn Ser Pro Val Arg Asp Leu Asn Gly Val Cys        1235 1240 1245 Phe Asp Ser Arg Phe Gln Asn Pro Glu Trp Pro Met Asp Ala Asp Ala    1250 1255 1260 Asn Gly Ala Tyr His Ile Ala Leu Lys Gly Gln Leu Leu Leu Asn His 1265 1270 1275 1280 Leu Lys Glu Ser Lys Asp Leu Lys Leu Gln Asn Gly Ile Ser Asn Gln                1285 1290 1295 Asp Trp Leu Ala Tyr Ile Gln Glu Leu Arg Asn            1300 1305 <210> 44 <211> 3921 <212> DNA <213> Artificial Sequence <220> <223> E.coli codon optimized AsCpf1 coding nucleic acid <400> 44 atgacacagt ttgaaggctt caccaatctc taccaggtca gcaagacgct acgttttgag 60 cttatcccgc agggaaaaac cctgaaacac attcaggaac aggggttcat agaggaagat 120 aaggcgcgta acgaccatta taaagaactg aagcctataa tcgaccgtat ttataaaacg 180 tacgcggatc agtgcctgca gctggttcag ctggattggg agaatctgtc cgcggctatt 240 gatagctatc gcaaagagaa gaccgaggaa acccgtaacg cactgattga agagcaggcg 300 acctatcgga atgcgatcca tgattacttc atcggccgca ccgacaacct gaccgatgca 360 attaacaaac gtcacgcaga gatttacaaa ggtctgttta aagcagagtt attcaatggc 420 aaggttctga aacagctggg tacggtcacc accaccgaac acgaaaacgc actgctgagg 480 agctttgata aatttaccac atatttcagc ggtttctatg aaaatcgtaa gaatgtattt 540 agcgccgaag atatttccac cgcaattcct catcgtattg tgcaggataa ttttccgaag 600 tttaaagaaa attgtcatat ttttacccgt ctgatcaccg cggtaccgag cctgcgagag 660 cattttgaaa acgttaagaa agccattgga atttttgtca gtaccagcat tgaagaagtg 720 ttttcgttcc cgttctataa ccaactgctg acccagaccc agattgatct 
gtacaatcag 780 ctgctggggg gcataagccg cgaggcaggt accgaaaaga taaagggact caatgaggtg 840 ctgaatctgg caattcagaa gaatgatgaa acggctcata tcattgctag cctgccgcat 900 cgtttcattc ccctgtttaa gcaaatcctg agcgatcgca atacactgag ctttatcctc 960 gaagagttta aatcggacga agaagttatc cagagctttt gcaaatacaa aaccctgctg 1020 cggaacgaaa atgtgctgga gaccgctgaa gcactgttta atgaactgaa ctcgatcgac 1080 ctcacccata tttttatatc ccacaaaaaa ctggaaacca taagcagcgc tctgtgtgac 1140 cattgggata ccctgcgcaa cgccctgtat gaacggcgta tcagcgagct gaccgggaaa 1200 atcaccaaat ccgcaaagga aaaagttcag cgtagtctga aacacgagga catcaacctg 1260 caagaaatta ttagcgcagc aggtaaagag ctgagcgaag cattcaaaca gaaaaccagc 1320 gaaatcctga gccatgccca tgctgcactg gatcagccgc tgccgaccac cctgaaaaaa 1380 caggaggaaa aggagattct gaaaagccaa ctggacagcc tgctgggcct gtatcacctg 1440 ctggactggt ttgcagtcga tgagagcaac gaggttgatc ctgagttctc cgctcgtctg 1500 accggaatca agctggagat ggaaccgagt ctgtcgtttt acaataaagc gcgtaattac 1560 gcgaccaaga aaccgtatag cgtggaaaaa ttcaaactga actttcagat gccgaccctt 1620 gcaagcggat gggacgttaa caaagaaaaa aacaatgggg caattctgtt tgtgaaaaat 1680 ggcctctatt atctgggtat catgccgaaa cagaaagggc gctacaaagc cctgtcattt 1740 gagccgaccg agaaaacctc agagggtttc gacaagatgt actacgatta tttcccggat 1800 gcggcaaaaa tgatacccaa atgtagcacc caactgaagg cagttacagc ccactttcag 1860 acccatacca ccccgatcct gctgtcgaac aattttatag agccgctgga aattaccaaa 1920 gagatttatg atctgaataa tccggaaaag gagcccaaga aatttcagac ggcgtatgca 1980 aaaaagaccg gggatcagaa aggttatcgt gaagcgctgt gcaaatggat tgactttacc 2040 cgtgactttc tgtcaaaata taccaaaacg acgagcattg atctgagcag cctacgtccg 2100 agcagccaat ataaggatct gggcgaatat tacgccgaac tgaatccgct gctctaccat 2160 atttccttcc aacgaatcgc tgaaaaagaa ataatggacg ccgttgaaac cggcaaactg 2220 tatctgtttc aaatctacaa caaagatttc gccaaaggcc atcacggtaa gccgaacctg 2280 cataccctgt attggaccgg tctgtttagc ccggagaatc tggccaaaac cagcatcaag 2340 ctgaacggac aggcagaact gttttaccgc cccaaaagcc gtatgaaaag gatggcacac 2400 cgcctgggcg aaaaaatgct gaataagaaa ctcaaagatc agaaaacgcc gataccggat 2460 
accctttatc aggagctgta tgattatgtt aaccaccggc tgagccatga cctgagcgac 2520 gaagcgcgtg cactgctgcc gaacgtgatt accaaggaag tctcgcatga aattattaaa 2580 gatcggcgct tcaccagtga taaatttttc ttccatgtac cgatcaccct gaattatcaa 2640 gccgcaaata gcccttccaa atttaatcaa cgcgtgaatg cgtacctgaa agagcatccg 2700 gagaccccaa ttattggcat agaccgagga gaacgcaatc tcatttatat caccgtcatt 2760 gatagcaccg gtaagatcct ggaacagcgt agcctgaata ccattcagca gtttgactac 2820 cagaaaaagc tggacaacag agaaaaggaa cgtgtagccg cccggcaggc ttggagtgtg 2880 gtgggtacta tcaaggatct gaagcagggg tatctctccc aagttatcca tgaaattgtc 2940 gatctaatga ttcactatca agcagtagtg gtactggaaa atctgaattt cggtttcaaa 3000 agcaaacgta cagggatcgc tgaaaaagcc gtttatcagc agttcgagaa aatgctgata 3060 gacaagctga attgcctggt tctgaaagat tatccggcag agaaggtggg cggtgtgctg 3120 aacccgtacc agctgactga tcaatttacg agctttgcaa aaatgggaac gcagagcggt 3180 ttcctgttct atgttccggc gccatatacc agcaagatag acccgctgac aggtttcgta 3240 gatccgtttg tctggaaaac cattaaaaat catgaaagtc gcaaacattt tctggagggc 3300 tttgattttc tgcactatga cgtgaaaacc ggcgacttca ttctgcattt taaaatgaac 3360 cgtaatctgt cctttcagcg cggcctgcct ggctttatgc cggcgtggga cattgttttt 3420 gaaaagaatg agacacagtt tgatgccaaa ggtaccccct ttattgcggg gaaacgcatt 3480 gtgcccgtta tagaaaatca ccgcttcacc ggacggtata gggacttgta cccggcaaat 3540 gaattgatag cgctgctgga ggagaaaggt attgtctttc gggatggatc aaacatcctg 3600 ccgaagctgc tggagaacga tgacagccac gcaatagaca ccatggtagc gctgatccga 3660 agcgtgctgc agatgcgtaa cagtaatgcg gctacggggg aagactacat taatagcccg 3720 gtccgtgatc tgaacggcgt ttgtttcgat agcagatttc aaaatccgga gtggccgatg 3780 gatgccgatg ccaatggagc ttaccatatc gctctcaaag gtcagctcct actgaaccat 3840 ttgaaagaat caaaagatct gaaactgcag aacggcatct cgaatcagga ctggctggcc 3900 tacattcaag aactgagaaa c 3921 <210> 45 <211> 1228 <212> PRT <213> Artificial Sequence <220> &Lt; 223 > Cpf1 derived from Lachnospiraceae bacterium ND2006 (LbCpi1) <400> 45 Met Ser Lys Leu Glu Lys Phe Thr Asn Cys Tyr Ser Leu Ser Lys Thr   1 5 10 15 Leu Arg Phe Lys Ala Ile Pro Val Gly Lys Thr Gln 
Glu Asn Ile Asp              20 25 30 Asn Lys Arg Leu Leu Val Glu Asp Glu Lys Arg Ala Glu Asp Tyr Lys          35 40 45 Gly Val Lys Lys Leu Leu Asp Arg Tyr Tyr Leu Ser Phe Ile Asn Asp      50 55 60 Val Leu His Ser Ile Lys Leu Lys Asn Leu Asn Asn Tyr Ile Ser Leu  65 70 75 80 Phe Arg Lys Lys Thr Arg Thr Glu Lys Glu Asn Lys Glu Leu Glu Asn                  85 90 95 Leu Glu Ile Asn Leu Arg Lys Glu Ile Ala Lys Ala Phe Lys Gly Asn             100 105 110 Glu Gly Tyr Lys Ser Leu Phe Lys Lys Asp Ile Ile Glu Thr Ile Leu         115 120 125 Pro Glu Phe Leu Asp Asp Lys Asp Glu Ile Ala Leu Val Asn Ser Phe     130 135 140 Asn Gly Phe Thr Thr Ala Phe Thr Gly Phe Phe Asp Asn Arg Glu Asn 145 150 155 160 Met Phe Ser Glu Glu Ala Lys Ser Thr Ser Ile Ala Phe Arg Cys Ile                 165 170 175 Asn Glu Asn Leu Thr Arg Tyr Ile Ser Asn Met Asp Ile Phe Glu Lys             180 185 190 Val Asp Ala Ile Phe Asp Lys His Glu Val Gln Glu Ile Lys Glu Lys         195 200 205 Ile Leu Asn Ser Asp Tyr Asp Val Glu Asp Phe Phe Glu Gly Glu Phe     210 215 220 Phe Asn Phe Val Leu Thr Gln Glu Gly Ile Asp Val Tyr Asn Ala Ile 225 230 235 240 Ile Gly Gly Phe Val Thr Glu Ser Gly Glu Lys Ile Lys Gly Leu Asn                 245 250 255 Glu Tyr Ile Asn Leu Tyr Asn Gln Lys Thr Lys Gln Lys Leu Pro Lys             260 265 270 Phe Lys Pro Leu Tyr Lys Gln Val Leu Ser Asp Arg Glu Ser Leu Ser         275 280 285 Phe Tyr Gly Glu Gly Tyr Thr Ser Asp Glu Glu Val Leu Glu Val Phe     290 295 300 Arg Asn Thr Leu Asn Lys Asn Ser Glu Ile Phe Ser Ser Ile Lys Lys 305 310 315 320 Leu Glu Lys Leu Phe Lys Asn Phe Asp Glu Tyr Ser Ser Ala Gly Ile                 325 330 335 Phe Val Lys Asn Gly Pro Ala Ile Ser Thr Ile Ser Lys Asp Ile Phe             340 345 350 Gly Glu Trp Asn Val Ile Arg Asp Lys Trp Asn Ala Glu Tyr Asp Asp         355 360 365 Ile His Leu Lys Lys Lys Ala Val Val Thr Glu Lys Tyr Glu Asp Asp     370 375 380 Arg Arg Lys Ser Phe Lys Lys Ile Gly Ser Phe Ser Leu Glu Gln Leu 385 390 395 400 Gln Glu Tyr Ala Asp Ala Asp 
Leu Ser Val Val Glu Lys Leu Lys Glu                 405 410 415 Ile Ile Ile Gln Lys Val Asp Glu Ile Tyr Lys Val Tyr Gly Ser Ser             420 425 430 Glu Lys Leu Phe Asp Ala Asp Phe Val Leu Glu Lys Ser Leu Lys Lys         435 440 445 Asn Asp Ala Val Val Ala Ile Met Lys Asp Leu Leu Asp Ser Val Lys     450 455 460 Ser Phe Glu Asn Tyr Ile Lys Ala Phe Phe Gly Glu Gly Lys Glu Thr 465 470 475 480 Asn Arg Asp Glu Ser Phe Tyr Gly Asp Phe Val Leu Ala Tyr Asp Ile                 485 490 495 Leu Leu Lys Val Asp His Ile Tyr Asp Ala Ile Arg Asn Tyr Val Thr             500 505 510 Gln Lys Pro Tyr Ser Lys Asp Lys Phe Lys Leu Tyr Phe Gln Asn Pro         515 520 525 Gln Phe Met Gly Gly Trp Asp Lys Asp Lys Glu Thr Asp Tyr Arg Ala     530 535 540 Thr Ile Leu Arg Tyr Gly Ser Lys Tyr Tyr Leu Ala Ile Met Asp Lys 545 550 555 560 Lys Tyr Ala Lys Cys Leu Gln Lys Ile Asp Lys Asp Asp Val Asn Gly                 565 570 575 Asn Tyr Glu Lys Ile Asn Tyr Lys Leu Leu Pro Gly Pro Asn Lys Met             580 585 590 Leu Pro Lys Val Phe Phe Ser Lys Lys Trp Met Ala Tyr Tyr Asn Pro         595 600 605 Ser Glu Asp Ile Gln Lys Ile Tyr Lys Asn Gly Thr Phe Lys Lys Gly     610 615 620 Asp Met Phe Asn Leu Asn Asp Cys His Lys Leu Ile Asp Phe Phe Lys 625 630 635 640 Asp Ser Ile Ser Arg Tyr Pro Lys Trp Ser Asn Ala Tyr Asp Phe Asn                 645 650 655 Phe Ser Glu Thr Glu Lys Tyr Lys Asp Ile Ala Gly Phe Tyr Arg Glu             660 665 670 Val Glu Glu Gln Gly Tyr Lys Val Ser Phe Glu Ser Ala Ser Lys Lys         675 680 685 Glu Val Asp Lys Leu Val Glu Glu Gly Lys Leu Tyr Met Phe Gln Ile     690 695 700 Tyr Asn Lys Asp Phe Ser Asp Lys Ser His Gly Thr Pro Asn Leu His 705 710 715 720 Thr Met Tyr Phe Lys Leu Leu Phe Asp Glu Asn Asn His Gly Gln Ile                 725 730 735 Arg Leu Ser Gly Gly Ala Glu Leu Phe Met Arg Arg Ala Ser Leu Lys             740 745 750 Lys Glu Glu Leu Val Val His Pro Ala Asn Ser Pro Ile Ala Asn Lys         755 760 765 Asn Pro Asp Asn Pro Lys Lys Thr Thr Thr Leu Ser Tyr Asp Val Tyr     
770 775 780 Lys Asp Lys Arg Phe Ser Glu Asp Gln Tyr Glu Leu His Ile Pro Ile 785 790 795 800 Ala Ile Asn Lys Cys Pro Lys Asn Ile Phe Lys Ile Asn Thr Glu Val                 805 810 815 Arg Val Leu Leu Lys His Asp Asp Asn Pro Tyr Val Ile Gly Ile Asp             820 825 830 Arg Gly Glu Arg Asn Leu Leu Tyr Ile Val Val Val Asp Gly Lys Gly         835 840 845 Asn Ile Val Glu Gln Tyr Ser Leu Asn Glu Ile Ile Asn Asn Phe Asn     850 855 860 Gly Ile Arg Ile Lys Thr Asp Tyr His Ser Leu Leu Asp Lys Lys Glu 865 870 875 880 Lys Glu Arg Phe Glu Ala Arg Gln Asn Trp Thr Ser Ile Glu Asn Ile                 885 890 895 Lys Glu Leu Lys Ala Gly Tyr Ile Ser Gln Val Val His Lys Ile Cys             900 905 910 Glu Leu Val Glu Lys Tyr Asp Ala Val Ile Ala Leu Glu Asp Leu Asn         915 920 925 Ser Gly Phe Lys Asn Ser Arg Val Lys Val Glu Lys Gln Val Tyr Gln     930 935 940 Lys Phe Glu Lys Met Leu Ile Asp Lys Leu Asn Tyr Met Val Asp Lys 945 950 955 960 Lys Ser Asn Pro Cys Ala Thr Gly Gly Ala Leu Lys Gly Tyr Gln Ile                 965 970 975 Thr Asn Lys Phe Glu Ser Phe Lys Ser Met Ser Thr Gln Asn Gly Phe             980 985 990 Ile Phe Tyr Ile Pro Ala Trp Leu Thr Ser Lys Ile Asp Pro Ser Thr         995 1000 1005 Gly Phe Val Asn Leu Leu Lys Thr Lys Tyr Thr Ser Ile Ala Asp Ser    1010 1015 1020 Lys Lys Phe Ile Ser Ser Phe Asp Arg Ile Met Tyr Val Pro Glu Glu 1025 1030 1035 1040 Asp Leu Phe Glu Phe Ala Leu Asp Tyr Lys Asn Phe Ser Arg Thr Asp                1045 1050 1055 Ala Asp Tyr Ile Lys Lys Trp Lys Leu Tyr Ser Tyr Gly Asn Arg Ile            1060 1065 1070 Arg Ile Phe Arg Asn Pro Lys Lys Asn Asn Val Phe Asp Trp Glu Glu        1075 1080 1085 Val Cys Leu Thr Ser Ala Tyr Lys Glu Leu Phe Asn Lys Tyr Gly Ile    1090 1095 1100 Asn Tyr Gln Gln Gly Asp Ile Arg Ala Leu Leu Cys Glu Gln Ser Asp 1105 1110 1115 1120 Lys Ala Phe Tyr Ser Ser Phe Met Ala Leu Met Ser Leu Met Leu Gln                1125 1130 1135 Met Arg Asn Ser Ile Thr Gly Arg Thr Asp Val Asp Phe Leu Ile Ser            1140 1145 1150 Pro Val 
Lys Asn Ser Asp Gly Ile Phe Tyr Asp Ser Arg Asn Tyr Glu        1155 1160 1165 Ala Gln Glu Asn Ala Ile Leu Pro Lys Asn Ala Asp Ala Asn Gly Ala    1170 1175 1180 Tyr Asn Ile Ala Arg Lys Val Leu Trp Ala Ile Gly Gln Phe Lys Lys 1185 1190 1195 1200 Ala Glu Asp Glu Lys Leu Asp Lys Val Lys Ile Ala Ile Ser Asn Lys                1205 1210 1215 Glu Trp Leu Glu Tyr Ala Gln Thr Ser Val Lys His            1220 1225 <210> 46 <211> 3684 <212> DNA <213> Artificial Sequence <220> <223> E. coli codon optimized LbCpf1 coding nucleic acid <400> 46 atgagcaaac tggaaaaatt tacgaattgt tatagcctgt ccaagaccct gcgtttcaaa 60 gccatccccg ttggcaaaac ccaggagaat attgataata aacgtctgct ggttgaggat 120 gaaaaaagag cagaagacta taagggagtc aaaaaactgc tggatcggta ctacctgagc 180 tttataaatg acgtgctgca tagcattaaa ctgaaaaatc tgaataacta tattagtctg 240 ttccgcaaga aaacccgaac agagaaagaa aataaagagc tggaaaacct ggagatcaat 300 ctgcgtaaag agatcgcaaa agcttttaaa ggaaatgaag gttataaaag cctgttcaaa 360 aaagacatta ttgaaaccat cctgccggaa tttctggatg ataaagacga gatagcgctc 420 gtgaacct tcaacgggtt cacgaccgcc ttcacgggct ttttcgataa cagggaaaat 480 atgttttcag aggaagccaa aagcacctcg atagcgttcc gttgcattaa tgaaaatttg 540 acaagatata tcagcaacat ggatattttc gagaaagttg atgcgatctt tgacaaacat 600 gaagtgcagg agattaagga aaaaattctg aacagcgatt atgatgttga ggattttttc 660 gagggggaat tttttaactt tgtactgaca caggaaggta tagatgtgta taatgctatt 720 atcggcgggt tcgttaccga atccggcgag aaaattaagg gtctgaatga gtacatcaat 780 ctgtataacc aaaagaccaa acagaaactg ccaaaattca aaccgctgta caagcaagtc 840 ctgagcgatc gggaaagctt gagcttttac ggtgaaggtt ataccagcga cgaggaggta 900 ctggaggtct ttcgcaatac cctgaacaag aacagcgaaa ttttcagctc cattaaaaag 960 ctggagaaac tgtttaagaa ttttgacgag tacagcagcg caggtatttt tgtgaagaac 1020 ggacctgcca taagcaccat tagcaaggat atttttggag agtggaatgt tatccgtgat 1080 aaatggaacg cggaatatga tgacatacac ctgaaaaaga aggctgtggt aactgagaaa 1140 tatgaagacg atcgccgcaa aagctttaaa aaaatcggca gctttagcct ggagcagctg 1200 caggaatatg cggacgccga cctgagcgtg gtcgagaaac tgaaggaaat 
tattatccaa 1260 aaagtggatg agatttacaa ggtatatggt agcagcgaaa aactgtttga tgcggacttc 1320 gttctggaaa aaagcctgaa aaaaaatgat gctgttgttg cgatcatgaa agacctgctc 1380 gatagcgtta agagctttga aaattacatt aaagcattct ttggcgaggg caaagaaaca 1440 aacagagacg aaagctttta tggcgacttc gtcctggctt atgacatcct gttgaaggta 1500 gatcatatat atgatgcaat tcgtaattac gtaacccaaa agccgtacag caaagataag 1560 ttcaaactgt atttccagaa cccgcagttt atgggtggct gggacaaaga caaggagaca 1620 gactatcgcg ccactattct gcgttacggc agcaagtact atctcgccat catggacaaa 1680 aaatatgcaa agtgtctgca gaaaatcgat aaagacgacg tgaacggaaa ttacgaaaag 1740 attaattata agctgctgcc agggcccaac aagatgttac cgaaagtatt tttttccaaa 1800 aaatggatgg catactataa cccgagcgag gatatacaga agatttacaa aaatgggacc 1860 ttcaaaaagg gggatatgtt caatctgaat gactgccaca aactgatcga tttttttaaa 1920 gatagcatca gccgttatcc taaatggtca aacgcgtatg attttaattt ctccgaaacg 1980 gagaaatata aagacattgc tggtttctat cgcgaagtcg aagaacaggg ttataaagtt 2040 agctttgaat cggccagcaa gaaagaggtt gataaactgg tggaggaggg taagctgtat 2100 atgtttcaga tttataaca agactttagc gacaaaagcc acggtactcc taatctgcat 2160 acgatgtact ttaaactgct gtttgatgag aataaccacg gccaaatccg tctctccggt 2220 ggagcagaac tttttatgcg gcgtgcgagc ctaaaaaagg aagaactggt ggtgcatccc 2280 gccaacagcc cgattgctaa caaaaatcca gataatccta agaagaccac cacactgtcg 2340 tacgatgtct ataaggataa acgtttctcg gaagaccagt atgaattgca tataccgata 2400 gcaattaata aatgcccaaa aaacattttc aaaatcaaca ctgaagttcg tgtgctgctg 2460 aaacatgatg ataatccgta tgtgatcgga attgaccgtg gggagagaaa tctgctgtat 2520 attgtagtcg ttgatggcaa gggcaacatc gttgagcagt atagcctgaa tgaaataatt 2580 aataatttta acggtatacg tattaaaacc gactatcata gcctgctgga taaaaaggag 2640 aaagagcgtt ttgaggcacg ccaaaattgg acgagcatcg aaaacatcaa ggaactgaag 2700 gcaggatata tcagccaagt agtccataaa atctgtgaac tggtggagaa gtacgacgct 2760 gtcattgccc tggaagacct caatagcggc tttaaaaaca gccgggtgaa ggtggagaaa 2820 caggtatacc aaaagtttga aaagatgctc attgataagc tgaactatat ggttgataaa 2880 aagagcaacc cgtgcgccac tggcggtgca ctgaaagggt accaaattac caataaattt 
2940 gaaagcttta aaagcatgag cacgcagaat gggtttattt tttatatacc agcatggctg 3000 acgagcaaga ttgaccccag cactggtttt gtcaatctgc tgaaaaccaa atacacaagc 3060 tgggggaa gatctgtttg aatttgccct ggattataaa aacttcagcc gcaccgatgc agattatatc 3180 aaaaaatgga agctgtacag ttatggtaat cgtatacgta tcttccgtaa tccgaagaaa 3240 aacaatgtgt tcgattggga agaggtctgt ctgaccagcg cgtataaaga actgttcaac 3300 aagtacggaa taaattatca gcaaggtgac attcgcgcac tgctgtgtga acagtcagat 3360 aaagcatttt atagcagctt tatggcgctg atgagcctga tgctccagat gcgcaacagc 3420 ataaccggtc gcacagatgt tgactttctg atcagccctg tgaagaatag cgacggcatc 3480 ttctacgatt ccaggaacta tgaagcacag gaaaacgcta ttctgcctaa aaatgccgat 3540 gccaacggcg cctataatat tgcacggaag gttctgtggg cgattggaca gttcaagaaa 3600 gcggaagatg agaagctgga taaggtaaaa attgctatta gcaataagga atggctggag 3660 tacgcacaga catcggttaa acac 3684 <210> 47 <211> 4038 <212> DNA <213> Artificial Sequence <220> <223> DNA encoding FnCpf1 <400> 47 atgagcatct accaggagtt cgtcaacaag tattcactga gtaagacact gcggttcgag 60 ctgatcccac agggcaagac actggagaac atcaaggccc gaggcctgat tctggacgat 120 gagaagcggg caaaagacta taagaaagcc aagcagatca ttgataaata ccaccagttc 180 tttatcgagg aaattctgag ctccgtgtgc atcagtgagg atctgctgca gaattactca 240 gacgtgtact tcaagctgaa gaagagcgac gatgacaacc tgcagaagga cttcaagtcc 300 gccaaggaca ccatcaagaa acagattagc gagtacatca aggactccga aaagtttaaa 360 aatctgttca accagaatct gatcgatgct aagaaaggcc aggagtccga cctgatcctg 420 tggctgaaac agtctaagga caatgggatt gaactgttca aggctaactc cgatatcact 480 gatattgacg aggcactgga aatcatcaag agcttcaagg gatggaccac atactttaaa 540 ggcttccacg agaaccgcaa gaacgtgtac tccagcaacg acattcctac ctccatcatc 600 taccgaatcg tcgatgacaa tctgccaaag ttcctggaga acaaggccaa atatgaatct 660 ctgaaggaca aagctcccga ggcaattaat tacgaacaga tcaagaaaga tctggctgag 720 gaactgacat tcgatatcga ctataagact agcgaggtga accagagggt cttttccctg 780 gacgaggtgt ttgaaatcgc caatttcaac aattacctga accagtccgg cattactaaa 840 ttcaatacca tcattggcgg gaagtttgtg aacggggaga ataccaagcg caagggaatt 900 aacgaataca tcaatctgta 
tagccagcag atcaacgaca aaactctgaa gaaatacaag 960 atgtctgtgc tgttcaaaca gatcctgagt gataccgagt ccaagtcttt tgtcattgat 1020 aaactggaag atgactcaga cgtggtcact accatgcaga gcttttatga gcagatcgcc 1080 gctttcaaga cagtggagga aaaatctatt aaggaaactc tgagtctgct gttcgatgac 1140 ctgaaagccc agaagctgga cctgagtaag atctacttca aaaacgataa gagtctgaca 1200 gcctgtcac agcaggtgtt tgatgactat tccgtgattg ggaccgccgt cctggagtac 1260 attacacagc agatcgctcc aaagaacctg gataatccct ctaagaaaga gcaggaactg 1320 atcgctaaga aaaccgagaa ggcaaaatat ctgagtctgg aaacaattaa gctggcactg 1380 gaggagttca acaagcacag ggatattgac aaacagtgcc gctttgagga aatcctggcc 1440 aacttcgcag ccatccccat gatttttgat gagatcgccc agaacaaaga caatctggct 1500 cagatcagta ttaagtacca gaaccagggc aagaaagacc tgctgcaggc ttcagcagaa 1560 gatgacgtga aagccatcaa ggatctgctg gaccagacca acaatctgct gcacaagctg 1620 aaaatcttcc atattagtca gtcagaggat aaggctaata tcctggataa agacgaacac 1680 ttctacctgg tgttcgagga atgttacttc gagctggcaa acattgtccc cctgtataac 1740 aagattagga actacatcac acagaagcct tactctgacg agaagtttaa actgaacttc 1800 gaaaatagta ccctggccaa cgggtgggat aagaacaagg agcctgacaa cacagctatc 1860 ctgttcatca aggatgacaa gtactatctg ggagtgatga ataagaaaaa caataagatc 1920 ttcgatgaca aagccattaa ggagaacaaa ggggaaggat acaagaaaat cgtgtataag 1980 ctgctgcccg gcgcaaataa gatgctgcct aaggtgttct tcagcgccaa gagtatcaaa 2040 ttctacaacc catccgagga catcctgcgg attagaaatc actcaacaca tactaagaac 2100 gggagccccc agaagggata tgagaaattt gagttcaaca tcgaggattg caggaagttt 2160 attgacttct acaagcagag catctccaaa caccctgaat ggaaggattt tggcttccgg 2220 ttttccgaca cacagagata taactctatc gacgagttct accgcgaggt ggaaaatcag 2280 gggtataagc tgacttttga gaacatttct gaaagttaca tcgacagcgt ggtcaatcag 2340 ggaaagctgt acctgttcca gatctataac aaagattttt cagcatacag caagggcaga 2400 ccaaacctgc atacactgta ctggaaggcc ctgttcgatg agaggaatct gcaggacgtg 2460 gtctataaac tgaacggaga ggccgaactg ttttaccgga agcagtctat tcctaagaaa 2520 atcactcacc cagctaagga ggccatcgct aacaagaaca aggacaatcc taagaaagag 2580 agcgtgttcg aatacgatct gattaaggac 
aagcggttca ccgaagataa gttctttttc 2640 cattgtccaa tcaccattaa cttcaagtca agcggcgcta acaagttcaa cgacgagatc 2700 aatctgctgc tgaaggaaaa agcaaacgat gtgcacatcc tgagcattga ccgaggagag 2760 cggcatctgg cctactatac cctggtggat ggcaaaggga atatcattaa gcaggataca 2820 ttcaacatca ttggcaatga ccggatgaaa accaactacc acgataaact ggctgcaatc 2880 gagaaggata gagactcagc taggaaggac tggaagaaaa tcaacaacat taaggagatg 2940 aaggaaggct atctgagcca ggtggtccat gagattgcaa agctggtcat cgaatacaat 3000 gccattgtgg tgttcgagga tctgaacttc ggctttaaga gggggcgctt taaggtggaa 3060 aaacaggtct atcagaagct ggagaaaatg ctgatcgaaa agctgaatta cctggtgttt 3120 aaagataacg agttcgacaa gaccggaggc gtcctgagag cctaccagct gacagctccc 3180 tttgaaactt tcaagaaaat gggaaaacag acaggcatca tctactatgt gccagccgga 3240 ttcacttcca agatctgccc cgtgaccggc tttgtcaacc agctgtaccc taaatatgag 3300 tcagtgagca agtcccagga atttttcagc aagttcgata agatctgtta taatctggac 3360 aaggggtact tcgagttttc cttcgattac aagaacttcg gcgacaaggc cgctaagggg 3420 aaatggacca ttgcctcctt cggatctcgc ctgatcaact ttcgaaattc cgataaaaac 3480 cacaattggg acactaggga ggtgtaccca accaaggagc tggaaaagct gctgaaagac 3540 tactctatcg agtatggaca tggcgaatgc atcaaggcag ccatctgtgg cgagagtgat 3600 aagaaatttt tcgccaagct gacctcagtg ctgaatacaa tcctgcagat gcggaactca 3660 aagaccggga cagaactgga ctatctgatt agccccgtgg ctgatgtcaa cggaaacttc 3720 ttcgacagca gacaggcacc caaaaatatg cctcaggatg cagacgccaa cggggcctac 3780 cacatcgggc tgaagggact gatgctgctg ggccggatca agaacaatca ggaggggaag 3840 aagctgaacc tggtcattaa gaacgaggaa tacttcgagt ttgtccagaa tagaaataac 3900 aaaaggccgg cggccacgaa aaaggccggc caggcaaaaa agaaaaaggg atcctaccca 3960 tacgatgttc cagattacgc ttatccctac gacgtgcctg attatgcata cccatatgat 4020 gtccccgact atgcctaa 4038 <210> 48 <211> 4257 <212> DNA <213> Artificial Sequence <220> <223> DNA encoding MbCpf1 <400> 48 atgctgttcc aggactttac ccacctgtat ccactgtcca agacagtgag atttgagctg 60 aagcccatcg ataggaccct ggagcacatc cacgccaaga acttcctgtc tcaggacgag 120 acaatggccg atatgcacca gaaggtgaaa gtgatcctgg acgattacca ccgcgacttc 
180 atcgccgata tgatgggcga ggtgaagctg accaagctgg ccgagttcta tgacgtgtac 240 ctgaagtttc ggaagaaccc aaaggacgat gagctgcaga agcagctgaa ggatctgcag 300 gccgtgctga gaaaggagat cgtgaagccc atcggcaatg gcggcaagta taaggccggc 360 tacgacaggc tgttcggcgc caagctgttt aaggacggca aggagctggg cgatctggcc 420 aagttcgtga tcgcacagga gggagagagc tccccaaagc tggcccacct ggcccacttc 480 gagaagtttt ccacctattt cacaggcttt cacgataacc ggaagaatat gtattctgac 540 gaggataagc acaccgccat cgcctaccgc ctgatccacg agaacctgcc ccggtttatc 600 gacaatctgc agatcctgac cacaatcaag cagaagcact ctgccctgta cgatcagatc 660 atcaacgagc tgaccgccag cggcctggac gtgtctctgg ccagccacct ggatggctat 720 cacaagctgc tgacacagga gggcatcacc gcctacaata cactgctggg aggaatctcc 780 ggagaggcag gctctcctaa gatccagggc atcaacgagc tgatcaattc tcaccacaac 840 cagcactgcc acaagagcga gagaatcgcc aagctgaggc cactgcacaa gcagatcctg 900 tccgacggca tgagcgtgtc cttcctgccc tctaagtttg ccgacgatag cgagatgtgc 960 caggccgtga acgagttcta tcgccactac gccgacgtgt tcgccaaggt gcagagcctg 1020 ttcgacggct ttgacgatca ccagaaggat ggcatctacg tggagcacaa gaacctgaat 1080 gagctgtcca agcaggcctt cggcgacttt gcactgctgg gacgcgtgct ggacggatac 1140 tatgtggatg tggtgaatcc agagttcaac gagcggtttg ccaaggccaa gaccgacaat 1200 gccaaggcca agctgacaaa ggagaaggat aagttcatca agggcgtgca ctccctggcc 1260 tctctggagc aggccatcga gcactatacc gcaaggcacg acgatgagag cgtgcaggca 1320 ggcaagctgg gacagtactt caagcacggc ctggccggag tggacaaccc catccagaag 1380 atccacagaca catcaagggc tttctggaga gggagcgccc tgcaggagag 1440 agagccctgc caaagatcaa gtccggcaag aatcctgaga tgacacagct gaggcagctg 1500 aaggagctgc tggataacgc cctgaatgtg gcccacttcg ccaagctgct gaccacaaag 1560 accacactgg acaatcagga tggcaacttc tatggcgagt ttggcgtgct gtacgacgag 1620 ctggccaaga tccccaccct gtataacaag gtgagagatt acctgagcca gaagcctttc 1680 tccaccgaga agtacaagct gaactttggc aatccaacac tgctgaatgg ctgggacctg 1740 aacaaggaga aggataattt cggcgtgatc ctgcagaagg acggctgcta ctatctggcc 1800 ctgctggaca aggcccacaa gaaggtgttt gataacgccc ctaatacagg caagagcatc 1860 tatcagaaga tgatctataa 
gtacctggag gtgaggaagc agttccccaa ggtgttcttt 1920 tccaaggagg ccatcgccat caactaccac ccttctaagg agctggtgga gatcaaggac 1980 aagggccggc agagatccga cgatgagcgc ctgaagctgt atcggtttat cctggagtgt 2040 ctgaagatcc accctaagta cgataagaag ttcgagggcg ccatcggcga catccagctg 2100 tttaagaagg ataagaaggg cagagaggtg ccaatcagcg agaaggacct gttcgataag 2160 atcaacggca tcttttctag caagcctaag ctggagatgg aggacttctt tatcggcgag 2220 ttcaagaggt ataacccaag ccaggacctg gtggatcagt ataatatcta caagaagatc 2280 gactccaacg ataatcgcaa gaaggagaat ttctacaaca atcaccccaa gtttaagaag 2340 gatctggtgc ggtactatta cgagtctatg tgcaagcacg aggagtggga ggagagcttc 2400 gagttttcca agaagctgca ggacatcggc tgttacgtgg atgtgaacga gctgtttacc 2460 gagatcgaga cacggagact gaattataag atctccttct gcaacatcaa tgccgactac 2520 atcgatgagc tggtggagca gggccagctg tatctgttcc agatctacaa caaggacttt 2580 tccccaaagg cccacggcaa gcccaatctg cacaccctgt acttcaaggc cctgttttct 2640 gggacaacc tggccgatcc tatctataag ctgaatggcg aggcccagat cttctacaga 2700 aaggcctccc tggacatgaa cgagacaaca atccacaggg ccggcgaggt gctggagaac 2760 aagaatcccg ataatcctaa gaagagacag ttcgtgtacg acatcatcaa ggataagagg 2820 tacacacagg acaagttcat gctgcacgtg ccaatcacca tgaactttgg cgtgcagggc 2880 atgacaatca aggagttcaa taagaaggtg aaccagtcta tccagcagta tgacgaggtg 2940 aacgtgatcg gcatcgatcg gggcgagaga cacctgctgt acctgaccgt gatcaatagc 3000 aagggcgaga tcctggagca gtgttccctg aacgacatca ccacagcctc tgccaatggc 3060 acacagatga ccacacctta ccacaagatc ctggataaga gggagatcga gcgcctgaac 3120 gcccgggtgg gatggggcga gatcgagaca atcaaggagc tgaagtctgg ctatctgagc 3180 cacgtggtgc accagatcag ccagctgatg ctgaagtaca acgccatcgt ggtgctggag 3240 gacctgaatt tcggctttaa gaggggccgc tttaaggtgg agaagcagat ctatcagaac 3300 ttcgagaatg ccctgatcaa gaagctgaac cacctggtgc tgaaggacaa ggccgacgat 3360 gagatcggct cttacaagaa tgccctgcag ctgaccaaca atttcacaga tctgaagagc 3420 atcggcaagc agaccggctt cctgttttat gtgcccgcct ggaacacctc taagatcgac 3480 cctgagacag gctttgtgga tctgctgaag ccaagatacg agaacatcgc ccagagccag 3540 gccttctttg gcaagttcga caagatctgc 
tataatgccg acaaggatta cttcgagttt 3600 cacatcgact acgccaagtt taccgataag gccaagaata gccgccagat ctggacaatc 3660 tgttcccacg gcgacaagcg gtacgtgtac gataagacag ccaaccagaa taagggcgcc 3720 gccaagggca tcaacgtgaa tgatgagctg aagtccctgt tcgcccgcca ccacatcaac 3780 gagaagcagc ccaacctggt catggacatc tgccagaaca atgataagga gtttcacaag 3840 tctctgatgt acctgctgaa aaccctgctg gccctgcggt acagcaacgc ctcctctgac 3900 gaggatttca tcctgtcccc cgtggcaaac gacgagggcg tgttctttaa tagcgccctg 3960 gccgacgata cacagcctca gaatgccgat gccaacggcg cctaccacat cgccctgaag 4020 ggcctgtggc tgctgaatga gctgaagaac tccgacgatc tgaacaaggt gaagctggcc 4080 atcgacaatc agacctggct gaatttcgcc cagaacagga aaaggccggc ggccacgaaa 4140 aaggccggcc aggcaaaaaa gaaaaaggga tcctacccat acgatgttcc agattacgct 4200 tatccctacg acgtgcctga ttatgcatac ccatatgatg tccccgacta tgcctaa 4257 <210> 49 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Cpf1 target sequence of CCR5-8 <400> 49 acaccgaagc agagttttta gg 22 <210> 50 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of DNMT1-3 <400> 50 agtaacagac atggaccatc 20 <210> 51 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of DNMT1-4 <400> 51 tttcccttca gctaaaataa 20 <210> 52 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of AAVS1 <400> 52 tgcttacgat ggagccagag 20 <210> 53 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of EMX1 <400> 53 aggtgtggtt ccagaaccgg 20 <210> 54 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of CCR5-1 <400> 54 tggttttgtg ggcaacatgc 20 <210> 55 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of CCR5-9 <400> 55 tagagctact gcaattattc 20 <210> 56 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of HPRT-1 <400> 56 gtgctttgat gtaatccagc 20 <210> 57 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 
sgRNA target sequence of HPRT-4 <400> 57 ctagaatgac cagtcaacag 20 <210> 58 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of HBB-1 <400> 58 tccactcctg atgctgttat 20 <210> 59 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> SpCas9 sgRNA target sequence of VEGFA <400> 59 agcgagaaca gcccagaagt 20 <210> 60 <211> 22 <212> RNA <213> Artificial Sequence <220> <223> General formula of Cpf1 crRNA <220> <221> misc_feature <222> (1) <223> n is absent, U, A, or G <220> <221> misc_feature <222> (2) <223> n is A or G <220> <221> misc_feature <222> (5) <223> n is U, A, or C <220> <221> misc_feature <222> (12) <223> n is absent, G, C, or A <220> <221> misc_feature <222> (13) <223> n is absent, A, U, C, or G <220> <221> misc_feature <222> (14) <223> n is U, G, or C <220> <221> misc_feature <222> (15) <223> n is U or G <400> 60 nnaunucuac unnnnguaga un 22 <210> 61 <211> 14 <212> RNA <213> Artificial Sequence <220> <223> General formula of SpCas9 crRNA <220> <221> misc_feature <222> (1) <223> n is a targeting sequence comprising 18-22 or 20 nucleotides <220> <221> misc_feature <222> (14) <223> n comprises 8-12 or 10 nucleotides, each of which is A, U, C, or G <400> 61 nguuuuagag cuan 14 <210> 62 <211> 61 <212> RNA <213> Artificial Sequence <220> <223> General formula of SpCas9 tracrRNA <220> <221> misc_feature <222> (1) <223> n consists of 6-20 or 8-19 nucleotides, each of which is A, U, C, or G <400> 62 nuagcaaguu aaaauaaggc uaguccguua ucaacuugaa aaaguggcac cgagucggug 60 c 61 <210> 63 <211> 80 <212> RNA <213> Artificial Sequence <220> <223> General formula of SpCas9 sgRNA <220> <221> misc_feature <222> (1) <223> n is a targeting sequence comprising 18-22 or 20 nucleotides <220> <221> misc_feature <222> (14) <223> n is a linker comprising 3-5 or 4 nucleotides <400> 63 nguuucaguu gcunaugcuc uguaaucauu uaaaaguauu uugaacggac cucuguuuga 60 cacgucugaa uaacuaaaaa 80 <210> 64 <211> 655 <212> DNA <213> Artificial Sequence <220> <223>
CMV promoter <400> 64 cgatgtacgg gccagatata cgcgttgaca ttgattattg actagttatt aatagtaatc 60 aattacgggg tcattagttc atagcccata tatggagttc cgcgttacat aacttacggt 120 aaatggcccg cctggctgac cgcccaacga cccccgccca ttgacgtcaa taatgacgta 180 tgttcccata gtaacgccaa tagggacttt ccattgacgt caatgggtgg actatttacg 240 gtaaactgcc cacttggcag tacatcaagt gtatcatatg ccaagtacgc cccctattga 300 cgtcaatgac ggtaaatggc ccgcctggca ttatgcccag tacatgacct tatgggactt 360 tcctacttgg cagtacatct acgtattagt catcgctatt accatggtga tgcggttttg 420 gcagtacatc aatgggcgtg gatagcggtt tgactcacgg ggatttccaa gtctccaccc 480 cattgacgtc aatgggagtt tgttttggca ccaaaatcaa cgggactttc caaaatgtcg 540 taacaactcc gccccattga cgcaaatggg cggtaggcgt gtacggtggg aggtctatat 600 aagcagagct ctctggctaa ctagagaacc cactgcttac tggcttatcg aaatt 655 <210> 65 <211> 2436 <212> DNA <213> Artificial Sequence <220> <223> pU6-As-crRNA <400> 65 gacgaagact caattgtcga ttagtgaacg gatctcgacg gtatcgatca cgagactagc 60 ctcgagcggc cgcccccttc accgagggcc tatttcccat gattccttca tatttgcata 120 tacgatacaa ggctgttaga gagataattg gaattaattt gactgtaaac acaaagatat 180 tagtacaaaa tacgtgacgt agaaagtaat aatttcttgg gtagtttgca gttttaaaat 240 tatgttttaa aatggactat catatgctta ccgtaacttg aaagtatttc gatttcttgg 300 ctttatatat cttgtggaaa ggacgaaaca ccgtaatttc tactcttgta gatnnnnnnn 360 nnnnnnnnnn nnnnnntttt ttctagattc gcgatgtacg ggccagatat acgcgttgac 420 attgattatt gactagttgt cttcctgcat taatgaatcg gccaacgcgc ggggagaggc 480 ggtttgcgta ttgggcgctc ttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt 540 cggctgcggc gagcggtatc agctcactca aaggcggtaa tacggttatc cacagaatca 600 ggggataacg caggaaagaa catgtgagca aaaggccagc aaaaggccag gaaccgtaaa 660 aaggccgcgt tgctggcgtt tttccatagg ctccgccccc ctgacgagca tcacaaaaat 720 cgacgctcaa gtcagaggtg gcgaaacccg acaggactat aaagatacca ggcgtttccc 780 cctggaagct ccctcgtgcg ctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc 840 gt; tcggtgtagg tcgttcgctc caagctgggc tgtgtgcacg aaccccccgt tcagcccgac 960 cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc cggtaagaca cgacttatcg 1020 
ccactggcag cagccactgg taacaggatt agcagagcga ggtatgtagg cggtgctaca 1080 gagttcttga agtggtggcc taactacggc tacactagaa ggacagtatt tggtatctgc 1140 gctctgctga agccagttac cttcggaaaa agagttggta gctcttgatc cggcaaacaa 1200 accaccgctg gtagcggtgg tttttttgtt tgcaagcagc agattacgcg cagaaaaaaa 1260 ggatctcaag aagatccttt gatcttttct acggggtctg acgctcagtg gaacgaaaac 1320 tcacgttaag ggattttggt catgagatta tcaaaaagga tcttcaccta gatcctttta 1380 aattaaaaat gaagttttaa atcaatctaa agtatatatg agtaaacttg gtctgacagt 1440 taccaatgct taatcagtga ggcacctatc tcagcgatct gtctatttcg ttcatccata 1500 gttgcctgac tccccgtcgt gtagataact acgatacggg agggcttacc atctggcccc 1560 agtgctgcaa tgataccgcg agatccacgc tcaccggctc cagatttatc agcaataaac 1620 cagccagccg gaagggccga gcgcagaagt ggtcctgcaa ctttatccgc ctccatccag 1680 tctattaatt gttgccggga agctagagta agtagttcgc cagttaatag tttgcgcaac 1740 gttgttgcca ttgctacagg catcgtggtg tcacgctcgt cgtttggtat ggcttcattc 1800 agctccggtt cccaacgatc aaggcgagtt acatgatccc ccatgttgtg caaaaaagcg 1860 gttagctcct tcggtcctcc gatcgttgtc agaagtaagt tggccgcagt gttatcactc 1920 atggttatgg cagcactgca taattctctt actgtcatgc catccgtaag atgcttttct 1980 gtgactggtg agtactcaac caagtcattc tgagaatagt gtatgcggcg accgagttgc 2040 tcttgcccgg cgtcaatacg ggataatacc gcgccacata gcagaacttt aaaagtgctc 2100 atcattggaa aacgttcttc ggggcgaaaa ctctcaagga tcttaccgct gttgagatcc 2160 agttcgatgt aacccactcg tgcacccaac tgatcttcag catcttttac tttcaccagc 2220 gtttctgggt gagcaaaaac aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca 2280 cggaaatgtt gaatactcat actcttcctt tttcaatatt attgaagcat ttatcagggt 2340 tattgtctca tgagcggata catatttgaa tgtatttaga aaaataaaca aataggggtt 2400 ccgcgcacat ttccccgaaa agtgccacct gacgtc 2436 <210> 66 <211> 2436 <212> DNA <213> Artificial Sequence <220> <223> pU6-Lb-crRNA <400> 66 gacgaagact caattgtcga ttagtgaacg gatctcgacg gtatcgatca cgagactagc 60 ctcgagcggc cgcccccttc accgagggcc tatttcccat gattccttca tatttgcata 120 tacgatacaa ggctgttaga gagataattg gaattaattt gactgtaaac acaaagatat 180 tagtacaaaa tacgtgacgt 
agaaagtaat aatttcttgg gtagtttgca gttttaaaat 240 tatgttttaa aatggactat catatgctta ccgtaacttg aaagtatttc gatttcttgg 300 ctttatatat cttgtggaaa ggacgaaaca ccgaatttct actaagtgta gatnnnnnnn 360 nnnnnnnnnn nnnnnntttt ttctagattc gcgatgtacg ggccagatat acgcgttgac 420 attgattatt gactagttgt cttcctgcat taatgaatcg gccaacgcgc ggggagaggc 480 ggtttgcgta ttgggcgctc ttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt 540 cggctgcggc gagcggtatc agctcactca aaggcggtaa tacggttatc cacagaatca 600 ggggataacg caggaaagaa catgtgagca aaaggccagc aaaaggccag gaaccgtaaa 660 aaggccgcgt tgctggcgtt tttccatagg ctccgccccc ctgacgagca tcacaaaaat 720 cgacgctcaa gtcagaggtg gcgaaacccg acaggactat aaagatacca ggcgtttccc 780 cctggaagct ccctcgtgcg ctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc 840 gt; tcggtgtagg tcgttcgctc caagctgggc tgtgtgcacg aaccccccgt tcagcccgac 960 cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc cggtaagaca cgacttatcg 1020 ccactggcag cagccactgg taacaggatt agcagagcga ggtatgtagg cggtgctaca 1080 gagttcttga agtggtggcc taactacggc tacactagaa ggacagtatt tggtatctgc 1140 gctctgctga agccagttac cttcggaaaa agagttggta gctcttgatc cggcaaacaa 1200 accaccgctg gtagcggtgg tttttttgtt tgcaagcagc agattacgcg cagaaaaaaa 1260 ggatctcaag aagatccttt gatcttttct acggggtctg acgctcagtg gaacgaaaac 1320 tcacgttaag ggattttggt catgagatta tcaaaaagga tcttcaccta gatcctttta 1380 aattaaaaat gaagttttaa atcaatctaa agtatatatg agtaaacttg gtctgacagt 1440 taccaatgct taatcagtga ggcacctatc tcagcgatct gtctatttcg ttcatccata 1500 gttgcctgac tccccgtcgt gtagataact acgatacggg agggcttacc atctggcccc 1560 agtgctgcaa tgataccgcg agatccacgc tcaccggctc cagatttatc agcaataaac 1620 cagccagccg gaagggccga gcgcagaagt ggtcctgcaa ctttatccgc ctccatccag 1680 tctattaatt gttgccggga agctagagta agtagttcgc cagttaatag tttgcgcaac 1740 gttgttgcca ttgctacagg catcgtggtg tcacgctcgt cgtttggtat ggcttcattc 1800 agctccggtt cccaacgatc aaggcgagtt acatgatccc ccatgttgtg caaaaaagcg 1860 gttagctcct tcggtcctcc gatcgttgtc agaagtaagt tggccgcagt gttatcactc 1920 atggttatgg cagcactgca taattctctt 
actgtcatgc catccgtaag atgcttttct 1980 gtgactggtg agtactcaac caagtcattc tgagaatagt gtatgcggcg accgagttgc 2040 tcttgcccgg cgtcaatacg ggataatacc gcgccacata gcagaacttt aaaagtgctc 2100 atcattggaa aacgttcttc ggggcgaaaa ctctcaagga tcttaccgct gttgagatcc 2160 agttcgatgt aacccactcg tgcacccaac tgatcttcag catcttttac tttcaccagc 2220 gtttctgggt gagcaaaaac aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca 2280 cggaaatgtt gaatactcat actcttcctt tttcaatatt attgaagcat ttatcagggt 2340 tattgtctca tgagcggata catatttgaa tgtatttaga aaaataaaca aataggggtt 2400 ccgcgcacat ttccccgaaa agtgccacct gacgtc 2436 <210> 67 <211> 293 <212> DNA <213> Artificial Sequence <220> <223> U6-As-crRNA-amplicon <400> 67 gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60 ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120 aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180 atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240 cgaaacaccg taatttctac tcttgtagat nnnnnnnnnn nnnnnnnnnnnnnn 293 <210> 68 <211> 293 <212> DNA <213> Artificial Sequence <220> <223> U6-Lb-crRNA-amplicon <400> 68 gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60 ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120 aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180 atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240 cgaaacaccg aatttctact aagtgtagat nnnnnnnnnn nnnnnnnnnnnnn 293 <210> 69 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS1 (LbCpf1-Target Sequence 1) <400> 69 tatgagcttg ctcatcagtt gcc 23 <210> 70 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS2 <400> 70 aactaactgg acacagtgtg ttt 23 <210> 71 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS3 <400> 71 attttactca tccatgtgac cat 23 <210> 72 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS4 <400> 72 actcatccat gtgaccatga gga 23 <210> 73 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS5
<400> 73 ctaaaggaca agtcaccaca gga 23 <210> 74 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS6 <400> 74 gcaagcatcc tgtactgtcc tgt 23 <210> 75 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS7 <400> 75 ggcaagcatc ctgtactgtc ctg 23 <210> 76 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS8 <400> 76 aacccagaca tatccacctc ttt 23 <210> 77 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS9 <400> 77 ttgaagggag aaaatcaagt cgt 23 <210> 78 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS10 <400> 78 gacagtggta ttattcagca cga 23 <210> 79 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> LB-TS11 <400> 79 acagtggtat tattcagcac gac 23 <210> 80 <211> 7585 <212> DNA <213> Artificial Sequence <220> <223> AAV vector containing HIF1-alpha crRNA <400> 80 cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60 ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120 aggggttcct gcggccgcac gcgtgagggc ctatttccca tgattccttc atatttgcat 180 atacgataca aggctgttag agagataatt ggaattaatt tgactgtaaa cacaaagata 240 ttagtacaaa atacgtgacg tagaaagtaa taatttcttg ggtagtttgc agttttaaaa 300 ttatgtttta aaatggacta tcatatgctt accgtaactt gaaagtattt cgatttcttg 360 gctttatata tcttgtggaa aggacgaaac accgaatttc tactaagtgt agatgcaagc 420 atcctgtact gtcctgtttt tttctagatt cgctagctag gtcttgaaag gagtgggaat 480 tggctccggt gcccgtcagt gggcagagcg cacatcgccc acagtccccg agaagttggg 540 gggaggggtc ggcaattgat ccggtgccta gagaaggtgg cgcggggtaa actgggaaag 600 tgatgtcgtg tactggctcc gcctttttcc cgagggtggg ggagaaccgt atataagtgc 660 agtagtcgcc gtgaacgttc tttttcgcaa cgggtttgcc gccagaacac aggaccggtt 720 ctagagcgct aagcttggta ccgccaccat gagcaagctg gagaagttta caaactgcta 780 ctccctgtct aagaccctga ggttcaaggc catccctgtg ggcaagaccc aggagaacat 840 cgacaataag cggctgctgg tggaggacga gaagagagcc gaggattata agggcgtgaa 900 gaagctgctg gatcgctact atctgtcttt tatcaacgac gtgctgcaca gcatcaagct 960 gaagaatctg aacaattaca tcagcctgtt ccggaagaaa accagaaccg 
agaaggagaa 1020 taaggagctg gagaacctgg agatcaatct gcggaaggag atcgccaagg ccttcaaggg 1080 caacgagggc tacaagtccc tgtttaagaa ggatatcatc gagacaatcc tgccagagtt 1140 cctggacgat aaggacgaga tcgccctggt gaacagcttc aatggcttta ccacagcctt 1200 caccggcttc tttgataaca gagagaatat gttttccgag gaggccaaga gcacatccat 1260 cgccttcagg tgtatcaacg agaatctgac ccgctacatc tctaatatgg acatcttcga 1320 gaaggtggac gccatctttg ataagcacga ggtgcaggag atcaaggaga agatcctgaa 1380 cagcgactat gatgtggagg atttctttga gggcgagttc tttaactttg tgctgacaca 1440 ggagggcatc gacgtgtata acgccatcat cggcggcttc gtgaccgaga gcggcgagaa 1500 gatcaagggc ctgaacgagt acatcaacct gtataatcag aaaaccaagc agaagctgcc 1560 taagtttaag ccactgtata agcaggtgct gagcgatcgg gagtctctga gcttctacgg 1620 cgagggctat acatccgatg aggaggtgct ggaggtgttt agaaacaccc tgaacaagaa 1680 cagcgagatc ttcagctcca tcaagaagct ggagaagctg ttcaagaatt ttgacgagta 1740 ctctagcgcc ggcatctttg tgaagaacgg ccccgccatc agcacaatct ccaaggatat 1800 cttcggcgag tggaacgtga tccgggacaa gtggaatgcc gagtatgacg atatccacct 1860 gaagaagaag gccgtggtga ccgagaagta cgaggacgat cggagaaagt ccttcaagaa 1920 gatcggctcc ttttctctgg agcagctgca ggagtacgcc gacgccgatc tgtctgtggt 1980 ggagaagctg aaggagatca tcatccagaa ggtggatgag atctacaagg tgtatggctc 2040 ctctgagaag ctgttcgacg ccgattttgt gctggagaag agcctgaaga agaacgacgc 2100 cgtggtggcc atcatgaagg acctgctgga ttctgtgaag agcttcgaga attacatcaa 2160 ggccttcttt ggcgagggca aggagacaaa cagggacgag tccttctatg gcgattttgt 2220 gctggcctac gacatcctgc tgaaggtgga ccacatctac gatgccatcc gcaattatgt 2280 gacccagaag ccctactcta aggataagtt caagctgtat tttcagaacc ctcagttcat 2340 gggcggctgg gacaaggata aggagacaga ctatcgggcc accatcctga gatacggctc 2400 caagtactat ctggccatca tggataagaa gtacgccaag tgcctgcaga agatcgacaa 2460 ggacgatgtg aacggcaatt acgagaagat caactataag ctgctgcccg gccctaataa 2520 gatgctgcca aaggtgttct tttctaagaa gtggatggcc tactataacc ccagcgagga 2580 catccagaag atctacaaga atggcacatt caagaagggc gatatgttta acctgaatga 2640 ctgtcacaag ctgatcgact tctttaagga tagcatctcc cggtatccaa agtggtccaa 
2700 tgcctacgat ttcaactttt ctgagacaga gaagtataag gacatcgccg gcttttacag 2760 agaggtggag gagcagggct ataaggtgag cttcgagtct gccagcaaga aggaggtgga 2820 taagctggtg gaggagggca agctgtatat gttccagatc tataacaagg acttttccga 2880 taagtctcac ggcacaccca atctgcacac catgtacttc aagctgctgt ttgacgagaa 2940 caatcacgga cagatcaggc tgagcggagg agcagagctg ttcatgaggc gcgcctccct 3000 gaagaaggag gagctggtgg tgcacccagc caactcccct atcgccaaca agaatccaga 3060 taatcccaag aaaaccacaa ccctgtccta cgacgtgtat aaggataaga ggttttctga 3120 ggaccagtac gagctgcaca tcccaatcgc catcaataag tgccccaaga acatcttcaa 3180 gatcaataca gaggtgcgcg tgctgctgaa gcacgacgat aacccctatg tgatcggcat 3240 cgataggggc gagcgcaatc tgctgtatat cgtggtggtg gacggcaagg gcaacatcgt 3300 ggagcagtat tccctgaacg agatcatcaa caacttcaac ggcatcagga tcaagacaga 3360 ttaccactct ctgctggaca agaaggagaa ggagaggttc gaggcccgcc agaactggac 3420 ctccatcgag aatatcaagg agctgaaggc cggctatatc tctcaggtgg tgcacaagat 3480 ctgcgagctg gtggagaagt acgatgccgt gatcgccctg gaggacctga actctggctt 3540 taagaatagc cgcgtgaagg tggagaagca ggtgtatcag aagttcgaga agatgctgat 3600 cgataagctg aactacatgg tggacaagaa gtctaatcct tgtgcaacag gcggcgccct 3660 gaagggctat cagatcacca ataagttcga gagctttaag tccatgtcta cccagaacgg 3720 cttcatcttt tacatccctg cctggctgac atccaagatc gatccatcta ccggctttgt 3780 gaacctgctg aaaaccaagt ataccagcat cgccgattcc aagaagttca tcagctcctt 3840 tgacaggatc atgtacgtgc ccgaggagga tctgttcgag tttgccctgg actataagaa 3900 cttctctcgc acagacgccg attacatcaa gaagtggaag ctgtactcct acggcaaccg 3960 gatcagaatc ttccggaatc ctaagaagaa caacgtgttc gactgggagg aggtgtgcct 4020 gaccagcgcc tataaggagc tgttcaacaa gtacggcatc aattatcagc agggcgatat 4080 cagagccctg ctgtgcgagc agtccgacaa ggccttctac tctagcttta tggccctgat 4140 gagcctgatg ctgcagatgc ggaacagcat cacaggccgc accgacgtgg attttctgat 4200 cagccctgtg aagaactccg acggcatctt ctacgatagc cggaactatg aggcccagga 4260 gaatgccatc ctgccaaaga acgccgacgc caatggcgcc tataacatcg ccagaaaggt 4320 gctgtgggcc atcggccagt tcaagaaggc cgaggacgag aagctggata aggtgaagat 4380 
cgccatctct aacaaggagt ggctggagta cgcccagacc agcgtgaagc acaaaaggcc 4440 ggcggccacg aaaaaggccg gccaggcaaa aaagaaaaag ggatcctacc catacgatgt 4500 tccagattac gcttatccct acgacgtgcc tgattatgca tacccatatg atgtccccga 4560 ctatgcctaa gaattcctcg ctgatcagcc tcgactgtgc cttctagttg ccagccatct 4620 gttgtttgcc cctcccccgt gccttccttg accctggaag gtgccactcc cactgtcctt 4680 tcctaataaa atgaggaaat tgcatcgcat tgtctgagta ggtgtcattc tattctgggg 4740 ggtggggtgg ggcaggacag caagggggag gattgggaag acaatagcag gcatgctggg 4800 gatgcggtgg gctctatggg gtaaccacgt gcggaccgag cggccgcagg aacccctagt 4860 gatggagttg gccactccct ctctgcgcgc tcgctcgctc actgaggccg ggcgaccaaa 4920 ggtcgcccga cgcccgggct ttgcccgggc ggcctcagtg agcgagcgag cgcgcagctg 4980 cctgcagggg cgcctgatgc ggtattttct ccttacgcat ctgtgcggta tttcacaccg 5040 catacgtcaa agcaaccata gtacgcgccc tgtagcggcg cattaagcgc ggcgggtgtg 5100 gtggttacgc gcagcgtgac cgctacactt gccagcgccc tagcgcccgc tcctttcgct 5160 ttcttccctt cctttctcgc cacgttcgcc ggctttcccc gtcaagctct aaatcggggg 5220 ctccctttag ggttccgatt tagtgcttta cggcacctcg accccaaaaa acttgatttg 5280 ggtgatggtt cacgtagtgg gccatcgccc tgatagacgg tttttcgccc tttgacgttg 5340 gagtccacgt tctttaatag tggactcttg ttccaaactg gaacaacact caaccctatc 5400 tcgggctatt cttttgattt ataagggatt ttgccgattt cggcctattg gttaaaaaat 5460 gagctgattt aacaaaaatt taacgcgaat tttaacaaaa tattaacgtt tacaatttta 5520 tggtgcactc tcagtacaat ctgctctgat gccgcatagt taagccagcc ccgacacccg 5580 ccaacacccg ctgacgcgcc ctgacgggct tgtctgctcc cggcatccgc ttacagacaa 5640 gctgtgaccg tctccgggag ctgcatgtgt cagaggtttt caccgtcatc accgaaacgc 5700 gcgagacgaa agggcctcgt gatacgccta tttttatagg ttaatgtcat gataataatg 5760 gtttcttaga cgtcaggtgg cacttttcgg ggaaatgtgc gcggaacccc tatttgttta 5820 tttttctaaa tacattcaaa tatgtatccg ctcatgagac aataaccctg ataaatgctt 5880 caataatatt gaaaaaggaa gagtatgagt attcaacatt tccgtgtcgc ccttattccc 5940 ttttttgcgg cattttgcct tcctgttttt gctcacccag aaacgctggt gaaagtaaaa 6000 gatgctgaag atcagttggg tgcacgagtg ggttacatcg aactggatct caacagcggt 6060 aagatccttg 
agagttttcg ccccgaagaa cgttttccaa tgatgagcac ttttaaagtt 6120 ctgctatgtg gcgcggtatt atcccgtatt gacgccgggc aagagcaact cggtcgccgc 6180 atacactatt ctcagaatga cttggttgag tactcaccag tcacagaaaa gcatcttacg 6240 gatggcatga cagtaagaga attatgcagt gctgccataa ccatgagtga taacactgcg 6300 gccaacttac ttctgacaac gatcggagga ccgaaggagc taaccgcttt tttgcacaac 6360 atggggatc atgtaactcg ccttgatcgt tgggaaccgg agctgaatga agccatacca 6420 aacgacgagc gtgacaccac gatgcctgta gcaatggcaa caacgttgcg caaactatta 6480 actggcgaac tacttactct agcttcccgg caacaattaa tagactggat ggaggcggat 6540 aaagttgcag gaccacttct gcgctcggcc cttccggctg gctggtttat tgctgataaa 6600 tctggagccg gtgagcgtgg gtctcgcggt atcattgcag cactggggcc agatggtaag 6660 ccctcccgta tcgtagttat ctacacgacg gggagtcagg caactatgga tgaacgaaat 6720 agacagatcg ctgagatagg tgcctcactg attaagcatt ggtaactgtc agaccaagtt 6780 tactcatata tactttagat tgatttaaaa cttcattttt aatttaaaag gatctaggtg 6840 aagatccttt ttgataatct catgaccaaa atcccttaac gtgagttttc gttccactga 6900 gcgtcagacc ccgtagaaaa gatcaaagga tcttcttgag atcctttttt tctgcgcgta 6960 atctgctgct tgcaaacaaa aaaaccaccg ctaccagcgg tggtttgttt gccggatcaa 7020 gagctaccaa ctctttttcc gaaggtaact ggcttcagca gagcgcagat accaaatact 7080 gtccttctag tgtagccgta gttaggccac cacttcaaga actctgtagc accgcctaca 7140 tacctcgctc tgctaatcct gttaccagtg gctgctgcca gtggcgataa gtcgtgtctt 7200 accgggttgg actcaagacg atagttaccg gataaggcgc agcggtcggg ctgaacgggg 7260 ggttcgtgca cacagcccag cttggagcga acgacctaca ccgaactgag atacctacag 7320 cgtgagctat gagaaagcgc cacgcttccc gaagggagaa aggcggacag gtatccggta 7380 agcggcaggg tcggaacagg agagcgcacg agggagcttc cagggggaaa cgcctggtat 7440 ctttatagtc ctgtcgggtt tcgccacctc tgacttgagc gtcgattttt gtgatgctcg 7500 tcaggggggc ggagcctatg gaaaaacgcc agcaacgcgg cctttttacg gttcctggcc 7560 ttttgctggc cttttgctca catgt 7585 <210> 81 <211> 90 <212> PRT <213> Artificial Sequence <220> <223> FRB protein <400> 81 Glu Met Trp His Glu Gly Leu Glu Glu Ala Ser Arg Leu Tyr Phe Gly   1 5 10 15 Glu Arg Asn Val Lys Gly Met Phe Glu Val 
Leu Glu Pro Leu His Ala              20 25 30 Met Met Glu Arg Gly Pro Gln Thr Leu Lys Glu Thr Ser Phe Asn Gln          35 40 45 Ala Tyr Gly Arg Asp Leu Met Glu Ala Gln Glu Trp Cys Arg Lys Tyr      50 55 60 Met Lys Ser Gly Asn Val Lys Asp Leu Thr Gln Ala Trp Asp Leu Tyr  65 70 75 80 Tyr His Val Phe Arg Arg Ile Ser Lys Gln                  85 90 <210> 82 <211> 107 <212> PRT <213> Artificial Sequence <220> <223> FKBP protein <400> 82 Gly Val Gln Val Glu Thr Ile Ser Pro Gly Asp Gly Arg Thr Phe Pro   1 5 10 15 Lys Arg Gly Gln Thr Cys Val Val His Tyr Thr Gly Met Leu Glu Asp              20 25 30 Gly Lys Lys Phe Asp Ser Ser Arg Asp Arg Asn Lys Pro Phe Lys Phe          35 40 45 Met Leu Gly Lys Gln Glu Val Ile Arg Gly Trp Glu Glu Gly Val Ala      50 55 60 Gln Met Ser Val Gly Gln Arg Ala Lys Leu Thr Ile Ser Pro Asp Tyr  65 70 75 80 Ala Tyr Gly Ala Thr Gly His Pro Gly Ile Ile Pro Pro His Ala Thr                  85 90 95 Leu Val Phe Asp Val Glu Leu Leu Lys Leu Glu             100 105 <210> 83 <211> 2703 <212> DNA <213> Artificial Sequence <220> Split-1-AsCpf1 Domain 1 coding DNA <400> 83 atgacacagt tcgagggctt taccaacctg tatcaggtga gcaagacact gcggtttgag 60 ctgatcccac agggcaagac cctgaagcac atccaggagc agggcttcat cgaggaggac 120 aaggcccgca atgatcacta caaggagctg aagcccatca tcgatcggat ctacaagacc 180 tatgccgacc agtgcctgca gctggtgcag ctggattggg agaacctgag cgccgccatc 240 gactcctata gaaaggagaa aaccgaggag acaaggaacg ccctgatcga ggagcaggcc 300 acatatcgca atgccatcca cgactacttc atcggccgga cagacaacct gaccgatgcc 360 atcaataaga gacacgccga gatctacaag ggcctgttca aggccgagct gtttaatggc 420 aaggtgctga agcagctggg caccgtgacc acaaccgagc acgagaacgc cctgctgcgg 480 agcttcgaca agtttacaac ctacttctcc ggcttttatg agaacaggaa gaacgtgttc 540 agcgccgagg atatcagcac agccatccca caccgcatcg tgcaggacaa cttccccaag 600 tttaaggaga attgtcacat cttcacacgc ctgatcaccg ccgtgcccag cctgcgggag 660 cactttgaga acgtgaagaa ggccatcggc atcttcgtga gcacctccat cgaggaggtg 720 ttttccttcc ctttttataa ccagctgctg acacagaccc agatcgacct gtataaccag 
780 ctgctgggag gaatctctcg ggaggcaggc accgagaaga tcaagggcct gaacgaggtg 840 ctgaatctgg ccatccagaa gaatgatgag acagcccaca tcatcgcctc cctgccacac 900 agattcatcc ccctgtttaa gcagatcctg tccgatagga acaccctgtc tttcatcctg 960 gaggagttta agagcgacga ggaagtgatc cagtccttct gcaagtacaa gacactgctg 1020 agaaacgaga acgtgctgga gacagccgag gccctgttta acgagctgaa cagcatcgac 1080 ctgacacaca tcttcatcag ccacaagaag ctggagacaa tcagcagcgc cctgtgcgac 1140 cactgggata cactgaggaa tgccctgtat gagcggagaa tctccgagct gacaggcaag 1200 atcaccaagt ctgccaagga gaaggtgcag cgcagcctga agcacgagga tatcaacctg 1260 caggagatca tctctgccgc aggcaaggag ctgagcgagg ccttcaagca gaaaaccagc 1320 gagatcctgt cccacgcaca cgccgccctg gatcagccac tgcctacaac cctgaagaag 1380 caggaggaga aggagatcct gaagtctcag ctggacagcc tgctgggcct gtaccacctg 1440 ctggactggt ttgccgtgga tgagtccaac gaggtggacc ccgagttctc tgcccggctg 1500 accggcatca agctggagat ggagccttct ctgagcttct acaacaaggc cagaaattat 1560 gccaccaaga agccctactc cgtggagaag ttcaagctga actttcagat gcctacactg 1620 gcctctggct gggacgtgaa taaggagaag aacaatggcg ccatcctgtt tgtgaagaac 1680 ggcctgtact atctgggcat catgccaaag cagaagggca ggtataaggc cctgagcttc 1740 gagcccacag agaaaaccag cgagggcttt gataagatgt actatgacta cttccctgat 1800 gccgccaaga tgatcccaaa gtgcagcacc cagctgaagg ccgtgacagc ccactttcag 1860 acccacacaa cccccatcct gctgtccaac aatttcatcg agcctctgga gatcacaaag 1920 gagatctacg acctgaacaa tcctgagaag gagccaaaga agtttcagac agcctacgcc 1980 aagaaaaccg gcgaccagaa gggctacaga gaggccctgt gcaagtggat cgacttcaca 2040 agggattttc tgtccaagta taccaagaca acctctatcg atctgtctag cctgcggcca 2100 tcctctcagt ataaggacct gggcgagtac tatgccgagc tgaatcccct gctgtaccac 2160 atcagcttcc agagaatcgc cgagaaggag atcatggatg ccgtggagac aggcaagctg 2220 tacctgttcc agatctataa caaggacttt gccaagggcc accacggcaa gcctaatctg 2280 cacacactgt attggaccgg cctgttttct ccagagaacc tggccaagac aagcatcaag 2340 ctgaatggcc aggccgagct gttctaccgc cctaagtcca ggatgaagag gatggcacac 2400 cggctgggag agaagatgct gaacaagaag ctgaaggatc agaaaacccc aatccccgac 2460 accctgtacc 
aggagctgta cgactatgtg aatcacagac tgtcccacga cctgtctgat 2520 gaggccaggg ccctgctgcc caacgtgatc accaaggagg tgtctcacga gatcatcaag 2580 gataggcgct ttaccagcga caagttcttt ttccacgtgc ctatcacact gaactatcag 2640 gccgccaatt ccccatctaa gttcaaccag agggtgaatg cctacctgaa ggagcacccc 2700 gag 2703 <210> 84 <211> 1218 <212> DNA <213> Artificial Sequence <220> <223> Split-1-AsCpf1 Domain 2 coding DNA <400> 84 acacctatca tcggcatcga tcggggcgag agaaacctga tctatatcac agtgatcgac 60 tccaccggca agatcctgga gcagcggagc ctgaacacca tccagcagtt tgattaccag 120 aagaagctgg acaacaggga gaaggagagg gtggcagcaa ggcaggcctg gtctgtggtg 180 ggcacaatca aggatctgaa gcagggctat ctgagccagg tcatccacga gatcgtggac 240 ctgatgatcc actaccaggc cgtggtggtg ctggagaacc tgaatttcgg ctttaagagc 300 aagaggaccg gcatcgccga gaaggccgtg taccagcagt tcgagaagat gctgatcgat 360 aagctgaatt gcctggtgct gaaggactat ccagcagaga aagtgggagg cgtgctgaac 420 ccataccagc tgacagacca gttcacctcc tttgccaaga tgggcaccca gtctggcttc 480 ctgttttacg tgcctgcccc atatacatct aagatcgatc ccctgaccgg cttcgtggac 540 cccttcgtgt ggaaaaccat caagaatcac gagagccgca agcacttcct ggagggcttc 600 gactttctgc actacgacgt gaaaaccggc gacttcatcc tgcactttaa gatgaacaga 660 aatctgtcct tccagagggg cctgcccggc tttatgcctg catgggatat cgtgttcgag 720 aagaacgaga cacagtttga cgccaagggc acccctttca tcgccggcaa gagaatcgtg 780 ccagtgatcg agaatcacag attcaccggc agataccggg acctgtatcc tgccaacgag 840 ctgatcgccc tgctggagga gaagggcatc gtgttcaggg atggctccaa catcctgcca 900 aagctgctgg agaatgacga ttctcacgcc atcgacacca tggtggccct gatccgcagc 960 gtgctgcaga tgcggaactc caatgccgcc acaggcgagg actatatcaa cagccccgtg 1020 cgcgatctga atggcgtgtg cttcgactcc cggtttcaga acccagagtg gcccatggac 1080 gccgatgcca atggcgccta ccacatcgcc ctgaagggcc agctgctgct gaatcacctg 1140 aaggagagca aggatctgaa gctgcagaac ggcatctcca atcaggactg gctggcctac 1200 atccaggagc tgcgcaac 1218 <210> 85 <211> 2658 <212> DNA <213> Artificial Sequence <220> <223> Split-2-AsCpf1 Domain 1 coding DNA <400> 85 atgacacagt tcgagggctt taccaacctg tatcaggtga gcaagacact gcggtttgag 
60 ctgatcccac agggcaagac cctgaagcac atccaggagc agggcttcat cgaggaggac 120 aaggcccgca atgatcacta caaggagctg aagcccatca tcgatcggat ctacaagacc 180 tatgccgacc agtgcctgca gctggtgcag ctggattggg agaacctgag cgccgccatc 240 gactcctata gaaaggagaa aaccgaggag acaaggaacg ccctgatcga ggagcaggcc 300 acatatcgca atgccatcca cgactacttc atcggccgga cagacaacct gaccgatgcc 360 atcaataaga gacacgccga gatctacaag ggcctgttca aggccgagct gtttaatggc 420 aaggtgctga agcagctggg caccgtgacc acaaccgagc acgagaacgc cctgctgcgg 480 agcttcgaca agtttacaac ctacttctcc ggcttttatg agaacaggaa gaacgtgttc 540 agcgccgagg atatcagcac agccatccca caccgcatcg tgcaggacaa cttccccaag 600 tttaaggaga attgtcacat cttcacacgc ctgatcaccg ccgtgcccag cctgcgggag 660 cactttgaga acgtgaagaa ggccatcggc atcttcgtga gcacctccat cgaggaggtg 720 ttttccttcc ctttttataa ccagctgctg acacagaccc agatcgacct gtataaccag 780 ctgctgggag gaatctctcg ggaggcaggc accgagaaga tcaagggcct gaacgaggtg 840 ctgaatctgg ccatccagaa gaatgatgag acagcccaca tcatcgcctc cctgccacac 900 agattcatcc ccctgtttaa gcagatcctg tccgatagga acaccctgtc tttcatcctg 960 gaggagttta agagcgacga ggaagtgatc cagtccttct gcaagtacaa gacactgctg 1020 agaaacgaga acgtgctgga gacagccgag gccctgttta acgagctgaa cagcatcgac 1080 ctgacacaca tcttcatcag ccacaagaag ctggagacaa tcagcagcgc cctgtgcgac 1140 cactgggata cactgaggaa tgccctgtat gagcggagaa tctccgagct gacaggcaag 1200 atcaccaagt ctgccaagga gaaggtgcag cgcagcctga agcacgagga tatcaacctg 1260 caggagatca tctctgccgc aggcaaggag ctgagcgagg ccttcaagca gaaaaccagc 1320 gagatcctgt cccacgcaca cgccgccctg gatcagccac tgcctacaac cctgaagaag 1380 caggaggaga aggagatcct gaagtctcag ctggacagcc tgctgggcct gtaccacctg 1440 ctggactggt ttgccgtgga tgagtccaac gaggtggacc ccgagttctc tgcccggctg 1500 accggcatca agctggagat ggagccttct ctgagcttct acaacaaggc cagaaattat 1560 gccaccaaga agccctactc cgtggagaag ttcaagctga actttcagat gcctacactg 1620 gcctctggct gggacgtgaa taaggagaag aacaatggcg ccatcctgtt tgtgaagaac 1680 ggcctgtact atctgggcat catgccaaag cagaagggca ggtataaggc cctgagcttc 1740 gagcccacag agaaaaccag 
cgagggcttt gataagatgt actatgacta cttccctgat 1800 gccgccaaga tgatcccaaa gtgcagcacc cagctgaagg ccgtgacagc ccactttcag 1860 acccacacaa cccccatcct gctgtccaac aatttcatcg agcctctgga gatcacaaag 1920 gagatctacg acctgaacaa tcctgagaag gagccaaaga agtttcagac agcctacgcc 1980 aagaaaaccg gcgaccagaa gggctacaga gaggccctgt gcaagtggat cgacttcaca 2040 agggattttc tgtccaagta taccaagaca acctctatcg atctgtctag cctgcggcca 2100 tcctctcagt ataaggacct gggcgagtac tatgccgagc tgaatcccct gctgtaccac 2160 atcagcttcc agagaatcgc cgagaaggag atcatggatg ccgtggagac aggcaagctg 2220 tacctgttcc agatctataa caaggacttt gccaagggcc accacggcaa gcctaatctg 2280 cacacactgt attggaccgg cctgttttct ccagagaacc tggccaagac aagcatcaag 2340 ctgaatggcc aggccgagct gttctaccgc cctaagtcca ggatgaagag gatggcacac 2400 cggctgggag agaagatgct gaacaagaag ctgaaggatc agaaaacccc aatccccgac 2460 accctgtacc aggagctgta cgactatgtg aatcacagac tgtcccacga cctgtctgat 2520 gaggccaggg ccctgctgcc caacgtgatc accaaggagg tgtctcacga gatcatcaag 2580 gataggcgct ttaccagcga caagttcttt ttccacgtgc ctatcacact gaactatcag 2640 gccgccaatt ccccatct 2658 <210> 86 <211> 1263 <212> DNA <213> Artificial Sequence <220> <223> Split-2-AsCpf1 Domain 2 coding DNA <400> 86 aagttcaacc agagggtgaa tgcctacctg aaggagcacc ccgagacacc tatcatcggc 60 atcgatcggg gcgagagaaa cctgatctat atcacagtga tcgactccac cggcaagatc 120 ctggagcagc ggagcctgaa caccatccag cagtttgatt accagaagaa gctggacaac 180 agggagaagg agagggtggc agcaaggcag gcctggtctg tggtgggcac aatcaaggat 240 ctgaagcagg gctatctgag ccaggtcatc cacgagatcg tggacctgat gatccactac 300 caggccgtgg tggtgctgga gaacctgaat ttcggcttta agagcaagag gaccggcatc 360 gccgagaagg ccgtgtacca gcagttcgag aagatgctga tcgataagct gaattgcctg 420 gtgctgaagg actatccagc agagaaagtg ggaggcgtgc tgaacccata ccagctgaca 480 gaccagttca cctcctttgc caagatgggc acccagtctg gcttcctgtt ttacgtgcct 540 gccccatata catctaagat cgatcccctg accggcttcg tggacccctt cgtgtggaaa 600 cctcagagac gcgtgaaaa ccggcgactt catcctgcac tttaagatga acagaaatct gtccttccag 720 aggggcctgc ccggctttat gcctgcatgg 
gatatcgtgt tcgagaagaa cgagacacag 780 tttgacgcca agggcacccc tttcatcgcc ggcaagagaa tcgtgccagt gatcgagaat 840 cacagattca ccggcagata ccgggacctg tatcctgcca acgagctgat cgccctgctg 900 gaggagaagg gcatcgtgtt cagggatggc tccaacatcc tgccaaagct gctggagaat 960 gacgattctc acgccatcga caccatggtg gccctgatcc gcagcgtgct gcagatgcgg 1020 aactccaatg ccgccacagg cgaggactat atcaacagcc ccgtgcgcga tctgaatggc 1080 gtgtgcttcg actcccggtt tcagaaccca gagtggccca tggacgccga tgccaatggc 1140 gcctaccaca tcgccctgaa gggccagctg ctgctgaatc acctgaagga gagcaaggat 1200 ctgaagctgc agaacggcat ctccaatcag gactggctgg cctacatcca ggagctgcgc 1260 aac 1263 <210> 87 <211> 1197 <212> DNA <213> Artificial Sequence <220> Split-3-AsCpf1 Domain 1 coding DNA <400> 87 atgacacagt tcgagggctt taccaacctg tatcaggtga gcaagacact gcggtttgag 60 ctgatcccac agggcaagac cctgaagcac atccaggagc agggcttcat cgaggaggac 120 aaggcccgca atgatcacta caaggagctg aagcccatca tcgatcggat ctacaagacc 180 tatgccgacc agtgcctgca gctggtgcag ctggattggg agaacctgag cgccgccatc 240 gactcctata gaaaggagaa aaccgaggag acaaggaacg ccctgatcga ggagcaggcc 300 acatatcgca atgccatcca cgactacttc atcggccgga cagacaacct gaccgatgcc 360 atcaataaga gacacgccga gatctacaag ggcctgttca aggccgagct gtttaatggc 420 aaggtgctga agcagctggg caccgtgacc acaaccgagc acgagaacgc cctgctgcgg 480 agcttcgaca agtttacaac ctacttctcc ggcttttatg agaacaggaa gaacgtgttc 540 agcgccgagg atatcagcac agccatccca caccgcatcg tgcaggacaa cttccccaag 600 tttaaggaga attgtcacat cttcacacgc ctgatcaccg ccgtgcccag cctgcgggag 660 cactttgaga acgtgaagaa ggccatcggc atcttcgtga gcacctccat cgaggaggtg 720 ttttccttcc ctttttataa ccagctgctg acacagaccc agatcgacct gtataaccag 780 ctgctgggag gaatctctcg ggaggcaggc accgagaaga tcaagggcct gaacgaggtg 840 ctgaatctgg ccatccagaa gaatgatgag acagcccaca tcatcgcctc cctgccacac 900 agattcatcc ccctgtttaa gcagatcctg tccgatagga acaccctgtc tttcatcctg 960 gaggagttta agagcgacga ggaagtgatc cagtccttct gcaagtacaa gacactgctg 1020 agaaacgaga acgtgctgga gacagccgag gccctgttta acgagctgaa cagcatcgac 1080 ctgacacaca tcttcatcag 
ccacaagaag ctggagacaa tcagcagcgc cctgtgcgac 1140 cactgggata cactgaggaa tgccctgtat gagcggagaa tctccgagct gacaggc 1197 <210> 88 <211> 2724 <212> DNA <213> Artificial Sequence <220> <223> Split-3-AsCpf1 Domain 2 coding DNA <400> 88 aagatcacca agtctgccaa ggagaaggtg cagcgcagcc tgaagcacga ggatatcaac 60 ctgcaggaga tcatctctgc cgcaggcaag gagctgagcg aggccttcaa gcagaaaacc 120 agcgagatcc tgtcccacgc acacgccgcc ctggatcagc cactgcctac aaccctgaag 180 aagcaggagg agaaggagat cctgaagtct cagctggaca gcctgctggg cctgtaccac 240 ctgctggact ggtttgccgt ggatgagtcc aacgaggtgg accccgagtt ctctgcccgg 300 ctgaccggca tcaagctgga gatggagcct tctctgagct tctacaacaa ggccagaaat 360 tatgccacca agaagcccta ctccgtggag aagttcaagc tgaactttca gatgcctaca 420 ctggcctctg gctgggacgt gaataaggag aagaacaatg gcgccatcct gtttgtgaag 480 aacggcctgt actatctggg catcatgcca aagcagaagg gcaggtataa ggccctgagc 540 ttcgagccca cagagaaaac cagcgagggc tttgataaga tgtactatga ctacttccct 600 gatgccgcca agatgatccc aaagtgcagc acccagctga aggccgtgac agcccacttt 660 cagacccaca caacccccat cctgctgtcc aacaatttca tcgagcctct ggagatcaca 720 aaggagatct acgacctgaa caatcctgag aaggagccaa agaagtttca gacagcctac 780 gccaagaaaa ccggcgacca gaagggctac agagaggccc tgtgcaagtg gatcgacttc 840 acaagggatt ttctgtccaa gtataccaag acaacctcta tcgatctgtc tagcctgcgg 900 ccatcctctc agtataagga cctgggcgag tactatgccg agctgaatcc cctgctgtac 960 cacatcagct tccagagaat cgccgagaag gagatcatgg atgccgtgga gacaggcaag 1020 ctgtacctgt tccagatcta taacaaggac tttgccaagg gccaccacgg caagcctaat 1080 ctgcacacac tgtattggac cggcctgttt tctccagaga acctggccaa gacaagcatc 1140 aagctgaatg gccaggccga gctgttctac cgccctaagt ccaggatgaa gaggatggca 1200 caccggctgg gagagaagat gctgaacaag aagctgaagg atcagaaaac cccaatcccc 1260 gacaccctgt accaggagct gtacgactat gtgaatcaca gactgtccca cgacctgtct 1320 gatgaggcca gggccctgct gcccaacgtg atcaccaagg aggtgtctca cgagatcatc 1380 aaggataggc gctttaccag cgacaagttc tttttccacg tgcctatcac actgaactat 1440 caggccgcca attccccatc taagttcaac cagagggtga atgcctacct gaaggagcac 1500 cccgagacac 
ctatcatcgg catcgatcgg ggcgagagaa acctgatcta tatcacagtg 1560 atcgactcca ccggcaagat cctggagcag cggagcctga acaccatcca gcagtttgat 1620 taccagaaga agctggacaa cagggagaag gagagggtgg cagcaaggca ggcctggtct 1680 gtggtgggca caatcaagga tctgaagcag ggctatctga gccaggtcat ccacgagatc 1740 gtggacctga tgatccacta ccaggccgtg gtggtgctgg agaacctgaa tttcggcttt 1800 aagagcaaga ggaccggcat cgccgagaag gccgtgtacc agcagttcga gaagatgctg 1860 atcgataagc tgaattgcct ggtgctgaag gactatccag cagagaaagt gggaggcgtg 1920 ctgaacccat accagctgac agaccagttc acctcctttg ccaagatggg cacccagtct 1980 ggcttcctgt tttacgtgcc tgccccatat acatctaaga tcgatcccct gaccggcttc 2040 gtggacccct tcgtgtggaa aaccatcaag aatcacgaga gccgcaagca cttcctggag 2100 ggcttcgact ttctgcacta cgacgtgaaa accggcgact tcatcctgca ctttaagatg 2160 aacagaaatc tgtccttcca gaggggcctg cccggcttta tgcctgcatg ggatatcgtg 2220 ttcgagaaga acgagacaca gtttgacgcc aagggcaccc ctttcatcgc cggcaagaga 2280 atcgtgccag tgatcgagaa tcacagattc accggcagat accgggacct gtatcctgcc 2340 aacgagctga tcgccctgct ggaggagaag ggcatcgtgt tcagggatgg ctccaacatc 2400 ctgccaaagc tgctggagaa tgacgattct cacgccatcg acaccatggt ggccctgatc 2460 cgcagcgtgc tgcagatgcg gaactccaat gccgccacag gcgaggacta tatcaacagc 2520 cccgtgcgcg atctgaatgg cgtgtgcttc gactcccggt ttcagaaccc agagtggccc 2580 atggacgccg atgccaatgg cgcctaccac atcgccctga agggccagct gctgctgaat 2640 cacctgaagg agagcaagga tctgaagctg cagaacggca tctccaatca ggactggctg 2700 gcctacatcc aggagctgcg caac 2724 <210> 89 <211> 1578 <212> DNA <213> Artificial Sequence <220> <223> Split-4-AsCpf1 Domain 1 coding DNA <400> 89 atgacacagt tcgagggctt taccaacctg tatcaggtga gcaagacact gcggtttgag 60 ctgatcccac agggcaagac cctgaagcac atccaggagc agggcttcat cgaggaggac 120 aaggcccgca atgatcacta caaggagctg aagcccatca tcgatcggat ctacaagacc 180 tatgccgacc agtgcctgca gctggtgcag ctggattggg agaacctgag cgccgccatc 240 gactcctata gaaaggagaa aaccgaggag acaaggaacg ccctgatcga ggagcaggcc 300 acatatcgca atgccatcca cgactacttc atcggccgga cagacaacct gaccgatgcc 360 atcaataaga gacacgccga 
gatctacaag ggcctgttca aggccgagct gtttaatggc 420 aaggtgctga agcagctggg caccgtgacc acaaccgagc acgagaacgc cctgctgcgg 480 agcttcgaca agtttacaac ctacttctcc ggcttttatg agaacaggaa gaacgtgttc 540 agcgccgagg atatcagcac agccatccca caccgcatcg tgcaggacaa cttccccaag 600 tttaaggaga attgtcacat cttcacacgc ctgatcaccg ccgtgcccag cctgcgggag 660 cactttgaga acgtgaagaa ggccatcggc atcttcgtga gcacctccat cgaggaggtg 720 ttttccttcc ctttttataa ccagctgctg acacagaccc agatcgacct gtataaccag 780 ctgctgggag gaatctctcg ggaggcaggc accgagaaga tcaagggcct gaacgaggtg 840 ctgaatctgg ccatccagaa gaatgatgag acagcccaca tcatcgcctc cctgccacac 900 agattcatcc ccctgtttaa gcagatcctg tccgatagga acaccctgtc tttcatcctg 960 gaggagttta agagcgacga ggaagtgatc cagtccttct gcaagtacaa gacactgctg 1020 agaaacgaga acgtgctgga gacagccgag gccctgttta acgagctgaa cagcatcgac 1080 ctgacacaca tcttcatcag ccacaagaag ctggagacaa tcagcagcgc cctgtgcgac 1140 cactgggata cactgaggaa tgccctgtat gagcggagaa tctccgagct gacaggcaag 1200 atcaccaagt ctgccaagga gaaggtgcag cgcagcctga agcacgagga tatcaacctg 1260 caggagatca tctctgccgc aggcaaggag ctgagcgagg ccttcaagca gaaaaccagc 1320 gagatcctgt cccacgcaca cgccgccctg gatcagccac tgcctacaac cctgaagaag 1380 caggaggaga aggagatcct gaagtctcag ctggacagcc tgctgggcct gtaccacctg 1440 ctggactggt ttgccgtgga tgagtccaac gaggtggacc ccgagttctc tgcccggctg 1500 accggcatca agctggagat ggagccttct ctgagcttct acaacaaggc cagaaattat 1560 gccaccaaga agccctac 1578 <210> 90 <211> 2343 <212> DNA <213> Artificial Sequence <220> <223> Split-4-AsCpf1 Domain 2 coding DNA <400> 90 tccgtggaga agttcaagct gaactttcag atgcctacac tggcctctgg ctgggacgtg 60 aataaggaga agaacaatgg cgccatcctg tttgtgaaga acggcctgta ctatctgggc 120 atcatgccaa agcagaaggg caggtataag gccctgagct tcgagcccac agagaaaacc 180 agcgagggct ttgataagat gtactatgac tacttccctg atgccgccaa gatgatccca 240 aagtgcagca cccagctgaa ggccgtgaca gcccactttc agacccacac aacccccatc 300 ctgctgtcca acaatttcat cgagcctctg gagatcacaa aggagatcta cgacctgaac 360 aatcctgaga aggagccaaa gaagtttcag acagcctacg ccaagaaaac 
cggcgaccag 420 aagggctaca gagaggccct gtgcaagtgg atcgacttca caagggattt tctgtccaag 480 tataccaaga caacctctat cgatctgtct agcctgcggc catcctctca gtataaggac 540 ctgggcgagt actatgccga gctgaatccc ctgctgtacc acatcagctt ccagagaatc 600 gccgagaagg agatcatgga tgccgtggag acaggcaagc tgtacctgtt ccagatctat 660 aacaaggact ttgccaaggg ccaccacggc aagcctaatc tgcacacact gtattggacc 720 ggcctgtttt ctccagagaa cctggccaag acaagcatca agctgaatgg ccaggccgag 780 ctgttctacc gccctaagtc caggatgaag aggatggcac accggctggg agagaagatg 840 ctgaacaaga agctgaagga tcagaaaacc ccaatccccg acaccctgta ccaggagctg 900 tacgactatg tgaatcacag actgtcccac gacctgtctg atgaggccag ggccctgctg 960 cccaacgtga tcaccaagga ggtgtctcac gagatcatca aggataggcg ctttaccagc 1020 gacaagttct ttttccacgt gcctatcaca ctgaactatc aggccgccaa ttccccatct 1080 aagttcaacc agagggtgaa tgcctacctg aaggagcacc ccgagacacc tatcatcggc 1140 atcgatcggg gcgagagaaa cctgatctat atcacagtga tcgactccac cggcaagatc 1200 ctggagcagc ggagcctgaa caccatccag cagtttgatt accagaagaa gctggacaac 1260 agggagaagg agagggtggc agcaaggcag gcctggtctg tggtgggcac aatcaaggat 1320 ctgaagcagg gctatctgag ccaggtcatc cacgagatcg tggacctgat gatccactac 1380 caggccgtgg tggtgctgga gaacctgaat ttcggcttta agagcaagag gaccggcatc 1440 gccgagaagg ccgtgtacca gcagttcgag aagatgctga tcgataagct gaattgcctg 1500 gtgctgaagg actatccagc agagaaagtg ggaggcgtgc tgaacccata ccagctgaca 1560 gaccagttca cctcctttgc caagatgggc acccagtctg gcttcctgtt ttacgtgcct 1620 gccccatata catctaagat cgatcccctg accggcttcg tggacccctt cgtgtggaaa 1680 cctcagagac gcgtgaaaa ccggcgactt catcctgcac tttaagatga acagaaatct gtccttccag 1800 aggggcctgc ccggctttat gcctgcatgg gatatcgtgt tcgagaagaa cgagacacag 1860 tttgacgcca agggcacccc tttcatcgcc ggcaagagaa tcgtgccagt gatcgagaat 1920 cacagattca ccggcagata ccgggacctg tatcctgcca acgagctgat cgccctgctg 1980 gaggagaagg gcatcgtgtt cagggatggc tccaacatcc tgccaaagct gctggagaat 2040 gcgattctc acgccatcga caccatggtg gccctgatcc gcagcgtgct gcagatgcgg 2100 aactccaatg ccgccacagg cgaggactat atcaacagcc ccgtgcgcga tctgaatggc 
2160 gtgtgcttcg actcccggtt tcagaaccca gagtggccca tggacgccga tgccaatggc 2220 gcctaccaca tcgccctgaa gggccagctg ctgctgaatc acctgaagga gagcaaggat 2280 ctgaagctgc agaacggcat ctccaatcag gactggctgg cctacatcca ggagctgcgc 2340 aac 2343

Claims (46)

1. A genome editing method comprising introducing, by electroporation, a mixture of a Cpf1 protein and a guide RNA, or a ribonucleoprotein in which a Cpf1 protein and a guide RNA are complexed, into a eukaryotic cell or a eukaryotic organism other than a human,
wherein the guide RNA is a crRNA comprising a nucleotide sequence capable of hybridizing with a nucleotide sequence of up to 30 nt (target sequence) in a target site of a gene, and
wherein the Cpf1 protein is derived from Parcubacteria bacterium GWC2011_GWC2_44_17, Peregrinibacteria bacterium GW2011_GWA_33_10, Acidaminococcus sp. BV3L6, Porphyromonas macacae, Lachnospiraceae bacterium ND2006, Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi 237, Leptospira inadai, Lachnospiraceae bacterium MA2020, Francisella novicida U112, Candidatus Methanoplasma termitum, or Eubacterium eligens.
2. (Deleted)
3. (Deleted)
4. The method of claim 1, wherein the target sequence is linked at its 5' end to a TTTN or TTN (N is A, T, C, or G) protospacer-adjacent motif (PAM) sequence, or is linked at its 3' end to a sequence complementary to the PAM sequence (NAAA or NAA; N is A, T, C, or G).
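The PAM relationship in the claim above can be sketched in a few lines of Python. This is only an illustration: the function names, the example DNA string, and the 20 nt target length are assumptions chosen for the sketch and do not come from the patent. It scans one strand for a TTTN motif sitting immediately 5' of a candidate target sequence, and shows that the same PAM reads as NAAA when viewed on the complementary strand.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_targets(dna: str, pam_prefix: str = "TTT", target_len: int = 20):
    """Yield (pam, target) pairs where a PAM (prefix plus any base N) sits
    immediately 5' of a target sequence of target_len nucleotides."""
    pam_len = len(pam_prefix) + 1  # prefix plus the N position
    for i in range(len(dna) - pam_len - target_len + 1):
        pam = dna[i:i + pam_len]
        if pam.startswith(pam_prefix):
            yield pam, dna[i + pam_len:i + pam_len + target_len]


example = "GGCATTTCACGTACGTACGTACGTACGTAGG"  # made-up sequence
hits = list(find_targets(example))            # TTTN search
pam_on_other_strand = reverse_complement(hits[0][0])
```

Passing `pam_prefix="TT"` switches the search to the shorter TTN motif named in the same claim.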
5. The method of claim 4, wherein the crRNA (CRISPR RNA) is represented by the following general formula 1:
5'-n1-n2-AU-n3-UCUACU-n4-[...]-(N cpf1)q-3' (general formula 1; SEQ ID NO: 60)
In general formula 1,
n1 is absent or is U, A, or G; n2 is A or G; n3 is U, A, or C; n4 is absent or is G, C, or A; n5 is A, C, G, or [...]; n6 is absent or is U, G, or C; and n7 is U or G;
N cpf1 is a targeting sequence region comprising a nucleotide sequence capable of hybridizing with the target sequence, and is determined according to the target site of the target gene; and
q is an integer from 15 to 30 and represents the number of nucleotides in the targeting sequence region.
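The length constraint on the targeting sequence region (q from 15 to 30) lends itself to a simple check. The snippet below is a minimal sketch under stated assumptions: the function name is hypothetical, and it only tests the length and the RNA alphabet of a candidate targeting region. It does not model the 5' handle of general formula 1 or hybridization with the target.

```python
# Illustrative sketch only; the function name is hypothetical.
RNA_BASES = set("ACGU")


def is_valid_targeting_sequence(rna: str, q_min: int = 15, q_max: int = 30) -> bool:
    """Check that a candidate targeting region contains only RNA bases and
    that its length q lies in the 15 to 30 nt range given for general formula 1."""
    return q_min <= len(rna) <= q_max and set(rna) <= RNA_BASES


ok_20mer = is_valid_targeting_sequence("ACGU" * 5)      # 20 nt, RNA bases only
too_short = is_valid_targeting_sequence("ACGUACGUACGU")  # 12 nt
```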
6. The method of claim 5, wherein the crRNA further comprises 1 to 3 guanines (G) at the 5' end.
7. A composition for genome editing, comprising:
a Cpf1 protein or a DNA encoding the same, and
a crRNA comprising a nucleotide sequence capable of hybridizing with a nucleotide sequence of up to 30 nt (target sequence) in a target site of a gene, or a DNA encoding the same,
wherein the Cpf1 protein comprises two cleavage fragments generated by cleavage between the 886th amino acid and the 887th amino acid of AsCpf1 (SEQ ID NO: 43), between the 399th amino acid and the 400th amino acid, or between the 526th amino acid and the 527th amino acid.
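The three split positions recited above can be cross-checked against the sequence listing. The sketch below is illustrative: it takes the `<211>` lengths of the Split-2, Split-3, and Split-4 coding DNAs (SEQ ID NOs: 85 to 90) and divides by three, assuming each coding DNA is a whole number of codons with no stop codon. The dictionary layout and function name are assumptions made for this example.

```python
# Illustrative sketch only. Lengths are the <211> values of SEQ ID NOs: 85-90
# in the sequence listing; the data layout and names are hypothetical.
SPLIT_CODING_LENGTHS_NT = {
    "Split-2": (2658, 1263),  # SEQ ID NO: 85 (domain 1), 86 (domain 2)
    "Split-3": (1197, 2724),  # SEQ ID NO: 87, 88
    "Split-4": (1578, 2343),  # SEQ ID NO: 89, 90
}


def split_site(domain1_nt: int, domain2_nt: int):
    """Return (last residue of fragment 1, first residue of fragment 2,
    total residues), assuming each length is a whole number of codons."""
    if domain1_nt % 3 or domain2_nt % 3:
        raise ValueError("coding length is not a multiple of 3")
    n_term = domain1_nt // 3
    return n_term, n_term + 1, n_term + domain2_nt // 3


sites = {name: split_site(*lengths)
         for name, lengths in SPLIT_CODING_LENGTHS_NT.items()}
```

Under these assumptions the domain 1 lengths correspond to fragments ending at residues 886, 399, and 526, which matches the cleavage positions in the claim, and each pair of fragments sums to the same total length.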
8. The composition of claim 7, wherein the Cpf1 protein comprises two cleavage fragments generated by cleavage between the 399th amino acid and the 400th amino acid of AsCpf1 (SEQ ID NO: 43).
9. The composition of claim 7, wherein a binding protein is bound to the N-terminus or C-terminus of each of the two cleavage fragments, and the binding proteins are different proteins that bind to different sites of a bioactive substance.
10. The composition of claim 9, wherein the bioactive substance is rapamycin, and the binding proteins are selected from the group consisting of FRB protein and FKBP protein.
11. The composition of any one of claims 7 to 10, wherein the DNA encoding the crRNA is contained in a vector.
12. The composition of any one of claims 7 to 10, wherein the crRNA is transcribed in vitro using a plasmid as a template, or is chemically synthesized.
13. The composition of any one of claims 7 to 10, wherein the crRNA does not contain a phosphate-phosphate bond at the 5' end.
14. The composition of any one of claims 7 to 10, wherein the Cpf1 protein or the DNA encoding the same further comprises a nuclear localization signal (NLS) sequence or a DNA encoding the same.
15. The composition of claim 11, comprising a recombinant vector comprising a DNA encoding a cleavage fragment of the Cpf1 protein and a recombinant vector comprising a DNA encoding the crRNA.
16. The composition of claim 15, wherein the DNA encoding the cleavage fragment of the Cpf1 protein and the DNA encoding the crRNA are contained together in one recombinant vector or separately in different vectors.
17. The composition of any one of claims 7 to 10, wherein the composition is for use in a eukaryotic animal embryonic cell or a eukaryotic plant cell.
18. The composition of claim 17, wherein the eukaryotic animal is a mammal, and the eukaryotic plant is an alga, a monocotyledonous plant, or a dicotyledonous plant.
19. A genome editing method comprising the step of introducing the composition for genome editing of any one of claims 7 to 10 into a eukaryotic cell or a eukaryotic organism other than a human.
20. (Deleted)
21. The method of claim 19, wherein the crRNA contained in the composition for genome editing is transcribed in vitro using a plasmid as a template, or is chemically synthesized.
22. The method of claim 19, wherein the crRNA contained in the composition for genome editing does not contain a phosphate-phosphate bond at the 5' end.
23. The method of claim 19, wherein the Cpf1 protein or the DNA encoding the same further comprises a nuclear localization signal (NLS) sequence or a DNA encoding the same.
24. The method of claim 19, wherein the step of introducing the composition for genome editing is performed by local injection, microinjection, electroporation, or lipofection.
25. The method of claim 19, wherein the cell or organism is an isolated eukaryotic cell or a eukaryotic organism other than a human.
26. The method of claim 25, wherein the eukaryotic cell is an embryonic cell isolated from a eukaryotic animal other than a human, or a cell isolated from a eukaryotic plant.
27. The method of claim 25, wherein the eukaryotic organism is a eukaryotic animal other than a human, or a eukaryotic plant.
28. A method of producing a transformant, comprising:
introducing the composition for genome editing of any one of claims 7 to 10 into a eukaryotic cell or a eukaryotic organism other than a human, or
introducing, by electroporation, a mixture of a Cpf1 protein and a guide RNA, or a ribonucleoprotein in which a Cpf1 protein and a guide RNA are complexed, into a eukaryotic cell or a eukaryotic organism other than a human.
delete 29. The method according to claim 28, wherein the crRNA contained in the composition for orthodontic treatment is a crRNA transcribed in vitro or chemically synthesized using a plasmid as a template. 29. The method according to claim 28, wherein the crRNA contained in the composition for orthodontic treatment does not contain a phosphate-phosphate bond at the 5 'terminus. 29. The transformant according to claim 28, wherein the Cpf1 protein or the DNA encoding the same further comprises a nuclear localization signal (NLS) sequence or a DNA encoding the same. &Lt; / RTI &gt; 29. The method of claim 28, wherein the step of introducing the dielectric correcting composition is performed by a method of local injection, microinjection, electroporation, or lipofection. Gt; 29. The method of producing a transformant according to claim 28, wherein the transformant is derived from truncation of a gene, insertion of a nucleotide, substitution of a nucleotide, or deletion of a nucleotide. 29. The method according to claim 28, wherein the eukaryotic cells are cells isolated from eukaryotic cells or eukaryotic cells isolated from eukaryotic animals except human. 29. The method according to claim 28, wherein the eukaryotic organism is an eukaryotic animal or an eukaryotic plant except human. 28. A transformant produced by the method of claim 28 except human. 38. The transformant according to claim 37, wherein the transformant is a eukaryotic cell derived from genetic amputation, insertion of nucleotides, or deletion of nucleotides, eukaryotic animal or eukaryotic plant except human. delete delete delete delete delete delete delete delete
KR1020160167045A 2015-12-08 2016-12-08 Composition for Genome Editing Comprising Cpf1 and Use thereof KR101897213B1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
KR1020150174212 2015-12-08
KR20150174212 2015-12-08
US201662299043P 2016-02-24 2016-02-24
US62/299,043 2016-02-24
KR1020160036381 2016-03-25
KR20160036381 2016-03-25

Related Child Applications (1)

Application Number Title Priority Date Filing Date
KR1020180017195A Division KR101958437B1 (en) 2015-12-08 2018-02-12 Composition for Genome Editing Comprising Cpf1 and Use thereof

Publications (2)

Publication Number Publication Date
KR20170068400A KR20170068400A (en) 2017-06-19
KR101897213B1 KR101897213B1 (en) 2018-09-11

Family

ID=59013788

Family Applications (2)

Application Number Title Priority Date Filing Date
KR1020160167045A KR101897213B1 (en) 2015-12-08 2016-12-08 Composition for Genome Editing Comprising Cpf1 and Use thereof
KR1020180017195A KR101958437B1 (en) 2015-12-08 2018-02-12 Composition for Genome Editing Comprising Cpf1 and Use thereof

Country Status (2)

Country Link
KR (2) KR101897213B1 (en)
WO (1) WO2017099494A1 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3066201B1 (en) 2013-11-07 2018-03-07 Editas Medicine, Inc. Crispr-related methods and compositions with governing grnas
EP3215617A2 (en) 2014-11-07 2017-09-13 Editas Medicine, Inc. Methods for improving crispr/cas-mediated genome-editing
WO2016182959A1 (en) 2015-05-11 2016-11-17 Editas Medicine, Inc. Optimized crispr/cas9 systems and methods for gene editing in stem cells
WO2016201047A1 (en) 2015-06-09 2016-12-15 Editas Medicine, Inc. Crispr/cas-related methods and compositions for improving transplantation
CA2999500A1 (en) 2015-09-24 2017-03-30 Editas Medicine, Inc. Use of exonucleases to improve crispr/cas-mediated genome editing
WO2017165826A1 (en) 2016-03-25 2017-09-28 Editas Medicine, Inc. Genome editing systems comprising repair-modulating enzyme molecules and methods of their use
US11236313B2 (en) 2016-04-13 2022-02-01 Editas Medicine, Inc. Cas9 fusion molecules, gene editing systems, and methods of use thereof
KR101961332B1 (en) * 2016-07-28 2019-03-22 기초과학연구원 Pharmaceutical Composition for Treating or Preventing Eye Disease Comprising Cas9 Protein and Guide RNA
EP4012032A1 (en) * 2016-08-19 2022-06-15 Toolgen Incorporated Artificially engineered angiogenesis regulatory system
WO2018201086A1 (en) 2017-04-28 2018-11-01 Editas Medicine, Inc. Methods and systems for analyzing guide rna molecules
AU2018279829B2 (en) 2017-06-09 2024-01-04 Editas Medicine, Inc. Engineered Cas9 nucleases
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
US10011849B1 (en) 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
US11866726B2 (en) 2017-07-14 2024-01-09 Editas Medicine, Inc. Systems and methods for targeted integration and genome editing and detection thereof using integrated priming sites
EP3715461A4 (en) * 2017-11-21 2021-09-08 Genkore Co. Ltd. Genome editing composition using crispr/cpf1 system and use thereof
KR20200103769A (en) * 2018-01-23 2020-09-02 기초과학연구원 Extended single guide RNA and uses thereof
WO2019173942A1 (en) * 2018-03-12 2019-09-19 Nanjing Bioheng Biotech Co., Ltd Engineered chimeric guide rna and uses thereof
KR102177174B1 (en) 2018-05-18 2020-11-10 울산대학교 산학협력단 A retinal degenerated animal model by PDE6B gene deletion and the preparation method thereof
CA3109105A1 (en) * 2018-08-09 2020-02-13 G+Flas Life Sciences Novel crispr-associated protein and use thereof
WO2020030984A2 (en) 2018-08-09 2020-02-13 G+Flas Life Sciences Compositions and methods for genome engineering with cas12a proteins
CN109666684A (en) * 2018-12-25 2019-04-23 北京化工大学 A kind of CRISPR/Cas12a gene editing system and its application
KR102493904B1 (en) 2019-12-13 2023-01-31 한국생명공학연구원 Immunodeficient Animal Model Mutated IL2Rg Gene by EeCpf1 and Method for Producing the same
KR102551876B1 (en) 2019-12-18 2023-07-05 한국생명공학연구원 Composition for Genome Editing or Inhibiting Gene Expression comprising Cpf1 and Chimeric DNA-RNA Guide
WO2021194172A1 (en) * 2020-03-24 2021-09-30 연세대학교 산학협력단 Novel guide rna and method for diagnosing coronavirus infection 2019 using same
KR102471698B1 (en) * 2020-03-24 2022-11-28 연세대학교 산학협력단 Novel guide RNA and method for diagnosing Coronavirus disease 2019
CN113373170A (en) * 2021-04-29 2021-09-10 江西农业大学 pFNCpfAb/pCrAb double-plasmid system and application thereof
CN113969281B (en) * 2021-12-24 2022-04-01 汕头大学 Modified CrRNA fragment and African swine fever virus kit

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101656237B1 (en) * 2012-10-23 2016-09-12 주식회사 툴젠 Composition for cleaving a target DNA comprising a guide RNA specific for the target DNA and Cas protein-encoding nucleic acid or Cas protein, and use thereof
EP3653229A1 (en) 2013-12-12 2020-05-20 The Broad Institute, Inc. Delivery, use and therapeutic applications of the crispr-cas systems and compositions for genome editing
US9790490B2 (en) * 2015-06-18 2017-10-17 The Broad Institute Inc. CRISPR enzymes and systems

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Cell, Vol.163, pp.759-771 (2015.10.22.)*
Genome Biology, Vol.16, Article 251 (2015.11.17.)
Genome Research, Vol.24, pp.1012-1019 (2014)*
Genome Research, Vol.24, pp.132-141 (2014)*
Nature Biotechnology, Vol.33, No.2, pp.139-142 (2015.02.)
Nature Reviews Microbiology, Vol.13, pp.722-736 (2015.11.)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11273442B1 (en) 2018-08-01 2022-03-15 Mammoth Biosciences, Inc. Programmable nuclease compositions and methods of use thereof
US11761029B2 (en) 2018-08-01 2023-09-19 Mammoth Biosciences, Inc. Programmable nuclease compositions and methods of use thereof
US11174470B2 (en) 2019-01-04 2021-11-16 Mammoth Biosciences, Inc. Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection
KR20240020336A (en) 2022-08-04 2024-02-15 성균관대학교산학협력단 Protospacer Adjacent Motif-independent mutant Cas9 protein

Also Published As

Publication number Publication date
KR20180028996A (en) 2018-03-19
WO2017099494A1 (en) 2017-06-15
KR20170068400A (en) 2017-06-19
KR101958437B1 (en) 2019-03-15
WO2017099494A8 (en) 2017-08-10

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
A107 Divisional application of patent
AMND Amendment
E601 Decision to refuse application
AMND Amendment
X701 Decision to grant (after re-examination)
GRNT Written decision to grant