CN116751763A - Cpf1 protein, V-type gene editing system and application - Google Patents


Info

Publication number
CN116751763A
Authority
CN
China
Prior art keywords
gene editing
protein
sequence
cpf1
crispr
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310510289.0A
Other languages
Chinese (zh)
Other versions
CN116751763B (en)
Inventor
Tian Rui (田瑞)
Zhao Tingting (赵停停)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Shutong Medical Technology Co ltd
Original Assignee
Zhuhai Shutong Medical Technology Co ltd
Application filed by Zhuhai Shutong Medical Technology Co ltd
Priority to CN202310510289.0A
Publication of CN116751763A
Application granted
Publication of CN116751763B
Legal status: Active


Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the technical field of genetic engineering and discloses a Cpf1 protein, a V-type CRISPR/Cas12a gene editing system, and their applications. The invention provides a Cpf1 protein and a nucleotide sequence encoding it, as well as a V-type CRISPR/Cas12a gene editing system comprising the Cpf1 protein, auxiliary proteins, and a CRISPR array. The system can be applied to gene editing in prokaryotes or eukaryotes and to the preparation of biological gene editing formulations. It expands the range of available gene editing tools, broadens the PAM diversity of existing Cpf1-based editors, offers more tool choices for applying Cpf1 in clinical treatment, and is intended to promote the application of gene editing in clinical treatment.

Description

Cpf1 protein, V-type gene editing system and application
Technical Field
The invention relates to the technical field of genetic engineering, in particular to a Cpf1 protein, a V-type CRISPR/Cas12a gene editing system and application.
Background
The microbial adaptive immune system CRISPR/Cas (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated proteins) helps bacteria and archaea defend against invading foreign nucleic acids. A CRISPR locus contains direct repeat (DR) sequences separated by unique spacer sequences derived from exogenous DNA. The CRISPR array is transcribed into a long precursor transcript (pre-crRNA), which is then processed into small mature CRISPR RNAs (crRNAs), each consisting of a spacer and part of the adjacent direct repeat. The crRNA forms a complex with the Cas endonuclease, in some cases together with auxiliary Cas proteins, and guides it to target and cleave foreign nucleic acids, thereby mediating interference. DNA recognition by Cas-crRNA complexes requires a protospacer adjacent motif (PAM) near the target site, which helps distinguish self from non-self. CRISPR/Cas systems are broadly divided into two classes according to the number of effector proteins: class 1 systems use complexes of multiple Cas proteins, such as Cascade, whereas class 2 systems use a single effector enzyme, such as Cas9 or Cas12.
However, in the CRISPR/Cas9 system the Cas9 protein is usually large, at about 1300 amino acids, whereas the adeno-associated virus (AAV) vectors commonly used for in vivo delivery can accommodate only about 4.5 kb of cargo. Besides the Cas protein, other elements required for gene editing, such as the tracrRNA, must also be packaged, so most Cas9 proteins cannot be delivered in a single vector, which makes their application difficult. In addition, Cas9 recognizes G-rich PAMs, so some target sites cannot be edited and its targeting range in the mammalian genome is limited. It is therefore important to develop new genome editing systems that broaden the potential applications of gene editing.
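The packaging constraint above is simple arithmetic. The sketch below assumes 3 nt per codon plus one stop codon and treats the ~4.5 kb figure above as the cargo limit; it estimates how many nucleotides would remain for promoter, polyA, and guide-RNA elements and ignores ITRs and element sizes, which are not given here. The protein lengths are the approximate Cas9 size quoted above and the six Cpf1 lengths reported in Example 1.

```python
AAV_CARGO_NT = 4500  # approximate cargo limit quoted in the text (assumption for this sketch)

def cds_length_nt(n_aa: int) -> int:
    """Coding-sequence length: 3 nt per amino acid plus a 3-nt stop codon."""
    return 3 * n_aa + 3

def remaining_cargo_nt(n_aa: int, capacity_nt: int = AAV_CARGO_NT) -> int:
    """Nucleotides left for promoter, polyA, guide RNA, etc. once the CDS is packaged."""
    return capacity_nt - cds_length_nt(n_aa)

# ~1300 aa is the approximate Cas9 size given in the text; Cpf1 lengths are from Example 1.
PROTEIN_LENGTHS_AA = {
    "Cas9 (approx.)": 1300,
    "28c2": 1262, "28c6": 1253, "28c12": 1265,
    "28c13": 1274, "28c15": 1260, "30c9": 1251,
}

for name, n_aa in PROTEIN_LENGTHS_AA.items():
    print(f"{name:15s} CDS {cds_length_nt(n_aa):>5d} nt, "
          f"remaining {remaining_cargo_nt(n_aa):>4d} nt")
```

For example, a ~1300-aa Cas9 leaves about 597 nt under this assumption, while the 1251-aa 30c9 leaves about 744 nt; a real vector design must also fit ITRs and regulatory elements.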
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a Cpf1 protein, a V-type CRISPR/Cas12a gene editing system and application.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
in a first aspect, the invention provides a Cpf1 protein, the amino acid sequence of the Cpf1 protein is shown in any one of SEQ ID NO. 1-6.
The Cpf1 protein can recognize a variety of PAM sequences, some requiring only 3, 2, or even 1 specified base, which gives it a wider targeting range. It enriches the PAM diversity of existing Cpf1 proteins, can greatly increase the chance of precisely editing specific gene loci, provides a reference for mining further novel Cpf1 nucleases with more diverse and C-rich PAMs, and expands the existing range of gene editing tools.
In a second aspect, the invention provides a nucleic acid encoding the Cpf1 protein, the nucleic acid having a base sequence as shown in any one of SEQ ID NOS.16 to 21.
In a third aspect, the invention provides a V-type CRISPR/Cas12a gene editing system comprising the Cpf1 protein, an accessory protein and a CRISPR array.
The gene editing system provided by the invention recognizes its own unique PAM sequences and, guided by crRNA, performs gene editing in vitro and in eukaryotic cells. It further expands the range of gene editing tools, enriches the PAM diversity of existing Cpf1-based editors, provides more tool choices for applying Cpf1 in clinical treatment, and is intended to promote the application of gene editing in clinical treatment.
As a preferred embodiment of the V-type CRISPR/Cas12a gene editing system of the present invention, the CRISPR array comprises direct repeat sequences and spacer sequences arranged alternately.
Further, the nucleotide sequence of the direct repeat is shown in any one of SEQ ID NO. 10-15.
As a preferred embodiment of the V-type CRISPR/Cas12a gene editing system, the amino acid sequence of the auxiliary protein is shown as any one of SEQ ID NO. 7-8.
In a fourth aspect, the invention provides the use of the V-type CRISPR/Cas12a gene editing system in prokaryotic or eukaryotic gene editing.
In a fifth aspect, the invention provides the use of the V-type CRISPR/Cas12a gene editing system in the preparation of biological gene editing formulations.
Compared with the prior art, the invention has the beneficial effects that:
(1) Six novel V-type CRISPR/Cas12a gene editing systems were mined from metagenomic data by bioinformatic analysis, and their corresponding direct repeat sequences were predicted. The Cpf1 proteins of these six systems recognize a variety of PAM sequences, some requiring only 3, 2, or even 1 specified base. This wider targeting range can greatly increase the chance of precisely editing specific gene loci and provides a reference for mining further novel Cpf1 nucleases with more diverse and C-rich PAMs.
(2) Experiments showed that each of the six CRISPR/Cas12a gene editing systems recognizes its own unique PAM sequences and, guided by crRNA, performs gene editing in vitro and in eukaryotic cells. These six systems further expand the range of gene editing tools, enrich the PAM diversity of existing Cpf1-based editors, provide more tool choices for applying Cpf1 in clinical treatment, and are intended to promote the application of gene editing in clinical treatment.
Drawings
Fig. 1 is a schematic diagram of the composition of a CRISPR/Cas12a gene editing system according to the present invention, consisting of Cas proteins and a CRISPR array.
Fig. 2 shows the predicted secondary structures of the crRNAs of the six CRISPR/Cas12a gene editing systems of the present invention.
Fig. 3 shows the PAM profiles of the six CRISPR/Cas12a gene editing systems of the present invention.
Fig. 4 shows in vitro cleavage experiments with the six CRISPR/Cas12a gene editing systems of the present invention.
Fig. 5 shows PCR detection of dsODN insertion following gene editing by the six CRISPR/Cas12a gene editing systems of the present invention in eukaryotic cells.
Detailed Description
For a better description of the objects, technical solutions and advantages of the present invention, the present invention will be further described with reference to the following specific examples. It will be appreciated by persons skilled in the art that the specific embodiments described herein are for purposes of illustration only and are not intended to be limiting.
In the present invention, Cas12a, also referred to as Cpf1, is classified by homology as a class 2, type V CRISPR effector. The effector protein Cpf1 can be used for genome editing, with target specificity conferred by a complementary guide RNA. Cas9 and Cpf1 differ in their target DNA recognition and cleavage mechanisms; Cpf1 has the following features: (1) Cpf1 is guided by a single crRNA, whereas Cas9 requires a crRNA plus a second small RNA, the trans-activating crRNA (tracrRNA); (2) Cpf1 recognizes T-rich PAMs, in contrast to the G-rich PAMs favored by Cas9; (3) Cpf1 creates staggered (sticky) ends at a PAM-distal site in the target, whereas Cas9 creates blunt ends at a PAM-proximal site, and the sticky ends produced by Cpf1 are considered more favorable for homologous recombination repair; (4) Cpf1 contains a RuvC domain but lacks a detectable second endonuclease domain, whereas Cas9 cleaves the target and non-target DNA strands using its HNH and RuvC endonuclease domains, respectively.
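Feature (3) can be made concrete with a small sketch. The offsets used here (cleavage after the 18th protospacer nucleotide on the non-target strand and after the 23rd on the target strand, leaving a 5-nt 5' overhang) are those reported for the well-characterized AsCas12a/LbCas12a enzymes; they are assumptions for this illustration and have not been established for the novel proteins in this document. The target sequence and PAM are hypothetical.

```python
def cas12a_cut_sites(top: str, pam: str, nt_offset: int = 18,
                     t_offset: int = 23) -> tuple[int, int]:
    """Return (top_cut, bottom_cut) break positions in top-strand coordinates.

    `top` is the non-target strand written 5'->3', with the PAM 5' of the protospacer.
    A break at position k lies between top[k-1] and top[k].
    Offsets default to values reported for AsCas12a/LbCas12a (assumption).
    """
    p = top.upper().find(pam.upper())
    if p < 0:
        raise ValueError("PAM not found on the top (non-target) strand")
    start = p + len(pam)  # first protospacer nucleotide
    if start + t_offset > len(top):
        raise ValueError("sequence too short for the assumed cut offsets")
    return start + nt_offset, start + t_offset

# Hypothetical target: a TTTA PAM followed by a 28-nt protospacer region.
seq = "TTTA" + "ACGT" * 7
top_cut, bottom_cut = cas12a_cut_sites(seq, "TTTA")
overhang = seq[top_cut:bottom_cut]  # top-strand bases of the 5-nt 5' overhang
print(top_cut, bottom_cut, overhang)  # 22 27 GTACG
```

Changing `nt_offset` and `t_offset` lets the same sketch model a different cut geometry once one has been measured for a given nuclease.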
The test methods used in the examples are conventional methods unless otherwise specified; the materials, reagents and the like used, unless otherwise specified, are all commercially available.
Example 1: Six V-type CRISPR/Cas12a gene editing systems
In this example, metagenomes were annotated with the CRISPRCasFinder software, and proteins and elements associated with V-type CRISPR/Cas12a systems were analyzed, predicted, and screened using bioinformatic tools such as NUPACK for crRNA structure prediction, yielding the novel editing systems shown in Fig. 1. The gene editing system of the invention consists of the following elements: a gene encoding the endonuclease Cpf1, genes encoding the auxiliary proteins Cas1, Cas2, and Cas4, and a CRISPR array. Six novel Cpf1 proteins were found, named 28c2, 28c6, 28c12, 28c13, 28c15, and 30c9. The 28c2 protein is 1262 amino acids long (SEQ ID NO. 1); 28c6, 1253 amino acids (SEQ ID NO. 2); 28c12, 1265 amino acids (SEQ ID NO. 3); 28c13, 1274 amino acids (SEQ ID NO. 4); 28c15, 1260 amino acids (SEQ ID NO. 5); and 30c9, 1251 amino acids (SEQ ID NO. 6).
The sequence of the auxiliary protein Cas1 is shown in SEQ ID NO. 7, that of Cas2 in SEQ ID NO. 8, and that of Cas4 in SEQ ID NO. 9; these three auxiliary proteins are involved in capturing exogenous DNA and in crRNA maturation.
The CRISPR array comprises direct repeat sequences and spacer sequences arranged alternately, with each spacer flanked by two repeats. Within a given bacterium the base composition and length of the repeats are relatively conserved, while they differ somewhat between bacteria. The repeat sequences (R sequences) corresponding to the six novel CRISPR/Cas12a systems of the invention are shown in SEQ ID NO. 10-15. The CRISPR array is transcribed into pre-crRNA, whose spacers can hybridize by complementary base pairing with target sequences. After cleavage, the pre-crRNA yields mature crRNAs with a 5' repeat and a 3' spacer; each crRNA is complementary to the target gene and directs the Cpf1 protein to edit the target sequence complementary to its spacer.
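The crRNA layout just described (5' repeat, 3' spacer) can be sketched as plain string assembly. This assumes the 20-nt repeat and 23-nt spacer lengths stated in Example 2; the repeat is the 28c2 direct repeat (SEQ ID NO. 10) and the spacer is hypothetical. The helper `target_strand` returns the DNA strand that the spacer base-pairs with.

```python
DNA = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def mature_crrna(direct_repeat: str, spacer: str) -> str:
    """Mature crRNA, 5' repeat + 3' spacer, written as RNA (T -> U)."""
    seq = (direct_repeat + spacer).upper()
    if not set(seq) <= DNA:
        raise ValueError("expected DNA sequences containing only A, C, G, T")
    return seq.replace("T", "U")

def target_strand(spacer: str) -> str:
    """5'->3' sequence of the DNA strand the spacer pairs with (reverse complement)."""
    return spacer.upper().translate(COMPLEMENT)[::-1]

dr_28c2 = "GAATTTCTACTGTTGTAGAT"      # SEQ ID NO. 10
spacer = "ACGTACGTACGTACGTACGTACG"    # hypothetical 23-nt spacer
crrna = mature_crrna(dr_28c2, spacer)
print(len(crrna), crrna)              # 43-nt crRNA
print(target_strand(spacer))
```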
The sequences referred to above are as follows:
SEQ ID NO.1 (28c2 protein sequence):
MVNGTKNYFDCFTGFYPINKTLRFELKPIGKTNALIDEFKKGYVDSIVSLDEKRAESRKKVIEVLDNYYEYFINCVLSKEVLLVNDINEAYKLYKDFKADKKDKNFKSYKVKMRTKISEKFQSEKIKFALKDYKDLFGKKRLQESLLYEWYKQKLNNEEINNEAFEDIVKTLSYFIGFTTSLKDYQENRNNFFVPDEKSTSIAYRIIDENMIRYFDNCIRFETFIENKIDLFESLKQWEEYFKPENYIKYFTQDGIDNYNQIIGRKGKDIYSKGINQLINEYRQINKIKNKNLPTMNQLYKQLLSKHNSEELIVGFKDEKDMLQKIETTYYEYSEIVSKLVSFLSESLADDINLYIRSDSLTNLSNSMFGRWDFINDAIYSYTSGFSEKDKLKYEKDVKEVISLVKLQKVIDTYVSSLDIDEKGKYISNSSIYKYLLSINDLNLKNAYSEAKPILVLNEIDNERTNDSERIQQINKIKSLLDAMLEIMHFYKPLYLYKNGKSLVEVEKDEVFYSEFDYLYSQLMPITKLYDKVRNHITKKPYSKDKFKIYFNKPTLLDGWDLNKENSNLGVLLTKNNNYYLGIMNGKYNTSFDTTVAEVKNQINESSTAIGYLKMEYKQVSGANKMFPKVFFAESNKHIYKPSKEILNIRENKLYTKGADDVESRIKWIDFCKHCIKLHPEWNKYFNFKFKPTTEYEDVNTFYEDADAQMYNVSFISFNESYINELVNEGKLYLFQIYNKDFSPNSKGKPNLHTMYWKMIFEDSNITNINNTGLPVFKLNGEAEIFYRKASLNKKVTHEKNLPIKNKNRNNPKEESIFSYDLYKDKRFMADKFFLHCPITINYRTKPLSSSEFNKKINCIVENNKDISILGVDRGERHLLYYSLINQKGEILKQGSLNSLSTSYERDGQEISVLTDYNSILQGREDERDDARKNWGTIQNIKEIKDGYMSHIVHQLSKILIDNNAVLVLENLNSGFKRGRFKIEKQVYQKFEKAMIEKLNYLVFKDRNSTSPGYYLNGYQLTAPFEGFKNLYSQSGIIYYVWPSYTSKICPRTGFVNLLKLNYENIEKSKEIFNNFDIISYNKAKDYFEFGLDYRRFGKDAGKSKWLICTYGNERYFYNSKLKKFECIDITNKIKELFKSNNIDYLNEKDLRNKITNVNSKDFFNSLLFYLRITLQLRYTNGGNLDENDYILSPINDGSDKFFDSRCASESEPKNCDANGAYHIALKGLRLIHSIEDGTTSKIGNETTDWFTFAQNKNKLVE;
SEQ ID NO.2 (28c6 protein sequence):
MSKGKIWENFINQYSVSKTLRFELKPVGKTLENINAKGLIEEDEQRAEDYKKAKKIIDEYHKYFIEGALGSCSLDLNILNEFLQLYNKAQKTDADKKEYEKIQTTLRKNIAESFGKNADKKTKEQYENLFKKELLRNDLPDWVEDEEDAKIIERFKTFTTYFTGFHENRKNIYDNEEKSTAIGYRIVHENLPKFIDNMNAFEKISKALDLSEIDRDFQSELGEIKAEEFFTIEFFNQCLNQFGIDRYNTLLGGISEGENIKKKQGLNERINLYNQQLKGERKKERLPKLKVLYKQILSDSSSHSFSIDEFENDNELLESLEIFYKNELIGFNHSGVDSNIFDLVKDLLLKIDESEQSSIYLKNDKGLTEISQRIFGDWNIIKSALEEYYDEHYPPKKDTFNKKELDERSRWLKENHSIGVIEKALANYENEIVREHLKQNSAPIVSYFKSLEVDGENLIDKIYSAYGNISDLLNSSYPDEKKLVSDRTSKDKIKVFLDSLMSLLHFLKPLDVKDLGNKDSAFYGDYDFIVEQLSKLVRLYNKTRNYLTRKPYSIEKIKLNFENSTLLAGWDVNKERDNNCVIFKRQDGDRELFYLGIMDKSHNKIFTKIEEAKSDDVYQKMNYKLLPGPNKMLPKVFFSKKSIDFYAPGEELLKNYKNGTHKKGENFNLQHCHELIDFFKRSINKHEDWSQFNFKFSDTSEYEDTSFFFKEVSQQGYSITFKNIDRETIEKFVDEGKLYLFQIYNKDFSPKSKGRPNLHTLYWKMLFDERNLANTVYQLNGEAEVFYRKKSISEKDRVVHRADEPIGLKNSENSAQKSLFPYDIVKDRRFTVDKFQFHVPITLNFKSEGNERLNISVNKFLKDNPDVNIIGLDRGERHLIYLTLINQKGEILHQESLNEVMGVNYQQKLHRVEKDRTEERRNWDRIENIKELKSGYLSQVVHKISQLMVEYNAIVVMEDLNFGFKRGRIKVEKQVYQKFEKTLIDKLNYLVFKDREPEEPAGVLNALQLTNKFESFKKLGKQCGFLFYVTSDYTSKIDPATGFVNLLYPKYESVEKSQNFFRKFDNICFNSGAGYFEFDFDYSNFTDRADGTRTRWKVCTVGNERFGYNPKTKASETVNVTESLKELLLQHEIAFENGESLVESISKNTTKYFHKSLLNFLRLTLTLRHSKTGTDIDYILSPVANEEGVFFDSRNASDKMPKDADANGAYNVALKGLMVLERINAAEDLSQFKFKDMSIKNKDWLKFVQDRQG;
SEQ ID NO.3 (28c12 protein sequence):
MIEYTNFIGLYPLSKTLRFKLLPIGKTLENITRNGILTDDKHRAQSYQEVKKLIDEYHKEFIEHTLETFNLELLSTNKQNSLEEYHQLYLKEKNESELKNFTKTQENLRKQIAKTLQNEAKKASLFDKDMIKKNLPDFIQQHPDLKDKENLVKEFDEFTTYFTGFHENRRNMYSDEEKSTAIGYRIIHQNLPKFIDNMIVFSRIQSELQGELNLIAADFKDLLVVNNLDEMFTLPYFNQVLTQSQIDLYNMVIGGKSEEGKIKKQGLNEYINLYNQNHKEQKLPLFKPLFKQILSDRQSLSWLPQQFEEDQELLNAVRECFYSLNDSQCNLKHLQALLVSLADYNLNGIYLTNGPAITTISQQMFNDWNLINRAIIERMSRDIKASSKQKSEAKLEEEIRKRMDSTESFSIQYLNECIETSEIEDIKNAADKRIESAHFARLMICNKKTNEQENLFERIYTAYNEAQTLLNTPYPENQNLIQDQENVARIKYLLDTVKDLQLFVKPLLGKGYEIGKDDTFYGILTRLWTVIDQLTPLYDKVRNYLTRKPYSDKKIKLNFKNSTLLNGWDKNKEADNTAIIMRKEGLFYLGIMNKDIKGYKRMFEKCPQCSEEEAYYEKMEYKLLPGPNKMLPKVFFAKNNIELFKPSERIMAIRENETFKKGDKFNLADCHAFIDFYKESIAKHPEWKDFDFHFSETQLYNDISGFYREVEHQGYKMSFRKIPATYIDQLVENNELYLFQIYNKDFSEYSKGTPNMHTLYWKMLFDERNLADVVYKLNGQAELFYRPASLNYNRPTHPKNEPITNKNKNNPKKESIFKYDLTKDKRYTQDTFLLHVPITLNFKGTNNGNINQQVNSYLQTADNTHIIGIDRGERHLLYLVVIDMKGNIKEQFSLNEIANQNKGIEYRTNYHQLLENREKERVEARVNWQNIENIKDLKEGYLSQVIHLITQLMLKYHAIVVLEDLNFGFMKGRQKVEKSVYQKFEKQLIDKLNYLVNKQIDAEKPGGLLKAYQLAKPFESFQKMGKQSGFLFYIPAWMTSKIDPVTGFVNLLNTNYVNVKESQKFFSNFDRIAYNPEKDWLEWDIDYNKFTTKAKNSRHNWTICTQGERIENHRNEKNGQWNSQNVNLTEEFKKLFALYDIDLAQDLKKYIIQQNDAKFFKELHRILKLTLQMRNSQINSDIDYLVSPVANAEGCFYNSQTANATLPANADANGAYNIARKGLYLLQQIKKAPDLAKLKLTISNEEWLKFAQEKTYQND*;
SEQ ID NO.4 (28c13 protein sequence):
MFNQFTNLYPVIKTLRFELKSIGNTMDTIESNQVIHNDEKRADAYAKLKVTLDAYHKDIIEKVLSRARLTGLEDYAIAVNNLKTSKGNAAYGKELTKNKEQLRKQIAGFFKQPEFAPIFKDLFKEGVIKKDVKAWIDTQPNPSDYFYSDDFANFTGYFGNYNLIRQNLYSPEAKHGTIAYRLIDENLPKFIDNLSILQNIQNKNPDLFDQLSDQYQQYFSELLPSKPTLADFVSLDTFNDLLTQKGLDAYQQIIGGIKTENQLIQGINVLINLHNQQHPEQSKTPKLKPLYKQLLSDRGTFKLPRKFEDDAEMIQANRQYFEEVLGNNTLFETGETPTEAMNQLFLSIENYDLSKIFIESPLLVTSISQKIYGSYAVIPQALEYYHDNHVNPSYAAKFNKAKSDKSRETMEKAKAAWVKGVHAVSVIHQAVIAYNDVLPDDAKLTDTQPVISYYKDIQYSEKTGESQQIFDALMRRYHQAKGMLNTDYPKGSKQILNNKSSFAIVKNLLDVSKAYVNAARDLTIKKPEGLDLDLLFYERLAKTYTYLQDLHALYDTTRNYVTQKPFSTDKIKLNFDCAQLLAGWDFNVIDAKRGVFLVKNGRYYLVIIDNKHKKAMNNLPAPITNNCYDKYNMRLSKDAHMALPKKLFTKDNLKIPAIAEMERRCRDKNGGHHLRKSPDFDKDFMHQMIDTFKDIIKKDKDFDVFGFQFKPTHQYEDINEFYADFNEQALVTWYDKVDSDVIDSLVAEGKIYLFEVYSKDFSDKSTGTPNQQSLILQYLFSQDNLAKRHFKLNGEAEVFYRKASIDKDKAVVHKKGSLLENKNPARPNSKIAKFDIVKDRHYTEDKLFLHIPITLNNNAADMKSYAMNSKVLNTLKTNGGVNVIGIDRGERNLLKITVINSAGEILHQESLNKITSGQDMVTDYHELLDKKEQSRAESRLNWQEVESIKEIKQGYLSQVVYRLSQLMLQYKAIVVLEDLNIGFKRGRFKIEKQVYQNFEKALINKLNYLVLKQLEATEVGGTAHGYQLTAPFESFQKLGKQSGWLFYVPAWNTSHIDPTTGFVNLHHFKYESVAQATDIIDKLSNIRYNPEKDYFEFAIDYNEFTFKGGDSQKYWVVCSTPYKRYVFDKKANMGRGGTKAVDVNAELKALFAAHGVDYASGEDLRPQIKAKANKELLSQLLFLLKTLTAMRYTNASSYEDYILSPVVNKAGEFFDSRKGDATLPLDADSNGSYHIALKGLCLLQRVYDWRGEEFKGLDLFISNNDWLKFAQDRH*;
SEQ ID NO.5 (28c15 protein sequence):
MSNTKDNIFNNFTGIYPINKTLRFELRPVGKTYDLIKDFKNGYVESIVAIDEKRSEARKRIIEIIDEYYEEFINTVLSKKVFYSDDIWQTYTSYKAYKSDKRNKEFVTQKAIMRKKISDAFQNEKTKFNLKDFKDLFGKKSNLKESPLYKWYKNKLDIGEITGEDFEDIIKIITYFIGFTTSLKDYQENRNNLFVAEEQSTAISHRIIDVNMIRYFENCIRFENMKDSELLEDMGKWEKYFVPANYDNFFTQEGIDNYNEIIGRKSKDLYYKGVNQLINEYRQKNKIKNKDMPTMNQLYKQHISKNGDNEINNDFSNEKEMLEQIEQAYITSLDKINRIVSFINENITEGNKIFIRKDFVTNISNRLFGEWNFINNALYSYLSGLSAKNKELFVKQTEEVIKISELQNIIDLYINNLDEDEKEKYLKTDAIYTHFCSFDVCGVQNAYYEAKTVLAVDEINKDREKEEEGAKQISKVKKLLDEILEAVHFYKPLYLYKNGKEIDEIEKDEIFYSEFDYLYSQLMLVTELYDRVRNYLTKKPYSKDKFKIYFNKPTLLDGWDLNKEKNNLSVLLIKDGFYYLGIMDSKYNSVFDVSADDVKINTTELSEEATFLKMEYKQVSGASKMFPKVFFAASNKDMFKPSEEILNIRENKQYLKGANNREAVIKWIDFCKDCLKIHPEWNRYFNFNFRHSDEYENVNSFYEDADTQMYYINFVKFKETYINDLVEEGKLFLFQIYNKDFSEYSKGKPNLHTVYWKMLFDENNVRNINDNTGKPVFKLNGEAEIFYRKASLDKKVTHKKNYPIKNKNKHNNKTESIFEYDLYKDKRFMDDKFFFHCPITINYRAKNILSSEFNKKFNLHIKNSDNMNILGVDRGERHLLYYSLINIKGGIIKQGSLNTIYDSYEKDGINIPVITDYKSILKDREDERMDSRKNWGTIKNIKEMKEGYLSHVVHQVSKLLIDNNAILVLENLNSGFKRRRLKIEKQVYQNFEKSLINKLNYLVLKDADNKDVGHFLKGYQLTAPFEGFQRLNNQSGIIYYVWPSYTSKICPRTGFVSLLHINYENIEKSKEFFNKFDKISYNKDKDYFEFHLDYTRFGKNAGKNKWVICTYGKDRYFFNQKLKKYEYIDITEKIKELLSNNGIDFINENDMRKSIVENNSKNFFGSLLFYLKVVMQLRYTNSNDGCRNENDYILSPVADINGMFFDSRHACDNEPENADANGAYHIALKGLRMIQFIENGVITKQGNETTDWFKFAQNKL*;
SEQ ID NO.6 (30c9 protein sequence):
MSAQSALSTLINKYSLSKTLRFELIPIGKTKESIDRKGLLSQDVKRAQSYKEVKKIIDEYHKEFIEKSLINAKLKGLEEFSKLYYKLQKEDKDKKNIKKMQDNLREQISDLFKNNKKDKWNILFKEDLIKKELPLFAKDDKQKNLINEFNKFTTYFTGFHKNRKNMYAEEEKSTSIPYRIIHQNLPKFLDNIRIFEKIKKNKINTDVIEKELSLFLNGIKINDIFSINFFNDVLNQKGITFYNTILGGVSEKDRTKIKGINEYVNTEYNQKQLDKKSKIPKLKQLYKQILSDTETASFVLEQFENDNQLLEKIEQFYNTELINYETEGKTQSVFLQFEQLFKNMQNYDASKIYISNLSIANISKIIFGDWSIICNALAEWYDKHNTKGKKINEYKKENFLKQDFSIQQIEDAVLEYKNDTLNKEINFLLNYFASFLNEKSKKNIIQRIETEYSKVKDLLNTDYPEKKKLASDKDNVSKIKAFLDSLMDFLHFVKPFNIKKDTGLEKEENFYSIYVPLFEQIDKIIPLYNKVRNYLTKKPYSTEKIKLNFENSTLLDGWDLNKESDNTSVVLRKDDLYYLGIMDKKHNRIFKELPSQNGNESSYEKMIYKLLPGPNKMLPKVFFSKKGKKQFKPSKKLLKKYEDGTHLKGDNFNINDCHNLIDFFKESIAEHEDWKQFDFKFSSTSSYKDLSNFYKEVEKQGYKITFQNISENYINQLIDEGKLYLFQIYNKDFSKYSKGTPNLHTLYWKMLFDNDNLKNIVYKLNGKAEVFYRKSSLILGDNIVHKAGEAIINKNPDNEKKHSTFDYDLIKDKRFTLDKFQFHVPITLNFKSEGRQNLNEDVRKFLKNNPDINIIGIDRGERHLLYLTLINQKGKILFQKSLNEITNEYNNKNGKSQIKSTNYHSLLDKKEKKRDEARKNWGIIENIKELKEGYMSQIVHYISKLMIEKNAILSLEDLNFGFKRGRQKVEKQVYQKFEKMMIDKLNYLVFKDKKANETGGLLNALQLTNKFESFAKLYNQSGFIFYVPAWNTSKIDPITGFVNLLKPYYENLNKSQEFFKKFNNIKYNPKQEYFEFNFDYKNFTNKAEGSKNVWEICTTNNERFMWDKTLNSGKGAQKAVDVTQELKKLFDSSKINYLNGNDIKEDIINQNSADFFRKLMKLLSVVLSLRHNNGLKGKDEKDFILSPVEPFFNSLNAKMEEPKDADANGAYNIALKGLLILKQINESEDLRKIKFNLSNKEWLKFAQSKSF;
SEQ ID NO.7 (Cas1 protein sequence):
MNQLVTGGISVLNKGEFIKKQILVYEPFLGDKMSYKNDNMVIRDGNGKIKYQVSCYRIFMVLIVGDVTITTGILRRQQKFGFRLCFLTLGLKVYSVIGPQLQGNTLLHCKQYAYDELTVGKSIIINKILNQRAALTRLRSKTEDVWECISLLEQYSKRLQNDSLNLQEIIGIEGMASKIYFPRIFSNTQWIGRKPRIKFDYINTLLDIGYNALFNFIDAILQVFGFDVYYGVLHTCFYMRKSLVCDIMEPMRPIVDWQIRKSINLKQFKQDDFVQVGKQYQLKYKKSTQYLQVFLEAILNYKEEIFVYVRDYYRSFMKNNPIEAYPVFKLEEL;
SEQ ID NO.8 (Cas2 protein sequence):
MIIVSYDISDDKLRTKFSKYLSRFGHRIQYSMFEIDNSERILNNIICDIHNQFEKKFSQEDSIYIFNLSKWCKIERFGYAKNETNDLLVLTGCKPRP;
SEQ ID NO.9 (Cas4 protein sequence):
MEDIILITELNDFIFCPASIYFHHLYGSRDPVLFQSEAQIKGTKAHEAVDSGCYSKKSSILQSLDVYCEKYRLLGKIDIYDGKKKILRERKRQIKQVYDGYIFQLYGQYFSLIEMGYEVDKMELYSMIDNKKYPIELPHNNINMLMKFEMLIHEMREFRLDDRFIQENANKCKNCIYEPACDRGNIGAK;
SEQ ID NO.10 (28c2 direct repeat):
GAATTTCTACTGTTGTAGAT;
SEQ ID NO.11 (28c6 direct repeat):
AAATTTCTACTTCTGTAGAT;
SEQ ID NO.12 (28c12 direct repeat):
TAATTTCTACTATTGTAGAT;
SEQ ID NO.13 (28c13 direct repeat):
AATTTCTACTATGTGTAGAT;
SEQ ID NO.14 (28c15 direct repeat):
AAATTTCTACTGTTGTAGAT;
SEQ ID NO.15 (30c9 direct repeat):
TAATTTCTACTATTGTAGAT.
Example 2: Predicting the secondary structure of the crRNAs used by the gene editing systems to recognize target sequences
This example predicts the secondary structures of the crRNAs used by the six V-type CRISPR/Cas12a gene editing systems of Example 1 to recognize target sequences.
The specific operation is as follows:
The secondary structure of the mature crRNA was predicted with AlphaFold, simulating the repeat sequence under in vitro conditions at 37 °C. Under the action of the Cpf1 nuclease, the sequence upstream of the repeat is removed from the pre-crRNA, leaving a 20-nt repeat that, together with a 23-nt spacer, forms the mature crRNA; the crRNA binds the Cpf1 protein to form a crRNA-Cpf1 complex. The complex first scans for a suitable PAM and then for a DNA sequence complementary to the spacer, whereupon the Cpf1 nuclease activity is activated. The results are shown in Fig. 2.
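All six direct repeats in SEQ ID NO. 10-15 contain the motif TCTAC followed, after a short loop, by its reverse complement GTAGA (UCUAC/GUAGA in RNA), the kind of stem-loop a structure predictor would be expected to report. The sketch below finds that pair by plain string matching. It is an illustrative check under that assumption, not a substitute for the structure predictions (NUPACK, AlphaFold) named in this document, and the 3-nt minimum loop is an assumed parameter.

```python
DIRECT_REPEATS = {  # SEQ ID NO. 10-15 (DNA form) from Example 1
    "28c2": "GAATTTCTACTGTTGTAGAT",
    "28c6": "AAATTTCTACTTCTGTAGAT",
    "28c12": "TAATTTCTACTATTGTAGAT",
    "28c13": "AATTTCTACTATGTGTAGAT",
    "28c15": "AAATTTCTACTGTTGTAGAT",
    "30c9": "TAATTTCTACTATTGTAGAT",
}

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_stem(dr: str, stem: str = "TCTAC", min_loop: int = 3):
    """Return (stem_start, partner_start, loop) or None.

    Looks for `stem` and, at least `min_loop` nt downstream, its reverse
    complement, i.e. a candidate hairpin. String matching only; no energies.
    """
    i = dr.find(stem)
    if i < 0:
        return None
    j = dr.find(revcomp(stem), i + len(stem) + min_loop)
    if j < 0:
        return None
    return i, j, dr[i + len(stem):j]

for name, dr in DIRECT_REPEATS.items():
    i, j, loop = find_stem(dr)
    print(f"{name:6s} stem {i}-{i + 4} pairs with {j}-{j + 4}, loop {loop} ({len(loop)} nt)")
```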
Example 3: in vitro PAM depletion assay
This example uses an in vitro PAM depletion experiment to identify the PAM sequences that the Cas nucleases of the six V-type CRISPR/Cas12a gene editing systems of Example 1 require in order to recognize their spacer targets.
The specific operation is as follows:
(1) For each V-type CRISPR/Cas12a gene editing system of Example 1, the nucleotide sequence encoding the Cas protein was inserted into the pSUMO protein expression vector by homologous recombination. Sequence-verified recombinant plasmids were transformed into E. coli Rosetta 2 competent cells (TransGen Biotech, CD811-02), activated, and plated on dishes containing kanamycin (50 μg/mL). Single colonies were picked the next day and expanded in large-scale culture, and the recombinant proteins were purified sequentially by Ni-column affinity chromatography and size-exclusion (molecular sieve) chromatography, then stored at -80 °C until use.
The coding sequences of 28c2, 28c6, 28c12, 28c13, 28c15, and 30c9 are shown in SEQ ID NO. 16-21, respectively.
(2) Six random bases, NNNNNN (4^6 = 4096 combinations; N represents A, G, C, or T), were added to the 3' end of the library spacer sequence (shown in SEQ ID NO. 22), and the library was constructed on a backbone vector by overlap PCR. This yielded a pooled Spacer-PAM plasmid library with 4096 different PAM combinations sharing the same 5' spacer sequence. Next-generation sequencing showed that the Gini value of random-base abundance at each of the 6 positions was below 0.1, indicating that the random bases were distributed relatively evenly at all 6 positions.
(3) Using the repeat sequence of each V-type CRISPR/Cas12a gene editing system described in Example 2 above and the corresponding library spacer, a template of the form 5'-T7 promoter + repeat + spacer-3' was constructed, and crRNA was obtained by in vitro transcription (NEB, E2040S) and RNA purification (NEB, T2040S).
(4) 10 pmol of purified Cpf1 protein from step (1) was mixed with 10 pmol of crRNA and incubated at room temperature to form the protein-crRNA complex. The complex was mixed thoroughly with 200 ng of the pooled Spacer-PAM plasmid and incubated at 37 °C for 30 min, during which the Cpf1 protein cleaves spacer targets complementary to the crRNA that are flanked by a PAM it recognizes among the 4096 combinations. An appropriate amount of proteinase K was then added and incubated for 15 min at room temperature to digest the remaining Cpf1 protein, followed by heating at 98 °C for 10 min to inactivate proteinase K.
(5) Primer pairs flanking the random bases, spanning the combined spacer and PAM region of the Spacer-PAM plasmid, were designed for PCR amplification and purification, and sequencing adapters were added to both ends of the product for next-generation sequencing (commercial Illumina sequencing adapter primers were used: Hieff NGS 384 Dual Index Primer Kit, Set 1, Cat. No. 12613ES02; i5 primer: TAAGATTATA; i7 primer: GAGATTCC). Using the PAM depletion threshold of the negative control group as the baseline, depletion at the 6 random positions was analyzed with WebLogo 3, and the PAM sequences recognized by each Cpf1 protein were obtained by negative selection.
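Steps (2) and (5) both reduce to arithmetic on per-PAM read counts. The sketch below assumes reads have already been parsed into a dictionary mapping each 6-nt PAM to a count. It uses the mean-absolute-difference form of the Gini coefficient (the document does not state which definition was used) and a hypothetical fixed depletion threshold and pseudocount in place of the negative-control-derived threshold described above. All counts are synthetic.

```python
from itertools import product

def gini(values: list[float]) -> float:
    """Gini coefficient (mean absolute difference form); 0 means perfectly even."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mean)

def positional_gini(reads: dict[str, int]) -> list[float]:
    """Gini of A/C/G/T read totals at each randomized PAM position."""
    width = len(next(iter(reads)))
    result = []
    for pos in range(width):
        totals = dict.fromkeys("ACGT", 0)
        for pam, n in reads.items():
            totals[pam[pos]] += n
        result.append(gini(list(totals.values())))
    return result

def depleted_pams(treated: dict[str, int], control: dict[str, int],
                  threshold: float = 0.5, pseudocount: float = 1.0) -> dict[str, float]:
    """PAMs whose normalized treated/control abundance ratio is below `threshold`."""
    t_total = sum(treated.values())
    c_total = sum(control.values())
    ratios = {}
    for pam, c_reads in control.items():
        t_frac = (treated.get(pam, 0) + pseudocount) / t_total
        c_frac = (c_reads + pseudocount) / c_total
        ratios[pam] = t_frac / c_frac
    return {pam: r for pam, r in ratios.items() if r < threshold}

# Synthetic example: an evenly represented 6-nt library (4**6 = 4096 PAMs).
library = {"".join(p): 10 for p in product("ACGT", repeat=6)}
print(len(library), max(positional_gini(library)))  # 4096 0.0

# Synthetic counts: the two T-rich PAMs are cleaved and therefore depleted.
control = {"TTTA": 100, "TTTC": 100, "GGGA": 100, "CCCA": 100}
treated = {"TTTA": 5, "TTTC": 10, "GGGA": 110, "CCCA": 95}
print(sorted(depleted_pams(treated, control)))  # ['TTTA', 'TTTC']
```

In a real analysis the control counts would come from the untreated library, and the depleted set would then be summarized as a sequence logo, as in step (5).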
As shown in Fig. 3, the six gene editing systems recognize a variety of PAM sequences: 28c2, 28c6, 28c12, and 28c15 recognize 3-base PAMs, while 30c9 and 28c13 require only 2-base or even 1-base PAMs. Simpler PAMs occur more frequently in the genome, so these novel Cpf1 proteins can target a wider range of sites, which can greatly increase the chance of precisely editing specific gene loci.
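The point that simpler PAMs occur more often can be quantified. Assuming independent bases with a given GC content, the probability that a position on one strand matches a PAM written in IUPAC codes is the product of the per-position probabilities. The sketch uses TTTV, the PAM reported for AsCas12a, as a reference; the shorter patterns are hypothetical stand-ins, since the actual PAMs of the six proteins appear only in Fig. 3.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
         "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
         "V": "ACG", "N": "ACGT"}

def site_probability(pattern: str, gc: float = 0.5) -> float:
    """Chance that a given position on one strand matches `pattern` (independent bases)."""
    base_p = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}
    prob = 1.0
    for symbol in pattern.upper():
        prob *= sum(base_p[b] for b in IUPAC[symbol])
    return prob

def count_sites(seq: str, pattern: str) -> int:
    """Count overlapping matches of an IUPAC pattern on both strands of `seq`."""
    regex = re.compile("(?=" + "".join(f"[{IUPAC[s]}]" for s in pattern.upper()) + ")")
    seq = seq.upper()
    rev_comp = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return len(regex.findall(seq)) + len(regex.findall(rev_comp))

for pam in ["TTTV", "TTV", "YC", "C"]:  # TTTV: AsCas12a reference; others hypothetical
    p = site_probability(pam)
    print(f"{pam:5s} p = {p:.4f}  ~1 site per {1 / (2 * p):.0f} bp (both strands)")
```

Raising `gc` increases the expected density of C-containing PAMs, which is consistent with the argument below that a C-rich PAM suits GC-rich targets.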
In particular, most prior-art Cpf1 proteins recognize T-rich PAMs, which limits their use on gene sequences with low T content. The 30c9 protein provided by the invention recognizes a C-rich PAM and is active both in vitro and in eukaryotic cells; it enriches the PAM diversity of existing Cpf1 proteins and can serve as an ideal tool for editing gene sequences with high GC content. Moreover, the protein sequence and editing characteristics of the 30c9 nuclease can serve as a reference for mining further novel Cpf1 nucleases with C-rich PAMs, expanding the existing range of gene editing tools.
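SEQ ID NO. 16-21 are stated to encode the proteins in SEQ ID NO. 1-6. A reader can check any pair by translating the coding sequence with the standard genetic code (NCBI translation table 1), which is an assumption of this sketch. The helper strips the line breaks and trailing semicolons used in the listings; the demonstration uses only the first 24 nt of SEQ ID NO. 16, which translate to the first eight residues of SEQ ID NO. 1.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {  # standard code: index = 16 * first + 4 * second + third (TCAG order)
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def clean(seq: str) -> str:
    """Remove whitespace/line breaks and a trailing ';' as used in the listings."""
    return "".join(seq.split()).rstrip(";").upper()

def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = clean(cds)
    if len(cds) % 3:
        raise ValueError(f"CDS length {len(cds)} is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

def matches_protein(cds: str, protein: str) -> bool:
    """True if the translation equals the listed protein, ignoring a terminal stop ('*')."""
    return translate(cds).rstrip("*") == clean(protein).rstrip("*")

prefix_seq16 = "ATGGTGAACGGCACCAAGAACTAC"  # first 24 nt of SEQ ID NO. 16
print(translate(prefix_seq16))            # MVNGTKNY (start of SEQ ID NO. 1)
```

To check a full pair, paste the complete listing of SEQ ID NO. 16 and SEQ ID NO. 1 into `matches_protein`; the wrapped line breaks are removed by `clean`.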
The sequences referred to above are as follows:
SEQ ID NO.16 (28c2 gene sequence):
ATGGTGAACGGCACCAAGAACTACTTCGACTGTTTCACCGGGTTCTACCCCATCAACAAGACCCTGCGGTTCGAGCTGAAGCCGATCGGGAAAACCAACGCCCTCATCGACGAGTTCAAGAAGGGCTACGTGGACTCCATCGTGAGCCTGGACGAGAAGCGGGCCGAGTCCAGGAAGAAAGTGATCGAGGTGCTGGACAACTACTATGAGTACTTCATCAACTGCGTGCTGAGCAAGGAGGTCCTGCTGGTGAACGACATCAACGAGGCCTACAAGCTATACAAGGACTTCAAGGCCGACAAGAAGGACAAGAACTTCAAGTCCTATAAGGTGAAGATGAGGACCAAGATCTCCGAGAAGTTCCAGTCCGAGAAGATCAAGTTCGCCCTGAAAGACTACAAGGACCTCTTCGGCAAGAAGCGCCTGCAGGAGTCCCTGCTGTACGAGTGGTACAAGCAGAAGCTGAACAACGAGGAGATCAACAACGAGGCCTTTGAGGACATCGTGAAAACCCTGAGCTACTTCATCGGCTTCACCACCAGCCTGAAGGACTACCAGGAGAACAGGAACAACTTCTTCGTGCCCGACGAGAAGAGCACCTCCATCGCTTACCGCATCATCGACGAGAACATGATCCGGTACTTCGATAACTGCATCCGGTTCGAGACCTTCATCGAGAATAAGATTGACCTGTTTGAGAGCCTGAAGCAGTGGGAGGAGTACTTTAAGCCCGAGAATTACATCAAGTACTTTACACAGGACGGGATCGACAACTACAACCAGATCATCGGGCGGAAGGGGAAGGACATCTACTCCAAGGGAATCAACCAACTGATCAACGAGTACCGGCAGATTAACAAGATCAAAAATAAGAACCTGCCGACCATGAATCAGCTCTACAAGCAGCTCCTGAGCAAGCACAACAGCGAAGAGCTGATCGTCGGCTTCAAGGACGAGAAGGACATGCTGCAGAAGATCGAGACCACTTACTACGAGTACTCCGAAATCGTGTCCAAGCTGGTGAGCTTCCTGAGCGAGTCCCTGGCCGACGACATCAACCTTTACATCCGCTCCGACAGCCTGACTAATCTGAGCAACAGTATGTTTGGCC
GCTGGGACTTTATCAACGACGCCATCTACTCTTACACCAGCGGATTCTCTGAGAAAGACAAGCTGAAGTA
CGAGAAGGACGTTAAGGAAGTGATCAGCCTCGTGAAGCTGCAGAAGGTTATCGACACCTATGTGAGCAG
CCTCGATATAGACGAGAAGGGGAAGTACATCTCCAATTCAAGTATCTACAAGTACCTGCTGTCCATCAAT
GACCTGAACCTGAAGAACGCCTACTCCGAGGCAAAGCCTATCCTCGTTCTCAACGAGATCGATAACGAGA
GGACAAATGACAGCGAGCGCATCCAGCAGATCAATAAGATCAAGTCCCTGCTGGACGCCATGCTGGAGA
TTATGCACTTCTATAAGCCCCTGTACCTGTATAAGAACGGCAAGAGCCTCGTCGAGGTGGAGAAGGACGA
GGTGTTCTATTCCGAGTTTGACTACCTCTACAGCCAGCTCATGCCAATCACAAAACTGTACGATAAGGTGC
GGAACCACATCACAAAGAAGCCCTACAGCAAGGACAAGTTCAAGATCTACTTCAATAAGCCCACTCTCCT
CGATGGCTGGGACCTTAATAAGGAAAACTCAAACTTGGGGGTGCTGCTTACCAAGAACAACAACTACTAC
CTGGGCATCATGAACGGGAAGTATAACACTTCCTTCGATACAACAGTGGCCGAGGTGAAAAACCAGATTA
ACGAGAGCTCTACAGCTATCGGGTATCTGAAGATGGAGTACAAGCAGGTCTCCGGGGCCAACAAGATGTT
CCCTAAGGTGTTCTTCGCCGAGTCCAATAAGCACATCTACAAGCCCTCCAAGGAGATACTGAACATCAGA
GAGAACAAGCTCTACACTAAGGGCGCTGACGATGTGGAGTCTCGCATCAAGTGGATTGACTTCTGCAAGC
ACTGTATCAAGCTGCACCCTGAGTGGAACAAATACTTCAACTTCAAGTTCAAGCCCACCACCGAGTACGA
GGACGTTAACACATTTTATGAAGATGCTGACGCCCAGATGTATAACGTGTCTTTTATCTCTTTCAACGAGA
GTTACATCAACGAGCTCGTCAATGAGGGGAAACTGTACCTGTTTCAGATCTATAATAAGGATTTTTCCCCA
AACAGCAAGGGCAAGCCAAATCTGCACACCATGTATTGGAAGATGATCTTCGAGGATAGCAATATTACTA
ACATCAACAATACCGGCCTCCCAGTGTTTAAGCTGAACGGCGAGGCCGAGATCTTCTACCGCAAGGCCAG
CCTGAATAAGAAGGTGACACACGAGAAGAACTTGCCCATCAAGAACAAGAACCGCAACAACCCCAAGGA
GGAGAGCATCTTCTCCTACGACCTCTACAAGGACAAGCGCTTCATGGCCGACAAGTTTTTCCTGCACTGTC
CTATCACCATCAACTATCGGACAAAGCCCCTCAGCAGTAGCGAGTTTAACAAGAAAATCAATTGCATCGT
GGAGAATAATAAGGACATCAGCATCCTGGGCGTGGATAGAGGCGAGCGCCATCTGCTGTACTATTCCCTG
ATCAATCAGAAGGGGGAGATCCTGAAGCAGGGCAGCCTGAACTCCCTTAGCACAAGTTACGAGCGTGAC
GGCCAGGAAATCAGCGTGCTCACCGACTACAACTCCATCCTGCAGGGCAGGGAGGACGAGCGCGACGAT
GCTAGGAAAAACTGGGGGACCATCCAGAATATCAAAGAGATCAAAGACGGCTACATGTCCCACATTGTG
CACCAACTGAGTAAGATCCTCATTGACAACAACGCCGTGCTCGTGCTCGAAAACCTGAACAGCGGCTTTA
AGCGGGGCCGGTTCAAGATCGAGAAGCAGGTCTACCAGAAGTTTGAGAAGGCCATGATCGAGAAGCTGA
ACTACCTAGTCTTTAAGGACCGGAACAGCACCAGCCCAGGCTACTATCTGAACGGCTACCAGCTCACCGC
CCCGTTCGAGGGCTTCAAGAACCTGTATAGCCAGAGTGGCATCATCTACTACGTGTGGCCATCCTACACCT
CTAAGATCTGTCCACGCACCGGCTTTGTCAACCTCCTGAAGCTGAATTACGAGAACATCGAGAAGTCCAA
GGAGATCTTTAACAACTTTGACATCATCTCCTACAATAAGGCAAAGGACTATTTCGAGTTTGGCCTCGACT
ATCGCAGATTTGGGAAGGACGCAGGCAAGTCAAAGTGGCTGATCTGCACCTATGGAAATGAGAGGTACTT
CTACAACAGCAAGCTGAAGAAGTTCGAGTGCATCGACATCACCAACAAGATCAAGGAGTTGTTTAAGTCC
AACAACATCGACTACCTGAACGAGAAGGACCTGCGGAACAAGATCACCAACGTGAACAGCAAAGATTTC
TTCAACTCCCTGCTGTTCTACCTGCGCATCACCCTGCAGCTCCGCTACACCAATGGGGGAAACCTGGATGA
GAACGACTATATCCTGAGCCCCATCAACGACGGATCTGATAAGTTCTTCGACTCCCGGTGCGCCTCCGAG
AGCGAGCCTAAGAACTGCGACGCCAACGGGGCCTACCACATCGCTCTGAAGGGCCTGCGTCTGATCCACA
GCATCGAGGACGGCACTACCAGCAAAATCGGCAATGAAACCACCGATTGGTTCACCTTCGCCCAGAACAAGAACAAGCTGGTGGAG;
SEQ ID NO.17 (28c6 gene sequence):
ATGAGCAAGGGCAAGATCTGGGAGAACTTCATCAACCAGTATAGCGTGAGCAAGACCCTGAGGTTCGAGCTGAAGCCCGTGGGCAAGACCCTGGAGAACATTAACGCTAAGGGGCTGATTGAGGAGGACGAGCAGCGGGCCGAGGATTACAAGAAGGCTAAGAAGATCATCGATGAGTACCATAAGTACTTTATCGAGGGGGCTCTGGGAAGCTGCAGCCTGGACCTGAACATCCTGAACGAGTTTCTGCAGCTCTACAACAAGGCCCAGAAAACCGACGCCGACAAGAAGGAGTACGAGAAGATCCAGACCACCCTGCGGAAGAATATCGCCGAGAGCTTTGGCAAGAACGCCGATAAAAAGACCAAGGAGCAGTATGAGAACCTGTTCAAAAAGGAGCTCCTGCGGAACGATCTGCCTGACTGGGTGGAGGACGAGGAGGACGCCAAAATCATCGAGCGCTTCAAGACTTTCACCACCTATTTTACCGGGTTCCACGAGAACAGGAAGAACATCTACGACAACGAGGAGAAGTCCACCGCCATTGGGTATCGGATCGTCCACGAGAACCTCCCCAAGTTCATTGACAATATGAACGCTTTCGAGAAGATCAGCAAGGCCCTGGATCTGTCCGAGATCGACCGGGACTTCCAGAGCGAGCTGGGGGAGATCAAGGCCGAGGAGTTCTTTACCATTGAGTTCTTCAACCAGTGTCTGAACCAGTTCGGCATCGATCGCTACAATACTCTGCTCGGCGGCATCTCCGAGGGCGAGAATATCAAGAAGAAGCAGGGGCTGAATGAGAGGATCAACCTGTATAACCAGCAGTTGAAGGGAGAGAGGAAGAAGGAGAGGCTGCCCAAGCTGAAGGTGCTCTACAAGCAGATTCTCAGCGACAGCTCCAGCCACTCCTTTAGCATCGACGAGTTCGAGAACGACAACGAGCTGCTGGAGTCCCTGGAAATCTTTTACAAGAATGAGCTGATCGGCTTTAATCACAGCGGCGTGGACTCTAACATCTTTGACCTCGTGAAGGACCTGCTGCTGAAGATCGACGAGTCCGAGCAGTCCTCAATCTACCTGAAGAACGATAAGGGACTGACAGAGATCTCTCAGCGGATCTTTGGCGACTGGAACATTATCAAGAGCGCCCTGGAGGAGTACTATGACGAGCACTACCCTCCAAAGAAGGACACATTCAACAAGAAGGAGCTGGATGAGCGCTCACGGTGGCTGAAGGAGAACCACAG
CATCGGCGTCATCGAGAAGGCCTTGGCCAACTACGAGAACGAAATTGTGAGGGAGCATCTGAAACAGAA
CTCCGCCCCCATCGTGAGCTATTTCAAGTCCCTGGAGGTGGACGGCGAGAACCTGATCGATAAGATCTAC
AGCGCCTACGGCAACATCAGCGATCTCCTGAATAGCAGCTACCCTGACGAGAAGAAGCTGGTGAGCGATC
GGACCAGCAAGGACAAGATTAAGGTGTTCCTGGACAGCCTCATGTCCCTGCTGCACTTTCTCAAGCCTCTG
GACGTTAAAGACCTGGGGAATAAGGACAGCGCATTTTACGGCGACTACGATTTTATCGTGGAGCAACTGT
CCAAGCTGGTGCGGCTCTACAATAAGACAAGGAATTATCTGACCAGAAAACCCTACAGCATCGAGAAAA
TCAAACTGAACTTCGAGAACAGCACCTTGCTGGCCGGATGGGATGTGAACAAGGAACGGGACAACAACT
GCGTGATCTTTAAGAGGCAGGACGGCGACCGCGAGCTGTTCTACCTGGGAATCATGGACAAATCCCACAA
TAAGATCTTCACTAAGATTGAAGAGGCTAAGTCCGACGATGTGTACCAGAAGATGAATTATAAGCTGCTG
CCAGGGCCTAACAAGATGCTGCCCAAGGTCTTTTTCTCTAAGAAATCCATCGACTTTTACGCACCTGGGGA
GGAACTGCTGAAGAACTACAAGAATGGGACCCATAAGAAGGGCGAAAACTTCAACCTCCAGCACTGCCA
CGAGCTGATTGACTTCTTTAAGCGGTCCATCAATAAGCACGAGGACTGGTCTCAGTTCAACTTCAAGTTTT
CTGACACCAGCGAGTACGAGGACACCTCCTTCTTCTTCAAGGAAGTGTCCCAGCAGGGCTACAGTATCAC
ATTCAAGAATATTGATAGGGAAACAATCGAGAAGTTCGTGGACGAGGGGAAGCTGTATCTGTTCCAGATC
TATAACAAAGATTTCAGCCCCAAGAGCAAGGGCAGACCCAACCTGCACACCCTGTACTGGAAGATGCTGT
TCGATGAGCGGAATCTGGCCAACACCGTGTACCAGCTCAATGGGGAGGCCGAGGTGTTTTACCGCAAGAA
GAGCATCAGCGAGAAAGATAGGGTGGTGCACAGGGCCGACGAGCCTATTGGCCTGAAGAACTCCGAGAA
CAGTGCCCAGAAGAGCCTTTTTCCTTATGACATCGTGAAGGATCGCCGGTTCACCGTGGACAAGTTTCAGT
TCCATGTGCCCATCACTCTGAACTTCAAGAGCGAGGGGAACGAGCGGCTGAATATTAGCGTGAACAAGTT
CCTGAAGGACAACCCCGACGTTAACATCATCGGCCTGGACAGAGGCGAGCGGCACCTGATCTACCTGACC
CTGATCAATCAGAAGGGTGAAATCCTTCACCAGGAGTCCCTGAACGAGGTCATGGGAGTGAACTACCAGC
AGAAGCTGCACAGAGTTGAGAAGGACAGGACAGAAGAGAGGCGGAACTGGGACCGGATCGAGAACATA
AAGGAGCTGAAGTCTGGATACCTGAGCCAGGTGGTCCATAAGATTAGCCAGCTCATGGTGGAGTACAATG
CCATCGTGGTCATGGAGGATCTGAATTTTGGCTTCAAGCGGGGCCGAATCAAGGTGGAGAAGCAGGTGTA
TCAGAAGTTCGAAAAGACCCTGATCGACAAGCTGAATTATCTGGTGTTCAAGGACCGGGAACCTGAAGAA
CCTGCCGGAGTGCTCAACGCCCTGCAGCTCACCAACAAATTTGAGTCCTTCAAGAAGCTGGGCAAGCAGT
GCGGCTTCCTGTTCTACGTGACAAGTGACTACACTAGCAAGATCGACCCCGCCACCGGCTTCGTCAACCTG
CTGTACCCTAAGTATGAGTCAGTGGAGAAGTCCCAGAACTTCTTCAGAAAATTCGACAACATCTGCTTCA
ACTCCGGCGCAGGCTACTTCGAGTTCGACTTCGACTACTCCAACTTCACCGATAGAGCCGATGGGACCCG
CACCCGCTGGAAGGTGTGCACCGTGGGCAACGAGAGGTTCGGCTACAATCCAAAGACCAAGGCCAGCGA
GACCGTGAATGTGACCGAGTCCCTGAAGGAGCTGCTGCTGCAGCACGAGATCGCCTTCGAGAATGGCGAA
TCTCTGGTGGAGTCCATCAGCAAGAACACTACCAAATACTTCCACAAGTCCCTGCTGAATTTTCTGAGGCT
GACCCTGACCCTGAGACATAGCAAGACCGGCACCGACATCGATTACATCCTGAGCCCTGTGGCCAACGAG
GAGGGCGTGTTCTTCGACTCCCGGAATGCCAGCGATAAGATGCCAAAGGACGCCGACGCCAACGGAGCC
TACAACGTGGCCCTGAAGGGCCTGATGGTGCTGGAGAGGATTAACGCCGCCGAGGACCTGAGCCAGTTCAAGTTTAAGGACATGAGCATCAAGAACAAGGACTGGCTGAAGTTCGTGCAGGACAGGCAGGGC;
SEQ ID NO.18 (28c12 gene sequence):
ATGATCGAGTACACCAACTTCATCGGCCTGTACCCCCTGTCCAAGACCCTGAGATTCAAGCTGCTGCCCATCGGCAAGACTCTGGAGAATATCACCCGCAACGGCATCCTGACAGATGACAAGCACCGCGCCCAGAGCTATCAGGAGGTGAAGAAGCTGATCGATGAGTACCACAAGGAGTTCATCGAGCACACCCTGGAGACCTTTAACCTGGAACTGCTTAGCACCAACAAGCAGAACTCCCTGGAGGAGTACCACCAGCTTTACCTGAAGGAGAAGAACGAGTCCGAGCTGAAGAACTTCACCAAGACACAGGAGAACCTGCGCAAGCAGATCGCCAAAACCCTGCAGAACGAGGCCAAGAAGGCTAGTCTGTTCGACAAGGATATGATTAAGAAGAACCTGCCCGACTTTATTCAGCAGCACCCCGACCTGAAGGACAAGGAAAACCTCGTGAAGGAGTTCGATGAGTTCACCACATACTTTACAGGCTTCCATGAGAACCGGAGGAACATGTATAGCGACGAGGAGAAGAGCACCGCCATCGGCTATCGGATTATCCACCAGAACCTGCCCAAGTTCATTGACAATATGATCGTCTTTAGCCGCATCCAGTCCGAGCTGCAGGGCGAGCTGAACCTGATCGCCGCTGACTTCAAGGACCTGCTGGTGGTCAACAACCTGGATGAGATGTTTACCCTGCCCTACTTCAACCAAGTGCTGACCCAGAGCCAGATCGACCTCTATAACATGGTAATTGGCGGGAAGAGCGAGGAGGGAAAGATTAAGAAGCAGGGACTGAACGAGTACATAAACCTGTATAACCAGAACCATAAGGAGCAGAAGCTGCCCCTGTTCAAGCCACTCTTCAAGCAGATCCTGAGCGATCGGCAGAGCCTGTCCTGGCTGCCCCAGCAGTTTGAGGAGGACCAGGAGCTGCTGAACGCCGTGAGGGAGTGCTTCTACTCCCTGAACGACTCCCAGTGCAACCTGAAGCACCTGCAGGCTCTGCTGGTTAGCCTGGCCGATTATAACCTGAATGGGATCTACCTGACCAATGGCCCCGCCATCACCACCATTAGCCAGCAGATGTTTAACGACTGGAACCTGATTAACCGCGCCATCATCGAGCGGATGAGCCGGGACATCAAGGCCAGCTCCAAGCAGAAGAGCGAGGCCAAACTGGAGGAGGAGATCAGGAAGCGGATGGACAGCACTGAGTCTTTCTCCATCCAGTACCTGAACGAATGCATCGAGACCAGCGAGATCGAGGACATCAAAAATGCCGCCGACAAGCGCATCGAAAGCGCCCACTTTGCCAGGCTGATGATCTGCAACAAGAAAACCAACGAGCAGGAGAATCTCTTCGAAAGGATCTACACCGCCTACAACGAGGCCCAGACCCTGCTGAATACCCCCTACCCAGAAAATCAGAATCTGATCCAGGACCAGGAGAACGTG
GCCCGGATCAAGTACCTGCTAGACACCGTAAAGGACCTCCAGCTTTTCGTTAAGCCACTGCTGGGGAAGG
GCTACGAAATCGGAAAGGATGACACCTTTTATGGTATACTGACCCGGCTGTGGACTGTGATCGACCAGCT
CACCCCCCTGTACGATAAGGTGCGAAATTACCTGACCCGCAAGCCTTACAGCGATAAGAAAATCAAGCTG
AATTTTAAGAACTCTACTCTGCTGAACGGCTGGGATAAAAATAAGGAGGCAGATAACACTGCCATCATCA
TGCGCAAGGAGGGACTGTTTTACCTGGGCATCATGAACAAGGACATTAAGGGGTATAAGAGGATGTTCGA
GAAGTGCCCTCAGTGCAGCGAGGAGGAGGCCTACTACGAGAAGATGGAGTACAAGCTCCTGCCTGGGCC
AAACAAGATGCTCCCTAAGGTGTTTTTCGCCAAGAACAACATTGAGCTGTTCAAACCCTCCGAGAGGATC
ATGGCAATCCGGGAGAACGAGACCTTTAAGAAAGGCGACAAGTTCAACCTCGCTGACTGCCACGCCTTCA
TCGACTTCTACAAGGAAAGCATCGCCAAACACCCCGAGTGGAAGGACTTTGACTTTCACTTTTCCGAAAC
CCAGCTCTACAATGACATTTCCGGGTTCTATCGCGAGGTGGAACACCAGGGATATAAGATGAGCTTTAGA
AAGATCCCAGCCACCTACATTGATCAGCTCGTGGAGAACAATGAACTGTACCTGTTCCAGATCTATAACA
AGGACTTTAGTGAATATAGCAAGGGCACCCCTAACATGCATACCCTGTACTGGAAGATGCTGTTTGACGA
GAGAAACCTGGCTGATGTTGTGTATAAGCTGAACGGCCAGGCTGAGCTGTTTTACCGACCCGCCAGCCTG
AACTACAACCGGCCCACTCACCCTAAGAACGAGCCCATCACCAACAAGAACAAGAACAACCCCAAAAAG
GAGTCTATCTTCAAGTACGACCTGACTAAGGATAAGCGGTACACCCAGGATACCTTCCTGCTGCACGTTCC
CATTACCCTGAACTTCAAAGGCACTAATAATGGCAATATCAACCAGCAAGTCAACAGCTACCTGCAGACT
GCTGATAATACACACATCATCGGCATCGACAGGGGCGAACGCCACCTGCTGTACCTCGTCGTCATCGACA
TGAAGGGGAACATCAAGGAGCAGTTCTCCCTGAATGAGATCGCCAACCAGAACAAGGGGATTGAGTACC
GGACAAACTACCACCAGCTCCTGGAGAACAGGGAGAAGGAGCGGGTGGAGGCACGGGTGAATTGGCAG
AACATCGAGAACATTAAGGACCTGAAGGAGGGCTACCTGAGCCAAGTGATCCACCTGATTACCCAGCTCA
TGCTGAAGTATCACGCCATCGTGGTGCTCGAAGATCTCAACTTTGGCTTCATGAAGGGGAGACAGAAGGT
GGAGAAGTCCGTGTACCAGAAGTTCGAGAAGCAGCTCATCGATAAACTGAACTATCTCGTGAATAAGCAG
ATCGACGCCGAGAAGCCTGGAGGCCTGCTCAAGGCCTACCAGCTCGCCAAGCCTTTTGAGAGCTTTCAGA
AGATGGGCAAGCAGTCCGGCTTCCTGTTCTACATCCCCGCTTGGATGACATCCAAGATCGATCCTGTGACC
GGCTTCGTCAATCTGCTGAACACCAACTACGTCAACGTTAAGGAGTCCCAGAAGTTTTTCAGCAACTTCGA
CCGGATCGCCTACAATCCAGAGAAGGACTGGCTGGAGTGGGATATTGACTACAATAAGTTCACCACTAAG
GCCAAGAATAGCAGGCACAACTGGACCATCTGTACCCAGGGCGAGCGGATCGAGAATCACAGGAATGAG
AAGAACGGCCAGTGGAACAGCCAGAACGTCAACCTGACCGAGGAGTTTAAGAAGCTGTTCGCACTCTAT
GACATCGACCTGGCCCAGGATCTGAAGAAGTACATCATCCAGCAGAATGACGCTAAGTTCTTTAAAGAGC
TGCACAGAATCCTGAAGCTGACCCTGCAGATGAGGAACTCCCAGATCAACAGCGACATTGACTACCTCGT
GAGCCCCGTGGCCAACGCCGAGGGCTGCTTCTACAATTCCCAGACCGCTAACGCCACCCTGCCAGCCAAC
GCCGACGCCAACGGGGCCTACAATATCGCCCGCAAGGGCCTGTACCTGCTGCAGCAGATCAAGAAGGCC
CCTGACCTGGCCAAGCTGAAGCTCACCATCTCTAACGAGGAGTGGCTGAAGTTCGCCCAGGAGAAAACCTACCAGAATGAC;
SEQ ID NO.19 (28c13 gene sequence):
ATGTTTAACCAGTTCACCAACCTGTACCCAGTGATTAAGACCCTGAGATTCGAGCTGAAGAGCATCGGCAACACTATGGACACTATCGAGAGCAATCAGGTCATCCACAATGACGAGAAGAGGGCCGACGCCTACGCCAAGCTGAAGGTGACCCTCGATGCCTACCACAAGGATATTATTGAGAAGGTGCTGAGCCGCGCCAGACTGACCGGCCTGGAGGACTACGCCATCGCTGTGAACAACCTGAAAACCTCTAAGGGCAACGCCGCTTACGGCAAAGAGCTGACCAAGAACAAGGAGCAGTTGAGAAAGCAGATCGCAGGATTCTTCAAGCAGCCCGAGTTCGCCCCAATTTTCAAAGATCTGTTCAAGGAGGGCGTGATCAAGAAAGACGTTAAGGCCTGGATCGACACCCAGCCTAACCCTAGCGATTACTTCTACTCCGATGACTTCGCCAATTTCACCGGCTACTTCGGCAACTATAACCTGATCCGGCAGAACCTGTATAGCCCTGAGGCTAAGCACGGCACCATCGCCTATCGGCTGATTGACGAGAACCTGCCCAAGTTCATCGACAATCTGAGCATTCTGCAGAACATTCAGAATAAGAATCCCGACCTGTTCGACCAGTTGAGCGACCAGTACCAGCAGTACTTCAGCGAGCTGCTGCCTTCTAAGCCTACACTGGCCGACTTCGTGAGCCTGGACACCTTCAATGATCTGCTGACCCAGAAAGGCCTGGACGCCTACCAGCAGATCATCGGCGGCATCAAGACTGAGAACCAACTGATCCAGGGCATTAATGTGCTGATCAATCTGCACAACCAGCAGCACCCCGAGCAGAGCAAGACCCCCAAACTGAAGCCCCTCTATAAGCAGCTCCTGTCCGACCGCGGCACTTTCAAGCTCCCACGGAAGTTTGAGGATGACGCTGAAATGATCCAGGCCAACCGCCAGTACTTCGAGGAGGTGCTGGGCAACAACACTCTGTTCGAGACCGGCGAAACACCCACCGAAGCCATGAACCAGCTTTTCCTGAGCATCGAGAATTACGATCTGAGCAAGATCTTCATCGAGTCCCCCCTGCTGGTGACCTCCATCTCCCAGAAGATCTATGGCTCCTATGCCGTGATTCCCCAGGCCCTGGAGTACTACCACGATAATCACGTTAACCCCTCTTACGCCGCCAAGTTCAATAAGGCCAAGTCCGACAAGAGCAGGGAGACTATGGAAAAGGCCAAAGCCGCCTGGGTGAAAGGCGTGCACGCCGTGAGTGTGATCCACCAGGCTGTGATCGCATACAATGATGTGCTGCCTGATGACGCAAAGCTGACAGATACCCAGCCCGTGATTAGCTACTACAAGGACATCCAGTACTCCGAAAAGACTGGCGAGTCCCAGCAGATCTTCGATGCCCTGATGCGCCGCTACCACCAGGCCAAAGGCATGCTGAATACTGATTACCCAAAGGGCTCCAAGCAGATCCTGAACAACAAGTCTAGCTTCGCCATCGTGAAAAACCTGCTGGATGTGTCCAAGGCCTACGTGAACGCCGCCCGCGATCTGACAATCAAAAAGCCCGAAGGCCTTGACCTGGACCTGCTGTT
CTACGAGAGGCTCGCCAAAACTTACACATACCTGCAGGACCTGCACGCACTGTACGACACCACGAGAAAC
TACGTGACCCAGAAACCTTTCTCCACCGATAAGATCAAGCTGAATTTTGACTGCGCTCAGCTCCTGGCCGG
GTGGGACTTTAATGTGATCGATGCCAAGAGGGGCGTGTTTCTGGTCAAGAATGGGCGGTATTACCTCGTC
ATCATCGATAATAAGCATAAGAAGGCCATGAATAACCTGCCCGCTCCTATCACTAATAACTGCTACGACA
AATATAACATGAGACTGAGTAAGGACGCCCACATGGCCCTGCCTAAAAAGCTCTTTACCAAGGATAACCT
CAAGATCCCTGCCATTGCCGAGATGGAGCGCAGGTGTCGGGACAAAAATGGCGGCCACCACCTGAGGAA
GAGTCCCGACTTTGATAAGGACTTTATGCACCAGATGATTGACACCTTTAAGGACATTATCAAGAAGGAC
AAGGACTTCGACGTTTTCGGCTTCCAGTTTAAGCCCACTCACCAGTACGAGGACATCAATGAGTTTTACGC
CGACTTCAATGAGCAGGCCTTAGTGACTTGGTACGATAAGGTTGATAGCGATGTGATTGATAGCCTGGTG
GCCGAGGGGAAGATCTACCTGTTCGAAGTGTACTCCAAAGATTTTAGCGACAAGAGTACCGGGACTCCCA
ACCAGCAGAGCCTGATCCTGCAGTACCTGTTCTCTCAGGATAATCTGGCCAAAAGGCACTTTAAGCTGAA
CGGCGAAGCCGAAGTGTTCTACCGGAAGGCCTCTATTGATAAGGACAAGGCCGTGGTGCATAAGAAGGG
CTCCCTGCTGGAGAACAAAAACCCTGCACGGCCCAATTCTAAGATCGCTAAGTTCGACATTGTGAAGGAT
AGACACTACACCGAAGATAAGCTGTTCCTGCATATCCCAATCACACTGAACAACAATGCCGCCGACATGA
AATCCTACGCTATGAATAGCAAGGTGCTGAACACCCTGAAAACAAACGGAGGCGTGAACGTGATCGGCA
TTGACAGAGGGGAAAGAAATCTGCTGAAGATCACCGTGATTAATAGTGCCGGGGAGATCTTGCATCAGG
AGTCCCTGAATAAGATCACTAGCGGGCAGGACATGGTGACTGATTACCATGAGCTTCTGGACAAGAAGGA
GCAGAGCCGCGCTGAGTCTAGGCTGAATTGGCAGGAGGTCGAATCCATTAAGGAGATCAAGCAGGGCTA
CCTGTCCCAGGTGGTGTATAGACTGTCCCAACTGATGCTGCAGTATAAAGCCATCGTGGTGCTGGAAGAT
CTGAATATCGGCTTTAAGCGCGGGAGGTTTAAGATCGAGAAACAGGTGTACCAGAATTTCGAGAAGGCCC
TCATCAACAAGTTAAATTACCTCGTGCTGAAGCAGTTGGAGGCTACCGAGGTGGGGGGCACTGCTCATGG
ATACCAGCTCACAGCCCCCTTTGAGAGCTTTCAGAAGCTGGGGAAGCAGTCTGGCTGGCTCTTTTACGTCC
CCGCCTGGAATACATCCCATATTGACCCCACCACAGGCTTCGTGAACCTGCACCACTTCAAATACGAGAG
CGTCGCCCAGGCAACAGACATCATCGACAAACTGAGCAATATCCGCTACAATCCAGAGAAGGACTACTTC
GAGTTCGCCATTGACTACAACGAGTTCACTTTTAAGGGGGGCGACAGCCAGAAGTACTGGGTGGTGTGCT
CAACCCCTTACAAGAGGTACGTGTTTGATAAAAAAGCCAACATGGGCAGAGGCGGCACCAAGGCCGTGG
ATGTGAACGCCGAGCTGAAGGCCCTCTTTGCAGCCCACGGCGTGGATTATGCAAGCGGAGAGGATCTGAG
GCCCCAGATTAAGGCCAAGGCCAACAAGGAGCTGCTGAGTCAACTGCTGTTTCTGCTGAAAACCCTGACC
GCCATGCGGTACACCAACGCCAGCTCCTACGAGGACTACATCCTGTCTCCAGTGGTGAATAAGGCCGGAG
AGTTCTTTGACAGCAGGAAGGGCGACGCCACCCTGCCACTGGACGCCGACTCTAACGGGTCCTACCACAT
CGCCCTGAAGGGACTGTGCCTGCTGCAGAGGGTGTACGACTGGCGCGGCGAGGAGTTTAAGGGCCTGGACCTGTTCATCTCCAATAATGACTGGCTGAAGTTCGCCCAGGACCGGCAC;
SEQ ID NO.20 (28c15 gene sequence):
ATGAGCAACACTAAGGACAACATCTTTAACAACTTCACCGGCATCTACCCCATCAACAAGACCCTGCGGTTCGAGCTGCGGCCCGTGGGCAAGACCTACGACCTGATCAAGGACTTCAAGAACGGGTACGTGGAGTCCATTGTGGCCATCGACGAGAAGCGGTCCGAGGCCCGGAAGCGGATCATCGAGATCATCGACGAGTACTACGAGGAGTTCATCAACACCGTGCTGAGCAAGAAGGTGTTCTACTCCGACGACATCTGGCAGACCTACACCAGCTACAAGGCCTACAAGAGTGACAAGCGGAACAAGGAGTTTGTCACACAAAAGGCCATCATGCGGAAGAAGATCAGCGATGCCTTCCAGAACGAGAAAACCAAGTTTAACCTGAAGGACTTCAAAGACCTGTTCGGCAAGAAGAGCAATCTGAAGGAGTCCCCCCTGTATAAGTGGTACAAGAACAAGCTGGACATCGGGGAGATCACGGGCGAGGATTTCGAGGACATCATCAAGATAATCACCTACTTCATCGGCTTCACCACCTCCCTGAAGGATTACCAGGAGAACCGGAACAACCTGTTCGTGGCCGAGGAGCAGAGCACCGCCATCAGCCACAGGATTATCGATGTGAACATGATTCGCTACTTCGAGAATTGTATCAGATTCGAGAATATGAAGGACTCCGAACTGCTGGAGGACATGGGGAAGTGGGAGAAGTACTTCGTGCCAGCTAACTACGACAATTTCTTCACTCAGGAGGGTATCGATAACTACAATGAGATTATTGGCCGGAAGTCCAAAGATCTCTACTATAAAGGCGTGAACCAGTTGATCAATGAGTATAGGCAGAAGAACAAGATCAAAAATAAGGATATGCCAACGATGAACCAGCTCTACAAACAGCACATCAGCAAGAACGGCGACAACGAAATCAACAACGACTTCTCCAACGAGAAAGAGATGCTGGAGCAGATCGAGCAAGCCTACATCACCAGCCTCGATAAGATCAATAGGATCGTGTCCTTCATCAATGAGAACATTACCGAAGGAAATAAGATCTTCATTAGGAAGGACTTCGTGACTAATATCAGTAACCGCCTGTTCGGGGAGTGGAACTTCATTAACAACGCCCTCTACAGCTACCTGAGCGGCCTGAGCGCAAAGAACAAGGAGCTGTTCGTGAAGCAGACAGAGGAGGTCATCAAGATCAGCGAGCTCCAGAACATCATCGACCTCTACATCAACAATCTGGATGAGGATGAGAAAGAGAAGTACCTCAAGACCGACGCCATCTACACCCACTTCTGCTCCTTCGATGTGTGCGGGGTGCAGAACGCATACTATGAGGCCAAGACCGTGCTCGCCGTGGACGAGATCAATAAGGACCGGGAGAAAGAGGAAGAGGGAGCCAAGCAGATTTCTAAGGTGAAGAAGCTGCTCGACGAGATCCTCGAAGCCGTCCACTTCTACAAGCCCCTTTACCTCTACAAGAACGGGAAGGAGATCGACGAGATTGAGAAGGATGAGATTTTCTACAGCGAGTTCGACTACCTGTATTCCCAGCTCATGCTGGTGACCGAGCTGTACGACAGGGTGCGCAACTACCTGACCAAGAAACCCTATAGCAAGGATAAATTCAAGATCTACTTTAACAAGCCTACACTGCTCGACGGCTGGGATCTGAACAAGGAGAAAAACAATCTGTCCGTGCTCCTCATCAAGGACGGCTTCTATTATCT
CGGCATCATGGACTCCAAGTACAATAGCGTGTTCGATGTGTCCGCAGACGATGTGAAGATCAACACCACC
GAGCTGTCCGAGGAGGCTACCTTCCTGAAGATGGAGTATAAGCAGGTGAGCGGAGCTTCCAAGATGTTCC
CCAAGGTGTTCTTCGCCGCCTCCAACAAGGACATGTTCAAGCCAAGCGAGGAGATTTTGAACATCCGGGA
GAATAAGCAGTACCTCAAGGGGGCCAATAACAGGGAGGCTGTAATCAAGTGGATCGATTTCTGCAAGGA
CTGTCTCAAGATCCATCCAGAATGGAACCGCTACTTTAACTTCAACTTCCGCCACAGCGACGAGTATGAG
AACGTGAATAGCTTCTATGAGGACGCCGATACTCAGATGTACTACATCAACTTCGTGAAGTTCAAGGAGA
CTTACATCAATGATCTGGTGGAGGAGGGGAAGCTGTTCCTGTTTCAGATCTACAACAAGGACTTCTCCGA
GTACTCCAAGGGCAAGCCCAACCTCCACACCGTGTATTGGAAGATGCTGTTCGACGAGAATAACGTGCGG
AACATCAATGACAATACCGGCAAGCCCGTGTTCAAGCTGAACGGCGAGGCTGAGATCTTTTATCGGAAGG
CCAGCCTGGATAAGAAGGTGACTCACAAGAAAAACTACCCTATCAAAAACAAGAATAAGCACAATAACA
AGACTGAGAGTATCTTTGAGTACGACCTCTACAAGGACAAGCGGTTCATGGATGACAAGTTCTTCTTCCAT
TGCCCCATCACCATCAACTACCGGGCCAAGAATATCCTGTCCAGCGAGTTCAATAAGAAGTTCAACTTGC
ACATCAAAAACAGCGATAACATGAACATTCTGGGCGTGGACAGAGGCGAAAGGCATCTGCTGTACTACTC
CCTGATCAACATTAAGGGAGGAATCATCAAGCAGGGGAGTCTGAACACCATCTACGATTCCTACGAAAAG
GACGGCATCAATATCCCCGTGATTACCGACTACAAGTCCATTCTGAAGGACCGCGAGGACGAGCGGATGG
ACTCCAGGAAGAACTGGGGCACCATCAAGAACATCAAGGAGATGAAGGAGGGCTATCTGAGCCATGTGG
TGCATCAGGTCAGCAAGCTCCTCATCGACAACAATGCCATCCTGGTCCTGGAGAACCTGAACAGCGGCTT
CAAGCGGCGCAGACTGAAGATCGAGAAGCAGGTGTACCAGAACTTCGAGAAAAGCCTGATCAACAAGCT
GAACTACCTCGTCCTGAAGGATGCCGATAACAAGGATGTGGGGCACTTCCTGAAGGGCTACCAGCTCACC
GCTCCTTTCGAGGGGTTCCAGCGCCTGAACAACCAGTCCGGCATCATCTACTACGTGTGGCCCAGCTATAC
CAGCAAGATCTGCCCCCGCACCGGTTTCGTGAGCCTCCTGCACATCAACTACGAGAACATCGAGAAGTCC
AAGGAGTTCTTTAACAAGTTTGACAAGATCTCATATAACAAGGACAAGGACTACTTCGAGTTCCACCTGG
ATTACACCCGGTTCGGGAAGAACGCTGGCAAGAACAAGTGGGTCATCTGCACTTACGGCAAGGATCGCTA
CTTCTTCAACCAGAAGCTGAAGAAGTACGAGTACATCGACATCACAGAGAAGATCAAGGAGCTGCTGAG
CAACAACGGGATCGACTTCATCAACGAGAACGACATGCGCAAGTCCATCGTGGAGAACAACTCCAAGAA
CTTCTTCGGCTCCCTGCTGTTTTACCTCAAGGTCGTGATGCAGTTGCGCTACACCAACAGCAACGACGGGT
GCCGGAATGAGAACGACTACATCCTGAGCCCCGTGGCCGACATTAACGGCATGTTCTTCGACTCCCGGCA
CGCCTGCGACAACGAGCCCGAGAACGCCGACGCCAACGGGGCCTACCACATCGCTCTGAAGGGCCTGCG
CATGATCCAGTTCATCGAGAACGGCGTGATCACCAAGCAGGGCAACGAGACCACCGACTGGTTCAAGTTCGCCCAGAATAAGCTG;
SEQ ID NO.21 (30c9 gene sequence):
ATGAGCGCCCAGAGCGCCCTGAGCACCCTGATCAACAAGTACAGCCTGAGCAAGACCCTGCGCTTCGAGCTGATCCCCATCGGCAAGACCAAGGAGAGCATCGACCGGAAAGGCCTGCTGAGCCAGGATGTGAAGCGAGCCCAGTCCTACAAGGAGGTGAAGAAGATCATCGACGAGTACCACAAGGAGTTCATCGAGAAGTCCCTGATCAACGCCAAGCTGAAGGGCCTCGAAGAGTTCAGCAAGCTGTACTACAAGCTGCAGAAGGAGGACAAGGATAAGAAGAATATCAAGAAGATGCAGGATAACCTGCGCGAGCAGATCTCCGACCTCTTCAAGAACAACAAAAAGGACAAGTGGAACATCCTGTTTAAGGAGGACCTGATCAAGAAGGAGCTGCCACTGTTTGCGAAGGATGATAAGCAGAAGAACCTGATCAATGAGTTCAACAAGTTCACCACATACTTCACCGGCTTCCACAAGAACCGGAAGAACATGTACGCCGAGGAAGAGAAGTCCACCTCTATTCCCTACCGGATCATTCACCAGAATCTGCCTAAGTTTCTGGATAACATCAGGATTTTCGAGAAGATTAAGAAGAACAAGATCAACACTGACGTAATCGAGAAGGAGCTGAGTCTGTTCCTGAACGGAATCAAGATCAACGATATTTTCAGCATTAACTTTTTCAACGATGTGCTGAACCAGAAGGGCATCACCTTCTATAACACCATCCTGGGCGGAGTGAGCGAGAAGGACCGCACCAAGATCAAGGGCATTAATGAGTATGTGAACACCGAGTACAACCAGAAGCAACTGGACAAGAAGAGCAAGATCCCCAAGCTGAAGCAGCTCTACAAGCAGATCCTGAGCGACACCGAGACCGCCAGCTTCGTGCTGGAGCAGTTCGAGAACGACAACCAGCTCCTGGAGAAGATCGAGCAGTTCTACAACACAGAGCTCATCAATTACGAGACCGAGGGCAAGACCCAGTCCGTGTTCCTGCAGTTTGAGCAACTGTTTAAAAACATGCAGAATTACGACGCCTCCAAGATCTACATTAGCAATCTCTCCATCGCTAACATCAGCAAGATCATCTTCGGCGACTGGTCCATCATCTGCAACGCCCTGGCCGAGTGGTACGACAAGCACAACACAAAGGGGAAGAAGATTAACGAGTATAAGAAGGAAAACTTCCTGAAGCAGGATTTCAGCATCCAGCAGATTGAGGACGCCGTGCTGGAGTACAAGAACGACACCTTGAACAAGGAGATCAACTTCCTCCTGAACTACTTCGCCAGCTTCCTCAACGAGAAGTCCAAGAAAAACATCATCCAGCGCATCGAGACCGAGTACTCCAAGGTGAAGGACCTCCTGAACACCGATTACCCCGAGAAGAAGAAGCTGGCCAGCGACAAGGACAACGTGAGCAAGATCAAGGCCTTCCTGGACTCGCTGATGGACTTTCTGCACTTCGTGAAACCCTTCAATATTAAGAAGGACACAGGGCTGGAGAAGGAGGAGAACTTCTACTCCATCTACGTGCCCCTGTTCGAGCAGATCGACAAGATCATCCCCCTTTACAACAAGGTGCGCAACTACCTGACCAAGAAGCCCTATAGCACCGAAAAGATCAAGCTGAACTTCGAGAACAGCACCCTGCTTGACGGCTGGGACCTGAACAAGGAGTCCGACAACACTAGCGTGGTGCTGCGCAAGGACGACCTCTACTACCTGGGCATTATGGATAAGAAGCACAATCGGATCTTCAAAGAACTGCCCAGCCAGAACGGCAATGAGAGTAGCTATGAGAAGATGATCTACAAGCTGCTGCCGGGGCCAAATAAGATGCTGCCCAAGGTGTTCTTCTCCAAAAAGGGCAAGAAGCAGTTCAAGCCCTCCAAGAAACTTCTGAAGAAGTACGAGGACGGGACCCACCTGAAGGGCGATAACTTTAATATCAATGACTGCCACAACCTGATCGACTTCTTTAAGGAGTC
CATCGCCGAGCACGAGGACTGGAAGCAGTTCGACTTCAAGTTTAGCAGCACAAGTAGCTACAAGGACCTGTCAAATTTCTATAAGGAGGTGGAGAAACAGGGCTACAAGATCACATTCCAGAACATCTCTGAGAACTATATCAACCAGCTCATCGACGAGGGCAAGCTCTACCTGTTCCAGATCTACAATAAGGACTTCAGCAAGTACAGCAAGGGGACCCCCAACCTGCACACCCTGTACTGGAAGATGCTGTTTGATAACGACAACCTGAAGAACATTGTGTATAAGCTGAATGGCAAGGCCGAGGTGTTCTACCGCAAGTCCTCCCTGATCCTGGGGGACAACATCGTGCACAAGGCTGGCGAGGCAATCATCAACAAGAACCCCGACAACGAGAAAAAGCACAGTACCTTCGATTACGACCTGATTAAGGACAAACGCTTCACCCTCGACAAGTTTCAGTTCCATGTGCCCATTACCCTGAACTTCAAGAGCGAGGGGAGGCAGAACCTGAACGAGGATGTGAGGAAGTTCCTGAAGAACAACCCTGACATAAACATCATCGGTATCGACCGGGGGGAGCGGCACCTCCTGTACCTGACCCTCATCAACCAGAAGGGAAAGATCCTCTTCCAGAAAAGCCTGAACGAGATCACCAACGAGTACAATAACAAGAACGGTAAATCCCAGATCAAGAGCACCAACTACCACTCCCTGCTCGACAAGAAGGAGAAGAAGCGCGATGAGGCCCGCAAGAACTGGGGCATAATCGAGAACATCAAGGAGCTGAAGGAGGGCTACATGAGCCAGATCGTCCACTATATCAGCAAGCTGATGATCGAGAAAAACGCCATTCTGAGCCTTGAGGACCTGAACTTCGGGTTCAAGCGCGGACGCCAGAAGGTCGAGAAGCAGGTGTACCAGAAGTTCGAAAAGATGATGATTGACAAGCTCAACTACCTTGTGTTCAAGGACAAGAAGGCCAACGAGACCGGCGGCCTGCTCAATGCCCTGCAATTGACTAACAAGTTCGAGTCCTTCGCCAAGCTGTATAACCAGTCCGGGTTCATCTTCTACGTCCCAGCTTGGAACACCAGCAAGATCGACCCAATCACCGGCTTTGTGAACCTCCTGAAGCCTTACTACGAGAACCTGAATAAGAGCCAGGAGTTTTTCAAGAAGTTCAACAACATCAAGTACAACCCTAAGCAGGAGTACTTCGAGTTCAACTTCGACTACAAGAACTTCACCAACAAAGCCGAGGGCAGCAAGAACGTCTGGGAGATCTGCACCACTAACAATGAGCGGTTCATGTGGGACAAGACCCTGAACAGCGGCAAGGGCGCTCAGAAGGCCGTGGATGTGACACAGGAGCTGAAGAAGCTGTTTGACAGCAGCAAGATCAACTACCTGAACGGAAACGACATCAAGGAGGACATTATCAATCAGAACTCCGCCGACTTCTTTCGGAAGCTGATGAAGCTGCTGTCCGTGGTGCTGAGCCTGCGGCACAACAACGGCCTGAAGGGGAAGGACGAGAAGGACTTCATCCTGAGCCCCGTGGAGCCCTTCTTTAACAGCCTGAACGCTAAGATGGAGGAGCCTAAGGACGCCGACGCTAACGGCGCATACAACATCGCCCTGAAGGGCCTGCTGATCCTGAAGCAGATTAACGAGAGTGAGGACCTGCGCAAGATCAAGTTCAACCTGAGCAATAAGGAGTGGCTGAAGTTCGCCCAGTCTAAGAGCTTC;
SEQ ID NO.22 (Library-spacer):
ATGGCGAATACTTTTAAAGTCAT;
The primer sequences are as follows:
library-NGS-F:
ACACTCTTTCCCTACACGACGCTCTTCCGATCTgtctacaatcggctcgatcga;
library-NGS-R:
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTgcgcagaccaaaacgatctc.
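The library spacer and NGS primers above are used to read out which PAMs are cleaved. As a rough illustration of how such reads might be tallied, the sketch below counts the bases immediately 5' of the library spacer and compares a cleaved library with a control. The layout is an assumption: it supposes a randomized PAM of `pam_len` bases directly upstream of the spacer on the read strand (the Cas12a arrangement). The function names, the pseudocount scoring and the example reads are hypothetical and are not taken from Example 3.

```python
from collections import Counter

# Library spacer from SEQ ID NO.22.
LIBRARY_SPACER = "ATGGCGAATACTTTTAAAGTCAT"


def count_pams(reads, spacer=LIBRARY_SPACER, pam_len=6):
    """Tally the pam_len bases immediately 5' of the spacer in each read.

    Hypothetical layout: the randomized PAM sits directly upstream of the
    spacer on the read strand. Reads lacking the spacer, or with too little
    upstream sequence, are skipped.
    """
    spacer = spacer.upper()
    counts = Counter()
    for read in reads:
        read = read.upper()
        i = read.find(spacer)
        if i >= pam_len:
            counts[read[i - pam_len:i]] += 1
    return counts


def depletion_scores(control, cleaved, pseudocount=1.0):
    """Normalized PAM frequency in the cleaved library divided by that in the
    control library; values well below 1 flag candidate functional PAMs."""
    pams = set(control) | set(cleaved)
    n_ctrl = sum(control.values()) + pseudocount * len(pams)
    n_cut = sum(cleaved.values()) + pseudocount * len(pams)
    return {
        p: ((cleaved[p] + pseudocount) / n_cut) / ((control[p] + pseudocount) / n_ctrl)
        for p in pams
    }
```

A PAM that is efficiently recognized is cleaved and therefore lost from the sequenced pool, so it scores low; the pseudocount keeps PAMs absent from one library from producing division by zero.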
Example 4: In vitro cleavage experiments
This example uses in vitro cleavage experiments to demonstrate that the six V-type CRISPR/Cas12a gene editing systems of Example 1 of the present invention cleave DNA in vitro, and that the PAMs identified in Example 3 are correct.
The specific operation is as follows:
(1) Based on the PAMs identified in Example 3 above, suitable target sites were selected on the CDKN2A gene. DNA fragments carrying the same target sequence but different PAMs were obtained by annealing, and primers were designed on both sides of the target site to obtain 2000 bp DNA fragments. The target sequences selected for the six CRISPR/Cas12a systems and the PAM sequences to be tested are shown in Table 1.
TABLE 1 Target sequences and PAM sequences used for in vitro cleavage
(2) Using the direct repeat sequences of the V-type CRISPR/Cas12a gene editing systems described in Example 1 above, crRNAs comprising the repeat sequence and the corresponding CDKN2A target sequence were obtained by in vitro transcription. The Cpf1 proteins purified in Example 3 were incubated with the crRNAs to form complexes, which were then incubated with the CDKN2A DNA fragments at 37 °C for 30 min. An appropriate amount of proteinase K was added, the mixture was incubated at room temperature for 15 min, and the reaction was inactivated at 98 °C for 10 min. The cleavage products were analyzed on a 1.5% agarose gel.
The results are shown in FIG. 4. All six CRISPR/Cas12a gene editing systems showed good cleavage activity in vitro: the 2000 bp CDKN2A DNA fragment was cut into two fragments of 500 bp and 1500 bp. This indicates that the expressed and purified Cpf1 proteins are biologically active, and that each of the six Cpf1 proteins, guided by its crRNA, recognizes and correctly cleaves the target complementary to the spacer sequence.
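The 500 bp and 1500 bp bands follow from where the cut falls within the 2000 bp fragment. The sketch below predicts band sizes from the position of the PAM and protospacer in a linear amplicon. It assumes the Cas12a arrangement (PAM immediately 5' of the protospacer on the same strand) and, as a labeled assumption, a cut 18 bases 3' of the PAM on that strand, as reported for characterized orthologs such as AsCas12a and LbCas12a; the exact cut positions of the six new proteins are not given here. `locate_pam` and `predict_bands` are hypothetical helper names, and the amplicon in the check is synthetic, not the CDKN2A fragment.

```python
def locate_pam(amplicon: str, protospacer: str, pam_len: int) -> int:
    """Return the 0-based start of the PAM, assumed to lie immediately 5' of
    the protospacer on the top strand (the Cas12a arrangement)."""
    i = amplicon.upper().find(protospacer.upper())
    if i < 0:
        raise ValueError("protospacer not found on the top strand")
    if i < pam_len:
        raise ValueError("not enough sequence 5' of the protospacer for a PAM")
    return i - pam_len


def predict_bands(amplicon_len: int, pam_start: int, pam_len: int,
                  cut_offset: int = 18) -> tuple[int, int]:
    """Approximate band sizes (bp) after cleavage of a linear amplicon.

    cut_offset: bases 3' of the PAM at which the PAM-containing strand is cut.
    18 is an assumption borrowed from characterized Cas12a orthologs; the
    staggered cut on the other strand shifts the ends by only a few bp,
    which is below agarose-gel resolution.
    """
    cut = pam_start + pam_len + cut_offset
    if not 0 < cut < amplicon_len:
        raise ValueError("cut site falls outside the amplicon")
    return cut, amplicon_len - cut
```

For a target on the bottom strand, reverse-complement the amplicon first; the two band sizes are unchanged, only their order flips.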
Example 5: dsODN insertion experiments
In this example, a dsODN insertion experiment was used to verify that the six V-type CRISPR/Cas12a gene editing systems of Example 1 of the present invention can edit target genes in mammalian (eukaryotic) cells.
The specific operation is as follows:
(1) The six Cpf1 proteins of Example 1 of the present invention were human codon-optimized, and the corresponding nucleotide sequences were cloned into the PX330 eukaryotic expression vector (Addgene, 59909) to obtain PX330-protein eukaryotic expression plasmids.
(2) HEK293T cells were used as an example of mammalian cells. The endogenous CDKN2A gene was selected, and suitable target sites were chosen adjacent to PAM sequences that were identified in Example 3 and are recognized and cleaved in vitro. crRNA sequences in the format 5'-direct repeat-crRNA spacer-3', matched to each of the six Cpf1 proteins, were cloned into the PXZ vector (Addgene, 160229) by Gibson assembly, yielding PXZ-CDKN2A target plasmids that target different sites through different PAMs. The PX330-protein eukaryotic expression plasmids and the PXZ-CDKN2A target plasmids were co-transfected, with LtCpf1 as the positive control and transfection of the PX330-protein eukaryotic expression plasmid alone as the negative control. The CDKN2A target sites and corresponding PAMs selected in this example are shown in Table 2.
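The crRNA cassettes are written 5'-direct repeat-spacer-3', and claim 4 describes a CRISPR array in which direct repeats and spacers alternate. The sketch below assembles that sequence from a repeat and one or more spacers and gives the transcribed RNA. The repeat used in the usage line is a made-up placeholder (the actual repeats are SEQ ID NO. 10 to 15, not reproduced here), and Gibson overlap arms for the PXZ vector are deliberately not modeled because their sequences are not given.

```python
VALID_BASES = set("ACGT")


def _checked(seq: str) -> str:
    """Upper-case a DNA sequence and reject anything other than A/C/G/T."""
    s = seq.upper()
    if not s or set(s) - VALID_BASES:
        raise ValueError(f"not a DNA sequence: {seq!r}")
    return s


def build_crispr_array(direct_repeat: str, spacers: list[str]) -> str:
    """Return the DNA template 5'-DR-spacer1-DR-spacer2-...-3'.

    With a single spacer this is the 5'-DR-spacer-3' crRNA cassette format.
    """
    if not spacers:
        raise ValueError("at least one spacer is required")
    dr = _checked(direct_repeat)
    return "".join(dr + _checked(sp) for sp in spacers)


def to_rna(dna: str) -> str:
    """Sense-strand transcript of a DNA template: T -> U."""
    return _checked(dna).replace("T", "U")


# Usage with a hypothetical placeholder repeat (not one of SEQ ID NO. 10-15).
HYPOTHETICAL_DR = "ACGTACGTACGTACGTACGT"
cassette = build_crispr_array(HYPOTHETICAL_DR, ["ATGGCGAATACTTTTAAAGTCAT"])
crrna = to_rna(cassette)
```

Keeping the repeat first in every unit mirrors the single-RNA design of Cpf1: no tracrRNA segment is needed, so the whole guide is just repeat plus spacer.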
TABLE 2 Target sites, PAM sequences and detection primers used for verification
(3) The PX330-protein eukaryotic expression plasmid, the PXZ-Cpf1 protein-CDKN2A target plasmid and 1.2 μL of dsODN were co-transfected into healthy HEK293T cells in a 24-well plate; the cells were harvested 72 hours later and DNA was extracted.
(4) Primers (see Table 2) were designed upstream of the CDKN2A target site and within the dsODN sequence for dsODN-PCR amplification. Whether a band of the target size appears on an agarose gel indicates whether the dsODN was inserted, and the dsODN insertion status in turn verifies whether the V-type CRISPR/Cas12a gene editing system has editing capability in a eukaryotic cell environment.
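The "target band" in the dsODN-PCR is the genome-dsODN junction amplified by one genomic primer and one primer inside the dsODN, so its expected size can be computed in advance. A minimal sketch, assuming a clean insertion at the break and counting positions in the dsODN orientation that places the internal primer facing the upstream genomic primer (the other orientation gives no product with that primer pair). The function name and the numbers in the check are hypothetical; the actual primer positions are in Table 2.

```python
def junction_pcr_length(fwd_start: int, cut_site: int,
                        dsodn_len: int, rev_primer_5p_pos: int) -> int:
    """Expected size (bp) of the genome-dsODN junction PCR product.

    fwd_start: genomic coordinate of the forward primer's 5' end (upstream).
    cut_site: genomic coordinate of the first base downstream of the break,
        i.e. where the dsODN is inserted.
    rev_primer_5p_pos: 1-based position, counted from the dsODN end joined to
        the upstream genomic side, of the reverse primer's 5' end.
    Assumes a clean insertion; end processing at the break can shift the
    real product by a few bp.
    """
    if not fwd_start < cut_site:
        raise ValueError("forward primer must lie upstream of the cut site")
    if not 1 <= rev_primer_5p_pos <= dsodn_len:
        raise ValueError("reverse primer position must fall within the dsODN")
    # Genomic bases fwd_start..cut_site-1 plus dsODN bases 1..rev_primer_5p_pos.
    return (cut_site - fwd_start) + rev_primer_5p_pos
```

Without an insertion the dsODN primer has no template, so the negative control should give no band at this size.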
The dsODN-PCR electrophoresis results are shown in FIG. 5, where PCR bands of the expected length are marked with red triangles. The results show that all six CRISPR/Cas12a gene editing systems of Example 1 have cleavage activity in eukaryotic cells.
In conclusion, six novel V-type CRISPR/Cas12a gene editing systems were discovered for the first time through metagenomic bioinformatics analysis, and their corresponding direct repeat sequences were predicted. The Cpf1 proteins of the six novel editing systems were named 28c2, 28c6, 28c12, 28c13, 28c15 and 30c9. As a single-RNA-guided endonuclease, Cpf1 requires only a crRNA for targeting and is smaller overall than Cas9, which makes in vivo delivery more convenient; its guide RNA design is also simpler than that of Cas9. Experiments show that the six CRISPR/Cas12a gene editing systems each recognize their own distinct PAM sequences and, guided by crRNA, cleave targets in vitro and edit genes in eukaryotic cells. The discovery of these six novel gene editing systems further expands the range of gene editing tools, enriches the PAM diversity of existing Cpf1-based gene editing tools, offers more tool choices for the application of Cpf1 in clinical treatment, and helps promote the application of gene editing in clinical treatment.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit its scope. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope.

Claims (8)

1. A Cpf1 protein, characterized in that its amino acid sequence is as shown in any one of SEQ ID NO. 1 to 6.
2. A nucleic acid encoding the Cpf1 protein of claim 1, wherein the nucleic acid has a base sequence as set forth in any one of SEQ ID NO. 16 to 21.
3. A V-type CRISPR/Cas12a gene editing system, comprising the Cpf1 protein of claim 1, an accessory protein, and a CRISPR array.
4. The V-type CRISPR/Cas12a gene editing system according to claim 3, wherein the CRISPR array comprises direct repeat sequences and spacer sequences, and the direct repeat sequences and spacer sequences are arranged alternately.
5. The V-type CRISPR/Cas12a gene editing system according to claim 4, wherein the nucleotide sequence of the direct repeat is as shown in any one of SEQ ID NO. 10 to 15.
6. The V-type CRISPR/Cas12a gene editing system according to claim 3, wherein the amino acid sequence of the accessory protein is as shown in SEQ ID NO. 7 or 8.
7. Use of the V-type CRISPR/Cas12a gene editing system as defined in any one of claims 3 to 6 in prokaryotic or eukaryotic gene editing.
8. Use of the V-type CRISPR/Cas12a gene editing system of any one of claims 3 to 6 in the preparation of a biological gene editing formulation.
CN202310510289.0A 2023-05-08 2023-05-08 Cpf1 protein, V-type gene editing system and application Active CN116751763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310510289.0A CN116751763B (en) 2023-05-08 2023-05-08 Cpf1 protein, V-type gene editing system and application

Publications (2)

Publication Number Publication Date
CN116751763A true CN116751763A (en) 2023-09-15
CN116751763B CN116751763B (en) 2024-02-13

Family

ID=87948550

Country Status (1)

Country Link
CN (1) CN116751763B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016205711A1 (en) * 2015-06-18 2016-12-22 The Broad Institute Inc. Novel crispr enzymes and systems
CN109312316A (en) * 2016-02-15 2019-02-05 本森希尔生物系统股份有限公司 The composition and method of modifier group
CN111836894A (en) * 2017-11-21 2020-10-27 韩国生命工学研究院 Genome editing compositions using CRISPR/Cpf1 system and uses thereof
CN112703250A (en) * 2018-08-15 2021-04-23 齐默尔根公司 Application of CRISPR in high-throughput metabolic engineering
CN111757889A (en) * 2018-10-29 2020-10-09 中国农业大学 Novel CRISPR/Cas12f enzymes and systems
US20230056843A1 (en) * 2019-08-19 2023-02-23 Southern Medical University Construction of high-fidelity crispr/ascpf1 mutant and uses thereof
CN112331264A (en) * 2020-09-11 2021-02-05 中山大学附属第一医院 Construction method of homologous type 2 CRISPR/Cas gene editing system
WO2022052211A1 (en) * 2020-09-11 2022-03-17 中山大学附属第一医院 Homologous type 2 crispr/cas9 gene editing system and construction method therefor
CN113234701A (en) * 2020-10-20 2021-08-10 珠海舒桐医疗科技有限公司 Cpf1 protein and gene editing system

Non-Patent Citations (1)

Title
郭婷等: "多重基因组编辑中CRISPR-Cas9系统和CRISPR-Cpf1系统的应用和比较", 中国细胞生物学学报, vol. 41, no. 11, pages 2234 - 2244 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant