CN115851664B

CN115851664B - I-B CRISPR-Cascade-Cas3 gene editing system and application

Info

Publication number: CN115851664B
Application number: CN202211136496.6A
Authority: CN
Inventors: 肖易倍; 陆美玲; 俞晨霖; 张钰雯
Original assignee: China Pharmaceutical University
Current assignee: Shenzhen Adibek Biotechnology Co.,Ltd.
Priority date: 2022-09-19
Filing date: 2022-09-19
Publication date: 2023-08-25
Anticipated expiration: 2042-09-19
Also published as: CN115851664A

Abstract

The invention relates to a type I-B CRISPR-Cascade-Cas3 gene editing system and its applications. The gene editing system consists of a Cascade complex and a Cas3 protein, wherein the Cascade complex is assembled from the Cmx8 protein, Cas5 protein, Cas6 protein, Cas11 protein and a crRNA; the system can be used to recognize, bind and edit prokaryotic or eukaryotic genes. The type I-B CRISPR-Cascade-Cas3 gene editing system can generate long-fragment deletions of varying sizes from a single CRISPR target site, filling the gap left by the relatively limited ability of existing CRISPR-Cas9 systems to generate long-fragment deletions.

Description

I-B CRISPR-Cascade-Cas3 gene editing system and application

Technical Field

The invention relates to a type I-B CRISPR-Cascade-Cas3 gene editing system and its applications, and belongs to the technical field of biomedicine.

Background

The CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) system is an RNA-mediated "adaptive immune system", found in bacterial and archaeal genomes, that defends against invading foreign nucleic acids ^[1]. A CRISPR-Cas gene cluster comprises a CRISPR locus, which stores sequence information from foreign nucleic acids, and cas genes encoding proteins with different functions. The CRISPR locus consists of a leader sequence followed by repeat and spacer sequences ^[2].

The function of a CRISPR-Cas system can be divided into three stages (as shown in FIG. 1):

1. Adaptation: the bacterium recognizes and captures fragments of invading foreign nucleic acid through Cas proteins and integrates them into the CRISPR locus as new spacer sequences;

2. crRNA maturation: the CRISPR locus carrying the foreign sequence information is transcribed into precursor CRISPR RNA (pre-crRNA), which is processed by Cas proteins and other nucleases into mature CRISPR RNA (crRNA); the crRNA then assembles with Cas proteins into a crRNA/Cas protein complex;

3. Interference: guided by crRNA base pairing, the crRNA/Cas protein complex binds the target nucleic acid adjacent to a PAM (protospacer adjacent motif) sequence, and the Cas protein with cleavage activity specifically cleaves the target at the target site ^[3].

CRISPR-Cas systems fall into two classes, Class 1 and Class 2 (FIG. 2). Class 2 comprises three types (types II, V and VI) that use a single multi-domain protein, such as Cas9 or Cas12, to interfere with the target nucleic acid. Class 1 systems account for about 90% of all CRISPR-Cas systems and are likewise divided into three types (types I, III and IV); type I is further divided into seven subtypes (I-A to I-F and I-U). Class 1 systems rely on an effector complex, Cascade (CRISPR-associated complex for antiviral defense), composed of multiple Cas proteins and the crRNA ^[4]. Class 1 systems contain Cas3 (sometimes fused to Cas2), Cas5, Cas6, Cas7, Cas8, Cas10 and Cas11, in different combinations in different subtypes ^[5]. Cas5 and Cas7 form the backbone of Cascade; Cas7 is present in 6-7 copies that bind and support the crRNA and shape how the crRNA pairs with DNA; Cas6 processes the pre-crRNA; Cas11, the small subunit, is involved in binding the substrate nucleic acid; Cas8, Cas10 and related large subunits recognize the PAM sequence during substrate DNA binding; and Cas3, which has helicase and nuclease activities, performs the final cleavage of the target nucleic acid and the further degradation of the DNA ^[6].

At present, CRISPR-Cas9 gene editing technology based on the Class 2 system is mature and widely used; it has been developed into efficient gene editing and gene detection tools and plays an important role in basic and applied biological research. The system uses a single guide RNA (sgRNA) to recognize and bind the target DNA and guides the Cas9 protein to cut at the target site, producing a DNA double-strand break (DSB) ^[7]. The cell then repairs the break by non-homologous end joining (NHEJ) or homology-directed repair (HDR), inserting or deleting bases at the target site and thereby editing the genome ^[8].

The mechanism of action of type I CRISPR-Cas systems in Class 1 differs from that of Cas9 and works as follows:

1. Cascade recognizes the target double-stranded DNA (dsDNA) adjacent to a PAM sequence; the CRISPR RNA (crRNA) pairs with the target strand (the strand complementary to the crRNA) to form an RNA-DNA heteroduplex and displaces the non-target strand, forming an R-loop structure;

2. Cas3, which has helicase and nuclease activities, is specifically recruited to the Cascade/R-loop complex and nicks the displaced non-target strand (the strand carrying the PAM sequence), preferentially 7 or 9 nucleotides from the PAM; the nicked DNA strand can then thread through the helicase domain of Cas3, initiating subsequent DNA unwinding and degradation ^[9].
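
The two steps above can be illustrated with a toy indexing sketch. It assumes, as is usual for type I systems, that the PAM lies immediately 5' of the protospacer on the non-target strand, and it interprets "7 or 9 nucleotides from the PAM" as a nick placed after the 7th or 9th protospacer nucleotide. The function name `cas3_nick_sites` and this counting convention are illustrative assumptions, not part of the invention.

```python
def cas3_nick_sites(non_target: str, pam: str, offsets: tuple[int, ...] = (7, 9)) -> list[int]:
    """Toy model of where Cas3 might nick the displaced non-target strand.

    Returns 0-based indices i such that the nick falls between non_target[i-1]
    and non_target[i]. Assumes (hypothetically) that the PAM is 5' of the
    protospacer on this strand and that offsets count from the last PAM base.
    """
    strand = non_target.lower()
    p = strand.find(pam.lower())
    if p < 0:
        raise ValueError("PAM not found on the non-target strand")
    first_protospacer_nt = p + len(pam)
    # Keep only nick positions that fall inside the given sequence
    return [first_protospacer_nt + n for n in offsets
            if first_protospacer_nt + n <= len(strand)]

print(cas3_nick_sites("tt" + "atg" + "c" * 20, "atg"))  # [12, 14]
```

The sketch only locates candidate nick sites; it does not model the subsequent unwinding and degradation, which in reality proceed away from the nick over long distances.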

Despite the simplicity and ease of use of the CRISPR-Cas9 system, problems such as off-target cleavage remain. In addition, even after CRISPR-Cas9 cuts the target DNA, some target proteins can still be partially expressed through exon skipping or translation reinitiation and retain activity, which reduces the effectiveness of gene editing ^[10]. CRISPR-Cas9 also has a relatively limited ability to generate long-fragment deletions, which further limits its applications ^[11].

Compared with CRISPR-Cas9, the effector complex of Class 1 systems is more complex and more precise. The crRNA in Cascade is generally longer than 30 nt, compared with the roughly 20-nt targeting sequence used by Cas9, so it pairs with higher specificity and is less likely to cause off-target effects. In addition, the Cascade complex must form a complete R-loop with the target DNA before it can recruit Cas3 ^[12], which prevents Cas3 from binding DNA prematurely and cleaving non-specifically. Most importantly, cleavage of the target DNA by Cas3 produces long-fragment deletions (from hundreds of bp to 100 kb) at the target site, which current CRISPR-Cas9 systems cannot achieve. Class 1 CRISPR-Cas systems are therefore expected to become more effective and more powerful gene editing tools.
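
The specificity argument can be made concrete with a back-of-the-envelope estimate. Assuming a uniformly random genome of about 3.1 Gb (roughly the human genome size, an assumption for illustration), counting both strands and only perfect matches, the expected number of chance matches to an L-nt guide is about 2(G - L + 1)/4^L. This ignores PAM requirements and mismatch tolerance, so it is only a rough comparison of a 20-nt guide with a hypothetical 32-nt crRNA guide.

```python
def expected_random_matches(genome_size: int, guide_len: int) -> float:
    """Expected exact matches of a fixed guide in a random genome (both strands)."""
    positions = 2 * (genome_size - guide_len + 1)  # start sites on both strands
    return positions / 4 ** guide_len              # each site matches with p = 4^-L

GENOME_SIZE = 3_100_000_000  # assumed, roughly the human genome
for length in (20, 32):
    print(f"{length} nt: {expected_random_matches(GENOME_SIZE, length):.2e}")
```

Under these assumptions a 20-nt guide has about 5.6e-3 expected chance perfect matches versus about 3.4e-10 for 32 nt; real off-target risk is dominated by near matches, which this sketch does not model.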

Because of the complex composition of the Class 1 effector complex, type I CRISPR-Cas systems were not reported for gene editing in mammalian cells until 2019. That study used the type I-E CRISPR-Cas system from Thermobifida fusca to achieve knockout efficiencies of 13% and 60% in human embryonic stem cells (hESCs) and the near-haploid human cell line HAP1, respectively, and found that a single crRNA could generate a large number of genomic deletions of varying lengths (left panel of FIG. 3), demonstrating the strong ability of type I gene editing tools to create long-fragment deletions ^[9]. Another study also achieved good knockout efficiency in hESCs and HAP1 cells using a type I-C CRISPR-Cas system ^[13] (right panel of FIG. 3). Both studies show that type I gene editing tools hold great promise.

Among Class 1 CRISPR-Cas systems, types I-E and I-F have been studied thoroughly, but research on the structures, functional characteristics and mechanisms of the other subtypes remains insufficient. This prevents a comprehensive understanding of Class 1 CRISPR-Cas systems and hinders their development for gene editing and other applications. Studies have shown that type I-B preserves the arrangement of cas genes typical of type I CRISPR-Cas systems, whereas the other subtypes show varying degrees of gene loss or rearrangement, suggesting that type I-B may be evolutionarily more primitive. Developing a gene editing tool based on the type I-B CRISPR-Cas system would therefore broaden our understanding of Class 1 CRISPR-Cas systems and lay a foundation for their wide application.

Synechocystis sp. PCC 6714 is a unicellular cyanobacterium closely related to the widely studied model organism Synechocystis sp. PCC 6803. Both strains were isolated from the same freshwater pond in Oakland, California; their 16S rRNA sequences are 99.4% identical, and their gene content and predicted proteomes are highly conserved, indicating a common evolutionary origin ^[14]. Earlier studies also showed that both Synechocystis sp. PCC 6714 and Synechocystis sp. PCC 6803 can be genetically transformed ^[15], making them suitable for genetic manipulation in the laboratory.

The references referred to above are as follows:

[1] Marraffini LA, Sontheimer EJ. CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea. Nat Rev Genet. 2010;11(3):181-190.

[2] Gupta D, Bhattacharjee O, Mandal D, Sen MK, Dey D, Dasgupta A, Kazi TA, Gupta R, Sinharoy S, Acharya K, Chattopadhyay D, Ravichandiran V, Roy S, Ghosh D. CRISPR-Cas9 system: a new-fangled dawn in gene editing. Life Sci. 2019;232:116636.

[3] Minkenberg B, Wheatley M, Yang Y. CRISPR/Cas9-enabled multiplex genome editing and its application. Prog Mol Biol Transl Sci. 2017;149:111-132.

[4] Makarova KS, Wolf YI, Iranzo J, et al. Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants. Nat Rev Microbiol. 2020;18(2):67-83.

[5] Makarova KS, Koonin EV. Annotation and classification of CRISPR-Cas systems. Methods Mol Biol. 2015:47-75.

[6] Makarova KS, Wolf YI, Alkhnbashi OS, Costa F, Shah SA, Saunders SJ, Barrangou R, Brouns SJ, Charpentier E, Haft DH. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Microbiol. 2015;13(11):722-736.

[7] Minkenberg B, Wheatley M, Yang Y. CRISPR/Cas9-enabled multiplex genome editing and its application. Prog Mol Biol Transl Sci. 2017;149:111-132.

[8] Tuladhar R, Yeu Y, Piazza JT, Tan Z, Clemenceau JR, Wu X, Barrett Q, Herbert J, Mathews DH, Kim J, Hwang TH, Lum L. CRISPR-Cas9-based mutagenesis frequently provokes on-target mRNA misregulation. Nat Commun. 2019;10(1):4056.

[9] Dolan AE, Hou Z, Xiao Y, Gramelspacher MJ, Heo J, Howden SE, Freddolino PL, Ke A, Zhang Y. Introducing a spectrum of long-range genomic deletions in human embryonic stem cells using type I CRISPR-Cas. Mol Cell. 2019;74(5):936-950.e5.

[10] Xiao Y, Luo M, Dolan AE, Liao M, Ke A. Structure basis for RNA-guided DNA degradation by Cascade and Cas3. Science. 2018;361(6397).

[11] Smits AH, Ziebell F, Joberty G, Zinn N, Mueller WF, Clauder-Münster S, Eberhard D, Fälth Savitski M, Grandi P, Jakob P, Michon AM, Sun H, Tessmer K, Bürckstümmer T, Bantscheff M, Steinmetz LM, Drewes G, Huber W. Biological plasticity rescues target activity in CRISPR knock outs. Nat Methods. 2019;16(11):1087-1093.

[12] Xiao Y, Luo M, Hayes RP, Kim J, Ng S, Ding F, Liao M, Ke A. Structure basis for directional R-loop formation and substrate handover mechanisms in type I CRISPR-Cas system. Cell. 2017;170(1):48-60.e11.

[13] Tan R, Krueger RK, Gramelspacher MJ, Zhou X, Xiao Y, Ke A, Hou Z, Zhang Y. Cas11 enables genome engineering in human cells with compact CRISPR-Cas3 systems. Mol Cell. 2022 Jan 13:S1097-2765(21)01137-0.

[14] Kopf M, Klähn S, Pade N, et al. Comparative genome analysis of the closely related Synechocystis strains PCC 6714 and PCC 6803. DNA Res. 2014;21(3):255-266.

[15] Joset F. Transformation in Synechocystis PCC 6714 and 6803: preparation of chromosomal DNA. Methods Enzymol. 1988;167:712-714.

Disclosure of Invention

The main purpose of the invention is to overcome the problems of the prior art by providing a type I-B CRISPR-Cascade-Cas3 gene editing system and its applications. The resulting gene editing technique can generate long-fragment deletions of varying sizes from a single CRISPR target site, filling the gap left by the relatively limited ability of current CRISPR-Cas9 systems to generate long-fragment deletions and enriching the gene editing toolkit.

The technical scheme for solving the technical problems is as follows:

A type I-B CRISPR-Cascade-Cas3 gene editing system, characterized in that it consists of a Cascade complex and a Cas3 protein; the Cascade complex is assembled from a Cmx8 protein, a Cas5 protein, a Cas6 protein, a Cas11 protein and a crRNA; the amino acid sequence of the Cmx8 protein is SEQ ID NO.2; the amino acid sequence of the Cas8 protein is SEQ ID NO.4; the amino acid sequence of the Cas5 protein is SEQ ID NO.6; the amino acid sequence of the Cas6 protein is SEQ ID NO.8; the amino acid sequence of the Cas11 protein is SEQ ID NO.10; the amino acid sequence of the Cas3 protein is SEQ ID NO.12; the DNA fragment that expresses the crRNA consists of identical repeat sequences alternating with spacer sequences, which may be identical to or different from one another, and begins and ends with a repeat sequence; the repeat sequence is 5'-gtgtccaaaccattgatgccgtaaggcgttgagcac-3', and the spacer sequences are designed according to the target gene.
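
The repeat-spacer layout described above can be sketched as a small assembly function. The repeat is the 36-nt sequence given in the claim; the spacer strings are placeholders (the actual tdTomato spacer of SEQ ID NO.15 is not reproduced here), and `build_crispr_array` is a hypothetical helper, not a tool from the source.

```python
REPEAT = "gtgtccaaaccattgatgccgtaaggcgttgagcac"  # type I-B repeat from the claim, 5'->3'

def build_crispr_array(spacers: list[str], repeat: str = REPEAT) -> str:
    """Return repeat-(spacer-repeat)*n, so the array starts and ends with a repeat."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat]
    for spacer in spacers:
        parts += [spacer.lower(), repeat]  # spacers may be identical or different
    return "".join(parts)

single = build_crispr_array(["n" * 32])  # placeholder 32-nt spacer (length assumed)
print(len(REPEAT), len(single))          # 36 104
```

A single spacer gives the preferred 5'-repeat-spacer-repeat-3' structure; passing several spacers yields a multi-spacer array with one repeat between each pair.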

In this type I-B CRISPR-Cascade-Cas3 gene editing system, target recognition by the Cascade complex and cleavage by the Cas3 protein are more stringent, and long-fragment deletions of varying sizes can be generated from a single CRISPR target site. This fills the gap left by the relatively limited ability of current tools to generate long-fragment deletions and enriches the gene editing toolbox.

Preferably, the C-terminus of the Cas8 protein is fused to a nuclear localization signal (NLS); the N-terminus of the Cas3 protein is fused to an NLS; the amino acid sequence of the NLS is SEQ ID NO.14; and the crRNA-expressing DNA fragment has the structure 5'-repeat sequence-spacer sequence-repeat sequence-3'.
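
At the protein level, the two NLS placements in this preferred embodiment amount to appending PKKKRKV (SEQ ID NO.14, the SV40-type NLS) to the C-terminus of Cas8 and prepending it to the N-terminus of Cas3. The sketch below shows only that orientation; it assumes a direct fusion without a linker, and `fuse_nls` is a hypothetical helper.

```python
NLS = "PKKKRKV"  # SEQ ID NO.14

def fuse_nls(protein: str, terminus: str, nls: str = NLS) -> str:
    """Fuse an NLS to the 'N' or 'C' terminus (direct fusion, no linker assumed)."""
    if terminus == "C":
        return protein + nls
    if terminus == "N":
        return nls + protein
    raise ValueError("terminus must be 'N' or 'C'")

print(fuse_nls("LFADLADSFLGV", "C"))  # last residues of Cas8 (SEQ ID NO.4)
print(fuse_nls("MLKQLLAKSL", "N"))    # first residues of Cas3 (SEQ ID NO.12)
```

The first result, ...DSFLGVPKKKRKV, matches the C-terminal residues encoded by the 3' end of SEQ ID NO.3, which carries the NLS codons just before the stop codon.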

More preferably, the coding sequence of the Cmx8 protein is SEQ ID NO.1; the coding sequence of the Cas8 protein is SEQ ID NO.3; the coding sequence of the Cas5 protein is SEQ ID NO.5; the coding sequence of the Cas6 protein is SEQ ID NO.7; the coding sequence of the Cas11 protein is SEQ ID NO.9; the coding sequence of the Cas3 protein is SEQ ID NO.11; the coding sequence of the NLS is SEQ ID NO.13; and the most preferred PAM sequence for the type I-B CRISPR-Cascade-Cas3 gene editing system is 5'-atg-3'.
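
Spacer design around the preferred 5'-atg-3' PAM can be sketched as a simple scan. The sketch assumes, as for type I systems generally, that the PAM lies 5' of the protospacer on the scanned (non-target) strand; the spacer length is left as a parameter because the text does not state it, and the function names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an a/c/g/t sequence."""
    return seq.lower().translate(str.maketrans("acgt", "tgca"))[::-1]

def find_protospacers(seq: str, spacer_len: int, pam: str = "atg") -> list[tuple[int, str]]:
    """Return (PAM index, protospacer) pairs where the PAM is followed by spacer_len nt."""
    s, p = seq.lower(), pam.lower()
    hits = []
    for i in range(len(s) - len(p) - spacer_len + 1):
        if s[i:i + len(p)] == p:
            start = i + len(p)
            hits.append((i, s[start:start + spacer_len]))
    return hits

target = "ccatgaaaaaggg"                        # toy sequence, not from tdTomato
print(find_protospacers(target, 5))             # top strand
print(find_protospacers(revcomp(target), 5))    # bottom strand
```

Scanning both strands matters because a PAM on either strand can define a protospacer; candidates found this way would still need to be checked for uniqueness in the target genome.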

Adopting these preferred features further optimizes the specific details of the system and gives a better gene editing effect.

The invention also proposes:

A method for preparing the type I-B CRISPR-Cascade-Cas3 gene editing system described above, characterized by comprising the following steps:

First, construct the plasmids for the Cascade complex and the plasmid for the Cas3 protein;

Second, co-transform the Cascade complex plasmids into E. coli prokaryotic expression cells, and separately transform the Cas3 protein plasmid into E. coli prokaryotic expression cells; induce expression in each and purify to obtain the purified Cascade complex and Cas3 protein;

thereby obtaining the type I-B CRISPR-Cascade-Cas3 gene editing system.

Preferably, in the first step, in the Cascade complex plasmid the coding sequence of the NLS is joined to the 3' end of the coding sequence of the Cas8 protein; in the Cas3 protein plasmid, the coding sequence of the NLS is joined to the 5' end of the coding sequence of the Cas3 protein; the coding sequence of the NLS is SEQ ID NO.13;

in the second step, the E. coli prokaryotic expression cells are E. coli BL21(DE3); during purification, the expression product is first subjected to affinity chromatography to obtain a crude protein extract, which is then subjected to molecular sieve (size-exclusion) chromatography to obtain the purified target protein.

More preferably, the specific procedure of the first step is as follows:

the coding sequences of the Cmx8 protein, the NLS, the Cas8 protein and the Cas5 protein are constructed into a first plasmid; the coding sequences of the Cas6 protein and the Cas11 protein into a second plasmid; and the crRNA-expressing DNA fragment into a third plasmid; the first, second and third plasmids together constitute the Cascade complex plasmids; the coding sequences of the NLS and the Cas3 protein are constructed into a fourth plasmid, which is the Cas3 protein plasmid;

the vector of the first plasmid is pCDF-Duet-1, the vector of the second plasmid is pRSF-Duet-1, the vector of the third plasmid is pUC19, and the vector of the fourth plasmid is pET-28a.
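
The four-plasmid layout can be written down as data, which also makes it easy to check the antibiotic selection used in Example 2. The resistance markers listed are those of the standard commercial vectors (general knowledge rather than something stated in this text), and the dictionary and helper names are illustrative.

```python
# Selection marker of each backbone (standard vector features, assumed here)
VECTOR_MARKER = {
    "pCDF-Duet-1": "streptomycin",   # aadA: streptomycin/spectinomycin
    "pRSF-Duet-1": "kanamycin",
    "pUC19": "ampicillin",
    "pET-28a": "kanamycin",
}

# plasmid -> (backbone, inserts as listed in the text)
PLASMIDS = {
    "first": ("pCDF-Duet-1", ["Cmx8", "NLS", "Cas8", "Cas5"]),
    "second": ("pRSF-Duet-1", ["Cas6", "Cas11"]),
    "third": ("pUC19", ["crRNA array"]),
    "fourth": ("pET-28a", ["NLS", "Cas3"]),
}

def selection_antibiotics(plasmids: list[str]) -> list[str]:
    """Antibiotics needed to maintain a set of co-transformed plasmids."""
    return sorted({VECTOR_MARKER[PLASMIDS[name][0]] for name in plasmids})

print(selection_antibiotics(["first", "second", "third"]))  # Cascade strain
print(selection_antibiotics(["fourth"]))                     # Cas3 strain
```

The three Cascade plasmids need three different antibiotics, matching the Amp/Kan/Strep medium of Example 2, whereas the Cas3 strain needs kanamycin only.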

Using an E. coli prokaryotic expression system followed by affinity-column and molecular sieve purification, this preparation method produces Cascade complex and Cas3 protein of high purity and activity quickly and in high yield; a large amount of protein can be obtained within two days.

The invention also proposes:

Use of the type I-B CRISPR-Cascade-Cas3 gene editing system described above to recognize, bind and edit prokaryotic or eukaryotic genes.

The invention also proposes:

A cellular gene knockout method, characterized in that it uses the type I-B CRISPR-Cascade-Cas3 gene editing system described above and comprises the following steps:

S1, delivering the Cascade complex and the Cas3 protein into target cells by electroporation to knock out the target gene;

S2, detecting and analyzing the cells to determine the knockout effect on the target gene.

Preferably, in S1, electroporation is performed with a Neon transfection system;

in S2, detection and analysis are performed by flow cytometry, or by long-range PCR and NGS sequencing.
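
For the flow-cytometry readout in S2 with a fluorescent reporter such as tdTomato, knockout efficiency is often estimated as the fraction of cells that lose fluorescence, corrected for the background non-fluorescent fraction in an untreated control. The normalization below is one common convention, assumed here rather than taken from the source, and the counts are invented for illustration.

```python
def knockout_efficiency(neg_treated: int, total_treated: int,
                        neg_control: int, total_control: int) -> float:
    """Background-corrected fraction of reporter-negative cells.

    efficiency = (f_treated - f_control) / (1 - f_control), clipped at 0,
    where f is the fraction of reporter-negative cells in each sample.
    """
    f_treated = neg_treated / total_treated
    f_control = neg_control / total_control
    if f_control >= 1:
        raise ValueError("control sample has no reporter-positive cells")
    return max(0.0, (f_treated - f_control) / (1 - f_control))

# hypothetical counts: 3,000 of 10,000 treated and 200 of 10,000 control cells are negative
print(f"{knockout_efficiency(3000, 10000, 200, 10000):.1%}")  # 28.6%
```

Long-range PCR and NGS, the alternative readout in S2, would instead report deletion sizes directly; this sketch covers only the flow-cytometry case.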

This cellular gene knockout method can knock out a target gene in target cells with high knockout efficiency.

The invention also proposes:

A cell line or cell strain comprising the type I-B CRISPR-Cascade-Cas3 gene editing system described above.

Compared with the prior art, the type I-B CRISPR-Cascade-Cas3 gene editing system of the invention can generate long-fragment deletions of varying sizes from a single CRISPR target site, filling the gap left by the relatively limited ability of current CRISPR-Cas9 systems to generate long-fragment deletions; the preparation method, which uses an E. coli prokaryotic expression system followed by affinity-column and molecular sieve purification, produces Cascade complex and Cas3 protein of high purity and activity quickly and in high yield, providing a large amount of protein within two days; and the cellular gene knockout method can knock out a target gene in target cells with high efficiency.

Drawings

FIG. 1 is a schematic diagram of the immune mechanism of a CRISPR-Cas system in the background of the invention.

FIG. 2 is a Class1 and Class2 classification diagram of CRISPR-Cas systems in the background of the invention.

Fig. 3 is a schematic diagram of a gene knockout experiment in human cells using a CRISPR-Cas system in the background of the invention, wherein the left diagram is an I-E type CRISPR-Cas system and the right diagram is an I-C type CRISPR-Cas system.

FIGS. 4 to 7 show, in sequence, plasmid maps of pCDF-Duet-1-cmx8-NLS-Cas8-Cas5, pUC19-CRISPR array, pRSF-Duet-1-Cas6-Cas11 and pET-28a-Cas3 constructed in example 1 of the present invention.

FIG. 8 is a schematic diagram of the design of crRNA sequence using tdTomato gene as target gene in example 1 of the present invention.

FIG. 9 shows the molecular sieve (size-exclusion) chromatogram and the SDS-PAGE gel of the Cascade complex in Example 2 of the present invention.

FIG. 10 shows the molecular sieve (size-exclusion) chromatogram and the SDS-PAGE gel of the Cas3 protein in Example 2 of the present invention.

FIGS. 11 and 12 show the results of EMSA assays of the Cascade complex with DNA containing a single PAM sequence in Example 3 of the present invention.

FIG. 13 shows the molecular sieve chromatogram and the SDS-PAGE gel of the Cascade-DNA-Cas3 ternary complex reconstituted in vitro and isolated by molecular sieve chromatography in Example 4 of the present invention.

FIG. 14 shows the gene knockout efficiency in the hESC cell line in Example 5 of the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings in combination with embodiments. The invention is not limited to the examples given.

Example 1

This example describes the construction of the plasmids used to prepare the components of the gene editing system of the present invention.

The coding sequences of the Cmx8 protein, Cas8 protein, Cas5 protein, Cas6 protein, Cas11 protein, Cas3 protein and the nuclear localization signal NLS (SEQ ID NO.1, 3, 5, 7, 9, 11 and 13) were amplified by PCR. After restriction digestion and ligation, the recombinant plasmids were transformed into competent E. coli DH5α cells by chemical transformation; plasmids were extracted with a plasmid extraction kit, and the correct recombinant plasmids were confirmed by Sanger sequencing.

The plasmid structures of pCDF-Duet-1-cmx8-NLS-Cas8-Cas5, pUC19-CRISPR array, pRSF-Duet-1-Cas6-Cas11 and pET-28a-Cas3 constructed by the method are shown in figures 4 to 7.

In this example, a DNA fragment expressing the CRISPR array (SEQ ID NO.15), with the tdTomato gene as the target gene, was designed and constructed on the pUC19 vector. The crRNA-expressing DNA fragment has the structure 5'-repeat sequence-spacer sequence-repeat sequence-3', in which the repeat sequence is derived from the native type I-B gene cluster of Synechocystis sp. PCC 6714 and the spacer sequence is derived from the tdTomato gene (as shown in FIG. 8).

The above sequences are as follows:

coding gene sequence of Cmx8 protein: SEQ ID NO.1:

atgggcagcagccatcaccatcatcaccaccaccacagccagtggagccatccgcagtttgaaaaaggtggtggtagcggtggtggttcaggtggtagtgcatggtcacaccctcagtttgagaaactggaagtgctgttccagggtccgggatccatgccgaaaacccaagcggagatcctgaccctggacttcaacctggcggaactgccgagcgcgcaacaccgtgcgggtctggcgggtctgatcctgatgattcgtgagctgaagaaatggccgtggtttaagatccgtcaaaaggagaaagacgtgctgctgagcattgaaaacctggatcagtacggtgcgagcatccaactgaacctggaaggcctgattgcgctgttcgatctggcgtatctgagctttaccgaggagcgtaagagcaaaagcaagatcaaagacttcaaacgtgttgatgagatcgaaattgaggaaaacggcaagaacaagatccagaagtactacttctacgacgtgattaccccgcaaggtggctttctggcgggttgggacaaaagcgatggccagatctggctgcgtatttggcgtgatatgttctggagcatcattaagggcgttccggcgacccgtaacccgtttaacaaccgttgcggtctgaacctgaacgcgggcgacagcttcagcaaggatgttgagagcgtgtggaaaagcctgcagaacgcggaaaagaccaccggtcaaagcggcgcgttttacctgggtgcgatggcggttaacgcggaaaacgtgagcaccgacgatctgatcaaatggcagttcctgctgcacttctgggcgtttgttgcgcaagtgtactgcccgtatattctggacaaggatggtaaacgtaactttaacggctatgtgatcgttattccggacatcgcgaacctggaggacttctgcgatattctgccggatgtgctgagcaaccgtaacagcaaagcgttcggttttcgtccgcaggaaagcgttatcgacgtgccggagcaaggcgcgctggaactgctgaacctgatcaagcagcgtattgcgaagaaagcgggtagcggcctgctgagcgatctgatcgtgggtgttgaggtgatccacgcggaaaagcagggcaacagcatcaaactgcacagcgttagctacctgcaaccgaacgaggaaagcgtggacgattataacgcgattaagaacagctactattgcccgtggttccgtcgtcagctgctgctgaacctggttaacccgaaatttgacctggcgagccaaagctggctgaagcgtcacccgtggtacggttttggcgatctgctgagccgtatcccgcagcgttggctgaaagagaacaacagctatttcagccacgacgcgcgtcagctgttcacccaaaagggtgactttgatatgaccgtggcgaccaccaaaacccgtgagtacgcggaaatcgtttataagattgcgcagggtttcgtgctgagcaagctgagcagcaaacacgacctgcaatggagcaagtgcaaaggcaacccgaaactggagcgtgaatacaacgataagaaagagaaggtggttaacgaagcgtttctggcgatccgtagccgtaccgaaaaacaggcgttcattgactactttgttagcaccctgtatccgcacgttcgtcaagacgagttcgtggattttgcgcagaaactgttccaagacaccgatgaaatccgtagcctgaccctgctggcgctgagcagccagtatccgattaagcgtcaaggcgagaccgaataa

amino acid sequence of Cmx8 protein: SEQ ID NO.2:

MPKTQAEILTLDFNLAELPSAQHRAGLAGLILMIRELKKWPWFKIRQKEKDVLLSIENLDQYGASIQLNLEGLIALFDLAYLSFTEERKSKSKIKDFKRVDEIEIEENGKNKIQKYYFYDVITPQGGFLAGWDKSDGQIWLRIWRDMFWSIIKGVPATRNPFNNRCGLNLNAGDSFSKDVESVWKSLQNAEKTTGQSGAFYLGAMAVNAENVSTDDLIKWQFLLHFWAFVAQVYCPYILDKDGKRNFNGYVIVIPDIANLEDFCDILPDVLSNRNSKAFGFRPQESVIDVPEQGALELLNLIKQRIAKKAGSGLLSDLIVGVEVIHAEKQGNSIKLHSVSYLQPNEESVDDYNAIKNSYYCPWFRRQLLLNLVNPKFDLASQSWLKRHPWYGFGDLLSRIPQRWLKENNSYFSHDARQLFTQKGDFDMTVATTKTREYAEIVYKIAQGFVLSKLSSKHDLQWSKCKGNPKLEREYNDKKEKVVNEAFLAIRSRTEKQAFIDYFVSTLYPHVRQDEFVDFAQKLFQDTDEIRSLTLLALSSQYPIKRQGETE

coding gene sequence of Cas8 protein: SEQ ID NO.3:

atgagcaacctgaacctgttcgcgaccatcctgacctatccggcgccggcgagcaactatcgtggcgagagcgaggaaaaccgtagcgtgatccagaagattctgaaagacggtcaaaaatacgcgatcattagcccggaaagcatgcgtaacgcgctgcgtgagatgctgattgaactgggccagccgaacaaccgtacccgtctgcacagcgaggaccaactggcggtggagttcaaagaatacccgaacccggataagtttgcggacgatttcctgtttggttatatggttgcgcagaccaacgacgcgaaagaaatgaagaaactgaaccgtccggcgaagcgtgatagcatcttccgttgcaacatggcggtggcggttaacccgtacaaatatgacaccgtgttttaccaaagcccgctgaacgcgggtgatagcgcgtggaagaacagcaccagcagcgcgctgctgcaccgtgaggttacccacaccgcgttccagtatccgttcgcgctggcgggcaaggactgcgcggcgaaaccggagtgggtgaaggcgctgctgcaagcgattgcggaactgaacggtgttgcgggtggccatgcgcgtgcgtactatgaatttgcgccgcgtagcgtggttgcgcgtctgaccccgaaactggtggcgggttaccagacctatggctttgatgcggagggtaactggctggaactgagccgtctgaccgcgaccgacagcgataacctggacctgccggcgaacgagttttggctgggtggcgaactggttcgtaaaatggatcaggagcaaaaggcgcaactggaagcgatgggtgcgcacctgtatgcgaacccggagaagttgtttgccgacttagcagatagttttctgggggtaccgaagaagaagcgtaaggtgtaa

amino acid sequence of Cas8 protein: SEQ ID NO.4:

MSNLNLFATILTYPAPASNYRGESEENRSVIQKILKDGQKYAIISPESMRNALREMLIELGQPNNRTRLHSEDQLAVEFKEYPNPDKFADDFLFGYMVAQTNDAKEMKKLNRPAKRDSIFRCNMAVAVNPYKYDTVFYQSPLNAGDSAWKNSTSSALLHREVTHTAFQYPFALAGKDCAAKPEWVKALLQAIAELNGVAGGHARAYYEFAPRSVVARLTPKLVAGYQTYGFDAEGNWLELSRLTATDSDNLDLPANEFWLGGELVRKMDQEQKAQLEAMGAHLYANPEKLFADLADSFLGV

coding gene sequence of Cas5 protein: SEQ ID NO.5:

atggcgcagctggcgctggcgctggacaccgtgacccgttacctgcgtctgaaggcgccgttcgcggcgtttcgtccgttccaaagcggtagctttcgtagcaccaccccggtgccgagcttcagcgcggtttatggtctgctgctgaacctggcgggcatcgagcagcgtcaagaggtggagggtaaagttaccctgattaagccgaaagcggaactgccgaagctggcgatcgcgattggccaggtgaaaccgagcagcaccagcctgatcaaccagcaactgcacaactacccggttggtaacagcggcaaggagtttgcgagccgtaccttcggtagcaaatattggattgcgccggtgcgtcgtgaagtgctggttaacctggacctgatcattggcctgcaaagcccggtggagttttggcagaagctggatcaaggtctgaaaggcgaaaccgttatcaaccgttacggtctgccgttcgcgggcgacaacaacttcctgtttgatgagatctacccgattgaaaagccggacctggcgagctggtattgcccgctggagccggatacccgtccgaaccagggtgcgtgccgtctgaccctgtggatcgaccgtgagaacaacacccaaaccaccattaaggtttttagcccgagcgatttccgtctggaaccgccggcgaaagcgtggcagcaactgccgggctaa

amino acid sequence of Cas5 protein: SEQ ID NO.6:

MAQLALALDTVTRYLRLKAPFAAFRPFQSGSFRSTTPVPSFSAVYGLLLNLAGIEQRQEVEGKVTLIKPKAELPKLAIAIGQVKPSSTSLINQQLHNYPVGNSGKEFASRTFGSKYWIAPVRREVLVNLDLIIGLQSPVEFWQKLDQGLKGETVINRYGLPFAGDNNFLFDEIYPIEKPDLASWYCPLEPDTRPNQGACRLTLWIDRENNTQTTIKVFSPSDFRLEPPAKAWQQLPG

coding gene sequence of Cas6 protein: SEQ ID NO.7:

atgaacttcatcgacctggcgtttccggtgaagggcaccgttctgaacgcggatcacaactactatctgtacagcgcgattgcgaaagagtttccgatcctgcacgacctgccggatctggcggtgaacaccatcagcggcaagccggaccgtgaaggcaaaattctgctggttccgggcagcaagctgtggatgcgtctgccgatcgataacattacccacatctaccagctggcgggtaagaaactgcgtattggccaatatagcatcgaactgggtaacccgagcctgcacccgctggagccggttgaaagcctgaaggcgcgtatcattaccattaaaggtcacaccgagccgatcagcttcctggaagcggtgaagcgtcagctgtttgcgctggagattaccgaaggtgacgttggcatcccggcgaaccacgagggtattccgaaacgtctgaccctgcaaatcaagaaaccggaacgtacctacagcattgtgggctatagcgttctgctgagcaacctgagcgcggaggatagcctgaagattcagcaagtgggtatcggtggcaaacgtcgtctgggttgcggcgtgttctatccggcggttaagaaaagcaccaacagcggtaacaagaaaaacgttgaagcgaccctgggctaa

amino acid sequence of Cas6 protein: SEQ ID NO.8:

MNFIDLAFPVKGTVLNADHNYYLYSAIAKEFPILHDLPDLAVNTISGKPDREGKILLVPGSKLWMRLPIDNITHIYQLAGKKLRIGQYSIELGNPSLHPLEPVESLKARIITIKGHTEPISFLEAVKRQLFALEITEGDVGIPANHEGIPKRLTLQIKKPERTYSIVGYSVLLSNLSAEDSLKIQQVGIGGKRRLGCGVFYPAVKKSTNSGNKKNVEATLG

coding gene sequence of Cas11 protein: SEQ ID NO.9:

atgaccgtggcgaccaccaaaacccgtgagtacgcggaaatcgtttataagattgcgcagggtttcgtgctgagcaagctgagcagcaaacacgacctgcaatggagcaagtgcaaaggcaacccgaaactggagcgtgaatacaacgataagaaagagaaggtggttaacgaagcgtttctggcgatccgtagccgtaccgaaaaacaggcgttcattgactactttgttagcaccctgtatccgcacgttcgtcaagacgagttcgtggattttgcgcagaaactgttccaagacaccgatgaaatccgtagcctgaccctgctggcgctgagcagccagtatccgattaagcgtcaaggcgagaccgaataa

amino acid sequence of Cas11 protein: SEQ ID NO.10:

MTVATTKTREYAEIVYKIAQGFVLSKLSSKHDLQWSKCKGNPKLEREYNDKKEKVVNEAFLAIRSRTEKQAFIDYFVSTLYPHVRQDEFVDFAQKLFQDTDEIRSLTLLALSSQYPIKRQGETE

coding gene sequence of Cas3 protein: SEQ ID NO.11:

atgctgaaacaactgctggcgaagagcctgccgaccgacccgcagaagaaaccgctgagcctggaacaacacctgctggataccgagaccgcggcgctggtgatctttaagggtcgtatgctggacaactggtgccgtttctttaaggttaaagacccggatgaattcctgctgcacctgcgtgtggcggcgctgtttcacgatctgggcaaagcgaaccacgagttcattgaagcggttaccgcgaaaggttttgtgccgcagaccctgcgtcacgaatggatcagcgcgctggttctgcacctgccggaagtgcgtcaatggctgggcaaaagcaacctgaacctggaagtggttaccgcggcggttctgagccatcacctgaaagcgagcccggatggtgattacaagtgggacgaaccgcagaagagcggtgataaagttgagaccaagctgtatttcaaccacgaggaagtggaccgtatcctgaacaaaattgcgaacctgctggacgtggatagcaagctgccggaactgccgaagaaatggatcaaaggcgacattttcctggagaacatctacaaagatgcgaaccagattggtcgtaagtttacccgtcaagcgaagaaagacgatagcctgaaaggcctgctgctggcggttaaagcgggtctgattgcgagcgacagcgtggcgagcggtatttaccgtacccaggatagcgaagcgatcgcgaactgggttaaccaaaccctgcacaccaacagcattaccccggaggaaatcgaggaaaagattctgcacccgcgttatcgtcaggtggagaaaagcatcaacgaaccgttccagctgaaacgttttcaagagaaggcggaaaccctgagcagccgtctgctgctgatgagcggttgcggcagcggtaaaaccattttcgcgtacaagtggatgcagggcgttctgaacaagcaccaagcgggtcgtgcgatcttcctgtatccgacccgtggcaccgcgaccgaaggttttaaagactatgtgagctggtgcccggaggcggatgcgagcctgctgaccggtaccgcgacctacgagctgcaggcgattgcgaaaaacccgaccgaggcgaacgaaggcaaggactatcaagcggatgaacgtctgtacgcgctgggctattggggcaagcgtttctttagcgcgaccgttgaccagttcctgagctttctgacccacaactacaaaagcatctgcctgctgccggtgctggcggacagcgtggttgtgatcgatgaaattcacagcttcagcccggagatgtttgacagcctggtttgcttcctgaagacctttgatgttccggtgctgtgcatgaccgcgaccctgccgcagacccgtattgaggacctgaccattcaactggacaaggataaagacggcctgggtctggaagttttcccgaccagcgatcgtagcgagctggcggagctggaaaaagcggagggcatggaacgttacctgattgcgcacaccaacgaggaagcggcgctggacctggcggtgaaagcgtatcaggatagcaagcgtgttctgtgggttgtgaacaccgtggaccgttgccgtgagaaggcgcgtaaactggaatgcctgctgaagaccgaggttctgacctaccacagccgtttcaaactggcggatcgtcaaaaccgtcaccgtgagaccgtggaagcgtttgcgctgcaccaggcgcaaggtgaaaagaaagcggcgatcgcggttaccacccaggtgtgcgagatgagcctggatctggacgcggatgttctgatcaccgaactggcgccgattagcagcctggtgcaacgtttcggccgtagcaaccgtggtgacaagaacgataaaaccgagccgagcaaaatttacgtttataagccgccgaaggacaaaccgtataagcagaaagacgatctggacccggcggaaaagttcatcaacgatgtgctgggtcgtgcgagccaaaaactgctggcggagaagctgaaagagcatagcccgccgggccgttacagcgatggtagcgcgccgtttgtgacccagggctattgggcgagcagcgatgagccgttccgtaagattgacgattttgcggttaacgcggtgctgaccgaggacctgggtgaaatcacccaatacctgaacagcaacccgccgaaaccgatcgatggctttattgttccggtgccgaagaaatataagttccagggttttagccaccgtccgccgcaactgccgaaatacctggaaatcgcggacagcaaattctatagcagcaagcgtggctttggtgacgatgcg

amino acid sequence of Cas3 protein: SEQ ID NO.12:

MLKQLLAKSLPTDPQKKPLSLEQHLLDTETAALVIFKGRMLDNWCRFFKVKDPDEFLLHLRVAALFHDLGKANHEFIEAVTAKGFVPQTLRHEWISALVLHLPEVRQWLGKSNLNLEVVTAAVLSHHLKASPDGDYKWDEPQKSGDKVETKLYFNHEEVDRILNKIANLLDVDSKLPELPKKWIKGDIFLENIYKDANQIGRKFTRQAKKDDSLKGLLLAVKAGLIASDSVASGIYRTQDSEAIANWVNQTLHTNSITPEEIEEKILHPRYRQVEKSINEPFQLKRFQEKAETLSSRLLLMSGCGSGKTIFAYKWMQGVLNKHQAGRAIFLYPTRGTATEGFKDYVSWCPEADASLLTGTATYELQAIAKNPTEANEGKDYQADERLYALGYWGKRFFSATVDQFLSFLTHNYKSICLLPVLADSVVVIDEIHSFSPEMFDSLVCFLKTFDVPVLCMTATLPQTRIEDLTIQLDKDKDGLGLEVFPTSDRSELAELEKAEGMERYLIAHTNEEAALDLAVKAYQDSKRVLWVVNTVDRCREKARKLECLLKTEVLTYHSRFKLADRQNRHRETVEAFALHQAQGEKKAAIAVTTQVCEMSLDLDADVLITELAPISSLVQRFGRSNRGDKNDKTEPSKIYVYKPPKDKPYKQKDDLDPAEKFINDVLGRASQKLLAEKLKEHSPPGRYSDGSAPFVTQGYWASSDEPFRKIDDFAVNAVLTEDLGEITQYLNSNPPKPIDGFIVPVPKKYKFQGFSHRPPQLPKYLEIADSKFYSSKRGFGDDA

coding gene sequence of nuclear localization signal NLS: SEQ ID NO.13:

ccgaagaagaagcgtaaggtg

amino acid sequence of nuclear localization signal NLS: SEQ ID NO.14:

PKKKRKV

DNA fragment sequence expressing CRISPR array: SEQ ID NO.15:

Note: in SEQ ID NO.15 above, the spacer sequence is boxed and the remainder is repeat sequence.
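The boxed array in SEQ ID NO.15 is not reproduced here. The following Python sketch therefore only illustrates the layout described in claims 1 and 2: identical repeats alternating with spacers, beginning and ending with a repeat. The repeat is the one given in claim 1. The helper name `build_crispr_array` is hypothetical. The spacer in the usage line is the Example 3 protospacer, used purely as a placeholder and not as the actual spacer of SEQ ID NO.15.

```python
# Repeat sequence (5'->3') as given in claim 1 of this document.
REPEAT = "gtgtccaaaccattgatgccgtaaggcgttgagcac"


def build_crispr_array(spacers: list[str], repeat: str = REPEAT) -> str:
    """Hypothetical helper: assemble 5'-repeat-spacer-repeat(-spacer-repeat...)-3'.

    The array always begins and ends with a repeat, as the claims describe.
    """
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat]
    for spacer in spacers:
        s = spacer.lower()
        # Accept only unambiguous DNA bases in each spacer.
        if not s or set(s) - set("acgt"):
            raise ValueError(f"invalid spacer: {spacer!r}")
        parts.extend([s, repeat])
    return "".join(parts)


# Placeholder spacer (the Example 3 protospacer), not the real SEQ ID NO.15 spacer.
array = build_crispr_array(["tttatcaccgtgtccccaatctggatattttgtgt"])
print(len(array))  # 107 (36-nt repeat + 35-nt spacer + 36-nt repeat)
```

Adding further spacers to the list yields a multi-spacer array, with spacers that may be identical to or different from one another, as claim 1 allows.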

Example 2

This example builds on Example 1 and describes the preparation and purification of each protein.

(1) Cascade purification

The three plasmids constructed in Example 1, pCDF-Duet-1-cmx8-NLS-cas8-cas5, pRSF-Duet-1-cas6-cas11 and pUC19-CRISPR array, were co-transformed into E. coli BL21 (DE3). A single colony was inoculated into 1 L of LB medium containing 50 μg/ml each of ampicillin (Amp), kanamycin (Kan) and streptomycin (Strep). The culture was shaken at 37 °C and 180 rpm for about 3 h. When the OD of the culture reached 0.6-0.8, the temperature was lowered to 18 °C and expression was induced with 0.5 mM IPTG for 20 h.
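The media additions in this step follow the usual C1·V1 = C2·V2 dilution rule. The Python sketch below computes how much stock to add. The stock concentrations (50 mg/ml antibiotics, 1 M IPTG) are assumptions for illustration, because the text gives only final concentrations.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock to add, from C_stock * V_stock = C_final * V_final.

    final_conc and stock_conc must be in the same unit (e.g. both µg/ml or both mM).
    """
    if stock_conc <= 0 or not 0 <= final_conc <= stock_conc:
        raise ValueError("need 0 <= final_conc <= stock_conc and stock_conc > 0")
    return final_conc * final_volume_ml / stock_conc


# Assumed stocks: 50 mg/ml (= 50,000 µg/ml) antibiotic, 1 M (= 1000 mM) IPTG.
per_antibiotic_ml = stock_volume_ml(50, 50_000, 1000)  # 50 µg/ml in 1 L
iptg_ml = stock_volume_ml(0.5, 1000, 1000)             # 0.5 mM in 1 L
print(per_antibiotic_ml, iptg_ml)  # 1.0 0.5
```

Under these assumed stocks, each of Amp, Kan and Strep would need its own 1 ml addition. The sketch ignores the small volume change caused by the additions themselves.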

The cells were resuspended in 20 mM Tris-HCl pH 7.5, 500 mM NaCl and lysed by sonication. The Strep-tagged protein was bound to a Strep affinity column, and contaminating proteins were washed off with 20 mM Tris-HCl pH 7.5, 500 mM NaCl. The crude protein was then eluted with 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 5 mM d-desthiobiotin. Homogeneous protein was obtained by size-exclusion (molecular sieve) chromatography on a Superdex 200/300 column, eluting with 20 mM HEPES pH 7.5, 150 mM NaCl.

The purified Cascade complex is thus obtained. The corresponding size-exclusion chromatogram and electrophoresis gel are shown in FIG. 9.

(2) Cas3 purification

The pET-28a-Cas3 plasmid was transformed on its own into E. coli BL21 (DE3), and expression was induced at 18 °C for 20 h in 1 L of LB containing only kanamycin (50 μg/ml). The cells were resuspended in 20 mM HEPES pH 7.5, 500 mM NaCl, 20 mM imidazole, 5% glycerol and lysed by sonication. His-tagged Cas3 was bound to Ni-NTA resin. The crude protein was then eluted stepwise with the same buffer (20 mM HEPES pH 7.5, 500 mM NaCl, 5% glycerol) at final imidazole concentrations of 50, 100, 200 and 500 mM. Homogeneous protein was obtained by size-exclusion chromatography on a Superdex 200/300 column, eluting with 20 mM HEPES pH 7.5, 150 mM NaCl.
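One common way to prepare the 50, 100, 200 and 500 mM imidazole steps is to mix the 20 mM imidazole buffer with the same buffer made up at 500 mM imidazole. The sketch below computes the mixing volumes by linear interpolation. The 500 mM stock buffer and the 10 ml step volume are assumptions, not part of the described protocol.

```python
LOW_MM = 20.0    # imidazole in the lysis/binding buffer (from the text)
HIGH_MM = 500.0  # assumed: identical buffer made up with 500 mM imidazole


def mix_for_imidazole(target_mm: float, total_ml: float,
                      low_mm: float = LOW_MM, high_mm: float = HIGH_MM) -> tuple[float, float]:
    """Return (ml of low buffer, ml of high buffer) giving target_mm imidazole."""
    if not low_mm <= target_mm <= high_mm:
        raise ValueError("target outside the range the two buffers can reach")
    # Imidazole concentration is linear in the volume fraction of high buffer.
    high_fraction = (target_mm - low_mm) / (high_mm - low_mm)
    high_ml = high_fraction * total_ml
    return total_ml - high_ml, high_ml


for step in (50, 100, 200, 500):
    low_ml, high_ml = mix_for_imidazole(step, 10.0)
    print(f"{step} mM: {low_ml:.3f} ml low + {high_ml:.3f} ml high")
```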

The purified Cas3 protein is thus obtained. The corresponding size-exclusion chromatogram and electrophoresis gel are shown in FIG. 10.

Example 3

This example builds on Example 2 and screens for the PAM sequence of the type I-B system.

In this example, the optimal PAM sequence of the type I-B system of the invention, derived from Synechocystis sp. PCC 6714, was identified by screening a constructed PAM library.

Two primers, Mix PAM-F and Mix PAM-R, were designed to contain a protospacer sequence and a random PAM. The protospacer was designed from the crRNA sequence as "tttatcaccgtgtccccaatctggatattttgtgt". Three random bases ("nnn") were placed at its 5' end to form the PAM library. The PAM library was cloned into the pET-28a vector by restriction digestion and ligation to construct the PAM library plasmid.
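The three randomized positions give 4³ = 64 possible PAMs. The Python sketch below enumerates them and prepends each one to the protospacer from the text. This models the 5'-PAM-protospacer-3' targets that the degenerate primers can generate. The function name `pam_library` is hypothetical, and the sketch does not model the primer or vector sequences.

```python
from itertools import product

# Protospacer given in the text; the random PAM sits at its 5' end.
PROTOSPACER = "tttatcaccgtgtccccaatctggatattttgtgt"


def pam_library(protospacer: str = PROTOSPACER, pam_len: int = 3) -> dict[str, str]:
    """Map every possible PAM of length pam_len to its 5'-PAM-protospacer-3' target."""
    return {
        "".join(bases): "".join(bases) + protospacer
        for bases in product("acgt", repeat=pam_len)
    }


library = pam_library()
print(len(library))         # 64
print(library["atg"][:10])  # atgtttatca
```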

Note: the crRNA sequence is guguccaaaccauugaugccguaaggcguugagcac.

Using the PAM library plasmid as template, a 161 bp double-stranded DNA fragment of the PAM library was amplified by PCR with an upstream primer carrying the T7 promoter sequence. This PCR product served as template for a second PCR with a T7 promoter primer carrying a 6-FAM fluorescent label at its 5' end, so that the product was FAM-labeled. The two PCR reactions yielded a 97 bp double-stranded DNA bearing a Cy5 fluorescent label at the 3' end. Each product contained a single PAM sequence, and the products were designated synPAM 1-30.

The primer sequences used to construct the PAM library are shown in the table below, with PAM sequences boxed.


The Cascade complex purified in Example 2 was incubated with the PAM library DNA at 25 °C for 1 h, and the reaction products were separated by native (non-denaturing) electrophoresis. Formation of the Cascade-DNA complex was visible as a slower-migrating fluorescent band in the EMSA gel. The Cascade-bound DNA bands were excised under fluorescence and sequenced using the universal T7 promoter and T7 terminator primers. The sequencing results were then aligned with the pET-28a-PAM library plasmid sequence.
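The alignment step can be sketched as reading off the three bases immediately 5' of the protospacer in each sequencing read. Both orientations are checked, because reads primed from the T7 terminator come off the opposite strand. This Python sketch uses exact string matching for simplicity, which is an assumption. Real reads with sequencing errors would need a tolerant aligner. All function names are hypothetical.

```python
from collections import Counter
from typing import Optional

PROTOSPACER = "tttatcaccgtgtccccaatctggatattttgtgt"
_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_pam(read: str, protospacer: str = PROTOSPACER, pam_len: int = 3) -> Optional[str]:
    """Return the pam_len bases 5' of the protospacer, trying both read orientations."""
    read = read.lower()
    for strand in (read, revcomp(read)):
        i = strand.find(protospacer)
        if i >= pam_len:
            return strand[i - pam_len:i]
    return None


def count_pams(reads: list[str]) -> Counter:
    """Tally recovered PAMs; reads without an exact protospacer match are skipped."""
    return Counter(p for p in map(extract_pam, reads) if p is not None)


reads = ["gg" + "atg" + PROTOSPACER + "cc", revcomp("tt" + "acg" + PROTOSPACER)]
print(count_pams(reads))  # Counter({'atg': 1, 'acg': 1})
```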

Note: the T7 promoter primer used above is 6-FAM-T7 promoter: taatacgactcactatagg (fluorescent label at the 5' end); the T7 terminator primer is Cy5-T7 terminator: gctagttattgctcagcgg (fluorescent label at the 3' end).

PAMs of the form ann were screened first. The dissociation constant equals the free protein concentration at which half of the DNA substrate is bound. The protein concentration needed for half-binding approximates the dissociation constant only when the DNA concentration is well below it, so that binding does not appreciably deplete free protein. The DNA concentration was therefore lowered to 10 nM, the Cascade complex was titrated from 0 nM to 200 nM, and binding of the DNA by the Cascade complex was observed.
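The choice of 10 nM DNA can be explained with a simple 1:1 binding model that accounts for ligand depletion. Under that model, the fraction of DNA bound follows the standard quadratic solution, and half-saturation occurs at a total Cascade concentration of Kd + [DNA]/2. With 10 nM DNA the midpoint therefore overestimates Kd by only 5 nM, whereas a high DNA concentration would shift it substantially. The Python sketch below assumes this 1:1 model and uses an illustrative Kd of 50 nM. The text does not report a Kd.

```python
import math


def fraction_bound(protein_nm: float, dna_nm: float, kd_nm: float) -> float:
    """Fraction of DNA in complex for 1:1 binding, given total protein and DNA (nM).

    Solves [PD]^2 - (P + D + Kd)[PD] + P*D = 0 and takes the physical (smaller) root.
    """
    if dna_nm <= 0 or protein_nm < 0 or kd_nm <= 0:
        raise ValueError("DNA and Kd must be positive, protein non-negative")
    b = protein_nm + dna_nm + kd_nm
    complex_nm = (b - math.sqrt(b * b - 4 * protein_nm * dna_nm)) / 2
    return complex_nm / dna_nm


KD = 50.0  # illustrative value only
for dna in (10.0, 1000.0):
    half_point = KD + dna / 2  # total protein predicted to give 50% bound
    print(dna, half_point, round(fraction_bound(half_point, dna, KD), 3))
```

At 10 nM DNA the predicted midpoint (55 nM) is close to the assumed Kd. At 1000 nM DNA it moves to 550 nM, which would grossly misstate the affinity.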

As shown in FIG. 11, for PAM sequences No. 1-8 (aan and agn), much of the DNA remains free even at 100 nM and 200 nM Cascade complex. By contrast, the DNA carrying PAM sequences No. 9-16 (acn and atn) is fully bound at high protein concentrations. These results indicate that the Cascade complex prefers cytosine (c) or thymine (t) at the second position of the PAM.

The preference at the third position of the PAM is even more pronounced than at the second. The aan sequences bound poorly. Within agn, agg bound better; within acn, aca and acg bound comparably; and within atn, atg bound significantly better than the other three. Overall, therefore, the third position of the PAM is biased toward g.

Next, the second and third positions of the PAM were fixed and the first position was varied to identify the pattern. As shown in FIG. 12, acg bound best within the ncg group, and atg likewise bound most efficiently within the ntg group at 50 nM. This indicates that the first position of the PAM indeed prefers a. Comparing the two groups, ntg consistently showed a higher binding rate than ncg, from which it follows that the second position of the PAM is biased toward t.

Note: FIG. 11 and FIG. 12 show representative partial results. The results for the remaining sequences are not shown but are consistent with the above conclusions.

Taken together, these results show that the optimal PAM sequence for the type I-B system of the invention is 5'-atg-3'.
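With 5'-atg-3' established as the preferred PAM, candidate target sites can be located by scanning both strands of a target gene for atg followed by a protospacer. This Python sketch follows the Example 3 library design in assuming that the PAM lies immediately 5' of the protospacer on the strand that matches the spacer. It also uses the 35-nt protospacer length from that example. Both are assumptions for illustration, and `find_target_sites` is a hypothetical helper.

```python
_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(seq: str, pam: str = "atg",
                      spacer_len: int = 35) -> list[tuple[str, int, str]]:
    """Return (strand, PAM start, protospacer) for each 5'-PAM-protospacer-3' site.

    Positions on the '-' strand are indexed along its reverse complement.
    """
    seq = seq.lower()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Only report sites with a full-length protospacer after the PAM.
        for i in range(len(s) - len(pam) - spacer_len + 1):
            if s[i:i + len(pam)] == pam:
                start = i + len(pam)
                hits.append((strand, i, s[start:start + spacer_len]))
    return hits


PROTOSPACER = "tttatcaccgtgtccccaatctggatattttgtgt"
example = "cc" + "atg" + PROTOSPACER + "gg"
print(find_target_sites(example))  # [('+', 2, 'tttatcaccgtgtccccaatctggatattttgtgt')]
```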

Example 4

This example builds on Example 3 and describes the in vitro assembly of the Cascade-DNA-Cas3 ternary complex.

The Cascade complex purified in Example 2 was incubated with the PAM-DNA determined in Example 3 at a molar ratio of 1:3 at 25 °C for 1 h. The mixture was centrifuged at 15000 rpm for 10 min, and the Cascade-DNA complex was isolated by size-exclusion chromatography. The Cascade-DNA complex was then incubated with the Cas3 protein from Example 2 at a molar ratio of 1:3 at 25 °C for 1 h and centrifuged at 15000 rpm for 10 min. The Cascade-DNA-Cas3 complex was isolated by size-exclusion chromatography and verified by SDS-PAGE (results shown in FIG. 13).
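The 1:3 molar ratios translate into mixing volumes through simple unit arithmetic (1 µM × 1 µl = 1 pmol). In this Python sketch the concentrations and volumes are hypothetical, because the text gives only the ratio. The helper name is also illustrative.

```python
def partner_volume_ul(cascade_um: float, cascade_ul: float,
                      partner_stock_um: float, ratio: float = 3.0) -> float:
    """µl of partner stock giving a Cascade:partner molar ratio of 1:ratio."""
    if partner_stock_um <= 0:
        raise ValueError("partner stock concentration must be positive")
    cascade_pmol = cascade_um * cascade_ul          # µM × µl = pmol
    return ratio * cascade_pmol / partner_stock_um  # pmol ÷ µM = µl


# Hypothetical: 100 µl of 2 µM Cascade mixed with a 10 µM PAM-DNA stock.
print(partner_volume_ul(2.0, 100.0, 10.0))  # 60.0
```

The same helper applies to the Cas3 step, using the Cascade-DNA complex concentration in place of Cascade.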

Formation of the Cascade-DNA-Cas3 ternary complex shows that the Cascade complex prepared by this method can bind Cas3 protein.

Example 5

This example builds on Example 4 and tests the gene knockout efficiency of the type I-B CRISPR-Cas system of the invention in an hESC line.

The hESC-EGFP-tdTomato dual-reporter cell line was constructed as follows:

An hESC-EGFP reporter cell line was constructed first. Wild-type hESCs were dissociated with TrypLE Express (Gibco), resuspended in Opti-MEM, and adjusted to a density of 5 × 10⁶ cells/mL. … μl of the cell suspension was mixed with 30 μg of linearized DNMT3B-EGFP plasmid and electroporated in a 0.4 cm cuvette. The cells were then transferred to a 10 cm dish pre-filled with E8 medium containing 10 μM Y-27632. After 3 days of culture, 0.5 μg/ml puromycin was added for selection. During selection the medium was changed daily, and the cells were passaged normally when confluent. After 7 days, drug-resistant EGFP-expressing clones were identified by fluorescence microscopy and expanded to obtain the hESC-EGFP cell line.

The hESC-EGFP-tdTomato reporter cell line was then constructed from the hESC-EGFP line by the same method, using linearized DNMT3B-tdTomato plasmid. EGFP+/tdTomato+ double-positive cells were selected and expanded clonally to obtain the hESC-EGFP-tdTomato dual-reporter cell line.

The hESC-EGFP-tdTomato dual-reporter cell line was used. A crRNA targeting tdTomato was designed with the sequence SEQ ID NO.15 (as in Example 1), and a crRNA targeting EGFP was designed with the following sequence:

in which the boxed portion is the spacer sequence and the remainder is repeat sequence. Plasmids were then constructed as in Example 1, and the Cascade complexes and Cas3 protein were prepared as in Example 2. These were electroporated into the hESC-EGFP-tdTomato dual-reporter cell line using the Neon transfection system (ThermoFisher), and editing efficiency was finally determined by FACS.

About 4-5 days after electroporation, cells were dissociated with TrypLE Express (Gibco), and the HAP1 cells were resuspended in IMDM supplemented with 10% FBS. The hESC-EGFP-tdTomato dual-reporter cells were analyzed by flow cytometry on an LSRFortessa (BD) using the 488 nm laser. FACS data were analyzed with FlowJo v10.4.1.

Using the method above, this example examines the gene editing efficiency of the type I-B system derived from Synechocystis sp. PCC 6714, i.e., the gene editing system of the invention. Editing efficiency was expressed as the proportion of tdTomato-negative or EGFP-negative cells.
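The readout in this example is a simple proportion of reporter-negative cells. The Python sketch below computes it from flow counts. The optional subtraction of the reporter-negative fraction seen in an unedited control is an assumption added for illustration and is not described in the text. All counts are hypothetical.

```python
def editing_efficiency(negative: int, total: int,
                       control_negative_fraction: float = 0.0) -> float:
    """Percent reporter-negative cells, optionally minus an unedited-control background."""
    if total <= 0 or not 0 <= negative <= total:
        raise ValueError("need 0 <= negative <= total and total > 0")
    if not 0.0 <= control_negative_fraction <= 1.0:
        raise ValueError("control fraction must lie in [0, 1]")
    fraction = negative / total - control_negative_fraction
    # Clamp at zero so background subtraction cannot give a negative efficiency.
    return max(fraction, 0.0) * 100


# Hypothetical gated counts, not data from this document.
print(editing_efficiency(2500, 10000))        # 25.0
print(editing_efficiency(2500, 10000, 0.05))  # about 20.0
```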

Note: long-range PCR and NGS sequencing can also be used for detection and analysis.

The results, shown in FIG. 14, indicate that the type I-B gene editing system of the invention is more efficient than the gene editing systems reported in the literature (e.g., the type I-C system from N. lactamica ATCC 23970). It reaches up to 39% targeting efficiency for tdTomato and 55% for EGFP, which confirms the advantage of the type I-B gene editing system of the invention. Note: the literature referred to here is Tan R, Krueger RK, Gramelspacher MJ, Zhou X, Xiao Y, Ke A, Hou Z, Zhang Y. Cas11 enables genome engineering in human cells with compact CRISPR-Cas3 systems. Mol Cell. 2022 Jan 13:S1097-2765(21)01137-0.

Taken together, the above examples show that the preparation method of the invention produces Cascade complex and Cas protein of high purity and activity, rapidly and in high yield. Using an E. coli prokaryotic expression system followed by affinity-column and size-exclusion purification, large amounts of protein can be obtained within two days.

The gene editing system of the invention is derived from Synechocystis sp. PCC 6714 and was classified as type I-B by bioinformatic analysis. Its knockout efficiency on different genes is higher than that of other type I systems, which demonstrates the system's advantages and its strong potential for research and development.

In addition to the embodiments described above, other embodiments of the invention are possible. All technical schemes formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims

1. A type I-B CRISPR-Cascade-Cas3 gene editing system, characterized in that it consists of a Cascade complex and a Cas3 protein; the Cascade complex is assembled from a Cmx8 protein, a Cas5 protein, a Cas6 protein, a Cas11 protein and a crRNA; the amino acid sequence of the Cmx8 protein is SEQ ID NO.2; the amino acid sequence of the Cas8 protein is SEQ ID NO.4; the amino acid sequence of the Cas5 protein is SEQ ID NO.6; the amino acid sequence of the Cas6 protein is SEQ ID NO.8; the amino acid sequence of the Cas11 protein is SEQ ID NO.10; the amino acid sequence of the Cas3 protein is SEQ ID NO.12; the DNA fragment for expressing the crRNA consists of mutually identical repeat sequences alternating with spacer sequences that are identical to or different from one another, and begins and ends with a repeat sequence; the repeat sequence is 5'-gtgtccaaaccattgatgccgtaaggcgttgagcac-3', and the spacer sequences are designed according to the target genes.

2. The type I-B CRISPR-Cascade-Cas3 gene editing system according to claim 1, characterized in that a nuclear localization signal NLS is linked to the C-terminus of the Cas8 protein (the 3' end of its coding sequence); a nuclear localization signal NLS is linked to the N-terminus of the Cas3 protein (the 5' end of its coding sequence); the amino acid sequence of the nuclear localization signal NLS is SEQ ID NO.14; and the DNA fragment for expressing the crRNA has the structure 5'-repeat sequence-spacer sequence-repeat sequence-3'.

3. The type I-B CRISPR-Cascade-Cas3 gene editing system according to claim 1, characterized in that the coding gene sequence of the Cmx8 protein is SEQ ID NO.1; the coding gene sequence of the Cas8 protein is SEQ ID NO.3; the coding gene sequence of the Cas5 protein is SEQ ID NO.5; the coding gene sequence of the Cas6 protein is SEQ ID NO.7; the coding gene sequence of the Cas11 protein is SEQ ID NO.9; the coding gene sequence of the Cas3 protein is SEQ ID NO.11; the coding gene sequence of the nuclear localization signal NLS is SEQ ID NO.13; and the PAM sequence corresponding to the type I-B CRISPR-Cascade-Cas3 gene editing system is 5'-atg-3'.

4. A method of preparing the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3, comprising the following steps:

thereby obtaining the type I-B CRISPR-Cascade-Cas3 gene editing system.

5. The preparation method according to claim 4, wherein in the first step, in the plasmid for the Cascade complex, the coding gene sequence of the nuclear localization signal NLS is linked to the 3' end of the coding gene sequence of the Cas8 protein; in the plasmid for the Cas3 protein, the coding gene sequence of the nuclear localization signal NLS is linked to the 5' end of the coding gene sequence of the Cas3 protein; and the coding gene sequence of the nuclear localization signal NLS is SEQ ID NO.13;

6. The preparation method according to claim 5, wherein the specific process of the first step is as follows:

7. Use of the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3 for identifying, binding or editing prokaryotic or eukaryotic genes.

8. A cellular gene knockout method using the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3, comprising the following steps:

9. The method according to claim 8, wherein in S1, electroporation is performed using the Neon transfection system;

10. A cell line or cell strain comprising the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3.