CN113652411A - Cas9 protein, gene editing system containing Cas9 protein and application


Info

Publication number: CN113652411A
Authority: CN (China)
Legal status: Pending
Application number: CN202110878452.XA
Other languages: Chinese (zh)
Inventors: 王永明, 王帅, 高思琪, 王瑶
Current assignee: Fudan University
Original assignee: Fudan University
Application filed by Fudan University; priority to CN202110878452.XA.

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide


Abstract

The invention belongs to the technical field of gene editing, and particularly relates to a CRISPR/Cas9 gene editing system and applications thereof. The gene editing system is a complex formed by a specific Cas9 protein and an sgRNA; it accurately locates a target DNA sequence and cleaves it, producing a double-strand break in the target sequence. Gene editing may be performed in cells or in vitro. The specific Cas9 proteins recognize a variety of PAM sequences, which makes them suitable for gene editing in different scenarios and widens the targetable range. In particular, the Hsp1-Hsp3Cas9-Y446A and Hsp1-Hsp3Cas9-K390A-Y446A proteins have higher editing activity and specificity. The invention has broad application prospects in the field of gene editing.

Description

Cas9 protein, gene editing system containing Cas9 protein and application
Technical Field
The application belongs to the technical field of gene editing, and particularly relates to a Cas9 protein, a gene editing system containing the Cas9 protein and related applications thereof.
Background
The CRISPR/Cas system is an adaptive immune system that bacteria and archaea have evolved to defend against invading viruses and plasmids. In the CRISPR/Cas9 system, crRNA (CRISPR-derived RNA), tracrRNA (trans-activating crRNA) and the Cas9 protein form a complex that recognizes a PAM (protospacer adjacent motif) sequence next to the target site; the crRNA base-pairs with the complementary target DNA sequence, and the Cas9 protein cleaves the DNA, producing a DNA break. The tracrRNA and crRNA can be fused by a linker sequence into a single-stranded guide RNA (sgRNA). When a DNA break occurs, two major intracellular DNA damage repair pathways are responsible for repair: non-homologous end joining (NHEJ) and homologous recombination (HR). NHEJ repair can introduce small insertions or deletions of bases, which can be used for gene knockout; when a homologous template is provided, HR repair can be used for site-directed gene insertion and precise base substitution.
Beyond basic scientific research, the CRISPR/Cas9 gene editing system also has broad clinical application prospects. When the CRISPR/Cas9 gene editing system is used for gene therapy, the Cas protein and the single-stranded guide RNA must be delivered into the body. The most effective expression vector for gene therapy is adeno-associated virus (AAV). However, AAV can typically package no more than about 4.5 kb of DNA. SpCas9 has been widely used because of its simple PAM sequence (NGG) and high activity. However, the SpCas9 protein has 1368 amino acids, so its coding sequence cannot be efficiently packaged into an AAV vector together with the sgRNA and the promoters, which limits its clinical application. To overcome this problem, several smaller Cas9 proteins have been developed, including SaCas9 (PAM sequence NNGRRT), St1Cas9 (PAM sequence NNAGAW), NmCas9 (PAM sequence NNNNGATT), Nme2Cas9 (PAM sequence NNNNCC) and CjCas9 (PAM sequence NNRYAC). However, these Cas9 proteins are either prone to off-target cleavage (cleavage at non-target sites), require a complex PAM sequence, or have low editing activity, which makes broad application difficult.
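As a rough illustration of the size argument above, the sketch below converts a protein length into a coding-sequence length (3 nt per codon plus one stop codon) and subtracts it from the ~4.5 kb AAV packaging limit quoted above. The 4,500 nt limit and the 1,368-aa SpCas9 length are taken from the text; the function names are hypothetical, and the calculation ignores every other element that must also fit in the vector (promoters, sgRNA cassette, polyA signal and so on), so it understates the real constraint.

```python
# Rough AAV packaging budget check (illustrative sketch only).
# Assumption: only the Cas9 open reading frame is counted; promoters,
# sgRNA cassette, polyA and other elements are NOT included.

AAV_LIMIT_NT = 4500  # ~4.5 kb packaging limit quoted in the text


def cds_length_nt(n_amino_acids: int) -> int:
    """Coding-sequence length: 3 nt per codon plus one stop codon."""
    return 3 * n_amino_acids + 3


def remaining_budget_nt(n_amino_acids: int, limit_nt: int = AAV_LIMIT_NT) -> int:
    """Nucleotides left for all non-coding elements after the Cas9 ORF."""
    return limit_nt - cds_length_nt(n_amino_acids)


if __name__ == "__main__":
    spcas9_aa = 1368  # SpCas9 length given in the text
    print(cds_length_nt(spcas9_aa))        # 4107
    print(remaining_budget_nt(spcas9_aa))  # 393
```

Under these assumptions only about 0.4 kb would remain for the promoter, the sgRNA expression cassette and all other elements, which is why shorter Cas9 proteins are attractive for single-vector AAV delivery.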
Therefore, identifying a small CRISPR/Cas system with high editing activity, high specificity and a simple PAM sequence is expected to solve the above problems.
Disclosure of Invention
In view of the above problems, the present inventors conducted extensive studies and found that a series of Cas9 proteins, together with their corresponding single-stranded guide RNAs, constitute CRISPR/Cas9 gene editing systems that efficiently perform gene editing, thereby completing the present invention.
Thus, in a first aspect, the present invention provides a Cas9 protein, the Cas9 protein being:
a) the Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
the Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
the Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
the Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
the Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
the Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
the Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
the Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
or the Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20; or
b) a homolog having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 12 to SEQ ID NO: 20 and retaining its biological activity.
In a second aspect, the present invention provides a conjugate comprising:
a) a Cas9 protein, the Cas9 protein being the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 or Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively, or a homolog thereof having at least 80% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
b) a modifying moiety; and
c) optionally a linker for linking the Cas9 protein to the modifying moiety.
In a third aspect, the present invention provides a fusion protein comprising:
a) a Cas9 protein, the Cas9 protein being the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 or Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively, or a homolog thereof having at least 80% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
b) an additional protein or polypeptide; and
c) optionally a linker for linking the Cas9 protein or homolog thereof to the additional protein or polypeptide.
In a fourth aspect, the present invention provides a single-stranded guide RNA comprising a scaffold sequence, wherein the scaffold sequence has the nucleic acid sequence shown in SEQ ID NO: 43 or SEQ ID NO: 44; has at least 90% sequence identity to the nucleic acid sequence shown in SEQ ID NO: 43 or SEQ ID NO: 44 and retains its biological activity; or is a nucleic acid sequence engineered from SEQ ID NO: 43 or SEQ ID NO: 44 that retains its biological activity.
In a fifth aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleic acid sequence encoding:
a) a Cas9 protein, the Cas9 protein being the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 or Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively, or a homolog thereof having at least 80% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
b) a conjugate of the second aspect of the invention; or
c) a fusion protein of the third aspect of the invention.
In a sixth aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleic acid sequence encoding the single stranded guide RNA of the fourth aspect of the invention.
In a seventh aspect, the present invention provides a vector comprising a nucleic acid sequence encoding:
a) a Cas9 protein, the Cas9 protein being the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 or Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively, or a homolog thereof having at least 80% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
b) a conjugate of the second aspect of the invention; or
c) a fusion protein of the third aspect of the invention.
In an eighth aspect, the present invention provides a vector comprising a nucleic acid sequence encoding the single stranded guide RNA of the fourth aspect of the invention.
In a ninth aspect, the present invention provides a CRISPR/Cas9 gene editing system comprising:
a) a protein component comprising:
1) a Cas9 protein, the Cas9 protein being the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 or Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively, or a homolog thereof having at least 80% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
2) a conjugate of the second aspect of the invention; or
3) a fusion protein of the third aspect of the invention;
b) a nucleic acid component comprising:
the single-stranded guide RNA of the fourth aspect of the present invention.
In a tenth aspect, the present invention provides a cell comprising: an isolated nucleic acid molecule of the fifth or sixth aspect of the invention, or a vector of the seventh or eighth aspect of the invention.
In an eleventh aspect, the invention provides a method of gene editing a target sequence in an intracellular or in vitro environment, the method comprising: contacting a Cas9 protein, a conjugate of the second aspect of the invention, or a fusion protein of the third aspect of the invention together with a single-stranded guide RNA of the fourth aspect of the invention, a vector of the seventh and eighth aspects of the invention, or a CRISPR/Cas9 gene editing system of the ninth aspect of the invention with the target sequence in an intracellular or in vitro environment, wherein the Cas9 protein is the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 or Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively, or a homolog thereof having at least 80% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity; the target sequence is located immediately 5' of a protospacer adjacent motif (PAM); and, for each of the following Cas9 proteins, or a homolog, conjugate, or fusion protein thereof, the PAM has the sequence indicated:
    • Cco2Cas9 protein: 5'-NNNNCY
    • CcuCas9 protein: 5'-NNCNA
    • CspCas9 protein: 5'-NNNNCYWT
    • Hap1Cas9 protein: 5'-NNNGCCKS
    • Hap2Cas9 protein: 5'-NNNGG
    • HgaCas9 protein: 5'-NNNCCC
    • HtyCas9 protein: 5'-NNRTTA
    • Hsp1Cas9 protein: 5'-NNRAA
    • Hsp2Cas9 protein: 5'-NNRYAT
    • Hsp3Cas9 protein: 5'-NNNTCC
    • Hsp4Cas9 protein: 5'-NNRT
    • Hsp1-CcuCas9 protein: 5'-NNCNA
    • Hsp1-Hap2Cas9 protein: 5'-NNNGG
    • Hsp1-HgaCas9 protein: 5'-NNCCAW
    • Hsp1-HtyCas9 protein: 5'-NNRTYR
    • Hsp1-Hsp2Cas9 protein: 5'-NNRYAT
    • Hsp1-Hsp3Cas9-Y446A protein: 5'-NNCY
    • Hsp1-Hsp3Cas9-K390A-Y446A protein: 5'-NNNNCY
    • Hsp1-Hsp4Cas9 protein: 5'-NNRT
    • Nsp2Cas9 protein: 5'-NNCC
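The PAM requirement above can be illustrated with a small scanner that looks for a degenerate PAM, written in the IUPAC codes defined in the Definitions section, immediately 3' of a candidate protospacer. This is a hypothetical sketch, not the inventors' method: the function names, the default 22-nt protospacer length and the example sequence are assumptions chosen for illustration, not values from this application, and only the given strand is scanned.

```python
from typing import Iterator, Tuple

# Degenerate IUPAC nucleotide codes used in the PAMs listed above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "W": "AT", "S": "CG",
    "N": "ACGT",
}


def matches_pam(site: str, pam: str) -> bool:
    """True if the concrete DNA `site` matches the degenerate `pam` pattern."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), pam.upper())
    )


def find_targets(dna: str, pam: str,
                 protospacer_len: int = 22) -> Iterator[Tuple[int, str, str]]:
    """Yield (start, protospacer, pam_site) for every hit on the given strand.

    The protospacer is the `protospacer_len` bases immediately 5' of the PAM.
    The default of 22 nt is an illustrative assumption.
    """
    dna = dna.upper()
    for pam_start in range(protospacer_len, len(dna) - len(pam) + 1):
        site = dna[pam_start:pam_start + len(pam)]
        if matches_pam(site, pam):
            start = pam_start - protospacer_len
            yield start, dna[start:pam_start], site


if __name__ == "__main__":
    # Hypothetical 10-nt sequence scanned for 5'-NNCC with a 4-nt protospacer,
    # kept tiny so the result is easy to check by hand.
    for hit in find_targets("AAAAAGTCCA", "NNCC", protospacer_len=4):
        print(hit)  # (1, 'AAAA', 'GTCC')
```

A practical design tool would also scan the reverse complement and apply further filters; those steps are omitted to keep the sketch short.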
In a twelfth aspect, the present invention provides a kit comprising: a Cas9 protein, a conjugate of the second aspect of the invention, or a fusion protein of the third aspect of the invention together with a single-stranded guide RNA of the fourth aspect of the invention; an isolated nucleic acid molecule of the fifth and sixth aspects of the invention; a vector of the seventh and eighth aspects of the invention; or a CRISPR/Cas9 gene editing system of the ninth aspect of the invention; and instructions for performing gene editing of the target sequence in an intracellular or in vitro environment; wherein the Cas9 protein is the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 or Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively, or a homolog thereof having at least 80% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity.
The inventors' research group has developed Cas9 editing tools that can efficiently perform gene editing in eukaryotic cells. These Cas9 proteins contain relatively few amino acids, can be efficiently packaged into expression vectors such as adeno-associated virus vectors, and are therefore well suited for further development as gene therapy tools. They recognize diverse PAM sequences, which widens the targetable range and allows gene editing under different conditions. Furthermore, the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9 and Hsp4Cas9 proteins, as well as the Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins of the present invention, can all perform gene editing guided by sgRNAs that share the same scaffold sequence.
In addition, the PAM sequences recognized by the Cco2Cas9, CcuCas9, Hap2Cas9, HgaCas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 and Nsp2Cas9 proteins are relatively simple, giving these proteins a wide editing range. The Hsp1-Hsp3Cas9-Y446A and Hsp1-Hsp3Cas9-K390A-Y446A proteins have higher editing activity and specificity.
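One way to make "simpler PAM, wider editing range" concrete is to estimate how often a degenerate PAM occurs at a random position: each PAM position contributes a factor of (number of allowed bases) / 4. The sketch below assumes independent bases with equal frequencies on a single strand, which real genomes do not satisfy, so the numbers are only a rough comparison and are not data from this application. The PAMs are those listed in the eleventh aspect and, for SaCas9, in the Background.

```python
from fractions import Fraction

# Number of bases allowed by each IUPAC code used in the PAMs of this document.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1,
              "R": 2, "Y": 2, "K": 2, "W": 2, "S": 2, "N": 4}


def pam_frequency(pam: str) -> Fraction:
    """Probability that a random position matches `pam` on one strand,
    assuming independent bases with equal frequencies (an idealization)."""
    p = Fraction(1)
    for code in pam.upper():
        p *= Fraction(IUPAC_SIZE[code], 4)
    return p


if __name__ == "__main__":
    for name, pam in [("Nsp2Cas9", "NNCC"),
                      ("Hsp4Cas9", "NNRT"),
                      ("CspCas9", "NNNNCYWT"),
                      ("SaCas9", "NNGRRT")]:
        f = pam_frequency(pam)
        print(f"{name:10s} 5'-{pam:9s} 1 in {1 / f}")
```

Under this idealization NNRT is expected about once every 8 positions and NNCC once every 16, whereas NNNNCYWT and SaCas9's NNGRRT are each expected about once every 64 positions, which is the sense in which shorter, less constrained PAMs widen the targetable range.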
Drawings
Fig. 1 is a schematic diagram showing how editing of target sequences by a CRISPR/Cas9 system is detected using a GFP reporter cell line library;
Fig. 2 shows images of GFP reporter cells treated with the CRISPR/Cas9 gene editing system of the present invention, in which the upper image is the GFP fluorescence image and the lower image is the bright-field (white light) image;
Fig. 3 shows the editing efficiency of the CRISPR/Cco2Cas9 gene editing system at two target sites;
Fig. 4 shows the editing efficiency of the CRISPR/CcuCas9 gene editing system at two target sites;
Fig. 5 shows the editing efficiency of the CRISPR/CspCas9 gene editing system at two target sites;
Fig. 6 shows the editing efficiency of the CRISPR/Hap1Cas9 gene editing system at two target sites;
Fig. 7 shows the editing efficiency of the CRISPR/Hsp1Cas9 gene editing system at two target sites;
Fig. 8 shows the editing efficiency of the CRISPR/Hsp3Cas9 gene editing system at two target sites;
Fig. 9 shows the editing efficiency of the CRISPR/Hsp1-Hsp3Cas9 gene editing system at two target sites;
Fig. 10 shows the editing efficiency of the CRISPR/Hsp1-Hsp3Cas9-Y446A gene editing system at two target sites;
Fig. 11 shows the editing efficiency of the CRISPR/Hsp1-Hsp3Cas9-K390A-Y446A gene editing system at two target sites;
Fig. 12 shows the editing efficiency of the CRISPR/Nsp2Cas9 gene editing system at two target sites;
Fig. 13 shows the results of specificity testing of the CRISPR/Cco2Cas9 gene editing system in the GFP reporter HEK293T cell line;
Fig. 14 shows the results of specificity testing of the CRISPR/CcuCas9 gene editing system in the GFP reporter HEK293T cell line;
Fig. 15 shows the results of specificity testing of the CRISPR/Hsp1Cas9 gene editing system in the GFP reporter HEK293T cell line;
Fig. 16 shows the results of specificity testing of the CRISPR/Hsp1-Hsp3Cas9 gene editing system in the GFP reporter HEK293T cell line;
Fig. 17 shows the results of specificity testing of the CRISPR/Hsp1-Hsp3Cas9-Y446A gene editing system in the GFP reporter HEK293T cell line;
Fig. 18 shows the results of specificity testing of the CRISPR/Hsp1-Hsp3Cas9-K390A-Y446A gene editing system in the GFP reporter HEK293T cell line;
Fig. 19 shows the results of specificity testing of the CRISPR/Nsp2Cas9 gene editing system in the GFP reporter HEK293T cell line.
Detailed Description
The present invention will be described in further detail below. It is to be understood that both the foregoing summary of the invention and the following detailed description are intended to illustrate the invention specifically and not to limit the invention in any way. The scope of protection of the invention is determined by the claims that follow. Modifications to the embodiments will be apparent to those skilled in the art without departing from the spirit and scope of the invention.
Definitions
Unless defined otherwise, scientific and technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. For a better understanding of the present invention, the following provides definitions and explanations of relevant terms.
As used herein, the terms "Cas9 protein," "Cas9," and "Cas" are used interchangeably in this application to refer to RNA-guided nucleases, including the Cas9 protein or functionally active fragments thereof. The Cas9 protein is the protein component of the CRISPR/Cas9 genome editing system; guided by a single-stranded guide RNA (sgRNA), it targets and cleaves a DNA target sequence to form a DNA double-strand break (DSB). DNA double-strand breaks activate the cell's intrinsic non-homologous end joining (NHEJ) and homologous recombination (HR) mechanisms, which repair the DNA damage. During repair, the DNA sequence at the specific site is edited.
The terms "single-stranded guide RNA" and "sgRNA (single guide RNA)" as used herein are used interchangeably in this application and have the meaning commonly understood by those skilled in the art. In general, a single-stranded guide RNA or sgRNA may comprise a scaffold sequence and a guide sequence (also referred to herein as a guide RNA or gRNA). In the context of endogenous CRISPR systems, guide sequences are also referred to as spacer sequences (spacers). In certain instances, a guide sequence is any polynucleotide sequence that has sufficient complementarity to a target sequence to hybridize with it and direct specific binding of the CRISPR/Cas9 complex to the target sequence. In certain embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned, is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the optimal alignment is within the ability of one of ordinary skill in the art; published and commercially available alignment algorithms and programs include, but are not limited to, ClustalW, the Smith-Waterman algorithm in MATLAB, Bowtie, Geneious, Biopython and SeqMan.
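The complementarity percentages above can be illustrated for the simplest case of an ungapped, equal-length comparison. In the sketch below the guide is RNA written 5'→3' and the target strand (the strand the guide hybridizes to) is DNA, also written 5'→3'; because the two strands are antiparallel, the target strand is read in reverse before counting Watson-Crick pairs. The function name and example sequences are hypothetical, G·U wobble pairs are deliberately not counted, and the "optimal alignment" described above (which may include gaps) would require one of the listed alignment tools instead.

```python
# Watson-Crick pairs between an RNA guide base and a DNA target-strand base.
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_strand_dna: str) -> float:
    """Percent of guide positions that Watson-Crick pair with the target strand.

    guide_rna: 5'->3' RNA; target_strand_dna: 5'->3' DNA of the strand the
    guide hybridizes to. Ungapped, equal-length comparison only (assumption).
    """
    if len(guide_rna) != len(target_strand_dna):
        raise ValueError("sequences must have equal length for this sketch")
    g = guide_rna.upper()
    t = target_strand_dna.upper()[::-1]  # read 3'->5' to align antiparallel
    paired = sum((a, b) in PAIRS for a, b in zip(g, t))
    return 100.0 * paired / len(g)


if __name__ == "__main__":
    print(percent_complementarity("GACU", "AGTC"))  # 100.0
```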
The term "CRISPR/Cas9 complex" as used herein refers to the complex formed when a single guide RNA, or a mature crRNA:tracrRNA hybrid, binds to the Cas9 protein. The complex comprises a guide sequence that hybridizes to the target sequence, thereby bringing the Cas9 protein to the target sequence, and is capable of recognizing and cleaving a polynucleotide that can hybridize with the single-stranded guide RNA or mature crRNA.
Thus, in the context of forming a CRISPR/Cas9 complex, a "target sequence" refers to a polynucleotide that the guide sequence is designed to target, e.g., a sequence complementary to the guide sequence, wherein hybridization between the target sequence and the guide sequence enables Cas9 to exert its activity, e.g., cleavage of the target sequence. Complete complementarity is not necessary, provided there is sufficient complementarity to cause hybridization and allow Cas9 to exert its activity. In some cases, the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located within an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast.
The term "target sequence" or "target polynucleotide" as used herein can be any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, the target polynucleotide may be a polynucleotide present in the nucleus of a eukaryotic cell. The target polynucleotide may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, the target sequence must be adjacent to a protospacer adjacent motif (PAM). The exact sequence and length requirements for the PAM vary depending on the Cas protein used, but a PAM is typically a 2-8 base sequence adjacent to the protospacer (the target sequence). One skilled in the art is able to identify PAM sequences for use with a given Cas protein.
The terms "polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" as used herein are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" represents adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" represents cytidine or deoxycytidine, "G" represents guanosine or deoxyguanosine, "U" represents uridine, "T" represents deoxythymidine, "R" represents a purine (A or G), "Y" represents a pyrimidine (C or T), "K" represents G or T, "W" represents A or T, "S" represents C or G, "I" represents inosine, and "N" represents any nucleotide.
The terms "polypeptide", "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply both to naturally occurring amino acid polymers and to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acids. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
The terms "sequence identity" and "homology" as used herein have their art-recognized meanings, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along a region of the molecule (see, e.g., Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., Stockton Press, New York, 1991). Although there are many ways to measure identity between two polynucleotides or polypeptides, the term "identity" is well known to the skilled person. Conservative amino acid substitutions in a peptide or protein are likewise well known and can generally be made without altering the biological activity of the resulting molecule. In general, one skilled in the art recognizes that a single amino acid substitution in a non-essential region of a polypeptide does not substantially alter its biological activity (see, e.g., Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).
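Percent identity depends on both the alignment and the way gaps are counted. The sketch below assumes the two sequences have already been aligned (for example by a global alignment tool) and applies one common convention, in which a column with a gap in only one sequence counts as a mismatch; other conventions can give different values. The peptides are hypothetical toy sequences, and the 80% comparison mirrors the lowest identity threshold recited for homologues later in this document.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: columns that are gaps in both sequences are ignored,
    columns with a gap in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        columns += 1
        if x == y:  # gap/gap columns were skipped, so this is a residue match
            identical += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / columns


# Hypothetical toy peptides, not sequences from this document.
pid = percent_identity("MKTAYIAK", "MKSAYIAK")
print(pid, pid >= 80)  # 87.5 True
```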
The term "vector" as used herein refers to a nucleic acid delivery vehicle into which a polynucleotide can be inserted. A vector is referred to as an expression vector when it enables expression of a protein encoded by the inserted polynucleotide, or when it enables transcription of the inserted polynucleotide (e.g., transcription to produce mRNA or functional RNA). The vector may be introduced into a host cell by transformation, transduction or transfection, so that the genetic elements it carries are expressed in the host cell. Vectors are well known to those skilled in the art and include, but are not limited to, plasmid vectors and viral vectors. The vector may also contain various regulatory sequences that regulate expression. "Regulatory sequence" and "regulatory element" are used interchangeably herein to refer to a nucleotide sequence located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence that affects the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements and reporter genes. Regulatory sequences may be of heterologous origin, or of the same origin but arranged differently from how they are normally found in nature. In addition, the vector may contain an origin of replication.
The term "promoter" as used herein refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.
The term "constitutive promoter" as used herein refers to a promoter that generally drives expression of a gene in most cell types under most conditions. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that drives expression primarily, but not necessarily exclusively, in one tissue or organ, and that may also drive expression in a particular cell or cell type. "Developmentally regulated promoter" refers to a promoter whose activity is determined by developmental events. An "inducible promoter" selectively drives expression of an operably linked DNA sequence in response to an endogenous or exogenous stimulus (e.g., an environmental, hormonal or chemical signal).
"Introducing" a nucleic acid molecule (e.g., a plasmid, a linear nucleic acid fragment or RNA) or a protein into an organism refers to transforming cells of the organism with the nucleic acid or protein so that it can function in those cells. "Transformation" as used herein includes both stable transformation and transient transformation.
The term "stable transformation" as used herein refers to introducing an exogenous nucleotide sequence into the genome so that the exogenous gene is stably inherited. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and of any subsequent generation.
The term "transient transformation" as used herein refers to introducing a nucleic acid molecule or protein into a cell, where it functions without the exogenous gene being stably inherited. In transient transformation, the exogenous nucleic acid sequence is not integrated into the genome.
The term "complementarity" as used herein refers to the ability of one nucleic acid sequence to form one or more hydrogen bonds with another nucleic acid sequence through conventional Watson-Crick pairing or other, non-conventional types of pairing. Percent complementarity refers to the percentage of residues in one nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairs) with another nucleic acid sequence; for example, 5, 6, 7, 8, 9 or 10 paired residues out of 10 correspond to 50%, 60%, 70%, 80%, 90% or 100% complementarity, respectively. "Completely complementary" means that all contiguous residues of one nucleic acid sequence form hydrogen bonds with the same number of contiguous residues of another nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.
The term "stringent conditions" as used herein in connection with hybridization refers to conditions under which a nucleic acid complementary to a target sequence hybridizes predominantly to the target sequence and does not substantially hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and vary with many factors. In general, the longer the sequence, the higher the temperature at which it specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York.
The term "hybridization" as used herein refers to a reaction in which one or more polynucleotides form a complex stabilized by hydrogen bonding between the bases of nucleotide residues. Hydrogen bonding may occur through Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a broader process, such as the initiation of PCR or the cleavage of a polynucleotide by an enzyme. A sequence capable of hybridizing to a given sequence is referred to as the "complement" of that sequence.
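As a rough illustration of why longer sequences hybridize at higher temperatures, the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), is a common rule of thumb for short DNA oligonucleotides. It is not taken from this document and does not define the stringent conditions referred to above; the oligonucleotides below are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    A rough estimate only, intended for short DNA oligonucleotides; it ignores
    salt, probe concentration and nearest-neighbour effects.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ACGTACGTACGT"))      # 36 (12 nt)
print(wallace_tm("ACGTACGTACGTACGT"))  # 48 (16 nt, same base composition)
```

Longer probes and real stringency calculations call for salt- and length-corrected or nearest-neighbour models rather than this rule.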
Cas9 protein
Accordingly, in a first aspect, the present invention provides a Cas9 protein, the Cas9 protein being:
a) the Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
the Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
the Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
the Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
the Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
the Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
the Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
the Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19, or
the Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20; or
b) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 12 to SEQ ID NO: 20 and retaining its biological activity.
In the present invention, the "biological activity" of the Cas9 protein includes, but is not limited to, the activity of binding to a single-stranded guide RNA, endonuclease activity (including single-strand and double-strand cleavage activity), and/or the activity of binding to, and cleaving, a specific site of a target sequence under the guidance of a guide RNA (gRNA).
The inventors' research group has developed Cas9 editing tools that can efficiently perform gene editing in eukaryotic cells. These Cas9 proteins consist of relatively few amino acids, so they can be efficiently packaged into expression vectors such as adeno-associated viral (AAV) vectors, which makes them well suited for later development as gene therapy tools. They also recognize diverse PAM sequences, which broadens the targetable range and allows gene editing under different conditions. Furthermore, the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins of the present invention can all perform gene editing using the same single-stranded guide RNA scaffold sequence.
In addition, the PAM sequences recognized by the Cco2Cas9, CcuCas9, Hap2Cas9, HgaCas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 and Nsp2Cas9 proteins are simpler, giving these proteins a broad editing range. The Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Nsp2Cas9 proteins show higher editing activity and specificity.
Derivatized proteins
The Cas9 protein may be derivatized, e.g., linked to another molecule (e.g., another protein or polypeptide). In general, derivatization (e.g., labeling) of a protein does not adversely affect the desired activity of the protein (e.g., activity of binding to single-stranded guide RNA, endonuclease activity, activity of binding to a specific site of a target sequence under guidance of the guide RNA and cleavage). Thus, in the present invention, the Cas9 protein may be functionally linked (by chemical coupling, gene fusion, non-covalent linkage, or other means) to one or more other molecular moieties, such as additional proteins or polypeptides, detectable labels, pharmaceutical reagents, and the like.
In particular, the Cas9 protein may be linked to other functional units. For example, it may be linked to a Nuclear Localization Signal (NLS) sequence to enhance the ability of the protein of the invention to enter the nucleus. For example, it can be linked to a targeting moiety to target the Cas9 protein. For example, it can be linked to a detectable label to facilitate detection of Cas9 protein. For example, it can be linked to an epitope tag to facilitate expression, detection, tracking, and/or purification of Cas9 protein.
Accordingly, in a second aspect, the present invention provides a conjugate comprising:
a) a Cas9 protein, the Cas9 protein being:
1) the Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
the CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
the CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
the Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
the Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
the HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
the HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
the Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
the Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
the Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
the Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
the Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
the Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
the Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
the Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
the Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
the Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
the Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
the Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
the Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21;
or
2) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
b) a modifying moiety; and
c) optionally a linker for linking the Cas9 protein to the modification moiety.
The description above of the Cas9 protein also applies to this aspect of the invention and is therefore not repeated here.
It is understood that, in addition to being used on its own, the Cas9 protein may be conjugated to other substances, such as other proteins or tags, to confer additional functionality.
Thus, in one embodiment, the modifying moiety may be an additional protein or polypeptide, a detectable label, or a combination thereof.
In a further embodiment, the additional protein or polypeptide is selected from one or more of an epitope tag, a reporter protein, a nuclear localization signal (NLS) sequence, a cytosine deaminase (CBE), an adenine deaminase (ABE), the cytosine methylases DNMT3A and MQ1, the cytosine demethylase Tet1, the transcriptional activators VP64, p65 and RTA, the transcriptional repressor KRAB, the histone acetylase p300, the histone demethylase LSD1, and the endonuclease FokI.
Epitope tags are well known to those skilled in the art, examples of which include, but are not limited to, His, V5, FLAG, HA, Myc, VSV-G, Trx, and the like, and it is known to those skilled in the art how to select an appropriate epitope tag for a desired purpose (e.g., purification, detection, or tracking).
Reporter proteins are well known to those skilled in the art, examples of which include, but are not limited to, GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP, and the like.
Detectable labels are well known to those skilled in the art, examples of which include fluorescent dyes, such as Fluorescein Isothiocyanate (FITC) or DAPI.
The Cas9 protein of the invention may be coupled, conjugated or fused to the modification moiety through a linker, or may be directly linked to the modification moiety without a linker. Linkers are well known in the art, and examples thereof may include, but are not limited to, linkers comprising 1-50 amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA or Ava), or PEG, etc.
In a third aspect, the present invention provides a fusion protein comprising:
a) a Cas9 protein, the Cas9 protein being:
1) the Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
the CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
the CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
the Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
the Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
the HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
the HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
the Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
the Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
the Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
the Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
the Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
the Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
the Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
the Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
the Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
the Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
the Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
the Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
the Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21;
or
2) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
b) an additional protein or polypeptide; and
c) optionally a linker for linking the Cas9 protein to the additional protein or polypeptide.
As in the second aspect of the invention, the additional protein or polypeptide may be selected from one or more of an epitope tag, a reporter protein, a nuclear localization signal (NLS) sequence, a cytosine deaminase (CBE), an adenine deaminase (ABE), the cytosine methylases DNMT3A and MQ1, the cytosine demethylase Tet1, the transcriptional activators VP64, p65 and RTA, the transcriptional repressor KRAB, the histone acetylase p300, the histone demethylase LSD1, and the endonuclease FokI.
Epitope tags are well known to those skilled in the art, examples of which include, but are not limited to, His, V5, FLAG, HA, Myc, VSV-G, Trx, and the like, and it is known to those skilled in the art how to select an appropriate epitope tag for a desired purpose (e.g., purification, detection, or tracking). Reporter proteins are well known to those skilled in the art, examples of which include, but are not limited to, GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP, and the like.
Detectable labels are well known to those skilled in the art, examples of which include fluorescent dyes, such as Fluorescein Isothiocyanate (FITC) or DAPI.
The Cas9 protein of the invention may be coupled, conjugated or fused to the additional protein or polypeptide through a linker, or may be linked directly to the additional protein or polypeptide without a linker. Linkers are well known in the art, examples of which include, but are not limited to, linkers comprising 1-50 amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA or Ava), or PEG, and the like.
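The modular layout of such a fusion protein can be sketched as simple sequence concatenation. In this minimal Python sketch the SV40 NLS (PKKKRKV), the FLAG tag (DYKDDDDK) and a (GGGGS)n linker are assumed example modules, and the Cas9 portion is a hypothetical placeholder rather than any SEQ ID NO of this document. The 50-residue cap mirrors the 1-50 amino acid linkers described above, and an empty linker models direct fusion without a linker.

```python
# Well-known module sequences, used only as an illustration.
SV40_NLS = "PKKKRKV"   # SV40 large T antigen nuclear localization signal
FLAG_TAG = "DYKDDDDK"  # FLAG epitope tag
MAX_LINKER = 50        # upper bound on linker length taken from the text above


def gs_linker(repeats: int) -> str:
    """Flexible (GGGGS)n linker of 5 * repeats residues."""
    return "GGGGS" * repeats


def build_fusion(cas9: str, n_term: list[str], c_term: list[str], linker: str = "") -> str:
    """Join N-terminal modules, the Cas9 body and C-terminal modules, N- to C-terminus.

    The same linker is placed between every pair of adjacent modules;
    an empty linker models direct fusion.
    """
    if len(linker) > MAX_LINKER:
        raise ValueError(f"linker longer than {MAX_LINKER} residues")
    return linker.join([*n_term, cas9, *c_term])


cas9_placeholder = "M" + "A" * 10  # hypothetical stand-in, NOT a SEQ ID NO of this document
fusion = build_fusion(cas9_placeholder, [SV40_NLS], [FLAG_TAG], gs_linker(1))
print(fusion)
print(len(fusion))  # 36
```

Using a single linker string keeps the sketch short; a real construct might use different linkers at each junction.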
Single-stranded guide RNA
In a fourth aspect, the present invention provides a single stranded guide RNA comprising a scaffold sequence having:
a) for the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins, homologues thereof or fusion proteins thereof, the nucleic acid sequence shown in SEQ ID NO: 43, or
for the Nsp2Cas9 protein, a homologue thereof or a fusion protein thereof, the nucleic acid sequence shown in SEQ ID NO: 44;
or
b) a nucleic acid sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.9%, or 100% sequence identity to the nucleic acid sequence shown in SEQ ID NO: 43 or SEQ ID NO: 44 and retaining its biological activity; or
c) a sequence obtained by altering the nucleic acid sequence shown in SEQ ID NO: 43 or SEQ ID NO: 44 and retaining its biological activity.
In one embodiment, the alteration may be one or more of base phosphorylation, base sulfurization, base methylation, base hydroxylation, shortening of the sequence, and lengthening of the sequence.
In a further embodiment, the shortening or lengthening of the sequence comprises a deletion or addition of one, two, three, four, five, six, seven, eight, nine or ten bases relative to the original sequence.
In yet another embodiment, the single stranded guide RNA may further comprise a CRISPR spacer at the 5' end of the scaffold sequence, the CRISPR spacer being a sequence of 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotides (preferably 22 nucleotides) in length and capable of complementary pairing with a target sequence.
In a preferred embodiment, the CRISPR spacer sequence is a sequence that is 22 nucleotides in length and is capable of complementary pairing with a target sequence.
In a further embodiment, the single-stranded guide RNA further comprises a terminator at the 3' end of the scaffold sequence. As an example, the terminator may be a run of at least six (e.g., seven or eight) U residues.
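The 5'-to-3' layout of the single-stranded guide RNA described above (spacer, scaffold, poly-U terminator) can be sketched as follows. The scaffold string is a hypothetical placeholder because SEQ ID NO: 43 is not reproduced here, and the function name and defaults are assumptions. The spacer is taken as the protospacer sequence read on the PAM-containing strand, so that its RNA copy pairs with the complementary target strand.

```python
def build_sgrna(protospacer: str, scaffold: str, poly_u_len: int = 7) -> str:
    """Assemble a 5'->3' sgRNA: CRISPR spacer + scaffold + poly-U terminator.

    `protospacer` is the 20-30 nt target-matching sequence given as DNA or RNA;
    T is converted to U so the spacer is an RNA copy of it.
    """
    spacer = protospacer.upper().replace("T", "U")
    if not 20 <= len(spacer) <= 30:
        raise ValueError("spacer must be 20-30 nt long")
    if set(spacer) - set("ACGU"):
        raise ValueError("spacer may contain only A, C, G and U/T")
    if poly_u_len < 6:
        raise ValueError("terminator should be a run of at least six U")
    return spacer + scaffold + "U" * poly_u_len


SCAFFOLD = "<scaffold SEQ ID NO: 43>"  # hypothetical placeholder, real sequence not reproduced
sgrna = build_sgrna("ACGT" * 5 + "AC", SCAFFOLD)  # 22-nt spacer, the preferred length
print(sgrna)  # ACGUACGUACGUACGUACGUAC<scaffold SEQ ID NO: 43>UUUUUUU
```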
The single-stranded guide RNA can bind to the Cas9 protein, conjugate or fusion protein described above to form a complex that can recognize the corresponding PAM and thereby bind to the target sequence, thereby effecting cleavage or gene editing of the target sequence.
Coding nucleic acid and vector
In a fifth aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleic acid sequence encoding:
a) a Cas9 protein, the Cas9 protein being:
1) the Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
the CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
the CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
the Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
the Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
the HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
the HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
the Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
the Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
the Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
the Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
the Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
the Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
the Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
the Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
the Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
the Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
the Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
the Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
the Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21;
or
2) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21 and retaining its biological activity;
b) a conjugate of the second aspect of the invention; or
c) a fusion protein of the third aspect of the invention.
In one embodiment, the isolated nucleic acid molecule comprises the nucleic acid sequence shown in any one of SEQ ID NO: 22 to SEQ ID NO: 42, or a degenerate sequence thereof.
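A "degenerate sequence" differs in nucleotides but encodes the same amino acid sequence, because most amino acids have several codons. The Python sketch below checks this by translating both sequences with the standard genetic code (NCBI translation table 1); the short coding sequences are hypothetical and are not taken from SEQ ID NO: 22 to SEQ ID NO: 42.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def encodes_same_protein(cds_a: str, cds_b: str) -> bool:
    """True if the two coding sequences are degenerate variants of each other."""
    return translate(cds_a) == translate(cds_b)


print(translate("ATGAAACTG"))                          # MKL
print(encodes_same_protein("ATGAAACTG", "ATGAAGTTA"))  # True (AAA/AAG = K, CTG/TTA = L)
print(encodes_same_protein("ATGAAACTG", "ATGAAACCG"))  # False (CCG = P)
```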
In a further embodiment, the isolated nucleic acid molecule further encodes a single stranded guide RNA of the fourth aspect of the invention corresponding to the Cas9 protein.
As one example, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 20, or a homologue, conjugate or fusion protein thereof (for example, the nucleic acid sequence shown in any one of SEQ ID NO: 22 to SEQ ID NO: 41), and further comprises a nucleic acid sequence encoding a single-stranded guide RNA whose scaffold is the sequence shown in SEQ ID NO: 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 43 and retaining its biological activity, or a sequence engineered from SEQ ID NO: 43 and retaining its biological activity (for example, the nucleic acid sequence shown in SEQ ID NO: 45).
As another example, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or a homologue, conjugate or fusion protein thereof (for example, the nucleic acid sequence shown in SEQ ID NO: 42), and further comprises a nucleic acid sequence encoding a single-stranded guide RNA whose scaffold is the sequence shown in SEQ ID NO: 44, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 44 and retaining its biological activity, or a sequence engineered from SEQ ID NO: 44 and retaining its biological activity (for example, the nucleic acid sequence shown in SEQ ID NO: 46).
In a sixth aspect, the present invention provides an isolated nucleic acid molecule comprising a nucleic acid sequence encoding the single stranded guide RNA of the fourth aspect of the invention.
In one embodiment, the isolated nucleic acid molecule comprises the nucleic acid sequence shown in SEQ ID NO: 45 or SEQ ID NO: 46, or a degenerate sequence thereof.
In a preferred embodiment, the isolated nucleic acid molecule further comprises a nucleic acid sequence encoding a CRISPR spacer.
Once introduced into a cell by means known in the art, for example via an expression vector, the isolated nucleic acid molecule of the invention can express the Cas9 protein, its conjugate or fusion protein, and/or the single-stranded guide RNA described above, which then perform their corresponding functions, e.g., gene editing, in the cell.
In addition, the isolated nucleic acid molecule of the present invention can express the Cas9 protein (or its conjugate or fusion protein) and the single-stranded guide RNA separately, or express them together from a single molecule; which expression mode to use is determined case by case.
The expression products have the corresponding functions described above, which are not repeated here for brevity.
In a seventh aspect, the present invention provides a vector comprising a nucleic acid sequence encoding:
a) a Cas9 protein, the Cas9 protein being:
1) Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO:1,
CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:2,
CspCas9 protein having the amino acid sequence shown in SEQ ID NO:3,
Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO:4,
Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:5,
HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:6,
HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:7,
Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO:8,
Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:9,
Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:10,
Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:11,
Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:12,
Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:13,
Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:14,
Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:15,
Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:16,
Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:17,
Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO:18,
Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO:19,
Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:20, or
Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21,
or
2) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to any one of SEQ ID NO:1 to SEQ ID NO:21 and retaining its biological activity;
b) a conjugate of the second aspect of the invention; or
c) a fusion protein of the third aspect of the invention.
In one embodiment, the vector comprises the nucleic acid sequence shown in any one of SEQ ID NO:22 to SEQ ID NO:42, or a degenerate sequence thereof.
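A "degenerate sequence" of a coding sequence uses different synonymous codons but encodes the same amino acid sequence. The Python sketch below checks this property by translating both sequences. It assumes the standard genetic code (NCBI translation table 1) and in-frame input without introns; the function names are illustrative and not part of the source.

```python
# Minimal sketch: decide whether two coding sequences are degenerate variants,
# i.e. translate to the same amino acid sequence.
# ASSUMPTIONS: standard genetic code (NCBI table 1); in-frame DNA input.

BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("length is not a multiple of 3")
    try:
        return "".join(CODON_TABLE[cds[n:n + 3]] for n in range(0, len(cds), 3))
    except KeyError as err:
        raise ValueError(f"invalid codon {err.args[0]!r}") from None


def encodes_same_protein(cds_a: str, cds_b: str) -> bool:
    """True if both sequences translate to the same amino acid sequence."""
    return translate(cds_a) == translate(cds_b)
```

For example, `ATGGCTAAA` and `ATGGCCAAG` both translate to `MAK`, so `encodes_same_protein` returns `True` for that pair. Codon-optimized variants for a particular host would typically be compared the same way.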
The vector may be an expression vector, such as a plasmid vector (e.g., the pUC19 vector), an attachment vector, the pAAV2_ITR vector, a retroviral vector, a lentiviral vector, an adenoviral vector, or an adeno-associated viral vector.
In yet another embodiment, the vector further comprises a nucleic acid sequence encoding the single-stranded guide RNA of the fourth aspect of the invention that corresponds to the Cas9 protein.
As one example, the vector comprises a nucleic acid sequence encoding the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO:1 to SEQ ID NO:20, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in any one of SEQ ID NO:22 to SEQ ID NO:41), and a nucleic acid sequence encoding a single-stranded guide RNA that comprises the scaffold sequence shown in SEQ ID NO:43, a homologous sequence having at least 90% sequence identity to SEQ ID NO:43 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:43 that retains its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO:45).
As one example, the vector comprises a nucleic acid sequence encoding the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in SEQ ID NO:42), and a nucleic acid sequence encoding a single-stranded guide RNA that comprises the scaffold sequence shown in SEQ ID NO:44, a homologous sequence having at least 90% sequence identity to SEQ ID NO:44 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:44 that retains its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO:46).
In an eighth aspect, the present invention provides a vector comprising a nucleic acid sequence encoding the single-stranded guide RNA of the fourth aspect of the invention.
In one embodiment, the vector comprises SEQ ID NO:45 and SEQ ID NO:46 or a degenerate sequence thereof.
In a preferred embodiment, the vector further comprises a nucleic acid sequence encoding a CRISPR spacer.
As can be seen from the above description, after the vector of the present invention is transfected into cells, the nucleic acid sequences cloned in the vector can be expressed to produce the Cas9 protein, its conjugate or fusion protein, and/or the single-stranded guide RNA described above, which perform their corresponding function, e.g., gene editing, in those cells.
In addition, multiple vectors, e.g., two vectors, can be transfected into the cell, one expressing the Cas9 protein or its conjugate or fusion protein and the other expressing the single-stranded guide RNA. The expressed Cas9 protein (or its conjugate or fusion protein) then binds the expressed single-stranded guide RNA to form a complex, which performs the corresponding function, such as gene editing.
Of course, the nucleic acid sequence encoding the Cas9 protein (or its conjugate or fusion protein) and the nucleic acid sequence encoding the single-stranded guide RNA can also be cloned into the same vector, so that transfecting this single vector into a cell expresses both the Cas9 protein (or its conjugate or fusion protein) and the single-stranded guide RNA, which then perform the corresponding function, e.g., gene editing, in the cell.
CRISPR/Cas9 gene editing system
In a ninth aspect, the present invention provides a CRISPR/Cas9 gene editing system comprising:
a) a protein component comprising:
1) a Cas9 protein, the Cas9 protein being:
1.1) Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO:1,
CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:2,
CspCas9 protein having the amino acid sequence shown in SEQ ID NO:3,
Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO:4,
Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:5,
HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:6,
HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:7,
Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO:8,
Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:9,
Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:10,
Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:11,
Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:12,
Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:13,
Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:14,
Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:15,
Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:16,
Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:17,
Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO:18,
Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO:19,
Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:20, or
Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21,
or
1.2) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to any one of SEQ ID NO:1 to SEQ ID NO:21 and retaining its biological activity;
2) a conjugate of the second aspect of the invention; or
3) a fusion protein of the third aspect of the invention; and
b) a nucleic acid component comprising the single-stranded guide RNA of the fourth aspect of the present invention that corresponds to the protein component in a);
wherein the protein component and the nucleic acid component bind to each other to form a complex.
As an example, the protein component comprises the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO:1 to SEQ ID NO:20, and the nucleic acid component comprises a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO:43, a single-stranded guide RNA comprising a homologous sequence that has at least 90% sequence identity to SEQ ID NO:43 and retains its biological activity, or a single-stranded guide RNA comprising an engineered sequence derived from SEQ ID NO:43 that retains its biological activity.
As an example, the protein component comprises the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21, and the nucleic acid component comprises a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO:44, a single-stranded guide RNA comprising a homologous sequence that has at least 90% sequence identity to SEQ ID NO:44 and retains its biological activity, or a single-stranded guide RNA comprising an engineered sequence derived from SEQ ID NO:44 that retains its biological activity.
The expression "at least 90% sequence identity" used above for single-stranded guide RNAs may be, for example, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.9%, or 100% sequence identity.
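Percent identity thresholds such as "at least 90%" or "at least 80%" depend on how identity is computed. The Python sketch below shows one common convention on an already-computed pairwise alignment: identical columns divided by alignment length, with gap columns (`-`) counted as non-identical. This convention is an assumption; the source does not state which alignment method or identity definition applies, and tools such as global aligners may report different values for the same pair.

```python
# Minimal sketch: percent identity over a pre-computed pairwise alignment.
# ASSUMPTIONS: both strings are already aligned to equal length, '-' marks a
# gap, and every column (gaps included) counts toward the denominator.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if identity is at least `threshold` percent (e.g. 90 or 80)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this convention, two aligned 10-nt sequences differing at one position have 90% identity and meet the 90% threshold, while a 4-nt pair differing at one position (75%) does not.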
The CRISPR/Cas9 gene editing system of the present invention can be assembled directly from the Cas9 protein described herein (or its homologue, or their conjugate or fusion protein) and the single-stranded guide RNA described herein, or can consist of the products expressed from the vector described herein.
Through the combined action of the Cas9 protein and the single-stranded guide RNA it contains, the CRISPR/Cas9 gene editing system recognizes, locates and cleaves a target sequence and thereby edits it.
The CRISPR/Cas9 gene editing system of the invention can precisely locate a target sequence. Here, "precisely locate" has two meanings: first, the system can itself recognize and bind the target sequence; second, the system can bring other proteins fused to the Cas9 protein, or proteins that specifically recognize the sgRNA, to the position of the target sequence.
Some CRISPR/Cas9 gene editing systems of the invention have low tolerance for non-target sequences. Here, "low tolerance" means that the system is substantially or completely unable to recognize and bind non-target sequences, or to bring other proteins fused to the Cas9 protein, or proteins that specifically recognize the sgRNA, to the positions of non-target sequences.
Cells
In a tenth aspect, the present invention provides a cell comprising: the isolated nucleic acid molecule of the fifth and sixth aspects of the invention, or the vector of the seventh and eighth aspects of the invention.
For example, the cell may be a prokaryotic cell or a eukaryotic cell. A eukaryotic cell may be, for example, a plant cell or an animal cell, and an animal cell may be, for example, a mammalian cell such as a human cell.
Method
In an eleventh aspect, the present invention provides a method of gene editing a target sequence in an intracellular or in vitro environment, the method comprising contacting any one of the following (1) to (4) with the target sequence in the intracellular or in vitro environment:
(1) a Cas9 protein, a conjugate of the second aspect of the invention or a fusion protein of the third aspect of the invention, and a single-stranded guide RNA of the fourth aspect of the invention corresponding to the Cas9 protein,
wherein the Cas9 protein is:
1) Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO:1,
CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:2,
CspCas9 protein having the amino acid sequence shown in SEQ ID NO:3,
Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO:4,
Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:5,
HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:6,
HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:7,
Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO:8,
Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:9,
Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:10,
Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:11,
Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:12,
Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:13,
Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:14,
Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:15,
Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:16,
Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:17,
Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO:18,
Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO:19,
Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:20, or
Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21,
or
2) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to any one of SEQ ID NO:1 to SEQ ID NO:21 and retaining its biological activity;
(2) the vectors of the seventh and eighth aspects of the invention;
(3) a vector of the seventh aspect of the invention; or
(4) the CRISPR/Cas9 gene editing system of the ninth aspect of the invention;
wherein, upon contact with the target sequence, the Cas9 protein, homologue, conjugate or fusion protein recognizes its respective protospacer adjacent motif (PAM) located at the 3' end of the target sequence; for the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9 (including Hsp1-Hsp3Cas9-Y446A and Hsp1-Hsp3Cas9-K390A-Y446A), Hsp1-Hsp4Cas9 and Nsp2Cas9 proteins, or their homologues, conjugates or fusion proteins, the PAM is 5'-NNNNCY, 5'-NNCNA, 5'-NNNNCYWT, 5'-NNNGCCKS, 5'-NNNGG, 5'-NNNNCCC, 5'-NNRTTA, 5'-NNRAA, 5'-NNRYAT, 5'-NNNTCC, 5'-NNRT, 5'-NNCNA, 5'-NNNGG, 5'-NNCCAW, 5'-NNRTYR, 5'-NNRYAT, 5'-NNNNCY, 5'-NNRT and 5'-NNCC, respectively.
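The PAMs above are written in IUPAC nucleotide codes (N = any base, R = A/G, Y = C/T, W = A/T, K = G/T, S = G/C). The Python sketch below shows how such a pattern can be matched against DNA and how candidate protospacers lying immediately 5' of a PAM can be listed. It is an illustration only: it scans just the strand it is given (no reverse complement), and `protospacer_len` is a free parameter, since this section does not state the spacer length used with each protein. Function names are hypothetical.

```python
# Minimal sketch: match IUPAC-coded PAM patterns and list candidate
# 5'-protospacer-PAM-3' sites on a single DNA strand.
# ASSUMPTIONS: only the input strand is scanned; protospacer_len is chosen by
# the caller for illustration and is not taken from the source.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pattern: str, site: str) -> bool:
    """True if DNA `site` satisfies the IUPAC `pattern`, e.g. 'NNNNCY'."""
    return len(pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), site.upper())
    )


def find_target_sites(dna: str, pattern: str, protospacer_len: int):
    """Return (start, protospacer, pam) for each protospacer followed by a PAM."""
    dna = dna.upper()
    hits = []
    for i in range(protospacer_len, len(dna) - len(pattern) + 1):
        pam = dna[i:i + len(pattern)]
        if pam_matches(pattern, pam):
            hits.append((i - protospacer_len, dna[i - protospacer_len:i], pam))
    return hits
```

For instance, `pam_matches("NNNNCYWT", "GGGGCTAT")` is `True` (the pattern listed for CspCas9), while `pam_matches("NNNGG", "ACTGA")` is `False`. A practical search would also scan the reverse complement.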
For item (1) above:
as an example, the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO:1 to SEQ ID NO:20, or a homologue, conjugate or fusion protein thereof, together with a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO:43, a homologous sequence having at least 90% sequence identity to SEQ ID NO:43 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:43 that retains its biological activity;
as an example, the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21, or a homologue, conjugate or fusion protein thereof, together with a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO:44, a homologous sequence having at least 90% sequence identity to SEQ ID NO:44 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:44 that retains its biological activity.
For item (2) above:
as an example, a vector comprising a nucleic acid sequence encoding the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO:1 to SEQ ID NO:20, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in any one of SEQ ID NO:22 to SEQ ID NO:41), and a vector comprising a nucleic acid sequence encoding a single-stranded guide RNA that comprises the scaffold sequence shown in SEQ ID NO:43, a homologous sequence having at least 90% sequence identity to SEQ ID NO:43 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:43 that retains its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO:45);
as an example, a vector comprising a nucleic acid sequence encoding the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in SEQ ID NO:42), and a vector comprising a nucleic acid sequence encoding a single-stranded guide RNA that comprises the scaffold sequence shown in SEQ ID NO:44, a homologous sequence having at least 90% sequence identity to SEQ ID NO:44 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:44 that retains its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO:46).
In one embodiment, the cell is a prokaryotic cell or a eukaryotic cell, such as a plant cell or an animal cell, such as a mammalian cell, e.g., a human cell.
In one embodiment, the gene editing comprises one or more of gene knockout, site-directed base alteration, site-directed insertion, regulation of gene transcription level, regulation of DNA methylation, DNA acetylation modification, histone acetylation modification, single-base conversion, and chromatin imaging and tracking of the target sequence.
Further, in one embodiment, the single-base conversion comprises conversion of adenine to guanine, cytosine to thymine, or cytosine to uracil.
In one embodiment of the method, the CRISPR spacer sequence of the single-stranded guide RNA forms a fully base-paired structure with the target sequence and an incompletely base-paired structure with a non-target sequence.
Herein, an incompletely base-paired structure is one in which some bases are complementarily paired and others are not, the unpaired positions including, for example, base mismatches and/or base bulges.
In one embodiment, the incomplete base-complementary pairing structure comprises one or more, e.g., two or more, base mismatches.
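The distinction between complete and incomplete pairing can be made concrete by counting mismatches. The Python sketch below compares a spacer with a protospacer position by position. It assumes gap-free, equal-length sequences, so base bulges are not modelled, and it assumes the protospacer is written 5' to 3' on the PAM-containing strand; full complementarity of the spacer to the opposite strand then appears here as identity, with U read as T. Function names are hypothetical.

```python
# Minimal sketch: locate spacer/protospacer mismatches position by position.
# ASSUMPTIONS: no bulges (equal lengths, gap-free); the protospacer is given
# 5'->3' on the PAM-containing strand, so full complementarity of the spacer
# to the opposite strand appears here as identity (U read as T).
from __future__ import annotations


def mismatch_positions(spacer_rna: str, protospacer_dna: str) -> list[int]:
    """0-based positions where the spacer and protospacer differ."""
    spacer = spacer_rna.upper().replace("U", "T")
    proto = protospacer_dna.upper()
    if len(spacer) != len(proto):
        raise ValueError("bulges are not modelled; lengths must match")
    return [i for i, (s, p) in enumerate(zip(spacer, proto)) if s != p]


def pairing_type(spacer_rna: str, protospacer_dna: str) -> str:
    """'complete' when no mismatch is found, otherwise 'incomplete (...)'."""
    n = len(mismatch_positions(spacer_rna, protospacer_dna))
    return "complete" if n == 0 else f"incomplete ({n} mismatch(es))"
```

For example, the spacer `ACGUACGU` against the site `ACGAACGA` yields mismatches at positions 3 and 7, a two-mismatch incomplete pairing of the kind referred to above.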
The Cas9 protein of the present invention can thus cleave the target site in the target sequence, producing a double-strand break. When the method is performed in a cell, the cleaved target sequence can be repaired by the cell's non-homologous end joining or homologous recombination repair pathway, thereby achieving gene editing of the target sequence.
Experiments show that the CRISPR/Cas9 gene editing systems of the invention, and gene editing methods using them, achieve editing efficiencies of 6% to 83%. In addition, the CRISPR/Hsp1-Hsp3Cas9-K390A-Y446A gene editing system shows close to 0% tolerance of double-base mismatches between the guide RNA and the target. The gene editing system can therefore edit target genes with high specificity, combining high editing efficiency with a low off-target rate, and can be widely applied to gene editing in cells or in an in vitro environment.
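Figures such as editing efficiency and mismatch tolerance are ratios, and the Python sketch below shows one common way to compute them. Both definitions are assumptions, not taken from the source: editing efficiency is taken as edited reads divided by total reads (e.g., from amplicon sequencing), and mismatch tolerance as the efficiency obtained with a mismatched guide relative to that of the fully matched guide. Any numbers passed to these functions in practice would come from the reader's own data.

```python
# Minimal sketch of two read-out metrics.
# ASSUMPTIONS (not from the source): editing efficiency = edited reads /
# total reads; mismatch tolerance = efficiency with a mismatched guide
# relative to efficiency with the fully matched guide, both in percent.


def editing_efficiency(edited_reads: int, total_reads: int) -> float:
    """Percentage of reads carrying an edit at the target site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= edited_reads <= total_reads:
        raise ValueError("edited_reads must lie in [0, total_reads]")
    return 100.0 * edited_reads / total_reads


def mismatch_tolerance(eff_mismatched: float, eff_matched: float) -> float:
    """Mismatched-guide efficiency as a percentage of matched-guide efficiency."""
    if eff_matched <= 0:
        raise ValueError("matched-guide efficiency must be positive")
    return 100.0 * eff_mismatched / eff_matched
```

With this definition, a tolerance near 0% means the mismatched guide supports almost no editing compared with the matched guide.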
Reagent kit
In a twelfth aspect, the present invention provides a kit for gene editing of a target sequence in a cellular or in vitro environment, comprising:
a) any one selected from the following 1) to 6):
1) a Cas9 protein or homologue thereof, a conjugate of the second aspect of the invention, or a fusion protein of the third aspect of the invention, and a single-stranded guide RNA of the fourth aspect of the invention corresponding to the Cas9 protein,
wherein the Cas9 protein is:
1.1) Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO:1,
CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:2,
CspCas9 protein having the amino acid sequence shown in SEQ ID NO:3,
Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO:4,
Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:5,
HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:6,
HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:7,
Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO:8,
Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:9,
Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:10,
Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:11,
Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO:12,
Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO:13,
Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO:14,
Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO:15,
Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:16,
Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO:17,
Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO:18,
Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO:19,
Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO:20, or
Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21,
or
1.2) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999%, or 100% sequence identity to any one of SEQ ID NO:1 to SEQ ID NO:21 and retaining its biological activity;
2) the isolated nucleic acid molecules of the fifth and sixth aspects of the invention;
3) an isolated nucleic acid molecule of the fifth aspect of the invention;
4) the vectors of the seventh and eighth aspects of the invention;
5) a vector of the seventh aspect of the invention; or
6) the CRISPR/Cas9 gene editing system of the ninth aspect of the invention;
and
b) instructions for how to perform gene editing of a target sequence in an intracellular or in vitro environment.
For item 1) above:
as an example, the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO:1 to SEQ ID NO:20, a homologue having an amino acid sequence with at least 80% sequence identity to any one of SEQ ID NO:1 to SEQ ID NO:20, or a conjugate or fusion protein thereof, together with a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO:43, a single-stranded guide RNA comprising a homologous sequence that has at least 90% sequence identity to SEQ ID NO:43 and retains its biological activity, or a single-stranded guide RNA comprising an engineered sequence derived from SEQ ID NO:43 that retains its biological activity;
as an example, the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21, a homologue having an amino acid sequence with at least 80% sequence identity to SEQ ID NO:21, or a conjugate or fusion protein thereof, together with a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO:44, a single-stranded guide RNA comprising a homologous sequence that has at least 90% sequence identity to SEQ ID NO:44 and retains its biological activity, or a single-stranded guide RNA comprising an engineered sequence derived from SEQ ID NO:44 that retains its biological activity.
For item 2) above:
as an example, an isolated nucleic acid molecule comprising a nucleic acid sequence encoding the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO:1 to SEQ ID NO:20, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in any one of SEQ ID NO:22 to SEQ ID NO:41), and an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a single-stranded guide RNA that comprises the scaffold sequence shown in SEQ ID NO:43, a homologous sequence having at least 90% sequence identity to SEQ ID NO:43 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:43 that retains its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO:45);
as an example, an isolated nucleic acid molecule comprising a nucleic acid sequence encoding the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO:21, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in SEQ ID NO:42), and an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a single-stranded guide RNA that comprises the scaffold sequence shown in SEQ ID NO:44, a homologous sequence having at least 90% sequence identity to SEQ ID NO:44 and retaining its biological activity, or an engineered sequence derived from SEQ ID NO:44 that retains its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO:46).
For item 4) above:
As an example: a vector comprising a nucleic acid sequence encoding the Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 20, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in any one of SEQ ID NO: 22 to SEQ ID NO: 41), together with a vector comprising a nucleic acid sequence encoding a single-stranded guide RNA whose scaffold is the sequence shown in SEQ ID NO: 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 43 and retaining its biological activity, or an engineered sequence obtained by modifying SEQ ID NO: 43 and retaining its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO: 45);
As another example: a vector comprising a nucleic acid sequence encoding the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or a homologue, conjugate or fusion protein thereof, together with a vector comprising a nucleic acid sequence encoding a single-stranded guide RNA whose scaffold is the sequence shown in SEQ ID NO: 44, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 44 and retaining its biological activity, or an engineered sequence obtained by modifying SEQ ID NO: 44 and retaining its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO: 46).
Of course, those skilled in the art will understand that the kits of the invention may also include other reagents that facilitate gene editing.
Brief description of the sequences involved in the invention
SEQ ID NO: 1: Cco2Cas9 protein sequence
SEQ ID NO: 2: CcuCas9 protein sequence
SEQ ID NO: 3: CspCas9 protein sequence
SEQ ID NO: 4: Hap1Cas9 protein sequence
SEQ ID NO: 5: Hap2Cas9 protein sequence
SEQ ID NO: 6: HgaCas9 protein sequence
SEQ ID NO: 7: HtyCas9 protein sequence
SEQ ID NO: 8: Hsp1Cas9 protein sequence
SEQ ID NO: 9: Hsp2Cas9 protein sequence
SEQ ID NO: 10: Hsp3Cas9 protein sequence
SEQ ID NO: 11: Hsp4Cas9 protein sequence
SEQ ID NO: 12: Hsp1-CcuCas9 protein sequence
SEQ ID NO: 13: Hsp1-Hap2Cas9 protein sequence
SEQ ID NO: 14: Hsp1-HgaCas9 protein sequence
SEQ ID NO: 15: Hsp1-HtyCas9 protein sequence
SEQ ID NO: 16: Hsp1-Hsp2Cas9 protein sequence
SEQ ID NO: 17: Hsp1-Hsp3Cas9 protein sequence
SEQ ID NO: 18: Hsp1-Hsp3Cas9-Y446A protein sequence
SEQ ID NO: 19: Hsp1-Hsp3Cas9-K390A-Y446A protein sequence
SEQ ID NO: 20: Hsp1-Hsp4Cas9 protein sequence
SEQ ID NO: 21: Nsp2Cas9 protein sequence
SEQ ID NO: 22: coding sequence of Cco2Cas9 protein
SEQ ID NO: 23: coding sequence of CcuCas9 protein
SEQ ID NO: 24: coding sequence of CspCas9 protein
SEQ ID NO: 25: coding sequence of Hap1Cas9 protein
SEQ ID NO: 26: coding sequence of Hap2Cas9 protein
SEQ ID NO: 27: coding sequence of HgaCas9 protein
SEQ ID NO: 28: coding sequence of HtyCas9 protein
SEQ ID NO: 29: coding sequence of Hsp1Cas9 protein
SEQ ID NO: 30: coding sequence of Hsp2Cas9 protein
SEQ ID NO: 31: coding sequence of Hsp3Cas9 protein
SEQ ID NO: 32: coding sequence of Hsp4Cas9 protein
SEQ ID NO: 33: coding sequence of Hsp1-CcuCas9 protein
SEQ ID NO: 34: coding sequence of Hsp1-Hap2Cas9 protein
SEQ ID NO: 35: coding sequence of Hsp1-HgaCas9 protein
SEQ ID NO: 36: coding sequence of Hsp1-HtyCas9 protein
SEQ ID NO: 37: coding sequence of Hsp1-Hsp2Cas9 protein
SEQ ID NO: 38: coding sequence of Hsp1-Hsp3Cas9 protein
SEQ ID NO: 39: coding sequence of Hsp1-Hsp3Cas9-Y446A protein
SEQ ID NO: 40: coding sequence of Hsp1-Hsp3Cas9-K390A-Y446A protein
SEQ ID NO: 41: coding sequence of Hsp1-Hsp4Cas9 protein
SEQ ID NO: 42: coding sequence of Nsp2Cas9 protein
SEQ ID NO: 43: scaffold sequence for use with the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins
SEQ ID NO: 44: scaffold sequence for use with the Nsp2Cas9 protein
SEQ ID NO: 45: DNA sequence of the scaffold sequence of the single-stranded guide RNA for the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins
SEQ ID NO: 46: DNA sequence of scaffold sequence of single-stranded guide RNA related to Nsp2Cas9 protein
Examples
The invention will now be described with reference to the following examples which are intended to illustrate, but not to limit the invention. It will be appreciated by those skilled in the art that the examples provided herein are for the purpose of describing the invention in detail only and are not intended to limit the scope of the invention as claimed.
Unless otherwise indicated, the experiments and procedures in the examples were performed essentially according to conventional methods well known in the art and described in various references. Where specific conditions are not given in the examples, conventional conditions or the conditions recommended by the manufacturer were used. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
Example 1
(1) Construction of plasmid pAAV2_Cas9_ITR
The amino acid sequences of the Cas9 proteins were downloaded using the accession numbers listed in Table 1. The amino acid sequences of the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 and Nsp2Cas9 proteins are shown in SEQ ID NO: 1 to SEQ ID NO: 21, respectively.
Table 1. Cas9 proteins, their NCBI protein accession numbers and SEQ ID NOs
The coding nucleic acid sequence of each Cas9 protein was codon-optimized to obtain gene sequences for high-level expression in human cells. The optimized gene sequences of the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A, Hsp1-Hsp4Cas9 and Nsp2Cas9 proteins are shown in SEQ ID NO: 22 to SEQ ID NO: 42, respectively.
The highly expressed gene sequences of the Cas9 proteins obtained above (SEQ ID NO: 22 to SEQ ID NO: 42) were each cloned into the stuggCas9 backbone plasmid (Addgene, catalog #163793) to obtain plasmid pAAV2_Cas9_ITR.
(2-1) Construction of plasmid pSK-U6-Cj-tracr
The pSKB- plasmid (Addgene, catalog #62540) was digested with the restriction enzymes ClaI and XhoI in the following reaction: μg of plasmid pSKB-, 5 μL of 10× CutSmart buffer (NEB), 1 μL of ClaI and 1 μL of XhoI (NEB), and water to a final volume of 50 μL. The digestion was incubated overnight at 37 ℃.
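The digestion set-up fixes the buffer, enzyme and final volumes, but the plasmid amount is not stated in this text. A minimal Python sketch of how the water top-up could be computed, assuming a hypothetical plasmid mass and stock concentration (the helper `digestion_recipe` and the example numbers are illustrative only):

```python
from dataclasses import dataclass


@dataclass
class Digestion:
    plasmid_ul: float
    buffer_ul: float
    enzyme_ul: float
    water_ul: float


def digestion_recipe(plasmid_ug: float, stock_ng_per_ul: float,
                     total_ul: float = 50.0, buffer_ul: float = 5.0,
                     enzymes_ul: tuple[float, ...] = (1.0, 1.0)) -> Digestion:
    """Volumes for a 50 uL digest (defaults: 10x buffer + two 1 uL enzymes)."""
    plasmid_ul = plasmid_ug * 1000.0 / stock_ng_per_ul  # ug -> ng, then ng / (ng/uL)
    enzyme_ul = sum(enzymes_ul)
    # Assumption: enzymes ship in 50% glycerol, so keep them at <= 10% of the volume.
    if enzyme_ul > 0.1 * total_ul:
        raise ValueError("enzyme volume exceeds 10% of the reaction")
    water_ul = total_ul - plasmid_ul - buffer_ul - enzyme_ul
    if water_ul < 0:
        raise ValueError("plasmid stock too dilute for this reaction volume")
    return Digestion(plasmid_ul, buffer_ul, enzyme_ul, water_ul)


# Hypothetical example: 2 ug of plasmid from a 500 ng/uL miniprep.
recipe = digestion_recipe(2.0, 500.0)
print(recipe)  # Digestion(plasmid_ul=4.0, buffer_ul=5.0, enzyme_ul=2.0, water_ul=39.0)
```

For the single-enzyme BbsI or BspQI digests described later, the same helper would be called with `enzymes_ul=(1.0,)`.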
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
A 2945 bp DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water.
The scaffold sequences (DNA sequence shown in SEQ ID NO: 45) of the single-stranded guide RNAs for the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins were obtained by gene synthesis and cloned into the linearized pSKB- backbone to obtain plasmid pSK-U6-Cj-tracr.
(2-2) Construction of plasmid pU6-Nme2-tracr
The Nme2Cas9_AAV plasmid (Addgene, catalog #119924) contains the scaffold sequence of the single-stranded guide RNA for the Nsp2Cas9 protein (DNA sequence shown in SEQ ID NO: 46). The Nme2Cas9_AAV plasmid was linearized by PCR using the primers GGCGGTACTATGTAGATGAGGGCCGCAGGAACCCCTAG and CTCATCTACATAGTACCGCCTCCA.
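The two linearization primers appear to be designed so that the ends of the PCR product overlap, which is what a homologous-recombination assembly step needs to close the plasmid again. This Python sketch checks that reading by comparing the 5' end of the first primer with the reverse complement of the second; the interpretation and the helper names are ours, not stated in the text.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

PRIMER_1 = "GGCGGTACTATGTAGATGAGGGCCGCAGGAACCCCTAG"
PRIMER_2 = "CTCATCTACATAGTACCGCCTCCA"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_overlap(primer_1: str, primer_2: str) -> str:
    """Longest sequence that both starts the PCR product (primer 1)
    and ends it (reverse complement of primer 2)."""
    product_end = revcomp(primer_2)
    best = ""
    for k in range(1, min(len(primer_1), len(product_end)) + 1):
        if product_end.endswith(primer_1[:k]):
            best = primer_1[:k]
    return best


overlap = terminal_overlap(PRIMER_1, PRIMER_2)
print(len(overlap), overlap)  # 20 GGCGGTACTATGTAGATGAG
```

A 20 nt direct repeat at both ends of the linear product is consistent with the homologous recombination step that follows recircularizing the fragment.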
The reaction system is as follows:
The PCR program was as follows:
The PCR product was separated by electrophoresis on a 1% agarose gel at 120 V for 30 min, and a 3368 bp DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water. The DNA concentration was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored at -20 ℃ for long-term use.
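The assembly step that follows is set up in the proportions required by the manufacturer's instructions, and such proportions are usually expressed in pmol rather than ng. A minimal sketch of the standard mass-to-moles conversion for double-stranded DNA (average of 650 Da per base pair), applied to the 3368 bp linearized fragment; the 100 ng input is a hypothetical amount.

```python
AVG_BP_MASS_DA = 650.0  # average mass of one dsDNA base pair (g/mol)


def ng_to_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass in ng to pmol."""
    # mol = g / (g/mol); going from ng to pmol introduces a factor of 1e-9 / 1e-12 = 1e3
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS_DA)


def ng_for_pmol(target_pmol: float, length_bp: int) -> float:
    """Mass in ng needed to supply target_pmol of a dsDNA fragment."""
    return target_pmol * length_bp * AVG_BP_MASS_DA / 1000.0


FRAGMENT_BP = 3368  # linearized Nme2Cas9_AAV PCR product
print(round(ng_to_pmol(100.0, FRAGMENT_BP), 4))  # 0.0457
```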
The linearized Nme2Cas9_AAV fragment was subjected to homologous recombination in the proportions required by the manufacturer's instructions, using NEBuilder® HiFi DNA Assembly Master Mix (NEB). The reaction system was as follows:
The reaction conditions were as follows:
The recombination product was added to E. coli DH5α competent cells (purchased from Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 hour to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing ampicillin and incubated upside down at 37 ℃ to obtain single E. coli DH5α colonies for Sanger sequencing.
Clones confirmed by sequencing to be correctly assembled were grown in liquid culture, and plasmid DNA was extracted to obtain plasmid pU6-Nme2-tracr for later use.
(3-1) Preparation of linearized plasmid pSK-U6-Cj-tracr
Plasmid pSK-U6-Cj-tracr was digested with the restriction enzyme BbsI in the following reaction: μg of plasmid pSK-U6-Cj-tracr, 5 μL of 10× CutSmart buffer (NEB), 1 μL of BbsI (NEB), and water to a final volume of 50 μL. The digestion was incubated at 37 ℃ for 2 hours.
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
The DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water. This fragment is the linearized plasmid pSK-U6-Cj-tracr (3380 bp).
The concentration of the recovered linearized plasmid pSK-U6-Cj-tracr was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored at -20 ℃ for long-term use.
(3-2) Preparation of linearized plasmid pU6-Nme2-tracr
Plasmid pU6-Nme2-tracr was digested with the restriction enzyme BspQI in the following reaction: μg of plasmid pU6-Nme2-tracr, 5 μL of 10× NEBuffer 3.1 (NEB), 1 μL of BspQI (NEB), and water to a final volume of 50 μL. The digestion was incubated at 50 ℃ for 2 hours.
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
The DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water. This fragment is the linearized plasmid pU6-Nme2-tracr (3326 bp).
The concentration of the recovered linearized plasmid pU6-Nme2-tracr was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored at -20 ℃ for long-term use.
(4-1) Preparation of plasmid pSK-U6-Cj-sgRNA
A gRNA (GAGUAGAGGCGGCCACGACCUG) was designed. Sticky-end sequences matching the two ends of the linearized plasmid pSK-U6-Cj-tracr were added to the sense and antisense strands of the designed gRNA sequence, and the two resulting single-stranded oligonucleotides were synthesized. Their sequences are as follows (lower-case letters indicate the sticky-end sequences):
Oligo-F:tttgAGTAGAGGCGGCCACGACCTGGT
Oligo-R:taaaACCAGGTCGTGGCCGCCTCTACT
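The two oligonucleotides are written with their sticky ends in lower case. A small Python check, assuming everything after the lower-case prefix is meant to base-pair, confirms that the upper-case parts are exact reverse complements and reports the 4 nt overhangs of the annealed duplex (4 nt is also the length of the 5' overhang BbsI leaves, consistent with step (3-1)). The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_sticky(oligo: str) -> tuple[str, str]:
    """Split an oligo into (lower-case sticky end, upper-case annealing core)."""
    n = len(oligo) - len(oligo.lstrip("acgt"))
    return oligo[:n], oligo[n:]


def check_duplex(oligo_f: str, oligo_r: str) -> tuple[str, str, int]:
    """Return both overhangs and the duplex length; raise if the cores don't pair."""
    over_f, core_f = split_sticky(oligo_f)
    over_r, core_r = split_sticky(oligo_r)
    if core_f != revcomp(core_r):
        raise ValueError("annealing cores are not reverse complements")
    return over_f, over_r, len(core_f)


result = check_duplex("tttgAGTAGAGGCGGCCACGACCTGGT", "taaaACCAGGTCGTGGCCGCCTCTACT")
print(result)  # ('tttg', 'taaa', 23)
```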
The single-stranded oligonucleotides were annealed to obtain double-stranded DNA. The annealing reaction was: μL of 100 μM Oligo-F, 1 μL of 100 μM Oligo-R and 28 μL of water. After mixing by vortexing, the reaction was placed in a PCR instrument and run with the following annealing program: 95 ℃ for 5 min; 85 ℃, 75 ℃, 65 ℃, 55 ℃, 45 ℃, 35 ℃ and 25 ℃ for 1 min each; hold at 4 ℃; ramp rate 0.3 ℃/s. The annealed product was then ligated into the linearized pSK-U6-Cj-tracr plasmid obtained in step (3-1) using DNA ligase (NEB).
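The annealing program is a stepped cool-down at a fixed ramp rate. As a sketch, assuming the instrument ramps linearly at 0.3 ℃/s between holds and starting the clock at the 95 ℃ hold, the run time until the 4 ℃ hold is reached can be estimated as follows:

```python
RAMP_C_PER_S = 0.3

# (temperature in deg C, hold time in s): 95 C for 5 min, then 85..25 C for 1 min each
STEPS = [(95, 300)] + [(t, 60) for t in range(85, 24, -10)]
FINAL_HOLD_C = 4  # indefinite storage hold, not counted


def program_seconds(steps, ramp, final_temp):
    """Total hold time plus linear ramp time between consecutive set points."""
    total = 0.0
    previous = None
    for temp, hold_s in steps:
        if previous is not None:
            total += abs(previous - temp) / ramp
        total += hold_s
        previous = temp
    total += abs(previous - final_temp) / ramp  # last ramp down to the 4 C hold
    return total


seconds = program_seconds(STEPS, RAMP_C_PER_S, FINAL_HOLD_C)
print(f"{seconds / 60:.1f} min")  # 17.1 min
```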
The ligation product was added to E. coli DH5α competent cells (purchased from Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 hour to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing the appropriate antibiotic and incubated upside down at 37 ℃, and the resulting single E. coli DH5α colonies were verified by Sanger sequencing.
Clones confirmed by sequencing to be correctly ligated were grown in liquid culture, and plasmid DNA was extracted to obtain plasmid pSK-U6-Cj-sgRNA, which expresses the target sgRNA, for later use.
(4-2) Preparation of plasmid pU6-Nme2-sgRNA
A gRNA (GAGUAGAGGCGGCCACGACCUG) was designed. Sticky-end sequences matching the two ends of the linearized plasmid pU6-Nme2-tracr were added to the sense and antisense strands of the designed gRNA sequence, and the two resulting single-stranded oligonucleotides were synthesized. Their sequences are as follows (lower-case letters indicate the sticky-end sequences):
Oligo-F:accGAGTAGAGGCGGCCACGACCTG
Oligo-R:aacCAGGTCGTGGCCGCCTCTACTC
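For the Nme2 vector, the upper-case part of Oligo-F should spell the designed gRNA in DNA form. A minimal sketch (hypothetical helper names) converts the gRNA to DNA, checks both oligos against it, and compares the 3 nt sticky ends with the 3 nt 5' overhang that BspQI leaves in step (3-2):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
OVERHANG_NT = {"BbsI": 4, "BspQI": 3}  # 5' overhang lengths of the two Type IIS enzymes


def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence as the equivalent DNA sequence (U -> T)."""
    return rna.upper().replace("U", "T")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def oligos_match_guide(guide_rna: str, oligo_f: str, oligo_r: str, enzyme: str) -> bool:
    """True if both oligos carry lower-case overhangs of the enzyme's length
    and their cores encode the guide (sense) and its reverse complement."""
    spacer = rna_to_dna(guide_rna)
    k = OVERHANG_NT[enzyme]
    sticky_f, core_f = oligo_f[:k], oligo_f[k:]
    sticky_r, core_r = oligo_r[:k], oligo_r[k:]
    return (sticky_f.islower() and sticky_r.islower()
            and core_f == spacer and core_r == revcomp(spacer))


ok = oligos_match_guide("GAGUAGAGGCGGCCACGACCUG",
                        "accGAGTAGAGGCGGCCACGACCTG",
                        "aacCAGGTCGTGGCCGCCTCTACTC",
                        "BspQI")
print(ok)  # True
```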
The single-stranded oligonucleotides were annealed to obtain double-stranded DNA. The annealing reaction was: μL of 100 μM Oligo-F, 1 μL of 100 μM Oligo-R and 28 μL of water. After mixing by vortexing, the reaction was placed in a PCR instrument and run with the following annealing program: 95 ℃ for 5 min; 85 ℃, 75 ℃, 65 ℃, 55 ℃, 45 ℃, 35 ℃ and 25 ℃ for 1 min each; hold at 4 ℃; ramp rate 0.3 ℃/s. The annealed product was then ligated into the linearized pU6-Nme2-tracr plasmid obtained in step (3-2) using DNA ligase (NEB).
The ligation product was added to E. coli DH5α competent cells (purchased from Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 hour to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing the appropriate antibiotic and incubated upside down at 37 ℃, and the resulting single E. coli DH5α colonies were verified by Sanger sequencing.
Clones confirmed by sequencing to be correctly ligated were grown in liquid culture, and plasmid DNA was extracted to obtain plasmid pU6-Nme2-sgRNA, which expresses the target sgRNA, for later use.
(5) The resulting sgRNA-expressing plasmid (pSK-U6-Cj-sgRNA or pU6-Nme2-sgRNA) and the Cas9-expressing plasmid pAAV2_Cas9_ITR were transfected by liposome-mediated transfection into the GFP reporter HEK293T cell line library containing the target sequence (GAGTAGAGGCGGCCACGACCTG).
The GFP reporter HEK293T cell line library containing the target sequence was constructed as follows. A 24 bp protospacer (the target sequence) and an 8 bp random sequence (the PAM sequence) were inserted between the ATG start codon and the GFP coding sequence, causing a frameshift so that GFP is not expressed. This insert-containing GFP gene, driven by the CMV promoter, was cloned into a lentiviral expression vector and randomly integrated into the genome of HEK293T cells by lentiviral transduction, yielding a stable GFP reporter cell line library. When a gene editing system cleaves the target sequence, the cells' own DNA repair restores the GFP reading frame in a fraction of the cells, which then fluoresce green; this indicates that the gene editing system can edit genes in mammalian cells.
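The reporter logic is reading-frame arithmetic. A sketch, taking the 24 bp protospacer and 8 bp random PAM stated above as the insert and ignoring any stop codons an indel might create, shows why the unedited reporter is out of frame and which small indels would put GFP back in frame; it also counts the PAM variants an 8 bp random sequence can encode. The function name is illustrative.

```python
PROTOSPACER_BP = 24
PAM_BP = 8
INSERT_BP = PROTOSPACER_BP + PAM_BP  # sequence placed between ATG and GFP


def in_frame(insert_bp: int, indel_bp: int = 0) -> bool:
    """True if ATG and the GFP coding sequence share a reading frame.

    indel_bp > 0 is an insertion, < 0 a deletion. Stop codons are ignored.
    """
    return (insert_bp + indel_bp) % 3 == 0


restoring = [d for d in range(-10, 11) if d != 0 and in_frame(INSERT_BP, d)]

print(in_frame(INSERT_BP))  # False: 32 bp is not a multiple of 3
print(restoring)            # [-8, -5, -2, 1, 4, 7, 10]
print(4 ** PAM_BP)          # 65536 possible 8 bp PAM sequences
```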
Transfection was performed as follows:
On day 0, the GFP reporter HEK293T cell line library containing the target sequence was seeded in 10 cm dishes at about 30% cell density, as required for transfection.
The GFP reporter HEK293T cell line library contains the nucleotide sequence CMV-ATG-target site-PAM-GFP, in which the PAM is an 8 bp random sequence and the target site is GAGTAGAGGCGGCCACGACCTG (Figure 1).
On day 1, transfection was performed as follows:
μg of the Cas9-expressing plasmid pAAV2_Cas9_ITR and 5 μg of the sgRNA-expressing plasmid pSK-U6-Cj-sgRNA or pU6-Nme2-sgRNA were added to 500 μL of Opti-MEM medium (Gibco) and mixed by gentle pipetting.
Lipofectamine™ 2000 (Invitrogen) or PEI (Polysciences) was mixed gently, and 30 μL of Lipofectamine 2000 or 15 μL of PEI (100 μM) was added to 500 μL of Opti-MEM medium, mixed gently, and left at room temperature for 5 min.
The diluted plasmids and the diluted transfection reagent were combined, mixed by gentle pipetting, and left at room temperature for 20 min. The mixture was then added to the culture medium of the GFP reporter HEK293T cell line library containing the target sequence, and the cells were returned to a 37 ℃, 5% CO2 incubator for further culture.
After 5 days in the CO2 incubator, editing of the target sequence in the HEK293T cell line library by each CRISPR/Cas9 system was examined under a fluorescence microscope; the results are shown in Figure 2. Library cells treated with the CRISPR/Cco2Cas9, CRISPR/CcuCas9, CRISPR/CspCas9, CRISPR/Hap1Cas9, CRISPR/Hap2Cas9, CRISPR/HgaCas9, CRISPR/HtyCas9, CRISPR/Hsp1Cas9, CRISPR/Hsp2Cas9, CRISPR/Hsp3Cas9, CRISPR/Hsp4Cas9, CRISPR/Hsp1-CcuCas9, CRISPR/Hsp1-Hap2Cas9, CRISPR/Hsp1-HgaCas9, CRISPR/Hsp1-HtyCas9, CRISPR/Hsp1-Hsp2Cas9, CRISPR/Hsp1-Hsp3Cas9, CRISPR/Hsp1-Hsp3Cas9-Y446A, CRISPR/Hsp1-Hsp3Cas9-K390A-Y446A, CRISPR/Hsp1-Hsp4Cas9 and CRISPR/Nsp2Cas9 systems all showed green fluorescence, indicating that each of these systems successfully edited the target sequence in the cells.
Example 2
(1) Construction of plasmid pAAV2_Cas9_ITR
The amino acid sequences were downloaded using the accession numbers of the Cas9 proteins listed in Table 1 above. The amino acid sequences of the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hsp1Cas9, Hsp3Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Nsp2Cas9 proteins are shown in SEQ ID NO: 1 to SEQ ID NO: 4, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 17 to SEQ ID NO: 19 and SEQ ID NO: 21, respectively.
The coding nucleic acid sequence of each Cas9 protein was codon-optimized to obtain gene sequences for high-level expression in human cells. The optimized gene sequences of the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hsp1Cas9, Hsp3Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Nsp2Cas9 proteins are shown in SEQ ID NO: 22 to SEQ ID NO: 25, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 38 to SEQ ID NO: 40 and SEQ ID NO: 42, respectively.
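The sequence listing follows a fixed pattern: each protein's codon-optimized coding sequence is numbered 21 higher than its amino acid sequence, and every protein except Nsp2Cas9 uses the SEQ ID NO: 43/45 scaffold, while Nsp2Cas9 uses SEQ ID NO: 44/46. A short Python lookup built only from that listing (function names are illustrative) reproduces the SEQ ID NO sets used in this example:

```python
PROTEIN_SEQ_ID = {
    "Cco2Cas9": 1, "CcuCas9": 2, "CspCas9": 3, "Hap1Cas9": 4, "Hap2Cas9": 5,
    "HgaCas9": 6, "HtyCas9": 7, "Hsp1Cas9": 8, "Hsp2Cas9": 9, "Hsp3Cas9": 10,
    "Hsp4Cas9": 11, "Hsp1-CcuCas9": 12, "Hsp1-Hap2Cas9": 13, "Hsp1-HgaCas9": 14,
    "Hsp1-HtyCas9": 15, "Hsp1-Hsp2Cas9": 16, "Hsp1-Hsp3Cas9": 17,
    "Hsp1-Hsp3Cas9-Y446A": 18, "Hsp1-Hsp3Cas9-K390A-Y446A": 19,
    "Hsp1-Hsp4Cas9": 20, "Nsp2Cas9": 21,
}


def coding_seq_id(protein: str) -> int:
    """SEQ ID NO of the codon-optimized coding sequence (SEQ ID NO: 22-42)."""
    return PROTEIN_SEQ_ID[protein] + 21


def scaffold_seq_ids(protein: str) -> tuple[int, int]:
    """(scaffold SEQ ID NO, scaffold DNA SEQ ID NO) for the protein's sgRNA."""
    return (44, 46) if protein == "Nsp2Cas9" else (43, 45)


EXAMPLE_2 = ["Cco2Cas9", "CcuCas9", "CspCas9", "Hap1Cas9", "Hsp1Cas9", "Hsp3Cas9",
             "Hsp1-Hsp3Cas9", "Hsp1-Hsp3Cas9-Y446A", "Hsp1-Hsp3Cas9-K390A-Y446A",
             "Nsp2Cas9"]

print([PROTEIN_SEQ_ID[p] for p in EXAMPLE_2])  # [1, 2, 3, 4, 8, 10, 17, 18, 19, 21]
print([coding_seq_id(p) for p in EXAMPLE_2])   # [22, 23, 24, 25, 29, 31, 38, 39, 40, 42]
```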
The highly expressed gene sequences of the Cas9 proteins obtained above (SEQ ID NO: 22 to SEQ ID NO: 25, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 38 to SEQ ID NO: 40 and SEQ ID NO: 42) were each cloned into the stuggCas9 backbone plasmid (Addgene, catalog #163793) to obtain plasmid pAAV2_Cas9_ITR.
(2-1) Construction of plasmid pSK-U6-Cj-tracr
The pSKB- plasmid (Addgene, catalog #62540) was digested with the restriction enzymes ClaI and XhoI in the following reaction: μg of plasmid pSKB-, 5 μL of 10× CutSmart buffer (NEB), 1 μL of ClaI and 1 μL of XhoI (NEB), and water to a final volume of 50 μL. The digestion was incubated overnight at 37 ℃.
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
A 2945 bp DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water.
The scaffold sequences (DNA sequence shown in SEQ ID NO: 45) of the single-stranded guide RNAs for the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hsp1Cas9, Hsp3Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A and Hsp1-Hsp3Cas9-K390A-Y446A proteins were obtained by gene synthesis and cloned into the linearized pSKB- backbone to obtain plasmid pSK-U6-Cj-tracr.
(2-2) Construction of plasmid pU6-Nme2-tracr
The Nme2Cas9_AAV plasmid (Addgene, catalog #119924) contains the scaffold sequence of the single-stranded guide RNA for the Nsp2Cas9 protein (DNA sequence shown in SEQ ID NO: 46). The Nme2Cas9_AAV plasmid was linearized by PCR using the primers GGCGGTACTATGTAGATGAGGGCCGCAGGAACCCCTAG and CTCATCTACATAGTACCGCCTCCA.
The reaction system is as follows:
The PCR program was as follows:
The PCR product was separated by electrophoresis on a 1% agarose gel at 120 V for 30 min, and a 3368 bp DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water. The DNA concentration was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored at -20 ℃ for long-term use.
The linearized Nme2Cas9_AAV fragment was subjected to homologous recombination in the proportions required by the manufacturer's instructions, using NEBuilder® HiFi DNA Assembly Master Mix (NEB). The reaction system was as follows:
The reaction conditions were as follows:
The recombination product was added to E. coli DH5α competent cells (purchased from Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 hour to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing ampicillin and incubated upside down at 37 ℃ to obtain single E. coli DH5α colonies for Sanger sequencing.
Clones confirmed by sequencing to be correctly assembled were grown in liquid culture, and plasmid DNA was extracted to obtain plasmid pU6-Nme2-tracr for later use.
(3-1) Preparation of linearized plasmid pSK-U6-Cj-tracr
Plasmid pSK-U6-Cj-tracr was digested with the restriction enzyme BbsI in the following reaction: μg of plasmid pSK-U6-Cj-tracr, 5 μL of 10× CutSmart buffer (NEB), 1 μL of BbsI (NEB), and water to a final volume of 50 μL. The digestion was incubated at 37 ℃ for 2 hours.
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
The DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water. This fragment is the linearized plasmid pSK-U6-Cj-tracr (3380 bp).
The concentration of the recovered linearized plasmid pSK-U6-Cj-tracr was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored at -20 ℃ for long-term use.
(3-2) Preparation of linearized plasmid pU6-Nme2-tracr
Plasmid pU6-Nme2-tracr was digested with the restriction enzyme BspQI in the following reaction: μg of plasmid pU6-Nme2-tracr, 5 μL of 10× NEBuffer 3.1 (NEB), 1 μL of BspQI (NEB), and water to a final volume of 50 μL. The digestion was incubated at 50 ℃ for 2 hours.
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min. The DNA fragment was excised from the gel, purified with a gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and eluted with ultrapure water. This fragment is the linearized plasmid pU6-Nme2-tracr (3326 bp).
The concentration of the recovered linearized plasmid pU6-Nme2-tracr was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored at -20 ℃ for long-term use.
(4-1) Preparation of plasmid pSK-U6-Cj-sgRNA
Each gRNA was designed; the sequences are shown in Table 2. Sticky-end sequences matching the two ends of the linearized plasmid pSK-U6-Cj-tracr were added to the sense and antisense strands of each designed gRNA sequence, and the two resulting single-stranded oligonucleotides were synthesized; their sequences are also shown in Table 2.
The single-stranded oligonucleotides were annealed to obtain double-stranded DNA. The annealing reaction was: μL of 100 μM Oligo-F, 1 μL of 100 μM Oligo-R and 28 μL of water. After mixing by vortexing, the reaction was placed in a PCR instrument and run with the following annealing program: 95 ℃ for 5 min; 85 ℃, 75 ℃, 65 ℃, 55 ℃, 45 ℃, 35 ℃ and 25 ℃ for 1 min each; hold at 4 ℃; ramp rate 0.3 ℃/s. The annealed product was then ligated into the linearized pSK-U6-Cj-tracr plasmid obtained in step (3-1) using DNA ligase (NEB).
The ligation product was added to E. coli DH5α competent cells (purchased from Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 hour to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing the appropriate antibiotic and incubated upside down at 37 ℃, and the resulting single E. coli DH5α colonies were verified by Sanger sequencing.
Clones confirmed by sequencing to be correctly ligated were grown in liquid culture, and plasmid DNA was extracted to obtain plasmid pSK-U6-Cj-sgRNA, which expresses the target sgRNA, for later use.
(4-2) Preparation of plasmid pU6-Nme2-sgRNA
Each gRNA was designed; the sequences are shown in Table 2 above. Sticky-end sequences matching the two ends of the linearized plasmid pU6-Nme2-tracr were added to the sense and antisense strands of each designed gRNA sequence, and the two resulting single-stranded oligonucleotides were synthesized; their sequences are also shown in Table 2 above.
The single-stranded oligonucleotides were annealed to obtain double-stranded DNA. The annealing reaction was: μL of 100 μM Oligo-F, 1 μL of 100 μM Oligo-R and 28 μL of water. After mixing by vortexing, the reaction was placed in a PCR instrument and run with the following annealing program: 95 ℃ for 5 min; 85 ℃, 75 ℃, 65 ℃, 55 ℃, 45 ℃, 35 ℃ and 25 ℃ for 1 min each; hold at 4 ℃; ramp rate 0.3 ℃/s. The annealed product was then ligated into the linearized pU6-Nme2-tracr plasmid obtained in step (3-2) using DNA ligase (NEB).
The ligation product was added to E. coli DH5α competent cells (purchased from Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 hour to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing the appropriate antibiotic and incubated upside down at 37 ℃, and the resulting single E. coli DH5α colonies were verified by Sanger sequencing.
Clones confirmed by sequencing to be correctly ligated were grown in liquid culture, and plasmid DNA was extracted to obtain plasmid pU6-Nme2-sgRNA, which expresses the target sgRNA, for later use.
(5-1) Transfection of plasmid pAAV2_Cas9_ITR expressing the Cas protein and plasmid pSK-U6-Cj-sgRNA expressing the sgRNA into the HEK293T cell line
On day 0, HEK293T cells containing the target sequence were seeded in 24-well plates at approximately 30% confluence, as required for transfection.
On day 1, transfection was performed as follows:
500 ng of plasmid pAAV2_Cas9_ITR and 300 ng of plasmid pSK-U6-Cj-sgRNA were added to 25 μL of Opti-MEM medium (Gibco) and mixed by gentle pipetting.
The transfection reagent, Lipofectamine 2000 (Invitrogen) or polyethyleneimine (PEI; Polysciences), was mixed by flicking, and 1.6 μL of Lipofectamine 2000 or 0.8 μL of PEI (100 μM) was added to 25 μL of Opti-MEM medium (Gibco), mixed gently and left at room temperature for 5 min.
The diluted transfection reagent and the diluted plasmids were combined, mixed by gentle pipetting and left at room temperature for 20 min; the mixture was then added to the medium of the HEK293T cells to be transfected, and the cells were cultured for a further 5 days in an incubator at 37 ℃ and 5% CO2.
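When several wells are transfected in parallel, the per-well amounts above are usually scaled into a master mix. The sketch below is a hypothetical helper, not part of the described method: it multiplies the per-well amounts by the number of wells plus an assumed 10% pipetting overage, and reports the reagent-to-DNA ratio implied by the text (1.6 μL of Lipofectamine 2000 per 800 ng of plasmid).

```python
# Per-well amounts for one 24-well transfection, as given in the text.
PER_WELL = {
    "pAAV2_Cas9_ITR (ng)": 500.0,
    "sgRNA plasmid (ng)": 300.0,
    "Opti-MEM, DNA tube (uL)": 25.0,
    "Lipofectamine 2000 (uL)": 1.6,   # alternative in the text: 0.8 uL of 100 uM PEI
    "Opti-MEM, reagent tube (uL)": 25.0,
}


def scale_master_mix(per_well: dict, n_wells: int, overage: float = 0.1) -> dict:
    """Scale per-well amounts to n_wells, adding a pipetting overage (assumed 10%)."""
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    factor = n_wells * (1.0 + overage)
    return {name: round(amount * factor, 2) for name, amount in per_well.items()}


def reagent_per_ug_dna(per_well: dict) -> float:
    """Microlitres of lipid reagent per microgram of total plasmid DNA."""
    dna_ug = (per_well["pAAV2_Cas9_ITR (ng)"] + per_well["sgRNA plasmid (ng)"]) / 1000.0
    return per_well["Lipofectamine 2000 (uL)"] / dna_ug


if __name__ == "__main__":
    for name, amount in scale_master_mix(PER_WELL, n_wells=4).items():
        print(f"{name}: {amount}")
    print(f"reagent:DNA = {reagent_per_ug_dna(PER_WELL):.1f} uL per ug")
```

If wells receive different sgRNA plasmids, only the components shared by all wells (for example the Cas9 plasmid, Opti-MEM and transfection reagent) would normally be pooled.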
(5-2) Transfection of plasmid pAAV2_Cas9_ITR expressing the Cas protein and plasmid pU6-Nme2-sgRNA expressing the sgRNA into the HEK293T cell line
On day 0, HEK293T cells containing the target sequence were seeded in 24-well plates at approximately 30% confluence, as required for transfection.
On day 1, transfection was performed as follows:
500 ng of plasmid pAAV2_Cas9_ITR and 300 ng of plasmid pU6-Nme2-sgRNA were added to 25 μL of Opti-MEM medium (Gibco) and mixed by gentle pipetting.
The transfection reagent, Lipofectamine 2000 (Invitrogen) or polyethyleneimine (PEI; Polysciences), was mixed by flicking, and 1.6 μL of Lipofectamine 2000 or 0.8 μL of PEI (100 μM) was added to 25 μL of Opti-MEM medium (Gibco), mixed gently and left at room temperature for 5 min.
The diluted transfection reagent and the diluted plasmids were combined, mixed by gentle pipetting and left at room temperature for 20 min; the mixture was then added to the medium of the HEK293T cells to be transfected, and the cells were cultured for a further 5 days in an incubator at 37 ℃ and 5% CO2.
(6) Preparation of the second-generation sequencing library
HEK293T cells were harvested 5 days after editing, and genomic DNA was extracted with a DNA kit (Tiangen Biochemical Technology (Beijing) Co., Ltd., DP304) according to the kit instructions.
The first round of PCR for library construction was performed with 2× Q5 Master Mix using the PCR primers shown in Table 3 below:
[Table 3: first-round PCR primers]
The reaction system was as follows:
[Table: first-round PCR reaction system]
The PCR program was as follows:
[Table: first-round PCR program]
The second round of PCR, for sequencing and pooling, was performed with 2× Q5 Master Mix using the following primers:
F2 primer: [sequence image]
R2 primer: [sequence image]
The reaction system was as follows:
[Table: second-round PCR reaction system]
The PCR program was as follows:
[Table: second-round PCR program]
The second-round PCR products were purified with a gel recovery kit according to the manufacturer's instructions, recovering DNA fragments of 418, 421, 358, 381, 378, 417, 415, 419, 375, 355, 400, 397, 390 and 390 bp for target sites A46, V60, A28, A30, A45, A51, A18, A19, V26, V28, A59, V21, A10 and V54, respectively. This completed the preparation of the second-generation sequencing library.
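The expected fragment sizes above can be kept as a lookup table for checking gel bands before the library is sequenced. The sketch below is a hypothetical aid, assuming a ±5% tolerance on band-size estimates; it also lists site pairs whose amplicons differ by less than an assumed 5 bp, which a gel cannot tell apart (A10 and V54 are both 390 bp).

```python
# Expected second-round amplicon sizes (bp) per target site, from the text.
EXPECTED_BP = {
    "A46": 418, "V60": 421, "A28": 358, "A30": 381, "A45": 378,
    "A51": 417, "A18": 415, "A19": 419, "V26": 375, "V28": 355,
    "A59": 400, "V21": 397, "A10": 390, "V54": 390,
}


def band_matches(site: str, observed_bp: float, tolerance: float = 0.05) -> bool:
    """True if a gel band estimate lies within +/- tolerance (fraction) of the expected size."""
    expected = EXPECTED_BP[site]
    return abs(observed_bp - expected) <= tolerance * expected


def indistinguishable_pairs(min_gap_bp: int = 5) -> list:
    """Site pairs whose amplicons differ by fewer than min_gap_bp base pairs."""
    sites = sorted(EXPECTED_BP)
    return [
        (a, b)
        for i, a in enumerate(sites)
        for b in sites[i + 1:]
        if abs(EXPECTED_BP[a] - EXPECTED_BP[b]) < min_gap_bp
    ]


if __name__ == "__main__":
    print(band_matches("A46", 420), band_matches("V28", 420))
    print(indistinguishable_pairs())
```

A size check of this kind only confirms that a band is plausible; identity of each amplicon still rests on the sequencing data.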
(8) Analysis of the second-generation sequencing results
The prepared second-generation sequencing library was subjected to paired-end sequencing on an Illumina HiSeq X Ten high-throughput sequencer.
The editing efficiencies at each of the two target sites, calculated from the second-generation sequencing data, are shown in Figs. 3 to 12, where the X-axis represents the target site and the Y-axis represents the editing efficiency (Indels %). As the figures show, gene editing systems containing the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hsp1Cas9, Hsp3Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A or Nsp2Cas9 protein can all be used for gene editing in cells.
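Indels % is commonly computed as the fraction of aligned reads that carry an insertion or deletion. The source does not describe its analysis pipeline, so the Python sketch below is only an illustration under that assumption: it inspects SAM-style CIGAR strings and counts any read with an I or D operation. Dedicated analysis tools typically restrict the count to a window around the predicted cut site, which this sketch does not do.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """True if the alignment contains an insertion (I) or deletion (D)."""
    return any(op in "ID" for _, op in CIGAR_OP.findall(cigar))


def indel_percent(cigars: list[str]) -> float:
    """Percentage of aligned reads carrying at least one insertion or deletion."""
    if not cigars:
        raise ValueError("no aligned reads")
    edited = sum(1 for cigar in cigars if has_indel(cigar))
    return 100.0 * edited / len(cigars)


if __name__ == "__main__":
    reads = ["150M", "70M2D80M", "60M1I89M", "150M"]
    print(indel_percent(reads))  # prints 50.0
```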
Example 3
(1) Construction of plasmid pAAV2_Cas9_ITR
Amino acid sequences were downloaded using the gene accession numbers of the Cas9 proteins listed in Table 1 above. The amino acid sequences of the Cco2Cas9, CcuCas9, Hsp1Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Nsp2Cas9 proteins are shown in SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 8, SEQ ID NO: 17 to SEQ ID NO: 19 and SEQ ID NO: 21, respectively.
The coding nucleic acid sequences of the Cas9 proteins were codon-optimized to obtain Cas gene sequences for high-level expression in human cells. The optimized gene sequences of the Cco2Cas9, CcuCas9, Hsp1Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Nsp2Cas9 proteins are shown in SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 29, SEQ ID NO: 38 to SEQ ID NO: 40 and SEQ ID NO: 42, respectively.
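The source does not say how codon optimization was performed. As a hedged illustration of the simplest possible approach, the sketch below back-translates a protein using a single preferred human codon per amino acid. The codon table is an assumption based on commonly cited human codon-usage data and should be checked against a current database; practical optimization tools also balance GC content and avoid repeats, restriction sites and unwanted RNA structure, which this sketch ignores.

```python
# Illustrative human codon preferences (one frequent codon per amino acid).
# Assumption: verify against a current codon-usage database before use.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def back_translate(protein: str) -> str:
    """Naive 'one amino acid, one codon' optimization of a protein sequence."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]!r}") from None


def translate_preferred(dna: str) -> str:
    """Translate DNA built only from PREFERRED_CODON codons (round-trip check)."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    dna = back_translate("MKL*")
    print(dna, translate_preferred(dna))
```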
The gene sequences shown in SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 29, SEQ ID NO: 38 to SEQ ID NO: 40 and SEQ ID NO: 42, optimized for high-level expression of each Cas protein, were each cloned into the stugca 9 backbone plasmid (Addgene, catalog #163793) to obtain plasmid pAAV2_Cas9_ITR.
(2-1) Construction of plasmid pSK-U6-Cj-tracr
The pSKB- plasmid (Addgene, catalog #62540) was digested with the restriction enzymes ClaI and XhoI in the following system: μg of plasmid pSKB-, 5 μL of 10× CutSmart buffer (NEB), 1 μL of ClaI and 1 μL of XhoI (NEB), and water to 50 μL. The digestion was incubated overnight at 37 ℃.
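The digestion system above fixes the buffer, enzyme and final volumes, while the amount of plasmid is not legible in the source. The hypothetical helper below makes a 50 μL digest up with water for a given plasmid mass and stock concentration; the 2 μg at 500 ng/μL used in the example is an assumed value, and the 10% enzyme-volume limit reflects common guidance for keeping glycerol low in digests.

```python
def digest_mix(dna_ug: float, dna_ng_per_ul: float, total_ul: float = 50.0,
               buffer_x: float = 10.0, enzyme_ul: tuple = (1.0, 1.0)) -> dict:
    """Volumes (uL) for a restriction digest made up to total_ul with water.

    Mirrors the layout in the text: 10x buffer to 1x, 1 uL of each enzyme,
    water to 50 uL.
    """
    dna_ul = dna_ug * 1000.0 / dna_ng_per_ul
    buffer_ul = total_ul / buffer_x
    enzymes = sum(enzyme_ul)
    # Common guidance: enzyme volume at or below 10% of the reaction,
    # so glycerol from the enzyme storage buffer stays low.
    if enzymes > 0.1 * total_ul:
        raise ValueError("enzyme volume exceeds 10% of the reaction")
    water_ul = total_ul - dna_ul - buffer_ul - enzymes
    if water_ul < 0:
        raise ValueError("DNA too dilute for this reaction volume")
    return {"DNA": dna_ul, "10x buffer": buffer_ul, "enzymes": enzymes, "water": water_ul}


if __name__ == "__main__":
    # Hypothetical input: 2 ug of plasmid from a 500 ng/uL stock.
    print(digest_mix(2, 500))
```

The same helper covers the single-enzyme BbsI and BspQI digests described later by passing `enzyme_ul=(1.0,)`.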
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
A 2945 bp DNA fragment was excised from the gel, recovered with a gel recovery kit (Tiangen Biochemical Technology (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and finally eluted with ultrapure water.
The scaffold sequence of the single-stranded guide RNA for the Cco2Cas9, CcuCas9, Hsp1Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A and Hsp1-Hsp3Cas9-K390A-Y446A proteins (DNA sequence: SEQ ID NO: 45) was synthesized and cloned into the linearized pSKB- backbone to obtain plasmid pSK-U6-Cj-tracr.
(2-2) Construction of plasmid pU6-Nme2-tracr
The Nme2Cas9_AAV plasmid (Addgene, catalog #119924) contains the scaffold sequence of the single-stranded guide RNA for the Nsp2Cas9 protein (DNA sequence: SEQ ID NO: 46). The Nme2Cas9_AAV plasmid was linearized by PCR using the primers GGCGGTACTATGTAGATGAGGGCCGCAGGAACCCCTAG and CTCATCTACATAGTACCGCCTCCA.
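The two linearization primers appear to be designed so that the PCR product carries identical sequence at both of its ends, which is what a homologous recombination step on the linearized fragment needs. The sketch below checks this by comparing the 5' end of the first primer with the reverse complement of the second; it is an illustrative check, not part of the described method, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

FWD = "GGCGGTACTATGTAGATGAGGGCCGCAGGAACCCCTAG"
REV = "CTCATCTACATAGTACCGCCTCCA"


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def end_overlap(fwd: str, rev: str, min_len: int = 10) -> int:
    """Length of the longest 5' prefix of fwd equal to a 3' suffix of
    reverse_complement(rev), i.e. shared sequence at the two ends of the
    PCR product. Returns 0 if no overlap of at least min_len nt exists.
    """
    rc = reverse_complement(rev)
    best = 0
    for k in range(min_len, min(len(fwd), len(rc)) + 1):
        if fwd[:k] == rc[-k:]:
            best = k
    return best


if __name__ == "__main__":
    k = end_overlap(FWD, REV)
    print(k, FWD[:k])
```

For the primers given above it reports a 20 nt overlap, GGCGGTACTATGTAGATGAG, at the 5' end of the first primer.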
The reaction system was as follows:
[Table: PCR reaction system]
The PCR program was as follows:
[Table: PCR program]
The PCR product was separated by electrophoresis on a 1% agarose gel at 120 V for 30 min; the 3368 bp DNA fragment was excised from the gel, recovered with a gel recovery kit (Tiangen Biochemical Technology (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and finally eluted with ultrapure water. The DNA concentration was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored long-term at -20 ℃.
The linearized Nme2Cas9_AAV fragment was subjected to homologous recombination in the amounts specified in the manufacturer's instructions, using NEBuilder HiFi DNA Assembly Master Mix (NEB) as the homologous recombinase. The reaction system was as follows:
[Table: assembly reaction system]
The reaction conditions were as follows:
[Table: assembly reaction conditions]
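Assembly kit instructions usually express DNA input in picomoles rather than nanograms. A minimal conversion sketch is shown below, assuming the approximate average mass of 650 Da per base pair of double-stranded DNA that is commonly used in such instructions; the 0.05 pmol example for the 3368 bp fragment is hypothetical, and the amounts actually recommended should be taken from the kit manual.

```python
BP_MASS_DA = 650.0  # approximate average mass of one base pair of dsDNA


def ng_to_pmol(ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles."""
    return ng * 1000.0 / (length_bp * BP_MASS_DA)


def pmol_to_ng(pmol: float, length_bp: int) -> float:
    """Convert picomoles of double-stranded DNA to a mass in ng."""
    return pmol * length_bp * BP_MASS_DA / 1000.0


if __name__ == "__main__":
    # Hypothetical target of 0.05 pmol of the 3368 bp linearized fragment.
    print(f"{pmol_to_ng(0.05, 3368):.1f} ng")
    print(f"{ng_to_pmol(50, 3368):.4f} pmol in 50 ng")
```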
The recombination product was added to E. coli DH5α competent cells (Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 h to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing ampicillin and incubated inverted at 37 ℃ to obtain single colonies for Sanger sequencing.
E. coli DH5α clones confirmed by sequencing to be correctly assembled were grown in shake culture, and plasmids were extracted to obtain plasmid pU6-Nme2-tracr for later use.
(3-1) Preparation of linearized plasmid pSK-U6-Cj-tracr
Plasmid pSK-U6-Cj-tracr was digested with the restriction enzyme BbsI in the following system: μg of plasmid pSK-U6-Cj-tracr, 5 μL of 10× CutSmart buffer (NEB), 1 μL of BbsI (NEB), and water to 50 μL. The digestion was carried out at 37 ℃ for 2 h.
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
The DNA fragment was excised from the gel, recovered with a gel recovery kit (Tiangen Biochemical Technology (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and finally eluted with ultrapure water. This 3380 bp fragment is the linearized plasmid pSK-U6-Cj-tracr.
The DNA concentration of the recovered linearized plasmid pSK-U6-Cj-tracr was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored long-term at -20 ℃.
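The NanoDrop measurement above reports concentration directly, but the underlying relationship is simple. The sketch below assumes the conventional factor that an A260 of 1.0 (10 mm path) corresponds to about 50 ng/μL of double-stranded DNA, and uses an assumed 1.7 to 2.0 window for the A260/A280 purity ratio; both are general conventions rather than values from the source.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional factor for dsDNA, 10 mm path


def dsdna_concentration(a260: float, dilution: float = 1.0) -> float:
    """dsDNA concentration (ng/uL) from a path-length-normalized A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution


def looks_pure(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Rough purity check; an A260/A280 ratio near 1.8 is usually taken as clean DNA."""
    return low <= a260 / a280 <= high


if __name__ == "__main__":
    print(dsdna_concentration(0.5), dsdna_concentration(0.5, dilution=10))
    print(looks_pure(1.8, 1.0), looks_pure(1.2, 1.0))
```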
(3-2) Preparation of linearized plasmid pU6-Nme2-tracr
Plasmid pU6-Nme2-tracr was digested with the restriction enzyme BspQI in the following system: μg of plasmid pU6-Nme2-tracr, 5 μL of 10× NEBuffer 3.1 (NEB), 1 μL of BspQI (NEB), and water to 50 μL. The digestion was carried out at 50 ℃ for 2 h.
The digestion products were then separated by electrophoresis on a 1% agarose gel at 120 V for 30 min.
The DNA fragment was excised from the gel, recovered with a gel recovery kit (Tiangen Biochemical Technology (Beijing) Co., Ltd., DP209) according to the manufacturer's instructions, and finally eluted with ultrapure water. This 3326 bp fragment is the linearized plasmid pU6-Nme2-tracr.
The DNA concentration of the recovered linearized plasmid pU6-Nme2-tracr was measured with a NanoDrop™ Lite spectrophotometer (Thermo Scientific), and the DNA was stored long-term at -20 ℃.
(4-1) Preparation of plasmid pSK-U6-Cj-on target sgRNA or pSK-U6-Cj-mismatch sgRNA
The on-target and mismatch gRNA sequences were designed; the corresponding single-stranded oligonucleotides are shown in Table 4 below, in which mismatched bases are shown as underlined bold bases.
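Table 4 is available only as an image, so the exact mismatch design is not recoverable here. As a hypothetical illustration of how a panel of double-base mismatch gRNAs can be enumerated, the sketch below replaces each pair of adjacent positions in a spacer with their complementary bases; the complement-substitution scheme and the function names are assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def adjacent_double_mismatches(spacer: str) -> list[tuple[int, str]]:
    """All variants of spacer with two adjacent positions replaced by their complements.

    Returns (1-based position of the first mismatched base, variant) pairs.
    """
    spacer = spacer.upper()
    variants = []
    for i in range(len(spacer) - 1):
        bases = list(spacer)
        bases[i] = COMPLEMENT[bases[i]]
        bases[i + 1] = COMPLEMENT[bases[i + 1]]
        variants.append((i + 1, "".join(bases)))
    return variants


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    target = "AGCGCAATGATGATCTCCGAGC"  # target site given later in the text
    for pos, variant in adjacent_double_mismatches(target)[:3]:
        print(pos, variant)
```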
The single-stranded oligonucleotides corresponding to the on-target gRNA and those corresponding to the different mismatch gRNAs were annealed. The annealing reaction system was: 1 μL of 100 μM oligo-F, 1 μL of 100 μM oligo-R and 28 μL of water. After mixing by vortexing, the reaction was run on the following annealing program in a PCR instrument: 95 ℃ for 5 min; 85, 75, 65, 55, 45, 35 and 25 ℃ for 1 min each; hold at 4 ℃; ramp rate 0.3 ℃/s. The annealed products were each ligated into the linearized plasmid pSK-U6-Cj-tracr using DNA ligase (NEB).
The ligation products were added to E. coli DH5α competent cells (Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 h to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing the appropriate antibiotic and incubated inverted at 37 ℃, and the resulting single colonies were verified by Sanger sequencing.
E. coli DH5α clones confirmed by sequencing to carry the correct ligation product were grown in shake culture, and plasmids were extracted to obtain plasmid pSK-U6-Cj-on target sgRNA, expressing the on-target gRNA sequence, and plasmids pSK-U6-Cj-mismatch sgRNA, expressing the different mismatch gRNA sequences, for later use.
(4-2) Preparation of plasmid pU6-Nme2-on target sgRNA or pU6-Nme2-mismatch sgRNA
The on-target and mismatch gRNA sequences were designed; the corresponding single-stranded oligonucleotides are shown in Table 4 below, in which mismatched bases are shown as underlined bold bases.
The single-stranded oligonucleotides corresponding to the on-target gRNA and those corresponding to the different mismatch gRNAs were annealed. The annealing reaction system was: 1 μL of 100 μM oligo-F, 1 μL of 100 μM oligo-R and 28 μL of water. After mixing by vortexing, the reaction was run on the following annealing program in a PCR instrument: 95 ℃ for 5 min; 85, 75, 65, 55, 45, 35 and 25 ℃ for 1 min each; hold at 4 ℃; ramp rate 0.3 ℃/s. The annealed products were each ligated into the linearized plasmid pU6-Nme2-tracr using DNA ligase (NEB).
The ligation products were added to E. coli DH5α competent cells (Shanghai Toshidi Biotech Co., Ltd.), incubated on ice for 30 min, heat-shocked at 42 ℃ for 1 min and returned to ice for 2 min; 900 μL of LB medium was then added and the cells were cultured at 37 ℃ for 1 h to recover.
The recovered E. coli DH5α cells were spread on LB agar plates containing the appropriate antibiotic and incubated inverted at 37 ℃, and the resulting single colonies were verified by Sanger sequencing.
E. coli DH5α clones confirmed by sequencing to carry the correct ligation product were grown in shake culture, and plasmids were extracted to obtain plasmid pU6-Nme2-on target sgRNA, expressing the on-target gRNA sequence, and plasmids pU6-Nme2-mismatch sgRNA, expressing the different mismatch gRNA sequences, for later use.
[Table 4: single-stranded oligonucleotides for the on-target and mismatch gRNAs]
(5-1) The resulting plasmid pSK-U6-Cj-on target sgRNA (expressing the on-target gRNA sequence) and plasmids pSK-U6-Cj-mismatch sgRNA (expressing the mismatch gRNA sequences) were each co-transfected by lipofection, together with plasmid pAAV2_Cas9_ITR expressing Cas9, into the GFP reporter HEK293T cell line containing the target sequence.
The transfection process comprises the following steps:
On day 0, the GFP reporter HEK293T cell line containing the target sequence was seeded in 24-well plates at 30% confluence, as required for transfection.
The GFP reporter HEK293T cell line containing the target sequence carries a CMV-ATG-target site-PAM-GFP nucleotide sequence, with the PAM sequences shown in Figs. 13 to 18. The target site sequence is CCGGGACCCTCCACTCCTCCTG for the Cco2Cas9 protein, TCCTTGGGCAGCAACACAGCAG for the CcuCas9 protein and GGTGACAGGGGCTTCTCTCCAG for the Hsp1Cas9 protein; for the Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A and Hsp1-Hsp3Cas9-K390A-Y446A proteins, the target site sequence is AGCGCAATGATGATCTCCGAGC.
On day 1, transfection was performed as follows:
300 ng of plasmid pSK-U6-Cj-on target sgRNA or 300 ng of plasmid pSK-U6-Cj-mismatch sgRNA was added, together with 500 ng of plasmid pAAV2_Cas9_ITR, to 25 μL of Opti-MEM medium (Gibco) and mixed gently.
Lipofectamine 2000 (Invitrogen) or PEI (Polysciences) was mixed gently, and 1.6 μL of Lipofectamine 2000 or 0.8 μL of PEI (100 μM) was added to 25 μL of Opti-MEM medium, mixed gently and left at room temperature for 5 min.
The diluted plasmids and the diluted transfection reagent were combined and mixed by gentle pipetting; the mixture was left at room temperature for 20 min and added to the medium of the GFP reporter HEK293T cell line containing the target sequence, and the cells were further cultured in an incubator at 37 ℃ and 5% CO2.
Flow cytometry was used to analyze the on-target editing efficiency and the off-target rate of the CRISPR gene editing system.
Specifically, HEK293T cells were harvested after 5 days of culture in the CO2 incubator, their specificity was assessed on a flow cytometer (BD Biosciences FACSCalibur), and the GFP-positive rate was analyzed and plotted with FlowJo software.
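The GFP-positive rate and the comparison between on-target and mismatch gRNAs can be expressed as two small calculations. The sketch below is illustrative: the gate threshold, the optional background correction (for example from a control without sgRNA) and the function names are assumptions, since the source states only that the GFP-positive rate was analyzed with FlowJo.

```python
def percent_gfp_positive(intensities: list[float], threshold: float) -> float:
    """Percentage of events whose GFP intensity exceeds the gate threshold."""
    if not intensities:
        raise ValueError("no events")
    positive = sum(1 for value in intensities if value > threshold)
    return 100.0 * positive / len(intensities)


def relative_activity(mismatch_pct: float, on_target_pct: float,
                      background_pct: float = 0.0) -> float:
    """Background-subtracted mismatch signal as a fraction of the on-target signal."""
    on_target = on_target_pct - background_pct
    if on_target <= 0:
        raise ValueError("on-target signal not above background")
    return max(mismatch_pct - background_pct, 0.0) / on_target


if __name__ == "__main__":
    print(percent_gfp_positive([10, 500, 20, 800], threshold=100))
    print(relative_activity(5.0, 40.0))
```

A relative activity near 0 for a mismatch gRNA corresponds to the "no obvious off-target" outcome described in the results.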
(5-2) The resulting plasmid pU6-Nme2-on target sgRNA (expressing the on-target gRNA sequence) and plasmids pU6-Nme2-mismatch sgRNA (expressing the mismatch gRNA sequences) were each co-transfected by lipofection, together with plasmid pAAV2_Cas9_ITR expressing Cas9, into the GFP reporter HEK293T cell line containing the target sequence (AGCGCAATGATGATCTCCGAGC).
The transfection process comprises the following steps:
On day 0, the GFP reporter HEK293T cell line containing the target sequence was seeded in 24-well plates at 30% confluence, as required for transfection.
The GFP reporter HEK293T cell line containing the target sequence carries a CMV-ATG-target site-PAM-GFP nucleotide sequence, with the PAM sequence shown in Fig. 19 and the target site sequence AGCGCAATGATGATCTCCGAGC.
On day 1, transfection was performed as follows:
300 ng of plasmid pU6-Nme2-on target sgRNA or 300 ng of plasmid pU6-Nme2-mismatch sgRNA was added, together with 500 ng of plasmid pAAV2_Cas9_ITR, to 25 μL of Opti-MEM medium (Gibco) and mixed by gentle pipetting.
Lipofectamine 2000 (Invitrogen) or PEI (Polysciences) was mixed gently, and 1.6 μL of Lipofectamine 2000 or 0.8 μL of PEI (100 μM) was added to 25 μL of Opti-MEM medium, mixed gently and left at room temperature for 5 min.
The diluted plasmids and the diluted transfection reagent were combined and mixed by gentle pipetting; the mixture was left at room temperature for 20 min and added to the medium of the GFP reporter HEK293T cell line containing the target sequence, and the cells were further cultured in an incubator at 37 ℃ and 5% CO2.
Flow cytometry was used to analyze the on-target editing efficiency and the off-target rate of the CRISPR gene editing system.
Specifically, HEK293T cells were harvested after 5 days of culture in the CO2 incubator, their specificity was assessed on a flow cytometer (BD Biosciences FACSCalibur), and the GFP-positive rate was analyzed and plotted with FlowJo software.
The results of the specificity assays of the CRISPR/Cas9 gene editing systems of the present invention in the GFP reporter HEK293T cell lines containing the target sequences are shown in Figs. 13 to 19. The horizontal bar at the top of each figure is a schematic of the GFP reporter system: a specific target sequence and PAM sequence are inserted between the start codon ATG and the GFP coding sequence, shifting GFP out of frame. When the gene editing system cleaves the target sequence, the cells' own repair pathways restore the GFP reading frame in a fraction of the cells, which then fluoresce green. In the histograms at the bottom of Figs. 13 to 19, the Y-axis shows the percentage (%) of GFP-positive cells and the X-axis shows the single-stranded oligonucleotide sequences corresponding to the on-target gRNA and the mismatch gRNAs.
As Figs. 13 to 19 show, the CRISPR gene editing systems of the present invention edited the target site in the GFP reporter HEK293T cell line, and gene editing mediated by the mismatch gRNAs was significantly lower than that mediated by the on-target gRNA. Moreover, the CRISPR/Hsp1-Hsp3Cas9-Y446A system showed no obvious off-target editing with double-base mismatches in the last 14 bp, and the CRISPR/Hsp1-Hsp3Cas9-K390A-Y446A system showed no obvious off-target editing with any of the double-base mismatches. These systems therefore have a very stringent requirement for complete pairing between the gRNA and the target sequence, giving them a lower mismatch tolerance and higher safety in practical applications.
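The reporter logic described above reduces to reading-frame arithmetic. The sketch below assumes that the nucleotides between the ATG and the first GFP codon (target site, PAM and any linker) total a length that is not a multiple of three; the 31 nt used in the example is hypothetical, since the exact reporter layout is shown only in the figures. It ignores stop codons that an indel might create and indels that extend into the ATG or the GFP sequence.

```python
def gfp_in_frame(insert_len: int, indel: int = 0) -> bool:
    """True if GFP is in frame with the ATG after a net indel of `indel` nt.

    insert_len is the number of nucleotides between the ATG and the first
    GFP codon (target site + PAM + any linker).
    """
    return (insert_len + indel) % 3 == 0


def frame_restoring_indels(insert_len: int, max_size: int = 10) -> list[int]:
    """Net indel sizes (excluding 0) within +/- max_size that put GFP back in frame."""
    return [d for d in range(-max_size, max_size + 1)
            if d != 0 and gfp_in_frame(insert_len, d)]


if __name__ == "__main__":
    insert_len = 31  # hypothetical: e.g. 22 nt target + 9 nt PAM/linker
    print(gfp_in_frame(insert_len), frame_restoring_indels(insert_len, max_size=3))
```

With a 31 nt insert, GFP is out of frame before editing, and net indels of -1 or +2 nt (within ±3 nt) restore the frame, which is why only a fraction of edited cells become GFP-positive.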
Sequence listing
SEQ ID NO: 1: Cco2Cas9 protein sequence (Campylobacter coli)
SEQ ID NO: 2: CcuCas9 protein sequence (Campylobacter cuniculorum)
SEQ ID NO: 3: CspCas9 protein sequence (Campylobacter sp. 113)
SEQ ID NO: 4: Hap1Cas9 protein sequence (Helicobacter apodemus)
SEQ ID NO: 5: Hap2Cas9 protein sequence (Helicobacter apodemus)
SEQ ID NO: 6: HgaCas9 protein sequence (Helicobacter strain)
SEQ ID NO: 7: HtyCas9 protein sequence (Helicobacter typhlonius)
SEQ ID NO: 8: Hsp1Cas9 protein sequence (Helicobacter sp. MIT 11-5569)
SEQ ID NO: 9: Hsp2Cas9 protein sequence (Helicobacter sp. MIT 03-1614)
SEQ ID NO: 10: Hsp3Cas9 protein sequence (Helicobacter sp. MIT 14-3879)
SEQ ID NO: 11: Hsp4Cas9 protein sequence (Helicobacter sp. 10-6591)
SEQ ID NO: 12: Hsp1-CcuCas9 protein sequence
SEQ ID NO: 13: Hsp1-Hap2Cas9 protein sequence
SEQ ID NO: 14: Hsp1-HgaCas9 protein sequence
SEQ ID NO: 15: Hsp1-HtyCas9 protein sequence
SEQ ID NO: 16: Hsp1-Hsp2Cas9 protein sequence
SEQ ID NO: 17: Hsp1-Hsp3Cas9 protein sequence
SEQ ID NO: 18: Hsp1-Hsp3Cas9-Y446A protein sequence
SEQ ID NO: 19: Hsp1-Hsp3Cas9-K390A-Y446A protein sequence
SEQ ID NO: 20: Hsp1-Hsp4Cas9 protein sequence
SEQ ID NO: 21: Nsp2Cas9 protein sequence (Neisseria sp. WF04)
SEQ ID NO: 22: Coding sequence of the Cco2Cas9 protein
SEQ ID NO: 23: Coding sequence of the CcuCas9 protein
SEQ ID NO: 24: Coding sequence of the CspCas9 protein
SEQ ID NO: 25: Coding sequence of the Hap1Cas9 protein
SEQ ID NO: 26: Coding sequence of the Hap2Cas9 protein
SEQ ID NO: 27: Coding sequence of the HgaCas9 protein
SEQ ID NO: 28: Coding sequence of the HtyCas9 protein
SEQ ID NO: 29: Coding sequence of the Hsp1Cas9 protein
SEQ ID NO: 30: Coding sequence of the Hsp2Cas9 protein
SEQ ID NO: 31: Coding sequence of the Hsp3Cas9 protein
SEQ ID NO: 32: Coding sequence of the Hsp4Cas9 protein
SEQ ID NO: 33: Coding sequence of the Hsp1-CcuCas9 protein
SEQ ID NO: 34: Coding sequence of the Hsp1-Hap2Cas9 protein
SEQ ID NO: 35: Coding sequence of the Hsp1-HgaCas9 protein
SEQ ID NO: 36: Coding sequence of the Hsp1-HtyCas9 protein
SEQ ID NO: 37: Coding sequence of the Hsp1-Hsp2Cas9 protein
SEQ ID NO: 38: Coding sequence of the Hsp1-Hsp3Cas9 protein
SEQ ID NO: 39: Coding sequence of the Hsp1-Hsp3Cas9-Y446A protein
SEQ ID NO: 40: Coding sequence of the Hsp1-Hsp3Cas9-K390A-Y446A protein
SEQ ID NO: 41: Coding sequence of the Hsp1-Hsp4Cas9 protein
SEQ ID NO: 42: Coding sequence of the Nsp2Cas9 protein
SEQ ID NO: 43: Scaffold sequence of the single-stranded guide RNA for the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins
SEQ ID NO: 44: Scaffold sequence of the single-stranded guide RNA for the Nsp2Cas9 protein
SEQ ID NO: 45: DNA sequence of the scaffold of the single-stranded guide RNA for the Cco2Cas9, CcuCas9, CspCas9, Hap1Cas9, Hap2Cas9, HgaCas9, HtyCas9, Hsp1Cas9, Hsp2Cas9, Hsp3Cas9, Hsp4Cas9, Hsp1-CcuCas9, Hsp1-Hap2Cas9, Hsp1-HgaCas9, Hsp1-HtyCas9, Hsp1-Hsp2Cas9, Hsp1-Hsp3Cas9, Hsp1-Hsp3Cas9-Y446A, Hsp1-Hsp3Cas9-K390A-Y446A and Hsp1-Hsp4Cas9 proteins
SEQ ID NO: 46: DNA sequence of the scaffold of the single-stranded guide RNA for the Nsp2Cas9 protein

Claims (16)

1. A Cas9 protein, the Cas9 protein being:
a) the Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
the Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
the Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
the Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
the Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
the Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
the Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
the Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
or the Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20; or
b) a homologue having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity from 80% to 100%, to the amino acid sequence shown in any one of SEQ ID NO: 12 to SEQ ID NO: 20, and retaining its biological activity.
2. A conjugate, comprising:
a) a Cas9 protein, the Cas9 protein being:
1) the Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
the CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
the CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
the Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
the Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
the HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
the HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
the Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
the Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
the Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
the Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
the Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
the Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
the Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
the Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
the Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
the Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
the Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
the Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
the Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
the Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21; or
2) a homologue having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity from 80% to 100%, to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 21, and retaining its biological activity;
b) a modifying moiety;
for example, the modifying moiety is selected from an additional protein or polypeptide, a detectable label, or a combination thereof;
for example, the additional protein or polypeptide is selected from one or more of an epitope tag, a reporter protein, a nuclear localization signal (NLS) sequence, cytosine deaminase (CBE), adenine deaminase (ABE), the cytosine methylases DNMT3A and MQ1, the cytosine demethylase Tet1, the transcriptional activator proteins VP64, p65 and RTA, the transcriptional repressor protein KRAB, the histone acetylase p300, the histone demethylase LSD1, and the endonuclease FokI;
and
c) optionally a linker for linking the Cas9 protein to the modification moiety.
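As an illustration only, the percent-identity thresholds recited for homologues can be evaluated with a pairwise alignment. The sketch below assumes a Needleman-Wunsch global alignment with simple match/mismatch/gap scores and counts identical residues over the full length of the reference sequence; the claims do not specify the alignment algorithm, scoring scheme or denominator, and the short strings used here are made-up toy sequences, not SEQ ID NO: 1 to 21.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(reference: str, query: str) -> float:
    """Identical aligned residues as a percentage of the reference length."""
    ref_aln, qry_aln = global_align(reference, query)
    identical = sum(1 for x, y in zip(ref_aln, qry_aln) if x == y and x != "-")
    return 100.0 * identical / len(reference)


if __name__ == "__main__":
    print(percent_identity("MKTAYIAK", "MKTAYIAR"))  # 87.5
```

Under these assumptions, a check such as `percent_identity(reference, candidate) >= 80.0` would correspond to the broadest threshold recited; a real analysis would more likely use an established aligner with a substitution matrix such as BLOSUM62.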
3. A fusion protein, comprising:
a) a Cas9 protein, the Cas9 protein being:
1) a Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
a CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
a CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
a Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
a Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
a HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
a HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
an Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
an Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
an Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
an Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
an Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
an Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
an Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
an Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
an Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
an Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
an Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
an Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
an Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or
2) a homologue having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity within 80%-100%, to the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 and 21, and retaining its biological activity;
b) an additional protein or polypeptide;
for example, the additional protein or polypeptide is selected from one or more of an epitope tag, a reporter protein, a nuclear localization signal (NLS) sequence, cytosine deaminase (CBE), adenine deaminase (ABE), the cytosine methylases DNMT3A and MQ1, the cytosine demethylase Tet1, the transcriptional activator proteins VP64, p65 and RTA, the transcriptional repressor protein KRAB, the histone acetylase p300, the histone demethylase LSD1, and the endonuclease FokI;
and
c) optionally a linker for linking the Cas9 protein to the additional protein or polypeptide;
for example, the linker is a linker of 1-50 amino acids in length.
4. A single stranded guide RNA comprising a scaffold sequence having:
a) the nucleic acid sequence shown in SEQ ID NO: 43, directed against the Cco2Cas9 protein, CcuCas9 protein, CspCas9 protein, Hap1Cas9 protein, Hap2Cas9 protein, HgaCas9 protein, HtyCas9 protein, Hsp1Cas9 protein, Hsp2Cas9 protein, Hsp3Cas9 protein, Hsp4Cas9 protein, Hsp1-CcuCas9 protein, Hsp1-Hap2Cas9 protein, Hsp1-HgaCas9 protein, Hsp1-HtyCas9 protein, Hsp1-Hsp2Cas9 protein, Hsp1-Hsp3Cas9 protein, Hsp1-Hsp3Cas9-Y446A protein, Hsp1-Hsp3Cas9-K390A-Y446A protein or Hsp1-Hsp4Cas9 protein, or a homologue, conjugate or fusion protein thereof, or
the nucleic acid sequence shown in SEQ ID NO: 44, directed against the Nsp2Cas9 protein or a homologue, conjugate or fusion protein thereof;
or
b) a nucleic acid sequence that has at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.9% or 100% sequence identity to the nucleic acid sequence shown in either of SEQ ID NO: 43 and SEQ ID NO: 44 and retains its biological activity; or
c) A nucleic acid sequence which is modified on the basis of the nucleic acid sequence described in any of SEQ ID NO 43 and SEQ ID NO 44 and retains the biological activity thereof,
for example, the modification is one or more of base phosphorylation, base sulfurization, base methylation, base hydroxylation, shortening of the sequence and lengthening of the sequence,
for example, shortening of the sequence and lengthening of the sequence includes the presence of deletions or additions of one, two, three, four, five, six, seven, eight, nine, or ten bases relative to the base sequence.
5. The single stranded guide RNA according to claim 4, further comprising a guide sequence at the 5' end of the scaffold sequence, the guide sequence being 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides (preferably 22 nucleotides) in length and capable of complementary pairing with a target sequence.
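A minimal sketch of how the single stranded guide RNA of claims 4 and 5 is laid out 5' to 3': a guide sequence of 20 to 30 nt followed by the scaffold. It assumes the guide has the same sequence as the protospacer (the strand carrying the PAM), written as RNA, so that it pairs antiparallel with the complementary target strand. The scaffold sequences of SEQ ID NO: 43 and 44 are not reproduced in this text, so the example passes a placeholder string; the 22-nt protospacer is made up.

```python
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def build_sgrna(protospacer: str, scaffold_rna: str,
                min_len: int = 20, max_len: int = 30) -> str:
    """Concatenate the guide (5' end) and the scaffold (3' end) into one sgRNA.

    The guide is the protospacer DNA sequence rewritten as RNA (T -> U).
    """
    if not min_len <= len(protospacer) <= max_len:
        raise ValueError(f"guide must be {min_len}-{max_len} nt, got {len(protospacer)}")
    guide_rna = protospacer.upper().replace("T", "U")
    return guide_rna + scaffold_rna


def guide_pairs_with(guide_rna: str, target_strand: str) -> bool:
    """True if the guide is fully complementary to the target-strand DNA."""
    # Antiparallel pairing: the guide read 5'->3' must equal the reverse
    # complement of the target strand, with U in place of T.
    return reverse_complement(target_strand).replace("T", "U") == guide_rna.upper()


if __name__ == "__main__":
    protospacer = "GACGCATAAAGATGAGACGCTG"  # hypothetical 22-nt example
    sgrna = build_sgrna(protospacer, scaffold_rna="<SEQ ID NO: 43 scaffold>")
    print(sgrna)
    print(guide_pairs_with(sgrna[:22], reverse_complement(protospacer)))  # True
```

The length check mirrors the 20 to 30 nt range of claim 5; the default of 22 nt used in the example follows the stated preference.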
6. An isolated nucleic acid molecule comprising a nucleic acid sequence encoding:
a) a Cas9 protein, the Cas9 protein being:
1) a Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
a CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
a CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
a Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
a Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
a HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
a HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
an Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
an Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
an Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
an Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
an Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
an Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
an Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
an Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
an Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
an Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
an Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
an Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
an Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or
2) a homologue having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity within 80%-100%, to the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 and 21, and retaining its biological activity;
b) the conjugate of claim 2; or
c) A fusion protein of claim 3;
for example, the isolated nucleic acid molecule comprises the nucleic acid sequence shown in any one of SEQ ID NO 22, SEQ ID NO 23, SEQ ID NO 24, SEQ ID NO 25, SEQ ID NO 26, SEQ ID NO 27, SEQ ID NO 28, SEQ ID NO 29, SEQ ID NO 30, SEQ ID NO 31, SEQ ID NO 32, SEQ ID NO 33, SEQ ID NO 34, SEQ ID NO 35, SEQ ID NO 36, SEQ ID NO 37, SEQ ID NO 38, SEQ ID NO 39, SEQ ID NO 40, SEQ ID NO 41 and SEQ ID NO 42, or a degenerate sequence thereof.
7. The isolated nucleic acid molecule of claim 6, further comprising a nucleic acid sequence encoding the single stranded guide RNA of any one of claims 4 to 5 that corresponds to the Cas9 protein;
for example, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding a Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or a homologue, conjugate or fusion protein thereof (such as the nucleic acid sequence shown in SEQ ID NO: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 or 41), and a nucleic acid sequence (such as the nucleic acid sequence shown in SEQ ID NO: 45) encoding a single stranded guide RNA directed against the Cas9 protein or the homologue, conjugate or fusion protein thereof, the single stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO: 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 43 and retaining its biological activity, or an engineered sequence based on SEQ ID NO: 43 and retaining its biological activity;
for example, the isolated nucleic acid molecule comprises a nucleic acid sequence encoding an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO 21, a homologue, a conjugate or a fusion protein thereof, such as the nucleic acid sequence shown in SEQ ID NO 42, and a nucleic acid sequence encoding a single stranded guide RNA for the Nsp2Cas9 protein, a homologue, a conjugate or a fusion protein thereof, comprising a scaffold sequence shown in SEQ ID NO 44, comprising a homologous sequence having at least 90% sequence identity to SEQ ID NO 44 and retaining its biological activity, or comprising an engineered sequence based on SEQ ID NO 44 and retaining its biological activity, such as the nucleic acid sequence shown in SEQ ID NO 46.
8. An isolated nucleic acid molecule comprising a nucleic acid sequence encoding the single stranded guide RNA of any one of claims 4 to 5;
for example, the isolated nucleic acid molecule comprises the nucleic acid sequence set forth in any one of SEQ ID NO:45 and SEQ ID NO:46, or a degenerate sequence thereof, and preferably further comprises a nucleic acid sequence encoding a CRISPR spacer sequence.
9. A vector comprising a nucleic acid sequence encoding:
a) a Cas9 protein, the Cas9 protein being:
1) a Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
a CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
a CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
a Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
a Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
a HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
a HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
an Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
an Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
an Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
an Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
an Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
an Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
an Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
an Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
an Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
an Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
an Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
an Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
an Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or
2) a homologue having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity within 80%-100%, to the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 and 21, and retaining its biological activity;
b) the conjugate of claim 2; or
c) A fusion protein of claim 3;
for example, the vector comprises a nucleic acid sequence as set forth in any one of SEQ ID NO 22, SEQ ID NO 23, SEQ ID NO 24, SEQ ID NO 25, SEQ ID NO 26, SEQ ID NO 27, SEQ ID NO 28, SEQ ID NO 29, SEQ ID NO 30, SEQ ID NO 31, SEQ ID NO 32, SEQ ID NO 33, SEQ ID NO 34, SEQ ID NO 35, SEQ ID NO 36, SEQ ID NO 37, SEQ ID NO 38, SEQ ID NO 39, SEQ ID NO 40, SEQ ID NO 41 and SEQ ID NO 42, or a degenerate sequence thereof;
for example, the vector is a plasmid vector (such as the pUC19 vector), an attachment vector, a pAAV2_ITR vector, a retroviral vector, a lentiviral vector, an adenoviral vector or an adeno-associated viral vector.
10. The vector of claim 9, further comprising a nucleic acid sequence encoding the single stranded guide RNA of any one of claims 4 to 5 that corresponds to the Cas9 protein;
for example, the vector comprises a nucleic acid sequence encoding a Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or a homologue, conjugate or fusion protein thereof (such as the nucleic acid sequence shown in SEQ ID NO: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 or 41), and a nucleic acid sequence (such as the nucleic acid sequence shown in SEQ ID NO: 45) encoding a single stranded guide RNA directed against the Cas9 protein or the homologue, conjugate or fusion protein thereof, the single stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO: 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 43 and retaining its biological activity, or an engineered sequence based on SEQ ID NO: 43 and retaining its biological activity;
for example, the vector comprises a nucleic acid sequence encoding an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or a homologue, conjugate or fusion protein thereof (such as the nucleic acid sequence shown in SEQ ID NO: 42), and a nucleic acid sequence (such as the nucleic acid sequence shown in SEQ ID NO: 46) encoding a single stranded guide RNA for the Nsp2Cas9 protein or the homologue, conjugate or fusion protein thereof, the single stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO: 44, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 44 and retaining its biological activity, or an engineered sequence based on SEQ ID NO: 44 and retaining its biological activity.
11. A vector comprising a nucleic acid sequence encoding the single stranded guide RNA of any one of claims 4 to 5;
for example, the vector comprises the nucleic acid sequence set forth in any one of SEQ ID NO:45 and SEQ ID NO:46, or a degenerate sequence thereof, and preferably further comprises a nucleic acid sequence encoding a CRISPR spacer.
12. A CRISPR/Cas9 gene editing system comprising:
a) a protein component comprising:
1) a Cas9 protein, the Cas9 protein being:
1.1) a Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
a CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
a CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
a Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
a Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
a HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
a HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
an Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
an Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
an Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
an Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
an Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
an Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
an Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
an Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
an Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
an Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
an Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
an Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
an Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or
1.2) a homologue having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity within 80%-100%, to the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21, and retaining its biological activity;
2) the conjugate of claim 2, or
3) A fusion protein of claim 3; and
b) a nucleic acid component comprising: the single stranded guide RNA of any one of claims 4 to 5 corresponding to the protein component of a);
and the protein component and the nucleic acid component bind to each other to form a complex;
for example, the protein component comprises a Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or a homologue, conjugate or fusion protein thereof, and the nucleic acid component comprises a single stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO: 43, a single stranded guide RNA comprising a homologous sequence having at least 90% sequence identity to SEQ ID NO: 43 and retaining its biological activity, or a single stranded guide RNA comprising an engineered sequence based on SEQ ID NO: 43 and retaining its biological activity;
for example, the protein component comprises an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID No. 21, a homologue, conjugate or fusion protein thereof, and the nucleic acid component comprises a single-stranded guide RNA that is a single-stranded guide RNA comprising a scaffold sequence shown in SEQ ID No. 44, a single-stranded guide RNA comprising a homologous sequence having at least 90% sequence identity with SEQ ID No. 44 and retaining its biological activity, or a single-stranded guide RNA comprising an engineered sequence based on SEQ ID No. 44 and retaining its biological activity.
13. A cell, comprising: the isolated nucleic acid molecule of any one of claims 6 to 8, or the vector of any one of claims 9 to 11;
for example, the cell is a prokaryotic cell or a eukaryotic cell, such as a plant cell or an animal cell, such as a mammalian cell, e.g., a human cell.
14. A method of gene editing a target sequence in an intracellular or in vitro environment, the method comprising: contacting any one of (1) to (4) below with a target sequence in an intracellular or in vitro environment:
(1) a Cas9 protein, a conjugate according to claim 2 or a fusion protein according to claim 3, and a single-stranded guide RNA corresponding to the Cas9 protein according to any one of claims 4 to 5,
wherein the Cas9 protein is:
1) a Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1,
a CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 2,
a CspCas9 protein having the amino acid sequence shown in SEQ ID NO: 3,
a Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 4,
a Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 5,
a HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 6,
a HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 7,
an Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO: 8,
an Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 9,
an Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 10,
an Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 11,
an Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO: 12,
an Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 13,
an Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO: 14,
an Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO: 15,
an Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 16,
an Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO: 17,
an Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO: 18,
an Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO: 19,
an Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO: 20, or
an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or
2) a homologue having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity within 80%-100%, to the amino acid sequence shown in any one of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 and 21, and retaining its biological activity;
for example, a Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or a homologue, conjugate or fusion protein thereof, and a single stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO: 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 43 and retaining its biological activity, or an engineered sequence based on SEQ ID NO: 43 and retaining its biological activity;
for example, an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO: 21, or a homologue, conjugate or fusion protein thereof, and a single stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO: 44, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 44 and retaining its biological activity, or an engineered sequence based on SEQ ID NO: 44 and retaining its biological activity;
(2) a vector according to claim 9 and a vector according to claim 11;
for example, a vector comprising a nucleic acid sequence encoding a Cas9 protein having the amino acid sequence shown in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in SEQ ID NO: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 or 41), and a vector comprising a nucleic acid sequence (e.g., the nucleic acid sequence shown in SEQ ID NO: 45) encoding a single stranded guide RNA directed against the Cas9 protein or the homologue, conjugate or fusion protein thereof, the single stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO: 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO: 43 and retaining its biological activity, or an engineered sequence based on SEQ ID NO: 43 and retaining its biological activity;
for example, a vector comprising a nucleic acid sequence encoding an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 21, a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in SEQ ID NO. 42), and a vector comprising a nucleic acid sequence encoding a single-stranded guide RNA for the Nsp2Cas9 protein, homologue, conjugate or fusion protein thereof comprising a scaffold sequence shown in SEQ ID NO. 44, comprising a homologous sequence having at least 90% sequence identity to SEQ ID NO. 44 and retaining its biological activity, or comprising an engineered sequence based on SEQ ID NO. 44 and retaining its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO. 46);
(3) a vector according to claim 10; and
(4) the CRISPR/Cas9 gene editing system of claim 12;
wherein, upon contact with the target sequence, the Cas9 protein, homologue, conjugate or fusion protein recognizes a corresponding protospacer adjacent motif (PAM) located at the 3' end of the target sequence, and for the Cco2Cas9 protein, CcuCas9 protein, CspCas9 protein, Hap1Cas9 protein, Hap2Cas9 protein, HgaCas9 protein, HtyCas9 protein, Hsp1Cas9 protein, Hsp2Cas9 protein, Hsp3Cas9 protein, Hsp4Cas9 protein, Hsp1-CcuCas9 protein, Hsp1-Hap2Cas9 protein, Hsp1-HgaCas9 protein, Hsp1-HtyCas9 protein, Hsp1-Hsp2Cas9 protein, Hsp1-Hsp3Cas9 protein, Hsp1-Hsp3Cas9-Y446A protein, Hsp1-Hsp3Cas9-K390A-Y446A protein, Hsp1-Hsp4Cas9 protein and Nsp2Cas9 protein, or their respective homologues, conjugates or fusion proteins, the PAM is 5'-NNNNCY, 5'-NNCNA, 5'-NNNNCYWT, 5'-NNNGCCKS, 5'-NNNGG, 5'-NNNCCC, 5'-NNRTTA, 5'-NNRAA, 5'-NNRYAT, 5'-NNNTCC, 5'-NNRT, 5'-NNCNA, 5'-NNNGG, 5'-NNCCAW, 5'-NNRTYR, 5'-NNRYAT, 5'-NNNNCY, 5'-NNCY, 5'-NNNNCY, 5'-NNRT and 5'-NNCC, respectively;
for example, the cell is a prokaryotic cell or a eukaryotic cell, such as a plant cell or an animal cell, such as a mammalian cell, e.g., a human cell;
for example, the gene editing comprises one or more of gene knockout, site-directed base alteration, site-directed insertion, regulation of gene transcription levels, regulation of DNA methylation, DNA acetylation modification, histone acetylation modification, single base conversion, and chromatin imaging and tracking of the target sequence; for example, the single base conversion comprises adenine-to-guanine conversion, cytosine-to-thymine conversion or cytosine-to-uracil conversion.
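The PAM requirements listed in claim 14 use IUPAC degenerate nucleotide codes (N = any base, R = A/G, Y = C/T, W = A/T, K = G/T, S = C/G). As a rough sketch, candidate target sites for a given protein can be located by translating its PAM into a regular expression and scanning for a protospacer immediately 5' of it. The four PAM strings below are copied from claim 14; the 22-nt protospacer length follows the preferred guide length of claim 5, and scanning only the given strand (not its reverse complement) is a simplifying assumption.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# A subset of the PAMs recited in claim 14, written 5'->3'.
PAMS = {
    "Cco2Cas9": "NNNNCY",
    "CcuCas9": "NNCNA",
    "Hap2Cas9": "NNNGG",
    "Nsp2Cas9": "NNCC",
}


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC PAM string into a regex pattern."""
    return "".join(IUPAC[code] for code in pam.upper())


def find_sites(dna: str, pam: str, spacer_len: int = 22) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each protospacer followed by the PAM.

    Only the given strand is scanned; the lookahead allows overlapping hits.
    """
    pattern = re.compile(f"(?=([ACGT]{{{spacer_len}}})({pam_to_regex(pam)}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    seq = "GACGCATAAAGATGAGACGCTG" + "ACTGG"  # hypothetical 27-nt sequence
    for protein, pam in PAMS.items():
        print(protein, pam, find_sites(seq, pam))
```

Wrapping the pattern in a lookahead is a design choice so that overlapping candidate sites are all reported rather than only the first of each overlapping group.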
15. The method of claim 14, wherein the CRISPR spacer sequence of the single stranded guide RNA forms a fully base-complementary paired structure with the target sequence and an incompletely base-complementary paired structure with a non-target sequence;
for example, the incompletely base-complementary paired structure includes one or more, e.g., two or more, base mismatches.
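Claim 15 distinguishes full complementarity with the target sequence from incomplete pairing, with one or more mismatches, with a non-target sequence. A hypothetical helper that counts mismatches between a protospacer and a candidate site of equal length, both written on the same strand as the guide, might look like this; it ignores bulges (insertions or deletions), which a real off-target analysis would also need to consider.

```python
def mismatch_positions(protospacer: str, site: str) -> list[int]:
    """0-based positions at which the candidate site differs from the protospacer."""
    if len(protospacer) != len(site):
        raise ValueError("sequences must have equal length (bulges are not handled)")
    return [i for i, (a, b) in enumerate(zip(protospacer.upper(), site.upper())) if a != b]


def pairing_structure(protospacer: str, site: str) -> str:
    """Classify a site as fully complementary to the guide or incompletely paired."""
    n = len(mismatch_positions(protospacer, site))
    if n == 0:
        return "fully complementary"
    return f"incomplete ({n} mismatch{'es' if n != 1 else ''})"


if __name__ == "__main__":
    on_target = "GACGCATAAAGATGAGACGCTG"  # hypothetical target protospacer
    # Hypothetical non-target site with substitutions at positions 3 and 10.
    off_target = on_target[:3] + "A" + on_target[4:10] + "C" + on_target[11:]
    print(pairing_structure(on_target, on_target))   # fully complementary
    print(pairing_structure(on_target, off_target))  # incomplete (2 mismatches)
```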
16. A kit for gene editing of a target sequence in an intracellular or in vitro environment, comprising:
a) any one selected from the following 1) to 6):
1) a Cas9 protein, a conjugate according to claim 2, or a fusion protein according to claim 3, and a single-stranded guide RNA corresponding to the Cas9 protein according to claim 4 or 5,
wherein the Cas9 protein is:
a) a Cco2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 1,
a CcuCas9 protein having the amino acid sequence shown in SEQ ID NO. 2,
a CspCas9 protein having the amino acid sequence shown in SEQ ID NO. 3,
an Hap1Cas9 protein having the amino acid sequence shown in SEQ ID NO. 4,
an Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 5,
an HgaCas9 protein having the amino acid sequence shown in SEQ ID NO. 6,
an HtyCas9 protein having the amino acid sequence shown in SEQ ID NO. 7,
an Hsp1Cas9 protein having the amino acid sequence shown in SEQ ID NO. 8,
an Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 9,
an Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO. 10,
an Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO. 11,
an Hsp1-CcuCas9 protein having the amino acid sequence shown in SEQ ID NO. 12,
an Hsp1-Hap2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 13,
an Hsp1-HgaCas9 protein having the amino acid sequence shown in SEQ ID NO. 14,
an Hsp1-HtyCas9 protein having the amino acid sequence shown in SEQ ID NO. 15,
an Hsp1-Hsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 16,
an Hsp1-Hsp3Cas9 protein having the amino acid sequence shown in SEQ ID NO. 17,
an Hsp1-Hsp3Cas9-Y446A protein having the amino acid sequence shown in SEQ ID NO. 18,
an Hsp1-Hsp3Cas9-K390A-Y446A protein having the amino acid sequence shown in SEQ ID NO. 19,
an Hsp1-Hsp4Cas9 protein having the amino acid sequence shown in SEQ ID NO. 20, or
an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 21, or
b) a homologue having an amino acid sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, at least 99.95%, at least 99.99%, at least 99.999% or 100% sequence identity, or any sequence identity from 80% to 100%, to the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 21, and retaining its biological activity;
for example, a Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 20, a homologue thereof having an amino acid sequence with at least 80% sequence identity to any one of SEQ ID NO. 1 to SEQ ID NO. 20, or a conjugate or fusion protein thereof, together with a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO. 43, a single-stranded guide RNA comprising a homologous sequence having at least 90% sequence identity to SEQ ID NO. 43 and retaining its biological activity, or a single-stranded guide RNA comprising an engineered sequence based on SEQ ID NO. 43 and retaining its biological activity;
for example, an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 21, homologues thereof having an amino acid sequence with at least 80% sequence identity to SEQ ID NO. 21, or conjugates or fusion proteins thereof, together with a single-stranded guide RNA comprising the scaffold sequence shown in SEQ ID NO. 44, a single-stranded guide RNA comprising a homologous sequence having at least 90% sequence identity to SEQ ID NO. 44 and retaining its biological activity, or a single-stranded guide RNA comprising an engineered sequence based on SEQ ID NO. 44 and retaining its biological activity;
2) the isolated nucleic acid molecule according to claim 6 and the isolated nucleic acid molecule according to claim 8;
for example, an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 20, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in any one of SEQ ID NO. 22 to SEQ ID NO. 41), and an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a single-stranded guide RNA for the Cas9 protein, homologue, conjugate or fusion protein thereof, the guide RNA comprising the scaffold sequence shown in SEQ ID NO. 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO. 43 and retaining its biological activity, or an engineered sequence based on SEQ ID NO. 43 and retaining its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO. 45);
for example, an isolated nucleic acid molecule comprising a nucleic acid sequence encoding an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 21, a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in SEQ ID NO. 42), and an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a single-stranded guide RNA for the Nsp2Cas9 protein, homologue, conjugate or fusion protein thereof comprising a scaffold sequence shown in SEQ ID NO. 44, comprising a homologous sequence having at least 90% sequence identity to SEQ ID NO. 44 and retaining its biological activity, or comprising an engineered sequence based on SEQ ID NO. 44 and retaining its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO. 46);
3) the isolated nucleic acid molecule of claim 7;
4) a vector according to claim 9 and a vector according to claim 11;
for example, a vector comprising a nucleic acid sequence encoding a Cas9 protein having the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 20, or a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in any one of SEQ ID NO. 22 to SEQ ID NO. 41), and a vector comprising a nucleic acid sequence encoding a single-stranded guide RNA for the Cas9 protein, homologue, conjugate or fusion protein thereof, the guide RNA comprising the scaffold sequence shown in SEQ ID NO. 43, a homologous sequence having at least 90% sequence identity to SEQ ID NO. 43 and retaining its biological activity, or an engineered sequence based on SEQ ID NO. 43 and retaining its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO. 45);
for example, a vector comprising a nucleic acid sequence encoding an Nsp2Cas9 protein having the amino acid sequence shown in SEQ ID NO. 21, a homologue, conjugate or fusion protein thereof (e.g., the nucleic acid sequence shown in SEQ ID NO. 42), and a vector comprising a nucleic acid sequence encoding a single-stranded guide RNA for the Nsp2Cas9 protein, homologue, conjugate or fusion protein thereof comprising a scaffold sequence shown in SEQ ID NO. 44, comprising a homologous sequence having at least 90% sequence identity to SEQ ID NO. 44 and retaining its biological activity, or comprising an engineered sequence based on SEQ ID NO. 44 and retaining its biological activity (e.g., the nucleic acid sequence shown in SEQ ID NO. 46);
5) a vector according to claim 10; or
6) the CRISPR/Cas9 gene editing system according to claim 12;
and
b) instructions for how to perform gene editing of a target sequence in an intracellular or in vitro environment.
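Claim 16 pairs each protein group with a specific sgRNA scaffold: the Cas9 proteins of SEQ ID NO. 1 to SEQ ID NO. 20 with the scaffold of SEQ ID NO. 43 (example encoding sequence SEQ ID NO. 45), and Nsp2Cas9 (SEQ ID NO. 21) with the scaffold of SEQ ID NO. 44 (example encoding sequence SEQ ID NO. 46). The Python below is a hypothetical sketch that records these pairings in a lookup table and adds a percent-identity helper for the homologue thresholds. The identity convention (identical columns divided by alignment columns containing at least one residue, computed on an alignment produced elsewhere) is an assumption, since this excerpt does not define how identity is calculated; no code can check the separate requirement that a homologue retain biological activity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GuidePairing:
    protein_seq_ids: range        # amino acid SEQ ID NOs covered by this entry
    scaffold_seq_id: int          # sgRNA scaffold SEQ ID NO
    guide_coding_seq_id: int      # example sgRNA-encoding SEQ ID NO


# Pairings as listed in claim 16 (hypothetical data structure).
PAIRINGS: Tuple[GuidePairing, ...] = (
    GuidePairing(range(1, 21), 43, 45),   # SEQ ID NO. 1-20
    GuidePairing(range(21, 22), 44, 46),  # Nsp2Cas9, SEQ ID NO. 21
)


def scaffold_for(protein_seq_id: int) -> int:
    """Return the scaffold SEQ ID NO paired with a claimed protein."""
    for pairing in PAIRINGS:
        if protein_seq_id in pairing.protein_seq_ids:
            return pairing.scaffold_seq_id
    raise KeyError(f"SEQ ID NO. {protein_seq_id} is not a claimed Cas9 protein")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residue columns divided by all columns in
    which at least one sequence has a residue (gap-gap columns are ignored,
    so a == b below can only be true for a residue, never for a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def meets_threshold(aligned_ref: str, aligned_query: str,
                    threshold: float = 80.0) -> bool:
    """True if the query reaches the given identity threshold (default 80%)."""
    return percent_identity(aligned_ref, aligned_query) >= threshold
```

For example, `scaffold_for(21)` returns 44, and `percent_identity("ACD-", "ACDE")` returns 75.0 because three of the four informative columns are identical.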
CN202110878452.XA 2021-07-30 2021-07-30 Cas9 protein, gene editing system containing Cas9 protein and application Pending CN113652411A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110878452.XA CN113652411A (en) 2021-07-30 2021-07-30 Cas9 protein, gene editing system containing Cas9 protein and application

Publications (1)

Publication Number Publication Date
CN113652411A true CN113652411A (en) 2021-11-16

Family

ID=78478229

Country Status (1)

Country Link
CN (1) CN113652411A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105331609A (en) * 2015-10-20 2016-02-17 Wuhu Yinuo Biotechnology Co., Ltd. Human CCR5 gene target sequence recognized by the Neisseria meningitidis CRISPR-Cas9 system, sgRNA, and application of the target sequence and sgRNA
CN110358753A (en) * 2019-07-29 2019-10-22 Southern Medical University Fusion protein based on CjCas9 and the VPR core domain, corresponding DNA-targeted activation system, and application thereof
CN110551761A (en) * 2019-08-08 2019-12-10 Fudan University CRISPR/Sa-SepCas9 gene editing system and application thereof
CN111868240A (en) * 2017-11-10 2020-10-30 University of Massachusetts Targeted CRISPR delivery platforms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination