WO2020007325A1

WO2020007325A1 - Cas9 variants and application thereof

Info

Publication number: WO2020007325A1
Application number: PCT/CN2019/094585
Authority: WO
Inventors: Zhen XIE; Dacheng Ma; Zhaoyu Zhang; Zhimeng XU
Original assignee: Tsinghua University
Priority date: 2018-07-05
Filing date: 2019-07-03
Publication date: 2020-01-09
Also published as: CN110684755A; CN110684755B

Abstract

Provided is a Cas9 variant, comprising: a first backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the first backbone region of a wild-type Cas9; a protospacer adjacent motif (PAM) interaction region, being a 13-amino acid sequence derived from the PAM interaction region of an ortholog of the wild-type Cas9, and having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the PAM interaction region of the ortholog of the wild-type Cas9; and a second backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the second backbone region of the wild-type Cas9; wherein an N-terminus of the PAM interaction region is connected to a C-terminus of the first backbone region, and a C-terminus of the PAM interaction region is connected to an N-terminus of the second backbone region, and wherein the Cas9 variant has recognition capability at a PAM sequence selected from the group consisting of NNVRRN, NNVACT, NNVATG, NNVATT, NNVGCT, NNVGTG and NNVGTT PAM sequences, wherein N is adenine (A), thymine (T), cytosine (C) or guanine (G); R is adenine (A) or guanine (G); and V is adenine (A), cytosine (C) or guanine (G).

Description

CAS9 VARIANTS AND APPLICATION THEREOF

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority to and benefit of Chinese Patent Application No. 201810731984.9 filed with the National Intellectual Property Administration of PRC on July 5, 2018, the entire content of which is incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with support of the National Natural Science Foundation of China (31771483 and 61721003), the Tsinghua Basic Research Program, and the Basic Research Program of Beijing National Research Center for Information Science and Technology.

FIELD

The present disclosure relates to the field of biotechnology, in particular to a Cas9 variant, a nucleic acid, a kit, and a method for gene editing.

BACKGROUND

CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems are prokaryotic adaptive immune systems that provide protection against infection. CRISPR systems have been found in half of all sequenced bacterial genomes and nearly all archaeal genomes, and the CRISPR nucleases are highly diverse. However, only a few CRISPR nucleases have so far been shown to be functional in mammalian cells. Several Cas9 orthologs from microbial type II CRISPR systems have been widely applied for targeted gene and base editing, transcriptional modulation, and epigenetic modification in the mammalian genome. Targeting of a specific genomic site (protospacer) is programmed by base-pairing with a chimeric guide RNA (gRNA) bound to the Cas9 endonuclease. In addition, a short protospacer adjacent motif (PAM) is required for target recognition by the Cas9:gRNA complex, which significantly restricts the range of genomic sequences that are targetable by the CRISPR/Cas9 system.
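The PAM requirement described above can be illustrated with a minimal Python sketch that scans one DNA strand for protospacers immediately followed by a degenerate PAM. The function names and the 21-nt spacer length are illustrative assumptions, not values taken from this disclosure; only the IUPAC codes N, R and V used in this document are supported.

```python
import re

# IUPAC nucleotide codes used in this document (N, R, V) plus the four bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "V": "ACG"}


def pam_to_regex(pam):
    """Translate a degenerate PAM such as 'NNGRRT' into a regex fragment."""
    return "".join("[" + IUPAC[code] + "]" for code in pam)


def find_target_sites(dna, pam, spacer_len=21):
    """Return (start, protospacer, pam_match) for each site on the given strand.

    The PAM is assumed to lie immediately 3' of the protospacer.
    spacer_len is a hypothetical default chosen for illustration only.
    """
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile("(?=([ACGT]{%d})(%s))" % (spacer_len, pam_to_regex(pam)))
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    toy_dna = "ACGTACGTACGTACGTACGTA" + "TTGAAT" + "CC"  # toy sequence, not real data
    for site in find_target_sites(toy_dna, "NNGRRT"):
        print(site)
```

Only the strand passed in is scanned; a fuller search would also scan the reverse complement.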

Although Cas9 nucleases are remarkably diverse in microorganisms, the range of genomic sequences targetable by a CRISPR/Cas9 system is restricted by the requirement for a PAM at the target site. Meanwhile, few of the Cas9 orthologs identified have been verified to be effective for genome editing in mammalian cells.

Cas9 from Staphylococcus aureus (SaCas9) has been identified as a compact Cas9 ortholog suitable for viral delivery for biomedical applications, and it displays activity comparable to that of SpCas9 in mammalian cells. Further, a SaCas9 variant with E782K/N968K/R1015H triple mutations (SaCas9-KKH variant) has been identified that recognizes an NNNRRT PAM (N=A, T, C or G; R=A or G), compared with the NNGRRT PAM of wild-type SaCas9, theoretically giving access to 1/16 of all possible genomic targets in mammalian cells. However, the PAM compatibility of SaCas9 and its SaCas9-KKH variant is still limited, so they cannot be widely used in genome editing for treating diseases and other biomedical applications.
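The 1/16 figure follows from counting the bases allowed at each PAM position. The sketch below is a hypothetical helper, assuming the four bases are equally likely and independent at every position (a simplification that real genomes do not satisfy exactly). It computes the fraction of random 6-mers matched by a degenerate PAM.

```python
from fractions import Fraction

# Number of bases allowed by each IUPAC code used in this document.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "V": 3, "N": 4}


def pam_fraction(pam):
    """Fraction of all sequences of len(pam) that match the degenerate PAM,
    assuming uniform and independent base composition."""
    fraction = Fraction(1)
    for code in pam:
        fraction *= Fraction(IUPAC_SIZE[code], 4)
    return fraction


if __name__ == "__main__":
    for pam in ("NNGRRT", "NNNRRT", "NNVRRN"):
        print(pam, pam_fraction(pam))
```

Under this assumption NNGRRT matches 1/64 of sites, NNNRRT matches 1/16, and NNVRRN matches 3/16.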

Therefore, there is currently a need in the art for improved Cas9 with expanded PAM compatibility so as to improve the genome editing capability for the CRISPR/Cas9 system.

SUMMARY

Embodiments of the present disclosure seek to solve at least one of the problems existing in the related art to at least some extent, or to provide a useful commercial alternative at least.

For this purpose, in one aspect, the present disclosure in embodiments provides a Cas9 variant, comprising: a first backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the first backbone region of a wild-type Cas9; a protospacer adjacent motif (PAM) interaction region, being a 13-amino acid sequence derived from the PAM interaction region of an ortholog of the wild-type Cas9, and having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the PAM interaction region of the ortholog of the wild-type Cas9; and a second backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the second backbone region of the wild-type Cas9; wherein an N-terminus of the PAM interaction region is connected to a C-terminus of the first backbone region, and a C-terminus of the PAM interaction region is connected to an N-terminus of the second backbone region, and wherein the Cas9 variant has recognition capability at a PAM sequence selected from the group consisting of NNVRRN, NNVACT, NNVATG, NNVATT, NNVGCT, NNVGTG and NNVGTT PAM sequences, wherein N is adenine (A), thymine (T), cytosine (C) or guanine (G); R is adenine (A) or guanine (G); and V is adenine (A), cytosine (C) or guanine (G).
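As a reading aid for the degenerate PAM notation above, the following Python sketch checks which of the listed PAM patterns a concrete 6-nucleotide sequence satisfies. The function names are hypothetical; the pattern list and the meanings of N, R and V are taken from the paragraph above. A match here only means the sequence fits the notation, not that any particular variant cleaves at it.

```python
# PAM patterns listed in the paragraph above.
LISTED_PAMS = ["NNVRRN", "NNVACT", "NNVATG", "NNVATT",
               "NNVGCT", "NNVGTG", "NNVGTT"]

# N = A, T, C or G; R = A or G; V = A, C or G (as defined in the text).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "V": "ACG"}


def matches(pattern, seq):
    """True if the concrete sequence fits the degenerate pattern position by position."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq))


def matching_patterns(seq):
    """Return every listed PAM pattern that the 6-nt sequence satisfies."""
    seq = seq.upper()
    return [pattern for pattern in LISTED_PAMS if matches(pattern, seq)]


if __name__ == "__main__":
    for pam in ("CCCAGT", "CCAACT", "CCTAGT"):
        print(pam, matching_patterns(pam))
```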

In some embodiments, the wild-type Cas9 is derived from Micrococcus, Staphylococcus, Planococcus, Streptococcus, Leuconostoc, Pediococcus, Aerococcus or Gemella; preferably, Staphylococcus comprises Staphylococcus aureus, Staphylococcus epidermidis and Staphylococcus saprophyticus; preferably, Streptococcus comprises Streptococcus pyogenes, Streptococcus equisimilis, Streptococcus zooepidemicus, Streptococcus equi, Streptococcus dysgalactiae, Streptococcus sanguis, Streptococcus pneumoniae, Streptococcus anginosus, Streptococcus agalactiae, Streptococcus acidominimus, Streptococcus salivarius, Streptococcus mitis, Streptococcus bovis, Streptococcus equinus, Streptococcus thermophilus, Streptococcus faecalis, Streptococcus faecium, Streptococcus avium, Streptococcus uberis, Streptococcus lactis, Streptococcus cremoris and Streptococcus canis; preferably, the wild-type Cas9 is derived from Staphylococcus aureus.

In some embodiments, the first and second backbone regions each independently have one or more amino acid mutations compared to the first and second backbone regions of the wild-type Cas9, respectively, wherein each amino acid mutation is a substitution, a deletion or an addition.

In some embodiments, the first backbone region comprises the amino acid mutation selected from the group consisting of:

a substitution of Alanine (A) for Arginine (R) at position 499,

a substitution of Lysine (K) for Glutamine (Q) at position 500,

a substitution of Alanine (A) for Arginine (R) at position 654,

a substitution of Arginine (R) for Glycine (G) at position 655,

a substitution of Lysine (K) for Glutamic acid (E) at position 782, and

a substitution of Lysine (K) for Asparagine (N) at position 968,

optionally, the second backbone region comprises an amino acid mutation at position 1015, wherein the amino acid mutation is a substitution of Histidine (H) for Arginine (R),

preferably, the first backbone region comprises a substitution of Lysine (K) for Glutamic acid (E) at position 782 and a substitution of Lysine (K) for Asparagine (N) at position 968, and the second backbone region comprises a substitution of Histidine (H) for Arginine (R) at position 1015.
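The substitutions listed above are commonly abbreviated in the form wild-type residue, position, new residue (for example E782K, N968K and R1015H, as used elsewhere in this document). The sketch below shows one way to apply such substitutions to a protein sequence string. It is a minimal illustration with a hypothetical function name and a toy five-residue sequence, since the actual sequences (SEQ ID NOs) are not reproduced in this text.

```python
import re


def apply_substitutions(seq, mutations):
    """Apply point substitutions written as <wild-type><position><new>, e.g. 'E782K'.

    Positions are 1-based. A ValueError is raised if the residue found at the
    position does not match the stated wild-type residue.
    """
    residues = list(seq)
    for mutation in mutations:
        parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if parsed is None:
            raise ValueError("cannot parse mutation: " + mutation)
        wild_type, position, new = parsed.group(1), int(parsed.group(2)), parsed.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError("position out of range: " + mutation)
        if residues[position - 1] != wild_type:
            raise ValueError("wild-type residue mismatch: " + mutation)
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    # Toy sequence mirroring the E->K, N->K, R->H pattern at small positions.
    print(apply_substitutions("MERNR", ["E2K", "N4K", "R5H"]))
```

Checking the wild-type residue before substituting guards against numbering offsets, for example when a tag or signal peptide shifts positions.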

In some embodiments, the first backbone region consists of the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 130; optionally the second backbone region consists of the amino acid sequence of SEQ ID NO: 2.
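The three-part architecture (first backbone region, 13-amino acid PAM interaction region, second backbone region) can be pictured as simple string concatenation. The sketch below is a hypothetical illustration using toy strings; it assumes the PAM interaction region corresponds to residues 982 to 994 of the parent protein, as suggested by the 982–994 peptide fragments mentioned in the description of Figure 2, which would place positions 1 to 981 in the first backbone region and positions 995 onward in the second.

```python
def assemble_chimera(first_backbone, pam_region, second_backbone, pam_len=13):
    """Join first backbone, PAM interaction region and second backbone (N- to C-terminus)."""
    if len(pam_region) != pam_len:
        raise ValueError("PAM interaction region must be %d residues long" % pam_len)
    return first_backbone + pam_region + second_backbone


def swap_region(parent, donor_region, start, end):
    """Replace residues start..end (1-based, inclusive) of parent with donor_region."""
    if end - start + 1 != len(donor_region):
        raise ValueError("donor region length does not match the replaced span")
    return parent[:start - 1] + donor_region + parent[end:]


if __name__ == "__main__":
    # Toy example with a 3-residue "PAM region"; real use would pass pam_len=13
    # and start=982, end=994 under the assumption stated above.
    print(swap_region("AAAAABBBCCCC", "XYZ", 6, 8))
    print(assemble_chimera("AAAAA", "XYZ", "CCCC", pam_len=3))
```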

In some embodiments, the ortholog of Staphylococcus aureus is selected from Absiella dolichum, Clostridium cocleatum, Veillonella parvula, Alkalibacterium gilvum, Alkalibacterium sp. 20, Lacticigenium naphtae, Alkalibacterium subtropicum, Carnobacterium iners, Carnobacterium viridans, Jeotgalibaca sp. PTS2502, Listeria ivanovii sp. londoniensis, Bacillus massilionigeriensis, Bacillus niameyensis, Ureibacillus thermosphaericus-1, Ureibacillus thermosphaericus-2, Halalkalibacillus halophilus, Paraliobacillus ryukyuensis, Sediminibacillus albus, Virgibacillus senegalensis, Pelagirhabdus alkalitolerans, Massilibacterium senegalense, Macrococcus sp. IME 1552, Staphylococcus (from multispecies), Staphylococcus simulans, Staphylococcus sp. HMSC061G12, Staphylococcus massiliensis, Staphylococcus microti, Staphylococcus haemolyticus, Staphylococcus sp. HMSC34C02, Staphylococcus warneri, Staphylococcus schleiferi, Staphylococcus agnetis and Staphylococcus lutrae.

In some embodiments, the ortholog of Staphylococcus aureus is further selected from Sediminibacillus albus, Staphylococcus schleiferi, Staphylococcus simulans, Staphylococcus sp. HMSC061G12, Staphylococcus agnetis, Clostridium cocleatum, Absiella dolichum, Staphylococcus warneri, Staphylococcus microti, Massilibacterium senegalense, Lacticigenium naphtae and Halalkalibacillus halophilus.

In some embodiments, the ortholog of Staphylococcus aureus is further selected from Sediminibacillus albus, Staphylococcus schleiferi, Staphylococcus warneri and Staphylococcus microti.

In some embodiments, the PAM interaction region further has one or more amino acid mutations compared to the PAM interaction region of the ortholog of the wild-type Cas9, wherein each amino acid mutation is a substitution, a deletion or an addition.

In some embodiments, the PAM interaction region has an amino acid mutation at position 991 compared to the PAM interaction region of Staphylococcus schleiferi, wherein the amino acid mutation is a substitution of Lysine (K) or Leucine (L) for Isoleucine (I).

In some embodiments, the PAM interaction region has an amino acid mutation at position 991 compared to the PAM interaction region of Staphylococcus warneri, wherein the amino acid mutation is a substitution of Lysine (K), Leucine (L) or Arginine (R) for Isoleucine (I).

In some embodiments, the PAM interaction region consists of the amino acid sequence of SEQ ID NO: 3 to SEQ ID NO: 43; preferably the PAM interaction region consists of the amino acid sequence selected from SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 16, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 27, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 36 to SEQ ID NO: 43; preferably the PAM interaction region consists of the amino acid sequence selected from SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 13, SEQ ID NO: 33, SEQ ID NO: 36 to SEQ ID NO: 43.

In some embodiments, the Cas9 variant consists of the amino acid sequence selected from SEQ ID NO: 44 to SEQ ID NO: 84, SEQ ID NO: 131 and SEQ ID NO: 133; preferably the Cas9 variant consists of the amino acid sequence selected from SEQ ID NO: 49, SEQ ID NO: 54, SEQ ID NO: 74, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 83, SEQ ID NO: 84 and SEQ ID NO: 131.

In some embodiments, the Cas9 variant is modified with a substitution of Alanine (A) for Arginine (R) at position 499, a substitution of Lysine (K) for Glutamine (Q) at position 500, a substitution of Alanine (A) for Arginine (R) at position 654 and a substitution of Arginine (R) for Glycine (G) at position 655, so as to decrease off-target activity and increase fidelity.

In another aspect, the present disclosure in embodiments provides a nucleic acid encoding the Cas9 variant described above.

In some embodiments, the nucleic acid consists of the nucleotide sequence selected from SEQ ID NO: 85 to SEQ ID NO: 125 and SEQ ID NO: 132.

In still another aspect, the present disclosure in embodiments provides an expression vector comprising:

an encoding sequence comprising:

a first nucleic acid sequence encoding the Cas9 variant described above; and

a second nucleic acid sequence encoding a scaffold of a guide RNA (gRNA) which specifically directs the cleavage of a target gene to be edited by the Cas9 variant,

optionally a regulatory element, operably linked to the encoding sequence and configured to be suitable for expression of the encoding sequence in a cell to be edited.

In some embodiments, the first nucleic acid sequence consists of a nucleotide sequence selected from SEQ ID NO: 85 to SEQ ID NO: 125 and SEQ ID NO: 132.

In some embodiments, the second nucleic acid sequence consists of a nucleotide sequence selected from SEQ ID NO: 126 to SEQ ID NO: 129,

preferably the scaffold of the gRNA encoded by the second nucleic acid sequence comprises at least one nucleotide mutation in a first stem-loop of the scaffold compared to the scaffold of the gRNA of a wild-type Cas9, wherein the nucleotide mutation in the first stem-loop is selected from 3rd Uracil (U) to Cytosine (C), 4th Uracil (U) to Adenine (A), 4th Uracil (U) to Cytosine (C), 5th Uracil (U) to Cytosine (C), 6th Adenine (A) to Guanine (G), 32nd Adenine (A) to Guanine (G), 31st Adenine (A) to Thymine (T), 31st Adenine (A) to Guanine (G), 30th Adenine (A) to Guanine (G) and 29th Thymine (T) to Cytosine (C).

In some embodiments, the regulatory element comprises a T7 promoter, an arabinose promoter phoA, tac, lpp, lac-lpp, lac, trp and trc, a CMV promoter, a RSV promoter, an SV40 promoter, an HSV promoter, a human Pol I promoter, human Pol II promoter or human Pol III promoter.

In some embodiments, the expression vector is an adenovirus vector, a lentiviral vector or a plasmid.

In yet another aspect, the present disclosure in embodiments provides a method for producing the expression vector described above, comprising transfecting the expression vector into a host cell, and isolating the expression vector from the host cell.

In a further aspect, the present disclosure in embodiments provides a kit for gene editing, comprising: a first nucleic acid molecule encoding the Cas9 variant described above; a second nucleic acid molecule encoding a scaffold of a gRNA that specifically directs the cleavage of a target gene to be edited by the Cas9 variant, a buffer suitable for gene editing, and an instruction for use of the kit, optionally one or more containers suitable for gene editing, wherein the first nucleic acid molecule and the second nucleic acid molecule are loaded in an expression vector.

In yet a further aspect, the present disclosure in embodiments provides a method for gene editing, comprising transfecting the expression vector described above into a cell to be edited.

In some embodiments, the encoding sequence of the expression vector is determined based on the target gene to be edited.

In some embodiments, the encoding sequence encodes a Cas9 variant selected from SEQ ID NO: 49, SEQ ID NO: 54, SEQ ID NO: 74, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 131 and SEQ ID NO: 133.

In some embodiments, the cell to be edited is a prokaryotic cell or a eukaryotic cell; optionally, the cell to be edited is derived from animal, plant or microbe.

In a still further aspect, the present disclosure in embodiments provides a composition comprising the expression vector described above and a carrier material.

In yet still another aspect, the present disclosure in embodiments provides a pharmaceutical composition comprising the expression vector described above and a pharmaceutically acceptable carrier material.

In yet still another aspect, the present disclosure in embodiments provides a method for treating or preventing a disease in a subject, comprising administering a therapeutically effective amount of the composition described above or the pharmaceutical composition described above to the subject in need thereof, wherein the subject is preferably a human.

In yet still another aspect, the present disclosure in embodiments provides a pharmaceutical preparation for use in treating or preventing a disease in a subject, the pharmaceutical preparation being the composition described above or the pharmaceutical composition described above, wherein a therapeutically effective amount of the pharmaceutical preparation is administered to the subject in need thereof, wherein the subject is preferably a human.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or additional aspects and advantages of the present disclosure will become apparent and easily understood from the description of embodiments in combination with the accompanying drawings, in which:

Figure 1 is a graph showing a phylogenetic tree of Cas9 orthologs, where the phylogenetic tree is constructed using Geneious R8 software.

Figure 2 is a graph showing the sequence alignment of the 982–994 peptide fragments of 33 orthologs.

Figure 3 is a graph showing sequence alignment of crRNA direct repeats in the corresponding species genome.

Figure 4 is a graph showing the wild-type and two optimized gRNA scaffolds for SaCas9 variants. The optimized gRNA-1 was developed previously. Cleavage activities of SaCas9-KKH (KKH) at CCCNNN PAMs with the wild-type gRNA scaffold or the optimized gRNA-2 scaffold are evaluated.

Figure 5 is a graph showing a schematic of the EYFP reconstitution assay that is used to evaluate the cleavage activity of cCas9 variants at target sites containing indicated PAM sequences.

Figure 6 is a graph showing results of experiments performed by using the EYFP reconstitution assay 3 days after transfection into HEK293FT cells.

Figure 7 is a graph illustrating the gating strategy of the EYFP reconstitution assay.

Figure 8 is a graph showing comparison of SaCas9 and SaCas9-KKH with the wild-type gRNA or optimized gRNA-2 at sites with CCCRRN and CCGRRN PAMs.

Figure 9 is a graph showing relative fluorescence intensity of EYFP measured by using flow cytometer 3 days after transfection into HEK293FT cells.

Figure 10 is a graph showing sequence alignment of the key 13-aa region of the PI domain in SaCas9-KKH, cCas9 v42, and v17.

Figure 11 is a graph showing functional characterization of SaCas9-KKH and cCas9 v42 at CCCNNN PAMs.

Figure 12 is a graph showing functional comparison of v42-KKH, v42-wild-type (v42-wt), SaCas9 and SaCas9-KKH at CCGRRN and CCCRRN across eight plasmid doses by using the EYFP reconstitution assay. Data indicate the mean ± s.e.m. (n = 3 independent biological replicates). The relative EYFP fluorescence intensity (y-axis) is measured by FACS.

Figure 13 is a graph showing functional characterization of v17K, v17L, cCas9 v42 and SaCas9-KKH at RRV PAMs.

Figure 14 is a graph showing PAM preference of SaCas9-KKH based variants. Cleavage activities of SaCas9-KKH with the R991K mutation and SaCas9-KKH with D987N and R991K double mutations at CCCRRN PAMs by using the EYFP reconstitution assay (n=1).

Figure 15 is a graph showing comparison of variants at CCNRRN PAMs. Cleavage activities of SaCas9-KKH, SaCas9, cCas9 v42 and v42-wild-type (v42-wt) at CCNRRN PAMs using the EYFP reconstitution assay. Data represent the mean (n = 3 independent biological replicates).

Figure 16 is a graph showing indel frequencies induced by v42 or SaCas9-KKH at sites with NNNRRN PAMs measured by next-generation sequencing. Each point represents the mean of one endogenous site (n = 3 independent biological replicates).

Figure 17 is a graph including: a. Indel frequencies at sites generated by each gRNA measured by using next-generation sequencing; b. Sequence logo showing the sequence preference in different PAM positions. Error bar, s.e.m.; *p<0.05 (paired t-test, two-tailed); **p<0.01 (paired t-test, two-tailed); ***p<0.001 (paired t-test, two-tailed); ****p<0.0001 (paired t-test, two-tailed). (n = 3 independent biological replicates).

Figure 18 is a graph showing indel frequency at NNVRRN PAMs. SaCas9, SaCas9-KKH, cCas9 v42 and v42-wild-type (v42-wt) are transfected along with different gRNAs targeting endogenous sites with indicated NNVRRN PAMs (V=A, C or G). Indel frequencies are measured by using next-generation sequencing. Each point represents the mean of one endogenous site. Black line indicates the mean indel frequency of all of the targets. **P<0.01 (paired t-test, two-tailed); ***P<0.001 (paired t-test, two-tailed); ****P<0.0001 (paired t-test, two-tailed).

Figure 19 is a graph showing PAM preference of cCas9 v17 variants. a. Cleavage activities of cCas9 v17, v17 with the I991K mutation (v17-K) and v17 with the I991L mutation (v17-L) at CCCRRN PAMs by using the EYFP reconstitution assay. b. Indel frequencies generated by SaCas9-KKH, cCas9 v17-L and v42 at 37 different endogenous target sites with NNNRRV PAMs. Each point represents the mean of three independent biological replicates targeting one endogenous site. The black line indicates the mean value of all targets. ****P<0.0001 (paired t-test, two-tailed); **P<0.01 (paired t-test, two-tailed).

Figure 20 is a graph showing functional characterization of v17K, v17L, v42 and SaCas9-KKH at RRV PAMs.

Figure 21 is a graph including: a) the high activity of v17K, v42 and SaCas9-KKH at RRV PAMs; b) transactivation efficiency of SaCas9 variants. The IL1RN expression level is assayed using RT-PCR 4 days after transfecting dSaCas9-KKH:VPR or deactivated cCas9-v42:VPR in HEK293FT cells along with the corresponding gRNAs targeting varying sites with indicated NNNRRN PAMs in the IL1RN promoter region.

Figure 22 is a graph showing sequence alignment of the key 13-aa region of the PI domain in SaCas9-KKH (KKH), cCas9 v16 and v21.

Figure 23 is a graph showing functional characterization of SaCas9 variants at CCCNNN PAMs. Data indicate the mean (n=3 independent biological replicates) activities of indicated cCas9 variants and SaCas9-KKH.

Figure 24 is a graph showing that SaCas9, SaCas9-KKH, cCas9 v21 with the I991R mutation (v21-R), and v21 with the I991R mutation and the wild-type SaCas9 scaffold instead of the SaCas9-KKH scaffold (v21R-wt) are used to target the fluorescent reporter gene containing sites with indicated PAMs.

Figure 25 is a graph showing that cleavage activities of SaCas9-KKH (KKH), cCas9 v21-R, and v21-R-HF (containing R499A, Q500K, R654A, and G655R mutations) at on-target and off-target sites with dinucleotide mutations are evaluated by using the EYFP reconstitution assay. All targets contained a CCCAGT PAM. Bars represent the mean (±s.e.m., n=3) of EYFP fluorescence intensity measured by using a flow cytometer 3 days after transfection into HEK293FT cells.

Figure 26 is a graph showing off-target effect of cCas9 variants at endogenous target sites. a. Sequence alignment of on-target sites and off-target sites of gRNA-a and gRNA-b. Letters in red indicate mismatches. b. and c. Indel frequencies induced by SaCas9, SaCas9-KKH, cCas9 v21-R and v21-R-HF are measured by using next-generation sequencing 5 days after transfection into HEK293FT cells. Data indicate the mean ± SEM (n = 2 independent replicates). ****P<0.0001 (paired t-test, two-tailed); ***P<0.001 (paired t-test, two-tailed); *P<0.05 (paired t-test, two-tailed).

Figure 27 is a graph showing comparison of chimeric cCas9 variants at non-NNNRRN PAMs, where: a. Direct comparison of v21-R, v21-L, N986S, v21-R-wt, v21-L-wt, N986S-wt, SaCas9, and SaCas9-KKH with eight plasmid doses across CCVACT, CCVGTG, CCVGTT, CCVGCT, CCVATT, CCVATG PAMs (V=A, C or G) by EYFP reconstitution assay (n=3). Error bars, s.e.m. The relative EYFP fluorescence intensity (y-axis) is measured by FACS; b. Indel frequency induced by the indicated variant at the sites where the third position of the PAM is A or C; and c. Indel frequency induced by the indicated variant at the sites where the third position of the PAM is G.

Figure 28 is a graph showing genome editing efficiency of cCas9 variants at NNVRRN PAMs. Indel frequencies induced by indicated SaCas9 variants at indicated NNVRRN PAMs are measured by the next generation sequencing 5 days after transfection into HEK293FT cells.

Figure 29 is a graph showing genome editing efficiency of SshCas9-KKH and SlCas9-KKH at NNVRRN PAMs.

DETAILED DESCRIPTION

The present disclosure is described in detail with reference to the drawings and specific embodiments.

For the sake of clarity and readability, the following definitions are provided. Any technical feature mentioned for these definitions may be read on each and every embodiment of the invention. Additional definitions and explanations may be specifically provided in the context of these embodiments. Unless defined otherwise, all technical and scientific terms used herein generally have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, organic chemistry, and nucleic acid chemistry and hybridization are those well-known and commonly employed in the art. Standard techniques are used for nucleic acid and peptide synthesis. The techniques and procedures are generally performed according to conventional methods in the art and various general references (e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2d ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY), which are provided throughout this document.

The same or similar elements and the elements having identical or similar functions are denoted by like reference numerals throughout the descriptions. Reference will be made in detail to embodiments of the present disclosure. The embodiments described herein with reference to drawings are explanatory, illustrative, and used to generally understand the present disclosure. The embodiments shall not be construed to limit the present disclosure. The terms “a,” “an,” “the” and similar referents used in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) described herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention. As will be understood by one of ordinary skill in the art, each embodiment disclosed herein can comprise, consist essentially of or consist of its particular stated element, step, ingredient or component. Thus, the terms “include” or “including” should be interpreted to recite: “comprise, consist of, or consist essentially of.” The transition term “comprise” or “comprises” means includes, but is not limited to, and allows for the inclusion of unspecified elements, steps, ingredients, or components, even in major amounts. The transitional phrase “consisting of” excludes any element, step, ingredient or component not specified. The transition phrase “consisting essentially of” limits the scope of the embodiment to the specified elements, steps, ingredients or components and to those that do not materially affect the embodiment.

Terms such as “first” and “second” are used herein for purposes of description and are not intended to indicate or imply relative importance or significance, to impliedly indicate the quantity of the technical feature referred to, or to indicate the ordinal relation of elements or technical features. Thus, a feature defined with “first” or “second” may comprise one or more of that feature. In the description of the present disclosure, “a plurality of” means two or more of the feature, unless specified otherwise.

Protein: A protein typically comprises one or more peptides or polypeptides. A protein is typically folded into a 3-dimensional form, which may be required for the protein to exert its biological function. The sequence of a protein or peptide is typically understood to be the order, i.e. the succession of its amino acids.

Host cell: A host cell denotes a cell or organism which is used for recombinant protein production. Common host cells are bacteria, such as E. coli, yeasts, such as Saccharomyces cerevisiae or Pichia pastoris, and mammalian cells, such as human cells.

RNA, mRNA: RNA is the usual abbreviation for ribonucleic acid. It is a nucleic acid molecule, i.e. a polymer consisting of nucleotides. These nucleotides are usually adenosine-monophosphate, uridine-monophosphate, guanosine-monophosphate and cytidine-monophosphate monomers which are connected to each other along a so-called backbone. The backbone is formed by phosphodiester bonds between the sugar, i.e. ribose, of a first and a phosphate moiety of a second, adjacent monomer. The specific succession of the monomers is called the RNA sequence. Usually, RNA may be obtainable by transcription of a DNA sequence, e.g., inside a cell. In eukaryotic cells, transcription is typically performed inside the nucleus or the mitochondria. In vivo, transcription of DNA usually results in the so-called premature RNA, which has to be processed into so-called messenger RNA, usually abbreviated as mRNA. Processing of the premature RNA, e.g. in eukaryotic organisms, comprises a variety of different post-transcriptional modifications such as splicing, 5’-capping, polyadenylation, export from the nucleus or the mitochondria and the like. The sum of these processes is also called maturation of RNA. The mature messenger RNA usually provides the nucleotide sequence that may be translated into an amino acid sequence of a particular peptide or protein. Typically, a mature mRNA comprises a 5’-cap, a 5’-UTR, an open reading frame, a 3’-UTR and a poly(A) sequence. Aside from messenger RNA, several non-coding types of RNA exist, which may be involved in the regulation of transcription and/or translation and in immunostimulation, and which may also be produced by in vitro transcription.

DNA: DNA is the usual abbreviation for deoxyribonucleic acid. It is a nucleic acid molecule, i.e. a polymer consisting of nucleotide monomers. These nucleotides are usually deoxy-adenosine-monophosphate, deoxy-thymidine-monophosphate, deoxy-guanosine-monophosphate and deoxy-cytidine-monophosphate monomers which are, by themselves, composed of a sugar moiety (deoxyribose), a base moiety and a phosphate moiety, and polymerized by a characteristic backbone structure. The backbone structure is, typically, formed by phosphodiester bonds between the sugar moiety of the nucleotide, i.e. deoxyribose, of a first and a phosphate moiety of a second, adjacent monomer. The specific order of the monomers, i.e. the order of the bases linked to the sugar/phosphate-backbone, is called the DNA-sequence. DNA may be single-stranded or double-stranded. In the double stranded form, the nucleotides of the first strand typically hybridize with the nucleotides of the second strand, e.g. by A/T-base-pairing and G/C-base-pairing.
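As a small illustration of A/T and G/C base-pairing, the sketch below derives the sequence of the strand that would hybridize with a given strand (its reverse complement, read 5' to 3'). The function name is hypothetical, and only the four unambiguous bases are handled.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand of a DNA sequence, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```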

Sequence of a nucleic acid molecule/nucleic acid sequence: The sequence of a nucleic acid molecule is typically understood to be the particular and individual order, i.e. the succession of its nucleotides.

Sequence of amino acid molecules/amino acid sequence: The sequence of a protein or peptide is typically understood to be the order, i.e. the succession of its amino acids.

Sequence identity: Two or more sequences are identical if they exhibit the same length and order of nucleotides or amino acids. The percentage of identity typically describes the extent to which two sequences are identical, i.e. it typically describes the percentage of nucleotides or amino acids that correspond in their sequence position to identical nucleotides or amino acids of a reference sequence. For the determination of the degree of identity, the sequences to be compared are considered to exhibit the same length, i.e. the length of the longest sequence of the sequences to be compared. This means that a first sequence consisting of 8 nucleotides/amino acids is 80% identical to a second sequence consisting of 10 nucleotides/amino acids comprising the first sequence. In other words, in the context of the present invention, identity of sequences preferably relates to the percentage of nucleotides/amino acids of a sequence, which have the same position in two or more sequences having the same length. Gaps are usually regarded as non-identical positions, irrespective of their actual position in an alignment.
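The identity definition above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned and padded to the length of the longer one with a gap character; the function names, the gap symbol "-" and the toy sequences are illustrative assumptions and are not part of the disclosure.

```python
def pad_to_reference(query: str, reference: str, offset: int, gap: str = "-") -> str:
    """Pad `query` with gaps so it has the same length as `reference`.

    `offset` is the 0-based position in `reference` where `query` starts
    (assumed to be known from a prior alignment step).
    """
    trailing = len(reference) - offset - len(query)
    if offset < 0 or trailing < 0:
        raise ValueError("query does not fit inside reference at this offset")
    return gap * offset + query + gap * trailing


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of positions carrying the same residue in both sequences.

    Both inputs must have the same (padded) length; gap positions are
    counted as non-identical, as in the definition above.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# Toy example: an 8-nt sequence contained in a 10-nt sequence.
reference = "ACGTACGTAC"
query = reference[1:9]
print(percent_identity(pad_to_reference(query, reference, 1), reference))  # 80.0
```

The printed value reproduces the 8-of-10 example given in the definition.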

Vector: The term “vector” refers to a nucleic acid molecule, preferably to an artificial nucleic acid molecule. A vector in the context of the present invention is suitable for incorporating or harboring a desired nucleic acid sequence, such as a nucleic acid sequence comprising an open reading frame. Such vectors may be storage vectors, expression vectors, cloning vectors, transfer vectors etc. A storage vector is a vector, which allows the convenient storage of a nucleic acid molecule, for example, of an mRNA molecule. Thus, the vector may comprise a sequence corresponding, e.g., to a desired mRNA sequence or a part thereof, such as a sequence corresponding to the open reading frame and the 3’-UTR of an mRNA. An expression vector may be used for production of expression products such as RNA, e.g. mRNA, or peptides, polypeptides or proteins. For example, an expression vector may comprise sequences needed for transcription of a sequence stretch of the vector, such as a promoter sequence, e.g. an RNA polymerase promoter sequence. A cloning vector is typically a vector that contains a cloning site, which may be used to incorporate nucleic acid sequences into the vector. A cloning vector may be, e.g., a plasmid vector or a bacteriophage vector. A transfer vector may be a vector, which is suitable for transferring nucleic acid molecules into cells or organisms, for example, viral vectors. A vector in the context of the present invention may be, e.g., an RNA vector or a DNA vector. Preferably, a vector is a DNA molecule. Preferably, a vector in the sense of the present application comprises a cloning site, a selection marker, such as an antibiotic resistance factor, and a sequence suitable for multiplication of the vector, such as an origin of replication. Preferably, a vector in the context of the present application is a plasmid vector.

As used herein, the term "promoter" refers to a regulatory region of DNA usually located upstream of a gene, providing a control point for regulated gene transcription.

As used herein, the term "operably linked" refers to a functional relationship between two or more DNA segments, in particular gene sequences to be expressed and those sequences controlling their expression.

As used herein, the term “ortholog” describes genes in different species that derive from a single ancestral gene in the last common ancestor of the respective species.

As used herein, the term “wild-type Cas9” refers to any naturally occurring Cas9 protein, including but not limited to Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9) and Streptococcus canis Cas9 (ScCas9), or an ortholog thereof, for example, the 33 orthologs of SaCas9 listed in Figure 1, which are referred to by their source organisms as Absiella dolichum (O13), Clostridium cocleatum (O40), Veillonella parvula (O23), Alkalibacterium gilvum (O39), Alkalibacterium sp. 20 (O26), Lacticigenium naphtae (O18), Alkalibacterium subtropicum (O38), Carnobacterium iners (O12), Carnobacterium viridans (O36), Jeotgalibaca sp. PTS2502 (O27), Listeria ivanovii sp. londoniensis (O10), Bacillus massilionigeriensis (O33), Bacillus niameyensis (O34), Ureibacillus thermosphaericus-1 (O14), Ureibacillus thermosphaericus-2 (O44), Halalkalibacillus halophilus (O15), Paraliobacillus ryukyuensis (O28), Sediminibacillus albus (O42), Virgibacillus senegalensis (O20), Pelagirhabdus alkalitolerans (O37), Massilibacterium senegalense (O24), Macrococcus sp. IME 1552 (O43), Staphylococcus (from multispecies) (O30), Staphylococcus simulans (O31), Staphylococcus sp. HMSC061G12 (O32), Staphylococcus massiliensis (O29), Staphylococcus microti (O16), Staphylococcus haemolyticus (O19), Staphylococcus sp. HMSC34C02 (O25), Staphylococcus warneri (O21), Staphylococcus schleiferi (O17), Staphylococcus agnetis (O22) or Staphylococcus lutrae (O35), where the corresponding chimeric variants are named v13, v40, v23, v39, v26, v18, v38, v12, v36, v27, v10, v33, v34, v14, v44, v15, v28, v42, v20, v37, v24, v43, v30, v31, v32, v29, v16, v19, v25, v21, v17, v22 and v35, respectively.

As used herein, the term “PAM interaction region (or PAM interaction domain, PI domain)” in the wild-type SaCas9 or its variants generally refers to a conserved 13-amino acid region from position 982 to 994, which is involved in binding to the 4th and 5th bases of the PAM, and the term “backbone region” in the wild-type SaCas9 or its variants generally refers to the remaining amino acids other than these 13 amino acids, including a first backbone region and a second backbone region, where an N-terminus of the PAM interaction region is connected to a C-terminus of the first backbone region, and a C-terminus of the PAM interaction region is connected to an N-terminus of the second backbone region, unless otherwise specified.

The present disclosure was accomplished by the present inventors based on the following discoveries: a group of chimeric Cas9 (cCas9) variants was generated by replacing the key region in the PAM interaction (PI) domain of Staphylococcus aureus Cas9 (SaCas9) with the corresponding region from a panel of SaCas9 orthologs, and several cCas9 variants were identified, by using a functional assay at target sites with different nucleotide combinations at PAM positions 3 to 6, as having expanded recognition capability at NNVRRN, NNVACT, NNVATG, NNVATT, NNVGCT, NNVGTG and NNVGTT PAM sequences (N = A, T, G or C; R = A or G; and V = A, C or G). In summary, the present inventors provide a panel of cCas9 variants that can access up to 1/4 of all the possible genomic targets in mammalian cells.

In one embodiment, the SaCas9 has the amino acid sequence depicted in SEQ ID NO: 133. The SaCas9-KKH has three mutations (E782K, N968K and R1015H) compared to the SaCas9.

In some embodiments, provided is a Cas9 variant including a first backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the first backbone region of a wild-type Cas9; a protospacer adjacent motif (PAM) interaction region, being a 13-amino acid sequence deriving from the PAM interaction region of an ortholog of the wild-type Cas9, and having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the PAM interaction region of the ortholog of the wild-type Cas9; and a second backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the second backbone region of the wild-type Cas9; wherein an N-terminus of the PAM interaction region is connected to a C-terminus of the first backbone region, and a C-terminus of the PAM interaction region is connected to an N-terminus of the second backbone region, and wherein the Cas9 variant has recognition capability at a PAM sequence selected from the group consisting of NNVRRN, NNVACT, NNVATG, NNVATT, NNVGCT, NNVGTG and NNVGTT PAM sequences, wherein N is adenine (A), thymine (T), cytosine (C) or guanine (G); R is adenine (A) or guanine (G); and V is adenine (A), cytosine (C) or guanine (G).
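The degenerate PAM notation used above can be made concrete with a short Python sketch. It expands the codes N, R and V exactly as defined in this paragraph, checks a 6-nt PAM against the listed patterns, and counts what share of all 6-nt sequences falls under at least one pattern. The function names are illustrative assumptions, and the count is a plain combinatorial tally over equally weighted 6-mers; it is not asserted to be the way the figure of "up to 1/4" was derived in the disclosure.

```python
import re
from itertools import product

# Degenerate codes as defined in the text: N = A/T/C/G, R = A/G, V = A/C/G.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "V": "[ACG]"}

PAM_PATTERNS = ["NNVRRN", "NNVACT", "NNVATG", "NNVATT",
                "NNVGCT", "NNVGTG", "NNVGTT"]

# Pre-compile one regular expression per degenerate pattern.
_COMPILED = {p: re.compile("".join(CODES[c] for c in p)) for p in PAM_PATTERNS}


def matching_patterns(pam: str) -> list:
    """Return the listed PAM patterns that a concrete 6-nt PAM satisfies."""
    pam = pam.upper()
    return [p for p, rx in _COMPILED.items() if rx.fullmatch(pam)]


def covered_fraction() -> float:
    """Fraction of all 4**6 possible 6-nt sequences matching any pattern."""
    all_pams = ["".join(t) for t in product("ACGT", repeat=6)]
    hits = sum(1 for pam in all_pams if matching_patterns(pam))
    return hits / len(all_pams)


print(matching_patterns("CCCAGT"))  # ['NNVRRN']
print(matching_patterns("CCTAGT"))  # [] because T is not allowed by V
print(covered_fraction())           # 0.2578125, i.e. 1056 of 4096
```

Under this simple count the union of the listed patterns covers slightly more than one quarter of all 6-nt sequences, which is in line with the order of magnitude stated in the text.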

According to some embodiments, the wild-type Cas9 is derived from Micrococcus, Staphylococcus, Planococcus, Streptococcus, Leuconostoc, Pediococcus, Aerococcus or Gemella. Preferably, Staphylococcus includes Staphylococcus aureus, Staphylococcus epidermidis and Staphylococcus saprophyticus. Preferably, Streptococcus includes Streptococcus pyogenes, Streptococcus equisimilis, Streptococcus zooepidemicus, Streptococcus equi, Streptococcus dysgalactiae, Streptococcus sanguis, Streptococcus pneumoniae, Streptococcus anginosus, Streptococcus agalactiae, Streptococcus acidominimus, Streptococcus salivarius, Streptococcus mitis, Streptococcus bovis, Streptococcus equinus, Streptococcus thermophilus, Streptococcus faecalis, Streptococcus faecium, Streptococcus avium, Streptococcus uberis, Streptococcus lactis, Streptococcus cremoris and Streptococcus canis. Preferably, the wild-type Cas9 is derived from Staphylococcus aureus (i.e. SaCas9).

According to some embodiments, the ortholog of Staphylococcus aureus is selected from Absiella dolichum, Clostridium cocleatum, Veillonella parvula, Alkalibacterium gilvum, Alkalibacterium sp. 20, Lacticigenium naphtae, Alkalibacterium subtropicum, Carnobacterium iners, Carnobacterium viridans, Jeotgalibaca sp. PTS2502, Listeria ivanovii sp. londoniensis, Bacillus massilionigeriensis, Bacillus niameyensis, Ureibacillus thermosphaericus-1, Ureibacillus thermosphaericus-2, Halalkalibacillus halophilus, Paraliobacillus ryukyuensis, Sediminibacillus albus, Virgibacillus senegalensis, Pelagirhabdus alkalitolerans, Massilibacterium senegalense, Macrococcus sp. IME 1552, Staphylococcus (from multispecies), Staphylococcus simulans, Staphylococcus sp. HMSC061G12, Staphylococcus massiliensis, Staphylococcus microti, Staphylococcus haemolyticus, Staphylococcus sp. HMSC34C02, Staphylococcus warneri, Staphylococcus schleiferi, Staphylococcus agnetis and Staphylococcus lutrae. Preferably, the ortholog of Staphylococcus aureus is further selected from Sediminibacillus albus, Staphylococcus schleiferi, Staphylococcus simulans, Staphylococcus sp. HMSC061G12, Staphylococcus agnetis, Clostridium cocleatum, Absiella dolichum, Staphylococcus warneri, Staphylococcus microti, Massilibacterium senegalense, Lacticigenium naphtae and Halalkalibacillus halophilus. More preferably, the ortholog of Staphylococcus aureus is further selected from Sediminibacillus albus, Staphylococcus schleiferi, Staphylococcus warneri and Staphylococcus microti.

In some embodiments, the first and second backbone regions each independently have one or more amino acid mutations compared to the first and second backbone regions of the wild-type Cas9, especially SaCas9, where the amino acid mutation is a substitution, a deletion or an addition.

According to some embodiments, the first backbone region includes the amino acid mutation selected from the group consisting of:

a substitution of Alanine (A) for Arginine (R) at position 499,

a substitution of Lysine (K) for Glutamine (Q) at position 500,

a substitution of Alanine (A) for Arginine (R) at position 654,

a substitution of Arginine (R) for Glycine (G) at position 655,

a substitution of Lysine (K) for Glutamic acid (E) at position 782, and

a substitution of Lysine (K) for Asparagine (N) at position 968.

The second backbone region includes an amino acid mutation at position 1015, where the amino acid mutation is a substitution of Histidine (H) for Arginine (R).
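The substitutions listed above are commonly written in the short form used elsewhere in this description, e.g. E782K for "a substitution of Lysine (K) for Glutamic acid (E) at position 782". The following Python sketch parses such labels and applies them to a protein sequence using 1-based positions. The helper names and the five-residue toy sequence are illustrative assumptions; the sketch does not use SEQ ID NO: 133.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str      # residue expected in the starting sequence
    position: int      # 1-based position, as used in the text
    replacement: str   # residue introduced by the mutation


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'E782K' into its three components."""
    m = _LABEL.match(label.strip().upper())
    if m is None or int(m.group(2)) < 1:
        raise ValueError(f"not a valid substitution label: {label!r}")
    return Substitution(m.group(1), int(m.group(2)), m.group(3))


def apply_substitutions(sequence: str, labels: list) -> str:
    """Apply substitutions, checking that each original residue matches."""
    residues = list(sequence)
    for label in labels:
        sub = parse_substitution(label)
        idx = sub.position - 1
        if idx >= len(residues):
            raise ValueError(f"{label}: position beyond sequence end")
        if residues[idx] != sub.original:
            raise ValueError(f"{label}: found {residues[idx]} at that position")
        residues[idx] = sub.replacement
    return "".join(residues)


print(parse_substitution("E782K"))
# Toy sequence only; positions refer to this toy string, not to SaCas9.
print(apply_substitutions("MKRNE", ["R3A", "E5K"]))  # MKANK
```

Checking the original residue before replacing it guards against off-by-one errors when a label is applied to a sequence with a different numbering.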

In some embodiments, the PAM interaction region further has one or more amino acid mutations compared to the PAM interaction region of the ortholog of the wild-type Cas9, especially SaCas9, where the amino acid mutation is a substitution, a deletion or an addition.

According to an embodiment, the PAM interaction region has an amino acid mutation at position 991 compared to the PAM interaction region of Staphylococcus schleiferi (O17), where the amino acid mutation is a substitution of Lysine (K) or Leucine (L) for Isoleucine (I).

According to an embodiment, the PAM interaction region has an amino acid mutation at position 991 compared to the PAM interaction region of Staphylococcus warneri (O21), where the amino acid mutation is a substitution of Lysine (K), Leucine (L) or Arginine (R) for Isoleucine (I).

According to an embodiment, the PAM interaction region has an amino acid mutation at position 986 compared to the PAM interaction region of Staphylococcus aureus (Sa), where the amino acid mutation is a substitution of Serine (S) for Asparagine (N).

In some embodiments, the Cas9 variant is selected from v13, v40, v23, v39, v26, v18, v38, v12, v36, v27, v10, v33, v34, v14, v44, v15, v28, v42, v20, v37, v24, v43, v30, v31, v32, v29, v16, v19, v25, v21, v17, v35 and v22. In some embodiments, the Cas9 variant further has one or more amino acid mutations compared to the wild-type Cas9, especially SaCas9, where the amino acid mutation is a substitution, a deletion or an addition. Preferably, the Cas9 variant is selected from v42, v42-wt, v17-K, v17-L, v16, v21-L, v21-R, v21-K, v21-R-HF and SaCas9-N986S. More preferably, the Cas9 variant is v21-R-HF. According to the present disclosure, the Cas9 variant has an extended PAM preference compared to SaCas9 or SaCas9-KKH, with a recognition capability of up to 1/4 of all the possible genomic targets in mammalian cells, thus improving the genome editing capability of the CRISPR/Cas9 system.

In another embodiment, provided is a nucleic acid encoding the Cas9 variant described above, where the nucleic acid at least has the nucleotide sequence as set forth in any one of SEQ ID NOs: 85 to 125 and SEQ ID NO: 132. According to the present disclosure, the nucleic acid is capable of expressing the Cas9 variant under a suitable condition after being introduced into the cell to be edited, thus activating the CRISPR/Cas9 system and thereby improving the genome editing capability of the CRISPR/Cas9 system.

In another embodiment, provided is an expression vector comprising an encoding sequence comprising: a first nucleic acid sequence encoding the Cas9 variant described above; and a second nucleic acid sequence encoding a scaffold of a guide RNA (gRNA) which specifically directs the cleavage of a target gene to be edited by the Cas9 variant described above; and optionally a regulatory element, operably linked to the encoding sequence and configured to be suitable for expression of the encoding sequence in a cell to be edited.

The first nucleic acid sequence consists of a nucleotide sequence selected from SEQ ID NO: 85 to SEQ ID NO: 125 and SEQ ID NO: 132.

The second nucleic acid sequence consists of a nucleotide sequence selected from SEQ ID NO: 126 to SEQ ID NO: 129. Preferably, the scaffold of the gRNA encoded by the second nucleic acid sequence comprises at least one nucleotide mutation in a first stem-loop of the scaffold compared to the scaffold of the gRNA of a wild-type Cas9, where the nucleotide mutation in the first stem-loop is selected from 3rd Uracil (U) to Cytosine (C), 4th Uracil (U) to Adenine (A), 4th Uracil (U) to Cytosine (C), 5th Uracil (U) to Cytosine (C), 6th Adenine (A) to Guanine (G), 32nd Adenine (A) to Guanine (G), 31st Adenine (A) to Thymine (T), 31st Adenine (A) to Guanine (G), 30th Adenine (A) to Guanine (G) and 29th Thymine (T) to Cytosine (C).

The regulatory element comprises a T7 promoter, an arabinose promoter, a phoA, tac, lpp, lac-lpp, lac, trp or trc promoter, a CMV promoter, an RSV promoter, an SV40 promoter, an HSV promoter, a human Pol I promoter, a human Pol II promoter or a human Pol III promoter.

The expression vector is an adenovirus vector, a lentiviral vector or a plasmid.

In another embodiment, provided is a method for producing the expression vector described above, comprising transfecting the expression vector into a host cell, and isolating the expression vector from the host cell.

In a further embodiment, the present disclosure provides a kit for gene editing, comprising: a first nucleic acid molecule encoding the Cas9 variant described above; a second nucleic acid molecule encoding a scaffold of a gRNA that specifically directs the cleavage of a target gene to be edited by the Cas9 variant; a buffer suitable for gene editing; an instruction for use of the kit; and optionally one or more containers suitable for gene editing, where the first nucleic acid molecule and the second nucleic acid molecule are loaded in an expression vector, where the first nucleic acid molecule at least has the nucleotide sequence as set forth in any one of SEQ ID NOs: 85 to 125 and SEQ ID NO: 132; and the second nucleic acid molecule at least has the nucleotide sequence as set forth in any one of SEQ ID NOs: 127 to 129.

According to the present disclosure, the Cas9 variant is capable of recognizing up to 1/4 of all the possible PAMs in mammalian cells under the guidance of such a gRNA, thus improving the genome editing capability of the CRISPR/Cas9 system.

In a further embodiment, provided is a method for gene editing, comprising transfecting the expression vector described above into a cell to be edited. According to the present disclosure, many genes, such as EMX1, IL1RN, RUNX1 and ZSCAN2, have been successfully edited by using the present method with specific first and second nucleic acid molecules.

In some embodiments, the amino acid sequence of the PAM interaction region and the nucleotide sequence encoding a scaffold of the gRNA which specifically directs the cleavage of a target gene to be edited by the Cas9 variant are specifically determined by the gene to be edited, which is exemplified by the following Table.

In some embodiments, at least the genes EMX1, IL1RN, RUNX1 and ZSCAN2 are successfully edited by the present method for gene editing with specific PAM interaction regions and the corresponding gRNAs, with high on-target fidelity and efficiency.

In a still further embodiment, the present disclosure provides a composition comprising the expression vector described above and a carrier material.

In yet another embodiment, the present disclosure provides a pharmaceutical composition comprising the expression vector described above and a pharmaceutically acceptable carrier material.

In yet another embodiment, the present disclosure provides a method for treating or preventing a disease in a subject, comprising administering a therapeutically effective amount of the composition described above or the pharmaceutical composition described above to the subject in need thereof, where the subject is preferably a human.

In a still further embodiment, the present disclosure provides a pharmaceutical preparation for use in treating or preventing a disease in a subject, the pharmaceutical preparation being the composition described above or the pharmaceutical composition described above, where a therapeutically effective amount of the pharmaceutical preparation is administered to the subject in need thereof, where the subject is preferably a human.

By treating or treatment is meant at least one of:

(i) inhibiting the disease, i.e. arresting, reducing or delaying the development of the disease or a relapse thereof or at least one clinical or subclinical symptom thereof, or

(ii) relieving or attenuating one or more of the clinical or subclinical symptoms of the disease.

By prevention is meant preventing or delaying the appearance of clinical symptoms of the disease developing in a mammal.

The benefit to a subject to be treated is either statistically significant or at least perceptible to the patient or to the physician. In general a skilled man can appreciate when "treatment" occurs. It is particularly preferred if the pharmaceutical compositions of the invention are used therapeutically, i.e. to treat a condition which has manifested rather than prophylactically. It may be that the pharmaceutical composition of the invention is more effective when used therapeutically than prophylactically.

The pharmaceutical composition of the invention can be used on any animal subject, in particular a mammal and more particularly a human or an animal serving as a model for a disease (e.g., rat, mouse, pig, monkey, etc.). For example, in one use a pharmaceutical combination of the invention is used as a positive control in the animal subject to test other compounds for activity and/or side effects.

In order to treat a disease an effective amount of the active pharmaceutical composition needs to be administered to a patient. A "therapeutically effective amount" means the amount of a pharmaceutical composition that, when administered to an animal for treating a state, disorder or condition, is sufficient to effect such treatment. The "therapeutically effective amount" will vary depending on the pharmaceutical composition, the disease and its severity and the age, weight, physical condition and responsiveness of the subject to be treated and will be ultimately at the discretion of the attendant doctor.

The pharmaceutical composition of the invention typically comprises the active components in admixture with at least one pharmaceutically acceptable carrier selected with regard to the intended route of administration and standard pharmaceutical practice.

The term "carrier" refers to a diluent, excipient, and/or vehicle with which an active compound is administered. The pharmaceutical compositions of the invention may contain combinations of more than one carrier. Such pharmaceutical carriers are well known in the art. The pharmaceutical compositions may also comprise any suitable binder(s), lubricant(s), suspending agent(s), coating agent(s), and/or solubilizing agent(s) and so on. The pharmaceutical composition can also contain other active components, e.g. other drugs for the treatment of skin disorders.

It will be appreciated that pharmaceutical compositions for use in accordance with the present invention may be in the form of oral, parenteral, transdermal, sublingual, topical, implant, nasal, or enterally administered (or other mucosally administered) suspensions, capsules or tablets, which may be formulated in conventional manner using one or more pharmaceutically acceptable carriers or excipients. The pharmaceutical compositions of the invention could also be formulated as nanoparticle formulations.

The pharmaceutical composition of the invention will preferably be administered topically. The pharmaceutical composition may therefore be provided in the form of a cream, gel, foam, salve or ointment.

Administration may be once a day, twice a day, or more often, and may be decreased during a maintenance phase of the disease or disorder, e.g. once every second or third day instead of every day or twice a day. The dose and the administration frequency will depend on the clinical signs which confirm maintenance of the remission phase, with the reduction or absence of at least one, or more preferably more than one, of the clinical signs of the acute phase known to the person skilled in the art.

Examples

Example 1 Engineering chimeric SaCas9 variants

The present inventors first searched for SaCas9 orthologs in the NCBI database by performing a BLAST analysis with the full-length amino acid sequence of SaCas9, and found 33 SaCas9 orthologs, including 11 orthologs identified in Staphylococcus species that showed close homology to SaCas9 (refer to Figure 1). The SaCas9 orthologs are each named as O + number instead of by full name, such as O21, O22 and the like. SaCas9-KKH, which recognizes the NNNRRT PAM, is abbreviated as RRT for clarity.

It was found that the crucial 13-aa region from position 982 to 994, which is involved in binding to the 4th and 5th bases of the PAM, is conserved in general (refer to Figure 2). In addition, the key regions of orthologs 18 and 39 are identical to each other. Surprisingly, the amino acid residues at position 986 and particularly at position 991 are highly diverse, suggesting that these orthologs may recognize different PAM sequences (refer to Figure 2). Moreover, the residues at both the N-terminal and C-terminal anchors of the crucial 13-aa region (such as the amino acid residues at positions 982, 983, 990, 992, 993 and 994) are highly conserved, which may help to accommodate the structural changes in the chimeric variants (refer to Figure 2).

Based on the above findings, the present inventors subsequently generated 33 unique chimeric Cas9 (cCas9) variants by replacing this crucial 13-aa region in SaCas9-KKH with the corresponding region in the SaCas9 orthologs, where the cCas9 variants are named as v + number for clarity; for example, the cCas9 variant constructed by replacing this crucial 13-aa region in SaCas9-KKH with the corresponding region in the O32 SaCas9 ortholog is named v32. The cCas9 variants can be further mutated; for example, the amino acid of v21 at position 991 can be mutated from Isoleucine (I) into Arginine (R), with the mutated v21 named v21-R.
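The region swap described above amounts to a simple string operation on the protein sequence. The following Python sketch keeps the first backbone region (positions 1 to 981) and the second backbone region (position 995 onward) of a backbone protein and inserts a 13-residue region between them. It assumes that the donor region has already been extracted from an alignment and that positions follow the SaCas9 numbering used in the text; the function names and the placeholder sequences are illustrative assumptions and are not sequences from the disclosure.

```python
PI_START = 982  # first residue of the 13-aa region (1-based, inclusive)
PI_END = 994    # last residue of the 13-aa region (1-based, inclusive)
PI_LENGTH = PI_END - PI_START + 1  # 13


def extract_pi_region(protein: str) -> str:
    """Return the residues at positions 982 to 994 of a protein."""
    if len(protein) < PI_END:
        raise ValueError("protein is shorter than position 994")
    return protein[PI_START - 1:PI_END]


def make_chimera(backbone: str, donor_region: str) -> str:
    """Replace positions 982 to 994 of `backbone` with `donor_region`."""
    if len(donor_region) != PI_LENGTH:
        raise ValueError("donor region must be exactly 13 residues long")
    if len(backbone) < PI_END:
        raise ValueError("backbone is shorter than position 994")
    first_backbone = backbone[:PI_START - 1]   # positions 1..981
    second_backbone = backbone[PI_END:]        # positions 995..end
    return first_backbone + donor_region + second_backbone


# Placeholder sequences for illustration only.
backbone = "G" * 1000
donor = "KLMNPQRSTVWYA"
chimera = make_chimera(backbone, donor)
print(len(chimera), extract_pi_region(chimera))  # 1000 KLMNPQRSTVWYA
```

Because the donor region has the same length as the region it replaces, the numbering of the second backbone region (for example position 1015) is unchanged in the chimera.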

Example 2 Description of PAM preference of chimeric SaCas9s

It has been found in the art that altering the 3rd or 4th U in the first stem-loop of the gRNA scaffold to disrupt the putative “UUUU” terminator sequence for Polymerase III can enhance Cas9 activity, likely due to an increased gRNA expression level. In order to optimize the gRNA, the present inventors identified the CRISPR locus sequences in the microorganisms containing the SaCas9 orthologs by using the CRISPRfinder program, thus obtaining the crRNA direct repeat regions of all of the SaCas9 orthologs. In doing so, the present inventors further found that the crRNA direct repeat regions of the SaCas9 orthologs are highly diverse except for the first conserved 6 nt at the 5'-end, which may have evolved to avoid self-cleavage by their own Cas9 nucleases (refer to Figure 3). To prevent gRNA self-targeting when testing SaCas9 activity, the present inventors generated an optimized gRNA-2, shown in SEQ ID NO: 127, by mutating the 2nd U to C (refer to Figure 4).
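The idea of disrupting a putative "UUUU" terminator by a single base change can be sketched in Python as follows. The sketch scans a scaffold for runs of four or more consecutive U (or T) and shows that one substitution inside the run removes it. The 12-nt toy sequence and the function names are hypothetical and are not taken from SEQ ID NOs: 126 to 129; the threshold of four consecutive U follows the "UUUU" motif mentioned in the text.

```python
def find_u_runs(sequence: str, min_run: int = 4) -> list:
    """Return (start, length) for each run of >= min_run consecutive U/T.

    `start` is 1-based to match the position numbering used in the text.
    """
    seq = sequence.upper().replace("T", "U")
    runs = []
    i = 0
    while i < len(seq):
        if seq[i] == "U":
            j = i
            while j < len(seq) and seq[j] == "U":
                j += 1
            if j - i >= min_run:
                runs.append((i + 1, j - i))
            i = j
        else:
            i += 1
    return runs


def substitute(sequence: str, position: int, new_base: str) -> str:
    """Replace the base at a 1-based position with `new_base`."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    return sequence[:position - 1] + new_base + sequence[position:]


# Hypothetical toy scaffold start, for illustration only.
toy = "GUUUUAGCAUCG"
print(find_u_runs(toy))            # [(2, 4)]
edited = substitute(toy, 2, "C")   # mutate the 2nd base (a U) to C
print(edited, find_u_runs(edited)) # GCUUUAGCAUCG []
```

The same check works on DNA-encoded scaffolds, since T is treated as U before scanning.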

The present inventors compared the PAM preference of different cCas9 variants by using a previously described enhanced yellow fluorescent protein (EYFP) reconstitution assay in HEK293FT cells (refer to Figure 5), in which the cCas9 binds to the binding site in the EYFP reporter gene under gRNA guidance and cleaves the gene, after which a complete EYFP reporter gene is reconstituted by homology directed repair (HDR).

The present inventors further compared the PAM preference of SaCas9-KKH directed by either the original gRNA or the optimized gRNA-2, 3 days after transfection into HEK293FT cells (refer to Figure 6). Since SaCas9-KKH recognizes NNNRRT PAM sites and the 13-aa region is only responsible for contact with the 4th to 6th positions of the PAM sequence, the present inventors arbitrarily assigned triple cytosines at the first to third PAM positions, and tested the PAM recognition preference for all 64 different sequences varying at PAM positions 4, 5 and 6. Consistently, SaCas9-KKH with the wild-type gRNA scaffold showed strong activity at the CCCRRT PAMs and weak activity at the CCCGGA, CCCGGC and CCCAGC PAMs. When directed by the optimized gRNA scaffold, SaCas9-KKH retained high activity at the CCCRRT PAMs and displayed weak activity at the CCCATT, CCCCGT and most of the CCCRRV (V = A, C or G) PAMs (refer to Figure 6 and Figure 7). The wild-type SaCas9 requires guanine at the third PAM position. To investigate whether the optimized gRNA would also be suitable for the wild-type SaCas9, the present inventors further compared the activity of SaCas9 and SaCas9-KKH at CCGRRN PAMs, with a guanine at the third position, and at CCCRRN PAMs, with a cytosine at the third position, guided by the wild-type gRNA or the optimized gRNA-2. Similarly, the present inventors observed increased activity at CCGRRV PAMs by SaCas9 and at CCSRRV PAMs (S = C or G) by SaCas9-KKH when directed by the optimized gRNA-2 (refer to Figure 8). Therefore, the present inventors used the optimized gRNA-2 in the rest of the study.
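The panel of 64 reporter PAMs described above, with cytosines fixed at positions 1 to 3 and every combination at positions 4 to 6, can be enumerated with a few lines of Python. The sketch also groups the PAMs into the RRT and RRV classes discussed in the text. The function names and the three class labels are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def reporter_pams(prefix: str = "CCC") -> list:
    """All PAMs with a fixed prefix and every base at positions 4, 5 and 6."""
    return [prefix + "".join(t) for t in product("ACGT", repeat=3)]


def classify(pam: str) -> str:
    """Classify a 6-nt PAM by positions 4 to 6 (R = A or G; V = A, C or G)."""
    p4, p5, p6 = pam[3], pam[4], pam[5]
    if p4 in "AG" and p5 in "AG":
        return "RRT" if p6 == "T" else "RRV"
    return "other"


pams = reporter_pams()
print(len(pams), pams[:4])  # 64 ['CCCAAA', 'CCCAAC', 'CCCAAG', 'CCCAAT']
print(Counter(classify(p) for p in pams))
# 4 RRT PAMs, 12 RRV PAMs and 48 PAMs outside the RRN family
```

Passing prefix="CCG" gives the corresponding panel with guanine at the third position, as used for the wild-type SaCas9 comparison.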

Furthermore, the present inventors evaluated the DNA cleavage efficiency of 32 cCas9 variants at all 64 different CCCNNN PAMs 3 days after transfection into HEK293FT cells. The present inventors found that 2/3 of the cCas9 variants displayed a PAM recognition pattern different from that of SaCas9-KKH (refer to Figure 9), where v42, v17, v31, v32 and v35 showed a more extended PAM preference at sites with NNVRRV PAMs (V = A, C or G) relative to SaCas9-KKH; v32 showed strong activity at the CCCACG and CCCACT PAMs; and v42 displayed the strongest activity at NNVRRV PAMs.

In addition, v24, v16 and v21 also exhibited an extended PAM preference at sites with CCCATG, CCCATT, CCCGTG, CCCGTT and other PAMs; v18 showed a different PAM preference, with weak activity; and v15 exhibited a PAM preference at sites with CCCATA, CCCATC, CCCGTA and CCCGTC.

Overall, the present inventors generated several cCas9s exhibiting different PAM preferences.

Example 3 Description of PAM preference of specific cCas9s

In this example, the present inventors selected cCas9 v42 and v17 for further analysis because v42 and v17 displayed enhanced activity at the CCCRRV PAMs compared to SaCas9-KKH (refer to Figures 9 to 13). Sequence alignment showed that the key 13-aa region of SaCas9-KKH differed from that of the v42 ortholog by three amino acid residues, and from that of the v17 ortholog by four amino acid residues, where the amino acid at position 986 of both v42 and v17 is N, and the amino acid at position 991 is K in v42 but I in v17 (refer to Figure 10). Based on the above, the present inventors introduced mutations into the 13-aa region of SaCas9-KKH to explore their effect on PAM recognition. After introducing mutations in a stepwise fashion, the present inventors found that the R991K mutation reduced the preference of SaCas9-KKH for thymidine at the 6th PAM position, and that the R991K/D987N double mutation further relaxed the PAM specificity (refer to Figure 14).

Since the wild-type SaCas9 requires the guanine at the third position in the PAM and has better activity at sites of NNGRRV PAMs than SaCas9-KKH, the present inventors further fused the 13-aa region of v42 to the wild-type SaCas9 to generate the cCas9 v42-wild-type (v42-wt) . In addition, since the three-dimensional (3D) structure suggests that SaCas9 does not make direct contact with the first two nucleotides in the PAM sequences, the present inventors arbitrarily assigned double cytosines at the first two PAM positions in the experimental set-up.

To characterize the cleavage activity of the cCas9 v42 and cCas9 v42-wild-type (v42-wt) at either CCGRRN or CCCRRN PAM, the present inventors transfected HEK293FT cells with four plasmid mixtures, in which one plasmid expressed EBFP as the transfection marker, one expressed the optimized gRNA-2, one expressed the reporter EYFP gene and the fourth plasmid expressed the corresponding Cas9 variants. Notably, cCas9 v42 showed the highest activity at the CCCRRV (V = A, C and G) PAMs over a range of plasmid doses (refer to Figure 12) . As expected, the present inventors observed only basal level activity of cCas9 v42-wild-type at CCCRRN PAMs (refer to Figure 12) . By contrast, v42-wild-type showed the highest activity at the CCGRRV PAMs (refer to Figure 12) . Furthermore, SaCas9 and SaCas9-KKH showed the highest efficiency at the CCGRRT and CCCRRT, respectively (refer to Figure 12) . In addition, the present inventors demonstrated that cCas9 v42 displayed a comparable activity at the CCARRN PAMs, but only showed a weak activity at the CCTRRN PAMs (refer to Figure 15) . Collectively, these results suggested that cCas9 v42 had an expanded PAM recognition at the NNVRRV PAMs and a slightly increased activity at NNTRRV PAMs compared to SaCas9-KKH, with the assumption that the first two nucleotides can be either A, C, G, and T. To demonstrate that the v42 variant enables targeting of human endogenous sites in NNNRRV PAMs most of which currently cannot be efficiently recognized by SaCas9-KKH, the present inventors tested the v42 activity on 77 different endogenous gene target sites with a panel of NNNRRN PAMs with a lower plasmid dosage of 25 ng per transfection experiment. In general, cCas9 v42 displayed a higher activity at NNNRRV PAMs and a comparable activity at NNNRRT PAMs. (Refer to Figure 16, Figure 17) . 
The sequence logo derived from sites with more than 5% indel frequency by v42 revealed a preference for A, C, and G at the third position and no strong preference at the first two positions (refer to Figure 17), which was consistent with the reporter assay (Figure 12 and Figure 15) and a previous report that SaCas9 does not make direct contact with the first two nucleotides in the PAM sequence. To further examine whether the KKH trimutation in the chimeric Cas9 indeed expanded recognition at the third position of the PAM, the present inventors selected 33 different endogenous gene target sites containing NNVRRN PAMs (V = A, C or G). As expected, only the cCas9 v42 showed efficient activity, with a mean mutagenesis frequency above 10% across all of the sites, whereas v42-wild-type, SaCas9 and SaCas9-KKH displayed only basal levels of indels at the NNARRN PAMs and low activity, with a mean indel frequency <6%, at the NNCRRN PAMs (Figure 18).
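
The PAM patterns discussed above (for example NNVRRN versus NNGRRT) are written with IUPAC degenerate-base codes. As an illustration only, the following minimal Python sketch checks whether a concrete six-nucleotide PAM fits such a pattern. The helper names `pam_regex` and `matches_pam` are hypothetical and are not part of the disclosure; the sketch says nothing about cleavage efficiency, only about pattern membership.

```python
import re

# IUPAC codes used in this document; all names here are illustrative assumptions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine: A or G
    "V": "ACG",   # A, C or G (not T)
    "M": "AC",    # A or C
    "N": "ACGT",  # any base
}


def pam_regex(pattern: str) -> re.Pattern:
    """Compile an IUPAC PAM pattern such as 'NNVRRN' into a regular expression."""
    return re.compile("".join(f"[{IUPAC[code]}]" for code in pattern.upper()))


def matches_pam(pam: str, pattern: str) -> bool:
    """Return True if a concrete PAM sequence fits the IUPAC pattern."""
    return pam_regex(pattern).fullmatch(pam.upper()) is not None


if __name__ == "__main__":
    # Compare membership in the expanded pattern and in the wild-type SaCas9 pattern.
    for pam in ["CCCAGT", "CCGAGT", "CCTAGT", "CCCACT"]:
        print(pam, matches_pam(pam, "NNVRRN"), matches_pam(pam, "NNGRRT"))
```

Under these definitions, a PAM with thymine at the third position (such as CCTAGT) falls outside NNVRRN, in line with the V = A, C or G preference described for v42.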

Given the expanded PAM preference of v17, the present inventors further characterized the PAM recognition preference of v17 across all 64 different sequences varying at PAM positions 4, 5, and 6. It was surprisingly found that cCas9 v17 with either an I991K or I991L mutation (cCas9 v17-K and v17-L) expanded the activity on targets containing CCCRRN PAMs (Figure 19a), where cCas9 v17 I991K showed a strong activity at GCC and GCG PAMs, while cCas9 v17 I991L showed a strong activity at GCA and GCT PAMs.

To compare the activity of v17-L, v42, and SaCas9-KKH when targeting endogenous target sites in HEK293FT cells, the present inventors performed deep sequencing analysis of the indel frequency at 37 different endogenous target sites with NNNRRV PAMs. The present inventors observed that about half of the sites targeted by v17-L showed more than 5% indels, with a mean mutagenesis frequency of 9.5% (Figure 19b).

The present inventors further compared the activity of cCas9 v17 I991K (v17-K), cCas9 v17 I991L (v17-L), v42 and SaCas9-KKH on 16 RRN PAMs. It was found that v17-K, v17-L and v42 showed a stronger activity on RRV PAMs and a comparable activity on RRT PAMs compared to SaCas9-KKH (refer to Figure 20).

To assay the activity of cCas9 when targeting endogenous target sites in cells, the present inventors performed deep sequencing analysis for v42 and SaCas9-KKH on the indel frequency at different endogenous target sites with NNNRRV PAMs (such as GGC, GAA, AGG, AGC and AGT PAMs).

For the endogenous EMX1 and ZSCAN2 gene editing assay, SaCas9 variant (such as v42 and SaCas9-KKH) plasmid DNA, gRNA plasmid DNA and transfection control plasmid DNA encoding a constitutively expressed puromycin resistance gene were mixed and co-transfected into each well of a 96-well plate containing HEK293FT cells according to the T7E1 experiment, with observation on Days 1, 2, 4 and 8 post transfection. It is found in Figure 20 that v42 shows a higher indel frequency than SaCas9-KKH on Days 1, 2 and 4 at each of the GGC, GAA, AGG, AGC and AGT PAMs and reaches the maximum indel frequency earlier than SaCas9-KKH, while v42 and SaCas9-KKH have no significant difference in the indel frequency on Days 1 to 8 at the AGT PAM. Further, v42 and SaCas9-KKH both reach the maximum indel frequency on Day 8 at all five PAMs.

The present inventors further measured the indel frequency of v17, v42 and SaCas9-KKH at other RRN PAMs (such as AGA, GAG, GGG and the like) according to the endogenous gene editing assay described above, finding no significant difference in DNA cleavage efficiency among v17, v42 and SaCas9-KKH (refer to Figure 21a). Consistently, the present inventors also observed that nuclease-inactivated cCas9 v42 fused with the gene activation domain VPR induced a 3- to 7-fold increase of the IL1RN gene expression level when targeting the endogenous sites containing NNNRRV PAMs in the IL1RN promoter region, but resulted in a comparable IL1RN gene expression level when targeting the endogenous sites with NNNRRT PAMs (such as GAT and GGT PAMs) (Figure 21b).

In addition, the present inventors selected cCas9 v16 and v21 for further analysis because the residues at both positions 986 and 991 in cCas9 v16 and v21 differed from those in SaCas9-KKH (refer to Figure 22), and these two variants showed a different PAM recognition pattern compared to SaCas9-KKH (refer to Figure 9, Figure 23). The present inventors mutated the Isoleucine (I) at position 991 to Leucine (L), Lysine (K) or Arginine (R), which were among the residues that most frequently appeared at position 991 in all 33 SaCas9 orthologs, generating the cCas9 v21 I991L (v21-L), v21 I991K (v21-K) and v21 I991R (v21-R) variants (refer to Figure 23). The present inventors found that these mutations increased the activity of cCas9 v21 on targets containing several expanded non-NNNRRN PAM sequences, including CCCACT, CCCATG, CCCATT, CCCGCT, CCCGTG and CCCGTT (refer to Figure 23). Furthermore, it was surprisingly found that v16 and v21 shared the same Serine (S) residue at position 986, which was different from the Asparagine (N) at the same position in SaCas9. The present inventors showed that the SaCas9 variant with the N986S mutation also expanded the PAM specificity of SaCas9-KKH, with a PAM recognition pattern similar to that of the cCas9 v16 and v21 variants (refer to Figure 23). Similar to the cCas9 v42 variant, the present inventors confirmed that the cCas9 v21-R variant showed efficient activities at six different PAMs with adenine, guanine, or cytosine, but not thymine, at the third position (refer to Figure 24).

Overall, the present inventors generated a group of chimeric Cas9 variants with expanded recognition capability at ACT, ATG, ATT, GCT, GTG, and GTT PAMs, as evidenced at least by the table below. By measuring the activity of these cCas9 variants at those PAMs, the present inventors observed that v21 I991R (v21-R) has a higher activity than v16 at the ACT PAM; v21-R and v21 I991L (v21-L) both show a significantly higher activity than SaCas9-KKH at the ATG PAM; v21-L and v21-R exhibit a higher activity than SaCas9-KKH, which shows only a weak activity, at the ATT PAM; v42 and v21 demonstrate a higher activity than SaCas9-KKH at the GCT PAM; and v21-L and v21-R display a significantly higher activity than SaCas9-KKH (which shows only basal-level activity) at the GTG and GTT PAMs.

Example 4 Evaluation of cCas9s on endogenous gene editing

To evaluate the off-target activity of cCas9 variants, the present inventors generated a panel of gRNAs with dinucleotide mutations to target a reporter gene containing the CCCAGT PAM (refer to Figure 25). It has been reported in the art that neutralization of positively charged residues positioned proximally to the non-target strand groove promotes re-hybridization between the target and non-target strands, resulting in mutant SpCas9 and SaCas9 with improved specificity. Accordingly, the present inventors engineered the cCas9 v21-R with R499A, Q500K, R654A, and G655R mutations (v21-R-HF). The present inventors demonstrated that the cCas9 v21-R-HF retained a similar on-target activity but showed negligible activity at the off-targets with dinucleotide mutations compared to SaCas9-KKH (refer to Figure 25). As shown in Figure 26, v21-R-HF displayed significantly decreased rates of mutagenesis at two out of three endogenous off-target sites containing one point mutation in the spacer sequences when directed by either the wild-type gRNA or the optimized gRNA-2 scaffold. To further examine the nuclease activity of chimeric Cas9 variants at these 6 PAMs in a dose experiment, the present inventors fused the 13-aa regions of v21-R, v21-L and N986S into wild-type SaCas9 (v21-R-wt, v21-L-wt, N986S-wt), and tested the activity of these variants at 18 different PAMs with a guanine, a cytosine or an adenine at the third PAM position. By using the fluorescent reporter assay (refer to Figure 5), the present inventors observed that v21-L and v21-R showed high activities at CCMACT, CCMATG, CCMATT, CCMGCT, CCMGTG, and CCMGTT PAMs (M = A or C), while N986S displayed relatively high efficiencies at CCMGTT, CCMATT, and CCMACT PAMs (refer to Figure 27a). Similarly, the cCas9 variants with the wild-type SaCas9 scaffold were highly active at PAM sites with a guanine at the third position (refer to Figure 27).

Then, the present inventors selected 11 endogenous target sites with non-NNNRRN PAMs and assayed the activities of different cCas9 variants by deep-sequencing analysis. The present inventors observed that the average indel frequencies induced by v21-R, v21-L, N986S and v21-R-HF were >10% when targeting endogenous sites with six different PAMs (refer to Figure 27b, c).

Furthermore, chimeric Cas9 variants with the scaffold of either wild-type SaCas9 or SaCas9-KKH displayed a higher level of indels than SaCas9-KKH at sites with non-NNNRRN PAMs having a guanine at the third position (refer to Figure 27c).

In addition, the present inventors also confirmed that both v21-L and v21-R efficiently induced indels when targeting endogenous sites with NNVRRN PAMs (refer to Figure 28) . Altogether, these results showed that cCas9 v21-R had an expanded PAM recognition compared to SaCas9 and SaCas9-KKH.

Moreover, the present inventors further cloned full-length SaCas9 orthologs with some site mutations and then determined the PAM preference of the mutated SaCas9 orthologs. Specifically, the present inventors first subjected the SaCas9 Ortholog 32 (O32, Staphylococcus sp. HMSC061G12 Cas9, i.e. SshCas9), SaCas9 Ortholog 35 (O35, Staphylococcus lutrae Cas9, i.e. SlCas9) and SaCas9-KKH (KKH) to multiple sequence alignment through the ESPript server. The SshCas9, SlCas9 and SaCas9-KKH display highly homologous sequences, where three triangles indicate the three mutations E782K/N968K/R1015H of the SaCas9-KKH (KKH) as compared with the wild-type SaCas9. Further, given that the SaCas9-KKH with the E782K/N968K/R1015H triple mutations can recognize an NNNRRT PAM, which is broader than the NNGRRT PAM recognized by the wild-type SaCas9, the present inventors introduced the E782K/N968K/R1015H triple mutations into the backbone region of SshCas9 and the Q782K/Y968K/R1015H triple mutations into the backbone region of SlCas9, generating SshCas9-KKH and SlCas9-KKH, respectively.
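
The substitution notation used above (for example E782K/N968K/R1015H, meaning wild-type residue, 1-based position, replacement residue) can be applied to a protein sequence mechanically. The following Python sketch is an illustration under stated assumptions: the function name `apply_substitutions` is hypothetical, the demonstration uses a short toy sequence rather than any Cas9 sequence from the sequence listing, and the wild-type check is included only to catch numbering mismatches between orthologs.

```python
import re


def apply_substitutions(protein: str, mutations: str) -> str:
    """Apply substitutions written like 'E782K/N968K/R1015H' (1-based positions).

    Raises ValueError if a mutation cannot be parsed, if a position is out of
    range, or if the residue found differs from the wild-type residue named.
    """
    residues = list(protein)
    for mutation in mutations.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy = "MKRNEQ"  # toy sequence for illustration only, not a Cas9 sequence
    print(apply_substitutions(toy, "N4K/E5K"))
```

The wild-type check reflects why different notations are used for the two orthologs in the text: position 782 is E in SshCas9 but Q in SlCas9, so the same target residue (K) is written as E782K or Q782K depending on the starting sequence.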

Then, the present inventors evaluated the DNA cleavage efficiency of SshCas9-KKH and SlCas9-KKH, with SshCas9 and SlCas9 as controls, at all 64 different PAMs varying at positions 4, 5 and 6, 3 days after transfection into HEK293FT cells as described above, with each experiment performed in triplicate. The results in Figure 29 show that SshCas9-KKH and SlCas9-KKH display strong activity at RRV PAMs.

The present inventors also performed an endogenous gene editing assay for SshCas9-KKH and SlCas9-KKH through the T7E1 experiment in triplicate, and the results show that both SshCas9-KKH and SlCas9-KKH exhibit a significantly higher indel frequency than SaCas9-KKH (which shows nearly no gene editing activity) at RRV PAMs 8 days post transfection.

General methods:

Reagents and enzymes. Restriction endonucleases, polynucleotide kinase (PNK), T4 DNA ligase, and Q5 High-Fidelity DNA Polymerase were purchased from New England Biolabs. Oligonucleotides were synthesized by Ruibiotech.

Plasmid DNA constructs. The gRNA sequences and associated primers are specifically synthesized as required. The constructs are made according to the general procedure in the art.

Cell culture and transfection. The HEK293FT cell line was purchased from Life Technologies. HEK293FT cells were cultured in high-glucose DMEM complete media (Dulbecco’s modified Eagle’s medium (DMEM), 4.5 g/L glucose, 0.045 unit/mL of penicillin, 0.045 g/mL streptomycin, and 10% FBS (Life Technologies)) at 37 ℃, 100% humidity, and 5% CO₂. One day before transfection, ~1.2 × 10⁵ HEK293FT cells in 0.5 mL of high-glucose DMEM complete media were seeded into each well of 96-well plastic plates (Falcon). Shortly before transfection, the medium was replaced with fresh DMEM complete media. The transfection experiments were performed by using EpFect transfection reagent (SyngenTech) following the manufacturer’s protocol. Each transfection experiment was independently repeated.

For the EYFP reconstitution reporter assay, 50 ng plasmid DNA encoding the Cas9 variant (unless otherwise specified), 50 ng transfection control plasmid DNA (pB018 CAG: TagBFP) that constitutively expresses TagBFP, 50 ng plasmid DNA encoding the gRNA with a spacer sequence “ATACGTTCTCTATCACTGATA”, and 50 ng plasmid DNA encoding the inactive EYFP reporter gene that can be reconstituted via homologous recombination after DNA cleavage were mixed and co-transfected into each well of a 96-well plate. For the endogenous editing assay, in Figure 16 and Figure 19b, 25 ng Cas9 variant plasmid DNA, 25 ng gRNA plasmid DNA, and 50 ng transfection control plasmid DNA (hEF1α: EYFP-2A-puro) that encoded a constitutively expressed puromycin resistance gene were mixed and co-transfected into each well of a 96-well plate. In Figures 26b and 26c, 100 ng Cas9 variant plasmid DNA, 100 ng gRNA plasmid DNA and 50 ng transfection control plasmid DNA were co-transfected into each well of a 96-well plate. Unless otherwise stated, 50 ng Cas9 variant plasmid DNA, 50 ng gRNA plasmid DNA, and 50 ng transfection control plasmid DNA were co-transfected into each well of a 96-well plate. To select transfected cells, puromycin (Invitrogen) was added at a final concentration of 10 μg/mL after 1 day, and the medium was replaced with fresh DMEM complete media after 4 days.

For each IL1RN gene activation assay, 50 ng plasmid DNA encoding Cas9 fused to the transactivation domain VPR and 12.5 ng of four different plasmids encoding gRNAs with the same PAM sites were mixed and co-transfected into HEK293FT cells in each well of a 96-well plate.

Flow cytometry. Cells were trypsinized 3 days after transfection and centrifuged at 300 × g for 7 min at 4 ℃. The supernatant was removed, and the cells were resuspended in 1× phosphate-buffered saline (PBS) that did not contain calcium or magnesium. A Fortessa flow analyzer (BD Biosciences) was used for fluorescence activated cell sorting (FACS) analysis with the following settings: EBFP2 was measured using a 405 nm laser and a 450/50 filter with a photomultiplier tube (PMT) set at 275 V. The EYFP was measured with a 488 nm laser and a 530/30 filter using a PMT set at 270 V. For each sample, ~2 × 10⁴ to ~3 × 10⁴ cell events were collected. The relative fluorescence intensity of EYFP (EYFP fluorescence intensity, a.u.) was defined as the average fluorescence intensity of EYFP divided by the average fluorescence intensity of the internal control EBFP2.
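
The relative EYFP fluorescence defined above is a ratio of two per-sample averages. A minimal Python sketch of that calculation follows; the function name `relative_eyfp` and the input format (two lists of per-event intensities for one sample) are assumptions made for illustration, and any gating of events, which the text does not describe, is assumed to have been done beforehand.

```python
from statistics import mean


def relative_eyfp(eyfp_events, ebfp2_events):
    """Mean EYFP intensity divided by mean EBFP2 (internal control) intensity.

    Both arguments are sequences of per-event fluorescence intensities from a
    single sample; gating is assumed to have been applied already.
    """
    if not eyfp_events or not ebfp2_events:
        raise ValueError("both channels need at least one event")
    ebfp2_mean = mean(ebfp2_events)
    if ebfp2_mean == 0:
        raise ValueError("mean EBFP2 intensity is zero; cannot normalize")
    return mean(eyfp_events) / ebfp2_mean


if __name__ == "__main__":
    # Made-up intensities purely to show the call; not experimental data.
    print(relative_eyfp([100, 300], [50, 150]))
```

Dividing by the transfection marker signal makes samples with different transfection efficiencies comparable, which is the stated role of EBFP2 as the internal control.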

RNA purification and quantitative PCR. Total RNA from HEK293FT cells was extracted with Trizol reagent (Life Technology). For each sample, 500 ng total RNA was reverse transcribed with ReverTra Ace qPCR RT Master Mix with gDNA Remover Kit (TOYOBO), and 1 μL of cDNA was used for each qPCR reaction, using 2× EvaGreen Master Mix (Syngentech). The quantitative reverse transcription polymerase chain reaction (qRT-PCR) was run and analyzed in the LightCycler 480 II (Roche), with all target gene expression levels normalized to β-actin mRNA levels. The primers used in quantitative PCR are depicted in SEQ ID NOs: 134 to 137.
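
The text states only that target gene expression was normalized to β-actin; the exact formula is not given. One widely used way to express such normalized data as a fold change is the comparative Ct (2^-ΔΔCt) calculation, sketched below in Python. This is an assumption for illustration, not the inventors' stated procedure; it presumes roughly 100% amplification efficiency for both primer pairs, and the function name `fold_change` is hypothetical.

```python
def fold_change(ct_target_sample, ct_reference_sample,
                ct_target_control, ct_reference_control):
    """Comparative Ct (2^-ddCt) fold change of a target gene.

    Assumption: both amplicons double every cycle (about 100% efficiency).
    'reference' is the housekeeping gene (beta-actin in the text above).
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Made-up Ct values purely to show the call; not experimental data.
    print(fold_change(24.0, 18.0, 26.0, 18.0))
```

With this convention, a target that crosses threshold two cycles earlier in the treated sample than in the control, at equal reference Ct, corresponds to a four-fold increase.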

Mutation quantification. All indel frequencies were measured by targeted next-generation sequencing (Illumina). Approximately 5 days post-transfection, cells were harvested and lysed in lysis buffer (0.45% NP40 and 10 mM Tris-HCl, pH 8.3), followed by incubation at 58 ℃ for 180 min and 95 ℃ for 10 min. Amplicons were generated by three rounds of nested PCR to add the Illumina adaptor sequence. After filtering, reads with mutations were defined as reads containing a mismatch within a 48-bp window around the cleavage site. Indel frequency was calculated as the number of reads with mutations divided by the total number of reads.
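
The read-classification rule above can be sketched in Python as follows. This is a simplified illustration, not the pipeline actually used: it assumes each read has already been filtered and pairwise-aligned to the amplicon reference (equal-length strings with "-" marking gaps), that the 48-bp window is centered on the cleavage site, and that any difference from the reference inside the window counts as a mutation. The names `read_is_mutated` and `indel_frequency` are hypothetical.

```python
def read_is_mutated(ref_aln, read_aln, cut_index, window=48):
    """Return True if the aligned read differs from the reference in the window.

    ref_aln and read_aln are equal-length gapped alignment strings; cut_index
    is the 0-based reference coordinate of the cleavage site.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("alignment strings must have equal length")
    half = window // 2
    ref_pos = 0  # reference coordinate of the current alignment column
    for ref_base, read_base in zip(ref_aln, read_aln):
        in_window = cut_index - half <= ref_pos < cut_index + half
        if in_window and ref_base != read_base:
            return True
        if ref_base != "-":  # insertion columns do not advance the reference
            ref_pos += 1
    return False


def indel_frequency(alignments, cut_index, window=48):
    """Reads with a mutation in the window divided by total reads."""
    alignments = list(alignments)
    if not alignments:
        raise ValueError("no reads supplied")
    mutated = sum(read_is_mutated(ref, read, cut_index, window)
                  for ref, read in alignments)
    return mutated / len(alignments)


if __name__ == "__main__":
    ref = "ACGT" * 25                          # toy 100-bp reference
    deleted = ref[:48] + "---" + ref[51:]      # 3-bp deletion near the cut site
    print(indel_frequency([(ref, ref), (ref, deleted)], cut_index=50))
```

Restricting the comparison to a window around the cut site keeps sequencing errors and polymorphisms elsewhere in the amplicon from being counted as editing events.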

Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Next-generation sequencing data used to determine the indel frequency at the specific sites are available through the NCBI Sequence Read Archive (PRJNA513032).

Conclusions:

In this disclosure, the present inventors developed a strategy to engineer SaCas9 variants with altered PAM recognition specificity by swapping the key region in the PI domain in SaCas9 orthologs, and identified several cCas9 v42 and v17-L variants with expanded DNA cleavage activities at NNVRRN PAMs, along with multiple cCas9 v16 and v21 derived variants that can efficiently target sites with NNVACT, NNVATG, NNVATT, NNVGCT, NNVGTG, and NNVGTT PAM.

In addition, the present inventors demonstrated that the v42-wt based on the wild-type SaCas9 scaffold showed a higher activity at NNGRRV PAMs than the wild-type SaCas9 by using the fluorescent reporter assay. Similarly, the v21-R-wt and v21-L-wt based on the wild-type SaCas9 scaffold also displayed an enhanced activity at NNGACT, NNGATG, NNGATT, NNGGCT, NNGGTG, and NNGGTT PAMs compared to the wild-type SaCas9.

In addition, directed evolution screening and structure-guided mutagenesis based on these cCas9 variants may further improve the DNA cleavage activities at targets containing the expanded PAM sequences. It is intriguing that although v42, v17-L, SaCas9-KKH R991K, and SaCas9-KKH R991K/D987N showed expanded activities at NNVRRV PAMs, these variants displayed decreased activities at NNVRRT PAMs, which is consistent with the previous report that SaCas9-KKH showed decreased activities at NNGRRT PAMs. One explanation is that sufficient PAM binding activity of SaCas9 nucleases may be required to initiate strong gene editing activity, so that relaxed PAM binding results in reduced DNA cleavage activity.

In summary, the present inventors provided a panel of cCas9 variants that can access up to 1/4 of all PAM sequences, with a compact size suitable for viral delivery in mammalian cells, which will be valuable for biomedical applications that require precise Cas9 positioning. This chimeric strategy based on evolutionary information also offers insight for engineering Cas9 proteins for other functional purposes, such as low immunogenicity, high fidelity and functional compatibility in mammalian cells.
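
The "up to 1/4" figure can be checked by enumeration. The Python sketch below expands the seven PAM patterns named in the conclusions into concrete six-nucleotide sequences and counts the union; it measures pattern coverage only, treats every matching sequence as equally accessible (an idealization), and the names `expand` and `covered_fraction` are hypothetical.

```python
from itertools import product

# IUPAC codes used by the PAM patterns in this document.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "V": "ACG", "N": "ACGT"}

PATTERNS = ["NNVRRN", "NNVACT", "NNVATG", "NNVATT",
            "NNVGCT", "NNVGTG", "NNVGTT"]


def expand(pattern):
    """Return the set of concrete sequences matched by an IUPAC pattern."""
    return {"".join(bases) for bases in product(*(IUPAC[c] for c in pattern))}


def covered_fraction(patterns):
    """Fraction of all sequences of the pattern length matched by any pattern."""
    lengths = {len(p) for p in patterns}
    if len(lengths) != 1:
        raise ValueError("all patterns must have the same length")
    covered = set()
    for pattern in patterns:
        covered |= expand(pattern)
    return len(covered) / 4 ** lengths.pop()


if __name__ == "__main__":
    print(len(expand("NNVRRN")))        # 768
    print(covered_fraction(PATTERNS))   # 0.2578125
```

NNVRRN alone covers 768 of the 4096 possible six-nucleotide PAMs, and the six additional patterns add 48 each without overlap, giving 1056/4096, or about 26%, which agrees with the stated figure of roughly one quarter.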

Although embodiments of the present disclosure have been described, it will be understood by those skilled in the art that various changes, modifications, substitutions and variations can be made in these embodiments without departing from the principle and spirit of the present disclosure, and the scope of the disclosure is defined by the claims and their equivalents.

References

1. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007) .

2. Makarova, K.S. et al. An updated evolutionary classification of CRISPR-Cas systems. Nat. Rev. Microbiol. 13, 722–736 (2015) .

3. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013) .

4. Qi, L.S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183 (2013).

5. Hilton, I.B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517 (2015) .

6. Komor, A.C., Kim, Y.B., Packer, M.S., Zuris, J.A. & Liu, D.R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

7. Ma, Y. et al. Targeted AID-mediated mutagenesis (TAM) enables efficient genomic diversification in mammalian cells. Nat. Methods 13, 1029–1035 (2016) .

8. Ma, D., Peng, S. & Xie, Z. Integration and exchange of split dCas9 domains for transcriptional controls in mammalian cells. Nat. Commun. 7, 100084 (2016).

9. Gaudelli, N.M. et al. Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017) .

10. Jinek, M. et al. A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–822 (2012) .

11. Wu, X., Kriz, A.J. & Sharp, P.A. Target specificity of the CRISPR-Cas9 system. Quant. Biol. 2, 59–70 (2014).

12. Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573 (2014).

13. Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935–949 (2014) .

14. Nishimasu, H. et al. Crystal Structure of Staphylococcus aureus Cas9. Cell 162, 1113–1126 (2015) .

15. Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).

16. Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018) .

17. Kleinstiver, B.P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015) .

18. Kleinstiver, B.P. et al. Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nat. Biotechnol. 33, 1–7 (2015) .

19. Gao, L. et al. Engineered Cpf1 variants with altered PAM specificities. Nat. Biotechnol. 35, 789–792 (2017) .

20. Hu, J.H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).

21. Ran, F.A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–190 (2015) .

22. Yang, Y. et al. A dual AAV system enables the Cas9-mediated correction of a metabolic liver disease in newborn mice. Nat. Biotechnol. 34, 334–338 (2016) .

23. Murugan, K., Babu, K., Sundaresan, R., Rajan, R. & Sashital, D.G. The revolution continues: Newly discovered systems expand the CRISPR-Cas toolkit. Mol. Cell 68, 15–25 (2017).

24. Leenay, R.T. et al. Identifying and visualizing functional PAM diversity across CRISPR-Cas systems. Mol. Cell 62, 137–147 (2016) .

25. Burstein, D. et al. New CRISPR-Cas systems from uncultivated microbes. Nature 542, 237–241 (2017) .

26. Chatterjee, P., Jakimo, N. & Jacobson, J.M. Minimal PAM specificity of a highly similar SpCas9 ortholog. Sci. Adv. 4, eaau0766 (2018).

27. Ma, H. et al. Multiplexed labeling of genomic loci with dCas9 and engineered sgRNAs using CRISPRainbow. Nat. Biotechnol. 34, 528–530 (2016) .

28. Chen, B. et al. Expanding the CRISPR imaging toolset with Staphylococcus aureus Cas9 for simultaneous imaging of multiple genomic loci. Nucl. Acids Res. 44, e75 (2016) .

29. Ma, D., Peng, S., Huang, W., Cai, Z. & Xie, Z. Rational design of Mini-Cas9 for transcriptional activation. ACS Synth. Biol. 7, 978–985 (2018).

30. Slaymaker, I.M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2015).

Claims

A Cas9 variant comprising:

a first backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the first backbone region of a wild-type Cas9;

a protospacer adjacent motif (PAM) interaction region, being a 13-amino acid sequence deriving from the PAM interaction region of an ortholog of the wild-type Cas9, and having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the PAM interaction region of the ortholog of the wild-type Cas9; and

a second backbone region, having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity to the second backbone region of the wild-type Cas9;

wherein an N-terminus of the PAM interaction region is connected to a C-terminus of the first backbone region, and a C-terminus of the PAM interaction region is connected to an N-terminus of the second backbone region, and

wherein the Cas9 variant has recognition capability at a PAM sequence selected from the group consisting of NNVRRN, NNVACT, NNVATG, NNVATT, NNVGCT, NNVGTG and NNVGTT PAM sequences, wherein

N is adenine (A), thymine (T), cytosine (C) or guanine (G);

R is adenine (A) or guanine (G); and

V is adenine (A), cytosine (C) or guanine (G).
The Cas9 variant according to claim 1, wherein the wild-type Cas9 is derived from Micrococcus, Staphylococcus, Planococcus, Streptococcus, Leuconostoc, Pediococcus, Aerococcus or Gemella;

preferably, Staphylococcus comprises Staphylococcus aureus, Staphylococcus epidermidis and Staphylococcus saprophyticus;

preferably, Streptococcus comprises Streptococcus pyogenes, Streptococcus equisimilis, Streptococcus zooepidemicus, Streptococcus equi, Streptococcus dysgalactiae, Streptococcus sanguis, Streptococcus pneumoniae, Streptococcus anginosus, Streptococcus agalactiae, Streptococcus acidominimus, Streptococcus salivarius, Streptococcus mitis, Streptococcus bovis, Streptococcus equinus, Streptococcus thermophilus, Streptococcus faecalis, Streptococcus faecium, Streptococcus avium, Streptococcus uberis, Streptococcus lactis, Streptococcus cremoris and Streptococcus canis;

preferably, the wild-type Cas9 is derived from Staphylococcus aureus.
The Cas9 variant according to claim 1 or 2, wherein the first and second backbone regions each independently have one or more amino acid mutations compared to the first and second backbone regions of the wild-type Cas9, wherein the amino acid mutation is a substitution, a deletion or an addition.
The Cas9 variant according to claim 3, wherein the first backbone region comprises the amino acid mutation selected from the group consisting of:

a substitution of Alanine (A) for Arginine (R) at position 499,

a substitution of Lysine (K) for Glutamine (Q) at position 500,

a substitution of Alanine (A) for Arginine (R) at position 654,

a substitution of Arginine (R) for Glycine (G) at position 655,

a substitution of Lysine (K) for Glutamic acid (E) at position 782, and

a substitution of Lysine (K) for Asparagine (N) at position 968,

optionally, the second backbone region comprises an amino acid mutation at position 1015, wherein the amino acid mutation is a substitution of Histidine (H) for Arginine (R),

preferably, the first backbone region comprises a substitution of Lysine (K) for Glutamic acid (E) at position 782 and a substitution of Lysine (K) for Asparagine (N) at position 968, and the second backbone region comprises a substitution of Histidine (H) for Arginine (R) at position 1015.
The Cas9 variant according to any one of the preceding claims, wherein the first backbone region consists of the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 130;

optionally the second backbone region consists of the amino acid sequence of SEQ ID NO: 2.
The Cas9 variant according to any one of the preceding claims, wherein the ortholog of Staphylococcus aureus is selected from Absiella dolichum, Clostridium cocleatum, Veillonella parvula, Alkalibacterium gilvum, Alkalibacterium sp. 20, Lacticigenium naphtae, Alkalibacterium subtropicum, Carnobacterium iners, Carnobacterium viridans, Jeotgalibaca sp. PTS2502, Listeria ivanovii sp. londoniensis, Bacillus massilionigeriensis, Bacillus niameyensis, Ureibacillus thermosphaericus-1, Ureibacillus thermosphaericus-2, Halalkalibacillus halophilus, Paraliobacillus ryukyuensis, Sediminibacillus albus, Virgibacillus senegalensis, Pelagirhabdus alkalitolerans, Massilibacterium senegalense, Macrococcus sp. IME 1552, Staphylococcus (from multispecies), Staphylococcus simulans, Staphylococcus sp. HMSC061G12, Staphylococcus massiliensis, Staphylococcus microti, Staphylococcus haemolyticus, Staphylococcus sp. HMSC34C02, Staphylococcus warneri, Staphylococcus schleiferi, Staphylococcus agnetis and Staphylococcus lutrae.
The Cas9 variant according to any one of the preceding claims, wherein the ortholog of Staphylococcus aureus is selected from Sediminibacillus albus, Staphylococcus schleiferi, Staphylococcus simulans, Staphylococcus sp. HMSC061G12, Staphylococcus agnetis, Clostridium cocleatum, Absiella dolichum, Staphylococcus warneri, Staphylococcus microti, Massilibacterium senegalense, Lacticigenium naphtae and Halalkalibacillus halophilus.
The Cas9 variant according to any one of the preceding claims, wherein the ortholog of Staphylococcus aureus is selected from Sediminibacillus albus, Staphylococcus schleiferi, Staphylococcus warneri and Staphylococcus microti.
The Cas9 variant according to any one of the preceding claims, wherein the PAM interaction region further has one or more amino acid mutations compared to the PAM interaction region of the ortholog of the wild-type Cas9, wherein the amino acid mutation is a substitution, a deletion or an addition.
The Cas9 variant according to claim 9, wherein the PAM interaction region has an amino acid mutation at position 991 compared to the PAM interaction region of Staphylococcus schleiferi, wherein the amino acid mutation is a substitution of Lysine (K) or Leucine (L) for Isoleucine (I),

optionally the PAM interaction region has an amino acid mutation at position 991 compared to the PAM interaction region of Staphylococcus warneri, wherein the amino acid mutation is a substitution of Lysine (K), Leucine (L) or Arginine (R) for Isoleucine (I).
The Cas9 variant according to any one of the preceding claims, wherein the PAM interaction region consists of the amino acid sequence of SEQ ID NO: 3 to SEQ ID NO: 43;

preferably the PAM interaction region consists of the amino acid sequence selected from SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 16, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 27, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 36 to SEQ ID NO: 43;

preferably the PAM interaction region consists of the amino acid sequence selected from SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 13, SEQ ID NO: 33, SEQ ID NO: 36 to SEQ ID NO: 43.
The Cas9 variant according to any of the preceding claims, wherein the Cas9 variant consists of the amino acid sequence selected from SEQ ID NO: 44 to SEQ ID NO: 84, SEQ ID NO: 131 and SEQ ID NO: 133;

preferably the Cas9 variant consists of the amino acid sequence selected from SEQ ID NO: 49, SEQ ID NO: 54, SEQ ID NO: 74, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 83, SEQ ID NO: 84, and SEQ ID NO: 131.
The Cas9 variant according to any one of the preceding claims, wherein the Cas9 variant is modified with a substitution of Alanine (A) for Arginine (R) at position 499, a substitution of Lysine (K) for Glutamine (Q) at position 500, a substitution of Alanine (A) for Arginine (R) at position 654 and a substitution of Arginine (R) for Glycine (G) at position 655 for decrease of off-target and increase of fidelity.
A nucleic acid encoding the Cas9 variant of any one of claims 1 to 13.
The nucleic acid according to claim 14, wherein the nucleic acid consists of the nucleotide sequence selected from SEQ ID NO: 85 to SEQ ID NO: 125 and SEQ ID NO: 132.
An expression vector comprising:

an encoding sequence comprising:

a first nucleic acid sequence encoding the Cas9 variant of any one of claims 1 to 13; and

a second nucleic acid sequence encoding a scaffold of a guide RNA (gRNA) which specifically directs the cleavage of a target gene to be edited by the Cas9 variant of any one of claims 1 to 13,

optionally a regulatory element, operably linked to the encoding sequence and configured to be suitable for expression of the encoding sequence in a cell to be edited.
The expression vector according to claim 16, wherein the first nucleic acid sequence consists of a nucleotide sequence selected from SEQ ID NO: 85 to SEQ ID NO: 125 and SEQ ID NO: 132.
The expression vector according to claim 16, wherein the second nucleic acid sequence consists of a nucleotide sequence selected from SEQ ID NO: 126 to SEQ ID NO: 129,

preferably the scaffold of the gRNA encoded by the second nucleic acid sequence comprises at least one nucleotide mutation in a first stem-loop of the scaffold compared to the scaffold of the gRNA of a wild-type Cas9, wherein the nucleotide mutation in the first stem-loop is selected from 3rd Uracil (U) to Cytosine (C), 4th Uracil (U) to Adenine (A), 4th Uracil (U) to Cytosine (C), 5th Uracil (U) to Cytosine (C), 6th Adenine (A) to Guanine (G), 32nd Adenine (A) to Guanine (G), 31st Adenine (A) to Thymine (T), 31st Adenine (A) to Guanine (G), 30th Adenine (A) to Guanine (G) and 29th Thymine (T) to Cytosine (C).
The expression vector according to claim 16, wherein the regulatory element comprises a T7 promoter, an arabinose promoter, a phoA, tac, lpp, lac-lpp, lac, trp or trc promoter, a CMV promoter, an RSV promoter, an SV40 promoter, an HSV promoter, a human Pol I promoter, a human Pol II promoter or a human Pol III promoter.
The expression vector according to claim 16, wherein the expression vector is an adenovirus vector, a lentiviral vector or a plasmid.
A method for producing the expression vector of any one of claims 16 to 20, comprising transfecting the expression vector of any one of claims 16 to 20 into a host cell, and isolating the expression vector from the host cell.
A kit for gene editing, comprising:

a first nucleic acid molecule encoding the Cas9 variant of any one of claims 1 to 13;

a second nucleic acid molecule encoding a scaffold of a gRNA that specifically directs the cleavage of a target gene to be edited by the Cas9 variant of any one of claims 1 to 13,

a buffer suitable for gene editing, and

an instruction for use of the kit,

optionally one or more containers suitable for gene editing,

wherein the first nucleic acid molecule and the second nucleic acid molecule are loaded in an expression vector.
A method for gene editing, comprising transfecting the expression vector of any one of claims 16 to 20 into a cell to be edited.
The method according to claim 23, wherein the encoding sequence of the expression vector is determined based on the target gene to be edited.
The method according to claim 23, wherein the encoding sequence encodes a Cas9 variant selected from SEQ ID NO: 49, SEQ ID NO: 54, SEQ ID NO: 74, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 131 and SEQ ID NO: 133.
The method according to claim 23, wherein the cell to be edited is a prokaryotic cell or a eukaryotic cell;

optionally, the cell to be edited is derived from animal, plant or microbe.
A composition comprising the expression vector of any one of claims 16 to 20 and a carrier material.
A pharmaceutical composition comprising the expression vector of any one of claims 16 to 20 and a pharmaceutically acceptable carrier material.
A method for treating or preventing a disease in a subject, comprising administering a therapeutically effective amount of the composition of claim 27 or the pharmaceutical composition of claim 28 to the subject in need thereof, wherein the subject is preferably a human.
A pharmaceutical preparation for use in treating or preventing a disease in a subject, the pharmaceutical preparation being the composition of claim 27 or the pharmaceutical composition of claim 28, wherein a therapeutically effective amount of the pharmaceutical preparation is administered to the subject in need thereof, wherein the subject is preferably a human.