CN114292831A

CN114292831A - Novel Cas enzyme and application

Info

Publication number: CN114292831A
Application number: CN202210115774.3A
Authority: CN
Inventors: 梁亚峰
Original assignee: Shandong Shunfeng Biotechnology Co Ltd
Current assignee: Shandong Shunfeng Biotechnology Co Ltd
Priority date: 2021-02-03
Filing date: 2022-02-07
Publication date: 2022-04-08
Anticipated expiration: 2042-02-07
Also published as: CN114292831B; CN116555227A

Abstract

The invention belongs to the field of nucleic acid editing, and particularly relates to the technical field of clustered regularly interspaced short palindromic repeats (CRISPR). Specifically, the invention provides novel Cas enzymes that have low homology to previously reported Cas enzymes, exhibit nuclease activity both inside and outside cells, and have broad application prospects.

Description

Novel Cas enzyme and application

Technical Field

The invention relates to the field of gene editing, in particular to the technical field of clustered regularly interspaced short palindromic repeats (CRISPR). Specifically, the present invention relates to a novel Cas effector protein, fusion proteins comprising such proteins, and nucleic acid molecules encoding them. The invention also relates to complexes and compositions for nucleic acid editing (e.g., gene or genome editing) comprising a Cas protein or fusion protein of the invention, or a nucleic acid molecule encoding the same.

Background

CRISPR/Cas technology is a widely used gene editing technology in which a guide RNA directs specific binding to a target sequence in a genome, the DNA is cleaved to generate a double-strand break, and site-directed gene editing is then achieved through the organism's own non-homologous end joining or homologous recombination.

The CRISPR/Cas9 system is the most commonly used type II CRISPR system; it recognizes a 3'-NGG PAM motif and cleaves the target sequence to leave blunt ends. The CRISPR/Cas type V system is a more recently discovered type of CRISPR system that recognizes a 5'-TTN PAM motif and cleaves the target sequence to leave sticky ends; examples include Cpf1, C2C1, CasX and CasY. However, the CRISPR/Cas systems currently available each have different advantages and disadvantages. For example, Cas9, C2C1 and CasX all require two RNAs to form the guide RNA, whereas Cpf1 requires only one guide RNA and can be used for multiplex gene editing. CasX is 980 amino acids in size, while the common Cas9, C2C1, CasY and Cpf1 are typically around 1300 amino acids. In addition, the PAM sequences of Cas9, Cpf1, CasX and CasY are complex and diverse, while C2C1 strictly recognizes 5'-TTN, so its target sites are more easily predicted than those of other systems, which reduces potential off-target effects.

In summary, because each currently available CRISPR/Cas system is limited by certain drawbacks, the development of a new, more robust CRISPR/Cas system with good all-round performance is of great significance for the development of biotechnology.

Disclosure of Invention

The inventors of the present application have unexpectedly discovered a novel class of endonucleases (Cas enzymes) through a large number of experiments and repeated trials. The Cas enzyme comprises one or more of Cas-sf1, Cas-sf3, Cas-sf4, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10, and based on the discovery, the inventor develops a novel CRISPR/Cas system and a gene editing method and a nucleic acid detection method based on the system.

Cas effector protein

In one aspect, the invention provides a Cas protein (alternatively referred to as a Cas enzyme) which is an effector protein in a CRISPR/Cas system, wherein the effector protein is selected from any one or more of Cas-sf4, Cas-sf1, Cas-sf3, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10.

The amino acid sequences of Cas-sf4, Cas-sf1, Cas-sf3, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10 are shown in SEQ ID Nos. 1-7, respectively.

In one embodiment, the amino acid sequence of the Cas protein has at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID Nos. 1-7 and substantially retains the biological function of the sequence from which it is derived.
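By way of illustration only, the identity thresholds above can be checked with a short sketch such as the following. It assumes the two sequences have already been aligned by some external tool and that gaps are written as "-"; it also assumes that identity is counted over the columns in which both sequences carry a residue, which is only one of several conventions in use. This passage does not prescribe an alignment algorithm or a denominator, and the function names and the toy sequences in the usage note are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which both sequences have a residue.

    Assumption: the inputs are pre-aligned and of equal length; columns that
    contain a gap in either sequence are skipped rather than counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the two aligned sequences reach the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, with the hypothetical toy peptides "MKTAYL" and "MKSAYL", five of six positions match, so the pair clears an 80% threshold but not a 90% threshold. A different gap convention would give different percentages for gapped alignments.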

In one embodiment, the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions, or additions compared to any one of SEQ ID Nos. 1-7 and substantially retains the biological function of the sequence from which it is derived; the one or more amino acid substitutions, deletions, or additions include substitutions, deletions, or additions of 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acids.

It will be clear to those skilled in the art that the structure of a protein may be altered without adversely affecting its activity and functionality, for example one or more conservative amino acid substitutions may be introduced in the amino acid sequence of the protein without adversely affecting the activity and/or the three-dimensional structure of the protein molecule. Examples and embodiments of conservative amino acid substitutions will be apparent to those skilled in the art. Specifically, the amino acid residue may be substituted with another amino acid residue belonging to the same group as the site to be substituted, i.e., a nonpolar amino acid residue is substituted for another nonpolar amino acid residue, a polar uncharged amino acid residue is substituted for another polar uncharged amino acid residue, a basic amino acid residue is substituted for another basic amino acid residue, and an acidic amino acid residue is substituted for another acidic amino acid residue. Such substituted amino acid residues may or may not be encoded by the genetic code. Conservative substitutions where one amino acid is replaced by another amino acid belonging to the same group are within the scope of the present invention, as long as the substitution does not result in inactivation of the biological activity of the protein. Thus, the proteins of the invention may comprise one or more conservative substitutions in the amino acid sequence, which are preferably made by substitution according to Table 1. In addition, proteins that also comprise one or more other non-conservative substitutions are also encompassed by the present invention, provided that the non-conservative substitutions do not significantly affect the desired function and biological activity of the proteins of the present invention.

Conservative amino acid substitutions may be made at one or more predicted nonessential amino acid residues. A "nonessential" amino acid residue is an amino acid residue that can be altered (deleted, substituted, or replaced) without altering the biological activity, while an "essential" amino acid residue is required for biological activity. A "conservative amino acid substitution" is one in which an amino acid residue is replaced with an amino acid residue having a similar side chain. Amino acid substitutions can be made in non-conserved regions of the Cas enzyme. In general, such substitutions are not made to conserved amino acid residues, or to amino acid residues located within conserved motifs, where such residues are required for protein activity. However, one skilled in the art will appreciate that functional variants may have minor conservative or non-conservative changes in conserved regions.

TABLE 1

Initial residue	Representative substitutions	Preferred substitution
Ala(A)	Val; Leu; Ile	Val
Arg(R)	Lys; Gln; Asn	Lys
Asn(N)	Gln; His; Lys; Arg	Gln
Asp(D)	Glu	Glu
Cys(C)	Ser	Ser
Gln(Q)	Asn	Asn
Glu(E)	Asp	Asp
Gly(G)	Pro; Ala	Ala
His(H)	Asn; Gln; Lys; Arg	Arg
Ile(I)	Leu; Val; Met; Ala; Phe	Leu
Leu(L)	Ile; Val; Met; Ala; Phe	Ile
Lys(K)	Arg; Gln; Asn	Arg
Met(M)	Leu; Phe; Ile	Leu
Phe(F)	Leu; Val; Ile; Ala; Tyr	Leu
Pro(P)	Ala	Ala
Ser(S)	Thr	Thr
Thr(T)	Ser	Ser
Trp(W)	Tyr; Phe	Tyr
Tyr(Y)	Trp; Phe; Thr; Ser	Phe
Val(V)	Ile; Leu; Met; Phe; Ala	Leu
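As a non-limiting sketch, Table 1 can be transcribed into a lookup structure so that a proposed substitution can be classified as preferred, representative, or not listed. The dictionary below copies the table entries; the names TABLE_1 and classify_substitution are hypothetical, and the classification says nothing about whether a given variant actually retains activity, which must be established experimentally.

```python
# Table 1 transcribed as:
# initial residue -> (representative substitutions, preferred substitution)
TABLE_1 = {
    "Ala": (("Val", "Leu", "Ile"), "Val"),
    "Arg": (("Lys", "Gln", "Asn"), "Lys"),
    "Asn": (("Gln", "His", "Lys", "Arg"), "Gln"),
    "Asp": (("Glu",), "Glu"),
    "Cys": (("Ser",), "Ser"),
    "Gln": (("Asn",), "Asn"),
    "Glu": (("Asp",), "Asp"),
    "Gly": (("Pro", "Ala"), "Ala"),
    "His": (("Asn", "Gln", "Lys", "Arg"), "Arg"),
    "Ile": (("Leu", "Val", "Met", "Ala", "Phe"), "Leu"),
    "Leu": (("Ile", "Val", "Met", "Ala", "Phe"), "Ile"),
    "Lys": (("Arg", "Gln", "Asn"), "Arg"),
    "Met": (("Leu", "Phe", "Ile"), "Leu"),
    "Phe": (("Leu", "Val", "Ile", "Ala", "Tyr"), "Leu"),
    "Pro": (("Ala",), "Ala"),
    "Ser": (("Thr",), "Thr"),
    "Thr": (("Ser",), "Ser"),
    "Trp": (("Tyr", "Phe"), "Tyr"),
    "Tyr": (("Trp", "Phe", "Thr", "Ser"), "Phe"),
    "Val": (("Ile", "Leu", "Met", "Phe", "Ala"), "Leu"),
}


def classify_substitution(initial: str, replacement: str) -> str:
    """Classify a substitution against Table 1 using three-letter residue codes."""
    representative, preferred = TABLE_1[initial]
    if replacement == preferred:
        return "preferred"
    if replacement in representative:
        return "representative"
    return "not listed"
```

For instance, classify_substitution("Ala", "Val") reports a preferred substitution, whereas classify_substitution("Ala", "Asp") reports that the change is not listed in Table 1.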

It is well known in the art that one or more amino acid residues may be altered (substituted, deleted, truncated, or inserted) at the N- and/or C-terminus of a protein while still retaining its functional activity. Thus, proteins that have one or more amino acid residues altered at the N- and/or C-terminus of the Cas protein of the present invention, while retaining their desired functional activity, are also within the scope of the present invention. These changes may include changes introduced by modern molecular methods such as PCR, including PCR amplifications that alter or extend the protein coding sequence by including amino acid coding sequences in the oligonucleotides used for the amplification.

It will be appreciated that proteins may be altered in various ways, including amino acid substitutions, deletions, truncations, and insertions, and methods for such manipulations are generally known in the art. For example, amino acid sequence variants of Cas proteins can be made by mutation of DNA. This may also be accomplished by other forms of mutagenesis and/or by directed evolution, e.g., using known methods of mutagenesis, recombination and/or shuffling, in conjunction with related screening methods, to make single or multiple amino acid substitutions, deletions and/or insertions.

Those skilled in the art will appreciate that these minor amino acid changes in the Cas protein of the invention can occur (e.g., as naturally occurring mutations) or be generated (e.g., using recombinant DNA technology) without loss of protein function or activity. If these mutations occur in the catalytic domain, active site or other functional domain of the protein, the properties of the polypeptide may change, but the polypeptide may retain its activity. Only minor effects are to be expected if the mutations are not close to the catalytic domain, active site or other functional domains.

One skilled in the art can identify essential amino acids of a Cas protein according to methods known in the art, such as site-directed mutagenesis, analysis of protein evolution, or bioinformatics. The catalytic domain, active site or other functional domain of a protein can also be determined by physical analysis of the structure, for example by nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, in combination with mutation of putative key-site amino acids.

In one embodiment, the Cas protein comprises the amino acid sequence shown in any one of SEQ ID nos. 1 to 7.

In one embodiment, the Cas protein is the amino acid sequence shown in any one of SEQ ID nos. 1 to 7.

In one embodiment, the Cas protein is a derivatized protein having the same biological function as a protein having the sequence shown in any of SEQ ID nos. 1-7.

Such biological functions include, but are not limited to, binding to a guide RNA, endonuclease activity, and binding to and cleaving a specific site of a target sequence under the guidance of a guide RNA; the cleavage activity includes, but is not limited to, cis cleavage activity and trans cleavage activity.

The invention also provides a fusion protein comprising any one Cas protein selected from Cas-sf1, Cas-sf3, Cas-sf4, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10, and an additional modifying moiety.

In one embodiment, the modifying moiety is selected from an additional protein or polypeptide, a detectable label, or any combination thereof.

In one embodiment, the modifying moiety is selected from the group consisting of an epitope tag, a reporter sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcription activation domain (e.g., VP64), a transcription repression domain (e.g., KRAB domain or SID domain), a nuclease domain (e.g., FokI), and a domain having an activity selected from the group consisting of: nucleotide deaminase activity, methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof. Such NLS sequences are well known to those skilled in the art, examples of which include, but are not limited to, the NLS sequences of the SV40 large T antigen, EGL-13, c-Myc, and TUS proteins.

In one embodiment, the NLS sequence is located at or near a terminus (e.g., N-terminus, C-terminus, or both) of a Cas protein of the invention.

Such epitope tags are well known to those skilled in the art and include, but are not limited to, His, V5, FLAG, HA, Myc, VSV-G, Trx, etc.; other suitable epitope tags (e.g., for purification, detection, or tracking) may be selected by those skilled in the art.

The reporter gene sequences are well known to those skilled in the art, examples of which include, but are not limited to, GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP, and the like.

In one embodiment, the fusion protein of the invention comprises a domain capable of binding to a DNA molecule or an intracellular molecule, such as maltose binding protein (MBP), the DNA binding domain (DBD) of LexA, the DBD of GAL4, and the like.

In one embodiment, the fusion protein of the invention comprises a detectable label, such as a fluorescent dye, e.g. FITC or DAPI.

In one embodiment, the Cas protein of the present invention is coupled, conjugated or fused to the modifying moiety, optionally via a linker.

In one embodiment, the modifying moiety is directly linked to the N-terminus or C-terminus of the Cas protein of the present invention.

In one embodiment, the modifying moiety is linked to the N-terminus or C-terminus of the Cas protein of the present invention via a linker. Such linkers are well known in the art, examples of which include, but are not limited to, linkers comprising one or more (e.g., 1, 2, 3, 4, or 5) amino acids (e.g., Glu or Ser) or amino acid derivatives (e.g., Ahx, β-Ala, GABA, or Ava), or PEG, and the like.

The Cas protein, protein derivative or fusion protein of the present invention is not limited by the manner of its production, and for example, it may be produced by a genetic engineering method (recombinant technology) or may be produced by a chemical synthesis method.

Nucleic acid of Cas protein

In another aspect, the invention provides an isolated polynucleotide comprising: a polynucleotide sequence encoding a Cas protein or a fusion protein of the present invention.

In one embodiment, the polynucleotide sequence is codon optimized for expression in a prokaryotic cell. In one embodiment, the polynucleotide sequence is codon optimized for expression in a eukaryotic cell.

In one embodiment, the polynucleotide is preferably single-stranded or double-stranded.

Direct repeat sequences

In another aspect, the invention provides an engineered direct repeat that forms a complex with any one of the Cas proteins selected from Cas-sf1, Cas-sf3, Cas-sf4, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10 described above.

The direct repeat sequence is connected with a guide sequence capable of hybridizing with a target sequence to form a guide RNA (guide RNA or gRNA).

Hybridization of the target sequence to the gRNA means that the target sequence and the nucleic acid sequence of the gRNA share at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity, such that they can hybridize to form a complex; or that at least 12, 15, 16, 17, 18, 19, 20, 21, 22, or more bases of the target sequence and the nucleic acid sequence of the gRNA can pair complementarily to form a complex.

In some embodiments, the direct repeat sequence has at least 90% sequence identity, e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity, to any one of SEQ ID nos. 8-14. In some embodiments, the direct repeat sequence has a substitution, deletion, or addition of one or more bases (e.g., a substitution, deletion, or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 bases) as compared to the sequence set forth in any of SEQ ID nos. 8-14.
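Purely as an illustrative sketch, the "substitution, deletion, or addition of 1 to 10 bases" criterion can be approximated by the Levenshtein edit distance, i.e., the minimum number of single-base substitutions, deletions, and insertions needed to convert one sequence into the other. The use of edit distance is an assumption made here for illustration and is not stated in the text; the function names and the short toy sequences in the usage note are hypothetical and are not SEQ ID Nos. 8-14.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-base substitutions,
    deletions, and insertions that turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(min(
                previous[j] + 1,         # delete base_a
                current[j - 1] + 1,      # insert base_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def within_edit_budget(reference: str, variant: str, max_edits: int = 10) -> bool:
    """True if the variant differs from the reference by 1..max_edits base edits."""
    return 1 <= edit_distance(reference, variant) <= max_edits
```

With the toy sequences "GUUGCA" and "GUAGCA", the distance is 1 (one substitution), so the variant falls within the 1 to 10 base window; an identical sequence has distance 0 and is not counted as a variant by this sketch.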

In some embodiments, the direct repeat sequence is as shown in any one of SEQ ID Nos. 8-14, or as shown in any one of SEQ ID Nos. 16-22.

In the invention, SEQ ID Nos. 8-14 correspond to the prototype direct repeat sequences of Cas-sf4, Cas-sf1, Cas-sf3, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10, respectively; SEQ ID Nos. 16-22 correspond to the mature direct repeat sequences of Cas-sf4, Cas-sf1, Cas-sf3, Cas-sf6, Cas-sf8, Cas-sf9, and Cas-sf10, respectively.

Guide RNA (gRNA)

In another aspect, the present invention provides a gRNA comprising a first segment and a second segment; the first segment is also referred to as the "framework region", "protein-binding segment", "protein-binding sequence", or "direct repeat sequence"; the second segment is also referred to as the "targeting sequence of the targeting nucleic acid", the "targeting segment of the targeting nucleic acid", or the "targeting sequence that targets the target sequence".

The first segment of the gRNA is capable of interacting with a Cas protein of the invention, thereby allowing the Cas protein and the gRNA to form a complex.

The targeting sequence of the targeting nucleic acid or the targeting segment of the targeting nucleic acid of the invention comprises a nucleotide sequence that is complementary to a sequence in the target nucleic acid. In other words, the targeting sequence of the targeting nucleic acid or the targeting segment of the targeting nucleic acid of the present invention interacts in a sequence-specific manner with the target nucleic acid upon hybridization (i.e., base pairing). Thus, the targeting sequence of the targeting nucleic acid or the targeting segment of the targeting nucleic acid may be altered or modified to hybridize to any desired sequence within the target nucleic acid. The nucleic acid is selected from DNA or RNA.

The percent complementarity between the targeting sequence of the targeting nucleic acid or the targeting segment of the targeting nucleic acid and the target sequence of the target nucleic acid can be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%).
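One possible way to compute the percent complementarity described above is sketched below. It assumes that the targeting segment and the target strand are both written 5' to 3' and have equal length, that they pair in antiparallel orientation, and that only Watson-Crick pairs (A with T or U, G with C) count as complementary; wobble pairs are not counted. These conventions, the function name, and the toy sequences in the usage note are assumptions for illustration only.

```python
# Watson-Crick pairs between an RNA or DNA targeting segment and a target strand.
# Assumption: G-U wobble pairs are NOT counted as complementary.
WATSON_CRICK = {
    ("A", "T"), ("A", "U"), ("T", "A"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(targeting_segment: str, target_strand: str) -> float:
    """Percent of positions that base-pair when the two strands, both given
    5'->3', are aligned in antiparallel orientation."""
    if not targeting_segment:
        raise ValueError("targeting segment must not be empty")
    if len(targeting_segment) != len(target_strand):
        raise ValueError("sequences must have equal length in this sketch")
    guide = targeting_segment.upper()
    target_3_to_5 = reversed(target_strand.upper())  # read target 3'->5'
    paired = sum((g, t) in WATSON_CRICK for g, t in zip(guide, target_3_to_5))
    return 100.0 * paired / len(guide)
```

For the toy targeting segment "AUGGC", the DNA strand "GCCAT" is fully complementary (100%), while "GCCAA" leaves one position unpaired (80%), which would still satisfy the "at least 60%" floor recited above.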

The "framework region", "protein-binding segment", "protein-binding sequence", or "direct repeat" of a gRNA of the invention can interact with a CRISPR protein (or, Cas protein). The gRNA of the invention directs its interacting Cas protein to a specific nucleotide sequence within a target nucleic acid through the action of a targeting sequence of the targeting nucleic acid.

Preferably, the guide RNA comprises a first segment and a second segment in the 5 'to 3' direction.

In the context of the present invention, the second segment is also understood to be a guide sequence which hybridizes to the target sequence.

The gRNAs of the invention are capable of forming a complex with the Cas protein.
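The preferred 5' to 3' arrangement of the two gRNA segments described in this section can be sketched as a small data structure. The sketch only concatenates a direct repeat (first segment) with a guide sequence (second segment) and converts DNA-style input to RNA letters; the class and function names are hypothetical, and the sequences in the usage note are arbitrary placeholders rather than any of the SEQ ID Nos. of the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideRNA:
    direct_repeat: str  # first segment: protein-binding direct repeat
    guide: str          # second segment: hybridizes to the target sequence

    @property
    def sequence(self) -> str:
        """Full gRNA written 5'->3': first segment followed by second segment."""
        return self.direct_repeat + self.guide


def to_rna(sequence: str) -> str:
    """Upper-case the sequence and write T as U."""
    return sequence.upper().replace("T", "U")


def build_grna(direct_repeat: str, guide: str) -> GuideRNA:
    """Assemble a gRNA from its two segments after basic validation."""
    dr, g = to_rna(direct_repeat), to_rna(guide)
    for name, seq in (("direct repeat", dr), ("guide", g)):
        if not seq or set(seq) - set("ACGU"):
            raise ValueError(f"{name} must be a non-empty A/C/G/U sequence")
    return GuideRNA(direct_repeat=dr, guide=g)
```

For example, build_grna("aattgc", "GATTACA").sequence yields "AAUUGCGAUUACA", with the placeholder direct repeat at the 5' end and the placeholder guide at the 3' end.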

Vector

The present invention also provides a vector comprising a Cas protein, an isolated nucleic acid molecule or a polynucleotide as described above; preferably, it further comprises a regulatory element operably linked thereto.

In one embodiment, the regulatory element is selected from one or more of the group consisting of: enhancers, transposons, promoters, terminators, leader sequences, polyadenylation sequences, marker genes.

In one embodiment, the vector comprises a cloning vector, an expression vector, a shuttle vector, an integration vector.

In some embodiments, the vectors included in the system are viral vectors (e.g., retroviral vectors, lentiviral vectors, adenoviral vectors, adeno-associated viral vectors and herpes simplex viral vectors), and may also be plasmids, viruses, cosmids, phages, and the like, all of which are well known to those skilled in the art.

Vector system

The present invention provides an engineered non-naturally occurring vector system, or CRISPR-Cas system, comprising a Cas protein or a nucleic acid sequence encoding said Cas protein and nucleic acid encoding one or more guide RNAs.

In one embodiment, the nucleic acid sequence encoding the Cas protein and the nucleic acid encoding the one or more guide RNAs are artificially synthesized.

In one embodiment, the nucleic acid sequence encoding the Cas protein and the nucleic acid encoding the one or more guide RNAs do not occur naturally together.

The one or more guide RNAs target one or more target sequences in the cell. The one or more guide RNAs hybridize to the genomic locus of the DNA molecule encoding the one or more gene products and direct the Cas protein to that genomic locus; upon reaching the target sequence, the Cas protein modifies, edits, or cleaves it, whereby expression of the one or more gene products is altered or modified.

The cells of the invention include cells of one or more of animals, plants, or microorganisms.

In some embodiments, the Cas protein is codon optimized for expression in a cell.

In some embodiments, the Cas protein directs cleavage of one or both strands at the target sequence position.

The present invention also provides an engineered non-naturally occurring vector system, which may include one or more vectors, the one or more vectors including:

a) a first regulatory element operably linked to the gRNA,

b) a second regulatory element operably linked to the Cas protein;

wherein components (a) and (b) are located on the same or different vectors of the system.

The first and second regulatory elements include promoters (e.g., constitutive promoters or inducible promoters), enhancers (e.g., 35S promoter or 35S enhanced promoter), Internal Ribosome Entry Sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences).

In some embodiments, the vector in the system is a viral vector (e.g., a retroviral vector, lentiviral vector, adenoviral vector, adeno-associated viral vector or herpes simplex viral vector), and may also be a plasmid, virus, cosmid, phage, and the like, all of which are well known to those skilled in the art.

In some embodiments, the systems provided herein are in a delivery system. In some embodiments, the delivery system is a nanoparticle, a liposome, an exosome, a microvesicle, or a gene gun.

In one embodiment, when the target sequence is DNA, the target sequence is located 3' of the Protospacer Adjacent Motif (PAM) and the PAM has a sequence represented by TTN, where N is selected from A, G, T, C.
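The PAM relationship described above can be illustrated with a minimal scan for candidate target sites. The sketch assumes that the PAM is read 5'-TTN-3' on the strand supplied, that the candidate target begins immediately 3' of the PAM, and that the target has a fixed length; the default length of 20 nt and the function name are hypothetical, and only the supplied strand is scanned (the opposite strand would have to be passed in as its reverse complement).

```python
def find_candidate_targets(dna: str, target_len: int = 20):
    """Return (pam_start, pam, target) tuples for every 5'-TTN-3' PAM that is
    followed by a full-length candidate target on the given strand.

    Assumptions: target_len is a hypothetical illustrative value, and the
    target sequence starts directly after the third PAM base.
    """
    dna = dna.upper()
    hits = []
    last_start = len(dna) - 3 - target_len  # last index where PAM + target fit
    for i in range(last_start + 1):
        pam = dna[i:i + 3]
        if pam[:2] == "TT" and pam[2] in "ACGT":  # N is any of A, G, T, C
            target = dna[i + 3:i + 3 + target_len]
            hits.append((i, pam, target))
    return hits
```

For instance, scanning the toy strand "GGTTACCCCCGGGGG" with target_len=5 finds a single TTA PAM at index 2 followed by the candidate target "CCCCC". Real guide design would additionally consider both strands and off-target matches, which this sketch does not attempt.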

In one embodiment, the target sequence is a DNA or RNA sequence from a prokaryotic or eukaryotic cell. In one embodiment, the target sequence is a non-naturally occurring DNA or RNA sequence.

In one embodiment, the target sequence is present within a cell. In one embodiment, the target sequence is present within the nucleus or within the cytoplasm (e.g., organelle). In one embodiment, the cell is a eukaryotic cell. In other embodiments, the cell is a prokaryotic cell.

In one embodiment, the Cas protein has one or more NLS sequences attached thereto. In one embodiment, the fusion protein comprises one or more NLS sequences. In one embodiment, the NLS sequence is linked to the N-terminus or C-terminus of the protein. In one embodiment, the NLS sequence is fused to the N-terminus or C-terminus of the protein.

In another aspect, the invention relates to an engineered CRISPR system comprising a Cas protein as described above and one or more guide RNAs, wherein the guide RNA comprises a direct repeat and a spacer sequence capable of hybridizing to a target nucleic acid, the Cas protein being capable of binding to the guide RNA and targeting a target nucleic acid sequence complementary to the spacer sequence.

Protein-nucleic acid complexes/compositions

In another aspect, the present invention provides a complex or composition comprising:

(i) a protein component selected from: the above Cas protein, derivatized protein, or fusion protein, and any combination thereof; and

(ii) a nucleic acid component comprising (a) a guide sequence capable of hybridizing to a target sequence; and (b) a direct repeat sequence capable of binding to a Cas protein of the present invention.

The protein component and the nucleic acid component are combined with each other to form a complex.

In one embodiment, the nucleic acid component is a guide RNA in a CRISPR-Cas system.

In one embodiment, the complex or composition is non-naturally occurring or modified. In one embodiment, at least one component of the complex or composition is non-naturally occurring or modified. In one embodiment, the first component is non-naturally occurring or modified; and/or, the second component is non-naturally occurring or modified.

Activated CRISPR complexes

In another aspect, the present invention also provides an activated CRISPR complex comprising: (1) a protein component selected from: a Cas protein, a derivatized protein, or a fusion protein of the invention, and any combination thereof; (2) a gRNA comprising (a) a guide sequence capable of hybridizing to a target sequence; and (b) a direct repeat sequence capable of binding to a Cas protein of the present invention; and (3) a target sequence that binds to the gRNA. Preferably, the binding is via a targeting sequence of a targeting nucleic acid on the gRNA to the target nucleic acid.

The terms "activated CRISPR complex", "activation complex" or "ternary complex" as used herein refer to a complex of a Cas protein, a gRNA, and a target nucleic acid in a CRISPR system after binding or modification.

The Cas protein and gRNA of the invention can form a binary complex that is activated upon binding to a nucleic acid substrate to form an activated CRISPR complex. The nucleic acid substrate is complementary to a spacer sequence in the gRNA (alternatively referred to as a guide sequence that hybridizes to the target nucleic acid). In some embodiments, the spacer sequence of the gRNA is perfectly matched to the target substrate. In other embodiments, the spacer sequence of the gRNA matches a portion (continuous or discontinuous) of the target substrate.

In a preferred embodiment, the activated CRISPR complex may exhibit a collateral nuclease activity, which refers to the non-specific or random cleavage activity of the activated CRISPR complex on single-stranded nucleic acids, also referred to in the art as trans cleavage activity.

Delivery and delivery compositions

The Cas proteins, gRNAs, fusion proteins, nucleic acid molecules, vectors, systems, complexes, and compositions of the invention can be delivered by any method known in the art. Such methods include, but are not limited to, electroporation, lipofection, nucleofection, microinjection, sonoporation, gene gun, calcium phosphate-mediated transfection, cationic transfection, dendrimer transfection, heat shock transfection, magnetofection, impalefection, optical transfection, agent-enhanced nucleic acid uptake, and delivery via liposomes, immunoliposomes, viral particles, artificial virosomes, and the like.

Thus, in another aspect, the present invention provides a delivery composition comprising a delivery vehicle and any one or more of the following: the Cas protein, fusion protein, nucleic acid molecule, vector, system, complex and composition of the present invention.

In one embodiment, the delivery vehicle is a particle.

In one embodiment, the delivery vehicle is selected from a lipid particle, a sugar particle, a metal particle, a protein particle, a liposome, an exosome, a microvesicle, a gene gun, or a viral vector (e.g., a replication-defective retrovirus, lentivirus, adenovirus, or adeno-associated virus).

Host cell

The invention also relates to an in vitro, ex vivo or in vivo cell or cell line, or progeny thereof, comprising: the Cas proteins, fusion proteins, nucleic acid molecules, protein-nucleic acid complexes, activated CRISPR complexes, vectors, and delivery compositions of the invention described herein.

In certain embodiments, the cell is a prokaryotic cell.

In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is a non-human mammalian cell, e.g., a cell of a non-human primate, bovine, ovine, porcine, canine, monkey, rabbit, or rodent (e.g., rat or mouse). In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of poultry (e.g., chicken), fish, or shellfish (e.g., clam, shrimp). In certain embodiments, the cell is a plant cell, e.g., a cell of a monocot or dicot, or a cell of a cultivated plant or food crop such as cassava, corn, sorghum, soybean, wheat, oat, or rice, or a cell of an alga, a tree, a production plant, a fruit, or a vegetable (e.g., trees such as citrus trees and nut trees; nightshade plants, cotton, tobacco, tomato, grape, coffee, cocoa, etc.).

In certain embodiments, the cell is a stem cell or stem cell line.

In certain instances, a host cell of the invention comprises a modification of a gene or genome that is not present in its wild type.

Gene editing method and application

The Cas protein, the nucleic acid, the composition, the CRISPR/Cas system, the vector system, the delivery composition, the activated CRISPR complex, or the host cell described above may be used for any one or more of the following: targeting and/or editing a target nucleic acid; cleaving double-stranded DNA, single-stranded DNA, or single-stranded RNA; non-specifically cleaving and/or degrading collateral nucleic acids; non-specifically cleaving single-stranded nucleic acids; detecting nucleic acids; detecting nucleic acids in a target sample; specifically editing double-stranded nucleic acids; base-editing double-stranded nucleic acids; base-editing single-stranded nucleic acids. In other embodiments, they may also be used to prepare reagents or kits for any one or more of the uses described above.

The invention also provides the use of the above Cas protein, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition or activated CRISPR complex in gene editing, gene targeting or gene cleavage; or in the manufacture of a reagent or kit for gene editing, gene targeting or gene cleavage.

In one embodiment, the gene editing, gene targeting or gene cleavage is gene editing, gene targeting or gene cleavage inside and/or outside a cell.

The present invention also provides a method of editing, targeting or cleaving a target nucleic acid, comprising contacting the target nucleic acid with the above-described Cas protein, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition or activated CRISPR complex. In one embodiment, the method edits, targets or cleaves a target nucleic acid intracellularly and/or extracellularly.

Gene editing, or editing of a target nucleic acid, includes modifying a gene, knocking out a gene, altering expression of a gene product, repairing a mutation, inserting a polynucleotide, and/or mutating a gene.

The editing can be performed in prokaryotic cells and/or eukaryotic cells.

In another aspect, the invention also provides the use of the above Cas protein, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition or activated CRISPR complex in nucleic acid detection, or in the preparation of a reagent or kit for nucleic acid detection.

In another aspect, the invention also provides a method of cleaving single-stranded nucleic acids, the method comprising contacting a nucleic acid population with the Cas protein and the gRNA described above, wherein the nucleic acid population comprises a target nucleic acid and a plurality of non-target single-stranded nucleic acids, and the Cas protein cleaves the plurality of non-target single-stranded nucleic acids.

The gRNA is capable of binding the Cas protein.

The gRNA is capable of targeting the target nucleic acid.

The contacting may be in vitro, ex vivo, or inside a cell in vivo.

Preferably, the cleavage of the single-stranded nucleic acids is non-specific.

In another aspect, the invention also provides the use of the above Cas protein, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition, or activated CRISPR complex for non-specific cleavage of single-stranded nucleic acids, or for the preparation of a reagent or kit for non-specific cleavage of single-stranded nucleic acids.

In another aspect, the invention also provides a kit for gene editing, gene targeting or gene cleavage, comprising the above Cas protein, gRNA, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition, activated CRISPR complex, or host cell.

In another aspect, the present invention also provides a kit for detecting a target nucleic acid in a sample, the kit comprising: (a) a Cas protein, or a nucleic acid encoding the Cas protein; (b) a guide RNA, or a nucleic acid encoding the guide RNA, or a precursor RNA comprising the guide RNA, or a nucleic acid encoding the precursor RNA; and (c) a single-stranded nucleic acid detector that is single-stranded and does not hybridize to the guide RNA.

It is known in the art that precursor RNAs can be cleaved or processed into mature guide RNAs as described above.

In another aspect, the invention provides the use of the above Cas protein, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition, activated CRISPR complex, or host cell in the preparation of a formulation or kit for:

(i) gene or genome editing;

(ii) target nucleic acid detection and/or diagnosis;

(iii) editing a target sequence in a target locus to modify an organism or non-human organism;

(iv) treatment of diseases;

(v) targeting a target gene;

(vi) cleaving a target gene.

Preferably, the gene or genome editing is carried out intracellularly or extracellularly.

Preferably, the target nucleic acid detection and/or diagnosis is in vitro.

Preferably, the treatment of the disease is the treatment of a condition caused by a defect in the target sequence in the target locus.

In another aspect, the invention provides a method of detecting a target nucleic acid in a sample, the method comprising contacting the sample with the Cas protein, a gRNA (guide RNA) comprising a region that binds to the Cas protein and a guide sequence that hybridizes to the target nucleic acid, and a single-stranded nucleic acid detector; and detecting a detectable signal generated by cleavage of the single-stranded nucleic acid detector by the Cas protein, thereby detecting the target nucleic acid; wherein the single-stranded nucleic acid detector does not hybridize to the gRNA.

Method for specifically modifying target nucleic acid

In another aspect, the present invention also provides a method of specifically modifying a target nucleic acid, the method comprising: contacting the target nucleic acid with the Cas protein, the nucleic acid, the composition, the CRISPR/Cas system, the vector system, the delivery composition, or the activated CRISPR complex described above.

The specific modification may occur in vivo or in vitro.

The specific modification may occur intracellularly or extracellularly.

In some cases, the cell is selected from a prokaryotic cell or a eukaryotic cell, e.g., an animal cell, a plant cell, or a microbial cell.

In one embodiment, the modification refers to a break in the target sequence, e.g., a single/double strand break in DNA, or a single strand break in RNA.

In some cases, the method further comprises contacting the target nucleic acid with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of the copy of the donor polynucleotide is integrated into the target nucleic acid.

In one embodiment, the modification further comprises inserting an editing template (e.g., an exogenous nucleic acid) into the break.

In one embodiment, the method further comprises: contacting the editing template with the target nucleic acid, or delivering into a cell comprising the target nucleic acid. In this embodiment, the method repairs the disrupted target gene by homologous recombination with an exogenous template polynucleotide; in some embodiments, the repair results in a mutation, including an insertion, deletion, or substitution of one or more nucleotides of the target gene, and in other embodiments, the mutation results in one or more amino acid changes in a protein expressed from a gene comprising the target sequence.

Detection (non-specific cleavage)

In another aspect, the invention provides a method of detecting a target nucleic acid in a sample, the method comprising contacting the sample with the above-described Cas protein, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition, or activated CRISPR complex, and with a single-stranded nucleic acid detector; and detecting a detectable signal generated by cleavage of the single-stranded nucleic acid detector by the Cas protein, thereby detecting the target nucleic acid.

In the present invention, the target nucleic acid comprises a ribonucleotide or a deoxyribonucleotide; including single-stranded nucleic acids, double-stranded nucleic acids, e.g., single-stranded DNA, double-stranded DNA, single-stranded RNA, double-stranded RNA. In one embodiment, the target nucleic acid is derived from a sample of a virus, bacterium, microorganism, soil, water source, human, animal, plant, or the like.

Preferably, the target nucleic acid is a product enriched or amplified by PCR, NASBA, RPA, SDA, LAMP, HDA, NEAR, MDA, RCA, LCR, RAM, or the like.

In one embodiment, the target nucleic acid is a viral nucleic acid, a bacterial nucleic acid, or a specific nucleic acid associated with a disease, such as a specific mutation site or SNP site or a nucleic acid that differs from a control; preferably, the virus is a plant virus or an animal virus, e.g., papillomavirus, hepadnavirus, herpesvirus, adenovirus, poxvirus, parvovirus, coronavirus; preferably, the virus is a coronavirus, preferably SARS, SARS-CoV-2 (COVID-19), HCoV-229E, HCoV-OC43, HCoV-NL63, HCoV-HKU1, or MERS-CoV.

In some embodiments, the target nucleic acid is derived from a cell, e.g., from a cell lysate.

In one embodiment, the target nucleic acid comprises DNA or RNA, preferably a single-stranded nucleic acid, a double-stranded nucleic acid, or a modified nucleic acid.

In the present invention, the gRNA has at least 50% match to a target sequence on a target nucleic acid, preferably at least 60%, preferably at least 70%, preferably at least 80%, preferably at least 90%.

In one embodiment, when the target sequence contains one or more characteristic sites (e.g., a particular mutation site or SNP), the characteristic site is a perfect match to the gRNA.
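As a minimal sketch of the two conditions above (an overall match of at least 50%, with every characteristic site matched perfectly), the following Python compares the sequence a gRNA was designed against with the sequence actually present in the sample. The function names, the position-by-position comparison of equal-length sequences, and the 0-based `feature_positions` argument are illustrative assumptions and are not taken from the invention.

```python
def match_fraction(designed: str, observed: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if not designed or len(designed) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    d, o = designed.upper(), observed.upper()
    return sum(a == b for a, b in zip(d, o)) / len(d)


def guide_acceptable(designed: str, observed: str,
                     feature_positions=(), min_match: float = 0.5) -> bool:
    """True if the overall match reaches min_match AND every characteristic
    site (e.g. an SNP position, 0-based; hypothetical parameter) matches."""
    d, o = designed.upper(), observed.upper()
    if any(d[i] != o[i] for i in feature_positions):
        return False  # a characteristic site must be a perfect match
    return match_fraction(d, o) >= min_match


if __name__ == "__main__":
    designed = "ACGTACGTAC"
    observed = "ACGTTCGTAC"  # single mismatch at index 4
    print(match_fraction(designed, observed))
    print(guide_acceptable(designed, observed, feature_positions=(4,)))
```

In this sketch a single mismatch is tolerated when it falls outside a characteristic site, but rejects the guide when it falls on one.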

In one embodiment, one or more gRNAs with guide sequences different from each other can be included in the detection method, targeting different target sequences.

In the present invention, the single-stranded nucleic acid detector includes, but is not limited to, a single-stranded DNA, a single-stranded RNA, a DNA-RNA hybrid, a nucleic acid analog, a base-modified nucleic acid, a single-stranded nucleic acid detector containing an abasic spacer, and the like; "nucleic acid analogs" include, but are not limited to: locked nucleic acids, bridged nucleic acids, morpholino nucleic acids, glycol nucleic acids, hexitol nucleic acids, threose nucleic acids, arabinose nucleic acids, 2'-O-methyl RNA, 2'-methoxyacetyl RNA, 2'-fluoro RNA, 2'-amino RNA, 4'-thio RNA, and combinations thereof, including optional ribonucleotide or deoxyribonucleotide residues.

In the present invention, the detectable signal is realized by: vision-based detection, sensor-based detection, color detection, fluorescence signal-based detection, gold nanoparticle-based detection, fluorescence polarization, fluorescence detection, colloidal phase transition/dispersion, electrochemical detection, and semiconductor-based detection.

In the present invention, it is preferable that a fluorescent group and a quenching group are disposed at the two ends of the single-stranded nucleic acid detector, respectively, so that a detectable fluorescent signal is produced when the single-stranded nucleic acid detector is cleaved. The fluorescent group is selected from one or more of FAM, FITC, VIC, JOE, TET, CY3, CY5, ROX, Texas Red or LC RED 460; the quenching group is selected from one or more of BHQ1, BHQ2, BHQ3, Dabcyl or TAMRA.

In other embodiments, different labeling molecules are disposed at the 5' end and the 3' end of the single-stranded nucleic acid detector, respectively, and colloidal gold detection is used to read out the result before and after the single-stranded nucleic acid detector is cleaved by the Cas protein; the single-stranded nucleic acid detector gives different color development results on the detection line and the quality control line of the colloidal gold test before and after being cleaved by the Cas protein.

In some embodiments, the method of detecting a target nucleic acid can further comprise comparing the level of the detectable signal to a reference signal level, and determining the amount of the target nucleic acid in the sample based on the level of the detectable signal.

In some embodiments, the method of detecting a target nucleic acid can further comprise using an RNA reporter nucleic acid and a DNA reporter nucleic acid on different channels (e.g., different fluorescent colors), determining the level of detectable signal by measuring the signal levels of the RNA and DNA reporter molecules, and determining the amount of target nucleic acid in the sample based on combining (e.g., using the minimum or the product of) the levels of detectable signal.
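The combination of two reporter channels and the comparison with a reference level described above can be illustrated with a short Python sketch. The function names, the `mode` argument, and the simple greater-than-or-equal comparison against the reference are assumptions made for illustration; the invention does not prescribe a particular calculation.

```python
def combined_signal(rna_level: float, dna_level: float, mode: str = "min") -> float:
    """Combine the signal levels of an RNA reporter and a DNA reporter
    measured on separate channels, using the minimum or the product."""
    if mode == "min":
        return min(rna_level, dna_level)
    if mode == "product":
        return rna_level * dna_level
    raise ValueError("mode must be 'min' or 'product'")


def exceeds_reference(signal: float, reference: float) -> bool:
    """Hypothetical call rule: the sample is scored positive when the
    combined signal reaches the reference signal level."""
    return signal >= reference


if __name__ == "__main__":
    level = combined_signal(8, 5, mode="min")
    print(level, exceeds_reference(level, reference=4))
```

Taking the minimum requires both channels to respond before a sample is scored, whereas the product rewards agreement between channels more gradually; which is preferable depends on the assay.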

In one embodiment, the target nucleic acid is present within a cell.

In one embodiment, the cell is a prokaryotic cell.

In one embodiment, the cell is a eukaryotic cell.

In one embodiment, the cell is an animal cell.

In one embodiment, the cell is a human cell.

In one embodiment, the cell is a plant cell, such as a cell of a cultivated plant (e.g., cassava, corn, sorghum, wheat, or rice), an alga, a tree, or a vegetable.

In one embodiment, the target gene is present in a nucleic acid molecule (e.g., a plasmid) in vitro.

In one embodiment, the target gene is present in a plasmid.

Definition of terms

In the present invention, unless otherwise specified, scientific and technical terms used herein have the meanings that are commonly understood by those skilled in the art. Also, the procedures of molecular genetics, nucleic acid chemistry, molecular biology, biochemistry, cell culture, microbiology, cell biology, genomics, and recombinant DNA, etc., used herein, are all conventional procedures widely used in the corresponding field. Meanwhile, in order to better understand the present invention, the definitions and explanations of related terms are provided below.

Cas protein

In the present invention, Cas protein, Cas enzyme, Cas effector protein may be used interchangeably; the present inventors have for the first time discovered and identified a Cas effector protein having an amino acid sequence selected from the group consisting of:

(i) a sequence as shown in any one of SEQ ID Nos. 1 to 7;

(ii) a sequence having one or more amino acid substitutions, deletions or additions (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions) compared to the sequence shown in any one of SEQ ID Nos. 1 to 7; or

(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the sequence shown in any one of SEQ ID Nos. 1 to 7.

Nucleic acid cleavage, or cleaving a nucleic acid, as used herein includes a DNA or RNA break in a target nucleic acid (cis cleavage) and a DNA or RNA break in a collateral nucleic acid substrate (a single-stranded nucleic acid substrate; i.e., non-specific or non-targeted trans cleavage) produced by a Cas enzyme as described herein. In some embodiments, the cleavage is a double-stranded DNA break. In some embodiments, the cleavage is a single-stranded DNA break or a single-stranded RNA break.

CRISPR system

As used herein, the terms "clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system" and "CRISPR system" are used interchangeably and have the meaning generally understood by those skilled in the art; such a system generally comprises transcripts or other elements involved in the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of said Cas genes.

CRISPR/Cas complexes

As used herein, the term "CRISPR/Cas complex" refers to a complex formed by the binding of a guide RNA or mature crRNA to a Cas protein, wherein the guide RNA or mature crRNA comprises a guide sequence that hybridizes to the target sequence and a direct repeat that binds to the Cas protein; the complex is capable of recognizing and cleaving a polynucleotide that can hybridize to the guide RNA or mature crRNA.

Guide RNA (guideRNA, gRNA)

As used herein, the terms "guide RNA", "gRNA", "mature crRNA", "guide sequence" are used interchangeably and have the meaning commonly understood by those skilled in the art. In general, the guide RNA may comprise, consist essentially of, or consist of a direct repeat (direct repeat) and a guide sequence.

In certain instances, the guide sequence is any polynucleotide sequence that is sufficiently complementary to the target sequence to hybridize to the target sequence and direct specific binding of the CRISPR/Cas complex to the target sequence. In one embodiment, the degree of complementarity between a guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%, when optimally aligned. Determining the optimal alignment is within the ability of one of ordinary skill in the art. For example, there are published and commercially available alignment algorithms and programs such as, but not limited to, ClustalW, the Smith-Waterman algorithm in MATLAB, Bowtie, Geneious, Biopython, and SeqMan.

Target sequence

By "target sequence" is meant a polynucleotide that is targeted by a guide sequence in the gRNA, e.g., a sequence that is complementary to the guide sequence, wherein hybridization between the target sequence and the guide sequence will promote formation of a CRISPR/Cas complex (including Cas protein and gRNA). Complete complementarity is not necessary as long as there is sufficient complementarity to cause hybridization and promote formation of a CRISPR/Cas complex.

The target sequence may comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located intracellularly or extracellularly. In some cases, the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located within an organelle of the eukaryotic cell, such as a mitochondrion or chloroplast. Sequences or templates that can be used for recombination into a target locus containing the target sequence are referred to as "editing templates" or "editing polynucleotides" or "editing sequences". In one embodiment, the editing template is an exogenous nucleic acid. In one embodiment, the recombination is homologous recombination.

In the present invention, a "target sequence" or "target polynucleotide" or "target nucleic acid" can be any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, the target polynucleotide may be a polynucleotide present in the nucleus of a eukaryotic cell. The target polynucleotide may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, the target sequence should be associated with a protospacer adjacent motif (PAM).

Single-stranded nucleic acid detector

The single-stranded nucleic acid detector of the present invention refers to a sequence containing 2 to 200 nucleotides, preferably 2 to 150 nucleotides, preferably 3 to 100 nucleotides, preferably 3 to 30 nucleotides, preferably 4 to 20 nucleotides, and more preferably 5 to 15 nucleotides. It is preferably a single-stranded DNA molecule, a single-stranded RNA molecule or a single-stranded DNA-RNA hybrid.

The single-stranded nucleic acid detector comprises different reporter groups or marker molecules at both ends, and does not present a reporter signal when in an initial state (i.e., an uncleaved state), and presents a detectable signal when the single-stranded nucleic acid detector is cleaved, i.e., presents a detectable difference after cleavage from before cleavage.

In one embodiment, the reporter group or the marker molecule comprises a fluorescent group and a quenching group, wherein the fluorescent group is selected from one or more of FAM, FITC, VIC, JOE, TET, CY3, CY5, ROX, Texas Red or LC RED 460; the quenching group is selected from one or more of BHQ1, BHQ2, BHQ3, Dabcyl or TAMRA.

In one embodiment, the single-stranded nucleic acid detector has a first molecule (e.g., FAM or FITC) attached to the 5' end and a second molecule (e.g., biotin) attached to the 3' end. The reaction system containing the single-stranded nucleic acid detector is used in combination with a lateral flow strip to detect the target nucleic acid (preferably by colloidal gold detection). The lateral flow strip is designed with two capture lines: an antibody that binds to the first molecule (i.e., a first-molecule antibody) is located at the sample contact end (colloidal gold), an antibody that binds to the first-molecule antibody is located at the first line (control line), and an antibody that binds to the second molecule (i.e., a second-molecule antibody, such as avidin) is located at the second line (test line). As the reaction flows along the strip, the first-molecule antibody binds to the first molecule and carries the cleaved or uncleaved oligonucleotide to the capture lines; the cleaved reporter binds to the antibody against the first-molecule antibody at the first capture line, and the uncleaved reporter binds to the second-molecule antibody at the second capture line. Binding of the reporter group at each line produces a strong readout/signal (e.g., color). As more reporters are cleaved, more signal accumulates at the first capture line and less signal appears at the second line. In certain aspects, the invention relates to the use of a lateral flow strip as described herein for detecting nucleic acids. In certain aspects, the invention relates to a method of detecting nucleic acids using a lateral flow strip as defined herein, e.g., a lateral flow test or a lateral flow immunochromatographic assay.

In some aspects, the molecules in the single-stranded nucleic acid detector may be interchanged, or their positions may be changed; such modified forms are also included in the present invention as long as the reporting principle is the same as or similar to that of the present invention.

The detection method of the present invention can be used for quantitative detection of a target nucleic acid. The quantitative index can be derived from the signal intensity of the reporter group, such as the luminous intensity of a fluorescent group or the width of a color development band.

Wild type

As used herein, the term "wild-type" has the meaning commonly understood by those skilled in the art: it denotes the typical form of an organism, strain, gene, or characteristic as it occurs in nature, as distinguished from a mutant or variant form; it can be isolated from a natural source and has not been intentionally modified by man.

Derivatization

As used herein, the term "derivatize" refers to a chemical modification of an amino acid, polypeptide, or protein to which one or more substituents have been covalently attached. The substituents may also be referred to as side chains.

The derivatized protein is a derivative of the protein, and generally, derivatization of the protein does not adversely affect the desired activity of the protein (e.g., activity in binding to a guide RNA, endonuclease activity, activity in binding to a specific site of a target sequence under the guidance of a guide RNA and cleavage), i.e., the derivative of the protein has the same activity as the protein.

Derivatized proteins

Derivatized proteins, also referred to as "protein derivatives", are modified forms of proteins in which, for example, one or more amino acids of the protein may be deleted, inserted, modified and/or substituted.

Non-naturally occurring

As used herein, the terms "non-naturally occurring" and "engineered" are used interchangeably and indicate human intervention. When these terms are used to describe a nucleic acid molecule or polypeptide, they mean that the nucleic acid molecule or polypeptide is at least substantially free from at least one other component with which it is naturally associated or with which it is found in nature.

Ortholog (orthologue)

As used herein, the term "ortholog" has the meaning commonly understood by those skilled in the art. By way of further guidance, an "ortholog" of a protein as described herein refers to a protein belonging to a different species that performs the same or similar function as the protein being its ortholog.

Identity

As used herein, the term "identity" refers to the degree of sequence match between two polypeptides or between two nucleic acids. When a position in both of the sequences being compared is occupied by the same base or amino acid monomer subunit (e.g., a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), then the molecules are identical at that position. The "percent identity" between two sequences is the number of matching positions shared by the two sequences divided by the number of positions compared, × 100. For example, if 6 of 10 positions of two sequences match, then the two sequences have 60% identity. For example, the DNA sequences CTGACT and CAGGTT share 50% identity (3 of the total 6 positions match). Typically, the comparison is made when the two sequences are aligned to yield maximum identity. Such an alignment can be performed by using, for example, the method of Needleman et al. (1970) J. Mol. Biol. 48:443-453. The algorithm of E. Meyers and W. Miller (Comput. Appl. Biosci., 4:11-17 (1988)), which has been incorporated into the ALIGN program (version 2.0), can also be used to determine percent identity between two amino acid sequences using a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. Furthermore, percent identity between two amino acid sequences can be determined using the Needleman and Wunsch (J Mol Biol. 48: 444-.
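The position-by-position definition of percent identity given above can be written as a short Python sketch. It assumes the two sequences have already been aligned to equal length (it performs no alignment itself), and the function name `percent_identity` is illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences:
    matching positions / positions compared * 100."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # The worked example from the text: 3 of 6 positions match.
    print(percent_identity("CTGACT", "CAGGTT"))  # 50.0
```

For sequences of unequal length or with insertions and deletions, an alignment step (for example with one of the algorithms cited above) would be needed before this count is meaningful.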

Vector

The term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid molecule to which it is linked. Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules comprising one or more free ends, or no free ends (e.g., circular); nucleic acid molecules comprising DNA, RNA, or both; and other various polynucleotides known in the art. The vector may be introduced into a host cell by transformation, transduction, or transfection, so that the genetic elements it carries are expressed in the host cell. A vector can be introduced into a host cell to thereby produce a transcript, protein, or peptide, including the proteins, fusion proteins, isolated nucleic acid molecules, etc. described herein (e.g., a CRISPR transcript, such as a nucleic acid transcript, protein, or enzyme). A vector may contain a variety of elements that control expression, including, but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. In addition, the vector may contain an origin of replication.

One type of vector is a "plasmid," which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, for example, by standard molecular cloning techniques.

Another type of vector is a viral vector, in which the virus-derived DNA or RNA sequences are present in a vector for packaging of viruses (e.g., retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses). Viral vectors also comprise polynucleotides carried by viruses for transfection into a host cell. Certain vectors (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors) are capable of autonomous replication in a host cell into which they are introduced.

Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as "expression vectors".

Host cell

As used herein, the term "host cell" refers to a cell into which a vector can be introduced, and includes, but is not limited to, prokaryotic cells such as Escherichia coli or Bacillus subtilis, and eukaryotic cells such as microbial cells, fungal cells, animal cells, and plant cells.

One skilled in the art will appreciate that the design of an expression vector may depend on factors such as the choice of host cell to be transformed, the level of expression desired, and the like.

Regulatory element

As used herein, the term "regulatory element" is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences), which are described in detail in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). In some cases, regulatory elements include those sequences that direct constitutive expression of a nucleotide sequence in many types of host cells as well as those sequences that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters may primarily direct expression in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, a particular organ (e.g., liver, pancreas), or a particular cell type (e.g., lymphocyte). In certain instances, the regulatory element may also direct expression in a time-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue or cell type specific. In certain instances, the term "regulatory element" encompasses enhancer elements, such as WPRE; the CMV enhancer; the R-U5' fragment in the LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988); the SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA, Vol. 78(3), pp. 1527-31, 1981).

Promoters

As used herein, the term "promoter" has a meaning well known to those skilled in the art and refers to a non-coding nucleotide sequence located upstream of a gene that promotes expression of the downstream gene. A constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide that encodes or defines a gene product, results in the production of the gene product in the cell under most or all physiological conditions of the cell. An inducible promoter is a nucleotide sequence that, when operably linked to a polynucleotide that encodes or defines a gene product, causes the gene product to be produced intracellularly substantially only when an inducer corresponding to the promoter is present in the cell. A tissue-specific promoter is a nucleotide sequence that, when operably linked to a polynucleotide that encodes or defines a gene product, results in the production of the gene product in the cell substantially only when the cell is of the tissue type to which the promoter corresponds.

NLS

A "nuclear localization signal" or "nuclear localization sequence" (NLS) is an amino acid sequence that "tags" a protein for import into the nucleus by nuclear transport, i.e., a protein with an NLS is transported to the nucleus. Typically, an NLS contains positively charged Lys or Arg residues exposed at the surface of the protein. Exemplary nuclear localization sequences include, but are not limited to, NLSs from: SV40 large T antigen, EGL-13, c-Myc and TUS protein. In some embodiments, the NLS comprises a PKKKRKV sequence. In some embodiments, the NLS comprises an AVKRPAATKKAGQAKKKKLD sequence. In some embodiments, the NLS comprises a PAAKRVKLD sequence. In some embodiments, the NLS comprises an MSRRRKANPTKLSENAKKLAKEVEN sequence. In some embodiments, the NLS comprises a KLKIKRPVK sequence. Other nuclear localization sequences include, but are not limited to, the acidic M9 domain of hnRNP A1, the sequence KIPIK in the yeast transcriptional repressor Matα2, and PY-NLS.
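As an illustration of how the exemplary NLS sequences listed above might be located in a protein or fusion protein sequence, the following Python sketch performs a plain substring search. The function name `find_nls` and the example fusion sequence are hypothetical; exact string matching is a simplifying assumption and does not predict whether a motif is functional or surface-exposed.

```python
# Exemplary NLS sequences quoted in the text above.
NLS_MOTIFS = (
    "PKKKRKV",
    "AVKRPAATKKAGQAKKKKLD",
    "PAAKRVKLD",
    "MSRRRKANPTKLSENAKKLAKEVEN",
    "KLKIKRPVK",
)


def find_nls(protein: str) -> dict:
    """Return {motif: [0-based start positions]} for each listed NLS
    motif that occurs in the given amino acid sequence."""
    seq = protein.upper()
    hits = {}
    for motif in NLS_MOTIFS:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[motif] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical fusion: a short peptide with an SV40-type NLS inside.
    print(find_nls("MAPKKKRKVGGS"))
```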

Operably linked

As used herein, the term "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).

Complementarity

As used herein, the term "complementarity" refers to the ability of a nucleic acid to form one or more hydrogen bonds with another nucleic acid sequence by means of conventional Watson-Crick base pairing or other non-conventional types. Percent complementarity refers to the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, or 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively). "Completely complementary" means that all consecutive residues of one nucleic acid sequence hydrogen bond with the same number of consecutive residues in a second nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.
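Percent complementarity as defined above can be sketched in Python for two strands of equal length, both written 5' to 3'. Only conventional Watson-Crick pairs are counted here, and the strands are assumed to pair in antiparallel orientation without gaps; the function name and these restrictions are illustrative assumptions.

```python
# Conventional Watson-Crick pairs (DNA and RNA letters).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
                ("G", "C"), ("C", "G")}


def percent_complementarity(strand: str, other: str) -> float:
    """Percentage of residues in `strand` that can pair with `other`
    when the two 5'->3' strands are laid antiparallel, without gaps."""
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse the second strand so that position i faces its partner.
    facing = reversed(other.upper())
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(strand.upper(), facing))
    return 100.0 * paired / len(strand)


if __name__ == "__main__":
    print(percent_complementarity("AAGGCC", "GGCCTT"))  # fully complementary
    print(percent_complementarity("AAGG", "CCAA"))      # half of the residues pair
```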

Stringent conditions

As used herein, "stringent conditions" for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes to the target sequence and does not substantially hybridize to non-target sequences. Stringent conditions are generally sequence dependent and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence.

Hybridization

The terms "hybridize", "complementary" and "substantially complementary" mean that a nucleic acid (e.g., RNA, DNA) comprises a nucleotide sequence that enables it to bind non-covalently to another nucleic acid, i.e., to form Watson-Crick base pairs and/or G/U base pairs with it ("anneal" or "hybridize"), in a sequence-specific, antiparallel manner (i.e., the nucleic acid binds specifically to the complementary nucleic acid).

Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible. Suitable conditions for hybridization between two nucleic acids depend on the length and degree of complementarity of the nucleic acids, variables well known in the art. Typically, the length of the hybridizable nucleic acid is 8 nucleotides or more (e.g., 10 nucleotides or more, 12 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 22 nucleotides or more, 25 nucleotides or more, or 30 nucleotides or more).

It is understood that the sequence of a polynucleotide need not be 100% complementary to the sequence of its target nucleic acid in order to hybridize specifically. A polynucleotide may have 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, or 100% sequence complementarity to the target region within the target nucleic acid to which it hybridizes.

Hybridization of a target sequence to a gRNA means that at least 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of the nucleic acid sequences of the target sequence and the gRNA can hybridize to form a complex, or that at least 12, 15, 16, 17, 18, 19, 20, 21, 22 or more bases of the nucleic acid sequences of the target sequence and the gRNA can pair complementarily and hybridize to form a complex.

Expression

As used herein, the term "expression" refers to the process by which a polynucleotide is transcribed from a DNA template (e.g., into mRNA or other RNA transcript) and/or the process by which transcribed mRNA is subsequently translated into a peptide, polypeptide, or protein. The transcripts and encoded polypeptides may be collectively referred to as "gene products". If the polynucleotide is derived from genomic DNA, expression may include splicing of mRNA in eukaryotic cells.

Linker

As used herein, the term "linker" refers to a linear polypeptide formed from a plurality of amino acid residues joined by peptide bonds. The linker of the present invention may be an artificially synthesized amino acid sequence, or a naturally occurring polypeptide sequence, such as a polypeptide having a hinge region function. Such linker polypeptides are well known in the art (see, e.g., Holliger, P. et al (1993) Proc. Natl. Acad. Sci. USA 90: 6444-.

Treating

As used herein, the term "treating" refers to treating or curing a disorder, delaying the onset of symptoms of a disorder, and/or delaying the development of a disorder.

Subject

As used herein, the term "subject" includes, but is not limited to, various animals, plants, and microorganisms.

Animal

For example, a mammal, such as a bovine, equine, ovine, porcine, canine, feline, lagomorph, rodent (e.g., mouse or rat), non-human primate (e.g., macaque or cynomolgus monkey), or human. In certain embodiments, the subject (e.g., human) has a disorder (e.g., a disorder resulting from a deficiency in a disease-associated gene).

Plant

The term "plant" is to be understood as including any differentiated multicellular organism capable of photosynthesis, including crop plants at any stage of maturity or development, in particular monocotyledonous or dicotyledonous plants; vegetable crops, including artichokes, kohlrabi, arugula, leeks, asparagus, lettuce (e.g. head lettuce, leaf lettuce), bok choy, malanga, melons (e.g. muskmelons, watermelons, crenshaw melon, honeydew melon, cantaloupe), brassica crops (e.g. brussels sprouts, cabbage, cauliflower, broccoli, collards, kale, chinese cabbages), cardoon, carrots, cabbage (napa), okra, onions, celery, chickpea, parsnip, endive, potato, cucurbits (e.g. zucchini, squash, pumpkin), radish, dried onion, turnip cabbage, eggplant (also called brinjal), salsify, endive, shallot, endive, garlic, spinach, green onion, squash, leafy vegetables (greens), beets (sugar and feed beets), sweet potato, lettuce, horseradish, tomato, turnip, and spices; fruit and/or vine crops such as apple, apricot, cherry, nectarine, peach, pear, plum, prune, cherry, quince, almond, chestnut, hazelnut, pecan, pistachio, walnut, citrus, blueberry, boysenberry, raspberry, gooseberry, loganberry, raspberry, strawberry, blackberry, grape, avocado, banana, kiwi, persimmon, pomegranate, pineapple, tropical fruit, pome, melon, mango, papaya, and lychee; field crops, such as clover, alfalfa, evening primrose, meadowfoam, corn/maize (fodder corn, sweet corn, popcorn), hops, jojoba, peanuts, rice, safflower, small grain crops (barley, oats, rye, wheat, etc.), sorghum, tobacco, kapok, legumes (beans, lentils, peas, soybeans), oleaginous plants (oilseed rape, mustard, poppy, olives, sunflowers, coconut, castor oil plants, cocoa beans, groundnuts), Arabidopsis, fibrous plants (cotton, flax, jute), Lauraceae (cinnamon, camphor), or a plant such as coffee, sugar cane, tea, and natural rubber plants; and/or bedding plants, such as flowering plants, cacti, succulents and/or ornamental plants; and trees, such as forest trees (broad-leaved and evergreen trees, such as conifers), fruit trees, ornamental trees, and nut-bearing trees, as well as shrubs and other plantlets.

Advantageous effects of the invention

The invention provides novel Cas enzymes that exhibit nuclease activity both in vivo and in vitro and therefore have broad application prospects.

Embodiments of the present invention will be described in detail below with reference to the drawings and examples, but those skilled in the art will understand that the following drawings and examples are only for illustrating the present invention and do not limit the scope of the present invention. Various objects and advantageous aspects of the present invention will become apparent to those skilled in the art from the accompanying drawings and the following detailed description of the preferred embodiments.

Drawings

FIG. 1. Results of in vitro cleavage activity experiments for different Cas proteins.

FIG. 2. Editing efficiency of Cas-sf4 and Cas-sf1 in protoplasts.

FIG. 3. Types of gene edits produced by Cas-sf4 and Cas-sf1 in protoplasts.

FIG. 4. Fluorescence results for Cas-sf1 used for in vitro nucleic acid detection.

FIG. 5. Fluorescence results for Cas-sf4 used for in vitro nucleic acid detection.

FIG. 6. Fluorescence results for Cas-sf8 used for in vitro nucleic acid detection.

FIG. 7. Fluorescence results for Cas-sf9 used for in vitro nucleic acid detection.

FIG. 8. Fluorescence results for Cas-sf10 used for in vitro nucleic acid detection.

Sequence information

Detailed Description

The following examples are intended only to illustrate the invention and are not intended to limit it. Unless otherwise indicated, the experiments and procedures described in the examples were performed essentially according to conventional methods well known in the art and described in various references. For example, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA used in the present invention can be found in Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd edition (1989); Current Protocols in Molecular Biology (F. M. Ausubel et al., eds. (1987)); the Methods in Enzymology series (Academic Press); PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames and G. R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988) Antibodies: A Laboratory Manual; and Animal Cell Culture (R. I. Freshney, ed. (1987)).

In addition, where specific conditions are not specified in the examples, the procedures were carried out under conventional conditions or under the conditions recommended by the manufacturer. Reagents and instruments for which no manufacturer is indicated are conventional, commercially available products. The examples are given by way of illustration and are not intended to limit the scope of the invention as claimed. All publications and other references mentioned herein are incorporated by reference in their entirety.

Example 1 acquisition of Cas protein

The inventors analyzed uncultured metagenomes and, through redundancy removal and protein clustering analysis, identified novel Cas enzymes, named Cas-sf4, Cas-sf1, Cas-sf3, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10, whose amino acid sequences are shown in SEQ ID No. 1-7, respectively. BLAST results show that these Cas proteins have low sequence identity with previously reported Cas proteins.

Analysis shows that the prototype direct repeat sequences of the gRNAs corresponding to Cas-sf4, Cas-sf1, Cas-sf3, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10 are shown in SEQ ID No. 8-14, respectively, and that the corresponding PAM is TTN (where N can be any base); the corresponding mature direct repeat sequences are shown in SEQ ID No. 16-22, respectively.
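As an illustration of how candidate target sites adjacent to a TTN PAM might be enumerated, the following Python sketch scans one strand of a DNA sequence. It rests on assumptions that are not stated in this example: the PAM is taken to lie immediately 5' of the protospacer (consistent with the 5'-TTN motif discussed in the Background), the protospacer length defaults to 20 nt, only the strand supplied is scanned, and the function name `find_ttn_sites` is hypothetical.

```python
def find_ttn_sites(seq: str, spacer_len: int = 20):
    """Return (pam_start, pam, protospacer) for each 5'-TTN PAM on the given
    strand that is followed by a full-length protospacer.

    pam_start is a 0-based index into seq.
    """
    s = seq.upper()
    sites = []
    # Stop early enough that PAM (3 nt) plus protospacer still fit in s.
    for i in range(len(s) - 3 - spacer_len + 1):
        if s[i:i + 2] == "TT":  # third PAM position (N) may be any base
            pam = s[i:i + 3]
            protospacer = s[i + 3:i + 3 + spacer_len]
            sites.append((i, pam, protospacer))
    return sites


# Toy sequence with a shortened 5-nt protospacer for readability.
for site in find_ttn_sites("GGTTCACGTACGTAGG", spacer_len=5):
    print(site)
```

To cover both strands, the same function could be applied to the reverse complement of the input sequence.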

Example 2 validation of in vitro cleavage Activity of Cas protein

The coding sequences of the Cas-sf4, Cas-sf1, Cas-sf3, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10 proteins were each cloned into the pET30a expression vector and transformed into Escherichia coli, and the expressed proteins were purified to obtain the target proteins.

1 μg of purified Cas protein, 500 ng of in vitro transcribed gRNA and 300 ng of a PCR product bearing a TTC PAM were incubated at 37 ℃ for 1 h or overnight for digestion; the sequence of the PCR product is shown in SEQ ID No. 15. The gRNA sequences for the different Cas proteins are shown in Table 1.

TABLE 1 gRNA sequences utilized in vitro cleavage experiments

The results are shown in FIG. 1, in which the arrow marks the position of the PCR product. As can be seen from FIG. 1, the different Cas proteins cleaved the PCR product to different extents; Cas-sf4 in particular cleaved the PCR product with the highest efficiency.

Example 2 efficiency of Cas protein editing in maize protoplasts

To verify whether the Cas proteins can produce editing in eukaryotic cells, a target site with a TTTC PAM was used: plasmids expressing the different Cas proteins and plasmids expressing the crRNA were constructed and transformed into maize protoplasts. After transformation, DNA was extracted, a fragment containing the target site was amplified, and next-generation sequencing was performed. The targeted maize gene was SBE2.2 (Zm00001d003817); the gRNA sequences for the different Cas proteins are shown in Table 2.

TABLE 2 gRNA sequences utilized in maize protoplasts

The results show that Cas-sf4 and Cas-sf1 exhibit measurable editing activity in maize protoplasts. As shown in FIG. 2, the editing efficiency of Cas-sf1 in maize protoplasts is 2.4%, and the editing efficiency of Cas-sf4 in maize protoplasts is 0.7%. The editing types produced by Cas-sf4 and Cas-sf1 at SBE2.2 are shown in FIG. 3.
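The specification does not state how editing efficiency was calculated from the sequencing reads. A common convention, assumed here purely for illustration, is the percentage of reads covering the target site that carry an edit. The function name `editing_efficiency` and the read counts below are hypothetical and are not data from this example.

```python
def editing_efficiency(edited_reads: int, total_reads: int) -> float:
    """Percentage of reads at the target site that carry an edit."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= edited_reads <= total_reads:
        raise ValueError("edited_reads must lie between 0 and total_reads")
    return 100.0 * edited_reads / total_reads


# Hypothetical read counts chosen only to show the arithmetic.
print(f"{editing_efficiency(24, 1000):.1f}%")
print(f"{editing_efficiency(7, 1000):.1f}%")
```

In practice the count of edited reads would come from aligning the amplicon reads to the reference and calling insertions or deletions near the expected cleavage site, which this sketch does not attempt.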

Example 3 application of Cas protein in vitro nucleic acid detection

This example verifies the trans cleavage activity of the Cas enzymes in vitro. A gRNA that can pair with the target nucleic acid is used to guide the Cas enzyme to recognize and bind the target nucleic acid; the Cas enzyme then activates its trans cleavage activity toward any single-stranded nucleic acid and thereby cleaves the single-stranded nucleic acid detector in the system. The two ends of the single-stranded nucleic acid detector carry a fluorescent group and a quenching group, respectively, so that fluorescence is emitted when the detector is cleaved. In other embodiments, the two ends of the single-stranded nucleic acid detector may instead carry labels detectable by colloidal gold.

In this example, the target nucleic acid was a single-stranded DNA, N-B-i3g1-ssDNA0, having the sequence: cgacattccgaagaacgctgaagcgctgggggcaaattgtgcaatttgcggc.

From the 5' end to the 3' end, the gRNA consists of the DR region of the respective Cas protein followed by the sequence targeting the target nucleic acid; the targeting sequence is cccccagcgcuucagcguuc.

The single-stranded nucleic acid detector sequence was FAM-TTATT-BHQ1.
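The pairing between the gRNA targeting sequence and the single-stranded DNA target given above can be checked with a few lines of Python. This is only a sanity-check sketch: the function and variable names are hypothetical, and it looks for a perfectly paired site by searching the target for the reverse complement of the targeting sequence.

```python
def reverse_complement_as_dna(seq: str) -> str:
    """Reverse complement of an RNA or DNA sequence, written as DNA (5'->3')."""
    dna = seq.upper().replace("U", "T")
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Sequences as given in this example.
TARGET_SSDNA = "cgacattccgaagaacgctgaagcgctgggggcaaattgtgcaatttgcggc"
GUIDE_RNA = "cccccagcgcuucagcguuc"

paired_site = reverse_complement_as_dna(GUIDE_RNA)
start = TARGET_SSDNA.upper().find(paired_site)  # -1 if no fully paired site

print(paired_site)                      # GAACGCTGAAGCGCTGGGGG
print(start, start + len(paired_site))  # 12 32 (0-based, end exclusive)
```

With these sequences, the 20-nt targeting sequence is fully complementary to nucleotides 13 to 32 (1-based) of the 52-nt target.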

The following reaction system was used: Cas enzyme at a final concentration of 50 nM, gRNA at a final concentration of 50 nM, target nucleic acid at a final concentration of 500 nM, and single-stranded nucleic acid detector at a final concentration of 200 nM. The reaction was incubated at 37 ℃ and FAM fluorescence was read once per minute. No target nucleic acid was added to the control group.
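For readers setting up such a reaction, the final concentrations above translate into pipetting volumes through the dilution relation C_stock × V_stock = C_final × V_total. The Python sketch below shows the arithmetic only; the stock concentrations, the 20 μL total volume and the function name `component_volumes` are hypothetical and are not specified in this example.

```python
def component_volumes(final_nM: dict, stock_nM: dict, total_uL: float) -> dict:
    """Volume (uL) of each stock needed to reach its final concentration,
    plus the buffer volume required to bring the mix up to total_uL."""
    volumes = {
        name: final_nM[name] * total_uL / stock_nM[name] for name in final_nM
    }
    buffer_uL = total_uL - sum(volumes.values())
    if buffer_uL < 0:
        raise ValueError("stocks are too dilute for the requested total volume")
    volumes["buffer"] = buffer_uL
    return volumes


# Final concentrations as given in this example (nM).
FINAL_NM = {"Cas enzyme": 50, "gRNA": 50, "target nucleic acid": 500,
            "ssDNA detector": 200}
# Hypothetical stock concentrations (nM), for illustration only.
STOCK_NM = {"Cas enzyme": 1000, "gRNA": 1000, "target nucleic acid": 10000,
            "ssDNA detector": 10000}

for name, vol in component_volumes(FINAL_NM, STOCK_NM, total_uL=20.0).items():
    print(f"{name}: {vol:.2f} uL")
```

For the no-target control, the target nucleic acid volume would simply be replaced by an equal volume of buffer.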

In this example, the trans cleavage activity of Cas-sf4, Cas-sf1, Cas-sf6, Cas-sf8, Cas-sf9 and Cas-sf10 was assayed. The DR region of each gRNA was the mature direct repeat of the corresponding protein, as shown in SEQ ID Nos. 18, 19, 21, 22, 23, and 24 for Cas-sf4, Cas-sf1, Cas-sf6, Cas-sf8, Cas-sf9, and Cas-sf10, respectively. Of these, no significant trans cleavage activity was detected for Cas-sf6. As shown in FIGS. 4-8, compared with the control without target nucleic acid, Cas-sf4, Cas-sf1, Cas-sf8, Cas-sf9 and Cas-sf10 cleave the single-stranded nucleic acid detector in the system in the presence of the target nucleic acid, and fluorescence is reported rapidly. These experiments show that Cas-sf4, Cas-sf1, Cas-sf8, Cas-sf9 and Cas-sf10, in combination with a single-stranded nucleic acid detector, can be used to detect target nucleic acids. In FIGS. 4 to 8, line 1 shows the results with the target nucleic acid added, and line 2 shows the control group without the target nucleic acid.

While specific embodiments of the invention have been described in detail, those skilled in the art will understand that: various modifications and changes in detail can be made in light of the overall teachings of the disclosure, and such changes are intended to be within the scope of the present invention. A full appreciation of the invention is gained by taking the entire specification as a whole in the light of the appended claims and any equivalents thereof.

SEQUENCE LISTING

<110> Shandong Shunfeng Biotechnology Co., Ltd

<120> novel Cas enzyme and use

<130> SF063

<160> 22

<170> PatentIn version 3.5

<210> 1

<211> 1285

<212> PRT

<213> Artificial Sequence

<220>

<223> Cas-sf4

<400> 1

Met Ile Lys Met Met Lys Glu Lys Ser Ile Trp Asn Glu Phe Thr Asn

1 5 10 15

Met Tyr Ser Ile Ser Lys Thr Leu Arg Phe Lys Leu Lys Pro Ile Gly

20 25 30

Lys Thr Phe Asp Asn Ile Lys Lys Lys Gly Leu Ile Glu Glu Asp Lys

35 40 45

Asp Arg Glu Lys Gly Phe Asn Asn Ile Lys Lys Ile Met Asp Asp Tyr

50 55 60

Tyr Arg Tyr Phe Ile Glu Lys Cys Leu Asn Gly Ile Lys Leu Glu Lys

65 70 75 80

Lys Asp Leu Glu Ala Tyr Gln Lys Val Tyr Glu Asp Leu Lys Lys Asp

85 90 95

Asn Lys Asn Gln Lys Leu Lys Asn Lys Tyr Ala Lys Asn Gln Thr Ile

100 105 110

Leu Arg Lys Glu Ile Tyr Asn His Ile Lys Ser Gln Lys Glu Phe Ser

115 120 125

Gln Leu Phe Lys Lys Glu Leu Ile Thr His Ile Leu Pro Glu Trp Leu

130 135 140

Glu Lys Asn Lys Arg Leu Lys Asp Lys Asn Leu Val Asn Gln Phe Asn

145 150 155 160

Asn Trp Ser Thr Tyr Phe Thr Gly Phe Phe Asn Asn Arg Lys Asn Val

165 170 175

Phe Ser Glu Lys Glu Ile Pro Thr Ser Ile Ile Tyr Arg Ile Val His

180 185 190

Val Asn Leu Pro Lys Tyr Leu Asp Asn Val Ser Arg Phe Glu Lys Ile

195 200 205

Lys Glu Phe Asn Leu Asp Leu Lys Thr Leu Glu Asn Asp Phe Lys Asp

210 215 220

Val Leu Asp Asn Met Asp Leu Asn Glu Phe Phe Ser Val Asn Asn Phe

225 230 235 240

Asn Asn Phe Leu Asn Gln Ser Gly Ile Asp Lys Phe Asn Leu Val Ile

245 250 255

Gly Gly Lys Ser Leu Glu Asp Asn Lys Lys Ile Lys Gly Leu Asn Glu

260 265 270

Tyr Ile Asn Glu Phe Ser Gln Lys Glu Ser Asp Lys Ala Lys Arg Lys

275 280 285

Asn Ile Arg Lys Leu Lys Phe Ala Val Leu Phe Lys Gln Ile Leu Ser

290 295 300

Asp Ser Glu Ser Ser Ser Phe Val Ile Glu Lys Phe Lys Asp Lys Lys

305 310 315 320

Glu Ile Phe Glu Thr Ile Asp Val Phe Tyr Lys Glu Phe Asn Lys Tyr

325 330 335

Ser Ser Lys Ile Lys Glu Ser Ile Thr Lys Leu Asn Asn Cys Asp Ser

340 345 350

Lys Asn Val Tyr Ile Lys Asn Asp Thr Asn Leu Thr Gln Ile Ser Lys

355 360 365

Gly Leu Phe Asn Asp Trp Asn Lys Ile Asp Gly Gly Leu Arg Arg His

370 375 380

Phe Glu Asn Glu Leu Lys Ile Lys Lys Leu Thr Asp Lys Gln Arg Glu

385 390 395 400

Lys Glu Leu Asp Lys Cys Met Lys Ser Lys Tyr Phe Ser Leu Tyr Glu

405 410 415

Ile Glu Lys Gly Ile Asn Ser Leu Glu Leu Lys Asp Lys Lys Ser Ile

420 425 430

Ile Asp Tyr Phe Leu Asn Phe Ser Lys Ser Lys Asn Asp Ser Lys Val

435 440 445

Asp Leu Phe Glu Asn Ile Lys Ser Lys Tyr Ser Glu Phe Asn Lys Ile

450 455 460

Asp Arg Asn Lys Thr Thr Lys Leu Thr Glu Lys Ser Ser Glu Asn Asp

465 470 475 480

Val Glu Leu Ile Lys Thr Phe Leu Asp Ala Ile Met Glu Leu Tyr His

485 490 495

Phe Ile Lys Pro Leu His Leu Asn Phe Lys Lys Asn Glu Asp Glu Lys

500 505 510

Gly Ser Asn Ala Leu Glu Thr Asp Ser Asp Phe Tyr Asn Tyr Phe Asn

515 520 525

Glu Ile Phe Asp Lys Leu Gly Glu Ile Ile Pro Leu Tyr Asn Lys Val

530 535 540

Arg Asn Tyr Val Thr Gln Lys Pro Phe Ser Thr Lys Lys Phe Lys Leu

545 550 555 560

Asn Phe Glu Asn Ser Thr Leu Ala Ala Gly Trp Asp Ile Asn Lys Glu

565 570 575

Thr Ala Asn Thr Ala Ile Ile Leu Lys Lys Gly Thr Asp Phe Tyr Leu

580 585 590

Gly Ile Ile Asp Lys Asn Asn Thr Lys Ile Phe Leu Asn Gln Gln Asn

595 600 605

Ser Asn Ser Ser Val Val Tyr Glu Lys Leu Cys Tyr Lys Leu Val Ser

610 615 620

Gly Ala Asn Lys Met Leu Pro Lys Val Phe Leu Ser Glu Lys Gly Val

625 630 635 640

Lys Thr Phe Lys Pro Ser Lys Glu Ile Leu Lys Leu Tyr Lys Asn Glu

645 650 655

Glu His Lys Lys Gly Asn Thr Phe Ser Ile Glu Ser Cys His Lys Leu

660 665 670

Ile Asp Tyr Phe Lys Glu Cys Met Pro Asn Tyr Lys Pro Asn Pro Asn

675 680 685

Asp Lys Tyr Gly Trp Asp Val Phe Lys Phe Lys Phe Ser Asp Thr Lys

690 695 700

Thr Tyr Lys Asp Ile Ser Asp Phe Tyr Arg Glu Val Glu Asn Gln Gly

705 710 715 720

Tyr Lys Ile Trp Phe Glu Asn Ile Asp Glu Ser Tyr Leu Asn Lys Leu

725 730 735

Val Asp Glu Gly Lys Leu Tyr Leu Phe Gln Ile Trp Asn Lys Asp Phe

740 745 750

Ser Lys Tyr Ser Lys Gly Lys Pro Asn Leu His Thr Met Tyr Trp Lys

755 760 765

Glu Leu Phe Ser Glu Glu Asn Leu Lys Asp Val Ile Tyr Lys Leu Asn

770 775 780

Gly Glu Ala Glu Leu Phe Tyr Arg Glu Ala Ser Ile Lys Arg Gln Ile

785 790 795 800

Thr His Pro Lys Asn Ile Ser Ile Asp Asn Lys Asn Pro Ile Lys Asn

805 810 815

Lys Glu Lys Ser Thr Phe Asn Tyr Asp Leu Ile Lys Asn Lys Arg Tyr

820 825 830

Ser Glu Asp Ser Phe Met Phe His Cys Pro Ile Thr Leu Asn Phe Lys

835 840 845

Ala Lys Asp Gln Ser Lys Ser Ile His Lys Leu Val Asn Lys Phe Ile

850 855 860

His Asp Thr Asp Lys Lys Ile Asn Ile Val Gly Ile Asp Arg Gly Glu

865 870 875 880

Arg Asn Leu Ala Tyr Tyr Thr Leu Val Asn Ser Asp Gly Asn Ile Ile

885 890 895

Glu Gln Glu Ser Phe Asn Ile Ile Ser Asp Asp Leu Gln Arg Lys Phe

900 905 910

Asp Tyr Gln Glu Lys Leu Asp Gln Ile Glu Gly Asp Arg Asp Lys Ala

915 920 925

Arg Lys Asn Trp Lys Lys Ile Ala Asn Ile Lys Glu Met Lys Thr Gly

930 935 940

Tyr Leu Ser Gln Val Ile His Lys Ile Ser Lys Leu Val Ile Glu His

945 950 955 960

Asp Ala Ile Ile Val Leu Glu Asp Leu Asn Tyr Gly Phe Lys Arg Gly

965 970 975

Arg Phe Lys Ile Glu Lys Gln Ile Tyr Gln Lys Phe Glu Lys Met Leu

980 985 990

Val Asp Lys Leu Asn Tyr Leu Val Phe Lys Gly Ile Asp Lys Thr Leu

995 1000 1005

Ser Gly Gly Asn Leu Asn Ala Tyr Gln Leu Thr Asn Lys Phe Glu

1010 1015 1020

Ser Phe Gln Lys Leu Gly Lys Gln Ser Gly Ile Ile Tyr Tyr Val

1025 1030 1035

Asp Ala Tyr Lys Thr Ser Lys Ile Cys Pro Lys Thr Gly Phe Val

1040 1045 1050

Asn Leu Leu Tyr Pro Lys Phe Glu Asn Ile Leu Lys Ser Gln Glu

1055 1060 1065

Phe Ile Lys Lys Phe Lys Ser Ile Lys Tyr His Lys Asp Glu Asp

1070 1075 1080

Leu Phe Glu Phe Asn Phe Asn Tyr Ser Asp Phe Lys Lys Asp Gln

1085 1090 1095

Lys Glu Lys Leu Glu Gln Asp Asn Trp Ser Ile Trp Ser Asn Gly

1100 1105 1110

Thr Lys Leu Ile Asn Phe Arg Asp Lys Glu Asn Asn Asn Gln Trp

1115 1120 1125

Thr Thr Lys Glu Phe Lys Val Thr Glu Lys Leu Lys Glu Leu Phe

1130 1135 1140

Glu Asn His Asn Ile Asp Tyr Asn Ser Gly Asn Asp Leu Ile Glu

1145 1150 1155

Gln Ile Val Thr Ile Glu Asn Lys Ser Phe Tyr Glu Ser Leu Ile

1160 1165 1170

Tyr Ile Leu Lys Ile Ile Leu Lys Leu Arg Asn Ser Tyr Ser Asp

1175 1180 1185

Phe Glu Val Lys Gln Phe Lys Lys Lys Leu Gly Asn Lys Phe Lys

1190 1195 1200

Glu Cys Asp Tyr Asp Tyr Ile Leu Ser Cys Val Lys Asp Lys Glu

1205 1210 1215

Gly Asn Phe Phe Asp Ser Arg His Ala Lys Thr Asn Glu Val Lys

1220 1225 1230

Asp Ala Asp Ala Asn Gly Ala Phe His Ile Ala Leu Lys Gly Leu

1235 1240 1245

Met Val Ile Asp Lys Ile Lys Lys Phe Asp Asp Val Asp Glu Lys

1250 1255 1260

Thr Lys Ile Asp Leu Lys Ile Pro Arg Thr Asp Phe Leu Asn Tyr

1265 1270 1275

Val Val Lys Arg Ile Asn Arg

1280 1285

<210> 2

<211> 1297

<212> PRT

<213> Artificial Sequence

<220>

<223> Cas-sf1

<400> 2

Met Ala Thr Leu Val Ser Phe Thr Lys Gln Tyr Gln Val Gln Lys Thr

1 5 10 15

Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys Thr Gln Ala Asn Ile Asp

20 25 30

Ala Lys Gly Phe Ile Asn Asp Asp Leu Lys Arg Asp Glu Asn Tyr Met

35 40 45

Lys Val Lys Gly Val Ile Asp Glu Leu His Lys Asn Phe Ile Glu Gln

50 55 60

Thr Leu Val Asn Val Asp Tyr Asp Trp Arg Ser Leu Ala Thr Ala Ile

65 70 75 80

Lys Asn Tyr Arg Lys Asp Arg Ser Asp Thr Asn Lys Lys Asn Leu Glu

85 90 95

Lys Thr Gln Glu Ala Ala Arg Lys Glu Ile Ile Ala Trp Phe Glu Gly

100 105 110

Lys Arg Gly Asn Ser Ala Phe Lys Asn Asn Gln Lys Ser Phe Tyr Gly

115 120 125

Lys Leu Phe Lys Lys Glu Leu Phe Ser Glu Ile Leu Arg Ser Asp Asp

130 135 140

Leu Glu Tyr Asp Glu Glu Thr Gln Asp Ala Ile Ala Cys Phe Asp Lys

145 150 155 160

Phe Thr Thr Tyr Phe Val Gly Phe His Glu Asn Arg Lys Asn Met Tyr

165 170 175

Ser Thr Glu Ala Lys Ser Thr Ser Val Ala Tyr Arg Val Val Asn Glu

180 185 190

Asn Phe Ser Lys Phe Leu Ser Asn Cys Glu Ala Phe Ser Val Leu Glu

195 200 205

Ala Val Cys Pro Asn Val Leu Val Glu Ala Glu Gln Glu Leu His Leu

210 215 220

His Lys Ala Phe Ser Asp Leu Lys Leu Ser Asp Val Phe Lys Val Glu

225 230 235 240

Ala Tyr Asn Lys Tyr Leu Ser Gln Thr Gly Ile Asp Tyr Tyr Asn Gln

245 250 255

Ile Ile Gly Gly Ile Ser Ser Ala Glu Gly Val Arg Lys Ile Arg Gly

260 265 270

Val Asn Glu Val Val Asn Asn Ala Ile Gln Gln Asn Asp Glu Leu Lys

275 280 285

Val Ala Leu Arg Asn Lys Gln Phe Thr Met Val Gln Leu Phe Lys Gln

290 295 300

Ile Leu Ser Asp Arg Ser Thr Leu Ser Phe Val Ser Glu Gln Phe Thr

305 310 315 320

Ser Asp Gln Glu Val Ile Thr Val Val Lys Gln Phe Asn Asp Asp Ile

325 330 335

Val Asn Asn Lys Val Leu Ala Val Val Lys Thr Leu Phe Glu Asn Phe

340 345 350

Asn Ser Tyr Asp Leu Glu Lys Ile Tyr Ile Asn Ser Lys Glu Leu Ala

355 360 365

Ser Val Ser Asn Ala Leu Leu Lys Asp Trp Ser Lys Ile Arg Asn Ala

370 375 380

Val Leu Glu Asn Lys Ile Ile Glu Leu Gly Ala Asn Pro Pro Lys Thr

385 390 395 400

Lys Ile Ser Ala Val Glu Lys Glu Val Lys Asn Lys Asp Phe Ser Ile

405 410 415

Ala Glu Leu Ala Ser Tyr Asn Asp Lys Tyr Leu Asp Lys Glu Gly Asn

420 425 430

Asp Lys Glu Ile Cys Ser Ile Ala Asn Val Val Leu Glu Ala Val Gly

435 440 445

Ala Leu Glu Ile Met Leu Ala Glu Ser Leu Pro Ala Asp Leu Lys Thr

450 455 460

Leu Glu Asn Lys Asn Lys Val Lys Gly Ile Leu Asp Ala Tyr Glu Asn

465 470 475 480

Leu Leu His Leu Leu Asn Tyr Phe Lys Val Ser Ala Val Asn Asp Val

485 490 495

Asp Leu Ala Phe Tyr Gly Ala Phe Glu Lys Val Tyr Val Asp Ile Ser

500 505 510

Gly Val Met Pro Leu Tyr Asn Lys Val Arg Asn Tyr Ala Thr Lys Lys

515 520 525

Pro Tyr Ser Val Glu Lys Phe Lys Leu Asn Phe Ala Met Pro Thr Leu

530 535 540

Ala Asp Gly Trp Asp Lys Asn Lys Glu Arg Asp Asn Gly Ser Ile Ile

545 550 555 560

Leu Leu Lys Asp Gly Gln Tyr Tyr Leu Gly Val Met Asn Pro Gln Asn

565 570 575

Lys Pro Val Ile Asp Asn Ala Val Cys Asn Asp Ala Lys Gly Tyr Gln

580 585 590

Lys Met Val Tyr Lys Met Phe Pro Glu Ile Ser Lys Met Val Thr Lys

595 600 605

Cys Ser Thr Gln Leu Asn Ala Val Lys Ala His Phe Glu Asp Asn Thr

610 615 620

Asn Asp Phe Val Leu Asp Asp Thr Asp Lys Phe Ile Ser Asp Leu Thr

625 630 635 640

Ile Thr Lys Glu Ile Tyr Asp Leu Asn Asn Val Leu Tyr Asp Gly Lys

645 650 655

Lys Lys Phe Gln Ile Asp Tyr Leu Arg Asn Thr Gly Asp Phe Ala Gly

660 665 670

Tyr His Lys Ala Leu Glu Thr Trp Ile Asp Phe Val Lys Glu Phe Leu

675 680 685

Ser Lys Tyr Arg Ser Thr Ala Ile Tyr Asp Leu Thr Thr Leu Leu Pro

690 695 700

Thr Asn Tyr Tyr Glu Lys Leu Asp Val Phe Tyr Ser Asp Val Asn Asn

705 710 715 720

Leu Cys Tyr Lys Ile Asp Tyr Glu Asn Ile Ser Val Glu Gln Val Asn

725 730 735

Glu Trp Val Glu Glu Gly Asn Leu Tyr Leu Phe Lys Ile Tyr Asn Lys

740 745 750

Asp Phe Ala Thr Gly Ser Thr Gly Lys Pro Asn Leu His Thr Met Tyr

755 760 765

Trp Asn Ala Val Phe Ala Glu Glu Asn Leu His Asp Val Val Val Lys

770 775 780

Leu Asn Gly Gly Ala Glu Leu Phe Tyr Arg Pro Lys Ser Asn Met Pro

785 790 795 800

Lys Val Glu His Arg Val Gly Glu Lys Leu Val Asn Arg Lys Asn Val

805 810 815

Asn Gly Glu Pro Ile Ala Asp Ser Val His Lys Glu Ile Tyr Ala Tyr

820 825 830

Ala Asn Gly Lys Ile Ser Lys Ser Glu Leu Ser Glu Asn Ala Gln Glu

835 840 845

Glu Leu Pro Leu Ala Ile Ile Lys Asp Val Lys His Asn Ile Thr Lys

850 855 860

Asp Lys Arg Tyr Leu Ser Asp Lys Tyr Phe Phe His Val Pro Ile Thr

865 870 875 880

Leu Asn Tyr Lys Ala Asn Gly Asn Pro Ser Ala Phe Asn Thr Lys Val

885 890 895

Gln Ala Phe Leu Lys Asn Asn Pro Asp Val Asn Ile Ile Gly Ile Asp

900 905 910

Arg Gly Glu Arg Asn Leu Leu Tyr Val Val Val Ile Asp Gln Gln Gly

915 920 925

Asn Ile Ile Asp Lys Lys Gln Val Ser Tyr Asn Lys Val Asn Gly Tyr

930 935 940

Asp Tyr Tyr Glu Lys Leu Asn Gln Arg Glu Lys Glu Arg Ile Glu Ala

945 950 955 960

Arg Gln Ser Trp Gly Ala Val Gly Lys Ile Lys Glu Leu Lys Glu Gly

965 970 975

Tyr Leu Ser Leu Val Val Arg Glu Ile Ala Asp Met Met Val Lys Tyr

980 985 990

Asn Ala Ile Val Val Met Glu Asn Leu Asn Ala Gly Phe Lys Arg Val

995 1000 1005

Arg Gly Gly Ile Ala Glu Lys Ala Val Tyr Gln Lys Phe Glu Lys

1010 1015 1020

Met Leu Ile Asp Lys Leu Asn Tyr Leu Val Phe Lys Asp Val Glu

1025 1030 1035

Ala Lys Glu Ala Gly Gly Val Leu Asn Ala Tyr Gln Leu Thr Asp

1040 1045 1050

Lys Phe Asp Ser Phe Glu Lys Met Gly Asn Gln Ser Gly Phe Leu

1055 1060 1065

Phe Tyr Val Pro Ala Ala Tyr Thr Ser Lys Ile Asp Pro Val Thr

1070 1075 1080

Gly Phe Ala Asn Val Phe Ser Thr Lys His Ile Thr Asn Thr Glu

1085 1090 1095

Ala Lys Lys Glu Phe Ile Cys Ser Phe Asn Ser Leu Arg Tyr Asp

1100 1105 1110

Glu Ala Lys Asp Lys Phe Val Leu Glu Cys Asp Leu Asn Lys Phe

1115 1120 1125

Lys Ile Val Ala Asn Ser His Ile Lys Asn Trp Lys Phe Ile Ile

1130 1135 1140

Gly Gly Lys Arg Ile Val Tyr Asn Ser Lys Asn Lys Thr Tyr Met

1145 1150 1155

Glu Lys Tyr Pro Cys Glu Asp Leu Lys Ala Thr Leu Asn Ala Ser

1160 1165 1170

Gly Ile Asp Phe Ser Ser Ser Glu Ile Ile Asn Leu Leu Lys Asn

1175 1180 1185

Val Pro Ala Asn Arg Glu Tyr Gly Lys Leu Phe Asp Glu Thr Tyr

1190 1195 1200

Trp Ala Ile Met Asn Thr Leu Gln Met Arg Asn Ser Asn Ala Leu

1205 1210 1215

Thr Gly Glu Asp Tyr Ile Ile Ser Ala Val Ala Asp Asp Asn Glu

1220 1225 1230

Lys Val Phe Asp Ser Arg Thr Cys Gly Ala Glu Leu Pro Lys Asp

1235 1240 1245

Ala Asp Ala Asn Gly Ala Tyr His Ile Ala Leu Lys Gly Leu Tyr

1250 1255 1260

Leu Leu Gln Arg Ile Asp Ile Ser Glu Glu Gly Glu Lys Val Asp

1265 1270 1275

Leu Ser Ile Lys Asn Glu Glu Trp Phe Lys Phe Val Gln Gln Lys

1280 1285 1290

Glu Tyr Ala Arg

1295

<210> 3

<211> 1288

<212> PRT

<213> Artificial Sequence

<220>

<223> Cas-sf3

<400> 3

Met Tyr Ser Leu Ile Asn Tyr Phe Thr Thr Phe Thr Gly Asn Phe Ile

1 5 10 15

Asn Asn Leu Phe Thr Leu Thr Glu Tyr Ile Met Lys Thr Phe Gln Gln

20 25 30

Phe Ser Arg Val Tyr Pro Leu Ser Lys Thr Leu Arg Phe Glu Leu Lys

35 40 45

Pro Ile Gly Ser Thr Leu Glu His Ile Asn Lys Asn Gly Leu Leu Asp

50 55 60

Gln Asp Gln His Arg Ala Lys Ser Tyr Ile Gln Met Lys Asn Ile Ile

65 70 75 80

Asp Glu Tyr His Lys Glu Phe Ile Glu Asp Val Leu Asp Asp Leu Glu

85 90 95

Leu Gln Tyr Asp Asn Glu Gly Arg Asn Asn Ser Ile Ser Glu Phe Tyr

100 105 110

Thr Cys Tyr Met Ile Lys Ser Lys Asp Asp Asn Gln Arg Lys Leu Tyr

115 120 125

Glu Lys Ile Gln Glu Glu Leu Arg Lys Gln Ile Ala Asn Ala Phe Asn

130 135 140

Lys Ser Asp Ile Tyr Lys Arg Ile Phe Ser Glu Lys Leu Ile Lys Glu

145 150 155 160

Asp Leu Lys Asn Phe Ile Thr Asn Gln Lys Asp Asn Asp Lys Arg Glu

165 170 175

Gln Asp Ile Gln Ile Ile Glu Glu Phe Lys Asn Phe Thr Thr Tyr Phe

180 185 190

Thr Gly Phe His Glu Asn Arg Lys Asn Met Tyr Thr Ser Glu Ala Gln

195 200 205

Ser Thr Ala Ile Ala Tyr Arg Leu Ile His Glu Asn Leu Pro Lys Phe

210 215 220

Ile Asp Asn Ile Met Val Phe Asp Lys Val Ala Ala Ser Pro Ile Ala

225 230 235 240

Asp Ser Phe Ser Glu Leu Tyr Thr Asn Phe Glu Glu Cys Leu Asn Val

245 250 255

Met Ser Ile Glu Glu Met Phe Lys Leu Asn Tyr Phe Asn Val Val Leu

260 265 270

Thr Gln Lys Gln Ile Asp Val Tyr Asn Ala Ile Ile Gly Gly Lys Thr

275 280 285

Ile Asp Asn Thr Asn Ile Lys Ile Lys Gly Leu Asn Glu Tyr Ile Asn

290 295 300

Leu Tyr Asn Gln Gln Gln Lys Asp Lys Ser Ala Arg Leu Pro Lys Leu

305 310 315 320

Lys Pro Leu Tyr Lys Gln Ile Leu Ser Asp Arg Asn Ala Ile Ser Trp

325 330 335

Leu Pro Glu Gln Phe Glu Ser Asp Asp Lys Leu Leu Glu Ala Ile Gln

340 345 350

Lys Ala Tyr Gln Glu Leu Asp Glu Gln Val Leu Asn Arg Lys Ile Glu

355 360 365

Gly Glu His Ser Leu Arg Glu Leu Leu Val Gly Leu Ala Asp Tyr Asp

370 375 380

Leu Ser Lys Ile Tyr Ile Arg Asn Asp Leu Gln Leu Thr Asp Ile Ser

385 390 395 400

Gln Lys Val Phe Gly His Trp Gly Val Ile Ser Lys Ala Leu Leu Glu

405 410 415

Glu Leu Lys Asn Glu Val Pro Lys Lys Ser Lys Lys Glu Ser Asp Glu

420 425 430

Ala Tyr Glu Asp Arg Leu Asn Lys Val Ile Lys Ser Gln Gly Ser Ile

435 440 445

Ser Ile Ala Phe Ile Asn Asp Cys Ile Asn Lys Gln Leu Pro Glu Lys

450 455 460

Gln Lys Thr Ile Gln Gly Tyr Phe Ala Glu Leu Gly Ala Val Asn Asn

465 470 475 480

Glu Thr Ile Gln Lys Glu Asn Leu Phe Ala Gln Ile Glu Asn Ala Tyr

485 490 495

Thr Glu Val Lys Asp Leu Leu Asn Thr Pro Tyr Thr Gly Lys Asn Leu

500 505 510

Ala Gln Asp Lys Val Asn Val Glu Lys Ile Lys Asn Leu Leu Asp Ala

515 520 525

Ile Lys Ala Leu Gln His Phe Ile Lys Pro Leu Leu Gly Asp Gly Thr

530 535 540

Glu Pro Glu Lys Asp Glu Lys Phe Tyr Gly Glu Phe Ala Ala Leu Trp

545 550 555 560

Glu Glu Leu Asp Lys Ile Thr Pro Leu Tyr Asn Met Val Arg Asn Tyr

565 570 575

Met Thr Arg Lys Pro Tyr Ser Thr Glu Lys Ile Lys Leu Asn Phe Glu

580 585 590

Asn Ser Thr Leu Met Asp Gly Trp Asp Leu Asn Lys Glu Gln Ala Asn

595 600 605

Thr Thr Val Ile Leu Arg Lys Asp Gly Leu Tyr Tyr Leu Ala Ile Met

610 615 620

Asn Lys Lys His Asn Arg Val Phe Asp Val Lys Ala Met Pro Asp Asp

625 630 635 640

Gly Asp Cys Tyr Glu Lys Met Glu Tyr Lys Leu Leu Pro Gly Ala Asn

645 650 655

Lys Met Leu Pro Lys Val Phe Phe Ser Lys Ser Arg Ile Gln Glu Phe

660 665 670

Ala Pro Ser Ser Gln Leu Leu Glu Asn Tyr His Asn Asp Thr His Lys

675 680 685

Lys Gly Val Thr Phe Asn Ile Lys Asp Cys His Ala Leu Ile Asp Phe

690 695 700

Phe Lys Ala Ser Ile Asn Lys His Glu Asp Trp Cys Lys Phe Gly Phe

705 710 715 720

Arg Phe Ser Pro Thr Glu Thr Tyr Glu Asp Leu Ser Gly Phe Tyr Arg

725 730 735

Glu Val Glu Gln Gln Gly Tyr Lys Ile Ser Phe Arg Asn Val Ser Val

740 745 750

Asp Tyr Ile His Ser Leu Val Glu Glu Gly Lys Ile Phe Leu Phe Gln

755 760 765

Ile Tyr Asn Lys Asp Phe Ser Pro Tyr Ser Lys Gly Thr Pro Asn Leu

770 775 780

His Thr Leu Tyr Trp Lys Met Leu Phe Asp Glu Lys Asn Leu Ala Asp

785 790 795 800

Val Val Tyr Lys Leu Asn Gly Gln Ala Glu Val Phe Phe Arg Lys Ser

805 810 815

Ser Ile Asn Tyr Glu Gln Pro Thr His Pro Ala Asn Lys Ala Ile Asp

820 825 830

Asn Lys Asn Glu Leu Asn Lys Lys Lys Gln Ser Leu Phe Thr Tyr Asp

835 840 845

Leu Ile Lys Asp Lys Arg Tyr Thr Ile Asp Lys Phe Gln Phe His Val

850 855 860

Pro Ile Thr Met Asn Phe Lys Ser Thr Gly Asn Asp Asn Ile Asn Gln

865 870 875 880

Ser Val Asn Glu Tyr Ile Gln Gln Ser Asp Asp Leu His Ile Ile Gly

885 890 895

Ile Asp Arg Gly Glu Arg His Leu Leu Tyr Leu Thr Val Ile Asn Leu

900 905 910

Lys Gly Glu Ile Lys Glu Gln Tyr Ser Leu Asn Glu Ile Val Asn Thr

915 920 925

Tyr Lys Gly Asn Glu Tyr Arg Thr Asp Tyr His Asp Leu Leu Ser Lys

930 935 940

Arg Glu Asp Glu Arg Met Lys Ala Arg Gln Ser Trp Gln Thr Ile Glu

945 950 955 960

Asn Ile Lys Glu Leu Lys Glu Gly Tyr Leu Ser Gln Val Val His Lys

965 970 975

Ile Ala Glu Leu Met Ile Lys Tyr Asn Ala Ile Val Val Leu Glu Asp

980 985 990

Leu Asn Ala Gly Phe Met Arg Gly Arg Gln Lys Val Glu Ser Ser Val

995 1000 1005

Tyr Gln Lys Phe Glu Lys Met Leu Ile Asp Lys Leu Asn Tyr Leu

1010 1015 1020

Ala Asp Lys Lys Lys Gln Pro Glu Glu Pro Gly Gly Ile Leu Asn

1025 1030 1035

Ala Tyr Gln Leu Thr Asn Lys Phe Val Ser Phe Gln Lys Met Gly

1040 1045 1050

Lys Gln Cys Gly Phe Leu Phe Tyr Thr Gln Ala Trp Asn Thr Ser

1055 1060 1065

Lys Ile Asp Pro Val Thr Gly Phe Val Asn Leu Phe Asp Thr Arg

1070 1075 1080

Tyr Glu Thr Arg Glu Lys Ala Lys Thr Phe Phe Gly Lys Phe Asp

1085 1090 1095

Ser Ile Arg Tyr Asn Asp Glu Lys Asp Trp Phe Glu Phe Ala Phe

1100 1105 1110

Asp Tyr Thr Asn Phe Thr Ser Lys Ala Asp Gly Ser Arg Thr Asn

1115 1120 1125

Trp Lys Leu Cys Thr Tyr Gly Lys Arg Ile Glu Thr Phe Arg Asp

1130 1135 1140

Glu Lys Gln Asn Ser Asn Trp Thr Ser Lys Glu Val Val Leu Thr

1145 1150 1155

Asp Lys Phe Lys Glu Phe Phe Lys Glu Ser Asn Ile Asp Ile His

1160 1165 1170

Ser Asn Leu Lys Glu Ala Ile Met Gln Gln Asp Ser Ala Asp Phe

1175 1180 1185

Phe Lys Lys Leu Leu Tyr Leu Leu Lys Leu Thr Leu Gln Met Arg

1190 1195 1200

Asn Ser Glu Thr Gly Thr Asn Val Asp Tyr Met Gln Ser Pro Val

1205 1210 1215

Ala Asp Glu Glu Gly Asn Phe Tyr Asn Ser Asp Thr Cys Asp Ser

1220 1225 1230

Ser Leu Pro Lys Asn Ala Asp Ala Asn Gly Ala Tyr Asn Ile Ala

1235 1240 1245

Arg Lys Gly Leu Trp Ile Val Gln Gln Ile Lys Thr Ser Asp Asp

1250 1255 1260

Leu Arg Asn Leu Lys Leu Ala Ile Thr Asn Lys Glu Trp Leu Gln

1265 1270 1275

Phe Ala Gln Arg Lys Pro Tyr Leu Asp Glu

1280 1285

<210> 4

<211> 1283

<212> PRT

<213> Artificial Sequence

<220>

<223> Cas-sf6

<400> 4

Met Ser Asn Met Gln Gln Tyr Asp Asn Phe Ile Asn His Tyr Ala Ile

1 5 10 15

Gln Lys Thr Leu Arg Phe Glu Leu Gln Pro Ile Gly Lys Thr Arg Glu

20 25 30

His Ile Gln Lys Asn Gly Ile Ile Glu His Asp Glu Ala Leu Glu Gln

35 40 45

Lys Tyr Gln Ile Val Lys Lys Ile Ile Asp Arg Phe His Arg Lys His

50 55 60

Ile Asp Glu Ala Leu Ser Leu Ala Asp Phe Ser Lys Asp Thr Ala Met

65 70 75 80

Leu Lys Arg Phe Glu Glu Leu Tyr Trp Lys Lys Asn Lys Asn Glu Asn

85 90 95

Glu Lys Asn Glu Phe Val Lys Ile Gln Ser Asp Leu Arg Lys Arg Val

100 105 110

Val Ser Phe Leu Glu Gly Lys Val Glu Gly Asp Ala Arg Phe Ala Lys

115 120 125

Val Gln Gln Arg Tyr Gly Ile Leu Phe Asp Ala Lys Ile Phe Lys Asp

130 135 140

Lys Glu Phe Ile Ser Thr Ala Cys Asp Asp Ile Glu Lys Asp Ala Ile

145 150 155 160

Glu Ala Phe Lys Arg Phe Ala Thr Tyr Phe Thr Gly Phe His Glu Asn

165 170 175

Arg Lys Asn Met Tyr Ser Ala Asp Glu Glu Ser Thr Ala Ile Ala Tyr

180 185 190

Arg Val Ile Asn Glu Asn Leu Pro Arg Phe Leu Glu Asn Lys Ala Arg

195 200 205

Phe Glu Lys Ile Gln His Thr Val Asp Ser Lys Thr Leu Asn Glu Ile

210 215 220

Ala Thr Glu Leu Lys Pro Val Leu Glu Lys Asn Lys Leu Glu Thr Ile

225 230 235 240

Phe Thr Leu Asn Tyr Phe Gln Asn Thr Leu Ser Gln Ala Gly Ile Thr

245 250 255

Tyr Tyr Asn Thr Ile Leu Gly Gly Lys Thr Lys Glu Asn Gly Glu Lys

260 265 270

Val Gln Gly Leu Asn Glu Ile Ile Asn Leu Phe Asn Gln Lys Asn Lys

275 280 285

Asp Thr Met Leu Pro Leu Leu Lys Pro Leu Tyr Lys Gln Ile Leu Ser

290 295 300

Glu Glu Tyr Ser Thr Ser Phe Thr Ile Ser Ala Phe Glu Lys Asp Asn

305 310 315 320

Asp Val Leu Gln Ala Ile Gly Ser Phe Cys Asn Asp Cys Ile Phe Tyr

325 330 335

Ala Lys Asn Asn Val Asn Gly Lys Ala Tyr Asn Leu Leu Gln Thr Val

340 345 350

Gln Ala Phe Cys Asn Ser Ile Asp Thr Tyr Asn Asp Asn Arg Leu Asp

355 360 365

Gly Leu His Ile Glu Arg Lys Asn Leu Ala Thr Leu Ser His Gln Val

370 375 380

Tyr Gly Glu Trp Asn Ile Leu Arg Asp Ala Leu Gln Ile His Tyr Glu

385 390 395 400

Ala Tyr Glu Gln Lys Asp Asn Gly Asn Asn Asn Asn Tyr Leu Glu Ser

405 410 415

Lys Thr Phe Ser Trp Lys Ala Leu Lys Asp Ala Leu Thr Thr Tyr Lys

420 425 430

Ser Leu Val Glu Glu Ala Gln Asp Ile Asp Glu Asn Gly Phe Ile Ala

435 440 445

Tyr Phe Lys Asp Met Lys Phe Lys Glu Glu Ile Asp Gly Lys Thr Thr

450 455 460

Ser Ile Asp Leu Ile Glu Asn Ile Gln Thr Arg Tyr Lys Ser Ile Glu

465 470 475 480

Thr Ile Leu Gln Glu Asp Arg Asn Asn Lys Asn Asn Leu His Gln Glu

485 490 495

Lys Glu Lys Val Ala Thr Ile Lys Gly Phe Leu Asp Ser Val Lys Tyr

500 505 510

Leu Gln Trp Phe Leu Asn Leu Met Tyr Ile Ala Ser Pro Val Asp Asp

515 520 525

Lys Asp Tyr Asp Phe Tyr Asn Glu Leu Glu Met Tyr His Asp Thr Leu

530 535 540

Leu Pro Leu Thr Thr Leu Tyr Asn Lys Val Arg Asn Tyr Met Thr Arg

545 550 555 560

Lys Pro Tyr Ser Val Glu Lys Phe Lys Leu Thr Phe Glu Lys Ser Thr

565 570 575

Leu Leu Asp Gly Trp Asp Lys Asn Lys Glu Arg Ala Asn Leu Gly Val

580 585 590

Ile Leu Arg Lys Gly Asn Asn Tyr Tyr Leu Gly Ile Met Asn Lys Lys

595 600 605

Tyr Asn Asp Ile Phe Asp Ser Ile Pro Gly Leu Thr Thr Thr Asp Tyr

610 615 620

Cys Glu Lys Met Asn Tyr Lys Leu Leu Pro Gly Pro Asn Lys Met Leu

625 630 635 640

Pro Lys Val Phe Phe Ser Lys Lys Gly Val Gln Phe Tyr Lys Pro Ser

645 650 655

Gln Glu Ile Ile Arg Leu Tyr Asn Asn Lys Glu Phe Lys Lys Gly Asp

660 665 670

Thr Phe Asn Lys Asn Ser Leu His Lys Leu Ile Asn Phe Tyr Lys Glu

675 680 685

Ser Ile Ala Lys Thr Glu Asp Trp Ser Val Phe Gln Phe Lys Phe Lys

690 695 700

Asn Thr Asn Asp Tyr Ala Asp Ile Ser Gln Phe Tyr Lys Asp Val Glu

705 710 715 720

Arg Gln Gly Tyr Lys Ile Ser Phe Asp Lys Ile Asp Trp Glu Tyr Ile

725 730 735

Leu Leu Leu Val Asp Glu Gly Lys Leu Phe Leu Phe Lys Ile Tyr Asn

740 745 750

Lys Asp Phe Ser Pro Tyr Ser Lys Gly Lys Pro Asn Leu His Thr Ile

755 760 765

Tyr Trp Lys Asn Ile Phe Ser His Asp Asn Leu Asn Asn Val Val Tyr

770 775 780

Lys Leu Asn Gly Glu Ala Glu Val Phe Tyr Arg Lys Lys Ser Ile Glu

785 790 795 800

Tyr Pro Glu Glu Ile Leu Gln Lys Gly His His Val Asn Glu Leu Lys

805 810 815

Asp Lys Phe Lys Tyr Pro Ile Ile Lys Asp Lys Arg Tyr Ala Glu Asp

820 825 830

Lys Phe Leu Phe His Val Pro Ile Thr Met Asn Phe Leu Ser Lys Gly

835 840 845

Glu Pro Asn Ile Asn Gln Arg Val Gln Gln Tyr Ile Ala Ser Thr Ser

850 855 860

Glu Asp Tyr His Ile Ile Gly Ile Asp Arg Gly Glu Arg Asn Leu Leu

865 870 875 880

Tyr Leu Ser Leu Ile Asp Ala Thr Gly Lys Ile Ile Lys Gln Leu Ser

885 890 895

Leu Asn Thr Ile Lys Asn Glu Asn Phe Asn Thr Thr Ile Asp Tyr His

900 905 910

Ala Lys Leu Asp Glu Lys Glu Lys Lys Arg Glu Glu Ala Arg Lys Asn

915 920 925

Trp Asp Val Ile Glu Asn Ile Lys Glu Leu Lys Glu Gly Tyr Leu Ser

930 935 940

Gln Val Val His Gln Ile Ala Lys Leu Met Val Glu Tyr Lys Ala Ile

945 950 955 960

Leu Val Met Glu Asp Leu Asn Thr Gly Phe Lys Arg Gly Arg Phe Lys

965 970 975

Val Glu Lys Gln Val Tyr Gln Lys Phe Glu Lys Met Met Ile Asp Lys

980 985 990

Leu Asn Tyr Leu Val Leu Lys Asp Arg Gln Ala Thr Gln Pro Gly Gly

995 1000 1005

Ser Leu Lys Ala Tyr Gln Leu Ala Ser Ser Leu Glu Ser Phe Lys

1010 1015 1020

Lys Leu Gly Lys Gln Cys Gly Met Ile Phe Tyr Val Pro Ala Val

1025 1030 1035

Tyr Thr Ser Lys Ile Asp Pro Thr Thr Gly Phe Tyr Asn Phe Leu

1040 1045 1050

Arg Val Asp Val Ser Thr Leu Asn Ser Ala His Ser Phe Phe Asn

1055 1060 1065

Arg Phe Asn Ala Ile Val Tyr Asn Asn Glu Gln Asp Tyr Phe Glu

1070 1075 1080

Phe His Cys Thr Tyr Lys Asn Phe Val Ser Glu Pro Ser Leu Gln

1085 1090 1095

Lys Asn Val Lys Ser Ser Lys Met His Glu Tyr Asn Asn Leu Lys

1100 1105 1110

Asp Thr Thr Trp Val Leu Cys Ser Thr His His Glu Arg Tyr Lys

1115 1120 1125

Lys Phe Lys Asn Lys Ser Gly Tyr Phe Glu Tyr Lys Pro Val Asn

1130 1135 1140

Val Thr Gln Ser Leu Lys Gln Leu Phe Asp Glu Ala Gly Ile Asp

1145 1150 1155

Tyr Gln Ala Gly Ala Asp Leu Lys Glu Ala Ile Val Thr Gly Lys

1160 1165 1170

Asn Thr Lys Leu Leu Lys Gly Leu Gly Glu Gln Leu Asn Ile Leu

1175 1180 1185

Leu Ala Met Arg Tyr Asn Asn Gly Lys His Gly Asn Glu Glu Lys

1190 1195 1200

Asp Tyr Ile Val Ser Pro Val Lys Asn Asn Tyr Gly Lys Phe Phe

1205 1210 1215

Cys Thr Leu Asp Gly Asp Ala Ser Leu Pro Val Asp Ala Asp Ala

1220 1225 1230

Asn Gly Ala Tyr Ala Ile Ala Leu Lys Gly Leu Met Leu Val Glu

1235 1240 1245

Arg Met Lys Ser Asn Lys Asp Ile Lys Gly Arg Ile Asp Tyr Phe

1250 1255 1260

Ile Ser Asn Asn Glu Trp Phe Asn Tyr Leu Ile Ala Lys Asn Thr

1265 1270 1275

Leu Asn Lys Ser Lys

1280

<210> 5

<211> 1275

<212> PRT

<213> Artificial Sequence

<220>

<223> Cas-sf8

<400> 5

Met Arg Lys Ser Phe Lys Asp Phe Thr Asn Met Tyr Pro Val Gln Lys

1 5 10 15

Thr Leu Arg Phe Glu Leu Lys Pro Leu Gly Lys Thr Glu Gln His Ile

20 25 30

Lys Glu Ser Phe Ile Ile Glu His Asp Glu Gln Arg Ser Asn Asp Tyr

35 40 45

Lys Ala Ala Lys Lys Ile Ile Asp Asp Tyr His Arg Leu Phe Ile Gln

50 55 60

Lys Thr Leu Ser Gln Thr Asp Leu Asp Trp Lys Asp Leu Lys Glu Ala

65 70 75 80

Leu Glu Tyr Asp Gly Glu Asp Lys Asp Lys Arg Leu Glu Thr Val Gln

85 90 95

Lys Asp Lys Arg Ser Lys Ile Ile Cys Arg Phe Thr Glu Gln Pro Glu

100 105 110

Phe Lys Lys Leu Phe Gly Lys Glu Leu Phe Ser Glu Leu Leu Pro Glu

115 120 125

Met Ile Asn Ala Glu Asn Ala Asp Asn Lys Asp Glu Lys Leu His Ala

130 135 140

Ala Ala Ala Phe Asp Lys Phe Ser Thr Tyr Phe Lys Gly Phe His Asp

145 150 155 160

Asn Arg Arg Asn Ile Tyr Ser Asn Glu Glu Ile Ser Thr Ser Val Ala

165 170 175

Tyr Arg Ile Val His Gln Asn Phe Pro Lys Phe Leu Ala Asn Ala Glu

180 185 190

Thr Phe Lys Thr Ile Cys Lys Lys Ala Pro Glu Ile Ile Glu Gln Thr

195 200 205

Gln Lys Glu Leu Ser Lys Ile Leu Gly Lys His Lys Leu Glu Asp Ile

210 215 220

Phe Arg Ile Glu Ser Phe Asn Asn Val Met Thr Gln Asp Gly Ile Asp

225 230 235 240

Tyr Tyr Asn Asn Ile Ile Asp Gly Val Pro Cys Glu Ala Gly Lys Lys

245 250 255

Lys Leu Arg Gly Val Asn Glu Phe Ala Ser Ile Tyr Arg Gln Gln His

260 265 270

Pro Asp Thr Lys Ile Gln Ile Lys Met Val Pro Leu Tyr Lys Gln Ile

275 280 285

Leu Ser Asp Arg Ala Thr Leu Ser Phe Met Pro Ala Ala Leu Asp Asn

290 295 300

Asp Gly Asp Ala Phe Glu Ala Val Ala Gly Leu Glu Lys Met Leu Asn

305 310 315 320

Glu Pro Asp Ala Glu Thr Lys Thr Ser Val Leu Gln Gln Ile Ser Ala

325 330 335

Leu Phe Ala Lys Pro Ser Asp Tyr Ser Gln Glu Arg Val Trp Ile Asn

340 345 350

Gln Lys Ser Val Pro Val Val Ser Ala Ala Leu Phe Gly Ser Trp Asp

355 360 365

Thr Leu Gly Ser Ala Leu Ala Ala Tyr Lys Glu Asn Glu Leu Gly Asp

370 375 380

Thr Arg Gly Lys Asp Lys Lys Val Glu Lys Trp Ile Lys Ser Lys Ala

385 390 395 400

Phe Ser Phe Ala Ser Leu Asp Ala Ala Ala Asp Phe Tyr Lys Asp Ser

405 410 415

Leu Pro Gly Glu Lys Ser Ala Arg Arg Ile Lys Asp Tyr Phe Ala Gly

420 425 430

Cys Arg Glu Leu Val Lys Asn Thr Ser Glu Lys Gln Lys Glu Phe Asp

435 440 445

Lys Ile Lys Asp Ser Ala Leu Phe Gly Asn Glu Thr Asn Thr Ser Ala

450 455 460

Val Lys Ala Tyr Leu Asp Ser Leu Asn Asp Ile Leu Arg Phe Met Arg

465 470 475 480

Pro Phe Glu Thr Glu Asp Ile Thr Asp Ile Asp Thr Glu Phe Tyr Ser

485 490 495

Ala Tyr Ser Val Leu Leu Glu Lys Ile Lys Met Val Ile Pro Val Tyr

500 505 510

Asn Thr Val Arg Asn Tyr Val Thr Lys Lys Pro Phe Lys Thr Asp Lys

515 520 525

Phe Lys Leu Asn Phe Glu Asn Pro Thr Leu Ala Tyr Gly Trp Asp Lys

530 535 540

Ser Lys Glu Gln Ala Asn Thr Ala Ile Leu Leu Met Lys Asp Asp Lys

545 550 555 560

Tyr Tyr Leu Gly Ile Met Asn Ala Lys His Lys Ile Lys Pro Ala Glu

565 570 575

Leu Ala Asp Asp His Asn Gly Asp Gly Tyr Lys Lys Met Gln Tyr Met

580 585 590

Gln Met Ser Gly Pro Thr Lys Asp Leu Pro Asn Leu Leu Val Ile Asp

595 600 605

Gly Lys Thr Val Arg Lys Thr Gly Ser Lys Asp Ala Asn Gly Val Asn

610 615 620

Arg Lys Gln Glu Gln Leu Lys Asn Thr Tyr Leu Pro Pro Asp Ile Asn

625 630 635 640

Glu Ile Arg Leu Asp Gly Ser Tyr Leu Glu Thr Ser Asn Asn Phe Ser

645 650 655

Lys Lys Asn Ser Gln Lys Tyr Leu Ala Tyr Tyr Met Lys Leu Leu Lys

660 665 670

Glu Tyr Lys Ser Asn Phe Asp Phe Asn Phe Lys Lys Ala Asn Glu Tyr

675 680 685

Glu Ser Tyr Tyr Asp Phe Thr Asn Asp Ile Lys Lys Gln Cys Tyr Ser

690 695 700

Leu Thr Phe Thr Asn Leu Ala Glu Asn Lys Val Asp Lys Trp Val Asp

705 710 715 720

Glu Gly Arg Leu Tyr Leu Phe Gln Ile Trp Asn Lys Asp Phe Ala Glu

725 730 735

Gly Val Ser Gly Arg Pro Asn Leu His Thr Leu Tyr Trp Lys Ala Leu

740 745 750

Phe Ser Pro Glu Asn Leu Lys Asn Val Val Tyr Lys Leu Asn Gly Lys

755 760 765

Ala Glu Leu Phe Phe Arg Arg Lys Ser Ile Asn Glu Pro Val Val His

770 775 780

Pro Thr Gly Ser Lys Lys Val Asn Arg Arg Asp Ile Asp Gly Ser Pro

785 790 795 800

Ile Asp Asp Glu Thr Phe Asn Glu Ile Tyr Leu Tyr Ala Asn Gly Lys

805 810 815

Arg Ala Leu Gly Ser Leu Gly Ala Ala Ala Arg Ala Leu Val Glu Ser

820 825 830

Lys Arg Val Arg Ile Thr Asp Val Lys His Glu Leu Val Lys Asp Lys

835 840 845

Arg Tyr Thr Gln Asp Lys Phe Phe Phe His Val Ser Leu Thr Ile Asn

850 855 860

Phe Lys Ala Ser Gly Lys Glu Asn Ile Asn Ser Asp Val Asn Leu Phe

865 870 875 880

Leu Lys Asn Asn Lys Asp Val Lys Ile Ile Gly Ile Asp Arg Gly Glu

885 890 895

Arg Asn Leu Ile Tyr Ile Ser Leu Ile Asp Arg Lys Gly Asn Ile Ile

900 905 910

Glu Gln Lys His Phe Asn Thr Val Gly Gly Met Asp Tyr His Ala Lys

915 920 925

Leu Asp Gln Arg Glu Lys Ala Arg Asp Glu Ala Arg Lys Ser Trp Lys

930 935 940

Thr Ile Gly Asn Ile Lys Glu Leu Lys Glu Gly Tyr Leu Ser Gln Val

945 950 955 960

Ile His Glu Ile Thr Lys Met Ala Val Glu Asn Asp Ala Ile Ile Ala

965 970 975

Met Glu Asp Leu Asn Val Gly Phe Lys Arg Gly Arg Phe Lys Val Glu

980 985 990

Lys Gln Val Tyr Gln Lys Phe Glu Glu Met Leu Ile Asn Lys Leu Asn

995 1000 1005

Tyr Leu Ser Phe Lys Asp Thr Gly Glu Asn Lys Gln Cys Gly Ile

1010 1015 1020

Arg Asn Gly Leu Gln Leu Ala Gly Lys Phe Thr Ser Phe Lys Lys

1025 1030 1035

Ile Gly Lys Gln Cys Gly Ile Ile Phe Tyr Val Pro Ala Gly Tyr

1040 1045 1050

Thr Ser Lys Ile Asp Pro Val Thr Gly Phe Val Ser Val Phe Asn

1055 1060 1065

Leu Ser Ala Val Thr Ser Gln Glu Lys Gln Lys Glu Phe Ile Asp

1070 1075 1080

Arg Leu Asp Ser Ile Arg Tyr Asp Lys Lys Leu Asp Met Phe Val

1085 1090 1095

Phe Ser Phe Asp Tyr Ser Glu Phe Lys Thr Tyr Gln Thr Leu Pro

1100 1105 1110

Val Thr Lys Trp Asp Val Tyr Thr Asn Gly Lys Arg Ile Ile Asn

1115 1120 1125

Lys Arg Glu Gly Ser Arg Trp Ile Pro Gln Asn Val Val Pro Thr

1130 1135 1140

Glu Glu Met Lys Arg Thr Leu Lys Gln Leu Gly Ile Glu Tyr Glu

1145 1150 1155

Ser Gly Arg Asp Ile Leu Pro Val Ile Met Glu Arg Asp Lys Lys

1160 1165 1170

Leu Ala Ser Asp Val Phe Tyr Ile Phe Lys Asn Thr Leu Gln Met

1175 1180 1185

Arg Asn Ser Asn Ala Ala Thr Gly Glu Asp Tyr Ile Ile Ser Pro

1190 1195 1200

Val Lys Gly Lys Lys Gly Val Phe Phe Ser Ser Ser Ala Lys Asp

1205 1210 1215

Lys Ser Leu Pro Gln Asp Ala Asp Ala Asn Gly Ala Tyr His Ile

1220 1225 1230

Ala Leu Lys Gly Ser Leu Val Leu Asp Ala Ile Asp Glu Lys Leu

1235 1240 1245

Lys Asp Asp Gly Lys Met Ser Tyr Lys Asp Met Tyr Ile Ser Asn

1250 1255 1260

Pro Asp Trp Phe Lys Phe Met Gln Thr Gly Lys His

1265 1270 1275

<210> 6

<211> 1273

<212> PRT

<213> Artificial Sequence

<220>

<223> Cas-sf9

<400> 6

Met Lys Glu Asn Phe Ile Gly Lys Tyr Gln Ile Thr Lys Thr Leu Arg

1 5 10 15

Phe Ser Leu Ile Pro Ile Gly Lys Thr Glu Glu Tyr Phe Asn Ala Arg

20 25 30

Cys Met Leu Glu Glu Asp Glu Gln Arg Ala Glu Asp Tyr Val Lys Val

35 40 45

Lys Ser Phe Ile Asp Glu Tyr His Lys Ala Phe Ile Glu Arg Ile Leu

50 55 60

Ser Asn Leu Ile Lys Gln Lys Ser Thr Ser Lys Gly Thr Glu Phe Ile

65 70 75 80

Glu Lys Val Arg Asp Tyr Ala Asp Leu Tyr Asn Ser Ser Gln Arg Asp

85 90 95

Asp Lys Lys Leu Asn Lys Ile Gly Glu Glu Leu Arg Lys Ser Ile Ser

100 105 110

Glu Ala Phe Thr Lys Asp Asp His Tyr Asp Arg Leu Phe Asn Lys Asp

115 120 125

Ile Ile Glu Glu Leu Leu Pro Glu Tyr Leu Gly Asp Ser Arg Lys Glu

130 135 140

Asp Thr Lys Ile Val Glu Asn Phe Val Gly Phe Lys Thr Tyr Phe Asn

145 150 155 160

Gly Phe Phe Glu Asn Arg Lys Asn Met Tyr Val Lys Glu Gln Glu Thr

165 170 175

Thr Ala Ile Ala Tyr Arg Cys Ile Asp Glu Asn Leu Pro Arg Phe Leu

180 185 190

Asp Asn Ala Thr Ile Trp Lys Lys Lys Leu Arg Asp Ala Leu Pro Glu

195 200 205

Glu Asp Ile Cys Arg Leu Asn Lys Glu Cys Thr Asp Phe His Asp Lys

210 215 220

Lys Val Glu Asp Ile Phe Asp Ile Asp Phe Phe Thr Gln Val Leu Ser

225 230 235 240

Gln Ser Gly Ile Asp Trp Tyr Asn Gln Ile Leu Gly Gly Tyr Thr Lys

245 250 255

Glu Gly Asn Ile Lys Ile Gln Gly Leu Asn Glu Tyr Ile Asn Thr Tyr

260 265 270

Asn Asp Lys Val Ser Glu Lys Glu Arg Ser His Arg Leu Pro Leu Leu

275 280 285

Lys Pro Leu Tyr Lys Gln Ile Leu Ser Asp Arg Val Ser Thr Ser Phe

290 295 300

Ile Pro Glu Lys Phe Thr Ser Asp Glu Glu Leu Leu Ser Ala Val His

305 310 315 320

Lys Leu Tyr Thr Val Lys Glu Asp Gly Arg Val Ser Leu Lys Glu Ala

325 330 335

Ile Ser Glu Ile Lys Glu Leu Phe Ala Glu Leu Ser Ile Phe Asn Leu

340 345 350

Ser Gly Ile Phe Val Ser Ala Lys Thr Gly Leu Ser Asp Val Ser Asn

355 360 365

Arg Val Phe Gly Tyr Trp Gly Ala Val Lys Glu Gly Trp Ile Asp Asn

370 375 380

Tyr His Glu Asn Asn Pro Leu Gly Lys Arg Glu Ser Ile Glu Leu Tyr

385 390 395 400

Glu Lys Lys Leu Asn Lys Glu Tyr Gly Asn Ile Pro Ser Phe Ser Ile

405 410 415

Glu Glu Ile Gln Gln Phe Gly Glu Gly Lys Ala Lys Glu Glu Tyr Arg

420 425 430

Asn Glu Thr Val Ile His Phe Tyr Ser Gly Thr Val Arg Lys Gln Ser

435 440 445

Asn Lys Ile Cys Asp Ser Tyr Lys Asp Ala Tyr Lys Arg Ile Lys Pro

450 455 460

Leu Leu Glu Ala Pro Asn Glu Ser Gly Asn Asp Leu Arg Ser Asn Lys

465 470 475 480

Glu Ala Ile Glu Leu Leu Lys Ile Phe Leu Asp Ser Val Lys Glu Leu

485 490 495

Glu Phe Leu Val Lys Pro Phe Arg Gly Glu Gly Asn Glu Thr Asp Lys

500 505 510

Asp Asn Asn Phe Tyr Asn Arg Phe Leu Val Ala Phe Asp Thr Phe Thr

515 520 525

Asp Phe Asp Phe Leu Tyr Asp Lys Val Arg Asn Tyr Ile Thr Gln Lys

530 535 540

Pro Phe Ser Thr Glu Lys Ile Lys Leu Asn Phe Asn Asn Pro Gln Phe

545 550 555 560

Leu Gly Gly Trp His Glu Asn Lys Glu Ser Ser Tyr Ser Ser Ile Leu

565 570 575

Leu Arg Ser Ala Gly Lys Tyr Tyr Leu Gly Val Met Asp Thr Lys Ser

580 585 590

Lys His Ser Phe Lys Lys Tyr Pro Ser Pro Lys Ser Lys Asn Asp Val

595 600 605

Val Glu Lys Met Phe Leu His Gln Val Ala Asn Pro Ala Lys Asp Val

610 615 620

Gln Asn Leu Met Val Ile Asn Gly Lys Thr Val Arg Arg Thr Gly Arg

625 630 635 640

Lys Glu Thr Glu Gly Glu Tyr Lys Gly Glu Asn Leu Arg Leu Glu Glu

645 650 655

Leu Lys Asn Thr His Leu Pro Glu Glu Ile Asn Arg Ile Arg Lys Ser

660 665 670

Gln Ser Tyr Leu Lys Ser Ser Gly Glu Ile Phe Ser Lys Gln Asp Leu

675 680 685

Val Ala Phe Ile Lys Phe Tyr Met Glu Arg Thr Lys Glu Tyr Tyr Thr

690 695 700

Asn Ser His Phe Glu Phe Arg Asn Ala Glu Asn Tyr Gln Asp Phe Lys

705 710 715 720

Glu Phe Thr Asp Asp Ile Asp Ala Gln Ala Tyr Gln Val His Phe Lys

725 730 735

Glu Ile Ser His Ser Phe Ile Asn Ser Leu Val Asp Lys Gly Glu Leu

740 745 750

Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ser Pro Tyr Ser Arg Gly

755 760 765

Thr Pro Asn Leu His Thr Leu Tyr Phe Lys Met Leu Phe Asp Glu Arg

770 775 780

Asn Leu Ala Asp Val Val Phe Lys Leu Asp Gly Asn Ala Glu Met Phe

785 790 795 800

Tyr Arg Lys Ala Ser Leu Lys Lys Gln Ile Thr His Pro Ala Asn Lys

805 810 815

Pro Ile Pro Asn Lys Asn Thr Met Asn Pro Lys Lys Glu Ser Thr Phe

820 825 830

Gly Tyr Asp Ile Ile Lys Asp Lys Arg Tyr Thr Glu Arg Gln Phe Ser

835 840 845

Leu His Phe Pro Ile Thr Leu Asn Phe Lys Glu Ala Lys Asn Ala Asn

850 855 860

Ile Ser Lys Glu Val Arg Asp Thr Leu Tyr Lys Ser Asp Leu Pro Tyr

865 870 875 880

Ile Ile Gly Ile Asp Arg Gly Glu Arg Asn Leu Leu Tyr Ile Cys Val

885 890 895

Ile Asp Gly Asn Gly Asn Ile Val Glu Gln Met Ser Met Asn Glu Ile

900 905 910

Thr Thr Asp Asn Asn Tyr Lys Val Asn Tyr His Asn Leu Leu Gln Arg

915 920 925

Lys Glu Glu Glu Arg Lys Lys Ala Arg Gly Asn Trp Ser Val Ile Glu

930 935 940

Asn Ile Lys Glu Leu Lys Glu Gly Tyr Leu Ser Gln Val Ile Asn Lys

945 950 955 960

Ile Cys Gly Leu Val Ile Lys Tyr Asn Ala Val Ile Ala Met Glu Asn

965 970 975

Leu Asn Tyr Gly Phe Lys Arg Gly Arg Phe Arg Val Glu Lys Gln Val

980 985 990

Tyr Gln Lys Phe Glu Asn Asn Leu Ile Lys Lys Leu Asn Tyr Leu Ala

995 1000 1005

Asp Lys Lys Leu Pro Pro Glu Gln Asp Gly Gly Leu Leu Arg Ala

1010 1015 1020

Tyr Gln Leu Thr Glu Lys Phe Glu Lys Ile Asn Lys Ser Asn Gln

1025 1030 1035

Asn Gly Ile Ile Phe Phe Val Pro Ala Trp Leu Thr Ser Lys Ile

1040 1045 1050

Asp Pro Thr Thr Gly Phe Thr Asn Leu Leu Tyr Pro Arg Tyr Glu

1055 1060 1065

Ser Val Lys Lys Ala Lys Asn Phe Phe Ala Asn Phe Asn Leu Ile

1070 1075 1080

Thr Tyr Asp Ala Ser Glu Asp Met Phe Arg Phe Asp Phe Asp Tyr

1085 1090 1095

Thr Lys Phe Leu Cys Gly Val Ala Asp Phe Lys Lys Lys Trp Ser

1100 1105 1110

Val Trp Ser Tyr Gly Glu Arg Ile Lys Thr Arg Arg Lys Glu Lys

1115 1120 1125

His Asn Asn Asp Ile Glu Tyr Thr Thr Val Gln Leu Thr Asp Glu

1130 1135 1140

Phe Lys Asn Leu Phe Glu Asn Tyr Arg Ile Asn Tyr Leu Asp Asn

1145 1150 1155

Leu Gln Lys Gln Ile Ile Glu Ala Asp Asp Lys Glu Phe Phe Tyr

1160 1165 1170

Ser Leu Tyr Ser Leu Leu Asn Leu Thr Leu Gln Met Arg Asn Ser

1175 1180 1185

Asn Pro Asn Ser Gly Asp Asp Tyr Leu Ile Ser Pro Val Arg Asn

1190 1195 1200

Thr Ser Gly Gly Phe Tyr Asp Ser Arg Asn Tyr Leu Lys Ser Gly

1205 1210 1215

Asn Leu Ser Leu Pro Val Asp Ala Asp Ala Asn Gly Ala Tyr Asn

1220 1225 1230

Ile Ala Arg Lys Cys Leu Trp Gln Ile Met Lys Leu Lys Ser Leu

1235 1240 1245

Ser Glu Asp Glu Thr Lys Lys Pro Asn Leu Thr Ile Ser Asn Lys

1250 1255 1260

Asp Trp Leu Cys Tyr Ala Gln Glu Asn Lys

1265 1270

<210> 7

<211> 1273

<212> PRT

<213> Artificial Sequence

<220>

<223> Cas-sf10

<400> 7

Met Gln Asp Lys Thr Gly Trp Ser Ser Phe Thr Asn Lys Tyr Ser Leu

1 5 10 15

Ser Lys Thr Leu Arg Phe Glu Leu Lys Pro Val Gly Asn Thr Gln Lys

20 25 30

Met Leu Glu Asp Asp Gly Val Phe Gln Lys Asp Arg Glu Arg Gln Glu

35 40 45

Asn Tyr Lys Lys Val Lys Pro Phe Met Asp Lys Leu His Arg Glu Phe

50 55 60

Ile Lys Glu Ala Leu Asn Asn Leu Lys Leu Glu Gly Leu Thr Glu Tyr

65 70 75 80

Phe Glu Ile Phe Lys Lys Phe Arg Lys Asp Lys Asn Asn Lys Glu Leu

85 90 95

Lys Asn Ala Glu Lys Lys Leu Arg Gln Ile Ile Gly Arg Cys Tyr Thr

100 105 110

Glu Thr Ala Gln Ile Trp Val Glu Lys Tyr Lys Glu Phe Gly Phe Lys

115 120 125

Lys Lys Asn Ile Gly Phe Leu Phe Glu Glu Gly Val Phe Glu Leu Met

130 135 140

Lys Leu Lys Tyr Gly Asn Asp Glu Ala Ser Gln Ile Glu Lys Asn Gly

145 150 155 160

Glu Val Leu Ser Ile Phe Asp Gly Trp Lys Gly Phe Leu Gly Tyr Phe

165 170 175

Lys Lys Phe Phe Glu Thr Arg Asn Asn Phe Tyr Lys Asp Asp Gly Thr

180 185 190

Ser Thr Ala Val Ser Thr Arg Ile Ile Asn Glu Asn Leu Lys Ile Tyr

195 200 205

Leu Asp Asn Leu Ile Lys Tyr Asn Lys Ile Lys Asp Lys Val Asp Phe

210 215 220

Lys Glu Ala Asp Ile Leu Gln Glu Asn Lys Leu Asn Leu Ser Asp Phe

225 230 235 240

Phe Asn Val Glu Ser Tyr Ala Lys Tyr Ser Leu Gln Lys Gly Ile Asp

245 250 255

Tyr Tyr Asn Glu Ile Leu Gly Gly Lys Thr Leu Lys Asn Gly Thr Lys

260 265 270

Leu Lys Gly Leu Asn Glu Val Ile Asn Glu Tyr Lys Gln Lys Asn Lys

275 280 285

Ser Gly Glu Leu Ser Lys Phe Lys Met Leu Lys Lys Gln Ile Leu Gly

290 295 300

Glu Gly Glu Asp Arg Thr Leu Phe Glu Glu Ile Glu Asn Glu Asp Glu

305 310 315 320

Leu Lys Asp Val Leu Lys Asp Phe Phe Tyr Asn Ala Asp Pro Lys Ile

325 330 335

Thr Leu Phe Lys Thr Leu Leu Glu Asp Phe Phe Ser Asn Thr Glu Lys

340 345 350

Tyr Lys Asp Glu Leu Asp Lys Ile Tyr Phe Asn Thr Val Ala Ile Asn

355 360 365

Gly Ile Leu His Arg Trp Val Asp Asp Ser Gly Val Phe Gln Lys Tyr

370 375 380

Leu Phe Glu Val Leu Lys Ser Asn Lys Leu Val Lys Ser Asn His Tyr

385 390 395 400

Asp Lys Lys Glu Asp Ser Tyr Lys Phe Pro Asp Phe Ile Ser Phe Glu

405 410 415

His Ile Lys Val Ala Leu Glu Asn Cys Glu Arg Asp Gly Leu Lys Asp

420 425 430

Lys Phe Trp Lys Glu Lys Tyr Tyr Thr Lys Glu Cys Leu Thr Glu Asn

435 440 445

Gly Leu Ala Asn Leu Trp Gln Glu Phe Leu Glu Ile Tyr Lys Cys Glu

450 455 460

Phe Lys Lys Leu Tyr Asp Tyr Lys Thr Asp Asp Asn Asp Cys Tyr Leu

465 470 475 480

Gln Tyr Arg Asp Asn Tyr Lys Lys Tyr Ile Leu Asp Ala Asn Phe Asn

485 490 495

Pro Lys Glu Lys Ser Ala Lys Asp Ile Ile Lys Asp Tyr Leu Asp Ser

500 505 510

Val Leu Ser Ile Tyr Gln Leu Ala Lys Tyr Phe Ala Leu Glu Lys Lys

515 520 525

Lys Val Trp Thr Thr Asp Tyr Glu Thr Gly Asp Phe Tyr Tyr Glu Tyr

530 535 540

Ile Lys Phe Tyr Glu Asp Thr Tyr Glu Gln Ile Ile Lys Pro Tyr Asn

545 550 555 560

Leu Val Arg Asn Tyr Leu Thr Arg Lys Pro Ile Asn Thr Ala Lys Lys

565 570 575

Trp Lys Leu Asn Phe Asp Asn Ala Tyr Leu Ala Ser Gly Trp Asp Lys

580 585 590

Asp Lys Glu Val Ser Asn Leu Thr Val Ile Leu Arg Arg Asp Glu Gln

595 600 605

Tyr Tyr Leu Ala Ile Met Lys Lys Gly Lys Asn Lys Ile Phe Glu Lys

610 615 620

Lys Phe Ser Cys Gly Glu Phe Glu Lys Met Glu Tyr Lys Gln Ile Ala

625 630 635 640

Glu Ala Ser Ser Asp Ile His Asn Leu Val Leu Met Asn Asp Gly Ser

645 650 655

Cys Arg Arg Cys Ile Lys Met His Asp Lys Arg Lys Tyr Trp Pro Leu

660 665 670

Asp Ile Ser Ile Ile Lys Glu Lys Lys Ser Tyr Ala Lys Glu Asn Phe

675 680 685

Val Arg Arg Asp Phe Glu Arg Phe Val Asn Tyr Met Lys Lys Cys Ser

690 695 700

Leu Leu Tyr Trp Lys Glu Tyr Asp Leu Lys Phe Ser Asp Thr Ser Thr

705 710 715 720

Tyr Lys Asn Ile Asn Asp Phe Thr Asn Glu Ile Ala Ser Gln Gly Tyr

725 730 735

Lys Leu Ser Phe Ser Ala Ile Pro Glu Ser Tyr Ile Asn Glu Lys Asn

740 745 750

Asn Asn Gly Glu Leu Tyr Leu Phe Gln Ile Tyr Asn Lys Asp Phe Gly

755 760 765

Ile Lys Thr Glu Gly Asn Lys Asn Leu His Thr Met Tyr Trp Glu Ser

770 775 780

Ile Phe Ser Glu Glu Asn Arg Phe Arg Asn Phe Ile Val Lys Leu Asn

785 790 795 800

Gly Lys Ala Glu Ile Phe Tyr Arg Pro Lys Ser Glu Gln Val Glu Lys

805 810 815

Glu Gln Arg Asn Phe Thr Arg Glu Ile Ile Lys Asn Arg Arg Tyr Thr

820 825 830

Glu Asn Lys Ile Tyr Phe His Cys Pro Ile Thr Leu Asn Arg Ile Ser

835 840 845

Arg Glu Asn Val Lys Lys Phe Asn Asn Gly Ile Asn Asn Tyr Ile Ala

850 855 860

Thr Asn Pro Asn Ile Asn Ile Leu Gly Val Asp Arg Gly Glu Lys His

865 870 875 880

Leu Val Tyr Tyr Ala Ile Val Asp Gln Asp Gly Lys Leu Ile Asp Ala

885 890 895

Glu Asp Ala Thr Gly Ser Phe Asn Thr Ile Gly Ser Thr Asp Tyr His

900 905 910

Arg Leu Leu Glu Glu Lys Ala Lys Asp Arg Glu Lys Glu Arg Lys Asp

915 920 925

Trp Asp Leu Ile Arg Gly Ile Lys Asp Leu Lys Lys Gly Tyr Ile Ser

930 935 940

Leu Val Val Arg Lys Ile Ala Asp Leu Ala Ile Lys Tyr Asn Ala Ile

945 950 955 960

Ile Ile Phe Glu Asp Leu Asn Thr Arg Phe Lys Gln Ile Arg Gly Gly

965 970 975

Met Glu Lys Ser Val Tyr Gln Gln Leu Glu Lys Ala Leu Ile Asn Lys

980 985 990

Leu Ser Phe Leu Val Asn Lys Gly Glu Lys Asp Pro Glu Gln Ala Gly

995 1000 1005

His Leu Leu Lys Ala Tyr Gln Leu Ala Ala Pro Phe Gln Thr Phe

1010 1015 1020

Asp Lys Met Gly Arg Gln Thr Gly Ile Ile Phe Tyr Thr Gln Ala

1025 1030 1035

Ser Tyr Thr Ser Lys Ile Asp Pro Ile Thr Gly Trp Arg Pro Asn

1040 1045 1050

Leu Tyr Leu Lys Tyr Arg Asn Ile Asp Asp Ser Lys Glu Ser Ile

1055 1060 1065

Lys Lys Phe Lys Ser Ile Leu Phe Asn Lys Glu Lys Asn Arg Phe

1070 1075 1080

Glu Phe Thr Tyr Asp Leu Lys Asp Phe Val Asp Phe Glu Glu Asp

1085 1090 1095

Lys Ile Pro Glu Lys Thr Glu Trp Thr Leu Cys Ser Ser Val Glu

1100 1105 1110

Arg His Lys Trp Asn Arg His Met Asn Asn Asn Lys Gly Gly Tyr

1115 1120 1125

Glu Val Tyr Lys Asp Leu Thr Glu Asn Phe Tyr Lys Leu Phe Asp

1130 1135 1140

Glu Asn Asn Ile Ser Met Asn Lys Asp Ile Val Asp Gln Val Glu

1145 1150 1155

Ser Ile Ser Asn Gly Asn Phe Phe Arg Gln Phe Ile Tyr Leu Phe

1160 1165 1170

Asn Leu Val Cys Gln Ile Arg Asn Thr Asp Glu Lys Ala Glu Asp

1175 1180 1185

Val Asp Lys Arg Asp Phe Ile Leu Ser Pro Val Glu Pro Phe Phe

1190 1195 1200

Asp Ser Arg Arg Ala Lys Asp Phe Lys Ala Tyr Gly Asp Asn Leu

1205 1210 1215

Pro Lys Asn Gly Asp Glu Asn Gly Ala Tyr Asn Ile Ala Arg Lys

1220 1225 1230

Gly Val Leu Ile Ile Lys Lys Ile Lys Glu Tyr Tyr Asn Gln Asn

1235 1240 1245

Gly Ser Cys Asp Lys Leu Gly Trp Gly Asp Leu Ser Ile Ser His

1250 1255 1260

Lys Glu Trp Asp Asp Phe Ala Thr Asn Asn

1265 1270

<210> 8

<211> 36

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf4

<400> 8

guuuagaagc augcuuuaau uucuacuguu guagau 36

<210> 9

<211> 36

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf1

<400> 9

gucuaaaccu caaugaaaau uucuacuguu guagau 36

<210> 10

<211> 36

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf3

<400> 10

gucuauaaga cuauuauaau uucuacuauu guagau 36

<210> 11

<211> 36

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf6

<400> 11

gucuaaaggu auuauaaaau uucuacuauu guagau 36

<210> 12

<211> 36

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf8

<400> 12

gucuaaaggc cuuauauaau uucuacuuuu guagau 36

<210> 13

<211> 36

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf9

<400> 13

guuuaagacc uccuuuuaau uucuacuguu guagau 36

<210> 14

<211> 36

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf10

<400> 14

cucaauuccu uacaauagau uucuacuuuu guagau 36

<210> 15

<211> 709

<212> DNA

<213> Artificial Sequence

<220>

<223> PCR

<400> 15

tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60

cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120

ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180

accatatgcg gtgtgaaata ccgcacagat gcgtaaggag aaaataccgc atcaggcgcc 240

attcgccatt caggctgcgc aactgttggg aagggcgatc ggtgcgggcc tcttcgctat 300

tacgccagct ggcgaaaggg ggatgtgctg caaggcgatt aagttgggta acgccagggt 360

tttcccagtc acgacgttgt aaaacgacgg ccagtgaatt cgagctcggt accttcggta 420

taacaacttc gacgagctct acaaagcttg gcgtaatcat ggtcatagct gtttcctgtg 480

tgaaattgtt atccgctcac aattccacac aacatacgag ccggaagcat aaagtgtaaa 540

gcctggggtg cctaatgagt gagctaactc acattaattg cgttgcgctc actgcccgct 600

ttccagtcgg gaaacctgtc gtgccagctg cattaatgaa tcggccaacg cgcggggaga 660

ggcggtttgc gtattgggcg ctcttccgct tcctcgctca ctgactcgc 709

<210> 16

<211> 19

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf4

<400> 16

aauuucuacu guuguagau 19

<210> 17

<211> 19

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf1

<400> 17

aauuucuacu guuguagau 19

<210> 18

<211> 19

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf3

<400> 18

aauuucuacu auuguagau 19

<210> 19

<211> 19

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf6

<400> 19

aauuucuacu auuguagau 19

<210> 20

<211> 19

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf8

<400> 20

aauuucuacu uuuguagau 19

<210> 21

<211> 19

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf9

<400> 21

aauuucuacu guuguagau 19

<210> 22

<211> 19

<212> RNA

<213> Artificial Sequence

<220>

<223> Cas-sf10

<400> 22

gauuucuacu uuuguagau 19
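The nucleotide entries above print the sequence in blocks of ten followed by a running residue count, and each entry declares its length under <211>. A minimal sketch of how one such line could be parsed and checked against the declared length is given below; the function name `parse_sequence_line` is hypothetical and is not part of the listing.

```python
def parse_sequence_line(line):
    """Split a line such as 'aauuucuacu guuguagau 19' into (sequence, running count)."""
    # The last whitespace-separated token is the residue count; the rest are 10-nt blocks.
    *blocks, count = line.split()
    return "".join(blocks), int(count)


# SEQ ID No. 8 exactly as printed in the listing, with its declared <211> length.
seq8, count8 = parse_sequence_line("guuuagaagc augcuuuaau uucuacuguu guagau 36")
declared_length_8 = 36

# True only when the joined blocks, the printed count and <211> all agree.
length_ok = len(seq8) == count8 == declared_length_8
```

The same check applies line by line to the multi-line DNA entry (SEQ ID No. 15), where the count on each line is cumulative.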

Claims

1. A Cas protein, characterized in that the Cas protein is any one of the following I-III:

I. the amino acid sequence of the Cas protein has at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of the amino acid sequences of SEQ ID Nos. 1-7, and substantially retains the biological function of the sequence from which it is derived;

II. the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions or additions compared with any one of the amino acid sequences of SEQ ID Nos. 1-7, and substantially retains the biological function of the sequence from which it is derived;

III. the Cas protein comprises the amino acid sequence shown in any one of SEQ ID Nos. 1-7.
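For illustration only, the sequence identity tiers recited in item I can be checked with a short Python sketch. It assumes the two sequences have already been aligned by an external tool (gaps written as `-`) and defines identity as identical columns divided by alignment length; other definitions of the denominator exist, and the claim does not specify one. The function names and the toy sequences in the usage note are hypothetical and are not taken from SEQ ID Nos. 1-7.

```python
# Identity tiers recited in item I of claim 1, in percent.
THRESHOLDS = (80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: identity = identical non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def highest_threshold_met(identity):
    """Return the highest recited tier that the identity value reaches, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None
```

For example, two hypothetical aligned 10-residue sequences that differ at a single position, `"MKTLLVAGRS"` and `"MKTLLVAGRA"`, give 90.0% identity under this definition and therefore reach the "at least 90%" tier but not the "at least 91%" tier.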

2. A fusion protein comprising the Cas protein of claim 1 and other modifying moieties.

3. An isolated polynucleotide, wherein the polynucleotide is a polynucleotide sequence encoding a Cas protein of claim 1, or a polynucleotide sequence encoding a fusion protein of claim 2.

4. A gRNA comprising a direct repeat sequence capable of binding the Cas protein of claim 1 and a guide sequence capable of targeting a target sequence.

5. A direct repeat sequence comprising a sequence as set forth in any one of SEQ ID Nos. 8 to 14 or 16 to 22.

6. A vector comprising the polynucleotide of claim 3 operably linked to a regulatory element.

7. A CRISPR-Cas system, comprising a Cas protein of claim 1 and at least one gRNA of claim 4.

8. A vector system, wherein the vector system comprises one or more vectors comprising:

a) a first regulatory element operably linked to the gRNA of claim 4,

b) a second regulatory element operably linked to the Cas protein of claim 1.

9. A composition, characterized in that the composition comprises:

(i) a protein component selected from: a Cas protein according to claim 1 or a fusion protein according to claim 2;

(ii) a nucleic acid component selected from: the gRNA of claim 4, a nucleic acid encoding the gRNA of claim 4, a precursor RNA of the gRNA of claim 4, or a nucleic acid encoding the precursor RNA of the gRNA of claim 4.

10. An activated CRISPR complex comprising:

(iii) a target sequence bound to the gRNA of claim 4.

11. An engineered host cell comprising the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of claim 3, or the vector of claim 6, or the CRISPR-Cas system of claim 7, or the vector system of claim 8, or the composition of claim 9, or the activated CRISPR complex of claim 10.

12. Use of a Cas protein of claim 1, or a fusion protein of claim 2, or a polynucleotide of claim 3, or a vector of claim 6, or a CRISPR-Cas system of claim 7, or a vector system of claim 8, or a composition of claim 9, or an activated CRISPR complex of claim 10, or a host cell of claim 11 in gene editing, gene targeting, or gene cleavage; alternatively, use in the manufacture of a reagent or kit for gene editing, gene targeting or gene cleavage.

13. Use of a Cas protein of claim 1, or a fusion protein of claim 2, or a polynucleotide of claim 3, or a vector of claim 6, or a CRISPR-Cas system of claim 7, or a vector system of claim 8, or a composition of claim 9, or an activated CRISPR complex of claim 10, or a host cell of claim 11 for any one or more of the following:

targeting and/or editing a target nucleic acid; cleaving double-stranded DNA, single-stranded DNA, or single-stranded RNA; non-specifically cleaving and/or degrading collateral nucleic acids; non-specifically cleaving single-stranded nucleic acids; detecting nucleic acids; specifically editing double-stranded nucleic acids; base-editing double-stranded nucleic acids; base-editing single-stranded nucleic acids.

14. A method of editing, targeting or cleaving a target nucleic acid, the method comprising contacting the target nucleic acid with the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of claim 3, or the vector of claim 6, or the CRISPR-Cas system of claim 7, or the vector system of claim 8, or the composition of claim 9, or the activated CRISPR complex of claim 10, or the host cell of claim 11.

15. A method of cleaving single-stranded nucleic acid, the method comprising contacting a nucleic acid population with the Cas protein of claim 1 and the gRNA of claim 4, wherein the nucleic acid population comprises a target nucleic acid and at least one non-target single-stranded nucleic acid, the gRNA being capable of targeting the target nucleic acid, the Cas protein cleaving the non-target single-stranded nucleic acid.

16. A kit for gene editing, gene targeting or gene cleavage comprising the Cas protein of claim 1, or the fusion protein of claim 2, or the polynucleotide of claim 3, or the vector of claim 6, or the CRISPR-Cas system of claim 7, or the vector system of claim 8, or the composition of claim 9, or the activated CRISPR complex of claim 10, or the host cell of claim 11.

17. A kit for detecting a target nucleic acid in a sample, the kit comprising: (a) the Cas protein of claim 1, or a nucleic acid encoding the Cas protein; (b) the gRNA of claim 4, or a nucleic acid encoding the gRNA, or a precursor RNA comprising the gRNA, or a nucleic acid encoding the precursor RNA; and (c) a single-stranded nucleic acid detector that does not hybridize to the gRNA.

18. Use of a Cas protein of claim 1, or a fusion protein of claim 2, or a polynucleotide of claim 3, or a vector of claim 6, or a CRISPR-Cas system of claim 7, or a vector system of claim 8, or a composition of claim 9, or an activated CRISPR complex of claim 10, or a host cell of claim 11 in the preparation of a formulation or kit for:

(i) gene or genome editing;

(ii) target nucleic acid detection and/or diagnosis;

(iv) treatment of diseases;

(v) targeting a target gene;

(vi) cleaving the target gene.

19. A method of detecting a target nucleic acid in a sample, the method comprising: contacting the sample with the Cas protein of claim 1, a gRNA (guide RNA), and a single-stranded nucleic acid detector, wherein the gRNA comprises a region that binds to the Cas protein and a guide sequence that hybridizes to the target nucleic acid; and detecting a detectable signal generated by cleavage of the single-stranded nucleic acid detector by the Cas protein, thereby detecting the target nucleic acid; wherein the single-stranded nucleic acid detector does not hybridize to the gRNA.