CN114990104A

CN114990104A - Modified sgRNA molecules and uses thereof

Info

Publication number: CN114990104A
Application number: CN202210539746.4A
Authority: CN
Inventors: 梁峻彬; 梁兴祥; 徐辉
Original assignee: Guangzhou Ruifeng Biotechnology Co ltd
Current assignee: Guangzhou Ruifeng Biotechnology Co ltd
Priority date: 2021-11-15
Filing date: 2022-05-18
Publication date: 2022-09-02
Anticipated expiration: 2042-05-18
Also published as: CN114990104B

Abstract

The invention discloses a modified sgRNA molecule and application thereof. The modified sgRNA molecule is obtained by introducing a sequence with a function of initiating nuclear localization in a snoRNA molecule into a nucleotide chain of the sgRNA. The invention develops a novel CRISPR-Cas system, and the modified sgRNA can guide the Cas protein into the nucleus for effective gene editing.

Description

Modified sgRNA molecule and application thereof

Technical Field

The invention relates to the technical field of biology, in particular to a modified sgRNA molecule and application thereof.

Background

The CRISPR/Cas system is an acquired immune system found in most bacteria and most archaea. It recognizes and eliminates foreign plasmids or phages and retains foreign gene fragments in the host genome as immunological memory. Naturally occurring CRISPR-Cas systems fall into two broad classes: class 1 systems use multi-protein complexes for nucleic acid cleavage, whereas class 2 systems perform cleavage using single-protein effector domains. Because of the advantages offered by single-protein effector domains, class 2 systems are the most widely used CRISPR tools in biological research and translational applications. Class 2 is further subdivided into three types, II, V and VI, each using a different type of Cas protein. Among Cas proteins from class 2 systems, certain type II Cas9 and type V Cas12 proteins have RNA-guided DNA endonuclease activity, while type VI Cas13 appears to show preferential RNA targeting and cleavage activity.

Among these, Cas9 and Cas12 effectors from class 2 CRISPR systems are RNA-guided endonucleases that can produce double-strand breaks (DSBs) in a target DNA sequence. The CRISPR/Cas system mainly comprises a Cas protein and a single guide RNA (sgRNA). The Cas protein cuts double-stranded DNA, and the sgRNA acts as a guide: under the guidance of the sgRNA, the Cas protein reaches different target sites through complementary base pairing and cuts the target gene, enabling precise, site-specific gene editing.

At present, the CRISPR/Cas system is the most popular and most widely used gene editing system because it is simple to operate and edits nucleic acid sequences accurately. However, as a nucleic acid editing tool, Cas must enter the cell nucleus and bind chromosomal DNA in order to function. The current mainstream approach to nuclear entry is to fuse the Cas protein to a nuclear localization signal (also called a nuclear localization sequence, NLS), so that the NLS-bearing Cas forms a complex with the sgRNA. The nuclear localization signal then interacts with a nuclear import carrier, which transports the Cas protein into the cell nucleus, where it can function.

snoRNAs (small nucleolar RNAs) are highly abundant small non-coding RNAs in the nucleus. The vast majority of snoRNAs can be classified into two types, box C/D snoRNAs and box H/ACA snoRNAs, both of which have conserved characteristic secondary structures. snoRNAs direct specific nucleoside 2'-O-ribose methylation and pseudouridylation modifications in rRNA, snRNA or tRNA precursors. Various documents report that the box C' and box D sequences in snoRNAs play an important role in snoRNA RNP formation and interaction.

Disclosure of Invention

The invention aims to provide an improved sgRNA molecule and application thereof.

In a first aspect, the invention claims a method of engineering a sgRNA molecule.

The method for modifying the sgRNA molecule, which is claimed by the invention, can comprise the following steps: introducing a sequence with a function of initiating nuclear localization in a snoRNA molecule into a nucleotide chain of the sgRNA to obtain the modified sgRNA molecule.

In some cases, the engineered sgRNA molecule comprises a guide sequence, and a backbone sequence comprising a sequence that functions as a nuclear localization in the snoRNA molecule. The framework sequence of the modified sgRNA molecule can be obtained by inserting or replacing a sequence with a function of nuclear localization in the snoRNA molecule into the framework sequence of the sgRNA molecule before modification.

In some cases, the sequence that functions in nuclear localization in the snoRNA molecule may be selected from the sequences shown in (a1) and/or (a2) below:

(a1) box C' and/or box D sequences from snoRNA;

(a2) box H and/or box ACA sequences from snoRNA.

Further, the sequence for nuclear localization function in the snoRNA molecule is box C' and/or box D sequence.

In some cases, the box C' sequence comprises or is the sequence DGAHBN, where D is U, G or A; H is U, A or C; B is G, U or C; and N may be any ribonucleotide.

In some cases, the box D sequence comprises the sequence NYVWGA or CUGA. Further, the box D sequence comprises the sequence NYVWGA or GGCUGA. Still further, the box D sequence is the sequence NYVWGA, GGCUGA or CUGA. Here, N may be any ribonucleotide; Y is C or U; V is C, G or A; and W is U or A.
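The degenerate box motifs above use IUPAC-style symbols, so a candidate sequence can be checked against them mechanically. The following is a minimal Python sketch and is not part of the patent: the helper names `motif_to_regex` and `find_motif` are hypothetical, and the symbol table is limited to the symbols defined in the text (D, H, B, N, Y, V, W) plus the four plain ribonucleotides.

```python
import re

# Degenerate symbols as defined in the text (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "D": "UGA",   # U, G or A
    "H": "UAC",   # U, A or C
    "B": "GUC",   # G, U or C
    "Y": "CU",    # C or U
    "V": "CGA",   # C, G or A
    "W": "UA",    # U or A
    "N": "ACGU",  # any ribonucleotide
}


def motif_to_regex(motif: str) -> str:
    """Translate a degenerate motif such as 'DGAHBN' into a regex pattern."""
    return "".join(f"[{IUPAC_RNA[symbol]}]" for symbol in motif.upper())


def find_motif(motif: str, rna: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(rna.upper())]


if __name__ == "__main__":
    print(find_motif("DGAHBN", "GAGGAAGA"))  # box C' example from the text
    print(find_motif("NYVWGA", "GGCUGA"))    # box D example vs. degenerate motif
    print(find_motif("CUGA", "GGCUGA"))      # box D example vs. CUGA core
```

In this sketch the 8-nt example GAGGAAGA contains one DGAHBN match (GGAAGA), whereas GGCUGA does not fit NYVWGA and is matched only through its CUGA core, which is consistent with the text listing NYVWGA, GGCUGA and CUGA as separate alternatives.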

In some cases, the sequences that function in nuclear localization in the snoRNA molecule are box C' and box D, and the box C' sequence comprises GAGGAAGA or the box C' sequence is GAGGAAGA, and the box D sequence comprises CUGA or the box D sequence is selected from GGCUGA and CUGA.

In some cases, the sequence that functions as a nuclear localization in the snoRNA molecule comprises GAGGAAGAGCGUCAGCAGGCUGA.

In some cases, the sequence that functions as a nuclear localization in the snoRNA molecule is GAGGAAGAGCGUCAGCAGGCUGA.

In some cases, the snoRNA molecule is selected from any snoRNA comprising a box C' or box D sequence. In some cases, the snoRNA molecule is any one selected from: U3 snoRNA, U8 snoRNA, U14 snoRNA, U15 snoRNA, U16 snoRNA, U20 snoRNA, U21 snoRNA, and U24 to U63 snoRNA.

In some cases, the sequence that functions in nuclear localization in the snoRNA molecule is inserted into, or substituted for, a non-complementary-pairing sequence in the secondary structure of the backbone sequence of the pre-engineered sgRNA molecule (i.e., a portion that does not form intramolecular complementary base pairs). The secondary structure of the framework sequence can be predicted by a person skilled in the art according to a conventional calculation method, or can be determined according to a conventional experimental method. The complementary pairing may be a conventional A-U, C-G base complementary pairing, or may not include other less common base complementary pairing (e.g., G-U, A-A, A-C, A-G, G-G, U-U, U-C pairing).

Further, in some cases, the scaffold sequence secondary structure non-complementary pairing sequence is a loop (loop) sequence of the pre-engineered sgRNA molecule scaffold sequence secondary structure (i.e., a loop in the sgRNA scaffold sequence stem-loop structure).
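To illustrate how candidate loop positions might be located once a secondary structure is available, the following Python sketch scans a structure written in dot-bracket notation. This is an assumption for illustration only: the patent does not prescribe dot-bracket notation or any particular folding tool, and the function name `hairpin_loops`, the toy sequence and the toy structure are all hypothetical.

```python
def hairpin_loops(dot_bracket: str) -> list[tuple[int, int]]:
    """Return (start, end) spans (0-based, half-open) of hairpin loops.

    A hairpin loop is a run of unpaired positions ('.') directly enclosed
    by one base pair, i.e. the loop of a stem-loop structure.
    """
    loops = []
    open_positions = []  # stack of indices of still-unmatched '('
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: extra ')'")
            j = open_positions.pop()
            enclosed = dot_bracket[j + 1:i]
            # Only unpaired positions between the pair: this is a hairpin loop.
            if enclosed and set(enclosed) == {"."}:
                loops.append((j + 1, i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure: extra '('")
    return loops


if __name__ == "__main__":
    toy_sequence = "GGGCGAAAGCCC"   # hypothetical stem-loop, not from the patent
    toy_structure = "((((....))))"  # assumed output of a folding tool
    for start, end in hairpin_loops(toy_structure):
        print(start, end, toy_sequence[start:end])
```

Only hairpin loops are reported here; bulges and internal loops, which are also unpaired regions, would need additional handling.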

The engineered sgRNA molecules of the invention may comprise engineered or unmodified crRNA sequences, as well as engineered or unmodified tracrRNA sequences. For clarity, reference is made to the non-limiting exemplary diagram shown in Fig. 2 (the sequence comprising loop1 and loop2 in Fig. 2 is the sgRNA backbone sequence corresponding to SpCas9). The nuclear localization functional sequence may be linked (inserted or substituted) at the position in the sgRNA secondary structure where the crRNA sequence is chimeric with (i.e., linked to) the tracrRNA sequence, such as the loop1 position in Fig. 2. Alternatively, the nuclear localization functional sequence may be linked (i.e., inserted or substituted) inside the crRNA sequence or inside the tracrRNA sequence of the sgRNA, a non-limiting example being a loop formed by the tracrRNA sequence alone in the secondary structure of the sgRNA, such as the loop2 position in Fig. 2. When the nuclear localization functional sequence is inserted, the number of nucleotides originally belonging to the sgRNA before the alteration is not reduced; e.g., the number of nucleotides originally belonging to that loop is not reduced. When the nuclear localization functional sequence is substituted in, the number of nucleotides originally belonging to the sgRNA before the alteration is reduced; e.g., the number of nucleotides originally belonging to that loop is reduced (for instance, all nucleotides originally belonging to the loop are replaced by the nuclear localization functional sequence).
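The difference between insertion and replacement described above comes down to whether the original loop nucleotides are kept. The Python sketch below shows this at the string level under stated assumptions: the toy scaffold `GGGCGAAAGCCC` and its loop coordinates are hypothetical (they are not the SpCas9 backbone), the function names are illustrative, and only the 23-nt element is taken from the text.

```python
NLS_ELEMENT = "GAGGAAGAGCGUCAGCAGGCUGA"  # 23-nt sequence given in the text


def insert_into_loop(scaffold: str, loop_start: int, loop_end: int,
                     element: str, at: int) -> str:
    """Insert `element` at index `at` inside the loop [loop_start, loop_end).

    All original nucleotides are kept, so the length grows by len(element).
    """
    if not loop_start <= at <= loop_end:
        raise ValueError("insertion point lies outside the loop")
    return scaffold[:at] + element + scaffold[at:]


def replace_loop(scaffold: str, loop_start: int, loop_end: int,
                 element: str) -> str:
    """Replace the whole loop [loop_start, loop_end) with `element`.

    The original loop nucleotides are removed from the result.
    """
    if not 0 <= loop_start < loop_end <= len(scaffold):
        raise ValueError("invalid loop coordinates")
    return scaffold[:loop_start] + element + scaffold[loop_end:]


if __name__ == "__main__":
    toy_scaffold = "GGGCGAAAGCCC"  # hypothetical stem (GGGC/GCCC) with loop GAAA
    loop = (4, 8)                  # 0-based, half-open span of the loop
    inserted = insert_into_loop(toy_scaffold, *loop, NLS_ELEMENT, at=6)
    replaced = replace_loop(toy_scaffold, *loop, NLS_ELEMENT)
    print(len(toy_scaffold), len(inserted), len(replaced))
```

Replacing only part of the loop, which the text also allows, would correspond to calling `replace_loop` with a narrower span inside the loop.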

In some cases, a linker sequence (linker) may or may not be used when introducing a sequence that functions as a nuclear localization in the snoRNA molecule on the nucleotide chain of the sgRNA. The sequence that functions as a nuclear localization in the snoRNA molecule can be linked to the framework sequence of the pre-engineered sgRNA molecule by a linking sequence. Furthermore, one end of a sequence for initiating a nuclear localization function in the snoRNA molecule is connected with the framework sequence of the modified sgRNA molecule through a connecting sequence 1(Linker1), and the other end of the sequence is connected with the framework sequence of the modified sgRNA molecule through a connecting sequence 2(Linker2), wherein the connecting sequence 1 and the connecting sequence 2 can be the same or different. Still further, the nucleotide sequence of the Linker1 is shown as positions 38-42 of SEQ ID No.1 (ggcca); the nucleotide sequence of the Linker2 is shown as 66 th to 75 th positions of SEQ ID No.1 (cugcagggcc).

Further, when 2 or more than 2 sequences with nuclear localization function in the snoRNA molecule exist in the nucleotide chain of the modified sgRNA, the nuclear localization function sequences can be directly connected with each other or connected through a connecting sequence. When the sequence for performing a nuclear localization function in the snoRNA molecule introduced into the nucleotide chain of the sgRNA is two or more box sequences, different box sequences may be directly connected to each other or may be connected to each other through a linker sequence (which may be referred to as linker sequence 3 [Linker3]). The linker sequence here may be the same as or different from the linker sequence linking the nuclear localization functional sequence and the framework sequence of the sgRNA molecule before modification. That is, Linker3 may be the same as or different from Linker1 and Linker2. Furthermore, the nucleotide sequence of the connecting sequence 3 is shown as the 51st to 59th positions of SEQ ID No.1 (GCGUCAGCA).
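The linker positions quoted above can be cross-checked against the box sequences with a short Python sketch. It rests on an assumption that is inferred rather than stated in the text: that the segments are contiguous in the order Linker1, box C', Linker3, box D, Linker2, starting at position 38 of SEQ ID No.1. The box coordinates it computes (43-50 and 60-65) are therefore derived values, and the names `SEGMENTS`, `layout` and `assemble` are hypothetical.

```python
# Segment sequences as given in the text (linkers upper-cased here).
SEGMENTS = [
    ("linker1", "GGCCA"),       # stated: positions 38-42 of SEQ ID No.1
    ("box C'", "GAGGAAGA"),
    ("linker3", "GCGUCAGCA"),   # stated: positions 51-59 of SEQ ID No.1
    ("box D", "GGCUGA"),
    ("linker2", "CUGCAGGGCC"),  # stated: positions 66-75 of SEQ ID No.1
]


def layout(segments, first_position):
    """Return {name: (start, end)} in 1-based inclusive coordinates,
    assuming the segments follow one another without gaps."""
    coords = {}
    position = first_position
    for name, seq in segments:
        coords[name] = (position, position + len(seq) - 1)
        position += len(seq)
    return coords


def assemble(segments):
    """Concatenate the segment sequences in order."""
    return "".join(seq for _, seq in segments)


if __name__ == "__main__":
    for name, span in layout(SEGMENTS, first_position=38).items():
        print(name, span)
    # Box C' + Linker3 + box D reproduces the 23-nt element from the text.
    print(assemble(SEGMENTS[1:4]))
```

Under this assumed ordering, the computed linker coordinates agree with the three ranges stated in the text, and the middle three segments concatenate to GAGGAAGAGCGUCAGCAGGCUGA.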

In a specific embodiment of the invention, the sequences that function in nuclear localization in the snoRNA molecule are box C' and box D in U3 snoRNA. Specifically, the box C' sequence in U3 snoRNA is GAGGAAGA, and the box D sequence is GGCUGA/CUGA. The framework of the sgRNA is the sgRNA framework sequence corresponding to SpCas9.

More specifically, the nucleotide sequence of the modified sgRNA molecule is any one of the following:

(b1) obtained by replacing positions 1-25 of SEQ ID No.1 with a guide sequence for identifying the target nucleic acid,

wherein (b1) has "box C' and box D in U3 snoRNA" substituted at the loop 1 (loop1) position of the sgRNA backbone sequence corresponding to SpCas9, and also has "box C' and box D in U3 snoRNA" substituted at the loop 2 (loop2) position;

(b2) obtained by replacing positions 1-25 of SEQ ID No.5 with a guide sequence for identifying the target nucleic acid,

wherein (b2) has "box C' and box D in U3 snoRNA" substituted at the loop 1 (loop1) position of the sgRNA framework sequence corresponding to SpCas9, and the loop 2 (loop2) position is not modified;

(b3) obtained by replacing positions 1-25 of SEQ ID No.8 with a guide sequence for identifying the target nucleic acid,

wherein (b3) has "box C' and box D in U3 snoRNA" substituted at the loop 2 (loop2) position of the sgRNA framework sequence corresponding to SpCas9, and the loop 1 (loop1) position is not modified.

For clarity, the lengths of the guide sequences described in (b1), (b2) and (b3) above may be varied moderately, and are not limited to a length of only 25 bp.

Wherein, positions 1-25 of SEQ ID No.1, positions 1-25 of SEQ ID No.5 and positions 1-25 of SEQ ID No.8 are guide sequences for identifying the target nucleic acid in the examples (GAACGGCUCGGAGAUCAUCAUUGCG).
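Replacing positions 1-25 of a template with a different guide is a simple coordinate operation, sketched below in Python. The sketch is illustrative only: the full SEQ ID No.1 is not reproduced here, so `PLACEHOLDER_BACKBONE` and the 20-nt `new_guide` are hypothetical stand-ins, and `replace_span` is an assumed helper name. Only the 25-nt example guide is taken from the text.

```python
EXAMPLE_GUIDE = "GAACGGCUCGGAGAUCAUCAUUGCG"  # 25-nt guide from the examples
PLACEHOLDER_BACKBONE = "ACGUACGUACGU"        # hypothetical, NOT the real backbone


def replace_span(seq: str, start: int, end: int, new: str) -> str:
    """Replace positions start..end (1-based, inclusive) of `seq` with `new`.

    `new` may be shorter or longer than the span it replaces, which mirrors
    the statement that the guide length is not limited to 25.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError("span lies outside the sequence")
    return seq[:start - 1] + new + seq[end:]


if __name__ == "__main__":
    template = EXAMPLE_GUIDE + PLACEHOLDER_BACKBONE
    new_guide = "GACCUUGGAACCUUGGAACC"  # hypothetical 20-nt guide
    retargeted = replace_span(template, 1, 25, new_guide)
    print(retargeted)
    print(len(template), len(retargeted))
```

Because the replacement may change the total length, any downstream coordinates quoted relative to the original template shift by the difference in guide length.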

In some cases, the sequence length of the guide sequence of the modified sgRNA molecule of the invention can be more than or equal to 10bp, more than or equal to 15bp, more than or equal to 20bp, more than or equal to 25bp, more than or equal to 30bp or more than or equal to 40bp, and can be less than or equal to 60bp, less than or equal to 50bp, less than or equal to 40bp, less than or equal to 30bp, less than or equal to 25bp, less than or equal to 20bp or less than or equal to 15 bp. Under certain conditions, the sequence length of the guide sequence of the modified sgRNA molecule can be 10bp-50bp, 10bp-40bp, 15bp-35bp, 15bp-30bp, 15bp-25bp, 17bp-24bp or 18bp-22bp, and can also be 20bp-35bp, 25bp-35bp or 28bp-32 bp.

In some cases, the sequence length of the backbone sequence of the modified sgRNA molecules of the invention (comprising the box sequence from snoRNA) may be equal to or greater than 15bp, 20bp, 25bp, 30bp, 40bp, 50bp, 60bp, 70bp, 80bp, 90bp, 100bp, 110bp, 120bp, 130bp, 140bp, 150bp, 160bp, 170bp, 180bp, 200bp, 210bp, 250bp or 300bp. In some cases, the sequence length of the backbone sequence (comprising box sequences from snoRNAs) of the engineered sgRNA molecules described herein can be 10bp-300bp, 20bp-250bp, 30bp-240bp, 50bp-220bp, 80bp-200bp, or 100bp-180bp.

In some cases, the modified sgRNA molecules of the invention may have a sequence length of ≥ 15bp, ≥ 20bp, ≥ 25bp, ≥ 30bp, ≥ 40bp, ≥ 50bp, ≥ 60bp, ≥ 70bp, ≥ 80bp, ≥ 90bp, ≥ 100bp, ≥ 110bp, ≥ 120bp, ≥ 130bp, ≥ 140bp, ≥ 150bp, ≥ 160bp, ≥ 170bp, ≥ 180bp, ≥ 190bp, ≥ 200bp, ≥ 210bp, ≥ 220bp, ≥ 250bp, ≥ 300bp or ≥ 350bp, or ≤ 350bp, ≤ 300bp, ≤ 250bp, ≤ 220bp, ≤ 210bp, ≤ 200bp, ≤ 190bp, ≤ 180bp, ≤ 170bp, ≤ 150bp, ≤ 130bp, ≤ 100bp or ≤ 60bp. Under certain conditions, the sequence length of the modified sgRNA molecule can be 10bp-300bp, 30bp-250bp, 50bp-240bp, 70bp-240bp, 90bp-220bp or 110bp-200bp.

In some cases, the engineered sgRNA molecule can be used in conjunction with a Cas protein for gene targeting or modification, e.g., for eukaryotic cell gene targeting or modification, further for animal cell gene targeting or modification, and yet further for human cell gene targeting or modification. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence. In some cases, the Cas protein contains a nuclear localization sequence.

The engineered sgRNA molecule can direct the Cas protein to the nucleus, and further, can direct the Cas protein to a target nucleic acid within the nucleus. In some cases, the engineered sgRNA can direct Cas protein to the nucleus and target or modify the target nucleic acid. In some cases, the engineered sgRNA can form a complex with the Cas protein, direct the Cas protein to the nucleus, and target or modify the target nucleic acid. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence.

In some cases, the targeting the target nucleic acid consists of one or more of: cleaving one or more target nucleic acids, visualizing or detecting one or more target nucleic acids, labeling one or more target nucleic acids, transporting one or more target nucleic acids, masking one or more target nucleic acids, binding one or more target nucleic acids, increasing the level of transcription and/or translation of a gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of a gene to which the target sequence belongs. Further, in some cases, the targeting the target nucleic acid is binding to the target nucleic acid; in some cases, the targeting the target nucleic acid is cleaving the target nucleic acid.

In some cases, the modifying the target nucleic acid consists of one or more of: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of nucleic acids, demethylation of nucleic acids, and deamination of nucleic acids.

In some cases, the sgRNA includes at least one chemically modified nucleotide, non-limiting examples of which include 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), 2'-fluoro (2'-F) and phosphorothioate (P=S) bond modifications between nucleotides. The chemical modification can be located on any number of nucleotides at any position. In some cases, the sgRNA comprises a modification at the 5' end and/or the 3' end.

In some cases, the engineered sgRNA molecule can additionally add any number of nucleotides for modification, non-limiting examples such as 2 additional guanine nucleotides at the end of the sgRNA guide sequence in patent publication No. CN 104968784B.

In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13. In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 without Cas endonuclease activity, including but not limited to a completely inactivated dead Cas protein (dCas). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to Cas nickases with only single-strand cleavage function, e.g., Cas9 nickase (nCas9) and Cas12 nickase. In some cases, the Cas9 is SpCas9 (Streptococcus pyogenes Cas9).

It is understood that a corresponding method for modifying sgRNA molecules is within the scope of the present invention, as long as the sgRNA molecules modified to include sequences that serve the nuclear localization function in the snoRNA molecule are capable of directing any one specific Cas protein (without the nuclear localization sequence) to the nucleus.

In a second aspect, the invention claims an engineered sgRNA molecule.

In some cases, the engineered sgRNA molecule is prepared by the method described above in the first aspect.

In some cases, the engineered sgRNA molecule comprises a sequence on the nucleotide chain that functions as a nuclear localization in the snoRNA molecule. Further, in some cases, the engineered sgRNA molecule comprises a guide sequence, and a backbone sequence comprising a sequence that functions in nuclear localization in the snoRNA molecule. The backbone sequence of the modified sgRNA molecule can be obtained by inserting or replacing a sequence with a nuclear localization function in the snoRNA molecule into the backbone sequence of the pre-modified sgRNA molecule.

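In some cases, the sequence that functions in nuclear localization in the snoRNA molecule may be selected from the sequences shown in (a1) and/or (a2) below:

(a1) box C' and/or box D sequences from snoRNA;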

(a2) box H and/or box ACA sequences from snoRNA.

Further, the sequence for nuclear localization function in the snoRNA molecule is box C' sequence and/or box D sequence.

In some cases, the sequences that function in nuclear localization in the snoRNA molecule are box C' and box D, and the box C' sequence comprises GAGGAAGA or the box C' sequence is GAGGAAGA, and the box D sequence comprises CUGA or the box D sequence is selected from GGCUGA and CUGA.

In some cases, a linker sequence (linker) may or may not be used when introducing a sequence that functions as a nuclear localization in the snoRNA molecule on the nucleotide chain of the sgRNA. The sequence that functions as a nuclear localization in the snoRNA molecule can be linked to the framework sequence of the pre-engineered sgRNA molecule by a linking sequence. Furthermore, one end of a sequence for initiating a nuclear localization function in the snoRNA molecule is connected with the framework sequence of the modified sgRNA molecule through a connecting sequence 1(Linker1), and the other end of the sequence is connected with the framework sequence of the modified sgRNA molecule through a connecting sequence 2(Linker2), wherein the connecting sequence 1 and the connecting sequence 2 can be the same or different. Still further, the nucleotide sequence of the Linker1 is shown as the 38 th to 42 th positions of SEQ ID No.1 (ggcca); the nucleotide sequence of the Linker2 is shown as 66 th to 75 th positions of SEQ ID No.1 (cugcagggcc).

Further, when 2 or more than 2 sequences with nuclear localization function in the snoRNA molecule exist in the nucleotide chain of the modified sgRNA, the nuclear localization function sequences can be directly connected with each other or connected through a connecting sequence. When the sequence for initiating the nuclear localization function in the snoRNA molecule introduced into the nucleotide chain of the sgRNA is two or more box sequences, different box sequences may be directly connected to each other or may be connected to each other through a Linker sequence (which may be referred to as Linker sequence 3[ Linker3 ]). The linker sequence here may be the same as or different from the linker sequence linking the nuclear localization functional sequence and the framework sequence of the sgRNA molecule before modification. That is, the linker3 may be the same as or different from the linker1 and the linker 2. Furthermore, the nucleotide sequence of the connecting sequence 3 is shown as the 51 st to 59 th positions of SEQ ID No.1 (GCGUCAGCA).

In a particular embodiment of the invention, the sequences that function in nuclear localization in the snoRNA molecule are box C' and box D in U3 snoRNA. Specifically, the box C' sequence in U3 snoRNA is GAGGAAGA, and the box D sequence is GGCUGA/CUGA. The framework of the sgRNA is the sgRNA framework sequence corresponding to SpCas9.

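More specifically, the nucleotide sequence of the modified sgRNA molecule is any one of (b1), (b2) and (b3) as defined in the first aspect above; for example:

(b3) obtained by replacing positions 1-25 of SEQ ID No.8 with a guide sequence for identifying the target nucleic acid.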

Wherein positions 1-25 of SEQ ID No.1, positions 1-25 of SEQ ID No.5 and positions 1-25 of SEQ ID No.8 are guide sequences for identifying a target nucleic acid in the examples (GAACGGCUCGGAGAUCAUCAUUGCG).

In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13. In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 without Cas endonuclease activity, including but not limited to a completely inactivated dead Cas protein (dCas). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to Cas nickases with only single-strand cleavage function, e.g., Cas9 nickase (nCas9) and Cas12 nickase. In some cases, the Cas9 is SpCas9 (Streptococcus pyogenes Cas9).

It is understood that such engineered sgRNA molecules (comprising sequences that serve a nuclear localization function in a snoRNA molecule) are within the scope of the present invention, provided that any one particular Cas protein (not comprising a nuclear localization sequence) can be directed to the nucleus of the cell.

In a third aspect, the invention claims a DNA molecule encoding the engineered sgRNA molecule of the second aspect.

In a particular embodiment of the invention, the DNA molecule is any one of:

(c1) obtained by replacing positions 1-25 of SEQ ID No.2 with a DNA sequence corresponding to the guide sequence;

(c2) obtained by replacing positions 1-25 of SEQ ID No.6 with a DNA sequence corresponding to the guide sequence;

(c3) obtained by replacing positions 1-25 of SEQ ID No.9 with a DNA sequence corresponding to the guide sequence;

wherein (c1) - (c3) correspond to (b1) - (b3) above in sequence.

Wherein positions 1-25 of SEQ ID No.2, positions 1-25 of SEQ ID No.6 and positions 1-25 of SEQ ID No.9 are the DNA sequences corresponding to the guide sequence for identifying the target nucleic acid in the examples (GAACGGCTCGGAGATCATCATTGCG).
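The relationship between the RNA guide and its "corresponding DNA sequence" is a U-to-T substitution on the same strand, which the following Python sketch makes explicit. The function names `rna_to_dna` and `dna_to_rna` are illustrative; the two 25-nt sequences are the ones quoted in the text.

```python
def rna_to_dna(rna: str) -> str:
    """Return the DNA sequence corresponding to an RNA sequence (U -> T)."""
    rna = rna.upper()
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"not an RNA sequence: {sorted(unexpected)}")
    return rna.replace("U", "T")


def dna_to_rna(dna: str) -> str:
    """Return the RNA sequence corresponding to a DNA sequence (T -> U)."""
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"not a DNA sequence: {sorted(unexpected)}")
    return dna.replace("T", "U")


if __name__ == "__main__":
    guide_rna = "GAACGGCUCGGAGAUCAUCAUUGCG"  # guide sequence from the examples
    print(rna_to_dna(guide_rna))
```

Applied to the example guide, the conversion yields the DNA sequence quoted for positions 1-25 of SEQ ID No.2, No.6 and No.9.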

In a fourth aspect, the invention claims an expression cassette, an expression vector, a recombinant bacterium or a transgenic cell line comprising a DNA molecule as described in the third aspect above.

The expression vector may comprise any regulatory element operably linked to the DNA molecule. In some cases, the regulatory element is a promoter and/or enhancer. In some cases, the regulatory element is a promoter.

In a particular embodiment of the invention, the promoter in the expression cassette that initiates transcription of the DNA molecule is the U6 promoter.

More specifically, the expression cassette is any one of:

(d1) obtained by replacing positions 250-274 of SEQ ID No.3 with a DNA sequence corresponding to the guide sequence;

(d2) obtained by replacing positions 250-274 of SEQ ID No.7 with a DNA sequence corresponding to the guide sequence;

(d3) obtained by replacing positions 250-274 of SEQ ID No.10 with a DNA sequence corresponding to the guide sequence;

wherein (d1) - (d3) correspond to the above (c1) - (c3) in sequence.

Wherein the positions 250-274 of SEQ ID No.3, 250-274 of SEQ ID No.7 and 250-274 of SEQ ID No.10 are DNA sequences corresponding to the guide sequences for identifying the target nucleic acid in the examples (GAACGGCTCGGAGATCATCATTGCG).

Accordingly, the expression vector may be an expression vector comprising an expression cassette as described hereinbefore.

In a specific embodiment of the present invention, the expression vector is a recombinant vector obtained by replacing a small fragment between the cleavage sites Kpn I and Not I of the pX601 vector with the expression cassette described above.
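At the sequence level, replacing the small fragment between two restriction sites can be sketched as below. This is a simplified string model under stated assumptions, not the cloning protocol of the patent: it uses the standard recognition sequences of Kpn I (GGTACC) and Not I (GCGGCCGC), keeps both sites intact around the insert, ignores overhang chemistry, and operates on a hypothetical toy vector because the pX601 sequence is not given here. The names `find_sites` and `swap_fragment` are illustrative.

```python
import re

KPN_I = "GGTACC"    # Kpn I recognition sequence
NOT_I = "GCGGCCGC"  # Not I recognition sequence


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site`."""
    # The lookahead also catches overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def swap_fragment(vector: str, cassette: str,
                  left_site: str = KPN_I, right_site: str = NOT_I) -> str:
    """Replace the fragment between two unique sites with `cassette`.

    Both recognition sites are retained on either side of the cassette.
    """
    left = find_sites(vector, left_site)
    right = find_sites(vector, right_site)
    if len(left) != 1 or len(right) != 1:
        raise ValueError("each site must occur exactly once in this sketch")
    left_end = left[0] + len(left_site)
    right_start = right[0]
    if right_start < left_end:
        raise ValueError("right site must lie downstream of the left site")
    return vector[:left_end] + cassette + vector[right_start:]


if __name__ == "__main__":
    toy_vector = "AAAA" + KPN_I + "TTTTTT" + NOT_I + "CCCC"  # hypothetical
    toy_cassette = "GAACGG"                                  # hypothetical
    print(swap_fragment(toy_vector, toy_cassette))
```

A real construct would also have to account for the cut positions within each site and for the ends carried by the cassette, which this sketch deliberately leaves out.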

In a fifth aspect, the invention claims a kit.

The kit claimed in the present invention may comprise any one of the following:

i. A Cas protein, and an engineered sgRNA molecule as described in the second aspect above.

ii. An expression vector comprising a nucleotide sequence encoding a Cas protein (denoted as expression vector 1), and an engineered sgRNA molecule as described in the second aspect above.

iii. A Cas protein, and an expression vector comprising a nucleotide sequence encoding the engineered sgRNA molecule described in the second aspect above (denoted as expression vector 2).

iv. An expression vector comprising a nucleotide sequence encoding a Cas protein (i.e., expression vector 1), and an expression vector comprising a nucleotide sequence encoding an engineered sgRNA molecule as described in the second aspect above (i.e., expression vector 2).

v. An expression vector comprising a nucleotide sequence encoding a Cas protein and a nucleotide sequence encoding the engineered sgRNA molecule described in the second aspect above (denoted as expression vector 3).

In some cases, the Cas protein does not contain a nuclear localization sequence.

In some cases, the Cas protein contains a nuclear localization sequence.

In a sixth aspect, the invention claims a composition selected from any one of:

I. A composition comprising: a Cas protein, and an engineered sgRNA molecule as described in the second aspect above;

II. A composition comprising: a nucleic acid molecule 1 encoding a Cas protein (such as expression vector 1 described in the fifth aspect above), and an engineered sgRNA molecule as described in the second aspect above;

III. A composition comprising: a Cas protein, and a nucleic acid molecule 2 encoding an engineered sgRNA molecule as described in the second aspect above (such as expression vector 2 described in the fifth aspect above);

IV. A composition comprising: a nucleic acid molecule 1 encoding a Cas protein (such as expression vector 1 described in the fifth aspect above), and a nucleic acid molecule 2 encoding an engineered sgRNA molecule as described in the second aspect above (such as expression vector 2 described in the fifth aspect above);

V. A composition comprising: a nucleic acid molecule 3 encoding a Cas protein and the engineered sgRNA molecule described in the second aspect above (such as expression vector 3 described in the fifth aspect above).

In some cases, the Cas protein contains a nuclear localization sequence.

In a seventh aspect, the invention claims an RNP complex formed by a Cas protein and an engineered sgRNA molecule as described in the second aspect above.

In some cases, the Cas protein contains a nuclear localization sequence.

In the above fifth to seventh aspects, the Cas protein may be selected from Cas9, Cas12 and Cas13.

In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 without Cas endonuclease activity, including but not limited to a completely inactivated dead Cas protein (dCas). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to Cas nickases with only single-strand cleavage function, e.g., Cas9 nickase (nCas9) and Cas12 nickase.

In a specific embodiment of the invention, the Cas protein is Streptococcus pyogenes Cas9 (SpCas9).

In item v of the above fifth aspect, the expression vector 3 can be a recombinant vector obtained by inserting a gene encoding a Cas protein (selected from Cas9, Cas12 or Cas13; specifically SpCas9) that does not contain a nuclear localization signal (NLS) into the expression vector of the above fourth aspect.

In a specific embodiment of the present invention, the expression vector 3 is any one of:

(e1) the complete sequence obtained by replacing positions 5365-5389 of SEQ ID No.4 with a DNA sequence corresponding to the guide sequence;

(e2) the complete sequence obtained by replacing positions 5365-5389 of SEQ ID No.11 with a DNA sequence corresponding to the guide sequence;

(e3) the complete sequence obtained by replacing positions 5365-5389 of SEQ ID No.12 with a DNA sequence corresponding to the guide sequence.

Wherein (e1) - (e3) correspond to the foregoing (d1) - (d3) in sequence.

Wherein positions 5365-5389 of SEQ ID No.4, of SEQ ID No.11 and of SEQ ID No.12 are each the DNA sequence (GAACGGCTCGGAGATCATCATTGCG) corresponding to the guide sequence used in the examples to recognize the target nucleic acid.
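The coordinate convention used in (e1) to (e3) can be illustrated with a minimal Python sketch that swaps a 1-based, inclusive range of a DNA string for a new guide-encoding sequence. The function name `replace_positions`, the toy vector string and the replacement guide are hypothetical and are used for illustration only; the toy string is not SEQ ID No.4.

```python
def replace_positions(seq: str, start: int, end: int, new: str) -> str:
    """Replace the 1-based, inclusive range start..end of seq with new."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    return seq[:start - 1] + new + seq[end:]


# Guide-encoding DNA used in the examples (25 nt, matching a 5365-5389 span).
EXAMPLE_GUIDE = "GAACGGCTCGGAGATCATCATTGCG"

# Toy stand-in for a vector: 5 nt flank + 25 nt guide + 5 nt flank (hypothetical).
toy_vector = "ccacc" + EXAMPLE_GUIDE.lower() + "gtttt"

# Hypothetical replacement guide; the replacement may differ in length.
new_guide = "A" * 20
edited = replace_positions(toy_vector, 6, 30, new_guide)
print(edited)
```

In this sketch the flanking sequence is left untouched, so only the guide-encoding positions change when a different target is chosen.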

In an eighth aspect, the invention claims the use of an engineered sgRNA molecule as described in the second aspect above, a DNA molecule as described in the third aspect above, an expression cassette, an expression vector, a recombinant bacterium or a transgenic cell line as described in the fourth aspect above, a kit as described in the fifth aspect above, a composition as described in the sixth aspect above or an RNP complex as described in the seventh aspect above in any one of:

p1, targeting or modifying a genomic target nucleic acid; and

p2, products prepared for targeting or modifying genomic target nucleic acids.

Wherein the targeting or modification of the genomic target nucleic acid may be gene targeting or modification in eukaryotic cells, further in animal cells, and still further in human cells.

In some cases, the product for targeting or modifying a genomic target nucleic acid is a medicament for treating a disease in an animal, including but not limited to a human subject.

In a ninth aspect, the invention claims a method of targeting or modifying a genomic target nucleic acid.

The claimed method of targeting or modifying a genomic target nucleic acid can comprise: introducing the composition described in the sixth aspect into an organism or a cell of an organism, so that both the Cas protein and the modified sgRNA molecule are expressed and the genomic target nucleic acid is targeted or modified.

In a tenth aspect, the invention claims a method for preparing a mutant of a biological cell.

The claimed method for preparing a mutant of a biological cell may comprise: the genome of the biological cell is targeted or modified according to the method of the ninth aspect to obtain a mutant of the biological cell.

Wherein the biological cell can be a eukaryotic cell, further an animal cell, and still further a human cell.

In a specific embodiment of the invention, the biological cell is a 293T cell.

In an eleventh aspect, the invention claims a method of making a biological mutant.

The claimed method for preparing a biological mutant may comprise: targeting or modifying the genome of the organism according to the method described in the ninth aspect above to obtain the biological mutant.

Wherein the organism may be a eukaryote, further an animal, further a mammal, such as a human.

The invention has the beneficial effects that:

1. In the prior art, a nuclear localization sequence is usually linked to the Cas protein in practical applications to help localize the Cas protein to the nucleus. The invention develops a different, novel CRISPR-Cas system, which is a new technical solution and can effectively accomplish gene editing. Without being bound by theory, one skilled in the art can reasonably speculate that the sgRNA of the invention forms a complex with a Cas protein and subsequently enters the nucleus through interaction of the snoRNA-derived nuclear localization functional sequence with its associated proteins. In theory, when the Cas protein is not linked to a nuclear localization sequence (NLS), it can be transported into the nucleus through the nuclear localization effect of the sgRNA carrying the snoRNA nuclear localization functional sequence, so that the amount of Cas9 entering the nucleus may be reduced and off-target effects may thereby be reduced.

2. Introducing a C'/D box sequence into a loop portion of the sgRNA backbone can effectively guide the Cas protein into the nucleus for gene editing.

3. Editing activity is relatively low when a total of 2 or more C'/D box sequences are introduced into the loop portions of the sgRNA backbone, and is higher when only 1 C'/D box sequence (i.e., only 1 C' box and 1 D box) is introduced. This is the opposite of gene editing that relies on an NLS-linked Cas protein to enter the nucleus, where in practice more NLS copies give higher editing efficiency. The corresponding technical solution therefore achieves an unexpected technical effect.

4. For sgRNAs carrying only one C'/D box, editing efficiency was higher when the C'/D box was attached at the loop2 position shown in FIG. 2 (distal to the guide sequence) than when it was attached at the loop1 position (proximal to the guide sequence).

Drawings

FIG. 1 shows an exemplary box C'/D sequence of the U3 snoRNA molecule, taken from Narayanan A, Speckmann W, Terns R, Terns MP. Role of the box C/D motif in localization of small nucleolar RNAs to coiled bodies and nucleoli. Mol Biol Cell. 1999 Jul;10(7):2131-47. doi: 10.1091/mbc.10.7.2131. PMID: 10397754; PMCID: PMC25425.

FIG. 2 shows the molecular structure of an sgRNA corresponding to SpCas9 and containing a specific backbone sequence, indicating the crRNA sequence and the tracrRNA sequence of the sgRNA before modification. It also shows 2 of the many sites (loop1 and loop2) at which the snoRNA nuclear localization functional sequence can be inserted or substituted. Loop1 is the junction of the crRNA and tracrRNA, immediately adjacent to stem 1 formed by complementary base pairing, where a nuclear localization functional sequence may be attached. Loop2 lies within the tracrRNA sequence, immediately adjacent to stem 2 formed by complementary base pairing, where a nuclear localization functional sequence may also be attached. The crRNA contains the guide sequence; each N in the guide sequence represents any ribonucleotide, and the ellipsis indicates that the number of ribonucleotides in the guide sequence may vary as appropriate.

FIG. 3 is a schematic diagram of the target vector C'/D box-PAM.

FIG. 4 is a control vector SpCas9-PAM plasmid map.

FIG. 5 is a plasmid map of the lentiviral vector pGFPPAM.

FIG. 6 shows the results of measuring the proportion of GFP-positive cells by flow cytometry in example 1.

FIG. 7 is a schematic diagram of the target vector C'/D box-1-1.

FIG. 8 is a schematic diagram of the target vector C'/D box-1-2.

FIG. 9 shows the results of flow cytometry for detecting the proportion of GFP-positive cells in example 2.

Detailed Description

Definitions:

As used herein, the term "Cas protein", or a protein or polypeptide having "Cas enzymatic activity" or "Cas endonuclease activity", relates to a CRISPR-associated (Cas) polypeptide or protein encoded by a CRISPR-associated (Cas) gene, which Cas protein or polypeptide is capable of being directed to a target sequence in a target nucleic acid and of targeting or modifying the target nucleic acid when complexed or functionally combined with one or more guide RNAs (sgRNA molecules). Guided by the sgRNA, the Cas endonuclease recognizes, targets, or modifies a specific target site (the target sequence or a nucleotide sequence near the target sequence) in the target nucleic acid.

As used herein, the term "sgRNA" (single guide RNA) refers to a single guide RNA used together with a Cas protein. The sgRNA is a fusion of crRNA and tracrRNA and comprises a guide sequence; or the sgRNA comprises a crRNA sequence and a guide sequence and does not comprise a tracrRNA sequence.

As used herein, the term "guide sequence" is used interchangeably with "targeting domain" and refers to a contiguous nucleotide sequence in a sgRNA that has partial or complete complementarity to a target sequence in a target nucleic acid and can hybridize to the target sequence in the target nucleic acid through base-complementary pairing facilitated by the Cas protein. Complete complementarity of the guide sequences described herein to the target sequence is not required, so long as there is sufficient complementarity to cause hybridization and promote formation of a CRISPR/Cas complex.

As used herein, the term "framework sequence" (used interchangeably with "backbone sequence"), when referring to an sgRNA, means the nucleotide sequences in the sgRNA other than the guide sequence. For example, it can include the sequence between the guide sequence and the poly-U that corresponds to the transcription terminator in the sgRNA. The backbone sequence generally does not change when the target sequence changes. Thus, the backbone sequence may be any feasible sequence.

As used herein, the term "target nucleic acid" may comprise any polynucleotide, such as DNA (target DNA) or RNA (target RNA). By "target nucleic acid" is meant a nucleic acid that the sgRNA directs the Cas protein to target or modify. The term "target nucleic acid" can be any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, a "target nucleic acid" can be a polynucleotide present in a eukaryotic cell, and can be a sequence (or portion thereof) that encodes a gene product (e.g., a protein) or a non-coding sequence (or portion thereof). In certain instances, a "target nucleic acid" can include one or more disease-associated genes and polynucleotides as well as signaling biochemical pathway-associated genes and polynucleotides.

As used herein, the term "target sequence" refers to a short nucleotide sequence in a target nucleic acid molecule that is complementary (fully or partially) to, or hybridizes with, the guide sequence of an sgRNA molecule. The target sequence is often tens of bp in length and may be, for example, about 10 bp, about 20 bp, about 30 bp, about 40 bp, about 50 bp, or about 60 bp.

As used herein, the term "targeted" is defined as consisting of one or more of the following: cleaving one or more target nucleic acids, visualizing or detecting one or more target nucleic acids, labeling one or more target nucleic acids, transporting one or more target nucleic acids, masking one or more target nucleic acids, binding one or more target nucleic acids, increasing the level of transcription and/or translation of a gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of a gene to which the target sequence belongs.

As used herein, the term "modified" is defined as consisting of one or more of the following: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of nucleic acids, demethylation of nucleic acids, and deamination of nucleic acids.

As used herein, the term "cleavage" refers to the breaking of a covalent bond (e.g., a covalent phosphodiester bond) in the ribosyl phosphodiester backbone of a polynucleotide, including but not limited to: cleavage of a single-stranded polynucleotide; cleavage of either single strand of a double-stranded polynucleotide comprising two complementary single strands; and cleavage of both single strands of a double-stranded polynucleotide comprising two complementary single strands.

As used herein, "a sequence that functions as a nuclear localization function in a snoRNA molecule" means a nucleotide sequence or element of the snoRNA molecule that plays an important role in nuclear localization, in particular a nucleotide sequence that plays an important role in nucleolar localization; non-limiting examples include the C' box and the D box.

As used herein, the terms C 'box and box C' are used interchangeably; the terms D box and box D are used interchangeably.

As used herein, the term "nuclear localization signal" or nuclear localization sequence (NLS) is an amino acid sequence that serves as a tag for the transport of proteins across the nuclear envelope and into the nucleus.

As used herein, the term "snoRNA" (Small nucleolar RNA) is a large class of eukaryotic RNAs that play a role in the biogenesis of ribosomes within the nucleoli.

As used herein, the term "engineered" when referring to a sgRNA includes altering the nucleotide sequence of the sgRNA resulting in an engineered sgRNA molecule.

As used herein, the term "non-complementary pairing sequence" when referring to the backbone sequence secondary structure of a sgRNA molecule refers to nucleotides in the backbone sequence secondary structure of the sgRNA that do not form intramolecular base-complementary pairings. The secondary structure of the framework sequence is predicted by a person skilled in the art according to a conventional calculation method, or determined according to a conventional experimental method.

As used herein, the terms "crRNA" and "tracrRNA" have the meanings that are commonly recognized by those skilled in the art, respectively.

As used herein, when referring to sgRNAs, "loop" has the meaning commonly understood by those skilled in the art, and often refers to the loop in a stem-loop structure of an RNA, in which complementary bases pair to form a stem while the portion that cannot pair bulges out to form a loop.

As used herein, "sgRNA" and "sgRNA molecule" are used interchangeably.

As used herein, the term "replacement" when referring to a sequence that functions as a nuclear localization in a snoRNA molecule, refers to the replacement of a fragment consisting of 1 nucleotide or more than 1 contiguous nucleotide of the sgRNA molecule prior to engineering with a nuclear localization function sequence.

As used herein, the term "insertion" when referring to a sequence that functions in nuclear localization in a snoRNA molecule refers to the insertion of a nuclear localization function sequence only into the sgRNA sequence without deleting nucleotides of the sgRNA molecule prior to the alteration.

As used herein, one instance of the "directing Cas protein to the nucleus" is that the engineered sgRNA of the invention can be transported into the nucleus, and Cas protein is also transported into the nucleus simultaneously via the sgRNA. Generally, the one or more snoRNA nuclear localization functional sequences are of sufficient strength to drive accumulation of the Cas protein in a detectable amount in the nucleus of the eukaryotic cell. Detecting whether the Cas protein is directed to the nucleus or detecting the amount of accumulation of the sgRNA and Cas protein in the nucleus can be performed by any suitable technique. For example, a detectable label can be fused to the sgRNA or Cas protein such that the location within the cell is visualized, such as in conjunction with means for detecting the location of the nucleus. The nuclei may also be isolated from the cells and their contents may then be analyzed by any suitable method for detecting RNA or protein, including but not limited to methods such as immunohistochemistry, western blot or enzymatic activity assays, and the like. Accumulations in the nucleus can also be determined indirectly, such as by measuring the effect of targeting or modification on the target nucleic acid (e.g., measuring DNA cleavage or mutation at the target sequence, or measuring changes in the level of transcription or translation of the gene to which the target sequence belongs).

As used herein, the term "operably linked" is intended to mean that the Cas protein coding sequence or the sgRNA coding sequence in the vector is linked to one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).

As used herein, the term "regulatory element" is intended to include promoters, enhancers, Internal Ribosome Entry Sites (IRES), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences). Regulatory elements include those that direct the continuous expression of a nucleotide sequence in many types of host cells and those that direct the expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences).

The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The examples provided below serve as a guide for further modifications by a person skilled in the art and do not constitute a limitation of the invention in any way.

The experimental procedures in the following examples, unless otherwise indicated, are conventional and are carried out according to the techniques or conditions described in the literature in the field or according to the instructions of the products. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.

The box C'/D of the U3 snoRNA molecule described in Mol Biol Cell. 1999 Jul;10(7):2131-2147 is shown in FIG. 1. In the following specific examples of the invention, the box C'/D of the U3 snoRNA molecule is introduced into an sgRNA molecule, i.e., the sequence GAGGAAGAGCGUCAGCAGGCUGA in FIG. 1 is introduced into an sgRNA molecule, wherein GAGGAAGA is the box C' sequence and GGCUGA is the box D sequence.
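As a minimal sketch of how the element just described decomposes, the Python snippet below splits the 23-nt box C'/D sequence into the box C' sequence, the nucleotides lying between the two boxes, and the box D sequence. The function name `split_box_cd` and the label "spacer" for the intervening nucleotides are assumptions made for illustration and do not appear in the source.

```python
BOX_CD = "GAGGAAGAGCGUCAGCAGGCUGA"  # U3 snoRNA box C'/D element (FIG. 1)
BOX_C_PRIME = "GAGGAAGA"            # box C' sequence
BOX_D = "GGCUGA"                    # box D sequence


def split_box_cd(element: str, box_c: str, box_d: str):
    """Return (box C', spacer, box D) for an element that starts with box C'
    and ends with box D; 'spacer' is an illustrative label only."""
    if not element.startswith(box_c) or not element.endswith(box_d):
        raise ValueError("element is not flanked by the given boxes")
    spacer = element[len(box_c):len(element) - len(box_d)]
    return box_c, spacer, box_d


box_c, spacer, box_d = split_box_cd(BOX_CD, BOX_C_PRIME, BOX_D)
print(box_c, spacer, box_d, len(BOX_CD))
```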

In the experimental design of the examples, the sgRNA backbone sequence corresponding to SpCas9 was engineered (see FIG. 2). The box C'/D of the U3 snoRNA molecule can be substituted in at the loop1 and/or loop2 positions, with the box C'/D of the U3 snoRNA molecule and the remaining sgRNA backbone joined by linking sequences (ggcca, ctgcaggcaggcc).

The engineered sgRNA molecules of the invention may comprise a crRNA portion and a tracrRNA portion, and the nuclear localization functional sequence may be linked within the crRNA sequence or within the tracrRNA sequence, or the nuclear localization functional sequence may be linked at a chimeric position of the crRNA and the tracrRNA, such as at loop1 in fig. 2.

The box C'/D sequence of the U3 snoRNA molecule was joined to the sgRNA backbone of SpCas9 (one U3 snoRNA box C'/D at each of the loop1 and loop2 positions of the sgRNA backbone, as shown in FIG. 2) to form the following sequence:

guuuuagagcuaggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccuagcaaguuaaaauaaggcuaguccguuaucaacuuggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccaaguggcaccgagucggugc.

Wherein the upper-case sequence is the U3 snoRNA box C'/D; the lower-case, underlined sequence is the backbone sequence of the sgRNA corresponding to SpCas9 before modification; and the lower-case sequence without underlining is the linking sequence (joining the U3 snoRNA box C'/D to the sgRNA backbone).

Joining the U3 snoRNA box C'/D to the sgRNA corresponding to Cas9 forms an sgRNA with a nuclear entry function, which guides the Cas9-sgRNA complex into the nucleus.
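The assembly described above can be sketched in Python as follows. The snippet joins the guide sequence of Example 1, three backbone segments, and two copies of linker + box C'/D + linker, then compares the result with SEQ ID No.1 from the sequence listing. The segment boundaries are assumptions: the underlining that distinguishes backbone from linker is not reproduced in this text, so the 5' linker is taken as the "ggcca" named above and the 3' linker is read off as the remaining lower-case run after each box. The function names are hypothetical.

```python
GUIDE = "GAACGGCUCGGAGAUCAUCAUUGCG"   # guide sequence used in Example 1
BOX_CD = "GAGGAAGAGCGUCAGCAGGCUGA"    # U3 snoRNA box C'/D
LINK_5 = "ggcca"                      # 5' linking sequence (named in the text)
LINK_3 = "cugcagggcc"                 # 3' linking sequence (assumed boundary)

# Backbone segments left after the two loops are replaced (assumed boundaries).
SCAFFOLD_PARTS = (
    "guuuuagagcua",
    "uagcaaguuaaaauaaggcuaguccguuaucaacuu",
    "aaguggcaccgagucggugc",
)

# SEQ ID No.1 as printed in the sequence listing (whitespace is removed below).
SEQ_ID_NO_1 = """
gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag
gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu uggccagagg
aagagcguca gcaggcugac ugcagggcca aguggcaccg agucggugc
"""


def assemble_sgrna(guide, scaffold_parts, box, link_5, link_3):
    """Join the scaffold segments with linker + box + linker, after the guide."""
    insert = link_5 + box + link_3
    return guide + insert.join(scaffold_parts)


def rna_to_dna(rna: str) -> str:
    """Return the lower-case DNA sequence corresponding to an RNA sequence."""
    return rna.lower().replace("u", "t")


sgrna = assemble_sgrna(GUIDE, SCAFFOLD_PARTS, BOX_CD, LINK_5, LINK_3)
listing = "".join(SEQ_ID_NO_1.split())
print(len(sgrna), sgrna.lower() == listing)
print(rna_to_dna(sgrna))
```

Under these assumptions the assembled molecule has the same length and sequence as SEQ ID No.1, and the U-to-T conversion gives the corresponding DNA sequence.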

Example 1 Verification of the intracellular editing activity of box C'/D-Cas9

1. Construction of verification vectors

Construction of verification vector C'/D box-PAM

The sgRNA expression cassette sequence (SEQ ID No.3) containing the U3 snoRNA box C'/D was synthesized by a reagent company. Positions 1-249 of SEQ ID No.3 are the U6 promoter, followed by the coding sequence of the modified sgRNA molecule (positions 250-274 encode the guide sequence). The expression cassette encodes an sgRNA molecule containing the guide sequence GAACGGCUCGGAGAUCAUCAUUGCG, which targets the cell validation library, and a framework sequence in which 2 box C'/D have been substituted. The sequence of the sgRNA molecule is shown in SEQ ID No.1, and the DNA sequence corresponding to the sgRNA molecule is shown in SEQ ID No.2.
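The coordinates given for SEQ ID No.3 can be checked with a short Python sketch that slices the cassette by 1-based, inclusive positions. The sequence text is copied from the sequence listing; the helper name `region` is hypothetical.

```python
# SEQ ID No.3 as printed in the sequence listing (whitespace is removed below).
SEQ_ID_NO_3 = """
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctaggcc agaggaagag
cgtcagcagg ctgactgcag ggcctagcaa gttaaaataa ggctagtccg ttatcaactt
ggccagagga agagcgtcag caggctgact gcagggccaa gtggcaccga gtcggtgctt
ttttt
"""
cassette = "".join(SEQ_ID_NO_3.split())


def region(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq (1-based, inclusive)."""
    return seq[start - 1:end]


u6_promoter = region(cassette, 1, 249)
guide_dna = region(cassette, 250, 274)
downstream = region(cassette, 275, len(cassette))

print(len(cassette), len(u6_promoter), guide_dna.upper())
print(downstream[-7:])  # run of T residues at the 3' end of the cassette
```

The slice at positions 250-274 reproduces the DNA form of the guide sequence, and the cassette ends in a run of T residues, consistent with the poly-U transcription terminator mentioned in the definitions.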

The expression cassette sequence (SEQ ID No.3) was assembled onto the backbone of the pX601 vector (commercially available) via the Kpn I and Not I restriction sites, and the intermediate vector C'/D box-pre was obtained after sequencing verification.

In addition, an SpCas9 fragment containing no NLS was amplified by PCR, using the vector pX459 (commercially available) as the template, with the following primers:

F: 5'-gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT-3';

R: 5'-atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG-3'.

The PCR product verified as correct by sequencing was ligated into the intermediate vector C'/D box-pre via the Age I and EcoR I sites, and the target vector C'/D box-PAM was obtained after sequencing verification.
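As a quick consistency check on the cloning step, the following Python sketch looks for the Age I and EcoR I recognition sequences (ACCGGT and GAATTC) in the two primers. The function name `find_sites` is hypothetical; the check only confirms that each primer carries the site used for ligation and says nothing about amplification efficiency.

```python
SITES = {"AgeI": "ACCGGT", "EcoRI": "GAATTC"}  # recognition sequences

PRIMER_F = "gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT"
PRIMER_R = "atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG"


def find_sites(primer: str, sites: dict) -> dict:
    """Map each enzyme found in the primer to the 1-based start of its site."""
    upper = primer.upper()
    return {
        name: upper.find(site) + 1
        for name, site in sites.items()
        if site in upper
    }


forward_sites = find_sites(PRIMER_F, SITES)
reverse_sites = find_sites(PRIMER_R, SITES)
print("F:", forward_sites)
print("R:", reverse_sites)
```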

The complete sequence of the target vector C'/D box-PAM is shown in SEQ ID No.4.

A schematic diagram of the target vector C'/D box-PAM is shown in FIG. 3.

2. Construction of control vector SpCas9-PAM

A control vector SpCas9-PAM was constructed that entered the nucleus using 2 x NLS.

The sequence GAACGGCTCGGAGATCATCATTGCG, which targets the cell validation library, was ligated into the SpCas9-expressing vector backbone pX459 via the Bbs I site to construct the control vector SpCas9-PAM. The complete plasmid sequence of the control vector SpCas9-PAM is shown in SEQ ID No.13.

The plasmid map of the control vector SpCas9-PAM is shown in FIG. 4.

3. Vector transfection of 293T library cells

With reference to the method of Hu Z, Wang D, Zhang C, et al. (Diverse noncanonical PAMs recognized by SpCas9 in human cells [J]. bioRxiv, 2019: 671503), 293T library cells containing the target site recognized by the box C'/D-containing sgRNA were constructed. The library carries a GFP reporter whose reading frame is disrupted by a frameshift; frameshift mutations introduced into the expression cassette by targeted editing can cause originally non-fluorescent cells to fluoresce, so the editing effect can be judged from the proportion of fluorescent cells.
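The reporter principle can be illustrated with a small Python sketch of reading-frame arithmetic: if the GFP coding sequence starts out of frame by some number of nucleotides, an insertion or deletion restores the frame only when the combined shift is a multiple of 3. The function name `restores_frame` and the starting offset of 1 are hypothetical; the actual offset of the library reporter is not stated here.

```python
def restores_frame(initial_offset: int, indel: int) -> bool:
    """Return True if a net indel brings a shifted reporter back into frame.

    initial_offset: nucleotides by which the reporter is out of frame (1 or 2).
    indel: net length change from editing (+ insertion, - deletion).
    """
    return (initial_offset + indel) % 3 == 0


# Hypothetical reporter that is out of frame by 1 nt.
for indel in range(-4, 5):
    print(f"indel {indel:+d}: in frame = {restores_frame(1, indel)}")
```

Only a subset of editing outcomes restores fluorescence under this logic, so the GFP-positive proportion would be expected to understate the total editing rate.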

The specific method for constructing the 293T library cell comprises the following steps:

The GFP PAM library was designed with reference to the above-mentioned publication. Its structure is CMV promoter-ATG-protospacer-NNNNN-EGFP-puro, where the protospacer sequence is the same as in the publication, namely GAACGGCTCGGAGATCATCATTGCG, and N is any deoxyribonucleotide (A, T, C or G).

The CMV promoter sequence used is shown in SEQ ID No. 14. The puro selection marker sequence used is shown in SEQ ID No. 15. The EGFP sequence used (without the start codon and stop codon) is shown in SEQ ID No. 16.

In the experiment, the hPGK promoter-EGFP expression cassette of plasmid pRRLSIN.cPPT.PGK-GFP.WPRE (commercially available) was excised at the EcoR V and Sal I restriction sites and the remainder was used as the backbone; CMV promoter-ATG-protospacer-NNNNNNN-GFP-puro was ligated to the backbone via the EcoR V and Sal I restriction sites to obtain the lentiviral vector pGFPPAM expressing the library, whose sequence is shown in SEQ ID No.17.

The plasmid map of the lentiviral vector pGFPPAM is shown in FIG. 5.

293T library cells were obtained with the above lentiviral vector pGFPPAM by the method described in the above-mentioned publication (Hu Z, Wang D, Zhang C, et al. Diverse noncanonical PAMs recognized by SpCas9 in human cells [J]. bioRxiv, 2019: 671503); apart from the different lentiviral vector pGFPPAM sequence, the method was the same.

Vectors C'/D box-pre, C'/D box-PAM, pX459 and SpCas9-PAM were each transfected into 293T library cells in 24-well plates, using 800 ng of each vector.

The transfection method was as follows:

(1) 293T cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2 × 10⁵ cells in 500 μL per well were plated in 24-well plates.

(2) For each transfection sample, the complexes were prepared according to the following steps:

a. for each well of plated cells, the aforementioned plasmid DNA was diluted in 50 μL of serum-free Opti-MEM I Reduced Serum Medium (Thermo, 25200056) and mixed gently;

b. Lipofectamine 2000 (Thermo, 11668019) was mixed gently before use, and then, for each well, 1.6 μL of Lipofectamine 2000 was diluted in 50 μL of Opti-MEM I medium and incubated at room temperature for 5 minutes. Note: proceed to step c within 25 minutes;

c. after the 5 min incubation, the diluted DNA was combined with the diluted Lipofectamine 2000, mixed gently, and incubated at room temperature for 20 minutes (the solution may appear cloudy). Note: the complexes are stable for 6 hours at room temperature. The complexes were added to the 293T library cells and mixed, and detection was performed with a flow cytometer after 48 h.
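When several wells are transfected at once, the per-well amounts in steps a to c can be scaled as in the following Python sketch. It assumes that the 800 ng of plasmid is a per-well amount, which is one reading of the text, and the function name `scale_mix` and the optional pipetting overage are hypothetical additions for illustration.

```python
# Per-well amounts taken from steps a-c; treating 800 ng as per-well is an
# assumption of this sketch.
PER_WELL = {
    "plasmid_ng": 800.0,
    "lipofectamine_ul": 1.6,
    "optimem_for_dna_ul": 50.0,
    "optimem_for_lipid_ul": 50.0,
}


def scale_mix(wells: int, overage: float = 0.0) -> dict:
    """Scale per-well amounts to a number of wells plus a fractional overage."""
    if wells < 1 or overage < 0:
        raise ValueError("wells must be >= 1 and overage must be >= 0")
    factor = wells * (1.0 + overage)
    return {name: round(amount * factor, 2) for name, amount in PER_WELL.items()}


# Three replicate wells of one vector, no overage.
print(scale_mix(3))
# Three wells with a hypothetical 10% pipetting overage.
print(scale_mix(3, overage=0.1))
```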

4. Flow cytometry detection of the editing effect of SpCas9 with C'/D box, and of its controls, on library cells

Cells 48 h after transfection in step 3 were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and centrifuged at 300 g for 5 min to remove the supernatant; the cells in each well were resuspended in 500 μL PBS. After cell debris was excluded by FSC-A and SSC-A gating, the proportion of GFP-fluorescent cells was measured by flow cytometry, with 293T-PAM cells (i.e., the 293T library cells described above) used to set the negative gate. The experiment of this example was repeated 3 times; the results are shown in FIG. 6, and the specific proportions of GFP-positive cells measured by flow cytometry are shown in Table 1.

TABLE 1 flow cytometry detection of GFP Positive cell proportion (average of 3 replicates)

Note: indicates a significant difference (P < 0.01) compared with each of the 293T-PAM, pX459 and C'/D box-pre groups.

As the flow cytometry results show, the proportion of GFP-positive cells in the positive control SpCas9-PAM group was 3.25%, indicating that the assay system was successfully established. The C'/D box-PAM group, which relies on box C'/D for nuclear entry, showed a significant editing effect: the library cells were edited and produced GFP fluorescence. The strategy of using box C'/D to guide the Cas9-sgRNA complex into the nucleus is therefore feasible, and its efficiency is close to that of SpCas9-PAM, which uses NLS as the nuclear entry signal.
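The way a GFP-positive proportion follows from a negative gate can be sketched in Python as below. This is an illustration only: the rule for placing the gate (a threshold at or below which roughly 99.9% of negative-control events fall) is an assumption and is not stated in the example, the function names are hypothetical, and the intensity values are made up rather than taken from the experiments.

```python
def gate_from_negative(negative_events, keep_fraction=0.999):
    """Return a threshold at or below which roughly keep_fraction of the
    negative-control events fall (assumed gating rule)."""
    if not negative_events:
        raise ValueError("negative control has no events")
    ordered = sorted(negative_events)
    index = min(len(ordered) - 1, int(keep_fraction * len(ordered)))
    return ordered[index]


def percent_positive(events, threshold):
    """Return the percentage of events with intensity above the threshold."""
    if not events:
        raise ValueError("sample has no events")
    positives = sum(1 for value in events if value > threshold)
    return 100.0 * positives / len(events)


# Made-up GFP intensities, for illustration only (not data from the examples).
negative_control = [10.0, 12.0, 11.0, 9.0, 13.0, 10.5, 11.5, 12.5]
sample = [10.0, 11.0, 250.0, 12.0, 9.5, 300.0, 10.5, 11.0]

threshold = gate_from_negative(negative_control)
print(threshold, percent_positive(sample, threshold))
```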

Example 2 Influence of the position and number of box C'/D on editing activity

1. Construction of validation vectors carrying sgRNAs with different numbers of box C'/D

To verify the effect of different numbers of box C'/D on the intracellular editing activity of Cas9, the following two verification vectors were designed. Each expresses an sgRNA with only one box C'/D, placed at a different position in the molecule (only 1 box C'/D, attached at the loop1 or loop2 position of the sgRNA backbone). Their backbone sequences are shown below:

C'/D box-1-1 sgRNA backbone:

5'-gttttagagctaggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggcctagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc-3';

C'/D box-1-2 sgRNA backbone:

5'-gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggccaagtggcaccgagtcggtgc-3';

In the two sgRNA backbones, the upper-case sequence is the box C'/D sequence, the lower-case, underlined sequence is the sgRNA backbone sequence of SpCas9 before modification, and the remainder is the linking sequences.
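The difference between the two backbones can be made explicit with a short Python sketch that locates the upper-case box C'/D in each sequence. The sequences are copied from the text above; the function name `box_positions` is hypothetical.

```python
BOX_CD_DNA = "GAGGAAGAGCGTCAGCAGGCTGA"  # DNA form of the U3 snoRNA box C'/D

BACKBONES = {
    "C'/D box-1-1": (
        "gttttagagctaggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggcctagcaagttaaaata"
        "aggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc"
    ),
    "C'/D box-1-2": (
        "gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttggccaGAGGAAGAG"
        "CGTCAGCAGGCTGActgcagggccaagtggcaccgagtcggtgc"
    ),
}


def box_positions(backbone: str, box: str) -> list:
    """Return the 1-based start position of every occurrence of box."""
    hits = []
    start = backbone.find(box)
    while start != -1:
        hits.append(start + 1)
        start = backbone.find(box, start + 1)
    return hits


for name, backbone in BACKBONES.items():
    print(name, len(backbone), box_positions(backbone, BOX_CD_DNA))
```

Each backbone contains exactly one box C'/D, and the box sits further from the 5' end (and hence from the guide sequence) in C'/D box-1-2 than in C'/D box-1-1.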

Two sgRNA expression cassette sequences, each containing only one U3 snoRNA box C'/D (denoted expression cassette 1 and expression cassette 2; both contain the sequence GAACGGCTCGGAGATCATCATTGCG corresponding to the guide sequence targeting the cell validation library), were synthesized by a reagent company and assembled onto the pX601 vector backbone via the Kpn I and Not I restriction sites. The intermediate vectors verified as correct by sequencing were named C'/D box-1-1-pre and C'/D box-1-2-pre, respectively.

The nucleotide sequence of the synthesized C'/D box-1-1 sequence (expression cassette 1) is shown in SEQ ID No.7. Positions 1-249 of SEQ ID No.7 are the U6 promoter, followed by the coding sequence of the modified sgRNA molecule (positions 250-274 encode the guide sequence). The expression cassette encodes the sgRNA molecule of SEQ ID No.5, and the DNA sequence corresponding to this sgRNA molecule is shown in SEQ ID No.6.

The nucleotide sequence of the synthesized C'/D box-1-2 sequence (expression cassette 2) is shown in SEQ ID No.10. Positions 1-249 of SEQ ID No.10 are the U6 promoter, followed by the coding sequence of the modified sgRNA molecule (positions 250-274 encode the guide sequence). The expression cassette encodes the sgRNA molecule of SEQ ID No.8, and the DNA sequence corresponding to this sgRNA molecule is shown in SEQ ID No.9.

In addition, an SpCas9 fragment containing no NLS was amplified by PCR, using the vector pX459 (commercially available) as the template, with the following primers:

F: 5'-gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT-3';

R: 5'-atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG-3'.

The PCR products verified as correct by sequencing were ligated into the intermediate vectors C'/D box-1-1-pre and C'/D box-1-2-pre, respectively, via the Age I and EcoR I sites, and the target vectors C'/D box-1-1 and C'/D box-1-2 were obtained after sequencing verification.

The complete sequence of the target vector C'/D box-1-1 is shown in SEQ ID No.11.

A schematic diagram of the target vector C'/D box-1-1 is shown in FIG. 7.

The complete sequence of the target vector C'/D box-1-2 is shown in SEQ ID No.12.

A schematic diagram of the target vector C'/D box-1-2 is shown in FIG. 8.

2. Flow cytometry to detect the editing effect of SpCas9 with different numbers of C'/D boxes and its controls on library cells

The C'/D box-PAM, C'/D box-pre, pX459 and SpCas9-PAM vectors of Example 1 and the C'/D box-1-1 and C'/D box-1-2 vectors of this example were each transfected into the 293T library cells of Example 1. The transfection method was the same as in Example 1, and the proportion of GFP-fluorescent cells was measured by flow cytometry 72 h after transfection (by the same method as in Example 1).

Cells 72 h after transfection were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and centrifuged at 300 g for 5 min to remove the supernatant; the cells in each well were resuspended in 500 μL PBS. After cell debris was excluded by FSC-A and SSC-A gating, the proportion of GFP-fluorescent cells was measured by flow cytometry, with 293T-PAM cells (i.e., the 293T library cells described above) used to set the negative gate. The experiment of this example was repeated 3 times; the results are shown in FIG. 9, and the specific proportions of GFP-positive cells measured by flow cytometry are shown in Table 2.

TABLE 2 flow cytometry detection of GFP Positive cell proportion (average of 3 replicates)

Note: indicates a significant difference (P < 0.01) compared with each of the 293T-PAM, pX459 and C'/D box-pre groups. ## indicates a significant difference (P < 0.05) between the C'/D box-1-2 group and the C'/D box-PAM group.

The editing activities of the C'/D box-PAM, C'/D box-1-1 (C'/D box attached at sgRNA loop1) and C'/D box-1-2 (C'/D box attached at sgRNA loop2) groups were close to that of the SpCas9-PAM group. The proportion of GFP-positive cells in the C'/D box-pre group was essentially the same as in the pX459 control group. The flow cytometry results show that editing activity was higher with only one C'/D box (the C'/D box-1-1 and C'/D box-1-2 groups) than with two C'/D boxes. For sgRNAs carrying only one C'/D box, editing efficiency was higher when the C'/D box was attached at the loop2 position (distal to the guide sequence) than when it was attached at the loop1 position (proximal to the guide sequence).

The present invention has been described in detail above. It will be apparent to those skilled in the art that the invention can be practiced within a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with reference to specific embodiments, it will be appreciated that the invention can be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention that follow the principles of the invention, including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. Some of the essential features may be applied within the scope of the claims appended below.

<110> Guangzhou Ruifeng Biotechnology, Inc

<120> modified sgRNA molecule and application thereof

<130> GNCLN221725

<160> 17

<170> PatentIn version 3.5

<210> 1

<211> 169

<212> RNA

<213> Artificial sequence

<400> 1

gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag 60

gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu uggccagagg 120

aagagcguca gcaggcugac ugcagggcca aguggcaccg agucggugc 169

<210> 2

<211> 169

<212> DNA

<213> Artificial sequence

<400> 2

gaacggctcg gagatcatca ttgcggtttt agagctaggc cagaggaaga gcgtcagcag 60

gctgactgca gggcctagca agttaaaata aggctagtcc gttatcaact tggccagagg 120

aagagcgtca gcaggctgac tgcagggcca agtggcaccg agtcggtgc 169

<210> 3

<211> 425

<212> DNA

<213> Artificial sequence

<400> 3

gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60

ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120

aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180

atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240

cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctaggcc agaggaagag 300

cgtcagcagg ctgactgcag ggcctagcaa gttaaaataa ggctagtccg ttatcaactt 360

ggccagagga agagcgtcag caggctgact gcagggccaa gtggcaccga gtcggtgctt 420

ttttt 425

<210> 4

<211> 8286

<212> DNA

<213> Artificial sequence

<400> 4

cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60

ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120

aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180

agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240

ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300

tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360

atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420

ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480

gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540

ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600

tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660

aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720

tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780

gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840

aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900

acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960

gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020

tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080

tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140

tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200

ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260

gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320

tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380

tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440

tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500

tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560

tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620

acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680

tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740

acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800

agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860

gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920

ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980

acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040

aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100

tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160

ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220

cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280

aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340

aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400

aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460

aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520

ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580

tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640

ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700

aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760

tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820

acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880

aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940

atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000

agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060

agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120

agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180

tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240

aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300

ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360

agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420

tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480

gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540

agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600

atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660

tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720

acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780

gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840

agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900

tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960

acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020

tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080

gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140

gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200

tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260

tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320

gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380

tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440

tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500

agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560

tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620

atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680

ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740

actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800

ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860

gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920

gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980

tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040

tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100

tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160

aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220

aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280

aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340

atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400

aggccagagg aagagcgtca gcaggctgac tgcagggcct agcaagttaa aataaggcta 5460

gtccgttatc aacttggcca gaggaagagc gtcagcaggc tgactgcagg gccaagtggc 5520

accgagtcgg tgcttttttt gcggccgcag gaacccctag tgatggagtt ggccactccc 5580

tctctgcgcg ctcgctcgct cactgaggcc gggcgaccaa aggtcgcccg acgcccgggc 5640

tttgcccggg cggcctcagt gagcgagcga gcgcgcagct gcctgcaggg gcgcctgatg 5700

cggtattttc tccttacgca tctgtgcggt atttcacacc gcatacgtca aagcaaccat 5760

agtacgcgcc ctgtagcggc gcattaagcg cggcgggtgt ggtggttacg cgcagcgtga 5820

ccgctacact tgccagcgcc ttagcgcccg ctcctttcgc tttcttccct tcctttctcg 5880

ccacgttcgc cggctttccc cgtcaagctc taaatcgggg gctcccttta gggttccgat 5940

ttagtgcttt acggcacctc gaccccaaaa aacttgattt gggtgatggt tcacgtagtg 6000

ggccatcgcc ctgatagacg gtttttcgcc ctttgacgtt ggagtccacg ttctttaata 6060

gtggactctt gttccaaact ggaacaacac tcaactctat ctcgggctat tcttttgatt 6120

tataagggat tttgccgatt tcggtctatt ggttaaaaaa tgagctgatt taacaaaaat 6180

ttaacgcgaa ttttaacaaa atattaacgt ttacaatttt atggtgcact ctcagtacaa 6240

tctgctctga tgccgcatag ttaagccagc cccgacaccc gccaacaccc gctgacgcgc 6300

cctgacgggc ttgtctgctc ccggcatccg cttacagaca agctgtgacc gtctccggga 6360

gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg cgcgagacga aagggcctcg 6420

tgatacgcct atttttatag gttaatgtca tgataataat ggtttcttag acgtcaggtg 6480

gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa atacattcaa 6540

atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat tgaaaaagga 6600

agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg gcattttgcc 6660

ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa gatcagttgg 6720

gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt gagagttttc 6780

gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt ggcgcggtat 6840

tatcccgtat tgacgccggg caagagcaac tcggtcgccg catacactat tctcagaatg 6900

acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg acagtaagag 6960

aattatgcag tgctgccata accatgagtg ataacactgc ggccaactta cttctgacaa 7020

cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat catgtaactc 7080

gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag cgtgacacca 7140

cgatgcctgt agcaatggca acaacgttgc gcaaactatt aactggcgaa ctacttactc 7200

tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca ggaccacttc 7260

tgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc ggtgagcgtg 7320

gaagccgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt atcgtagtta 7380

tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc gctgagatag 7440

gtgcctcact gattaagcat tggtaactgt cagaccaagt ttactcatat atactttaga 7500

ttgatttaaa acttcatttt taatttaaaa ggatctaggt gaagatcctt tttgataatc 7560

tcatgaccaa aatcccttaa cgtgagtttt cgttccactg agcgtcagac cccgtagaaa 7620

agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc ttgcaaacaa 7680

aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca actctttttc 7740

cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgttcttcta gtgtagccgt 7800

agttaggcca ccacttcaag aactctgtag caccgcctac atacctcgct ctgctaatcc 7860

tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg gactcaagac 7920

gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc acacagccca 7980

gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta tgagaaagcg 8040

ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg gtcggaacag 8100

gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt cctgtcgggt 8160

ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg cggagcctat 8220

ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg ccttttgctc 8280

acatgt 8286

<210> 5

<211> 135

<212> RNA

<213> Artificial sequence

<400> 5

gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag 60

gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu ugaaaaagug 120

gcaccgaguc ggugc 135

<210> 6

<211> 135

<212> DNA

<213> Artificial sequence

<400> 6

gaacggctcg gagatcatca ttgcggtttt agagctaggc cagaggaaga gcgtcagcag 60

gctgactgca gggcctagca agttaaaata aggctagtcc gttatcaact tgaaaaagtg 120

gcaccgagtc ggtgc 135

<210> 7

<211> 391

<212> DNA

<213> Artificial sequence

<400> 7

gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60

ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120

aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180

atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240

cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctaggcc agaggaagag 300

cgtcagcagg ctgactgcag ggcctagcaa gttaaaataa ggctagtccg ttatcaactt 360

gaaaaagtgg caccgagtcg gtgctttttt t 391

<210> 8

<211> 135

<212> RNA

<213> Artificial sequence

<400> 8

gaacggcucg gagaucauca uugcgguuuu agagcuagaa auagcaaguu aaaauaaggc 60

uaguccguua ucaacuuggc cagaggaaga gcgucagcag gcugacugca gggccaagug 120

gcaccgaguc ggugc 135

<210> 9

<211> 135

<212> DNA

<213> Artificial sequence

<400> 9

gaacggctcg gagatcatca ttgcggtttt agagctagaa atagcaagtt aaaataaggc 60

tagtccgtta tcaacttggc cagaggaaga gcgtcagcag gctgactgca gggccaagtg 120

gcaccgagtc ggtgc 135

<210> 10

<211> 391

<212> DNA

<213> Artificial sequence

<400> 10

gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60

ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120

aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180

atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240

cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctagaaa tagcaagtta 300

aaataaggct agtccgttat caacttggcc agaggaagag cgtcagcagg ctgactgcag 360

ggccaagtgg caccgagtcg gtgctttttt t 391

<210> 11

<211> 8252

<212> DNA

<213> Artificial sequence

<400> 11

cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60

ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120

aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180

agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240

ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300

tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360

atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420

ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480

gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540

ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600

tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660

aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720

tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780

gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840

aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900

acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960

gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020

tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080

tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140

tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200

ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260

gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320

tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380

tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440

tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500

tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560

tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620

acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680

tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740

acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800

agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860

gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920

ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980

acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040

aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100

tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160

ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220

cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280

aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340

aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400

aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460

aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520

ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580

tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640

ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700

aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760

tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820

acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880

aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940

atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000

agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060

agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120

agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180

tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240

aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300

ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360

agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420

tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480

gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540

agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600

atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660

tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720

acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780

gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840

agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900

tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960

acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020

tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080

gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140

gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200

tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260

tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320

gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380

tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440

tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500

agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560

tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620

atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680

ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740

actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800

ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860

gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920

gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980

tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040

tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100

tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160

aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220

aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280

aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340

atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400

aggccagagg aagagcgtca gcaggctgac tgcagggcct agcaagttaa aataaggcta 5460

gtccgttatc aacttgaaaa agtggcaccg agtcggtgct ttttttgcgg ccgcaggaac 5520

ccctagtgat ggagttggcc actccctctc tgcgcgctcg ctcgctcact gaggccgggc 5580

gaccaaaggt cgcccgacgc ccgggctttg cccgggcggc ctcagtgagc gagcgagcgc 5640

gcagctgcct gcaggggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt 5700

cacaccgcat acgtcaaagc aaccatagta cgcgccctgt agcggcgcat taagcgcggc 5760

gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccttag cgcccgctcc 5820

tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aagctctaaa 5880

tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc ccaaaaaact 5940

tgatttgggt gatggttcac gtagtgggcc atcgccctga tagacggttt ttcgcccttt 6000

gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa caacactcaa 6060

ctctatctcg ggctattctt ttgatttata agggattttg ccgatttcgg tctattggtt 6120

aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat taacgtttac 6180

aattttatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagccccg 6240

acacccgcca acacccgctg acgcgccctg acgggcttgt ctgctcccgg catccgctta 6300

cagacaagct gtgaccgtct ccgggagctg catgtgtcag aggttttcac cgtcatcacc 6360

gaaacgcgcg agacgaaagg gcctcgtgat acgcctattt ttataggtta atgtcatgat 6420

aataatggtt tcttagacgt caggtggcac ttttcgggga aatgtgcgcg gaacccctat 6480

ttgtttattt ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata 6540

aatgcttcaa taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct 6600

tattcccttt tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa 6660

agtaaaagat gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa 6720

cagcggtaag atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt 6780

taaagttctg ctatgtggcg cggtattatc ccgtattgac gccgggcaag agcaactcgg 6840

tcgccgcata cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca 6900

tcttacggat ggcatgacag taagagaatt atgcagtgct gccataacca tgagtgataa 6960

cactgcggcc aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt 7020

gcacaacatg ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc 7080

cataccaaac gacgagcgtg acaccacgat gcctgtagca atggcaacaa cgttgcgcaa 7140

actattaact ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga 7200

ggcggataaa gttgcaggac cacttctgcg ctcggccctt ccggctggct ggtttattgc 7260

tgataaatct ggagccggtg agcgtggaag ccgcggtatc attgcagcac tggggccaga 7320

tggtaagccc tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga 7380

acgaaataga cagatcgctg agataggtgc ctcactgatt aagcattggt aactgtcaga 7440

ccaagtttac tcatatatac tttagattga tttaaaactt catttttaat ttaaaaggat 7500

ctaggtgaag atcctttttg ataatctcat gaccaaaatc ccttaacgtg agttttcgtt 7560

ccactgagcg tcagaccccg tagaaaagat caaaggatct tcttgagatc ctttttttct 7620

gcgcgtaatc tgctgcttgc aaacaaaaaa accaccgcta ccagcggtgg tttgtttgcc 7680

ggatcaagag ctaccaactc tttttccgaa ggtaactggc ttcagcagag cgcagatacc 7740

aaatactgtt cttctagtgt agccgtagtt aggccaccac ttcaagaact ctgtagcacc 7800

gcctacatac ctcgctctgc taatcctgtt accagtggct gctgccagtg gcgataagtc 7860

gtgtcttacc gggttggact caagacgata gttaccggat aaggcgcagc ggtcgggctg 7920

aacggggggt tcgtgcacac agcccagctt ggagcgaacg acctacaccg aactgagata 7980

cctacagcgt gagctatgag aaagcgccac gcttcccgaa gggagaaagg cggacaggta 8040

tccggtaagc ggcagggtcg gaacaggaga gcgcacgagg gagcttccag ggggaaacgc 8100

ctggtatctt tatagtcctg tcgggtttcg ccacctctga cttgagcgtc gatttttgtg 8160

atgctcgtca ggggggcgga gcctatggaa aaacgccagc aacgcggcct ttttacggtt 8220

cctggccttt tgctggcctt ttgctcacat gt 8252

<210> 12

<211> 8252

<212> DNA

<213> Artificial sequence

<400> 12

cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60

ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120

aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180

agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240

ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300

tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360

atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420

ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480

gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540

ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600

tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660

aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720

tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780

gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840

aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900

acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960

gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020

tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080

tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140

tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200

ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260

gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320

tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380

tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440

tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500

tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560

tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620

acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680

tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740

acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800

agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860

gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920

ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980

acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040

aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100

tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160

ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220

cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280

aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340

aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400

aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460

aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520

ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580

tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640

ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700

aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760

tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820

acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880

aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940

atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000

agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060

agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120

agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180

tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240

aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300

ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360

agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420

tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480

gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540

agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600

atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660

tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720

acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780

gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840

agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900

tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960

acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020

tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080

gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140

gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200

tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260

tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320

gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380

tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440

tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500

agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560

tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620

atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680

ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740

actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800

ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860

gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920

gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980

tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040

tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100

tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160

aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220

aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280

aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340

atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400

agaaatagca agttaaaata aggctagtcc gttatcaact tggccagagg aagagcgtca 5460

gcaggctgac tgcagggcca agtggcaccg agtcggtgct ttttttgcgg ccgcaggaac 5520

ccctagtgat ggagttggcc actccctctc tgcgcgctcg ctcgctcact gaggccgggc 5580

gaccaaaggt cgcccgacgc ccgggctttg cccgggcggc ctcagtgagc gagcgagcgc 5640

gcagctgcct gcaggggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt 5700

cacaccgcat acgtcaaagc aaccatagta cgcgccctgt agcggcgcat taagcgcggc 5760

gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccttag cgcccgctcc 5820

tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aagctctaaa 5880

tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc ccaaaaaact 5940

tgatttgggt gatggttcac gtagtgggcc atcgccctga tagacggttt ttcgcccttt 6000

gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa caacactcaa 6060

ctctatctcg ggctattctt ttgatttata agggattttg ccgatttcgg tctattggtt 6120

aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat taacgtttac 6180

aattttatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagccccg 6240

acacccgcca acacccgctg acgcgccctg acgggcttgt ctgctcccgg catccgctta 6300

cagacaagct gtgaccgtct ccgggagctg catgtgtcag aggttttcac cgtcatcacc 6360

gaaacgcgcg agacgaaagg gcctcgtgat acgcctattt ttataggtta atgtcatgat 6420

aataatggtt tcttagacgt caggtggcac ttttcgggga aatgtgcgcg gaacccctat 6480

ttgtttattt ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata 6540

aatgcttcaa taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct 6600

tattcccttt tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa 6660

agtaaaagat gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa 6720

cagcggtaag atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt 6780

taaagttctg ctatgtggcg cggtattatc ccgtattgac gccgggcaag agcaactcgg 6840

tcgccgcata cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca 6900

tcttacggat ggcatgacag taagagaatt atgcagtgct gccataacca tgagtgataa 6960

cactgcggcc aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt 7020

gcacaacatg ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc 7080

cataccaaac gacgagcgtg acaccacgat gcctgtagca atggcaacaa cgttgcgcaa 7140

actattaact ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga 7200

ggcggataaa gttgcaggac cacttctgcg ctcggccctt ccggctggct ggtttattgc 7260

tgataaatct ggagccggtg agcgtggaag ccgcggtatc attgcagcac tggggccaga 7320

tggtaagccc tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga 7380

acgaaataga cagatcgctg agataggtgc ctcactgatt aagcattggt aactgtcaga 7440

ccaagtttac tcatatatac tttagattga tttaaaactt catttttaat ttaaaaggat 7500

ctaggtgaag atcctttttg ataatctcat gaccaaaatc ccttaacgtg agttttcgtt 7560

ccactgagcg tcagaccccg tagaaaagat caaaggatct tcttgagatc ctttttttct 7620

gcgcgtaatc tgctgcttgc aaacaaaaaa accaccgcta ccagcggtgg tttgtttgcc 7680

ggatcaagag ctaccaactc tttttccgaa ggtaactggc ttcagcagag cgcagatacc 7740

aaatactgtt cttctagtgt agccgtagtt aggccaccac ttcaagaact ctgtagcacc 7800

gcctacatac ctcgctctgc taatcctgtt accagtggct gctgccagtg gcgataagtc 7860

gtgtcttacc gggttggact caagacgata gttaccggat aaggcgcagc ggtcgggctg 7920

aacggggggt tcgtgcacac agcccagctt ggagcgaacg acctacaccg aactgagata 7980

cctacagcgt gagctatgag aaagcgccac gcttcccgaa gggagaaagg cggacaggta 8040

tccggtaagc ggcagggtcg gaacaggaga gcgcacgagg gagcttccag ggggaaacgc 8100

ctggtatctt tatagtcctg tcgggtttcg ccacctctga cttgagcgtc gatttttgtg 8160

atgctcgtca ggggggcgga gcctatggaa aaacgccagc aacgcggcct ttttacggtt 8220

cctggccttt tgctggcctt ttgctcacat gt 8252

<210> 13

<211> 9181

<212> DNA

<213> Artificial sequence

<400> 13

gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60

ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120

aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180

atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240

cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctagaaa tagcaagtta 300

aaataaggct agtccgttat caacttgaaa aagtggcacc gagtcggtgc ttttttgttt 360

tagagctaga aatagcaagt taaaataagg ctagtccgtt tttagcgcgt gcgccaattc 420

tgcagacaaa tggctctaga ggtacccgtt acataactta cggtaaatgg cccgcctggc 480

tgaccgccca acgacccccg cccattgacg tcaatagtaa cgccaatagg gactttccat 540

tgacgtcaat gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat 600

catatgccaa gtacgccccc tattgacgtc aatgacggta aatggcccgc ctggcattgt 660

gcccagtaca tgaccttatg ggactttcct acttggcagt acatctacgt attagtcatc 720

gctattacca tggtcgaggt gagccccacg ttctgcttca ctctccccat ctcccccccc 780

tccccacccc caattttgta tttatttatt ttttaattat tttgtgcagc gatgggggcg 840

gggggggggg gggggcgcgc gccaggcggg gcggggcggg gcgaggggcg gggcggggcg 900

aggcggagag gtgcggcggc agccaatcag agcggcgcgc tccgaaagtt tccttttatg 960

gcgaggcggc ggcggcggcg gccctataaa aagcgaagcg cgcggcgggc gggagtcgct 1020

gcgcgctgcc ttcgccccgt gccccgctcc gccgccgcct cgcgccgccc gccccggctc 1080

tgactgaccg cgttactccc acaggtgagc gggcgggacg gcccttctcc tccgggctgt 1140

aattagctga gcaagaggta agggtttaag ggatggttgg ttggtggggt attaatgttt 1200

aattacctgg agcacctgcc tgaaatcact ttttttcagg ttggaccggt gccaccatgg 1260

actataagga ccacgacgga gactacaagg atcatgatat tgattacaaa gacgatgacg 1320

ataagatggc cccaaagaag aagcggaagg tcggtatcca cggagtccca gcagccgaca 1380

agaagtacag catcggcctg gacatcggca ccaactctgt gggctgggcc gtgatcaccg 1440

acgagtacaa ggtgcccagc aagaaattca aggtgctggg caacaccgac cggcacagca 1500

tcaagaagaa cctgatcgga gccctgctgt tcgacagcgg cgaaacagcc gaggccaccc 1560

ggctgaagag aaccgccaga agaagataca ccagacggaa gaaccggatc tgctatctgc 1620

aagagatctt cagcaacgag atggccaagg tggacgacag cttcttccac agactggaag 1680

agtccttcct ggtggaagag gataagaagc acgagcggca ccccatcttc ggcaacatcg 1740

tggacgaggt ggcctaccac gagaagtacc ccaccatcta ccacctgaga aagaaactgg 1800

tggacagcac cgacaaggcc gacctgcggc tgatctatct ggccctggcc cacatgatca 1860

agttccgggg ccacttcctg atcgagggcg acctgaaccc cgacaacagc gacgtggaca 1920

agctgttcat ccagctggtg cagacctaca accagctgtt cgaggaaaac cccatcaacg 1980

ccagcggcgt ggacgccaag gccatcctgt ctgccagact gagcaagagc agacggctgg 2040

aaaatctgat cgcccagctg cccggcgaga agaagaatgg cctgttcgga aacctgattg 2100

ccctgagcct gggcctgacc cccaacttca agagcaactt cgacctggcc gaggatgcca 2160

aactgcagct gagcaaggac acctacgacg acgacctgga caacctgctg gcccagatcg 2220

gcgaccagta cgccgacctg tttctggccg ccaagaacct gtccgacgcc atcctgctga 2280

gcgacatcct gagagtgaac accgagatca ccaaggcccc cctgagcgcc tctatgatca 2340

agagatacga cgagcaccac caggacctga ccctgctgaa agctctcgtg cggcagcagc 2400

tgcctgagaa gtacaaagag attttcttcg accagagcaa gaacggctac gccggctaca 2460

ttgacggcgg agccagccag gaagagttct acaagttcat caagcccatc ctggaaaaga 2520

tggacggcac cgaggaactg ctcgtgaagc tgaacagaga ggacctgctg cggaagcagc 2580

ggaccttcga caacggcagc atcccccacc agatccacct gggagagctg cacgccattc 2640

tgcggcggca ggaagatttt tacccattcc tgaaggacaa ccgggaaaag atcgagaaga 2700

tcctgacctt ccgcatcccc tactacgtgg gccctctggc caggggaaac agcagattcg 2760

cctggatgac cagaaagagc gaggaaacca tcaccccctg gaacttcgag gaagtggtgg 2820

acaagggcgc ttccgcccag agcttcatcg agcggatgac caacttcgat aagaacctgc 2880

ccaacgagaa ggtgctgccc aagcacagcc tgctgtacga gtacttcacc gtgtataacg 2940

agctgaccaa agtgaaatac gtgaccgagg gaatgagaaa gcccgccttc ctgagcggcg 3000

agcagaaaaa ggccatcgtg gacctgctgt tcaagaccaa ccggaaagtg accgtgaagc 3060

agctgaaaga ggactacttc aagaaaatcg agtgcttcga ctccgtggaa atctccggcg 3120

tggaagatcg gttcaacgcc tccctgggca cataccacga tctgctgaaa attatcaagg 3180

acaaggactt cctggacaat gaggaaaacg aggacattct ggaagatatc gtgctgaccc 3240

tgacactgtt tgaggacaga gagatgatcg aggaacggct gaaaacctat gcccacctgt 3300

tcgacgacaa agtgatgaag cagctgaagc ggcggagata caccggctgg ggcaggctga 3360

gccggaagct gatcaacggc atccgggaca agcagtccgg caagacaatc ctggatttcc 3420

tgaagtccga cggcttcgcc aacagaaact tcatgcagct gatccacgac gacagcctga 3480

cctttaaaga ggacatccag aaagcccagg tgtccggcca gggcgatagc ctgcacgagc 3540

acattgccaa tctggccggc agccccgcca ttaagaaggg catcctgcag acagtgaagg 3600

tggtggacga gctcgtgaaa gtgatgggcc ggcacaagcc cgagaacatc gtgatcgaaa 3660

tggccagaga gaaccagacc acccagaagg gacagaagaa cagccgcgag agaatgaagc 3720

ggatcgaaga gggcatcaaa gagctgggca gccagatcct gaaagaacac cccgtggaaa 3780

acacccagct gcagaacgag aagctgtacc tgtactacct gcagaatggg cgggatatgt 3840

acgtggacca ggaactggac atcaaccggc tgtccgacta cgatgtggac catatcgtgc 3900

ctcagagctt tctgaaggac gactccatcg acaacaaggt gctgaccaga agcgacaaga 3960

accggggcaa gagcgacaac gtgccctccg aagaggtcgt gaagaagatg aagaactact 4020

ggcggcagct gctgaacgcc aagctgatta cccagagaaa gttcgacaat ctgaccaagg 4080

ccgagagagg cggcctgagc gaactggata aggccggctt catcaagaga cagctggtgg 4140

aaacccggca gatcacaaag cacgtggcac agatcctgga ctcccggatg aacactaagt 4200

acgacgagaa tgacaagctg atccgggaag tgaaagtgat caccctgaag tccaagctgg 4260

tgtccgattt ccggaaggat ttccagtttt acaaagtgcg cgagatcaac aactaccacc 4320

acgcccacga cgcctacctg aacgccgtcg tgggaaccgc cctgatcaaa aagtacccta 4380

agctggaaag cgagttcgtg tacggcgact acaaggtgta cgacgtgcgg aagatgatcg 4440

ccaagagcga gcaggaaatc ggcaaggcta ccgccaagta cttcttctac agcaacatca 4500

tgaacttttt caagaccgag attaccctgg ccaacggcga gatccggaag cggcctctga 4560

tcgagacaaa cggcgaaacc ggggagatcg tgtgggataa gggccgggat tttgccaccg 4620

tgcggaaagt gctgagcatg ccccaagtga atatcgtgaa aaagaccgag gtgcagacag 4680

gcggcttcag caaagagtct atcctgccca agaggaacag cgataagctg atcgccagaa 4740

agaaggactg ggaccctaag aagtacggcg gcttcgacag ccccaccgtg gcctattctg 4800

tgctggtggt ggccaaagtg gaaaagggca agtccaagaa actgaagagt gtgaaagagc 4860

tgctggggat caccatcatg gaaagaagca gcttcgagaa gaatcccatc gactttctgg 4920

aagccaaggg ctacaaagaa gtgaaaaagg acctgatcat caagctgcct aagtactccc 4980

tgttcgagct ggaaaacggc cggaagagaa tgctggcctc tgccggcgaa ctgcagaagg 5040

gaaacgaact ggccctgccc tccaaatatg tgaacttcct gtacctggcc agccactatg 5100

agaagctgaa gggctccccc gaggataatg agcagaaaca gctgtttgtg gaacagcaca 5160

agcactacct ggacgagatc atcgagcaga tcagcgagtt ctccaagaga gtgatcctgg 5220

ccgacgctaa tctggacaaa gtgctgtccg cctacaacaa gcaccgggat aagcccatca 5280

gagagcaggc cgagaatatc atccacctgt ttaccctgac caatctggga gcccctgccg 5340

ccttcaagta ctttgacacc accatcgacc ggaagaggta caccagcacc aaagaggtgc 5400

tggacgccac cctgatccac cagagcatca ccggcctgta cgagacacgg atcgacctgt 5460

ctcagctggg aggcgacaaa aggccggcgg ccacgaaaaa ggccggccag gcaaaaaaga 5520

aaaaggaatt cggcagtgga gagggcagag gaagtctgct aacatgcggt gacgtcgagg 5580

agaatcctgg cccaatgacc gagtacaagc ccacggtgcg cctcgccacc cgcgacgacg 5640

tccccagggc cgtacgcacc ctcgccgccg cgttcgccga ctaccccgcc acgcgccaca 5700

ccgtcgatcc ggaccgccac atcgagcggg tcaccgagct gcaagaactc ttcctcacgc 5760

gcgtcgggct cgacatcggc aaggtgtggg tcgcggacga cggcgccgcg gtggcggtct 5820

ggaccacgcc ggagagcgtc gaagcggggg cggtgttcgc cgagatcggc ccgcgcatgg 5880

ccgagttgag cggttcccgg ctggccgcgc agcaacagat ggaaggcctc ctggcgccgc 5940

accggcccaa ggagcccgcg tggttcctgg ccaccgtcgg agtctcgccc gaccaccagg 6000

gcaagggtct gggcagcgcc gtcgtgctcc ccggagtgga ggcggccgag cgcgccgggg 6060

tgcccgcctt cctggagacc tccgcgcccc gcaacctccc cttctacgag cggctcggct 6120

tcaccgtcac cgccgacgtc gaggtgcccg aaggaccgcg cacctggtgc atgacccgca 6180

agcccggtgc ctgagaattc taactagagc tcgctgatca gcctcgactg tgccttctag 6240

ttgccagcca tctgttgttt gcccctcccc cgtgccttcc ttgaccctgg aaggtgccac 6300

tcccactgtc ctttcctaat aaaatgagga aattgcatcg cattgtctga gtaggtgtca 6360

ttctattctg gggggtgggg tggggcagga cagcaagggg gaggattggg aagagaatag 6420

caggcatgct ggggagcggc cgcaggaacc cctagtgatg gagttggcca ctccctctct 6480

gcgcgctcgc tcgctcactg aggccgggcg accaaaggtc gcccgacgcc cgggctttgc 6540

ccgggcggcc tcagtgagcg agcgagcgcg cagctgcctg caggggcgcc tgatgcggta 6600

ttttctcctt acgcatctgt gcggtatttc acaccgcata cgtcaaagca accatagtac 6660

gcgccctgta gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct 6720

acacttgcca gcgccttagc gcccgctcct ttcgctttct tcccttcctt tctcgccacg 6780

ttcgccggct ttccccgtca agctctaaat cgggggctcc ctttagggtt ccgatttagt 6840

gctttacggc acctcgaccc caaaaaactt gatttgggtg atggttcacg tagtgggcca 6900

tcgccctgat agacggtttt tcgccctttg acgttggagt ccacgttctt taatagtgga 6960

ctcttgttcc aaactggaac aacactcaac tctatctcgg gctattcttt tgatttataa 7020

gggattttgc cgatttcggt ctattggtta aaaaatgagc tgatttaaca aaaatttaac 7080

gcgaatttta acaaaatatt aacgtttaca attttatggt gcactctcag tacaatctgc 7140

tctgatgccg catagttaag ccagccccga cacccgccaa cacccgctga cgcgccctga 7200

cgggcttgtc tgctcccggc atccgcttac agacaagctg tgaccgtctc cgggagctgc 7260

atgtgtcaga ggttttcacc gtcatcaccg aaacgcgcga gacgaaaggg cctcgtgata 7320

cgcctatttt tataggttaa tgtcatgata ataatggttt cttagacgtc aggtggcact 7380

tttcggggaa atgtgcgcgg aacccctatt tgtttatttt tctaaataca ttcaaatatg 7440

tatccgctca tgagacaata accctgataa atgcttcaat aatattgaaa aaggaagagt 7500

atgagtattc aacatttccg tgtcgccctt attccctttt ttgcggcatt ttgccttcct 7560

gtttttgctc acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgca 7620

cgagtgggtt acatcgaact ggatctcaac agcggtaaga tccttgagag ttttcgcccc 7680

gaagaacgtt ttccaatgat gagcactttt aaagttctgc tatgtggcgc ggtattatcc 7740

cgtattgacg ccgggcaaga gcaactcggt cgccgcatac actattctca gaatgacttg 7800

gttgagtact caccagtcac agaaaagcat cttacggatg gcatgacagt aagagaatta 7860

tgcagtgctg ccataaccat gagtgataac actgcggcca acttacttct gacaacgatc 7920

ggaggaccga aggagctaac cgcttttttg cacaacatgg gggatcatgt aactcgcctt 7980

gatcgttggg aaccggagct gaatgaagcc ataccaaacg acgagcgtga caccacgatg 8040

cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg gcgaactact tactctagct 8100

tcccggcaac aattaataga ctggatggag gcggataaag ttgcaggacc acttctgcgc 8160

tcggcccttc cggctggctg gtttattgct gataaatctg gagccggtga gcgtggaagc 8220

cgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctac 8280

acgacgggga gtcaggcaac tatggatgaa cgaaatagac agatcgctga gataggtgcc 8340

tcactgatta agcattggta actgtcagac caagtttact catatatact ttagattgat 8400

ttaaaacttc atttttaatt taaaaggatc taggtgaaga tcctttttga taatctcatg 8460

accaaaatcc cttaacgtga gttttcgttc cactgagcgt cagaccccgt agaaaagatc 8520

aaaggatctt cttgagatcc tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa 8580

ccaccgctac cagcggtggt ttgtttgccg gatcaagagc taccaactct ttttccgaag 8640

gtaactggct tcagcagagc gcagatacca aatactgttc ttctagtgta gccgtagtta 8700

ggccaccact tcaagaactc tgtagcaccg cctacatacc tcgctctgct aatcctgtta 8760

ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg ggttggactc aagacgatag 8820

ttaccggata aggcgcagcg gtcgggctga acggggggtt cgtgcacaca gcccagcttg 8880

gagcgaacga cctacaccga actgagatac ctacagcgtg agctatgaga aagcgccacg 8940

cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg gcagggtcgg aacaggagag 9000

cgcacgaggg agcttccagg gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc 9060

cacctctgac ttgagcgtcg atttttgtga tgctcgtcag gggggcggag cctatggaaa 9120

aacgccagca acgcggcctt tttacggttc ctggcctttt gctggccttt tgctcacatg 9180

t 9181

<210> 14

<211> 584

<212> DNA

<213> Artificial sequence

<400> 14

gacattgatt attgactagt tattaatagt aatcaattac ggggtcatta gttcatagcc 60

catatatgga gttccgcgtt acataactta cggtaaatgg cccgcctggc tgaccgccca 120

acgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg ccaataggga 180

ctttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg gcagtacatc 240

aagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 300

ggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac atctacgtat 360

tagtcatcgc tattaccatg gtgatgcggt tttggcagta catcaatggg cgtggatagc 420

ggtttgactc acggggattt ccaagtctcc accccattga cgtcaatggg agtttgtttt 480

ggcaccaaaa tcaacgggac tttccaaaat gtcgtaacaa ctccgcccca ttgacgcaaa 540

tgggcggtag gcgtgtacgg tgggaggtct atataagcag agct 584

<210> 15

<211> 600

<212> DNA

<213> Artificial sequence

<400> 15

atgaccgagt acaagcccac ggtgcgcctc gccacccgcg acgacgtccc cagggccgta 60

cgcaccctcg ccgccgcgtt cgccgactac cccgccacgc gccacaccgt cgatccggac 120

cgccacatcg agcgggtcac cgagctgcaa gaactcttcc tcacgcgcgt cgggctcgac 180

atcggcaagg tgtgggtcgc ggacgacggc gccgcggtgg cggtctggac cacgccggag 240

agcgtcgaag cgggggcggt gttcgccgag atcggcccgc gcatggccga gttgagcggt 300

tcccggctgg ccgcgcagca acagatggaa ggcctcctgg cgccgcaccg gcccaaggag 360

cccgcgtggt tcctggccac cgtcggcgtc tcgcccgacc accagggcaa gggtctgggc 420

agcgccgtcg tgctccccgg agtggaggcg gccgagcgcg ccggggtgcc cgccttcctg 480

gagacctccg cgccccgcaa cctccccttc tacgagcggc tcggcttcac cgtcaccgcc 540

gacgtcgagg tgcccgaagg accgcgcacc tggtgcatga cccgcaagcc cggtgcctga 600

<210> 16

<211> 714

<212> DNA

<213> Artificial sequence

<400> 16

gtgagcaagg gcgaggagct gttcaccggg gtggtgccca tcctggtcga gctggacggc 60

gacgtaaacg gccacaagtt cagcgtgtcc ggcgagggcg agggcgatgc cacctacggc 120

aagctgaccc tgaagttcat ctgcaccacc ggcaagctgc ccgtgccctg gcccaccctc 180

gtgaccaccc tgacctacgg cgtgcagtgc ttcagccgct accccgacca catgaagcag 240

cacgacttct tcaagtccgc catgcccgaa ggctacgtcc aggagcgcac catcttcttc 300

aaggacgacg gcaactacaa gacccgcgcc gaggtgaagt tcgagggcga caccctggtg 360

aaccgcatcg agctgaaggg catcgacttc aaggaggacg gcaacatcct ggggcacaag 420

ctggagtaca actacaacag ccacaacgtc tatatcatgg ccgacaagca gaagaacggc 480

atcaaggtga acttcaagat ccgccacaac atcgaggacg gcagcgtgca gctcgccgac 540

cactaccagc agaacacccc catcggcgac ggccccgtgc tgctgcccga caaccactac 600

ctgagcaccc agtccgccct gagcaaagac cccaacgaga agcgcgatca catggtcctg 660

ctggagttcg tgaccgccgc cgggatcact ctcggcatgg acgagctgta caag 714

<210> 17

<211> 8111

<212> DNA

<213> Artificial sequence

<220>

<221> misc_feature

<222> (2581)..(2587)

<223> n is a, c, g, or t

<400> 17

agcttaatgt agtcttatgc aatactcttg tagtcttgca acatggtaac gatgagttag 60

caacatgcct tacaaggaga gaaaaagcac cgtgcatgcc gattggtgga agtaaggtgg 120

tacgatcgtg ccttattagg aaggcaacag acgggtctga catggattgg acgaaccact 180

gaattgccgc attgcagaga tattgtattt aagtgcctag ctcgatacat aaacgggtct 240

ctctggttag accagatctg agcctgggag ctctctggct aactagggaa cccactgctt 300

aagcctcaat aaagcttgcc ttgagtgctt caagtagtgt gtgcccgtct gttgtgtgac 360

tctggtaact agagatccct cagacccttt tagtcagtgt ggaaaatctc tagcagtggc 420

gcccgaacag ggacttgaaa gcgaaaggga aaccagagga gctctctcga cgcaggactc 480

ggcttgctga agcgcgcacg gcaagaggcg aggggcggcg actggtgagt acgccaaaaa 540

ttttgactag cggaggctag aaggagagag atgggtgcga gagcgtcagt attaagcggg 600

ggagaattag atcgcgatgg gaaaaaattc ggttaaggcc agggggaaag aaaaaatata 660

aattaaaaca tatagtatgg gcaagcaggg agctagaacg attcgcagtt aatcctggcc 720

tgttagaaac atcagaaggc tgtagacaaa tactgggaca gctacaacca tcccttcaga 780

caggatcaga agaacttaga tcattatata atacagtagc aaccctctat tgtgtgcatc 840

aaaggataga gataaaagac accaaggaag ctttagacaa gatagaggaa gagcaaaaca 900

aaagtaagac caccgcacag caagcggccg ctgatcttca gacctggagg aggagatatg 960

agggacaatt ggagaagtga attatataaa tataaagtag taaaaattga accattagga 1020

gtagcaccca ccaaggcaaa gagaagagtg gtgcagagag aaaaaagagc agtgggaata 1080

ggagctttgt tccttgggtt cttgggagca gcaggaagca ctatgggcgc agcgtcaatg 1140

acgctgacgg tacaggccag acaattattg tctggtatag tgcagcagca gaacaatttg 1200

ctgagggcta ttgaggcgca acagcatctg ttgcaactca cagtctgggg catcaagcag 1260

ctccaggcaa gaatcctggc tgtggaaaga tacctaaagg atcaacagct cctggggatt 1320

tggggttgct ctggaaaact catttgcacc actgctgtgc cttggaatgc tagttggagt 1380

aataaatctc tggaacagat ttggaatcac acgacctgga tggagtggga cagagaaatt 1440

aacaattaca caagcttaat acactcctta attgaagaat cgcaaaacca gcaagaaaag 1500

aatgaacaag aattattgga attagataaa tgggcaagtt tgtggaattg gtttaacata 1560

acaaattggc tgtggtatat aaaattattc ataatgatag taggaggctt ggtaggttta 1620

agaatagttt ttgctgtact ttctatagtg aatagagtta ggcagggata ttcaccatta 1680

tcgtttcaga cccacctccc aaccccgagg ggacccgaca ggcccgaagg aatagaagaa 1740

gaaggtggag agagagacag agacagatcc attcgattag tgaacggatc tcgacggtat 1800

cggttaactt ttaaaagaaa aggggggatt ggggggtaca gtgcagggga aagaatagta 1860

gacataatag caacagacat acaaactaaa gaattacaaa aacaaattac aaaaattcaa 1920

aattttatcg atcacgagac tagcctcgag aagcttgata tcgacattga ttattgacta 1980

gttattaata gtaatcaatt acggggtcat tagttcatag cccatatatg gagttccgcg 2040

ttacataact tacggtaaat ggcccgcctg gctgaccgcc caacgacccc cgcccattga 2100

cgtcaataat gacgtatgtt cccatagtaa cgccaatagg gactttccat tgacgtcaat 2160

gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat catatgccaa 2220

gtacgccccc tattgacgtc aatgacggta aatggcccgc ctggcattat gcccagtaca 2280

tgaccttatg ggactttcct acttggcagt acatctacgt attagtcatc gctattacca 2340

tggtgatgcg gttttggcag tacatcaatg ggcgtggata gcggtttgac tcacggggat 2400

ttccaagtct ccaccccatt gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg 2460

actttccaaa atgtcgtaac aactccgccc cattgacgca aatgggcggt aggcgtgtac 2520

ggtgggaggt ctatataagc agagctgcca ccatggaacg gctcggagat catcattgcg 2580

nnnnnnngtg agcaagggcg aggagctgtt caccggggtg gtgcccatcc tggtcgagct 2640

ggacggcgac gtaaacggcc acaagttcag cgtgtccggc gagggcgagg gcgatgccac 2700

ctacggcaag ctgaccctga agttcatctg caccaccggc aagctgcccg tgccctggcc 2760

caccctcgtg accaccctga cctacggcgt gcagtgcttc agccgctacc ccgaccacat 2820

gaagcagcac gacttcttca agtccgccat gcccgaaggc tacgtccagg agcgcaccat 2880

cttcttcaag gacgacggca actacaagac ccgcgccgag gtgaagttcg agggcgacac 2940

cctggtgaac cgcatcgagc tgaagggcat cgacttcaag gaggacggca acatcctggg 3000

gcacaagctg gagtacaact acaacagcca caacgtctat atcatggccg acaagcagaa 3060

gaacggcatc aaggtgaact tcaagatccg ccacaacatc gaggacggca gcgtgcagct 3120

cgccgaccac taccagcaga acacccccat cggcgacggc cccgtgctgc tgcccgacaa 3180

ccactacctg agcacccagt ccgccctgag caaagacccc aacgagaagc gcgatcacat 3240

ggtcctgctg gagttcgtga ccgccgccgg gatcactctc ggcatggacg agctgtacaa 3300

ggagggcaga ggaagtcttc taacatgcgg tgacgtggag gagaatcccg gccctatgac 3360

cgagtacaag cccacggtgc gcctcgccac ccgcgacgac gtccccaggg ccgtacgcac 3420

cctcgccgcc gcgttcgccg actaccccgc cacgcgccac accgtcgatc cggaccgcca 3480

catcgagcgg gtcaccgagc tgcaagaact cttcctcacg cgcgtcgggc tcgacatcgg 3540

caaggtgtgg gtcgcggacg acggcgccgc ggtggcggtc tggaccacgc cggagagcgt 3600

cgaagcgggg gcggtgttcg ccgagatcgg cccgcgcatg gccgagttga gcggttcccg 3660

gctggccgcg cagcaacaga tggaaggcct cctggcgccg caccggccca aggagcccgc 3720

gtggttcctg gccaccgtcg gcgtctcgcc cgaccaccag ggcaagggtc tgggcagcgc 3780

cgtcgtgctc cccggagtgg aggcggccga gcgcgccggg gtgcccgcct tcctggagac 3840

ctccgcgccc cgcaacctcc ccttctacga gcggctcggc ttcaccgtca ccgccgacgt 3900

cgaggtgccc gaaggaccgc gcacctggtg catgacccgc aagcccggtg cctgagtcga 3960

caatcaacct ctggattaca aaatttgtga aagattgact ggtattctta actatgttgc 4020

tccttttacg ctatgtggat acgctgcttt aatgcctttg tatcatgcta ttgcttcccg 4080

tatggctttc attttctcct ccttgtataa atcctggttg ctgtctcttt atgaggagtt 4140

gtggcccgtt gtcaggcaac gtggcgtggt gtgcactgtg tttgctgacg caacccccac 4200

tggttggggc attgccacca cctgtcagct cctttccggg actttcgctt tccccctccc 4260

tattgccacg gcggaactca tcgccgcctg ccttgcccgc tgctggacag gggctcggct 4320

gttgggcact gacaattccg tggtgttgtc ggggaagctg acgtcctttc catggctgct 4380

cgcctgtgtt gccacctgga ttctgcgcgg gacgtccttc tgctacgtcc cttcggccct 4440

caatccagcg gaccttcctt cccgcggcct gctgccggct ctgcggcctc ttccgcgtct 4500

tcgccttcgc cctcagacga gtcggatctc cctttgggcc gcctccccgc ctggaattcg 4560

agctcggtac ctttaagacc aatgacttac aaggcagctg tagatcttag ccacttttta 4620

aaagaaaagg ggggactgga agggctaatt cactcccaac gaagacaaga tctgcttttt 4680

gcttgtactg ggtctctctg gttagaccag atctgagcct gggagctctc tggctaacta 4740

gggaacccac tgcttaagcc tcaataaagc ttgccttgag tgcttcaagt agtgtgtgcc 4800

cgtctgttgt gtgactctgg taactagaga tccctcagac ccttttagtc agtgtggaaa 4860

atctctagca gtagtagttc atgtcatctt attattcagt atttataact tgcaaagaaa 4920

tgaatatcag agagtgagag gaacttgttt attgcagctt ataatggtta caaataaagc 4980

aatagcatca caaatttcac aaataaagca tttttttcac tgcattctag ttgtggtttg 5040

tccaaactca tcaatgtatc ttatcatgtc tggctctagc tatcccgccc ctaactccgc 5100

ccatcccgcc cctaactccg cccagttccg cccattctcc gccccatggc tgactaattt 5160

tttttattta tgcagaggcc gaggccgcct cggcctctga gctattccag aagtagtgag 5220

gaggcttttt tggaggccta gggacgtacc caattcgccc tatagtgagt cgtattacgc 5280

gcgctcactg gccgtcgttt tacaacgtcg tgactgggaa aaccctggcg ttacccaact 5340

taatcgcctt gcagcacatc cccctttcgc cagctggcgt aatagcgaag aggcccgcac 5400

cgatcgccct tcccaacagt tgcgcagcct gaatggcgaa tgggacgcgc cctgtagcgg 5460

cgcattaagc gcggcgggtg tggtggttac gcgcagcgtg accgctacac ttgccagcgc 5520

cctagcgccc gctcctttcg ctttcttccc ttcctttctc gccacgttcg ccggctttcc 5580

ccgtcaagct ctaaatcggg ggctcccttt agggttccga tttagtgctt tacggcacct 5640

cgaccccaaa aaacttgatt agggtgatgg ttcacgtagt gggccatcgc cctgatagac 5700

ggtttttcgc cctttgacgt tggagtccac gttctttaat agtggactct tgttccaaac 5760

tggaacaaca ctcaacccta tctcggtcta ttcttttgat ttataaggga ttttgccgat 5820

ttcggcctat tggttaaaaa atgagctgat ttaacaaaaa tttaacgcga attttaacaa 5880

aatattaacg cttacaattt aggtggcact tttcggggaa atgtgcgcgg aacccctatt 5940

tgtttatttt tctaaataca ttcaaatatg tatccgctca tgagacaata accctgataa 6000

atgcttcaat aatattgaaa aaggaagagt atgagtattc aacatttccg tgtcgccctt 6060

attccctttt ttgcggcatt ttgccttcct gtttttgctc acccagaaac gctggtgaaa 6120

gtaaaagatg ctgaagatca gttgggtgca cgagtgggtt acatcgaact ggatctcaac 6180

agcggtaaga tccttgagag ttttcgcccc gaagaacgtt ttccaatgat gagcactttt 6240

aaagttctgc tatgtggcgc ggtattatcc cgtattgacg ccgggcaaga gcaactcggt 6300

cgccgcatac actattctca gaatgacttg gttgagtact caccagtcac agaaaagcat 6360

cttacggatg gcatgacagt aagagaatta tgcagtgctg ccataaccat gagtgataac 6420

actgcggcca acttacttct gacaacgatc ggaggaccga aggagctaac cgcttttttg 6480

cacaacatgg gggatcatgt aactcgcctt gatcgttggg aaccggagct gaatgaagcc 6540

ataccaaacg acgagcgtga caccacgatg cctgtagcaa tggcaacaac gttgcgcaaa 6600

ctattaactg gcgaactact tactctagct tcccggcaac aattaataga ctggatggag 6660

gcggataaag ttgcaggacc acttctgcgc tcggcccttc cggctggctg gtttattgct 6720

gataaatctg gagccggtga gcgtgggtct cgcggtatca ttgcagcact ggggccagat 6780

ggtaagccct cccgtatcgt agttatctac acgacgggga gtcaggcaac tatggatgaa 6840

cgaaatagac agatcgctga gataggtgcc tcactgatta agcattggta actgtcagac 6900

caagtttact catatatact ttagattgat ttaaaacttc atttttaatt taaaaggatc 6960

taggtgaaga tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc 7020

cactgagcgt cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg 7080

cgcgtaatct gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg 7140

gatcaagagc taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca 7200

aatactgttc ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg 7260

cctacatacc tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg 7320

tgtcttaccg ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga 7380

acggggggtt cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac 7440

ctacagcgtg agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat 7500

ccggtaagcg gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc 7560

tggtatcttt atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga 7620

tgctcgtcag gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc 7680

ctggcctttt gctggccttt tgctcacatg ttctttcctg cgttatcccc tgattctgtg 7740

gataaccgta ttaccgcctt tgagtgagct gataccgctc gccgcagccg aacgaccgag 7800

cgcagcgagt cagtgagcga ggaagcggaa gagcgcccaa tacgcaaacc gcctctcccc 7860

gcgcgttggc cgattcatta atgcagctgg cacgacaggt ttcccgactg gaaagcgggc 7920

agtgagcgca acgcaattaa tgtgagttag ctcactcatt aggcacccca ggctttacac 7980

tttatgcttc cggctcgtat gttgtgtgga attgtgagcg gataacaatt tcacacagga 8040

aacagctatg accatgatta cgccaagcgc gcaattaacc ctcactaaag ggaacaaaag 8100

ctggagctgc a 8111
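The entries above use numeric tags in the style of WIPO Standard ST.25: `<210>` gives the sequence identifier, `<211>` the declared length, and `<400>` opens the residues, which are printed in groups of ten with a running count at the end of each line. As a minimal sketch (the function names `parse_listing` and `length_mismatches` are hypothetical and not part of the patent), the declared length of each entry can be checked against the number of residues actually listed:

```python
import re

# A residue line: groups of lowercase nucleotides followed by a running count.
RESIDUE_LINE = re.compile(r"((?:[acgtun]+\s+)+)(\d+)")
TAG_LINE = re.compile(r"<(\d{3})>\s*(.*)")


def parse_listing(text):
    """Return {seq_id: (declared_length, sequence)} for an ST.25-style listing."""
    entries = {}
    seq_id, declared, chunks, in_seq = None, None, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag = TAG_LINE.match(line)
        if tag:
            code, value = tag.groups()
            if code == "210":
                # A new entry starts: store the previous one, if any.
                if seq_id is not None:
                    entries[seq_id] = (declared, "".join(chunks))
                seq_id, declared, chunks = int(value), None, []
            elif code == "211":
                declared = int(value)
            in_seq = code == "400"
        elif in_seq:
            residues = RESIDUE_LINE.fullmatch(line)
            if residues:  # ignore anything that is not a residue line
                chunks.append(re.sub(r"\s+", "", residues.group(1)))
    if seq_id is not None:
        entries[seq_id] = (declared, "".join(chunks))
    return entries


def length_mismatches(entries):
    """List the identifiers whose declared length differs from the residue count."""
    return [sid for sid, (declared, seq) in entries.items() if declared != len(seq)]


# Synthetic entry for illustration only (not taken from the patent).
SAMPLE = """<210> 1

<211> 14

<212> DNA

<213> Artificial sequence

<400> 1

acgtacgtac gtnn 14
"""
print(length_mismatches(parse_listing(SAMPLE)))
```

Running the same check over the full listing text would flag any entry whose `<211>` value disagrees with its residues, which is a quick way to catch truncation during extraction.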

Claims

1. A method of engineering an sgRNA molecule, comprising the step of: introducing a sequence that performs a nuclear localization function in a snoRNA molecule into the nucleotide chain of the sgRNA to obtain the engineered sgRNA molecule.

2. The method of claim 1, wherein: the sgRNA molecule comprises a guide sequence and a backbone sequence, the backbone sequence comprising the sequence that performs a nuclear localization function in the snoRNA molecule;

further, the sequence which plays a nuclear localization function in the snoRNA molecule is a box C' sequence and/or a box D sequence;

still further, the box C' sequence comprises DGAHBN, wherein D is U, G or A; H is U, A or C; B is G, U or C; and N may be any ribonucleotide; the box D sequence comprises NYVWGA or CUGA, wherein N may be any ribonucleotide; Y is C or U; V is C, G or A; and W is U or A;

still further, the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA.
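The degenerate motifs in claim 2 are written with single-letter ambiguity codes, each defined in the claim itself. As an illustrative sketch only (the helper names `motif_to_regex` and `find_motif` are hypothetical), the codes can be translated into regular-expression character classes and used to locate candidate box C' and box D sites in an RNA string:

```python
import re

# Ambiguity codes exactly as defined in the claim text.
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "D": "[UGA]",   # D is U, G or A
    "H": "[UAC]",   # H is U, A or C
    "B": "[GUC]",   # B is G, U or C
    "N": "[ACGU]",  # N is any ribonucleotide
    "Y": "[CU]",    # Y is C or U
    "V": "[CGA]",   # V is C, G or A
    "W": "[UA]",    # W is U or A
}


def motif_to_regex(motif):
    """Translate a degenerate motif such as 'DGAHBN' into a regex body."""
    return "".join(CODES[ch] for ch in motif)


def find_motif(rna, motif):
    """Return the 0-based start of every match, including overlapping ones."""
    # The lookahead makes each match zero-width, so overlapping sites are kept.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(rna)]


# The 23-nt element recited later in the claims.
ELEMENT = "GAGGAAGAGCGUCAGCAGGCUGA"
print(find_motif(ELEMENT, "DGAHBN"))  # candidate box C' sites
print(find_motif(ELEMENT, "CUGA"))    # candidate box D sites
```

A positional match only shows that a string is consistent with the motif; it says nothing about whether the site is functional in a cell.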

3. The method of claim 1 or 2, wherein: the engineered sgRNA molecule can direct a Cas protein into the nucleus;

further, the engineered sgRNA molecule can direct a Cas protein into the nucleus and target or modify a target nucleic acid;

further, the Cas protein is optionally selected from Cas9, Cas12, and Cas13.

4. The method of claim 3, wherein the Cas protein contains no nuclear localization sequence.

5. An engineered sgRNA molecule, wherein the nucleotide chain of the sgRNA comprises a sequence that performs a nuclear localization function in a snoRNA molecule.

6. The engineered sgRNA molecule of claim 5, wherein: the sgRNA molecule comprises a guide sequence and a backbone sequence, the backbone sequence comprising the sequence that performs a nuclear localization function in the snoRNA molecule.

7. The engineered sgRNA molecule of claim 5 or 6, wherein: the sequence that performs a nuclear localization function in the snoRNA molecule is selected from the following sequences (a1) and/or (a2):

(a1) box C' and/or box D sequences from snoRNA,

(a2) box H and/or box ACA sequences from snoRNA.

8. The engineered sgRNA molecule of claim 7, wherein: the sequences of the snoRNA molecule which play a nuclear localization function are box C' and box D;

further, the box C' sequence comprises or is DGAHBN, wherein D is U, G or A; H is U, A or C; B is G, U or C; and N may be any ribonucleotide; the box D sequence comprises or is NYVWGA, GGCUGA or CUGA, wherein N may be any ribonucleotide; Y is C or U; V is C, G or A; and W is U or A;

still further, the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA;

still further, the sequence serving a nuclear localization function in said snoRNA molecule comprises GAGGAAGAGCGUCAGCAGGCUGA;

further, the sequence of the snoRNA molecule which functions as a nuclear localization is GAGGAAGAGCGUCAGCAGGCUGA.

9. The engineered sgRNA molecule of any one of claims 5-8, wherein the snoRNA molecule is optionally selected from: U3 snoRNA, U8 snoRNA, U14 snoRNA, U15 snoRNA, U16 snoRNA, U20 snoRNA, U21 snoRNA, and U24 to U63 snoRNA.

10. The engineered sgRNA molecule of any one of claims 5-9, wherein: the engineered sgRNA molecule is obtained by inserting the sequence that performs a nuclear localization function in the snoRNA molecule into a non-base-paired sequence in the secondary structure of the backbone sequence of the sgRNA molecule before modification, or by replacing that non-base-paired sequence with it;

further, the non-base-paired sequence is a loop sequence in the secondary structure of the backbone sequence of the sgRNA molecule before modification.

11. The engineered sgRNA molecule of claim 10, wherein: the engineered sgRNA molecule is obtained by connecting the backbone sequence of the sgRNA molecule before modification to the sequence that performs a nuclear localization function in the snoRNA molecule through a connecting sequence;

further, one end of the sequence that performs a nuclear localization function in the snoRNA molecule is connected to the backbone sequence of the sgRNA molecule before modification through a connecting sequence 1, and the other end is connected to that backbone sequence through a connecting sequence 2.
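The construction described in claims 10 and 11 amounts to a string operation on the backbone: an unpaired stretch is replaced (or an insertion point is chosen) and the snoRNA-derived element is spliced in, flanked by connecting sequences 1 and 2. The sketch below is illustrative only. The function `graft_element`, the 17-nt toy backbone, the loop coordinates, and the two-nucleotide connecting sequences are all assumptions made for the example and are not taken from the patent; only the 23-nt element is quoted from the claims.

```python
RNA_ALPHABET = set("ACGU")


def graft_element(backbone, loop_start, loop_end, element, linker1="", linker2=""):
    """Replace backbone[loop_start:loop_end] with linker1 + element + linker2.

    With loop_start == loop_end nothing is removed, so the element is
    inserted rather than substituted.
    """
    for name, seq in (("backbone", backbone), ("element", element),
                      ("linker1", linker1), ("linker2", linker2)):
        if not set(seq) <= RNA_ALPHABET:
            raise ValueError(f"{name} contains non-RNA characters")
    if not 0 <= loop_start <= loop_end <= len(backbone):
        raise ValueError("loop coordinates fall outside the backbone")
    return backbone[:loop_start] + linker1 + element + linker2 + backbone[loop_end:]


# Hypothetical toy hairpin: stem GAGCUA / UAGC... with a GAAA loop at [6:10].
TOY_BACKBONE = "GAGCUAGAAAUAGCAAG"
# Element recited in the claims (box C' ... box D).
NLS_ELEMENT = "GAGGAAGAGCGUCAGCAGGCUGA"

engineered = graft_element(TOY_BACKBONE, 6, 10, NLS_ELEMENT,
                           linker1="GG", linker2="CC")
print(engineered)
```

Keeping the coordinates explicit, rather than searching for the loop by pattern, avoids accidentally hitting an identical tetranucleotide elsewhere in a real backbone.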

12. The engineered sgRNA molecule of any one of claims 5-11, wherein: the nucleotide chain of the sgRNA carries more than 2 sequences that perform a nuclear localization function in the snoRNA molecule, and these sequences are connected to one another directly or through a connecting sequence 3.

13. The engineered sgRNA molecule of any one of claims 5-12, wherein: the guide sequence of the engineered sgRNA molecule is 10 bp to 50 bp in length.

14. The engineered sgRNA molecule of any one of claims 5-13, wherein: the backbone sequence of the engineered sgRNA molecule is 10 bp to 300 bp in length.
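The two length ranges in claims 13 and 14 can be expressed as a simple check. This is a minimal sketch that assumes the bounds are inclusive, which the claims do not state explicitly; the function name and the example sequences are hypothetical.

```python
# Ranges quoted from claims 13 and 14; treating both ends as inclusive
# is an assumption made for this sketch.
GUIDE_RANGE = (10, 50)
BACKBONE_RANGE = (10, 300)


def within_claimed_lengths(guide, backbone):
    """True when both sequence lengths fall inside the quoted ranges."""
    guide_ok = GUIDE_RANGE[0] <= len(guide) <= GUIDE_RANGE[1]
    backbone_ok = BACKBONE_RANGE[0] <= len(backbone) <= BACKBONE_RANGE[1]
    return guide_ok and backbone_ok


# Hypothetical 20-nt guide and 40-nt backbone, for illustration only.
example_guide = "GACUGACUGACUGACUGACU"
example_backbone = "GAGCUAGG" + "GAGGAAGAGCGUCAGCAGGCUGA" + "CCUAGCAAG"
print(within_claimed_lengths(example_guide, example_backbone))
```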

15. The engineered sgRNA molecule of any one of claims 5-14, wherein: the engineered sgRNA molecule can direct a Cas protein into the nucleus;

further, the engineered sgRNA molecule can direct a Cas protein into the nucleus and target or modify a target nucleic acid.

16. The engineered sgRNA molecule of claim 15, wherein the Cas protein contains no nuclear localization sequence.

17. The engineered sgRNA molecule of claim 15 or 16, wherein: the Cas protein is optionally selected from Cas9, Cas12, and Cas13.

18. A DNA molecule encoding the engineered sgRNA molecule of any one of claims 5-17.

19. An expression cassette, expression vector, recombinant bacterium or transgenic cell line comprising the DNA molecule of claim 18.

20. A kit comprising any one of:

i. a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;

ii. an expression vector 1 comprising a nucleotide sequence encoding a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;

iii. a Cas protein, and an expression vector 2 comprising a nucleotide sequence encoding the engineered sgRNA molecule of any one of claims 5-17;

iv. an expression vector 1 comprising a nucleotide sequence encoding a Cas protein, and an expression vector 2 comprising a nucleotide sequence encoding the engineered sgRNA molecule of any one of claims 5-17;

v. an expression vector 3 comprising a nucleotide sequence encoding a Cas protein and a nucleotide sequence encoding the engineered sgRNA molecule of any one of claims 5-17.

21. A composition selected from any one of:

I. a composition comprising: a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;

II. a composition comprising: a nucleic acid molecule 1 encoding a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;

III. a composition comprising: a Cas protein, and a nucleic acid molecule 2 encoding the engineered sgRNA molecule of any one of claims 5-17;

IV. a composition comprising: a nucleic acid molecule 1 encoding a Cas protein, and a nucleic acid molecule 2 encoding the engineered sgRNA molecule of any one of claims 5-17;

V. a composition comprising: a nucleic acid molecule 3 encoding a Cas protein and the engineered sgRNA molecule of any one of claims 5-17.

22. The engineered sgRNA molecule of any one of claims 5-17, the DNA molecule of claim 18, the expression cassette, expression vector, recombinant bacterium, or transgenic cell line of claim 19, the kit of claim 20, or the composition of claim 21, for use in any one of:

P1. targeting or modifying a genomic target nucleic acid; and

P2. preparing a product for targeting or modifying a genomic target nucleic acid.

23. A method of targeting or modifying a genomic target nucleic acid comprising: introducing the composition of claim 21 into an organism or biological cell such that both the Cas protein and the engineered sgRNA molecule are expressed, resulting in targeting or modification of a genomic target nucleic acid.

24. A method of making a mutant biological cell, comprising: targeting or modifying the genome of a biological cell by the method of claim 23 to obtain a mutant of the biological cell.

25. A method of making a biological mutant, comprising: targeting or modifying the genome of an organism by the method of claim 23 to obtain a biological mutant.