CN114990104B - Engineered sgRNA molecules and uses thereof - Google Patents

Publication number
CN114990104B
CN114990104B CN202210539746.4A CN202210539746A CN114990104B CN 114990104 B CN114990104 B CN 114990104B CN 202210539746 A CN202210539746 A CN 202210539746A CN 114990104 B CN114990104 B CN 114990104B
Authority
CN
China
Prior art keywords
sequence
molecule
sgrna
engineered
box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210539746.4A
Other languages
Chinese (zh)
Other versions
CN114990104A (en
Inventor
梁峻彬
梁兴祥
徐辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ruifeng Biotechnology Co ltd
Original Assignee
Guangzhou Ruifeng Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ruifeng Biotechnology Co ltd
Publication of CN114990104A
Application granted
Publication of CN114990104B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/06 Animal cells or tissues; Human cells or tissues
    • C12N5/0602 Vertebrate cells
    • C12N5/0684 Cells of the urinary tract or kidneys
    • C12N5/0686 Kidney cells
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2310/30 Chemical structure
    • C12N2310/31 Chemical structure of the backbone
    • C12N2510/00 Genetically modified cells
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/80 Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites
    • C12N2810/00 Vectors comprising a targeting moiety
    • C12N2810/10 Vectors comprising a non-peptidic targeting moiety

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Urology & Nephrology (AREA)
  • Medicinal Chemistry (AREA)
  • Cell Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The invention discloses an engineered sgRNA molecule and uses thereof. The engineered sgRNA molecule is obtained by introducing, into the nucleotide chain of an sgRNA, a sequence from a snoRNA molecule that has a nuclear localization function. The invention provides a novel CRISPR-Cas system in which the engineered sgRNA can guide a Cas protein into the nucleus for effective gene editing.

Description

Engineered sgRNA molecules and uses thereof
Technical Field
The invention relates to the field of biotechnology, in particular to an engineered sgRNA molecule and uses thereof.
Background
The CRISPR/Cas system is an adaptive immune system found in many bacteria and most archaea. It recognizes and eliminates foreign plasmids or phages and retains fragments of the foreign genes in the host genome as an immunological memory. Naturally occurring CRISPR-Cas systems fall into two main classes: class 1 systems use multi-protein complexes for nucleic acid cleavage, whereas class 2 systems use single-protein effectors. Because of the advantages of single-protein effectors, class 2 systems are the most widely used CRISPR tools in biological research and translational applications. Class 2 is further subdivided into types II, V and VI, each using a different type of Cas protein. Among the Cas proteins of class 2 systems, some type II Cas9 and type V Cas12 proteins have RNA-guided DNA endonuclease activity, while type VI Cas13 appears to preferentially target and cleave RNA.
Cas9 and Cas12 effectors from class 2 CRISPR systems are RNA-guided endonucleases that can generate double-strand breaks (DSBs) in a target DNA sequence. The CRISPR/Cas system mainly comprises a Cas protein and a single guide RNA (sgRNA). The Cas protein cleaves double-stranded DNA, and the sgRNA provides guidance: through base complementary pairing, the sgRNA directs the Cas protein to different target sites, where the target gene is cleaved to achieve precise, site-specific gene editing.
At present, CRISPR/Cas systems are the most popular and most widely used gene editing systems because they are simple to operate and allow precise editing of nucleic acid sequences. However, Cas-mediated editing acts at the nucleic acid level, so the Cas protein must enter the nucleus and bind chromosomal DNA to function. Currently, the main approach to nuclear entry is to fuse the Cas protein with a nuclear localization signal (also called a nuclear localization sequence, NLS): the NLS-bearing Cas protein forms a complex with the sgRNA and then interacts with a nuclear import carrier, so that the Cas protein is transported into the nucleus and can function.
snoRNAs (small nucleolar RNAs) are a class of small non-coding RNAs that are highly abundant in the nucleus. Most snoRNAs fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, each with a conserved, characteristic secondary structure. snoRNAs guide site-specific 2'-O-ribose methylation and pseudouridylation of nucleosides in rRNA, snRNA or tRNA precursors. Several reports indicate that the box C' and box D sequences of snoRNAs play an important role in the formation and interactions of snoRNA ribonucleoproteins (snoRNPs).
Disclosure of Invention
The invention aims to provide an engineered sgRNA molecule and application thereof.
In a first aspect, the invention claims a method of engineering an sgRNA molecule.
The method of engineering an sgRNA molecule claimed by the invention may comprise the following step: introducing, into the nucleotide chain of the sgRNA, a sequence from a snoRNA molecule that has a nuclear localization function, to obtain the engineered sgRNA molecule.
In some cases, the engineered sgRNA molecule comprises a guide sequence and a backbone sequence, and the backbone sequence comprises the sequence with a nuclear localization function in the snoRNA molecule. The backbone sequence of the engineered sgRNA molecule may be obtained by inserting the nuclear localization functional sequence into, or substituting it into, the backbone sequence of the pre-engineered sgRNA molecule.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule may be selected from the sequences shown in (a1) and/or (a2) below:
(a1) Box C' and/or box D sequences from snoRNA;
(a2) Box H and/or box ACA sequences from snoRNA.
Further, the sequence with a nuclear localization function in the snoRNA molecule is a box C' and/or box D sequence.
In some cases, the box C' sequence comprises or is the sequence DGAHBN, wherein D is U, G or A; H is U, A or C; B is G, U or C; N may be any ribonucleotide.
In some cases, the box D sequence comprises the sequence NYVWGA or CUGA. Further, the box D sequence comprises the sequence NYVWGA or GGCUGA. Still further, the box D sequence is the sequence NYVWGA, GGCUGA or CUGA. Here, N may be any ribonucleotide; Y is C or U; V is C, G or A; W is U or A.
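The degenerate motifs above can be checked mechanically. The following is a minimal sketch, assuming the motifs are read as ordinary degenerate-base patterns and translated to regular expressions; the helper names (`motif_to_regex`, `contains_motif`, `is_motif`) are hypothetical and not part of the patent. The test sequences GAGGAAGA and GGCUGA are the U3 snoRNA box C' and box D sequences given in the specific embodiment of this description.

```python
import re

# Degenerate symbols exactly as defined in the text (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "D": "[UGA]",   # D is U, G or A
    "H": "[UAC]",   # H is U, A or C
    "B": "[GUC]",   # B is G, U or C
    "Y": "[CU]",    # Y is C or U
    "V": "[CGA]",   # V is C, G or A
    "W": "[UA]",    # W is U or A
    "N": "[ACGU]",  # N is any ribonucleotide
}

BOX_C_PRIME = "DGAHBN"
BOX_D = "NYVWGA"


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a degenerate motif into a compiled regular expression."""
    return re.compile("".join(IUPAC_RNA[symbol] for symbol in motif))


def contains_motif(seq: str, motif: str) -> bool:
    """True if the motif occurs anywhere in seq ("comprises")."""
    return motif_to_regex(motif).search(seq.upper()) is not None


def is_motif(seq: str, motif: str) -> bool:
    """True if the whole of seq is an instance of the motif ("is")."""
    return motif_to_regex(motif).fullmatch(seq.upper()) is not None


if __name__ == "__main__":
    print(contains_motif("GAGGAAGA", BOX_C_PRIME))  # comprises DGAHBN?
    print(is_motif("GGCUGA", BOX_D))                # instance of NYVWGA?
    print(contains_motif("GGCUGA", "CUGA"))         # comprises CUGA?
```

Under this reading, GAGGAAGA comprises an instance of DGAHBN (the hexamer GGAAGA), while GGCUGA is not itself an instance of NYVWGA, which is consistent with the text listing GGCUGA and CUGA as separate alternatives for box D.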
In some cases, the sequences with a nuclear localization function in the snoRNA molecule are box C' and box D, wherein the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule comprises GAGGAAGAGCGUCAGCAGGCUGA.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule is GAGGAAGAGCGUCAGCAGGCUGA.
In some cases, the snoRNA molecule is selected from any snoRNA comprising a box C' or box D sequence. In some cases, the snoRNA molecule is any one selected from the group consisting of: U3 snoRNA, U8 snoRNA, U14 snoRNA, U15 snoRNA, U16 snoRNA, U20 snoRNA, U21 snoRNA and U24 to U63 snoRNA.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule is inserted or substituted into a non-complementary-pairing sequence (i.e., a portion that does not form intramolecular base pairs) in the secondary structure of the backbone sequence of the pre-engineered sgRNA molecule. The secondary structure of the backbone sequence may be predicted by a person skilled in the art using conventional computational methods, or determined by conventional experimental methods. The complementary pairing may be conventional A-U and C-G base pairing, and may or may not include other, less common pairings (e.g., G-U, A-A, A-C, A-G, G-G, U-U, U-C).
Further, in some cases, the non-complementary-pairing sequence is a loop-forming sequence in the secondary structure of the backbone sequence of the pre-engineered sgRNA molecule (i.e., a loop in a stem-loop structure of the sgRNA backbone sequence).
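To illustrate how candidate loop positions might be located once a secondary structure is available, here is a minimal sketch. It assumes the structure is supplied in dot-bracket notation (a common output format of RNA folding tools, not something the patent specifies); the function name `hairpin_loops` and the example structure string are hypothetical.

```python
import re


def hairpin_loops(dot_bracket: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of hairpin loops.

    A hairpin loop is a run of unpaired positions ('.') enclosed directly
    by one base pair, i.e. an opening '(' followed only by dots and then ')'.
    """
    return [
        (m.start(1) + 1, m.end(1))
        for m in re.finditer(r"\((\.+)\)", dot_bracket)
    ]


if __name__ == "__main__":
    # Hypothetical structure: two stem-loops separated by a short linker.
    structure = "((((....))))..(((.....)))"
    for start, end in hairpin_loops(structure):
        print(f"loop at positions {start}-{end}")
```

Only hairpin loops are reported here; other unpaired regions (bulges, internal loops, single-stranded linkers) would need additional handling if they were also to be considered as insertion sites.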
The engineered sgRNA molecules of the invention may comprise engineered or non-engineered crRNA sequences, as well as engineered or non-engineered tracrRNA sequences. For clarity, the description refers to the non-limiting example shown in fig. 2 (the sequence comprising loop 1 and loop 2 in fig. 2 is the sgRNA backbone sequence corresponding to SpCas9). The nuclear localization functional sequence may be linked (inserted or substituted) at the position in the sgRNA secondary structure where the crRNA sequence is joined to the tracrRNA sequence, such as the loop 1 position in fig. 2. Alternatively, the nuclear localization functional sequence may be linked (inserted or substituted) within the crRNA sequence or within the tracrRNA sequence of the sgRNA; as a non-limiting example, it may be linked at a loop formed only by the tracrRNA sequence in the sgRNA secondary structure, such as the loop 2 position in fig. 2. After insertion of the nuclear localization functional sequence, the number of nucleotides originally belonging to the pre-engineered sgRNA is not reduced; for example, the number of nucleotides originally belonging to that loop is not reduced. After substitution with the nuclear localization functional sequence, the number of nucleotides originally belonging to the pre-engineered sgRNA is reduced; for example, the number of nucleotides originally belonging to that loop is reduced (e.g., all nucleotides originally belonging to the loop are replaced by the nuclear localization functional sequence).
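The difference between insertion and substitution can be made concrete with a small sketch. The toy stem-loop below (`GGGC` + `GAAA` + `GCCC`) is a hypothetical stand-in and not the SpCas9 backbone; the function names are likewise assumptions, and placing the insertion at the midpoint of the loop is an arbitrary choice for illustration.

```python
PAYLOAD = "GAGGAAGAGCGUCAGCAGGCUGA"  # nuclear localization functional sequence from the text


def insert_in_loop(seq: str, loop_start: int, loop_end: int, payload: str) -> str:
    """Insert payload in the middle of the loop; no original nucleotide is removed.

    loop_start and loop_end are 1-based inclusive positions of the loop.
    """
    mid = (loop_start - 1 + loop_end) // 2  # 0-based index that splits the loop
    return seq[:mid] + payload + seq[mid:]


def substitute_loop(seq: str, loop_start: int, loop_end: int, payload: str) -> str:
    """Replace every nucleotide of the loop with payload."""
    return seq[:loop_start - 1] + payload + seq[loop_end:]


if __name__ == "__main__":
    toy = "GGGC" + "GAAA" + "GCCC"  # hypothetical stem-loop, loop at positions 5-8
    inserted = insert_in_loop(toy, 5, 8, PAYLOAD)
    substituted = substitute_loop(toy, 5, 8, PAYLOAD)
    print(len(toy), len(inserted), len(substituted))
```

With insertion the result is longer than the original by the full payload length, whereas with substitution the four loop nucleotides are lost, matching the nucleotide-count distinction drawn in the paragraph above.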
In some cases, a linker sequence may or may not be used when introducing the sequence with a nuclear localization function in the snoRNA molecule into the nucleotide chain of the sgRNA. The sequence with a nuclear localization function may be linked to the backbone sequence of the pre-engineered sgRNA molecule by a linker sequence. Furthermore, one end of the sequence with a nuclear localization function is linked to the backbone sequence of the pre-engineered sgRNA molecule by linker sequence 1 (Linker1), and the other end is linked to the backbone sequence by linker sequence 2 (Linker2); Linker1 and Linker2 may be the same or different. Still further, the nucleotide sequence of Linker1 is shown at positions 38-42 of SEQ ID No.1 (ggcca), and the nucleotide sequence of Linker2 is shown at positions 66-75 of SEQ ID No.1 (cugcagggcc).
Further, when two or more sequences with a nuclear localization function in the snoRNA molecule are present on the nucleotide chain of the engineered sgRNA, these sequences may be linked directly or through a linker sequence. When the sequence introduced into the nucleotide chain of the sgRNA consists of two or more box sequences, the different box sequences may be linked directly or through a linker sequence (referred to as linker sequence 3, Linker3). Linker3 may be the same as or different from Linker1 and Linker2, which link the nuclear localization functional sequence to the backbone sequence of the pre-engineered sgRNA molecule. Further, the nucleotide sequence of Linker3 is shown at positions 51-59 of SEQ ID No.1 (GCGUCAGCA).
In a specific embodiment of the invention, the sequences with a nuclear localization function in the snoRNA molecule are box C' and box D of U3 snoRNA. Specifically, the box C' sequence of U3 snoRNA is GAGGAAGA, and the box D sequence is GGCUGA/CUGA. The sgRNA backbone is the sgRNA backbone sequence corresponding to SpCas9.
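The stated positions of the linkers in SEQ ID No.1 can be cross-checked against the lengths of the box sequences. The sketch below assumes that the elements are arranged contiguously as Linker1, box C', Linker3, box D (GGCUGA form), Linker2, starting at position 38; that ordering is an inference from the positions given in the text, not something the text states directly, and the function name `layout` is hypothetical.

```python
# Sequences as given in the text (linkers written in upper case here).
LINKER1 = "GGCCA"          # stated positions 38-42 of SEQ ID No.1
BOX_C_PRIME = "GAGGAAGA"   # U3 snoRNA box C'
LINKER3 = "GCGUCAGCA"      # stated positions 51-59 of SEQ ID No.1
BOX_D = "GGCUGA"           # U3 snoRNA box D
LINKER2 = "CUGCAGGGCC"     # stated positions 66-75 of SEQ ID No.1

# Assumed contiguous arrangement (inferred, see lead-in).
PARTS = [
    ("Linker1", LINKER1),
    ("box C'", BOX_C_PRIME),
    ("Linker3", LINKER3),
    ("box D", BOX_D),
    ("Linker2", LINKER2),
]


def layout(parts: list[tuple[str, str]], start: int) -> dict[str, tuple[int, int]]:
    """Assign 1-based inclusive coordinates to consecutive elements."""
    coords = {}
    pos = start
    for name, seq in parts:
        coords[name] = (pos, pos + len(seq) - 1)
        pos += len(seq)
    return coords


if __name__ == "__main__":
    for name, (first, last) in layout(PARTS, start=38).items():
        print(f"{name}: {first}-{last}")
```

Under this assumed arrangement the computed spans for Linker1, Linker3 and Linker2 coincide with the positions stated in the text, and box C' + Linker3 + box D concatenates to GAGGAAGAGCGUCAGCAGGCUGA, the 23-nucleotide sequence given earlier.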
More specifically, the nucleotide sequence of the engineered sgRNA molecule is any one of the following:
(b1) the sequence obtained by replacing positions 1 to 25 of SEQ ID No.1 with a guide sequence for recognizing a target nucleic acid,
wherein in (b1), loop 1 of the sgRNA backbone sequence corresponding to SpCas9 is replaced with box C' and box D of U3 snoRNA, and loop 2 is likewise replaced with box C' and box D of U3 snoRNA;
(b2) the sequence obtained by replacing positions 1 to 25 of SEQ ID No.5 with a guide sequence for recognizing a target nucleic acid,
wherein in (b2), loop 1 of the sgRNA backbone sequence corresponding to SpCas9 is replaced with box C' and box D of U3 snoRNA, and loop 2 is not modified;
(b3) the sequence obtained by replacing positions 1 to 25 of SEQ ID No.8 with a guide sequence for recognizing a target nucleic acid,
wherein in (b3), loop 2 of the sgRNA backbone sequence corresponding to SpCas9 is replaced with box C' and box D of U3 snoRNA, and loop 1 is not modified.
For clarity, the length of the guide sequence described in (b1), (b2) and (b3) above may vary moderately and is not limited to 25 bp.
Positions 1 to 25 of SEQ ID No.1, of SEQ ID No.5 and of SEQ ID No.8 are each the guide sequence (GAACGGCUCGGAGAUCAUCAUUGCG) used in the examples to recognize the target nucleic acid.
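Replacing the guide portion of a template sequence is a simple string operation, sketched below. The function name `swap_guide` and the short `PLACEHOLDER_BACKBONE` are hypothetical; the placeholder merely stands in for the remainder of SEQ ID No.1, 5 or 8, which is not reproduced in this section.

```python
EXAMPLE_GUIDE = "GAACGGCUCGGAGAUCAUCAUUGCG"  # guide used in the examples (25 nt)
PLACEHOLDER_BACKBONE = "GUUUUAGAGCUA"        # hypothetical stand-in, NOT the real backbone


def swap_guide(template: str, new_guide: str, old_guide_len: int = 25) -> str:
    """Replace positions 1..old_guide_len of template with new_guide.

    The new guide may differ in length from the old one, since the text
    notes that the guide length is not limited to 25.
    """
    guide = new_guide.upper().replace("T", "U")  # accept DNA-style input
    if not guide or set(guide) - set("ACGU"):
        raise ValueError("guide must be a non-empty A/C/G/U sequence")
    if len(template) <= old_guide_len:
        raise ValueError("template is shorter than the guide region")
    return guide + template[old_guide_len:]


if __name__ == "__main__":
    template = EXAMPLE_GUIDE + PLACEHOLDER_BACKBONE
    print(swap_guide(template, "ACGUACGUACGUACGUACGU"))
```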
In some cases, the guide sequence of the engineered sgRNA molecule of the invention may have a length of ≥10 bp, ≥15 bp, ≥20 bp, ≥25 bp, ≥30 bp or ≥40 bp, or a length of ≤60 bp, ≤50 bp, ≤40 bp, ≤30 bp, ≤25 bp, ≤20 bp or ≤15 bp. In some cases, the guide sequence of the engineered sgRNA molecule of the invention may have a length of 10 bp-50 bp, 10 bp-40 bp, 15 bp-35 bp, 15 bp-30 bp, 15 bp-25 bp, 17 bp-24 bp or 18 bp-22 bp, or of 20 bp-35 bp, 25 bp-35 bp or 28 bp-32 bp.
In some cases, the backbone sequence (comprising the box sequence from the snoRNA) of the engineered sgRNA molecule of the invention may have a length of ≥15 bp, ≥20 bp, ≥25 bp, ≥30 bp, ≥40 bp, ≥50 bp, ≥60 bp, ≥70 bp, ≥80 bp, ≥90 bp, ≥100 bp, ≥110 bp, ≥120 bp, ≥130 bp, ≥140 bp, ≥150 bp, ≥160 bp, ≥170 bp, ≥180 bp, ≥200 bp, ≥210 bp or ≥220 bp, or a length of ≤200 bp, ≤350 bp, ≤50 bp, ≤100 bp, ≤140 bp or ≤60 bp. In some cases, the backbone sequence (comprising the box sequence from the snoRNA) of the engineered sgRNA molecule of the invention may have a length of 10 bp-300 bp, 20 bp-250 bp, 30 bp-240 bp, 50 bp-220 bp, 80 bp-200 bp or 100 bp-180 bp.
In some cases, the engineered sgRNA molecule of the invention may have a length of ≥15 bp, ≥20 bp, ≥25 bp, ≥30 bp, ≥40 bp, ≥50 bp, ≥60 bp, ≥70 bp, ≥80 bp, ≥90 bp, ≥100 bp, ≥110 bp, ≥120 bp, ≥130 bp, ≥140 bp, ≥150 bp, ≥160 bp, ≥170 bp, ≥180 bp, ≥190 bp or ≥210 bp, or a length of ≤350 bp, ≤180 bp, ≤170 bp, ≤160 bp, ≤150 bp, ≤140 bp, ≤130 bp, ≤20 bp or ≤50 bp. In some cases, the engineered sgRNA molecule of the invention may have a length of 10 bp-300 bp, 30 bp-250 bp, 50 bp-240 bp, 70 bp-240 bp, 90 bp-220 bp or 110 bp-200 bp.
In some cases, the engineered sgRNA molecule can be used in combination with a Cas protein for gene targeting or modification, for example in eukaryotic cells, further in animal cells, and still further in human cells. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence. In some cases, the Cas protein contains a nuclear localization sequence.
The engineered sgRNA molecule can direct a Cas protein to the nucleus and, further, to a target nucleic acid in the nucleus. In some cases, the engineered sgRNA can direct the Cas protein to the nucleus and target or modify the target nucleic acid. In some cases, the engineered sgRNA can form a complex with the Cas protein, guide the Cas protein to the nucleus, and target or modify the target nucleic acid. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, targeting the target nucleic acid consists of one or more of the following: cleaving one or more target nucleic acids, visualizing or detecting the one or more target nucleic acids, labeling the one or more target nucleic acids, transporting the one or more target nucleic acids, masking the one or more target nucleic acids, binding the one or more target nucleic acids, increasing the level of transcription and/or translation of the gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of the gene to which the target sequence belongs. Further, in some cases, targeting the target nucleic acid is binding the target nucleic acid; in some cases, targeting the target nucleic acid is cleaving the target nucleic acid.
In some cases, modifying the target nucleic acid consists of one or more of the following: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of a nucleic acid, demethylation of a nucleic acid, and deamination of a nucleic acid.
In some cases, the sgRNA comprises at least one chemically modified nucleotide; non-limiting examples of the chemical modification include 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), 2'-fluoro (2'-F) and phosphorothioate (P=S) internucleotide linkage modifications. The chemical modification may be present on any number of nucleotides at any position. In some cases, the sgRNA comprises modifications at the 5' end and/or the 3' end.
In some cases, the engineered sgRNA molecule may additionally carry any number of added nucleotides; as a non-limiting example, in patent publication No. CN104968784B the end of the sgRNA guide sequence also contains 2 additional guanine nucleotides.
In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13. In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12 and Cas13 lacking Cas endonuclease activity, including but not limited to fully inactivated (dead) Cas proteins. In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to Cas nickases that retain only single-strand cleavage activity, e.g., Cas9 nickase (nCas9) and Cas12 nickase. In some cases, the Cas9 is SpCas9 (Streptococcus pyogenes Cas9).
It is understood that any corresponding method of engineering an sgRNA molecule falls within the scope of the invention, provided that the engineered sgRNA molecule comprising the sequence with a nuclear localization function in the snoRNA molecule is capable of directing a given Cas protein (without a nuclear localization sequence) into the nucleus.
In a second aspect, the invention claims engineered sgRNA molecules.
In some cases, the engineered sgRNA molecule is prepared by the method of the first aspect described above.
In some cases, the nucleotide chain of the engineered sgRNA molecule comprises a sequence with a nuclear localization function in a snoRNA molecule. Further, in some cases, the engineered sgRNA molecule comprises a guide sequence and a backbone sequence, and the backbone sequence comprises the sequence with a nuclear localization function in the snoRNA molecule. The backbone sequence of the engineered sgRNA molecule may be obtained by inserting the sequence with a nuclear localization function in the snoRNA molecule into, or substituting it into, the backbone sequence of the pre-engineered sgRNA molecule.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule may be selected from the sequences shown in (a1) and/or (a2) below:
(a1) Box C' and/or box D sequences from snoRNA;
(a2) Box H and/or box ACA sequences from snoRNA.
Further, the sequence with a nuclear localization function in the snoRNA molecule is a box C' sequence and/or a box D sequence.
In some cases, the box C' sequence comprises or is the sequence DGAHBN, wherein D is U, G or A; H is U, A or C; B is G, U or C; N may be any ribonucleotide.
In some cases, the box D sequence comprises the sequence NYVWGA or CUGA. Further, the box D sequence comprises the sequence NYVWGA or GGCUGA. Still further, the box D sequence is the sequence NYVWGA, GGCUGA or CUGA. Here, N may be any ribonucleotide; Y is C or U; V is C, G or A; W is U or A.
In some cases, the sequences with a nuclear localization function in the snoRNA molecule are box C' and box D, wherein the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule comprises GAGGAAGAGCGUCAGCAGGCUGA.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule is GAGGAAGAGCGUCAGCAGGCUGA.
In some cases, the snoRNA molecule is selected from any snoRNA comprising a box C' or box D sequence. In some cases, the snoRNA molecule is any one selected from the group consisting of: U3 snoRNA, U8 snoRNA, U14 snoRNA, U15 snoRNA, U16 snoRNA, U20 snoRNA, U21 snoRNA and U24 to U63 snoRNA.
In some cases, the sequence with a nuclear localization function in the snoRNA molecule is inserted or substituted into a non-complementary-pairing sequence (i.e., a portion that does not form intramolecular base pairs) in the secondary structure of the backbone sequence of the pre-engineered sgRNA molecule. The secondary structure of the backbone sequence may be predicted by a person skilled in the art using conventional computational methods, or determined by conventional experimental methods. The complementary pairing may be conventional A-U and C-G base pairing, and may or may not include other, less common pairings (e.g., G-U, A-A, A-C, A-G, G-G, U-U, U-C).
Further, in some cases, the non-complementary-pairing sequence is a loop-forming sequence in the secondary structure of the backbone sequence of the pre-engineered sgRNA molecule (i.e., a loop in a stem-loop structure of the sgRNA backbone sequence).
The engineered sgRNA molecules of the invention may comprise engineered or non-engineered crRNA sequences, as well as engineered or non-engineered tracrRNA sequences. For clarity, the description refers to the non-limiting example shown in fig. 2 (the sequence comprising loop 1 and loop 2 in fig. 2 is the sgRNA backbone sequence corresponding to SpCas9). The nuclear localization functional sequence may be linked (inserted or substituted) at the position in the sgRNA secondary structure where the crRNA sequence is joined to the tracrRNA sequence, such as the loop 1 position in fig. 2. Alternatively, the nuclear localization functional sequence may be linked (inserted or substituted) within the crRNA sequence or within the tracrRNA sequence of the sgRNA; as a non-limiting example, it may be linked at a loop formed only by the tracrRNA sequence in the sgRNA secondary structure, such as the loop 2 position in fig. 2. After insertion of the nuclear localization functional sequence, the number of nucleotides originally belonging to the pre-engineered sgRNA is not reduced; for example, the number of nucleotides originally belonging to that loop is not reduced. After substitution with the nuclear localization functional sequence, the number of nucleotides originally belonging to the pre-engineered sgRNA is reduced; for example, the number of nucleotides originally belonging to that loop is reduced (e.g., all nucleotides originally belonging to the loop are replaced by the nuclear localization functional sequence).
In some cases, a linker sequence (linker) may or may not be used when introducing the sequence with a nuclear localization function from the snoRNA molecule into the nucleotide chain of the sgRNA. The sequence with a nuclear localization function from the snoRNA molecule can be linked to the backbone sequence of the pre-engineered sgRNA molecule by a linker sequence. Furthermore, one end of the sequence with a nuclear localization function is linked to the backbone sequence of the pre-engineered sgRNA molecule through linker sequence 1 (Linker1), and the other end is linked to the backbone sequence of the pre-engineered sgRNA molecule through linker sequence 2 (Linker2); Linker1 and Linker2 can be the same or different. Still further, the nucleotide sequence of Linker1 is shown at positions 38-42 of SEQ ID No.1 (ggcca), and the nucleotide sequence of Linker2 is shown at positions 66-75 of SEQ ID No.1 (cugcagggcc).
Further, when 2 or more sequences with a nuclear localization function from the snoRNA molecule are present on the nucleotide chain of the engineered sgRNA, these nuclear localization functional sequences may be directly linked or may be linked by a linker sequence. When the sequence with a nuclear localization function introduced into the nucleotide chain of the sgRNA consists of two or more box sequences, the different box sequences may be directly linked or may be linked by a linker sequence (which may be referred to as linker sequence 3 [Linker3]). This linker sequence may be the same as or different from the linker sequences linking the nuclear localization functional sequence to the backbone sequence of the pre-engineered sgRNA molecule; that is, Linker3 may be the same as or different from Linker1 and Linker2. Further, the nucleotide sequence of Linker3 is shown at positions 51-59 of SEQ ID No.1 (GCGUCAGCA).
In a specific embodiment of the invention, the sequences with a nuclear localization function from the snoRNA molecule are box C' and box D of U3 snoRNA. Specifically, the box C' sequence of the U3 snoRNA is GAGGAAGA, and the box D sequence is GGCUGA/CUGA. The backbone of the sgRNA is the sgRNA backbone sequence corresponding to SpCas9.
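The stated linker coordinates can be cross-checked against the stated box sequences. The following is a minimal sketch, assuming that the elements at the loop 1 position of SEQ ID No.1 are contiguous in the order Linker1, box C', Linker3, box D, Linker2, and that box D is the 6-nucleotide form GGCUGA; the coordinates of box C' and box D are inferred under this assumption and are not stated in the text. The helper name `layout` is hypothetical.

```python
# Sequences as given in the text (RNA, 5'->3'); linker case normalized to upper.
LINKER1 = "GGCCA"          # stated: SEQ ID No.1 positions 38-42
BOX_C_PRIME = "GAGGAAGA"   # box C' of U3 snoRNA
LINKER3 = "GCGUCAGCA"      # stated: SEQ ID No.1 positions 51-59
BOX_D = "GGCUGA"           # box D of U3 snoRNA (6-nt form, an assumption)
LINKER2 = "CUGCAGGGCC"     # stated: SEQ ID No.1 positions 66-75


def layout(parts, first_position):
    """Assign 1-based inclusive coordinates to consecutive sequence parts."""
    coords = {}
    pos = first_position
    for name, seq in parts:
        coords[name] = (pos, pos + len(seq) - 1)
        pos += len(seq)
    return coords


parts = [
    ("Linker1", LINKER1),
    ("box C'", BOX_C_PRIME),
    ("Linker3", LINKER3),
    ("box D", BOX_D),
    ("Linker2", LINKER2),
]
coords = layout(parts, first_position=38)
cassette = "".join(seq for _, seq in parts)

for name, span in coords.items():
    print(name, span)
```

Under the contiguity assumption, the computed spans for Linker3 and Linker2 coincide with the positions given in the text, which suggests the assumed element order is consistent with SEQ ID No.1.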
More specifically, the nucleotide sequence of the engineered sgRNA molecule is any one of the following:
(b1) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.1 with a guide sequence for recognizing a target nucleic acid,
wherein, in (b1), the loop 1 position of the sgRNA backbone sequence corresponding to SpCas9 is substituted with box C' and box D of U3 snoRNA, and the loop 2 position is also substituted with box C' and box D of U3 snoRNA;
(b2) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.5 with a guide sequence for recognizing the target nucleic acid,
wherein, in (b2), the loop 1 position of the sgRNA backbone sequence corresponding to SpCas9 is substituted with box C' and box D of U3 snoRNA, and the loop 2 position is not modified;
(b3) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.8 with a guide sequence for recognizing the target nucleic acid,
wherein, in (b3), the loop 2 position of the sgRNA backbone sequence corresponding to SpCas9 is substituted with box C' and box D of U3 snoRNA, and the loop 1 position is not modified.
For clarity, the length of the guide sequence described in (b1), (b2) and (b3) above may vary moderately and is not limited to 25 bp.
Positions 1 to 25 of SEQ ID No.1, positions 1 to 25 of SEQ ID No.5 and positions 1 to 25 of SEQ ID No.8 are all the guide sequence (GAACGGCUCGGAGAUCAUCAUUGCG) used in the examples for recognizing the target nucleic acid.
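The replacement of positions 1 to 25 described in (b1) to (b3) can be sketched as a simple string operation. This is an illustration only: the function name `replace_guide`, the placeholder backbone and the 20-nucleotide replacement guide are hypothetical, and the full SEQ ID sequences are not reproduced here.

```python
def replace_guide(sgrna: str, new_guide: str, start: int = 1, end: int = 25) -> str:
    """Replace 1-based inclusive positions start..end of an sgRNA with a new guide.

    The new guide may differ in length from the window it replaces, consistent
    with the statement that the guide length is not limited to 25.
    """
    if not (1 <= start <= end <= len(sgrna)):
        raise ValueError("guide window lies outside the sequence")
    new_guide = new_guide.upper()
    if set(new_guide) - set("ACGU"):
        raise ValueError("guide must be an RNA sequence (A, C, G, U)")
    return sgrna[: start - 1] + new_guide + sgrna[end:]


EXAMPLE_GUIDE = "GAACGGCUCGGAGAUCAUCAUUGCG"   # guide used in the examples (from the text)
PLACEHOLDER_BACKBONE = "N" * 10               # hypothetical stand-in for the real backbone
template = EXAMPLE_GUIDE + PLACEHOLDER_BACKBONE

# Hypothetical 20-nt guide, used only to show that the length may change.
new_sgrna = replace_guide(template, "ACGUACGUACGUACGUACGU")
print(len(template), len(new_sgrna))
```

The backbone portion is left untouched, which mirrors the statement elsewhere in the description that the backbone sequence generally does not change when the target sequence changes.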
In some cases, the sequence length of the guide sequence of the modified sgRNA molecule of the present invention may be 10bp, 15bp, 20bp, 25bp, 30bp or 40bp, 60bp, 50bp, 40bp, 30bp, 25bp, 20bp or 15bp. In some cases, the sequence length of the guide sequence of the modified sgRNA molecule of the present invention may be 10bp-50bp, 10bp-40bp, 15bp-35bp, 15bp-30bp, 15bp-25bp, 17bp-24bp or 18bp-22bp, or may be 20bp-35bp, 25bp-35bp or 28bp-32bp.
In some cases, the length of the skeleton sequence (comprising the box sequence from the snoRNA) of the modified sgRNA molecule of the invention can be more than or equal to 15bp, more than or equal to 20bp, more than or equal to 25bp, more than or equal to 30bp, more than or equal to 40bp, more than or equal to 50bp, more than or equal to 60bp, more than or equal to 70bp, more than or equal to 80bp, more than or equal to 90bp, more than or equal to 100bp, more than or equal to 110bp, more than or equal to 120bp, more than or equal to 130bp, more than or equal to 140bp, more than or equal to 150bp, more than or equal to 160bp, more than or equal to 170bp, more than or equal to 180bp, more than or equal to 200bp, more than or equal to 210bp, more than or equal to 220bp, more than or equal to 200bp, more than or equal to 350bp, more than or equal to 50bp, more than or equal to 100bp, more than or less than or equal to 140bp, more than or less than or equal to 60bp, more than or less than or equal to 60 bp. In some cases, the backbone sequence of the engineered sgRNA molecules of the present invention (comprising the box sequence from the snoRNA) may have a sequence length of 10bp to 300bp, 20bp to 250bp, 30bp to 240bp, 50bp to 220bp, 80bp to 200bp, or 100bp to 180bp.
In some cases, the sequence length of the modified sgRNA molecule of the invention can be more than or equal to 15bp, more than or equal to 20bp, more than or equal to 25bp, more than or equal to 30bp, more than or equal to 40bp, more than or equal to 50bp, more than or equal to 60bp, more than or equal to 70bp, more than or equal to 80bp, more than or equal to 90bp, more than or equal to 100bp, more than or equal to 110bp, more than or equal to 120bp, more than or equal to 130bp, more than or equal to 140bp, more than or equal to 150bp, more than or equal to 160bp, more than or equal to 170bp, more than or equal to 180bp, more than or equal to 190bp, more than or equal to 210bp, more than or equal to 350bp, more than or equal to 180bp, more than or equal to 170bp, more than or equal to 160bp, more than or equal to 150bp, more than or equal to 140bp, more than or equal to 130bp, more than or equal to 20bp, more than or equal to 50 bp. In some cases, the sequence length of the modified sgRNA molecules of the present invention may be 10bp-300bp, 30bp-250bp, 50bp-240bp, 70bp-240bp, 90bp-220bp, or 110bp-200bp.
In some cases, the engineered sgRNA molecules can be used in combination with Cas proteins for gene targeting or modification, e.g., for eukaryotic gene targeting or modification, further for animal cell gene targeting or modification, and still further for human cell gene targeting or modification. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence. In some cases, the Cas protein contains a nuclear localization sequence.
The engineered sgRNA molecule can direct Cas protein to the nucleus, and further, can direct Cas protein to a nuclear target nucleic acid. In some cases, the engineered sgRNA can direct Cas protein to the nucleus and target or modify the target nucleic acid. In some cases, the engineered sgRNA can form a complex with Cas protein, guide Cas protein to the nucleus, and target or modify the target nucleic acid. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the targeting the target nucleic acid consists of one or more of: cleaving one or more target nucleic acids, visualizing or detecting the one or more target nucleic acids, labeling the one or more target nucleic acids, transporting the one or more target nucleic acids, masking the one or more target nucleic acids, binding the one or more target nucleic acids, increasing the level of transcription and/or translation of a gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of a gene to which the target sequence belongs. Further, in some cases, the targeting of the target nucleic acid is binding to the target nucleic acid; in some cases, the targeting the target nucleic acid is cleavage of the target nucleic acid.
In some cases, the modifying the target nucleic acid consists of one or more of: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of a nucleic acid, demethylation of a nucleic acid, and deamination of a nucleic acid.
In some cases, the sgRNA comprises at least one chemically modified nucleotide, non-limiting examples of which include 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), 2'-fluoro (2'-F), and phosphorothioate (P=S) linkage modifications between nucleotides. The chemical modification may be located at any number of nucleotides at any position. In some cases, the sgRNA comprises modifications at the 5' end and/or the 3' end.
In some cases, the engineered sgRNA molecules may additionally carry any number of added nucleotides; as a non-limiting example, in patent publication No. CN104968784B the end of the sgRNA guide sequence additionally contains 2 guanine nucleotides.
In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13. In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 without Cas endonuclease activity, including but not limited to a fully inactivated Cas protein (dead Cas protein). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to a Cas nickase having only single-strand cleavage function, e.g., Cas9 nickase (nCas9) or Cas12 nickase. In some cases, the Cas9 is SpCas9 (Streptococcus pyogenes Cas9).
It is understood that such engineered sgRNA molecules (sequences comprising the nuclear localization function in the snoRNA molecule) are within the scope of the invention, provided that any particular Cas protein (without nuclear localization sequence) can be directed to the nucleus.
In a third aspect, the invention claims a DNA molecule encoding the engineered sgRNA molecule of the second aspect.
In a specific embodiment of the invention, the DNA molecule is any one of the following:
(c1) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.2 with the DNA sequence corresponding to the guide sequence;
(c2) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.6 with the DNA sequence corresponding to the guide sequence;
(c3) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.9 with the DNA sequence corresponding to the guide sequence;
wherein (c1) to (c3) correspond in order to the foregoing (b1) to (b3).
Positions 1 to 25 of SEQ ID No.2, positions 1 to 25 of SEQ ID No.6 and positions 1 to 25 of SEQ ID No.9 are all the DNA sequence (GAACGGCTCGGAGATCATCATTGCG) corresponding to the guide sequence used in the examples for recognizing the target nucleic acid.
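The relationship between the guide sequence and its corresponding DNA sequence can be checked directly. The following is a minimal sketch, assuming that "the DNA sequence corresponding to the guide sequence" means the same-sense DNA in which U is written as T; the function names are hypothetical.

```python
def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence as its same-sense DNA sequence (U -> T)."""
    return rna.upper().replace("U", "T")


def dna_to_rna(dna: str) -> str:
    """Write a same-sense DNA sequence as RNA (T -> U)."""
    return dna.upper().replace("T", "U")


GUIDE_RNA = "GAACGGCUCGGAGAUCAUCAUUGCG"   # positions 1-25 of SEQ ID No.1 (from the text)
GUIDE_DNA = "GAACGGCTCGGAGATCATCATTGCG"   # positions 1-25 of SEQ ID No.2 (from the text)

print(rna_to_dna(GUIDE_RNA) == GUIDE_DNA)
```

Under this assumption the two guide sequences quoted in the text agree with each other position by position.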
In a fourth aspect, the invention claims an expression cassette, expression vector, recombinant bacterium or transgenic cell line comprising the DNA molecule described in the third aspect above.
The expression vector may comprise any regulatory element operably linked to the DNA molecule. In some cases, the regulatory element is a promoter and/or enhancer. In some cases, the regulatory element is a promoter.
In a specific embodiment of the invention, the promoter in the expression cassette that initiates transcription of the DNA molecule is a U6 promoter.
More specifically, the expression cassette is any one of the following:
(d1) a sequence obtained by replacing positions 250 to 274 of SEQ ID No.3 with the DNA sequence corresponding to the guide sequence;
(d2) a sequence obtained by replacing positions 250 to 274 of SEQ ID No.7 with the DNA sequence corresponding to the guide sequence;
(d3) a sequence obtained by replacing positions 250 to 274 of SEQ ID No.10 with the DNA sequence corresponding to the guide sequence;
wherein (d1) to (d3) correspond in order to the foregoing (c1) to (c3).
Positions 250 to 274 of SEQ ID No.3, positions 250 to 274 of SEQ ID No.7 and positions 250 to 274 of SEQ ID No.10 are all the DNA sequence (GAACGGCTCGGAGATCATCATTGCG) corresponding to the guide sequence used in the examples for recognizing the target nucleic acid.
Accordingly, the expression vector may be an expression vector comprising the expression cassette described above.
In a specific embodiment of the invention, the expression vector is a recombinant vector obtained by replacing a small fragment between the cleavage sites Kpn I and Not I of the pX601 vector with the expression cassette.
In a fifth aspect, the invention claims a kit.
The kit claimed in the present invention may comprise any of the following:
i. A Cas protein, and the engineered sgRNA molecule described in the second aspect above.
ii. An expression vector comprising a nucleotide sequence encoding a Cas protein (denoted expression vector 1), and the engineered sgRNA molecule described in the second aspect above.
iii. A Cas protein, and an expression vector comprising a nucleotide sequence encoding the engineered sgRNA molecule described in the second aspect above (denoted expression vector 2).
iv. An expression vector comprising a nucleotide sequence encoding a Cas protein (i.e., expression vector 1), and an expression vector comprising a nucleotide sequence encoding the engineered sgRNA molecule described in the second aspect above (i.e., expression vector 2).
v. An expression vector comprising a nucleotide sequence encoding a Cas protein and a nucleotide sequence encoding the engineered sgRNA molecule described in the second aspect above (denoted expression vector 3).
In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the Cas protein contains a nuclear localization sequence.
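The five kit configurations follow a simple pattern: each of the two components (Cas protein and engineered sgRNA) is supplied either directly or through its own expression vector, and a fifth option encodes both on a single vector. The following is a minimal sketch of that enumeration; the variable and function names are hypothetical, and the labels merely paraphrase items i to v.

```python
from itertools import product

SGRNA_FORMS = (
    "engineered sgRNA molecule",
    "expression vector 2 (encodes the engineered sgRNA)",
)
CAS_FORMS = (
    "Cas protein",
    "expression vector 1 (encodes the Cas protein)",
)


def kit_configurations():
    """Enumerate the kit options in the order i, ii, iii, iv, v."""
    numerals = ["i", "ii", "iii", "iv", "v"]
    # Iterating sgRNA form first reproduces the order used in the text.
    configs = [(cas, sgrna) for sgrna, cas in product(SGRNA_FORMS, CAS_FORMS)]
    configs.append(("expression vector 3 (encodes both Cas protein and engineered sgRNA)",))
    return dict(zip(numerals, configs))


for numeral, components in kit_configurations().items():
    print(numeral, " + ".join(components))
```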
In a sixth aspect, the invention claims a composition selected from any one of the following:
I. A composition comprising: a Cas protein, and the engineered sgRNA molecule described in the second aspect above;
II. A composition comprising: a nucleic acid molecule 1 encoding a Cas protein (e.g., expression vector 1 described in the fifth aspect above), and the engineered sgRNA molecule described in the second aspect above;
III. A composition comprising: a Cas protein, and a nucleic acid molecule 2 encoding the engineered sgRNA molecule described in the second aspect above (e.g., expression vector 2 described in the fifth aspect above);
IV. A composition comprising: a nucleic acid molecule 1 encoding a Cas protein (e.g., expression vector 1 described in the fifth aspect above), and a nucleic acid molecule 2 encoding the engineered sgRNA molecule described in the second aspect above (e.g., expression vector 2 described in the fifth aspect above);
V. A composition comprising: a nucleic acid molecule 3 encoding a Cas protein and the engineered sgRNA molecule described in the second aspect above (e.g., expression vector 3 described in the fifth aspect above).
In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the Cas protein contains a nuclear localization sequence.
In a seventh aspect, the invention claims an RNP complex formed from a Cas protein and the engineered sgRNA molecule described in the second aspect above.
In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the Cas protein contains a nuclear localization sequence.
In the fifth to seventh aspects above, the Cas protein may be selected from Cas9, Cas12, and Cas13.
In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 without Cas endonuclease activity, including but not limited to a fully inactivated Cas protein (dead Cas protein). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to a Cas nickase having only single-strand cleavage function, e.g., Cas9 nickase (nCas9) or Cas12 nickase.
In a specific embodiment of the invention, the Cas protein is Streptococcus pyogenes Cas9 (SpCas9).
In item v of the fifth aspect, expression vector 3 may be a recombinant vector obtained by inserting a gene encoding a Cas protein without a nuclear localization signal (NLS) (the Cas protein may be selected from Cas9, Cas12 or Cas13; specifically, e.g., SpCas9) into the expression vector of the fourth aspect.
In a specific embodiment of the present invention, the expression vector 3 is any one of the following:
(e1) a full sequence obtained by replacing positions 5365 to 5389 of SEQ ID No.4 with the DNA sequence corresponding to the guide sequence;
(e2) a full sequence obtained by replacing positions 5365 to 5389 of SEQ ID No.11 with the DNA sequence corresponding to the guide sequence;
(e3) a full sequence obtained by replacing positions 5365 to 5389 of SEQ ID No.12 with the DNA sequence corresponding to the guide sequence.
Wherein (e1) to (e3) correspond in order to the foregoing (d1) to (d3).
Positions 5365 to 5389 of SEQ ID No.4, positions 5365 to 5389 of SEQ ID No.11 and positions 5365 to 5389 of SEQ ID No.12 are all the DNA sequence (GAACGGCTCGGAGATCATCATTGCG) corresponding to the guide sequence used in the examples for recognizing the target nucleic acid.
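The guide windows quoted for the sgRNA, the DNA molecule, the expression cassette and expression vector 3 can be compared side by side. The following is a minimal sketch using only the coordinates stated in the text; the dictionary keys and helper names are hypothetical, and no statement is made about what occupies the positions upstream of each window.

```python
# 1-based inclusive guide windows as stated in the text.
GUIDE_WINDOWS = {
    "sgRNA (SEQ ID No.1, 5, 8)": (1, 25),
    "DNA molecule (SEQ ID No.2, 6, 9)": (1, 25),
    "expression cassette (SEQ ID No.3, 7, 10)": (250, 274),
    "expression vector 3 (SEQ ID No.4, 11, 12)": (5365, 5389),
}


def window_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive window."""
    return end - start + 1


def upstream_length(start: int) -> int:
    """Number of positions that precede the window in the same sequence."""
    return start - 1


for name, (start, end) in GUIDE_WINDOWS.items():
    print(f"{name}: length {window_length(start, end)}, upstream {upstream_length(start)}")
```

Every window spans 25 positions, matching the 25-nucleotide guide used in the examples, so the same guide replacement can be applied at each level by shifting the start coordinate.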
In an eighth aspect, the invention claims an engineered sgRNA molecule according to the second aspect of the preceding, a DNA molecule according to the third aspect of the preceding, an expression cassette, an expression vector, a recombinant bacterium or a transgenic cell line according to the fourth aspect of the preceding, a kit according to the fifth aspect of the preceding, a composition according to the sixth aspect of the preceding or an RNP complex according to the seventh aspect of the preceding, for use in any of the following:
p1, targeting or modifying the genomic target nucleic acid; and
p2, preparation of products for targeting or modification of genomic target nucleic acids.
The targeting or modification of the genomic target nucleic acid may be eukaryotic cell gene targeting or modification, further animal cell gene targeting or modification, and still further human cell gene targeting or modification.
In some cases, the product for targeting or modifying the genomic target nucleic acid is a drug for treating a disease of an animal body, including but not limited to a drug for treating a disease of a human individual.
In a ninth aspect, the invention claims a method of targeting or modifying a genomic target nucleic acid.
The methods of targeting or modifying genomic target nucleic acids claimed herein can include: introducing the composition of the sixth aspect into an organism or biological cell, such that both the Cas protein and the engineered sgRNA molecule are expressed, and targeting or modification of genomic target nucleic acid is achieved.
In a tenth aspect, the invention claims a method of making a mutant of a biological cell.
The method of preparing a mutant of a biological cell claimed in the present invention may comprise: targeting or modifying the genome of a biological cell according to the method of the ninth aspect of the invention to obtain a mutant biological cell.
Wherein the biological cell can be a eukaryotic cell, further can be an animal cell, and further can be a human cell.
In a specific embodiment of the invention, the biological cell is a 293T cell.
In an eleventh aspect, the invention claims a method of preparing a biological mutant.
The method of preparing a biological mutant claimed in the present invention may comprise: targeting or modifying the genome of an organism according to the method of the ninth aspect to obtain the biological mutant.
Wherein the organism may be a eukaryotic organism, further may be an animal, further may be a mammal, such as a human.
The invention has the beneficial effects that:
1. In the prior art, Cas proteins are often linked to a nuclear localization sequence to help the Cas protein localize to the nucleus during actual use. The invention develops another novel CRISPR-Cas system, which is a brand new technical solution and can effectively accomplish gene editing. Without being bound by theory, one of skill in the art can reasonably speculate that the sgRNA of the present invention forms a complex with the Cas protein, after which the nuclear localization functional sequence from the snoRNA interacts with the relevant proteins and the complex enters the nucleus. It is theoretically speculated that when the Cas protein is not linked to a nuclear localization sequence (NLS), the Cas protein can be transported into the nucleus through the nuclear localization of the sgRNA linked to a snoRNA nuclear localization functional sequence, so that the nuclear entry of Cas9 can be reduced, thereby reducing off-target effects.
2. Introducing a C'/D box sequence into the loop (loop) portion of the sgRNA backbone can effectively guide Cas proteins into the nucleus for gene editing.
3. The editing activity is relatively low when a total of 2 or more C'/D box sequences are introduced into the loop portions of the sgRNA backbone, but is higher when only 1 C'/D box sequence is introduced (i.e., only 1 box C' and 1 box D sequence are introduced). This contrasts with gene editing that relies on nuclear entry of a Cas protein linked to an NLS (in practical application scenarios, the more nuclear localization sequences (NLS) there are, the higher the editing efficiency tends to be). Thus, the corresponding technical solution achieves an unexpected technical effect.
4. In the case of an sgRNA with only one C'/D box, editing is more efficient when the C'/D box is attached at the loop 2 position (distal to the guide sequence) than at the loop 1 position (proximal to the guide sequence); the loop positions are as shown in FIG. 2.
Drawings
FIG. 1 is an exemplary box C'/D sequence of a U3 snoRNA molecule. From Narayanan A, Speckmann W, Terns R, Terns MP. Role of the box C/D motif in localization of small nucleolar RNAs to coiled bodies and nucleoli. Mol Biol Cell. 1999 Jul;10(7):2131-47. doi:10.1091/mbc.10.7.2131. PMID: 10397754; PMCID: PMC25425.
FIG. 2 is the molecular structure of an sgRNA containing a specific backbone sequence corresponding to SpCas9, showing the crRNA sequence and tracrRNA sequence of the sgRNA before engineering, together with 2 of the numerous sites (loop 1 and loop 2) at which the snoRNA nuclear localization functional sequence can be inserted or substituted. Loop 1 is the junction of the crRNA and tracrRNA, immediately adjacent to stem 1 formed by base complementary pairing; the nuclear localization functional sequence can be linked here. Loop 2 is located within the tracrRNA sequence, immediately adjacent to stem 2 formed by base complementary pairing; the nuclear localization functional sequence can be linked here. The crRNA contains a guide sequence, where N in the guide sequence represents any ribonucleotide, and the ellipsis indicates that the number of ribonucleotides in the guide sequence can be varied as appropriate.
FIG. 3 is a schematic diagram of the target vector C'/Dbox-PAM.
Fig. 4 is a map of the control vector SpCas9-PAM plasmid.
FIG. 5 is a plasmid map of lentiviral vector pGFPPAM.
FIG. 6 shows the proportion of GFP positive cells detected by the flow cytometer of example 1.
FIG. 7 is a schematic diagram of the target vector C'/Dbox-1-1.
FIG. 8 is a schematic diagram of the target vector C'/Dbox-1-2.
FIG. 9 shows the proportion of GFP positive cells detected by the flow cytometer of example 2.
Detailed Description
Definition:
As used herein, the term "Cas protein", or a protein or polypeptide having "Cas enzymatic activity" or "Cas endonuclease activity", relates to a CRISPR-associated (Cas) polypeptide or protein encoded by a CRISPR-associated (Cas) gene that, when complexed or functionally combined with one or more guide RNAs (guide RNA, sgRNA, sgRNA molecule), is capable of being directed to a target sequence in a target nucleic acid and of targeting or modifying the target nucleic acid. Guided by the sgRNA, Cas endonucleases recognize, target, or modify specific target sites (target sequences or nucleotide sequences near target sequences) in target nucleic acids.
As used herein, the term "sgRNA" (single guide RNA) refers to a single guide RNA that is used together with a Cas protein. An sgRNA is a fusion of a crRNA and a tracrRNA and comprises a guide sequence; or the sgRNA comprises a crRNA sequence and a guide sequence and does not comprise a tracrRNA sequence.
As used herein, the term "guide sequence" is used interchangeably with "targeting domain" to refer to a contiguous nucleotide sequence in an sgRNA that has partial or complete complementarity to a target sequence in a target nucleic acid and can hybridize to the target sequence in the target nucleic acid by base complementary pairing facilitated by a Cas protein. The complete complementarity of the guide sequences described herein to the target sequence is not necessary, so long as sufficient complementarity exists to cause hybridization and promote the formation of a CRISPR/Cas complex.
As used herein, when referring to an sgRNA, the term "backbone sequence" is intended to mean other nucleotide sequences in the sgRNA than the guide sequence. For example, sequences between the guide sequence in the sgRNA and the corresponding poly U of the transcription terminator may be included. The backbone sequence will generally not change due to changes in the target sequence. Thus, the framework sequence may be any feasible sequence.
As used herein, the term "target nucleic acid" may comprise any polynucleotide, such as DNA (target DNA) or RNA (target RNA). "target nucleic acid" refers to a nucleic acid that the sgRNA directs Cas protein to target or modify. The term "target nucleic acid" can be any polynucleotide that is endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, a "target nucleic acid" can be a polynucleotide that is present in a eukaryotic cell, or can be a sequence (or portion thereof) or a non-coding sequence (or portion thereof) that encodes a gene product (e.g., a protein). In some cases, a "target nucleic acid" may include one or more disease-associated genes and polynucleotides and signaling biochemical pathway-associated genes and polynucleotides.
As used herein, the term "target sequence" refers to a small stretch of nucleotide sequence in a target nucleic acid molecule that can be complementary (fully complementary or partially complementary) or hybridized to a guide sequence of an sgRNA molecule. The target sequence is often tens of bp in length, and may be, for example, about 10bp, about 20bp, about 30bp, about 40bp, about 50bp, about 60bp.
As used herein, the term "targeting" is defined as consisting of one or more of the following: cleaving one or more target nucleic acids, visualizing or detecting the one or more target nucleic acids, labeling the one or more target nucleic acids, transporting the one or more target nucleic acids, masking the one or more target nucleic acids, binding the one or more target nucleic acids, increasing the level of transcription and/or translation of a gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of a gene to which the target sequence belongs.
As used herein, the term "modification" is defined as consisting of one or more of the following: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of a nucleic acid, demethylation of a nucleic acid, and deamination of a nucleic acid.
As used herein, the term "cleavage" refers to cleavage of a covalent bond (e.g., covalent phosphodiester bond) in the ribosyl phosphodiester backbone of a polynucleotide, including but not limited to: the single-stranded polynucleotide is cleaved, either of the double-stranded polynucleotides comprising the two complementary single strands is cleaved, and both single strands of the double-stranded polynucleotide comprising the two complementary single strands are cleaved.
As used herein, when referring to "a sequence in a snoRNA molecule that plays a role in nuclear localization function" it is intended to mean a nucleotide sequence/element in a snoRNA molecule that plays an important role in nuclear localization function, in particular a nucleotide sequence that plays an important role in nucleolar localization function, non-limiting examples include C' box, D box.
As used herein, the terms C' box and box C' are used interchangeably; the terms D box and box D are used interchangeably.
As used herein, the term "nuclear localization sequence" (nuclear localization signal or nuclear localization sequence, NLS) is an amino acid sequence that serves as a tag for the transport of proteins into the cell nucleus.
As used herein, the term "snoRNA" (small nucleolar RNA) is a broad class of eukaryotic RNAs that play a role in the biogenesis of ribosomes within the nucleolus.
As used herein, the term "engineering" when referring to an sgRNA includes altering the nucleotide sequence of the sgRNA to produce an engineered sgRNA molecule.
As used herein, the term "non-complementary pairing sequence" when referring to the backbone sequence secondary structure of an sgRNA molecule refers to nucleotides in the backbone sequence secondary structure of the sgRNA that do not form intramolecular base complementary pairing. The framework sequence secondary structure is a secondary structure predicted by a person skilled in the art according to a conventional calculation method or a secondary structure determined according to a conventional experimental method.
As used herein, the terms "crRNA" and "tracrRNA" have the meanings commonly recognized by those skilled in the art, respectively.
As used herein, when referring to an sgRNA, "loop" has the meaning commonly recognized by those skilled in the art and often refers to the loop in a stem-loop structure of an RNA: complementary bases pair to form the stem, while the portion that cannot pair protrudes to form the loop.
As used herein, "sgRNA" and "sgRNA molecule" are used interchangeably.
As used herein, when referring to a sequence that functions as a nuclear localization function in a snoRNA molecule, the term "substitution" refers to the replacement of a fragment consisting of 1 nucleotide or more than 1 consecutive nucleotides of the sgRNA molecule prior to modification with a sequence that functions as a nuclear localization function.
As used herein, when referring to a sequence that functions as a nuclear localization in a snoRNA molecule, the term "insert" refers to the insertion of only the nuclear localization sequence into the sgRNA sequence without deleting the nucleotides of the sgRNA molecule prior to modification.
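The difference between "substitution" and "insertion" as defined above can be illustrated on a toy stem-loop. This is a minimal sketch under stated assumptions: the 12-nucleotide toy backbone, its loop coordinates and the function names are hypothetical, linker sequences are omitted for brevity, and only the box C'/D element is taken from the text.

```python
def insert_after(backbone: str, position: int, element: str) -> str:
    """Insert `element` after 1-based `position`; no original nucleotide is removed."""
    return backbone[:position] + element + backbone[position:]


def substitute(backbone: str, start: int, end: int, element: str) -> str:
    """Replace 1-based inclusive positions start..end with `element`."""
    return backbone[: start - 1] + element + backbone[end:]


TOY_BACKBONE = "GCGCGAAAGCGC"               # hypothetical stem (1-4, 9-12) and loop (5-8)
ELEMENT = "GAGGAAGAGCGUCAGCAGGCUGA"         # box C'/D element quoted in the description

inserted = insert_after(TOY_BACKBONE, 6, ELEMENT)     # placed inside the loop
substituted = substitute(TOY_BACKBONE, 5, 8, ELEMENT) # whole loop replaced

# Original nucleotides retained = total length minus the element length.
print(len(inserted) - len(ELEMENT), len(substituted) - len(ELEMENT))
```

With insertion all 12 original nucleotides remain, whereas substituting the whole loop leaves only the 8 stem nucleotides, which matches the distinction drawn between the two terms.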
As used herein, one instance of "directing Cas protein to the nucleus" is that the engineered sgRNA of the invention can be transported into the nucleus, and the Cas protein is transported into the nucleus together with the sgRNA. Generally, the one or more snoRNA nuclear localization functional sequences have sufficient strength to drive Cas protein accumulation in the nucleus of eukaryotic cells in a detectable amount. Detection of whether the Cas protein is directed to the nucleus, or of the amount of sgRNA and Cas protein accumulated in the nucleus, can be performed by any suitable technique. For example, a detectable label can be fused to the sgRNA or Cas protein such that its location within the cell is visualized, such as in conjunction with a means for detecting the location of the nucleus. The nuclei may also be isolated from the cells, and their contents may then be analyzed by any suitable method for detecting RNA or proteins, including but not limited to immunohistochemistry, western blotting, or enzymatic activity assays. Accumulation in the nucleus can also be determined indirectly, such as by determining the effect on targeting or modification of the target nucleic acid (e.g., determining DNA cleavage or mutation at the target sequence, or determining changes in the transcription or translation level of the gene to which the target sequence belongs).
As used herein, the term "operably linked" is intended to mean that the Cas protein coding sequence or the sgRNA coding sequence in the vector is linked to one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
As used herein, the term "regulatory element" is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences).
The following detailed description of the invention is provided in connection with the accompanying drawings that are presented to illustrate the invention and not to limit the scope thereof. The examples provided below are intended as guidelines for further modifications by one of ordinary skill in the art and are not to be construed as limiting the invention in any way.
The experimental methods in the following examples, unless otherwise specified, are conventional methods, and are carried out according to techniques or conditions described in the literature in the field or according to the product specifications. Materials, reagents and the like used in the examples described below are commercially available unless otherwise specified.
The box C'/D of the U3 snoRNA molecule described in the literature (Mol Biol Cell. 1999 Jul;10:2131-2147) is shown in FIG. 1. In the following specific examples of the invention, the box C'/D of the U3 snoRNA molecule, i.e., the sequence GAGGAAGAGCGUCAGCAGGCUGA in FIG. 1, is introduced into the sgRNA molecule, wherein GAGGAAGA is the box C' sequence and GGCUGA is the box D sequence.
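The motif boundaries stated above can be checked with a short script. The following is a minimal Python sketch and not part of the original disclosure; the function name and the label "spacer" for the nucleotides lying between the two boxes are illustrative assumptions.

```python
# Box C'/D motif of U3 snoRNA as given in the text (RNA, 5' -> 3').
BOX_CD = "GAGGAAGAGCGUCAGCAGGCUGA"
BOX_C_PRIME = "GAGGAAGA"  # box C' sequence stated in the text
BOX_D = "GGCUGA"          # box D sequence stated in the text


def split_motif(motif, five_prime, three_prime):
    """Split motif into (5' box, intervening nucleotides, 3' box)."""
    if not (motif.startswith(five_prime) and motif.endswith(three_prime)):
        raise ValueError("motif does not start/end with the given boxes")
    middle = motif[len(five_prime):len(motif) - len(three_prime)]
    return five_prime, middle, three_prime


# "spacer" is an illustrative name for the nucleotides between the boxes.
box_c, spacer, box_d = split_motif(BOX_CD, BOX_C_PRIME, BOX_D)
print(len(BOX_CD), box_c, spacer, box_d)
```

Under these definitions the 23-nt motif consists of the 8-nt box C', nine intervening nucleotides, and the 6-nt box D.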
In designing the example experiments, the sgRNA backbone sequence corresponding to SpCas9 was engineered (see FIG. 2). The box C'/D of the U3 snoRNA molecule can be substituted in at the loop 1 and/or loop 2 position, and linker sequences (ggcca, ctgcagggcc) can be used to join the box C'/D of the U3 snoRNA molecule to the remainder of the sgRNA backbone.
The engineered sgRNA molecules of the invention may comprise a crRNA portion and a tracrRNA portion. The nuclear localization functional sequence may be linked within the crRNA sequence or within the tracrRNA sequence, and may also be linked at the junction of the crRNA and tracrRNA, such as at the loop 1 position in FIG. 2.
The box C'/D sequence of the U3 snoRNA molecule was ligated into the sgRNA backbone of SpCas9 (as shown in FIG. 2, one U3 snoRNA box C'/D was ligated at each of the loop 1 and loop 2 positions of the sgRNA backbone), giving the following sequence:
guuuuagagcuaggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccuagcaaguuaaaauaaggcuaguccguuaucaacuuggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccaaguggcaccgagucggugc.
wherein the uppercase sequence is the U3 snoRNA box C'/D, the lowercase sequences ggcca and cugcagggcc flanking each box are the linker sequences (linking the U3 snoRNA box C'/D to the sgRNA backbone), and the remaining lowercase sequence is the backbone sequence of the sgRNA corresponding to SpCas9 prior to modification.
The U3 snoRNA box C'/D is joined to the sgRNA corresponding to Cas9 to form an sgRNA with a nuclear import function, which guides the Cas9-sgRNA complex into the nucleus.
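The substitution described above can be illustrated by assembling the backbone from its parts. This is a minimal Python sketch under stated assumptions: the split of the backbone into segments and the identification of the replaced loop nucleotides as "gaaa" are inferred by comparing the modified sequence with the single-box backbones given in Example 2, and all names are illustrative.

```python
# Segments of the unmodified SpCas9 sgRNA backbone (RNA, 5' -> 3').
# The segmentation is inferred by comparing sequences in the text;
# segment names are illustrative only.
STEM_5P = "guuuuagagcua"
LOOP = "gaaa"  # loop nucleotides replaced at loop 1 and/or loop 2 (inferred)
MIDDLE = "uagcaaguuaaaauaaggcuaguccguuaucaacuu"
TAIL = "aaguggcaccgagucggugc"

BOX_CD = "GAGGAAGAGCGUCAGCAGGCUGA"  # U3 snoRNA box C'/D
LINKER_5P = "ggcca"
LINKER_3P = "cugcagggcc"


def loop_segment(substitute):
    """Return linker + box C'/D + linker, or the original loop."""
    return LINKER_5P + BOX_CD + LINKER_3P if substitute else LOOP


def build_backbone(at_loop1, at_loop2):
    """Assemble a backbone with box C'/D substituted at the chosen loops."""
    return (STEM_5P + loop_segment(at_loop1) + MIDDLE
            + loop_segment(at_loop2) + TAIL)


both = build_backbone(True, True)
print(len(build_backbone(False, False)), len(both))
print(both)
```

With both loops substituted, the assembled string reproduces the 144-nt backbone shown above; prefixing the 25-nt guide sequence gives the 169-nt molecule of SEQ ID No. 1.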
Example 1. Verification of the in vivo editing activity of box C'/D-Cas9
1. Construction of the verification vector
Construction of the verification vector C'/D box-PAM
The sgRNA expression cassette sequence (SEQ ID No. 3) comprising U3 snoRNA box C'/D was synthesized by a reagent company. Positions 1-249 of SEQ ID No. 3 are the U6 promoter, followed by the coding sequence of the engineered sgRNA molecule (positions 250-274 are the coding sequence of the guide sequence). The expression cassette encodes an sgRNA molecule containing the guide sequence GAACGGCUCGGAGAUCAUCAUUGCG, which targets the cell validation library, and a backbone sequence into which two box C'/D sequences have been substituted. The sequence of the sgRNA molecule is shown as SEQ ID No. 1, and the corresponding DNA sequence of the sgRNA molecule is shown as SEQ ID No. 2.
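The coordinates given for SEQ ID No. 3 can be cross-checked by simple arithmetic. The sketch below, in Python, derives 1-based inclusive coordinates from segment lengths. The promoter and guide lengths come from the positions stated in the text; the backbone length (169 nt of SEQ ID No. 2 minus the 25-nt guide) and the 7-nt terminal run of T read off the end of SEQ ID No. 3 are inferences from the sequence listing, and the segment names are illustrative.

```python
def layout(segments):
    """Return (name, start, end) tuples with 1-based inclusive coordinates."""
    coords, start = [], 1
    for name, length in segments:
        end = start + length - 1
        coords.append((name, start, end))
        start = end + 1
    return coords


CASSETTE = [
    ("U6 promoter", 249),            # positions 1-249, stated in the text
    ("guide coding sequence", 25),   # positions 250-274, stated in the text
    ("engineered backbone", 144),    # inferred: 169 nt (SEQ ID No. 2) - 25 nt
    ("terminal T run", 7),           # inferred from the end of SEQ ID No. 3
]

for name, start, end in layout(CASSETTE):
    print(f"{name}: {start}-{end}")
```

The last coordinate produced is 425, which agrees with the length declared for SEQ ID No. 3 in the sequence listing.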
The expression cassette sequence (SEQ ID No. 3) was assembled on the pX601 vector (commercially available) backbone via the Kpn I and Not I restriction sites, and the intermediate vector C'/D box-pre was obtained after sequencing verification.
In addition, the SpCas9 fragment without NLS was amplified by PCR using the vector pX459 (commercially available) as template with the following primers:
F:5’-gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT-3’;
R:5’-atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG-3’.
After the PCR product was verified by sequencing, it was ligated into the intermediate vector C'/D box-pre via the Age I and EcoR I sites, and the target vector C'/D box-PAM was obtained after sequencing verification.
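The primer design can be inspected with a few lines of Python. This is a minimal sketch and not part of the original disclosure: it assumes, following common convention, that the lowercase 5' part of each primer is a non-annealing tail carrying the cloning site and that the uppercase part anneals to the SpCas9 template. The recognition sequences ACCGGT (Age I) and GAATTC (EcoR I) are the standard sites for these enzymes; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def split_primer(primer):
    """Split a primer into its lowercase 5' tail and uppercase 3' part."""
    i = next(k for k, ch in enumerate(primer) if ch.isupper())
    return primer[:i], primer[i:]


FWD = "gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT"
REV = "atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG"
AGE_I, ECO_RI = "ACCGGT", "GAATTC"  # standard recognition sequences

fwd_tail, fwd_anneal = split_primer(FWD)
rev_tail, rev_anneal = split_primer(REV)

# Each tail should carry the restriction site used for cloning.
print(AGE_I in fwd_tail.upper(), ECO_RI in rev_tail.upper())
# Sense-strand view of the region covered by the reverse primer.
print(reverse_complement(rev_anneal))
```

Read on the sense strand, the region covered by the reverse primer ends in TAA, a stop codon, immediately before the EcoR I site, which is consistent with an SpCas9 coding sequence that carries no C-terminal NLS.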
The complete sequence of the target vector C'/D box-PAM is shown in SEQ ID No. 4.
A schematic vector map of the target vector C'/D box-PAM is shown in FIG. 3.
2. Construction of control vector SpCas9-PAM
A control vector, SpCas9-PAM, which relies on 2 × NLS for nuclear entry, was constructed.
The sequence GAACGGCTCGGAGATCATCATTGCG, which targets the cell validation library, was ligated into the SpCas9-expressing vector backbone pX459 via the Bbs I site to construct the control vector SpCas9-PAM. The full plasmid sequence is shown as SEQ ID No. 13.
The plasmid map of the control vector SpCas9-PAM is shown in FIG. 4.
3. Transfection of 293T library cells with the vectors
293T library cells were constructed with reference to a published method (Hu Z, Wang D, Zhang C, et al. Diverse noncanonical PAMs recognized by SpCas9 in human cells [J]. bioRxiv, 2019:67503). The library cells contain a target site recognized by the above box C'/D-containing sgRNA. The library itself is a GFP library carrying a frameshift mutation; when the target is edited, the frameshift can be corrected so that cells that were originally non-fluorescent become fluorescent, and the editing effect can be judged from the proportion of fluorescent cells.
The specific method for constructing 293T library cells comprises the following steps:
The GFP PAM library design is as described in the above literature. Its structure is CMV promoter-ATG-protospacer-NNNNN-EGFP-puro, where the protospacer sequence is GAACGGCTCGGAGATCATCATTGCG, consistent with the literature. N is any deoxyribonucleotide, i.e., A, T, C or G.
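The size of such a randomized library follows directly from the number of N positions. The Python sketch below enumerates every protospacer-plus-N variant; it assumes five random positions, as written in the structure above, and the function name is illustrative. It is not the library construction method of the cited literature.

```python
from itertools import product

PROTOSPACER = "GAACGGCTCGGAGATCATCATTGCG"  # protospacer from the text


def pam_library(n_random):
    """Yield the protospacer followed by every possible run of n_random bases."""
    for bases in product("ACGT", repeat=n_random):
        yield PROTOSPACER + "".join(bases)


# Five N positions are assumed here; pass another value to change this.
members = list(pam_library(5))
print(len(members), members[0], members[-1])
```

Each additional N position multiplies the number of distinct library members by four, so `pam_library(5)` yields 4^5 = 1024 sequences.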
The CMV promoter sequence used is shown in SEQ ID No. 14. The puro selection marker sequence used is shown in SEQ ID No. 15. The EGFP sequence used (without initiation and termination codons) is shown in SEQ ID No. 16.
In the experiment, the expression cassette hPG-pro-EGFP of the plasmid pRRLSIN.cPPT.PGK-GFP.WPRE (commercially available) was removed by digestion at the EcoR V and Sal I sites, and the remainder of the plasmid was used as the backbone. CMV pro-ATG-protospacer-NNNNNNNNN-GFP-puro was ligated into this backbone via the EcoR V and Sal I sites to obtain the lentiviral vector pGFPPAM expressing the library, the sequence of which is shown in SEQ ID No. 17.
The plasmid map of lentiviral vector pGFPPAM is shown in FIG. 5.
293T library cells were obtained using the lentiviral vector pGFPPAM described above according to the method described in the above-mentioned literature (Hu Z, Wang D, Zhang C, et al. Diverse noncanonical PAMs recognized by SpCas9 in human cells [J]. bioRxiv, 2019:67503); apart from the sequence of the lentiviral vector pGFPPAM, all other parts of the method were identical.
The vectors C'/D box-pre, C'/D box-PAM, pX459, and SpCas9-PAM were each transfected into 293T library cells in 24-well plates, using 800 ng of vector.
The transfection method is as follows:
(1) 293T cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and counted, and 2 × 10^5 cells in 500 μL were plated per well in 24-well plates.
(2) For each transfection sample, the complex was prepared according to the following steps:
a. for each well of plated cells, the aforementioned plasmid DNA was diluted in 50 μL of serum-free Opti-MEM I reduced-serum medium (Thermo, 25200056) and mixed gently;
b. Lipofectamine 2000 (Thermo, 11668019) was mixed gently before use, and 1.6 μL of Lipofectamine 2000 per well was then diluted in 50 μL of Opti-MEM I medium and incubated for 5 minutes at room temperature. Note: proceed to step c within 25 minutes;
c. after the 5-minute incubation, the diluted DNA was combined with the diluted Lipofectamine 2000, mixed gently and incubated for 20 minutes at room temperature (the solution may look cloudy). Note: the complex is stable for 6 hours at room temperature. The complexes were added to the 293T library cells and mixed, and the cells were analyzed by flow cytometry after 48 hours.
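When several wells are transfected in parallel, the per-well amounts above can be scaled into a master mix. The following Python sketch is illustrative only: it assumes that the 800 ng of plasmid applies per well, and the `excess` parameter (an allowance for pipetting loss) is a hypothetical addition that does not appear in the protocol.

```python
# Per-well amounts taken from the protocol text (24-well format).
DNA_NG = 800.0          # plasmid DNA, assumed to be per well
LIPO_UL = 1.6           # Lipofectamine 2000 per well
OPTIMEM_DNA_UL = 50.0   # Opti-MEM I used to dilute the DNA
OPTIMEM_LIPO_UL = 50.0  # Opti-MEM I used to dilute the reagent


def scale_mix(n_wells, excess=0.0):
    """Scale per-well amounts to n_wells.

    `excess` is a hypothetical pipetting allowance (e.g. 0.1 for 10%).
    """
    factor = n_wells * (1.0 + excess)
    return {
        "dna_ng": round(DNA_NG * factor, 2),
        "lipofectamine_ul": round(LIPO_UL * factor, 2),
        "optimem_for_dna_ul": round(OPTIMEM_DNA_UL * factor, 2),
        "optimem_for_lipofectamine_ul": round(OPTIMEM_LIPO_UL * factor, 2),
    }


print(scale_mix(3))
```

For three replicate wells of one vector this gives 2400 ng of DNA and 4.8 μL of Lipofectamine 2000, each diluted in 150 μL of Opti-MEM I.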
4. Flow cytometry detection of the editing effect of SpCas9 with the C'/D box and its controls on library cells
Cells 48 h after transfection in step 3 were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and centrifuged at 300 g for 5 min, and the supernatant was removed. The cells from each well were resuspended in 500 μL of PBS, and GFP fluorescence was measured by flow cytometry. Cell debris was excluded using FSC-A and SSC-A gates, and the proportion of GFP-fluorescent cells was then determined using 293T-PAM cells (i.e., the 293T library cells described above) to set the negative gate. The experiment of this example was repeated 3 times; the results are shown in FIG. 6, and the GFP-positive cell proportions measured by flow cytometry are listed in Table 1.
Table 1. Proportion of GFP-positive cells determined by flow cytometry (mean of 3 replicates)
Note: * indicates a significant difference (P < 0.01) compared with the 293T-PAM, pX459 and C'/D box-pre groups, respectively.
The flow cytometry results show that the proportion of GFP-positive cells in the positive control SpCas9-PAM group was 3.25%, indicating that the assay system was successfully established. The C'/D box-PAM group, which uses box C'/D for nuclear entry, showed a significant editing effect: the library cells were edited and produced GFP fluorescence, demonstrating that the strategy of using box C'/D to guide the Cas9-sgRNA complex into the nucleus is feasible. Its efficiency is close to that of SpCas9-PAM, which uses the NLS as the nuclear import signal.
Example 2. Influence of the position and number of box C'/D on editing activity
1. Construction of verification vectors for sgRNAs carrying different numbers of box C'/D
The sgRNA encoded by C'/D box-PAM in Example 1 carries two box C'/D sequences. To test the effect of the number of box C'/D sequences on the nuclear editing activity of Cas9, two verification vectors were designed, each expressing an sgRNA with only one box C'/D at a different position of the molecule (one box C'/D attached only at the loop 1 position or only at the loop 2 position of the sgRNA backbone). The backbone sequences are shown below:
C'/D box-1-1 sgRNA backbone:
5’-gttttagagctaggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggcctagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc-3’;
C'/D box-1-2 sgRNA backbone:
5’-gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggccaagtggcaccgagtcggtgc-3’;
In the above two sgRNA backbones, the uppercase sequence is the box C'/D sequence, the lowercase sequences ggcca and ctgcagggcc flanking it are the linker sequences, and the remaining lowercase sequence is the sgRNA backbone sequence of SpCas9 before modification.
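The position of the box within each backbone, and hence its distance from the guide sequence at the 5' end, can be read off programmatically. The following minimal Python sketch relies only on the uppercase/lowercase convention of the two sequences above; the dictionary keys and function name are illustrative.

```python
import re

# Single-box backbones as given in the text (DNA form, 5' -> 3').
BACKBONES = {
    "C'/D box-1-1": (
        "gttttagagctaggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggcc"
        "tagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc"
    ),
    "C'/D box-1-2": (
        "gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaactt"
        "ggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggccaagtggcaccgagtcggtgc"
    ),
}


def box_position(backbone):
    """Return 1-based (start, end) of the first uppercase run (box C'/D)."""
    match = re.search(r"[ACGT]+", backbone)
    if match is None:
        raise ValueError("no uppercase box found")
    return match.start() + 1, match.end()


for name, seq in BACKBONES.items():
    print(name, len(seq), box_position(seq))
```

Both backbones are 110 nt long; the box occupies backbone positions 18-40 when placed at loop 1 and positions 58-80 when placed at loop 2, i.e., 40 nt further from the guide sequence.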
Two sgRNA expression cassettes (denoted expression cassette 1 and expression cassette 2), each containing only one U3 snoRNA box C'/D, were synthesized by a reagent company. Both expression cassettes contain the sequence GAACGGCTCGGAGATCATCATTGCG, which corresponds to the guide sequence targeting the cell validation library. They were assembled on the pX601 vector backbone via the Kpn I and Not I restriction sites and, after sequencing verification, the intermediate vectors were named C'/D box-1-1-pre and C'/D box-1-2-pre, respectively.
The nucleotide sequence of the synthesized C'/D box-1-1 sequence (expression cassette 1) is shown as SEQ ID No. 7. Positions 1-249 of SEQ ID No. 7 are the U6 promoter, followed by the coding sequence of the engineered sgRNA molecule (positions 250-274 are the coding sequence of the guide sequence). The expression cassette encodes an sgRNA molecule (SEQ ID No. 5), and the corresponding DNA sequence of the sgRNA molecule is shown as SEQ ID No. 6.
The nucleotide sequence of the synthesized C'/D box-1-2 sequence (expression cassette 2) is shown as SEQ ID No. 10. Positions 1-249 of SEQ ID No. 10 are the U6 promoter, followed by the coding sequence of the engineered sgRNA molecule (positions 250-274 are the coding sequence of the guide sequence). The expression cassette encodes an sgRNA molecule (SEQ ID No. 8), and the corresponding DNA sequence of the sgRNA molecule is shown as SEQ ID No. 9.
In addition, the SpCas9 fragment without NLS was amplified by PCR using vector pX459 (commercially available) as template with the following primers:
F:5’-gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT-3’;
R:5’-atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG-3’.
After the PCR product was verified by sequencing, it was ligated into the intermediate vectors C'/D box-1-1-pre and C'/D box-1-2-pre, respectively, via the Age I and EcoR I sites, and the target vectors C'/D box-1-1 and C'/D box-1-2 were obtained after sequencing verification.
The complete sequence of the C'/D box-1-1 vector is shown as SEQ ID No. 11.
A schematic diagram of the target vector C'/D box-1-1 is shown in FIG. 7.
The complete sequence of the target vector C'/D box-1-2 is shown as SEQ ID No. 12.
A schematic vector map of the target vector C'/D box-1-2 is shown in FIG. 8.
2. Flow cytometry detection of the editing effect of SpCas9 with different numbers of C'/D boxes and its controls on library cells
The C'/D box-PAM, C'/D box-pre, pX459 and SpCas9-PAM vectors of Example 1 and the C'/D box-1-1 and C'/D box-1-2 vectors of this example were each transfected into the 293T library cells of Example 1. The transfection procedure was the same as in Example 1; cells were collected 72 h after transfection and the proportion of GFP-fluorescent cells was determined by flow cytometry (specific procedure as in Example 1).
Cells 72 h after transfection were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and centrifuged at 300 g for 5 min, and the supernatant was removed. The cells from each well were resuspended in 500 μL of PBS, and GFP fluorescence was measured by flow cytometry. Cell debris was excluded using FSC-A and SSC-A gates, and the proportion of GFP-fluorescent cells was then determined using 293T-PAM cells (i.e., the 293T library cells described above) to set the negative gate. The experiment of this example was repeated 3 times; the results are shown in FIG. 9, and the GFP-positive cell proportions measured by flow cytometry are listed in Table 2.
Table 2. Proportion of GFP-positive cells determined by flow cytometry (mean of 3 replicates)
Note: * indicates a significant difference compared with the 293T-PAM, pX459 and C'/D box-pre groups, respectively (P < 0.01). ## indicates a significant difference between the C'/D box-1-2 group and the C'/D box-PAM group (P < 0.05).
The experiments show that the editing activity of the C'/D box-PAM, C'/D box-1-1 (C'/D box attached at sgRNA loop 1) and C'/D box-1-2 (C'/D box attached at sgRNA loop 2) groups is close to that of the SpCas9-PAM group. The proportion of GFP-positive cells in the C'/D box-pre group was essentially the same as in the pX459 control group. The flow cytometry results show that sgRNAs with only one C'/D box (C'/D box-1-1 and C'/D box-1-2 groups) give higher editing activity than the sgRNA with two C'/D boxes. For sgRNAs with only one C'/D box, editing is more efficient when the C'/D box is attached at the loop 2 position (distal to the guide sequence) than at the loop 1 position (proximal to the guide sequence).
The present application has been described in detail above. It will be apparent to those skilled in the art that the present application can be practiced over a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the application and without undue experimentation. While the application has been described with respect to specific embodiments, it will be appreciated that the application may be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the application that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the application pertains. Some of the essential features may be applied within the scope of the claims that follow.
<110> Guangzhou Ruifeng biotechnology Co Ltd
<120> engineered sgRNA molecules and uses thereof
<130> GNCLN221725
<160> 17
<170> PatentIn version 3.5
<210> 1
<211> 169
<212> RNA
<213> Artificial sequence
<400> 1
gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag 60
gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu uggccagagg 120
aagagcguca gcaggcugac ugcagggcca aguggcaccg agucggugc 169
<210> 2
<211> 169
<212> DNA
<213> Artificial sequence
<400> 2
gaacggctcg gagatcatca ttgcggtttt agagctaggc cagaggaaga gcgtcagcag 60
gctgactgca gggcctagca agttaaaata aggctagtcc gttatcaact tggccagagg 120
aagagcgtca gcaggctgac tgcagggcca agtggcaccg agtcggtgc 169
<210> 3
<211> 425
<212> DNA
<213> Artificial sequence
<400> 3
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctaggcc agaggaagag 300
cgtcagcagg ctgactgcag ggcctagcaa gttaaaataa ggctagtccg ttatcaactt 360
ggccagagga agagcgtcag caggctgact gcagggccaa gtggcaccga gtcggtgctt 420
ttttt 425
<210> 4
<211> 8286
<212> DNA
<213> Artificial sequence
<400> 4
cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60
ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120
aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180
agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240
ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300
tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360
atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420
ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480
gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540
ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600
tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660
aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720
tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780
gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840
aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900
acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960
gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020
tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080
tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140
tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200
ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260
gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320
tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380
tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440
tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500
tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560
tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620
acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680
tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740
acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800
agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860
gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920
ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980
acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040
aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100
tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160
ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220
cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280
aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340
aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400
aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460
aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520
ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580
tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640
ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700
aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760
tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820
acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880
aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940
atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000
agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060
agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120
agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180
tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240
aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300
ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360
agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420
tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480
gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540
agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600
atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660
tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720
acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780
gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840
agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900
tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960
acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020
tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080
gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140
gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200
tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260
tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320
gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380
tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440
tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500
agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560
tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620
atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680
ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740
actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800
ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860
gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920
gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980
tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040
tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100
tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160
aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220
aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280
aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340
atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400
aggccagagg aagagcgtca gcaggctgac tgcagggcct agcaagttaa aataaggcta 5460
gtccgttatc aacttggcca gaggaagagc gtcagcaggc tgactgcagg gccaagtggc 5520
accgagtcgg tgcttttttt gcggccgcag gaacccctag tgatggagtt ggccactccc 5580
tctctgcgcg ctcgctcgct cactgaggcc gggcgaccaa aggtcgcccg acgcccgggc 5640
tttgcccggg cggcctcagt gagcgagcga gcgcgcagct gcctgcaggg gcgcctgatg 5700
cggtattttc tccttacgca tctgtgcggt atttcacacc gcatacgtca aagcaaccat 5760
agtacgcgcc ctgtagcggc gcattaagcg cggcgggtgt ggtggttacg cgcagcgtga 5820
ccgctacact tgccagcgcc ttagcgcccg ctcctttcgc tttcttccct tcctttctcg 5880
ccacgttcgc cggctttccc cgtcaagctc taaatcgggg gctcccttta gggttccgat 5940
ttagtgcttt acggcacctc gaccccaaaa aacttgattt gggtgatggt tcacgtagtg 6000
ggccatcgcc ctgatagacg gtttttcgcc ctttgacgtt ggagtccacg ttctttaata 6060
gtggactctt gttccaaact ggaacaacac tcaactctat ctcgggctat tcttttgatt 6120
tataagggat tttgccgatt tcggtctatt ggttaaaaaa tgagctgatt taacaaaaat 6180
ttaacgcgaa ttttaacaaa atattaacgt ttacaatttt atggtgcact ctcagtacaa 6240
tctgctctga tgccgcatag ttaagccagc cccgacaccc gccaacaccc gctgacgcgc 6300
cctgacgggc ttgtctgctc ccggcatccg cttacagaca agctgtgacc gtctccggga 6360
gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg cgcgagacga aagggcctcg 6420
tgatacgcct atttttatag gttaatgtca tgataataat ggtttcttag acgtcaggtg 6480
gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa atacattcaa 6540
atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat tgaaaaagga 6600
agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg gcattttgcc 6660
ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa gatcagttgg 6720
gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt gagagttttc 6780
gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt ggcgcggtat 6840
tatcccgtat tgacgccggg caagagcaac tcggtcgccg catacactat tctcagaatg 6900
acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg acagtaagag 6960
aattatgcag tgctgccata accatgagtg ataacactgc ggccaactta cttctgacaa 7020
cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat catgtaactc 7080
gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag cgtgacacca 7140
cgatgcctgt agcaatggca acaacgttgc gcaaactatt aactggcgaa ctacttactc 7200
tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca ggaccacttc 7260
tgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc ggtgagcgtg 7320
gaagccgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt atcgtagtta 7380
tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc gctgagatag 7440
gtgcctcact gattaagcat tggtaactgt cagaccaagt ttactcatat atactttaga 7500
ttgatttaaa acttcatttt taatttaaaa ggatctaggt gaagatcctt tttgataatc 7560
tcatgaccaa aatcccttaa cgtgagtttt cgttccactg agcgtcagac cccgtagaaa 7620
agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc ttgcaaacaa 7680
aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca actctttttc 7740
cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgttcttcta gtgtagccgt 7800
agttaggcca ccacttcaag aactctgtag caccgcctac atacctcgct ctgctaatcc 7860
tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg gactcaagac 7920
gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc acacagccca 7980
gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta tgagaaagcg 8040
ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg gtcggaacag 8100
gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt cctgtcgggt 8160
ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg cggagcctat 8220
ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg ccttttgctc 8280
acatgt 8286
<210> 5
<211> 135
<212> RNA
<213> Artificial sequence
<400> 5
gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag 60
gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu ugaaaaagug 120
gcaccgaguc ggugc 135
<210> 6
<211> 135
<212> DNA
<213> Artificial sequence
<400> 6
gaacggctcg gagatcatca ttgcggtttt agagctaggc cagaggaaga gcgtcagcag 60
gctgactgca gggcctagca agttaaaata aggctagtcc gttatcaact tgaaaaagtg 120
gcaccgagtc ggtgc 135
<210> 7
<211> 391
<212> DNA
<213> Artificial sequence
<400> 7
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctaggcc agaggaagag 300
cgtcagcagg ctgactgcag ggcctagcaa gttaaaataa ggctagtccg ttatcaactt 360
gaaaaagtgg caccgagtcg gtgctttttt t 391
<210> 8
<211> 135
<212> RNA
<213> Artificial sequence
<400> 8
gaacggcucg gagaucauca uugcgguuuu agagcuagaa auagcaaguu aaaauaaggc 60
uaguccguua ucaacuuggc cagaggaaga gcgucagcag gcugacugca gggccaagug 120
gcaccgaguc ggugc 135
<210> 9
<211> 135
<212> DNA
<213> Artificial sequence
<400> 9
gaacggctcg gagatcatca ttgcggtttt agagctagaa atagcaagtt aaaataaggc 60
tagtccgtta tcaacttggc cagaggaaga gcgtcagcag gctgactgca gggccaagtg 120
gcaccgagtc ggtgc 135
<210> 10
<211> 391
<212> DNA
<213> Artificial sequence
<400> 10
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctagaaa tagcaagtta 300
aaataaggct agtccgttat caacttggcc agaggaagag cgtcagcagg ctgactgcag 360
ggccaagtgg caccgagtcg gtgctttttt t 391
<210> 11
<211> 8252
<212> DNA
<213> Artificial sequence
<400> 11
cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60
ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120
aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180
agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240
ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300
tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360
atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420
ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480
gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540
ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600
tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660
aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720
tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780
gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840
aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900
acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960
gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020
tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080
tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140
tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200
ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260
gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320
tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380
tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440
tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500
tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560
tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620
acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680
tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740
acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800
agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860
gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920
ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980
acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040
aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100
tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160
ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220
cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280
aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340
aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400
aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460
aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520
ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580
tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640
ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700
aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760
tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820
acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880
aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940
atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000
agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060
agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120
agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180
tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240
aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300
ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360
agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420
tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480
gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540
agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600
atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660
tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720
acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780
gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840
agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900
tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960
acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020
tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080
gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140
gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200
tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260
tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320
gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380
tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440
tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500
agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560
tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620
atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680
ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740
actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800
ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860
gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920
gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980
tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040
tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100
tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160
aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220
aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280
aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340
atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400
aggccagagg aagagcgtca gcaggctgac tgcagggcct agcaagttaa aataaggcta 5460
gtccgttatc aacttgaaaa agtggcaccg agtcggtgct ttttttgcgg ccgcaggaac 5520
ccctagtgat ggagttggcc actccctctc tgcgcgctcg ctcgctcact gaggccgggc 5580
gaccaaaggt cgcccgacgc ccgggctttg cccgggcggc ctcagtgagc gagcgagcgc 5640
gcagctgcct gcaggggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt 5700
cacaccgcat acgtcaaagc aaccatagta cgcgccctgt agcggcgcat taagcgcggc 5760
gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccttag cgcccgctcc 5820
tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aagctctaaa 5880
tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc ccaaaaaact 5940
tgatttgggt gatggttcac gtagtgggcc atcgccctga tagacggttt ttcgcccttt 6000
gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa caacactcaa 6060
ctctatctcg ggctattctt ttgatttata agggattttg ccgatttcgg tctattggtt 6120
aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat taacgtttac 6180
aattttatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagccccg 6240
acacccgcca acacccgctg acgcgccctg acgggcttgt ctgctcccgg catccgctta 6300
cagacaagct gtgaccgtct ccgggagctg catgtgtcag aggttttcac cgtcatcacc 6360
gaaacgcgcg agacgaaagg gcctcgtgat acgcctattt ttataggtta atgtcatgat 6420
aataatggtt tcttagacgt caggtggcac ttttcgggga aatgtgcgcg gaacccctat 6480
ttgtttattt ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata 6540
aatgcttcaa taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct 6600
tattcccttt tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa 6660
agtaaaagat gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa 6720
cagcggtaag atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt 6780
taaagttctg ctatgtggcg cggtattatc ccgtattgac gccgggcaag agcaactcgg 6840
tcgccgcata cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca 6900
tcttacggat ggcatgacag taagagaatt atgcagtgct gccataacca tgagtgataa 6960
cactgcggcc aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt 7020
gcacaacatg ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc 7080
cataccaaac gacgagcgtg acaccacgat gcctgtagca atggcaacaa cgttgcgcaa 7140
actattaact ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga 7200
ggcggataaa gttgcaggac cacttctgcg ctcggccctt ccggctggct ggtttattgc 7260
tgataaatct ggagccggtg agcgtggaag ccgcggtatc attgcagcac tggggccaga 7320
tggtaagccc tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga 7380
acgaaataga cagatcgctg agataggtgc ctcactgatt aagcattggt aactgtcaga 7440
ccaagtttac tcatatatac tttagattga tttaaaactt catttttaat ttaaaaggat 7500
ctaggtgaag atcctttttg ataatctcat gaccaaaatc ccttaacgtg agttttcgtt 7560
ccactgagcg tcagaccccg tagaaaagat caaaggatct tcttgagatc ctttttttct 7620
gcgcgtaatc tgctgcttgc aaacaaaaaa accaccgcta ccagcggtgg tttgtttgcc 7680
ggatcaagag ctaccaactc tttttccgaa ggtaactggc ttcagcagag cgcagatacc 7740
aaatactgtt cttctagtgt agccgtagtt aggccaccac ttcaagaact ctgtagcacc 7800
gcctacatac ctcgctctgc taatcctgtt accagtggct gctgccagtg gcgataagtc 7860
gtgtcttacc gggttggact caagacgata gttaccggat aaggcgcagc ggtcgggctg 7920
aacggggggt tcgtgcacac agcccagctt ggagcgaacg acctacaccg aactgagata 7980
cctacagcgt gagctatgag aaagcgccac gcttcccgaa gggagaaagg cggacaggta 8040
tccggtaagc ggcagggtcg gaacaggaga gcgcacgagg gagcttccag ggggaaacgc 8100
ctggtatctt tatagtcctg tcgggtttcg ccacctctga cttgagcgtc gatttttgtg 8160
atgctcgtca ggggggcgga gcctatggaa aaacgccagc aacgcggcct ttttacggtt 8220
cctggccttt tgctggcctt ttgctcacat gt 8252
<210> 12
<211> 8252
<212> DNA
<213> Artificial sequence
<400> 12
cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60
ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120
aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180
agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240
ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300
tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360
atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420
ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480
gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540
ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600
tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660
aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720
tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780
gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840
aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900
acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960
gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020
tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080
tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140
tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200
ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260
gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320
tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380
tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440
tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500
tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560
tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620
acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680
tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740
acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800
agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860
gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920
ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980
acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040
aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100
tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160
ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220
cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280
aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340
aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400
aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460
aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520
ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580
tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640
ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700
aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760
tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820
acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880
aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940
atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000
agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060
agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120
agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180
tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240
aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300
ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360
agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420
tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480
gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540
agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600
atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660
tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720
acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780
gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840
agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900
tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960
acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020
tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080
gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140
gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200
tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260
tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320
gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380
tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440
tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500
agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560
tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620
atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680
ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740
actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800
ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860
gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920
gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980
tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040
tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100
tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160
aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220
aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280
aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340
atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400
agaaatagca agttaaaata aggctagtcc gttatcaact tggccagagg aagagcgtca 5460
gcaggctgac tgcagggcca agtggcaccg agtcggtgct ttttttgcgg ccgcaggaac 5520
ccctagtgat ggagttggcc actccctctc tgcgcgctcg ctcgctcact gaggccgggc 5580
gaccaaaggt cgcccgacgc ccgggctttg cccgggcggc ctcagtgagc gagcgagcgc 5640
gcagctgcct gcaggggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt 5700
cacaccgcat acgtcaaagc aaccatagta cgcgccctgt agcggcgcat taagcgcggc 5760
gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccttag cgcccgctcc 5820
tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aagctctaaa 5880
tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc ccaaaaaact 5940
tgatttgggt gatggttcac gtagtgggcc atcgccctga tagacggttt ttcgcccttt 6000
gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa caacactcaa 6060
ctctatctcg ggctattctt ttgatttata agggattttg ccgatttcgg tctattggtt 6120
aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat taacgtttac 6180
aattttatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagccccg 6240
acacccgcca acacccgctg acgcgccctg acgggcttgt ctgctcccgg catccgctta 6300
cagacaagct gtgaccgtct ccgggagctg catgtgtcag aggttttcac cgtcatcacc 6360
gaaacgcgcg agacgaaagg gcctcgtgat acgcctattt ttataggtta atgtcatgat 6420
aataatggtt tcttagacgt caggtggcac ttttcgggga aatgtgcgcg gaacccctat 6480
ttgtttattt ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata 6540
aatgcttcaa taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct 6600
tattcccttt tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa 6660
agtaaaagat gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa 6720
cagcggtaag atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt 6780
taaagttctg ctatgtggcg cggtattatc ccgtattgac gccgggcaag agcaactcgg 6840
tcgccgcata cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca 6900
tcttacggat ggcatgacag taagagaatt atgcagtgct gccataacca tgagtgataa 6960
cactgcggcc aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt 7020
gcacaacatg ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc 7080
cataccaaac gacgagcgtg acaccacgat gcctgtagca atggcaacaa cgttgcgcaa 7140
actattaact ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga 7200
ggcggataaa gttgcaggac cacttctgcg ctcggccctt ccggctggct ggtttattgc 7260
tgataaatct ggagccggtg agcgtggaag ccgcggtatc attgcagcac tggggccaga 7320
tggtaagccc tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga 7380
acgaaataga cagatcgctg agataggtgc ctcactgatt aagcattggt aactgtcaga 7440
ccaagtttac tcatatatac tttagattga tttaaaactt catttttaat ttaaaaggat 7500
ctaggtgaag atcctttttg ataatctcat gaccaaaatc ccttaacgtg agttttcgtt 7560
ccactgagcg tcagaccccg tagaaaagat caaaggatct tcttgagatc ctttttttct 7620
gcgcgtaatc tgctgcttgc aaacaaaaaa accaccgcta ccagcggtgg tttgtttgcc 7680
ggatcaagag ctaccaactc tttttccgaa ggtaactggc ttcagcagag cgcagatacc 7740
aaatactgtt cttctagtgt agccgtagtt aggccaccac ttcaagaact ctgtagcacc 7800
gcctacatac ctcgctctgc taatcctgtt accagtggct gctgccagtg gcgataagtc 7860
gtgtcttacc gggttggact caagacgata gttaccggat aaggcgcagc ggtcgggctg 7920
aacggggggt tcgtgcacac agcccagctt ggagcgaacg acctacaccg aactgagata 7980
cctacagcgt gagctatgag aaagcgccac gcttcccgaa gggagaaagg cggacaggta 8040
tccggtaagc ggcagggtcg gaacaggaga gcgcacgagg gagcttccag ggggaaacgc 8100
ctggtatctt tatagtcctg tcgggtttcg ccacctctga cttgagcgtc gatttttgtg 8160
atgctcgtca ggggggcgga gcctatggaa aaacgccagc aacgcggcct ttttacggtt 8220
cctggccttt tgctggcctt ttgctcacat gt 8252
<210> 13
<211> 9181
<212> DNA
<213> Artificial sequence
<400> 13
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctagaaa tagcaagtta 300
aaataaggct agtccgttat caacttgaaa aagtggcacc gagtcggtgc ttttttgttt 360
tagagctaga aatagcaagt taaaataagg ctagtccgtt tttagcgcgt gcgccaattc 420
tgcagacaaa tggctctaga ggtacccgtt acataactta cggtaaatgg cccgcctggc 480
tgaccgccca acgacccccg cccattgacg tcaatagtaa cgccaatagg gactttccat 540
tgacgtcaat gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat 600
catatgccaa gtacgccccc tattgacgtc aatgacggta aatggcccgc ctggcattgt 660
gcccagtaca tgaccttatg ggactttcct acttggcagt acatctacgt attagtcatc 720
gctattacca tggtcgaggt gagccccacg ttctgcttca ctctccccat ctcccccccc 780
tccccacccc caattttgta tttatttatt ttttaattat tttgtgcagc gatgggggcg 840
gggggggggg gggggcgcgc gccaggcggg gcggggcggg gcgaggggcg gggcggggcg 900
aggcggagag gtgcggcggc agccaatcag agcggcgcgc tccgaaagtt tccttttatg 960
gcgaggcggc ggcggcggcg gccctataaa aagcgaagcg cgcggcgggc gggagtcgct 1020
gcgcgctgcc ttcgccccgt gccccgctcc gccgccgcct cgcgccgccc gccccggctc 1080
tgactgaccg cgttactccc acaggtgagc gggcgggacg gcccttctcc tccgggctgt 1140
aattagctga gcaagaggta agggtttaag ggatggttgg ttggtggggt attaatgttt 1200
aattacctgg agcacctgcc tgaaatcact ttttttcagg ttggaccggt gccaccatgg 1260
actataagga ccacgacgga gactacaagg atcatgatat tgattacaaa gacgatgacg 1320
ataagatggc cccaaagaag aagcggaagg tcggtatcca cggagtccca gcagccgaca 1380
agaagtacag catcggcctg gacatcggca ccaactctgt gggctgggcc gtgatcaccg 1440
acgagtacaa ggtgcccagc aagaaattca aggtgctggg caacaccgac cggcacagca 1500
tcaagaagaa cctgatcgga gccctgctgt tcgacagcgg cgaaacagcc gaggccaccc 1560
ggctgaagag aaccgccaga agaagataca ccagacggaa gaaccggatc tgctatctgc 1620
aagagatctt cagcaacgag atggccaagg tggacgacag cttcttccac agactggaag 1680
agtccttcct ggtggaagag gataagaagc acgagcggca ccccatcttc ggcaacatcg 1740
tggacgaggt ggcctaccac gagaagtacc ccaccatcta ccacctgaga aagaaactgg 1800
tggacagcac cgacaaggcc gacctgcggc tgatctatct ggccctggcc cacatgatca 1860
agttccgggg ccacttcctg atcgagggcg acctgaaccc cgacaacagc gacgtggaca 1920
agctgttcat ccagctggtg cagacctaca accagctgtt cgaggaaaac cccatcaacg 1980
ccagcggcgt ggacgccaag gccatcctgt ctgccagact gagcaagagc agacggctgg 2040
aaaatctgat cgcccagctg cccggcgaga agaagaatgg cctgttcgga aacctgattg 2100
ccctgagcct gggcctgacc cccaacttca agagcaactt cgacctggcc gaggatgcca 2160
aactgcagct gagcaaggac acctacgacg acgacctgga caacctgctg gcccagatcg 2220
gcgaccagta cgccgacctg tttctggccg ccaagaacct gtccgacgcc atcctgctga 2280
gcgacatcct gagagtgaac accgagatca ccaaggcccc cctgagcgcc tctatgatca 2340
agagatacga cgagcaccac caggacctga ccctgctgaa agctctcgtg cggcagcagc 2400
tgcctgagaa gtacaaagag attttcttcg accagagcaa gaacggctac gccggctaca 2460
ttgacggcgg agccagccag gaagagttct acaagttcat caagcccatc ctggaaaaga 2520
tggacggcac cgaggaactg ctcgtgaagc tgaacagaga ggacctgctg cggaagcagc 2580
ggaccttcga caacggcagc atcccccacc agatccacct gggagagctg cacgccattc 2640
tgcggcggca ggaagatttt tacccattcc tgaaggacaa ccgggaaaag atcgagaaga 2700
tcctgacctt ccgcatcccc tactacgtgg gccctctggc caggggaaac agcagattcg 2760
cctggatgac cagaaagagc gaggaaacca tcaccccctg gaacttcgag gaagtggtgg 2820
acaagggcgc ttccgcccag agcttcatcg agcggatgac caacttcgat aagaacctgc 2880
ccaacgagaa ggtgctgccc aagcacagcc tgctgtacga gtacttcacc gtgtataacg 2940
agctgaccaa agtgaaatac gtgaccgagg gaatgagaaa gcccgccttc ctgagcggcg 3000
agcagaaaaa ggccatcgtg gacctgctgt tcaagaccaa ccggaaagtg accgtgaagc 3060
agctgaaaga ggactacttc aagaaaatcg agtgcttcga ctccgtggaa atctccggcg 3120
tggaagatcg gttcaacgcc tccctgggca cataccacga tctgctgaaa attatcaagg 3180
acaaggactt cctggacaat gaggaaaacg aggacattct ggaagatatc gtgctgaccc 3240
tgacactgtt tgaggacaga gagatgatcg aggaacggct gaaaacctat gcccacctgt 3300
tcgacgacaa agtgatgaag cagctgaagc ggcggagata caccggctgg ggcaggctga 3360
gccggaagct gatcaacggc atccgggaca agcagtccgg caagacaatc ctggatttcc 3420
tgaagtccga cggcttcgcc aacagaaact tcatgcagct gatccacgac gacagcctga 3480
cctttaaaga ggacatccag aaagcccagg tgtccggcca gggcgatagc ctgcacgagc 3540
acattgccaa tctggccggc agccccgcca ttaagaaggg catcctgcag acagtgaagg 3600
tggtggacga gctcgtgaaa gtgatgggcc ggcacaagcc cgagaacatc gtgatcgaaa 3660
tggccagaga gaaccagacc acccagaagg gacagaagaa cagccgcgag agaatgaagc 3720
ggatcgaaga gggcatcaaa gagctgggca gccagatcct gaaagaacac cccgtggaaa 3780
acacccagct gcagaacgag aagctgtacc tgtactacct gcagaatggg cgggatatgt 3840
acgtggacca ggaactggac atcaaccggc tgtccgacta cgatgtggac catatcgtgc 3900
ctcagagctt tctgaaggac gactccatcg acaacaaggt gctgaccaga agcgacaaga 3960
accggggcaa gagcgacaac gtgccctccg aagaggtcgt gaagaagatg aagaactact 4020
ggcggcagct gctgaacgcc aagctgatta cccagagaaa gttcgacaat ctgaccaagg 4080
ccgagagagg cggcctgagc gaactggata aggccggctt catcaagaga cagctggtgg 4140
aaacccggca gatcacaaag cacgtggcac agatcctgga ctcccggatg aacactaagt 4200
acgacgagaa tgacaagctg atccgggaag tgaaagtgat caccctgaag tccaagctgg 4260
tgtccgattt ccggaaggat ttccagtttt acaaagtgcg cgagatcaac aactaccacc 4320
acgcccacga cgcctacctg aacgccgtcg tgggaaccgc cctgatcaaa aagtacccta 4380
agctggaaag cgagttcgtg tacggcgact acaaggtgta cgacgtgcgg aagatgatcg 4440
ccaagagcga gcaggaaatc ggcaaggcta ccgccaagta cttcttctac agcaacatca 4500
tgaacttttt caagaccgag attaccctgg ccaacggcga gatccggaag cggcctctga 4560
tcgagacaaa cggcgaaacc ggggagatcg tgtgggataa gggccgggat tttgccaccg 4620
tgcggaaagt gctgagcatg ccccaagtga atatcgtgaa aaagaccgag gtgcagacag 4680
gcggcttcag caaagagtct atcctgccca agaggaacag cgataagctg atcgccagaa 4740
agaaggactg ggaccctaag aagtacggcg gcttcgacag ccccaccgtg gcctattctg 4800
tgctggtggt ggccaaagtg gaaaagggca agtccaagaa actgaagagt gtgaaagagc 4860
tgctggggat caccatcatg gaaagaagca gcttcgagaa gaatcccatc gactttctgg 4920
aagccaaggg ctacaaagaa gtgaaaaagg acctgatcat caagctgcct aagtactccc 4980
tgttcgagct ggaaaacggc cggaagagaa tgctggcctc tgccggcgaa ctgcagaagg 5040
gaaacgaact ggccctgccc tccaaatatg tgaacttcct gtacctggcc agccactatg 5100
agaagctgaa gggctccccc gaggataatg agcagaaaca gctgtttgtg gaacagcaca 5160
agcactacct ggacgagatc atcgagcaga tcagcgagtt ctccaagaga gtgatcctgg 5220
ccgacgctaa tctggacaaa gtgctgtccg cctacaacaa gcaccgggat aagcccatca 5280
gagagcaggc cgagaatatc atccacctgt ttaccctgac caatctggga gcccctgccg 5340
ccttcaagta ctttgacacc accatcgacc ggaagaggta caccagcacc aaagaggtgc 5400
tggacgccac cctgatccac cagagcatca ccggcctgta cgagacacgg atcgacctgt 5460
ctcagctggg aggcgacaaa aggccggcgg ccacgaaaaa ggccggccag gcaaaaaaga 5520
aaaaggaatt cggcagtgga gagggcagag gaagtctgct aacatgcggt gacgtcgagg 5580
agaatcctgg cccaatgacc gagtacaagc ccacggtgcg cctcgccacc cgcgacgacg 5640
tccccagggc cgtacgcacc ctcgccgccg cgttcgccga ctaccccgcc acgcgccaca 5700
ccgtcgatcc ggaccgccac atcgagcggg tcaccgagct gcaagaactc ttcctcacgc 5760
gcgtcgggct cgacatcggc aaggtgtggg tcgcggacga cggcgccgcg gtggcggtct 5820
ggaccacgcc ggagagcgtc gaagcggggg cggtgttcgc cgagatcggc ccgcgcatgg 5880
ccgagttgag cggttcccgg ctggccgcgc agcaacagat ggaaggcctc ctggcgccgc 5940
accggcccaa ggagcccgcg tggttcctgg ccaccgtcgg agtctcgccc gaccaccagg 6000
gcaagggtct gggcagcgcc gtcgtgctcc ccggagtgga ggcggccgag cgcgccgggg 6060
tgcccgcctt cctggagacc tccgcgcccc gcaacctccc cttctacgag cggctcggct 6120
tcaccgtcac cgccgacgtc gaggtgcccg aaggaccgcg cacctggtgc atgacccgca 6180
agcccggtgc ctgagaattc taactagagc tcgctgatca gcctcgactg tgccttctag 6240
ttgccagcca tctgttgttt gcccctcccc cgtgccttcc ttgaccctgg aaggtgccac 6300
tcccactgtc ctttcctaat aaaatgagga aattgcatcg cattgtctga gtaggtgtca 6360
ttctattctg gggggtgggg tggggcagga cagcaagggg gaggattggg aagagaatag 6420
caggcatgct ggggagcggc cgcaggaacc cctagtgatg gagttggcca ctccctctct 6480
gcgcgctcgc tcgctcactg aggccgggcg accaaaggtc gcccgacgcc cgggctttgc 6540
ccgggcggcc tcagtgagcg agcgagcgcg cagctgcctg caggggcgcc tgatgcggta 6600
ttttctcctt acgcatctgt gcggtatttc acaccgcata cgtcaaagca accatagtac 6660
gcgccctgta gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct 6720
acacttgcca gcgccttagc gcccgctcct ttcgctttct tcccttcctt tctcgccacg 6780
ttcgccggct ttccccgtca agctctaaat cgggggctcc ctttagggtt ccgatttagt 6840
gctttacggc acctcgaccc caaaaaactt gatttgggtg atggttcacg tagtgggcca 6900
tcgccctgat agacggtttt tcgccctttg acgttggagt ccacgttctt taatagtgga 6960
ctcttgttcc aaactggaac aacactcaac tctatctcgg gctattcttt tgatttataa 7020
gggattttgc cgatttcggt ctattggtta aaaaatgagc tgatttaaca aaaatttaac 7080
gcgaatttta acaaaatatt aacgtttaca attttatggt gcactctcag tacaatctgc 7140
tctgatgccg catagttaag ccagccccga cacccgccaa cacccgctga cgcgccctga 7200
cgggcttgtc tgctcccggc atccgcttac agacaagctg tgaccgtctc cgggagctgc 7260
atgtgtcaga ggttttcacc gtcatcaccg aaacgcgcga gacgaaaggg cctcgtgata 7320
cgcctatttt tataggttaa tgtcatgata ataatggttt cttagacgtc aggtggcact 7380
tttcggggaa atgtgcgcgg aacccctatt tgtttatttt tctaaataca ttcaaatatg 7440
tatccgctca tgagacaata accctgataa atgcttcaat aatattgaaa aaggaagagt 7500
atgagtattc aacatttccg tgtcgccctt attccctttt ttgcggcatt ttgccttcct 7560
gtttttgctc acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgca 7620
cgagtgggtt acatcgaact ggatctcaac agcggtaaga tccttgagag ttttcgcccc 7680
gaagaacgtt ttccaatgat gagcactttt aaagttctgc tatgtggcgc ggtattatcc 7740
cgtattgacg ccgggcaaga gcaactcggt cgccgcatac actattctca gaatgacttg 7800
gttgagtact caccagtcac agaaaagcat cttacggatg gcatgacagt aagagaatta 7860
tgcagtgctg ccataaccat gagtgataac actgcggcca acttacttct gacaacgatc 7920
ggaggaccga aggagctaac cgcttttttg cacaacatgg gggatcatgt aactcgcctt 7980
gatcgttggg aaccggagct gaatgaagcc ataccaaacg acgagcgtga caccacgatg 8040
cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg gcgaactact tactctagct 8100
tcccggcaac aattaataga ctggatggag gcggataaag ttgcaggacc acttctgcgc 8160
tcggcccttc cggctggctg gtttattgct gataaatctg gagccggtga gcgtggaagc 8220
cgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctac 8280
acgacgggga gtcaggcaac tatggatgaa cgaaatagac agatcgctga gataggtgcc 8340
tcactgatta agcattggta actgtcagac caagtttact catatatact ttagattgat 8400
ttaaaacttc atttttaatt taaaaggatc taggtgaaga tcctttttga taatctcatg 8460
accaaaatcc cttaacgtga gttttcgttc cactgagcgt cagaccccgt agaaaagatc 8520
aaaggatctt cttgagatcc tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa 8580
ccaccgctac cagcggtggt ttgtttgccg gatcaagagc taccaactct ttttccgaag 8640
gtaactggct tcagcagagc gcagatacca aatactgttc ttctagtgta gccgtagtta 8700
ggccaccact tcaagaactc tgtagcaccg cctacatacc tcgctctgct aatcctgtta 8760
ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg ggttggactc aagacgatag 8820
ttaccggata aggcgcagcg gtcgggctga acggggggtt cgtgcacaca gcccagcttg 8880
gagcgaacga cctacaccga actgagatac ctacagcgtg agctatgaga aagcgccacg 8940
cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg gcagggtcgg aacaggagag 9000
cgcacgaggg agcttccagg gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc 9060
cacctctgac ttgagcgtcg atttttgtga tgctcgtcag gggggcggag cctatggaaa 9120
aacgccagca acgcggcctt tttacggttc ctggcctttt gctggccttt tgctcacatg 9180
t 9181
<210> 14
<211> 584
<212> DNA
<213> Artificial sequence
<400> 14
gacattgatt attgactagt tattaatagt aatcaattac ggggtcatta gttcatagcc 60
catatatgga gttccgcgtt acataactta cggtaaatgg cccgcctggc tgaccgccca 120
acgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg ccaataggga 180
ctttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg gcagtacatc 240
aagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 300
ggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac atctacgtat 360
tagtcatcgc tattaccatg gtgatgcggt tttggcagta catcaatggg cgtggatagc 420
ggtttgactc acggggattt ccaagtctcc accccattga cgtcaatggg agtttgtttt 480
ggcaccaaaa tcaacgggac tttccaaaat gtcgtaacaa ctccgcccca ttgacgcaaa 540
tgggcggtag gcgtgtacgg tgggaggtct atataagcag agct 584
<210> 15
<211> 600
<212> DNA
<213> Artificial sequence
<400> 15
atgaccgagt acaagcccac ggtgcgcctc gccacccgcg acgacgtccc cagggccgta 60
cgcaccctcg ccgccgcgtt cgccgactac cccgccacgc gccacaccgt cgatccggac 120
cgccacatcg agcgggtcac cgagctgcaa gaactcttcc tcacgcgcgt cgggctcgac 180
atcggcaagg tgtgggtcgc ggacgacggc gccgcggtgg cggtctggac cacgccggag 240
agcgtcgaag cgggggcggt gttcgccgag atcggcccgc gcatggccga gttgagcggt 300
tcccggctgg ccgcgcagca acagatggaa ggcctcctgg cgccgcaccg gcccaaggag 360
cccgcgtggt tcctggccac cgtcggcgtc tcgcccgacc accagggcaa gggtctgggc 420
agcgccgtcg tgctccccgg agtggaggcg gccgagcgcg ccggggtgcc cgccttcctg 480
gagacctccg cgccccgcaa cctccccttc tacgagcggc tcggcttcac cgtcaccgcc 540
gacgtcgagg tgcccgaagg accgcgcacc tggtgcatga cccgcaagcc cggtgcctga 600
<210> 16
<211> 714
<212> DNA
<213> Artificial sequence
<400> 16
gtgagcaagg gcgaggagct gttcaccggg gtggtgccca tcctggtcga gctggacggc 60
gacgtaaacg gccacaagtt cagcgtgtcc ggcgagggcg agggcgatgc cacctacggc 120
aagctgaccc tgaagttcat ctgcaccacc ggcaagctgc ccgtgccctg gcccaccctc 180
gtgaccaccc tgacctacgg cgtgcagtgc ttcagccgct accccgacca catgaagcag 240
cacgacttct tcaagtccgc catgcccgaa ggctacgtcc aggagcgcac catcttcttc 300
aaggacgacg gcaactacaa gacccgcgcc gaggtgaagt tcgagggcga caccctggtg 360
aaccgcatcg agctgaaggg catcgacttc aaggaggacg gcaacatcct ggggcacaag 420
ctggagtaca actacaacag ccacaacgtc tatatcatgg ccgacaagca gaagaacggc 480
atcaaggtga acttcaagat ccgccacaac atcgaggacg gcagcgtgca gctcgccgac 540
cactaccagc agaacacccc catcggcgac ggccccgtgc tgctgcccga caaccactac 600
ctgagcaccc agtccgccct gagcaaagac cccaacgaga agcgcgatca catggtcctg 660
ctggagttcg tgaccgccgc cgggatcact ctcggcatgg acgagctgta caag 714
<210> 17
<211> 8111
<212> DNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (2581)..(2587)
<223> n is a, c, g, or t
<400> 17
agcttaatgt agtcttatgc aatactcttg tagtcttgca acatggtaac gatgagttag 60
caacatgcct tacaaggaga gaaaaagcac cgtgcatgcc gattggtgga agtaaggtgg 120
tacgatcgtg ccttattagg aaggcaacag acgggtctga catggattgg acgaaccact 180
gaattgccgc attgcagaga tattgtattt aagtgcctag ctcgatacat aaacgggtct 240
ctctggttag accagatctg agcctgggag ctctctggct aactagggaa cccactgctt 300
aagcctcaat aaagcttgcc ttgagtgctt caagtagtgt gtgcccgtct gttgtgtgac 360
tctggtaact agagatccct cagacccttt tagtcagtgt ggaaaatctc tagcagtggc 420
gcccgaacag ggacttgaaa gcgaaaggga aaccagagga gctctctcga cgcaggactc 480
ggcttgctga agcgcgcacg gcaagaggcg aggggcggcg actggtgagt acgccaaaaa 540
ttttgactag cggaggctag aaggagagag atgggtgcga gagcgtcagt attaagcggg 600
ggagaattag atcgcgatgg gaaaaaattc ggttaaggcc agggggaaag aaaaaatata 660
aattaaaaca tatagtatgg gcaagcaggg agctagaacg attcgcagtt aatcctggcc 720
tgttagaaac atcagaaggc tgtagacaaa tactgggaca gctacaacca tcccttcaga 780
caggatcaga agaacttaga tcattatata atacagtagc aaccctctat tgtgtgcatc 840
aaaggataga gataaaagac accaaggaag ctttagacaa gatagaggaa gagcaaaaca 900
aaagtaagac caccgcacag caagcggccg ctgatcttca gacctggagg aggagatatg 960
agggacaatt ggagaagtga attatataaa tataaagtag taaaaattga accattagga 1020
gtagcaccca ccaaggcaaa gagaagagtg gtgcagagag aaaaaagagc agtgggaata 1080
ggagctttgt tccttgggtt cttgggagca gcaggaagca ctatgggcgc agcgtcaatg 1140
acgctgacgg tacaggccag acaattattg tctggtatag tgcagcagca gaacaatttg 1200
ctgagggcta ttgaggcgca acagcatctg ttgcaactca cagtctgggg catcaagcag 1260
ctccaggcaa gaatcctggc tgtggaaaga tacctaaagg atcaacagct cctggggatt 1320
tggggttgct ctggaaaact catttgcacc actgctgtgc cttggaatgc tagttggagt 1380
aataaatctc tggaacagat ttggaatcac acgacctgga tggagtggga cagagaaatt 1440
aacaattaca caagcttaat acactcctta attgaagaat cgcaaaacca gcaagaaaag 1500
aatgaacaag aattattgga attagataaa tgggcaagtt tgtggaattg gtttaacata 1560
acaaattggc tgtggtatat aaaattattc ataatgatag taggaggctt ggtaggttta 1620
agaatagttt ttgctgtact ttctatagtg aatagagtta ggcagggata ttcaccatta 1680
tcgtttcaga cccacctccc aaccccgagg ggacccgaca ggcccgaagg aatagaagaa 1740
gaaggtggag agagagacag agacagatcc attcgattag tgaacggatc tcgacggtat 1800
cggttaactt ttaaaagaaa aggggggatt ggggggtaca gtgcagggga aagaatagta 1860
gacataatag caacagacat acaaactaaa gaattacaaa aacaaattac aaaaattcaa 1920
aattttatcg atcacgagac tagcctcgag aagcttgata tcgacattga ttattgacta 1980
gttattaata gtaatcaatt acggggtcat tagttcatag cccatatatg gagttccgcg 2040
ttacataact tacggtaaat ggcccgcctg gctgaccgcc caacgacccc cgcccattga 2100
cgtcaataat gacgtatgtt cccatagtaa cgccaatagg gactttccat tgacgtcaat 2160
gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat catatgccaa 2220
gtacgccccc tattgacgtc aatgacggta aatggcccgc ctggcattat gcccagtaca 2280
tgaccttatg ggactttcct acttggcagt acatctacgt attagtcatc gctattacca 2340
tggtgatgcg gttttggcag tacatcaatg ggcgtggata gcggtttgac tcacggggat 2400
ttccaagtct ccaccccatt gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg 2460
actttccaaa atgtcgtaac aactccgccc cattgacgca aatgggcggt aggcgtgtac 2520
ggtgggaggt ctatataagc agagctgcca ccatggaacg gctcggagat catcattgcg 2580
nnnnnnngtg agcaagggcg aggagctgtt caccggggtg gtgcccatcc tggtcgagct 2640
ggacggcgac gtaaacggcc acaagttcag cgtgtccggc gagggcgagg gcgatgccac 2700
ctacggcaag ctgaccctga agttcatctg caccaccggc aagctgcccg tgccctggcc 2760
caccctcgtg accaccctga cctacggcgt gcagtgcttc agccgctacc ccgaccacat 2820
gaagcagcac gacttcttca agtccgccat gcccgaaggc tacgtccagg agcgcaccat 2880
cttcttcaag gacgacggca actacaagac ccgcgccgag gtgaagttcg agggcgacac 2940
cctggtgaac cgcatcgagc tgaagggcat cgacttcaag gaggacggca acatcctggg 3000
gcacaagctg gagtacaact acaacagcca caacgtctat atcatggccg acaagcagaa 3060
gaacggcatc aaggtgaact tcaagatccg ccacaacatc gaggacggca gcgtgcagct 3120
cgccgaccac taccagcaga acacccccat cggcgacggc cccgtgctgc tgcccgacaa 3180
ccactacctg agcacccagt ccgccctgag caaagacccc aacgagaagc gcgatcacat 3240
ggtcctgctg gagttcgtga ccgccgccgg gatcactctc ggcatggacg agctgtacaa 3300
ggagggcaga ggaagtcttc taacatgcgg tgacgtggag gagaatcccg gccctatgac 3360
cgagtacaag cccacggtgc gcctcgccac ccgcgacgac gtccccaggg ccgtacgcac 3420
cctcgccgcc gcgttcgccg actaccccgc cacgcgccac accgtcgatc cggaccgcca 3480
catcgagcgg gtcaccgagc tgcaagaact cttcctcacg cgcgtcgggc tcgacatcgg 3540
caaggtgtgg gtcgcggacg acggcgccgc ggtggcggtc tggaccacgc cggagagcgt 3600
cgaagcgggg gcggtgttcg ccgagatcgg cccgcgcatg gccgagttga gcggttcccg 3660
gctggccgcg cagcaacaga tggaaggcct cctggcgccg caccggccca aggagcccgc 3720
gtggttcctg gccaccgtcg gcgtctcgcc cgaccaccag ggcaagggtc tgggcagcgc 3780
cgtcgtgctc cccggagtgg aggcggccga gcgcgccggg gtgcccgcct tcctggagac 3840
ctccgcgccc cgcaacctcc ccttctacga gcggctcggc ttcaccgtca ccgccgacgt 3900
cgaggtgccc gaaggaccgc gcacctggtg catgacccgc aagcccggtg cctgagtcga 3960
caatcaacct ctggattaca aaatttgtga aagattgact ggtattctta actatgttgc 4020
tccttttacg ctatgtggat acgctgcttt aatgcctttg tatcatgcta ttgcttcccg 4080
tatggctttc attttctcct ccttgtataa atcctggttg ctgtctcttt atgaggagtt 4140
gtggcccgtt gtcaggcaac gtggcgtggt gtgcactgtg tttgctgacg caacccccac 4200
tggttggggc attgccacca cctgtcagct cctttccggg actttcgctt tccccctccc 4260
tattgccacg gcggaactca tcgccgcctg ccttgcccgc tgctggacag gggctcggct 4320
gttgggcact gacaattccg tggtgttgtc ggggaagctg acgtcctttc catggctgct 4380
cgcctgtgtt gccacctgga ttctgcgcgg gacgtccttc tgctacgtcc cttcggccct 4440
caatccagcg gaccttcctt cccgcggcct gctgccggct ctgcggcctc ttccgcgtct 4500
tcgccttcgc cctcagacga gtcggatctc cctttgggcc gcctccccgc ctggaattcg 4560
agctcggtac ctttaagacc aatgacttac aaggcagctg tagatcttag ccacttttta 4620
aaagaaaagg ggggactgga agggctaatt cactcccaac gaagacaaga tctgcttttt 4680
gcttgtactg ggtctctctg gttagaccag atctgagcct gggagctctc tggctaacta 4740
gggaacccac tgcttaagcc tcaataaagc ttgccttgag tgcttcaagt agtgtgtgcc 4800
cgtctgttgt gtgactctgg taactagaga tccctcagac ccttttagtc agtgtggaaa 4860
atctctagca gtagtagttc atgtcatctt attattcagt atttataact tgcaaagaaa 4920
tgaatatcag agagtgagag gaacttgttt attgcagctt ataatggtta caaataaagc 4980
aatagcatca caaatttcac aaataaagca tttttttcac tgcattctag ttgtggtttg 5040
tccaaactca tcaatgtatc ttatcatgtc tggctctagc tatcccgccc ctaactccgc 5100
ccatcccgcc cctaactccg cccagttccg cccattctcc gccccatggc tgactaattt 5160
tttttattta tgcagaggcc gaggccgcct cggcctctga gctattccag aagtagtgag 5220
gaggcttttt tggaggccta gggacgtacc caattcgccc tatagtgagt cgtattacgc 5280
gcgctcactg gccgtcgttt tacaacgtcg tgactgggaa aaccctggcg ttacccaact 5340
taatcgcctt gcagcacatc cccctttcgc cagctggcgt aatagcgaag aggcccgcac 5400
cgatcgccct tcccaacagt tgcgcagcct gaatggcgaa tgggacgcgc cctgtagcgg 5460
cgcattaagc gcggcgggtg tggtggttac gcgcagcgtg accgctacac ttgccagcgc 5520
cctagcgccc gctcctttcg ctttcttccc ttcctttctc gccacgttcg ccggctttcc 5580
ccgtcaagct ctaaatcggg ggctcccttt agggttccga tttagtgctt tacggcacct 5640
cgaccccaaa aaacttgatt agggtgatgg ttcacgtagt gggccatcgc cctgatagac 5700
ggtttttcgc cctttgacgt tggagtccac gttctttaat agtggactct tgttccaaac 5760
tggaacaaca ctcaacccta tctcggtcta ttcttttgat ttataaggga ttttgccgat 5820
ttcggcctat tggttaaaaa atgagctgat ttaacaaaaa tttaacgcga attttaacaa 5880
aatattaacg cttacaattt aggtggcact tttcggggaa atgtgcgcgg aacccctatt 5940
tgtttatttt tctaaataca ttcaaatatg tatccgctca tgagacaata accctgataa 6000
atgcttcaat aatattgaaa aaggaagagt atgagtattc aacatttccg tgtcgccctt 6060
attccctttt ttgcggcatt ttgccttcct gtttttgctc acccagaaac gctggtgaaa 6120
gtaaaagatg ctgaagatca gttgggtgca cgagtgggtt acatcgaact ggatctcaac 6180
agcggtaaga tccttgagag ttttcgcccc gaagaacgtt ttccaatgat gagcactttt 6240
aaagttctgc tatgtggcgc ggtattatcc cgtattgacg ccgggcaaga gcaactcggt 6300
cgccgcatac actattctca gaatgacttg gttgagtact caccagtcac agaaaagcat 6360
cttacggatg gcatgacagt aagagaatta tgcagtgctg ccataaccat gagtgataac 6420
actgcggcca acttacttct gacaacgatc ggaggaccga aggagctaac cgcttttttg 6480
cacaacatgg gggatcatgt aactcgcctt gatcgttggg aaccggagct gaatgaagcc 6540
ataccaaacg acgagcgtga caccacgatg cctgtagcaa tggcaacaac gttgcgcaaa 6600
ctattaactg gcgaactact tactctagct tcccggcaac aattaataga ctggatggag 6660
gcggataaag ttgcaggacc acttctgcgc tcggcccttc cggctggctg gtttattgct 6720
gataaatctg gagccggtga gcgtgggtct cgcggtatca ttgcagcact ggggccagat 6780
ggtaagccct cccgtatcgt agttatctac acgacgggga gtcaggcaac tatggatgaa 6840
cgaaatagac agatcgctga gataggtgcc tcactgatta agcattggta actgtcagac 6900
caagtttact catatatact ttagattgat ttaaaacttc atttttaatt taaaaggatc 6960
taggtgaaga tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc 7020
cactgagcgt cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg 7080
cgcgtaatct gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg 7140
gatcaagagc taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca 7200
aatactgttc ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg 7260
cctacatacc tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg 7320
tgtcttaccg ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga 7380
acggggggtt cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac 7440
ctacagcgtg agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat 7500
ccggtaagcg gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc 7560
tggtatcttt atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga 7620
tgctcgtcag gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc 7680
ctggcctttt gctggccttt tgctcacatg ttctttcctg cgttatcccc tgattctgtg 7740
gataaccgta ttaccgcctt tgagtgagct gataccgctc gccgcagccg aacgaccgag 7800
cgcagcgagt cagtgagcga ggaagcggaa gagcgcccaa tacgcaaacc gcctctcccc 7860
gcgcgttggc cgattcatta atgcagctgg cacgacaggt ttcccgactg gaaagcgggc 7920
agtgagcgca acgcaattaa tgtgagttag ctcactcatt aggcacccca ggctttacac 7980
tttatgcttc cggctcgtat gttgtgtgga attgtgagcg gataacaatt tcacacagga 8040
aacagctatg accatgatta cgccaagcgc gcaattaacc ctcactaaag ggaacaaaag 8100
ctggagctgc a 8111
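The records above use numeric identifiers: <210> for the SEQ ID, <211> for the declared length, and <400> for the residues, which are printed in blocks of ten with a running position count at the end of each line. A minimal sketch of how one such record could be checked against its declared length is given below. The function name and the short example record are hypothetical; the record reuses only the first 24 residues of SEQ ID 16 and is not part of the listing.

```python
import re


def parse_sequence_record(text: str):
    """Return (seq_id, declared_length, residues) for one listing record.

    Assumes the record contains <210>, <211> and <400> tags laid out as in
    the listing above; this is an illustrative helper, not an official parser.
    """
    seq_id = int(re.search(r"<210>\s*(\d+)", text).group(1))
    declared_length = int(re.search(r"<211>\s*(\d+)", text).group(1))
    # Everything after <400> is the SEQ ID, residue blocks and position counters.
    body = text.split("<400>", 1)[1]
    # Keep only lowercase letter runs; this drops the SEQ ID and the counters.
    residues = "".join(re.findall(r"[a-z]+", body))
    return seq_id, declared_length, residues


# Hypothetical record for illustration only (SEQ ID 99 does not exist above).
EXAMPLE_RECORD = """<210> 99
<211> 24
<212> DNA
<213> Artificial sequence
<400> 99
gtgagcaagg gcgaggagct gttc 24
"""

seq_id, declared_length, residues = parse_sequence_record(EXAMPLE_RECORD)
length_matches = len(residues) == declared_length
```

Running the same check on a full record would flag any mismatch between the <211> value and the residues actually printed.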

Claims (10)

1. A method of engineering an sgRNA molecule, comprising the step of: introducing, into the nucleotide chain of an sgRNA, a sequence from a snoRNA molecule that has a nuclear localization function, to obtain an engineered sgRNA molecule;
the sgRNA molecule consists of a guide sequence and a scaffold sequence;
the engineered sgRNA molecule can direct a Cas protein into the nucleus of a eukaryotic cell;
the Cas protein is Cas9, and the Cas protein does not contain a nuclear localization sequence;
the sequence having a nuclear localization function from the snoRNA molecule is introduced into the nucleotide chain of the sgRNA in any one of the following ways:
(A1) one U3 snoRNA box C'/D is inserted at each of the loop 1 and loop 2 positions of the sgRNA scaffold sequence, forming the following sequence: guuuuagagcuaggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccuagcaaguuaaaauaaggcuaguccguuaucaacuuggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccaaguggcaccgagucggugc;
(A2) one U3 snoRNA box C'/D is inserted at the loop 1 position of the sgRNA scaffold sequence, forming the following sequence: guuuuagagcuaggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccuagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugc;
(A3) one U3 snoRNA box C'/D is inserted at the loop 2 position of the sgRNA scaffold sequence, forming the following sequence: guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuuggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccaaguggcaccgagucggugc.
2. An engineered sgRNA molecule, characterized in that the engineered sgRNA molecule is prepared by the method of claim 1.
3. A DNA molecule encoding the engineered sgRNA molecule of claim 2.
4. An expression cassette, expression vector, recombinant or transgenic cell line comprising the DNA molecule of claim 3.
5. A kit comprising any one of the following:
i. a Cas9 protein, and the engineered sgRNA molecule of claim 2;
ii. an expression vector 1 comprising a nucleotide sequence encoding a Cas9 protein, and the engineered sgRNA molecule of claim 2;
iii. a Cas9 protein, and an expression vector 2 comprising a nucleotide sequence encoding the engineered sgRNA molecule of claim 2;
iv. an expression vector 1 comprising a nucleotide sequence encoding a Cas9 protein, and an expression vector 2 comprising a nucleotide sequence encoding the engineered sgRNA molecule of claim 2;
v. an expression vector 3 comprising a nucleotide sequence encoding a Cas9 protein and a nucleotide sequence encoding the engineered sgRNA molecule of claim 2.
6. A composition selected from any one of the following:
I. a composition comprising: a Cas9 protein, and the engineered sgRNA molecule of claim 2;
II. a composition comprising: a nucleic acid molecule 1 encoding a Cas9 protein, and the engineered sgRNA molecule of claim 2;
III. a composition comprising: a Cas9 protein, and a nucleic acid molecule 2 encoding the engineered sgRNA molecule of claim 2;
IV. a composition comprising: a nucleic acid molecule 1 encoding a Cas9 protein, and a nucleic acid molecule 2 encoding the engineered sgRNA molecule of claim 2;
V. a composition comprising: a nucleic acid molecule 3 encoding a Cas9 protein and the engineered sgRNA molecule of claim 2.
7. Use of the engineered sgRNA molecule of claim 2 or the DNA molecule of claim 3 or the expression cassette, expression vector, recombinant or transgenic cell line of claim 4 or the kit of claim 5 or the composition of claim 6 for modifying a genomic target nucleic acid.
8. A method of modifying a genomic target nucleic acid, comprising: introducing the composition of claim 6 into an organism or biological cell and allowing expression of both the Cas9 protein and the engineered sgRNA molecule, thereby modifying the genomic target nucleic acid.
9. A method of making a mutant of a biological cell, comprising: modifying the genome of the biological cell by the method of claim 8 to obtain a mutant biological cell.
10. A method of making a biological mutant, comprising: targeting or modifying the genome of an organism by the method of claim 8 to obtain the biological mutant.
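Aligning the three sequences recited in claim 1 suggests that each can be written as the same three scaffold segments joined by either an unmodified "gaaa" loop or the inserted segment. The following sketch reassembles modes (A1) to (A3) on that basis and derives a DNA template of the kind referred to in claim 3. The segment boundaries, the variable names, and the reading of "gaaa" as the unmodified loop are assumptions inferred from the claimed sequences, not statements from the patent.

```python
# Segments inferred by aligning the sequences of modes (A1)-(A3) in claim 1.
# All names are illustrative and do not appear in the patent text.
SCAFFOLD_5P = "guuuuagagcua"                           # before the loop 1 position
SCAFFOLD_MID = "uagcaaguuaaaauaaggcuaguccguuaucaacuu"  # between loop 1 and loop 2
SCAFFOLD_3P = "aaguggcaccgagucggugc"                   # after the loop 2 position
UNMODIFIED_LOOP = "gaaa"                               # assumption: loop left intact

# Inserted segment exactly as written in the claim, including the lowercase
# flanking bases on either side of the uppercase run.
INSERTED_SEGMENT = "ggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggcc"


def build_scaffold(modify_loop1: bool, modify_loop2: bool) -> str:
    """Assemble a scaffold with the segment placed at loop 1 and/or loop 2."""
    loop1 = INSERTED_SEGMENT if modify_loop1 else UNMODIFIED_LOOP
    loop2 = INSERTED_SEGMENT if modify_loop2 else UNMODIFIED_LOOP
    return SCAFFOLD_5P + loop1 + SCAFFOLD_MID + loop2 + SCAFFOLD_3P


def rna_to_dna(rna: str) -> str:
    """Write the coding-strand DNA for an RNA sequence (u -> t), keeping case."""
    return rna.replace("u", "t").replace("U", "T")


mode_a1 = build_scaffold(modify_loop1=True, modify_loop2=True)
mode_a2 = build_scaffold(modify_loop1=True, modify_loop2=False)
mode_a3 = build_scaffold(modify_loop1=False, modify_loop2=True)
mode_a1_dna = rna_to_dna(mode_a1)
```

A full sgRNA would additionally carry a guide sequence 5' of the scaffold; that part is target-specific and is omitted here.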

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111348694 2021-11-15
CN2021113486944 2021-11-15

Publications (2)

Publication Number Publication Date
CN114990104A (en) 2022-09-02
CN114990104B (en) 2023-10-20

Family

ID=83026997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210539746.4A Active CN114990104B (en) 2021-11-15 2022-05-18 Engineered sgRNA molecules and uses thereof

Country Status (1)

Country Link
CN (1) CN114990104B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020044039A1 (en) * 2018-08-29 2020-03-05 Oxford University Innovation Limited Modified sgrnas
CN110982818A (en) * 2019-12-20 2020-04-10 北京市农林科学院 Application of nuclear localization signal F4NLS in efficient creation of rice herbicide resistant material

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016176404A1 (en) * 2015-04-30 2016-11-03 The Brigham And Women's Hospital, Inc. Methods and kits for cloning-free genome editing
US11248216B2 (en) * 2016-04-25 2022-02-15 The Regents Of The University Of California Methods and compositions for genomic editing
CN115244176A (en) * 2019-08-19 2022-10-25 钟明宏 Conjugates of guide RNA-CAS protein complexes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Aarthi Narayanan et al. Role of the Box C/D Motif in Localization of Small Nucleolar RNAs to Coiled Bodies and Nucleoli. Molecular Biology of the Cell. 2017, Vol. 10 (No. 10), pp. 2131-2147. *

Also Published As

Publication number Publication date
CN114990104A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
AU2020289750B2 (en) Engineered meganucleases with recognition sequences found in the human T cell receptor alpha constant region gene
KR102191739B1 (en) Modified foot-and-mouth disease virus 3C protease, composition and method thereof
CN111108207A (en) Genome editing means for gene therapy of genetic disorders and gene therapy in combination with viral vectors
AU2021200863A1 (en) Genetically-modified cells comprising a modified human t cell receptor alpha constant region gene
KR20150125994A (en) A cell expression system
CN112375748B (en) Novel coronavirus chimeric recombinant vaccine based on vesicular stomatitis virus vector, and preparation method and application thereof
CN107674862B (en) CIK modified by similar chimeric antigen receptor and preparation method and application thereof
KR101657717B1 (en) Mammalian expression vector
US20030024009A1 (en) Manipulation of the phenolic acid content and digestibility of plant cell walls by targeted expression of genes encoding cell wall degrading enzymes
CN112941038B (en) Novel recombinant coronavirus based on vesicular stomatitis virus vector, and preparation method and application thereof
CN109943566A (en) The sgRNAs of selectively targeted YBX1 gene and its application
CN114934031B (en) Novel Cas effect protein, gene editing system and application
CN106957859A (en) It is a kind of to be used to save measles virus, the system and method for recombinant measles virus
CN112442515B (en) Application of gRNA target combination in construction of hemophilia model pig cell line
CN111315212B (en) Genome edited birds
CN114990104B (en) Engineered sgRNA molecules and uses thereof
CN114525304B (en) Gene editing method
CN112442513B (en) Cas9 overexpression vector and construction method and application thereof
CN112538497B (en) CRISPR/Cas9 system and application thereof in construction of alpha, beta and alpha & beta thalassemia model pig cell lines
CN115212297A (en) Genetically engineered medicine for treating inflammatory arthritis and preparation method thereof
KR20140043890A (en) Regulated gene expression systems and constructs thereof
CN112522292B (en) CRISPR/Cas9 system for constructing congenital amaranth clone pig nuclear donor cells and application thereof
CN112522310B (en) CRISPR system and application thereof in construction of LRP5 gene mutant osteoporosis clone pig nuclear donor cell
TW201209164A (en) Method for enhancing production of disease-resistant usage proteins using bioreactors
KR101989814B1 (en) Shuttle plasmid replicable in clostridium and escherichia coli, and recombinant microorganism having enhanced pentose metabolism and fermentation performance prepared using the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant