CN114990104A - Modified sgRNA molecules and uses thereof - Google Patents

Modified sgRNA molecules and uses thereof

Info

Publication number
CN114990104A
Authority
CN
China
Prior art keywords
sequence
molecule
box
sgrna
snorna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210539746.4A
Other languages
Chinese (zh)
Other versions
CN114990104B (en
Inventor
梁峻彬
梁兴祥
徐辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ruifeng Biotechnology Co ltd
Original Assignee
Guangzhou Ruifeng Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ruifeng Biotechnology Co ltd filed Critical Guangzhou Ruifeng Biotechnology Co ltd
Publication of CN114990104A publication Critical patent/CN114990104A/en
Application granted granted Critical
Publication of CN114990104B publication Critical patent/CN114990104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85: Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N5/00: Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/06: Animal cells or tissues; Human cells or tissues
    • C12N5/0602: Vertebrate cells
    • C12N5/0684: Cells of the urinary tract or kidneys
    • C12N5/0686: Kidney cells
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases RNAses, DNAses
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid
    • C12N2310/20: Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2310/30: Chemical structure
    • C12N2310/31: Chemical structure of the backbone
    • C12N2510/00: Genetically modified cells
    • C12N2800/00: Nucleic acids vectors
    • C12N2800/80: Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites
    • C12N2810/00: Vectors comprising a targeting moiety
    • C12N2810/10: Vectors comprising a non-peptidic targeting moiety

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Urology & Nephrology (AREA)
  • Medicinal Chemistry (AREA)
  • Cell Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The invention discloses a modified sgRNA molecule and applications thereof. The modified sgRNA molecule is obtained by introducing, into the nucleotide chain of the sgRNA, a sequence that performs a nuclear localization function in a snoRNA molecule. The invention provides a novel CRISPR-Cas system in which the modified sgRNA can guide the Cas protein into the nucleus for effective gene editing.

Description

Modified sgRNA molecule and application thereof
Technical Field
The invention relates to the technical field of biology, in particular to a modified sgRNA molecule and application thereof.
Background
The CRISPR/Cas system is an acquired immune system found in most bacteria and most archaea. It recognizes and eliminates foreign plasmids or phages and retains fragments of the foreign genes in the host genome as immunological memory. Naturally occurring CRISPR-Cas systems fall into two broad classes: class 1 systems use multi-protein complexes for nucleic acid cleavage, whereas class 2 systems perform cleavage with a single effector protein. Owing to the advantages of a single effector protein, class 2 systems are the most widely used CRISPR tools in biological research and translational applications. Class 2 is further subdivided into types II, V and VI, each using a different type of Cas protein. Among the Cas proteins of class 2 systems, certain type II Cas9 and type V Cas12 proteins have RNA-guided DNA endonuclease activity, while type VI Cas13 appears to show preferential RNA targeting and cleavage activity.
Among these, Cas9 and Cas12 effectors from class 2 CRISPR systems are RNA-guided endonucleases that can produce double-strand breaks (DSBs) in a target DNA sequence. The CRISPR/Cas system mainly comprises a Cas protein and a single guide RNA (sgRNA): the Cas protein cleaves double-stranded DNA, and the sgRNA plays a guiding role. Guided by the sgRNA, the Cas protein reaches different target sites through complementary base pairing and cleaves the target gene, enabling precise, site-specific gene editing.
At present, the CRISPR/Cas system is the most popular and most widely used gene editing system because it is simple to operate and edits nucleic acid sequences accurately. However, as a nucleic acid editing tool, Cas must enter the cell nucleus and bind chromosomal DNA in order to function. The current mainstream approach to nuclear entry is to fuse the Cas protein with a nuclear localization signal (also called a nuclear localization sequence, NLS). The NLS-bearing Cas forms a complex with the sgRNA, and the NLS then interacts with the nuclear import carrier so that the Cas protein is transported into the nucleus, where it can function.
snoRNAs (small nucleolar RNAs) are highly abundant small non-coding RNAs in the nucleus. The vast majority of snoRNAs can be classified into two types, box C/D snoRNAs and box H/ACA snoRNAs, both of which have conserved characteristic secondary structures. snoRNAs guide site-specific 2'-O-ribose methylation and pseudouridylation of nucleosides in rRNA, snRNA or tRNA precursors. Various reports indicate that the box C' and box D sequences in snoRNAs play an important role in the formation and interactions of snoRNA RNPs.
Disclosure of Invention
The invention aims to provide an improved sgRNA molecule and application thereof.
In a first aspect, the invention claims a method of engineering a sgRNA molecule.
The claimed method for modifying the sgRNA molecule may comprise the following step: introducing, into the nucleotide chain of the sgRNA, a sequence that performs a nuclear localization function in a snoRNA molecule, thereby obtaining the modified sgRNA molecule.
In some cases, the engineered sgRNA molecule comprises a guide sequence and a backbone sequence (also referred to below as the framework or scaffold sequence), the backbone sequence comprising a sequence that performs a nuclear localization function in the snoRNA molecule. The backbone sequence of the modified sgRNA molecule can be obtained by inserting that sequence into the backbone sequence of the unmodified sgRNA molecule, or by substituting it for part of that backbone sequence.
In some cases, the sequence that performs a nuclear localization function in the snoRNA molecule may be selected from (a1) and/or (a2) below:
(a1) box C' and/or box D sequences from snoRNA;
(a2) box H and/or box ACA sequences from snoRNA.
Further, the sequence that performs a nuclear localization function in the snoRNA molecule is a box C' and/or box D sequence.
In some cases, the box C' sequence comprises or is the sequence DGAHBN, where D is U, G or A; H is U, A or C; B is G, U or C; and N may be any ribonucleotide.
In some cases, the box D sequence comprises the sequence NYVWGA or CUGA. Further, the box D sequence comprises the sequence NYVWGA or GGCUGA. Still further, the box D sequence is the sequence NYVWGA, GGCUGA or CUGA. Here, N may be any ribonucleotide; Y is C or U; V is C, G or A; W is U or A.
In some cases, the sequences that perform the nuclear localization function in the snoRNA molecule are box C' and box D, wherein the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA.
In some cases, the sequence that performs a nuclear localization function in the snoRNA molecule comprises GAGGAAGAGCGUCAGCAGGCUGA.
In some cases, the sequence that performs a nuclear localization function in the snoRNA molecule is GAGGAAGAGCGUCAGCAGGCUGA.
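The degenerate motifs above (DGAHBN, NYVWGA) can be checked against a candidate RNA with a small pattern matcher. The following is a minimal illustrative sketch, not part of the disclosure: the function names `motif_to_regex` and `find_motif` are hypothetical, and the symbol table assumes the standard IUPAC nucleotide codes, which agree with the definitions of D, H, B, N, Y, V and W given in the text.

```python
import re

# Standard IUPAC degenerate codes, restricted to the symbols used in the text.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "D": "AGU",   # U, G or A
    "H": "ACU",   # U, A or C
    "B": "CGU",   # G, U or C
    "V": "ACG",   # C, G or A
    "Y": "CU",    # C or U
    "W": "AU",    # U or A
    "N": "ACGU",  # any ribonucleotide
}

def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a degenerate motif such as 'DGAHBN' into a compiled regex."""
    return re.compile("".join(f"[{IUPAC_RNA[symbol]}]" for symbol in motif))

def find_motif(motif: str, rna: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) motif match."""
    pattern = motif_to_regex(motif)
    rna = rna.upper()
    width = len(motif)
    return [start for start in range(len(rna) - width + 1)
            if pattern.fullmatch(rna, start, start + width)]

BOX_C_PRIME = "GAGGAAGA"
NLS_LIKE_SEQUENCE = "GAGGAAGAGCGUCAGCAGGCUGA"

print(find_motif("DGAHBN", BOX_C_PRIME))      # where box C' matches the consensus
print(find_motif("CUGA", NLS_LIKE_SEQUENCE))  # where the short box D form occurs
print(find_motif("NYVWGA", "GGCUGA"))         # empty: second base is G, not C/U
```

Under these standard code definitions, GGCUGA is not itself an instance of NYVWGA, which is consistent with the text listing it as a separate alternative for box D.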
In some cases, the snoRNA molecule is selected from any snoRNA comprising a box C' or box D sequence. In some cases, the snoRNA molecule is any one selected from: U3 snoRNA, U8 snoRNA, U14 snoRNA, U15 snoRNA, U16 snoRNA, U20 snoRNA, U21 snoRNA, and U24 to U63 snoRNAs.
In some cases, the site in the backbone sequence of the unmodified sgRNA molecule at which the sequence that performs a nuclear localization function in the snoRNA molecule is inserted or substituted is a non-complementary-pairing sequence in the secondary structure of that backbone sequence (i.e., a portion that does not form intramolecular base pairs). The secondary structure of the backbone sequence can be predicted by a person skilled in the art using conventional computational methods, or determined by conventional experimental methods. The complementary pairing may be conventional A-U and C-G base pairing, and may or may not include other, less common base pairings (e.g., G-U, A-A, A-C, A-G, G-G, U-U, U-C pairing).
Further, in some cases, the non-complementary-pairing sequence in the secondary structure of the scaffold sequence is a loop sequence in the secondary structure of the scaffold sequence of the unmodified sgRNA molecule (i.e., a loop in a stem-loop structure of the sgRNA scaffold sequence).
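Once a secondary structure is available, candidate loop positions can be located mechanically. The sketch below is illustrative only and assumes the structure is supplied in dot-bracket notation (a common output format of RNA folding tools, where '(' and ')' mark paired bases and '.' marks unpaired ones); the function name `hairpin_loops` and the example structure string are hypothetical and do not represent the actual sgRNA scaffold.

```python
def hairpin_loops(dot_bracket: str) -> list[tuple[int, int]]:
    """Return 0-based inclusive (start, end) spans of hairpin loops.

    A hairpin loop is a run of unpaired positions ('.') that is closed
    directly by a base pair: '(' immediately before it and ')' immediately
    after it. Unpaired runs between two stems are not reported.
    """
    loops = []
    length = len(dot_bracket)
    i = 0
    while i < length:
        if dot_bracket[i] != ".":
            i += 1
            continue
        j = i
        while j < length and dot_bracket[j] == ".":
            j += 1  # j ends one past the unpaired run
        closed_left = i > 0 and dot_bracket[i - 1] == "("
        closed_right = j < length and dot_bracket[j] == ")"
        if closed_left and closed_right:
            loops.append((i, j - 1))
        i = j
    return loops

# Hypothetical structure: two stem-loops separated by a 2-nt single-stranded linker.
example_structure = "((((....))))..(((...)))"
print(hairpin_loops(example_structure))
```

For the hypothetical structure shown, the two reported spans correspond to the unpaired runs enclosed by each stem, while the two unpaired positions between the stems are skipped.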
The engineered sgRNA molecules of the invention may comprise engineered or unmodified crRNA sequences, as well as engineered or unmodified tracrRNA sequences. For clarity, reference is made to the non-limiting exemplary diagram in Fig. 2 (the sequence comprising loop1 and loop2 in Fig. 2 is the sgRNA backbone sequence corresponding to SpCas9). The nuclear localization functional sequence may be linked (inserted or substituted) at the position in the sgRNA secondary structure where the crRNA sequence is joined to the tracrRNA sequence, such as the loop1 position in Fig. 2. Alternatively, it may be linked (inserted or substituted) within the crRNA sequence or within the tracrRNA sequence of the sgRNA, for example, without limitation, at a loop formed by the tracrRNA sequence alone in the secondary structure of the sgRNA, such as the loop2 position in Fig. 2. When the nuclear localization functional sequence is inserted, the number of nucleotides originally belonging to the unmodified sgRNA is not reduced; e.g., the number of nucleotides originally belonging to that loop is not reduced. When the nuclear localization functional sequence is substituted in, the number of nucleotides originally belonging to the unmodified sgRNA is reduced; e.g., the number of nucleotides originally belonging to that loop is reduced (for instance, all nucleotides originally belonging to the loop are replaced by the nuclear localization functional sequence).
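The difference between insertion and substitution described above can be illustrated with plain string operations. This is a minimal sketch under stated assumptions: the toy hairpin `GGGCGAAAGCCC` (stem GGGC/GCCC with a GAAA loop) is a hypothetical example and not the sgRNA scaffold of the disclosure, and the function names are invented for illustration. The payload is the 23-nt sequence given in the text.

```python
def insert_into_loop(rna: str, loop_start: int, loop_end: int,
                     payload: str, offset: int = 0) -> str:
    """Insert payload inside a loop; every original nucleotide is retained.

    loop_start and loop_end are 0-based inclusive; offset selects the
    insertion point relative to loop_start.
    """
    cut = loop_start + offset
    if not loop_start <= cut <= loop_end + 1:
        raise ValueError("insertion point must lie within the loop")
    return rna[:cut] + payload + rna[cut:]

def substitute_loop(rna: str, loop_start: int, loop_end: int, payload: str) -> str:
    """Replace the whole loop (0-based inclusive span) with payload."""
    return rna[:loop_start] + payload + rna[loop_end + 1:]

toy_hairpin = "GGGCGAAAGCCC"          # hypothetical: loop GAAA at indices 4..7
payload = "GAGGAAGAGCGUCAGCAGGCUGA"   # box C' ... box D sequence from the text

inserted = insert_into_loop(toy_hairpin, 4, 7, payload, offset=2)
substituted = substitute_loop(toy_hairpin, 4, 7, payload)

# Insertion keeps all 12 original nucleotides; substitution drops the 4 loop ones.
print(len(inserted) - len(payload))     # original nucleotides kept after insertion
print(len(substituted) - len(payload))  # original nucleotides kept after substitution
```

The two printed counts make the distinction concrete: insertion leaves the original nucleotide count unchanged, whereas substitution reduces it by the length of the replaced loop.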
In some cases, a linker sequence may or may not be used when the sequence that performs a nuclear localization function in the snoRNA molecule is introduced into the nucleotide chain of the sgRNA. The sequence may be linked to the framework sequence of the unmodified sgRNA molecule through a linker sequence. Furthermore, one end of the sequence is connected to the framework sequence of the modified sgRNA molecule through linker sequence 1 (Linker1), and the other end through linker sequence 2 (Linker2), where Linker1 and Linker2 may be the same or different. Still further, the nucleotide sequence of Linker1 is shown at positions 38-42 of SEQ ID No.1 (ggcca), and that of Linker2 at positions 66-75 of SEQ ID No.1 (cugcagggcc).
Further, when two or more sequences that perform a nuclear localization function in the snoRNA molecule are present in the nucleotide chain of the modified sgRNA, they may be connected to each other directly or through a linker sequence. When the sequence introduced into the nucleotide chain of the sgRNA consists of two or more box sequences, the different box sequences may be connected to each other directly or through a linker sequence (referred to as linker sequence 3, Linker3). This linker may be the same as or different from the linkers that join the nuclear localization functional sequence to the framework sequence of the unmodified sgRNA molecule; that is, Linker3 may be the same as or different from Linker1 and Linker2. Furthermore, the nucleotide sequence of Linker3 is shown at positions 51-59 of SEQ ID No.1 (GCGUCAGCA).
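The linker positions quoted above can be cross-checked against the segment lengths. The sketch below is illustrative and rests on two assumptions that are inferred rather than stated in the text: that the segments occur in the order Linker1, box C', Linker3, box D, Linker2, and that box D is present in its 6-nt form GGCUGA. The helper `assemble` is a hypothetical name, and the full SEQ ID No.1 is not reproduced here.

```python
LINKER1 = "ggcca"          # positions 38-42 of SEQ ID No.1 per the text
BOX_C_PRIME = "GAGGAAGA"
LINKER3 = "GCGUCAGCA"      # positions 51-59 of SEQ ID No.1 per the text
BOX_D = "GGCUGA"           # assumption: the 6-nt box D form is used
LINKER2 = "cugcagggcc"     # positions 66-75 of SEQ ID No.1 per the text

def assemble(parts, first_position):
    """Concatenate named segments.

    Returns the joined sequence and a dict of 1-based inclusive spans,
    counting from first_position.
    """
    sequence = ""
    spans = {}
    position = first_position
    for name, segment in parts:
        spans[name] = (position, position + len(segment) - 1)
        sequence += segment
        position += len(segment)
    return sequence, spans

# Assumed segment order, starting where Linker1 is said to begin (position 38).
parts = [
    ("Linker1", LINKER1),
    ("box C'", BOX_C_PRIME),
    ("Linker3", LINKER3),
    ("box D", BOX_D),
    ("Linker2", LINKER2),
]
cassette, spans = assemble(parts, first_position=38)

for name, (start, end) in spans.items():
    print(f"{name}: {start}-{end}")
```

Under the assumed order, the computed spans for Linker1, Linker3 and Linker2 coincide with the positions quoted in the text, and the two box sequences fall in the gaps between them.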
In a specific embodiment of the invention, the sequences that perform the nuclear localization function in the snoRNA molecule are box C' and box D of U3 snoRNA. Specifically, the box C' sequence of U3 snoRNA is GAGGAAGA, and the box D sequence is GGCUGA or CUGA. The sgRNA framework is the sgRNA framework sequence corresponding to SpCas9.
More specifically, the nucleotide sequence of the modified sgRNA molecule is any one of the following:
(b1) a sequence obtained by replacing positions 1-25 of SEQ ID No.1 with a guide sequence that recognizes the target nucleic acid,
wherein, in (b1), "box C' and box D of U3 snoRNA" are substituted at the loop 1 (loop1) position of the sgRNA backbone sequence corresponding to SpCas9 and are also substituted at the loop 2 (loop2) position;
(b2) a sequence obtained by replacing positions 1-25 of SEQ ID No.5 with a guide sequence that recognizes the target nucleic acid,
wherein, in (b2), "box C' and box D of U3 snoRNA" are substituted at the loop 1 (loop1) position of the sgRNA framework sequence corresponding to SpCas9, and the loop 2 (loop2) position is not modified;
(b3) a sequence obtained by replacing positions 1-25 of SEQ ID No.8 with a guide sequence that recognizes the target nucleic acid,
wherein, in (b3), "box C' and box D of U3 snoRNA" are substituted at the loop 2 (loop2) position of the sgRNA framework sequence corresponding to SpCas9, and the loop 1 (loop1) position is not modified.
For clarity, the lengths of the guide sequences described in (b1), (b2) and (b3) above may be varied moderately, and are not limited to a length of only 25 bp.
Wherein, positions 1-25 of SEQ ID No.1, positions 1-25 of SEQ ID No.5 and positions 1-25 of SEQ ID No.8 are guide sequences for identifying the target nucleic acid in the examples (GAACGGCUCGGAGAUCAUCAUUGCG).
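Retargeting as described in (b1) to (b3) amounts to swapping the 5' guide region of a template sgRNA for a new guide while leaving the remainder untouched. The following is a minimal sketch under stated assumptions: `replace_guide` is a hypothetical helper, `PLACEHOLDER_BACKBONE` is an arbitrary short stand-in (the actual SEQ ID No.1, 5 and 8 sequences are not reproduced here), and `NEW_GUIDE` is an invented 20-nt guide used only to show that the guide length may differ from 25.

```python
RNA_ALPHABET = set("ACGU")

def replace_guide(sgrna: str, new_guide: str, old_guide_length: int = 25) -> str:
    """Swap the 5' guide region (positions 1..old_guide_length) for new_guide."""
    new_guide = new_guide.upper()
    if not new_guide or set(new_guide) - RNA_ALPHABET:
        raise ValueError("guide must be a non-empty RNA sequence over A, C, G, U")
    if len(sgrna) <= old_guide_length:
        raise ValueError("sgRNA is too short to contain a guide plus a backbone")
    return new_guide + sgrna[old_guide_length:]

EXAMPLE_GUIDE = "GAACGGCUCGGAGAUCAUCAUUGCG"  # 25-nt guide quoted in the text
PLACEHOLDER_BACKBONE = "GUUUUAGAGCUA"        # hypothetical stand-in, not a SEQ ID
NEW_GUIDE = "ACGUACGUACGUACGUACGU"           # hypothetical 20-nt guide

template = EXAMPLE_GUIDE + PLACEHOLDER_BACKBONE
retargeted = replace_guide(template, NEW_GUIDE)
print(retargeted)
```

Only the first 25 positions of the template are discarded, so a guide of a different length changes the total length of the molecule without altering the backbone portion.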
In some cases, the sequence length of the guide sequence of the modified sgRNA molecule of the invention can be more than or equal to 10bp, more than or equal to 15bp, more than or equal to 20bp, more than or equal to 25bp, more than or equal to 30bp or more than or equal to 40bp, and can be less than or equal to 60bp, less than or equal to 50bp, less than or equal to 40bp, less than or equal to 30bp, less than or equal to 25bp, less than or equal to 20bp or less than or equal to 15 bp. Under certain conditions, the sequence length of the guide sequence of the modified sgRNA molecule can be 10bp-50bp, 10bp-40bp, 15bp-35bp, 15bp-30bp, 15bp-25bp, 17bp-24bp or 18bp-22bp, and can also be 20bp-35bp, 25bp-35bp or 28bp-32 bp.
In some cases, the sequence length of the backbone sequence of the modified sgRNA molecules of the invention (comprising the box sequence from snoRNA) may be ≥ 15 bp, ≥ 20 bp, ≥ 25 bp, ≥ 30 bp, ≥ 40 bp, ≥ 50 bp, ≥ 60 bp, ≥ 70 bp, ≥ 80 bp, ≥ 90 bp, ≥ 100 bp, ≥ 110 bp, ≥ 120 bp, ≥ 130 bp, ≥ 140 bp, ≥ 150 bp, ≥ 160 bp, ≥ 170 bp, ≥ 180 bp, ≥ 200 bp, ≥ 210 bp, ≥ 250 bp or ≥ 300 bp, and may likewise be subject to an upper limit. In some cases, the sequence length of the backbone sequence (comprising box sequences from snoRNAs) of the engineered sgRNA molecules described herein can be 10 bp-300 bp, 20 bp-250 bp, 30 bp-240 bp, 50 bp-220 bp, 80 bp-200 bp, or 100 bp-180 bp.
In some cases, the modified sgRNA molecules of the invention may have a sequence length of ≥ 15 bp, ≥ 20 bp, ≥ 25 bp, ≥ 30 bp, ≥ 40 bp, ≥ 50 bp, ≥ 60 bp, ≥ 70 bp, ≥ 80 bp, ≥ 90 bp, ≥ 100 bp, ≥ 110 bp, ≥ 120 bp, ≥ 130 bp, ≥ 140 bp, ≥ 150 bp, ≥ 160 bp, ≥ 170 bp, ≥ 180 bp, ≥ 190 bp, ≥ 200 bp, ≥ 210 bp, ≥ 220 bp, ≥ 250 bp, ≥ 300 bp or ≥ 350 bp, or of ≤ 350 bp, ≤ 300 bp, ≤ 250 bp, ≤ 220 bp, ≤ 210 bp, ≤ 200 bp, ≤ 190 bp, ≤ 180 bp, ≤ 170 bp, ≤ 150 bp, ≤ 130 bp, ≤ 100 bp or ≤ 60 bp. Under certain conditions, the sequence length of the modified sgRNA molecule can be 10 bp-300 bp, 30 bp-250 bp, 50 bp-240 bp, 70 bp-240 bp, 90 bp-220 bp or 110 bp-200 bp.
In some cases, the engineered sgRNA molecule can be used in conjunction with a Cas protein for gene targeting or modification, e.g., for eukaryotic cell gene targeting or modification, further for animal cell gene targeting or modification, and yet further for human cell gene targeting or modification. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence. In some cases, the Cas protein contains a nuclear localization sequence.
The engineered sgRNA molecule can direct the Cas protein to the nucleus, and further, can direct the Cas protein to a target nucleic acid within the nucleus. In some cases, the engineered sgRNA can direct Cas protein to the nucleus and target or modify the target nucleic acid. In some cases, the engineered sgRNA can form a complex with the Cas protein, direct the Cas protein to the nucleus, and target or modify the target nucleic acid. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, targeting the target nucleic acid consists of one or more of the following: cleaving one or more target nucleic acids, visualizing or detecting one or more target nucleic acids, labeling one or more target nucleic acids, transporting one or more target nucleic acids, masking one or more target nucleic acids, binding one or more target nucleic acids, increasing the level of transcription and/or translation of the gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of the gene to which the target sequence belongs. Further, in some cases, targeting the target nucleic acid is binding to the target nucleic acid; in some cases, targeting the target nucleic acid is cleaving the target nucleic acid.
In some cases, modifying the target nucleic acid consists of one or more of the following: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of nucleic acids, demethylation of nucleic acids, and deamination of nucleic acids.
In some cases, the sgRNA includes at least one chemically modified nucleotide, non-limiting examples of which include 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), 2'-fluoro (2'-F), and phosphorothioate (P=S) bond modifications between nucleotides. The chemical modification can be located on any number of nucleotides at any position. In some cases, the sgRNA comprises a modification at the 5' end and/or the 3' end.
In some cases, the engineered sgRNA molecule can additionally add any number of nucleotides for modification, non-limiting examples such as 2 additional guanine nucleotides at the end of the sgRNA guide sequence in patent publication No. CN 104968784B.
In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13. In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 without Cas endonuclease activity, including but not limited to a completely inactivated Cas protein (dead Cas protein). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to Cas nickases with only a single-strand cleavage function, e.g., Cas9 nickase (nCas9) and Cas12 nickase. In some cases, the Cas9 is SpCas9 (Streptococcus pyogenes Cas9).
It is understood that a corresponding method for modifying sgRNA molecules is within the scope of the present invention, as long as the sgRNA molecules modified to include sequences that serve the nuclear localization function in the snoRNA molecule are capable of directing any one specific Cas protein (without the nuclear localization sequence) to the nucleus.
In a second aspect, the invention claims an engineered sgRNA molecule.
In some cases, the engineered sgRNA molecule is prepared by the method described above in the first aspect.
In some cases, the nucleotide chain of the engineered sgRNA molecule comprises a sequence that performs a nuclear localization function in the snoRNA molecule. Further, in some cases, the engineered sgRNA molecule comprises a guide sequence and a backbone sequence, the backbone sequence comprising a sequence that performs a nuclear localization function in the snoRNA molecule. The backbone sequence of the modified sgRNA molecule can be obtained by inserting that sequence into the backbone sequence of the unmodified sgRNA molecule, or by substituting it for part of that backbone sequence.
In some cases, the sequence that performs a nuclear localization function in the snoRNA molecule may be selected from (a1) and/or (a2) below:
(a1) box C' and/or box D sequences from snoRNA;
(a2) box H and/or box ACA sequences from snoRNA.
Further, the sequence that performs a nuclear localization function in the snoRNA molecule is a box C' sequence and/or a box D sequence.
In some cases, the box C' sequence comprises or is the sequence DGAHBN, where D is U, G or A; H is U, A or C; B is G, U or C; and N may be any ribonucleotide.
In some cases, the box D sequence comprises the sequence NYVWGA or CUGA. Further, the box D sequence comprises the sequence NYVWGA or GGCUGA. Still further, the box D sequence is the sequence NYVWGA, GGCUGA or CUGA. Here, N may be any ribonucleotide; Y is C or U; V is C, G or A; W is U or A.
In some cases, the sequences that perform the nuclear localization function in the snoRNA molecule are box C' and box D, wherein the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA.
In some cases, the sequence that functions as a nuclear localization in the snoRNA molecule comprises GAGGAAGAGCGUCAGCAGGCUGA.
In some cases, the sequence that functions as a nuclear localization in the snoRNA molecule is GAGGAAGAGCGUCAGCAGGCUGA.
In some cases, the snoRNA molecule is selected from any snoRNA comprising a box C' or box D sequence. In some cases, the snoRNA molecule is any one selected from: U3 snoRNA, U8 snoRNA, U14 snoRNA, U15 snoRNA, U16 snoRNA, U20 snoRNA, U21 snoRNA, and U24 to U63 snoRNAs.
In some cases, the site in the backbone sequence of the unmodified sgRNA molecule at which the sequence that performs a nuclear localization function in the snoRNA molecule is inserted or substituted is a non-complementary-pairing sequence in the secondary structure of that backbone sequence (i.e., a portion that does not form intramolecular base pairs). The secondary structure of the backbone sequence can be predicted by a person skilled in the art using conventional computational methods, or determined by conventional experimental methods. The complementary pairing may be conventional A-U and C-G base pairing, and may or may not include other, less common base pairings (e.g., G-U, A-A, A-C, A-G, G-G, U-U, U-C pairing).
Further, in some cases, the non-complementary-pairing sequence in the secondary structure of the scaffold sequence is a loop sequence in the secondary structure of the scaffold sequence of the unmodified sgRNA molecule (i.e., a loop in a stem-loop structure of the sgRNA scaffold sequence).
The engineered sgRNA molecules of the invention may comprise engineered or unmodified crRNA sequences, as well as engineered or unmodified tracrRNA sequences. For clarity, reference is made to the non-limiting exemplary diagram shown in fig. 2 (the sequence comprising loop1 and loop2 in fig. 2 is the sgRNA backbone sequence corresponding to SpCas 9). The nuclear localization functional sequence may be linked (inserted or substituted) to a position in the sgRNA secondary structure where the crRNA sequence is chimeric (i.e., linked) to the tracrRNA sequence, such as the loop1 position in fig. 2; alternatively, the nuclear localization functional sequence may be attached (i.e., inserted or substituted) to the inside of the crRNA sequence of the sgRNA or to the inside of the tracrRNA sequence, non-limiting examples such as optionally to a loop (loop) formed by only the tracrRNA sequence in the secondary structure of the sgRNA, as in loop2 position in fig. 2. Upon insertion of the nuclear localization functional sequence, the number of nucleotides originally belonging to the sgRNA before the alteration is not reduced, e.g., the number of nucleotides originally belonging to that part of the loop is not reduced. After replacement into the nuclear localization functional sequence, the number of nucleotides originally belonging to the sgRNA prior to alteration is reduced, e.g., the number of nucleotides originally belonging to that portion of the loop is reduced (e.g., all nucleotides originally belonging to the loop are replaced by the nuclear localization functional sequence).
In some cases, a linker sequence (linker) may or may not be used when introducing a sequence that functions in nuclear localization in the snoRNA molecule into the nucleotide chain of the sgRNA. The sequence that functions in nuclear localization in the snoRNA molecule can be linked to the framework sequence of the pre-engineered sgRNA molecule by a linker sequence. Further, one end of the sequence that functions in nuclear localization in the snoRNA molecule is linked to the framework sequence of the pre-engineered sgRNA molecule through linker sequence 1 (Linker1), and the other end is linked to the framework sequence of the pre-engineered sgRNA molecule through linker sequence 2 (Linker2), wherein Linker1 and Linker2 can be the same or different. Still further, the nucleotide sequence of Linker1 is shown at positions 38 to 42 of SEQ ID No.1 (ggcca), and the nucleotide sequence of Linker2 is shown at positions 66 to 75 of SEQ ID No.1 (cugcagggcc).
Further, when 2 or more sequences that function in nuclear localization in the snoRNA molecule are present in the nucleotide chain of the engineered sgRNA, the nuclear localization functional sequences can be linked to each other directly or through a linker sequence. When the sequence that functions in nuclear localization in the snoRNA molecule introduced into the nucleotide chain of the sgRNA comprises two or more box sequences, the different box sequences may be linked to each other directly or through a linker sequence (which may be referred to as linker sequence 3 [Linker3]). This linker sequence may be the same as or different from the linker sequence that links the nuclear localization functional sequence to the framework sequence of the pre-engineered sgRNA molecule; that is, Linker3 may be the same as or different from Linker1 and Linker2. Further, the nucleotide sequence of Linker3 is shown at positions 51 to 59 of SEQ ID No.1 (GCGUCAGCA).
In a particular embodiment of the invention, the sequences that function in nuclear localization in the snoRNA molecule are box C' and box D of the U3 snoRNA. Specifically, the box C' sequence in the U3 snoRNA is GAGGAAGA, and the box D sequence is GGCUGA/CUGA. The framework of the sgRNA is the sgRNA framework sequence corresponding to SpCas9.
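The positions quoted above for Linker1 (38 to 42), Linker3 (51 to 59) and Linker2 (66 to 75) of SEQ ID No.1 leave gaps of 8 and 6 nucleotides, which match the lengths of box C' (GAGGAAGA) and the 6-nt form of box D (GGCUGA). The sketch below lays the five elements out in that order and checks the coordinates. The order Linker1, box C', Linker3, box D, Linker2 and the use of the 6-nt box D are inferences from the stated coordinates, not a reproduction of SEQ ID No.1, and the helper name is hypothetical.

```python
# Element sequences as quoted in the text (linkers upper-cased for consistency).
parts = [
    ("Linker1", "GGCCA"),       # stated: positions 38-42 of SEQ ID No.1
    ("box C'", "GAGGAAGA"),     # assumed to sit between Linker1 and Linker3
    ("Linker3", "GCGUCAGCA"),   # stated: positions 51-59 of SEQ ID No.1
    ("box D", "GGCUGA"),        # assumption: the 6-nt form of box D
    ("Linker2", "CUGCAGGGCC"),  # stated: positions 66-75 of SEQ ID No.1
]


def layout(elements, first_position):
    """Assign consecutive 1-based inclusive spans to the elements in order."""
    spans = {}
    pos = first_position
    for name, seq in elements:
        spans[name] = (pos, pos + len(seq) - 1)
        pos += len(seq)
    return spans


spans = layout(parts, first_position=38)
cassette = "".join(seq for _, seq in parts)

for name, span in spans.items():
    print(name, span)
print(cassette, len(cassette))
```

Under these assumptions the assembled element is 38 nt long and spans positions 38 to 75, which is consistent with the three linker coordinates given in the text.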
More specifically, the nucleotide sequence of the modified sgRNA molecule is any one of the following:
(b1) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.1 with a guide sequence for recognizing a target nucleic acid,
wherein (b1) carries the substitution "box C' and box D of U3 snoRNA" at the loop 1 (loop1) position of the sgRNA backbone sequence corresponding to SpCas9, and also carries the substitution "box C' and box D of U3 snoRNA" at the loop 2 (loop2) position;
(b2) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.5 with a guide sequence for recognizing the target nucleic acid,
wherein (b2) carries the substitution "box C' and box D of U3 snoRNA" at the loop 1 (loop1) position of the sgRNA framework sequence corresponding to SpCas9, and the loop 2 (loop2) position is not modified;
(b3) a sequence obtained by replacing positions 1 to 25 of SEQ ID No.8 with a guide sequence for recognizing the target nucleic acid,
wherein (b3) carries the substitution "box C' and box D of U3 snoRNA" at the loop 2 (loop2) position of the sgRNA framework sequence corresponding to SpCas9, and the loop 1 (loop1) position is not modified.
For clarity, the length of the guide sequence described in (b1), (b2) and (b3) above may be varied moderately and is not limited to 25 bp.
Positions 1 to 25 of SEQ ID No.1, positions 1 to 25 of SEQ ID No.5 and positions 1 to 25 of SEQ ID No.8 are the guide sequence used in the examples for recognizing a target nucleic acid (GAACGGCUCGGAGAUCAUCAUUGCG).
In some cases, the sequence length of the guide sequence of the modified sgRNA molecule of the invention can be ≥ 10 bp, ≥ 15 bp, ≥ 20 bp, ≥ 25 bp, ≥ 30 bp or ≥ 40 bp, and can be ≤ 60 bp, ≤ 50 bp, ≤ 40 bp, ≤ 30 bp, ≤ 25 bp, ≤ 20 bp or ≤ 15 bp. Under certain conditions, the sequence length of the guide sequence of the modified sgRNA molecule can be 10 bp-50 bp, 10 bp-40 bp, 15 bp-35 bp, 15 bp-30 bp, 15 bp-25 bp, 17 bp-24 bp or 18 bp-22 bp, and can also be 20 bp-35 bp, 25 bp-35 bp or 28 bp-32 bp.
In some cases, the sequence length of the backbone sequence of the modified sgRNA molecules of the invention (comprising the box sequence from the snoRNA) may be ≥ 15 bp, ≥ 20 bp, ≥ 25 bp, ≥ 30 bp, ≥ 40 bp, ≥ 50 bp, ≥ 60 bp, ≥ 70 bp, ≥ 80 bp, ≥ 90 bp, ≥ 100 bp, ≥ 110 bp, ≥ 120 bp, ≥ 130 bp, ≥ 140 bp, ≥ 150 bp, ≥ 160 bp, ≥ 170 bp, ≥ 180 bp, ≥ 200 bp, ≥ 210 bp, ≥ 250 bp or ≥ 300 bp, and may also be subject to an upper limit (e.g., ≤ 300 bp). In some cases, the sequence length of the backbone sequence (comprising box sequences from snoRNAs) of the engineered sgRNA molecules described herein can be 10 bp-300 bp, 20 bp-250 bp, 30 bp-240 bp, 50 bp-220 bp, 80 bp-200 bp, or 100 bp-180 bp.
In some cases, the modified sgRNA molecules of the invention may have a sequence length of ≥ 15 bp, ≥ 20 bp, ≥ 25 bp, ≥ 30 bp, ≥ 40 bp, ≥ 50 bp, ≥ 60 bp, ≥ 70 bp, ≥ 80 bp, ≥ 90 bp, ≥ 100 bp, ≥ 110 bp, ≥ 120 bp, ≥ 130 bp, ≥ 140 bp, ≥ 150 bp, ≥ 160 bp, ≥ 170 bp, ≥ 180 bp, ≥ 190 bp, ≥ 200 bp, ≥ 210 bp, ≥ 220 bp, ≥ 250 bp, ≥ 300 bp or ≥ 350 bp, and may be ≤ 350 bp, ≤ 300 bp, ≤ 250 bp, ≤ 220 bp, ≤ 210 bp, ≤ 200 bp, ≤ 190 bp, ≤ 180 bp, ≤ 170 bp, ≤ 150 bp, ≤ 130 bp, ≤ 100 bp or ≤ 60 bp. Under certain conditions, the sequence length of the modified sgRNA molecule can be 10 bp-300 bp, 30 bp-250 bp, 50 bp-240 bp, 70 bp-240 bp, 90 bp-220 bp or 110 bp-200 bp.
In some cases, the engineered sgRNA molecule can be used in conjunction with a Cas protein for gene targeting or modification, e.g., for eukaryotic cell gene targeting or modification, further for animal cell gene targeting or modification, and yet further for human cell gene targeting or modification. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence. In some cases, the Cas protein contains a nuclear localization sequence.
The engineered sgRNA molecule can direct the Cas protein to the nucleus, and further, can direct the Cas protein to a target nucleic acid within the nucleus. In some cases, the engineered sgRNA can direct Cas protein to the nucleus and target or modify the target nucleic acid. In some cases, the engineered sgRNA can form a complex with the Cas protein, direct the Cas protein to the nucleus, and target or modify the target nucleic acid. The Cas protein may or may not contain a nuclear localization sequence. In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the targeting the target nucleic acid consists of one or more of: cleaving one or more target nucleic acids, visualizing or detecting one or more target nucleic acids, labeling one or more target nucleic acids, transporting one or more target nucleic acids, masking one or more target nucleic acids, binding one or more target nucleic acids, increasing the level of transcription and/or translation of a gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of a gene to which the target sequence belongs. Further, in some cases, the targeting the target nucleic acid is binding to the target nucleic acid; in some cases, the targeting the target nucleic acid is cleaving the target nucleic acid.
In some cases, the modifying the target nucleic acid consists of one or more of: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of nucleic acids, demethylation of nucleic acids, and deamination of nucleic acids.
In some cases, the sgRNA includes at least one chemically modified nucleotide, non-limiting examples of which include 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), 2'-fluoro (2'-F), and phosphorothioate (P=S) bond modifications between nucleotides. The chemical modification can be located on any number of nucleotides at any position. In some cases, the sgRNA comprises a modification at the 5' end and/or the 3' end.
In some cases, any number of additional nucleotides can be added to the engineered sgRNA molecule as a modification, a non-limiting example being the 2 additional guanine nucleotides at the end of the sgRNA guide sequence in patent publication No. CN104968784B.
In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13. In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 without Cas endonuclease activity, including but not limited to a completely inactivated Cas protein (dead Cas protein). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to Cas nickases having only single-strand cleavage function, e.g., Cas9 nickase (nCas9) and Cas12 nickase. In some cases, the Cas9 is SpCas9 (Streptococcus pyogenes Cas9).
It is understood that any such engineered sgRNA molecule (comprising a sequence that functions in nuclear localization in a snoRNA molecule) is within the scope of the present invention, provided that it can direct any one particular Cas protein (not comprising a nuclear localization sequence) to the nucleus of the cell.
In a third aspect, the invention claims a DNA molecule encoding the engineered sgRNA molecule of the second aspect.
In a particular embodiment of the invention, the DNA molecule is any one of:
(c1) a DNA molecule obtained by replacing positions 1 to 25 of SEQ ID No.2 with the DNA sequence corresponding to the guide sequence;
(c2) a DNA molecule obtained by replacing positions 1 to 25 of SEQ ID No.6 with the DNA sequence corresponding to the guide sequence;
(c3) a DNA molecule obtained by replacing positions 1 to 25 of SEQ ID No.9 with the DNA sequence corresponding to the guide sequence;
wherein (c1) to (c3) correspond to (b1) to (b3) above, in that order.
Positions 1 to 25 of SEQ ID No.2, positions 1 to 25 of SEQ ID No.6 and positions 1 to 25 of SEQ ID No.9 are the DNA sequence corresponding to the guide sequence used in the examples for recognizing the target nucleic acid (GAACGGCTCGGAGATCATCATTGCG).
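The relationship between the RNA guide sequence and its corresponding DNA sequence, and the replacement of positions 1 to 25, can be sketched as follows. The two guide sequences are those quoted in the text; the helper names, the placeholder scaffold of N characters and the 20-nt replacement guide are hypothetical and stand in for sequences that are not reproduced here.

```python
EXAMPLE_GUIDE_RNA = "GAACGGCUCGGAGAUCAUCAUUGCG"  # positions 1-25, RNA form (from the text)
EXAMPLE_GUIDE_DNA = "GAACGGCTCGGAGATCATCATTGCG"  # corresponding DNA form (from the text)


def rna_to_dna(rna):
    """Return the DNA sequence corresponding to an RNA sequence (U -> T)."""
    return rna.upper().replace("U", "T")


def replace_positions(seq, start, end, new):
    """Replace 1-based inclusive positions start..end of seq with new."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions out of range")
    return seq[:start - 1] + new + seq[end:]


# Placeholder template: the example guide followed by a stand-in scaffold.
PLACEHOLDER_SCAFFOLD = "N" * 10
template = rna_to_dna(EXAMPLE_GUIDE_RNA) + PLACEHOLDER_SCAFFOLD

# Hypothetical 20-nt guide; the guide length need not stay at 25.
new_guide_rna = "ACGUACGUACGUACGUACGU"
retargeted = replace_positions(template, 1, 25, rna_to_dna(new_guide_rna))

print(rna_to_dna(EXAMPLE_GUIDE_RNA) == EXAMPLE_GUIDE_DNA)
print(retargeted)
```

The same replacement helper would apply unchanged to a guide located at other coordinates of a longer construct, for example positions 250 to 274 of an expression cassette.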
In a fourth aspect, the invention claims an expression cassette, an expression vector, a recombinant bacterium or a transgenic cell line comprising a DNA molecule as described in the third aspect above.
The expression vector may comprise any regulatory element operably linked to the DNA molecule. In some cases, the regulatory element is a promoter and/or enhancer. In some cases, the regulatory element is a promoter.
In a particular embodiment of the invention, the promoter in the expression cassette that initiates transcription of the DNA molecule is the U6 promoter.
More specifically, the expression cassette is any one of:
(d1) an expression cassette obtained by replacing positions 250 to 274 of SEQ ID No.3 with the DNA sequence corresponding to the guide sequence;
(d2) an expression cassette obtained by replacing positions 250 to 274 of SEQ ID No.7 with the DNA sequence corresponding to the guide sequence;
(d3) an expression cassette obtained by replacing positions 250 to 274 of SEQ ID No.10 with the DNA sequence corresponding to the guide sequence;
wherein (d1) to (d3) correspond to (c1) to (c3) above, in that order.
Positions 250 to 274 of SEQ ID No.3, positions 250 to 274 of SEQ ID No.7 and positions 250 to 274 of SEQ ID No.10 are the DNA sequence corresponding to the guide sequence used in the examples for recognizing the target nucleic acid (GAACGGCTCGGAGATCATCATTGCG).
Accordingly, the expression vector may be an expression vector comprising an expression cassette as described hereinbefore.
In a specific embodiment of the present invention, the expression vector is a recombinant vector obtained by replacing a small fragment between the cleavage sites Kpn I and Not I of the pX601 vector with the expression cassette described above.
In a fifth aspect, the invention claims a kit.
The kit claimed in the present invention may comprise any one of the following:
i. A Cas protein, and an engineered sgRNA molecule as described in the second aspect above.
ii. An expression vector comprising a nucleotide sequence encoding a Cas protein (denoted as expression vector 1), and an engineered sgRNA molecule as described in the second aspect above.
iii. A Cas protein, and an expression vector comprising a nucleotide sequence encoding the engineered sgRNA molecule described in the second aspect above (denoted as expression vector 2).
iv. An expression vector comprising a nucleotide sequence encoding a Cas protein (i.e., expression vector 1), and an expression vector comprising a nucleotide sequence encoding an engineered sgRNA molecule as described in the second aspect above (i.e., expression vector 2).
v. An expression vector comprising a nucleotide sequence encoding a Cas protein and a nucleotide sequence encoding the engineered sgRNA molecule described in the second aspect above (denoted as expression vector 3).
In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the Cas protein contains a nuclear localization sequence.
In a sixth aspect, the invention claims a composition selected from any one of:
I. A composition comprising: a Cas protein, and an engineered sgRNA molecule as described in the second aspect above;
II. A composition comprising: a nucleic acid molecule 1 encoding a Cas protein (such as expression vector 1 described in the fifth aspect above), and an engineered sgRNA molecule as described in the second aspect above;
III. A composition comprising: a Cas protein, and a nucleic acid molecule 2 encoding an engineered sgRNA molecule as described in the second aspect above (such as expression vector 2 described in the fifth aspect above);
IV. A composition comprising: a nucleic acid molecule 1 encoding a Cas protein (such as expression vector 1 described in the fifth aspect above), and a nucleic acid molecule 2 encoding an engineered sgRNA molecule as described in the second aspect above (such as expression vector 2 described in the fifth aspect above);
V. A composition comprising: a nucleic acid molecule 3 encoding a Cas protein and the engineered sgRNA molecule described in the second aspect above (such as expression vector 3 described in the fifth aspect above).
In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the Cas protein contains a nuclear localization sequence.
In a seventh aspect, the invention claims an RNP complex formed by a Cas protein and an engineered sgRNA molecule as described in the second aspect above.
In some cases, the Cas protein does not contain a nuclear localization sequence.
In some cases, the Cas protein contains a nuclear localization sequence.
In the above fifth to seventh aspects, the Cas protein may be selected from Cas9, Cas12, and Cas13.
In some cases, the Cas protein is selected from Cas9 and Cas12. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 having Cas endonuclease activity. In some cases, the Cas protein is selected from Cas9, Cas12, and Cas13 without Cas endonuclease activity, including but not limited to a completely inactivated Cas protein (dead Cas protein). In some cases, the Cas protein is selected from partially inactivated Cas9 and Cas12, including but not limited to Cas nickases having only single-strand cleavage function, e.g., Cas9 nickase (nCas9) and Cas12 nickase.
In a specific embodiment of the invention, the Cas protein is Streptococcus pyogenes Cas9 (SpCas9).
In item v of the above fifth aspect, the expression vector 3 can be a recombinant vector obtained by inserting the coding gene of a Cas protein (selected from Cas9, Cas12 or Cas13; specifically SpCas9) that does not contain a nuclear localization signal (NLS) into the expression vector of the above fourth aspect.
In a specific embodiment of the present invention, the expression vector 3 is any one of:
(e1) a vector whose complete sequence is obtained by replacing positions 5365 to 5389 of SEQ ID No.4 with the DNA sequence corresponding to the guide sequence;
(e2) a vector whose complete sequence is obtained by replacing positions 5365 to 5389 of SEQ ID No.11 with the DNA sequence corresponding to the guide sequence;
(e3) a vector whose complete sequence is obtained by replacing positions 5365 to 5389 of SEQ ID No.12 with the DNA sequence corresponding to the guide sequence;
wherein (e1) to (e3) correspond to (d1) to (d3) above, in that order.
Positions 5365 to 5389 of SEQ ID No.4, positions 5365 to 5389 of SEQ ID No.11 and positions 5365 to 5389 of SEQ ID No.12 are the DNA sequence corresponding to the guide sequence used in the examples for recognizing the target nucleic acid (GAACGGCTCGGAGATCATCATTGCG).
In an eighth aspect, the invention claims the use of an engineered sgRNA molecule as described in the second aspect above, a DNA molecule as described in the third aspect above, an expression cassette, an expression vector, a recombinant bacterium or a transgenic cell line as described in the fourth aspect above, a kit as described in the fifth aspect above, a composition as described in the sixth aspect above or an RNP complex as described in the seventh aspect above in any one of:
P1. targeting or modifying a genomic target nucleic acid; and
P2. preparing a product for targeting or modifying a genomic target nucleic acid.
The targeting or modification of the genomic target nucleic acid may be gene targeting or modification in eukaryotic cells, further in animal cells, and still further in human cells.
In some cases, the product for targeting or modifying a genomic target nucleic acid is a medicament for treating a disease in an animal, including but not limited to a human subject.
In a ninth aspect, the invention claims a method of targeting or modifying a genomic target nucleic acid.
The method of targeting or modifying a genomic target nucleic acid claimed in the invention can comprise: introducing the composition described in the sixth aspect into an organism or a cell of an organism, so that both the Cas protein and the engineered sgRNA molecule are expressed, thereby targeting or modifying the genomic target nucleic acid.
In a tenth aspect, the invention claims a method for preparing a mutant of a biological cell.
The claimed method for preparing a mutant of a biological cell may comprise: the genome of the biological cell is targeted or modified according to the method of the ninth aspect to obtain a mutant of the biological cell.
Wherein, the biological cell can be eukaryotic cell, further can be animal cell, and further can be human cell.
In a specific embodiment of the invention, the biological cell is a 293T cell.
In an eleventh aspect, the invention claims a method of making a biological mutant.
The claimed method for preparing a biological mutant may comprise: targeting or modifying the genome of the organism according to the method described in the ninth aspect above to obtain the biological mutant.
Wherein the organism may be a eukaryote, further an animal, further a mammal, such as a human.
The invention has the beneficial effects that:
1. In the prior art, the Cas protein is usually linked to a nuclear localization sequence in practical applications to help it localize to the nucleus. The invention provides a different, novel CRISPR-Cas system that constitutes a new technical solution and can effectively accomplish gene editing. Without being bound by theory, one skilled in the art can reasonably speculate that the sgRNA of the invention forms a complex with a Cas protein, and that the complex subsequently enters the nucleus through interaction of the snoRNA-derived nuclear localization functional sequence with the associated proteins. In theory, it can be speculated that when the Cas protein is not linked to a nuclear localization sequence (NLS), it is transported into the nucleus through the nuclear localization effect of the sgRNA carrying the snoRNA nuclear localization functional sequence, so that nuclear entry of the Cas protein (e.g., Cas9) may be reduced and off-target effects may in turn be reduced.
2. Introducing a C'/D box sequence into a loop portion of the sgRNA backbone can effectively guide Cas proteins into the nucleus for gene editing.
3. Editing activity is relatively low when a total of 2 or more C'/D box sequences are introduced across multiple loop portions of the sgRNA backbone, and higher when only 1 C'/D box sequence is introduced (i.e., only 1 box C' and 1 box D sequence). This is the opposite of gene editing that relies on an NLS-linked Cas protein entering the nucleus (in practical applications, the more NLS copies present, the higher the editing efficiency). The corresponding technical solution therefore achieves an unexpected technical effect.
4. For sgRNAs carrying only one C'/D box, editing efficiency was higher when the C'/D box was attached at the loop2 position shown in FIG. 2 (distal to the guide sequence) than when it was attached at the loop1 position (proximal to the guide sequence).
Drawings
FIG. 1 is an exemplary box C'/D sequence of U3 snoRNA molecules. Derived from: Narayanan A, Speckmann W, Terns R, Terns MP. Role of the box C/D motif in localization of small nucleolar RNAs to coiled bodies and nucleoli. Mol Biol Cell. 1999 Jul;10(7):2131-47. doi: 10.1091/mbc.10.7.2131. PMID: 10397754; PMCID: PMC25425.
FIG. 2 shows the molecular structure of an sgRNA corresponding to SpCas9 and containing a specific framework sequence, in which the crRNA sequence and tracrRNA sequence of the sgRNA before modification are indicated. It also shows 2 of the numerous sites (loop1 and loop2) at which the snoRNA nuclear localization functional sequence can be inserted or substituted. Loop 1 is the junction site of the crRNA and tracrRNA, immediately adjacent to stem 1 formed by base-complementary pairing, where a nuclear localization functional sequence may be attached. Loop 2 is located within the tracrRNA sequence, immediately adjacent to stem 2 formed by base-complementary pairing, where a nuclear localization functional sequence may be attached. The crRNA contains a guide sequence, where N in the guide sequence represents any ribonucleotide, and the ellipsis indicates that the number of ribonucleotides in the guide sequence may vary as appropriate.
FIG. 3 is a schematic diagram of the target vector C'/D box-PAM.
FIG. 4 is a plasmid map of the control vector SpCas9-PAM.
FIG. 5 is a plasmid map of the lentiviral vector pGFPPAM.
FIG. 6 shows the proportion of GFP-positive cells measured by flow cytometry in example 1.
FIG. 7 is a schematic diagram of the target vector C'/D box-1-1.
FIG. 8 is a schematic diagram of the target vector C'/D box-1-2.
FIG. 9 shows the proportion of GFP-positive cells measured by flow cytometry in example 2.
FIG. 9 shows the results of flow cytometry for detecting the proportion of GFP-positive cells in example 2.
Detailed Description
Definitions:
As used herein, the term "Cas protein", or a protein or polypeptide having "Cas enzymatic activity" or "Cas endonuclease activity", refers to a CRISPR-associated (Cas) polypeptide or protein encoded by a CRISPR-associated (Cas) gene that, when complexed or functionally combined with one or more guide RNAs (e.g., sgRNA molecules), can be directed to a target sequence in a target nucleic acid and can target or modify the target nucleic acid. Guided by the sgRNA, the Cas endonuclease recognizes, targets, or modifies a specific target site (the target sequence or a nucleotide sequence near the target sequence) in the target nucleic acid.
As used herein, the term "sgRNA" (single guide RNA) refers to a single guide RNA used together with a Cas protein. The sgRNA is either a fusion of a crRNA and a tracrRNA that comprises a guide sequence, or comprises a crRNA sequence and a guide sequence without a tracrRNA sequence.
As used herein, the term "guide sequence" is used interchangeably with "targeting domain" and refers to a contiguous nucleotide sequence in a sgRNA that has partial or complete complementarity to a target sequence in a target nucleic acid and can hybridize to the target sequence in the target nucleic acid through base-complementary pairing facilitated by the Cas protein. Complete complementarity of the guide sequences described herein to the target sequence is not required, so long as there is sufficient complementarity to cause hybridization and promote formation of a CRISPR/Cas complex.
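To make "partial or complete complementarity" concrete, the sketch below computes the fraction of guide positions that form a Watson-Crick pair with a DNA target strand. It is only an illustration under stated assumptions: both sequences are written 5' to 3' and have equal length, only A-U/T-A and G-C pairs are counted (G-U wobble pairs are ignored), the function names are hypothetical, and no threshold for "sufficient" complementarity is implied, since the text does not give one.

```python
# DNA base on the target strand -> RNA base that pairs with it.
PAIRING_RNA_BASE = {"A": "U", "T": "A", "G": "C", "C": "G"}


def perfect_guide_for(target_strand):
    """Return the RNA guide (5'->3') fully complementary to a DNA strand (5'->3')."""
    return "".join(PAIRING_RNA_BASE[base] for base in reversed(target_strand.upper()))


def complementarity(guide_rna, target_strand):
    """Fraction of guide positions that Watson-Crick pair with the target strand."""
    ideal = perfect_guide_for(target_strand)
    if len(ideal) != len(guide_rna):
        raise ValueError("guide and target must have the same length")
    matches = sum(g == i for g, i in zip(guide_rna.upper(), ideal))
    return matches / len(ideal)


# Toy 7-nt example (not a sequence from the sequence listing).
target = "GATTACA"
print(perfect_guide_for(target))           # fully complementary guide
print(complementarity("UGUAAUC", target))  # complete complementarity
print(complementarity("UGUAAUA", target))  # one mismatch at the guide 3' end
```

A guide with a score below 1.0 would correspond to the partial complementarity contemplated by the definition; whether it still hybridizes and supports complex formation would have to be determined experimentally.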
As used herein, the term "framework sequence" when referring to a sgRNA is intended to mean other nucleotide sequences in the sgRNA in addition to the guide sequence. For example, sequences between the guide sequence and the poly-U corresponding to the transcription terminator in the sgRNA can be included. The backbone sequence will generally not change due to changes in the target sequence. Thus, the backbone sequence may be any feasible sequence.
As used herein, the term "target nucleic acid" may comprise any polynucleotide, such as DNA (target DNA) or RNA (target RNA). By "target nucleic acid" is meant a nucleic acid that the sgRNA directs the Cas protein to target or modify. The term "target nucleic acid" can be any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, a "target nucleic acid" can be a polynucleotide present in a eukaryotic cell, and can be a sequence (or portion thereof) that encodes a gene product (e.g., a protein) or a non-coding sequence (or portion thereof). In certain instances, a "target nucleic acid" can include one or more disease-associated genes and polynucleotides as well as signaling biochemical pathway-associated genes and polynucleotides.
As used herein, the term "target sequence" refers to a short piece of nucleotide sequence in a target nucleic acid molecule that is complementary (fully or partially complementary) or hybridizes to a guide sequence of a sgRNA molecule. The target sequence is often tens of bp in length, and may be, for example, about 10bp, about 20bp, about 30bp, about 40bp, about 50bp, about 60 bp.
As used herein, the term "targeted" is defined as consisting of one or more of the following: cleaving one or more target nucleic acids, visualizing or detecting one or more target nucleic acids, labeling one or more target nucleic acids, transporting one or more target nucleic acids, masking one or more target nucleic acids, binding one or more target nucleic acids, increasing the level of transcription and/or translation of a gene to which the target sequence belongs, and decreasing the level of transcription and/or translation of a gene to which the target sequence belongs.
As used herein, the term "modified" is defined as consisting of one or more of the following: nucleobase substitution, nucleobase deletion, nucleobase insertion, methylation of nucleic acids, demethylation of nucleic acids, and deamination of nucleic acids.
As used herein, the term "cleavage" refers to the breaking of a covalent bond (e.g., a covalent phosphodiester bond) in the ribose-phosphodiester backbone of a polynucleotide, including but not limited to: cleaving a single-stranded polynucleotide; cleaving either strand of a double-stranded polynucleotide comprising two complementary single strands; and cleaving both strands of a double-stranded polynucleotide comprising two complementary single strands.
As used herein, when referring to "a sequence that functions as a nuclear localisation function in a snoRNA molecule" it is intended to mean a nucleotide sequence/element of the snoRNA molecule that plays an important role in a nuclear localisation function, in particular a nucleotide sequence that plays an important role in a nucleolar localisation function, non-limiting examples include C' box, D box.
As used herein, the terms C 'box and box C' are used interchangeably; the terms D box and box D are used interchangeably.
As used herein, the term "nuclear localization signal" or nuclear localization sequence (NLS) refers to an amino acid sequence that serves as a tag for transporting a protein across the nuclear envelope and into the nucleus.
As used herein, the term "snoRNA" (Small nucleolar RNA) is a large class of eukaryotic RNAs that play a role in the biogenesis of ribosomes within the nucleoli.
As used herein, the term "engineered" when referring to a sgRNA includes altering the nucleotide sequence of the sgRNA resulting in an engineered sgRNA molecule.
As used herein, the term "non-complementary pairing sequence" when referring to the backbone sequence secondary structure of a sgRNA molecule refers to nucleotides in the backbone sequence secondary structure of the sgRNA that do not form intramolecular base-complementary pairings. The secondary structure of the framework sequence is predicted by a person skilled in the art according to a conventional calculation method, or determined according to a conventional experimental method.
As used herein, the terms "crRNA" and "tracrRNA" have the meanings that are commonly recognized by those skilled in the art, respectively.
As used herein, when referring to sgRNAs, "loop" has the meaning commonly understood by those skilled in the art, and usually refers to the loop in a stem-loop structure of an RNA, in which complementary bases pair to form the stem while the portion that cannot pair bulges out to form the loop.
As used herein, "sgRNA" and "sgRNA molecule" are used interchangeably.
As used herein, the term "replacement" when referring to a sequence that functions as a nuclear localization in a snoRNA molecule, refers to the replacement of a fragment consisting of 1 nucleotide or more than 1 contiguous nucleotide of the sgRNA molecule prior to engineering with a nuclear localization function sequence.
As used herein, the term "insertion" when referring to a sequence that functions in nuclear localization in a snoRNA molecule refers to the insertion of a nuclear localization function sequence only into the sgRNA sequence without deleting nucleotides of the sgRNA molecule prior to the alteration.
As used herein, one example of "directing the Cas protein to the nucleus" is that the engineered sgRNA of the invention is transported into the nucleus and carries the Cas protein into the nucleus with it. Generally, the one or more snoRNA nuclear localization functional sequences are of sufficient strength to drive accumulation of the Cas protein in a detectable amount in the nucleus of the eukaryotic cell. Whether the Cas protein is directed to the nucleus, and the amount of sgRNA and Cas protein accumulated in the nucleus, can be determined by any suitable technique. For example, a detectable label can be fused to the sgRNA or Cas protein so that its location within the cell can be visualized, for example in combination with a means of detecting the location of the nucleus. The nuclei may also be isolated from the cells and their contents analyzed by any suitable method for detecting RNA or protein, including but not limited to immunohistochemistry, western blot or enzymatic activity assays. Accumulation in the nucleus can also be determined indirectly, for example by measuring the effect of targeting or modification on the target nucleic acid (e.g., measuring DNA cleavage or mutation at the target sequence, or measuring changes in the level of transcription or translation of the gene to which the target sequence belongs).
As used herein, the term "operably linked" is intended to mean that the Cas protein coding sequence or the sgRNA coding sequence in the vector is linked to one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
As used herein, the term "regulatory element" is intended to include promoters, enhancers, Internal Ribosome Entry Sites (IRES), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences). Regulatory elements include those that direct the continuous expression of a nucleotide sequence in many types of host cells and those that direct the expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences).
The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The examples provided below serve as a guide for further modifications by a person skilled in the art and do not constitute a limitation of the invention in any way.
The experimental procedures in the following examples, unless otherwise indicated, are conventional and are carried out according to the techniques or conditions described in the literature in the field or according to the instructions of the products. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
The box C'/D of the U3 snoRNA molecule, as described in Mol Biol Cell. 1999 Jul;10(7):2131-2147, is shown in FIG. 1. In the following specific examples of the invention, the box C'/D of the U3 snoRNA molecule, i.e. the sequence GAGGAAGAGCGUCAGCAGGCUGA in FIG. 1, is introduced into an sgRNA molecule; GAGGAAGA is the box C' sequence and GGCUGA is the box D sequence.
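The composition of the 23-nt element can be checked with a short sketch. The function name split_box_cd and the word "spacer" for the nucleotides between box C' and box D are labels chosen here for illustration and do not come from the cited document.

```python
def split_box_cd(box_cd: str, box_c_prime: str, box_d: str) -> tuple[str, str, str]:
    """Split a box C'/D element into (box C', spacer, box D)."""
    if not box_cd.startswith(box_c_prime) or not box_cd.endswith(box_d):
        raise ValueError("element does not start with box C' and end with box D")
    spacer = box_cd[len(box_c_prime):len(box_cd) - len(box_d)]
    return box_c_prime, spacer, box_d


BOX_CD = "GAGGAAGAGCGUCAGCAGGCUGA"  # sequence quoted in the text
c_prime, spacer, d = split_box_cd(BOX_CD, "GAGGAAGA", "GGCUGA")
print(len(BOX_CD), c_prime, spacer, d)  # 23 GAGGAAGA GCGUCAGCA GGCUGA
```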
In the experimental design of the examples, the sgRNA backbone sequence corresponding to SpCas9 was engineered (see FIG. 2). The box C'/D of the U3 snoRNA molecule can replace the loop at the loop1 and/or loop2 position, and it is joined to the remaining sgRNA backbone by linking sequences (ggcca, ctgcaggcaggcc).
The engineered sgRNA molecules of the invention may comprise a crRNA portion and a tracrRNA portion, and the nuclear localization functional sequence may be linked within the crRNA sequence or within the tracrRNA sequence, or the nuclear localization functional sequence may be linked at a chimeric position of the crRNA and the tracrRNA, such as at loop1 in fig. 2.
The box C'/D sequence of the U3 snoRNA molecule was joined to the sgRNA backbone of SpCas9 (one U3 snoRNA box C'/D at each of the loop1 and loop2 positions of the sgRNA backbone, as shown in FIG. 2) to form the following sequence:
guuuuagagcuaggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccuagcaaguuaaaauaaggcuaguccguuaucaacuuggccaGAGGAAGAGCGUCAGCAGGCUGAcugcagggccaaguggcaccgagucggugc
wherein the upper-case sequence is the U3 snoRNA box C'/D, the lower-case underlined sequence is the backbone sequence of the sgRNA corresponding to SpCas9 before modification, and the lower-case sequence without underlining is the linking sequence (joining the U3 snoRNA box C'/D to the sgRNA backbone).
The sgRNA with nuclear entry function is formed by connecting U3snoRNA box C'/D with sgRNA corresponding to Cas9, thereby guiding the Cas9-sgRNA complex into the nucleus.
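The assembly described above can be sketched as a concatenation of parts. In this sketch the two linkers and the three backbone segments are read directly from the lower-case stretches of the engineered sequence given above, and the native loop "gaaa" is taken from the single-box backbones of Example 2; the names build_backbone, SEGMENTS and NATIVE_LOOP are hypothetical.

```python
BOX_CD = "GAGGAAGAGCGUCAGCAGGCUGA"  # U3 snoRNA box C'/D (upper case)
LINKER_5P = "ggcca"                 # lower-case stretch 5' of each box
LINKER_3P = "cugcagggcc"            # lower-case stretch 3' of each box
NATIVE_LOOP = "gaaa"                # assumption: loop retained in the single-box backbones
# Backbone segments before loop1, between loop1 and loop2, and after loop2.
SEGMENTS = (
    "guuuuagagcua",
    "uagcaaguuaaaauaaggcuaguccguuaucaacuu",
    "aaguggcaccgagucggugc",
)


def build_backbone(box_at_loop1: bool, box_at_loop2: bool) -> str:
    """Return the sgRNA backbone with the chosen loops replaced by linker-box-linker."""
    module = LINKER_5P + BOX_CD + LINKER_3P
    first, middle, last = SEGMENTS
    loop1 = module if box_at_loop1 else NATIVE_LOOP
    loop2 = module if box_at_loop2 else NATIVE_LOOP
    return first + loop1 + middle + loop2 + last


two_box = build_backbone(True, True)
print(len(two_box), two_box.count(BOX_CD))  # 144 2
```

Prefixing the 25-nt guide sequence to the two-box backbone gives 169 nt, which agrees with the length declared for SEQ ID No.1 in the sequence listing.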
Example 1 Verification of the in vivo editing activity of box C'/D-Cas9
1. Construction of verification vectors
Construction of verification vector C'/D box-PAM
The sgRNA expression cassette sequence (SEQ ID No.3) containing the U3 snoRNA box C'/D was synthesized by a reagent company. Positions 1-249 of SEQ ID No.3 are the U6 promoter, followed by the coding sequence of the modified sgRNA molecule (positions 250-274 are the coding sequence of the guide sequence). The expression cassette encodes an sgRNA molecule consisting of the guide sequence GAACGGCUCGGAGAUCAUCAUUGCG, which targets the cell validation library, and a backbone sequence in which two box C'/D elements have been substituted. The sequence of the sgRNA molecule is shown in SEQ ID No.1, and the DNA sequence corresponding to the sgRNA molecule is shown in SEQ ID No.2.
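The coordinates quoted above can be checked with a small sketch that uses 1-based inclusive positions. Only the lengths matter here, so the U6 promoter and the modified backbone are represented by placeholder runs of "n" (249 nt and 169 - 25 = 144 nt respectively), and the seven T residues are those at the end of SEQ ID No.3; the helper names are hypothetical.

```python
def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence as the corresponding DNA sequence (U -> T)."""
    return rna.upper().replace("U", "T")


def region(seq: str, start: int, end: int) -> str:
    """Return 1-based inclusive positions start..end, as quoted in the text."""
    return seq[start - 1:end]


GUIDE_RNA = "GAACGGCUCGGAGAUCAUCAUUGCG"
guide_dna = rna_to_dna(GUIDE_RNA)

u6_placeholder = "n" * 249        # placeholder, NOT the real U6 promoter sequence
backbone_placeholder = "n" * 144  # placeholder for the two-box backbone
terminator = "T" * 7              # T run at the 3' end of SEQ ID No.3

cassette = u6_placeholder + guide_dna + backbone_placeholder + terminator
print(guide_dna)                               # GAACGGCTCGGAGATCATCATTGCG
print(len(cassette))                           # 425
print(region(cassette, 250, 274) == guide_dna)  # True
```

The total of 425 nt matches the length declared for SEQ ID No.3, and the DNA form of the guide matches the sequence used for the control vector.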
The expression cassette sequence (SEQ ID No.3) was assembled into the backbone of the pX601 vector (commercially available) through the Kpn I and Not I restriction sites, and the intermediate vector C'/D box-pre was obtained after sequence verification.
In addition, an SpCas9 fragment without an NLS was amplified by PCR from vector pX459 (commercially available) as the template, using the following primers:
F:5’-gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT-3’;
R:5’-atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG-3’.
The sequence-verified PCR product was ligated into the intermediate vector C'/D box-pre through the Age I and EcoR I sites, and the target vector C'/D box-PAM was obtained after sequence verification.
The complete sequence of the target vector C'/D box-PAM is shown in SEQ ID No.4.
A schematic representation of the target vector C'/D box-PAM is shown in FIG. 3.
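As a hedged check of the cloning design, the sketch below confirms that the forward primer carries an Age I recognition site (ACCGGT) and the reverse primer an EcoR I recognition site (GAATTC). It assumes that the lower-case part of each primer is the 5' tail matching the vector and the upper-case part anneals to the SpCas9 template; that reading of the letter case, and the helper names, are assumptions.

```python
import re

FORWARD = "gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT"
REVERSE = "atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG"
SITES = {"Age I": "ACCGGT", "EcoR I": "GAATTC"}


def split_primer(primer: str) -> tuple[str, str]:
    """Split a primer into its lower-case 5' tail and upper-case 3' part."""
    match = re.fullmatch(r"([a-z]+)([A-Z]+)", primer)
    if match is None:
        raise ValueError("expected a lower-case tail followed by an upper-case part")
    return match.group(1), match.group(2)


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, in upper case."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sites_in_tail(primer: str) -> list[str]:
    """Names of the restriction sites found in the 5' tail of a primer."""
    tail, _ = split_primer(primer)
    return [name for name, site in SITES.items() if site in tail.upper()]


print(sites_in_tail(FORWARD))  # ['Age I']
print(sites_in_tail(REVERSE))  # ['EcoR I']
# Top-strand sequence matched by the annealing part of the reverse primer.
print(reverse_complement(split_primer(REVERSE)[1]))  # CTGTCTCAGCTGGGAGGCGACTAA
```

The last line ends in TAA, so the amplified SpCas9 fragment terminates with a stop codon immediately upstream of the EcoR I site.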
2. Construction of control vector SpCas9-PAM
A control vector, SpCas9-PAM, which uses 2 × NLS for nuclear entry, was constructed.
The sequence GAACGGCTCGGAGATCATCATTGCG, which targets the cell validation library, was ligated into the SpCas9-expressing vector backbone pX459 through the Bbs I site to construct the control vector SpCas9-PAM. The complete plasmid sequence of the control vector SpCas9-PAM is shown in SEQ ID No.13.
The plasmid map of the control vector SpCas9-PAM is shown in FIG. 4.
3. Vector transfection of 293T library cells
Following the method of Hu Z, Wang D, Zhang C, et al. (Diverse noncanonical PAMs recognized by SpCas9 in human cells [J]. bioRxiv, 2019:671503), 293T library cells were constructed that contain the target site recognized by the box C'/D-containing sgRNA. The library is a GFP library carrying frameshift mutations; frameshifts introduced into the expression frame by targeted editing can make cells that originally did not fluoresce become fluorescent, so the editing effect can be judged from the proportion of fluorescent cells.
The specific method for constructing the 293T library cells is as follows:
The GFP PAM library was designed with reference to the above publication. Its structure is CMV promoter-ATG-protospacer-NNNNN-EGFP-puro, where the protospacer sequence is the same as in the publication, namely GAACGGCTCGGAGATCATCATTGCG, and N is any deoxyribonucleotide (A, T, C or G).
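The size of such a randomized library follows directly from the number of N positions. The sketch below enumerates the ATG-protospacer-N...N portion of the construct with the N-run length left as a parameter; the function names are hypothetical and the flanking CMV promoter, EGFP and puro elements are omitted.

```python
from itertools import product

PROTOSPACER = "GAACGGCTCGGAGATCATCATTGCG"


def pam_variants(n: int):
    """Yield every DNA sequence of length n in alphabetical order."""
    for bases in product("ACGT", repeat=n):
        yield "".join(bases)


def library_inserts(n: int) -> list[str]:
    """ATG + protospacer + each possible run of n randomized positions."""
    return ["ATG" + PROTOSPACER + pam for pam in pam_variants(n)]


inserts = library_inserts(5)
print(len(inserts))                       # 1024
print(inserts[0][-5:], inserts[-1][-5:])  # AAAAA TTTTT
```

Five randomized positions give 4^5 = 1024 variants; each additional N multiplies the library size by four.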
The CMV promoter sequence used is shown in SEQ ID No. 14. The puro selection marker sequence used is shown in SEQ ID No. 15. The EGFP sequence used (without the start codon and stop codon) is shown in SEQ ID No. 16.
In the experiment, the hPGK promoter-EGFP expression cassette of plasmid pRRLSIN.cPPT.PGK-GFP.WPRE (commercially available) was excised by digestion at the EcoR V and Sal I restriction sites, and the remaining plasmid was used as the backbone. CMV promoter-ATG-protospacer-NNNNNNN-GFP-puro was ligated into this backbone through the EcoR V and Sal I restriction sites to obtain the library-expressing lentiviral vector pGFPPAM, whose sequence is shown in SEQ ID No.17.
The plasmid map of the lentiviral vector pGFPPAM is shown in FIG. 5.
293T library cells were obtained with the lentiviral vector pGFPPAM by the method described in the above publication (Hu Z, Wang D, Zhang C, et al. Diverse noncanonical PAMs recognized by SpCas9 in human cells [J]. bioRxiv, 2019:671503); apart from the different lentiviral vector sequence, the method was the same.
Vectors C'/D box-pre, C'/D box-PAM, pX459 and SpCas9-PAM were each transfected into 293T library cells in 24-well plates, using 800 ng of plasmid.
The transfection method is as follows:
(1) The 293T cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021), counted, and plated in 24-well plates at 2 × 10^5 cells per well in 500 μL.
(2) For each transfection sample, the complexes were prepared as follows:
a. for each well of plated cells, the above plasmid DNA was diluted in 50 μL of serum-free Opti-MEM I reduced-serum medium (Thermo, 25200056) and mixed gently;
b. Lipofectamine 2000 (Thermo, 11668019) was mixed gently before use, and 1.6 μL of Lipofectamine 2000 per well was then diluted in 50 μL of Opti-MEM I medium and incubated at room temperature for 5 minutes. Note: proceed to step c within 25 minutes;
c. after the 5-minute incubation, the diluted DNA was combined with the diluted Lipofectamine 2000, mixed gently and incubated at room temperature for 20 minutes (the solution may appear cloudy). Note: the complexes are stable for 6 hours at room temperature. The complexes were added to the 293T library cells and mixed, and detection was performed with a flow cytometer after 48 h.
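The per-well quantities in the steps above can be scaled to a number of wells with a short calculation. This is a sketch under stated assumptions: the 800 ng of plasmid is taken to be the amount per well, and the optional overage factor (extra volume to cover pipetting loss) is a hypothetical parameter that does not appear in the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerWell:
    """Per-well quantities quoted in steps (1) and (2) for a 24-well plate."""
    cells: float = 2e5
    plating_volume_ul: float = 500.0
    dna_ng: float = 800.0              # assumed to be the amount per well
    dna_diluent_ul: float = 50.0       # Opti-MEM I used to dilute the DNA
    lipofectamine_ul: float = 1.6
    lipofectamine_diluent_ul: float = 50.0


def scale(per_well: PerWell, wells: int, overage: float = 0.0) -> dict[str, float]:
    """Total quantities for `wells` wells, optionally with fractional overage."""
    if wells < 1 or overage < 0:
        raise ValueError("wells must be >= 1 and overage must be >= 0")
    factor = wells * (1.0 + overage)
    return {
        "cells": per_well.cells * factor,
        "plating_volume_ul": per_well.plating_volume_ul * factor,
        "dna_ng": per_well.dna_ng * factor,
        "opti_mem_ul": (per_well.dna_diluent_ul + per_well.lipofectamine_diluent_ul) * factor,
        "lipofectamine_ul": per_well.lipofectamine_ul * factor,
    }


def seeding_density_per_ml(per_well: PerWell) -> float:
    """Cell density of the suspension plated into each well, in cells per mL."""
    return per_well.cells / (per_well.plating_volume_ul / 1000.0)


totals = scale(PerWell(), wells=4)
print(totals)
print(seeding_density_per_ml(PerWell()))  # 400000.0
```

For the four vectors of this example, one well each, the totals come to 6.4 μL of Lipofectamine 2000 and 400 μL of Opti-MEM I, with the DNA of each vector diluted separately.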
4. Flow cytometry detection of the editing effect of SpCas9 with the C'/D box and of its controls on library cells
Forty-eight hours after the transfection in step 3, the cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and centrifuged at 300 g for 5 min, and the supernatant was removed. The cells in each well were resuspended in 500 μL of PBS. After cell debris was excluded by FSC-A and SSC-A gating, and with 293T-PAM cells (i.e., the 293T library cells described above) as the negative control for gating, the proportion of GFP-fluorescent cells was measured by flow cytometry. The experiment of this example was repeated 3 times; the results are shown in FIG. 6, and the proportions of GFP-positive cells measured by flow cytometry are listed in Table 1.
TABLE 1 flow cytometry detection of GFP Positive cell proportion (average of 3 replicates)
[Table 1 appears as image BDA0003649792830000201 in the original publication.]
Note: indicates significant differences (P <0.01) compared to the 293T-PAM, pX459, C'/D box-pre groups, respectively.
As can be seen from the flow cytometry results, the proportion of GFP-positive cells in the positive control SpCas9-PAM group was 3.25%, showing that the assay system was established successfully. The C'/D box-PAM group, which uses box C'/D for nuclear entry, showed a significant editing effect: the library cells were edited and produced GFP fluorescence. The strategy of using box C'/D to guide the Cas9-sgRNA complex into the nucleus is therefore feasible, and its efficiency is close to that of SpCas9-PAM, which uses an NLS as the nuclear entry signal.
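The way a GFP-positive proportion is derived from a negative-control gate can be sketched as follows. The gating rule used here (allowing at most 0.1% of negative-control events above the gate) is an assumption, the function names are hypothetical, and the intensity values are synthetic numbers invented for illustration, not data from this example.

```python
def gate_from_negative(negative: list[float], allowed_fraction: float = 0.001) -> float:
    """Threshold leaving at most `allowed_fraction` of negative-control events above it."""
    if not negative:
        raise ValueError("negative control has no events")
    ordered = sorted(negative)
    allowed = int(len(ordered) * allowed_fraction)  # events permitted above the gate
    return ordered[len(ordered) - 1 - allowed]


def percent_positive(events: list[float], threshold: float) -> float:
    """Percentage of events whose GFP intensity exceeds the threshold."""
    if not events:
        raise ValueError("sample has no events")
    return 100.0 * sum(value > threshold for value in events) / len(events)


# Synthetic intensities, for illustration only.
negative_control = [float(v) for v in range(1000)]  # untransfected library cells
sample = [10.0] * 90 + [5000.0] * 10                # hypothetical transfected well

threshold = gate_from_negative(negative_control)
print(threshold)                                    # 998.0
print(percent_positive(sample, threshold))          # 10.0
```

Applying the same gate to every group and averaging the three replicates gives values comparable across groups.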
Example 2 Effect of the position and number of box C'/D on editing activity
1. Construction of validation vectors carrying sgRNAs with different numbers of box C'/D
To verify the effect of the number of box C'/D elements on the in vivo editing activity of Cas9, the following two validation vectors were designed. Each expresses an sgRNA with only one box C'/D, at a different position in the molecule (a single box C'/D joined at the loop1 or the loop2 position of the sgRNA backbone). Their backbone sequences are shown below:
C'/D box-1-1sgRNA backbone:
5’-gttttagagctaggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggcctagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc-3’;
C'/D box-1-2sgRNA backbone:
5’-gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggccaagtggcaccgagtcggtgc-3’;
In the two sgRNA backbones, the upper-case sequence is the box C'/D sequence, the lower-case underlined sequence is the sgRNA backbone sequence of SpCas9 before modification, and the remainder is the linking sequence.
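Because the box C'/D is the only upper-case stretch in each backbone, its position can be located from the letter case alone. The sketch below reports 1-based inclusive coordinates; the function name box_spans is hypothetical.

```python
import re

BACKBONES = {
    "C'/D box-1-1": (
        "gttttagagctaggccaGAGGAAGAGCGTCAGCAGGCTGActgcagggcctagcaagttaaaata"
        "aggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc"
    ),
    "C'/D box-1-2": (
        "gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttggccaGAGGAAGAG"
        "CGTCAGCAGGCTGActgcagggccaagtggcaccgagtcggtgc"
    ),
}


def box_spans(backbone: str) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each upper-case run, 1-based inclusive."""
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(r"[A-Z]+", backbone)]


for name, backbone in BACKBONES.items():
    print(name, len(backbone), box_spans(backbone))
```

Both backbones are 110 nt long; the box occupies positions 18-40 when it is joined at loop1 and positions 58-80 when it is joined at loop2.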
Two sgRNA expression cassette sequences each containing only one U3 snoRNA box C'/D (designated expression cassette 1 and expression cassette 2, both containing the sequence GAACGGCTCGGAGATCATCATTGCG corresponding to the guide sequence that targets the cell validation library) were synthesized by a reagent company and assembled into the pX601 vector backbone through the Kpn I and Not I restriction sites. After sequence verification, the correct intermediate vectors were named C'/D box-1-1-pre and C'/D box-1-2-pre, respectively.
The nucleotide sequence synthesized for C'/D box-1-1 (expression cassette 1) is shown in SEQ ID No.7. Positions 1-249 of SEQ ID No.7 are the U6 promoter, followed by the coding sequence of the modified sgRNA molecule (positions 250-274 are the coding sequence of the guide sequence). The expression cassette encodes the sgRNA molecule of SEQ ID No.5, and the DNA sequence corresponding to the sgRNA molecule is shown in SEQ ID No.6.
The nucleotide sequence synthesized for C'/D box-1-2 (expression cassette 2) is shown in SEQ ID No.10. Positions 1-249 of SEQ ID No.10 are the U6 promoter, followed by the coding sequence of the modified sgRNA molecule (positions 250-274 are the coding sequence of the guide sequence). The expression cassette encodes the sgRNA molecule of SEQ ID No.8, and the DNA sequence corresponding to the sgRNA molecule is shown in SEQ ID No.9.
In addition, the SpCas9 fragment containing no NLS was amplified by PCR using vector pX459 (commercially available) as a template by the following primers:
F:5’-gctctctggctaactaccggtgccaccatggccGACAAGAAGTACAGCAT-3’;
R:5’-atcagcgagctctaggaattcTTAGTCGCCTCCCAGCTGAGACAG-3’.
The sequence-verified PCR products were ligated into the intermediate vectors C'/D box-1-1-pre and C'/D box-1-2-pre, respectively, through the Age I and EcoR I sites, and the target vectors C'/D box-1-1 and C'/D box-1-2 were obtained after sequence verification.
The complete sequence of the target vector C'/D box-1-1 is shown in SEQ ID No.11.
A schematic representation of the target vector C'/D box-1-1 is shown in FIG. 7.
The complete sequence of the target vector C'/D box-1-2 is shown in SEQ ID No.12.
A schematic representation of the target vector C'/D box-1-2 is shown in FIG. 8.
2. Flow cytometry detection of the editing effect of SpCas9 with different numbers of C'/D boxes and of its controls on library cells
The C'/D box-PAM, C'/D box-pre, pX459 and SpCas9-PAM vectors of Example 1 and the C'/D box-1-1 and C'/D box-1-2 vectors of this example were each transfected into the 293T library cells of Example 1. The transfection method was the same as in Example 1, and the proportion of GFP-fluorescent cells was measured by flow cytometry 72 h after transfection (by the same method as in Example 1).
Seventy-two hours after transfection, the cells were digested with trypsin (Trypsin 0.25%, EDTA, Thermo, 11058021) and centrifuged at 300 g for 5 min, and the supernatant was removed. The cells in each well were resuspended in 500 μL of PBS. After cell debris was excluded by FSC-A and SSC-A gating, and with 293T-PAM cells (i.e., the 293T library cells described above) as the negative control for gating, the proportion of GFP-fluorescent cells was measured by flow cytometry. The experiment of this example was repeated 3 times; the results are shown in FIG. 9, and the proportions of GFP-positive cells measured by flow cytometry are listed in Table 2.
TABLE 2 flow cytometry detection of GFP Positive cell proportion (average of 3 replicates)
[Table 2 appears as images BDA0003649792830000221 and BDA0003649792830000231 in the original publication.]
Note: indicates significant differences (P<0.01) compared with the 293T-PAM, pX459 and C'/D box-pre groups, respectively. ## indicates a significant difference (P<0.05) between the C'/D box-1-2 group and the C'/D box-PAM group.
The editing activity of the C'/D box-PAM, C'/D box-1-1 (C'/D box joined at sgRNA loop1) and C'/D box-1-2 (C'/D box joined at sgRNA loop2) groups was close to that of the SpCas9-PAM group. The proportion of GFP-positive cells in the C'/D box-pre group was essentially the same as in the pX459 control group. The flow cytometry results show that editing activity was higher with only one C'/D box (the C'/D box-1-1 and C'/D box-1-2 groups) than with two C'/D boxes. For sgRNAs with only one C'/D box, editing efficiency was higher when the C'/D box was joined at the loop2 position (distal to the guide sequence) than at the loop1 position (proximal to the guide sequence).
The present invention has been described in detail above. It will be apparent to those skilled in the art that the invention can be practiced over a wide range of equivalent parameters, concentrations and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with reference to specific embodiments, it will be appreciated that it can be further modified. In general, this application is intended to cover any variations, uses or adaptations of the invention that follow its principles, including departures from the present disclosure that come within known or customary practice in the art to which the invention pertains. Some of the essential features may be applied within the scope of the appended claims.
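The sequence listing that follows prints each sequence in blocks of 10 residues, 60 per line, with a running count at the end of each line. A minimal sketch for reading such a listing back into plain sequences and checking each against its declared length is given below; the function name parse_listing and the returned dictionary layout are assumptions, and the sample text is the SEQ ID No.5 entry of the listing.

```python
import re


def parse_listing(text: str) -> dict[int, dict]:
    """Parse <210>/<211>/<212> entries and their residue lines into a dictionary."""
    records: dict[int, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        tag = re.match(r"<(\d{3})>\s*(.*)", line)
        if tag:
            code, value = tag.group(1), tag.group(2)
            if code == "210":      # sequence identifier starts a new record
                current = {"length": None, "type": None, "residues": ""}
                records[int(value)] = current
            elif current is not None and code == "211":
                current["length"] = int(value)
            elif current is not None and code == "212":
                current["type"] = value
        elif current is not None and line:
            # Residue line: drop the spaces and the trailing running count.
            current["residues"] += re.sub(r"[\s\d]", "", line)
    return records


SAMPLE = """\
<210> 5
<211> 135
<212> RNA
<213> Artificial sequence
<400> 5
gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag 60
gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu ugaaaaagug 120
gcaccgaguc ggugc 135
"""

entry = parse_listing(SAMPLE)[5]
print(entry["type"], entry["length"], len(entry["residues"]))  # RNA 135 135
```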
<110> Guangzhou Ruifeng Biotechnology, Inc
<120> modified sgRNA molecule and application thereof
<130> GNCLN221725
<160> 17
<170> PatentIn version 3.5
<210> 1
<211> 169
<212> RNA
<213> Artificial sequence
<400> 1
gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag 60
gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu uggccagagg 120
aagagcguca gcaggcugac ugcagggcca aguggcaccg agucggugc 169
<210> 2
<211> 169
<212> DNA
<213> Artificial sequence
<400> 2
gaacggctcg gagatcatca ttgcggtttt agagctaggc cagaggaaga gcgtcagcag 60
gctgactgca gggcctagca agttaaaata aggctagtcc gttatcaact tggccagagg 120
aagagcgtca gcaggctgac tgcagggcca agtggcaccg agtcggtgc 169
<210> 3
<211> 425
<212> DNA
<213> Artificial sequence
<400> 3
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctaggcc agaggaagag 300
cgtcagcagg ctgactgcag ggcctagcaa gttaaaataa ggctagtccg ttatcaactt 360
ggccagagga agagcgtcag caggctgact gcagggccaa gtggcaccga gtcggtgctt 420
ttttt 425
<210> 4
<211> 8286
<212> DNA
<213> Artificial sequence
<400> 4
cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60
ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120
aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180
agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240
ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300
tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360
atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420
ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480
gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540
ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600
tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660
aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720
tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780
gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840
aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900
acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960
gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020
tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080
tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140
tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200
ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260
gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320
tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380
tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440
tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500
tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560
tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620
acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680
tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740
acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800
agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860
gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920
ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980
acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040
aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100
tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160
ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220
cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280
aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340
aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400
aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460
aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520
ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580
tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640
ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700
aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760
tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820
acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880
aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940
atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000
agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060
agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120
agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180
tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240
aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300
ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360
agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420
tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480
gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540
agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600
atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660
tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720
acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780
gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840
agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900
tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960
acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020
tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080
gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140
gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200
tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260
tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320
gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380
tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440
tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500
agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560
tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620
atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680
ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740
actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800
ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860
gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920
gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980
tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040
tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100
tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160
aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220
aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280
aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340
atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400
aggccagagg aagagcgtca gcaggctgac tgcagggcct agcaagttaa aataaggcta 5460
gtccgttatc aacttggcca gaggaagagc gtcagcaggc tgactgcagg gccaagtggc 5520
accgagtcgg tgcttttttt gcggccgcag gaacccctag tgatggagtt ggccactccc 5580
tctctgcgcg ctcgctcgct cactgaggcc gggcgaccaa aggtcgcccg acgcccgggc 5640
tttgcccggg cggcctcagt gagcgagcga gcgcgcagct gcctgcaggg gcgcctgatg 5700
cggtattttc tccttacgca tctgtgcggt atttcacacc gcatacgtca aagcaaccat 5760
agtacgcgcc ctgtagcggc gcattaagcg cggcgggtgt ggtggttacg cgcagcgtga 5820
ccgctacact tgccagcgcc ttagcgcccg ctcctttcgc tttcttccct tcctttctcg 5880
ccacgttcgc cggctttccc cgtcaagctc taaatcgggg gctcccttta gggttccgat 5940
ttagtgcttt acggcacctc gaccccaaaa aacttgattt gggtgatggt tcacgtagtg 6000
ggccatcgcc ctgatagacg gtttttcgcc ctttgacgtt ggagtccacg ttctttaata 6060
gtggactctt gttccaaact ggaacaacac tcaactctat ctcgggctat tcttttgatt 6120
tataagggat tttgccgatt tcggtctatt ggttaaaaaa tgagctgatt taacaaaaat 6180
ttaacgcgaa ttttaacaaa atattaacgt ttacaatttt atggtgcact ctcagtacaa 6240
tctgctctga tgccgcatag ttaagccagc cccgacaccc gccaacaccc gctgacgcgc 6300
cctgacgggc ttgtctgctc ccggcatccg cttacagaca agctgtgacc gtctccggga 6360
gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg cgcgagacga aagggcctcg 6420
tgatacgcct atttttatag gttaatgtca tgataataat ggtttcttag acgtcaggtg 6480
gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa atacattcaa 6540
atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat tgaaaaagga 6600
agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg gcattttgcc 6660
ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa gatcagttgg 6720
gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt gagagttttc 6780
gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt ggcgcggtat 6840
tatcccgtat tgacgccggg caagagcaac tcggtcgccg catacactat tctcagaatg 6900
acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg acagtaagag 6960
aattatgcag tgctgccata accatgagtg ataacactgc ggccaactta cttctgacaa 7020
cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat catgtaactc 7080
gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag cgtgacacca 7140
cgatgcctgt agcaatggca acaacgttgc gcaaactatt aactggcgaa ctacttactc 7200
tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca ggaccacttc 7260
tgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc ggtgagcgtg 7320
gaagccgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt atcgtagtta 7380
tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc gctgagatag 7440
gtgcctcact gattaagcat tggtaactgt cagaccaagt ttactcatat atactttaga 7500
ttgatttaaa acttcatttt taatttaaaa ggatctaggt gaagatcctt tttgataatc 7560
tcatgaccaa aatcccttaa cgtgagtttt cgttccactg agcgtcagac cccgtagaaa 7620
agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc ttgcaaacaa 7680
aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca actctttttc 7740
cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgttcttcta gtgtagccgt 7800
agttaggcca ccacttcaag aactctgtag caccgcctac atacctcgct ctgctaatcc 7860
tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg gactcaagac 7920
gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc acacagccca 7980
gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta tgagaaagcg 8040
ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg gtcggaacag 8100
gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt cctgtcgggt 8160
ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg cggagcctat 8220
ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg ccttttgctc 8280
acatgt 8286
<210> 5
<211> 135
<212> RNA
<213> Artificial sequence
<400> 5
gaacggcucg gagaucauca uugcgguuuu agagcuaggc cagaggaaga gcgucagcag 60
gcugacugca gggccuagca aguuaaaaua aggcuagucc guuaucaacu ugaaaaagug 120
gcaccgaguc ggugc 135
<210> 6
<211> 135
<212> DNA
<213> Artificial sequence
<400> 6
gaacggctcg gagatcatca ttgcggtttt agagctaggc cagaggaaga gcgtcagcag 60
gctgactgca gggcctagca agttaaaata aggctagtcc gttatcaact tgaaaaagtg 120
gcaccgagtc ggtgc 135
<210> 7
<211> 391
<212> DNA
<213> Artificial sequence
<400> 7
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctaggcc agaggaagag 300
cgtcagcagg ctgactgcag ggcctagcaa gttaaaataa ggctagtccg ttatcaactt 360
gaaaaagtgg caccgagtcg gtgctttttt t 391
<210> 8
<211> 135
<212> RNA
<213> Artificial sequence
<400> 8
gaacggcucg gagaucauca uugcgguuuu agagcuagaa auagcaaguu aaaauaaggc 60
uaguccguua ucaacuuggc cagaggaaga gcgucagcag gcugacugca gggccaagug 120
gcaccgaguc ggugc 135
<210> 9
<211> 135
<212> DNA
<213> Artificial sequence
<400> 9
gaacggctcg gagatcatca ttgcggtttt agagctagaa atagcaagtt aaaataaggc 60
tagtccgtta tcaacttggc cagaggaaga gcgtcagcag gctgactgca gggccaagtg 120
gcaccgagtc ggtgc 135
<210> 10
<211> 391
<212> DNA
<213> Artificial sequence
<400> 10
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctagaaa tagcaagtta 300
aaataaggct agtccgttat caacttggcc agaggaagag cgtcagcagg ctgactgcag 360
ggccaagtgg caccgagtcg gtgctttttt t 391
<210> 11
<211> 8252
<212> DNA
<213> Artificial sequence
<400> 11
cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60
ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120
aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180
agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240
ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300
tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360
atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420
ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480
gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540
ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600
tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660
aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720
tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780
gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840
aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900
acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960
gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020
tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080
tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140
tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200
ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260
gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320
tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380
tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440
tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500
tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560
tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620
acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680
tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740
acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800
agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860
gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920
ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980
acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040
aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100
tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160
ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220
cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280
aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340
aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400
aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460
aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520
ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580
tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640
ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700
aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760
tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820
acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880
aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940
atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000
agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060
agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120
agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180
tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240
aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300
ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360
agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420
tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480
gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540
agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600
atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660
tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720
acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780
gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840
agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900
tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960
acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020
tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080
gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140
gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200
tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260
tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320
gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380
tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440
tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500
agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560
tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620
atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680
ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740
actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800
ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860
gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920
gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980
tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040
tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100
tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160
aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220
aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280
aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340
atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400
aggccagagg aagagcgtca gcaggctgac tgcagggcct agcaagttaa aataaggcta 5460
gtccgttatc aacttgaaaa agtggcaccg agtcggtgct ttttttgcgg ccgcaggaac 5520
ccctagtgat ggagttggcc actccctctc tgcgcgctcg ctcgctcact gaggccgggc 5580
gaccaaaggt cgcccgacgc ccgggctttg cccgggcggc ctcagtgagc gagcgagcgc 5640
gcagctgcct gcaggggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt 5700
cacaccgcat acgtcaaagc aaccatagta cgcgccctgt agcggcgcat taagcgcggc 5760
gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccttag cgcccgctcc 5820
tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aagctctaaa 5880
tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc ccaaaaaact 5940
tgatttgggt gatggttcac gtagtgggcc atcgccctga tagacggttt ttcgcccttt 6000
gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa caacactcaa 6060
ctctatctcg ggctattctt ttgatttata agggattttg ccgatttcgg tctattggtt 6120
aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat taacgtttac 6180
aattttatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagccccg 6240
acacccgcca acacccgctg acgcgccctg acgggcttgt ctgctcccgg catccgctta 6300
cagacaagct gtgaccgtct ccgggagctg catgtgtcag aggttttcac cgtcatcacc 6360
gaaacgcgcg agacgaaagg gcctcgtgat acgcctattt ttataggtta atgtcatgat 6420
aataatggtt tcttagacgt caggtggcac ttttcgggga aatgtgcgcg gaacccctat 6480
ttgtttattt ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata 6540
aatgcttcaa taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct 6600
tattcccttt tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa 6660
agtaaaagat gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa 6720
cagcggtaag atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt 6780
taaagttctg ctatgtggcg cggtattatc ccgtattgac gccgggcaag agcaactcgg 6840
tcgccgcata cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca 6900
tcttacggat ggcatgacag taagagaatt atgcagtgct gccataacca tgagtgataa 6960
cactgcggcc aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt 7020
gcacaacatg ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc 7080
cataccaaac gacgagcgtg acaccacgat gcctgtagca atggcaacaa cgttgcgcaa 7140
actattaact ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga 7200
ggcggataaa gttgcaggac cacttctgcg ctcggccctt ccggctggct ggtttattgc 7260
tgataaatct ggagccggtg agcgtggaag ccgcggtatc attgcagcac tggggccaga 7320
tggtaagccc tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga 7380
acgaaataga cagatcgctg agataggtgc ctcactgatt aagcattggt aactgtcaga 7440
ccaagtttac tcatatatac tttagattga tttaaaactt catttttaat ttaaaaggat 7500
ctaggtgaag atcctttttg ataatctcat gaccaaaatc ccttaacgtg agttttcgtt 7560
ccactgagcg tcagaccccg tagaaaagat caaaggatct tcttgagatc ctttttttct 7620
gcgcgtaatc tgctgcttgc aaacaaaaaa accaccgcta ccagcggtgg tttgtttgcc 7680
ggatcaagag ctaccaactc tttttccgaa ggtaactggc ttcagcagag cgcagatacc 7740
aaatactgtt cttctagtgt agccgtagtt aggccaccac ttcaagaact ctgtagcacc 7800
gcctacatac ctcgctctgc taatcctgtt accagtggct gctgccagtg gcgataagtc 7860
gtgtcttacc gggttggact caagacgata gttaccggat aaggcgcagc ggtcgggctg 7920
aacggggggt tcgtgcacac agcccagctt ggagcgaacg acctacaccg aactgagata 7980
cctacagcgt gagctatgag aaagcgccac gcttcccgaa gggagaaagg cggacaggta 8040
tccggtaagc ggcagggtcg gaacaggaga gcgcacgagg gagcttccag ggggaaacgc 8100
ctggtatctt tatagtcctg tcgggtttcg ccacctctga cttgagcgtc gatttttgtg 8160
atgctcgtca ggggggcgga gcctatggaa aaacgccagc aacgcggcct ttttacggtt 8220
cctggccttt tgctggcctt ttgctcacat gt 8252
<210> 12
<211> 8252
<212> DNA
<213> Artificial sequence
<400> 12
cctgcaggca gctgcgcgct cgctcgctca ctgaggccgc ccgggcgtcg ggcgaccttt 60
ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120
aggggttcct gcggcctcta gactcgaggc gttgacattg attattgact agttattaat 180
agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac 240
ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa 300
tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt 360
atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca agtacgcccc 420
ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttat 480
gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc 540
ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga tttccaagtc 600
tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa 660
aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta cggtgggagg 720
tctatataag cagagctctc tggctaacta ccggtgccac catggccgac aagaagtaca 780
gcatcggcct ggacatcggc accaactctg tgggctgggc cgtgatcacc gacgagtaca 840
aggtgcccag caagaaattc aaggtgctgg gcaacaccga ccggcacagc atcaagaaga 900
acctgatcgg agccctgctg ttcgacagcg gcgaaacagc cgaggccacc cggctgaaga 960
gaaccgccag aagaagatac accagacgga agaaccggat ctgctatctg caagagatct 1020
tcagcaacga gatggccaag gtggacgaca gcttcttcca cagactggaa gagtccttcc 1080
tggtggaaga ggataagaag cacgagcggc accccatctt cggcaacatc gtggacgagg 1140
tggcctacca cgagaagtac cccaccatct accacctgag aaagaaactg gtggacagca 1200
ccgacaaggc cgacctgcgg ctgatctatc tggccctggc ccacatgatc aagttccggg 1260
gccacttcct gatcgagggc gacctgaacc ccgacaacag cgacgtggac aagctgttca 1320
tccagctggt gcagacctac aaccagctgt tcgaggaaaa ccccatcaac gccagcggcg 1380
tggacgccaa ggccatcctg tctgccagac tgagcaagag cagacggctg gaaaatctga 1440
tcgcccagct gcccggcgag aagaagaatg gcctgttcgg aaacctgatt gccctgagcc 1500
tgggcctgac ccccaacttc aagagcaact tcgacctggc cgaggatgcc aaactgcagc 1560
tgagcaagga cacctacgac gacgacctgg acaacctgct ggcccagatc ggcgaccagt 1620
acgccgacct gtttctggcc gccaagaacc tgtccgacgc catcctgctg agcgacatcc 1680
tgagagtgaa caccgagatc accaaggccc ccctgagcgc ctctatgatc aagagatacg 1740
acgagcacca ccaggacctg accctgctga aagctctcgt gcggcagcag ctgcctgaga 1800
agtacaaaga gattttcttc gaccagagca agaacggcta cgccggctac attgacggcg 1860
gagccagcca ggaagagttc tacaagttca tcaagcccat cctggaaaag atggacggca 1920
ccgaggaact gctcgtgaag ctgaacagag aggacctgct gcggaagcag cggaccttcg 1980
acaacggcag catcccccac cagatccacc tgggagagct gcacgccatt ctgcggcggc 2040
aggaagattt ttacccattc ctgaaggaca accgggaaaa gatcgagaag atcctgacct 2100
tccgcatccc ctactacgtg ggccctctgg ccaggggaaa cagcagattc gcctggatga 2160
ccagaaagag cgaggaaacc atcaccccct ggaacttcga ggaagtggtg gacaagggcg 2220
cttccgccca gagcttcatc gagcggatga ccaacttcga taagaacctg cccaacgaga 2280
aggtgctgcc caagcacagc ctgctgtacg agtacttcac cgtgtataac gagctgacca 2340
aagtgaaata cgtgaccgag ggaatgagaa agcccgcctt cctgagcggc gagcagaaaa 2400
aggccatcgt ggacctgctg ttcaagacca accggaaagt gaccgtgaag cagctgaaag 2460
aggactactt caagaaaatc gagtgcttcg actccgtgga aatctccggc gtggaagatc 2520
ggttcaacgc ctccctgggc acataccacg atctgctgaa aattatcaag gacaaggact 2580
tcctggacaa tgaggaaaac gaggacattc tggaagatat cgtgctgacc ctgacactgt 2640
ttgaggacag agagatgatc gaggaacggc tgaaaaccta tgcccacctg ttcgacgaca 2700
aagtgatgaa gcagctgaag cggcggagat acaccggctg gggcaggctg agccggaagc 2760
tgatcaacgg catccgggac aagcagtccg gcaagacaat cctggatttc ctgaagtccg 2820
acggcttcgc caacagaaac ttcatgcagc tgatccacga cgacagcctg acctttaaag 2880
aggacatcca gaaagcccag gtgtccggcc agggcgatag cctgcacgag cacattgcca 2940
atctggccgg cagccccgcc attaagaagg gcatcctgca gacagtgaag gtggtggacg 3000
agctcgtgaa agtgatgggc cggcacaagc ccgagaacat cgtgatcgaa atggccagag 3060
agaaccagac cacccagaag ggacagaaga acagccgcga gagaatgaag cggatcgaag 3120
agggcatcaa agagctgggc agccagatcc tgaaagaaca ccccgtggaa aacacccagc 3180
tgcagaacga gaagctgtac ctgtactacc tgcagaatgg gcgggatatg tacgtggacc 3240
aggaactgga catcaaccgg ctgtccgact acgatgtgga ccatatcgtg cctcagagct 3300
ttctgaagga cgactccatc gacaacaagg tgctgaccag aagcgacaag aaccggggca 3360
agagcgacaa cgtgccctcc gaagaggtcg tgaagaagat gaagaactac tggcggcagc 3420
tgctgaacgc caagctgatt acccagagaa agttcgacaa tctgaccaag gccgagagag 3480
gcggcctgag cgaactggat aaggccggct tcatcaagag acagctggtg gaaacccggc 3540
agatcacaaa gcacgtggca cagatcctgg actcccggat gaacactaag tacgacgaga 3600
atgacaagct gatccgggaa gtgaaagtga tcaccctgaa gtccaagctg gtgtccgatt 3660
tccggaagga tttccagttt tacaaagtgc gcgagatcaa caactaccac cacgcccacg 3720
acgcctacct gaacgccgtc gtgggaaccg ccctgatcaa aaagtaccct aagctggaaa 3780
gcgagttcgt gtacggcgac tacaaggtgt acgacgtgcg gaagatgatc gccaagagcg 3840
agcaggaaat cggcaaggct accgccaagt acttcttcta cagcaacatc atgaactttt 3900
tcaagaccga gattaccctg gccaacggcg agatccggaa gcggcctctg atcgagacaa 3960
acggcgaaac cggggagatc gtgtgggata agggccggga ttttgccacc gtgcggaaag 4020
tgctgagcat gccccaagtg aatatcgtga aaaagaccga ggtgcagaca ggcggcttca 4080
gcaaagagtc tatcctgccc aagaggaaca gcgataagct gatcgccaga aagaaggact 4140
gggaccctaa gaagtacggc ggcttcgaca gccccaccgt ggcctattct gtgctggtgg 4200
tggccaaagt ggaaaagggc aagtccaaga aactgaagag tgtgaaagag ctgctgggga 4260
tcaccatcat ggaaagaagc agcttcgaga agaatcccat cgactttctg gaagccaagg 4320
gctacaaaga agtgaaaaag gacctgatca tcaagctgcc taagtactcc ctgttcgagc 4380
tggaaaacgg ccggaagaga atgctggcct ctgccggcga actgcagaag ggaaacgaac 4440
tggccctgcc ctccaaatat gtgaacttcc tgtacctggc cagccactat gagaagctga 4500
agggctcccc cgaggataat gagcagaaac agctgtttgt ggaacagcac aagcactacc 4560
tggacgagat catcgagcag atcagcgagt tctccaagag agtgatcctg gccgacgcta 4620
atctggacaa agtgctgtcc gcctacaaca agcaccggga taagcccatc agagagcagg 4680
ccgagaatat catccacctg tttaccctga ccaatctggg agcccctgcc gccttcaagt 4740
actttgacac caccatcgac cggaagaggt acaccagcac caaagaggtg ctggacgcca 4800
ccctgatcca ccagagcatc accggcctgt acgagacacg gatcgacctg tctcagctgg 4860
gaggcgacta agaattccta gagctcgctg atcagcctcg actgtgcctt ctagttgcca 4920
gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 4980
tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 5040
tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaga atagcaggca 5100
tgctggggag gtaccgaggg cctatttccc atgattcctt catatttgca tatacgatac 5160
aaggctgtta gagagataat tggaattaat ttgactgtaa acacaaagat attagtacaa 5220
aatacgtgac gtagaaagta ataatttctt gggtagtttg cagttttaaa attatgtttt 5280
aaaatggact atcatatgct taccgtaact tgaaagtatt tcgatttctt ggctttatat 5340
atcttgtgga aaggacgaaa caccgaacgg ctcggagatc atcattgcgg ttttagagct 5400
agaaatagca agttaaaata aggctagtcc gttatcaact tggccagagg aagagcgtca 5460
gcaggctgac tgcagggcca agtggcaccg agtcggtgct ttttttgcgg ccgcaggaac 5520
ccctagtgat ggagttggcc actccctctc tgcgcgctcg ctcgctcact gaggccgggc 5580
gaccaaaggt cgcccgacgc ccgggctttg cccgggcggc ctcagtgagc gagcgagcgc 5640
gcagctgcct gcaggggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt 5700
cacaccgcat acgtcaaagc aaccatagta cgcgccctgt agcggcgcat taagcgcggc 5760
gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccttag cgcccgctcc 5820
tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aagctctaaa 5880
tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc ccaaaaaact 5940
tgatttgggt gatggttcac gtagtgggcc atcgccctga tagacggttt ttcgcccttt 6000
gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa caacactcaa 6060
ctctatctcg ggctattctt ttgatttata agggattttg ccgatttcgg tctattggtt 6120
aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat taacgtttac 6180
aattttatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagccccg 6240
acacccgcca acacccgctg acgcgccctg acgggcttgt ctgctcccgg catccgctta 6300
cagacaagct gtgaccgtct ccgggagctg catgtgtcag aggttttcac cgtcatcacc 6360
gaaacgcgcg agacgaaagg gcctcgtgat acgcctattt ttataggtta atgtcatgat 6420
aataatggtt tcttagacgt caggtggcac ttttcgggga aatgtgcgcg gaacccctat 6480
ttgtttattt ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata 6540
aatgcttcaa taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct 6600
tattcccttt tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa 6660
agtaaaagat gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa 6720
cagcggtaag atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt 6780
taaagttctg ctatgtggcg cggtattatc ccgtattgac gccgggcaag agcaactcgg 6840
tcgccgcata cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca 6900
tcttacggat ggcatgacag taagagaatt atgcagtgct gccataacca tgagtgataa 6960
cactgcggcc aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt 7020
gcacaacatg ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc 7080
cataccaaac gacgagcgtg acaccacgat gcctgtagca atggcaacaa cgttgcgcaa 7140
actattaact ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga 7200
ggcggataaa gttgcaggac cacttctgcg ctcggccctt ccggctggct ggtttattgc 7260
tgataaatct ggagccggtg agcgtggaag ccgcggtatc attgcagcac tggggccaga 7320
tggtaagccc tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga 7380
acgaaataga cagatcgctg agataggtgc ctcactgatt aagcattggt aactgtcaga 7440
ccaagtttac tcatatatac tttagattga tttaaaactt catttttaat ttaaaaggat 7500
ctaggtgaag atcctttttg ataatctcat gaccaaaatc ccttaacgtg agttttcgtt 7560
ccactgagcg tcagaccccg tagaaaagat caaaggatct tcttgagatc ctttttttct 7620
gcgcgtaatc tgctgcttgc aaacaaaaaa accaccgcta ccagcggtgg tttgtttgcc 7680
ggatcaagag ctaccaactc tttttccgaa ggtaactggc ttcagcagag cgcagatacc 7740
aaatactgtt cttctagtgt agccgtagtt aggccaccac ttcaagaact ctgtagcacc 7800
gcctacatac ctcgctctgc taatcctgtt accagtggct gctgccagtg gcgataagtc 7860
gtgtcttacc gggttggact caagacgata gttaccggat aaggcgcagc ggtcgggctg 7920
aacggggggt tcgtgcacac agcccagctt ggagcgaacg acctacaccg aactgagata 7980
cctacagcgt gagctatgag aaagcgccac gcttcccgaa gggagaaagg cggacaggta 8040
tccggtaagc ggcagggtcg gaacaggaga gcgcacgagg gagcttccag ggggaaacgc 8100
ctggtatctt tatagtcctg tcgggtttcg ccacctctga cttgagcgtc gatttttgtg 8160
atgctcgtca ggggggcgga gcctatggaa aaacgccagc aacgcggcct ttttacggtt 8220
cctggccttt tgctggcctt ttgctcacat gt 8252
<210> 13
<211> 9181
<212> DNA
<213> Artificial sequence
<400> 13
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg aacggctcgg agatcatcat tgcggtttta gagctagaaa tagcaagtta 300
aaataaggct agtccgttat caacttgaaa aagtggcacc gagtcggtgc ttttttgttt 360
tagagctaga aatagcaagt taaaataagg ctagtccgtt tttagcgcgt gcgccaattc 420
tgcagacaaa tggctctaga ggtacccgtt acataactta cggtaaatgg cccgcctggc 480
tgaccgccca acgacccccg cccattgacg tcaatagtaa cgccaatagg gactttccat 540
tgacgtcaat gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat 600
catatgccaa gtacgccccc tattgacgtc aatgacggta aatggcccgc ctggcattgt 660
gcccagtaca tgaccttatg ggactttcct acttggcagt acatctacgt attagtcatc 720
gctattacca tggtcgaggt gagccccacg ttctgcttca ctctccccat ctcccccccc 780
tccccacccc caattttgta tttatttatt ttttaattat tttgtgcagc gatgggggcg 840
gggggggggg gggggcgcgc gccaggcggg gcggggcggg gcgaggggcg gggcggggcg 900
aggcggagag gtgcggcggc agccaatcag agcggcgcgc tccgaaagtt tccttttatg 960
gcgaggcggc ggcggcggcg gccctataaa aagcgaagcg cgcggcgggc gggagtcgct 1020
gcgcgctgcc ttcgccccgt gccccgctcc gccgccgcct cgcgccgccc gccccggctc 1080
tgactgaccg cgttactccc acaggtgagc gggcgggacg gcccttctcc tccgggctgt 1140
aattagctga gcaagaggta agggtttaag ggatggttgg ttggtggggt attaatgttt 1200
aattacctgg agcacctgcc tgaaatcact ttttttcagg ttggaccggt gccaccatgg 1260
actataagga ccacgacgga gactacaagg atcatgatat tgattacaaa gacgatgacg 1320
ataagatggc cccaaagaag aagcggaagg tcggtatcca cggagtccca gcagccgaca 1380
agaagtacag catcggcctg gacatcggca ccaactctgt gggctgggcc gtgatcaccg 1440
acgagtacaa ggtgcccagc aagaaattca aggtgctggg caacaccgac cggcacagca 1500
tcaagaagaa cctgatcgga gccctgctgt tcgacagcgg cgaaacagcc gaggccaccc 1560
ggctgaagag aaccgccaga agaagataca ccagacggaa gaaccggatc tgctatctgc 1620
aagagatctt cagcaacgag atggccaagg tggacgacag cttcttccac agactggaag 1680
agtccttcct ggtggaagag gataagaagc acgagcggca ccccatcttc ggcaacatcg 1740
tggacgaggt ggcctaccac gagaagtacc ccaccatcta ccacctgaga aagaaactgg 1800
tggacagcac cgacaaggcc gacctgcggc tgatctatct ggccctggcc cacatgatca 1860
agttccgggg ccacttcctg atcgagggcg acctgaaccc cgacaacagc gacgtggaca 1920
agctgttcat ccagctggtg cagacctaca accagctgtt cgaggaaaac cccatcaacg 1980
ccagcggcgt ggacgccaag gccatcctgt ctgccagact gagcaagagc agacggctgg 2040
aaaatctgat cgcccagctg cccggcgaga agaagaatgg cctgttcgga aacctgattg 2100
ccctgagcct gggcctgacc cccaacttca agagcaactt cgacctggcc gaggatgcca 2160
aactgcagct gagcaaggac acctacgacg acgacctgga caacctgctg gcccagatcg 2220
gcgaccagta cgccgacctg tttctggccg ccaagaacct gtccgacgcc atcctgctga 2280
gcgacatcct gagagtgaac accgagatca ccaaggcccc cctgagcgcc tctatgatca 2340
agagatacga cgagcaccac caggacctga ccctgctgaa agctctcgtg cggcagcagc 2400
tgcctgagaa gtacaaagag attttcttcg accagagcaa gaacggctac gccggctaca 2460
ttgacggcgg agccagccag gaagagttct acaagttcat caagcccatc ctggaaaaga 2520
tggacggcac cgaggaactg ctcgtgaagc tgaacagaga ggacctgctg cggaagcagc 2580
ggaccttcga caacggcagc atcccccacc agatccacct gggagagctg cacgccattc 2640
tgcggcggca ggaagatttt tacccattcc tgaaggacaa ccgggaaaag atcgagaaga 2700
tcctgacctt ccgcatcccc tactacgtgg gccctctggc caggggaaac agcagattcg 2760
cctggatgac cagaaagagc gaggaaacca tcaccccctg gaacttcgag gaagtggtgg 2820
acaagggcgc ttccgcccag agcttcatcg agcggatgac caacttcgat aagaacctgc 2880
ccaacgagaa ggtgctgccc aagcacagcc tgctgtacga gtacttcacc gtgtataacg 2940
agctgaccaa agtgaaatac gtgaccgagg gaatgagaaa gcccgccttc ctgagcggcg 3000
agcagaaaaa ggccatcgtg gacctgctgt tcaagaccaa ccggaaagtg accgtgaagc 3060
agctgaaaga ggactacttc aagaaaatcg agtgcttcga ctccgtggaa atctccggcg 3120
tggaagatcg gttcaacgcc tccctgggca cataccacga tctgctgaaa attatcaagg 3180
acaaggactt cctggacaat gaggaaaacg aggacattct ggaagatatc gtgctgaccc 3240
tgacactgtt tgaggacaga gagatgatcg aggaacggct gaaaacctat gcccacctgt 3300
tcgacgacaa agtgatgaag cagctgaagc ggcggagata caccggctgg ggcaggctga 3360
gccggaagct gatcaacggc atccgggaca agcagtccgg caagacaatc ctggatttcc 3420
tgaagtccga cggcttcgcc aacagaaact tcatgcagct gatccacgac gacagcctga 3480
cctttaaaga ggacatccag aaagcccagg tgtccggcca gggcgatagc ctgcacgagc 3540
acattgccaa tctggccggc agccccgcca ttaagaaggg catcctgcag acagtgaagg 3600
tggtggacga gctcgtgaaa gtgatgggcc ggcacaagcc cgagaacatc gtgatcgaaa 3660
tggccagaga gaaccagacc acccagaagg gacagaagaa cagccgcgag agaatgaagc 3720
ggatcgaaga gggcatcaaa gagctgggca gccagatcct gaaagaacac cccgtggaaa 3780
acacccagct gcagaacgag aagctgtacc tgtactacct gcagaatggg cgggatatgt 3840
acgtggacca ggaactggac atcaaccggc tgtccgacta cgatgtggac catatcgtgc 3900
ctcagagctt tctgaaggac gactccatcg acaacaaggt gctgaccaga agcgacaaga 3960
accggggcaa gagcgacaac gtgccctccg aagaggtcgt gaagaagatg aagaactact 4020
ggcggcagct gctgaacgcc aagctgatta cccagagaaa gttcgacaat ctgaccaagg 4080
ccgagagagg cggcctgagc gaactggata aggccggctt catcaagaga cagctggtgg 4140
aaacccggca gatcacaaag cacgtggcac agatcctgga ctcccggatg aacactaagt 4200
acgacgagaa tgacaagctg atccgggaag tgaaagtgat caccctgaag tccaagctgg 4260
tgtccgattt ccggaaggat ttccagtttt acaaagtgcg cgagatcaac aactaccacc 4320
acgcccacga cgcctacctg aacgccgtcg tgggaaccgc cctgatcaaa aagtacccta 4380
agctggaaag cgagttcgtg tacggcgact acaaggtgta cgacgtgcgg aagatgatcg 4440
ccaagagcga gcaggaaatc ggcaaggcta ccgccaagta cttcttctac agcaacatca 4500
tgaacttttt caagaccgag attaccctgg ccaacggcga gatccggaag cggcctctga 4560
tcgagacaaa cggcgaaacc ggggagatcg tgtgggataa gggccgggat tttgccaccg 4620
tgcggaaagt gctgagcatg ccccaagtga atatcgtgaa aaagaccgag gtgcagacag 4680
gcggcttcag caaagagtct atcctgccca agaggaacag cgataagctg atcgccagaa 4740
agaaggactg ggaccctaag aagtacggcg gcttcgacag ccccaccgtg gcctattctg 4800
tgctggtggt ggccaaagtg gaaaagggca agtccaagaa actgaagagt gtgaaagagc 4860
tgctggggat caccatcatg gaaagaagca gcttcgagaa gaatcccatc gactttctgg 4920
aagccaaggg ctacaaagaa gtgaaaaagg acctgatcat caagctgcct aagtactccc 4980
tgttcgagct ggaaaacggc cggaagagaa tgctggcctc tgccggcgaa ctgcagaagg 5040
gaaacgaact ggccctgccc tccaaatatg tgaacttcct gtacctggcc agccactatg 5100
agaagctgaa gggctccccc gaggataatg agcagaaaca gctgtttgtg gaacagcaca 5160
agcactacct ggacgagatc atcgagcaga tcagcgagtt ctccaagaga gtgatcctgg 5220
ccgacgctaa tctggacaaa gtgctgtccg cctacaacaa gcaccgggat aagcccatca 5280
gagagcaggc cgagaatatc atccacctgt ttaccctgac caatctggga gcccctgccg 5340
ccttcaagta ctttgacacc accatcgacc ggaagaggta caccagcacc aaagaggtgc 5400
tggacgccac cctgatccac cagagcatca ccggcctgta cgagacacgg atcgacctgt 5460
ctcagctggg aggcgacaaa aggccggcgg ccacgaaaaa ggccggccag gcaaaaaaga 5520
aaaaggaatt cggcagtgga gagggcagag gaagtctgct aacatgcggt gacgtcgagg 5580
agaatcctgg cccaatgacc gagtacaagc ccacggtgcg cctcgccacc cgcgacgacg 5640
tccccagggc cgtacgcacc ctcgccgccg cgttcgccga ctaccccgcc acgcgccaca 5700
ccgtcgatcc ggaccgccac atcgagcggg tcaccgagct gcaagaactc ttcctcacgc 5760
gcgtcgggct cgacatcggc aaggtgtggg tcgcggacga cggcgccgcg gtggcggtct 5820
ggaccacgcc ggagagcgtc gaagcggggg cggtgttcgc cgagatcggc ccgcgcatgg 5880
ccgagttgag cggttcccgg ctggccgcgc agcaacagat ggaaggcctc ctggcgccgc 5940
accggcccaa ggagcccgcg tggttcctgg ccaccgtcgg agtctcgccc gaccaccagg 6000
gcaagggtct gggcagcgcc gtcgtgctcc ccggagtgga ggcggccgag cgcgccgggg 6060
tgcccgcctt cctggagacc tccgcgcccc gcaacctccc cttctacgag cggctcggct 6120
tcaccgtcac cgccgacgtc gaggtgcccg aaggaccgcg cacctggtgc atgacccgca 6180
agcccggtgc ctgagaattc taactagagc tcgctgatca gcctcgactg tgccttctag 6240
ttgccagcca tctgttgttt gcccctcccc cgtgccttcc ttgaccctgg aaggtgccac 6300
tcccactgtc ctttcctaat aaaatgagga aattgcatcg cattgtctga gtaggtgtca 6360
ttctattctg gggggtgggg tggggcagga cagcaagggg gaggattggg aagagaatag 6420
caggcatgct ggggagcggc cgcaggaacc cctagtgatg gagttggcca ctccctctct 6480
gcgcgctcgc tcgctcactg aggccgggcg accaaaggtc gcccgacgcc cgggctttgc 6540
ccgggcggcc tcagtgagcg agcgagcgcg cagctgcctg caggggcgcc tgatgcggta 6600
ttttctcctt acgcatctgt gcggtatttc acaccgcata cgtcaaagca accatagtac 6660
gcgccctgta gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct 6720
acacttgcca gcgccttagc gcccgctcct ttcgctttct tcccttcctt tctcgccacg 6780
ttcgccggct ttccccgtca agctctaaat cgggggctcc ctttagggtt ccgatttagt 6840
gctttacggc acctcgaccc caaaaaactt gatttgggtg atggttcacg tagtgggcca 6900
tcgccctgat agacggtttt tcgccctttg acgttggagt ccacgttctt taatagtgga 6960
ctcttgttcc aaactggaac aacactcaac tctatctcgg gctattcttt tgatttataa 7020
gggattttgc cgatttcggt ctattggtta aaaaatgagc tgatttaaca aaaatttaac 7080
gcgaatttta acaaaatatt aacgtttaca attttatggt gcactctcag tacaatctgc 7140
tctgatgccg catagttaag ccagccccga cacccgccaa cacccgctga cgcgccctga 7200
cgggcttgtc tgctcccggc atccgcttac agacaagctg tgaccgtctc cgggagctgc 7260
atgtgtcaga ggttttcacc gtcatcaccg aaacgcgcga gacgaaaggg cctcgtgata 7320
cgcctatttt tataggttaa tgtcatgata ataatggttt cttagacgtc aggtggcact 7380
tttcggggaa atgtgcgcgg aacccctatt tgtttatttt tctaaataca ttcaaatatg 7440
tatccgctca tgagacaata accctgataa atgcttcaat aatattgaaa aaggaagagt 7500
atgagtattc aacatttccg tgtcgccctt attccctttt ttgcggcatt ttgccttcct 7560
gtttttgctc acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgca 7620
cgagtgggtt acatcgaact ggatctcaac agcggtaaga tccttgagag ttttcgcccc 7680
gaagaacgtt ttccaatgat gagcactttt aaagttctgc tatgtggcgc ggtattatcc 7740
cgtattgacg ccgggcaaga gcaactcggt cgccgcatac actattctca gaatgacttg 7800
gttgagtact caccagtcac agaaaagcat cttacggatg gcatgacagt aagagaatta 7860
tgcagtgctg ccataaccat gagtgataac actgcggcca acttacttct gacaacgatc 7920
ggaggaccga aggagctaac cgcttttttg cacaacatgg gggatcatgt aactcgcctt 7980
gatcgttggg aaccggagct gaatgaagcc ataccaaacg acgagcgtga caccacgatg 8040
cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg gcgaactact tactctagct 8100
tcccggcaac aattaataga ctggatggag gcggataaag ttgcaggacc acttctgcgc 8160
tcggcccttc cggctggctg gtttattgct gataaatctg gagccggtga gcgtggaagc 8220
cgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctac 8280
acgacgggga gtcaggcaac tatggatgaa cgaaatagac agatcgctga gataggtgcc 8340
tcactgatta agcattggta actgtcagac caagtttact catatatact ttagattgat 8400
ttaaaacttc atttttaatt taaaaggatc taggtgaaga tcctttttga taatctcatg 8460
accaaaatcc cttaacgtga gttttcgttc cactgagcgt cagaccccgt agaaaagatc 8520
aaaggatctt cttgagatcc tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa 8580
ccaccgctac cagcggtggt ttgtttgccg gatcaagagc taccaactct ttttccgaag 8640
gtaactggct tcagcagagc gcagatacca aatactgttc ttctagtgta gccgtagtta 8700
ggccaccact tcaagaactc tgtagcaccg cctacatacc tcgctctgct aatcctgtta 8760
ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg ggttggactc aagacgatag 8820
ttaccggata aggcgcagcg gtcgggctga acggggggtt cgtgcacaca gcccagcttg 8880
gagcgaacga cctacaccga actgagatac ctacagcgtg agctatgaga aagcgccacg 8940
cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg gcagggtcgg aacaggagag 9000
cgcacgaggg agcttccagg gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc 9060
cacctctgac ttgagcgtcg atttttgtga tgctcgtcag gggggcggag cctatggaaa 9120
aacgccagca acgcggcctt tttacggttc ctggcctttt gctggccttt tgctcacatg 9180
t 9181
<210> 14
<211> 584
<212> DNA
<213> Artificial sequence
<400> 14
gacattgatt attgactagt tattaatagt aatcaattac ggggtcatta gttcatagcc 60
catatatgga gttccgcgtt acataactta cggtaaatgg cccgcctggc tgaccgccca 120
acgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg ccaataggga 180
ctttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg gcagtacatc 240
aagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 300
ggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac atctacgtat 360
tagtcatcgc tattaccatg gtgatgcggt tttggcagta catcaatggg cgtggatagc 420
ggtttgactc acggggattt ccaagtctcc accccattga cgtcaatggg agtttgtttt 480
ggcaccaaaa tcaacgggac tttccaaaat gtcgtaacaa ctccgcccca ttgacgcaaa 540
tgggcggtag gcgtgtacgg tgggaggtct atataagcag agct 584
<210> 15
<211> 600
<212> DNA
<213> Artificial sequence
<400> 15
atgaccgagt acaagcccac ggtgcgcctc gccacccgcg acgacgtccc cagggccgta 60
cgcaccctcg ccgccgcgtt cgccgactac cccgccacgc gccacaccgt cgatccggac 120
cgccacatcg agcgggtcac cgagctgcaa gaactcttcc tcacgcgcgt cgggctcgac 180
atcggcaagg tgtgggtcgc ggacgacggc gccgcggtgg cggtctggac cacgccggag 240
agcgtcgaag cgggggcggt gttcgccgag atcggcccgc gcatggccga gttgagcggt 300
tcccggctgg ccgcgcagca acagatggaa ggcctcctgg cgccgcaccg gcccaaggag 360
cccgcgtggt tcctggccac cgtcggcgtc tcgcccgacc accagggcaa gggtctgggc 420
agcgccgtcg tgctccccgg agtggaggcg gccgagcgcg ccggggtgcc cgccttcctg 480
gagacctccg cgccccgcaa cctccccttc tacgagcggc tcggcttcac cgtcaccgcc 540
gacgtcgagg tgcccgaagg accgcgcacc tggtgcatga cccgcaagcc cggtgcctga 600
<210> 16
<211> 714
<212> DNA
<213> Artificial sequence
<400> 16
gtgagcaagg gcgaggagct gttcaccggg gtggtgccca tcctggtcga gctggacggc 60
gacgtaaacg gccacaagtt cagcgtgtcc ggcgagggcg agggcgatgc cacctacggc 120
aagctgaccc tgaagttcat ctgcaccacc ggcaagctgc ccgtgccctg gcccaccctc 180
gtgaccaccc tgacctacgg cgtgcagtgc ttcagccgct accccgacca catgaagcag 240
cacgacttct tcaagtccgc catgcccgaa ggctacgtcc aggagcgcac catcttcttc 300
aaggacgacg gcaactacaa gacccgcgcc gaggtgaagt tcgagggcga caccctggtg 360
aaccgcatcg agctgaaggg catcgacttc aaggaggacg gcaacatcct ggggcacaag 420
ctggagtaca actacaacag ccacaacgtc tatatcatgg ccgacaagca gaagaacggc 480
atcaaggtga acttcaagat ccgccacaac atcgaggacg gcagcgtgca gctcgccgac 540
cactaccagc agaacacccc catcggcgac ggccccgtgc tgctgcccga caaccactac 600
ctgagcaccc agtccgccct gagcaaagac cccaacgaga agcgcgatca catggtcctg 660
ctggagttcg tgaccgccgc cgggatcact ctcggcatgg acgagctgta caag 714
<210> 17
<211> 8111
<212> DNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (2581)..(2587)
<223> n is a, c, g, or t
<400> 17
agcttaatgt agtcttatgc aatactcttg tagtcttgca acatggtaac gatgagttag 60
caacatgcct tacaaggaga gaaaaagcac cgtgcatgcc gattggtgga agtaaggtgg 120
tacgatcgtg ccttattagg aaggcaacag acgggtctga catggattgg acgaaccact 180
gaattgccgc attgcagaga tattgtattt aagtgcctag ctcgatacat aaacgggtct 240
ctctggttag accagatctg agcctgggag ctctctggct aactagggaa cccactgctt 300
aagcctcaat aaagcttgcc ttgagtgctt caagtagtgt gtgcccgtct gttgtgtgac 360
tctggtaact agagatccct cagacccttt tagtcagtgt ggaaaatctc tagcagtggc 420
gcccgaacag ggacttgaaa gcgaaaggga aaccagagga gctctctcga cgcaggactc 480
ggcttgctga agcgcgcacg gcaagaggcg aggggcggcg actggtgagt acgccaaaaa 540
ttttgactag cggaggctag aaggagagag atgggtgcga gagcgtcagt attaagcggg 600
ggagaattag atcgcgatgg gaaaaaattc ggttaaggcc agggggaaag aaaaaatata 660
aattaaaaca tatagtatgg gcaagcaggg agctagaacg attcgcagtt aatcctggcc 720
tgttagaaac atcagaaggc tgtagacaaa tactgggaca gctacaacca tcccttcaga 780
caggatcaga agaacttaga tcattatata atacagtagc aaccctctat tgtgtgcatc 840
aaaggataga gataaaagac accaaggaag ctttagacaa gatagaggaa gagcaaaaca 900
aaagtaagac caccgcacag caagcggccg ctgatcttca gacctggagg aggagatatg 960
agggacaatt ggagaagtga attatataaa tataaagtag taaaaattga accattagga 1020
gtagcaccca ccaaggcaaa gagaagagtg gtgcagagag aaaaaagagc agtgggaata 1080
ggagctttgt tccttgggtt cttgggagca gcaggaagca ctatgggcgc agcgtcaatg 1140
acgctgacgg tacaggccag acaattattg tctggtatag tgcagcagca gaacaatttg 1200
ctgagggcta ttgaggcgca acagcatctg ttgcaactca cagtctgggg catcaagcag 1260
ctccaggcaa gaatcctggc tgtggaaaga tacctaaagg atcaacagct cctggggatt 1320
tggggttgct ctggaaaact catttgcacc actgctgtgc cttggaatgc tagttggagt 1380
aataaatctc tggaacagat ttggaatcac acgacctgga tggagtggga cagagaaatt 1440
aacaattaca caagcttaat acactcctta attgaagaat cgcaaaacca gcaagaaaag 1500
aatgaacaag aattattgga attagataaa tgggcaagtt tgtggaattg gtttaacata 1560
acaaattggc tgtggtatat aaaattattc ataatgatag taggaggctt ggtaggttta 1620
agaatagttt ttgctgtact ttctatagtg aatagagtta ggcagggata ttcaccatta 1680
tcgtttcaga cccacctccc aaccccgagg ggacccgaca ggcccgaagg aatagaagaa 1740
gaaggtggag agagagacag agacagatcc attcgattag tgaacggatc tcgacggtat 1800
cggttaactt ttaaaagaaa aggggggatt ggggggtaca gtgcagggga aagaatagta 1860
gacataatag caacagacat acaaactaaa gaattacaaa aacaaattac aaaaattcaa 1920
aattttatcg atcacgagac tagcctcgag aagcttgata tcgacattga ttattgacta 1980
gttattaata gtaatcaatt acggggtcat tagttcatag cccatatatg gagttccgcg 2040
ttacataact tacggtaaat ggcccgcctg gctgaccgcc caacgacccc cgcccattga 2100
cgtcaataat gacgtatgtt cccatagtaa cgccaatagg gactttccat tgacgtcaat 2160
gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat catatgccaa 2220
gtacgccccc tattgacgtc aatgacggta aatggcccgc ctggcattat gcccagtaca 2280
tgaccttatg ggactttcct acttggcagt acatctacgt attagtcatc gctattacca 2340
tggtgatgcg gttttggcag tacatcaatg ggcgtggata gcggtttgac tcacggggat 2400
ttccaagtct ccaccccatt gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg 2460
actttccaaa atgtcgtaac aactccgccc cattgacgca aatgggcggt aggcgtgtac 2520
ggtgggaggt ctatataagc agagctgcca ccatggaacg gctcggagat catcattgcg 2580
nnnnnnngtg agcaagggcg aggagctgtt caccggggtg gtgcccatcc tggtcgagct 2640
ggacggcgac gtaaacggcc acaagttcag cgtgtccggc gagggcgagg gcgatgccac 2700
ctacggcaag ctgaccctga agttcatctg caccaccggc aagctgcccg tgccctggcc 2760
caccctcgtg accaccctga cctacggcgt gcagtgcttc agccgctacc ccgaccacat 2820
gaagcagcac gacttcttca agtccgccat gcccgaaggc tacgtccagg agcgcaccat 2880
cttcttcaag gacgacggca actacaagac ccgcgccgag gtgaagttcg agggcgacac 2940
cctggtgaac cgcatcgagc tgaagggcat cgacttcaag gaggacggca acatcctggg 3000
gcacaagctg gagtacaact acaacagcca caacgtctat atcatggccg acaagcagaa 3060
gaacggcatc aaggtgaact tcaagatccg ccacaacatc gaggacggca gcgtgcagct 3120
cgccgaccac taccagcaga acacccccat cggcgacggc cccgtgctgc tgcccgacaa 3180
ccactacctg agcacccagt ccgccctgag caaagacccc aacgagaagc gcgatcacat 3240
ggtcctgctg gagttcgtga ccgccgccgg gatcactctc ggcatggacg agctgtacaa 3300
ggagggcaga ggaagtcttc taacatgcgg tgacgtggag gagaatcccg gccctatgac 3360
cgagtacaag cccacggtgc gcctcgccac ccgcgacgac gtccccaggg ccgtacgcac 3420
cctcgccgcc gcgttcgccg actaccccgc cacgcgccac accgtcgatc cggaccgcca 3480
catcgagcgg gtcaccgagc tgcaagaact cttcctcacg cgcgtcgggc tcgacatcgg 3540
caaggtgtgg gtcgcggacg acggcgccgc ggtggcggtc tggaccacgc cggagagcgt 3600
cgaagcgggg gcggtgttcg ccgagatcgg cccgcgcatg gccgagttga gcggttcccg 3660
gctggccgcg cagcaacaga tggaaggcct cctggcgccg caccggccca aggagcccgc 3720
gtggttcctg gccaccgtcg gcgtctcgcc cgaccaccag ggcaagggtc tgggcagcgc 3780
cgtcgtgctc cccggagtgg aggcggccga gcgcgccggg gtgcccgcct tcctggagac 3840
ctccgcgccc cgcaacctcc ccttctacga gcggctcggc ttcaccgtca ccgccgacgt 3900
cgaggtgccc gaaggaccgc gcacctggtg catgacccgc aagcccggtg cctgagtcga 3960
caatcaacct ctggattaca aaatttgtga aagattgact ggtattctta actatgttgc 4020
tccttttacg ctatgtggat acgctgcttt aatgcctttg tatcatgcta ttgcttcccg 4080
tatggctttc attttctcct ccttgtataa atcctggttg ctgtctcttt atgaggagtt 4140
gtggcccgtt gtcaggcaac gtggcgtggt gtgcactgtg tttgctgacg caacccccac 4200
tggttggggc attgccacca cctgtcagct cctttccggg actttcgctt tccccctccc 4260
tattgccacg gcggaactca tcgccgcctg ccttgcccgc tgctggacag gggctcggct 4320
gttgggcact gacaattccg tggtgttgtc ggggaagctg acgtcctttc catggctgct 4380
cgcctgtgtt gccacctgga ttctgcgcgg gacgtccttc tgctacgtcc cttcggccct 4440
caatccagcg gaccttcctt cccgcggcct gctgccggct ctgcggcctc ttccgcgtct 4500
tcgccttcgc cctcagacga gtcggatctc cctttgggcc gcctccccgc ctggaattcg 4560
agctcggtac ctttaagacc aatgacttac aaggcagctg tagatcttag ccacttttta 4620
aaagaaaagg ggggactgga agggctaatt cactcccaac gaagacaaga tctgcttttt 4680
gcttgtactg ggtctctctg gttagaccag atctgagcct gggagctctc tggctaacta 4740
gggaacccac tgcttaagcc tcaataaagc ttgccttgag tgcttcaagt agtgtgtgcc 4800
cgtctgttgt gtgactctgg taactagaga tccctcagac ccttttagtc agtgtggaaa 4860
atctctagca gtagtagttc atgtcatctt attattcagt atttataact tgcaaagaaa 4920
tgaatatcag agagtgagag gaacttgttt attgcagctt ataatggtta caaataaagc 4980
aatagcatca caaatttcac aaataaagca tttttttcac tgcattctag ttgtggtttg 5040
tccaaactca tcaatgtatc ttatcatgtc tggctctagc tatcccgccc ctaactccgc 5100
ccatcccgcc cctaactccg cccagttccg cccattctcc gccccatggc tgactaattt 5160
tttttattta tgcagaggcc gaggccgcct cggcctctga gctattccag aagtagtgag 5220
gaggcttttt tggaggccta gggacgtacc caattcgccc tatagtgagt cgtattacgc 5280
gcgctcactg gccgtcgttt tacaacgtcg tgactgggaa aaccctggcg ttacccaact 5340
taatcgcctt gcagcacatc cccctttcgc cagctggcgt aatagcgaag aggcccgcac 5400
cgatcgccct tcccaacagt tgcgcagcct gaatggcgaa tgggacgcgc cctgtagcgg 5460
cgcattaagc gcggcgggtg tggtggttac gcgcagcgtg accgctacac ttgccagcgc 5520
cctagcgccc gctcctttcg ctttcttccc ttcctttctc gccacgttcg ccggctttcc 5580
ccgtcaagct ctaaatcggg ggctcccttt agggttccga tttagtgctt tacggcacct 5640
cgaccccaaa aaacttgatt agggtgatgg ttcacgtagt gggccatcgc cctgatagac 5700
ggtttttcgc cctttgacgt tggagtccac gttctttaat agtggactct tgttccaaac 5760
tggaacaaca ctcaacccta tctcggtcta ttcttttgat ttataaggga ttttgccgat 5820
ttcggcctat tggttaaaaa atgagctgat ttaacaaaaa tttaacgcga attttaacaa 5880
aatattaacg cttacaattt aggtggcact tttcggggaa atgtgcgcgg aacccctatt 5940
tgtttatttt tctaaataca ttcaaatatg tatccgctca tgagacaata accctgataa 6000
atgcttcaat aatattgaaa aaggaagagt atgagtattc aacatttccg tgtcgccctt 6060
attccctttt ttgcggcatt ttgccttcct gtttttgctc acccagaaac gctggtgaaa 6120
gtaaaagatg ctgaagatca gttgggtgca cgagtgggtt acatcgaact ggatctcaac 6180
agcggtaaga tccttgagag ttttcgcccc gaagaacgtt ttccaatgat gagcactttt 6240
aaagttctgc tatgtggcgc ggtattatcc cgtattgacg ccgggcaaga gcaactcggt 6300
cgccgcatac actattctca gaatgacttg gttgagtact caccagtcac agaaaagcat 6360
cttacggatg gcatgacagt aagagaatta tgcagtgctg ccataaccat gagtgataac 6420
actgcggcca acttacttct gacaacgatc ggaggaccga aggagctaac cgcttttttg 6480
cacaacatgg gggatcatgt aactcgcctt gatcgttggg aaccggagct gaatgaagcc 6540
ataccaaacg acgagcgtga caccacgatg cctgtagcaa tggcaacaac gttgcgcaaa 6600
ctattaactg gcgaactact tactctagct tcccggcaac aattaataga ctggatggag 6660
gcggataaag ttgcaggacc acttctgcgc tcggcccttc cggctggctg gtttattgct 6720
gataaatctg gagccggtga gcgtgggtct cgcggtatca ttgcagcact ggggccagat 6780
ggtaagccct cccgtatcgt agttatctac acgacgggga gtcaggcaac tatggatgaa 6840
cgaaatagac agatcgctga gataggtgcc tcactgatta agcattggta actgtcagac 6900
caagtttact catatatact ttagattgat ttaaaacttc atttttaatt taaaaggatc 6960
taggtgaaga tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc 7020
cactgagcgt cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg 7080
cgcgtaatct gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg 7140
gatcaagagc taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca 7200
aatactgttc ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg 7260
cctacatacc tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg 7320
tgtcttaccg ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga 7380
acggggggtt cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac 7440
ctacagcgtg agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat 7500
ccggtaagcg gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc 7560
tggtatcttt atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga 7620
tgctcgtcag gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc 7680
ctggcctttt gctggccttt tgctcacatg ttctttcctg cgttatcccc tgattctgtg 7740
gataaccgta ttaccgcctt tgagtgagct gataccgctc gccgcagccg aacgaccgag 7800
cgcagcgagt cagtgagcga ggaagcggaa gagcgcccaa tacgcaaacc gcctctcccc 7860
gcgcgttggc cgattcatta atgcagctgg cacgacaggt ttcccgactg gaaagcgggc 7920
agtgagcgca acgcaattaa tgtgagttag ctcactcatt aggcacccca ggctttacac 7980
tttatgcttc cggctcgtat gttgtgtgga attgtgagcg gataacaatt tcacacagga 8040
aacagctatg accatgatta cgccaagcgc gcaattaacc ctcactaaag ggaacaaaag 8100
ctggagctgc a 8111
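
The listing above follows the usual layout of groups of ten residues followed by a running residue count, and SEQ ID NO: 17 declares a misc_feature of unspecified residues at (2581)..(2587). A minimal Python sketch of how such lines could be joined and checked is given below. The function names `parse_listing_lines` and `n_runs` and the `offset` parameter are hypothetical helpers introduced for illustration only; the two input lines are copied from SEQ ID NO: 17 (the lines ending at 2580 and 2640).

```python
import re


def parse_listing_lines(lines, offset=0):
    """Join listing lines into one sequence string.

    Each line holds space-separated residue groups followed by a running
    residue count. `offset` is the number of residues that precede the
    first line passed in (hypothetical convenience parameter).
    """
    parts = []
    total = offset
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        expected = None
        if tokens[-1].isdigit():
            # The trailing token is the cumulative count, not sequence.
            expected = int(tokens.pop())
        chunk = "".join(tokens)
        parts.append(chunk)
        total += len(chunk)
        if expected is not None and total != expected:
            raise ValueError(f"running count {expected} != residues seen {total}")
    return "".join(parts)


def n_runs(seq, offset=0):
    """Return 1-based (start, end) intervals of consecutive 'n' residues."""
    return [(m.start() + 1 + offset, m.end() + offset)
            for m in re.finditer(r"n+", seq)]


lines = [
    "ggtgggaggt ctatataagc agagctgcca ccatggaacg gctcggagat catcattgcg 2580",
    "nnnnnnngtg agcaagggcg aggagctgtt caccggggtg gtgcccatcc tggtcgagct 2640",
]
fragment = parse_listing_lines(lines, offset=2520)
runs = n_runs(fragment, offset=2520)
print(len(fragment), runs)
```

On these two lines the sketch reports a single run of unspecified residues at positions 2581 to 2587, which agrees with the <222> entry for SEQ ID NO: 17.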

Claims (25)

1. A method of engineering a sgRNA molecule, comprising the steps of: introducing a sequence that performs a nuclear localization function in a snoRNA molecule into the nucleotide chain of the sgRNA to obtain the engineered sgRNA molecule.
2. The method of claim 1, wherein: the sgRNA molecule comprises a guide sequence, and a backbone sequence comprising a sequence that performs a nuclear localization function in the snoRNA molecule;
further, the sequence that performs a nuclear localization function in the snoRNA molecule is a box C' sequence and/or a box D sequence;
still further, the box C' sequence comprises DGAHBN, wherein D is U, G or A; H is U, A or C; B is G, U or C; and N may be any ribonucleotide; the box D sequence comprises NYVWGA or CUGA, wherein N may be any ribonucleotide; Y is C or U; V is C, G or A; and W is U or A;
still further, the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA.
3. The method according to claim 1 or 2, characterized in that: the engineered sgRNA molecule can direct Cas protein to the nucleus;
further, the engineered sgRNA molecule can direct Cas protein to the nucleus and target or modify a target nucleic acid;
further, the Cas protein is optionally selected from Cas9, Cas12, and Cas13.
4. The method of claim 3, wherein the Cas protein is free of nuclear localization sequences.
5. An engineered sgRNA molecule, the nucleotide chain of which comprises a sequence that performs a nuclear localization function in a snoRNA molecule.
6. The engineered sgRNA molecule of claim 5, wherein: the sgRNA molecule comprises a guide sequence, and a backbone sequence comprising a sequence that performs a nuclear localization function in the snoRNA molecule.
7. The engineered sgRNA molecule of claim 5 or 6, wherein: the sequence that performs a nuclear localization function in the snoRNA molecule is selected from the following sequences (a1) and/or (a2):
(a1) box C' and/or box D sequences from snoRNA,
(a2) box H and/or box ACA sequences from snoRNA.
8. The engineered sgRNA molecule of claim 7, wherein: the sequences that perform a nuclear localization function in the snoRNA molecule are box C' and box D;
further, the box C' sequence comprises or is DGAHBN, wherein D is U, G or A; H is U, A or C; B is G, U or C; and N may be any ribonucleotide; the box D sequence comprises or is NYVWGA, GGCUGA or CUGA, wherein N may be any ribonucleotide; Y is C or U; V is C, G or A; and W is U or A;
still further, the box C' sequence comprises or is GAGGAAGA, and the box D sequence comprises CUGA or is selected from GGCUGA and CUGA;
still further, the sequence that performs a nuclear localization function in the snoRNA molecule comprises GAGGAAGAGCGUCAGCAGGCUGA;
further, the sequence that performs a nuclear localization function in the snoRNA molecule is GAGGAAGAGCGUCAGCAGGCUGA.
9. The engineered sgRNA molecule of any one of claims 5-8, wherein the snoRNA molecule is optionally selected from: U3 snoRNA, U8 snoRNA, U14 snoRNA, U15 snoRNA, U16 snoRNA, U20 snoRNA, U21 snoRNA, and U24 to U63 snoRNA.
10. The engineered sgRNA molecule of any one of claims 5-9, wherein: the engineered sgRNA molecule is obtained by inserting the sequence that performs a nuclear localization function in the snoRNA molecule into, or substituting it for, a non-base-paired sequence in the secondary structure of the backbone sequence of the sgRNA molecule before modification;
further, the non-base-paired sequence in the secondary structure of the backbone sequence is a loop sequence in the secondary structure of the backbone sequence of the sgRNA molecule before modification.
11. The engineered sgRNA molecule of claim 10, wherein: the engineered sgRNA molecule is obtained by connecting the backbone sequence of the sgRNA molecule before modification to the sequence that performs a nuclear localization function in the snoRNA molecule through a connecting sequence;
further, one end of the sequence that performs a nuclear localization function in the snoRNA molecule is connected to the backbone sequence of the sgRNA molecule before modification through a connecting sequence 1, and the other end is connected to that backbone sequence through a connecting sequence 2.
12. The engineered sgRNA molecule of any one of claims 5-11, wherein: more than 2 sequences that perform a nuclear localization function in the snoRNA molecule are present on the nucleotide chain of the sgRNA, and these nuclear localization sequences are connected to each other directly or through a connecting sequence 3.
13. The engineered sgRNA molecule of any one of claims 5-12, wherein: the guide sequence of the engineered sgRNA molecule is 10 bp-50 bp in length.
14. The engineered sgRNA molecule of any one of claims 5-13, wherein: the backbone sequence of the engineered sgRNA molecule is 10 bp-300 bp in length.
15. The engineered sgRNA molecule of any one of claims 5-14, wherein: the engineered sgRNA molecule can direct Cas protein to the nucleus;
further, the engineered sgRNA molecule can direct Cas protein to the nucleus and target or modify the target nucleic acid.
16. The engineered sgRNA molecule of claim 15, wherein the Cas protein is free of nuclear localization sequences.
17. The engineered sgRNA molecule of claim 15 or 16, wherein: the Cas protein is optionally selected from Cas9, Cas12, and Cas13.
18. A DNA molecule encoding the engineered sgRNA molecule of any one of claims 5-17.
19. An expression cassette, expression vector, recombinant bacterium or transgenic cell line comprising the DNA molecule of claim 18.
20. A kit comprising any one of:
i. a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;
ii. an expression vector 1 comprising a nucleotide sequence encoding a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;
iii. a Cas protein, and an expression vector 2 comprising a nucleotide sequence encoding the engineered sgRNA molecule of any one of claims 5-17;
iv. an expression vector 1 comprising a nucleotide sequence encoding a Cas protein, and an expression vector 2 comprising a nucleotide sequence encoding the engineered sgRNA molecule of any one of claims 5-17;
v. an expression vector 3 comprising a nucleotide sequence encoding a Cas protein and a nucleotide sequence encoding the engineered sgRNA molecule of any one of claims 5-17.
21. A composition selected from any one of:
I. a composition comprising: a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;
II. a composition comprising: a nucleic acid molecule 1 encoding a Cas protein, and the engineered sgRNA molecule of any one of claims 5-17;
III. a composition comprising: a Cas protein, and a nucleic acid molecule 2 encoding the engineered sgRNA molecule of any one of claims 5-17;
IV. a composition comprising: a nucleic acid molecule 1 encoding a Cas protein, and a nucleic acid molecule 2 encoding the engineered sgRNA molecule of any one of claims 5-17;
V. a composition comprising: a nucleic acid molecule 3 encoding a Cas protein and the engineered sgRNA molecule of any one of claims 5-17.
22. The engineered sgRNA molecule of any one of claims 5-17, the DNA molecule of claim 18, the expression cassette, expression vector, recombinant bacterium, or transgenic cell line of claim 19, the kit of claim 20, or the composition of claim 21, for use in any one of:
P1. targeting or modifying a genomic target nucleic acid; and
P2. preparing a product for targeting or modifying a genomic target nucleic acid.
23. A method of targeting or modifying a genomic target nucleic acid comprising: introducing the composition of claim 21 into an organism or biological cell such that both the Cas protein and the engineered sgRNA molecule are expressed, resulting in targeting or modification of a genomic target nucleic acid.
24. A method of making a mutant biological cell, comprising: targeting or modifying the genome of a biological cell by the method of claim 23 to obtain a mutant of the biological cell.
25. A method of making a biological mutant, comprising: targeting or modifying the genome of an organism by the method of claim 23 to obtain a biological mutant.
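
The degenerate box C' and box D motifs recited in claims 2 and 8 can be checked mechanically against a candidate RNA sequence. The Python sketch below is illustrative only and is not part of the claims. It assumes the letter definitions given in the claims (D, H, B, N, Y, V, W) and applies them to the 23-nucleotide sequence of claim 8. The names `IUPAC_RNA`, `iupac_to_regex`, `find_motif`, and `ELEMENT` are hypothetical.

```python
import re

# Degenerate letters as defined in claims 2 and 8 (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "D": "[AGU]",   # U, G or A
    "H": "[ACU]",   # U, A or C
    "B": "[CGU]",   # G, U or C
    "Y": "[CU]",    # C or U
    "V": "[ACG]",   # C, G or A
    "W": "[AU]",    # U or A
    "N": "[ACGU]",  # any ribonucleotide
}


def iupac_to_regex(motif):
    """Translate a degenerate motif such as 'DGAHBN' into a regex string."""
    return "".join(IUPAC_RNA[base] for base in motif.upper())


def find_motif(rna, motif):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A lookahead lets the scan report matches that overlap each other.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(rna.upper())]


# Sequence recited in claim 8.
ELEMENT = "GAGGAAGAGCGUCAGCAGGCUGA"

for motif in ("DGAHBN", "NYVWGA", "GGCUGA", "CUGA"):
    print(motif, find_motif(ELEMENT, motif))
```

On the claim 8 sequence, this sketch finds DGAHBN once (GGAAGA, starting at position 3, inside GAGGAAGA), finds GGCUGA at position 18 and CUGA at position 20, and finds no match for NYVWGA, which is consistent with claim 8 listing GGCUGA and CUGA as alternatives to NYVWGA rather than as instances of it.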
CN202210539746.4A 2021-11-15 2022-05-18 Engineered sgRNA molecules and uses thereof Active CN114990104B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111348694 2021-11-15
CN2021113486944 2021-11-15

Publications (2)

Publication Number Publication Date
CN114990104A CN114990104A (en) 2022-09-02
CN114990104B CN114990104B (en) 2023-10-20

Family

ID=83026997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210539746.4A Active CN114990104B (en) 2021-11-15 2022-05-18 Engineered sgRNA molecules and uses thereof

Country Status (1)

Country Link
CN (1) CN114990104B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190002920A1 (en) * 2015-04-30 2019-01-03 The Brigham And Women's Hospital, Inc. Methods and kits for cloning-free genome editing
US20190194632A1 (en) * 2016-04-25 2019-06-27 The Regents Of The University Of California Methods and compositions for genomic editing
WO2020044039A1 (en) * 2018-08-29 2020-03-05 Oxford University Innovation Limited Modified sgrnas
CN110982818A (en) * 2019-12-20 2020-04-10 北京市农林科学院 Application of nuclear localization signal F4NLS in efficient creation of rice herbicide resistant material
US20210054371A1 (en) * 2019-08-19 2021-02-25 Minghong Zhong Conjugates of Guide RNA-Cas Protein Complex

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AARTHI NARAYANAN et al.: "Role of the Box C/D Motif in Localization of Small Nucleolar RNAs to Coiled Bodies and Nucleoli" *

Also Published As

Publication number Publication date
CN114990104B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
KR102191739B1 (en) Modified foot-and-mouth disease virus 3C protease, composition and method thereof
CN111108207A (en) Genome editing means for gene therapy of genetic disorders and gene therapy in combination with viral vectors
AU2021200863A1 (en) Genetically-modified cells comprising a modified human t cell receptor alpha constant region gene
AU2020289750A1 (en) Engineered meganucleases with recognition sequences found in the human T cell receptor alpha constant region gene
CN112375748B (en) Novel coronavirus chimeric recombinant vaccine based on vesicular stomatitis virus vector, and preparation method and application thereof
KR20150125994A (en) A cell expression system
CN107674862B (en) CIK modified by similar chimeric antigen receptor and preparation method and application thereof
KR101657717B1 (en) Mammalian expression vector
CN112941038B (en) Novel recombinant coronavirus based on vesicular stomatitis virus vector, and preparation method and application thereof
US20030024009A1 (en) Manipulation of the phenolic acid content and digestibility of plant cell walls by targeted expression of genes encoding cell wall degrading enzymes
CN109943566A (en) The sgRNAs of selectively targeted YBX1 gene and its application
CN114934031B (en) Novel Cas effect protein, gene editing system and application
CN106957859A (en) It is a kind of to be used to save measles virus, the system and method for recombinant measles virus
CN108026150A (en) Stem rust of wheat resistant gene and application method
CN112725348B (en) Gene and method for improving single-base editing efficiency of rice and application of gene
CN112442515B (en) Application of gRNA target combination in construction of hemophilia model pig cell line
CN111315212B (en) Genome edited birds
CN114990104B (en) Engineered sgRNA molecules and uses thereof
CN114525304B (en) Gene editing method
CN112442513B (en) Cas9 overexpression vector and construction method and application thereof
CN112538497B (en) CRISPR/Cas9 system and application thereof in construction of alpha, beta and alpha & beta thalassemia model pig cell lines
CN115212297A (en) Genetically engineered medicine for treating inflammatory arthritis and preparation method thereof
KR20140043890A (en) Regulated gene expression systems and constructs thereof
CN112522292B (en) CRISPR/Cas9 system for constructing congenital amaranth clone pig nuclear donor cells and application thereof
CN112522310B (en) CRISPR system and application thereof in construction of LRP5 gene mutant osteoporosis clone pig nuclear donor cell

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant