WO2023029492A1 - System and method for site-directed integration of exogenous genes - Google Patents

System and method for site-directed integration of exogenous genes

Info

Publication number
WO2023029492A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
protein
acid molecule
sequence
strand
Application number
PCT/CN2022/086979
Other languages
English (en)
French (fr)
Inventor
李伟
周琪
王晨鑫
方森
陈阳灿
Original Assignee
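Institute of Zoology, Chinese Academy of Sciences (中国科学院动物研究所)
Beijing Institute for Stem Cell and Regenerative Medicine (北京干细胞与再生医学研究院)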
Application filed by Institute of Zoology, Chinese Academy of Sciences and Beijing Institute for Stem Cell and Regenerative Medicine
Priority to CN202280059607.XA (CN117897481A)
Publication of WO2023029492A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/62: DNA sequences coding for fusion proteins
    • C12N5/00: Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/10: Cells modified by introduction of foreign genetic material
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10: Transferases (2.)
    • C12N9/12: Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases (RNAses), DNAses

Definitions

  • This application relates to the fields of genetic engineering and molecular biology.
  • The application relates to methods for site-directed integration of exogenous genes that rely neither on homology arms nor on linearization of the donor vector.
  • The present application also relates to systems and kits for editing nucleic acids and uses thereof, as well as methods for editing nucleic acids.
  • The systems, kits, and methods of the present application can be used to break one nucleic acid strand of a double-stranded target nucleic acid and form a flap at its end (particularly the end produced by the break). They can also be used to create flaps in a nucleic acid molecule of interest (e.g., genomic DNA) in order to insert a target nucleic acid, or to replace a nucleotide fragment in the nucleic acid molecule of interest (e.g., genomic DNA) with a target nucleic acid.
  • Gene editing technology is a hot field of biomedical research, and has broad application prospects in the clinical treatment of genetic diseases, the construction of animal models, and the genetic breeding of crops.
  • Gene editing technology includes operations such as deletion, addition, and replacement of a single nucleotide or a DNA sequence at a specific site in the genome.
  • Site-specific knock-in of exogenous genes can be achieved through homology-dependent recombination (HDR): homology arms of 500-3000 bp are added to both sides of the exogenous gene to achieve precise site-specific integration, but the efficiency is extremely low, only about 0.01%.
  • HDR: homology-dependent recombination
  • Cutting at the target site with a nuclease such as ZFN (zinc-finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 nuclease) generates a DNA double-strand break (DSB), which can promote homologous-recombination-mediated site-directed knock-in of exogenous genes.
  • ZFN: zinc-finger nuclease
  • TALEN: transcription activator-like effector nuclease
  • CRISPR/Cas9: clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 nuclease
  • HMEJ: homology-mediated end joining
  • Site-directed integration of exogenous DNA fragments can also be achieved using linear single-stranded DNA as the donor. Both ends of the single-stranded DNA donor carry 30-50 nt homology arms. After a nuclease cuts at a specific genomic site, the single strand is integrated into the DSB site by SDSA (synthesis-dependent strand annealing), thereby achieving integration at a specific genomic locus.
  • Linear single-stranded DNA donors are more efficient than HDR but not precise enough: extra base insertions and deletions often occur at the junction at the 5' end of the single-stranded DNA.
  • In addition, long linear single-stranded DNA is expensive to synthesize chemically and not readily available, so this method is not suitable for site-directed knock-in of large exogenous gene fragments (greater than 1 kb). When the insert exceeds 1 kb, the integration efficiency also drops significantly.
  • NHEJ-based site-specific knock-in methods include HITI (homology-independent targeted integration).
  • HITI: homology-independent targeted integration
  • NHEJ-based site-directed knock-in is not directional, and the junctions are often imprecise, so additional bases are prone to be inserted or deleted.
  • MMEJ-based site-specific knock-in builds on the NHEJ approach by introducing microhomology arms at both ends of the exogenous gene, but its efficiency is still very low.
  • Prime Editing is a novel gene editing method.
  • This method uses a fusion protein composed of an spCas9 nickase carrying the H840A mutation (nCas9) and the reverse transcriptase MLV-RT (murine leukemia virus reverse transcriptase), together with a pegRNA (prime editing guide RNA) engineered from a gRNA (guide RNA). It can achieve any single-base transition or transversion, as well as deletion, addition, and replacement of small DNA fragments.
  • The pegRNA is produced by adding a PBS (primer binding site) sequence and a template sequence to the 3' end of the gRNA; the template sequence contains the editing sequence and a sequence homologous to the genome at the nick site.
  • PBS: primer binding site
  • The complex formed by nCas9 and the pegRNA binds to the genomic target site and nicks the PAM strand. The PBS sequence of the pegRNA then anneals to the free 3' end of the nicked PAM strand, and MLV-RT, using the template sequence of the pegRNA as a template, extends the 3' end at the PAM-strand nick by reverse transcription, copying the editing sequence and the homologous sequence. Subsequently, through processes such as DNA single-strand displacement and mismatch repair, the nick is repaired and the edited sequence is integrated into the target site.
  • Because H840A nCas9 cuts only one strand of the double-stranded DNA (i.e., the PAM strand), it does not generate DSBs that would trigger NHEJ. The method is therefore unlikely to introduce additional base deletions or insertions, and its editing accuracy is high.
  • However, because the length of the template sequence on the pegRNA limits the length of the editable sequence, Prime Editing is suitable only for deletion or knock-in of sequences shorter than 100 bp.
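The pegRNA layout described above (spacer, scaffold, then a 3' extension carrying the template sequence followed by the PBS) can be sketched in Python. This is a minimal illustration, not the applicants' design procedure. It assumes that SpCas9(H840A) nicks the PAM strand 3 nt upstream of the PAM, a 13-nt PBS, and a caller-supplied scaffold string; the function `design_pegrna` and all sequences are hypothetical, and sequences are written as DNA (T instead of U).

```python
"""Sketch: assemble a hypothetical pegRNA (5'->3') as spacer + scaffold + RT template + PBS."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_pegrna(protospacer: str, edited_downstream: str, scaffold: str,
                  pbs_len: int = 13) -> str:
    """Return a hypothetical pegRNA written as DNA, 5'->3'.

    protospacer: 20-nt sequence on the PAM (nicked) strand, directly 5' of the PAM.
    edited_downstream: desired PAM-strand sequence 3' of the nick (edit plus homology).
    scaffold: Cas9-binding scaffold supplied by the caller (placeholder in this sketch).
    Assumption: the nick lies 3 nt upstream of the PAM, i.e. after protospacer position 17.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt SpCas9 protospacer")
    nick = len(protospacer) - 3
    if not 0 < pbs_len <= nick:
        raise ValueError("PBS must fit between the protospacer start and the nick")
    # Freed 3' end of the nicked PAM strand; the PBS pairs with it antiparallel.
    primer_end = protospacer.upper()[nick - pbs_len:nick]
    pbs = revcomp(primer_end)
    # Reverse transcription copies this template, so the new 3' extension
    # reads as edited_downstream on the PAM strand.
    rt_template = revcomp(edited_downstream)
    return protospacer.upper() + scaffold.upper() + rt_template + pbs
```

For example, `design_pegrna("GACGTCAGTACCGATCAGTC", "TAGGAAC", scaffold="GTTTTAGAGC")` returns a 50-nt string whose last 13 nt pair with protospacer positions 5-17; the scaffold here is a truncated placeholder, not a working sgRNA scaffold.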
  • A Cas protein, or “Cas nuclease”, is an RNA-guided nuclease. Cas proteins are also known as Csn1 nucleases or CRISPR-associated nucleases. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain repeats and spacers; the spacers are sequences complementary to mobile genetic elements and capable of targeting invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
  • crRNA: CRISPR RNA
  • tracrRNA: trans-activating crRNA
  • Type II CRISPR systems require a Cas protein and two RNAs (crRNA and tracrRNA) for DNA cleavage.
  • crRNA and tracrRNA can be engineered into a single guide RNA (“sgRNA” or “gRNA” for short). See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.
  • The term “complementary” means that two nucleic acid sequences can form hydrogen bonds with each other according to the base-pairing (Watson-Crick) principle, thereby forming a duplex.
  • The term “complementary” includes “substantially complementary” and “fully complementary”.
  • The term “fully complementary” means that every base in one nucleic acid sequence can pair with a base in the other nucleic acid strand, without mismatches or gaps.
  • The term “substantially complementary” means that most of the bases in one nucleic acid sequence can pair with bases in the other nucleic acid strand, while mismatches or gaps are tolerated (e.g., mismatches or gaps of one or several nucleotides).
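The distinction between fully and substantially complementary strands can be illustrated with a short, ungapped sketch. It lays two strands (both written 5'->3') antiparallel and counts positions that break Watson-Crick pairing. The `max_mismatches` cut-off is an illustrative assumption, not a value from the text, and gaps are not modelled (alignment tools handle those).

```python
"""Sketch: classify two strands as fully or substantially complementary (ungapped)."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pairing_mismatches(strand_a: str, strand_b: str) -> int:
    """Count non-Watson-Crick positions when strand_a (5'->3') is laid
    antiparallel against strand_b (5'->3'). Gaps are not modelled."""
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch handles equal-length, ungapped strands only")
    # A perfect partner of strand_a is its reverse complement.
    partner = strand_a.upper().translate(_COMPLEMENT)[::-1]
    return sum(x != y for x, y in zip(partner, strand_b.upper()))


def classify(strand_a: str, strand_b: str, max_mismatches: int = 3) -> str:
    """max_mismatches is a hypothetical cut-off chosen for illustration."""
    n = pairing_mismatches(strand_a, strand_b)
    if n == 0:
        return "fully complementary"
    # "Substantially": most bases pair, with a few mismatches tolerated.
    if n <= max_mismatches and n < len(strand_a) / 2:
        return "substantially complementary"
    return "not complementary"
```

For instance, `ATGCC` and `GGCAT` are fully complementary, while `ATGCC` and `GGCTT` differ at one position and are classified as substantially complementary.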
  • “DNA polymerase” refers to an enzyme capable of synthesizing a nucleic acid strand (DNA strand) using another nucleic acid strand (e.g., a DNA strand or an RNA strand) as a template.
  • The DNA polymerase may be a DNA-dependent DNA polymerase (i.e., an enzyme that synthesizes a complementary DNA strand using a DNA strand as a template) or an RNA-dependent DNA polymerase (i.e., an enzyme that synthesizes a complementary DNA strand using an RNA strand as a template).
  • The DNA polymerase used herein is an RNA-dependent DNA polymerase, such as a reverse transcriptase.
  • “Reverse transcriptase” refers to an enzyme capable of synthesizing a complementary DNA strand using an RNA strand as a template.
  • The reverse transcriptases of the present application include, but are not limited to, reverse transcriptases from retroviruses, other viruses, or bacteria, and DNA polymerases with reverse transcription activity, such as Tth DNA polymerase, Taq DNA polymerase, Tne DNA polymerase, and Tma DNA polymerase.
  • Reverse transcriptases from retroviruses include, but are not limited to, the reverse transcriptases of Moloney murine leukemia virus (M-MLV), human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, and myeloblastosis-associated virus (MAV).
  • M-MLV: Moloney murine leukemia virus
  • HIV: human immunodeficiency virus
  • ASLV: avian sarcoma-leukosis virus
  • RSV: Rous sarcoma virus
  • Further reverse transcriptases are described, e.g., in US Patent Application 2002/0198944 (incorporated herein by reference in its entirety).
  • The reverse transcriptases of the present application include, but are not limited to, any form, for example, naturally occurring reverse transcriptases, naturally occurring mutant reverse transcriptases, engineered mutant reverse transcriptases, or other variants (for example, truncated variants that retain reverse transcriptase activity).
  • The terms “hybridization” and “annealing” refer to the process by which complementary single-stranded nucleic acid molecules form a double-stranded nucleic acid.
  • Two nucleic acid sequences that are fully or substantially complementary will hybridize or anneal.
  • The degree of complementarity required for two nucleic acid sequences to hybridize or anneal depends on the hybridization conditions used, especially temperature.
  • “Conditions allowing nucleic acid hybridization” has the meaning generally understood by those skilled in the art and can be determined by conventional methods.
  • Two nucleic acid molecules having complementary sequences can hybridize under appropriate hybridization conditions.
  • Such hybridization conditions may involve factors such as temperature, pH, and the composition and ionic strength of the hybridization buffer, and may be determined according to the length and GC content of the two complementary nucleic acid molecules.
  • Low-stringency hybridization conditions may be used when the two complementary nucleic acid molecules are relatively short and/or have a relatively low GC content.
  • High-stringency hybridization conditions may be used when the two complementary nucleic acid molecules are relatively long and/or have a relatively high GC content.
  • Hybridization conditions are well known to those skilled in the art and can be found, for example, in Joseph Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001); and M.L.M. Anderson, Nucleic Acid Hybridization, Springer-Verlag New York Inc., N.Y. (1999).
  • “Hybridization” and “annealing” have the same meaning and are used interchangeably. Accordingly, the expressions “conditions allowing nucleic acid hybridization” and “conditions allowing nucleic acid annealing” also have the same meaning and are used interchangeably.
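To make the dependence on length and GC content concrete, here is a hedged sketch of two textbook melting-temperature (Tm) approximations: the Wallace rule and a basic GC-content formula. Neither comes from this application, and both ignore salt, pH, and buffer composition, which the text lists as relevant factors, so the numbers are only rough guides when choosing stringency.

```python
"""Sketch: rough melting-temperature (Tm) estimates from length and GC content."""


def gc_count(seq: str) -> int:
    s = seq.upper()
    return s.count("G") + s.count("C")


def tm_wallace(seq: str) -> float:
    """Wallace rule, Tm = 2(A+T) + 4(G+C) in deg C; intended for short oligos."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    return 2.0 * at + 4.0 * gc_count(s)


def tm_gc_basic(seq: str) -> float:
    """Tm = 64.9 + 41(G+C - 16.4)/N in deg C; a basic formula for longer oligos."""
    if not seq:
        raise ValueError("empty sequence")
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)
```

A 20-mer with 50% GC content gives about 51.8 °C with the basic formula; raising the GC content raises the estimate, which is why GC-rich, longer duplexes tolerate higher-stringency conditions.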
  • The term “upstream” describes the relative position of two nucleic acid sequences (or two nucleic acid molecules) and has the meaning generally understood by those skilled in the art.
  • The expression “one nucleic acid sequence is located upstream of another nucleic acid sequence” means that, when arranged in the 5' to 3' direction, the former is located at a more forward position (i.e., closer to the 5' end) than the latter.
  • “Downstream” has the opposite meaning of “upstream”.
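The upstream/downstream convention reduces to comparing positions along a sequence written 5'->3'. A minimal sketch, assuming the first occurrence of each sub-sequence is the one of interest (the helper name `is_upstream` is hypothetical):

```python
"""Sketch: decide whether one sub-sequence lies upstream of another."""


def is_upstream(sequence: str, first: str, second: str) -> bool:
    """True if `first` starts closer to the 5' end than `second`.

    `sequence` is written 5'->3'; the first occurrence of each sub-sequence is used.
    """
    s = sequence.upper()
    i = s.find(first.upper())
    j = s.find(second.upper())
    if i < 0 or j < 0:
        raise ValueError("both sub-sequences must occur in the sequence")
    return i < j
```

In `AAACCCGGG`, `AAA` is upstream of `GGG`, and `GGG` is downstream of `CCC`.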
  • “Linker” refers to a chemical entity used to connect two physical elements (e.g., two nucleic acids or two polypeptides).
  • A linker used to connect two polypeptides can be a peptide linker (e.g., a linker comprising multiple amino acid residues).
  • A linker used to connect two nucleic acids can be a nucleic acid linker (e.g., a linker comprising multiple nucleotide residues).
  • A “guide sequence” refers to the targeting sequence contained in a guide RNA.
  • A guide sequence is a polynucleotide sequence that is sufficiently complementary to a target sequence to hybridize with it and direct specific binding of a CRISPR/Cas complex to the target sequence.
  • The degree of complementarity between a guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%.
  • Methods for determining the complementarity of two nucleic acid sequences are within the purview of those of ordinary skill in the art. For example, public and commercial alignment algorithms and programs are available, such as, but not limited to, ClustalW, Smith-Waterman in MATLAB, Bowtie, Geneious, Biopython, and SeqMan.
  • The term “scaffold sequence” refers to a sequence in a guide RNA that is recognized and bound by a Cas protein.
  • The scaffold sequence may comprise or consist of a CRISPR repeat sequence.
  • The term “functional complex” refers to a complex formed by a guide RNA (gRNA) and a Cas protein, which can recognize and cut the polynucleotide targeted by the guide RNA.
  • “Target nucleic acid” refers to the polynucleotide targeted by a targeting sequence, e.g., a sequence complementary to the targeting sequence. Full complementarity between the guide sequence and the target sequence is not required, provided there is sufficient complementarity for the two to hybridize and to promote binding of the CRISPR/Cas complex.
  • A target sequence can comprise any polynucleotide, such as DNA or RNA.
  • The target sequence is located in the nucleus or cytoplasm of the cell.
  • The target sequence may be located in an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast.
  • The expression “target sequence” or “target nucleic acid” can refer to any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell).
  • The target nucleic acid may be a polynucleotide present in the nucleus of a eukaryotic cell (e.g., genomic DNA), or a polynucleotide introduced into the cell from an external source (e.g., vector DNA).
  • A target nucleic acid can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA).
  • The target nucleic acid or target sequence comprises or is adjacent to a protospacer adjacent motif (PAM).
  • PAM: protospacer adjacent motif
  • The exact sequence and length requirements of the PAM depend on the Cas protein used.
  • A PAM is typically a 2-5 bp sequence adjacent to the protospacer, i.e., the target sequence matched by a CRISPR spacer. Those skilled in the art will be able to identify the PAM sequence to use with a given Cas protein.
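As a concrete case of PAM dependence, SpCas9 recognizes the canonical 5'-NGG-3' PAM immediately 3' of a 20-nt protospacer. The sketch below scans one strand for such sites; it is an illustrative assumption-laden helper (the name `find_ngg_sites` is hypothetical), it does not scan the reverse strand (pass the reverse complement for that), and other Cas proteins would need a different pattern.

```python
"""Sketch: list SpCas9 protospacers followed by an NGG PAM on one strand."""
import re


def find_ngg_sites(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (protospacer start, protospacer, PAM) for each NGG on the given strand."""
    seq = seq.upper()
    sites = []
    # Zero-width lookahead so overlapping PAMs (e.g. "AGGG") are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:  # need a full-length protospacer 5' of the PAM
            start = pam_start - spacer_len
            sites.append((start, seq[start:pam_start], match.group(1)))
    return sites
```

For `"A" * 20 + "TGGC"` the only hit is the 20-nt run of A followed by the PAM `TGG`.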
  • “Vector” refers to a nucleic acid delivery vehicle into which a polynucleotide can be inserted.
  • When the vector enables expression of the protein encoded by the inserted polynucleotide, the vector is called an expression vector.
  • A vector can be introduced into a host cell by transformation, transduction, or transfection, so that the genetic elements it carries can be expressed in the host cell.
  • Vectors are well known to those skilled in the art and include, but are not limited to: plasmids; phagemids; cosmids; nanoliposome particles; exosomes; artificial chromosomes, such as yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC), or P1-derived artificial chromosomes (PAC); bacteriophages such as lambda phage or M13 phage; and animal viruses.
  • YAC: yeast artificial chromosome
  • BAC: bacterial artificial chromosome
  • PAC: P1-derived artificial chromosome
  • Animal viruses that can be used as vectors include, but are not limited to, retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpesviruses (such as herpes simplex virus), poxviruses, baculoviruses, papillomaviruses, and papovaviruses (e.g., SV40).
  • The design of the expression vector may depend on factors such as the choice of host cell to be transformed and the desired level of expression.
  • When a vector carries exogenous DNA to be integrated into the host genome, together with non-protein-expressing elements related to the integration of that exogenous DNA, the vector is called a donor vector.
  • Exogenous DNA includes, but is not limited to, complete genes or gene fragments, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and protein-coding sequences.
  • Non-protein-expressing elements related to the integration of exogenous DNA include, but are not limited to, sequences homologous to the intended insertion site and target cleavage sequences recognized by the editing enzyme.
  • Adeno-associated virus vectors include, but are not limited to, AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV-DJ, and other serotypes of adeno-associated virus, as well as other engineered serotypes.
  • An “intein” is an internal protein element that can mediate protein splicing after translation.
  • An intein is located in the middle of a polypeptide sequence; after processing it is excised, and it catalyzes the ligation of the flanking exteins into a mature protein molecule.
  • An “intein split system” uses inteins to split larger protein molecules into parts and reassemble them efficiently. Inteins can be split into N-terminal and C-terminal segments.
  • The target protein is split into an N-terminal segment and a C-terminal segment, which are fused to the N-terminal and C-terminal segments of the intein, respectively, to form fusion proteins.
  • Inteins suitable for use in the present invention are derived from, but not limited to, the DnaE proteins (DNA polymerase III α subunit) of Synechocystis sp. PCC6803 and Nostoc punctiforme PCC73102 (Npu).
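The split-and-reassemble scheme can be modelled as string operations on one-letter protein sequences. This is an idealized sketch: `intn`/`intc` are hypothetical placeholders, not the Npu or Synechocystis DnaE intein sequences, the toy protein is arbitrary, and real splicing depends on conserved intein and flanking extein residues that this model ignores.

```python
"""Sketch: idealised split-intein fusion and trans-splicing as string operations."""


def split_with_intein(protein: str, split_after: int,
                      int_n: str, int_c: str) -> tuple[str, str]:
    """Split `protein` after residue `split_after` and fuse the halves to intein halves.

    Returns (N-extein + IntN, IntC + C-extein), mirroring the fusion layout in the text.
    """
    if not 0 < split_after < len(protein):
        raise ValueError("split point must fall inside the protein")
    return protein[:split_after] + int_n, int_c + protein[split_after:]


def trans_splice(n_fusion: str, c_fusion: str, int_n: str, int_c: str) -> str:
    """Excise both intein halves and join the exteins (ideal, 100% efficient splicing)."""
    if not (int_n and int_c):
        raise ValueError("intein halves must be non-empty")
    if not (n_fusion.endswith(int_n) and c_fusion.startswith(int_c)):
        raise ValueError("fusions do not carry the expected intein halves")
    return n_fusion[: -len(int_n)] + c_fusion[len(int_c):]
```

Splitting a toy sequence, then splicing the two fusions, returns the original sequence, which is the property the intein split system relies on.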
  • The term “host cell” refers to a cell into which a vector can be introduced, including, but not limited to, prokaryotic cells such as Escherichia coli or Bacillus subtilis; fungal cells such as yeast cells or Aspergillus; insect cells such as Drosophila S2 cells or Sf9 cells; and animal cells such as fibroblasts, CHO cells, COS cells, NS0 cells, HeLa cells, BHK cells, HEK 293 cells, or human cells.
  • spCas9(H840A) refers to a mutant of the spCas9 protein in which the amino acid at position 840 of spCas9 is mutated from H to A.
  • saCas9(R1226A) refers to a mutant of the saCas9 protein in which the amino acid at position 1226 of saCas9 is mutated from R to A.
  • PE-spCas9 refers to the fusion protein obtained by fusing spCas9(H840A) to the reverse transcriptase MLV RT.
  • A “flap” refers to a free nucleic acid fragment attached at the 3' end of a nick produced by breaking one strand of a double-stranded target nucleic acid; the fragment is not complementary to the corresponding nucleotide fragment of the other strand and is therefore free.
  • A “homologous flap” means that the flap sequence formed on the double-stranded target nucleic acid is identical or complementary to the terminal sequence at a specific cleavage site in the genome.
  • The flap can be obtained as follows: after a Cas protein breaks one strand of a double-stranded target nucleic acid (for example, a donor vector containing a nucleic acid sequence of interest), the 3' end of the broken nucleic acid strand is extended. The extension can use a template sequence (e.g., a pegRNA) annealed to the broken strand as the template, forming a free nucleic acid fragment.
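The flap-forming step can be sketched as string operations: keep the broken strand up to its new 3' end, then append the complement of the annealed template, which is what a polymerase writes 5'->3'. This is a simplified model under stated assumptions (the downstream portion of the strand and its displacement are ignored, and "complementary" is read as reverse complement); `extend_at_nick` and `is_homologous_flap` are hypothetical helpers.

```python
"""Sketch: extend the 3' end at a nick on a template to create a free flap."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extend_at_nick(strand: str, nick: int, template: str) -> tuple[str, str]:
    """Return (extended upstream fragment, flap).

    strand: the broken strand, 5'->3'; bases [0, nick) end at the new 3' end.
    template: template region annealed beyond that 3' end, written 5'->3'
    (e.g. part of a pegRNA). The polymerase writes its complement, so
    flap = revcomp(template).
    """
    if not 0 < nick <= len(strand):
        raise ValueError("nick must fall within the strand")
    flap = revcomp(template)
    return strand.upper()[:nick] + flap, flap


def is_homologous_flap(flap: str, genomic_end: str) -> bool:
    """Homologous flap: identical or complementary to the genomic cut-site terminal sequence."""
    return flap.upper() == genomic_end.upper() or flap.upper() == revcomp(genomic_end)
```

With strand `AACCGGTTAA`, a nick after position 6, and template `GGAT`, the extended fragment is `AACCGGATCC` and the flap `ATCC` counts as homologous to a genomic end reading `ATCC` (or its complement `GGAT`).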
  • The term “homology-dependent recombination (HDR)” refers to a DNA recombination process based on sequence homology between the nucleic acid sequences upstream and/or downstream of the nucleic acid sequence of interest in a construct (e.g., a donor nucleic acid vector) and the nucleic acid sequences upstream and/or downstream of the target site in the genome or a nucleic acid fragment.
  • The nucleic acid sequences upstream and/or downstream of the nucleic acid sequence of interest in the donor nucleic acid vector are referred to as “donor homology arms”.
  • The nucleic acid sequences upstream and/or downstream of the target site in the genome or nucleic acid fragment are referred to as “target site homology arms”.
  • The donor homology arms are identical or highly homologous to the target site homology arms, i.e., they share at least 85%, 90%, 95%, 98%, or 100% sequence identity with them.
  • In certain embodiments, the donor homology arm is located upstream of the nucleic acid sequence of interest and the target site homology arm is located upstream of the target site. In certain embodiments, the donor homology arm is located downstream of the nucleic acid sequence of interest and the target site homology arm is located downstream of the target site.
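The identity thresholds for homology arms can be checked with a simple ungapped comparison. A minimal sketch, assuming equal-length arms on the same strand; real arms with indels would need a proper alignment, and `arms_qualify` is a hypothetical helper whose default mirrors the lowest threshold listed (85%).

```python
"""Sketch: check donor vs target-site homology arms against an identity threshold."""


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences (same strand, 5'->3')."""
    if len(a) != len(b) or not a:
        raise ValueError("equal-length, non-empty sequences required")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def arms_qualify(donor_arm: str, target_arm: str, threshold: float = 85.0) -> bool:
    """Default threshold taken from the lowest value listed in the text (85%)."""
    return percent_identity(donor_arm, target_arm) >= threshold
```

Two 10-nt arms differing at one position share 90% identity: they pass the 85% threshold but not a 95% one.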
  • The application provides a system or kit comprising the following four components:
  • The first functional complex is capable of breaking one nucleic acid strand of a first double-stranded target nucleic acid.
  • The first index primer, or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first index primer, wherein the first index primer contains a first tag sequence and a first target-binding sequence, the first tag sequence being located upstream (5') of the first target-binding sequence. Under conditions that allow nucleic acid hybridization or annealing, the first target-binding sequence can hybridize or anneal to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the first tag sequence does not bind the nucleic acid strand and remains free and single-stranded.
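The geometry of the first index primer can be sketched as follows: the target-binding part is the reverse complement of the last bases of the broken strand, so it pairs with the strand's new 3' end, and the tag placed 5' of it overhangs past that 3' terminus as a free single strand. This is a hypothetical illustration, not the applicants' primer design; `design_index_primer` and the default `bind_len` of 15 nt are assumptions, and tag self-complementarity or off-target pairing is not checked.

```python
"""Sketch: lay out an index primer = tag (5') + target-binding sequence (3')."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_index_primer(broken_strand: str, tag: str, bind_len: int = 15) -> str:
    """broken_strand: the strand 5'->3', ending at the 3' end created by the break.

    The target-binding part is the reverse complement of the last `bind_len`
    bases, so it pairs antiparallel with that 3' end. The tag sits 5' of it on
    the primer and therefore overhangs past the strand's 3' terminus, unpaired.
    """
    if not 0 < bind_len <= len(broken_strand):
        raise ValueError("bind_len must be between 1 and the strand length")
    return tag.upper() + revcomp(broken_strand[-bind_len:])
```

For a broken strand ending in `...CGGATCCATGC` and a 6-nt tag `GGGAAA`, a 10-nt binding region gives a 16-nt primer whose last 10 nt pair with the strand's 3' end.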
  • The first Cas protein is selected from Cas proteins that cut a single DNA strand; for example, cutting a single DNA strand refers to cutting the DNA strand that is not bound by the gRNA.
  • The first Cas protein is selected from the following proteins: Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, …
  • The first Cas protein can break one nucleic acid strand of the first double-stranded target nucleic acid and generate a nick.
  • The first Cas protein is a mutant of the Cas9 protein, such as a mutant of the Cas9 protein of Streptococcus pyogenes (S. pyogenes), spCas9(H840A).
  • The first Cas protein has the amino acid sequence shown in SEQ ID NO:3.
  • The sequences and structures of various Cas proteins, including Cas9 proteins, are well known to those skilled in the art.
  • Various Cas9 proteins and their homologues have been reported in various species, including but not limited to Streptococcus pyogenes and Streptococcus thermophilus.
  • Other suitable Cas9 proteins will be apparent to those skilled in the art based on the present disclosure; see, for example, Chylinski, Rhun, and Charpentier.
  • The Cas9 is a Cas9 from one of the following species: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria …
  • The first DNA polymerase is selected from, but not limited to, DNA-dependent DNA polymerases and RNA-dependent DNA polymerases.
  • The first DNA polymerase is an RNA-dependent DNA polymerase.
  • The first DNA polymerase is a reverse transcriptase, such as one of those listed above, e.g., the reverse transcriptase of Moloney murine leukemia virus.
  • The first DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
  • The first Cas protein is linked to the first DNA polymerase.
  • The first Cas protein is covalently linked to the first DNA polymerase, either directly or via a linker.
  • The linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
  • The first Cas protein is fused to the first DNA polymerase, either directly or through a peptide linker, to form a first fusion protein.
  • The first Cas protein is connected or fused, optionally through a linker, to the N-terminus of the first DNA polymerase; alternatively, the first Cas protein is connected or fused, optionally through a linker, to the C-terminus of the first DNA polymerase.
  • The first fusion protein has the amino acid sequence shown in SEQ ID NO:8.
  • The linker is a peptide linker.
  • The peptide linker is 5-200 amino acids in length, such as 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids.
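The fusion options above (direct fusion or fusion via a 5-200 aa linker, with the Cas protein on either side of the polymerase) can be expressed as a small assembly function. This is a sketch under assumptions: the default linker (GGGGS)3 is a widely used flexible linker chosen for illustration and is not SEQ ID NO:35, whose sequence is not given here, and the protein strings in the usage are placeholders.

```python
"""Sketch: assemble a Cas-linker-polymerase fusion as a one-letter protein string."""


def build_fusion(cas: str, polymerase: str, linker: str = "GGGGS" * 3,
                 cas_first: bool = True) -> str:
    """cas_first=True gives Cas-(linker)-polymerase, i.e. Cas attached to the
    polymerase N-terminus; cas_first=False gives polymerase-(linker)-Cas.

    An empty linker means direct fusion; otherwise the 5-200 aa range from the
    text is enforced. The default (GGGGS)x3 linker is an illustrative assumption.
    """
    if linker and not 5 <= len(linker) <= 200:
        raise ValueError("peptide linker must be 5-200 amino acids")
    parts = (cas, linker, polymerase) if cas_first else (polymerase, linker, cas)
    return "".join(parts)
```

Keeping the linker as a parameter makes it easy to compare direct fusion (`linker=""`) with linkers of different lengths, while the range check rejects anything outside the 5-200 aa window the text describes.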
  • the first fusion protein or the first Cas protein can be split into two parts by a split-intein system. The split point can be placed at any amino acid position of the first fusion protein or the first Cas protein. For example, in certain embodiments, the split-intein system splits the first Cas protein itself, so that the first Cas protein is divided into an N-terminal segment and a C-terminal segment.
  • the N-terminal segment and the C-terminal segment of the first Cas protein may be fused to the N-terminal segment and the C-terminal segment of the intein, respectively (or to the C-terminal segment and the N-terminal segment of the intein, respectively), and the two fusion products can be reconstituted into an active first Cas protein in the cell.
  • the N-terminal segment and the C-terminal segment of the first Cas protein are inactive in isolation, but can be reconstituted into an active first Cas protein in the cell.
  • the nucleic acid molecule A1 can be split into two parts, which respectively comprise the nucleotide sequences encoding the N-terminal segment and the C-terminal segment of the first Cas protein.
  • the first DNA polymerase can be fused to the N-terminal segment or the C-terminal segment of the first Cas protein.
  • the first DNA polymerase is fused to the C-terminal segment of the first Cas protein.
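The fusion and split-intein arrangements above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: proteins are treated as one-letter amino acid strings, the function names and all sequences are hypothetical placeholders (not the sequences of SEQ ID NO:7, 8 or 35), and protein trans-splicing is idealized as the clean removal of both intein segments, leaving the two Cas segments (and any fused polymerase) joined in order.

```python
from dataclasses import dataclass

# Peptide linker length range given in the embodiments (5-200 amino acids).
MIN_LINKER_AA, MAX_LINKER_AA = 5, 200


def build_fusion(cas: str, polymerase: str, linker: str = "",
                 cas_at_n_terminus: bool = True) -> str:
    """Join a Cas protein and a DNA polymerase (hypothetical one-letter strings),
    optionally through a peptide linker, with Cas at either terminus."""
    if linker and not MIN_LINKER_AA <= len(linker) <= MAX_LINKER_AA:
        raise ValueError(f"linker is {len(linker)} aa; expected "
                         f"{MIN_LINKER_AA}-{MAX_LINKER_AA} aa")
    if cas_at_n_terminus:
        return cas + linker + polymerase
    return polymerase + linker + cas


@dataclass
class SplitHalves:
    n_half: str  # N-terminal Cas segment + N-terminal intein segment
    c_half: str  # C-terminal intein segment + C-terminal Cas segment (+ polymerase)


def split_with_intein(cas: str, split_pos: int, intein_n: str, intein_c: str,
                      polymerase: str = "", linker: str = "") -> SplitHalves:
    """Split the Cas protein at split_pos and attach the split-intein segments.
    If a polymerase is given, it is fused to the C-terminal Cas segment."""
    if not 0 < split_pos < len(cas):
        raise ValueError("split position must fall inside the Cas sequence")
    c_half = intein_c + cas[split_pos:]
    if polymerase:
        c_half += linker + polymerase
    return SplitHalves(n_half=cas[:split_pos] + intein_n, c_half=c_half)


def trans_splice(halves: SplitHalves, intein_n: str, intein_c: str) -> str:
    """Idealized trans-splicing: remove both intein segments and join the
    flanking segments into one continuous chain."""
    if not (halves.n_half.endswith(intein_n) and halves.c_half.startswith(intein_c)):
        raise ValueError("halves do not carry the expected intein segments")
    return halves.n_half[:len(halves.n_half) - len(intein_n)] + halves.c_half[len(intein_c):]
```

In this idealized model, splitting the Cas protein and fusing the polymerase to its C-terminal segment reconstitutes the same chain as a direct Cas-linker-polymerase fusion, which makes the two constructs easy to compare.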
  • the first gRNA contains a first guide sequence, and, under conditions that allow nucleic acid hybridization or annealing, the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of the first double-stranded target nucleic acid.
  • the length of the first guide sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the first gRNA also contains a first scaffold sequence, which can be recognized and bound by the first Cas protein, thereby forming a first functional complex.
  • the length of the first scaffold sequence is at least 20 nt, such as 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the first guide sequence is located upstream (5') of the first scaffold sequence.
  • after the first guide sequence binds to one nucleic acid strand (the second strand) of the first double-stranded target nucleic acid, the first functional complex is capable of breaking the other nucleic acid strand (the first strand).
  • under conditions that allow nucleic acid hybridization or annealing, the first target binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand, the 3' end being generated when the first functional complex cleaves the first double-stranded target nucleic acid.
  • the length of the first target binding sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the length of the first tag sequence is at least 4 nt, such as 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the first DNA polymerase is capable of extending the 3' end of the nucleic acid strand using the first tag primer as a template.
  • the extension forms a first flap.
  • the first tag primer is single-stranded deoxyribonucleic acid or single-stranded ribonucleic acid.
  • the first tag primer is single-stranded ribonucleic acid, and the first DNA polymerase is an RNA-dependent DNA polymerase.
  • the first tag primer is single-stranded deoxyribonucleic acid, and the first DNA polymerase is a DNA-dependent DNA polymerase.
  • the nucleic acid strand to which the first guide sequence binds is different than the nucleic acid strand to which the first target binding sequence binds. In certain embodiments, the nucleic acid strand to which the first guide sequence binds is the opposite strand to the nucleic acid strand to which the first target binding sequence binds.
  • the first tag primer is attached to the first gRNA.
  • the first tag primer is covalently linked to the first gRNA, with or without a linker.
  • the first tag primer is linked to the 3' end of the first gRNA, optionally through a linker.
  • the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
  • the first tag primer is a single-stranded ribonucleic acid and is connected to the 3' end of the first gRNA, with or without a ribonucleic acid linker, to form a first PegRNA.
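A minimal Python sketch of how the pieces described above could fit together in a first PegRNA. Assumptions, none of which come from the source: sequences are written 5'→3'; the tag primer is ordered as tag sequence followed by the target binding sequence at its 3' end, as in conventional prime-editing pegRNAs; pairing is perfect Watson-Crick complementarity; and all function names are hypothetical. The minimum lengths are those stated in the embodiments (guide at least 5 nt, scaffold at least 20 nt, target binding sequence at least 5 nt, tag sequence at least 4 nt).

```python
# Hypothetical helpers for a PegRNA-style construct; all sequences 5'->3'.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Minimum lengths (nt) stated in the embodiments.
MIN_LENGTHS = {"guide": 5, "scaffold": 20, "target_binding": 5, "tag": 4}


def reverse_complement_dna(seq: str) -> str:
    return seq.translate(DNA_COMPLEMENT)[::-1]


def target_binding_for_nick(nicked_strand: str, nick: int, length: int) -> str:
    """RNA sequence pairing with the `length` nt immediately 5' of the nick,
    i.e. with the new 3' end left on the nicked DNA strand."""
    if not 0 < length <= nick <= len(nicked_strand):
        raise ValueError("binding region must lie 5' of the nick")
    return reverse_complement_dna(nicked_strand[nick - length:nick]).replace("T", "U")


def assemble_pegrna(guide: str, scaffold: str, tag: str, target_binding: str,
                    linker: str = "") -> str:
    """gRNA (guide + scaffold), optional linker, then the tag primer
    (assumed order: tag sequence, then target binding sequence at the 3' end)."""
    parts = {"guide": guide, "scaffold": scaffold, "tag": tag,
             "target_binding": target_binding}
    for name, seq in parts.items():
        if len(seq) < MIN_LENGTHS[name]:
            raise ValueError(f"{name} is {len(seq)} nt; minimum {MIN_LENGTHS[name]} nt")
    return guide + scaffold + linker + tag + target_binding


def polymerase_for(primer_is_rna: bool) -> str:
    """Polymerase class matched to the chemistry of the tag primer template."""
    return "RNA-dependent DNA polymerase" if primer_is_rna else "DNA-dependent DNA polymerase"


def flap_from_tag(tag_rna: str) -> str:
    """DNA added to the 3' end when the tag sequence is copied: the reverse
    complement of the template, written 5'->3'."""
    return reverse_complement_dna(tag_rna.replace("U", "T"))
```

For example, for the strand `GACCTGAAGCTTGGCA` nicked between positions 9 and 10, a 5-nt target binding sequence would be `GCUUC`, and copying a tag sequence `AUGGC` would add the flap `GCCAT`.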
  • the nucleic acid molecule A1 is capable of expressing the first Cas protein in cells. In certain embodiments, the nucleic acid molecule B1 is capable of expressing the first DNA polymerase in a cell. In some embodiments, the nucleic acid molecule C1 is capable of transcribing the first gRNA in the cell. In certain embodiments, the nucleic acid molecule D1 is capable of transcribing the first tag primer in the cell.
  • the nucleic acid molecule A1 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule A1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first Cas protein.
  • the nucleic acid molecule B1 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule B1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first DNA polymerase.
  • the nucleic acid molecule C1 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule C1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first gRNA.
  • the nucleic acid molecule D1 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule D1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first tag primer.
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 can express, in the cell, either the first Cas protein and the first DNA polymerase as separate (isolated) proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase.
  • the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (for example, a eukaryotic expression vector); in some embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 can be transcribed in the cell into the first PegRNA containing the first gRNA and the first tag primer.
  • two, three or four of the nucleic acid molecules A1, B1, C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector).
  • the system or kit comprises:
  • (M1-1) a first fusion protein containing the first Cas protein and the first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein; or, (M1-2) the isolated first Cas protein and first DNA polymerase, or a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase; and,
  • (M2) the first PegRNA containing the first gRNA and the first tag primer, or a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA.
  • the system or kit further comprises:
  • the second nucleic acid editing system is a homologous recombination-based technology.
  • the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector).
  • the nucleic acid vector further comprises the first PAM sequence recognized by the first Cas protein, and/or, a donor homology arm.
  • the nucleic acid vector is double-stranded.
  • the nucleic acid vector is a circular double-stranded vector.
  • the nucleic acid vector comprises a first guide binding sequence (e.g., the complement of the first guide sequence) capable of hybridizing or annealing to the first guide sequence.
  • the first functional complex is capable of cleaving one nucleic acid strand of the nucleic acid vector by recognizing the first guide binding sequence and the first PAM sequence.
  • the nucleic acid vector further comprises a nucleic acid sequence of interest.
  • the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated at a specific site in the genome.
  • the first PAM sequence and the donor homology arm are located on either side of the nucleic acid sequence of interest.
  • the first primer binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
  • the first functional complex breaks the first strand of the nucleic acid vector, creating a nick in the first strand; the double-stranded portion located between the 3' end of the nick and the donor homology arm contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.
  • the first tag primer is capable of hybridizing or annealing, through the first target binding sequence, to the 3' end of the nucleic acid strand broken by the first functional complex, forming a double-stranded structure in which the first tag sequence of the first tag primer remains free (unpaired).
  • the nucleic acid strand to which the first target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand comprising the first guide binding sequence.
  • the nucleic acid vector further comprises a first target sequence; under conditions that allow nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the first target sequence is located on the strand opposite the first guide binding sequence.
  • the first target sequence is located at the end of the first strand that is cleaved; in certain embodiments, after the first functional complex cleaves the first strand, the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (in some embodiments, forming a first flap).
  • the nucleic acid vector further comprises a restriction site between the first target sequence and the donor homology arms.
  • the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arms.
  • the system or kit further comprises:
  • the second functional complex is capable of breaking one nucleic acid strand of the second double-stranded target nucleic acid.
  • the second Cas protein is the same or different from the first Cas protein. In certain embodiments, the second Cas protein is identical to the first Cas protein.
  • the second gRNA contains a second guide sequence, and, under conditions that allow nucleic acid hybridization or annealing, the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a second double-stranded target nucleic acid.
  • after the second guide sequence binds to one nucleic acid strand (the first strand) of the second double-stranded target nucleic acid, the second functional complex is capable of breaking the other nucleic acid strand (the second strand).
  • the second guide sequence is different from the first guide sequence.
  • the second double-stranded target nucleic acid is the same as or different from the first double-stranded target nucleic acid.
  • the second double-stranded target nucleic acid is identical to the first double-stranded target nucleic acid, and the second functional complex breaks a different nucleic acid strand of that double-stranded target nucleic acid, at a different position, from the first functional complex.
  • the second functional complex and the first functional complex cleave different nucleic acid strands of the same double-stranded target nucleic acid, and the nucleic acid strand to which the first guide sequence binds is different from the nucleic acid strand to which the second guide sequence binds.
  • the strand of nucleic acid to which the first guide sequence binds is the opposite strand to the strand of nucleic acid to which the second guide sequence binds.
  • the second double-stranded target nucleic acid is the same double-stranded target nucleic acid as the first double-stranded target nucleic acid; this double-stranded target nucleic acid comprises a first strand and a second strand; after the first guide sequence binds the second strand, the first functional complex can break the first strand, and after the second guide sequence binds the first strand, the second functional complex can break the second strand.
  • the length of the second guide sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second gRNA also contains a second scaffold sequence, which can be recognized and bound by the second Cas protein, thereby forming a second functional complex.
  • the length of the second scaffold sequence is at least 20 nt, such as 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second scaffold sequence is the same as or different from the first scaffold sequence. In certain embodiments, the second scaffold sequence is identical to the first scaffold sequence.
  • the second guide sequence is located upstream (5') of the second scaffold sequence.
  • the nucleic acid molecule C2 is capable of transcribing the second gRNA in the cell.
  • the nucleic acid molecule C2 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule C2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second gRNA.
  • the second Cas protein is different from the first Cas protein; and, the system or kit further comprises:
  • the second Cas protein is capable of breaking one nucleic acid strand of the second double-stranded target nucleic acid and generating a nick.
  • the second Cas protein is selected from Cas proteins that cleave a single DNA strand; here, cleaving a single DNA strand means cleaving the DNA strand that is not bound by the gRNA.
  • the second Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1 protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, C
  • the second Cas protein is a mutant of the Cas9 protein, such as the H840A mutant of Streptococcus pyogenes Cas9 (SpCas9(H840A)).
  • the second Cas protein has the amino acid sequence shown in SEQ ID NO:3.
  • the nucleic acid molecule A2 is capable of expressing the second Cas protein in cells.
  • the nucleic acid molecule A2 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule A2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second Cas protein.
  • the system or kit further comprises:
  • under conditions that allow nucleic acid hybridization or annealing, the second target binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand, the 3' end being generated when the second functional complex breaks the nucleic acid strand.
  • the length of the second target binding sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second target binding sequence is different from the first target binding sequence.
  • the nucleic acid strand to which the second target binding sequence binds is different than the nucleic acid strand to which the first target binding sequence binds.
  • the nucleic acid strand to which the second target binding sequence binds is the opposite strand to the nucleic acid strand to which the first target binding sequence binds.
  • the length of the second tag sequence is at least 4 nt, such as 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second tag sequence is the same as or different from the first tag sequence; in some embodiments, the second tag sequence is different than the first tag sequence.
  • the second DNA polymerase is capable of extending the 3' end of the nucleic acid strand using the second tag primer as a template.
  • the extension forms a second flap.
  • the second DNA polymerase is the same as or different from the first DNA polymerase. In certain embodiments, the second DNA polymerase is the same as the first DNA polymerase.
  • the second tag primer is single-stranded deoxyribonucleic acid or single-stranded ribonucleic acid.
  • the second tag primer is single-stranded ribonucleic acid, and the second DNA polymerase is an RNA-dependent DNA polymerase.
  • the second tag primer is single-stranded deoxyribonucleic acid, and the second DNA polymerase is a DNA-dependent DNA polymerase.
  • the nucleic acid strand to which the second guide sequence binds is different than the nucleic acid strand to which the second target binding sequence binds. In certain embodiments, the nucleic acid strand to which the second guide sequence binds is the opposite strand to the nucleic acid strand to which the second target binding sequence binds.
  • the second guide sequence binds to the same nucleic acid strand as the first target binding sequence, and the binding position of the second guide sequence is located upstream (5') of the binding position of the first target binding sequence.
  • the first guide sequence binds to the same nucleic acid strand as the second target binding sequence, and the binding position of the first guide sequence is located upstream (5') of the binding position of the second target binding sequence.
  • the first flap and the second flap are located on the same double-stranded target nucleic acid, on opposite nucleic acid strands.
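For the paired arrangement above, both target binding sequences can be derived from a single reference strand. The Python sketch below is a hypothetical helper, not taken from the source: it takes the first strand written 5'→3', a nick on the first strand and a nick on the second strand (both given in first-strand coordinates, where a nick at index i lies between nucleotides i-1 and i), and returns the RNA target binding sequence each tag primer would need in order to pair with the 3' end left by its nick. It assumes perfect Watson-Crick pairing and ignores PAM and guide placement.

```python
from typing import Tuple

# Hypothetical helper; sequences are 5'->3' DNA strings over A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def binding_for_nick(strand: str, nick: int, k: int) -> str:
    """RNA sequence pairing with the k nt 5' of a nick on `strand` (its new 3' end).
    A nick at index i lies between strand[i - 1] and strand[i]."""
    if not 0 < k <= nick <= len(strand):
        raise ValueError("binding region must lie 5' of the nick")
    return reverse_complement(strand[nick - k:nick]).replace("T", "U")


def paired_target_binding(first_strand: str, first_nick: int,
                          second_nick: int, k: int) -> Tuple[str, str]:
    """Target binding sequences for the first and second tag primers.

    first_nick:  nick on the first strand, in first-strand coordinates.
    second_nick: nick on the second strand, also in first-strand coordinates.
    """
    second_strand = reverse_complement(first_strand)
    # Convert the second nick into the second strand's own 5'->3' coordinates.
    second_nick_local = len(first_strand) - second_nick
    return (binding_for_nick(first_strand, first_nick, k),
            binding_for_nick(second_strand, second_nick_local, k))
```

Because the second nick lies on the opposite strand, the second target binding sequence reads the same as the first strand from the nick onward (with U in place of T), which is a quick sanity check when designing such a pair.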
  • the nucleic acid molecule D2 is capable of transcribing the second tag primer in the cell.
  • the nucleic acid molecule D2 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule D2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second tag primer.
  • the second DNA polymerase is different from the first DNA polymerase; and, the system or kit further comprises:
  • the second DNA polymerase is selected from, but not limited to, a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
  • the second DNA polymerase is an RNA-dependent DNA polymerase.
  • the second DNA polymerase is a reverse transcriptase, for example one of those listed above, such as the reverse transcriptase of Moloney murine leukemia virus (M-MLV).
  • the second DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
  • said nucleic acid molecule B2 is capable of expressing said second DNA polymerase in a cell.
  • the nucleic acid molecule B2 is contained in an expression vector (for example, a eukaryotic expression vector), or, the nucleic acid molecule B2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second DNA polymerase.
  • the second tag primer is connected to the second gRNA.
  • the second tag primer is covalently linked to the second gRNA, with or without a linker.
  • the second tag primer is linked to the 3' end of the second gRNA, optionally through a linker.
  • the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
  • the second tag primer is a single-stranded ribonucleic acid and is connected to the 3' end of the second gRNA, with or without a ribonucleic acid linker, to form a second PegRNA.
  • the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing, in the cell, a second PegRNA containing the second gRNA and the second tag primer.
  • the system or kit comprises: a second PegRNA comprising the second gRNA and the second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.
  • the second Cas protein is isolated or linked to the second DNA polymerase.
  • the second Cas protein is covalently linked to the second DNA polymerase, with or without a linker.
  • the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
  • the second Cas protein is fused to the second DNA polymerase, with or without a peptide linker, to form a second fusion protein.
  • the second Cas protein is connected or fused, optionally through a linker, to the N-terminus of the second DNA polymerase; or, the second Cas protein is connected or fused, optionally through a linker, to the C-terminus of the second DNA polymerase.
  • the second fusion protein has the amino acid sequence shown in SEQ ID NO:8.
  • the second fusion protein or the second Cas protein can be split into two parts by a split-intein system. The split point can be placed at any amino acid position of the second fusion protein or the second Cas protein. For example, in certain embodiments, the split-intein system splits the second Cas protein itself, so that the second Cas protein is divided into an N-terminal segment and a C-terminal segment.
  • the N-terminal segment and the C-terminal segment of the second Cas protein may be fused to the N-terminal segment and the C-terminal segment of the intein, respectively (or to the C-terminal segment and the N-terminal segment of the intein, respectively), and the two fusion products can be reconstituted into an active second Cas protein in the cell.
  • the N-terminal segment and the C-terminal segment of the second Cas protein are inactive in isolation, but can be reconstituted into an active second Cas protein in the cell.
  • the nucleic acid molecule A2 can be split into two parts, which respectively comprise the nucleotide sequences encoding the N-terminal segment and the C-terminal segment of the second Cas protein.
  • the second DNA polymerase can be fused to the N-terminal segment or the C-terminal segment of the second Cas protein.
  • the second DNA polymerase is fused to the C-terminal segment of the second Cas protein.
  • the nucleic acid molecule A2 and the nucleic acid molecule B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • the nucleic acid molecule A2 and the nucleic acid molecule B2 can express, in cells, either the second Cas protein and the second DNA polymerase as separate (isolated) proteins, or a second fusion protein comprising the second Cas protein and the second DNA polymerase.
  • the system or kit comprises a second fusion protein comprising the second Cas protein and the second DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the second fusion protein.
  • the system or kit comprises the isolated second Cas protein and the second DNA polymerase, or a nucleic acid molecule capable of expressing the isolated second Cas protein and the second DNA polymerase.
  • the first and second Cas proteins are the same Cas protein, the first and second DNA polymerases are the same DNA polymerase; and, the system or kit comprises:
  • (M1-1) a first fusion protein containing the first Cas protein and the first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein; or, (M1-2) the isolated first Cas protein and first DNA polymerase, or a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase;
  • (M2) a first PegRNA containing the first gRNA and a first tag primer, or a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA;
  • (M3) a second PegRNA containing the second gRNA and a second tag primer, or a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA.
  • the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector).
  • the nucleic acid vector further comprises a first PAM sequence recognized by the first Cas protein, and/or, a second PAM sequence recognized by the second Cas protein.
  • the nucleic acid vector is double-stranded.
  • the nucleic acid vector is a circular double-stranded vector.
  • the nucleic acid vector comprises a first guide binding sequence capable of hybridizing or annealing to the first guide sequence (for example, the complement of the first guide sequence), and/or a second guide binding sequence capable of hybridizing or annealing to the second guide sequence (for example, the complement of the second guide sequence).
  • a restriction enzyme cleavage site is also included between the binding sequences.
  • the first primer binding sequence and the second primer binding sequence are located on opposite strands of the nucleic acid vector.
  • the first functional complex can break one nucleic acid strand (the first strand) of the nucleic acid vector by recognizing the first guide binding sequence and the first PAM sequence; and/or, the second functional complex can break the other nucleic acid strand (the second strand) of the nucleic acid vector by recognizing the second guide binding sequence and the second PAM sequence.
  • the nucleic acid vector further comprises a nucleic acid sequence of interest.
  • the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated at a specific site in the genome.
  • the first PAM sequence and the second PAM sequence are located on either side of the nucleic acid sequence of interest.
  • the first primer binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
  • the second primer binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence.
  • the first functional complex and the second functional complex break the first strand and the second strand of the nucleic acid vector, respectively, so that each strand contains a nick; the double-stranded portion located between the 3' ends of the two nicks contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.
  • the first tag primer is capable of hybridizing or annealing, through the first target binding sequence, to the 3' end of the nucleic acid strand broken by the first functional complex, forming a double-stranded structure in which the first tag sequence of the first tag primer remains free (unpaired).
  • the nucleic acid strand to which the first target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand comprising the first guide binding sequence.
  • under conditions that allow nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing, through the second target binding sequence, to the 3' end of the nucleic acid strand broken by the second functional complex, forming a double-stranded structure in which the second tag sequence of the second tag primer remains free (unpaired).
  • the nucleic acid strand to which the second target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand comprising the second guide binding sequence.
  • the nucleic acid strand to which the first target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand to which the second target binding sequence hybridizes or anneals.
  • the nucleic acid vector further comprises a first target sequence; wherein, under conditions that allow nucleic acid hybridization or annealing, the first index primer is capable of binding to the first target sequence through the first target binding sequence.
  • the target sequence is hybridized or annealed to form a double-stranded structure, and the first tag sequence of the first tag primer is in a free state.
  • the first target sequence is on the opposite strand of the first lead binding sequence.
  • the first target sequence is located at the end of the first strand that is broken.
  • the 3' end of the nucleic acid strand containing the first target sequence can be extended using, as a template, the first index primer annealed to the first target sequence (preferably forming the first flap).
  • the nucleic acid vector further comprises a second target sequence; wherein, under conditions that allow nucleic acid hybridization or annealing, the second index primer can hybridize or anneal to the second target sequence through the second target binding sequence to form a double-stranded structure, while the second tag sequence of the second index primer remains in a free state.
  • the second target sequence is located on the opposite strand to the second guide binding sequence.
  • the second target sequence is located at the end of the broken second strand.
  • the 3' end of the nucleic acid strand containing the second target sequence can be extended using, as a template, the second index primer annealed to the second target sequence (preferably forming the second flap).
  • the nucleic acid strand comprising the first target sequence is on the opposite strand to the nucleic acid strand comprising the second target sequence.
  • the nucleic acid vector further comprises a restriction site between the first target sequence and the second target sequence.
  • the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.
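The primer-templated extension described in this group can be illustrated with a minimal string-level sketch. It makes three assumptions. First, every sequence is written 5'→3'. Second, an index primer is laid out as 5'-[tag sequence][target binding sequence]-3', which is the order the text gives for the first index primer. Third, pairing is perfect Watson-Crick pairing. `revcomp` and `extend_broken_strand` are hypothetical helpers, and the sketch does not model hybridization thermodynamics or polymerase behavior.

```python
# Complement table; RNA "U" pairs with "A", and products are written as DNA.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA or RNA string, returned as DNA (5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extend_broken_strand(broken_strand: str, index_primer: str,
                         binding_len: int) -> tuple[str, str]:
    """Extend the 3' end of a broken strand using an annealed index primer as template.

    Hypothetical helper: the 3'-terminal `binding_len` nt of `index_primer` are
    treated as the target binding sequence and everything 5' of them as the tag
    sequence. Returns (extended_strand, flap).
    """
    if not 0 < binding_len < len(index_primer):
        raise ValueError("binding_len must leave a non-empty tag sequence")
    tag = index_primer[:-binding_len]
    target_binding = index_primer[-binding_len:]
    strand = broken_strand.upper()
    # The target binding sequence must pair antiparallel with the strand's 3' end.
    if revcomp(target_binding) != strand[-binding_len:]:
        raise ValueError("target binding sequence does not anneal to the 3' end")
    # Copying the single-stranded tag sequence appends its reverse complement: the flap.
    flap = revcomp(tag)
    return strand + flap, flap
```

For example, `extend_broken_strand("GGATCCAAGCTTACG", "TTTGGGCGTAAGCT", 8)` appends the flap `CCCAAA` to the broken strand. The RNA form of the same primer, `"UUUGGGCGUAAGCU"`, gives the same flap. For that case the text pairs an RNA index primer with an RNA-dependent DNA polymerase such as a reverse transcriptase.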
  • the system or kit further comprises:
  • (9) a third nucleic acid editing system for making a double-strand break in a third double-stranded target nucleic acid.
  • the third nucleic acid editing system is a site-specific nuclease technology, for example, ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
  • the third nucleic acid editing system is capable of breaking both strands of the third double-stranded target nucleic acid to form broken nucleotide fragments a1 and a2.
  • the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 under conditions that allow nucleic acid hybridization or annealing.
  • the first tag sequence or its complementary sequence or the first flap can hybridize or anneal to the end of the broken nucleotide fragment a1 that is formed when the third nucleic acid editing system cleaves the third double-stranded target nucleic acid.
  • the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a1, and that 3' end or 3' portion is formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid.
  • the broken nucleotide fragment a2 contains a target site homology arm having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the donor homology arm.
  • the target site homology arm is located upstream of the break in the third double-stranded target nucleic acid, and the donor homology arm is located upstream of the nucleic acid sequence of interest; or, the target site homology arm is located downstream of the break in the third double-stranded target nucleic acid, and the donor homology arm is located downstream of the nucleic acid sequence of interest.
  • the donor homology arms and the target site homology arms are each independently 100 to 300 bp, 300 to 500 bp, 500 to 1000 bp, 1000 to 2000 bp, or 2000 to 5000 bp in length.
  • the sequence of the target site homology arm is selected from an exon sequence, an intron sequence, an intergenic sequence, a 3' UTR sequence, a 5' UTR sequence, a promoter sequence, or a chromosomal sequence.
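The identity threshold and length ranges above can be checked with a short script. The sketch below assumes both arms are already aligned and of equal length, so identity is computed without gaps, position by position. The function names and default parameters are illustrative and are not part of the disclosure.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity (%) of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def arms_meet_criteria(donor_arm: str, target_arm: str,
                       min_identity: float = 85.0,
                       min_len: int = 100, max_len: int = 5000) -> bool:
    """Check both arms against a length window and an identity threshold.

    Defaults mirror the lowest identity (85%) and the overall 100-5000 bp span
    given in the text; they are parameters, not fixed requirements.
    """
    if not all(min_len <= len(arm) <= max_len for arm in (donor_arm, target_arm)):
        return False
    return percent_identity(donor_arm, target_arm) >= min_identity
```

This sketch deliberately omits gaps. Arms of unequal length would first need a pairwise alignment, for example a global Needleman-Wunsch alignment.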
  • the system or kit further comprises:
  • (9) a third nucleic acid editing system for making a double-strand break in a third double-stranded target nucleic acid.
  • the third nucleic acid editing system is a site-specific nuclease technology, for example, ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
  • the third nucleic acid editing system is capable of breaking both strands of the third double-stranded target nucleic acid to form broken nucleotide fragments a1 and a2.
  • the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the broken nucleotide fragment a1 under conditions that allow nucleic acid hybridization or annealing.
  • the first tag sequence or its complementary sequence or the first flap can hybridize or anneal to the end of the broken nucleotide fragment a1 that is formed when the third nucleic acid editing system cleaves the third double-stranded target nucleic acid.
  • the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a1, and that 3' end or 3' portion is formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid.
  • the second tag sequence or its complement or the second flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a2 under conditions that allow nucleic acid hybridization or annealing.
  • the second tag sequence or its complementary sequence or the second flap can hybridize or anneal to the end of the broken nucleotide fragment a2 that is formed when the third nucleic acid editing system cleaves the third double-stranded target nucleic acid.
  • the complementary sequence of the second tag sequence or the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a2, and that 3' end or 3' portion is formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid.
  • the third nucleic acid editing system is a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas system.
  • the third nucleic acid editing system comprises: (i) a third Cas protein, or a nucleic acid molecule comprising a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA, or a nucleic acid molecule comprising a nucleotide sequence encoding the third gRNA; wherein the third gRNA can combine with the third Cas protein to form a third functional complex, and the third functional complex can break both strands of the third double-stranded target nucleic acid, forming broken nucleotide fragments a1 and a2.
  • the third Cas protein is selected from Cas proteins that cut DNA double strands, such as Cas9 protein.
  • the third gRNA has a sequence as shown in any one of SEQ ID NO: 11, 38, 54, 67, 80, 93, 106, 119 or 132.
  • when the third gRNA is used to recognize the 3' UTR region of the GAPDH locus, the third gRNA has a sequence as shown in SEQ ID NO:11. In certain embodiments, when the third double-stranded target nucleic acid comprises a sequence as shown in SEQ ID NO:145, the third gRNA has a sequence as shown in SEQ ID NO:11.
  • when the third gRNA is used to recognize the first intron of the AAVS1 locus in the human genome, the third gRNA has a sequence as shown in SEQ ID NO:38. In certain embodiments, when the third double-stranded target nucleic acid comprises a sequence as shown in SEQ ID NO:146, the third gRNA has a sequence as shown in SEQ ID NO:38.
  • when the third gRNA is used to recognize the first intron of the Rosa26 locus in the genome, the third gRNA has a sequence as shown in SEQ ID NO:54.
  • when the third gRNA is used to recognize the CCR5 locus in the human genome, the third gRNA has a sequence as shown in SEQ ID NO:67.
  • when the third gRNA is used to recognize the TRAC locus in the human genome, the third gRNA has a sequence as shown in SEQ ID NO:80.
  • when the third gRNA is used to recognize the WAS-1 site, the third gRNA has a sequence as shown in SEQ ID NO:93.
  • when the third gRNA is used to recognize the WAS-3 site, the third gRNA has a sequence as shown in SEQ ID NO:106.
  • when the third gRNA is used to recognize the HBB locus, the third gRNA has a sequence as shown in SEQ ID NO:119.
  • when the third gRNA is used to recognize the IL2RG locus, the third gRNA has a sequence as shown in SEQ ID NO:132.
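The relationship between the flaps and the ends of fragments a1 and a2 can be sketched as follows. The sketch assumes a blunt double-strand break at a chosen index, perfect complementarity, and a fixed overlap length `k`. The helper names are hypothetical. At the break, the 3' end of a1 belongs to its top strand, and the 3' end of a2 belongs to its bottom strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def dsb_fragments(target: str, cut: int) -> tuple[str, str]:
    """Blunt double-strand break between target[cut - 1] and target[cut].

    Returns the two strands whose 3' ends lie at the break: the top strand of
    a1 and the bottom strand of a2, each written 5'->3'.
    """
    if not 0 < cut < len(target):
        raise ValueError("cut must fall inside the target")
    return target.upper()[:cut], revcomp(target[cut:])


def anneals_to_3prime_end(flap: str, strand: str, k: int) -> bool:
    """True if `flap` contains the complement of the 3'-terminal k nt of `strand`."""
    if not 0 < k <= len(strand):
        raise ValueError("k must be between 1 and len(strand)")
    return revcomp(strand[-k:]) in flap.upper()
```

For example, cutting `"ACGGATCCTTAGCATG"` at index 8 gives the a1 strand `ACGGATCC` and the a2 strand `CATGCTAA`. A flap containing `GGAT` can pair with the 3' end of the a1 strand. A flap containing `TTAG` can pair with the 3' end of the a2 strand.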
  • the system or kit further comprises:
  • (10) a fourth nucleic acid editing system for making a double-strand break in a fourth double-stranded target nucleic acid.
  • the fourth nucleic acid editing system is a site-specific nuclease technology, for example, ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
  • the third nucleic acid editing system and the fourth nucleic acid editing system are selected from the same site-specific nuclease technology.
  • the fourth double-stranded target nucleic acid is identical to the third double-stranded target nucleic acid, and the third and fourth nucleic acid editing systems cleave the same double-stranded target nucleic acid at different positions, forming broken nucleotide fragments a1, a2 and a3; wherein, before cleavage, the nucleotide fragments a1, a2 and a3 are arranged in that order in the same double-stranded target nucleic acid (i.e., nucleotide fragment a1 is connected to nucleotide fragment a3 through nucleotide fragment a2).
  • the third and fourth nucleic acid editing systems result in the separation of nucleotide fragments a1 and a2 and the separation of nucleotide fragments a2 and a3, respectively.
  • the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 under conditions that allow nucleic acid hybridization or annealing.
  • the first tag sequence or its complementary sequence or the first flap can hybridize or anneal to the end of the broken nucleotide fragment a1 that is formed when the third nucleic acid editing system cleaves the third double-stranded target nucleic acid.
  • the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a1, and that 3' end or 3' portion is formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid.
  • the second tag sequence or its complement or the second flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a3 under conditions that allow nucleic acid hybridization or annealing.
  • the second tag sequence or its complementary sequence or the second flap can hybridize or anneal to the end of the broken nucleotide fragment a3 that is formed when the third nucleic acid editing system cleaves the third double-stranded target nucleic acid.
  • the complementary sequence of the second tag sequence or the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a3, and that 3' end or 3' portion is formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid.
  • the fourth nucleic acid editing system is a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas system.
  • the fourth nucleic acid editing system comprises: (i) a fourth Cas protein, or a nucleic acid molecule comprising a nucleotide sequence encoding the fourth Cas protein, and (ii) a fourth gRNA, or a nucleic acid molecule comprising a nucleotide sequence encoding the fourth gRNA; wherein the fourth gRNA can combine with the fourth Cas protein to form a fourth functional complex, and the fourth functional complex can break both strands of the fourth double-stranded target nucleic acid, forming broken target nucleic acid fragments b1 and b2.
  • the fourth Cas protein is selected from Cas proteins that cut DNA double strands, such as Cas9 protein.
  • the third nucleic acid editing system and the fourth nucleic acid editing system are CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas systems.
  • the third nucleic acid editing system is as defined above, and the fourth nucleic acid editing system is as defined above.
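When the third and fourth editing systems cut the same target at two positions, the ordering of a1, a2 and a3 can be sketched with simple slicing of the top strand. The cut positions are hypothetical inputs. `replace_middle_fragment` models only one assumed outcome: a2 is removed and an insert is joined between a1 and a3. The text itself states only that the flaps anneal to a1 and a3.

```python
def two_cut_fragments(target: str, cut3: int, cut4: int) -> tuple[str, str, str]:
    """Split a top strand (5'->3') at two break positions into a1, a2 and a3.

    cut3 and cut4 are hypothetical break positions for the third and fourth
    editing systems; before cleavage a1 is joined to a3 through a2.
    """
    if not 0 < cut3 < cut4 < len(target):
        raise ValueError("require 0 < cut3 < cut4 < len(target)")
    return target[:cut3], target[cut3:cut4], target[cut4:]


def replace_middle_fragment(target: str, cut3: int, cut4: int, insert: str) -> str:
    """Assumed outcome for illustration only: a2 is lost and `insert` joins a1 to a3."""
    a1, _a2, a3 = two_cut_fragments(target, cut3, cut4)
    return a1 + insert + a3
```

For example, `two_cut_fragments("AAAACCCCGGGG", 4, 8)` returns `("AAAA", "CCCC", "GGGG")`, and the three pieces concatenate back to the original target.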
  • the kit further comprises additional systems or components.
  • the additional components include one or more selected from the group consisting of:
  • one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional gRNAs, or nucleic acid molecules comprising nucleotide sequences encoding said additional gRNAs, wherein each additional gRNA can bind to a Cas protein to form a functional complex.
  • the functional complex is capable of breaking both strands or one strand of a double-stranded target nucleic acid.
  • the Cas protein is capable of cleaving or breaking one or both strands of a double-stranded target nucleic acid.
  • the target binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the tag sequence does not bind to the target nucleic acid fragment and remains in a free single-stranded state.
  • the additional DNA polymerase is selected from a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
  • the additional DNA polymerase is an RNA-dependent DNA polymerase, such as reverse transcriptase.
  • the additional systems include one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) nucleic acid editing systems for making double-strand breaks in double-stranded target nucleic acids.
  • the nucleic acid editing system is a site-specific nuclease technology, for example, ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
  • the present application provides a fusion protein comprising a Cas protein and a template-dependent DNA polymerase, wherein the Cas protein is capable of breaking one nucleic acid strand of a target nucleic acid.
  • the Cas protein can break a nucleic acid strand of the target nucleic acid and generate a nick.
  • the Cas protein is selected from Cas proteins that cut DNA single strands.
  • the Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1 protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr
  • the Cas protein is a mutant of the Cas9 protein, such as a mutant of the Cas9 protein of Streptococcus pyogenes (S. pyogenes) (SpCas9(H840A)).
  • the Cas protein has the amino acid sequence shown in SEQ ID NO:3.
  • the DNA polymerase is selected from a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
  • the DNA polymerase is an RNA-dependent DNA polymerase.
  • the DNA polymerase is a reverse transcriptase, for example, a reverse transcriptase from Moloney murine leukemia virus (M-MLV), human immunodeficiency virus (HIV), avian sarcoma-leukemia virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian granulocytoma virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV).
  • the DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
  • the Cas protein is covalently linked to the DNA polymerase, with or without a linker.
  • the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
  • the Cas protein is connected or fused, optionally via a linker, to the N-terminus of the DNA polymerase; or, the Cas protein is connected or fused, optionally via a linker, to the C-terminus of the DNA polymerase.
  • the fusion protein has the amino acid sequence shown in SEQ ID NO:8.
  • the present application provides a nucleic acid molecule comprising a polynucleotide encoding the aforementioned fusion protein.
  • the present application provides a vector comprising the aforementioned nucleic acid molecule.
  • the vector is an expression vector.
  • the vector is a eukaryotic expression vector.
  • the present application provides a host cell comprising the aforementioned nucleic acid molecule or the aforementioned vector.
  • the host cell is a prokaryotic cell, such as an Escherichia coli cell; or the host cell is a eukaryotic cell, such as a yeast cell, a fungal cell, a plant cell, or an animal cell.
  • the host cell is a mammalian cell, such as a human cell.
  • the present application provides a method for preparing the fusion protein as described above, comprising (1) culturing the host cell as described above under conditions that allow protein expression; and (2) isolating the fusion protein expressed by the host cell.
  • the present application provides a complex comprising a first Cas protein and a template-dependent first DNA polymerase, wherein the first Cas protein is capable of breaking one nucleic acid strand of a double-stranded target nucleic acid, and the first Cas protein complexes with the first DNA polymerase covalently or non-covalently.
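The two fusion orientations described above can be expressed as a sequence-assembly sketch. In one, the Cas protein is fused to the N-terminus of the DNA polymerase; in the other, to its C-terminus. Either may use a linker or none. The protein strings in the example are short placeholders, not SEQ ID NO:3, 7 or 8. `GGGGS` is a commonly used flexible linker chosen here for illustration; it is not the linker of SEQ ID NO:35. The function name is hypothetical.

```python
def assemble_fusion(cas: str, polymerase: str, linker: str = "",
                    cas_at_n_terminus: bool = True) -> str:
    """Join two protein sequences (one-letter codes) into a single fusion chain.

    cas_at_n_terminus=True gives Cas-[linker]-polymerase, i.e. the Cas protein
    fused to the N-terminus of the polymerase; False gives polymerase-[linker]-Cas.
    An empty linker models direct fusion without a linker.
    """
    if not cas or not polymerase:
        raise ValueError("both protein sequences must be non-empty")
    n_part, c_part = (cas, polymerase) if cas_at_n_terminus else (polymerase, cas)
    return n_part + linker + c_part
```

For example, `assemble_fusion("MCAS", "MPOL", "GGGGS")` returns `"MCASGGGGSMPOL"`. Passing `cas_at_n_terminus=False` instead returns `"MPOLGGGGSMCAS"`.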
  • the first Cas protein is capable of breaking one nucleic acid strand of a double-stranded target nucleic acid and generating a nick.
  • the first Cas protein is selected from Cas proteins that cut DNA single strands.
  • the first Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1 protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, C
  • the first Cas protein is a mutant of the Cas9 protein, such as a mutant of the Cas9 protein of Streptococcus pyogenes (S. pyogenes) (SpCas9(H840A)).
  • the first Cas protein has the amino acid sequence shown in SEQ ID NO:3.
  • the first DNA polymerase is selected from a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
  • the first DNA polymerase is an RNA-dependent DNA polymerase.
  • the first DNA polymerase is a reverse transcriptase, for example, a reverse transcriptase from Moloney murine leukemia virus (M-MLV), human immunodeficiency virus (HIV), avian sarcoma-leukemia virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian granulocytoma virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV).
  • the first DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
  • the first Cas protein is covalently linked to the first DNA polymerase, with or without a linker.
  • the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
  • the first Cas protein is fused to the first DNA polymerase, with or without a peptide linker, to form a first fusion protein.
  • the first Cas protein is connected or fused, optionally via a linker, to the N-terminus of the first DNA polymerase; or, the first Cas protein is connected or fused, optionally via a linker, to the C-terminus of the first DNA polymerase.
  • the first fusion protein has the amino acid sequence shown in SEQ ID NO:8.
  • the complex further comprises a first gRNA.
  • the first gRNA can combine with the first Cas protein to form a first functional unit; the first functional unit can bind to one nucleic acid strand of a double-stranded target nucleic acid (the second strand) and break the other nucleic acid strand of the double-stranded target nucleic acid (the first strand).
  • the first gRNA contains a first guide sequence, and the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a double-stranded target nucleic acid under conditions that allow nucleic acid hybridization or annealing.
  • the length of the first guide sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the first gRNA also contains a first scaffold sequence, which can be recognized and bound by the first Cas protein, thereby forming a first functional unit.
  • the length of the first scaffold sequence is at least 20 nt, such as 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the first guide sequence is located upstream or 5' of the first scaffold sequence.
  • the complex or first functional unit is capable of breaking one strand of a double-stranded target nucleic acid after the first guide sequence binds to the double-stranded target nucleic acid.
  • the complex further comprises a double-stranded target nucleic acid.
  • the double-stranded target nucleic acid contains the first PAM sequence recognized by the first Cas protein and the first guide binding sequence capable of hybridizing or annealing with the first guide sequence, whereby the first functional unit binds the double-stranded target nucleic acid through the first guide binding sequence and the first PAM sequence.
  • the complex further comprises a first index primer that hybridizes or anneals to the double-stranded target nucleic acid; wherein the first index primer contains a first target binding sequence capable of hybridizing or annealing to the double-stranded target nucleic acid.
  • the first index primer contains a first tag sequence and a first target binding sequence, the first tag sequence being located upstream or 5' of the first target binding sequence; and, under conditions that allow nucleic acid hybridization or annealing, the first target binding sequence is capable of hybridizing or annealing to the double-stranded target nucleic acid.
  • the first target binding sequence is capable of hybridizing or annealing to the 3' end of the nucleic acid strand of the double-stranded target nucleic acid that is broken by the first functional unit, forming a double-stranded structure.
  • said 3' end is formed by said first functional unit cleaving one nucleic acid strand of said double stranded target nucleic acid.
  • the first tag sequence is not combined with the broken nucleic acid strand, and is in a free single-stranded state.
  • the length of the first target binding sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the length of the first tag sequence is at least 4 nt, such as 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the first index primer binds to the broken nucleic acid strand through the first target binding sequence.
  • the first DNA polymerase binds to the fragmented nucleic acid strand and the first indexing primer.
  • the first index primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid.
  • when the first index primer is a single-stranded ribonucleic acid, the first DNA polymerase is an RNA-dependent DNA polymerase.
  • when the first index primer is a single-stranded deoxyribonucleic acid, the first DNA polymerase is a DNA-dependent DNA polymerase.
  • the broken nucleic acid strand is extended by the first DNA polymerase, using the first index primer as a template, to form a first flap.
  • the nucleic acid strand bound by the first gRNA is different from the nucleic acid strand bound by the first index primer. In certain embodiments, the nucleic acid strand bound by the first gRNA is the opposite strand to the nucleic acid strand bound by the first index primer.
  • the first index primer is attached to the first gRNA.
  • the first index primer is covalently linked to the first gRNA with or without a linker.
  • the first index primer is optionally linked to the 3' end of the first gRNA by a linker.
  • the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
  • the first index primer is a single-stranded ribonucleic acid and is connected to the 3' end of the first gRNA, with or without a ribonucleic acid linker, to form a first pegRNA.
  • the complex further comprises a second Cas protein and a second gRNA, wherein the second Cas protein is capable of breaking one nucleic acid strand of a double-stranded target nucleic acid, and the second gRNA can combine with the second Cas protein to form a second functional unit; the second functional unit can bind to a double-stranded target nucleic acid and break one of its strands.
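The pegRNA arrangement described above can be sketched as string assembly. In that arrangement, the guide sequence lies 5' of the scaffold. The index primer carries its tag sequence 5' of its target binding sequence and is attached to the 3' end of the gRNA, with or without a linker. The minimum lengths come from the lower bounds stated in the text. The sequences in any call are placeholders, and the function name is hypothetical.

```python
# Lower bounds (nt) taken from the length ranges stated in the text.
MIN_LENGTHS = {"guide": 5, "scaffold": 20, "tag": 4, "target_binding": 5}


def build_peg_rna(guide: str, scaffold: str, tag: str, target_binding: str,
                  linker: str = "") -> str:
    """Assemble 5'-[guide][scaffold][linker][tag][target binding]-3' as RNA.

    The index primer (tag 5' of target binding) is appended to the 3' end of the
    gRNA, optionally through a linker; DNA letters are converted to RNA.
    """
    parts = {"guide": guide, "scaffold": scaffold,
             "tag": tag, "target_binding": target_binding}
    for name, seq in parts.items():
        if len(seq) < MIN_LENGTHS[name]:
            raise ValueError(f"{name} must be at least {MIN_LENGTHS[name]} nt")
    return (guide + scaffold + linker + tag + target_binding).upper().replace("T", "U")
```

Keeping the parts separate until the final concatenation makes the 5'→3' order explicit. It also lets each component be checked against its stated minimum length.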
  • the second Cas protein is the same or different from the first Cas protein. In certain embodiments, the second Cas protein is identical to the first Cas protein.
  • the second Cas protein can break a nucleic acid strand of a double-stranded target nucleic acid and generate a nick.
  • the second Cas protein is selected from Cas proteins that cut DNA single strands.
  • the second Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas1 protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, C
  • the second Cas protein is a mutant of the Cas9 protein, such as a mutant of the Cas9 protein of Streptococcus pyogenes (S. pyogenes) (SpCas9(H840A)).
  • the second Cas protein has the amino acid sequence shown in SEQ ID NO:3.
  • the second gRNA contains a second guide sequence, and the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a double stranded target nucleic acid under conditions that allow nucleic acid hybridization or annealing.
  • the second guide sequence is different from the first guide sequence.
  • the strand of nucleic acid to which the first guide sequence binds is different than the strand of nucleic acid to which the second guide sequence binds.
  • the strand of nucleic acid to which the first guide sequence binds is the opposite strand to the strand of nucleic acid to which the second guide sequence binds.
  • the second functional unit binds the same double-stranded target nucleic acid as the first functional unit; the double-stranded target nucleic acid comprises a first strand and a second strand; the first functional unit can break the first strand after the first guide sequence binds to the second strand, and the second functional unit can break the second strand after the second guide sequence binds to the first strand.
  • the second functional unit breaks the opposite strand at a position different from that of the first functional unit.
  • the length of the second guide sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second gRNA also contains a second scaffold sequence, which can be recognized and bound by the second Cas protein, thereby forming a second functional unit.
  • the second scaffold sequence is the same as or different from the first scaffold sequence. In certain embodiments, the second scaffold sequence is identical to the first scaffold sequence.
  • the length of the second scaffold sequence is at least 20 nt, such as 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second guide sequence is located upstream or 5' of the second scaffold sequence.
  • the double-stranded target nucleic acid contains the second PAM sequence recognized by the second Cas protein and the second guide binding sequence capable of hybridizing or annealing with the second guide sequence, whereby the second functional unit binds the double-stranded target nucleic acid through the second guide binding sequence and the second PAM sequence.
  • the complex further comprises a template-dependent second DNA polymerase, and the second DNA polymerase complexes with the second Cas protein in a covalent or non-covalent manner.
  • the second DNA polymerase is selected from a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
  • the second DNA polymerase is an RNA-dependent DNA polymerase.
  • the second DNA polymerase is reverse transcriptase, for example from Moloney murine leukemia virus human immunodeficiency virus (HIV), avian sarcoma-leukemia virus (ASLV), Rous sarcoma virus ( RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian granulocytoma virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, Reverse transcriptase of avian sarcoma virus Y73 helper virus, Rous-associated virus and myeloblastosis-associated virus (MAV).
  • the second DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
  • the second DNA polymerase is the same as or different from the first DNA polymerase. In certain embodiments, the second DNA polymerase is the same as the first DNA polymerase.
  • the second Cas protein is covalently linked to the second DNA polymerase, with or without a linker.
  • the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
  • the second Cas protein is fused to the second DNA polymerase, with or without a peptide linker, to form a second fusion protein.
  • the second Cas protein is connected or fused, optionally via a linker, to the N-terminus of the second DNA polymerase; or, the second Cas protein is connected or fused, optionally via a linker, to the C-terminus of the second DNA polymerase.
  • the second fusion protein has the amino acid sequence shown in SEQ ID NO:8.
  • the complex further comprises a second tag primer that hybridizes or anneals to the double-stranded target nucleic acid, the second tag primer containing a second target binding sequence capable of hybridizing or annealing to the double-stranded target nucleic acid.
  • the second tag primer contains a second tag sequence and a second target binding sequence, the second tag sequence being located upstream (5') of the second target binding sequence; and, under conditions that allow nucleic acid hybridization or annealing, the second target binding sequence is capable of hybridizing or annealing to the double-stranded target nucleic acid.
  • the second target binding sequence is capable of hybridizing or annealing to the 3' end of the strand of the double-stranded target nucleic acid nicked by the second functional unit, forming a double-stranded structure.
  • the 3' end is generated when the second functional unit nicks one strand of the double-stranded target nucleic acid.
  • the second tag sequence does not bind to the nicked nucleic acid strand and remains free and single-stranded.
  • the length of the second target binding sequence is at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second target binding sequence is different from the first target binding sequence.
  • the nucleic acid strand to which the second target binding sequence binds is different from the nucleic acid strand to which the first target binding sequence binds.
  • the nucleic acid strand to which the second target binding sequence binds is the opposite strand to the nucleic acid strand to which the first target binding sequence binds.
  • the length of the second tag sequence is at least 4 nt, such as 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
  • the second tag sequence is the same as or different from the first tag sequence. In certain embodiments, the second tag sequence is different from the first tag sequence.
  • the second tag primer binds to the nicked nucleic acid strand through the second target binding sequence.
  • the second DNA polymerase binds to the nicked nucleic acid strand and to the second tag primer.
  • the second tag primer is single-stranded DNA or single-stranded RNA.
  • the second tag primer is single-stranded RNA, and the second DNA polymerase is an RNA-dependent DNA polymerase.
  • the second tag primer is single-stranded DNA, and the second DNA polymerase is a DNA-dependent DNA polymerase.
  • the 3' end of the nicked target nucleic acid strand is extended by the second DNA polymerase, using the second tag primer as the template, to form a second flap.
  • the nucleic acid strand to which the second gRNA binds is different from the nucleic acid strand to which the second tag primer binds. In certain embodiments, the nucleic acid strand bound by the second gRNA is the opposite strand to the nucleic acid strand bound by the second tag primer.
  • the second tag primer is attached to the second gRNA.
  • the second tag primer is covalently linked to the second gRNA, with or without a linker.
  • the second tag primer is linked, optionally via a linker, to the 3' end of the second gRNA.
  • the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
  • the second tag primer is single-stranded RNA and is linked to the 3' end of the second gRNA, with or without a ribonucleic acid linker, to form a second PegRNA.
  • the first and second functional units bind a double-stranded target nucleic acid in a predetermined positional relationship.
  • the second guide sequence binds the same nucleic acid strand as the first target binding sequence; and/or, the first guide sequence binds the same nucleic acid strand as the second target binding sequence.
  • the binding position of the second guide sequence is located upstream (5') of the binding position of the first target binding sequence; and/or, the binding position of the first guide sequence is located upstream (5') of the binding position of the second target binding sequence.
  • the binding position of the second guide sequence is located downstream (3') of the binding position of the first target binding sequence; and/or, the binding position of the first guide sequence is located downstream (3') of the binding position of the second target binding sequence.
  • the double-stranded target nucleic acid is selected from, but not limited to, genomic DNA and nucleic acid carrier DNA.
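The PegRNA arrangement described above can be sketched as simple string assembly. In this arrangement the guide sequence lies 5' of the scaffold, the tag primer is joined to the 3' end of the gRNA with an optional linker, and the tag sequence lies 5' of the target binding sequence. This is a minimal illustration, not the applicant's design method. The helper `assemble_pegrna`, the all-N placeholder scaffold and the example sequences are hypothetical. Reusing the minimum lengths stated for the second unit (guide ≥5 nt, scaffold ≥20 nt, tag ≥4 nt, target binding ≥5 nt) for every unit is also an assumption.

```python
# Minimal sketch (hypothetical helper): assemble a PegRNA 5'->3' as
# [guide][scaffold][optional linker][tag sequence][target binding sequence].
# Minimum lengths follow those stated in the text for the second unit.
MIN_LEN_NT = {"guide": 5, "scaffold": 20, "tag": 4, "target_binding": 5}


def to_rna(seq: str) -> str:
    """Write a DNA-style sequence as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def assemble_pegrna(guide: str, scaffold: str, tag: str,
                    target_binding: str, linker: str = "") -> str:
    parts = {"guide": guide, "scaffold": scaffold,
             "tag": tag, "target_binding": target_binding}
    for name, seq in parts.items():
        if len(seq) < MIN_LEN_NT[name]:
            raise ValueError(
                f"{name} is {len(seq)} nt; at least {MIN_LEN_NT[name]} nt expected")
    # The tag primer (tag sequence + target binding sequence) sits at the gRNA's 3' end.
    return "".join(to_rna(s) for s in (guide, scaffold, linker, tag, target_binding))


# Made-up sequences; "N" * 20 is a placeholder, not a functional Cas scaffold.
peg = assemble_pegrna("ACGTACGTAC", "N" * 20, "ATGCCC", "GTAAGC")
print(peg)
```

The same function covers the first and second PegRNAs, because both follow the same 5'-to-3' layout in the text.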
  • the present application provides a nucleic acid vector (for example, a donor nucleic acid vector) comprising the first PAM sequence recognized by the first Cas protein as described above.
  • the nucleic acid vector further comprises a donor homology arm.
  • the nucleic acid vector is double-stranded.
  • the nucleic acid vector is a circular double-stranded vector.
  • the nucleic acid vector comprises a first guide binding sequence (e.g., the complement of the first guide sequence) capable of hybridizing or annealing to the first guide sequence.
  • the first functional complex is capable of nicking one nucleic acid strand of the nucleic acid vector through the first guide binding sequence and the first PAM sequence.
  • the nucleic acid vector further comprises a nucleic acid sequence of interest.
  • the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site in the genome.
  • the first PAM sequence and the donor homology arm are located on either side of the nucleic acid sequence of interest.
  • the first primer binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
  • the first functional complex nicks the first strand of the nucleic acid vector, so that the first strand comprises a nick; the double-stranded portion located between the 3' end of the nick and the donor homology arm contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.
  • the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand nicked by the first functional complex, forming a double-stranded structure in which the first tag sequence of the first tag primer is in a free state.
  • the nucleic acid strand to which the first target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand comprising the first guide binding sequence.
  • the nucleic acid vector further comprises a first target sequence; under conditions that allow nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target binding sequence to form a double-stranded structure, in which the first tag sequence of the first tag primer is in a free state.
  • the first target sequence is on the opposite strand to the first guide binding sequence.
  • the first target sequence is located at the end of the nicked first strand.
  • the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as the template (preferably, forming the first flap).
  • the nucleic acid vector further comprises a restriction site between the first target sequence and the donor homology arm.
  • the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arms.
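A scan of the vector sequence shows how a PAM and its adjacent nick site might be located on a circular donor vector. The sketch rests on assumptions the text above does not state. It assumes an SpCas9-style NGG PAM and a nickase that cuts the PAM-containing strand 3 nt 5' of the PAM, as in prime editing; the text does not specify the Cas protein, the PAM or the cut position. `find_nick_sites` is a hypothetical helper, and it scans only the strand it is given.

```python
def find_nick_sites(seq: str, protospacer: str, circular: bool = True,
                    pam_tail: str = "GG") -> list:
    """Scan one strand for protospacer + N + pam_tail (an NGG PAM is assumed).

    The protospacer matches the guide sequence on the PAM-containing strand;
    the guide binding sequence is its complement on the opposite strand.
    Returns 0-based positions p where this strand is assumed to be nicked
    between p-1 and p, i.e. 3 nt 5' of the PAM.
    """
    seq, protospacer = seq.upper(), protospacer.upper()
    n, k = len(seq), len(protospacer)
    span = k + 1 + len(pam_tail)  # protospacer + N + PAM tail
    if circular:
        # Append the start of the sequence so sites spanning the origin are found.
        search, starts = seq + seq[:span - 1], range(n)
    else:
        search, starts = seq, range(n - span + 1)
    sites = []
    for i in starts:
        window = search[i:i + span]
        if window[:k] == protospacer and window[k + 1:] == pam_tail:
            sites.append((i + k - 3) % n)
    return sites


# Toy circular vector in which the protospacer + PAM spans the origin.
print(find_nick_sites("GTACGGTTTTAC", "ACGTA"))
```

SpCas9 protospacers are typically 20 nt; the 5-nt protospacer here only keeps the example short. A complete search would repeat the scan on the reverse complement of the vector.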
  • the present application provides a kit comprising the nucleic acid vector described in the tenth aspect and one or more components of the system or kit of the first aspect (for example, the first Cas protein or nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein; the template-dependent first DNA polymerase or nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase; the first gRNA or nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; and the first tag primer or nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer).
  • the kit comprises the following 4 components:
  • the four components are contained in one or more (e.g., 2, 3 or 4) vectors.
  • the kit comprises the following vectors:
  • a first vector comprising nucleic acid molecule A1, containing a nucleotide sequence encoding the first Cas protein, and nucleic acid molecule B1, containing a nucleotide sequence encoding the first DNA polymerase;
  • a second vector comprising nucleic acid molecule C1, containing a nucleotide sequence encoding the first gRNA, and nucleic acid molecule D1, containing a nucleotide sequence encoding the first tag primer.
  • the kit further includes one or more components of the third nucleic acid editing system described in the aforementioned system or kit (for example, (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).
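The two-vector layout of the four-component kit can be recorded as a small data structure and checked for completeness. The component labels A1 to D1 and their assignment to the first and second vectors follow the text. The dictionary keys and the `audit_kit` helper are hypothetical.

```python
# Hypothetical record of the two-vector kit layout described in the text.
KIT_VECTORS = {
    "first_vector": {"A1", "B1"},   # encode the first Cas protein and first DNA polymerase
    "second_vector": {"C1", "D1"},  # encode the first gRNA and first tag primer
}
REQUIRED = {"A1", "B1", "C1", "D1"}


def audit_kit(vectors: dict, required: set):
    """Return (missing, duplicated) component labels for a vector layout."""
    placed = [label for labels in vectors.values() for label in labels]
    missing = required - set(placed)
    duplicated = {label for label in placed if placed.count(label) > 1}
    return missing, duplicated


print(audit_kit(KIT_VECTORS, REQUIRED))
```

Adding A2 to D2 to `REQUIRED` would apply the same check to the eight-component kit.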
  • the present application provides a nucleic acid vector, the nucleic acid vector (for example, a donor nucleic acid vector) comprising the first PAM sequence recognized by the first Cas protein as described above.
  • the nucleic acid vector further comprises a second PAM sequence recognized by the second Cas protein as described above.
  • the nucleic acid vector is double-stranded.
  • the nucleic acid vector is a circular double-stranded vector.
  • the nucleic acid vector comprises a first guide binding sequence (e.g., the complement of the first guide sequence) capable of hybridizing or annealing to the first guide sequence, and/or a second guide binding sequence (e.g., the complement of the second guide sequence) capable of hybridizing or annealing to the second guide sequence.
  • the nucleic acid vector further comprises a restriction site between the first primer binding sequence and the second primer binding sequence.
  • the first primer binding sequence and the second primer binding sequence are located on opposite strands of the nucleic acid vector.
  • the first functional complex can nick one nucleic acid strand (the first strand) of the nucleic acid vector through the first guide binding sequence and the first PAM sequence; and/or, the second functional complex can nick the other nucleic acid strand (the second strand) of the nucleic acid vector through the second guide binding sequence and the second PAM sequence.
  • the nucleic acid vector further comprises a nucleic acid sequence of interest.
  • the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site in the genome.
  • the first PAM sequence and the second PAM sequence are located on either side of the nucleic acid sequence of interest.
  • the first primer binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
  • the second primer binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence.
  • the first functional complex and the second functional complex nick the first strand and the second strand of the nucleic acid vector, respectively, so that each strand comprises a nick; the double-stranded portion located between the 3' ends of the two nicks contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.
  • the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand nicked by the first functional complex, forming a double-stranded structure in which the first tag sequence of the first tag primer is in a free state.
  • the nucleic acid strand to which the first target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand comprising the first guide binding sequence.
  • under conditions that allow nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing, via the second target binding sequence, to the 3' end of the nucleic acid strand nicked by the second functional complex, forming a double-stranded structure in which the second tag sequence of the second tag primer is in a free state.
  • the nucleic acid strand to which the second target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand comprising the second guide binding sequence.
  • the nucleic acid strand to which the first target binding sequence hybridizes or anneals is the opposite strand to the nucleic acid strand to which the second target binding sequence hybridizes or anneals.
  • the nucleic acid vector further comprises a first target sequence; under conditions that allow nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target binding sequence to form a double-stranded structure, in which the first tag sequence of the first tag primer is in a free state.
  • the first target sequence is on the opposite strand to the first guide binding sequence.
  • the first target sequence is located at the end of the nicked first strand.
  • the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as the template (preferably, forming the first flap).
  • the nucleic acid vector also includes a second target sequence; wherein, under conditions that allow nucleic acid hybridization or annealing, the second index primer can hybridize or anneal to the second target sequence through the second target binding sequence to form double-stranded structure, and the second tag sequence of the second tag primer is in a free state.
  • the second target sequence is located on the opposite strand to the second guide binding sequence.
  • the second target sequence is located at the end of the nicked second strand.
  • the 3' end of the nucleic acid strand containing the second target sequence can be extended using the second tag primer annealed to the second target sequence as the template (preferably, forming the second flap).
  • the nucleic acid strand comprising the first target sequence is on the opposite strand to the nucleic acid strand comprising the second target sequence.
  • the nucleic acid vector further comprises a restriction site between the first target sequence and the second target sequence.
  • the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.
  • the present application provides a kit comprising the nucleic acid vector described in the twelfth aspect; one or more components of the system or kit described in the first aspect (for example, the first Cas protein or nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein, the template-dependent first DNA polymerase or nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase, the first gRNA or nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA, and the first tag primer or nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer); and one or more components of the system or kit described in one aspect (for example, the second gRNA or nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA, the second Cas protein or the nucleic acid molecule A2 encoding
  • the kit comprises the following 8 components:
  • the eight components are contained in one or more (e.g., 2, 3, 4, 5, 6, 7 or 8) vectors.
  • the kit comprises the following vectors:
  • a first vector comprising nucleic acid molecule A1, containing a nucleotide sequence encoding the first Cas protein, and nucleic acid molecule B1, containing a nucleotide sequence encoding the first DNA polymerase;
  • a second vector comprising nucleic acid molecule C1, containing a nucleotide sequence encoding the first gRNA, and nucleic acid molecule D1, containing a nucleotide sequence encoding the first tag primer;
  • a fourth vector comprising nucleic acid molecule D2, containing a nucleotide sequence encoding the second tag primer, and nucleic acid molecule B2, containing a nucleotide sequence encoding the second DNA polymerase.
  • the kit further includes one or more components of the third nucleic acid editing system described in the aforementioned system or kit (for example, (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).
  • the present application provides a method for nicking one nucleic acid strand of a double-stranded target nucleic acid and adding a flap at the 3' end of the nick, wherein the method comprises using the aforementioned system or kit.
  • the method comprises the steps of:
  • (i) providing the components; and (ii) allowing the following to occur:
  • the first Cas protein and the first gRNA combine to form a first functional complex, which nicks one nucleic acid strand of the double-stranded target nucleic acid;
  • the first tag primer hybridizes or anneals, via the first target binding sequence, to the 3' end of the nicked nucleic acid strand; and
  • the first DNA polymerase extends the nicked nucleic acid strand, using the first tag primer annealed to it as the template, to form a first flap.
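The annealing and extension steps above can be modeled at the sequence level. The target binding sequence pairs with the 3'-terminal bases of the nicked strand, and the polymerase copies the tag sequence, so the first flap is the reverse complement of the tag sequence written as DNA. The sketch assumes exact Watson-Crick pairing with no mismatches and assumes the polymerase copies the entire tag sequence. `extend_nicked_strand` and the example sequences are hypothetical.

```python
# Complement map: DNA/RNA template bases -> complementary DNA base (U pairs with A).
_COMP = str.maketrans("ACGTU", "TGCAA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, written as DNA 5'->3'."""
    return seq.upper().translate(_COMP)[::-1]


def extend_nicked_strand(upstream: str, tag_primer: str, tbs_len: int):
    """Extend the 3' end of a nicked strand using a tag primer as template.

    upstream: nicked strand 5'->3', ending at the nick (its 3' end).
    tag_primer: 5'-[tag sequence][target binding sequence]-3' (RNA or DNA).
    tbs_len: length of the target binding sequence at the primer's 3' end.
    Returns (extended strand, flap).
    """
    if not 0 < tbs_len < len(tag_primer):
        raise ValueError("tbs_len must leave a non-empty tag sequence")
    tag, tbs = tag_primer[:-tbs_len], tag_primer[-tbs_len:]
    # Annealing: the target binding sequence must pair with the strand's 3'-terminal bases.
    if revcomp_dna(tbs) != upstream[-tbs_len:].upper():
        raise ValueError("target binding sequence does not pair with the nicked 3' end")
    # Extension: the polymerase reads the tag 3'->5' and synthesizes its complement.
    flap = revcomp_dna(tag)
    return upstream.upper() + flap, flap


strand, flap = extend_nicked_strand("GATTACAGGCTTAC", "AUGCCCGUAAGC", 6)
print(strand, flap)
```

An RNA tag primer is used in the example, matching the embodiment that pairs an RNA tag primer with an RNA-dependent DNA polymerase. A DNA tag primer gives the same flap.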
  • the methods are performed intracellularly.
  • in step (i), the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell.
  • in step (i), nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell.
  • nucleic acid molecules A1, B1, C1 and D1 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell.
  • nucleic acid molecule A1 and nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • nucleic acid molecules A1 and B1 can express the first Cas protein and the first DNA polymerase in the cell as separate proteins, or can express a first fusion protein comprising the first Cas protein and the first DNA polymerase.
  • a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed there to provide the first Cas protein and the first DNA polymerase in the cell.
  • nucleic acid molecule C1 and nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in the cell into a first PegRNA containing the first gRNA and the first tag primer.
  • the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer in the cell; or, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into the cell and transcribed there into the first PegRNA, thereby providing the first gRNA and the first tag primer in the cell.
  • in step (i), the nucleic acid molecule(s) capable of expressing the separate first Cas protein and first DNA polymerase, or the nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, together with the nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, are delivered into the cell and transcribed and expressed there, thereby providing the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell.
  • in step (i), the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into the cell to provide the double-stranded target nucleic acid within the cell.
  • the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer are as defined above.
  • the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein.
  • the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA, and nicks one of its strands.
  • the present application provides a method for separately nicking the two strands of a double-stranded target nucleic acid and adding a flap at the 3' end of each of the two resulting nicks, wherein the method comprises using the system or kit as described above, and wherein the first double-stranded target nucleic acid is the same as the second double-stranded target nucleic acid.
  • the method is used to cleave the two nucleic acid strands of a double-stranded target nucleic acid at different positions, respectively.
  • the methods are performed extracellularly or intracellularly.
  • the method comprises the steps of:
  • (i) providing the components; and (ii) allowing the following to occur:
  • the first Cas protein and the first gRNA combine to form a first functional complex;
  • the second Cas protein and the second gRNA combine to form a second functional complex;
  • the first functional complex and the second functional complex nick the first strand and the second strand of the double-stranded target nucleic acid, respectively, so that each strand comprises a nick; the double-stranded portion located between the 3' ends of the two nicks is referred to as target nucleic acid fragment F1; and,
  • the first tag primer hybridizes or anneals, via the first target binding sequence, to the 3' end of one strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the nick); and the second tag primer hybridizes or anneals, via the second target binding sequence, to the 3' end of the other strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the nick); and,
  • the first DNA polymerase and the second DNA polymerase perform extension reactions using, respectively, the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, so that the 3' ends of the first strand and the second strand generated by the nicks are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion bearing the first flap and the second flap is referred to as target nucleic acid fragment F2.
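The two-nick steps above can be sketched in the same way, tracking F1 and F2 as pairs of strands written 5'→3'. F1 is defined as the double-stranded portion between the two 3' ends. The sketch therefore assumes that, in first-strand coordinates, the first-strand nick lies to the right of the second-strand nick. Each 3' end then faces outward, and each flap becomes a 3' single-stranded extension at one end of F2. Nick positions are inputs rather than computed, each polymerase is assumed to copy its whole tag sequence, and `flanked_fragment` and the example sequences are hypothetical.

```python
# Complement map: DNA/RNA template bases -> complementary DNA base (U pairs with A).
_COMP = str.maketrans("ACGTU", "TGCAA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, written as DNA 5'->3'."""
    return seq.upper().translate(_COMP)[::-1]


def flanked_fragment(first_strand: str, nick_second: int, nick_first: int,
                     tag1: str, tag2: str):
    """Return (F1, F2), each as (first-strand piece, second-strand piece), 5'->3'.

    nick_first: the first strand is cut between nick_first-1 and nick_first,
        so its 3' end is at nick_first-1 (first-strand coordinates).
    nick_second: the second strand is cut between nick_second-1 and
        nick_second, so its 3' end is at nick_second.
    F1 is only the duplex between the two 3' ends; flanking DNA is not modeled.
    """
    if not 0 < nick_second < nick_first < len(first_strand):
        raise ValueError("expected 0 < nick_second < nick_first < len(first_strand)")
    core_first = first_strand[nick_second:nick_first].upper()
    core_second = revcomp_dna(core_first)
    f1 = (core_first, core_second)
    # Each 3' end is extended by copying the tag sequence of its tag primer.
    f2 = (core_first + revcomp_dna(tag1), core_second + revcomp_dna(tag2))
    return f1, f2


f1, f2 = flanked_fragment("TTTTGATTACAGCCCC", 4, 12, "AUGCCC", "CCAAUU")
print(f1)
print(f2)
```

Under these assumptions the first flap is a 3' overhang on the first-strand piece and the second flap a 3' overhang on the second-strand piece, at opposite ends of F2.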
  • the methods are performed intracellularly.
  • nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A2, nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer in the cell.
  • nucleic acid molecules A1, B1, C1, D1, A2, B2, C2 and D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer in the cell.
  • nucleic acid molecule A1 and nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • nucleic acid molecules A1 and B1 can express the first Cas protein and the first DNA polymerase in the cell as separate proteins, or can express a first fusion protein comprising the first Cas protein and the first DNA polymerase.
  • a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed there to provide the first Cas protein and the first DNA polymerase in the cell.
  • nucleic acid molecule A2 and nucleic acid molecule B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • nucleic acid molecules A2 and B2 can express the second Cas protein and the second DNA polymerase in the cell as separate proteins, or can express a second fusion protein comprising the second Cas protein and the second DNA polymerase.
  • a nucleic acid molecule capable of expressing the separate second Cas protein and second DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the second fusion protein, is delivered into the cell and expressed there to provide the second Cas protein and the second DNA polymerase in the cell.
  • nucleic acid molecule C1 and nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in the cell into a first PegRNA containing the first gRNA and the first tag primer.
  • the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer in the cell; or, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into the cell and transcribed there into the first PegRNA, thereby providing the first gRNA and the first tag primer in the cell.
  • nucleic acid molecule C2 and nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, nucleic acid molecules C2 and D2 can be transcribed in the cell into a second PegRNA containing the second gRNA and the second tag primer.
  • the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer in the cell; or, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into the cell and transcribed there into the second PegRNA, thereby providing the second gRNA and the second tag primer in the cell.
  • in step (i), the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into the cell to provide the double-stranded target nucleic acid within the cell.
  • the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer are as defined above.
  • the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer are as defined above.
  • the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein.
  • the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA and nicks one of its strands; and the second functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and nicks the other strand.
  • the second Cas protein is identical to the first Cas protein, and the second DNA polymerase is identical to the first DNA polymerase; the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase performs extension reactions using the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, so that the 3' ends of the first strand and the second strand generated by the nicks are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion bearing the first flap and the second flap is referred to as target nucleic acid fragment F2.
  • the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tagging primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA and the second tagging primer in the cell.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tagging primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA and the second tagging primer in the cell.
  • the nucleic acid molecules A1, B1, C1, D1, C2 and D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA and the second tagging primer in the cell.
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 can express the first Cas protein and the first DNA polymerase in the cell as separate proteins, or can express a first fusion protein comprising the first Cas protein and the first DNA polymerase.
  • a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase in the cell.
  • the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 can be transcribed in the cell into a first PegRNA containing the first gRNA and the first tagging primer.
  • the first PegRNA is delivered into the cell to provide the first gRNA and the first tagging primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into the cell, and the first PegRNA is transcribed in the cell, thereby providing the first gRNA and the first tagging primer in the cell.
  • the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 can be transcribed in the cell into a second PegRNA containing the second gRNA and the second tagging primer.
  • the second PegRNA is delivered into the cell to provide the second gRNA and the second tagging primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into the cell, and the second PegRNA is transcribed in the cell, thereby providing the second gRNA and the second tagging primer in the cell.
  • in step i, the nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or the nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, together with the nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA and the nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, are delivered into the cell and transcribed and expressed there, thereby providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA and the second tagging primer in the cell.
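The lettering of the delivered nucleic acid molecules is easy to lose track of, so the correspondence stated in the text can be kept in a lookup table. In the hypothetical sketch below, A1/B1/C1/D1 provide the first Cas protein, first DNA polymerase, first gRNA and first tagging primer; C2/D2 the second gRNA and second tagging primer; A2/B2 the second Cas protein and second DNA polymerase; and A3/A4 the third and fourth nucleic acid editing systems. The keys `fusion`, `pegRNA1` and `pegRNA2` are invented labels for a molecule encoding the first fusion protein and molecules encoding the first and second PegRNAs. The function reports which required components a delivery set fails to provide, using the component list of the embodiment in which one Cas protein and one DNA polymerase are shared.

```python
# Hypothetical lookup: which component(s) each delivered molecule provides.
PROVIDES = {
    "A1": {"first Cas protein"},
    "B1": {"first DNA polymerase"},
    "C1": {"first gRNA"},
    "D1": {"first tagging primer"},
    "A2": {"second Cas protein"},
    "B2": {"second DNA polymerase"},
    "C2": {"second gRNA"},
    "D2": {"second tagging primer"},
    "A3": {"third nucleic acid editing system"},
    "A4": {"fourth nucleic acid editing system"},
    # A single molecule may provide several components.
    "fusion": {"first Cas protein", "first DNA polymerase"},
    "pegRNA1": {"first gRNA", "first tagging primer"},
    "pegRNA2": {"second gRNA", "second tagging primer"},
}

# Components required when the second Cas protein and DNA polymerase
# are identical to the first ones (one nickase, two PegRNAs).
REQUIRED_SHARED_CAS = frozenset({
    "first Cas protein", "first DNA polymerase",
    "first gRNA", "first tagging primer",
    "second gRNA", "second tagging primer",
})


def provided(delivered):
    """Union of the components provided by the delivered molecules."""
    out = set()
    for name in delivered:
        out |= PROVIDES[name]
    return out


def missing(delivered, required=REQUIRED_SHARED_CAS):
    """Required components that the delivery set does not provide."""
    return set(required) - provided(delivered)
```

With this table, both the six-molecule delivery (A1, B1, C1, D1, C2, D2) and the three-molecule delivery (fusion plus two PegRNAs) leave nothing missing.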
  • the present application provides a method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest; the method comprises using the system or kit described in the eleventh aspect, wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and provides the target nucleic acid fragment, which is located between the 3' end produced by the break in the first strand of the double-stranded target nucleic acid and the 3' end produced by the break in the second strand; and the third double-stranded target nucleic acid is the nucleic acid molecule of interest.
  • the method comprises:
  • the first strand and the second strand of the first double-stranded target nucleic acid are each broken, so that each strand comprises a nick generated by the break; the double-stranded portion located between the 3' ends of the two nicks is called target nucleic acid fragment F1; a first flap and a second flap are added to these two 3' ends, respectively, and the resulting double-stranded portion carrying the first and second flaps is called target nucleic acid fragment F2.
  • the methods are performed extracellularly or intracellularly.
  • step a is performed outside or within the cell, and steps b and c are performed within the cell.
  • the method comprises the steps of:
  • step ii in step i:
  • the first Cas protein is combined with the first gRNA to form a first functional complex
  • the second Cas protein is combined with the second gRNA to form a second functional complex
  • the first and second functional complexes break the first strand and the second strand of the double-stranded target nucleic acid, respectively, so that each strand comprises a nick generated by the break; the double-stranded portion between the 3' ends of the two nicks is called target nucleic acid fragment F1;
  • the third nucleic acid editing system breaks the nucleic acid molecule of interest to form nucleotide fragments a1 and a2;
  • the first tagging primer hybridizes or anneals, through the first target binding sequence, to the 3' end of one nucleic acid strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the break); and the second tagging primer hybridizes or anneals, through the second target binding sequence, to the 3' end of the other nucleic acid strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the break); and,
  • the first DNA polymerase and the second DNA polymerase use the first tagging primer and the second tagging primer annealed to target nucleic acid fragment F1, respectively, as templates for extension reactions, so that the 3' ends produced by the breaks in the first strand and the second strand are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion carrying the first and second flaps is called target nucleic acid fragment F2, wherein the first flap and the second flap are capable of hybridizing or annealing to nucleotide fragments a1 and a2, respectively; and,
  • target nucleic acid fragment F2 hybridizes or anneals to nucleotide fragments a1 and a2 through the first flap and the second flap, respectively, and is then inserted or ligated between nucleotide fragments a1 and a2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
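A toy string model can make the junction logic of this insertion step concrete. It simplifies heavily, and each simplification is an assumption rather than something the text states: sequences are written as top strands 5'->3' and strand polarity is not tracked; F2's top strand is taken to be a first flap region, the core target fragment and a second flap region; "annealing" means the flap region exactly matches the sequence immediately flanking the break (no spacer); and the flap regions merge into the junctions, so the product is a1 + core + a2. `insert_f2` is a hypothetical name.

```python
def insert_f2(genome: str, break_at: int, f2_top: str, left: int, right: int) -> str:
    """Toy model of inserting fragment F2 at a break in the molecule of interest.

    genome: top strand of the nucleic acid of interest, 5'->3'.
    break_at: index where the third editing system breaks it (a1 | a2).
    f2_top: first flap region + core target fragment + second flap region.
    left, right: lengths of the two flap regions.
    """
    a1, a2 = genome[:break_at], genome[break_at:]
    left_flap = f2_top[:left]
    core = f2_top[left:len(f2_top) - right]
    right_flap = f2_top[len(f2_top) - right:]
    # Each flap must match the sequence flanking its side of the break.
    if not a1.endswith(left_flap):
        raise ValueError("first flap cannot anneal to a1")
    if not a2.startswith(right_flap):
        raise ValueError("second flap cannot anneal to a2")
    # The flap regions coincide with a1/a2 sequence, so only the core is new.
    return a1 + core + a2
```

For instance, breaking `AAAACCCCGGGGTTTT` after position 8 and supplying `CCCATATATGGG` with 3-nt flap regions yields `AAAACCCCATATATGGGGTTTT`.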
  • the first flap can hybridize or anneal to the 3' end or 3' portion of one nucleic acid strand of nucleotide fragment a1, the 3' end or 3' portion having been formed by the third nucleic acid editing system cleaving the nucleic acid molecule of interest.
  • the complementary sequence of the first tag sequence, or the first flap, can hybridize or anneal to the 3' portion of one nucleic acid strand of nucleotide fragment a1, and a first spacer region lies between this 3' portion of nucleotide fragment a1 and the break end formed in the third double-stranded target nucleic acid.
  • the length of the first spacer region is 1 nt-200 nt, such as 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt or 100-200 nt.
  • the second flap can hybridize or anneal to the 3' end or 3' portion of one nucleic acid strand of nucleotide fragment a2, the 3' end or 3' portion having been formed by the third nucleic acid editing system cleaving the nucleic acid molecule of interest.
  • the complementary sequence of the second tag sequence, or the second flap, can hybridize or anneal to the 3' portion of one nucleic acid strand of nucleotide fragment a2, and a second spacer region lies between this 3' portion of nucleotide fragment a2 and the break end formed in the third double-stranded target nucleic acid.
  • the length of the second spacer region is 1 nt-200 nt, such as 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt or 100-200 nt.
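The two spacer lengths follow directly from sequence positions. A minimal sketch, assuming (hypothetically) that each flap-binding region is supplied as a top-strand subsequence occurring exactly once in its fragment, that a1 ends at the break and that a2 begins at it; the helper names are invented, and the only number taken from the text is the 1 nt to 200 nt range.

```python
MIN_SPACER, MAX_SPACER = 1, 200  # range given for both spacer regions


def _unique_index(seq: str, site: str) -> int:
    """Start index of site in seq; overlapping hits are counted."""
    hits = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one match, found {len(hits)}")
    return hits[0]


def spacer_a1(a1: str, site: str) -> int:
    """Bases between the flap-binding site in a1 and a1's break end (right end)."""
    start = _unique_index(a1, site)
    return len(a1) - (start + len(site))


def spacer_a2(a2: str, site: str) -> int:
    """Bases between a2's break end (left end) and the flap-binding site."""
    return _unique_index(a2, site)


def spacer_ok(length: int) -> bool:
    """Whether a spacer length falls within 1 nt to 200 nt."""
    return MIN_SPACER <= length <= MAX_SPACER
```

A binding site that ends exactly at the break gives a spacer of 0, which `spacer_ok` rejects because the stated range starts at 1 nt.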
  • the methods are performed intracellularly.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tagging primer or nucleic acid molecule D2, and the nucleic acid molecule A3 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tagging primer and the third nucleic acid editing system in the cell.
  • the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2 and A3 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tagging primer and the third nucleic acid editing system in the cell.
  • in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into a cell to provide the double-stranded target nucleic acid within the cell.
  • the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein.
  • the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA, and breaks one strand thereof; and the second functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA, and breaks the other strand thereof.
  • the nucleic acid molecule of interest is genomic DNA of the cell.
  • the first Cas protein, the first gRNA, the first DNA polymerase or the first tagging primer is as defined above.
  • the second Cas protein, the second gRNA, the second DNA polymerase or the second tagging primer is as defined above.
  • said third nucleic acid editing system is as defined above.
  • the third nucleic acid editing system is as defined above, and the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein.
  • the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA, and breaks it.
  • the first and second Cas proteins are identical and are selected from Cas proteins that cleave a single DNA strand, and the second DNA polymerase is identical to the first DNA polymerase; wherein the first Cas protein forms first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase uses the first tagging primer and the second tagging primer annealed to target nucleic acid fragment F1 as templates for extension reactions, forming target nucleic acid fragment F2 having a first flap and a second flap.
  • the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tagging primer or nucleic acid molecule D2, and the third nucleic acid editing system or the nucleic acid molecule A3 encoding it are delivered into the cell, so that the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA, the second tagging primer and the third nucleic acid editing system are provided in the cell.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tagging primer or nucleic acid molecule D2, and the third nucleic acid editing system or the nucleic acid molecule A3 encoding it are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA, the second tagging primer and the third nucleic acid editing system in the cell.
  • the nucleic acid molecules A1, B1, C1, D1, C2, D2 and A3 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA, the second tagging primer and the third nucleic acid editing system in the cell.
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 can express the first Cas protein and the first DNA polymerase in the cell as separate proteins, or can express a first fusion protein comprising the first Cas protein and the first DNA polymerase.
  • a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase in the cell.
  • the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 can be transcribed in the cell into a first PegRNA containing the first gRNA and the first tagging primer.
  • the first PegRNA is delivered into the cell to provide the first gRNA and the first tagging primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into the cell, and the first PegRNA is transcribed in the cell, thereby providing the first gRNA and the first tagging primer in the cell.
  • the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 can be transcribed in the cell into a second PegRNA containing the second gRNA and the second tagging primer.
  • the second PegRNA is delivered into the cell to provide the second gRNA and the second tagging primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into the cell, and the second PegRNA is transcribed in the cell, thereby providing the second gRNA and the second tagging primer in the cell.
  • the nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or the nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, together with the nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, the nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, and the nucleic acid molecule containing a sequence encoding the third nucleic acid editing system, are delivered into the cell and transcribed and expressed there, thereby providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second gRNA, the second tagging primer and the third nucleic acid editing system in the cell.
  • the present application provides a method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest; the method comprises using the system or kit described in the thirteenth aspect, wherein the first double-stranded target nucleic acid provides the target nucleic acid fragment, which is located between the 3' end produced by the break in the first strand of the double-stranded target nucleic acid and the donor homology arm; and the third double-stranded target nucleic acid is the nucleic acid molecule of interest;
  • the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and is contained in the nucleic acid vector described in the twelfth aspect.
  • the method comprises:
  • the first strand of the first double-stranded target nucleic acid is broken, so that the first strand comprises a nick generated by the break; the portion of the first strand located between the 3' end of the nick and the donor homology arm is called target nucleic acid strand S1;
  • a first flap is added to the above 3' end, and the resulting first-strand portion carrying the first flap is called target nucleic acid strand S2;
  • target nucleic acid strand S2 hybridizes or anneals to the first strand of nucleotide fragment a1 through the first flap; an extension reaction using target nucleic acid strand S2 as a template forms extension strand E1, which comprises the complementary sequence of target nucleic acid strand S2 and the complementary sequence of the donor homology arm flanking S2; extension strand E1 is joined to a2 through the donor homology arm, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
  • the 3' end of the first strand of nucleotide fragment a1 comprises the complementary sequence of the first flap, and the 3' end of the second strand of nucleotide fragment a1 comprises a sequence corresponding to the first flap.
  • the break end of nucleotide fragment a2 comprises a target-site homology arm.
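The single-flap route, in which extension strand E1 copies S2 and the donor homology arm and then joins a2, can also be traced with strings. Assumptions not stated in the text: the donor strand is given 5'->3' as donor homology arm + S1 + first flap; a1 and a2 are given as genomic top strands, and the top strand of a1 is the one whose 3' end pairs with the flap and is extended; pairing is exact Watson-Crick complementarity; and the target-site homology arm at the break end of a2 is identical to the arm copy synthesized at the end of E1. `single_flap_insert` is a hypothetical name, and ligation and repair are not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def single_flap_insert(a1_top: str, a2_top: str, donor: str,
                       arm_len: int, flap_len: int) -> str:
    """Toy model: a1's 3' end anneals to the flap of S2, is extended along
    S1 and the donor homology arm (forming E1), and E1 joins a2 through
    the target-site homology arm at a2's break end.

    donor: 5'->3' donor homology arm + S1 + first flap.
    Returns the top strand of the edited nucleic acid molecule of interest.
    """
    if arm_len <= 0 or flap_len <= 0:
        raise ValueError("arm and flap lengths must be positive")
    arm = donor[:arm_len]
    flap = donor[len(donor) - flap_len:]
    s1 = donor[arm_len:len(donor) - flap_len]
    # The first flap pairs with the 3' end of a1's first strand.
    if revcomp(flap) != a1_top[-flap_len:]:
        raise ValueError("first flap does not anneal to a1")
    # Newly synthesized part of E1: complement of S1, then of the donor arm.
    e1_new = revcomp(arm + s1)
    if not a2_top.startswith(e1_new[-arm_len:]):
        raise ValueError("E1 cannot join a2 through the homology arm")
    # The arm copy in E1 and the target-site arm in a2 are the same bases.
    return a1_top + e1_new[:-arm_len] + a2_top
```

In the test values, `a1` ends in `CCGT`, which pairs with the flap `ACGG`, and the arm copy `TTG` at the end of E1 matches the start of `a2`, giving `AAAACCGTAGCCTTGCAAAA`.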
  • the methods are performed extracellularly or intracellularly.
  • step a is carried out outside or within the cell, and steps b, c and d are carried out within the cell.
  • the method comprises the steps of:
  • a double-stranded target nucleic acid and a nucleic acid molecule of interest, the double-stranded target nucleic acid comprising a donor homology arm, the first PAM sequence recognized by the first Cas protein, and a sequence recognized by the first gRNA (preferably, the double-stranded target nucleic acid comprises a donor homology arm, the first PAM sequence recognized by the first Cas protein, and a sequence recognized by the first guide sequence contained in the first gRNA); and
  • step ii in step i:
  • the first Cas protein and the first gRNA combine to form a first functional complex
  • the first functional complex breaks the first strand of the double-stranded target nucleic acid, so that the first strand comprises a nick generated by the break; the portion of the first strand between the 3' end of the nick and the donor homology arm is called target nucleic acid strand S1;
  • the third nucleic acid editing system breaks the nucleic acid molecule of interest to form nucleotide fragments a1 and a2;
  • the first tagging primer hybridizes or anneals, through the first target binding sequence, to the 3' end of target nucleic acid strand S1 (i.e., the 3' end produced by the break);
  • the first DNA polymerase uses the first tagging primer annealed to target nucleic acid strand S1 as a template for an extension reaction, so that the 3' end of the first strand produced by the break is extended to form a first flap; the resulting first-strand portion carrying the first flap is called target nucleic acid strand S2, wherein the first flap is capable of hybridizing or annealing to nucleotide fragment a1; and,
  • target nucleic acid strand S2 hybridizes or anneals to the first strand of nucleotide fragment a1 through the first flap, so that target nucleic acid strand S2 is joined between the second strand of nucleotide fragment a1 and the first strand of nucleotide fragment a2;
  • an extension reaction is performed at the 3' end of the first strand of nucleotide fragment a1, using target nucleic acid strand S2 as a template, to form extension strand E1, which comprises the complementary sequence of target nucleic acid strand S2 and the complementary sequence of the donor homology arm flanking S2; extension strand E1 anneals to the first strand of a2 through the donor homology arm, so that E1 forms a double-stranded structure bridging the first strand of nucleotide fragment a1 and the first strand of nucleotide fragment a2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
  • the first flap can hybridize or anneal to the 3' end or 3' portion of one nucleic acid strand of nucleotide fragment a1, the 3' end or 3' portion having been formed by the third nucleic acid editing system cleaving the nucleic acid molecule of interest.
  • the complementary sequence of the first tag sequence, or the first flap, can hybridize or anneal to the 3' portion of one nucleic acid strand of nucleotide fragment a1, and a first spacer region lies between this 3' portion of nucleotide fragment a1 and the break end formed in the third double-stranded target nucleic acid.
  • the length of the first spacer region is 1 nt-200 nt, such as 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt or 100-200 nt.
  • the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, and the third nucleic acid editing system or the nucleic acid molecule A3 encoding it are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer and the third nucleic acid editing system in the cell;
  • the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tagging primer or nucleic acid molecule D1 are contacted with the double-stranded target nucleic acid outside the cell, and the edited double-stranded target nucleic acid and the third nucleic acid editing system or the nucleic acid molecule A3 encoding it are then delivered into the cell, so as to provide in the cell the double-stranded target nucleic acid carrying the first flap and the third nucleic acid editing system.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, and the nucleic acid molecule A3 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer and the third nucleic acid editing system in the cell.
  • the nucleic acid molecules A1, B1, C1, D1 and A3 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer and the third nucleic acid editing system in the cell.
  • in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into a cell to provide the double-stranded target nucleic acid within the cell.
  • the double-stranded target nucleic acid or nucleic acid molecule T contains the first PAM sequence recognized by the first Cas protein and a donor homology arm.
  • the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA, and breaks one strand thereof.
  • the nucleic acid molecule of interest is genomic DNA of the cell.
  • the first Cas protein, the first gRNA, the first DNA polymerase or the first tagging primer is as defined above.
  • said third nucleic acid editing system is as defined above.
  • the third nucleic acid editing system is as defined above, and the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein.
  • the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA, and breaks it.
  • the present application provides a method for replacing a nucleotide fragment in a nucleic acid molecule of interest with a target nucleic acid fragment; the method comprises using the aforementioned system or kit;
  • the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and provides the target nucleic acid fragment, which is located between the nick produced by the break in the first strand of the double-stranded target nucleic acid and the nick produced by the break in the second strand; and the third double-stranded target nucleic acid is identical to the fourth double-stranded target nucleic acid and is the nucleic acid molecule of interest.
  • the method comprises:
  • the first strand and the second strand of the first double-stranded target nucleic acid are each broken, so that each strand comprises a nick generated by the break; the double-stranded portion located between the 3' ends of the two nicks is called target nucleic acid fragment F1;
  • a first flap and a second flap are added to these two 3' ends, respectively, and the resulting double-stranded portion carrying the first and second flaps is called target nucleic acid fragment F2;
  • the nucleic acid molecule of interest is broken by the third and fourth nucleic acid editing systems to form nucleotide fragments a1, a2 and a3, wherein, before breakage, nucleotide fragments a1, a2 and a3 are arranged in that order in the nucleic acid molecule of interest (i.e., nucleotide fragment a1 is connected to nucleotide fragment a3 through nucleotide fragment a2); and,
  • the method comprises the steps of:
  • nucleic acid molecule of interest with a third nucleic acid editing system and a fourth nucleic acid editing system.
  • step ii in step i:
  • the first Cas protein is combined with the first gRNA to form a first functional complex
  • the second Cas protein is combined with the second gRNA to form a second functional complex
  • the first and second functional complexes break the first strand and the second strand of the double-stranded target nucleic acid, respectively, so that each strand comprises a nick generated by the break; the double-stranded portion between the 3' ends of the two nicks is called target nucleic acid fragment F1;
  • the third and fourth nucleic acid editing systems break the nucleic acid molecule of interest to form nucleotide fragments a1, a2 and a3;
  • the first tagging primer hybridizes or anneals, through the first target binding sequence, to the 3' end of one nucleic acid strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the break); and the second tagging primer hybridizes or anneals, through the second target binding sequence, to the 3' end of the other nucleic acid strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the break); and,
  • the first DNA polymerase and the second DNA polymerase use the first tagging primer and the second tagging primer annealed to target nucleic acid fragment F1, respectively, as templates for extension reactions, so that the 3' ends produced by the breaks in the first strand and the second strand are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion carrying the first and second flaps is called target nucleic acid fragment F2, wherein the first flap and the second flap are capable of hybridizing or annealing to nucleotide fragments a1 and a3, respectively; and,
  • target nucleic acid fragment F2 hybridizes or anneals to nucleotide fragments a1 and a3 through the first flap and the second flap, respectively, and is then ligated between nucleotide fragments a1 and a3, so that nucleotide fragment a2 in the nucleic acid molecule of interest is replaced by the target nucleic acid fragment.
  • the first flap can hybridize or anneal to the 3' end or 3' portion of one nucleic acid strand of nucleotide fragment a1, the 3' end or 3' portion having been formed by the third nucleic acid editing system cleaving the nucleic acid molecule of interest.
  • the second flap can hybridize or anneal to the 3' end or 3' portion of one nucleic acid strand of nucleotide fragment a3, the 3' end or 3' portion having been formed by the fourth nucleic acid editing system cleaving the nucleic acid molecule of interest.
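For the replacement route, a toy model needs two break positions instead of one. The sketch assumes, beyond the text, that sequences are top strands 5'->3' with strand polarity ignored, that F2's top strand is written as first flap region + target fragment + second flap region, and that each flap region must exactly match the sequence flanking its break in a1 or a3; `replace_a2` is a hypothetical name.

```python
def replace_a2(genome: str, break3: int, break4: int, f2_top: str,
               left: int, right: int):
    """Toy model of replacing nucleotide fragment a2 with the target fragment.

    genome: top strand 5'->3'; the third and fourth editing systems break it
    at break3 < break4, giving a1 = genome[:break3], a2 = genome[break3:break4]
    and a3 = genome[break4:].
    f2_top: first flap region + target fragment + second flap region.
    Returns (edited_genome, removed_a2).
    """
    if not 0 < break3 < break4 < len(genome):
        raise ValueError("breaks must satisfy 0 < break3 < break4 < len(genome)")
    a1, a2, a3 = genome[:break3], genome[break3:break4], genome[break4:]
    first_flap = f2_top[:left]
    second_flap = f2_top[len(f2_top) - right:]
    target = f2_top[left:len(f2_top) - right]
    # The first flap pairs next to the a1 break, the second next to the a3 break.
    if not a1.endswith(first_flap):
        raise ValueError("first flap cannot anneal to a1")
    if not a3.startswith(second_flap):
        raise ValueError("second flap cannot anneal to a3")
    # a2 is dropped; the target fragment now connects a1 to a3.
    return a1 + target + a3, a2
```

Returning the excised a2 alongside the edited sequence makes it easy to confirm which segment the two breaks removed.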
  • the methods are performed intracellularly.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tagging primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tagging primer or nucleic acid molecule D2, the nucleic acid molecule A3 and the nucleic acid molecule A4 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tagging primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell.
  • the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3 and A4 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tagging primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tagging primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell.
  • in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into a cell to provide the double-stranded target nucleic acid within the cell.
  • the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein.
  • the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA, and breaks one strand thereof; and the second functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA, and breaks the other strand thereof.
  • the nucleic acid molecule of interest is genomic DNA of the cell.
  • the first Cas protein, the first gRNA, the first DNA polymerase or the first labeling primer are as defined above.
  • the second Cas protein, the second gRNA, the second DNA polymerase or the second labeling primer are as defined above.
  • said third nucleic acid editing system is as defined above.
  • said fourth nucleic acid editing system is as defined above.
  • the nucleic acid molecule of interest contains the third PAM sequence recognized by the third Cas protein and the fourth PAM sequence recognized by the fourth Cas protein.
  • the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA, and cleaves it; and the fourth functional complex binds to the nucleic acid molecule of interest through the fourth PAM sequence and the fourth gRNA, and cleaves it.
  • the first and second Cas proteins are identical and are selected from Cas proteins that cut single DNA strands, and the second DNA polymerase is identical to the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase uses the first labeling primer and the second labeling primer annealed to target nucleic acid fragment F1 as templates, respectively, and performs an extension reaction to form target nucleic acid fragment F2 having a first flap and a second flap.
  • the nucleic acid molecule A4 is delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first labeling primer, the second gRNA, the second labeling primer, the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first labeling primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second labeling primer or nucleic acid molecule D2, and the nucleic acid molecules A3 and A4 are delivered into the cell to provide, in the cell, the first Cas protein, the first gRNA, the first DNA polymerase, the first labeling primer, the second gRNA, the second labeling primer, the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the nucleic acid molecules A1, B1, C1, D1, C2, D2, A3 and A4 are delivered into cells to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first labeling primer, the second gRNA, the second labeling primer, the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 can express, in the cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase.
  • a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed there to provide the first Cas protein and the first DNA polymerase in the cell.
  • the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 can transcribe, in the cell, a first pegRNA containing the first gRNA and the first labeling primer.
  • the first pegRNA is delivered into the cell to provide the first gRNA and the first labeling primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first pegRNA is delivered into the cell, and the first pegRNA is transcribed in the cell to provide the first gRNA and the first labeling primer.
  • the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 can transcribe, in the cell, a second pegRNA containing the second gRNA and the second labeling primer.
  • the second pegRNA is delivered into the cell to provide the second gRNA and the second labeling primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second pegRNA is delivered into the cell, and the second pegRNA is transcribed in the cell to provide the second gRNA and the second labeling primer.
  • the nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase (or the nucleic acid molecule containing the nucleotide sequence encoding the first fusion protein), a nucleic acid molecule comprising a nucleotide sequence encoding the first pegRNA, a nucleic acid molecule comprising a nucleotide sequence encoding the second pegRNA, a nucleic acid molecule comprising a nucleotide sequence encoding the third nucleic acid editing system, and a nucleic acid molecule containing a nucleotide sequence encoding the fourth nucleic acid editing system are delivered into the cell and transcribed and expressed there, thereby providing in the cell the first Cas protein, the first gRNA, the first DNA polymerase, the first labeling primer, the second gRNA, the second labeling primer, the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the method comprises the steps of:
  • the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, and the first and second labeling primers, as well as the third nucleic acid editing system and the fourth nucleic acid editing system; wherein the third nucleic acid editing system and the fourth nucleic acid editing system are each as defined above;
  • the nucleic acid molecule of interest is contacted with the third nucleic acid editing system and the fourth nucleic acid editing system.
  • step ii in step i:
  • the first Cas protein and the first gRNA combine to form a first functional complex
  • the second Cas protein and the second gRNA combine to form a second functional complex
  • the third Cas protein and the third gRNA combine to form a third functional complex
  • the fourth Cas protein and the fourth gRNA combine to form a fourth functional complex
  • the first and second functional complexes cleave a first strand and a second strand of the double-stranded target nucleic acid, respectively; the first strand and the second strand each comprise a 3' end generated by the cleavage, and the double-stranded portion located between the two 3' ends is called target nucleic acid fragment F1;
  • the third and fourth functional complexes bind to and cleave the nucleic acid molecule of interest to form nucleotide fragments a1, a2 and a3;
  • the first labeling primer hybridizes or anneals, through the first target binding sequence, to the 3' end of one nucleic acid strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the break); and the second labeling primer hybridizes or anneals, through the second target binding sequence, to the 3' end of the other nucleic acid strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the break); and
  • the first DNA polymerase and the second DNA polymerase perform extension reactions using, respectively, the first labeling primer and the second labeling primer annealed to target nucleic acid fragment F1 as templates, so that the 3' ends generated by the break in the first strand and the second strand are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion carrying the first flap and the second flap is called target nucleic acid fragment F2, wherein the first flap and the second flap can hybridize or anneal to nucleotide fragments a1 and a3, respectively; and
  • the third labeling primer hybridizes or anneals, through the third target binding sequence, to the 3' end of one nucleic acid strand of nucleotide fragment a1, wherein that 3' end is formed by the third functional complex cleaving the nucleic acid molecule of interest; and the fourth labeling primer hybridizes or anneals, through the fourth target binding sequence, to the 3' end of one nucleic acid strand of nucleotide fragment a3, wherein that 3' end is formed by the fourth functional complex cleaving the nucleic acid molecule of interest.
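  • The requirement that the first and second flaps can anneal to fragments a1 and a3 can be screened in silico before ordering constructs. The sketch below is a simplified model, not the procedure of this application: it assumes pairing occurs between the 3'-terminal bases of a flap and the 3'-terminal bases of a strand at the genomic break end, with both strands written 5'->3'. The function names and the 8-nt minimum overlap are illustrative choices.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def three_prime_overlap(flap: str, break_end: str, min_len: int = 8) -> int:
    """Longest k >= min_len such that the 3'-terminal k nt of `flap` pair
    antiparallel with the 3'-terminal k nt of `break_end`; 0 if none.

    Both strands are written 5'->3'. Simplified model: mismatches, bulges
    and internal pairing are ignored.
    """
    flap, break_end = flap.upper(), break_end.upper()
    lower = max(min_len, 1)  # k = 0 would compare whole strings via [-0:]
    for k in range(min(len(flap), len(break_end)), lower - 1, -1):
        if flap[-k:] == reverse_complement(break_end[-k:]):
            return k
    return 0
```

A return value of 0 flags a flap design that, under this model, cannot pair with the intended genomic end.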
  • the methods are performed intracellularly.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first labeling primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second labeling primer or nucleic acid molecule D2, the nucleic acid molecule A3 and the nucleic acid molecule A4 are delivered into cells to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, and the first and second labeling primers, as well as the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3 and A4 are delivered into the cell to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second labeling primers, and the third and fourth nucleic acid editing systems.
  • step i the double-stranded target nucleic acid or a nucleic acid molecule T containing the double-stranded target nucleic acid is delivered into a cell to provide the double-stranded target nucleic acid within the cell.
  • the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein.
  • the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA, and breaks one strand thereof; and the second functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA, and breaks the other strand thereof.
  • the nucleic acid molecule of interest contains a third PAM sequence recognized by the third Cas protein and a fourth PAM sequence recognized by the fourth Cas protein.
  • the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA, and cleaves it; and the fourth functional complex binds to the nucleic acid molecule of interest through the fourth PAM sequence and the fourth gRNA, and cleaves it.
  • the nucleic acid molecule of interest is genomic DNA of the cell.
  • the first Cas protein, the first gRNA, the first DNA polymerase or the first labeling primer are as defined above.
  • the second Cas protein, the second gRNA, the second DNA polymerase or the second labeling primer are as defined above.
  • the first and second Cas proteins are identical and are selected from Cas proteins that cut single DNA strands.
  • the third and fourth Cas proteins are identical and are selected from Cas proteins that cut double-stranded DNA.
  • Cas protein, and the first, second, third and fourth DNA polymerases are the same DNA polymerase; wherein the first Cas protein forms the first, second, third and fourth functional complexes with the first, second, third and fourth gRNAs, respectively; and the first DNA polymerase uses the first labeling primer and the second labeling primer annealed to target nucleic acid fragment F1 as templates, respectively, and performs an extension reaction to form target nucleic acid fragment F2 having a first flap and a second flap.
  • the nucleic acid molecule A4 is delivered into the cell to provide, in the cell, the first Cas protein, the first DNA polymerase, the first and second gRNAs, and the first and second labeling primers, as well as the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first labeling primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second labeling primer or nucleic acid molecule D2, and the nucleic acid molecules A3 and A4 are delivered into the cell to provide, in the cell, the first Cas protein, the first DNA polymerase, the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the nucleic acid molecules A1, B1, C1, D1, C2, D2 and A3 are delivered into the cell to provide, in the cell, the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second labeling primers, the third nucleic acid editing system and the fourth nucleic acid editing system.
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors).
  • the nucleic acid molecule A1 and the nucleic acid molecule B1 can express, in the cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase.
  • a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed there to provide the first Cas protein and the first DNA polymerase in the cell.
  • the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 can transcribe, in the cell, a first pegRNA containing the first gRNA and the first labeling primer.
  • the first pegRNA is delivered into the cell to provide the first gRNA and the first labeling primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first pegRNA is delivered into the cell, and the first pegRNA is transcribed in the cell to provide the first gRNA and the first labeling primer.
  • the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 can transcribe, in the cell, a second pegRNA containing the second gRNA and the second labeling primer.
  • the second pegRNA is delivered into the cell to provide the second gRNA and the second labeling primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second pegRNA is delivered into the cell, and the second pegRNA is transcribed in the cell to provide the second gRNA and the second labeling primer.
  • the nucleic acid editing system, kit and method provided by the present application can break one nucleic acid strand of a double-stranded target nucleic acid (for example, a donor vector containing the target nucleic acid sequence or another exogenous nucleic acid fragment) and form a flap at the 3' end of the nick (a homologous flap structure identical or complementary to the end sequence of the specific genomic break).
  • the system, kit and method of the present application can realize efficient and accurate insertion and replacement of exogenous nucleic acid (especially large fragment exogenous nucleic acid) at specific sites in the genome.
  • the system of the present invention does not produce linearized DNA fragments, which improves the safety of site-specific integration of exogenous genes; in addition, compared with HDR-mediated gene editing, the f-PAINT method of the present invention does not require the construction of homology arms and can greatly increase the length of the foreign gene carried by viral vectors, especially adeno-associated virus vectors.
  • Fig. 1 is a schematic diagram of the principle by which the method of the present invention (f-PAINT) mediates insertion of foreign genes into the genome.
  • Figure 1a is a schematic diagram of the principle of site-specific integration of exogenous genes on the genome using HDR, NHEJ and f-PAINT methods.
  • the black double solid line represents the genome sequence
  • the gray double solid line represents the backbone sequence of the donor vector
  • the yellow bar represents the exogenous gene
  • the red and blue solid lines represent the homologous sequences on the left and right sides of the integration site, respectively, shared by the genome and the donor vector (either carried by the vector itself or generated by processing).
  • the black triangles indicate the specific recognition and cleavage sites of the nuclease on genomic DNA, and the blunt or sticky ends produced when the genomic DNA undergoes a double-strand break under the action of the nuclease.
  • the blue and purple triangles indicate the targeted recognition sequences of a pair of PE-Cas proteins (the first fusion protein) arranged in reverse.
  • the 3' end generates a homologous flap sequence that can be identical or complementary to the end sequence of the genomic break.
  • the DNA segment containing the foreign gene is located between the two homologous flap structures.
  • the HDR-based method uses the homology arm on the donor vector to achieve site-specific integration of exogenous genes at specific double-strand cleavage sites in genomic DNA through cellular HDR repair mechanisms.
  • the NHEJ-based method relies on the cell's own NHEJ repair mechanism to connect the exogenous gene fragment cut from the donor vector to the end of the double-strand break at the genome-specific integration site, thereby achieving site-specific integration of the exogenous gene.
  • the method of the present invention (f-PAINT) uses the homologous flap sequences generated by processing the donor vector to pair complementarily with the ends of the double-strand break in the genome; DNA replication then proceeds from the genomic break ends using the donor vector as the template, followed by double-strand hybridization, which rejoins the ends and achieves site-directed integration of the foreign fragment.
  • Figure 1b shows how homologous flap sequences are generated by processing the donor vector.
  • PE-spCas9/pegRNA recognizes and binds the targeting recognition sequences on both sides of the foreign gene on the donor vector and nicks the nucleic acid strand that does not pair with the pegRNA. The primer binding sequence on the pegRNA then binds to the free 3' end of the single strand generated by the nick, and reverse transcriptase extends that end into the homologous flap sequence, using the template sequence of the pegRNA (homologous to the end of the genomic break) as the template.
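  • The processing in Figure 1b follows standard prime-editing logic: the primer binding sequence (PBS) pairs with the 3' end of the nicked strand, and reverse transcriptase copies the pegRNA template, so the newly synthesized DNA is the reverse complement of that template. A minimal sketch, assuming the pegRNA 3' extension is supplied as separate PBS and template strings written 5'->3'; the function names and the sample sequences in the usage line are hypothetical, not sequences from this application.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence with DNA letters (U -> T)."""
    return rna.upper().replace("U", "T")


def extend_flap(nicked_strand: str, pbs: str, rtt: str) -> str:
    """Return the nicked DNA strand (5'->3') after reverse transcription.

    nicked_strand: DNA ending at the nick, 5'->3' (its 3' end is the nick).
    pbs: primer binding sequence of the pegRNA, RNA 5'->3'.
    rtt: reverse-transcription template of the pegRNA, RNA 5'->3'.
    """
    nicked_strand = nicked_strand.upper()
    pbs_dna = rna_to_dna(pbs)
    if not pbs_dna:
        raise ValueError("empty PBS")
    # The PBS must pair with the 3'-terminal bases of the nicked strand.
    if nicked_strand[-len(pbs_dna):] != reverse_complement(pbs_dna):
        raise ValueError("PBS does not pair with the 3' end of the nicked strand")
    # RT reads the template 3'->5', so the new DNA is its reverse complement.
    return nicked_strand + reverse_complement(rna_to_dna(rtt))
```

For example, `extend_flap("ACGTACGTAACCTG", "CAGGUU", "GGAUC")` appends `GATCC` to the nicked strand; in f-PAINT that appended stretch is the homologous flap.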
  • Figure 2 shows the efficient and specific site-specific integration of exogenous genes on the 3'UTR of the GAPDH gene of human 293T cells using the f-PAINT method.
  • Figure 2a shows a schematic flow diagram of using different methods (HDR, NHEJ and f-PAINT) to knock in an exogenous gene (IRES-EGFP) in the 3'UTR region of the GAPDH gene of the human 293T cell genome.
  • the black solid line indicates the genome sequence; the blue and gray boxes indicate the exonic protein-coding region and the non-coding region, respectively; the red and blue solid lines indicate homologous sequences: the long solid lines indicate the homology arm sequences on the HDR donor vector, and the short solid lines indicate the homologous flap sequences generated by processing the f-PAINT donor vector.
  • Figure 2b shows the efficiency comparison of site-directed integration of exogenous genes mediated by different methods. The results showed that f-PAINT-mediated exogenous gene integration efficiency was significantly higher than that of HDR and NHEJ methods.
  • Figure 2c shows the PCR identification results of the correct edited gene sequence and by-products generated by site-directed integration of exogenous genes using NHEJ and f-PAINT.
  • Figure 2d shows Sanger sequencing results across the junctions of correctly edited alleles produced by NHEJ- and f-PAINT-mediated site-specific integration of the exogenous gene. The results show that the f-PAINT method produced more precise junctions than the NHEJ method.
  • Figure 3 shows the comparison of site-specific integration efficiencies of different methods of HDR, NHEJ, HMEJ and f-PAINT at AAVS1 locus in human genome and Rosa26 locus in mouse genome.
  • Fig. 3a shows the genomic integration efficiency (rate of EGFP-positive cells) of the exogenous gene CAG-EGFP mediated by HDR, NHEJ, HMEJ and f-PAINT when saCas9/sgRNA is not used to target a specific genomic site (AAVS1 or Rosa26).
  • in this case, the foreign gene integrates at non-specific sites, which is also called random integration.
  • the random integration of exogenous genes in the genome will lead to gene insertion mutations and disrupt the stability of the genome.
  • Figure 3b shows the genomic integration efficiency (rate of EGFP-positive cells) of the exogenous gene CAG-EGFP mediated by HDR, NHEJ, HMEJ and f-PAINT when saCas9/sgRNA is used to target a specific genomic site (AAVS1 or Rosa26). In this case, integration of the exogenous gene is mainly site-specific.
  • CAG-EGFP exogenous gene
  • Figure 4 compares the site-specific integration efficiencies of the exogenous gene CAG-EGFP mediated by HDR, HDR NT (HDR without targeting a specific site), f-PAINT and f-PAINT NT (f-PAINT without targeting a specific site) at gene-therapy-related safe harbor sites and genetic-disease-related gene sites in K562 cells. Both HDR and f-PAINT kept random integration at a low level, but the f-PAINT method achieved more efficient site-specific integration of the exogenous gene at loci such as AAVS1, CCR5, TRAC, WAS, HBB and IL2RG.
  • Figure 5 shows the genotyping and junction Sanger sequencing results for f-PAINT-mediated site-specific integration of the exogenous gene CAG-EGFP at safe harbor sites such as AAVS1, CCR5 and TRAC in K562 cells.
  • Figure 6 shows the genotyping and junction Sanger sequencing results for f-PAINT-mediated site-specific integration of the exogenous gene CAG-EGFP at genetic-disease-related sites such as WAS, HBB and IL2RG in K562 cells.
  • Fig. 7 is a schematic diagram of the principle of site-directed integration of exogenous genes mediated by the h-PAINT method in the present invention.
  • the left side of the exogenous gene on the donor vector is the left homology arm with a length of 500-1500bp
  • the right side of the exogenous gene carries a target recognition sequence that can be recognized and processed by PE-spCas9/pegRNA.
  • under the action of PE-spCas9/pegRNA, the target recognition sequence generates the right homologous flap, which pairs with the broken genomic end by complementary base pairing so that this end is extended; the extended end and the other broken genomic end then pair complementarily through the left homology arm, achieving integration of the foreign gene and repair of the strands.
  • the right side of the foreign gene on the donor vector is the right homology arm with a length of 500-1500bp
  • the left side of the foreign gene carries a target recognition sequence that can be recognized and processed by PE-spCas9/pegRNA.
  • Figure 8 shows the efficiency comparison of the site-specific integration of the exogenous gene IRES-EGFP on the 3'UTR of the human GAPDH gene mediated by f-PAINT and h-PAINT methods.
  • Figure 8a is a schematic diagram of h-PAINT-mediated site-specific integration of the exogenous gene IRES-EGFP into the 3'UTR of the human GAPDH gene.
  • the h-PAINT (LHA) donor vector carries an 800 bp left homology arm sequence on its left side and the targeting recognition sequence of PE-spCas9/GAPDH-peg ⁇ on its right side; the h-PAINT (RHA) donor vector carries an 800 bp right homology arm sequence on its right side and the targeting recognition sequence of PE-spCas9/GAPDH-peg ⁇ on its left side.
  • Figure 8b shows the efficiency results of site-specific integration of exogenous gene IRES-EGFP on the 3'UTR of human GAPDH gene mediated by f-PAINT, h-PAINT (LHA) and h-PAINT (RHA).
  • Figure 8c shows the genotype identification results of cells edited by different methods.
  • Figure 8d is the Sanger sequencing results of the 5' and 3' junctions of cells edited by the h-PAINT method.
  • Example 1: Using the f-PAINT system to insert an exogenous gene (IRES-EGFP) into the 3'UTR region of the human GAPDH gene
  • this example uses the f-PAINT system to knock the reporter gene IRES-EGFP into the 3'UTR region of GAPDH in the human genome, with the HDR and NHEJ methods as controls.
  • the schematic diagram of the principles of three different methods of HDR, NHEJ and f-PAINT mediating the site-specific integration of exogenous genes is shown in Figure 1.
  • the flow chart of the site-specific integration of the IRES-EGFP reporter gene on the GAPDH gene mediated by the above different methods is shown in Figure 2a.
  • the GAPDH gene is located on chromosome 12 and encodes glyceraldehyde-3-phosphate dehydrogenase. It is an important housekeeping gene and is highly expressed in 293T cells. After the reporter gene is correctly integrated into the 3'UTR region of GAPDH, it can be transcribed together with the GAPDH gene. The IRES sequence can recruit ribosomes, so that EGFP can be expressed. The fluorescent signal of EGFP can be conveniently observed directly by fluorescence microscopy, and correctly edited EGFP-expressing cells can also be captured and quantified by flow cytometry.
  • the plasmids used in this example include pCAG-spCas9-mCherry (expressing spCas9 protein (SEQ ID NO: 1) and mCherry protein (SEQ ID NO: 2)), pCAG-spCas9(H840A)-mCherry (expressing spCas9(H840A) protein (SEQ ID NO: 3) and mCherry protein (SEQ ID NO: 2)), pCAG-saCas9 (expressing saCas9 protein (SEQ ID NO: 4)), pUC19-U6-gRNA(saCas9) (transcribing a saCas9 gRNA lacking a guide sequence (SEQ ID NO: 5)) and pUC19-U6-gRNA(spCas9) (transcribing an spCas9 gRNA lacking a guide sequence (SEQ ID NO: 6)).
  • the nucleotide fragment encoding MLV RT (reverse transcriptase, SEQ ID NO: 7) was amplified from the pCMV-PE2 plasmid (Addgene #132775), and the nucleotide fragments encoding the spCas9(H840A) portion and mCherry were amplified from the pCAG-spCas9(H840A)-mCherry plasmid.
  • the amplified MLV RT and spCas9(H840A) nucleotide fragments were inserted into the AscI/BsrGI double-digested pCAG-spCas9(H840A)-mCherry plasmid by In-Fusion cloning to obtain the pCAG-PE-spCas9-mCherry plasmid, which can express PE-spCas9 protein (SEQ ID NO: 8) and mCherry protein.
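  • In-Fusion cloning relies on PCR primers whose 5' ends carry 15 bases homologous to the ends of the linearized vector. A minimal primer-design sketch for a single insert, assuming the vector ends are given as top-strand sequences flanking the cut; it does not model how the AscI/BsrGI 5' overhangs are counted (check the manufacturer's primer-design guide for that), and the 18-nt annealing length and function names are illustrative choices.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def infusion_primers(vector_left: str, vector_right: str, insert: str,
                     homology_len: int = 15, anneal_len: int = 18) -> Tuple[str, str]:
    """Forward/reverse primers (5'->3') adding vector-end homology to an insert.

    vector_left:  top strand of the linearized vector, ending at the left cut.
    vector_right: top strand of the linearized vector, starting at the right cut.
    insert:       top strand of the fragment to amplify.
    """
    if len(vector_left) < homology_len or len(vector_right) < homology_len:
        raise ValueError("vector ends shorter than the homology length")
    if len(insert) < anneal_len:
        raise ValueError("insert shorter than the annealing region")
    vector_left, vector_right, insert = (
        s.upper() for s in (vector_left, vector_right, insert))
    # Forward: last 15 nt of the left vector end, then the insert's 5' end.
    forward = vector_left[-homology_len:] + insert[:anneal_len]
    # Reverse: complement of the right vector end, then the insert's 3' end.
    reverse = (reverse_complement(vector_right[:homology_len])
               + reverse_complement(insert[-anneal_len:]))
    return forward, reverse
```

For the two-fragment assembly described here (MLV RT plus spCas9(H840A)), the fragment-to-fragment junction would also need a 15-nt overlap; the sketch covers only the vector ends.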
  • the PE-spCas9 protein is a fusion of MLV RT and spCas9(H840A).
  • primers sgGAPDH-F (SEQ ID NO: 9) and sgGAPDH-R (SEQ ID NO: 10) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA(saCas9) plasmid to obtain the pUC19-U6-sgGAPDH plasmid, which transcribes sgGAPDH (SEQ ID NO: 11) and guides the saCas9 protein to a specific site in the 3'UTR region of the human GAPDH gene.
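  • Ligating annealed oligos into a BsaI-digested U6 vector requires each oligo to carry a 4-nt 5' overhang matching the vector's BsaI-cut ends. The overhangs of pUC19-U6-gRNA are not given in this application; the defaults below (CACC on the top oligo, AAAC on the bottom oligo, as used in several widely distributed Cas9 sgRNA cloning vectors) are assumptions to be replaced with the vector's actual overhangs, and the function name is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def annealed_oligos(spacer: str, top_overhang: str = "CACC",
                    bottom_overhang: str = "AAAC"):
    """Return (top, bottom) oligos, both 5'->3', encoding a guide spacer.

    After annealing, the spacer region is double-stranded and each oligo
    leaves its 4-nt overhang single-stranded at its 5' end.
    """
    spacer = spacer.upper()
    top = top_overhang.upper() + spacer
    bottom = bottom_overhang.upper() + reverse_complement(spacer)
    # Sanity check: the oligos pair over the spacer region.
    assert top[len(top_overhang):] == reverse_complement(bottom[len(bottom_overhang):])
    return top, bottom
```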
  • primers sg ⁇ -F (SEQ ID NO: 12) and sg ⁇ -R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA(spCas9) plasmid vector to obtain the pUC19-U6-sg ⁇ plasmid, which transcribes sg ⁇ (SEQ ID NO: 14) and guides the spCas9 protein to the specific recognition sequence of sg ⁇ on the donor vector (SEQ ID NO: 15).
  • primers sg ⁇ -F (SEQ ID NO: 16) and sg ⁇ -R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA(spCas9) plasmid vector to obtain the pUC19-U6-sg ⁇ plasmid, which transcribes sg ⁇ (SEQ ID NO: 18) and guides the spCas9 protein to the specific recognition sequence of sg ⁇ on the donor vector (SEQ ID NO: 19).
  • primers GAPDH-peg ⁇ -F (SEQ ID NO: 20) and GAPDH-peg ⁇ -R (SEQ ID NO: 21) were used for overlap extension PCR; the product was recovered and inserted into the HindIII-digested pUC19-U6-sg ⁇ plasmid vector by In-Fusion cloning to obtain the pUC19-U6-GAPDH-peg ⁇ plasmid, which transcribes GAPDH-peg ⁇ (SEQ ID NO: 22), guides the PE-spCas9 protein to the specific recognition sequence of sg ⁇ on the donor vector (SEQ ID NO: 15), and reverse-transcribes the homologous flap structure at the nick.
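  • When two long oligos overlap at their 3' ends, as in the overlap extension step used here to build the pegRNA extensions, the full-length product can be predicted by finding the overlap and filling in both strands. A sketch under the assumptions that both oligos are written 5'->3' and anneal only through a perfect 3'-terminal overlap; the 15-nt minimum and the function name are illustrative, not taken from this application.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def overlap_extension_product(fwd: str, rev: str, min_overlap: int = 15) -> str:
    """Top strand of the duplex formed when `fwd` and `rev` anneal through
    their 3' ends and the polymerase fills in both strands."""
    fwd, rev = fwd.upper(), rev.upper()
    # Search from the longest possible overlap down to the minimum.
    for k in range(min(len(fwd), len(rev)), max(min_overlap, 1) - 1, -1):
        if fwd[-k:] == reverse_complement(rev[-k:]):
            # The rest of the reverse oligo, read as top strand, extends fwd.
            return fwd + reverse_complement(rev)[k:]
    raise ValueError("no 3' overlap of the required length")
```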
  • primers GAPDH-peg ⁇ -F (SEQ ID NO: 23) and GAPDH-peg ⁇ -R (SEQ ID NO: 24) were used for overlap extension PCR; the product was recovered and inserted into the HindIII-digested pUC19-U6-sg ⁇ plasmid vector by In-Fusion cloning to obtain the pUC19-U6-GAPDH-peg ⁇ plasmid, which transcribes GAPDH-peg ⁇ (SEQ ID NO: 25), guides the PE-spCas9 protein to the specific recognition sequence of sg ⁇ on the donor vector (SEQ ID NO: 19), and reverse-transcribes the homologous flap structure at the nick.
  • the reporter gene IRES-EGFP (SEQ ID NO: 26) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • the NHEJ and f-PAINT systems use the same donor vector, which carries the specific recognition sequences of sg ⁇ and sg ⁇ on either side of the reporter gene (SEQ ID NO: 15 and SEQ ID NO: 19); the two specific recognition sequences are arranged in inverted orientation, in a PAM-out configuration.
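  • The PAM-out arrangement can be checked directly on the donor sequence. For spCas9 (NGG PAM, 20-nt protospacer, both assumed here), a site whose protospacer lies on the top strand reads N20-NGG, and a site on the bottom strand reads CCN-N20 when written on the top strand. PAM-out means that the left site is a bottom-strand site and the right site is a top-strand site, so both PAMs point away from the insert. A minimal sketch; the function names are illustrative:

```python
import re

PROTOSPACER_LEN = 20  # spCas9 protospacer length (assumption)


def site_strands(site: str) -> set:
    """Possible strands of a 23-nt spCas9 site written on the top strand.

    'top'    -> N20 followed by an NGG PAM (PAM on the right)
    'bottom' -> CCN followed by N20 (NGG PAM on the bottom strand, on the left)
    """
    site = site.upper()
    if len(site) != PROTOSPACER_LEN + 3:
        raise ValueError("expected a 23-nt site")
    strands = set()
    if re.fullmatch(r"[ACGT]{21}GG", site):
        strands.add("top")
    if re.fullmatch(r"CC[ACGT]{21}", site):
        strands.add("bottom")
    return strands


def is_pam_out(left_site: str, right_site: str) -> bool:
    """True if both PAMs face away from the sequence between the two sites."""
    return "bottom" in site_strands(left_site) and "top" in site_strands(right_site)
```

A site that matches both patterns (CC...GG) is ambiguous, so the function returns every strand that fits instead of choosing one.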
  • Invitrogen's Lipofectamine 3000 liposome transfection reagent was used to deliver pCAG-saCas9-mCherry, sgGAPDH and the HDR donor vector into 293T cells.
  • pCAG-saCas9-mCherry was delivered into 293T cells together with the HDR donor vector.
  • the 293T cell line was obtained from the ATCC cell bank. After 24 hours of transfection, mCherry-positive cells were sorted by flow cytometry, and after the sorted cells were cultured for 5 days, the ratio of EGFP-positive cells was analyzed by flow cytometry.
  • pCAG-saCas9, sgGAPDH, pCAG-spCas9-mCherry, pUC19-U6-sg ⁇ and pUC19-U6-sg ⁇, together with the NHEJ donor vector, were transfected into 293T cells using Invitrogen's Lipofectamine 3000 liposome transfection reagent.
  • pCAG-saCas9, pCAG-spCas9-mCherry, pUC19-U6-sg ⁇ , pUC19-U6-sg ⁇ together with the donor vector were transfected into 293T cells.
  • mCherry-positive cells were sorted by flow cytometry, and after the sorted cells were cultured for 5 days, the ratio of EGFP-positive cells was analyzed by flow cytometry.
  • pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry, pUC19-U6-GAPDH-peg ⁇, pUC19-U6-GAPDH-peg ⁇ and pUC19-U6-GAPDH-peg ⁇, together with the f-PAINT donor vector, were transfected into 293T cells.
  • pCAG-saCas9, pCAG-PE-spCas9-mCherry, pUC19-U6-GAPDH-peg ⁇ , pUC19-U6-GAPDH-peg ⁇ together with the donor vector were transfected into 293T cells. After 24 hours of transfection, mCherry-positive cells were sorted by flow cytometry, and after the sorted cells were cultured for 5 days, the ratio of EGFP-positive cells was analyzed by flow cytometry.
  • The proportion of EGFP-positive cells in each of the three systems reflects how efficiently that system mediates site-specific integration of the exogenous gene IRES-EGFP, so comparing these proportions compares the systems.
  • The exogenous gene integration efficiencies are shown in Figure 2b.
  • In the f-PAINT system, about 30% of cells were EGFP-positive. This is roughly 4 times the proportion in the HDR system and 2 times that in the NHEJ system.
  • Genomic DNA was extracted from cells edited by the NHEJ and f-PAINT systems and amplified by PCR with the following primer pairs: GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P2 (SEQ ID NO: 30) (5' junction of correct integration), GAPDH-P3 (SEQ ID NO: 31)/GAPDH-P4 (SEQ ID NO: 32) (3' junction of correct integration), GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P3 (SEQ ID NO: 31) (5' junction of reverse integration of the exogenous gene), GAPDH-P2 (SEQ ID NO: 30)/GAPDH-P4 (SEQ ID NO: 32) (3' junction of reverse integration of the exogenous gene), GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P5 (SEQ ID NO: 33) (5' junction of vector backbone integration), GAPDH-P6 (SEQ ID NO: 34)/GAPDH-P4 (SEQ ID NO: 32) (3' junction of vector backbone integration), GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P6 (SEQ
  • The PCR genotyping results are shown in Figure 2c.
  • In NHEJ-mediated integration, the donor vector is cut into fragments. As a result, by-products such as reverse integration of the exogenous gene and integration of the vector backbone occur alongside correct integration at the targeted genomic site.
  • The Sanger sequencing results are shown in Figure 2d.
  • The junctions formed by the f-PAINT system are also more precise than those formed by the NHEJ method, with fewer base insertions, deletions, and substitutions at the junctions.
  • The f-PAINT system described in the present invention can substantially improve the efficiency and accuracy of site-specific integration of exogenous genes. NHEJ and other integration methods rely on linearized double-stranded DNA as the donor. f-PAINT does not linearize the donor vector and is therefore safer.
  • Example 2: Using the f-PAINT system to insert an exogenous reporter gene (CAG-EGFP) at the human AAVS1 locus and the mouse Rosa26 locus
  • The f-PAINT system was used to knock the reporter gene CAG-EGFP site-specifically into the first intron of the AAVS1 locus in the human genome and into the first intron of the Rosa26 locus in the mouse genome. The HDR, NHEJ, and HMEJ methods served as controls.
  • The AAVS1 and Rosa26 loci are recognized as safe-harbor loci in the human and mouse genomes, respectively. Inserting foreign sequences at these loci does not affect the cell's own functions.
  • The reporter gene CAG-EGFP carries its own CAG promoter, which drives EGFP expression once the reporter has integrated into the genome.
  • EGFP fluorescence can be observed directly under a fluorescence microscope. Correctly edited cells expressing EGFP can also be captured and quantified by flow cytometry.
  • Because it carries its own promoter, the CAG-EGFP reporter can also detect random integration of the exogenous gene elsewhere in the genome.
  • The plasmids expressing sgRNA and pegRNA and the donor plasmid carrying the reporter gene CAG-EGFP were constructed in the same way as in Example 1.
  • The HMEJ donor vector also uses the pGH plasmid as its backbone. It has a homology arm of about 800 bp on each side of the reporter gene and an spCas9/sg⁇ target recognition sequence outside the homology arms.
  • Table 1 lists the primer sequences, the sgRNA and pegRNA sequences, the sequence of the reporter gene CAG-EGFP, and the homology-arm sequences used for plasmid construction. The specific primers used are as follows.
  • The primers sgAAVS1-F (SEQ ID NO: 36) and sgAAVS1-R (SEQ ID NO: 10) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to obtain the pUC19-U6-sgAAVS1 plasmid. This plasmid transcribes sgAAVS1 (SEQ ID NO: 38), which guides the saCas9 protein to the first intron of the AAVS1 locus in the human genome.
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers AAVS1-peg⁇-F (SEQ ID NO: 39) and AAVS1-peg⁇-R (SEQ ID NO: 40) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-AAVS1-peg⁇ plasmid. This plasmid transcribes AAVS1-peg⁇ (SEQ ID NO: 41), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The primers AAVS1-peg⁇-F (SEQ ID NO: 42) and AAVS1-peg⁇-R (SEQ ID NO: 43) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-AAVS1-peg⁇ plasmid. This plasmid transcribes AAVS1-peg⁇ (SEQ ID NO: 44), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • Primers sgmRosa26-F (SEQ ID NO: 52) and sgmRosa26-R (SEQ ID NO: 53) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to obtain the pUC19-U6-sgRosa26 plasmid. This plasmid transcribes sgmRosa26 (SEQ ID NO: 54), which guides the saCas9 protein to the first intron of the mouse Rosa26 locus.
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers mRosa26-peg⁇-F (SEQ ID NO: 55) and mRosa26-peg⁇-R (SEQ ID NO: 56) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-mRosa26-peg⁇ plasmid. This plasmid transcribes mRosa26-peg⁇ (SEQ ID NO: 57), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • For NHEJ, Invitrogen's Lipofectamine 3000 transfection reagent was used to transfect pCAG-saCas9, pCAG-spCas9-mCherry, pUC19-U6-sg⁇, and pUC19-U6-sg⁇, together with the NHEJ donor vector, into 293T cells or mouse embryonic stem cells.
  • For HMEJ, pCAG-saCas9, pCAG-spCas9-mCherry, and pUC19-U6-sg⁇ were transfected into 293T cells or mouse embryonic stem cells together with the HMEJ donor vector.
  • For f-PAINT, Invitrogen's Lipofectamine 3000 transfection reagent was used to transfect pCAG-saCas9, pCAG-PE-spCas9-mCherry, pUC19-U6-peg⁇, and pUC19-U6-peg⁇, together with the f-PAINT donor vector, into 293T cells or mouse embryonic stem cells. At 24 hours after transfection, mCherry-positive cells were sorted by flow cytometry. The sorted cells were cultured for 14 days, and the proportion of EGFP-positive cells was then analyzed by flow cytometry.
  • The following groups used saCas9/sgRNA to target the AAVS1 locus of the human genome or the Rosa26 locus of the mouse genome. For HDR, Invitrogen's Lipofectamine 3000 transfection reagent was used to deliver pCAG-saCas9, sgGAPDH (or sgRosa26), and the HDR donor vector into 293T cells or mouse embryonic stem cells.
  • For HMEJ, pCAG-saCas9, sgGAPDH (or sgRosa26), pCAG-spCas9-mCherry, and pUC19-U6-sg⁇ were transfected into 293T cells or mouse embryonic stem cells together with the HMEJ donor vector.
  • For f-PAINT, pCAG-saCas9, sgGAPDH (or sgRosa26), pCAG-PE-spCas9-mCherry, pUC19-U6-peg⁇, and pUC19-U6-peg⁇ were transfected into 293T cells or mouse embryonic stem cells together with the f-PAINT donor vector. At 24 hours after transfection, mCherry-positive cells were sorted by flow cytometry. The sorted cells were cultured for 14 days, and the proportion of EGFP-positive cells was then analyzed by flow cytometry.
  • Example 3: Using the f-PAINT system to insert an exogenous gene (CAG-EGFP) at gene-therapy-related safe-harbor loci and genetic-disease-associated loci
  • The primers sgCCR5-F (SEQ ID NO: 65) and sgCCR5-R (SEQ ID NO: 66) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to obtain the pUC19-U6-sgCCR5 plasmid. This plasmid transcribes sgCCR5 (SEQ ID NO: 67), which guides the saCas9 protein to the CCR5 locus.
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers CCR5-peg⁇-F (SEQ ID NO: 68) and CCR5-peg⁇-R (SEQ ID NO: 69) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-CCR5-peg⁇ plasmid. This plasmid transcribes CCR5-peg⁇ (SEQ ID NO: 70), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The primers CCR5-peg⁇-F (SEQ ID NO: 71) and CCR5-peg⁇-R (SEQ ID NO: 72) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-CCR5-peg⁇ plasmid. This plasmid transcribes CCR5-peg⁇ (SEQ ID NO: 73), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • Primers: sgTRAC-F (SEQ ID NO: 78) and sgTRAC-R (SEQ ID NO: 79).
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers TRAC-peg⁇-F (SEQ ID NO: 81) and TRAC-peg⁇-R (SEQ ID NO: 82) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-TRAC-peg⁇ plasmid. This plasmid transcribes TRAC-peg⁇ (SEQ ID NO: 83), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The primers TRAC-peg⁇-F (SEQ ID NO: 84) and TRAC-peg⁇-R (SEQ ID NO: 85) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-TRAC-peg⁇ plasmid. This plasmid transcribes TRAC-peg⁇ (SEQ ID NO: 86), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • Primers sgWAS-1-F (SEQ ID NO: 91) and sgWAS-1-R (SEQ ID NO: 92) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to obtain the pUC19-U6-sgWAS-1 plasmid. This plasmid transcribes sgWAS-1 (SEQ ID NO: 93), which guides the saCas9 protein to the WAS-1 site.
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers WAS-1-peg⁇-F (SEQ ID NO: 94) and WAS-1-peg⁇-R (SEQ ID NO: 95) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-WAS-1-peg⁇ plasmid. This plasmid transcribes WAS-1-peg⁇ (SEQ ID NO: 96), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The primers WAS-1-peg⁇-F (SEQ ID NO: 97) and WAS-1-peg⁇-R (SEQ ID NO: 98) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-WAS-1-peg⁇ plasmid. This plasmid transcribes WAS-1-peg⁇ (SEQ ID NO: 99), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • Primers sgWAS-3-F (SEQ ID NO: 104) and sgWAS-3-R (SEQ ID NO: 105) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to obtain the pUC19-U6-sgWAS-3 plasmid. This plasmid transcribes sgWAS-3 (SEQ ID NO: 106), which guides the saCas9 protein to the WAS-3 site.
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers WAS-3-peg⁇-F (SEQ ID NO: 107) and WAS-3-peg⁇-R (SEQ ID NO: 108) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-WAS-3-peg⁇ plasmid. This plasmid transcribes WAS-3-peg⁇ (SEQ ID NO: 109), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The primers WAS-3-peg⁇-F (SEQ ID NO: 110) and WAS-3-peg⁇-R (SEQ ID NO: 111) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-WAS-3-peg⁇ plasmid. This plasmid transcribes WAS-3-peg⁇ (SEQ ID NO: 112), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • Primers sgHBB-F (SEQ ID NO: 117) and sgHBB-R (SEQ ID NO: 118) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to obtain the pUC19-U6-sgHBB plasmid. This plasmid transcribes sgHBB (SEQ ID NO: 119), which guides the saCas9 protein to the HBB locus.
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers HBB-peg⁇-F (SEQ ID NO: 120) and HBB-peg⁇-R (SEQ ID NO: 121) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-HBB-peg⁇ plasmid. This plasmid transcribes HBB-peg⁇ (SEQ ID NO: 122), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The primers HBB-peg⁇-F (SEQ ID NO: 123) and HBB-peg⁇-R (SEQ ID NO: 124) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-HBB-peg⁇ plasmid. This plasmid transcribes HBB-peg⁇ (SEQ ID NO: 125), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • Primers sgIL2RG-F (SEQ ID NO: 130) and sgIL2RG-R (SEQ ID NO: 131) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to obtain the pUC19-U6-sgIL2RG plasmid. This plasmid transcribes sgIL2RG (SEQ ID NO: 132), which guides the saCas9 protein to the IL2RG locus.
  • Primers sg⁇-F (SEQ ID NO: 12) and sg⁇-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 14), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15).
  • Primers sg⁇-F (SEQ ID NO: 16) and sg⁇-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to obtain the pUC19-U6-sg⁇ plasmid. This plasmid transcribes sg⁇ (SEQ ID NO: 18), which guides the spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19).
  • The primers IL2RG-peg⁇-F (SEQ ID NO: 133) and IL2RG-peg⁇-R (SEQ ID NO: 134) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-IL2RG-peg⁇ plasmid. This plasmid transcribes IL2RG-peg⁇ (SEQ ID NO: 135), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 15), where a homologous flap is reverse-transcribed at the nick.
  • The primers IL2RG-peg⁇-F (SEQ ID NO: 136) and IL2RG-peg⁇-R (SEQ ID NO: 137) were used for overlap-extension PCR. The resulting fragment was recovered and inserted by In-Fusion cloning into the HindIII-digested pUC19-U6-sg⁇ plasmid vector to obtain the pUC19-U6-IL2RG-peg⁇ plasmid. This plasmid transcribes IL2RG-peg⁇ (SEQ ID NO: 138), which guides the PE-spCas9 protein to the sg⁇ recognition sequence on the donor vector (SEQ ID NO: 19), where a homologous flap is reverse-transcribed at the nick.
  • The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by Jerry and ligated with T4 ligase into the EcoRV-digested pGH plasmid vector to serve as the donor vector.
  • The NHEJ and f-PAINT systems use the same donor vector. It carries the sg⁇ and sg⁇ recognition sequences (SEQ ID NO: 15 and SEQ ID NO: 19) on either side of the reporter gene, and the two recognition sequences are in opposite orientations, in a PAM-out arrangement.
  • For HDR, the SE Cell Line 4D-Nucleofector X Kit (Lonza) was used to electroporate pCAG-saCas9, the plasmid expressing the sgRNA that targets each site, and the HDR donor vector into K562 cells.
  • For f-PAINT, the SE Cell Line 4D-Nucleofector X Kit (Lonza) was used to electroporate the following into K562 cells: pCAG-saCas9, the plasmid expressing the sgRNA that targets each site, pCAG-PE-spCas9-mCherry, the corresponding pUC19-U6-peg⁇ and pUC19-U6-peg⁇ plasmids, and the f-PAINT donor vector.
  • In the control group, no sgRNA plasmid targeting the specific genomic site was used.
  • mCherry-positive cells were sorted by flow cytometry. The sorted cells were cultured for 14 days, and the proportion of EGFP-positive cells was then analyzed by flow cytometry.
  • Genomic DNA was extracted from cells edited by the f-PAINT system and amplified with the following primer pairs: AAVS1-P1 (SEQ ID NO: 48)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction at the AAVS1 locus), CAG-EGFP-P3 (SEQ ID NO: 50)/AAVS1-P4 (SEQ ID NO: 51) (3' junction at the AAVS1 locus), CCR5-P1 (SEQ ID NO: 76)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction at the CCR5 locus), CAG-EGFP-P3 (SEQ ID NO: 50)/CCR5-P4 (SEQ ID NO: 77) (3' linker to amplif
  • Figure 4 shows the integration efficiency of the exogenous gene CAG-EGFP at different genomic sites, as mediated by the different methods.
  • In K562 cells, across the different sites, f-PAINT-mediated site-specific integration was about as accurate as HDR-mediated integration but significantly more efficient.
  • The genotyping and Sanger sequencing results are shown in Figure 5 and Figure 6. They show that f-PAINT can accurately mediate integration of foreign genes at specific genomic sites. Together, these results suggest that the f-PAINT method has considerable potential for gene therapy.
  • In 293T cells, the h-PAINT system was used to knock the reporter gene IRES-EGFP site-specifically into the 3'UTR of the human GAPDH gene. The f-PAINT method served as a control, and the integration efficiencies of the two methods were compared.
  • The sgRNA and pegRNA expression vectors and the donor vector were constructed as described in Example 1.
  • The h-PAINT (LHA) donor vector carries the 800 bp GAPDH left homology arm upstream of the exogenous gene IRES-EGFP and the PE-spCas9/sg⁇ target recognition sequence downstream.
  • The h-PAINT (RHA) donor vector carries the PE-spCas9/sg⁇ target recognition sequence upstream of IRES-EGFP and the 800 bp GAPDH right homology arm downstream.
  • The primer sequences used for vector construction and the homology-arm sequences of the donor vectors are the same as in Example 1; see Table 1 for details.
  • Figure 7 is a schematic of h-PAINT-mediated site-specific integration of exogenous genes into the genome.
  • Figure 8a is a schematic of h-PAINT-mediated site-specific integration of the exogenous gene IRES-EGFP into the GAPDH 3'UTR.
  • For the h-PAINT (LHA) system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry, and pUC19-U6-peg⁇ were transfected into 293T cells together with the h-PAINT (LHA) donor vector.
  • For the h-PAINT (RHA) system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry, and pUC19-U6-peg⁇ were transfected into 293T cells together with the h-PAINT (RHA) donor vector.
  • The f-PAINT system was implemented as described previously. In the negative control groups of the f-PAINT and h-PAINT systems, no sgGAPDH plasmid was transfected.
  • Genomic DNA was extracted from cells edited by the h-PAINT (LHA) and h-PAINT (RHA) systems and amplified with the following primer pairs: GAPDH-P1-2 (SEQ ID NO: 143)/GAPDH-P2 (SEQ ID NO: 30) (5' junction of h-PAINT (LHA)), GAPDH-P3 (SEQ ID NO: 31)/GAPDH-P4 (SEQ ID NO: 32) (3' junction of h-PAINT (LHA)), GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P2 (SEQ ID NO: 30) (5' junction of h-PAINT (RHA)), GAPDH-P3 (SEQ ID NO: 31)/GAPDH-P4-2 (S
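The junction PCRs in these examples tell integration outcomes apart by which primer pairs give a product. The GAPDH primer pairs in Example 1 separate correct integration from reverse and vector-backbone integration. The Python sketch below shows that read-out logic. The primer names and what each pair detects come from Example 1. The rule that an outcome is called only when both its 5' and 3' junctions amplify is an assumption for illustration. The GAPDH-P1/GAPDH-P6 pair is omitted because its description is truncated.

```python
# Junction-PCR read-out for the GAPDH knock-in (Example 1).
# Each primer pair maps to (integration outcome, junction end) as described in the text.
PRIMER_PAIRS = {
    ("GAPDH-P1", "GAPDH-P2"): ("correct", "5'"),
    ("GAPDH-P3", "GAPDH-P4"): ("correct", "3'"),
    ("GAPDH-P1", "GAPDH-P3"): ("reverse", "5'"),
    ("GAPDH-P2", "GAPDH-P4"): ("reverse", "3'"),
    ("GAPDH-P1", "GAPDH-P5"): ("backbone", "5'"),
    ("GAPDH-P6", "GAPDH-P4"): ("backbone", "3'"),
}


def classify(amplified: set[tuple[str, str]]) -> list[str]:
    """Return the outcomes supported by the primer pairs that gave a PCR product.

    Assumption (illustration only): an outcome is called only when both its
    5' and 3' junctions amplify.
    """
    ends_seen: dict[str, set[str]] = {}
    for pair in amplified:
        outcome, end = PRIMER_PAIRS[pair]
        ends_seen.setdefault(outcome, set()).add(end)
    return sorted(o for o, ends in ends_seen.items() if ends == {"5'", "3'"})


# Hypothetical gel result: only the correct-integration junctions amplified.
print(classify({("GAPDH-P1", "GAPDH-P2"), ("GAPDH-P3", "GAPDH-P4")}))  # ['correct']
```

Requiring both junctions is a conservative choice. A single band could come from a partial or rearranged insertion, so calling an outcome from one junction alone would overcount it.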

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Medicinal Chemistry (AREA)
  • Cell Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

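A method for site-specific integration of exogenous genes that depends neither on homology arms nor on linearization of the donor vector. Also provided are systems and kits for editing nucleic acids and their uses, as well as methods for editing nucleic acids. The systems, kits, and methods can break one strand of a double-stranded target nucleic acid and form a flap at its end, in particular the end generated by the break. They can be used to insert a target nucleic acid into a nucleic acid molecule of interest (e.g., genomic DNA), or to replace a nucleotide fragment in such a molecule with the target nucleic acid.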

Description

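System and method for site-specific integration of exogenous genes
Technical Field
The present application relates to the fields of genetic engineering and molecular biology. In particular, it relates to a method for site-specific integration of exogenous genes that depends neither on homology arms nor on linearization of the donor vector. The application also relates to systems and kits for editing nucleic acids and their uses, and to methods for editing nucleic acids. The systems, kits, and methods of the application can break one strand of a double-stranded target nucleic acid and form a flap at its end, in particular the end generated by the break. They can be used to insert a target nucleic acid into a nucleic acid molecule of interest (e.g., genomic DNA), or to replace a nucleotide fragment in such a molecule with the target nucleic acid.
Background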
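Gene editing is an active area of biomedical research. It has broad prospects in the clinical treatment of genetic diseases, the construction of animal models, and the genetic breeding of crops. Gene editing includes deleting, adding, or replacing a single nucleotide or a DNA sequence at a specific genomic site.

Site-specific knock-in of exogenous genes can be achieved by homologous recombination (HDR, homology-directed repair). Adding a 500-3000 bp homology arm to each side of the exogenous gene allows precise site-specific integration, but the efficiency is extremely low, only about 0.01%. Engineered nucleases can be used to cut the targeted genomic site and produce a DNA double-strand break (DSB), which promotes HDR-mediated site-specific knock-in. Such nucleases include ZFN (zinc-finger nucleases), TALEN (transcription activator-like effector nucleases), and CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 nuclease).

However, most mammalian cells repair DSBs mainly through NHEJ (non-homologous end joining). The efficiency of nuclease- and HDR-based site-specific knock-in therefore remains low, generally about 1%. In addition, homologous recombination occurs only in the S/G2 phases of the cell cycle, so these methods cannot achieve site-specific integration of exogenous genes in most terminally differentiated somatic cells.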
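HMEJ (homology-mediated end joining) adds CRISPR/Cas-cleavable recognition sequences on both sides of the homology arms of the donor vector. Cleavage at these sequences linearizes the donor vector and raises the efficiency of homologous recombination.
Linear single-stranded DNA can also serve as the donor for site-specific integration of exogenous DNA fragments. Each end of the ssDNA donor carries a 30-50 nt homology arm. After the nuclease cuts at a specific genomic site, the single strand integrates at the DSB through SDSA (synthesis-dependent strand annealing). Linear ssDNA donors are more efficient than HDR but less precise: extra base insertions and deletions often occur at the junction with the 5' end of the ssDNA. In addition, chemically synthesized long linear ssDNA is costly and hard to obtain. This method is therefore unsuitable for site-specific knock-in of large (>1 kb) exogenous genes, and integration efficiency also drops significantly once the insert exceeds 1 kb.
NHEJ-based site-specific knock-in, such as HITI (homology-independent targeted integration), does not rely on homology arms flanking the exogenous gene. The nuclease cuts the specific genomic site and also cuts the donor vector. The linearized exogenous DNA fragment is then inserted into the genomic break through the NHEJ repair pathway. NHEJ-based knock-in is not directional, and the junctions are often imprecise, readily acquiring extra base insertions or deletions. MMEJ-based knock-in builds on NHEJ by adding microhomology arms to both ends of the exogenous gene, but its efficiency is still low.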
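Prime Editing is a new gene editing method. It uses a fusion protein consisting of spCas9 carrying the H840A mutation (nCas9) and the reverse transcriptase MLV-RT (Murine Leukemia Virus reverse transcriptase), together with a PegRNA (prime editing guide RNA) derived from a gRNA (guide RNA), and can achieve any single-base transition/transversion as well as deletion, addition and replacement of small DNA fragments. A PegRNA is produced by appending a PBS (primer binding site) sequence and a template sequence to the 3' end of a gRNA, where the template sequence contains the edit sequence and a sequence homologous to the genome at the cleavage site. In this method, the complex formed by nCas9 and the PegRNA binds to the targeted genomic site and nicks the PAM strand; the PBS sequence of the PegRNA then base-pairs with the 3' end released from the PAM strand, and MLV-RT, using the template sequence of the PegRNA, reverse-transcribes the edit sequence and the homologous sequence onto the 3' end at the nick in the PAM strand. Subsequently, through DNA single-strand displacement and mismatch repair, the nick is repaired and the edit sequence is integrated into the target site. Because H840A nCas9 cleaves only one strand of the double-stranded DNA (the PAM strand), it does not create DSBs that trigger NHEJ; the method therefore rarely introduces additional base deletions or insertions and has high editing precision. However, because the length of the template sequence on the PegRNA limits the length of sequence that can be edited, Prime Editing is applicable only to deletion or knock-in of sequences shorter than 100 bp.
Therefore, a method that can efficiently perform site-specific gene knock-in and replacement, in particular efficient insertion and replacement of large exogenous genes (greater than 1 kbp), is crucial for expanding the applications of gene editing in production and medicine.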
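Summary of the Invention
In the present invention, unless otherwise stated, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Moreover, the nucleic acid chemistry laboratory procedures used herein are routine procedures widely used in the corresponding fields. To aid understanding of the present invention, definitions and explanations of relevant terms are provided below.
The term "Cas protein" or "Cas nuclease" refers to an RNA-guided nuclease. Cas proteins are also called Csn1 nucleases or CRISPR-associated nucleases. CRISPR (clustered regularly interspaced short palindromic repeats) is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids). A CRISPR cluster contains repeat sequences (repeats) and spacer sequences (spacers), where the spacers are complementary to mobile genetic elements and can target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA additionally requires a trans-encoded small RNA (tracrRNA). Thus, in nature, DNA cleavage by type II CRISPR systems requires a Cas protein and two RNAs. However, crRNA and tracrRNA can be engineered into a single guide RNA (abbreviated "sgRNA" or "gRNA"). See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.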
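As used herein, the term "complementary" means that two nucleic acid sequences can form hydrogen bonds with each other according to the base-pairing rules (Watson-Crick rules) and thereby form a duplex. In the present application, the term "complementary" includes "substantially complementary" and "fully complementary". As used herein, the term "fully complementary" means that every base in one nucleic acid sequence can pair with a base in the other nucleic acid strand, with no mismatches or gaps. As used herein, the term "substantially complementary" means that most bases in one nucleic acid sequence can pair with bases in the other nucleic acid strand, while mismatches or gaps are allowed (e.g., mismatches or gaps of one or a few nucleotides). Generally, under conditions that permit nucleic acid hybridization, annealing or amplification, two "complementary" (e.g., substantially or fully complementary) nucleic acid sequences selectively/specifically hybridize or anneal to form a duplex.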
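To make the distinction between "fully" and "substantially" complementary concrete, the following Python sketch counts base-pairing mismatches between two equal-length DNA strands aligned antiparallel. It assumes ungapped alignment and uppercase A/C/G/T input, and the cut-off `max_mismatches` is a hypothetical parameter chosen for illustration, not a value taken from this application.

```python
# Watson-Crick partner of each DNA base (assumption: uppercase A/C/G/T only).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def count_mismatches(strand_a: str, strand_b: str) -> int:
    """Count positions at which two equal-length strands, both written 5'->3',
    fail to form Watson-Crick pairs when aligned antiparallel (no gaps)."""
    if len(strand_a) != len(strand_b):
        raise ValueError("this ungapped sketch needs strands of equal length")
    # strand_a pairs perfectly with strand_b only if it equals b's reverse complement
    expected = reverse_complement(strand_b)
    return sum(1 for x, y in zip(strand_a.upper(), expected) if x != y)


def classify(strand_a: str, strand_b: str, max_mismatches: int = 2) -> str:
    """Label a strand pair using a hypothetical mismatch cut-off."""
    n = count_mismatches(strand_a, strand_b)
    if n == 0:
        return "fully complementary"
    if n <= max_mismatches:
        return "substantially complementary"
    return "not complementary"


print(classify("AAAAAAAA", "TTTTTTTT"))  # fully complementary
print(classify("AAAAAAAA", "TTTTGTTT"))  # substantially complementary (1 mismatch)
```

Real guide/target comparisons may also involve gaps and bulges, for which an alignment tool would be needed instead of this position-by-position count.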
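As used herein, the term "DNA polymerase" refers to an enzyme that can synthesize a nucleic acid strand (a DNA strand) using another nucleic acid strand (e.g., a DNA strand or an RNA strand) as a template. In the present application, the DNA polymerase may be a DNA-dependent DNA polymerase (i.e., an enzyme that synthesizes a complementary DNA strand using a DNA strand as template) or an RNA-dependent DNA polymerase (i.e., an enzyme that synthesizes a complementary DNA strand using an RNA strand as template). In certain embodiments, the DNA polymerase used in the present application is an RNA-dependent DNA polymerase, such as a reverse transcriptase.
As used herein, the term "reverse transcriptase (RT)" refers to an enzyme capable of synthesizing a complementary DNA strand using an RNA strand as template. Reverse transcriptases of the present application include, but are not limited to, reverse transcriptases from retroviruses, other viruses or bacteria, and DNA polymerases having reverse transcription activity, such as Tth DNA polymerase, Taq DNA polymerase, Tne DNA polymerase and Tma DNA polymerase. Reverse transcriptases from retroviruses include, but are not limited to, those from Moloney murine leukemia virus (M-MLV), human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus and myeloblastosis-associated virus (MAV). Specific examples of reverse transcriptases can also be found in, e.g., US Patent Application 2002/0198944 (incorporated herein by reference in its entirety). In addition, reverse transcriptases of the present application include, but are not limited to, any form, e.g., naturally occurring reverse transcriptases, naturally occurring mutant reverse transcriptases, engineered mutant reverse transcriptases or other variants (e.g., truncated variants that retain reverse transcription activity).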
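As used herein, the terms "hybridization" and "annealing" refer to the process by which complementary single-stranded nucleic acid molecules form a double-stranded nucleic acid. In the present application, "hybridization" and "annealing" have the same meaning and are used interchangeably. Generally, two fully or substantially complementary nucleic acid sequences can hybridize or anneal. The complementarity required for two nucleic acid sequences to hybridize or anneal depends on the hybridization conditions used, in particular the temperature.
As used herein, "conditions that permit nucleic acid hybridization" has the meaning commonly understood by those skilled in the art and can be determined by routine methods. For example, two nucleic acid molecules with complementary sequences can hybridize under suitable hybridization conditions. Such conditions may involve factors such as temperature and the pH, composition and ionic strength of the hybridization buffer, and may be chosen according to the length and GC content of the two complementary nucleic acid molecules. For example, low-stringency hybridization conditions may be used when the two complementary nucleic acid molecules are relatively short and/or have relatively low GC content, and high-stringency conditions may be used when they are relatively long and/or have relatively high GC content. Such hybridization conditions are well known to those skilled in the art; see, e.g., Joseph Sambrook, et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001); and M.L.M. Anderson, Nucleic Acid Hybridization, Springer-Verlag New York Inc., N.Y. (1999). In the present application, "hybridization" and "annealing" have the same meaning and are used interchangeably. Accordingly, the expressions "conditions that permit nucleic acid hybridization" and "conditions that permit nucleic acid annealing" also have the same meaning and are used interchangeably.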
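Because the choice of stringency is tied to the length and GC content of the complementary molecules, a rough melting-temperature (Tm) estimate illustrates the relationship. This Python sketch uses two common textbook approximations (the Wallace rule for short oligonucleotides and a basic GC-content formula for longer ones); both ignore salt, mismatch and concentration effects, and the 14-nt switch-over point is a conventional choice, not something specified in this application.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def estimate_tm(seq: str) -> float:
    """Rough Tm (deg C) of a duplex formed by `seq` and its perfect complement.

    Uses the Wallace rule below 14 nt and a basic GC-content formula otherwise;
    salt and oligo concentration are ignored (rule-of-thumb only).
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    n = len(s)
    if n < 14:
        return 2.0 * at + 4.0 * gc          # Wallace rule
    return 64.9 + 41.0 * (gc - 16.4) / n     # basic GC formula for longer oligos


# Longer and more GC-rich duplexes melt at higher temperatures, which is why
# they can be hybridized under more stringent conditions.
for probe in ("ATGCATGC", "ATGCATGCATGCATGCATGC", "GCGCGCGCGCGCGCGCGCGC"):
    print(probe, round(gc_content(probe), 2), round(estimate_tm(probe), 1))
```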
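As used herein, the term "upstream" describes the relative position of two nucleic acid sequences (or two nucleic acid molecules) and has the meaning commonly understood by those skilled in the art. For example, the expression "one nucleic acid sequence is located upstream of another nucleic acid sequence" means that, when arranged in the 5' to 3' direction, the former occupies a more forward position (i.e., closer to the 5' end) than the latter. As used herein, the term "downstream" has the opposite meaning of "upstream".
As used herein, the term "linker" refers to a chemical entity used to connect two entities (e.g., two nucleic acids or two polypeptides). For example, a linker connecting two polypeptides may be a peptide linker (e.g., a linker comprising multiple amino acid residues); a linker connecting two nucleic acids may be a nucleic acid linker (e.g., a linker comprising multiple nucleotides).
As used herein, the term "guide sequence" refers to the targeting sequence contained in a guide RNA. In some cases, the guide sequence is a polynucleotide sequence with sufficient complementarity to a target sequence to hybridize with it and direct specific binding of a CRISPR/Cas complex to the target sequence. In certain embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Determining the complementarity of two nucleic acid sequences is within the ability of those of ordinary skill in the art. For example, publicly and commercially available alignment algorithms and programs exist, such as but not limited to ClustalW, the Smith-Waterman algorithm in MATLAB, Bowtie, Geneious, Biopython and SeqMan.
As used herein, the term "scaffold sequence" refers to the sequence in a guide RNA that is recognized and bound by the Cas protein. In some cases, the scaffold sequence may comprise or consist of CRISPR repeat sequences.
As used herein, the term "functional complex" refers to the complex formed by binding of a guide RNA (gRNA) to a Cas protein, which can recognize and cleave a polynucleotide complementary to the guide RNA.
As used herein, the term "target nucleic acid" or "target sequence" refers to the polynucleotide targeted by a guide sequence, e.g., a sequence complementary to the guide sequence. Full complementarity between the guide sequence and the target sequence is not required, as long as there is sufficient complementarity to cause hybridization and promote binding of the CRISPR/Cas complex. The target sequence may comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located in the nucleus or cytoplasm of a cell. In some cases, the target sequence may be located within an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast.
In the present invention, the expression "target sequence" or "target nucleic acid" may refer to any polynucleotide endogenous or exogenous to a cell (e.g., a eukaryotic cell). For example, the target nucleic acid may be a polynucleotide present in the nucleus of a eukaryotic cell (e.g., genomic DNA), or a polynucleotide introduced into the cell exogenously (e.g., vector DNA). For example, the target nucleic acid may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, the target nucleic acid or target sequence comprises, or is adjacent to, a protospacer adjacent motif (PAM). The exact sequence and length requirements for the PAM depend on the Cas protein used. Typically, the PAM is a 2-5 base pair sequence adjacent to the protospacer in the target DNA. Those skilled in the art can identify the PAM sequence to be used with a given Cas protein.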
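As an illustration of how a PAM constrains where a Cas protein can act, the sketch below scans one strand of a DNA sequence for SpCas9-style sites: a 20-nt protospacer immediately followed by an NGG PAM. It also reports where an spCas9(H840A)-type nickase would be expected to nick the PAM-containing strand, taken here as 3 nt upstream of the PAM (the commonly reported SpCas9 cleavage position). Only the given strand is scanned, PAM rules for other Cas proteins differ, and `find_spcas9_sites` is a hypothetical helper name.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    start: int        # 0-based index of the first protospacer base
    protospacer: str  # 20 nt on the PAM-containing strand (same sequence as the spacer)
    pam: str          # the NGG motif
    nick: int         # assumed nick falls between indices nick-1 and nick


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Site]:
    """Return every NGG-adjacent protospacer on the given strand (5'->3')."""
    s = seq.upper()
    sites = []
    # i is the index of the first PAM base; the PAM occupies s[i:i+3]
    for i in range(spacer_len, len(s) - 2):
        pam = s[i:i + 3]
        if pam[1:] == "GG":
            sites.append(Site(start=i - spacer_len,
                              protospacer=s[i - spacer_len:i],
                              pam=pam,
                              nick=i - 3))
    return sites


example = "CC" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"   # hypothetical sequence
for site in find_spcas9_sites(example):
    print(site)
```

For a full search, the same scan would also be run on the reverse complement of the sequence to find sites on the opposite strand.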
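As used herein, the term "vector" refers to a nucleic acid vehicle into which a polynucleotide can be inserted. When a vector allows expression of the protein encoded by the inserted polynucleotide, it is called an expression vector. A vector can be introduced into a host cell by transformation, transduction or transfection so that the genetic elements it carries are expressed in the host cell. Vectors are well known to those skilled in the art and include, but are not limited to: plasmids; phagemids; cosmids; lipid nanoparticles; exosomes; artificial chromosomes, such as yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC) or P1-derived artificial chromosomes (PAC); bacteriophages such as lambda phage or M13 phage; and animal viruses. Animal viruses that can be used as vectors include, but are not limited to, retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpesviruses (e.g., herpes simplex virus), poxviruses, baculoviruses, papillomaviruses and papovaviruses (e.g., SV40). A vector may contain various elements that control expression, including but not limited to promoter sequences, transcription initiation sequences, enhancer sequences, selection elements and reporter genes. A vector may also contain an origin of replication. Those skilled in the art will understand that the design of an expression vector may depend on factors such as the choice of host cell to be transformed and the desired level of expression. When a vector carries exogenous DNA to be integrated into the host genome together with non-protein-expressing elements related to the integration of that exogenous DNA, the vector is called a donor vector. The exogenous DNA includes, but is not limited to, complete genes or gene fragments, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements and protein-coding sequences. Non-protein-expressing elements related to integration of the exogenous DNA include, but are not limited to, sequences homologous to the intended insertion site and targeted cleavage sequences for tool enzymes. Adeno-associated virus vectors include, but are not limited to, adeno-associated viruses of different serotypes such as AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9 and AAV-DJ, as well as adeno-associated viruses of other engineered serotypes.
In the present invention, an "intein" refers to a class of internal protein elements that can mediate splicing of translated proteins. An intein is located in the middle of a polypeptide sequence and is excised during processing, catalyzing the ligation of the flanking protein exteins into a mature protein molecule. A "split-intein system" is a system that uses inteins to efficiently split and rejoin large protein molecules. An intein can be divided into an N-terminal segment and a C-terminal segment. The protein of interest is split into an N-terminal part and a C-terminal part, which are fused to the N-terminal and C-terminal segments of the intein, respectively, to form fusion proteins. Only when the two fusion proteins (carrying the N-terminal and C-terminal parts) meet does the intein in the split precursor undergo protein splicing and removal, so that the N-terminal and C-terminal parts of the protein of interest are joined into a functional protein. Inteins suitable for the present invention are derived from, but not limited to, the DnaE DNA polymerase of Synechocystis sp. PCC6803 and of Nostoc punctiforme PCC73102 (Npu).
As used herein, the term "host cell" refers to a cell into which a vector can be introduced, including but not limited to prokaryotic cells such as Escherichia coli or Bacillus subtilis, fungal cells such as yeast cells or Aspergillus, insect cells such as Drosophila S2 cells or Sf9 cells, and animal cells such as fibroblasts, CHO cells, COS cells, NS0 cells, HeLa cells, BHK cells, HEK 293 cells or human cells.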
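As used herein, the term "spCas9(H840A)" refers to a mutant of the spCas9 protein in which the amino acid at the position corresponding to position 840 of spCas9 is mutated from H to A.
Similarly, the term "saCas9(R1226A)" refers to a mutant of the saCas9 protein in which the amino acid at the position corresponding to position 1226 of saCas9 is mutated from R to A.
The term "PE-spCas9" refers to the fusion protein produced by fusing spCas9(H840A) with the reverse transcriptase MLV RT.
As used herein, the term "flap" refers to a free nucleic acid fragment attached to the 3' end at a nick produced by breaking one strand of a double-stranded target nucleic acid; this fragment is not complementary to the corresponding nucleotides of the other strand and therefore remains free. A "homologous flap" means that the flap sequence formed on the double-stranded target nucleic acid is identical or complementary to the terminal sequence at the specific cleavage site in the genome. In some embodiments, the flap can be obtained as follows: after a Cas protein breaks one strand of a double-stranded target nucleic acid (e.g., a donor vector containing the nucleic acid sequence of interest), the 3' end at the nick of the broken strand is extended using a template sequence annealed to the broken strand (e.g., a PegRNA), forming a free nucleic acid fragment.
As used herein, the term "homologous recombination (HDR, homology-directed repair)" refers to a DNA recombination process based on sequence homology between the nucleic acid sequences upstream and/or downstream of the nucleic acid sequence of interest in a construct (e.g., a donor nucleic acid vector) and the nucleic acid sequences upstream and/or downstream of the target site in a genome or nucleic acid fragment. Herein, the nucleic acid sequences upstream and/or downstream of the nucleic acid sequence of interest in the donor nucleic acid vector are referred to as "donor homology arms", and the nucleic acid sequences upstream and/or downstream of the target site in the genome or nucleic acid fragment are referred to as "target-site homology arms".
In certain embodiments, the donor homology arm and the target-site homology arm are identical or highly homologous, i.e., have at least 85%, 90%, 95%, 98% or 100% sequence identity.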
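The identity thresholds above can be checked with a simple calculation. This sketch computes ungapped percent identity between two equal-length homology arms and tests it against a chosen threshold. It is a simplification: arms of different lengths, or arms containing indels, would first need a proper alignment (e.g. Smith-Waterman). The function names and sequences are illustrative only.

```python
def percent_identity(arm_a: str, arm_b: str) -> float:
    """Ungapped percent identity of two equal-length sequences (same orientation)."""
    if len(arm_a) != len(arm_b) or not arm_a:
        raise ValueError("expects two non-empty sequences of equal length")
    matches = sum(1 for x, y in zip(arm_a.upper(), arm_b.upper()) if x == y)
    return 100.0 * matches / len(arm_a)


def meets_threshold(donor_arm: str, target_arm: str, threshold: float = 85.0) -> bool:
    """True if the donor arm is at least `threshold` % identical to the target-site arm."""
    return percent_identity(donor_arm, target_arm) >= threshold


target_arm = "ACGT" * 25                  # hypothetical 100-bp target-site homology arm
donor_arm = "TTTT" + target_arm[4:]       # donor arm differing at 3 of 100 positions
print(percent_identity(donor_arm, target_arm), meets_threshold(donor_arm, target_arm))
```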
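In certain embodiments, the donor homology arm is located upstream of the nucleic acid sequence of interest, and the target-site homology arm is located upstream of the target site. In certain embodiments, the donor homology arm is located downstream of the nucleic acid sequence of interest, and the target-site homology arm is located downstream of the target site.
In a first aspect, the present application provides a system or kit comprising the following four components:
(1) a first Cas protein, or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, wherein the first Cas protein is capable of cleaving or breaking one nucleic acid strand of a first double-stranded target nucleic acid;
(2) a template-dependent first DNA polymerase, or a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase;
(3) a first gRNA, or a nucleic acid molecule C1 comprising a nucleotide sequence encoding the first gRNA, wherein the first gRNA is capable of binding to the first Cas protein to form a first functional complex, and the first functional complex is capable of breaking one nucleic acid strand of the first double-stranded target nucleic acid;
(4) a first tag primer, or a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer, wherein the first tag primer comprises a first tag sequence and a first target-binding sequence, the first tag sequence being located upstream of or at the 5' end of the first target-binding sequence; and, under conditions that permit nucleic acid hybridization or annealing, the first target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the first tag sequence does not bind the nucleic acid strand and remains free and single-stranded.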
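In certain embodiments, the first Cas protein is selected from Cas proteins that cleave a single DNA strand, e.g., that cleave the DNA strand not bound by the gRNA.
In certain embodiments, the first Cas protein is selected from mutants (e.g., spCas9(H840A), saCas9(R1226A)), homologs of mutants, or modified forms of mutants of the Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 or Csf4 protein.
In certain embodiments, the first Cas protein is capable of breaking one nucleic acid strand of the first double-stranded target nucleic acid and producing a nick.
In certain embodiments, the first Cas protein is a mutant of a Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A)).
In certain embodiments, the first Cas protein has the amino acid sequence shown in SEQ ID NO:3.
The sequences and structures of various Cas proteins are well known to those skilled in the art. Various Cas9 proteins and their homologs have been reported in many species, including but not limited to Streptococcus pyogenes and Streptococcus thermophilus. Based on the disclosure of the present invention, other suitable Cas9 proteins will be apparent to those skilled in the art, e.g., the Cas9 proteins disclosed in Chylinski, Rhun, and Charpentier. The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems. (2013) RNA Biology 10:5, 726-737 (incorporated herein by reference in its entirety).
In some embodiments, the Cas9 is a Cas9 from one of the following species: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Streptococcus pyogenes (NCBI Ref: NC_017053.1).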
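In certain embodiments, the first DNA polymerase is selected from, but not limited to, DNA-dependent DNA polymerases and RNA-dependent DNA polymerases.
In certain embodiments, the first DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the first DNA polymerase is a reverse transcriptase, e.g., one of the reverse transcriptases listed above, such as the reverse transcriptase of Moloney murine leukemia virus.
In certain embodiments, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
In certain embodiments, the first Cas protein is linked to the first DNA polymerase.
In certain embodiments, the first Cas protein is covalently linked to the first DNA polymerase, with or without a linker.
In certain embodiments, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
In certain embodiments, the first Cas protein is fused to the first DNA polymerase, with or without a peptide linker, to form a first fusion protein.
In certain embodiments, the first Cas protein is linked or fused, optionally via a linker, to the N-terminus of the first DNA polymerase; alternatively, the first Cas protein is linked or fused, optionally via a linker, to the C-terminus of the first DNA polymerase.
In certain embodiments, the first fusion protein has the amino acid sequence shown in SEQ ID NO:8.
In some embodiments, the linker is a peptide linker. In some embodiments, the peptide linker is 5-200 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150 or 150-200 amino acids.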
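In certain embodiments, the first fusion protein or the first Cas protein can be split into two parts by a split-intein system. It will be readily understood that the split-intein system may split the first fusion protein or the first Cas protein at any amino acid position. For example, in certain embodiments, the split-intein system splits within the first Cas protein. Thus, in certain embodiments, the first Cas protein is split into an N-terminal segment and a C-terminal segment. For example, the N-terminal and C-terminal segments of the first Cas protein may be fused to the N-terminal and C-terminal segments of the intein, respectively (or to the C-terminal and N-terminal segments of the intein, respectively), and the two can be reconstituted into an active first Cas protein in the cell. In certain embodiments, the N-terminal and C-terminal segments of the first Cas protein are each inactive when separated, but can be reconstituted into an active first Cas protein in the cell. Accordingly, in certain embodiments, the nucleic acid molecule A1 may be split into two parts comprising nucleotide sequences encoding the N-terminal and C-terminal segments of the first Cas protein, respectively. Furthermore, it will be readily understood that, in the first fusion protein, the first DNA polymerase may be fused to either the N-terminal or the C-terminal segment of the first Cas protein. In certain embodiments, the first DNA polymerase is fused to the C-terminal segment of the first Cas protein.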
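The split-intein strategy can be pictured at the sequence level: the protein is cut at a chosen position, the N-terminal piece is fused to the intein N-fragment and the C-terminal piece to the intein C-fragment, and trans-splicing excises both intein halves while joining the exteins. The sketch below is a string-level model only. The intein strings are hypothetical placeholders, and real split inteins (such as the Npu DnaE intein) impose local sequence requirements at the splice junction that this model does not check.

```python
from typing import Tuple


def split_with_intein(protein: str, site: int, int_n: str, int_c: str) -> Tuple[str, str]:
    """Split `protein` before index `site` and attach the intein halves.

    Returns (N-extein + IntN, IntC + C-extein), the two precursor polypeptides.
    """
    if not 0 < site < len(protein):
        raise ValueError("split site must fall inside the protein")
    return protein[:site] + int_n, int_c + protein[site:]


def trans_splice(n_precursor: str, c_precursor: str, int_n: str, int_c: str) -> str:
    """Model protein trans-splicing: remove both intein halves, ligate the exteins."""
    if not (n_precursor.endswith(int_n) and c_precursor.startswith(int_c)):
        raise ValueError("precursors do not carry the expected intein halves")
    return n_precursor[:len(n_precursor) - len(int_n)] + c_precursor[len(int_c):]


INT_N = "[IntN]"   # hypothetical placeholder for the N-terminal intein fragment
INT_C = "[IntC]"   # hypothetical placeholder for the C-terminal intein fragment

full_length = "MNTERMINALHALFCTERMINALHALF"   # placeholder, not a real Cas sequence
n_part, c_part = split_with_intein(full_length, 14, INT_N, INT_C)
print(n_part, c_part)
print(trans_splice(n_part, c_part, INT_N, INT_C) == full_length)
```

Splitting the coding sequence this way is what allows each half (for example, the part of nucleic acid molecule A1 encoding one segment) to be delivered separately.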
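In certain embodiments, the first gRNA comprises a first guide sequence, and, under conditions that permit nucleic acid hybridization or annealing, the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of the first double-stranded target nucleic acid.
In certain embodiments, the first guide sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the first gRNA further comprises a first scaffold sequence that can be recognized and bound by the first Cas protein, thereby forming the first functional complex.
In certain embodiments, the first scaffold sequence is at least 20 nt in length, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the first guide sequence is located upstream of or at the 5' end of the first scaffold sequence.
In certain embodiments, after the first guide sequence binds one nucleic acid strand (the second strand) of the first double-stranded target nucleic acid, the first functional complex is capable of breaking the other nucleic acid strand (the first strand) of the first double-stranded target nucleic acid.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the first target-binding sequence is capable of hybridizing or annealing to the 3' end of one nucleic acid strand of the broken target nucleic acid fragment, the 3' end having been formed by the first functional complex breaking the first double-stranded target nucleic acid.
In certain embodiments, the first target-binding sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the first tag sequence is at least 4 nt in length, e.g., 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, after the first target-binding sequence hybridizes or anneals to the 3' end of one nucleic acid strand of the broken target nucleic acid fragment, the first DNA polymerase is capable of extending the 3' end of that nucleic acid strand using the first tag primer as a template. In certain embodiments, the extension forms a first flap.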
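The flap-forming step can be followed base by base. In the sketch below, a tag primer is written 5'->3' as tag sequence followed by target-binding sequence. The target-binding sequence must pair antiparallel with the 3'-terminal bases of the nicked strand, and the polymerase then copies the tag, so the new DNA flap is the reverse complement of the tag. All sequences are hypothetical, pairing is treated as perfect and ungapped, and an RNA tag primer is assumed to be written with T in place of U.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def extend_nicked_strand(nicked_3prime_side: str, tag_primer: str, tag_len: int):
    """Extend the 3' end of a nicked strand using a tag primer as template.

    nicked_3prime_side: the strand ending at the nick, written 5'->3'.
    tag_primer: 5'-[tag][target-binding sequence]-3'.
    Returns (extended_strand, flap).
    """
    tag, binding = tag_primer[:tag_len], tag_primer[tag_len:]
    # The binding sequence pairs antiparallel with the last len(binding) bases.
    if reverse_complement(binding) != nicked_3prime_side[-len(binding):].upper():
        raise ValueError("target-binding sequence does not match the 3' end")
    # The polymerase reads the template 3'->5' across the tag, synthesizing its complement.
    flap = reverse_complement(tag)
    return nicked_3prime_side.upper() + flap, flap


strand = "TTTTACGTAC"                 # hypothetical strand ending at the nick
primer = "AAGG" + "GTACGT"           # hypothetical 4-nt tag + 6-nt target-binding sequence
print(extend_nicked_strand(strand, primer, tag_len=4))
```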
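In certain embodiments, the first tag primer is single-stranded DNA or single-stranded RNA.
In certain embodiments, the first tag primer is single-stranded RNA and the first DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the first tag primer is single-stranded DNA and the first DNA polymerase is a DNA-dependent DNA polymerase.
In certain embodiments, the nucleic acid strand bound by the first guide sequence is different from the nucleic acid strand bound by the first target-binding sequence. In certain embodiments, the nucleic acid strand bound by the first guide sequence is the opposite strand of the nucleic acid strand bound by the first target-binding sequence.
In certain embodiments, the first tag primer is linked to the first gRNA.
In certain embodiments, the first tag primer is covalently linked to the first gRNA, with or without a linker.
In certain embodiments, the first tag primer is linked, optionally via a linker, to the 3' end of the first gRNA.
In certain embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In certain embodiments, the first tag primer is single-stranded RNA and is linked to the 3' end of the first gRNA, with or without a ribonucleic acid linker, to form a first PegRNA.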
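When the tag primer is carried on the 3' end of the gRNA, the resulting PegRNA can be assembled from its parts. The sketch below assumes an SpCas9-type 20-nt spacer whose nick falls 3 nt upstream of the PAM (between spacer positions 17 and 18), derives the target-binding sequence as the reverse complement of the bases just 5' of that nick, and concatenates spacer, scaffold, optional linker, tag and target-binding sequence before converting to the RNA alphabet. The scaffold and all other sequences are hypothetical placeholders, and the default 13-nt binding length is an illustrative choice, not one taken from this application.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def design_pegrna(spacer: str, scaffold: str, tag: str,
                  binding_len: int = 13, linker: str = "") -> str:
    """Return a PegRNA (5'->3', RNA alphabet): spacer-scaffold-linker-tag-binding.

    `spacer` is given as DNA, i.e. the protospacer on the PAM-containing strand.
    """
    spacer = spacer.upper()
    if len(spacer) != 20:
        raise ValueError("this sketch assumes a 20-nt SpCas9 spacer")
    nick = len(spacer) - 3                      # assumed nick between positions 17 and 18
    if not 0 < binding_len <= nick:
        raise ValueError("binding sequence must fit 5' of the nick")
    # The nicked strand's 3' end is spacer[:nick]; pair with its last bases.
    binding = reverse_complement(spacer[nick - binding_len:nick])
    dna = spacer + scaffold.upper() + linker.upper() + tag.upper() + binding
    return dna.replace("T", "U")


peg = design_pegrna("ACGT" * 5, scaffold="GTTTT", tag="AAGG", binding_len=4)
print(peg)
```

A real scaffold would be the full Cas-specific sgRNA scaffold; the five-base string here only keeps the output readable.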
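In certain embodiments, the nucleic acid molecule A1 is capable of expressing the first Cas protein in a cell. In certain embodiments, the nucleic acid molecule B1 is capable of expressing the first DNA polymerase in a cell. In certain embodiments, the nucleic acid molecule C1 is capable of transcribing the first gRNA in a cell. In certain embodiments, the nucleic acid molecule D1 is capable of transcribing the first tag primer in a cell.
In certain embodiments, the nucleic acid molecule A1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first Cas protein.
In certain embodiments, the nucleic acid molecule B1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first DNA polymerase.
In certain embodiments, the nucleic acid molecule C1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first gRNA.
In certain embodiments, the nucleic acid molecule D1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first tag primer.
In certain embodiments, the nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecules A1 and B1 are capable of expressing, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase.
In certain embodiments, the nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); in certain embodiments, the nucleic acid molecules C1 and D1 are capable of transcribing, in a cell, a first PegRNA comprising the first gRNA and the first tag primer.
In certain embodiments, two, three or four of the nucleic acid molecules A1, B1, C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector).
In certain embodiments, the system or kit comprises:
(M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or (M1-2) the first Cas protein and the first DNA polymerase as separate proteins, or nucleic acid molecule(s) capable of expressing the first Cas protein and the first DNA polymerase as separate proteins; and
(M2) a first PegRNA comprising the first gRNA and the first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA.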
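In certain embodiments, the system or kit further comprises:
(5) a second nucleic acid editing system, the second nucleic acid editing system being a homologous recombination technology.
In certain embodiments, the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector).
In certain embodiments, the nucleic acid vector further comprises a first PAM sequence recognized by the first Cas protein, and/or a donor homology arm.
In certain embodiments, the nucleic acid vector is double-stranded.
In certain embodiments, the nucleic acid vector is a circular double-stranded vector.
In certain embodiments, the nucleic acid vector comprises a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., the complement of the first guide sequence).
In certain embodiments, the first functional complex is capable of breaking one nucleic acid strand of the nucleic acid vector via the first guide-binding sequence and the first PAM sequence.
In certain embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In certain embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific genomic site.
In certain embodiments, the first PAM sequence and the donor homology arm are located on either side of the nucleic acid sequence of interest.
In certain embodiments, the first guide-binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In certain embodiments, the first functional complex breaks the first strand of the nucleic acid vector, the first strand containing the nick produced by the break; the double-stranded portion between the 3' end at this nick and the donor homology arm contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing, via the first target-binding sequence, to the 3' end of the nucleic acid strand broken by the first functional complex to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free. In certain embodiments, the nucleic acid strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the nucleic acid strand containing the first guide-binding sequence.
In certain embodiments, the nucleic acid vector further comprises a first target sequence; wherein, under conditions that permit nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target-binding sequence to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the first target sequence is located on the strand opposite to the first guide-binding sequence. In certain embodiments, the first target sequence is located at the end of the broken first strand. In certain embodiments, after the first functional complex breaks the first strand, the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (in certain embodiments, forming a first flap).
In certain embodiments, the nucleic acid vector further comprises a restriction site between the first target sequence and the donor homology arm.
In certain embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arm.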
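In certain embodiments, the system or kit further comprises:
(5) a second gRNA, or a nucleic acid molecule C2 comprising a nucleotide sequence encoding the second gRNA, wherein the second gRNA is capable of binding to a second Cas protein to form a second functional complex, and the second functional complex is capable of breaking one nucleic acid strand of a second double-stranded target nucleic acid.
In certain embodiments, the second Cas protein is the same as or different from the first Cas protein. In certain embodiments, the second Cas protein is the same as the first Cas protein.
In certain embodiments, the second gRNA comprises a second guide sequence, and, under conditions that permit nucleic acid hybridization or annealing, the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of the second double-stranded target nucleic acid.
In certain embodiments, after the second guide sequence binds one strand (the first strand) of the second double-stranded target nucleic acid, the second functional complex breaks the other nucleic acid strand (the second strand) of the second double-stranded target nucleic acid. In certain embodiments, the second guide sequence is different from the first guide sequence.
In certain embodiments, the second double-stranded target nucleic acid is the same as or different from the first double-stranded target nucleic acid.
In certain embodiments, the second double-stranded target nucleic acid is the same as the first double-stranded target nucleic acid, and the second functional complex and the first functional complex break different nucleic acid strands of that double-stranded target nucleic acid at different positions.
In certain embodiments, the second functional complex and the first functional complex break different nucleic acid strands of the same double-stranded target nucleic acid, and the nucleic acid strand bound by the first guide sequence is different from the nucleic acid strand bound by the second guide sequence. In certain embodiments, the nucleic acid strand bound by the first guide sequence is the opposite strand of the nucleic acid strand bound by the second guide sequence.
In certain embodiments, the second double-stranded target nucleic acid and the first double-stranded target nucleic acid are the same double-stranded target nucleic acid comprising a first strand and a second strand; after the first guide sequence binds the second strand, the first functional complex is capable of breaking the first strand, and after the second guide sequence binds the first strand, the second functional complex breaks the second strand.
In certain embodiments, the second guide sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the second gRNA further comprises a second scaffold sequence that can be recognized and bound by the second Cas protein, thereby forming the second functional complex.
In certain embodiments, the second scaffold sequence is at least 20 nt in length, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the second scaffold sequence is the same as or different from the first scaffold sequence. In certain embodiments, the second scaffold sequence is the same as the first scaffold sequence.
In certain embodiments, the second guide sequence is located upstream of or at the 5' end of the second scaffold sequence.
In certain embodiments, the nucleic acid molecule C2 is capable of transcribing the second gRNA in a cell.
In certain embodiments, the nucleic acid molecule C2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second gRNA.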
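In certain embodiments, the second Cas protein is different from the first Cas protein, and the system or kit further comprises:
(6) the second Cas protein, or a nucleic acid molecule A2 comprising a nucleotide sequence encoding the second Cas protein, wherein the second Cas protein is capable of cleaving or breaking one nucleic acid strand of the second double-stranded target nucleic acid.
In certain embodiments, the second Cas protein is capable of breaking one nucleic acid strand of the second double-stranded target nucleic acid and producing a nick.
In certain embodiments, the second Cas protein is selected from Cas proteins that cleave a single DNA strand, e.g., that cleave the DNA strand not bound by the gRNA.
In certain embodiments, the second Cas protein is selected from mutants (e.g., spCas9(H840A), saCas9(R1226A)), homologs of mutants, or modified forms of mutants of the Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 or Csf4 protein.
In certain embodiments, the second Cas protein is a mutant of a Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A)).
In certain embodiments, the second Cas protein has the amino acid sequence shown in SEQ ID NO:3.
In certain embodiments, the nucleic acid molecule A2 is capable of expressing the second Cas protein in a cell.
In certain embodiments, the nucleic acid molecule A2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second Cas protein.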
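In certain embodiments, the system or kit further comprises:
(7) a second tag primer, or a nucleic acid molecule D2 comprising a nucleotide sequence encoding the second tag primer, wherein the second tag primer comprises a second tag sequence and a second target-binding sequence, the second tag sequence being located upstream of or at the 5' end of the second target-binding sequence; and, under conditions that permit nucleic acid hybridization or annealing, the second target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the second tag sequence does not bind the nucleic acid strand and remains free and single-stranded.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the second target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand, the 3' end having been formed by the second functional complex breaking that nucleic acid strand.
In certain embodiments, the second target-binding sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the second target-binding sequence is different from the first target-binding sequence. In certain embodiments, the nucleic acid strand bound by the second target-binding sequence is different from the nucleic acid strand bound by the first target-binding sequence. In certain embodiments, the nucleic acid strand bound by the second target-binding sequence is the opposite strand of the nucleic acid strand bound by the first target-binding sequence.
In certain embodiments, the second tag sequence is at least 4 nt in length, e.g., 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the second tag sequence is the same as or different from the first tag sequence; in certain embodiments, the second tag sequence is different from the first tag sequence.
In certain embodiments, after the second target-binding sequence hybridizes or anneals to the 3' end of the broken nucleic acid strand, a second DNA polymerase is capable of extending the 3' end of that nucleic acid strand using the second tag primer as a template. In certain embodiments, the extension forms a second flap.
In certain embodiments, the second DNA polymerase is the same as or different from the first DNA polymerase. In certain embodiments, the second DNA polymerase is the same as the first DNA polymerase.
In certain embodiments, the second tag primer is single-stranded DNA or single-stranded RNA.
In certain embodiments, the second tag primer is single-stranded RNA and the second DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the second tag primer is single-stranded DNA and the second DNA polymerase is a DNA-dependent DNA polymerase.
In certain embodiments, the nucleic acid strand bound by the second guide sequence is different from the nucleic acid strand bound by the second target-binding sequence. In certain embodiments, the nucleic acid strand bound by the second guide sequence is the opposite strand of the nucleic acid strand bound by the second target-binding sequence.
In certain embodiments, the second guide sequence and the first target-binding sequence bind the same nucleic acid strand, and the binding position of the second guide sequence is located upstream of or at the 5' end of the binding position of the first target-binding sequence.
In certain embodiments, the first guide sequence and the second target-binding sequence bind the same nucleic acid strand, and the binding position of the first guide sequence is located upstream of or at the 5' end of the binding position of the second target-binding sequence.
In certain embodiments, the first flap and the second flap are contained on the same double-stranded target nucleic acid and are located on opposite strands.
In certain embodiments, the nucleic acid molecule D2 is capable of transcribing the second tag primer in a cell.
In certain embodiments, the nucleic acid molecule D2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second tag primer.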
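In certain embodiments, the second DNA polymerase is different from the first DNA polymerase, and the system or kit further comprises:
(8) the second DNA polymerase, or a nucleic acid molecule B2 comprising a nucleotide sequence encoding the second DNA polymerase.
In certain embodiments, the second DNA polymerase is selected from, but not limited to, DNA-dependent DNA polymerases and RNA-dependent DNA polymerases.
In certain embodiments, the second DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the second DNA polymerase is a reverse transcriptase, e.g., one of the reverse transcriptases listed above, such as the reverse transcriptase of Moloney murine leukemia virus.
In certain embodiments, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
In certain embodiments, the nucleic acid molecule B2 is capable of expressing the second DNA polymerase in a cell.
In certain embodiments, the nucleic acid molecule B2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second DNA polymerase.
In certain embodiments, the second tag primer is linked to the second gRNA.
In certain embodiments, the second tag primer is covalently linked to the second gRNA, with or without a linker.
In certain embodiments, the second tag primer is linked, optionally via a linker, to the 3' end of the second gRNA.
In certain embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In certain embodiments, the second tag primer is single-stranded RNA and is linked to the 3' end of the second gRNA, with or without a ribonucleic acid linker, to form a second PegRNA.
In certain embodiments, the nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecules C2 and D2 are capable of transcribing, in a cell, a second PegRNA comprising the second gRNA and the second tag primer.
In certain embodiments, the system or kit comprises a second PegRNA comprising the second gRNA and the second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.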
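In certain embodiments, the second Cas protein and the second DNA polymerase are separate or linked.
In certain embodiments, the second Cas protein is covalently linked to the second DNA polymerase, with or without a linker.
In certain embodiments, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
In certain embodiments, the second Cas protein is fused to the second DNA polymerase, with or without a peptide linker, to form a second fusion protein.
In certain embodiments, the second Cas protein is linked or fused, optionally via a linker, to the N-terminus of the second DNA polymerase; alternatively, the second Cas protein is linked or fused, optionally via a linker, to the C-terminus of the second DNA polymerase.
In certain embodiments, the second fusion protein has the amino acid sequence shown in SEQ ID NO:8.
In certain embodiments, the second fusion protein or the second Cas protein can be split into two parts by a split-intein system. It will be readily understood that the split-intein system may split the second fusion protein or the second Cas protein at any amino acid position. For example, in certain embodiments, the split-intein system splits within the second Cas protein. Thus, in certain embodiments, the second Cas protein is split into an N-terminal segment and a C-terminal segment. For example, the N-terminal and C-terminal segments of the second Cas protein may be fused to the N-terminal and C-terminal segments of the intein, respectively (or to the C-terminal and N-terminal segments of the intein, respectively), and the two can be reconstituted into an active second Cas protein in the cell. In certain embodiments, the N-terminal and C-terminal segments of the second Cas protein are each inactive when separated, but can be reconstituted into an active second Cas protein in the cell. Accordingly, in certain embodiments, the nucleic acid molecule A2 may be split into two parts comprising nucleotide sequences encoding the N-terminal and C-terminal segments of the second Cas protein, respectively. Furthermore, it will be readily understood that, in the second fusion protein, the second DNA polymerase may be fused to either the N-terminal or the C-terminal segment of the second Cas protein. In certain embodiments, the second DNA polymerase is fused to the C-terminal segment of the second Cas protein.
In certain embodiments, the nucleic acid molecules A2 and B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecules A2 and B2 are capable of expressing, in a cell, the second Cas protein and the second DNA polymerase as separate proteins, or a second fusion protein comprising the second Cas protein and the second DNA polymerase.
In certain embodiments, the system or kit comprises a second fusion protein comprising the second Cas protein and the second DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the second fusion protein; or the second Cas protein and the second DNA polymerase as separate proteins, or nucleic acid molecule(s) capable of expressing the second Cas protein and the second DNA polymerase as separate proteins.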
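In certain embodiments, the first and second Cas proteins are the same Cas protein, and the first and second DNA polymerases are the same DNA polymerase; and the system or kit comprises:
(M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or (M1-2) the first Cas protein and the first DNA polymerase as separate proteins, or nucleic acid molecule(s) capable of expressing the first Cas protein and the first DNA polymerase as separate proteins;
(M2) a first PegRNA comprising the first gRNA and the first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA;
(M3) a second PegRNA comprising the second gRNA and the second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.
In certain embodiments, the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector).
In certain embodiments, the nucleic acid vector further comprises a first PAM sequence recognized by the first Cas protein, and/or a second PAM sequence recognized by the second Cas protein.
In certain embodiments, the nucleic acid vector is double-stranded.
In certain embodiments, the nucleic acid vector is a circular double-stranded vector.
In certain embodiments, the nucleic acid vector comprises a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., the complement of the first guide sequence), and/or a second guide-binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., the complement of the second guide sequence); optionally, the nucleic acid vector further comprises a restriction site between the first guide-binding sequence and the second guide-binding sequence.
In certain embodiments, the first guide-binding sequence and the second guide-binding sequence are located on opposite strands of the nucleic acid vector.
In certain embodiments, the first functional complex is capable of breaking one nucleic acid strand (the first strand) of the nucleic acid vector via the first guide-binding sequence and the first PAM sequence; and/or the second functional complex is capable of breaking the other nucleic acid strand (the second strand) of the nucleic acid vector via the second guide-binding sequence and the second PAM sequence.
In certain embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In certain embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific genomic site.
In certain embodiments, the first PAM sequence and the second PAM sequence are located on either side of the nucleic acid sequence of interest.
In certain embodiments, the first guide-binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In certain embodiments, the second guide-binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence.
In certain embodiments, the first functional complex and the second functional complex break the first strand and the second strand of the nucleic acid vector, respectively, each strand containing a nick produced by the break; the double-stranded portion between the 3' ends at the two nicks contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing, via the first target-binding sequence, to the 3' end of the nucleic acid strand broken by the first functional complex to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free. In certain embodiments, the nucleic acid strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the nucleic acid strand containing the first guide-binding sequence.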
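In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing, via the second target-binding sequence, to the 3' end of the nucleic acid strand broken by the second functional complex to form a double-stranded structure, with the second tag sequence of the second tag primer remaining free. In certain embodiments, the nucleic acid strand to which the second target-binding sequence hybridizes or anneals is the opposite strand of the nucleic acid strand containing the second guide-binding sequence.
In certain embodiments, the nucleic acid strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the nucleic acid strand to which the second target-binding sequence hybridizes or anneals.
In certain embodiments, the nucleic acid vector further comprises a first target sequence; wherein, under conditions that permit nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target-binding sequence to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free. In certain embodiments, the first target sequence is located on the strand opposite to the first guide-binding sequence. In certain embodiments, the first target sequence is located at the end of the broken first strand. In certain embodiments, after the first functional complex breaks the first strand, the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (preferably, forming a first flap).
and/or,
the nucleic acid vector further comprises a second target sequence; wherein, under conditions that permit nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing to the second target sequence via the second target-binding sequence to form a double-stranded structure, with the second tag sequence of the second tag primer remaining free. In certain embodiments, the second target sequence is located on the strand opposite to the second guide-binding sequence. In certain embodiments, the second target sequence is located at the end of the broken second strand. In certain embodiments, after the second functional complex breaks the second strand, the 3' end of the nucleic acid strand containing the second target sequence can be extended using the second tag primer annealed to the second target sequence as a template (preferably, forming a second flap).
In certain embodiments, the nucleic acid strand containing the first target sequence is the opposite strand of the nucleic acid strand containing the second target sequence.
In certain embodiments, the nucleic acid vector further comprises a restriction site between the first target sequence and the second target sequence.
In certain embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.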
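In certain embodiments, the system or kit further comprises:
(9) a third nucleic acid editing system for making a double-strand break in a third double-stranded target nucleic acid.
In certain embodiments, the third nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc-finger nuclease), TALEN (transcription activator-like effector nuclease) or a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system is capable of breaking both strands of the third double-stranded target nucleic acid, forming broken nucleotide fragments a1 and a2.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the broken nucleotide fragment a1.
In certain embodiments, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the broken nucleotide fragment a1 at the end formed when the third nucleic acid editing system breaks the third double-stranded target nucleic acid.
In certain embodiments, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a1, the 3' end or 3' portion having been formed by the third nucleic acid editing system breaking the third double-stranded target nucleic acid.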
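The requirement that the first flap anneal to the 3' end of a strand of fragment a1 can be checked with a simple model. Here a Cas9-type nuclease is assumed to cut both strands bluntly 3 bp upstream of an NGG PAM on the top strand; a1 is the segment to the left of the cut, and a flap designed as the reverse complement of the last k bases of a1's top strand pairs with that 3' end. This is an idealized sketch: it ignores the end processing that would be needed in cells to expose a single-stranded 3' end, and all sequences and function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def blunt_cut(top_strand: str, pam_start: int):
    """Assumed blunt cut 3 bp upstream of the PAM; return (a1_top, a2_top)."""
    cut = pam_start - 3
    return top_strand[:cut], top_strand[cut:]


def homologous_flap_for(a1_top: str, k: int) -> str:
    """Flap (5'->3') complementary to the last k bases of a1's top strand."""
    return reverse_complement(a1_top[-k:])


def flap_anneals(flap: str, a1_top: str) -> bool:
    """True if the flap pairs antiparallel with the 3'-terminal bases of a1_top."""
    return reverse_complement(flap) == a1_top[-len(flap):].upper()


genome = "GATTACAGATTACAGATTAC" + "AGG" + "CCCCCCCCCC"   # hypothetical locus
a1, a2 = blunt_cut(genome, pam_start=20)
flap = homologous_flap_for(a1, k=8)
print(a1, a2, flap, flap_anneals(flap, a1))
```

In this model the matching first tag sequence is simply `reverse_complement(flap)`, i.e. a copy of the last k bases of a1's top strand, which is one reading of the "homologous flap" defined earlier.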
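In certain embodiments, the broken nucleotide fragment a2 contains a target-site homology arm having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity with the donor homology arm.
In certain embodiments, the target-site homology arm is located upstream of the break in the third double-stranded target nucleic acid and the donor homology arm is located upstream of the nucleic acid sequence of interest; alternatively, the target-site homology arm is located downstream of the break in the third double-stranded target nucleic acid and the donor homology arm is located downstream of the nucleic acid sequence of interest.
In certain embodiments, the donor homology arm and the target-site homology arm are each independently 100 to 300 bp, 300 to 500 bp, 500 to 1000 bp, 1000 to 2000 bp, or 2000 to 5000 bp in length.
In certain embodiments, the sequence of the target-site homology arm is selected from an exon sequence, an intron sequence, an intergenic sequence, a 3' UTR sequence, a 5' UTR sequence, a promoter sequence or a chromosomal sequence.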
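In certain embodiments, the system or kit further comprises:
(9) a third nucleic acid editing system for making a double-strand break in a third double-stranded target nucleic acid.
In certain embodiments, the third nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc-finger nuclease), TALEN (transcription activator-like effector nuclease) or a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system is capable of breaking both strands of the third double-stranded target nucleic acid, forming broken nucleotide fragments a1 and a2.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the broken nucleotide fragment a1.
In certain embodiments, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the broken nucleotide fragment a1 at the end formed when the third nucleic acid editing system breaks the third double-stranded target nucleic acid.
In certain embodiments, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a1, the 3' end or 3' portion having been formed by the third nucleic acid editing system breaking the third double-stranded target nucleic acid.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the broken nucleotide fragment a2.
In certain embodiments, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the broken nucleotide fragment a2 at the end formed when the third nucleic acid editing system breaks the third double-stranded target nucleic acid.
In certain embodiments, the complement of the second tag sequence, or the second flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a2, the 3' end or 3' portion having been formed by the third nucleic acid editing system breaking the third double-stranded target nucleic acid.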
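In certain embodiments, the third nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system comprises: (i) a third Cas protein, or a nucleic acid molecule comprising a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA, or a nucleic acid molecule comprising a nucleotide sequence encoding the third gRNA; wherein the third gRNA is capable of binding to the third Cas protein to form a third functional complex, and the third functional complex is capable of breaking both strands of the third double-stranded target nucleic acid, forming broken nucleotide fragments a1 and a2.
In certain embodiments, the third Cas protein is selected from Cas proteins that cleave both DNA strands, e.g., the Cas9 protein.
In certain embodiments, the third gRNA has the sequence shown in any one of SEQ ID NOs: 11, 38, 54, 67, 80, 93, 106, 119 or 132.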
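In certain embodiments, when the gRNA is used to recognize the 3' UTR of the GAPDH locus, the third gRNA has the sequence shown in SEQ ID NO:11. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:145, the third gRNA has the sequence shown in SEQ ID NO:11.
In certain embodiments, when the gRNA is used to recognize the first intron of the AAVS1 locus of the human genome, the third gRNA has the sequence shown in SEQ ID NO:38. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:146, the third gRNA has the sequence shown in SEQ ID NO:38.
In certain embodiments, when the gRNA is used to recognize the first intron of the genomic Rosa26 locus, the third gRNA has the sequence shown in SEQ ID NO:54. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:147, the third gRNA has the sequence shown in SEQ ID NO:54.
In certain embodiments, when the gRNA is used to recognize the CCR5 locus of the human genome, the third gRNA has the sequence shown in SEQ ID NO:67. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:148, the third gRNA has the sequence shown in SEQ ID NO:67.
In certain embodiments, when the gRNA is used to recognize the TRAC locus of the human genome, the third gRNA has the sequence shown in SEQ ID NO:80. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:149, the third gRNA has the sequence shown in SEQ ID NO:80.
In certain embodiments, when the gRNA is used to recognize the WAS-1 site, the third gRNA has the sequence shown in SEQ ID NO:93. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:150, the third gRNA has the sequence shown in SEQ ID NO:93.
In certain embodiments, when the gRNA is used to recognize the WAS-3 site, the third gRNA has the sequence shown in SEQ ID NO:106. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:151, the third gRNA has the sequence shown in SEQ ID NO:106.
In certain embodiments, when the gRNA is used to recognize the HBB locus, the third gRNA has the sequence shown in SEQ ID NO:119. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:152, the third gRNA has the sequence shown in SEQ ID NO:119.
In certain embodiments, when the gRNA is used to recognize the IL2RG locus, the third gRNA has the sequence shown in SEQ ID NO:132. In certain embodiments, when the third double-stranded target nucleic acid comprises the sequence shown in SEQ ID NO:153, the third gRNA has the sequence shown in SEQ ID NO:132.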
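In certain embodiments, the system or kit further comprises:
(10) a fourth nucleic acid editing system for making a double-strand break in a fourth double-stranded target nucleic acid.
In certain embodiments, the fourth nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc-finger nuclease), TALEN (transcription activator-like effector nuclease) or a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system and the fourth nucleic acid editing system are selected from the same site-specific nuclease technology.
In certain embodiments, the fourth double-stranded target nucleic acid is the same as the third double-stranded target nucleic acid, and the third and fourth nucleic acid editing systems break that same double-stranded target nucleic acid at different positions, forming broken nucleotide fragments a1, a2 and a3; wherein, before breaking, the nucleotide fragments a1, a2 and a3 are arranged in that order in the same double-stranded target nucleic acid (i.e., nucleotide fragment a1 is connected to nucleotide fragment a3 via nucleotide fragment a2). In certain embodiments, the third and fourth nucleic acid editing systems cause separation of nucleotide fragments a1 and a2 and separation of nucleotide fragments a2 and a3, respectively.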
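The arrangement of fragments a1, a2 and a3 produced by two double-strand breaks, and the replacement of a2 by the nucleic acid sequence of interest, can be summarized as sequence bookkeeping. The sketch below assumes blunt cuts at the two given positions and perfect joining of the insert to a1 and a3; it models only the intended end product, not the repair mechanism, and the function names and sequences are hypothetical.

```python
from typing import Tuple


def split_at_two_cuts(seq: str, cut1: int, cut2: int) -> Tuple[str, str, str]:
    """Return (a1, a2, a3) for double-strand breaks before indices cut1 < cut2."""
    if not 0 < cut1 < cut2 < len(seq):
        raise ValueError("need 0 < cut1 < cut2 < len(seq)")
    return seq[:cut1], seq[cut1:cut2], seq[cut2:]


def replace_segment(seq: str, cut1: int, cut2: int, insert: str) -> str:
    """Idealized outcome: a2 is removed and the insert is joined to a1 and a3."""
    a1, _a2, a3 = split_at_two_cuts(seq, cut1, cut2)
    return a1 + insert + a3


locus = "AAAAACCCCCGGGGG"          # hypothetical: a1=AAAAA, a2=CCCCC, a3=GGGGG
print(split_at_two_cuts(locus, 5, 10))
print(replace_segment(locus, 5, 10, "TTTTTTTT"))
```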
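In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the broken nucleotide fragment a1.
In certain embodiments, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the broken nucleotide fragment a1 at the end formed when the third nucleic acid editing system breaks the third double-stranded target nucleic acid.
In certain embodiments, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a1, the 3' end or 3' portion having been formed by the third nucleic acid editing system breaking the third double-stranded target nucleic acid.
In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the broken nucleotide fragment a3.
In certain embodiments, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the broken nucleotide fragment a3 at the end formed when the third nucleic acid editing system breaks the third double-stranded target nucleic acid.
In certain embodiments, the complement of the second tag sequence, or the second flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the broken nucleotide fragment a3, the 3' end or 3' portion having been formed by the third nucleic acid editing system breaking the third double-stranded target nucleic acid.
In certain embodiments, the fourth nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the fourth nucleic acid editing system comprises: (i) a fourth Cas protein, or a nucleic acid molecule comprising a nucleotide sequence encoding the fourth Cas protein, and (ii) a fourth gRNA, or a nucleic acid molecule comprising a nucleotide sequence encoding the fourth gRNA; wherein the fourth gRNA is capable of binding to the fourth Cas protein to form a fourth functional complex, and the fourth functional complex is capable of breaking both strands of the fourth double-stranded target nucleic acid, forming broken target nucleic acid fragments b1 and b2.
In certain embodiments, the fourth Cas protein is selected from Cas proteins that cleave both DNA strands, e.g., the Cas9 protein.
In certain embodiments, the third nucleic acid editing system and the fourth nucleic acid editing system are CRISPR (clustered regularly interspaced short palindromic repeats)/Cas systems.
In certain embodiments, the third nucleic acid editing system is as defined above, and the fourth nucleic acid editing system is as defined above.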
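In certain embodiments, the kit further comprises additional systems or components.
In certain embodiments, the additional components include one or more of the following:
(1) one or more (e.g., 2, 3, 4, 5, 10, 15, 20 or more) additional gRNAs, or nucleic acid molecules comprising nucleotide sequences encoding the additional gRNAs, wherein the additional gRNAs are capable of binding to a Cas protein to form functional complexes. In certain embodiments, the functional complexes are capable of breaking both strands or one strand of a double-stranded target nucleic acid.
(2) one or more (e.g., 2, 3, 4, 5, 10, 15, 20 or more) additional Cas proteins, or nucleic acid molecules comprising nucleotide sequences encoding the additional Cas proteins. In certain embodiments, the Cas proteins are capable of cleaving or breaking one or both strands of a double-stranded target nucleic acid.
(3) one or more (e.g., 2, 3, 4, 5, 10, 15, 20 or more) additional tag primers, or nucleic acid molecules comprising nucleotide sequences encoding the additional tag primers, wherein each additional tag primer comprises a tag sequence and a target-binding sequence, the tag sequence being located upstream of or at the 5' end of the target-binding sequence. In certain embodiments, under conditions that permit nucleic acid hybridization or annealing, the target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the tag sequence does not bind the target nucleic acid fragment and remains free and single-stranded.
(4) one or more (e.g., 2, 3, 4, 5, 10, 15, 20 or more) additional DNA polymerases, or nucleic acid molecules comprising nucleotide sequences encoding the additional DNA polymerases. In certain embodiments, the additional DNA polymerases are selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases. In certain embodiments, the additional DNA polymerase is an RNA-dependent DNA polymerase, such as a reverse transcriptase.
In certain embodiments, the additional systems include one or more (e.g., 2, 3, 4, 5, 10, 15, 20 or more) nucleic acid editing systems for making double-strand breaks in double-stranded target nucleic acids.
In certain embodiments, the nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc-finger nuclease), TALEN (transcription activator-like effector nuclease) or a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.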
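In a second aspect, the present application provides a fusion protein comprising a Cas protein and a template-dependent DNA polymerase, wherein the Cas protein is capable of breaking one nucleic acid strand of a target nucleic acid.
In certain embodiments, the Cas protein is capable of breaking one nucleic acid strand of the target nucleic acid and producing a nick.
In certain embodiments, the Cas protein is selected from Cas proteins that cleave a single DNA strand.
In certain embodiments, the Cas protein is selected from mutants (e.g., spCas9(H840A), saCas9(R1226A)), homologs of mutants, or modified forms of mutants of the Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 or Csf4 protein.
In certain embodiments, the Cas protein is a mutant of a Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A)).
In certain embodiments, the Cas protein has the amino acid sequence shown in SEQ ID NO:3.
In certain embodiments, the DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases.
In certain embodiments, the DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus or myeloblastosis-associated virus (MAV).
In certain embodiments, the DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
In certain embodiments, the Cas protein is covalently linked to the DNA polymerase, with or without a linker.
In certain embodiments, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
In certain embodiments, the Cas protein is linked or fused, optionally via a linker, to the N-terminus of the DNA polymerase; alternatively, the Cas protein is linked or fused, optionally via a linker, to the C-terminus of the DNA polymerase.
In certain embodiments, the fusion protein has the amino acid sequence shown in SEQ ID NO:8.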
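In a third aspect, the present application provides a nucleic acid molecule comprising a polynucleotide encoding the fusion protein as described above.
In a fourth aspect, the present application provides a vector comprising the nucleic acid molecule as described above.
In certain embodiments, the vector is an expression vector.
In certain embodiments, the vector is a eukaryotic expression vector.
In a fifth aspect, the present application provides a host cell comprising the nucleic acid molecule as described above or the vector as described above.
In certain embodiments, the host cell is a prokaryotic cell, e.g., an E. coli cell; or the host cell is a eukaryotic cell, e.g., a yeast cell, fungal cell, plant cell or animal cell.
In certain embodiments, the host cell is a mammalian cell, e.g., a human cell.
In a fifth aspect, the present application provides a method for preparing the fusion protein as described above, comprising (1) culturing the host cell as described above under conditions that permit protein expression; and (2) isolating the fusion protein expressed by the host cell.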
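In a sixth aspect, the present application provides a complex comprising a first Cas protein and a template-dependent first DNA polymerase, wherein the first Cas protein is capable of breaking one nucleic acid strand of a double-stranded target nucleic acid, and the first Cas protein is complexed with the first DNA polymerase covalently or non-covalently.
In certain embodiments, the first Cas protein is capable of breaking one nucleic acid strand of the double-stranded target nucleic acid and producing a nick.
In certain embodiments, the first Cas protein is selected from Cas proteins that cleave a single DNA strand.
In certain embodiments, the first Cas protein is selected from mutants (e.g., spCas9(H840A), saCas9(R1226A)), homologs of mutants, or modified forms of mutants of the Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 or Csf4 protein.
In certain embodiments, the first Cas protein is a mutant of a Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A)).
In certain embodiments, the first Cas protein has the amino acid sequence shown in SEQ ID NO:3.
In certain embodiments, the first DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases.
In certain embodiments, the first DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the first DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus or myeloblastosis-associated virus (MAV).
In certain embodiments, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
In certain embodiments, the first Cas protein is covalently linked to the first DNA polymerase, with or without a linker.
In certain embodiments, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
In certain embodiments, the first Cas protein is fused to the first DNA polymerase, with or without a peptide linker, to form a first fusion protein.
In certain embodiments, the first Cas protein is linked or fused, optionally via a linker, to the N-terminus of the first DNA polymerase; alternatively, the first Cas protein is linked or fused, optionally via a linker, to the C-terminus of the first DNA polymerase.
In certain embodiments, the first fusion protein has the amino acid sequence shown in SEQ ID NO:8.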
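In certain embodiments, the complex further comprises a first gRNA.
In certain embodiments, the first gRNA is capable of binding to the first Cas protein to form a first functional unit; the first functional unit is capable of binding one nucleic acid strand (the second strand) of a double-stranded target nucleic acid and breaking the other nucleic acid strand (the first strand) of the double-stranded target nucleic acid.
In certain embodiments, the first gRNA comprises a first guide sequence, and, under conditions that permit nucleic acid hybridization or annealing, the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of the double-stranded target nucleic acid.
In certain embodiments, the first guide sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the first gRNA further comprises a first scaffold sequence that can be recognized and bound by the first Cas protein, thereby forming the first functional unit.
In certain embodiments, the first scaffold sequence is at least 20 nt in length, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the first guide sequence is located upstream of or at the 5' end of the first scaffold sequence.
In certain embodiments, after the first guide sequence binds the double-stranded target nucleic acid, the complex or the first functional unit is capable of breaking one strand of the double-stranded target nucleic acid.
In certain embodiments, the complex further comprises the double-stranded target nucleic acid.
In certain embodiments, the double-stranded target nucleic acid contains a first PAM sequence recognized by the first Cas protein and a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence, whereby the first functional unit binds the double-stranded target nucleic acid via the first guide-binding sequence and the first PAM sequence.
In certain embodiments, the complex further comprises a first tag primer hybridized or annealed to the double-stranded target nucleic acid; wherein the first tag primer comprises a first target-binding sequence capable of hybridizing or annealing to the double-stranded target nucleic acid.
In certain embodiments, the tag primer comprises a first tag sequence and a first target-binding sequence, the first tag sequence being located upstream of or at the 5' end of the first target-binding sequence; and, under conditions that permit nucleic acid hybridization or annealing, the first target-binding sequence is capable of hybridizing or annealing to the double-stranded target nucleic acid. In certain embodiments, the first target-binding sequence is capable of hybridizing or annealing to the 3' end of the nucleic acid strand of the double-stranded target nucleic acid that was broken by the first functional unit, forming a double-stranded structure. In certain embodiments, the 3' end is formed by the first functional unit breaking one nucleic acid strand of the double-stranded target nucleic acid. In certain embodiments, the first tag sequence does not bind the broken nucleic acid strand and remains free and single-stranded.
In certain embodiments, the first target-binding sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the first tag sequence is at least 4 nt in length, e.g., 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the first tag primer binds to the broken nucleic acid strand via the first target-binding sequence. In certain embodiments, the first DNA polymerase binds the broken nucleic acid strand and the first tag primer.
In certain embodiments, the first tag primer is single-stranded DNA or single-stranded RNA.
In certain embodiments, the first tag primer is single-stranded RNA and the first DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the first tag primer is single-stranded DNA and the first DNA polymerase is a DNA-dependent DNA polymerase.
In certain embodiments, the broken nucleic acid strand is extended by the first DNA polymerase using the first tag primer as a template, forming a first flap.
In certain embodiments, the nucleic acid strand bound by the first gRNA is different from the nucleic acid strand bound by the first tag primer. In certain embodiments, the nucleic acid strand bound by the first gRNA is the opposite strand of the nucleic acid strand bound by the first tag primer.
In certain embodiments, the first tag primer is linked to the first gRNA.
In certain embodiments, the first tag primer is covalently linked to the first gRNA, with or without a linker.
In certain embodiments, the first tag primer is linked, optionally via a linker, to the 3' end of the first gRNA.
In certain embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In certain embodiments, the first tag primer is single-stranded RNA and is linked to the 3' end of the first gRNA, with or without a ribonucleic acid linker, to form a first PegRNA.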
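In certain embodiments, the complex further comprises a second Cas protein and a second gRNA, wherein the second Cas protein is capable of breaking one nucleic acid strand of a double-stranded target nucleic acid, and the second gRNA is capable of binding to the second Cas protein to form a second functional unit; the second functional unit is capable of binding the double-stranded target nucleic acid and breaking one strand thereof.
In certain embodiments, the second Cas protein is the same as or different from the first Cas protein. In certain embodiments, the second Cas protein is the same as the first Cas protein.
In certain embodiments, the second Cas protein is capable of breaking one nucleic acid strand of the double-stranded target nucleic acid and producing a nick.
In certain embodiments, the second Cas protein is selected from Cas proteins that cleave a single DNA strand.
In certain embodiments, the second Cas protein is selected from mutants (e.g., spCas9(H840A), saCas9(R1226A)), homologs of mutants, or modified forms of mutants of the Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 or Csf4 protein.
In certain embodiments, the second Cas protein is a mutant of a Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A)).
In certain embodiments, the second Cas protein has the amino acid sequence shown in SEQ ID NO:3.
In certain embodiments, the second gRNA comprises a second guide sequence, and, under conditions that permit nucleic acid hybridization or annealing, the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of the double-stranded target nucleic acid.
In certain embodiments, the second guide sequence is different from the first guide sequence. In certain embodiments, the nucleic acid strand bound by the first guide sequence is different from the nucleic acid strand bound by the second guide sequence. In certain embodiments, the nucleic acid strand bound by the first guide sequence is the opposite strand of the nucleic acid strand bound by the second guide sequence.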
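In certain embodiments, the second functional unit and the first functional unit bind the same double-stranded target nucleic acid, which comprises a first strand and a second strand; after the first guide sequence binds the second strand, the first functional unit is capable of breaking the first strand, and after the second guide sequence binds the first strand, the second functional unit breaks the second strand. In certain embodiments, the second functional unit and the first functional unit produce breaks at different positions on opposite strands.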
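In certain embodiments, the second guide sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the second gRNA further comprises a second scaffold sequence that can be recognized and bound by the second Cas protein, thereby forming the second functional unit.
In certain embodiments, the second scaffold sequence is the same as or different from the first scaffold sequence. In certain embodiments, the second scaffold sequence is the same as the first scaffold sequence.
In certain embodiments, the second scaffold sequence is at least 20 nt in length, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the second guide sequence is located upstream of or at the 5' end of the second scaffold sequence.
In certain embodiments, the double-stranded target nucleic acid contains a second PAM sequence recognized by the second Cas protein and a second guide-binding sequence capable of hybridizing or annealing to the second guide sequence, whereby the second functional unit binds the double-stranded target nucleic acid via the second guide-binding sequence and the second PAM sequence.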
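In certain embodiments, the complex further comprises a template-dependent second DNA polymerase, which is complexed with the second Cas protein covalently or non-covalently.
In certain embodiments, the second DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases.
In certain embodiments, the second DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the second DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus or myeloblastosis-associated virus (MAV).
In certain embodiments, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.
In certain embodiments, the second DNA polymerase is the same as or different from the first DNA polymerase. In certain embodiments, the second DNA polymerase is the same as the first DNA polymerase.
In certain embodiments, the second Cas protein is covalently linked to the second DNA polymerase, with or without a linker.
In certain embodiments, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35.
In certain embodiments, the second Cas protein is fused to the second DNA polymerase, with or without a peptide linker, to form a second fusion protein.
In certain embodiments, the second Cas protein is linked or fused, optionally via a linker, to the N-terminus of the second DNA polymerase; alternatively, the second Cas protein is linked or fused, optionally via a linker, to the C-terminus of the second DNA polymerase.
In certain embodiments, the second fusion protein has the amino acid sequence shown in SEQ ID NO:8.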
在某些实施方案中,所述复合物还包含与所述双链靶核酸杂交或退火的第二标签引物;其中,所述第二标签引物含有第二靶结合序列,其能够与所述双链靶核酸杂交或退火。
在某些实施方案中,所述标签引物含有第二标签序列和第二靶结合序列,所述第二标签序列位于所述第二靶结合序列的上游或5’端;并且,在允许核酸杂交或退火的条件下,所述第二靶结合序列能够杂交或退火至所述双链靶核酸。在某些实施方案中,所述第二靶结合序列能够杂交或退火至所述双链靶核酸被所述第二功能性单元断裂的核酸链的3’端,形成双链结构。在某些实施方案中,所述3’端是因所述第二功能性单元断裂所述双链靶核酸的一条链而形成的。在某些实施方案中,所述第二标签序列不与所述断裂的核酸链结合,处于游离的单链状态。
在某些实施方案中,所述第二靶结合序列的长度为至少5nt,例如5-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt,或者更长。
在某些实施方案中,所述第二靶结合序列与所述第一靶结合序列不同。在某些实施方案中,所述第二靶结合序列结合的核酸链与所述第一靶结合序列结合的核酸链是不同的。在某些实施方案中,所述第二靶结合序列结合的核酸链是所述第一靶结合序列结合的核酸链的相对链。
在某些实施方案中,所述第二标签序列的长度为至少4nt,例如4-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt,或者更长。
在某些实施方案中,所述第二标签序列与所述第一标签序列相同或不同。在某些实施方案中,所述第二标签序列与所述第一标签序列不同。
在某些实施方案中,所述第二标签引物通过所述第二靶结合序列结合至所述断裂的核酸链。在某些实施方案中,所述第二DNA聚合酶与所述断裂的核酸链和所述第二标签引物结合。
在某些实施方案中,所述第二标签引物为单链脱氧核糖核酸或者单链核糖核酸。
在某些实施方案中,所述第二标签引物为单链核糖核酸,并且所述第二DNA聚合酶为依赖于RNA的DNA聚合酶;或者,所述第二标签引物为单链脱氧核糖核酸,并且所述第二DNA聚合酶为依赖于DNA的DNA聚合酶。
在某些实施方案中,所述断裂的靶核酸片段被所述第二DNA聚合酶以所述第二标签引物为模板延伸,形成第二瓣突。
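The extension step can be illustrated with a small sequence model. This is a minimal sketch, assuming all sequences are written 5'->3', that the tag primer is laid out as 5'-[tag sequence][target-binding sequence]-3' as described above, and that the polymerase copies the whole tag sequence (real reverse transcription may stop early). The function `extend_with_flap` is a hypothetical illustration, not part of the application.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA; RNA input is accepted (U pairs with A)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extend_with_flap(nicked_strand: str, tag_seq: str, target_binding: str) -> Tuple[str, str]:
    """Extend the 3' end of a nicked strand by copying the tag sequence.

    nicked_strand: broken strand up to its new 3' end, written 5'->3'.
    The tag primer is 5'-[tag_seq][target_binding]-3'. Returns (extended strand, flap).
    """
    k = len(target_binding)
    # The target-binding sequence must pair antiparallel with the last k nt of the strand.
    if revcomp(target_binding) != nicked_strand.upper()[-k:]:
        raise ValueError("target-binding sequence does not anneal to the 3' end")
    # Copying the single-stranded tag sequence yields its reverse complement as new DNA.
    flap = revcomp(tag_seq)
    return nicked_strand.upper() + flap, flap


extended, flap = extend_with_flap("AAAATTGCA", tag_seq="GGAUC", target_binding="UGCAA")
print(extended, flap)  # AAAATTGCAGATCC GATCC
```

Because the flap is the reverse complement of the tag sequence, choosing the tag sequence is equivalent to choosing the flap.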
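In some embodiments, the strand bound by the second gRNA differs from the strand bound by the second tag primer. In some embodiments, the strand bound by the second gRNA is the opposite strand of the strand bound by the second tag primer.
In some embodiments, the second tag primer is linked to the second gRNA.
In some embodiments, the second tag primer is covalently linked to the second gRNA, with or without a linker.
In some embodiments, the second tag primer is linked, optionally via a linker, to the 3' end of the second gRNA.
In some embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In some embodiments, the second tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the second gRNA, with or without a ribonucleic acid linker, to form a second PegRNA.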
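The PegRNA layout described above (guide sequence 5' of the scaffold, tag primer joined to the gRNA 3' end, tag sequence 5' of the target-binding sequence) can be sketched as a simple concatenation. The minimum lengths below are the lower bounds stated in this section (guide 5 nt, scaffold 20 nt, tag 4 nt, target-binding 5 nt); the function name `assemble_pegrna` and the poly-G scaffold used in the example are hypothetical placeholders, not a functional Cas9 scaffold.

```python
MIN_LENGTHS = {"guide": 5, "scaffold": 20, "tag": 4, "target_binding": 5}


def to_rna(seq: str) -> str:
    """Write a DNA-style sequence as RNA."""
    return seq.upper().replace("T", "U")


def assemble_pegrna(guide: str, scaffold: str, tag: str, target_binding: str, linker: str = "") -> str:
    """Return a PegRNA 5'->3': guide, scaffold, optional linker, tag sequence, target-binding sequence."""
    parts = {"guide": guide, "scaffold": scaffold, "tag": tag, "target_binding": target_binding}
    for name, seq in parts.items():
        if len(seq) < MIN_LENGTHS[name]:
            raise ValueError(f"{name} is shorter than {MIN_LENGTHS[name]} nt")
    return to_rna(guide + scaffold + linker + tag + target_binding)


# The poly-G scaffold is a placeholder only; a real scaffold sequence would replace it.
peg = assemble_pegrna("ACGTACGTAC", "G" * 20, "TTTT", "CCCCC")
print(peg)
```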
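In some embodiments, the first and second functional units bind the double-stranded target nucleic acid in a predetermined positional relationship.
In some embodiments, the second guide sequence binds the same strand as the first target-binding sequence; and/or the first guide sequence binds the same strand as the second target-binding sequence.
In some embodiments, the binding position of the second guide sequence is upstream (5') of the binding position of the first target-binding sequence; and/or the binding position of the first guide sequence is upstream (5') of the binding position of the second target-binding sequence.
In some embodiments, the binding position of the second guide sequence is downstream (3') of the binding position of the first target-binding sequence; and/or the binding position of the first guide sequence is downstream (3') of the binding position of the second target-binding sequence.
In some embodiments, the double-stranded target nucleic acid is selected from, without limitation, genomic DNA and nucleic acid vector DNA.
In a tenth aspect, the present application provides a nucleic acid vector (e.g., a donor nucleic acid vector) comprising the first PAM sequence recognized by the first Cas protein as described above.
In some embodiments, the nucleic acid vector further comprises a donor homology arm.
In some embodiments, the nucleic acid vector is double-stranded.
In some embodiments, the nucleic acid vector is a circular double-stranded vector.
In some embodiments, the nucleic acid vector comprises a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., the complement of the first guide sequence).
In some embodiments, the first functional complex is capable of breaking one strand of the nucleic acid vector via the first guide-binding sequence and the first PAM sequence.
In some embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In some embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific genomic site.
In some embodiments, the first PAM sequence and the donor homology arm are located on either side of the nucleic acid sequence of interest.
In some embodiments, the first guide-binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In some embodiments, the first functional complex breaks the first strand of the nucleic acid vector, the first strand containing the nick generated by the break; the double-stranded portion between the 3' end at the nick and the donor homology arm contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.
In some embodiments, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing, via the first target-binding sequence, to the 3' end of the strand broken by the first functional complex, forming a double-stranded structure, with the first tag sequence of the first tag primer remaining free. In some embodiments, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the first guide-binding sequence.
In some embodiments, the nucleic acid vector further comprises a first target sequence, wherein, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target-binding sequence, forming a double-stranded structure, with the first tag sequence of the first tag primer remaining free. In some embodiments, the first target sequence is located on the strand opposite the first guide-binding sequence. In some embodiments, the first target sequence is located at the end of the broken first strand. In some embodiments, after the first functional complex breaks the first strand, the 3' end of the strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (preferably forming a first flap).
In some embodiments, the nucleic acid vector further comprises a restriction site between the first target sequence and the donor homology arm.
In some embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arm.
In an eleventh aspect, the present application provides a kit comprising the nucleic acid vector of the tenth aspect and one or more components of the system or kit of the first aspect (e.g., the first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein; the template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase; the first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; and the first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer).
In some embodiments, the kit comprises the following four components:
(a) the first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein;
(b) the template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
(c) the first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; and
(d) the first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer.
In some embodiments, the four components are contained in one or more (e.g., 2, 3, or 4) vectors.
In some embodiments, the kit comprises the following vectors:
(a) the nucleic acid vector described above;
(b) a first vector comprising nucleic acid molecule A1, containing a nucleotide sequence encoding the first Cas protein, and nucleic acid molecule B1, containing a nucleotide sequence encoding the first DNA polymerase;
(c) a second vector comprising nucleic acid molecule C1, containing a nucleotide sequence encoding the first gRNA, and nucleic acid molecule D1, containing a nucleotide sequence encoding the first tag primer.
Optionally, the kit further comprises one or more components of the third nucleic acid editing system described in the system or kit above (e.g., (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).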
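In a twelfth aspect, the present application provides a nucleic acid vector (e.g., a donor nucleic acid vector) comprising the first PAM sequence recognized by the first Cas protein as described above.
In some embodiments, the nucleic acid vector further comprises the second PAM sequence recognized by the second Cas protein as described above.
In some embodiments, the nucleic acid vector is double-stranded.
In some embodiments, the nucleic acid vector is a circular double-stranded vector.
In some embodiments, the nucleic acid vector comprises a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., the complement of the first guide sequence), and/or a second guide-binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., the complement of the second guide sequence). Optionally, the nucleic acid vector further comprises a restriction site between the first guide-binding sequence and the second guide-binding sequence.
In some embodiments, the first guide-binding sequence and the second guide-binding sequence are located on opposite strands of the nucleic acid vector.
In some embodiments, the first functional complex is capable of breaking one strand (the first strand) of the nucleic acid vector via the first guide-binding sequence and the first PAM sequence; and/or the second functional complex is capable of breaking the other strand (the second strand) of the nucleic acid vector via the second guide-binding sequence and the second PAM sequence.
In some embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In some embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific genomic site.
In some embodiments, the first PAM sequence and the second PAM sequence are located on either side of the nucleic acid sequence of interest.
In some embodiments, the first guide-binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In some embodiments, the second guide-binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence.
In some embodiments, the first functional complex and the second functional complex break the first strand and the second strand of the nucleic acid vector, respectively, each strand containing a nick generated by the break; the double-stranded portion between the 3' ends at the two nicks contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest.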
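The two-nick arrangement above can be checked with a coordinate sketch. It assumes, as in the earlier nicking sketch, an spCas9(H840A)-like rule (NGG PAM, nick 3 nt 5' of the PAM on the strand carrying the protospacer); one protospacer lies on the top strand and the other on the bottom strand. The helpers `nick_index` and `excise_f1` are hypothetical, and the toy donor is far shorter than a real vector. The function reports where the double-stranded portion between the two new 3' ends lies, in top-strand coordinates.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def nick_index(strand: str, protospacer: str) -> int:
    """Nick position on `strand` (5'->3'), assuming an NGG PAM and a cut 3 nt 5' of it."""
    start = strand.find(protospacer.upper())
    end = start + len(protospacer)
    if start < 0 or strand[end + 1:end + 3] != "GG":
        raise ValueError("protospacer followed by an NGG PAM not found")
    return end - 3


def excise_f1(top: str, proto_top: str, proto_bottom: str):
    """Locate the duplex between the 3' ends of a top-strand nick and a bottom-strand nick.

    Returns (q, p, top-strand sequence), where the region spans top[q:p].
    """
    top = top.upper()
    n = len(top)
    p = nick_index(top, proto_top)               # top strand: new 3' end is top[p - 1]
    nb = nick_index(revcomp(top), proto_bottom)  # same rule applied to the bottom strand
    q = n - nb                                   # bottom strand's new 3' end pairs with top[q]
    if q >= p:
        raise ValueError("the two 3' ends do not flank a shared duplex region")
    return q, p, top[q:p]


print(excise_f1("CCTATGGTCAACCCCCGATCAAGTTGG", "GATCAAGT", "TTGACCAT"))
# (6, 21, 'GTCAACCCCCGATCA')
```

In this toy layout the two PAMs sit on either side of the excised region, matching the arrangement described for the twelfth aspect; the sketch does not model strand displacement or release of the fragment.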
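In some embodiments, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing, via the first target-binding sequence, to the 3' end of the strand broken by the first functional complex, forming a double-stranded structure, with the first tag sequence of the first tag primer remaining free. In some embodiments, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the first guide-binding sequence.
In some embodiments, under conditions permitting nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing, via the second target-binding sequence, to the 3' end of the strand broken by the second functional complex, forming a double-stranded structure, with the second tag sequence of the second tag primer remaining free. In some embodiments, the strand to which the second target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the second guide-binding sequence.
In some embodiments, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand to which the second target-binding sequence hybridizes or anneals.
In some embodiments, the nucleic acid vector further comprises a first target sequence, wherein, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target-binding sequence, forming a double-stranded structure, with the first tag sequence of the first tag primer remaining free. In some embodiments, the first target sequence is located on the strand opposite the first guide-binding sequence. In some embodiments, the first target sequence is located at the end of the broken first strand. In some embodiments, after the first functional complex breaks the first strand, the 3' end of the strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (preferably forming a first flap).
and/or,
the nucleic acid vector further comprises a second target sequence, wherein, under conditions permitting nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing to the second target sequence via the second target-binding sequence, forming a double-stranded structure, with the second tag sequence of the second tag primer remaining free. In some embodiments, the second target sequence is located on the strand opposite the second guide-binding sequence. In some embodiments, the second target sequence is located at the end of the broken second strand. In some embodiments, after the second functional complex breaks the second strand, the 3' end of the strand containing the second target sequence can be extended using the second tag primer annealed to the second target sequence as a template (preferably forming a second flap).
In some embodiments, the strand containing the first target sequence is the opposite strand of the strand containing the second target sequence.
In some embodiments, the nucleic acid vector further comprises a restriction site between the first target sequence and the second target sequence.
In some embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.
In a thirteenth aspect, the present application provides a kit comprising: the nucleic acid vector of the twelfth aspect; one or more components of the system or kit of the first aspect (e.g., the first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein, the template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase, the first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA, and the first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer); and one or more further components of the system or kit of the first aspect (e.g., the second gRNA or a nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA, the second Cas protein or a nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein, the second tag primer or a nucleic acid molecule D2 containing a nucleotide sequence encoding the second tag primer, and the second DNA polymerase or a nucleic acid molecule B2 containing a nucleotide sequence encoding the second DNA polymerase).
In some embodiments, the kit comprises the following eight components:
(a) the first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein;
(b) the template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
(c) the first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA;
(d) the first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer;
(e) the second gRNA or a nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA;
(f) the second Cas protein or a nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein;
(g) the second tag primer or a nucleic acid molecule D2 containing a nucleotide sequence encoding the second tag primer; and
(h) the second DNA polymerase or a nucleic acid molecule B2 containing a nucleotide sequence encoding the second DNA polymerase.
In some embodiments, the eight components are contained in one or more (e.g., 2, 3, 4, 5, 6, 7, or 8) vectors.
In some embodiments, the kit comprises the following vectors:
(a) the nucleic acid vector described above;
(b) a first vector comprising nucleic acid molecule A1, containing a nucleotide sequence encoding the first Cas protein, and nucleic acid molecule B1, containing a nucleotide sequence encoding the first DNA polymerase;
(c) a second vector comprising nucleic acid molecule C1, containing a nucleotide sequence encoding the first gRNA, and nucleic acid molecule D1, containing a nucleotide sequence encoding the first tag primer;
(d) a third vector comprising nucleic acid molecule C2, containing a nucleotide sequence encoding the second gRNA, and nucleic acid molecule A2, containing a nucleotide sequence encoding the second Cas protein; and
(e) a fourth vector comprising nucleic acid molecule D2, containing a nucleotide sequence encoding the second tag primer, and nucleic acid molecule B2, containing a nucleotide sequence encoding the second DNA polymerase.
Optionally, the kit further comprises one or more components of the third nucleic acid editing system described in the system or kit above (e.g., (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).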
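In a seventh aspect, the present application provides a method for breaking one strand of a double-stranded target nucleic acid and adding a flap to the 3' end at the nick, the method comprising using the system or kit described above.
In some embodiments, the method comprises the following steps:
i. providing a double-stranded target nucleic acid; and
providing the first Cas protein, first gRNA, first DNA polymerase, and first tag primer;
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, and first tag primer.
In some embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, and the first functional complex breaks one strand of the double-stranded target nucleic acid; and
the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of the broken strand; and
the first DNA polymerase extends the broken strand using the first tag primer annealed to it as a template, forming a first flap.
In some embodiments, the method is performed in a cell.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer in the cell.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, and D1 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer in the cell.
In some embodiments, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In some embodiments, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell.
In some embodiments, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer. In some embodiments, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer.
In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, together with a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, is delivered into a cell and transcribed and expressed therein, thereby providing the first Cas protein, first gRNA, first DNA polymerase, and first tag primer in the cell.
In some embodiments, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing it, is delivered into a cell to provide the double-stranded target nucleic acid in the cell.
In some embodiments, the first Cas protein, first gRNA, first DNA polymerase, or first tag primer is as defined above.
In some embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein. In some embodiments, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one of its strands.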
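In an eighth aspect, the present application provides a method for breaking each of the two strands of a double-stranded target nucleic acid and adding a flap to the 3' end at each of the two resulting nicks, the method comprising using the system or kit described above, wherein the first double-stranded target nucleic acid and the second double-stranded target nucleic acid are the same. In some embodiments, the method breaks the two strands of the double-stranded target nucleic acid at different positions.
In some embodiments, the method is performed outside or inside a cell.
In some embodiments, the method comprises the following steps:
i. providing a double-stranded target nucleic acid; and
providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer;
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer.
In some embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; the first and second functional complexes break the first strand and the second strand of the double-stranded target nucleic acid, respectively, each strand containing a nick generated by the break, and the double-stranded portion between the 3' ends at the two nicks is referred to as target nucleic acid fragment F1; and
the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of one strand of target nucleic acid fragment F1 (i.e., the 3' end generated by the break); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3' end of the other strand of F1 (i.e., the 3' end generated by the break); and
the first DNA polymerase and the second DNA polymerase carry out extension reactions using the first tag primer and the second tag primer annealed to F1 as templates, respectively, so that the 3' ends generated by the breaks in the first and second strands are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion bearing the first and second flaps is referred to as target nucleic acid fragment F2.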
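Building on the single-flap extension model, the conversion of F1 into F2 can be sketched by giving each strand its own 3' flap. This is a simplified sketch, assuming F1's top strand ends (3') at its right edge and its bottom strand ends (3') at its left edge, that the first tag primer acts on the top strand and the second on the bottom strand, and that each flap is the reverse complement of the corresponding tag sequence. `make_f2` is a hypothetical name, and the toy tag sequences are arbitrary.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA; RNA input is accepted."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def make_f2(f1_top: str, tag1: str, tag2: str):
    """Return the two strands of F2 (each 5'->3') from F1 and the two tag sequences."""
    f1_top = f1_top.upper()
    top = f1_top + revcomp(tag1)               # first flap on the top-strand 3' end
    bottom = revcomp(f1_top) + revcomp(tag2)   # second flap on the bottom-strand 3' end
    return top, bottom


top, bottom = make_f2("GTCAACCCCCGATCA", tag1="GGAUC", tag2="AAACU")
print(top)     # GTCAACCCCCGATCAGATCC
print(bottom)  # TGATCGGGGGTTGACAGTTT
```

The duplex core is unchanged; only single-stranded 3' overhangs are added, which is what allows F2 to pair with the two break ends in the insertion and replacement methods.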
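In some embodiments, the method is performed in a cell.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer in the cell.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A2, nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, and D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer in the cell.
In some embodiments, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In some embodiments, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell.
In some embodiments, nucleic acid molecules A2 and B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In some embodiments, nucleic acid molecules A2 and B2 can express, in a cell, the second Cas protein and the second DNA polymerase as separate proteins, or a second fusion protein comprising the second Cas protein and the second DNA polymerase. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate second Cas protein and second DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the second fusion protein, is delivered into a cell and expressed therein to provide the second Cas protein and the second DNA polymerase in the cell.
In some embodiments, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer. In some embodiments, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer.
In some embodiments, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer. In some embodiments, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer.
In some embodiments, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing it, is delivered into a cell to provide the double-stranded target nucleic acid in the cell.
In some embodiments, the first Cas protein, first gRNA, first DNA polymerase, or first tag primer is as defined above.
In some embodiments, the second Cas protein, second gRNA, second DNA polymerase, or second tag primer is as defined above.
In some embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein. In some embodiments, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one of its strands; and the second functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA and breaks the other strand.
In some embodiments, the second Cas protein is the same as the first Cas protein, and the second DNA polymerase is the same as the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase carries out extension reactions using the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, so that the 3' ends generated by the breaks in the first and second strands are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion bearing the first and second flaps is referred to as target nucleic acid fragment F2.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer in the cell.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, C2, and D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer in the cell.
In some embodiments, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In some embodiments, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell.
In some embodiments, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer. In some embodiments, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer.
In some embodiments, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer. In some embodiments, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer.
In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, and a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA are delivered into a cell and transcribed and expressed therein, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer in the cell.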
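In a ninth aspect, the present application provides a method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest, the method comprising using the system or kit of the eleventh aspect, wherein the first double-stranded target nucleic acid and the second double-stranded target nucleic acid are the same and serve to provide the target nucleic acid fragment, which lies between the 3' end generated by the break in the first strand of the double-stranded target nucleic acid and the 3' end generated by the break in its second strand; and the third double-stranded target nucleic acid is the nucleic acid molecule of interest.
In some embodiments, the method comprises:
a. breaking, by the method described above, the first strand and the second strand of the first double-stranded target nucleic acid, each strand containing a nick generated by the break, the double-stranded portion between the 3' ends at the two nicks being referred to as target nucleic acid fragment F1; and adding a first flap and a second flap to the two 3' ends, respectively, to form a double-stranded portion bearing the first and second flaps, referred to as target nucleic acid fragment F2;
b. breaking the nucleic acid molecule of interest with the third nucleic acid editing system to form broken nucleotide fragments a1 and a2; and
c. joining nucleotide fragments a1 and a2 with target nucleic acid fragment F2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.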
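Steps a to c can be illustrated with a top-strand-only sequence model. This sketch assumes a blunt double-strand break at index `cut`, flaps that pair exactly with the break ends (no spacer region), and an orientation in which the first flap sits on F2's bottom strand (pairing with the 3' end of a1's top strand) while the second flap sits on F2's top strand (pairing with the 3' end of a2's bottom strand). The helpers `flaps_for_blunt_cut` and `insert_f2` are hypothetical, and the model ignores the cellular steps (fill-in, ligation) that complete the junctions.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flaps_for_blunt_cut(genome_top: str, cut: int, flap_len: int):
    """Flaps (DNA, 5'->3') matching the two ends of a blunt break at `cut`."""
    g = genome_top.upper()
    flap1 = revcomp(g[cut - flap_len:cut])  # pairs with the 3' end of a1's top strand
    flap2 = g[cut:cut + flap_len]           # pairs with the 3' end of a2's bottom strand
    return flap1, flap2


def insert_f2(genome_top: str, cut: int, insert_top: str, flap1: str, flap2: str) -> str:
    """Top strand after F2 bridges a1 and a2 through its two flaps."""
    a1, a2 = genome_top[:cut].upper(), genome_top[cut:].upper()
    if not a1.endswith(revcomp(flap1)):
        raise ValueError("first flap does not pair with a1")
    if not revcomp(a2).endswith(revcomp(flap2)):
        raise ValueError("second flap does not pair with a2")
    return a1 + insert_top.upper() + a2


genome = "ATGCATTACGGA"
flap1, flap2 = flaps_for_blunt_cut(genome, cut=6, flap_len=3)
print(flap1, flap2)                                # ATG TAC
print(insert_f2(genome, 6, "GGGG", flap1, flap2))  # ATGCATGGGGTACGGA
```

Under the extension model sketched earlier, the tag sequences to encode would be the reverse complements of these flaps (here `CAT` and `GTA`, written as DNA).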
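In some embodiments, the method is performed outside or inside a cell.
In some embodiments, when the nucleic acid molecule of interest is a genomic sequence present in a cell, step a is performed outside or inside the cell, and steps b and c are performed inside the cell.
In some embodiments, the method comprises the following steps:
i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and
providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system;
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system.
In some embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and
the first and second functional complexes break the first strand and the second strand of the double-stranded target nucleic acid, respectively, each strand containing a nick generated by the break, the double-stranded portion between the 3' ends at the two nicks being referred to as target nucleic acid fragment F1; and the third nucleic acid editing system breaks the nucleic acid molecule of interest to form broken nucleotide fragments a1 and a2; and
the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of one strand of target nucleic acid fragment F1 (i.e., the 3' end generated by the break); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3' end of the other strand of F1 (i.e., the 3' end generated by the break); and
the first DNA polymerase and the second DNA polymerase carry out extension reactions using the first tag primer and the second tag primer annealed to F1 as templates, respectively, so that the 3' ends generated by the breaks in the first and second strands are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion bearing the first and second flaps is referred to as target nucleic acid fragment F2; wherein the first flap and the second flap are capable of hybridizing or annealing to broken nucleotide fragments a1 and a2, respectively; and
target nucleic acid fragment F2 hybridizes or anneals to nucleotide fragments a1 and a2 via the first flap and the second flap, respectively, and is thereby inserted or joined between a1 and a2, so that the target nucleic acid fragment is inserted into the nucleic acid molecule of interest.
In some embodiments, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a1, the 3' end or 3' portion having been formed when the third nucleic acid editing system broke the nucleic acid molecule of interest.
In some embodiments, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' portion of one strand of broken nucleotide fragment a1, and a first spacer region lies between that 3' portion of a1 and the break end formed in the third double-stranded target nucleic acid.
In some embodiments, the first spacer region is 1-200 nt in length, e.g., 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, or 100-200 nt.
In some embodiments, the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a2, the 3' end or 3' portion having been formed when the third nucleic acid editing system broke the nucleic acid molecule of interest.
In some embodiments, the complement of the second tag sequence, or the second flap, is capable of hybridizing or annealing to the 3' portion of one strand of broken nucleotide fragment a2, and a second spacer region lies between that 3' portion of a2 and the break end formed in the third double-stranded target nucleic acid.
In some embodiments, the second spacer region is 1-200 nt in length, e.g., 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, or 100-200 nt.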
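The spacer regions defined above can be measured directly from sequence. This is a minimal sketch, assuming the broken strand of a1 (or a2) is written 5'->3' ending at the break, and that the flap pairs antiparallel with a fully complementary site; `spacer_length` and `classify` are hypothetical helpers. The 1-200 nt window is the range stated in this section; a spacer of 0 corresponds to the flap pairing with the very 3' end.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def spacer_length(broken_strand: str, flap: str) -> int:
    """Number of nt between the flap's pairing site and the 3' break end of `broken_strand`.

    The flap pairs antiparallel, so its site on the strand reads revcomp(flap);
    rfind picks the site closest to the break end.
    """
    site = revcomp(flap)
    idx = broken_strand.upper().rfind(site)
    if idx < 0:
        raise ValueError("flap has no complementary site on this strand")
    return len(broken_strand) - (idx + len(site))


def classify(spacer: int) -> str:
    if spacer == 0:
        return "3' end"
    if 1 <= spacer <= 200:
        return "3' portion with spacer region"
    return "outside the 1-200 nt range described"


gap = spacer_length("GATTACAGGCTTAA", "GCCTG")
print(gap, classify(gap))  # 4 3' portion with spacer region
```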
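In some embodiments, the method is performed in a cell.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A2, nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and nucleic acid molecule A3 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, and A3 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system in the cell.
In some embodiments, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing it, is delivered into a cell to provide the double-stranded target nucleic acid in the cell.
In some embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein. In some embodiments, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one of its strands; and the second functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA and breaks the other strand.
In some embodiments, the nucleic acid molecule of interest is the genomic DNA of the cell.
In some embodiments, the first Cas protein, first gRNA, first DNA polymerase, or first tag primer is as defined above.
In some embodiments, the second Cas protein, second gRNA, second DNA polymerase, or second tag primer is as defined above.
In some embodiments, the third nucleic acid editing system is as defined above.
In some embodiments, the third nucleic acid editing system is as defined above, and the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein. In some embodiments, in step ii, the third functional complex binds the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it.
In some embodiments, the first and second Cas proteins are the same and are selected from Cas proteins that cleave a single DNA strand, and the second DNA polymerase is the same as the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase carries out extension reactions using the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, forming target nucleic acid fragment F2 bearing the first and second flaps.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, C2, D2, and A3 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system in the cell.
In some embodiments, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In some embodiments, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell.
In some embodiments, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer. In some embodiments, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer.
In some embodiments, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer. In some embodiments, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer.
In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, and a nucleic acid molecule containing a sequence encoding the third nucleic acid editing system are delivered into a cell and transcribed and expressed therein, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system in the cell.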
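In a fourteenth aspect, the present application provides a method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest, the method comprising using the system or kit of the thirteenth aspect, wherein the first double-stranded target nucleic acid serves to provide the target nucleic acid fragment, which lies between the 3' end generated by the break in the first strand of the double-stranded target nucleic acid and the donor homology arm; and the third double-stranded target nucleic acid is the nucleic acid molecule of interest;
optionally, the first double-stranded target nucleic acid and the second double-stranded target nucleic acid are the same and are contained in the nucleic acid vector of the twelfth aspect.
In some embodiments, the method comprises:
a. breaking, by the method described above, the first strand of the first double-stranded target nucleic acid, the first strand containing a nick generated by the break, the first-strand portion between the 3' end at the nick and the donor homology arm being referred to as target nucleic acid strand S1; and adding a first flap to that 3' end to form a first-strand portion bearing the first flap, referred to as target nucleic acid strand S2;
b. breaking the nucleic acid molecule of interest with the third nucleic acid editing system to form broken nucleotide fragments a1 and a2; and
c. hybridizing or annealing target nucleic acid strand S2, via the first flap, to the first strand of nucleotide fragment a1; carrying out an extension reaction using S2 as a template to form an extension strand E1, which contains the complement of S2 and the complement of the donor homology arm flanking S2; and joining E1 to a2 via the donor homology arm, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.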
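The one-flap, one-arm route of steps a to c (the h-PAINT arrangement of Figure 7) can be sketched in the same top-strand model used for the two-flap insertion. Assumptions: a blunt break at `cut`; the first flap pairs exactly with the 3' end of a1's first strand; E1 is written as a1's strand extended across the insert and then across the donor homology arm; and E1 is joined to a2 where the arm matches the start of a2. `h_paint` is a hypothetical helper, and real homology arms are hundreds of bp long (500-1500 bp in Figure 7, 800 bp in Figure 8), not 4 nt.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def h_paint(genome_top: str, cut: int, insert_top: str, flap1: str, donor_arm: str) -> str:
    """Top strand after one-flap, one-homology-arm integration (simplified)."""
    a1, a2 = genome_top[:cut].upper(), genome_top[cut:].upper()
    arm = donor_arm.upper()
    if not a1.endswith(revcomp(flap1)):
        raise ValueError("first flap does not pair with a1")
    e1 = a1 + insert_top.upper() + arm  # extension strand E1: copies the insert, then the arm
    if not a2.startswith(arm):
        raise ValueError("donor homology arm does not match a2")
    return e1 + a2[len(arm):]           # E1 joined to a2 through the shared arm sequence


print(h_paint("ATGCATTACGGA", 6, "GGGG", flap1="ATG", donor_arm="TACG"))  # ATGCATGGGGTACGGA
```

The product matches the two-flap model, which is the point of the comparison: one junction is defined by the flap and the other by the homology arm.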
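In some embodiments, the 3' end of the first strand of nucleotide fragment a1 contains the complement of the first flap, and the 3' end of the second strand of nucleotide fragment a1 contains the sequence of the first flap.
In some embodiments, the break end of nucleotide fragment a2 contains a target-site homology arm.
In some embodiments, the method is performed outside or inside a cell.
In some embodiments, when the nucleic acid molecule of interest is genomic DNA present in a cell, step a is performed outside or inside the cell, and steps b and c are performed inside the cell.
In some embodiments, the method comprises the following steps:
i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest, the double-stranded target nucleic acid comprising a donor homology arm, a first PAM sequence recognized by the first Cas protein, and a sequence recognized by the first gRNA (preferably, the double-stranded target nucleic acid comprises a donor homology arm, a first PAM sequence recognized by the first Cas protein, and a sequence recognized by the first guide sequence contained in the first gRNA); and
providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system;
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, and first tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system.
In some embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex; and
the first functional complex breaks the first strand of the double-stranded target nucleic acid, the first strand containing a nick generated by the break, the first-strand portion between the 3' end at the nick and the donor homology arm being referred to as target nucleic acid strand S1; and the third nucleic acid editing system breaks the nucleic acid molecule of interest to form broken nucleotide fragments a1 and a2; and
the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of target nucleic acid strand S1 (i.e., the 3' end generated by the break); and
the first DNA polymerase carries out an extension reaction using the first tag primer annealed to S1 as a template, so that the 3' end generated by the break in the first strand is extended to form a first flap, yielding a first-strand portion bearing the first flap, referred to as target nucleic acid strand S2; wherein the first flap is capable of hybridizing or annealing to broken nucleotide fragment a1; and
target nucleic acid strand S2 hybridizes or anneals, via the first flap, to the first strand of nucleotide fragment a1, whereby S2 is joined between the second strand of nucleotide fragment a1 and the second strand of nucleotide fragment a2;
an extension reaction is carried out at the 3' end of the first strand of nucleotide fragment a1 using S2 as a template to form extension strand E1, which contains the complement of S2 and the complement of the donor homology arm flanking S2; E1 anneals to the first strand of a2 via the donor homology arm, whereby E1 is joined between the first strand of nucleotide fragment a1 and the first strand of nucleotide fragment a2, forming a double-stranded structure and thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
In some embodiments, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a1, the 3' end or 3' portion having been formed when the third nucleic acid editing system broke the nucleic acid molecule of interest.
In some embodiments, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' portion of one strand of broken nucleotide fragment a1, and a first spacer region lies between that 3' portion of a1 and the break end formed in the third double-stranded target nucleic acid.
In some embodiments, the first spacer region is 1-200 nt in length, e.g., 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, or 100-200 nt.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system in the cell;
alternatively, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are contacted with the double-stranded target nucleic acid outside the cell, and the edited double-stranded target nucleic acid is then delivered into the cell together with the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, to provide in the cell the double-stranded target nucleic acid bearing the first flap and the third nucleic acid editing system.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, and nucleic acid molecule A3 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, and A3 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system in the cell.
In some embodiments, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing it, is delivered into a cell to provide the double-stranded target nucleic acid in the cell.
In some embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a donor homology arm. In some embodiments, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one of its strands.
In some embodiments, the nucleic acid molecule of interest is the genomic DNA of the cell.
In some embodiments, the first Cas protein, first gRNA, first DNA polymerase, or first tag primer is as defined above.
In some embodiments, the third nucleic acid editing system is as defined above.
In some embodiments, the third nucleic acid editing system is as defined above, and the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein. In some embodiments, in step ii, the third functional complex binds the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it.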
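In a tenth aspect, the present application provides a method for replacing a nucleotide fragment in a nucleic acid molecule of interest with a target nucleic acid fragment, the method comprising using the system or kit described above, wherein the first double-stranded target nucleic acid and the second double-stranded target nucleic acid are the same and serve to provide the target nucleic acid fragment, which lies between the nick generated by the break in the first strand of the double-stranded target nucleic acid and the nick generated by the break in its second strand; and the third double-stranded target nucleic acid and the fourth double-stranded target nucleic acid are the same and are the nucleic acid molecule of interest.
In some embodiments, the method comprises:
a. breaking, by the method described above, the first strand and the second strand of the first double-stranded target nucleic acid, each strand containing a nick generated by the break, the double-stranded portion between the 3' ends at the two nicks being referred to as target nucleic acid fragment F1; and adding a first flap and a second flap to the two 3' ends, respectively, to form a double-stranded portion bearing the first and second flaps, referred to as target nucleic acid fragment F2;
b. breaking the nucleic acid molecule of interest with the third and fourth nucleic acid editing systems to form broken nucleotide fragments a1, a2, and a3, wherein, before breaking, nucleotide fragments a1, a2, and a3 are arranged in that order in the nucleic acid molecule of interest (i.e., nucleotide fragment a1 is connected to nucleotide fragment a3 through nucleotide fragment a2); and
c. joining nucleotide fragments a1 and a3 with target nucleic acid fragment F2, thereby replacing nucleotide fragment a2 in the nucleic acid molecule of interest with the target nucleic acid fragment.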
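The replacement variant differs from insertion only in that two breaks remove a2 and F2 bridges a1 and a3. The sketch below reuses the assumptions of the insertion model (blunt breaks, flaps pairing exactly with the break ends, first flap pairing with a1's top-strand 3' end, second flap pairing with a3's bottom-strand 3' end); `replace_with_f2` is a hypothetical helper operating on the top strand only.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def replace_with_f2(genome_top: str, cut1: int, cut2: int, insert_top: str, flap1: str, flap2: str):
    """Return (edited top strand, removed a2) when F2 bridges a1 and a3 via its two flaps."""
    if not 0 < cut1 < cut2 < len(genome_top):
        raise ValueError("need 0 < cut1 < cut2 < len(genome_top)")
    g = genome_top.upper()
    a1, a2, a3 = g[:cut1], g[cut1:cut2], g[cut2:]
    if not a1.endswith(revcomp(flap1)):
        raise ValueError("first flap does not pair with a1")
    if not revcomp(a3).endswith(revcomp(flap2)):
        raise ValueError("second flap does not pair with a3")
    return a1 + insert_top.upper() + a3, a2


edited, removed = replace_with_f2("ATGCATCCCCTACGGA", 6, 10, "GG", "ATG", "TAC")
print(edited, removed)  # ATGCATGGTACGGA CCCC
```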
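In some embodiments, the method comprises the following steps:
i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and
providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system;
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer, and contacting the nucleic acid molecule of interest with the third and fourth nucleic acid editing systems.
In some embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and
the first and second functional complexes break the first strand and the second strand of the double-stranded target nucleic acid, respectively, each strand containing a nick generated by the break, the double-stranded portion between the 3' ends at the two nicks being referred to as target nucleic acid fragment F1; and the third and fourth nucleic acid editing systems break the nucleic acid molecule of interest to form broken nucleotide fragments a1, a2, and a3; and
the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of one strand of target nucleic acid fragment F1 (i.e., the 3' end generated by the break); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3' end of the other strand of F1 (i.e., the 3' end generated by the break); and
the first DNA polymerase and the second DNA polymerase carry out extension reactions using the first tag primer and the second tag primer annealed to F1 as templates, respectively, so that the 3' ends generated by the breaks in the first and second strands are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion bearing the first and second flaps is referred to as target nucleic acid fragment F2; wherein the first flap and the second flap are capable of hybridizing or annealing to broken nucleotide fragments a1 and a3, respectively; and
target nucleic acid fragment F2 hybridizes or anneals to nucleotide fragments a1 and a3 via the first flap and the second flap, respectively, and is thereby joined between a1 and a3, so that nucleotide fragment a2 in the nucleic acid molecule of interest is replaced with the target nucleic acid fragment.
In some embodiments, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a1, the 3' end or 3' portion having been formed when the third nucleic acid editing system broke the nucleic acid molecule of interest.
In some embodiments, the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a3, the 3' end or 3' portion having been formed when the fourth nucleic acid editing system broke the nucleic acid molecule of interest.
In some embodiments, the method is performed in a cell.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A2, nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, nucleic acid molecule A3, and nucleic acid molecule A4 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3, and A4 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system in the cell.
In some embodiments, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing it, is delivered into a cell to provide the double-stranded target nucleic acid in the cell.
In some embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein. In some embodiments, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one of its strands; and the second functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA and breaks the other strand.
In some embodiments, the nucleic acid molecule of interest is the genomic DNA of the cell.
In some embodiments, the first Cas protein, first gRNA, first DNA polymerase, or first tag primer is as defined above.
In some embodiments, the second Cas protein, second gRNA, second DNA polymerase, or second tag primer is as defined above.
In some embodiments, the third nucleic acid editing system is as defined above.
In some embodiments, the fourth nucleic acid editing system is as defined above.
In some embodiments, the third and fourth nucleic acid editing systems are as defined above, and the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein and a fourth PAM sequence recognized by a fourth Cas protein. In some embodiments, in step ii, the third functional complex binds the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it; and the fourth functional complex binds the nucleic acid molecule of interest via the fourth PAM sequence and the fourth gRNA and breaks it.
In some embodiments, the first and second Cas proteins are the same and are selected from Cas proteins that cleave a single DNA strand, and the second DNA polymerase is the same as the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase carries out extension reactions using the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, forming target nucleic acid fragment F2 bearing the first and second flaps.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and nucleic acid molecules A3 and A4 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system in the cell.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, C2, D2, A3, and A4 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system in the cell.
In some embodiments, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In some embodiments, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell.
In some embodiments, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer. In some embodiments, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer.
In some embodiments, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer. In some embodiments, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the third nucleic acid editing system, and a nucleic acid molecule containing a nucleotide sequence encoding the fourth nucleic acid editing system are delivered into a cell and transcribed and expressed therein, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system in the cell.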
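In some embodiments, the method comprises the following steps:
i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and
providing the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems, the third and fourth nucleic acid editing systems each being as defined above;
ii. contacting the double-stranded target nucleic acid with the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, and the first and second tag primers, and contacting the nucleic acid molecule of interest with the third and fourth nucleic acid editing systems.
In some embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, the second Cas protein and the second gRNA combine to form a second functional complex, the third Cas protein and the third gRNA combine to form a third functional complex, and the fourth Cas protein and the fourth gRNA combine to form a fourth functional complex; and
the first and second functional complexes break the first strand and the second strand of the double-stranded target nucleic acid, respectively, each strand containing a 3' end generated by the break, the double-stranded portion between the two 3' ends being referred to as target nucleic acid fragment F1; and the third and fourth functional complexes bind and break the nucleic acid molecule of interest to form broken nucleotide fragments a1, a2, and a3; and
the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of one strand of target nucleic acid fragment F1 (i.e., the 3' end generated by the break); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3' end of the other strand of F1 (i.e., the 3' end generated by the break); and
the first DNA polymerase and the second DNA polymerase carry out extension reactions using the first tag primer and the second tag primer annealed to F1 as templates, respectively, so that the 3' ends generated by the breaks in the first and second strands are extended to form a first flap and a second flap, respectively; the resulting double-stranded portion bearing the first and second flaps is referred to as target nucleic acid fragment F2; wherein the first flap and the second flap are capable of hybridizing or annealing to broken nucleotide fragments a1 and a3, respectively; and
the third tag primer hybridizes or anneals, via the third target-binding sequence, to the 3' end of one strand of nucleotide fragment a1, this 3' end having been formed when the third functional complex broke the nucleic acid molecule of interest; and the fourth tag primer hybridizes or anneals, via the fourth target-binding sequence, to the 3' end of one strand of nucleotide fragment a3, this 3' end having been formed when the fourth functional complex broke the nucleic acid molecule of interest.
In some embodiments, the method is performed in a cell.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into a cell to provide, in the cell, the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A2, nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, nucleic acid molecule A3, and nucleic acid molecule A4 are delivered into a cell to provide, in the cell, the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3, and A4 are delivered into a cell to provide, in the cell, the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems.
In some embodiments, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing it, is delivered into a cell to provide the double-stranded target nucleic acid in the cell.
In some embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein. In some embodiments, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one of its strands; and the second functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA and breaks the other strand.
In some embodiments, the nucleic acid molecule of interest contains a third PAM sequence recognized by the third Cas protein and a fourth PAM sequence recognized by the fourth Cas protein. In some embodiments, in step ii, the third functional complex binds the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it; and the fourth functional complex binds the nucleic acid molecule of interest via the fourth PAM sequence and the fourth gRNA and breaks it.
In some embodiments, the nucleic acid molecule of interest is the genomic DNA of the cell.
In some embodiments, the first Cas protein, first gRNA, first DNA polymerase, or first tag primer is as defined above.
In some embodiments, the second Cas protein, second gRNA, second DNA polymerase, or second tag primer is as defined above.
In some embodiments, the first and second Cas proteins are the same and are selected from Cas proteins that cleave a single DNA strand; the third and fourth Cas proteins are the same and are selected from Cas proteins that cleave both DNA strands; and the first, second, third, and fourth DNA polymerases are the same DNA polymerase; wherein the first Cas protein forms the first, second, third, and fourth functional complexes with the first, second, third, and fourth gRNAs, respectively; and the first DNA polymerase carries out extension reactions using the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, forming target nucleic acid fragment F2 bearing the first and second flaps.
In some embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into a cell to provide, in the cell, the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third and fourth nucleic acid editing systems.
In some embodiments, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and nucleic acid molecules A3 and A4 are delivered into a cell to provide, in the cell, the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third and fourth nucleic acid editing systems.
In some embodiments, in step i, nucleic acid molecules A1, B1, C1, D1, C2, D2, A3, and A4 are delivered into a cell to provide, in the cell, the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third and fourth nucleic acid editing systems.
In some embodiments, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In some embodiments, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase. In some embodiments, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell.
In some embodiments, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer. In some embodiments, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer.
In some embodiments, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In some embodiments, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer. In some embodiments, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell; alternatively, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer.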
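Beneficial Effects of the Invention
Compared with the prior art, the nucleic acid editing systems, kits, and methods provided herein can, while creating a double-strand break at a specific genomic site, break one strand of a double-stranded target nucleic acid (e.g., a donor vector containing a nucleic acid sequence of interest or another exogenous nucleic acid fragment) and form a flap at the 3' end of the nick (a homologous flap structure whose sequence is identical or complementary to the end sequence at the specific genomic break). On this basis, the systems, kits, and methods of the present application enable efficient and precise insertion and replacement of exogenous nucleic acids (in particular, large exogenous nucleic acid fragments) at specific genomic sites.
Specifically, at least the following effects are achieved: 1. the efficiency of site-specific integration of exogenous genes is greatly increased; 2. junction accuracy is improved, reducing mutations such as base deletions or insertions at the junctions; 3. integration of the exogenous gene at the specific genomic site is unidirectional; 4. compared with NHEJ and other integration schemes that rely on double-strand cleavage of the donor vector, the system of the present invention does not generate linearized DNA fragments, which improves the safety of site-specific integration. In addition, compared with HDR-mediated gene editing, the f-PAINT method of the present invention does not require the construction of homology arms, which can substantially increase the length of exogenous genes carried by viral vectors, in particular adeno-associated virus vectors.
The embodiments of the present invention are described in detail below with reference to the drawings and examples; however, those skilled in the art will understand that the following drawings and examples merely illustrate the present invention and do not limit its scope. Various objects and advantages of the present invention will become apparent to those skilled in the art from the following detailed description of the drawings and preferred embodiments.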
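Description of the Drawings
Figure 1 is a schematic diagram of the principle by which the method of the present invention (f-PAINT) mediates insertion of an exogenous gene into the genome. Figure 1a schematically shows the principles of site-specific genomic integration of exogenous genes using the HDR, NHEJ, and f-PAINT methods. Black double solid lines represent genomic sequence; gray double solid lines represent the donor vector backbone; yellow bars represent the exogenous gene; red and blue solid lines represent sequences on the left and right sides of the genomic integration site that are homologous to sequences on the donor vector (carried by the vector itself or generated by processing). Black triangles indicate the specific recognition and cleavage sites of the nuclease on genomic DNA; under the action of the nuclease, genomic DNA undergoes a double-strand break with blunt or sticky ends. Blue and purple triangles indicate the target recognition sequences of a pair of oppositely oriented PE-Cas proteins (first fusion protein); the PE-Cas protein recognizes and cleaves one strand of the donor vector to create a nick and, at the 3' end of the nick, generates a homologous flap sequence that is identical or complementary to the end sequence of the genomic break. The DNA fragment containing the exogenous gene lies between the two homologous flap structures. HDR-based methods use the homology arms on the donor vector and, through the cellular HDR repair machinery, achieve site-specific integration of the exogenous gene at the specific double-strand cleavage site in genomic DNA. NHEJ-based methods rely on the cell's own NHEJ repair machinery to ligate the exogenous gene fragment excised from the donor vector to the ends of the double-strand break at the specific genomic integration site, thereby achieving site-specific integration. The method of the present invention (f-PAINT) uses the homologous flap sequences generated on the donor vector, which pair complementarily with the ends of the genomic double-strand break; DNA replication using the donor vector as a template then occurs at the genomic break ends, and end rejoining and site-specific integration of the exogenous fragment are achieved through double-strand hybridization. Figure 1b shows how the homologous flap sequences are generated on the donor vector. Taking PE-spCas9 as an example, PE-spCas9/pegRNA recognizes and binds the target recognition sequences flanking the exogenous gene on the donor vector and cleaves the strand that is not paired with the pegRNA. The primer-binding sequence of the pegRNA then binds the end of the free single strand produced by cleavage, and, under the action of the reverse transcriptase, the homologous flap sequence is synthesized using the template sequence of the pegRNA (homologous to the genomic break end) as a template.
Figure 2 shows efficient and specific site-specific integration of an exogenous gene into the 3' UTR of the GAPDH gene in human 293T cells using the f-PAINT method. Figure 2a is a schematic of the workflow for site-specific knock-in of an exogenous gene (IRES-EGFP) into the 3' UTR of the GAPDH gene in the genome of human 293T cells using different methods (HDR, NHEJ, and f-PAINT). Black solid lines represent genomic sequence; blue and gray boxes represent the protein-coding and non-coding regions of exons, respectively; red and blue solid lines represent homologous sequences: long lines represent homology arm sequences on the HDR donor vector, and short lines represent homologous flap sequences generated on the f-PAINT donor vector. Figure 2b compares the efficiency of site-specific integration of the exogenous gene mediated by the different methods; f-PAINT-mediated integration efficiency was substantially higher than that of the HDR and NHEJ methods. Figure 2c shows PCR identification of correctly edited gene sequences and by-products generated by site-specific integration using NHEJ and f-PAINT; because f-PAINT does not generate linearized DNA fragments, it produces no by-products such as reverse-orientation insertion of the exogenous gene or insertion of the donor vector backbone at the specific genomic integration site. Figure 2d shows Sanger sequencing of the junctions of correctly edited sequences generated by NHEJ and f-PAINT; junctions from f-PAINT-mediated site-specific integration were more precise than those from NHEJ.
Figure 3 compares the site-specific integration efficiency of the HDR, NHEJ, HMEJ, and f-PAINT methods at the human AAVS1 locus and the mouse Rosa26 locus. Figure 3a shows the genomic integration efficiency (percentage of EGFP-positive cells) of the exogenous gene (CAG-EGFP) mediated by HDR, NHEJ, HMEJ, and f-PAINT without saCas9/sgRNA-targeted cleavage of the specific genomic site (AAVS1 or Rosa26). In this case, integration of the exogenous gene occurs at non-specific sites, also called random integration. Random integration of exogenous genes into the genome can cause insertional mutations and compromise genome stability. Figure 3b shows the genomic integration efficiency (percentage of EGFP-positive cells) of CAG-EGFP mediated by HDR, NHEJ, HMEJ, f-PAINT, and other methods when saCas9/sgRNA is used to cleave the specific genomic site (AAVS1 or Rosa26). In this case, integration of the exogenous gene is mainly site-specific.
Figure 4 compares the efficiency of site-specific integration of the exogenous gene CAG-EGFP in K562 cells, at gene-therapy-relevant safe-harbor loci and at genetic-disease-associated loci, mediated by HDR, HDR NT (HDR not targeting a specific site), f-PAINT, and f-PAINT NT (f-PAINT not targeting a specific site). Both HDR and f-PAINT maintained a low level of random integration, but f-PAINT achieved higher site-specific integration efficiency at the AAVS1, CCR5, TRAC, WAS, HBB, IL2RG, and other loci.
Figure 5 shows genotyping and junction Sanger sequencing results for f-PAINT-mediated site-specific integration of the exogenous gene CAG-EGFP at the AAVS1, CCR5, TRAC, and other safe-harbor loci in K562 cells.
Figure 6 shows genotyping and junction Sanger sequencing results for f-PAINT-mediated site-specific integration of the exogenous gene CAG-EGFP at the WAS, HBB, IL2RG, and other genetic-disease-associated loci in K562 cells.
Figure 7 is a schematic of the principle of site-specific integration of exogenous genes mediated by the h-PAINT method of the present invention. In the h-PAINT (LHA) method, the exogenous gene on the donor vector is flanked on its left by a left homology arm 500-1500 bp long and on its right by a target recognition sequence that can be recognized and processed by PE-spCas9/pegRNA. Under the action of PE-spCas9/pegRNA, the target recognition sequence generates a right homologous flap, which interacts with a genomic break end through complementary base pairing so that this break end is extended; the extended end then pairs with the other genomic break end via the left homology arm, achieving integration of the exogenous gene and strand repair. In h-PAINT (RHA), the exogenous gene on the donor vector is flanked on its right by a right homology arm 500-1500 bp long and on its left by a target recognition sequence that can be recognized and processed by PE-spCas9/pegRNA.
Figure 8 compares the efficiency of site-specific integration of the exogenous gene IRES-EGFP into the 3' UTR of the human GAPDH gene mediated by the f-PAINT and h-PAINT methods. Figure 8a is a schematic of h-PAINT-mediated site-specific integration of IRES-EGFP into the 3' UTR of human GAPDH. The h-PAINT (LHA) donor vector carries an 800 bp left homology arm on the left and the target recognition sequence of PE-spCas9/GAPDH-pegβ on the right; the h-PAINT (RHA) donor vector carries an 800 bp right homology arm on the right and the target recognition sequence of PE-spCas9/GAPDH-pegα on the left. Figure 8b shows the site-specific integration efficiencies of IRES-EGFP into the 3' UTR of human GAPDH mediated by f-PAINT, h-PAINT (LHA), and h-PAINT (RHA). Figure 8c shows genotyping results for cells edited by the different methods. Figure 8d shows Sanger sequencing results for the 5' and 3' junctions in cells edited by the h-PAINT method.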
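Sequence information
Information on some of the sequences involved in the invention is provided in Table 1 below.
Table 1: Description of sequences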
[Table 1 is reproduced as images in the original publication (PCTCN2022086979-appb-000001 to PCTCN2022086979-appb-000020).]
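Detailed description of embodiments
The invention is now described with reference to the following examples, which are intended to illustrate, not to limit, the invention.
Unless otherwise specified, the experiments and methods described in the examples were carried out essentially according to conventional methods well known in the art and described in various references. For the conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA used in the invention, see, for example, Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel et al., eds. (1987)); the METHODS IN ENZYMOLOGY series (Academic Press): PCR 2: A PRACTICAL APPROACH (M.J. MacPherson, B.D. Hames and G.R. Taylor, eds. (1995)); and ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)).
In addition, where specific conditions are not stated in the examples, conventional conditions or the conditions recommended by the manufacturer were used. Reagents or instruments for which no manufacturer is stated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the invention by way of illustration and are not intended to limit the scope of the invention as claimed. All publications and other references mentioned herein are incorporated herein by reference in their entirety.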
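Example 1. Site-specific insertion of an exogenous gene (IRES-EGFP) into the 3' UTR of the human GAPDH gene using the f-PAINT system
To verify the performance of the f-PAINT system for site-specific insertion of exogenous genes into the genome, the following experiment was designed: the reporter gene IRES-EGFP was knocked into the 3' UTR of human GAPDH using the f-PAINT system, with the HDR and NHEJ methods as controls. The principles of HDR-, NHEJ- and f-PAINT-mediated site-specific integration are shown schematically in Figure 1, and the workflows for integrating the IRES-EGFP reporter into GAPDH by each method are shown in Figure 2a.
GAPDH, located on chromosome 12, encodes glyceraldehyde-3-phosphate dehydrogenase; it is an important housekeeping gene and is highly expressed in 293T cells. Once correctly integrated into the GAPDH 3' UTR, the reporter is transcribed together with GAPDH, and its IRES sequence recruits ribosomes so that EGFP is translated. EGFP fluorescence can be observed directly by fluorescence microscopy, and correctly edited, EGFP-expressing cells can be captured and quantified by flow cytometry.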
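The following plasmids used in this example were obtained from the Li Wei group at the Institute of Zoology, Chinese Academy of Sciences: pCAG-spCas9-mCherry (expresses spCas9 protein (SEQ ID NO:1) and mCherry protein (SEQ ID NO:2)); pCAG-spCas9(H840A)-mCherry (expresses spCas9(H840A) protein (SEQ ID NO:3) and mCherry protein (SEQ ID NO:2)); pCAG-saCas9 (expresses saCas9 protein (SEQ ID NO:4)); pUC19-U6-gRNA(saCas9) (transcribes gRNA(saCas9) lacking a guide sequence (SEQ ID NO:5)); pUC19-U6-gRNA(spCas9) (transcribes gRNA(spCas9) lacking a guide sequence (SEQ ID NO:6)); and pGH (used as the donor-vector backbone).
A nucleotide fragment encoding MLV TR, the MLV reverse transcriptase (SEQ ID NO:7), was amplified from plasmid pCMV-PE2 (Addgene #132775), and nucleotide fragments encoding part of spCas9(H840A) and encoding mCherry were amplified from pCAG-spCas9(H840A)-mCherry. The amplified MLV TR and spCas9(H840A) fragments were inserted by In-Fusion cloning into pCAG-spCas9(H840A)-mCherry double-digested with AscI/BsrGI, yielding pCAG-PE-spCas9-mCherry, which expresses PE-spCas9 protein (SEQ ID NO:8) and mCherry. PE-spCas9 is a fusion of MLV TR and spCas9(H840A).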
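Primers sgGAPDH-F (SEQ ID NO:9) and sgGAPDH-R (SEQ ID NO:10) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgGAPDH, which transcribes sgGAPDH (SEQ ID NO:11) to guide saCas9 to a specific site in the 3' UTR of human GAPDH.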
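Annealed-oligo cloning of a guide into a BsaI-digested U6 vector can be sketched as below. This is a minimal illustration, not the authors' primer design: the 4-nt overhangs CACC/AAAC and the extra 5' G for U6 transcription are common conventions assumed here, because the overhangs of pUC19-U6-sgRNA(saCas9) are not given in this section, and the spacer in the usage line is a made-up placeholder, not sgGAPDH.

```python
# Hypothetical sketch: design a top/bottom oligo pair whose annealed duplex
# carries 4-nt 5' overhangs matching a BsaI-digested U6 guide vector.
# The default overhangs (CACC / AAAC) are an ASSUMPTION, not taken from the patent.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_oligos(spacer: str,
                    top_overhang: str = "CACC",
                    bottom_overhang: str = "AAAC",
                    add_5prime_g: bool = True) -> tuple[str, str]:
    """Return (top, bottom) oligos, both written 5'->3'."""
    spacer = spacer.upper()
    # Assumed convention: U6 transcription starts most efficiently at a G.
    if add_5prime_g and not spacer.startswith("G"):
        spacer = "G" + spacer
    top = top_overhang + spacer
    # Bottom oligo: its own overhang followed by the complement of the spacer.
    bottom = bottom_overhang + revcomp(spacer)
    return top, bottom


if __name__ == "__main__":
    top, bottom = annealed_oligos("ACGTTGCAACGTTGCAACGT")  # placeholder spacer
    print("top   :", top)
    print("bottom:", bottom)
```

Keeping the overhangs as parameters means the same helper works for any vector whose BsaI overhangs are known; only the defaults are guesses.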
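Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.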
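Primers GAPDH-pegα-F (SEQ ID NO:20) and GAPDH-pegα-R (SEQ ID NO:21) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-GAPDH-pegα, which transcribes GAPDH-pegα (SEQ ID NO:22). GAPDH-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.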
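The pegRNA layout implied here (guide, scaffold, then a 3' extension that primes reverse transcription at the nick) can be sketched as follows. Assumptions, none of which come from this section: a 20-nt spCas9 protospacer whose PAM-containing strand is nicked by the H840A nickase 3 nt upstream of the PAM, a 13-nt primer-binding site, sequences written as DNA, and a placeholder string in place of the scaffold. The function names are hypothetical. In the terms of claims 1, 6 and 7, the reverse-transcription template plays the role of the tag sequence and the primer-binding site the role of the target-binding sequence.

```python
# Hypothetical sketch of prime-editing-style pegRNA assembly and the flap it encodes.
# Sequences are written as DNA (T instead of U); SCAFFOLD is a placeholder only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SCAFFOLD = "<scaffold>"  # stands in for the spCas9 gRNA scaffold


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_pegrna(protospacer: str, flap: str, pbs_len: int = 13) -> dict:
    """Return the pegRNA parts for a given protospacer and desired 3' flap."""
    if len(protospacer) != 20:
        raise ValueError("expects a 20-nt protospacer")
    nick = 17  # assumed: cut between protospacer nt 17 and 18 (3 nt 5' of the PAM)
    # Primer-binding site: pairs with the 3' end left on the nicked strand.
    pbs = revcomp(protospacer[nick - pbs_len:nick])
    # Reverse-transcription template ("tag sequence"): its DNA copy is the flap.
    rtt = revcomp(flap)
    return {
        "spacer": protospacer,
        "rtt": rtt,
        "pbs": pbs,
        "pegrna": protospacer + SCAFFOLD + rtt + pbs,
    }


def extended_nicked_strand(protospacer: str, flap: str) -> str:
    """5'->3' end of the nicked strand after reverse transcription adds the flap."""
    return protospacer[:17] + flap
```

The consistency check is that the reverse complement of the pegRNA's 3' extension (template plus primer-binding site) reproduces the nicked strand's 3' end followed by the new flap.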
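Primers GAPDH-pegβ-F (SEQ ID NO:23) and GAPDH-pegβ-R (SEQ ID NO:24) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-GAPDH-pegβ, which transcribes GAPDH-pegβ (SEQ ID NO:25). GAPDH-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.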
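The reporter gene IRES-EGFP (SEQ ID NO:26) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:27 and SEQ ID NO:28, respectively).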
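The PAM-out arrangement of the two recognition sequences on the NHEJ/f-PAINT donor can be illustrated with a toy model. Assumptions: the sgα site lies on the 5' (left) side of the insert and the sgβ site on the 3' (right) side, as Figure 8a suggests; both are 20-nt spCas9 protospacers with an AGG PAM; the nickase cuts the PAM-containing strand 3 nt upstream of the PAM. All sequences in the test are placeholders. The point of the model is that both nicks leave their 3' ends on the insert-bearing fragment, pointing outward, which is where the pegRNAs add the homologous flaps.

```python
# Toy model of a PAM-out donor: two spCas9 sites flank the insert with PAMs facing outward.
# The left/right assignment of sgα/sgβ, the AGG PAM and all sequences are ASSUMPTIONS.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pam_out_donor(ps_alpha: str, ps_beta: str, insert: str, pam: str = "AGG"):
    """Return (top_strand, bottom_nick, top_nick) in top-strand coordinates.

    A nick position i means the cut lies between top-strand indices i-1 and i.
    """
    left = revcomp(ps_alpha + pam)   # sgα site on the bottom strand, PAM at the far left
    right = ps_beta + pam            # sgβ site on the top strand, PAM at the far right
    top = left + insert + right
    # Bottom strand is cut 3 nt 5' of its PAM: just inside the reversed PAM at the left end.
    bottom_nick = len(pam) + 3
    # Top strand is cut 3 nt 5' of the sgβ PAM.
    top_nick = len(left) + len(insert) + len(ps_beta) - 3
    return top, bottom_nick, top_nick
```

Everything between the two nick positions, including the insert, is the fragment whose two 3' ends receive flaps; the backbone side of each nick is never extended.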
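For the HDR system, pCAG-saCas9-mCherry and sgGAPDH were delivered into 293T cells together with the HDR donor vector using Lipofectamine 3000 (Invitrogen). In the HDR control group, pCAG-saCas9-mCherry was delivered together with the HDR donor vector, without sgGAPDH. The 293T cell line was obtained from ATCC. 24 hours after transfection, mCherry-positive cells were sorted by flow cytometry; the sorted cells were cultured for a further 5 days, and the proportion of EGFP-positive cells was then analyzed by flow cytometry. For the NHEJ system, pCAG-saCas9, sgGAPDH, pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ were transfected into 293T cells together with the NHEJ donor vector using Lipofectamine 3000; in the NHEJ control group, pCAG-saCas9, pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ were transfected together with the donor vector. 24 hours after transfection, mCherry-positive cells were sorted, cultured for a further 5 days and analyzed for the proportion of EGFP-positive cells by flow cytometry. For the f-PAINT system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry, pUC19-U6-GAPDH-pegα and pUC19-U6-GAPDH-pegβ were transfected into 293T cells together with the f-PAINT donor vector using Lipofectamine 3000; in the f-PAINT negative control group, pCAG-saCas9, pCAG-PE-spCas9-mCherry, pUC19-U6-GAPDH-pegα and pUC19-U6-GAPDH-pegβ were transfected together with the donor vector. 24 hours after transfection, mCherry-positive cells were sorted, cultured for a further 5 days and analyzed for the proportion of EGFP-positive cells by flow cytometry.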
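The three experimental mixes and their controls differ only in whether the GAPDH-targeting guide is present, which can be captured as data. This is an illustrative transcription of the paragraph above, not software from the authors; donor names are informal labels and "pUC19-U6-sgGAPDH" is taken to be the plasmid meant by "sgGAPDH". The check encodes a design point implicit in the protocol: every mix contains exactly one mCherry-marked plasmid, which is what makes sorting on mCherry meaningful.

```python
# Transfection mixes for the GAPDH experiment in Example 1, transcribed from the text.
# Donor names are informal labels, not construct names from the patent.

MCHERRY_PLASMIDS = {"pCAG-saCas9-mCherry", "pCAG-spCas9-mCherry", "pCAG-PE-spCas9-mCherry"}
TARGETING_GUIDE = "pUC19-U6-sgGAPDH"

MIXES = {
    "HDR": ["pCAG-saCas9-mCherry", TARGETING_GUIDE, "HDR donor"],
    "NHEJ": ["pCAG-saCas9", TARGETING_GUIDE, "pCAG-spCas9-mCherry",
             "pUC19-U6-sgα", "pUC19-U6-sgβ", "NHEJ/f-PAINT donor"],
    "f-PAINT": ["pCAG-saCas9", TARGETING_GUIDE, "pCAG-PE-spCas9-mCherry",
                "pUC19-U6-GAPDH-pegα", "pUC19-U6-GAPDH-pegβ", "NHEJ/f-PAINT donor"],
}


def control_mix(method: str) -> list[str]:
    """Control group: the same mix without the genome-targeting guide."""
    return [p for p in MIXES[method] if p != TARGETING_GUIDE]


def sortable(mix: list[str]) -> bool:
    """True if the mix carries exactly one mCherry-marked plasmid."""
    return sum(p in MCHERRY_PLASMIDS for p in mix) == 1


if __name__ == "__main__":
    for name in MIXES:
        print(name, "control:", control_mix(name))
```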
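Comparing the proportion of EGFP-positive cells across the three systems reflects differences in the efficiency of IRES-EGFP site-specific integration mediated by each system. The results are shown in Figure 2b: with the f-PAINT system about 30% of cells were EGFP-positive, roughly 4 times the rate obtained with HDR and roughly 2 times that obtained with NHEJ.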
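Because all three figures are approximate, the HDR and NHEJ rates they imply are only rough; the quick calculation below simply makes the stated ratios explicit. The actual values should be read from Figure 2b.

```python
# Back-of-envelope check of the fold-changes quoted for Figure 2b.
# Only the ~30 % f-PAINT rate and the ~4x / ~2x ratios come from the text;
# the HDR and NHEJ values computed here are implied, not measured.

F_PAINT_PCT = 30.0
FOLD_OVER = {"HDR": 4.0, "NHEJ": 2.0}

implied_pct = {method: F_PAINT_PCT / fold for method, fold in FOLD_OVER.items()}

for method, pct in implied_pct.items():
    print(f"{method}: ~{pct:.1f}% EGFP+ (implied)")
```

This prints roughly 7.5% for HDR and 15.0% for NHEJ.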
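Genomic DNA was extracted from cells edited by the NHEJ and f-PAINT systems and genotyped by PCR with the following primer pairs: GAPDH-P1 (SEQ ID NO:29)/GAPDH-P2 (SEQ ID NO:30) (5' junction of correct integration); GAPDH-P3 (SEQ ID NO:31)/GAPDH-P4 (SEQ ID NO:32) (3' junction of correct integration); GAPDH-P1 (SEQ ID NO:29)/GAPDH-P3 (SEQ ID NO:31) (5' junction of inverted integration of the exogenous gene); GAPDH-P2 (SEQ ID NO:30)/GAPDH-P4 (SEQ ID NO:32) (3' junction of inverted integration of the exogenous gene); GAPDH-P1 (SEQ ID NO:29)/GAPDH-P5 (SEQ ID NO:33) (5' junction of vector-backbone integration); GAPDH-P6 (SEQ ID NO:34)/GAPDH-P4 (SEQ ID NO:32) (3' junction of vector-backbone integration); GAPDH-P1 (SEQ ID NO:29)/GAPDH-P6 (SEQ ID NO:34) (5' junction of inverted vector-backbone integration); and GAPDH-P5 (SEQ ID NO:33)/GAPDH-P4 (SEQ ID NO:32) (3' junction of inverted vector-backbone integration). The junctions of correct integration events were analyzed by Sanger sequencing.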
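The eight primer pairs form a small lookup table: each pair reports one junction of one possible product, so the set of bands observed maps to a set of outcomes. A sketch of that bookkeeping follows; the function names are hypothetical, and treating a band as evidence of a product assumes each pair amplifies only its intended junction.

```python
# Interpreting the GAPDH junction PCR panel; primer roles are as listed in the text.

PAIRS = {
    ("P1", "P2"): ("correct insert", "5'"),
    ("P3", "P4"): ("correct insert", "3'"),
    ("P1", "P3"): ("inverted insert", "5'"),
    ("P2", "P4"): ("inverted insert", "3'"),
    ("P1", "P5"): ("backbone", "5'"),
    ("P6", "P4"): ("backbone", "3'"),
    ("P1", "P6"): ("inverted backbone", "5'"),
    ("P5", "P4"): ("inverted backbone", "3'"),
}
SEQ_ID = {"P1": 29, "P2": 30, "P3": 31, "P4": 32, "P5": 33, "P6": 34}  # GAPDH-P1..P6


def outcomes(bands: set[tuple[str, str]]) -> set[str]:
    """Integration outcomes supported by the primer pairs that gave a band."""
    return {PAIRS[pair][0] for pair in bands if pair in PAIRS}


def byproducts(bands: set[tuple[str, str]]) -> set[str]:
    """Outcomes other than correct, forward integration."""
    return outcomes(bands) - {"correct insert"}
```

Under this reading, the f-PAINT result in Figure 2c corresponds to bands only for P1/P2 and P3/P4, i.e. an empty by-product set.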
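The PCR genotyping results are shown in Figure 2c. In NHEJ-mediated integration the donor vector is processed into fragments, so in addition to correct integration the specific integration site also carries by-products such as inverted integration of the exogenous gene and backbone integration. The Sanger sequencing results are shown in Figure 2d: the junctions generated by the f-PAINT system are also more precise than those generated by NHEJ, with fewer base insertions, deletions and substitutions. Taken together, the f-PAINT system described herein substantially improves both the efficiency and the precision of site-specific integration of exogenous genes. Because it does not linearize the donor vector, it is also safer than integration methods such as NHEJ that depend on a linearized double-stranded DNA donor.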
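The junction-precision comparison in Figure 2d amounts to counting base insertions, deletions and substitutions between each Sanger read and the expected junction. A minimal sketch using Python's standard `difflib` is shown below. It assumes the read has already been trimmed to the junction window and oriented like the reference, and it is not the authors' analysis pipeline; `difflib` alignment is heuristic, so a proper pairwise aligner would be preferable for real reads.

```python
from difflib import SequenceMatcher


def junction_edits(expected: str, observed: str) -> dict[str, int]:
    """Count inserted, deleted and substituted bases between two junction sequences."""
    counts = {"insertion": 0, "deletion": 0, "substitution": 0}
    matcher = SequenceMatcher(None, expected, observed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            counts["insertion"] += j2 - j1
        elif tag == "delete":
            counts["deletion"] += i2 - i1
        elif tag == "replace":
            # Paired positions count as substitutions; any length difference is an indel.
            shared = min(i2 - i1, j2 - j1)
            counts["substitution"] += shared
            counts["deletion"] += (i2 - i1) - shared
            counts["insertion"] += (j2 - j1) - shared
    return counts


def is_precise(expected: str, observed: str) -> bool:
    """True when the observed junction matches the expected one base for base."""
    return not any(junction_edits(expected, observed).values())
```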
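Example 2. Site-specific insertion of an exogenous reporter gene (CAG-EGFP) into the human AAVS1 locus and the mouse Rosa26 locus using the f-PAINT system
To further verify the site specificity of f-PAINT-mediated integration, the following experiment was designed: the reporter gene CAG-EGFP was knocked into the first intron of the AAVS1 locus of the human genome and into the first intron of the Rosa26 locus of the mouse genome using the f-PAINT system, with the HDR, NHEJ and HMEJ methods as controls.
The AAVS1 and Rosa26 loci are widely recognized safe-harbor loci in the human and mouse genomes, respectively; insertion of exogenous sequences at these loci is not expected to disrupt normal cellular function. The CAG-EGFP reporter carries its own CAG promoter, which drives EGFP expression once the reporter is integrated into the genome. EGFP fluorescence can be observed directly by fluorescence microscopy, and correctly edited, EGFP-expressing cells can be captured and quantified by flow cytometry. Besides site-specific integration at AAVS1 or Rosa26, the CAG-EGFP reporter also detects random integration of the exogenous gene into the genome.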
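The sgRNA- and pegRNA-expressing plasmids and the donor plasmids carrying the CAG-EGFP reporter used in this example were constructed as in Example 1. The HMEJ donor vector also uses pGH as its backbone: homology arms of approximately 800 bp are placed on each side of the reporter gene, and a spCas9/sgα targeting recognition sequence is placed outside each homology arm. The primer sequences used for plasmid construction, the sgRNA and pegRNA sequences, the CAG-EGFP reporter sequence and the homology-arm sequences are listed in Table 1; the specific primers used are given below.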
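The four donor strategies compared in Examples 1 and 2 differ in two respects: the arrangement of elements around the reporter, and whether the donor is cut on both strands in the cell. The sketch below records both as data. Labels are informal, and marking the f-PAINT donor as nicked rather than linearized follows the statement in Example 1 that f-PAINT does not linearize the donor. It makes explicit that NHEJ and f-PAINT share one donor layout and differ only in the enzyme acting on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DonorSetup:
    layout: tuple[str, ...]            # elements 5'->3' on the top strand (informal labels)
    processing_enzyme: Optional[str]   # enzyme acting on the donor in the cell, if any
    makes_dsb: bool                    # True if that enzyme cuts both donor strands


SETUPS = {
    "HDR": DonorSetup(("left arm ~800 bp", "reporter", "right arm ~800 bp"), None, False),
    "NHEJ": DonorSetup(("sgα site", "reporter", "sgβ site"), "spCas9", True),
    "f-PAINT": DonorSetup(("sgα site", "reporter", "sgβ site"),
                          "PE-spCas9 (H840A nickase)", False),
    "HMEJ": DonorSetup(("sgα site", "left arm ~800 bp", "reporter",
                        "right arm ~800 bp", "sgα site"), "spCas9", True),
}


def releases_linear_fragment(method: str) -> bool:
    """Whether the donor is linearized in the cell under this strategy."""
    return SETUPS[method].makes_dsb
```

The two strategies flagged as linearizing (NHEJ and HMEJ) are the ones that show high random integration in Figure 3a.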
(1) AAVS1 locus
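Primers sgAAVS1-F (SEQ ID NO:36) and sgAAVS1-R (SEQ ID NO:37) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgAAVS1, which transcribes sgAAVS1 (SEQ ID NO:38) to guide saCas9 to the first intron of the AAVS1 locus of the human genome.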
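Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers AAVS1-pegα-F (SEQ ID NO:39) and AAVS1-pegα-R (SEQ ID NO:40) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-AAVS1-pegα, which transcribes AAVS1-pegα (SEQ ID NO:41). AAVS1-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers AAVS1-pegβ-F (SEQ ID NO:42) and AAVS1-pegβ-R (SEQ ID NO:43) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-AAVS1-pegβ, which transcribes AAVS1-pegβ (SEQ ID NO:44). AAVS1-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:46 and SEQ ID NO:47, respectively).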
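(2) Rosa26 locus
Primers sgmRosa26-F (SEQ ID NO:52) and sgmRosa26-R (SEQ ID NO:53) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgRosa26, which transcribes sgmRosa26 (SEQ ID NO:54) to guide saCas9 to the first intron of the Rosa26 locus of the mouse genome.
Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers mRosa26-pegα-F (SEQ ID NO:55) and mRosa26-pegα-R (SEQ ID NO:56) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-mRosa26-pegα, which transcribes mRosa26-pegα (SEQ ID NO:57). mRosa26-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers mRosa26-pegβ-F (SEQ ID NO:58) and mRosa26-pegβ-R (SEQ ID NO:59) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-mRosa26-pegβ, which transcribes mRosa26-pegβ (SEQ ID NO:60). mRosa26-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:61 and SEQ ID NO:62, respectively).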
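To assess how each method affects random integration of CAG-EGFP into the genome, the efficiency with which each method integrates CAG-EGFP without targeting a specific genomic locus was measured first. For the HDR system, pCAG-saCas9-mCherry was delivered into 293T cells or mouse embryonic stem cells together with the HDR donor vector using Lipofectamine 3000 (Invitrogen). For the NHEJ system, pCAG-saCas9, pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ were transfected into 293T cells or mouse embryonic stem cells together with the NHEJ donor vector using Lipofectamine 3000. For the HMEJ method, pCAG-saCas9, pCAG-spCas9-mCherry and pUC19-U6-sgα were transfected into 293T cells or mouse embryonic stem cells together with the HMEJ donor vector. For the f-PAINT system, pCAG-saCas9, pCAG-PE-spCas9-mCherry, pUC19-U6-pegα and pUC19-U6-pegβ were transfected into 293T cells or mouse embryonic stem cells together with the f-PAINT donor vector using Lipofectamine 3000. In each case, mCherry-positive cells were sorted by flow cytometry 24 hours after transfection, cultured for a further 14 days, and analyzed for the proportion of EGFP-positive cells by flow cytometry.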
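To compare the efficiency of integration at the specific genomic loci, each method was combined with saCas9/sgRNA cleavage of the AAVS1 locus of the human genome or the Rosa26 locus of the mouse genome. For the HDR system, pCAG-saCas9, sgAAVS1 (or sgRosa26) and the HDR donor vector were delivered into 293T cells or mouse embryonic stem cells using Lipofectamine 3000 (Invitrogen). For the NHEJ system, pCAG-saCas9, sgAAVS1 (or sgRosa26), pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ were transfected into 293T cells or mouse embryonic stem cells together with the NHEJ donor vector using Lipofectamine 3000. For the HMEJ method, pCAG-saCas9, sgAAVS1 (or sgRosa26), pCAG-spCas9-mCherry and pUC19-U6-sgα were transfected into 293T cells or mouse embryonic stem cells together with the HMEJ donor vector. For the f-PAINT system, pCAG-saCas9, sgAAVS1 (or sgRosa26), pCAG-PE-spCas9-mCherry, pUC19-U6-pegα and pUC19-U6-pegβ were transfected into 293T cells or mouse embryonic stem cells together with the f-PAINT donor vector using Lipofectamine 3000. In each case, mCherry-positive cells were sorted by flow cytometry 24 hours after transfection, cultured for a further 14 days, and analyzed for the proportion of EGFP-positive cells by flow cytometry.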
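Comparing the proportion of EGFP-positive cells produced by the four systems, with and without targeting of the specific genomic locus, reveals differences in the genomic integration efficiency of CAG-EGFP mediated by each system. The non-targeted integration results are shown in Figure 3a: at both human AAVS1 and mouse Rosa26, the f-PAINT and HDR systems show a low probability of random integration of the exogenous gene, whereas the NHEJ and HMEJ systems show a high probability. The targeted integration results are shown in Figure 3b: at both loci, the f-PAINT system achieves the highest targeted integration efficiency of all methods tested. These results indicate that f-PAINT-mediated site-specific integration is both efficient and highly site-specific.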
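Example 3. Site-specific insertion of an exogenous gene (CAG-EGFP) into gene-therapy-relevant safe-harbor loci and genetic-disease-associated loci using the f-PAINT system
To further verify the accuracy and efficiency of f-PAINT-mediated site-specific integration at different genomic loci, and to demonstrate the potential of the f-PAINT system for gene therapy, the following experiment was designed: in K562 cells, the f-PAINT system was used to knock the reporter gene CAG-EGFP into the safe-harbor loci AAVS1, CCR5 and TRAC and into the disease-associated loci WAS, HBB and IL2RG, with the HDR method as a control. The sgRNA and pegRNA expression vectors and the donor vectors were constructed as described in Example 1. The primer sequences used for vector construction and the homology-arm sequences of the donor vectors are listed in Table 1; the specific primers used are given below.
(1) AAVS1 locus
The primers and construction procedure were the same as in Example 2.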
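(2) CCR5 locus
Primers sgCCR5-F (SEQ ID NO:65) and sgCCR5-R (SEQ ID NO:66) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgCCR5, which transcribes sgCCR5 (SEQ ID NO:67) to guide saCas9 to the CCR5 locus.
Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers CCR5-pegα-F (SEQ ID NO:68) and CCR5-pegα-R (SEQ ID NO:69) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-CCR5-pegα, which transcribes CCR5-pegα (SEQ ID NO:70). CCR5-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers CCR5-pegβ-F (SEQ ID NO:71) and CCR5-pegβ-R (SEQ ID NO:72) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-CCR5-pegβ, which transcribes CCR5-pegβ (SEQ ID NO:73). CCR5-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:74 and SEQ ID NO:75, respectively).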
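(3) TRAC locus
Primers sgTRAC-F (SEQ ID NO:78) and sgTRAC-R (SEQ ID NO:79) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgTRAC, which transcribes sgTRAC (SEQ ID NO:80) to guide saCas9 to the TRAC locus.
Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers TRAC-pegα-F (SEQ ID NO:81) and TRAC-pegα-R (SEQ ID NO:82) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-TRAC-pegα, which transcribes TRAC-pegα (SEQ ID NO:83). TRAC-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers TRAC-pegβ-F (SEQ ID NO:84) and TRAC-pegβ-R (SEQ ID NO:85) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-TRAC-pegβ, which transcribes TRAC-pegβ (SEQ ID NO:86). TRAC-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:87 and SEQ ID NO:88, respectively).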
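(4) WAS-1 locus
Primers sgWAS-1-F (SEQ ID NO:91) and sgWAS-1-R (SEQ ID NO:92) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgWAS-1, which transcribes sgWAS-1 (SEQ ID NO:93) to guide saCas9 to the WAS-1 site.
Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers WAS-1-pegα-F (SEQ ID NO:94) and WAS-1-pegα-R (SEQ ID NO:95) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-WAS-1-pegα, which transcribes WAS-1-pegα (SEQ ID NO:96). WAS-1-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers WAS-1-pegβ-F (SEQ ID NO:97) and WAS-1-pegβ-R (SEQ ID NO:98) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-WAS-1-pegβ, which transcribes WAS-1-pegβ (SEQ ID NO:99). WAS-1-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:100 and SEQ ID NO:101, respectively).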
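(5) WAS-3 locus
Primers sgWAS-3-F (SEQ ID NO:104) and sgWAS-3-R (SEQ ID NO:105) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgWAS-3, which transcribes sgWAS-3 (SEQ ID NO:106) to guide saCas9 to the WAS-3 site.
Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers WAS-3-pegα-F (SEQ ID NO:107) and WAS-3-pegα-R (SEQ ID NO:108) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-WAS-3-pegα, which transcribes WAS-3-pegα (SEQ ID NO:109). WAS-3-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers WAS-3-pegβ-F (SEQ ID NO:110) and WAS-3-pegβ-R (SEQ ID NO:111) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-WAS-3-pegβ, which transcribes WAS-3-pegβ (SEQ ID NO:112). WAS-3-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:113 and SEQ ID NO:114, respectively).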
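(6) HBB locus
Primers sgHBB-F (SEQ ID NO:117) and sgHBB-R (SEQ ID NO:118) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgHBB, which transcribes sgHBB (SEQ ID NO:119) to guide saCas9 to the HBB locus.
Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers HBB-pegα-F (SEQ ID NO:120) and HBB-pegα-R (SEQ ID NO:121) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-HBB-pegα, which transcribes HBB-pegα (SEQ ID NO:122). HBB-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers HBB-pegβ-F (SEQ ID NO:123) and HBB-pegβ-R (SEQ ID NO:124) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-HBB-pegβ, which transcribes HBB-pegβ (SEQ ID NO:125). HBB-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:126 and SEQ ID NO:127, respectively).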
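(7) IL2RG locus
Primers sgIL2RG-F (SEQ ID NO:130) and sgIL2RG-R (SEQ ID NO:131) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-sgRNA(saCas9), yielding pUC19-U6-sgIL2RG, which transcribes sgIL2RG (SEQ ID NO:132) to guide saCas9 to the IL2RG locus.
Primers sgα-F (SEQ ID NO:12) and sgα-R (SEQ ID NO:13) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgα, which transcribes sgα (SEQ ID NO:14) to guide spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector.
Primers sgβ-F (SEQ ID NO:16) and sgβ-R (SEQ ID NO:17) were annealed and ligated with T4 ligase into BsaI-digested pUC19-U6-gRNA(spCas9), yielding pUC19-U6-sgβ, which transcribes sgβ (SEQ ID NO:18) to guide spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector.
Primers IL2RG-pegα-F (SEQ ID NO:133) and IL2RG-pegα-R (SEQ ID NO:134) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgα, yielding pUC19-U6-IL2RG-pegα, which transcribes IL2RG-pegα (SEQ ID NO:135). IL2RG-pegα guides PE-spCas9 to the sgα specific recognition sequence (SEQ ID NO:15) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
Primers IL2RG-pegβ-F (SEQ ID NO:136) and IL2RG-pegβ-R (SEQ ID NO:137) were used in overlap-extension PCR; the purified product was inserted by In-Fusion cloning into HindIII-digested pUC19-U6-sgβ, yielding pUC19-U6-IL2RG-pegβ, which transcribes IL2RG-pegβ (SEQ ID NO:138). IL2RG-pegβ guides PE-spCas9 to the sgβ specific recognition sequence (SEQ ID NO:19) on the donor vector, where the homologous flap is reverse-transcribed at the nick.
The reporter gene CAG-EGFP (SEQ ID NO:45) was synthesized by Jierui and ligated with T4 ligase into EcoRV-digested pGH to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the reporter gene is flanked by the sgα and sgβ specific recognition sequences (SEQ ID NO:15 and SEQ ID NO:19, respectively), arranged in inverted, PAM-out orientation. The HDR donor vector instead carries homology arms of approximately 800 bp on either side of the reporter gene (SEQ ID NO:139 and SEQ ID NO:140, respectively).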
For the HDR system, pCAG-saCas9, the plasmid expressing the sgRNA that targets the locus in question, and the HDR donor vector were delivered into K562 cells by electroporation using the SE Cell Line 4D-Nucleofector X Kit (Lonza). For the f-PAINT system, pCAG-saCas9, the locus-specific sgRNA plasmid, pCAG-PE-spCas9-mCherry, the pUC19-U6-pegα and pUC19-U6-pegβ plasmids for that locus, and the f-PAINT donor vector were delivered into K562 cells by electroporation with the same kit. In the control groups without targeting of the specific genomic locus, no sgRNA plasmid was used. 48 hours after transfection, mCherry-positive cells were sorted by flow cytometry; the sorted cells were cultured for a further 14 days and analyzed for the proportion of EGFP-positive cells by flow cytometry. Genomic DNA was extracted from cells edited by the f-PAINT system, and the junctions were amplified with the following primer pairs and analyzed by Sanger sequencing: AAVS1-P1 (SEQ ID NO:48)/CAG-EGFP-P2 (SEQ ID NO:49) (5' junction at AAVS1); CAG-EGFP-P3 (SEQ ID NO:50)/AAVS1-P4 (SEQ ID NO:51) (3' junction at AAVS1); CCR5-P1 (SEQ ID NO:76)/CAG-EGFP-P2 (SEQ ID NO:49) (5' junction at CCR5); CAG-EGFP-P3 (SEQ ID NO:50)/CCR5-P4 (SEQ ID NO:77) (3' junction at CCR5); TRAC-P1 (SEQ ID NO:89)/CAG-EGFP-P2 (SEQ ID NO:49) (5' junction at TRAC); CAG-EGFP-P3 (SEQ ID NO:50)/TRAC-P4 (SEQ ID NO:90) (3' junction at TRAC); WAS-1-P1 (SEQ ID NO:102)/CAG-EGFP-P2 (SEQ ID NO:49) (5' junction at WAS-1); CAG-EGFP-P3 (SEQ ID NO:50)/WAS-1-P4 (SEQ ID NO:103) (3' junction at WAS-1); WAS-3-P1 (SEQ ID NO:115)/CAG-EGFP-P2 (SEQ ID NO:49) (5' junction at WAS-3); CAG-EGFP-P3 (SEQ ID NO:50)/WAS-3-P4 (SEQ ID NO:116) (3' junction at WAS-3); HBB-P1 (SEQ ID NO:128)/CAG-EGFP-P2 (SEQ ID NO:49) (5' junction at HBB); CAG-EGFP-P3 (SEQ ID NO:50)/HBB-P4 (SEQ ID NO:129) (3' junction at HBB); IL2RG-P1 (SEQ ID NO:141)/CAG-EGFP-P2 (SEQ ID NO:49) (5' junction at IL2RG); and CAG-EGFP-P3 (SEQ ID NO:50)/IL2RG-P4 (SEQ ID NO:142) (3' junction at IL2RG).
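Every junction assay in this example follows one rule: a genomic primer outside the integration site is paired with a primer inside the CAG-EGFP cassette, so a product forms only when the cassette is joined to that locus. The table-driven sketch below generates the pairs from that rule; the function names are hypothetical, and the SEQ ID numbers are copied from the paragraph above.

```python
# Junction primer pairs used for f-PAINT genotyping in K562 cells (Example 3).

INSERT_PRIMERS = {"CAG-EGFP-P2": 49, "CAG-EGFP-P3": 50}
LOCUS_PRIMERS = {          # locus: (P1 SEQ ID, P4 SEQ ID)
    "AAVS1": (48, 51),
    "CCR5": (76, 77),
    "TRAC": (89, 90),
    "WAS-1": (102, 103),
    "WAS-3": (115, 116),
    "HBB": (128, 129),
    "IL2RG": (141, 142),
}


def junction_pairs(locus: str) -> dict[str, tuple[str, str]]:
    """5' junction: genomic P1 + cassette P2; 3' junction: cassette P3 + genomic P4."""
    if locus not in LOCUS_PRIMERS:
        raise KeyError(f"no primers listed for {locus}")
    return {
        "5'": (f"{locus}-P1", "CAG-EGFP-P2"),
        "3'": ("CAG-EGFP-P3", f"{locus}-P4"),
    }


def seq_ids(locus: str) -> dict[str, tuple[int, int]]:
    """SEQ ID numbers for the two primer pairs at one locus."""
    p1, p4 = LOCUS_PRIMERS[locus]
    return {"5'": (p1, INSERT_PRIMERS["CAG-EGFP-P2"]),
            "3'": (INSERT_PRIMERS["CAG-EGFP-P3"], p4)}
```

Because the cassette primers are shared, adding a new locus only requires its two flanking genomic primers.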
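The efficiencies of CAG-EGFP integration at the different loci mediated by the different methods are shown in Figure 4. In K562 cells, at every locus tested, the accuracy of f-PAINT-mediated site-specific integration was comparable to that of HDR, while its site-specific integration efficiency was markedly higher. The genotyping and Sanger sequencing results are shown in Figures 5 and 6 and indicate that f-PAINT accurately mediates integration of the exogenous gene at specific genomic loci. These results indicate that the f-PAINT method has considerable potential for gene-therapy applications.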
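Example 4. Site-specific insertion of an exogenous gene (IRES-EGFP) into the 3' UTR of the human GAPDH locus using the h-PAINT system
To assess the effectiveness of h-PAINT-mediated site-specific integration, the following experiment was designed: in 293T cells, the h-PAINT system was used to knock the reporter gene IRES-EGFP into the 3' UTR of the GAPDH gene, with the f-PAINT method as a control, and the integration efficiencies of the two methods were compared. The sgRNA and pegRNA expression vectors and the donor vectors were constructed as described in Example 1. The h-PAINT (LHA) donor vector carries an 800 bp GAPDH left homology arm upstream of IRES-EGFP and the PE-spCas9/sgβ targeting recognition sequence downstream; the h-PAINT (RHA) donor vector carries the PE-spCas9/sgα targeting recognition sequence upstream of IRES-EGFP and an 800 bp GAPDH right homology arm downstream. The primer sequences used for vector construction and the homology-arm sequences of the donor vectors are the same as in Example 1 (see Table 1).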
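The principle of h-PAINT-mediated site-specific integration is shown schematically in Figure 7, and the integration of IRES-EGFP into the GAPDH 3' UTR by h-PAINT is shown in Figure 8a. For the h-PAINT (LHA) system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry and pUC19-U6-pegβ were transfected into 293T cells together with the h-PAINT (LHA) donor vector using Lipofectamine 3000 (Invitrogen). For the h-PAINT (RHA) system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry and pUC19-U6-pegα were transfected into 293T cells together with the h-PAINT (RHA) donor vector using Lipofectamine 3000. The f-PAINT system was implemented as described above. In the negative control groups for the f-PAINT and h-PAINT systems, the sgGAPDH plasmid was omitted. 24 hours after transfection, mCherry-positive cells were sorted by flow cytometry; the sorted cells were cultured for a further 5 days and analyzed for the proportion of EGFP-positive cells by flow cytometry. Genomic DNA was extracted from cells edited by the h-PAINT (LHA) and h-PAINT (RHA) systems and genotyped by PCR with primers GAPDH-P1-2 (SEQ ID NO:143)/GAPDH-P2 (SEQ ID NO:30) (5' junction of h-PAINT (LHA)), GAPDH-P3 (SEQ ID NO:31)/GAPDH-P4 (SEQ ID NO:32) (3' junction of h-PAINT (LHA)), GAPDH-P1 (SEQ ID NO:29)/GAPDH-P2 (SEQ ID NO:30) (5' junction of h-PAINT (RHA)) and GAPDH-P3 (SEQ ID NO:31)/GAPDH-P4-2 (SEQ ID NO:144) (3' junction of h-PAINT (RHA)); the amplicons were analyzed by Sanger sequencing.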
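The two h-PAINT variants are mirror images: the donor keeps one homology arm and replaces the other with a single PE-spCas9 target site, so each variant needs only the pegRNA for that site. The small consistency check below uses hypothetical names and informal labels; it encodes the pairings stated above (LHA with pegβ, RHA with pegα) and the negative control that omits sgGAPDH.

```python
# h-PAINT configurations from Example 4: each donor carries one homology arm and one
# PE-spCas9 target site, and is co-delivered with the pegRNA that matches that site.

H_PAINT = {
    "LHA": {"donor": ("GAPDH left arm (800 bp)", "IRES-EGFP", "sgβ site"),
            "pegRNA_plasmid": "pUC19-U6-pegβ"},
    "RHA": {"donor": ("sgα site", "IRES-EGFP", "GAPDH right arm (800 bp)"),
            "pegRNA_plasmid": "pUC19-U6-pegα"},
}
SHARED = ("pCAG-saCas9", "sgGAPDH", "pCAG-PE-spCas9-mCherry")


def site_matches_pegrna(variant: str) -> bool:
    """Check that the donor's target site (α or β) matches the co-delivered pegRNA."""
    cfg = H_PAINT[variant]
    site = next(e for e in cfg["donor"] if e.endswith("site"))
    greek = site.split()[0][-1]  # "sgβ site" -> "β"
    return cfg["pegRNA_plasmid"].endswith(greek)


def transfection_mix(variant: str, control: bool = False) -> list[str]:
    """Plasmids for one h-PAINT variant; the negative control drops sgGAPDH."""
    mix = list(SHARED) + [H_PAINT[variant]["pegRNA_plasmid"], f"h-PAINT ({variant}) donor"]
    return [p for p in mix if not (control and p == "sgGAPDH")]
```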
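The efficiencies of IRES-EGFP integration at the GAPDH locus mediated by the different methods are shown in Figure 8b: both h-PAINT (LHA) and h-PAINT (RHA) were more efficient than f-PAINT. The genotyping and sequencing results are shown in Figures 8c and 8d. Sequencing showed that in the h-PAINT system the junction on the long-homology-arm side is less prone to base mutations and therefore more precise.
Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that various modifications and changes to the details can be made in light of all the teachings disclosed, and that such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.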

Claims (64)

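  1. A system or kit, comprising the following four components:
    (1) a first Cas protein, or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, wherein the first Cas protein is capable of cleaving or breaking one nucleic acid strand of a first double-stranded target nucleic acid;
    (2) a template-dependent first DNA polymerase, or a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase;
    (3) a first gRNA, or a nucleic acid molecule C1 comprising a nucleotide sequence encoding the first gRNA, wherein the first gRNA is capable of binding to the first Cas protein to form a first functional complex, and the first functional complex is capable of breaking one nucleic acid strand of the first double-stranded target nucleic acid;
    (4) a first tag primer, or a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer, wherein the first tag primer comprises a first tag sequence and a first target-binding sequence, the first tag sequence being located upstream of, or 5' to, the first target-binding sequence; and, under conditions permitting nucleic acid hybridization or annealing, the first target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the first tag sequence does not bind to said nucleic acid strand and remains in a free, single-stranded state.
  2. The system or kit of claim 1, wherein the first Cas protein is selected from Cas proteins that cleave a single DNA strand, e.g., the single DNA strand cleaved is the DNA strand not targeted and bound by the gRNA;
    preferably, the first Cas protein is selected from mutants (e.g., spCas9(H840A), saCas9(R1226A)), homologs of mutants, or modified forms of mutants of a Cas9, Cas12a, cas12b, cas12c, cas12d, cas12e, cas12f, cas12g, cas12h, cas12i, cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 or Csf4 protein;
    preferably, the first Cas protein is capable of breaking one nucleic acid strand of the first double-stranded target nucleic acid and generating a nick; preferably, the first Cas protein is a mutant of a Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (S. pyogenes) (spCas9(H840A));
    preferably, the first Cas protein has the amino acid sequence shown in SEQ ID NO:3.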
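  3. The system or kit of claim 1 or 2, wherein the first DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases;
    preferably, the first DNA polymerase is an RNA-dependent DNA polymerase;
    preferably, the first DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV);
    preferably, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO:7.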
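  4. The system or kit of any one of claims 1-3, wherein the first Cas protein is linked to the first DNA polymerase;
    preferably, the first Cas protein is covalently linked to the first DNA polymerase, with or without a linker;
    preferably, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35;
    preferably, the first Cas protein is fused to the first DNA polymerase, with or without a peptide linker, to form a first fusion protein;
    preferably, the first Cas protein is linked or fused, optionally via a linker, to the N-terminus of the first DNA polymerase; or the first Cas protein is linked or fused, optionally via a linker, to the C-terminus of the first DNA polymerase;
    preferably, the first fusion protein has the amino acid sequence shown in SEQ ID NO:8.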
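  5. The system or kit of any one of claims 1-4, wherein the first gRNA comprises a first guide sequence and, under conditions permitting nucleic acid hybridization or annealing, the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of the first double-stranded target nucleic acid;
    preferably, the first guide sequence is at least 5 nt long, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, the first gRNA further comprises a first scaffold sequence that can be recognized and bound by the first Cas protein to form the first functional complex;
    preferably, the first scaffold sequence is at least 20 nt long, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, the first guide sequence is located upstream of, or 5' to, the first scaffold sequence;
    preferably, after the first guide sequence binds to one nucleic acid strand (the second strand) of the first double-stranded target nucleic acid, the first functional complex is capable of breaking the other nucleic acid strand (the first strand) of the first double-stranded target nucleic acid.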
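  6. The system or kit of any one of claims 1-5, wherein, under conditions permitting nucleic acid hybridization or annealing, the first target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand, the 3' end having been formed by the first functional complex breaking said nucleic acid strand;
    preferably, the first target-binding sequence is at least 5 nt long, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, the first tag sequence is at least 4 nt long, e.g., 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, after the first target-binding sequence hybridizes or anneals to the 3' end of the broken nucleic acid strand, the first DNA polymerase is capable of extending the 3' end of said nucleic acid strand using the first tag primer as a template; preferably, the extension forms a first flap;
    preferably, the first tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid;
    preferably, the first tag primer is a single-stranded ribonucleic acid and the first DNA polymerase is an RNA-dependent DNA polymerase; or the first tag primer is a single-stranded deoxyribonucleic acid and the first DNA polymerase is a DNA-dependent DNA polymerase;
    preferably, the nucleic acid strand bound by the first guide sequence differs from the nucleic acid strand bound by the first target-binding sequence; preferably, the strand bound by the first guide sequence is the opposite strand of the strand bound by the first target-binding sequence.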
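  7. The system or kit of any one of claims 1-6, wherein the first tag primer is linked to the first gRNA;
    preferably, the first tag primer is covalently linked to the first gRNA, with or without a linker;
    preferably, the first tag primer is linked, optionally via a linker, to the 3' end of the first gRNA;
    preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker);
    preferably, the first tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the first gRNA, with or without a ribonucleic acid linker, to form a first PegRNA.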
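  8. The system or kit of any one of claims 1-7, having one or more technical features selected from:
    (1) the nucleic acid molecule A1 is capable of expressing the first Cas protein in a cell;
    (2) the nucleic acid molecule B1 is capable of expressing the first DNA polymerase in a cell;
    (3) the nucleic acid molecule C1 is capable of transcribing the first gRNA in a cell;
    (4) the nucleic acid molecule D1 is capable of transcribing the first tag primer in a cell;
    preferably, the nucleic acid molecule A1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first Cas protein;
    preferably, the nucleic acid molecule B1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first DNA polymerase;
    preferably, the nucleic acid molecule C1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first gRNA;
    preferably, the nucleic acid molecule D1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first tag primer;
    preferably, the nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, the nucleic acid molecules A1 and B1 are capable of expressing in a cell the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase;
    preferably, the nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, the nucleic acid molecules C1 and D1 are capable of transcribing in a cell a first PegRNA comprising the first gRNA and the first tag primer;
    preferably, two, three or all four of the nucleic acid molecules A1, B1, C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector).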
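  9. The system or kit of any one of claims 1-8, wherein the system or kit comprises:
    (M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or (M1-2) the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins; and
    (M2) a first PegRNA comprising the first gRNA and the first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA.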
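  10. The system or kit of any one of claims 1-9, wherein the system or kit further comprises:
    (5) a second gRNA, or a nucleic acid molecule C2 comprising a nucleotide sequence encoding the second gRNA, wherein the second gRNA is capable of binding to a second Cas protein to form a second functional complex, and the second functional complex is capable of breaking one nucleic acid strand of a second double-stranded target nucleic acid;
    preferably, the second Cas protein is the same as or different from the first Cas protein; preferably, the second Cas protein is the same as the first Cas protein;
    preferably, the second gRNA comprises a second guide sequence and, under conditions permitting nucleic acid hybridization or annealing, the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of the second double-stranded target nucleic acid;
    preferably, after the second guide sequence binds to one strand (the first strand) of the second double-stranded target nucleic acid, the second functional complex breaks the other nucleic acid strand (the second strand) of the second double-stranded target nucleic acid; preferably, the second guide sequence differs from the first guide sequence;
    preferably, the second double-stranded target nucleic acid is the same as or different from the first double-stranded target nucleic acid;
    preferably, the second double-stranded target nucleic acid is the same as the first double-stranded target nucleic acid, and the second functional complex and the first functional complex break different nucleic acid strands of said double-stranded target nucleic acid at different positions;
    preferably, the second functional complex and the first functional complex break different nucleic acid strands of the same double-stranded target nucleic acid, and the strand bound by the first guide sequence differs from the strand bound by the second guide sequence; preferably, the strand bound by the first guide sequence is the opposite strand of the strand bound by the second guide sequence;
    preferably, the second and first double-stranded target nucleic acids are one and the same double-stranded target nucleic acid comprising a first strand and a second strand, the first functional complex being capable of breaking the first strand after the first guide sequence binds to the second strand, and the second functional complex breaking the second strand after the second guide sequence binds to the first strand; preferably, the second guide sequence is at least 5 nt long, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, the second gRNA further comprises a second scaffold sequence that can be recognized and bound by the second Cas protein to form the second functional complex;
    preferably, the second scaffold sequence is at least 20 nt long, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, the second scaffold sequence is the same as or different from the first scaffold sequence; preferably, the second scaffold sequence is the same as the first scaffold sequence;
    preferably, the second guide sequence is located upstream of, or 5' to, the second scaffold sequence;
    preferably, the nucleic acid molecule C2 is capable of transcribing the second gRNA in a cell;
    preferably, the nucleic acid molecule C2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second gRNA.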
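  11. The system or kit of claim 10, wherein the second Cas protein differs from the first Cas protein, and the system or kit further comprises:
    (6) the second Cas protein, or a nucleic acid molecule A2 comprising a nucleotide sequence encoding the second Cas protein, wherein the second Cas protein is capable of cleaving or breaking one nucleic acid strand of the second double-stranded target nucleic acid;
    preferably, the second Cas protein is capable of breaking one nucleic acid strand of the second double-stranded target nucleic acid and generating a nick;
    preferably, the second Cas protein is selected from Cas proteins that cleave a single DNA strand, e.g., the single DNA strand cleaved is the DNA strand not targeted and bound by the gRNA;
    preferably, the second Cas protein is selected from mutants (e.g., spCas9(H840A), saCas9(R1226A)), homologs of mutants, or modified forms of mutants of a Cas9, Cas12a, cas12b, cas12c, cas12d, cas12e, cas12f, cas12g, cas12h, cas12i, cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 or Csf4 protein;
    preferably, the second Cas protein is a mutant of a Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (S. pyogenes) (spCas9(H840A));
    preferably, the second Cas protein has the amino acid sequence shown in SEQ ID NO:3;
    preferably, the nucleic acid molecule A2 is capable of expressing the second Cas protein in a cell;
    preferably, the nucleic acid molecule A2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second Cas protein.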
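  12. The system or kit of any one of claims 1-11, wherein the system or kit further comprises:
    (7) a second tag primer, or a nucleic acid molecule D2 comprising a nucleotide sequence encoding the second tag primer, wherein the second tag primer comprises a second tag sequence and a second target-binding sequence, the second tag sequence being located upstream of, or 5' to, the second target-binding sequence; and, under conditions permitting nucleic acid hybridization or annealing, the second target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the second tag sequence does not bind to said nucleic acid strand and remains in a free, single-stranded state;
    preferably, under conditions permitting nucleic acid hybridization or annealing, the second target-binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand, the 3' end having been formed by the second functional complex breaking said nucleic acid strand;
    preferably, the second target-binding sequence is at least 5 nt long, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, the second target-binding sequence differs from the first target-binding sequence; preferably, the strand bound by the second target-binding sequence differs from the strand bound by the first target-binding sequence; preferably, the strand bound by the second target-binding sequence is the opposite strand of the strand bound by the first target-binding sequence;
    preferably, the second tag sequence is at least 4 nt long, e.g., 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    preferably, the second tag sequence is the same as or different from the first tag sequence; preferably, the second tag sequence differs from the first tag sequence;
    preferably, after the second target-binding sequence hybridizes or anneals to the 3' end of the broken nucleic acid strand, a second DNA polymerase is capable of extending the 3' end of said nucleic acid strand using the second tag primer as a template; preferably, the extension forms a second flap;
    preferably, the second DNA polymerase is the same as or different from the first DNA polymerase; preferably, the second DNA polymerase is the same as the first DNA polymerase;
    preferably, the second tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid;
    preferably, the second tag primer is a single-stranded ribonucleic acid and the second DNA polymerase is an RNA-dependent DNA polymerase; or the second tag primer is a single-stranded deoxyribonucleic acid and the second DNA polymerase is a DNA-dependent DNA polymerase;
    preferably, the strand bound by the second guide sequence differs from the strand bound by the second target-binding sequence; preferably, the strand bound by the second guide sequence is the opposite strand of the strand bound by the second target-binding sequence;
    preferably, the second guide sequence and the first target-binding sequence bind the same nucleic acid strand, and the binding position of the second guide sequence is located upstream of, or 5' to, the binding position of the first target-binding sequence;
    preferably, the first guide sequence and the second target-binding sequence bind the same nucleic acid strand, and the binding position of the first guide sequence is located upstream of, or 5' to, the binding position of the second target-binding sequence;
    preferably, the first flap and the second flap are contained on the same double-stranded target nucleic acid and are located on opposite nucleic acid strands;
    preferably, the nucleic acid molecule D2 is capable of transcribing the second tag primer in a cell;
    preferably, the nucleic acid molecule D2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second tag primer.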
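  13. The system or kit of claim 12, wherein the second DNA polymerase differs from the first DNA polymerase, and the system or kit further comprises:
    (8) the second DNA polymerase, or a nucleic acid molecule B2 comprising a nucleotide sequence encoding the second DNA polymerase;
    preferably, the second DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases;
    preferably, the second DNA polymerase is an RNA-dependent DNA polymerase;
    preferably, the second DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV);
    preferably, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO:7;
    preferably, the nucleic acid molecule B2 is capable of expressing the second DNA polymerase in a cell;
    preferably, the nucleic acid molecule B2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second DNA polymerase.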
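  14. The system or kit of claim 12 or 13, wherein the second tag primer is linked to the second gRNA;
    preferably, the second tag primer is covalently linked to the second gRNA, with or without a linker;
    preferably, the second tag primer is linked, optionally via a linker, to the 3' end of the second gRNA;
    preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker);
    preferably, the second tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the second gRNA, with or without a ribonucleic acid linker, to form a second PegRNA;
    preferably, the nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, the nucleic acid molecules C2 and D2 are capable of transcribing in a cell a second PegRNA comprising the second gRNA and the second tag primer;
    preferably, the system or kit comprises a second PegRNA comprising the second gRNA and the second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.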
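  15. The system or kit of claim 13 or 14, wherein the second Cas protein and the second DNA polymerase are separate or linked;
    preferably, the second Cas protein is covalently linked to the second DNA polymerase, with or without a linker;
    preferably, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35;
    preferably, the second Cas protein is fused to the second DNA polymerase, with or without a peptide linker, to form a second fusion protein;
    preferably, the second Cas protein is linked or fused, optionally via a linker, to the N-terminus of the second DNA polymerase; or the second Cas protein is linked or fused, optionally via a linker, to the C-terminus of the second DNA polymerase;
    preferably, the second fusion protein has the amino acid sequence shown in SEQ ID NO:8;
    preferably, the nucleic acid molecules A2 and B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, the nucleic acid molecules A2 and B2 are capable of expressing in a cell the second Cas protein and the second DNA polymerase as separate proteins, or a second fusion protein comprising the second Cas protein and the second DNA polymerase;
    preferably, the system or kit comprises a second fusion protein comprising the second Cas protein and the second DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the second fusion protein; or the second Cas protein and the second DNA polymerase as separate proteins, or a nucleic acid molecule capable of expressing the second Cas protein and the second DNA polymerase as separate proteins;
    preferably, the first and second Cas proteins are the same Cas protein and the first and second DNA polymerases are the same DNA polymerase, and the system or kit comprises:
    (M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or (M1-2) the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins;
    (M2) a first PegRNA comprising the first gRNA and the first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA;
    (M3) a second PegRNA comprising the second gRNA and the second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.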
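  16. The system or kit of any one of claims 1-15, wherein the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector);
    preferably, the nucleic acid vector further comprises a first PAM sequence recognized by the first Cas protein and/or a second PAM sequence recognized by the second Cas protein;
    preferably, the nucleic acid vector is double-stranded;
    preferably, the nucleic acid vector is a circular double-stranded vector;
    preferably, the nucleic acid vector comprises a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., the complement of the first guide sequence) and/or a second guide-binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., the complement of the second guide sequence); optionally, the nucleic acid vector further comprises a restriction site between the first guide-binding sequence and the second guide-binding sequence;
    preferably, the first guide-binding sequence and the second guide-binding sequence are located on opposite strands of the nucleic acid vector;
    preferably, the first functional complex is capable of breaking one nucleic acid strand (the first strand) of the nucleic acid vector via the first guide-binding sequence and the first PAM sequence; and/or the second functional complex is capable of breaking the other nucleic acid strand (the second strand) of the nucleic acid vector via the second guide-binding sequence and the second PAM sequence.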
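  17. The system or kit of claim 16, wherein the nucleic acid vector further comprises a nucleic acid sequence of interest;
    Preferably, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific genomic site;
    Preferably, the first PAM sequence and the second PAM sequence are located on either side of the nucleic acid sequence of interest;
    Preferably, the first guide-binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence;
    Preferably, the second guide-binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence;
    Preferably, the first functional complex and the second functional complex cleave the first strand and the second strand of the nucleic acid vector, respectively, the first and second strands each containing a nick produced by the cleavage; the double-stranded portion located between the 3' ends of the two nicks contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing, via the first target-binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the first guide-binding sequence;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing, via the second target-binding sequence, to the 3' end of the nucleic acid strand cleaved by the second functional complex to form a double-stranded structure, with the second tag sequence of the second tag primer remaining free; preferably, the strand to which the second target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the second guide-binding sequence;
    Preferably, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand to which the second target-binding sequence hybridizes or anneals.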
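The positional preferences of claim 17 (repeated in claim 47) can be written as simple interval-ordering checks. The sketch below is a hypothetical illustration and rests on several simplifications. Every feature is placed on one 0-based coordinate axis of the vector. For each guide-binding sequence, only the strand it lies on is recorded. Sequence content is ignored entirely. The `Feature` class, the helper names and the coordinates are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A feature on the donor vector, as a half-open interval [start, end)."""
    start: int
    end: int
    strand: str = "+"  # "+" or "-"; only compared for the guide-binding sequences


def lies_between(x: Feature, a: Feature, b: Feature) -> bool:
    """True if feature x sits entirely in the gap between features a and b."""
    left, right = (a, b) if a.end <= b.start else (b, a)
    return left.end <= x.start and x.end <= right.start


def check_donor_layout(insert: Feature, pam1: Feature, pam2: Feature,
                       gbs1: Feature, gbs2: Feature) -> dict:
    """Evaluate the ordering preferences of claim 17 for a hypothetical layout."""
    return {
        "PAMs flank the sequence of interest": lies_between(insert, pam1, pam2),
        "guide-binding 1 between insert and PAM 1": lies_between(gbs1, insert, pam1),
        "guide-binding 2 between insert and PAM 2": lies_between(gbs2, insert, pam2),
        "guide-binding sequences on opposite strands": gbs1.strand != gbs2.strand,
    }


# Hypothetical coordinates, left to right: PAM1 | gbs1 | insert | gbs2 | PAM2
layout = check_donor_layout(
    insert=Feature(100, 1100),
    pam1=Feature(60, 63),
    pam2=Feature(1140, 1143),
    gbs1=Feature(63, 83, "+"),
    gbs2=Feature(1120, 1140, "-"),
)
print(layout)
```

Real PAM and protospacer adjacency rules depend on the Cas protein chosen, and these checks do not model them.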
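  18. The system or kit of claim 16 or 17, wherein the nucleic acid vector further comprises a first target sequence; wherein, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target-binding sequence to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the first target sequence is located on the strand opposite the first guide-binding sequence; preferably, the first target sequence is located at the end of the cleaved first strand; preferably, after the first functional complex cleaves the first strand, the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (preferably forming a first flap);
    and/or,
    the nucleic acid vector further comprises a second target sequence; wherein, under conditions permitting nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing to the second target sequence via the second target-binding sequence to form a double-stranded structure, with the second tag sequence of the second tag primer remaining free; preferably, the second target sequence is located on the strand opposite the second guide-binding sequence; preferably, the second target sequence is located at the end of the cleaved second strand; preferably, after the second functional complex cleaves the second strand, the 3' end of the nucleic acid strand containing the second target sequence can be extended using the second tag primer annealed to the second target sequence as a template (preferably forming a second flap);
    Preferably, the nucleic acid strand containing the first target sequence is the opposite strand of the nucleic acid strand containing the second target sequence;
    Preferably, the nucleic acid vector further comprises a restriction enzyme site between the first target sequence and the second target sequence;
    Preferably, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.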
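  19. The system or kit of any one of claims 1-18, wherein the system or kit further comprises:
    (9) a third nucleic acid editing system for introducing a double-strand break into a third double-stranded target nucleic acid;
    Preferably, the third nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.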
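  20. The system or kit of claim 19, wherein the third nucleic acid editing system is capable of cleaving both strands of the third double-stranded target nucleic acid to form cleaved nucleotide fragments a1 and a2;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a1;
    Preferably, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a1 at the end formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid;
    Preferably, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of the cleaved nucleotide fragment a1, said 3' end or 3' portion being formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a2;
    Preferably, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a2 at the end formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid;
    Preferably, the complement of the second tag sequence, or the second flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of the cleaved nucleotide fragment a2, said 3' end or 3' portion being formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid.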
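Claim 20 requires the first flap, or the complement of the first tag sequence, to pair with the 3' end of one strand of fragment a1. The minimal sketch below makes several assumptions:

* Pairing is exact Watson-Crick complementarity with no mismatches, whereas real hybridization tolerates some mismatches.
* DNA strings are written 5'->3'.
* The sequences are hypothetical.

The function reports how many 3'-terminal nucleotides of the a1 strand stay unpaired. A value of 0 means pairing at the very end created by the break. A positive value means pairing at a 3' portion set back from the break, as in the spacer-region embodiment recited in claim 56.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def unpaired_3prime_length(flap: str, strand: str):
    """Return how many 3'-terminal bases of `strand` stay unpaired when `flap`
    anneals at the 3'-most perfectly complementary site, or None if none exists.

    Because pairing is antiparallel, the flap pairs with the region of `strand`
    whose sequence equals the reverse complement of the flap.
    """
    site = reverse_complement(flap)
    pos = strand.upper().rfind(site)
    if pos == -1:
        return None
    return len(strand) - (pos + len(site))


flap = "AATTGCAT"                      # hypothetical first flap
a1_end = "CCGGTTAAATGCAATT"            # a1 strand ending exactly at the break
a1_set_back = "CCGGTTAAATGCAATTGAC"    # 3 extra bases between the site and the break
print(unpaired_3prime_length(flap, a1_end))       # 0
print(unpaired_3prime_length(flap, a1_set_back))  # 3
```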
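  21. The system or kit of claim 19 or 20, wherein the third nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system;
    Preferably, the third nucleic acid editing system comprises: (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA; wherein the third gRNA is capable of binding to the third Cas protein to form a third functional complex; the third functional complex is capable of cleaving both strands of the third double-stranded target nucleic acid to form cleaved nucleotide fragments a1 and a2;
    Preferably, the third Cas protein is selected from Cas proteins that cleave both strands of DNA, e.g., the Cas9 protein;
    Preferably, the third gRNA has the sequence shown in any one of SEQ ID NO: 11, 38, 54, 67, 80, 93, 106, 119 or 132.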
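  22. The system or kit of any one of claims 1-20, wherein the system or kit further comprises:
    (10) a fourth nucleic acid editing system for introducing a double-strand break into a fourth double-stranded target nucleic acid;
    Preferably, the fourth nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system;
    Preferably, the third nucleic acid editing system and the fourth nucleic acid editing system are based on the same site-specific nuclease technology.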
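  23. The system or kit of claim 22, wherein the fourth double-stranded target nucleic acid is identical to the third double-stranded target nucleic acid, and the third and fourth nucleic acid editing systems cleave this same double-stranded target nucleic acid at different positions, forming cleaved nucleotide fragments a1, a2 and a3; wherein, before cleavage, nucleotide fragments a1, a2 and a3 are arranged in that order in the same double-stranded target nucleic acid (i.e., nucleotide fragment a1 is connected to nucleotide fragment a3 through nucleotide fragment a2); preferably, the third and fourth nucleic acid editing systems respectively cause the separation of nucleotide fragments a1 and a2 and the separation of nucleotide fragments a2 and a3;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a1;
    Preferably, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a1 at the end formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid;
    Preferably, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of the cleaved nucleotide fragment a1, said 3' end or 3' portion being formed by the third nucleic acid editing system cleaving the third double-stranded target nucleic acid;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a3;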
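    Preferably, the second tag sequence or its complement, or the second flap, is capable of hybridizing or annealing to the cleaved nucleotide fragment a3 at the end formed by the fourth nucleic acid editing system cleaving the fourth double-stranded target nucleic acid;
    Preferably, the complement of the second tag sequence, or the second flap, is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of the cleaved nucleotide fragment a3, said 3' end or 3' portion being formed by the fourth nucleic acid editing system cleaving the fourth double-stranded target nucleic acid.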
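  24. The system or kit of claim 22, wherein the fourth nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system;
    Preferably, the fourth nucleic acid editing system comprises: (i) a fourth Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the fourth Cas protein, and (ii) a fourth gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the fourth gRNA; wherein the fourth gRNA is capable of binding to the fourth Cas protein to form a fourth functional complex; the fourth functional complex is capable of cleaving both strands of the fourth double-stranded target nucleic acid to form cleaved target nucleic acid fragments b1 and b2;
    Preferably, the fourth Cas protein is selected from Cas proteins that cleave both strands of DNA, e.g., the Cas9 protein.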
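  25. The system or kit of claim 24, wherein the third nucleic acid editing system and the fourth nucleic acid editing system are CRISPR (clustered regularly interspaced short palindromic repeats)/Cas systems;
    Preferably, the third nucleic acid editing system is as defined in claim 21, and the fourth nucleic acid editing system is as defined in claim 23.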
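  26. The system or kit of any one of claims 1-25, wherein the kit further comprises additional systems or components;
    Preferably, the additional components include one or more selected from the following:
    (1) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional gRNAs or nucleic acid molecules containing a nucleotide sequence encoding the additional gRNAs, wherein the additional gRNAs are capable of binding to a Cas protein to form a functional complex; preferably, the functional complex is capable of cleaving both strands or one strand of a double-stranded target nucleic acid;
    (2) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional Cas proteins or nucleic acid molecules containing a nucleotide sequence encoding the additional Cas proteins; preferably, the Cas proteins are capable of cutting or cleaving one or both strands of a double-stranded target nucleic acid;
    (3) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional tag primers or nucleic acid molecules containing a nucleotide sequence encoding the additional tag primers, wherein the additional tag primers contain a tag sequence and a target-binding sequence, the tag sequence being located upstream or 5' of the target-binding sequence; preferably, under conditions permitting nucleic acid hybridization or annealing, the target-binding sequence is capable of hybridizing or annealing to the 3' end of the cleaved nucleic acid strand to form a double-stranded structure, and the tag sequence does not bind the target nucleic acid fragment and remains in a free single-stranded state;
    (4) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional DNA polymerases or nucleic acid molecules containing a nucleotide sequence encoding the additional DNA polymerases; preferably, the additional DNA polymerases are selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases; preferably, the additional DNA polymerase is an RNA-dependent DNA polymerase, e.g., a reverse transcriptase;
    Preferably, the additional systems include: one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) nucleic acid editing systems for introducing double-strand breaks into double-stranded target nucleic acids;
    Preferably, the nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.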
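  27. A fusion protein comprising a Cas protein and a template-dependent DNA polymerase, wherein the Cas protein is capable of cleaving one strand of a target nucleic acid;
    Preferably, the Cas protein is capable of cleaving one strand of a target nucleic acid and producing a nick;
    Preferably, the Cas protein is selected from Cas proteins that cleave a single strand of DNA;
    Preferably, the Cas protein is selected from a mutant (e.g., spCas9(H840A), saCas9(R1226A)), a homolog of a mutant, or a modified form of a mutant of any of the following: Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 and Csf4 proteins;
    Preferably, the Cas protein is a mutant of the Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A));
    Preferably, the Cas protein has the amino acid sequence shown in SEQ ID NO:3;
    Preferably, the DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases;
    Preferably, the DNA polymerase is an RNA-dependent DNA polymerase;
    Preferably, the DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV);
    Preferably, the DNA polymerase has the amino acid sequence shown in SEQ ID NO:7;
    Preferably, the Cas protein is covalently linked to the DNA polymerase, with or without a linker;
    Preferably, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35;
    Preferably, the Cas protein is linked or fused, optionally via a linker, to the N-terminus of the DNA polymerase; or, the Cas protein is linked or fused, optionally via a linker, to the C-terminus of the DNA polymerase;
    Preferably, the fusion protein has the amino acid sequence shown in SEQ ID NO:8.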
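The two orientations recited at the end of claim 27 differ only in the order of the domains along the polypeptide, read from N- to C-terminus. The sketch below shows this with short placeholder strings. These strings are not the sequences of SEQ ID NO:3, 7, 35 or 8, which are not reproduced in this section. `build_fusion` is a hypothetical helper name.

```python
def build_fusion(cas: str, polymerase: str, linker: str = "", cas_at: str = "N") -> str:
    """Return a fusion protein sequence written N-terminus -> C-terminus.

    cas_at="N": Cas fused to the polymerase N-terminus -> Cas-linker-polymerase.
    cas_at="C": Cas fused to the polymerase C-terminus -> polymerase-linker-Cas.
    An empty linker models direct fusion without a linker.
    """
    if cas_at == "N":
        return cas + linker + polymerase
    if cas_at == "C":
        return polymerase + linker + cas
    raise ValueError("cas_at must be 'N' or 'C'")


CAS = "MDKKYS"      # placeholder, not SEQ ID NO:3
POL = "TLNIEDEH"    # placeholder, not SEQ ID NO:7
LINKER = "SGGS"     # placeholder, not SEQ ID NO:35
print(build_fusion(CAS, POL, LINKER, "N"))  # MDKKYSSGGSTLNIEDEH
print(build_fusion(CAS, POL, LINKER, "C"))  # TLNIEDEHSGGSMDKKYS
```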
  28. A nucleic acid molecule comprising a polynucleotide encoding the fusion protein of claim 27.
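  29. A vector comprising the nucleic acid molecule of claim 28;
    Preferably, the vector is an expression vector;
    Preferably, the vector is a eukaryotic expression vector.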
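  30. A host cell comprising the nucleic acid molecule of claim 28 or the vector of claim 29;
    Preferably, the host cell is a prokaryotic cell, e.g., an E. coli cell; or the host cell is a eukaryotic cell, e.g., a yeast cell, a fungal cell, a plant cell, or an animal cell;
    Preferably, the host cell is a mammalian cell, e.g., a human cell.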
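  31. A method for preparing the fusion protein of claim 27, comprising: (1) culturing the host cell of claim 30 under conditions permitting protein expression; and (2) isolating the fusion protein expressed by the host cell.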
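  32. A complex comprising a first Cas protein and a template-dependent first DNA polymerase, wherein the first Cas protein is capable of cleaving one strand of a double-stranded target nucleic acid, and the first Cas protein is complexed with the first DNA polymerase covalently or non-covalently;
    Preferably, the first Cas protein is capable of cleaving one strand of a double-stranded target nucleic acid and producing a nick;
    Preferably, the first Cas protein is selected from Cas proteins that cleave a single strand of DNA;
    Preferably, the first Cas protein is selected from a mutant (e.g., spCas9(H840A), saCas9(R1226A)), a homolog of a mutant, or a modified form of a mutant of any of the following: Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas13a, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 and Csf4 proteins;
    Preferably, the first Cas protein is a mutant of the Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A));
    Preferably, the first Cas protein has the amino acid sequence shown in SEQ ID NO:3;
    Preferably, the first DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases;
    Preferably, the first DNA polymerase is an RNA-dependent DNA polymerase;
    Preferably, the first DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV);
    Preferably, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO:7;
    Preferably, the first Cas protein is covalently linked to the first DNA polymerase, with or without a linker;
    Preferably, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35;
    Preferably, the first Cas protein is fused to the first DNA polymerase, with or without a peptide linker, to form a first fusion protein;
    Preferably, the first Cas protein is linked or fused, optionally via a linker, to the N-terminus of the first DNA polymerase; or, the first Cas protein is linked or fused, optionally via a linker, to the C-terminus of the first DNA polymerase;
    Preferably, the first fusion protein has the amino acid sequence shown in SEQ ID NO:8.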
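  33. The complex of claim 32, wherein the complex further comprises a first gRNA;
    Preferably, the first gRNA is capable of binding to the first Cas protein to form a first functional unit; the first functional unit is capable of binding one strand (the second strand) of a double-stranded target nucleic acid and cleaving the other strand (the first strand) of the double-stranded target nucleic acid;
    Preferably, the first gRNA contains a first guide sequence, and, under conditions permitting nucleic acid hybridization or annealing, the first guide sequence is capable of hybridizing or annealing to one strand of the double-stranded target nucleic acid;
    Preferably, the first guide sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the first gRNA further contains a first scaffold sequence that can be recognized and bound by the first Cas protein, thereby forming the first functional unit;
    Preferably, the first scaffold sequence is at least 20 nt in length, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the first guide sequence is located upstream or 5' of the first scaffold sequence;
    Preferably, after the first guide sequence binds to the double-stranded target nucleic acid, the complex or the first functional unit is capable of cleaving one strand of the double-stranded target nucleic acid.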
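  34. The complex of claim 32, wherein the complex further comprises a double-stranded target nucleic acid;
    Preferably, the double-stranded target nucleic acid contains a first PAM sequence recognized by the first Cas protein and a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence, whereby the first functional unit binds the double-stranded target nucleic acid via the first guide-binding sequence and the first PAM sequence.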
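  35. The complex of claim 34, wherein the complex further comprises a first tag primer hybridized or annealed to the double-stranded target nucleic acid; wherein the first tag primer contains a first target-binding sequence capable of hybridizing or annealing to the double-stranded target nucleic acid;
    Preferably, the tag primer contains a first tag sequence and the first target-binding sequence, the first tag sequence being located upstream or 5' of the first target-binding sequence; and, under conditions permitting nucleic acid hybridization or annealing, the first target-binding sequence is capable of hybridizing or annealing to the double-stranded target nucleic acid; preferably, the first target-binding sequence is capable of hybridizing or annealing to the 3' end of the strand of the double-stranded target nucleic acid cleaved by the first functional unit, forming a double-stranded structure; preferably, said 3' end is formed by the first functional unit cleaving one strand of the double-stranded target nucleic acid; preferably, the first tag sequence does not bind the cleaved strand and remains in a free single-stranded state;
    Preferably, the first target-binding sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the first tag sequence is at least 4 nt in length, e.g., 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the first tag primer binds to the cleaved strand via the first target-binding sequence; preferably, the first DNA polymerase binds to the cleaved strand and the first tag primer;
    Preferably, the first tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid;
    Preferably, the first tag primer is a single-stranded ribonucleic acid and the first DNA polymerase is an RNA-dependent DNA polymerase; or, the first tag primer is a single-stranded deoxyribonucleic acid and the first DNA polymerase is a DNA-dependent DNA polymerase;
    Preferably, the cleaved strand is extended by the first DNA polymerase using the first tag primer as a template, forming a first flap;
    Preferably, the strand bound by the first gRNA and the strand bound by the first tag primer are different; preferably, the strand bound by the first gRNA is the opposite strand of the strand bound by the first tag primer.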
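The extension step of claim 35 can be modelled at the sequence level. First, the target-binding sequence pairs with the 3' end of the nicked strand. The polymerase then copies the tag sequence, so the new first flap is the reverse complement of the tag. The minimal sketch below makes these assumptions:

* The tag primer is RNA and the polymerase is RNA-dependent, which is one of the recited combinations.
* Complementarity is exact.
* Strings are written 5'->3'.
* The sequences are hypothetical.

`extend_nicked_strand` is an illustrative name and is not part of the claims.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def to_dna(seq: str) -> str:
    """Spell an RNA sequence as DNA (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(dna.upper()))


def extend_nicked_strand(nicked_strand: str, tag: str, target_binding: str) -> tuple:
    """Return (flap, extended_strand) for a tag primer 5'-tag-target_binding-3'.

    nicked_strand: the strand upstream of the nick, 5'->3', ending at the new 3'-OH.
    """
    # The target-binding sequence must pair (antiparallel) with the 3' end of the nicked strand.
    if not nicked_strand.upper().endswith(reverse_complement(to_dna(target_binding))):
        raise ValueError("target-binding sequence does not anneal to the 3' end")
    # Extension reads the tag 3'->5', so the DNA flap is the tag's reverse complement.
    flap = reverse_complement(to_dna(tag))
    return flap, nicked_strand.upper() + flap


flap, extended = extend_nicked_strand(
    nicked_strand="GGATCCAAGCTTGACC",  # hypothetical nicked strand
    tag="AUGCAAUU",                    # hypothetical RNA tag sequence
    target_binding="GGUCAAGC",         # pairs with the last 8 nt, GCTTGACC
)
print(flap)      # AATTGCAT
print(extended)  # GGATCCAAGCTTGACCAATTGCAT
```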
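  36. The complex of claim 35, wherein the first tag primer is linked to the first gRNA;
    Preferably, the first tag primer is covalently linked to the first gRNA, with or without a linker;
    Preferably, the first tag primer is linked, optionally via a linker, to the 3' end of the first gRNA;
    Preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker);
    Preferably, the first tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the first gRNA, with or without a ribonucleic acid linker, to form a first PegRNA.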
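Claims 33, 35 and 36 together fix the 5' to 3' order of the parts of the first PegRNA: guide sequence, scaffold, optional RNA linker, tag sequence, target-binding sequence. They also give a minimum length for each part. The sketch below assembles such a molecule from hypothetical RNA strings and checks those minimums. The scaffold string is an arbitrary 20-nt placeholder, not a functional Cas9 scaffold. `assemble_pegrna` is a hypothetical helper.

```python
# Minimum lengths recited in claims 33 and 35 ("at least" values, in nt).
MIN_LENGTH = {"guide": 5, "scaffold": 20, "tag": 4, "target_binding": 5}
RNA_BASES = set("ACGU")


def assemble_pegrna(guide: str, scaffold: str, tag: str, target_binding: str,
                    linker: str = "") -> str:
    """Return a PegRNA written 5'->3': guide + scaffold + linker + tag + target-binding.

    The gRNA (guide upstream of scaffold) comes first; the tag primer, whose tag
    lies 5' of its target-binding sequence, is joined to the gRNA 3' end,
    optionally via an RNA linker (an empty string means no linker).
    """
    parts = {"guide": guide, "scaffold": scaffold, "tag": tag,
             "target_binding": target_binding, "linker": linker}
    for name, seq in parts.items():
        if not set(seq.upper()) <= RNA_BASES:
            raise ValueError(f"{name} is not an RNA sequence")
        if len(seq) < MIN_LENGTH.get(name, 0):
            raise ValueError(f"{name} is shorter than {MIN_LENGTH[name]} nt")
    return (guide + scaffold + linker + tag + target_binding).upper()


pegrna = assemble_pegrna(
    guide="GACUACGAUCGAUCGAUCGA",      # hypothetical 20-nt guide
    scaffold="GUUUUAGAGCUAGAAAUAGC",   # placeholder only, not a complete scaffold
    tag="AUGCAAUU",                    # hypothetical tag sequence
    target_binding="GGUCAAGC",         # hypothetical target-binding sequence
)
print(len(pegrna))  # 56
```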
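  37. The complex of claim 35 or 36, wherein the complex further comprises a second Cas protein and a second gRNA, wherein the second Cas protein is capable of cleaving one strand of a double-stranded target nucleic acid, and the second gRNA is capable of binding to the second Cas protein to form a second functional unit; the second functional unit is capable of binding the double-stranded target nucleic acid and cleaving one strand thereof;
    Preferably, the second Cas protein is the same as or different from the first Cas protein; preferably, the second Cas protein is the same as the first Cas protein;
    Preferably, the second Cas protein is capable of cleaving one strand of a double-stranded target nucleic acid and producing a nick;
    Preferably, the second Cas protein is selected from Cas proteins that cleave a single strand of DNA;
    Preferably, the second Cas protein is selected from a mutant (e.g., spCas9(H840A), saCas9(R1226A)), a homolog of a mutant, or a modified form of a mutant of any of the following: Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas14, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3 and Csf4 proteins;
    Preferably, the second Cas protein is a mutant of the Cas9 protein, e.g., a mutant of the Streptococcus pyogenes (S. pyogenes) Cas9 protein (spCas9(H840A));
    Preferably, the second Cas protein has the amino acid sequence shown in SEQ ID NO:3;
    Preferably, the second gRNA contains a second guide sequence, and, under conditions permitting nucleic acid hybridization or annealing, the second guide sequence is capable of hybridizing or annealing to one strand of the double-stranded target nucleic acid;
    Preferably, the second guide sequence differs from the first guide sequence; preferably, the strand bound by the first guide sequence and the strand bound by the second guide sequence are different; preferably, the strand bound by the first guide sequence is the opposite strand of the strand bound by the second guide sequence;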
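    Preferably, the second functional unit and the first functional unit bind the same double-stranded target nucleic acid, which comprises a first strand and a second strand; after the first guide sequence binds to the first strand, the first functional unit is capable of cleaving the first strand, and after the second guide sequence binds to the second strand, the second functional unit cleaves the second strand; preferably, the second functional unit and the first functional unit produce breaks at different positions on opposite strands;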
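    Preferably, the second guide sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the second gRNA further contains a second scaffold sequence that can be recognized and bound by the second Cas protein, thereby forming the second functional unit;
    Preferably, the second scaffold sequence is the same as or different from the first scaffold sequence; preferably, the second scaffold sequence is the same as the first scaffold sequence;
    Preferably, the second scaffold sequence is at least 20 nt in length, e.g., 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the second guide sequence is located upstream or 5' of the second scaffold sequence;
    Preferably, the double-stranded target nucleic acid contains a second PAM sequence recognized by the second Cas protein and a second guide-binding sequence capable of hybridizing or annealing to the second guide sequence, whereby the second functional unit binds the double-stranded target nucleic acid via the second guide-binding sequence and the second PAM sequence.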
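  38. The complex of claim 37, wherein the complex further comprises a template-dependent second DNA polymerase, the second DNA polymerase being complexed with the second Cas protein covalently or non-covalently;
    Preferably, the second DNA polymerase is selected from DNA-dependent DNA polymerases and RNA-dependent DNA polymerases;
    Preferably, the second DNA polymerase is an RNA-dependent DNA polymerase;
    Preferably, the second DNA polymerase is a reverse transcriptase, e.g., a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV);
    Preferably, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO:7;
    Preferably, the second DNA polymerase is the same as or different from the first DNA polymerase; preferably, the second DNA polymerase is the same as the first DNA polymerase;
    Preferably, the second Cas protein is covalently linked to the second DNA polymerase, with or without a linker;
    Preferably, the linker is a peptide linker, e.g., a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO:35;
    Preferably, the second Cas protein is fused to the second DNA polymerase, with or without a peptide linker, to form a second fusion protein;
    Preferably, the second Cas protein is linked or fused, optionally via a linker, to the N-terminus of the second DNA polymerase; or, the second Cas protein is linked or fused, optionally via a linker, to the C-terminus of the second DNA polymerase;
    Preferably, the second fusion protein has the amino acid sequence shown in SEQ ID NO:8.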
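  39. The complex of claim 38, wherein the complex further comprises a second tag primer hybridized or annealed to the double-stranded target nucleic acid; wherein the second tag primer contains a second target-binding sequence capable of hybridizing or annealing to the double-stranded target nucleic acid;
    Preferably, the tag primer contains a second tag sequence and the second target-binding sequence, the second tag sequence being located upstream or 5' of the second target-binding sequence; and, under conditions permitting nucleic acid hybridization or annealing, the second target-binding sequence is capable of hybridizing or annealing to the double-stranded target nucleic acid; preferably, the second target-binding sequence is capable of hybridizing or annealing to the 3' end of the strand of the double-stranded target nucleic acid cleaved by the second functional unit, forming a double-stranded structure; preferably, said 3' end is formed by the second functional unit cleaving one strand of the double-stranded target nucleic acid; preferably, the second tag sequence does not bind the cleaved strand and remains in a free single-stranded state;
    Preferably, the second target-binding sequence is at least 5 nt in length, e.g., 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the second target-binding sequence differs from the first target-binding sequence; preferably, the strand bound by the second target-binding sequence and the strand bound by the first target-binding sequence are different; preferably, the strand bound by the second target-binding sequence is the opposite strand of the strand bound by the first target-binding sequence;
    Preferably, the second tag sequence is at least 4 nt in length, e.g., 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer;
    Preferably, the second tag sequence is the same as or different from the first tag sequence; preferably, the second tag sequence differs from the first tag sequence;
    Preferably, the second tag primer binds to the cleaved strand via the second target-binding sequence; preferably, the second DNA polymerase binds to the cleaved strand and the second tag primer;
    Preferably, the second tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid;
    Preferably, the second tag primer is a single-stranded ribonucleic acid and the second DNA polymerase is an RNA-dependent DNA polymerase; or, the second tag primer is a single-stranded deoxyribonucleic acid and the second DNA polymerase is a DNA-dependent DNA polymerase;
    Preferably, the cleaved target nucleic acid fragment is extended by the second DNA polymerase using the second tag primer as a template, forming a second flap;
    Preferably, the strand bound by the second gRNA and the strand bound by the second tag primer are different; preferably, the strand bound by the second gRNA is the opposite strand of the strand bound by the second tag primer.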
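  40. The complex of claim 39, wherein the second tag primer is linked to the second gRNA;
    Preferably, the second tag primer is covalently linked to the second gRNA, with or without a linker;
    Preferably, the second tag primer is linked, optionally via a linker, to the 3' end of the second gRNA;
    Preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker);
    Preferably, the second tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the second gRNA, with or without a ribonucleic acid linker, to form a second PegRNA.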
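  41. The complex of claim 40, wherein the first and second functional units bind the double-stranded target nucleic acid in a predetermined positional relationship;
    Preferably, the second guide sequence and the first target-binding sequence bind the same nucleic acid strand; and/or, the first guide sequence and the second target-binding sequence bind the same nucleic acid strand;
    Preferably, the binding position of the second guide sequence is located upstream or 5' of the binding position of the first target-binding sequence; and/or, the binding position of the first guide sequence is located upstream or 5' of the binding position of the second target-binding sequence;
    Preferably, the binding position of the second guide sequence is located downstream or 3' of the binding position of the first target-binding sequence; and/or, the binding position of the first guide sequence is located downstream or 3' of the binding position of the second target-binding sequence;
    Preferably, the double-stranded target nucleic acid is selected from genomic DNA and nucleic acid vector DNA.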
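  42. A nucleic acid vector (e.g., a donor nucleic acid vector) comprising a first PAM sequence recognized by the first Cas protein of any one of claims 1-9;
    Preferably, the nucleic acid vector further comprises a donor homology arm;
    Preferably, the nucleic acid vector is double-stranded;
    Preferably, the nucleic acid vector is a circular double-stranded vector;
    Preferably, the nucleic acid vector comprises a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., the complement of the first guide sequence);
    Preferably, the first functional complex is capable of cleaving one strand of the nucleic acid vector via the first guide-binding sequence and the first PAM sequence.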
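  43. The nucleic acid vector of claim 42, wherein the nucleic acid vector further comprises a nucleic acid sequence of interest;
    Preferably, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific genomic site;
    Preferably, the first PAM sequence and the donor homology arm are located on either side of the nucleic acid sequence of interest;
    Preferably, the first guide-binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence;
    Preferably, the first functional complex cleaves the first strand of the nucleic acid vector, the first strand containing a nick produced by the cleavage; the double-stranded portion located between the 3' end of the nick and the donor homology arm contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing, via the first target-binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the first guide-binding sequence.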
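  44. The nucleic acid vector of claim 42 or 43, wherein the nucleic acid vector further comprises a first target sequence; wherein, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target-binding sequence to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the first target sequence is located on the strand opposite the first guide-binding sequence; preferably, the first target sequence is located at the end of the cleaved first strand; preferably, after the first functional complex cleaves the first strand, the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (preferably forming a first flap);
    Preferably, the nucleic acid vector further comprises a restriction enzyme site between the first target sequence and the donor homology arm;
    Preferably, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arm.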
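  45. A kit comprising the nucleic acid vector of any one of claims 42-44 and one or more components of the system or kit of any one of claims 1-9 (e.g., the first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein; the template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase; the first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; the first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer);
    Preferably, the kit comprises the following 4 components:
    (a) a first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein;
    (b) a template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
    (c) a first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; and
    (d) a first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer;
    Preferably, the 4 components are contained in one or more (e.g., 2, 3, or 4) vectors;
    Preferably, the kit comprises the following vectors:
    (a) the nucleic acid vector of any one of claims 42-44;
    (b) a first vector comprising the nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein and the nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
    (c) a second vector comprising the nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA and the nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer;
    Optionally, the kit further comprises one or more components of the third nucleic acid editing system described in the system or kit of any one of claims 19-21 (e.g., (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).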
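  46. A nucleic acid vector (e.g., a donor nucleic acid vector) comprising a first PAM sequence recognized by the first Cas protein of any one of claims 1-9;
    Preferably, the nucleic acid vector further comprises a second PAM sequence recognized by the second Cas protein of any one of claims 10-18;
    Preferably, the nucleic acid vector is double-stranded;
    Preferably, the nucleic acid vector is a circular double-stranded vector;
    Preferably, the nucleic acid vector comprises a first guide-binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., the complement of the first guide sequence), and/or a second guide-binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., the complement of the second guide sequence); optionally, the nucleic acid vector further comprises a restriction enzyme site between the first guide-binding sequence and the second guide-binding sequence;
    Preferably, the first guide-binding sequence and the second guide-binding sequence are located on opposite strands of the nucleic acid vector;
    Preferably, the first functional complex is capable of cleaving one strand of the nucleic acid vector (the first strand) via the first guide-binding sequence and the first PAM sequence; and/or, the second functional complex is capable of cleaving the other strand of the nucleic acid vector (the second strand) via the second guide-binding sequence and the second PAM sequence.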
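  47. The nucleic acid vector of claim 46, wherein the nucleic acid vector further comprises a nucleic acid sequence of interest;
    Preferably, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific genomic site;
    Preferably, the first PAM sequence and the second PAM sequence are located on either side of the nucleic acid sequence of interest;
    Preferably, the first guide-binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence;
    Preferably, the second guide-binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence;
    Preferably, the first functional complex and the second functional complex cleave the first strand and the second strand of the nucleic acid vector, respectively, the first and second strands each containing a nick produced by the cleavage; the double-stranded portion located between the 3' ends of the two nicks contains the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment containing the nucleic acid sequence of interest;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing, via the first target-binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the first guide-binding sequence;
    Preferably, under conditions permitting nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing, via the second target-binding sequence, to the 3' end of the nucleic acid strand cleaved by the second functional complex to form a double-stranded structure, with the second tag sequence of the second tag primer remaining free; preferably, the strand to which the second target-binding sequence hybridizes or anneals is the opposite strand of the strand containing the second guide-binding sequence;
    Preferably, the strand to which the first target-binding sequence hybridizes or anneals is the opposite strand of the strand to which the second target-binding sequence hybridizes or anneals.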
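  48. The nucleic acid vector of claim 46 or 47, wherein the nucleic acid vector further comprises a first target sequence; wherein, under conditions permitting nucleic acid hybridization or annealing, the first tag primer is capable of hybridizing or annealing to the first target sequence via the first target-binding sequence to form a double-stranded structure, with the first tag sequence of the first tag primer remaining free; preferably, the first target sequence is located on the strand opposite the first guide-binding sequence; preferably, the first target sequence is located at the end of the cleaved first strand; preferably, after the first functional complex cleaves the first strand, the 3' end of the nucleic acid strand containing the first target sequence can be extended using the first tag primer annealed to the first target sequence as a template (preferably forming a first flap);
    and/or,
    the nucleic acid vector further comprises a second target sequence; wherein, under conditions permitting nucleic acid hybridization or annealing, the second tag primer is capable of hybridizing or annealing to the second target sequence via the second target-binding sequence to form a double-stranded structure, with the second tag sequence of the second tag primer remaining free; preferably, the second target sequence is located on the strand opposite the second guide-binding sequence; preferably, the second target sequence is located at the end of the cleaved second strand; preferably, after the second functional complex cleaves the second strand, the 3' end of the nucleic acid strand containing the second target sequence can be extended using the second tag primer annealed to the second target sequence as a template (preferably forming a second flap);
    Preferably, the nucleic acid strand containing the first target sequence is the opposite strand of the nucleic acid strand containing the second target sequence;
    Preferably, the nucleic acid vector further comprises a restriction enzyme site between the first target sequence and the second target sequence;
    Preferably, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.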
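  49. A kit comprising the nucleic acid vector of any one of claims 46-48; one or more components of the system or kit of any one of claims 1-9 (e.g., the first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein; the template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase; the first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; the first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer); and one or more components of the system or kit of any one of claims 10-18 (e.g., the second gRNA or a nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA; the second Cas protein or a nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein; the second tag primer or a nucleic acid molecule D2 containing a nucleotide sequence encoding the second tag primer; the second DNA polymerase or a nucleic acid molecule B2 containing a nucleotide sequence encoding the second DNA polymerase);
    Preferably, the kit comprises the following 8 components:
    (a) a first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein;
    (b) a template-dependent first DNA polymerase or a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
    (c) a first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA;
    (d) a first tag primer or a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer;
    (e) a second gRNA or a nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA;
    (f) the second Cas protein or a nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein;
    (g) a second tag primer or a nucleic acid molecule D2 containing a nucleotide sequence encoding the second tag primer; and
    (h) the second DNA polymerase or a nucleic acid molecule B2 containing a nucleotide sequence encoding the second DNA polymerase;
    Preferably, the 8 components are contained in one or more (e.g., 2, 3, 4, 5, 6, 7, or 8) vectors;
    Preferably, the kit comprises the following vectors:
    (a) the nucleic acid vector of any one of claims 46-48;
    (b) a first vector comprising the nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein and the nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
    (c) a second vector comprising the nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA and the nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer;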
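    (d) a third vector comprising the nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA and the nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein; and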
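    (e) a fourth vector comprising the nucleic acid molecule D2 containing a nucleotide sequence encoding the second tag primer and the nucleic acid molecule B2 containing a nucleotide sequence encoding the second DNA polymerase;
    Optionally, the kit further comprises one or more components of the third nucleic acid editing system described in the system or kit of any one of claims 19-21 (e.g., (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).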
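The preferred arrangement in claim 49 spreads the eight components (a)-(h) over four expression vectors, in addition to the donor vector. A small bookkeeping sketch can confirm that a given arrangement leaves no component out. The component labels follow the nucleic acid molecule designations A1-D2 used in the claim. The vector names and the `missing_components` helper are hypothetical.

```python
REQUIRED = {"A1", "B1", "C1", "D1", "A2", "B2", "C2", "D2"}

# Preferred arrangement recited in claim 49 (vector names are hypothetical).
KIT_VECTORS = {
    "first vector": {"A1", "B1"},   # first Cas protein + first DNA polymerase
    "second vector": {"C1", "D1"},  # first gRNA + first tag primer
    "third vector": {"C2", "A2"},   # second gRNA + second Cas protein
    "fourth vector": {"D2", "B2"},  # second tag primer + second DNA polymerase
}


def missing_components(vectors: dict) -> set:
    """Return the required components not carried by any vector."""
    carried = set().union(*vectors.values()) if vectors else set()
    return REQUIRED - carried


print(missing_components(KIT_VECTORS))  # set()
```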
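  50. A method for cleaving one strand of a double-stranded target nucleic acid and adding a flap to the 3' end at the nick, wherein the method comprises using the system or kit of any one of claims 1-9.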
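  51. The method of claim 50, wherein the method comprises the following steps:
    i. providing a double-stranded target nucleic acid; and
    providing the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer;
    ii. contacting the double-stranded target nucleic acid with the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer;
    Preferably, in step ii:
    the first Cas protein and the first gRNA combine to form a first functional complex, and the first functional complex cleaves one strand of the double-stranded target nucleic acid; and
    the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of the cleaved strand; and
    the first DNA polymerase extends the cleaved strand using the first tag primer annealed to the cleaved strand as a template, forming a first flap;
    Preferably, the method is performed in a cell;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell;
    Preferably, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell;
    Preferably, in step i, nucleic acid molecules A1, B1, C1 and D1 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell;
    Preferably, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein containing the first Cas protein and the first DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell;
    Preferably, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell, to provide the first gRNA and the first tag primer in the cell;
    Preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, together with a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, is delivered into a cell and transcribed and expressed therein, thereby providing the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer in the cell;
    Preferably, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into a cell to provide the double-stranded target nucleic acid in the cell;
    Preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;
    Preferably, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein; preferably, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA, and cleaves one strand thereof.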
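  52. A method for cleaving each of the two strands of a double-stranded target nucleic acid and adding a flap to the 3' end of each of the two nicks produced by the cleavage, wherein the method comprises using the system or kit of any one of claims 10-18; wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid; preferably, the method is used to cleave the two strands of the double-stranded target nucleic acid at different positions;
    Preferably, the method is performed outside or inside a cell.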
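  53. The method of claim 52, wherein the method comprises the following steps:
    i. providing a double-stranded target nucleic acid; and
    providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer;
    ii. contacting the double-stranded target nucleic acid with the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer;
    Preferably, in step ii:
    the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and the first and second functional complexes cleave the first strand and the second strand of the double-stranded target nucleic acid, respectively, the first and second strands each containing a nick produced by the cleavage, and the double-stranded portion located between the 3' ends of the two nicks is referred to as target nucleic acid fragment F1; and
    the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of one strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the cleavage); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3' end of the other strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the cleavage); and
    the first DNA polymerase and the second DNA polymerase perform extension reactions using, respectively, the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, so that the 3' ends produced by the cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, yielding a double-stranded portion carrying the first flap and the second flap, referred to as target nucleic acid fragment F2;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer in the cell;
    Preferably, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A2, nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer in the cell;
    Preferably, in step i, nucleic acid molecules A1, B1, C1, D1, A2, B2, C2 and D2 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer in the cell;
    Preferably, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein containing the first Cas protein and the first DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell;
    Preferably, nucleic acid molecules A2 and B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, nucleic acid molecules A2 and B2 can express, in a cell, the second Cas protein and the second DNA polymerase as separate proteins, or a second fusion protein containing the second Cas protein and the second DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the second Cas protein and the second DNA polymerase as separate proteins, or a nucleic acid molecule containing a nucleotide sequence encoding the second fusion protein, is delivered into a cell and expressed therein to provide the second Cas protein and the second DNA polymerase in the cell;
    Preferably, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell, to provide the first gRNA and the first tag primer in the cell;
    Preferably, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer; preferably, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell, to provide the second gRNA and the second tag primer in the cell;
    Preferably, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into a cell to provide the double-stranded target nucleic acid in the cell;
    Preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;
    Preferably, the second Cas protein, the second gRNA, the second DNA polymerase or the second tag primer is as defined in any one of claims 10-18;
    Preferably, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein; preferably, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA, and cleaves one strand thereof; and the second functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA, and cleaves the other strand thereof.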
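  54. The method of claim 53, wherein the second Cas protein is the same as the first Cas protein, and the second DNA polymerase is the same as the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase performs extension reactions using the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, respectively, so that the 3' ends produced by the cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, yielding a double-stranded portion carrying the first flap and the second flap, referred to as target nucleic acid fragment F2;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA and the second tag primer in the cell;
    Preferably, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA and the second tag primer in the cell;
    Preferably, in step i, nucleic acid molecules A1, B1, C1, D1, C2 and D2 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA and the second tag primer in the cell;
    Preferably, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein containing the first Cas protein and the first DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell;
    Preferably, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell, to provide the first gRNA and the first tag primer in the cell;
    Preferably, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer; preferably, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell, to provide the second gRNA and the second tag primer in the cell;
    Preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, and a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA are delivered into a cell and transcribed and expressed therein, thereby providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA and the second tag primer in the cell.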
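  55. A method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest, wherein the method comprises using the kit of claim 49; wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and serves to provide the target nucleic acid fragment, the target nucleic acid fragment being located between the 3' end produced by cleavage in the first strand of the double-stranded target nucleic acid and the 3' end produced by cleavage in the second strand; and the third double-stranded target nucleic acid is the nucleic acid molecule of interest.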
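  56. The method of claim 55, wherein the method comprises:
    a. cleaving the first strand and the second strand of the first double-stranded target nucleic acid by the method of any one of claims 52-54, the first and second strands each containing a nick produced by the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1; and adding a first flap and a second flap to the two 3' ends, respectively, to form a double-stranded portion carrying the first flap and the second flap, referred to as target nucleic acid fragment F2;
    b. cleaving the nucleic acid molecule of interest with the third nucleic acid editing system to form cleaved nucleotide fragments a1 and a2; and
    c. joining nucleotide fragments a1 and a2 with target nucleic acid fragment F2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest;
    Preferably, the method is performed outside or inside a cell;
    Preferably, when the nucleic acid molecule of interest is a genomic sequence present in a cell, step a is performed outside or inside the cell, and steps b and c are performed inside the cell;
    Preferably, the method comprises the following steps:
    i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and
    providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer and the third nucleic acid editing system;
    ii. contacting the double-stranded target nucleic acid with the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system;
    Preferably, in step ii:
    the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and
    the first and second functional complexes cleave the first strand and the second strand of the double-stranded target nucleic acid, respectively, the first and second strands each containing a nick produced by the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1; and the third nucleic acid editing system cleaves the nucleic acid molecule of interest to form cleaved nucleotide fragments a1 and a2; and
    the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of one strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the cleavage); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3' end of the other strand of target nucleic acid fragment F1 (i.e., the 3' end produced by the cleavage); and
    the first DNA polymerase and the second DNA polymerase perform extension reactions using, respectively, the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, so that the 3' ends produced by the cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, yielding a double-stranded portion carrying the first flap and the second flap, referred to as target nucleic acid fragment F2; wherein the first flap and the second flap are capable of hybridizing or annealing to the cleaved nucleotide fragments a1 and a2, respectively; and
    target nucleic acid fragment F2 hybridizes or anneals to nucleotide fragments a1 and a2 via the first flap and the second flap, respectively, and is thereby inserted or joined between nucleotide fragments a1 and a2, so that the target nucleic acid fragment is inserted into the nucleic acid molecule of interest;
    Preferably, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a1, said 3' end or 3' portion being formed by the third nucleic acid editing system cleaving the nucleic acid molecule of interest;
    Preferably, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' portion of one strand of the cleaved nucleotide fragment a1, with a first spacer region between the 3' portion of nucleotide fragment a1 and the break end formed in the third double-stranded target nucleic acid;
    Preferably, the first spacer region is 1 nt-200 nt in length, e.g., 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt or 100-200 nt;
    Preferably, the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a2, said 3' end or 3' portion being formed by the third nucleic acid editing system cleaving the nucleic acid molecule of interest;
    Preferably, the complement of the second tag sequence, or the second flap, is capable of hybridizing or annealing to the 3' portion of one strand of the cleaved nucleotide fragment a2, with a second spacer region between the 3' portion of nucleotide fragment a2 and the break end formed in the third double-stranded target nucleic acid;
    Preferably, the second spacer region is 1 nt-200 nt in length, e.g., 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt or 100-200 nt;
    Preferably, the method is performed in a cell;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer and the third nucleic acid editing system in the cell;
    Preferably, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A2, nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and nucleic acid molecule A3 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer and the third nucleic acid editing system in the cell;
    Preferably, in step i, nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2 and A3 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer and the third nucleic acid editing system in the cell;
    Preferably, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into a cell to provide the double-stranded target nucleic acid in the cell;
    Preferably, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein; preferably, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA, and cleaves one strand thereof; and the second functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA, and cleaves the other strand thereof;
    Preferably, the nucleic acid molecule of interest is the genomic DNA of the cell;
    Preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;
    Preferably, the second Cas protein, the second gRNA, the second DNA polymerase or the second tag primer is as defined in any one of claims 10-18;
    Preferably, the third nucleic acid editing system is as defined in any one of claims 19-21;
    Preferably, the third nucleic acid editing system is as defined in claim 21, and the nucleic acid molecule of interest contains a third PAM sequence recognized by the third Cas protein; preferably, in step ii, the third functional complex binds the nucleic acid molecule of interest via the third PAM sequence and the third gRNA, and cleaves it.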
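  57. The method of claim 56, wherein the first and second Cas proteins are the same and are selected from Cas proteins that cleave a single strand of DNA, and the second DNA polymerase is the same as the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase performs extension reactions using the first tag primer and the second tag primer annealed to target nucleic acid fragment F1 as templates, respectively, forming target nucleic acid fragment F2 carrying the first flap and the second flap;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer and the third nucleic acid editing system in the cell;
    Preferably, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer and the third nucleic acid editing system in the cell;
    Preferably, in step i, nucleic acid molecules A1, B1, C1, D1, C2, D2 and A3 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer and the third nucleic acid editing system in the cell;
    Preferably, nucleic acid molecules A1 and B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, nucleic acid molecules A1 and B1 can express, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein containing the first Cas protein and the first DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed therein to provide the first Cas protein and the first DNA polymerase in the cell;
    Preferably, nucleic acid molecules C1 and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, nucleic acid molecules C1 and D1 can be transcribed in a cell into a first PegRNA containing the first gRNA and the first tag primer; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell, to provide the first gRNA and the first tag primer in the cell;
    Preferably, nucleic acid molecules C2 and D2 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, nucleic acid molecules C2 and D2 can be transcribed in a cell into a second PegRNA containing the second gRNA and the second tag primer; preferably, in step i, the second PegRNA is delivered into a cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA is delivered into a cell and the second PegRNA is transcribed in the cell, to provide the second gRNA and the second tag primer in the cell;
    Preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, and a nucleic acid molecule containing a sequence encoding the third nucleic acid editing system are delivered into a cell and transcribed and expressed therein, thereby providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer and the third nucleic acid editing system in the cell.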
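  58. A method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest, wherein the method comprises using the kit of claim 45; wherein the first double-stranded target nucleic acid serves to provide the target nucleic acid fragment, the target nucleic acid fragment being located, in the double-stranded target nucleic acid, between the 3' end produced by cleavage of the first strand and the donor homology arm; and the third double-stranded target nucleic acid is the nucleic acid molecule of interest;
    Optionally, the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and is contained in the nucleic acid vector of any one of claims 46-48.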
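  59. The method of claim 58, wherein the method comprises:
    a. cleaving the first strand of the first double-stranded target nucleic acid by the method of claim 50 or 51, the first strand containing a nick produced by the cleavage, the portion of the first strand located between the 3' end of the nick and the donor homology arm being referred to as target nucleic acid strand S1; and adding a first flap to said 3' end to form a first-strand portion carrying the first flap, referred to as target nucleic acid strand S2;
    b. cleaving the nucleic acid molecule of interest with the third nucleic acid editing system to form cleaved nucleotide fragments a1 and a2; and
    c. hybridizing or annealing target nucleic acid strand S2 to the first strand of nucleotide fragment a1 via the first flap; performing an extension reaction using target nucleic acid strand S2 as a template to form an extension strand E1, the extension strand E1 containing the complement of target nucleic acid strand S2 and the complement of the donor homology arm flanking S2; and joining the extension strand E1 to a2 via the donor homology arm, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest;
    Preferably, the 3' end of the first strand of nucleotide fragment a1 contains the complement of the first flap, and the 3' end of the second strand of nucleotide fragment a1 contains the sequence of the first flap;
    Preferably, the cut end of nucleotide fragment a2 contains a target-site homology arm;
    Preferably, the method is performed outside or inside a cell;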
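    Preferably, when the nucleic acid molecule of interest is genomic DNA present in a cell, step a is performed outside or inside the cell, and steps b and c are performed inside the cell;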
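    Preferably, the method comprises the following steps:
    i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest, the double-stranded target nucleic acid comprising a donor homology arm, a first PAM sequence recognized by the first Cas protein, and a sequence recognized by the first gRNA (preferably, the double-stranded target nucleic acid comprises a donor homology arm, a first PAM sequence recognized by the first Cas protein, and a sequence recognized by the first guide sequence contained in the first gRNA); and
    providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer and the third nucleic acid editing system;
    ii. contacting the double-stranded target nucleic acid with the first Cas protein, the first gRNA, the first DNA polymerase and the first tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system;
    Preferably, in step ii:
    the first Cas protein and the first gRNA combine to form a first functional complex; and
    the first functional complex cleaves the first strand of the double-stranded target nucleic acid, the first strand containing a nick produced by the cleavage, the portion of the first strand located between the 3' end of the nick and the donor homology arm being referred to as target nucleic acid strand S1; and the third nucleic acid editing system cleaves the nucleic acid molecule of interest to form cleaved nucleotide fragments a1 and a2; and
    the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3' end of target nucleic acid strand S1 (i.e., the 3' end produced by the cleavage); and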
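    the first DNA polymerase performs an extension reaction using the first tag primer annealed to target nucleic acid strand S1 as a template, so that the 3' end produced by the cleavage in the first strand is extended to form a first flap, yielding a first-strand portion carrying the first flap, referred to as target nucleic acid strand S2; wherein the first flap is capable of hybridizing or annealing to the cleaved nucleotide fragment a1; and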
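    target nucleic acid strand S2 hybridizes or anneals to the first strand of nucleotide fragment a1 via the first flap, so that target nucleic acid strand S2 is joined between the second strand of nucleotide fragment a1 and the second strand of nucleotide fragment a2;
    an extension reaction is performed at the 3' end of the first strand of nucleotide fragment a1 using target nucleic acid strand S2 as a template to form an extension strand E1, the extension strand E1 containing the complement of target nucleic acid strand S2 and the complement of the donor homology arm flanking S2; the extension strand E1 anneals to the first strand of a2 via the donor homology arm, so that the extension strand E1 is joined between the first strand of nucleotide fragment a1 and the first strand of nucleotide fragment a2, forming a double-stranded structure and thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest;
    Preferably, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one strand of nucleotide fragment a1, said 3' end or 3' portion being formed by the third nucleic acid editing system cleaving the nucleic acid molecule of interest;
    Preferably, the complement of the first tag sequence, or the first flap, is capable of hybridizing or annealing to the 3' portion of one strand of the cleaved nucleotide fragment a1, with a first spacer region between the 3' portion of nucleotide fragment a1 and the break end formed in the third double-stranded target nucleic acid;
    Preferably, the first spacer region is 1 nt-200 nt in length, e.g., 1-10 nt, 10-20 nt, 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt or 100-200 nt;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer and the third nucleic acid editing system in the cell;
    Alternatively, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are contacted with the double-stranded target nucleic acid outside a cell, after which the edited double-stranded target nucleic acid and the third nucleic acid editing system or a nucleic acid molecule A3 encoding it are delivered into a cell to provide, in the cell, the double-stranded target nucleic acid carrying the first flap and the third nucleic acid editing system;
    Preferably, in step i, nucleic acid molecule A1, nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, and nucleic acid molecule A3 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer and the third nucleic acid editing system in the cell;
    Preferably, in step i, nucleic acid molecules A1, B1, C1, D1 and A3 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer and the third nucleic acid editing system in the cell;
    Preferably, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into a cell to provide the double-stranded target nucleic acid in the cell;
    Preferably, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a donor homology arm; preferably, in step ii, the first functional complex binds the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA, and cleaves one strand thereof;
    Preferably, the nucleic acid molecule of interest is the genomic DNA of the cell;
    Preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;
    Preferably, the third nucleic acid editing system is as defined in any one of claims 19-21;
    Preferably, the third nucleic acid editing system is as defined in claim 21, and the nucleic acid molecule of interest contains a third PAM sequence recognized by the third Cas protein; preferably, in step ii, the third functional complex binds the nucleic acid molecule of interest via the third PAM sequence and the third gRNA, and cleaves it.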
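  60. A method for replacing a nucleotide fragment in a nucleic acid molecule of interest with a target nucleic acid fragment, wherein the method comprises using the system or kit of any one of claims 22-25; wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and serves to provide the target nucleic acid fragment, the target nucleic acid fragment being located between the nick produced by cleavage in the first strand of the double-stranded target nucleic acid and the nick produced by cleavage in the second strand; and the third double-stranded target nucleic acid is identical to the fourth double-stranded target nucleic acid and is the nucleic acid molecule of interest;
    Preferably, the method comprises:
    a. cleaving the first strand and the second strand of the first double-stranded target nucleic acid by the method of any one of claims 52-54, the first and second strands each containing a nick produced by the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1; and adding a first flap and a second flap to the two 3' ends, respectively, to form a double-stranded portion carrying the first flap and the second flap, referred to as target nucleic acid fragment F2;
    b. cleaving the nucleic acid molecule of interest with the third and fourth nucleic acid editing systems to form cleaved nucleotide fragments a1, a2 and a3; wherein, before cleavage, nucleotide fragments a1, a2 and a3 are arranged in that order in the nucleic acid molecule of interest (i.e., nucleotide fragment a1 is connected to nucleotide fragment a3 through nucleotide fragment a2); and
    c. joining nucleotide fragments a1 and a3 with target nucleic acid fragment F2, thereby replacing nucleotide fragment a2 in the nucleic acid molecule of interest with the target nucleic acid fragment.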
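The net sequence change of claim 60 can be traced on a single strand. The two breaks split the molecule of interest into fragments a1, a2 and a3, in that order. Fragment F2 then joins a1 to a3, so a2 is replaced. The sketch below assumes blunt breaks at the given 0-based positions. It ignores the details of flap annealing and any bases lost from spacer regions. The helper names and the sequences are hypothetical.

```python
def split_at_breaks(seq: str, cut1: int, cut2: int):
    """Split `seq` at two break positions into fragments a1, a2 and a3 (in order)."""
    if not 0 < cut1 < cut2 < len(seq):
        raise ValueError("require 0 < cut1 < cut2 < len(seq)")
    return seq[:cut1], seq[cut1:cut2], seq[cut2:]


def replace_between_breaks(seq: str, cut1: int, cut2: int, fragment: str) -> str:
    """Return the product in which fragment a2 is replaced by `fragment`."""
    a1, _a2, a3 = split_at_breaks(seq, cut1, cut2)
    return a1 + fragment + a3


region = "AAAACCCCGGGG"  # hypothetical region of the molecule of interest
print(split_at_breaks(region, 4, 8))                   # ('AAAA', 'CCCC', 'GGGG')
print(replace_between_breaks(region, 4, 8, "TTTTTT"))  # AAAATTTTTTGGGG
```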
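  61. The method of claim 60, wherein the method comprises the following steps:
    i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and
    providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system;
    ii. contacting the double-stranded target nucleic acid with the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase and the second tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system and the fourth nucleic acid editing system;
    Preferably, in step ii:
    the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and
    the first and second functional complexes cleave the first strand and the second strand of the double-stranded target nucleic acid, respectively, the first strand and the second strand each containing a nick produced by the cleavage, and the double-stranded portion located between the 3’ ends of the two nicks being referred to as target nucleic acid fragment F1; and the third and fourth nucleic acid editing systems cleave the nucleic acid molecule of interest to form cleaved nucleotide fragments a1, a2 and a3; and
    the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3’ end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3’ end produced by the cleavage); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3’ end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3’ end produced by the cleavage); and
    the first DNA polymerase and the second DNA polymerase carry out extension reactions using, as templates, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1, respectively, so that the 3’ ends produced by cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, yielding a double-stranded portion having the first flap and the second flap, referred to as target nucleic acid fragment F2; wherein the first flap and the second flap are capable of hybridizing or annealing to the cleaved nucleotide fragments a1 and a3, respectively; and
    the target nucleic acid fragment F2 hybridizes or anneals to the nucleotide fragments a1 and a3 via the first flap and the second flap, respectively, and is thereby joined between the nucleotide fragments a1 and a3, so that the nucleotide fragment a2 in the nucleic acid molecule of interest is replaced by the target nucleic acid fragment;
    Preferably, the first flap is capable of hybridizing or annealing to the 3’ end or 3’ portion of one nucleic acid strand of the nucleotide fragment a1, the 3’ end or 3’ portion having been formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system;
    Preferably, the second flap is capable of hybridizing or annealing to the 3’ end or 3’ portion of one nucleic acid strand of the nucleotide fragment a3, the 3’ end or 3’ portion having been formed by cleavage of the nucleic acid molecule of interest by the fourth nucleic acid editing system;
    Preferably, the method is performed in a cell;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into the cell, so as to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecule A3 and the nucleic acid molecule A4 are delivered into the cell, so as to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3 and A4 are delivered into the cell, so as to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into the cell, so as to provide the double-stranded target nucleic acid in the cell;
    Preferably, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein; preferably, in step ii, the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA, and cleaves one strand thereof; and the second functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA, and cleaves the other strand thereof;
    Preferably, the nucleic acid molecule of interest is the genomic DNA of the cell;
    Preferably, the first Cas protein, first gRNA, first DNA polymerase or first tag primer is as defined in any one of claims 1-9;
    Preferably, the second Cas protein, second gRNA, second DNA polymerase or second tag primer is as defined in any one of claims 10-18;
    Preferably, the third nucleic acid editing system is as defined in any one of claims 19-21;
    Preferably, the fourth nucleic acid editing system is as defined in any one of claims 22-25;
    Preferably, the third nucleic acid editing system is as defined in claim 21, the fourth nucleic acid editing system is as defined in claim 23, and the nucleic acid molecule of interest contains a third PAM sequence recognized by the third Cas protein and a fourth PAM sequence recognized by the fourth Cas protein; preferably, in step ii, the third functional complex binds to the nucleic acid molecule of interest via the third PAM sequence and the third gRNA, and cleaves it; and the fourth functional complex binds to the nucleic acid molecule of interest via the fourth PAM sequence and the fourth gRNA, and cleaves it.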
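The extension step of claim 61 can be read as primer-templated extension of a nicked 3’ end. The following minimal sketch rests on assumptions the claims do not fix:

- Sequences are DNA strings written 5’→3’. An RNA tag primer, for example part of a PegRNA, is written with DNA letters.
- The tag primer reads 5’-[tag sequence][target-binding sequence]-3’.
- The target-binding sequence must be exactly complementary to the nicked strand's 3’ end.
- The polymerase copies the tag sequence, so the new flap is the tag's reverse complement.

The function name `extend_with_tag_primer` is hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_with_tag_primer(nicked_strand: str, tag: str, binding: str) -> Optional[str]:
    """Extend a nicked strand's 3' end using a tag primer as template.

    nicked_strand: strand 5'->3' whose 3' end was produced by the nick.
    tag, binding: parts of a tag primer assumed to read 5'-[tag][binding]-3'.
    Returns the extended strand, or None if binding cannot pair with the 3' end.
    """
    if not binding or not nicked_strand.endswith(revcomp(binding)):
        return None  # target-binding sequence does not pair with the 3' end
    # The polymerase copies the tag region, so the new flap is its reverse complement.
    flap = revcomp(tag)
    return nicked_strand + flap


if __name__ == "__main__":
    print(extend_with_tag_primer("CCCGGGTTAC", "AAGCT", "GTAAC"))
    # prints: CCCGGGTTACAGCTT
```

In this simplified model, calling the function once per strand with the first and second tag primers yields the two flaps of fragment F2.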
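  62. The method of claim 61, wherein the first and second Cas proteins are identical and are selected from Cas proteins that cleave a single strand of DNA, and the second DNA polymerase is identical to the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase carries out extension reactions using, as templates, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1, respectively, to form the target nucleic acid fragment F2 having the first flap and the second flap;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into the cell, so as to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the nucleic acid molecules A3 and A4 are delivered into the cell, so as to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, D2, A3 and A4 are delivered into the cell, so as to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are capable of expressing, in the cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed in the cell, so as to provide the first Cas protein and the first DNA polymerase in the cell;
    Preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of being transcribed in the cell into a first PegRNA comprising the first gRNA and the first tag primer; preferably, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell, so as to provide the first gRNA and the first tag primer in the cell;
    Preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of being transcribed in the cell into a second PegRNA comprising the second gRNA and the second tag primer; preferably, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell, so as to provide the second gRNA and the second tag primer in the cell; preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA, a nucleic acid molecule comprising a nucleotide sequence encoding the third nucleic acid editing system, and a nucleic acid molecule comprising a nucleotide sequence encoding the fourth nucleic acid editing system are delivered into the cell and transcribed and expressed in the cell, thereby providing the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell.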
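Claim 62 combines the gRNA (C1 or C2) and the tag primer (D1 or D2) into a single transcribed PegRNA. The following minimal sketch shows one way to represent that assembly as a string. The part order 5’-[spacer][scaffold][tag sequence][target-binding sequence]-3’ is an assumption borrowed from the common prime-editing PegRNA layout, not something this section specifies. The scaffold sequence is not given here, so it is left as a parameter, and the example values are placeholders. The function name `assemble_pegrna` is hypothetical.

```python
VALID_DNA = set("ACGT")


def assemble_pegrna(spacer: str, scaffold: str, tag: str, binding: str) -> str:
    """Return a PegRNA sequence (5'->3', RNA letters) built from DNA-letter parts.

    Assumed layout, borrowed from the prime-editing convention:
    5'-[spacer][scaffold][tag sequence][target-binding sequence]-3'.
    spacer + scaffold stand in for the gRNA (C1 or C2);
    tag + binding stand in for the tag primer (D1 or D2).
    """
    parts = (spacer, scaffold, tag, binding)
    for part in parts:
        # Reject empty parts and anything that is not an uppercase DNA letter.
        if not part or set(part) - VALID_DNA:
            raise ValueError(f"invalid part: {part!r}")
    return "".join(parts).replace("T", "U")


if __name__ == "__main__":
    # Placeholder sequences only; not a functional scaffold.
    print(assemble_pegrna("GACT", "TTTT", "AAG", "CCA"))
    # prints: GACUUUUUAAGCCA
```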
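  63. The method of claim 60, wherein the method comprises the following steps:
    i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and
    providing the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third nucleic acid editing system and the fourth nucleic acid editing system, wherein the third nucleic acid editing system and the fourth nucleic acid editing system are as defined in claims 21 and 23, respectively;
    ii. contacting the double-stranded target nucleic acid with the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, and the first and second tag primers, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system and the fourth nucleic acid editing system;
    Preferably, in step ii:
    the first Cas protein and the first gRNA combine to form a first functional complex, the second Cas protein and the second gRNA combine to form a second functional complex, the third Cas protein and the third gRNA combine to form a third functional complex, and the fourth Cas protein and the fourth gRNA combine to form a fourth functional complex; and
    the first and second functional complexes cleave the first strand and the second strand of the double-stranded target nucleic acid, respectively, the first strand and the second strand each containing a 3’ end produced by the cleavage, and the double-stranded portion located between the two 3’ ends being referred to as target nucleic acid fragment F1; and the third and fourth functional complexes bind to and cleave the nucleic acid molecule of interest to form cleaved nucleotide fragments a1, a2 and a3; and
    the first tag primer hybridizes or anneals, via the first target-binding sequence, to the 3’ end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3’ end produced by the cleavage); and the second tag primer hybridizes or anneals, via the second target-binding sequence, to the 3’ end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3’ end produced by the cleavage); and
    the first DNA polymerase and the second DNA polymerase carry out extension reactions using, as templates, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1, respectively, so that the 3’ ends produced by cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, yielding a double-stranded portion having the first flap and the second flap, referred to as target nucleic acid fragment F2; wherein the first flap and the second flap are capable of hybridizing or annealing to the cleaved nucleotide fragments a1 and a3, respectively; and
    the third tag primer hybridizes or anneals, via the third target-binding sequence, to the 3’ end of one nucleic acid strand of the nucleotide fragment a1, the 3’ end having been formed by cleavage of the nucleic acid molecule of interest by the third functional complex; and the fourth tag primer hybridizes or anneals, via the fourth target-binding sequence, to the 3’ end of one nucleic acid strand of the nucleotide fragment a3, the 3’ end having been formed by cleavage of the nucleic acid molecule of interest by the fourth functional complex;
    Preferably, the method is performed in a cell;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into the cell, so as to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecule A3 and the nucleic acid molecule A4 are delivered into the cell, so as to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3 and A4 are delivered into the cell, so as to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems in the cell;
    Preferably, in step i, the double-stranded target nucleic acid, or a nucleic acid molecule T containing the double-stranded target nucleic acid, is delivered into the cell, so as to provide the double-stranded target nucleic acid in the cell;
    Preferably, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by the first Cas protein and a second PAM sequence recognized by the second Cas protein; preferably, in step ii, the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA, and cleaves one strand thereof; and the second functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T via the second PAM sequence and the second gRNA, and cleaves the other strand thereof;
    Preferably, the nucleic acid molecule of interest contains a third PAM sequence recognized by the third Cas protein and a fourth PAM sequence recognized by the fourth Cas protein; preferably, in step ii, the third functional complex binds to the nucleic acid molecule of interest via the third PAM sequence and the third gRNA, and cleaves it; and the fourth functional complex binds to the nucleic acid molecule of interest via the fourth PAM sequence and the fourth gRNA, and cleaves it;
    Preferably, the nucleic acid molecule of interest is the genomic DNA of the cell;
    Preferably, the first Cas protein, first gRNA, first DNA polymerase or first tag primer is as defined in any one of claims 1-9;
    Preferably, the second Cas protein, second gRNA, second DNA polymerase or second tag primer is as defined in any one of claims 10-18.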
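  64. The method of claim 61, wherein the first and second Cas proteins are identical and are selected from Cas proteins that cleave a single strand of DNA, the third and fourth Cas proteins are identical and are selected from Cas proteins that cleave both strands of DNA, and the first, second, third and fourth DNA polymerases are the same DNA polymerase; wherein the first Cas protein forms the first, second, third and fourth functional complexes with the first, second, third and fourth gRNAs, respectively; and the first DNA polymerase carries out extension reactions using, as templates, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1, respectively, to form the target nucleic acid fragment F2 having the first flap and the second flap;
    Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or a nucleic acid molecule A3 encoding it, and the fourth nucleic acid editing system or a nucleic acid molecule A4 encoding it are delivered into the cell, so as to provide the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the nucleic acid molecules A3 and A4 are delivered into the cell, so as to provide the first Cas protein, the first DNA polymerase, the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, D2, A3 and A4 are delivered into the cell, so as to provide the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third nucleic acid editing system and the fourth nucleic acid editing system in the cell;
    Preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are capable of expressing, in the cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the first Cas protein and the first DNA polymerase as separate proteins, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein, is delivered into the cell and expressed in the cell, so as to provide the first Cas protein and the first DNA polymerase in the cell;
    Preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of being transcribed in the cell into a first PegRNA comprising the first gRNA and the first tag primer; preferably, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell, so as to provide the first gRNA and the first tag primer in the cell;
    Preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of being transcribed in the cell into a second PegRNA comprising the second gRNA and the second tag primer; preferably, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell, so as to provide the second gRNA and the second tag primer in the cell.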
PCT/CN2022/086979 2021-09-03 2022-04-15 System and method for site-specific integration of exogenous genes WO2023029492A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280059607.XA CN117897481A (zh) 2021-09-03 2022-04-15 System and method for site-specific integration of exogenous genes

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111032498.6 2021-09-03
CN202111032498 2021-09-03

Publications (1)

Publication Number Publication Date
WO2023029492A1 true WO2023029492A1 (zh) 2023-03-09

Family

ID=85411860

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086979 WO2023029492A1 (zh) 2021-09-03 2022-04-15 System and method for site-specific integration of exogenous genes

Country Status (2)

Country Link
CN (1) CN117897481A (zh)
WO (1) WO2023029492A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
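CN108070610A (zh) * 2016-11-08 2018-05-25 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences Method for site-specific knock-in in plant genomes
CN108690845A (zh) * 2017-04-10 2018-10-23 Institute of Zoology, Chinese Academy of Sciences Genome editing system and method
US20190359973A1 (en) * 2017-01-10 2019-11-28 Christiana Care Health Services, Inc. Methods for in vitro site-directed mutagenesis using gene editing technologies
CN112195164A (zh) * 2020-12-07 2021-01-08 Institute of Zoology, Chinese Academy of Sciences Engineered Cas effector protein and method of use thereof
CN113913405A (zh) * 2020-07-10 2022-01-11 Institute of Zoology, Chinese Academy of Sciences System and method for editing nucleic acids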

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU YA-WEN, DAI XIN-YUE, WANG WEN-TIAN, YANG ZHI-XUE, ZHAO JUAN-JUAN, ZHANG JIAN-PING, WEN WEI, ZHANG FENG, OBERG KERBY C, ZHANG LE: "Dynamics and competition of CRISPR–Cas9 ribonucleoproteins and AAV donor-mediated NHEJ, MMEJ and HDR editing", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, GB, vol. 49, no. 2, 25 January 2021 (2021-01-25), GB , pages 969 - 985, XP093042374, ISSN: 0305-1048, DOI: 10.1093/nar/gkaa1251 *
JAYAVARADHAN RAJESWARI, PILLIS DEVIN M., GOODMAN MICHAEL, ZHANG FAN, ZHANG YUE, ANDREASSEN PAUL R., MALIK PUNAM: "CRISPR-Cas9 fusion to dominant-negative 53BP1 enhances HDR and inhibits NHEJ specifically at Cas9 target sites", NATURE COMMUNICATIONS, vol. 10, no. 1, 1 December 2019 (2019-12-01), XP055775648, DOI: 10.1038/s41467-019-10735-7 *

Also Published As

Publication number Publication date
CN117897481A (zh) 2024-04-16

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22862638

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE