CN117897481A - Exogenous gene fixed-point integration system and method - Google Patents
- Publication number
- CN117897481A (application CN202280059607.XA)
- Authority
- CN
- China
- Prior art keywords
- nucleic acid
- protein
- acid molecule
- sequence
- strand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
- C12N15/62—DNA sequences coding for fusion proteins
- C12N5/00—Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
- C12N5/10—Cells modified by introduction of foreign genetic material
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
Abstract
The present application relates to methods for site-directed integration of exogenous genes that do not rely on homology arms and donor vector linearization. The application also relates to a system and a kit for editing nucleic acids and uses thereof, and a method for editing nucleic acids. The systems, kits, and methods of the present application can be used to break one nucleic acid strand of a double-stranded target nucleic acid and form a flap at its end (particularly the end resulting from the break), and can be used to insert a target nucleic acid into a nucleic acid molecule of interest (e.g., genomic DNA) or to replace a nucleotide fragment in a nucleic acid molecule of interest (e.g., genomic DNA) with a target nucleic acid.
Description
The present application relates to the fields of genetic engineering and molecular biology. In particular, the present application relates to methods for site-directed integration of exogenous genes that do not rely on homology arms and donor vector linearization. The application also relates to a system and a kit for editing nucleic acids and uses thereof, and a method for editing nucleic acids. The systems, kits, and methods of the present application can be used to break one nucleic acid strand of a double-stranded target nucleic acid and form a flap at its end (particularly the end resulting from the break), and can be used to insert a target nucleic acid into a nucleic acid molecule of interest (e.g., genomic DNA) or to replace a nucleotide fragment in a nucleic acid molecule of interest (e.g., genomic DNA) with a target nucleic acid.
Gene editing is an active area of biomedical research with broad application prospects in the clinical treatment of hereditary diseases, the construction of animal models, the genetic breeding of crops, and other areas. Gene editing techniques include the deletion, addition, and substitution of a single nucleotide or a stretch of DNA at a specific locus in the genome. Site-directed knock-in of exogenous genes can be achieved by homologous recombination (HDR, homology-directed repair): flanking the exogenous gene with a 500-3000 bp homology arm on each side allows accurate site-specific integration, but the efficiency is extremely low, only about 0.01%. Homologous recombination-mediated knock-in can be promoted by cleaving the targeted genomic site with an engineered nuclease such as a ZFN (zinc-finger nuclease), a TALEN (transcription activator-like effector nuclease), or CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 nuclease), which produces a DNA double-strand break (DSB). However, since most mammalian cells rely mainly on NHEJ (non-homologous end joining) for DSB repair, the site-directed knock-in efficiency based on nucleases and homologous recombination is still low, typically around 1%. Furthermore, since homologous recombination occurs only in the S/G2 phase of the cell cycle, the above method cannot achieve site-directed integration of exogenous genes in most terminally differentiated somatic cells.
HMEJ (homology-mediated end joining) improves the efficiency of homologous recombination by linearizing the donor vector: recognition sequences that can be targeted and cleaved by CRISPR/Cas are added on both sides of the homology arms on the donor vector.
Site-directed integration of exogenous DNA fragments can also be achieved using linear single-stranded DNA as the donor. Each end of the single-stranded DNA donor carries a 30-50 nt homology arm. After a nuclease cuts the specific site in the genome, the single strand is integrated at the DSB site by SDSA (synthesis-dependent strand annealing), thereby achieving integration at the specific genomic site. Linear single-stranded DNA donors are more efficient than HDR, but not accurate enough: additional base insertions and deletions often occur at the junction at the 5' end of the single-stranded DNA. In addition, chemical synthesis of long linear single-stranded DNA is costly, and such DNA is not readily available. Moreover, integration efficiency drops significantly once the insert exceeds 1 kb. This method is therefore not suitable for site-directed knock-in of large fragments (greater than 1 kb) of foreign genes.
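The donor layout described above (a 30-50 nt homology arm on each side of the exogenous fragment) can be illustrated with a minimal Python sketch. The function name `build_ssdna_donor`, its parameters, and the 0-based `cut_index` convention are assumptions introduced here for illustration only; they do not come from the application, and the sketch ignores strand choice and synthesis constraints.

```python
def build_ssdna_donor(genome: str, cut_index: int, insert: str, arm_len: int = 40) -> str:
    """Assemble a linear single-stranded donor: left arm + insert + right arm.

    Hypothetical helper. The cut is assumed to lie between genome[cut_index - 1]
    and genome[cut_index]; both arms are copied from the same strand as `genome`.
    """
    # The text describes homology arms of 30-50 nt at each end of the donor.
    if not 30 <= arm_len <= 50:
        raise ValueError("arm_len is outside the 30-50 nt range described in the text")
    # Both arms must fit inside the supplied genomic sequence.
    if cut_index < arm_len or cut_index + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[cut_index - arm_len:cut_index]
    right_arm = genome[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm
```

For example, with a cut in the middle of an 80 nt sequence and 40 nt arms, the donor is the whole flanking sequence with the insert placed at the cut position.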
Site-directed knock-in based on NHEJ, such as the HITI (homology-independent targeted integration) technique, does not rely on homology arms at the two ends of the foreign gene. Instead, the nuclease cleaves the donor vector at the same time as it cleaves a specific site in the genome, and the linearized foreign DNA fragment is then inserted into the genomic cleavage site through the NHEJ DNA repair pathway. NHEJ-based site-directed knock-in is not directional, and the junctions are often imprecise and prone to additional base insertions or deletions. Site-directed knock-in based on MMEJ (microhomology-mediated end joining) builds on the NHEJ-based approach by introducing micro-homology arms at both ends of the exogenous gene, but the efficiency is still low.
Prime editing is a novel gene editing method. It uses a fusion protein composed of an SpCas9 nickase carrying the H840A mutation (nCas9) and the reverse transcriptase MLV-RT (murine leukemia virus reverse transcriptase), together with a pegRNA (prime editing guide RNA) derived from a gRNA (guide RNA), and can achieve any single-base transition or transversion as well as the deletion, addition, and replacement of small DNA fragments. The pegRNA is generated by adding, at the 3' end of the gRNA, a PBS (primer binding site) sequence and a template sequence that contains the editing sequence and a sequence homologous to the genomic target site. In this method, the complex formed by nCas9 and the pegRNA binds to the genomic target site and cleaves the PAM strand. The PBS sequence on the pegRNA then pairs with the free 3' end of the PAM strand, and MLV-RT, using the template sequence of the pegRNA as a template, reverse transcribes the editing sequence and the homologous sequence onto the 3' end at the nick of the PAM strand. Subsequently, the nick is repaired and the editing sequence is integrated into the target site through replacement of the DNA single strand and mismatch repair. Since H840A nCas9 cleaves only one strand of the double-stranded DNA (i.e., the PAM strand), no DSB-initiated NHEJ occurs; the method is therefore unlikely to introduce additional base deletions or insertions, and its editing accuracy is high. However, prime editing is only applicable to the deletion or knock-in of sequences shorter than 100 bp, since the length of the template sequence on the pegRNA limits the length of the editable sequence.
Therefore, establishing a method for efficient site-directed gene knock-in and replacement, in particular one that can efficiently insert and replace large (greater than 1 kb) exogenous gene fragments, is very important for expanding the application of gene editing technology in production and medicine.
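The relationship between the PAM strand, the PBS, and the template sequence can be made concrete with a short Python sketch. It assumes the caller already knows the nick position on the PAM strand and supplies the DNA that should be written at the nick (editing sequence plus homologous sequence). The function names, the `nick_index` convention, and the default `pbs_len` of 13 nt are hypothetical choices made here for illustration; the passage above does not specify them.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Rewrite a DNA sequence as RNA (T -> U)."""
    return dna.upper().replace("T", "U")


def pegrna_extension(pam_strand: str, nick_index: int, new_sequence: str, pbs_len: int = 13) -> str:
    """Return the 3' extension of a pegRNA (template sequence + PBS), 5'->3', as RNA.

    Hypothetical helper. `pam_strand` is the nicked strand written 5'->3'; the nick
    is assumed to lie between pam_strand[nick_index - 1] and pam_strand[nick_index].
    `new_sequence` is the DNA to be synthesized onto the free 3' end.
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence upstream of the nick for the PBS")
    # The free 3' end of the nicked PAM strand acts as the primer.
    primer = pam_strand[nick_index - pbs_len:nick_index]
    # The PBS pairs with that primer, so it is its reverse complement.
    pbs = reverse_complement(primer)
    # The reverse transcriptase copies the template, so the template is the
    # reverse complement of the DNA that should appear at the nick.
    rt_template = reverse_complement(new_sequence)
    # In the extension, the template lies 5' of the PBS.
    return to_rna(rt_template + pbs)
```

As a sanity check on the design, reverse-complementing the template portion of the extension recovers `new_sequence`, which mirrors what the reverse transcriptase writes at the nick.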
Disclosure of Invention
In the present invention, unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by one of ordinary skill in the art. Moreover, the nucleic acid chemistry laboratory procedures used herein are all conventional procedures widely used in the corresponding field. Meanwhile, in order to better understand the present invention, definitions and explanations of related terms are provided below.
The term "Cas protein" or "Cas nuclease" refers to an RNA-guided nuclease. Cas proteins are also known as casn1 nucleases or CRISPR-associated nucleases. CRISPR (clustered regularly interspaced short palindromic repeats) is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain repeat sequences (repeats) and spacer sequences (spacers), where the spacer sequences are complementary to mobile genetic elements and are capable of targeting invading nucleic acids. The CRISPR cluster is transcribed and processed into CRISPR RNA (crRNA). In a type II CRISPR system, correct processing of the pre-crRNA also requires a trans-encoded small RNA (tracrRNA). Thus, in nature, DNA cleavage by a type II CRISPR system requires the Cas protein and two RNAs. However, the crRNA and tracrRNA can be engineered into a single guide RNA ("sgRNA", or simply "gRNA"). See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.
As used herein, the term "complementary" means that two nucleic acid sequences are capable of forming hydrogen bonds between each other according to the base pairing rules (the Watson-Crick rules) and thereby forming a duplex. In this application, the term "complementary" includes "substantially complementary" and "fully complementary". As used herein, the term "fully complementary" means that each base in one nucleic acid sequence is capable of pairing with a base in another nucleic acid strand without a mismatch or gap. As used herein, the term "substantially complementary" means that a majority of bases in one nucleic acid sequence are capable of base pairing with bases in another nucleic acid strand, which allows for a mismatch or gap (e.g., a mismatch or gap of one or several nucleotides) to exist. Typically, two nucleic acid sequences that are "complementary" (e.g., substantially complementary or fully complementary) will selectively/specifically hybridize or anneal and form a duplex under conditions that allow the nucleic acids to hybridize, anneal or amplify.
As used herein, the term "DNA polymerase" refers to an enzyme capable of synthesizing a DNA strand using another nucleic acid strand (e.g., a DNA strand or an RNA strand) as a template. In the present application, the DNA polymerase may be a DNA-dependent DNA polymerase (i.e., an enzyme capable of synthesizing a complementary DNA strand using a DNA strand as a template) or an RNA-dependent DNA polymerase (i.e., an enzyme capable of synthesizing a complementary DNA strand using an RNA strand as a template). In certain embodiments, the DNA polymerase used herein is an RNA-dependent DNA polymerase, such as a reverse transcriptase.
As used herein, the term "reverse transcriptase (RT)" refers to an enzyme capable of synthesizing a complementary DNA strand using an RNA strand as a template. The reverse transcriptases of the present application include, but are not limited to, reverse transcriptases from retroviruses, other viruses, or bacteria, and DNA polymerases having reverse transcription activity, such as Tth DNA polymerase, Taq DNA polymerase, Tne DNA polymerase, Tma DNA polymerase, etc. Reverse transcriptases from retroviruses include, but are not limited to, reverse transcriptases from Moloney murine leukemia virus (M-MLV), human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, and myeloblastosis-associated virus (MAV). Specific examples of reverse transcriptases are also found, for example, in U.S. patent application 2002/0198944 (incorporated herein by reference in its entirety). In addition, the reverse transcriptases of the present application may be in any form, including, but not limited to, naturally occurring reverse transcriptases, naturally occurring mutant reverse transcriptases, engineered mutant reverse transcriptases, or other variants (e.g., truncated variants that retain reverse transcription activity).
As used herein, the terms "hybridization" and "annealing" refer to the process by which complementary single-stranded nucleic acid molecules form double-stranded nucleic acids. In this application, "hybridization" and "annealing" have the same meaning and are used interchangeably. In general, two nucleic acid sequences that are perfectly complementary or substantially complementary may hybridize or anneal. The complementarity required for hybridization or annealing of two nucleic acid sequences depends on the hybridization conditions, particularly the temperature, employed.
As used herein, "conditions that allow hybridization of nucleic acids" have meanings commonly understood by those of skill in the art and can be determined by conventional methods. For example, two nucleic acid molecules having complementary sequences may hybridize under suitable hybridization conditions. Such hybridization conditions may involve the following factors: temperature, pH, composition, and ionic strength of the hybridization buffer, etc., and can be determined based on the length and GC content of the two complementary nucleic acid molecules. For example, hybridization conditions of low stringency can be employed when the length of two complementary nucleic acid molecules is relatively short and/or the GC content is relatively low. When the length of the two complementary nucleic acid molecules is relatively long and/or the GC content is relatively high, high stringency hybridization conditions can be employed. Such hybridization conditions are well known to those skilled in the art and can be found, for example, in Joseph Sambrook, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001); and M.L.M. Anderson, Nucleic Acid Hybridization, Springer-Verlag New York Inc., N.Y. (1999). In this application, "hybridization" and "annealing" have the same meaning and are used interchangeably. Accordingly, the expressions "conditions allowing nucleic acid hybridization" and "conditions allowing nucleic acid annealing" also have the same meaning and are used interchangeably.
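By way of non-limiting illustration, the dependence of hybridization conditions on length and GC content can be sketched with a short Python example. The function names below are hypothetical, and the Wallace rule used here (2 °C per A/T and 4 °C per G/C) is only a rough approximation commonly applied to short oligonucleotides; it is not a method described in the present application.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: intended only for short oligonucleotides; buffer composition,
    pH, and ionic strength are ignored entirely.
    """
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc
```

A longer or more GC-rich sequence yields a higher estimate, which is consistent with the use of more stringent hybridization conditions for such molecules.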
As used herein, the term "upstream" is used to describe the relative positional relationship of two nucleic acid sequences (or two nucleic acid molecules) and has the meaning commonly understood by one of skill in the art. For example, the expression "one nucleic acid sequence is located upstream of another nucleic acid sequence" means that the former is located further forward (i.e., closer to the 5' end) than the latter when arranged in the 5' to 3' direction. As used herein, the term "downstream" has the opposite meaning as "upstream".
As used herein, the term "linker" refers to a chemical entity that is used to join two physical elements (e.g., two nucleic acids or two polypeptides). For example, the linker for linking the two polypeptides may be a peptide linker (e.g., a linker comprising multiple amino acid residues); the linker for linking the two nucleic acids may be a nucleic acid linker (e.g., a linker comprising multiple nucleotides).
As used herein, the term "guide sequence" refers to the targeting sequence contained in a guide RNA. In certain instances, the guide sequence is a polynucleotide sequence that has sufficient complementarity to the target sequence to be able to hybridize to the target sequence and guide the specific binding of the CRISPR/Cas complex to the target sequence. In certain embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. Methods for determining the complementarity of two nucleic acid sequences are within the ability of one of ordinary skill in the art. For example, there are published and commercially available alignment algorithms and programs such as, but not limited to, ClustalW, the Smith-Waterman algorithm in MATLAB, Bowtie, Geneious, Biopython, and SeqMan.
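By way of non-limiting illustration, the degree of complementarity between a guide sequence and a target strand of equal length can be estimated with a minimal Python sketch. The function names are hypothetical, gaps are not considered, and both sequences are assumed to be written in the 5' to 3' direction; a full alignment program such as those listed above would be used in practice.

```python
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def _as_dna(seq: str) -> str:
    """Normalize case and treat RNA 'U' as DNA 'T' for comparison."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence, expressed as DNA."""
    return "".join(_PAIR[base] for base in reversed(_as_dna(seq)))


def percent_complementarity(guide: str, target: str) -> float:
    """Percentage of guide positions that base-pair with the target strand.

    Assumption: ungapped, equal-length, antiparallel pairing only.
    """
    g, t = _as_dna(guide), _as_dna(target)
    if not g or len(g) != len(t):
        raise ValueError("sequences must be non-empty and of equal length")
    expected = reverse_complement(t)  # the fully complementary guide
    matches = sum(a == b for a, b in zip(g, expected))
    return 100.0 * matches / len(g)
```

Under these assumptions, a fully complementary pair scores 100%, and each mismatch lowers the score by one position's share of the guide length.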
As used herein, the term "scaffold sequence" refers to a sequence in a guide RNA that is recognized and bound by a Cas protein. In certain instances, the scaffold sequence can comprise or consist of a CRISPR repeat sequence.
As used herein, the term "functional complex" refers to a complex formed by binding of a guide RNA (gRNA) to a Cas protein, the complex being capable of recognizing and cleaving a polynucleotide that is associated with the guide RNA.
As used herein, the term "target nucleic acid" or "target sequence" refers to a polynucleotide targeted by a targeting sequence, e.g., a sequence that has complementarity to the targeting sequence. Complete complementarity of the guide sequence to the target sequence is not necessary, so long as sufficient complementarity exists to cause hybridization of the two and promote binding of the CRISPR/Cas complex. The target sequence may comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located within an organelle of a eukaryotic cell, such as a mitochondrion or chloroplast.
In the present invention, the expression "target sequence" or "target nucleic acid" can be any endogenous or exogenous polynucleotide for a cell (e.g., a eukaryotic cell). For example, the target nucleic acid may be a polynucleotide (e.g., genomic DNA) present in the nucleus of a eukaryotic cell, or may be a polynucleotide (e.g., vector DNA) that has been exogenously introduced into a cell. For example, the target nucleic acid may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or unwanted DNA). In certain instances, the target nucleic acid or target sequence comprises or is adjacent to a Protospacer Adjacent Motif (PAM). The exact sequence and length requirements of PAM depend on the Cas protein used. Typically, PAM is a sequence of 2-5 base pairs adjacent to the protospacer in the CRISPR cluster. Those skilled in the art are able to identify PAM sequences for use with a given Cas protein.
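By way of non-limiting illustration, candidate target sites adjacent to a PAM can be located with a short Python sketch. The example assumes the 5'-NGG-3' PAM commonly associated with the Cas9 protein of Streptococcus pyogenes and a 20-nt protospacer; other Cas proteins require other PAM sequences, and the function name is hypothetical.

```python
import re


def find_ngg_sites(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) tuples on the given strand.

    Assumption: the PAM is 5'-NGG-3' located immediately 3' of the
    protospacer. Only the strand supplied is scanned.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]
```

To scan the opposite strand, the same function can be applied to the reverse complement of the input sequence.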
As used herein, the term "vector" refers to a nucleic acid vehicle into which a polynucleotide may be inserted. When a vector enables expression of a protein encoded by an inserted polynucleotide, the vector is referred to as an expression vector. The vector may be introduced into a host cell by transformation, transduction or transfection such that the genetic material elements carried thereby are expressed in the host cell. Vectors are well known to those skilled in the art and include, but are not limited to: plasmids; phagemids; cosmids; nano-liposome particles; exosomes; artificial chromosomes, such as yeast artificial chromosomes (YAC), bacterial artificial chromosomes (BAC), or P1-derived artificial chromosomes (PAC); phages such as lambda phage or M13 phage; animal viruses, etc. Animal viruses that may be used as vectors include, but are not limited to, retrovirus (including lentivirus), adenovirus, adeno-associated virus, herpes virus (e.g., herpes simplex virus), poxvirus, baculovirus, papilloma virus, and papovavirus (e.g., SV40). A vector may contain a variety of elements that control expression, including, but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. In addition, the vector may also contain a replication origin. Those skilled in the art will appreciate that the design of the expression vector may depend on factors such as the choice of host cell to be transformed, the desired level of expression, and the like. When the vector carries both the exogenous DNA to be integrated into the host genome and the non-protein expression element associated with the integration of the exogenous DNA, the vector is referred to as a donor vector. Exogenous DNA includes, but is not limited to, complete genes or gene fragments, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and protein coding sequences.
Non-protein expression elements associated with integration of exogenous DNA include, but are not limited to, homologous sequences at the site to be inserted, targeted cleavage sequences for the tool enzyme, and the like. Adeno-associated viral vectors include, but are not limited to, adeno-associated viruses of different serotypes such as AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV-DJ, and other engineered serotypes.
In the present invention, the term "intein" refers to an internal protein element that mediates splicing of the translated protein. An intein is located within a polypeptide sequence; during processing it is excised and catalyzes the ligation of the flanking protein segments (exteins) into a mature protein molecule. The "split intein system" is a system that uses inteins to split larger protein molecules and splice them back together efficiently. An intein can be divided into an N-terminal segment and a C-terminal segment. The target protein is split into an N-terminal segment and a C-terminal segment, which are fused to the N-terminal segment and the C-terminal segment of the intein, respectively, to form two fusion proteins. Only when the two fusion proteins meet does the intein in the split precursor protein undergo protein splicing and removal, joining the N-terminal and C-terminal segments of the target protein to form the functional target protein. Inteins suitable for use in the present invention include, but are not limited to, those derived from the DnaE DNA polymerase of Synechocystis sp. PCC6803 and of Nostoc punctiforme PCC73102 (Npu).
As used herein, the term "host cell" refers to a cell into which a vector can be introduced, including, but not limited to, a prokaryotic cell such as Escherichia coli or Bacillus subtilis, a fungal cell such as a yeast cell or Aspergillus, an insect cell such as a Drosophila S2 cell or an Sf9 cell, or an animal cell such as a fibroblast, CHO cell, COS cell, NS0 cell, HeLa cell, BHK cell, HEK 293 cell, or human cell.
As used herein, the term "spCas9 (H840A)" refers to a mutant of the spCas9 protein, specifically one in which the amino acid at position 840 of the spCas9 protein is mutated from H to A.
Similarly, the term "saCas9 (R1226A)" refers to a mutant of the saCas9 protein, specifically one in which the amino acid at position 1226 of the saCas9 protein is mutated from R to A.
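By way of non-limiting illustration, the mutation notation used above (original residue, 1-based position, replacement residue) can be applied to an amino acid sequence with a minimal Python sketch. The function name is hypothetical, and the short sequence in the usage note is a toy string rather than any real Cas protein sequence.

```python
import re


def apply_point_mutation(protein: str, mutation: str) -> str:
    """Apply a substitution such as 'H840A' to a one-letter amino acid sequence.

    The position is 1-based. A ValueError is raised if the notation is
    malformed, the position is out of range, or the residue found at that
    position does not match the stated original residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError("position outside the protein sequence")
    if protein[position - 1] != original:
        raise ValueError("original residue does not match the sequence")
    return protein[: position - 1] + replacement + protein[position:]
```

For example, applying "H3A" to the toy sequence "MKH" yields "MKA"; the check on the original residue guards against numbering a mutation relative to the wrong reference sequence.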
The term "PE-spCas9" refers to a fusion protein produced by fusion of spCas9 (H840A) with reverse transcriptase MLV RT.
As used herein, the term "flap" refers to a free nucleic acid fragment attached at the 3' end of a nick created by cleavage of one strand of a double-stranded target nucleic acid; the fragment is not complementary to the corresponding nucleotide fragment of the other strand and is therefore in a free state. A "homologous flap" refers to a flap sequence formed on a double-stranded target nucleic acid that is identical or complementary to the terminal sequence at a specific cleavage site on the genome. In some embodiments, the flap may be obtained as follows: after a Cas protein breaks one strand of a double-stranded target nucleic acid (e.g., a donor vector containing a nucleic acid sequence of interest), the 3' end at the break of the broken nucleic acid strand is extended using a template sequence (e.g., of a pegRNA) annealed to the broken strand as a template, forming a free nucleic acid fragment.
As used herein, the term "homologous recombination (HDR, homology-directed repair)" refers to a DNA recombination process that proceeds based on sequence homology of a nucleic acid sequence upstream and/or downstream of a nucleic acid sequence of interest in a construct (e.g., a donor nucleic acid vector) to a nucleic acid sequence upstream and/or downstream of a target site in a genome or nucleic acid fragment. The nucleic acid sequence upstream and/or downstream of the nucleic acid sequence of interest in the donor nucleic acid vector is referred to herein as a "donor homology arm". The nucleic acid sequence upstream and/or downstream of the target site in the genome or nucleic acid fragment is referred to herein as a "target site homology arm".
In certain embodiments, the donor homology arm is identical or highly homologous to the target site homology arm, i.e., has at least 85%, 90%, 95%, 98%, or 100% sequence identity.
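By way of non-limiting illustration, the sequence identity between a donor homology arm and a target site homology arm can be checked against such a threshold with a minimal Python sketch. The function names are hypothetical, and the comparison assumes two pre-aligned arms of equal length without gaps; an alignment program would be used where insertions or deletions are present.

```python
def percent_identity(arm_a: str, arm_b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    a, b = arm_a.upper(), arm_b.upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def is_highly_homologous(donor_arm: str, target_arm: str, threshold: float = 85.0) -> bool:
    """Return True if the arms meet the identity threshold (default 85%)."""
    return percent_identity(donor_arm, target_arm) >= threshold
```

The default threshold corresponds to the lowest identity value recited above; a stricter value such as 95 or 98 can be passed where a closer match is desired.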
In certain embodiments, the donor homology arm is located upstream of the nucleic acid sequence of interest, and the target site homology arm is located upstream of the target site. In certain embodiments, the donor homology arm is located downstream of the nucleic acid sequence of interest, and the target site homology arm is located downstream of the target site.
In a first aspect, the present application provides a system or kit comprising the following four components:
(1) A first Cas protein or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, wherein the first Cas protein is capable of cleaving or breaking one nucleic acid strand of a first double-stranded target nucleic acid;
(2) A template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding said first DNA polymerase;
(3) A first gRNA or a nucleic acid molecule C1 that contains a nucleotide sequence encoding the first gRNA, wherein the first gRNA is capable of binding to the first Cas protein and forming a first functional complex; the first functional complex is capable of breaking one nucleic acid strand of a first double-stranded target nucleic acid;
(4) A first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding said first tag primer, wherein said first tag primer comprises a first tag sequence and a first target binding sequence, said first tag sequence being located upstream or 5' of said first target binding sequence; and, under conditions that allow hybridization or annealing of nucleic acids, the first target binding sequence is capable of hybridizing or annealing to the 3' end of the broken nucleic acid strand to form a double-stranded structure, while the first tag sequence does not bind to the nucleic acid strand and remains in a free single-stranded state.
In certain embodiments, the first Cas protein is selected from Cas proteins that cleave a single strand of DNA; for example, the single strand cleaved is the DNA strand that is not targeted for binding by the gRNA.
In certain embodiments, the first Cas protein is selected from the group consisting of Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, Csx1 protein, Csx2 protein, Csx15 protein, and mutants or variants of any of the foregoing.
In certain embodiments, the first Cas protein is capable of cleaving one nucleic acid strand of a first double-stranded target nucleic acid and creating a nick.
In certain embodiments, the first Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (spCas9 (H840A)).
In certain embodiments, the first Cas protein has the amino acid sequence set forth in SEQ ID No. 3.
The sequences and structures of various Cas proteins are well known to those skilled in the art. Currently, Cas9 proteins and their homologs have been reported in a variety of species, including but not limited to Streptococcus pyogenes and Streptococcus thermophilus. Other suitable Cas9 proteins will be apparent to those of skill in the art based on the present disclosure, e.g., the Cas9 proteins disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737 (the entire contents of which are incorporated herein by reference).
In some embodiments, the Cas9 is a Cas9 from one of the following species: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Streptococcus pyogenes (NCBI Ref: NC_017053.1).
In certain embodiments, the first DNA polymerase is selected from, but is not limited to, a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
In certain embodiments, the first DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the first DNA polymerase is a reverse transcriptase, such as the reverse transcriptase listed above, e.g., a reverse transcriptase of moloney murine leukemia virus.
In certain embodiments, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7.
In certain embodiments, the first Cas protein is linked to the first DNA polymerase.
In certain embodiments, the first Cas protein is covalently linked to the first DNA polymerase with or without a linker.
In certain embodiments, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35.
In certain embodiments, the first Cas protein is fused to the first DNA polymerase with or without a peptide linker, forming a first fusion protein.
In certain embodiments, the first Cas protein is optionally linked or fused to the N-terminus of the first DNA polymerase by a linker; alternatively, the first Cas protein is optionally linked or fused to the C-terminus of the first DNA polymerase by a linker.
In certain embodiments, the first fusion protein has the amino acid sequence set forth in SEQ ID NO. 8.
In some embodiments, the linker is a peptide linker. In some embodiments, the peptide linker is 5-200 amino acids in length, e.g., 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length.
In certain embodiments, the first fusion protein or the first Cas protein may be split into two portions by a split intein system. It will be readily appreciated that the split intein system may split the first fusion protein or the first Cas protein at any amino acid position. For example, in certain embodiments, the split site lies inside the first Cas protein. Thus, in certain embodiments, the first Cas protein is split into an N-terminal segment and a C-terminal segment. For example, the N-terminal and C-terminal segments of the first Cas protein may be fused to the N-terminal and C-terminal segments of the intein, respectively (or to the C-terminal and N-terminal segments of the intein, respectively), and the two fusions are capable of reconstituting the active first Cas protein within the cell. In certain embodiments, the N-terminal and C-terminal segments of the first Cas protein are each inactive in an isolated state, but are capable of reconstituting an active first Cas protein within a cell. Accordingly, in certain embodiments, the nucleic acid molecule A1 may be split into two portions comprising nucleotide sequences encoding the N-terminal and C-terminal segments, respectively, of the first Cas protein. Furthermore, it is easy to understand that in the first fusion protein, the first DNA polymerase may be fused to the N-terminal segment or the C-terminal segment of the first Cas protein. In certain embodiments, the first DNA polymerase is fused to the C-terminal segment of the first Cas protein.
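By way of non-limiting illustration, the bookkeeping of splitting a protein into two intein-fused halves and reconstituting it can be modeled with a minimal Python sketch. This is a string-level toy model only: the class, the function names, and the placeholder intein segments ("INTN", "INTC") are hypothetical, and no splicing chemistry or protein activity is simulated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitHalf:
    """One half of a split protein fused to one intein segment."""
    protein_segment: str  # N-terminal or C-terminal segment of the target protein
    intein_segment: str   # intein segment fused to it (removed on splicing)
    side: str             # "N" or "C"


def split_protein(protein: str, position: int, intein_n: str, intein_c: str):
    """Split `protein` after `position` residues and fuse each part to an intein segment."""
    if not 0 < position < len(protein):
        raise ValueError("split position must fall inside the protein")
    n_half = SplitHalf(protein[:position], intein_n, "N")
    c_half = SplitHalf(protein[position:], intein_c, "C")
    return n_half, c_half


def trans_splice(n_half: SplitHalf, c_half: SplitHalf) -> str:
    """Model splicing: both halves must be present; the intein segments are removed."""
    if n_half.side != "N" or c_half.side != "C":
        raise ValueError("one N-terminal half and one C-terminal half are required")
    return n_half.protein_segment + c_half.protein_segment
```

In this model, either half on its own carries only part of the protein sequence, and the full-length sequence is recovered only when an N-terminal half and a C-terminal half are combined.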
In certain embodiments, the first gRNA contains a first guide sequence and the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a first double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acids.
In certain embodiments, the first guide sequence has a length of at least 5nt, such as 5-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or more.
In certain embodiments, the first gRNA further contains a first scaffold sequence that is capable of being recognized and bound by the first Cas protein, thereby forming a first functional complex.
In certain embodiments, the first scaffold sequence has a length of at least 20nt, such as 20-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or more.
In certain embodiments, the first guide sequence is located upstream or 5' of the first scaffold sequence.
In certain embodiments, the first functional complex is capable of cleaving one nucleic acid strand (the first strand) of the first double-stranded target nucleic acid after the first guide sequence binds to the other nucleic acid strand (the second strand) of the first double-stranded target nucleic acid.
In certain embodiments, the first target binding sequence is capable of hybridizing or annealing to the 3' end of one nucleic acid strand of the broken target nucleic acid under conditions that allow for hybridization or annealing of the nucleic acids, the 3' end being formed as a result of the first functional complex breaking the first double-stranded target nucleic acid.
In certain embodiments, the first target-binding sequence has a length of at least 5nt, such as 5-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer.
In certain embodiments, the first tag sequence has a length of at least 4nt, such as 4-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or more.
In certain embodiments, after the first target binding sequence hybridizes or anneals to the 3' end of one nucleic acid strand of the broken target nucleic acid, the first DNA polymerase is capable of extending the 3' end of that nucleic acid strand using the first tag primer as a template. In certain embodiments, the extension forms a first flap.
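By way of non-limiting illustration, the extension step can be sketched in Python at the sequence level. The function names and example sequences are hypothetical; all sequences are written 5' to 3', the tag primer is taken to be the tag sequence followed by the target binding sequence, and perfect complementarity at the annealed region is assumed.

```python
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement expressed as DNA; RNA 'U' is read as 'T'."""
    dna = seq.upper().replace("U", "T")
    return "".join(_PAIR[base] for base in reversed(dna))


def extend_flap(broken_strand: str, tag_sequence: str, target_binding_sequence: str):
    """Return (extended_strand, flap) after template-directed extension.

    The target binding sequence must anneal to the 3' end of the broken
    strand. The polymerase then copies the tag sequence, which lies 5' of the
    target binding sequence on the primer, so the newly synthesized flap is
    the reverse complement of the tag sequence.
    """
    strand = broken_strand.upper()
    annealed_region = reverse_complement(target_binding_sequence)
    if not strand.endswith(annealed_region):
        raise ValueError("target binding sequence does not anneal to the 3' end")
    flap = reverse_complement(tag_sequence)
    return strand + flap, flap
```

For instance, with the broken strand "GGATCCAGCT", the RNA target binding sequence "AGCUG", and the RNA tag sequence "UUAC", the sketch appends the DNA flap "GTAA" to the 3' end of the broken strand.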
In certain embodiments, the first tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid.
In certain embodiments, the first tag primer is a single-stranded ribonucleic acid and the first DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the first tag primer is a single stranded deoxyribonucleic acid and the first DNA polymerase is a DNA-dependent DNA polymerase.
In certain embodiments, the nucleic acid strand bound by the first guide sequence is different from the nucleic acid strand bound by the first target binding sequence. In certain embodiments, the first guide sequence-bound nucleic acid strand is the opposite strand of the first target binding sequence-bound nucleic acid strand.
In certain embodiments, the first tag primer is linked to the first gRNA.
In certain embodiments, the first tag primer is covalently linked to the first gRNA with or without a linker.
In certain embodiments, the first tag primer is attached to the 3' end of the first gRNA, optionally through a linker.
In certain embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In certain embodiments, the first tag primer is a single stranded ribonucleic acid and is linked to the 3' end of the first gRNA with or without a ribonucleic acid linker, forming a first PegRNA.
In certain embodiments, the nucleic acid molecule A1 is capable of expressing the first Cas protein in a cell. In certain embodiments, the nucleic acid molecule B1 is capable of expressing the first DNA polymerase in a cell. In certain embodiments, the nucleic acid molecule C1 is capable of transcribing the first gRNA in a cell. In certain embodiments, the nucleic acid molecule D1 is capable of transcribing the first tag primer in a cell.
In certain embodiments, the nucleic acid molecule A1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first Cas protein.
In certain embodiments, the nucleic acid molecule B1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first DNA polymerase.
In certain embodiments, the nucleic acid molecule C1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first gRNA.
In certain embodiments, the nucleic acid molecule D1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first tag primer.
In certain embodiments, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing, in a cell, the first Cas protein and the first DNA polymerase as separate proteins, or a first fusion protein comprising the first Cas protein and the first DNA polymerase.
In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g., a eukaryotic expression vector); in certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell.
In certain embodiments, two, three, or four of the nucleic acid molecules A1, B1, C1, and D1 are contained in the same expression vector (e.g., a eukaryotic expression vector).
In certain embodiments, the system or kit comprises:
(M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or, (M1-2) the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase; and
(M2) a first PegRNA comprising the first gRNA and a first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA.
In certain embodiments, the system or kit further comprises:
(5) A second nucleic acid editing system, the second nucleic acid editing system being a homologous recombination technique.
In certain embodiments, the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector).
In certain embodiments, the nucleic acid vector further comprises a first PAM sequence recognized by the first Cas protein, and/or a donor homology arm.
In certain embodiments, the nucleic acid vector is double-stranded.
In certain embodiments, the nucleic acid vector is a circular double stranded vector.
In certain embodiments, the nucleic acid vector comprises a first guide binding sequence (e.g., a complement of the first guide sequence) capable of hybridizing or annealing to the first guide sequence.
In certain embodiments, the first functional complex is capable of cleaving one nucleic acid strand of the nucleic acid vector through the first guide binding sequence and the first PAM sequence.
In certain embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In certain embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site of the genome.
In certain embodiments, the first PAM sequence and the donor homology arm flank the nucleic acid sequence of interest, respectively.
In certain embodiments, the first guide binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In certain embodiments, the first functional complex breaks a first strand of the nucleic acid vector, the first strand comprising a nick resulting from the break, and the portion of the double strand located between the 3' end of the nick and the donor homology arm comprises the nucleic acid sequence of interest, referred to as a target nucleic acid fragment comprising the nucleic acid sequence of interest.
In certain embodiments, the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, while the first tag sequence of the first tag primer remains in a free state. In certain embodiments, the nucleic acid strand hybridized or annealed to the first target binding sequence is the opposite strand of the nucleic acid strand comprising the first guide binding sequence.
In certain embodiments, the nucleic acid vector further comprises a first target sequence; wherein the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the first tag sequence of the first tag primer is in a free state; preferably, the first target sequence is located on the opposite strand of the first guide binding sequence. In certain embodiments, the first target sequence is located at the end of the cleaved first strand. In certain embodiments, after cleavage of the first strand by the first functional complex, the 3' end of the nucleic acid strand comprising the first target sequence is capable of being extended (in certain embodiments, forming a first flap) using the first tag primer annealed to the first target sequence as a template.
In certain embodiments, the nucleic acid vector further comprises a restriction enzyme site between the first target sequence and the donor homology arm.
In certain embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arm.
In certain embodiments, the system or kit further comprises:
(5) A second gRNA or a nucleic acid molecule C2 that contains a nucleotide sequence encoding the second gRNA, wherein the second gRNA is capable of binding to a second Cas protein and forming a second functional complex; the second functional complex is capable of fragmenting one nucleic acid strand of a second double-stranded target nucleic acid.
In certain embodiments, the second Cas protein is the same as or different from the first Cas protein. In certain embodiments, the second Cas protein is the same as the first Cas protein.
In certain embodiments, the second gRNA contains a second guide sequence and the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a second double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acids.
In certain embodiments, the second functional complex breaks one strand (the first strand) of the second double-stranded target nucleic acid after the second guide sequence binds to the other strand (the second strand) of the second double-stranded target nucleic acid. In certain embodiments, the second guide sequence is different from the first guide sequence.
In certain embodiments, the second double stranded target nucleic acid is the same as or different from the first double stranded target nucleic acid.
In certain embodiments, the second double stranded target nucleic acid is the same as the first double stranded target nucleic acid, and the second functional complex breaks a different nucleic acid strand of the double stranded target nucleic acid at a different location than the first functional complex.
In certain embodiments, the second functional complex breaks a different nucleic acid strand of the same double-stranded target nucleic acid than the first functional complex, and the nucleic acid strand bound by the first guide sequence is different from the nucleic acid strand bound by the second guide sequence. In certain embodiments, the first guide sequence-bound nucleic acid strand is the opposite strand of the second guide sequence-bound nucleic acid strand.
In certain embodiments, the second double-stranded target nucleic acid is the same double-stranded target nucleic acid as the first double-stranded target nucleic acid, the double-stranded target nucleic acid comprising a first strand and a second strand, the first functional complex being capable of cleaving the first strand after the first guide sequence is bound to the second strand, and the second functional complex being capable of cleaving the second strand after the second guide sequence is bound to the first strand. In certain embodiments, the second guide sequence has a length of at least 5 nt, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or more.
In certain embodiments, the second gRNA further contains a second scaffold sequence that is capable of being recognized and bound by the second Cas protein, thereby forming a second functional complex.
In certain embodiments, the second scaffold sequence has a length of at least 20 nt, such as 20-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or more.
In certain embodiments, the second scaffold sequence is the same as or different from the first scaffold sequence. In certain embodiments, the second scaffold sequence is identical to the first scaffold sequence.
In certain embodiments, the second guide sequence is located upstream or 5' of the second scaffold sequence.
In certain embodiments, the nucleic acid molecule C2 is capable of transcribing the second gRNA in a cell.
In certain embodiments, the nucleic acid molecule C2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second gRNA.
In certain embodiments, the second Cas protein is different from the first Cas protein; and, the system or kit further comprises:
(6) The second Cas protein or a nucleic acid molecule A2 comprising a nucleotide sequence encoding the second Cas protein, wherein the second Cas protein is capable of cleaving or breaking one nucleic acid strand of a second double-stranded target nucleic acid.
In certain embodiments, the second Cas protein is capable of cleaving one nucleic acid strand of the second double-stranded target nucleic acid and making a nick.
In certain embodiments, the second Cas protein is selected from Cas proteins that cleave a single strand of DNA; for example, the cleaved strand is the DNA strand that is not bound by the gRNA.
In certain embodiments, the second Cas protein is selected from the group consisting of Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx16 protein, Csx 2 protein, Csx1 protein, Csx3 protein, a mutant of any of these proteins (e.g., spCas9 (H840A), saCas9 (R1226A)), a homolog of the mutant, or a modified form of the mutant.
In certain embodiments, the second Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (spCas9 (H840A)).
In certain embodiments, the second Cas protein has the amino acid sequence set forth in SEQ ID No. 3.
In certain embodiments, the nucleic acid molecule A2 is capable of expressing the second Cas protein in a cell.
In certain embodiments, the nucleic acid molecule A2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second Cas protein.
In certain embodiments, the system or kit further comprises:
(7) A second tag primer or a nucleic acid molecule D2 comprising a nucleotide sequence encoding said second tag primer, wherein said second tag primer comprises a second tag sequence and a second target binding sequence, said second tag sequence being located upstream or 5' of said second target binding sequence; and, under conditions that allow hybridization or annealing of nucleic acids, the second target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand to form a double-stranded structure, while the second tag sequence does not bind to the nucleic acid strand and remains in a free single-stranded state.
In certain embodiments, the second target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand under conditions that allow hybridization or annealing of nucleic acids, and the 3' end is formed as a result of fragmentation of the nucleic acid strand by the second functional complex.
In certain embodiments, the second target binding sequence is at least 5 nt in length, such as 5-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or longer.
In certain embodiments, the second target binding sequence is different from the first target binding sequence. In certain embodiments, the nucleic acid strand bound by the second target binding sequence is different from the nucleic acid strand bound by the first target binding sequence. In certain embodiments, the nucleic acid strand bound by the second target binding sequence is the opposite strand of the nucleic acid strand bound by the first target binding sequence.
In certain embodiments, the second tag sequence has a length of at least 4 nt, such as 4-10 nt, 10-15 nt, 15-20 nt, 20-25 nt, 25-30 nt, 30-40 nt, 40-50 nt, 50-100 nt, 100-200 nt, or more.
In certain embodiments, the second tag sequence is the same as or different from the first tag sequence; in certain embodiments, the second tag sequence is different from the first tag sequence.
In certain embodiments, after hybridization or annealing of the second target binding sequence to the 3' end of the fragmented nucleic acid strand, the second DNA polymerase can extend the 3' end of the nucleic acid strand using the second tag primer as a template. In certain embodiments, the extension forms a second flap.
In certain embodiments, the second DNA polymerase is the same as or different from the first DNA polymerase. In certain embodiments, the second DNA polymerase is the same as the first DNA polymerase.
In certain embodiments, the second tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid.
In certain embodiments, the second tag primer is a single-stranded ribonucleic acid and the second DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the second tag primer is a single stranded deoxyribonucleic acid and the second DNA polymerase is a DNA-dependent DNA polymerase.
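The templated extension and the primer/polymerase pairing described above can be illustrated with a minimal Python sketch. It is not part of the disclosure: the function and class names are hypothetical, sequences are plain 5'-to-3' strings, and annealing is modeled as exact complementarity, which is a simplifying assumption.

```python
from enum import Enum


class PrimerKind(Enum):
    RNA = "single-stranded RNA"
    DNA = "single-stranded DNA"


class PolymeraseKind(Enum):
    RNA_DEPENDENT = "RNA-dependent DNA polymerase"
    DNA_DEPENDENT = "DNA-dependent DNA polymerase"


# Pairing stated in the text: RNA primer with an RNA-dependent DNA polymerase,
# DNA primer with a DNA-dependent DNA polymerase.
REQUIRED_POLYMERASE = {
    PrimerKind.RNA: PolymeraseKind.RNA_DEPENDENT,
    PrimerKind.DNA: PolymeraseKind.DNA_DEPENDENT,
}

# Maps each DNA or RNA base to its DNA complement.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    """Return the DNA reverse complement (5'->3') of a DNA or RNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extend_nicked_strand(nicked_strand, tag_seq, target_binding_seq,
                         primer_kind, polymerase_kind):
    """Extend the 3' end of a nicked strand using a tag primer as template.

    The tag primer is tag_seq followed by target_binding_seq (5'->3').
    Returns (extended_strand, flap), both as 5'->3' DNA strings.
    """
    if REQUIRED_POLYMERASE[primer_kind] is not polymerase_kind:
        raise ValueError("polymerase does not match the tag primer type")
    strand = nicked_strand.upper()
    # Assumption: annealing requires exact complementarity at the 3' end.
    anneal_site = reverse_complement_dna(target_binding_seq)
    if not strand.endswith(anneal_site):
        raise ValueError("target binding sequence does not anneal to the 3' end")
    # The polymerase copies the free tag sequence, so the new DNA (the flap)
    # is the reverse complement of the tag sequence.
    flap = reverse_complement_dna(tag_seq)
    return strand + flap, flap
```

For instance, calling `extend_nicked_strand("CCCTGCAA", "AAGC", "UUGCA", PrimerKind.RNA, PolymeraseKind.RNA_DEPENDENT)` models an RNA tag primer copied by a reverse transcriptase; the toy sequences are invented for illustration only.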
In certain embodiments, the nucleic acid strand bound by the second guide sequence is different from the nucleic acid strand bound by the second target binding sequence. In certain embodiments, the second guide sequence-bound nucleic acid strand is the opposite strand of the second target binding sequence-bound nucleic acid strand.
In certain embodiments, the second guide sequence binds to the same nucleic acid strand as the first target binding sequence, and the binding site of the second guide sequence is located upstream or 5' of the binding site of the first target binding sequence.
In certain embodiments, the first guide sequence binds to the same nucleic acid strand as the second target binding sequence, and the binding site of the first guide sequence is located upstream or 5' of the binding site of the second target binding sequence.
In certain embodiments, the first and second flaps are contained on the same double-stranded target nucleic acid and are located on opposite nucleic acid strands from each other.
In certain embodiments, the nucleic acid molecule D2 is capable of transcribing the second tag primer in a cell.
In certain embodiments, the nucleic acid molecule D2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second tag primer.
In certain embodiments, the second DNA polymerase is different from the first DNA polymerase; and, the system or kit further comprises:
(8) The second DNA polymerase or a nucleic acid molecule B2 comprising a nucleotide sequence encoding the second DNA polymerase.
In certain embodiments, the second DNA polymerase is selected from, but is not limited to, a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
In certain embodiments, the second DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the second DNA polymerase is a reverse transcriptase, such as the reverse transcriptase listed above, e.g., a reverse transcriptase of moloney murine leukemia virus.
In certain embodiments, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7.
In certain embodiments, the nucleic acid molecule B2 is capable of expressing the second DNA polymerase in a cell.
In certain embodiments, the nucleic acid molecule B2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second DNA polymerase.
In certain embodiments, the second tag primer is linked to the second gRNA.
In certain embodiments, the second tag primer is covalently linked to the second gRNA with or without a linker.
In certain embodiments, the second tag primer is attached to the 3' end of the second gRNA, optionally through a linker.
In certain embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In certain embodiments, the second tag primer is a single stranded ribonucleic acid and is linked to the 3' end of the second gRNA, either with or without a ribonucleic acid linker, forming a second PegRNA.
In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell.
In certain embodiments, the system or kit comprises: a second PegRNA comprising the second gRNA and the second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.
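The arrangement of a PegRNA described above (guide sequence 5' of the scaffold, tag primer attached to the 3' end of the gRNA with an optional linker, tag sequence 5' of the target binding sequence) can be sketched as follows. This is an illustrative sketch only: the class name and field names are hypothetical, and the minimum lengths are the lower bounds stated for the second gRNA and second tag primer.

```python
from dataclasses import dataclass

# Lower bounds taken from the length ranges stated in the text.
MIN_GUIDE_NT = 5
MIN_SCAFFOLD_NT = 20
MIN_TARGET_BINDING_NT = 5
MIN_TAG_NT = 4


@dataclass(frozen=True)
class PegRNADesign:
    """Hypothetical container for the parts of a PegRNA (5'->3' RNA strings)."""
    guide: str
    scaffold: str
    tag: str
    target_binding: str
    linker: str = ""  # optional ribonucleic acid linker; empty means no linker

    def validate(self) -> None:
        checks = [
            ("guide", self.guide, MIN_GUIDE_NT),
            ("scaffold", self.scaffold, MIN_SCAFFOLD_NT),
            ("tag", self.tag, MIN_TAG_NT),
            ("target binding", self.target_binding, MIN_TARGET_BINDING_NT),
        ]
        for name, seq, minimum in checks:
            if len(seq) < minimum:
                raise ValueError(f"{name} sequence is shorter than {minimum} nt")

    def assemble(self) -> str:
        """Concatenate the parts in the 5'->3' order described in the text."""
        self.validate()
        grna = self.guide + self.scaffold          # guide is 5' of the scaffold
        tag_primer = self.tag + self.target_binding  # tag is 5' of target binding
        return grna + self.linker + tag_primer     # primer sits at the gRNA 3' end
```

A design is then assembled with `PegRNADesign(guide=..., scaffold=..., tag=..., target_binding=...).assemble()`; whether a given scaffold is actually recognized by a particular Cas protein is outside the scope of this sketch.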
In certain embodiments, the second Cas protein is separate from, or linked to, the second DNA polymerase.
In certain embodiments, the second Cas protein is covalently linked to the second DNA polymerase, with or without a linker.
In certain embodiments, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35.
In certain embodiments, the second Cas protein is fused to the second DNA polymerase with or without a peptide linker, forming a second fusion protein.
In certain embodiments, the second Cas protein is linked or fused to the N-terminus of the second DNA polymerase, optionally through a linker; alternatively, the second Cas protein is linked or fused to the C-terminus of the second DNA polymerase, optionally through a linker.
In certain embodiments, the second fusion protein has the amino acid sequence set forth in SEQ ID NO. 8.
In certain embodiments, the second fusion protein or the second Cas protein may be split into two portions by an intein splitting system. It will be readily appreciated that the intein splitting system may split the second fusion protein or the second Cas protein at any amino acid position. For example, in certain embodiments, the split lies inside the second Cas protein. Thus, in certain embodiments, the second Cas protein is split into an N-terminal segment and a C-terminal segment. For example, the N-terminal and C-terminal segments of the second Cas protein may be fused to the N-terminal and C-terminal segments of the intein, respectively (or to the C-terminal and N-terminal segments of the intein, respectively), and the two are capable of reconstituting the active second Cas protein within the cell. In certain embodiments, the N-terminal and C-terminal segments of the second Cas protein are each inactive in an isolated state, but are capable of reconstituting an active second Cas protein within a cell. Accordingly, in certain embodiments, the nucleic acid molecule A2 may be split into two portions comprising nucleotide sequences encoding the N-terminal and C-terminal segments, respectively, of the second Cas protein. Furthermore, it is readily understood that in the second fusion protein, the second DNA polymerase may be fused to the N-terminal or C-terminal segment of the second Cas protein. In certain embodiments, the second DNA polymerase is fused to the C-terminal segment of the second Cas protein.
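The split-and-reconstitute idea in the preceding paragraph can be illustrated at the level of sequence strings. The following Python sketch is only a toy model under stated assumptions: the function names are hypothetical, the intein halves are arbitrary placeholder strings rather than real intein sequences, and the junction-residue requirements of real inteins are not modeled.

```python
def split_for_intein(protein: str, split_index: int,
                     intein_n: str, intein_c: str):
    """Split a protein and fuse each segment to one intein half.

    Returns (n_half, c_half): the N-terminal segment followed by the intein
    N-terminal half, and the intein C-terminal half followed by the
    C-terminal segment.
    """
    if not intein_n or not intein_c:
        raise ValueError("both intein halves must be non-empty")
    if not 0 < split_index < len(protein):
        raise ValueError("split position must lie inside the protein")
    n_segment, c_segment = protein[:split_index], protein[split_index:]
    return n_segment + intein_n, intein_c + c_segment


def trans_splice(n_half: str, c_half: str, intein_n: str, intein_c: str) -> str:
    """Model reconstitution: excise both intein halves and join the segments."""
    if not n_half.endswith(intein_n) or not c_half.startswith(intein_c):
        raise ValueError("halves do not carry the expected intein segments")
    n_segment = n_half[:len(n_half) - len(intein_n)]
    c_segment = c_half[len(intein_c):]
    return n_segment + c_segment
```

In this model each half alone lacks part of the protein, while `trans_splice` restores the original sequence, mirroring the statement that the two segments are inactive in isolation but reconstitute the active protein within a cell.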
In certain embodiments, the nucleic acid molecule A2 and the nucleic acid molecule B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A2 and nucleic acid molecule B2 are capable of expressing the isolated second Cas protein and the second DNA polymerase, or are capable of expressing a second fusion protein comprising the second Cas protein and the second DNA polymerase, in a cell.
In certain embodiments, the system or kit comprises a second fusion protein comprising the second Cas protein and the second DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the second fusion protein; alternatively, the system or kit comprises the isolated second Cas protein and second DNA polymerase, or a nucleic acid molecule capable of expressing the isolated second Cas protein and second DNA polymerase.
In certain embodiments, the first and second Cas proteins are the same Cas protein and the first and second DNA polymerases are the same DNA polymerase; and, the system or kit comprises:
(M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or, (M1-2) the isolated first Cas protein and first DNA polymerase, or a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase;
(M2) a first PegRNA comprising the first gRNA and a first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA;
(M3) a second PegRNA comprising the second gRNA and a second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.
In certain embodiments, the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector).
In certain embodiments, the nucleic acid vector further comprises a first PAM sequence recognized by the first Cas protein, and/or a second PAM sequence recognized by the second Cas protein.
In certain embodiments, the nucleic acid vector is double-stranded.
In certain embodiments, the nucleic acid vector is a circular double stranded vector.
In certain embodiments, the nucleic acid vector comprises a first guide binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., a complement of the first guide sequence), and/or a second guide binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., a complement of the second guide sequence); optionally, the nucleic acid vector further comprises a restriction enzyme site between the first and second guide binding sequences.
In certain embodiments, the first and second guide binding sequences are located on opposite strands of the nucleic acid vector.
In certain embodiments, the first functional complex is capable of cleaving one nucleic acid strand (first strand) of the nucleic acid vector through the first guide binding sequence and the first PAM sequence; and/or, the second functional complex is capable of cleaving another nucleic acid strand (second strand) of the nucleic acid vector through the second guide binding sequence and the second PAM sequence.
In certain embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In certain embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site of the genome.
In certain embodiments, the first PAM sequence and the second PAM sequence flank the nucleic acid sequence of interest, respectively.
In certain embodiments, the first guide binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In certain embodiments, the second guide binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence.
In certain embodiments, the first functional complex and the second functional complex cleave a first strand and a second strand, respectively, of the nucleic acid vector, the first strand and the second strand comprising a nick resulting from the cleavage, respectively, and a double-stranded portion located between the 3' ends of the two nicks comprises a nucleic acid sequence of interest, referred to as a target nucleic acid fragment comprising the nucleic acid sequence of interest.
In certain embodiments, the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, while the first tag sequence of the first tag primer remains in a free state. In certain embodiments, the nucleic acid strand hybridized or annealed to the first target binding sequence is the opposite strand of the nucleic acid strand comprising the first guide binding sequence.
In certain embodiments, the second tag primer is capable of hybridizing or annealing, via the second target binding sequence, to the 3' end of the nucleic acid strand cleaved by the second functional complex under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, while the second tag sequence of the second tag primer remains in a free state. In certain embodiments, the nucleic acid strand hybridized or annealed to the second target binding sequence is the opposite strand of the nucleic acid strand comprising the second guide binding sequence.
In certain embodiments, the nucleic acid strand hybridized or annealed to the first target binding sequence is an opposite strand of the nucleic acid strand hybridized or annealed to the second target binding sequence.
In certain embodiments, the nucleic acid vector further comprises a first target sequence; wherein the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the first tag sequence of the first tag primer is in a free state. In certain embodiments, the first target sequence is located on the opposite strand of the first guide binding sequence. In certain embodiments, the first target sequence is located at the end of the cleaved first strand. In certain embodiments, after cleavage of the first strand by the first functional complex, the 3' end of the nucleic acid strand comprising the first target sequence can be extended (preferably, forming a first flap) using the first tag primer annealed to the first target sequence as a template.
And/or,
the nucleic acid vector further comprises a second target sequence; wherein the second tag primer is capable of hybridizing or annealing to the second target sequence through the second target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the second tag sequence of the second tag primer is in a free state. In certain embodiments, the second target sequence is located on the opposite strand of the second guide binding sequence. In certain embodiments, the second target sequence is located at the end of the cleaved second strand. In certain embodiments, after cleavage of the second strand by the second functional complex, the 3' end of the nucleic acid strand comprising the second target sequence can be extended (preferably, forming a second flap) using the second tag primer annealed to the second target sequence as a template.
In certain embodiments, the nucleic acid strand comprising the first target sequence is located on the opposite strand of the nucleic acid strand comprising the second target sequence.
In certain embodiments, the nucleic acid vector further comprises a restriction enzyme site between the first target sequence and the second target sequence.
In certain embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.
In certain embodiments, the system or kit further comprises:
(9) A third nucleic acid editing system for double strand breaking a third double stranded target nucleic acid.
In certain embodiments, the third nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system is capable of fragmenting both strands of a third double-stranded target nucleic acid to form fragmented nucleotide fragments a1 and a2.
In certain embodiments, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 under conditions that allow hybridization or annealing of nucleic acids.
In certain embodiments, the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the cleaved nucleotide fragment a1 at the end formed by cleavage of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, and the 3' end or 3' portion is formed as a result of the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the fragmented nucleotide fragment a2 contains a target site homology arm with at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the donor homology arm.
In certain embodiments, the target site homology arm is located upstream of the third double-stranded target nucleic acid break and the donor homology arm is located upstream of the nucleic acid sequence of interest; alternatively, the target site homology arm is located downstream of the third double stranded target nucleic acid break and the donor homology arm is located downstream of the nucleic acid sequence of interest.
In certain embodiments, the donor homology arm and the target site homology arm are each independently 100 to 300 bp, 300 to 500 bp, 500 to 1000 bp, 1000 to 2000 bp, or 2000 to 5000 bp in length.
In certain embodiments, the sequence of the target site homology arm is selected from an exon sequence, an intron sequence, an intergenic sequence, a 3' UTR sequence, a 5' UTR sequence, a promoter sequence, or a chromosomal sequence.
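The homology-arm criteria above (at least 85% sequence identity between the target site homology arm and the donor homology arm, and arm lengths between 100 and 5000 bp) can be checked with a short Python sketch. The text does not say how identity is computed, so this sketch assumes two pre-aligned, ungapped arms of equal length; the function names are hypothetical.

```python
MIN_IDENTITY_PERCENT = 85.0        # lowest identity value listed in the text
MIN_ARM_BP, MAX_ARM_BP = 100, 5000  # overall span of the listed length ranges


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two equal-length sequences.

    Assumption: the sequences are already aligned without gaps.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def arms_are_compatible(donor_arm: str, target_site_arm: str) -> bool:
    """Check the length range of both arms and the identity threshold."""
    for arm in (donor_arm, target_site_arm):
        if not MIN_ARM_BP <= len(arm) <= MAX_ARM_BP:
            return False
    return percent_identity(donor_arm, target_site_arm) >= MIN_IDENTITY_PERCENT
```

Arms of unequal length would need a real alignment step (for example, a pairwise aligner) before an identity figure is meaningful; that step is deliberately left out here.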
In certain embodiments, the system or kit further comprises:
(9) A third nucleic acid editing system for double strand breaking a third double stranded target nucleic acid.
In certain embodiments, the third nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system is capable of fragmenting both strands of a third double-stranded target nucleic acid to form fragmented nucleotide fragments a1 and a2.
In certain embodiments, the first tag sequence or its complement, or the first flap, is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 under conditions that allow hybridization or annealing of nucleic acids.
In certain embodiments, the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the cleaved nucleotide fragment a1 at the end formed by cleavage of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, and the 3' end or 3' portion is formed as a result of the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the second tag sequence or its complement or the second flap is capable of hybridizing to or annealing to the fragmented nucleotide fragment a2 under conditions that allow hybridization or annealing of the nucleic acids.
In certain embodiments, the second tag sequence or its complement or the second flap is capable of hybridizing or annealing to the cleaved nucleotide fragment a2 at the end formed by cleavage of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the second tag sequence or the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a2, and the 3' end or 3' portion is formed as a result of the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the third nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system comprises: (i) A third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA; wherein the third gRNA is capable of binding to a third Cas protein and forming a third functional complex; the third functional complex is capable of cleaving both strands of a third double-stranded target nucleic acid to form cleaved nucleotide fragments a1 and a2.
In certain embodiments, the third Cas protein is selected from Cas proteins that cleave DNA double strands, such as Cas9 proteins.
In certain embodiments, the third gRNA has a sequence as set forth in any one of SEQ ID NOs 11, 38, 54, 67, 80, 93, 106, 119 or 132.
In certain embodiments, the gRNA is used to recognize the 3' UTR region of the GAPDH site, and the third gRNA has the sequence set forth in SEQ ID NO: 11. In certain embodiments, when the third double-stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO: 145, the third gRNA has a sequence as set forth in SEQ ID NO: 11.
In certain embodiments, the third gRNA has a sequence as set forth in SEQ ID No. 38 when the gRNA is used to recognize the first intron of the human genomic AAVS1 site. In certain embodiments, when the third double stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO. 146, the third gRNA has a sequence as set forth in SEQ ID NO. 38.
In certain embodiments, the gRNA is used to recognize the first intron of the genomic Rosa26 site, and the third gRNA has the sequence shown as SEQ ID NO: 54. In certain embodiments, when the third double-stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO: 147, the third gRNA has a sequence as set forth in SEQ ID NO: 54.
In certain embodiments, the gRNA is used to recognize the human genomic CCR5 site and the third gRNA has the sequence set forth in SEQ ID NO: 67. In certain embodiments, when the third double-stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO: 148, the third gRNA has a sequence as set forth in SEQ ID NO: 67.
In certain embodiments, the gRNA is used to recognize a human genomic TRAC site and the third gRNA has a sequence as set forth in SEQ ID NO. 80. In certain embodiments, when the third double stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO:149, the third gRNA has a sequence as set forth in SEQ ID NO: 80.
In certain embodiments, the gRNA is used to recognize WAS-1 sites and the third gRNA has a sequence as set forth in SEQ ID NO. 93. In certain embodiments, when the third double stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO. 150, the third gRNA has a sequence as set forth in SEQ ID NO. 93.
In certain embodiments, the gRNA is used to recognize WAS-3 sites and the third gRNA has the sequence shown as SEQ ID NO. 106. In certain embodiments, when the third double stranded target nucleic acid comprises the sequence set forth in SEQ ID NO. 151, the third gRNA has the sequence set forth in SEQ ID NO. 106.
In certain embodiments, the gRNA is used to recognize the HBB site and the third gRNA has the sequence shown as SEQ ID NO. 119. In certain embodiments, when the third double stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO. 152, the third gRNA has a sequence as set forth in SEQ ID NO. 119.
In certain embodiments, the gRNA is used to recognize the IL2RG site and the third gRNA has the sequence shown as SEQ ID NO. 132. In certain embodiments, when the third double stranded target nucleic acid comprises a sequence as set forth in SEQ ID NO. 153, the third gRNA has a sequence as set forth in SEQ ID NO. 132.
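The site-to-sequence pairings recited in the embodiments above can be collected into a lookup table. The following is a minimal sketch only: the dictionary name, the key strings, and the function `grna_for_target` are hypothetical names introduced for illustration, and only the SEQ ID NO pairings are taken from the text.

```python
# Pairings recited above: target site -> (third gRNA SEQ ID NO, target SEQ ID NO).
# The key strings and all identifiers here are illustrative, not part of the application.
THIRD_GRNA_BY_SITE = {
    "GAPDH 3' UTR": (11, 145),
    "AAVS1 intron 1": (38, 146),
    "Rosa26 intron 1": (54, 147),
    "CCR5": (67, 148),
    "TRAC": (80, 149),
    "WAS-1": (93, 150),
    "WAS-3": (106, 151),
    "HBB": (119, 152),
    "IL2RG": (132, 153),
}


def grna_for_target(target_seq_id: int) -> int:
    """Return the gRNA SEQ ID NO paired with a given target SEQ ID NO."""
    for grna_id, target_id in THIRD_GRNA_BY_SITE.values():
        if target_id == target_seq_id:
            return grna_id
    raise KeyError(f"no gRNA listed for target SEQ ID NO: {target_seq_id}")
```

For example, `grna_for_target(145)` looks up the gRNA recited for the GAPDH target.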
In certain embodiments, the system or kit further comprises:
(10) A fourth nucleic acid editing system for introducing a double-strand break into a fourth double-stranded target nucleic acid.
In certain embodiments, the fourth nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the third nucleic acid editing system and the fourth nucleic acid editing system are selected from the same site-specific nuclease technology.
In certain embodiments, the fourth double-stranded target nucleic acid is identical to the third double-stranded target nucleic acid, and the third and fourth nucleic acid editing systems cleave the same double-stranded target nucleic acid at different locations, forming cleaved nucleotide fragments a1, a2, and a3; wherein, prior to cleavage, in the same double-stranded target nucleic acid, the nucleotide fragments a1, a2 and a3 are arranged in sequence (i.e., the nucleotide fragment a1 is connected to the nucleotide fragment a3 by the nucleotide fragment a 2). In certain embodiments, the third and fourth nucleic acid editing systems result in the separation of nucleotide fragments a1 and a2 and the separation of nucleotide fragments a2 and a3, respectively.
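The arrangement of fragments a1, a2 and a3 produced by two cuts in the same target can be illustrated with a short sketch. It treats one strand as a plain string and assumes blunt cuts given as 0-based indices; the function name `split_at_cuts` and the example sequence are hypothetical.

```python
def split_at_cuts(target: str, third_cut: int, fourth_cut: int) -> tuple[str, str, str]:
    """Split one strand of a target into a1, a2 and a3 at two distinct cut positions.

    Each position is the 0-based index of the first base after the cut
    (an assumption of this sketch). The order of the two cuts does not matter.
    """
    first, second = sorted((third_cut, fourth_cut))
    if not 0 < first < second < len(target):
        raise ValueError("cuts must be distinct and fall inside the target")
    # Before cleavage, a1 is joined to a3 through a2.
    return target[:first], target[first:second], target[second:]


# Placeholder sequence, chosen only so the three fragments are easy to tell apart.
a1, a2, a3 = split_at_cuts("AAAACCCCGGGG", 4, 8)
```

Under this simplification, the first tag sequence or flap would pair with the cut end of a1 and the second with the cut end of a3, while a2 is released.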
In certain embodiments, the first tag sequence or its complement or the first flap is capable of hybridizing to or annealing to the fragmented nucleotide fragment a1 under conditions that allow for hybridization or annealing of nucleic acids.
In certain embodiments, the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the cleaved nucleotide fragment a1 at the end formed by cleavage of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, and the 3' end or 3' portion is formed as a result of the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the second tag sequence or its complement or the second flap is capable of hybridizing to or annealing to the fragmented nucleotide fragment a3 under conditions that allow hybridization or annealing of the nucleic acids.
In certain embodiments, the second tag sequence or its complement or the second flap is capable of hybridizing or annealing to the cleaved nucleotide fragment a3 at the end formed by cleavage of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the second tag sequence or the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a3, and the 3' end or 3' portion is formed as a result of the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system.
In certain embodiments, the fourth nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In certain embodiments, the fourth nucleic acid editing system comprises: (i) A fourth Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the fourth Cas protein, and (ii) a fourth gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the fourth gRNA; wherein the fourth gRNA is capable of binding to a fourth Cas protein and forming a fourth functional complex; the fourth functional complex is capable of fragmenting both strands of a fourth double-stranded target nucleic acid to form fragmented target nucleic acid fragments b1 and b2.
In certain embodiments, the fourth Cas protein is selected from Cas proteins that cleave DNA double strands, such as Cas9 proteins.
In certain embodiments, the third nucleic acid editing system and the fourth nucleic acid editing system are CRISPR (clustered regularly interspaced short palindromic repeats)/Cas systems.
In certain embodiments, the third nucleic acid editing system is as defined previously and the fourth nucleic acid editing system is as defined previously.
In certain embodiments, the kit further comprises additional systems or components.
In certain embodiments, the additional components include one or more selected from the group consisting of:
(1) One or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional gRNAs or nucleic acid molecules containing a nucleotide sequence encoding the additional gRNAs, wherein the additional gRNAs are capable of binding to a Cas protein and forming a functional complex. In certain embodiments, the functional complex is capable of cleaving two strands or one strand of a double-stranded target nucleic acid.
(2) One or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional Cas proteins or nucleic acid molecules containing nucleotide sequences encoding the additional Cas proteins. In certain embodiments, the Cas protein is capable of cleaving one strand or both strands of a double-stranded target nucleic acid.
(3) One or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional tag primers or nucleic acid molecules comprising a nucleotide sequence encoding the additional tag primers, wherein the additional tag primers comprise a tag sequence and a target binding sequence, the tag sequence being located upstream or 5' of the target binding sequence. In certain embodiments, the target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand under conditions that allow for hybridization or annealing of the nucleic acid, forming a double-stranded structure, and the tag sequence is not bound to the target nucleic acid fragment and remains in a free single-stranded state.
(4) One or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional DNA polymerases or nucleic acid molecules comprising nucleotide sequences encoding the additional DNA polymerases. In certain embodiments, the additional DNA polymerase is selected from the group consisting of a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase. In certain embodiments, the additional DNA polymerase is an RNA-dependent DNA polymerase, such as a reverse transcriptase.
In certain embodiments, the additional system comprises: one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) nucleic acid editing systems for introducing a double-strand break into a double-stranded target nucleic acid.
In certain embodiments, the nucleic acid editing system is a site-specific nuclease technology, e.g., ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease), or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
In a second aspect, the present application provides a fusion protein comprising a Cas protein and a template-dependent DNA polymerase, wherein the Cas protein is capable of cleaving one nucleic acid strand of a target nucleic acid.
In certain embodiments, the Cas protein is capable of cleaving one nucleic acid strand of a target nucleic acid and creating a nick.
In certain embodiments, the Cas protein is selected from Cas proteins that cleave a DNA single strand.
In certain embodiments, the Cas protein is selected from the group consisting of Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, Csx1 protein, Csx15 protein, and variants thereof.
In certain embodiments, the Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (spCas9(H840A)).
In certain embodiments, the Cas protein has the amino acid sequence set forth in SEQ ID No. 3.
In certain embodiments, the DNA polymerase is selected from the group consisting of a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
In certain embodiments, the DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV).
In certain embodiments, the DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7.
In certain embodiments, the Cas protein is covalently linked to the DNA polymerase with or without a linker.
In certain embodiments, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35.
In certain embodiments, the Cas protein is optionally linked or fused to the N-terminus of the DNA polymerase by a linker; alternatively, the Cas protein is optionally linked or fused to the C-terminus of the DNA polymerase by a linker.
In certain embodiments, the fusion protein has the amino acid sequence set forth in SEQ ID NO. 8.
In a third aspect, the present application provides a nucleic acid molecule comprising a polynucleotide encoding a fusion protein as described above.
In a fourth aspect, the present application provides a vector comprising a nucleic acid molecule as described above.
In certain embodiments, the vector is an expression vector.
In certain embodiments, the vector is a eukaryotic expression vector.
In a fifth aspect, the present application provides a host cell comprising a nucleic acid molecule as described above or a vector as described above.
In certain embodiments, the host cell is a prokaryotic cell, such as an E. coli cell; or the host cell is a eukaryotic cell, e.g., a yeast cell, a fungal cell, a plant cell, or an animal cell.
In certain embodiments, the host cell is a mammalian cell, such as a human cell.
In a fifth aspect, the present application provides a method of preparing a fusion protein as described above, comprising: (1) culturing a host cell as described above under conditions that allow expression of the protein; and (2) isolating the fusion protein expressed by the host cell.
In a sixth aspect, the present application provides a complex comprising a first Cas protein and a template-dependent first DNA polymerase, wherein the first Cas protein has the ability to cleave one nucleic acid strand of a double-stranded target nucleic acid, and wherein the first Cas protein is complexed with the first DNA polymerase by covalent or non-covalent means.
In certain embodiments, the first Cas protein is capable of cleaving one nucleic acid strand of a double-stranded target nucleic acid and creating a nick.
In certain embodiments, the first Cas protein is selected from Cas proteins that cleave a DNA single strand.
In certain embodiments, the first Cas protein is selected from the group consisting of Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, Csx1 protein, and variants thereof.
In certain embodiments, the first Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (spCas9(H840A)).
In certain embodiments, the first Cas protein has the amino acid sequence set forth in SEQ ID No. 3.
In certain embodiments, the first DNA polymerase is selected from the group consisting of a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
In certain embodiments, the first DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the first DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV).
In certain embodiments, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7.
In certain embodiments, the first Cas protein is covalently linked to the first DNA polymerase with or without a linker.
In certain embodiments, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35.
In certain embodiments, the first Cas protein is fused to the first DNA polymerase with or without a peptide linker to form a first fusion protein.
In certain embodiments, the first Cas protein is optionally linked or fused to the N-terminus of the first DNA polymerase by a linker; alternatively, the first Cas protein is optionally linked or fused to the C-terminus of the first DNA polymerase by a linker.
In certain embodiments, the first fusion protein has the amino acid sequence set forth in SEQ ID NO. 8.
In certain embodiments, the complex further comprises a first gRNA.
In certain embodiments, the first gRNA is capable of binding to the first Cas protein and forms a first functional unit; the first functional unit is capable of binding to one nucleic acid strand (second strand) of the double-stranded target nucleic acid and breaking the other nucleic acid strand (first strand) of the double-stranded target nucleic acid.
In certain embodiments, the first gRNA contains a first guide sequence and the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acid.
In certain embodiments, the first guide sequence has a length of at least 5nt, such as 5-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or more.
In certain embodiments, the first gRNA further contains a first scaffold sequence that is capable of being recognized and bound by the first Cas protein, thereby forming a first functional unit.
In certain embodiments, the first scaffold sequence has a length of at least 20nt, such as 20-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or more.
In certain embodiments, the first guide sequence is located upstream or 5' of the first scaffold sequence.
In certain embodiments, the complex or first functional unit is capable of cleaving one strand of a double-stranded target nucleic acid after the first guide sequence binds to the double-stranded target nucleic acid.
In certain embodiments, the complex further comprises a double stranded target nucleic acid.
In certain embodiments, the double stranded target nucleic acid contains a first PAM sequence recognized by the first Cas protein and a first guide binding sequence capable of hybridizing or annealing to the first guide sequence, whereby the first functional unit binds the double stranded target nucleic acid through the first guide binding sequence and the first PAM sequence.
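The relationship between a PAM sequence and the adjacent guide binding site can be sketched as a simple scan. This sketch makes assumptions that the text does not state: it uses the NGG PAM commonly associated with Streptococcus pyogenes Cas9 and a 20 nt guide length, and the function name `find_protospacers` is hypothetical. It reports the stretch on the PAM-containing strand; the guide binding sequence that hybridizes to the guide sequence would be the complement of that stretch on the opposite strand.

```python
import re


def find_protospacers(strand: str, guide_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for every NGG PAM preceded by guide_len bases.

    `strand` is read 5'->3'. A lookahead is used so that overlapping
    candidate sites are all reported.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(strand.upper())]


# Placeholder input: 20 identical bases followed by a TGG PAM.
sites = find_protospacers("A" * 20 + "TGG")
```

Other Cas proteins recognize different PAM sequences, so the pattern would need to be changed accordingly.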
In certain embodiments, the complex further comprises a first tag primer that hybridizes to or anneals to the double-stranded target nucleic acid; wherein the first tag primer contains a first target binding sequence that is capable of hybridizing or annealing to the double stranded target nucleic acid.
In certain embodiments, the first tag primer contains a first tag sequence and a first target binding sequence, the first tag sequence being located upstream or 5' of the first target binding sequence; and the first target binding sequence is capable of hybridizing or annealing to the double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acid. In certain embodiments, the first target binding sequence is capable of hybridizing or annealing to the 3' end of the nucleic acid strand of the double-stranded target nucleic acid that is cleaved by the first functional unit, forming a double-stranded structure. In certain embodiments, the 3' end is formed by cleavage of one nucleic acid strand of the double-stranded target nucleic acid by the first functional unit. In certain embodiments, the first tag sequence is not bound to the cleaved nucleic acid strand and remains in a free single-stranded state.
In certain embodiments, the first target binding sequence has a length of at least 5nt, such as 5-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or longer.
In certain embodiments, the first tag sequence has a length of at least 4nt, such as 4-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or more.
In certain embodiments, the first tag primer binds to the fragmented nucleic acid strand through the first target binding sequence. In certain embodiments, the first DNA polymerase is bound to the cleaved nucleic acid strand and the first tag primer.
In certain embodiments, the first tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid.
In certain embodiments, the first tag primer is a single-stranded ribonucleic acid and the first DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the first tag primer is a single stranded deoxyribonucleic acid and the first DNA polymerase is a DNA-dependent DNA polymerase.
In certain embodiments, the cleaved nucleic acid strand is extended by the first DNA polymerase using the first tag primer as a template to form a first flap.
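The template-directed extension described above can be sketched at the sequence level. The sketch assumes perfect Watson-Crick pairing and sequences written 5' to 3'; the function name `extend_to_flap` and the example sequences are hypothetical placeholders, not sequences from the application. Because the tag primer reads 5'-tag-target binding-3', the DNA copied from the tag is its reverse complement.

```python
# Complement table producing DNA from either a DNA or an RNA template.
DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def extend_to_flap(cleaved_strand: str, tag: str, target_binding: str) -> str:
    """Return the flap added to the 3' end of the cleaved strand.

    The target binding sequence must pair with the 3' end of the cleaved
    strand; the polymerase then copies the tag sequence of the primer.
    """
    anneal_site = reverse_complement_dna(target_binding)
    if not cleaved_strand.upper().endswith(anneal_site):
        raise ValueError("target binding sequence does not pair with the 3' end")
    return reverse_complement_dna(tag)


# Placeholder RNA tag primer: tag "AAUGC" followed by target binding "GGUAC".
flap = extend_to_flap("TTTTGTACC", tag="AAUGC", target_binding="GGUAC")
```

In this toy case the extended strand would read `"TTTTGTACC" + flap`, which equals the reverse complement of the whole tag primer preceded by the unpaired upstream bases.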
In certain embodiments, the first gRNA-bound nucleic acid strand is different from the first tag primer-bound nucleic acid strand. In certain embodiments, the first gRNA-bound nucleic acid strand is the opposite strand of the first tag primer-bound nucleic acid strand.
In certain embodiments, the first tag primer is linked to the first gRNA.
In certain embodiments, the first tag primer is covalently linked to the first gRNA with or without a linker.
In certain embodiments, the first tag primer is attached to the 3' end of the first gRNA, optionally through a linker.
In certain embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In certain embodiments, the first tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the first gRNA with or without a ribonucleic acid linker, forming a first pegRNA.
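The 5' to 3' order of elements implied by these embodiments (guide sequence, scaffold sequence, optional linker, tag sequence, target binding sequence) can be sketched as a small data structure. The class name `PegRNA`, its field names, and the example sequences are hypothetical; the minimum lengths are the lower bounds recited above for each element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    guide: str           # hybridizes to one strand of the double-stranded target
    scaffold: str        # recognized and bound by the Cas protein
    tag: str             # template for the flap; not bound to the cleaved strand
    target_binding: str  # anneals to the 3' end of the cleaved strand
    linker: str = ""     # optional ribonucleic acid linker

    def __post_init__(self) -> None:
        # Lower bounds recited in the embodiments above.
        minimums = {"guide": 5, "scaffold": 20, "tag": 4, "target_binding": 5}
        for name, minimum in minimums.items():
            if len(getattr(self, name)) < minimum:
                raise ValueError(f"{name} must be at least {minimum} nt")

    def sequence(self) -> str:
        """Full sequence, 5'->3': guide, scaffold, linker, tag, target binding."""
        return self.guide + self.scaffold + self.linker + self.tag + self.target_binding


# Placeholder sequences only; they are not taken from the sequence listing.
peg = PegRNA(guide="ACGU" * 5, scaffold="G" * 25, tag="AAUGC", target_binding="GGUAC")
```

Placing the tag primer at the 3' end keeps the target binding sequence at the 3' terminus of the molecule, where it can anneal to the cleaved strand.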
In certain embodiments, the complex further comprises a second Cas protein and a second gRNA, wherein the second Cas protein has the ability to cleave one nucleic acid strand of a double-stranded target nucleic acid, the second gRNA being capable of binding to the second Cas protein and forming a second functional unit; the second functional unit is capable of binding to a double stranded target nucleic acid and cleaving one strand thereof.
In certain embodiments, the second Cas protein is the same as or different from the first Cas protein. In certain embodiments, the second Cas protein is the same as the first Cas protein.
In certain embodiments, the second Cas protein is capable of cleaving one nucleic acid strand of a double-stranded target nucleic acid and creating a nick.
In certain embodiments, the second Cas protein is selected from Cas proteins that cleave a DNA single strand.
In certain embodiments, the second Cas protein is selected from the group consisting of Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, Csx16 protein, Csx1 protein, Csx15 protein, and variants thereof.
In certain embodiments, the second Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (spCas9(H840A)).
In certain embodiments, the second Cas protein has the amino acid sequence set forth in SEQ ID No. 3.
In certain embodiments, the second gRNA contains a second guide sequence and the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acid.
In certain embodiments, the second guide sequence is different from the first guide sequence. In certain embodiments, the nucleic acid strand to which the first guide sequence binds is different from the nucleic acid strand to which the second guide sequence binds. In certain embodiments, the first guide sequence-bound nucleic acid strand is the opposite strand of the second guide sequence-bound nucleic acid strand.
In certain embodiments, the second functional unit binds to the same double-stranded target nucleic acid as the first functional unit, the double-stranded target nucleic acid comprising a first strand and a second strand; the first functional unit is capable of cleaving the first strand after the first guide sequence binds to the first strand, and the second functional unit cleaves the first strand after the second guide sequence binds to the first strand. In certain embodiments, the second functional unit cleaves the strand at a position different from that cleaved by the first functional unit.
In certain embodiments, the second guide sequence has a length of at least 5nt, such as 5-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or more.
In certain embodiments, the second gRNA further contains a second scaffold sequence that is capable of being recognized and bound by the second Cas protein, thereby forming a second functional unit.
In certain embodiments, the second scaffold sequence is the same as or different from the first scaffold sequence. In certain embodiments, the second scaffold sequence is identical to the first scaffold sequence.
In certain embodiments, the second scaffold sequence has a length of at least 20nt, such as 20-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or more.
In certain embodiments, the second guide sequence is located upstream or 5' of the second scaffold sequence.
In certain embodiments, the double stranded target nucleic acid contains a second PAM sequence recognized by the second Cas protein and a second guide binding sequence capable of hybridizing or annealing to the second guide sequence, whereby the second functional unit binds the double stranded target nucleic acid through the second guide binding sequence and the second PAM sequence.
In certain embodiments, the complex further comprises a template-dependent second DNA polymerase that is complexed with the second Cas protein by covalent or non-covalent means.
In certain embodiments, the second DNA polymerase is selected from the group consisting of a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase.
In certain embodiments, the second DNA polymerase is an RNA-dependent DNA polymerase.
In certain embodiments, the second DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV).
In certain embodiments, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7.
In certain embodiments, the second DNA polymerase is the same as or different from the first DNA polymerase. In certain embodiments, the second DNA polymerase is the same as the first DNA polymerase.
In certain embodiments, the second Cas protein is covalently linked to the second DNA polymerase with or without a linker.
In certain embodiments, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35.
In certain embodiments, the second Cas protein is fused to the second DNA polymerase with or without a peptide linker to form a second fusion protein.
In certain embodiments, the second Cas protein is optionally linked or fused to the N-terminus of the second DNA polymerase by a linker; alternatively, the second Cas protein is optionally linked or fused to the C-terminus of the second DNA polymerase by a linker.
In certain embodiments, the second fusion protein has the amino acid sequence set forth in SEQ ID NO. 8.
In certain embodiments, the complex further comprises a second tag primer that hybridizes to or anneals to the double-stranded target nucleic acid; wherein the second tag primer contains a second target binding sequence that is capable of hybridizing or annealing to the double stranded target nucleic acid.
In certain embodiments, the second tag primer contains a second tag sequence and a second target binding sequence, the second tag sequence being located upstream or 5' of the second target binding sequence; and the second target binding sequence is capable of hybridizing or annealing to the double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acid. In certain embodiments, the second target binding sequence is capable of hybridizing or annealing to the 3' end of the nucleic acid strand of the double-stranded target nucleic acid that is cleaved by the second functional unit, forming a double-stranded structure. In certain embodiments, the 3' end is formed by cleavage of one strand of the double-stranded target nucleic acid by the second functional unit. In certain embodiments, the second tag sequence is not bound to the cleaved nucleic acid strand and remains in a free single-stranded state.
In certain embodiments, the second target binding sequence has a length of at least 5nt, such as 5-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or longer.
In certain embodiments, the second target binding sequence is different from the first target binding sequence. In certain embodiments, the nucleic acid strand bound by the second target binding sequence is different from the nucleic acid strand bound by the first target binding sequence. In certain embodiments, the nucleic acid strand bound by the second target binding sequence is the opposite strand of the nucleic acid strand bound by the first target binding sequence.
In certain embodiments, the second tag sequence has a length of at least 4nt, such as 4-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or more.
In certain embodiments, the second tag sequence is the same as or different from the first tag sequence. In certain embodiments, the second tag sequence is different from the first tag sequence.
In certain embodiments, the second tag primer binds to the cleaved nucleic acid strand through the second target binding sequence. In certain embodiments, the second DNA polymerase is bound to the cleaved nucleic acid strand and the second tag primer.
In certain embodiments, the second tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid.
In certain embodiments, the second tag primer is a single-stranded ribonucleic acid and the second DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the second tag primer is a single stranded deoxyribonucleic acid and the second DNA polymerase is a DNA-dependent DNA polymerase.
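The layout and pairing rules for the second tag primer can be illustrated with a minimal Python sketch. It assumes only what is stated above: the tag sequence lies 5' of the target binding sequence, the tag sequence is at least 4 nt, the target binding sequence is at least 5 nt, and an RNA primer is paired with an RNA-dependent DNA polymerase while a DNA primer is paired with a DNA-dependent DNA polymerase. The class name `TagPrimer`, its fields, and the example sequences are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass

MIN_TAG_NT = 4             # tag sequence: at least 4 nt (per the text)
MIN_TARGET_BINDING_NT = 5  # target binding sequence: at least 5 nt (per the text)


@dataclass(frozen=True)
class TagPrimer:
    """Hypothetical model of a tag primer, written 5'->3'."""
    tag_seq: str
    target_binding_seq: str
    is_rna: bool

    def __post_init__(self):
        # Enforce the minimum lengths given in the embodiments above.
        if len(self.tag_seq) < MIN_TAG_NT:
            raise ValueError("tag sequence must be at least 4 nt")
        if len(self.target_binding_seq) < MIN_TARGET_BINDING_NT:
            raise ValueError("target binding sequence must be at least 5 nt")

    @property
    def sequence(self) -> str:
        # The tag sequence is upstream (5') of the target binding sequence.
        return self.tag_seq + self.target_binding_seq

    @property
    def required_polymerase(self) -> str:
        # The polymerase must be able to use the primer chemistry as template.
        if self.is_rna:
            return "RNA-dependent DNA polymerase"
        return "DNA-dependent DNA polymerase"


# Illustrative (made-up) RNA tag primer.
primer = TagPrimer(tag_seq="UUGACC", target_binding_seq="UAGCCU", is_rna=True)
```

Here `primer.sequence` is the tag followed by the target binding sequence, and `primer.required_polymerase` names the polymerase class that matches the primer chemistry.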
In certain embodiments, the fragmented target nucleic acid fragment is extended by the second DNA polymerase with the second tag primer as a template to form a second flap.
In certain embodiments, the second gRNA-bound nucleic acid strand is different from the second tag primer-bound nucleic acid strand. In certain embodiments, the second gRNA-bound nucleic acid strand is the opposite strand of the second tag primer-bound nucleic acid strand.
In certain embodiments, the second tag primer is linked to the second gRNA.
In certain embodiments, the second tag primer is covalently linked to the second gRNA with or without a linker.
In certain embodiments, the second tag primer is attached to the 3' end of the second gRNA, optionally through a linker.
In certain embodiments, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker).
In certain embodiments, the second tag primer is a single stranded ribonucleic acid and is linked to the 3' end of the second gRNA, either with or without a ribonucleic acid linker, forming a second PegRNA.
In certain embodiments, the first and second functional units bind to a double stranded target nucleic acid in a predetermined positional relationship.
In certain embodiments, the second guide sequence binds to the same nucleic acid strand as the first target binding sequence; and/or, the first guide sequence binds to the same nucleic acid strand as the second target binding sequence.
In certain embodiments, the binding site of the second guide sequence is located upstream or 5' of the binding site of the first target binding sequence; and/or the binding site of the first guide sequence is located upstream or 5' to the binding site of the second target binding sequence.
In certain embodiments, the binding site of the second guide sequence is located downstream or 3' of the binding site of the first target binding sequence; and/or the binding site of the first guide sequence is located downstream or 3' of the binding site of the second target binding sequence.
In certain embodiments, the double stranded target nucleic acid is selected from, but is not limited to, genomic DNA and nucleic acid vector DNA.
In a tenth aspect, the present application provides a nucleic acid vector (e.g., donor nucleic acid vector) comprising a first PAM sequence recognized by a first Cas protein as described previously.
In certain embodiments, the nucleic acid vector further comprises a donor homology arm.
In certain embodiments, the nucleic acid vector is double-stranded.
In certain embodiments, the nucleic acid vector is a circular double stranded vector.
In certain embodiments, the nucleic acid vector comprises a first guide binding sequence (e.g., a complement of the first guide sequence) capable of hybridizing or annealing to the first guide sequence.
In certain embodiments, the first functional complex is capable of cleaving one nucleic acid strand of the nucleic acid vector through the first guide binding sequence and the first PAM sequence.
In certain embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In certain embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site of the genome.
In certain embodiments, the first PAM sequence and the donor homology arm flank the nucleic acid sequence of interest, respectively.
In certain embodiments, the first guide binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In certain embodiments, the first functional complex breaks a first strand of the nucleic acid vector, the first strand comprising a nick resulting from the break, and the portion of the double strand located between the 3' end of the nick and the donor homology arm comprises the nucleic acid sequence of interest, referred to as a target nucleic acid fragment comprising the nucleic acid sequence of interest.
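The relative arrangement of the donor vector elements described above can be written down as a small consistency check. The following Python sketch is illustrative only: the feature names, the use of (start, end) coordinates on a linearized representation of the circular vector, and the example coordinates are all assumptions, not part of the application.

```python
# Hypothetical feature names. The order reflects the embodiments above:
# the first PAM sequence and the donor homology arm flank the sequence of
# interest, and the first guide binding sequence lies between the sequence
# of interest and the first PAM sequence.
EXPECTED_ORDER = [
    "first_PAM",
    "first_guide_binding",
    "sequence_of_interest",
    "donor_homology_arm",
]


def layout_matches(features):
    """Return True if the features are non-overlapping and appear in the
    expected order, read in either orientation.

    `features` maps a feature name to a (start, end) pair with end exclusive.
    """
    if set(features) != set(EXPECTED_ORDER):
        return False
    ordered = sorted(features.items(), key=lambda item: item[1][0])
    names = [name for name, _ in ordered]
    spans = [span for _, span in ordered]
    no_overlap = all(prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:]))
    return no_overlap and names in (EXPECTED_ORDER, EXPECTED_ORDER[::-1])


# Made-up coordinates for illustration.
donor = {
    "first_PAM": (100, 103),
    "first_guide_binding": (103, 123),
    "sequence_of_interest": (150, 1150),
    "donor_homology_arm": (1150, 1950),
}
misordered = {
    "first_PAM": (100, 103),
    "first_guide_binding": (103, 123),
    "donor_homology_arm": (150, 950),
    "sequence_of_interest": (950, 1950),
}
```

With these inputs, `layout_matches(donor)` is true and `layout_matches(misordered)` is false, because in the second case the homology arm sits between the guide binding sequence and the sequence of interest.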
In certain embodiments, the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and the first tag sequence of the first tag primer is in a free state. In certain embodiments, the nucleic acid strand hybridized or annealed to the first target binding sequence is the opposite strand of the nucleic acid strand comprising the first guide binding sequence.
In certain embodiments, the nucleic acid vector further comprises a first target sequence; wherein the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the first tag sequence of the first tag primer is in a free state. In certain embodiments, the first target sequence is located on the opposite strand of the first guide binding sequence. In certain embodiments, the first target sequence is located at the end of the cleaved first strand. In certain embodiments, after cleavage of the first strand by the first functional complex, the 3' end of the nucleic acid strand comprising the first target sequence can be extended (preferably, forming a first flap) using the first tag primer annealed to the first target sequence as a template.
In certain embodiments, the nucleic acid vector further comprises a restriction enzyme site between the first target sequence and the donor homology arm.
In certain embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arm.
In an eleventh aspect, the present application provides a kit comprising the nucleic acid vector of the tenth aspect, and one or more components of the system or kit of the first aspect (e.g., a first Cas protein or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, a template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase, a first gRNA or a nucleic acid molecule C1 comprising a nucleotide sequence encoding the first gRNA, a first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer).
In certain embodiments, the kit comprises the following 4 components:
(a) A first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein;
(b) A template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding said first DNA polymerase;
(c) A first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; and
(d) A first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding said first tag primer.
In certain embodiments, the 4 components are contained in 1 or more (e.g., 2, 3, 4) vectors.
In certain embodiments, the kit comprises the following vectors:
(a) A nucleic acid vector as hereinbefore described;
(b) A first vector comprising a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein and a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
(c) A second vector comprising a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA and a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer;
optionally, the kit further comprises one or more components of the third nucleic acid editing system described in the system or kit described previously (e.g., (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).
In a twelfth aspect, the present application provides a nucleic acid vector (e.g., donor nucleic acid vector) comprising a first PAM sequence recognized by a first Cas protein as described previously.
In certain embodiments, the nucleic acid vector further comprises a second PAM sequence recognized by a second Cas protein as described previously.
In certain embodiments, the nucleic acid vector is double-stranded.
In certain embodiments, the nucleic acid vector is a circular double stranded vector.
In certain embodiments, the nucleic acid vector comprises a first guide binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., a complement of the first guide sequence), and/or a second guide binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., a complement of the second guide sequence). Optionally, the nucleic acid vector further comprises a restriction enzyme site between the first and second guide binding sequences.
In certain embodiments, the first and second guide binding sequences are located on opposite strands of the nucleic acid vector.
In certain embodiments, the first functional complex is capable of cleaving one nucleic acid strand (first strand) of the nucleic acid vector through the first guide binding sequence and the first PAM sequence; and/or, the second functional complex is capable of cleaving another nucleic acid strand (second strand) of the nucleic acid vector through the second guide binding sequence and the second PAM sequence.
In certain embodiments, the nucleic acid vector further comprises a nucleic acid sequence of interest.
In certain embodiments, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site of the genome.
In certain embodiments, the first PAM sequence and the second PAM sequence flank the nucleic acid sequence of interest, respectively.
In certain embodiments, the first guide binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence.
In certain embodiments, the second guide binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence.
In certain embodiments, the first functional complex and the second functional complex cleave a first strand and a second strand, respectively, of the nucleic acid vector, the first strand and the second strand comprising a nick resulting from the cleavage, respectively, and a double-stranded portion located between the 3' ends of the two nicks comprises a nucleic acid sequence of interest, referred to as a target nucleic acid fragment comprising the nucleic acid sequence of interest.
In certain embodiments, the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and the first tag sequence of the first tag primer is in a free state. In certain embodiments, the nucleic acid strand hybridized or annealed to the first target binding sequence is the opposite strand of the nucleic acid strand comprising the first guide binding sequence.
In certain embodiments, the second tag primer is capable of hybridizing or annealing, via the second target binding sequence, to the 3' end of the nucleic acid strand cleaved by the second functional complex under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and the second tag sequence of the second tag primer is in a free state. In certain embodiments, the nucleic acid strand hybridized or annealed to the second target binding sequence is the opposite strand of the nucleic acid strand comprising the second guide binding sequence.
In certain embodiments, the nucleic acid strand hybridized or annealed to the first target binding sequence is an opposite strand of the nucleic acid strand hybridized or annealed to the second target binding sequence.
In certain embodiments, the nucleic acid vector further comprises a first target sequence; wherein the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the first tag sequence of the first tag primer is in a free state. In certain embodiments, the first target sequence is located on the opposite strand of the first guide binding sequence. In certain embodiments, the first target sequence is located at the end of the cleaved first strand. In certain embodiments, after cleavage of the first strand by the first functional complex, the 3' end of the nucleic acid strand comprising the first target sequence can be extended (preferably, forming a first flap) using the first tag primer annealed to the first target sequence as a template.
And/or,
the nucleic acid vector further comprises a second target sequence; wherein the second tag primer is capable of hybridizing or annealing to the second target sequence through the second target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the second tag sequence of the second tag primer is in a free state. In certain embodiments, the second target sequence is located on the opposite strand of the second guide binding sequence. In certain embodiments, the second target sequence is located at the end of the cleaved second strand. In certain embodiments, after cleavage of the second strand by the second functional complex, the 3' end of the nucleic acid strand comprising the second target sequence can be extended (preferably, forming a second flap) using the second tag primer annealed to the second target sequence as a template.
In certain embodiments, the nucleic acid strand comprising the first target sequence is located on the opposite strand of the nucleic acid strand comprising the second target sequence.
In certain embodiments, the nucleic acid vector further comprises a restriction enzyme site between the first target sequence and the second target sequence.
In certain embodiments, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.
In a thirteenth aspect, the present application provides a kit comprising the nucleic acid vector of the twelfth aspect, one or more components of the system or kit of the first aspect (e.g., a first Cas protein or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, a template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase, a first gRNA or a nucleic acid molecule C1 comprising a nucleotide sequence encoding the first gRNA, a first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer), and one or more further components of the system or kit of the first aspect (e.g., a second gRNA or a nucleic acid molecule C2 comprising a nucleotide sequence encoding the second gRNA, a second Cas protein or a nucleic acid molecule A2 comprising a nucleotide sequence encoding the second Cas protein, a second tag primer or a nucleic acid molecule D2 comprising a nucleotide sequence encoding the second tag primer, a second DNA polymerase or a nucleic acid molecule B2 comprising a nucleotide sequence encoding the second DNA polymerase).
In certain embodiments, the kit comprises the following 8 components:
(a) A first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein;
(b) A template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding said first DNA polymerase;
(c) A first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA;
(d) A first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding said first tag primer;
(e) A second gRNA or a nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA;
(f) A second Cas protein or a nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein;
(g) A second tag primer or a nucleic acid molecule D2 comprising a nucleotide sequence encoding said second tag primer; and
(h) A second DNA polymerase or a nucleic acid molecule B2 comprising a nucleotide sequence encoding the second DNA polymerase.
In certain embodiments, the 8 components are contained in 1 or more (e.g., 2, 3, 4, 5, 6, 7, 8) vectors.
In certain embodiments, the kit comprises the following vectors:
(a) A nucleic acid vector as hereinbefore described;
(b) A first vector comprising a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein and a nucleic acid molecule B1 containing a nucleotide sequence encoding the first DNA polymerase;
(c) A second vector comprising a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA and a nucleic acid molecule D1 containing a nucleotide sequence encoding the first tag primer;
(d) A third vector comprising a nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA and a nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein; and
(e) A fourth vector comprising a nucleic acid molecule D2 containing a nucleotide sequence encoding the second tag primer and a nucleic acid molecule B2 containing a nucleotide sequence encoding the second DNA polymerase.
Optionally, the kit further comprises one or more components of the third nucleic acid editing system described in the system or kit described previously (e.g., (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).
In a seventh aspect, the present application provides a method for fragmenting one nucleic acid strand of a double-stranded target nucleic acid and adding a flap at the 3' end of the nick, wherein the method comprises using a system or kit as described previously.
In certain embodiments, the method comprises the steps of:
i. providing a double-stranded target nucleic acid, and providing the first Cas protein, a first gRNA, a first DNA polymerase, and a first tag primer; and
ii. contacting the double-stranded target nucleic acid with the first Cas protein, the first gRNA, the first DNA polymerase, and the first tag primer.
In certain embodiments, in step ii:
the first Cas protein and first gRNA combine to form a first functional complex, and the first functional complex breaks one nucleic acid strand of the double-stranded target nucleic acid;
the first tag primer hybridizes or anneals to the 3' end of the fragmented nucleic acid strand via the first target binding sequence; and
the first DNA polymerase extends the fragmented nucleic acid strand with a first tag primer annealed to the fragmented nucleic acid strand as a template to form a first flap.
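The nick, anneal, and extend steps above can be followed on a toy sequence. The Python sketch below is a simplified illustration under stated assumptions: all sequences are written 5'->3' in the DNA alphabet (an RNA tag primer would use U in place of T), annealing is modeled as exact reverse-complement matching, and the function names and example sequences are hypothetical. Because the tag sequence lies 5' of the target binding sequence in the primer, the flap added to the 3' end of the nicked strand is the reverse complement of the tag sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_flap(nicked_strand: str, tag_seq: str, target_binding_seq: str) -> str:
    """Return the nicked strand (ending at the 3' end created by the nick)
    extended with the flap templated by the tag sequence."""
    # Annealing: the target binding sequence must pair with the 3' end
    # of the nicked strand (modeled here as an exact match).
    if not nicked_strand.endswith(revcomp(target_binding_seq)):
        raise ValueError("target binding sequence does not anneal to the 3' end")
    # Extension: the polymerase copies the free tag sequence, so the new
    # nucleotides are the reverse complement of the tag sequence.
    flap = revcomp(tag_seq)
    return nicked_strand + flap


# Made-up example sequences.
strand = "GATTACAGGCTA"                 # strand upstream of the nick, 5'->3'
target_binding = revcomp(strand[-6:])   # pairs with the last 6 nt of the strand
tag = "TTGACC"                          # free single-stranded tag sequence
extended = extend_flap(strand, tag, target_binding)
```

In this example the six nucleotides appended to `strand` are `GGTCAA`, the reverse complement of the tag sequence `TTGACC`.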
In certain embodiments, the method is performed intracellularly.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, and D1 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer within the cell.
In certain embodiments, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase, or capable of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase, in a cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell.
In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell. In certain embodiments, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer within the cell.
In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein, and a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase, and first tag primer within the cell.
In certain embodiments, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid within the cell.
In certain embodiments, the first Cas protein, the first gRNA, the first DNA polymerase, or the first tag primer are as previously defined.
In certain embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein. In certain embodiments, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA and breaks one nucleic acid strand thereof.
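Site recognition through a PAM sequence together with the gRNA can be illustrated by a simple search. The Python sketch below makes several assumptions that are not stated in this section: the PAM is taken to be an `NGG` motif immediately 3' of the guide-matching sequence (as for the commonly used SpCas9), only one strand is searched, the guide is written in the DNA alphabet as it appears on the PAM-containing strand, and the sequences are shortened, made-up examples. Other Cas proteins recognize different PAM sequences and positions.

```python
import re


def find_target_sites(target: str, guide: str, pam_regex: str = "[ACGT]GG"):
    """Return start positions where `guide` occurs in `target` and is
    immediately followed by a sequence matching `pam_regex`."""
    sites = []
    start = target.find(guide)
    while start != -1:
        pam_start = start + len(guide)
        pam = target[pam_start:pam_start + 3]
        # A guide match without an adjacent PAM is not counted as a site.
        if re.fullmatch(pam_regex, pam):
            sites.append(start)
        start = target.find(guide, start + 1)
    return sites


guide = "CATTGACTAC"  # shortened, hypothetical guide-matching sequence
# First occurrence is followed by TGG (a PAM under the NGG assumption);
# the second occurrence is followed by TTA (no PAM).
target = "TT" + guide + "TGG" + "AA" + guide + "TTA"
sites = find_target_sites(target, guide)
```

Only the first occurrence is reported, which reflects the requirement that both the PAM sequence and the guide match be present for the functional complex to bind.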
In an eighth aspect, the present application provides a method for separately fragmenting two nucleic acid strands of a double-stranded target nucleic acid and separately adding a flap at the 3' ends of the two nicks created by the fragmentation in the two nucleic acid strands, wherein the method comprises using a system or kit as described above; wherein the first double stranded target nucleic acid is identical to the second double stranded target nucleic acid. In certain embodiments, the method is used to break two nucleic acid strands of a double-stranded target nucleic acid at different positions, respectively.
In certain embodiments, the methods are performed extracellularly or intracellularly.
In certain embodiments, the method comprises the steps of:
i. providing a double-stranded target nucleic acid, and providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, the second Cas protein, second gRNA, second DNA polymerase, and second tag primer; and
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer.
In certain embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and, the first and second functional complexes cleave the first strand and the second strand of the double-stranded target nucleic acid, respectively, the first strand and the second strand comprising a nick resulting from the cleavage, respectively, a double-stranded portion located between the 3' ends of the two nicks being referred to as a target nucleic acid fragment F1;
the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and, the second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and
the first and second DNA polymerases perform an extension reaction using the first and second tag primers annealed to the target nucleic acid fragment F1 as templates, respectively, such that the 3' ends generated by cleavage in the first and second strands are extended to form first and second flaps, respectively, forming a double-stranded portion having the first and second flaps, referred to as the target nucleic acid fragment F2.
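The conversion of fragment F1 into fragment F2 can be traced on a toy duplex. The Python sketch below is an illustration under simplifying assumptions: the duplex is represented by its top strand written 5'->3', nick positions are given in top-strand coordinates, annealing is modeled as exact reverse-complement matching, and all names and sequences are hypothetical. Each tag primer is given as a (tag sequence, target binding sequence) pair.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_f2(top, nick_bottom, nick_top, primer1, primer2):
    """Return the two strands of fragment F2, each written 5'->3'.

    The top strand is nicked at `nick_top` and the bottom strand at
    `nick_bottom` (with nick_bottom < nick_top), so F1 spans
    top[nick_bottom:nick_top] and both 3' ends come from the cleavage.
    """
    f1_top = top[nick_bottom:nick_top]  # 3' end lies at nick_top
    f1_bottom = revcomp(f1_top)         # 3' end lies at nick_bottom
    extended = []
    for strand, (tag, target_binding) in ((f1_top, primer1), (f1_bottom, primer2)):
        # Each primer must anneal to the 3' end of its strand of F1.
        if not strand.endswith(revcomp(target_binding)):
            raise ValueError("tag primer does not anneal to the 3' end")
        # The flap is templated by the free tag sequence.
        extended.append(strand + revcomp(tag))
    return tuple(extended)


# Made-up example: F1 is the 10-nt stretch ATGCCGTTAC of the duplex.
top_strand = "GGGATGCCGTTACAAA"
first_primer = ("AAGG", "GTAAC")    # anneals to the 3' end of the top strand of F1
second_primer = ("CTCTA", "ATGCC")  # anneals to the 3' end of the bottom strand of F1
f2_top, f2_bottom = build_f2(top_strand, 3, 13, first_primer, second_primer)
```

In this example F2 carries a 4-nt flap (`CCTT`) on one strand and a 5-nt flap (`TAGAG`) on the other, each at a 3' end created by cleavage.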
In certain embodiments, the method is performed intracellularly.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, and the second tag primer within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, and D2 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer within the cell.
In certain embodiments, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase, or are capable of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase, in a cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell.
In certain embodiments, the nucleic acid molecule A2 and the nucleic acid molecule B2 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A2 and nucleic acid molecule B2 are capable of expressing the isolated second Cas protein and the second DNA polymerase, or are capable of expressing a second fusion protein comprising the second Cas protein and the second DNA polymerase, in a cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated second Cas protein and second DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the second fusion protein is delivered into a cell and expressed in the cell to provide the second Cas protein and the second DNA polymerase within the cell.
In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell. In certain embodiments, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer within the cell.
In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell. In certain embodiments, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer within the cell.
In certain embodiments, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid within the cell.
In certain embodiments, the first Cas protein, the first gRNA, the first DNA polymerase, or the first tag primer is as previously defined.
In certain embodiments, the second Cas protein, the second gRNA, the second DNA polymerase, or the second tag primer is as previously defined.
In certain embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized by a second Cas protein. In certain embodiments, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA and breaks one strand thereof; and, the second functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof.
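As an illustration of the PAM requirement above, the following minimal sketch scans both strands of a double-stranded sequence for candidate PAM sites. It assumes an SpCas9-style 5'-NGG-3' PAM purely for the sake of the example; the text does not fix the PAM, and the function names and the example sequence are hypothetical.

```python
import re

# Complement table for an upper-case DNA alphabet (assumption: no ambiguity codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pam_sites(top_strand: str, pam_regex: str = "[ACGT]GG") -> dict:
    """Return 0-based PAM start positions on each strand.

    Positions are reported in each strand's own 5'->3' coordinates.
    The default pattern (NGG) is an illustrative assumption.
    """
    # A lookahead is used so that overlapping PAM candidates are all reported.
    pattern = re.compile(f"(?=({pam_regex}))")
    bottom_strand = reverse_complement(top_strand)
    return {
        "top": [m.start() for m in pattern.finditer(top_strand)],
        "bottom": [m.start() for m in pattern.finditer(bottom_strand)],
    }


if __name__ == "__main__":
    # Hypothetical donor sequence carrying one PAM on each strand.
    print(find_pam_sites("ATGCAGGTTTCCAAT"))
```

A sequence that carries one candidate on each strand, as in the example, corresponds to the arrangement in which the first functional complex nicks one strand and the second functional complex nicks the other.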
In certain embodiments, the second Cas protein is the same as the first Cas protein and the second DNA polymerase is the same as the first DNA polymerase; wherein the first Cas protein forms first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase performs extension reactions using the first and second tag primers annealed to the target nucleic acid fragment F1 as templates, respectively, such that the 3' ends of the first and second strands resulting from the cleavage are extended to form first and second lobes, respectively, forming a double-stranded portion having the first and second lobes, referred to as target nucleic acid fragment F2.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, and the second tag primer within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, and D2 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer within the cell.
In certain embodiments, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase, or are capable of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase, in a cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell.
In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell. In certain embodiments, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer within the cell.
In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell. In certain embodiments, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer within the cell.
In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA, and a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer within the cell.
In a ninth aspect, the present application provides a method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest; wherein the method comprises using the system or kit of the eleventh aspect; wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid for providing the target nucleic acid fragment, the target nucleic acid fragment being located between a 3' end resulting from a break in a first strand and a 3' end resulting from a break in a second strand of the double-stranded target nucleic acid; and, the third double-stranded target nucleic acid is a nucleic acid molecule of interest.
In certain embodiments, the method comprises:
a. breaking the first strand and the second strand of the first double-stranded target nucleic acid, respectively, by the method as described above, the first strand and the second strand each comprising a nick resulting from the breaking, the double-stranded portion located between the 3' ends of the two nicks being referred to as a target nucleic acid fragment F1; adding a first lobe and a second lobe to the two 3' ends, respectively, to form a double-stranded portion having the first and second lobes, which is referred to as a target nucleic acid fragment F2;
b. fragmenting the nucleic acid molecule of interest with the third nucleic acid editing system to form fragmented nucleotide fragments a1 and a2; and
c. ligating the nucleotide fragments a1 and a2 with the target nucleic acid fragment F2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
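The net outcome of steps a to c can be pictured with a simple string model. The sketch below is only a bookkeeping illustration under strong assumptions: it treats the nucleic acid molecule of interest as a single 5'->3' string, ignores strands, lobes and repair, and uses a hypothetical function name and hypothetical sequences.

```python
def insert_fragment(molecule: str, cut_index: int, fragment: str) -> str:
    """Model the insertion outcome: cut the molecule into a1 and a2, then join a1 + fragment + a2.

    cut_index is the 0-based position of the break (an illustrative parameter,
    not something defined in the text).
    """
    if not 0 <= cut_index <= len(molecule):
        raise ValueError("cut_index lies outside the molecule")
    a1 = molecule[:cut_index]   # nucleotide fragment a1 (upstream of the break)
    a2 = molecule[cut_index:]   # nucleotide fragment a2 (downstream of the break)
    return a1 + fragment + a2


if __name__ == "__main__":
    print(insert_fragment("AAAATTTT", 4, "GGCC"))
```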
In certain embodiments, the methods are performed extracellularly or intracellularly.
In certain embodiments, when the nucleic acid molecule of interest is a genomic sequence present in a cell, step a is performed outside or inside the cell, and steps b and c are performed within the cell.
In certain embodiments, the method comprises the steps of:
i. providing a double stranded target nucleic acid and a nucleic acid molecule of interest; and
providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, the second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system;
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system.
In certain embodiments, in step ii:
The first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and
the first and second functional complexes cleave a first strand and a second strand, respectively, of the double-stranded target nucleic acid, the first strand and the second strand each comprising a nick resulting from the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1, and the third nucleic acid editing system cleaves the nucleic acid molecule of interest, forming cleaved nucleotide fragments a1 and a2; and
the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and the second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and
the first DNA polymerase and the second DNA polymerase perform extension reactions using the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, respectively, so that the 3' ends generated by cleavage in the first strand and the second strand are extended to form a first lobe and a second lobe, respectively, forming a double-stranded portion having the first lobe and the second lobe, referred to as target nucleic acid fragment F2; wherein the first and second lobes are capable of hybridizing or annealing to the cleaved nucleotide fragments a1 and a2, respectively; and
The target nucleic acid fragment F2 is hybridized or annealed to the nucleotide fragments a1 and a2 via the first and second lobes, respectively, and is inserted or ligated between the nucleotide fragments a1 and a2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
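The extension step in the list above can be sketched as a small sequence model. The sketch assumes that a tag primer is laid out 5'-[tag sequence][target binding sequence]-3', that the target binding sequence is exactly complementary to the 3' end produced by the nick, and that the lobe is the reverse complement of the tag sequence. It uses a DNA alphabet throughout for simplicity, and the function and parameter names and the sequences are hypothetical.

```python
# Complement table for an upper-case DNA alphabet (assumption: no ambiguity codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_flap(nicked_strand: str, tag_sequence: str, target_binding: str) -> str:
    """Extend the 3' end of a nicked strand using a tag primer as template.

    nicked_strand  : strand written 5'->3', ending at the 3' end created by the nick
    tag_sequence   : 5' part of the tag primer, copied by the polymerase
    target_binding : 3' part of the tag primer, anneals to the nicked 3' end
    Returns the strand with the newly synthesized lobe appended.
    """
    # The primer can only anneal if the strand ends with the complement of its binding sequence.
    anneal_site = reverse_complement(target_binding)
    if not nicked_strand.endswith(anneal_site):
        raise ValueError("tag primer does not anneal to the 3' end of the nicked strand")
    # The polymerase reads the tag sequence 3'->5' and synthesizes its reverse complement.
    lobe = reverse_complement(tag_sequence)
    return nicked_strand + lobe


if __name__ == "__main__":
    # Hypothetical strand and primer parts; applying the same call to the second
    # strand with the second tag primer would give the second lobe of fragment F2.
    print(extend_flap("GGATCCAAGT", tag_sequence="TTGACC", target_binding="ACTTG"))
```

Because the lobe is the reverse complement of the tag sequence, choosing a tag sequence identical to a stretch of the cleaved nucleotide fragment a1 or a2 yields a lobe that can anneal to that stretch.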
In certain embodiments, the first lobe is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a1, and the 3' end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, with a first spacer region between the 3' portion of the nucleotide fragment a1 and the fragmented end formed by the third double-stranded target nucleic acid.
In certain embodiments, the first spacer region has a length of 1 nt to 200 nt, such as 1 nt to 10 nt, 10 nt to 20 nt, 20 nt to 30 nt, 30 nt to 40 nt, 40 nt to 50 nt, 50 nt to 100 nt, or 100 nt to 200 nt.
In certain embodiments, the second lobe is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a2, and the 3' end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the second tag sequence or the second flap is capable of hybridizing or annealing to the 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a2, with a second spacer region between the 3' portion of the nucleotide fragment a2 and the fragmented end formed by the third double-stranded target nucleic acid.
In certain embodiments, the second spacer region has a length of 1 nt to 200 nt, such as 1 nt to 10 nt, 10 nt to 20 nt, 20 nt to 30 nt, 30 nt to 40 nt, 40 nt to 50 nt, 50 nt to 100 nt, or 100 nt to 200 nt.
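The spacer region described above can be illustrated by locating where a lobe (flap) anneals on a cleaved strand and counting the nucleotides between that site and the cleaved 3' end. This is a minimal sketch assuming exact complementarity and a DNA alphabet; the function names and sequences are hypothetical, and treating the 1 nt and 200 nt bounds as inclusive is an assumption.

```python
# Complement table for an upper-case DNA alphabet (assumption: no ambiguity codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def spacer_length(cleaved_strand: str, lobe: str):
    """Return the number of nucleotides between the annealing site and the cleaved 3' end.

    cleaved_strand is written 5'->3' and ends at the break.
    Returns 0 when the lobe anneals to the very 3' end, and None when no
    exactly complementary site exists on the strand.
    """
    site = reverse_complement(lobe)
    start = cleaved_strand.rfind(site)  # site closest to the 3' end
    if start == -1:
        return None
    return len(cleaved_strand) - (start + len(site))


def within_stated_range(length_nt: int) -> bool:
    """Check a spacer length against the 1 nt to 200 nt range (bounds assumed inclusive)."""
    return 1 <= length_nt <= 200


if __name__ == "__main__":
    n = spacer_length("ACGTTTGACCAGCTA", "GGTCAA")  # hypothetical sequences
    print(n, within_stated_range(n))
```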
In certain embodiments, the method is performed intracellularly.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or a nucleic acid molecule A3 encoding the same are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the nucleic acid molecule A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, and A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid within the cell.
In certain embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized by a second Cas protein. In certain embodiments, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA and breaks one strand thereof; and, the second functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof.
In certain embodiments, the nucleic acid molecule of interest is genomic DNA of the cell.
In certain embodiments, the first Cas protein, the first gRNA, the first DNA polymerase, or the first tag primer is as previously defined.
In certain embodiments, the second Cas protein, the second gRNA, the second DNA polymerase, or the second tag primer is as previously defined.
In certain embodiments, the third nucleic acid editing system is as previously defined.
In certain embodiments, the third nucleic acid editing system is as defined previously, and the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein. In certain embodiments, in step ii, the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA and breaks it.
In certain embodiments, the first and second Cas proteins are identical, selected from Cas proteins that cleave a single strand of DNA, and the second DNA polymerase is identical to the first DNA polymerase; wherein the first Cas protein forms a first functional complex with the first gRNA and a second functional complex with the second gRNA, and the first DNA polymerase performs an extension reaction with the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, respectively, to form a target nucleic acid fragment F2 having a first flap and a second flap.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the third nucleic acid editing system or the nucleic acid molecule A3 encoding the same are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and a third nucleic acid editing system or a nucleic acid molecule A3 encoding the same are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, D2, and A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase, or are capable of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase, in a cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell.
In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell. In certain embodiments, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer within the cell.
In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell. In certain embodiments, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer within the cell.
In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, and a nucleic acid molecule containing a sequence encoding the third nucleic acid editing system are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, and third nucleic acid editing system within the cell.
In a fourteenth aspect, the present application provides a method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest; wherein the method comprises using the system or kit of the thirteenth aspect; wherein the first double-stranded target nucleic acid is used to provide the target nucleic acid fragment, the target nucleic acid fragment being located between the 3' end of the double-stranded target nucleic acid resulting from the first strand break and the donor homology arm; and, the third double-stranded target nucleic acid is a nucleic acid molecule of interest;
optionally, the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and is comprised in a nucleic acid vector according to the twelfth aspect.
In certain embodiments, the method comprises:
a. cleaving a first strand of the first double-stranded target nucleic acid, the first strand comprising a nick resulting from the cleavage, by the method as described above, the portion of the first strand located between the 3' end of the nick and the donor homology arm being referred to as target nucleic acid strand S1; adding a first lobe at the 3' end to form a first strand portion having the first lobe, referred to as a target nucleic acid strand S2;
b. fragmenting the nucleic acid molecule of interest with the third nucleic acid editing system to form fragmented nucleotide fragments a1 and a2; and
c. the target nucleic acid strand S2 hybridizes or anneals to the first strand of the nucleotide fragment a1 through the first flap; performing an extension reaction using the target nucleic acid strand S2 as a template to form an extension strand E1, the extension strand E1 comprising a complementary sequence of the target nucleic acid strand S2 and a complementary sequence of a donor homology arm flanking the S2; the extended strand E1 is linked to a2 via a donor homology arm, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
In certain embodiments, the 3' end of the first strand of the nucleotide fragment a1 comprises a complementary sequence of the first lobe and the 3' end of the second strand of the nucleotide fragment a1 comprises a sequence of the first lobe.
In certain embodiments, the nicked end of nucleotide fragment a2 comprises a target site homology arm.
In certain embodiments, the methods are performed extracellularly or intracellularly.
In certain embodiments, when the nucleic acid molecule of interest is genomic DNA present in a cell, step a is performed outside or inside the cell, and steps b and c are performed within the cell.
In certain embodiments, the method comprises the steps of:
i. providing a double-stranded target nucleic acid comprising a donor homology arm, a first PAM sequence recognized by a first Cas protein and a first gRNA-recognized sequence (preferably, the double-stranded target nucleic acid comprises a donor homology arm, a first PAM sequence recognized by a first Cas protein and a first guide sequence recognized by a first gRNA); and
providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, third nucleic acid editing system;
ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, and first tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system.
In certain embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex; and
the first functional complex breaks a first strand of the double-stranded target nucleic acid, the first strand comprising a nick resulting from the break, the portion of the first strand located between the 3' end of the nick and the donor homology arm being referred to as target nucleic acid strand S1, and the third nucleic acid editing system breaks the nucleic acid molecule of interest, forming broken nucleotide fragments a1 and a2; and
the first tag primer hybridizes or anneals to the 3' end of the target nucleic acid strand S1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and
the first DNA polymerase performs an extension reaction using the first tag primer annealed to the target nucleic acid strand S1 as a template, so that the 3' end generated by cleavage in the first strand is extended to form a first lobe, forming a first strand portion having the first lobe, referred to as target nucleic acid strand S2; wherein the first lobe is capable of hybridizing or annealing to the fragmented nucleotide fragment a1; and
the target nucleic acid strand S2 hybridizes or anneals to the first strand of the nucleotide fragment a1 through the first lobe, whereby the target nucleic acid strand S2 is connected between the second strand of the nucleotide fragment a1 and the second strand of the nucleotide fragment a2; and
an extension reaction is performed on the 3' end of the first strand of the nucleotide fragment a1 using the target nucleic acid strand S2 as a template to form an extension strand E1, wherein the extension strand E1 comprises a complementary sequence of the target nucleic acid strand S2 and a complementary sequence of the donor homology arm flanking S2; the extension strand E1 anneals to the first strand of a2 via the donor homology arm, such that the extension strand E1 is attached between the first strand of the nucleotide fragment a1 and the first strand of the nucleotide fragment a2, forming a double-stranded structure, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest.
In certain embodiments, the first lobe is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a1, and the 3' end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system.
In certain embodiments, the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, with a first spacer region between the 3' portion of the nucleotide fragment a1 and the fragmented end formed by the third double-stranded target nucleic acid.
In certain embodiments, the first spacer region has a length of 1 nt to 200 nt, such as 1 nt to 10 nt, 10 nt to 20 nt, 20 nt to 30 nt, 30 nt to 40 nt, 40 nt to 50 nt, 50 nt to 100 nt, or 100 nt to 200 nt.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, and a third nucleic acid editing system or a nucleic acid molecule A3 encoding the same are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system within the cell;
alternatively, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1 is contacted with the double-stranded target nucleic acid extracellularly, and then the edited double-stranded target nucleic acid and a third nucleic acid editing system or nucleic acid molecule A3 encoding the same are delivered into a cell to provide a double-stranded target nucleic acid having a first lobe and a third nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, and the nucleic acid molecule A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, and A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system within the cell.
In certain embodiments, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid within the cell.
In certain embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a donor homology arm. In certain embodiments, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA and breaks one strand thereof.
In certain embodiments, the nucleic acid molecule of interest is genomic DNA of the cell.
In certain embodiments, the first Cas protein, the first gRNA, the first DNA polymerase, or the first tag primer is as defined previously.
In certain embodiments, the third nucleic acid editing system is as previously defined.
In certain embodiments, the third nucleic acid editing system is as defined previously, and the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein. In certain embodiments, in step ii, the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA and breaks it.
In a tenth aspect, the present application provides a method for replacing a nucleotide fragment in a nucleic acid molecule of interest with a target nucleic acid fragment; wherein the method comprises using a system or kit as described above; wherein the first double stranded target nucleic acid is identical to the second double stranded target nucleic acid for providing the target nucleic acid fragment, the target nucleic acid fragment being located between a nick resulting from a break in a first strand and a nick resulting from a break in a second strand of the double stranded target nucleic acid; and, the third double-stranded target nucleic acid is identical to the fourth double-stranded target nucleic acid, being a nucleic acid molecule of interest.
In certain embodiments, the method comprises:
a. breaking the first strand and the second strand of the first double-stranded target nucleic acid, respectively, by the method as described above, the first strand and the second strand each comprising a nick resulting from the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as a target nucleic acid fragment F1; adding a first lobe and a second lobe to the two 3' ends, respectively, to form a double-stranded portion having the first lobe and the second lobe, which is referred to as a target nucleic acid fragment F2;
b. fragmenting the nucleic acid molecule of interest with the third and fourth nucleic acid editing systems to form fragmented nucleotide fragments a1, a2 and a3; wherein, prior to cleavage, the nucleotide fragments a1, a2 and a3 are arranged in sequence in the nucleic acid molecule of interest (i.e., nucleotide fragment a1 is linked to nucleotide fragment a3 by nucleotide fragment a2); and
c. ligating the nucleotide fragments a1 and a3 with the target nucleic acid fragment F2, thereby replacing the nucleotide fragment a2 in the nucleic acid molecule of interest with the target nucleic acid fragment.
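Steps a to c can be pictured with a minimal sketch. It is only an illustration under simplifying assumptions: the nucleic acid molecule of interest is modelled as a single string (one strand, 5' to 3'), the two cleavage sites are given as plain indices, and the lobes that guide ligation are ignored. The function name `replace_fragment` and the example sequences are hypothetical and do not come from the application.

```python
def replace_fragment(molecule: str, cut1: int, cut2: int, target_fragment: str) -> dict:
    """Split `molecule` into a1, a2, a3 at two cut positions and replace a2.

    Toy model: one strand only, cut positions are 0-based string indices.
    """
    if not 0 <= cut1 <= cut2 <= len(molecule):
        raise ValueError("cut positions must satisfy 0 <= cut1 <= cut2 <= len(molecule)")
    a1 = molecule[:cut1]        # fragment upstream of the first cleavage site
    a2 = molecule[cut1:cut2]    # fragment to be replaced
    a3 = molecule[cut2:]        # fragment downstream of the second cleavage site
    edited = a1 + target_fragment + a3  # a1 and a3 ligated to the target fragment
    return {"a1": a1, "a2": a2, "a3": a3, "edited": edited}


# Hypothetical example sequences
result = replace_fragment("AAAACCCCGGGG", 4, 8, "TTTTTT")
print(result["edited"])  # AAAATTTTTTGGGG
```

When the two cut positions coincide, a2 is empty and the same sketch describes a simple insertion rather than a replacement.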
In certain embodiments, the method comprises the steps of:
i. providing a double stranded target nucleic acid and a nucleic acid molecule of interest; and
providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, the second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system;
ii. contacting the double stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer, and contacting the nucleic acid molecule of interest with a third nucleic acid editing system and a fourth nucleic acid editing system.
In certain embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and,
the first and second functional complexes cleave a first strand and a second strand, respectively, of the double-stranded target nucleic acid, the first strand and the second strand each comprising a nick resulting from the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1, and the third and fourth nucleic acid editing systems cleave the nucleic acid molecule of interest, forming cleaved nucleotide fragments a1, a2, and a3; and,
the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and, the second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and,
the first DNA polymerase and the second DNA polymerase perform an extension reaction using, respectively, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, so that the 3' ends generated by cleavage in the first strand and the second strand are extended to form a first lobe and a second lobe, respectively, giving a double-stranded portion having the first lobe and the second lobe, referred to as target nucleic acid fragment F2; wherein the first and second lobes are capable of hybridizing or annealing to the cleaved nucleotide fragments a1 and a3, respectively; and,
the target nucleic acid fragment F2 hybridizes or anneals to the nucleotide fragments a1 and a3 via the first and second lobes, respectively, and is further connected between the nucleotide fragments a1 and a3, thereby replacing the nucleotide fragment a2 in the nucleic acid molecule of interest with the target nucleic acid fragment.
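The annealing and extension events of step ii can be sketched as simple string operations. This is an illustration only, assuming that a tag primer is written 5' to 3' as a tag (template) sequence followed by a target binding sequence, that all strands are written 5' to 3' as DNA, and that perfect Watson-Crick complementarity stands in for hybridization. The names `extend_lobe` and `anneals_to_3prime` and every sequence below are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_lobe(strand: str, tag_primer: str, binding_len: int) -> str:
    """Return the lobe added to the 3' end of `strand`.

    `tag_primer` is assumed to be 5'-[tag sequence][target binding sequence]-3'.
    The target binding sequence must pair with the 3' end of `strand`; the
    polymerase then copies the tag sequence, so the lobe is its reverse complement.
    """
    if not 0 < binding_len < len(tag_primer):
        raise ValueError("binding_len must be between 1 and len(tag_primer) - 1")
    template = tag_primer[:-binding_len]
    binding = tag_primer[-binding_len:]
    if revcomp(binding) != strand[-binding_len:]:
        raise ValueError("target binding sequence does not anneal to the 3' end")
    return revcomp(template)


def anneals_to_3prime(lobe: str, fragment_strand: str) -> bool:
    """True if the lobe is fully complementary to the 3' end of `fragment_strand`."""
    return fragment_strand.endswith(revcomp(lobe))


# Hypothetical sequences: one strand of F1 ending at the cleavage-generated 3' end
f1_strand = "GATTACAGGCT"
first_tag_primer = "GGTCAA" + "AGCC"   # tag sequence + target binding sequence
first_lobe = extend_lobe(f1_strand, first_tag_primer, binding_len=4)

a1_strand = "CCCAAGGTCAA"              # strand of fragment a1 ending at its 3' end
print(first_lobe)                                # TTGACC
print(anneals_to_3prime(first_lobe, a1_strand))  # True
```

In this toy model the tag sequence is identical to the 3' portion of the a1 strand, which is why the lobe copied from it can pair with that portion. The second lobe and fragment a3 would be handled in the same way.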
In certain embodiments, the first lobe is capable of hybridizing or annealing to the 3 'end or 3' portion of one nucleic acid strand of the nucleotide fragment a1, and the 3 'end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system.
In certain embodiments, the second lobe is capable of hybridizing or annealing to the 3 'end or 3' portion of one nucleic acid strand of the nucleotide fragment a3, and the 3 'end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the fourth nucleic acid editing system.
In certain embodiments, the method is performed intracellularly.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or nucleic acid molecule A3 encoding the same, and the fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system, and the fourth nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecule A3, and the nucleic acid molecule A4 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system, and the fourth nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3, and A4 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system, and the fourth nucleic acid editing system within the cell.
In certain embodiments, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid within the cell.
In certain embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized by a second Cas protein. In certain embodiments, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA and breaks one strand thereof; and, the second functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof.
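As an illustration of how candidate PAM-adjacent sites might be located on a donor sequence, the sketch below scans one strand for PAMs of the form 5'-NGG-3' with a 20-nucleotide protospacer immediately upstream. NGG is the PAM commonly associated with SpCas9 and is used here only as an assumption; the PAM actually required depends on the Cas protein chosen. The function name `find_ngg_sites` and the example sequence are hypothetical.

```python
import re


def find_ngg_sites(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (protospacer_start, protospacer, pam) for each NGG PAM on `seq`.

    `seq` is one strand written 5'->3'. Only PAMs with a full-length
    protospacer immediately 5' of them are reported.
    """
    sites = []
    # A lookahead is used so that overlapping PAM candidates are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            protospacer = seq[pam_start - spacer_len:pam_start]
            sites.append((pam_start - spacer_len, protospacer, match.group(1)))
    return sites


# Hypothetical donor sequence with a single NGG PAM at position 20
donor = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
print(find_ngg_sites(donor))
```

Because the first and second functional complexes cleave opposite strands, a second PAM on the other strand could be searched for by running the same scan on the reverse complement of the sequence.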
In certain embodiments, the nucleic acid molecule of interest is genomic DNA of the cell.
In certain embodiments, the first Cas protein, the first gRNA, the first DNA polymerase, or the first tag primer are as previously defined.
In certain embodiments, the second Cas protein, the second gRNA, the second DNA polymerase, or the second tag primer is as previously defined.
In certain embodiments, the third nucleic acid editing system is as previously defined.
In certain embodiments, the fourth nucleic acid editing system is as previously defined.
In certain embodiments, the third nucleic acid editing system is as defined above and the fourth nucleic acid editing system is as defined above, the nucleic acid molecule of interest comprising a third PAM sequence recognized by a third Cas protein and a fourth PAM sequence recognized by a fourth Cas protein. In certain embodiments, in step ii, the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA and breaks it; and, the fourth functional complex binds to and breaks the nucleic acid molecule of interest through the fourth PAM sequence and the fourth gRNA.
In certain embodiments, the first and second Cas proteins are identical, selected from Cas proteins that cleave a single strand of DNA, and the second DNA polymerase is identical to the first DNA polymerase; wherein the first Cas protein forms a first functional complex with the first gRNA and a second functional complex with the second gRNA, and the first DNA polymerase performs an extension reaction with the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, respectively, to form a target nucleic acid fragment F2 having a first lobe and a second lobe.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or nucleic acid molecule A3 encoding the same, and the fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecules A3 and A4 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer, the third nucleic acid editing system, and the fourth nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, D2, A3, and A4 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system within the cell.
In certain embodiments, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase, or are capable of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase, in a cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell.
In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell. In certain embodiments, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer within the cell.
In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell. In certain embodiments, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer within the cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the third nucleic acid editing system, and a nucleic acid molecule containing a nucleotide sequence encoding the fourth nucleic acid editing system are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system within the cell.
In certain embodiments, the method comprises the steps of:
i. providing a double stranded target nucleic acid and a nucleic acid molecule of interest; and
providing the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and a third and fourth nucleic acid editing system; wherein the third nucleic acid editing system and the fourth nucleic acid editing system are each as defined above;
ii. contacting the double stranded target nucleic acid with the first and second Cas proteins, first and second gRNAs, first and second DNA polymerases, and first and second tag primers, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system and fourth nucleic acid editing system.
In certain embodiments, in step ii:
the first Cas protein and the first gRNA combine to form a first functional complex, the second Cas protein and the second gRNA combine to form a second functional complex, the third Cas protein and the third gRNA combine to form a third functional complex, and the fourth Cas protein and the fourth gRNA combine to form a fourth functional complex; and,
the first and second functional complexes cleave, respectively, a first strand and a second strand of the double-stranded target nucleic acid, the first strand and the second strand each comprising a 3' end resulting from the cleavage, the double-stranded portion located between the two 3' ends being referred to as target nucleic acid fragment F1, and the third and fourth functional complexes bind to and cleave the nucleic acid molecule of interest, forming cleaved nucleotide fragments a1, a2 and a3; and,
the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and, the second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and,
the first DNA polymerase and the second DNA polymerase perform an extension reaction using, respectively, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, so that the 3' ends generated by cleavage in the first strand and the second strand are extended to form a first lobe and a second lobe, respectively, giving a double-stranded portion having the first lobe and the second lobe, referred to as target nucleic acid fragment F2; wherein the first and second lobes are capable of hybridizing or annealing to the cleaved nucleotide fragments a1 and a3, respectively; and,
The third tag primer hybridizes or anneals to the 3 'end of one nucleic acid strand of the nucleotide fragment a1 via the third target binding sequence, wherein the 3' end is formed by cleavage of a nucleic acid molecule of interest by the third functional complex; and, the fourth tag primer hybridizes or anneals to the 3 'end of one nucleic acid strand of the nucleotide fragment a3 through the fourth target binding sequence, wherein the 3' end is formed by cleavage of the nucleic acid molecule of interest by the fourth functional complex.
In certain embodiments, the method is performed intracellularly.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, a third nucleic acid editing system or nucleic acid molecule A3 encoding the same, and a fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecule A3, and the nucleic acid molecule A4 are delivered into a cell to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3, and A4 are delivered into the cell to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell.
In certain embodiments, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid within the cell.
In certain embodiments, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized by a second Cas protein. In certain embodiments, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the first PAM sequence and the first gRNA and breaks one strand thereof; and, the second functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof.
In certain embodiments, the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein and a fourth PAM sequence recognized by a fourth Cas protein. In certain embodiments, in step ii, the third functional complex binds to the nucleic acid molecule of interest through the third PAM sequence and the third gRNA and breaks it; and, the fourth functional complex binds to and breaks the nucleic acid molecule of interest through the fourth PAM sequence and the fourth gRNA.
In certain embodiments, the nucleic acid molecule of interest is genomic DNA of the cell.
In certain embodiments, the first Cas protein, the first gRNA, the first DNA polymerase, or the first tag primer are as previously defined.
In certain embodiments, the second Cas protein, the second gRNA, the second DNA polymerase, or the second tag primer is as previously defined.
In certain embodiments, the first and second Cas proteins are identical, selected from Cas proteins that cleave a DNA duplex, the third and fourth Cas proteins are identical, selected from Cas proteins that cleave a DNA duplex, and the first, second, third, and fourth DNA polymerases are identical DNA polymerases; wherein the first Cas protein forms a first, second, third, and fourth functional complex with the first, second, third, and fourth gRNAs, respectively; the first DNA polymerase performs an extension reaction using the first and second tag primers annealed to the target nucleic acid fragment F1 as templates, respectively, to form a target nucleic acid fragment F2 having a first lobe and a second lobe.
In certain embodiments, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or nucleic acid molecule A3 encoding the same, and the fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell.
In certain embodiments, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecules A3 and A4 are delivered into a cell to provide the first Cas protein, first DNA polymerase, third nucleic acid editing system and fourth nucleic acid editing system within the cell.
In certain embodiments, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, D2, A3 are delivered into the cell to provide the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell.
In certain embodiments, the nucleic acid molecule A1 and the nucleic acid molecule B1 are contained in the same or different expression vectors (e.g., eukaryotic expression vectors). In certain embodiments, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase, or are capable of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase, in a cell. In certain embodiments, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell.
In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell. In certain embodiments, in step i, the first PegRNA is delivered into the cell to provide the first gRNA and the first tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into the cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer within the cell.
In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are contained in the same expression vector (e.g., a eukaryotic expression vector). In certain embodiments, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell. In certain embodiments, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer within the cell.
Advantageous effects of the invention
Compared with the prior art, the nucleic acid editing system, kit and method provided by the present application can cleave one nucleic acid strand of a double-stranded target nucleic acid (for example, a donor vector containing a target nucleic acid sequence, or another exogenous nucleic acid fragment) and form a flap at the 3' end of the resulting nick (a homologous flap structure that is identical or complementary to the end sequence of the site-specific genomic break), while also producing a double-strand break at a specific site of the genome. On this basis, the system, kit and method enable efficient and accurate insertion and replacement of an exogenous nucleic acid (especially a large exogenous nucleic acid fragment) at a specific site of the genome.
Specifically, at least the following effects are achieved: 1. the efficiency of site-directed integration of exogenous genes is greatly improved; 2. the accuracy of the junctions is improved, and mutations such as base deletions or insertions at the junctions are reduced; 3. integration of the exogenous gene at the specific genomic site is unidirectional; 4. compared with NHEJ and other integration schemes that rely on double-strand cleavage of the donor vector, the system does not generate linearized DNA fragments, which improves the safety of site-directed integration of exogenous genes. In addition, compared with HDR-mediated gene editing, the f-PAINT method does not require homology arms to be constructed, and can therefore greatly increase the length of the exogenous gene that can be carried by a viral vector, particularly an adeno-associated virus vector.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples, but it will be understood by those skilled in the art that the following drawings and examples are only for illustrating the present invention and are not to be construed as limiting the scope of the present invention. Various objects and advantageous aspects of the present invention will become apparent to those skilled in the art from the following detailed description of the preferred embodiments and the accompanying drawings.
FIG. 1 shows a schematic of the principle by which the method of the invention (f-PAINT) mediates insertion of an exogenous gene into the genome. FIG. 1a is a schematic diagram of site-directed genomic integration of exogenous genes using the HDR, NHEJ and f-PAINT methods. The black double solid lines represent genomic sequence; the grey double solid lines represent the backbone sequence of the donor vector; yellow bar boxes represent exogenous genes; the red and blue solid lines represent the sequences on the donor vector (either carried by the vector itself or produced by processing) that are homologous to the genomic sequences on the left and right sides of the integration site, respectively. Black triangles represent the specific recognition and cleavage sites for nucleases on genomic DNA, which undergo double-strand breaks with blunt or sticky ends under the action of the nucleases. Blue and purple triangles represent a pair of targeting recognition sequences of the PE-Cas protein (first fusion protein) in opposite orientations; the PE-Cas protein recognizes and cleaves one nucleic acid strand of the donor vector to create a nick, and at the 3' end of the nick generates a homologous flap sequence that is identical or complementary to the end sequence of the genomic break. The DNA fragment containing the exogenous gene is located between the two homologous flap structures. The HDR-based method uses the homology arms on the donor vector to achieve site-directed integration of the exogenous gene at a specific double-strand cleavage site of genomic DNA through the cellular HDR repair mechanism. The NHEJ-based method relies on the cell's own NHEJ repair mechanism: the exogenous gene fragment cut from the donor vector is ligated to the ends of the double-strand break at the specific genomic integration site, thereby achieving site-directed integration of the exogenous gene.
In the method of the invention (f-PAINT), the homologous flap sequences generated by processing on the donor vector pair complementarily with the ends of the double-strand break on the genome, and DNA replication templated by the donor vector takes place at the ends of the genomic break, so that end rejoining and site-directed integration of the exogenous fragment are achieved through double-strand hybridization. FIG. 1b shows the process of producing homologous flap sequences on a donor vector. Taking PE-spCas9 as an example, PE-spCas9/pegRNA recognizes and binds to the targeting recognition sequences on both sides of the exogenous gene on the donor vector and cleaves the nucleic acid single strand that is not paired with the pegRNA. Subsequently, the primer binding sequence on the pegRNA binds to the end of the free nucleic acid single strand generated by cleavage, and, under the action of reverse transcriptase, a homologous flap sequence is extended using the template sequence of the pegRNA (homologous to the end of the genomic break) as the template.
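The relationship between the primer binding sequence, the template sequence and the resulting flap can be illustrated by working backwards from a desired flap. The sketch below is a simplification: it assumes the pegRNA 3' extension is laid out 5' to 3' as template sequence followed by primer binding sequence, models only that extension (not the spacer or scaffold), and treats reverse transcription as exact complementary copying. The function name `design_extension`, the primer binding length and all sequences are hypothetical.

```python
RNA_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}


def revcomp_rna(dna: str) -> str:
    """Reverse complement of a DNA sequence, written as RNA 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(dna))


def design_extension(nicked_strand: str, flap: str, pbs_len: int) -> dict:
    """Derive a pegRNA 3' extension that would yield `flap` on `nicked_strand`.

    `nicked_strand` is the donor strand (DNA, 5'->3') ending at the free 3' end
    created by cleavage; `flap` is the DNA sequence (5'->3') to be added there.
    """
    if not 0 < pbs_len <= len(nicked_strand):
        raise ValueError("pbs_len must be between 1 and len(nicked_strand)")
    pbs = revcomp_rna(nicked_strand[-pbs_len:])  # pairs with the free 3' end
    template = revcomp_rna(flap)                 # copied by reverse transcriptase
    return {"pbs": pbs, "template": template, "extension": template + pbs}


# Hypothetical donor strand and desired homologous flap
design = design_extension("GATTACAGGCT", flap="TTGACC", pbs_len=4)
print(design["pbs"])        # AGCC
print(design["template"])   # GGUCAA
print(design["extension"])  # GGUCAAAGCC
```

Since the flap is meant to be identical or complementary to the end sequence of the genomic break, the template is in practice fixed by the chosen genomic cleavage site rather than designed freely.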
FIG. 2 shows the use of the f-PAINT method to achieve efficient, specific site-directed integration of an exogenous gene in the 3' UTR of the GAPDH gene of human 293T cells. FIG. 2a shows a schematic flow of site-directed knock-in of an exogenous gene (IRES-EGFP) in the 3' UTR region of the GAPDH gene of the human 293T cell genome using different methods (HDR, NHEJ and f-PAINT). The solid black line indicates genomic sequence; blue and grey boxes represent the protein-coding and non-coding regions of the exons, respectively; the red and blue solid lines represent homologous sequences: the long solid lines indicate the homology arm sequences on the HDR donor vector; the short solid lines represent the homologous flap sequences produced by processing on the f-PAINT donor vector. FIG. 2b shows a comparison of the efficiency of site-directed integration of exogenous genes using the different methods. The results show that the integration efficiency of the exogenous gene mediated by f-PAINT is significantly higher than that of the HDR and NHEJ methods. FIG. 2c shows PCR identification of the correctly edited gene sequences and the byproducts generated by site-directed integration of exogenous genes using NHEJ and f-PAINT. The results show that the f-PAINT method generates no linearized DNA fragments, and that no byproducts such as reverse insertion of the exogenous gene or insertion of the donor vector backbone are generated at the specific genomic integration site. FIG. 2d shows the Sanger sequencing results at the junctions of the correctly edited gene sequences generated by site-directed integration of exogenous genes using NHEJ and f-PAINT. The junctions of site-directed integration mediated by the f-PAINT method are shown to be more precise than those produced by the NHEJ method.
FIG. 3 shows a comparison of the efficiency of site-directed integration of HDR, NHEJ, HMEJ and f-PAINT different approaches at the AAVS1 site in the human genome and the Rosa26 site in the mouse genome. Among them, FIG. 3a shows the efficiency of integration of exogenous genes (CAG-EGFP) on the genome mediated by different methods of HDR, NHEJ, HMEJ and f-PAINT (EGFP positive cell rate) without the use of sacAS9/sgRNA targeting cleavage of the genome specific site (AAVS 1 or Rosa 26). The integration of the foreign gene is now an integration at a non-specific site, or referred to as random integration. Random integration of foreign genes in the genome can lead to gene insertion mutations, compromising the stability of the genome. FIG. 3b shows the efficiency of genome integration (EGFP positive cell rate) of foreign genes (CAG-EGFP) mediated by different methods such as HDR, NHEJ, HMEJ and f-PAINT using sacAS9/sgRNA targeting to cleave the genome specific site (AAVS 1 or Rosa 26). At this time, the integration of the foreign gene is mainly site-specific integration.
FIG. 4 shows a comparison of the efficiency of site-directed integration of the exogenous gene CAG-EGFP at gene therapy-related safe harbor sites and at genetic disease-related gene sites in K562 cells using HDR, HDR NT (the HDR method without site-specific targeting), f-PAINT and f-PAINT NT (the f-PAINT method without site-specific targeting). Both HDR and f-PAINT maintain a low probability of random integration, but the f-PAINT method achieves more efficient site-directed integration of the exogenous gene at different loci such as AAVS1, CCR5, TRAC, WAS, HBB and IL2RG.
FIG. 5 shows the genotyping and junction Sanger sequencing results of f-PAINT-mediated site-directed integration of the exogenous gene CAG-EGFP at safe harbor sites of K562 cells such as AAVS1, CCR5 and TRAC.
FIG. 6 shows the genotyping and junction Sanger sequencing results of f-PAINT-mediated site-directed integration of the exogenous gene CAG-EGFP at genetic disease-related sites of K562 cells such as WAS, HBB and IL2RG.
FIG. 7 is a schematic diagram of the h-PAINT method of the present invention for mediating site-directed integration of exogenous genes. In the h-PAINT (LHA) method, the left side of the exogenous gene on the donor vector carries a left homology arm 500-1500 bp long, and the right side of the exogenous gene carries a targeting recognition sequence that can be recognized and processed by PE-spCas9/pegRNA. Under the action of PE-spCas9/pegRNA, the targeting recognition sequence generates a right homologous flap, which pairs with one broken end of the genome through base complementarity so that this broken end is extended; the extended end then pairs with the other broken end of the genome through the left homology arm, thereby achieving integration of the exogenous gene and repair of the strands. For h-PAINT (RHA), the right side of the exogenous gene on the donor vector carries a right homology arm 500-1500 bp long, and the left side of the exogenous gene carries a targeting recognition sequence that can be recognized and processed by PE-spCas9/pegRNA.
FIG. 8 shows a comparison of the efficiency of the f-PAINT and h-PAINT methods in mediating site-directed integration of the exogenous gene IRES-EGFP at the 3' UTR of the human GAPDH gene. FIG. 8a is a schematic diagram of h-PAINT-mediated site-directed integration of the exogenous gene IRES-EGFP at the 3' UTR of the human GAPDH gene. The h-PAINT (LHA) donor vector carries an 800 bp left homology arm sequence on the left side and the targeting recognition sequence of PE-spCas9/GAPDH-peg beta on the right side; the h-PAINT (RHA) donor vector carries an 800 bp right homology arm sequence on the right side and the targeting recognition sequence of PE-spCas9/GAPDH-peg alpha on the left side. FIG. 8b shows the efficiency of f-PAINT, h-PAINT (LHA) and h-PAINT (RHA) mediated integration of the exogenous gene IRES-EGFP at the 3' UTR of the human GAPDH gene. FIG. 8c shows the genotyping results for cells edited by the different methods. FIG. 8d shows Sanger sequencing of the 5' and 3' junctions of cells edited by the h-PAINT method.
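The donor-vector layouts described for the different methods can be summarized in a small data structure. The following is only an illustrative sketch: the class and function names are hypothetical, the 500-1500 bp range is the one stated for the h-PAINT homology arm, and the default of 800 bp is the arm length used in the examples.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Flank:
    """One flank of the exogenous gene on a donor vector (hypothetical model)."""
    kind: str                        # "target_site" or "homology_arm"
    length_bp: Optional[int] = None  # only meaningful for homology arms


TARGET_SITE = Flank("target_site")


def donor_layout(method: str, arm_bp: int = 800) -> Tuple[Flank, Flank]:
    """Return the (left, right) flanks of the exogenous gene for a method."""
    arm = Flank("homology_arm", arm_bp)
    if method == "f-PAINT":
        # Targeting recognition sequences on both sides of the exogenous gene.
        return (TARGET_SITE, TARGET_SITE)
    if method in ("h-PAINT (LHA)", "h-PAINT (RHA)"):
        if not 500 <= arm_bp <= 1500:
            raise ValueError("h-PAINT homology arm must be 500-1500 bp long")
        # LHA: arm on the left, target site on the right; RHA: the reverse.
        return (arm, TARGET_SITE) if method.endswith("(LHA)") else (TARGET_SITE, arm)
    if method == "HDR":
        # Homology arms on both sides of the exogenous gene.
        return (arm, arm)
    raise ValueError(f"unknown method: {method}")
```

Calling `donor_layout("h-PAINT (LHA)")` yields a homology arm on the left and a targeting recognition sequence on the right, mirroring the arrangement shown in FIG. 7.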
Sequence information
The information of the partial sequences to which the present invention relates is provided in table 1 below.
Table 1: description of the sequence
The invention will now be described with reference to the following examples, which are intended to illustrate the invention, but not to limit it.
Unless specifically indicated, the experiments and methods described in the examples were performed substantially in accordance with conventional methods well known in the art and described in various references. For example, for the conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA used in the present invention, reference may be made to Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel et al., eds. (1987)); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.): PCR 2: A PRACTICAL APPROACH (M. J. MacPherson, B. D. Hames and G. R. Taylor, eds. (1995)); and ANIMAL CELL CULTURE (R. I. Freshney, ed. (1987)).
In addition, where specific conditions are not given in the examples, the procedures were carried out under conventional conditions or under the conditions recommended by the manufacturer. Reagents and apparatus for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the invention by way of example and are not intended to limit the scope of the invention as claimed. All publications and other references mentioned herein are incorporated by reference in their entirety.
Example 1 Site-directed insertion of an exogenous gene (IRES-EGFP) into the 3' UTR region of the human GAPDH gene using the f-PAINT system
To verify the effect of the f-PAINT system on site-directed insertion of exogenous genes into the genome, the present example designed the following experiment: the reporter gene IRES-EGFP was site-directed knocked into the 3' UTR region of human genomic GAPDH using the f-PAINT system, and the HDR method and the NHEJ method were used as controls. Schematic diagrams of the three different methods of HDR, NHEJ and f-PAINT mediated site-directed integration of exogenous genes are shown in FIG. 1. The scheme of the above-mentioned methods for mediating the site-directed integration of IRES-EGFP reporter gene on GAPDH gene is shown in FIG. 2 a.
The GAPDH gene is located on chromosome 12 and encodes glyceraldehyde-3-phosphate dehydrogenase; it is an important housekeeping gene and is highly expressed in 293T cells. Once the reporter gene has been correctly integrated into the 3' UTR region of GAPDH, it is transcribed together with the GAPDH gene, and the IRES sequence recruits ribosomes to allow EGFP expression. The fluorescence signal of EGFP can be conveniently observed directly with a fluorescence microscope, and correctly edited EGFP-expressing cells can be captured and quantified by flow cytometry.
The following plasmids used in this example were obtained from the Li Wei group at the Institute of Zoology: the pCAG-spCas9-mCherry plasmid, capable of expressing the spCas9 protein (SEQ ID NO: 1) and the mCherry protein (SEQ ID NO: 2); the pCAG-spCas9 (H840A)-mCherry plasmid, capable of expressing the spCas9 (H840A) protein (SEQ ID NO: 3) and the mCherry protein (SEQ ID NO: 2); the pCAG-saCas9 plasmid, capable of expressing the saCas9 protein (SEQ ID NO: 4); pUC19-U6-gRNA (saCas9), capable of transcribing a gRNA (saCas9) lacking the guide sequence (SEQ ID NO: 5); pUC19-U6-gRNA (spCas9), capable of transcribing a gRNA (spCas9) lacking the guide sequence (SEQ ID NO: 6); and the pGH plasmid, used as the backbone of the donor vectors.
A nucleotide fragment encoding MLV RT (SEQ ID NO: 7) was amplified from the pCMV-PE2 plasmid (#132775) purchased from Addgene, and a nucleotide fragment encoding the spCas9 (H840A) portion and a nucleotide fragment encoding mCherry were amplified from the pCAG-spCas9 (H840A)-mCherry plasmid. The amplified MLV RT and spCas9 (H840A) nucleotide fragments described above were ligated by In-fusion cloning into the AscI/BsrGI double-digested pCAG-spCas9 (H840A)-mCherry plasmid, yielding the pCAG-PE-spCas9-mCherry plasmid, capable of expressing the PE-spCas9 protein (SEQ ID NO: 8) and the mCherry protein. The PE-spCas9 protein is a fusion of MLV RT and spCas9 (H840A).
Primers sgGAPDH-F (SEQ ID NO: 9) and sgGAPDH-R (SEQ ID NO: 10) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-sgRNA (saCas9) plasmid to give the pUC19-U6-sgGAPDH plasmid, capable of transcribing sgGAPDH (SEQ ID NO: 11), which directs the saCas9 protein to a specific site in the 3' UTR region of the human GAPDH gene.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to give the pUC19-U6-sg alpha plasmid, capable of transcribing sg alpha (SEQ ID NO: 14), which directs the spCas9 protein to the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated with T4 ligase into the BsaI-digested pUC19-U6-gRNA (spCas9) plasmid vector to give the pUC19-U6-sg beta plasmid, capable of transcribing sg beta (SEQ ID NO: 18), which directs the spCas9 protein to the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
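The annealed-primer cloning used for the sgRNA plasmids above can be sketched as follows. This is an illustration under stated assumptions only: the 4-nt overhangs `CACC` and `AAAC` are hypothetical placeholders (the actual overhangs depend on the BsaI sites of the vector), the guide sequence in the usage note is made up, and the function names are not part of the invention.

```python
# Sketch of designing a forward/reverse oligo pair for annealed-oligo cloning.
# Overhangs are placeholders; real overhangs depend on the digested vector.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence, read 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def design_guide_oligos(guide: str,
                        top_overhang: str = "CACC",
                        bottom_overhang: str = "AAAC"):
    """Return (forward, reverse) oligos whose annealed duplex carries the
    guide sequence flanked by 5' single-stranded overhangs."""
    guide = guide.upper()
    if not guide or set(guide) - set("ACGT"):
        raise ValueError("guide must be a non-empty DNA sequence (A/C/G/T)")
    forward = top_overhang + guide
    reverse = bottom_overhang + reverse_complement(guide)
    return forward, reverse


def anneals_cleanly(forward: str, reverse: str, overhang_len: int = 4) -> bool:
    """True if the two oligos pair perfectly outside their 5' overhangs."""
    return forward[overhang_len:] == reverse_complement(reverse[overhang_len:])
```

For a placeholder guide such as `GACCTTAGGCATCGATTACG`, `design_guide_oligos` returns a pair for which `anneals_cleanly` is true, which is the property the F/R primer pairs in this example need in order to form a ligatable duplex.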
The primers GAPDH-peg alpha-F (SEQ ID NO: 20) and GAPDH-peg alpha-R (SEQ ID NO: 21) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg alpha plasmid vector by In-fusion cloning technology to give pUC19-U6-GAPDH-peg alpha plasmid capable of transcribing GAPDH-peg alpha (SEQ ID NO: 22), directing the PE-spCas9 protein to target the specific recognition sequence of sg alpha (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers GAPDH-peg beta-F (SEQ ID NO: 23) and GAPDH-peg beta-R (SEQ ID NO: 24) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give pUC19-U6-GAPDH-peg beta plasmid capable of transcribing GAPDH-peg beta (SEQ ID NO: 25), directing the specific recognition sequence of sg beta (SEQ ID NO: 19) on PE-spCas9 protein targeting donor vector, and reverse transcribing homologous flap structure at the cut.
The reporter gene IRES-EGFP (SEQ ID NO: 26) was synthesized by JieRui, inc., and ligated to EcoRV-digested pGH plasmid vector via T4 ligase as a donor vector. The NHEJ and f-PAINT systems use the same donor vector, which has specific recognition sequences for sg alpha and sg beta on both sides of the reporter gene (SEQ ID NO:15 and SEQ ID NO:19, respectively), which are arranged in reverse in PAM-out. For the donor vector of the HDR system, there was a homology arm sequence of about 800bp on both sides of the reporter gene (SEQ ID NO:27 and SEQ ID NO:28, respectively).
In the implementation of the HDR system, pCAG-saCas9-mCherry and sgGAPDH, together with the HDR donor vector, were delivered into 293T cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the control group of the HDR system, pCAG-saCas9-mCherry was delivered into 293T cells together with the HDR donor vector. The 293T cell line was obtained from the ATCC cell bank. 24 hours after transfection, mCherry-positive cells were sorted by flow cytometry, and after 5 days of culture, the proportion of EGFP-positive cells was analyzed by flow cytometry. In the implementation of the NHEJ system, pCAG-saCas9, sgGAPDH, pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ, together with the NHEJ donor vector, were transfected into 293T cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the control group of the NHEJ system, pCAG-saCas9, pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ were transfected into 293T cells together with the donor vector. 24 hours after transfection, mCherry-positive cells were sorted on a flow cytometer, and after 5 days of culture of the sorted cells, the proportion of EGFP-positive cells was analyzed on the flow cytometer. In the implementation of the f-PAINT system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry, pUC19-U6-GAPDH-pegα and pUC19-U6-GAPDH-pegβ, together with the f-PAINT donor vector, were transfected into 293T cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the negative control group of the f-PAINT system, pCAG-saCas9, pCAG-PE-spCas9-mCherry, pUC19-U6-GAPDH-pegα and pUC19-U6-GAPDH-pegβ were transfected into 293T cells together with the donor vector. 24 hours after transfection, mCherry-positive cells were sorted on a flow cytometer, and after 5 days of culture of the sorted cells, the proportion of EGFP-positive cells was analyzed on the flow cytometer.
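The transfection groups described above can be written out as sets of components, which makes the design of the control groups explicit: each control contains the same components as its experimental group except the genome-targeting sgGAPDH. The component names are taken from this example; the dictionary and function names are hypothetical and serve only as a bookkeeping sketch.

```python
# Components co-transfected in each experimental group of Example 1.
# The donor entries are shorthand labels, not plasmid names from the source.
SYSTEMS = {
    "HDR": {"pCAG-saCas9-mCherry", "sgGAPDH", "HDR donor vector"},
    "NHEJ": {"pCAG-saCas9", "sgGAPDH", "pCAG-spCas9-mCherry",
             "pUC19-U6-sgα", "pUC19-U6-sgβ", "NHEJ/f-PAINT donor vector"},
    "f-PAINT": {"pCAG-saCas9", "sgGAPDH", "pCAG-PE-spCas9-mCherry",
                "pUC19-U6-GAPDH-pegα", "pUC19-U6-GAPDH-pegβ",
                "NHEJ/f-PAINT donor vector"},
}


def control_group(system: str) -> set:
    """Control for a system: identical components without sgGAPDH,
    so the genomic GAPDH site is not cleaved."""
    return SYSTEMS[system] - {"sgGAPDH"}


def shared_components(system_a: str, system_b: str) -> set:
    """Components common to two experimental groups."""
    return SYSTEMS[system_a] & SYSTEMS[system_b]
```

For instance, `shared_components("NHEJ", "f-PAINT")` contains the shared donor vector, pCAG-saCas9 and sgGAPDH, reflecting that the two systems differ only in the nuclease and guide RNAs that process the donor.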
Comparing the proportions of EGFP-positive cells obtained with the three systems reflects the differences in the efficiency of site-directed integration of the exogenous gene IRES-EGFP mediated by the different systems. The exogenous gene integration efficiencies are shown in FIG. 2b. The results show that the proportion of EGFP-positive cells in the f-PAINT system is about 30 times that of the HDR system and about 4 times that of the NHEJ system.
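The fold differences quoted above are simple ratios of EGFP-positive cell proportions. The sketch below shows the arithmetic; the percentages in the usage note are hypothetical placeholders chosen only to reproduce ratios of about 30 and about 4, and are not measured values from FIG. 2b.

```python
def fold_change(test_percent: float, reference_percent: float) -> float:
    """Ratio of the EGFP-positive proportion of a test system to that of
    a reference system (both given as percentages of sorted cells)."""
    if reference_percent <= 0:
        raise ValueError("reference proportion must be positive")
    return test_percent / reference_percent


def egfp_positive_percent(egfp_positive_events: int, total_events: int) -> float:
    """Percentage of EGFP-positive events among all analyzed events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return 100.0 * egfp_positive_events / total_events
```

With placeholder proportions of 30.0% (f-PAINT), 7.5% (NHEJ) and 1.0% (HDR), `fold_change(30.0, 1.0)` gives 30.0 and `fold_change(30.0, 7.5)` gives 4.0.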
Genomic DNA was extracted from cells edited by the NHEJ and f-PAINT systems, and PCR identification was performed with the following primer pairs: GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P4 (SEQ ID NO: 32); GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P2 (SEQ ID NO: 30), amplifying the correctly integrated 5' junction; GAPDH-P3 (SEQ ID NO: 31)/GAPDH-P4 (SEQ ID NO: 32), amplifying the correctly integrated 3' junction; GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P3 (SEQ ID NO: 31), amplifying the 5' junction of the reverse-inserted gene; GAPDH-P2 (SEQ ID NO: 30)/GAPDH-P4 (SEQ ID NO: 32), amplifying the 3' junction of the reverse-inserted gene; GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P5 (SEQ ID NO: 33), amplifying a 5' junction; and GAPDH-P6 (SEQ ID NO: 34)/GAPDH-P4 (SEQ ID NO: 32), amplifying a 3' junction. Sanger sequencing analysis was performed on the correctly integrated junction sequences.
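The logic of junction PCR genotyping can be sketched as a lookup from primer pairs to the junction each one detects. The assignment below is an assumption made for illustration: it covers only the four pairs that detect correct and reverse insertion, and it assumes that GAPDH-P1 and GAPDH-P4 anneal in the genome outside the insertion site while GAPDH-P2 and GAPDH-P3 anneal inside the exogenous gene. The function name is hypothetical.

```python
# Junction assays (assumed primer geometry; illustrative only).
JUNCTION_ASSAYS = {
    ("GAPDH-P1", "GAPDH-P2"): "correct 5' junction",
    ("GAPDH-P3", "GAPDH-P4"): "correct 3' junction",
    ("GAPDH-P1", "GAPDH-P3"): "inverted 5' junction",
    ("GAPDH-P2", "GAPDH-P4"): "inverted 3' junction",
}


def interpret_bands(amplified_pairs) -> list:
    """Given the primer pairs that produced a PCR band, report which
    integration outcomes are supported by both of their junctions."""
    junctions = {JUNCTION_ASSAYS[pair] for pair in amplified_pairs
                 if pair in JUNCTION_ASSAYS}
    outcomes = []
    if {"correct 5' junction", "correct 3' junction"} <= junctions:
        outcomes.append("correct integration")
    if {"inverted 5' junction", "inverted 3' junction"} <= junctions:
        outcomes.append("reverse insertion")
    return outcomes
```

A sample that yields bands only for GAPDH-P1/GAPDH-P2 and GAPDH-P3/GAPDH-P4 is reported as `["correct integration"]`, whereas a sample that additionally yields both inverted-junction bands is also reported as containing reverse insertion.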
The PCR identification results are shown in FIG. 2c. The results show that, because the donor vector is cut into fragments, NHEJ-mediated integration of the exogenous gene produces, in addition to correct integration, by-products such as reverse integration of the exogenous gene and integration of the vector backbone at the specific integration site of the genome. The results of Sanger sequencing analysis are shown in FIG. 2d. The results show that the junctions produced by the f-PAINT system are also more accurate than those produced by the NHEJ method, and base insertions, deletions and substitutions are not readily produced at the junctions. In conclusion, the f-PAINT system of the invention can greatly improve the efficiency and accuracy of site-directed integration of an exogenous gene. Because the donor vector is not linearized, the f-PAINT system is also safer than integration methods, such as NHEJ, that rely on linearized double-stranded DNA as the donor.
Example 2 Site-directed insertion of an exogenous reporter gene (CAG-EGFP) into the human AAVS1 site and the mouse Rosa26 site using the f-PAINT system
To further verify the site specificity of f-PAINT-mediated site-directed integration of exogenous genes, the present example designed the following experiment: the f-PAINT system was used to site-specifically knock the reporter gene CAG-EGFP into the first intron of the human genome AAVS1 site and the first intron of the mouse genome Rosa26 site, and the HDR, NHEJ and HMEJ methods were used as controls.
The AAVS1 site and the Rosa26 site are recognized as safe harbor sites on the human and mouse genomes, respectively, where insertion of foreign sequences does not affect the function of the cell itself. The reporter gene CAG-EGFP is provided with a CAG promoter, and after the reporter gene CAG-EGFP is integrated into a genome, the CAG promoter can drive the EGFP to express, and fluorescent signals of the EGFP can be directly observed through a fluorescent microscope conveniently, and also can be captured and quantified through flow cytometry. In addition to detecting specific integration of the exogenous gene at the AAVS1 site or the Rosa26 site, the reporter gene CAG-EGFP can also detect random integration of the exogenous gene in the genome.
The plasmids used in this example for expressing sgRNAs and pegRNAs were constructed in the same manner as in Example 1, except that the donor plasmid carried the reporter gene CAG-EGFP. The HMEJ donor vector also uses the pGH plasmid as the vector backbone; homology arm sequences of about 800 bp are introduced on both sides of the reporter gene, and a targeting recognition sequence of spCas9/sg alpha is introduced on the outer side of each homology arm. The sequences of the primers used for plasmid construction, the sequences of the sgRNAs and pegRNAs, the sequence of the reporter gene CAG-EGFP and the sequences of the homology arms are shown in Table 1, and the specific primer sequences used are indicated below.
(1) AAVS1 site
Primers sgAAVS1-F (SEQ ID NO: 36) and sgAAVS1-R (SEQ ID NO: 37) were annealed and ligated with T4 ligase to the pUC19-U6-sgRNA (saCas9) plasmid digested with BsaI to give pUC19-U6-sgAAVS1 plasmid capable of transcribing sgAAVS1 (SEQ ID NO: 38), directing the saCas9 protein to the first intron of the human genome AAVS1 site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14), directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas9) plasmid vector digested with BsaI, resulting in pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers AAVS1-peg alpha-F (SEQ ID NO: 39) and AAVS1-peg alpha-R (SEQ ID NO: 40) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg alpha plasmid vector by In-fusion cloning technology to give pUC19-U6-AAVS1-peg alpha plasmid capable of transcribing AAVS1-peg alpha (SEQ ID NO: 41), directing the PE-spCas9 protein to target the specific recognition sequence of sg alpha (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers AAVS1-peg beta-F (SEQ ID NO: 42) and AAVS1-peg beta-R (SEQ ID NO: 43) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give pUC19-U6-AAVS1-peg beta plasmid capable of transcribing AAVS1-peg beta (SEQ ID NO: 44), directing the PE-spCas9 protein to target the specific recognition sequence of sg beta (SEQ ID NO: 19) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, inc., and ligated to EcoRV digested pGH plasmid vector by T4 ligase as a donor vector. The NHEJ and f-PAINT systems use the same donor vector, which has specific recognition sequences for sg alpha and sg beta on both sides of the reporter gene (SEQ ID NO:15 and SEQ ID NO:19, respectively), which are arranged in reverse in PAM-out. For the donor vector of the HDR system, there is a homology arm sequence of about 800bp (SEQ ID NO:46 and SEQ ID NO:47, respectively) on both sides of the reporter gene.
(2) Rosa26 site
Primers sgmRosa26-F (SEQ ID NO: 52) and sgmRosa26-R (SEQ ID NO: 53) were annealed and ligated with T4 ligase to the pUC19-U6-sgRNA (saCas9) plasmid digested with BsaI to give pUC19-U6-sgRosa26 plasmid capable of transcribing sgmRosa26 (SEQ ID NO: 54), directing the saCas9 protein to the first intron of the mouse genome Rosa26 site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14), directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas9) plasmid vector digested with BsaI, resulting in pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers mRosa26-peg alpha-F (SEQ ID NO: 55) and mRosa26-peg alpha-R (SEQ ID NO: 56) were subjected to overlap extension PCR, the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg alpha plasmid vector by In-fusion cloning technology to give pUC19-U6-mRosa26-peg alpha plasmid capable of transcribing mRosa26-peg alpha (SEQ ID NO: 57), directing the specific recognition sequence of sg alpha on PE-spCas9 protein targeting donor vector (SEQ ID NO: 15), and reverse transcribing homologous flap structure at the cut.
The primers mRosa26-peg beta-F (SEQ ID NO: 58) and mRosa26-peg beta-R (SEQ ID NO: 59) were subjected to overlap extension PCR, the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give pUC19-U6-mRosa26-peg beta plasmid capable of transcribing mRosa26-peg beta (SEQ ID NO: 60), directing the specific recognition sequence of sg beta (SEQ ID NO: 19) on PE-spCas9 protein targeting donor vector, and reverse transcribing homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, inc., and ligated to EcoRV digested pGH plasmid vector by T4 ligase as a donor vector. The NHEJ and f-PAINT systems use the same donor vector, which has specific recognition sequences for sg alpha and sg beta on both sides of the reporter gene (SEQ ID NO:15 and SEQ ID NO:19, respectively), which are arranged in reverse in PAM-out. For the donor vector of the HDR system, there is a homology arm sequence of about 800bp (SEQ ID NO:61 and SEQ ID NO:62, respectively) on both sides of the reporter gene.
To assess random integration of the exogenous gene CAG-EGFP into the genome by the different methods, the efficiency with which each method mediates integration of CAG-EGFP into the genome without targeting a specific genomic locus was examined first: in the implementation of the HDR system, pCAG-saCas9-mCherry was delivered into 293T cells or mouse embryonic stem cells together with the HDR donor vector using Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the implementation of the NHEJ system, pCAG-saCas9, pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ, together with the NHEJ donor vector, were transfected into 293T cells or mouse embryonic stem cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the implementation of the HMEJ method, pCAG-saCas9, pCAG-spCas9-mCherry and pUC19-U6-sgα, together with the HMEJ donor vector, were transfected into 293T cells or mouse embryonic stem cells. In the implementation of the f-PAINT system, pCAG-saCas9, pCAG-PE-spCas9-mCherry, pUC19-U6-pegα and pUC19-U6-pegβ, together with the f-PAINT donor vector, were transfected into 293T cells or mouse embryonic stem cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. Cells were sorted on a flow cytometer 24 hours after transfection, and after the sorted cells had been cultured for a further 14 days, the proportion of EGFP-positive cells was analyzed on the flow cytometer.
To verify the efficiency of exogenous gene integration at specific genomic sites by the different methods, targeted cleavage by saCas9/sgRNA at the AAVS1 site of the human genome or the Rosa26 site of the mouse genome was included in the implementation of each method: in the implementation of the HDR system, pCAG-saCas9, sgAAVS1 (or sgmRosa26) and the HDR donor vector were delivered into 293T cells or mouse embryonic stem cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the implementation of the NHEJ system, pCAG-saCas9, sgAAVS1 (or sgmRosa26), pCAG-spCas9-mCherry, pUC19-U6-sgα and pUC19-U6-sgβ, together with the NHEJ donor vector, were transfected into 293T cells or mouse embryonic stem cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the implementation of the HMEJ method, pCAG-saCas9, sgAAVS1 (or sgmRosa26), pCAG-spCas9-mCherry and pUC19-U6-sgα, together with the HMEJ donor vector, were transfected into 293T cells or mouse embryonic stem cells. In the implementation of the f-PAINT system, pCAG-saCas9, sgAAVS1 (or sgmRosa26), pCAG-PE-spCas9-mCherry, pUC19-U6-pegα and pUC19-U6-pegβ, together with the f-PAINT donor vector, were transfected into 293T cells or mouse embryonic stem cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. Cells were sorted on a flow cytometer 24 hours after transfection, and after the sorted cells had been cultured for a further 14 days, the proportion of EGFP-positive cells was analyzed on the flow cytometer.
Comparing the proportions of EGFP-positive cells generated by the four systems, with and without targeting of the genome-specific site, reflects the differences in the efficiency of integration of the exogenous gene CAG-EGFP into the genome mediated by the different systems. The non-targeted integration efficiencies are shown in FIG. 3a. The results show that, at both the human AAVS1 and mouse Rosa26 sites, the f-PAINT and HDR systems have a low probability of random integration of the exogenous gene, whereas the NHEJ and HMEJ systems have a higher probability. The targeted integration efficiencies are shown in FIG. 3b. The results show that, at both the human AAVS1 and mouse Rosa26 sites, the f-PAINT system has the highest targeted integration efficiency of the methods tested. Together, these results show that f-PAINT-mediated site-directed integration of the exogenous gene is not only highly efficient but also strongly site-specific.
Example 3 Site-directed insertion of an exogenous gene (CAG-EGFP) into gene therapy-related safe harbor sites and genetic disease-related sites using the f-PAINT system
To further verify the accuracy and efficiency of the f-PAINT system in mediating site-directed integration of exogenous genes at different loci of the genome, and to demonstrate the application potential of the f-PAINT system in gene therapy, the present example designed the following experiment: the f-PAINT system was used in K562 cells to site-specifically knock the reporter gene CAG-EGFP into safe harbor sites such as AAVS1, CCR5 and TRAC and into genetic disease-related sites such as WAS, HBB and IL2RG, and the HDR method was used as a control. The construction of the expression vectors for the sgRNAs and pegRNAs and of the donor vectors is as described in Example 1. The primer sequences used for vector construction and the homology arm sequences of the donor vectors are shown in Table 1, and the specific primer sequences used are indicated below.
(1) AAVS1 site
The primers used and the construction procedure were the same as in example 2.
(2) CCR5 site
Primers sgCCR5-F (SEQ ID NO: 65) and sgCCR5-R (SEQ ID NO: 66) were annealed and ligated with T4 ligase onto the pUC19-U6-sgRNA (saCas9) plasmid digested with BsaI, resulting in a pUC19-U6-sgCCR5 plasmid capable of transcribing sgCCR5 (SEQ ID NO: 67), guiding the saCas9 protein to target the CCR5 site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14), directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas9) plasmid vector digested with BsaI, resulting in pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers CCR5-pegα-F (SEQ ID NO: 68) and CCR5-pegα-R (SEQ ID NO: 69) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sgα plasmid vector by In-fusion cloning technology to give pUC19-U6-CCR5-pegα plasmid capable of transcribing CCR5-pegα (SEQ ID NO: 70), directing the PE-spCas9 protein to target the specific recognition sequence of sgα (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers CCR5-peg beta-F (SEQ ID NO: 71) and CCR5-peg beta-R (SEQ ID NO: 72) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give pUC19-U6-CCR5-peg beta plasmid capable of transcribing CCR5-peg beta (SEQ ID NO: 73), directing the PE-spCas9 protein to target the specific recognition sequence of sg beta (SEQ ID NO: 19) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, inc., and ligated to EcoRV digested pGH plasmid vector by T4 ligase as a donor vector. The NHEJ and f-PAINT systems use the same donor vector, which has specific recognition sequences for sg alpha and sg beta on both sides of the reporter gene (SEQ ID NO:15 and SEQ ID NO:19, respectively), which are arranged in reverse in PAM-out. For the donor vector of the HDR system, there is a homology arm sequence of about 800bp (SEQ ID NO:74 and SEQ ID NO:75, respectively) on both sides of the reporter gene.
(3) TRAC site
Primers sgTRAC-F (SEQ ID NO: 78) and sgTRAC-R (SEQ ID NO: 79) were annealed and ligated with T4 ligase to the pUC19-U6-sgRNA (saCas 9) plasmid digested with BsaI, resulting in pUC19-U6-sgTRAC plasmid capable of transcribing sgTRAC (SEQ ID NO: 80), guiding the saCas9 protein to target the TRAC site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14) directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI, resulting in the pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers TRAC-peg alpha-F (SEQ ID NO: 81) and TRAC-peg alpha-R (SEQ ID NO: 82) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-peg alpha plasmid vector by In-fusion cloning technology to give the pUC19-U6-TRAC-peg alpha plasmid, which is capable of transcribing TRAC-peg alpha (SEQ ID NO: 83), directing the PE-spCas9 protein to target the specific recognition sequence of sg alpha (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers TRAC-peg beta-F (SEQ ID NO: 84) and TRAC-peg beta-R (SEQ ID NO: 85) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give pUC19-U6-TRAC-peg beta plasmid capable of transcribing TRAC-peg beta (SEQ ID NO: 86), directing the PE-spCas9 protein to target the specific recognition sequence of sg beta (SEQ ID NO: 19) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, Inc. and ligated into the EcoRV digested pGH plasmid vector with T4 ligase to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the specific recognition sequences of sg alpha and sg beta (SEQ ID NO: 15 and SEQ ID NO: 19, respectively) flank the reporter gene and are arranged in opposite orientations in a PAM-out configuration. In the donor vector of the HDR system, a homology arm of about 800 bp flanks each side of the reporter gene (SEQ ID NO: 87 and SEQ ID NO: 88, respectively).
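The oligo annealing step used for the sgRNA plasmids above can be illustrated with a minimal Python sketch. It derives a forward/reverse oligo pair from a guide (spacer) sequence so that the annealed duplex carries 4-nt 5' overhangs suitable for ligation into a BsaI-digested vector. The overhangs `CACC`/`AAAC`, the function names, and the example spacer are assumptions made for illustration only; the actual overhangs of the pUC19-U6-sgRNA vectors and the primer sequences of the sequence listing are not reproduced here.

```python
# Hypothetical 4-nt overhangs; the real vector-specific overhangs are not given in the text.
FWD_OVERHANG = "CACC"
REV_OVERHANG = "AAAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_annealed_oligos(spacer: str) -> tuple[str, str]:
    """Build the forward and reverse oligos for a guide spacer."""
    spacer = spacer.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be a non-empty DNA sequence (A/C/G/T)")
    forward = FWD_OVERHANG + spacer
    reverse = REV_OVERHANG + reverse_complement(spacer)
    return forward, reverse


def duplex_core(forward: str, reverse: str) -> str:
    """Return the double-stranded core and check that the two oligos pair."""
    core_fwd = forward[len(FWD_OVERHANG):]
    core_rev = reverse[len(REV_OVERHANG):]
    if reverse_complement(core_rev) != core_fwd:
        raise ValueError("oligos are not complementary outside the overhangs")
    return core_fwd


if __name__ == "__main__":
    # Placeholder spacer, not a sequence from the sequence listing.
    fwd, rev = design_annealed_oligos("GATTACAGATTACAGATTAC")
    print(fwd, rev, duplex_core(fwd, rev))
```

Keeping the overhangs as module-level constants makes it straightforward to substitute the overhangs of whichever BsaI-digested backbone is actually used.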
(4) WAS-1 site
Primers sgWAS-1-F (SEQ ID NO: 91) and sgWAS-1-R (SEQ ID NO: 92) were annealed and ligated with T4 ligase to the pUC19-U6-sgRNA (saCas 9) plasmid digested with BsaI to give the pUC19-U6-sgWAS-1 plasmid, which is capable of transcribing sgWAS-1 (SEQ ID NO: 93), directing the saCas9 protein to the WAS-1 site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14) directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI, resulting in the pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers WAS-1-peg alpha-F (SEQ ID NO: 94) and WAS-1-peg alpha-R (SEQ ID NO: 95) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-peg alpha plasmid vector by In-fusion cloning technology to give the pUC19-U6-WAS-1-peg alpha plasmid, which is capable of transcribing WAS-1-peg alpha (SEQ ID NO: 96), directing the PE-spCas9 protein to target the specific recognition sequence of sg alpha (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers WAS-1-peg beta-F (SEQ ID NO: 97) and WAS-1-peg beta-R (SEQ ID NO: 98) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give the pUC19-U6-WAS-1-peg beta plasmid, which is capable of transcribing WAS-1-peg beta (SEQ ID NO: 99), directing the PE-spCas9 protein to target the specific recognition sequence of sg beta (SEQ ID NO: 19) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, Inc. and ligated into the EcoRV digested pGH plasmid vector with T4 ligase to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the specific recognition sequences of sg alpha and sg beta (SEQ ID NO: 15 and SEQ ID NO: 19, respectively) flank the reporter gene and are arranged in opposite orientations in a PAM-out configuration. In the donor vector of the HDR system, a homology arm of about 800 bp flanks each side of the reporter gene (SEQ ID NO: 100 and SEQ ID NO: 101, respectively).
(5) WAS-3 site
Primers sgWAS-3-F (SEQ ID NO: 104) and sgWAS-3-R (SEQ ID NO: 105) were annealed and ligated with T4 ligase to the pUC19-U6-sgRNA (saCas 9) plasmid digested with BsaI to give the pUC19-U6-sgWAS-3 plasmid, which is capable of transcribing sgWAS-3 (SEQ ID NO: 106), directing the saCas9 protein to the WAS-3 site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14) directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI, resulting in the pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers WAS-3-pegα-F (SEQ ID NO: 107) and WAS-3-pegα-R (SEQ ID NO: 108) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-sgα plasmid vector by In-fusion cloning technology to give the pUC19-U6-WAS-3-pegα plasmid, which is capable of transcribing WAS-3-pegα (SEQ ID NO: 109), directing the PE-spCas9 protein to target the specific recognition sequence of sgα (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers WAS-3-peg beta-F (SEQ ID NO: 110) and WAS-3-peg beta-R (SEQ ID NO: 111) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give the pUC19-U6-WAS-3-peg beta plasmid, which is capable of transcribing WAS-3-peg beta (SEQ ID NO: 112), directing the PE-spCas9 protein to target the specific recognition sequence of sg beta (SEQ ID NO: 19) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, Inc. and ligated into the EcoRV digested pGH plasmid vector with T4 ligase to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the specific recognition sequences of sg alpha and sg beta (SEQ ID NO: 15 and SEQ ID NO: 19, respectively) flank the reporter gene and are arranged in opposite orientations in a PAM-out configuration. In the donor vector of the HDR system, a homology arm of about 800 bp flanks each side of the reporter gene (SEQ ID NO: 113 and SEQ ID NO: 114, respectively).
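The relationship between a pegRNA and the flap it writes at the cut can be sketched in Python as follows. The sketch assumes the general layout described for the tag primer (a tag sequence located 5' of a target binding sequence, attached to the 3' end of the gRNA) and models reverse transcription as taking the DNA reverse complement of the RNA tag. The class name, the minimum lengths (taken from the preferred ranges recited in the claims), and all example sequences are illustrative assumptions, not sequences from the sequence listing.

```python
from dataclasses import dataclass

# RNA base -> complementary DNA base
_RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def reverse_transcribe(rna: str) -> str:
    """Return the DNA strand (5'->3') complementary to an RNA template (5'->3')."""
    return rna.upper().translate(_RNA_TO_CDNA)[::-1]


@dataclass(frozen=True)
class PegRNA:
    guide: str           # hybridizes to one strand of the target
    scaffold: str        # bound by the Cas protein
    tag: str             # template for the flap (5' part of the tag primer)
    target_binding: str  # anneals to the 3' end of the nicked strand

    def __post_init__(self) -> None:
        # Illustrative minimum lengths (nt) for each element.
        minimums = {"guide": 5, "scaffold": 20, "tag": 4, "target_binding": 5}
        for name, min_len in minimums.items():
            seq = getattr(self, name).upper()
            if len(seq) < min_len or set(seq) - set("ACGU"):
                raise ValueError(f"{name} must be RNA of at least {min_len} nt")

    @property
    def sequence(self) -> str:
        """Full pegRNA, 5'->3': guide, scaffold, tag, target binding sequence."""
        return self.guide + self.scaffold + self.tag + self.target_binding

    @property
    def flap(self) -> str:
        """DNA flap (5'->3') added to the nicked strand by reverse transcription."""
        return reverse_transcribe(self.tag)


if __name__ == "__main__":
    # All sequences below are placeholders.
    peg = PegRNA(
        guide="GACUGACUGACUGACUGACU",
        scaffold="GUUUUAGAGCUAGAAAUAGC",
        tag="AUGCAUGC",
        target_binding="CCGGAAUU",
    )
    print(peg.sequence)
    print(peg.flap)
```

In the f-PAINT setting described here, the tag would be chosen so that the resulting flap is homologous to the genomic sequence next to the saCas9 cut; the sketch leaves that choice to the caller.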
(6) HBB site
Primers sgHBB-F (SEQ ID NO: 117) and sgHBB-R (SEQ ID NO: 118) were annealed and ligated with T4 ligase to the pUC19-U6-sgRNA (saCas 9) plasmid digested with BsaI to give pUC19-U6-sgHBB plasmid capable of transcribing sgHBB (SEQ ID NO: 119) directing the saCas9 protein to the HBB site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14) directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI, resulting in the pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers HBB-peg alpha-F (SEQ ID NO: 120) and HBB-peg alpha-R (SEQ ID NO: 121) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-peg alpha plasmid vector by In-fusion cloning technology to give the pUC19-U6-HBB-peg alpha plasmid, which is capable of transcribing HBB-peg alpha (SEQ ID NO: 122), directing the PE-spCas9 protein to target the specific recognition sequence of sg alpha (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers HBB-peg beta-F (SEQ ID NO: 123) and HBB-peg beta-R (SEQ ID NO: 124) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give the pUC19-U6-HBB-peg beta plasmid, which is capable of transcribing HBB-peg beta (SEQ ID NO: 125), directing the PE-spCas9 protein to target the specific recognition sequence of sg beta (SEQ ID NO: 19) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, Inc. and ligated into the EcoRV digested pGH plasmid vector with T4 ligase to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the specific recognition sequences of sg alpha and sg beta (SEQ ID NO: 15 and SEQ ID NO: 19, respectively) flank the reporter gene and are arranged in opposite orientations in a PAM-out configuration. In the donor vector of the HDR system, a homology arm of about 800 bp flanks each side of the reporter gene (SEQ ID NO: 126 and SEQ ID NO: 127, respectively).
(7) IL2RG site
Primers sgIL2RG-F (SEQ ID NO: 130) and sgIL2RG-R (SEQ ID NO: 131) were annealed and ligated with T4 ligase onto the pUC19-U6-sgRNA (saCas 9) plasmid digested with BsaI to give pUC19-U6-sgIL2RG plasmid capable of transcribing sgIL2RG (SEQ ID NO: 132) directing the saCas9 protein to the IL2RG site.
Primers sg alpha-F (SEQ ID NO: 12) and sg alpha-R (SEQ ID NO: 13) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI to give pUC19-U6-sg alpha plasmid capable of transcribing sg alpha (SEQ ID NO: 14) directing the spCas9 protein to target the specific recognition sequence of sg alpha on the donor vector (SEQ ID NO: 15).
Primers sg beta-F (SEQ ID NO: 16) and sg beta-R (SEQ ID NO: 17) were annealed and ligated via T4 ligase to the pUC19-U6-gRNA (spCas 9) plasmid vector digested with BsaI, resulting in the pUC19-U6-sg beta plasmid capable of transcribing sg beta (SEQ ID NO: 18), directing the spCas9 protein to target the specific recognition sequence of sg beta on the donor vector (SEQ ID NO: 19).
The primers IL2RG-pegα-F (SEQ ID NO: 133) and IL2RG-pegα-R (SEQ ID NO: 134) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-sgα plasmid vector by In-fusion cloning technology to give the pUC19-U6-IL2RG-pegα plasmid, which is capable of transcribing IL2RG-pegα (SEQ ID NO: 135), directing the PE-spCas9 protein to target the specific recognition sequence of sgα (SEQ ID NO: 15) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The primers IL2RG-peg beta-F (SEQ ID NO: 136) and IL2RG-peg beta-R (SEQ ID NO: 137) were subjected to overlap extension PCR, and the resulting fragments were recovered and ligated to the HindIII digested pUC19-U6-sg beta plasmid vector by In-fusion cloning technology to give the pUC19-U6-IL2RG-peg beta plasmid, which is capable of transcribing IL2RG-peg beta (SEQ ID NO: 138), directing the PE-spCas9 protein to target the specific recognition sequence of sg beta (SEQ ID NO: 19) on the donor vector, and reverse transcribing the homologous flap structure at the cut.
The reporter gene CAG-EGFP (SEQ ID NO: 45) was synthesized by JieRui, Inc. and ligated into the EcoRV digested pGH plasmid vector with T4 ligase to serve as the donor vector. The NHEJ and f-PAINT systems use the same donor vector, in which the specific recognition sequences of sg alpha and sg beta (SEQ ID NO: 15 and SEQ ID NO: 19, respectively) flank the reporter gene and are arranged in opposite orientations in a PAM-out configuration. In the donor vector of the HDR system, a homology arm of about 800 bp flanks each side of the reporter gene (SEQ ID NO: 139 and SEQ ID NO: 140, respectively).
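For the HDR donor vectors described above, the layout of "left homology arm, reporter, right homology arm" can be sketched as follows. This is a minimal illustration that assumes the arms are simply the genomic sequence immediately upstream and downstream of the cut position; the function names, the 0-based cut index, and the toy sequences are hypothetical, and the actual arms are those given in the sequence listing.

```python
def homology_arms(genome: str, cut_index: int, arm_len: int = 800) -> tuple[str, str]:
    """Return (left_arm, right_arm) flanking a cut between cut_index-1 and cut_index."""
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if cut_index < arm_len or len(genome) - cut_index < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[cut_index - arm_len:cut_index]
    right_arm = genome[cut_index:cut_index + arm_len]
    return left_arm, right_arm


def build_hdr_donor(left_arm: str, insert: str, right_arm: str) -> str:
    """Concatenate the donor cassette: left arm, exogenous gene, right arm."""
    return left_arm + insert + right_arm


if __name__ == "__main__":
    # Toy genome and placeholder insert, for illustration only.
    toy_genome = "A" * 1000 + "C" * 1000
    left, right = homology_arms(toy_genome, cut_index=1000)
    donor = build_hdr_donor(left, "GGGG", right)
    print(len(left), len(right), len(donor))
```

The f-PAINT donor differs from this layout in that the reporter is flanked by the sg alpha and sg beta recognition sequences rather than by homology arms.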
In the implementation of the HDR system, pCAG-saCas9, the plasmid expressing the sgRNA targeting each site, and the HDR donor vector were delivered into K562 cells by electrotransfection with the SE Cell Line 4D-Nucleofector X Kit from Lonza. In the implementation of the f-PAINT system, pCAG-saCas9, the plasmid expressing the sgRNA targeting each site, pCAG-PE-spCas9-mCherry, the pUC19-U6-pegα and pUC19-U6-pegβ plasmids corresponding to each site, and the f-PAINT donor vector were delivered into K562 cells by electrotransfection with the same kit. In the control group, the sgRNA plasmid targeting the genomic site was omitted. Cells were sorted on a flow cytometer 48 hours after transfection, and the sorted cells were cultured for 14 days and analyzed for the proportion of EGFP-positive cells on the flow cytometer. Genomic DNA was extracted from the cells edited by the f-PAINT system and amplified with the following primer pairs, and the junctions were sequenced: AAVS1-P1 (SEQ ID NO: 48)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction of the AAVS1 site); CAG-EGFP-P3 (SEQ ID NO: 50)/AAVS1-P4 (SEQ ID NO: 51) (3' junction of the AAVS1 site); CCR5-P1 (SEQ ID NO: 76)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction of the CCR5 site); CAG-EGFP-P3 (SEQ ID NO: 50)/CCR5-P4 (SEQ ID NO: 77) (3' junction of the CCR5 site); TRAC-P1 (SEQ ID NO: 89)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction of the TRAC site); CAG-EGFP-P3 (SEQ ID NO: 50)/TRAC-P4 (SEQ ID NO: 90) (3' junction of the TRAC site); WAS-1-P1/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction of the WAS-1 site); CAG-EGFP-P3 (SEQ ID NO: 50)/WAS-1-P4 (3' junction of the WAS-1 site); WAS-3-P1 (SEQ ID NO: 115)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction of the WAS-3 site); CAG-EGFP-P3 (SEQ ID NO: 50)/WAS-3-P4 (SEQ ID NO: 116) (3' junction of the WAS-3 site); HBB-P1 (SEQ ID NO: 128)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction of the HBB site); CAG-EGFP-P3 (SEQ ID NO: 50)/HBB-P4 (SEQ ID NO: 129) (3' junction of the HBB site); IL2RG-P1 (SEQ ID NO: 141)/CAG-EGFP-P2 (SEQ ID NO: 49) (5' junction of the IL2RG site); and CAG-EGFP-P3 (SEQ ID NO: 50)/IL2RG-P4 (SEQ ID NO: 142) (3' junction of the IL2RG site).
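The junction PCR design above follows one pattern per site: a genome-side primer (P1 or P4) is paired with a donor-side primer (CAG-EGFP-P2 or CAG-EGFP-P3). The Python sketch below tabulates that pattern using the SEQ ID NOs stated in the text. The dictionary and function names are illustrative; the WAS-1 site is left out because its primer identifiers are not legible in the source passage, and the TRAC primer names are read as TRAC-P1 and TRAC-P4, which is an assumption.

```python
# Genome-side primers per site: (SEQ ID NO of P1, SEQ ID NO of P4).
SITE_PRIMER_SEQ_IDS = {
    "AAVS1": (48, 51),
    "CCR5": (76, 77),
    "TRAC": (89, 90),
    "WAS-3": (115, 116),
    "HBB": (128, 129),
    "IL2RG": (141, 142),
}

# Donor-side primers shared by every site.
DONOR_P2 = ("CAG-EGFP-P2", 49)
DONOR_P3 = ("CAG-EGFP-P3", 50)


def junction_primer_pairs(site: str) -> dict[str, tuple[tuple[str, int], tuple[str, int]]]:
    """Return the primer pairs (name, SEQ ID NO) for the 5' and 3' junctions of a site."""
    p1_id, p4_id = SITE_PRIMER_SEQ_IDS[site]
    return {
        "5' junction": ((f"{site}-P1", p1_id), DONOR_P2),
        "3' junction": (DONOR_P3, (f"{site}-P4", p4_id)),
    }


if __name__ == "__main__":
    for site in SITE_PRIMER_SEQ_IDS:
        for junction, (fwd, rev) in junction_primer_pairs(site).items():
            print(f"{site} {junction}: {fwd[0]} (SEQ ID NO: {fwd[1]}) / "
                  f"{rev[0]} (SEQ ID NO: {rev[1]})")
```

Because each pair spans the boundary between genomic and donor sequence, a product is expected only when the donor has integrated at the targeted site.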
The efficiencies with which the different methods mediate integration of the exogenous gene CAG-EGFP at the different genomic sites are shown in FIG. 4. The results show that, at the different sites, the f-PAINT method mediates site-directed integration of exogenous genes in K562 cells with lower accuracy than HDR, but with a significantly higher site-directed integration efficiency than the HDR method. The results of genotyping and Sanger sequencing are shown in FIGS. 5 and 6 and show that f-PAINT can accurately mediate the integration of an exogenous gene at a specific genomic site. Taken together, these results show that the f-PAINT method has great potential for use in gene therapy.
Example 4 Site-directed insertion of an exogenous gene (IRES-EGFP) into the 3' UTR region of the human GAPDH gene using the h-PAINT system
To test the efficacy of h-PAINT-mediated site-directed integration of exogenous genes, the following experiment was designed: the reporter gene IRES-EGFP was knocked into the 3' UTR region of the human GAPDH gene in 293T cells using the h-PAINT system, with the f-PAINT method as a control, and the integration efficiencies of the two methods were compared. The construction of the sgRNA and pegRNA expression vectors and of the donor vectors is described in Example 1. The h-PAINT (LHA) donor vector carries an 800 bp GAPDH left homology arm upstream of the exogenous gene IRES-EGFP and the PE-spCas9/sg beta target recognition sequence downstream; the h-PAINT (RHA) donor vector carries the PE-spCas9/sg alpha target recognition sequence upstream of the exogenous gene IRES-EGFP and an 800 bp GAPDH right homology arm downstream. The primer sequences used for vector construction and the homology arm sequences of the donor vectors are the same as in Example 1 and are listed in Table 1.
A schematic of h-PAINT-mediated site-directed integration of exogenous genes into the genome is shown in FIG. 7. A schematic of the use of the h-PAINT method to mediate integration of the exogenous gene IRES-EGFP at the 3' UTR site of the GAPDH gene is shown in FIG. 8a. In the implementation of the h-PAINT (LHA) system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry, and pUC19-U6-peg β, together with the h-PAINT (LHA) donor vector, were transfected into 293T cells with Lipofectamine 3000 liposome transfection reagent from Invitrogen. In the implementation of the h-PAINT (RHA) system, pCAG-saCas9, sgGAPDH, pCAG-PE-spCas9-mCherry, and pUC19-U6-peg alpha, together with the h-PAINT (RHA) donor vector, were transfected into 293T cells with the same reagent. The f-PAINT system was implemented as described above. In the negative control groups of the f-PAINT and h-PAINT systems, the sgGAPDH plasmid was not transfected. The mCherry-positive cells were sorted on a flow cytometer 24 hours after transfection, and the proportion of EGFP-positive cells was analyzed on the flow cytometer after the sorted cells had been cultured for 5 days. Genomic DNA was extracted from cells edited by the h-PAINT (LHA) and h-PAINT (RHA) systems, and PCR was performed on the edited products using primers GAPDH-P1-2 (SEQ ID NO: 143)/GAPDH-P2 (SEQ ID NO: 30) (5' junction of h-PAINT (LHA)), GAPDH-P3 (SEQ ID NO: 31)/GAPDH-P4 (SEQ ID NO: 32) (3' junction of h-PAINT (LHA)), GAPDH-P1 (SEQ ID NO: 29)/GAPDH-P2 (SEQ ID NO: 30) (5' junction of h-PAINT (RHA)), and GAPDH-P3 (SEQ ID NO: 31)/GAPDH-P4-2 (SEQ ID NO: 144) (3' junction of h-PAINT (RHA)); the amplified products were analyzed by Sanger sequencing.
The efficiencies with which the different approaches mediate integration of the exogenous gene IRES-EGFP at the genomic GAPDH site are shown in FIG. 8b. The results show that the h-PAINT (LHA) and h-PAINT (RHA) methods are more efficient than the f-PAINT method. Genotyping and sequencing results are shown in FIGS. 8c and 8d. Sequencing analysis shows that, in the h-PAINT system, base mutations are less likely to be introduced at the junction on the long-homology-arm side, so the accuracy there is higher.
Although specific embodiments of the invention have been described in detail, those skilled in the art will appreciate that many modifications and variations of the details may be made to adapt to a particular situation, and that such modifications and variations are within the scope of the invention. The full scope of the invention is given by the appended claims together with any equivalents thereof.
Claims (64)
- A system or kit comprising the following four components: (1) a first Cas protein or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, wherein the first Cas protein is capable of cleaving or breaking one nucleic acid strand of a first double-stranded target nucleic acid; (2) a template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding said first DNA polymerase; (3) a first gRNA or a nucleic acid molecule C1 that contains a nucleotide sequence encoding the first gRNA, wherein the first gRNA is capable of binding to the first Cas protein and forming a first functional complex; the first functional complex is capable of fragmenting one nucleic acid strand of a first double-stranded target nucleic acid; (4) a first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding said first tag primer, wherein said first tag primer comprises a first tag sequence and a first target binding sequence, said first tag sequence being located upstream or 5' of said first target binding sequence; and, under conditions that allow hybridization or annealing of nucleic acids, the first target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand to form a double-stranded structure, and the first tag sequence does not bind to the nucleic acid strand and remains in a free single-stranded state.
- The system or kit of claim 1, wherein the first Cas protein is selected from Cas proteins that cleave a single strand of DNA, e.g., the DNA strand that is not targeted for binding by the gRNA; preferably, the first Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, Csx1 protein, and variants of these proteins; preferably, the first Cas protein is capable of cleaving one nucleic acid strand of a first double-stranded target nucleic acid and creating a nick; preferably, the first Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (spCas9 (H840A)); preferably, the first Cas protein has the amino acid sequence set forth in SEQ ID NO: 3.
- The system or kit of claim 1 or 2, wherein the first DNA polymerase is selected from the group consisting of a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase; preferably, the first DNA polymerase is an RNA-dependent DNA polymerase; preferably, the first DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV); preferably, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO: 7.
- The system or kit of any one of claims 1-3, wherein the first Cas protein is linked to the first DNA polymerase; preferably, the first Cas protein is covalently linked to the first DNA polymerase with or without a linker; preferably, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO: 35; preferably, the first Cas protein is fused to the first DNA polymerase with or without a peptide linker, forming a first fusion protein; preferably, the first Cas protein is linked or fused to the N-terminus of the first DNA polymerase, optionally through a linker; alternatively, the first Cas protein is linked or fused to the C-terminus of the first DNA polymerase, optionally through a linker; preferably, the first fusion protein has the amino acid sequence shown in SEQ ID NO: 8.
- The system or kit of any one of claims 1-4, wherein the first gRNA contains a first guide sequence and the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a first double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acids;Preferably, the first guide sequence has a length of at least 5nt, such as 5-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or more;preferably, the first gRNA further contains a first scaffold sequence that is capable of being recognized and bound by the first Cas protein, thereby forming a first functional complex;preferably, the first scaffold sequence has a length of at least 20nt, such as 20-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer;preferably, the first guide sequence is located upstream or 5' to the first scaffold sequence;preferably, the first functional complex is capable of cleaving one nucleic acid strand (the first strand) of the first double-stranded target nucleic acid after the first guide sequence binds to the other nucleic acid strand (the second strand) of the first double-stranded target nucleic acid.
- The system or kit of any one of claims 1-5, wherein the first target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand under conditions permitting hybridization or annealing of the nucleic acid, and the 3' end is formed as a result of fragmentation of the nucleic acid strand by the first functional complex; preferably, the first target binding sequence is at least 5nt, such as 5-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or longer in length; preferably, the first tag sequence has a length of at least 4nt, such as 4-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or longer; preferably, after hybridization or annealing of the first target binding sequence to the 3' end of the fragmented nucleic acid strand, the first DNA polymerase is capable of extending the 3' end of the nucleic acid strand with the first tag primer as a template; preferably, the extension forms a first flap; preferably, the first tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid; preferably, the first tag primer is a single-stranded ribonucleic acid and the first DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the first tag primer is a single-stranded deoxyribonucleic acid and the first DNA polymerase is a DNA-dependent DNA polymerase; preferably, the nucleic acid strand bound by the first guide sequence is different from the nucleic acid strand bound by the first target binding sequence; preferably, the nucleic acid strand bound by the first guide sequence is the opposite strand of the nucleic acid strand bound by the first target binding sequence.
- The system or kit of any one of claims 1-6, wherein the first tag primer is linked to the first gRNA;preferably, the first tag primer is covalently linked to the first gRNA with or without a linker;preferably, the first tag primer is attached to the 3' end of the first gRNA, optionally through a linker;preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker);preferably, the first tag primer is a single stranded ribonucleic acid and is linked to the 3' end of the first gRNA with or without a ribonucleic acid linker to form a first PegRNA.
- The system or kit of any one of claims 1-7 having one or more of the following features selected from:(1) The nucleic acid molecule A1 is capable of expressing the first Cas protein in a cell;(2) Said nucleic acid molecule B1 is capable of expressing said first DNA polymerase in a cell;(3) Said nucleic acid molecule C1 is capable of transcribing said first gRNA in a cell;(4) Said nucleic acid molecule D1 is capable of transcribing said first tag primer in a cell;preferably, the nucleic acid molecule A1 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A1 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the first Cas protein;Preferably, the nucleic acid molecule B1 is comprised in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first DNA polymerase;preferably, the nucleic acid molecule C1 is comprised in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first gRNA;preferably, the nucleic acid molecule D1 is comprised in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D1 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the first tag primer;preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are comprised in the same or different expression vectors (e.g. 
eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase or of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase in a cell;preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g. eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell;Preferably, two, three or four of the nucleic acid molecules A1, B1, C1 and D1 are comprised in the same expression vector (e.g. eukaryotic expression vector).
- The system or kit of any one of claims 1-8, wherein the system or kit comprises: (M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or, (M1-2) the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase; and (M2) a first PegRNA comprising the first gRNA and the first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA.
- The system or kit of any one of claims 1-9, wherein the system or kit further comprises:(5) A second gRNA or a nucleic acid molecule C2 that contains a nucleotide sequence encoding the second gRNA, wherein the second gRNA is capable of binding to a second Cas protein and forming a second functional complex; the second functional complex is capable of fragmenting one nucleic acid strand of a second double-stranded target nucleic acid;Preferably, the second Cas protein is the same as or different from the first Cas protein; preferably, the second Cas protein is the same as the first Cas protein;preferably, the second gRNA contains a second guide sequence and the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a second double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acids;preferably, the second functional complex breaks one strand (the first strand) of the second double-stranded target nucleic acid after the second guide sequence binds to the other strand (the second strand) of the second double-stranded target nucleic acid; preferably, the second guide sequence is different from the first guide sequence;preferably, the second double stranded target nucleic acid is the same as or different from the first double stranded target nucleic acid;preferably, the second double stranded target nucleic acid is identical to the first double stranded target nucleic acid and the second functional complex breaks a different nucleic acid strand of the double stranded target nucleic acid at a different location than the first functional complex;preferably, the second functional complex breaks a different nucleic acid strand of the same double-stranded target nucleic acid than the first functional complex, and the nucleic acid strand bound by the first guide sequence is different from the nucleic acid strand bound by the second guide sequence; preferably, the first guide 
sequence-bound nucleic acid strand is the opposite strand of the second guide sequence-bound nucleic acid strand;Preferably, the second double-stranded target nucleic acid is the same double-stranded target nucleic acid as the first double-stranded target nucleic acid, the double-stranded target nucleic acid comprising a first strand and a second strand, the first functional complex being capable of cleaving the first strand after the first guide sequence is bound to the second strand, the second functional complex being capable of cleaving the second strand after the second guide sequence is bound to the first strand; preferably, the length of the second guide sequence is at least 5nt, such as 5-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer;preferably, the second gRNA further contains a second scaffold sequence that is capable of being recognized and bound by the second Cas protein, thereby forming a second functional complex;preferably, the second scaffold sequence has a length of at least 20nt, such as 20-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer;preferably, the second scaffold sequence is the same as or different from the first scaffold sequence; preferably, the second scaffold sequence is identical to the first scaffold sequence;preferably, the second guide sequence is located upstream or 5' to the second scaffold sequence;Preferably, the nucleic acid molecule C2 is capable of transcribing the second gRNA in a cell;preferably, the nucleic acid molecule C2 is comprised in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule C2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second gRNA.
- The system or kit of claim 10, wherein the second Cas protein is different from the first Cas protein; and, the system or kit further comprises: (6) the second Cas protein or a nucleic acid molecule A2 comprising a nucleotide sequence encoding the second Cas protein, wherein the second Cas protein is capable of cleaving one nucleic acid strand of a second double-stranded target nucleic acid; preferably, the second Cas protein is capable of cleaving one nucleic acid strand of the second double-stranded target nucleic acid and making a nick; preferably, the second Cas protein is selected from Cas proteins that cleave single strands of DNA, e.g., Cas proteins that cleave the DNA single strand not targeted by the gRNA; preferably, the second Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, and variants thereof; preferably, the second Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (SpCas9 (H840A)); preferably, the second Cas protein has the amino acid sequence set forth in SEQ ID NO. 3; preferably, the nucleic acid molecule A2 is capable of expressing the second Cas protein in a cell; preferably, the nucleic acid molecule A2 is contained in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule A2 is an expression vector (e.g., a eukaryotic expression vector) containing a nucleotide sequence encoding the second Cas protein.
- The system or kit of any one of claims 1-11, wherein the system or kit further comprises: (7) a second tag primer or a nucleic acid molecule D2 comprising a nucleotide sequence encoding said second tag primer, wherein said second tag primer comprises a second tag sequence and a second target binding sequence, said second tag sequence being located upstream or 5' of said second target binding sequence; and, under conditions permitting hybridization or annealing of the nucleic acid, the second target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand to form a double-stranded structure, and the second tag sequence is not bound to the nucleic acid strand and remains in a free single-stranded state; preferably, the second target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand under conditions allowing hybridization or annealing of the nucleic acid, and the 3' end is formed as a result of fragmentation of the nucleic acid strand by the second functional complex; preferably, the second target binding sequence is at least 5nt, such as 5-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or longer in length; preferably, the second target binding sequence is different from the first target binding sequence; preferably, the nucleic acid strand bound by the second target binding sequence is different from the nucleic acid strand bound by the first target binding sequence; preferably, the nucleic acid strand bound by the second target binding sequence is the opposite strand of the nucleic acid strand bound by the first target binding sequence; preferably, the second tag sequence has a length of at least 4nt, such as 4-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or longer; preferably, the second tag sequence is the same as or different from the first tag sequence; preferably, the second tag sequence is different from the first tag sequence; preferably, after hybridization or annealing of the second target binding sequence to the 3' end of the fragmented nucleic acid strand, the second DNA polymerase is capable of extending the 3' end of the nucleic acid strand with the second tag primer as a template; preferably, the extension forms a second flap; preferably, the second DNA polymerase is the same as or different from the first DNA polymerase; preferably, the second DNA polymerase is the same as the first DNA polymerase; preferably, the second tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid; preferably, the second tag primer is a single-stranded ribonucleic acid and the second DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the second tag primer is a single-stranded deoxyribonucleic acid and the second DNA polymerase is a DNA-dependent DNA polymerase; preferably, the nucleic acid strand bound by the second guide sequence is different from the nucleic acid strand bound by the second target binding sequence; preferably, the nucleic acid strand bound by the second guide sequence is the opposite strand of the nucleic acid strand bound by the second target binding sequence; preferably, the second guide sequence binds to the same nucleic acid strand as the first target binding sequence, and the binding site of the second guide sequence is located upstream or 5' to the binding site of the first target binding sequence; preferably, the first guide sequence binds to the same nucleic acid strand as the second target binding sequence, and the binding site of the first guide sequence is located upstream or 5' to the binding site of the second target binding sequence; preferably, the first and second flaps are comprised on the same double-stranded target nucleic acid and are located on opposite nucleic acid strands from each other; preferably, the nucleic acid molecule D2 is capable of transcribing the second tag primer in a cell; preferably, the nucleic acid molecule D2 is comprised in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule D2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second tag primer.
- The system or kit of claim 12, wherein the second DNA polymerase is different from the first DNA polymerase; and, the system or kit further comprises: (8) the second DNA polymerase or a nucleic acid molecule B2 comprising a nucleotide sequence encoding the second DNA polymerase; preferably, the second DNA polymerase is selected from the group consisting of DNA-dependent DNA polymerases and RNA-dependent DNA polymerases; preferably, the second DNA polymerase is an RNA-dependent DNA polymerase; preferably, the second DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV); preferably, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7; preferably, the nucleic acid molecule B2 is capable of expressing the second DNA polymerase in a cell; preferably, the nucleic acid molecule B2 is comprised in an expression vector (e.g., a eukaryotic expression vector), or the nucleic acid molecule B2 is an expression vector (e.g., a eukaryotic expression vector) comprising a nucleotide sequence encoding the second DNA polymerase.
- The system or kit of claim 12 or 13, wherein the second tag primer is linked to the second gRNA;preferably, the second tag primer is covalently linked to the second gRNA with or without a linker;Preferably, the second tag primer is attached to the 3' end of the second gRNA, optionally through a linker;preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker);preferably, the second tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the second gRNA, either with or without a ribonucleic acid linker, forming a second PegRNA;preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are comprised in the same expression vector (e.g. eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell;preferably, the system or kit comprises: a second PegRNA comprising the second gRNA and the second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.
- The system or kit of claim 13 or 14, wherein the second Cas protein is separate from or linked to the second DNA polymerase; preferably, the second Cas protein is covalently linked to the second DNA polymerase with or without a linker; preferably, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35; preferably, the second Cas protein is fused to the second DNA polymerase with or without a peptide linker, forming a second fusion protein; preferably, the second Cas protein is linked or fused to the N-terminus of the second DNA polymerase, optionally through a linker; alternatively, the second Cas protein is linked or fused to the C-terminus of the second DNA polymerase, optionally through a linker; preferably, the second fusion protein has the amino acid sequence shown in SEQ ID NO. 8; preferably, the nucleic acid molecule A2 and the nucleic acid molecule B2 are comprised in the same or different expression vectors (e.g., eukaryotic expression vectors); preferably, the nucleic acid molecule A2 and the nucleic acid molecule B2 are capable of expressing, in a cell, the isolated second Cas protein and the second DNA polymerase, or a second fusion protein containing the second Cas protein and the second DNA polymerase; preferably, the system or kit comprises: a second fusion protein comprising the second Cas protein and the second DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the second fusion protein; alternatively, the isolated second Cas protein and second DNA polymerase, or a nucleic acid molecule capable of expressing the isolated second Cas protein and second DNA polymerase; preferably, the first and second Cas proteins are the same Cas protein, and the first and second DNA polymerases are the same DNA polymerase; and, the system or kit comprises: (M1-1) a first fusion protein comprising the first Cas protein and the first DNA polymerase, or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein; or, (M1-2) the isolated first Cas protein and first DNA polymerase, or a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase; (M2) a first PegRNA comprising the first gRNA and a first tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA; and (M3) a second PegRNA comprising the second gRNA and a second tag primer, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA.
- The system or kit of any one of claims 1-15, wherein the system or kit further comprises a nucleic acid vector (e.g., a donor nucleic acid vector);preferably, the nucleic acid vector further comprises a first PAM sequence recognized by the first Cas protein, and/or a second PAM sequence recognized by the second Cas protein;Preferably, the nucleic acid vector is double-stranded;preferably, the nucleic acid vector is a circular double stranded vector;preferably, the nucleic acid vector comprises a first guide binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., a complement of the first guide sequence), and/or a second guide binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., a complement of the second guide sequence); optionally, the nucleic acid vector further comprises a restriction enzyme site between the first and second guide binding sequences;preferably, the first and second guide binding sequences are located on opposite strands of the nucleic acid vector;preferably, the first functional complex is capable of cleaving one nucleic acid strand (first strand) of the nucleic acid vector by the first guide binding sequence and the first PAM sequence; and/or, the second functional complex is capable of cleaving another nucleic acid strand (second strand) of the nucleic acid vector through the second guide binding sequence and the second PAM sequence.
- The system or kit of claim 16, wherein the nucleic acid vector further comprises a nucleic acid sequence of interest; preferably, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site of the genome; preferably, the first PAM sequence and the second PAM sequence are located on the two sides of the nucleic acid sequence of interest, respectively; preferably, the first guide binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence; preferably, the second guide binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence; preferably, the first functional complex and the second functional complex cleave a first strand and a second strand of the nucleic acid vector, respectively, so that the first strand and the second strand each comprise a nick resulting from the cleavage, and the double-stranded portion located between the 3' ends of the two nicks comprises the nucleic acid sequence of interest and is referred to as the target nucleic acid fragment comprising the nucleic acid sequence of interest; preferably, under conditions that allow hybridization or annealing of the nucleic acid, the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex, forming a double-stranded structure, and the first tag sequence of the first tag primer is in a free state; preferably, the nucleic acid strand hybridized or annealed by the first target binding sequence is the opposite strand of the nucleic acid strand comprising the first guide binding sequence; preferably, under conditions that allow hybridization or annealing of the nucleic acid, the second tag primer is capable of hybridizing or annealing, via the second target binding sequence, to the 3' end of the nucleic acid strand cleaved by the second functional complex, forming a double-stranded structure, and the second tag sequence of the second tag primer is in a free state; preferably, the nucleic acid strand hybridized or annealed by the second target binding sequence is the opposite strand of the nucleic acid strand comprising the second guide binding sequence; preferably, the nucleic acid strand hybridized or annealed by the first target binding sequence is the opposite strand of the nucleic acid strand hybridized or annealed by the second target binding sequence.
- The system or kit of claim 16 or 17, wherein the nucleic acid vector further comprises a first target sequence; wherein the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the first tag sequence of the first tag primer is in a free state; preferably, the first target sequence is located on the opposite strand of the first guide binding sequence; preferably, the first target sequence is located at the end of the cleaved first strand; preferably, after cleavage of the first strand by the first functional complex, the 3' end of the nucleic acid strand comprising the first target sequence is capable of extension (preferably, forming a first flap) with the first tag primer annealed to the first target sequence as a template; and/or, the nucleic acid vector further comprises a second target sequence; wherein the second tag primer is capable of hybridizing or annealing to the second target sequence through the second target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the second tag sequence of the second tag primer is in a free state; preferably, the second target sequence is located on the opposite strand of the second guide binding sequence; preferably, the second target sequence is located at the end of the cleaved second strand; preferably, after cleavage of the second strand by the second functional complex, the 3' end of the nucleic acid strand comprising the second target sequence is capable of extension (preferably, forming a second flap) with the second tag primer annealed to the second target sequence as a template; preferably, the nucleic acid strand comprising the first target sequence is the opposite strand of the nucleic acid strand comprising the second target sequence; preferably, the nucleic acid vector further comprises a restriction site between the first target sequence and the second target sequence; preferably, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.
- The system or kit of any one of claims 1-18, wherein the system or kit further comprises: (9) a third nucleic acid editing system for making a double-strand break in a third double-stranded target nucleic acid; preferably, the third nucleic acid editing system is based on a site-specific nuclease technology, e.g., a ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease) or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
- The system or kit of claim 19, wherein the third nucleic acid editing system is capable of fragmenting both strands of a third double-stranded target nucleic acid to form fragmented nucleotide fragments a1 and a2; preferably, said first tag sequence or its complement or said first flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 under conditions allowing hybridization or annealing of nucleic acids; preferably, the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 at the end formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system; preferably, the complement of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, and the 3' end or 3' portion is formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system; preferably, said second tag sequence or its complement or said second flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a2 under conditions allowing hybridization or annealing of nucleic acids; preferably, the second tag sequence or its complement or the second flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a2 at the end formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system; preferably, the complement of the second tag sequence or the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a2, and the 3' end or 3' portion is formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system.
- The system or kit of claim 19 or 20, wherein the third nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system;preferably, the third nucleic acid editing system comprises: (i) A third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA; wherein the third gRNA is capable of binding to a third Cas protein and forming a third functional complex; the third functional complex is capable of cleaving both strands of a third double-stranded target nucleic acid to form cleaved nucleotide fragments a1 and a2;Preferably, the third Cas protein is selected from Cas proteins that cleave DNA double strands, such as Cas9 proteins;preferably, the third gRNA has a sequence as set forth in any one of SEQ ID NOs 11, 38, 54, 67, 80, 93, 106, 119 or 132.
- The system or kit of any one of claims 1-20, wherein the system or kit further comprises: (10) a fourth nucleic acid editing system for making a double-strand break in a fourth double-stranded target nucleic acid; preferably, the fourth nucleic acid editing system is based on a site-specific nuclease technology, e.g., a ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease) or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system; preferably, the third nucleic acid editing system and the fourth nucleic acid editing system are selected from the same site-specific nuclease technology.
- The system or kit of claim 22, wherein the fourth double-stranded target nucleic acid is identical to the third double-stranded target nucleic acid, and the third and fourth nucleic acid editing systems cleave the same double-stranded target nucleic acid at different locations, forming cleaved nucleotide fragments a1, a2 and a3; wherein, prior to cleavage, in the same double-stranded target nucleic acid, the nucleotide fragments a1, a2 and a3 are arranged in sequence (i.e., the nucleotide fragment a1 is connected to the nucleotide fragment a3 via the nucleotide fragment a2); preferably, the third and fourth nucleic acid editing systems result in the separation of nucleotide fragments a1 and a2 and the separation of nucleotide fragments a2 and a3, respectively; preferably, said first tag sequence or its complement or said first flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 under conditions allowing hybridization or annealing of nucleic acids; preferably, the first tag sequence or its complement or the first flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a1 at the end formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system; preferably, the complement of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, and the 3' end or 3' portion is formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system; preferably, said second tag sequence or its complement or said second flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a3 under conditions allowing hybridization or annealing of nucleic acids; preferably, the second tag sequence or its complement or the second flap is capable of hybridizing or annealing to the fragmented nucleotide fragment a3 at the end formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system; preferably, the complement of the second tag sequence or the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a3, and the 3' end or 3' portion is formed by the fragmentation of the third double-stranded target nucleic acid by the third nucleic acid editing system.
- The system or kit of claim 22, wherein the fourth nucleic acid editing system is a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system;preferably, the fourth nucleic acid editing system comprises: (i) A fourth Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the fourth Cas protein, and (ii) a fourth gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the fourth gRNA; wherein the fourth gRNA is capable of binding to a fourth Cas protein and forming a fourth functional complex; the fourth functional complex is capable of fragmenting both strands of a fourth double-stranded target nucleic acid to form fragmented target nucleic acid fragments b1 and b2;preferably, the fourth Cas protein is selected from Cas proteins that cleave DNA double strands, such as Cas9 proteins.
- The system or kit of claim 24, wherein the third nucleic acid editing system and fourth nucleic acid editing system are CRISPR (clustered regularly interspaced short palindromic repeats)/Cas systems;Preferably, the third nucleic acid editing system is as defined in claim 21 and the fourth nucleic acid editing system is as defined in claim 23.
- The system or kit of any one of claims 1-25, further comprising an additional system or component; preferably, the additional component comprises one or more selected from the group consisting of: (1) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional gRNAs or nucleic acid molecules containing a nucleotide sequence encoding the additional gRNAs, wherein the additional gRNAs are capable of binding to a Cas protein and forming a functional complex; preferably, the functional complex is capable of cleaving one or both strands of a double-stranded target nucleic acid; (2) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional Cas proteins or nucleic acid molecules containing a nucleotide sequence encoding the additional Cas proteins; preferably, the Cas protein is capable of cleaving one or both strands of a double-stranded target nucleic acid; (3) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional tag primers or nucleic acid molecules comprising a nucleotide sequence encoding the additional tag primers, wherein the additional tag primers comprise a tag sequence and a target binding sequence, the tag sequence being located upstream or 5' of the target binding sequence; preferably, the target binding sequence is capable of hybridizing or annealing to the 3' end of the fragmented nucleic acid strand under conditions that allow hybridization or annealing of the nucleic acid, forming a double-stranded structure, and the tag sequence is not bound to the target nucleic acid fragment and remains in a free single-stranded state; (4) one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) additional DNA polymerases or nucleic acid molecules comprising nucleotide sequences encoding the additional DNA polymerases; preferably, the additional DNA polymerase is selected from the group consisting of DNA-dependent DNA polymerases and RNA-dependent DNA polymerases; preferably, the additional DNA polymerase is an RNA-dependent DNA polymerase, such as a reverse transcriptase; preferably, the additional system comprises: one or more (e.g., 2, 3, 4, 5, 10, 15, 20, or more) nucleic acid editing systems for making a double-strand break in a double-stranded target nucleic acid; preferably, the nucleic acid editing system is based on a site-specific nuclease technology, e.g., a ZFN (zinc finger nuclease), TALEN (transcription activator-like effector nuclease) or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system.
- A fusion protein comprising a Cas protein and a template-dependent DNA polymerase, wherein the Cas protein is capable of cleaving one nucleic acid strand of a target nucleic acid; preferably, the Cas protein is capable of cleaving one nucleic acid strand of a target nucleic acid and creating a nick; preferably, the Cas protein is selected from Cas proteins that cleave single strands of DNA; preferably, the Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, Csx16 protein, Csx1 protein, Csx15 protein, and variants (e.g., mutant forms) thereof; preferably, the Cas protein is a mutant of Cas9 protein, for example a mutant of the Cas9 protein of Streptococcus pyogenes (SpCas9 (H840A)); preferably, the Cas protein has the amino acid sequence shown in SEQ ID NO. 3; preferably, the DNA polymerase is selected from the group consisting of DNA-dependent DNA polymerases and RNA-dependent DNA polymerases; preferably, the DNA polymerase is an RNA-dependent DNA polymerase; preferably, the DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian myelocytomatosis virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-associated virus, or myeloblastosis-associated virus (MAV); preferably, the DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7; preferably, the Cas protein is covalently linked to the DNA polymerase with or without a linker; preferably, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35; preferably, the Cas protein is linked or fused to the N-terminus of the DNA polymerase, optionally through a linker; alternatively, the Cas protein is linked or fused to the C-terminus of the DNA polymerase, optionally through a linker; preferably, the fusion protein has the amino acid sequence shown in SEQ ID NO. 8.
- A nucleic acid molecule comprising a polynucleotide encoding the fusion protein of claim 27.
- A vector comprising the nucleic acid molecule of claim 28; preferably, the vector is an expression vector; preferably, the vector is a eukaryotic expression vector.
- A host cell comprising the nucleic acid molecule of claim 28 or the vector of claim 29; preferably, the host cell is a prokaryotic cell, such as an E. coli cell; or the host cell is a eukaryotic cell, e.g., a yeast cell, a fungal cell, a plant cell, or an animal cell; preferably, the host cell is a mammalian cell, such as a human cell.
- A method of making the fusion protein of claim 27, comprising: (1) culturing the host cell of claim 30 under conditions that allow expression of the protein; and (2) isolating the fusion protein expressed by the host cell.
- A complex comprising a first Cas protein and a template-dependent first DNA polymerase, wherein the first Cas protein has the ability to cleave one nucleic acid strand of a double-stranded target nucleic acid and the first Cas protein is complexed with the first DNA polymerase by covalent or non-covalent means; preferably, the first Cas protein is capable of cleaving one nucleic acid strand of a double-stranded target nucleic acid and creating a nick; preferably, the first Cas protein is selected from Cas proteins that cleave a DNA single strand; preferably, the first Cas protein is selected from Cas9 protein, Cas12a protein, Cas12b protein, Cas12c protein, Cas12d protein, Cas12e protein, Cas12f protein, Cas12g protein, Cas12h protein, Cas12i protein, Cas14 protein, Cas13a protein, Cas1B protein, Cas2 protein, Cas3 protein, Cas4 protein, Cas5 protein, Cas6 protein, Cas7 protein, Cas8 protein, Cas10 protein, Csy1 protein, Csy2 protein, Csy3 protein, Cse1 protein, Cse2 protein, Csc1 protein, Csc2 protein, Csa5 protein, Csn2 protein, Csm3 protein, Csm4 protein, Csm5 protein, Csm6 protein, Cmr1 protein, Cmr3 protein, Cmr4 protein, Cmr5 protein, Cmr6 protein, Csb1 protein, Csb2 protein, Csb3 protein, Csx17 protein, Csx14 protein, Csx10 protein, csx 9 (variants of Csx, csx 9, csx1 protein, csx 2 protein), a (variants of Csx 9, csx1, csx 2 protein); preferably, the first Cas protein is a mutant of Cas9 protein, e.g., a mutant of the Cas9 protein of Streptococcus pyogenes (spCas9 (H840A)); preferably, the first Cas protein has the amino acid sequence set forth in SEQ ID NO. 3; preferably, the first DNA polymerase is selected from the group consisting of a DNA-dependent DNA polymerase and an RNA-dependent DNA polymerase; preferably, the first DNA polymerase is an RNA-dependent DNA polymerase; preferably, the first DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukemia virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian granuloma virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-related virus, and myeloblastosis-related virus (MAV); preferably, the first DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7; preferably, the first Cas protein is covalently linked to the first DNA polymerase with or without a linker; preferably, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35; preferably, the first Cas protein is fused to the first DNA polymerase with or without a peptide linker to form a first fusion protein; preferably, the first Cas protein is linked or fused to the N-terminus of the first DNA polymerase, optionally through a linker; alternatively, the first Cas protein is linked or fused to the C-terminus of the first DNA polymerase, optionally through a linker; preferably, the first fusion protein has the amino acid sequence shown in SEQ ID NO. 8.
- The complex of claim 32, wherein the complex further comprises a first gRNA; preferably, the first gRNA is capable of binding to the first Cas protein and forms a first functional unit; the first functional unit is capable of binding to one nucleic acid strand (second strand) of the double-stranded target nucleic acid and cleaving the other nucleic acid strand (first strand) of the double-stranded target nucleic acid; preferably, the first gRNA contains a first guide sequence and the first guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a double-stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acids; preferably, the first guide sequence has a length of at least 5nt, such as 5-10nt, 10-15nt, 15-20nt, 20-25nt, 25-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or more; preferably, the first gRNA further contains a first scaffold sequence that is capable of being recognized and bound by the first Cas protein, thereby forming a first functional unit; preferably, the first scaffold sequence has a length of at least 20nt, such as 20-30nt, 30-40nt, 40-50nt, 50-100nt, 100-200nt, or longer; preferably, the first guide sequence is located upstream or 5' to the first scaffold sequence; preferably, the complex or first functional unit is capable of cleaving one strand of a double-stranded target nucleic acid after the first guide sequence binds to the double-stranded target nucleic acid.
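The gRNA layout recited above (guide sequence 5' of the scaffold sequence, with minimum lengths of 5 nt and 20 nt) can be sketched as follows. This is only an illustration: the function name `build_grna`, the constants, and the example sequences are hypothetical and are not taken from the specification.

```python
# Hypothetical helper; names and example sequences are illustrative assumptions.
MIN_GUIDE_NT = 5      # "at least 5nt" for the guide sequence
MIN_SCAFFOLD_NT = 20  # "at least 20nt" for the scaffold sequence


def build_grna(guide: str, scaffold: str) -> str:
    """Return a gRNA (5'->3') with the guide sequence upstream of the scaffold."""
    guide, scaffold = guide.upper(), scaffold.upper()
    for name, seq in (("guide", guide), ("scaffold", scaffold)):
        # Only the four RNA bases are accepted in this sketch.
        if not seq or set(seq) - set("ACGU"):
            raise ValueError(f"{name} must be a non-empty RNA sequence (A/C/G/U)")
    if len(guide) < MIN_GUIDE_NT:
        raise ValueError(f"guide must be at least {MIN_GUIDE_NT} nt")
    if len(scaffold) < MIN_SCAFFOLD_NT:
        raise ValueError(f"scaffold must be at least {MIN_SCAFFOLD_NT} nt")
    # Guide is placed 5' of (upstream of) the scaffold.
    return guide + scaffold
```

For example, `build_grna("GACUA", "A" * 20)` returns the 25 nt concatenation, while a 4 nt guide is rejected.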
- The complex of claim 32, wherein the complex further comprises a double-stranded target nucleic acid; preferably, the double-stranded target nucleic acid contains a first PAM sequence recognized by the first Cas protein and a first guide binding sequence capable of hybridizing or annealing to the first guide sequence, whereby the first functional unit binds the double-stranded target nucleic acid through the first guide binding sequence and the first PAM sequence.
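A minimal sketch of locating a PAM sequence and the adjacent guide binding sequence in a target strand is given below. It assumes the NGG PAM of Streptococcus pyogenes Cas9 and a 20 nt guide, neither of which is fixed by the claim; the function names are hypothetical.

```python
import re

# DNA complement table used to derive the strand the guide anneals to.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def find_guide_sites(strand: str, guide_len: int = 20):
    """List (start, protospacer, pam) for every NGG PAM on `strand`.

    Assumption: SpCas9-style targeting, where the protospacer lies
    immediately 5' of the PAM on the same (PAM-containing) strand.
    """
    sites = []
    # A lookahead is used so that overlapping PAM candidates are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", strand):
        pam_start = match.start()
        if pam_start >= guide_len:
            start = pam_start - guide_len
            sites.append((start, strand[start:pam_start], match.group(1)))
    return sites


def guide_binding_sequence(protospacer: str) -> str:
    """Sequence on the opposite strand that the guide hybridizes to."""
    return reverse_complement(protospacer)
```

Scanning the reverse complement of the same strand with `find_guide_sites` covers candidate sites on the opposite strand.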
- The complex of claim 34, wherein the complex further comprises a first tag primer that hybridizes or anneals to the double-stranded target nucleic acid; wherein the first tag primer contains a first target binding sequence that is capable of hybridizing or annealing to the double-stranded target nucleic acid;preferably, the tag primer comprises a first tag sequence and a first target binding sequence, the first tag sequence being located upstream or 5' to the first target binding sequence; and, the first target binding sequence is capable of hybridizing or annealing to the double stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acid; preferably, the first target binding sequence is capable of hybridizing or annealing to the 3' end of the nucleic acid strand of the double-stranded target nucleic acid that is cleaved by the first functional unit, forming a double-stranded structure; preferably, the 3' end is formed by cleavage of one nucleic acid strand of the double-stranded target nucleic acid by the first functional unit; preferably, the first tag sequence is not bound to the fragmented nucleic acid strand, in a free single stranded state;preferably, the first target-binding sequence is at least 5nt, such as 5-10nt,10-15nt,15-20nt, 20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer in length;Preferably, the first tag sequence has a length of at least 4nt, such as 4-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer;preferably, the first tag primer binds to the fragmented nucleic acid strand through the first target binding sequence; preferably, the first DNA polymerase is bound to the fragmented nucleic acid strand and the first tag primer;preferably, the first tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid;preferably, the first tag primer is a single-stranded ribonucleic acid and the first DNA polymerase is an RNA-dependent 
DNA polymerase; alternatively, the first tag primer is a single stranded deoxyribonucleic acid and the first DNA polymerase is a DNA-dependent DNA polymerase;preferably, the cleaved nucleic acid strand is extended by the first DNA polymerase using the first tag primer as a template to form a first flap;preferably, the first gRNA-bound nucleic acid strand is different from the first tag primer-bound nucleic acid strand; preferably, the first gRNA-bound nucleic acid strand is the opposite strand of the first tag primer-bound nucleic acid strand.
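The flap formation described above can be sketched in a few lines, assuming an RNA tag primer written 5'->3' as tag sequence followed by target binding sequence, and a reverse transcriptase that copies the free tag sequence onto the 3' end of the nicked strand. The function names and the example sequences are hypothetical.

```python
# Maps each RNA template base to the complementary DNA base.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def dna_reverse_complement_of_rna(rna: str) -> str:
    """DNA strand (5'->3') synthesized when `rna` (5'->3') is used as template."""
    return rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def synthesize_flap(nicked_strand: str, tag: str, target_binding: str) -> str:
    """Return the DNA flap added to the 3' end of `nicked_strand`.

    The target binding sequence must anneal to the 3' end of the nicked
    strand; the tag sequence stays single-stranded and serves as template.
    """
    primer_site = dna_reverse_complement_of_rna(target_binding)
    if not nicked_strand.endswith(primer_site):
        raise ValueError("target binding sequence does not anneal to the 3' end")
    # Extension copies the tag sequence, so the flap is its reverse complement.
    return dna_reverse_complement_of_rna(tag)
```

With `nicked_strand="TTTGATCC"`, `tag="AUGC"` and `target_binding="GGAUC"`, the sketch yields the flap `GCAT`, so the extended strand would read `TTTGATCCGCAT`.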
- The complex of claim 35, wherein the first tag primer is linked to the first gRNA; preferably, the first tag primer is covalently linked to the first gRNA with or without a linker; preferably, the first tag primer is attached to the 3' end of the first gRNA, optionally through a linker; preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker); preferably, the first tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the first gRNA with or without a ribonucleic acid linker to form a first PegRNA.
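The element order implied by this arrangement (gRNA, optional linker, then the tag primer with its tag sequence 5' of the target binding sequence) can be written down as a small data structure. This is a sketch under assumptions: the class name `PegRNA`, its fields, and the example sequences are hypothetical, and only the minimum lengths stated in the claims are checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    """Hypothetical container for the parts of a PegRNA, all written 5'->3'."""

    guide: str            # guide sequence (at least 5 nt)
    scaffold: str         # scaffold sequence bound by the Cas protein (at least 20 nt)
    tag: str              # tag sequence, template for the flap (at least 4 nt)
    target_binding: str   # anneals to the 3' end of the nicked strand (at least 5 nt)
    linker: str = ""      # optional ribonucleic acid linker

    def __post_init__(self) -> None:
        minimums = {"guide": 5, "scaffold": 20, "tag": 4, "target_binding": 5}
        for field_name, minimum in minimums.items():
            if len(getattr(self, field_name)) < minimum:
                raise ValueError(f"{field_name} must be at least {minimum} nt")

    @property
    def tag_primer(self) -> str:
        """Tag primer: tag sequence located 5' of the target binding sequence."""
        return self.tag + self.target_binding

    def sequence(self) -> str:
        """Full molecule: gRNA, optional linker, then tag primer at the 3' end."""
        return self.guide + self.scaffold + self.linker + self.tag_primer
```

Keeping the parts separate rather than storing one string makes it easy to swap the tag sequence while leaving the guide and scaffold unchanged.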
- The complex of claim 35 or 36, wherein the complex further comprises a second Cas protein and a second gRNA, wherein the second Cas protein has the ability to cleave one nucleic acid strand of a double-stranded target nucleic acid, the second gRNA being capable of binding to the second Cas protein and forming a second functional unit; the second functional unit is capable of binding to a double stranded target nucleic acid and cleaving one strand thereof;preferably, the second Cas protein is the same as or different from the first Cas protein; preferably, the second Cas protein is the same as the first Cas protein;Preferably, the second Cas protein is capable of cleaving one nucleic acid strand of a double-stranded target nucleic acid and creating a nick;preferably, the second Cas protein is selected from Cas proteins that cleave single strands of DNA;preferably, the second Cas protein is selected from Cas9 protein, cas12a protein, cas12B protein, cas12c protein, cas12d protein, cas12e protein, cas12f protein, cas12g protein, cas12H protein, cas12i protein, cas14 protein, cas1B protein, cas2 protein, cas3 protein, cas4 protein, cas5 protein, cas6 protein, cas7 protein, cas8 protein, cas10 protein, csy1 protein, csy2 protein, csy3 protein, cse1 protein, cse2 protein, csc1 protein, csc2 protein, csa5 protein, csn2 protein, csm3 protein, csm4 protein, csm5 protein, csm6 protein, cmr1 protein, cmr3 protein, cmr4 protein, cmr5 protein, cmr6 protein, csb1 protein, csb2 protein, B3 protein, csx17 protein, csx14 protein, csx16 protein, csx3 protein, csx 9 (for example, a mutant form of Csx 9, csx 2 protein), csx 2 protein, csx 9 protein;Preferably, the second Cas protein is a mutant of Cas9 protein, e.g., a mutant of Cas9 protein of streptococcus pyogenes (spCas 9 (H840A));preferably, the second Cas protein has the amino acid sequence set forth in SEQ ID No. 
3;preferably, the second gRNA contains a second guide sequence and the second guide sequence is capable of hybridizing or annealing to one nucleic acid strand of a double stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acids;preferably, the second guide sequence is different from the first guide sequence; preferably, the nucleic acid strand to which the first guide sequence binds is different from the nucleic acid strand to which the second guide sequence binds; preferably, the first guide sequence-bound nucleic acid strand is the opposite strand of the second guide sequence-bound nucleic acid strand;preferably, the second functional unit is identical to a double stranded target nucleic acid to which a first functional unit is bound, the double stranded target nucleic acid comprising a first strand and a second strand, the first functional unit being capable of cleaving the first strand after the first guide sequence is bound to the first strand, the second functional unit being capable of cleaving the first strand after the second guide sequence is bound to the first strand; preferably, the second functional unit breaks at a different position relative to the chain than the first functional unit;Preferably, the length of the second guide sequence is at least 5nt, such as 5-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer;preferably, the second gRNA further contains a second scaffold sequence that is capable of being recognized and bound by the second Cas protein, thereby forming a second functional unit;preferably, the second scaffold sequence is the same as or different from the first scaffold sequence; preferably, the second scaffold sequence is identical to the first scaffold sequence;preferably, the second scaffold sequence has a length of at least 20nt, such as 20-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer;preferably, the second guide sequence is located upstream or 
5' to the second scaffold sequence;preferably, the double stranded target nucleic acid contains a second PAM sequence recognized by the second Cas protein and a second guide binding sequence capable of hybridizing or annealing to the second guide sequence, whereby the second functional unit binds the double stranded target nucleic acid through the second guide binding sequence and the second PAM sequence.
- The complex of claim 37, wherein the complex further comprises a template-dependent second DNA polymerase that is complexed with the second Cas protein by covalent or non-covalent means; preferably, the second DNA polymerase is selected from the group consisting of DNA-dependent DNA polymerases and RNA-dependent DNA polymerases; preferably, the second DNA polymerase is an RNA-dependent DNA polymerase; preferably, the second DNA polymerase is a reverse transcriptase, such as a reverse transcriptase from Moloney murine leukemia virus, human immunodeficiency virus (HIV), avian sarcoma-leukemia virus (ASLV), Rous sarcoma virus (RSV), avian myeloblastosis virus (AMV), avian erythroblastosis virus helper virus, avian granuloma virus MC29 helper virus, avian reticuloendotheliosis virus helper virus, avian sarcoma virus UR2 helper virus, avian sarcoma virus Y73 helper virus, Rous-related virus, and myeloblastosis-related virus (MAV); preferably, the second DNA polymerase has the amino acid sequence shown in SEQ ID NO. 7; preferably, the second DNA polymerase is the same as or different from the first DNA polymerase; preferably, the second DNA polymerase is the same as the first DNA polymerase; preferably, the second Cas protein is covalently linked to the second DNA polymerase with or without a linker; preferably, the linker is a peptide linker, such as a flexible peptide linker; for example, the linker has the amino acid sequence shown in SEQ ID NO. 35; preferably, the second Cas protein is fused to the second DNA polymerase with or without a peptide linker to form a second fusion protein; preferably, the second Cas protein is linked or fused to the N-terminus of the second DNA polymerase, optionally through a linker; alternatively, the second Cas protein is linked or fused to the C-terminus of the second DNA polymerase, optionally through a linker; preferably, the second fusion protein has the amino acid sequence shown in SEQ ID NO. 8.
- The complex of claim 38, wherein the complex further comprises a second tag primer that hybridizes to or anneals to the double-stranded target nucleic acid; wherein the second tag primer contains a second target binding sequence that is capable of hybridizing or annealing to the double-stranded target nucleic acid;preferably, the tag primer contains a second tag sequence and a second target binding sequence, the second tag sequence being located upstream or 5' of the second target binding sequence; and, the second target binding sequence is capable of hybridizing or annealing to the double stranded target nucleic acid under conditions that allow hybridization or annealing of the nucleic acid; preferably, the second target binding sequence is capable of hybridizing or annealing to the 3' end of the nucleic acid strand of the double-stranded target nucleic acid that is cleaved by the second functional unit, forming a double-stranded structure; preferably, the 3' end is formed by cleavage of one strand of the double stranded target nucleic acid by the second functional unit; preferably, the second tag sequence is not bound to the fragmented nucleic acid strand, in a free single stranded state;Preferably, the second target-binding sequence is at least 5nt, such as 5-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer in length;preferably, the second target binding sequence is different from the first target binding sequence; preferably, the nucleic acid strand bound by the second target binding sequence is different from the nucleic acid strand bound by the first target binding sequence; preferably, the nucleic acid strand bound by the second target binding sequence is the opposite strand of the nucleic acid strand bound by the first target binding sequence;preferably, the second tag sequence has a length of at least 4nt, such as 4-10nt,10-15nt,15-20nt,20-25nt,25-30nt,30-40nt,40-50nt,50-100nt,100-200nt, or longer;preferably, the second 
tag sequence is the same as or different from the first tag sequence; preferably, the second tag sequence is different from the first tag sequence;preferably, the second tag primer binds to the fragmented nucleic acid strand through the second target binding sequence; preferably, the second DNA polymerase is bound to the fragmented nucleic acid strand and the second tag primer;Preferably, the second tag primer is a single-stranded deoxyribonucleic acid or a single-stranded ribonucleic acid;preferably, the second tag primer is a single-stranded ribonucleic acid and the second DNA polymerase is an RNA-dependent DNA polymerase; alternatively, the second tag primer is a single stranded deoxyribonucleic acid and the second DNA polymerase is a DNA-dependent DNA polymerase;preferably, the fragmented target nucleic acid fragment is extended by the second DNA polymerase with the second tag primer as a template to form a second flap;preferably, the second gRNA-bound nucleic acid strand is different from the second tag primer-bound nucleic acid strand; preferably, the second gRNA-bound nucleic acid strand is the opposite strand of the second tag primer-bound nucleic acid strand.
- The complex of claim 39, wherein the second tag primer is linked to the second gRNA; preferably, the second tag primer is covalently linked to the second gRNA with or without a linker; preferably, the second tag primer is attached to the 3' end of the second gRNA, optionally through a linker; preferably, the linker is a nucleic acid linker (e.g., a ribonucleic acid linker or a deoxyribonucleic acid linker); preferably, the second tag primer is a single-stranded ribonucleic acid and is linked to the 3' end of the second gRNA, with or without a ribonucleic acid linker, forming a second PegRNA.
- The complex of claim 40, wherein the first and second functional units bind to the double-stranded target nucleic acid in a predetermined positional relationship; preferably, the second guide sequence binds to the same nucleic acid strand as the first target binding sequence; and/or, the first guide sequence binds to the same nucleic acid strand as the second target binding sequence; preferably, the binding site of the second guide sequence is located upstream or 5' to the binding site of the first target binding sequence; and/or the binding site of the first guide sequence is located upstream or 5' to the binding site of the second target binding sequence; preferably, the binding site of the second guide sequence is located downstream or 3' to the binding site of the first target binding sequence; and/or the binding site of the first guide sequence is located downstream or 3' to the binding site of the second target binding sequence; preferably, the double-stranded target nucleic acid is selected from genomic DNA and nucleic acid vector DNA.
- A nucleic acid vector (e.g., a donor nucleic acid vector) comprising a first PAM sequence recognized by the first Cas protein of any one of claims 1-9; preferably, the nucleic acid vector further comprises a donor homology arm; preferably, the nucleic acid vector is double-stranded; preferably, the nucleic acid vector is a circular double-stranded vector; preferably, the nucleic acid vector comprises a first guide binding sequence (e.g., a complement of the first guide sequence) capable of hybridizing or annealing to the first guide sequence; preferably, the first functional complex is capable of cleaving one nucleic acid strand of the nucleic acid vector through the first guide binding sequence and the first PAM sequence.
- The nucleic acid vector of claim 42, wherein the nucleic acid vector further comprises a nucleic acid sequence of interest; preferably, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site of the genome; preferably, the first PAM sequence and the donor homology arm are located on either side of the nucleic acid sequence of interest; preferably, the first guide binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence; preferably, the first functional complex breaks a first strand of the nucleic acid vector, the first strand comprising a nick resulting from the break, the double-stranded portion between the 3' end of the nick and the donor homology arm comprising a nucleic acid sequence of interest, referred to as a target nucleic acid fragment comprising the nucleic acid sequence of interest; preferably, the first tag primer is capable of hybridizing or annealing, via the first target binding sequence, to the 3' end of the nucleic acid strand cleaved by the first functional complex under conditions that allow hybridization or annealing of the nucleic acid, forming a double-stranded structure, and the first tag sequence of the first tag primer is in a free state; preferably, the nucleic acid strand hybridized or annealed by the first target binding sequence is the opposite strand of the nucleic acid strand containing the first guide binding sequence.
- The nucleic acid vector of claim 42 or 43, wherein the nucleic acid vector further comprises a first target sequence; wherein the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the first tag sequence of the first tag primer is in a free state; preferably, the first target sequence is located on the opposite strand of the first guide binding sequence; preferably, the first target sequence is located at the end of the cleaved first strand; preferably, after cleavage of the first strand by the first functional complex, the 3' end of the nucleic acid strand comprising the first target sequence is capable of extension (preferably, forming a first flap) with the first tag primer annealed to the first target sequence as a template; preferably, the nucleic acid vector further comprises a restriction site between the first target sequence and the donor homology arm; preferably, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the donor homology arm.
- A kit comprising the nucleic acid vector of any one of claims 42-44, and one or more components of the system or kit of any one of claims 1-9 (e.g., a first Cas protein or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, a template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase, a first gRNA or a nucleic acid molecule C1 comprising a nucleotide sequence encoding the first gRNA, a first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer); preferably, the kit comprises the following 4 components: (a) a first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein; (b) a template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding said first DNA polymerase; (c) a first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA; and (d) a first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding said first tag primer; preferably, the 4 components are contained in 1 or more (e.g., 2, 3, 4) vectors; preferably, the kit comprises the following vectors: (a) the nucleic acid vector of any one of claims 42-44; (b) a first vector comprising a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein and a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase; (c) a second vector comprising a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA and a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer; optionally, the kit further comprises one or more components of the third nucleic acid editing system described in any one of claims 19-21 (e.g., (i) a third Cas protein or a nucleic acid molecule containing a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule containing a nucleotide sequence encoding the third gRNA).
- A nucleic acid vector (e.g., a donor nucleic acid vector) comprising a first PAM sequence recognized by the first Cas protein of any one of claims 1-9;preferably, the nucleic acid vector further comprises a second PAM sequence recognized by the second Cas protein of any one of claims 10-18;preferably, the nucleic acid vector is double-stranded;preferably, the nucleic acid vector is a circular double stranded vector;preferably, the nucleic acid vector comprises a first guide binding sequence capable of hybridizing or annealing to the first guide sequence (e.g., a complement of the first guide sequence), and/or a second guide binding sequence capable of hybridizing or annealing to the second guide sequence (e.g., a complement of the second guide sequence); optionally, the nucleic acid vector further comprises a restriction enzyme site between the first and second guide binding sequences;preferably, the first and second guide binding sequences are located on opposite strands of the nucleic acid vector;preferably, the first functional complex is capable of cleaving one nucleic acid strand (first strand) of the nucleic acid vector by the first guide binding sequence and the first PAM sequence; and/or, the second functional complex is capable of cleaving another nucleic acid strand (second strand) of the nucleic acid vector through the second guide binding sequence and the second PAM sequence.
- The nucleic acid vector of claim 46, wherein the nucleic acid vector further comprises a nucleic acid sequence of interest;preferably, the nucleic acid sequence of interest is an exogenous gene or other exogenous nucleic acid fragment to be integrated into a specific site of the genome;preferably, the first PAM sequence and the second PAM sequence are located on both sides of the nucleic acid sequence of interest, respectively;preferably, the first guide binding sequence is located between the nucleic acid sequence of interest and the first PAM sequence;preferably, the second guide binding sequence is located between the nucleic acid sequence of interest and the second PAM sequence;preferably, the first functional complex and the second functional complex cleave a first strand and a second strand of the nucleic acid vector, respectively, the first strand and the second strand comprising a nick resulting from the cleavage, respectively, a double-stranded portion located between the 3' -ends of the two nicks comprising a nucleic acid sequence of interest, referred to as a target nucleic acid fragment comprising the nucleic acid sequence of interest;preferably, the first tag primer is capable of hybridizing or annealing to the 3' end of the cleaved nucleic acid strand of the first functional complex via the first target binding sequence under conditions that allow hybridization or annealing of the nucleic acid, forming a double-stranded structure, and the first tag sequence of the first tag primer is in a free state; preferably, the nucleic acid strand hybridized or annealed by the first target binding sequence is an opposite strand of the nucleic acid strand comprising the first guide binding sequence;Preferably, the second tag primer is capable of hybridizing or annealing to the 3' end of the cleaved nucleic acid strand of the second functional complex via the second target binding sequence under conditions that allow hybridization or annealing of the nucleic 
acid, forming a double-stranded structure, and the second tag sequence of the second tag primer is in a free state; preferably, the nucleic acid strand hybridized or annealed by the second target binding sequence is an opposite strand of the nucleic acid strand comprising the second guide binding sequence;preferably, the nucleic acid strand hybridized or annealed by the first target binding sequence is the opposite strand of the nucleic acid strand hybridized or annealed by the second target binding sequence.
- The nucleic acid vector of claim 46 or 47, wherein the nucleic acid vector further comprises a first target sequence; wherein the first tag primer is capable of hybridizing or annealing to the first target sequence through the first target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the first tag sequence of the first tag primer is in a free state; preferably, the first target sequence is located on the opposite strand of the first guide binding sequence; preferably, the first target sequence is located at the end of the cleaved first strand; preferably, after cleavage of the first strand by the first functional complex, the 3' end of the nucleic acid strand comprising the first target sequence is capable of extension (preferably, forming a first flap) with the first tag primer annealed to the first target sequence as a template; and/or, the nucleic acid vector further comprises a second target sequence; wherein the second tag primer is capable of hybridizing or annealing to the second target sequence through the second target binding sequence under conditions that allow hybridization or annealing of nucleic acids, forming a double-stranded structure, and wherein the second tag sequence of the second tag primer is in a free state; preferably, the second target sequence is located on the opposite strand of the second guide binding sequence; preferably, the second target sequence is located at the end of the cleaved second strand; preferably, after cleavage of the second strand by the second functional complex, the 3' end of the nucleic acid strand comprising the second target sequence is capable of extension (preferably, forming a second flap) with the second tag primer annealed to the second target sequence as a template; preferably, the nucleic acid strand comprising the first target sequence is the opposite strand of the nucleic acid strand comprising the second target sequence; preferably, the nucleic acid vector further comprises a restriction site between the first target sequence and the second target sequence; preferably, the nucleic acid vector further comprises an exogenous gene between the first target sequence and the second target sequence.
- A kit comprising the nucleic acid vector of any one of claims 46-48, the one or more components of the system or kit of any one of claims 1-9 (e.g., a first Cas protein or a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein, a template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase, a first gRNA or a nucleic acid molecule C1 comprising a nucleotide sequence encoding the first gRNA, a first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer), and the one or more components of the system or kit of any one of claims 10-18 (e.g., a second gRNA or a nucleic acid molecule C2 comprising a nucleotide sequence encoding the second gRNA, the second Cas protein or a nucleic acid molecule A2 comprising a nucleotide sequence encoding the second Cas protein, a second tag primer or a nucleic acid molecule D2 comprising a nucleotide sequence encoding the second DNA polymerase;Preferably, the kit comprises the following 8 components:(a) A first Cas protein or a nucleic acid molecule A1 containing a nucleotide sequence encoding the first Cas protein;(b) A template-dependent first DNA polymerase or a nucleic acid molecule B1 comprising a nucleotide sequence encoding said first DNA polymerase;(c) A first gRNA or a nucleic acid molecule C1 containing a nucleotide sequence encoding the first gRNA;(d) A first tag primer or a nucleic acid molecule D1 comprising a nucleotide sequence encoding said first tag primer;(e) A second gRNA or a nucleic acid molecule C2 containing a nucleotide sequence encoding the second gRNA;(f) The second Cas protein or a nucleic acid molecule A2 containing a nucleotide sequence encoding the second Cas protein;(g) A second tag primer or a nucleic acid molecule D2 comprising a nucleotide sequence encoding said second tag primer; and(h) The second DNA polymerase or a nucleic acid molecule B2 
comprising a nucleotide sequence encoding the second DNA polymerase; preferably, the 8 components are contained in 1 or more (e.g., 2, 3, 4, 5, 6, 7, 8) vectors; preferably, the kit comprises the following vectors: (a) the nucleic acid vector of any one of claims 46-48; (b) a first vector comprising a nucleic acid molecule A1 comprising a nucleotide sequence encoding the first Cas protein and a nucleic acid molecule B1 comprising a nucleotide sequence encoding the first DNA polymerase; (c) a second vector comprising a nucleic acid molecule C1 comprising a nucleotide sequence encoding the first gRNA and a nucleic acid molecule D1 comprising a nucleotide sequence encoding the first tag primer; (d) a third vector comprising a nucleic acid molecule C2 comprising a nucleotide sequence encoding the second gRNA and a nucleic acid molecule A2 comprising a nucleotide sequence encoding the second Cas protein; and (e) a fourth vector comprising a nucleic acid molecule D2 comprising a nucleotide sequence encoding the second tag primer and a nucleic acid molecule B2 comprising a nucleotide sequence encoding the second DNA polymerase; optionally, the kit further comprises one or more components of the third nucleic acid editing system described in any one of claims 19-21 (e.g., (i) a third Cas protein or a nucleic acid molecule comprising a nucleotide sequence encoding the third Cas protein, and (ii) a third gRNA or a nucleic acid molecule comprising a nucleotide sequence encoding the third gRNA).
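As an illustrative aid only (not part of the claims), the eight preferred kit components and the preferred four-vector packaging recited above can be laid out as a small data structure. The sketch below is a minimal Python model; the class and function names are hypothetical, and the third vector is read as carrying C2 together with A2, which is an interpretation of item (d).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    item: str      # item letter (a)-(h) in the preferred 8-component kit
    molecule: str  # label of the encoding nucleic acid molecule
    encodes: str   # what the molecule encodes


# The eight preferred components, in the order (a)-(h) given in the claim.
COMPONENTS = [
    Component("a", "A1", "first Cas protein"),
    Component("b", "B1", "first DNA polymerase"),
    Component("c", "C1", "first gRNA"),
    Component("d", "D1", "first tag primer"),
    Component("e", "C2", "second gRNA"),
    Component("f", "A2", "second Cas protein"),
    Component("g", "D2", "second tag primer"),
    Component("h", "B2", "second DNA polymerase"),
]

# Preferred packaging into four expression vectors (assumed reading of
# items (b)-(e); the donor vector of claims 46-48 is kept separate).
VECTORS = {
    "first vector": ("A1", "B1"),
    "second vector": ("C1", "D1"),
    "third vector": ("C2", "A2"),
    "fourth vector": ("D2", "B2"),
}


def unassigned_molecules(components, vectors):
    """Return labels of components that no vector carries."""
    packaged = {m for molecules in vectors.values() for m in molecules}
    return sorted(c.molecule for c in components if c.molecule not in packaged)


def vector_for(molecule, vectors=VECTORS):
    """Return the name of the vector carrying a molecule, or None."""
    for name, molecules in vectors.items():
        if molecule in molecules:
            return name
    return None
```

Under this reading, every one of the eight molecules is carried by exactly one of the four vectors, so `unassigned_molecules(COMPONENTS, VECTORS)` returns an empty list.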
- A method for cleaving one nucleic acid strand of a double-stranded target nucleic acid and adding a flap at the 3' end of the resulting nick, wherein the method comprises using the system or kit of any one of claims 1-9.
- The method of claim 50, wherein the method comprises the steps of: i. providing a double-stranded target nucleic acid, and providing the first Cas protein, first gRNA, first DNA polymerase, and first tag primer; ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, and first tag primer; preferably, in step ii: the first Cas protein and the first gRNA combine to form a first functional complex, and the first functional complex cleaves one nucleic acid strand of the double-stranded target nucleic acid; and, the first tag primer hybridizes or anneals to the 3' end of the cleaved nucleic acid strand via the first target binding sequence; and, the first DNA polymerase extends the cleaved nucleic acid strand using the first tag primer annealed to it as a template, forming a first flap; preferably, the method is performed intracellularly; preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer within the cell; preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, and the first tag primer or nucleic acid molecule D1 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, and first tag primer within the cell; preferably, in step i, the nucleic acid molecules A1, B1, C1 and D1 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase and first tag primer within the cell; preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are comprised in the same or different expression
vectors (e.g. eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase or of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase in a cell; preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell;Preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g. eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer in the cell;preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein, and a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase and first tag primer within the cell;preferably, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a 
cell to provide the double stranded target nucleic acid in the cell;Preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;preferably, the double stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein; preferably, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one of its nucleic acid strands.
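The nick, anneal, and extend sequence of step ii can be illustrated with a short string-level model. This is a minimal sketch under stated assumptions, not the claimed method itself: sequences are written 5' to 3' in the DNA alphabet for simplicity, the tag primer is assumed to consist of a template (tag) region followed by the target binding sequence at its 3' end, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_nicked_strand(upstream: str, tag_primer: str, binding_len: int) -> str:
    """Model flap formation on a nicked strand.

    upstream    -- 5'->3' part of the cleaved strand ending at the nick's 3' end
    tag_primer  -- 5'->3' primer: template region, then target binding sequence
    binding_len -- length of the target binding sequence (assumed known)
    """
    template = tag_primer[:-binding_len]
    binding = tag_primer[-binding_len:]
    # The target binding sequence must pair with the 3' end of the nicked strand.
    if reverse_complement(binding) != upstream[-binding_len:]:
        raise ValueError("target binding sequence does not anneal to the 3' end")
    # The polymerase copies the template region, so the newly synthesized
    # flap is the reverse complement of that region.
    flap = reverse_complement(template)
    return upstream + flap


# Toy example: a 4-nt binding sequence and a 6-nt template region.
extended = extend_nicked_strand("GATTACAGGC", "TTAACCGCCT", binding_len=4)
print(extended)  # GATTACAGGCGGTTAA
```

In this toy example the flap is `GGTTAA`, the reverse complement of the template region `TTAACC`; a primer whose binding sequence does not match the 3' end raises an error instead of extending.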
- A method for cleaving each of the two nucleic acid strands of a double-stranded target nucleic acid and adding a flap at the 3' end of each of the two resulting nicks, wherein the method comprises using the system or kit of any one of claims 10-18; wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid; preferably, the method is used to cleave the two nucleic acid strands of the double-stranded target nucleic acid at different positions; preferably, the method is performed extracellularly or intracellularly.
- The method of claim 52, wherein the method comprises the steps of: i. providing a double-stranded target nucleic acid, and providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer; ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer; preferably, in step ii: the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and, the first and second functional complexes cleave the first strand and the second strand of the double-stranded target nucleic acid, respectively, so that the first strand and the second strand each contain a nick resulting from the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1; and, the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and, the second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and, the first and second DNA polymerases perform extension reactions using the first and second tag primers annealed to the target nucleic acid fragment F1 as templates, respectively, such that the 3' ends generated by cleavage in the first and second strands are extended to form a first flap and a second flap, respectively, forming a double-stranded portion having the first and second flaps, referred to as target nucleic acid
fragment F2;Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer within the cell;preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, and the second tag primer within the cell;Preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, and D2 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer within the cell;preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are comprised in the same or different expression vectors (e.g. 
eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase or of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase in a cell; preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell;preferably, the nucleic acid molecule A2 and the nucleic acid molecule B2 are comprised in the same or different expression vectors (e.g. eukaryotic expression vectors); preferably, the nucleic acid molecule A2 and nucleic acid molecule B2 are capable of expressing the isolated second Cas protein and the second DNA polymerase or are capable of expressing a second fusion protein containing the second Cas protein and the second DNA polymerase in a cell; preferably, in step i, a nucleic acid molecule capable of expressing the isolated second Cas protein and second DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the second fusion protein is delivered into a cell and expressed in the cell to provide the second Cas protein and the second DNA polymerase within the cell;Preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g. 
eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer in the cell;preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are comprised in the same expression vector (e.g. eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell; preferably, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer in the cell;Preferably, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid in the cell;preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;preferably, the second Cas protein, the second gRNA, the second DNA polymerase or the second tag primer is as defined in any one of claims 10-18;preferably, the double stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized by a second Cas protein; preferably, in 
step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one strand thereof; and, the second functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof.
- The method of claim 53, wherein the second Cas protein is the same as the first Cas protein and the second DNA polymerase is the same as the first DNA polymerase; wherein the first Cas protein forms the first and second functional complexes with the first and second gRNAs, respectively, and the first DNA polymerase performs extension reactions using the first and second tag primers annealed to the target nucleic acid fragment F1 as templates, respectively, such that the 3' ends of the first and second strands resulting from the cleavage are extended to form a first flap and a second flap, respectively, forming a double-stranded portion having the first and second flaps, referred to as target nucleic acid fragment F2; preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer within the cell; preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, and the second tag primer or nucleic acid molecule D2 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer within the cell; preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, and D2 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer within the cell; preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are comprised in the same or different
expression vectors (e.g. eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase or of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase in a cell; preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell;preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g. eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer in the cell;Preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are comprised in the same expression vector (e.g. 
eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell; preferably, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer in the cell;preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule comprising a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA, and a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, and second tag primer within the cell.
- A method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest; wherein the method comprises using the kit of claim 49; wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and is used to provide the target nucleic acid fragment, the target nucleic acid fragment being located between the 3' end resulting from a break in the first strand and the 3' end resulting from a break in the second strand of the double-stranded target nucleic acid; and, the third double-stranded target nucleic acid is the nucleic acid molecule of interest.
- The method of claim 55, wherein the method comprises: a. cleaving a first strand and a second strand of the first double-stranded target nucleic acid, respectively, by the method of any one of claims 52 to 54, so that the first strand and the second strand each contain a nick resulting from the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1; and adding a first flap and a second flap to the two 3' ends, respectively, to form a double-stranded portion having the first flap and the second flap, referred to as target nucleic acid fragment F2; b. cleaving the nucleic acid molecule of interest with the third nucleic acid editing system to form cleaved nucleotide fragments a1 and a2; and c. ligating the nucleotide fragments a1 and a2 with the target nucleic acid fragment F2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest; preferably, the method is performed extracellularly or intracellularly; preferably, when the nucleic acid molecule of interest is a genomic sequence present in a cell, said step a is performed outside or inside said cell, and said steps b and c are performed within said cell; preferably, the method comprises the steps of: i.
providing a double-stranded target nucleic acid and a nucleic acid molecule of interest, and providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, and third nucleic acid editing system; ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system; preferably, in step ii: the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and, the first and second functional complexes cleave a first strand and a second strand, respectively, of the double-stranded target nucleic acid, so that the first strand and the second strand each contain a nick resulting from the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1, and the third nucleic acid editing system cleaves the nucleic acid molecule of interest, forming cleaved nucleotide fragments a1 and a2; and, the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and, the second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and, the first DNA polymerase and the second DNA polymerase use the first tag primer and the second tag primer annealed to the target nucleic
acid fragment F1 as templates to perform an extension reaction, so that the 3' ends generated by cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, forming a double-stranded portion having the first flap and the second flap, referred to as target nucleic acid fragment F2; wherein the first and second flaps are capable of hybridizing or annealing to the cleaved nucleotide fragments a1 and a2, respectively; and, the target nucleic acid fragment F2 hybridizes or anneals to the nucleotide fragments a1 and a2 through the first and second flaps, respectively, and is inserted or ligated between the nucleotide fragments a1 and a2, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest; preferably, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a1, the 3' end or 3' portion being formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system; preferably, the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' portion of one nucleic acid strand of the cleaved nucleotide fragment a1, with a first spacer region between the 3' portion of the nucleotide fragment a1 and the broken end formed by cleavage of the third double-stranded target nucleic acid; preferably, the first spacer region has a length of 1 nt to 200 nt, such as 1 to 10 nt, 10 to 20 nt, 20 to 30 nt, 30 to 40 nt, 40 to 50 nt, 50 to 100 nt, or 100 to 200 nt; preferably, the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a2, the 3' end or 3' portion being formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system; preferably, the complementary sequence of the second tag sequence or the second flap
is capable of hybridizing or annealing to the 3 'portion of one nucleic acid strand of the fragmented nucleotide fragment a2, with a second spacer region between the 3' portion of the nucleotide fragment a2 and the fragmented end formed by the third double stranded target nucleic acid;preferably, the length of the second spacing region is 1nt-200nt, such as 1-10nt,10-20nt,20-30nt,30-40nt,40-50nt,50-100nt or 100-200nt;preferably, the method is performed intracellularly;preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, a third nucleic acid editing system, or a nucleic acid molecule A3 encoding the same is delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system within the cell;Preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, nucleic acid molecule A3 is delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system within the cell;preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, and A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, 
first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system within the cell;preferably, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid in the cell;preferably, the double stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized by a second Cas protein; preferably, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one strand thereof; and, the second functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof;Preferably, the nucleic acid molecule of interest is genomic DNA of the cell;preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;preferably, the second Cas protein, the second gRNA, the second DNA polymerase or the second tag primer is as defined in any one of claims 10-18;preferably, the third nucleic acid editing system is as defined in any one of claims 19 to 21;preferably, the third nucleic acid editing system is as defined in claim 21, the nucleic acid molecule of interest containing a third PAM sequence recognized by a third Cas protein; preferably, in step ii, the third functional complex binds to the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it.
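The spacer-region limitation recited above (a flap annealing to a 3' portion that lies 1 nt to 200 nt away from the broken end) can be checked with a simple string search. The following is a hedged sketch only: it assumes DNA sequences written 5' to 3', exact complementarity between the flap and its annealing site, and hypothetical function names; real annealing tolerates mismatches that this model ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def spacer_length(fragment_strand: str, flap: str):
    """Count nucleotides between the flap's annealing site and the broken end.

    fragment_strand -- 5'->3' strand of fragment a1 or a2, ending at the
                       3' end created by the third nucleic acid editing system
    flap            -- 5'->3' sequence of the first or second flap
    Returns None when the flap has no exactly complementary site.
    """
    site = reverse_complement(flap)
    start = fragment_strand.rfind(site)  # site closest to the 3' end
    if start == -1:
        return None
    return len(fragment_strand) - (start + len(site))


def in_preferred_range(spacer, low: int = 1, high: int = 200) -> bool:
    """True when a spacer region exists and is 1-200 nt long."""
    return spacer is not None and low <= spacer <= high


# Toy example: the flap GGTTAA anneals to TTAACC, 3 nt from the 3' end.
print(spacer_length("CCATGGTTAACCAGT", "GGTTAA"))  # 3
```

A spacer of 0 corresponds to the separate embodiment in which the flap anneals directly to the 3' end, so `in_preferred_range(0)` is deliberately false.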
- The method of claim 56, wherein the first and second Cas proteins are identical, selected from Cas proteins that cleave a single strand of DNA, and the second DNA polymerase is identical to the first DNA polymerase; wherein the first Cas protein forms a first functional complex and a second functional complex with the first gRNA and the second gRNA, respectively, and the first DNA polymerase performs an extension reaction with the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, respectively, to form a target nucleic acid fragment F2 having a first flap and a second flap;Preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system, or the nucleic acid molecule A3 encoding the same are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system within the cell;preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, a third nucleic acid editing system or a nucleic acid molecule A3 encoding the same is delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer, the third nucleic acid editing system within the cell;preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, D2, and A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag 
primer, second gRNA, second tag primer, and third nucleic acid editing system within the cell;Preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are comprised in the same or different expression vectors (e.g. eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase or of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase in a cell; preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell;preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g. eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer in the cell;Preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are comprised in the same expression vector (e.g. 
eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell; preferably, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer in the cell;preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, and a nucleic acid molecule containing a sequence encoding the third nucleic acid editing system are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second tag primer, and third nucleic acid editing system within the cell.
- A method for inserting a target nucleic acid fragment into a nucleic acid molecule of interest; wherein the method comprises using the kit of claim 45; wherein the first double-stranded target nucleic acid is used to provide the target nucleic acid fragment, the target nucleic acid fragment being located between the 3' end of the double-stranded target nucleic acid resulting from the first strand break and the donor homology arm; and the third double-stranded target nucleic acid is the nucleic acid molecule of interest; optionally, the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and is comprised in a nucleic acid vector according to any one of claims 46 to 48.
- The method of claim 58, wherein the method comprises: a. cleaving a first strand of the first double-stranded target nucleic acid by the method of claim 50 or 51, the first strand comprising a nick resulting from the cleavage, the portion of the first strand located between the 3' end of the nick and the donor homology arm being referred to as target nucleic acid strand S1; adding a first flap at the 3' end to form a first strand portion having the first flap, referred to as target nucleic acid strand S2; b. fragmenting the nucleic acid molecule of interest with the third nucleic acid editing system to form fragmented nucleotide fragments a1 and a2; and c. hybridizing or annealing the target nucleic acid strand S2 to the first strand of the nucleotide fragment a1 through the first flap; performing an extension reaction using the target nucleic acid strand S2 as a template to form an extension strand E1, the extension strand E1 comprising a complementary sequence of the target nucleic acid strand S2 and a complementary sequence of a donor homology arm flanking the S2; ligating the extension strand E1 to a2 via the donor homology arm, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest; preferably, the 3' end of the first strand of the nucleotide fragment a1 comprises the complement of the first flap and the 3' end of the second strand of the nucleotide fragment a1 comprises the sequence of the first flap; preferably, the nick end of the nucleotide fragment a2 comprises a target site homology arm; preferably, the method is performed extracellularly or intracellularly; preferably, when the nucleic acid molecule of interest is genomic DNA present in a cell, said step a is performed outside or inside said cell, and said steps b, c and d are performed within said cell; preferably, the method comprises the steps of: i.
providing a double-stranded target nucleic acid comprising a donor homology arm, a first PAM sequence recognized by a first Cas protein and a first gRNA-recognized sequence (preferably, the double-stranded target nucleic acid comprises a donor homology arm, a first PAM sequence recognized by a first Cas protein and a first guide sequence recognized by a first gRNA); and providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, and third nucleic acid editing system; ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, and first tag primer, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system; preferably, in step ii: the first Cas protein and the first gRNA combine to form a first functional complex; and the first functional complex breaks a first strand of the double-stranded target nucleic acid, the first strand comprising a nick resulting from the break, a portion of the first strand located between the 3' end of the nick and the donor homology arm being referred to as target nucleic acid strand S1, and the third nucleic acid editing system breaks the nucleic acid molecule of interest, forming broken nucleotide fragments a1 and a2; and the first tag primer hybridizes or anneals to the 3' end of the target nucleic acid strand S1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and the first DNA polymerase performs an extension reaction using the first and second tag primers annealed to the target nucleic acid strand S1 as templates, so that 3' ends generated by cleavage in the first strand are respectively extended to form first flaps, forming first strand portions having the first flaps, which are referred to as target nucleic acid strand S2; wherein the first flap is capable of hybridizing or annealing to
the fragmented nucleotide fragment a1; and the target nucleic acid strand S2 hybridizes or anneals to the first strand of the nucleotide fragment a1 through the first flap, whereby the target nucleic acid strand S2 is connected between the second strand of the nucleotide fragment a1 and the second strand of the nucleotide fragment a2; performing an extension reaction on the 3' end of the first strand of the nucleotide fragment a1 by using the target nucleic acid strand S2 as a template to form an extension strand E1, wherein the extension strand E1 comprises a complementary sequence of the target nucleic acid strand S2 and a complementary sequence of a donor homology arm flanking the S2; annealing the extension strand E1 to the first strand of a2 through the donor homology arm, whereby the extension strand E1 is connected between the first strand of the nucleotide fragment a1 and the first strand of the nucleotide fragment a2, forming a double-stranded structure, thereby inserting the target nucleic acid fragment into the nucleic acid molecule of interest; preferably, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a1, and the 3' end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system; preferably, the complementary sequence of the first tag sequence or the first flap is capable of hybridizing or annealing to the 3' portion of one nucleic acid strand of the fragmented nucleotide fragment a1, and a first spacer region lies between that 3' portion and the broken end of the nucleotide fragment a1 formed by cleavage of the third double-stranded target nucleic acid; preferably, the first spacer region has a length of 1 nt to 200 nt, such as 1 to 10 nt, 10 to 20 nt, 20 to 30 nt, 30 to 40 nt, 40 to 50 nt, 50 to 100 nt, or 100 to 200 nt; preferably, in step i, the first Cas protein or
nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, a third nucleic acid editing system, or a nucleic acid molecule A3 encoding the same is delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, third nucleic acid editing system within the cell;alternatively, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1 is contacted with the double-stranded target nucleic acid extracellularly, and then the edited double-stranded target nucleic acid is delivered into a cell with a third nucleic acid editing system or nucleic acid molecule A3 encoding the same to provide double-stranded target nucleic acid with a first lobe and a third nucleic acid editing system within the cell;preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, nucleic acid molecule A3 is delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, third nucleic acid editing system within the cell;Preferably, in step i, the nucleic acid molecules A1, B1, C1, D1 and A3 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, third nucleic acid editing system within the cell;preferably, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid in the cell;preferably, the double stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a donor 
homology arm; preferably, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one strand thereof;preferably, the nucleic acid molecule of interest is genomic DNA of the cell;preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;preferably, the third nucleic acid editing system is as defined in any one of claims 19 to 21;preferably, the third nucleic acid editing system is as defined in claim 21, the nucleic acid molecule of interest containing a third PAM sequence recognized by a third Cas protein; preferably, in step ii, the third functional complex binds to the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it.
- A method for replacing a nucleotide fragment in a nucleic acid molecule of interest with a target nucleic acid fragment; wherein the method comprises using the system or kit of any one of claims 22-25; wherein the first double-stranded target nucleic acid is identical to the second double-stranded target nucleic acid and is used to provide the target nucleic acid fragment, the target nucleic acid fragment being located between a nick resulting from a break in a first strand and a nick resulting from a break in a second strand of the double-stranded target nucleic acid; and the third double-stranded target nucleic acid is identical to the fourth double-stranded target nucleic acid, being the nucleic acid molecule of interest; preferably, the method comprises: a. cleaving a first strand and a second strand of the first double-stranded target nucleic acid, respectively, by the method of any one of claims 52 to 54, the first strand and the second strand each comprising a nick resulting from the cleavage, the double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1; adding a first flap and a second flap to the two 3' ends, respectively, to form a double-stranded portion having the first flap and the second flap, which is referred to as target nucleic acid fragment F2; b. fragmenting said nucleic acid molecule of interest with said third and fourth nucleic acid editing systems to form fragmented nucleotide fragments a1, a2 and a3; wherein, prior to cleavage, in the nucleic acid molecule of interest, nucleotide fragments a1, a2 and a3 are arranged in sequence (i.e., nucleotide fragment a1 is linked to nucleotide fragment a3 by nucleotide fragment a2); and c. ligating the nucleotide fragments a1 and a3 with the target nucleic acid fragment F2, thereby replacing the nucleotide fragment a2 in the nucleic acid molecule of interest with the target nucleic acid fragment.
- The method of claim 60, wherein the method comprises the steps of: i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, the second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system; ii. contacting the double-stranded target nucleic acid with the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, and second tag primer, and contacting the nucleic acid molecule of interest with a third nucleic acid editing system and a fourth nucleic acid editing system; preferably, in step ii: the first Cas protein and the first gRNA combine to form a first functional complex, and the second Cas protein and the second gRNA combine to form a second functional complex; and the first and second functional complexes cleave a first strand and a second strand, respectively, of the double-stranded target nucleic acid, the first strand and the second strand each comprising a nick resulting from the cleavage, a double-stranded portion located between the 3' ends of the two nicks being referred to as target nucleic acid fragment F1, and the third and fourth nucleic acid editing systems cleave the nucleic acid molecule of interest, forming cleaved nucleotide fragments a1, a2, and a3; and the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and the second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and
the first DNA polymerase and the second DNA polymerase perform extension reactions using, respectively, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, so that the 3' ends generated by cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, forming a double-stranded portion having the first flap and the second flap, which is referred to as target nucleic acid fragment F2; wherein the first and second flaps are capable of hybridizing or annealing to the fragmented nucleotide fragments a1 and a3, respectively; and the target nucleic acid fragment F2 hybridizes or anneals to the nucleotide fragments a1 and a3 through the first and second flaps, respectively, and is thereby connected between the nucleotide fragments a1 and a3, thereby replacing the nucleotide fragment a2 in the nucleic acid molecule of interest with the target nucleic acid fragment; preferably, the first flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a1, and the 3' end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the third nucleic acid editing system; preferably, the second flap is capable of hybridizing or annealing to the 3' end or 3' portion of one nucleic acid strand of the nucleotide fragment a3, and the 3' end or 3' portion is formed by cleavage of the nucleic acid molecule of interest by the fourth nucleic acid editing system; preferably, the method is performed intracellularly; preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid
molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or nucleic acid molecule A3 encoding the same, and the fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second Cas protein, the second gRNA, the second DNA polymerase, the second tag primer, the third nucleic acid editing system, and the fourth nucleic acid editing system within the cell;Preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, nucleic acid molecule A3 and nucleic acid molecule A4 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system and fourth nucleic acid editing system within the cell;preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3, and A4 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second Cas protein, second gRNA, second DNA polymerase, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system within the cell;preferably, in step i, the double stranded target nucleic acid or a nucleic acid molecule T comprising the double stranded target nucleic acid is delivered into a cell to provide the double stranded target nucleic acid in the cell;Preferably, the double stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized 
by a second Cas protein; preferably, in step ii, the first functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one strand thereof; and, the second functional complex binds to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof;preferably, the nucleic acid molecule of interest is genomic DNA of the cell;preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;preferably, the second Cas protein, the second gRNA, the second DNA polymerase or the second tag primer is as defined in any one of claims 10-18;preferably, the third nucleic acid editing system is as defined in any one of claims 19 to 21;preferably, the fourth nucleic acid editing system is as defined in any one of claims 22 to 25;preferably, the third nucleic acid editing system is as defined in claim 21, the fourth nucleic acid editing system is as defined in claim 23, the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein and a fourth PAM sequence recognized by a fourth Cas protein; preferably, in step ii, the third functional complex binds to the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it; and, the fourth functional complex binds to and breaks the nucleic acid molecule of interest through the fourth PAM sequence and the fourth gRNA.
- The method of claim 61, wherein the first and second Cas proteins are identical, selected from Cas proteins that cleave a single strand of DNA, and the second DNA polymerase is identical to the first DNA polymerase; wherein the first Cas protein forms a first functional complex and a second functional complex with the first gRNA and the second gRNA, respectively, and the first DNA polymerase performs an extension reaction with the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, respectively, to form a target nucleic acid fragment F2 having a first flap and a second flap;preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or nucleic acid molecule A3 encoding the same, and the fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system within the cell;Preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecules A3 and A4 are delivered into a cell to provide the first Cas protein, the first gRNA, the first DNA polymerase, the first tag primer, the second gRNA, the second tag primer, the third nucleic acid editing system and the fourth nucleic acid editing system within the cell;preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, 
C2, D2, A3, and A4 are delivered into the cell to provide the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system within the cell;preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are comprised in the same or different expression vectors (e.g. eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and nucleic acid molecule B1 are capable of expressing the isolated first Cas protein and the first DNA polymerase or of expressing a first fusion protein comprising the first Cas protein and the first DNA polymerase in a cell; preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell;Preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g. eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing a first PegRNA comprising the first gRNA and the first tag primer in a cell; preferably, in step i, the first PegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first PegRNA is delivered into a cell and the first PegRNA is transcribed in the cell to provide the first gRNA and the first tag primer in the cell;preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are comprised in the same expression vector (e.g. 
eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing a second PegRNA comprising the second gRNA and the second tag primer in the cell; preferably, in step i, the second PegRNA is delivered into the cell to provide the second gRNA and the second tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second PegRNA is delivered into the cell and the second PegRNA is transcribed in the cell to provide the second gRNA and the second tag primer in the cell; preferably, in step i, a nucleic acid molecule capable of expressing the isolated first Cas protein and first DNA polymerase or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, a nucleic acid molecule containing a nucleotide sequence encoding the first PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the second PegRNA, a nucleic acid molecule containing a nucleotide sequence encoding the third nucleic acid editing system, and a nucleic acid molecule containing a nucleotide sequence encoding the fourth nucleic acid editing system are delivered into a cell and transcribed and expressed in the cell, thereby providing the first Cas protein, first gRNA, first DNA polymerase, first tag primer, second gRNA, second tag primer, third nucleic acid editing system, and fourth nucleic acid editing system within the cell.
- The method of claim 60, wherein the method comprises the steps of: i. providing a double-stranded target nucleic acid and a nucleic acid molecule of interest; and providing the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and a third and a fourth nucleic acid editing system; wherein the third nucleic acid editing system and the fourth nucleic acid editing system are as defined in claims 21 and 23, respectively; ii. contacting the double-stranded target nucleic acid with the first and second Cas proteins, first and second gRNAs, first and second DNA polymerases, and first and second tag primers, and contacting the nucleic acid molecule of interest with the third nucleic acid editing system and fourth nucleic acid editing system; preferably, in step ii: the first Cas protein and the first gRNA combine to form a first functional complex, the second Cas protein and the second gRNA combine to form a second functional complex, the third Cas protein and the third gRNA combine to form a third functional complex, and the fourth Cas protein and the fourth gRNA combine to form a fourth functional complex; and the first and second functional complexes cleave, respectively, a first strand and a second strand of the double-stranded target nucleic acid, the first strand and the second strand each comprising a 3' end resulting from the cleavage, a double-stranded portion located between the two 3' ends being referred to as target nucleic acid fragment F1, and the third and fourth functional complexes bind to and cleave the nucleic acid molecule of interest, forming cleaved nucleotide fragments a1, a2 and a3; and the first tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the first target binding sequence; and the
second tag primer hybridizes or anneals to the 3' end of the other nucleic acid strand of the target nucleic acid fragment F1 (i.e., the 3' end resulting from the cleavage) via the second target binding sequence; and the first DNA polymerase and the second DNA polymerase perform extension reactions using, respectively, the first tag primer and the second tag primer annealed to the target nucleic acid fragment F1 as templates, so that the 3' ends generated by cleavage in the first strand and the second strand are extended to form a first flap and a second flap, respectively, forming a double-stranded portion having the first flap and the second flap, which is referred to as target nucleic acid fragment F2; wherein the first and second flaps are capable of hybridizing or annealing to the fragmented nucleotide fragments a1 and a3, respectively; and the third tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the nucleotide fragment a1 via the third target binding sequence, wherein the 3' end is formed by cleavage of the nucleic acid molecule of interest by the third functional complex; and the fourth tag primer hybridizes or anneals to the 3' end of one nucleic acid strand of the nucleotide fragment a3 through the fourth target binding sequence, wherein the 3' end is formed by cleavage of the nucleic acid molecule of interest by the fourth functional complex; preferably, the method is performed intracellularly; preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second Cas protein or nucleic acid molecule A2, the second DNA polymerase or nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or nucleic acid molecule A3
encoding the same, and the fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell; preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the nucleic acid molecule A2, the nucleic acid molecule B2, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the nucleic acid molecule A3, and the nucleic acid molecule A4 are delivered into a cell to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell; preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, A2, B2, C2, D2, A3, and A4 are delivered into the cell to provide the first and second Cas proteins, the first and second gRNAs, the first and second DNA polymerases, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell; preferably, in step i, the double-stranded target nucleic acid or a nucleic acid molecule T comprising the double-stranded target nucleic acid is delivered into a cell to provide the double-stranded target nucleic acid in the cell; preferably, the double-stranded target nucleic acid or nucleic acid molecule T contains a first PAM sequence recognized by a first Cas protein and a second PAM sequence recognized by a second Cas protein; preferably, in step ii, the first functional complex binds to the double-stranded target nucleic acid or nucleic acid molecule T via the first PAM sequence and the first gRNA and breaks one strand thereof; and the second functional complex binds
to the double stranded target nucleic acid or nucleic acid molecule T through the second PAM sequence and the second gRNA and breaks the other strand thereof;preferably, the nucleic acid molecule of interest contains a third PAM sequence recognized by a third Cas protein and a fourth PAM sequence recognized by a fourth Cas protein; preferably, in step ii, the third functional complex binds to the nucleic acid molecule of interest via the third PAM sequence and the third gRNA and breaks it; and, the fourth functional complex binds to and breaks the nucleic acid molecule of interest through the fourth PAM sequence and the fourth gRNA;preferably, the nucleic acid molecule of interest is genomic DNA of the cell;preferably, the first Cas protein, the first gRNA, the first DNA polymerase or the first tag primer is as defined in any one of claims 1-9;Preferably, the second Cas protein, the second gRNA, the second DNA polymerase or the second tag primer is as defined in any one of claims 10-18.
- The method of claim 61, wherein the first and second Cas proteins are identical and are selected from the group consisting of Cas proteins that cleave a DNA duplex, the third and fourth Cas proteins are identical and are selected from the group consisting of Cas proteins that cleave a DNA duplex, and the first, second, third, and fourth DNA polymerases are identical DNA polymerases; wherein the first Cas protein forms first, second, third, and fourth functional complexes with the first, second, third, and fourth gRNAs, respectively; the first DNA polymerase performs an extension reaction using the first and second tag primers annealed to the target nucleic acid fragment F1 as templates, respectively, to form a target nucleic acid fragment F2 having a first flap and a second flap;
preferably, in step i, the first Cas protein or nucleic acid molecule A1, the first DNA polymerase or nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, the third nucleic acid editing system or nucleic acid molecule A3 encoding the same, and the fourth nucleic acid editing system or nucleic acid molecule A4 encoding the same are delivered into a cell to provide the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell;
preferably, in step i, the nucleic acid molecule A1, the nucleic acid molecule B1, the first gRNA or nucleic acid molecule C1, the first tag primer or nucleic acid molecule D1, the second gRNA or nucleic acid molecule C2, the second tag primer or nucleic acid molecule D2, and the nucleic acid molecules A3 and A4 are delivered into a cell to provide the first Cas protein, the first DNA polymerase, the third nucleic acid editing system and the fourth nucleic acid editing system within the cell;
preferably, in step i, the nucleic acid molecules A1, B1, C1, D1, C2, D2, A3 are delivered into the cell to provide the first Cas protein, the first DNA polymerase, the first and second gRNAs, the first and second tag primers, and the third and fourth nucleic acid editing systems within the cell;
preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are comprised in the same or different expression vectors (e.g. eukaryotic expression vectors); preferably, the nucleic acid molecule A1 and the nucleic acid molecule B1 are capable of expressing, in a cell, either the first Cas protein and the first DNA polymerase as separate proteins or a first fusion protein comprising the first Cas protein and the first DNA polymerase; preferably, in step i, a nucleic acid molecule capable of expressing the separate first Cas protein and first DNA polymerase, or a nucleic acid molecule containing a nucleotide sequence encoding the first fusion protein, is delivered into a cell and expressed in the cell to provide the first Cas protein and the first DNA polymerase within the cell;
preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are comprised in the same expression vector (e.g. a eukaryotic expression vector); preferably, the nucleic acid molecule C1 and the nucleic acid molecule D1 are capable of transcribing, in a cell, a first pegRNA comprising the first gRNA and the first tag primer; preferably, in step i, the first pegRNA is delivered into a cell to provide the first gRNA and the first tag primer in the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the first pegRNA is delivered into a cell and the first pegRNA is transcribed in the cell to provide the first gRNA and the first tag primer in the cell;
preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are comprised in the same expression vector (e.g. a eukaryotic expression vector); preferably, the nucleic acid molecule C2 and the nucleic acid molecule D2 are capable of transcribing, in the cell, a second pegRNA comprising the second gRNA and the second tag primer; preferably, in step i, the second pegRNA is delivered into the cell to provide the second gRNA and the second tag primer within the cell, or a nucleic acid molecule comprising a nucleotide sequence encoding the second pegRNA is delivered into the cell and the second pegRNA is transcribed in the cell to provide the second gRNA and the second tag primer within the cell.
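The claims above require that the double-stranded target carry a first and a second PAM sequence, and that the two functional complexes break opposite strands. As an illustration only, the Python sketch below enumerates candidate PAM sites on both strands of a target sequence. It assumes an SpCas9-style `NGG` PAM lying immediately 3' of a 20-nt protospacer; the claims do not fix a particular Cas protein, PAM, or spacer length, and the function names, parameters, and example sequence are hypothetical.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pam_sites(top_strand: str, pam_regex: str = "[ACGT]GG", spacer_len: int = 20):
    """List candidate sites as (strand, protospacer, pam, pam_start).

    pam_start is a 0-based index into the scanned strand, read 5'->3'.
    Assumption (not taken from the claims): the PAM sits immediately 3'
    of the protospacer, as for SpCas9.
    """
    sites = []
    for strand, seq in (("+", top_strand), ("-", reverse_complement(top_strand))):
        # A lookahead makes each match zero-width, so overlapping PAMs are all reported.
        for match in re.finditer(f"(?=({pam_regex}))", seq):
            pam_start = match.start()
            if pam_start >= spacer_len:  # need a full-length protospacer upstream
                protospacer = seq[pam_start - spacer_len:pam_start]
                sites.append((strand, protospacer, match.group(1), pam_start))
    return sites


if __name__ == "__main__":
    # Hypothetical target: one NGG on the top strand and one CCN
    # (an NGG on the bottom strand) further downstream.
    target = "ACTACTACTACTACTACTAC" + "TGG" + "AAAA" + "CCA" + "TCATCATCATCATCATCATC"
    for strand, protospacer, pam, pam_start in find_pam_sites(target):
        print(strand, protospacer, pam, pam_start)
```

Choosing one site reported on `+` and one on `-` corresponds to the paired arrangement in the claims, in which the first and second gRNAs direct breaks on different strands of the same target.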
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2021110324986 | 2021-09-03 | | |
CN202111032498 | 2021-09-03 | | |
PCT/CN2022/086979 WO2023029492A1 (en) | 2021-09-03 | 2022-04-15 | System and method for site-specific integration of exogenous genes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117897481A (en) | 2024-04-16 |
Family
ID=85411860
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280059607.XA Pending CN117897481A (en) | 2021-09-03 | 2022-04-15 | Exogenous gene fixed-point integration system and method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117897481A (en) |
WO (1) | WO2023029492A1 (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108070610B (en) * | 2016-11-08 | 2021-11-09 | 中国科学院分子植物科学卓越创新中心 | Plant genome fixed-point knocking-in method |
EP3568470B1 (en) * | 2017-01-10 | 2022-07-06 | Christiana Care Health Services, Inc. | Methods for in vitro site-directed mutagenesis using gene editing technologies |
CN108690845B (en) * | 2017-04-10 | 2021-04-27 | 中国科学院动物研究所 | Genome editing system and method |
CN113913405A (en) * | 2020-07-10 | 2022-01-11 | 中国科学院动物研究所 | System and method for editing nucleic acid |
CN113308451B (en) * | 2020-12-07 | 2023-07-25 | 中国科学院动物研究所 | Engineered Cas effector proteins and methods of use thereof |
2022
- 2022-04-15: CN application CN202280059607.XA (CN117897481A), active, Pending
- 2022-04-15: WO application PCT/CN2022/086979 (WO2023029492A1), active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2023029492A1 (en) | 2023-03-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||