CN117126827A - Fusion protein, uracil-N-glycosylase mutant-mediated base editing system, and application thereof
- Publication number
- CN117126827A (application number CN202310733252.4A)
- Authority
- CN
- China
- Prior art keywords
- glycosylase
- seq
- amino acid
- fusion protein
- acid sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/24—Hydrolases (3) acting on glycosyl compounds (3.2)
- C12N9/2497—Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing N- glycosyl compounds (3.2.2)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y302/00—Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
- C12Y302/02—Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2) hydrolysing N-glycosyl compounds (3.2.2)
- C12Y302/02027—Uracil-DNA glycosylase (3.2.2.27)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/01—Fusion polypeptide containing a localisation/targetting motif
- C07K2319/09—Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Wood Science & Technology (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Biochemistry (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Medicinal Chemistry (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Peptides Or Proteins (AREA)
Abstract
The invention discloses a fusion protein, a base editing system mediated by a uracil-N-glycosylase mutant, and applications thereof. The fusion protein comprises a nuclease and a uracil-N-glycosylase mutant, wherein the uracil-N-glycosylase mutant is linked to the nuclease or inserted into the nuclease. The nuclease is an SpCas9 protein carrying the D10A mutation, an SpRYCas9 protein carrying the D10A mutation, or another Cas enzyme whose nuclease activity has been abolished while its helicase activity is retained; the uracil-N-glycosylase mutant is a cytosine-N-glycosylase or a thymine-N-glycosylase. Guided by an sgRNA, the editing system recognizes the target sequence and excises cytosine or thymine within it, producing a base mutation to guanine. It enables efficient site-directed modification of the genomic DNA of mammalian cell lines, human primary T cells, mouse embryos and Escherichia coli, and provides a powerful tool for treating genetic diseases caused by gene mutations and for establishing related experimental animal models.
Description
Technical Field
The invention belongs to the technical field of gene editing, and in particular relates to a fusion protein, base editing systems (CGBE and TSBE) and applications thereof.
Background
Single nucleotide variations account for about two thirds of human genetic diseases, and there are about 59,813 pathogenic single nucleotide variations. For example, in sickle cell anemia, the gene encoding the beta chain of hemoglobin carries a CTT > CAT base substitution that changes glutamic acid to valine, resulting in structural and functional abnormalities of hemoglobin. As another example, 99% of cases of hyperphenylalaninemia, or phenylketonuria (PKU), are caused by mutations in the PAH gene. More than 20 PAH gene mutations have been confirmed in China, accounting for about 80% of PAH mutant genes, among which 259C > T (48.3%) and 286G > A (15.5%) are hot-spot mutations. At present, treatments and symptom-relieving drugs for genetic diseases caused by base mutations are very limited and their efficacy is unsatisfactory, so safer, more efficient and more economical therapies are urgently needed.
CRISPR-Cas9 is an adaptive immune defense that bacteria and archaea evolved over a long period to combat invading viruses and foreign DNA, and CRISPR-Cas9 gene editing is a technology for making specific DNA modifications to targeted genes. CRISPR-Cas9-based gene editing has great promise across gene therapy applications, such as the treatment of blood disorders, tumors and other genetic diseases. The CRISPR/Cas9 technology creates DNA double-strand breaks (DSBs) at the target site, which induce the homology-directed repair (HDR) and non-homologous end joining (NHEJ) pathways in cells, thereby enabling site-directed knockout, substitution, insertion and other modifications of genomic DNA. However, DSB-initiated DNA repair rarely yields efficient and stable single-base mutations, which greatly limits the broad application of CRISPR-Cas9 technology. Single base editing systems have effectively made up for this deficiency, and researchers have begun using them to create and correct animal models of human diseases, including Duchenne muscular dystrophy, progeria (premature aging syndrome) and age-related macular degeneration.
A single base editing system (base editing) fuses different base-modifying enzymes to a nickase Cas9 (D10A) and introduces single nucleotide mutations in a specific region of a gene. The two most widely used single base editors are the cytosine base editor (CBE) and the adenine base editor (ABE), which achieve precise C·G to T·A and A·T to G·C substitutions, respectively, within a window of nucleotides 4-8 (counted from the PAM-distal end) without DNA double-strand breaks. However, CBE and ABE can only produce transition mutations and cannot produce transversion mutations. GBE, which produces C-to-G transversions, and the prime editor (PE), which can convert any base to any other base, were therefore developed in succession. However, the C-to-G editing efficiency of existing GBEs shows a certain site preference and the product purity is low, while PE is not applicable across different cell types. Moreover, existing base editing systems suffer from PAM preference or low targeting efficiency at some sites, and the expression plasmid of a base editor far exceeds the packaging capacity of adenovirus, which hinders clinical research and application. Therefore, developing a novel base editor that has no PAM restriction, generates transversion mutations efficiently and has a smaller expression plasmid is key to current gene editing research and clinical application.
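As a minimal illustration of two ideas in the paragraph above (the editing window of nucleotides 4-8 counted from the PAM-distal end, and the difference between transition and transversion mutations), the following Python sketch may help. The function names, the default window bounds as parameters and the 20-nt example protospacer are illustrative assumptions, not part of the disclosed system.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def mutation_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # anything that crosses the two families is a transversion.
    same_family = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_family else "transversion"


def window_positions(protospacer: str, base: str, start: int = 4, end: int = 8) -> list:
    """Return 1-based positions of `base` inside the editing window.

    Position 1 is taken to be the PAM-distal end of the protospacer,
    so the default window covers positions 4 through 8.
    """
    seq = protospacer.upper()
    base = base.upper()
    return [i for i in range(start, end + 1) if i <= len(seq) and seq[i - 1] == base]


if __name__ == "__main__":
    example = "GACCTCAGTCAAGGCTTACG"  # hypothetical 20-nt protospacer
    print(window_positions(example, "C"))   # candidate cytosines in the window
    print(mutation_class("C", "T"))         # what CBE produces
    print(mutation_class("C", "G"))         # what a C-to-G editor produces
```

For the example sequence, the cytosines at positions 4 and 6 fall inside the window; C-to-T is classified as a transition and C-to-G as a transversion.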
Uracil-DNA glycosylase (UDG), also known as uracil-N-glycosylase (UNG), is the first enzyme recruited in the base excision repair pathway and plays an important role in antibody class-switch recombination (CSR) and somatic hypermutation (SHM). Its function is to excise uracil formed by cytosine deamination, or deoxyuridine monophosphate (dUMP) misincorporated during DNA replication, leaving an abasic (apyrimidinic) site. An abasic site offers the DNA polymerase a template that carries no information, so normal replication cannot proceed. Studies have found, however, that REV1 acts as a scaffold molecule that stabilizes UNG and recruits translesion repair DNA polymerases, generating both transition and transversion mutations. Studies have also shown that introducing mutations at the catalytic site of UNG yields enzymes with different catalytic activities. For example, mutating Asn at position 204 of UNG to Asp yields a cytosine-N-glycosylase (CDG) that excises cytosine, and mutating Tyr at position 147 of UNG to Ala yields a thymine-N-glycosylase (TDG) that excises thymine. In vitro experiments confirm that CDG is more active on single-stranded DNA than on double-stranded DNA, whereas TDG is more active on double-stranded DNA than on single-stranded DNA. In yeast cells lacking the AP endonuclease genes, overexpression of CDG produces transversion mutations that are mainly C > G, and overexpression of TDG produces transversion mutations that are mainly T > G. The main reason for these transversions is probably that the abasic sites generated by UNG, CDG and TDG in yeast cells tend to be paired with C during DNA damage repair and replication, so that after one round of replication the abasic site becomes a G.
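The proposed route from glycosylase activity to a G outcome can be traced in a few lines of Python. This is a toy model, not a simulation of DNA repair: it assumes, as the paragraph above suggests, that a C is always placed opposite the abasic site and that one round of replication then fixes a G at the edited position. The function name and the example sequence are hypothetical.

```python
# Toy model of the pathway described above; all names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
SUBSTRATE = {"CDG": "C", "TDG": "T"}  # base excised by each glycosylase mutant


def predicted_edit(seq: str, position: int, glycosylase: str) -> str:
    """Return the sequence after excision, C insertion and one replication round.

    `position` is 1-based. If the base at `position` is not a substrate of the
    chosen glycosylase, the sequence is returned unchanged.
    """
    seq = seq.upper()
    if not 1 <= position <= len(seq):
        raise ValueError("position is outside the sequence")
    index = position - 1
    if seq[index] != SUBSTRATE[glycosylase]:
        return seq
    # Step 1: the glycosylase removes the base, leaving an abasic site.
    # Step 2 (assumption): a C is inserted opposite the abasic site.
    inserted_opposite = "C"
    # Step 3: replication copies the inserted C, which places its complement (G)
    # at the originally edited position.
    new_base = COMPLEMENT[inserted_opposite]
    return seq[:index] + new_base + seq[index + 1:]


if __name__ == "__main__":
    print(predicted_edit("GACCTCAG", 4, "CDG"))  # C at position 4 becomes G
    print(predicted_edit("GACCTCAG", 5, "TDG"))  # T at position 5 becomes G
```

Under these assumptions both glycosylase mutants converge on the same product base, which matches the C > G and T > G outcomes reported for yeast; real outcomes depend on which polymerase bypasses the lesion.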
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a novel cytosine editor and a novel thymine editor that generate transversion mutations efficiently and are not limited by PAM sequences.
In a first aspect, the invention provides a fusion protein comprising a nuclease and a uracil-N-glycosylase mutant, wherein the uracil-N-glycosylase mutant is linked to the nuclease or inserted into the nuclease. The nuclease is a D10A mutant SpCas9 protein, a D10A mutant SpRYCas9 protein, or another Cas enzyme (such as Cas12f or IscB) whose nuclease activity has been abolished while its helicase activity is retained; the uracil-N-glycosylase mutant is a cytosine-N-glycosylase or a thymine-N-glycosylase. The fusion protein has a structure shown in any one of formulas (I) to (VI):
A-B-L1-C-L2-A, formula (I);
A-C-L1-B-L2-A, formula (II);
A-C^{1-1046}-L3-B-L3-C^{1063-1367}-L2-A, formula (III);
A-C^{1-1009}-L3-B-L3-C^{1011-1367}-L2-A, formula (IV);
A-C^{1-1028}-L3-B-L3-C^{1030-1367}-L2-A, formula (V);
A-C^{1-1248}-L3-B-L3-C^{1250-1367}-L2-A, formula (VI);
wherein A is a nuclear localization signal or none, B is cytosine-N-glycosylase (CDG) or thymine-N-glycosylase (TDG), C is a nuclease, the superscript denotes the site of the nuclease, and L1, L2 and L3 are none or each independently a connecting peptide.
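The insertion-type layouts of formulas (III) to (VI) can be illustrated with a minimal sketch. It assumes 1-based, inclusive residue numbering and uses short placeholder strings rather than the sequences of the SEQ ID NOs; the function names `embed_in_nuclease` and `add_nls` are hypothetical, and skipping L2 when no nuclear localization signal is present is an assumption of this sketch.

```python
def embed_in_nuclease(nuclease: str, insert: str, left_end: int,
                      right_start: int, l3: str = "") -> str:
    """Build C^{1-left_end}-L3-B-L3-C^{right_start-end}, as in formulas (III)-(VI).

    Positions are 1-based and inclusive; nuclease residues
    left_end+1 .. right_start-1 are dropped and replaced by the insert.
    """
    if not 1 <= left_end < right_start <= len(nuclease):
        raise ValueError("need 1 <= left_end < right_start <= len(nuclease)")
    n_part = nuclease[:left_end]          # residues 1 .. left_end
    c_part = nuclease[right_start - 1:]   # residues right_start .. end
    return n_part + l3 + insert + l3 + c_part


def add_nls(core: str, nls: str = "", l2: str = "SGGS") -> str:
    """Wrap a core as A-core-L2-A; with no NLS the core is returned unchanged."""
    if not nls:
        return core
    return nls + core + l2 + nls


# Hypothetical placeholder sequences (NOT the sequences of the SEQ ID NOs).
toy_nuclease = "N" * 1046 + "x" * 16 + "C" * 305   # 1367 residues in total
toy_glycosylase = "BBB"

# Formula (III): residues 1047-1062 of the nuclease are replaced by B.
core = embed_in_nuclease(toy_nuclease, toy_glycosylase, 1046, 1063)
fusion = add_nls(core, nls="K")
print(len(toy_nuclease), len(core), len(fusion))
```

Formulas (IV) to (VI) correspond to the same call with the split positions (1009, 1011), (1028, 1030) and (1248, 1250), each of which replaces a single residue.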
The invention designs the structure of the fusion protein so that it can form a complex with an sgRNA to constitute a base editing system; the sgRNA guides the fusion protein to recognize and cut the target sequence, producing a single-base C-to-G or T-to-S (S = G or C) conversion. The cytosine-N-glycosylase (CDG) or thymine-N-glycosylase (TDG) is a uracil-N-glycosylase mutant, including human uracil-N-glycosylase mutants (hCDG and hTDG), E. coli uracil-N-glycosylase mutants (eCDG and eTDG) and nematode uracil-N-glycosylase mutants (cCDG and cTDG). The amino acid sequence of hCDG is shown in SEQ ID NO. 1; the amino acid sequence of hTDG is shown in SEQ ID NO. 2, and the modified hTDG (hTDG_m) comprises the sequences shown in SEQ ID NO. 7-8; the amino acid sequence of eCDG is shown in SEQ ID NO. 3; the amino acid sequence of eTDG is shown in SEQ ID NO. 4; the amino acid sequence of cCDG is shown in SEQ ID NO. 5; the amino acid sequence of cTDG is shown in SEQ ID NO. 6.
Preferably, the fusion protein further comprises a nuclear localization signal; the nuclear localization signal is fused to the N-terminal and the C-terminal of the fusion protein;
preferably, the amino acid sequence of the nuclear localization signal is shown in SEQ ID NO. 9.
Preferably, the fusion protein further comprises linker peptides L1 and L3 for linking the nuclease and the uracil-N-glycosylase mutant;
and/or a linker peptide L2 for linking the fusion protein and the nuclear localization signal.
Linker peptide L1 links the nuclease directly to the uracil-N-glycosylase mutant, and its amino acid sequence is shown in SEQ ID NO. 10. Linker peptide L3 links the nuclease and the uracil-N-glycosylase mutant when the mutant is inserted into the nuclease, and includes a 5-amino-acid linker (5AA), whose amino acid sequence is shown in SEQ ID NO. 11, and a 10-amino-acid linker (10AA), whose amino acid sequence is shown in SEQ ID NO. 12. The amino acid sequence of linker peptide L2 is SGGS.
Preferably, the fusion protein comprises:
N-CGBE, the amino acid sequence of which is shown in SEQ ID NO. 13; C-CGBE, the amino acid sequence of which is shown in SEQ ID NO. 14; CE-CGBE-1, the amino acid sequence of which is shown in SEQ ID NO. 15; CE-CGBE-2, the amino acid sequence of which is shown in SEQ ID NO. 17; CE-CGBE-3, the amino acid sequence of which is shown in SEQ ID NO. 18; pTac-CE-CGBE, the amino acid sequence of which is shown in SEQ ID NO. 19; CE-spryCGBE, the amino acid sequence of which is shown in SEQ ID NO. 20; CE-TSBE-1, the amino acid sequence of which is shown in SEQ ID NO. 16; CE-TSBE-2, the amino acid sequence of which is shown in SEQ ID NO. 21; CE-TSBE-3, the amino acid sequence of which is shown in SEQ ID NO. 22; CE-TSBE-V206I, the amino acid sequence of which is shown in SEQ ID NO. 23; CE-TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 24; CE(1010)-TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 25; CE(1029)-TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 26; CE(1249)-TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 27.
In a second aspect, the invention provides a nucleic acid molecule comprising a gene encoding a fusion protein according to the first aspect. In the invention, the fusion protein can be prepared by inserting the encoding gene of the fusion protein in the first aspect into an expression vector and introducing the encoding gene into cells for expression.
In a third aspect, the invention provides a base editing system mediated by a uracil-N-glycosylase mutant, the base editing system comprising the fusion protein according to the first aspect and an sgRNA. A base editing system whose fusion protein comprises a nuclease and cytosine-N-glycosylase (CDG) is named a 'C to G' base editor (CGBE); a base editing system whose fusion protein comprises a nuclease and thymine-N-glycosylase (TDG) is named a 'T to S' base editor (TSBE).
In the invention, a designed fusion protein and sgRNA form a base editing system, the fusion protein can form a complex with the sgRNA, and the sgRNA can guide the fusion protein to recognize and cut a target sequence to generate single base conversion of C-to-G or T-to-S.
In a fourth aspect, the present invention provides the use of a fusion protein according to the first aspect, a nucleic acid molecule according to the second aspect or a base editing system according to the third aspect for the preparation of a gene editing product.
In a fifth aspect, the present invention provides a gene editing kit comprising the base editing system of the third aspect.
In a sixth aspect, the present invention provides a base editing method comprising base editing using the base editing system according to the third aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) The fusion protein can form a complex with an sgRNA to constitute a base editing system; the sgRNA guides the fusion protein to recognize and excise cytosine or thymine on the target sequence, producing a single-base C-to-G or T-to-S conversion. In particular, thymine could not be edited directly by previous base editing tools;
(2) The base editing system achieves single-base conversions (C-to-G, C-to-A, T-to-G and T-to-C) in the genomes of the immortalized HeLa cell line and Escherichia coli, greatly enriching the variety of editing products obtainable in a specific region of a gene;
(3) The base editing system provided by the invention has higher editing efficiency than the reported base editor GBE, and can generate transversion mutation with higher editing efficiency on the same target site.
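The conversions named above can be sorted into transitions and transversions with a short sketch. It assumes only the standard purine/pyrimidine classification and the IUPAC code S (C or G); the helper names `classify` and `expand` are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
IUPAC = {"S": ("C", "G")}  # IUPAC ambiguity code S = C or G


def classify(ref: str, alt: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine),
    'transversion' (purine<->pyrimidine), or 'none' if nothing changed."""
    if ref == alt:
        return "none"
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def expand(ref: str, alt_code: str):
    """Expand an ambiguity code such as 'S' into concrete substitutions."""
    alts = IUPAC.get(alt_code, (alt_code,))
    return [(ref, alt, classify(ref, alt)) for alt in alts if alt != ref]


for ref, alt in [("C", "G"), ("C", "A"), ("T", "G"), ("T", "C")]:
    print(f"{ref}-to-{alt}: {classify(ref, alt)}")
print(expand("T", "S"))
```

Under this classification C-to-G, C-to-A and T-to-G are transversions, while T-to-C is a transition, so 'T to S' covers one product of each kind.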
Drawings
FIG. 1 shows the structure of the fusion protein CGBE and its principle of action during genome editing, wherein A is a schematic diagram of the structures of some of the CGBE fusion proteins; B is a schematic diagram of the principle by which the fusion protein edits the genome.
FIG. 2 shows the results of CGBE editing at endogenous gene loci of HeLa cells, wherein A-C show the effect of different fusion modes of CDG and the Cas9 protein on editing efficiency: A is a statistical graph of editing by C-CGBE, N-CGBE and CE-CGBE-1 at the Dicer site; B is a statistical graph of the C-to-G product purity of C-CGBE, N-CGBE and CE-CGBE-1 at three sites; C is a statistical graph of the indels generated by C-CGBE, N-CGBE and CE-CGBE-1 at three sites; D-E show the effect of CDG from different species on editing efficiency: D is a statistical graph of the C-to-G product purity of CE-CGBE-1, CE-CGBE-2 and CE-CGBE-3 at three sites; E is a statistical graph of the indels generated by CE-CGBE-1, CE-CGBE-2 and CE-CGBE-3 at three sites; F and G are statistical graphs of editing by CE-spryCGBE at sites with non-NGG PAMs.
FIG. 3 shows the results of CGBE editing at E. coli endogenous gene loci, wherein A-H are statistical graphs of editing by pTac-CE-CGBE at 8 E. coli endogenous gene loci, respectively.
FIG. 4 shows CGBE editing at an endogenous site of human primary T cells, wherein A is a Sanger sequencing chromatogram of CGBE editing at the endogenous VEGFA site of human primary T cells, B is the statistics of the second-generation sequencing results, and C is a statistical graph of the indels generated by CGBE.
FIG. 5 shows the results of comparing CGBE with the existing GBE editors, wherein A is a schematic diagram of the structures of the 4 reported GBEs and CE-CGBE-1; B is a heat map of the editing efficiency of the reported GBEs and CE-CGBE-1 at 8 sites; C is a heat map of the C-to-G product purity of the reported GBEs and CE-CGBE-1 at 8 sites; D is a heat map of the indels generated by the reported GBEs and CE-CGBE-1 at 8 sites; E is a heat map of the C-to-T product purity of the reported GBEs and CE-CGBE-1 at 8 sites.
FIG. 6 is a graph of the detection result of CGBE off-target effect.
FIG. 7 is a schematic diagram of the structure of a fusion protein TSBE and a schematic diagram of the action principle when editing occurs on a genome, wherein A is a schematic diagram of the structure of a part of the fusion protein TSBE; b is a functional principle diagram of editing fusion protein in genome.
FIG. 8 is a graph showing the effect of linker length on TSBE, A is the editing efficiency of TSBE-1, TSBE-2 and TSBE-3 at 2 endogenous sites of Hela cells; b is the purity of the products of TSBE-1, TSBE-2 and TSBE-3 at 2 endogenous sites T to G in HeLa cells.
FIG. 9 is a graph of the detection result of TSBE off-target effect.
FIG. 10 is a schematic diagram of the protein evolution of Artificial Intelligence (AI) assisted TSBE and the results verification, A is a schematic diagram of the protein evolution of Artificial Intelligence (AI) assisted TSBE; b is a comparison of the numbers of mutants screened to be higher in editing efficiency than the wild-type TDG by two different sequences, and C, D is a functional verification result graph of the screened mutants respectively.
FIG. 11 shows the effect on TSBE editing efficiency of inserting the TDG2 mutant TDG2 (R260K) at different positions of SpCas9, wherein A is the editing efficiency of TSBE-2, CE(1010)-TSBE-R260K, CE(1029)-TSBE-R260K and CE(1249)-TSBE-R260K at the endogenous Dicer site of HeLa cells; B is the editing efficiency of TSBE-2, CE(1010)-TSBE-R260K, CE(1029)-TSBE-R260K and CE(1249)-TSBE-R260K at the endogenous VEGFA site of HeLa cells.
FIG. 12 shows the results of TSBE editing in db/db heterozygous mouse embryos, wherein A is the proportion of pathogenic base mutations that TSBE can correct; B is a sequencing chromatogram of TSBE editing in db/db heterozygous mouse embryos; C is a statistical graph of the second-generation sequencing results of TSBE editing in db/db heterozygous mouse embryos. P-value: * p < 0.05, ** p < 0.01, *** p < 0.001.
Detailed Description
The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.
EXAMPLE 1 construction of fusion protein CGBE plasmid
As shown in FIG. 1A, the gene editing tool according to the present invention provides fusion proteins including N-CGBE, C-CGBE, CE-CGBE-1, CE-CGBE-2, CE-CGBE-3, pTac-CE-CGBE and CE-spryCGBE.
The fusion protein N-CGBE is prepared as follows: with pCMV-BE3 (#73021) as the base vector, the human cytosine-N-glycosylase hCDG2 sequence is linked to the N-terminus of SpCas9 (D10A) through the 16-amino-acid linker peptide XTEN, and a nuclear localization signal (NLS) is linked to the C-terminus of SpCas9 (D10A), giving the N-CGBE expression vector. The structure is shown schematically in FIG. 1A, and the amino acid sequence of the fusion protein N-CGBE is SEQ ID NO. 13.
The fusion protein C-CGBE is prepared as follows: with pCMV-BE3 (#73021) as the base vector, the human cytosine-N-glycosylase hCDG2 sequence is linked to the C-terminus of SpCas9 (D10A) through the 16-amino-acid linker peptide XTEN, and a nuclear localization signal (NLS) is linked to the C-terminus of hCDG2, giving the C-CGBE expression vector. The structure is shown schematically in FIG. 1A, and the amino acid sequence of the fusion protein C-CGBE is SEQ ID NO. 14.
The fusion protein CE-CGBE-1 is prepared as follows: with pCMV-BE3 (#73021) as the base vector, the human cytosine-N-glycosylase hCDG2 sequence is inserted into the middle of SpCas9 (D10A), replacing the 16 amino acids at positions 1047-1062, and a nuclear localization signal (NLS) is linked to both the N-terminus and the C-terminus of SpCas9 (D10A), giving the CE-CGBE-1 expression vector. The structure is shown schematically in FIG. 1A, and the amino acid sequence of the fusion protein CE-CGBE-1 is SEQ ID NO. 15.
The preparation method of the fusion protein CE-CGBE-2 comprises the following steps: the preparation method comprises the steps of taking pCMV-BE3 (# 73021) as a basic vector, inserting a cytosine-N-glycosylase eCDG2 sequence derived from escherichia coli into the middle of SpCas9 (D10A), replacing 16 amino acids at 1047-1062 positions of the sequence, connecting a nuclear localization signal NLS at the N end and the C end of the SpCas9 (D10A), and constructing a fusion protein CE-CGBE-2 expression vector, wherein the structural schematic diagram is shown in figure 1A, and the amino acid sequence of the fusion protein CE-CGBE-2 is SEQ ID NO.17.
The preparation method of the fusion protein CE-CGBE-3 comprises the following steps: the preparation method comprises the steps of taking pCMV-BE3 (# 73021) as a basic vector, inserting a cytosine-N-glycosylase cCDG2 sequence derived from nematodes into the middle of SpCas9 (D10A), replacing 16 amino acids at 1047-1062 positions of the sequence, connecting a nuclear localization signal NLS at the N end and the C end of the SpCas9 (D10A), and constructing a fusion protein CE-CGBE-3 expression vector, wherein the structural schematic diagram is shown in figure 1A, and the sequence of the fusion protein CE-CGBE-3 is SEQ ID NO.18.
The preparation method of the fusion protein pTac-CE-CGBE comprises the following steps: the prokaryotic expression vector pTac_ABE_pSC101_Kana is taken as a basic vector, NLS-spCas9n (1-1046) -hCDG2-spCas9n (1063-1367) is used for replacing the AID-spCas9 (dead) sequence in the original vector, a fusion protein pTac-CE-CGBE expression vector is constructed, the structure diagram is shown in figure 1A, the amino acid sequence of the fusion protein pTac-CE-CGBE is SEQ ID NO.19, and the nucleic acid sequence of pTac_ABE_pSC101_Kana is SEQ ID NO.30.
The fusion protein CE-spryCGBE is prepared as follows: with pCMV-BE3 (#73021) as the base vector, the human cytosine-N-glycosylase hCDG2 sequence is inserted into the middle of SpryCas9 (D10A), replacing the single amino acid at position 1010, and a nuclear localization signal (NLS) is linked to both the N-terminus and the C-terminus of SpryCas9 (D10A), giving the CE-spryCGBE expression vector. The structure is shown schematically in FIG. 1A, and the amino acid sequence of the fusion protein CE-spryCGBE is SEQ ID NO. 20.
The principle by which the fusion protein edits the genome is shown in FIG. 1B. CDG2, the core element of the fusion protein, is fused through a linker peptide to SpCas9 (D10A), SpryCas9 (D10A) or a similar protein that retains only single-strand DNA cleavage activity, and the fusion protein forms a complex with the sgRNA in the edited cell. Guided by the sgRNA, the fusion protein precisely recognizes and binds the genomic DNA complementary to the sgRNA sequence, unwinds the double helix, and cuts the DNA strand complementary to the sgRNA to form a nick. Meanwhile, CDG2 binds the single-stranded DNA of the R-loop formed by the unwound genomic DNA and the sgRNA and excises cytosine (C) within the active editing window on that single strand, leaving an apurinic/apyrimidinic site (AP site), which then triggers base excision repair. (1) If the AP site is recognized and cleaved by an AP lyase, or breaks spontaneously, a nick forms on that DNA strand; together with the nick made by nCas9 (D10A) at a nearby position on the complementary strand, it can cause a DNA double-strand break (DSB), which triggers non-homologous end joining (NHEJ) repair and leads to random insertions and deletions (indels) at the break ends. (2) The AP site can alternatively undergo error-prone repair, in which the missing base is restored, with some probability, as any of the four bases (A, T, C or G); because of the 'C rule', the AP site tends to be paired with C, so mutation of the AP site to G occurs with higher probability.
Example 2 editing of fusion protein CGBE at the endogenous site of Hela cells
This example uses N-CGBE, C-CGBE, CE-CGBE-1, CE-CGBE-2, CE-CGBE-3 and CE-spryCGBE for base editing in HeLa cells.
N-CGBE, C-CGBE, CE-CGBE-1, CE-CGBE-2, CE-CGBE-3 or CE-spryCGBE and the sgRNA for a specific site (for the sgRNA sequences, see Koblan, L.W., Arbab, M., Shen, M.W., et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat Biotechnol 39, 1414-1425 (2021), https://doi.org/10.1038/s41587-021-00938-z) were co-transfected at a mass ratio of 2:1 (700 ng : 350 ng) into the HeLa cell line (1 × 10^5 cells per group) using the transfection reagent PEI (2.5 μg). The negative control group (Mock) received only the site-specific sgRNA, with the amount of the single vector matching that used in the experimental groups. Twenty-four hours after transfection, the antibiotics puromycin (1 μg/ml) and blasticidin (20 μg/ml) were added for selection; 5 days after transfection, the cells were collected and lysed to extract the genome. Using the cell lysate as template, the fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by second-generation sequencing.
The frequency of C3 base conversion of N-CGBE, C-CGBE and CE-CGBE-1 at the endogenous site Dicer 1 of Hela cells is shown in FIG. 2A (the abscissa represents the group, and the ordinate represents the efficiency of C conversion to different bases in the second-generation sequencing result), and the experimental results are derived from 3 biological replicates. The results can be seen: N-CGBE, C-CGBE, CE-CGBE-1 produced different degrees of base conversion at the C3 base site of Dicer 1, with the base conversion from C to G being the main, and the editing efficiency (33-38%) of CE-CGBE-1 was much higher than that of N-CGBE (9.8-12%) and C-CGBE (1.1-1.4%).
The base conversion frequency of N-CGBE, C-CGBE and CE-CGBE-1 at 3 endogenous sites of HeLa cells is shown in FIG. 2B (the abscissa represents the group, and the ordinate represents the efficiency of C-to-G base conversion in the second-generation sequencing result), and the frequency of indels generated by N-CGBE, C-CGBE and CE-CGBE-1 at the endogenous sites of HeLa cells is shown in FIG. 2C (the abscissa represents the group, and the ordinate represents the indel frequency). The results show that, among N-CGBE, C-CGBE and CE-CGBE-1, CE-CGBE-1 gave the higher C-to-G editing efficiency (18-30%) and the lower indel frequency (0.84-4.88%), whereas C-CGBE gave the lower C-to-G editing efficiency (0.6-7%) and the higher indel frequency (11.85-67.2%).
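The quantities reported in these results can be computed from per-position read counts, as in the following minimal sketch. The definitions are assumptions based on common base-editing practice and are not stated in the text: editing efficiency is taken as substituted reads over all reads, product purity as reads carrying the target base over all substituted reads, and indel frequency as indel reads over all reads. The read counts are invented for illustration and are not experimental data.

```python
def editing_metrics(base_counts: dict, ref: str, target: str,
                    indel_reads: int = 0) -> dict:
    """Summarise one target position from sequencing read counts.

    base_counts maps each base observed at the position to its read count;
    indel_reads counts reads carrying an insertion or deletion.
    """
    substituted = sum(n for base, n in base_counts.items() if base != ref)
    total = sum(base_counts.values()) + indel_reads
    if total == 0:
        raise ValueError("no reads at this position")
    purity = base_counts.get(target, 0) / substituted if substituted else 0.0
    return {
        "editing_efficiency": substituted / total,
        "product_purity": purity,
        "indel_frequency": indel_reads / total,
    }


# Hypothetical read counts at a target C (illustration only).
counts = {"C": 600, "G": 300, "T": 60, "A": 40}
print(editing_metrics(counts, ref="C", target="G", indel_reads=250))
```

Whether indel-containing reads belong in the denominator of editing efficiency is a design choice; the sketch includes them so that all three values share one denominator.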
The frequency of base conversion by CE-CGBE-1, CE-CGBE-2 and CE-CGBE-3 at 3 endogenous sites of HeLa cells is shown in FIG. 2D, and the frequency of indel generation is shown in FIG. 2E. The results show that CE-CGBE-1 gave a higher C-to-G editing efficiency than CE-CGBE-2 while generating fewer indels, and that CE-CGBE-3 showed essentially no editing. The hCDG sequence of CE-CGBE-1 was therefore used in the subsequent experiments.
The frequency of base conversion by CE-spryCGBE at 2 endogenous sites of HeLa cells is shown in FIGS. 2F and 2G; the abscissa indicates the position of the edited C base in the sgRNA target (the PAM occupies positions 21-23), and the ordinate indicates the efficiency of C conversion to different bases in the second-generation sequencing result. The results show that, at sites with non-NGG PAMs, CE-spryCGBE edited the C base effectively, mainly producing C-to-G mutations (3-21%).
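The position numbering used on the abscissa can be illustrated with a short sketch, assuming a 20-nt protospacer numbered 1-20 from its 5' end so that the PAM falls at positions 21-23. The protospacer below is a hypothetical sequence, not one of the target sites used in the examples, and the helper names are illustrative.

```python
from typing import List


def base_positions(protospacer: str, base: str) -> List[int]:
    """Return the 1-based positions of `base` within the protospacer."""
    return [i for i, b in enumerate(protospacer.upper(), start=1) if b == base]


def apply_edit(protospacer: str, position: int, new_base: str) -> str:
    """Return the protospacer with the base at a 1-based position replaced."""
    if not 1 <= position <= len(protospacer):
        raise ValueError("position outside the protospacer")
    i = position - 1
    return protospacer[:i] + new_base + protospacer[i + 1:]


# Hypothetical 20-nt protospacer (illustration only).
protospacer = "GACCTTGCAATCGGATTACA"
c_sites = base_positions(protospacer, "C")
print(["C%d" % p for p in c_sites])        # candidate C positions
print(apply_edit(protospacer, 3, "G"))     # a C3-to-G product
```

Labels such as C3 or C6 in the results follow this convention; bases upstream of the protospacer would receive negative numbers, which this sketch does not cover.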
EXAMPLE 3 editing of fusion protein pTac-CE-CGBE at E.coli endogenous site
This example uses pTac-CE-CGBE for base editing in E.coli.
8 ng of the site-specific sgRNA (same as above) and 8 ng of pTac-CE-CGBE were added together to 50 μl of BW25113 competent cells. The mixture was kept on ice for 30 min, heat-shocked in a 42 ℃ water bath for 60 s, and immediately cooled on ice for 2-3 min. After 450 μl of LB medium was added, the cells were recovered at 37 ℃ and 220 rpm for 1 h. The culture was centrifuged at 2400 × g for 3 min, 400 μl of the supernatant was discarded, and the cells were resuspended in the remaining medium and plated on LB plates (containing 25 μg/ml chloramphenicol and 50 μg/ml kanamycin sulfate). The following day, single colonies were picked and inoculated into 2 ml of liquid LB medium (containing 25 μg/ml chloramphenicol and 50 μg/ml kanamycin sulfate). After 16 h of incubation, the cells were pelleted by centrifugation at 1000 × g for 10 min. The pellet was resuspended in 100 μl of alkaline lysis solution (25 mM NaOH, 0.2 mM Na2-EDTA, pH 12), heated at 85 ℃ for 30 min, and neutralized by adding 100 μl of neutralization solution (40 mM Tris-HCl, pH 7.5). The supernatant obtained by centrifugation at 1000 × g for 10 min was used as the PCR template. The fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by second-generation sequencing.
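The volume bookkeeping in this protocol can be checked with a small sketch. It computes nominal concentrations after two solutions are mixed, assuming additive volumes and ignoring both the pellet volume and the acid-base reaction between NaOH and Tris-HCl; the function name `mix` is hypothetical.

```python
def mix(solution_a: dict, vol_a: float, solution_b: dict, vol_b: float) -> dict:
    """Nominal concentrations after mixing two solutions.

    Each solution maps a component name to its concentration; the result is
    in the same concentration unit, and both volumes share one unit.
    """
    total = vol_a + vol_b
    names = set(solution_a) | set(solution_b)
    return {
        name: (solution_a.get(name, 0.0) * vol_a
               + solution_b.get(name, 0.0) * vol_b) / total
        for name in names
    }


# Concentrations in mM and volumes in microlitres, as given in the protocol.
lysis = {"NaOH": 25.0, "Na2-EDTA": 0.2}
neutralization = {"Tris-HCl": 40.0}
final = mix(lysis, 100.0, neutralization, 100.0)
for name in sorted(final):
    print(name, round(final[name], 3), "mM")

# Transformation step: 50 ul cells + 450 ul LB, then 400 ul supernatant removed.
print("volume left for plating (ul):", 50 + 450 - 400)
```

Mixing equal volumes halves every nominal concentration, so the PCR template is in roughly 12.5 mM NaOH equivalent, 0.1 mM EDTA and 20 mM Tris before any neutralization chemistry is considered.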
The base conversion frequency of pTac-CE-CGBE at the E.coli endogenous site is shown in FIGS. 3A-3H (the abscissa indicates the group, and the ordinate indicates the efficiency of conversion of C to different bases in the second-generation sequencing result), and it can be seen that: pTac-CE-CGBE effectively generates base conversion at the E.coli endogenous site C base, wherein the base conversion from C to A and C to T is the main, and the total editing efficiency is 1% -80%.
Example 4 editing of fusion protein CGBE at endogenous sites in human primary T cells
Before electroporation, T cells were activated for 2 days with Human T-Activator CD3/CD28 beads (Thermo Fisher Scientific) and cultured in T cell medium (containing 100 U/mL IL-2, 2 mM L-glutamine and 2 vol% human AB serum); the CD3/CD28 beads were removed from the cells 1 day before electroporation. For electroporation with the NEON Transfection System, 3.0 × 10^5 cells per sample were pelleted by centrifugation at 300 × g for 5 minutes and washed once with DPBS. The T cells were then suspended in 10 μl of NEON buffer R, and 600 ng of CE-CGBE-1 mRNA and 600 ng of sgRNA (synthesized by GenScript) were added to the cell suspension. Control wells used NEON buffer R without any RNA. The electroporation parameters on the NEON Transfection System were 160 V, 10 ms, three pulses. After electroporation, the cells were placed in 500 μL of fresh T cell medium in a 48-well plate. The cells were cultured for 3 days after electroporation, genomic DNA was isolated, the fragment containing the target site was amplified by PCR, and Sanger sequencing and second-generation sequencing were performed.
Base conversion by CE-CGBE-1 at the endogenous site of human primary T cells is shown in FIGS. 4A-4C. FIG. 4A shows the Sanger sequencing result: compared with the control group, CE-CGBE-1 produced 30% C-to-G and 20% C-to-T at C6 of the VEGFA site. FIG. 4B shows the second-generation sequencing result: CE-CGBE-1 produced 21% C-to-G, 18% C-to-T and 6% C-to-A base conversion at C6 of the VEGFA site, and 8.8% C-to-T base conversion at C9 of the VEGFA site. FIG. 4C shows the indel statistics: CE-CGBE-1 produced 2.5% indels at the VEGFA site.
Example 5 comparison of CGBE with an existing GBE editor
The reported GBE used in this example: #163543, #140256, #163565 and #163546 were all purchased from addgene, and the structural schematic is shown in FIG. 5A, and CE-CGBE-1 and the reported GBE were base edited in Hela cells.
CE-CGBE-1, #163543, #140256, #163565 or #163546 and the sgRNA for a specific site (same as above) were co-transfected at a mass ratio of 2:1 (700 ng : 350 ng) into the HeLa cell line (1 × 10^5 cells per group) using the transfection reagent PEI (2.5 μg). The negative control group received only the site-specific sgRNA, with the amount of the single vector matching that used in the experimental groups. Twenty-four hours after transfection, the antibiotic puromycin (1 μg/ml) was added for selection; 5 days after transfection, the cells were collected and lysed to extract the genome. Using the cell lysate as template, the fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by second-generation sequencing.
The frequency of base conversion by CE-CGBE-1, #163543, #140256, #163565 and #163546 at 8 endogenous sites of HeLa cells is shown in FIG. 5B, the purity of the C-to-G product in FIG. 5C, the frequency of indel generation in FIG. 5D, and the purity of the C-to-T product in FIG. 5E. Each row of a heat map represents an editing site and each column a group; the darker the cell, the larger the value. The results show that, compared with the reported GBEs (#163543, #140256, #163565 and #163546), the editing efficiency of CE-CGBE-1 was higher at sites Dicer 1, #12 and #18 and comparable at sites VEGFA, #1, #2 and #11; the purity of the C-to-G product generated by CE-CGBE-1 at each site was comparable to that of the reported GBEs; CE-CGBE-1 generated indels at multiple sites more frequently than the reported GBEs; and the purity of the C-to-T product generated by CE-CGBE-1 at sites Dicer 1 and #18 was lower than that of the reported GBEs.
Example 6 detection of off-target Effect of fusion protein CGBE
Cas-OFFinder (CRISPR RGEN Tools, rgenome.net) was used to predict potential off-target sites of the Cas9 RNA-guided endonuclease, and the top 10 predicted off-target sites were selected for validation. HeLa cells were transfected with CE-CGBE-1; the negative control group received only the site-specific sgRNA, with the amount of the single vector matching that used in the experimental group. Twenty-four hours after transfection, the antibiotics puromycin (1 μg/ml) and blasticidin (20 μg/ml) were added for selection; 5 days after transfection, the cells were collected and lysed. Using the cell lysate as template, the 10 potential off-target site sequences for each target site were amplified by PCR, and the overall off-target editing at each site was determined by second-generation sequencing.
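The idea of short-listing candidate off-target sites can be sketched by ranking sequences by their number of mismatches to the guide. This is an illustration only: it is not Cas-OFFinder's algorithm or interface, it ignores the PAM and any bulges, and the guide and candidate sequences are hypothetical.

```python
def mismatches(guide: str, site: str) -> int:
    """Count mismatched positions between a guide and an equal-length site."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def rank_candidates(guide: str, sites, top_n: int = 10):
    """Return up to top_n (mismatch_count, site) pairs, fewest mismatches
    first, skipping any site identical to the guide (the on-target site)."""
    scored = [(mismatches(guide, s), s) for s in sites
              if s.upper() != guide.upper()]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top_n]


# Hypothetical guide and candidate sites (illustration only).
guide = "GACCTTGCAATCGGATTACA"
candidates = [
    "GACCTTGCAATCGGATTACA",  # on-target, excluded from the ranking
    "CACCTTGGAATCGCATTACA",  # 3 mismatches
    "GACCTTGCAATCGGATTACT",  # 1 mismatch
    "GTCCTTGCAATCGGATAACA",  # 2 mismatches
]
for count, site in rank_candidates(guide, candidates, top_n=10):
    print(count, site)
```

Sites with the fewest mismatches are the usual first candidates for amplicon sequencing, which corresponds to selecting the top predicted sites for validation as described above.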
The off-target effects of CE-CGBE-1 at HeLa cell endogenous sites #1, #2 and Dicer are shown in FIG. 6, with the abscissa representing each off-target site and the ordinate representing the percent base editing. The results can be seen: the off-target effect of CE-CGBE-1 at HeLa cell endogenous sites #1, #2 and Dicer was very low, less than 0.25%.
EXAMPLE 7 construction of fusion protein TSBE plasmid
As shown in FIG. 7A, the present embodiment provides fusion proteins CE-TSBE-1, CE-TSBE-2, CE-TSBE-3, CE-TSBE-V206I and CE-TSBE-R260K.
The fusion protein CE-TSBE-1 is prepared as follows: with pCMV-BE3 (#73021) as the base vector, a human-derived TDG2 sequence is inserted into the middle of SpCas9 (D10A), replacing the 16 amino acids at positions 1047-1062, and a nuclear localization signal (NLS) is linked to both the N-terminus and the C-terminus of SpCas9 (D10A), giving the CE-TSBE-1 expression vector. The structure is shown schematically in FIG. 7A, and the amino acid sequence of the fusion protein CE-TSBE-1 is SEQ ID NO. 16.
The preparation method of the fusion protein CE-TSBE-2 comprises the following steps: the CE-CGBE-1 is used as a basic vector, a humanized TDG2 sequence is inserted into the middle of the SpCas9 (D10A) to replace 16 amino acids at 1047-1062 positions, a 5AA linker is connected to the N end and the C end of the TDG2, a nuclear positioning signal NLS is connected to the N end and the C end of the SpCas9 (D10A), and a fusion protein CE-TSBE-2 expression vector is constructed, the structural schematic diagram is shown in figure 7A, and the amino acid sequence of the fusion protein CE-TSBE-2 is SEQ ID NO.21.
The preparation method of the fusion protein CE-TSBE-3 comprises the following steps: the pCMV-BE3 (# 73021) is taken as a basic vector, a humanized TDG2 sequence is inserted into the middle of the SpCas9 (D10A) to replace 16 amino acids at 1047-1062 positions, a 10AA linker is connected to the N end and the C end of the TDG2, a nuclear localization signal NLS is connected to the N end and the C end of the SpCas9 (D10A), a fusion protein CE-TSBE-3 expression vector is constructed, the structural schematic diagram is shown in FIG. 7A, and the amino acid sequence of the fusion protein CE-TSBE-3 is SEQ ID NO.22.
The preparation method of the fusion protein CE-TSBE-V206I comprises the following steps: the pCMV-BE3 (# 73021) is taken as a basic vector, a humanized TDG2 (V206I) sequence is inserted into the middle of the SpCas9 (D10A) to replace 16 amino acids at 1047-1062 positions, a 5AA linker is connected to the N end and the C end of the TDG2, a nuclear localization signal NLS is connected to the N end and the C end of the SpCas9 (D10A), and a fusion protein CE-TSBE-V206I expression vector is constructed, the structural diagram is shown in figure 7A, and the amino acid sequence of the fusion protein CE-TSBE-V206I is SEQ ID NO.23.
The preparation method of the fusion protein CE-TSBE-R260K comprises the following steps: the pCMV-BE3 (# 73021) is used as a basic vector, a humanized TDG2 (R260K) sequence is inserted into the middle of the SpCas9 (D10A) to replace 16 amino acids at 1047-1062 positions, a 5AA linker is connected to the N end and the C end of the TDG2, a nuclear localization signal NLS is connected to the N end and the C end of the SpCas9 (D10A), a fusion protein CE-TSBE-R260K expression vector is constructed, the structural diagram is shown in figure 7A, and the amino acid sequence of the fusion protein CE-TSBE-R260K is SEQ ID NO.24.
The fusion proteins CE(1010)-TSBE-R260K, CE(1029)-TSBE-R260K and CE(1249)-TSBE-R260K are prepared in the same way as above, except that the human-derived TDG2 (R260K) sequence is inserted into the middle of SpCas9 (D10A) in place of the amino acid at position 1010, 1029 or 1249, respectively; their amino acid sequences are shown in SEQ ID NO. 25-27.
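Mutant names such as V206I and R260K follow the usual notation of wild-type residue, position, new residue. A minimal sketch of applying such a substitution to a sequence is given below; it assumes 1-based numbering within the sequence passed in, which for a real TDG2 construct may differ from the numbering of the parent protein, and it uses a short placeholder sequence rather than the sequences of the SEQ ID NOs. The function name is hypothetical.

```python
import re


def apply_point_mutation(seq: str, mutation: str) -> str:
    """Apply a substitution written as <wild type><position><new>, e.g. 'R260K'.

    The wild-type residue is checked against the sequence before the change
    is made, so a numbering mismatch raises an error instead of silently
    editing the wrong residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError("unrecognised mutation: %r" % mutation)
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError("position %d is outside the sequence" % position)
    if seq[position - 1] != wild_type:
        raise ValueError("expected %s at position %d, found %s"
                         % (wild_type, position, seq[position - 1]))
    return seq[:position - 1] + new + seq[position:]


# Hypothetical 5-residue placeholder (NOT a sequence from the SEQ ID NOs).
toy = "MKRVA"
print(apply_point_mutation(toy, "R3K"))
print(apply_point_mutation(toy, "V4I"))
```

The wild-type check is the main design point: it catches an offset between construct numbering and parent-protein numbering before a mutant sequence is generated.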
The principle by which the fusion protein edits the genome is shown in FIG. 7B. TDG2, the core element of the fusion protein, is fused through a linker peptide to SpCas9 (D10A), SpryCas9 (D10A) or a similar protein that retains only single-strand DNA cleavage activity, and the fusion protein forms a complex with the sgRNA in the edited cell. Guided by the sgRNA, the fusion protein precisely recognizes and binds the genomic DNA complementary to the sgRNA sequence, unwinds the double helix, and cuts the DNA strand complementary to the sgRNA to form a nick. Meanwhile, TDG2 binds the single-stranded DNA of the R-loop formed by the unwound genomic DNA and the sgRNA and excises thymine (T) within the active editing window on that single strand, leaving an apurinic/apyrimidinic site (AP site), which then triggers base excision repair. (1) If the AP site is recognized and cleaved by an AP lyase, or breaks spontaneously, a nick forms on that DNA strand; together with the nick made by nCas9 (D10A) at a nearby position on the complementary strand, it can cause a DNA double-strand break (DSB), which triggers non-homologous end joining (NHEJ) repair and leads to random insertions and deletions (indels) at the break ends. (2) The AP site can alternatively undergo error-prone repair, in which the missing base is restored, with some probability, as any of the four bases (A, T, C or G); because of the 'C rule', the AP site tends to be paired with C, so mutation of the AP site to G occurs with higher probability.
Example 8 Editing by fusion protein TSBE at endogenous sites of HeLa cells
This example uses CE-TSBE-1, CE-TSBE-2 and CE-TSBE-3 for base editing in HeLa cells.
CE-TSBE-1, CE-TSBE-2 or CE-TSBE-3, together with the site-specific sgRNA (supra), was co-transfected into the HeLa cell line (1×10⁵ cells/group) at a mass ratio of 2:1 (700 ng:350 ng) using the transfection reagent PEI (2.5 μg). Only the site-specific sgRNA was added to the negative control group, with the amount of each single vector the same as in the experimental group. 24 hours after transfection, the antibiotics puromycin (1 μg/ml) and blasticidin (20 μg/ml) were added for selection; 5 days after transfection, the cells were collected and lysed to extract the genome. Using the cell lysate as a template, the target fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by next-generation sequencing.
The editing efficiency of CE-TSBE-1, CE-TSBE-2 and CE-TSBE-3 at 2 endogenous sites of HeLa cells is shown in FIG. 8A; the abscissa indicates the position of the edited T base in the sgRNA (PAM at positions 21-23), and the ordinate indicates the overall efficiency of T base mutation in the next-generation sequencing results. The purity of the T-to-G mutation products of CE-TSBE-1, CE-TSBE-2 and CE-TSBE-3 at the 2 endogenous sites of HeLa cells is shown in FIG. 8B; the abscissa indicates the position of the edited T base in the sgRNA (PAM at positions 21-23), and the ordinate indicates the purity of the T-to-G mutation products. The experimental results are from 3 biological replicates. The results show that the editing efficiencies produced by CE-TSBE-2 at positions T4, T5, T6 and T8 of Dicer site 1 (T4: 3.8-5.8%; T5: 13-16.8%; T6: 14.2-17.4%; T8: 4.7-5.5%) were all higher than those produced by CE-TSBE-1 (T4: 1.4-2.1%; T5: 4.7-4.9%; T6: 2.2-2.6%; T8: 0.5-1.9%) and by CE-TSBE-3 (T4: 1-1.7%; T5: 4-7.9%; T6: 4.2-5.6%; T8: 2.8-4.7%). The purity of the T-to-G edited products did not differ appreciably among the three fusion proteins. The editing efficiencies produced by CE-TSBE-2 at positions T-2, T3 and T5 of the VEGFA site (T-2: 7-9.4%; T3: 6.1-7.6%; T5: 22.9-28.3%) were all higher than those produced by CE-TSBE-1 (T-2: 5.9%; T3: 3.5-3.7%; T5: 4-4.5%) and by CE-TSBE-3 (T-2: 0.5-1.1%; T3: 2.1-3.4%; T5: 10.3-13.9%). The purity of the T-to-G edited products again did not differ appreciably among the three fusion proteins.
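The two readouts reported above (overall T mutation efficiency and T-to-G product purity at a given protospacer position) can be illustrated with a minimal sketch. It assumes that reads are already aligned and trimmed so that index 0 is protospacer position 1, that efficiency is the fraction of reads whose base differs from T, and that purity is the fraction of edited reads carrying G. These definitions, the function name `editing_stats` and the toy reads are illustrative assumptions, not the analysis pipeline used in this example.

```python
from collections import Counter


def editing_stats(reads, position, ref_base="T", target_base="G"):
    """Return (efficiency, purity) at a 1-based protospacer position.

    efficiency: fraction of reads whose base at `position` is not ref_base
    purity:     fraction of those edited reads that carry target_base
    (both definitions are assumptions made for this sketch)
    """
    counts = Counter(read[position - 1] for read in reads)
    total = sum(counts.values())
    edited = total - counts[ref_base]
    efficiency = edited / total if total else 0.0
    purity = counts[target_base] / edited if edited else 0.0
    return efficiency, purity


# Hypothetical toy reads, shortened to 8 nt for readability.
reads = ["GGCTTTGA"] * 80 + ["GGCTGTGA"] * 15 + ["GGCTCTGA"] * 5
eff, pur = editing_stats(reads, position=5)
print(f"T5 efficiency={eff:.2%}, T-to-G purity={pur:.2%}")
```

In this toy data set 20 of 100 reads are edited at T5 and 15 of those 20 carry G, so the sketch separates how often the T is changed from how clean the T-to-G outcome is.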
Example 9 Detection of off-target effects of fusion protein TSBE
Cas-OFFinder (CRISPR RGEN Tools (rgenome.net)) was used to predict potential off-target sites of Cas9 RNA-guided endonucleases, and the first 10 off-target sites were selected for validation. HeLa cells were transfected with CE-TSBE-2; only the site-specific sgRNA was added to the negative control group, with the amount of each single vector the same as in the experimental group. 24 hours after transfection, the antibiotics puromycin (1 μg/ml) and blasticidin (20 μg/ml) were added for selection; 5 days after transfection, the cells were collected and lysed. Using the cell lysate as a template, the 10 potential off-target site sequences of each target site were amplified by PCR, and the overall off-target editing at each site was determined by next-generation sequencing.
The off-target effects of CE-TSBE-1 at the HeLa cell endogenous sites #1, VEGFA and Dicer are shown in FIG. 9; the abscissa represents each off-target site, and the ordinate represents the percentage of base editing. The results show that the off-target effect of CE-TSBE-1 at the endogenous sites #1, VEGFA and Dicer of HeLa cells is very low and does not differ appreciably from that of the negative control group.
Example 10 Artificial intelligence (AI)-assisted protein evolution and mutation validation of TSBE
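The idea behind off-target prediction can be illustrated with a deliberately naive sketch: scan a sequence for sites that are followed by an NGG PAM and differ from the protospacer by a small number of mismatches. This is not Cas-OFFinder's algorithm or interface; the function names, the forward-strand-only scan, the mismatch cut-off and the toy sequences below are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(genome, protospacer, max_mismatches=3):
    """List candidate off-target sites on the forward strand.

    A candidate is a window of protospacer length followed by an NGG PAM
    with 1..max_mismatches mismatches (the perfect match is the on-target).
    Returns (start, site, pam, mismatches) tuples, fewest mismatches first.
    """
    n = len(protospacer)
    hits = []
    for i in range(len(genome) - n - 2):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        mm = hamming(site, protospacer)
        if 0 < mm <= max_mismatches:
            hits.append((i, site, pam, mm))
    return sorted(hits, key=lambda h: h[3])


# Hypothetical 10-nt "protospacer" and toy "genome" (real guides are 20 nt).
guide = "ACGTACGTAC"
genome = ("TT" + "ACGTACGTAC" + "TGG" + "CC" + "ACGAACGTAC" + "AGG"
          + "TTT" + "ACGTTTTTAC" + "CGG")
for start, site, pam, mm in find_off_targets(genome, guide):
    print(start, site, pam, mm)
```

Ranking candidates by mismatch count mirrors the selection of the first 10 predicted sites for validation, although a real tool also searches the reverse strand and handles bulges and alternative PAMs.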
Because of the broad similarity between human language and protein language, several recent research efforts have applied large language models (LLMs) to protein research, including directed protein evolution, evolutionary dynamics and the evolution of human antibodies. This example uses 9 different ranking strategies to systematically benchmark 17 different pre-trained protein LLMs. The goal is to determine the best combination by in silico evaluation and then to carry out protein evolution of the actual TDG. The LLM-based TSBE protein evolution workflow is shown in FIG. 10A:
Stage 1: Pretraining of large language models (LLMs) was performed using ESM (evolutionary-scale modeling) on multiple datasets, including UniProt, UniRef, UniRef100 and Swiss-Prot. Of the 17 LLM candidates initially selected, esm2_t33_650M_UR50D was finally chosen, based on its stability and high performance, to evaluate mutation effects.
Stage 2: (1) The analysis began by identifying a range of potential mutation sites in the wild-type sequence. These sites were then input into the LLM via a masked language model, and the output of this step was the amino acid distribution for each candidate site. (2) 9 different ranking strategies were used to evaluate the quality of each amino acid distribution. The effectiveness of these ranking strategies was assessed using a Top-N Rate index, which measures the proportion of variants among the top N ranked variants whose fitness is higher than that of the wild type. The wild-type marginal probability strategy was finally chosen because it performed better. (3) Using the data obtained in stage 2, a regression model was fitted. Every possible variant of the selected sites was generated by mutating the given wild-type sequence, and these variants were input into the regression model to obtain their respective ranks. The results of these analyses were then evaluated using the Top-N Rate index and used to refine the regression model.
Stage 3: After the regression model was completed, ranking was carried out both on the full-length TDG sequence and on the semi-non-conserved sequence of the TDG catalytic active region. 38 mutants from the top 50 of the full-length ranking and 65 mutants from the top 100 of the catalytic active region semi-non-conserved sequence ranking were selected (the single amino acid mutations are listed in Table 1) for plasmid construction and functional verification. The mutant plasmids were prepared as follows: with CE-CGBE-2 as the base vector, a humanized TDG2 mutant sequence (containing a single amino acid mutation, the point mutation being introduced with PCR primers) was inserted into the middle of SpCas9 (D10A), replacing the 16 amino acids at positions 1047-1062; a 5AA linker was connected to the N-terminus and the C-terminus of the TDG2 mutant sequence, and a nuclear localization signal (NLS) was connected to the N-terminus and the C-terminus of SpCas9 (D10A), to construct a fusion protein mutant expression vector whose structure is shown schematically in FIG. 7A. For example, among the fusion protein mutants, the amino acid sequence of CE-TSBE-V206I is SEQ ID NO.23 and the amino acid sequence of CE-TSBE-R260K is SEQ ID NO.24.
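The ranking step in Stage 2 can be sketched as follows, assuming that the wild-type marginal strategy scores a substitution as log p(mutant residue) minus log p(wild-type residue) at that site, with both probabilities taken from the language model's output for the wild-type sequence, and that Top-N Rate is the fraction of the N best-scored variants whose measured fitness exceeds that of the wild type. The function names, the 3-residue toy sequence, the probabilities and the fitness values are hypothetical; in practice the per-site distributions would come from a model such as esm2_t33_650M_UR50D.

```python
import math


def wt_marginal_scores(site_probs, wt_seq):
    """Score every substitution at each candidate site.

    site_probs: {1-based position: {amino acid: probability}}, standing in
    for the language-model output on the wild-type sequence.
    score = log p(mutant aa) - log p(wild-type aa)   (assumed definition)
    """
    scores = {}
    for pos, dist in site_probs.items():
        wt_aa = wt_seq[pos - 1]
        for aa, p in dist.items():
            if aa != wt_aa:
                scores[f"{wt_aa}{pos}{aa}"] = math.log(p) - math.log(dist[wt_aa])
    return scores


def top_n_rate(scores, fitness, wt_fitness, n):
    """Fraction of the n best-scored variants that beat the wild type."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    better = sum(fitness[v] > wt_fitness for v in ranked)
    return better / len(ranked)


# Hypothetical toy inputs.
wt_seq = "MRV"
site_probs = {
    2: {"R": 0.5, "K": 0.4, "A": 0.1},
    3: {"V": 0.6, "I": 0.3, "G": 0.1},
}
fitness = {"R2K": 1.5, "V3I": 1.2, "R2A": 0.4, "V3G": 0.2}

scores = wt_marginal_scores(site_probs, wt_seq)
print(sorted(scores, key=scores.get, reverse=True))
print(top_n_rate(scores, fitness, wt_fitness=1.0, n=2))
```

A ranking strategy with a high Top-N Rate concentrates beneficial variants at the top of the list, which is what makes it practical to test only a few dozen mutants experimentally.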
TABLE 1
Variants from full length ranking | Variants from CD domain ranking | Two variants combination |
---|---|---|
V274A,H92A,P43R,I103Q | P165H,P165K,P165S,P165F | V206I/R260K |
P43K,I103K,D183G,H11A | P165V,P165N,P165A,P165Q | V206I/E219D |
H92L,H11P,E182P,P122E | P165R,P165T,V206I,R260K | V206I/G107E |
H11K,H92V,L74A,V185K | R260T,V206C,L233I,R260H | R260K/G107E |
I103A,I37S,I103S,H283P | C132T,C132V,P165L,L201M | L74Q/H92E |
H92Q,I37T,I103R,S55G | P165M,P165E,P165D,P165I | L74Q/I37T |
E219D,G107E,V185Q,I103E | P165G,L201V,P165C,L201C | H92E/I37T |
D63A,G107E,H11S,H283I | L201Y,S270T,S270C,R260Q | G107E/L74Q |
N238S,K297E,H92E,L74Q | R260L,R260N,R260A,R260C | G107E/H92E |
H11R,I103G | R260I,R260E,R260T,R260S | G107E/I37T |
 | R260M,R260D,R260Y,C132S | R260K/L74Q |
 | C132A,C132I,C132L,H154N | R260K/H92E |
 | H154Q,H154M,V206A,V206L | |
 | V206N,V206M,V206G,L233V | |
 | L233C,W245Y,P165Y,L201I | |
 | W245L,S270N,P271L,R260V | |
 | V206S | |
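The mutation labels in Table 1 follow the usual wild-type residue, 1-based position, new residue notation, with "/" joining the two substitutions of a double mutant. A minimal sketch of how such labels could be applied to a sequence in silico, and how pairwise combinations could be enumerated, is given below. The helper names and the 5-residue toy sequence are hypothetical, and the actual mutants in this example were made with PCR primers rather than computationally.

```python
import re
from itertools import combinations

# One substitution: wild-type residue, 1-based position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq, label):
    """Apply a label such as 'R260K' or 'V206I/R260K' to a protein sequence."""
    residues = list(seq)
    for part in label.split("/"):
        match = MUTATION.match(part.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {part!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


def pairwise_labels(single_mutations):
    """Enumerate all two-mutation labels from a list of single mutations."""
    return [f"{a}/{b}" for a, b in combinations(single_mutations, 2)]


# Hypothetical toy sequence; the real TDG2 sequence is not reproduced here.
print(apply_mutations("MKVLR", "V3I/R5K"))
print(pairwise_labels(["R260K", "G107E", "L74Q"]))
```

Checking the wild-type residue before substituting catches numbering mismatches, which are easy to introduce when a domain is embedded inside a larger fusion protein.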
Each TSBE mutant plasmid, together with the site-specific sgRNA (supra), was co-transfected into the HeLa cell line (1×10⁵ cells/group) at a mass ratio of 2:1 (700 ng:350 ng) using the transfection reagent PEI (2.5 μg). TSBE-2 and the site-specific sgRNA were added to the control group, with the amount of each single vector the same as in the experimental group. 24 hours after transfection, the antibiotics puromycin (1 μg/ml) and blasticidin (20 μg/ml) were added for selection; 5 days after transfection, the cells were collected and lysed to extract the genome. Using the cell lysate as a template, the target fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by next-generation sequencing.
FIG. 10B compares the numbers of mutants with higher editing efficiency than wild-type TDG obtained by the two ranking approaches: among the 38 mutants from the top 50 of the TDG full-length sequence ranking, 33 had higher editing efficiency than wild-type TDG, while among the 65 mutants from the top 100 of the TDG catalytic active region semi-non-conserved sequence ranking, 17 had higher editing activity than wild-type TDG. FIG. 10C shows the results of functional verification of the mutants (38+65); the abscissa indicates the total base editing efficiency, and the ordinate indicates the T-to-G editing efficiency. The mutants whose editing activity was 1.5 times that of the wild type include R260K, G107E and L74Q.
The amino acid mutations that improved editing efficiency were combined pairwise (the single amino acid mutations are listed in Table 1), and double amino acid mutants were constructed and functionally verified. The mutant plasmids were prepared as follows: with CE-CGBE-2 as the base vector, a humanized TDG2 mutant sequence (containing two amino acid mutations, the point mutations being introduced with PCR primers) was inserted into the middle of SpCas9 (D10A), replacing the 16 amino acids at positions 1047-1062; a 5AA linker was connected to the N-terminus and the C-terminus of the TDG2 mutant sequence, and a nuclear localization signal (NLS) was connected to the N-terminus and the C-terminus of SpCas9 (D10A), to construct a fusion protein double mutant expression vector. Each TSBE double mutant plasmid, together with the site-specific sgRNA (supra), was co-transfected into the HeLa cell line (1×10⁵ cells/group) at a mass ratio of 2:1 (700 ng:350 ng) using the transfection reagent PEI (2.5 μg). TSBE-2 and the site-specific sgRNA were added to the control group, with the amount of each single vector the same as in the experimental group. 24 hours after transfection, the antibiotics puromycin (1 μg/ml) and blasticidin (20 μg/ml) were added for selection; 5 days after transfection, the cells were collected and lysed to extract the genome. Using the cell lysate as a template, the target fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by next-generation sequencing.
The results of functional verification of the double amino acid mutants are shown in FIG. 10D; the abscissa represents the total base editing efficiency, and the ordinate represents the T-to-G editing efficiency. The double amino acid mutants with further improved editing activity were G107E/R260K, L74Q/G107E, H92E/G107E and I37T/G107E; the editing activity of G107E/R260K was 2.3 times that of the wild type, and the editing activity of L74Q/G107E was 2.6 times that of the wild type.
Example 11 Effect of inserting TDG2 (R260K) at different positions of SpCas9 on TSBE editing efficiency
TSBE-2, CE (1010) -TSBE-R260K, CE (1029) -TSBE-R260K or CE (1249) -TSBE-R260K, together with the site-specific sgRNA (supra), was co-transfected into the HeLa cell line (1×10⁵ cells/group) at a mass ratio of 2:1 (700 ng:350 ng) using the transfection reagent PEI (2.5 μg). Only the site-specific sgRNA was added to the negative control group, with the amount of each single vector the same as in the experimental group. 24 hours after transfection, the antibiotics puromycin (1 μg/ml) and blasticidin (20 μg/ml) were added for selection; 5 days after transfection, the cells were collected and lysed to extract the genome. Using the cell lysate as a template, the target fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by next-generation sequencing.
The base conversion frequencies of TSBE-2, CE (1010) -TSBE-R260K, CE (1029) -TSBE-R260K and CE (1249) -TSBE-R260K at the endogenous site Dicer1 of HeLa cells are shown in FIG. 11A; the abscissa indicates the position of the T base, and the ordinate indicates the percentage of base editing. The results show that CE (1010) -TSBE-R260K has an editing efficiency of 28.8% at T8, much higher than that of TSBE-2 at T8 (8.75%). The editing efficiency of CE (1029) -TSBE-R260K and CE (1249) -TSBE-R260K at T5 is lower than that of TSBE-2, but still higher than the editing efficiency of TSBE-2 at T6; the editing windows of CE (1029) -TSBE-R260K and CE (1249) -TSBE-R260K are narrower, which makes them more suitable for editing T5. The base conversion frequencies of TSBE-2, CE (1010) -TSBE-R260K, CE (1029) -TSBE-R260K and CE (1249) -TSBE-R260K at the endogenous site VEGFA of HeLa cells are shown in FIG. 11B. The editing efficiency of CE (1010) -TSBE-R260K at T-1, T5 and T7 is higher than that of TSBE-2; the editing efficiency of CE (1029) -TSBE-R260K and CE (1249) -TSBE-R260K at T5 is equal to that of TSBE-2, and their editing windows are narrower, which makes them more suitable for editing T5. In summary, inserting TDG2 (R260K) at different positions of SpCas9 affects both the editing efficiency and the editing window of TSBE.
Example 12 editing of fusion protein TSBE in db/db heterozygous mouse embryo
db/db mice and C57BL/6J mice were purchased from Southern Model Organisms and the Laboratory Animal Resource Center of Westlake University, respectively. The mice were kept in a specific-pathogen-free facility with adequate food and drinking water under a 12-hour light/12-hour dark cycle. All animal experiments complied with the regulations formulated by the Hangzhou laboratory animal evaluation and certification society and were approved by the Laboratory Animal Resource Center of Westlake University. db/db homozygous male mice and C57BL/6J female mice were used as embryo donors to obtain db/db heterozygous embryos. Chemically modified sgRNA was synthesized by GenScript, and its sequence is shown in SEQ ID NO.28. Using the CE-TSBE-R260K plasmid as a template and primers carrying a T7 promoter sequence, a T7-CE-TSBE-R260K DNA fragment was obtained by PCR amplification; this DNA fragment, whose sequence is shown in SEQ ID NO.29, was purified by the phenol-chloroform method and used as the template for in vitro transcription. T7-CE-TSBE-R260K mRNA was transcribed using an in vitro RNA transcription kit (mMESSAGE mMACHINE T7 Ultra Kit, Ambion). A solution containing T7-CE-TSBE-R260K mRNA (100 ng·μl⁻¹) and sgRNA (60 ng·μl⁻¹) was injected into the cytoplasm of db/db heterozygous embryos. After 3.5 days of culture, the cells were lysed and the genome was extracted; the target fragment containing the target site was amplified by PCR, and the overall base editing at the site was determined by Sanger sequencing and next-generation sequencing.
FIG. 12A shows that TSBE can theoretically edit and repair 64% of single-base pathogenic mutations. Editing of db/db heterozygous mouse embryos by CE-TSBE-R260K is shown in FIGS. 12B-12C: in db/db heterozygous mouse embryos, which carry the spontaneous G-to-T mutation, the proportion of G at the site is 47.8-49%, whereas in successfully edited TSBE mouse embryos the proportion of G at the site is 60-78%, an increase of 12.2-29 percentage points, with no other by-products.
Claims (10)
1. A fusion protein, characterized in that the fusion protein comprises a nuclease and a uracil-N-glycosylase mutant, wherein the uracil-N-glycosylase mutant is linked to the nuclease, or the uracil-N-glycosylase mutant is inserted into the nuclease, and the nuclease is a D10A-mutated SpCas9 protein or a D10A-mutated SpRY Cas9 protein, or another Cas enzyme with deleted nuclease activity and retained helicase activity; the uracil-N-glycosylase mutant is cytosine-N-glycosylase or thymine-N-glycosylase.
2. The fusion protein of claim 1, wherein the cytosine-N-glycosylase is a human uracil-N-glycosylase mutant hCDG, an Escherichia coli-derived uracil-N-glycosylase mutant eCDG, or a nematode-derived uracil-N-glycosylase mutant cCDG; the thymine-N-glycosylase is a human uracil-N-glycosylase mutant hTDG, an Escherichia coli-derived uracil-N-glycosylase mutant eTDG, or a nematode-derived uracil-N-glycosylase mutant cTDG.
3. The fusion protein of claim 1, wherein the amino acid sequence of hCDG is shown in SEQ ID No. 1; the amino acid sequence of hTDG is shown as SEQ ID NO.2, SEQ ID NO.7 or SEQ ID NO. 8; the amino acid sequence of eCDG is shown as SEQ ID NO. 3; the amino acid sequence of eTDG is shown as SEQ ID NO. 4; the amino acid sequence of cCDG is shown as SEQ ID NO. 5; the amino acid sequence of cTDG is shown in SEQ ID NO. 6.
4. The fusion protein of claim 1, wherein the fusion protein further comprises a nuclear localization signal; the nuclear localization signal is fused to the N-terminus and/or the C-terminus of the fusion protein.
5. The fusion protein of claim 1 or 4, further comprising a linker peptide for linking the nuclease, uracil-N-glycosylase mutants;
and/or a linker peptide for linking the fusion protein and the nuclear localization signal.
6. The fusion protein of any one of claims 1-5, wherein the fusion protein comprises:
N-CGBE with the amino acid sequence shown in SEQ ID NO. 13;
C-CGBE, the amino acid sequence of which is shown in SEQ ID NO. 14;
CE-CGBE-1, the amino acid sequence of which is shown as SEQ ID NO. 15;
CE-CGBE-2, the amino acid sequence of which is shown as SEQ ID NO. 17;
CE-CGBE-3, the amino acid sequence of which is shown as SEQ ID NO. 18;
pTac-CE-CGBE with the amino acid sequence shown in SEQ ID NO. 19;
CE-sprycbe with an amino acid sequence shown as SEQ ID NO. 20;
CE-TSBE-1, the amino acid sequence of which is shown as SEQ ID NO. 16;
CE-TSBE-2, the amino acid sequence of which is shown in SEQ ID NO. 21;
CE-TSBE-3, the amino acid sequence of which is shown as SEQ ID NO. 22;
the amino acid sequence of CE-TSBE-V206I is shown as SEQ ID NO. 23;
CE-TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 24;
CE (1010) -TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 25;
CE (1029) -TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 26;
CE (1249) -TSBE-R260K, the amino acid sequence of which is shown in SEQ ID NO. 27.
7. A nucleic acid molecule comprising a gene encoding the fusion protein of any one of claims 1-6.
8. A base editing system comprising a mutant uracil-N-glycosylase, wherein the base editing system comprises the fusion protein of any of claims 1-6 and an sgRNA.
9. Use of the fusion protein of any one of claims 1-6, the nucleic acid molecule of claim 7 or the base editing system of claim 8 in the preparation of a gene editing product.
10. A gene editing kit comprising the base editing system of claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310733252.4A CN117126827A (en) | 2023-06-20 | 2023-06-20 | Fusion protein, base editing system containing uracil-N-glycosylase mutant mediation and application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310733252.4A CN117126827A (en) | 2023-06-20 | 2023-06-20 | Fusion protein, base editing system containing uracil-N-glycosylase mutant mediation and application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117126827A true CN117126827A (en) | 2023-11-28 |
Family
ID=88859077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310733252.4A Pending CN117126827A (en) | 2023-06-20 | 2023-06-20 | Fusion protein, base editing system containing uracil-N-glycosylase mutant mediation and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117126827A (en) |
-
2023
- 2023-06-20 CN CN202310733252.4A patent/CN117126827A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||