CN117683755A - C-to-G base editing system - Google Patents
- Publication number
- CN117683755A (application number CN202410130316.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y02A50/30: Technologies for adaptation to climate change in human health protection, against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change
Abstract
The invention relates to the field of genetic engineering in biotechnology, in particular to a C-to-G base editing system. The C-to-G base editing system contains the following (A1) or (A2): (A1) a fusion protein of cytosine deaminase CDA1, or cytosine deaminase CDA1 truncated to different lengths, with nCas9 (D10A) and a nuclear localization signal (NLS); (A2) a fusion protein of cytosine deaminase CDA1, or cytosine deaminase CDA1 truncated to different lengths, with nCas9 (D10A), uracil-DNA glycosylase UNG1 and a nuclear localization signal (NLS). The invention shifts the editing window of existing CGBE base editors from positions 5 to 9, counted from the PAM-distal end, to positions 3 to 4. The editing window is thereby narrowed to 1-2 bases, which improves accuracy. In addition, the system has advantages such as low off-target editing and high purity of the editing products, and has broad application prospects.
Description
Technical Field
The invention relates to the field of genetic engineering in biotechnology, in particular to a C-to-G base editing system constructed based on cytosine deaminase CDA1.
Background
Genome editing is a genetic engineering technology for the targeted modification of an organism's genome. The CRISPR/Cas9 system is currently the most commonly used genome editing system: a guide RNA directs it to a target site, where cleavage by the Cas9 protein produces a double-strand break (DSB). The DSB often triggers the non-homologous end joining (NHEJ) repair mechanism, which introduces random base insertions/deletions (indels) at the target site and leads to gene silencing or loss of function. Because the repair outcome is random, it is difficult for the CRISPR/Cas9 system to introduce point mutations precisely. Base editors (BEs) built on CRISPR/Cas9 can convert specific bases at target sites without causing DSBs. The main base editors at present are the cytosine base editor (CBE) and the adenine base editor (ABE), which achieve C-to-T and A-to-G base substitutions, respectively, and are widely applied in the construction of animal models of human disease, clinical trials, plant gene function verification, crop improvement and other fields. In recent years, base editors capable of base transversions, such as CGBE and AYBE, have also appeared. Current CGBE base editors are developed mainly from CBE: the efficiency and purity of C-to-G editing are improved by removing the uracil glycosylase inhibitor (UGI), replacing it with uracil-DNA glycosylase (UNG), or fusing proteins of the base excision repair (BER) pathway. The editing window of current mainstream CGBE tools is limited to positions 5 to 9 from the PAM-distal end and shows a preference for the WC motif (W = A/T), which limits their wide application.
Disclosure of Invention
To address the shortcomings of existing base editing tools, the invention provides a C-to-G base editing system with high accuracy, broad applicability and a low off-target rate.
The technical scheme adopted to solve the technical problem is as follows:
In a first aspect, the invention provides a C-to-G base editing system comprising the following A1) or A2):
A1) a fusion protein of cytosine deaminase PmCDA1 from sea lamprey (Petromyzon marinus) (hereinafter referred to as CDA1), or cytosine deaminase CDA1 truncated to different lengths, with SpCas9 nickase (nCas9 (D10A)) from Streptococcus pyogenes and a nuclear localization signal (NLS); base editing systems containing this fusion protein are collectively referred to as CDA1 miniCGBE;
A2) a fusion protein of cytosine deaminase CDA1 from sea lamprey, or cytosine deaminase CDA1 truncated to different lengths, with SpCas9 nickase (nCas9 (D10A)) from Streptococcus pyogenes, uracil-DNA glycosylase UNG1 from Saccharomyces cerevisiae and a nuclear localization signal (NLS); base editing systems containing this fusion protein are collectively referred to as CDA1 CGBE.
In a specific embodiment, the cytosine deaminase CDA1 in both fusion proteins A1) and A2) of the invention is truncated to different lengths.
In specific embodiments, the fusion protein in the C-to-G base editing system is selected from any one of the following B1) to B4):
B1) a fusion protein in which full-length CDA1 is fused at the C-terminus of nCas9 (D10A) in the A1) or A2) structure; the base editing system containing this fusion protein is called cCDA1-miniCGBE or cCDA1-CGBE, respectively;
B2) a fusion protein in which full-length CDA1 is fused at the N-terminus of nCas9 (D10A) in the A1) or A2) structure; the base editing system containing this fusion protein is called CDA1-miniCGBE or CDA1-CGBE, respectively;
B3) a fusion protein in which CDA1 is truncated from the C-terminus and named CDA1Δ followed by the retained length: CDA1 truncated to 195, 194, 193, 192, 190 and 188 amino acids, named CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188, respectively, is fused in the A1) structure; CDA1 truncated to 195, 194, 193, 192, 190, 188, 182, 176, 167, 161, 158 and 150 amino acids, named CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190, CDA1Δ188, CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158 and CDA1Δ150, respectively, is fused in the A2) structure;
B4) a fusion protein in which CDA1 is truncated from the N-terminus to the 28th amino acid and from the C-terminus to the 161st amino acid, named CDA1ΔN28-161, and fused in the A2) structure.
The cytosine deaminase CDA1 is fused to the N-terminus of nCas9 (D10A), with the exception of cCDA1-miniCGBE and cCDA1-CGBE.
The amino acid sequence of CDA1 comprises the sequence shown in SEQ ID NO. 1.
Preferably, the CDA1Δ195 amino acid sequence comprises the sequence shown in SEQ ID NO. 2.
Preferably, the CDA1Δ194 amino acid sequence comprises the sequence shown in SEQ ID NO. 3.
Preferably, the CDA1Δ193 amino acid sequence comprises the sequence set forth in SEQ ID NO. 4.
Preferably, the CDA1Δ192 amino acid sequence comprises the sequence shown in SEQ ID NO. 5.
Preferably, the CDA1Δ190 amino acid sequence comprises the sequence shown in SEQ ID NO. 6.
Preferably, the CDA1Δ188 amino acid sequence comprises the sequence shown in SEQ ID NO. 7.
Preferably, the CDA1Δ182 amino acid sequence comprises the sequence shown in SEQ ID NO. 8.
Preferably, the CDA1Δ176 amino acid sequence comprises the sequence shown in SEQ ID NO. 9.
Preferably, the CDA1Δ167 amino acid sequence comprises the sequence shown in SEQ ID NO. 10.
Preferably, the CDA1Δ161 amino acid sequence comprises the sequence shown in SEQ ID NO. 11.
Preferably, the CDA1Δ158 amino acid sequence comprises the sequence shown in SEQ ID NO. 12.
Preferably, the CDA1Δ150 amino acid sequence comprises the sequence shown in SEQ ID NO. 13.
Preferably, the CDA1ΔN28-161 amino acid sequence comprises the sequence shown in SEQ ID NO. 14.
SEQ ID NO.1:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMIQVKILHTTKSPAV
SEQ ID NO.2:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMI
SEQ ID NO.3:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIM
SEQ ID NO.4:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSI
SEQ ID NO.5:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELS
SEQ ID NO.6:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSE
SEQ ID NO.7:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRR
SEQ ID NO.8:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLK
SEQ ID NO.9:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRW
SEQ ID NO.10:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSS
SEQ ID NO.11:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRK
SEQ ID NO.12:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQC
SEQ ID NO.13:
MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNV
SEQ ID NO.14:
MSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRK
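The relationship between the listed sequences can be sketched in a few lines of Python. This is an illustrative reading of the sequence listing, not part of the claimed method: it assumes that each C-terminal truncation CDA1Δn is simply the first n residues of SEQ ID NO.1, and that CDA1ΔN28-161 is residues 28 to 161 preceded by an initiating methionine, which is what the listed SEQ ID NO.14 appears to show. The function and variable names are hypothetical.

```python
# Full-length CDA1 as listed under SEQ ID NO.1.
CDA1_FULL = (
    "MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIF"
    "SIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWN"
    "LRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMIQVKILHTTKSPAV"
)

# Retained lengths of the C-terminal truncations named in B3).
C_TRUNCATION_LENGTHS = [195, 194, 193, 192, 190, 188, 182, 176, 167, 161, 158, 150]


def c_truncate(length: int) -> str:
    """Keep the first `length` residues (assumed meaning of CDA1Δ<length>)."""
    return CDA1_FULL[:length]


def n28_to_161() -> str:
    """Residues 28..161 with an added start methionine (assumed from SEQ ID NO.14)."""
    return "M" + CDA1_FULL[27:161]


variants = {f"CDA1Δ{n}": c_truncate(n) for n in C_TRUNCATION_LENGTHS}
variants["CDA1ΔN28-161"] = n28_to_161()

for name, seq in variants.items():
    print(f"{name}: {len(seq)} aa, C-terminal residues ...{seq[-6:]}")
```

Comparing the printed C-terminal residues with SEQ ID NO.2 to SEQ ID NO.14 is a quick consistency check on the listing.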
The yeast derived uracil-DNA glycosylase UNG1 is fused to the C-terminus of nCas9 (D10A).
The UNG1 amino acid sequence comprises a sequence shown in SEQ ID NO. 15.
SEQ ID NO.15:
MWCMRRLPTNSVMTVARKRKQTTIEDFFGTKKSTNEAPNKKGKSGATFMTITNGAAIKTETKAVAKEANTDKYPANSNAKDVYSKNLSSNLRTLLSLELETIDDSWFPHLMDEFKKPYFVKLKQFVTKEQADHTVFPPAKDIYSWTRLTPFNKVKVVIIGQDPYHNFNQAHGLAFSVKPPTPAPPSLKNIYKELKQEYPDFVEDNKVGDLTHWASQGVLLLNTSLTVRAHNANSHSKHGWETFTKRVVQLLIQDREADGKSLVFLLWGNNAIKLVESLLGSTSVGSGSKYPNIMVMKSVHPSPLSASRGFFGTNHFKMINDWLYNTRGEKMIDWSVVPGTSLREVQEANARLESESKDP
The nuclear localization signal NLS is fused to the C-terminus of nCas9 (D10A) or UNG1.
The amino acid sequence of the NLS comprises a sequence shown in SEQ ID NO. 16.
SEQ ID NO.16:
PKKKRKV
In specific embodiments, the base editing system further comprises an sgRNA or an sgRNA expression vector.
The fusion protein is designed to form a complex with the sgRNA, which directs the complex to the target sequence, where base editing is performed.
In a second aspect, the invention also provides the use of the C-to-G base editing system described above for gene editing, for preparing a product for gene editing, or for improving the accuracy of gene editing.
In a third aspect, the invention also provides the fusion protein described above.
In a fourth aspect, the invention also provides a nucleic acid molecule encoding the fusion protein described above.
In a fifth aspect, the invention also provides a biological material associated with the nucleic acid molecule described above, the biological material being any one of the following:
(C1) An expression cassette comprising a nucleic acid molecule as hereinbefore described;
(C2) A recombinant vector comprising a nucleic acid molecule as described above, or a recombinant vector comprising an expression cassette as described in (C1);
(C3) A recombinant microorganism comprising a nucleic acid molecule as described above, or a recombinant microorganism comprising an expression cassette as described in (C1), or a recombinant microorganism comprising a recombinant vector as described in (C2);
(C4) A recombinant host cell comprising a nucleic acid molecule as described hereinbefore, or a recombinant host cell comprising an expression cassette as described under (C1), or a recombinant host cell comprising a recombinant vector as described under (C2).
In a sixth aspect, the invention also provides the use of the fusion protein, the nucleic acid molecule or the biological material described above in any of the following (D1) to (D4):
(D1) Use in gene editing;
(D2) Use in preparing a gene editing system;
(D3) Use in preparing gene editing products;
(D4) Use in improving the accuracy of gene editing.
In a seventh aspect, the invention provides a method of improving the editing accuracy of a C-to-G base editing system, comprising preparing a gene editing vector in which any of the fusion proteins described above is combined with a nucleic acid molecule encoding an sgRNA, and performing editing with the resulting complex.
Advantageous effects
The invention shifts the editing window of existing CGBE base editors from positions 5 to 9, counted from the PAM-distal end, to positions 3 to 4. The editing window is thereby narrowed to 1-2 bases, which improves accuracy. In addition, the system has advantages such as low off-target editing and high product purity, and has broad application prospects.
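The position convention behind this comparison can be illustrated with a short Python sketch. It assumes the protospacer is written 5' to 3' with the PAM at its 3' end, so that position 1 is the base farthest from the PAM. The protospacer below is a hypothetical sequence chosen for illustration, not one of the targets of the invention, and the function name is likewise hypothetical.

```python
def cytosines_in_window(protospacer: str, start: int, end: int) -> list[int]:
    """Return the 1-based positions of C, counted from the PAM-distal (5') end,
    that fall inside the editing window [start, end]."""
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if base == "C" and start <= pos <= end
    ]


# Hypothetical 20-nt protospacer with a run of cytosines at positions 2 to 9.
protospacer = "ACCCCCCCCAGGTTAGATGG"

print("window 5-9:", cytosines_in_window(protospacer, 5, 9))
print("window 3-4:", cytosines_in_window(protospacer, 3, 4))
```

On a cytosine-rich target such as this one, the narrower window exposes two cytosines to editing instead of five, which is the sense in which a smaller window improves accuracy.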
Drawings
FIG. 1 is a schematic representation of the full-length and all truncated forms of cytosine deaminase CDA1;
FIG. 2 is a schematic diagram of the structure of a CDA1 BE3 base editing system expression vector;
FIG. 3 is a schematic diagram of the structure of the CDA1 miniCGBE base editing system expression vector;
FIG. 4 is a schematic diagram of the structure of a CDA1 CGBE base editing system expression vector;
FIG. 5 compares the editing efficiency at positions 1 to 7 from the PAM-distal end (C1 to C7) of the target PolyC-1 for the BE3, miniCGBE and CGBE editors constructed with CDA1 fused at the N-terminus or the C-terminus of nCas9 (D10A);
FIG. 6 shows the purity of the C-to-G product at position C3 of the target PolyC-1 for all editors shown in FIG. 5;
FIG. 7 is a statistical plot of the average C-to-G editing efficiency in three polyC targets of the BE3, miniCGBE and CGBE base editors constructed from the CDA1, CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188 fusion proteins;
FIG. 8 shows the purity of the C-to-G product at position C3 in three polyC targets for the BE3, miniCGBE and CGBE base editors constructed from the CDA1, CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188 fusion proteins;
FIG. 9 is a statistical plot of the average C-to-G editing efficiency in three polyC targets of the BE3 and CGBE base editors constructed from the CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161 fusion proteins;
FIG. 10 shows the purity of the C-to-G product at position C3 in three polyC targets for the BE3 and CGBE base editors constructed from the CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161 fusion proteins;
FIG. 11 shows the off-target analysis of CDA1-CGBE, CDA1Δ194-CGBE, CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE on the target CAN1-5.
Detailed Description
To illustrate the technical solutions and advantages of the invention more specifically, the invention is described below with reference to the accompanying drawings. The embodiments shown are merely preferred embodiments and do not limit the invention, which may be embodied in other forms. Unless otherwise indicated, the experimental procedures in the examples below are conventional and are carried out according to the techniques, conditions and reagents described in the literature in this field, or according to the product specifications. Unless otherwise specified, the instruments, materials and reagents used are commercially available. Unless otherwise specified, all nucleotide sequences of the invention are written from the 5' end to the 3' end, and all amino acid sequences from the N-terminus to the C-terminus.
YPDA medium in the following examples was prepared from 20g/L peptone, 10g/L yeast extract, 20g/L glucose, 0.12g/L adenine hemisulfate and water, and 15g/L agarose was added to the solid medium.
The dropout (defective) medium in the following examples was prepared from 6.7g/L YNB, 20g/L glucose, a suitable amount of SC dropout amino acid mixture lacking uracil and leucine (SC-L-U) and water; 15g/L agarose was added for the solid medium.
The induction medium in the following examples was prepared from 6.7g/L YNB, 20g/L galactose, 10g/L raffinose, a suitable amount of SC dropout amino acid mixture lacking uracil and leucine (SC-L-U) and water.
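For convenience, the three recipes above can be scaled to a working volume with a small helper. This is only a bookkeeping sketch: the concentrations are those stated above, the SC-L-U amino acid mixture is left out because the text gives only "a suitable amount", and the dictionary keys and function name are hypothetical.

```python
# Concentrations in g/L, taken from the medium descriptions above.
MEDIA_G_PER_L = {
    "YPDA": {
        "peptone": 20.0,
        "yeast extract": 10.0,
        "glucose": 20.0,
        "adenine hemisulfate": 0.12,
    },
    "dropout": {"YNB": 6.7, "glucose": 20.0},
    "induction": {"YNB": 6.7, "galactose": 20.0, "raffinose": 10.0},
}
SOLID_AGAROSE_G_PER_L = 15.0  # added for solid YPDA and dropout media


def recipe(medium: str, volume_l: float, solid: bool = False) -> dict[str, float]:
    """Return grams of each component for `volume_l` litres of `medium`."""
    if solid and medium == "induction":
        raise ValueError("the text describes no solid form of the induction medium")
    grams = {name: round(conc * volume_l, 3) for name, conc in MEDIA_G_PER_L[medium].items()}
    if solid:
        grams["agarose"] = round(SOLID_AGAROSE_G_PER_L * volume_l, 3)
    return grams


print(recipe("YPDA", 0.5, solid=True))
print(recipe("induction", 0.25))
```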
The cCDA1-miniCGBE and cCDA1-CGBE base editors described in B1) were constructed using the pJT46_GalL_cCDA1-BE3 (Addgene: #145039) expression vector as the backbone.
All CDA1 miniCGBE and CDA1 CGBE base editors described in B2), B3) and B4) of the invention were constructed using the pJT45_GalL_nCDA1-BE3 (Addgene: #145038) expression vector as the backbone.
All sgRNA expression vectors of the invention were constructed using pJT303_SNR52_sgRNA\uCan13 (Addgene: #145066) as the backbone.
In cCDA1-miniCGBE, cCDA1-CGBE, the 7 CDA1 miniCGBE and the 14 CDA1 CGBE base editors of the invention, one of the CDA1 deaminases of different lengths shown in SEQ ID NO.1 to SEQ ID NO.14 is fused at the N-terminus or the C-terminus of nCas9 (D10A). The full-length form is named CDA1, and the truncated forms are named by the number corresponding to their CDA1 amino acid length, as shown in FIG. 1.
In the backbone of the fusion protein described in B1), UGI is flanked by nCas9 (D10A) and the terminator, with an AscI site and an SphI site at the respective junctions, so that UGI can conveniently be removed, or a UNG1 fragment can be ligated in, between the two restriction sites.
In the backbone of the fusion proteins described in B2), B3) and B4), CDA1 is flanked by the galL promoter and nCas9 (D10A), with a SpeI site and an SbfI site at the respective junctions, which facilitates cloning CDA1 of different lengths between the two restriction sites.
In the backbone of the fusion proteins described in B2), B3) and B4), UGI is flanked by nCas9 (D10A) and the terminator, with an AscI site and an MluI site at the respective junctions, so that UGI can conveniently be removed, or a UNG1 fragment can be ligated in, between the two restriction sites.
All miniCGBE base editors of the invention (PgalL-nCDA1Δ-nCas9 (D10A)-NLS) were constructed using pJT45_GalL_nCDA1-BE3 with UGI deleted as the backbone.
All CGBE base editors of the invention (PgalL-nCDA1Δ-nCas9 (D10A)-UNG1-NLS) were constructed using pJT45_GalL_nCDA1-BE3, with UGI replaced by the UNG1 of SEQ ID NO.15, as the backbone.
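The naming scheme and domain order of the N-terminal editors can be enumerated programmatically, which also confirms the counts of 7 miniCGBE and 14 CGBE editors given above. The sketch below is illustrative only: it covers the editors in which CDA1 is fused at the N-terminus of nCas9 (D10A), omits the promoter and any linkers, and uses hypothetical function names.

```python
# Retained lengths of the C-terminal truncations used in each architecture.
MINI_LENGTHS = [195, 194, 193, 192, 190, 188]
CGBE_LENGTHS = MINI_LENGTHS + [182, 176, 167, 161, 158, 150]


def deaminase_names(lengths, extra=()):
    """Full-length CDA1, the C-terminal truncations, then any extra variants."""
    return ["CDA1"] + [f"CDA1Δ{n}" for n in lengths] + list(extra)


def domain_order(deaminase: str, with_ung1: bool) -> str:
    """N- to C-terminal domain order; linkers and promoter are not modelled."""
    domains = [deaminase, "nCas9(D10A)"]
    if with_ung1:
        domains.append("UNG1")
    domains.append("NLS")
    return "-".join(domains)


mini_editors = {
    f"{d}-miniCGBE": domain_order(d, with_ung1=False)
    for d in deaminase_names(MINI_LENGTHS)
}
cgbe_editors = {
    f"{d}-CGBE": domain_order(d, with_ung1=True)
    for d in deaminase_names(CGBE_LENGTHS, extra=["CDA1ΔN28-161"])
}

print(len(mini_editors), "miniCGBE editors;", len(cgbe_editors), "CGBE editors")
for name, order in cgbe_editors.items():
    print(f"{name}: {order}")
```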
The invention changes the editing window of the existing CGBE base editor from 5 th to 9 th positions of the PAM far end to 3 rd to 4 th positions; the editing window is reduced to 1-2 bases, so that the accuracy is improved; in addition, the method has the advantages of low off-target, high purity of editing products and the like, and has wide application prospect.
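To make the window shift concrete, the following Python sketch lists which cytosines of a protospacer fall inside a given editing window, with position 1 at the PAM-distal end. The protospacer shown is a hypothetical 20-nt sequence chosen for illustration and is not one of the targets in Table 2; the function name is likewise an assumption.

```python
def cytosines_in_window(protospacer: str, window: tuple[int, int]) -> list[int]:
    """Return the 1-based positions (1 = PAM-distal end) of cytosines inside the window."""
    first, last = window
    return [
        position
        for position, base in enumerate(protospacer.upper(), start=1)
        if base == "C" and first <= position <= last
    ]


# Hypothetical protospacer, written 5'->3' with the PAM to its 3' side.
protospacer = "GACCTCACCAGTTGCATGCA"

print(cytosines_in_window(protospacer, (5, 9)))  # window of existing CGBE editors
print(cytosines_in_window(protospacer, (3, 4)))  # window described for the invention
```

For a target that has cytosines in both ranges, the narrower window exposes fewer editable bases, which is the basis of the accuracy claim above.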
Example 1: Design and construction of vectors
The vector construction method comprises the following steps:
C1) Design the primers required for vector construction, as shown in Table 1;
C2) Perform PCR amplification using Phanta Max Super-Fidelity DNA Polymerase with the corresponding primer pairs and DNA templates;
C3) Digest the plasmid vector serving as the backbone with restriction enzymes; reaction conditions: 37 °C for 4 hours;
C4) Check the fragment sizes of the PCR products and the digestion products by agarose gel electrophoresis, and recover them from the gel;
C5) Join the purified linear vector and fragments by seamless cloning using OK Clon DNA ligation kit II (Accurate);
C6) Transform the ligation product into E. coli DH5α competent cells, plate on LB medium containing the selective antibiotic, and pick single colonies for sequencing verification;
C7) Transfer colonies with the correct sequence into 5 mL of liquid LB medium containing the selective antibiotic, and culture with shaking at 37 °C and 225 r/min for 12-18 hours;
C8) Extract the plasmid using the OMEGA plasmid miniprep kit.
Table 1. Primer sequences for vector construction
Construction of the base editor expression vectors:
D1) Using the cCDA1-miniCGBE primer pair in Table 1, PCR amplification is performed with pJT46_GalL_cCDA1-BE3 as the template, and the resulting fragment is ligated into the AscI/SphI-digested pJT46_GalL_cCDA1-BE3 linear vector to obtain the cCDA1-miniCGBE expression vector, as shown in FIG. 3;
D2) Using the 1F/1R and 3F/3R primer pairs of cCDA1-miniCGBE in Table 1, PCR amplification is performed with pJT46_GalL_cCDA1-BE3 as the template to obtain fragment 1 and fragment 3, respectively; using the 2F/2R primer pair of cCDA1-CGBE in Table 1, PCR amplification is performed with the yeast genome as the template to obtain fragment 2 carrying the UNG1 gene; the 3 fragments are ligated into the AscI/SphI-digested pJT46_GalL_cCDA1-BE3 linear vector to obtain the cCDA1-CGBE expression vector, as shown in FIG. 4;
D3) The 14 CDA1 fragments are each amplified using the primers in Table 1 with pJT45_GalL_nCDA1-BE3 as the template; the fragment sizes are shown in FIG. 1. The fragments are ligated into the SpeI/SbfI-digested pJT45_GalL_nCDA1-BE3 linear vector to obtain the CDA1Δ-BE3 expression vectors, as shown in FIG. 2;
D4) Using the miniCGBE series primers in Table 1, the corresponding fragments are amplified with pJT45_GalL_nCDA1-BE3 as the template and ligated into the AscI/MluI-digested linear vectors of D3) to obtain the CDA1Δ-miniCGBE expression vectors, as shown in FIG. 3;
D5) Using the CGBE series primers in Table 1, fragments 1 and 3 are amplified with pJT45_GalL_nCDA1-BE3 as the template, and fragment 2 carrying the UNG1 gene is amplified with the CGBE-2F/CGBE-2R primer pair (from the yeast genome, as in D2)); the fragments are ligated into the AscI/MluI-digested linear vectors of D3) to obtain the CDA1Δ-CGBE expression vectors, as shown in FIG. 4.
Construction of the sgRNA expression vectors: using the target-specific F primer and the universal R primer in Table 1, PCR amplification is performed with pJT303_SNR52_sgRNA_Can1-3 as the template, and the product is ligated into the AatII/KpnI-digested pJT303_SNR52_sgRNA_Can1-3 linear vector to obtain the corresponding sgRNA expression vector. The target sequences are shown in Table 2.
Table 2. sgRNA sequences
Example 2: Transformation and induction of inducible yeast, and high-throughput sequencing analysis
To detect editing by the editors in yeast, in this example the vectors of D1) and D2) above are each co-transformed with the sgRNA vector targeting PolyC-1 into inducible yeast and induced; the vectors of D3), D4) and D5) are each co-transformed with the 3 PolyC sgRNA vectors into inducible yeast and induced. After DNA extraction, the target fragments are subjected to high-throughput sequencing, and the editing efficiency is obtained, analyzed and visualized. The specific procedure is as follows:
Yeast transformation:
E1) Culture Saccharomyces cerevisiae BY4743 on YPDA medium at 28 °C for 2-3 days;
E2) Wash and harvest the yeast cells with sterile ddH2O, add 100 mM LiAc, and incubate at 28 °C for 10 minutes;
E3) Centrifuge for 5 seconds and remove the supernatant; mix the cells in a centrifuge tube with 0.5-1 μg of plasmid DNA, 240 μL of 50% PEG3350, 36 μL of 1 M LiAc, 50 μL of 2 mg/mL salmon sperm DNA and 20 μL of sterile water, and incubate at 42 °C for 1-3 hours;
E4) Centrifuge for 5 seconds, remove the supernatant, and culture on SC-L-U dropout solid medium (yeast synthetic medium lacking uracil and leucine) at 28 °C for 2-3 days;
E5) Pick single colonies, culture with shaking in SC-L-U dropout liquid medium at 28 °C and 225 r/min for 18-20 hours, and confirm positive clones by PCR amplification.
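When several editor/sgRNA combinations are transformed in parallel, the shared components of the mix in step E3 can be pooled. The Python sketch below scales the per-tube volumes stated above; the plasmid DNA is left out because it differs per tube, and the 10% pipetting excess and the function name are assumptions rather than part of the described method.

```python
# Per-transformation volumes in microlitres, taken from the transformation step above.
MIX_UL = {
    "50% PEG3350": 240,
    "1 M LiAc": 36,
    "2 mg/mL salmon sperm DNA": 50,
    "sterile water": 20,
}


def master_mix(n_transformations: int, excess: float = 0.1) -> dict[str, float]:
    """Scale the shared components for n tubes, with an assumed fractional excess."""
    if n_transformations < 1:
        raise ValueError("at least one transformation is required")
    factor = n_transformations * (1 + excess)
    return {component: round(volume * factor, 1) for component, volume in MIX_UL.items()}


print(sum(MIX_UL.values()), "uL of shared mix per tube")
for component, volume in master_mix(12).items():
    print(f"{component}: {volume} uL")
```

Each tube then receives 346 μL of the pooled mix plus its own 0.5-1 μg of plasmid DNA.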
Yeast induction:
F1) Pick 3-5 positive colonies and culture them in 3 mL of SC-L-U dropout liquid medium containing 2% glucose at 28 °C for 18-20 hours;
F2) Take 0.8 mL of culture, centrifuge, discard the supernatant, wash 3 times with sterile water to remove residual glucose, then resuspend in 5 mL of SC-L-U liquid induction medium containing 2% galactose and 1% raffinose, and culture with shaking at 28 °C and 225 r/min for 20 hours;
F3) Take 0.5 mL of culture, centrifuge briefly and discard the supernatant to obtain the induced cells.
Extraction of yeast genomic DNA and acquisition of target gene fragments:
G1) Extract yeast genomic DNA using a yeast genomic DNA extraction kit (Solarbio);
G2) Perform PCR amplification using Phanta Max Super-Fidelity DNA Polymerase with the corresponding primer pairs and yeast genomic DNA as the template; the primer pairs are shown in Table 3;
G3) Purify the PCR products using the Cycle Pure kit (OMEGA), then sequence them.
High-throughput sequencing and analysis:
All yeast DNA described in the above examples is subjected to high-throughput sequencing by the following steps:
H1) As shown in Table 3, PCR amplification is performed with the primer pair for the corresponding target to obtain gene fragments containing the target site, and the products are purified using the Cycle Pure kit (OMEGA);
H2) PCR-free library construction, high-throughput sequencing and data analysis are performed on the purified products (Bokesen organism, Beijing, China); sequencing uses the Illumina NovaSeq 6000 platform;
H3) On average, more than 100,000 reads are obtained per sample. After data filtering, the FASTQ files are analyzed using the script at https://github.com/zfcarpe/Cas9Sequencing;
H4) Vector graphics are drawn using GraphPad Prism 8 and Adobe Illustrator, as shown in FIG. 4-8.
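The per-position editing efficiency obtained from such data can be illustrated with a minimal Python sketch. This is not the Cas9Sequencing script referenced above; it assumes reads that are already filtered, oriented like the reference amplicon, and free of indels, and the reference sequence, read set and function names are hypothetical.

```python
from collections import Counter


def c_to_g_efficiency(reference: str, reads: list[str], proto_start: int,
                      proto_len: int = 20) -> dict[int, float]:
    """Map protospacer position (1 = PAM-distal) to the fraction of reads carrying G
    at a position where the reference has C."""
    usable = [read for read in reads if len(read) == len(reference)]
    efficiency = {}
    for position in range(1, proto_len + 1):
        index = proto_start + position - 1
        if reference[index] != "C":
            continue
        calls = Counter(read[index] for read in usable)
        total = sum(calls.values())
        efficiency[position] = calls["G"] / total if total else 0.0
    return efficiency


def substitute(sequence: str, index: int, base: str) -> str:
    """Return a copy of the sequence with one base replaced (used to build toy reads)."""
    return sequence[:index] + base + sequence[index + 1:]


# Hypothetical amplicon: 2-nt flank, 20-nt protospacer, TGG PAM, 2-nt flank.
reference = "TT" + "GACCTCACCAGTTGCATGCA" + "TGG" + "AA"
c3_index = 2 + 3 - 1  # protospacer position 3 in 0-based amplicon coordinates

# Toy read set: 2 reads with C3-to-G, 1 read with C3-to-T, 7 unedited reads.
reads = ([substitute(reference, c3_index, "G")] * 2
         + [substitute(reference, c3_index, "T")]
         + [reference] * 7)

print(c_to_g_efficiency(reference, reads, proto_start=2))
```

Real data would additionally need alignment and quality filtering, which this sketch leaves out.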
Table 3. Primers for high-throughput amplification
Example 3: editing conditions of a series of CDA1 miniCGBE/CGBE and cCDA1-miniCGBE/CGBE base editors constructed by full-length CDA1
Constructing miniCGBE/CGBE base editors of which the full-length CDA1 is fused at the N end or the C end of nCas9 (D10A), and detecting the editing efficiency (hereinafter referred to as editing efficiency) of C-to-G of the corresponding CBE contrast (BE 3) at the PolyC-1 locus. The results are shown in FIG. 5:
I1) The CBE editors CDA1-BE3 and cCDA1-BE3 both have very low editing efficiency;
I2) With CDA1 miniCGBE/CGBE, the highest editing efficiency rises to about 5%;
I3) With cCDA1-miniCGBE, the editing efficiency rises to 18%, and with cCDA1-CGBE to about 20%.
Further, the purity of the C-to-G editing product (hereinafter, product purity) at position 3 from the PAM-distal end of the target (C3) is calculated, that is, the fraction of C-to-G among all C-to-D edits (D = A/T/G; total C-to-D is taken as 1). As shown in FIG. 6, the product purity of CDA1 miniCGBE/CGBE and cCDA1-miniCGBE/CGBE is improved 18- to 44-fold.
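The product purity defined above is a simple ratio, sketched here in Python. The read counts and the control purity used in the example are hypothetical values chosen only to show the arithmetic; they are not results from FIG. 6, and the function names are assumptions.

```python
def product_purity(c_to_a: float, c_to_t: float, c_to_g: float) -> float:
    """Fraction of C-to-G among all C-to-D (D = A/T/G) edits at one position."""
    total = c_to_a + c_to_t + c_to_g
    if total == 0:
        raise ValueError("no C-to-D edits at this position")
    return c_to_g / total


def fold_improvement(purity_editor: float, purity_control: float) -> float:
    """Ratio of an editor's product purity to that of its BE3 control."""
    return purity_editor / purity_control


# Hypothetical read counts at C3: 5 C-to-A, 15 C-to-T, 80 C-to-G.
purity = product_purity(5, 15, 80)
print(purity)
print(fold_improvement(purity, 0.02))  # against a hypothetical control purity of 0.02
```

Because the denominator is the sum of all three outcomes, the purity always lies between 0 and 1, which matches the convention that total C-to-D is taken as 1.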
Example 4: Editing by BE3/miniCGBE/CGBE base editors constructed with 7 lengths of CDA1
The above example shows that the C-to-T editing window of cCDA1-BE3 is narrower than that of CDA1-BE3 and that the miniCGBE/CGBE editors constructed with cCDA1 are more efficient, which suggests that narrowing the editing window can improve C-to-G editing efficiency. Earlier studies showed that truncating CDA1 and fusing it to the N-terminus of nCas9 (D10A) narrows the CBE editing window. In view of this, CDA1 miniCGBE/CGBE was further engineered into 6 CDA1Δ-miniCGBE and 13 CDA1Δ-CGBE editors.
The BE3/miniCGBE/CGBE vectors constructed with CDA1, CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188 are each transformed into yeast together with the sgRNA vectors of the 3 PolyC targets, followed by high-throughput sequencing and data analysis. The results are shown in FIG. 7:
J1) The editing window of the base editors constructed with CDA1 is mainly at positions C3-C4;
J2) The C-to-G editing efficiency of the miniCGBE and CGBE constructed with full-length CDA1 is very low at all three target sites, whereas that of the miniCGBE and CGBE constructed with truncated CDA1 is improved;
J3) For CDA1 of the same length, miniCGBE and CGBE have similar C-to-G editing efficiency, but CGBE is more precise and mainly edits position C3; among these, CDA1Δ195 and CDA1Δ194 have the higher editing efficiency.
Further, the product purity of these editors at the 3 targets is calculated, as shown in FIG. 8:
K1) The product purity of BE3 with all 7 CDA1 lengths is very low, and it is greatly improved with miniCGBE and CGBE, by up to 80-fold;
K2) For CDA1 of the same length, the product purity of CGBE is higher than that of miniCGBE;
K3) Of CDA1Δ195 and CDA1Δ194, which have the higher editing efficiency, CDA1Δ194 gives the higher product purity for both miniCGBE and CGBE.
Thus, taking the above results together, the base editor with the best editing outcome is CDA1Δ194-CGBE, which shows both higher C-to-G editing activity and higher product purity.
Example 5: editing situation of BE3/CGBE base editor constructed by 7 lengths CDA1
The BE3/CGBE vector constructed by converting CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150, CDA1ΔN28-161 and the sgRNA vector of 3 polyC targets in yeast respectively were subjected to high-throughput sequencing and data analysis. The results are shown in FIG. 9:
l1) is the same as conclusion I2), compared with BE3, the editing efficiency of CGBE C-to-G is obviously improved;
l2) after shortening the C end of the CDA1 to 150aa, the editing efficiency of the editor is lower than 1%;
l3) truncates each of the N-and C-termini of CDA1 to a certain length at the same time, still maintaining a certain deamination activity.
Further, the product purity of these editors at 3 targets was counted as shown in fig. 10:
m1) compared with BE3, the purity of the CGBE product is greatly improved by 71 times at most;
m2) after C-terminal truncation of CDA1 to 150aa, not only the editing efficiency was greatly reduced, but also the product purity was lower than 0.5.
Example 6: Off-target detection at the CAN1-5 target for CGBE base editors constructed with 6 lengths of CDA1
Off-target detection is carried out mainly by screening positive clones with L-canavanine and then detecting off-target mutations by whole-genome sequencing. As shown in Table 2, an sgRNA expression vector is constructed for the CAN1-5 target. The culture obtained after induced editing is grown on medium containing canavanine; colonies that grow normally on this medium are successfully edited positive clones. The detailed steps are as follows:
N1) As in Example 2, the expression vector containing the editor and the sgRNA vector are transformed into yeast and cultured, while a blank control group without any expression vector is cultured in parallel;
N2) Subsequently, as in Example 2, the cells are transferred to liquid induction medium containing 2% galactose and 1% raffinose and cultured with shaking at 28 °C and 225 r/min for 20 hours;
N3) Dilute the culture 10,000-fold, plate the blank control on YPDA medium and the remaining cultures on SC-Arg solid medium containing 60 μg/mL L-canavanine, and culture at 28 °C for 2-3 days;
N4) Pick colonies from each plate and culture them in YPDA liquid medium with shaking at 28 °C and 225 r/min for 20 hours;
N5) Take 0.5-1 mL of culture and extract yeast genomic DNA using a yeast genomic DNA extraction kit (Solarbio);
N6) Assess the quality of the extracted DNA samples, then perform library construction, whole-genome sequencing and bioinformatics analysis.
The invention examines the off-target effects of the fusion proteins CDA1-CGBE, CDA1Δ194-CGBE, CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE when targeting the CAN1-5 target of the CAN1 gene. As shown in FIG. 11:
O1) CDA1-CGBE produces markedly more SNVs in total across the whole genome than the blank control, whereas the truncated CDA1 editors produce fewer SNVs in total;
O2) CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE produce fewer SNVs in total;
O3) By SNV mutation type, the frequency of C-to-T (G-to-A) mutations with CDA1-CGBE is markedly higher than in the control and is reduced with CDA1Δ194-CGBE, while CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE give results similar to the blank control;
O4) The insertions/deletions (indels) produced by the 5 editors across the whole genome are similar to the blank control.
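Grouping SNVs by mutation type, with strand-equivalent changes such as C-to-T and G-to-A counted together, can be sketched in Python as follows. The convention of reporting every change relative to a C or T reference base is a common one but is an assumption here, as are the function names; the SNV list is a toy example, not data from FIG. 11.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snv_class(ref: str, alt: str) -> str:
    """Name an SNV so that strand-equivalent changes share one label (reference C or T)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-nucleotide variant: {ref}>{alt}")
    if ref in ("G", "A"):
        # Report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}-to-{alt}"


def mutation_spectrum(snvs: list[tuple[str, str]]) -> Counter:
    """Count SNVs per collapsed mutation class."""
    return Counter(snv_class(ref, alt) for ref, alt in snvs)


# Toy (ref, alt) pairs standing in for the calls of one sequenced clone.
toy_calls = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "G")]
print(mutation_spectrum(toy_calls))
```

Comparing the C-to-T count of each editor with that of the blank control then gives the kind of contrast summarized in O3).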
Taken together, these results indicate that shorter CDA1 variants show lower off-target rates and higher safety. Where high safety is required alongside high editing efficiency, CDA1Δ161-CGBE can be selected; where higher editing efficiency is desired, CDA1Δ194-CGBE can be selected.
Summary: the invention provides a series of miniCGBE and CGBE base editors constructed from engineered truncated CDA1. They achieve precise and efficient C-to-G base editing, with the main editing window at the C3 site at the PAM-distal end, a region that existing CGBE base editors cannot edit or edit only with low efficiency. Furthermore, CDA1Δ-CGBE markedly reduces off-target editing across the whole genome. The series of CGBE base editors provided by the invention therefore greatly enriches the existing base editing toolkit, gives researchers more tools to choose from, provides efficient and precise C-to-G editing, and opens new possibilities for gene therapy, disease research and precision breeding in agriculture.
The protection of the present invention is not limited to the above embodiments. Variations and advantages that would occur to one skilled in the art are included in the invention without departing from the spirit and scope of the inventive concept, and the scope of the invention is defined by the appended claims.
Claims (10)
1. A C-to-G base editing system comprising a fusion protein selected from the group consisting of (A1) and (A2) below:
(A1) A fusion protein of cytosine deaminase CDA1, or cytosine deaminase CDA1 of a different truncated length, with nCas9 (D10A) and a nuclear localization signal NLS;
(A2) A fusion protein of cytosine deaminase CDA1, or cytosine deaminase CDA1 of a different truncated length, with nCas9 (D10A), uracil-DNA glycosylase UNG1 and a nuclear localization signal NLS; wherein,
the cytosine deaminases CDA1 of different truncated lengths are named CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190, CDA1Δ188, CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161, respectively:
the amino acid sequence of the CDA1 is shown as SEQ ID NO. 1; the CDA1Δ195 amino acid sequence is shown in SEQ ID NO. 2;
the CDA1Δ194 amino acid sequence is shown in SEQ ID NO. 3; the CDA1Δ193 amino acid sequence is shown in SEQ ID NO. 4;
the CDA1Δ192 amino acid sequence is shown in SEQ ID NO. 5; the CDA1Δ190 amino acid sequence is shown in SEQ ID NO. 6;
the CDA1Δ188 amino acid sequence is shown in SEQ ID NO. 7; the CDA1Δ182 amino acid sequence is shown in SEQ ID NO. 8;
the CDA1Δ176 amino acid sequence is shown in SEQ ID NO. 9; the CDA1Δ167 amino acid sequence is shown as SEQ ID NO. 10;
the CDA1Δ161 amino acid sequence is shown as SEQ ID NO. 11; the CDA1Δ158 amino acid sequence is shown in SEQ ID NO. 12; the CDA1Δ150 amino acid sequence is shown in SEQ ID NO. 13; the CDA1ΔN28-161 amino acid sequence is shown in SEQ ID NO. 14.
2. The C-to-G base editing system of claim 1, wherein the UNG1 is fused to the C-terminus of nCas9 (D10A); the amino acid sequence of UNG1 is shown as SEQ ID NO. 15.
3. The C-to-G base editing system according to claim 1, characterized in that said nuclear localization signal NLS is fused to the C-terminus of nCas9 (D10A) or UNG 1; the amino acid sequence of the NLS is shown as SEQ ID NO. 16.
4. The C-to-G base editing system of claim 1, further comprising an sgRNA or an sgRNA expression vector.
5. Use of the C-to-G base editing system of any one of claims 1-4 for gene editing, for preparing a product for gene editing, or for improving the accuracy of gene editing.
6. The fusion protein of claim 1, wherein the fusion protein is selected from the group consisting of (A1) and (A2) as follows:
(A1) A fusion protein of cytosine deaminase CDA1, or cytosine deaminase CDA1 of a different truncated length, with nCas9 (D10A) and a nuclear localization signal NLS;
(A2) A fusion protein of cytosine deaminase CDA1, or cytosine deaminase CDA1 of a different truncated length, with nCas9 (D10A), uracil-DNA glycosylase UNG1 and a nuclear localization signal NLS;
wherein the cytosine deaminases CDA1 of different truncated lengths are named CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190, CDA1Δ188, CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161, respectively:
the amino acid sequence of the CDA1 is shown as SEQ ID NO. 1; the CDA1Δ195 amino acid sequence is shown in SEQ ID NO. 2;
the CDA1Δ194 amino acid sequence is shown in SEQ ID NO. 3; the CDA1Δ193 amino acid sequence is shown in SEQ ID NO. 4;
the CDA1Δ192 amino acid sequence is shown in SEQ ID NO. 5; the CDA1Δ190 amino acid sequence is shown in SEQ ID NO. 6;
the CDA1Δ188 amino acid sequence is shown in SEQ ID NO. 7; the CDA1Δ182 amino acid sequence is shown in SEQ ID NO. 8;
the CDA1Δ176 amino acid sequence is shown in SEQ ID NO. 9; the CDA1Δ167 amino acid sequence is shown as SEQ ID NO. 10;
the CDA1Δ161 amino acid sequence is shown as SEQ ID NO. 11; the CDA1Δ158 amino acid sequence is shown in SEQ ID NO. 12; the CDA1Δ150 amino acid sequence is shown in SEQ ID NO. 13; the CDA1ΔN28-161 amino acid sequence is shown in SEQ ID NO. 14.
7. A nucleic acid molecule encoding the fusion protein of claim 6.
8. A biological material associated with the nucleic acid molecule of claim 7, said biological material being any one of the following:
(C1) An expression cassette comprising the nucleic acid molecule of claim 7;
(C2) A recombinant vector comprising the nucleic acid molecule of claim 7, or a recombinant vector comprising the expression cassette of (C1);
(C3) A recombinant microorganism comprising the nucleic acid molecule of claim 7, or a recombinant microorganism comprising the expression cassette of (C1), or a recombinant microorganism comprising the recombinant vector of (C2);
(C4) A recombinant host cell comprising the nucleic acid molecule of claim 7, or a recombinant host cell comprising the expression cassette of (C1), or a recombinant host cell comprising the recombinant vector of (C2).
9. Use of the fusion protein of claim 6, the nucleic acid molecule of claim 7, or the biological material of claim 8 in any one of the following (D1) - (D4):
(D1) Use in gene editing;
(D2) Use in preparing a gene editing system;
(D3) Use in preparing a gene editing product;
(D4) Use in improving the accuracy of gene editing.
10. A method for improving the editing accuracy of a C-to-G base editing system, which comprises preparing a gene editing vector by forming a complex of the fusion protein of claim 6 and a nucleic acid molecule encoding sgRNA.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410130316.6A CN117683755A (en) | 2024-01-31 | 2024-01-31 | C-to-G base editing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117683755A true CN117683755A (en) | 2024-03-12 |
Family
ID=90130378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410130316.6A Pending CN117683755A (en) | 2024-01-31 | 2024-01-31 | C-to-G base editing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117683755A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110914310A (en) * | 2017-03-10 | 2020-03-24 | President and Fellows of Harvard College | Cytosine to guanine base editor |
CN113151229A (en) * | 2020-01-22 | 2021-07-23 | Institute of Genetics and Developmental Biology, Chinese Academy of Sciences | Cytosine deaminase and cytosine editor comprising the same |
CN113201517A (en) * | 2021-05-12 | 2021-08-03 | Guangzhou University | Cytosine single base editor tool and application thereof |
CN116135974A (en) * | 2021-11-17 | 2023-05-19 | Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences | Recombinant glycosylase base editing system and application thereof |
Non-Patent Citations (3)
Title |
---|
ABDULLAH et al.: "CRISPR base editing and prime editing: DSB and template-free editing systems for bacteria and plants", SYNTH SYST BIOTECHNOL, vol. 05, no. 04, 2 September 2020 (2020-09-02), pages 277 - 292 * |
LIU Jiahui et al.: "Research progress of single-base gene editing systems", World Sci-Tech R&D, vol. 39, no. 06, 15 September 2017 (2017-09-15), pages 457 - 462 * |
Anonymous: "Accession.NP_013691.1, uracil-DNA glycosylase [Saccharomyces cerevisiae S288C]", GENBANK, 26 January 2024 (2024-01-26) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||