CN117683755A

CN117683755A - C-to-G base editing system

Info

Publication number: CN117683755A
Application number: CN202410130316.6A
Authority: CN
Inventors: Tan Junjie (谭俊杰); Li Zheng (李铮); Zhao Wei (赵薇); Li Shaokang (李绍康)
Original assignee: Sanya Research Institute Of Nanjing Agricultural University; Nanjing Agricultural University
Current assignee: Sanya Research Institute Of Nanjing Agricultural University; Nanjing Agricultural University
Priority date: 2024-01-31
Filing date: 2024-01-31
Publication date: 2024-03-12

Abstract

The invention relates to the field of genetic engineering in biotechnology, in particular to a C-to-G base editing system. The C-to-G base editing system comprises the following (A1) or (A2): (A1) a fusion protein of cytosine deaminase CDA1 (full-length or truncated to different lengths), nCas9 (D10A) and a nuclear localization signal (NLS); (A2) a fusion protein of cytosine deaminase CDA1 (full-length or truncated to different lengths), nCas9 (D10A), uracil-DNA glycosylase UNG1 and an NLS. Whereas existing CGBE base editors edit at positions 5-9 counted from the PAM-distal end, the editors of the invention edit mainly at positions 3-4, narrowing the editing window to 1-2 bases and thereby improving precision. The system also shows low off-target activity and high editing-product purity, and has broad application prospects.

Description

C-to-G base editing system

Technical Field

The invention relates to the field of genetic engineering in biotechnology, in particular to a C-to-G base editing system constructed based on cytosine deaminase CDA1.

Background

Genome editing is a genetic engineering technology for the targeted modification of an organism's genome. CRISPR/Cas9 systems are currently the most widely used genome editing systems: a guide RNA directs the Cas9 protein to the target site, where Cas9 cleavage produces a double-strand break (DSB). The DSB is often repaired by non-homologous end joining (NHEJ), which introduces random base insertions/deletions (Indels) at the target site and leads to gene silencing or loss of function. Because the repair outcome is random, the CRISPR/Cas9 system has difficulty introducing precise point mutations. Base editors (BEs) built on CRISPR/Cas9 can convert specific bases at target sites without causing DSBs. The main base editors at present are cytosine base editors (CBEs) and adenine base editors (ABEs), which achieve C-to-T and A-to-G substitutions, respectively, and are widely used in building animal models of human disease, clinical trials, plant gene function studies, crop improvement and other fields. In recent years, base editors capable of base transversions, such as CGBE and AYBE, have also appeared. Current CGBE base editors are developed mainly from CBEs, with C-to-G efficiency and purity improved by removing the uracil glycosylase inhibitor (UGI), replacing it with uracil-DNA glycosylase (UNG), or fusing proteins involved in the base excision repair (BER) pathway. However, the editing window of mainstream CGBE tools is confined to positions 5-9 counted from the PAM-distal end, and these tools prefer a WC motif (W = A/T), which limits their broad application.

Disclosure of Invention

To address the shortcomings of existing base editing tools, the invention provides a C-to-G base editing system with high precision, broad applicability and a low off-target rate.

The technical scheme adopted for solving the technical problems is as follows:

In a first aspect, the invention provides a C-to-G base editing system comprising the following A1) or A2):

A1) a fusion protein of cytosine deaminase PmCDA1 from sea lamprey (Petromyzon marinus) (hereinafter CDA1), full-length or truncated to different lengths, the SpCas9 nickase nCas9 (D10A) from Streptococcus pyogenes, and a nuclear localization signal (NLS); base editing systems containing this fusion protein are collectively referred to as CDA1 miniCGBE;

A2) a fusion protein of cytosine deaminase CDA1 from sea lamprey, full-length or truncated to different lengths, the SpCas9 nickase nCas9 (D10A) from Streptococcus pyogenes, uracil-DNA glycosylase UNG1 from Saccharomyces cerevisiae, and an NLS; base editing systems containing this fusion protein are collectively referred to as CDA1 CGBE.

In a specific embodiment, the cytosine deaminase CDA1 in the fusion proteins of both A1) and A2) is truncated to different lengths.

In specific embodiments, the fusion protein in the C-to-G base editing system is selected from any one of the following B1)-B4):

B1) a fusion protein in which full-length CDA1 is fused to the C-terminus of nCas9 (D10A) in the A1) or A2) architecture; base editing systems containing this fusion protein are called cCDA1-miniCGBE and cCDA1-CGBE, respectively;

B2) a fusion protein in which full-length CDA1 is fused to the N-terminus of nCas9 (D10A) in the A1) or A2) architecture; base editing systems containing this fusion protein are called CDA1-miniCGBE and CDA1-CGBE, respectively;

B3) CDA1 truncated from the C-terminus, denoted CDA1Δn, where n is the number of N-terminal amino acids retained: CDA1 truncated to 195, 194, 193, 192, 190 or 188 amino acids (CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188, respectively) is fused in the A1) architecture, and CDA1 truncated to 195, 194, 193, 192, 190, 188, 182, 176, 167, 161, 158 or 150 amino acids (CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190, CDA1Δ188, CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158 and CDA1Δ150, respectively) is fused in the A2) architecture;

B4) CDA1 truncated at the N-terminus so that it starts at amino acid 28 and at the C-terminus so that it ends at amino acid 161, with an initiator methionine at the N-terminus, designated CDA1ΔN28-161 and fused in the A2) architecture.

In all editors except cCDA1-miniCGBE and cCDA1-CGBE, CDA1 is fused to the N-terminus of nCas9 (D10A).

The amino acid sequence of CDA1 comprises the sequence shown in SEQ ID NO. 1.

Preferably, the CDA1Δ195 amino acid sequence comprises the sequence shown in SEQ ID NO. 2.

Preferably, the CDA1Δ194 amino acid sequence comprises the sequence shown in SEQ ID NO. 3.

Preferably, the CDA1Δ193 amino acid sequence comprises the sequence set forth in SEQ ID NO. 4.

Preferably, the CDA1Δ192 amino acid sequence comprises the sequence shown in SEQ ID NO. 5.

Preferably, the CDA1Δ190 amino acid sequence comprises the sequence shown in SEQ ID NO. 6.

Preferably, the CDA1Δ188 amino acid sequence comprises the sequence shown in SEQ ID NO. 7.

Preferably, the CDA1Δ182 amino acid sequence comprises the sequence shown in SEQ ID NO. 8.

Preferably, the CDA1Δ176 amino acid sequence comprises the sequence shown in SEQ ID NO. 9.

Preferably, the CDA1Δ167 amino acid sequence comprises the sequence shown in SEQ ID NO. 10.

Preferably, the CDA1Δ161 amino acid sequence comprises the sequence shown in SEQ ID NO. 11.

Preferably, the CDA1Δ158 amino acid sequence comprises the sequence shown in SEQ ID NO. 12.

Preferably, the CDA1Δ150 amino acid sequence comprises the sequence shown in SEQ ID NO. 13.

Preferably, the CDA1ΔN28-161 amino acid sequence comprises the sequence shown in SEQ ID NO. 14.

SEQ ID NO.1:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMIQVKILHTTKSPAV

SEQ ID NO.2:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMI

SEQ ID NO.3:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIM

SEQ ID NO.4:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSI

SEQ ID NO.5:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELS

SEQ ID NO.6:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSE

SEQ ID NO.7:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRR

SEQ ID NO.8:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLK

SEQ ID NO.9:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRW

SEQ ID NO.10:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSS

SEQ ID NO.11:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRK

SEQ ID NO.12:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQC

SEQ ID NO.13:

MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNV

SEQ ID NO.14:

MSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWNLRDNGVGLNVMVSEHYQCCRK
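The naming convention in B3) and B4) can be checked against the sequences listed above: each CDA1Δn keeps residues 1 to n of SEQ ID NO.1, and SEQ ID NO.14 consists of residues 28 to 161 preceded by an initiator methionine. The following Python sketch derives all thirteen truncated variants from the full-length sequence under that reading; the helper names are illustrative and are not part of the patent.

```python
# Full-length PmCDA1 as listed in SEQ ID NO.1 (208 aa).
CDA1 = (
    "MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGIHAEIF"
    "SIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQIGLWN"
    "LRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMIQVKILHTTKSPAV"
)

# C-terminal truncation lengths from B3); the number is the count of residues retained.
C_TERMINAL_LENGTHS = [195, 194, 193, 192, 190, 188, 182, 176, 167, 161, 158, 150]


def c_terminal_truncation(full: str, n: int) -> str:
    """CDA1Δn: keep residues 1..n (1-based, inclusive)."""
    return full[:n]


def nc_truncation(full: str, start: int, end: int) -> str:
    """CDA1ΔN<start>-<end>: residues start..end with an initiator Met prepended."""
    return "M" + full[start - 1:end]


variants = {f"CDA1Δ{n}": c_terminal_truncation(CDA1, n) for n in C_TERMINAL_LENGTHS}
variants["CDA1ΔN28-161"] = nc_truncation(CDA1, 28, 161)

for name, seq in variants.items():
    print(f"{name}\t{len(seq)} aa\tends ...{seq[-4:]}")
```

Comparing each derived string with SEQ ID NO.2 to SEQ ID NO.14 is a quick way to catch transcription errors in a sequence listing.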

The yeast-derived uracil-DNA glycosylase UNG1 is fused to the C-terminus of nCas9 (D10A).

The amino acid sequence of UNG1 comprises the sequence shown in SEQ ID NO.15.

SEQ ID NO.15:

MWCMRRLPTNSVMTVARKRKQTTIEDFFGTKKSTNEAPNKKGKSGATFMTITNGAAIKTETKAVAKEANTDKYPANSNAKDVYSKNLSSNLRTLLSLELETIDDSWFPHLMDEFKKPYFVKLKQFVTKEQADHTVFPPAKDIYSWTRLTPFNKVKVVIIGQDPYHNFNQAHGLAFSVKPPTPAPPSLKNIYKELKQEYPDFVEDNKVGDLTHWASQGVLLLNTSLTVRAHNANSHSKHGWETFTKRVVQLLIQDREADGKSLVFLLWGNNAIKLVESLLGSTSVGSGSKYPNIMVMKSVHPSPLSASRGFFGTNHFKMINDWLYNTRGEKMIDWSVVPGTSLREVQEANARLESESKDP

The nuclear localization signal NLS is fused to the C-terminus of nCas9 (D10A) (miniCGBE) or of UNG1 (CGBE).

The amino acid sequence of the NLS comprises the sequence shown in SEQ ID NO.16.

SEQ ID NO.16:

PKKKRKV
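Taken together, the statements above fix the domain order of the N-terminal CDA1 editors: CDA1 (full-length or truncated), then nCas9 (D10A), then UNG1 in the CGBE format only, with the NLS at the C-terminus. The Python sketch below records only that order; linkers and the cCDA1 editors of B1) are deliberately left out because their arrangement is not specified here, and the `Editor` type and `build_editor` helper are hypothetical.

```python
from dataclasses import dataclass

NLS = "PKKKRKV"  # SEQ ID NO.16 (the SV40 large T antigen NLS)


@dataclass(frozen=True)
class Editor:
    name: str
    domains: tuple[str, ...]  # listed from N-terminus to C-terminus


def build_editor(cda1_variant: str, with_ung1: bool) -> Editor:
    """Domain order for N-terminal CDA1 editors: A1) miniCGBE or A2) CGBE."""
    domains = [cda1_variant, "nCas9(D10A)"]  # CDA1 fused to the N-terminus of nCas9(D10A)
    if with_ung1:
        domains.append("UNG1")  # A2): UNG1 fused to the C-terminus of nCas9(D10A)
    domains.append("NLS")  # NLS at the C-terminus of nCas9(D10A) or UNG1
    suffix = "CGBE" if with_ung1 else "miniCGBE"
    return Editor(name=f"{cda1_variant}-{suffix}", domains=tuple(domains))


for variant, ung in [("CDA1Δ194", False), ("CDA1Δ194", True)]:
    editor = build_editor(variant, ung)
    print(editor.name, ":", " - ".join(editor.domains))
```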

In specific embodiments, the base editing system further comprises an sgRNA or an sgRNA expression vector.

The fusion protein forms a complex with the sgRNA, which directs it to the target sequence, where base editing is performed.

In a second aspect, the invention also provides the use of the C-to-G base editing system described above in gene editing, in preparing a gene-editing product, or in improving the precision of gene editing.

In a third aspect, the invention also claims the fusion protein described above.

In a fourth aspect, the invention also provides nucleic acid molecules encoding the fusion proteins described above.

In a fifth aspect, the invention also provides a biological material associated with a nucleic acid molecule as hereinbefore described, said biological material being any one of the following:

(C1) An expression cassette comprising a nucleic acid molecule as hereinbefore described;

(C2) A recombinant vector comprising a nucleic acid molecule as described above, or a recombinant vector comprising an expression cassette as described in (C1);

(C3) A recombinant microorganism comprising a nucleic acid molecule as described above, or a recombinant microorganism comprising an expression cassette as described in (C1), or a recombinant microorganism comprising a recombinant vector as described in (C2);

(C4) A recombinant host cell comprising a nucleic acid molecule as described hereinbefore, or a recombinant host cell comprising an expression cassette as described under (C1), or a recombinant host cell comprising a recombinant vector as described under (C2).

In a sixth aspect, the invention also claims the use of the fusion protein, the nucleic acid molecule or the biological material described above in any of the following (D1)-(D4):

(D1) gene editing;

(D2) preparing a gene editing system;

(D3) preparing a gene editing product;

(D4) improving the precision of gene editing.

In a seventh aspect, the invention provides a method for improving the editing precision of a C-to-G base editing system, comprising preparing gene editing vectors encoding any of the fusion proteins described above and an sgRNA, so that the fusion protein and the sgRNA form a complex that performs the editing.

Advantageous effects

Whereas existing CGBE base editors edit at positions 5-9 counted from the PAM-distal end, the editors of the invention edit mainly at positions 3-4, narrowing the editing window to 1-2 bases and thereby improving precision. The editors also show low off-target activity and high product purity, and have broad application prospects.
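Window positions are counted from the PAM-distal (5') end of the 20-nt protospacer, so C₃ is its third nucleotide, and the SpCas9 NGG PAM lies immediately 3' of position 20. As a rough illustration only, the Python sketch below lists which cytosines of a protospacer fall in the 3-4 window reported here versus the 5-9 window of existing CGBEs; the example protospacer is made up, and actual editing outcomes depend on sequence context and must be measured.

```python
def c_positions(protospacer: str) -> list[int]:
    """1-based positions of C, counted from the PAM-distal (5') end of the protospacer."""
    return [i + 1 for i, base in enumerate(protospacer.upper()) if base == "C"]


def cs_in_window(protospacer: str, start: int, end: int) -> list[int]:
    """C positions falling inside the inclusive window [start, end]."""
    return [p for p in c_positions(protospacer) if start <= p <= end]


spacer = "GACCTCAGCTTGACCATGAA"  # hypothetical 20-nt protospacer; the NGG PAM follows position 20
print("window 3-4:", cs_in_window(spacer, 3, 4))  # [3, 4]
print("window 5-9:", cs_in_window(spacer, 5, 9))  # [6, 9]
```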

Drawings

FIG. 1 is a schematic representation of full-length cytosine deaminase CDA1 and all of its truncated forms;

FIG. 2 is a schematic diagram of the structure of a CDA1 BE3 base editing system expression vector;

FIG. 3 is a schematic diagram of the structure of the CDA1 miniCGBE base editing system expression vector;

FIG. 4 is a schematic diagram of the structure of a CDA1 CGBE base editing system expression vector;

FIG. 5 compares the editing efficiencies at positions 1-7 from the PAM-distal end (C₁-C₇) of the PolyC-1 target for BE3, miniCGBE and CGBE editors constructed with CDA1 fused to the N-terminus or C-terminus of nCas9 (D10A);

FIG. 6 shows the C-to-G product purity at position C₃ of the PolyC-1 target for all editors shown in FIG. 5;

FIG. 7 is a statistical plot of the average C-to-G editing efficiency at three PolyC targets of BE3, miniCGBE and CGBE base editors constructed from CDA1, CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188 fusion proteins;

FIG. 8 shows the C-to-G product purity at position C₃ of three PolyC targets for BE3, miniCGBE and CGBE base editors constructed from CDA1, CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188 fusion proteins;

FIG. 9 is a statistical plot of the average C-to-G editing efficiency at three PolyC targets of BE3 and CGBE base editors constructed from CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161 fusion proteins;

FIG. 10 shows the C-to-G product purity at position C₃ of three PolyC targets for BE3 and CGBE base editors constructed from CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161 fusion proteins;

FIG. 11 shows the whole-genome off-target analysis of CDA1-CGBE, CDA1Δ194-CGBE, CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE at the CAN1-5 target.

Detailed Description

To illustrate the technical solutions and advantages of the invention more specifically, the invention is described below with reference to the accompanying drawings. The embodiments shown are merely preferred embodiments and do not limit the invention in any form, as the invention may be embodied in different forms. Unless otherwise indicated, the experimental procedures in the examples below are conventional and follow the techniques, experimental conditions and reagents described in the literature of this field or the product instructions. Unless otherwise specified, the instruments, materials and reagents below are commercially available. Unless otherwise stated, all nucleotide sequences are written from the 5' end to the 3' end and all amino acid sequences from the N-terminus to the C-terminus.

YPDA medium in the following examples was prepared from 20 g/L peptone, 10 g/L yeast extract, 20 g/L glucose, 0.12 g/L adenine hemisulfate and water; solid medium was supplemented with 15 g/L agarose.

The dropout medium (SC-L-U) in the following examples was prepared from 6.7 g/L YNB, 20 g/L glucose, an appropriate amount of SC dropout amino acid mixture lacking uracil and leucine, and water; solid medium was supplemented with 15 g/L agarose.

The induction medium in the following examples was prepared from 6.7 g/L YNB, 20 g/L galactose, 10 g/L raffinose, an appropriate amount of SC dropout amino acid mixture lacking uracil and leucine (SC-L-U), and water.
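Because the recipes are given per liter, the mass of each component for any batch is concentration × volume; note that 20 g/L galactose and 10 g/L raffinose correspond to the 2% and 1% (w/v) quoted in the induction steps. A minimal Python sketch of the scaling follows; the SC-L-U dropout mixture is omitted because its amount is not specified, and the 0.5 L batch is an arbitrary example.

```python
# Component concentrations in g/L, taken from the recipes above.
RECIPES = {
    "YPDA": {"peptone": 20.0, "yeast extract": 10.0, "glucose": 20.0,
             "adenine hemisulfate": 0.12},
    "SC-L-U": {"YNB": 6.7, "glucose": 20.0},  # plus SC-L-U dropout mix (amount not given)
    "induction": {"YNB": 6.7, "galactose": 20.0, "raffinose": 10.0},  # plus dropout mix
}
AGAROSE_G_PER_L = 15.0  # added to solid media only


def grams_for(medium: str, volume_l: float, solid: bool = False) -> dict[str, float]:
    """Mass (g) of each fixed component for `volume_l` liters of medium."""
    masses = {name: g_per_l * volume_l for name, g_per_l in RECIPES[medium].items()}
    if solid:
        masses["agarose"] = AGAROSE_G_PER_L * volume_l
    return masses


for component, grams in grams_for("YPDA", 0.5, solid=True).items():
    print(f"{component}: {grams:g} g")
```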

The cCDA1-miniCGBE and cCDA1-CGBE base editors described in B1) were constructed using the pJT46_GalL_cCDA1-BE3 expression vector (Addgene #145039) as the backbone.

All CDA1 miniCGBE and CDA1 CGBE base editors described in B2), B3) and B4) were constructed using the pJT45_GalL_nCDA1-BE3 expression vector (Addgene #145038) as the backbone.

All sgRNA expression vectors of the invention were constructed using pJT303_SNR52_sgRNA_Can1-3 (Addgene #145066) as the backbone.

The cCDA1-miniCGBE, cCDA1-CGBE, 7 CDA1 miniCGBE and 14 CDA1 CGBE base editors of the invention each contain one of the CDA1 deaminases of different lengths shown in SEQ ID NO.1-SEQ ID NO.14, fused to the N-terminus or C-terminus of nCas9 (D10A). The full-length deaminase is named CDA1, and each truncated variant is named by numbers corresponding to the CDA1 amino acids it retains, as shown in FIG. 1.

In the B1) fusion protein, UGI is flanked by nCas9 (D10A) and the terminator, with an AscI site and an SphI site at the two junctions, respectively, so that UGI can easily be removed or replaced by a UNG1 fragment inserted between the two sites.

In the B2), B3) and B4) fusion proteins, CDA1 is flanked by the GalL promoter and nCas9 (D10A), with a SpeI site and an SbfI site at the two junctions, respectively, facilitating the cloning of CDA1 of different lengths between the two sites.

In the B2), B3) and B4) fusion proteins, UGI is flanked by nCas9 (D10A) and the terminator, with an AscI site and an MluI site at the two junctions, respectively, so that UGI can easily be removed or replaced by a UNG1 fragment inserted between the two sites.

All miniCGBE base editors of the invention (P_GalL-nCDA1Δ-nCas9 (D10A)-NLS) were constructed using pJT45_GalL_nCDA1-BE3 with UGI deleted as the backbone.

All CGBE base editors (P_GalL-nCDA1Δ-nCas9 (D10A)-UNG1-NLS) were constructed by replacing UGI in pJT45_GalL_nCDA1-BE3 with UNG1 (SEQ ID NO.15).
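Swapping fragments between paired restriction sites only works if each site is unique in the backbone, so it can help to scan the vector sequence before digestion. The Python sketch below uses the standard recognition sequences of AscI, SphI, SpeI, SbfI, MluI, AatII and KpnI (enzyme reference data, not taken from this document); all seven are palindromic, so scanning one strand suffices, and the circular option also catches sites that span the origin of a plasmid. The toy sequence is a placeholder, not a real vector.

```python
# Standard recognition sequences (all palindromic) for the enzymes named in this document.
SITES = {
    "AscI": "GGCGCGCC",
    "SphI": "GCATGC",
    "SpeI": "ACTAGT",
    "SbfI": "CCTGCAGG",
    "MluI": "ACGCGT",
    "AatII": "GACGTC",
    "KpnI": "GGTACC",
}


def find_site(seq: str, site: str, circular: bool = True) -> list[int]:
    """0-based start positions of every match of a palindromic site on the top strand."""
    seq = seq.upper()
    # For a circular plasmid, append the first len(site)-1 bases so origin-spanning sites are found.
    scan = seq + seq[: len(site) - 1] if circular else seq
    hits = []
    pos = scan.find(site)
    while pos != -1:
        hits.append(pos)
        pos = scan.find(site, pos + 1)  # advance by 1 so overlapping matches are kept
    return hits


def report(seq: str, enzymes: list[str]) -> None:
    for name in enzymes:
        hits = find_site(seq, SITES[name])
        status = "unique" if len(hits) == 1 else f"{len(hits)} sites"
        print(f"{name}: {status} {hits}")


toy = "TTTACTAGTTTTCCTGCAGGTTT"  # placeholder, not a real vector sequence
report(toy, ["SpeI", "SbfI", "AscI", "MluI"])
```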


Example 1: Design and construction of vectors

The vectors were constructed as follows:

C1) design the primers required for vector construction, as shown in Table 1;

C2) perform PCR amplification with Phanta Max Super-Fidelity DNA Polymerase, the corresponding primer pairs and DNA templates;

C3) digest the backbone plasmid vector with restriction enzymes at 37 °C for 4 hours;

C4) check the sizes of the PCR products and the digested vector by agarose gel electrophoresis, and recover them by gel extraction;

C5) join the purified linearized vector and fragments by seamless cloning using the OK Clon DNA Ligation Kit II (Accurate);

C6) transform the product into E. coli DH5α competent cells, plate on LB medium containing the selection antibiotic, and pick single colonies for sequencing verification;

C7) inoculate a colony with the correct sequence into 5 mL liquid LB medium containing the antibiotic and culture at 37 °C with shaking at 225 r/min for 12-18 hours;

C8) extract the plasmid using the OMEGA plasmid miniprep kit.
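Seamless (homology-based) cloning of the kind used in step C5) generally relies on primers whose 5' tails overlap the ends of the linearized vector. A minimal Python sketch of that primer layout is shown below, assuming a 15-bp homology arm and a 20-nt annealing region by default; both lengths are illustrative assumptions, and the actual primers are those in Table 1, designed to the kit manual's recommendations. The example uses shortened placeholder sequences for readability.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def seamless_primers(vector_left: str, insert: str, vector_right: str,
                     arm: int = 15, anneal: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5' to 3'.

    vector_left / vector_right: top-strand sequence of the linearized vector
    immediately upstream / downstream of the insertion point (hypothetical inputs).
    """
    forward = vector_left[-arm:] + insert[:anneal]  # vector overlap + insert 5' end
    reverse = revcomp(insert[-anneal:] + vector_right[:arm])  # insert 3' end + vector overlap
    return forward, reverse


fwd, rev = seamless_primers("AAACCC", "ATGGCTTAG", "GGGTTT", arm=3, anneal=4)
print(fwd, rev)  # CCCATGG CCCCTAA
```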

TABLE 1 Amplification primer sequences for vector construction

Construction of base editor expression vectors:

D1) using the cCDA1-miniCGBE primer pairs in Table 1 and pJT46_GalL_cCDA1-BE3 as template, amplify the fragments and ligate them into the pJT46_GalL_cCDA1-BE3 linear vector digested with AscI/SphI to obtain the cCDA1-miniCGBE expression vector, as shown in FIG. 3;

D2) amplify fragment 1 and fragment 3 with the 1F/1R and 3F/3R primer pairs of cCDA1-miniCGBE in Table 1, respectively, using pJT46_GalL_cCDA1-BE3 as template; amplify fragment 2, carrying the UNG1 gene, with the 2F/2R primer pair of cCDA1-CGBE in Table 1 using yeast genomic DNA as template; ligate the three fragments into the pJT46_GalL_cCDA1-BE3 linear vector digested with AscI/SphI to obtain the cCDA1-CGBE expression vector, as shown in FIG. 4;

D3) amplify the 14 CDA1 fragments with the primers in Table 1 using pJT45_GalL_nCDA1-BE3 as template (fragment sizes as shown in FIG. 1), and ligate each into the pJT45_GalL_nCDA1-BE3 linear vector digested with SpeI/SbfI to obtain the CDA1Δ BE3 expression vectors shown in FIG. 2;

D4) using the miniCGBE series primers in Table 1 and pJT45_GalL_nCDA1-BE3 as template, amplify the corresponding fragments and ligate them into the linear vectors from D3) digested with AscI/MluI to obtain the CDA1Δ miniCGBE expression vectors, as shown in FIG. 3;

D5) using the CGBE series primers in Table 1, amplify fragments 1 and 3 with pJT45_GalL_nCDA1-BE3 as template, and amplify fragment 2, carrying the UNG1 gene, with the CGBE-2F/CGBE-2R primer pair; ligate the fragments into the linear vectors from D3) digested with AscI/MluI to obtain the CDA1Δ CGBE expression vectors, as shown in FIG. 4.

Construction of sgRNA expression vectors: using the target-specific F primer and the universal R primer in Table 1, amplify with pJT303_SNR52_sgRNA_Can1-3 as template and ligate the product into the pJT303_SNR52_sgRNA_Can1-3 linear vector digested with AatII/KpnI to obtain the corresponding sgRNA expression vector. The target sequences are shown in Table 2.
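Each sgRNA vector differs only in the 20-nt spacer carried on its target-specific F primer. When choosing a new target for these editors, one way to pre-screen candidates is to list top-strand SpCas9 sites (20 nt followed by an NGG PAM) and report which cytosines fall at positions 3-4 from the PAM-distal end. The Python helper below is a hypothetical sketch of such a screen; it ignores the bottom strand and says nothing about how the targets in Table 2 were actually chosen.

```python
import re
from collections.abc import Iterator

# Lookahead keeps matches overlapping, so adjacent PAMs are all reported.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def candidate_spacers(dna: str, window: tuple[int, int] = (3, 4)
                      ) -> Iterator[tuple[int, str, str, list[int]]]:
    """Yield (start, spacer, pam, C positions inside `window`) for top-strand NGG sites.

    Positions are 1-based from the PAM-distal (5') end of the 20-nt spacer.
    """
    lo, hi = window
    for m in SITE.finditer(dna.upper()):
        spacer, pam = m.group(1), m.group(2)
        in_window = [i + 1 for i, base in enumerate(spacer)
                     if base == "C" and lo <= i + 1 <= hi]
        yield m.start(), spacer, pam, in_window


example = "GACCTCAGCTTGACCATGAA" + "AGG" + "TTTT"  # hypothetical sequence
for start, spacer, pam, cs in candidate_spacers(example):
    print(start, spacer, pam, "Cs in window:", cs)
```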

TABLE 2 sgRNA sequences

Example 2: Transformation and induction of inducible yeast, and high-throughput sequencing analysis

To assess the editing activity of the editors in yeast, the D1) and D2) vectors described above were each co-transformed with the PolyC-1-targeting sgRNA vector into inducible yeast and induced, and the D3), D4) and D5) vectors were each co-transformed with each of the three PolyC sgRNA vectors and induced. After DNA extraction, the target fragments were subjected to high-throughput sequencing to obtain editing efficiencies, which were then analyzed and visualized. The specific procedure is as follows:

Yeast transformation:

E1) culture Saccharomyces cerevisiae BY4743 on YPDA medium at 28 °C for 2-3 days;

E2) harvest the yeast cells, wash them with sterile ddH₂O, add 100 mM LiAc and incubate at 28 °C for 10 minutes;

E3) centrifuge for 5 seconds and remove the supernatant; mix the cells in a centrifuge tube with 0.5-1 µg plasmid DNA, 240 µL 50% PEG3350, 36 µL 1 M LiAc, 50 µL 2 mg/mL salmon sperm DNA and 20 µL sterile water, and incubate at 42 °C for 1-3 hours;

E4) centrifuge for 5 seconds, remove the supernatant, and culture the cells on solid dropout medium SC-L-U (yeast synthetic medium lacking uracil and leucine) at 28 °C for 2-3 days;

E5) pick single colonies, culture them in liquid dropout medium SC-L-U at 28 °C with shaking at 225 r/min for 18-20 hours, and confirm positive transformants by PCR.
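The final concentrations in the step E3) mix depend on the volume of plasmid DNA, which the text gives only as a mass (0.5-1 µg). The Python sketch below applies C_final = C_stock × V_component / V_total, assuming a hypothetical 14 µL of DNA solution, which brings the mix to 360 µL; under that assumption LiAc ends up at 100 mM and PEG3350 at about 33%.

```python
# Step E3) components: (volume in µL, stock concentration); water carries no solute.
MIX = {
    "PEG 3350 (%)": (240.0, 50.0),          # 50% stock
    "LiAc (mM)": (36.0, 1000.0),            # 1 M stock, expressed in mM
    "salmon sperm DNA (mg/mL)": (50.0, 2.0),
    "sterile water": (20.0, 0.0),
}


def total_volume(dna_volume_ul: float) -> float:
    """Total reaction volume (µL) including the plasmid DNA solution."""
    return sum(vol for vol, _ in MIX.values()) + dna_volume_ul


def final_concentrations(dna_volume_ul: float) -> dict[str, float]:
    """C_final = C_stock * V_component / V_total, in the stock's units."""
    total = total_volume(dna_volume_ul)
    return {name: stock * vol / total for name, (vol, stock) in MIX.items() if stock > 0}


DNA_VOLUME_UL = 14.0  # hypothetical: the text gives DNA mass, not volume
print(f"total volume: {total_volume(DNA_VOLUME_UL):g} µL")
for name, conc in final_concentrations(DNA_VOLUME_UL).items():
    print(f"{name}: {conc:.3g}")
```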

Yeast induction:

F1) pick 3-5 positive colonies and culture them in 3 mL liquid dropout medium SC-L-U containing 2% glucose at 28 °C for 18-20 hours;

F2) take 0.8 mL of culture, centrifuge, discard the supernatant, wash the cells three times with sterile water to remove residual glucose, resuspend them in 5 mL liquid SC-L-U induction medium containing 2% galactose and 1% raffinose, and culture at 28 °C with shaking at 225 r/min for 20 hours;

F3) take 0.5 mL of culture, centrifuge briefly and discard the supernatant to collect the induced cells.

Extraction of yeast genomic DNA and amplification of target gene fragments:

G1) extract yeast genomic DNA using a yeast genomic DNA extraction kit (Solarbio);

G2) perform PCR with Phanta Max Super-Fidelity DNA Polymerase, the corresponding primer pair (Table 3) and yeast genomic DNA as template;

G3) purify the PCR products with the Cycle Pure kit (OMEGA) and sequence them.

High-throughput sequencing and analysis:

All yeast DNA samples described in the examples above were subjected to high-throughput sequencing as follows:

H1) amplify gene fragments containing the target sites by PCR with the primer pairs for the corresponding targets shown in Table 3, and purify them with the Cycle Pure kit (OMEGA);

H2) subject the purified products to PCR-free library construction, high-throughput sequencing on the Illumina NovaSeq 6000 platform, and data analysis (Bokesen, Beijing, China);

H3) obtain on average more than 100,000 reads per sample; after data filtering, analyze the FASTQ files with the script at https://github.com/zfcarpe/Cas9Sequencing;

H4) prepare the figures with GraphPad Prism 8 and Adobe Illustrator, as shown in FIGS. 4-8.

TABLE 3 high throughput amplification primers


Example 3: Editing by the CDA1 miniCGBE/CGBE and cCDA1-miniCGBE/CGBE base editors constructed with full-length CDA1

MiniCGBE/CGBE base editors with full-length CDA1 fused to the N-terminus or C-terminus of nCas9 (D10A) were constructed, and their C-to-G editing efficiency (hereinafter editing efficiency) at the PolyC-1 site was measured together with that of the corresponding CBE controls (BE3). The results are shown in FIG. 5:

I1) the CBE editors CDA1-BE3 and cCDA1-BE3 both show very low C-to-G editing efficiency;

I2) the highest editing efficiency of CDA1-miniCGBE/CGBE increased to only about 5%;

I3) the editing efficiency of cCDA1-miniCGBE increased to 18%, and that of cCDA1-CGBE to about 20%.

Further, the purity of the C-to-G editing product (hereinafter product purity) at position 3 from the PAM-distal end (C₃) of the target, i.e., the proportion of C-to-G among all C-to-D edits (D = A/T/G, with all C-to-D edits totaling 1), was calculated, as shown in FIG. 6: compared with the corresponding BE3 editors, the product purity of CDA1-miniCGBE/CGBE and cCDA1-miniCGBE/CGBE increased 18- to 44-fold.
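Both editing efficiency and product purity can be computed from per-position base counts of the aligned reads at a reference C: C-to-G efficiency is the share of reads carrying G, and product purity is the G count divided by the combined A, T and G counts (all C-to-D edits). The Python sketch below uses made-up counts; extracting counts from FASTQ files (done with the Cas9Sequencing script in H3)) is not reproduced, and indel-containing reads are ignored for simplicity.

```python
from collections.abc import Mapping


def c_to_g_efficiency(counts: Mapping[str, int]) -> float:
    """Fraction of reads with G at a reference C position."""
    total = sum(counts.values())
    return counts.get("G", 0) / total if total else 0.0


def product_purity(counts: Mapping[str, int]) -> float:
    """C-to-G share among all C-to-D (D = A/T/G) edits at a reference C position."""
    edited = sum(counts.get(base, 0) for base in "ATG")
    return counts.get("G", 0) / edited if edited else 0.0


# Hypothetical base counts at C3 of one target (not data from this document).
c3 = {"C": 8000, "G": 1500, "T": 400, "A": 100}
print(f"C-to-G efficiency: {c_to_g_efficiency(c3):.1%}")  # 15.0%
print(f"product purity: {product_purity(c3):.2f}")  # 0.75
```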

Example 4: Editing by BE3/miniCGBE/CGBE base editors constructed with CDA1 of seven lengths

The above example shows that cCDA1-BE3 has a narrower C-to-T editing window than CDA1-BE3 and that the miniCGBE/CGBE editors constructed with cCDA1 are more efficient, suggesting that narrowing the editing window can improve C-to-G editing efficiency. Earlier studies showed that truncating CDA1 and fusing it to the N-terminus of nCas9 (D10A) narrows the CBE editing window; on this basis, CDA1 miniCGBE/CGBE was further engineered into 6 CDA1Δ miniCGBE and 13 CDA1Δ CGBE editors.

BE3/miniCGBE/CGBE vectors constructed with CDA1, CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190 and CDA1Δ188 were each co-transformed into yeast with the sgRNA vectors for the three PolyC targets, followed by high-throughput sequencing and data analysis. The results are shown in FIG. 7:

J1) the editing window of the base editors constructed with CDA1 lies mainly at positions C₃-C₄;

J2) miniCGBE and CGBE constructed with full-length CDA1 showed very low C-to-G editing efficiency at all three target sites, whereas those constructed with truncated CDA1 showed improved C-to-G editing efficiency;

J3) for CDA1 of the same length, miniCGBE and CGBE had similar C-to-G editing efficiency, but CGBE was more precise, editing mainly at position C₃; CDA1Δ195 and CDA1Δ194 gave the highest editing efficiency.

Further, the product purity of these editors at the three targets was calculated, as shown in FIG. 8:

K1) the product purity of BE3 was very low for all seven CDA1 lengths, whereas miniCGBE and CGBE greatly improved it, by up to 80-fold;

K2) for CDA1 of the same length, CGBE gave higher product purity than miniCGBE;

K3) of the two most efficient variants, CDA1Δ195 and CDA1Δ194, CDA1Δ194 gave higher product purity in both the miniCGBE and CGBE formats.

Taken together, the base editor with the best performance is CDA1Δ194-CGBE, which combines high C-to-G editing activity with high product purity.

Example 5: Editing by BE3/CGBE base editors constructed with CDA1 of seven further lengths

BE3/CGBE vectors constructed with CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161 were each co-transformed into yeast with the sgRNA vectors for the three PolyC targets, followed by high-throughput sequencing and data analysis. The results are shown in FIG. 9:

L1) consistent with conclusion I2), CGBE showed markedly higher C-to-G editing efficiency than BE3;

L2) when CDA1 was truncated at the C-terminus to 150 aa, editing efficiency fell below 1%;

L3) CDA1 truncated at both the N- and C-termini (CDA1ΔN28-161) still retained some deaminase activity.

Further, the product purity of these editors at the three targets was calculated, as shown in FIG. 10:

M1) compared with BE3, CGBE greatly improved product purity, by up to 71-fold;

M2) truncating CDA1 at the C-terminus to 150 aa not only greatly reduced editing efficiency but also lowered product purity to below 0.5.

Example 6: CGBE base editor constructed by CDA1 with 6 lengthsCAN1Off-target detection of-5 target

The off-target detection is mainly carried out by screening positive bacteria through L-canavanine, carrying out whole genome sequencing and detecting off-target; as shown in Table 2, toCAN1-5 target construction of sgRNA expression vectors; culturing the bacteria liquid after induction editing by using a culture medium containing canavanine; the normal growth colony in the culture medium is the successfully edited positive bacterium. The detailed steps are as follows:

n1) As in example 2, the expression vector containing the editor and the sgRNA vector were subjected to yeast transformation and cultivation, while a set of blank controls without any expression vector was cultivated;

n2) subsequently, as in example 2, the cells were transferred to a liquid induction medium containing 2% galactose and 1% raffinose, and shake-cultured at 28℃for 20 hours at 225 r/min;

n3) diluting the bacterial liquid 10,000 times, coating a blank control on YPDA culture medium, coating the rest bacterial liquid on SC-Arg solid culture medium containing 60 mug/mL L-canavanine, and culturing at 28 ℃ for 2-3 days;

n4) colonies were picked from each plate and cultured in YPDA liquid medium at 28 ℃ with shaking at 225 r/min for 20 hours;

n5) 0.5-1 mL of each culture was collected, and yeast genomic DNA was extracted with a yeast genomic DNA extraction kit (Solarbio);

n6) the extracted DNA samples were quality-checked, followed by library construction, whole-genome sequencing and bioinformatic analysis.

Off-target editing was assessed for the fusion proteins CDA1-CGBE, CDA1Δ194-CGBE, CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE at the CAN1-5 target of the CAN1 gene. The results are shown in FIG. 11:

o1) CDA1-CGBE produced significantly more total SNVs across the genome than the blank control, whereas the editors with truncated CDA1 produced fewer SNVs in total;

o2) CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE produced fewer SNVs in total;

o3) when SNVs were classified by mutation type, the frequency of C-to-T (G-to-A) mutations was markedly higher for CDA1-CGBE than for the control, lower for CDA1Δ194-CGBE, and similar to the blank control for CDA1Δ176-CGBE, CDA1Δ161-CGBE and CDA1ΔN28-161-CGBE;

o4) the genome-wide insertions/deletions (Indels) produced by all 5 editors were similar to those of the blank control.
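The classification in o3) pools each substitution with its reverse complement, since C-to-T on one strand appears as G-to-A on the other. The following is a hypothetical sketch of that tally, not the authors' bioinformatics pipeline: it assumes SNV calls from the whole-genome sequencing in n6) are available as (reference base, alternate base) pairs, and the function names and the comparison against the blank control are illustrative choices.

```python
from collections import Counter

# Watson-Crick complements used to fold G/A reference bases onto C/T.
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mutation_class(ref: str, alt: str) -> str:
    """Collapse a substitution so that, e.g., G>A is reported as C>T."""
    if ref in ("G", "A"):
        ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs: list[tuple[str, str]]) -> Counter:
    """Count SNVs in each of the six collapsed classes (C>A ... T>G)."""
    return Counter(mutation_class(ref, alt) for ref, alt in snvs)


def excess_over_blank(sample: Counter, blank: Counter) -> dict[str, int]:
    """Per-class SNV count of an edited clone minus the blank control."""
    classes = set(sample) | set(blank)
    return {c: sample[c] - blank[c] for c in sorted(classes)}


# Hypothetical SNV calls (not data from this invention).
edited = spectrum([("C", "T"), ("G", "A"), ("C", "G"), ("A", "G")])
blank = spectrum([("C", "T"), ("T", "A")])
print(edited)                            # C>T pooled from C>T and G>A
print(excess_over_blank(edited, blank))  # negative values: fewer than blank
```

A real analysis would also need replicate clones and a statistical test before calling a class "significantly higher" than the control, as o1) and o3) do.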

Taken together, these results indicate that editors with shorter CDA1 variants exhibit lower off-target rates and higher safety. Where high safety is required alongside high editing efficiency, CDA1Δ161-CGBE can be selected; where higher editing efficiency is desired, CDA1Δ194-CGBE may be selected.

In summary, the invention provides a series of miniCGBE and CGBE base editors constructed from engineered truncated CDA1 that achieve accurate and efficient C-to-G base editing. Their main editing window is at the C₃ site counted from the PAM-distal end, a region that existing CGBE base editors cannot edit or edit only with low efficiency. Furthermore, the CDA1Δ-CGBE editors significantly reduced genome-wide off-target editing. The CGBE base editor series provided by the invention therefore greatly enriches the existing base editing toolkit, gives researchers more tool choices, provides efficient and accurate C-to-G editing, and opens new possibilities for gene therapy, disease research and precision agricultural breeding.
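To make the window positions concrete: protospacer positions are conventionally numbered 1 to 20 from the PAM-distal (5') end, so position 20 lies next to the PAM. The sketch below lists which cytosines in a 20-nt protospacer fall inside a given window, using positions 3-4 (the window stated for this invention) by default and positions 5-9 (the conventional CGBE window described in the background) for comparison. The function name and the example sequence are hypothetical, and real editing outcomes also depend on sequence context, which this check ignores.

```python
def cs_in_window(protospacer: str, window: tuple[int, int] = (3, 4)) -> list[int]:
    """Return 1-based positions of C inside `window` (inclusive).

    Positions count from the PAM-distal (5') end of a 20-nt protospacer,
    so position 20 is adjacent to the PAM. The PAM itself is not included.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    start, end = window
    seq = protospacer.upper()
    return [i for i in range(start, end + 1) if seq[i - 1] == "C"]


# Hypothetical protospacer (not a target from this invention).
site = "GACCTTGCAAGTCCATGATC"
print(cs_in_window(site))          # cytosines in the 3-4 window
print(cs_in_window(site, (5, 9)))  # cytosines in the conventional 5-9 window
```

For this example sequence, the C at position 3 (and the adjacent C at 4) lies outside the conventional 5-9 window, which only contains the C at position 8.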

The protection of the present invention is not limited to the above embodiments. Variations and advantages that would occur to one skilled in the art are included in the invention without departing from the spirit and scope of the inventive concept, and the scope of the invention is defined by the appended claims.

Claims

1. A C-to-G base editing system comprising a fusion protein selected from the group consisting of (A1) and (A2) below:

(A1) Fusion proteins of cytosine deaminase CDA1 or cytosine deaminase CDA1 of different truncated lengths, nCas9 (D10A) and a nuclear localization signal NLS;

(A2) Fusion proteins of cytosine deaminase CDA1 or cytosine deaminase CDA1 of different truncated lengths, nCas9 (D10A), uracil-DNA glycosylase UNG1 and a nuclear localization signal NLS; wherein,

the cytosine deaminase CDA1 variants of different truncated lengths are designated CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190, CDA1Δ188, CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161, respectively:

the amino acid sequence of the CDA1 is shown as SEQ ID NO. 1; the CDA1Δ195 amino acid sequence is shown in SEQ ID NO. 2;

the CDA1Δ194 amino acid sequence is shown in SEQ ID NO. 3; the CDA1Δ193 amino acid sequence is shown in SEQ ID NO. 4;

the CDA1Δ192 amino acid sequence is shown in SEQ ID NO. 5; the CDA1Δ190 amino acid sequence is shown in SEQ ID NO. 6;

the CDA1Δ188 amino acid sequence is shown in SEQ ID NO. 7; the CDA1Δ182 amino acid sequence is shown in SEQ ID NO. 8;

the CDA1Δ176 amino acid sequence is shown in SEQ ID NO. 9; the CDA1Δ167 amino acid sequence is shown as SEQ ID NO. 10;

the CDA1Δ161 amino acid sequence is shown as SEQ ID NO. 11; the CDA1Δ158 amino acid sequence is shown in SEQ ID NO. 12; the CDA1Δ150 amino acid sequence is shown in SEQ ID NO. 13; the CDA1ΔN28-161 amino acid sequence is shown in SEQ ID NO. 14.

2. The C-to-G base editing system of claim 1, wherein the UNG1 is fused to the C-terminus of nCas9 (D10A); the amino acid sequence of UNG1 is shown as SEQ ID NO. 15.

3. The C-to-G base editing system according to claim 1, characterized in that said nuclear localization signal NLS is fused to the C-terminus of nCas9 (D10A) or UNG1; the amino acid sequence of the NLS is shown as SEQ ID NO. 16.

4. The C-to-G base editing system of claim 1, further comprising an sgRNA or an sgRNA expression vector.

5. Use of the C-to-G base editing system of any one of claims 1-4 for gene editing, for preparing a product for gene editing, or for improving the accuracy of gene editing.

6. The fusion protein of claim 1, wherein the fusion protein is selected from the group consisting of (A1) and (A2) as follows:

(A2) Fusion proteins of cytosine deaminase CDA1 or cytosine deaminase CDA1 of different truncated lengths, nCas9 (D10A), uracil-DNA glycosylase UNG1 and a nuclear localization signal NLS;

wherein the cytosine deaminase CDA1 variants of different truncated lengths are designated CDA1Δ195, CDA1Δ194, CDA1Δ193, CDA1Δ192, CDA1Δ190, CDA1Δ188, CDA1Δ182, CDA1Δ176, CDA1Δ167, CDA1Δ161, CDA1Δ158, CDA1Δ150 and CDA1ΔN28-161, respectively.

7. A nucleic acid molecule encoding the fusion protein of claim 6.

8. A biological material associated with the nucleic acid molecule of claim 7, said biological material being any one of the following:

(C1) An expression cassette comprising the nucleic acid molecule of claim 7;

(C2) A recombinant vector comprising the nucleic acid molecule of claim 7, or a recombinant vector comprising the expression cassette of (C1);

(C3) A recombinant microorganism comprising the nucleic acid molecule of claim 7, or a recombinant microorganism comprising the expression cassette of (C1), or a recombinant microorganism comprising the recombinant vector of (C2);

(C4) A recombinant host cell comprising the nucleic acid molecule of claim 7, or a recombinant host cell comprising the expression cassette of (C1), or a recombinant host cell comprising the recombinant vector of (C2).

9. Use of the fusion protein of claim 6, the nucleic acid molecule of claim 7, the biomaterial of claim 8 in any one of the following (D1) - (D4):

(D1) Application in gene editing;

(D2) Application in preparing a gene editing system;

(D3) Application in preparing gene editing products;

(D4) The application in improving the gene editing accuracy.

10. A method for improving the editing accuracy of a C-to-G base editing system, which comprises preparing a gene editing vector by forming a complex of the fusion protein of claim 6 and a nucleic acid molecule encoding sgRNA.